**CORRELATED NEURONAL ACTIVITY AND ITS RELATIONSHIP TO CODING, DYNAMICS AND NETWORK ARCHITECTURE**

**Topic Editors Robert Rosenbaum, Tatjana Tchumatchenko and Rubén Moreno-Bote**

COMPUTATIONAL NEUROSCIENCE

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-357-8 **DOI** 10.3389/978-2-88919-357-8

## *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **CORRELATED NEURONAL ACTIVITY AND ITS RELATIONSHIP TO CODING, DYNAMICS AND NETWORK ARCHITECTURE**

Topic Editors:

**Robert Rosenbaum,** University of Notre Dame, USA

**Tatjana Tchumatchenko,** Max Planck Institute for Brain Research, Germany

**Rubén Moreno-Bote,** Parc Sanitari Sant Joan de Déu and Universitat de Barcelona, Spain; Centro de Investigación Biomédica en Red de Salud Mental, Spain

A schematic of spike train correlations arising from shared inputs.

Correlated activity in populations of neurons has been observed in many brain regions and plays a central role in cortical coding, attention, and network dynamics. Accurately quantifying neuronal correlations presents several difficulties. For example, despite recent advances in multicellular recording techniques, the number of neurons from which spiking activity can be simultaneously recorded remains orders of magnitude smaller than the size of local networks. In addition, there is a lack of consensus on the distribution of pairwise spike cross-correlations obtained in extracellular multi-unit recordings. These challenges highlight the need for theoretical and computational approaches to understand how correlations emerge and to decipher their functional role in the brain.

# Table of Contents


Pengcheng Zhou, Shawn D. Burton, Nathan N. Urban and G. Bard Ermentrout

*192 Direct Connections Assist Neurons to Detect Correlation in Small Amplitude Noises*

E. Bolhasani, Y. Azizi and A. Valizadeh


Zachary P. Kilpatrick

## Correlated neuronal activity and its relationship to coding, dynamics and network architecture

## *Robert Rosenbaum1,2\*, Tatjana Tchumatchenko3 and Rubén Moreno-Bote4,5*

*<sup>1</sup> Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA*

*<sup>2</sup> Center for the Neural Basis of Cognition, Pittsburgh, PA, USA*

*<sup>3</sup> Department Theory of Neural Dynamics, Max Planck Institute for Brain Research, Frankfurt am Main, Germany*

*<sup>4</sup> Research Unit, Parc Sanitari Sant Joan de Déu and Universitat de Barcelona, Barcelona, Spain*

*<sup>5</sup> Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), Barcelona, Spain*

*\*Correspondence: robertr@pitt.edu*

#### *Edited and reviewed by:*

*Misha Tsodyks, Weizmann Institute of Science, Israel*

**Keywords: neuronal correlations, neural synchrony, neural coding, spike train analysis, neuronal networks, noise correlation**

Correlated and synchronous activity in populations of neurons has been observed in many brain regions and has been shown to play a crucial role in cortical coding, attention, and network dynamics (Singer and Gray, 1995; Salinas and Sejnowski, 2001). However, we still lack a detailed knowledge of the origin and function, if any, of neuronal correlations. In this Research Topic, new ideas about these long-standing questions are put forward. One group of studies in this Research Topic investigates the interaction of neuronal correlations with cellular and circuit mechanisms at the level of single neurons and cell pairs. Bolhasani et al. (2013) study the interaction between direct synaptic coupling between two neurons and correlated stochastic input to those neurons. They find that excitatory synaptic coupling can alter the transfer of pairwise correlations from current input to spike output. Interestingly, there is an optimal value of synaptic coupling strength for which the sensitivity of output correlations to input correlations is maximized.

Bird and Richardson (2014) study the interaction between long term plasticity, synaptic vesicle depletion at multiple release sites and presynaptic spiking correlations. They find that there is an optimal number of release sites for driving postsynaptic spiking when synchrony is present in the presynaptic spike trains. Schwalger and Lindner (2013) investigated correlations between the interspike intervals of oscillator model neurons with adaptation. They reveal a fundamental connection between interval correlations and the phase response curve of the neuron model. They also show that when firing rates are high, negative interval correlations cause long-timescale variability of a model neuron's activity to be small.

A second group of studies in this Research Topic investigates neuronal correlations at the level of networks. The key questions that these studies address are: (1) How are pairwise and higher-order correlations generated in networks, and which of them are important for a given network? and (2) How should we uncover and interpret spike train correlations in a given dataset?

Four studies, Zhou et al. (2013), Grytskyy et al. (2013), Barreiro et al. (2014), and Jahnke et al. (2013), focus on the first question.

Zhou et al. (2013) investigated coupled pairs of neurons receiving temporally correlated input currents. They show that pairs of neurons may be more synchronized if they have some degree of heterogeneity in their intrinsic properties. Temporal correlations in the noise that these neurons receive may also promote synchrony.

Grytskyy et al. (2013) have addressed how recurrent neural networks can support the generation of pairwise correlations. The authors put forward a unified framework for the generation of pairwise correlations in recurrent networks and hypothesize that many different single model neurons, when coupled to a network, may generate the same pairwise correlation structures. Interestingly, the authors could show the equivalence of different single neuron models in a linear approximation to a model with fluctuating continuous variables. This could be a useful tool for assessing correlations across models and experiments.

In a complementary study, Barreiro et al. (2014) have focused on the emergence of pairwise and higher-order correlations in retina models. The authors find that maximum entropy pairwise models capture the network spiking dynamics surprisingly well. What is surprising about these results is that higher-order correlations in this type of model can be constrained to be far lower than the statistically possible limits, and that their strength depends more on the structure of the common input than on the synaptic connectivity profile.

Jahnke et al. (2013) focused on spike patterns rather than correlations and proposed a mechanism for precise spike time pattern generation and replay in neural networks that lack strong densely connected feed-forward structures. The authors put forward the hypothesis that a non-linearity in synaptic summation rules may explain the lack of observed strong feed-forward structures in live networks.

A team led by Sonja Grün has tackled the second question: how spike correlations may be detected in a given data set. Torre et al. (2013) have extended our methodological toolbox and proposed a new method for the extraction of statistically overrepresented spike patterns that may be the functionally significant "cell assemblies" proposed by Abeles (1982). The challenge this study has taken on is to extract, from a large number of simultaneously recorded neurons, candidate assemblies that are systematically co-activated. This search algorithm may help to reveal how precise multi-neuron synchronization patterns that go beyond the standard pairwise analysis may relate to behavior.

In an opinion article, Zanin and Papo (2013) also address the second question. They suggest that one has to be cautious about interpreting neuronal correlations between neurons or brain areas, because typical measurements of effective connectivity might lead to false positives even when the neurons or the brain areas are indeed performing independent computations.

A third group of studies in this Research Topic addresses the computational advantages of neuronal correlations in the brain. Kilpatrick (2013) studied neuronal networks that sustain bump attractors, a well-established model for the maintenance of spatial cues in working memory tasks (Funahashi et al., 1989; Wimmer et al., 2014). In these models, the position of the bump undergoes a diffusion process, implying that the encoded memory degrades as time progresses. Notably, Kilpatrick found that connecting several areas with similar bump attractors increased the stability of the stored memories, because the variability within the areas could be averaged out. However, if the variability across areas was correlated, the diffusion of the bump attractor exhibited larger variability. This study, therefore, suggests that correlated noise across neuronal areas can degrade the precision of the encoding of spatial cues in working memory tasks.

In another study, Dipoppa and Gutkin (2013) found that correlations might play a positive role in working memory tasks through a mechanism that they named "correlation-induced gating." These authors and others have previously shown that correlations tend to destabilize the memory trace of an item stored in working memory. This result might suggest that correlations are deleterious for working memory, but Dipoppa and Gutkin argue that this is not the case: correlations in working memory circuits can be strongly beneficial in suppressing the harmful interference of distractors, irrelevant items that do not need to be stored in memory to solve the ongoing task. This study, therefore, shows in an elegant way how changing correlations within specific neuronal populations can allow for flexible gating of sensory information into working memory circuits.

Previous works have shown that synchronization between neuronal ensembles might play an important role in the binding of features belonging to the same object (Engel and Singer, 2001). In a theoretical work presented in this Research Topic, Finger and Koenig (2014) took an important step forward by showing that binding of features in natural images can be mediated by phase synchronization in a network of neural oscillators. The authors also found that the network, trained with natural images, developed small-world properties and even allowed binding of features over long distances. This study strongly supports the idea that neuronal correlations in the brain might play an important computational role.

In a study where the LFP and single-cell activity were recorded in the hippocampal formation of epileptic patients, Alvarado-Rojas et al. (2013) found that activity of a sizable fraction of neurons preceded interictal epileptiform discharges, as measured by LFP activity.

These studies give striking examples of the ambivalent nature of neuronal correlations: in some conditions correlations might be a signature of dynamic instability of the network, while in other conditions correlations might be used to perform complex and flexible computations, such as binding or information gating. Although these works have provided new clues about the role of neuronal correlations, many questions remain unsolved, such as how neuronal correlations are generated and propagated (Moreno et al., 2002; Moreno-Bote and Parga, 2006; de la Rocha et al., 2007; Ostojic et al., 2009; Renart et al., 2010; Rosenbaum et al., 2010, 2011; Tchumatchenko et al., 2010; Cohen and Kohn, 2011; Tchumatchenko and Wolf, 2011; Helias et al., 2014) and how correlations are shaped by limited information in sensory inputs and by neuronal computations. It is clear that the study of the impact of neuronal correlations on information transmission and brain computation, and vice versa, is still an arena for exciting new discoveries.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; accepted: 07 August 2014; published online: 27 August 2014. Citation: Rosenbaum R, Tchumatchenko T and Moreno-Bote R (2014) Correlated neuronal activity and its relationship to coding, dynamics and network architecture. Front. Comput. Neurosci. 8:102. doi: 10.3389/fncom.2014.00102*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Rosenbaum, Tchumatchenko and Moreno-Bote. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## When do microcircuits produce beyond-pairwise correlations?

#### *Andrea K. Barreiro1 \*†, Julijana Gjorgjieva2 †, Fred Rieke3 and Eric Shea-Brown1,3*

*<sup>1</sup> Department of Applied Mathematics, University of Washington, Seattle, WA, USA*

*<sup>2</sup> Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK*

*<sup>3</sup> Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA*

#### *Edited by:*

*Robert Rosenbaum, University of Pittsburgh, USA*

#### *Reviewed by:*

*Tim Gollisch, University Medical Center Göttingen, Germany Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany*

#### *\*Correspondence:*

*Andrea K. Barreiro, Department of Mathematics, Southern Methodist University, 3200 Dyer Street, PO Box 750156, Dallas, TX 75275-0156, USA*

*e-mail: abarreiro@smu.edu*

#### *†Present address:*

*Andrea K. Barreiro, Department of Mathematics, Southern Methodist University, Dallas, USA; Julijana Gjorgjieva, Center for Brain Science, Harvard University, Cambridge, USA*

Describing the collective activity of neural populations is a daunting task. Recent empirical studies in retina, however, suggest a vast simplification in how multi-neuron spiking occurs: the activity patterns of retinal ganglion cell (RGC) populations under some conditions are nearly completely captured by pairwise interactions among neurons. In other circumstances, higher-order statistics are required and appear to be shaped by input statistics and intrinsic circuit mechanisms. Here, we study the emergence of higher-order interactions in a model of the RGC circuit in which correlations are generated by common input. We quantify the impact of higher-order interactions by comparing the responses of mechanistic circuit models vs. "null" descriptions in which all higher-than-pairwise correlations have been accounted for by lower order statistics; these are known as pairwise maximum entropy (PME) models. We find that over a broad range of stimuli, output spiking patterns are surprisingly well captured by the pairwise model. To understand this finding, we study an analytically tractable simplification of the RGC model. We find that in the simplified model, bimodal input signals produce larger deviations from pairwise predictions than unimodal inputs. The characteristic light filtering properties of the upstream RGC circuitry suppress bimodality in light stimuli, thus removing a powerful source of higher-order interactions. This provides a novel explanation for the surprising empirical success of pairwise models.

**Keywords: retinal ganglion cells, maximum entropy distribution, stimulus-driven, correlations, computational model**

## **1. INTRODUCTION**

Information in neural circuits is often encoded in the activity of large, highly interconnected neural populations. The combinatoric explosion of possible responses of such circuits poses major conceptual, experimental, and computational challenges. How much of this potential complexity is realized? What do statistical regularities in population responses tell us about circuit architecture? Can simple circuit models with limited interactions among cells capture the relevant information content? These questions are central to our understanding of neural coding and decoding.

Two developments have advanced studies of synchronous activity in recent years. First, new experimental techniques provide access to responses from the large groups of neurons necessary to adequately sample synchronous activity patterns (Baudry and Taketani, 2006). Second, maximum entropy approaches from statistical physics have provided a powerful approach to distinguish genuine higher-order synchrony (correlations) from that explainable by pairwise statistical interactions among neurons (Martignon et al., 2000; Amari, 2001; Schneidman et al., 2003). These approaches have produced diverse findings. In some instances, activity of neural populations is extremely well described by pairwise interactions alone, so that pairwise maximum entropy (PME) models provide a nearly complete description (Shlens et al., 2006, 2009). In other cases, while pairwise models bring major improvements over independent descriptions, it is not clear that they fully capture the data (Martignon et al., 2000; Schneidman et al., 2006; Tang et al., 2008; Yu et al., 2008; Montani et al., 2009; Ohiorhenuan et al., 2010; Santos et al., 2010). Empirical studies indicate that pairwise models can fail to explain the responses of spatially localized triplets of cells (Ohiorhenuan et al., 2010; Ganmor et al., 2011), as well as the activity of populations of ∼100 cells responding to natural stimuli (Ganmor et al., 2011). Overall, the diversity of empirical results highlights the need to understand the network and input features that control the statistical complexity of synchronous activity patterns.

Several themes have emerged from efforts to link the correlation structure of spiking activity to circuit mechanisms using both abstract (Amari et al., 2003; Krumin and Shoham, 2009; Macke et al., 2009; Roudi et al., 2009a) and biologically-based models (Bohte et al., 2000; Martignon et al., 2000; Roudi et al., 2009b); these models, however, do not provide a full description for why the PME models succeed or fail to capture neural circuit dynamics. First, thresholding non-linearities in circuits with Gaussian input signals can generate correlations that cannot be explained by pairwise statistics (Amari et al., 2003); the deviations from pairwise predictions are modest at moderate population sizes (Macke et al., 2009), but may become severe as population size grows large (Amari et al., 2003; Macke et al., 2011). The pairwise model also fails in networks of recurrent integrate-and-fire units with adapting thresholds and refractory potassium currents (Bohte et al., 2000). The same is true for "Boltzmann-type" networks with hidden units (Koster et al., 2013). Finally, small groups of model neurons that perform logical operations can be shown to generate higher-order interactions by introducing noisy processes with synergistic effects (Schneidman et al., 2003), but it is unclear what neural mechanisms might produce similar distributions. These diverse findings point to the important role that circuit features and mechanisms—input statistics, input/output relationships, and circuit connectivity—can play in regulating higher-order interactions. Nevertheless, we lack a systematic understanding that links these features and their combinations to the success and failure of pairwise statistical models.

A second theme that has emerged is the use of perturbation approaches to explain why maximum entropy models with purely pairwise interactions capture circuit behavior in the limit in which the population firing rate is very low (i.e., the total number of firing events from all cells in the same small time window is small) (Cocco et al., 2009; Roudi et al., 2009a; Tkacik et al., 2009). Also in this regime, higher-order interactions cannot be introduced as an artifact of under-sampling the network (Tkacik et al., 2009), a concern at higher population firing rates. However, the low to moderate population firing rates observed in many studies permit *a priori* a fairly broad range in the quality of pairwise fits. What is left to explain then is why circuits operating outside the low population firing rate regime often produce fits consistent with the PME model.

We approach this issue here by systematically characterizing the ability of PME models to capture the responses of a class of circuit models with the following defining features. First, we consider relatively small circuits of 3–16 cells, each with identical intrinsic dynamics (i.e., spike-generating mechanism and level of excitability). Second, we assume a particular structure for inputs across the circuit. Each neuron receives the same global input which, for example, represents stimuli in the receptive fields of all modeled cells. Neurons also receive an independent, Gaussian-like noise term. Third, the circuit has either no reciprocal coupling, or has all-to-all excitatory or gap junction coupling. We begin with circuit models fully constrained by measured properties of primate ON parasol ganglion networks, receiving full-field and checkerboard light inputs. We then explore a simple thresholding model for which we exhaustively search over the entire parameter space.
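The common-input structure described above can be sketched in a few lines. This is a minimal illustration, not the fitted parasol model: the shared-input weight `c` and the threshold `theta` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold_circuit(n_cells=3, n_draws=200_000, c=0.5, theta=1.5):
    """Sum-and-threshold sketch: each cell spikes when its common +
    independent Gaussian input exceeds a threshold. The values of c
    (shared-input weight) and theta are illustrative, not fitted."""
    common = rng.standard_normal(n_draws)              # global input shared by all cells
    private = rng.standard_normal((n_draws, n_cells))  # independent noise per cell
    drive = np.sqrt(c) * common[:, None] + np.sqrt(1 - c) * private
    return (drive > theta).astype(int)                 # binary spike words, one per draw

spikes = threshold_circuit()
rate = spikes.mean()                                   # spike probability per cell per draw
rho = np.corrcoef(spikes[:, 0], spikes[:, 1])[0, 1]    # output pairwise correlation
```

Because the shared term is the only source of dependence across cells, the output correlation grows with `c`; the binary words produced this way play the role of the distributions *P* analyzed below.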

We identify general principles that describe higher-order spike correlations in the circuits we study. First, in all cases we examined, the overall strength of higher-order correlations are constrained to be far lower than the statistically possible limits. Second, for the higher-order correlations that do occur, the primary factor that determines how significant they will be is the bimodal vs. unimodal profile of the common input signal. A secondary factor is the strength of recurrent coupling, which has a non-monotonic impact on higher-order correlations. Our findings provide insight into why some previously measured activity patterns are well captured by PME descriptions, and provide predictions for the mechanisms that allow for higher-order spike correlations to emerge.

## **2. RESULTS**

## **2.1. QUANTIFYING HIGHER-ORDER CORRELATIONS IN NEURAL CIRCUITS**

One strategy to identify higher-order interactions is to compare multi-neuron spike data against a description in which any higher-order interactions have been removed in a principled way—that is, a description in which all higher-order correlations are completely described by lower-order statistics. Such a description may be given by a maximum entropy model (Jaynes, 1957a,b; Amari, 2001), in which one identifies the most unstructured, or maximum entropy, distribution consistent with the constraints. Comparing the predicted and measured probabilities of different responses tests whether the constraints used are sufficient to explain observed network activity, or whether additional constraints need to be considered. Such constraints would produce additional structure in the predicted response distribution, and hence lower the entropy.

A common approach is to limit the constraints to a given statistical order, for example to consider only the first and second moments of the distributions, which are determined by the mean firing rates and pairwise interactions. In the context of spiking neurons, we denote by $\mu_i \equiv \mathbf{E}[x_i]$ the firing rate of neuron $i$ and by $\hat{\rho}_{ij} \equiv \mathbf{E}[x_i x_j]$ the joint probability that neurons $i$ and $j$ both fire. The distribution with the largest entropy for a given $\mu_i$ and $\hat{\rho}_{ij}$ is referred to as the *PME* model.
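As a minimal illustration, the constraints of the PME model are simply empirical first and second moments of binary spike words; the data below are hypothetical:

```python
import numpy as np

# Hypothetical binary spike words (rows = time bins, columns = neurons).
x = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]])

mu = x.mean(axis=0)            # mu_i = E[x_i], the firing rate of neuron i
rho_hat = (x.T @ x) / len(x)   # rho_hat_ij = E[x_i x_j], joint firing probability
```

The PME model is then the highest-entropy distribution whose moments match `mu` and the off-diagonal entries of `rho_hat`.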

We use the Kullback–Leibler divergence, $D_{\rm KL}(P, \tilde{P})$, to quantify the accuracy of the PME approximation $\tilde{P}$ to a distribution $P$. This measure has a natural interpretation as the contribution of higher-order interactions to the response entropy $S(P)$ (Amari, 2001; Schneidman et al., 2003), and may in this context be written as the difference of entropies $S(\tilde{P}) - S(P)$. In addition, $D_{\rm KL}(P, \tilde{P})$ is approximately $-\log_2 L$, where $L$ is the average likelihood (over different observations) that a sequence of data drawn from the distribution $P$ was instead drawn from the model $\tilde{P}$ (Cover and Thomas, 1991; Shlens et al., 2006). For example, if $D_{\rm KL}(P, \tilde{P}) = 1$, the average likelihood that a single sample, i.e., a single network response, came from $\tilde{P}$ relative to the likelihood that it came from $P$ is $2^{-1}$ (we use the base 2 logarithm in our definition of the Kullback–Leibler divergence, so all numerical values are in units of bits).
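The divergence in bits can be written directly; the distributions below are toy placeholders, not data:

```python
import numpy as np

def dkl_bits(p, q):
    """Kullback-Leibler divergence D_KL(P, Q) in bits (base-2 log),
    summed over the states where P has mass."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# A distribution one bit away from its model is 2^-1 as likely per sample:
p = [0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0]   # mass on half of the 8 states
q = [0.125] * 8                             # uniform model
dkl_bits(p, q)                              # -> 1.0
```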

An alternative measure of the quality of the pairwise model comes from normalizing $D_{\rm KL}(P, \tilde{P})$ by the corresponding distance of the distribution $P$ from an *independent maximum entropy* fit, $D_{\rm KL}(P, P_1)$, where $P_1$ is the highest-entropy distribution consistent with the mean firing rates of the cells (equivalently, the product of single-cell marginal firing probabilities) (Amari, 2001). Many studies (Schneidman et al., 2006; Shlens et al., 2006, 2009; Roudi et al., 2009a) use

$$\Delta = 1 - \frac{D_{\rm KL}\left(P, \tilde{P}\right)}{D_{\rm KL}\left(P, P_1\right)};\tag{1}$$

a value of $\Delta = 1$ indicates that the pairwise model perfectly captures the additional information left out of the independent model, while a value of $\Delta = 0$ indicates that the pairwise model gives no improvement over the independent model. To aid comparison with other studies, we report values of $\Delta$ in parallel with $D_{\rm KL}(P, \tilde{P})$ when appropriate.
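Equation (1) can be sketched as follows, with $P_1$ built as the product of single-cell marginals; the state ordering (cell $k$ = bit $k$ of the word index) is an implementation choice:

```python
import numpy as np

def independent_fit(p, n=3):
    """P_1: product of single-cell marginal firing probabilities for a
    distribution over binary words of length n (cell k = bit k of index)."""
    p = np.asarray(p, float)
    states = np.array([[(s >> k) & 1 for k in range(n)] for s in range(2 ** n)])
    mu = states.T @ p                                   # marginal rate of each cell
    return np.prod(np.where(states == 1, mu, 1 - mu), axis=1)

def delta(p, p_pair):
    """Equation (1): fraction of the independent model's information
    shortfall that the pairwise model p_pair recovers."""
    def dkl(a, b):
        m = a > 0
        return np.sum(a[m] * np.log2(a[m] / b[m]))
    p = np.asarray(p, float)
    p_pair = np.asarray(p_pair, float)
    return 1 - dkl(p, p_pair) / dkl(p, independent_fit(p))

# A distribution that a pairwise model fits exactly has delta = 1:
p = np.zeros(8); p[0] = p[7] = 0.5
delta(p, p)        # -> 1.0
```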

We next explore and interpret the achievable range of $D_{\rm KL}(P, \tilde{P})$ values. The problem is made simpler if, following previous studies (Bohte et al., 2000; Amari, 2001; Macke et al., 2009; Montani et al., 2009), we consider only permutation-symmetric spiking patterns, in which the firing rate and correlation do not depend on the identity of the cells; i.e., $\mu_i = \mu$ and $\hat{\rho}_{ij} = \hat{\rho}$ for $i \neq j$. We start with three cells having binary responses and assume that the response is stationary and uncorrelated in time. From symmetry, the possible network responses are

$$\begin{aligned} p_0 &= P\left[(0,0,0)\right] \\ p_1 &= P\left[(1,0,0)\right] = P\left[(0,1,0)\right] = P\left[(0,0,1)\right] \\ p_2 &= P\left[(1,1,0)\right] = P\left[(1,0,1)\right] = P\left[(0,1,1)\right] \\ p_3 &= P\left[(1,1,1)\right] \end{aligned}$$

where $p_i$ denotes the probability that a particular set of $i$ cells spike and the remaining $3 - i$ do not. Possible values of $(p_0, p_1, p_2, p_3)$ are constrained by the fact that $P$ is a probability distribution, so that the total probability over all eight states is one: $p_0 + 3p_1 + 3p_2 + p_3 = 1$.
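As a quick consistency check, with illustrative (hypothetical) values of $p_i$: there are $\binom{3}{i}$ words containing $i$ ones, so the eight states regroup into the four classes above:

```python
from math import comb

# Illustrative symmetric distribution: p[i] is the probability of each
# particular word containing i ones (values here are hypothetical).
p = {0: 0.40, 1: 0.15, 2: 0.04, 3: 0.03}

total = sum(comb(3, i) * p[i] for i in p)   # normalization over all 8 words
mu = p[1] + 2 * p[2] + p[3]                 # firing rate mu of each cell
rho_hat = p[2] + p[3]                       # joint firing prob. of any pair
```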

To assess the numerical significance of $D_{\rm KL}(P, \tilde{P})$, we can compare it with the maximal achievable value for any symmetric distribution on three spiking cells. For three cells, the maximal value is $D_{\rm KL}(P, \tilde{P}) = 1$ bit (or 1/3 bits per neuron), achieved by the XOR operation (Schneidman et al., 2003). This distribution is illustrated in **Figure 1A** (right), together with two distributions produced by our mechanistic circuit models, illustrating observed deviations from PME fits for unimodal (left) and bimodal (middle) distributions of inputs (see below). The KL divergence for these two patterns is 0.0013 and 0.091, respectively. As suggested by these bar plots (and explored in detail below), the distributions produced by a wide set of mechanistic circuit models are quite well captured by the PME approximation: to use the likelihood interpretation described above, an observer would need to draw many more samples from these distributions in order to distinguish between the true and model distributions: ≈1000 times and ≈10 times as many, respectively, in comparison to the XOR operator.
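The XOR worst case is easy to verify numerically: its first and second moments match an independent fair-coin model, so the PME fit is uniform over the eight words, while half of those words never occur:

```python
import numpy as np

# XOR distribution on three cells: cells 1 and 2 are fair coins and
# cell 3 = XOR(cell 1, cell 2); each of the four even-parity words has
# probability 1/4. Means and pairwise moments match independence, so
# the PME fit is uniform -- yet D_KL(P, P~) is maximal.
p = np.zeros(8)
for a in (0, 1):
    for b in (0, 1):
        p[4 * a + 2 * b + (a ^ b)] = 0.25   # word index: x1 x2 x3 as bits

pme = np.full(8, 0.125)                     # uniform = PME fit here
mask = p > 0
dkl = np.sum(p[mask] * np.log2(p[mask] / pme[mask]))   # = 1.0 bit
```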

To further identify appropriate "benchmark" values of *D*KL(*P*, *P*˜) with which to compare our mechanistic circuit models, in **Figure 1B** we show plots of *D*KL(*P*, *P*˜) vs. firing rate produced by an exhaustive sampling of symmetric distributions on three cells. From this picture, we can see that it is possible to find symmetric, three-cell spiking distributions that are poorly fit by the pairwise model at a range of firing rates and pairwise correlations, with the largest values of *D*KL(*P*, *P*˜) found at low correlations (note that the XOR distribution has an average pairwise covariance of zero; i.e., **E**[*X*1*X*2] = **E**[*X*1] **E**[*X*2]).

#### *2.1.1. A condition for higher-order correlations*

Possible solutions to the symmetric PME problem take the form of exponential functions characterized by two parameters, λ1 and λ2, which serve as Lagrange multipliers for the constraints:

$$P\left[\left(x\_1, x\_2, x\_3\right)\right] = \frac{1}{Z} \exp\left[\lambda\_1\left(x\_1 + x\_2 + x\_3\right) + \lambda\_2\left(x\_1 x\_2 + x\_2 x\_3 + x\_1 x\_3\right)\right].\tag{2}$$

**FIGURE 1 | A survey of the quality of the pairwise maximum entropy (PME) model for symmetric spiking distributions on three cells. (A)** Probability distribution *P* (dark blue) and pairwise approximation *P*˜ (light pink) for three example distributions. From left to right: an example from the simple sum-and-threshold model receiving skewed common input; an example from the sum-and-threshold model receiving bimodal common input [specifically, the distribution with maximal *D*KL(*P*, *P*˜)]; a specific probability distribution resulting from application of the XOR operator [for illustration of a "worst case" fit of the PME model (Schneidman et al., 2003)]. **(B)** *D*KL(*P*, *P*˜) vs. firing rate, for a comprehensive survey of possible symmetric spiking distributions on three cells (see text for details). Firing rate is defined as the probability of a spike occurring per cell per random draw of the sum-and-threshold model, as defined in Equation (16). Color indicates output correlation coefficient ρ, ranging from black for ρ ∈ (0, 0.1) to white for ρ ∈ (0.9, 1), as illustrated in the color bars.

The factor *Z* normalizes *P* to be a probability distribution.

By combining individual probabilities of events as given by Equation (2), one finds that the following relationship must be satisfied by any symmetric PME solution:

$$\frac{p\_3}{p\_0} = \left(\frac{p\_2}{p\_1}\right)^3. \tag{3}$$

This is equivalent to the condition that the *strain* measure of Ohiorhenuan and Victor (2010) be zero (in particular, the strain is negative whenever *p*3/*p*<sup>0</sup> − (*p*2/*p*1)<sup>3</sup> < 0, a condition identified in Ohiorhenuan and Victor (2010) as corresponding to sparsity in the neural code).
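Equation (3) can be checked numerically for the PME family itself: for any choice of the Lagrange multipliers in Equation (2), the quantity *p*3/*p*0 − (*p*2/*p*1)<sup>3</sup>, whose sign tracks the strain, vanishes. A minimal numpy sketch:

```python
import numpy as np

def pme_pattern_probs(lam1, lam2):
    """Per-pattern probabilities (p0, p1, p2, p3) of the symmetric PME
    model, Eq. (2): a pattern with k spiking cells has weight
    exp(lam1 * k + lam2 * (number of spiking pairs))."""
    q = np.array([1.0,
                  np.exp(lam1),               # one cell spikes
                  np.exp(2*lam1 + lam2),      # two cells spike
                  np.exp(3*lam1 + 3*lam2)])   # all three spike
    Z = q[0] + 3*q[1] + 3*q[2] + q[3]         # sum over all 8 patterns
    return q / Z

# Eq. (3): p3/p0 = exp(3*lam1 + 3*lam2) = (p2/p1)^3, i.e., zero strain
p0, p1, p2, p3 = pme_pattern_probs(0.3, -0.7)
gap = p3/p0 - (p2/p1)**3
```

The identity holds for any (λ1, λ2), since both sides equal exp[3(λ1 + λ2)]; the numerical `gap` is zero up to floating-point rounding.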

For three-cell, symmetric networks, models that exactly satisfy Equation (3) will also be exactly described via PME. Moreover, note that probability models that meet this constraint fall on a surface in the space of (normalized) histograms, given by the probabilities *pj*. One can verify by straightforward calculations (see Appendix) that—given fixed lower order moments— *D*KL(*P*, *P*˜) is a convex function of the probabilities *pj*. This has interesting consequences for predicting when large vs. small values of *D*KL(*P*, *P*˜) will be found (see Appendix).

It is not necessary to assume permutation symmetry when deriving the PME fit *P*˜ to an observed distribution *P*, or in computing derived quantities such as *D*KL(*P*, *P*˜), and we do not do so in this study. However, most of the distributions we study are derived from mechanistic models that are themselves symmetric or near-symmetric. Therefore, we anticipate that the simplified calculations for permutation-symmetric distributions will yield analytical insight into our findings.

#### **2.2. MECHANISMS THAT IMPACT BEYOND-PAIRWISE CORRELATIONS IN TRIPLETS OF ON-PARASOL RETINAL GANGLION CELLS**

Having established the range of beyond-pairwise correlations that are possible statistically, we turn our focus to coding in retinal ganglion cell (RGC) populations, an area that has received a great deal of attention empirically. Specifically, PME approaches have been effective in capturing the activity of small RGC populations (Schneidman et al., 2006; Shlens et al., 2006, 2009). This success lacks an obvious anatomical explanation: the retinal circuitry offers multiple opportunities for interactions among three or more ganglion cells. We explored circuits composed of three RGCs with input statistics, recurrent connectivity and spike-generating mechanisms based directly on experiment. We based our model on ON parasol RGCs, one of the RGC types for which PME approaches have been applied extensively (Shlens et al., 2006, 2009). In addition, by examining how marginal input statistics are shaped by stimulus filtering, we also reveal the role that the specific filtering properties of ON parasol cells play in shaping higher-order interactions.

#### *2.2.1. RGC model*

We modeled a single ON parasol RGC in two stages (for details see section 4). First, we characterized the light-dependent excitatory and inhibitory synaptic inputs to cell *k* (*g*<sub>*k*</sub><sup>exc</sup>(*t*), *g*<sub>*k*</sub><sup>inh</sup>(*t*)) in response to randomly fluctuating light inputs *sk*(*t*) via a linear-nonlinear model:

$$g\_k^{\text{exc}}(t) = N^{\text{exc}} \left[ L^{\text{exc}} \* s\_k(t) + \eta\_k^{\text{exc}} \right],\tag{4}$$

where *N*exc is a static non-linearity, *L*exc is a linear filter, and η<sub>*k*</sub><sup>exc</sup> is an effective input noise that captures variability in the response to repetitions of the same time-varying stimulus. These parameters were determined from fits to experimental data collected under conditions similar to those in which PME models have been tested empirically (Shlens et al., 2006, 2009; Trong and Rieke, 2008). The modeled excitatory and inhibitory conductances captured many of the statistical features of the real conductances, particularly the correlation time and skewness (data not shown).
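The linear-nonlinear stage of Equation (4) can be sketched in a few lines. The biphasic filter shape, softplus non-linearity, and noise level below are illustrative placeholders, not the parameters fitted to ON parasol data:

```python
import numpy as np

rng = np.random.default_rng(0)

def ln_conductance(s, dt=1.0):
    """Sketch of Eq. (4): g(t) = N[ L * s(t) + noise ].
    Filter, nonlinearity, and noise scale are illustrative choices."""
    t = np.arange(0.0, 200.0, dt)
    # Toy biphasic (differentiating) filter: difference of two alpha functions
    L = (t/20.0)*np.exp(-t/20.0) - 0.8*(t/40.0)*np.exp(-t/40.0)
    # Causal convolution of the stimulus with the filter
    drive = np.convolve(s, L, mode="full")[:len(s)] * dt
    # Effective input noise eta, shared-noise correlations omitted here
    noise = 0.1 * rng.standard_normal(len(s))
    # Softplus nonlinearity keeps conductances non-negative
    return np.log1p(np.exp(drive + noise))

s = rng.standard_normal(5000)    # full-field Gaussian flicker stimulus
g = ln_conductance(s)            # resulting excitatory conductance trace
```

A rectifying output nonlinearity is the key structural feature here: whatever the filtered drive, the returned conductance is non-negative, as a conductance must be.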

Second, we used Equation (4) and an equivalent expression for *g*<sub>*k*</sub><sup>inh</sup>(*t*) as inputs to an integrate-and-fire model incorporating a non-linear voltage- and history-dependent term to account for refractory interactions between spikes (Badel et al., 2007, 2008). The voltage evolution equation was of the form

$$\frac{dV}{dt} = F\left(V, t - t\_{\text{last}}\right) + \frac{I\_{\text{input}}(t)}{C},\tag{5}$$

where *F* (*V*, *t* − *t*last) was allowed to depend on the time of the last spike *t*last. Briefly, we obtained data from a dynamic clamp experiment (Sharpe et al., 1993; Murphy and Rieke, 2006) in which currents corresponding to *g*exc(*t*) and *g*inh(*t*) were injected into a cell and the resulting voltage response measured. The input current *I*input injected during one time step was determined by scaling the excitatory and inhibitory conductances by driving forces based on the measured voltage in the previous time step; that is,

$$I\_{\rm input}(t) = -\mathbf{g}^{\rm exc}(t) \left( V - V\_E \right) - \mathbf{g}^{\rm inh}(t) \left( V - V\_I \right), \qquad (6)$$

We used these data to determine *F* and *C* using the procedure described in Badel et al. (2007); details, including values of all fitted parameters, are given in section 4. Recurrent connections were implemented by adding an input current proportional to the voltage difference between the two coupled cells.
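The voltage update of Equations (5) and (6) can be sketched as a forward-Euler loop. All parameter values below (capacitance, leak, spike-initiation and refractory constants) are illustrative stand-ins, not the values fitted from the dynamic-clamp data:

```python
import numpy as np

def simulate(g_exc, g_inh, dt=0.1, C=0.03, gL=0.003, EL=-65.0,
             VE=0.0, VI=-80.0, VT=-50.0, DT=2.0, Vth=-30.0,
             Vreset=-65.0, t_ref=2.0):
    """Euler integration of Eqs. (5)-(6): dV/dt = F(V, t - t_last) + I/C.
    F here is an exponential integrate-and-fire term (leak plus
    spike-initiation); all constants are illustrative, not fitted."""
    V, t_last, spikes = EL, -np.inf, []
    for n in range(len(g_exc)):
        t = n * dt
        if t - t_last < t_ref:          # absolute refractory period
            V = Vreset
            continue
        # F(V): leak plus exponential spike-initiation nonlinearity
        F = (-gL*(V - EL) + gL*DT*np.exp((V - VT)/DT)) / C
        # Eq. (6): conductances scaled by driving forces
        I = -g_exc[n]*(V - VE) - g_inh[n]*(V - VI)
        V += dt * (F + I/C)
        if V >= Vth:                    # threshold crossing -> spike, reset
            spikes.append(t)
            V, t_last = Vreset, t
    return spikes

spikes_driven = simulate(np.full(2000, 0.01), np.zeros(2000))  # excitation on
spikes_silent = simulate(np.zeros(2000), np.zeros(2000))       # no input
```

With sustained excitatory conductance the model fires repetitively; with zero input the voltage rests near the leak reversal and no spikes occur.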

The prescription above provided a flexible model that allowed us to study the responses of three-cell RGC networks to a wide range of light inputs and circuit connectivities. Specifically, we simulated RGC responses to light stimuli that were (1) constant, (2) time-varying and spatially uniform, and (3) varying in both space and time. Correlations between cell inputs arose from shared stimuli, from shared noise originating in the retinal circuitry (Trong and Rieke, 2008), or from recurrent connections (Dacey and Brace, 1992; Trong and Rieke, 2008). Shared stimuli were described by correlations among the light inputs *sk*. Shared noise arose via correlations in η<sub>*k*</sub><sup>exc</sup> and η<sub>*k*</sub><sup>inh</sup>, as described in section 4. The recurrent connections were chosen to be consistent with observed gap-junctional coupling between ON parasol cells. We also investigated how stimulus filtering by *L*exc and *L*inh influenced network statistics. To compare our results with empirical studies, we used constant light and spatially and temporally fluctuating checkerboard stimuli, as in Shlens et al. (2006, 2009).

## *2.2.2. The feedforward RGC circuit is well-described by the PME model for full-field light stimuli*

We start by considering networks without recurrent connectivity and with constant, full-field (i.e., spatially uniform) light stimuli. Thus, we set *sk*(*t*) = 0 for *k* = 1, 2, 3, so that the cells received only correlated Gaussian noise η<sub>*k*</sub><sup>exc</sup> and η<sub>*k*</sub><sup>inh</sup> and constant excitatory and inhibitory conductances. Time-dependent conductances were generated and used as inputs to a simulation of three model RGCs. Simulation length was sufficient to ensure significance of all reported deviations from PME fits (see section 4). We found that the spiking distributions were strikingly well-modeled by a PME fit, as shown in the right-hand panel of **Figure 2A**; *D*KL(*P*, *P*˜) is 2.90 × 10<sup>−5</sup> bits. This result is consistent with the very good fits found experimentally in Shlens et al. (2006) under constant light stimulation.

Next, we introduced temporal modulation into the full-field light stimuli such that each cell received the same stimulus, *sk*(*t*) = *s*(*t*), where *s*(*t*) was refreshed every few milliseconds with an independently chosen value from one of several marginal distributions. For our initial set of experiments, the marginal distribution was either Gaussian (as in Ganmor et al., 2011) or binary (as used in Shlens et al., 2006). For both choices, we explored inputs with a range of standard deviations (1/16, 1/12, 1/8, 1/6, 1/4, 1/3, or 1/2 of a baseline light intensity) and refresh rates (8, 40, or 100 ms). The shared stimulus produced strong pairwise correlation between conductances of neighboring cells. However, values of *D*KL(*P*, *P*˜) remained small, under 10<sup>−2</sup> bits in all conditions tested.

## *2.2.3. Impact of stimulus spatial scale*

We next asked whether PME models capture RGC responses to stimuli with varying spatial scales. We fixed stimulus dynamics to match the two cases that yielded the highest *D*KL(*P*, *P*˜) under the full-field protocol: for both Gaussian and binary stimuli, we used 8 ms refresh rate and σ = 1/2. The stimulus was generated as a random checkerboard with squares of variable size; each square in the checkerboard, or *stixel*, was drawn independently from the appropriate marginal distribution and updated at the corresponding refresh rate. The conductance input to each RGC was then given by convolving the light stimulus with its receptive field, where the stimulus was positioned with a fixed rotation and translation relative to the receptive fields. This position was drawn randomly at the beginning of each simulation and held constant throughout (see insets of **Figures 3B,C** for examples, and section 4 for further details).

The RGC spike patterns remained very well described by PME models over the full range of spatial scales. **Figure 3A** shows this by plotting *D*KL(*P*, *P*˜) vs. stixel size. Values of *D*KL(*P*, *P*˜) increased with spatial scale, rising sharply beyond 128μm, where a stixel had approximately the same size as a receptive field center; stimuli with spatial structure finer than a receptive field thus produced even closer fits by PME models (the points at 512μm correspond to the full-field simulations).

Values reported in **Figure 3A** are *averages* of *D*KL(*P*, *P*˜) produced by five random stimulus positions. At stixel sizes of 128μm and 256μm, the resulting spiking distributions differed significantly from position to position; in **Figure 3B**, we show the probabilities of the distinct singlet [e.g., *P*(1, 0, 0)] and doublet [e.g., *P*(1, 1, 0)] spiking events produced at 256μm. Each stimulus position created a "cloud" of dots (identified by color); large dots show the average over 20 sub-simulations. Each sub-simulation was identified by a small dot of the same color; because the simulations were very well-resolved, most of these were contained within the large dots (and hence not visible in the figure). Heterogeneity across stimulus positioning is indicated by the distinct positioning of differently colored dots. At smaller spatial scales, the process of averaging stimuli over the receptive fields resulted in spiking distributions that were largely unchanged with stimulus position, as shown in **Figure 3C**, where singlet and doublet spiking probabilities are plotted for 60μm stixels. Thus, filtered light inputs were largely homogeneous from cell to cell, as each receptive field sampled a similar number of independent, statistically identical inputs; the inset of **Figure 3C** shows the projection of input stixels onto cell receptive fields from an example with 60μm stixels. The resulting excitatory conductances and spiking patterns were very close to cell-symmetric (see **Figures S2B,C**).

By contrast, spiking patterns showed significant heterogeneity from cell to cell when the stixel size was large, as illustrated in **Figure 3B**. This arises because each cell in the population may be located differently with respect to stixel boundaries, and therefore receive a distinct pattern of input activity; this is illustrated by the inset of **Figure 3B**, which shows the projection of input stixels onto cell receptive fields from one such simulation. However, PME models gave excellent fits to data regardless of heterogeneity in RGC responses (see **Figures S2E,F**); as seen in **Figure 3A**, over all 20 sub-simulations, and over all individual stixel positions, we found a maximal *D*KL(*P*, *P*˜) value of 0.00811.

## *2.2.4. Conductance profiles and impact of stimulus filtering*

Intrigued by the consistent finding of low values of *D*KL(*P*, *P*˜) from the RGC model circuit despite stimulation by a wide variety of highly correlated stimulus classes, we sought to further characterize the processing of light stimuli by this circuit. In particular, we examined the effects of different marginal statistics of light stimuli, standard deviation of full-field flicker, and refresh rate on the marginal distributions of excitatory conductances. We focused on excitatory conductances because they exhibit stronger correlations than inhibitory conductances in ON parasol RGCs (Trong and Rieke, 2008).

With constant light stimulation (no temporal modulation), the excitatory conductances were unimodal and broadly Gaussian (**Figure 2A**, middle panel). For a short refresh rate (8 ms) or small flicker size (standard deviation 1/6 or 1/4 of baseline light intensity), temporal averaging via the filter *L*exc and the approximately linear form of *N*exc over these light intensities produced a unimodal, modestly skewed distribution of excitatory conductances, regardless of whether the flicker was drawn from a Gaussian or binary distribution (see **Figures 2B,C**, center panels). For a slower refresh rate (100 ms) and large flicker size (s.d. 1/3 or 1/2 of baseline light intensity), excitatory conductances had multi-modal and skewed features, again regardless of whether the flicker was drawn from a Gaussian or binary distribution (**Figure 2D**). Other parameters being equal, binary light input

**FIGURE 2 | Results for RGC simulations with constant light and full-field flicker. (A–C)** (Left) A histogram and time series of stimulus, (center) a histogram of excitatory conductances and (right) the resulting distribution of spiking patterns. Stimuli are shown as deviations from a baseline intensity, expressed as a fraction of the baseline. Right panels show the probability distribution on spiking patterns *P* obtained from simulation ("Observed"; dark blue), and the corresponding pairwise approximation *P*˜ ("PME"; light pink). Each row gives these results for a different stimulus condition. **(A)** No stimulus (Gaussian noise only). **(B)** Gaussian input, standard deviation 1/6, refresh rate

8 ms. **(C)** Binary input, standard deviation 1/3, refresh rate 8 ms. **(D)** Binary input, standard deviation 1/3, refresh rate 100 ms. For panel **(D)**, the data in the left panel differs. (Left, top panel) The excitatory filter *L*exc(*t*) (Equation 7) is shown instead of a stimulus histogram; (Left, bottom panel) the normalized excitatory conductance, as a function of time (red dashed line), is superimposed on the stimulus (blue solid). (Center) The histogram of excitatory conductances and (right) the resulting distribution of spiking patterns. Both the form of the filter and the conductance trace illustrate that the LN model that processes light input acts as a (time-shifted) high-pass filter.

produced more skewed conductances. While some conductance distributions had multiple local maxima, these were never well separated, with the envelope of the distribution still resembling a skewed distribution.

The mechanism that leads to unimodal distributions of conductances, even when light stimuli are binary, is high-pass filtering—a consequence of the differentiating linear filter in Equation (7), illustrated in **Figure 2D**. To demonstrate this, we constructed an alternative filter with a more monophasic shape [Equation (9), illustrated in **Figure S1**] and compared the excitatory conductance distributions side-by-side. We saw a striking difference in the response to long time scale, binary stimuli: the distributions produced by the monophasic filter reflected the bimodal shape of the input. Interestingly, the resulting simulation produced an eight-fold greater *D*KL(*P*, *P*˜) (**Figure 4**). This suggests that greater *D*KL(*P*, *P*˜) may occur when ganglion cell inputs are primarily characterized by monophasic filters, e.g., at low mean light levels for which the retinal circuit acts primarily to integrate, rather than differentiate, over time.
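The effect of filter shape on marginal statistics can be illustrated with toy filters (a boxcar and a differencing kernel, neither of which is the fitted retinal filter): a slowly switching binary stimulus keeps its bimodality under a monophasic (integrating) filter, but is pinned near zero, except at switches, by a biphasic (differentiating) one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Slowly switching binary stimulus: 200 "frames" of 100 samples at +/-1
stim = np.repeat(rng.choice([-1.0, 1.0], size=200), 100)

mono = np.ones(20) / 20.0                                  # boxcar: integrates
biph = np.concatenate([np.ones(10), -np.ones(10)]) / 10.0  # differentiates

y_mono = np.convolve(stim, mono, mode="valid")
y_biph = np.convolve(stim, biph, mode="valid")

# Monophasic output sits at the two stimulus levels except during switches...
frac_bimodal = np.mean(np.abs(y_mono) > 0.95)
# ...while the biphasic output is near zero except briefly at each switch.
frac_zero = np.mean(np.abs(y_biph) < 0.05)
```

Here `frac_bimodal` and `frac_zero` both come out near 0.9: the integrating filter transfers the bimodality of the stimulus to its output, while the differentiating filter collapses it to a distribution concentrated at zero with brief transients.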

In **Figure 4A**, we examine this effect over all full-field stimulus conditions by plotting *D*KL(*P*, *P*˜) from simulations with the monophasic filter, against *D*KL(*P*, *P*˜) from simulations in which the original filter was used with the same stimulus type. An increase in *D*KL(*P*, *P*˜) was observed across stimulus conditions, with a markedly larger effect for longer refresh rates. This consistent change could not be attributed to changes in lower order statistics; there was no consistent relationship between the change in pairwise model performance and either firing rate or pairwise correlations (data not shown). Instead, large effects in *D*KL were accompanied by a striking increase in the bi- or multi-modality of excitatory conductances (see **Figure 4B**). In **Figure 4C**, we show an example stimulus and excitatory current trace taken from the simulation shown in **Figure 4B**: the monophasic filter allows the excitatory synaptic currents to track a long-timescale, bimodal stimulus with higher fidelity, transferring the bimodality of the stimulus into the synaptic currents. This finding was robust to specifics of the filtering process; we were able to reproduce the same results by designing integrating filters in different ways (data not shown).

## *2.2.5. Recurrent connectivity in the RGC circuit*

We next considered the role of recurrence in shaping higherorder interactions by incorporating gap junction coupling into our simulations. We did this separately for each full-field stimulus

**FIGURE 3 | Results for RGC simulations with light stimuli of varying spatial scale ("stixels"). (A)** Average *D*KL(*P*, *P*˜) as a function of stixel size. Values were averaged over five stimulus positions, each with a different (random) stimulus rotation and translation; 512 μm corresponds to full-field stimuli. For the rest of the panels, data from the binary light distributions is shown; results from the Gaussian case are similar. **(B,C)** Probability of singlet and doublet spiking events, under stimulation by movies of 256μm **(B)** and 60μm **(C)** stixels. Event probabilities are plotted in 3-space, with the *x*, *y*, and *z* axes identifying the singlet

(doublet) events 001 (011), 010 (101), and 100 (110), respectively. The black dashed line indicates perfect cell-to-cell homogeneity (e.g., *P*[(1, 0, 0)] = *P*[(0, 1, 0)] = *P*[(0, 0, 1)]). Both individual runs (dots) and averages over 20 runs (large circles) are shown, with averages outlined in black (singlet) and gray (doublet). Different colors indicate different stimulus positions. Insets: contour lines of the three receptive fields (at the 1 and 2 SD contour lines for the receptive field center; and at the zero contour line) superimposed on the stimulus checkerboard (for illustration, pictured in an alternating black/white pattern).

**FIGURE 4 (caption, continued) |** … filters. The marginal statistics and refresh rate are illustrated by icons inside black circles; here, binary stimuli with refresh rate 100 ms. The input standard deviation (expressed as a fraction of baseline light intensity) was 1/2. **(C)** Time course of stimulus and resulting excitatory conductances, from simulation shown in **(B)**: original (top) vs. monophasic (bottom) filters.

condition described earlier. In each case, we added gap junction coupling with strengths from 1 to 16 times an experimentally measured value (Trong and Rieke, 2008), and compared the resulting *D*KL with that obtained without recurrent coupling (**Figure 5**).

At the experimentally measured coupling strength (*g*gap = 1.1 nS), the fit of the pairwise model barely changed (**Figure 5A**) from the model without coupling. At twice the measured coupling strength (*g*gap = 2.2 nS), recurrent coupling increased higher-order interactions, as measured by larger values of *D*KL for all tested stimulus conditions. Higher-order interactions could be further increased, particularly for long refresh rates (100 ms), by increasing the coupling strength to four or eight times its baseline level (*g*gap = 4.4 nS or *g*gap = 8.8 nS; see **Figures 5B,C**). Consistent with the intuition that very strong coupling leads to "all-or-none" spiking patterns, *D*KL(*P*, *P*˜) decreased as *g*gap increased further, often to a level below what was seen in the absence of coupling (**Figure 5D**). In summary, the impact of coupling on *D*KL is maximized at intermediate values of the coupling strength. However, the impact of recurrent coupling on the maximal values of *D*KL evoked by visual stimuli is small overall, and almost negligible for experimentally measured coupling strengths.

#### *2.2.6. Modeling heavy-tailed light stimuli in the RGC circuit*

Finally, we repeated the full-field, recurrent, and alternate-filter simulations previously described with light stimuli drawn from either Cauchy or heavy-tailed skewed distributions: such distributions

**FIGURE 5 | The impact of recurrent coupling on RGC networks with full-field visual stimuli.** The strength of gap junction connections was varied from a baseline level (relative magnitude *g* = 1, or absolute magnitude *g*gap = 1.1 nS) to an order of magnitude larger (*g* = 16, or *g*gap = 17.6 nS). In each panel, *D*KL(*P*, *P*˜) obtained with coupling is plotted vs. the value obtained for the same stimulus ensemble without coupling, for each of 42 different stimulus ensembles. **(A)** *g*gap = 1.1 nS (experimentally observed value); **(B)** *g*gap = 4.4 nS; **(C)** *g*gap = 8.8 nS; **(D)** *g*gap = 17.6 nS.

have been found to model the frequency of occurrence of luminance values in photographs of natural scenes (Ruderman and Bialek, 1994). In contrast to previous results with Gaussian and bimodal inputs, here we found very low *D*KL(*P*, *P*˜) over all stimulus conditions: the largest values found were more than an order of magnitude smaller than those obtained earlier. Specifically, for all conditions, we found *D*KL(*P*, *P*˜) < 4.5 × 10<sup>−4</sup> over all 42 network realizations; for many simulations, this number did not meet the threshold for statistical significance (see section 4.1.7), indicating that *P* and *P*˜ were not statistically distinguishable. Using a more monophasic filter resulted in no apparent consistent change to *D*KL(*P*, *P*˜). When gap junction coupling was added, *D*KL(*P*, *P*˜) was maximized at an intermediate value; when *g*gap = 8.8 nS, all simulations produced a statistically significant *D*KL(*P*, *P*˜) ≈ 3–4 × 10<sup>−3</sup>. However, overall levels remained relatively low, roughly half the value achieved with Gaussian or binary stimuli.

To explain these findings, we examined the excitatory input currents: we found that over a broad range of refresh rates and stimulus variances, the marginal distributions of excitatory input conductances were remarkably unimodal in shape and showed little skewness (**Figure 6A**). By examining the time evolution of the filtered stimuli (see **Figure 6B**), we see that heavy-tailed distributions allow rare, large events, but at the expense of medium-size events that explore the full range of the linear-nonlinear model used for stimulus processing (compare the blue with the red/green traces). When combined with the Gaussian background noise, this produces near-Gaussian excitatory conductances and, as may be expected from our original full-field simulations, very low *D*KL.

We hypothesize that the methodology of averaging over the entire stimulus ensemble may not capture the significance of rare events that may individually be detected with high fidelity: *D*KL was low even for full-field, high variance stimuli, which presumably caused (infrequent) global spiking events. Additionally, an important avenue for future work would be to test the ability of our RGC model, which was trained on Gaussian stimuli, to accurately model the response of a ganglion cell to stimuli whose variance is dominated by large events. Recent work examining the adaptation of retinal filtering properties to higher-order input statistics found little evidence of adaptation; however, the stimuli used in this work incorporated significant kurtosis but not heavy tails (Tkacik et al., 2012).

#### *2.2.7. Summary of findings for RGC circuit*

In summary, we probed the spiking response of a small array of RGC models to changes in light stimuli, gap junction coupling, and stimulus filtering properties, and identified two circumstances in which higher-order interactions were robustly generated in the spiking response. First, higher-order interactions were generated when excitatory currents had bimodal structure; we observed such structure when bimodal light stimuli were processed by a relatively monophasic filter. Second, higher-order interactions were maximized at an intermediate value of gap junction coupling; this value was, however, much larger (eight times) than the experimentally observed coupling strength.

**FIGURE 6 | Results for RGC simulations with heavy-tailed inputs. (A)** Histograms of excitatory conductances, for the original (left) vs. monophasic (right) filter. The marginal statistics are heavy-tailed skew (top) and Cauchy (bottom) inputs, and refresh rate is 40 ms for both panels. The input standard deviation (expressed as a fraction of baseline light intensity) was 1/2 for both simulations. **(B)** Sample 100 ms stimuli, filtered by the original linear filter *L*exc (top) and altered, monophasic filter *L*exc,M (bottom). Cauchy (blue solid), Gaussian (red dashed), and bimodal (green dash-dotted) stimuli are shown.

## **2.3. A SIMPLIFIED CIRCUIT THAT EXPLAINS TRENDS IN RGC CELL MODEL**

#### *2.3.1. Setup and motivation*

In the previous section, we developed results for a computational model tuned to a very specific cell type; we now ask whether these findings hold for a more general class of neural circuits, or whether they are the consequence of system-specific features. To answer this question, we considered a simplified model of neural spiking: a feedforward circuit in which three spiking cells sum their inputs and spike according to whether or not they cross a threshold. Such highly idealized models of spiking have a long history in neuroscience (McCulloch and Pitts, 1943) and have recently been shown to predict the pairwise and higher-order activity of neural groups in both neural recordings and more complex dynamical spiking models (Nowotny and Huerta, 2003; Tchumatchenko et al., 2010; Yu et al., 2011; Leen and Shea-Brown, 2013).

In more detail, each cell *j* received an independent input *Ij* and a "triplet," or global, input *Ic* shared among all three cells. Comparison of the total input *Sj* = *Ic* + *Ij* with a threshold determined whether or not the cell spiked in that random draw. An additional parameter, *c*, identified the fraction of the total input variance σ<sup>2</sup> originating from the global input; that is, *c* ≡ Var[*Ic*]/Var[*Ic* + *Ij*]. The global input was chosen from one of several marginal distributions, which included those used in the RGC model: Gaussian, bimodal, and heavy-tailed. The independent inputs *Ij* were, in all cases, chosen from a Gaussian distribution, consistent with our RGC model. When the common inputs are Gaussian, our model is equivalent to the Dichotomized Gaussian model previously studied by several groups (Amari et al., 2003; Macke et al., 2009, 2011; Yu et al., 2011), cf. (Tchumatchenko et al., 2010). For further details, see section 4.2.
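A minimal Monte Carlo sketch of this sum-and-threshold model (with illustrative values for σ, the threshold, and the sample size) estimates the probability that exactly *k* of the three cells spike:

```python
import numpy as np

def triplet_patterns(c=0.5, sigma=1.0, theta=1.0, n=200_000, seed=0):
    """Sum-and-threshold model: cell j spikes iff I_c + I_j > theta,
    with the shared input I_c carrying a fraction c of the total
    variance sigma^2. Returns P[k] = prob. exactly k cells spike
    (so P[k] is the per-pattern p_k times the multiplicity 1, 3, 3, 1)."""
    rng = np.random.default_rng(seed)
    Ic = np.sqrt(c) * sigma * rng.standard_normal(n)           # shared input
    Ij = np.sqrt(1 - c) * sigma * rng.standard_normal((n, 3))  # private inputs
    spikes = (Ic[:, None] + Ij) > theta                        # threshold rule
    counts = spikes.sum(axis=1)                                # 0..3 per draw
    return np.array([(counts == k).mean() for k in range(4)])

P = triplet_patterns()
```

With Gaussian common input this is the Dichotomized Gaussian model mentioned above; swapping the `Ic` draw for a bimodal or heavy-tailed sampler reproduces the other input classes considered in the text. The per-cell firing rate recovered from `P` matches the Gaussian tail probability P[N(0, σ²) > θ].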

In the RGC model, large effects in *D*KL were accompanied by a striking increase in the bi- or multi-modality of excitatory conductances. Why are bimodal inputs, shared across cells, able to produce spiking responses that deviate from the pairwise model? We use our simple thresholding model to provide some intuition for how bimodal common inputs to thresholding cells lead to spiking probabilities that violate the constraint (Equation 3) that must hold for the pairwise model. For example, suppose that the common input *Ic* takes values that cluster around two separated values, μ*A* < μ*B*, but rarely falls in the interval between; that is, the distribution of *Ic* is *bimodal*. If μ*B* is large enough to push the cells over threshold but μ*A* is not, then any contribution to the ratio *p*2/*p*1 appearing on the right-hand side of Equation (3) depends only on the distribution of the independent inputs *Ij*: if exactly one or two cells spike, then the common input must have been drawn from the cluster of values around μ*A*, because otherwise all three cells would have spiked.

To be concrete, let *P*[**x**] refer to the probability of spiking event **x** = (*x*1, *x*2, *x*3), and *P*[**x** | *Ic* ≈ μ*A*] refer to the probability that **x** occurs, conditioned on the event *Ic* ≈ μ*A*. Then

$$\begin{aligned} P\left[(1,0,0)\right] &= P\left[(1,0,0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right] + P\left[(1,0,0) \mid I\_c \approx \mu\_B\right] P\left[I\_c \approx \mu\_B\right] \\ &= P\left[(1,0,0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right] \end{aligned}$$

because *P*[(1, 0, 0) | *Ic* ≈ μ*B*] = 0. For the same reason,

$$P\left[(1,1,0)\right] = P\left[(1,1,0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right].$$

Therefore,

$$\frac{p\_2}{p\_1} = \frac{P\left[(1,1,0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right]}{P\left[(1,0,0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right]}$$

$$= \frac{P\left[(1,1,0) \mid I\_c \approx \mu\_A\right]}{P\left[(1,0,0) \mid I\_c \approx \mu\_A\right]}.$$

On the other hand,

$$\frac{p\_3}{p\_0} = \frac{P\left[I\_c \approx \mu\_B\right] + P\left[(1, 1, 1) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right]}{P\left[(0, 0, 0) \mid I\_c \approx \mu\_A\right] P\left[I\_c \approx \mu\_A\right]}.$$

By changing the relative likelihood of drawing the common input from one cluster or the other, without changing the values of μ*A* and μ*B* themselves (that is, changing *P*[*Ic* ≈ μ*B*] and *P*[*Ic* ≈ μ*A*] but leaving the conditional probabilities, e.g., *P*[(1, 0, 0) | *Ic* ≈ μ*A*], fixed), one may change the ratio *p*3/*p*0 *without* changing the ratio *p*2/*p*1. Hence the constraint specifying those network responses exactly describable by PME models can be violated when the common input is bimodal.
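This argument can be checked with an idealized two-cluster common input, in which *Ic* equals exactly μ*A* or μ*B*, and μ*B* is taken far enough above threshold that all three cells always spike. The parameter values below are hypothetical illustrative choices of ours:

```python
from math import erf, sqrt

def ratios(w_B, mu_A=0.0, theta=1.5, sigma_ind=1.0):
    """Two-cluster common input for the three-cell circuit: I_c = mu_B with
    probability w_B (assumed far above threshold, so all cells spike), or
    I_c = mu_A with probability 1 - w_B. Independent inputs are Gaussian
    with standard deviation sigma_ind. Returns the pair (p2/p1, p3/p0)."""
    # per-cell spike probability given I_c = mu_A
    q = 0.5 * (1.0 - erf((theta - mu_A) / (sigma_ind * sqrt(2.0))))
    p0 = (1 - w_B) * (1 - q) ** 3        # (0,0,0): common input must be mu_A
    p1 = (1 - w_B) * q * (1 - q) ** 2    # a specific pattern, e.g., (1,0,0)
    p2 = (1 - w_B) * q ** 2 * (1 - q)    # a specific pattern, e.g., (1,1,0)
    p3 = w_B + (1 - w_B) * q ** 3        # (1,1,1): mu_B, or three lucky draws
    return p2 / p1, p3 / p0

r_lo, r_hi = ratios(0.05), ratios(0.20)
```

Changing *w_B* moves *p*3/*p*0 while leaving *p*2/*p*1 = *q*/(1 − *q*) untouched, which is exactly the freedom the pairwise constraint forbids.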

In contrast, we may instead consider a *unimodal* common input, of which a Gaussian is a natural example. Here, the distribution of the common input *Ic* is completely described by its mean and variance; both parameters can impact the ratio *p*3/*p*0 (by altering the likelihood that the common input alone can trigger spikes) and the ratio *p*2/*p*1. Each value of *Ic* is consistent with both the events counted in *p*1 and those counted in *p*2, with the relative likelihood of each depending on the specific value of *Ic*; the two ratios can therefore no longer be manipulated separately. In the following sections, we will confirm this intuition by direct evaluation of the resulting departure from pairwise statistics.

#### *2.3.2. Model input distributions*

Motivated by our observations of excitatory currents that arose in the RGC model, we chose several input distributions that allow us to explore other salient features, such as symmetry and the probability of large events. A distribution is called *sub-Gaussian* if the probability of large events decays rapidly with event size, so that it can be bounded above by a scaled Gaussian distribution (see section 4). We considered two sub-Gaussian distributions: the Gaussian itself, and a skewed distribution with a sub-Gaussian tail (hereafter referred to as "skewed"). We also considered the two "heavy-tailed" distributions used as stimuli to the RGC model—the Cauchy distribution, and a skewed distribution with a Cauchy-like tail (hereafter referred to as "heavy-tailed skewed"). In these distributions, the probability of large events decays polynomially rather than exponentially.

For each choice of common input marginal, we varied the input parameters so as to explore a full range of firing rates and pairwise correlations: specifically, we varied the input correlation coefficient *c* in the range [0, 1], the *total* input standard deviation σ in the range [0, 4], and the threshold θ in [0, 3]. In all cases the independent inputs *Ij* were chosen from a Gaussian distribution [of variance (1 − *c*)σ<sup>2</sup>]. For each choice of input parameters, we determined the resulting distribution on spiking states (as described in section 4) and computed the PME approximation.
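The PME fit to any three-cell distribution can be computed by iterative proportional fitting over the eight binary states, matching all pairwise marginals. The paper's own fitting procedure is described in its section 4; the sketch below is our own minimal version:

```python
import numpy as np
from itertools import product

STATES = np.array(list(product([0, 1], repeat=3)))  # the 8 binary patterns

def pme_fit(P, n_sweeps=200):
    """Pairwise maximum-entropy fit of P over {0,1}^3: starting from the
    uniform distribution, rescale Q until every pairwise marginal
    Q(x_i = a, x_j = b) matches the corresponding marginal of P."""
    Q = np.full(8, 1.0 / 8.0)
    for _ in range(n_sweeps):
        for i, j in [(0, 1), (0, 2), (1, 2)]:
            for a, b in product([0, 1], repeat=2):
                mask = (STATES[:, i] == a) & (STATES[:, j] == b)
                cur = Q[mask].sum()
                if cur > 0:
                    Q[mask] *= P[mask].sum() / cur
    return Q

def dkl(P, Q):
    """Kullback-Leibler divergence D_KL(P, Q) in bits."""
    m = P > 0
    return float(np.sum(P[m] * np.log2(P[m] / Q[m])))

# parity ("XOR") distribution: uniform on the four even-parity states;
# its pairwise marginals are uniform, so its PME fit is the uniform
# distribution and the deviation is exactly 1 bit
P_parity = np.where(STATES.sum(axis=1) % 2 == 0, 0.25, 0.0)
d = dkl(P_parity, pme_fit(P_parity))  # d = 1.0 bit
```

Because iterative proportional fitting converges to the maximum-entropy distribution with the prescribed pairwise marginals, applying `dkl` to any model-derived distribution and its `pme_fit` gives the quantity *D*KL(*P*, *P*˜) studied throughout this section.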

#### *2.3.3. Unimodal common inputs fail to produce significant higher-order interactions in three-cell feedforward circuits*

We first considered common inputs chosen from a unimodal (e.g., Gaussian) distribution. If *Ic* is Gaussian, then the joint distribution of **S** = (*S*1, *S*2, *S*3) is multivariate normal, and therefore characterized entirely by its means and covariances. Because the PME fit to a continuous distribution is precisely the multivariate normal that is consistent with the first and second moments, every such input distribution on **S** *exactly* coincides with its PME fit. However, even with Gaussian inputs, outputs (which are now in the binary state space {0, 1}<sup>3</sup>) will deviate from the PME fit (Amari et al., 2003; Macke et al., 2009). As shown below, non-Gaussian unimodal inputs can produce outputs with larger deviations. Nonetheless, these deviations are small for all cases in which inputs were chosen from a sub-Gaussian distribution, and PME models are quite accurate descriptions of circuits with a broad range of unimodal inputs.

We first considered circuits with either Gaussian or skewed common inputs. Over the full range of input parameters, distributions remained well fit by the pairwise model, with maximum values of *D*KL(*P*, *P*˜) of 0.0038 and 0.0035 for Gaussian and skewed inputs, respectively, achieved for high correlation values and σ comparable to threshold. In **Figure 7A** we illustrate these trends with a contour plot of *D*KL(*P*, *P*˜) for a fixed value of threshold (here, θ = 1.5) and Gaussian common inputs (the analogous plot for skewed inputs is qualitatively very similar, **Figure S3A**).

Clear patterns also emerged when we viewed *D*KL(*P*, *P*˜) as a function of *output* spiking statistics rather than *input* statistics (as in Macke et al., 2011). Non-linear spike generation can produce substantial differences between input and output correlations; this relationship can vary widely based on the specific non-linearity (Moreno et al., 2002; de la Rocha et al., 2007; Marella and Ermentrout, 2008; Shea-Brown et al., 2008; Vilela and Lindner, 2009; Barreiro et al., 2010, 2012; Tchumatchenko et al., 2010; Hong et al., 2012). **Figure 7B** shows *D*KL(*P*, *P*˜) and Δ for all threshold values (including the data shown in **Figure 7A**), but now plotted with respect to the output firing rate. The data were segregated according to the Pearson correlation coefficient ρ between the responses of cell pairs, ρ ≡ Cov(*xi*, *xj*)/√(Var(*xi*)Var(*xj*)) = (ρˆ − μ<sup>2</sup>)/(μ(1 − μ)). For a fixed correlation, there was generally a one-to-one relationship between firing rate and *D*KL(*P*, *P*˜). For these distributions (**Figure 7B**, for Gaussian inputs; skewed inputs shown in **Figure S3B**), *D*KL(*P*, *P*˜) was maximized at an intermediate firing rate. Additionally, *D*KL(*P*, *P*˜) had a non-monotonic relationship with spike correlation: it increased from zero for low values of correlation, attained a maximum at an intermediate value, and then decreased. These limiting behaviors agree with intuition: a spike pattern that is completely uncorrelated can be described by an independent distribution (a special case of the PME model), and one that is perfectly correlated can be completely described via (perfect) pairwise interactions alone.
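For binary responses, the Pearson correlation between cell pairs reduces to a simple moment formula, ρ = (ρ̂ − μ²)/(μ(1 − μ)), where μ is the firing rate and ρ̂ = E[*xi* *xj*] is the joint spike probability. A short sketch (the simulation parameters are arbitrary choices of ours) verifies this against a direct estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_cells = 200_000, 3
# toy binary spike patterns: threshold a shared plus an independent Gaussian input
I_c = rng.normal(0.0, 0.6, size=(n, 1))                       # common input
x = (I_c + rng.normal(0.0, 0.8, size=(n, n_cells)) > 1.0).astype(float)

mu = x.mean()                          # firing rate (cells are homogeneous)
rho_hat = (x[:, 0] * x[:, 1]).mean()   # joint spike probability E[x_i x_j]
rho = (rho_hat - mu**2) / (mu * (1.0 - mu))   # Pearson correlation, binary x
```

Because *xi* ∈ {0, 1}, Var(*xi*) = μ(1 − μ) and Cov(*xi*, *xj*) = ρ̂ − μ², which gives the formula above.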

We next considered circuits in which inputs were drawn from one of two heavy-tailed distributions, the Cauchy distribution and a heavy-tailed skewed distribution, defined earlier. Here, distinctly different patterns emerge: for a fixed threshold θ, *D*KL(*P*, *P*˜) is maximized in regions of high input correlation and high input standard deviation σ, but relatively high values of *D*KL are achievable across a wide range of input values (see **Figure 7C** for Cauchy inputs; heavy-tailed skewed in **Figure S3C**). However, the maximum values of *D*KL were achieved at intermediate *output* correlations ρ ≈ 0.4 (see **Figure 7D** for Cauchy inputs; heavy-tailed skewed shown in **Figure S3D**); this suggests that high input correlations do not translate into high output correlations.

This somewhat unintuitive finding may be explained by the structure of the PDF of a heavy-tailed common input, which favors (infrequent) large events at the expense of medium-size events. For instance, the probability that a Cauchy input is above a given threshold (*P*[*Ic* >> **E**[*Ic*]]) is often much smaller than for a Gaussian distribution of the same variance. However, an input can trigger at most a single spiking event regardless of its size: therefore a Cauchy common input generates fewer correlated spiking events, via larger inputs, while a Gaussian common input triggers correlated spiking events with smaller, but more frequent, input values. As a result, heavy-tailed inputs are unable to explore the full range of output firing statistics: **Figure 7D** shows that high output correlations only occur at very low firing rates. Overall, *D*KL(*P*, *P*˜) reaches higher numerical values than for sub-Gaussian inputs, possibly reflecting the higher-order statistics in the input. However, the maximal *D*KL(*P*, *P*˜) attained still falls far short of the full range of possible values (compare with **Figure 1B**).

Finally, we examine the behavior of the *strain*, which quantifies both the magnitude and sign of deviation from the pairwise model (see Ohiorhenuan and Victor, 2010). It has been previously observed that the strain is negative for the Dichotomized Gaussian model (Macke et al., 2011), a condition that has been related to sparsity of the neural code; our results agree with this observation (data not shown). However, we found that for any other choice of input marginal statistics, both positive and negative values are seen; for heavy-tailed common inputs, positive values predominated except at very low firing rates.

#### *2.3.4. Bimodal triplet inputs can generate higher-order interactions in three-cell feedforward circuits*

Having shown that a wide range of unimodal common inputs produced spike patterns that are well-approximated by PME fits, we next examined bimodal common inputs. Such inputs substantially increased departures from PME fits in the ganglion cell models described above. As in the previous section, we varied *c*, σ, and θ so as to explore a full range of firing rates and pairwise correlations.

As a function of input parameter values, *D*KL(*P*, *P*˜) is maximized for large input correlation and moderate input variance σ<sup>2</sup> [see **Figure 7E**, which illustrates *D*KL(*P*, *P*˜) for a fixed threshold θ = 1.5]. **Figure 7F** shows *D*KL(*P*, *P*˜) values as a function of the firing rate and pairwise correlation elicited by the full range of possible bimodal inputs. We see that *D*KL(*P*, *P*˜) is maximized at an intermediate (but relatively high: ν ≈ 0.4) firing rate, and for intermediate-to-large correlation values (ρ ≈ 0.6–0.8).

We find distinctly different results when we view Δ (Equation 1), for these same simulations, as a function of output spiking statistics (right panels of **Figures 7B,D,F**). For unimodal, sub-Gaussian distributions (**Figure 7B**), Δ is very close to 1, with the few exceptions at extreme firing rates. For heavy-tailed and bimodal inputs (**Figures 7D,F**), Δ may be appreciably far from 1 (as small as 0.5), with the smallest values (suggesting a poor fit of the pairwise model) occurring for low correlation ρ. This highlights one interesting example where these two metrics for judging the quality of the pairwise model, *D*KL(*P*, *P*˜) and Δ, yield contrasting results.

Finally, we emphasize that while bimodal inputs can produce greater higher-order interactions than unimodal inputs, the values of *D*KL(*P*, *P*˜) accessible by feedforward circuits with global inputs remain far below their upper bounds at any given firing rate. The maximal values of *D*KL(*P*, *P*˜) reached by Cauchy and heavy-tailed skewed inputs were 0.0078 and 0.0153, respectively; bimodal common inputs reached a maximal value of 0.091. This is an order of magnitude smaller than possible departures among symmetric spike patterns (compare **Figure 1B**). The difference is illustrated in **Figure S4**, which compares, on a single axis, the *D*KL(*P*, *P*˜) values obtained in the thresholding model with those obtained by direct exhaustive search at each firing rate.

#### *2.3.5. Mathematical analysis of unimodal vs. bimodal effects*

The central finding above is that circuits with bimodal inputs can generate significantly greater higher-order interactions than circuits with unimodal inputs. To probe this further, we investigated the behavior of *D*KL(*P*, *P*˜) for the feedforward threshold model with a perturbation expansion in the limit of small common input. We found that as the strength of common input signals increased, circuits with bimodal inputs diverged from the PME fit more rapidly than circuits with unimodal inputs; the full calculation is given in the Appendix. In brief, we determined the leading-order behavior of *D*KL(*P*, *P*˜) in the strength *c* of (weak) common input. *D*KL(*P*, *P*˜) grew as *c*<sup>3</sup> for unimodal distributions, i.e., the lower-order terms in *c* dropped out; for symmetric unimodal distributions, such as a Gaussian, *D*KL(*P*, *P*˜) grew as *c*<sup>4</sup>. For bimodal distributions, *D*KL(*P*, *P*˜) grew as *c*<sup>2</sup>. Because of this *c*<sup>2</sup> dependence, rather than *c*<sup>3</sup> or *c*<sup>4</sup>, circuits with bimodal inputs are predicted to produce greater deviations from their PME fits as the strength of common input signals *c* increases.

#### *2.3.6. Impact of recurrent coupling*

We next modified our thresholding model to incorporate the effects of recurrent coupling among the spiking cells. To mimic gap junction coupling in the RGC circuit, we considered all-to-all, excitatory coupling, and assumed that this coupling occurs on a faster timescale than that over which inputs arrive at the cells.

Our previous model was extended as follows: if the inputs arriving at each cell elicited any spikes, there was a second stage at which the input to each neuron receiving a connection from a spiking cell was increased by an amount *g*. This represented a rapid depolarizing current, assumed for simplicity to add linearly to the input currents. If the second stage resulted in additional spikes, the process was repeated: recipient cells received an additional current *g*, and their summed inputs were again thresholded. The sequence terminated when no new spikes occurred on a given stage; e.g., for *N* = 3, there were at most three stages. The spike pattern recorded on a given trial comprised all spikes generated across all stages.
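The staged recurrence rule can be written out as a short routine. The following is a sketch under the stated assumptions (all-to-all coupling, instantaneous linear summation of the recurrent current); the function name is ours:

```python
import numpy as np

def cascade_trial(inputs, theta=1.5, g=1.0):
    """One trial of the recurrent thresholding model. After the feedforward
    stage, every cell receives an extra current g for each *other* cell
    that has spiked; thresholding repeats until no new spikes occur.
    Returns the final binary spike pattern."""
    inputs = np.asarray(inputs, dtype=float)
    spiked = inputs > theta                      # stage 1: feedforward spikes
    while True:
        recurrent = g * (spiked.sum() - spiked)  # g per other spiking cell
        new = (inputs + recurrent > theta) & ~spiked
        if not new.any():
            return spiked.astype(int)
        spiked |= new
```

For example, with inputs (2.0, 1.0, 0.0), θ = 1.5 and *g* = 1.0, the first cell spikes on the feedforward stage, recruits the second on stage two, and together they recruit the third on stage three; with *g* = 0 only the first cell spikes.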

We then explored the impact of varying *g* for a single representative value of σ and θ, and several values of the correlation coefficient *c*. We found that as *g* increased, *D*KL(*P*, *P*˜) varied smoothly, reflecting the underlying changes in the spike count distribution. For small *c* (*c* = 0.02 shown in **Figure 8A**), where the variance of common input is very small, the results varied little by input type: for all input types *D*KL(*P*, *P*˜) reached an interior maximum near *g* ≈ 1.7. As *c* increased, distinctions between input types became apparent (**Figures 8B,C** show *c* = 0.2, 0.5, respectively): for most input types and values of *c*, the value of *D*KL(*P*, *P*˜) reached an interior maximum that exceeded its value without coupling (i.e., *g* = 0). However, overall values of *D*KL(*P*, *P*˜) remained modest, never exceeding 0.01 across the values explored here.

## *2.3.7. Summary of findings for simplified circuit model*

We examined a highly idealized model of neural spiking, so as to explore the generality of our earlier findings in a small array of RGC models. We found that our main results from the RGC model—that higher-order interactions were most significant when inputs had bimodal structure, and that when fast excitatory recurrence was added to the circuit, higher-order interactions were maximized at an intermediate value of the recurrence strength—persisted in this simplified model. Moreover, we were able to show that the first of these findings is general, in that it holds over a complete exploration of parameter space.

#### **2.4. SCALING OF HIGHER-ORDER INTERACTIONS WITH POPULATION SIZE**

The results above suggest that unimodal, rather than bimodal, input statistics contribute to the success of PME models. Next, we examined whether this conclusion continues to hold when we increase network size. The permutation-symmetric architectures we have considered so far can be scaled up to more than three cells in several natural ways; for example, we can study *N* cells with a global common input.

We considered a sequence of models in which a set of *N* threshold spiking units received a global input *Ic* [with mean 0 and variance σ<sup>2</sup>*c*] and independent inputs *Ij* [with mean 0 and variance σ<sup>2</sup>(1 − *c*)]. As for the three-cell network models considered previously, the output of each cell was determined by summing and thresholding these inputs. Upon computing the probability distribution of network outputs (section 4), we fit a PME distribution. Again, we explored a range of σ, *c*, and θ, and recorded the maximum value of *D*KL(*P*, *P*˜) between the observed distribution *P* and its PME fit *P*˜. **Figure 9** shows this *D*KL/*N* [i.e., entropy per cell (Macke et al., 2009)] for each class of marginal distributions.
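For global Gaussian input, the output distribution can be computed for any *N* without sampling, because conditioned on the common input the cells spike independently. The sketch below is our own implementation of this kind of quadrature, not the code behind Figure 9:

```python
import numpy as np
from math import comb, erf, sqrt, pi

def spike_count_dist(N, c=0.3, sigma=1.0, theta=1.5, n_grid=4001):
    """P(k of N cells spike) for the global-input thresholding model with
    Gaussian common input (variance c*sigma^2) and independent Gaussian
    inputs (variance (1-c)*sigma^2). Conditioned on the common input u the
    spike count is Binomial(N, q(u)); integrate over u on a fine grid.
    Requires 0 < c < 1."""
    s_c, s_i = sqrt(c) * sigma, sqrt(1.0 - c) * sigma
    u = np.linspace(-8.0 * s_c, 8.0 * s_c, n_grid)   # common-input grid
    du = u[1] - u[0]
    dens = np.exp(-0.5 * (u / s_c) ** 2) / (s_c * sqrt(2.0 * pi))
    # per-cell spike probability q(u) given common input u
    q = 0.5 * (1.0 - np.array([erf((theta - v) / (s_i * sqrt(2.0))) for v in u]))
    P = np.array([np.sum(comb(N, k) * q**k * (1.0 - q) ** (N - k) * dens) * du
                  for k in range(N + 1)])
    return P / P.sum()   # normalize away residual quadrature error

d3 = spike_count_dist(3)
```

Because the model is permutation symmetric, this spike-count distribution determines the full pattern distribution, which can then be compared against its PME fit.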

We found that the maximum *D*KL(*P*, *P*˜)/*N* increased roughly linearly with *N* for Gaussian, skewed and Cauchy inputs; for heavy-tailed skewed and bimodal inputs, *D*KL(*P*, *P*˜)/*N* appeared to saturate after an initial increase (**Figure 9**). The relative ordering for unimodal inputs shifted as *N* increased; by *N* = 16, the maximal achievable *D*KL(*P*, *P*˜) for sub-Gaussian inputs overtook the values for heavy-tailed inputs. At all values of *N*, the values for Gaussian and skewed inputs tracked one another closely. Regardless, the values for all unimodal inputs remained substantially below the maximal value achievable for bimodal inputs. **Figure 9B** shows that the probability distributions produced by these inputs qualitatively agree with this trend: departures from PME were more visually pronounced for global bimodal inputs than for global unimodal inputs. In addition, the distributions for heavy-tailed and sub-Gaussian inputs differed qualitatively, offering a potential mechanism for the different scaling behavior. Using the relationship between *D*KL and likelihood ratios (described in section 2.1), at *N* = 16 the value *D*KL/*N* ≈ 0.1 for bimodal global inputs corresponds to a likelihood ratio of 0.33 that a single draw from *P* (a single network output) in fact came from the PME fit *P*˜ rather than from *P*; a likelihood <0.01 is reached for four draws.

We next extended our model with recurrent coupling to *N* > 3 cells. In addition to the parameters for the uncoupled network, we varied the coupling strength, *g*, for each type of input. As in the *N* = 3 network, coupling was all-to-all. As for the small networks explored in an earlier section, *D*KL(*P*, *P*˜) generally peaked at an intermediate value of the coupling strength *g*; however, the optimal value of *g* decreased as population size *N* increased (illustrated in **Figure 10A**, for *c* = 0.2). This may be attributed to the increased potential impact of recurrence at larger population sizes: as *N* increases, the number of potential *additional* spikes that may be triggered grows; consequently the average recurrent excitation received by each cell increases, and with it the probability that one or two spikes will trigger a cascade to *N* spikes. In **Figure 10B** we demonstrate that the impact of this effect may be captured by plotting *D*KL(*P*, *P*˜) as a function of an *effective* coupling parameter, *g*∗*N*/3. Here, we plot the curves for six population sizes (*N* = 3, 4, 6, 8, 10, and 12) and five common input types; each curve was scaled by normalizing *D*KL(*P*, *P*˜) by its maximum value. For many sets of parameter values, the resulting curves line up remarkably well, suggesting a universal scaling with the effective coupling parameter.

We also explored the overall possible impact of recurrence on higher-order interactions, by surveying a range of circuit parameters *c*, σ, and *g*. The top panel of **Figure 10C** shows the maximal *D*KL(*P*, *P*˜) per neuron, for each type of input, up to population size *N* = 8. For unimodal inputs, recurrent coupling increased the available range of higher-order interactions modestly, compared with the range achieved with purely feedforward connections; however, these values remained significantly lower than those achieved for bimodal inputs.

Finally, we considered how higher-order interactions scale with population sampling size. The spike pattern distributions used to generate the last column of data points (*N* = 8) in the top panel of **Figure 10C** were reanalyzed by sub-sampling the spike pattern distributions on *k* < 8 cells. In each case, we chose our sub-population to be *k* nearest neighbors (for our setup, any subset of *k* cells is statistically identical). In the bottom panel of **Figure 10C**, we show the maximal value of *D*KL(*P*, *P*˜) per subsampled cell achieved over all input parameters (the curves for Gaussian, skewed and Cauchy inputs are so close together as to be visually indistinguishable). This number increases or remains steady as *k* increases, indicating that sub-sampling a coupled network will depress the apparent higher-order interactions in the output spiking pattern.

To summarize, the greater impact of bimodal vs. unimodal input statistics on maximal values of *D*KL(*P*, *P*˜) persists in circuits with *N* = 3 cells up to *N* = 16 cells. Overall, for the circuit parameters producing maximal deviations from PME fits, it becomes easier to statistically distinguish between spiking distributions and their PME fits as the number of cells increases in feedforward networks.

**FIGURE 9 | The significance of higher-order interactions increases with network size. (A)** Normalized maximal deviation, *D*KL(*P*, *P*˜)/*N*, from the PME fit for the thresholding circuit model as network size *N* increases. For each *N* and common input distribution type, input parameters were surveyed over the following ranges: input correlation *c* ∈ [0, 1], input standard deviation σ ∈ [0, 4], and threshold θ ∈ [0, 3]. **(B)** Example sample distributions for different types of common input: from top, bimodal, Gaussian, heavy-tailed skewed, and Cauchy common inputs. For each input type, the distribution that maximized *D*KL(*P*, *P*˜) for *N* = 16 is shown. Each distribution is illustrated with a bar plot contrasting the probabilities of spiking events in the true (dark blue) vs. pairwise maximum entropy (light pink) distributions.

**FIGURE 10 | The impact of recurrent coupling on the sum-and-threshold model, for increasing population size. (A)** *D*KL(*P*, *P*˜) as a function of the coupling coefficient, *g*, for a specific value of population size *N*. In all plots, input standard deviation σ = 1, threshold θ = 1.5, and input correlation *c* = 0.2. From top: *N* = 4; *N* = 8; *N* = 12. **(B)** Normalized *D*KL(*P*, *P*˜) as a function of the coupling coefficient, *g*, for population sizes *N* = 3–12. For each curve, *D*KL(*P*, *P*˜) was scaled by its maximal value and plotted as a function of the scaled coupling coefficient, *g*∗*N*/3, to illustrate a universal scaling with effective coupling strength. The line style of each curve indicates the population size *N*, as listed in the legend. The marker and line color indicate the common input marginal, as listed in the legend for **(A)**. **(C)** (Top) Maximal value of *D*KL(*P*, *P*˜)/*N*, achieved over a survey of parameter values *c*, σ, θ, and *g*, as a function of the population size *N* (solid lines). For each input marginal type, a second curve shows the maximal value obtained over only feed-forward simulations (*g* = 0; dashed lines). The marker and line color indicate the common input marginal, as listed in the legend for **(A)**. (Bottom) Maximal value of *D*KL(*P*, *P*˜)/*k*, achieved over a survey of parameter values *c*, σ, θ, and *g*, as a function of the *subsample* population size *k*. Data were subsampled from the *N* = 8 data shown in the top panel, by restricting analysis to *k* out of *N* cells.

## **3. DISCUSSION**

We used mechanistic models to identify input patterns and circuit mechanisms which produce spike patterns with significant higher-order interactions—that is, with substantial deviations from predictions under a PME model. We focused on a tractable setting of small, symmetric circuits with common inputs. This revealed several general principles. First, we found that these circuits produced outputs that were much closer to PME predictions than required for a general spiking pattern. Second, bimodal input distributions produced stronger higher-order interactions than unimodal distributions. Third, recurrent excitatory or gap junction coupling could produce a further, moderate increase of higher-order correlations; the effect was greatest for coupling of intermediate strength.

These general results held for both an abstract threshold-and-spike model and for networks of non-linear integrate-and-fire units based on measured properties of one class of RGCs. Together with the facts that ON parasol cell filtering suppresses bimodality in light input, and that coupling among ON parasol cells is relatively weak, our findings provide an explanation for why their population activity is well captured by PME models.

## **3.1. COMPARISON WITH EMPIRICAL STUDIES**

How do our maximum entropy fits compare with empirical studies? In terms of *D*KL(*P*, *P*˜)—equivalently, the logarithm of the average relative likelihood that a sequence of data drawn from *P* was instead drawn from the model *P*˜—numbers obtained from our RGC models are very similar to those obtained by *in vitro* experiments on primate RGCs (Shlens et al., 2006, 2009). For example, in a survey of 20 numerical experiments under constant light conditions (each of length 100 ms, with spikes binned in 10 ms intervals), we find that *D*KL(*P*, *P*˜) ranges between 0 and 0.00029; similarly excellent fits were found by Shlens et al. (2006) (in which cell triplets were stimulated by constant light for 60 s with spikes binned at 10 ms), with one example given of 0.0008 (inferred from a reported likelihood ratio of 0.99944). These values can increase by an order of magnitude under full-field stimulation, as well as spatio-temporally varying stixel simulations (bounded above by 0.007). We can view the 60 μm stixel simulations as a model of the checkerboard experiments of Shlens et al. (2006), for which close fits by the PME distribution were also observed (likelihood numbers were not reported). Similarly, the values of Δ produced by our RGC model are close to those found by Schneidman et al. (2006) and Shlens et al. (2006) under comparable stimulus conditions. We obtain Δ = 99.5% (for cell group size *N* = 3) under constant illumination, which is near the range reported by Shlens et al. (2006) for the same bin size and stimulus conditions (98.6 ± 0.5%, *N* = 3–7). For full-field stimuli we find a range of numbers from 95.7% to 99.3% (*N* = 3).

With regard to the circuit mechanisms behind these excellent fits by pairwise models, the findings that most directly address the experimental settings of Shlens et al. (2006, 2009) are (1) the finding that, in the threshold model, unimodal inputs generate minimal higher-order interactions compared to bimodal inputs, and (2) the finding that the particular stimulus filtering properties of parasol cells can suppress bimodality that may be present in an input stimulus, resulting in a unimodal distribution of input currents. First, we believe that unimodal inputs are consistent with the white-noise checkerboard stimuli used in Shlens et al. (2006, 2009), where binary pixels were chosen to be small relative to the receptive field size; averaged over the spatial receptive field, they would be expected to yield a single Gaussian input by the central limit theorem. Second, temporal filtering may contribute to cells receiving unimodal conductance inputs for the full-field binary flicker stimuli delivered in Schneidman et al. (2006). With the 16.7 ms refresh rate used there, and under the assumption that the filter time-scale of the cells studied in that paper is roughly similar to that of the ON parasol cell we consider, the filter would average a binary (and hence bimodal) stimulus into a unimodal shape (see **Figure 2C**, for example).

The simple threshold models that we have considered, meanwhile, give us a roadmap for how circuits could be driven in such a way as to lower Δ. The right columns of **Figures 7B,D,F** show Δ plotted as a function of firing rate for circuits of *N* = 3 cells receiving global common inputs; we observe that Δ ≈ 1 for Gaussian inputs over a broad range of firing rates and pairwise correlation coefficients, but that values of Δ can be depressed by 25–50% in the presence of a bimodal common input. Indeed, Shlens et al. (2006) showed that adding global bimodal inputs to a purely pairwise model can lead to a comparable departure in Δ. Our results are consistent with this finding, and explicitly demonstrate that the bimodality of the inputs—as well as their global projection—are characteristics that lead to this departure.

## **3.2. CONSEQUENCES FOR SPECIFIC NEURAL CIRCUITS**

Our results make predictions about when neural circuits are likely to generate higher-order interactions. A comprehensive study of our simple thresholding model shows that bimodal inputs generate greater beyond-pairwise interactions than unimodal inputs. This result can be extended to other circuits where a clear input–output relationship exists, and be used to predict higher-order correlations by analyzing the impact of stimulus filtering on a statistically defined class of inputs. For example, the effect holds in our model of primate ON parasol cells, where a biphasic filter suppresses bimodality in a stimulus with a timescale matched to that of the filter. We can use these results to extrapolate to other classes of RGCs or other stimulus conditions in which filters are less biphasic (Victor, 1999). Indeed, when we process long time-scale bimodal inputs through a preliminary model of the midget cell circuit, stimulus bimodality is no longer suppressed and is associated with higher-order interactions (see **Figure 4**). We predict that greater higher-order interactions will be found for stimuli or RGC circuits that elicit bimodal activity that is thresholded when generating spikes—in comparison to the parasol circuits and stimuli studied in Shlens et al. (2006, 2009). We believe that this principle will be further applicable in other sensory systems.

We found that recurrent excitatory connections further increase higher-order interactions, which are maximized at an intermediate recurrence strength; in particular, when the strength of an excitatory recurrent input was comparable to the distance between rest and threshold (**Figure 8**). For the primate ON parasol cells we considered, the experimentally measured strength of gap junction coupling would lead to an estimated membrane voltage jump of ≈1 mV in response to the firing of a neighboring RGC, while the voltage distance between the resting voltage and an approximate threshold is about 5–10 mV (Trong and Rieke, 2008). Consistent with this estimate, we found that in our ON parasol cell model, higher-order interactions were maximized when the strength of excitatory recurrence was eight times its experimentally measured value. The experimentally measured values of recurrence had little or no effect on higher-order interactions. We anticipate that this result may be used to predict whether recurrent coupling plays a role in generating higher-order interactions in other circuits where the average voltage jump produced by an electrical or synaptic connection can be measured.

To apply our findings to real circuits, we must also consider population size. A measurement from a neural circuit will, in most cases, subsample a much larger, complete circuit. We addressed this question for the thresholding model, where it was computationally tractable. Here, we found that the impact of higher-order interactions, as measured by the entropy per cell unaccounted for by the pairwise model (*D*KL/*k*), increases moderately as the subsample size *k* increases. Since recurrent connectivity in our model is truly global, this is consistent with the suggestion of Roudi et al. (2009a) and others that the entropy can be expected to scale extensively with population size *N* once *N* significantly exceeds the true spatial connectivity footprint; we may see different results with limited, local connectivity.

#### **3.3. SCOPE AND OPEN QUESTIONS**

There are many aspects of circuits left unexplored by our study. Prominent among these is heterogeneity. Only a few of our simulations produce heterogeneous inputs to the model RGCs, and all of our studies apply to cells with identical response properties. This is in contrast to studies such as Schneidman et al. (2006), which examine correlation structures among multiple cell types. For larger networks, feedforward connections with variable spatial profiles also occur, between the extremes of independent and global input connections examined here. It is also possible that more complex input statistics could lead to greater higher-order interactions (Bethge and Berens, 2008). Finally, **Figure 9** indicates that some trends in *D*KL(*P*, *P*˜) vs. *N* appear to become non-linear for *N* ≳ 10; for larger networks, our qualitative findings could change.

Our study also leaves largely open the role of different retinal filters in generating higher-order interactions. We have found that the specific filtering properties of ON parasol cells suppress bimodality in light inputs, suggesting that other classes of RGCs, such as midget cells, may produce more robust higher-order interactions (compare panels in **Figure 4B**). This predicts a specific mechanism for the development of higher-order interactions in preparations that include multiple classes of ganglion cells (Schneidman et al., 2006). For a complete picture, future studies will also need to account for the possible adaptation of stimulus filters in response to higher-order stimulus characteristics (Tkacik et al., 2012); we did not consider the latter effect here, where our filter was fit to the response of a cell to Gaussian stimuli with specific mean and variance. An allied possibility is that multiple filters will be required, as was found when fitting the responses of salamander retinal cells to LN models (Fairhall et al., 2006). Distinguishing the roles of linear filters vs. static non-linearities in determining which stimulus classes will give the greatest higher-order correlations is another important step. Finally, we considered circuits with a single step of inputs and simple excitatory or gap junction coupling; a plethora of other network features could also lead to higher-order interactions, including multi-layer feedforward structures, together with lateral and feedback coupling. We speculate that, in particular, such mechanisms could contribute to the higher-order interactions found in cortex (Tang et al., 2008; Montani et al., 2009; Ohiorhenuan et al., 2010; Oizumi et al., 2010; Koster et al., 2013).

A final outstanding area of research is to link tractable network mechanisms for higher-order interactions with their impact (or lack of impact) on information encoded in neural populations (Kuhn et al., 2003; Montani et al., 2009; Oizumi et al., 2010; Ganmor et al., 2011; Cain and Shea-Brown, 2013). A simple starting point is to consider rate-based population codes in which each stimulus produces a different "tuned" average spike count (see, e.g., chapter 3 of Dayan and Abbott, 2001). One can then ask whether spike responses can be more easily decoded to estimate stimuli for the full population response (i.e., *P*) to each stimulus or for its pairwise approximation (*P*˜). In our preliminary tests where higher-order correlations were created by inputs with bimodal distributions, we found examples where decoding of *P* vs. *P*˜ differed substantially. However, a more complete study would be required before general conclusions about trends and magnitudes of the effect could be made; such a study would include a complementary approach in which the full spike responses *P* are themselves decoded via a "mismatched" decoder based on the pairwise model *P*˜ (Oizumi et al., 2010). Overall, we hope that the present paper, as one of the first that connects circuit mechanisms to higher-order statistics of spike patterns, will contribute to future research that takes these next steps.

## **4. MATERIALS AND METHODS**

#### **4.1. EXPERIMENTALLY-BASED MODEL OF A RGC CIRCUIT**

We model the response of an individual RGC using data collected from a representative primate ON parasol cell, following methods in Murphy and Rieke (2006); Trong and Rieke (2008). Similar response properties were observed in recordings from 16 other cells. To measure the relationship between light stimuli and synaptic conductances, the retina was exposed to a full-field, white noise stimulus. The cell was voltage clamped at the excitatory (or inhibitory) reversal potential *VE* = 0 mV (*VI* = −60 mV), and the inhibitory (or excitatory) currents were measured in response to the stimulus. These currents were then turned into equivalent conductances by dividing by the driving force of ±60 mV; in other words

$$g^{\mathrm{exc}} = I^{\mathrm{exc}} / (V - V_E), \qquad V - V_E = -60\ \mathrm{mV}$$

$$g^{\mathrm{inh}} = I^{\mathrm{inh}} / (V - V_I), \qquad V - V_I = 60\ \mathrm{mV}$$

The time-dependent conductances *g*exc and *g*inh were then injected into a different cell using a dynamic clamp procedure (i.e., the input current was varied rapidly to maintain the correct relationship between the conductance and the membrane voltage) and the voltage was measured at a resolution of 0.1 ms.

#### *4.1.1. Stimulus filtering*

To model the relationship between the light stimulus and synaptic conductances, the current measurements *I*exc and *I*inh were fit to a linear-nonlinear model:

$$\begin{aligned} g^{\mathrm{exc}}(t) &= N^{\mathrm{exc}} \left[ L^{\mathrm{exc}} * s(t) + \eta^{\mathrm{exc}} \right], \\ g^{\mathrm{inh}}(t) &= N^{\mathrm{inh}} \left[ L^{\mathrm{inh}} * s(t) + \eta^{\mathrm{inh}} \right] \end{aligned}$$

where *s* is the stimulus, *L*exc (*L*inh) is a linear filter, *N*exc (*N*inh) is a non-linear function, and ηexc (ηinh) is a noise term. The linear filter was fit by the function

$$L^{\mathrm{exc}}(t) = P_{\mathrm{exc}} \left( t/\tau_{\mathrm{exc}} \right)^{n_{\mathrm{exc}}} \exp\left( -t/\tau_{\mathrm{exc}} \right) \sin\left( 2\pi t/T_{\mathrm{exc}} \right) \tag{7}$$

and the non-linear filter by the polynomial

$$N^{\mathrm{exc}}(x) = A_{\mathrm{exc}}\, x^2 + B_{\mathrm{exc}}\, x + C_{\mathrm{exc}} \,. \tag{8}$$

Fits minimized the mean-square distance between model and data. *L*inh and *N*inh were fit using the same parametrization.

The noise terms $\eta^{\mathrm{exc}}_k$ and $\eta^{\mathrm{inh}}_k$ were fit to reproduce the statistical characteristics of the residuals from this fitting. We simulated the noise terms $\eta^{\mathrm{exc}}$ and $\eta^{\mathrm{inh}}$ using Ornstein–Uhlenbeck processes with the appropriate parameters; these were entirely characterized by the mean, the standard deviation, the autocorrelation time constants $\tau_{\eta,\mathrm{exc}}$ and $\tau_{\eta,\mathrm{inh}}$, and pairwise correlation coefficients for noise terms entering neighboring cells. The noise correlation coefficients were estimated from the dual recordings of Trong and Rieke (2008).

Linear filter parameters computed (also listed in **Table 1**) were $P_{\mathrm{exc}} = -8 \times 10^4\ \mathrm{s}^{-1}$, $n_{\mathrm{exc}} = 3.6$, $\tau_{\mathrm{exc}} = 12$ ms, $T_{\mathrm{exc}} = 105$ ms, and $P_{\mathrm{inh}} = -1.8 \times 10^{5}\ \mathrm{s}^{-1}$, $n_{\mathrm{inh}} = 3.0$, $\tau_{\mathrm{inh}} = 16$ ms, $T_{\mathrm{inh}} = 120$ ms. Non-linearity parameters were $A_{\mathrm{exc}} = -8.3 \times 10^{-7}$ nS, $B_{\mathrm{exc}} = 7 \times 10^{-3}$ nS, $C_{\mathrm{exc}} = -0.95$ nS, and $A_{\mathrm{inh}} = 1.67 \times 10^{-6}$ nS, $B_{\mathrm{inh}} = 6.2 \times 10^{-3}$ nS, $C_{\mathrm{inh}} = 4.17$ nS. Noise parameters were measured to be $\mathrm{mean}(\eta^{\mathrm{exc}}_k) = 30$, $\mathrm{std}(\eta^{\mathrm{exc}}_k) = 500$, $\tau_{\eta,\mathrm{exc}} = 22$ ms, and $\mathrm{mean}(\eta^{\mathrm{inh}}_k) = -1200$, $\mathrm{std}(\eta^{\mathrm{inh}}_k) = 780$, $\tau_{\eta,\mathrm{inh}} = 33$ ms. In addition, excitatory (inhibitory) noise to different cells, $\eta^{\mathrm{exc}}_k, \eta^{\mathrm{exc}}_j$ ($\eta^{\mathrm{inh}}_k, \eta^{\mathrm{inh}}_j$), had a correlation coefficient of 0.3 (0.15).
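As an illustration, the LN cascade of Equations (7)–(8) can be sketched in a few lines of Python (the original simulations used MATLAB). This is a minimal sketch: the noise term η is omitted, and the bookkeeping between the per-second filter amplitude and the millisecond time grid is glossed over, so the numerical outputs are illustrative only.

```python
import math

# Parameters from Table 1 / Equations (7)-(8), excitatory pathway.
# Units follow the text: P_exc in 1/s, time constants in ms; the unit
# mismatch is not resolved here, so outputs are illustrative.
P_exc, n_exc, tau_exc, T_exc = -8e4, 3.6, 12.0, 105.0   # linear filter
A_exc, B_exc, C_exc = -8.3e-7, 7e-3, -0.95              # nonlinearity (nS)

dt = 0.1  # ms, the stated simulation resolution

def L_filter(t):
    """Damped-sinusoid linear filter of Equation (7); t in ms."""
    if t < 0:
        return 0.0
    return P_exc * (t / tau_exc) ** n_exc * math.exp(-t / tau_exc) \
        * math.sin(2.0 * math.pi * t / T_exc)

def N_nonlin(x):
    """Quadratic static nonlinearity of Equation (8)."""
    return A_exc * x ** 2 + B_exc * x + C_exc

# Discretized filter kernel (the filter has decayed to ~0 by 300 ms).
kernel = [L_filter(k * dt) * dt for k in range(int(300.0 / dt))]

def conductance(stimulus):
    """g_exc(t) = N[(L * s)(t)]; the noise term eta is omitted here."""
    g = []
    for i in range(len(stimulus)):
        drive = sum(kernel[k] * stimulus[i - k]
                    for k in range(min(i + 1, len(kernel))))
        g.append(N_nonlin(drive))
    return g
```

`conductance` returns a model conductance for an arbitrary stimulus trace sampled on the 0.1 ms grid; the inhibitory pathway follows the same pattern with the inhibitory parameters.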

For the filter demonstrated in **Figure 4**, we added a cosine component to the previous filter, i.e.,

$$\begin{split} L^{\mathrm{exc,M}}(t) &= P_{\mathrm{exc,M}}\left(t/\tau_{\mathrm{exc,M}}\right)^{n_{\mathrm{exc,M}}} \exp\left(-t/\tau_{\mathrm{exc,M}}\right) \\ &\quad \times \left[ \sin\left(2\pi t/T_{\mathrm{exc,M,S}}\right) + R_{\mathrm{exc,M}} \cos\left(2\pi t/T_{\mathrm{exc,M,C}}\right) \right] \end{split} \tag{9}$$

Here $P_{\mathrm{exc,M}} = -3.2 \times 10^{5}\ \mathrm{s}^{-1}$, $n_{\mathrm{exc,M}} = 2$, $\tau_{\mathrm{exc,M}} = 12$ ms, $T_{\mathrm{exc,M,S}} = 120$ ms and $T_{\mathrm{exc,M,C}} = 100$ ms, and $P_{\mathrm{inh,M}} = -3.5 \times 10^{5}\ \mathrm{s}^{-1}$, $n_{\mathrm{inh,M}} = 2$, $\tau_{\mathrm{inh,M}} = 13.2$ ms, $T_{\mathrm{inh,M,S}} = 132$ ms and $T_{\mathrm{inh,M,C}} = 110$ ms, while $R_{\mathrm{exc,M}} = R_{\mathrm{inh,M}} = 0.8$.

#### *4.1.2. Voltage evolution*

We create a model of the cell as a non-linear integrate-and-fire model using the method of Badel et al. (2007), in which the membrane voltage is assumed to respond as

$$\frac{dV}{dt} = F\left(V, t - t_{\mathrm{last}}\right) + \frac{I_{\mathrm{input}}(t)}{C} \tag{10}$$

where *C* is the cell capacitance, *t*last is the time of the last spike before time *t*, and *I*input(*t*) is a time-dependent input current. We use the current-clamp data, which yield the cell voltage in response to the input current $I_{\mathrm{input}}(t) = -g^{\mathrm{exc}}(t)(V - V_E) - g^{\mathrm{inh}}(t)(V - V_I)$, to fit a function *F*(*V*, *t* − *t*last). When the voltage data are segregated according to the time since the last spike, *t* − *t*last, the *I*–*V* curve is well fit by a function of the form

$$F(V, t - t_{\mathrm{last}}) = \frac{1}{\tau_m} \left( E_L - V + \Delta_T\, e^{(V - V_T)/\Delta_T} \right) \tag{11}$$

where the parameters are the membrane time constant τ*m*, the resting potential *EL*, the spike width $\Delta_T$, and the knee of the exponential curve *VT*.

The values of these constants differed in each bin of voltage data; to estimate these constants, we first extracted their values from each mean *I*–*V* curve. We found that these constants, as functions of *t* − *t*last, were well fit by either a single exponential or a difference of two exponentials, with relaxation to a baseline rate (as in Badel et al., 2007, **Figure 3**). Specifically, we chose:

$$\frac{1}{\tau_m} = c_{\tau_m,1} + c_{\tau_m,2}\, e^{-(t-t_{\mathrm{last}})/c_{\tau_m,3}}$$

$$E_L = c_{E_L,1} + c_{E_L,2} \left( e^{-(t-t_{\mathrm{last}})/c_{E_L,3}} - e^{-(t-t_{\mathrm{last}})/c_{E_L,4}} \right)$$

$$\Delta_T = c_{\Delta_T,1} + c_{\Delta_T,2} \left( e^{-(t-t_{\mathrm{last}})/c_{\Delta_T,3}} - e^{-(t-t_{\mathrm{last}})/c_{\Delta_T,4}} \right)$$

$$V_T = c_{V_T,1} + c_{V_T,2}\, e^{-(t-t_{\mathrm{last}})/c_{V_T,3}} \tag{12}$$

We obtained the coefficients by least-squares fitting to the above functional forms; specifically, we found that (up to four digits) $(c_{\tau_m,1}, c_{\tau_m,2}, c_{\tau_m,3}) = (0.3719\ \mathrm{ms}^{-1}, 0.5412\ \mathrm{ms}^{-1}, 13.2726\ \mathrm{ms})$; $(c_{E_L,1}, c_{E_L,2}, c_{E_L,3}, c_{E_L,4}) = (-59.4858\ \mathrm{mV}, 5.8966\ \mathrm{mV}, 8.3076\ \mathrm{ms}, 233.1114\ \mathrm{ms})$; $(c_{\Delta_T,1}, c_{\Delta_T,2}, c_{\Delta_T,3}, c_{\Delta_T,4}) = (20.0487\ \mathrm{ms}, 19.0560\ \mathrm{ms}, 3.6280\ \mathrm{ms}, 2.4304\ \mathrm{s})$; and $(c_{V_T,1}, c_{V_T,2}, c_{V_T,3}) = (-44.3323\ \mathrm{mV}, 25.1812\ \mathrm{mV}, 4.7653\ \mathrm{ms})$. Coefficients are also listed in **Table 2**.

The capacitance was inferred from the voltage trace data by finding, at a voltage value where the voltage/membrane current relationship is approximately Ohmic, the value of *C* that minimizes error in the relation Equation (10) (Badel et al., 2007). The estimated value was *C* = 28 pF.

#### *4.1.3. Spiking dynamics: feedforward network*

For simulations without electrical coupling, our model neuron comprises Equations (10, 11) for *V* < *V*threshold; a spike was detected when *V* reached *V*threshold = −30 mV, and the voltage was then reset to *V*reset = −55 mV. The cell was then unable to spike for an absolute refractory period of τabs = 3 ms.

All simulations presented here were done in a three-cell network.
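A minimal single-cell sketch of the integrate-and-fire dynamics of Equations (10)–(11) with this threshold/reset rule is below (in Python rather than the MATLAB used for the actual simulations). It fixes the parameters at their baseline values, ignoring the *t* − *t*last dependence of Equation (12); the value Δ*T* = 2 mV and the constant input current are illustrative assumptions, not fitted quantities.

```python
import math

# Single-cell EIF sketch of Equations (10)-(11) with the section 4.1.3
# threshold/reset rule. Baseline values approximate the long-time limits
# of the fitted coefficients; Delta_T = 2 mV and the constant input
# current are illustrative assumptions, and the t - t_last dependence of
# the parameters is ignored.
tau_m = 1.0 / 0.3719      # ms
E_L = -59.4858            # mV
V_T = -44.3323            # mV
Delta_T = 2.0             # mV (assumed)
C = 28.0                  # pF (estimated capacitance)
V_thresh, V_reset, t_ref = -30.0, -55.0, 3.0
dt = 0.1                  # ms

def simulate(I_input=200.0, T=200.0):
    """Euler integration; I_input in pA, so I_input / C is in mV/ms."""
    V, spikes, last_spike, t = E_L, [], -1e9, 0.0
    while t < T:
        if t - last_spike >= t_ref:      # hold V during the refractory period
            F = (E_L - V + Delta_T * math.exp((V - V_T) / Delta_T)) / tau_m
            V += dt * (F + I_input / C)
            if V >= V_thresh:            # spike: record time and reset
                spikes.append(t)
                V, last_spike = V_reset, t
        t += dt
    return spikes
```

With a sufficiently strong constant input this sketch fires repeatedly, with interspike intervals bounded below by the refractory period; the full model instead drives the cell with the stimulus-dependent conductances of section 4.1.1.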

#### *4.1.4. Spiking dynamics: recurrent network*

Gap junction coupling was introduced as an additional current on the right-hand side of Equation (10):

$$\frac{I_{\mathrm{gap},j}}{C} = -\frac{g^{\mathrm{gap}}}{C} \sum_{k \neq j} \left( V_j - V_k \right) \tag{13}$$

The coupling strength *g*gap was held constant during a simulation. When coupling was present (i.e., when *g*gap ≠ 0), *g*gap was varied between simulations from the measured level of 1.1 nS (Trong and Rieke, 2008) to 16 times this value (17.6 nS). When present, coupling was all-to-all.

As in the feedforward model, Equations (10, 11) were integrated for *V* < *V*threshold, and a spike was detected when *V* reached *V*threshold = −30 mV. To model the voltage trajectory immediately following a spike, an averaged spike waveform was extracted from voltage traces of the same ON parasol cell used to fit Equations (10, 11). This spike waveform was then used to replace 1 ms of the membrane voltage trajectory during and after a spike; at the end of the 1 ms, the voltage was released at approximately −58 mV. The cell was unable to spike for an absolute refractory period of τabs = 3 ms. A relative refractory period was induced by introducing a declining threshold for the period of 3–6 ms following a spike, after which *V*threshold returns to −30 mV.

#### *4.1.5. Cell receptive field and stimulation*

We defined each cell's stimulus as the linear convolution of an image with its receptive field. The receptive fields include an ON center and an OFF surround, as in Chichilnisky and Kalmar (2002):

$$s_j(\vec{x}) = \exp\left(-\frac{1}{2}\left(\vec{x} - \vec{x}_j\right)^{T} \mathbf{Q}\left(\vec{x} - \vec{x}_j\right)\right) - k \exp\left(-\frac{1}{2}\left[r\left(\vec{x} - \vec{x}_j\right)\right]^{T} \mathbf{Q}\left[r\left(\vec{x} - \vec{x}_j\right)\right]\right) \tag{14}$$

where the parameters *k* and 1/*r* give the relative strength and size of the surround. **Q** specifies the shape of the center and was chosen to be perfectly circular with a 1 standard deviation (SD) radius of 50 μm. The receptive field locations $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$ were chosen so that the 1 SD outlines of the receptive field centers tile the plane (i.e., they just touch). Other parameters used were *k* = 0.3, *r* = 0.675.
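For concreteness, the difference-of-Gaussians sensitivity of Equation (14) can be evaluated directly. The sketch below assumes a circular **Q** with the stated 1 SD radius of 50 μm, so that $(\vec{x}-\vec{x}_j)^T \mathbf{Q} (\vec{x}-\vec{x}_j)$ reduces to a scaled squared distance:

```python
import math

# Equation (14) with a circular Q of 1-SD radius 50 um, so that
# (x - xj)^T Q (x - xj) = |x - xj|^2 / SD^2; k and r as stated in the text.
SD = 50.0          # um, 1-SD radius of the center
k, r = 0.3, 0.675  # surround strength; surround is wider by a factor 1/r

def rf(x, y, cx=0.0, cy=0.0):
    """Center-surround sensitivity at (x, y) for a cell centered at (cx, cy)."""
    d2 = ((x - cx) ** 2 + (y - cy) ** 2) / SD ** 2
    return math.exp(-0.5 * d2) - k * math.exp(-0.5 * r ** 2 * d2)
```

At the center this gives 1 − *k* = 0.7, and the sensitivity changes sign in the surround (beyond roughly 2 SD for these parameters), consistent with the zero contour line shown in **Figure S2**.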

Stimulation images were defined on a 512 μm × 512 μm grid that overlapped all three receptive fields. For full-field stimuli, light intensity was chosen to be spatially constant and refreshed every 8, 40, or 100 ms by choosing independently from the specified stimulus distribution (Gaussian, binary, Cauchy, or heavy-tailed skew). For spatially variable stimuli, a checkerboard pattern was imposed on the stimulation image: the intensity value in each checkerboard square was chosen independently and refreshed

**Table 1 | Parameters used to model the transformation of stimuli into synaptic conductances for the RGC model, as described in Equations (7–9).**


**Additional parameters for monophasic filters**


*Asterisks (\*) indicate parameters that are superseded by later rows; note that the monophasic filter equations contain two filtering timescales (for example, $T_{\mathrm{exc,M,S}}$ and $T_{\mathrm{exc,M,C}}$ for the excitatory monophasic filter) and a relative weighting (e.g., $R_{\mathrm{exc,M}}$).*


*The parameters $1/\tau_m$ and $V_T$ were fit to single exponentials as functions of time, with three free parameters. The parameters $E_L$ and $\Delta_T$ were fit to differences of exponentials and therefore have four parameters. Units in the first and second columns are as stated; coefficients in the third and fourth columns are in units of milliseconds (ms).*

at the appropriate interval. The checkerboard pattern was first given a random rotation and translation relative to the receptive fields; this was chosen at the outset of each batch of stixel simulations (for a total of five rotation/translation pairs per stixel size, refresh rate, and stimulus distribution). Two example placements are shown in **Figures S2A,D**, for 256 μm and 60 μm stixels, respectively.

#### *4.1.6. Numerical methods*

All simulations and data analysis were performed using MATLAB. Equations (10, 11) were integrated using the Euler method for $>10^5$ ms with a time step of 0.1 ms. The synaptic noise terms, $\eta^{\mathrm{exc}}_k$ and $\eta^{\mathrm{inh}}_k$, as well as the light input, were generated independently for each simulation. In response to uniform light stimuli, firing rates were 11.51 ± 0.38 Hz (standard deviations given across a total of 60 cells; 3 cells each from 20 simulations of $10^5$ ms); 10 ms bins were used to discretize the spiking output. Firing rates were higher for full-field stimuli, ranging from 12 to 43 Hz (firing rates increased with stimulus variance); therefore shorter (5 ms) bins were used to discretize spike output for all other simulations. With this range of firing rates and bin sizes, multiple spikes were very rare (occurring in <1% of occupied bins). Empirical spiking distributions were computed from the binned spike data.
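The binning step can be made concrete as follows; this is a generic sketch of turning per-cell spike times into binary words (a bin is 1 if the cell fired at least once in it), not the paper's actual analysis code:

```python
# Generic sketch: discretize per-cell spike times (ms) into binary words,
# marking a bin 1 if the cell fired at least once inside it.
def bin_spikes(spike_times, T, bin_ms=10.0):
    """spike_times: one list of spike times per cell; returns per-bin tuples."""
    n_bins = int(T / bin_ms)
    words = []
    for b in range(n_bins):
        lo, hi = b * bin_ms, (b + 1) * bin_ms
        words.append(tuple(int(any(lo <= t < hi for t in ts))
                           for ts in spike_times))
    return words
```

Counting the resulting words yields the empirical distribution over spike patterns to which the pairwise model is fit below.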

For each stimulus condition, 20 simulations (or sub-simulations) were run, for a total integration time of $>20 \times 10^5$ ms. These 20 sub-simulations were used to estimate standard errors in both the probability distribution over spiking events and *D*KL(*P*, *P*˜). Numbers reported in section 2 are, unless specified otherwise, produced by collating the data from the 20 simulations.

To fit a maximum entropy model *P*˜ to an empirical probability distribution *P*, we used standard methods that have been explained elsewhere (Malouf, 2002). Briefly, we minimized the negative log-likelihood function:

$$L(\lambda) = -\sum_{\mathbf{x}} P\left(\mathbf{x}\right) \log \tilde{P}\left(\mathbf{x}, \lambda\right) \tag{15}$$

where

$$\tilde{P}\left(\mathbf{x},\lambda\right) = Z_{\lambda}^{-1} \exp\left(\sum_{k} \lambda_{k}\, f_{k}\left(\mathbf{x}\right)\right);$$

$Z_{\lambda}$ is the partition function; $f_k$, $k = 1,\dots,M$, is a set of functions or "features" of the spiking state; and λ is a vector of parameters, each of which serves as a Lagrange multiplier enforcing the constraint $\mathbf{E}_{\tilde{P}}[f_k] = \mathbf{E}_{P}[f_k]$. For the pairwise (PME) model on *N* cells, λ corresponds to *N* firing rates and *N*(*N* − 1)/2 covariances, and the sum is over all possible spiking states of the system. For *N* = 3 there are six such parameters, and

$$\begin{aligned} \log \tilde{P}\left(\{x_1, x_2, x_3\}, \lambda\right) = {}& \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_{1,2}\, x_1 x_2 \\ & + \lambda_{2,3}\, x_2 x_3 + \lambda_{1,3}\, x_1 x_3 - \log Z_{\lambda}\,. \end{aligned}$$

The function in Equation (15) is a convex function of the parameters λ, and it is minimized precisely (and uniquely) when *P*˜ matches the desired moments of *P*: e.g., $\mathbf{E}_{P}[x_1] = \mathbf{E}_{\tilde{P}}[x_1]$. Since *P*˜ is in log-linear form, the result is the *maximum entropy* distribution that matches the desired moments (Malouf, 2002). In principle any unconstrained gradient descent method may be used; we used an implementation of the non-linear conjugate gradient method. The Kullback–Leibler divergence *D*KL(*P*, *P*˜) was computed using the identity *D*KL(*P*, *P*˜) = *S*(*P*˜) − *S*(*P*), where *S*(*P*) is the entropy of *P*, i.e., $S(P) = -\sum_{\mathbf{x}} P(\mathbf{x}) \log P(\mathbf{x})$.
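The fitting procedure can be illustrated end-to-end in Python; this sketch uses plain gradient descent on Equation (15) for *N* = 3 (rather than the conjugate gradient implementation described above), exploiting the fact that the gradient is simply the moment mismatch $\mathbf{E}_{\tilde{P}}[f] - \mathbf{E}_{P}[f]$:

```python
import math
from itertools import product

def fit_pme(P, lr=0.9, iters=30000):
    """Fit the pairwise maximum-entropy model to P over {0,1}^3 by gradient
    descent on Equation (15); the gradient is E_Ptilde[f] - E_P[f]."""
    states = list(product([0, 1], repeat=3))
    # Feature matrix: three single-cell terms x_i, three pairwise terms x_i x_j.
    F = [[s[0], s[1], s[2], s[0] * s[1], s[0] * s[2], s[1] * s[2]]
         for s in states]
    target = [sum(P[s] * F[k][m] for k, s in enumerate(states))
              for m in range(6)]
    lam = [0.0] * 6
    for _ in range(iters):
        w = [math.exp(sum(l * f for l, f in zip(lam, Fk))) for Fk in F]
        Z = sum(w)
        model = [sum(w[k] * F[k][m] for k in range(8)) / Z for m in range(6)]
        lam = [l - lr * (m - t) for l, m, t in zip(lam, model, target)]
    w = [math.exp(sum(l * f for l, f in zip(lam, Fk))) for Fk in F]
    Z = sum(w)
    return {s: w[k] / Z for k, s in enumerate(states)}

def dkl(P, Q):
    """Kullback-Leibler divergence in bits."""
    return sum(p * math.log2(p / Q[s]) for s, p in P.items() if p > 0)
```

For any empirical *P* over the eight triplet states, `fit_pme` returns the pairwise model and `dkl` the divergence in bits; once the moments match, the identity *D*KL(*P*, *P*˜) = *S*(*P*˜) − *S*(*P*) holds, as used above.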

#### *4.1.7. Convergence testing*

To test our finding that the observed distributions were well-modeled by the PME fit, we also performed the PME analysis on each of the 20 simulations for each stimulus condition. While in general *D*KL(*P*, *P*˜) can be quite sensitive to perturbations in *P*, the numbers remained small under this analysis. To confirm that our results for *D*KL(*P*, *P*˜) are sufficiently resolved to remove bias from sampling, we performed an analysis in which we collected the 20 simulations into subgroups of 1, 2, 4, 5, 10, and 20, and plotted the mean *D*KL with estimated standard errors. As expected (e.g., Paninski, 2003), bias decreases as the subgroup length increases and asymptotes at, or before, the full simulation length.

To provide a cross-validation test for the significance of our reported *D*KL(*P*, *P*˜) values, we divided our data into halves (which we denote *P*1 and *P*2, each including data from 10 sub-simulations) and performed the PME analysis on one half (say *P*1) to yield a model *P*˜1. We then computed *D*KL(*P*2, *P*˜1) and *D*KL(*P*2, *P*1) (as in Yu et al., 2011), which we refer to as the *cross-validated* and *empirical* likelihood, respectively. The former tests whether the PME fit is robust to over-fitting; the latter tests how well-resolved our "true" distribution is in the first place. Most cross-validated likelihoods fall on or near the identity line; most empirical likelihoods are close to zero [and, importantly, significantly smaller than either *D*KL(*P*, *P*˜) or *D*KL(*P*2, *P*˜1), indicating that *D*KL(*P*, *P*˜) is accurately resolved]. We conclude that the deviations that we observe when these conditions are met cannot be accounted for by the differences in testing and training data.
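The empirical-likelihood half of this check is easy to reproduce on synthetic data; the distribution and sample sizes below are illustrative, not those of our simulations:

```python
import math
import random
from collections import Counter

def empirical_dist(samples, states):
    """Empirical probability of each spiking state."""
    cnt = Counter(samples)
    return {s: cnt[s] / len(samples) for s in states}

def dkl_bits(P, Q, floor=1e-12):
    """D_KL(P, Q) in bits, flooring Q to guard against unsampled states."""
    return sum(p * math.log2(p / max(Q[s], floor))
               for s, p in P.items() if p > 0)

# Draw synthetic "sub-simulation" data from a known symmetric distribution,
# split it into halves P1 and P2, and check that the empirical likelihood
# D_KL(P2, P1) is near zero, i.e., that the distribution is well resolved.
rng = random.Random(1)
states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
true_p = {s: {0: 0.4, 1: 0.1, 2: 0.06, 3: 0.12}[sum(s)] for s in states}
samples = rng.choices(states, weights=[true_p[s] for s in states], k=100000)
P1 = empirical_dist(samples[:50000], states)
P2 = empirical_dist(samples[50000:], states)
```

With this much data per half, *D*KL(*P*2, *P*1) comes out near zero, as expected when the "true" distribution is well resolved.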

#### **4.2. COMPUTATION OF SPIKING PATTERNS IN THE SIMPLIFIED MODEL**

As a simplified model of a neural circuit, we consider a variant of the *Dichotomized Gaussian* (Amari et al., 2003; Macke et al., 2009, 2011), in which correlated inputs are thresholded to produce an output spike pattern. To be concrete, a set of *N* threshold spiking units is forced by a common input *Ic* [drawn from a probability distribution *PC*(*y*)] and independent inputs *Ij* [drawn from a probability distribution *PI*(*y*)]. To relate these functions to the other free parameters in the model, *PC*(*y*) and *PI*(*y*) were always chosen so that *Ij* and *Ic* had mean 0 and variances (1 − *c*)σ² and *c*σ², respectively (so that *c* is the Pearson correlation coefficient of the inputs to two cells). The output of each cell *xj* is determined by summing and thresholding these inputs:

$$x_j = H\left( I_j + I_c - \Theta \right) \tag{16}$$

where *H* is the Heaviside function [*H*(*x*) = 1 if *x* ≥ 0; *H*(*x*) = 0 otherwise]. Conditioned on *Ic*, the probability of each spike is given by:

$$\begin{aligned} \mathbf{Prob}\left[x_j = 1 \mid I_c = a\right] &= \mathbf{Prob}\left[I_j + a - \Theta > 0\right] \\ &= \mathbf{Prob}\left[I_j > \Theta - a\right] \\ &= \int_{\Theta - a}^{\infty} P_I(y) \, dy \end{aligned}$$

Similarly, we have the conditioned probability that *xj* = 0:

$$\begin{aligned} \mathbf{Prob}\left[x_j = 0 \mid I_c = a\right] &= \mathbf{Prob}\left[I_j + a - \Theta < 0\right] \\ &= \mathbf{Prob}\left[I_j < \Theta - a\right] \\ &= \int_{-\infty}^{\Theta - a} P_I(y) \, dy \end{aligned}$$

Because the units are conditionally independent given the common input, the probability of any spiking event (*x*1, *x*2,..., *xN*) = (*A*1, *A*2,..., *AN*) is given by the integral of the product of the conditioned probabilities against the density of the common input:

$$\mathbf{Prob}\left[x_1 = A_1, \dots, x_N = A_N\right] = \int_{-\infty}^{\infty} dy \, P_C(y) \prod_{j=1}^{N} \mathbf{Prob}\left[x_j = A_j \mid I_c = y\right] \tag{17}$$

The integral in Equation (17) is numerically evaluated via an adaptive quadrature routine, such as MATLAB's quad or integral.
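For Gaussian common and independent inputs, Equation (17) reduces to a one-dimensional integral that even simple fixed-grid quadrature handles well. The sketch below uses Simpson's rule rather than an adaptive routine, with illustrative values of *c*, σ, and Θ:

```python
import math

def pattern_prob(A, c=0.3, sigma=1.0, theta=1.0, n_grid=2001):
    """Probability of the spike pattern A (tuple of 0/1) via Equation (17),
    assuming Gaussian common and independent inputs with Var[I_c] = c*sigma^2
    and Var[I_j] = (1 - c)*sigma^2; Simpson's rule over +/- 8 SD."""
    sc = math.sqrt(c) * sigma
    si = math.sqrt(1.0 - c) * sigma

    def p_spike(y):
        # Prob[x_j = 1 | I_c = y] = Prob[I_j > theta - y]
        return 0.5 * math.erfc((theta - y) / (si * math.sqrt(2.0)))

    def integrand(y):
        dens = math.exp(-0.5 * (y / sc) ** 2) / (sc * math.sqrt(2.0 * math.pi))
        q = p_spike(y)
        prod = 1.0
        for a in A:
            prod *= q if a == 1 else 1.0 - q
        return dens * prod

    lo, hi = -8.0 * sc, 8.0 * sc
    h = (hi - lo) / (n_grid - 1)
    total = integrand(lo) + integrand(hi)
    for k in range(1, n_grid - 1):
        total += integrand(lo + k * h) * (4.0 if k % 2 else 2.0)
    return total * h / 3.0
```

Summing `pattern_prob` over all eight patterns recovers 1, and the single-cell marginal matches the closed form $\mathbf{Prob}[x_j = 1] = 1 - \Phi(\Theta/\sigma)$, since $I_j + I_c$ is Gaussian with variance σ².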

Four distinct unimodal inputs were used; two with heavy tails (Cauchy and heavy-tailed with skew), and two with sub-Gaussian tails (Gaussian and skewed). A random variable *X* is *sub-Gaussian* if the probability of large events can be bounded above by a scaled Gaussian; that is, if there exist constants *C*,*c* > 0 such that

$$P\left(|X| > \lambda\right) \le C \exp\left(-c\lambda^2\right)$$

for all λ (e.g., see Tao, 2012, p. 15).

Unimodal inputs *Ij*, *Ic* were chosen from different marginals with mean 0 and variances (1 − *c*)σ² and *c*σ², respectively (for simplicity, we use σ² to refer to the variance of a generic probability distribution in the following three paragraphs). For Gaussian inputs with variance σ², $P(x) \propto e^{-x^2/2\sigma^2}$; for skewed inputs, $P(x) \propto (x + \mu)\, e^{-(x+\mu)^2/2a}$ for $x > -\mu$, where the parameter *a* sets the variance $2a\left(1 - \frac{\pi}{4}\right)$, and shifting by $\mu = \sqrt{a\pi/2}$ ensures that the mean of *P*(*x*) is zero.
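Since the skewed density above is a mean-shifted Rayleigh distribution (scale $\sqrt{a}$), its stated moments can be sanity-checked by inverse-CDF sampling; this is an illustrative check, not part of the original simulation pipeline:

```python
import math
import random

# The skewed density P(x) ~ (x + mu) exp(-(x + mu)^2 / 2a), x > -mu, is a
# Rayleigh density with scale sqrt(a), shifted left by mu = sqrt(a * pi / 2);
# inverse-CDF sampling lets us check the stated mean and variance.
def sample_skewed(a, n, seed=0):
    rng = random.Random(seed)
    mu = math.sqrt(a * math.pi / 2.0)
    # Rayleigh inverse CDF: x = sqrt(-2 a ln(1 - u)), u uniform on [0, 1)
    return [math.sqrt(-2.0 * a * math.log(1.0 - rng.random())) - mu
            for _ in range(n)]

xs = sample_skewed(a=1.0, n=200000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# mean should be near 0, and var near 2a(1 - pi/4) for a = 1
```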

The heavy-tailed unimodal inputs were chosen so that the rate of tail decay would mimic the $I^{-2}$ luminance statistics found in natural scenes (Ruderman and Bialek, 1994):

$$P(x) \propto \frac{1}{x^2 + 1}, \qquad -X < x < X$$

$$P(x) \propto \frac{x}{\left(x^2 + 1\right)^{3/2}}, \qquad 0 \le x < X$$

for the Cauchy and heavy-tailed with skew distributions, respectively. A finite support was necessary in order to ensure that the distributions had finite moments; *X* was chosen to be 1000. Given *X*, the distributions were shifted and scaled to ensure mean 0 and variance σ².

Bimodal inputs with variance σ² were chosen in the following way: in all cases, *P*(*x*) was chosen to be a discrete distribution with support on two values {0, *X*}, i.e., *P*(*X*) = *p* and *P*(0) = 1 − *p*. If possible (i.e., if σ² ≤ 1/4), *X* was chosen to be 1; otherwise, *X* was chosen so as to minimize the distance between 0 and *X*. Finally, *P*(*x*) was shifted to have the desired mean value.
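One concrete reading of this construction is sketched below, using the fact that a two-point distribution on {0, *X*} with *P*(*X*) = *p* has variance *p*(1 − *p*)*X*²: for σ² ≤ 1/4, set *X* = 1 and solve *p*(1 − *p*) = σ²; otherwise take *p* = 1/2, which maximizes *p*(1 − *p*) and hence minimizes *X*. The specific choice of the smaller root for *p* is our assumption.

```python
import math

def bimodal_input(sigma2, mean=0.0):
    """Two-point distribution {value: prob} with the given mean and variance.
    Reading of the text: for sigma2 <= 1/4, X = 1 and p solves p(1-p) = sigma2
    (smaller root chosen here); otherwise p = 1/2, which minimizes X."""
    if sigma2 <= 0.25:
        X = 1.0
        p = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * sigma2))
    else:
        p = 0.5
        X = math.sqrt(sigma2 / (p * (1.0 - p)))
    shift = mean - p * X          # shift the support to hit the target mean
    return {shift: 1.0 - p, X + shift: p}
```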

## **ACKNOWLEDGMENTS**

This research was supported by NSF grants DMS-0817649 and 1056125, by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund (Eric Shea-Brown), by the Howard Hughes Medical Institute and by NIH grant EY-11850 (Fred Rieke), by a Trinity College Research Studentship (Julijana Gjorgjieva), and by an Early Career Award from the Mathematical Biosciences Institute (Andrea K. Barreiro).

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncom.2014.00010/abstract

#### **Figure S1 | Biphasic vs. monophasic filters used in simulations illustrated in Figure 4.**

**Figure S2 | Illustration of RGC simulations with light stimuli of varying spatial scale ("stixels"). (A–C)** For stixel size 60 μm, results for one randomly chosen stimulus position. **(A)** Contour lines of the three receptive fields (at 0.5, 1, 1.5, and 2 SD; and at the zero contour line) superimposed on the stimulus checkerboard (for illustration, pictured in an alternating black/white pattern). The red scale bar indicates 100 μm. **(B)** Histograms of the excitatory conductances, for each cell. **(C)** Spike pattern distribution, as obtained from computational simulations of the RGC model ("Observed"; dark blue), and the corresponding pairwise fit ("PME"; light pink). All eight spike patterns are shown, to allow for the possibility of non-symmetric responses; the three different probabilities labeled *p*1 correspond to *P*[(1, 0, 0)], *P*[(0, 1, 0)], and *P*[(0, 0, 1)]. **(D–F)** As in **(A–C)**, but for stixel size 256 μm. Panels **(E,F)** demonstrate that for this input, both excitatory inputs and spiking responses are heterogeneous across the RGCs.

**Figure S3 | Strength of higher-order interactions produced by the threshold model as input parameters vary; relationship with other output firing statistics. (A)** For skewed common inputs: *D*KL(*P*, *P*˜) as a function of input correlation *c* and input standard deviation σ, for a fixed threshold Θ = 1.5. Color indicates *D*KL(*P*, *P*˜); see color bar for range. **(B)** For skewed common inputs: *D*KL(*P*, *P*˜) vs. firing rate **E**[*x*1] (left), and the fraction of multi-information captured by the PME model vs. firing rate **E**[*x*1] (right). In **(B)**, possible input parameters were varied over a broad range as described in section 2. Firing rate is defined as the probability of a spike occurring per cell per random draw of the sum-and-threshold model, as defined in Equation (16). Color indicates output correlation coefficient ρ, ranging from black for ρ ∈ (0, 0.1) to white for ρ ∈ (0.9, 1), as illustrated in the color bars. **(C,D)** As in **(A,B)**, but for heavy-tailed, skewed common inputs.

**Figure S4 | The range of higher-order interactions produced by the threshold model varies across input type.** Here, all values of *D*KL(*P*, *P*˜) produced by the three-cell threshold model (previously displayed in **Figures 7**, **S3**) are superimposed to show the contrast between different input distributions. By comparing these data with data from direct sampling of all symmetric spiking distributions on three cells (from **Figure 1**, and shown here in yellow), one can see that only a limited set of output patterns are accessed by the feedforward thresholding model. Firing rate is defined as the probability of a spike occurring per cell per random draw of the sum-and-threshold model, as defined in Equation (16).

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 July 2013; accepted: 20 January 2014; published online: 06 February 2014. Citation: Barreiro AK, Gjorgjieva J, Rieke F and Shea-Brown E (2014) When do microcircuits produce beyond-pairwise correlations? Front. Comput. Neurosci. 8:10. doi: 10.3389/fncom.2014.00010*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Barreiro, Gjorgjieva, Rieke and Shea-Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX**

## **A.1 A MEASURE OF HIGHER-ORDER INTERACTIONS:** *D***KL***(P, P***˜***)*

We begin by observing that when *P*˜ is a maximum entropy distribution that approximates *P* (that is, it is log-linear, with coefficients chosen to enforce equality of a set of moments), the KL divergence may be written as a difference of entropies (Cover and Thomas, 1991; Malouf, 2002):

$$D_{\mathrm{KL}}\left(P,\tilde{P}\right) = -S(P) + S\left(\tilde{P}\right)$$

Here, the entropy of a probability distribution *P* on {0, 1}<sup>3</sup> is given by

$$S(P) = -p\_0 \log \left(p\_0 \right) - 3p\_1 \log \left(p\_1 \right) - 3p\_2 \log \left(p\_2 \right) - p\_3 \log \left(p\_3 \right) \qquad (18)$$

if we use the fact that the distributions are permutation-symmetric [i.e., *p*<sub>1</sub> ≡ *P*(1, 0, 0) = *P*(0, 1, 0) = *P*(0, 0, 1)]. We take the logarithms in the definitions of the entropy *S* and KL-divergence *D*<sub>KL</sub> to be base 2, so that any numerical values of these quantities are in units of bits. Using the fact that *P* must normalize to 1, we rewrite

$$S(P) = -\left(1 - 3p\_1 - 3p\_2 - p\_3\right) \log\left(1 - 3p\_1 - 3p\_2 - p\_3\right)$$

$$-3p\_1 \log\left(p\_1\right) - 3p\_2 \log\left(p\_2\right) - p\_3 \log\left(p\_3\right)$$

where the set of admissible distributions may now be described by the convex tetrahedron in **R**<sup>3</sup>, *C* = {*p*<sub>1</sub>, *p*<sub>2</sub>, *p*<sub>3</sub> ≥ 0; 3*p*<sub>1</sub> + 3*p*<sub>2</sub> + *p*<sub>3</sub> ≤ 1}.

We note that the set of distributions which satisfies a desired set of lower-order moments is given by an affine subspace (in **R**<sup>3</sup>, a line) which intersects this tetrahedron:

$$\begin{aligned} \mu & \equiv \mathbf{E}[X\_i] = p\_1 + 2p\_2 + p\_3\\ \hat{\rho} & \equiv \mathbf{E}[X\_i X\_j] = p\_2 + p\_3 \end{aligned}$$

Denoting this set by *C*<sub>μ,ρˆ</sub>, we note that *C*<sub>μ,ρˆ</sub> is a convex set and that *S*(*P*˜) is constant on each *C*<sub>μ,ρˆ</sub>.

By straightforward differentiation we can check that the Hessian of −*S*(*P*) is positive definite, as long as the probabilities *p*0, *p*1, etc. are strictly greater than zero:

$$-D^2S(P) = \begin{bmatrix} \frac{3}{p\_1} & 0 & 0\\ 0 & \frac{3}{p\_2} & 0\\ 0 & 0 & \frac{1}{p\_3} \end{bmatrix} + \frac{1}{p\_0} \begin{bmatrix} 9 & 9 & 3\\ 9 & 9 & 3\\ 3 & 3 & 1 \end{bmatrix}$$

Therefore −*S*(*P*) is convex on *C*<sub>μ,ρˆ</sub>; since *S*(*P*˜) is constant, *D*<sub>KL</sub>(*P*, *P*˜) is likewise convex on *C*<sub>μ,ρˆ</sub>. As a consequence, if *D*<sub>KL</sub>(*P*, *P*˜) has a local minimum, then it is unique and a global minimum as well. Since *D*<sub>KL</sub>(*P*, *P*˜) ≥ 0 with equality if and only if *P* = *P*˜, this minimum is achieved when *P* = *P*˜; the maximum is likewise achieved on the boundary of the admissible region *C*<sub>μ,ρˆ</sub>.
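As a concrete illustration, the sketch below (our own, with arbitrarily chosen pattern probabilities) evaluates the entropy of Equation (18) and computes *D*<sub>KL</sub>(*P*, *P*˜) numerically by maximizing entropy along the moment-preserving line described in the next section:

```python
import numpy as np

def entropy(p0, p1, p2, p3):
    # Equation (18): base-2 entropy of a permutation-symmetric
    # distribution on {0,1}^3 (p1 and p2 each cover three patterns)
    w = np.array([p0, p1, p1, p1, p2, p2, p2, p3])
    return float(-(w * np.log2(w)).sum())

def dkl_to_pme(p1, p2, p3, n=40001):
    """D_KL(P, P~) where P~ is the pairwise maximum entropy fit.

    The fit maximizes entropy over the moment-preserving line
    (p1 + z, p2 - z, p3 + z), so D_KL(P, P~) = S(P~) - S(P)."""
    p0 = 1 - 3*p1 - 3*p2 - p3
    lo, hi = max(-p1, -p3), min(p2, p0)
    z = np.linspace(lo, hi, n)[1:-1]      # stay strictly inside C
    s = np.array([entropy(p0 - t, p1 + t, p2 - t, p3 + t) for t in z])
    return s.max() - entropy(p0, p1, p2, p3)

d = dkl_to_pme(0.10, 0.05, 0.05)   # an arbitrary symmetric distribution
```

For an independent (hence already pairwise maximum entropy) distribution this quantity vanishes to within grid resolution.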

#### **A.2 A MEASURE OF HIGHER-ORDER INTERACTIONS: STRAIN**

We define the *strain*,

$$\gamma = \log\left(\frac{p\_3 p\_1^3}{p\_0 p\_2^3}\right) \tag{19}$$

$$= \log p\_3 - \log p\_0 + 3\log p\_1 - 3\log p\_2$$

a potential measure of the importance of higher-order interactions (Ohiorhenuan and Victor, 2010). By Equation (3), we can see that γ = 0 precisely for a pairwise maximum entropy (PME) distribution. We will show that as the distribution (*p*0, *p*1, *p*2, *p*3) is moved away from the constraint surface while fixing lower-order moments, the strain increases monotonically.

From the definition of lower-order moments,

$$\mu = \mathbb{E}[X\_i] = p\_1 + 2p\_2 + p\_3$$

$$\hat{\rho} = \mathbb{E}[X\_i X\_j] = p\_2 + p\_3$$

we can verify that in order to keep μ, ρˆ constant, if *p*<sub>1</sub> increases by *z* (i.e., *p*<sub>1</sub> → *p*<sub>1</sub> + *z*), then we must also have *p*<sub>2</sub> → *p*<sub>2</sub> − *z* and *p*<sub>3</sub> → *p*<sub>3</sub> + *z* (and hence *p*<sub>0</sub> → *p*<sub>0</sub> − *z*). If each probability is strictly positive, then the derivative

$$\frac{\partial \gamma}{\partial z} = \frac{1}{p\_3 + z} + \frac{1}{1 - 3p\_1 - 3p\_2 - p\_3 - z} + \frac{3}{p\_1 + z} + \frac{3}{p\_2 - z}$$

is strictly positive as well. In particular, it is strictly positive at *z* = 0 and will remain positive until *z* reaches a value such that one of the denominators reaches 0. Therefore γ increases monotonically for *z* > 0 and decreases monotonically for *z* < 0.
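A quick numerical check of this monotonicity (a sketch; the starting probabilities are arbitrary):

```python
import numpy as np

def strain(p1, p2, p3):
    # Equation (19); base-2 logs, matching the entropy convention
    p0 = 1 - 3*p1 - 3*p2 - p3
    return np.log2(p3 * p1**3 / (p0 * p2**3))

# Perturbing (p1, p2, p3) -> (p1 + z, p2 - z, p3 + z) preserves the
# lower-order moments mu and rho-hat
p1, p2, p3 = 0.10, 0.05, 0.05
z = np.linspace(-0.04, 0.04, 801)
gamma = np.array([strain(p1 + t, p2 - t, p3 + t) for t in z])
```

Along this admissible range the strain passes through zero and increases strictly with *z*, as the derivative argument predicts.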

#### **A.3 AN ANALYTICAL EXPLANATION FOR UNIMODAL vs. BIMODAL EFFECTS**

We consider an analytical argument to support the numerical results that bimodal inputs generate larger deviations from PME model fits than unimodal inputs. As a metric, we consider *D*<sub>KL</sub>(*P*, *P*˜)—where *P* and *P*˜ are again the true and model distributions, respectively—when we perturb an independent spiking distribution by adding a common, global input of variance *c*. To simplify notation, the small parameter in the calculation will be denoted ε = √*c*.

We now compute *S*(*P*) and *S*(*P*˜) (defined in an earlier Appendix) by deriving a series expansion for each set of event probabilities. We can compute the true distribution *P* using the expressions derived in Equation (18); to recap, let the common input *I<sub>c</sub>* have probability density *p*(*I<sub>c</sub>*), and the independent input to each cell, *x*, have density *p<sub>s</sub>*(*x*). Let Θ be the threshold for generating a spike (i.e., a "1" response). For each cell, a spike is generated if *x* + *I<sub>c</sub>* > Θ, i.e., with probability

$$d(I\_c) = \int\_{\Theta - I\_c}^{\infty} p\_s(x)\, dx.$$

Given *Ic*, this is conditionally independent for each cell. We can therefore write our probabilities by integrating over *Ic* as follows:

$$\begin{split} p\_0 &= \int\_{-\infty}^{\infty} p(I\_c)\left(1 - d(I\_c)\right)^3 \, dI\_c \\ p\_1 &= \int\_{-\infty}^{\infty} p(I\_c)\, d(I\_c)\left(1 - d(I\_c)\right)^2 \, dI\_c \\ p\_2 &= \int\_{-\infty}^{\infty} p(I\_c)\, d(I\_c)^2 \left(1 - d(I\_c)\right) \, dI\_c \\ p\_3 &= \int\_{-\infty}^{\infty} p(I\_c)\, d(I\_c)^3 \, dI\_c \end{split} \tag{20}$$
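Equation (20) can be checked numerically. The following sketch (our construction; Θ = 1 and ε = 0.3 are arbitrary choices) assumes unit-variance Gaussian independent inputs and a Gaussian common input, and compares a Monte Carlo estimate of the event probabilities with direct quadrature of Equation (20):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
theta, eps = 1.0, 0.3          # threshold and common-input scale (assumed)

# Monte Carlo: three cells share one Gaussian common input per draw
x = rng.standard_normal((200_000, 3))
ic = eps * rng.standard_normal((200_000, 1))
counts = (x + ic > theta).sum(axis=1)
p_mc = np.array([(counts == k).mean() for k in range(4)])
p_mc[1] /= 3.0                 # convert to per-pattern probabilities
p_mc[2] /= 3.0

# Quadrature of Equation (20): d(Ic) = P(x > theta - Ic) for Gaussian x
grid = np.linspace(-8 * eps, 8 * eps, 4001)
f = np.exp(-grid**2 / (2 * eps**2)) / (eps * math.sqrt(2 * math.pi))
w = f / f.sum()                # normalized quadrature weights
d = np.array([0.5 * math.erfc((theta - i) / math.sqrt(2)) for i in grid])
p_quad = np.array([((1 - d)**3 * w).sum(), (d * (1 - d)**2 * w).sum(),
                   (d**2 * (1 - d) * w).sum(), (d**3 * w).sum()])
```

The two estimates agree to within Monte Carlo error, and the quadrature probabilities normalize exactly by construction.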

We develop a perturbation argument in the limit of very weak common input. That is, *p*(*Ic*) is close to a delta function centered at *Ic* = 0. Take *p*(*Ic*) to be a scaled function

$$p(I\_c) = \frac{1}{\epsilon} f\left(\frac{I\_c}{\epsilon}\right) \tag{21}$$

We place no constraints on *f*(*x*), other than that it must be normalized (**E**[1] = 1) and that its moments must be finite (so that **E**[*I<sub>c</sub>*], **E**[*I<sub>c</sub>*<sup>2</sup>], and so forth will exist, where **E**[*g*(*x*)] ≡ ∫<sub>−∞</sub><sup>∞</sup> *g*(*x*)*f*(*x*) *dx*).

For the moment, assume that the function *f*(*x*) has a single maximum at *x* = 0. To evaluate the integrals above, we Taylor-expand *d*(*x*) around *x* = 0. Anticipating a sixth-order term to survive, we keep all terms up to this order. This gives, for small *x*,

$$d(\mathbf{x}) \approx d(0) + \sum\_{k=1}^{6} a\_k \mathbf{x}^k + O(\mathbf{x}^7)$$

where *a*<sub>1</sub> = *p<sub>s</sub>*(Θ) (the other coefficients *a*<sub>2</sub>–*a*<sub>6</sub> can be given similarly in terms of the independent input distribution at Θ). Substituting this into the expressions for *p*<sub>0</sub>, etc., above, with *p*(*I<sub>c</sub>*) given as in Equation (21), gives us each event probability as a series in ε; for example,

$$p\_3 = d\_0^3 + \left( 3a\_1 d\_0^2 \operatorname{E}[x] \right) \epsilon + \left( \left( 3a\_1^2 d\_0 + 3a\_2 d\_0^2 \right) \operatorname{E}[x^2] \right) \epsilon^2 + \dots$$

where expectations are, again, with respect to the unscaled PDF *f*(*x*). The entropy *S*(*P*) is now given by using these series expansions in Equation (18).

We note that our derivation does not rely on the fact that the distribution of common input is peaked at *Ic* = 0 in particular. For example, we could have a common input centered around μ. The common input distribution function would be of the form

$$p(I\_c) = \frac{1}{\epsilon} f\left(\frac{I\_c - \mu}{\epsilon}\right).$$

Changing ε regulates the variance, but doesn't change the mean or the peak (assuming, without loss of generality, that the peak of *f* occurs at zero). The peak of *p*(*I<sub>c</sub>*) now occurs at μ, and the appropriate Taylor expansion of *d*(*x*) is

$$d(x) \approx d(\mu) + \sum\_{k=1}^{6} b\_k (x - \mu)^k + O\left((x - \mu)^7\right),$$

where the coefficients *bk* now depend on the local behavior of *d* around μ. The expectations that appear in the expansion of *p*3, and so forth, are now centered moments taken around μ; the calculations are otherwise identical. In other words, the perturbation expansion requires the *variance* of the common input to be small, but not the mean.

For bimodal inputs, we consider a common input with a probability distribution of the following form:

$$p\left(I\_c\right) = \left(1 - \epsilon^2\right) \frac{1}{\epsilon} f\left(\frac{I\_c}{\epsilon}\right) + \epsilon^2\, \frac{1}{\epsilon} f\left(\frac{I\_c - 1}{\epsilon}\right)$$

so that most of the probability distribution is peaked at zero, but there is a second peak of higher order (here taken at *Ic* = 1, without loss of generality). Again, we approximate the integrals given in Equation (20), and therefore the entropy *S*(*P*), by Taylor expanding *d*(*x*);

$$\begin{aligned} d(x) &\approx d(0) + \sum\_{k=1}^{6} a\_k x^k + O(x^7); \quad (x \approx 0)\\ d(x) &\approx d(1) + \sum\_{k=1}^{6} b\_k \left(x - 1\right)^k + O\left((x - 1)^7\right); \quad (x \approx 1) \end{aligned}$$

around the two peaks 0 and 1, respectively. For each integral we have the same contributions as in the unimodal case, multiplied by (1 − ε<sup>2</sup>), as well as the corresponding contributions from the second peak multiplied by ε<sup>2</sup> (these weightings are chosen so that the common input has variance of order ε<sup>2</sup>, as in the unimodal case). This makes clear at what order every term enters.

We now construct an expansion for the PME model *P*˜:

$$\tilde{P}\left(x\_{1}, x\_{2}, x\_{3}\right) = \frac{1}{Z} \exp\left(\lambda\_{1}\left(x\_{1} + x\_{2} + x\_{3}\right) + \lambda\_{2}\left(x\_{1}x\_{2} + x\_{2}x\_{3} + x\_{1}x\_{3}\right)\right),$$

We approach this problem by describing λ<sub>1</sub> and λ<sub>2</sub> as a series in ε. We match coefficients by forcing the first and second moments of *P*˜ to match those of *P*—as they must. Specifically, take

$$\lambda\_1 = \tilde{\lambda} + \sum\_{k=1}^{6} \epsilon^k \mu\_k + O\left(\epsilon^7\right)$$

$$\lambda\_2 = \sum\_{k=1}^{6} \epsilon^k \nu\_k + O\left(\epsilon^7\right)$$

where λ<sub>1</sub> = λ˜, λ<sub>2</sub> = 0 are the corresponding parameters from the independent case. The event probabilities *p*˜<sub>0</sub>, *p*˜<sub>1</sub>, *p*˜<sub>2</sub>, and *p*˜<sub>3</sub> can then be written as a series in ε. We then require that the mean and centered second moments of *P*˜ match those of *P*; that is

$$p\_1 + 2p\_2 + p\_3 = \tilde{p}\_1 + 2\tilde{p}\_2 + \tilde{p}\_3$$

$$p\_2 + p\_3 - \left(p\_1 + 2p\_2 + p\_3\right)^2 = \tilde{p}\_2 + \tilde{p}\_3 - \left(\tilde{p}\_1 + 2\tilde{p}\_2 + \tilde{p}\_3\right)^2.$$

At each order *k*, this yields a system of two linear equations in μ<sub>*k*</sub> and ν<sub>*k*</sub>; we solve, inductively, up to the desired order. We now have *P*˜, and therefore *S*(*P*˜), as a series in ε.

Finally, we combine the two series to find that in the *unimodal* case,

$$\begin{split} D\_{\text{KL}}\left(P,\tilde{P}\right) &= S\left(\tilde{P}\right) - S(P) \\ &= \epsilon^{6} \left[ \frac{a\_{1}^{6} \left(2\operatorname{E}[x]^{3} - 3\operatorname{E}[x]\operatorname{E}[x^{2}] + \operatorname{E}[x^{3}]\right)^{2}}{2\left(1 - d\_{0}\right)^{3}d\_{0}^{3}} \right] + O\left(\epsilon^{7}\right) \end{split} \tag{22}$$

If the first two odd moments of the distribution are zero (something we can expect for "symmetric" distributions, such as a Gaussian), then this sixth-order term is zero as well.

For the *bimodal* case

$$\begin{aligned} D\_{\text{KL}}\left(P, \tilde{P}\right) &= S\left(\tilde{P}\right) - S(P) \\ &= \epsilon^4 \left[ \frac{\left(d\_1 - d\_0\right)^6}{2\left(1 - d\_0\right)^3 d\_0^3} \right] + O\left(\epsilon^5\right) \end{aligned}$$

This last term depends on the distance *d*<sub>1</sub> − *d*<sub>0</sub>; in other words, how much more likely the independent input is to push the cell over threshold when the common input is "ON". We can also view this as depending on the ratio (*d*<sub>1</sub> − *d*<sub>0</sub>)/(1 − *d*<sub>0</sub>), which gives the fraction of previously non-spiking cells that now spike as a result of the common input.

*The main point here, of course, is that D*<sub>KL</sub>(*P*, *P*˜) *is of order* ε<sup>4</sup> *rather than* ε<sup>6</sup>*.* So, as the strength of a common binary (vs. unimodal) input increases, spiking distributions depart from the PME model more rapidly.
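This ε<sup>4</sup> scaling is easy to verify numerically. The sketch below (ours, not the authors' code; d<sub>0</sub> = 0.2 and d<sub>1</sub> = 0.8 are assumed values) uses the binary limit of the bimodal input, with weight ε<sup>2</sup> on the "ON" peak, fits the PME model by maximizing entropy along the moment-preserving line of Appendix A.2, and confirms that doubling ε multiplies *D*<sub>KL</sub>(*P*, *P*˜) by roughly 2<sup>4</sup> = 16:

```python
import numpy as np

def entropy(p0, p1, p2, p3):
    # Equation (18): base-2 entropy of a symmetric distribution on {0,1}^3
    w = np.array([p0, p1, p1, p1, p2, p2, p2, p3])
    return float(-(w * np.log2(w)).sum())

def dkl_pme(p1, p2, p3):
    """S(P~) - S(P): maximize entropy along the moment-preserving line
    (p1 + z, p2 - z, p3 + z) by ternary search (entropy is concave)."""
    p0 = 1 - 3*p1 - 3*p2 - p3
    S = lambda z: entropy(p0 - z, p1 + z, p2 - z, p3 + z)
    lo, hi = max(-p1, -p3) + 1e-12, min(p2, p0) - 1e-12
    for _ in range(120):
        m1, m2 = lo + (hi - lo)/3, hi - (hi - lo)/3
        if S(m1) < S(m2):
            lo = m1
        else:
            hi = m2
    return S(0.5*(lo + hi)) - S(0.0)

def binary_mixture(eps, d0=0.2, d1=0.8):
    # common input is 'OFF' with prob 1 - eps^2 and 'ON' with prob eps^2;
    # conditional on the input, the three cells spike independently
    w = eps**2
    mix = lambda g: (1 - w)*g(d0) + w*g(d1)
    return (mix(lambda d: d*(1 - d)**2),   # p1
            mix(lambda d: d**2*(1 - d)),   # p2
            mix(lambda d: d**3))           # p3

a = dkl_pme(*binary_mixture(0.02))
b = dkl_pme(*binary_mixture(0.04))
ratio = b / a                              # close to 2**4 = 16
```

The ratio lands near 16, consistent with a leading-order ε<sup>4</sup> dependence.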

## Long-term plasticity determines the postsynaptic response to correlated afferents with multivesicular short-term synaptic depression

## *Alex D. Bird1,2,3\* and Magnus J. E. Richardson1*

*<sup>1</sup> Warwick Systems Biology Centre, University of Warwick, Coventry, UK*

*<sup>2</sup> Warwick Systems Biology Doctoral Training Centre, University of Warwick, Coventry, UK*

*<sup>3</sup> School of Life Sciences, University of Warwick, Coventry, UK*

#### *Edited by:*

*Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany*

#### *Reviewed by:*

*Jean-Pascal Pfister, Cambridge University, UK (in collaboration with Simone Surace)*

*Michael Graupner, New York University, USA*

#### *\*Correspondence:*

*Alex D. Bird, Warwick Systems Biology Centre, Senate House, University of Warwick, CV4 7AL, Coventry, UK e-mail: a.d.bird@warwick.ac.uk*

Synchrony in a presynaptic population leads to correlations in vesicle occupancy at the active sites for neurotransmitter release. The number of independent release sites per presynaptic neuron, a synaptic parameter recently shown to be modified during long-term plasticity, will modulate these correlations and therefore have a significant effect on the firing rate of the postsynaptic neuron. To understand how correlations from synaptic dynamics and from presynaptic synchrony shape the postsynaptic response, we study a model of multiple-release-site short-term plasticity and derive exact results for the crosscorrelation function of vesicle occupancy and neurotransmitter release, as well as the postsynaptic voltage variance. Using approximate forms for the postsynaptic firing rate in the limits of low and high correlations, we demonstrate that short-term depression leads to a maximum response for an intermediate number of presynaptic release sites, and that this leads to a tuning-curve response peaked at an optimal presynaptic synchrony set by the number of neurotransmitter release sites per presynaptic neuron. These effects arise because, above a certain level of correlation, activity in the presynaptic population is overly strong, resulting in wastage of the pool of releasable neurotransmitter. As the nervous system operates under constraints of efficient metabolism, it is likely that this phenomenon provides an activity-dependent constraint on network architecture.

**Keywords: long-term plasticity, short-term plasticity, synaptic depression, correlations and synchrony, voltage fluctuations**

## **1. INTRODUCTION**

Synapses play a key role in transmitting and processing information throughout the nervous system and long-term shifts in synaptic efficacy are believed to underpin learning and memory (Hebb, 2002; Markram et al., 2011). Synapses function through release of neurotransmitters that then bind to receptors on the postsynaptic cell and transiently alter the membrane conductance. Neurotransmitters in the presynaptic terminal are stored and transported in vesicles (Fox, 1988; Hu et al., 2008). A number of vesicles are positioned at active sites where they have a certain probability of being released when the presynaptic cell spikes. Empty release sites are restocked after a variable period, with an overall rate of a few Hz (Südhof, 2004). Both the number of contacts per presynaptic cell and the activity in the presynaptic network can generate correlations in the release of neurotransmitter at synapses onto a single neuron; we demonstrate that postsynaptic activity is governed by a balance between these two sources of correlation.

The usage of vesicles due to presynaptic firing and stochastic replenishment means that the number of vesicles available for release is a highly dynamic quantity that is dependent on the history of afferent activity. In the immature cortex, the relatively high release probability and limited availability of vesicles causes a progressive reduction in synaptic efficacy during a period of sustained neuronal activity (Reyes and Sakmann, 1999; Chen and Buonomano, 2012). This short-term reduction in synaptic strength is known as vesicle depletion depression: an unstocked active site cannot induce a postsynaptic response to any incident action potential (Abbot, 1997; Tsodyks and Markram, 1997; Zucker and Regehr, 2002). The phenomenon is believed to play a role in gain control (Abbot, 1997; Abbott and Regehr, 2004; Rothman et al., 2009), information transmission (Zador, 1998; Kilpatrick, 2012; Scott et al., 2012), and adaptation to sensory stimuli (Furukawa et al., 1982; Hallermann and Silver, 2012). The synaptic plasticity models introduced by Abbot (1997) and Tsodyks et al. (1998) capture short-term depression accurately; they match empirical data and allow a richness of network behavior (Tsodyks et al., 1998) to emerge beyond that predicted by static synapses. Such models consider the mean efficacy of the synapse, averaged across several presentations of the same presynaptic stimulus; the predicted postsynaptic response therefore varies continuously. Several recent studies have considered a quantal model of synaptic function incorporating short-term depression, with probabilistic vesicle release and replacement to reflect trial-to-trial variability (Fuhrmann et al., 2002; de la Rocha and Parga, 2005; Rosenbaum et al., 2012). 
The impact of stochastic vesicle dynamics is particularly marked when mean synaptic drive is insufficient to bring the postsynaptic neuron to threshold and spiking activity is governed by fluctuations in the system (Gerstein and Mandelbrot, 1964; Kuhn, 2004). To induce postsynaptic firing in such a system it is necessary for the variable synaptic drive to exhibit coincidences; this occurs most regularly when that drive is correlated.

Correlations in neurotransmitter release between different sites can arise from two sources: from multiple contacts onto a postsynaptic neuron from the same presynaptic cell and from synchronous activity across the presynaptic population. The number of sites between a pair of neurons is fixed over short timescales, unlike the number of vesicles ready to release from the sites, but can vary widely over longer periods (Loebel et al., 2013) following potentiation or depression. Connections between neurons potentiate and depress in the long term chiefly through changes in this synaptic parameter—the number of independent release sites can be seen as a fundamental unit of memory. Synchronous firing in the presynaptic population emerges from the connectivity of neuronal networks (Aertsen et al., 1989) and has relevance for encoding sensory information (von der Malsburg, 1981; deCharms and Merzenich, 1996; Averbeck et al., 2006), motor control (Baker et al., 2001; Capaday, 2013) and decision making (Cohen and Newsome, 2008; Cain and Shea-Brown, 2013). Recent work suggests that modulation of correlations can be more significant for neuronal coding than alterations in the presynaptic firing rate (Seriès et al., 2004; Mitchell et al., 2009; Cohen and Kohn, 2011). Population synchronization is a transient phenomenon relative to the structural changes underlying long-term plasticity.

A detailed stochastic model of neurotransmitter dynamics at the presynaptic terminal is required to analyze the effects of presynaptic synchrony, particularly when long-term plasticity varies the structure of synapses through altering the number of release sites. It can be noted that multiple contacts between cells and transient correlations within a presynaptic population are likely to introduce considerable redundancy in the usage of vesicles: correlated events may lead to EPSPs many times larger than that required to reach threshold. However, evidence points to the nervous system operating under constraints of efficient metabolism (Levy and Baxter, 2002; Taschenberger et al., 2002; Savtchenko et al., 2012) suggesting such wastage would not commonly arise *in vivo*. It is therefore of interest to examine the effect on the postsynaptic cell of the interaction of partially synchronized afferent drive with multiple contacts per presynaptic cell. To this end, we analyze a model of a postsynaptic cell receiving input from a population of release sites distributed between different numbers of presynaptic neurons and with different levels of synchrony.

Following the basic model definitions, we first derive exact forms for the crosscorrelations of vesicle occupancies and release at multiple contacts from the same and different presynaptic cells. These correlations were previously derived by Rosenbaum et al. (2012) using a diffusion and additive-noise approximation, and our results show that this earlier method gave exact results for these quantities. We then go on to calculate the exact voltage mean and variance and, through comparison with the typical EPSP amplitude, argue that synaptic noise can become significantly non-Gaussian. We then derive two approximate limiting forms for the firing rate for low and high correlations and demonstrate that the postsynaptic response is optimal at intermediate levels of afferent correlations. We finally show that this effect is robust for neurons in which there is some level of synaptic homeostasis or soft limit on the total number of release sites.

## **2. METHODS**

We consider a population of *N* presynaptic neurons synapsing onto a single postsynaptic neuron. A presynaptic neuron makes synapses with *n* vesicle occupancy sites from each of which neurotransmitter may be independently released with a probability *p* on the arrival of a presynaptic action potential, occurring at a constant Poissonian rate *Ra*. In between presynaptic action potentials, empty release sites are restocked independently at a constant Poissonian rate *Rr*. Initially, we consider that the total number of release sites onto the postsynaptic cell is fixed at *M* = *nN* (example configurations are provided in **Figures 1A–C**). The number of independent release sites *n* was recently shown (Loebel et al., 2013) to be the synaptic parameter most closely correlated with the structural changes arising from long-term plasticity and so we will consider the effects of varying *n* (while initially keeping *M* constant) on the postsynaptic response. The binary variable *x* will be used to signify vesicle release-site occupancy: *x* = 1 if present or *x* = 0 if absent. The evolution of vesicle occupancy is given by the stochastic differential equation

$$\frac{d\mathbf{x}}{dt} = (1 - \mathbf{x}) \sum\_{m} \delta(t - t\_m) - \sum\_{k} \varrho\_k(\mathbf{x}) \delta(t - t\_k) \tag{1}$$

where *m* counts the restock events occurring at a rate *Rr* and *k* counts the presynaptic action potentials occurring at a rate *Ra*. The binary random variable ϱ<sub>*k*</sub>(*x*) signifies whether a release was successful at the *k*th action potential: if *x* = 1 then ϱ<sub>*k*</sub>(*x*) = 1 with probability *p* to model a successful release of neurotransmitter, and is 0 otherwise to model a failed release from a stocked site; if *x* = 0 then no release is possible and ϱ<sub>*k*</sub>(*x*) = 0. The δs are

**featuring** *n* **independent release sites onto a single postsynaptic cell. (A)** The stochastic dynamics are illustrated from left to right: if a vesicle is present it is released (with probability *p*) when an action potential arrives (Poissonian rate *Ra*); an empty release site; and restock of an empty release site (Poissonian rate *Rr* ). **(B,C)** Examples with *M* = *nN* = 9 with **(B)** *n* = 1, *N* = 9 and **(C)** *n* = 3, *N* = 3 contacts and presynaptic neurons, respectively. **(D)** Example spike trains for *M* = *N* = 6 correlated presynaptic neurons that feature *S* = 3 synchronous spikes.

Dirac delta functions and whenever a delta function multiplies a dynamic variable it is assumed that the value of the variable used is that immediately before the delta event occurs. In other words, the equations are non-anticipating and should be interpreted in an Itō sense (Gardiner, 2010).

#### **2.1. CORRELATIONS FROM STRUCTURE**

When a presynaptic neuron spikes, available vesicles at each of the *n* sites release their contents independently with probability *p*, and so the total number of release events is binomially distributed. Note that because these sites receive the same incoming action potentials, correlations will arise despite the independent conditional release and restock events at each site. Globally, we first hold the total number of release sites, given by *M* = *nN*, constant so that the postsynaptic neuron receives a fixed overall excitatory drive. In this study we set *M* = 5000, which is of the order of estimates by O'Kusky and Colonnier (1982), Megías et al. (2001), and Spruston (2008). This has the effect of maintaining the overall level of excitatory drive to the postsynaptic cell and in biological terms can be seen as a constraint of metabolic efficiency across the presynaptic population: as some contacts potentiate, others die out. The effects of relaxing this condition are discussed later. Recent analysis of long-term plasticity data has shown that changes in EPSP amplitude are captured by models in which the number of independent release sites *n* increases or decreases. Depending on the protocol, *n* can potentiate or depress by a factor of 5 or more (Loebel et al., 2013); a typical range for *n* is 5–50. However, contacts with a binomial *n* as low as 1 or as high as 100 sites have also been observed. Though the upper bound is biologically implausible, for completeness we vary *n* between 1 and 5000 in simulations.

#### **2.2. CORRELATIONS FROM PRESYNAPTIC SYNCHRONY**

The population of neurons driving a common target often displays substantial synchrony in spiking activity (Salinas and Sejnowski, 2000; Averbeck et al., 2006; Cohen and Kohn, 2011) (see **Figure 1D**). Here we model correlations in the presynaptic population by using a variation of the Multiple Interaction Process (MIP) introduced in Kuhn et al. (2003). We implement the process by considering a master spike train with a constant Poissonian rate *NRa*/*S*. For each spike in the master train we pick *S* of the presynaptic neurons at random and assign a synchronous spike in their trains. If *S* = 1 this would imply no correlations in the presynaptic population and *S* = *N* would be a fully synchronous presynaptic population. Note that the spiking of each presynaptic neuron is Poissonian at rate *Ra* as required and also that, given that one presynaptic neuron spikes, the probability that a particular other presynaptic neuron has a spike at the same time is *c* = (*S* − 1)/(*N* − 1). In reality, shared spikes will not be entirely synchronous and so in later simulations (specifically, those leading to **Figures 6**, **7**) we add independent, normally distributed jitter to the spike times with mean 0 and standard deviation τ*<sup>j</sup>* following de la Rocha and Parga (2005) and Cohen and Kohn (2011). Note that in **Figures 5**, **6A,B**, **7** the curves are truncated for increasing *n* because, for fixed *S* and fixed *M* = *nN*, it is invalid to have *S* greater than *N*. This is also the case for **Figures 6B,C** with increasing *S*.
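The MIP construction above can be sketched in a few lines (an illustrative implementation; function and variable names are ours):

```python
import numpy as np

def mip_trains(N, S, Ra, T, jitter=0.0, seed=0):
    """Multiple Interaction Process (after Kuhn et al., 2003): a master
    Poisson train at rate N*Ra/S; each master spike is copied to S
    randomly chosen presynaptic neurons, optionally with Gaussian jitter."""
    rng = np.random.default_rng(seed)
    n_master = rng.poisson(N * Ra * T / S)
    master = rng.uniform(0.0, T, n_master)
    trains = [[] for _ in range(N)]
    for t in master:
        for i in rng.choice(N, size=S, replace=False):
            trains[i].append(t + jitter * rng.standard_normal())
    return [np.sort(tr) for tr in trains]

# Each neuron fires at rate Ra, and a given pair of neurons shares a
# spike with probability c = (S - 1)/(N - 1)
trains = mip_trains(N=20, S=5, Ra=10.0, T=500.0)
rates = np.array([len(tr) for tr in trains]) / 500.0
```

Each train is marginally Poissonian at rate *Ra*, as required, while pairs of trains share the prescribed fraction of spikes.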

## **2.3. POSTSYNAPTIC VOLTAGE**

We treat the postsynaptic neuron as a leaky integrate-and-fire model with each neurotransmitter release event causing the voltage to jump by an amount *a*. The membrane voltage *V* has a resting value *E* and a spike threshold *Vth*. After a spike, *V* is reset to *E* and held there for a time τ*<sup>r</sup>* to model the refractory period. If *N* presynaptic neurons each have *n* neurotransmitter release sites then the postsynaptic voltage is governed by

$$\tau\frac{dV}{dt} = E - V + a\tau\sum\_{i=1}^{N}\sum\_{j=1}^{n}\sum\_{k} \varrho\_{k}^{ij}(x\_{ij})\,\delta(t - t\_{k}^{i})\tag{2}$$

where τ is the membrane time constant, *x<sub>ij</sub>* is the occupancy variable for release site *j* of the *i*th presynaptic neuron, and *k* labels the order of incoming action potentials to the release site with occupancy *x<sub>ij</sub>*. Note that the spike times *t<sub>k</sub><sup>i</sup>* are identical for all release sites of the same presynaptic neuron *i* and that some of the spike times will be common to release sites with distinct presynaptic neurons, depending on the level of synchrony given by the correlated MIP process parameterized by *S*. The values of other parameters used in simulations (unless otherwise stated) are given in **Table 1**.

## **3. RESULTS**

We first derive exact forms for the crosscorrelations of vesicle-occupancy and of neurotransmitter-release time series. The latter can then be used to calculate the exact membrane voltage variance. Two approximations of the postsynaptic firing rate then lead us to the main result of the paper: that long-term synaptic plasticity—through its alteration of the synaptic parameter *n*—sets the optimal postsynaptic response to a presynaptic population with correlated firing. Throughout this section the notation ⟨φ⟩ denotes the steady-state expectation of the fluctuating quantity φ.

For the calculation of the crosscorrelations of objects separated by a time *T*, it is useful to consider how the steady-state expectation of the product of the occupancy *x* with some quantity ψ evaluated at an earlier time evolves with the separation time:

$$\frac{d}{dT} \left< \mathbf{x}(T)\boldsymbol{\psi}(0) \right> = \left< (1 - \mathbf{x}(T))\boldsymbol{\psi}(0) \right> R\_r - \left< \mathbf{x}(T)\boldsymbol{\psi}(0) \right> \boldsymbol{p} R\_a \tag{3}$$

where the first term on the right-hand side is the rate that an empty site is filled and the second term is the rate that a full site releases its contents. This equation can be rearranged into the form

$$\tau\_x \frac{d}{dT} \left< x(T)\psi(0) \right> = \left< x \right> \left< \psi \right> - \left< x(T)\psi(0) \right> \tag{4}$$

where the time constant τ<sub>*x*</sub> and steady-state occupancy ⟨*x*⟩ are

$$\tau\_x = \frac{1}{R\_r + pR\_a} \quad \text{and} \quad \langle x \rangle = \frac{R\_r}{R\_r + pR\_a}. \tag{5}$$

That the second quantity must be the steady-state occupancy ⟨*x*⟩ can be inferred by noting that in the limit *T* → ∞ the expectation ⟨*x*(*T*)ψ(0)⟩ in Equation (3) loses its *T* dependence and factorises into the product ⟨*x*⟩⟨ψ⟩. Note that the exponential solution to the differential Equation (4) implies that all crosscorrelations that include the occupancy *x* take a simple exponential form

$$\text{Crosscorr}(x, \psi) = (\langle x \psi \rangle - \langle x \rangle \langle \psi \rangle)\, e^{-T/\tau\_x} \tag{6}$$

where ⟨*x*ψ⟩ is the expectation evaluated in the limit *T* → 0.
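The steady-state occupancy in Equation (5) can be confirmed by a direct Gillespie-style simulation of a single release site (a sketch with arbitrary parameter values):

```python
import numpy as np

def mean_occupancy(Rr, Ra, p, T=20000.0, seed=1):
    """Gillespie simulation of one release site: an empty site restocks
    at rate Rr; a stocked site empties at rate p*Ra (each presynaptic
    spike releases the vesicle with probability p)."""
    rng = np.random.default_rng(seed)
    t, x, occupied = 0.0, 1, 0.0
    while t < T:
        rate = Rr if x == 0 else p * Ra
        dt = rng.exponential(1.0 / rate)
        if x == 1:
            occupied += min(dt, T - t)
        t += dt
        x = 1 - x              # the next event flips the occupancy
    return occupied / T

Rr, Ra, p = 1.0, 5.0, 0.5
estimate = mean_occupancy(Rr, Ra, p)
predicted = Rr / (Rr + p * Ra)   # Equation (5): <x> = 2/7 here
```

Failed releases leave the state unchanged, so the stocked-to-empty transitions form a thinned Poisson process of rate *pRa*, which is what the simulation exploits.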

#### **3.1. VESICLE OCCUPANCY CROSSCORRELATIONS**

The autocorrelation of release-site occupancy can be calculated by making use of the fact that for the binary variable *x* we have *x*<sup>2</sup> = *x* and so ⟨*x*<sup>2</sup>⟩ = ⟨*x*⟩. Putting ψ = *x* in Equation (6) gives

$$\text{Autocorr}(x) = \langle x \rangle (1 - \langle x \rangle)\, e^{-|T|/\tau\_x} = \frac{pR\_aR\_r}{(R\_r + pR\_a)^2}\, e^{-|T|/\tau\_x} \tag{7}$$

where the extension of the exponential to negative times comes from a symmetry argument. For the crosscorrelation between different release sites, with occupancy variables *x* and *x*′, we need to distinguish between cases where the release sites either share the same presynaptic neuron or have different presynaptic neurons when deriving ⟨*xx*′⟩. However, the derivation can be written in the same form by introducing a quantity γ that is the proportion of shared spikes: γ = 1 for release sites with the same presynaptic neuron or γ = *c* = (*S* − 1)/(*N* − 1) for different presynaptic neurons. A steady-state equation for the zero-time expectation ⟨*xx*′⟩ can be found by considering the state where both sites are occupied and balancing the total rates into and out of this state

$$\left\langle x (1 - x') \right\rangle R\_r + \left\langle (1 - x) x' \right\rangle R\_r = \left\langle xx' \right\rangle (2R\_a p - \gamma R\_a p^2). \tag{8}$$

The terms on the left-hand side represent the total rate into the double occupancy state, whereas the terms on the right-hand side multiplying the expectation are the combined rates of individual vesicle release minus the coincidence term to prevent overcounting of events. We now combine terms to obtain the required expectation

$$\left\langle xx' \right\rangle\_{\gamma} = \frac{2R\_r \left\langle x \right\rangle}{2R\_r + R\_a p(2-\gamma p)} \tag{9}$$

where the γ subscript will be used later to distinguish the different cases. It can be inserted into Equation (6) with ψ = *x*′ to give

$$\text{Crosscorr}(x, x') = \frac{\gamma p^2 R\_a R\_r^2 \, e^{-|T|/\tau\_{x}}}{(2R\_r + pR\_a(2 - p\gamma))(R\_r + pR\_a)^2} \,. \tag{10}$$

Example plots of Equation (7), and of Equation (10) for the cases γ = 1 and γ = *c*, are given in **Figures 2A,C,E**. It is interesting to note that our exact results are identical to those previously calculated in Rosenbaum et al. (2012) using a combined diffusion and additive-noise approximation, validating their method up to second-order statistics.
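As an internal consistency check, the zero-lag amplitude of the crosscorrelation in Equation (10) must equal the covariance ⟨*xx*′⟩<sub>γ</sub> − ⟨*x*⟩<sup>2</sup> formed from Equations (5) and (9). A short Python check with arbitrary example rates:

```python
R_r, R_a, p = 2.0, 10.0, 0.5

for gamma in (1.0, 0.3):                      # same-neuron and different-neuron cases
    mean_x = R_r / (R_r + p * R_a)            # Equation (5)
    xx = 2 * R_r * mean_x / (2 * R_r + R_a * p * (2 - gamma * p))   # Equation (9)
    # zero-lag prefactor of Equation (10)
    amp = (gamma * p**2 * R_a * R_r**2
           / ((2 * R_r + p * R_a * (2 - gamma * p)) * (R_r + p * R_a)**2))
    assert abs((xx - mean_x**2) - amp) < 1e-12

print("Equations (9) and (10) are mutually consistent")
```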

#### **3.2. NEUROTRANSMITTER RELEASE CROSSCORRELATIONS**

Though synchrony in the presynaptic population leads to positive correlations for release-site occupancy, we now show that the delayed restock following release leads to negative cross-correlations in the release events themselves. Let χ(*t*) and χ′(*t*) be trains of delta pulses representing neurotransmitter release from sites with occupancies defined by *x*(*t*) and *x*′(*t*), respectively, so that:

$$\chi(t) = \sum\_{k} \varrho\_{k}(x)\, \delta(t - t\_{k}) \tag{11}$$

where *k* counts incoming action potentials at the contact with site occupancy *x*. In the steady state we have ⟨χ⟩ = *pRa*⟨*x*⟩ because the rate of release is equal to the release rate *pRa* given vesicle occupancy multiplied by the occupancy probability ⟨*x*⟩. The auto and crosscorrelations can be straightforwardly calculated using the general result of Equation (6) by setting ψ = χ and noting that ⟨χ(*T*)χ′(0)⟩ = *pRa*⟨*x*(*T*)χ′(0)⟩. However, some care needs to be taken when considering the case *T* = 0. The result of Equation (6) is valid in the limit *T* → 0; but there is an additional delta function in the crosscorrelation when *T* = 0 with an amplitude equal to the rate of simultaneous events in χ and χ′ that arises from the delta functions in Equation (11). The autocorrelation

function for χ therefore takes the form

$$\text{Autocorr}(\chi) = pR\_a \langle x \rangle \delta(T) - (pR\_a \langle x \rangle)^2 e^{-|T|/\tau\_{x}} \tag{12}$$

where the rate of simultaneous events for the autocorrelation is just the mean release rate *pRa*⟨*x*⟩ and the prefactor of the exponential is only −⟨χ⟩<sup>2</sup> because in the limit *T* → 0 the expectation of χ(*T*)χ(0) is zero as there is no time for a restock. A similar consideration gives the result for the crosscorrelation

$$\begin{split} \text{Crosscorr}(\chi, \chi') &= \gamma p^2 R\_a \left\langle xx' \right\rangle\_{\gamma} \delta(T) \\ &\quad + R\_a^2 p^2 \left( (1 - \gamma p) \left\langle xx' \right\rangle\_{\gamma} - \langle x \rangle^2 \right) e^{-|T|/\tau\_{x}} \end{split} \tag{13}$$

where we are treating cases for which the release is from distinct contacts sharing the same presynaptic neuron (γ = 1) or from distinct presynaptic neurons (γ = *c*). In Equation (13) the prefactor of the delta function arises from the rate of simultaneous releases, which is equal to the rate of arrival of simultaneous spikes γ*Ra* multiplied by the probability that each contact releases a vesicle *p*<sup>2</sup>⟨*xx*′⟩<sub>γ</sub>. The prefactor of the exponential shares the same squared component −⟨χ⟩<sup>2</sup> = −(*pRa*⟨*x*⟩)<sup>2</sup> as the autocorrelation, but also has a non-zero contribution from ⟨χ(*T*)χ′(0)⟩ in the limit *T* → 0. This quantity is equal to the probability that both sites are occupied, ⟨*xx*′⟩<sub>γ</sub>, multiplied by the rate of a release from site *x*′ but no release from site *x* at a simultaneous presynaptic event, *Ra p*(1 − γ*p*), multiplied by the probability of a subsequent release from site *x* just afterwards due to a second presynaptic spike, *pRa*. This exact result is again identical to that derived previously using a diffusion and additive-noise approximation (Rosenbaum et al., 2012). Example autocorrelation and crosscorrelation functions are plotted in **Figures 2B,D,F**.

#### **3.3. MEMBRANE VOLTAGE MEAN AND VARIANCE**

The tonic component of the presynaptic drive can be characterized by the mean voltage, which is straightforward to calculate in the absence of a threshold. The dynamics of this quantity can be found by taking the expectation of Equation (2) to yield the steady-state result

$$\langle V \rangle = E + aM\tau pR\_a \langle x \rangle = E + \frac{aM\tau pR\_aR\_r}{R\_r + pR\_a}. \tag{14}$$

Note that the mean voltage is independent of the synchrony *S* and is also independent of release-site number *n* when *M* = *nN* is held fixed.

The effect of correlated synaptic fluctuations on the postsynaptic neuron can also be characterized by deriving the steady-state variance of the postsynaptic voltage (again in the absence of a threshold-reset mechanism). This quantity is derived in the Appendix using the auto and crosscorrelations of χ (Equations 12, 13) and takes the form

$$\begin{split} \text{Var}(V) &= \frac{a^2 \tau Nn p R\_a}{2} \left( \langle x \rangle + (n-1)p \left\langle xx' \right\rangle\_1 + (N-1)ncp \left\langle xx' \right\rangle\_c \right) \\ &\quad + \frac{Nn(a\tau p R\_a)^2}{1 + \tau(R\_r + pR\_a)} \left( (n-1)(1-p) \left\langle xx' \right\rangle\_1 \right. \\ &\quad \left. + (N-1)n(1-cp) \left\langle xx' \right\rangle\_c - Nn \langle x \rangle^2 \right). \end{split} \tag{15}$$

The first term arises from the δ-functions in Equations (12, 13) and the second term comes from the negative correlations in vesicle release due to short-term depression (the terms featuring exponentials in the same equations). For a related model (de la Rocha and Parga, 2005) it was demonstrated that on increasing the presynaptic rate a maximum can be seen in the conductance fluctuations. The exact result of Equation (15) allows for this effect of fluctuations in depressing synapses on the voltage itself to be analyzed. Example variances as a function of presynaptic rate are shown in **Figure 3** and, as expected from the previous analysis of conductance fluctuations (de la Rocha and Parga, 2005), the variance also shows a maximum at intermediate presynaptic rates.
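Equation (15) is straightforward to evaluate numerically. A useful sanity check is the limit of no depression (*Rr* → ∞) with *n* = 1 and *S* = 1, where the drive becomes uncorrelated Poisson shot noise filtered by the membrane and the variance must reduce to Campbell's theorem, Var(*V*) = *NpRa a*<sup>2</sup>τ/2. A Python sketch with illustrative parameter values:

```python
def occ_mean(R_r, R_a, p):                        # Equation (5)
    return R_r / (R_r + p * R_a)

def occ_pair(R_r, R_a, p, gamma):                 # Equation (9)
    return 2 * R_r * occ_mean(R_r, R_a, p) / (2 * R_r + R_a * p * (2 - gamma * p))

def var_V(a, tau, N, n, S, p, R_r, R_a):          # Equation (15)
    c = (S - 1) / (N - 1)
    x = occ_mean(R_r, R_a, p)
    x1 = occ_pair(R_r, R_a, p, 1.0)
    xc = occ_pair(R_r, R_a, p, c)
    term1 = (a**2 * tau * N * n * p * R_a / 2) * (
        x + (n - 1) * p * x1 + (N - 1) * n * c * p * xc)
    term2 = (N * n * (a * tau * p * R_a)**2 / (1 + tau * (R_r + p * R_a))) * (
        (n - 1) * (1 - p) * x1 + (N - 1) * n * (1 - c * p) * xc - N * n * x**2)
    return term1 + term2

a, tau, N, p, R_a = 0.1, 0.02, 100, 0.5, 10.0
campbell = N * p * R_a * a**2 * tau / 2           # Campbell's theorem for shot noise
no_depression = var_V(a, tau, N, n=1, S=1, p=p, R_r=1e9, R_a=R_a)
assert abs(no_depression - campbell) < 1e-6 * campbell
```

With a finite restock rate the same function reproduces the depression-limited fluctuations discussed above.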

Though the voltage variance measures one aspect of presynaptic fluctuations, it misses the increasingly shot-noise-like nature of the drive as the correlations increase. Shot noise causes a non-Gaussian component in the tails of the membrane voltage distribution that, because it extends to the region of action-potential initiation, can significantly affect the post-synaptic firing rate (Richardson and Swarbrick, 2010). The mean EPSP amplitude can be used to see this effect: it is proportional to the mean number of vesicles released by a spike given the occupancy levels already computed, and so

$$\langle \text{EPSP} \rangle = apnS \langle x \rangle = \frac{apnSR\_r}{R\_r + pR\_a}. \tag{16}$$

As correlations from increasing *n* or *S* become stronger, the mean EPSP amplitude increases. However, as noted above, the mean voltage (Equation 14) does not change under increasing *n* or *S*. Taken together, the implications are that in the limit of high correlations the synaptic drive becomes temporally sparse with large amplitude EPSPs generated from correlated events. This effect can be seen in simulations of the model with different parameter regimes (**Figure 4**). For parameters *N* = 125, *n* = 1, and *S* = 1 (no presynaptic synchrony) the presynaptic spikes (**Figure 4A**) and neurotransmitter release (**Figure 4D**) are uncorrelated, and in the full system with *M* = 5000 the EPSPs are relatively small (**Figures 4G,H**) and the resulting voltage distribution is close to Gaussian (**Figure 4I**). Increasing *n* (**Figure 4B**) or *S* (**Figure 4C**) to 25 leads to correlations in neurotransmitter release (**Figures 4E,F**), larger EPSPs (**Figures 4J,K,M,N**) and a more variable and skewed membrane voltage (**Figures 4L,O**). Note the right-hand tails from the skewed membrane voltages under conditions of presynaptic correlation that extend toward voltages where action potentials would be initiated.

#### **3.4. RELEASE SITE NUMBER AND POSTSYNAPTIC RATE**

As the analyses of the previous section and examples in **Figure 4** demonstrate, for the case of few release sites and low synchrony the voltage distribution is close to Gaussian. However, for the case of many release sites the synchronous release events generate large EPSPs that are reminiscent of shot noise. With this in mind, approximations for the firing of the postsynaptic cell may be found for the cases of low *n*, when the voltage distribution is roughly Gaussian, and high *n*, for which the EPSP amplitudes are of the order of, or larger than, the threshold.

### *3.4.1. Few release sites*

For the low *n* approximation we rely on a recent observation (Alijani and Richardson, 2011) that the firing rate of integrate-and-fire neurons is relatively insensitive to temporal correlations as long as the subthreshold voltage mean and variance are matched. To this end we approximate the firing rate of the neuron by a white-noise equivalent that has a voltage mean μ equal to that of Equation (14) and a variance σ<sup>2</sup> equal to that of Equation (15). The firing rate of a leaky integrate-and-fire neuron with these parameters is given (Brunel and Hakim, 1999) by the reciprocal of

$$\tau \int\_0^\infty \frac{dz}{z} e^{-z^2/2} \left( e^{z z\_{\rm th}} - e^{z z\_{\rm re}} \right) \tag{17}$$

where *zth* = (*Vth* − *E* − μ)/σ and in this case *zre* = −μ/σ.
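The integral in Equation (17) is easy to evaluate by simple quadrature, since the integrand has a finite *z* → 0 limit (it tends to *zth* − *zre*). As a check, the same mean interspike interval can be computed from the equivalent Ricciardi error-function form, τ√π ∫ e<sup>u²</sup>(1 + erf *u*) d*u* taken between *zre*/√2 and *zth*/√2, a standard result for the leaky integrate-and-fire first-passage time. A Python sketch (midpoint rule, example threshold values):

```python
import math

def isi_bh(z_th, z_re, tau=1.0, steps=100000, z_max=20.0):
    """Mean ISI from Equation (17):
    tau * int_0^inf dz/z e^{-z^2/2} (e^{z z_th} - e^{z z_re})."""
    h = z_max / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h        # midpoint rule avoids the z = 0 endpoint
        total += math.exp(-0.5 * z * z) * (math.exp(z * z_th) - math.exp(z * z_re)) / z
    return tau * h * total

def isi_ricciardi(z_th, z_re, tau=1.0, steps=100000):
    """Equivalent form: tau*sqrt(pi) * int_{z_re/sqrt2}^{z_th/sqrt2} e^{u^2}(1+erf u) du."""
    lo, hi = z_re / math.sqrt(2), z_th / math.sqrt(2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += math.exp(u * u) * (1.0 + math.erf(u))
    return tau * math.sqrt(math.pi) * h * total

T1 = isi_bh(1.5, -0.5)
T2 = isi_ricciardi(1.5, -0.5)
assert abs(T1 - T2) / T2 < 1e-3
```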

#### *3.4.2. Many release sites*

For sufficiently large *n* the mean EPSP is greater than that required to bring the neuron to threshold, *apnS*⟨*x*⟩ > *Vth* − *E*, and so each synchronous presynaptic event is likely to cause the postsynaptic cell to spike. The postsynaptic cell receives synchronous input events at a total rate of *NRa*/*S* and so we can approximate the rate in the large *n* case by

$$r \sim \frac{NR\_a}{S} = \frac{MR\_a}{nS}. \tag{18}$$

Therefore, increasing the presynaptic synchrony *S* will reduce the postsynaptic response when *n* is large.

#### *3.4.3. Optimal release-site number*

Under conditions of a fixed number of release sites onto the postsynaptic cell *M* = *nN*, increasing *n* has no effect on the voltage mean (Equation 14), but increases the voltage variance (Equation 15). Therefore, as *n* increases from an initially small value, the approximation given by Equation (17) predicts that the postsynaptic cell will fire at an increasing rate. However, from Equation (18), which is valid for high *n*, we see that the postsynaptic firing rate decreases as *n* increases. Hence, there must be an intermediate *n* for which the response of the postsynaptic cell is optimized. This effect can be clearly seen in the examples given in **Figure 5** in which the postsynaptic rate is plotted as a function of *n* for fixed *M*. The intersections of the two approximations for

each curve provide an estimate for the optimal *n*, which decreases as the presynaptic synchrony increases. It should be noted that this effect, a maximum as a function of release-site number at constant presynaptic rate, is a phenomenon distinct from the tuning curve as a function of presynaptic rate analyzed in de la Rocha and Parga (2005).
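The two regimes can be made concrete numerically: with *M* = *nN* held fixed, the voltage variance of Equation (15) grows with *n* (pushing the Gaussian-regime rate up at low *n*) while the synchronous-event rate of Equation (18) falls with *n*, so the two approximations must cross at an intermediate *n*. A self-contained Python sketch (helper functions implement Equations 5, 9 and 15; all parameter values are illustrative):

```python
def occ_mean(R_r, R_a, p):                        # Equation (5)
    return R_r / (R_r + p * R_a)

def occ_pair(R_r, R_a, p, gamma):                 # Equation (9)
    return 2 * R_r * occ_mean(R_r, R_a, p) / (2 * R_r + R_a * p * (2 - gamma * p))

def var_V(a, tau, N, n, S, p, R_r, R_a):          # Equation (15)
    c = (S - 1) / (N - 1)
    x = occ_mean(R_r, R_a, p)
    x1 = occ_pair(R_r, R_a, p, 1.0)
    xc = occ_pair(R_r, R_a, p, c)
    term1 = (a**2 * tau * N * n * p * R_a / 2) * (
        x + (n - 1) * p * x1 + (N - 1) * n * c * p * xc)
    term2 = (N * n * (a * tau * p * R_a)**2 / (1 + tau * (R_r + p * R_a))) * (
        (n - 1) * (1 - p) * x1 + (N - 1) * n * (1 - c * p) * xc - N * n * x**2)
    return term1 + term2

M, S, p, R_r, R_a, a, tau = 5000, 10, 0.5, 2.0, 10.0, 0.1, 0.02
ns = (1, 25, 125)
variances = [var_V(a, tau, M // n, n, S, p, R_r, R_a) for n in ns]
rates_high_n = [M * R_a / (n * S) for n in ns]     # Equation (18)

# variance rises with n at fixed M...
assert variances[0] < variances[1] < variances[2]
# ...while the synchronous-event rate bound falls, so the curves must cross
assert rates_high_n[0] > rates_high_n[1] > rates_high_n[2]
```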

#### **3.5. LONG-TERM PLASTICITY AND RESPONSE TO SYNCHRONY**

The post-synaptic firing rate is sensitive to correlations arising from multiple release sites, as discussed above, as well as to presynaptic synchrony (de la Rocha and Parga, 2005). In particular, the firing rate has a maximal response at an optimal *n* that is a function of the presynaptic synchrony as can be seen in **Figure 6**. When neurotransmitter release is too strongly correlated in the presynaptic population, the postsynaptic response weakens because the quantity of neurotransmitter released is in excess of that necessary to take the postsynaptic cell to threshold and therefore this limited resource is wasted. The reduction in response to over-strong correlations gives rise to the optimal responses in the space of *n* and *S* seen in **Figures 6A–C**. Note that the band of optimal postsynaptic response is linear with negative gradient in the *n*, *S* log–log plot and so the optimal synchrony in the presynaptic population has an inverse relation to the number of release sites *n* each presynaptic cell makes onto the postsynaptic target.

Analyses of long-term plasticity data (over a 12 h period) by Loebel et al. (2013) demonstrated that connections between thick-tufted layer-5 pyramidal cells in the rat somatosensory cortex alter their efficacy by changing the binomial parameter *n*, in preference to probability of release or quantal amplitude. Among the experiments analyzed certain connections potentiated fourfold, from an effective binomial *n* of ∼25 to ∼100. Assuming that the mean excitatory drive remains constant, this potentiation would lead to the postsynaptic cell becoming maximally responsive to signals encoded by weaker presynaptic synchrony (see **Figure 6C**). It would also cease to amplify strongly correlated stimuli as effectively. Other connections showed four-fold reductions in *n* from ∼40 to ∼10 under protocols that cause long-term depression. In this case the postsynaptic cell would now act as a better amplifier of stimuli encoded with larger correlations.

#### **3.6. OPTIMAL RESPONSE AND SYNCHRONY JITTER**

The effects of fluctuations in a synchronous presynaptic population can be modeled by adding a Gaussian-distributed jitter, of timescale τ*j*, to the timing of each action potential. When the individual components of the synchronous MIP event are too dispersed temporally, i.e., when the jitter is greater than the membrane time constant τ*<sup>j</sup>* > τ, the MIP event will fail to integrate in the postsynaptic cell. Under these circumstances the effect of correlations is diminished, as illustrated in **Figure 7**. When jitter is absent (**Figure 7A**), different values of presynaptic synchrony *S* produce distinct and clearly defined optimal response curves. With a physiological jitter timescale of τ*<sup>j</sup>* = 2 ms (**Figure 7B**) the curves for different synchronies shift upwards in *n* and the peak postsynaptic firing rate falls, particularly for larger synchrony. When τ*<sup>j</sup>* = τ (**Figure 7C**) only relatively strong synchrony values are significantly different from the independent case (*S* = 1).

#### **3.7. OPTIMAL-RESPONSE CURVES ARE A ROBUST FEATURE OF SYNAPTIC HOMEOSTASIS**

Throughout much of the above analysis we held the total number of release sites *M* = *nN* constant and demonstrated an optimal response curve in which the postsynaptic rate peaks at an intermediate *n*, which is dependent on the presynaptic synchrony *S*. The rationale for this choice is that, under conditions of homeostasis, synaptic potentiation (increasing *n*) amongst a subpopulation of presynaptic neurons will occur at the expense of pruning neurons that do not contribute to postsynaptic firing. This will lead to the postsynaptic neuron receiving afferent drive from fewer presynaptic neurons, though each of these will make more contacts (and vice-versa for long-term depression). The theoretical results and simulations are not predicated on the assumption of constant *M* and so it is interesting to investigate whether the optimal-response effect persists if this restriction is relaxed. Using the example *S* = 10 we plotted the postsynaptic rate as a function of the presynaptic neuron number *N* and release-site number *n* (see **Figure 8A**). As expected the postsynaptic rate increases with an increasing number of presynaptic neurons *N* or release sites *n*. Plotted on the same figure is the curve *N* = *M*/*n* with *M* = 5000 which, because of its reciprocal relation, has low rates at either asymptote and an intermediate maximum (see **Figure 8B**). Also plotted is the curve *N* = *M*<sub>0</sub> where *M*<sub>0</sub> is a constant. This corresponds to a scenario in which the entire presynaptic population

**FIGURE 7 |** Postsynaptic rate as a function of the number of release sites per presynaptic neuron *n* for increasing jitter standard deviations τ*j*. **(A)** No jitter τ*<sup>j</sup>* = 0. **(B)** Physiological levels of jitter τ*<sup>j</sup>* = 2 ms. **(C)** Response curves converge on the unsynchronized *S* = 1 case, as expected, when jitter is of the order of the postsynaptic membrane time constant τ*<sup>j</sup>* = 10 ms.

has either potentiated or depressed their contacts, thereby changing the number of release sites *n* a presynaptic neuron makes without altering the total number of presynaptic neurons *N*. For this case, which is arguably an extremum from the point-of-view of homeostasis, the intermediate maximum is lost: the postsynaptic rate increases monotonically and loses its *n* dependence when *n* is sufficiently large, as expected from the first form of Equation (18). However, for intermediate cases of homeostasis of the form *N* = *M*<sub>κ</sub>/*n*<sup>κ</sup> with κ = 3/4, 1/2, 1/4 a maximal postsynaptic rate again occurs at some intermediate *n* (see **Figure 8B**). Given the dependence of the postsynaptic rate on *n* and *N* in **Figure 8A** it can be seen geometrically that any curve in which there is a reciprocal relation between *N* and *n* will likely feature a maximum at intermediate *n* and so the optimal-response curves are a robust feature of a postsynaptic neuron in which there is some degree of homeostatic restriction on the total number of afferent contacts.

### **4. DISCUSSION**

We considered the effects of afferent correlations arising from multiple neurotransmitter release sites and a partially synchronized presynaptic population. We derived exact forms for the crosscorrelations of vesicle release site occupancy and vesicle release, and demonstrated that these are identical to those recently obtained from a diffusion and additive-noise approximation (Rosenbaum et al., 2012), validating that approach up

**FIGURE 8 | Curves with a maximal postsynaptic rate at intermediate** *n* **are a robust feature. (A)** Intensity plot of the postsynaptic rate as a function of presynaptic release site *n* and neuron number *N* for an example with *S* = 10. Also plotted are the relations *N* = *M*<sub>κ</sub>/*n*<sup>κ</sup> for κ = 0, 1/4, 1/2, 3/4, 1, where κ = 1 corresponds to the homeostatic scenario principally considered in this paper for which there is a restriction *M* = *nN* on the total number of afferent contacts. The case κ = 0 corresponds to a scenario with no such restriction, and the other values of κ are intermediate cases with varying degrees of homeostasis. **(B)** The postsynaptic rate as function of *n* for the curves in the upper panel. Cases for all values of κ, except κ = 0 in which there is no homeostatic restriction, show a maximal response at intermediate *n*. The example curves given have *M*<sub>κ</sub> chosen so that they all pass through the point *n* = 25 and *N* = 200.

to second-order statistics and explaining the perfect agreement between their theoretical and simulational results. We further calculated the exact variance of the membrane voltage, in the absence of a spike threshold. This quantity extends previous calculations (de la Rocha and Parga, 2005) of synaptic conductance fluctuations and allows for an estimation of the postsynaptic rate in the low-correlation Gaussian regime. For the high-correlation regime, due to multiple release sites *n* or strong synchrony *S*, we argued that the EPSPs become increasingly large, the nature of the synaptic fluctuations increasingly shot-noise like, and so the postsynaptic rate tends to the rate of synchronous presynaptic events. Combining these two results for the low and high correlation regimes, we demonstrated that the postsynaptic response is maximal for an intermediate number of release sites or synchrony. The system therefore exhibits a tuning-curve response to synchrony that can be modulated by long-term plasticity, which alters the number of release sites *n*.

Neurons respond maximally to specific stimuli when processing sensory input. A coordination of long-term plasticity, afferent synchrony and short-term depression therefore provides a potential tuning mechanism for cells to achieve this sensitivity. Efficient responsiveness would then depend on historical changes in synaptic connectivity (Taschenberger et al., 2002; Loebel et al., 2013) and the transient correlations evoked by a particular stimulus (Averbeck et al., 2006; Cohen and Kohn, 2011). More generally, neuronal networks balance fidelity of signal transmission with the metabolic costs associated with neurotransmitter recycling (Levy and Baxter, 2002; Savtchenko et al., 2012). Although a release of neurotransmitter beyond that necessary to induce a postsynaptic spike may have medium-term conductance implications or counteract strongly fluctuating inhibition, an efficient network would not be expected to exceed the degree of pairwise connectivity that maximizes response to common spike frequencies and correlations. On the other hand, signals encoded by small numbers of cells would require highly potentiated connections to transmit information with any degree of consistency. This implies that across a neuronal network the degree of clustering would be optimally balanced with individual synaptic weights.

To investigate maximal firing rate response to a defined excitatory drive, we have neglected the effects of synaptic inhibition. As *in vivo* network behaviors arise from a balance of excitation and inhibition, a development of the ideas presented here along the above lines would need to incorporate inhibitory effects on the total synaptic conductance. By altering the timescales on which excitatory inputs are integrated, inhibitory drive could allow a more finely-tuned response to afferent sub-populations with varying degrees of temporal dispersion. Another extension of this work would be to incorporate different forms of short-term synaptic plasticity into the model. This would be particularly appropriate when studying connections between specific cell types where there is experimental evidence for other forms of synaptic dynamics. It is also likely that effects moderating synaptic depression, such as the increasing facilitation in the maturing neocortex (Reyes and Sakmann, 1999) would lead to qualitatively different behavior as cortical networks develop.

## **FUNDING**

This research was supported by a Warwick Systems Biology Doctoral Training Centre fellowship to Alex D. Bird funded by the UK EPSRC and BBSRC funding agencies.

#### **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 06 August 2013; accepted: 07 January 2014; published online: 30 January 2014.*

*Citation: Bird AD and Richardson MJE (2014) Long-term plasticity determines the postsynaptic response to correlated afferents with multivesicular short-term synaptic depression. Front. Comput. Neurosci. 8:2. doi: 10.3389/fncom.2014.00002*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Bird and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX**

## **DERIVATION OF THE VOLTAGE VARIANCE**

The voltage equation can be written in the form

$$\tau \frac{dV}{dt} = E - V + a\tau\zeta \tag{19}$$

where ζ is the summation of the release trains across the *N* presynaptic neurons and each of their *n* contacts

$$\zeta = \sum\_{i=1}^{N} \sum\_{j=1}^{n} \chi\_{ij} \tag{20}$$

where χ*ij* takes the form of Equation (11) for the *i*th presynaptic neuron's *j*th contact. The autocorrelation of ζ is therefore comprised of *Nn* autocorrelations of χ in the form of Equation (12), *Nn*(*n* − 1) crosscorrelations of χ for distinct release trains sharing the same presynaptic neuron given by Equation (13) with γ = 1 and *N*(*N* − 1)*n*<sup>2</sup> crosscorrelations of χ for release trains with different presynaptic neurons given by Equation (13) with γ = *c*.
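As a cross-check on this enumeration, the three classes of terms together account for every ordered pair of the *Nn* release trains:

```python
# Nn autocorrelation terms, Nn(n-1) same-neuron pairs and N(N-1)n^2
# different-neuron pairs exhaust all ordered pairs of the Nn trains in
# Equation (20).
for N, n in [(2, 3), (10, 4), (125, 25)]:
    assert N*n + N*n*(n - 1) + N*(N - 1)*n**2 == (N * n)**2
```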

Taking expectations of both side of Equation (19) in the steady state gives

$$\langle V \rangle = E + a\tau \left\langle \zeta \right\rangle = E + aM\tau R\_a p \left\langle x \right\rangle \,. \tag{21}$$

We can now solve Equation (19) to give

$$V - \langle V \rangle = a \int\_{-\infty}^{t} dt' \, e^{-(t-t')/\tau} \left( \zeta(t') - \langle \zeta \rangle \right) \tag{22}$$

so that the voltage variance can be written as an integral over the autocorrelation of ζ, Autocorr(ζ) = ⟨(ζ(*t*′) − ⟨ζ⟩)(ζ(*t*″) − ⟨ζ⟩)⟩

$$\left\langle (V - \langle V \rangle)^2 \right\rangle = a^2 \int\_{-\infty}^{t} dt' \int\_{-\infty}^{t} dt'' \, e^{-(t-t')/\tau} e^{-(t-t'')/\tau} \, \text{Autocorr}(\zeta). \tag{23}$$

As discussed above, the autocorrelation of ζ is the sum of the various crosscorrelations of χ so that it must take the form

$$\text{Autocorr}(\zeta) = \alpha\, \delta(t'-t'') + \beta\, e^{-|t'-t''|/\tau\_{x}} \tag{24}$$

where α and β are obtained from the prefactors of the terms in Equations (12, 13) multiplied by their respective contributions. Inserting Equation (24) into (23) and performing the integration gives

$$\text{Var}(V) = a^2 \left( \frac{\alpha \tau}{2} + \frac{\beta \tau^2 \tau\_x}{\tau + \tau\_x} \right). \tag{25}$$

On substituting the appropriate forms for α and β the result given in Equation (15) is obtained.
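The double integral of Equation (23) with the ansatz of Equation (24) reduces, by a standard change of variables, to a single integral over the lag, Var(*V*) = *a*<sup>2</sup>(τ/2)∫ Autocorr(ζ) e<sup>−|Δ|/τ</sup> dΔ. The exponential part of Equation (25) can then be checked numerically (Python, midpoint rule, arbitrary illustrative α and β):

```python
import math

tau, tau_x, a, alpha, beta = 0.02, 0.1, 0.5, 3.0, 7.0

# Delta-function part of Autocorr contributes a^2 * alpha * tau / 2 exactly;
# the exponential part is integrated numerically over Delta >= 0 and doubled.
steps, d_max = 100000, 1.0
h = d_max / steps
num = 0.0
for i in range(steps):
    d = (i + 0.5) * h
    num += beta * math.exp(-d / tau_x) * math.exp(-d / tau)
num = a**2 * (tau / 2) * 2 * h * num + a**2 * alpha * tau / 2

# closed form, Equation (25)
closed = a**2 * (alpha * tau / 2 + beta * tau**2 * tau_x / (tau + tau_x))
assert abs(num - closed) / closed < 1e-4
```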

## Phase synchrony facilitates binding and segmentation of natural images in a coupled neural oscillator network

#### *Holger Finger <sup>1</sup> \* and Peter König1,2*

*<sup>1</sup> Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany*

*<sup>2</sup> Institute of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany*

#### *Edited by:*

*Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany*

#### *Reviewed by:*

*Abdelmalik Moujahid, University of the Basque Country, Spain Andrea K. Barreiro, Southern Methodist University, USA*

#### *\*Correspondence:*

*Holger Finger, Institute of Cognitive Science, University of Osnabrück, Albrechtstraße 28, 49069 Osnabrück, Germany e-mail: holger.finger@uos.de*

Synchronization has been suggested as a mechanism of binding distributed feature representations facilitating segmentation of visual stimuli. Here we investigate this concept based on unsupervised learning using natural visual stimuli. We simulate dual-variable neural oscillators with separate activation and phase variables. The binding of a set of neurons is coded by synchronized phase variables. The network of tangential synchronizing connections learned from the induced activations exhibits small-world properties and allows binding even over larger distances. We evaluate the resulting dynamic phase maps using segmentation masks labeled by human experts. Our simulation results show a continuously increasing phase synchrony between neurons within the labeled segmentation masks. The evaluation of the network dynamics shows that the synchrony between network nodes establishes a relational coding of the natural image inputs. This demonstrates that the concept of binding by synchrony is applicable in the context of unsupervised learning using natural visual stimuli.

**Keywords: oscillation, binding, synchronization, normative model, unsupervised learning, scene segmentation, object label, natural image statistics**

## **1. INTRODUCTION**

One of the central questions in neuroscience is how information about a given stimulus is processed in a distributed network of neurons such that it is perceived not only as a collection of unrelated features but as a unified single object. The concept of binding by synchrony has been proposed as a mechanism to coordinate the spatially distributed information processing in the cortex (Milner, 1974; Von Der Malsburg, 1981). Experiments in cat visual cortex have confirmed that inter-columnar synchronization indeed corresponds to a relational code that reflects global stimulus attributes (Gray et al., 1989; Singer, 1999; Engel and Singer, 2001). However, the physiological recordings in these early studies were based on the presentation of artificially designed stimuli. In a more recent study Onat et al. (2013) showed in experiments that long-range interactions in the visual cortex are compatible with Gestalt laws. This suggests that the concept of binding by synchrony is also feasible in the case of natural visual stimuli. It is still the center of a heated debate to what extent synchronized activity represents a neural code of binding and segmentation. In particular, how the neural system can learn this relational coding when it is exposed to new stimuli is still an open question. The most prominent possibility is that tangential cortico-cortical connections in the visual cortex lead to synchronized activity that implements Gestalt laws. Löwel and Singer (1992) showed in cats with artificially induced strabismus that selective stabilization of tangential connections occurs between cells that exhibit correlated activity induced by visual experience. Furthermore, König et al. (1993) found that the synchronization of cortical activity is impaired in these cats with artificial strabismus.
These findings indicate that there is an important interplay between unsupervised learning of tangential connections on behavioral time scales and their role in synchronization phenomena on fast time scales.

The physiological experiments on binding by synchrony have been accompanied by theoretical studies early on. Sompolinsky et al. (1990) investigated how a model of coupled neural oscillators is able to process global stimulus properties in synchronization patterns using abstractly defined neuronal activation levels and predefined coupling strengths for the simulated network. These simulation results showed that the coupling of neural oscillators provides a viable mechanism implementing a coding of perceptual grouping. Such previous work includes studies ranging from networks built from very simple elements to detailed simulations containing many compartments per unit.

To investigate the functional role of synchronization and its relation to coding, it is important to choose the right level of abstraction in the model. A simplification from detailed spiking neuron models to coupled phase oscillator models allows us to analyze neuronal synchronization in a broader context of a normative model involving unsupervised learning from natural stimuli. A review of these coupled neural oscillator models was done by Sturm and König (2001), where the authors show the derivation of simplified phase update equations from biologically measurable phase response curves. The simplifications in coupled phase oscillators are based on the assumption that neurons are close to their oscillatory limit-cycle and that a change in the phase of the neuronal inputs induces only a small perturbation to the neuronal phase. The phase update equation in our model is based on the Kuramoto model of coupled phase oscillators (Kuramoto, 1984) in the sense that our model also assumes a very simple sinusoidal phase interaction function. This approximation of the phase interaction by a sinusoidal function allows us to use mathematical simplifications in the simulation of the model.

Very similar to the work of Sompolinsky et al. (1990), we extend the standard formulation of the Kuramoto model with a second variable per neuron to encode the activation of the oscillators. Therefore, in our model the state of a neuron is represented by two degrees of freedom, separated into an activation and a phase variable. This discrimination between coding of receptive field features by activation and coding of relationships by phase is a biologically motivated segregation of their different functional roles. Maye and Werning (2007) specifically compare the synchronization properties of these coupled phase oscillator models with mean-field oscillator models based on the Wilson-Cowan model (Wilson and Cowan, 1972). They state that the simplified coupled phase oscillators allow decoupling the simulation time constants of fast oscillatory time scales from slow rate coding time scales. Another advantage is an easier analysis of the synchronization patterns, because the direct encoding of the phase variables means that all contextual relationships are coded at the same time. Consequently, we use the dual-variable phase model, because it is suitable for answering fundamental questions about the interactions between synchronization phenomena and contextual coding in neural systems.

In contrast to these phase oscillator models, most recent work on segmentation in networks of coupled neural oscillators is based on the so-called "locally excitatory globally inhibitory oscillator network" (LEGION) or similar variants of this model, first proposed by Wang and Terman (1997). In LEGION, the dynamics within each oscillatory period of individual units are simulated in detail by time-varying variables describing the internal state of each neuron. In contrast, in our model the oscillatory period is not simulated explicitly, but represented only implicitly in the phase variables. Nonetheless, several aspects which we analyze in this work were previously also investigated in LEGION. Namely, similar to Li and Li (2011) we use a small-world topology to reduce the computational cost while still allowing binding by synchrony over large distances. We also use parallel computations to speed up the simulations, as was previously done for LEGION by Bauer et al. (2012).

The above-mentioned theoretical studies mostly investigated the processing of artificial stimuli in close analogy to the physiological experiments. These stimuli are heavily dominated by artificial geometric patterns such as bars and gratings. However, the concept of binding by synchrony makes much more general claims about the grouping of sensory representations of natural stimuli. By now, a fair number of databases with images considered to be natural are available. However, a problem with generic natural stimuli is that segmentation is not only difficult, but that no general ground truth is available. The LabelMe database (Russell et al., 2008) is rather unique, as it contains a large collection of images together with human-labeled annotations of image segments. In theoretical studies these labels may serve as a ground truth to evaluate how the relative phases between neurons code relational structures on natural stimuli.

The processing of natural stimuli in neural systems can be described as a normative approach in which the representation of the input is learned by an optimization of computational principles (Einhäuser and König, 2010). It has been successfully employed in modeling receptive field properties of simple and complex cells in primary visual cortex. Furthermore, response properties of neurons in higher areas and other modalities have been suggested to follow similar rules. This approach might be extended to include the computational principles that underlie tangential interactions that directly influence synchronization phenomena. This might answer the question whether the concept of binding by synchrony can work in principle with unsupervised learning and natural stimuli.

In this study we investigate whether the concept of binding by synchrony, which has so far been investigated using abstract stimuli, is viable for natural stimuli. The most important novelty of our approach is the combination of the different concepts described above into one single simulation model, allowing the investigation of their interplay: specifically, we combine normative model approaches of unsupervised learning from natural stimuli with the concept of binding by synchrony in a network of coupled phase oscillators. Importantly, the data-driven approach, which utilizes general principles, minimizes the number of heuristics and free parameters. We present large-scale simulations of neural networks encoding real-world image scenes. In the first stage of our algorithm, forward projections generate activation levels of neurons corresponding to the primary visual cortex. In the second stage, these activation levels are used in a simulation of tangentially coupled phase oscillators. We present results with forward projections based on designed Gabor filters, which are a good approximation of receptive fields in the primary visual cortex. To allow later canonical generalization in higher network layers, we also present results with forward projections learned in a normative model approach with a sparse autoencoder using natural image statistics. In addition to these learned forward weights, the structural connectivity of the phase simulations is also learned unsupervised, using the correlated activity induced by natural stimuli. Performance of the network is tested using images taken from the LabelMe database. Thereby we can investigate how synchronization phenomena might be utilized in sensory cortical areas to bind different attributes of the same stimulus and how they might be exploited for scene segmentation.

## **2. MATERIALS AND METHODS**

The overall network architecture of our simulation model consists of two main parts: (1) Feedforward convolutional filters (red lines in **Figure 1**) are used to generate the activation levels for neurons in a layer corresponding to the primary visual cortex. On top of each pixel is a column of neurons which encode different features of a local patch in the input image (black bottom cuboid in **Figure 1**). Each feature type is described by a weight matrix which is applied using a 2-dimensional convolutional operation on each RGB color channel of the input image. (2) The obtained activation levels in this 3-dimensional structure (black top cuboid in **Figure 1**) are subsequently used to simulate sparse connections (green lines in **Figure 1**) between coupled phase oscillators.

**FIGURE 1 |** Feedforward convolutional filters (red) are applied to the input image $v_{x,y,c}$ (bottom) to generate the activation levels $h_{x,y,j}$ of feature columns (red blocks at the top). These activations $h_{x,y,j}$ are then transformed to activations of oscillators $g_{x,y,j}$ using simple local regularization steps. The intralayer connections $e^{(j,k)}_{\delta x,\delta y}$ (green) implement the coupled phase oscillators, which synchronize or desynchronize image features.

We start with the description of the stimulus material (section 2.1). This is followed by the description of the coupled phase oscillator model (section 2.2) and the sampling mechanism generating the horizontal sparse connections (section 2.3). Afterwards we describe the underlying mechanism of the feedforward generation of activation levels (section 2.4).

#### **2.1. NATURAL STIMULUS MATERIAL**

As stimulus material in our simulations we use images of suburban scenes from the LabelMe database (Russell et al., 2008). Due to computational time constraints we have to restrict the evaluations to a small subset of all available images in the database. In addition, the database is not fixed, but new images and segmentation masks are frequently added. We use only the first 50 images in the folder *05june05\_static\_street\_boston* so that we have a consistent and fixed dataset of well-defined images.

These images have an initial resolution of 2560 × 1920 pixels. We first resize the images to 400 × 300 pixels to further reduce the computation time of the simulations. Subsequently, we subtract the mean pixel values and apply a smoothed zero-phase (ZCA) whitening transformation (Bell and Sejnowski, 1997). For an input image $X$ the whitened pixel values are given by $X_{\mathrm{ZCA}} = U D U^{T} X$, where $U$ is a matrix containing the eigenvectors of the covariance matrix of the image statistics and $D$ is a diagonal matrix with diagonal elements $\frac{1}{\sqrt{\lambda_i + 0.1}}$, where $\lambda_i$ are the corresponding eigenvalues. This transformation applies local center-surround whitening filters that decrease the correlations in the input images. We implement this whitening transformation as a convolutional image filter.
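As an illustration, the smoothed ZCA whitening step can be sketched in a few lines of NumPy. The function name and the toy data below are ours, not taken from the original implementation (which applies the transform as a convolutional image filter):

```python
import numpy as np

def zca_whitening_matrix(samples, eps=0.1):
    """Smoothed ZCA whitening transform estimated from zero-mean samples.

    `samples` is an (n_samples, n_dims) array; `eps` is the smoothing
    constant (0.1 in the text) added to the eigenvalues.
    """
    cov = np.cov(samples, rowvar=False)           # covariance of the statistics
    eigvals, U = np.linalg.eigh(cov)              # eigendecomposition (symmetric)
    D = np.diag(1.0 / np.sqrt(eigvals + eps))     # D_ii = 1 / sqrt(lambda_i + eps)
    return U @ D @ U.T                            # x_zca = U D U^T x

# toy usage: whiten correlated 2-D data
rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])
x -= x.mean(axis=0)
W = zca_whitening_matrix(x)
x_white = x @ W.T
```

With `eps = 0` the whitened covariance becomes exactly the identity; the smoothing constant deliberately leaves some low-variance structure intact, which is what makes the resulting filters local center-surround kernels.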

The images in the LabelMe database come along with human labeled segmentation masks. These segmentation masks correspond to objects that are perceived as a unique concept with an associated abstract label like "tree," "car" or "house." We use these supervised segmentation masks for later evaluations of binding in the simulated phase maps. Please note that in our network simulations this segmentation information is not used at any moment in time. Instead, the network connectivity is based solely on unsupervised learning using the statistics of neuronal activations.

#### **2.2. COUPLED PHASE OSCILLATOR MODEL**

Our network of coupled phase oscillators is based on the oscillator model described by Sompolinsky et al. (1990). In the following, we use the same motivational derivation of the phase update equations. We model the probability of firing $P_{x,y,k}(t)$ per unit time of a neuron at image position $(x, y)$ encoding feature type $k$ at time $t$ by an isochronous oscillator. In our simulations we represent the state of the neuronal oscillators by separated activation variables $g_{x,y,k}$ and phase variables $\Phi_{x,y,k}$. These two variables are linked to the biological interpretation of firing probability by the equation

$$P_{x,y,k}(t) = g_{x,y,k} \left(1 + \lambda \cdot \cos(\Phi_{x,y,k}(t))\right),\tag{1}$$

where the parameter $0 < \lambda < 1$ controls the relative strength of the temporal oscillation in relation to the overall firing probability of the neuron. The phase progression is a periodic function, $\Phi_{x,y,k}(t) = \Phi_{x,y,k}(t + 2\pi)$. In our work, the calculation of the activation levels $g_{x,y,k}$ differs significantly from the simple artificial tuning curves used in Sompolinsky et al. (1990). A detailed description of how these activation levels are obtained is presented in section 2.4. The activation levels $g_{x,y,k}$ are normalized by dividing by the local sum of all activation levels at each image position such that $\sum_k g_{x,y,k} = 1$ for all positions $(x, y)$. In the simulations presented in this work the activation levels of each neuron are computed only once from the input image using feedforward projections (red lines in **Figure 1**) and are then kept constant during the simulation of the phase model. This simplification of constant activation levels is based on the assumption that the stimulus presentation on behavioral time scales (≈ seconds) remains constant during the phase synchronization, which happens at very fast time scales (i.e., gamma frequency ≈ 40 Hz). Another argument supporting this assumption is that the visual cortex seems to operate in a regime of self-sustained activity (Stimberg et al., 2009), and therefore we can assume constant activation levels during the phase simulation.

After these activation levels are computed, we simulate the horizontal coupling between the phase oscillators. The phase connections in our network are described by a weighted graph $G = (H, E)$ where the neurons $g_{x,y,j} \in H$ are the vertices, organized in a three-dimensional block (**Figure 1**). An edge $e^{(j,k)}_{\delta x,\delta y} \in E$ describes a synchronizing (positive) or desynchronizing (negative) connection from neuron $g_{x,y,j}$ to neuron $g_{x+\delta x,\,y+\delta y,\,k}$. The phase of each neuron is then modeled according to a differential equation describing weakly coupled phase oscillators (Kuramoto, 1984)

$$\frac{d\Phi_{x,y,k}(t)}{dt} = \omega - \frac{1}{\tau} \sum_{e^{(j,k)}_{\delta x,\delta y} \in E} g_{x,y,k} \cdot e^{(j,k)}_{\delta x,\delta y} \cdot g_{x-\delta x,\,y-\delta y,\,j} \cdot \sin\left(\Phi_{x,y,k}(t) - \Phi_{x-\delta x,\,y-\delta y,\,j}(t)\right),\tag{2}$$

where τ is the time scale of the phase interactions and ω is the natural frequency of the modeled neural oscillations. We assume that all neurons have the same intrinsic natural frequency ω, and the interaction strength $g_{x,y,k} \cdot e^{(j,k)}_{\delta x,\delta y} \cdot g_{x-\delta x,\,y-\delta y,\,j}$ is proportional to the activation levels of the pre- and postsynaptic neurons. Note that our model contrasts with the more common formulation of the Kuramoto model with heterogeneous frequencies and fixed homogeneous all-to-all interaction strengths.

A major difference to the phase update equation used in Sompolinsky et al. (1990) is that we neglect the noise term in the differential equation of each oscillator. The noise term in Sompolinsky et al. (1990) is used as the primary source of desynchronization in the network. In contrast, in our work, we use a normative model to learn not only synchronizing but also desynchronizing connections (see section 2.3). For an easier analysis and interpretation of the results, it is advantageous to have only a single source for the desynchronization in the network. Therefore, we decided to use a deterministic phase model, although it was previously shown that noise is an important factor to control the network coherence. In addition to a simpler interpretation it reduces the number of model parameters and is also more compatible to further applications of gradient descent learning to change the strength of the phase interactions.

We can further simplify the equation by using the fact that we model isochronous oscillators with homogeneous frequencies. In Equation 2 all phase variables $\Phi_{x,y,k}(t)$ have a constant phase progression with frequency ω. We can use a simple transformation to a new variable which represents only the phase offsets between neurons:

$$
\varphi_{x,y,k}(t) = \Phi_{x,y,k}(t) - \omega t. \tag{3}
$$

This new phase variable $\varphi_{x,y,k}(t)$ describes the relative phase of the neuron with respect to the global fixed network oscillation with frequency ω. Substitution into the equation above leads to a simplified phase update equation

$$\frac{d\varphi_{x,y,k}(t)}{dt} = -\frac{1}{\tau} \sum_{e^{(j,k)}_{\delta x,\delta y} \in E} g_{x,y,k} \cdot e^{(j,k)}_{\delta x,\delta y} \cdot g_{x-\delta x,\,y-\delta y,\,j} \cdot \sin\left(\varphi_{x,y,k}(t) - \varphi_{x-\delta x,\,y-\delta y,\,j}(t)\right).\tag{4}$$

In this equation it can be seen that the time scale τ of the phase interactions is decoupled from the oscillatory time scale 1/ω. Please also note that a change of the parameter τ would not qualitatively change the results of our simulations; it would merely rescale the units of the time axes linearly. Therefore, we show the simulation results with the time axis measured in iterations, which could be linearly scaled to arbitrary time units to best fit different biological measurements.

This phase update equation is used in our simulations to model the horizontal connections in the network. It allows directly specifying synchronizing interactions from neuron $g_{x,y,j}$ to neuron $g_{x+\delta x,\,y+\delta y,\,k}$ with a positive connection weight $e^{(j,k)}_{\delta x,\delta y}$, and desynchronizing interactions with a negative weight, respectively. We simulate these coupled differential equations using a 4th-order Runge-Kutta method.
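To make the dynamics concrete, a minimal sketch of Equation 4 integrated with a 4th-order Runge-Kutta step is given below. The two-unit network, the reciprocal +1 connection, and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def phase_derivative(phi, g, edges, tau=1.0):
    """Right-hand side of the simplified phase update (Equation 4).

    `phi`, `g` are 1-D arrays over neurons; `edges` is a list of
    (pre, post, weight) tuples (+1 synchronizing, -1 desynchronizing).
    """
    dphi = np.zeros_like(phi)
    for pre, post, w in edges:
        dphi[post] -= (g[post] * w * g[pre] / tau) * np.sin(phi[post] - phi[pre])
    return dphi

def rk4_step(phi, g, edges, dt=0.1):
    """One 4th-order Runge-Kutta step of the coupled oscillator network."""
    k1 = phase_derivative(phi, g, edges)
    k2 = phase_derivative(phi + 0.5 * dt * k1, g, edges)
    k3 = phase_derivative(phi + 0.5 * dt * k2, g, edges)
    k4 = phase_derivative(phi + dt * k3, g, edges)
    return phi + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# toy example: two active units with a synchronizing (+1) connection each way
phi = np.array([0.0, 2.0])
g = np.array([0.5, 0.5])
edges = [(0, 1, +1), (1, 0, +1)]
for _ in range(500):
    phi = rk4_step(phi, g, edges)
# the relative phase |phi[0] - phi[1]| shrinks toward zero (synchronization)
```

A negative weight in `edges` would instead drive the relative phase toward π, which is the desynchronizing case used for anti-correlated features.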

#### **2.3. HORIZONTAL INTERACTION STRENGTHS**

We use correlation statistics of the induced activation levels to set the intralayer connection strengths, similar to a simple Hebbian learning rule. We write $\rho^{(k,m)}_{\delta x,\delta y}$ to denote the Pearson cross-correlation between the activations of feature type $k$ at image position $(\tilde{x}, \tilde{y})$ and the activations of feature type $m$ at the shifted image position $(\tilde{x} + \delta x,\, \tilde{y} + \delta y)$. Each correlation value in this tensor is calculated from the correlation statistics over approximately 1 million network activations, induced by 50 natural images presented at 236 × 86 image positions.

These horizontal connections make up the coupling between the neural oscillators. Instead of full connectivity, we use stochastically sampled sparse directed connections from the correlation matrix. To exclude noise in the correlation matrix, we use the Benjamini-Hochberg-Yekutieli procedure (Benjamini and Yekutieli, 2001) under arbitrary dependence assumptions with a false-discovery rate of 0.05.
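The thresholding step can be sketched directly: sort the p-values of the correlations and keep those up to the largest index whose p-value falls below the dependence-corrected threshold. This is a generic sketch of the standard Benjamini-Yekutieli procedure, not the paper's code:

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """Benjamini-Hochberg-Yekutieli FDR control under arbitrary dependence.

    Returns a boolean mask of the hypotheses (correlations) that survive
    the false-discovery-rate threshold q.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))           # correction for dependence
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * q / (m * c_m)  # i * q / (m * c(m))
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest i with p_(i) below
        reject[order[: k + 1]] = True
    return reject

# correlations whose p-value survives are kept; the rest are treated as
# noise and excluded from the connectivity sampling
mask = benjamini_yekutieli([0.0001, 0.0002, 0.3, 0.8, 0.04], q=0.05)
```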

The probability of a positive (+1) or a negative connection (−1) in the connectivity graph *G* = (*H*, *E*) is then given by

$$P\left(e^{(j,k)}_{\delta x,\delta y} = \pm 1\right) = \eta_{\pm} \cdot \frac{\max\left(0,\, \pm\rho^{(j,k)}_{\delta x,\delta y}\right)}{\sum_{\delta\tilde{x},\delta\tilde{y},m} \max\left(0,\, \pm\rho^{(m,k)}_{\delta\tilde{x},\delta\tilde{y}}\right)},\qquad(5)$$

where η+ specifies the total number of afferent synchronizing connections and η− the total number of afferent desynchronizing connections per neuron. Therefore, synchronizing connections exist only between naturally correlated features, and desynchronizing connections only between anti-correlated features.
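A sampling step consistent with Equation 5 can be sketched as follows for a single efferent feature type $k$; the tensor shapes, function name, and parameter values are illustrative assumptions:

```python
import numpy as np

def sample_connections(rho, n_pos, n_neg, rng):
    """Sample sparse +/-1 connections from a correlation tensor (Equation 5).

    `rho` has shape (n_dx, n_dy, n_features), holding correlations for one
    fixed efferent feature k; `n_pos`/`n_neg` correspond to eta_+ and eta_-.
    """
    pos = np.maximum(0.0, rho).ravel()
    neg = np.maximum(0.0, -rho).ravel()
    edges = np.zeros(rho.size)
    # draw eta_+ synchronizing connections, proportional to positive correlations
    idx = rng.choice(rho.size, size=n_pos, replace=False, p=pos / pos.sum())
    edges[idx] = +1.0
    # draw eta_- desynchronizing connections, proportional to anti-correlations
    idx = rng.choice(rho.size, size=n_neg, replace=False, p=neg / neg.sum())
    edges[idx] = -1.0
    return edges.reshape(rho.shape)

rng = np.random.default_rng(1)
rho = rng.uniform(-1, 1, size=(5, 5, 8))   # toy correlation tensor
e = sample_connections(rho, n_pos=10, n_neg=10, rng=rng)
# positive entries land only where rho > 0, negative ones only where rho < 0
```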

We sample this sparse tangential connection pattern such that it is invariant to spatial shift transformations. The convolutional structure of the forward projections leads to activation and phase variables that are stored in a 3-dimensional block (top of **Figure 1**) with two dimensions given by the spatial extent of the image and one feature dimension. This convolutional structure can be exploited for the sparse horizontal connections to significantly speed up the computation. Therefore, we specify the properties of the coupled oscillator connections only for a generic feature column. These connections are then applied at each image position. Specifically, in our implementation each sampled tangential connection is specified by five variables: the horizontal and vertical connection offsets in image directions, the indices of the afferent and efferent feature maps, and the connection weight. This has the advantage that the phase update equation can be implemented as a vectorized convolutional operation although the connection pattern is highly sparse.

#### **2.4. FEEDFORWARD CONNECTIVITY**

We compare the binding and segmentation performance of the coupled neural oscillator model using two different ways to generate the activation levels for the neurons. We first describe hand-crafted feedforward Gabor weights (section 2.4.1) and then the unsupervised learning of receptive fields using a convolutional autoencoder (section 2.4.2). Finally, activation functions are presented to further regularize the resulting feature representations (section 2.4.3).

#### *2.4.1. Gabor filters*

For reference we use a set of Gabor filters with specified orientation, frequency and color tuning to generate the activation levels for the phase simulation. Thereby we can analyze the phase oscillator network based on a regularly defined set of features that can be parameterized.

We generate linear convolutional weights (marked in red in **Figure 1**) using an approximate Gaussian derivative model, which was shown to be a good fit for the receptive fields of simple cells in the primate visual cortex (Young, 1987). We use only nondirectional three-lobe monophasic receptive fields (Young and Lesperance, 2001) to reduce our model parameters. We implement the Gaussian derivative model using difference-of-offset-Gaussians with a slightly larger center compared to surround to code color offsets. The receptive fields that are used in our simulations have a size of 12x12 pixels and are defined by

$$W_{x,y} = g_{2\sigma}(y) \cdot \left(-5 \cdot g_{\sigma}(x + \sigma) + 10.1 \cdot g_{\sigma}(x) - 5 \cdot g_{\sigma}(x - \sigma)\right),\tag{6}$$

where $g_{\sigma}(x)$ is a one-dimensional Gaussian with standard deviation σ = 1.5 pixels (and $g_{2\sigma}(y)$ one with standard deviation 2σ = 3 pixels), and the coordinates $x$ and $y$ are rotated, giving a total of 8 orientations in steps of 22.5°. The convolutional filters are applied to the images with a stride of 2 pixels in both image dimensions and are followed by a sigmoidal activation function to scale the values to a reasonable interval between 0 and 1. We apply each orientation filter separately to all color channels (red, green, blue). Furthermore, we add features for the complementary color channels, similar to the on-off discrimination in the visual pathway from the retina to the visual cortex. The direct linear dependency between these pairs of opponent-color channels is removed later with the additional activation functions described in section 2.4.3. In summary, we have a total of 48 convolutional feature channels per image position: 8 orientations × 3 RGB color channels × 2 opponent-color channels. This overcomplete neural representation of the input images is used to generate the activation levels for the phase simulations.
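The three-lobe filter construction of Equation 6 can be sketched as below; the exact sampling grid and the absence of normalization are our assumptions for illustration:

```python
import numpy as np

def gaussian(x, sigma):
    """One-dimensional Gaussian density."""
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def oriented_filter(size=12, sigma=1.5, theta=0.0):
    """Three-lobe difference-of-offset-Gaussians filter (Equation 6),
    rotated by the angle `theta`."""
    coords = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(coords, coords)
    # rotate the coordinate frame to the desired orientation
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    return gaussian(yr, 2 * sigma) * (
        -5.0 * gaussian(xr + sigma, sigma)
        + 10.1 * gaussian(xr, sigma)
        - 5.0 * gaussian(xr - sigma, sigma)
    )

# 8 orientations in steps of 22.5 degrees, as described in the text
bank = [oriented_filter(theta=np.deg2rad(22.5 * i)) for i in range(8)]
```

The slight excess of the center weight (10.1 versus the two surrounds of 5) is what codes the color offset mentioned in the text.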

Cortical measurements show that the distribution of non-directional monophasic simple cells is roughly uniformly distributed between zero-, first- and second order Gaussian derivatives (Young and Lesperance, 2001). We performed the simulations presented here also with mixed receptive fields of zero-, first- and second-order Gaussian derivatives and obtained similar results. We present here only results with second order Gaussian derivatives, because this reduces the number of model parameters drastically.

#### *2.4.2. Autoencoder filters*

As a comparison to these regular hand-designed Gabor filters we analyze the oscillatory network based on activation levels generated by unsupervised learned autoencoder weights. A good overview of the concepts described in this section can be found in Le et al. (2011b), where the authors analyze different optimization methods for convolutional and sparse autoencoders. An autoencoder learns a higher level representation from the stimulus statistics such that the input stimuli can be reconstructed from the hidden representations. In addition, we optimize the sparsity of the activation levels in this representation, which was shown to learn connection weights which resemble receptive fields in the visual cortex (Olshausen and Field, 1996; Hinton, 2010; Le et al., 2011a).

A common trick in unsupervised learning in neural networks is the use of shared connection weights to reduce the number of parameters that have to be learned, which in the case of images can be accomplished by a convolutional feed-forward network (LeCun et al., 1998; Lee et al., 2009). The structure of our convolutional autoencoder is shown in **Figure 2**. The feedforward projection that generates the activation of feature map $j$ consists of convolutional filters $W_{x,y,c,j}$ (red lines in **Figure 2**) over input features $c \in \{1, 2, 3\}$ (RGB colors) and a bias term $b_j$, followed by a sigmoidal activation function. Therefore, the hidden layer activation map of feature $j \in \{1, 2, \dots, J\}$ is described by

$$h\_{\mathbf{x},\mathbf{y},\mathbf{j}} = f\left(\sum\_{\mathbf{c}=1}^{3} W\_{\mathbf{x},\mathbf{y},\mathbf{c},\mathbf{j}} \* \nu\_{\mathbf{x},\mathbf{y},\mathbf{c}} + b\_{\mathbf{j}}\right). \tag{7}$$

The hidden layer activation $h$ of each input image sample is also a 3-dimensional block (horizontal and vertical image dimensions and the feature type). The weight matrix $W$ is a 4-dimensional structure which describes the connection weights from a convolutional input block to one output column in the hidden layer. The convolutional image operations ($*$) are applied in the image directions $x$ and $y$ between all combinations of input feature maps $c$ and output feature maps $j$.

We use linear activation functions for the backward projections (blue lines in **Figure 2**) so that the output matches the scale of the input images (zero mean). We use another set of weights $\hat{W}_{x,y,j,c}$ and bias terms $\hat{b}_c$ to describe these backward connections. Therefore, the activation in the reconstruction layer is given by

$$
\hat{\nu}\_{\mathbf{x},\mathbf{y},\mathbf{c}} = \sum\_{j=1}^{J} \hat{W}\_{\mathbf{x},\mathbf{y},j,\mathbf{c}} \* h\_{\mathbf{x},\mathbf{y},j} + \hat{b}\_{\mathbf{c}},\tag{8}
$$

where *J* = 100 is the number of different feature types. During the learning stage only the valid part (no zero padding) of the convolutions are used for the forward and backward projections to avoid edge effects of the image borders on the learned weights. Similar to the Gabor filters the convolutional filters have a size of 12x12 pixels and are applied using a stride of 2 pixels leading to a reduction in the resolution of the hidden layer.
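The forward and backward mappings of Equations 7 and 8 can be sketched with SciPy's 2-D convolution. The stride of 2 is omitted here for brevity, and all shapes are toy values rather than the paper's (J = 100, 12x12 filters on full images):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(v, W, b):
    """Hidden activations of Equation 7: 'valid' convolution over each
    color channel, summed per feature map, plus bias, through a sigmoid.
    v is (H, W, 3), W is (kh, kw, 3, J), b is (J,)."""
    J = W.shape[-1]
    maps = []
    for j in range(J):
        acc = sum(convolve2d(v[:, :, c], W[:, :, c, j], mode="valid")
                  for c in range(3))
        maps.append(sigmoid(acc + b[j]))
    return np.stack(maps, axis=-1)

def decode(h, W_hat, b_hat):
    """Linear reconstruction of Equation 8 ('full' convolution restores
    the original image size)."""
    out = []
    for c in range(3):
        acc = sum(convolve2d(h[:, :, j], W_hat[:, :, j, c], mode="full")
                  for j in range(h.shape[-1]))
        out.append(acc + b_hat[c])
    return np.stack(out, axis=-1)

rng = np.random.default_rng(0)
v = rng.normal(size=(20, 20, 3))                 # zero-mean toy "image"
W = rng.normal(scale=0.1, size=(12, 12, 3, 4))   # J = 4 toy feature maps
h = encode(v, W, np.zeros(4))                    # hidden block, (9, 9, 4)
W_hat = rng.normal(scale=0.1, size=(12, 12, 4, 3))
v_hat = decode(h, W_hat, np.zeros(3))            # reconstruction, (20, 20, 3)
```

Using the valid part of the forward convolution and the full backward convolution, as in the text, avoids edge effects while keeping the reconstruction the same size as the input.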

We use the sum of 3 optimization functions to learn the forward and backward weights of the autoencoder. The first optimization term which is minimized is the reconstruction error averaged over all positions and training samples *s* and is given by

$$
\Psi\_1 = \left\langle \frac{1}{2} \left\| \hat{\nu}\_{\mathbf{x}, \mathbf{y}, \mathbf{c}}^{(s)} - \nu\_{\mathbf{x}, \mathbf{y}, \mathbf{c}}^{(s)} \right\|^2 \right\rangle\_{\mathbf{x}, \mathbf{y}, \mathbf{s}}.\tag{9}
$$

The second term optimizes the sparseness of the hidden units as described by Hinton (2010) and Le et al. (2011a) with

$$\Psi_2 = \beta \cdot \sum_j \mathrm{KL}\left(\tilde{h} \,\Big\|\, \left\langle h^{(s)}_{x,y,j} \right\rangle_{x,y,s}\right),\tag{10}$$

where KL is the Kullback-Leibler divergence between two Bernoulli distributions with expected values $\tilde{h}$ and $\langle h^{(s)}_{x,y,j}\rangle_{x,y,s}$. We set the desired average activation to $\tilde{h} = 0.035$.

The third term is a weight decay (L2-norm) of all forward and backward weights and is given by

$$
\Psi\_3 = \frac{\lambda}{2} \cdot \left( \sum\_{\mathbf{x}, \mathbf{y}, \mathbf{c}, \mathbf{j}} W\_{\mathbf{x}, \mathbf{y}, \mathbf{c}, \mathbf{j}}^2 + \sum\_{\mathbf{x}, \mathbf{y}, \mathbf{j}, \mathbf{c}} \hat{W}\_{\mathbf{x}, \mathbf{y}, \mathbf{j}, \mathbf{c}}^2 \right). \tag{11}$$

This optimization term pushes all connection weights toward zero such that only the connections which help to extract useful features remain. Therefore, it provides a regularization mechanism during learning.

For the simulations presented in this paper we use a relative weighting between these optimization functions given by β = 90 and λ = 0.3. The gradients of the optimization functions are calculated using backpropagation of error signals and were checked against numerical derivatives. The sum of the three terms described above is minimized with the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS), which uses an approximation to the inverse Hessian matrix (Liu and Nocedal, 1989). We use the *minFunc* library of Mark Schmidt<sup>1</sup> with default parameters for line search with a strong Wolfe condition. We use L-BFGS because it converges much faster than standard gradient descent, especially in the case of autoencoders with sparseness constraints (Le et al., 2011b). Another advantage of L-BFGS is that extensive tuning of learning parameters, as in standard gradient descent methods, is not necessary.
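Putting the three terms together, the combined objective of Equations 9 through 11 might be sketched as follows; the array shapes and function names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q) distributions."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def autoencoder_loss(v, v_hat, h, beta=90.0, lam=0.3, h_target=0.035,
                     weights=()):
    """Sum of the three optimization terms (Equations 9-11).

    `v`/`v_hat` are (samples, x, y, channels) input and reconstruction,
    `h` is (samples, x, y, features), `weights` holds the forward and
    backward weight tensors for the decay term.
    """
    psi1 = 0.5 * np.mean(np.sum((v_hat - v) ** 2, axis=-1))   # reconstruction error
    mean_act = h.mean(axis=(0, 1, 2))                         # <h>_{x,y,s} per feature
    psi2 = beta * np.sum(kl_bernoulli(h_target, mean_act))    # sparsity penalty
    psi3 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)   # weight decay
    return psi1 + psi2 + psi3

# a perfect reconstruction with exactly on-target activations costs ~0
v = np.zeros((2, 5, 5, 3))
h = np.full((2, 3, 3, 4), 0.035)
loss = autoencoder_loss(v, v, h)
```

The KL term vanishes when the empirical mean activation of every feature equals the target 0.035 and grows steeply as activations become dense, which is what drives the learned filters toward sparse, localized receptive fields.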

The training data consists of 1000 color patches (60 × 60 pixels) sampled from the folder *05june05\_static\_street\_boston* of the LabelMe database (Russell et al., 2008). This corresponds to 625,000 training samples per convolutional fragment where the forward weight matrix is applied. After 500 iterations the features are mostly oriented patches sensitive to different colors.

#### *2.4.3. Regularization of activation levels*

Although the Gabor and autoencoder filters are both followed by a sigmoidal activation function, we further sparsify the activation levels $h_{x,y,k}$ with feature types $k \in \{1, \dots, K\}$, in a way similar to local cortical circuitry. We want to constrain the number of active neurons rather than the mean activation levels. Therefore, we subtract at each image position the average local activation level. Subsequently, a half-wave rectification is applied to constrain the activation levels again to the positive domain, with roughly half of the neurons inactivated:

$$\tilde{h}_{x,y,k} = \max\left(0,\; h_{x,y,k} - \frac{1}{K}\sum_{j=1}^{K} h_{x,y,j}\right). \tag{12}$$

Consequently the hard sparseness (Rehn and Sommer, 2007) is artificially increased and these inactivated neurons do not take part in the coupling of phase oscillations (see section 3.1). Thereby the number of possible interactions in the phase simulations is reduced.

As a last step we have to normalize the activation levels at every image position similar to local contrast adaptation in the visual system. We want to make sure that the overall local activation is uniform over the visual field such that an efficient coding of regions of high contrast and regions of low contrast is possible simultaneously. Therefore, we divide all activation levels by the sum of activations over all features at each image location:

$$\mathbf{g}\_{\mathbf{x},\mathbf{y},k} = \frac{\tilde{h}\_{\mathbf{x},\mathbf{y},k}}{\sum\_{j=1}^{K} \tilde{h}\_{\mathbf{x},\mathbf{y},j}}.\tag{13}$$

As a result we have sparse activation maps with a large proportion of inactive neurons and the same average local activations at all image positions.
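These two regularization steps amount to a few array operations: subtract the local mean over features, rectify, then normalize by the local sum. The sketch below assumes an (x, y, K) activation array; the epsilon guard against all-zero columns is our addition:

```python
import numpy as np

def regularize(h):
    """Local regularization of activation levels (Equations 12 and 13):
    subtract the local mean over features, half-wave rectify, then divide
    by the local sum so activations add up to one at every position."""
    h_tilde = np.maximum(0.0, h - h.mean(axis=-1, keepdims=True))
    total = np.maximum(h_tilde.sum(axis=-1, keepdims=True), 1e-12)
    return h_tilde / total

rng = np.random.default_rng(0)
h = rng.uniform(size=(4, 4, 8))   # toy sigmoid-like activations
g = regularize(h)
# roughly half of the units are zeroed; each feature column sums to one
```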

## **3. RESULTS**

In a first step we analyze the properties of the activation patterns induced by the natural images (section 3.1). Subsequently we evaluate the correlation statistics of these induced feature activations (section 3.2) and the resulting sparse connectivity pattern (section 3.3). Based on this connectivity pattern we show simulations of the coupled phase oscillator model and the resulting dynamic phase maps (section 3.4). Finally, evaluations of these binding maps are presented based on human-labeled segmentation masks (section 3.5).

#### **3.1. SPARSENESS OF ACTIVATION**

The simulation of the coupled phase oscillators is based on the activation levels that were generated from the natural images. The phase coupling is highly dependent on the type of feature representation used to generate the activation levels. The first reason is that the connectivity is based on the correlations between features. The second reason is that the actual strength of the dynamic coupling is itself proportional to the current activation levels. Therefore, the statistics of activation play a crucial role in the formation of the dynamic binding maps.

<sup>1</sup>http://www.di.ens.fr/~mschmidt/Software/minFunc.html

Hand labeled photographs of suburban scenes from the LabelMe database (Russell et al., 2008) are used to generate feature representations with the linear convolutional forward weights followed by a sigmoidal function. The linear convolutional kernels of the Gabor receptive fields contain only one spatial frequency and equally spaced orientations (**Figure 3A**). In contrast, the learned weights of the sparse autoencoder (**Figure 3B**) cover a diverse set of spatial frequencies, colors and orientations.

We compare the activation levels of features obtained with the regular Gabor weights and the autoencoder weights. A very important characteristic of neuronal activations is the level of sparseness. A high level of activation sparseness means that a neuron is silent most of the time and only rarely very active. This notion of sparse coding should not be confused with the graph theoretic sparseness analyzed in section 3.3. A qualitative comparison of the activation histograms (**Figure 4A**) shows that the autoencoder activations are sparser than the Gabor activations. The phase model is based on the assumption that the activation is restricted to the positive domain. Note that this is in contrast to many normative models of early visual processing, which assume a feature code with a zero-mean Gaussian distribution. Furthermore, in our model we are mostly interested in the "hard sparseness" of the activation levels, meaning that the activation is exactly zero most of the time and only rarely very high (Rehn and Sommer, 2007). A comparison with a Gaussian distribution restricted to the positive domain with the same mean (dashed line in **Figure 4A**) reveals that the feature activations after the sigmoidal activation function are not necessarily sparse under this hard sparseness criterion.

The sigmoidal activation function is followed by the subtraction of mean, rectification and the division by the sum over all features. The resulting histograms of these activation levels (**Figure 4B**) show an increased hard sparseness for both types of receptive fields. These additional preprocessing steps are similar to local regulatory mechanisms in the cortex.

A quantitative evaluation of the sparseness of the activation levels is given by the kurtosis. We use the standard measure of excess kurtosis but without mean normalization, because the phase model assumes a non-negative feature coding by activation. Therefore, we evaluate the hard sparseness of feature type *j* with activation levels *h*<sup>(s)</sup><sub>x,y,j</sub> by the kurtosis of a zero-centered distribution given by

$$\text{kurt}_{\circ} = \frac{\left\langle \left( h^{(s)}_{x,y,j} \right)^{4} \right\rangle_{x,y,s}}{\left\langle \left( h^{(s)}_{x,y,j} \right)^{2} \right\rangle_{x,y,s}^{2}} - 3,\tag{14}$$

where ⟨·⟩<sub>x,y,s</sub> denotes the mean over all image positions (*x*, *y*) and image samples *s* from the LabelMe database. The estimated median kurtosis over all receptive field types increases for the activations *g* after the normalization steps described above in comparison to the activations *h* before the normalizations (**Table 1**). A comparison with a Gaussian distribution, which has a kurtosis of 0, reveals that the additional activation functions indeed increase the sparseness and lead to a leptokurtic distribution of activations. Overall, the activations generated by the autoencoder are sparser than those of the hand designed Gabor filters.
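A sketch of the kurtosis measure of equation 14 on synthetic data illustrates the effect of hard sparsification (the half-Gaussian activations and the 5% activity level are illustrative assumptions, not the paper's data):

```python
import numpy as np

def kurt0(h):
    """Eq. 14: kurtosis of a zero-centered (non-negative) distribution,
    i.e. fourth moment over squared second moment, minus 3."""
    h = np.asarray(h, dtype=float).ravel()
    return np.mean(h**4) / np.mean(h**2)**2 - 3.0

rng = np.random.default_rng(1)
dense = np.abs(rng.normal(size=100_000))        # half-Gaussian: not hard-sparse
sparse = dense * (rng.random(100_000) < 0.05)   # mostly exact zeros: hard-sparse

# Zeroing most samples makes the distribution strongly leptokurtic.
print(kurt0(dense), kurt0(sparse))
```

For the half-Gaussian, the moment ratio is 3, so kurt<sub>○</sub> is close to 0; after zeroing 95% of the samples the measure rises sharply, which is exactly what the rectification steps in the model achieve.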

The additional activation functions are crucial for the subsequent phase simulations. The mean subtraction and half-wave rectification increase the hard sparseness of activations. This reduction in the number of active neurons leads to a reduction in

**FIGURE 4 | Histogram of activation levels averaged over all feature types.** The distributions of activation levels are compared to a Gaussian distribution. **(A)** After sigmoidal activation function. **(B)** After mean subtraction, half-wave rectification and division by the sum.



*<sup>1</sup>After the subtraction of mean, half-wave rectification and division by the local sum of the new activation levels.*

the number of active tangential phase connections. Therefore, the features in the input image do not only multiplicatively modulate the strength of the phase interaction but also deactivate many phase connections entirely leading to a completely new effective tangential connectivity pattern.

#### **3.2. STATISTICS OF HORIZONTAL CROSS-CORRELATIONS**

The horizontal connections between the coupled phase oscillators are sampled from the cross-correlations of induced activation levels as described in equation 5. Therefore, we describe the horizontal correlations in this section and evaluate the anisotropy of receptive field types. The 4-dimensional cross-correlation tensors ρ<sup>(k,m)</sup><sub>x,y</sub> as defined in section 2.3 are shown in **Figure 5** for 8 feature types. The Gabor receptive fields have a more regular correlation matrix (**Figure 5A**) compared to the learned autoencoder receptive fields (**Figure 5B**). The correlations between the activations of Gabor receptive fields are themselves similar to high-frequency Gabor functions. In contrast, the receptive fields learned by the autoencoder capture different spatial frequencies and a variety of different colors, which is also reflected in the spatial cross-correlations. In both cases the horizontal cross-correlations extend over visual space up to three times the receptive field size. This suggests that the correlations indeed comprise higher-order correlation statistics of the natural images and not only interactions between overlapping receptive fields.

To analyze and compare the correlation tensor of the autoencoder and the Gabor filters, we calculate statistics for different correlation distances in visual space. The indices of the tensor are illustrated in the schematic in **Figure 6A**. For each distance *r* in visual space we calculate statistics over ρ<sup>(k,m)</sup><sub>j</sub>, where

$$j \in R_r := \left\{ (x, y) \in \mathbb{Z}^2 \,\middle|\, r - \tfrac{1}{2} \le \sqrt{x^2 + y^2} < r + \tfrac{1}{2} \right\}. \tag{15}$$

The mean absolute value of the cross-correlations decreases for larger correlation distances *r* as shown in **Figure 6B**. The mean standard deviation of these absolute correlation values over different spatial directions also decreases but with a steeper slope (**Figure 6C**). To make a relative statement about the isotropy in the correlation tensor we also calculate the coefficient of variation over different directions. Therefore, we define the average anisotropy at radius *r* as

$$\text{anisotropy}(r) := \left\langle \frac{\text{std}_{j \in R_r} \left( \rho_j^{(k,m)} \right)}{\text{mean}_{j \in R_r} \left( \rho_j^{(k,m)} \right)} \right\rangle_{k,m}. \tag{16}$$

This mean anisotropy averaged over all pairs of receptive field types has a local maximum at visual distances of around 8–10 pixels (**Figure 6D**). This suggests that the short range phase connections over this distance contribute more to the synchronization of fine structures. The anisotropy has a local minimum at distances around 15–16 pixels, where long range phase connections are predominantly used to fill in segment pixels with similar colors.
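The distance-binned statistics of equations 15 and 16 can be sketched for a single pair of feature types on a synthetic correlation map (the exponential falloff and noise level are assumptions for illustration, not the measured tensors):

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy absolute cross-correlations between one feature pair over spatial
# offsets (x, y) in [-R, R], decaying with distance plus some jitter.
R = 12
xs, ys = np.meshgrid(np.arange(-R, R + 1), np.arange(-R, R + 1), indexing="ij")
dist = np.sqrt(xs**2 + ys**2)
rho = np.exp(-dist / 5.0) * (1.0 + 0.1 * rng.random(dist.shape))

def anisotropy(rho, dist, r):
    # Eq. 15: the ring R_r collects all offsets at distance ~r.
    ring = (dist >= r - 0.5) & (dist < r + 0.5)
    vals = rho[ring]
    # Eq. 16 (single (k, m) pair): coefficient of variation over directions.
    return vals.std() / vals.mean()

aniso = [anisotropy(rho, dist, r) for r in range(1, 6)]
print([round(a, 3) for a in aniso])
```

In the paper the quantity is additionally averaged over all pairs (*k*, *m*) of receptive field types; the single-pair version above shows the ring construction.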

#### **3.3. SPARSELY CONNECTED OSCILLATOR NETWORK**

The correlation values are used to sample the sparse connections for the simulations of coupled phase oscillators. We restrict the sampled connectivity pattern in simulations of natural scenes to 200 synchronizing and 200 desynchronizing afferent connections per neuron if not stated otherwise. The phase simulations of natural image scenes are run in a network of 200 × 150 × 48 neurons for Gabor features or 200 × 150 × 100 for autoencoder features respectively. Therefore, the percentage of connections that are actually formed compared to all possible connections assuming full connectivity is approximately 0.014% in the case of Gabor features and 0.007% for autoencoder features. Thus, this procedure leads to a very sparse connectivity in comparison to a network of all-to-all interactions.

We evaluate the sampled connectivity based on natural image statistics using graph theoretic measures. The connectivity structure is represented as a graph *G* = (*H*, *E*) as described in section 2.3. We compute the statistics not only over the graph of all connections *E* but also individually for the subgraph of synchronizing connections *E*<sup>+</sup> := {*e*<sup>(j,k)</sup><sub>x,y</sub> ∈ *E* | *e*<sup>(j,k)</sup><sub>x,y</sub> = +1} and the subgraph of desynchronizing connections *E*<sup>−</sup> := {*e*<sup>(j,k)</sup><sub>x,y</sub> ∈ *E* | *e*<sup>(j,k)</sup><sub>x,y</sub> = −1}.

For a graph with edges *E* we calculate the fraction of intrafeature connections as

$$\mu = \frac{\left| \left\{ e^{(k,m)}_{x,y} \in E \,\middle|\, k = m \right\} \right|}{\left| \left\{ e^{(k,m)}_{x,y} \in E \,\middle|\, k \neq m \right\} \right|} \cdot 100\%. \tag{17}$$

The most obvious observation is that the fraction of intra-feature connections is larger for synchronizing connections than for desynchronizing connections (**Table 2**). The reason is that positive correlations, which are used to sample the synchronizing connections, are stronger between instances of the same feature type shifted over visual space. In contrast, negative correlations, and thus desynchronizing connections, are less likely to occur between the same feature type shifted over visual space. Another observation is that the fraction of intra-feature connections for the Gabor features is roughly twice as large as for the autoencoder features. The reason is that we use 100 autoencoder features but only 48 Gabor features, while the total number of sampled synchronizing and desynchronizing connections per feature remains constant.
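As a sketch of equation 17, the intra-feature fraction μ can be computed directly from the feature indices of an edge list (the edge list below is synthetic, with feature types drawn uniformly, not the sampled connectivity):

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical edge list: each connection links feature type k to feature
# type m (the spatial offsets are irrelevant for Eq. 17).
K = 10
edges_k = rng.integers(0, K, 5000)
edges_m = rng.integers(0, K, 5000)

intra = np.count_nonzero(edges_k == edges_m)  # connections with k == m
inter = np.count_nonzero(edges_k != edges_m)  # connections with k != m
mu = intra / inter * 100.0                    # Eq. 17, in percent

print(round(mu, 1))
```

With uniformly drawn feature types μ comes out near 100/(*K* − 1) percent; the measured values in **Table 2** deviate from this because the sampling follows the correlation structure.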

A more elaborate evaluation of the sampled connectivity of our network can be done using the clustering coefficient and the small-world characteristics (Watts and Strogatz, 1998; Humphries et al., 2006), which are also shown in **Table 2**. To define the local clustering coefficient in an infinite graph *G* = (*H*, *E*), we analyze

**FIGURE 5 | Cross-correlations between different feature activations shifted in visual space.** The shown cross-correlations are based on the activation levels induced by natural images. Only a subset of 8 features is shown. The patches on the top and left row show the forward weight matrix of the receptive fields. The other patches show the spatial cross-correlations between the receptive fields as described in the main text.

**FIGURE 6 |** In the bottom schematic the correlation tensor is indexed by *j* ∈ *R<sub>r</sub>* for a certain distance *r* in visual space; the other panels show statistics of the correlations as a function of the distance *r*.

the connectivity of the neurons in a generic feature column at position (*x*, *y*) = (0, 0). We define the neighbors of neuron *g*0,0,*<sup>k</sup>* coding feature type *k* ∈ {1..*K*} as the set of all neurons which are directly connected in the graph as

$$N_k = \left\{ g_{x, y, m} \in H \,\middle|\, e^{(k,m)}_{x,y} \in E \vee e^{(m,k)}_{-x,-y} \in E \right\},\tag{18}$$

where we consider outbound (*e*<sup>(k,m)</sup><sub>x,y</sub>) and inbound (*e*<sup>(m,k)</sup><sub>−x,−y</sub>) connections of the neuron. Then we define the local clustering coefficient of a feature type *k* in our network as the ratio of the number of direct connections between neighbors to the number of pairs of neighbors:


$$\gamma_k = \frac{\left| \left\{ e^{(m,n)}_{x,y} \in E \,\middle|\, g_{\tilde{x} + x, \tilde{y} + y, m} \in N_k \wedge g_{\tilde{x}, \tilde{y}, n} \in N_k \right\} \right|}{|N_k| \cdot (|N_k| - 1)} \tag{19}$$

We show the global clustering coefficients γ = ⟨γ<sub>k</sub>⟩<sub>k</sub> for our sampled networks comprising only the synchronizing, only

#### **Table 2 | Graph theoretic statistics of the sparse connectivity pattern.**


the desynchronizing or all connections in the second row of **Table 2**.

The evaluation of the graph comprising all connections shows that the mean clustering coefficient is roughly the same for the Gabor and the autoencoder features. Evaluating the subgraphs individually, however, reveals that the clustering coefficient of the synchronizing graph alone is higher for the Gabor features than for the autoencoder features. Conversely, the desynchronizing connections show stronger clustering in the case of the autoencoder features. An explanation for this difference is that the autoencoder learns a more diverse set of receptive fields by optimizing the reconstruction error. In comparison, the regular Gabor receptive fields cover only predefined colors, spatial frequencies and orientations, which are not optimized to cover a broad range of statistics in the input images. Therefore, the correlation structure in the Gabor activations shows stronger clustering. For comparison, we also show the corresponding clustering coefficients γ<sub>random</sub> of equivalent networks with the same connection lengths (measured in pixel distance) but rotated by random angles and connected to random features.

We can further use the small-world index to measure the capability of neurons in our network to reach other neurons via a small number of interaction steps. The small-world index is a quantitative definition of the presence of abundant clustering of connections combined with short average distances between neuronal elements, proposed by Humphries et al. (2006). It can characterize a large number of not fully connected network topologies. The connectivity within the 3-dimensional grid of our model is sampled such that it is invariant to shifts in the two image dimensions. Therefore, we have to slightly adapt the small-world index for our infinite horizontal sheet consisting of feature columns with identical connection patterns. We use the definition of the small-world index

$$\sigma_{\text{sw}} = \frac{\gamma/\gamma_{\text{random}}}{\lambda/\lambda_{\text{random}}},\tag{20}$$

where the shortest path lengths λ and λ<sub>random</sub> measure the number of network hops needed to connect two neurons within our sampled network and a random network, respectively. We use the average over all shortest path lengths between all pairs of neurons within one feature column. A network graph must have a small-world index σ<sub>sw</sub> larger than one to meet the small-world criterion. The evaluations show that the graph comprising the synchronizing connections exhibits small-world properties, while the desynchronizing connections are closer to a random connectivity and do not exhibit small-world properties (**Table 2**). The small-world property might be helpful in the synchronization of distant neurons.
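The two ingredients of the small-world index, clustering coefficient and mean shortest path length, can be sketched on a toy undirected graph (a ring lattice, not the sampled oscillator network; the random-network baselines γ<sub>random</sub> and λ<sub>random</sub> of equation 20 are omitted here):

```python
from collections import deque
from itertools import combinations

def clustering(adj):
    # Mean local clustering coefficient (cf. Eq. 19, undirected version):
    # fraction of pairs of neighbours that are themselves connected.
    cs = []
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        if len(nbrs) < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cs.append(links / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(cs) / len(cs)

def mean_shortest_path(adj):
    # BFS from every node; average hop count over all reachable pairs.
    total, count = 0, 0
    for src in adj:
        seen, q = {src: 0}, deque([src])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen[w] = seen[v] + 1
                    q.append(w)
        total += sum(seen.values())
        count += len(seen) - 1
    return total / count

# Toy ring lattice: each node links to its four nearest neighbours.
n = 12
adj = {i: {(i - 1) % n, (i + 1) % n, (i - 2) % n, (i + 2) % n} for i in range(n)}
gamma, lam = clustering(adj), mean_shortest_path(adj)
# Eq. 20 would then be (gamma / gamma_random) / (lam / lam_random), with the
# baselines estimated from an equivalent randomly rewired graph.
print(gamma, lam)
```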

## **3.4. PHASE SIMULATIONS**

The resulting connectivity pattern is used in the phase simulations. All shown simulations of the coupled phase oscillator networks are initialized with random phase variables. The activation levels are only set once in the beginning and remain the same throughout the phase simulations. During the simulations attractors are formed in the phase space and are localized in certain image regions.
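As a rough illustration of such a simulation, a Kuramoto-style network with signed couplings can be integrated with the classical fourth-order Runge-Kutta scheme; the random coupling matrix and all parameters below are synthetic placeholders, not the paper's activation-dependent phase model:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 60
# Signed couplings: +1 synchronizing, -1 desynchronizing, 0 unconnected.
J = rng.choice([-1.0, 0.0, 1.0], size=(N, N), p=[0.05, 0.8, 0.15])
np.fill_diagonal(J, 0.0)

def dphi(phi, tau=1.0):
    # dphi_i/dt = (1/tau) * sum_j J_ij * sin(phi_j - phi_i)
    return (J * np.sin(phi[None, :] - phi[:, None])).sum(axis=1) / tau

def rk4_step(phi, dt=0.05):
    # Classical fourth-order Runge-Kutta update of the phase vector.
    k1 = dphi(phi)
    k2 = dphi(phi + 0.5 * dt * k1)
    k3 = dphi(phi + 0.5 * dt * k2)
    k4 = dphi(phi + dt * k3)
    return (phi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)) % (2 * np.pi)

phi = rng.uniform(0, 2 * np.pi, N)   # random initial phases, as in the paper
for _ in range(200):
    phi = rk4_step(phi)

sync = np.abs(np.exp(1j * phi).mean())   # global order parameter in [0, 1]
print(sync)
```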

A simulation of the coupled phase oscillator model with localized connectivity and uniform activation levels shows that pinwheel structures form in the phase map (**Figures 7A,B**). The connectivity length in the network determines the scale of the pinwheels. During the simulation these pinwheels attract each other and annihilate (Wolf and Geisel, 1998). The probability of pinwheel formation decreases for network connectivity patterns that are less locally dense and more sparsely spread out.

In the next simulations we use several feature types to encode different aspects of the input images. To visualize the resulting 3-dimensional structure of phase variables ϕ<sub>x,y,k</sub> we calculate the circular mean at each image position, weighted by the corresponding activation levels:

$$\varphi^{\text{avg}}_{x,y} := \arg \left( \sum_{k} g_{x,y,k} \, e^{i\varphi_{x,y,k}} \right), \tag{21}$$

where arg is the complex argument. We show the average phase variables ϕ<sup>avg</sup><sub>x,y</sub> coded as color hue to visually represent the circular structure of the phase.
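The activation-weighted circular mean of equation 21 and its mapping to color hue can be sketched as follows (phases and activations below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
X, Y, K = 5, 5, 8
phi = rng.uniform(0, 2 * np.pi, (X, Y, K))   # phase variables
g = rng.random((X, Y, K))                    # activation levels (weights)

# Eq. 21: activation-weighted circular mean of the phases at each position.
z = (g * np.exp(1j * phi)).sum(axis=-1)
phi_avg = np.angle(z) % (2 * np.pi)          # complex argument in [0, 2*pi)

# The circular phase maps directly onto a color hue in [0, 1).
hue = phi_avg / (2 * np.pi)
print(hue.shape)
```

Mapping the circular phase onto hue is natural because hue is itself periodic, so phase 0 and phase 2π receive the same color.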

We use two simple artificial stimuli to demonstrate the basic function of the phase simulation in the presence of structure in the activation variables (**Figures 7C,D**). The stimuli of these simulations are artificially generated grayscale images containing bar segments and circle segments (insets in **Figures 7C,D**). The connectivity in both simulations is based on Gabor receptive fields with horizontal connectivity obtained from statistics of natural images. In the simulation of two collinearly aligned bars, the phases of the neurons coding the two bars synchronize although the

two bars are not directly connected in the image (**Figure 7C**). Because neurons are grouped together by sharing the same phase value, this suggests that the simulation implements Gestalt laws of grouping. Specifically, a human observer could interpret these two bars as one single continuous line, so the simulation can be interpreted as implementing the Gestalt law of continuity. Please note that in the simulation the gap between the two bars is not filled in, because our model does not incorporate any feedback from the phase variables to the activation variables. In this study we focus on relational coding by phase variables and therefore neglect any recurrent dynamics in the activation variables.

The other simulation uses a dashed black circle as input (**Figure 7D**). The phase map shows that all segments of the circle synchronize to the same phase value. The synchronized state of the circle means that the phase variables at the different segments code the global attribute and bind the individual circle segments together. Similarly, humans usually perceive the circle segments all together as one single object. This indicates that the phase simulation can also implement the Gestalt law of closure. Depending on the initialization of the random phase variables, cases exist where the circle does not synchronize to one coherent phase but forms a continuous phase progression from 0 to 2π one or multiple times. On the one hand, these simulations reproduce previous studies demonstrating the binding properties of coupled neural oscillators. On the other hand, the connectivity in these simulations is learned from natural stimuli and not hand crafted. Hence, they demonstrate that these Gestalt properties are learned from the statistics of natural stimuli.

We next evaluate the concept of binding by synchrony on natural visual scenes. All following simulations in this paper use color images from the LabelMe database (Russell et al., 2008) and either the Gabor filters or the autoencoder filters to generate the activation levels for the network. An example of a suburban scene is shown in **Figure 8A** with the corresponding human labeled segmentation masks in **Figure 8B**. We use the time constant τ = 1/3 for the simulations based on Gabor filters and τ = 1/30 for the simulations based on autoencoder filters. These values were chosen such that, per iteration of the classical Runge-Kutta solver, the phase of no more than 1% of all neurons changes by more than π/2. The units of these time constants are arbitrary because our model of coupled phase oscillators describes the change in phase independent of the oscillation period. Examples of the resulting phase maps are shown in **Figure 8C** for Gabor activations and **Figure 8D** for autoencoder activations. The phase maps of simulations using autoencoder weights are blurred compared to those using the Gabor filters, because the peaks of the receptive fields are not necessarily centered within the convolutional weight matrix, leading to shifts in visual space between different feature maps at segment boundaries. Yet in both examples an intuitive segmentation of the original image can be recognized in the distribution of phase values, and we observe steadily increasing phase synchrony within labeled segments. This example suggests that high-level image objects are likely to synchronize to a coherent phase.

#### **3.5. EVALUATION OF PHASE MAPS**

We evaluate the simulated dynamic phase maps and compare them with human labeled binary segmentation masks of high level image objects from the LabelMe database. We begin with an evaluation of the resulting phase maps independently from the labeled image masks to show global properties of the coupled phase oscillator model and the influence of the number of horizontal connections (section 3.5.1). This is followed by an evaluation of the phase synchrony within labeled segments with respect to the surrounding of the segments (section 3.5.2). Finally a local evaluation of the phase maps at the boundaries of labeled segments is presented (section 3.5.3).

#### *3.5.1. Phase synchrony*

Segmentation and binding of neurons in the network can only be achieved if the phase variables are not random but also not completely synchronized. Therefore, we will first evaluate the local phase synchrony independent of segments in the image. We define the synchrony in a population *M* of neurons as

$$p_M = \left| \frac{\sum_{m \in M} g_m \, e^{i\varphi_m}}{\sum_{m \in M} g_m} \right|, \tag{22}$$

where *M* is defined as a set of 3-dimensional indices describing the position of the neurons.
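Equation 22 translates directly into a few lines of NumPy (the populations below are synthetic):

```python
import numpy as np

def phase_synchrony(g, phi):
    """Eq. 22: activation-weighted circular order parameter of a population.
    Returns 1 for identical phases and values near 0 for uniform random phases."""
    g, phi = np.asarray(g, float), np.asarray(phi, float)
    return np.abs((g * np.exp(1j * phi)).sum()) / g.sum()

rng = np.random.default_rng(4)
g = rng.random(1000)
sync_full = phase_synchrony(g, np.full(1000, 0.7))           # all phases equal
sync_rand = phase_synchrony(g, rng.uniform(0, 2 * np.pi, 1000))  # random phases
print(sync_full, sync_rand)
```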

In this section we analyze the simulation shown in **Figure 8** in more detail and evaluate how the number of synchronizing and desynchronizing connections affects the phase synchrony. We evaluate the local phase synchrony at image position (*x*, *y*) for a

certain radius *r* by calculating *p*<sub>M<sub>x,y,r</sub></sub> for neurons at positions

$$M_{x,y,r} = \left\{ (\tilde{x}, \tilde{y}, k) \,\middle|\, (x - \tilde{x})^2 + (y - \tilde{y})^2 < r^2,\ (\tilde{x}, \tilde{y}) \in \mathbb{N}^2,\ k \in \{1..K\} \right\}, \tag{23}$$

where *K* is the number of feature maps. We average this quantity over all possible image positions (*x*, *y*). This mean local phase synchrony is shown in **Figure 9** for simulations using different numbers of connections, at different iterations, and for different radii *r*.

When the network has reached a steady state, the mean local phase synchrony depends on the number of synchronizing and desynchronizing connections (**Figures 9A,D**). The number of synchronizing connections increases the average local phase synchrony. In contrast, the number of desynchronizing connections can increase or decrease the average local phase synchrony depending on the number of synchronizing connections. At first sight this may be counterintuitive. In the case of few synchronizing connections, the desynchronizing connections repel the associated phase variables from each other, which ultimately leads to a clustering in the circular phase space evoked by the desynchronizing interactions. In the case of more synchronizing connections, the network is driven mainly toward its attractor states, and therefore desynchronizing connections decrease the overall phase synchrony.

The phase synchrony in the steady state condition increases with the ratio between synchronizing and desynchronizing connections up to a ratio of 16 times more synchronizing than desynchronizing connections (**Figures 9B,E**). Interestingly, the phase synchrony in the steady state condition decreases again in simulations with more than 800 synchronizing connections and very few desynchronizing connections. During the transient phase a very low or high ratio leads to a faster convergence to a more synchronized state. The slowest convergence is achieved at the cases with 4 times more desynchronizing connections or when the number of synchronizing and desynchronizing connections is balanced.

The phase simulations show synchronization behavior at a large variety of spatial scales (**Figures 9C,F**). The level of synchrony at the steady state decreases with increasing radius of the phase synchrony evaluation. At all spatial scales the time to reach the steady-state synchrony level is roughly the same; only very localized regions over 1–2 pixel distances show a slightly faster convergence to the final phase synchrony level. Unless stated otherwise, we select for all simulations and evaluations an intermediate parameter range with balanced synchronizing and desynchronizing connections, leading to rich dynamics. These standard parameters are marked with blue circles in **Figure 9**.

#### *3.5.2. Segmentation index*

The dynamic binding and segmentation of the simulated phase maps of natural images are evaluated using hand labeled segmentation masks. Here a baseline is necessary to account for the higher probability of synchronization between neurons that are close by. Consequently, we apply the labeled image masks to the corresponding simulated phase maps and compare the result to a baseline obtained by applying the same image masks to simulations of different, non-matching images.

**FIGURE 9 |** Simulations in the top row **(A–C)** are based on Gabor weights; simulations in the bottom row **(D–F)** are based on autoencoder weights. Blue circles indicate the standard parameters for subsequent evaluations. Colorbars of all panels are the same and shown on the right. The panels in the left column **(A,D)** show the phase synchrony after 20 iterations for different numbers of excitatory and inhibitory connections. The middle panels **(B,E)** show the time course of the average phase synchrony for the diagonal elements marked with red arrows in panels **(A,D)**. In the right panels **(C,F)** the phase synchrony is shown for different sizes of the local circular region of the evaluations; the red circle indicates the radius used in the evaluations shown in the other panels.

The segmentation masks in the LabelMe database are specified as polygons on the images, which are initially reduced in our simulation to a resolution of 400 × 300 pixels. The convolutional forward projections lead to a further reduction of the feature representation to a grid of 200 × 150 pixels. Therefore, we restrict the evaluations of the phase maps to segmentation masks which contain at least as many pixels as the specified patch size of the forward projections (6 × 6 neurons corresponding to 12 × 12 pixels in the input image). In addition, segments occupying more than half of the respective images are excluded to allow evaluations against a baseline synchrony of the surrounding regions. The range of labeled segments used in our evaluations is shown as a horizontal bar in **Figure 10**. Only in evaluations where the segment sizes are explicitly stated do we also evaluate these otherwise excluded very small and very large segments.

The number of labeled segments in the database decreases for larger segment sizes (**Figure 10A**). Yet the total area occupied by segments in the different bins increases for larger segment sizes (**Figure 10B**). Therefore, when applying labeled masks to non-matching images, small segments are highly likely to fall into large segments where a large number of tangential connections is functionally active. Consequently, the phase synchrony within labeled segments alone is not a sufficient baseline for an unbiased comparison with simulations of non-matching images, and we need a baseline that controls for the unequal distribution of segment sizes and the image regions they occupy.

To account for the statistics of segment sizes in the evaluation of the matching and non-matching natural scenes, we define a segmentation index (**Figure 11**) that sets the phase synchrony in segments into the context of the surrounding neurons. Concretely, the segmentation index evaluates how much more or less synchronized the phases of neurons inside a segment are compared to the synchrony of random neurons inside and outside of the segment. The neighborhood *N* of a segment *Q* is generated by repeatedly applying a diamond shaped grow operation to the segmentation mask until the number of neurons in *N* is double that of the original segment *Q*. Therefore, *N* is the union of the segment *Q* and the surrounding *R* of the segment (*Q* and *R* are annotated in the example shown in **Figure 11**).

We calculate the phase synchrony values *p*<sub>Q<sub>j</sub></sub> and *p*<sub>N<sub>l</sub></sub> for random subsets *Q<sub>j</sub>* ⊂ *Q* and *N<sub>l</sub>* ⊂ *N*, where *j*, *l* ∈ {1,..., 100} and each subset contains up to 1000 neurons. We define the segmentation index of segment *Q* as the difference between the mean synchrony within the segment *Q* and the mean synchrony in the neighborhood *N* = *R* ∪ *Q*:

$$\kappa(Q, N) = \left\langle p_{Q_j} \right\rangle_j - \left\langle p_{N_l} \right\rangle_l. \tag{24}$$
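A sketch of the segmentation index of equation 24 on synthetic data (the subset size of 50 and the toy segment are illustrative assumptions, not the paper's sampling parameters):

```python
import numpy as np

def segmentation_index(phi, g, seg_idx, nbhd_idx,
                       n_subsets=100, subset_size=50, rng=None):
    """Eq. 24 (sketch): mean phase synchrony over random subsets of the
    segment Q minus mean synchrony over random subsets of the
    neighbourhood N = Q u R."""
    if rng is None:
        rng = np.random.default_rng(0)
    def sync(idx):
        # Eq. 22 applied to the sampled subset.
        return np.abs((g[idx] * np.exp(1j * phi[idx])).sum()) / g[idx].sum()
    p_q = [sync(rng.choice(seg_idx, subset_size)) for _ in range(n_subsets)]
    p_n = [sync(rng.choice(nbhd_idx, subset_size)) for _ in range(n_subsets)]
    return np.mean(p_q) - np.mean(p_n)

# Toy example: the segment shares one phase while its surrounding is
# random, so the segmentation index should come out clearly positive.
rng = np.random.default_rng(5)
phi = np.concatenate([np.full(500, 1.0), rng.uniform(0, 2 * np.pi, 500)])
g = np.ones(1000)
kappa = segmentation_index(phi, g, np.arange(500), np.arange(1000))
print(kappa)
```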

The segmentation index increases over simulation iterations for matching and non-matching masks and images (**Figure 12A**). The matching conditions have a steeper ascent and reach a higher segmentation index compared to the non-matching conditions. The difference between the matching segmentation index and the non-matching segmentation index increases for both simulations using Gabor weights and autoencoder weights (**Figure 12B**). The simulations using regular Gabor receptive fields show larger differences between matching and non-matching segmentation indices compared to the autoencoder weights. The ratio between matching and non-matching segmentation indices is roughly the same for both types of receptive fields. This demonstrates systematic binding in the phase maps of matching segments.

An evaluation for different segment sizes individually reveals more differences between the Gabor and autoencoder features. The evaluations of the matching conditions show that the segmentation index increases for larger segments in the case of the autoencoder features but decreases for larger segments in the case of the Gabor features (**Figure 12C**). An explanation is that the autoencoder contains more features with low spatial frequencies while the Gabor features are restricted to one specific spatial frequency.

The paired difference between matching and non-matching evaluations shows that the Gabor filter and the autoencoder have roughly the same performance for large segment sizes (**Figure 12D**). For small segment sizes the autoencoder has a decreased segmentation performance. One possible explanation might be that the receptive field weights are not centered (compare **Figure 3**) and therefore different feature neurons might be slightly misaligned relative to the hand labeled segmentation masks, which are defined as polygons with arbitrary precision on the image.

Overall the results show a significant difference between the matching and the non-matching segmentation indices for all evaluated segment sizes. The paired difference between the matching and the non-matching conditions increases as the simulation of the randomly initialized phase variables slowly converges to a state with clusters in the circular phase space. After about 20 network iterations the paired difference in the segmentation index reaches a high plateau. Therefore, the coupled phase oscillator model achieves a stable segmentation of the natural image scenes with a coding of binding by synchrony.

## *3.5.3. Segment boundaries*

To evaluate how well the phase maps segment different labeled regions at their borders, we calculate a metric at random locations on segment boundaries. We sample 50 random locations from all boundary lines of the segments in each simulated image from the LabelMe database. At these locations we use the angle of the segment boundary to divide a local region into two semicircles with a radius of 10 pixels, such that one half lies approximately within the segment and the other half outside of it (**Figure 13A**). The mean phase difference between both semicircles decreases over simulation time (**Figure 13B**). The paired difference between matching and non-matching images shows that the phase difference across matching segment boundaries is significantly larger (**Figure 13C**).

The evaluation of the phase difference as a function of the size of this circular region shows that the segmentation performance using autoencoder features decreases for very small regions (**Figures 13D,E**). This might be due to the misalignment of the learned receptive field centers described above. For very large evaluation regions the performance decreases for both receptive field types because the circular regions are likely to extend beyond the hand-labeled segment regions.

It is possible to evaluate the segmentation performance of the dynamic binding maps without the need for a baseline on non-matching images if we use an unbiased performance estimator with a clearly defined chance level. To this end, we measure how well the phase map can predict the angle of the borders of segmentation masks. We use the phase variables at randomly sampled locations on segment boundaries (**Figure 13A**) and compute the image direction with the largest change in the phase variables. We define the local variance in phase at image position (*x*, *y*) as

$$\mathfrak{d}\_{x,y} = 1 - \frac{1}{5K} \left| \sum\_{k=1}^{K} e^{i\varphi\_{x,y,k}} + e^{i\varphi\_{x-1,y,k}} + e^{i\varphi\_{x,y-1,k}} + e^{i\varphi\_{x+1,y,k}} + e^{i\varphi\_{x,y+1,k}} \right| \tag{25}$$

where the sum is over all *k* ∈ {1, . . . , *K*} feature maps. We use the structure tensor of the local variance in phase to estimate the principal directions. To compute the structure tensor we use a Gaussian window function with a standard deviation of 3 pixels and the second-order central finite difference of the local variance in phase. The eigenvector of the structure tensor gives an estimate of the border direction of the segmentation mask. The evaluation of the phase maps shows that the mean error in the estimation of the boundary angles decreases over simulation time (**Figure 13F**). A minimum is reached after around 20 network iterations with an error of approximately 28° compared to the chance level of 45°. This demonstrates that the phase gradient systematically aligns itself orthogonally to the segment boundaries.
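The boundary-angle estimation described above can be sketched as follows. Array layout and helper names are assumptions for illustration, and `scipy.ndimage.gaussian_filter` stands in for the Gaussian window function of the structure tensor:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_phase_variance(phases):
    """Local variance in phase (Equation 25): 1 minus the normalized
    resultant of each unit and its 4-neighborhood, summed over the K
    feature maps.  phases: array of shape (K, H, W).  Image borders,
    where neighbors are missing, get slightly inflated values."""
    z = np.exp(1j * phases)
    s = z.copy()
    s[:, 1:, :] += z[:, :-1, :]     # neighbor at y - 1
    s[:, :-1, :] += z[:, 1:, :]     # neighbor at y + 1
    s[:, :, 1:] += z[:, :, :-1]     # neighbor at x - 1
    s[:, :, :-1] += z[:, :, 1:]     # neighbor at x + 1
    K = phases.shape[0]
    return 1.0 - np.abs(s.sum(axis=0)) / (5.0 * K)

def boundary_angle(phases, x, y, sigma=3.0):
    """Estimate the local border direction (radians, modulo pi) from the
    structure tensor of the local phase variance."""
    v = local_phase_variance(phases)
    gy, gx = np.gradient(v)                  # central finite differences
    Jxx = gaussian_filter(gx * gx, sigma)    # Gaussian window, sigma = 3 px
    Jxy = gaussian_filter(gx * gy, sigma)
    Jyy = gaussian_filter(gy * gy, sigma)
    # orientation of the dominant eigenvector = local gradient direction;
    # the segment border runs orthogonal to it
    grad_angle = 0.5 * np.arctan2(2.0 * Jxy[y, x], Jxx[y, x] - Jyy[y, x])
    return (grad_angle + np.pi / 2) % np.pi

# vertical phase discontinuity -> estimated border angle close to pi/2
phases = np.zeros((1, 40, 40))
phases[:, :, 20:] = np.pi
print(boundary_angle(phases, 20, 20))
```

The closed-form orientation `0.5 * atan2(2 Jxy, Jxx - Jyy)` is the standard eigenvector angle of a 2×2 symmetric structure tensor, which avoids an explicit eigendecomposition.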

#### **4. DISCUSSION**

Here we investigate the concept of binding by synchrony, previously studied with abstract stimuli, in the context of unsupervised learning and natural stimuli. The model consists of coupled phase oscillators with a connectivity based on natural image statistics. Specifically, the correlation of neuronal activity governs the structure of local horizontal connections in the network. Hence the connections are not constructed according to a heuristic or intuition, but are solely data driven. Therefore, we can expect the model to generalize well to other cortical areas. We show that the sampled sparse connectivity, based on positive correlations in activations induced by natural stimuli, exhibits small-world properties. We hypothesize that the small-world property is a signature of Gestalt laws in the form of regular local correlations (objects) that can be flexibly combined on a global scale. We show that these horizontal connections influence the dynamics of the phase variables such that an effective coding of contextual relationships between active neurons is implemented by phase synchronization. Therefore, our results reveal that the concept of binding by synchrony is viable for natural stimuli.

The evaluation of phase synchronization as a code for grouping and segmentation utilizes hand-labeled image segments, corresponding to high-level objects, as ground truth. The evaluations reveal that the phase maps bind active neurons together when they encode different attributes of the same stimulus. It follows that the phase variables code global stimulus attributes, in contrast to the coding of local stimulus attributes by the rate variables. The coding of these global contextual relationships is not directly influenced by the rate variables but only by their indirect modulation of the phase interactions. Furthermore, we illustrate that discontinuities are formed in the phase maps at the borders of segments and that these discontinuities can predict the orientation of segment boundaries. Therefore, our results suggest that the segmentation driven by bottom-up dynamical processes using natural image statistics matches, to a certain degree, the top-down labeling of abstract image objects.

Our study connects three different subject areas: natural image statistics, dynamical models of neural networks and normative models of sensory processing. In the following we will discuss the motivations and implications of our study from each of these perspectives.

## **4.1. CHOICE OF NATURAL STIMULI**


The choice of "natural" stimulus material is not as obvious as it might seem. A more natural choice from a biological perspective would be to use stimulus material generated by a moving agent. For example, videos from a camera mounted to a cat's head were used previously to analyze the spatio-temporal structure of natural stimuli (Kayser et al., 2003). A similar setup from a human perspective is also possible (Açik et al., 2009). However, time-variant stimuli require more computational resources, and the high number of horizontal connections in our simulations is computationally expensive even though it is implemented as a vectorized operation. In addition, the analysis of the phase segmentation maps would be more difficult in the case of moving stimuli because of the unknown time lag between stimulus onsets and the resulting dynamic phase maps. Therefore, we decided not to use videos as stimulus material in the present study.

Differences in eye movements for different stimulus classes might also play a role in shaping the statistics of the visual input received by the primary visual cortex. There might be important interactions between saccadic eye movements and the dynamics of the horizontal connections in the visual cortex. One could simulate saccadic movements on static images using saliency maps and use the resulting images for the feedforward processing in our model. However, as with moving stimuli in general, this would complicate the analysis and would not contribute directly to the understanding of the central questions of binding by synchrony.

The LabelMe database provides a large set of static images only. It has the advantage that the images are accompanied by labeled region masks of well-defined objects. These high-level labeled masks often overlap in the case of part-based segmentations of objects, and the segmentation evaluation is difficult in the case of occluded objects. Nevertheless, the LabelMe database allows us to investigate the relationship between natural image statistics and the coding of high-level image concepts. Therefore, we think it is a reasonable choice for our study.

## **4.2. BIOLOGICAL PLAUSIBILITY**

As with most computational neural network models, we have to ask to what extent it is biologically plausible. To advance our knowledge about the underlying computational principles of the cortex, it is always a good choice to model only the level of detail which is necessary to explain the phenomena under investigation. This keeps the abstraction level of the model appropriate, although it is very likely that some mechanisms below the level of detail modeled here play an important role in synchronization phenomena. In our simulations we implement the influence of correlated neuronal activity on long time scales on the network connectivity. Based on these connections, we show how the dynamics on fast time scales can code for segmentation and binding. We therefore have to model the behavioral learning time scales (>days), to capture the natural image statistics, and the dynamical network time scales (<seconds) simultaneously. Consequently, we consider the chosen network architecture of segregated rate-based and phase-based coding suitable to investigate the role of correlated neuronal activity in the network dynamics and relational coding by synchronization.

The Kuramoto model restricts the dynamical interactions between coupled oscillators to a scalar phase variable. Breakspear et al. (2010) review this simplified model of coupled phase oscillators in the context of models of complex neurobiological systems. They find that it captures the core mechanisms of neuronal synchronization and a broad repertoire of rich, non-trivial cortical dynamics. Studies of the Kuramoto model mostly focus on regularly defined phase interactions without a separate network variable representing the activation levels of the oscillator neurons. This allows using mean-field approximations to further simplify the analysis of the Kuramoto model. In contrast, our study focuses on the simulation of heterogeneous connections which are modulated by heterogeneous activation levels induced by natural stimuli. Therefore, our simulation model is more similar to the diverse activations and connections found in biological neural systems but this comes with the drawback that a mean-field approximation is not warranted.
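The classical Kuramoto dynamics referenced above can be sketched in a few lines. The paper's model additionally modulates the coupling by heterogeneous, stimulus-driven activation levels, which is not reproduced here; this is a minimal sketch of the unmodulated case with identical natural frequencies:

```python
import numpy as np

def kuramoto_step(phi, omega, W, dt=0.05):
    """One Euler step of the Kuramoto model: each oscillator's phase is
    pulled toward its neighbors through a sinusoidal interaction.
    phi: (N,) phases, omega: (N,) natural frequencies, W: (N, N) coupling."""
    diff = phi[None, :] - phi[:, None]          # diff[i, j] = phi_j - phi_i
    return phi + dt * (omega + (W * np.sin(diff)).sum(axis=1))

rng = np.random.default_rng(1)
N = 50
phi = rng.uniform(0, 2 * np.pi, N)
omega = np.ones(N)               # identical natural frequencies
W = np.full((N, N), 0.5 / N)     # weak all-to-all coupling
for _ in range(2000):
    phi = kuramoto_step(phi, omega, W)

order = np.abs(np.mean(np.exp(1j * phi)))  # Kuramoto order parameter
print(order)   # close to 1: the population has synchronized
```

Replacing the uniform `W` with a sparse, correlation-derived coupling matrix turns this sketch into the heterogeneous setting the paper simulates, at which point the mean-field order parameter is no longer an adequate summary, as noted above.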

In principle two biological interpretations of the coupled phase oscillator model are possible. A conservative standpoint is an interpretation as a neural field model in which each network unit of our simulation represents a functional module, i.e., a cortical column, which is comprised of many biological neurons. In this case the phase variables would represent the average phase of a set of biological neurons, i.e., the phase of the local field potential. A second, more fine-grained interpretation in which the phase oscillators represent individual biological neurons might seem far-fetched and oversimplified at first sight. Nonetheless the interpretation of the phase variables as spike timings might give further ideas about possible extensions of our proposed model. In this interpretation the oscillators represent the limit cycles of the dynamics of spike generation of biological neurons. The sinusoidal interaction function can then be related to an integral over the phase response function of a spiking neuron (Sturm and König, 2001). Furthermore, the spike interpretation could motivate the introduction of conduction delays in our model. This in turn might further allow studying spike-timing dependent plasticity in the context of a normative model.

Certainly, there are many phenomena that can only be modeled by more detailed spiking neuron models. For example spike-timing dependent plasticity could only be modeled with the phase oscillator model if we assume regular oscillatory firing but not in the case of irregular firing. For example, the ability of self-organizing recurrent networks (SORN) to learn spatio-temporal structures in the input depends on spike-timing dependent plasticity and irregular firing (Lazar et al., 2009). Similarly, Buonomano and Maass (2009) showed that spatiotemporal processing of natural stimuli can emerge from the dynamics of "hidden" neuronal states, such as short-term synaptic plasticity. Irregular firing is also needed for synfire chains of successively activated neural assemblies to explain the physiological measurements of spike patterns recurring with millisecond precision (Abeles, 1982). However, it might be possible to simulate some properties of synfire chains if we add more hierarchical layers and phase conduction in the feed forward projections in our model. Kumar et al. (2010) analyzed the coexistence of firing rate propagation and synchrony propagation in feed forward networks. Last but not least, self-organized criticality and cortical avalanches (Beggs and Plenz, 2003) can probably only be modeled with more detailed spike-based neuron models because the phenomenon requires a dynamical system of more complex coupled oscillators.

There are also other dynamical models of neural networks that were analyzed in the context of scene segmentation (Tononi et al., 1992). Wang and Terman (1997) described the local excitatory global inhibitory oscillator network (LEGION), which is comprised of units described by two differential equations that explicitly model a stable periodic orbit alternating between two phases with rapid transitions between them. This model has the advantage that fast synchronization of the coupled oscillators is possible. But it simulates each neuronal oscillation on a fast timescale, and the synchronization of a population of neurons is only visible at certain simulated time points. In contrast, our phase model simplifies the phase plane to a continuous phase variable averaged over many oscillatory periods, so that the phase relationships between all pairs of neurons are explicitly represented at all simulation time points. Another difference is that the implementation of LEGION involves many discontinuous operations to reduce the computation time. These discontinuous operations prevent a normative model approach with optimizations using gradient descent. The fully continuous dynamics in our model allows further optimization of the horizontal connectivity using gradient descent methods.

In our model the forward connections are computed once and are then fixed during the phase simulation of horizontal connections. This is a very simplified model compared to the ongoing simultaneous processing of afferent and recurrent inputs in the cortex. But it is compatible with the fact that self sustained activity in the cortex can be measured also in the absence of stimulus inputs. Furthermore, computational models of cellular and network behavior support the conclusion that the cortical network operates in a recurrent rather than a purely feed-forward mode (Mariño et al., 2005). Therefore, it makes sense to simulate the lateral interactions decoupled from the time scale of forward projections that generate the activation levels.

We use the correlated neuronal activation levels as the probability to form horizontal intralayer connections. It was shown that the measured horizontal connectivity in the visual cortex of cats is indeed proportional to the correlation between receptive field wavelets in image statistics (Betsch et al., 2004). Our choice to use a sparse connectivity pattern instead of full connectivity with heterogeneous connection strengths was initially intended as a computational shortcut to allow large-scale simulations. This sparse connectivity is in line with biological horizontal connectivity and reveals interesting properties that deserve further investigation. In the brain the binding of stimulus representations has to be distributed over many cortical areas. It was shown with graph theoretic measures that the sparse connectivity within the cortex is organized in hubs and shows properties of small-world networks (Sporns et al., 2004). One can speculate that this allows binding by temporal structure even between stimulus representations over distant cortical regions. Also in our network model the sampled sparse connection patterns generated from correlated neuronal activity were shown to have small-world properties in the case of synchronizing connections. Accordingly, we see in our network simulations fast synchronizations of distant neurons that are not directly connected. In future studies our model could be extended to simulate even synchronizations between different cortical regions.
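The sampling of sparse connections from correlated activation levels can be illustrated as follows. The paper's exact sampling procedure is not fully specified in this excerpt; this sketch assumes each unit draws a fixed number of outgoing connections with probability proportional to the positive part of the activity correlations, and all names are illustrative:

```python
import numpy as np

def sample_connectivity(activations, n_conn, rng):
    """Sample a sparse directed connection matrix: the probability of a
    connection from unit i to unit j is proportional to the positive part
    of the correlation between their activations over a stimulus ensemble.

    activations: (n_units, n_stimuli) array of unit responses.
    n_conn: number of outgoing connections sampled per unit.
    """
    C = np.corrcoef(activations)
    np.fill_diagonal(C, 0.0)          # no self-connections
    P = np.clip(C, 0.0, None)         # keep positive correlations only
    A = np.zeros_like(P, dtype=bool)
    for i in range(P.shape[0]):
        p = P[i] / P[i].sum()
        targets = rng.choice(P.shape[0], size=n_conn, replace=False, p=p)
        A[i, targets] = True
    return A

rng = np.random.default_rng(2)
# two groups of units sharing a common signal -> high within-group correlation
shared = rng.standard_normal((2, 500))
acts = 0.5 * rng.standard_normal((100, 500))
acts[:50] += shared[0]
acts[50:] += shared[1]

A = sample_connectivity(acts, 5, rng)
within = A[:50, :50].sum() + A[50:, 50:].sum()
print(within / A.sum())   # most sampled connections stay within a group
```

In the toy example, nearly all connections land inside the correlated groups, mirroring the "regular local correlations" that the small-world hypothesis above builds on.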

In the cortex a wide range of oscillatory frequencies at different spatial scales occur with cross-frequency couplings. This is highly prominent in different sleep stages (Belluscio et al., 2012) and plays an important role in memory encoding (Friese et al., 2012). Our model is highly simplified in the sense that all neurons are assumed to have the same oscillatory natural frequency. We simulate only horizontal connections between neurons with similar physiological properties which are operating in the same dynamical regime. In this context, the assumption that all active neurons are close to a similar dynamical limit cycle seems reasonable. In future work, several cortical rhythms could be implemented using several phase variables per neuron. One can conceive different algebraic structures which could efficiently represent cross-frequency couplings in the cortex. This would allow investigating fractal binding at different abstraction levels and segmentation at different scales.

In summary, the architecture of our model captures many important aspects of biological neural networks. In particular, it models the dynamical properties used for contextual coding and the unsupervised learning of statistics in natural stimuli. At the same time, our model keeps the simplicity required for the analysis of the network dynamics and allows relatively simple evaluations of the resulting phase relationships.

## **4.3. COMPARISON WITH OTHER NORMATIVE MODELS**

In recent years the abstraction from complex differential equations describing biological neural networks to normative models of rate-based sensory processing improved our knowledge of the underlying computational principles of the cortex (Olshausen and Field, 2005). Unsupervised learning of the inherent statistics in the sensory input seems to be one of the main mechanisms governing the structural connectivity between neurons in low level sensory areas of the cortex (Olshausen and Field, 1996; Wiskott and Sejnowski, 2002; Körding et al., 2004). On the other hand, relatively few studies have investigated the relationship between unsupervised learning using correlated neuronal activity and the coding of contextual relationships through binding by synchrony. In this section we describe differences and similarities between our model and other normative models of sensory processing in the brain.

Wyss et al. (2006) and Franzius et al. (2007) show that rate-coding neurons form a hierarchy of processing stages resembling the ventral visual pathway. These studies use optimization functions of optimal stability and decorrelation while exposing the network to natural stimuli. Although these models provide important insights into the information processing mechanisms in the cortex, they don't take into account the processing of contextual information and lack an implementation of relational coding between different features. In a similar way to these studies, we use the statistics of natural stimuli not only to learn feature representations but also to explain relational coding in the context of binding by synchrony. This approach could allow combining multi-scale image segmentation and object recognition into a hierarchical neuronal network model. A prerequisite for analyzing the segmentation by synchrony in a hierarchical network is an unsupervised learning of the feed-forward connections to generate the activation levels for higher network layers. We have shown that the proposed segmentation by synchrony works with receptive fields obtained from convolutional autoencoders, which can be stacked to obtain the forward and backward connections within a hierarchy. This allows a completely unsupervised learning of feed-forward, feed-back and intralayer connections using natural image statistics. Binding and extraction of features can be accomplished simultaneously within the hierarchy.

Biologically inspired autoencoder models were shown to be efficient for unsupervised learning of receptive fields by minimizing the reconstruction error of the input (Coates et al., 2010). Complex-valued autoencoders have, similar to our model, two variables per network node (Baldi and Lu, 2012). To our knowledge, the available publications investigating complex-valued autoencoders focus mainly on the aspect of learning compressed representations of complex-valued inputs. They do not directly address the biological motivation of binding by synchrony. They are usually strictly defined on the standard complex algebra and are not described by a differential equation which corresponds to coupled oscillators. The formalism of complex-valued autoencoders might be adapted to allow further abstractions of our model. This could support our understanding of the underlying computational principles of visual grouping and segmentation.

A very different and novel approach to coding contextual information in autoencoder networks is the mean-covariance restricted Boltzmann machine (Ranzato and Hinton, 2010). In these models, latent hidden factors are used to efficiently represent the contextual information in the input, in addition to the usual representation of pixel means in standard restricted Boltzmann machines. It was shown that the model can efficiently code pixel covariances in analogy to complex cells and pixel means in analogy to simple cells. However, the coding of contextual information in these models is limited to pair-wise interactions in the input layer. Therefore, this kind of generative model can capture only a linear combination of second-order statistics, so that contextual interactions between large groups of neurons are only possible through direct connections. In contrast, the grouping in our model is a dynamic process in which interactions between neurons are possible without a direct connection between them, through intermediate neurons. The reason is that our model uses a dynamical system approach with recurrent connections, in contrast to probabilistic modeling of forward and backward connections.

Some mathematical theories of cortical processing mechanisms also take the contextual information into account. For example, the free energy principle (Friston, 2010) and the theory of coherent infomax (Kay and Phillips, 2011) explicitly incorporate the context into single-variable local processors in the network. In contrast, the model presented in this paper takes the context into account in a separate phase variable, which codes relational properties similar to the dynamics on fast time scales in biological neural networks. Thereby our simulation allows modeling higher-order relational structures with a limited number of horizontal connections. In contrast, in the mathematical formalization of coherent infomax, the contextual field input is assumed to be integrated into a single-variable output of a local processor in the network. Thereby it does not allow implementing higher-order relations between many local processors if computational resources are limited. This limitation is of course only a matter of the mathematical formalism used and does not affect the general explanatory power of the free energy principle or the theory of coherent infomax. Therefore, in a broader sense, our simulation model could be seen as an approximate implementation of these abstract concepts, although we use a biologically motivated architecture instead of a probabilistic derivation.

Our study combines aspects of these normative models of sensory processing and of detailed models of dynamical neural networks. We use only the statistics induced by natural images to learn the forward and tangential phase connections in an unsupervised manner. The supervised labeled segmentation masks are used only to evaluate how phase synchrony corresponds to relational coding in the neural representation. Hence, the concept can be phrased completely in the form of a normative model. In future work, we plan to further formalize the model and conceive more complex learning rules for the phase interactions. These learning rules could replace the sampling of sparse connections from the correlation of activation by a more biologically motivated rule. For example, one could develop learning rules based on spike-timing dependent plasticity if phase delays are incorporated in the interactions of the network. This would additionally allow modeling phase locking between neurons and coding of syntactic relations in the network. These extensions to our model could provide new insights into the computational principles underlying higher order cognitive processes.

## **5. CONCLUSIONS**

Our study revealed that the concept of binding by synchrony is viable in the context of unsupervised learning using natural stimuli. We show that structural connectivity based on correlated activity leads to relational coding in a neural network model of coupled phase oscillators. The presented novel evaluation methodology for image segmentation revealed that the phases of neurons code global stimulus attributes. This strengthens the evidence that phase synchronization plays a key role in coordinating the spatially distributed information processing in the cortex. One could further speculate on how higher-level coordination and binding between cortical areas might evolve from unsupervised learning based on correlated neuronal activity.

## **ACKNOWLEDGMENTS**

The authors would like to thank Robert Martin for his valuable comments and helpful suggestions.

## **FUNDING**

This work was funded by the DFG through SFB 936 Multi-Site Communication in the Brain.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 June 2013; accepted: 30 December 2013; published online: 27 January 2014.*

*Citation: Finger H and König P (2014) Phase synchrony facilitates binding and segmentation of natural images in a coupled neural oscillator network. Front. Comput. Neurosci. 7:195. doi: 10.3389/fncom.2013.00195*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Finger and König. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Patterns of interval correlations in neural oscillators with adaptation

## *Tilo Schwalger 1,2\* and Benjamin Lindner 1,2*

*<sup>1</sup> Bernstein Center for Computational Neuroscience, Berlin, Germany*

*<sup>2</sup> Department of Physics, Humboldt Universität zu Berlin, Berlin, Germany*

#### *Edited by:*

*Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany*

#### *Reviewed by:*

*Magnus Richardson, University of Warwick, UK Richard Naud, Ecole Polytechnique Federale Lausanne, Switzerland*

#### *\*Correspondence:*

*Tilo Schwalger, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland e-mail: tilo@pks.mpg.de*

Neural firing is often subject to negative feedback by adaptation currents. These currents can induce strong correlations among the time intervals between spikes. Here we study analytically the interval correlations of a broad class of noisy neural oscillators with spike-triggered adaptation of arbitrary strength and time scale. Our weak-noise theory provides a general relation between the correlations and the phase-response curve (PRC) of the oscillator, proves anti-correlations between neighboring intervals for adapting neurons with type I PRC and identifies a single order parameter that determines the qualitative pattern of correlations. Monotonically decaying or oscillating correlation structures can be related to qualitatively different voltage traces after spiking, which can be explained by the phase plane geometry. At high firing rates, the long-term variability of the spike train associated with the cumulative interval correlations becomes small, independent of model details. Our results are verified by comparison with stochastic simulations of the exponential, leaky, and generalized integrate-and-fire models with adaptation.

**Keywords: spike-frequency adaptation, non-renewal process, serial correlation coefficient, phase-response curve, integrate-and-fire model, long-term variability**

## **1. INTRODUCTION**

The nerve cells of the brain are complex physical systems. They generate action potentials (spikes) by a nonlinear, adaptive, and noisy mechanism. In order to understand signal processing in single neurons, it is vital to analyze the sequence of interspike intervals (ISIs) between adjacent action potentials. Experimental evidence is accumulating that spiking in many cases is *not* a renewal process, i.e., a spike train with mutually independent ISIs, but that intervals are typically correlated over a few lags (Lowen and Teich, 1992; Ratnam and Nelson, 2000; Neiman and Russell, 2001; Nawrot et al., 2007; Engel et al., 2008) [further reports are reviewed in Farkhooi et al. (2009); Avila-Akerberg and Chacron (2011)]. These correlations are a basic statistic of any spike train, with important implications for information transmission and signal detection in neural systems (Ratnam and Nelson, 2000; Chacron et al., 2001, 2004; Avila-Akerberg and Chacron, 2011) and man-made signal detectors (Nikitin et al., 2012). They are often characterized by the serial correlation coefficient (SCC)

$$\rho\_k = \frac{\langle (T\_i - \langle T\_i \rangle) \left( T\_{i+k} - \langle T\_{i+k} \rangle \right) \rangle}{\langle (T\_i - \langle T\_i \rangle)^2 \rangle},\tag{1}$$

where *T<sub>i</sub>* and *T<sub>i+k</sub>* are two ISIs lagged by an integer *k* and ⟨·⟩ denotes ensemble averaging. ISI correlations can be induced via correlated input to the neural dynamics, e.g., in the form of external colored noise (Middleton et al., 2003; Lindner, 2004), intrinsic noise from ion channels with slow kinetics (Fisch et al., 2012), or stochastic narrow-band input (Neiman and Russell, 2001, 2005; Bauermeister et al., 2013).
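The SCC of Equation (1) is straightforward to estimate from a recorded ISI sequence. A minimal sketch, with two synthetic test cases (a renewal process and an artificially anti-correlated sequence, both illustrative):

```python
import numpy as np

def serial_correlation(isis, k):
    """Estimate the serial correlation coefficient rho_k of Equation (1)
    from a single stationary sequence of interspike intervals."""
    isis = np.asarray(isis, dtype=float)
    x = isis[:len(isis) - k] - isis.mean()
    y = isis[k:] - isis.mean()
    return np.mean(x * y) / np.var(isis)

rng = np.random.default_rng(3)
# renewal process: independent exponential ISIs -> rho_k ~ 0 for k >= 1
renewal = rng.exponential(1.0, 10000)
# alternating short/long intervals -> strong anti-correlation at lag 1
alternating = np.where(np.arange(10000) % 2 == 0, 0.5, 1.5) \
    + 0.1 * rng.standard_normal(10000)

print(serial_correlation(renewal, 1))      # close to 0
print(serial_correlation(alternating, 1))  # close to -1
```

Note that this estimator replaces the ensemble average of Equation (1) with a time average over a single stationary sequence.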

Another ubiquitous mechanism for ISI correlations is the class of slow feedback processes mediating spike-frequency adaptation (Chacron et al., 2000; Liu and Wang, 2001; Benda et al., 2005), a phenomenon describing the reduced neuronal response to slowly changing stimuli (Benda and Herz, 2003; Gabbiani and Krapp, 2006). In the stationary state, these adaptation mechanisms are typically associated with short-range correlations with a negative SCC at lag *k* = 1 and a reduced Fano factor, as demonstrated by several numerical (Geisler and Goldberg, 1966; Wang, 1998; Liu and Wang, 2001; Benda et al., 2010) and analytical studies (Schwalger et al., 2010; Schwalger and Lindner, 2010; Farkhooi et al., 2011; Urdapilleta, 2011). The correlation structure of adapting neurons can show qualitatively different patterns, ranging from monotonically decaying correlations to damped oscillations when plotted as a function of the lag (Ratnam and Nelson, 2000). Because ISI correlations shape spectral measures (Chacron et al., 2004), they bear implications for neural computation in general. However, a simple theory that predicts and explains possible correlation patterns is still lacking.

In this article, we present a relation between the ISI correlation coefficient $\rho_k$ and a basic characteristic of nonlinear neural dynamics, the *phase-response curve* (PRC). The PRC quantifies the advance (or delay) of the next spike caused by a small depolarizing current applied at a certain time after the last spike (Ermentrout, 1996). For neurons that integrate their input (integrator neurons), the PRC is positive at all times (type I PRC), whereas neurons that show subthreshold resonances (resonator neurons) possess a PRC that is partly negative (type II PRC) (Ermentrout, 1996; Izhikevich, 2005; Ermentrout and Terman, 2010). Below we show that resonator neurons possess a richer repertoire of correlation patterns than integrator neurons do.

#### **2. RESULTS**

#### **2.1. MODEL**

Spike frequency adaptation can be modeled by Hodgkin–Huxley type neurons with a depolarization-activated adaptation current (Wang, 1998; Ermentrout et al., 2001; Benda and Herz, 2003). However, the spiking of such conductance-based models can in many instances be approximated by simpler multi-dimensional integrate-and-fire (IF) models that are equipped with a spike-triggered adaptation current (Treves, 1993; Izhikevich, 2003; Brette and Gerstner, 2005); adapting IF models perform excellently in predicting spike times of real cells under noisy stimulation (Gerstner and Naud, 2009). Here, we consider a stochastic nonlinear multi-dimensional IF model for the membrane potential $v$, $N$ auxiliary variables $w_j$ ($j = 1, \ldots, N$) and a spike-triggered adaptation current $a(t)$:

$$
\dot{v} = f_0(v, \mathbf{w}) + \mu - a + \xi(t), \tag{2a}
$$

$$
\tau_j \dot{w}_j = f_j(v, \mathbf{w}), \qquad j = 1, \dots, N, \tag{2b}
$$

$$
\tau_a \dot{a} = -a + \tau_a \Delta \sum_i \delta(t - t_i). \tag{2c}
$$

The membrane potential $v(t)$ is subject to weak Gaussian noise $\xi(t)$ with $\langle \xi(t)\xi(t') \rangle = 2D\delta(t - t')$ and noise intensity $D$. The dynamics is complemented by a spike-and-reset mechanism: whenever $v(t)$ reaches the threshold $v_T$, a spike is registered at time $t_i = t$ and $v(t)$ and $\mathbf{w}(t) = [w_1(t), \ldots, w_N(t)]^{\mathrm{T}}$ are reset to $v(t_i^+) = 0$ and $\mathbf{w}(t_i^+) = \mathbf{w}_r$ (where $t_i^+$ denotes the right-sided limit $t \to t_i + 0$). At the same time, $a(t)$ suffers a jump by $\Delta \geq 0$ as seen from Equation (2c), which resembles high-threshold adaptation currents (Wang, 1998; Liu and Wang, 2001). The constant input current $\mu$ is assumed to be sufficiently large to ensure ongoing spiking even in the absence of noise. Note that the model is non-dimensionalized by measuring time in units of the membrane time constant $\tau_m \sim 10$ ms and voltage in units of the distance between reset and spike-initiating potential (a typical value is 15 mV). In particular, the adaptation time constant $\tau_a$ is measured relative to $\tau_m$ and the unit of the firing rate is $\tau_m^{-1} \sim 100$ Hz.
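
To make the model concrete, here is a minimal Euler–Maruyama sketch of the simplest special case of Equation (2), the adapting leaky IF model ($N = 0$, $f(v) = -\gamma v$, introduced in Section 2.3). The parameter values are illustrative (close to the LIF values used for Figure 3), and the estimate of $\rho_1$ should come out clearly negative, as the theory below predicts for this regime.

```python
import math, random

# Illustrative parameters of the adapting leaky IF model (Section 2.3)
gamma, v_T, mu = 1.0, 1.0, 5.0
tau_a, Delta, D = 2.0, 1.0, 0.1
dt = 1e-3
rng = random.Random(0)
noise_amp = math.sqrt(2.0 * D * dt)

v, a = 0.0, 0.0
isis, t_last, t = [], 0.0, 0.0
while len(isis) < 1000:
    v += (mu - gamma * v - a) * dt + noise_amp * rng.gauss(0.0, 1.0)
    a += -a / tau_a * dt                 # exponential decay, Eq. (2c)
    t += dt
    if v >= v_T:                         # spike-and-reset rule
        isis.append(t - t_last)
        t_last = t
        v = 0.0                          # reset (no w variables for N = 0)
        a += Delta                       # spike-triggered jump, Eq. (2c)

mean = sum(isis) / len(isis)
var = sum((x - mean) ** 2 for x in isis) / len(isis)
cov1 = sum((isis[i] - mean) * (isis[i + 1] - mean)
           for i in range(len(isis) - 1)) / (len(isis) - 1)
rho1 = cov1 / var
print(rho1)
```

For these parameters the adaptation mechanism makes a short ISI follow a long one on average, so the estimated lag-1 SCC is negative.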

An important special case, the adaptive exponential integrate-and-fire model (Brette and Gerstner, 2005) with purely spike-triggered adaptation and a white noise current with constant mean, is illustrated in **Figure 1**. It assumes an exponential nonlinearity $f_0(v) = -\gamma v + \gamma \Delta_T \exp[(v - 1)/\Delta_T]$ (Fourcaud-Trocmé et al., 2003; Badel et al., 2008) and corresponds to $N = 0$. Time courses of $v(t)$ and $a(t)$ are shown in **Figures 1A1,B1** for two distinct correlation patterns possible in this model. The ISIs $T_i = t_i - t_{i-1}$ are obtained as differences between subsequent spiking times $t_i$. The sequence $T_i$, $T_{i+1}$, $T_{i+2}$ displays patterns of *short-long-long* (**Figure 1A1**) and *short-long-short* (**Figure 1B1**), corresponding to a negative SCC, which decays monotonically with the lag $k$ (**Figure 1A3**), or to an SCC oscillating with $k$ (**Figure 1B3**). In the following, we develop a theory to analyze these and other

**FIGURE 1 | Correlation patterns in the adaptive exponential IF model with $\tau_a = 10$, $\gamma = 1$, $\Delta_T = 0.1$, $v_T = 2$, $D = 0.1$.** Adaptation is weak ($\Delta = 1$, $\mu = 15$) in **(A)** and strong ($\Delta = 10$, $\mu = 80$) in **(B)**. Membrane voltage $v(t)$ and adaptation variable $a(t)$ with ISI sequences $\{T_i\}$ and peak adaptation values $\{a_i\}$ are shown in **(A1,B1)**; time is in units of the membrane time constant $\tau_m$. Colored pieces of trajectories in the phase plane $(v, a)$ in **(A2,B2)** correspond to the respective colors in **(A1,B1)**. The deterministic limit cycle (LC), determined by the initial (post-spike) values $v = 0$, $a = a^*$, is indicated by a thick black line. For weak adaptation **(A2)** a short ISI $T_i$ causes positive deviations $\delta a_i = a_i - a^*$ and $\delta a_{i+1} = a_{i+1} - a^*$ of peak values, leading to long ISIs $T_{i+1}$ and $T_{i+2}$ and, hence, to a negative ISI correlation at all lags **(A3)**. Because of the qualitatively different limit cycle for strong adaptation **(B2)**, deviations $\delta a_i$ and $\delta a_{i+1}$ differ in sign, yielding an oscillatory correlation pattern **(B3)**.

correlation patterns possible in multi-dimensional adapting IF models.

#### **2.2. GENERAL THEORY**

In our model Equation (2), $a(t)$ is the only variable that keeps a memory of the previous spike times, thereby inducing correlations between ISIs. Over one ISI the time course of adaptation is an exponential decay, relating two adjacent peak values $a_i = a(t_i^+)$ and $a_{i+1} = a(t_{i+1}^+)$ by

$$a_{i+1} = a_i e^{-T_{i+1}/\tau_a} + \Delta \tag{3}$$

(**Figures 1A1,B1**). We assume that in the deterministic case ($D = 0$) our model has a finite period $T^*$ (i.e., the model operates in the tonically firing regime) and, hence, for $D = 0$ the map (3) has a stable fixed point

$$a^* = \Delta / \left[ 1 - \exp\left( -T^* / \tau_a \right) \right]. \tag{4}$$

The asymptotic deterministic dynamics can be interpreted as a limit-cycle like motion in the phase space from the reset point to the threshold and back by the instantaneous reset [cf. **Figures 1A2,B2**].
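
The convergence of the deterministic peak-adaptation map, Equation (3), to the fixed point of Equation (4) can be checked in a few lines; the parameter values below are illustrative.

```python
import math

tau_a, Delta, T_star = 10.0, 1.0, 0.5
a_star = Delta / (1.0 - math.exp(-T_star / tau_a))   # Eq. (4)

a = 0.0
for _ in range(2000):                # iterate the map, Eq. (3), at fixed T*
    a = a * math.exp(-T_star / tau_a) + Delta
print(abs(a - a_star) < 1e-9)        # True: the map contracts onto a*
```

The contraction factor per iteration is $e^{-T^*/\tau_a} < 1$, which is also what makes the fixed point stable in the deterministic dynamics.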

Weak noise will cause small deviations of the period, $\delta T_i = T_i - T^* \approx T_i - \langle T_i \rangle$, that are mutually correlated with coefficient $\rho_k = \langle \delta T_i \delta T_{i+k} \rangle / \langle \delta T_i^2 \rangle$. The peak adaptation values, however, also fluctuate, $\delta a_i = a_i - a^*$, and both deviations are related by linearizing Equation (3):

$$
\delta T_{i+1} = \frac{\tau_a}{a^*} \left( \delta a_i - e^{T^*/\tau_a} \delta a_{i+1} \right). \tag{5}
$$

A second relation between the small deviations can be gained by considering how a small perturbation of the voltage dynamics affects the length of the period. This effect is captured by the infinitesimal phase response curve (PRC), $Z(t)$, $t \in (0, T^*)$ (Izhikevich, 2005; Ermentrout and Terman, 2010) (see Section 4 for the precise definition). During the interval $T_{i+1}$, the voltage dynamics in Equation (2a) can be written as $\dot{v} = f_0(v, \mathbf{w}) + \mu - (a^* + \delta a_i)e^{-(t - t_i)/\tau_a} + \xi(t)$. Compared to the deterministic limit cycle, the dynamics is perturbed by the weak noise and the small deviation of the adaptation, $\delta a_i e^{-(t - t_i)/\tau_a}$, yielding in linear response

$$
\delta T_{i+1} = \int_0^{T^*} \mathrm{d}t \, Z(t) \left( \delta a_i e^{-\frac{t}{\tau_a}} - \xi(t_i + t) \right) . \tag{6}
$$

Combining Equations (5), (6) we obtain the stochastic map

$$
\delta a_{i+1} = \alpha \vartheta \,\delta a_i + \Xi_i,\tag{7}
$$

where $\Xi_i = \frac{\alpha a^*}{\tau_a} \int_0^{T^*} \mathrm{d}t \, Z(t)\, \xi(t_i + t)$ are uncorrelated Gaussian random numbers and

$$\alpha = e^{-T^*/\tau_a}, \qquad \vartheta = 1 - \frac{a^*}{\tau_a} \int_0^{T^*} \mathrm{d}t \, Z(t)\, e^{-\frac{t}{\tau_a}}. \tag{8}$$

Note that local stability of the fixed point $a^*$ requires $|\alpha\vartheta| < 1$. The covariance $c_k = \langle \delta a_i \delta a_{i+k} \rangle$ of the auto-regressive process Equation (7) can be calculated by elementary means, and using Equation (5) we obtain for $k \geq 1$:

$$\rho_k = -A(1-\vartheta)\left(\alpha\vartheta\right)^{k-1}, \qquad A = \frac{\alpha(1-\alpha^2\vartheta)}{1+\alpha^2-2\alpha^2\vartheta}. \tag{9}$$

In order to compute α and ϑ via Equation (8), we have to calculate $T^*$ and $Z(t)$ ($a^*$ then follows from Equation (4)), which can be done analytically for simple systems.

Our main result, Equations (8), (9), allows us to draw a number of general conclusions. It shows that the SCC is always a geometric sequence with respect to the lag $k$ that can generate qualitatively different correlation patterns depending on the value of ϑ and thus on the PRC and the adaptation current. Because $|\alpha\vartheta| < 1$ and $0 < \alpha < 1$, the prefactor $A$ in Equation (9) is always positive. Consequently, $\rho_1$ is negative for $\vartheta < 1$ and positive for $\vartheta > 1$. Looking at Equation (8), we find that a positive PRC inevitably yields $\vartheta < 1$. This implies that adapting neurons with a type I PRC possess negative correlations between adjacent ISIs. Intuitively, a short ISI causes on average a higher inhibitory adaptation during the subsequent ISI. Such an inhibitory current always enlarges the ISI in type I neurons; hence, a short ISI is followed by a long ISI.

The sign at higher lags is determined by the base of the power: for $\vartheta > 0$ correlations decay monotonically, whereas for $\vartheta < 0$ the SCC oscillates. Two special cases are $\vartheta = 0$, with a negative correlation at lag 1 and vanishing correlations at all higher lags, and $\vartheta = 1$, where all correlations vanish. Overall, we find five basic patterns corresponding to the cases $-\alpha^{-1} < \vartheta < 0$, $\vartheta = 0$, $0 < \vartheta < 1$, $\vartheta = 1$, and $1 < \vartheta < \alpha^{-1}$. These basic patterns cover all interval correlations discussed in previous theoretical studies (Schwalger and Lindner, 2010; Urdapilleta, 2011). Our geometric formula generalizes the theory for the perfect IF model with adaptation (Schwalger et al., 2010) to more realistic, nonlinear multi-dimensional IF models with adaptation.
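
The five basic patterns follow directly from Equation (9). A short sketch (with illustrative values of α and ϑ, one per regime) reproduces the sign structure described above:

```python
def scc(alpha, theta, k):
    """rho_k from Eq. (9); valid for 0 < alpha < 1 and |alpha*theta| < 1."""
    A = alpha * (1 - alpha**2 * theta) / (1 + alpha**2 - 2 * alpha**2 * theta)
    return -A * (1 - theta) * (alpha * theta) ** (k - 1)

alpha = 0.8                       # illustrative; one theta per regime below
patterns = {theta: [scc(alpha, theta, k) for k in (1, 2, 3)]
            for theta in (-0.5, 0.0, 0.5, 1.0, 1.2)}
for theta, rhos in patterns.items():
    print(theta, rhos)
```

Running this shows an alternating SCC for $\vartheta < 0$, a single negative lag for $\vartheta = 0$, a monotonically decaying negative SCC for $0 < \vartheta < 1$, vanishing correlations for $\vartheta = 1$, and purely positive correlations for $\vartheta > 1$.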

The cumulative effect of the correlations can be described by the sum over all ρ*k*, which determines the long-time limit of the Fano factor and the low-frequency limit of the spike train power spectrum (for a definition of these quantities, see Section 4.2). Evaluating the geometric series yields

$$\sum_{k=1}^{\infty} \rho_k = -\frac{A\left(1-\vartheta\right)}{1-\alpha\vartheta}.\tag{10}$$

This shows that adaptation in neurons with type I resetting (ϑ < 1) leads to a negative summed correlation and hence a reduced long-term variability. Furthermore, at high firing rates achieved by a strong input current μ, the sum in Equation (10) can be approximated by

$$\sum_{k=1}^{\infty} \rho_k \simeq -\frac{1}{2} + \frac{1/2}{\left(1 + \Delta \tau_a / v_T\right)^2}, \qquad T^* \ll \tau_a. \tag{11}$$

In particular, for strong adaptation ($\Delta \tau_a \gg v_T$) the sum is only slightly larger than $-1/2$. Note that by virtue of the fundamental relation $\lim_{t\to\infty} F(t) = C_V^2 \left(1 + 2\sum_{k=1}^{\infty} \rho_k\right)$ (Cox and Lewis, 1966) (see Section 4.2), the smallest possible value of the sum is $-1/2$, in order to ensure the non-negativity of the Fano factor $F(t)$. At this minimal value the long-term variability as expressed by the Fano factor vanishes even for a non-vanishing ISI variability as quantified by the coefficient of variation $C_V$. The latter quantity can also be estimated using the weak-noise theory: from Equation (7) one can calculate the variance of $a_i$, and using Equation (5) an approximation for $C_V^2 \approx \langle \delta T_i^2 \rangle / T^{*2}$ can be obtained as follows:
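
The closed form of Equation (10) and the $-1/2$ bound can be verified numerically; the grid below samples illustrative admissible pairs $(\alpha, \vartheta)$ with $0 < \alpha < 1$ and $|\alpha\vartheta| < 1$:

```python
# Check the closed form, Eq. (10), against a truncated geometric sum, and
# verify the -1/2 bound implied by the non-negativity of the Fano factor.
alpha, theta = 0.8, 0.5            # illustrative values
A = alpha * (1 - alpha**2 * theta) / (1 + alpha**2 - 2 * alpha**2 * theta)
partial = sum(-A * (1 - theta) * (alpha * theta) ** (k - 1)
              for k in range(1, 500))
closed = -A * (1 - theta) / (1 - alpha * theta)     # Eq. (10)
print(partial, closed)

worst = 0.0
for i in range(1, 20):
    a = i / 20.0                               # alpha in (0, 1)
    for j in range(-19, 20):
        th = j / (20.0 * a)                    # alpha*theta in (-0.95, 0.95)
        A_ = a * (1 - a**2 * th) / (1 + a**2 - 2 * a**2 * th)
        worst = min(worst, -A_ * (1 - th) / (1 - a * th))
print(worst)   # stays above -1/2 on the whole grid
```

Indeed, one can show that the summed SCC exceeds $-1/2$ by the amount $(1-\alpha)^2(1+\alpha\vartheta)$ divided by a positive factor, so the bound is approached only at the boundary of the admissible parameter region.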

$$C_V^2 = 2D\, \frac{1 + \alpha^2 - 2\alpha^2 \vartheta}{\left[1 - \left(\alpha \vartheta\right)^2\right] T^{*2}} \int_0^{T^*} \mathrm{d}t \, [Z(t)]^2. \tag{12}$$

#### **2.3. ONE-DIMENSIONAL IF MODELS WITH ADAPTATION**

In the simplest case (*N* = 0, *f*0(*v*,**w**) = *f*(*v*)) the PRC reads

$$Z(t) = Z(T^*) \exp\left[\int_t^{T^*} \mathrm{d}t' \, f'(v_0(t'))\right],\tag{13}$$

where $v_0(t)$ is the limit cycle solution and $Z(T^*) = [f(v_T) + \mu - a^* + \Delta]^{-1}$ is the inverse of the velocity $\dot{v}_0(T^*)$ at the threshold, which is always positive. Thus, the PRC is positive for all $t \in (0, T^*)$, i.e., one-dimensional IF models show type I behavior. From our general considerations, this implies a negative SCC

at lag *k* = 1. The sign of the correlations at higher lags can be inferred from the sign of ϑ, for which one can show (Section 4) that

$$\vartheta = \left( f(0) + \mu - a^* \right) Z(0). \tag{14}$$

Because $Z(0) > 0$, the sign of ϑ is determined by the sign of $f(0) + \mu - a^*$. For weak adaptation such that $a^* < f(0) + \mu$ (achieved by a sufficiently small value of $\Delta$ or $\tau_a$, **Figure 1A**), we will have $\vartheta > 0$ and a negative correlation at all lags (**Figure 1A3**). In this case, a short ISI occurring by fluctuation will cause a positive deviation $\delta a_i$ (**Figure 1A2**, green arrow). Geometrically, it is plausible that such a positive deviation causes a likewise positive deviation $\delta a_{i+1}$ in the subsequent cycle (**Figure 1A2**, red arrow). Because a positive deviation is associated with a long ISI, the initial short ISI is on average followed by longer ISIs.

In marked contrast, for strong adaptation such that $a^* > f(0) + \mu$ (achieved by a sufficiently large value of $\Delta$ or $\tau_a$), ϑ becomes negative and hence the SCC's sign alternates with the lag. This alternation can be understood by means of the phase plane. Let us again consider a positive deviation $\delta a_i$ due to a short preceding ISI (**Figure 1B2**, green arrow). Because $\dot{v}_0(0) = f(0) + \mu - a^* < 0$, the neuron is reset above the $v$-nullcline and hence hyperpolarizes at the beginning of the interval, i.e., the trajectory makes a detour into the region of negative voltage (corresponding to a "broad reset" in Naud et al. (2008)). A positive deviation $\delta a_i$ leads to a larger detour (green trajectory) causing a sign inversion and hence a negative deviation $\delta a_{i+1}$ (**Figure 1B2**, red arrow). Because a positive (negative) deviation corresponds on average to a long (short) ISI, the alternation of $\delta a_i$ also entails an alternation of the ISI correlations. Thus, the distinction between monotonic and alternating patterns relates to a qualitative distinction of the voltage trace after resetting [cf. "sharp" vs. "broad" resets in Naud et al. (2008)].

As demonstrated in **Figures 1A3,B3**, our theory works well for the adapting exponential integrate-and-fire model. We next demonstrate the validity of our approach over a broad range of firing rates (**Figure 2**) for another important 1D model, the adapting leaky integrate-and-fire model (Treves, 1993) for which *f*(*v*) = −γ*v* and

$$Z(t) = \exp[\gamma(t - T^*)]/(\mu - \gamma v_T - a^* + \Delta)\tag{15}$$

(here *T*<sup>∗</sup> has still to be determined from a transcendental equation). Changing the firing rate by varying the input current μ, we find a good agreement for the first two correlation coefficients and the sum of all ρ*k*; the approximation of the CV shows deviations from simulation results when the input current μ becomes small (approaching the fluctuation-driven regime). In accordance with previous findings (Wang, 1998; Liu and Wang, 2001; Benda et al., 2010; Nesse et al., 2010; Schwalger et al., 2010; Schwalger and Lindner, 2010; Urdapilleta, 2011), the first correlation coefficient ρ<sup>1</sup> displays a minimum corresponding to strong anti-correlations between adjacent intervals. The correlations at lag 2 can be positive for a finite range of firing rates if the adaptation strength is sufficiently large (**Figure 2B**), whereas for moderate adaptation we find a negative ρ<sup>2</sup> at all firing rates

**FIGURE 2 | Adapting LIF model vs. firing rate $1/\langle T_i \rangle \approx 1/T^*$, where the rate is varied by increasing $\mu$.** The gray-shaded area corresponds to the fluctuation-driven regime ($\mu < \gamma v_T$), where the assumptions of the theory do not hold. The panels display (from top to bottom) $\rho_1$, $\rho_2$, the sum $\sum_{k=1}^{m} \rho_k$, and the CV for simulation (circles, $m = 100$) and theory (solid lines, $m \to \infty$). **(A)** Moderate adaptation: $\Delta = 1$; **(B)** strong adaptation: $\Delta = 10$. Both: $\gamma = 1$, $\tau_a = 10$, $D = 0.1$, $v_T = 1$. Note that the firing rate is given in units of the inverse membrane time constant $\tau_m^{-1}$.

(**Figure 2A**). In both cases, however, the sum of SCCs approaches a value close to −1/2 for high firing rates as predicted by Equation (11) (**Figure 2**, bottom). This is strikingly similar to experimental data from weakly electric fish, in which some electro-receptors display a monotonically decaying SCC and some show an oscillatory SCC (Ratnam and Nelson, 2000) but all cells exhibit a sum close to −1/2 (Ratnam and Goense, 2004). Finally, **Figure 2** reveals a local maximum of the CV for some suprathreshold current μ—an effect that has been described by Nesse et al. (2008).
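
For the adapting LIF model, the internal consistency of Equations (8), (14) and (15) can be checked numerically. The sketch below determines $T^*$ self-consistently from Equation (4) and the closed-form solution of the deterministic voltage equation (the parameter values are illustrative), and then evaluates ϑ both ways; the two results should agree.

```python
import math

# Illustrative adapting-LIF parameters (suprathreshold, tonically firing)
gamma, v_T, mu, tau_a, Delta = 1.0, 1.0, 5.0, 2.0, 1.0
kk = gamma - 1.0 / tau_a          # assumed nonzero here

def crossing_time(a_star):
    """First passage of v(t) through v_T for the deterministic dynamics
    v' = mu - gamma*v - a_star*exp(-t/tau_a), v(0) = 0 (closed-form v)."""
    def v(t):
        return (mu / gamma) * (1.0 - math.exp(-gamma * t)) \
               - a_star * (math.exp(-t / tau_a) - math.exp(-gamma * t)) / kk
    t = 0.0
    while v(t) < v_T:             # coarse forward scan (crossing exists: mu > v_T)
        t += 1e-3
    lo, hi = t - 1e-3, t
    for _ in range(60):           # bisection refinement
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if v(mid) < v_T else (lo, mid)
    return 0.5 * (lo + hi)

# T* solves crossing_time(a*(T*)) = T*, with a*(T*) from Eq. (4);
# the mismatch g(T) decreases with T, so bisection applies.
def g(T):
    return crossing_time(Delta / (1.0 - math.exp(-T / tau_a))) - T

T_lo, T_hi = 0.05, 20.0
for _ in range(60):
    T_mid = 0.5 * (T_lo + T_hi)
    T_lo, T_hi = (T_mid, T_hi) if g(T_mid) > 0 else (T_lo, T_mid)
T_star = 0.5 * (T_lo + T_hi)
a_star = Delta / (1.0 - math.exp(-T_star / tau_a))

denom = mu - gamma * v_T - a_star + Delta
Z0 = math.exp(-gamma * T_star) / denom              # Z(0) from Eq. (15)
theta_14 = (mu - a_star) * Z0                       # Eq. (14), f(0) = 0
integral = (math.exp(-gamma * T_star) / denom) * (math.exp(kk * T_star) - 1.0) / kk
theta_8 = 1.0 - (a_star / tau_a) * integral         # Eq. (8)
print(T_star, theta_14, theta_8)
```

For these moderate-adaptation values, ϑ falls between 0 and 1, i.e., the regime of monotonically decaying negative correlations.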

#### **2.4. GENERALIZED INTEGRATE-AND-FIRE MODEL WITH ADAPTATION**

Different correlation patterns become possible if we consider a type II PRC, which is by definition partly negative and can lead to a negative value of the integral in Equation (8), and hence to ϑ ≥ 1. This corresponds to a non-negative SCC at lag 1, which is infeasible in the one-dimensional case. To test the prediction ρ<sup>1</sup> ≥ 0, we study the generalized integrate-and-fire (GIF) model (Brunel et al., 2003) with spike-triggered adaptation. This model is defined by *f*0(*v*, *w*) = −γ*v* − β*w* and *f*1(*v*, *w*) = (*v* − *w*)/τ*w*. Using the method described in Section 4, the PRC is obtained as

$$Z(t) = \frac{e^{\frac{\nu}{2}(t - T^*)} \left[ \cos\left(\Omega(t - T^*)\right) - \frac{1 - \tau_w \gamma}{2\tau_w \Omega} \sin\left(\Omega(t - T^*)\right) \right]}{\mu - \gamma v_T - \beta w_0 (T^*) - a^* + \Delta}, \tag{16}$$

where $\nu = \gamma + 1/\tau_w$, $\Omega = \sqrt{(\beta + \gamma)/\tau_w - \nu^2/4}$, and $w_0(t)$ is one component of the deterministic limit-cycle solution $[v_0(t), w_0(t), a_0(t)]$ that we calculated numerically.

In **Figure 3B** we demonstrate that all possible correlation patterns can be realized in the GIF model and that the predicted

**FIGURE 3 |** For the adapting LIF model **(A)**, $\vartheta < 1$ and only three qualitatively different cases are possible. The adapting GIF model **(B)** exhibits the full repertoire of correlation patterns because the PRC can be partly negative and ϑ can attain values from its entire physically meaningful interval $[-1/\alpha, 1/\alpha]$. The value of ϑ, and hence the type of correlation pattern, is set by the integral over the weighted PRC $\tilde{Z}(t) = \frac{a^*}{\tau_a} e^{-t/\tau_a} Z(t)$, shown in the left panels. LIF parameters: $D = 0.1$, $\tau_a = 2$; **(i)** $\mu = 20$, $\Delta = 10$; **(ii)** $\mu = 20$, $\Delta = 4.47$; **(iii)** $\mu = 5$, $\Delta = 1$. GIF parameters: **(i)** $\mu = 10$, $\beta = 3$, $\tau_a = 10$; **(ii)** $\mu = 11.75$, $\beta = 3$, $\tau_a = 10$; **(iii)** $\mu = 20$, $\beta = 1.5$, $\tau_a = 10$; **(iv)** $\mu = 2.12$, $\beta = 1.5$, $\tau_a = 1$, $\Delta = 10$; **(v)** $\mu = 1.5$, $\beta = 1.5$, $\tau_a = 1$, $\Delta = 9$, $D = 10^{-5}$. Unless stated otherwise, $\gamma = 1$, $\Delta = 1$, $\tau_w = 1.5$, $D = 10^{-4}$, $w_r = 0$.

SCCs agree quantitatively well between theory and model simulations (for comparison, see the SCC of the LIF in **Figure 3A**). To each distinct pattern belongs a range of ϑ (**Figure 3**, left), determined by the area under the weighted PRC $\tilde{Z}(t) = \frac{a^*}{\tau_a} e^{-t/\tau_a} Z(t)$. The function $\tilde{Z}(t)$ (left column in **Figures 3A,B**) illustrates why an adapting GIF neuron can show vanishing (**Figure 3Biv**) or even *purely positive* ISI correlations (**Figure 3Bv**). In the case of type II resetting, inhibitory input can *shorten* the ISI because of the negative part of the PRC; here inhibition acts like an excitatory input. Consequently, a short ISI will induce a stronger inhibition (adaptation) that now causes a likewise short interval and thus results in a positive correlation between adjacent ISIs. Also, the shortening effect of the adaptation current in the early negative phase of the PRC can be exactly balanced by the delaying effect of the late positive phase of the PRC (pseudo-renewal case, in which the area under $\tilde{Z}$ is zero).

#### **3. DISCUSSION**

We have found a general relation between two experimentally accessible characteristics: the serial interval correlations and the phase response curve of a noisy neuron with spike-triggered adaptation. The theory predicts distinct correlation patterns like short-range negative and oscillatory correlations that have been observed in experiments (Ratnam and Nelson, 2000; Nawrot et al., 2007) and in simulation studies of adapting neurons (Chacron et al., 2000; Liu and Wang, 2001).

Beyond negative and oscillatory correlations, we have found, however, that resonator neurons with spike-frequency adaptation can exhibit purely positive ISI correlations or a pseudo-renewal process with uncorrelated intervals. Adaptation currents that are commonly associated with negative ISI correlations (Wang, 1998; Chacron et al., 2001; Liu and Wang, 2001; Chacron et al., 2003; Benda et al., 2010; Nesse et al., 2010) can thus induce a rich repertoire of correlation patterns. Despite the multitude of patterns, there is a universal limit for the cumulative correlations at high firing rates [cf. Equation (11)], which shows that the long-term variability of the spike train is always reduced in this limit, in agreement with experimental studies (Ratnam and Goense, 2004).

Our analytical results apply to arbitrary adaptation strength and time scale but require that (1) the noise is weak and white, (2) the deterministic dynamics shows periodic firing with equal ISIs (i.e., a limit-cycle exists) and (3) the adaptation current is purely spike-triggered with (4) a single exponential decay time. Regarding the weak-noise assumption, we found from numerical simulations quantitative agreement with our theory for values of the coefficient of variation (CV) up to 0.4, which is, for instance, typical for neurons in the sensory periphery (Ratnam and Nelson, 2000; Neiman and Russell, 2004; Vogel et al., 2005). This holds even in the subthreshold regime at low CVs, where the deterministic system does not follow a limit cycle. In this case, *T*<sup>∗</sup> has to be replaced by the mean ISI. Moreover, we found qualitative agreement even for moderately strong noise with values of the CV up to 0.8, which is typical for cortical non-bursting neurons *in vivo* (e.g. **Figure 3** in Softky and Koch (1993)).

In the absence of a deterministic limit-cycle, i.e., in the fluctuation-driven regime at high CVs, different mathematical approaches have to be employed, such as those based on a hazard-function formalism (Muller et al., 2007; Nesse et al., 2010; Schwalger and Lindner, 2010; Farkhooi et al., 2011). Furthermore, for some parameter sets, we also observed repeat periods of the deterministic system that involved multiple ISIs, corresponding to a periodic ISI sequence with $T_i = T_{i+n}$, where the smallest period is $n \geq 2$. Such cases can realize bursting (Naud et al., 2008), which we did not consider in the present study. However, we expect that these parameter regimes yield interesting correlation patterns because already in the noiseless case a periodic ISI sequence exhibits correlations between ISIs.

Regarding the last two assumptions, it seems that the analytical derivation cannot be easily extended to the cases of adaptation currents activated by the subthreshold membrane potential ("subthreshold adaptation" Ermentrout et al., 2001; Brette and Gerstner, 2005; Prescott and Sejnowski, 2008; Deemyad et al., 2012) and multiple-time-scale adaptation (Pozzorini et al., 2013). Ermentrout et al. (2001) have shown that the inclusion of subthreshold adaptation can lead to type II PRCs, which according to our theory could qualitatively change the correlation patterns. An adaptation dynamics depending on the subthreshold membrane potential also involves a fluctuating component because *v* is noisy. According to Schwalger et al. (2010), this stochasticity could contribute positive correlations. The combined effect of spike-triggered, subthreshold and stochastic adaptation currents on the sign of the SCC is not clear.

The important cases of the fluctuation-driven regime and multiple-time-scale adaptation have been recently analyzed with respect to the first-order spiking statistics, including the stationary firing rate as well as the mean response to time-dependent stimuli (Richardson, 2009; Naud and Gerstner, 2012). The second-order statistics, which describes the fluctuations of the spike train ("neural variability," cf. Section 4.2) and which limits the information transmission capabilities of neurons, is however still poorly understood theoretically in these cases. How adaptation shapes second-order statistics in the cases of multiple adaptation time scales, fluctuation-driven spiking and sub-threshold adaptation is an interesting topic for future investigations.

As an outlook we sketch how our theory could be used to constrain unknown physiological parameters by measured SCCs and PRCs. For instance, from the mean ISI we can estimate $T^* = \langle T \rangle$. Furthermore, knowing $\rho_1 = -A(\alpha, \vartheta)(1 - \vartheta)$ as well as the ratio $\rho_2/\rho_1 = \alpha\vartheta$, one can eliminate ϑ and solve for α. This allows us to estimate the unknown adaptation time constant $\tau_a = -T^*/\ln \alpha$ and the amplitude of the adaptation current

$$a^* = \frac{\tau_a}{\alpha} \left( \alpha - \frac{\rho_2}{\rho_1} \right) \bigg/ \int_0^{T^*} \mathrm{d}t \, Z(t)\, e^{-\frac{t}{\tau_a}} \,. \tag{17}$$

Although experimental PRCs are notoriously noisy (Izhikevich, 2005), the integral over $Z(t)$ determining our estimate of $a^*$ is less error-prone. Combining our approach with advanced estimation methods for the PRC (Galán et al., 2005) may thus provide an alternative access to hidden physiological parameters using only spike time statistics.
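
The inversion step can be sketched in code. Here synthetic "measurements" $\rho_1$, $\rho_2$ are generated from known $(\alpha, \vartheta)$ via Equation (9) and then inverted by eliminating ϑ through $\rho_2/\rho_1 = \alpha\vartheta$; the grid search and all parameter values are illustrative.

```python
import math

def rho1_of(alpha, theta):
    """rho_1 from Eq. (9)."""
    A = alpha * (1 - alpha**2 * theta) / (1 + alpha**2 - 2 * alpha**2 * theta)
    return -A * (1 - theta)

alpha_true, theta_true, T_star = 0.8, 0.5, 0.5       # "ground truth"
rho1 = rho1_of(alpha_true, theta_true)
rho2 = rho1 * alpha_true * theta_true                # since rho2/rho1 = alpha*theta

r = rho2 / rho1                                      # measured ratio = alpha*theta
# scan alpha on a grid, with theta = r/alpha, to reproduce the measured rho1
best = min((abs(rho1_of(a, r / a) - rho1), a)
           for a in [r + i * 1e-4 for i in range(1, int((0.9999 - r) / 1e-4))])
alpha_hat = best[1]
tau_a_hat = -T_star / math.log(alpha_hat)            # adaptation time constant
print(alpha_hat, tau_a_hat)
```

With noise-free inputs the grid search recovers α (and hence $\tau_a$) up to the grid resolution; with real data the quality of the estimate would be limited by the statistical errors of $\rho_1$ and $\rho_2$.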

#### **4. MATERIALS AND METHODS**

#### **4.1. PHASE-RESPONSE CURVES OF ADAPTING IF MODELS**

We use the phase-response curve $Z(t')$ to characterize the shift of the *next* spike following a small current pulse applied at a given "phase" $t' \in [0, T^*]$ of an ISI. More precisely, let us assume that the last spike occurred at time $t_0 = 0$. Then, the next spike time $t_1$ of the perturbed limit cycle dynamics $\dot{v} = f_0(v, \mathbf{w}) + \mu - a + \epsilon\,\delta(t - t')$, $v(0) = 0$, $\mathbf{w}(0) = \mathbf{w}_r$, $a(0) = a^*$, $0 < t' \leq T^*$, will be shifted by some amount $\delta T(t', \epsilon) = t_1 - T^*$. The infinitesimal PRC can be defined as the limit

$$Z(t') = -\lim_{\epsilon \to 0} \frac{\delta T(t', \epsilon)}{\epsilon},\tag{18}$$

where the sign has been chosen such that a spike advance ($\delta T < 0$) due to a positive stimulation ($\epsilon > 0$) leads to a positive PRC. The definition of $Z(t)$ by the shift of the next spike differs from the PRC that describes the asymptotic spike shift but is equivalent to the so-called "first-order PRC," which is often measured in experiments (Netoff et al., 2012).

#### *4.1.1. Adjoint equation and boundary conditions*

The PRC can be computed using the adjoint method (see e.g. Ermentrout and Terman (2010)). To this end, the dynamics is linearized about the *T*∗-periodic limit cycle solution **y**0(*t*) = [*v*0(*t*),**w**0(*t*), *a*0(*t*)]. The linearized limit-cycle dynamics **y**(*t*) = **y**0(*t*) + δ**y**(*t*) corresponding to Equation (2) is given by

$$
\dot{\delta\mathbf{y}} = A(t)\delta\mathbf{y} \tag{19}
$$

with the Jacobian matrix

$$A(t) = \begin{pmatrix} \frac{\partial f_0}{\partial v} & \frac{\partial f_0}{\partial w_1} & \dots & \frac{\partial f_0}{\partial w_N} & -1 \\ \tau_1^{-1} \frac{\partial f_1}{\partial v} & \tau_1^{-1} \frac{\partial f_1}{\partial w_1} & \dots & \tau_1^{-1} \frac{\partial f_1}{\partial w_N} & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \tau_N^{-1} \frac{\partial f_N}{\partial v} & \tau_N^{-1} \frac{\partial f_N}{\partial w_1} & \dots & \tau_N^{-1} \frac{\partial f_N}{\partial w_N} & 0 \\ 0 & 0 & \dots & 0 & -\tau_a^{-1} \end{pmatrix} \tag{20}$$

evaluated at $v = v_0(t)$, $\mathbf{w} = \mathbf{w}_0(t)$. The linear response of the ISI to perturbations of the limit-cycle dynamics in an arbitrary direction is given by the vector $\mathbf{Z}(t) = [Z(t), Z_{w_1}(t), \ldots, Z_{w_N}(t), Z_a(t)]^{\mathrm{T}}$, where the first component is equal to the PRC defined above. This vector satisfies the adjoint equation $\dot{\mathbf{Z}} = -A^{\mathrm{T}}\mathbf{Z}$ ($A^{\mathrm{T}}$ denotes the transpose of $A$) with the normalization condition $\dot{v}_0(t)Z(t) + \dot{\mathbf{w}}_0(t)\cdot\mathbf{Z}_w(t) + \dot{a}_0(t)Z_a(t) = 1$. The remaining $N + 1$ boundary conditions are obtained by the following consideration: on the limit cycle $\Gamma$, a phase $\phi : \Gamma \to [0, T^*]$ can be introduced in the usual way by inverting the map $t \mapsto \mathbf{y}_0(t)$ and setting $\phi = t$. Because we are interested in the shift of the *next* spike, it is useful to define the isochrons (sets of equal phase) as the sets of all points in phase space that will lead to the same first spike time. Put differently, phase points belonging to the same isochron will have their first threshold crossing in synchrony. As a consequence, the threshold hyperplane defined by the condition $v = v_T$ is a special isochron corresponding to the phase $\phi = T^*$. Note that this definition of the phase implies that the reset line defined by the condition $v = 0$, $\mathbf{w} = \mathbf{w}_r$ does generally *not* correspond to $\phi = 0$ but to positive phases if $a < a^*$ and negative phases if $a > a^*$. Thus, off-limit-cycle trajectories suffer a phase jump upon reset. Close to the threshold, the isochrons are parallel to the threshold, and thus a perturbation perpendicular to the $v$-direction does not change the phase. This insensitivity implies the boundary conditions $Z_{w_1}(T^*) = \ldots = Z_{w_N}(T^*) = Z_a(T^*) = 0$. Note that a definition of the PRC based on the asymptotic spike shift would require periodic boundary conditions (Ladenbauer et al., 2012).

From the above considerations, it becomes clear that the PRC *Z*(*t*) can be computed for *t* ∈ [0, *T*∗] by solving the system

$$
\begin{pmatrix}
\dot{Z} \\
\dot{Z}_{w_1} \\
\vdots \\
\dot{Z}_{w_N}
\end{pmatrix} = -\begin{pmatrix}
\frac{\partial f_0}{\partial v} & \tau_1^{-1} \frac{\partial f_1}{\partial v} & \dots & \tau_N^{-1} \frac{\partial f_N}{\partial v} \\
\frac{\partial f_0}{\partial w_1} & \tau_1^{-1} \frac{\partial f_1}{\partial w_1} & \dots & \tau_N^{-1} \frac{\partial f_N}{\partial w_1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_0}{\partial w_N} & \tau_1^{-1} \frac{\partial f_1}{\partial w_N} & \dots & \tau_N^{-1} \frac{\partial f_N}{\partial w_N}
\end{pmatrix} \begin{pmatrix} Z \\ Z_{w_1} \\ \vdots \\ Z_{w_N} \end{pmatrix} \tag{21}
$$

subject to the boundary conditions

$$Z(T^\*) = \frac{1}{\dot{v}\_0(T^\*)} = \frac{1}{f\_0(v\_T, \mathbf{w}\_0(T^\*)) + \mu - a^\* + \Delta},\tag{22}$$

$$Z\_{w\_k}(T^\*) = 0, \qquad k = 1, \ldots, N. \tag{23}$$

The PRC with respect to *a* is determined by

$$
\dot{Z}\_a = \frac{1}{\tau\_a} Z\_a + Z(t), \qquad Z\_a(T^\*) = 0. \tag{24}
$$

The matrix in Equation (21) is again evaluated on the limit cycle at *v* = *v*0(*t*),**w** = **w**0(*t*) and is therefore time-dependent. An analytic solution of Equation (21) is possible for one-dimensional models with adaptation (*N* = 0) or general linear IF models although in most cases the deterministic period *T*<sup>∗</sup> still has to be computed numerically.

#### *4.1.2. One-dimensional case*

In the case *N* = 0, the PRC satisfies the equation *Z*˙ = −*f*′(*v*<sub>0</sub>)*Z* with boundary condition (22). The solution is given by Equation (13). In order to prove Equation (14), we compute *Z*<sub>*a*</sub>(*t*) from Equation (24), yielding

$$Z\_a(t) = e^{\frac{t}{\tau\_a}} \left( Z\_a(0) + \int\_0^t Z(t')e^{-\frac{t'}{\tau\_a}} \,\mathrm{d}t' \right).$$

Evaluation of this expression for *t* = *T*<sup>∗</sup> leads to ϑ = 1 + (*a*<sup>∗</sup>/τ<sub>*a*</sub>) *Z*<sub>*a*</sub>(0). Finally, using the normalization condition (*f*(0) + μ − *a*<sup>∗</sup>)*Z*(0) − (*a*<sup>∗</sup>/τ<sub>*a*</sub>) *Z*<sub>*a*</sub>(0) = 1 yields Equation (14).

#### **4.2. RELATION BETWEEN SECOND-ORDER STATISTICS OF SPIKE COUNT, SPIKE TRAIN AND INTERSPIKE INTERVALS**

A stationary sequence of spike times {..., *t*<sub>*i*−1</sub>, *t*<sub>*i*</sub>, *t*<sub>*i*+1</sub>, ...} is often characterized by the statistics of the spike train *x*(*t*) = Σ<sub>*i*</sub> δ(*t* − *t*<sub>*i*</sub>), the spike count *N*(*t*) = ∫<sub>0</sub><sup>*t*</sup> d*t*′ *x*(*t*′) or the sequence of ISIs {*T*<sub>*i*</sub> = *t*<sub>*i*</sub> − *t*<sub>*i*−1</sub>}. In particular, neural variability can be quantified by the second-order statistics of these different descriptions as, for instance, the spike train power spectrum

$$S(f) = \int \mathrm{d}\tau \, e^{2\pi i f \tau} \langle x(t)x(t+\tau) \rangle,\tag{25}$$

the Fano factor

$$F(t) = \frac{\langle N(t)^2 \rangle - \langle N(t) \rangle^2}{\langle N(t) \rangle},\tag{26}$$

and the coefficient of variation *C*<sub>V</sub> = √⟨(*T*<sub>*i*</sub> − ⟨*T*<sub>*i*</sub>⟩)<sup>2</sup>⟩/⟨*T*<sub>*i*</sub>⟩ and the SCC ρ<sub>*k*</sub> as defined in Equation (1). These statistics are connected by the fundamental relationship (Cox and Lewis, 1966; see also van Vreeswijk, 2010)

$$\lim\_{t \to \infty} F(t) = \langle T\_i \rangle \lim\_{f \to 0} S(f) = C\_\mathcal{V}^2 \left( 1 + 2 \sum\_{k=1}^\infty \rho\_k \right). \tag{27}$$

It shows that the summed SCC has a strong impact on the long-term variability of the spike train. In particular, a negative sum yields a more regular spike train on long time scales than a renewal process with the same *C*V.
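Equation (27) can be illustrated with a small numerical experiment (a sketch with assumed parameters, not values from the paper): if the ISIs carry AR(1) correlations, *T<sub>i</sub>* = *m* + *x<sub>i</sub>* with *x<sub>i</sub>* = φ*x*<sub>*i*−1</sub> + ξ<sub>*i*</sub>, then ρ<sub>*k*</sub> = φ<sup>*k*</sup> and hence Σρ<sub>*k*</sub> = φ/(1 − φ), and the Fano factor for a long counting window should approach *C*<sub>V</sub><sup>2</sup>(1 + 2Σρ<sub>*k*</sub>):

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed illustrative parameters: mean ISI m, AR(1) coefficient phi, noise std s
m, phi, s = 1.0, 0.5, 0.1
sig_x = s / np.sqrt(1 - phi**2)          # stationary std of the AR(1) part
n_trials, n_isis, t_win = 2000, 400, 200.0

counts = np.empty(n_trials)
for j in range(n_trials):
    x = np.empty(n_isis)
    x[0] = rng.normal(0.0, sig_x)        # start the ISI sequence in stationarity
    noise = rng.normal(0.0, s, n_isis)
    for i in range(1, n_isis):
        x[i] = phi * x[i - 1] + noise[i]
    spikes = np.cumsum(m + x)            # spike times; SCC rho_k = phi**k
    counts[j] = np.searchsorted(spikes, t_win)   # spike count N(t_win)

F_est = counts.var() / counts.mean()     # empirical Fano factor, Eq. (26)
CV2 = sig_x**2 / m**2                    # squared coefficient of variation
F_pred = CV2 * (1 + 2 * phi / (1 - phi)) # right-hand side of Eq. (27)
```

With these numbers the positive ISI correlations triple the long-term count variability relative to a renewal process with the same *C*<sub>V</sub>.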

## **ACKNOWLEDGMENTS**

This work was supported by Bundesministerium für Bildung und Forschung grant 01GQ1001A.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 August 2013; accepted: 26 October 2013; published online: 29 November 2013.*

*Citation: Schwalger T and Lindner B (2013) Patterns of interval correlations in neural oscillators with adaptation. Front. Comput. Neurosci. 7:164. doi: 10.3389/fncom. 2013.00164*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2013 Schwalger and Lindner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Propagating synchrony in feed-forward networks

## *Sven Jahnke1,2,3\*, Raoul-Martin Memmesheimer <sup>4</sup> and Marc Timme1,2,3*

*<sup>1</sup> Network Dynamics, Max Planck Institute for Dynamics and Self-Organization (MPIDS), Göttingen, Germany*

*<sup>2</sup> Bernstein Center for Computational Neuroscience (BCCN), Göttingen, Germany*

*<sup>3</sup> Fakultät für Physik, Georg-August-Universität Göttingen, Göttingen, Germany*

*<sup>4</sup> Department for Neuroinformatics, Donders Institute, Radboud University, Nijmegen, Netherlands*

#### *Edited by:*

*Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany*

#### *Reviewed by:*

*Robert Rosenbaum, University of Pittsburgh, USA Arvind Kumar, University of Freiburg, Germany Raul C. Muresan, Romanian Institute of Science and Technology, Romania*

#### *\*Correspondence:*

*Sven Jahnke, Network Dynamics, Max Planck Institute for Dynamics and Self-Organization (MPIDS), Am Faßberg 17, 37077 Göttingen, Germany e-mail: sjahnke@nld.ds.mpg.de*

Coordinated patterns of precisely timed action potentials (spikes) emerge in a variety of neural circuits, but their dynamical origin is still not well understood. One hypothesis states that synchronous activity propagating through feed-forward chains of groups of neurons (synfire chains) may dynamically generate such spike patterns. Additionally, synfire chains offer a possible substrate for reliable signal transmission. So far, mostly densely connected chains, often with all-to-all connectivity between groups, have been studied theoretically and computationally. Yet, such prominent feed-forward structures have not been observed experimentally. Here we analytically and numerically investigate under which conditions diluted feed-forward chains may exhibit synchrony propagation. In addition to conventional linear input summation, we study the impact of non-linear, non-additive summation accounting for the effect of fast dendritic spikes. The non-linearities promote synchronous inputs to generate precisely timed spikes. We identify how non-additive coupling relaxes the conditions on connectivity such that it enables synchrony propagation at connectivities substantially lower than required for linearly coupled chains. Although the analytical treatment is based on a simple leaky integrate-and-fire neuron model, we show how to generalize our methods to biologically more detailed neuron models and verify our results by numerical simulations with, e.g., Hodgkin–Huxley-type neurons.

**Keywords: synchrony, networks, synfire chains, spike pattern, mathematical neuroscience, non-additive coupling, non-linear dendrites**

## **1. SPIKE PATTERNS AND SIGNAL TRANSMISSION IN NEURONAL CIRCUITS**

Reliable signal transmission is a core part of neuronal processing. A common hypothesis states that activity propagating along neuronal sub-populations that are connected in a feed-forward manner may support such signal transmission. Indeed, there is strong indication that activity propagation along feed-forward structures drives the generation of bird songs (Long et al., 2010) and experiments have shown propagation of synchronous and rate activity in feed-forward networks (FFNs) *in vitro* (Reyes, 2003; Feinerman et al., 2005; Feinerman and Moses, 2006). Sequential replay in the hippocampus and in neocortical networks also suggest underlying feed-forward mechanisms (August and Levy, 1999; Nadasdy et al., 1999; Lee and Wilson, 2002; Leibold and Kempter, 2006; Xu et al., 2011; Eagleman and Dragoi, 2012; Jahnke et al., 2012) and propagation of synchronous activity along feed-forward chains is a possible explanation for experimentally observed precise spike timing in the cortex (Riehle et al., 1997; Kilavik et al., 2009; Putrino et al., 2010). Further, the modular, hierarchical structure of many sensory and motor systems suggests propagation over sequences of areas in feedforward manner, e.g., in bottom-up signal transfer (Felleman and Van Essen, 1991; Scannell et al., 1999; Bullmore and Sporns, 2009; Kumar et al., 2010).

Feed-forward structures which support the propagation of synchronous activity are termed synfire chains. The concept was introduced by Abeles (1982) as groups of neurons (layers) with dense anatomical connections between subsequent groups that are embedded in otherwise roughly randomly connected local neural circuits. Two major questions regarding the dynamical options for synfire activity include a) how synchrony may actively propagate and b) how such spatio-temporally coordinated spike timing may be robust against irregular background activity, because the synfire chains are part of a cortical network with dynamics defined by the so-called irregular balanced state (van Vreeswijk and Sompolinsky, 1996, 1998).

Addressing these points, theoretical studies have established conditions for stable propagation of synchrony in synfire chains (Diesmann et al., 1999; Gewaltig et al., 2001). Most synfire chain models assume functionally relevant FFNs that exhibit a very dense, often all-to-all connectivity between subsequent layers (Aviel et al., 2003; Mehring et al., 2003; Kumar et al., 2008) (see also a recent review on this topic, Kumar et al., 2010). Such highly prominent feed-forward structures, however, have not been found experimentally. Since cortical neural networks are overall sparse (e.g., Braitenberg and Schüz, 1998; Holmgren et al., 2003), we may also expect some level of dilution for embedded feed-forward chains. So far, computational model studies assumed that such chains, created from existing connections in sparse recurrent networks, exhibit strong synaptic efficacies and specifically modified neuron properties to enable synchrony propagation (Vogels and Abbott, 2005).

Recently, we have shown that non-additive dendritic interactions promote propagation of synchrony (Jahnke et al., 2012). The non-additive dendritic interactions considered are mediated by fast dendritic spikes (Ariav et al., 2003; Gasparini et al., 2004; Polsky et al., 2004; Gasparini and Magee, 2006): upon stimulation within a time interval less than a few milliseconds, dendrites are capable of generating sodium spikes. These induce a strong, short and stereotypical depolarization in the soma. If this depolarization elicits a somatic spike, the spike occurs a fixed time interval after stimulation with sub-millisecond precision. These dendritic non-linearities relax the requirement of dense feed-forward anatomy and thereby allow for robust propagation of synchrony even in *diluted* FFNs with synapses of moderate strength within the biologically observed range.

In the present article, we analytically and numerically investigate in detail under which conditions synchronous activity may reliably propagate along the layers of an FFN where the inter-group connectivity is diluted, as may be expected when they are part of a sparse cortical network. An embedding network is mimicked by external, noisy input. We study the influence of the network setup, including the influence of the emulated embedding network, and of different types of standard linearly additive as well as non-additive dendritic interactions.

We derive analytical estimates for the critical connectivity, i.e., the minimal connectivity that allows robust propagation of synchrony. Some fundamental analytical results, in particular the ansatz for deriving a critical connectivity in the first place, have been briefly reported before (Jahnke et al., 2012). Here, we extend the approach and show how the bifurcation point, i.e., the transition point from the non-propagating to the propagating regime, can be estimated quantitatively from the neurons' ground state properties. We investigate the validity range of the analytical predictions and check them via direct numerical simulations. Furthermore, we discuss the applicability of our results to biologically more detailed neuron models and network setups. In particular, we argue that the assumptions underlying the analytical approach are met by a wide class of neuron models, including, e.g., conductance based leaky integrate-and-fire and Hodgkin–Huxley-type neurons.

The article is structured as follows: After introducing the neuron model and network setup in section 2, we study in the main part the propagation of synchrony in linearly coupled FFNs (section 3.1) and in FFNs incorporating dendritic non-linearities (section 3.2). In particular, we derive tools to study the system analytically, compare the results to computer simulations and elaborate differences of the dynamics of FFNs with and without non-additive dendritic interactions. In the final part (section 3.3), we discuss the application of our analytical results to biologically more detailed neuron models.

#### **2. METHODS AND MODELS**

#### **2.1. NEURON MODEL**

#### *2.1.1. Linear model*

Consider networks of leaky integrate-and-fire neurons that interact by sending and receiving spikes via directed connections. The state of neuron *k* at time *t* is described by its membrane potential *Vk*(*t*) and its dynamics satisfy

$$\frac{dV\_k(t)}{dt} = -\frac{V\_k(t)}{\tau\_k^{\mathrm{m}}} + I\_k^{\mathrm{const}} + I\_k^{\mathrm{net}}(t) + I\_k^{\mathrm{ext}}(t),\tag{1}$$

where τ<sup>m</sup><sub>*k*</sub> is the membrane time constant of neuron *k*, *I*<sup>const</sup><sub>*k*</sub> := *I*<sup>0</sup><sub>*k*</sub>/τ<sup>m</sup><sub>*k*</sub> is a constant input current, *I*<sup>net</sup><sub>*k*</sub>(*t*) is the input current caused by spikes within the network and *I*<sup>ext</sup><sub>*k*</sub>(*t*) is the input current arising from spikes from external sources. When the neuron's membrane potential reaches or exceeds the threshold Θ<sub>*k*</sub>, its membrane potential is reset to *V*<sup>reset</sup><sub>*k*</sub> and a spike is sent to the postsynaptic neurons *n*, where it changes the postsynaptic potential after a delay τ<sub>*nk*</sub>. After emitting a spike at *t* = *t*<sub>0</sub> the neuron becomes refractory for a time period *t*<sup>ref</sup>, i.e., *V*<sub>*k*</sub>(*t*) = *V*<sup>reset</sup><sub>*k*</sub> for *t* ∈ [*t*<sub>0</sub>, *t*<sub>0</sub> + *t*<sup>ref</sup>].

To keep the model analytically tractable, we model the fast rise of the membrane potential upon the arrival of presynaptic spikes by instantaneous jumps of the membrane potential, such that the resulting input current reads

$$I\_k^{\text{net}}(t) = \sum\_{l} \sum\_{m} \epsilon\_{kl} \, \delta \left( t - t\_{lm}^{f} - \tau\_{kl} \right). \tag{2}$$

Here ε<sub>*kl*</sub> denotes the coupling strength from neuron *l* to neuron *k*, *t*<sup>*f*</sup><sub>*lm*</sub> is the *m*th spike time of neuron *l* and τ<sub>*kl*</sub> specifies the synaptic delay. In addition to spikes from the network, each neuron receives excitatory and inhibitory random inputs that emulate an embedding network. These external inputs are modeled as random Poisson spike trains with rates ν<sup>exc</sup> and ν<sup>inh</sup>, respectively. The resulting input current is given by

$$I\_k^{\text{ext}}(t) = \sum\_m \epsilon^{\text{exc}} \delta \left( t - t\_{km}^{\text{ext}, \text{exc}} \right) + \sum\_m \epsilon^{\text{inh}} \delta \left( t - t\_{km}^{\text{ext}, \text{inh}} \right), \tag{3}$$

where *t* ext, exc *km* (*t* ext, inh *km* ) is the arrival time of the *m*th excitatory (inhibitory) spike to neuron *k* and exc > 0 (inh < 0) denote the corresponding coupling strength.
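To make the model concrete, the following is a minimal time-step sketch of Equations (1) and (3) for a single neuron driven only by balanced external Poisson input (the values of ε<sup>ext</sup> and ν<sup>ext</sup> are illustrative assumptions, and the refractory period is omitted for brevity):

```python
import numpy as np

tau_m, theta, v_reset = 0.014, 15.0, 0.0   # s, mV, mV (values from the text)
eps_ext, nu_ext = 1.0, 2000.0              # assumed coupling (mV) and rates (Hz)
dt, t_end = 1e-4, 50.0
n_steps = int(t_end / dt)

rng = np.random.default_rng(2)
exc = rng.poisson(nu_ext * dt, n_steps)    # excitatory arrivals per step (Eq. 3)
inh = rng.poisson(nu_ext * dt, n_steps)    # inhibitory arrivals per step

v, n_spikes = v_reset, 0
for i in range(n_steps):
    v += -v / tau_m * dt                   # leak term of Eq. (1), I0 = 0
    v += eps_ext * (exc[i] - inh[i])       # balanced jump inputs (Eq. 3)
    if v >= theta:                         # threshold crossing: spike and reset
        v, n_spikes = v_reset, n_spikes + 1

rate = n_spikes / t_end                    # low rate: fluctuation-driven regime
```

With these numbers the diffusion approximation of section 2.3 gives α ≈ 2, so only a few spikes per second are expected despite thousands of input events.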

#### *2.1.2. Non-linear model*

In the above model all input currents are summed linearly. To also investigate the effect of dendritic spikes, we modulate the sum of synchronously arriving excitatory inputs by a non-linear dendritic modulation function σ<sub>*NL*</sub>(·). Its shape can be directly read off from experimental data (Ariav et al., 2003; Gasparini et al., 2004; Polsky et al., 2004; Gasparini and Magee, 2006): If the sum of excitatory inputs is below the dendritic threshold Θ<sub>*b*</sub>, the single inputs are processed linearly (σ<sub>*NL*</sub>(·) equals the identity). If the sum of inputs exceeds the dendritic threshold Θ<sub>*b*</sub>, the depolarization is strongly non-linearly enhanced compared to that expected from linear summation; in biological terms, this is due to an elicited dendritic spike. Larger inputs have been experimentally found not to further increase the somatic peak depolarization. The dendritic modulation function may then be modeled as

$$\sigma\_{\rm NL}(\epsilon) = \begin{cases} \epsilon & \text{for } \epsilon < \Theta\_b \\ \kappa & \text{otherwise} \end{cases}. \tag{4}$$

The dendrites process synchronous inputs non-additively: inputs below the dendritic threshold are summed linearly, inputs just above this threshold are amplified supra-linearly and, due to the saturation, very large inputs are summed sub-linearly.
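A minimal sketch of the modulation function of Equation (4), using the Θ<sub>*b*</sub> = 4 mV and κ = 11 mV values quoted for Figure 1:

```python
def sigma_nl(eps_sum, theta_b=4.0, kappa=11.0):
    """Dendritic modulation, Eq. (4): identity below the dendritic
    threshold, saturating at kappa once a dendritic spike is elicited."""
    return eps_sum if eps_sum < theta_b else kappa

# three synchronous 1 mV inputs stay below theta_b and add linearly;
# five inputs cross theta_b and are amplified to the stereotypical kappa
linear_sum = sigma_nl(3 * 1.0)
dendritic_spike = sigma_nl(5 * 1.0)
```

Note the non-additivity: doubling a supra-threshold input does not change the somatic depolarization, reflecting the experimentally observed saturation.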

If not stated otherwise, we consider only exactly simultaneously arriving spikes as sufficiently synchronous; to allow for exactly simultaneous arrivals, the synaptic delays are chosen homogeneously, τ<sub>*kl*</sub> ≡ τ. The input currents caused by spikes that are received from the network are then given by

$$I\_k^{\text{net}}(t) = \sum\_{t^f} \left[ \sigma\_{NL} \left( \sum\_{l \in M\_{\text{exc}}\left(t^f\right)} \epsilon\_{kl} \right) + \sum\_{l \in M\_{\text{inh}}\left(t^f\right)} \epsilon\_{kl} \right] \delta\left(t - t^f - \tau\right). \tag{5}$$

Here, the sum over *t*<sup>*f*</sup> denotes the sum over all times at which spike(s) are sent in the network, irrespective of which neuron(s) is (are) spiking. The sets *M*<sub>exc</sub>(*t*<sup>*f*</sup>) and *M*<sub>inh</sub>(*t*<sup>*f*</sup>) specify the sets of neurons that send an excitatory or inhibitory spike at time *t*<sup>*f*</sup>, respectively. (To describe a network with linear dendrites, σ<sub>*NL*</sub>(ε) is replaced by ε.)

In section 3.3.1 we consider inhomogeneous delay distributions and a finite dendritic integration window Δ*t* (i.e., non-linear amplification of inputs received within a finite time interval Δ*t*) and discuss how the results achieved for homogeneous systems can be generalized.

#### **2.2. NETWORK TOPOLOGY**

We consider the propagation of synchrony in diluted Feed-Forward-Networks (FFNs, synfire chains). They consist of a sequence of *m* layers, each composed of ω neurons. Neurons of one layer form excitatory projections to the neurons of the subsequent layer with probability *p*; the strength of an existing connection from neuron *l* to neuron *k* is denoted by ε<sub>*kl*</sub>.

For simplicity of presentation, we consider homogeneous neuronal populations, i.e., all neurons have identical properties (τ<sup>m</sup><sub>*k*</sub> = τ<sup>m</sup>, Θ<sub>*k*</sub> = Θ and *V*<sup>reset</sup><sub>*k*</sub> = *V*<sup>reset</sup> for all *k*), as well as homogeneous coupling strengths, i.e., ε<sub>*kl*</sub> = ε if a connection is realized, throughout this article. If not stated otherwise, we use τ<sup>m</sup> = 14 ms and Θ = 15 mV as standard values for the membrane time constant and the neuron threshold.

#### **2.3. GROUND STATE DYNAMICS**

We consider networks where the single neurons are placed in a "fluctuation driven regime," i.e., in the ground state the average input to each neuron is sub-threshold and spiking of neurons is caused by fluctuations of the inputs. This setup allows us to emulate the dynamics of neurons which are part of a balanced network (van Vreeswijk and Sompolinsky, 1996, 1998). The neurons fire asynchronously and irregularly with low firing rate ν; the spike trains resemble Poissonian spike trains (Tuckwell, 1988; Brunel and Hakim, 1999; Brunel, 2000; Burkitt, 2006). Thus, the inputs to the neurons may be described by three Poissonian spike trains with rates ν<sup>exc</sup> (external, excitatory), ν<sup>inh</sup> (external, inhibitory) and ν<sup>int</sup> = ν*p*ω (inputs from the preceding layer). Since the number of inputs *N*<sup>*X*</sup><sub>*T*</sub>, *X* ∈ {exc, inh, int}, in a time interval *T* is Poisson distributed, its expected value ⟨*N*<sup>*X*</sup><sub>*T*</sub>⟩ and its variance ⟨(*N*<sup>*X*</sup><sub>*T*</sub> − ⟨*N*<sup>*X*</sup><sub>*T*</sub>⟩)<sup>2</sup>⟩ are both equal to ν<sup>*X*</sup>*T*. Then

$$\mu = I\_0 + \tau^{\mathrm{m}} \nu^{\mathrm{exc}} \epsilon^{\mathrm{exc}} + \tau^{\mathrm{m}} \nu^{\mathrm{inh}} \epsilon^{\mathrm{inh}} + \tau^{\mathrm{m}} p\omega\nu\epsilon \tag{6}$$

is the mean of the total input to the neurons in an interval of the size of the membrane time constant, *T* = τm, and

$$
\sigma^2 = \tau^{\mathrm{m}} \nu^{\mathrm{exc}} \left( \epsilon^{\mathrm{exc}} \right)^2 + \tau^{\mathrm{m}} \nu^{\mathrm{inh}} \left( \epsilon^{\mathrm{inh}} \right)^2 + \tau^{\mathrm{m}} p\omega\nu\epsilon^2 \tag{7}
$$

is its variance. In diffusion approximation, the distribution of membrane potentials *PV*(*V*) and the mean firing rate ν can be derived analytically (Brunel and Hakim, 1999; Brunel, 2000; Helias et al., 2010). In particular, for networks with low firing rates the probability density of membrane potentials (see, e.g., Tuckwell, 1988)

$$P\_V(V) = \frac{1}{\sqrt{\pi \sigma^2}} \exp\left[-\left(\frac{V-\mu}{\sigma}\right)^2\right] \tag{8}$$

is Gaussian and can be expressed in terms of the input current. In this approximation the average firing rate is

$$\nu = \frac{1}{\sqrt{\pi}\,\tau^{\mathrm{m}}} \frac{\Theta - \mu}{\sigma} \exp\left[ - \left( \frac{\Theta - \mu}{\sigma} \right)^{2} \right] \tag{9}$$

and depends on μ and σ only via the quotient

$$\alpha := \frac{\Theta - \mu}{\sigma},\tag{10}$$

which is the distance of the average input μ from the neurons' threshold, normalized by the standard deviation σ of the input. For the analytical derivations throughout this article we focus on the regime of low spiking rates, α ≳ 2 (ν ≲ 1.5 Hz).
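As a numerical cross-check of Equations (6), (7), (9) and (10), the sketch below uses the standard τ<sup>m</sup> and Θ from section 2.2; the external drive parameters are assumptions chosen only to land at the edge of the quoted low-rate regime, not values from the paper:

```python
import math

tau_m = 0.014          # membrane time constant: 14 ms
theta = 15.0           # neuron threshold: 15 mV

# assumed balanced external input, no FFN contribution (p * omega * nu -> 0)
I0 = 7.0               # constant drive, so mu = I0 (mV), Eq. (6)
nu_ext = 2000.0        # external rates (Hz), nu_exc = nu_inh = nu_ext
eps_ext = math.sqrt(16.0 / (2 * tau_m * nu_ext))   # chosen so sigma = 4 mV

mu = I0                                            # exc and inh means cancel
sigma = math.sqrt(2 * tau_m * nu_ext * eps_ext**2) # Eq. (7), balanced case

alpha = (theta - mu) / sigma                       # Eq. (10)
nu = alpha * math.exp(-alpha**2) / (math.sqrt(math.pi) * tau_m)  # Eq. (9)
```

This yields α = 2 and ν ≈ 1.48 Hz, consistent with the quoted bound ν ≲ 1.5 Hz.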

In the absence of synchronous activity each neuron receives a large number of inputs from the external network and only a few inputs from the previous layer of the FFN, such that the ground state dynamics of the network is mainly established by the external inputs. To keep the input balanced we choose ν<sup>exc</sup> = ν<sup>inh</sup> =: ν<sup>ext</sup> and ε<sup>exc</sup> = −ε<sup>inh</sup> =: ε<sup>ext</sup> throughout the article.

#### **2.4. PROPAGATION OF SYNCHRONY**

To initiate propagating synchronous activity along the considered diluted FFN, we excite in the first layer a subgroup of *g*<sub>0</sub> ≤ ω neurons to spike synchronously. This causes a synchronous input to the following layer after the synaptic delay τ and may therefore initiate synchronous spiking of a subgroup of neurons in that layer. These may again excite synchronous spiking in the next layer and so on. Depending on the ground state, i.e., the layout of the external network, on the layer size ω, and on the coupling strength ε, a synchronous pulse may or may not propagate along the FFN (cf. **Figures 1A,B,D,E**).

In addition to the triggered propagation, one might generally also expect the occurrence of spontaneous propagation of synchronous activity: Neurons of a particular layer share inputs from the previous layer and this causes correlations in their spiking activity. Over the layers these correlations can accumulate and lead to synchronous spiking (Aviel et al., 2003; Rosenbaum et al., 2010, 2011; Litvak et al., 2013). However, in the setups considered in this article, the effect is negligible for two reasons: (1) each neuron receives a large number of external (uncorrelated) inputs and this background noise has a decorrelating effect; (2) we study the system near the critical point, i.e., for parameters where even synchronized spiking of all neurons of a particular layer is just sufficient to initiate a propagation of synchrony. Thus, spontaneous propagation of synchrony effectively does not occur.

**Figure 1 (caption).** With increasing connection probability *p*, propagation of synchrony can be enabled **(A,B)** in networks with additive (linear) and **(D,E)** in networks with non-additive (non-linear) dendritic interactions (Θ<sub>*b*</sub> = 4 mV, κ = 11 mV). **(C,F)** Average number of synchronously active neurons in the second layer (*n* = 10,000 trials: solid line; transition probability: shading). Note that non-linear dendrites allow for sparser connectivity, **(E)** vs. **(B)**, and for a sparser code, i.e., for smaller numbers of spiking neurons in an activated group, **(F)** vs. **(C)**.

We study the transition from the non-propagating to the propagating regime by means of an iterated map that yields the expectation value of the number of synchronously spiking neurons *g*<sub>*i*+1</sub> in layer *i* + 1 if *g*<sub>*i*</sub> neurons are synchronously active in layer *i*. There is always one trivial fixed point of this iterated map, *G*<sub>0</sub> = *g*<sub>*i*+1</sub> = *g*<sub>*i*</sub> = 0, which corresponds to absent activity. If *g*<sub>*i*+1</sub> < *g*<sub>*i*</sub> for all *g*<sub>*i*</sub> > *G*<sub>0</sub>, synchronous activity will die out after a small number of layers. If *g*<sub>*i*+1</sub> ≥ *g*<sub>*i*</sub> for some substantial group size *g*<sub>*i*</sub> > *G*<sub>0</sub>, a stable propagation of synchrony may be enabled (cf. **Figures 1C,F**). More precisely, we will show in this article that with increasing connectivity *p* the system undergoes a tangent bifurcation and two fixed points *G*<sub>1</sub> and *G*<sub>2</sub> ≥ *G*<sub>1</sub> appear. If it exists, *G*<sub>1</sub> is always unstable (the diagonal is crossed from below, so the slope of the iterated map there is larger than one), and *G*<sub>2</sub> is always stable: all connections within the FFN are excitatory, such that the iterated map is monotonically increasing (slope larger than zero, in particular larger than −1); furthermore, at *G*<sub>2</sub> the map intersects the diagonal from above, so the slope there is smaller than one. Hence stationary propagation with group sizes around *G*<sub>2</sub> is enabled.

In computer simulations, we determine for each given network setup whether propagation is possible by the following procedure: after some initial time *t*<sup>init</sup> we excite all neurons of the first layer to spike synchronously and measure the number of active neurons *g*<sub>*i*</sub> in the *i*th layer at the expected spiking time *t*<sup>exp</sup><sub>*i*</sub> = *t*<sup>init</sup> + *i*τ. If *g*<sub>*i*</sub> is substantially larger than the number of active neurons arising from spontaneous activity in more than 50% of *n* trials (i.e., *n* repetitions of the same simulation with different initial conditions), we denote the propagation of synchrony as successful. The critical connectivity *p*<sup>∗</sup>, which marks the transition from a regime where propagation of synchrony is not possible to a regime where it is enabled, is found by determining the lowest connection probability *p* for which an initial synchronous pulse propagates successfully.

As the connections within the FFN are all excitatory, it is sufficient to check whether propagation of synchrony can be initiated by inducing synchronized spiking of all ω neurons of the first layer: Stationary propagation of synchrony can be enabled if there is a non-trivial stable fixed point (*G*2) of the iterated map for the average group size. For purely excitatory connections the basin of attraction of this fixed point is bounded from the left by an unstable fixed point (*G*1) and from the right by the maximum group size given by the layer size ω.

### **3. RESULTS AND DISCUSSION**

Under which conditions can synchronous signals propagate robustly along diluted FFNs? To answer this question in detail, we first focus on networks with linear dendrites. Afterwards we study the propagation of synchrony in networks incorporating non-additive dendritic interactions and compare with the linear case. Finally, we show that the derived results are directly applicable in biologically more detailed neuron models and network configurations.

#### **3.1. FFNs WITH LINEAR DENDRITES**

In this section, we consider linearly coupled FFNs. In the first part, we derive analytical estimates for the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> that marks the transition from the non-propagating to the propagating regime; the initial steps follow the lines of Jahnke et al. (2012); Memmesheimer and Timme (2012). In the second part we investigate the influence of the external network on the propagation of synchrony and determine the parameter region for which the analytical estimates are applicable. In particular, we show that the derived estimates are applicable in the biologically relevant parameter region, where the spontaneous firing rate is low and the distribution of membrane potentials is sufficiently broad. Finally, we study how the properties of propagating synchronous pulses depend on different system parameters.

#### *3.1.1. Analytical derivation of critical connectivity*

To access the properties of propagation of synchrony we consider average numbers of active neurons in the different layers of an FFN: for this, we derive an iterated map which yields the expected number of neurons that will spike synchronously in one layer given that in the preceding layer a certain number of neurons was synchronously active.

If in the *i*th layer *g*<sub>*i*</sub> neurons spike synchronously, the number of synchronous inputs *h* a single neuron in layer *i* + 1 receives follows a binomial distribution, *h* ∼ *B*(*g*<sub>*i*</sub>, *p*). We denote the spiking probability of a single neuron due to an input of strength *x* by *p*<sub>*f*</sub>(*x*). The average or expected spiking probability *p*<sup>sp</sup>(*g*<sub>*i*</sub>) of a single neuron in layer *i* + 1 is then given by

$$p^{\mathrm{sp}}\left(g\_i\right) = \mathrm{E}\left[\left.p\_f\left(h\epsilon\right)\right|g\_i\right] = \sum\_{h=0}^{g\_i} \binom{g\_i}{h} p^h \left(1-p\right)^{g\_i-h} p\_f\left(h\epsilon\right). \tag{11}$$

Here and in the following we denote the expectation value of a function *f*(*X*) of a random variable *X* by E[*f*(*X*)]; conditional expectations are denoted by E[*f*(*X*)|*Y*]. The expected number of spiking neurons in layer *i* + 1 is then simply

$$\mathrm{E}\left[\left.g\_{i+1}\right|g\_i\right] = \omega\, p^{\mathrm{sp}}\left(g\_i\right) \tag{12}$$

$$=\omega \sum\_{h=0}^{g\_i} \binom{g\_i}{h} p^h \left(1-p\right)^{g\_i-h} p\_f\left(h\epsilon\right). \tag{13}$$

If the connection probability *p* is low and/or the connection strengths are small, the spontaneous spiking activity in the absence of synchrony is only weakly influenced by the spiking activity within the FFN. Thus, as a starting point, we assume that the ground state is exclusively governed by external inputs (effectively setting ε<sub>*ij*</sub> ≡ 0). Then, the mean input to the neurons in an interval of length τ<sup>m</sup> is μ = *I*<sub>0</sub> with standard deviation σ = ε<sup>ext</sup>√(2τ<sup>m</sup>ν<sup>ext</sup>) (cf. section 2.3). Using the probability density (Equation 8), we calculate the spiking probability of a single neuron, *p*<sub>*f*</sub>(*x*), due to an input of strength *x*;

$$p\_f\left(\mathbf{x}\right) = \int\_{\Theta - \mathbf{x}}^{\Theta} P\_V\left(V\right)dV\tag{14}$$

$$=\frac{1}{2}\left(\text{Erf}\left[\frac{\Theta-\mu}{\sigma}\right]-\text{Erf}\left[\frac{\Theta-\mu-x}{\sigma}\right]\right) \tag{15}$$

equals the probability of finding a neuron's membrane potential in the interval [Θ − *x*, Θ]. To derive an iterated map for the average number of active neurons (which maps E[*g*<sub>*i*</sub>] → E[*g*<sub>*i*+1</sub>]), we interpolate E[*g*<sub>*i*+1</sub> | *g*<sub>*i*</sub>] for continuous *g*<sub>*i*</sub> and in a second step replace *g*<sub>*i*</sub> by its expectation value E[*g*<sub>*i*</sub>]. The fixed points, E[*g*<sub>*i*+1</sub> | E[*g*<sub>*i*</sub>]] = E[*g*<sub>*i*</sub>], qualitatively determine the propagation properties of synchronous activity. In the rest of the manuscript we deal with the average number of active neurons in a given layer. Therefore, for simplicity, we denote the expectation value of the number of active neurons in a given layer *i* by *g*<sub>*i*</sub> instead of E[*g*<sub>*i*</sub>].

For sufficiently small connection probabilities *p* the map (Equation 12) has only one (trivial) fixed point *G*<sub>0</sub> = *g*<sub>*i*+1</sub> = *g<sub>i</sub>* = 0. Any initial synchronous pulse will die out after a small number of layers (see also **Figure 1**). With increasing connectivity, two additional fixed points *G*<sub>1</sub> (unstable) and *G*<sub>2</sub> ≥ *G*<sub>1</sub> (stable) appear via a tangent bifurcation.

For FFNs with purely excitatory couplings between the layers, the second fixed point *G*<sub>2</sub> (if it exists) is always stable: the spiking probability *p<sub>f</sub>*(*x*) is monotonically increasing with the input *x* and thus the iterated map (Equation 13) is also monotonically increasing (i.e., its slope is larger than 0). Moreover, if *G*<sub>2</sub> exists, the slope of the iterated map at this intersection point with the diagonal is smaller than 1. This implies that *G*<sub>2</sub> is stable and synchronous pulses of size *g<sub>i</sub>* ≥ *G*<sub>1</sub> typically initiate a propagation of synchrony with an average number of active neurons around *G*<sub>2</sub>. The critical connectivity *p*<sup>∗</sup><sub>*L*</sub> at the bifurcation point marks the minimal connectivity that allows for stable propagation of synchrony.
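This fixed point structure can be explored numerically. The following sketch (all parameter values are hypothetical placeholders, not values taken from the paper) implements the iterated map of Equation (13) with the spiking probability of Equations (14, 15) and bisects the connectivity *p* for the transition between extinction and stable propagation of an initially fully synchronous pulse:

```python
import math

# Hypothetical parameters (illustration only, not the values used in the paper)
THETA, MU, SIGMA = 15.0, 10.0, 2.0  # threshold, mean and width of P_V (mV)
OMEGA, EPS = 150, 0.2               # layer size and coupling strength (mV)

def p_f(x):
    """Spiking probability due to an input of strength x (Equations 14, 15)."""
    a = (THETA - MU) / SIGMA
    return 0.5 * (math.erf(a) - math.erf(a - x / SIGMA))

def next_g(g, p):
    """Iterated map E[g_{i+1} | g_i], Equation (13); crude rounding of g
    stands in for the interpolation to continuous g used in the text."""
    gi = round(g)
    return OMEGA * sum(math.comb(gi, h) * p**h * (1 - p)**(gi - h) * p_f(h * EPS)
                       for h in range(gi + 1))

def propagates(p, g0=OMEGA, layers=30):
    """True if a fully synchronous initial pulse survives 'layers' iterations."""
    g = g0
    for _ in range(layers):
        g = next_g(g, p)
    return g > 1.0

# Bisect for the critical connectivity between die-out and propagation
lo, hi = 0.0, 1.0
for _ in range(30):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if propagates(mid) else (mid, hi)
print(f"tangent bifurcation near p = {hi:.3f}")
```

For small *p* any pulse decays toward the trivial fixed point *G*<sub>0</sub> = 0, while beyond the bifurcation the iteration settles near the stable fixed point *G*<sub>2</sub>, in line with the discussion above.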

Although the distribution of inputs from one layer to the subsequent one and the spiking probability of a single neuron *p<sub>f</sub>*(·) are known, there is no closed-form analytic solution of the fixed point equation *g*<sub>*i*+1</sub> = *g<sub>i</sub>* = *g*<sup>∗</sup><sub>*i*</sub>. In other words, we can compute the firing probability *p<sub>f</sub>*(*x*<sub>0</sub>) for any *x*<sub>0</sub>, and therefore also E[*g*<sub>*i*+1</sub>|*g<sub>i</sub>*] for any *g<sub>i</sub>*, but the equation *g*<sup>∗</sup><sub>*i*</sub> = E[*g*<sub>*i*+1</sub>|*g*<sup>∗</sup><sub>*i*</sub>] is transcendental. We thus derive an approximate solution. We choose some expansion point (see section 3.1.2 for details) and approximate the function E[*g*<sub>*i*+1</sub>|*g*<sup>∗</sup><sub>*i*</sub>] by a polynomial *S*(*g*<sup>∗</sup><sub>*i*</sub>) of second order in *g*<sup>∗</sup><sub>*i*</sub> near the expansion point. The arising quadratic fixed point equation *g*<sup>∗</sup><sub>*i*</sub> = *S*(*g*<sup>∗</sup><sub>*i*</sub>) is then analytically solvable for *g*<sup>∗</sup><sub>*i*</sub>. This also allows us to compute the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> analytically: it is the parameter value at which the iterated map undergoes a tangent bifurcation, i.e., at which the two solutions of the fixed point equation become equal upon changing from complex-conjugate to real. Since the right hand side of Equation (13) does not lend itself to a direct series expansion in *g*<sup>∗</sup><sub>*i*</sub>, we derive *S*(·) from an appropriate expansion of *p<sub>f</sub>*(*h*ε) and a subsequent computation of the arising expectation values.

In biologically relevant scenarios, the neurons usually receive a large number of synaptic inputs and thus the distribution of membrane potentials *P<sub>V</sub>*(*V*) is broad, i.e., *P<sub>V</sub>*(*V*) changes slowly with *V*. Then *P<sub>V</sub>*(*V*) around some *V* = *V*<sub>0</sub> can be approximated by a low-order series expansion, and it is possible to derive an approximation for the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> based on an expansion of *p<sub>f</sub>*(·). Expanding *p<sub>f</sub>*(*x*) into a Taylor series around some *x*<sub>0</sub> and using Equation (12) yields

$$g\_{i+1} = \omega\, \mathrm{E} \left[ \sum\_{n=0}^{\infty} \frac{p\_f^{(n)}(x\_0)}{n!} \left( h \epsilon - x\_0 \right)^n \,\middle|\, g\_i \right] \tag{16}$$

$$=\omega\sum\_{n=0}^{\infty}\frac{p\_{f}^{(n)}(x\_{0})}{n!}\mathrm{E}\left[(h\epsilon-x\_{0})^{n}\,\middle|\,g\_{i}\right].\tag{17}$$

Here and in the following we denote the *n*th derivative of a function *f*(*x*) at *x* = *x*<sup>0</sup> by

$$f^{(n)}\left(x\_0\right) = \left. \frac{d^n}{dx^n} f(x) \right|\_{x = x\_0}. \tag{18}$$

Replacing the derivatives of *p<sub>f</sub>*(·) by the (one order lower) derivatives of the probability density of membrane potentials *P<sub>V</sub>*(*V*) according to Equation (14) yields

$$g\_{i+1} = \omega p\_f(x\_0) + \omega \sum\_{n=1}^{\infty} \frac{(-1)^{n-1} P\_V^{(n-1)}(V\_0)}{n!} \mathrm{E} \left[ (h\epsilon - x\_0)^n \,\middle|\, g\_i \right], \tag{19}$$

where we defined

$$V\_0 := \Theta - x\_0 \tag{20}$$

for better readability.

We have recently shown (Jahnke et al., 2012) that it is possible to derive a scaling law for the critical connectivity using

$$
x\_0 = g\_i p \epsilon, \tag{21}
$$

the (unknown) average input from one layer to the next during stationary synchrony propagation, as expansion point. For this choice the expectation value E[(*h*ε − *x*<sub>0</sub>)<sup>*n*</sup>|*g<sub>i</sub>*] in Equation (19) simplifies to

$$\mathrm{E}\left[ (h\epsilon - x\_0)^n \,\middle|\, g\_i \right] = \epsilon^n \mathrm{E}\left[ (h - \mathrm{E}\left[h\right])^n \,\middle|\, g\_i \right] = \epsilon^n m\_n,\tag{22}$$

where we denote by *m<sub>n</sub>* the *n*th central moment of the binomial distribution *B*(*g<sub>i</sub>*, *p*), specifying the distribution of inputs to the (*i* + 1)th layer. In the limit of large layer sizes ω and small coupling strengths ε, keeping the maximal input ωε to each layer constant (to preserve the network state), all summands for *n* ≥ 2 vanish, and Equation (19) simplifies to

$$g\_{i+1} = \omega p\_f \left(g\_i p \epsilon\right). \tag{23}$$

Using the implicit function theorem one can show that this implies the scaling law

$$p\_L^\* = \frac{1}{\lambda \epsilon \omega} \tag{24}$$

where λ is a constant independent of ε and ω (Jahnke et al., 2012). We note that for the derivation of the scaling law (Equation 24) we did not use the actual functional form of the distribution of membrane potentials *P<sub>V</sub>*(*V*). Therefore this estimate holds whenever *P<sub>V</sub>*(*V*) changes sufficiently slowly with *V* such that the Taylor expansion (cf. Equation 16) is applicable; its validity is not restricted to the low-rate approximation.

However, the dependence of the prefactor 1/λ on the layout of the external network remained unknown. Here, we present an approach that enables us to derive an approximate value for λ. We consider the expansion (Equation 19) around *x*<sub>0</sub> up to second order,

$$\begin{aligned} g\_{i+1} &\approx \omega p\_f(x\_0) + \omega P\_V\left(V\_0\right) \cdot \left(\epsilon g\_i p - x\_0\right) \\ &\quad - \frac{\omega P\_V^{(1)}(V\_0)}{2} \left[ \left(\epsilon g\_i p - x\_0\right)^2 + \epsilon^2 g\_i p \left(1 - p\right) \right] \tag{25} \end{aligned}$$

The truncated series (Equation 25) is quadratic in *g<sub>i</sub>* such that the fixed points *g*<sup>∗</sup><sub>1/2</sub> = *g*<sub>*i*+1</sub> = *g<sub>i</sub>* can be obtained analytically,

$$g\_{1,2}^{\*} = \gamma\_{L} \pm \sqrt{\gamma\_{L}^{2} - \frac{x\_{0} \left(2P\_{V}(V\_{0}) + x\_{0}P\_{V}^{(1)}(V\_{0})\right) - 2p\_f(x\_{0})}{p^{2}P\_{V}^{(1)}(V\_{0})\epsilon^{2}}},\tag{26}$$

where we defined

$$\gamma\_L := \frac{p \epsilon \omega \left(2 \left(P\_V(V\_0) + x\_0 P\_V^{(1)}(V\_0)\right) + (p-1) \, P\_V^{(1)}(V\_0)\epsilon\right) - 2}{2p^2 P\_V^{(1)}(V\_0)\epsilon^2 \omega}. \tag{27}$$

At the bifurcation point, the root in Equation (26) vanishes such that both fixed points agree (*g*<sup>∗</sup><sub>1</sub> = *g*<sup>∗</sup><sub>2</sub>) and γ<sub>*L*</sub> = *g*<sup>∗</sup><sub>1</sub> = *g*<sup>∗</sup><sub>2</sub> specifies the average size of a propagating synchronous pulse. Consequently, the critical connectivity is obtained by choosing *p* such that

$$\gamma\_L^2 = \frac{x\_0 \left(2P\_V(V\_0) + x\_0 P\_V^{(1)}(V\_0)\right) - 2p\_f(x\_0)}{p^2 P\_V^{(1)}(V\_0) \epsilon^2} \tag{28}$$

which yields

$$p\_L^\* = \frac{1}{2} - \frac{1}{\epsilon} \left[ \frac{\lambda^\*}{P\_V^{(1)}(V\_0)} - \sqrt{\frac{2}{P\_V^{(1)}(V\_0)\epsilon\omega} + \frac{\left(\epsilon P\_V^{(1)}(V\_0) - 2\lambda^\*\right)^2}{4\left(P\_V^{(1)}(V\_0)\right)^2}} \right] \tag{29}$$

where we defined

$$\begin{aligned} \lambda^\* := \; & P\_V(V\_0) + x\_0 P\_V^{(1)}(V\_0) \\ & - \sqrt{P\_V^{(1)}(V\_0) \left(x\_0 \left(2P\_V(V\_0) + x\_0 P\_V^{(1)}(V\_0)\right) - 2p\_f(x\_0)\right)} \tag{30} \end{aligned}$$

which is independent of the setup of the FFN and completely determined by the layout of the external network and the choice of the expansion point *x*<sub>0</sub>.
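As a numerical sanity check of Equations (26, 27), the following sketch evaluates the two fixed points for an assumed Gaussian low-rate density *P<sub>V</sub>* (cf. Equation 8) with hypothetical parameters; the expansion point used here anticipates the choice derived in section 3.1.2. Below the tangent bifurcation the discriminant is negative and no non-trivial fixed points exist:

```python
import math

# Illustrative parameters and expansion point (assumptions, not the paper's values)
THETA, MU, SIGMA = 15.0, 10.0, 2.0
OMEGA, EPS = 150, 0.2
X0 = THETA - MU + SIGMA / math.sqrt(2)   # convenient choice, cf. section 3.1.2

V0 = THETA - X0                           # Eq. (20)
PV = math.exp(-((V0 - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(math.pi))
PV1 = -2 * (V0 - MU) / SIGMA**2 * PV      # first derivative of the Gaussian P_V
A = (THETA - MU) / SIGMA
PF = 0.5 * (math.erf(A) - math.erf(A - X0 / SIGMA))   # p_f(x0), Eqs. (14, 15)

def fixed_points(p):
    """Fixed points g*_{1,2} of the quadratic map, Equations (26, 27).
    Returns None below the tangent bifurcation (complex-conjugate roots)."""
    gamma = (p * EPS * OMEGA * (2 * (PV + X0 * PV1) + (p - 1) * PV1 * EPS) - 2) \
            / (2 * p**2 * PV1 * EPS**2 * OMEGA)                      # Eq. (27)
    disc = gamma**2 - (X0 * (2 * PV + X0 * PV1) - 2 * PF) / (p**2 * PV1 * EPS**2)
    if disc < 0:
        return None
    return gamma - math.sqrt(disc), gamma + math.sqrt(disc)          # Eq. (26)

for p in (0.20, 0.26, 0.30):
    print(p, fixed_points(p))
```

For these (assumed) parameters, the two fixed points, the unstable *G*<sub>1</sub> and the stable *G*<sub>2</sub>, appear only once *p* exceeds the bifurcation value.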

As before we consider the limit of large layer sizes ω and small coupling strengths ε, i.e., we keep ωε =: const fixed and consider the leading terms of a series expansion of Equation (29) in ε. The expansion of the square bracket in Equation (29) yields

$$\frac{\lambda^\*}{P\_V^{(1)}(V\_0)} - \sqrt{\frac{2}{P\_V^{(1)}(V\_0)} \frac{\epsilon}{\text{const}} + \frac{\left(\epsilon P\_V^{(1)}(V\_0) - 2\lambda^\*\right)^2}{4\left(P\_V^{(1)}(V\_0)\right)^2}}$$

$$= \left[\frac{\lambda^\*}{P\_V^{(1)}(V\_0)} - \frac{\lambda^\*}{P\_V^{(1)}(V\_0)}\right] - \epsilon \left(\frac{1}{\lambda^\* \cdot \text{const}} - \frac{1}{2}\right) + O\left(\epsilon^2\right), \tag{31}$$

such that the critical connectivity assumes the functional form given by Equation (24),

$$p\_L^\* \approx \frac{1}{\lambda^\* \epsilon \omega}.\tag{32}$$

Thus λ = λ<sup>∗</sup>, defined by Equation (30), provides an approximation of the constant λ fully specifying the critical connectivity *p*<sup>∗</sup><sub>*L*</sub>.

#### *3.1.2. Optimal expansion point*

To derive Equation (30) we assumed that it is sufficient to consider the second order expansion of *p<sub>f</sub>*(*x*). It is thus necessary to choose an appropriate expansion point that results in fast convergence. In particular, for the choice *x*<sub>0</sub> = *x*<sup>∗</sup><sub>0</sub> that we now derive (Equation 37 below), the bifurcation diagram near the bifurcation point is well approximated already at order *k* = 2 (cf. **Figure 2**).

The size of a propagating group at the critical connectivity is γ<sub>*L*</sub> (cf. Equation 27) and thus the resulting average input is *p*<sup>∗</sup><sub>*L*</sub>γ<sub>*L*</sub>ε. Our expansion point *x*<sub>0</sub> should lie near this value, which is, of course, unknown prior to solving the fixed point equation. We will thus compute a range in which *p*<sup>∗</sup><sub>*L*</sub>γ<sub>*L*</sub>ε has to lie and choose the expansion point appropriately within it. We assume that ω is large and employ Equation (23), which allows a direct estimate

**FIGURE 2 | A second order expansion (red), i.e., the lowest order at which a saddle-node bifurcation can occur, approximates the bifurcation diagram (blue) near the bifurcation point well.**

of this range, as we know the functional form explicitly. Equation (23) with *g*<sub>*i*+1</sub> = *g<sub>i</sub>* is just another transcendental equation for the fixed points; it has zero, one, or two non-trivial fixed point solutions *g*<sup>∗</sup><sub>1</sub> and *g*<sup>∗</sup><sub>2</sub>, which are then also solutions of Equation (19) with *g*<sub>*i*+1</sub> = *g<sub>i</sub>*. At the bifurcation point (*g*<sup>∗</sup> = *g*<sup>∗</sup><sub>1</sub> = *g*<sup>∗</sup><sub>2</sub>), where the diagonal is touched, the function ω*p<sub>f</sub>*(*gp*ε) has to be concave and monotonically increasing with respect to *g*. The definition (Equation 14) of *p<sub>f</sub>*(*x*) implies that it is monotonically increasing for all *x* ≥ 0. Moreover, it is concave for all *x* ≥ Θ − μ,

$$p\_f^{(1)}(x) = P\_V(\Theta - x) \ge 0 \quad \text{for } x \ge 0 \tag{33}$$

$$p\_f^{(2)}(x) = -P\_V^{(1)}(\Theta - x) \le 0 \quad \text{for } x \ge \Theta - \mu,\tag{34}$$

such that the bifurcation point satisfies

$$
x\_0 \ge \Theta - \mu. \tag{35}
$$

The condition in Equation (33) holds because *P<sub>V</sub>*(*V*) ≥ 0 is a probability density, and Equation (34) is derived directly by differentiating Equation (8). To maximize the quality of the second order approximation (Equation 25), we choose *x*<sub>0</sub> = *x*<sup>∗</sup><sub>0</sub> such that the contribution to the expansion (Equation 19) of the *k* = 3rd order term equals zero. According to Equation (19), all 3rd order terms are proportional to *P*<sup>(2)</sup><sub>*V*</sub>(Θ − *x*<sub>0</sub>); so we determine the expansion point *x*<sup>∗</sup><sub>0</sub> as an inflection point of *P<sub>V</sub>*(·), requiring that the second derivative of *P<sub>V</sub>*(*V*) vanishes for *V* = Θ − *x*<sup>∗</sup><sub>0</sub>,

$$p\_f^{(3)}(x\_0^\*) = \left.\frac{d^2 P\_V(V)}{dV^2} \right|\_{V = \Theta - x\_0^\*} \stackrel{!}{=} 0. \tag{36}$$

In the considered regime of low spiking rates, we find *x*<sup>∗</sup><sub>0</sub> = Θ − μ ± σ/√2, cf. Equation (8). Due to Equation (35),

$$
x\_0^\* = \Theta - \mu + \frac{\sigma}{\sqrt{2}}.\tag{37}
$$

For *x*<sub>0</sub> = *x*<sup>∗</sup><sub>0</sub> the bifurcation diagram near the bifurcation point is well approximated already for *k* = 2 (cf. **Figure 2**) and Equation (30) provides a good estimate of the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> (cf. **Figure 3**).
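The resulting estimate can be sketched compactly in code (hypothetical parameter values; the Gaussian low-rate form of *P<sub>V</sub>* from Equation 8 is assumed): it evaluates λ<sup>∗</sup> (Equation 30) at the optimal expansion point *x*<sup>∗</sup><sub>0</sub> (Equation 37) and the resulting critical connectivity *p*<sup>∗</sup><sub>*L*</sub> ≈ 1/(λ<sup>∗</sup>εω) (Equation 32):

```python
import math

def lambda_star(theta, mu, sigma):
    """lambda* of Equation (30), evaluated at the optimal expansion point
    x0* = theta - mu + sigma/sqrt(2) (Equation 37), assuming the low-rate
    Gaussian P_V(V) = exp(-((V - mu)/sigma)^2) / (sigma*sqrt(pi))."""
    x0 = theta - mu + sigma / math.sqrt(2)           # Eq. (37)
    v0 = theta - x0                                  # Eq. (20)
    pv = math.exp(-((v0 - mu) / sigma) ** 2) / (sigma * math.sqrt(math.pi))
    pv1 = -2 * (v0 - mu) / sigma**2 * pv             # P_V^(1)(V0), positive here
    a = (theta - mu) / sigma
    pf = 0.5 * (math.erf(a) - math.erf(a - x0 / sigma))   # Eqs. (14, 15)
    rad = pv1 * (x0 * (2 * pv + x0 * pv1) - 2 * pf)
    return pv + x0 * pv1 - math.sqrt(rad)            # Eq. (30)

# Illustrative (not the paper's) parameters
theta, mu, sigma = 15.0, 10.0, 2.0   # mV
omega, eps = 150, 0.2                # layer size, coupling strength (mV)
lam = lambda_star(theta, mu, sigma)
p_crit = 1.0 / (lam * eps * omega)   # Eq. (32)
print(f"lambda* = {lam:.4f} per mV, p*_L ~ {p_crit:.4f}")
```

For these illustrative numbers the estimate comes out near *p*<sup>∗</sup><sub>*L*</sub> ≈ 0.25, of the same order as a direct numerical bisection on the iterated map.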

#### *3.1.3. Influence of external network*

In the previous section we derived an iterated map for the average group size (cf. Equation 13) and an approximation for the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> (cf. Equations 30 and 32) that marks the transition from FFNs which do not support propagation of synchrony to FFNs that do. In this section we focus on the robustness of our results. How does the critical connectivity change with the layout of the external network? For which parameter range does the estimate of the critical connectivity (given by Equations 30 and 32) yield reasonable results?

The derivation was based on the assumption that the ground state dynamics of the neurons of the FFN is completely determined by the external inputs. This assumption holds if the spontaneous firing rate ν of the neurons and/or the coupling strengths ε


and/or the connectivity *p* are sufficiently small. We will generalize our approach and show how the impact of preceding layers on a layer's ground state can be taken into account. Thereafter we will compare the results with computer simulations, identify the regions in parameter space for which the derived approximations hold, and discuss deviations between direct numerical simulations and analytics.

The first layer of an FFN receives inputs only from the external network, and according to Equations (6, 7) the mean μ<sub>1</sub> and standard deviation σ<sub>1</sub> of its input are

$$
\mu\_1 = I\_0 \tag{38}
$$

$$
\sigma\_1 = \epsilon^{\text{ext}} \sqrt{2 \tau^{\text{m}} \nu^{\text{ext}}},\tag{39}
$$

as assumed in the previous section. All following layers receive external inputs and spikes from their preceding layer(s). The mean μ<sub>*n*</sub> and standard deviation σ<sub>*n*</sub> of the input to neurons of the *n*th layer (with *n* ≥ 2) read (cf. Equations 6 and 7)

$$
\mu\_n = I\_0 + \tau^{\text{m}} p \omega \nu\_{n-1} \epsilon \tag{40}
$$

$$
\sigma\_n = \sqrt{2\nu^{\text{ext}} \tau^{\text{m}} (\epsilon^{\text{ext}})^2 + p\omega\nu\_{n-1}\tau^{\text{m}}\epsilon^2}.\tag{41}
$$

Here we denote the spontaneous firing rate (in the absence of synchrony) of neurons of the (*n* − 1)th layer by ν<sub>*n*−1</sub>. It is given by Equation (9) as

$$\nu\_{n-1} = \frac{1}{\sqrt{\pi} \tau^{\text{m}}} \frac{\Theta - \mu\_{n-1}}{\sigma\_{n-1}} \exp\left[ -\left( \frac{\Theta - \mu\_{n-1}}{\sigma\_{n-1}} \right)^2 \right]. \tag{42}$$

From layer to layer, the mean input, the standard deviation, and the firing rate increase. For setups where the ground state of the FFN is non-pathological, i.e., the firing rates of all layers are bounded, the additional corrections Δ*X<sub>n</sub>* := *X<sub>n</sub>* − *X*<sub>*n*−1</sub> for *X* ∈ {μ, σ, ν} decrease with *n*, and μ<sub>*n*</sub>, σ<sub>*n*</sub> and ν<sub>*n*</sub> saturate for sufficiently large *n*. Thus, μ<sub>∞</sub> and σ<sub>∞</sub> describe the input to the neurons of an infinitely long FFN, and the single neurons of such an FFN spike with an average rate ν<sub>∞</sub>. Accordingly, replacing μ and σ by μ<sub>∞</sub> and σ<sub>∞</sub> in Equation (13) [where they appear as parameters of *p<sub>f</sub>*(·)] yields an iterated map for the average group size.
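The saturation of μ<sub>*n*</sub>, σ<sub>*n*</sub> and ν<sub>*n*</sub> can be computed by straightforward fixed point iteration of Equations (40–42); a minimal sketch, with hypothetical parameter values, is:

```python
import math

# Illustrative parameters (placeholders, not the paper's values)
I0, eps_ext, nu_ext = 10.0, 0.2, 8000.0   # mean drive (mV), ext. coupling (mV), ext. rate (Hz)
tau_m = 0.015                              # membrane time constant (s)
theta = 15.0                               # threshold (mV)
p, omega, eps = 0.05, 150, 0.2             # FFN connectivity, layer size, coupling (mV)

def nu_of(mu, sigma):
    """Low-rate spontaneous firing rate, Equation (42) / (9)."""
    z = (theta - mu) / sigma
    return z / (math.sqrt(math.pi) * tau_m) * math.exp(-z * z)

mu, sigma = I0, eps_ext * math.sqrt(2 * tau_m * nu_ext)    # layer 1: Eqs. (38, 39)
for n in range(2, 50):                                      # layers n >= 2: Eqs. (40-42)
    nu = nu_of(mu, sigma)
    mu_new = I0 + tau_m * p * omega * nu * eps              # Eq. (40)
    sigma_new = math.sqrt(2 * nu_ext * tau_m * eps_ext**2
                          + p * omega * nu * tau_m * eps**2)  # Eq. (41)
    if abs(mu_new - mu) < 1e-12 and abs(sigma_new - sigma) < 1e-12:
        break                                               # saturated: mu_inf, sigma_inf
    mu, sigma = mu_new, sigma_new
print(f"mu_inf = {mu:.3f} mV, sigma_inf = {sigma:.3f} mV, nu_inf = {nu_of(mu, sigma):.2f} Hz")
```

The corrections decrease geometrically from layer to layer, so the iteration converges after a handful of steps in the non-pathological regime.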

In **Figure 4**, we compare the critical connectivity found by numerically determining the bifurcation point of the iterated map (Equation 13) (i.e., we determined the connectivity *p* for which the iterated map touches the diagonal; solid lines) with computer simulations of propagating synchrony (markers). To also cover scenarios where the input from the preceding layer is not negligible, we consider infinitely long FFNs (then the distribution of membrane potentials is equal in all layers). In computer simulations this can be approximated by a sufficiently long FFN with periodic boundary conditions, i.e., an FFN whose last layer connects to the first layer. For moderate external inputs, i.e., moderate *I*<sub>0</sub> and ε<sup>ext</sup>, already the analytical results neglecting the influence of the preceding layers (using μ<sub>1</sub> and σ<sub>1</sub>) agree well with computer simulations (cf. **Figure 4A**, solid lines). However, for large external inputs, i.e., large *I*<sub>0</sub> and ε<sup>ext</sup>, the critical connectivity is overestimated. Here, the assumption that the distribution of membrane potentials is not influenced by the connectivity of the FFN does not hold. The additional input shifts the membrane potentials to higher values, and consequently a lower connectivity is required for the propagation of a synchronous pulse. The corrections given by Equations (38–42) account for these deviations to some extent (cf. **Figures 4B,C**; solid lines), in particular for setups where the spontaneous firing rate is low. However, for very large *I*<sub>0</sub> and ε<sup>ext</sup>, the critical connectivity is under-estimated. Here, the spontaneous firing rate is too high and the low-rate approximation (Equations 8–9) is not adequate to describe the system; the firing rate and thus the mean input from the previous layer are over-estimated. This becomes particularly clear in **Figure 4C**, where we show the critical connectivity as a function of the strength of the external inputs ε<sup>ext</sup>.
For any given *I*<sub>0</sub> (different colors), the critical connectivity for small ε<sup>ext</sup> is well approximated; with increasing ε<sup>ext</sup> the firing rate increases [α decreases and thus ν increases; cf. Equations (9) and (10)] and when the coupling strengths ε<sup>ext</sup> exceed an *I*<sub>0</sub>-dependent threshold, the low-rate approximation becomes inapplicable.

Applying the methods of Brunel and Hakim (1999) and Brunel (2000), the firing rate and the distribution of membrane potentials can be derived in the diffusion approximation for states with higher spontaneous firing rates. Although most of the analytical considerations above also apply within this approximation, the determination of an optimal expansion point (cf. Equations 36 and 37) becomes more difficult and a closed form expression does not exist. However, the critical connectivity can be obtained by numerically determining the fixed points of the iterated map (Equation 13), and we find that it agrees with

**FIGURE 4 | Robustness of analytical estimates of the critical connectivity. (A–C)** We consider the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> of infinitely long FFNs, approximated in direct numerical simulations (markers) by an FFN (*m* = 20, ω = 150, ε = 0.2 mV) with periodic boundary conditions, for different layouts of the external network. Panels **(A,B)** show *p*<sup>∗</sup><sub>*L*</sub> vs. *I*<sub>0</sub> for fixed ε<sup>ext</sup> and panel **(C)** shows *p*<sup>∗</sup><sub>*L*</sub> vs. ε<sup>ext</sup> for fixed *I*<sub>0</sub>. The solid (colored) lines indicate the critical connectivity found by numerically determining the bifurcation point of the iterated map (Equation 13). In panel **(A)** we neglect the influence of previous layers on the ground state of a considered layer in the analytical computations [i.e., we use μ<sub>1</sub> and σ<sub>1</sub>, cf. Equations (38) and (39)]. In **(B,C)** we employ corrections to account for their influence, cf. Equations (38–42). We show the third order correction; higher orders add only small modifications to the curves, but the numerical computations get more costly. The thick gray lines in **(B,C)** indicate the bifurcation point of the iterated map (Equation 13) with *P<sub>V</sub>*(*V*) derived from the diffusion approximation of leaky integrate-and-fire neuron dynamics with Poissonian input (Brunel and Hakim, 1999; Brunel, 2000). The dashed lines are the estimates of the critical connectivity given by Equations (30) and (32). Again, in panel **(A)** we neglect the influence of previous groups on the ground state; in panels **(B,C)** we use the third order correction. The estimates agree with the data from numerical simulations within the biologically relevant parameter range, where (1) the spontaneous spiking activity is low and (2) the distribution of membrane potentials is sufficiently broad. For further explanations see text (section 3.1.3).

computer simulations for the entire considered range of *I*<sub>0</sub> and ε<sup>ext</sup> (cf. **Figures 4B,C**; gray lines).

Analogous to the approach presented above, corrections for the influence of preceding layers can be taken into account for the analytical estimate of the critical connectivity derived in the previous section (Equations 30 and 32). Replacing the connectivity *p* by the approximation *p*<sup>∗</sup><sub>*L*</sub> = (λ<sup>∗</sup>εω)<sup>−1</sup> in Equations (40, 41) yields

$$
\mu\_n = I\_0 + \tau^{\text{m}} \nu\_{n-1} / \lambda\_{n-1}^\* \tag{43}
$$

$$
\sigma\_n = \sqrt{2\nu^{\text{ext}} \tau^{\text{m}} (\epsilon^{\text{ext}})^2 + \epsilon \nu\_{n-1} \tau^{\text{m}} / \lambda\_{n-1}^\*} \tag{44}
$$

where λ<sup>∗</sup><sub>*n*−1</sub> := λ<sup>∗</sup>(μ<sub>*n*−1</sub>, σ<sub>*n*−1</sub>) is given by Equation (30) and ν<sub>*n*−1</sub> = ν(μ<sub>*n*−1</sub>, σ<sub>*n*−1</sub>) is given by Equation (42). In **Figure 4** we show the estimate of the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> = (λ<sup>∗</sup><sub>*n*</sub>εω)<sup>−1</sup> (cf. Equation 32) using λ<sup>∗</sup><sub>1</sub> (panel **A**; dashed line), i.e., neglecting the influence of the preceding layers, and using a higher correction order (panels **B,C**; dashed line: third order). For sufficiently large ε<sup>ext</sup> the critical connectivity found by numerically determining the bifurcation point agrees with the analytical estimate given by Equation (32). As discussed above, the corrections (Equations 43, 44) account for the deviations from the simulated data as long as the total spontaneous firing rate is sufficiently low. However, for small ε<sup>ext</sup> the critical connectivity is under-estimated. Here, the standard deviation of the inputs (cf. Equation 7) is low, such that the distribution of membrane potentials *P<sub>V</sub>*(*V*) is narrow [for ε<sup>ext</sup> → 0: *P<sub>V</sub>*(*V*) → δ(*V* − μ); cf. Equation (8)], the spiking probability of one neuron, *p<sub>f</sub>*(·), increases steeply in a small interval [for ε<sup>ext</sup> → 0, *p<sub>f</sub>*(*x*) approaches a step function at *x* = Θ − μ; cf. Equation (8)], and thus the approximation of *p<sub>f</sub>*(·) by the leading terms of a Taylor expansion is not sufficiently accurate.

However, in the biologically plausible parameter regime, where the firing rates are small and the distribution of membrane potentials is broad, the critical connectivity is well approximated by Equation (32) together with Equation (30) (defining λ<sup>∗</sup>), Equation (37) (defining *x*<sup>∗</sup><sub>0</sub>), and the corrections that account for the influence of the preceding layers, Equations (43, 44).

#### *3.1.4. Characteristics of propagating synchronous pulses*

In the previous sections, we have shown that a synchronous pulse may propagate along a diluted FFN. In this section we study the characteristics and properties of a propagating synchronous signal. We consider them at the transition to stable propagation, *p*<sup>∗</sup><sub>*L*</sub>, because there they depend only weakly on the network setup. How large is the fraction of neurons that participate in propagating synchrony? How does this fraction depend on the network setup?

To answer such questions, we consider the effect of a propagating synchronous pulse on the single layers in the network as a measure for the effective pulse size. In other words, we consider the mean input μ<sub>*L*</sub> a neuron receives from the preceding layer if a synchronous pulse propagates along the FFN at the critical connectivity *p*<sup>∗</sup><sub>*L*</sub>. It is given by the product of the connection probability *p*<sup>∗</sup><sub>*L*</sub>, the connection strength ε, and the average size of a propagating synchronous signal γ<sub>*L*</sub>; using Equations (27) and (29) yields

$$\mu\_L = \gamma\_L p\_L^\* \epsilon = \frac{P\_V(\Theta - x\_0^\*) + P\_V^{(1)}(\Theta - x\_0^\*) x\_0^\* - \lambda^\*}{P\_V^{(1)}(\Theta - x\_0^\*)} \quad (45)$$

and after inserting λ<sup>∗</sup> as given by Equation (30),

$$\mu\_L = \sqrt{\frac{x\_0^\* \left(2P\_V(\Theta - x\_0^\*) + x\_0^\* P\_V^{(1)} \left(\Theta - x\_0^\*\right)\right) - 2p\_f \left(x\_0^\*\right)}{P\_V^{(1)} \left(\Theta - x\_0^\*\right)}}. \tag{46}$$

According to Equation (46), the average input μ<sub>*L*</sub> to the neurons due to a propagating synchronous pulse is independent of the layer size ω and the coupling strength ε. For setups with moderate external inputs (i.e., inputs of the preceding layer influence the neurons' ground state only weakly; see also section 3.1.3) the distribution of membrane potentials *P<sub>V</sub>*(·) (cf. Equation 8), the firing probability of single neurons *p<sub>f</sub>*(·) (cf. Equation 14), as well as the expansion point (inflection point of *P<sub>V</sub>*(·); cf. Equation 37)

$$x\_0^\* = \Theta - I\_0 + \epsilon^{\text{ext}} \sqrt{\tau^{\text{m}} \nu^{\text{ext}}} \tag{47}$$

are fully determined by the external inputs (*I*<sub>0</sub>, ν<sup>ext</sup> and ε<sup>ext</sup>). **Figures 5A,B** illustrate the dependence of μ<sub>*L*</sub> on the layout of the external network and the FFN: as expected from our analytical considerations, the dependence on the layer size and coupling strength is weak when *I*<sub>0</sub> and ε<sup>ext</sup> are kept fixed. With increasing mean of the external input (*I*<sub>0</sub>) the distribution of membrane potentials *P<sub>V</sub>*(*V*) is shifted toward the threshold Θ, such that it is more likely to find the membrane potential of a neuron near the threshold, and the critical connectivity decreases (cf. also **Figures 4A,B**). Naturally this implies a decreasing average input μ<sub>*L*</sub> at *p*<sup>∗</sup><sub>*L*</sub>, which is shown in **Figure 5A** for different external couplings ε<sup>ext</sup> and parameters of the FFN. Increasing the external coupling strength ε<sup>ext</sup> (and with it the variance of the external input) causes a broadening of the distribution of membrane potentials; the membrane potentials of some neurons are shifted toward the threshold and those of other neurons are shifted away from it. If the fraction of neurons that participate in the propagation of the synchronous pulse is large, this implies an increasing critical connectivity (**Figure 5B**; cf. also **Figure 4C**).

The spiking probability of a single neuron due to the mean input μ<sub>*L*</sub> equals the average fraction *p*<sup>frac</sup> of neurons of one layer that participate in a propagating synchronous pulse,

$$p^{\text{frac}} = \frac{\gamma\_L}{\omega} = p\_f \left( \mu\_L \right). \tag{48}$$

Interestingly, in the considered regime of low spiking rates and sufficiently broad distributions of membrane potentials, where the approximations given in section 3.1.1 are applicable, *p*<sup>frac</sup> depends on the setup of the external inputs only via the quotient α = (Θ − μ)/σ (cf. Equation 10), or, equivalently, via the spontaneous firing rate ν of the neurons (cf. Equation 9). This can be shown by combining Equations (8), (37) and (46),

$$
\mu\_L = \sigma \left(\frac{e\pi}{2}\right)^{1/4} \sqrt{\frac{\left(\sqrt{2} + 2\alpha\right)\left(3 + \sqrt{2}\alpha\right)}{2\sqrt{e\pi}} - \text{Erf}\left(\alpha\right) - \text{Erf}\left(\frac{1}{\sqrt{2}}\right)} \tag{49}
$$

$$=:\sigma f\_{\mu}(\alpha)\tag{50}$$

**FIGURE 5 | Properties of propagating synchronous pulses at the transition from the no-propagation to the propagation regime.** Panels **(A,B)** show the mean input μ<sub>*L*</sub> that a layer receives due to a propagating synchronous pulse in the preceding layer. μ<sub>*L*</sub> measures the effective pulse size (the impact of a propagating synchronous pulse) and is mainly determined by the external inputs rather than by the setup of the FFN. In **(A)** the variance of the external input (measured by ε<sup>ext</sup>) is fixed and μ<sub>*L*</sub> is plotted vs. *I*<sub>0</sub>; in **(B)** the mean external input *I*<sub>0</sub> is fixed and μ<sub>*L*</sub> is plotted vs. ε<sup>ext</sup>. The markers indicate μ<sub>*L*</sub> for FFNs of different sizes [ω and ε are given by the legend in **(A)**] obtained by numerical simulations of propagating synchrony. The dashed lines show the approximation of μ<sub>*L*</sub> given by Equation (46) (which is independent of ω and ε); the solid lines indicate μ<sub>*L*</sub> = *p*<sup>∗</sup><sub>*L*</sub>*G*<sub>2</sub>ε; values of *p*<sup>∗</sup><sub>*L*</sub> and *G*<sub>2</sub> are found semi-analytically, by numerically identifying the bifurcation point of the analytically derived iterated map (Equation 13) for the different network setups (both analytical estimates are corrected for the influence of inputs from the preceding layer up to the first order). Panel **(C)** shows the fraction *p*<sup>frac</sup> of neurons in a layer that participate in the propagation of a synchronous signal vs. α [(Equation 10); main panel] and vs. the spontaneous firing rate ν (inset). Data from different network setups are plotted without distinction as black dots in the main panel and with distinction by different colors and symbols in the inset (see legend); simulations are repeated for different layouts of the external network (*I*<sub>0</sub> ∈ {1, 3,..., 11} mV; ε<sup>ext</sup> ∈ {0.1, 0.125,..., 1.0} mV). The solid lines indicate *p<sub>f</sub>*(μ<sub>*L*</sub>) = *f<sub>p</sub>*(α) as given by Equation (53). The layer size ω as well as the coupling strength ε influence *p*<sup>frac</sup> only weakly. *p*<sup>frac</sup> depends on the network setup mainly through α or, equivalently, through ν (cf. Equation 9): measurement values from different network setups largely collapse onto the graph of the function *p<sub>f</sub>*(μ<sub>*L*</sub>) = *f<sub>p</sub>*(α). For further explanations see text (section 3.1.4).

such that

$$p\_f(\mu\_L) = \frac{1}{2} \left[ \text{Erf} \left( \frac{\Theta - \mu}{\sigma} \right) + \text{Erf} \left( \frac{\mu\_L - \Theta + \mu}{\sigma} \right) \right] \tag{51}$$

$$=\frac{1}{2}\left[\text{Erf}\left(\alpha\right) + \text{Erf}\left(\frac{\mu\_L}{\sigma} - \alpha\right)\right] \tag{52}$$

$$=\frac{1}{2}\left[\text{Erf}\left(\alpha\right) + \text{Erf}\left(f\_{\mu}\left(\alpha\right) - \alpha\right)\right] =: f\_{p}\left(\alpha\right). \tag{53}$$

In **Figure 5C** we compare the above predictions with direct numerical simulations: For different layer sizes ω, coupling strengths ε and layouts of the external networks (i.e., different values of *I*<sub>0</sub> and ε<sup>ext</sup>), we detect whether propagation of a synchronous pulse is possible and, if so, we numerically determine the average fraction of participating neurons as well as the spontaneous firing frequency. We find that indeed the size of the synchronous pulse is determined essentially by the quotient α = (Θ − μ)/σ and that Equation (53) is a reasonable estimate of the average fraction of neurons spiking in each layer. With increasing α the fraction of participating neurons increases; it thus decreases with the spontaneous firing rate ν, see **Figure 5C**. For FFNs with low spontaneous spiking frequency, almost all neurons of a layer participate in the propagation of a synchronous pulse.
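The collapse onto *f<sub>p</sub>*(α) is easy to evaluate numerically; the following sketch implements *f*<sub>μ</sub>(α) and *f<sub>p</sub>*(α) from Equations (49), (50) and (53) (only standard error functions are required; the α values below are arbitrary illustrations):

```python
import math

def f_mu(alpha):
    """f_mu(alpha) of Equations (49, 50): mu_L = sigma * f_mu(alpha)."""
    e_pi = math.e * math.pi
    rad = ((math.sqrt(2) + 2 * alpha) * (3 + math.sqrt(2) * alpha)
           / (2 * math.sqrt(e_pi))
           - math.erf(alpha) - math.erf(1 / math.sqrt(2)))
    return (e_pi / 2) ** 0.25 * math.sqrt(rad)

def f_p(alpha):
    """Fraction of participating neurons, Equation (53)."""
    return 0.5 * (math.erf(alpha) + math.erf(f_mu(alpha) - alpha))

# Fraction of a layer recruited by the pulse, for a few illustrative alpha
for alpha in (1.5, 2.0, 2.5, 3.0):
    print(f"alpha = {alpha:.1f}: p_frac = {f_p(alpha):.3f}")
```

Consistent with the discussion above, *f<sub>p</sub>*(α) grows monotonically with α, i.e., the participating fraction increases as the spontaneous rate decreases.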

#### **3.2. FFNs WITH NON-LINEAR DENDRITES**

In this section, we investigate propagation of synchrony mediated by dendritic non-linearities. Although the mechanism underlying the propagation is generally related to that in linear networks, the discontinuities introduced by non-additive dendritic interactions prevent a similar analytical approach. In the first part of this section, we thus derive analytical estimates for the critical connectivity *p*<sup>∗</sup> *NL* in non-linearly coupled networks based on a self-consistency approach (see also Jahnke et al., 2012). In the second part, we study the transition from propagation of synchrony mediated by linear dendrites to propagation of synchrony mediated by non-additive dendritic interactions upon increasing the degree of non-linearity in the networks. In the last part, we evaluate the robustness of the analytical estimates with respect to the layout of the external network.

#### *3.2.1. Analytical derivation of critical connectivity*

Neurons with non-additive dendritic interactions process excitatory input through a non-linear dendritic modulation function σ<sub>*NL*</sub> (see section 2.1), i.e., synchronous inputs that exceed the dendritic threshold Θ<sub>*b*</sub> are amplified to an effective input of size κ (cf. Equation 4). Therefore the spiking probability of a single neuron due to a synchronous input of strength *x*, *p<sub>f</sub>*(σ<sub>*NL*</sub>(*x*)), is discontinuous, and an approach based on an expansion of *p<sub>f</sub>*(·) is inappropriate. To derive an analytical expression for the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> in FFNs incorporating dendritic non-linearities, we consider the (average) fraction *p*<sub>γ</sub> of neurons of one layer that receive an input *x* larger than the dendritic threshold, *x* ≥ Θ<sub>*b*</sub>, due to the propagating synchronous pulse. If a stable (stationary) propagation of synchrony is established, *p*<sub>γ</sub> is constant throughout the layers, which allows us to formulate a self-consistency equation. The basic derivations have been published recently (Jahnke et al., 2012) and are briefly reviewed in the following for the reader's convenience.

For a sufficiently small dendritic threshold Θ<sub>*b*</sub> and sufficiently large κ, the spiking probability of a neuron due to a sub-threshold input is small compared to the spiking probability due to a supra-threshold input. Therefore, we approximate the spiking probability of a single neuron in response to a synchronous input of strength *x* by

$$p\_f \left( \sigma\_{NL}(x) \right) = \begin{cases} p\_f \left( \kappa \right) & \text{if} \quad x \ge \Theta\_b \\ 0 & \text{otherwise} \end{cases},\tag{54}$$

i.e., we assume that somatic spikes due to the synchronous pulse are exclusively generated by dendritically enhanced inputs. We denote the fraction of neurons that receive a dendritic spike by *p*<sub>γ</sub>; this fraction may be considered constant throughout the different layers if stable propagation of synchrony is enabled. Then the number *k* of inputs a neuron receives from the preceding layer follows a binomial distribution, *k* ∼ *B*(ω, *p*<sub>γ</sub>*p<sub>f</sub>*(κ)*p*), where *p*<sub>γ</sub>*p<sub>f</sub>*(κ)*p* is the probability that (1) a neuron of the preceding layer receives a supra-threshold input (*p*<sub>γ</sub>), (2) a somatic spike is elicited by that input (*p<sub>f</sub>*(κ)), and (3) there is a connection from this spiking neuron to the considered neuron of the following layer (*p*). So we can formulate the self-consistency equation for *p*<sub>γ</sub>,

$$p\_{\gamma} = \sum\_{k=\lceil \Theta\_b/\epsilon \rceil}^{\omega} \binom{\omega}{k} \left( p\_{\gamma} p\_{f} \left( \kappa \right) p \right)^k \left( 1 - p\_{\gamma} p\_{f} \left( \kappa \right) p \right)^{\omega - k} . \tag{55}$$

To solve Equation (55) we approximate the binomial distribution by a Gaussian distribution with mean δ := ω*p*<sub>γ</sub>*pp<sub>f</sub>*(κ) and standard deviation σ<sub>δ</sub> := [δ(1 − *p*<sub>γ</sub>*pp<sub>f</sub>*(κ))]<sup>1/2</sup>, which yields

$$p\_{\gamma} = \frac{1}{2} \left[ 1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right) \right],\tag{56}$$

where we defined

$$n := \frac{\delta - \Theta\_b/\epsilon}{\sigma\_{\delta}}\tag{57}$$

$$=\frac{\omega p\_{\gamma} p p\_{f} \left(\kappa\right) - \Theta\_{b}/\epsilon}{\sqrt{\omega p\_{\gamma} p p\_{f} \left(\kappa\right) \left(1 - p\_{\gamma} p p\_{f} \left(\kappa\right)\right)}}\tag{58}$$

as the difference between the average number of inputs (δ) and the number of inputs needed to reach the dendritic threshold (Θ<sub>*b*</sub>/ε), normalized by the standard deviation of the number of inputs (σ<sub>δ</sub>). Solving definition (Equation 58) for *p* and replacing *p*<sub>γ</sub> by Equation (56) yields

$$p\_{\rm NL} = \frac{n^2 \epsilon + 2\Theta\_b + n\sqrt{n^2 \epsilon^2 + 4\Theta\_b \left(\epsilon - \frac{\Theta\_b}{\omega}\right)}}{p\_f(\kappa)\epsilon(n^2 + \omega)\left(1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)\right)},\qquad(59)$$

which is the connectivity *p*<sub>NL</sub> at which stable propagation of synchrony with some given *n* (or, equivalently, some given *p*<sub>γ</sub>; cf. Equation 56) is established. We note that a propagation of synchrony mediated by dendritic spikes requires

$$
\epsilon\omega > \Theta\_b \tag{60}
$$

(otherwise even the input ωε caused by synchronized spiking of all neurons of a layer in a fully connected FFN (*p* = 1) is not sufficient to reach the dendritic threshold Θ<sub>*b*</sub>).
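The self-consistency condition (Equation 55) and its Gaussian approximation (Equations 56–58) can be checked against each other numerically. A minimal sketch with assumed parameters (the layer size ω enters as in Equations 55–58):

```python
import math

def p_gamma_binomial(omega, eps, theta_b, p, pf_kappa, iters=100):
    """Fixed-point iteration of the self-consistency condition, Equation (55):
    fraction of neurons receiving at least ceil(theta_b/eps) synchronous inputs."""
    kmin = math.ceil(theta_b / eps)
    pg = 1.0  # start from full participation and iterate to the fixed point
    for _ in range(iters):
        q = pg * pf_kappa * p  # prob. that a given presynaptic neuron contributes
        pg = sum(math.comb(omega, k) * q**k * (1.0 - q)**(omega - k)
                 for k in range(kmin, omega + 1))
    return pg

def p_gamma_gaussian(omega, eps, theta_b, p, pf_kappa, iters=100):
    """Same fixed point using the Gaussian approximation, Equations (56)-(58)."""
    pg = 1.0
    for _ in range(iters):
        q = pg * pf_kappa * p
        delta = omega * q                       # mean number of inputs
        sigma_d = math.sqrt(delta * (1.0 - q))  # its standard deviation
        n = (delta - theta_b / eps) / sigma_d
        pg = 0.5 * (1.0 + math.erf(n / math.sqrt(2.0)))
    return pg

# Assumed illustrative parameters: omega = 150 neurons per layer, eps = 0.2 mV,
# dendritic threshold theta_b = 4 mV, connectivity p = 0.25, pf(kappa) = 0.9.
b = p_gamma_binomial(150, 0.2, 4.0, 0.25, 0.9)
g = p_gamma_gaussian(150, 0.2, 4.0, 0.25, 0.9)
print(round(b, 3), round(g, 3))
```

For these (assumed) parameters the connectivity lies above the critical value, so the iteration settles at a non-trivial fixed point, and the binomial and Gaussian versions agree closely.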

For parameters fulfilling the inequality (Equation 60), *p*<sub>NL</sub>(*n*) has a global minimum (see Appendix), and the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub>, again defined as the smallest connectivity that allows for a stable propagation of synchrony, matches that global minimum: any connectivity *p*<sub>NL</sub> above the minimal connectivity *p*<sup>∗</sup><sub>*NL*</sub> has two preimages *n*<sub>1</sub> and *n*<sub>2</sub>, corresponding to the two non-trivial fixed points *G*<sub>1</sub> and *G*<sub>2</sub> of the iterated map for the average group size (cf. **Figure 1** and section 2.4); however, there exist smaller connectivities (with different *n*) for which a stationary propagation can also be established. At the global minimum *p*<sup>∗</sup><sub>*NL*</sub> both preimages collapse, *n*<sup>∗</sup> = *n*<sub>1</sub> = *n*<sub>2</sub>, and correspond to the fixed point *G* = *G*<sub>1</sub> = *G*<sub>2</sub> of the iterated map at the bifurcation point of the tangent bifurcation. Here the transition from the regime where no propagation of synchrony is possible to the regime where propagation of synchrony is enabled takes place. For *p*<sub>NL</sub> smaller than *p*<sup>∗</sup><sub>*NL*</sub> there are no preimages (i.e., a stationary propagation of synchrony mediated by non-additive dendritic interactions cannot be established); this scenario corresponds to the absence of the non-trivial fixed points of the iterated map for connectivities below the tangent bifurcation.
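The global minimum of *p*<sub>NL</sub>(*n*) can be located numerically by a simple scan over *n*. A minimal sketch of Equation (59) with assumed parameters:

```python
import math

def p_NL(n, omega, eps, theta_b, pf_kappa):
    """Connectivity admitting stationary propagation with a given n, Equation (59)."""
    num = (n**2 * eps + 2.0 * theta_b
           + n * math.sqrt(n**2 * eps**2 + 4.0 * theta_b * (eps - theta_b / omega)))
    den = pf_kappa * eps * (n**2 + omega) * (1.0 + math.erf(n / math.sqrt(2.0)))
    return num / den

# Assumed parameters: theta_b = 4 mV, eps = 0.2 mV (so eps*omega = 30 mV > theta_b,
# satisfying Equation 60), omega = 150, pf(kappa) = 0.95.
ns = [0.01 * i for i in range(1, 801)]  # scan n in (0, 8]
vals = [p_NL(n, 150, 0.2, 4.0, 0.95) for n in ns]
i_min = min(range(len(vals)), key=vals.__getitem__)
print("n* ~", ns[i_min], "  p*_NL ~", round(vals[i_min], 4))
```

The minimum over *n* is the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> for this (assumed) setup.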

In the following we obtain the minimum of *p*<sub>NL</sub> (i.e., the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub>) in the limit of large layer sizes ω and small coupling strengths ε. We first derive an approximation of Equation (59) (cf. Equation 62), determine the validity range of this approximation (cf. Equation 69) and finally obtain an estimate for the critical connectivity (cf. Equation 71). As before, we fix the maximal input ωε to each neuron to preserve the network state and expand Equation (59) in a power series around ε → 0 and ω → ∞. Considering the leading terms yields

$$p\_{\rm NL} \approx p\_{\rm NL, \, a} := \frac{2\Theta\_b}{p\_f(\kappa)\,\epsilon\omega} \frac{1 + n\sqrt{\frac{\epsilon}{\Theta\_b} - \frac{1}{\omega}}}{1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)}.\tag{61}$$

Further, a propagation mediated by dendritic spikes (as introduced above) requires that the layer size ω and the coupling strength ε are sufficiently large such that a sufficiently large fraction of neurons of each layer receives a total input larger than the dendritic threshold Θ<sub>*b*</sub>. In particular for diluted FFNs, this requirement translates to ωε ≫ Θ<sub>*b*</sub>, and Equation (61) simplifies further to

$$p\_{\rm NL, \, b} := \frac{2\Theta\_b}{p\_f \left(\kappa\right)\epsilon\omega} \frac{1 + n\sqrt{\frac{\epsilon}{\Theta\_b}}}{1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)}.\tag{62}$$

Whereas *p*<sub>NL</sub> always has a global minimum for ωε > Θ<sub>*b*</sub>, this does not hold for the approximation *p*<sub>NL, *b*</sub>; e.g. (cf. also **Figure 6C**),

$$\lim\_{n \to -\infty} \left( p\_{NL,b} \right) = -\infty. \tag{63}$$

However, we will now show that *p*<sub>NL, *b*</sub> has a (local) minimum if (and only if) ε ∈ (0, 2Θ<sub>*b*</sub>/π], which approximates the global minimum of *p*<sub>NL</sub> and therefore serves as an estimate for the critical connectivity. Setting the derivative to zero, *dp*<sub>NL, *b*</sub>(*n*)/*dn*|<sub>*n* = *n*<sup>∗</sup></sub> = 0, yields

$$\sqrt{\frac{\Theta\_b}{\epsilon}} = \sqrt{\frac{\pi}{2}} \exp\left(\frac{n^{\*2}}{2}\right) \left(1 + \text{Erf}\left(\frac{n^\*}{\sqrt{2}}\right)\right) - n^\* =: f\left(n^\*\right), \tag{64}$$

**FIGURE 6 | Determining the critical connectivity in FFNs with non-additive dendritic interactions. (A)** For a given setup, i.e., for a given dendritic threshold Θ<sub>*b*</sub> and coupling strength ε < 2Θ<sub>*b*</sub>/π, the corresponding *n*<sup>∗</sup> (or equivalently *p*<sub>γ</sub>; cf. Equation 56) is found by Equation (64). The solid line indicates *n*<sup>∗</sup> vs. ε (left vertical scale), the dashed line *p*<sub>γ</sub> vs. ε (right vertical scale) and the markers *n*<sup>∗</sup>(ε) for ε ∈ {0.075, 0.3, 2.0} mV (see legend). [Here, the dendritic threshold is Θ<sub>*b*</sub> = 4 mV, such that the estimate (Equation 64) is valid within the range ε ∈ (0, 2.55] mV; cf. Equation (69).] **(B)** Knowing *n*<sup>∗</sup> allows to evaluate β(Θ<sub>*b*</sub>/ε) ∈ [1/2, 1) according to Equation (70). Panel **(B)** shows β (cf. Equation 70) vs. ε (solid line, lower horizontal axis) and β vs. *n*<sup>∗</sup> (dashed line, upper horizontal axis), respectively. **(C)** Finally, the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> is obtained by Equation (71), which depends on β(Θ<sub>*b*</sub>/ε). Panel **(C)** shows the connectivity *p*<sub>NL</sub> [dashed; Equation (59)] and its approximation *p*<sub>NL, *b*</sub> [solid; Equation (62)] vs. *n*; for ε ∈ (0, ε<sup>max</sup>], *p*<sub>NL, *b*</sub> has a local minimum which agrees with the global minimum of *p*<sub>NL</sub>. The markers indicate the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> obtained by the procedure described in **(A)** and **(B)**. For further explanations see text (section 3.2.1).

and *n*<sup>∗</sup> specifies the extremum of *p*<sub>NL, *b*</sub>(*n*). The second derivative of *p*<sub>NL, *b*</sub>(*n*) at the extremum *n*<sup>∗</sup> given by Equation (64) satisfies

$$\left. \frac{d^2 p\_{\mathrm{NL},b}}{d n^2} \right|\_{n=n^\*} = \frac{2n^\* \sqrt{\frac{\Theta\_b}{\epsilon}}}{p\_f(\kappa) \, \omega \left( 1 + \mathrm{Erf} \left[ \frac{n^\*}{\sqrt{2}} \right] \right)} > 0 \qquad (65)$$

if *n*<sup>∗</sup> > 0, such that the extremum actually is a minimum. Taken together, for a given setup, i.e., for a given dendritic threshold Θ<sub>*b*</sub> and coupling strength ε, the transcendental Equation (64) defines *n*<sup>∗</sup>, which extremizes *p*<sub>NL, *b*</sub>(*n*); if additionally *n*<sup>∗</sup> > 0, the extremum *p*<sub>NL, *b*</sub>(*n*<sup>∗</sup>) is a minimum.

Differentiating the right hand side of Equation (64),

$$\frac{df(n^\*)}{dn^\*} = n^\* \cdot e^{\frac{n^{\*2}}{2}} \sqrt{\frac{\pi}{2}} \left( 1 + \text{Erf} \left[ \frac{n^\*}{\sqrt{2}} \right] \right) \tag{66}$$

$$\frac{d^2 f(n^\*)}{d n^{\*2}} = n^\* + \left(1 + n^{\*2}\right) e^{\frac{n^{\*2}}{2}} \sqrt{\frac{\pi}{2}} \left(1 + \text{Erf}\left[\frac{n^\*}{\sqrt{2}}\right]\right), \quad \text{(67)}$$

shows that *f* (*n*∗) (as defined in Equation 64) is (1) minimal for *n*<sup>∗</sup> = 0 and (2) monotonically increasing for *n*<sup>∗</sup> > 0; according to Equation (64) the minimum *n*<sup>∗</sup> = 0 corresponds to

$$
\epsilon^{\text{max}} := \frac{\Theta\_b}{\left[f(0)\right]^2} = \frac{2\Theta\_b}{\pi} \approx 0.64\Theta\_b. \tag{68}
$$

The left hand side of Equation (64), i.e., √(Θ<sub>*b*</sub>/ε), decreases monotonically with ε from infinity to zero. Thus Equation (64) has a solution for any

$$\epsilon \in \left(0, \epsilon^{\max}\right] = \left(0, \frac{2\Theta\_b}{\pi}\right] \tag{69}$$

and *p*<sup>∗</sup><sub>*NL*</sub> := *p*<sub>NL, *b*</sub>(*n*<sup>∗</sup>) is the (local) minimum of Equation (62); it provides an estimate for the critical connectivity, the (global) minimum of Equation (59).

For better readability we define the function β(·),

$$\beta\left(\frac{\Theta\_b}{\epsilon}\right) := \frac{1}{2}\left(1 + \text{Erf}\left[\frac{n^\*}{\sqrt{2}}\right]\right) - n^\* \frac{e^{-\frac{n^{\*2}}{2}}}{\sqrt{2\pi}},\tag{70}$$

where *n*<sup>∗</sup> = *n*<sup>∗</sup>(Θ<sub>*b*</sub>/ε) as given by Equation (64). We note that β(Θ<sub>*b*</sub>/ε) can also be considered as a function of *n*<sup>∗</sup>. By combining Equations (62), (64), and (70) we obtain the critical connectivity

$$p\_{\rm NL}^{\*} = \frac{\Theta\_{b}}{p\_{f}(\kappa) \epsilon \omega} \cdot \frac{1}{\beta \left(\frac{\Theta\_{b}}{\epsilon}\right)}.\tag{71}$$

The function β(·) itself decreases monotonically with ε in the validity range ε ∈ (0, ε<sup>max</sup>] of the above approximation: within this interval *n*<sup>∗</sup> > 0 and *df*(*n*<sup>∗</sup>)/*dn*<sup>∗</sup> > 0, and thus the derivative

$$\frac{d\beta}{d\epsilon} = \frac{d\beta}{dn^\*} \cdot \frac{dn^\*}{d\sqrt{\Theta\_b/\epsilon}} \cdot \frac{d\sqrt{\Theta\_b/\epsilon}}{d\epsilon} \tag{72}$$

$$=-\frac{e^{-\frac{n^{\*2}}{2}}n^{\*2}}{\sqrt{2\pi}}\cdot\left(\frac{df\left(n^{\*}\right)}{dn^{\*}}\right)^{-1}\cdot\sqrt{\frac{\Theta\_{b}}{4\epsilon^{3}}}\tag{73}$$

$$<0.\tag{74}$$

Consequently β assumes its minimum

$$
\beta^{\min} = \beta \left( n^\* = 0 \right) = \frac{1}{2} \tag{75}
$$

for ε = ε<sup>max</sup> = 2Θ<sub>*b*</sub>/π and increases monotonically with decreasing ε toward its asymptotic value

$$\beta^{\text{max}} = \lim\_{n^\* \to \infty} \left[ \frac{1}{2} \left( 1 + \text{Erf} \left[ \frac{n^\*}{\sqrt{2}} \right] \right) - n^\* \frac{e^{-\frac{n^{\*2}}{2}}}{\sqrt{2\pi}} \right] = 1. \tag{76}$$

Thus the critical connectivity is bounded by

$$p^0 := \frac{\Theta\_b}{p\_f(\kappa) \epsilon \omega} \le p\_{NL}^\* \le 2 \cdot \frac{\Theta\_b}{p\_f(\kappa) \epsilon \omega} = 2 \cdot p^0 \tag{77}$$

and converges to the lower bound *p*<sup>0</sup> for small ε and to its upper bound 2*p*<sup>0</sup> for large ε.

In **Figure 6** we visualize the determination of the critical connectivity via Equations (64), (70), and (71). The critical connectivity obtained with the approach presented above agrees well with simulation data (cf. **Figure 7**).
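The three-step procedure of Figure 6 (solve the transcendental Equation 64 for *n*<sup>∗</sup>, evaluate β via Equation 70, obtain *p*<sup>∗</sup><sub>*NL*</sub> via Equation 71) can be sketched numerically; parameter values below are assumptions in the spirit of Figure 6:

```python
import math

def f(n):
    """Right-hand side of Equation (64)."""
    return (math.sqrt(math.pi / 2.0) * math.exp(n**2 / 2.0)
            * (1.0 + math.erf(n / math.sqrt(2.0))) - n)

def n_star(theta_b, eps):
    """Solve sqrt(theta_b/eps) = f(n*) (Equation 64) by bisection;
    valid for eps in (0, 2*theta_b/pi], where f is increasing for n >= 0."""
    target = math.sqrt(theta_b / eps)
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta(theta_b, eps):
    """Equation (70), evaluated at n* from Equation (64)."""
    n = n_star(theta_b, eps)
    return (0.5 * (1.0 + math.erf(n / math.sqrt(2.0)))
            - n * math.exp(-n**2 / 2.0) / math.sqrt(2.0 * math.pi))

def p_star_NL(theta_b, eps, omega, pf_kappa):
    """Critical connectivity, Equation (71)."""
    return theta_b / (pf_kappa * eps * omega * beta(theta_b, eps))

# Assumed parameters: theta_b = 4 mV, omega = 150, pf(kappa) = 0.95;
# eps values as in the Figure 6 legend.
for eps in (0.075, 0.3, 2.0):
    print(eps, round(p_star_NL(4.0, eps, 150, 0.95), 4))
```

The computed β values stay within the bounds [1/2, 1), so each *p*<sup>∗</sup><sub>*NL*</sub> lies between *p*<sup>0</sup> and 2*p*<sup>0</sup> (Equation 77).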

#### *3.2.2. Transition from linear to non-linear propagation*

In the previous section we derived analytical estimates for the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> in FFNs with non-additive dendritic interactions; *p*<sup>∗</sup><sub>*NL*</sub> is determined by (1) the setup of the FFN (i.e., the layer size ω and coupling strength ε; cf. **Figure 7**), (2) the parameters of the non-linear modulation function (i.e., the dendritic threshold Θ<sub>*b*</sub> and enhancement level κ) and (3) the layout of the external network (i.e., the mean external input *I*<sub>0</sub> and its variance, which is determined by ε<sub>ext</sub>). In this section, we discuss the influence of the parameters of the non-linear modulation function and study the transition from a regime where propagation of synchrony is mediated by dendritically enhanced inputs to a regime where the majority of inputs is processed linearly.

In general, with increasing threshold Θ<sub>*b*</sub> more and more inputs are needed to reach this threshold, and consequently the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> increases. If Θ<sub>*b*</sub> exceeds μ<sub>*L*</sub>, the average input to the neurons when a synchronous pulse propagates in linearly coupled FFNs (cf. Equation 45 and **Figure 5**), propagation mediated by linearly processed spikes is enabled at lower connectivities than propagation mediated by dendritic non-linearities. In this regime the linearly summed inputs (for *p* = *p*<sup>∗</sup><sub>*L*</sub>) are sufficient to maintain propagation of synchrony, but are not sufficient to cross the dendritic threshold. Increasing Θ<sub>*b*</sub> even further has no

influence on the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub>; here a propagation of synchrony is possible for *p* ≥ *p*<sup>∗</sup><sub>*L*</sub>, as discussed in section 3.1.

We illustrate this transition from non-linear to linear propagation in **Figure 8A**: we start with a large threshold Θ<sub>*b*</sub> = μ<sub>*L*</sub>, such that propagation is enabled for *p* ≈ *p*<sup>∗</sup><sub>*L*</sub>, and also set κ = μ<sub>*L*</sub>. In fact, the linear critical connectivity *p*<sup>∗</sup><sub>*L*</sub> slightly under-estimates the observed critical connectivity *p*<sup>∗</sup><sub>*NL*</sub>, as it does not account for the saturation of the non-linear modulation function, i.e., for the cutoff σ<sub>*NL*</sub>(*x*) = κ of inputs *x* ≥ κ. With decreasing Θ<sub>*b*</sub> the critical connectivity is substantially reduced and well approximated by Equation (71). Propagation of synchrony is now mainly mediated by dendritically enhanced inputs, as described in section 3.2.1. The inset illustrates the impact of decreasing the dendritic threshold Θ<sub>*b*</sub> on the iterated map. Initially, for Θ<sub>*b*</sub> = μ<sub>*L*</sub> = κ, the iterated maps for linearly and non-linearly coupled FFNs are similar; with decreasing Θ<sub>*b*</sub> the jump-like rise in the iterated map is shifted to lower group sizes and consequently the bifurcation point is shifted to lower connectivities.

The non-linear modulation function σ<sub>*NL*</sub>(·) (cf. Equation 4) saturates for strong inputs; thus the enhancement level κ defines the maximal (effective) input to a neuron, and *p<sub>f</sub>*(κ) is an upper bound for the spiking probability of any neuron in response to incoming inputs. This implies that, in contrast to linearly coupled FFNs, the average size of a propagating synchronous pulse, γ<sub>*NL*</sub>, given by the product of the probability of a neuron receiving sufficiently strong input to reach the dendritic threshold (*p*<sub>γ</sub>; cf. Equation 56), the spiking probability due to that input (*p<sub>f</sub>*(κ)) and the layer size ω, is bounded from above by

$$
\gamma\_{\rm NL} = p\_{\gamma} p\_{f} \left( \kappa \right) \omega \leq \omega p\_{f} \left( \kappa \right) =: \gamma^{\rm max}.\tag{78}
$$

This bound decreases with decreasing κ, as illustrated by **Figure 8B** (inset), where we compare the iterated maps for different values of κ. *p<sub>f</sub>*(κ) also influences the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> (cf. Equation 71): for small κ the spiking probability *p<sub>f</sub>*(κ) is low and thus *p*<sup>∗</sup><sub>*NL*</sub> is large (it may even exceed *p*<sup>∗</sup><sub>*L*</sub>). With increasing κ, *p<sub>f</sub>*(κ) increases as well, and consequently the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> decreases; for very large κ the spiking probability *p<sub>f</sub>*(κ) approaches 1 (cf. Equation 14) and *p*<sup>∗</sup><sub>*NL*</sub> saturates (cf. **Figure 8B**).

In **Figure 8C** we show the critical connectivity for an additive enhancement by a constant, i.e., inputs exceeding the dendritic threshold Θ<sub>*b*</sub> are increased by the constant value κ − Θ<sub>*b*</sub>. For small κ the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> is relatively large and may exceed *p*<sup>∗</sup><sub>*L*</sub> due to the low saturation level of the non-linear modulation function σ<sub>*NL*</sub>(·) (cf. also **Figure 8B**). As mentioned above, with increasing κ, *p<sub>f</sub>*(κ) increases as well and the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> decreases. However, for large κ, and thus a large dendritic threshold Θ<sub>*b*</sub>, propagation of synchrony mediated by linearly processed spikes is possible at lower connectivities than propagation mediated by dendritic non-linearities. Consequently, *p*<sup>∗</sup><sub>*NL*</sub> converges toward *p*<sup>∗</sup><sub>*L*</sub> (cf. also **Figure 8A**).

#### *3.2.3. Influence of external network*

In section 3.2.1 we derived an estimate of the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> for FFNs with non-additive dendritic interactions. So far we discussed the influence of the setup of the FFN (layer size ω and coupling strength ε) as well as the parameters of the non-linear modulation function σ<sub>*NL*</sub> (dendritic threshold Θ<sub>*b*</sub> and enhancement level κ). In the current section, we focus on the remaining determining factor, the layout of the external network. How does the critical connectivity change with the mean external input *I*<sub>0</sub> and external coupling strength ε<sub>ext</sub>, and how well are these changes covered by our analytics?

For the derivation of *p*<sup>∗</sup><sub>*NL*</sub> we assumed that somatic spikes are elicited exclusively by dendritically enhanced inputs (cf. Equation 54); thus the critical connectivity depends on the layout of the external network only via *p<sub>f</sub>*(κ) (cf. also Equation 71), i.e., on the average spiking probability of a neuron receiving an input larger than the dendritic threshold, *x* ≥ Θ<sub>*b*</sub>. For sufficiently small *p<sub>f</sub>*(κ), *p*<sup>∗</sup><sub>*NL*</sub> > 1 and propagation of synchrony is not possible. With increasing *p<sub>f</sub>*(κ) the critical connectivity decreases, and for *p<sub>f</sub>*(κ) → 1 it converges to Θ<sub>*b*</sub>(εωβ[Θ<sub>*b*</sub>/ε])<sup>−1</sup>, independent of the external network.

In the regime of low spiking rates, changing the mean external input *I*<sup>0</sup> simply shifts the distribution of membrane potentials *PV*(*V*) (which is a Gaussian distribution centered at *I*0; cf. Equation 8). Thus, with increasing *I*0, *pf* (κ) increases and the critical connectivity *p*<sup>∗</sup> *NL* decreases.

In **Figure 9A** we show the critical connectivity for different ε<sub>ext</sub> [which determines the width of *P<sub>V</sub>*(*V*)] vs. the mean external input *I*<sub>0</sub>. For *I*<sub>0</sub> = Θ − κ (such that the sum of a dendritically enhanced input and the center of the distribution of membrane

**FIGURE 8 | Transition from linear to non-linear propagation.** The figure shows the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> vs. the parameters of the non-linear modulation function σ<sub>*NL*</sub> (cf. Equation 4) for different network setups [color code, see **(C)**]. The lines are the theoretical predictions for *p*<sup>∗</sup><sub>*NL*</sub> [solid, Equation (71)] and *p*<sup>∗</sup><sub>*L*</sub> [dashed, Equation (32)]. The markers indicate the minimal connectivity for which a synchronous pulse propagates from the first to the last layer in an FFN (*I*<sub>0</sub> = 5 mV, ν<sub>ext</sub> = 3 kHz, ε<sub>ext</sub> = 0.5 mV) with *m* = 20 layers in at least 50% of *n* = 30 trials. The insets illustrate the effect of changing Θ<sub>*b*</sub> and κ on the iterated map (cf. Equation 13), where the connectivity is kept constant. **(A)** Critical connectivity vs. dendritic threshold Θ<sub>*b*</sub> for constant enhancement level κ = μ<sub>*L*</sub> ≈ 13.7 mV (cf. Equation 50). If the dendritic threshold Θ<sub>*b*</sub> is sufficiently small such that *p<sub>f</sub>*(Θ<sub>*b*</sub>) ≪ *p<sub>f</sub>*(κ) (cf. Equation 54), the propagation of synchrony is mainly mediated by non-linearly enhanced inputs and the critical connectivity can be estimated by

**FIGURE 9 | Dependence of the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> on the layout of the external network. (A,B)** The lines indicate the theoretical prediction for *p*<sup>∗</sup><sub>*NL*</sub> given by Equation (71) and agree well with the data from direct numerical simulations (markers; FFN with ω = 150, ε = 0.2 mV, Θ<sub>*b*</sub> = 4 mV, κ = 11 mV, *m* = 20). Panel **(A)** shows the critical connectivity vs. the mean external input *I*<sub>0</sub> for fixed ε<sub>ext</sub>, and panel **(B)** shows the critical connectivity vs. ε<sub>ext</sub> for fixed mean external input *I*<sub>0</sub>. The gray line indicates the minimal critical connectivity obtained for *p<sub>f</sub>*(κ) = 1. With increasing mean (external) input *I*<sub>0</sub> the distribution of membrane potentials *P<sub>V</sub>*(*V*) is shifted toward the somatic threshold Θ; thus the spiking probability *p<sub>f</sub>*(κ) upon the reception of a non-linearly enhanced input increases and the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> decreases. For *I*<sub>0</sub> = Θ − κ, *p<sub>f</sub>*(κ) ≈ 0.5 (cf. Equation 80) and *p*<sup>∗</sup><sub>*NL*</sub> is largely independent of the layout of the external network [blue solid line in **(B)**; cf. also **(A)**, where all curves coincide]. For further explanations see text (section 3.2.3).

potentials equals the somatic threshold Θ), *p<sub>f</sub>*(κ) simplifies to

$$p\_{f}(\kappa) = \frac{1}{2} \left( \text{Erf} \left[ \frac{\Theta - I\_0}{\sigma} \right] + \text{Erf} \left[ \frac{\kappa - \Theta + I\_0}{\sigma} \right] \right) \tag{79}$$

Equation (71). For large Θ<sub>*b*</sub> the probability that an input from the preceding layer exceeds the dendritic threshold is very low, propagation of synchrony is mainly mediated by linearly processed inputs, and the critical connectivity is given by Equation (32). Between these scenarios (for moderate Θ<sub>*b*</sub>) there is a "transition regime," where linear and non-linear propagation mix [similarly in **(C)**]. **(B)** Critical connectivity vs. enhancement level κ for constant threshold Θ<sub>*b*</sub> = 4 mV. For small enhancement levels κ the (maximal) spiking probability of a single neuron, *p<sub>f</sub>*(κ), is small and thus the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> is large. With increasing κ, *p<sub>f</sub>*(κ) increases and thus *p*<sup>∗</sup><sub>*NL*</sub> decreases; for large κ, *p<sub>f</sub>*(κ) → 1 (a neuron will almost surely spike upon the receipt of a non-linearly enhanced pre-synaptic input) and the critical connectivity saturates. **(C)** Critical connectivity vs. enhancement level κ for an additive enhancement by a constant κ − Θ<sub>*b*</sub> = 4 mV. For further explanations see text (section 3.2.2).

$$=\frac{1}{2}\text{Erf}\left(\frac{\Theta - I\_0}{\sigma}\right) \tag{80}$$

and thus in the regime of low spiking rates, i.e., (Θ − *I*<sub>0</sub>)/σ ≫ 1, *p<sub>f</sub>*(κ) ≈ 0.5 independent of the width of the distribution of membrane potentials. Consequently, all curves for different ε<sub>ext</sub> coincide at this point. For *I*<sub>0</sub> > Θ − κ the majority of neurons (>50%) would spike upon receipt of a dendritically enhanced input. Thus *p<sub>f</sub>*(κ) increases, and therewith the critical connectivity decreases, upon decreasing ε<sub>ext</sub>. In the limit ε<sub>ext</sub> → 0, *P<sub>V</sub>*(*V*) converges toward a δ-distribution centered at *I*<sub>0</sub> and *p<sub>f</sub>* becomes a step function,

$$p\_f(\kappa) = \begin{cases} 0 & \kappa < \Theta - I\_0 \\ 1 & \kappa \ge \Theta - I\_0 \end{cases} \tag{81}$$

such that the critical connectivity is either constant and minimal for *I*<sub>0</sub> ≥ Θ − κ, or it diverges (no propagation possible) for *I*<sub>0</sub> < Θ − κ (cf. **Figure 9A**; magenta curve).
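The behavior of Equations (79)–(81) is easy to reproduce numerically. A minimal sketch, with an assumed somatic threshold Θ and the enhancement level κ from Figure 9:

```python
import math

THETA = 15.0   # somatic threshold (assumed value, mV)
KAPPA = 11.0   # enhancement level (mV; cf. Figure 9)

def pf_kappa(I0, sigma):
    """Spiking probability upon a dendritically enhanced input, Equation (79)."""
    return 0.5 * (math.erf((THETA - I0) / sigma)
                  + math.erf((KAPPA - THETA + I0) / sigma))

# At I0 = THETA - KAPPA = 4 mV the second error function vanishes and
# pf ~ 1/2 for any sufficiently small sigma (Equation 80); for sigma -> 0
# the profile approaches the step function of Equation (81).
for sigma in (2.0, 1.0, 0.25):
    print(sigma, [round(pf_kappa(I0, sigma), 3) for I0 in (2.0, 4.0, 6.0)])
```

The printed rows show the crossing point at *I*<sub>0</sub> = Θ − κ staying near 0.5 while the profile steepens as σ shrinks.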

In **Figure 9B** we illustrate the effect of changing ε<sub>ext</sub> on the critical connectivity for constant *I*<sub>0</sub>. As discussed above, for *I*<sub>0</sub> = Θ − κ, *p<sub>f</sub>*(κ) and thus *p*<sup>∗</sup><sub>*NL*</sub> are rather independent of ε<sub>ext</sub>, and for *I*<sub>0</sub> > Θ − κ the critical connectivity increases with ε<sub>ext</sub>. For *I*<sub>0</sub> < Θ − κ an increase of the width of the distribution of membrane potentials shifts the membrane potential of more and more neurons toward the relevant interval [Θ − κ, Θ], and thus *p<sub>f</sub>*(κ) increases and the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> decreases.

For the derivation of *p*<sup>∗</sup><sub>*NL*</sub> we have assumed that the ground state dynamics is essentially not influenced by the spontaneous activity of the FFN itself (i.e., μ = *I*<sub>0</sub> and σ = ε<sub>ext</sub>√(2τ<sub>m</sub>ν<sub>ext</sub>)). As discussed in section 3.1.3, we can correct the results for such influences. However, since in non-linearly coupled FFNs the impact of (non-linearly enhanced) synchronous activity is much stronger than the impact of spontaneous activity (which is irregular and not amplified by non-additive dendritic interactions), we find that the deviations between the corrected and uncorrected versions of *p*<sup>∗</sup><sub>*NL*</sub> are negligible.

Finally, we compare the critical connectivity for networks with and without non-additive dendritic interactions: The factor

$$c^{\text{rat}} := \frac{p\_L^\*}{p\_{NL}^\*} = p\_L^\* \cdot \frac{p\_f(\kappa)\,\epsilon\,\omega}{\Theta\_b}\,\beta\left(\frac{\Theta\_b}{\epsilon}\right) \tag{82}$$

measures how much the connectivity within the FFN can be reduced by introducing non-additive dendritic interactions. It is independent of the layer size ω and becomes maximal in the limit of small coupling strengths, as β(Θ<sub>*b*</sub>/ε) → β<sup>max</sup> = 1 for ε → 0 (cf. Equation 76). It increases with decreasing Θ<sub>*b*</sub> and increasing κ (see discussion in section 3.2.2). In **Figure 10** we show the influence of the external network. As discussed above, for small *I*<sub>0</sub>, propagation of synchrony is not possible (the non-linearly enhanced input is insufficient to elicit sufficiently many spikes in the layers of the FFN; white areas in **Figure 10**). With increasing *I*<sub>0</sub>, *p*<sup>∗</sup><sub>*NL*</sub> decreases and *c*<sup>rat</sup> increases.

#### **3.3. GENERALIZATIONS**

In the final section we discuss generalizations of the methods and results we derived. Compared to biological neurons, our models contain simplifications which enable the analytical treatment, but might be suspected to influence the final results. These simplifications are the homogeneous delay distribution, the simplified initiation and impact of dendritic spikes, the limit of short synaptic currents and the sub-threshold leaky integrate-and-fire

dynamics. Here, we verify that our results generalize to biologically more detailed neurons without these simplifications. In particular, we show that the estimates for the critical connectivity hold. Further, we consider a qualitatively different dendritic interaction function which assumes that the saturation is incomplete, i.e., beyond a region of saturation the impact of larger inputs increases. We show that the tools developed in the article are still applicable and reveal a new phenomenon, the coexistence of linear and non-linear propagation of synchrony.

In the first part (section 3.3.1), we discuss the influence of inhomogeneous delay distribution and finite dendritic integration windows. In the second part (section 3.3.2), we consider the non-linear modulation function with incomplete saturation. Finally, we consider biologically more detailed neuron models (section 3.3.3).

#### *3.3.1. Heterogeneous delays*

So far we considered FFNs with a homogeneous delay distribution and dendritic modulation functions with an integration window of zero length, i.e., only exactly synchronized inputs could be non-linearly amplified. Are these assumptions crucial for the obtained results? How does the critical connectivity change in the presence of heterogeneous delay distributions?

To answer this question, we consider synaptic delays τ*kl* (specifying the synaptic delay between neuron *l* and *k*) uniformly drawn from

$$\tau\_{kl} \in \left[\tau - \frac{\Delta T}{2}, \tau + \frac{\Delta T}{2}\right],\tag{83}$$

where τ is the mean delay. A direct consequence of a heterogeneous delay distribution is that the spikes of the propagating synchronous signal are no longer simultaneous (i.e., exactly synchronized). To describe the system accurately one has to consider, in addition to the size (*g<sub>i</sub>*), also the temporal jitter (*s<sub>i</sub>*) of the synchronous pulse in the *i*th layer and investigate the two-dimensional iterated map for (*g<sub>i</sub>*, *s<sub>i</sub>*) (e.g., Diesmann et al., 1999; Gewaltig et al., 2001; Goedeke and Diesmann, 2008). However, even if the synchronous pulse is blurred out to a pulse packet of finite width, stable propagation can still be obtained for sufficiently large connectivity (see e.g., Gewaltig et al., 2001).

For linearly coupled FFNs, with increasing width of the delay distribution, Δ*T*, the propagating pulse becomes broader and thus the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> increases (cf. **Figures 11A,B**; squares). However, the scaling with layer size (cf. **Figure 11A**) and coupling strength ε (data not shown) is the same.

Under the assumption that the width of the pulse packet stays bounded, one can derive a lower bound for the critical connectivity. We assume that a pulse in layer *i* is perfectly synchronized and calculate the effective peak of the depolarization in the (*i* + 1)th layer. Replacing the coupling strength by the effective depolarization (derived below, cf. Equation 89) in the estimate of the critical connectivity (cf. Equation 32) yields an estimate of the critical connectivity for systems with heterogeneous delays [Equation (90); shown in **Figure 11**]. Consider a perfectly synchronized pulse in layer *i*. Due to inhomogeneities in the delays, the inputs arriving at the (*i* + 1)th layer are distributed uniformly in an interval of size Δ*T* (Equation 83). We assume that all inputs arriving at a neuron of layer *i* + 1 are equidistantly distributed over [−Δ*T*/2, Δ*T*/2], i.e., the arrival time of the *l*th of a total number of *k* inputs is

$$t\_l^{\text{arr}} = \tau - \frac{\Delta T}{2} + \frac{\Delta T}{k - 1} \cdot (l - 1) \,. \tag{84}$$

We consider the subthreshold dynamics only. Each single input depolarizes the neuron by an amount ε, and afterwards the membrane potential *V*(*t*) decays exponentially toward its asymptotic value (*I*<sub>0</sub>) with the membrane time constant τ<sup>m</sup> (cf. Equations 1, 2) until the next input arrives after a time interval Δ*T*/(*k* − 1) (cf. Equation 84). Thus the total (effective) depolarization caused by the sum of these *k* inputs at the end of the considered time interval, at τ + Δ*T*/2, is

$$
\Delta \epsilon\_k \, = \sum\_{l=1}^{k} \epsilon \exp \left( -\frac{1}{\tau^{\rm m}} \frac{\Delta T}{k-1} \, (l-1) \right) \tag{85}
$$

$$= \epsilon \, \frac{\exp\left(-\frac{\Delta T}{\tau^{\text{m}}} \frac{k}{k-1}\right) - 1}{\exp\left(-\frac{\Delta T}{\tau^{\text{m}}} \frac{1}{k-1}\right) - 1}. \tag{86}$$

We consider the effective depolarization per input, ε′, in the limit of a large number of inputs *k* (*k* → ∞),

$$
\epsilon' = \lim\_{k \to \infty} \left( \frac{\Delta \epsilon\_k}{k} \right) \tag{87}
$$

$$=\frac{\tau^{\mathrm{m}}}{\Delta T}\left(1-\exp\left[-\frac{\Delta T}{\tau^{\mathrm{m}}}\right]\right)\epsilon\tag{88}$$

$$=: C \left( \Delta T \right) \epsilon. \tag{89}$$

Thus the correction factor *C*(Δ*T*) ≤ 1 defined in Equation (89) relates the coupling strength to the effective coupling strength in the presence of inhomogeneous delays. The critical connectivity is then given by (cf. Equation 32)

$$p\_L^\* = \frac{1}{C(\Delta T)} \cdot \frac{1}{\lambda^\* \epsilon \omega} \tag{90}$$

and this estimate agrees well with direct numerical simulations (cf. **Figure 11**).
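To make the correction factor concrete, the following sketch (with illustrative parameter values, not those of the article) compares the finite-*k* effective depolarization per input of Equation (85) with the limit *C*(Δ*T*)ε of Equations (88)–(89):

```python
import numpy as np

def correction_factor(dT, tau_m):
    """C(dT) from Equations (88)-(89): effective coupling reduction."""
    return tau_m / dT * (1.0 - np.exp(-dT / tau_m))

def delta_eps_k(k, eps, dT, tau_m):
    """Effective depolarization of k equidistant inputs, Equation (85)."""
    l = np.arange(1, k + 1)
    return np.sum(eps * np.exp(-(1.0 / tau_m) * (dT / (k - 1)) * (l - 1)))

tau_m, dT, eps = 20.0, 2.0, 0.2   # ms, ms, mV -- illustrative values only
k = 10_000
per_input = delta_eps_k(k, eps, dT, tau_m) / k
print(per_input, correction_factor(dT, tau_m) * eps)  # nearly identical
```

For Δ*T* → 0 the factor *C*(Δ*T*) approaches 1, so Equation (90) then reduces to the homogeneous-delay estimate, as expected.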

For FFNs with dendritic non-linearities and inhomogeneous delays τ<sub>*kl*</sub>, one has to consider a finite dendritic integration window Δ*t*. Instead of amplifying only simultaneously received spikes (cf. Equation 5), the sum of spikes within the time interval Δ*t* is considered. We denote the sum of inputs to a neuron within the time interval [*t* − Δ*t*, *t*] by

$$S\_k^{\Delta t}(t) = \sum\_{l} \sum\_{m} \epsilon \, \chi\_{[t-\Delta t,\, t]} \left(t\_{lm}^{f} + \tau\_{kl}\right),\tag{91}$$

where

$$\chi\_A(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in A \\ 0 & \text{if } \mathbf{x} \notin A \end{cases} \tag{92}$$

is the indicator function and *t*<sup>*f*</sup><sub>*lm*</sub> is the *m*th firing time of neuron *l* as before. If *S*<sup>Δ*t*</sup><sub>*k*</sub>(*t*) exceeds the dendritic threshold Θ<sub>*b*</sub> for some *t* = *t*<sub>0</sub>, neuron *k* is depolarized additionally (to the depolarization arising from linear spike summation) by

$$
\epsilon\_{k}^{\text{add}}\left(t\_0\right) = \kappa - S\_k^{\Delta t}\left(t\_0\right) \tag{93}
$$

such that the total (effective) depolarization caused by an input *x* ≥ Θ<sub>*b*</sub> equals κ, modeling the effect of a dendritic spike; cf. also section 3.3.3. After such an additional depolarization the dendrite becomes refractory for a time *t*<sup>ref,ds</sup> and does not transfer additional spikes within the interval [*t*<sub>0</sub>, *t*<sub>0</sub> + *t*<sup>ref,ds</sup>]. For Δ*t* = 0 we recover the non-linear modulation function σ<sub>*NL*</sub>(·) given by Equation (4). Due to the finite dendritic interaction window, a delay distribution with Δ*T* ≤ Δ*t* affects the critical connectivity only weakly (cf. **Figure 11B**). For Δ*T* > Δ*t*, some of the inputs received from the preceding layer upon a propagation of synchrony fall out of the dendritic interaction window Δ*t* and thus the critical connectivity increases. However, the scaling with layer size ω (cf. **Figure 11B**) and coupling strength (data not shown) is practically identical to the scenario Δ*T* = 0.
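The windowed summation of Equation (91) and the saturation of Equation (93) can be sketched as follows (a minimal illustration with hypothetical arrival times and threshold values; dendritic refractoriness is omitted for brevity):

```python
import numpy as np

def S_window(arrival_times, eps, t, dt):
    """Summed input within [t - dt, t], cf. Equation (91)."""
    a = np.asarray(arrival_times)
    return eps * np.count_nonzero((a > t - dt) & (a <= t))

def effective_depolarization(arrival_times, eps, dt, theta_b, kappa):
    """If the windowed sum ever crosses the dendritic threshold theta_b,
    the total depolarization jumps to kappa (cf. Equation 93);
    otherwise inputs sum linearly. Maxima occur at arrival times."""
    for t0 in sorted(arrival_times):
        if S_window(arrival_times, eps, t0, dt) >= theta_b:
            return kappa                    # dendritic spike: saturation
    return eps * len(arrival_times)         # purely linear summation

# hypothetical arrival times (ms) and parameters, for illustration only
times = [0.0, 0.1, 0.25, 0.3, 0.9]
print(effective_depolarization(times, eps=0.2, dt=0.5, theta_b=0.8, kappa=2.0))
```

With these numbers, four of the five inputs fall into one window of length Δ*t* = 0.5 ms, the threshold is reached, and the depolarization saturates at κ.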

Before we discuss propagation of synchrony in biologically more plausible neuron models in section 3.3.3, we consider a generalization of the non-linear modulation function in the following section.

#### *3.3.2. Coexistence of linear and non-linear propagation*

In this article, we have employed a non-linear modulation function σ<sub>*NL*</sub>(ε) that is linear for dendritic stimulation smaller than the dendritic threshold, ε < Θ<sub>*b*</sub>, and constant (i.e., saturates) for supra-threshold stimulation, ε ≥ Θ<sub>*b*</sub> (cf. Equation 4). Biologically, if the linear inputs are transmitted despite the dendritic sodium spike and are not shadowed by, e.g., an NMDA spike, they may lead to a second, later peak depolarization after the one generated by the sodium spike. Since our models replace depolarizations by jumps to the peak depolarization, we have to account for the later peak as soon as it exceeds the earlier one. In this part, we thus assume that if the synchronous input is so large that the depolarization it generates upon linear summation exceeds the depolarization κ generated by the dendritic spike, the former is considered as the effect of the input. In other words, we assume that the dendritic modulation function continues linearly beyond κ, i.e., we define

$$\sigma\_{NL}'(\epsilon) = \begin{cases} \epsilon & \text{for} \quad \epsilon < \Theta\_b \\ \kappa & \text{for} \quad \Theta\_b \le \epsilon \le \kappa \\ \epsilon & \text{for} \quad \epsilon > \kappa \end{cases} \tag{94}$$

(cf. inset of **Figure 12A**).
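Equation (94) translates directly into a small piecewise function (a sketch; the values of Θ<sub>*b*</sub> and κ below are arbitrary examples):

```python
def sigma_nl_prime(eps_total, theta_b, kappa):
    """Modulation function with incomplete saturation, cf. Equation (94)."""
    if eps_total < theta_b:
        return eps_total      # linear regime below the dendritic threshold
    if eps_total <= kappa:
        return kappa          # plateau fixed by the dendritic spike
    return eps_total          # linear summation takes over beyond kappa

# one value per branch, with theta_b = 1.0 and kappa = 3.0
values = [sigma_nl_prime(x, theta_b=1.0, kappa=3.0) for x in (0.5, 2.0, 4.0)]
print(values)  # [0.5, 3.0, 4.0]
```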

The iterated map, mapping the number of active neurons in layer *i* to the average number of active neurons in layer *i* + 1, may now have (depending on the system parameters) between one and five fixed points (cf. **Figure 12**). As before, *G*<sub>0</sub> = 0 is a trivial fixed point corresponding to the level of absent activity and the only fixed point of the iterated map for small connectivity *p*. With increasing connectivity *p*, two additional pairs of fixed points *G*<sub>1</sub> ≤ *G*<sub>2</sub> and *G*<sub>3</sub> ≤ *G*<sub>4</sub> appear via tangent bifurcations. The first pair of fixed points, *G*<sub>1</sub> and *G*<sub>2</sub>, corresponds to the propagation of synchrony mediated by non-additive dendritic interactions (as discussed in section 3.1); the second pair, *G*<sub>3</sub> and *G*<sub>4</sub>, corresponds to propagation of synchrony mediated by linearly processed inputs (as discussed in section 3.2). By further increasing the connectivity *p*, the fixed points *G*<sub>2</sub> and *G*<sub>3</sub> disappear via a tangent bifurcation (cf. **Figure 12A**). Within the region where five fixed points exist, both types of propagation of synchrony coexist (illustrated in **Figures 12B–D**): synchronized pulses of size *g*<sub>0</sub> < *G*<sub>1</sub> typically decay to zero after a small number of layers. Pulses with *G*<sub>1</sub> < *g*<sub>0</sub> < *G*<sub>3</sub> typically initiate propagation of synchrony with an average pulse size around *G*<sub>2</sub> (where the propagation is mediated by non-additive dendritic interactions), and synchronous pulses of size *g*<sub>0</sub> > *G*<sub>3</sub> typically initiate propagation of synchrony with average pulse sizes around *G*<sub>4</sub> (linear propagation).
For sufficiently large *p*, i.e., after the fixed points *G*<sub>2</sub> and *G*<sub>3</sub> have disappeared, a synchronized pulse of size *g*<sub>0</sub> ≥ *G*<sub>1</sub> will initiate propagation of synchrony with pulse sizes around *G*<sub>4</sub>; in this parameter region the non-additive dendritic interactions essentially increase the basin of attraction of *G*<sub>4</sub>.

Within the framework of our analytically tractable model, we neglect, e.g., the initiation time of a dendritic spike (in our model non-linear amplifications are instantaneous) and the different shapes of potential deflections caused by linearly and non-linearly processed inputs. Therefore, propagating synchronous signals mediated either by linear or non-linear dendrites differ only in their size. In biologically more detailed models (briefly discussed in section 3.3.3 below) both propagation types will be more distinct; e.g., the propagation frequency (speed) and the quality of synchrony of the propagating pulses differ (see also Jahnke et al., 2012).

#### *3.3.3. Biologically more detailed models*

The model we mainly consider in this article has the advantage of being analytically tractable. Here we ask whether it over-simplifies the considered systems. More precisely, we study whether the results derived above, in particular the analytical estimates for the critical connectivity, generalize to biologically more detailed models.

**FIGURE 12 | Coexistence of linear and non-linear propagation. (A)** Bifurcation diagram obtained from Equation (13) for an FFN (ω = 150, ε = 0.225 mV) with a non-linear modulation function σ′<sub>*NL*</sub> with incomplete saturation [cf. Equation (94) and inset]. Panel **(B)** shows the iterated maps (Equation 13) for *p* = 0.5 with the different non-linear modulation functions considered in this article (linear coupling: green, dashed; non-linear coupling σ<sub>*NL*</sub>: red, dashed; modified non-linear coupling σ′<sub>*NL*</sub>: blue). Panel **(C)** depicts the development of the size of the synchronous pulse along the layers of the FFN (single trials). The blue and yellow regions are the basins of attraction of *G*<sub>2</sub> and *G*<sub>4</sub>, respectively, derived from the data in panel **(B)**. Panel **(D)** shows the probability *p*<sub>conv</sub> of converging to the linear propagation regime (yellow area, blue line) and the non-linear propagation regime (blue area, red line) after *m* = 20 layers (*p*<sub>conv</sub> is obtained from *n* = 150 runs with different networks and initial conditions).

The main assumption underlying our analysis of linearly coupled networks is a very general one, namely that synchronous single inputs sum up linearly: we assumed that the spiking probability *p<sub>f</sub>*(·) of a neuron due to the reception of *x* synchronous inputs of size ε equals the spiking probability due to the reception of one single input of size *y* = *x*ε. Therefore, the results will also hold for more complex neuron models, as long as the effect of a synchronous input pulse is approximately the sum of the effects of single inputs. In particular, if the spiking probability due to an input of strength *x*, *p<sub>f</sub>*(*x*), changes sufficiently slowly with *x*, according to Equation (24) the critical connectivity scales like *p*<sup>∗</sup><sub>*L*</sub> ∝ (εω)<sup>−1</sup> for sufficiently large layer sizes and small coupling strengths. To fully compute the critical connectivity, the actual form of *p<sub>f</sub>*(·) has to be known. Our leaky integrate-and-fire neuron with infinitesimally short current pulses approximates the behavior of a wide class of neuron models for which an analytical derivation of *p<sub>f</sub>*(·) is impossible. Still, even for more detailed models, *p<sub>f</sub>*(·) is accessible for measurements in single-neuron (computer) experiments.

In **Figure 13** we verify our predictions exemplarily for two types of neuron models: we employ a model of conductance-based leaky integrate-and-fire-type neurons with exponential input conductances (CB-type; see Appendix) and a Hodgkin–Huxley-type neuron model with alpha-function shaped input currents (HH-type; see Appendix). The post-synaptic potential induced by single excitatory inputs is shown in panels (A) and (B), and the scaling of the critical connectivity *p*<sup>∗</sup><sub>*L*</sub> with εω in panel (C): the scaling of *p*<sup>∗</sup><sub>*L*</sub> is well described by *p*<sup>∗</sup><sub>*L*</sub> ∝ (εω)<sup>−1</sup>.

The main assumptions underlying our analysis of non-linearly coupled networks are (1) that the maximal spiking probability due to inputs which are subthreshold relative to the dendritic threshold, *p<sub>f</sub>*(Θ<sub>*b*</sub>), is significantly smaller than the spiking probability due to a suprathreshold input, *p<sub>f</sub>*(κ), and (2) that the temporal jitter of somatic spikes evoked by suprathreshold inputs is small, such that synchronized inputs stay synchronized. Both conditions have been found to be satisfied in biological neurons (e.g., Ariav et al., 2003). Therefore, Equation (71) specifying the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> also holds for more detailed neuron models if these models incorporate biologically plausible features of fast dendritic spikes. To obtain a quantitative prediction of *p*<sup>∗</sup><sub>*NL*</sub>, it is sufficient to estimate (a) the number of inputs needed to elicit a dendritic spike, Θ<sub>*b*</sub>/ε, (b) the layer size ω, and (c) the spiking probability due to the reception of a total input that is sufficiently strong to elicit a dendritic spike.

To investigate the scaling of the critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> in direct numerical simulations, we account for the effects of dendritic spikes in the CB-type and HH-type models: when the total excitatory input within the dendritic integration window exceeds the dendritic threshold level, a current pulse modeling the effect of a dendritic spike is initiated and causes an additional

**FIGURE 13 | Same scaling of propagating regime for networks of biologically more detailed neuron models. (A,B)** Time course of the membrane potential of single neurons receiving inputs that are sufficiently strong to elicit a dendritic spike, with (non-linear model) and without (linear model) dendritic spike generation mechanism, for **(A)** a conductance-based LIF-type neuron (henceforth: CB-type), and **(B)** a Hodgkin–Huxley-type neuron (HH-type). The insets show the observed peak of the induced postsynaptic potential (pEPSP) vs. the pEPSP expected from linear input summation (equivalent to the dendritic modulation function in the analytically tractable model). **(C)** Critical connectivity *p*<sup>∗</sup><sub>*L*</sub> vs. εω in linearly coupled networks. For each value of εω, we evaluated the critical connectivity for four different group sizes ω = 100, 300, 500, 700 and four different coupling strengths ε = 0.3, 0.6, 0.9, 1.2 nS (CB-type; squares; lower horizontal axis) and ε = 9, 18, 27, 36 pA (HH-type; crosses; upper horizontal axis), respectively. The lines are fitted functions of the form (λεω)<sup>−1</sup>. The analytical estimate given by Equation (24) holds in the limit of large layer sizes ω and small couplings ε; therefore we exclude data points from the fitting where a single input yields an EPSP larger than 0.6 mV (CB-type: ε ≥ 1.4 nS; HH-type: ε ≥ 46 pA; these points are marked in gray). **(D,E)** Probability distribution of somatic spike times after stimulation of the neuron by an input which is sufficiently strong to generate a dendritic spike (**D**: CB-type, **E**: HH-type). We show two exemplary configurations for the external inputs, which result in a total somatic spiking probability after dendritic spike generation of *p<sub>f</sub>* ≈ 0.97 (solid lines; set 1) and *p<sub>f</sub>* ≈ 0.67 (dashed lines; set 2). *p<sub>f</sub>* equals the saturation level of the corresponding cumulative distribution function (shown in the insets). **(F)** Critical connectivity *p*<sup>∗</sup><sub>*NL*</sub> vs. group size ω (lower horizontal scale) and coupling strength normalized by threshold, ε/Θ<sub>*b*</sub> (upper horizontal scale), respectively. The theoretical estimate of *p*<sup>∗</sup><sub>*NL*</sub> (cf. Equation 71) is a function of ω, Θ<sub>*b*</sub>/ε and *p<sub>f</sub>*; therefore the predictions agree for both models and the data from direct numerical simulations are consistent with the theoretical predictions. [All simulations of FFNs in this figure are obtained for an inhomogeneous delay distribution with Δ*T* = 1 ms (cf. Equation 83).]

depolarization of the soma of the post-synaptic neuron (see Appendix for details; cf. also section 3.3.1). In **Figure 13** we compare the results of direct numerical simulations with the estimate given by Equation (71). The post-synaptic potential induced by single excitatory inputs is shown in panels (A) and (B). Panels (D) and (E) show the spiking probability of a single neuron (in the ground state of the FFN), *p<sub>f</sub>*, due to an input exceeding the dendritic threshold level; as examples we present two different setups with *p<sub>f</sub>* = {0.67, 0.97}. Panel (F) shows the scaling of *p*<sup>∗</sup><sub>*NL*</sub> with layer size and coupling strength and the good agreement of the analytical estimate with direct numerical simulations.

## **4. SUMMARY AND CONCLUSIONS**

Propagation of synchrony in feed-forward sub-structures that are embedded in randomly connected recurrent networks has been a research topic for more than two decades [see, e.g., the review on this topic by Kumar et al. (2010)], and it is hypothesized that such propagation may explain the emergence of spatio-temporal spike patterns and information transmission.

In this article, we have analyzed diluted FFNs and investigated their capability to propagate synchrony. In addition to conventional additive (linear) input processing at single neurons, we considered non-additive dendritic interactions modeling the impact of fast dendritic spikes (Ariav et al., 2003; Gasparini et al., 2004; Polsky et al., 2004; Gasparini and Magee, 2006). We emulated the influence of the embedding recurrent network, which establishes the irregular ground state in the FFN, by random Poissonian inputs (van Vreeswijk and Sompolinsky, 1996, 1998; Brunel, 2000). This approach does not account for back-reactions of activity within the FFN on the embedding network. It is justified as long as the connectivity and connection strengths between the neurons of the FFN and the embedding network are sparse and weak compared to the feed-forward connectivity and connection strength. The back-reaction then influences the activity of the embedding network only weakly and a robust propagation of synchrony can be achieved (Vogels and Abbott, 2005; Kumar et al., 2008; Jahnke et al., 2012). Yet, if this condition is not met, synchronous activity within the FFN may spread out over the embedding network and potentially cause pathological activity ("synfire-explosions") (Mehring et al., 2003). For specifically structured networks also more complex interactions are possible, such as an enhancement of propagating synchrony (manuscript in preparation).

In the main part of the article, we studied the propagation of synchrony employing leaky integrate-and-fire neurons in the limit of temporally short synaptic inputs and homogeneous synaptic delays. Synchronous pulses then consist of exactly synchronized (simultaneous) spikes. This allows us to investigate propagation of synchrony by considering the size of a synchronized pulse only, so that the analysis becomes analytically tractable. Nevertheless, in the second part of our article we also considered systems with heterogeneous coupling delays and temporally extended interactions. In agreement with the literature (e.g., Diesmann et al., 1999; Gewaltig et al., 2001; Goedeke and Diesmann, 2008), we observe that pulse packets tend to synchronize along the layers of the FFN, so that the results of our simplified description are directly applicable.

We derived scaling laws as well as quantitative estimates for the critical connectivity marking the bifurcation point between the regime where robust propagation of synchrony is possible and where it is not. In particular, based on a suitable series expansion we have shown that for linearly coupled FFNs the critical connectivity is inversely proportional to layer size and coupling strength. Moreover, the proportionality factor can be estimated from the ground state properties of the single neurons. The estimate agrees with direct numerical simulations within the biologically relevant parameter regime where (a) the spontaneous firing rate of the neurons is low and (b) the distribution of membrane potentials is broad (each neuron receives a large number of almost random presynaptic inputs). If a synchronous pulse propagates along the layers of a linearly coupled FFN, most of the neurons of each layer participate in the propagation of synchrony, independent of the actual layer size, coupling strength, or layout of the external network.

For neurons incorporating non-additive dendritic interactions, the spiking probability as a function of the dendritic stimulation becomes discontinuous. Therefore, the analytical estimation of the critical connectivity in non-linearly coupled FFNs required a different approach than the treatment of linearly coupled FFNs. We have shown that the critical connectivity is inversely proportional to the layer size (as in linearly coupled FFNs), and we have derived its more complicated dependence on the coupling strength. The critical connectivity is completely determined by the layer size, the spiking probability of the single neuron upon the reception of a non-linearly enhanced presynaptic input, and the number of inputs required to reach the dendritic threshold. Our results indicate that in the presence of non-linear dendrites, neurons process synchronous inputs similarly to threshold units. Such units have been previously used as simplified rate neuron models to study activity propagation in discrete time, e.g., in Nowotny and Huerta (2003), Leibold and Kempter (2006), and Cayco-Gajic and Shea-Brown (2013). Because the non-linear modulation function saturates, FFNs with non-additive dendritic interactions allow for a sparser coding, i.e., only a sub-fraction of each layer (the actual size depends on the non-linear enhancement level) participates in the propagation of synchrony. Whereas stable propagation of synchrony is possible in systems with and without dendritic non-linearities, it occurs in non-linearly coupled FFNs with substantially reduced feed-forward anatomy (reduced connectivity or reduced coupling strength) compared to linearly coupled FFNs.

The analytic derivation of the critical connectivity is based on rather general assumptions: (a) the effect of a synchronous input pulse is approximately the sum of the effects of single inputs and (b) for networks with non-additive dendritic interactions the spiking probability due to non-linearly enhanced input is substantially larger than due to a non-enhanced input. Therefore the predictions and estimates are directly applicable to networks of biologically more detailed neuron models.

In our article we have shown that even highly diluted feed-forward structures are suitable to reliably support the directed and constrained propagation of synchronous activity. Such structures occur naturally in sparse, random recurrent networks which are typical for the cortex. These structures might be enhanced by simple synaptic plasticity to enable synchrony propagation. Fast dendritic spikes promote this propagation, as they selectively amplify synchronous inputs and are only weakly influenced by irregular background activity.

Indeed, important candidate regions for the generation of propagating synchrony such as the hippocampus and other, neocortical regions exhibiting replay of activity (Nadasdy et al., 1999; Lee and Wilson, 2002; Ji and Wilson, 2007; Xu et al., 2011; Eagleman and Dragoi, 2012) are sparse and show synaptic plasticity (Debanne et al., 1998; Kobayashi and Poo, 2004). Dendritic spikes as prominently found in, e.g., the hippocampus (Ariav et al., 2003; Gasparini et al., 2004; Polsky et al., 2004; Gasparini and Magee, 2006) trigger depolarizations and calcium influx sufficient to change synaptic strengths (Golding et al., 2002; Remy and Spruston, 2007), and the dendrites themselves exhibit branch "strength potentiation," i.e., the strength of a dendritic spike on a dendritic branch exhibits experience- and activity-dependent plasticity (Losonczy et al., 2008; Makara et al., 2009; Müller et al., 2012).

Our work indicates that fast dendritic spikes reduce the required synaptic strength and connection density for replay of spike patterns. Moreover, their saturation and the resulting sparse coding might explain the observed variability during replay. Thus, in particular, our understanding of propagation along diluted feed-forward chains may now be combined with knowledge on synaptic plasticity and generation of activity accompanying replay (e.g., sharp wave/ripples) to gain an integrated mechanistic understanding for encoding, replay and memory transfer.

## **ACKNOWLEDGMENTS**

This work was supported by the BMBF (Grant No. 01GQ1005B) [Sven Jahnke, Marc Timme], the DFG (Grant No. TI 629/3- 1) [Sven Jahnke], the Swartz Foundation [Raoul-Martin Memmesheimer], and the Max Planck Society [Marc Timme]. Simulation results of networks with biologically more complex neuron models were obtained using the simulation software NEST (Gewaltig and Diesmann, 2007). Sven Jahnke thanks Harold Gutch, Elian Moritz, and Jonna Jahnke for stimulating discussions.

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 June 2013; accepted: 11 October 2013; published online: 15 November 2013.*

*Citation: Jahnke S, Memmesheimer R-M and Timme M (2013) Propagating synchrony in feed-forward networks. Front. Comput. Neurosci. 7:153. doi: 10.3389/fncom. 2013.00153*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Jahnke, Memmesheimer and Timme. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **A. APPENDIX**

**A.1 PROOF OF EXISTENCE OF A GLOBAL MINIMUM OF** *p<sub>NL</sub>(n)*

We will show that *p<sub>NL</sub>*(*n*) as derived in Equation (59),

$$p\_{NL}(n) = \frac{n^2 \epsilon + 2\Theta\_b + n\sqrt{n^2 \epsilon^2 + 4\Theta\_b \left(\epsilon - \frac{\Theta\_b}{\omega}\right)}}{p\_f(\kappa)\,\epsilon\,(n^2 + \omega)\left(1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)\right)} \quad \text{(A.1)}$$

$$= \frac{1}{p\_f(\kappa)} \frac{2\Theta\_b + n^2 \epsilon\left(1 + \sqrt{1 + \frac{\alpha}{n^2}}\right)}{\left(1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)\right)\left(n^2 \epsilon + \omega \epsilon\right)}, \qquad \text{(A.2)}$$

has a global minimum for εω > Θ<sub>*b*</sub>. In Equation (A.2) we defined

$$
\alpha := \frac{4\Theta\_b}{\epsilon} \left( 1 - \frac{\Theta\_b}{\epsilon \omega} \right). \tag{A.3}
$$

For εω > Θ<sub>*b*</sub>, *p<sub>NL</sub>* is positive and continuous, and approaches

$$\lim\_{n \to -\infty} \left( p\_{NL}(n) \right) = \infty,\tag{A.4}$$

$$\lim\_{n \to \infty} \left( p\_{NL}(n) \right) = \frac{1}{p\_f(\kappa)},\tag{A.5}$$

in the limit of large/small *n*. Further, the derivative of *pNL* can be written as

$$\frac{d}{dn}p\_{\rm NL}(n) = \left(2 - h\_1(n)\right)h\_2(n),\tag{A.6}$$

where we defined the functions

$$h\_{1}(n) = \frac{1}{\epsilon\omega} \left( \frac{\sqrt{\frac{2}{\pi}}\,e^{-\frac{n^{2}}{2}} \left(n^{2} + \omega\right) \left(\frac{2\Theta\_{b}}{\sqrt{\alpha+n^{2}}+n} + n\epsilon\right)}{1 + \text{Erf}\left(\frac{n}{\sqrt{2}}\right)} + \frac{4\Theta\_{b}\,n}{\sqrt{\alpha+n^{2}}+n} + \frac{\alpha\epsilon\left(n^{2}+\omega\right)}{\alpha + n\left(\sqrt{\alpha+n^{2}}+n\right)} \right), \quad \text{(A.7)}$$

$$h\_2(n) = \frac{\alpha \left(\sqrt{\alpha + n^2} + n\right)}{p\_f(\kappa) \left(\text{Erf}\left(\frac{n}{\sqrt{2}}\right) + 1\right) \left(n^2 + \alpha\right)^2}. \tag{A.8}$$

For *n* > 0 and εω > Θ<sub>*b*</sub>,

$$
\alpha > 0,\tag{A.9}
$$

$$h\_1(n) > 0,\tag{A.10}$$

$$h\_2(n) \,>\, 0,\tag{A.11}$$

and in the limit of large *n*,

$$\lim\_{n \to \infty} h\_1(n) = \frac{1}{\epsilon \omega} \left( 0 + 2\Theta\_b + \frac{\alpha \epsilon}{2} \right) \qquad \qquad \text{(A.12)}$$

$$=2\,\frac{2\Theta\_b \omega\epsilon-\Theta\_b^2}{\omega^2\epsilon^2},\tag{A.13}$$

$$\lim\_{n \to \infty} h\_2(n) = 0.\tag{A.14}$$

For εω > Θ<sub>*b*</sub>, *h*<sub>1</sub>(*n*) is smaller than two for sufficiently large *n* (cf. Equation A.13) and thus the derivative of *p<sub>NL</sub>*(*n*) becomes positive (cf. Equation A.6). Consequently *p<sub>NL</sub>* approaches 1/*p<sub>f</sub>*(κ) from below for large *n* (cf. also Equation A.5). This proves the existence of a global minimum of *p<sub>NL</sub>*(*n*), because *p<sub>NL</sub>* > 1/*p<sub>f</sub>*(κ) for sufficiently small *n* (cf. Equation A.4).
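The argument can also be checked numerically; the sketch below scans *p<sub>NL</sub>*(*n*) from Equations (A.1)–(A.3) for illustrative parameters with εω > Θ<sub>*b*</sub> (values assumed, not taken from the article) and confirms an interior minimum below the limit 1/*p<sub>f</sub>*(κ):

```python
import numpy as np
from math import erf, sqrt

def p_nl(n, eps, omega, theta_b, pf_kappa):
    """p_NL(n) from Equation (A.2), with alpha as in Equation (A.3)."""
    alpha = 4.0 * theta_b / eps * (1.0 - theta_b / (eps * omega))
    num = 2.0 * theta_b + n**2 * eps + n * eps * sqrt(n**2 + alpha)
    den = pf_kappa * (1.0 + erf(n / sqrt(2.0))) * (n**2 + omega) * eps
    return num / den

# illustrative parameters satisfying eps*omega > theta_b
eps, omega, theta_b, pf = 0.1, 150.0, 2.0, 0.9
ns = np.linspace(-3.0, 10.0, 2601)
vals = [p_nl(n, eps, omega, theta_b, pf) for n in ns]
n_min = ns[int(np.argmin(vals))]
print(n_min, min(vals), 1.0 / pf)  # interior minimum, below 1/p_f(kappa)
```

The scan diverges for strongly negative *n* (Equation A.4) and approaches 1/*p<sub>f</sub>*(κ) from below for large *n* (Equation A.5), consistent with the proof.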

#### **A.2 BIOLOGICALLY MORE DETAILED NEURON MODELS**

In section 3.3.3 we consider biologically more detailed neuron models. In this appendix we present descriptions of these models including the parameters used for the numerical simulations in **Figure 13**. These simulations were done using NEST (Gewaltig and Diesmann, 2007), a simulator for spiking neural network models (available at http://www.nest-initiative.org). We implemented new model classes within the NEST framework to handle conductance-based leaky integrate-and-fire neurons with double exponential input conductances as well as nonlinear dendritic interactions (source code available from Sven Jahnke).

#### *A.2.1 CB-type model*

The CB-type model is a leaky integrate-and-fire neuron with conductance-based synapses, augmented with a mechanism for the generation of current pulses mimicking the effect of a dendritic spike (see also Memmesheimer, 2010; Jahnke et al., 2012). The subthreshold dynamics of the membrane potential *V<sub>l</sub>* of neuron *l* obeys the differential equation

$$C\_{l}^{\rm m}\frac{dV\_{l}(t)}{dt} = g\_{l}^{L}\left(V\_{l}^{\rm rest} - V\_{l}(t)\right) + g\_{l}^{A}(t)\left(E^{\rm Ex} - V\_{l}(t)\right) + g\_{l}^{G}(t)\left(E^{\rm In} - V\_{l}(t)\right) + I\_{l}^{\rm DS}(t) + I\_{l}^{0}.\quad \text{(A.15)}$$

Here, *C*<sup>m</sup><sub>*l*</sub> is the membrane capacitance, *g*<sup>*L*</sup><sub>*l*</sub> is the resting conductance, *V*<sup>rest</sup><sub>*l*</sub> is the resting membrane potential, *E*<sup>Ex</sup> and *E*<sup>In</sup> are the reversal potentials, and *g*<sup>*A*</sup><sub>*l*</sub>(*t*) and *g*<sup>*G*</sup><sub>*l*</sub>(*t*) are the conductances of excitatory and inhibitory synaptic populations, respectively. *I*<sup>DS</sup><sub>*l*</sub>(*t*) models the current pulses caused by dendritic spikes and *I*<sup>0</sup><sub>*l*</sub> is a constant current gathering slow external and internal currents. The time course of single synaptic conductances contributing to *g*<sup>*A*</sup><sub>*l*</sub>(*t*) and *g*<sup>*G*</sup><sub>*l*</sub>(*t*) is given by the difference between two exponential functions (e.g., Dayan and Abbott, 2001) with time constants τ<sub>*A*,1</sub> and τ<sub>*A*,2</sub> for the excitatory and τ<sub>*G*,1</sub> and τ<sub>*G*,2</sub> for the inhibitory conductances. Whenever the membrane potential reaches the spike threshold Θ<sub>*l*</sub>, the neuron sends a spike to its postsynaptic neurons, is reset to *V*<sup>reset</sup><sub>*l*</sub> and becomes refractory for a period *t*<sup>ref</sup><sub>*l*</sub>. In addition to inputs from the preceding layer, each neuron receives excitatory and inhibitory Poissonian input spike trains with rates ν<sup>ex</sup> and ν<sup>in</sup>; single inputs have coupling strengths ε<sup>ex</sup> and ε<sup>in</sup>, respectively.
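For illustration, the subthreshold dynamics of Equation (A.15) can be integrated with a simple forward-Euler step (a minimal sketch using the single-neuron parameters listed in this appendix and arbitrary constant synaptic conductances; the actual simulations used NEST):

```python
# forward-Euler sketch of Equation (A.15); units: pF, nS, mV, pA, ms
Cm, gL = 400.0, 25.0          # membrane capacitance, leak conductance
Vrest = -65.0                 # resting potential
E_ex, E_in = 0.0, -75.0       # excitatory / inhibitory reversal potentials
theta, Vreset = -50.0, -65.0  # spike threshold, reset potential
dt = 0.01                     # integration step (ms)

def euler_step(V, gA, gG, I_ds, I0):
    """One Euler step of the CB-type membrane potential (in mV)."""
    dV = (gL * (Vrest - V) + gA * (E_ex - V) + gG * (E_in - V)
          + I_ds + I0) / Cm
    return V + dt * dV

V = Vrest
for _ in range(1000):  # 10 ms of constant drive (illustrative values)
    V = euler_step(V, gA=5.0, gG=2.0, I_ds=0.0, I0=250.0)
    if V >= theta:     # somatic spike and reset
        V = Vreset
print(V)               # depolarized above rest, still below threshold
```

With nS, mV, pF and pA, the right-hand side is automatically in mV/ms, so no unit conversions are needed.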

To account for dendritic spike generation, we consider the sum $g\_{l,\Delta t}$ of excitatory input strengths (characterized by the coupling strengths), arriving at an excitatory neuron $l$ within the time window $\Delta t$ for non-linear dendritic interactions,

$$g\_{l,\Delta t}(t) = \sum\_{j} \sum\_{k} \epsilon\_{lj}\, \chi\_{[t-\Delta t,\, t]}\left(t\_{jk}^{f} + \tau\right),\tag{A.16}$$

where $\chi\_{[t-\Delta t,\,t]}$ is the characteristic function of the interval $[t - \Delta t, t]$, $t\_{jk}^{f}$ is the $k$th firing time of neuron $j$ and $\tau$ denotes the synaptic delay. We denote the peak conductance (coupling strength) for a connection from neuron $j$ to neuron $l$ by $g\_{lj}^{\max}$. If $g\_{l,\Delta t}$ exceeds a threshold $g^{\Theta}$, a dendritic spike is initiated and the dendrite becomes refractory for a time window $t^{\rm DS,ref}$. The effect of the dendritic spike is incorporated into the model by a current pulse that reaches the soma a time $\tau\_{\rm DS}$ thereafter. This current pulse is modeled as the sum of three exponential functions,

$$I\_l^{\rm DS}(t) = c(g\_{\Delta t}) \left[ -A e^{-\frac{t}{\tau\_{\rm DS,1}}} + B e^{-\frac{t}{\tau\_{\rm DS,2}}} - C e^{-\frac{t}{\tau\_{\rm DS,3}}} \right], \tag{A.17}$$

with prefactors $A > 0$, $B > 0$, $C > 0$, decay time constants $\tau\_{\rm DS,1}$, $\tau\_{\rm DS,2}$, $\tau\_{\rm DS,3}$ and a dimensionless correction factor $c(g\_{\Delta t})$, where $g\_{\Delta t}$ is the summed excitatory input at the initiation time of the dendritic spike as given by Equation (A.16). The factor $c(g\_{\Delta t})$ modulates the pulse strength, ensuring that the peak of the excitatory postsynaptic potential (pEPSP) reaches the experimentally observed region of saturation. At very high excitatory inputs the conventionally generated depolarization exceeds the level of saturation, $c(g\_{\Delta t})$ is zero and the pEPSP increases again (cf. inset of **Figure 13A**).
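The thresholding mechanism behind Equation (A.16) can be sketched event-wise: sum the strengths of excitatory inputs that arrived within the last $\Delta t$, trigger a dendritic spike when the sum crosses the threshold, and respect the dendritic refractory period. The function name and list-of-tuples interface are our own; arrival times are taken to be the (already delayed) reception times, and default values follow the Figure 13 parameters:

```python
def dendritic_spike_times(arrivals, delta_t=2.0, g_theta=8.65, t_ds_ref=5.2):
    """arrivals: time-sorted list of (arrival_time_ms, strength_nS) of
    excitatory inputs to one neuron. Returns dendritic-spike initiation times."""
    ds_times = []
    last_ds = float("-inf")
    for i, (t, _) in enumerate(arrivals):
        if t - last_ds < t_ds_ref:          # dendrite still refractory
            continue
        # summed input strength within the sliding window [t - delta_t, t]
        g_sum = sum(s for (u, s) in arrivals[: i + 1] if u >= t - delta_t)
        if g_sum > g_theta:
            ds_times.append(t)
            last_ds = t
    return ds_times

# three near-synchronous 3 nS inputs jointly cross the 8.65 nS threshold
print(dendritic_spike_times([(1.0, 3.0), (1.5, 3.0), (2.0, 3.0)]))  # [2.0]
```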

#### *Parameters for Figure 13*

The single neuron parameters for the numerical simulations are $C\_l^{\rm m} = C^{\rm m} = 400$ pF, $g\_l^{L} = g^{L} = 25$ nS, $V\_l^{\rm rest} = V^{\rm rest} = -65$ mV, $\Theta\_l = \Theta = -50$ mV, $t\_l^{\rm ref} = t^{\rm ref} = 3$ ms and $V\_l^{\rm reset} = V^{\rm reset} = -65$ mV. The reversal potentials are $E^{\rm Ex} = 0$ mV and $E^{\rm In} = -75$ mV and the time constants for the excitatory and inhibitory conductances are $\tau\_{A,1} = \tau\_{G,1} = 2.5$ ms and $\tau\_{A,2} = \tau\_{G,2} = 0.5$ ms. The parameters of the dendritic spike current are $\Delta t = 2$ ms, $g^{\Theta} = 8.65$ nS, $\tau\_{\rm DS} = 2.7$ ms, $A = 55$ nA, $B = 64$ nA, $C = 9$ nA, $\tau\_{\rm DS,1} = 0.2$ ms, $\tau\_{\rm DS,2} = 0.3$ ms, $\tau\_{\rm DS,3} = 0.7$ ms and $t^{\rm DS,ref} = 5.2$ ms, and the dimensionless correction factor is given by $c(g) = \max\{1.5 - g \cdot 0.053\,{\rm nS}^{-1},\, 0\}$. For the first setup ($p^{f} \approx 0.97$) we set $I\_l^{0} = I^{0} = 250$ pA, $\nu\_{\rm ex} = 2.4$ kHz, $\nu\_{\rm in} = 0.6$ kHz, $\epsilon\_{\rm ex} = 0.6$ nS and $\epsilon\_{\rm in} = 6.6$ nS; for the second setup ($p^{f} \approx 0.67$) we set $I\_l^{0} = I^{0} = 0$ pA, $\nu\_{\rm ex} = 20$ kHz, $\nu\_{\rm in} = 5$ kHz, $\epsilon\_{\rm ex} = 0.6$ nS and $\epsilon\_{\rm in} = -6.6$ nS.

#### *A.2.2 HH-type model*

We employ a standard model provided by NEST ("hh\_psc\_alpha"; a Hodgkin–Huxley type neuron with alpha-function shaped postsynaptic currents) and incorporate a dendritic spike current as in the CB model. The membrane potential $V\_l$ of neuron $l$ obeys the differential equation

$$C\_{l}^{\rm m}\frac{dV\_{l}(t)}{dt} = I\_{l}^{\rm Na}(t) + I\_{l}^{\rm K}(t) + I\_{l}^{\rm L}(t) + I\_{l}^{0}$$

$$+\, I\_{l}^{\rm ex}(t) + I\_{l}^{\rm in}(t) + I\_{l}^{\rm DS}(t). \tag{A.18}$$

For clarity we drop the index *l* in the following; all quantities refer to some neuron *l*. In Equation (A.18),

$$I^{\rm Na}(t) = g^{\rm Na} m(t)^3 h(t) \left[ E^{\rm Na} - V(t) \right] \tag{A.19}$$

$$I^{\rm K}(t) = g^{\rm K} n(t)^4 \left[ E^{\rm K} - V(t) \right] \tag{A.20}$$

$$I^{\rm L}(t) = g^{\rm L} \left[ E^{\rm L} - V(t) \right] \tag{A.21}$$

specify the Na<sup>+</sup> current, the K<sup>+</sup> current and the leak current. The dynamics of the gating variables $m$, $n$ and $h$ are governed by

$$\frac{dm(t)}{dt} = \alpha\_{\rm m}(t) \left[1 - m(t)\right] - \beta\_{\rm m}(t)m(t) \tag{A.22}$$

$$\frac{dh(t)}{dt} = \alpha\_{\rm h}(t) \left[1 - h(t)\right] - \beta\_{\rm h}(t)h(t) \tag{A.23}$$

$$\frac{dn(t)}{dt} = \alpha\_{\text{n}}(t) \left[1 - n(t)\right] - \beta\_{\text{n}}(t)n(t),\tag{A.24}$$

where the voltage dependencies are given by

$$\alpha\_n(t) = \frac{0.01\left[\tilde{V}(t) + 55\right]}{1 - \exp\left[-\frac{\tilde{V}(t) + 55}{10}\right]}\tag{A.25}$$

$$\beta\_n(t) = 0.125 \cdot \exp\left[-\frac{\tilde{V}(t) + 65}{80}\right] \tag{A.26}$$

$$\alpha\_m(t) = \frac{0.1\left[\tilde{V}(t) + 40\right]}{1 - \exp\left[-\frac{\tilde{V}(t) + 40}{10}\right]}\tag{A.27}$$

$$\beta\_m(t) = 4 \cdot \exp\left[-\frac{\tilde{V}(t) + 65}{18}\right] \tag{A.28}$$

$$\alpha\_h(t) = 0.07 \cdot \exp\left[-\frac{\tilde{V}(t) + 65}{20}\right] \tag{A.29}$$

$$\beta\_h(t) = \left(1 + \exp\left[-\frac{\tilde{V}(t) + 35}{10}\right]\right)^{-1}.\qquad(A.30)$$

In Equations (A.25–A.30), $\tilde{V}(t) := V(t)/1\,{\rm mV}$ is the value of the membrane potential normalized by 1 mV. Spikes are detected by a combined threshold-and-local-maximum search: if there is a local maximum of the membrane potential above a certain threshold, $U^{\Theta} = 0$ mV, it is considered a spike (for more details see the NEST manual and the model implementation available at http://www.nest-initiative.org). After a synaptic delay time $\tau$, a spike initiates an alpha-function shaped current pulse at the postsynaptic neurons. The total excitatory and inhibitory input to neuron $l$ is given by

$$I^{\rm ex}(t) = \sum\_{k} \epsilon\_{k}^{\rm ex} \frac{e\left(t - t\_{k}^{\rm ex}\right)}{\tau^{\rm ex}} \exp\left[-\frac{t - t\_{k}^{\rm ex}}{\tau^{\rm ex}}\right] \Theta\left[t - t\_{k}^{\rm ex}\right] \tag{A.31}$$

$$I^{\rm in}(t) = \sum\_{k} \epsilon\_{k}^{\rm in} \frac{e\left(t - t\_{k}^{\rm in}\right)}{\tau^{\rm in}} \exp\left[-\frac{t - t\_{k}^{\rm in}}{\tau^{\rm in}}\right] \Theta\left[t - t\_{k}^{\rm in}\right], \tag{A.32}$$

where $\epsilon\_{k}^{\rm ex} > 0$ ($\epsilon\_{k}^{\rm in} < 0$) is the strength of the $k$th arriving excitatory (inhibitory) spike at neuron $l$, $t\_{k}^{\rm ex}$ ($t\_{k}^{\rm in}$) denotes the reception time of that spike and $e$ is Euler's number [the currents $I^{\rm ex}(t)$ and $I^{\rm in}(t)$ are normalized such that an input of strength $\epsilon = 1$ pA causes a peak current of 1 pA]. The time constants $\tau^{\rm ex}$ and $\tau^{\rm in}$ are the synaptic time constants. As before, we account for dendritic spike generation by considering the sum of excitatory input strengths received by neuron $l$ within the time window $\Delta t$,

$$\epsilon\_{\Delta t}(t) = \sum\_{k} \epsilon\_{k}^{\rm ex}\, \chi\_{[t-\Delta t,\, t]}\left(t\_{k}^{\rm f} + \tau\right).\tag{A.33}$$

If this sum exceeds the dendritic threshold $I^{\Theta}$, a dendritic spike is initiated and we model its effect by the current pulse

$$I^{\rm DS}(t) = c(\epsilon\_{\Delta t}) \left[ -A e^{-\frac{t}{\tau\_{\rm DS,1}}} + B e^{-\frac{t}{\tau\_{\rm DS,2}}} - C e^{-\frac{t}{\tau\_{\rm DS,3}}} \right], \tag{A.34}$$

starting a delay time $\tau\_{\rm DS}$ after the initiation time of the dendritic spike. The correction factor $c(\epsilon\_{\Delta t})$ modulates the pulse strength such that the depolarization saturates for suprathreshold inputs, until the effects of linearly summed input exceed the effects of the dendritic spike (cf. inset of **Figure 13B**).
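As a quick check of the normalization claimed for Equations (A.31, A.32): assuming the standard alpha-function form with a linear rise factor (the form used by NEST's "hh\_psc\_alpha"), a 1 pA input indeed produces a 1 pA peak, reached one synaptic time constant after onset. The helper below is ours:

```python
import math

def alpha_current(t, t_k, eps, tau):
    """Alpha-function synaptic current: eps * e * s * exp(-s), s = (t - t_k)/tau.
    Peaks at exactly eps, one time constant tau after the input arrives at t_k."""
    if t < t_k:
        return 0.0
    s = (t - t_k) / tau
    return eps * math.e * s * math.exp(-s)

# a 20 pA input with tau = 2 ms peaks at 20 pA, at t = t_k + tau
print(round(alpha_current(4.0, t_k=2.0, eps=20.0, tau=2.0), 6))  # 20.0
```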

#### *A.2.3 Parameters for Figure 13*

As before, we consider homogeneous neuronal properties. The single neuron parameters for the numerical simulations are $C^{\rm m} = 200$ pF, $E^{\rm K} = -77$ mV, $E^{\rm L} = -70$ mV, $E^{\rm Na} = 50$ mV, $g^{\rm K} = 3600$ nS, $g^{\rm L} = 30$ nS, $g^{\rm Na} = 12000$ nS, $\tau^{\rm ex} = 2$ ms and $\tau^{\rm in} = 2$ ms. The parameters of the dendritic spike current are $\Delta t = 3.5$ ms, $I^{\Theta} = 270$ pA, $\tau\_{\rm DS} = 2.7$ ms, $A = 27.5$ nA, $B = 32$ nA, $C = 4.5$ nA, $\tau\_{\rm DS,1} = 0.2$ ms, $\tau\_{\rm DS,2} = 0.3$ ms, $\tau\_{\rm DS,3} = 0.7$ ms and $t^{\rm DS,ref} = 5.2$ ms, and the dimensionless correction factor is given by $c(\epsilon) = \max\{1.54 - \epsilon \cdot 0.002\,{\rm pA}^{-1},\, 0\}$. For the first setup ($p^{f} \approx 0.97$) we set $I^{0} = 500$ pA, $\nu\_{\rm ex} = 3$ kHz, $\nu\_{\rm in} = 3$ kHz, $\epsilon\_{\rm ex} = 20$ pA and $\epsilon\_{\rm in} = -20$ pA; for the second setup ($p^{f} \approx 0.67$) we set $I^{0} = 250$ pA, $\nu\_{\rm ex} = 10$ kHz, $\nu\_{\rm in} = 10$ kHz, $\epsilon\_{\rm ex} = 20$ pA and $\epsilon\_{\rm in} = -20$ pA.

## Simultaneous stability and sensitivity in model cortical networks is achieved through anti-correlations between the in- and out-degree of connectivity

#### *Juan C. Vasquez<sup>1</sup>\*, Arthur R. Houweling<sup>2</sup> and Paul Tiesinga<sup>1</sup>\**

*<sup>1</sup> Department of Neuroinformatics, Donders Institute for Brain, Cognition and Behavior, Radboud University Nijmegen, Nijmegen, Netherlands <sup>2</sup> Department of Neuroscience, Erasmus Medical Center, Rotterdam, Netherlands*

#### *Edited by:*

*Robert Rosenbaum, University of Pittsburgh, USA*

#### *Reviewed by:*

*Takuma Tanaka, Tokyo Institute of Technology, Japan Xin Tian, Tianjin Medical University, China*

#### *\*Correspondence:*

*Juan C. Vasquez and Paul Tiesinga, Department of Neuroinformatics, Donders Institute for Brain, Cognition and Behavior, Radboud University Nijmegen, Heyendaalseweg 135, Postvak 66, 6525 AJ Nijmegen, Netherlands e-mail: jc.vasquez@science.ru.nl; p.tiesinga@science.ru.nl*

Neuronal networks in rodent barrel cortex are characterized by stable low baseline firing rates. However, they are sensitive to the action potentials of single neurons, as suggested by recent single-cell stimulation experiments that reported quantifiable behavioral responses to short spike trains elicited in single neurons. Hence, these networks are stable against internally generated fluctuations in firing rate but at the same time remain sensitive to similarly-sized externally induced perturbations. We investigated stability and sensitivity in a simple recurrent network of stochastic binary neurons and determined numerically the effects of correlation between the number of afferent ("in-degree") and efferent ("out-degree") connections of neurons. The key advance reported in this work is that anti-correlation between in-/out-degree distributions increased the stability of the network in comparison to networks with no correlation or positive correlations, while being able to achieve the same level of sensitivity. The experimental characterization of degree distributions is difficult because all pre-synaptic and post-synaptic neurons have to be identified and counted. We explored whether the statistics of network motifs, which require only the characterization of connections between small subsets of neurons, could be used to detect evidence for degree anti-correlations. We find that the sample frequency of the 3-neuron "ring" motif (1→2→3→1) can be used to detect degree anti-correlation for sub-networks of size 30 using about 50 samples, which is significant because the necessary measurements are achievable experimentally in the near future. Taken together, we hypothesize that barrel cortex networks exhibit degree anti-correlations and specific network motif statistics.

**Keywords: barrel cortex, detection threshold, nanostimulation, degree distribution, computational model, network motifs**

## **INTRODUCTION**

Rodents can be trained to use their whiskers to detect an object that predicts a reward and respond with licking to obtain this reward (Huber et al., 2012). The neural responses in barrel cortex to whisker stimulation are hypothesized to play an important role in performing this task (Petersen and Crochet, 2013). Animals can also be trained to detect electrical microstimulation (Butovas and Schwarz, 2007; Houweling and Brecht, 2008) or optogenetic stimulation (Huber et al., 2008) of barrel cortex. Microstimulation activates a large number of neurons that are spatially distributed within a few hundred microns around the stimulating electrode (Histed et al., 2009). An important question is how many neurons need to be activated for the subject to reliably detect the stimulation and whether some cell types are more sensitive than others. Answers to these questions may come from nanostimulation experiments in which a single neuron is activated through juxtacellular stimulation (Houweling and Brecht, 2008). These experiments show that trains of 10-15 action potentials added to a single cortical neuron can indeed be detected, but the reliability of detection is low and reaction times are long compared to microstimulation.

The spontaneous firing rates in the barrel cortex are low, ranging from less than 1 Hz in the superficial layers to a few Hz in the deep layers (de Kock and Sakmann, 2009; Barth and Poulet, 2012), and whisker stimuli typically evoke a single spike (or none) in responsive neurons. The activity in the low firing rate state (LFS) is also stochastic, both in time as well as across cells, but the precise nature of sparse firing is still being quantified (Barth and Poulet, 2012). In a LFS, a single spike could represent a significant perturbation, potentially yielding 28 additional spikes in postsynaptic neurons (London et al., 2010). The network state therefore needs to be stable against small fluctuations that may be amplified through recurrent connectivity. At the same time the aforementioned experiments show that the network is sensitive to small perturbations that are externally generated. Sensitivity and stability are connected and can in general not be optimized at the same time, as an increase in one causes a decrease in the other. Furthermore, a stable LFS, in the sense of asynchronous and irregular activity, is difficult to achieve (Kumar et al., 2008).

We use two insights to find the optimal trade-off between stability and sensitivity. First, the externally and internally generated firing rate fluctuations may have different statistics. The external perturbation is a train of action potentials [e.g., of 200 ms duration (Houweling and Brecht, 2008)] in a single neuron, thus correlated in time, whereas the internal fluctuations are likely to be of shorter duration and involve a more diverse set of neurons. Second, network structure may be such that these fluctuations have different stability properties (possibly through learning). Our guiding hypothesis is that simultaneous stability and sensitivity are achieved through an anti-correlation between the in- and out-degree of synaptic connectivity between neurons in barrel cortex. Thus, neurons with a low number of synaptic inputs have a high number of synaptic outputs and neurons with a high number of inputs have a low number of outputs. We further hypothesize that such an anti-correlation leads to a distribution of synaptic connectivity motifs that is different than for a random network (Milo et al., 2002). Experiments show that barrel cortical circuits have a motif distribution that is different from random (Song et al., 2005; Perin et al., 2011), whereas theoretical studies show that networks with non-random motif distributions have different synchronization properties (LaMar and Smith, 2010; Roxin, 2011; Zhao et al., 2011; Litwin-Kumar and Doiron, 2012) and can emerge through synaptic plasticity during reward-based learning (Bourjaily and Miller, 2011a,b). Our work is the first that focuses on the effect on network dynamics of correlations between the in- and out-degree of the same neuron, rather than between in- and/or out-degrees of different neurons, which is referred to as assortativity (Newman, 2010).

Here we test these hypotheses in simplified networks of neurons. In order to focus on the effect of network structure, rather than the full dynamics of spiking neurons, we model neurons as binary units. The inputs to the binary units are determined through a connection matrix with a pre-specified degree distribution generated by a configuration model (Newman, 2010). We first describe how the networks are constructed and then determine (1) their stability in terms of the maximal coupling constant for which the LFS is still stable and (2) their sensitivity to single-cell perturbations using a receiver operating characteristic (ROC) analysis. Finally, we address the issue of how to detect evidence for anti-correlations in the degree distribution experimentally on the basis of sampling sub-networks.

Taken together, we find that anti-correlated networks are more stable than equivalent correlated and uncorrelated networks, but can still reach the same level of sensitivity, which represents a key theoretical advance in terms of a hypothesis for the experimentally observed sensitivity and stability of neuronal networks in the rodent barrel cortex. Furthermore, the hypothesis is of experimental significance, because our analysis shows that correlations in the degree distribution can be detected using sub-networks of sizes that are experimentally accessible in the near future.

#### **METHODS**

#### **NETWORK DYNAMICS**

The model network was composed of *N* binary excitatory neurons. The network state at time *t* is an *N*-dimensional vector with entries *xi*(*t*) equal to one for neurons that are active and zero for those that are not; here *i* is the index of the neuron. The new state *xi*(*t* + 1) is obtained in two steps. First, the probability ν*i*,*t* + 1 of a neuron being active is calculated using Equation (1). Second, for each neuron the firing probability is compared to a random number that is uniformly distributed between 0 and 1. The neuron is set to 1 when the random number is less than or equal to the probability value

$$\nu\_{i,t+1} = \frac{1}{1 + \exp\left(h\_0 - \frac{J}{N p\_c} \sum\_j w\_{ij} x\_{j,t}\right)}\tag{1}$$

The probability has a sigmoidal form, with the exponent consisting of a constant term *h*0, which sets the probability of firing in the absence of inputs from other neurons, and a coupling term representing the network input. The coupling term contains the adjacency matrix *wij*, whose construction is described below, and in which *wij* = 1 if there is an input from neuron *j* to neuron *i* and *wij* = 0 otherwise. The overall probability of a connection is *pc*. Hence the sum across rows of the adjacency matrix is on average *Npc* and we normalize the coupling term by *J*/*Npc* so that *J* then represents the overall coupling strength. The network activity is calculated in time bins that we consider to be 10 ms. The network has a high firing rate state (HFS), in which each neuron is active on each time step, to which the network will converge when enough neurons are active on a previous time step. We are primarily interested in the LFS, in which each neuron fires only in a fraction of the time bins, corresponding to a firing rate of approximately 1 Hz (Barth and Poulet, 2012). Alternatively, in a given time bin, only a fraction of neurons are active.
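The two-step update described above can be sketched directly from Equation (1). The function name and parameter values are illustrative, not the authors' code; *h*0 = 4.6 gives a baseline probability of about 0.01 per bin, i.e., roughly 1 Hz with 10 ms bins:

```python
import numpy as np

def update_state(x, w, J, h0, p_c, rng):
    """One stochastic update of the binary network, Equation (1).
    x: length-N 0/1 state; w: adjacency matrix with w[i, j] = 1 for j -> i."""
    drive = (J / (len(x) * p_c)) * (w @ x)         # normalized recurrent input
    nu = 1.0 / (1.0 + np.exp(h0 - drive))          # per-neuron firing probability
    return (rng.random(len(x)) <= nu).astype(int)  # compare to uniform numbers

rng = np.random.default_rng(0)
N, p_c = 200, 0.05
w = (rng.random((N, N)) < p_c).astype(int)         # ER-style adjacency for the demo
np.fill_diagonal(w, 0)
x = np.zeros(N, dtype=int)                         # start from the silent state
x = update_state(x, w, J=4.0, h0=4.6, p_c=p_c, rng=rng)
print(set(x.tolist()) <= {0, 1})  # True: the state stays binary
```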

The network activity is represented by the mean probability of firing of a neuron during a time bin and is calculated as the total number of spikes divided by the number of neurons. When normalized by the bin width, it represents the mean firing rate of a network neuron in spikes per second (Hz).

#### **NETWORK CONNECTIVITY**

Our goal is to determine whether correlations in the in- and outdegree distribution are beneficial in that they increase sensitivity and/or stability relative to uncorrelated networks. Hence, we need a control network without degree correlations. Although the standard random network, Erdos-Renyi (ER) (Newman, 2010), does not have correlations in the degree distribution and is easy to generate samples of, it is not appropriate as a control because it has a sharp degree distribution (see below) and we instead need large variance degree distributions.

For ER networks with a connection probability *p*, the degree distribution (for both out- as well as in-degree) is given by a binomial distribution

$$p(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \tag{2}$$

which has a mean of (*N* − 1)*p* and a variance of (*N* − 1)*p*(1 − *p*); in the limit of large *N* this converges to a Gaussian distribution

**FIGURE 1 | Construction of networks with a correlation between outand in-degree.** In panels **(A)** to **(D)**, we show scatterplots of the out-degree vs. in-degree, whereas the corresponding marginal distributions are shown for **(E)** the in-degree and **(F)** the out-degree. We considered four types of networks, each with *N* = 2000 neurons and a connection probability of *pc* = 0.05. **(A)** The Erdos-Renyi (ER) network in which each connection is chosen at random with a probability *pc* = 0.05, for which there is no correlation [ρ = 0.0034 (standard deviation: 0.018)] and the relative variance of in- and out-degree across neurons is small for large networks. In order to examine networks with a higher variance of degree values, we first generated a degree distribution in the form of a truncated, bivariate Gaussian. In **(B)** the covariance matrix was diagonal, with equal variance for the out- and in-degrees, which yielded uncorrelated in- and out-degrees [ρ = 0.0010 (0.019)]. To generate correlations we started from a covariance matrix with unequal variances and rotated it by 45 degrees anticlockwise to obtain **(C)** anti-correlated [ρ = −0.821 (0.0085)] and by 45 degrees clockwise to obtain **(D)** correlated degree distributions [ρ = 0.821 (0.0085)]. In the anti-correlated case, nodes with a high out-degree had a low in-degree and vice versa, whereas in the correlated case, nodes with a high out-degree also had a high in-degree, as illustrated by the insets in **(C)** and **(D)**, respectively. **(E,F)** The networks were constructed so that the marginal distributions for the correlated (red), anti-correlated (blue) and uncorrelated (green) case were the same. The ER network (purple) had much tighter marginal distributions.

with a ratio of the standard deviation over the mean of

$$\sqrt{\frac{1-p}{(N-1)p}}\tag{3}$$

This means that the distribution becomes very tight for large network sizes (**Figures 1A,E,F**).
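This concentration effect can be checked numerically; the helper name is ours:

```python
import math

def degree_cv(N, p):
    """Std-over-mean of the binomial in-degree distribution of an ER network
    (mean (N-1)p, variance (N-1)p(1-p))."""
    mean = (N - 1) * p
    std = math.sqrt((N - 1) * p * (1 - p))
    return std / mean

print(round(degree_cv(2000, 0.05), 3))                   # 0.097: already tight at N = 2000
print(degree_cv(20000, 0.05) < degree_cv(2000, 0.05))    # True: tightens further with N
```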

Hence we generated networks from a truncated bivariate Gaussian for the joint in- and out-degree distribution as explained below (**Figures 1B–F**). We start from a bivariate Gaussian with a diagonal covariance matrix given by

$$p(x, y) = \frac{1}{2\pi \sigma\_{x} \sigma\_{y}} \exp\left(-\frac{(x - \mu)^2}{2\sigma\_{x}^2} - \frac{(y - \mu)^2}{2\sigma\_{y}^2}\right) \tag{4}$$

which is rotated by 45 degrees clockwise or anticlockwise to obtain a distribution with positive or negative correlations, respectively. The resulting distribution is truncated below at 1, because the degree cannot be negative and we exclude the case of zero (since a zero degree neuron would not be considered part of the network), and above at twice the mean degree to make the distribution symmetric. The resulting distribution is normalized to make the integral over the positive quadrant equal to one. The short axis is represented by σ*x* and the long axis is represented by σ*y*. The mean degree μ was equal to *Npc*; with a network size *N* = 2000 and connection probability *pc* = 0.05 (Holmgren et al., 2003), this yields μ = 100. The long axis was σ*y* = μ/3. The term dispersion refers to the ratio σ*x*/σ*y*, which was set to 0.3 for the standard parameter set.
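A sketch of this sampling procedure: the 45-degree rotation is implemented by mixing a short-axis and a long-axis Gaussian variable, and the truncation by rejection. The implementation details (function name, rejection loop, rounding to integer degrees) are our own, not the authors':

```python
import numpy as np

def sample_degrees(n, mu, sigma_y, dispersion, corr_sign, rng):
    """Sample n (in-degree, out-degree) pairs from a truncated bivariate
    Gaussian whose long axis is rotated by 45 degrees (cf. Figure 1C,D).
    corr_sign: -1 anti-correlated, +1 correlated, 0 uncorrelated."""
    sigma_x = dispersion * sigma_y
    pairs = np.empty((0, 2))
    while len(pairs) < n:
        if corr_sign == 0:
            s = np.sqrt((sigma_x**2 + sigma_y**2) / 2)   # match rotated marginals
            x = rng.normal(0.0, s, size=2 * n)
            y = rng.normal(0.0, s, size=2 * n)
        else:
            u = rng.normal(0.0, sigma_x, size=2 * n)     # short axis
            v = rng.normal(0.0, sigma_y, size=2 * n)     # long axis
            x = (u + v) / np.sqrt(2.0)
            y = corr_sign * (v - u) / np.sqrt(2.0)
        d = np.rint(np.stack([mu + x, mu + y], axis=1))
        keep = (d.min(axis=1) >= 1) & (d.max(axis=1) <= 2 * mu)  # truncation
        pairs = np.vstack([pairs, d[keep]])
    return pairs[:n].astype(int)

rng = np.random.default_rng(1)
d = sample_degrees(2000, mu=100, sigma_y=100 / 3, dispersion=0.3, corr_sign=-1, rng=rng)
rho = np.corrcoef(d[:, 0], d[:, 1])[0, 1]
print(rho < -0.5)  # True: in- and out-degrees are strongly anti-correlated
```

With dispersion 0.3 the expected correlation magnitude is (σ*y*² − σ*x*²)/(σ*y*² + σ*x*²) ≈ 0.83, close to the values reported in Figure 1.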

Correlated degree distributions were obtained by sampling, for each neuron *i*, the in- and out-degree, *d*<sup>in</sup><sub>*i*</sub> and *d*<sup>out</sup><sub>*i*</sub>, from the above bivariate Gaussian. The simplest method for generating a realization of the corresponding network is the configuration method (Newman, 2010). For each neuron *i*, *d*<sup>out</sup><sub>*i*</sub> stubs with value *i* are made and concatenated into a list *s*<sup>out</sup><sub>*k*</sub>. Likewise, *d*<sup>in</sup><sub>*i*</sub> stubs with value *i* are made, concatenated into a list *s*<sup>in</sup><sub>*k*</sub>, and randomly permuted. From these two lists, pairs are picked from the same position, i.e., the *k*th stub on the out-list is matched to the *k*th stub on the in-list to make the connection *s*<sup>out</sup><sub>*k*</sub> → *s*<sup>in</sup><sub>*k*</sub>. This algorithm produces networks with two artifacts: there could be self-connections, *s*<sup>out</sup><sub>*k*</sub> = *s*<sup>in</sup><sub>*k*</sub>, and a given connection could be sampled twice (or more), *s*<sup>out</sup><sub>*k*</sub> = *s*<sup>out</sup><sub>*l*</sub> and *s*<sup>in</sup><sub>*k*</sub> = *s*<sup>in</sup><sub>*l*</sub>. For sparse networks the likelihood of self-edges is small (0.05%), but the probability of multi-edges was larger, around 2.7%. For the cases in which there were multi- or self-edges, we removed the corresponding links.
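The configuration method with the clean-up described above can be sketched as follows. The trimming of unequal stub totals is our own fix-up (the text does not specify how unequal totals are handled), and the function name is illustrative:

```python
import numpy as np

def configuration_model(d_in, d_out, rng):
    """Stub-matching construction of an adjacency matrix (w[i, j] = 1 for j -> i).
    Self-edges are dropped and multi-edges collapse to single edges."""
    out_stubs = np.repeat(np.arange(len(d_out)), d_out)  # d_out[i] stubs with value i
    in_stubs = np.repeat(np.arange(len(d_in)), d_in)
    rng.shuffle(in_stubs)                                # random permutation of in-stubs
    m = min(len(out_stubs), len(in_stubs))               # trim unequal totals (our fix-up)
    w = np.zeros((len(d_in), len(d_in)), dtype=int)
    for src, dst in zip(out_stubs[:m], in_stubs[:m]):    # match k-th out- to k-th in-stub
        if src != dst:                                   # remove self-edges
            w[dst, src] = 1                              # duplicate edges collapse
    return w

rng = np.random.default_rng(2)
d = rng.integers(1, 20, size=50)      # one degree sequence used for both in and out
w = configuration_model(d, d, rng)
print(int(w.diagonal().sum()))        # 0: no self-connections remain
```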

#### **NETWORK STABILITY**

Cortical networks with a low firing rate need to be stable in the sense that stochastic fluctuations should not lead to large increases in the firing rate that could be detected as a stimulation, resulting in a false positive. We characterized the network stability in three ways.

First, we simulated the network and determined the mean firing rate, averaged across neurons and across time bins, as a function of the coupling strength *J* for various levels of background activity *h*0. To determine both the maximal stability and tease apart the contribution of neuronal heterogeneity and stochasticity to instability, we performed the simulations according to a number of different schemes. We considered the mean field limit, in which the network is taken to be so large that each neuron received the same number of inputs and that the resulting mean firing rate of each neuron was the same. Equation 1 reduces in that case to

$$\nu\_{t+1} = \frac{1}{1 + \exp(h\_0 - J\nu\_t)}\tag{5}$$

yielding the following equation for the fixed points

$$\nu = \frac{1}{1 + \exp(h\_0 - J\nu)}\tag{6}$$

which correspond to the roots of the function

$$f(x) = x - \frac{1}{1 + \exp(h\_0 - Jx)} \tag{7}$$

and can be obtained by iterating the fixed point equation, Equation (6), or by using Matlab's root finder fzero. The background field *h*0 determines the baseline firing rate *r*0, which is the rate obtained in the absence of coupling, *J* = 0:

$$r\_0 = \frac{1}{\Delta t} \frac{1}{1 + \exp(h\_0)}\tag{8}$$

where we have divided by the bin size Δ*t* to obtain a firing rate in Hz.
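The fixed-point iteration of Equation (6) can be sketched as follows; the value of *h*0 is illustrative (it gives a baseline probability of about 0.01 per bin, roughly 1 Hz at 10 ms bins):

```python
import math

def fixed_point_from_zero(J, h0, tol=1e-12, max_iter=10000):
    """Iterate Equation (6) starting from nu = 0 and return the rate reached."""
    nu = 0.0
    for _ in range(max_iter):
        nu_next = 1.0 / (1.0 + math.exp(h0 - J * nu))
        if abs(nu_next - nu) < tol:
            break
        nu = nu_next
    return nu

h0 = 4.6
print(fixed_point_from_zero(5.0, h0) < 0.05)    # True: a low-rate fixed point exists
print(fixed_point_from_zero(500.0, h0) > 0.9)   # True: at strong coupling the iteration
                                                # escapes to the high-rate branch
```

Scanning *J* for the value at which the iterate from zero first escapes to the high-rate branch gives the mean-field estimate of the critical coupling *Jc*.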

There is always a high firing rate solution for sufficiently high coupling strength *J*, because when all neurons are active on a given time step, they will also all be active on the next time step. There can also be a low firing rate solution, which depends on the coupling strength and the baseline firing rate. The coupling strength *Jc* at a given baseline firing rate below which the LFS exists is the upper limit of stability. The stochastic dynamics generates fluctuations, which could push the network away from the LFS, whereas a degree distribution with a large variance would cause a dispersion in the mean firing rate across neurons. These effects are characterized by performing the full simulations without stochasticity to determine the effect of firing rate dispersion,

$$\nu\_{i,t+1} = \frac{1}{1 + \exp\left(h\_0 - \frac{J}{N p\_c} \sum\_j w\_{ij} \nu\_{j,t}\right)}\tag{9}$$

and the stochastic version in Equation (1) to determine the effect of fluctuations.

Second, in the latter case, the state (LFS vs. HFS) reached is not deterministic, because a network can have a firing rate that fluctuates around the LFS or veers off to the HFS due to a somewhat larger fluctuation. We therefore performed the simulation multiple times and recorded how often (on what fraction of the trials) the network ended up in the HFS as a function of the coupling constant. In this case we defined *Jc* to be the value of the coupling constant at which 50% of the states converged to the HFS within 400 time steps. The initial condition of the network was obtained by making a random set of neurons active in such a way that on average it had the same number of active neurons as expected based on the firing rate in the mean-field limit.

Third, when fluctuations stay in the basin of attraction (BOA) of the LFS, the network will not diverge, which means that the above fraction is an indirect measure of the BOA. We also determined a more direct measure by starting networks from different initial conditions, each with a different number of active neurons, and determining which fraction of trials goes to the HFS within 400 time steps. These initial states are characterized by the effective number *N*eff of active neurons as is explained in the Results section and represented in Equation 11.

#### **NETWORK SENSITIVITY**

The sensitivity to a perturbation in experiment is tested in the model by activating a few selected neurons for a fixed duration. The stimulation was characterized by the number *np* of neurons stimulated (typically *np* = 8), the number of time bins the stimulation lasted, *T*stim (typically *T*stim = 6), and the mean out-degree of the stimulated neurons, represented by *N*eff. For a fair comparison between different networks we randomly picked the stimulated neurons from the network and repeated the stimulation for 50 different realizations of the network. In order to estimate the effect of out-degree on the detection of the stimulation, we also ordered neurons based on their out-degree, with the highest out-degree first. This ordered set was divided into ten groups of equal size. We then randomly selected the stimulated neurons from a specific group and compared how the network response depended on which group was being stimulated.

#### **ROC ANALYSIS**

The ROC is obtained by picking a threshold and determining how often a firing rate response from the unstimulated network exceeds this threshold: the fraction of false positives. In addition, it is determined how often the firing rate of the stimulated network exceeds this threshold: the fraction of true positives. The ROC curve is traced out by plotting the true positives vs. the false positives for each possible threshold. When the distributions are exactly the same, the number of true positives equals the number of false positives, hence the ROC is the diagonal with an area under the curve (AUC) of 0.5. The deviation of the ROC curve from the diagonal, or equivalently the deviation of the AUC from 0.5, is a measure for how different the distributions are and maps, for Gaussian distributions, onto *d*′, which is the difference in means of the distributions divided by the standard deviation (Kingdom and Prins, 2010). This also means that one can determine how many trials are needed to detect, given a particular ROC value, a difference between stimulated and unstimulated responses. The errors in the ROC curve and AUC value were determined by resampling of the simulated trials. Typically *Nr* = 2000 resamplings were used.
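The AUC can be computed directly from samples as the probability that a stimulated response exceeds an unstimulated one (a standard identity, equivalent to integrating the ROC curve over all thresholds). This sketch, including the function name and the synthetic Gaussian data, is ours:

```python
import numpy as np

def roc_auc(unstim, stim):
    """AUC as the probability that a stimulated-trial response exceeds an
    unstimulated one, with ties counting one half."""
    u = np.asarray(unstim)[:, None]
    s = np.asarray(stim)[None, :]
    return (s > u).mean() + 0.5 * (s == u).mean()

rng = np.random.default_rng(3)
baseline = rng.normal(1.0, 0.5, size=1000)   # unstimulated firing rates
response = rng.normal(1.5, 0.5, size=1000)   # stimulated firing rates (d' = 1)
print(round(roc_auc(baseline, baseline), 2))      # 0.5: identical distributions
print(0.7 < roc_auc(baseline, response) < 0.85)   # True: d' = 1 gives AUC ~ 0.76
```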

#### **FUZZY CLUSTERING AND PERCEPTRON ANALYSIS**

Fuzzy c-means (FCM) was used to cluster data points, such as a vector of network firing rates in consecutive time bins, or the motif distribution for a particular realization of a network, into groups with similar properties. FCM can be understood by first considering *K*-means clustering. In *K*-means clustering, a number of clusters is chosen and the objects to be clustered are assigned on a random basis to each of the potential clusters (Duda et al., 2001). The name of the algorithm derives from the convention that the number of clusters is denoted by *K*. Using these assignments, the mean of each cluster is found. Then, using these means, objects are re-assigned to each cluster based on which cluster center they are closest to. This process repeats until the cluster centers have converged onto stable values or a maximum iteration count is reached. This type of clustering minimizes the sum of the squared distances of the clustered objects from their cluster means. FCM functions in the same way, but rather than belonging to any particular cluster, each object *i* is assigned a set of normalized probabilities *uij* of belonging to cluster *j* (Bezdek, 1981). This is equivalent to minimizing a non-linear objective function of the distances of the objects from the cluster centers, characterized by the "fuzzifier" parameter, which is set to two. After the algorithm converges, each data point is assigned to the cluster to which it is most likely to belong (maximizing the *uij* with respect to the cluster index *j*). A more complete description is given in Fellous et al. (2004).

The perceptron algorithm is a method to classify responses *x* of the network (Duda et al., 2001). Here the vector *x* = (*rt*, *rt*+1) represents either a point in the firing rate return map or the binary activity of each neuron during a particular time bin. The algorithm tries to find a weight vector *w* such that the sign of *w*<sup>T</sup>*x* is positive when *x* belongs to group 1 (stimulated network) and negative when it belongs to group 2 (unstimulated).
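The classification rule can be illustrated with the classic perceptron learning rule; in this sketch the function name, learning rate, and the bias term are our additions, and labels are coded as y ∈ {−1, +1}:

```python
import numpy as np

def perceptron(X, y, n_epochs=50, lr=1.0):
    """Classic perceptron: learn w (and bias b) so that
    sign(w.x + b) matches the class label y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified: nudge w toward yi*xi
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged on separable data
            break
    return w, b
```

On linearly separable data the loop is guaranteed to terminate with all points correctly classified (the perceptron convergence theorem).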

#### **ANALYSING MOTIF COUNT DISTRIBUTIONS**

To investigate whether we could use motif statistics (restricting ourselves to 3-node motifs) of smaller parts of the complete network to distinguish between networks with different degree correlations, we generated *Nr* = 1000 realizations of each network type: correlated, anti-correlated and uncorrelated. We used smaller networks, *N* = 200, because these networks are adequate to represent sub-network statistics of size *N*sub up to 200. We used standard parameters, *pc* = 0.05, now yielding μ = 10, σ*y* = μ/3 = 3.33 and σ*x* = 0.3σ*y* = 1.0 for the smaller network. From each realization we sampled sub-networks of *N*sub from 4 to 24 in steps of 4 and from 30 to 200 in steps of 10. For each (sub)network we counted the number of 3-node motifs using the explicit formulas given in Table III of Itzkovitz et al. (2003). Each motif is labeled by a number according to the convention also found in Itzkovitz et al. (2003). The counts in an ER network vary with powers of the expected number of edges per node *k* and network size *N*, as λ*N*<sup>3</sup>(*k*/*N*)<sup>*e*</sup>, where λ is a factor representing the symmetry of the pattern [see Table III in Itzkovitz et al. (2003)] and *e* is the number of edges in the pattern, which defines the complexity of the motif.
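For small sub-networks the census can also be done by brute force. The sketch below enumerates all connected triples and pools isomorphic patterns under a canonical code; it is our stand-in for the closed-form counts of Itzkovitz et al. (2003), and the integer codes it produces are *not* their motif labels:

```python
import itertools
import numpy as np

def triad_census(A):
    """Count connected 3-node subgraph types in a directed adjacency
    matrix A (A[i, j] = 1 for an edge i -> j; no self-loops).

    Each weakly connected triple is reduced to a canonical 6-bit code
    (the maximum over the six node orderings), so isomorphic patterns
    pool together.  Brute force, O(N^3): fine for N_sub up to a few tens.
    """
    n = A.shape[0]
    counts = {}
    for i, j, k in itertools.combinations(range(n), 3):
        # keep only weakly connected triples: at least two of the three
        # node pairs must be linked in at least one direction
        linked = (A[i, j] | A[j, i]) + (A[i, k] | A[k, i]) + (A[j, k] | A[k, j])
        if linked < 2:
            continue
        code = max(
            A[a, b] << 5 | A[b, a] << 4 | A[a, c] << 3
            | A[c, a] << 2 | A[b, c] << 1 | A[c, b]
            for a, b, c in itertools.permutations((i, j, k))
        )
        counts[code] = counts.get(code, 0) + 1
    return counts
```

A 3-cycle and a feed-forward triangle each yield a single counted triple, but under different canonical codes, since the two patterns are not isomorphic.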

As a first step in the analysis we determined the mean and standard deviation of the motif count across the *Nr* realizations. To reduce the size of statistical fluctuations we also pooled motif counts by averaging them across *Nav* realizations. We either split the original *Nr* realizations into *Nr*/*Nav* groups, yielding a reduced number of data points, or we randomly sampled with replacement *Nr* × *Nav* samples from the original *Nr* samples to keep the same number of pooled motif counts. The count distribution was often not Gaussian, which meant we could not use the *t*-test, based on the difference in mean count divided by the standard deviation. Hence, we used an ROC analysis. To obtain error estimates we created *Nb* = 20 different sets of *Nr* = 500 realizations, each of which was obtained by randomly sampling with replacement from the *Nr* = 1000 original realizations.
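The two pooling variants can be sketched with a small helper (a hypothetical function of ours, not the paper's code):

```python
import numpy as np

def pooled_counts(counts, n_av, rng):
    """Pool motif counts by averaging over groups of n_av realizations.

    Returns both pooling variants described in the text: disjoint groups
    (Nr/n_av data points) and groups resampled with replacement (keeping
    the original number Nr of pooled counts).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    disjoint = counts[: n - n % n_av].reshape(-1, n_av).mean(axis=1)
    resampled = rng.choice(counts, size=(n, n_av), replace=True).mean(axis=1)
    return disjoint, resampled
```

For independent counts, averaging over *Nav* realizations shrinks the standard deviation by a factor of 1/√*Nav*, which is exactly the fluctuation reduction the pooling is meant to provide.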

We also wanted to determine whether incorporating counts of pairs of motifs would improve the ability to distinguish between networks with different degree correlations. We considered each realization, drawn from one or the other group of networks, as a two-dimensional data point (the counts of the two motifs) and used FCM to find two clusters. FCM outputs the confidence (or probability) that a data point belongs to cluster 1. This value can be used as part of an ROC procedure. For a given threshold, the true positive rate corresponds to the fraction of data points belonging to group 1 for which the confidence exceeds the threshold, whereas the false positive rate corresponds to the fraction exceeding the threshold that belongs to the second group. We applied this procedure for each possible pair of motifs and for each sub-network size.

## **RESULTS**

#### **ANTI-CORRELATED NETWORKS ARE MOST STABLE IN THE ZERO-NOISE CASE**

The mean-field limit, corresponding to an infinite network, is studied by considering the dynamics of a network in which each neuron has the same firing rate, each neuron has the same number of synaptic inputs (i.e., in-degree), and there is no stochasticity. In this case the dynamical equations reduce to a self-consistent equation for the average firing rate *v* (Equation 6 in Methods), which is solved by fixed-point iteration. There are typically two stable solutions, one corresponding to the HFS, in which the neuron is constantly firing (firing probability *v* = 1 or close to one), and one corresponding to the LFS at much lower rates, together with one unstable solution in between (**Figure 2A**). For high enough coupling constants only the HFS solution remains. We studied this by starting from an initial value of *vt* near zero and then iterating Equation 5 until convergence: if there is an LFS the iteration converges to it, and if there is no LFS it converges to the HFS, resulting in a sudden jump in firing rate as a function of *J* (**Figure 2B**). The coupling strength at which this jump occurs is denoted by *Jc* and depends on the baseline firing rate *r*<sup>0</sup> (defined in Equation 8, **Figures 2B,C**): the higher *r*<sup>0</sup>, the less stable the network. The firing rate of the LFS for *J* values just before it becomes unstable, referred to as *rc*, is the maximum firing rate that the network can sustain; it varies approximately linearly with the baseline firing rate (**Figure 2D**).
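The fixed-point iteration can be illustrated with a generic sigmoidal self-consistency equation. Note that the logistic transfer function and the mapping from the baseline rate to a constant drive below are illustrative assumptions of ours, not the paper's Equation 6; the baseline of 0.01 per bin corresponds to 1 Hz at 10 ms bins:

```python
import numpy as np

def fixed_point_rate(J, r0=0.01, n_iter=500):
    """Iterate v <- f(J*v + h0) starting near zero, so the iteration
    lands on the LFS when one exists and on the HFS otherwise.

    f is an assumed logistic transfer function; h0 is chosen so that
    v = r0 for an uncoupled network (J = 0).
    """
    h0 = np.log(r0 / (1 - r0))          # logit of the baseline probability
    v = 0.0
    for _ in range(n_iter):
        v = 1.0 / (1.0 + np.exp(-(J * v + h0)))
    return v

# sweeping J upward reveals the sudden jump from the LFS to the HFS at Jc
```

For weak coupling the iteration settles just above the baseline rate; once the coupling exceeds the critical value, the low-rate solution disappears and the iteration runs up to the saturated state.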

The effect of network size is studied by iterating Equation 9 for a vector of firing rate values, which ignores the effects of noise present in the full equations (Equation 1). In these finite-size systems the LFS is less stable, as reflected in *Jc* values that lie well below the mean-field limit (**Figure 2E**). There is also a difference between networks depending on their degree correlations, with the anti-correlated network being more stable than the ER, correlated, and uncorrelated networks. These differences become more pronounced for larger networks (**Figure 2E**). The difference also depends on the baseline firing rate, with the anti-correlated network again being the most stable (**Figure 2F**).

**FIGURE 2 | Anti-correlations in the degree distribution improve the stability of the low firing rate state (LFS).** We compared the stability of finite-size networks with different degree correlation structure by iterating Equation 9 (which is Equation 1 without taking into account stochastic spiking). **(A)** The mean-field solution, corresponding to an infinite-size network, is simulated by assuming that the firing rate of each neuron is equal, yielding Equation 6, of which all roots are shown in the graph. **(B)** Mean firing rate r vs. coupling constant J in the mean-field limit for different values for the baseline firing rate *r*0. When the LFS loses stability, the only remaining solution is the HFS. As a result the plotted firing rate suddenly jumps to the maximum possible rate of 100 Hz (corresponding to 1 spike per bin). **(C)** The range of stable coupling constants, which are between 0 and *Jc* , decreases with increasing baseline firing rate. **(D)** The firing rate *rc* of the LFS just before it turns unstable increases linearly with *r*0. **(E)** The stability of the LFS depends on system size and approaches the mean-field limit (cyan) gradually as network size *N* increases (baseline rate *r*<sup>0</sup> = 1 Hz). The anti-correlated network (blue) is always more stable than the ER (purple), correlated (red), and uncorrelated (green) networks. **(F)** The difference between the mean field *Jc* and that of the finite-size networks decreases with baseline firing rate (network size *N* = 2000).

#### **ANTI-CORRELATED NETWORKS ARE MORE STABLE AGAINST FLUCTUATIONS**

The dynamics of binary networks is stochastic because on each time step the expected firing rate is translated into a binary value. Hence the firing rate, whether averaged across network neurons during one time bin or for one neuron averaged over a few time bins, will fluctuate. These fluctuations alter the stability because they can drive the network out of the BOA of the LFS toward that of the HFS. The firing rate in the LFS vs. coupling constant curve for the stochastic

**FIGURE 3 | The anti-correlated network is more stable against fluctuations. (A)** The firing rate vs. coupling strength for the mean-field solution (cyan) and networks with uncorrelated (green), correlated (red) or anti-correlated (blue) degree distributions (*r*<sup>0</sup> = 1 Hz, *N* = 2000). The anti-correlated degree distribution leads to the most stable network. The dashed box approximately indicates the interval of coupling strengths highlighted in panels **(B)** and **(C)**. **(B)** Despite the existence of a stable LFS for a particular coupling strength, fluctuations in network activity may perturb the network away from it, so that the network ends up in the co-existing stable HFS. The fraction of runs that end up in the HFS is close to zero far below *Jc* and increases to unity above *Jc*. The LFS is more stable for the anti-correlated (blue) network than for the uncorrelated network (green), which in turn is more stable than the correlated network (red). The dashed lines are fits to the sigmoidal function in Equation 10. **(C)** The stability depends on the strength of the correlation. When the width (dispersion) corresponding to the short axis of the bivariate Gaussian degree distribution is increased, which means a lower correlation, the stability is reduced. Data are for an anti-correlated network. **(D)** A neuron's firing rate is correlated with its in-degree; the correlation is reduced from 0.997 (0.002) for Equation 9 (green dots) to 0.519 (0.014) for Equation 1 (blue dots) due to jitter in this relation. Data for an anti-correlated network, *J* = 30.96. **(E,F)** The degree of stability can be quantified by *J*gap, the distance of the *Jc* for the finite-size network from that of the mean-field network; shorter distances mean more stable networks. *J*gap decreases with **(E)** the baseline firing rate *r*<sup>0</sup> and with **(F)** network size.
In both panels the anti-correlated network (blue line) corresponds to the lowest curve indicating higher stability compared to ER (purple), uncorrelated (green) and correlated (red), an advantage that increases with network size. The network had *N* = 2000 neurons, for each coupling strength *Nt* = 100 simulations were performed, with a length of 500 time steps, of which the first 100 were discarded as a transient.

network (**Figure 3A**) looks similar to that for the zero-noise case (not shown), but the fraction of trials on which the HFS is reached displays a sigmoidal behavior (**Figure 3B**): some networks switch to the HFS close to, but below, the critical coupling constant *Jc*, whereas most networks go to the HFS for coupling constants above *Jc*. In between there is a transition point at which equal numbers of networks go to the LFS and the HFS. The anti-correlated network is more stable, because this transition point lies to the right of the transition point for the other networks (**Figure 3B**). We fitted the probability to the following expression,

$$p(J) = \frac{1}{1 + \exp\left(-(J - J_h)/\sigma_J\right)}\tag{10}$$

where *Jh* is the transition point and σ*J* represents the sharpness of the transition. The transition for correlated and anti-correlated networks is sharper than for uncorrelated networks, with σ*J* = 0.424 and 0.420, respectively, compared to 0.391; the *R*<sup>2</sup> values (fraction of explained variance) were all approximately 0.999.
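Fitting Equation 10 to observed HFS fractions is a standard non-linear least-squares problem. In the sketch below the data are synthetic; in practice the fractions would come from the *Nt* = 100 simulations per coupling strength described in the figure legend:

```python
import numpy as np
from scipy.optimize import curve_fit

def p_hfs(J, Jh, sigma_J):
    """Equation 10: probability of ending in the HFS vs. coupling J."""
    return 1.0 / (1.0 + np.exp(-(J - Jh) / sigma_J))

# synthetic HFS fractions around an assumed transition at Jh = 25
J = np.linspace(20, 30, 21)
p_obs = p_hfs(J, 25.0, 0.42) + 0.01 * np.random.default_rng(1).normal(size=J.size)
(Jh_fit, sigma_fit), _ = curve_fit(p_hfs, J, p_obs, p0=[np.median(J), 1.0])
```

With a reasonable initial guess for *Jh* (e.g., the median sampled coupling), the fit recovers both the transition point and the sharpness parameter.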

The in- and out-degrees are drawn from a bivariate Gaussian, which has a long axis, in the direction of the correlation, and a short axis perpendicular to that direction (Equation 4, Methods). Increasing the standard deviation along the short axis, termed dispersion, reduces the degree of correlation. In addition, it makes the anti-correlated network less stable (**Figure 3C**).

The stability properties of the finite-size networks differ from those in the mean-field limit (**Figure 2**) because the firing rate of a neuron depends on its number of inputs (in-degree), which varies across neurons in the network (**Figure 3D**, green dots). The correlation between a neuron's firing rate and its in-degree is almost perfect for the non-stochastic network, with a squared Pearson correlation of *R*<sup>2</sup> = 0.997 (0.002), but becomes jittered by the stochastic spiking, resulting in a squared Pearson correlation of 0.519 (0.014) (**Figure 3D**, blue dots).

The mean-field limit represents the highest level of stability, because both finite-size and noise effects reduce it. The reduction in stability can be captured by *J*gap, the mean-field critical coupling minus the critical coupling value for the noisy, finite-size network. The smaller *J*gap, the more stable the system. The gap decreases with both baseline firing rate (**Figure 3E**) and network size (**Figure 3F**). As the network size increases, the comparative stability advantage of anti-correlated networks increases.

The stability against fluctuations can also be analyzed differently. Non-linear dynamical systems are characterized in terms of the basin of attraction (BOA). Consider a simple one-dimensional system with two stable fixed points (and an unstable one in between) (Strogatz, 1994). Depending on the initial condition of the single state variable, the system will converge to one or the other fixed point. The catchment area of the first fixed point, i.e., the range of initial conditions that converge toward it, is its BOA. There is a well-defined boundary between the two BOAs. Our goal is to characterize this boundary between the LFS and HFS for the binary networks studied here, which is complicated by the high dimensionality of the state space and by the stochasticity, which means that a given initial condition near the boundary could converge to the LFS or the HFS depending on the roll of the dice. The first issue means we have to find a more effective and compact description of the initial state. Our initial choice was to use the number *Na* of active neurons in the initial condition. However, when the *Na* highest out-degree neurons are active, the network is more likely to converge to the HFS than when the *Na* lowest out-degree neurons are active, even though the initial states have an equal number of active neurons. Hence, we used the so-called effective number of active neurons, in which each neuron's contribution is weighted by its out-degree:

$$N_{\text{eff}} = N \frac{\sum_{i \in \text{active}} d_i^{\text{out}}}{\sum_i d_i^{\text{out}}} \tag{11}$$
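Equation 11 translates directly into code (the function name is ours):

```python
import numpy as np

def n_effective(out_degree, active):
    """Equation 11: number of active neurons weighted by out-degree.

    out_degree: length-N array of out-degrees d_i^out;
    active: indices (or boolean mask) of the initially active neurons.
    """
    out_degree = np.asarray(out_degree, dtype=float)
    return out_degree.size * out_degree[active].sum() / out_degree.sum()
```

When all out-degrees are equal, N_eff reduces to the plain active count *Na*; activating the highest out-degree neurons inflates N_eff above *Na*, and activating the lowest deflates it, exactly the asymmetry the weighting is meant to capture.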

We started the simulations from a random initial state, characterized by a specific number of active neurons (range: 0 to 200), and repeated this procedure enough times (*Nr* = 4000) to ensure sufficient coverage of the relevant *N*eff values. For each *N*eff value so sampled, a fraction of the runs converged to the LFS and the remainder went to the HFS (**Figure 4A**). For small *N*eff most states converge to the LFS and for *N*eff larger than a transition value *N*eff,90 most converge to the HFS (**Figure 4B**). We chose as transition value the lowest *N*eff value for which 90% or more of the states went to the HFS. The transition value *N*eff,90 decreases with coupling strength *J* (**Figures 4C,D**) until its value comes close to the number of active neurons represented by the average firing rate of the mean-field network, at which point stability is lost. This is because the BOA of the LFS shrinks

**FIGURE 4 | The basin of attraction of the LFS is larger for anti-correlated networks indicating enhanced stability against fluctuations. (A)** Simulations were started from initial states with a different number *Na* of active neurons, which is translated into an *N*eff value (see text) to allow for a fair comparison of initial conditions. We show the firing rate as a function of time (in units of iterations). For low *Na* the anti-correlated network converged to the LFS, whereas for high *Na* runs it converged to the HFS. **(B)** This was reflected in the histogram where green filled bars indicate the number of states with a particular *N*eff that converged to the LFS and the open bars indicate the number of states that converged to the HFS. Data for anti-correlated network with *J* = 25. **(C)** *N*eff,90 as a function of coupling constant *J* for uncorrelated (green), correlated (red) and anti-correlated (blue) networks together with the number *Na*,*av* of active neurons corresponding to the firing rate of the mean-field solution (cyan dashed line) as a reference. **(D)** Close-up of panel **(C)**. The data were obtained from a network of *N* = 2000 neurons, with a baseline rate of 1 Hz. For each coupling strength and each network type we used *Nr* = 1000 initial conditions and averaged across 4 realizations of the network.

Vasquez et al. Stability & sensitivity in cortical networks

to zero and most initial conditions go to the HFS. The anti-correlated network is more stable because it can sustain initial states with a higher number of active neurons and still return to the LFS, as compared to the other networks. Furthermore, for the anti-correlated networks the BOA remains finite for larger values of the coupling constant than for the other networks. Overall, when a sufficient number of neurons are active in the initial condition, both the effective and the unnormalized number of active neurons yield similar results for the size of the BOA (not shown).

#### **THE SENSITIVITY OF THE NETWORK CAN BE CHARACTERIZED USING ROC ANALYSIS**

During spontaneous (unstimulated) activity in the network, the firing rate will fluctuate from time bin to time bin; these fluctuations can be considered random draws from a distribution. When the network is stimulated, the average firing rate is altered, trivially because of the activated neurons, but non-trivially through the downstream effect of this stimulation on the other neurons. The stimulation is characterized by the number of cells *np* stimulated (and their out-degree, see below) and the duration of the stimulation *T*stim. We used *np* = 8 and *T*stim = 6. Its effect on the network can be detected when there is a systematic difference between the network states, quantified, for instance, in terms of the mean firing rate of the overall activity. An ROC analysis quantifies how different the distribution of firing rates is between the stimulated and unstimulated networks and how easily this difference can be detected; it can thus be compared to measured behavioral responses. In all of the following analyses we exclude the stimulated cells themselves, one reason being that the decision process would be based on downstream neurons, hence we should detect the difference in the downstream population.

The histogram of the stimulated firing rates was shifted relative to that of the unstimulated network (**Figure 5A**). **Figure 5B** shows the ROC curve corresponding to the empirical distributions in panel A. The corresponding AUC as a function of time is shown in panel C. Before the stimulation at *t* = 10, the statistics of both networks are the same, yielding an AUC close to 0.5, whereas after stimulation the AUC rises to about 0.75. The ability to detect a stimulation increases with the strength of the coupling constant (**Figure 5D**). This can be understood simply: a higher *J* increases the impact of presynaptic activity on a neuron's firing rate, hence it also increases the effect of stimulation. There is no difference in sensitivity due to the correlation structure of the network as long as neurons with similar out-degrees are stimulated, because the sensitivity only depends on the out-degree. The AUC also increases with the baseline firing rate of the network (**Figure 5E**), which indicates that network state changes, such as those occurring during arousal or with attention, in which the overall firing rate increases, could improve task performance. For this behavior there was also no difference between networks with the different types of degree correlations. The derivative of the mean firing rate *r* with respect to *J* increases with baseline firing rate *r*0, suggesting that the effect of a stimulation on the network firing rate increases with *r*0, which is indeed borne out by the simulation results in **Figure 5E**.

**FIGURE 5 | Network sensitivity, when evaluated using an ROC analysis, depends only on the mean out-degree of the stimulated neurons and not on the degree correlations. (A)** Distribution of firing rate across cells in a 10 ms time bin for spontaneous activity (red) and for the stimulated network (blue), in which 8 random cells were stimulated. Note that the stimulated cells were not included in this ROC analysis and we used the binary responses *xi*(*t*) to determine the firing rate. **(B)** The corresponding ROC curve (blue) quantifies the difference between the distributions, relative to the diagonal (gray), which represents distributions that cannot be distinguished. **(C)** The area under the curve (AUC) for the ROC curves calculated for different time bins. The AUC before stimulation was close to 0.5 because the distributions were the same apart from fluctuations due to sampling. After the stimulation, which started at *t* = 10 and ended at *t* = 15, the AUC rose to around 0.75. **(D)** The AUC increases with increasing coupling constant and **(E)** with increasing baseline firing rate. **(F)** The AUC depended on the mean out-degree of the stimulated neurons. Neurons were divided into ten groups according to their out-degree, with the first group having the highest out-degree. The group index is indicated on the x-axis. The results in **(D–F)** were not significantly different for correlated (red), anti-correlated (blue) or uncorrelated (green) networks, *t*-test, *p* = 0.4479, 0.6279, 0.7421, respectively. The network comprised *N* = 2000 neurons, of which *np* = 8 neurons were stimulated for the duration of *T*stim = 6 time units starting on the 10th bin. In panels **(A–C)** results for an uncorrelated network are shown. Parameters: **(A–C,F)** *J* = 18, *r*<sup>0</sup> = 1; for **(D)** *r*<sup>0</sup> = 1 and **(E)** *J* = 20.

Stimulus detection depended on which cells were stimulated, with their average out-degree being the most important factor. We chose *np* neurons to be stimulated randomly from 10 different groups with different mean out-degree, which were generated as follows. First all neurons were ordered according to their out-degree, with the highest out-degree neurons coming first, and then divided into ten equally-sized groups, labeled

1 to 10. Multiple stimulation trials were run with *np* neurons picked from one of the groups, from which the group AUC was determined. The AUC for each group was then plotted as a function of the group label (**Figure 5F**). The AUC values for the first group were much higher than for the subsequent groups, demonstrating clearly that the group with the highest mean out-degree also had the highest AUC.

Taken together, these simulations show that the correlations in the degree distribution do not directly affect network sensitivity to stimulation. Rather, this sensitivity is determined by the out-degree of the stimulated neurons. Networks can display a higher sensitivity if they have a larger variability in the out-degree distribution and the cells with the highest out-degree are being stimulated. ER networks have a low variance in the out-degree and will therefore have a reduced sensitivity compared to the networks studied here; compare the AUC of the first group to that of the fifth group, which represents neurons with an out-degree closest to the mean.

The fluctuations in firing rate during spontaneous activity are expected to have different temporal correlations from those in the stimulated network, as an increase due to an external stimulation will persist across the time bins during which the stimulation takes place. Hence, the detection rate could improve by taking into account (spatio)temporal correlations. The first step is to consider the correlation in network firing rate *r* between two consecutive time bins. When *rt*+1 is plotted vs. *rt*, a return map is obtained. However, because the firing rate values are restricted to *x*/(*Nt*), where *x* is an integer between 0 and *N*, and *N* is the network size, the return map would have a non-informative appearance. Hence, we made a density representation by replacing each sample with a two-dimensional Gaussian (kernel density estimate) whose standard deviation (bandwidth) was estimated from the data following Silverman's rule of thumb (Silverman, 1986). The hot spot in the return map density obtained for stimulated networks (**Figure 6B**, plus sign) is shifted along the diagonal in the positive direction (i.e., higher rates) in comparison to the return map for spontaneous activity (**Figure 6A**).
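The density representation can be sketched with a standard Gaussian KDE using the Silverman bandwidth rule; the function name and grid layout below are our illustrative choices:

```python
import numpy as np
from scipy.stats import gaussian_kde

def return_map_density(rates, grid_size=100):
    """Density representation of the firing-rate return map (r_t, r_{t+1}).

    Each pair of consecutive rates becomes one sample; gaussian_kde with
    bw_method='silverman' applies Silverman's rule of thumb for the
    bandwidth, as in the text.
    """
    pts = np.vstack([rates[:-1], rates[1:]])          # (2, T-1) sample pairs
    kde = gaussian_kde(pts, bw_method='silverman')
    g = np.linspace(rates.min(), rates.max(), grid_size)
    xx, yy = np.meshgrid(g, g)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(grid_size, grid_size)
    return g, density
```

The returned grid and density array can be rendered directly as the color-coded maps of Figures 6A,B.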

We determined whether such a two-dimensional representation would improve the detection rate. An equal number of samples from spontaneous activity and from stimulated activity were provided to a fuzzy clustering (FCM) routine in Matlab in order to find two clusters (Fellous et al., 2004). The FCM returns for each data point *i* the probability *uij* that it belongs to cluster *j*. As the probabilities must sum to unity, for two clusters we only need to consider *ui*1. We thus obtain a distribution of *ui*1 values for data points from the spontaneous activity and a distribution for data points from the stimulated network. The difference between these distributions is a measure of how well stimulation can be detected and can thus be subjected to an ROC analysis, in which the *ui*1 values are treated in exactly the same way as the firing rates used to obtain the results in **Figure 5**. The resulting AUC values were 5% higher than those based on the distribution of firing rates in one bin (*t*-test, *p* = 0).

In the firing-rate based detection procedure, each neuron (except the directly stimulated ones) carries equal weight. The

**FIGURE 6 | Detection can be improved by including past activity and weighting neurons depending on how many inputs they receive from directly stimulated neurons. (A,B)** Density representation of the firing rate return map, wherein the probability of obtaining consecutive rate values (r*t*, r*t*<sup>+</sup>1) is represented by a color scale, with red indicating the highest probability and blue indicating a near zero probability. The results are shown for **(A)** spontaneous activity and **(B)** stimulated activity. The plusses indicate the location of the peak in panel **(B)**. **(C)** Analysis of factors that contribute to a neuron's weight in the detection decision that is output by the perceptron procedure. There was a significant but small correlation between weight and (top left) in-degree or (top right) out-degree. There was a correlation between the weight and the number of direct inputs from stimulated neurons (bottom left), but only a weak correlation with the number of inputs from cells that received direct input from the stimulated cells (bottom right). The network comprised *N* = 2000 neurons, coupling constant *J* = 18, baseline firing rate *r*<sup>0</sup> = 1 Hz. In the stimulated network, *np* = 8 neurons were stimulated for a duration of *T*stim = 6 time bins.

cells that are not directly connected to the stimulated neurons would display firing rate fluctuations that are unrelated to the stimulation, and hence act as noise that reduces the probability of detection. The signal-to-noise ratio of the firing rate fluctuations could be improved by weighting those neurons less. To explore this hypothesis we applied a perceptron procedure (see Methods) to learn the optimal weights for classifying the network state vectors (Duda et al., 2001). An equal number of network states for spontaneous activity and for stimulated networks were supplied to the perceptron routine together with the corresponding class labels. The output was a weight for each neuron. As before, the activity of the directly stimulated neurons was not included in this analysis. To determine which features contributed to the weight, we plotted the weight vs. the feature value in a scatter plot and calculated the corresponding Pearson correlation. There was a small but significant correlation between the weight and the in-degree (**Figure 6C**, top left, correlation 0.073 ± 0.03, *p* = 0.0011) and the out-degree (**Figure 6C**, top right, correlation −0.052 ± 0.027, *p* = 0.018). There was a strong correlation between the weight and the number of direct inputs the neuron received from stimulated neurons (**Figure 6C**, bottom left, correlation 0.410 ± 0.15, *p* = 0.0). The number of indirect inputs from stimulated neurons, calculated as the number of inputs from cells that themselves received direct inputs from the stimulated cells, was less relevant (**Figure 6C**, bottom right, correlation 0.072 ± 0.032, *p* = 0.012).

Taken together, these analyses show that our estimates for the detection of stimulation based on overall firing rate are underestimates; detection can be improved by taking into account network history and by selecting which neurons to listen to. The latter may be achieved through synaptic plasticity and the appropriate learning rules.

#### **DETECTING ANTI-CORRELATION IN THE DEGREE DISTRIBUTION WITH LIMITED DATA**

The results here establish that anti-correlation between in- and out-degree results in more stable, but equally sensitive, networks compared to networks with no correlation, or a positive correlation, between in- and out-degree. Hence, learning to detect a stimulation could proceed by altering the correlation between in- and out-degree. To demonstrate such a learning effect, the in- and out-degrees of a number of neurons need to be sampled. Classical tracing techniques are not appropriate because they involve the connections to or from multiple nearby neurons (Lanciego and Wouterlood, 2011). For instance, when the retrograde tracer horseradish peroxidase is injected, it is absorbed by multiple axon terminals and transported to their respective cell bodies. These axon terminals do not necessarily synapse on one and the same neuron near the injection site. Hence, the data cannot be used to determine the in-degree of a neuron near the injection site.

New viral-based techniques could help, because they work by infecting a few cells in the neighborhood where the virus is injected (Wickersham et al., 2007; Osakada et al., 2011). The virus will then retrogradely label the cells presynaptic to these cells by crossing one synapse and one synapse only. In the presynaptic cells the infection stops because the virus misses the proteins necessary to cross another synapse. The challenge with this method is to infect only one cell, with both an anterogradely and retrogradely crossing virus.

Currently, the gold standard is to simultaneously record multiple cells *in vitro* and assess connections by inducing action potentials in one neuron at a time and recording the post-synaptic responses in the other cells. The current record is 12 cells recorded simultaneously (Song et al., 2005; Perin et al., 2011). This means that the anti-correlation in the degree distribution will have to be assessed indirectly, by sampling from sub-networks.

Motifs are patterns in the connectivity that occur more often than expected if the connections were made randomly (Milo et al., 2002). For instance, consider a network for which the average probability of a connection is *p*. For two neurons, if connections are made randomly, the probability of having no connection is (1 − *p*)<sup>2</sup>, of having one connection 2*p*(1 − *p*), and of having a bidirectional connection *p*<sup>2</sup>. When bidirectional connections are found to occur significantly more often than the expected *p*<sup>2</sup>, there is additional, non-random structure in the network (Song et al., 2005). Motifs most often refer to triplets of neurons and the patterns of connectivity between them that occur more often than expected in a random network (Milo et al., 2002). A motif distribution is the number of times each motif occurs in a network, and a motif is considered present when it occurs more often than in a control network. Motif distributions are affected by many network properties, such as the degree distribution. The networks studied here, even when uncorrelated, have a different degree distribution than the ER network, which means that ER random networks are not a good control. Hence, we have to generate the control distributions numerically rather than having access to an analytical expression for the expected rate of each motif. In addition, in experimental settings we do not have access to the whole network from which to determine the motif distribution; we have to make do with sub-networks. These sub-networks do not come from the same network; rather, they come from networks sampled from an ensemble of networks with similar properties. To obtain estimates for how to observe evidence for anti-correlation in the degree distribution we need to deal with each of these issues.
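The dyad probabilities above can be checked numerically against a sampled ER graph; a minimal sketch (the function name and the test network are ours):

```python
import numpy as np

def dyad_fractions(A):
    """Observed fractions of unconnected, one-way, and bidirectional
    pairs in a directed adjacency matrix A (no self-loops)."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)             # each pair counted once
    fwd, bwd = A[iu], A.T[iu]                # i -> j and j -> i edges
    n_pairs = fwd.size
    both = np.sum((fwd == 1) & (bwd == 1))
    one = np.sum(fwd != bwd)
    none = n_pairs - both - one
    return none / n_pairs, one / n_pairs, both / n_pairs
```

In an ER graph with connection probability *p* the three fractions approach (1 − *p*)², 2*p*(1 − *p*), and *p*² respectively; a significant excess of the bidirectional fraction over *p*² is the kind of non-random structure reported by Song et al. (2005).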

The overall goal is to distinguish between pairs of networks with anti-correlated, uncorrelated and correlated degree distributions that share the same marginal distributions for the in- and out-degree.

We considered 13 different motifs consisting of three connected neurons and gave each motif a numerical label as shown in **Figure 7A**. We determined the number of motifs in each realization of a network with correlated, anti-correlated or uncorrelated degree distribution and took the average. This was done for the full network (here reduced to *N* = 200) as well as for sub-networks (size *N*sub). The complexity of a motif corresponds to the number of edges in the pattern, ranging from 2 to 6, which determines how often it is counted in a network. We normalized the counts such that they took values on the order of unity, to better compare them across motifs. The mean count as a function of *N*sub converged to a constant for network sizes between 50 and 100 neurons (**Figure 7B**), with more complex motifs requiring larger *N*sub. The width of the count distribution, quantified as the standard deviation, decreased with *N*sub as the −3/2 power (**Figure 7C**). Hence, for large enough networks the differences in mean counts across network types can be detected with certainty. This power-law behavior is consistent with the results for a Binomial process with probability *p* and on the order of *n* ∼ *N*sub<sup>3</sup> trials, for which the mean is *np* and the variance is *np*(1 − *p*). In that case the normalized mean is *p* and its variance *p*(1 − *p*)/*n* (see also Equations 2, 3), leading to a standard deviation varying as *n*<sup>−1/2</sup> = *N*sub<sup>−3/2</sup>.
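The Binomial scaling argument can be verified directly. The sketch below (illustrative stand-in values, not the paper's motif counts) treats the normalized count as a Binomial fraction over *n* ∼ *N*sub³ trials and shows that its standard deviation falls off as *N*sub<sup>−3/2</sup>:

```python
import numpy as np

# Sketch of the scaling argument above: treat the normalized motif count
# as a Binomial fraction over n ~ N_sub^3 trials, so its standard
# deviation should fall off as n^(-1/2) = N_sub^(-3/2).
rng = np.random.default_rng(1)
p = 0.05                      # assumed per-slot motif probability
stds = {}
for N_sub in (20, 40, 80):
    n = N_sub**3              # number of potential motif placements
    samples = rng.binomial(n, p, size=2000) / n    # normalized counts
    stds[N_sub] = samples.std()
    print(N_sub, stds[N_sub], np.sqrt(p * (1 - p) / n))
```

Doubling *N*sub twice (20 to 80) should shrink the standard deviation by a factor of 4<sup>3/2</sup> = 8, matching the −3/2 power law in **Figure 7C**.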

We are looking for motifs whose counts differ between the analyzed network types. In **Figure 7D** we show the count as a function of motif for the three network types, with the largest difference occurring for the complex motifs 110 and 238. However, the counts for these motifs, which have the largest number of edges, are characterized by a large standard deviation. When we order the motifs by the ratio of count difference to standard deviation, motif 98 emerges as the winner instead (**Figure 7E**). **Figure 7D** shows that there are fewer instances of motif 98 in

**FIGURE 7 | Motif 98 is the most sensitive to degree correlations. (A)** There are 13 motifs that involve 3 connected nodes. Below the graphical representation we plot the numbering used here, which follows Itzkovitz et al. (2003). The expected number of motifs depends on network size, hence we normalize the count by *N*<sup>3</sup>(*k*/*N*)<sup>*e*</sup>, with *N* the number of nodes, *k* the expected number of edges per node and *e* the number of edges in the motif. In addition, we include a numerical factor representing the equivalent permutations [listed in Table 3 in Itzkovitz et al. (2003)]. **(B)** The normalized counts, averaged across a thousand realizations, converge to constant values for sub-networks larger than 50–100 nodes, with the precise value depending on the complexity of the motif involved. **(C)** The standard deviation of the normalized counts falls off as *N*<sup>−3/2</sup>. We illustrate the results for the anti-correlated network, which are typical for the correlated and uncorrelated networks also. In addition, we omitted motif 238 because it occurs at such a low probability that it makes the statistics noisy. **(D)** The normalized counts for each motif for the (red) correlated, (blue) anti-correlated and (green) uncorrelated networks. We used the counts for the full network, rather than sub-networks. Network size in panels **(D)** and **(E)** was *N* = 200. We used a bivariate Gaussian degree distribution with a mean degree of 10, a standard deviation along the long axis of σ*y* = 3.33 and along the short axis of σ*x* = 1.0. **(E)** The maximum difference in mean count between all three possible comparisons (black bars), relative to the mean standard deviation of these counts across the three network types. The motifs are ordered by the ratio of count difference to standard deviation, starting with the largest. According to this analysis motif 98 should be used to best distinguish between different network correlation structures.

anti-correlated networks compared to correlated networks. This can be understood intuitively by noting that in "ring" motif 98 each neuron has the same number of inputs as outputs, namely 1, which is more representative of correlated networks (**Figure 1D**, inset) than of anti-correlated networks (**Figure 1C**, inset). Furthermore, this means there is a lower probability of closing the ring, because in an anti-correlated network a neuron with many inputs has fewer outputs with which to reach the next neuron in the ring.

The count distributions are not Gaussian for small sub-networks. **Figure 8A** shows the count distribution for motif 98 for networks with *N* = 200. Each network gives rise to a symmetric-appearing distribution, with the peak at a different location depending on the network type. The distributions for the anti-correlated and correlated networks were farthest apart, with that of the uncorrelated network situated in the middle. For *N*sub = 30 (**Figure 8B**), the corresponding distributions fell on top of each other and were asymmetric because the counts are always positive. To compare the distributions we therefore performed an ROC analysis. As expected based on the reduced overlap between distributions, the AUC increases with sub-network size, and motif 98 comes out on top with the highest AUC (**Figure 8C**). Furthermore, given the lower overlap between the anti-correlated and correlated distributions (**Figure 8A**), the AUC values for the comparison between anti-correlated and correlated networks are higher (**Figure 8C**) than for either the comparison between anti-correlated and uncorrelated (**Figure 8D**) or correlated and uncorrelated (not shown).
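The ROC analysis used here makes no Gaussian assumption. A minimal sketch (with Poisson counts as hypothetical stand-ins for the motif-98 distributions, not the paper's data) computes the AUC as the Mann-Whitney statistic, i.e., the probability that a draw from one distribution exceeds a draw from the other:

```python
import numpy as np

# Sketch of the ROC analysis: the AUC equals the probability that a
# random count from one network type exceeds one from the other (the
# Mann-Whitney statistic), so no Gaussian assumption is required. The
# Poisson counts below are stand-ins for motif-98 count distributions.
def auc(x, y):
    """P(x > y) + 0.5 * P(x == y) over all sample pairs."""
    x, y = np.asarray(x), np.asarray(y)
    return ((x[:, None] > y[None, :]).mean()
            + 0.5 * (x[:, None] == y[None, :]).mean())

rng = np.random.default_rng(2)
corr = rng.poisson(30, size=1000)   # hypothetical correlated-network counts
anti = rng.poisson(20, size=1000)   # hypothetical anti-correlated counts
print(auc(corr, anti))
```

An AUC of 0.5 means the two count distributions are indistinguishable; values approaching 1 mean the network types can be told apart from a single count.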

In experiments only relatively small networks can be mapped: up to 12 cells using paired recordings and a few tens to hundreds using population calcium imaging. For these numbers the degree correlations cannot be reliably distinguished based on a single measurement. We therefore pooled measurements to see if this improved discriminability for the smaller, more experimentally accessible sub-networks. This procedure (pooling motif counts across *Nav* = 50 network realizations) indeed reduced the overlap between distributions (**Figure 8E**, compare to **Figure 8B**). The more motif counts were pooled, the higher the AUC (**Figure 8F**). Furthermore, the value of unity, corresponding to perfect discriminability, is reached for smaller sub-network sizes. For *Nav* = 50, *N*sub = 30 networks are perfectly discriminable, and the AUC transitions from values just above 0.7 to unity between *Nav* = 30 and 50 (**Figure 8F**). Taken together, sub-networks of a few tens of neurons could be used to test our hypothesis experimentally.
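The benefit of pooling can be sketched with stand-in numbers (illustrative Gaussian count distributions, not the paper's data): averaging over *Nav* samples shrinks each distribution's width by √*Nav*, so the AUC climbs toward unity even when a single sample barely discriminates:

```python
import numpy as np

# Sketch: averaging counts across N_av independent sub-network samples
# shrinks each distribution's spread by sqrt(N_av), which pulls
# overlapping distributions apart and drives the AUC toward 1. The
# Gaussian means and widths below are illustrative stand-ins.
def auc(x, y):
    return ((x[:, None] > y[None, :]).mean()
            + 0.5 * (x[:, None] == y[None, :]).mean())

rng = np.random.default_rng(3)
aucs = {}
for N_av in (1, 5, 50):
    corr = rng.normal(1.0, 1.0, size=(1000, N_av)).mean(axis=1)
    anti = rng.normal(0.5, 1.0, size=(1000, N_av)).mean(axis=1)
    aucs[N_av] = auc(corr, anti)
    print(N_av, aucs[N_av])
```

This mirrors the trend in **Figure 8F**: strongly overlapping single-sample distributions become nearly perfectly separable once a few tens of measurements are pooled.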

The question is whether this result can be improved by including counts for multiple different motifs (**Figure 9A**). Without pooling, motif 98 by itself outperforms any pair of motif counts, according to the AUC value (**Figure 9B**). To determine the AUC value for pairs of motif counts we used the FCM procedure as outlined in the methods section. When counts are pooled (**Figure 9C**), some motif pairs outperform motif 98 by a small margin. These pairs are highlighted in **Figure 9D**, and all involve motif 98 itself. The more separated the clouds of points corresponding to different network types are, the better the FCM procedure classifies the networks; compare the plusses (correct discrimination) and dots (incorrect) in **Figure 9A**.

## **DISCUSSION**

The overall firing rates in barrel cortex (de Kock et al., 2007; Greenberg et al., 2008; Barth and Poulet, 2012) are much lower than might be expected based on the classic experiments in macaque visual cortex (Hubel and Wiesel, 1968). Neural activity is also variable, which can be characterized as across-trial reliability, or in terms of the coefficient of variation and Fano factor of spontaneous activity (Shadlen and Newsome, 1998). These measures reveal that the activity is similar to that of a Poisson process, in which the occurrence of a spike in a time bin is uncorrelated with whether or not a spike occurred in previous bins. The mean

**FIGURE 8 | Degree correlations can be distinguished by pooling fifty measurements of networks at least thirty neurons in size. (A)** Motif counts vary across network realizations, but degree correlations can be distinguished when the corresponding distributions show little overlap. We show the distribution of the normalized counts of motif 98 for (red) correlated, (blue) anti-correlated, and (green) uncorrelated networks with 200 neurons. **(B)** For smaller sub-networks (*N*sub = 30), the distributions overlap. Furthermore, these distributions are not Gaussian, as they are skewed because counts are always positive. Hence, a more general procedure, such as the ROC analysis, needs to be used instead of looking at the differences in mean count relative to the standard deviation. **(C,D)** The area under the ROC curve (AUC) as a function of sub-network size *N*sub for the comparison **(C)** between correlated and anti-correlated networks and **(D)** between anti-correlated and uncorrelated networks. Each motif is labeled with a line style and color as indicated in the legend. Motif 98 is most sensitive in both cases (as well as for the correlated vs. uncorrelated comparison that is not shown). It is more difficult to distinguish an anti-correlated network from an uncorrelated one than to distinguish it from a correlated network. The average AUC values were determined based on the AUC value for each of twenty different motif distributions of 500 network realizations, which were sampled randomly with replacement out of 1000 realizations. **(E)** The motif distribution for *N*sub = 30 can be pooled across *Nav* = 50 network realizations in order to shrink the width of the distribution, so that the differences in mean counts become clearer [compare to panel **(B)**]. **(F)** The AUC for larger *Nav* values reaches unity (distributions are perfectly distinguishable) for smaller sub-network sizes. We show (green) no pooling, (blue) pooling across *Nav* = 5 realizations and (red) pooling across *Nav* = 50 realizations. The AUC goes from 0.7 to 1.0 between *N*sub = 30 and 50 when pooled across *Nav* = 50 realizations, indicating that networks of size 30 can be used to determine the degree correlation structure.

firing rate is maintained by the intrinsic excitability of neurons and their synaptic inputs, including recurrent excitation. High variability together with a low firing rate implies that the network dynamics should be stable against fluctuations in the mean activity in the sense that these fluctuations do not generate states

**FIGURE 9 | The gain in discriminability from using the joint distribution of motif counts rather than the marginal distributions is limited. (A)** The outcome of the FCM procedure used to find two clusters. Red points indicate counts obtained from a correlated network and blue points those for the anti-correlated network. The plusses indicate points correctly classified by FCM and the dots represent incorrectly classified network realizations. Each motif-pair ROC curve is obtained by applying the ROC analysis to the FCM-generated probability that each network belongs to cluster one (see methods). **(B)** The resulting AUC as a function of sub-network size for motif 98 (red) is higher than for all possible pairs of motifs (black curves). The AUC shown is for comparing anti-correlated and correlated networks. **(C)** When the motif count is pooled across *Nav* = 20 realizations, some two-motif curves exceed the single-motif curve; these are shown separately in panel **(D)**. This suggests that for a specific sub-network size, distinguishability can be improved by considering pairs of motif counts.

with network bursts in which all neurons in the network are active at the same time.

Experiments show that rodents can detect single-cell stimulation in barrel cortex, in which a single neuron is electrically stimulated to produce a high-frequency train of action potentials (Houweling and Brecht, 2008). This may mean that single-cell stimulation can cause an increase (or decrease) in the firing rate of the local network that is significantly different from that occurring during spontaneous activity. Taken together, this means that cortical networks with a low firing rate should at the same time be stable against fluctuations in firing rate and sensitive to weak stimulation. The overall goal of this paper was to find a potential explanation for how the contrasting demands of sensitivity and stability can be realized. To achieve this we examined the dynamics of binary neural networks with correlations between the in- and out-degrees of neurons. In the following we summarize the main results, with the aim of linking the detection performance of the network to experimentally obtained behavioral results, explaining the mechanism by which sensitivity and stability can be achieved, and predicting the anatomical signatures of the hypothesized network. We also discuss the role of other biophysical factors, such as inhibition, not taken into account in the present study.

## **GENERATING THE NETWORK CONNECTIVITY UNDERLYING ENHANCED STABILITY**

Our guiding hypothesis is that networks with an anti-correlation between the in- and out-degrees of neurons are more stable than, and as sensitive as, other networks with comparable marginal degree distributions. Network sensitivity is generated by neurons with a high out-degree, because these amplify the effect of nanostimulation the most. This amplification would also destabilize the network, so these cells should not be activated during spontaneous activity. As the input to a neuron is proportional to the mean firing rate and its in-degree, this can be achieved by making sure that high out-degree neurons have low in-degrees. To maintain the average degree, both in and out, there then also need to be neurons with a low out-degree and a high in-degree. We implemented this hypothesis as an anti-correlation between the in- and out-degree.

In standard Erdős–Rényi networks, the relative variance of the degree distribution becomes too small for large networks to yield out-degrees much larger than the mean degree, which is needed to reach the desired sensitivity. Hence, we needed to broaden the degree distribution artificially by using a truncated bivariate Gaussian distribution. Networks with this sampled degree distribution were generated via the configuration model (Newman, 2010). The configuration model generates networks with self-edges and multi-edges. Analytical calculations show that the probability of obtaining a network with one or more of these edges is close to one for the large mean degrees we consider (Blitzstein and Diaconis, 2006). Nevertheless, the number of these edges is low and their impact on the dynamics was limited.
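The construction can be sketched in a few lines. The code below is our reading of this recipe (not the paper's actual code, and the parameter values are illustrative): draw correlated (in, out) degree pairs from a truncated bivariate Gaussian, repair the sums so the stub counts match, then wire the network by random stub matching, which is exactly the configuration model and therefore admits self- and multi-edges:

```python
import numpy as np

# Sketch (our reading of the recipe above, not the paper's actual code):
# draw correlated (in, out) degree pairs from a truncated bivariate
# Gaussian, repair the sums, then randomly match "out" stubs to "in"
# stubs -- the configuration model. rho < 0 gives the anti-correlated
# network; the construction allows self- and multi-edges.
rng = np.random.default_rng(4)
N, mean_k, sigma, rho = 200, 10.0, 3.0, -0.6
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
deg = rng.multivariate_normal([mean_k, mean_k], cov, size=N)
deg = np.clip(np.rint(deg), 0, None).astype(int)     # truncate at zero
k_in, k_out = deg[:, 0].copy(), deg[:, 1].copy()

# repair: spread the stub deficit over random nodes of the smaller side
diff = int(k_in.sum() - k_out.sum())
target = k_out if diff > 0 else k_in
np.add.at(target, rng.integers(0, N, size=abs(diff)), 1)

stubs_in = np.repeat(np.arange(N), k_in)    # one entry per incoming stub
stubs_out = np.repeat(np.arange(N), k_out)
rng.shuffle(stubs_in)                       # random stub matching
edges = np.stack([stubs_out, stubs_in], axis=1)

n_self = int((edges[:, 0] == edges[:, 1]).sum())
n_multi = len(edges) - len({tuple(e) for e in edges.tolist()})
print(len(edges), n_self, n_multi)          # self/multi-edges are few
```

Consistent with the text, the offending edges are a small fraction of the roughly *N*·*k* total edges, so simply deleting or rewiring them afterwards perturbs the degree sequence only slightly.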

There are a number of ways to address the multi- and self-edge problem in a more principled way; they differ in computational efficiency and ease of implementation. First, one can use the configuration model procedure, but reject an invalid edge and find a valid replacement. This carries the risk that the algorithm stalls when there are no valid edges available, in which case the whole procedure has to be restarted. Alternatively, as mentioned before, one can identify the invalid edges after the network construction has been completed and remove them or replace them with valid ones; see Blitzstein and Diaconis (2006) for a review. Second, one can find one graph that satisfies the degree distribution using the Havel–Hakimi procedure (Viger and Latapy, 2005; Erdos et al., 2010; Chatterjee et al., 2011) and generate samples from the overall graph distribution by swapping links (Blitzstein and Diaconis, 2006). Swapping links refers to the procedure in which randomly chosen existing links *i* → *j* and *k* → *l* are rewired into *i* → *l* and *k* → *j* when this yields a simple graph without self-edges and multi-edges. This requires careful calibration of the number of swaps and also introduces bias, because these swaps do not change the number of triangles in the network (Roberts and Coolen, 2012). Third, a sequential method can be defined that can produce all possible graphs, by randomly selecting among the allowed edges that keep the residual degree distribution graphical (Del Genio et al., 2010; Kim et al., 2012). A degree distribution is graphical when there exists a simple graph with that distribution; after each step the degree distribution is reduced to account for the connections already realized, which yields the residual degree distribution. This method does not produce the graphs with the correct probability; hence, averages based on these graphs have to be reweighted to take this into account.
Furthermore, in our hands, an implementation of this method produced graphs with correlations between the in- and/or out-degrees of different nodes, which is referred to as assortativity. This necessitates a number of link swaps to remove these correlations. Fourth, edges can be sampled according to a Boltzmann function (Park and Newman, 2004), where the expectation value of the degree of a node is fixed through a Lagrange multiplier, whose appropriate value has to be determined, for instance, through a maximum-likelihood approach or iterative rescaling (Chatterjee et al., 2011). Taken together, we opted to use the simplest method here, because the alternative methods for network generation were computationally more intensive and also suffered from the aforementioned drawbacks, such as graphs that were not sampled with uniform probability (Del Genio et al., 2010) or other biases in the network statistics (Roberts and Coolen, 2012). Recently developed methods for generating networks with degree correlations, both within single neurons and between pairs of neurons, look very promising (Roberts and Coolen, 2012).
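The link-swap step mentioned above can be sketched as follows (an illustrative implementation under our assumptions, not the paper's code): edges *i* → *j* and *k* → *l* are rewired to *i* → *l* and *k* → *j* only when the result remains a simple graph, which preserves every node's in- and out-degree:

```python
import numpy as np

# Sketch of the link-swap step described above: repeatedly pick two
# existing edges i->j and k->l and rewire them to i->l and k->j,
# accepting the swap only when it creates no self-edge and no duplicate
# edge. Every node's in- and out-degree is preserved by construction.
def swap_edges(edge_list, n_swaps, seed=0):
    rng = np.random.default_rng(seed)
    edges = [tuple(e) for e in edge_list]
    present = set(edges)
    done = 0
    while done < n_swaps:
        a, b = rng.integers(len(edges), size=2)
        (i, j), (k, l) = edges[a], edges[b]
        if a == b or i == l or k == j:               # same edge or self-edge
            continue
        if (i, l) in present or (k, j) in present:   # duplicate edge
            continue
        present.difference_update([(i, j), (k, l)])
        present.update([(i, l), (k, j)])
        edges[a], edges[b] = (i, l), (k, j)
        done += 1
    return edges

# tiny demo: randomize a 4-ring; degrees are unchanged by construction
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
shuffled = swap_edges(ring, n_swaps=10)
print(shuffled)
```

As the text notes, such swaps leave some statistics (e.g., the number of triangles) invariant, so the sample they generate is not fully uniform over all simple graphs with that degree sequence.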

## **STABILITY IS ENHANCED WHEN THE IN- AND OUT-DEGREE ARE ANTI-CORRELATED**

Our aim was to find stable networks, by which we mean that fluctuations do not cause a cascade of recurrent excitation resulting in all cells being active at the same time. One solution would be to have inhibitory neurons, but this does not affect the stability of the LFS; it just changes the ultimate level of activity reached (Avermann et al., 2012). Stability can be assessed in a number of different ways. First, stability in the nonlinear-systems sense: is the LFS a fixed point of a noiseless, infinite-size system? We determined that there was a range of coupling strengths *J*, below *Jc*, for which such an LFS exists. The higher the baseline firing rate, the smaller that range is. Finite-size systems have a smaller range of stable coupling strengths because of heterogeneity: not every neuron has the same in-degree. For instance, the uncorrelated network had a higher variance in its degree distribution than the ER network, and also had a smaller *Jc*. Interestingly, networks with a positive correlation between in- and out-degrees reduced stability even more, leading to a lower *Jc*, whereas for networks with a degree anti-correlation *Jc* was higher, even exceeding the value for an ER network of the same size.

These calculations ignore the effects of fluctuations, which we subsequently introduced by making the dynamics stochastic. This did not alter the stability as determined before in terms of the existence of the LFS, but it introduced other features. The LFS has a BOA with a fuzzy boundary due to the stochastic dynamics. A network can then become unstable when, given enough time, a fluctuation is large enough to carry it out of the BOA. This is primarily a concern for *J* values close to (and below) *Jc*. We determined the fraction of trials during which the network left the LFS BOA during the simulated time interval. As expected, the anti-correlated network is more stable, because its *Jc* is larger. For coupling constants away from *Jc*, this way of characterizing the BOA does not work. Hence, we started the network in states with many more neurons active than would be expected from any normal fluctuation and determined whether it converged to the LFS or the HFS. This revealed that the BOA was larger for the anti-correlated network even away from *Jc*.

Taken together, these results clearly show that anti-correlated networks are more stable than uncorrelated ones, which means they can operate stably at higher coupling strengths and baseline firing rates; this confers an advantage because sensitivity increases with coupling strength and baseline rate. Furthermore, their sensitivity is enhanced compared to ER networks with the same connection probability, because of a subset of neurons with a high out-degree.

Recent experiments summarized in Barth and Poulet (2012) show that the average firing rate in sensory cortex is low, especially in the superficial layers. This holds for spontaneous as well as evoked activity, and for both anesthetized and awake animals, and is the basis for the parameter settings in the model. Nevertheless, there is a small subset of cells that display high firing rates. Cells that have recently been active express the immediate-early gene *c-fos*. When the *c-fos* promoter is used to express the fluorescent marker GFP, the recently active cells can be targeted for recording *in vivo* and *in vitro*. These so-called fosGFP+ cells had a higher firing rate both *in vivo* and *in vitro* and received more excitatory inputs and fewer inhibitory inputs (Yassin et al., 2010). Furthermore, these cells are more likely to be connected amongst themselves. In the anti-correlated networks, there are neurons with a high in-degree but a low out-degree, which make the network more stable, and neurons with a high out-degree but a low in-degree, which make the network more sensitive. The fosGFP+ neurons could correspond to the former group, which forms the backbone for the spontaneous activity. We did not explicitly build assortativity into the network to preferentially connect high in-degree neurons to each other, as suggested by Yassin et al. (2010). We take from this result that the prevailing homeostatic processes create networks with more strongly connected sub-networks and produce cell-to-cell heterogeneity in the balance between excitation and inhibition. Training to detect electrical stimulation should thus be able to induce similar changes in network structure.

## **THE SENSITIVITY ESTIMATED USING DIFFERENT MEASURES OF NETWORK ACTIVITY**

Rodents were able to distinguish between patterns of neural activity during spontaneous activity and those caused by single-cell nanostimulation. Nevertheless, this distinction was small, given the effect size measured experimentally (Houweling and Brecht, 2008). One hypothesis is that the total amount of activity (firing rate) due to nanostimulation significantly exceeds that expected of a typical fluctuation. For stationary network dynamics, this implies a fixed threshold above which a fluctuation is more likely caused by nanostimulation, whereas fluctuations below the threshold are more likely due to spontaneous activity. This can be quantified using an ROC curve and the area under it, the AUC. The ROC is the curve traced out by varying this threshold and plotting the true positive rate (nanostimulation above threshold) vs. the false positive rate (spontaneous fluctuations above threshold). When both fluctuation distributions are Gaussian with a common standard deviation, the AUC is a monotonic function of the discriminability index *d*′, the difference in means divided by that standard deviation (Kingdom and Prins, 2010). Hence it is a measure of the difference in response relative to the size of the fluctuations around it. We found that the main determinant of the AUC is the out-degree of the stimulated neurons, independent of the correlation between in- and out-degree in the network. The AUC increases with coupling strength and baseline firing rate. The anti-correlated network has an advantage because it allows for a broader range of *J* and *r*0 values. It thus has increased stability at equal sensitivity.
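For the equal-variance Gaussian case the relation between *d*′ and the AUC is explicit, AUC = Φ(*d*′/√2), with Φ the standard normal cumulative distribution. A minimal sketch (illustrative values, not the paper's measured fluctuations) verifies this against the threshold-free empirical AUC:

```python
import numpy as np
from math import erf

# Sketch: for two equal-variance Gaussian fluctuation distributions the
# ROC area is a monotonic function of d' (difference in means over the
# common standard deviation): AUC = Phi(d'/sqrt(2)) = (1 + erf(d'/2))/2.
def auc_empirical(x, y):
    """Fraction of pairs where a 'stimulated' draw exceeds a 'spontaneous' one."""
    return (x[:, None] > y[None, :]).mean()

def auc_gaussian(d_prime):
    return 0.5 * (1.0 + erf(d_prime / 2.0))

rng = np.random.default_rng(5)
d_prime = 1.5
stim = rng.normal(d_prime, 1.0, 4000)   # fluctuations with nanostimulation
spont = rng.normal(0.0, 1.0, 4000)      # spontaneous fluctuations
print(auc_empirical(stim, spont), auc_gaussian(d_prime))
```

In this picture, anything that raises the stimulated neurons' effective *d*′ (e.g., a higher out-degree) moves the AUC toward 1 without any change in the decision threshold.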

The above represents an underestimate of the sensitivity, because it assumes that the activity of each neuron contributes equally to the detection (decision) and that the temporal signature of the firing-rate fluctuation is not informative. Our further analysis shows that each of these factors would improve detection performance, and therefore makes it likely that state-of-the-art classification approaches such as support vector machines would improve performance even further. Taken together, this means that as a system the rodent brain could reach a much higher sensitivity than predicted here, if it could utilize all the information available in the network activity. Model simulations of spike-pattern detection by cortical networks (Haeusler and Maass, 2007) suggest that laminar models with plastic synapses allow for more accurate estimates of the detection capability than neural networks that do not take into account the layered structure of cortex.

## **DETECTING SIGNATURES OF ANTI-CORRELATED DEGREE DISTRIBUTIONS**

The model makes the prediction that anti-correlated networks would be more appropriate for the detection of nanostimulation in stable networks. To test this prediction we need to be able to distinguish correlations in the degree structure of the network without having access to all the inputs and all the outputs of a subset of neurons. We find that anti-correlations change the frequency of specific network motifs in a way that is independent of the network size, which means that this frequency can be determined by averaging across many smaller sub-networks. A "ring" motif, number 98, consisting of a projection from neuron 1 to 2, from 2 to 3 and from 3 to 1, discriminated best between correlated and anti-correlated networks (**Figure 7**). Pairs of motif counts increased discriminability to a small extent, and only when the counts were pooled. This shows that these networks can be distinguished experimentally by sampling sub-networks comprised of 30 neurons, when enough samples are available.
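Given a mapped sub-network's adjacency matrix, the ring motif is easy to count. The sketch below (our illustration, not the paper's counting code) uses the identity that trace(A³) counts closed directed 3-walks, each ring being counted once per starting node:

```python
import numpy as np

# Sketch: "ring" motif 98 (1->2, 2->3, 3->1) can be counted from the
# adjacency matrix A: trace(A^3) counts closed directed 3-walks, and in
# a graph without self-edges each directed 3-cycle is counted once per
# starting node. Note this counts 3-cycles regardless of extra edges
# among the triple, so it upper-bounds the induced motif-98 count.
def count_rings(A):
    return int(np.trace(A @ A @ A)) // 3

A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 0), (0, 3)]:   # one ring plus a spur
    A[i, j] = 1
print(count_rings(A))   # the single ring 0 -> 1 -> 2 -> 0
```

For the induced motif count used in the paper, one would additionally require that no other edges exist among the three nodes, as in the motif census of Itzkovitz et al. (2003).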

## **FUTURE STUDIES SHOULD INCORPORATE MULTIPLE TYPES OF INTERNEURONS**

The model was highly simplified so that we could focus on the connectivity structure. Having established the advantages of anti-correlation, our goal is to study its effects in more realistic networks. There are many other biophysical features that could be included in the model that would change the results quantitatively or, in some cases, even qualitatively. Here we highlight a small selection of the most relevant ones.

The first issue is inhibition. Experimental evidence shows that two types of inhibitory neurons, those expressing parvalbumin (PV) and somatostatin (SOM), are relevant in determining the gain of the response of pyramidal cells to whisker stimulation, visual stimulation or current injection (Gentet et al., 2010; Kwan and Dan, 2012; Lee et al., 2012). Avermann and coworkers (Avermann et al., 2012) constructed a model of L2/3 in barrel cortex constrained by *in vitro* measurements and studied the effect of stimulating varying numbers of pyramidal cells expressing channelrhodopsin with light pulses. In this model the strongest projection, in terms of the connection probability and synaptic strength, was from pyramidal cells to fast-spiking (FS) interneurons (corresponding to PV neurons). Even when a relatively small fraction of the pyramidal cells was stimulated, almost all FS cells were recruited. For higher fractions of stimulated pyramidal cells, the non-fast-spiking (NFS) interneurons (such as SOM interneurons) would become gradually activated. As a result the pyramidal cell activity remained low despite strong stimulation. The authors hypothesize that the strong inhibition is a mechanism to maintain sparse spiking in the pyramidal cells, with the NFS cells providing a back-up inhibitory mechanism. It is not clear how this computational model would be applicable to *in vivo* dynamics where FS cells are already spontaneously active. Furthermore, the level of activity in the different interneurons depends on brain state (Gentet et al., 2010). We have simulated binary networks with inhibitory neurons and find that anti-correlated degree distributions in the E–E sub-network improve stability (and yield the same sensitivity).

Detection could also take place via a state change in the network. The network has an LFS and a biologically less plausible HFS, which in the context of a network with inhibition would perhaps correspond to something like an up-state. The true positive rate would correspond to how often single-cell stimulation drives the network out of the BOA of the LFS, whereas the false positive rate would correspond to how often this happens in the spontaneous state. The latter is given by the fraction of trials on which the system goes to the HFS (**Figure 3**). The former can be tuned by changing the number of neurons stimulated and the duration of stimulation. A proper examination of this issue would require a network with a population of inhibitory neurons (Avermann et al., 2012).

A second issue is the effect of including spike timing. Synapses are sensitive, through short-term depression and facilitation, to the temporal pattern of stimulation (Abbott and Regehr, 2004), which could affect the postsynaptic response in a nonlinear fashion and thereby preferentially activate specific populations of neurons. Dendritic nonlinearities also affect the impact of synaptic inputs, based on their temporal coincidence and whether they arrive on the same part of the dendrite (Gasparini et al., 2004; Major et al., 2008; Polsky et al., 2009; Lavzin et al., 2012). Either of these effects could increase the sensitivity to external stimulation while not appreciably changing the stability, thereby strengthening the results reported in this paper. However, fully quantifying these effects would require new and more extensive simulations that fall outside the scope of this paper.

## **PERCEPTUAL RELEVANCE OF ELECTRICAL OR OPTICAL STIMULATION IN EXPERIMENT**

Our study explores a hypothesis for how detection of electrical stimulation can be achieved through quick recurrent excitation that escapes before being shut down by inhibition, without destabilizing the spontaneous state. We now review the relevant literature, focusing on the difference between electrical and sensory stimulation and on the role of inhibition.

The barrel cortex normally processes thalamic activity generated in response to whisker stimulation. According to the canonical cortical circuit (Douglas and Martin, 2004; Lefort et al., 2009; Petersen and Crochet, 2013), this activity arrives first in layer 4 (L4) of the barrel column representing the stimulated whisker, then propagates to L2/3 and subsequently to L5. It stands to reason that when an animal needs to make a decision based on whisker stimulation during a task, this decision is based on activity in L2/3 or L5 that arrived there by way of L4. The path taken by activity induced by optical, micro- or nanostimulation does not necessarily involve L4, and improving detection could thus require altering the underlying cortical circuit.

When monkeys were trained to detect microstimulation at a location in visual cortex corresponding to a specific retinotopic location, the stimulation threshold for detection was reduced from about 50 μA to 5 μA over a few thousand trials (Ni and Maunsell, 2010). At the same time, the contrast threshold needed to detect real visual stimuli at the same retinotopic location increased from 4–8% to 8–60%. When the monkeys were subsequently retrained on detecting visual stimuli, the visual sensitivity recovered in another few thousand trials, but the sensitivity to electrical stimulation was reduced. One possible interpretation is that learning to detect electrical stimulation reorganizes the recurrent circuits in L2/3 to become more sensitive at the expense of the L4 to L2/3 feedforward connection.

The animal improves its performance when learning to detect microstimulation, which could also be the case for the single-cell nanostimulation modeled here. This improvement could occur for one or more of the following reasons. First, the network could become more anti-correlated through changes in the in-degrees, which would improve the stability of the network over time and perhaps reduce the number of false positives. Second, the out-degree of the stimulated neurons could increase, so that the nanostimulation signal becomes louder and the number of true positives increases. Third, the neurons involved in the detection process could become more sensitive to the neurons directly downstream of the stimulated cells.

The threshold for detecting microstimulation in monkey visual cortex matches the strength necessary to elicit action potentials in mouse and cat cortex in the neighborhood of the electrode, 5–10μA (Histed et al., 2009), and in rat barrel cortex, 2–5μA (Houweling and Brecht, 2008). These numbers did not depend on whether metal or glass pipette electrodes were used. Stimulation just above this threshold activated a set of widely dispersed neurons within a few hundred microns of the electrode, through antidromic action potentials in axons passing close to the electrode. As a result, the spatial pattern of activation was very sensitive to small changes in the location of the electrode.

A similar stimulus strength, 10μA for 0.1 to 0.5 ms, yielding charge transfers on the order of 1 nC, could be detected by rats when applied in the infragranular layers (Butovas and Schwarz, 2007). The authors (Butovas and Schwarz, 2003) estimate that this corresponds to activating 80% of the pyramidal cells within 450 microns of the electrode, yielding an increase in their firing rate of 25%, corresponding to about 0.5 excess spikes per neuron. Interestingly, trains of electrical stimulation were more effective, indicating that temporal correlation may be necessary to distinguish stimulation from spontaneous activity. Physiological measurements indicated that synapses of pyramidal cells onto fast-spiking interneurons depress more than pyramidal-to-pyramidal synapses, which means that pulse trains could lead to a more prominent increase in activity than single stimuli (Holmgren et al., 2003).

Optogenetics was used to determine how many neurons in L2/3 would be required to generate a change in activity detectable by a mouse (Huber et al., 2008). The authors' estimate of 300 neurons producing one action potential was based on the measured distribution of light intensity thresholds necessary to elicit an action potential, the number of neurons expressing the light-sensitive channelrhodopsin (ChR2) channels, and the spatial fall-off of the light intensity; according to these authors it represents an overestimate. The number of 300 neurons corresponds to about 5% of the approximately 6500 neurons present in a mouse barrel column (Lefort et al., 2009).

Nanostimulation refers to electrical activation of an individual neuron with a glass pipette in the juxtacellular configuration. Nanostimulation in rat barrel cortex must have led to behaviorally relevant changes in network activity, as the animal was able to detect nanostimulation, but the average effect size was rather small (Houweling and Brecht, 2008). The nature of this activity could not be assessed, but experiments in mouse visual cortex may shed some light on this. Single-cell stimulation led to spikes in the stimulated neuron and calcium transients in some of the surrounding neurons that could be detected using two-photon microscopy (Kwan and Dan, 2012). Such stimulation induced postsynaptic activity in very few other pyramidal cells, 20 out of 1152 measured. SOM interneurons [corresponding to the NFS of Avermann et al. (2012)] were most strongly activated, 5 out of 17 measured. PV expressing cells did not respond to this stimulation, but their calcium transients were most strongly correlated to the network activity produced by the rest of the measured cells. This indicates that in this state the SOM cells would be required to damp the increase in activity generated by the recurrently connected pyramidal cell network.

#### **SUMMARY**

Taken together, experimental results suggest that detection of single-cell stimulation requires a quick propagation of excitatory cell activity, before the various types of inhibition kick in. Our studies indicate that anti-correlated degree distributions could be an important strategy for increasing sensitivity while maintaining stability.

#### **AUTHOR CONTRIBUTIONS**

Paul Tiesinga and Arthur R. Houweling designed the research project, Paul Tiesinga and Juan C. Vasquez wrote the code, Juan C. Vasquez and Paul Tiesinga performed the simulations, Paul Tiesinga wrote the manuscript together with Arthur R. Houweling.

#### **ACKNOWLEDGMENTS**

This work was supported by the Netherlands Organization for Scientific Research (NWO), through a grant entitled "Reverse physiology of the cortical microcircuit," grant number 635.100.023.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 July 2013; accepted: 17 October 2013; published online: 07 November 2013.*

*Citation: Vasquez JC, Houweling AR and Tiesinga P (2013) Simultaneous stability and sensitivity in model cortical networks is achieved through anti-correlations between the in- and out-degree of connectivity. Front. Comput. Neurosci. 7:156. doi: 10.3389/fncom.2013.00156*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2013 Vasquez, Houweling and Tiesinga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

#### *Emiliano Torre<sup>1</sup>\*, David Picado-Muiño<sup>2</sup>, Michael Denker<sup>1</sup>, Christian Borgelt<sup>2</sup> and Sonja Grün<sup>1,3</sup>*

*<sup>1</sup> Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6), Jülich Research Centre and JARA, Jülich, Germany*

*<sup>2</sup> European Centre for Soft Computing, Mieres, Spain*

*<sup>3</sup> Theoretical Systems Neurobiology, RWTH Aachen University, Aachen, Germany*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Shigeru Shinomoto, Kyoto University, Japan*

*Srdjan Ostojic, Ecole Normale Superieure, France*

#### *\*Correspondence:*

*Emiliano Torre, Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6), Jülich Research Centre, Wilhelm-Johnen-Strasse, 52425 Jülich, Germany e-mail: e.torre@fz-juelich.de*

We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (*item sets*) in massively parallel spike trains. This search outputs the occurrence count (*support*) of individual patterns that are not trivially explained by the counts of any superset (*closed frequent item sets*). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but of their signatures, defined as the pairs of pattern size *z* and support *c*. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (*pattern spectrum filtering*, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is also prone to classifying patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of *pattern set reduction* (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows reliable detection of active assemblies in massively parallel spike trains.

**Keywords: higher-order correlations, neuronal cell assemblies, spike patterns, spike synchrony, multiple testing, data mining**

## **1. INTRODUCTION**

The cortex is composed of a highly interconnected network of neurons, and thus one may speculate that information processing in the brain can only be understood on the basis of the concerted activity of the neuronal population. Hebb (1949) suggested that neurons coordinate their activities by organizing into functional groups, termed cell assemblies. Synchronous spike input to receiving neurons is known to be more effective in generating output spikes (Abeles, 1982; König et al., 1996), which leads to the hypothesis that temporal coordination of spiking activity, or correlational processing, is the defining expression of an active cell assembly (Singer et al., 1997; Harris, 2005). As excitatory postsynaptic potentials are small in amplitude compared to the gap between the resting potential and the neuronal firing threshold, a cell assembly is expected to be composed of many neurons firing in a correlated fashion. This observation is the basis for the assumption that higher-order synchronous spiking activity serves as a signature of an active assembly (Riehle et al., 1997; Berger et al., 2010; Staude et al., 2010b; Shimazaki et al., 2012).

In order to observe and detect such signatures in the brain, the spiking activities of many neurons must be recorded simultaneously. Fortunately, in recent years considerable progress has been made in the development of multi-electrode recording techniques [e.g., Nicolelis, 1998; Buzsaki, 2004; Hatsopoulos et al., 2007; Riehle et al., 2013], which make it possible to record the activity of hundreds of neurons. Such massively parallel spike train data pose statistical challenges due to the inherent complexity of the required multivariate approaches. Most notably, increasing the number of observed neurons leads to a combinatorial explosion in the number of potential spike patterns that need to be detected and tested. On the basis of pairwise correlation analyses alone, the existence and functional relevance of neuronal correlations could be demonstrated in various cortical systems and behavioral paradigms [e.g., Gerstein and Aertsen, 1985; Riehle et al., 1997; Kohn and Smith, 2005; Berger et al., 2007; Fujisawa et al., 2008; Feldt et al., 2009; Humphries, 2011; Masud and Borisyuk, 2011]. Nevertheless, a correlation analysis considering the complete set of simultaneously recorded spike trains is required to also uncover higher-order correlations among neurons.
In recent years several such approaches were developed, each of which focuses on different aspects: (i) methods to determine the presence of higher-order spike correlations with a minimum order without explicitly identifying the participating neurons [e.g., Louis et al., 2010a; Staude et al., 2010a,b]; (ii) methods that test whether individual neurons participate in synchronous spiking activity without identifying the groups of correlated neurons [e.g., Berger et al., 2010]; (iii) methods that test for the presence of correlation as predicted by a specific correlation model such as a synfire chain (Abeles, 1991), that is, spatio-temporal spike patterns or propagation of synchronous spiking activity [e.g., Abeles and Gerstein, 1988; Schrader et al., 2008; Gerstein et al., 2012; Gansel and Singer, 2012]; (iv) methods that directly identify the members of cell assemblies on the basis of the patterns of synchronous spiking activity [e.g., Gerstein et al., 1978; Pipa et al., 2008; Feldt et al., 2009; Gansel and Singer, 2012; Shimazaki et al., 2012; Picado-Muiño et al., 2013].

In Picado-Muiño et al. (2013) we presented the basic approach and relevant statistics to employ frequent item set mining (FIM) to identify significant patterns of spike synchrony in massively parallel spike trains. FIM enables fast and efficient counting of synchronous spike patterns by pruning the tree of all possible patterns. To address the problem of multiple testing, statistics are computed not for individual patterns, but on the pattern spectrum, which collects the number of observed patterns according to their signature. A signature is defined as the pair (*z*,*c*) of pattern size *z* (i.e., number of participating neurons) and *support c* (i.e., number of pattern occurrences). In *pattern spectrum filtering* (PSF), identified sets of neurons whose signature (*z*,*c*) also occurs in appropriate surrogate data are marked as chance patterns and discarded.

Here, we extend the approach of Picado-Muiño et al. (2013) in three ways that will enable the application of the method to biological data. First, we refine the statistical test employed in pattern spectrum filtering for reporting significant patterns of a given signature (Section 2). Then, we introduce a subsequent analysis step, termed *pattern set reduction* (PSR), to additionally filter out those patterns that are detected as significant, but are compositions of chance spikes or patterns and the actual cell assembly pattern (Section 3). Finally, we report on the performance of our method related to features describing the data (e.g., coincidence rate, assembly pattern size, firing rate heterogeneity or non-stationarity) and analysis parameters (Section 4). The discussion (Section 5) includes a step-by-step instruction on how to utilize the proposed method in the context of massively parallel spike trains obtained from electrophysiological recordings.

## **2. SPIKE PATTERN DETECTION AND STATISTICAL TESTING**

In this section we introduce our approach to detect frequent synchronous spike patterns in massively parallel spike trains (MPST). We first briefly review frequent item set mining (FIM) and related terminology and definitions as proposed in Picado-Muiño et al. (2013) as a tool to efficiently detect and count synchronous spike patterns in MPST. Then we derive a modified version of the FIM-based statistics proposed in Picado-Muiño et al. (2013) for assessing pattern significance.

### **2.1. FREQUENT ITEMSET MINING**

Given *N* parallel spike trains with neuron ids 1, 2,..., *N*, observed in the time window [0, *T*), we partition [0, *T*) into *b* exclusive bins *b<sub>i</sub>* = [(*i* − 1) · *w*, *i* · *w*), *i* = 1,..., *b*, of identical width *w* = *T*/*b* (typically chosen as a few ms). If one or more spikes of one neuron fall into a bin, we consider the bin occupied and reduce the entry to 1 (*clipping*), so that each time bin contains at most one spike per neuron. Spikes from different neurons falling into the same time bin are defined as *synchronous* (see **Figure 1A**). Borrowing terminology from FIM, we define each neuron id as an *item*, the set *T<sub>i</sub>* of all items spiking in *b<sub>i</sub>* as the *i*-th *transaction* in the binned data, and {*T<sub>i</sub>*}, *i* = 1,..., *b*, as the *transaction list*. Given a *minimum pattern size z*<sub>0</sub>, each set of *z* ≥ *z*<sub>0</sub> items in *T<sub>i</sub>* constitutes a *pattern of synchronous spikes*, or *item set* (see **Figure 1B**). Here we set *z*<sub>0</sub> to 2. Due to clipping, each item set occurs at most once per transaction. The number of occurrences of an item set in the transaction list is the *support* of that item set.

**FIGURE 1 | From spike data to closed frequent itemsets. (A)** Sketch of a raster plot of 4 neurons firing in parallel. Shaded colors separate adjacent bins. Red spikes mark the occurrences of the synchronous pattern composed of neurons 1, 3, 4. **(B)** Transaction list derived from the spike data in **(A)** after binning. **(C)** List of item sets obtained from **(B)**, together with their occurrence counts. Black boxes mark non-frequent item sets (minimum support set to 2), blue boxes mark non-closed frequent item sets, red boxes mark CFISs. **(D)** Average number of item sets (dashed black line), frequent item sets (dashed blue line) and CFISs (dashed red line) obtained from 100 simulations of 100 parallel independent spike trains with a firing rate of 20 Hz, as a function of the simulation time. Other parameters are bin width *w* = 3 ms and minimum pattern size *z*<sub>0</sub> = 2. Bars mark ±1 std. dev. The solid line indicates the number of time bins (and thus transactions) as a function of the simulation time.
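The binning and clipping step described above can be sketched as follows. This is a minimal illustration under our own naming (the function `bin_spike_trains` and the data layout are not from the paper), assuming spike times are given in ms as plain lists:

```python
def bin_spike_trains(spike_trains, T, w):
    """Bin N spike trains on [0, T) into exclusive bins of width w.
    Multiple spikes of one neuron in the same bin are clipped to one
    (set semantics). Returns the transaction list: transactions[i] is
    the set of neuron ids (items) spiking in bin i."""
    n_bins = int(T / w)
    transactions = [set() for _ in range(n_bins)]
    for neuron_id, spikes in enumerate(spike_trains, start=1):
        for t in spikes:
            if 0 <= t < T:
                transactions[int(t / w)].add(neuron_id)  # clipping via set
    return transactions

# Toy example: 3 neurons, T = 10 ms, w = 5 ms
trains = [[1.0, 2.0, 6.0], [1.5], [6.2, 8.0]]
tx = bin_spike_trains(trains, T=10.0, w=5.0)
# tx[0] == {1, 2}: neurons 1 and 2 are synchronous in the first bin
# tx[1] == {1, 3}: neuron 1's two spikes in bin 0 were clipped to one
```

Note that clipping falls out of using a set per bin: inserting the same neuron id twice has no effect.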

A transaction that contains *K* items yields 2*<sup>K</sup>* − *K* − 1 different (but possibly overlapping) item sets of size *z* ≥ 2, that is, all 2*<sup>K</sup>* possible subsets without the empty set and the *K* singletons. The total number of different item sets in a transaction list can thus largely exceed the number of transactions (i.e., time bins). This number grows with the duration of the data set (see **Figure 1D**) and with the number of parallel spike trains (not shown).
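The count of 2<sup>*K*</sup> − *K* − 1 item sets per transaction can be checked by brute-force enumeration (an illustrative sketch; the function name is ours):

```python
from itertools import combinations

def itemsets_of_size_ge2(transaction):
    """Enumerate all item sets of size >= 2 contained in one transaction."""
    items = sorted(transaction)
    K = len(items)
    sets_ = [frozenset(c) for z in range(2, K + 1)
             for c in combinations(items, z)]
    # all 2**K subsets minus the empty set and the K singletons
    assert len(sets_) == 2**K - K - 1
    return sets_

print(len(itemsets_of_size_ge2({1, 3, 4, 7})))  # 2**4 - 4 - 1 = 11
```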

In order to limit the data to potentially interesting and nontrivial item sets, we select only item sets whose support *c* is larger than or equal to a *minimum support c*<sup>0</sup> (*c*<sup>0</sup> ≥ 1) as introduced by Picado-Muiño et al. (2013). Here we set *c*<sup>0</sup> to 2. An item set whose support equals or exceeds the minimum support is called *frequent item set*. For *c*<sup>0</sup> > 1, frequent item sets are usually a small fraction of all item sets (**Figure 1D**, compare black dashed line to blue dashed line). Furthermore, we discard any frequent item set occurring as many times as any of its supersets. These patterns are trivially explained by the occurrences of their supersets, which are more significant due to the larger number of neurons involved. Non-trivial frequent item sets are called *closed frequent item sets* (CFISs; see **Figure 1C**). Discarding non-closed frequent item sets does not yield any loss of information. Indeed, the set *F* of all frequent item sets can be reconstructed from the set *C* of CFISs by

$$\mathcal{F} = \bigcup\_{I \in \mathcal{C}} \left\{ J \subseteq I : |J| \ge z\_0 \right\}.$$

The support *s*(*I*) of a non-closed frequent item set *I* ∈ *F* can be computed as *s*(*I*) = max<sub>*J* ∈ *C*, *J* ⊃ *I*</sub> *s*(*J*).
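The reconstruction of all frequent item sets, and of their supports, from the CFISs can be sketched as follows (illustrative code under our own naming; `closed` maps each CFIS to its support):

```python
from itertools import combinations

def reconstruct_frequent(closed, z0=2):
    """Rebuild all frequent item sets of size >= z0 from the closed ones.
    closed: dict mapping frozenset (CFIS) -> support.
    The support of any frequent item set I is the maximum support of the
    closed item sets containing I: s(I) = max over J in C, J >= I of s(J)."""
    frequent = {}
    for I, support in closed.items():
        for z in range(z0, len(I) + 1):
            for J in map(frozenset, combinations(sorted(I), z)):
                frequent[J] = max(frequent.get(J, 0), support)
    return frequent

closed = {frozenset({1, 2, 3}): 4, frozenset({1, 2}): 6}
freq = reconstruct_frequent(closed)
# {1,2} keeps its own (larger) support 6; {1,3} and {2,3} inherit support 4
```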

If *A* and *B* are two CFISs such that *B* ⊂ *A*, and *c<sub>A</sub>*, *c<sub>B</sub>* are their respective supports, it follows from the definition of CFISs that *c<sub>B</sub>* > *c<sub>A</sub>* (*a priori* property). We refer to the (non-empty) set *A* \ *B* as the *excess items* of *A* with respect to *B*, and to the difference *c<sub>B</sub>* − *c<sub>A</sub>* as the *excess occurrences* of *B* with respect to *A*.

Following Picado-Muiño et al. (2013), we make use of frequent itemset mining [FIM; for a review, see Goethals (2010); Borgelt (2012)] to extract CFISs and their support from an MPST transaction list. FIM performs a non-redundant search for spike patterns, starting from patterns of size *z*<sub>0</sub> and moving on to supersets of increasing size. The search is organized in a tree whose layers correspond to increasing pattern size; a branch connects two patterns if one is a subset of the other, and each pattern is visited at most once. FIM exploits the *a priori* property to stop the search at infrequent patterns, since no superset of an infrequent item set can be frequent. The output of FIM is a list of all CFISs with their support (**Figure 1C**).
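Purely to illustrate the definitions, CFISs can also be extracted by brute force, as below. Actual FIM implementations [e.g., those reviewed in Borgelt (2012)] prune the search tree via the *a priori* property and are vastly more efficient; the function name and data layout here are our own:

```python
from itertools import combinations

def closed_frequent_itemsets(transactions, z0=2, c0=2):
    """Brute-force CFIS extraction (for illustration only; real FIM
    prunes the search tree using the a priori property).
    Returns a dict frozenset -> support of all closed frequent item sets."""
    # support of every candidate item set of size >= z0
    support = {}
    for tx in transactions:
        for z in range(z0, len(tx) + 1):
            for s in map(frozenset, combinations(sorted(tx), z)):
                support[s] = support.get(s, 0) + 1
    frequent = {s: c for s, c in support.items() if c >= c0}
    # closed: no proper superset with exactly the same support
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}

tx = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}]
cfis = closed_frequent_itemsets(tx)
# {1,2} is closed with support 3; {1,2,3} is closed with support 2;
# {1,3} and {2,3} are discarded (same support as their superset {1,2,3})
```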

#### **2.2. PATTERN SPECTRUM FILTERING**

Direct statistical tests of all individual patterns occurring in MPST are not suitable, as they cause a severe multiple testing problem, yielding a large number of false positives (FPs) or, after statistical correction, enhanced levels of false negatives (FNs). Therefore, Picado-Muiño et al. (2013) proposed to pool CFISs according to their size *z* (number of neurons involved) and their support *c* (number of occurrences) in a two-dimensional histogram (the *pattern spectrum*) and to evaluate patterns of the same signature (*z*,*c*) for significance by a Monte-Carlo approach using surrogate data. Here we present a refinement of this original approach, named *pattern spectrum filtering* (PSF), that bases the test for a specific signature (*z*,*c*) also on patterns of larger size and support than specified by the signature.

In order to implement the null hypothesis *H*<sup>0</sup> of independent spiking, and to approximate the *p*-values of the signatures (*z*,*c*), from the original data (**Figure 2A**) we repeatedly generate surrogate data (**Figure 2B**), collect from each one its CFISs through FIM as done for the original data, and compute the corresponding surrogate pattern spectrum (**Figure 2C**). The surrogates are generated from the original data by intentionally destroying correlations while keeping other features, such as firing rates, intact [e.g., by spike randomization or spike dithering, Louis et al. (2010b)].
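One common surrogate of the kind referenced above (Louis et al., 2010b) is uniform spike dithering; the following is a minimal sketch (function name and parameter values are ours, not prescriptions from the paper):

```python
import random

def dither_spikes(spike_trains, dither=15.0, T=None):
    """Uniform spike dithering: displace each spike independently by a
    random offset in [-dither, +dither] ms. This destroys fine-temporal
    correlations across neurons while approximately preserving each
    neuron's firing rate profile (on timescales coarser than `dither`)."""
    surrogate = []
    for spikes in spike_trains:
        new = [t + random.uniform(-dither, dither) for t in spikes]
        if T is not None:       # wrap spikes back into the recording window
            new = [t % T for t in new]
        surrogate.append(sorted(new))
    return surrogate

random.seed(0)
surr = dither_spikes([[10.0, 50.0], [10.2]], dither=15.0, T=100.0)
# spike counts per neuron are unchanged, but the near-coincidence
# of the spikes at ~10 ms is destroyed in almost all realizations
```

The surrogate spike trains would then be binned and mined by FIM exactly as the original data, yielding one surrogate pattern spectrum per realization.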

Let ⪰ be the partial ordering on the real plane defined by (*x*<sup>∗</sup>, *y*<sup>∗</sup>) ⪰ (*x*, *y*) if *x*<sup>∗</sup> ≥ *x* and *y*<sup>∗</sup> ≥ *y*; the strict ordering ≻ holds if in addition at least one inequality is strict. From each surrogate pattern spectrum we compute a binary spectrum which takes value 1 at each signature (*z*,*c*) such that at least one occupied signature (*z*<sup>∗</sup>,*c*<sup>∗</sup>) ⪰ (*z*,*c*) exists, and value 0 otherwise [in contrast to Picado-Muiño et al. (2013), where only the occupation of signature (*z*,*c*) itself is checked]. Formally, we define the *signature operator* sgt(·) such that, given a CFIS *A* with size *z<sub>A</sub>* = |*A*| and occurrence count *c<sub>A</sub>*, sgt(*A*) := (*z<sub>A</sub>*, *c<sub>A</sub>*). For each list *S<sub>i</sub>* of CFISs from one surrogate data set, let *P*ˆ<sub>*i*</sub> be the *binary pattern spectrum*, defined for each *z*,*c* ≥ 2 by:

$$\hat{P}\_i(z,c) := \begin{cases} 1 \text{ if } \exists A \in \mathcal{S}\_i \text{ : } \text{sgt}(A) \succeq (z,c) \\ 0 \text{ otherwise} \end{cases}.$$

Averaging the binary spectra of the *K* surrogate data sets at each signature, we obtain the *p-value spectrum P*ˆ:

$$\hat{P}(z,\mathcal{c}) := \frac{1}{K} \# \left( \mathcal{S}\_i : \exists A \in \mathcal{S}\_i : \text{sgt}(A) \succeq (z,\mathcal{c}) \right).$$

*P*ˆ(*z*,*c*) yields an estimate of the probability to observe (one or more) patterns with signature (*z*<sup>∗</sup>,*c*<sup>∗</sup>) ⪰ (*z*,*c*) under *H*<sub>0</sub> (see **Figure 2D**).

We then classify any signature (*z*,*c*) whose *p*-value *P*ˆ(*z*,*c*) is lower than the significance level α<sup>∗</sup> as significant. Given the desired overall significance level α for PSF, we derive α<sup>∗</sup> from α by Bonferroni correction for the number *m* of tests, i.e., the number of signatures in the data to test for: α<sup>∗</sup> = α/*m*. Formally, we introduce the *significance spectrum S*ˆ, defined at each (*z*,*c*) by

$$
\hat{S}(z,c) := \begin{cases}
1 & \text{if } (z,c) \text{ is significant} \\
0 & \text{otherwise}
\end{cases}
$$

In **Figure 2E** *S*ˆ(*z*,*c*) = 1 is marked in white, *S*ˆ(*z*,*c*) = 0 in gray. The border between the two is the *detection border*, on the left of which signatures in the original data are classified as not significant and rejected. Signatures to its right (*S*ˆ(*z*,*c*) = 1) are considered as significant (marked in red in **Figure 2E**). The corresponding patterns and their supports are listed in **Figure 2F**.
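The PSF pipeline from binary spectra to the significance decision can be condensed as follows. This is a schematic re-implementation under our own naming; signatures are (*z*, *c*) pairs, and the toy data at the end are invented for illustration:

```python
def binary_spectrum(signatures, z_max, c_max):
    """P_hat_i(z, c) = 1 iff some surrogate pattern has a signature
    (z*, c*) >= (z, c) in the component-wise partial order."""
    B = [[0] * (c_max + 1) for _ in range(z_max + 1)]
    for (zs, cs) in signatures:
        for z in range(2, min(zs, z_max) + 1):
            for c in range(2, min(cs, c_max) + 1):
                B[z][c] = 1
    return B

def psf(data_signatures, surrogate_signatures, alpha=0.01):
    """Return the data signatures classified as significant.
    surrogate_signatures: one list of signatures per surrogate data set."""
    z_max = max(z for z, _ in data_signatures)
    c_max = max(c for _, c in data_signatures)
    K = len(surrogate_signatures)
    spectra = [binary_spectrum(s, z_max, c_max) for s in surrogate_signatures]
    # p-value spectrum: fraction of surrogates with a pattern >= (z, c)
    p = lambda z, c: sum(S[z][c] for S in spectra) / K
    # Bonferroni correction over the number of tested signatures
    alpha_star = alpha / len(set(data_signatures))
    return [(z, c) for (z, c) in data_signatures if p(z, c) < alpha_star]

# Toy case: a large injected pattern (10, 6) and a small chance-like
# signature (3, 3); half of 100 surrogates contain a pattern (3, 4).
data_sigs = [(10, 6), (3, 3)]
surr_sigs = [[(3, 4)]] * 50 + [[(2, 2)]] * 50
print(psf(data_sigs, surr_sigs))  # [(10, 6)] — only the large pattern survives
```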

**FIGURE 2 | PSF on artificial data. (A)** Raster plot of 100 parallel simulated spike trains consisting of independent Poisson activity plus 6 injections of one pattern of synchronous spikes (highlighted in red) from neurons 1 to 10, occurring at random times (see Section 4 for details). The total firing rate of each neuron is 20 Hz, the simulation time is 3 s. **(B)** Same as in **(A)**, but without injection of synchronous patterns. The spike trains are therefore completely independent. **(C)** Pattern spectrum of CFISs extracted from the data in **(A)** by FIM (*z*<sup>0</sup> = 2, *c*<sup>0</sup> = 2, *w* = 5 ms). Counts are color-coded (logarithmic scale). **(D)** *P*-value spectrum drawn from 5000 surrogate, independent data sets of the type shown in **(B)**. *P*-values are color-coded (logarithmic scale). **(E)** Significance spectrum (overall significance α = 0.01, Bonferroni-corrected for *m* = 50 tests yielding α<sup>∗</sup> = 2 · 10<sup>−</sup>4). Gray squares indicate signatures that are not significant, white squares mark potentially significant signatures. Red squares mark significant signatures of the pattern spectrum shown in **(C)**, i.e., which fall into white squares of the significance spectrum. **(F)** List of patterns detected by PSF. Besides the injected pattern *A* = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, PSF also classifies additional patterns as significant, all being subsets or supersets of *A*.

## **3. PATTERN SET REDUCTION**

PSF tests the significance of patterns under the null hypothesis *H*<sup>0</sup> of fully uncorrelated spike trains. However, PSF might fail in rejecting patterns that result from combinations of chance spikes or chance patterns with the assembly pattern (see list of detected patterns in **Figure 2F** besides the injected one). These patterns are a specific kind of false positive, not resulting from merely independent data. They may be subsets or supersets of the assembly pattern, or patterns that partially overlap with it (**Figures 3A–C**). In this section we define the type of FPs that may occur, investigate why PSF is prone to return such FPs, and propose an additional statistical analysis, termed *pattern set reduction* (PSR), to remove them.

## **3.1. TYPES OF FPs**

#### *3.1.1. Chance subsets*

If a CFIS *A* repeats *cA* times and a subset *B* of *A* (with |*B*| ≥ *z*0) has *c* additional chance occurrences, *B* represents a CFIS repeating *cB* = *cA* + *c* total times. We call *B* a *chance subset* of *A*, having *c* excess occurrences (**Figure 3A**). PSF is designed to test the significance of signature (|*B*|, *cB*) under *H*<sup>0</sup> (complete independence), thus disregarding the fact that *cA* occurrences are due to pattern *A*. As a result it classifies *B* as a significant pattern, thus yielding an FP outcome. This is illustrated in **Figure 2F**, where e.g., pattern {4, 6, 10} occurs twice by chance plus 6 times as a subset of pattern {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The corresponding signature (3, 8) is significant compared to the surrogates (**Figure 2E**), so that PSF does not reject it.

## *3.1.2. Chance supersets*

If a CFIS *B* occurs *cB* times and another set *C* of neurons fires by chance synchronously with *B* in *c* of those *cB* transactions (with *c* ≥ *c*0), then the pattern *A* = *B* ∪ *C* represents a CFIS repeating *cA* = *c* times. We call *A* a *chance superset* of *B*, with |*C*| excess neurons (**Figure 3B**). PSF tests the significance of signature (|*A*|, *cA*) under *H*0, disregarding the fact that |*B*| of the |*A*| neurons of *A* are due to the presence of pattern *B*. The test is therefore prone to classify *A* as significant. This is the case for patterns {1, 2,..., 10, 80}, {1, 2,..., 10, 28} and {1, 2,..., 10, 24} in **Figure 2F**, each of which occurs twice as a superset of {1, 2,..., 10}. The corresponding signature (11, 2) is significant compared to the surrogates (**Figure 2E**), so that PSF classifies these patterns as significant.

**FIGURE 3 | Excess occurrences and excess items.** Sketch of the possible relationship between a reference pattern and patterns sharing neuron identities and/or time occurrences with it. In each panel, ticks represent individual spikes. Rows correspond to neurons and columns to transactions, i.e., time bins. Spikes forming a pattern are grouped by an ellipse. The reference pattern of each panel is shown by black ticks and is indicated by a solid ellipse. **(A)** *B* is a subset of *A* with excess occurrences (red). **(B)** *A* is a superset of *B* with excess items (blue). **(C)** *B* is a subset of *A* with excess occurrences (red). Neurons in *C* (blue) additionally fire synchronously to *A* and to excess occurrences of *B*. Thus pattern *D* = *B* ∪ *C* forms a CFIS, which partially overlaps with *A*. **(D)** Patterns *A* and *B* are disjoint: they are composed of different neuron identities and occur at different time bins.

## *3.1.3. Chance overlapping sets*

The simultaneous presence of excess items and excess occurrences can yield yet another type of FP outcome, namely patterns that overlap with the actual assembly. Given an assembly *A*, assume that a subset *B* of *A* has additional chance occurrences. If an additional set *C* of neurons disjoint from *A* fires synchronously to *A and* to an excess occurrence of *B* for a total of *c* ≥ *c*<sup>0</sup> chance times, then the set *D* = *B* ∪ *C* represents a CFIS which partially overlaps with *A* (**Figure 3C**). PSF is prone to classify *D* as significant.

## *3.1.4. Disjoint patterns*

Two patterns which have no items in common are *disjoint* (**Figure 3D**). In contrast to the previous classes of chance patterns, chance patterns disjoint from an active assembly are not enhanced by its presence. PSF therefore correctly estimates their significance and manages to filter out almost all of them, as shown in Section 4.

## **3.2. PSR STATISTICS**

Let *P* be the class of CFISs reported as significant by PSF. Given a pair (*A*, *B*) ∈ *P* × *P* such that *B* ⊂ *A* (therefore *cB* > *cA* by definition of CFIS, and |*B*| < |*A*|), we propose statistical tests to assess the conditional significance of either *A given B* (*A*|*B*) or *B given A* (*B*|*A*), i.e., of one pattern given that the other represents an assembly pattern. These tests can be applied, using different strategies, to the class of all such (*A*, *B*) pairs, reducing *P* to a subclass *Q* of patterns which are mutually significant given each other.

## *3.2.1. Subset filtering*

This procedure aims at rejecting FPs that are chance subsets of other CFISs. For each pair (*A*, *B*) ∈ *P* × *P* such that *B* ⊂ *A* (so that *c<sub>B</sub>* > *c<sub>A</sub>*), *B* has *c<sub>B</sub>* − *c<sub>A</sub>* excess occurrences with respect to *A*. Subset filtering tests *B*|*A*, i.e., the null hypothesis *H*<sub>0</sub><sup>*B*|*A*</sup> that *B* is a chance subset of the actual assembly *A*, by assessing the significance of the excess occurrences of *B*. Equivalently, *H*<sub>0</sub><sup>*B*|*A*</sup> states that the pattern *B*′, defined by the same items as *B* but by its excess occurrences only (red spikes in **Figure 3A**), is a chance pattern. If *H*<sub>0</sub><sup>*B*|*A*</sup> is rejected, *B* is kept and *A* discarded, otherwise *A* is kept and *B* discarded. Thus, the procedure keeps either *A* or *B* and discards the other (*exclusive*). We present two alternatives to test *H*<sub>0</sub><sup>*B*|*A*</sup>.

*3.2.1.1. Exact test.* This test computes the *p*-value of the signature (|*B*|, *c<sub>B</sub>* − *c<sub>A</sub>*) of *B*′. If *c<sub>B</sub>* − *c<sub>A</sub>* < *c*<sub>0</sub>, *B* is classified as a chance subset of *A*. Otherwise, let *T*<sup>*A*</sup> be the transaction list obtained from *T* by discarding the transactions where *A* occurred, and keeping in the remaining transactions only the items composing *A*. All the excess occurrences of subsets of *A* must be contained in *T*<sup>*A*</sup>. *B*′ itself is a CFIS in this transaction list: it is an item set because |*B*′| = |*B*| ≥ *z*<sub>0</sub>, it is frequent because *c<sub>B</sub>* − *c<sub>A</sub>* ≥ *c*<sub>0</sub>, and it is closed because otherwise *B* itself would be non-closed. To test the significance of *B*′, one can therefore run FIM and PSF on surrogates of *T*<sup>*A*</sup> to estimate the significance of its signature (|*B*|, *c<sub>B</sub>* − *c<sub>A</sub>*). If (|*B*|, *c<sub>B</sub>* − *c<sub>A</sub>*) is significant, *B*′ is significant in *T*<sup>*A*</sup> and *B* is classified as significant in *T* (given *A*). Otherwise, *B* is classified as non-significant.
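The construction of the reduced transaction list *T*<sup>*A*</sup> can be sketched as follows (a hypothetical helper under our own naming, not the authors' code):

```python
def reduced_transactions(transactions, A):
    """Build T^A: drop the transactions where A occurs in full, and
    restrict the remaining transactions to the items of A. All excess
    occurrences of any subset of A are contained in this reduced list."""
    A = frozenset(A)
    return [tx & A for tx in transactions if not A <= tx]

# A = {1,2,3} occurs in the first two transactions (dropped); the other
# two are restricted to the items of A.
tx = [{1, 2, 3, 9}, {1, 2, 3}, {1, 2, 7}, {2, 3, 5}]
print(reduced_transactions(tx, {1, 2, 3}))  # [{1, 2}, {2, 3}]
```

FIM and PSF would then be run on surrogates of this reduced list to obtain the *p*-value of the signature (|*B*|, *c<sub>B</sub>* − *c<sub>A</sub>*).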

*3.2.1.2. Approximate test.* This test approximates the *p*-value of the signature (|*B*|, *cB* − *cA*) in *T*<sup>*A*</sup> by the *p*-value of the signature (|*B*|, *cB* − *cA* + *h*), *h* ≥ 1, in *T*, already obtained when performing PSF. In contrast to *T*<sup>*A*</sup>, *T* is composed of more neurons than those which can actually form chance subsets of *A* (because it does not contain the items of *A* only), and of more transactions than those where such subsets could actually display excess occurrences (because it also contains the transactions where *A* is present). Therefore, the *p*-value of (|*B*|, *cB* − *cA*) would be overestimated if computed over *T* instead of *T*<sup>*A*</sup>. The parameter *h* heuristically corrects for this by substituting it with the *p*-value of a signature with the same size but higher support. The lower *h*, the higher the probability of rejecting *B*. If *h* ≥ *cA*, then (|*B*|, *cB* − *cA* + *h*) is at least as extreme as (|*B*|, *cB*) and *B* is necessarily reported as significant. This test avoids running FIM and PSF on *T*<sup>*A*</sup> and is therefore computationally more efficient.
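The decision rule of the approximate test reduces to a single lookup in the significance spectrum already computed on *T*. A minimal sketch (assuming the spectrum is available as a set of significant (*z*, *c*) signatures; the function name is ours):

```python
def approx_subset_test_keep_B(significant, zB, cB, cA, h=1):
    """Approximate test of H0^{B|A}: B is kept (H0 rejected) iff the
    shifted signature (|B|, cB - cA + h) is significant in T."""
    return (zB, cB - cA + h) in significant
```

With *h* = 1 (as used in Section 4), a subset is retained only if its excess occurrences over the superset, shifted by one, still reach the significant region of the spectrum.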

## *3.2.2. Superset filtering*

This procedure aims at rejecting FPs that are chance supersets of other CFISs. For each pair (*A*, *B*) ∈ *P* × *P* such that *B* ⊂ *A* (so that |*B*| < |*A*|), *A* has |*A*| − |*B*| excess items with respect to *B*. Superset filtering tests *A*|*B*, i.e., the null hypothesis *H*<sub>0</sub><sup>*A*|*B*</sup> that *A* is a chance superset of the actual assembly *B*, by assessing the significance of the excess items of *A*. Equivalently, *H*<sub>0</sub><sup>*A*|*B*</sup> states that the pattern *A*′ defined by the same transactions as *A* but containing its excess items only (blue spikes in **Figure 3B**) is a chance pattern. If *H*<sub>0</sub><sup>*A*|*B*</sup> is rejected, *A* is kept and *B* discarded from *P*, otherwise *B* is kept and *A* discarded from *P*. Thus, the procedure keeps either *A* or *B* and discards the other (*exclusive*). We present two alternatives to test *H*<sub>0</sub><sup>*A*|*B*</sup>.

*3.2.2.1. Exact test.* This test computes the significance of the signature (|*A*| − |*B*|, *cA*) of *A*′. If |*A*| − |*B*| < *z*<sub>0</sub>, *A* is classified as a chance superset of *B*. Otherwise, let *T*<sup>*B̄*</sup> be the transaction list obtained from *T* by keeping only the transactions where *B* occurred, and discarding from them the items constituting *B*. All groups of excess items of *B* (i.e., neurons that fire synchronously with *B*) must be contained in *T*<sup>*B̄*</sup>. *A*′ itself is a CFIS of this transaction list: it is an item set because |*A*′| = |*A*| − |*B*| ≥ *z*<sub>0</sub>, it is frequent because *cA* ≥ *c*<sub>0</sub>, and it is closed because otherwise *A* itself would be non-closed. To test the significance of *A*′, one can therefore run FIM and PSF on surrogates of *T*<sup>*B̄*</sup> to estimate the *p*-value of its signature (|*A*| − |*B*|, *cA*). If (|*A*| − |*B*|, *cA*) is significant, *A*′ is significant in *T*<sup>*B̄*</sup> and *A* is classified as significant in *T* (given *B*). Otherwise, *A* is classified as non-significant.
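The construction of *T*<sup>*B̄*</sup> mirrors the subset case, and can be sketched as follows (illustrative naming, sets of neuron ids as transactions):

```python
def reduced_transactions_superset(transactions, B):
    """Build T^(B-bar) for the exact superset test: keep only transactions
    in which B occurred, and remove the items of B from them."""
    B = set(B)
    return [t - B for t in map(set, transactions) if B <= t]
```

In this reduced list, the excess items of a superset *A* form an item set whose support is exactly *cA*, the quantity the exact test evaluates.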

*3.2.2.2. Approximate test.* This test approximates the *p*-value of the signature of *A*′ in *T*<sup>*B̄*</sup> by the *p*-value of the signature (|*A*| − |*B*| + *k*, *cA*), *k* ≥ 1, in *T*, already obtained when performing PSF. In contrast to *T*<sup>*B̄*</sup>, *T* is composed of more neurons than those that can actually form excess items of *B* (because it contains the items of *B*, too), and of more transactions than those where supersets of *B* could actually occur (because it also contains transactions where *B* does not occur). Therefore, the *p*-value of (|*A*| − |*B*|, *cA*) would be overestimated if computed over *T* instead of *T*<sup>*B̄*</sup>. The parameter *k* heuristically corrects for this by substituting it with the *p*-value of a signature with the same support but higher size. The lower *k*, the higher the probability of rejecting *A*. Note that if *k* ≥ |*B*|, then (|*A*| − |*B*| + *k*, *cA*) is at least as extreme as (|*A*|, *cA*) and *A* is necessarily reported as significant. This test avoids running FIM and PSF on *T*<sup>*B̄*</sup> for each *B*.
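As for the subset case, the approximate superset test reduces to a spectrum lookup with a size-shifted signature (sketch with our own naming; the spectrum is a set of significant (*z*, *c*) signatures):

```python
def approx_superset_test_keep_A(significant, zA, zB, cA, k=2):
    """Approximate test of H0^{A|B}: A is kept (H0 rejected) iff the
    size-shifted signature (|A| - |B| + k, cA) is significant in T."""
    return (zA - zB + k, cA) in significant
```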

## *3.2.3. Covered-spikes criterion*

This simple selection strategy consists of taking all pairs (*A*, *B*) ∈ *P* × *P* for which *B* ⊂ *A*, and keeping for each pair the pattern covering the largest number of spikes, while rejecting the other. Specifically, the criterion prefers *A* to *B* if *zA* · *cA* ≥ *zB* · *cB*, and *B* to *A* otherwise. It does not involve significance tests, but is based on the observation that, given the probability *p* for a neuron to spike in a time bin, the probability for *z* neurons to fire synchronously in a bin is approximately *p*<sup>*z*</sup>, so that the probability that this pattern occurs *c* times is binomially distributed and approximately proportional to *p*<sup>*z*·*c*</sup>. The larger the *z* · *c* score, the less likely a pattern of that size and support. This matches the finding that the detection border separating non-significant signatures (marked gray in **Figure 2E**) from significant ones (marked white in **Figure 2E**) in the significance spectrum exhibits a hyperbolic shape. The criterion thus keeps the less likely of the two patterns.

A variant consists of keeping the pattern with the largest (*z* − 1) · *c* score. This choice is motivated by the observation that a pattern of size *z* and support *c* can be seen as *z* − 1 spike trains which synchronize their spikes with another train *c* times. Thus, (*z* − 1) · *c* spikes are coincident with spikes in another spike train. Keeping the pattern with the largest (*z* − 1) · *c* score amounts to keeping the pattern which covers more coincident spikes. Geometrically, penalizing the pattern size corrects for the fact that the hyperbolic shape of the detection border in **Figure 2E** is elongated toward the pattern support (*y*-axis) rather than being equilateral.
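Both scores can be sketched in a single comparison (a toy sketch; signatures are (*z*, *c*) pairs and the function name is ours):

```python
def covered_spikes_winner(sigA, sigB, discount=0):
    """Covered-spikes criterion: keep the pattern covering more spikes.
    sigA, sigB are (size z, support c) pairs with B a subset of A;
    discount=0 gives the z*c score, discount=1 the (z - 1)*c variant.
    Ties are resolved in favor of the larger pattern A."""
    (zA, cA), (zB, cB) = sigA, sigB
    return sigA if (zA - discount) * cA >= (zB - discount) * cB else sigB
```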

## *3.2.4. Combined filtering*

Subset filtering, superset filtering and the covered-spikes criterion can be combined into a filtering procedure which tests for both excess coincidences and excess items. Combined filtering tests for each pair (*A*, *B*) ∈ *P* × *P* both the null hypothesis *H*<sub>0</sub><sup>*B*|*A*</sup> (i.e., that *B* is a chance subset of *A*) *and* the null hypothesis *H*<sub>0</sub><sup>*A*|*B*</sup> (i.e., that *A* is a chance superset of *B*). If one of the null hypotheses is rejected, the corresponding pattern is retained as significant. Thus, if both hypotheses are rejected, both patterns are retained (*inclusive*). Accepting one null hypothesis does not necessarily lead to the rejection of the corresponding pattern (in contrast to subset or superset filtering): the pattern is rejected only if the other pattern is accepted, i.e., if the other null hypothesis is rejected. If both *H*<sub>0</sub><sup>*B*|*A*</sup> and *H*<sub>0</sub><sup>*A*|*B*</sup> are accepted, one of the two patterns is kept based on the covered-spikes criterion.
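The decision logic of combined filtering for one pair can be sketched as follows (our own illustrative encoding; the boolean inputs stand for the outcomes of the two conditional tests, True meaning the corresponding null hypothesis is rejected):

```python
def combined_filtering(A, B, B_sig_given_A, A_sig_given_B, tie_break):
    """Combined filtering for a pair with B a subset of A. Returns the set
    of retained patterns for this pair."""
    if B_sig_given_A and A_sig_given_B:
        return {A, B}             # inclusive: both hypotheses rejected
    if B_sig_given_A:
        return {B}                # H0^{A|B} accepted and B kept: discard A
    if A_sig_given_B:
        return {A}                # H0^{B|A} accepted and A kept: discard B
    return {tie_break(A, B)}      # both accepted: covered-spikes criterion
```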

## **4. CALIBRATION ON ARTIFICIAL DATA**

In this section we compare the performance (in terms of FPs and FNs) of PSF alone to PSF followed by PSR, to illustrate the advantages yielded by the latter. For the sake of computational efficiency we employ the approximate versions of the tests for subset and superset filtering, with parameters *h* = 1 and *k* = 2, respectively. We test different types of artificial data that exhibit typical features of experimental data. After studying the general behavior of the analysis method for stationary, homogeneous data, we study data sets with heterogeneous firing rates across neurons, and with non-stationary firing rates in time.

## **4.1. CORRELATED DATA**

As a model for data containing assembly activity, we generate correlated spike trains by a modified version of the single-interaction process [SIP; Kuhn et al. (2003); Berger et al. (2010)], which we keep calling SIP for convenience. First, we simulate *N* = 100 parallel independent Poisson spike trains as background activity. Then we model assembly activity by inserting synchronous spike events into a subset of *z* of the *N* neurons (the *SIP neurons*, with ids 1 to *z*). This is done by generating a hidden Poisson process with the desired number *c* of pattern occurrences, from which spikes are copied into each of the *z* spike trains of the SIP neurons. Thus, compared to the model proposed by Kuhn et al. (2003), we insert correlated firing only into a specific subset of the parallel processes. Before insertion of the synchronous patterns, the background firing rate of the SIP neurons is reduced by the rate of the hidden process to ensure the same firing rate for all neurons. In the simplest scenario, the firing rates and the pattern occurrence rate are stationary over time and homogeneous across neurons. More complicated cases will include either non-stationarity or heterogeneity of rates. The purpose of the analysis of such data is to test under controlled conditions whether the simulated assembly is indeed detected and can be distinguished from background activity.
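A minimal discrete-time sketch of this generative model (our own simplified version, operating directly on bins of width *w*; function and parameter names are illustrative, not the authors' implementation):

```python
import random

def simulate_sip(N=100, z=5, T=3.0, rate=20.0, c=7, w=0.003, seed=0):
    """Modified SIP model (sketch): N binary spike trains over bins of
    width w; the first z neurons receive c synchronous insertions copied
    from a hidden process, and their background rate is reduced by c/T so
    that all neurons have the same total firing rate."""
    rng = random.Random(seed)
    nbins = round(T / w)
    hidden = set(rng.sample(range(nbins), c))   # occurrences of the pattern
    p_bg = rate * w                             # background spike probability
    p_sip = (rate - c / T) * w                  # reduced rate for SIP neurons
    trains = []
    for i in range(N):
        p = p_sip if i < z else p_bg
        spikes = {b for b in range(nbins) if rng.random() < p}
        if i < z:
            spikes |= hidden                    # copy the hidden spikes
        trains.append(spikes)
    return trains, hidden
```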

### **4.2. INDEPENDENT DATA**

To implement the null-hypothesis *H*<sub>0</sub> of complete independence needed to derive the significance of signatures of the correlated data, we generate independent Poisson processes with the same rates as the data to be tested, thus keeping the same marginal statistics. This is one way of implementing the null-hypothesis. However, in the context of analyzing real experimental data, one may want to preserve more statistical features of the experimental data (e.g., non-stationary and heterogeneous firing rates, deviations from Poisson, and so on). This can be realized by the use of more complex surrogates derived by manipulation of the original data, e.g., spike dithering (Grün, 2009; Louis et al., 2010b).
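The rate-preserving surrogate used here can be sketched in a few lines (our own discrete-bin sketch; names are illustrative):

```python
import random

def independent_surrogate(rates_hz, T=3.0, w=0.003, seed=0):
    """Surrogate under H0 (sketch): independent Poisson-like spike trains
    that preserve the per-neuron firing rates of the data, discretized to
    binary bins of width w."""
    rng = random.Random(seed)
    nbins = round(T / w)
    return [{b for b in range(nbins) if rng.random() < r * w}
            for r in rates_hz]
```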

#### **4.3. ASSESSING SIGNIFICANCE**

We evaluate the performance of our analysis in terms of the average number of FPs and FNs obtained with PSF and PSR in *R* = 1000 iterations on the same model of correlated data (SIP of size *z* in *N* = 100 parallel spike trains). To study the performance of our analysis, we investigate 243 models differing in the size of the injected assembly *z* = 2, ..., 10, its injection count *c* = 2, ..., 10, and the firing rates *r* = 5, 10 or 20 Hz (here: homogeneous for all neurons). We analyse each model with a bin width of *w* = 3 ms and *w* = 5 ms for the detection of synchronous spike patterns. See **Table 1** for an overview of the parameter combinations. For the significance estimation we generate surrogate data, i.e., independent Poisson processes with the same firing rates as the correlated data, and analyse them with FIM as done for the correlated data. This procedure is repeated *K* = 5000 times to derive the *p*-value spectrum and then the significance spectrum, employing an overall significance level of α = 0.01, Bonferroni-corrected for the number of signatures tested. The latter is given by the number of signatures existing in the correlated data, which never exceeded *m* = 50. In order to have the same corrected significance level for each of the 1000 iterations of each SIP model, we always correct for *m* = 50 tests, instead of correcting for the individual number *m*′ < *m* of signatures found in each data set. This yields the corrected significance level α<sup>∗</sup> = 2 · 10<sup>−4</sup>, which is typically more conservative than correcting individually for *m*′ tests. This procedure allows us to use a single significance spectrum for all 81 SIP models with the same firing rates, differing by the parameters *z* and *c* only, and for all 1000 realizations of each model. To obtain the *p*-values with precision α<sup>∗</sup> we generate *K* = 1/α<sup>∗</sup> = 5000 surrogates, compute their binary spectra and average them to obtain the *p*-value spectrum (see Section 2.2).
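The bookkeeping of this step can be sketched as follows (a simplified sketch under our own naming: each surrogate contributes the set of exact (*z*, *c*) signatures it contains, whereas the full method also accounts for dominance between signatures):

```python
def significance_spectrum(surrogate_signatures, alpha=0.01, m=50):
    """Estimate a p-value spectrum from K surrogates and threshold it at
    the Bonferroni-corrected level alpha* = alpha / m (sketch)."""
    K = len(surrogate_signatures)
    alpha_star = alpha / m                      # e.g., 0.01 / 50 = 2e-4
    counts = {}
    for sigs in surrogate_signatures:           # one binary spectrum each
        for s in sigs:
            counts[s] = counts.get(s, 0) + 1
    pvals = {s: n / K for s, n in counts.items()}
    significant = {s for s, p in pvals.items() if p <= alpha_star}
    return pvals, significant, alpha_star
```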

**Figure 4** shows significance spectra obtained from surrogate data for models differing by the firing rate *r* (5, 10 or 20 Hz), analysed with different bin widths *w* (dark gray for *w* = 3 ms, light gray for *w* = 5 ms; α<sup>∗</sup> = 2 · 10<sup>−4</sup>). The set of non-significant signatures shows a hyperbolic shape, which grows with both *r* and *w* to higher *z* and higher *c*. Both factors, higher firing rates and larger bin width, cause more spikes per bin, and therefore larger and more frequent chance patterns.

**Table 1 |** *Parameters for the background activity: N: number of neurons; r: firing rate; T: simulation time. Parameters for correlated data: z: number of neurons in correlated activity (size of SIP); c: SIP occurrences. Analysis parameters: w: bin width; c<sub>0</sub>: minimum item set support; z<sub>0</sub>: minimum item set size. Statistical parameters: α<sup>∗</sup>: Bonferroni-corrected significance level (for m = 50 tests); K: number of surrogates; R: number of simulation runs per SIP model.*

**FIGURE 4 | Significance spectra for different parameter sets.** Independent Poisson spike trains (*N* = 100; *T* = 3 s) of different firing rates (*r* = 5, 10 or 20 Hz) serve as surrogates for the computation of three significance spectra (from left to right). Each square represents a (*z*, *c*) signature. Dark-shaded gray squares mark non-significant signatures obtained with *w* = 3 ms. Light-shaded squares represent further non-significant signatures for *w* = 5 ms. White squares indicate significant signatures for both choices of the bin width. Other parameters: *z*<sub>0</sub> = 2, *c*<sub>0</sub> = 2, α<sup>∗</sup> = 2 · 10<sup>−4</sup>, *K* = 5000.

## **4.4. PERFORMANCE, HOMOGENEOUS FIRING RATES**

For each SIP parameter set we simulate the corresponding model *R* = 1000 times, and evaluate FPs and FNs of each realization. Their averages measure the performance of the analysis for each parameter constellation.

As previously discussed (Section 3), in the presence of correlations PSF tends to classify chance subsets, supersets or overlapping sets as significant, thus yielding FPs. **Figure 5**, top row, shows this effect on simulations of SIP models differing by SIP size (*x*-axis of each panel) and injection count (*y*-axis). For each model, the FP level is computed as an average over 1000 stochastic simulations. The total amount of FPs increases as the SIP size and/or the number of injections get larger. The contribution of FP supersets (green) and FP subsets (blue) is about the same, while in comparison FP overlapping sets (yellow) occur only at higher values of *z* and *c*, and FP disjoint patterns (purple) are almost never observed. As shown in **Figure 5**, bottom row, PSR (here, combined filtering) largely reduces the amount of FPs. Although the PSR statistical tests apply to chance subsets (blue) and supersets (green) only (Section 3.2), they successfully remove most of the overlapping patterns (yellow) as well. The reason is that, if there is a CFIS *D* overlapping with the actual assembly *A* by *z*<sub>0</sub> or more items, their intersection *B* is a CFIS as well (**Figure 3C**). In most cases PSF classifies *B* as significant together with *A* and *D*. If so, PSR likely rejects *D* when testing *H*<sub>0</sub><sup>*D*|*B*</sup>, and rejects *B* when testing *H*<sub>0</sub><sup>*B*|*A*</sup>.

A reduction of the amount of FPs typically comes at the expense of enhanced FNs. In particular, FNs may occur if the real pattern is rejected in favor of one of its subsets or supersets. **Figure 6** shows, for a range of combinations of SIP size and injection count, the resulting level of FPs, FNs, and the maximum of the two (as a measure of overall performance) after performing each of the proposed PSR strategies. The significance spectrum used to determine significance for all realizations of the SIP models is the one for *w* = 3 ms shown in **Figure 4** (top right, dark-shaded entries). For the FPs shown in **Figure 6**, top row, the color-coded level refers to the fraction of simulations (out of 1000) containing one or more FPs. This measure takes values between 0 and 1, unlike the average FP counts shown in **Figure 5**. This representation simplifies the comparison with the average FN level, which also ranges between 0 and 1, since only a single spike pattern is injected in every simulation. To aid the comparison between the performances of PSF and PSR, gray dots mark those squares that correspond to models where the error rates exceed 5%. PSF on its own never performs well in terms of FNs and FPs simultaneously, while all PSR strategies yield a range of models for which both quantities are low. In summary, the relative improvement of PSR over PSF shows that any PSR strategy reduces the FP rate considerably, while causing only a minor increase in the FN rate.

## **4.5. PERFORMANCE, HETEROGENEOUS FIRING RATES**

If neurons have the same spiking statistics, the spike pattern statistics depend on the pattern size only. Thus, the *p*-value of each pattern is fully determined by the pattern signature. This does not hold when neurons have different spiking statistics, and in particular different firing rates. Here we discuss the case of heterogeneous firing rates across neurons, which are often present in electrophysiological data. Higher firing rates lead to a higher spiking probability per time bin. Patterns composed of neurons with higher firing rates are more likely to occur by chance, and are therefore less significant than patterns composed of neurons with lower rates. Thus, the *p*-values of patterns with the same signature (*z*, *c*) differ for different compositions of the firing rates. Pooling patterns by size and support in the pattern spectrum does not take into account the heterogeneity of firing rates across neurons and thus may lead to biased statistics.

**FIGURE 5 | Average number of FPs, distinguished by type, after PSF and PSR.** Average number of FPs obtained for different SIP models on *R* = 1000 model simulations. FPs are shown after performing PSF (top) and then PSR with combined filtering (bottom), and are distinguished by type (columns from left to right: FP supersets, FP subsets, FP overlapping, FP disjoint patterns). Each panel shows the average number of FPs obtained for different SIP models, each corresponding to a square in the grid: the models differ by the SIP size (from 2 to 10; *x*-axis) and its injection count (from 2 to 10; *y*-axis). Other parameters (same for all simulations): *N* = 100, *T* = 3 s, *r* = 20 Hz, *w* = 3 ms, *K* = 5000, α<sup>∗</sup> = 2 · 10<sup>−4</sup>.

**FIGURE 6 | Performance of PSF and the different PSR strategies.** FP rate (top row) and FN rate (second row); the maximum of the two (third row) indicates the combined error rate. Each matrix shows the performance for 81 different SIP models varying by SIP size (from 2 to 10, *x*-axis) and number of SIP injections (from 2 to 10, *y*-axis), of stationary rates. Filtering applied after PSF, from left to right: no filtering, subset filtering, superset filtering, covered-spikes criterion, combined filtering. Other parameters (same for all panels): *N* = 100, *T* = 3 s, *w* = 3 ms, *K* = 5000, α<sup>∗</sup> = 2 · 10<sup>−4</sup>.
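The rate dependence can be made explicit with a back-of-the-envelope calculation: under independence, the per-bin probability of a chance synchronous event in a given group is the product of the per-neuron spike probabilities *r* · *w* (our own sketch, ignoring multiple spikes per bin):

```python
def chance_sync_prob(rates_hz, w=0.003):
    """Per-bin probability that a given group of independent neurons fires
    synchronously by chance: the product of the per-neuron per-bin spike
    probabilities r * w (valid for r * w << 1)."""
    p = 1.0
    for r in rates_hz:
        p *= r * w
    return p

# Three neurons at 20 Hz are far more likely to coincide by chance than
# three neurons at 5 Hz, although both patterns have the same size z = 3.
```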

To investigate the robustness of our method against firing rate heterogeneity, we first simulate independent data consisting of 100 neurons, with a small population of neurons (2 to 10) firing at a higher rate (20 Hz) than the rest of the neurons (5 Hz). We simulate 1000 data sets of this type, and evaluate FPs in each of them by means of FIM and PSF (*K* = 5000 surrogates). In none of the simulations do we detect significant signatures, i.e., FPs. The opposite scenario, where 2 to 10 neurons fire at 5 Hz and the others at 20 Hz, does not yield FPs either. Thus, employing rate-preserving surrogates allows PSF to correctly estimate the significance of signatures under *H*<sub>0</sub>, even when rates are heterogeneous across neurons.

Next we study correlated data characterized by heterogeneous background firing rates. We investigate two cases based on a SIP model. In scenario S1, a pattern is injected in a set of neurons firing at a lower rate (*rS* = 5 Hz) than the independent neurons, which fire at rate *rI* = 20 Hz (**Figure 7**, left column). In contrast, in scenario S2 the pattern is injected in neurons with higher firing rates (*rS* = 20 Hz, *rI* = 5 Hz; **Figure 7**, right column). In comparison to the homogeneous case where all neurons fire at 5 Hz (data not shown), the overall performance drops considerably, but not in comparison to the 20 Hz homogeneous case (see **Figure 6**, right column). This is consistent with the previous finding that higher rates worsen the performance by shifting the detection border in the significance spectrum to the right (**Figure 4**, left vs. right). This also explains why FP and FN rates in scenario S1 are higher than in scenario S2: the average firing rate in the former ranges (depending on the SIP model) from 18.5 to 19.7 Hz, in the latter from 5.3 to 7 Hz. Our choice of PSR with combined filtering leads to a better performance in this scenario than the covered-spikes criterion (not shown). Taken together, these results indicate that the method can deal well with heterogeneity of firing rates without severe performance loss.

**FIGURE 7 | Performance of PSR with heterogeneous firing rates.** Performance of PSR (combined filtering with parameters *h* = 1, *k* = 2) in terms of FP rates (top row), FN rates (middle row) and the combined error rates (maximum of FP and FN rates; bottom row) for data with heterogeneous rates. Left column: SIP neurons fire at *rS* = 5 Hz, independent neurons fire at *rI* = 20 Hz. Right column: SIP neurons fire at *rS* = 20 Hz and independent neurons at *rI* = 5 Hz. Gray dots mark entries where the error rate is above 5%. Other parameters (same for all panels): *N* = 100, *T* = 3 s, *w* = 3 ms, *K* = 5000, α<sup>∗</sup> = 2 · 10<sup>−4</sup>.

#### **4.6. PERFORMANCE, NON-STATIONARY FIRING RATES**

Now we consider the case where the firing rates of the neurons are not stationary in time. To explore the sensitivity of our method to non-stationarities we employ simulated data, again consisting of 100 parallel spike trains, which fire in two consecutive epochs of length *T*<sub>1</sub> and *T*<sub>2</sub> (the total simulation time *T* = *T*<sub>1</sub> + *T*<sub>2</sub> is 3 s, as in the data previously analysed) at different rates (*r*<sub>1</sub> = 5 Hz and *r*<sub>2</sub> = 20 Hz, or vice versa), homogeneously across the neurons in both epochs. In the first epoch, correlated activity is inserted by the SIP model. SIPs of size 2 to 10, injected 2 to 7 times, amount to a coincidence rate of 1.33 to 4.66 Hz in the first epoch. The background rate is reduced correspondingly. For comparison, we also study the stationary case, where all neurons fire at *r* = 10 Hz. The performance for the three scenarios is shown in **Figure 8** (first column: *r*<sub>1</sub> = 5 Hz, *r*<sub>2</sub> = 20 Hz; second column: *r*<sub>1</sub> = 20 Hz, *r*<sub>2</sub> = 5 Hz; third column: *r*<sub>1,2</sub> = 10 Hz). Although our analysis performs better (detection border more to the left) in the stationary case (*r* = 10 Hz; third column), it can still recover SIP activity with no FPs in a large portion of the parameter space, provided that rate-preserving surrogates are employed. As in the heterogeneous case, FPs increase when the SIP neurons have higher firing rates, and thus more FP subsets occur. As apparent from **Figure 8**, bottom row, the method can correctly detect significant patterns in a wide range of models also in the presence of non-stationary rates. To study whether short transients in the firing rates tend to generate FPs, we repeated the analysis for *T*<sub>1</sub> = 0.5 s, *T*<sub>2</sub> = 2.5 s, setting first *r*<sub>1</sub> = 5 Hz, *r*<sub>2</sub> = 20 Hz and then *r*<sub>1</sub> = 20 Hz, *r*<sub>2</sub> = 5 Hz.
In all cases we do not find enhanced FPs (data not shown), indicating that employing rate-preserving surrogates suffices to correct for rate non-stationarity in independent data.

## **5. DISCUSSION**

In this study we have presented a method to detect significant patterns of synchronous spiking in a subset of massively parallel spike trains in the presence of background activity. Our work is rooted in Picado-Muiño et al. (2013), where we demonstrated how to efficiently detect spike patterns in such data and assess their significance under the null hypothesis of independent firing. Here we refined this significance test, which evaluates the significance of patterns using PSF on the basis of the pattern signature (size and support). PSF is prone to report FP patterns that arise from the activation of an actual assembly mixed with chance synchrony due to background activity. To identify and remove these FP detections, we introduced here PSR as an additional statistical testing step. As shown in **Figure 6** (second to last columns), PSR succeeds in eliminating FPs for a wide range of parameters, at the expense of a minor increase in FNs. A series of calibrations demonstrates the effectiveness of our approach under conditions of heterogeneous and non-stationary firing rates.

**FIGURE 8 | Performance of PSR with non-stationary firing rates.** First column: *r*<sub>1</sub> = 5 Hz, *r*<sub>2</sub> = 20 Hz; second column: *r*<sub>1</sub> = 20 Hz, *r*<sub>2</sub> = 5 Hz. For comparison, the third column shows the performance for the stationary case with all neurons firing at rate *r* = 10 Hz, and a duration of *T* = 3 s. Other parameters (same for all panels): *N* = 100, *w* = 3 ms, *K* = 5000, α<sup>∗</sup> = 2 · 10<sup>−4</sup>. Gray dots mark entries where the error rate is above 5%.

The relevance of higher-order correlations for information processing in the nervous system is hotly debated. Approaches based on maximum entropy models, such as Schneidman et al. (2006), suggest that higher-order correlations contribute only a negligible fraction of the total network correlation, which appears to be dominated by pairwise correlations. However, it is important to stress that, for correlations of a specific order, maximum entropy models estimate the overall magnitude of that correlation order, and are not sensitive to individual correlation structures of that order. Thus, the presence of a single group of correlated neurons of a certain size in the data is not enough for maximum entropy models to report significant correlation of the corresponding order. The study by Shlens et al. (2006) addresses this point, discussing that maximum entropy models may miss higher-order correlations because these contribute only a negligible fraction to the total correlation. Besides, Roudi et al. (2009) showed that the statistical power of maximum entropy models describing spike correlations in heavily undersampled biological systems (such as parallel recordings with electrode arrays) is low. Despite these challenges, Ohiorhenuan et al. (2010) have shown, using a maximum entropy model approach, that local microcircuits in visual cortex exhibit evidence of higher-order interactions, whereas correlation statistics across long-range connections are explained on the basis of pairwise interactions. However, methods designed to investigate individual spike patterns are needed to resolve the detailed structure of correlation in groups of spiking neurons.

A majority of current methods for spike correlation analysis limit themselves to fully synchronous patterns or to patterns of a specific, typically low, order [e.g., Grün et al., 2002a,b; Berger et al., 2007, 2010; Shimazaki et al., 2012]. Other approaches, such as CuBIC (Staude et al., 2010b), infer the presence of higher-order correlations from the statistics of the population activity without identifying the specific units engaged in such correlations. While Gansel and Singer (2012) presented a method for the detection of higher-order patterns, they identify pattern subsets by a purely heuristic procedure that is not accessible to analytic treatment, and that tests patterns directly, which requires a number of statistical corrections to avoid FPs (at the expense of FNs). Our proposed method instead first tests the significance of pattern signatures. PSF eliminates non-significant signatures based on surrogate data through the significance spectrum (see **Figure 4**), and determines the class *P* of associated significant patterns. Testing patterns on the basis of their signature rather than individually reduces the number of required statistical tests to the number of signatures found in the data. We have shown that the composition of assembly and background spikes typically leads to the identification of additional significant patterns (i.e., FPs). In order to remove this type of FPs, we introduced here the PSR procedure, which is based on conditional pairwise tests.

We have tested the performance of our analysis on artificial data where we embedded groups of synchronously spiking neurons in background activity of independent Poisson spike trains [SIP, cf. Kuhn et al. (2003)]. We studied the rate of FP and FN detections for occurrence rates of the synchronous pattern varying from 0.66 to 3.33 Hz, which reflect plausible values for the activation frequency of the assumed assemblies (Grün et al., 1999; Denker et al., 2010). The analysis shows in particular that, by introducing PSR, assembly detection becomes possible with near-perfect reliability and precision for a large range of SIP parameters. The transition shifts toward higher support and assembly size as the bin width or the firing rates increase (cf. **Figure 4**). Nevertheless, for physiologically realistic parameters, only very small or very infrequent SIP injections cannot be distinguished from chance synchrony. Moreover, evaluating patterns obtained from a larger set of simultaneously recorded neurons should have only a minor impact on our findings, causing merely a slight increase in the average size of observed patterns.

Non-stationarities of the firing rate in time or across neurons are a common concern faced by correlation analysis methods. The effect of non-stationary firing rates on PSF is two-fold. First, the surrogates used to calculate the significance estimates on pattern signatures should adequately reproduce the experimental rate profiles. Even if the underlying rate profile is not known, a variety of suitable approaches for surrogate generation is available for this task (Grün, 2009; Louis et al., 2010b). However, the sensitivity of detecting assembly activations is further affected by where these occur with respect to the rate non-stationarity. In this respect we tested the performance of PSF and PSR in a scenario of step-wise non-stationary firing rates where spike patterns were injected at selected rate levels only. Compared to the stationary case, the method retains a high performance for large parameter regimes (**Figure 8**), and shows only a slight increase in the number of FNs. For very large rate non-stationarities, a time-resolved analysis may additionally aid the detection, as done, e.g., in the Unitary Events analysis (Grün et al., 2002b). In a similar framework, we found that heterogeneous firing rates across neurons (**Figure 7**) also yield a performance similar to the homogeneous case. While we see minor increases in the number of FPs, we remark that these are to a large extent supersets of the injected pattern, owing to the high probability of gaining an additional coincident spike by chance from the set of neurons spiking at high rates.

In this study we assumed that assemblies occur at the time resolution of the data, i.e., that spike times of the assemblies are not jittered in time. In electrophysiological data this is a rare scenario, and instead spike synchrony typically occurs with a temporal jitter of up to several milliseconds [Grün et al., 1999; Pazienti et al., 2008]. In order to capture such slightly imprecise synchrony, exclusive binning is typically applied (Grün et al., 1999), where the bin width is chosen large enough to capture the jittered spike pattern. However, the spikes of the pattern may be split into adjacent bins with a probability that depends on the jitter, bin size, and pattern size. Therefore, the original synchronous events are destroyed, leading to increased FN rates (Grün et al., 1999). In **Figure 9** we show how this effect can have a substantial impact on the performance of the method. We applied PSF followed by PSR (combined filtering) on data where synchronous patterns are injected with a jitter of ±1 ms, and analysed with a bin width of *w* = 3 ms (left column) and *w* = 5 ms (right column). The performance drops considerably due to an increase of the FP rate for higher *z* and *c*, and an overall increase of the FN rate. The performance is slightly better for a bin width of 5 ms. Consistent with these findings, Grün et al. (1999) showed that for two parallel spike trains about 60% of the synchronous events are lost if the bin width corresponds to the jitter width. An earlier modification of exclusive time binning [multiple shift method, Grün et al., 1999] that avoids the splitting of jittered synchrony was not trivially applicable to large numbers of parallel spike trains. In Picado-Muiño et al. (submitted) we demonstrate how to implement a method for pattern detection based on the inter-spike distances rather than discrete time binning. 
This approach successfully detects jittered spike patterns and, in the context of PSF, achieves a performance similar to that obtained in the absence of jitter (see Picado-Muiño et al., submitted, for details). It also complements the PSR framework presented in this study. We therefore suggest detecting jittered synchrony with the continuous detection method and then performing the analysis with the proposed sequence of FIM, PSF, and PSR.

A further scenario that remains to be addressed in the future is unreliability in spiking activity that causes neurons to selectively skip participation in assembly activations. This scenario was discussed in the context of the synfire chain model, where it was shown that synchronous spike packages propagate stably through the network even though the probability that individual neurons participate in each activation of the synfire chain is lower than 1 (Diesmann et al., 1999). Selective participation may arise as a consequence of synaptic failure. The multiple interaction process [MIP; Kuhn et al., 2003] was proposed as a stochastic model implementing such behavior. Our method would interpret the variable composition of spikes in a single MIP event as occurrences of multiple SIP events of lower support.

We conclude with a discussion of the practical implementation of the proposed analysis on data from electrophysiological recordings. Given a set of parallel spike recordings obtained at a resolution (i.e., binning) *w*, we choose the minimum pattern size *z*<sup>0</sup> and the minimum pattern support *c*<sup>0</sup> of the analysis. First, the spike data is binned and, using FIM, the CFISs and the corresponding pattern signatures are obtained from the transaction list. While this approach is feasible for the experimental data available today, with several hundreds of parallel recordings the computational effort may become too large. In this scenario, we suggest to pre-filter the data entering the analysis as suggested by Berger et al. (2010) before applying FIM on the reduced set of neurons. To monitor dynamic changes in the correlation structure of the activity, e.g., if assemblies are time locked to a particular behavioral event, one may choose to additionally perform the analysis in sliding windows.
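The binning step that turns a set of parallel spike trains into the transaction list fed to FIM can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and the data layout (integer spike times at the data resolution, one list per neuron) are our own assumptions:

```python
def bin_spike_trains(spike_trains, n_bins, w):
    """Discretize parallel spike trains into bins of width w (in sampling
    steps) and build the FIM transaction list: one transaction per time bin,
    containing the ids of all neurons that spiked in that bin.
    spike_trains: list with one list of integer spike times per neuron."""
    transactions = [set() for _ in range(n_bins)]
    for neuron_id, times in enumerate(spike_trains):
        for t in times:
            b = t // w                  # bin index of this spike
            if b < n_bins:
                transactions[b].add(neuron_id)
    return [sorted(tr) for tr in transactions]
```

Each transaction then serves as one item set for FIM; bins containing fewer than *z*<sup>0</sup> spiking neurons can be discarded before mining, since they cannot contain a pattern of the minimum size.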

Next, the significance of the observed patterns is evaluated by PSF under the null-hypothesis of full independence, implemented by uncorrelated surrogate data. For experimental data, several techniques for surrogate generation based on stochastic sampling have been proposed in the past [for a review, see Grün, 2009]. Surrogates that preserve the firing rate profiles, such as spike dithering, seem most appropriate since PSF determines pattern significance based on the firing rates. Given the significance level α and *m* detected pattern signatures, a minimum of *K* = *m*/α surrogates is required to achieve the Bonferroni-corrected significance level α<sup>∗</sup> = α/*m*. Once the surrogates have been generated, we follow the procedure described for the simulated data. CFISs, pattern signatures and the resulting binary pattern spectrum are obtained for each surrogate run. Next, the *p*-value spectrum is obtained as an average of the binary spectra (see Section 2.2). The signatures whose *p*-values do not exceed the Bonferroni-corrected significance level α<sup>∗</sup> are marked as significant, and the CFISs of significant signatures are collected into the class *P* of potential assemblies. Finally, PSR with combined filtering is performed to reduce *P* to a subclass *Q* of patterns which are mutually significant with respect to each other.
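The surrogate bookkeeping described above can be made concrete with a small sketch. The helper names are hypothetical; we only assume the two relations stated in the text, *K* = *m*/α and α<sup>∗</sup> = α/*m*:

```python
import math

def surrogates_needed(m, alpha=0.05):
    """Minimum number K of surrogate data sets so that the smallest
    attainable p-value (1/K) reaches the Bonferroni-corrected
    level alpha* = alpha / m for m detected pattern signatures."""
    return math.ceil(m / alpha)

def significant_signatures(p_values, m, alpha=0.05):
    """Mark signatures whose estimated p-value stays below alpha* = alpha/m.
    p_values: dict mapping a signature (z, c) -> p-value from the
    surrogate-based p-value spectrum."""
    alpha_star = alpha / m
    return {sig for sig, p in p_values.items() if p < alpha_star}
```

For example, with α = 0.05 and *m* = 100 detected signatures, at least 2000 surrogate data sets are needed before any signature can reach significance.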

In summary, the use of FIM combined with the statistical tests described in this study and in Picado-Muiño et al. (submitted) represents a powerful tool to extract candidate assemblies from experimental data. The method is statistically rigorous, computationally feasible, robust against heterogeneity in the data, and powerful enough to deal with the limited amount of data typically available from electrophysiological experiments. We expect that our approach will help to reveal how the precise spike synchronization observed by pairwise analysis in relation to behavior (Riehle et al., 1997) is manifested at the level of neuronal populations.

## **ACKNOWLEDGMENTS**

We thank Günter Palm for stimulating discussions. This work was partially supported by the Helmholtz Alliance on Systems Biology, BrainScales (EU Grant 269912), the Helmholtz Portfolio Supercomputing and Modelling for the Human Brain (SMHB), and the Spanish Ministry for Economy and Competitiveness (MINECO Grant TIN2012-31372).

## **SOFTWARE AND SUPPLEMENTAL MATERIAL**

The FIM library underlying the Python scripts with which we carried out our experiments is available at http://www.borgelt.net/pyfim.html. Python and shell scripts for related experiments as well as more extensive result diagrams are available at http://www.borgelt.net/accfim.html and http://www.borgelt.net/cocofim.html. Please also consult http://www.spiketrain-analysis.org for these codes and further information on the analysis of parallel spike trains.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 11 September 2013; published online: 23 October 2013.*

*Citation: Torre E, Picado-Muiño D, Denker M, Borgelt C and Grün S (2013) Statistical evaluation of synchronous spike patterns extracted by frequent item set mining. Front. Comput. Neurosci. 7:132. doi: 10.3389/fncom.2013.00132*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Torre, Picado-Muiño, Denker, Borgelt and Grün. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Correlations in background activity control persistent state stability and allow execution of working memory tasks

## *Mario Dipoppa1,2\*† and Boris S. Gutkin1,3\**

*<sup>1</sup> Departement d'Etudes Cognitives, Ecole Normale Superieure, Group for Neural Theory, Laboratoire des Neurosciences Cognitives INSERM U960, Paris, France*

*<sup>2</sup> Ecole Doctorale Cerveau Cognition Comportement, Université Pierre et Marie Curie, Paris, France*

*<sup>3</sup> Centre national de la recherche scientifique, Paris, France*

#### *Edited by:*

*Robert Rosenbaum, University of Pittsburgh, USA*

#### *Reviewed by:*

*Carson C. Chow, National Institutes of Health, USA Zachary P. Kilpatrick, University of Houston, USA*

#### *\*Correspondence:*

*Mario Dipoppa and Boris S. Gutkin, Departement d'Etudes Cognitives, Ecole Normale Superieure, Group for Neural Theory, Laboratoire des Neurosciences Cognitives INSERM U960, 29 rue d'Ulm, 75005 Paris, France e-mail: m.dipoppa@ucl.ac.uk; boris.gutkin@ens.fr*

#### *†Present address:*

*Mario Dipoppa, University College London, 21 University Street, London WC1E 6DE, UK*

Working memory (WM) requires selective information gating, active information maintenance, and rapid active updating. Hence performing a WM task requires rapid and controlled transitions between neural persistent activity and the resting state. We propose that changes in the correlations of neural activity provide a mechanism for the required WM operations. As a proof of principle, we implement sustained activity and WM in recurrently coupled spiking networks whose neurons receive excitatory random background activity, with background correlations induced by a common noise source. We first characterize how the level of background correlations controls the stability of the persistent state. With sufficiently high correlations, the sustained state becomes practically unstable, so it cannot be initiated by a transient stimulus. We exploit this in WM models implementing the delayed match-to-sample task by flexibly modulating the correlation level in time at the different phases of the task. The modulation sets the network into different working regimes: more prone to gate in a signal or to clear the memory. We examine how the correlations affect the ability of the network to perform the task when distractors are present. We show that in a winner-take-all version of the model, where two populations cross-inhibit, correlations make the distractor blocking robust. In a version of the model where no cross-inhibition is present, we show that appropriate modulation of correlation levels is sufficient to block distractor access while leaving the relevant memory trace intact. The findings presented in this manuscript can form the basis for a new paradigm of how correlations are flexibly controlled by cortical circuits to execute WM operations.

#### **Keywords: correlations, background activity, working memory, spiking neural network, persistent activity**

## **INTRODUCTION**

Working memory (WM), defined as short-term storage of information that is actively used on-line to carry out actions and decisions and to drive learning, is one of the key processes that underpin our cognitive abilities. WM is characterized by an information bottleneck: resources restrict its "on-line" capacity to a relatively limited number of items at high levels of performance (Miller, 1956; Luck and Vogel, 1997; Cowan, 2001; Vogel et al., 2001), and recent experiments suggest a rapid decrease in performance with item number due to limited resource allocation (Wilken and Ma, 2004; Bays and Husain, 2008; van den Berg et al., 2012). Furthermore, by its very nature, WM is characterized by the need to operate on the stored information rapidly. Such limitations and rapid operations of WM create the need for selective gating and rapid updating, as well as active information maintenance to enable its immediate use (Frank et al., 2001). One of the central unresolved issues is how the multiple requirements of WM are carried out by brain circuits: whether the maintenance, read-in, gating, and read-out are implemented by separate systems (e.g., as suggested by Baddeley, 2003) or by operations within the same neural circuit (e.g., as recently put forward by Machens et al., 2005).

Electrophysiological data from primates performing delayed-response tasks show that persistent neuronal activity in prefrontal cortex (PFC) underlies the maintenance of WM: during the delay period between the stimulus presentation and the read-out, neurons selective to the memorized stimulus fire spikes at an elevated rate with respect to the resting state (Fuster and Alexander, 1971; Fuster and Jervey, 1981; Funahashi et al., 1989; Miller et al., 1996; Romo et al., 1999).

In order to highlight the unique requirements of WM as a neural process, let us focus on the delayed match-to-sample (DMS) task with distractors as a prototypical example (Miller et al., 1996). In this task the subject must remember the identity of a briefly shown item (the sample) and respond correctly only when the item is shown again (match), all the while ignoring other flashed items (distractors). To execute this task correctly, the neural circuitry needs to perform three operations (**Figure 1A**): first, encode and maintain in memory the sensory stimulus during the delay period; second, robustly maintain the memory in the face of distractor presentation; third, erase the memory trace at task completion to make the store available again, given the limited WM capacity. These operations translate into neural activity as follows: item-related activity is turned on rapidly and selectively by the sample stimulus, is

**FIGURE 1 | Outline of the models. (A)** Time sequence of the delay match-to-sample task for the working memory network. Active neurons are represented in full colors. Successively: (I) both populations are in a quiescent state, (II) the sample stimulus (blue arrow) activates the blue population, (III) the network prevents a distracting stimulus (red arrow) from activating the red population, (IV) the match stimulus allows the read-out of the memory encoded in the blue population, and (V) persistent activity is erased in the blue population. **(B)** Correlations in external

background activity generated by a common source of noise, in addition to independent sources of noise. **(C)** Single-unit network receiving shared and independent sources of noise. **(D)** Winner-take-all network with two competing excitatory populations coupled through one inhibitory population. In addition to independent sources, the excitatory populations receive background activity from two different common noise sources. **(E)** Two-unit network with two excitatory populations receiving shared noise.

protected from distractors during the delay period, and is rapidly turned off on response by the match.

A number of spiking network models have been conceived to describe the neural substrate for WM where persistent activity is maintained by recurrent connections that allow for co-existing attractor memory states and a ground non-memory state (Amit and Brunel, 1997; Compte et al., 2000; Brunel and Wang, 2001; Gutkin et al., 2001; Laing and Chow, 2001; Machens et al., 2005; Miller and Wang, 2006; Ardid et al., 2010). In some of these models, protection from distractors and memory clearance are performed through the recruitment of inhibition (Compte et al., 2000; Brunel and Wang, 2001; Machens et al., 2005). As an alternative to the erasing-by-inhibition paradigm, it has been shown, in a spatial WM model, that a transient excitatory stimulus matching the memory trace "location" on the network extinguishes the persistent state by transiently synchronizing the spike-times of the neurons (Gutkin et al., 2001; Laing and Chow, 2001). This work, along with Machens et al. (2005) showed how the read-out and clear-out can be merged into a single operation. However, in these alternative frameworks, protection from distractors, or selective gating, was not addressed. Here we propose that the gating is obtained by flexibly controlling the spike-time structure of the WM network activity. In support of this idea, it has been shown that spike-time synchronization is modulated in association with cognitive processing (Abeles et al., 1993; Riehle et al., 1997; Funahashi and Inoue, 2000) and in particular in WM (Sakurai and Takahashi, 2006; Pipa and Munk, 2011).

Critically, the WM trace appears in the context of on-going background activity. While background activity is not related to task parameters, it is not without structure. Correlations have been found broadly in spontaneous neural activity in the cortex (Tsodyks et al., 1999). In particular, it has been shown that nearby neurons receive common inputs from afferent neurons, which makes their voltages correlated (Lampl et al., 1999). The effects of correlations have been widely studied: on the population code (Salinas and Sejnowski, 2001), as a means to measure network connectivity (Aertsen et al., 1989; Cocco et al., 2009), and on the neural dynamics of coupled neurons (Ly and Ermentrout, 2009) and of multiple independent neurons (Galán et al., 2006; Moreno-Bote et al., 2008).

In computational models of WM, background activity has largely been seen as problematic for memory maintenance. For example, one of the more sensitive technical issues addressed by several computational proposals is how to stabilize the WM trace in the face of random background activity (Compte et al., 2000). The benefits of external input correlations for persistent activity in recurrent networks have only recently started to be addressed theoretically (Buice et al., 2010; Polk et al., 2012). For the specific case of line-attractor networks (modeling parametric WM), Polk et al. (2012) showed in a detailed analysis how properly tuned input noise correlations can promote the stability of the persistent firing rate. This was further noted by Lim and Goldman (2012), who also showed that the correlation structure of background noise can suggest the optimal architecture of neural networks for short-term memory performance.

In this article we examine the influence of input correlations on recurrent spiking networks, finding that the correlation level may in fact destabilize the persistent activity state, rendering it a slow transient state. Buice et al. (2010) used a path integral approach to integrate the effects of correlations and synchronization into a rate model of recurrent networks and examined the stability of the persistent state. For a bistable firing-rate network they noted that transient increases in input correlations (a synchronizing noise input) can lead to a turnoff of the persistent activity. This approach may provide an analytical framework for the observations we make in the present manuscript for recurrent spiking networks and for the correlation-based control of the persistent state lifetime. In this manuscript we also go beyond noting that input correlations define the lifetime of persistent activity; we show that input correlations can effectively control access to WM by disallowing transient stimuli to initiate persistent activity. The functional consequences of these two effects are the central topic of this work.

To demonstrate that controlling the correlation-driven synchronization of the background activity makes it possible to control the lifetime of the persistent state, to manipulate selectively the transitions in sustained activity, and consequently to perform the required operations of the WM task, we first consider a minimal recurrent network. In this recurrent network the neurons receive excitatory random background noise, and background correlations are induced by a common noise source. We then implement a discrete-item WM model where the modulation of the background correlation level sets the network into different regimes allowing for loading of the memory, protection from distractors, and memory persistence. In addition we show that the read-out and the clearance can be merged into a single operation, since the presentation of the match stimulus can directly quench the persistent activity.

## **MATERIALS AND METHODS**

#### **NEURAL MODELS**

In this work we study recurrent spiking networks that show bistability between a ground state and an active persistent spiking state. Our goal is to construct and analyze a minimal network capable of showing the required bistability. Hence we consider networks of recurrently connected excitatory pyramidal neurons. The elements of the network are represented by non-linear "point" neurons that are sparsely connected by instantaneous excitatory recurrent synapses. The dynamics of a neuron's membrane potential *v* is described by the Quadratic Integrate and Fire (QIF) equation, which represents the normal form of type 1 spike generating dynamics (Ermentrout, 1996):

$$
\tau \frac{dv}{dt} = v^2 - b^2 + I\_{\text{syn}}(t) \tag{1}
$$

$$v(t) = V\_t \Rightarrow v(t) \to V\_r \tag{2}$$

where τ represents the membrane time constant, −*b* is the resting potential, *I*syn(*t*) the synaptic input current, *Vt* a spike threshold, and *Vr* the reset membrane potential. The voltage of the neuron is scaled such that *v* is a non-dimensional variable. When the membrane potential attains the threshold value *v* = *Vt*, a spike is emitted and a post-synaptic current (PSC) is transmitted to each output neuron. We set the parameters as follows: *Vr* = −20, *Vt* = 20, *b* = 1 and τ = 20 ms.
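As a minimal illustration of Equations (1)–(2), the QIF dynamics with the stated parameters can be integrated with Euler steps of *dt* = 0.1 ms, the scheme used for all numerics in this work. This sketch is ours: it takes an arbitrary per-step input array rather than the full synaptic model of Equations (3)–(4):

```python
def simulate_qif(I_syn, tau=20.0, b=1.0, v_r=-20.0, v_t=20.0, dt=0.1):
    """Euler integration of the QIF neuron tau*dv/dt = v^2 - b^2 + I_syn(t).
    I_syn: sequence of input current values, one per time step of size dt (ms).
    Returns the list of spike times (ms). Parameters as in the text."""
    v = -b                      # start at the resting potential -b
    spikes = []
    for n, I in enumerate(I_syn):
        v += dt / tau * (v * v - b * b + I)
        if v >= v_t:            # threshold crossing: emit a spike and reset
            spikes.append(n * dt)
            v = v_r
    return spikes
```

With zero input the neuron stays at the resting potential −*b*; a sufficiently strong constant input drives repetitive firing.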

The input current to a given cell in the network is decomposed into three different components:

$$I\_{\rm syn}(t) = I\_r(t) + I\_s(t) + I\_{\rm ba}(t) \tag{3}$$

where *Ir*(*t*) represents the recurrent input due to other neurons in the network, *Is*(*t*) represents the input from external stimuli directed to the network, and *I*ba(*t*) represents a non-specific background activity. Each of the three currents corresponds to a sum of PSCs originating from synaptic inputs generated by the presynaptic neurons at times *tn*. The PSCs are modeled with delta pulses:

$$I(t) = \sum\_{a} \sum\_{\{t\_n\}} J\_a \tau \delta(t - t\_n) \tag{4}$$

where *Ja* represents the synaptic strength for a given connection and could be positive (corresponding to an AMPA synapse) or negative (corresponding to a GABA synapse).

#### **BACKGROUND ACTIVITY AND CORRELATIONS MEASURES**

Ample data shows that cortical neurons receive a large amount of non-specific cortical and subcortical inputs whose structure is not directly related to the specific task and stimulus [e.g., see Shadlen and Newsome (1994) and summary of data in Amit and Brunel (1997)]. We refer to this type of input as an external background activity. It is taken to be composed of sequences of excitatory PSCs of synaptic strength *J*<sup>0</sup> and with the synaptic times generated by a Poisson process. The synaptic currents are depolarizing in accordance with the notion that cortical neurons receive inputs from long-range excitatory glutamatergic projections.

In our model, this background activity can be either unstructured (uncorrelated) or structured (correlated). The correlation level, between two spike trains *Si*(*t*) and *Sj*(*t*) is given by:

$$
\lambda\_{ij} = \frac{1}{\langle S\_i(t) \rangle} \int \text{CCVF}\_{ij}(s) \, ds \tag{5}
$$

where CCVF corresponds to the cross-covariance function (Brette, 2009). This function is normalized to zero if *Si*(*t*) and *Sj*(*t*) are generated by independent Poisson processes.

We consider two ways for constructing the background activity:

#### *Uncorrelated background activity*

All *N* neurons receive spike trains generated by *N* independent channels with rate ν0. This leads to CCVF(*s*) = 0 and thus the correlation level is λ*ij* = 0.

#### *Correlations induced by a common source of noise (Figure 1B)*

All the *N* neurons receive inputs both from independent channels, with rate (1 − λ)ν<sup>0</sup>, and from a common channel, with rate λν<sup>0</sup>, where 0 ≤ λ ≤ 1. Each channel generates a spike train with Poisson statistics. The average background input rate is ν<sup>0</sup> for each neuron. The cross-covariance function is then CCVF(*s*) = λ⟨*Si*(*t*)⟩δ(*s*) and the correlation level is λ*ij* = λ. This gives purely spatial correlations.
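The common-source construction can be sketched as follows. This is our own illustrative implementation (the helper name and the use of NumPy are assumptions; refractoriness and synaptic dynamics are ignored):

```python
import numpy as np

def correlated_poisson_inputs(n_neurons, rate, lam, t_stop, rng):
    """Background spike trains with pairwise correlation level lam:
    each neuron receives an independent Poisson train of rate (1-lam)*rate
    plus a common Poisson train of rate lam*rate shared by all neurons.
    Times are in seconds; the mean rate per neuron is always `rate`."""
    def poisson_train(r):
        n = rng.poisson(r * t_stop)          # number of spikes in [0, t_stop)
        return np.sort(rng.uniform(0.0, t_stop, n))
    common = poisson_train(lam * rate)       # shared channel
    return [np.sort(np.concatenate([common, poisson_train((1.0 - lam) * rate)]))
            for _ in range(n_neurons)]
```

For λ = 1 all trains are copies of the common channel; for λ = 0 they are fully independent, while the mean rate stays the same in every case.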

We measure the correlation level of the synaptic input among cells in the network with the mean Pearson correlation coefficient. We first compute a running mean (averaged over a time window of 5 ms) of the synaptic input *I*<sup>*i*</sup><sub>*a*</sub>(*t*) of each cell during a certain interval of time. Then we compute the Pearson correlation between the synaptic inputs of two cells:

$$\rho\_{ij} = \frac{\text{cov}(I\_a^i, I\_a^j)}{\sigma(I\_a^i)\sigma(I\_a^j)} \tag{6}$$

Finally we compute the average over all the cell pairs of the network, ρ = [2/(*N*(*N* − 1))] Σ<sub>*i*<*j*</sub> ρ<sub>*ij*</sub>. In particular, in **Figure 4**, we performed this measure for the recurrent input (*a* = *r*) and the background input (*a* = *ba*).
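The measure of Equation (6), averaged over pairs, can be sketched as below (a hypothetical helper, assuming the input currents are sampled on a regular time grid so that the 5 ms running mean becomes a fixed-length moving average):

```python
import numpy as np

def mean_pairwise_correlation(currents, win):
    """Mean Pearson correlation over all cell pairs, as in Eq. (6).
    currents: (N, T) array of synaptic input per cell and time step.
    win: running-mean window length in time steps (5 ms in the text)."""
    kernel = np.ones(win) / win
    smoothed = np.array([np.convolve(c, kernel, mode='valid') for c in currents])
    r = np.corrcoef(smoothed)                 # N x N Pearson matrix
    iu = np.triu_indices(len(currents), k=1)  # upper-triangle pairs i < j
    return r[iu].mean()                       # average over N(N-1)/2 pairs
```

Two identical input traces give a mean correlation of 1, and sign-inverted traces give −1, as expected for the Pearson coefficient.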

#### **FUNCTIONAL NETWORK STRUCTURES IMPLEMENTING WM TASKS**

In this work we study three different networks. We start out by studying a homogeneous network of recurrently coupled excitatory neurons. This network can also be thought of as encoding a single item of WM: a "single-unit network". The second model consists of two homogeneous excitatory networks coupled together through a population of inhibitory neurons: a "winner-take-all network" of two discrete competing short-term memory items. The third model is made up of two recurrent excitatory populations without mutual connections: a "two-unit network".

#### *Single-unit network*

A homogeneous network with *N* = 100 identical sparsely coupled neurons is represented in **Figure 1C**. Each neuron in the network receives synaptic inputs from *cN* other excitatory neurons, where *c* = 0.2 is the probability of connection, and *J* = 0.26 is the recurrent synaptic strength [described in Equation (4)]. Neurons also receive excitatory inputs from external background activity, with synaptic strength *J*<sup>0</sup> = 0.151 and firing rate ν<sup>0</sup> = 106 Hz, and from external sensory stimuli, with synaptic strength *J*<sup>1</sup> = 1.5 and firing rate ν<sup>1</sup> = 56 Hz for a duration of 50 ms, as will be described hereafter. Parameters of the network are chosen such that the network sustains a quiescent state, with low firing rate (*f* < 5 Hz), and a persistent state, with high firing rate (≈20 Hz).

#### *Winner-take-all network*

The second model is a reduced version of the network proposed by Amit and Brunel (1997) (**Figure 1D**). The network is composed of two excitatory populations and one inhibitory population. Each of the two excitatory populations has *NE* = 40 neurons, and the third population is made up of *NI* = 20 inhibitory neurons. An excitatory neuron receives synaptic inputs from *c*EE*NE* (*c*EE = 0.45) neurons of the same population, with synaptic strength *J*EE = 0.3, and from *c*EI*NI* (*c*EI = 0.35) inhibitory neurons with synaptic strength *J*EI = −0.25. An inhibitory neuron receives synaptic inputs from *c*IE*NE* (*c*IE = 0.34) excitatory neurons of each excitatory population, with synaptic strength *J*IE = 0.05. Other parameters of the network are: *J*<sup>0</sup> = 0.4, *J*<sup>1</sup> = 1.5, ν<sup>0</sup> = 60 Hz, and ν<sup>1</sup> = 17 Hz. In addition, we augment the mutual-inhibition network with the ability to control the amount of correlated noise in each excitatory population: each excitatory population receives background activity from its own common noise source in addition to independent sources. In this way the correlation level λ is regulated independently in each excitatory population.

#### *Two-unit network*

We devised a third version of our network models that is made of two independent excitatory populations, each making recurrent connections with itself. Both populations share a common excitatory noise source projecting simultaneously to all excitatory neurons, in addition to the independent uncorrelated background noise (**Figure 1E**). Since the common noise source is shared between the two populations, the correlation level λ varies equally in both excitatory populations. The parameters of each excitatory population are those given for the single-unit network, except that we used a larger network (*N* = 1000) and scaled the recurrent synaptic strength accordingly (*J* = 0.026).

#### **DELAYED MATCH-TO-SAMPLE TASK**

We study the spike-timing based mechanisms able to implement the DMS task (**Figure 1A**). The sequence of operations and the neural dynamics aim to reproduce the experimental results of Miller et al. (1996). For illustrative purposes, the discrete items can be viewed as corresponding to colors. The activation of one excitatory population encodes the color blue (we call it population *B*), while the other encodes the color red (population *R*). If both populations are in a quiescent state, the state of the network represents the absence of color information. For simplicity we represent the spontaneous state as the quiescent state (average firing rate ≈0 Hz).

During the task, the animal has to maintain a memory of an item (a color) during the delay period. In terms of neural activity, the corresponding excitatory population should be activated and maintained in a persistent state. Additionally, the model should protect the memory from the presentation of a distractor stimulus. At task completion, after the decision, the system should rapidly erase the memory, i.e., the persistent activity should be deactivated to its quiescent state.

To establish that a network performs a WM task correctly we require it to perform all the operations of the task. The first operation, *load*, corresponds to loading the memory by the sample signal, and corresponds to *B* that is activated in a persistent state while *R* is in a quiescent state. The second operation, *protect*, corresponds to the maintenance of the blue item memory in the face of the distractor presentation. In terms of activity it corresponds to *B* maintained in the persistent state and *R* that is not activated to the persistent state even when the red stimulus is presented during the delay period. In networks (**Figure 1E**) where population *B* and population *R* are not connected, the operation *protect* can be separated in two independent sub-operations: *maintain* (maintain item memory in population *B*) and *block* (prevent activation of population *R*). The third operation, *clear* corresponds to the clearance of the memory encoded in the network. This is equivalent to the erasing of the persistent activity in the network. Note that in this work we do not focus explicitly on the read-out mechanism following the presentation of the match stimulus.

In particular, in the winner-take-all network (resp. the two-unit network), operation *load* is executed with success if the sample stimulus activates population *B*. This is measured before distractor presentation during 350–450 ms (resp. 350–450 ms): ν*<sup>B</sup>* > 5 Hz and ν*<sup>R</sup>* < 5 Hz, where ν*<sup>B</sup>* and ν*<sup>R</sup>* denote the average population firing rates of the blue and red populations, respectively. Operation *protect* is executed with success if population *B* maintains the persistent state and population *R* is not activated. This is measured before match presentation during 750–850 ms (resp. 700–800 ms): ν*<sup>B</sup>* > 5 Hz and ν*<sup>R</sup>* < 5 Hz. Operation *clear* is executed with success if population *B* is deactivated at task completion. This is measured in an interval after match presentation, during 1150–1250 ms (resp. 1050–1150 ms): ν*<sup>B</sup>* < 5 Hz and ν*<sup>R</sup>* < 5 Hz.

#### **NUMERICAL ANALYSIS**

All the numerical results are obtained by algorithms run in Python. The differential equations are integrated with Euler steps of *dt* = 0.1 ms. The mean population firing rate *f* is computed as an average over the population in 10 ms time bins.

Data points for networks and associated error bars are computed by averaging over simulated individual network realizations. We generated random connectivity matrices such that every neuron receives the same number of input connections. Unless otherwise stated, for each of 30 random realizations of the network connectivity matrix we computed the average over 100 realizations of background activity and stimuli.

## **RESULTS**

## **EFFECTS OF CORRELATIONS ON PERSISTENT ACTIVITY STATE IN THE SINGLE-UNIT NETWORK: ERASING AND BLOCKING THE MEMORY TRACE**

We examine how correlations in the background activity control selective persistent activity in WM networks. Hence we start out by analyzing how background correlations affect the transitions between the quiescent and self-sustained states in our network model.

Correlations in background activity are generated by the addition of a common noise source to independent stochastic channels (see **Figure 1B**). By changing the relative firing rate of the common source with respect to the independent channels we control the correlation level λ. We define two different protocols, represented in **Figure 2A**. In the first protocol, the correlation level is increased instantaneously from λ = 0 to some value λ > 0 at 500 ms. Given that the stimulus activates the persistent state, this protocol allows us to test the effect of the correlations on the probability that the active state is erased; we refer to it as the erasing protocol. In the second protocol, the correlation level is set to λ > 0 at all times, already before the transient stimulus appears. In this way it is possible to see the effect of correlations on blocking the ability of the stimulus, presented during 50–100 ms, to initiate the persistent state; we refer to it as the blocking protocol.

We first demonstrate the two prevalent effects of correlations: control of the active memory state and control of access to the memory. In an example of the erasing protocol the excitatory stimulus activates the network into a persistent state; at 500 ms correlations are increased and the persistent state is disrupted (**Figure 2A**). In an example of the blocking protocol the excitatory stimulus is not able to activate the persistent state (see **Figure 2B**). In order to understand how these effects depend on the activity parameters we ran a large number of simulations in which we injected background activity with different correlation levels 0 ≤ λ ≤ 1 into networks with different connection probability *c*, to measure how this effect depends on the network architecture (**Figure 2C**). We compared networks with the same scaled synaptic strength *J* such that *cJN* = const. = 5.2. In the erasing protocol we estimated the erasing probability *Pe*(*c*, λ), defined as the probability for the network to have a firing rate ν < 5 Hz in the interval 800–900 ms. We discarded trials in which the network was not in a persistent state (ν > 5 Hz during 400–500 ms). In the blocking protocol we estimated the probability that the correlations block the stimulus: the blocking probability *Pb*(*c*, λ), defined as the probability for the network to have a firing rate ν < 5 Hz in the interval 400–500 ms. This can also be seen as a gating of the persistent activity. We observe that for both protocols, increasing either *c* or λ disrupts the persistent state: in the first case by erasing it and in the second case by blocking its activation.
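The two firing-rate criteria above can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: it applies both criteria to a single set of population-rate traces (one value per 10 ms bin), whereas in the paper the erasing and blocking protocols are run as separate sets of trials; the window conventions are assumptions.

```python
import numpy as np

def erase_and_block_probabilities(rate_traces, bin_ms=10.0):
    """Estimate erasing and blocking probabilities from population-rate
    traces (trials x bins). A trial counts as 'erased' if the rate is
    below 5 Hz during 800-900 ms, given that it was persistent (above
    5 Hz) during 400-500 ms; it counts as 'blocked' if the rate is
    below 5 Hz during 400-500 ms."""
    def window(trace, t0, t1):
        return trace[int(t0 / bin_ms):int(t1 / bin_ms)]

    persistent = np.array([(window(tr, 400, 500) > 5.0).all()
                           for tr in rate_traces])
    erased = np.array([(window(tr, 800, 900) < 5.0).all()
                       for tr in rate_traces])
    # Pe conditions on the trial having been persistent; Pb does not
    p_e = erased[persistent].mean() if persistent.any() else np.nan
    p_b = (~persistent).mean()
    return p_e, p_b
```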

We next assessed how the network size influences the stability of the persistent state under the various background activity regimes (**Figure 2D**). We compared networks of different size *N* with an equal average synaptic input *cJN* = const. = 5.2. We measured both the erasing probability *Pe*(*N*, λ) and the blocking probability *Pb*(*N*, λ) as a function of λ. We observe that both probabilities increase with the network size *N*, and that both for fixed *c* and for fixed *N*, *Pb* > *Pe*. Finally, we studied the probabilities *Pe* and *Pb* as a function of *N*, fixing both *J* = 0.26 and the number of inputs that each neuron receives, i.e., *cN* = const. = 20. We computed these probabilities averaging over 500 trials. We found that with such a scaling both *Pe* and *Pb* are approximately constant (**Figure 2E**).

In order to determine whether the optimal stimulus parameters for loading a memory (i.e., activating a persistent state) depend on the correlation strength, we measured the loading probability (1 − *Pb*) as a function of the stimulus strength ν1 for different values of λ (**Figure 3A**); we computed the probabilities of **Figure 3** averaging over 300 trials. Different values of λ change the amplitude of (1 − *Pb*) but do not shift the tuning with respect to ν1. We also found that (1 − *Pb*) has two peaks: one at about ν1 ≈ 20 Hz and another at about ν1 ≈ 50 Hz. To test whether the positions of the two peaks depend on the recurrent network properties we measured the loading probability as a function of ν1 for different values of the recurrent synaptic strength *J* (**Figure 3B**). Similarly to the previous results, different values of *J* change the amplitude of (1 − *Pb*) but do not shift the peaks of the curves with respect to ν1. In summary, this indicates that the strength of the stimulus required to activate the persistent state with a given probability does depend on the background correlations, and yet the tuning is rather broad.

To further investigate the effect of correlations on the stability of the persistent state we determined the lifetime of the sustained activity and the level of correlations prior to the erasing time. We defined the end of the persistent state *t*stop (magenta vertical line, **Figure 4A**) as the first period of 10 ms (after the correlation onset) during which the firing rate of the network falls below 5 Hz. Noticing that in most of the trials a peak of activity preceded the erasing of the persistent state, we defined the time of such a peak *t*peak (black vertical line, **Figure 4A**) as the last period of 10 ms before *t*stop during which the firing rate attains a local maximum (in time) exceeding 20 Hz. For each trial in which the persistent state was not erased before the onset of correlations *t*corr = 800 ms (red vertical line, **Figure 4A**), we determined the interval *t*c.p. = *t*peak − *t*corr and the interval *t*p.s. = *t*stop − *t*peak. We performed this protocol for three different values of the correlation level: λ = {0.3, 0.6, 0.9} (**Figure 4B**). We found that the distribution of *t*c.p. decreases with time for all values of λ. When the level of correlations is larger (**Figure 4B**, top) the probability of reaching the peak earlier in time slightly increases with λ. Furthermore, we found that the interval between the peak of activity and the erasing of the activity in the network is narrowly distributed in time. Finally, this interval is independent of the correlation level, meaning that the correlations do not have a strong effect on this timing (**Figure 4B**, bottom). We computed these distributions averaging over 500 trials.


**FIGURE 3 | (A)** Probability of activation as a function of the stimulus strength (ν1) for different values of the correlation level (λ). **(B)** Probability of activation as a function of the stimulus strength (ν1) for different values of the recurrent synaptic strength (*J*).

The mean Pearson correlation coefficient ρ (see Materials and Methods) of the synaptic input in the network during the interval with uncorrelated background activity was compared with that during the interval with correlated background activity just preceding the peak. Only trials where *t*peak − *t*corr > 100 ms were considered. The interval with uncorrelated background activity is defined as the 100 ms preceding *t*corr (gray shaded area, **Figure 4A**). The interval with correlated background activity is defined as the 100 ms preceding *t*peak (red shaded area, **Figure 4A**). We computed ρ for the background input (red lines, **Figure 4C**) and for the recurrent input (black lines, **Figure 4C**), both for the uncorrelated interval (dashed lines) and for the correlated interval (continuous lines), when λ = 0.6. We found that during the correlated interval ρ is smaller for the recurrent input than for the background input. However, ρ of the recurrent input is larger during the correlated interval than during the uncorrelated interval. Interestingly, we found that during the correlated interval, while the correlation coefficient of the background input increases with λ (**Figure 4D**, bottom), the correlation coefficient of the recurrent input remains approximately equally distributed when λ is changed (**Figure 4D**, top). This suggests that the network has reached the maximal amount of sustainable correlations before turning off.
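The quantity ρ, the mean pairwise Pearson correlation coefficient over a 100 ms window, can be sketched as follows (a minimal sketch; the exact definition, e.g., the sampling of the input traces, is given in Materials and Methods and is assumed here):

```python
import numpy as np

def mean_pairwise_pearson(inputs):
    """Mean Pearson correlation coefficient over all distinct pairs of
    input time series (rows = neurons, columns = time samples)."""
    c = np.corrcoef(inputs)                  # full pairwise matrix
    iu = np.triu_indices(len(inputs), k=1)   # upper triangle, no diagonal
    return c[iu].mean()
```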

**FIGURE 4 | Persistent state suppression is preceded by an increase of recurrent correlation. (A)** Timing outline of erasing persistent activity. Mean recurrent input (blue trace), mean background input (red trace), and external input (black trace) are represented together with the time at which activity is erased (*t*stop, magenta vertical line), the time of the last peak of activity before erasing (*t*peak, black vertical line), and the time of the onset of the correlations in background activity (*t*corr, red vertical line). The intervals of uncorrelated and correlated background activity during which the correlation coefficient is measured in panels **(C)** and **(D)** are represented with gray and red shaded areas, respectively. **(B)** Distribution of the interval *t*c.p. = *t*peak − *t*corr (top) and distribution of the interval *t*p.s. = *t*stop − *t*peak (bottom) for different values of λ. **(C)** Mean Pearson correlation coefficient (ρ) of the recurrent input (black traces) and of the background input (red traces), computed during the uncorrelated interval (dashed traces) and the correlated interval (continuous traces). **(D)** Coefficient ρ of the recurrent input computed during the correlated interval for different values of λ (top). Same analysis for the background input (bottom).

To understand whether the persistent activity deactivation is caused by an increase of spike synchrony we tracked the synchrony of the spike times using the multivariate SPIKE-distance measure *S* (Kreuz et al., 2013) (**Figure 5A**). The spike synchrony is given by 1 − *S*, spanning values between 0 (no synchrony) and 1 (perfect synchrony). We compared the average spike synchrony during two intervals, similarly to **Figure 4**: the first interval corresponds to the 100 ms preceding the start of correlated background activity and the second interval corresponds to the 100 ms preceding the last peak of activity before the deactivation of the persistent activity (provided that the onset of this last interval does not precede the start of the correlated background activity). The distribution of the average value of 1 − *S* (computed over 2000 trials) during these two intervals shows a weak increase of spike synchrony preceding the persistent activity deactivation with respect to the case of uncorrelated background activity (**Figure 5B**). This weak increase could indicate that a few spike coincidences might be the cause of the persistent activity turning off.
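To give a feel for this kind of measure, here is a crude coincidence-based synchrony index. It is explicitly *not* the SPIKE-distance of Kreuz et al. (2013), which is parameter- and time-scale-free and is available in ready-made implementations (e.g., the PySpike library); the ±2 ms coincidence window and the pairwise averaging here are illustrative assumptions.

```python
import numpy as np

def coincidence_synchrony(spike_trains, window=2.0):
    """Fraction of spikes that have a coincident spike (within
    +/- window ms) in the other train, averaged over all pairs.
    Ranges from 0 (no synchrony) to 1 (perfect synchrony)."""
    vals = []
    n = len(spike_trains)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = spike_trains[i], spike_trains[j]
            if len(a) == 0 or len(b) == 0:
                continue
            # for each spike in a, distance to the nearest spike in b
            d = np.min(np.abs(a[:, None] - b[None, :]), axis=1)
            vals.append(np.mean(d <= window))
    return float(np.mean(vals)) if vals else 0.0
```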

## **EFFECTS OF BACKGROUND ACTIVITY CORRELATIONS IN A WINNER-TAKE-ALL NETWORK**

We show here that appropriately modulating the correlation level of the background activity in space and time in a network performing a WM task significantly improves correct execution of all the required operations: *load*, *protect*, and *clear*.

We compared two different versions of the winner-take-all network, each made of two excitatory populations *B* and *R*, representing the colors blue and red, respectively. The two populations interact via a third population of inhibitory neurons that creates a winner-take-all mechanism. The two versions differ in that the first receives only uncorrelated background activity, while in the second each excitatory population also receives background activity from a distinct common noise source (**Figure 1D**).

We fixed the stimulus sequence as follows: during 50–150 ms a sample blue stimulus excites population *B*; during 450–550 ms a distractor red stimulus excites population *R*; during 850–950 ms a match blue stimulus excites population *B* again. The network must load the blue item in memory, protect the memory during presentation of the red item, and clear the memory after the match presentation.
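The stimulus sequence just described can be written as a small schedule, for instance as follows (an illustrative data structure, not the authors' code):

```python
# DMS task stimulus schedule (times in ms), as described above:
# sample to B, distractor to R, match to B.
SCHEDULE = [
    ("sample",     "B",  50.0, 150.0),
    ("distractor", "R", 450.0, 550.0),
    ("match",      "B", 850.0, 950.0),
]

def active_stimuli(t):
    """Return the (stimulus, population) pairs active at time t (ms)."""
    return [(name, pop) for name, pop, t0, t1 in SCHEDULE if t0 <= t < t1]
```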

Brunel and Wang (2001) pointed out that in order to perform the DMS task correctly, the distractor stimulus strength needs to be controlled with care: above a certain strength the persistent memory trace is perturbed by the distractor. For our case we assume that all sensory stimuli in the task have the same strength. As a preliminary test we confirm that, in the absence of background correlations, the network without a common noise source does not perform efficiently when the stimuli are too strong, as already reported for the reference network described by Brunel and Wang (2001). In the example shown in **Figure 6A** the distractor activates *R* and, via the inhibitory population, the persistent state in *B* is deactivated, leading to a failure of the operation *protect*.

We then consider the network represented in **Figure 1D**, which allows the correlation level λ to be modulated independently in each excitatory population. The network initially receives uncorrelated background activity (λ = 0). After the first item has been loaded, the correlation level in the background activity of the non-activated population *R* is increased to λ = 0.9. After the match stimulus has been presented, the correlation level in population *B* is also increased to λ = 0.9.

We show an example of the network executing the WM task where the correlation level is modulated independently in the excitatory populations (**Figure 6B**). In this particular example we illustrate a trial where the network performs the required operations of the WM task (compare with **Figure 1D**, and see below for statistics across trials). The distractor excites *R* only transiently, such that the excitation does not last long enough to disrupt the activity in *B*; moreover, as shown below, this holds also for strong distractor stimuli. Therefore the operation *protect* has been executed successfully and the memory is maintained. At the end of the match stimulus the persistent activity is disrupted also in population *B*, caused by the increase of λ in that population too. Therefore the operation *clear* is executed successfully and the memory is erased from the network. This example illustrates that the success of the operations *protect* and *clear* in the network with correlations is not due to the presence of inhibition, as it was in the model of Brunel and Wang (2001).

To get quantitative measures of performance for these two networks (with and without correlations in background activity), we analyzed the statistics of *load* and *protect* performance as a function of the stimulus intensity ν1 (and thus its strength) (**Figure 7A**). We consistently find higher *protect* performance for correlated background activity than for uncorrelated background activity throughout the whole range of ν1. In fact, the success of the *protect* operation depends only gradually on the distractor strength. On the other hand, in order to perform the operation *protect* above chance level in the network with uncorrelated background activity, distractors must be carefully adjusted to have intensity ν1 < 5 Hz. However, in this range the operation *load* is suboptimal. Hence the uncorrelated model fails in the task. This fact illustrates a recurrent problem in the protect-by-inhibition paradigm: it needs fine-tuning and achieves only low performance if the stimuli are too strong. Instead, using correlations as a mechanism to protect the activity does not require precise fine-tuning, as can be seen from the large range in which both *load* and *protect* are well above chance level. We found that the value ν1 = 4.8 Hz maximizes the joint probability of executing *load* and *protect* successfully (**Figure 7A**, vertical dashed line). We show in **Figure 7B** the probability of success of the three operations *load*, *protect*, and *clear*, finding that all of them score higher than chance level.

**FIGURE 6 | Selective correlations implemented in a working memory task.** Two competing populations network, with two item-selective excitatory populations (blue and red) and one inhibitory non-selective population (black). **(A)** Without background correlations, the distracting stimulus activates population *R* and population *B* is deactivated. **(B)** After the activation of *B* at 150 ms, a common source of noise increases the correlations in background activity (λ = 0.9) in *R*. The correlations block the activation of *R* and maintain the persistent state in *B*. After the completion of the task at 950 ms the correlations erase persistent activity in *B*. (Top) Raster plot of the neural activity in the task. (Bottom) Successively: sample stimulus to *B* (50–150 ms), distracting stimulus to *R* (450–550 ms), and match stimulus to *B* (850–950 ms).

**FIGURE 7 | (A)** *Load* and *protect* performance as a function of the stimulus intensity ν1; the vertical dashed line marks the optimal value for the network with correlations. **(B)** The performance of the network is measured on four different operations for the optimal value of ν1, including the probability of preventing memory disruption by a distracting stimulus in the protocol with correlations (protect) and the probability of erasing the memory at the end of the task (clear). All probabilities have a high value, showing that the network has good task performance, above chance.

## **IMPLEMENTING WORKING MEMORY TASK BY FLEXIBLE CORRELATIONS MODULATION**

We now go on to show that mutual inhibition is not a required mechanism for implementing the WM task. We show here that appropriately modulating the background activity correlations in time, in a network without an inhibitory population, allows correct execution of all the required WM operations: *load*, *maintain*, *block*, and *clear* (note that since the network studied here is made of two separate excitatory populations, the *maintain* and *block* components of the operation *protect* can be treated separately).

### *Network operating regimes*

In order to characterize the network performance statistics during the task we need to track three probabilities. The first probability, *P*g.o. = *PePb*, corresponds to the joint probability of the correlations deactivating a network that is in the persistent state (erase) and blocking the activation of a network that is in the quiescent state and is excited by a stimulus. When *P*g.o. dominates over the other probabilities the system is in a gate-out regime, i.e., a memory can neither be loaded nor maintained in the network. The second probability, *P*g.i. = (1 − *Pe*)(1 − *Pb*), corresponds to the joint probability that, despite the correlations, the network maintains the persistent state, if previously activated, and that the stimulus activates the persistent state when the network is in the quiescent state. When *P*g.i. dominates, the system is in a gate-in regime, i.e., a memory can be loaded and maintained in the network. Finally, the third probability, *P*s.g. = (1 − *Pe*)*Pb*, corresponds to the probability of maintaining the persistent activity in the presence of correlations while blocking the activation of a persistent state with correlated background activity when the system is in a quiescent state and is excited by a stimulus. When *P*s.g. dominates the system is in a selective-gate regime, i.e., the memory is maintained but cannot be loaded. We want to show that correlations in the background activity can selectively switch the network from the gate-in regime at the outset of the task to the selective-gate regime during the memory period. We do not consider the probability *Pe*(1 − *Pb*).
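The regime classification above can be sketched directly from the two measured probabilities (a minimal sketch of the definitions in the text; as in the text, the fourth combination *Pe*(1 − *Pb*) is not considered):

```python
def gating_regime(p_e, p_b):
    """Classify the operating regime from the erasing probability p_e
    and the blocking probability p_b at a given correlation level."""
    regimes = {
        "gate-out":       p_e * p_b,              # erase and block
        "gate-in":        (1 - p_e) * (1 - p_b),  # maintain and load
        "selective-gate": (1 - p_e) * p_b,        # maintain, but block
    }
    return max(regimes, key=regimes.get), regimes
```

For example, small *Pe* and *Pb* give the gate-in regime, while small *Pe* with large *Pb* gives the selective-gate regime.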

To obtain the network performance on the DMS task we considered the statistical results presented in **Figure 2** for the single excitatory population (*N* = 1000 and *c* = 0.2). We note that there is a difference between the erasing probability *Pe*(λ) and the blocking probability *Pb*(λ) as a function of the correlation level. In **Figure 8** we present the results for the network considered in this manuscript. We see that when *P*g.i.(λ) dominates (λ < 0.04), the system is in a gate-in regime, i.e., the memory can be loaded and maintained in the network (**Figure 8**). When *P*s.g.(λ) dominates (0.04 < λ < 0.11) the system is in a selective-gate regime. We do not consider here the gate-out regime that corresponds to *P*g.o.(λ) dominating over the other probabilities (λ > 0.11). We set the gate-in regime at λ = 0 and the selective-gate regime at λ = 0.07.

**FIGURE 8 |** Joint probabilities of (not-)erasing and (not-)blocking: *P*g.o.(λ) (black curve), *P*g.i.(λ) (blue curve), and *P*s.g.(λ) (red curve). The gate-out regime corresponds to dominance of *P*g.o.(λ) and falls in the range λ > 0.11. The gate-in regime corresponds to dominance of *P*g.i.(λ) and falls in the range λ < 0.04. The selective-gate regime corresponds to dominance of *P*s.g.(λ) and falls in the range 0.04 < λ < 0.11, with maximal value at λ = 0.07 (red star).

### *Modulation of correlation level in time*

We now show that correlations induced by a global common noise source to the whole network allow the DMS task to be executed efficiently by modulating the correlation level λ during the different phases of the task. We note that in the mutual inhibition model, at task completion, an increase in the correlation level induces the gate-out regime and erases the memory. We show here, in a two-unit model (**Figure 1E**), how the presentation of the match stimulus can directly erase the memory, thereby implementing a direct match-based suppression without requiring inhibition. In this model each of the two excitatory populations receives background activity from sources independent for each neuron and from a noise source common to all neurons.

An example of the network performing the DMS task is represented in **Figure 9A**. The stimuli are presented in the following sequence: sample stimulus to population *B* during 100–150 ms, distractor stimulus to population *R* during 450–500 ms, and match stimulus to population *B* during 800–850 ms. In the beginning the system is in the gate-in regime (λ = 0): the sample stimulus activates *B*. From 300 ms the network is set in a selective-gate regime (λ = 0.07): the distractor stimulus activates population *R* only transiently while persistent activity is maintained in population *B*. At the end of the task the match stimulus first increases the activity in *B* and then destroys it.

We can then compute the task performance of the network, corresponding to the success rate with which the operations *load*, *maintain*, *block*, and *clear* are executed (**Figure 9B**). These measures are all above chance level. Notice the high performance of the *clear* operation.

### **DISCUSSION**

#### **RESULTS AND DATA DISCUSSION**

In this work we present a novel paradigm explaining how persistent activity can be modulated on-line by means of both an information-related signal and the background activity. This paradigm is based on our result showing that background correlations influence the transition between the persistent state and the quiescent state in a bistable recurrent neural network. We call this phenomenon correlation-induced gating.

In order to implement a multi-unit network performing a WM task, we began by establishing the basis of the correlation-induced gating in a single-unit network. We showed that background correlations block and erase a persistent state in a homogeneous recurrent neural network representing a single unit. We found that the transition rate from the persistent state to the quiescent state increases with the network size and with the connection probability. In all situations the probabilities increase with the correlation level. Increasing the network size, while fixing the connection probability and renormalizing the synaptic inputs to keep the average input strength constant, scales up the probabilities. In other words, in larger networks with weaker but more numerous synapses, correlations appear to have a stronger effect. On the other hand, when we fix the total number of synapses each neuron receives, growing the network size does not appear to have much effect on the correlation-driven probabilities. These effects could be related to the fact that the amount of correlation between neurons sharing common input is mainly determined by pooling (Rosenbaum et al., 2010, 2011).

We implemented a winner-take-all network composed of two excitatory populations and one inhibitory population. Each of the excitatory populations receives background input from independent noise sources and from a noise source common to the neurons of that population. The amount of correlation could be changed independently in the two excitatory populations. By increasing the level of correlations in the population encoding irrelevant information we prevented a distractor from loading a memory item into that population. In particular, we showed that this model can block stronger distractors than a model inspired by Brunel and Wang (2001), where the distractor is blocked only by the mutual inhibition. Our model could therefore explain how the response to the distractor stimulus in a WM task could be as strong as the response to the sample stimulus (Miller et al., 1996). This effect would in fact not be compatible with a model where a distractor is blocked by mutual inhibition.

We then implemented a WM network differing from the previous one in the construction of the background correlations, which are induced by a shared source. We showed that by modulating the correlation level in background activity we can set the system in different regimes. This time, instead of modulating the correlation level "in space", we modulate it in time. Depending on the strength of the correlations the system is set in different operating points, namely the gate-in, selective-gate, and gate-out regimes. The gate-in regime allows a memory to be loaded into the WM store and subsequently maintained. The selective-gate regime maintains a previously loaded memory but blocks the loading of any new memory. The gate-out regime prevents the network from both loading and maintaining a memory. We can switch instantaneously from one dynamic regime to another by tuning the strength of the background activity correlations. We further showed that the projection of a strong match stimulus can be sufficient to clear the memory at task completion, thereby suggesting that correlations also play a role in match-suppression.

We must also note that in this work we considered spatial correlations and their effect on persistent activity and WM task execution. In a companion paper we have shown that temporal structure also has an important effect: the gating modes are modulated by the oscillatory frequency content of the background activity (Dipoppa and Gutkin, 2013). While in the companion paper the block and erase probabilities, and thus the gating modes of the network, are modulated by the oscillation frequency, in this work they are modulated by the correlation level. As opposed to the non-monotone relationship between the oscillation frequency and the block and erase probabilities [Figure 3A of Dipoppa and Gutkin (2013)], there is a monotone relationship between the correlation level and the same measures (**Figures 2C,D**). Hence control of the WM through spatial correlations could be implemented by a simple increase or decrease of activity within a neural population furnishing connections common to the WM store, while the oscillatory control would require more complex task-dependent shifting between frequency bands. We would like to speculate that the two mechanisms could represent two independent modes of control over WM networks. Furthermore, in this work we examined the role of mutual inhibition and showed that the spatial correlation structure alleviates the network's sensitivity to stimulus strength.

Although we do not propose a mechanism for the readout of the memory information, we note that the mechanism proposed by Brunel and Wang (2001) for their network would be compatible with our model. That mechanism relies on the fact that a match stimulus elicits a stronger response than a distractor stimulus in the first few tens of milliseconds, since the former excites a network that is already in the persistent state. Hence we might speculate that a complementary population of neurons sensitive to rapid transients in the activity might be a way to signal the read-out differentially.

#### **MODEL PREDICTIONS AND OPEN QUESTIONS**

The novel paradigm that we present here allows persistent activity to be manipulated through background correlations. An advantage of correlation-induced gating over inhibition-induced gating is that the gate can be rapidly and flexibly opened or closed depending on the correlation level, instead of being fixed by the network connectivity structure.

The effects that we find for flexible changes in the correlation levels are the major prediction of the model. We predict that an examination of multi-unit electrophysiological recordings of animals performing a WM task will show the following modulation of the correlation level: a low level during loading and an intermediate level during maintenance (as in the two-unit model of **Figure 1E**), or alternatively a high level of correlations in the population of neurons selective for a non-memorized item during the delay period (as in the winner-take-all model of **Figure 1D**). To our knowledge, experiments specifically analyzing how correlations change in the PFC as the delay-response task unfolds are still lacking.

At the same time, there are several lines of indirect evidence that lead us to believe that task-dependent correlation modulation is indeed possible. First, it has been found that there is a modulation of spike coincidences during different phases of a motor task (Riehle et al., 1997). Riehle et al. (1997) found that at times during the delay when the animal was expecting to generate a response there were transients of synchronized spikes. Furthermore, for successful trials there were more synchronized spikes during the delay period than for failed trials. This indeed suggests that spike coincidence is modulated in a functional way. The increase of excess synchrony at response (or expected response) times is compatible with the correlation-based memory clearance discussed in this manuscript. Furthermore, it has been found that a change in representation during the delay-response task leads to an increase of synchronization (Sakamoto et al., 2008). Pipa and Munk (2011) analyzed multi-unit activity during the delay period of a match-to-sample task and found that there is a modulation of spike synchronization on correct vs. incorrect trials and, further, that synchronous spike events are more prevalent at match presentation. This last point again suggests that increased correlations may be involved in erasing the memory trace.

In fact there is ample literature relating changes in oscillatory synchrony, coherence, and frequency during WM tasks (Tallon-Baudry et al., 1998; Pesaran et al., 2002; Lee et al., 2005; Pipa et al., 2009). For example, Pesaran et al. (2002) found that gamma-band spiking coherence is increased during the delay period in the lateral intraparietal cortex (LIP) in primates performing a delayed response task. Given that LIP is coupled to the PFC and is also involved in the WM trace (Chafee and Goldman-Rakic, 1998), this is suggestive of increased input correlations to the PFC during the WM task. In the context of irregular Poisson firing, oscillatory coherence is nothing other than correlations organized both in time (the frequency) and in space. Oscillatory effects are beyond the scope of this paper and are the subject of the companion manuscript (Dipoppa and Gutkin, 2013).

The data reviewed above do show that there is a modulation of activity correlations during the WM task, yet they do not provide the mechanism. Here we propose that the mechanism lies in background input correlations generated by a common source. One might then ask where such inputs may be coming from. As hinted above, one source could be coherent firing activity in the cortical regions coupled to the PFC and involved in WM processing (e.g., LIP). In addition, we propose that the source of the shared background input generating spatial correlations could reside in the striatum, a subcortical area thought to be involved in WM. In fact, the structure of the cortico-striatal loops has long been seen as a disadvantage for WM capacity if the memory store is also located in the striatum. Since the number of striatal neurons is much lower than the number of pyramidal neurons (Lange et al., 1976) and the loop is based on divergence (resp. convergence) in the striato-cortical (resp. cortico-striatal) direction, the striatum could not have the same memory capacity as the cortex. It has instead been suggested that the divergent/convergent structure could be useful, since the basal ganglia do not encode the individual contents of WM but control the gate of other regions and decide when they can be updated (Frank et al., 2001). We also suggest that the striatum plays a gating role, since it could be the source of the common noise that creates the different regimes.

The correlation-induced gating is robust to parameter variation. We propose the following explanation for this phenomenon: background correlations induce spike-time synchronization in the recurrent network, as found for independent neurons by Galán et al. (2006), and this synchronization erases and blocks persistent activity because of the refractory period of the neurons, as found by Laing and Chow (2001) and Gutkin et al. (2001). Providing a proof of this assumption and a mathematical explanation of the correlation-induced gating will be the subject of future research.

## **ACKNOWLEDGMENTS**

The authors thank Ole Jensen, Romain Brette, Christian Machens, and Thomas Kreuz for constructive discussions. Mario Dipoppa was partially supported by MESR (France). Boris S. Gutkin was partially supported by CNRS, ANR-Blanc Grant Dopanic, CNRS Neuro IC grant, Neuropole Ile de France, Ecole de Neuroscience de Paris collaborative grant, LABEX Institut des Etudes Cognitives, INSERM, and ENS.

## **REFERENCES**


Wilken, P., and Ma, W. J. (2004). A detection theory account of change detection. *J. Vis.* 4, 1120–1135. doi: 10.1167/4.12.11

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 July 2013; accepted: 25 September 2013; published online: 21 October 2013.*

*Citation: Dipoppa M and Gutkin BS (2013) Correlations in background activity control persistent state stability and allow execution of working memory tasks. Front. Comput. Neurosci. 7:139. doi: 10.3389/fncom.2013.00139*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Dipoppa and Gutkin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Single-unit activities during epileptic discharges in the human hippocampal formation

## *Catalina Alvarado-Rojas<sup>1,2</sup>, Katia Lehongre<sup>1,2</sup>, Juliana Bagdasaryan<sup>1,2</sup>, Anatol Bragin<sup>3</sup>, Richard Staba<sup>3</sup>, Jerome Engel Jr.<sup>3</sup>, Vincent Navarro<sup>1,2,4</sup> and Michel Le Van Quyen<sup>1,2</sup>\**

*<sup>1</sup> Centre de Recherche de l'Institut du Cerveau et de la Moelle Epinière, INSERM UMRS 975 - CNRS UMR 7225, Hôpital de la Pitié-Salpêtrière, Paris, France*

*<sup>2</sup> Université Pierre et Marie Curie, Paris, France*

*<sup>3</sup> Department of Neurology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA*

*<sup>4</sup> Epilepsy Unit, Groupe Hospitalier Pitié-Salpêtrière, Paris, France*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Emili Balaguer-Ballester, Bournemouth University, UK*

*Abdelmalik Moujahid, University of the Basque Country UPV/EHU, Spain*

#### *\*Correspondence:*

*Michel Le Van Quyen, Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, INSERM UMRS 975 - CNRS UMR 7225, Hôpital de la Pitié-Salpêtrière, 47 Bd de l'Hôpital, 75651 Paris, Cedex 13, France e-mail: quyen@t-online.de*

Between seizures, the brain of patients with epilepsy generates pathological patterns of synchronous activity, designated as interictal epileptiform discharges (ID). Using microelectrodes in the hippocampal formations of 8 patients with drug-resistant temporal lobe epilepsy, we studied ID by simultaneously analyzing action potentials from individual neurons and the local field potentials (LFPs) generated by the surrounding neuronal network. We found that ∼30% of the units increased their firing rate during ID and 40% showed a decrease during the post-ID period. Surprisingly, 30% of units showed either an increase or a decrease in firing rate several hundred milliseconds before the ID. In 4 patients, this pre-ID neuronal firing was correlated with field high-frequency oscillations at 40–120 Hz. Finally, we observed that only a very small subset of cells showed significant coincident firing before or during ID. Taken together, our results suggest that, in contrast to traditional views, ID are generated by a sparse neuronal network and follow a heterogeneous synchronization process initiated several hundred milliseconds before the paroxysmal discharges.

**Keywords: interictal epileptiform discharges, microelectrode recordings, multiunit activity, temporal lobe epilepsy, spike synchronization**

## **INTRODUCTION**

Synchronization of local and distributed neuronal assemblies is thought to underlie fundamental brain processes such as perception, learning, and cognition (Varela et al., 2001). In neurological diseases, neuronal synchrony can be altered, and in epilepsy it may play an important role in enhanced cellular excitability (Jasper and Penfield, 1954). Besides ictal events or seizures, interictal discharges (ID) are a typical signature of abnormal neuronal synchronization, seen spontaneously between seizures in scalp and intracranial EEG. They are used as a clinical indicator of the location of the epileptogenic zone, the region that generates seizures. Furthermore, it is believed that this region contains both the seizure onset zone and the surrounding "irritative zone," which generates ID and borders normal tissue (Talairach and Bancaud, 1966). These transient epileptic synchronization events are characterized by a large-amplitude, rapid component lasting 50–100 ms that is usually followed by a slow wave of 200–500 ms duration (de Curtis and Avanzini, 2001). In some cases, they are associated with an oscillation in the high-frequency range above 40 Hz (Bragin et al., 1999; Jacobs et al., 2011; Le Van Quyen, 2012). Despite their fundamental importance for diagnosing and treating epilepsy, little is known about the neurophysiological mechanisms generating these events in the human brain. Experimental work on animals and human tissue proposes the paroxysmal depolarization shift (PDS) as the cellular correlate of ID (Prince and Wong, 1981; Avoli and Williamson, 1996). This event is defined as a burst of action potentials riding on a large depolarization, followed by a longer hyperpolarization. However, *in vivo* human evidence is scarce, because of the limited opportunities to study the behavior of single neurons in human subjects.
To overcome this difficulty, epilepsy patients suitable for surgical treatment are sometimes studied with intracranial depth electrodes in order to record EEG activity from deep cortical structures and accurately identify the regions originating seizures. Using depth electrodes specially adapted with microelectrodes (Fried et al., 1997; **Figure 1A**), ID can be studied by simultaneously recording action potentials from individual neurons and the local field potentials (LFP). Studies using microelectrode technology have reported a variable and complex relation between ID and the activity of individual neurons, more heterogeneous than a simple PDS (Babb et al., 1973; Wyler et al., 1982; Ulbert et al., 2004; Keller et al., 2010; Alarcon et al., 2012). In particular, a large diversity of neuronal responses was found, including increases or decreases in firing rate and even changes in firing that precede the interictal discharge itself. Most of these studies were performed on patients with neocortical epilepsy, which exhibits a wide range of heterogeneity. In the present work, we recorded ID in the hippocampal formation of 8 patients with drug-resistant mesial temporal lobe epilepsy. Our objective is to describe the firing patterns and neuronal synchronization of single-unit activities during spontaneous ID.

**FIGURE 1 | (A)** Macro- and micro-electrodes superimposed on a magnetic resonance imaging scan. Nine microwires (40 μm diameter) extend beyond the tip of each macro-electrode and record the hippocampal formation. **(B)** Interictal discharges (ID) recorded with microelectrode local field potentials from adjacent electrodes in the hippocampus of a patient. **(C)** Example of a wide-band recording of an ID event with the corresponding extracted single-unit activities. **(D)** Raster plot and peri-event histogram (bin size, 10 ms) of the single-unit activity shown above. Note the strong changes in the firing rate and instantaneous frequency (red) during the ID.

## **MATERIALS AND METHODS**

#### **DATABASE**

Subjects were 8 patients [two female; mean age ± standard deviation (SD): 36.3 ± 10.5 years] with pharmacologically intractable temporal lobe epilepsy who were implanted with 8–14 intracranial depth electrodes in order to localize epileptogenic regions for possible resection. The placement of the electrodes was determined exclusively by clinical criteria (Fried et al., 1999). Extending beyond the tip of each electrode were nine Pt-Ir microwires (40 μm diameter) with an inter-tip spacing of 500 μm: eight active recording channels and one reference. Each microwire was sampled at 28 kHz (Cheetah recording system, Neuralynx Inc., Tucson, AZ). Spatial localizations were determined on the basis of post-implant computed tomography scans coregistered with pre-implant 1.5T MRI scans. Our results are based on microelectrode recordings located in the anterior hippocampus (*n* = 40 channels in 5 patients) and entorhinal cortex (*n* = 24 channels in 3 patients). The recording states were quiet wakefulness and slow-wave sleep (stages 1–4). All studies conformed to the guidelines of the Medical Institutional Review Board at the University of California, Los Angeles.

#### **SPIKE SORTING**

In order to detect single units, all channels were high-pass filtered at 300 Hz and visually examined for the presence of unit activities. In those microwires with clear unit activities, we performed spike detection (>4:1 signal-to-noise ratio) to obtain multi-unit activities (MUA). Single-unit activities were extracted by spike sorting with the KlustaKwik 1.7 program (software: http://klustakwik.sourceforge.net/; Harris et al., 2000), which employs the 10 principal components of the spike shape and an unsupervised Conditional Expectation Maximization (CEM) clustering algorithm (Hazan et al., 2006). After automatic clustering, clusters containing non-spike waveforms were visually deleted and the units were further isolated using a manual cluster-cutting method. Only units with clear boundaries and less than 0.5% of spike intervals within a 1 ms refractory period were included in the present analysis. Typically we isolated 1 or 2 distinct neurons from each microwire, but in several cases we observed up to 4 distinct neurons from a single microwire. The instantaneous spike frequency was measured by convolving the spike train of each unit with a Gaussian function with a standard deviation of 20 ms (*Ts* = 1 ms), set close to the modal interspike interval (Le Van Quyen et al., 2008, 2010). This operation yields an analog trace of the instantaneous firing rate (Paulin, 1996).
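The rate-estimation step can be sketched as follows (a minimal illustration; function and variable names are ours, while the 20 ms kernel SD and 1 ms sampling step follow the values quoted above):

```python
import numpy as np

def instantaneous_rate(spike_times, duration, sigma=0.020, ts=0.001):
    """Convolve a spike train (binned at ts = 1 ms) with a Gaussian
    kernel of SD sigma = 20 ms to obtain an analog rate trace in Hz."""
    t = np.arange(0.0, duration, ts)
    binned = np.histogram(spike_times, np.append(t, duration))[0] / ts
    k = np.arange(-4.0 * sigma, 4.0 * sigma + ts, ts)
    kernel = np.exp(-k**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()            # unit area, so the trace stays in Hz
    return t, np.convolve(binned, kernel, mode="same")

# a regular 10 Hz train yields a trace fluctuating around 10 Hz
spikes = np.arange(0.05, 2.0, 0.1)
t, rate = instantaneous_rate(spikes, duration=2.0)
```

Because the kernel is normalized to unit area, the time average of the trace matches the mean firing rate, while transient rate changes around each discharge remain visible at the ∼20 ms timescale.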

#### **OSCILLATION ANALYSIS**

LFPs are complementary to action potential information and show prominent oscillatory activity within the high-frequency range from 40 to 300 Hz (Worrell et al., 2012). A wavelet time-frequency analysis was used to determine precisely the mean frequency, maximum amplitude, and onset and offset of these LFP oscillations. The advantage of wavelet analysis lies in the fact that the time resolution varies with frequency, so that high frequencies have a sharper time resolution (Le Van Quyen and Bragin, 2007). We applied the complex Morlet wavelet, a wave-like scalable function that is well-localized in both time and frequency:

$$
\Psi\_{\tau,f}(u) = \sqrt{f} \exp(j2\pi f(u - \tau)) \exp\left(-\frac{(u - \tau)^2}{2\sigma^2}\right).
$$

This wavelet represents the product of a sinusoidal wave at frequency *f* with a Gaussian function centered at time τ, whose standard deviation σ is proportional to the inverse of *f*. The wavelet coefficients of a signal *x*(*t*) as a function of time (τ) and frequency (*f*) are defined as *W*(τ, *f*) = ∫<sup>+∞</sup><sub>−∞</sub> *x*(*u*)Ψ<sup>∗</sup><sub>τ,*f*</sub>(*u*)*du*. The analysis depends solely on σ, which sets the number of cycles of the wavelet: *nco* = 6*f*σ. The value *nco* determines the frequency resolution of the analysis by setting the width of the frequency interval for which phases are measured. Here, we chose *nco* = 5. For baseline correction, the average and SD of power were first computed at each frequency of the baseline period. Then, the average baseline power was subtracted from all time windows at each frequency, and the result was scaled by 1/SD, yielding baseline-adjusted *Z* scores. Significant increases with respect to baseline activity show up as positive *Z*-values; tabulated probability values indicate that, for absolute values of *Z* > 3.09, *P* < 0.001. The Kolmogorov–Smirnov test was applied to assess the normality of the distribution of the wavelet coefficients, using a 0.05 probability level.
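As a hedged sketch of this procedure (our simplified implementation; normalization details of the original analysis may differ), the following computes Morlet wavelet power with σ = *nco*/(6*f*) and converts it to baseline-adjusted *Z* scores:

```python
import numpy as np

def morlet_power(x, fs, freqs, nco=5):
    """Wavelet power |W(t, f)|^2 using a complex Morlet wavelet with
    nco cycles, i.e., sigma = nco / (6 f) as in the text."""
    power = np.empty((len(freqs), x.size))
    for i, f in enumerate(freqs):
        sigma = nco / (6.0 * f)
        u = np.arange(-4.0 * sigma, 4.0 * sigma, 1.0 / fs)
        psi = np.sqrt(f) * np.exp(2j * np.pi * f * u) * np.exp(-u**2 / (2.0 * sigma**2))
        w = np.convolve(x, np.conj(psi[::-1]), mode="same") / fs
        power[i] = np.abs(w) ** 2
    return power

fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(t.size)
burst = (t >= 1.2) & (t < 1.4)
x[burst] += np.sin(2.0 * np.pi * 80.0 * t[burst])   # synthetic 80 Hz burst

freqs = np.array([40.0, 80.0, 120.0])
p = morlet_power(x, fs, freqs)

# baseline correction: subtract baseline mean, scale by 1/SD per frequency
base = p[:, (t >= 0.2) & (t < 1.0)]
z = (p - base.mean(axis=1, keepdims=True)) / base.std(axis=1, keepdims=True)
```

Bins with |*Z*| > 3.09 would be flagged at *P* < 0.001, matching the threshold quoted above; in this synthetic example the 80 Hz burst stands far above that threshold.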

#### **SPIKE SYNCHRONIZATION**

Different measures exist to detect and quantify synchronization between spike trains (Brown et al., 2004; Kreuz et al., 2007). In this study, we used two complementary techniques. (1) Cross-correlation analysis was performed for cell pairs (Perkel et al., 1967; Amarasingham et al., 2012). To evaluate the significance of the correlation, we used a bootstrap method that accounts for the firing rate changes of the neurons (Hatsopoulos et al., 2003; Grün, 2009). Since the widths of the peaks in the original cross-correlograms were typically in the range of 5–30 ms (Krüger and Mayer, 1990), the spikes were jittered by adding to the spike times a random value drawn from a normal distribution with 0 mean and 50 ms SD. For each cell pair, 1000 jittered spike trains were created, and the expected cross-correlogram (and 99% confidence interval) was estimated in 1 ms time bins. For any cell pair where at least one bin in the [1.5 ms, 30 ms] interval exceeded the 99% confidence interval, the interaction was considered significant. (2) A method for identifying statistically conspicuous spike coincidences was implemented to detect the number of quasi-simultaneous appearances of spikes within small coincidence windows, here of 5 ms (Gütig et al., 2002; Quian Quiroga et al., 2002). Their occurrence was then compared with surrogate data generated by dithering the individual, original spike times within a given time interval. Here, each spike in the original data set was randomly and independently jittered on a uniform interval of [−5, +5] ms to form a surrogate dataset. By repeating the procedure 1000 times, the 99.9% confidence interval for each bin (*p* = 0.001) was calculated.
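A minimal version of the second method (surrogate-based coincidence counting) might look as follows; the coincidence and jitter windows here are illustrative, and the exact binning of the original analysis may differ:

```python
import numpy as np

def coincidences(t1, t2, window):
    """Count spikes of t1 that have at least one spike of t2 within +/-window."""
    t2 = np.sort(t2)
    idx = np.searchsorted(t2, t1)
    left = np.abs(t1 - t2[np.clip(idx - 1, 0, t2.size - 1)])
    right = np.abs(t2[np.clip(idx, 0, t2.size - 1)] - t1)
    return int(np.sum(np.minimum(left, right) <= window))

def jitter_threshold(t1, t2, window, jitter, n_surr=1000, q=0.999, seed=0):
    """99.9% surrogate threshold: each t1 spike is independently
    jittered on a uniform interval [-jitter, +jitter]."""
    rng = np.random.default_rng(seed)
    surr = [coincidences(t1 + rng.uniform(-jitter, jitter, t1.size), t2, window)
            for _ in range(n_surr)]
    return float(np.quantile(surr, q))

# demo: two units locked with ~0.5 ms precision
rng = np.random.default_rng(4)
common = np.sort(rng.uniform(0.0, 100.0, 300))
u1 = common + rng.normal(0.0, 0.0005, common.size)
u2 = common + rng.normal(0.0, 0.0005, common.size)
obs = coincidences(u1, u2, window=0.0025)
thr = jitter_threshold(u1, u2, window=0.0025, jitter=0.005)
```

An observed count exceeding the surrogate threshold flags excess millisecond-precise co-firing that a ±5 ms jitter destroys while leaving slower rate covariations intact.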

## **RESULTS**

Microelectrode recordings were selected by an expert electroencephalographer to have very abundant and persistent ID in the hippocampus (5 patients) or entorhinal cortex (3 patients) during quiet wakefulness or slow-wave sleep (recording durations from 10 to 118 min; total recording time: 6 h). All ID were recorded in the epileptic zone and appeared as spatially synchronous patterns emerging at about the same time on the same bundle of microelectrodes (**Figure 1B**). A standard, threshold-based ID detector was applied to automatically detect, from the LFP, events showing a pointed peak with large amplitude, large slope, and a duration of 20–100 ms, appearing at a frequency of 0.07 ± 0.30 Hz (range: 0.01–0.21 Hz). After expert visual confirmation, 862 ID were identified, showing a large range of morphological characteristics typical of sharp waves, spikes, and spike-wave discharges (Niedermeyer, 2005). Events were aligned to the sharpest peak of the discharge (**Figure 1C**). In order to analyze the patterns of neuronal activity around the discharge, we defined a baseline period (−600 to −300 ms), a pre-ID period (−300 to −50 ms), the interictal discharge (−50 to 50 ms), and a post-ID period (50–400 ms). The activities of different neurons per microelectrode were identified with a spike sorting algorithm, and a total of 75 single units were selected for analysis. To visualize the discharge-related activity of single neurons, peri-stimulus raster plots and timing histograms were constructed for the period of 1 s before and after each event (**Figure 1D**).
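The detection step can be illustrated with a toy threshold detector on synthetic LFP (parameters, names, and the test signal are ours, not the study's; the actual detector also applied slope and duration criteria):

```python
import numpy as np

def detect_discharges(lfp, fs, amp_sd=5.0, refractory=0.2):
    """Flag samples whose amplitude exceeds amp_sd SDs and keep the
    sharpest peak within each refractory window (toy version)."""
    z = (lfp - np.median(lfp)) / lfp.std()
    above = np.flatnonzero(np.abs(z) > amp_sd)
    peaks = []
    for i in above:
        if not peaks or (i - peaks[-1]) / fs > refractory:
            peaks.append(i)                      # new event
        elif abs(z[i]) > abs(z[peaks[-1]]):
            peaks[-1] = i                        # sharper peak, same event
    return np.array(peaks)

# synthetic LFP: unit-variance noise plus three 30 ms spike-like transients
rng = np.random.default_rng(2)
fs = 1000.0
lfp = rng.standard_normal(10000)
true_times = [2.0, 5.0, 8.0]
pulse = 10.0 * np.bartlett(31)                   # triangular transient
for tt in true_times:
    i0 = int(tt * fs)
    lfp[i0:i0 + 31] += pulse
peaks = detect_discharges(lfp, fs)
```

Each detected index marks the sharpest sample of one event, which corresponds to the peak-alignment convention used for the real ID events above.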

During the ID period (−50 to 50 ms), we found that around 40% of the recorded units showed some change in firing, whereas 60% remained unchanged. About 32% increased their firing rate more than 2-fold during ID relative to baseline epochs [**Figure 2B**; right-tail *t*-test: *T*(23) = 1.78; *p* = 0.04; an example cell can be seen in **Figure 1D**]. The firing rate of these cells showed a considerable degree of variability (range 1.4–99 Hz), with a mean of 9.4 ± 19.7 Hz during ID (baseline: 2.7 ± 3.1 Hz). During the post-ID period, 40% of units decreased their firing by half [50–400 ms, mean firing rate: 1.8 ± 2.7 Hz; baseline: 7.0 ± 2.7 Hz; left-tail *t*-test: *T*(29) = −3.73; *p* = 4.1 · 10<sup>−4</sup>, **Figure 2C**]. In addition to this modulated single-unit activity during ID, many units showed a significant change in firing preceding the interictal discharge. Of the 30% of single units that changed significantly during the pre-ID period, 12% increased [mean firing rate: 10.0 ± 13.5 Hz; baseline: 4.2 ± 5.8 Hz; *T*(8) = −3.45; *p* = 0.004] and 18% decreased [mean firing rate: 2.3 ± 7.0 Hz; baseline: 5.2 ± 11.6 Hz; *T*(13) = −1.64; *p* = 0.06] their firing rate (−300 to −50 ms, **Figure 2A**; examples are given in **Figure 2D**).

On the corresponding channels, we examined the relationship between these pre-ID firing changes and the LFP (<300 Hz). Spectral power was computed using Morlet wavelet analysis (20–300 Hz), and pre-ID changes in the LFP were tested for significant increases/decreases from baseline in specific frequency bands (*p* < 0.001). In 4 subjects we observed that the pre-ID neuronal firing pattern was correlated with an increase in high-frequency oscillations between 40 and 120 Hz (mean peak from baseline SD: *Z* = 6.1, range 4.3–9.1). **Figure 3** shows average time-frequency representations around the ID for the two patients of **Figure 2D**. The main changes in spectral power can be seen in the LFP preceding the interictal discharge and correlate closely with the increase or decrease in neuronal firing.

Finally, we analyzed unit synchronization during ID between pairs of units simultaneously recorded on two different microelectrodes. Because of the inter-tip spacing of 500 μm, the units are assumed to reflect adjacent but different neuronal populations. Two complementary methods were used to address the synchrony between spike trains. First, analysis of cross-correlograms was performed for each cell pair that showed a sufficient number of spikes (>100) during ID. The significance of the correlation was obtained by jittering each pair of spike trains and computing the 99% confidence interval. Of the 120 cross-correlograms constructed, only 5 (about 4%) had a significant peak within ±25 ms around the origin, indicating that these neuronal pairs discharged in a correlated way. **Figure 4** (top) illustrates examples of significant peaks in cross-correlograms of two units. In addition to cross-correlation analysis, we also analyzed the overall level of synchronicity from the number of quasi-simultaneous appearances of spikes. To avoid overestimating the number of random synchronous spikes due to elevated firing rates, we used jitter techniques to infer millisecond-precise temporal synchrony (Hatsopoulos et al., 2003). Here, the spikes of one neuron of each pair were jittered by ±5 ms to generate jittered peri-stimulus raster plots of unit coincidences, which were used to assess the statistical significance of bin fluctuations in the non-jittered spike series. Because the jittered data sets preserve firing rates on timescales much broader than the jitter interval (in this case, 5 ms), the overall effect of the analysis is to identify those pairs that show excessive co-firing at short latencies that cannot be accounted for by firing rates varying on timescales of tens of milliseconds.
Despite the strong increase in firing of about 30% of the recorded units during ID, only a very small subset of cells (18 of 120 analyzed pairs, about 15%) showed significant coincident firing before or during ID. For two patients, **Figure 4** (bottom) illustrates pairs of units that showed significant coincident firing (*p* = 0.001) during ID (A) and the pre-ID period (B).
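The cross-correlogram test described above can be sketched as follows; the 1 ms bins, ±50 ms lag range, and 50 ms jitter SD follow the Methods, while the demo data and all names are synthetic and ours:

```python
import numpy as np

def crosscorrelogram(t1, t2, max_lag=0.05, bin_size=0.001):
    """Histogram of spike-time differences t2 - t1 within +/-max_lag."""
    edges = np.arange(-max_lag, max_lag + bin_size, bin_size)
    diffs = [t2[(t2 >= s - max_lag) & (t2 <= s + max_lag)] - s for s in t1]
    return np.histogram(np.concatenate(diffs), edges)[0], edges

def jitter_band(t1, t2, n_surr=200, sd=0.05, q=0.99, seed=0):
    """Per-bin 99% band from surrogates with 50 ms SD normal jitter."""
    rng = np.random.default_rng(seed)
    surr = [crosscorrelogram(t1 + rng.normal(0.0, sd, t1.size), t2)[0]
            for _ in range(n_surr)]
    return np.quantile(np.array(surr), q, axis=0)

# demo: unit 2 fires 2.5 ms after unit 1
rng = np.random.default_rng(5)
t1 = np.sort(rng.uniform(0.0, 50.0, 400))
t2 = t1 + 0.0025
ccg, edges = crosscorrelogram(t1, t2)
band = jitter_band(t1, t2)
significant = ccg > band          # bins exceeding the 99% band
```

A sharp peak at short lag survives this test, whereas broad peaks produced by slow co-modulation of firing rates are absorbed into the jittered band.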

## **DISCUSSION**

We found that a large subset of the recorded units showed significant changes in firing in or around ID in the hippocampal formation of patients with mesial temporal lobe epilepsy. Around 30% of the units increased their firing rate during ID, while 40% showed a decrease during the post-ID period. This percentage of modulated neurons agrees with that described by Wyler et al. (1982), who found that 44% of recorded neurons showed primarily an increase in firing rate near the interictal discharge peak. Surprisingly, a subset of 30% of units showed significant firing rate variations several hundred milliseconds before the ID. In a few patients, we observed that this neuronal firing pattern was related to elevated LFP oscillations at 40–120 Hz. Finally, based on two statistical methods that identify spike synchronization, we found that only a very small subset of cells showed significant coincident firing before or during ID.

Our observations of neuronal firing during the interictal discharge are consistent with the paroxysmal depolarizing shift (PDS) mechanism—a large depolarization phase followed by a long hyperpolarization—that has been studied in animal models of epilepsy (Matsumoto and Marsan, 1964; Prince, 1968). The first part of the depolarization phase is believed to be generated by intrinsic membrane conductances (de Curtis et al., 1999), and the later part by feedback recurrent synaptic excitation mediated by AMPA and NMDA receptor subtypes and glutamate receptor-coupled calcium conductances (Traub et al., 1993). Thus, the PDS has been shown to result from giant excitatory postsynaptic potentials. The PDS is usually followed by a hyperpolarization, which reflects GABA-mediated recurrent inhibition as well as Ca<sup>2+</sup>-dependent K<sup>+</sup> currents. Interestingly, consistent with *in vitro* studies on hippocampal slices from human patients with temporal lobe epilepsy (Cohen et al., 2002; Wittner et al., 2009), the presence of a similar suppression of unit activities in our *in vivo* data suggests that ID can occur in cortical regions maintaining substantial inhibitory function.

However, in contrast to simple models of the PDS and in line with other observations in human epileptic neocortex (Keller et al., 2010), we found that ID, rather than requiring a large synchronization of neurons, can occur with relatively sparse single-neuron participation (estimated at about 30% of the cells). Furthermore, a small subset of the units significantly increased or decreased their firing well before the ID. Concomitant with these changes in firing rate, at least in some patients, high-frequency oscillations at 40–120 Hz can be seen in the LFP preceding the ID, correlating closely with the changes in neuronal firing. Because interneurons are involved in the generation of high-frequency oscillations through mechanisms of post-inhibitory resetting of neuronal firing (Cobb et al., 1995; Ylinen et al., 1995; Le Van Quyen et al., 2008; Le Van Quyen, 2012), it is tempting to speculate that GABA-mediated events may contribute to enhanced synchronization of local epileptic networks before ID. Interestingly, emerging evidence indicates that GABA promotes epileptiform synchronization (Pavlov et al., 2013). For instance, GABA receptor-mediated inhibition can facilitate thalamocortical processes leading to the generalized spike-and-wave discharges that occur during absence seizures (Danober et al., 1998). Following a similar mechanism, ID may be caused by a rebound synchronization of cells that start firing synchronously shortly after inhibition ceases, permitting the fast component of the ID. Moreover, intense synaptic activation of GABA<sub>A</sub> receptors in the hippocampus can lead to a shift in GABAergic neurotransmission from inhibitory to excitatory, contributing to epileptic discharges (Kohling et al., 2000; Cohen et al., 2002). Interestingly, pre-event changes have also been seen in advance of seizures in an animal model of temporal lobe epilepsy (Bower and Buckmaster, 2008) and around seizure onset in human epilepsy (Babb and Crandall, 1976; Truccolo et al., 2011), suggesting a possibly similar mechanism before seizures.

**FIGURE 4 |** …synchronizations (red circles) were defined as coincidences between the two units (green and blue points) occurring over a 5-ms interval. Note the significant increase in coincidences during ID **(A)** and the pre-ID period **(B)**, over the statistical threshold defined by a random jitter of the original data.

Taken together, our data suggest that ID in patients with temporal lobe epilepsy are not a simple paroxysm of hypersynchronous excitatory activity, but rather represent a heterogeneous synchronization process, possibly initiated by GABAergic responses in small subsets of cells and emerging over hundreds of milliseconds before the paroxysmal discharges.

## **ACKNOWLEDGMENTS**

Catalina Alvarado-Rojas was supported by the Administrative Department for Science, Technology and Innovation (COLCIENCIAS), Colombia. Vincent Navarro was supported by a Contrat Interface INSERM. This work was supported by funding from the program "Investissements d'avenir" ANR-10-IAIHU-06 and from the ICM and OCIRP.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 27 September 2013; published online: 18 October 2013.*

*Citation: Alvarado-Rojas C, Lehongre K, Bagdasaryan J, Bragin A, Staba R, Engel Jr J, Navarro V and Le Van Quyen M (2013) Single-unit activities during epileptic discharges in the human hippocampal formation. Front. Comput. Neurosci. 7:140. doi: 10.3389/ fncom.2013.00140*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Alvarado-Rojas, Lehongre, Bagdasaryan, Bragin, Staba, Engel, Navarro and Le Van Quyen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A unified view on weakly correlated recurrent networks

#### *Dmytro Grytskyy<sup>1</sup>\*, Tom Tetzlaff<sup>1</sup>, Markus Diesmann<sup>1,2</sup> and Moritz Helias<sup>1</sup>*

*<sup>1</sup> Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6), Jülich Research Centre and JARA, Jülich, Germany*

*<sup>2</sup> Medical Faculty, RWTH Aachen University, Germany*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Brent Doiron, University of Pittsburgh, USA Shigeru Shinomoto, Kyoto University, Japan*

#### *\*Correspondence:*

*Dmytro Grytskyy, Jülich Research Centre and JARA, Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6), Building 15.22, 52425 Jülich, Germany e-mail: d.grytskyy@fz-juelich.de*

The diversity of neuron models used in contemporary theoretical neuroscience to investigate specific properties of covariances in spiking activity raises the question of how these models relate to each other. In particular, it is hard to distinguish between generic properties of covariances and peculiarities due to the abstracted model. Here we present a unified view on pairwise covariances in recurrent networks in the irregular regime. We consider the binary neuron model, the leaky integrate-and-fire (LIF) model, and the Hawkes process. We show that linear approximation maps each of these models to either of two classes of linear rate models (LRM), including the Ornstein–Uhlenbeck process (OUP) as a special case. The distinction between the two classes is the location of the additive noise in the rate dynamics: on the output side for spiking models and on the input side for the binary model. Both classes allow closed-form solutions for the covariance. For output noise, the covariance separates into an echo term and a term due to correlated input. The unified framework enables us to transfer results between models. For example, we generalize the binary model and the Hawkes process to the situation with synaptic conduction delays and simplify derivations of established results. Our approach is applicable to general network structures and suitable for the calculation of population averages. The derived averages are exact for fixed out-degree network architectures and approximate for fixed in-degree. We demonstrate how taking fluctuations into account in the linearization procedure increases the accuracy of the effective theory, and we explain the class-dependent differences between covariances in the time and frequency domains. Finally, we show that the oscillatory instability emerging in networks of LIF models with delayed inhibitory feedback is a model-invariant feature: the same structure of poles in the complex frequency plane determines the population power spectra.

**Keywords: correlations, linear response, Hawkes process, leaky integrate-and-fire model, binary neuron, linear rate model, Ornstein–Uhlenbeck process**

## **1. INTRODUCTION**

The meaning of correlated neural activity for the processing and representation of information in cortical networks is still not understood, but evidence for a pivotal role of correlations is increasing (recently reviewed in Cohen and Kohn, 2011). Different studies have shown that correlations can either decrease (Zohary et al., 1994) or increase (Sompolinsky et al., 2001) the signal-to-noise ratio of population signals, depending on the readout mechanism. The architecture of cortical networks is dominated by convergent and divergent connections among the neurons (Braitenberg and Schüz, 1991), causing correlated neuronal activity through common input from shared afferent neurons, in addition to direct connections between pairs of neurons and common external signals. It has been shown that correlated activity can faithfully propagate through convergent-divergent feed-forward structures, such as synfire chains (Abeles, 1991; Diesmann et al., 1999), a potential mechanism to convey signals in the brain. Correlated firing was also proposed as a key to the solution of the binding problem (von der Malsburg, 1981; Bienenstock, 1995; Singer, 1999), an idea that has been discussed controversially (Shadlen and Movshon, 1999). Independent of a direct functional role of correlations in cortical processing, the covariance function between the spiking activity of a pair of neurons contains the information about time intervals between spikes. Changes of synaptic coupling, mediated by spike-timing-dependent synaptic plasticity (STDP; Markram et al., 1997; Bi and Poo, 1999), are hence sensitive to correlations. Understanding covariances in spiking networks is thus a prerequisite for investigating the evolution of synapses in plastic networks (Burkitt et al., 2007; Gilson et al., 2009, 2010).

On the other hand, there is ubiquitous experimental evidence of correlated spike events in biological neural networks, going back to early reports on multi-unit recordings in cat auditory cortex (Perkel et al., 1967; Gerstein and Perkel, 1969), the observation of closely time-locked spikes appearing at behaviorally relevant points in time (Kilavik et al., 2009; Ito et al., 2011), and collective oscillations in cortex (recently reviewed in Buzsáki and Wang, 2012).

The existing theories explaining correlated activity use a multitude of different neuron models. Hawkes (1971) developed the theory of covariances for linear spiking Poisson neurons (Hawkes processes). Ginzburg and Sompolinsky (1994) presented the approach of linearization to treat fluctuations around the point of stationary activity and to obtain the covariances for networks of non-linear binary neurons. The formal concept of linearization allowed Brunel and Hakim (1999) and Brunel (2000) to explain fast collective gamma oscillations in networks of spiking leaky integrate-and-fire (LIF) neurons. Correlations in feed-forward networks of LIF models are studied in Moreno-Bote and Parga (2006); exact analytical solutions for such network architectures are given in Rosenbaum and Josic (2011) for the case of stochastic random walk models, and threshold crossing neuron models are considered in Tchumatchenko et al. (2010) and Burak et al. (2009). Covariances in structured networks are investigated for Hawkes processes (Pernice et al., 2011), and in linear approximation for LIF (Pernice et al., 2012) and exponential integrate-and-fire neurons (Trousdale et al., 2012). The latter three works employ an expansion of the propagator (time evolution operator) in terms of the order of interaction. Finally, Buice et al. (2009) investigate higher order cumulants of the joint activity in networks of binary model neurons.

Analytical insight into a neuroscientific phenomenon based on correlated neuronal activity often requires a careful choice of the neuron model to arrive at a solvable problem. Hence a diversity of models has been proposed and is in use. This raises the question of which features of covariances are generic properties of recurrent networks and which are specific to a certain model. Only if this question can be answered can one be sure that a particular result is not an artifact of oversimplified neuronal dynamics. Currently it is unclear how different neuron models relate to each other and whether and how results obtained with one model carry over to another. In this work we present a unified theoretical view on pairwise correlations in recurrent networks in the asynchronous and collective-oscillatory regime, approximating the response of different models to linear order. The joint treatment allows us to answer the question of genericity and moreover naturally leads to a classification of the considered models into only two categories, as illustrated in **Figure 1**. The classification in addition enables us to extend existing theoretical results to biologically relevant parameters, such as synaptic delays and the presence of inhibition, and to derive explicit expressions for the time-dependent covariance functions, in quantitative agreement with direct simulations, which can serve as a starting point for further work.

The remainder of this article is organized as follows. In the first part of our results in "Covariance structure of noisy rate models" we investigate the activity and the structure of covariance functions for two versions of linear rate models (LRM); one with input noise, the other with output noise. If the activity relaxes exponentially after application of a short perturbation, both models coincide with the Ornstein–Uhlenbeck process (OUP). We mainly consider the latter case, although most results hold for arbitrary kernel functions. We extend the analytical solutions for the covariances in networks of OUPs (Risken, 1996) to the neuroscientifically important case of synaptic conduction delays. Solutions are derived first for general forms of connectivity in "Solution of the convolution equation with input noise" for input noise and in "Solution of convolution equation with output noise" for output noise. After analyzing the spectral properties of the dynamics in the frequency domain in "Spectrum of the dynamics," identifying poles of the propagators

**FIGURE 1 | Mapping onto linear rate models (LRM).** The arrows indicate analytical methods which enable a mapping from the original spiking (LIF model, Hawkes model) or binary neuron dynamics to the analytically more tractable linear rate models. Depending on the original dynamics (spiking or binary) the resulting LRM contains an additive noise component *x* either on the output side (left) or on the input side (right).

and their relation to collective oscillations in neuronal networks, we show in "Population-averaged covariances"' how to obtain pairwise averaged covariances in homogeneous Erdös-Rényi random networks. We explain in detail the use of the residue theorem to perform the Fourier back-transformation of covariance functions to the time domain in "Fourier back transformation" for general connectivity and in "Explicit expression for the population averaged cross covariance in the time domain" for averaged covariance functions in random networks, which allows us to obtain explicit results and to discuss class dependent features of covariance functions.

In the second part of our results in "Binary neurons," "Hawkes processes," and "Leaky integrate-and-fire neurons" we consider the mapping of different neuronal dynamics onto either of the two flavors of the linear rate models discussed in the first part. The mapping procedure is qualitatively the same for all dynamics, as illustrated in **Figure 1**: Starting from the dynamic equations of the respective model, we first determine the working point described in terms of the mean activity in the network. For unstructured homogeneous random networks this amounts to a mean-field description in terms of the population averaged activity (i.e., firing rate in spiking models). In the next step, a linearization of the dynamical equations is performed around this working point. We explain how fluctuations can be considered in the linearization procedure to improve its accuracy and we show how the effective linear dynamics maps to the LRM. We illustrate the results throughout by a quantitative comparison of the analytical results to direct numerical simulations of the original non-linear dynamics. The appendices "Implementation of noisy rate models," "Implementation of binary neurons in a spiking simulator code," and "Implementation of Hawkes neurons in a spiking simulator code" describe the model implementations and are modules of our long-term collaborative project to provide the technology for neural systems simulations (Gewaltig and Diesmann, 2007).

#### **2. COVARIANCE STRUCTURE OF NOISY RATE MODELS**

#### **2.1. DEFINITION OF MODELS**

Let us consider a network of linear model neurons, each characterized by a continuous fluctuating rate *r* and connections from neuron *j* to neuron *i* given by the element *wij* of the connectivity matrix **w**. We assume that the response of neuron *i* to input can be described by a linear kernel *h* so that the activity in the network fulfills

$$\mathbf{r}(t) = h(\circ) * [\mathbf{w}\,\mathbf{r}(\circ - d) + b\,\mathbf{x}(\circ)](t),\tag{1}$$

where *f*(◦ − *d*) denotes the function *f* shifted by the delay *d*, **x** is an uncorrelated noise with

$$
\langle x_i(t) \rangle = 0, \qquad \langle x_i(s)\, x_j(t) \rangle = \delta_{ij}\, \delta(s - t)\, \rho^2, \tag{2}
$$

e.g., a Gaussian white noise, and $(f * g)(t) = \int_{-\infty}^{t} f(t - t')\, g(t')\, dt'$ denotes the convolution. With the particular choice $b = \mathbf{w}\,\delta(\circ - d)\,*$ we obtain

$$\mathbf{r}(t) = [h(\circ) \ast \mathbf{w}(\mathbf{r}(\circ - d) + \mathbf{x}(\circ - d))](t). \tag{3}$$

We call the dynamics (3) the linear noisy rate model (LRM) with noise applied to output, as the sum *r* + *x* appears on the right hand side. Alternatively, choosing *b* = **1** we define the model with input noise as

$$\mathbf{r}(t) = h(\circ) * [\mathbf{w}\,\mathbf{r}(\circ - d) + \mathbf{x}(\circ)](t). \tag{4}$$

Hence, Equations (3) and (4) are special cases of (1). In the following we consider the particular case of an exponential kernel

$$h(s) = \frac{1}{\tau}\,\theta(s)\, e^{-s/\tau},\tag{5}$$

where θ denotes the Heaviside function, θ(*t*) = 1 for *t* > 0 and 0 otherwise. Applying to (1) the operator $O = \tau \frac{d}{dt} + 1$, which has *h* as a Green's function (i.e., $Oh = \delta$), we get

$$\tau \frac{d}{dt}\mathbf{r}(t) + \mathbf{r}(t) = \mathbf{w}\,\mathbf{r}(t-d) + b\,\mathbf{x}(t),\tag{6}$$

which is the equation describing a set of delay-coupled Ornstein–Uhlenbeck processes (OUPs) with input or output noise for $b = 1$ or $b = \mathbf{w}\,\delta(\circ - d)\,*$, respectively. We use this representation in "Binary neurons" to show the correspondence to networks of binary neurons.
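As an illustration, the delayed dynamics (6) can be integrated with a simple Euler–Maruyama scheme. The sketch below implements the input-noise variant ($b = 1$) and buffers the rate history to realize the delay; all parameter values are illustrative, not those of the paper's simulations.

```python
import numpy as np

def simulate_delayed_oup(w, tau=10.0, d=3.0, rho=1.0, T=1000.0, dt=0.1, seed=0):
    """Euler-Maruyama integration of tau * dr/dt = -r(t) + w r(t - d) + x(t),
    i.e. Eq. (6) with input noise (b = 1); all times in ms."""
    rng = np.random.default_rng(seed)
    n = w.shape[0]
    steps = int(T / dt)
    delay = int(round(d / dt))           # delay in units of time steps
    r = np.zeros((steps + delay, n))     # first `delay` rows serve as history
    for t in range(delay, steps + delay - 1):
        drift = (-r[t] + w @ r[t - delay]) * dt / tau
        noise = (rho / tau) * np.sqrt(dt) * rng.standard_normal(n)
        r[t + 1] = r[t] + drift + noise
    return r[delay:]

# two populations with the row-identical coupling structure of Eq. (14)
w = 0.5 * np.array([[1.0, -1.25], [1.0, -1.25]])
r = simulate_delayed_oup(w)
```

The white noise $x(t)$ enters the increment as $\rho\sqrt{dt}/\tau$ times a standard normal variate, consistent with (2).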

#### **2.2. SOLUTION OF THE CONVOLUTION EQUATION WITH INPUT NOISE**

The solution for the system with input noise obtained from the definition (4) after Fourier transformation is

$$\mathbf{R} = H_d\, \mathbf{w}\, \mathbf{R} + H\, \mathbf{X},\tag{7}$$

where the delay is absorbed into the kernel function $h_d(s) = \frac{1}{\tau}\,\theta(s - d)\, e^{-(s-d)/\tau}$. We use capital letters throughout the text to denote objects in the Fourier domain and lower case letters for objects in the time domain. Solving for $\mathbf{R} = (1 - H_d\mathbf{w})^{-1} H \mathbf{X}$, the covariance function of **r** in the Fourier domain, $\langle \mathbf{R}(\omega)\mathbf{R}^T(-\omega)\rangle$, also called the cross spectrum, is found with the Wiener–Khinchin theorem (Gardiner, 2004) as

$$\begin{aligned}
\mathbf{C}(\omega) &= \langle \mathbf{R}(\omega)\, \mathbf{R}^{T}(-\omega) \rangle \\
&= (1 - H_d(\omega)\mathbf{w})^{-1} H(\omega)\, \langle \mathbf{X}(\omega)\mathbf{X}^{T}(-\omega) \rangle\, H(-\omega)\, (1 - H_d(-\omega)\mathbf{w}^{T})^{-1} \\
&= (H_d(\omega)^{-1} - \mathbf{w})^{-1}\, \mathbf{D}\, (H_d(-\omega)^{-1} - \mathbf{w}^{T})^{-1},
\end{aligned}\tag{8}$$

where we introduced the matrix $\mathbf{D} = \langle \mathbf{X}(\omega)\mathbf{X}^{T}(-\omega) \rangle$. From the second to the third line we used the fact that the non-delayed kernels $H(\omega)$ can be replaced by delayed kernels $H_d(\omega)$, because the corresponding phase factors $e^{i\omega d}$ and $e^{-i\omega d}$ cancel each other. If **x** is a vector of pairwise uncorrelated noise, **D** is a diagonal matrix and needs to be chosen accordingly in order for the cross spectrum (8) to coincide (neglecting non-linear effects) with the cross spectrum of a network of binary neurons, as described in "Equivalence of binary neurons and Ornstein–Uhlenbeck processes".
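For a concrete network the cross spectrum (8) can be evaluated numerically. A minimal sketch, with illustrative parameter values and **D** taken as $\rho^2$ times the identity:

```python
import numpy as np

def cross_spectrum_input_noise(omega, w, tau=10.0, d=3.0, rho=1.0):
    """Evaluate Eq. (8): C(w) = (H_d(w)^-1 - W)^-1 D (H_d(-w)^-1 - W^T)^-1,
    with H_d(w)^-1 = (1 + i w tau) exp(i w d)."""
    n = w.shape[0]
    D = rho**2 * np.eye(n)  # pairwise uncorrelated noise of equal strength
    hd_inv = lambda om: (1 + 1j * om * tau) * np.exp(1j * om * d)
    A = np.linalg.inv(hd_inv(omega) * np.eye(n) - w)
    B = np.linalg.inv(hd_inv(-omega) * np.eye(n) - w.T)
    return A @ D @ B

w = np.array([[0.2, -0.5], [0.2, -0.5]])
C = cross_spectrum_input_noise(2.0 * np.pi * 0.03, w)  # ~30 Hz in 1/ms units
```

For real ω the symmetry $\mathbf{C}(-\omega) = \mathbf{C}^T(\omega)$ holds, and the diagonal entries (the power spectra) come out real and non-negative, since $\mathbf{C} = \rho^2 A A^H$ is Hermitian.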

#### **2.3. SOLUTION OF CONVOLUTION EQUATION WITH OUTPUT NOISE**

For the system with output noise we consider the quantity $y_i = r_i + x_i$ as the dynamic variable representing the activity of neuron *i* and aim to determine pairwise correlations. Fourier transformation of (3) yields

$$\mathbf{R} = H_d\, \mathbf{w}(\mathbf{R} + \mathbf{X}),\tag{9}$$

which can be solved for $\mathbf{R} = (1 - H_d \mathbf{w})^{-1} H_d \mathbf{w}\, \mathbf{X}$ in order to determine the Fourier transform of **Y** as

$$\mathbf{Y} = \mathbf{R} + \mathbf{X} = (1 - H_d\, \mathbf{w})^{-1} \mathbf{X}.\tag{10}$$

The cross spectrum hence follows as

$$\begin{aligned}
\mathbf{C}(\omega) &= \langle \mathbf{Y}(\omega)\, \mathbf{Y}^{T}(-\omega) \rangle \\
&= (1 - H_d(\omega)\mathbf{w})^{-1}\, \langle \mathbf{X}(\omega)\mathbf{X}^{T}(-\omega) \rangle\, (1 - H_d(-\omega)\mathbf{w}^{T})^{-1} \\
&= (1 - H_d(\omega)\mathbf{w})^{-1}\, \mathbf{D}\, (1 - H_d(-\omega)\mathbf{w}^{T})^{-1},
\end{aligned}\tag{11}$$

with $\mathbf{D} = \langle \mathbf{X}(\omega)\mathbf{X}^{T}(-\omega) \rangle$, a diagonal matrix with the *i*-th diagonal entry $\rho_i^2$. For the correspondence to spiking models **D** must be chosen appropriately, as discussed in "Hawkes processes" and "Leaky integrate-and-fire neurons" for Hawkes processes and LIF neurons, respectively.

#### **2.4. SPECTRUM OF THE DYNAMICS**

For both linear rate dynamics, with output and with input noise, the cross spectrum $\mathbf{C}(\omega)$ has poles at certain frequencies ω in the complex plane. These poles are defined by the zeros of $\det(H_d(\omega)^{-1} - \mathbf{w})$ and the corresponding term with the opposite sign of ω. The zeros of $\det(H_d(\omega)^{-1} - \mathbf{w})$ are solutions of the equation

$$H_d(\omega)^{-1} = (1 + i\omega\tau)\, e^{i\omega d} = L_j,$$

where *Lj* is the *j*-th eigenvalue of **w**. The same set of poles arises from (1) when solving for **R**. For *d* > 0 and the exponential kernel (5), the poles can be expressed as

$$z_k(L_j) = \frac{i}{\tau} - \frac{i}{d}\, W_k\!\left( L_j\, \frac{d}{\tau}\, e^{\frac{d}{\tau}} \right), \tag{12}$$

where $W_k$ is the *k*-th of the infinitely many branches of the Lambert-W function (Corless et al., 1996). For vanishing synaptic delay *d* = 0 there is obviously only one solution for every $L_j$, given by $z = -\frac{i}{\tau}(L_j - 1)$.
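The pole locations (12) are directly computable with the Lambert-W implementation in SciPy. The sketch below (illustrative parameter values) also verifies that each returned pole satisfies the defining condition $(1 + iz\tau)e^{izd} = L$:

```python
import numpy as np
from scipy.special import lambertw

def poles(L, tau=10.0, d=3.0, branches=range(-3, 4)):
    """Poles z_k(L) = i/tau - (i/d) W_k(L (d/tau) exp(d/tau)), Eq. (12)."""
    arg = L * (d / tau) * np.exp(d / tau)
    return np.array([1j / tau - (1j / d) * lambertw(arg, k=k) for k in branches])

L = -1.5                          # a negative eigenvalue of the connectivity
z = poles(L)
# each pole must satisfy H_d(z)^{-1} = (1 + i z tau) exp(i z d) = L
check = (1 + 1j * z * 10.0) * np.exp(1j * z * 3.0)
```

Substituting (12) into $(1+iz\tau)e^{izd}$ gives $\frac{\tau}{d}W_k e^{W_k} e^{-d/\tau}$, which equals $L$ by the defining property $W_k e^{W_k} = L\frac{d}{\tau}e^{d/\tau}$ of the Lambert-W function.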

Given the same parameters *d*, **w**, τ, the pole structures of the cross spectra of both systems (8) and (11) are identical, since the former can be obtained from the latter by multiplication with $(H_d(\omega)H_d(-\omega))^{-1} = (H(\omega)H(-\omega))^{-1}$, which has no poles. The only exception causing a different pole structure for the two models is the existence of an eigenvalue $L_j = 0$ of the connectivity matrix **w**, corresponding to a pole $z(0) = \frac{i}{\tau}$. However, this pole corresponds to an exponential decay of the covariance for input noise in the time domain and hence does not contribute to oscillations. For output noise, the multiplication with the term $(H(\omega)H(-\omega))^{-1}$, vanishing at $\omega = \frac{i}{\tau}$, cancels this pole in the covariance. Consequently both dynamics exhibit similar oscillations. A typical spectrum of poles for a negative eigenvalue $L_j < 0$ is shown in **Figures 2B,D**.

#### **2.5. POPULATION-AVERAGED COVARIANCES**

Often it is desirable to consider not the whole covariance matrix but averages over subpopulations of pairs of neurons. For instance, the average over the whole network results in a single scalar value; separately averaging pairs, distinguishing excitatory and inhibitory neuron populations, yields a 2 × 2 matrix of covariances. For these simpler objects closed-form solutions can be obtained, which still preserve some useful information and show important features of the network. Averaged covariances are also useful for comparison with simulations and experimental results.

**FIGURE 2 | Pole structure determines dynamics.** Autocovariance of the population activity **(A,C)** measured in $\rho^2/\tau$ and its Fourier transform, the power spectrum **(B,D)**, of the rate models with output noise (dots) (3) and input noise (diagonal crosses) (4) for delays *d* = 3 ms **(A,B)** and *d* = 1 ms **(C,D)**. Black symbols show averages over the excitatory population activity and gray symbols over the inhibitory activity obtained by direct simulation. Light gray curves show theoretical predictions for the spectrum (20) and the covariance (21) for output noise and the spectrum (17) and the covariance (18) for input noise. Black crosses (12) in **(B,D)** denote the locations of the poles of the cross spectra, with the real parts corresponding to the damping (vertical axis) and the imaginary parts to oscillation frequencies (horizontal axis). The detailed parameters for this and following figures are given in "Parameters of simulations".

In the following we consider a recurrent random network of $N_e = N$ excitatory and $N_i = \gamma N$ inhibitory neurons with synaptic weight *w* for excitatory and $-gw$ for inhibitory synapses. The probability *p* determines the existence of a connection between two randomly chosen neurons. We study the dynamics averaged over the two subpopulations by introducing the quantities $r_a = \frac{1}{N_a}\sum_{j \in a} r_j$ and noise terms $x_a = \frac{1}{N_a}\sum_{j \in a} x_j$ for $a \in \{E, I\}$; indices *I* and *E* stand for inhibitory and excitatory neurons and corresponding quantities. Calculating the average local input $N_a^{-1}\sum_{j \in a}\sum_k w_{jk} r_k$ to a neuron of type *a*, we obtain

$$\begin{aligned}
N_a^{-1} \sum_{j \in a} \sum_k w_{jk}\, r_k &= N_a^{-1} \left( \sum_{j \in a} \sum_{k \in E} w_{jk}\, r_k + \sum_{j \in a} \sum_{k \in I} w_{jk}\, r_k \right) \\
&= N_a^{-1} \left( p N_a w \sum_{k \in E} r_k - p N_a g w \sum_{k \in I} r_k \right) \\
&= p w N (r_E - \gamma g\, r_I),
\end{aligned}\tag{13}$$

where, from the second to the third line, we used the fact that in expectation a given neuron *k* has $pN_a$ targets in the population *a*. The reduction to the averaged system in (13) is exact if every column *k* of $\mathbf{w}_{jk}$ contains exactly *K* non-zero elements for $j \in E$ and $\gamma K$ for $j \in I$, which is the case for networks with fixed out-degree (the number of outgoing connections of a neuron to the neurons of a particular type is kept constant), as noted earlier (Tetzlaff et al., 2012). For fixed in-degree (the number of connections to a neuron coming in from the neurons of a particular type is kept constant) the substitution of $r_{j \in a}$ by $r_a$ is an additional approximation, which can be considered an average over possible realizations of the random connectivity. In both cases the effective population-averaged connectivity matrix **M** turns out to be

$$\mathbf{M} = Kw \begin{pmatrix} 1 & -\gamma g \\ 1 & -\gamma g \end{pmatrix},\tag{14}$$

with $K = pN$. So the averaged activities fulfill the same Equations (3) and (4) with the non-averaged quantities **r**, **x**, and **w** replaced by their averaged counterparts $\bar{\mathbf{r}} = (r_E, r_I)^T$, $\bar{\mathbf{x}} = (x_E, x_I)^T$, and **M**. The population averaged activities $r_a$ are directly related to the block-wise averaged covariance matrix $\bar{\mathbf{c}} = \begin{pmatrix} c_{EE} & c_{EI} \\ c_{IE} & c_{II} \end{pmatrix}$, with $c_{ab} = N_a^{-1} N_b^{-1} \sum_{i \in a} \sum_{j \in b} c_{ij}$. With

$$\begin{aligned}
\bar{D}_{ab} &= N_a^{-1} N_b^{-1} \left\langle \sum_{i \in a} x_i \sum_{j \in b} x_j \right\rangle \\
&= N_a^{-1} N_b^{-1} \sum_{i \in a} \sum_{j \in b} D_{ij} \\
&= \delta_{ab}\, N_a N_a^{-2}\, \rho^2 = \delta_{ab}\, N_a^{-1} \rho^2,
\end{aligned}\tag{15}$$

we replace **D** by $\bar{\mathbf{D}} = \rho^2 \begin{pmatrix} N^{-1} & 0 \\ 0 & (\gamma N)^{-1} \end{pmatrix}$ and **c** by $\bar{\mathbf{c}}$, so that the same Equations (8) and (11) and their general solutions also hold for the block-wise averaged covariance matrices.
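The structure of the averaged system can be checked in a few lines of linear algebra. The sketch below (illustrative parameter values, not those of the paper's simulations) confirms that **M** has one eigenvalue 0, due to its identical rows, and one eigenvalue $L = Kw(1 - \gamma g)$, the effective feedback of the averaged system:

```python
import numpy as np

# illustrative parameters (not the ones used in the paper's simulations)
N, p, gamma, g, w = 1000, 0.1, 0.25, 5.0, 0.01
K = p * N

M = K * w * np.array([[1.0, -gamma * g], [1.0, -gamma * g]])   # Eq. (14)
eig = np.sort(np.linalg.eigvals(M).real)

# identical rows give one zero eigenvalue; the other equals the row sum
L = K * w * (1 - gamma * g)
```

With these numbers the network is inhibition dominated ($\gamma g > 1$), so $L < 0$; this is the regime in which the delayed feedback produces the oscillatory poles discussed above.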

The covariance matrices separately averaged over pairs of excitatory, inhibitory, or mixed pairs are shown in **Figure 2** for both linear rate dynamics (3) and (4). (Parameters for all simulations presented in this article are collected in "Parameters of simulations"; the implementation of the LRM is described in "Implementation of noisy rate models".) The poles of both models shown in **Figure 2B** are given by (12) and coincide with the peaks in the cross spectra (8) and (11) for input and output noise, respectively. The results of direct simulation and the theoretical prediction are shown for two different delays, with the longer delay leading to stronger oscillations.

**Figure 3C** shows the distribution of eigenvalues in the complex plane for two random connectivity matrices with different synaptic amplitudes *w*. The model exhibits a bifurcation if at least one eigenvalue assumes a zero real part. For fixed out-degree the averaging procedure (13) is exact, reflected by the precise agreement of theory and simulation in **Figure 3D**. For fixed in-degree, the averaging procedure (13) is an approximation, which is good only for parameters far from the bifurcation. Even in this regime, small deviations of the theory from the simulation results are visible in **Figure 3B**. On the stable side close to a bifurcation, the appearance of long-living modes causes large fluctuations. These weakly damped modes, appearing in one particular realization of the connectivity matrix, are not represented after the replacement of the full matrix **w** by the average **M** over matrix realizations. The eigenvalue spectrum of the connectivity matrix provides an alternative way to understand the deviations. By the averaging, the set of *N* eigenvalues of the connectivity matrix is replaced by the two eigenvalues of the reduced matrix **M**, one of which is zero due to the identical rows of **M**. The eigenvalue spectrum of the full matrix

**FIGURE 3 | Limits of the theory for fixed in-degree and fixed out-degree.** Autocovariance **(A)** and covariance **(B)** in random networks with fixed in-degree (dots) and fixed out-degree (crosses). Simulation results for $c_{EE}$, $c_{EI}$, and $c_{II}$ are shown in dark gray, black, and light gray, respectively, for synaptic weight *w* = 0.011 far from bifurcation. For larger synaptic weight *w* = 0.018 close to bifurcation (see text at the end of "Population-averaged covariances"), $c_{EE}$ is also shown in **(D)** for fixed in-degree (dark gray dots) and for fixed out-degree (black dots). Corresponding theoretical predictions for the autocovariance (34) **(A)** and the covariance (18) **(B,D)** are plotted as light gray curves throughout. The set of eigenvalues is shown as black dots in panel **(C)** for the smaller weight. The gray circle denotes the spectral radius $w\sqrt{Np(1-p)(1+\gamma g^2)}$ (Rajan and Abbott, 2006; Kriener et al., 2008) confining the set of eigenvalues for the larger weight. The small filled gray circle and the triangle show the effective eigenvalues *L* of the averaged systems for small and large weight, respectively.

is illustrated in **Figure 3C**. Even if the eigenvalues $L_{\mathbf{M}}$ of **M** are far in the stable region (corresponding to $\Im(z(L_{\mathbf{M}})) > 0$), some eigenvalues $L_{\mathbf{w}}$ of the full connectivity matrix in the vicinity of the bifurcation region may still lead to poles whose imaginary part becomes negative, and the system can feel their influence, as shown in **Figure 3D**.

#### **2.6. FOURIER BACK TRANSFORMATION**

Although the cross spectral matrices (8) and (11) for both dynamics look similar in the Fourier domain, the procedures for back transformation differ in detail. In both cases, the Fourier integral along the real ω-axis can be extended to a closed integration contour by a semi-circle with infinite radius centered at 0 in the appropriately chosen half-plane. The half-plane needs to be selected such that the contribution of the integration along the semi-circle vanishes. By employing the residue theorem (Bronstein et al., 1999) the integral can then be replaced by a sum over the residues of the poles encircled by the contour. For a general covariance matrix we only need to calculate $\mathbf{c}(t)$ for $t \geq 0$, as for *t* < 0 the solution follows from the symmetry $\mathbf{c}(t) = \mathbf{c}^T(-t)$.

For input noise it is possible to close the contour in the upper half-plane, where the integrand $\mathbf{C}(\omega)e^{i\omega t}$ vanishes for $|\omega| \to \infty$ for all *t* > 0, as $|C_{ij}(\omega)|$ decays as $|\omega|^{-2}$. This can be seen from (8): the highest order of $H_d^{-1} \propto \omega$ appearing in $\det(H_d^{-1} - \mathbf{w})$ is equal to the dimensionality *N* of **w** (*N* = 2 for **M**), while in the $(i,j)$ entry of the adjugate matrix of $H_d^{-1} - \mathbf{w}$ it is $N - 1$ ($i = j$) or $N - 2$ ($i \neq j$). So $|(H_d^{-1} - \mathbf{w})^{-1}|$ is proportional to $|\omega|^{-1}|e^{-i\omega d}|$ and $|\mathbf{C}(\omega)| \propto |\omega|^{-2}$ for large $|\omega|$.
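The $|\omega|^{-2}$ decay can be confirmed numerically. In the sketch below (illustrative 2 × 2 connectivity, **D** set to the identity), doubling a large frequency reduces $|C_{00}|$ by roughly a factor of four:

```python
import numpy as np

tau, d = 10.0, 3.0
w = np.array([[0.2, -0.5], [0.2, -0.5]])

def C(om):
    """Eq. (8) for real omega, with D = identity (rho = 1)."""
    hd_inv = (1 + 1j * om * tau) * np.exp(1j * om * d)
    A = np.linalg.inv(hd_inv * np.eye(2) - w)
    # for real omega, H_d(-omega)^{-1} is the complex conjugate of H_d(omega)^{-1}
    B = np.linalg.inv(np.conj(hd_inv) * np.eye(2) - w.T)
    return A @ B

ratio = abs(C(200.0)[0, 0]) / abs(C(400.0)[0, 0])   # approx. (400/200)^2 = 4
```

The correction terms are of order $w/|\omega\tau|$ and hence negligible at these frequencies.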

For the case of output noise (11), $\mathbf{C}(\omega)$ can be obtained from the $\mathbf{C}(\omega)$ for input noise (8) multiplied with $(H_d(\omega)H_d(-\omega))^{-1} \sim |\omega|^2$ for large $|\omega|$. The multiplication with this factor changes the asymptotic behavior of the integrand, which therefore contains terms converging to a constant value and terms decaying like $|\omega|^{-1}$ for $|\omega| \to \infty$. These terms result in non-vanishing integrals over the semicircle in the upper half-plane and have to be considered separately. To this end we rewrite (11) as

$$\begin{aligned}
\mathbf{C}(\omega) &= \left((1 - H_d(\omega)\mathbf{w})^{-1} H_d(\omega)\mathbf{w} + 1\right) \mathbf{D} \left(\mathbf{w}^T H_d(-\omega)(1 - H_d(-\omega)\mathbf{w}^T)^{-1} + 1\right) \\
&= (1 - H_d(\omega)\mathbf{w})^{-1} H_d(\omega)\mathbf{w}\, \mathbf{D}\, \mathbf{w}^T H_d(-\omega)(1 - H_d(-\omega)\mathbf{w}^T)^{-1} \\
&\quad + (1 - H_d(\omega)\mathbf{w})^{-1} H_d(\omega)\mathbf{w}\, \mathbf{D} \\
&\quad + \mathbf{D}\, \mathbf{w}^T H_d(-\omega)(1 - H_d(-\omega)\mathbf{w}^T)^{-1} \\
&\quad + \mathbf{D},
\end{aligned}\tag{16}$$

and find the constant term **D**, which turns into a δ-function in the time domain. The first term in the second line of (16) decays like $|\omega|^{-2}$ and can be transformed just as $\mathbf{C}(\omega)$ for input noise, closing the contour in the upper half-plane. The second and third terms are the transposed complex conjugates of each other, because of the dependence of *H* on −ω instead of ω, and require a special consideration. Multiplied by $e^{i\omega t}$ under the Fourier integral, the first of them is proportional to $H_d e^{i\omega t} \sim \omega^{-1} e^{i\omega(t-d)}$ and vanishes faster than $|\omega|^{-1}$ for large $|\omega|$ in the upper half-plane for *t* > *d* and in the lower half-plane for *t* < *d*. For the second term the half-planes are interchanged. The application of the residue theorem requires closing the integration contour in the half-plane where the integral over the semi-circle vanishes faster than $|\omega|^{-1}$. For $\mathbf{w} = \mathbf{M}$, and in the general case of a stable dynamics, all poles of the first term are in the upper half-plane, $\Im(z_k(L_j)) > 0$, and contribute nothing to $\mathbf{c}(t)$ for *t* < *d*. For the second term the same is true for *t* > −*d*; these terms correspond to the jumps of $\mathbf{c}(t)$ after one delay, caused by the effect of the sending neuron arriving at the other neurons in the system after one synaptic delay. These terms correspond to the response of the system to the impulse of the sending neuron; hence we call them "echo terms" in the following (Helias et al., 2013). The presence of such discontinuous jumps at time points *d* and −*d* in the case of output noise is reflected in the convolution of $h\mathbf{w}$ with **D** in the time domain in (37). For input noise the absence of discontinuities can be inferred from the absence of such terms in (33), where the derivative of the correlation function is equal to a sum of finite terms.
The first summand in (16) corresponds to the covariance evoked by fluctuations propagating through the system originating from the same neuron and we call it "correlated input term". In the system with input noise a similar separation into effective echo and correlated input terms can be performed. We obtain the correlated input term as the covariance in an auxiliary population without outgoing connections and echo terms as the difference between the full covariance between neurons within the network and the correlated input term.

### **2.7. EXPLICIT EXPRESSION FOR THE POPULATION AVERAGED CROSS COVARIANCE IN THE TIME DOMAIN**

We obtain the population averaged cross spectrum in a recurrent random network of Ornstein–Uhlenbeck processes (OUP) with input noise by inserting the averaged connectivity matrix $\mathbf{w} = \mathbf{M}$ (14) into (8). The explicit expression for the covariance function follows by taking into account both eigenvalues of **M**, 0 and $L = Kw(1 - \gamma g)$. The detailed derivation of the results presented in this section is documented in "Calculation of the Population Averaged Cross Covariance in Time Domain". The expression for the cross spectrum (8) takes the form

$$\begin{aligned}
\mathbf{C}(\omega) = f(\omega)f(-\omega) &\left( 1 + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} H_d(\omega) \right) \\
&\mathbf{D} \left( 1 + Kw \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} H_d(-\omega) \right),
\end{aligned}\tag{17}$$

where we introduced $f(\omega) = (H_d(\omega)^{-1} - L)^{-1}$ as a short hand. Sorting the terms by their dependence on ω, introducing the functions $\Phi_1(\omega), \dots, \Phi_4(\omega)$ for this dependence and $\varphi_1(t), \dots, \varphi_4(t)$ for the corresponding functions in the time domain, the covariance in the time domain $\mathbf{c}(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \mathbf{C}(\omega)\, e^{i\omega t}\, d\omega$ takes the form

$$\begin{aligned}
\mathbf{c}(t) &= \mathbf{D}\,\varphi_1(t) \\
&\quad + Kw \left( \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} \mathbf{D}\, \varphi_2(t) + \mathbf{D} \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} \varphi_3(t) \right) \\
&\quad + K^2 w^2 \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} \mathbf{D} \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} \varphi_4(t).
\end{aligned}$$

The previous expression is valid for arbitrary **D**. In simulations presented in this article we consider identical marginal input statistics for all neurons. In this case the averaged activities for excitatory and inhibitory neurons are the same, so we can insert the special form of **D** given in (15), which results in

$$\begin{aligned}
\mathbf{c}(t) &= \frac{\rho^2}{N} \begin{pmatrix} 1 & 0 \\ 0 & \gamma^{-1} \end{pmatrix} \varphi_1(t) \\
&\quad + \frac{\rho^2}{N} Kw \begin{pmatrix} \gamma g & -g \\ 1 & -\gamma^{-1} \end{pmatrix} \varphi_2(t) + \frac{\rho^2}{N} Kw \begin{pmatrix} \gamma g & 1 \\ -g & -\gamma^{-1} \end{pmatrix} \varphi_3(t) \\
&\quad + \frac{\rho^2}{N} (\gamma + 1) K^2 w^2 \begin{pmatrix} \gamma g^2 & g \\ g & \gamma^{-1} \end{pmatrix} \varphi_4(t).
\end{aligned}\tag{18}$$

The time-dependent functions $\varphi_1, \dots, \varphi_4$ are the same in both cases. Using the residue theorem, $\varphi_i(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \Phi_i(\omega)\, e^{i\omega t}\, d\omega = i \sum_{z\,\in\,\text{poles of }\Phi_i} \mathrm{Res}(\Phi_i, z)\, e^{izt}$ for $t \geq 0$, they can be expressed as a sum over the poles $z_k(L)$ given by (12) and the pole $z = \frac{i}{\tau}$ of $H_d(\omega)$. At $\omega = z_k(L)$ the residue of $f(\omega)$ is $\mathrm{Res}(f, \omega = z_k(L)) = \left( idL + i\tau e^{i\omega d} \right)^{-1}$; the residue of $H_d(\omega)$ at $z = \frac{i}{\tau}$ is $-\frac{i}{\tau} e^{d/\tau}$, so that the explicit forms of $\varphi_1, \dots, \varphi_4$ follow as

$$\begin{split} \varphi\_{1}(t) &= \sum\_{\boldsymbol{\alpha} = \boldsymbol{z}\_{k}(L)} i \text{Res}(f, \boldsymbol{\alpha}) f(-\boldsymbol{\alpha}) e^{i \boldsymbol{\alpha} t} \\ \varphi\_{2}(t) &= \sum\_{\boldsymbol{\alpha} = \boldsymbol{z}\_{k}(L)} i \text{Res}(f, \boldsymbol{\alpha}) f(-\boldsymbol{\alpha}) H\_{d}(\boldsymbol{\alpha}) e^{i \boldsymbol{\alpha} t} \\ &+ \frac{e^{(d-t)/\tau}}{\tau} f\left(\frac{i}{\tau}\right) f\left(-\frac{i}{\tau}\right) \end{split}$$

$$\begin{split} \varphi\_{3}(t) &= \sum\_{\boldsymbol{\alpha} = \boldsymbol{z}\_{k}(L)} i \text{Res}(f, \boldsymbol{\alpha}) f(-\boldsymbol{\alpha}) H\_{d}(-\boldsymbol{\alpha}) e^{i \boldsymbol{\alpha} t} \\ \varphi\_{4}(t) &= \sum\_{\boldsymbol{\alpha} = \boldsymbol{z}\_{k}(L)} i \text{Res}(f, \boldsymbol{\alpha}) f(-\boldsymbol{\alpha}) H\_{d}(\boldsymbol{\alpha}) H\_{d}(-\boldsymbol{\alpha}) e^{i \boldsymbol{\alpha} t} \\ &+ \frac{e^{-t/\tau}}{2\tau} f\left(\frac{i}{\tau}\right) f\left(-\frac{i}{\tau}\right). \end{split} \tag{19}$$

The corresponding expression for **C**(ω) for output noise is obtained by multiplying (17) with $H_d^{-1}(\omega) H_d^{-1}(-\omega) = 1 + \omega^2\tau^2$

$$\begin{split} \mathbf{C}(\omega) = {}& H_d^{-1}(\omega) H_d^{-1}(-\omega) f(\omega) f(-\omega) \\ &\times \left( \mathbf{1} + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} H_d(\omega) \right) \mathbf{D} \left( \mathbf{1} + Kw \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} H_d(-\omega) \right), \end{split} \tag{20}$$

which, after Fourier transformation, provides the expression for **c**(*t*) in the time domain for *t* ≥ 0

$$\begin{split} \mathbf{c}(t) &= \mathbf{M} \mathbf{D} \mathbf{M}^T \boldsymbol{\varphi}_1(t) + \mathbf{M} \mathbf{D} \boldsymbol{\varphi}_0(t) + \mathbf{D} \delta(t) \\ &= K^2 w^2 \frac{\rho^2}{N} \left(1 + \gamma g^2\right) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \boldsymbol{\varphi}_1(t) + Kw \frac{\rho^2}{N} \begin{pmatrix} 1 & -g \\ 1 & -g \end{pmatrix} \boldsymbol{\varphi}_0(t) \\ &+ \frac{\rho^2}{N} \begin{pmatrix} 1 & 0 \\ 0 & \gamma^{-1} \end{pmatrix} \delta(t). \end{split} \tag{21}$$

As in (18), the first line holds for arbitrary **D**, and the second for **D** given by (15), valid if the firing rates are homogeneous. ϕ<sub>1</sub> is defined as before, and

$$\varphi_0(t) = \theta(t - d) \sum_{\alpha = z_k(L)} \left( dL + \tau e^{i\alpha d} \right)^{-1} e^{i\alpha t} \tag{22}$$

vanishes for *t* < *d*. All matrix elements of the first term in (21) are identical. Therefore all elements of **c**(*t*) are equal for 0 < |*t*| < *d*. Both rows of the matrix in front of ϕ<sup>0</sup> are identical, so for *t* > 0 the off diagonal term *cIE* coincides with *cEE* and *cEI* with *cII* and vice versa for *t* < 0.
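The matrix structure just described can be verified directly; below is a minimal numerical sketch with arbitrary example parameter values (they are placeholders, not values from the simulations in this article):

```python
import numpy as np

# Arbitrary example parameters (placeholders, not fitted to any figure):
K, w, g, gamma, rho2, N = 100, 0.2, 5.0, 0.25, 1.3, 1000

# Input covariance D for homogeneous rates, Equation (15).
D = rho2 / N * np.diag([1.0, 1.0 / gamma])

# Population connectivity: each neuron receives K excitatory inputs of
# weight w and gamma*K inhibitory inputs of weight -g*w.
M = K * w * np.array([[1.0, -gamma * g],
                      [1.0, -gamma * g]])

# Common-input prefactor of phi_1 in (21): all four entries are identical,
# so all elements of c(t) agree for 0 < |t| < d.
assert np.allclose(M @ D @ M.T,
                   K**2 * w**2 * rho2 / N * (1 + gamma * g**2) * np.ones((2, 2)))

# Echo prefactor of phi_0 in (21): both rows are identical, so c_IE
# coincides with c_EE and c_EI with c_II for t > 0.
assert np.allclose(M @ D, K * w * rho2 / N * np.array([[1.0, -g],
                                                       [1.0, -g]]))
```

The same check applies, with the transposed matrices, to the prefactors of φ<sub>2</sub>, φ<sub>3</sub>, and φ<sub>4</sub> in (18).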

As an illustration we show the functions ϕ<sub>0</sub>,…, ϕ<sub>4</sub> for one set of parameters in **Figure 4**. The left panels **(A,C)** correspond to contributions to the covariance caused by common input to a pair of neurons, the right panels **(B,D)** to terms due to the effect of one of the neurons' activities on the remaining network (echo terms). The upper panels **(A,B)** belong to the model with input noise, the lower panels **(C,D)** to the one with output noise.

**FIGURE 4 | Functions ϕ<sub>0</sub> (D), ϕ<sub>1</sub> (C), ϕ<sub>2</sub>, ϕ<sub>3</sub> (B), ϕ<sub>4</sub> (A) introduced in (19) and (22) for the decomposition of the covariance c(*t*).** In **(B)** ϕ<sub>3</sub>(−*t*) is shown in gray and ϕ<sub>2</sub>(*t*) in black. The two functions are continuations of each other, joined at *t* = 0. Both functions appear in the echo term for input noise. The function ϕ<sub>0</sub> in **(D)**, describing the corresponding echo term in the case of output noise, is shifted to align with the function in **(B)** to facilitate the comparison of **(B,D)**. Parameters in all panels are *d* = 3 ms, τ = 10 ms, *L* = −1.72.

For the rate dynamics with output noise, the term with ϕ<sub>1</sub> in (21) (shown in **Figure 4C**) is symmetric and describes the common input covariance, and the term with ϕ<sub>0</sub> (shown in **Figure 4D**) is the echo part of the covariance. For the rate dynamics with input noise (18), the term containing ϕ<sub>4</sub> (shown in **Figure 4A**) is caused by common input and is hence also symmetric, while the terms with ϕ<sub>2</sub> and ϕ<sub>3</sub> (shown in **Figure 4B**) correspond to the echo part and hence have their peaks away from the origin. The second echo term in (18) equals the first one transposed and with opposite sign of the time argument, so we show ϕ<sub>2</sub>(*t*) and ϕ<sub>3</sub>(−*t*) together in one panel in **Figure 4B**. Note that for input noise, the term with ϕ<sub>1</sub> describes the autocovariance, which corresponds to the term with the δ-function in the case of output noise.

The solution (18) is visualized in **Figure 6**, the solution (21) in **Figure 7**, and the decomposition into common input and echo parts is also shown and compared to direct simulations in **Figure 8**.

#### **3. BINARY NEURONS**

In the following sections we study, in turn, the binary neuron model, the Hawkes model and the LIF model and show how they can be mapped to one of the two OUPs; either the one with input or the one with output noise, so that the explicit solutions (18) and (21) for the covariances derived in the previous section can be applied. In the present section, we start with the binary neuron model (Ginzburg and Sompolinsky, 1994; Buice et al., 2009).

Following Ginzburg and Sompolinsky (1994) the state of the network of *N* binary model neurons is described by a binary vector **n** ∈ {0, 1}*<sup>N</sup>* and each neuron is updated at independently drawn time points with exponentially distributed intervals of mean duration τ. This stochastic update constitutes a source

of noise in the system. Given the *i*-th neuron is updated, the probability to end in the up-state (*ni* = 1) is determined by the gain function *Fi*(**n**), which depends on the activity **n** of all other neurons. The probability to end in the down-state (*ni* = 0) is 1 − *Fi*(**n**). Here we implemented the binary model in the NEST simulator (Gewaltig and Diesmann, 2007) as described in "Implementation of Binary Neurons in a Spiking Simulator Code". Such systems have been considered earlier (Ginzburg and Sompolinsky, 1994; Buice et al., 2009), and here we follow the notation employed in the latter work. In the following we collect results that have been derived in these works and refer the reader to these publications for the details of the derivations. The zero-time-lag covariance function is defined as *cij*(*t*) = ⟨*ni*(*t*)*nj*(*t*)⟩ − *ai*(*t*)*aj*(*t*), with the expectation value taken over different realizations of the stochastic dynamics. Here **a**(*t*) = (*a*1(*t*), . . . , *aN*(*t*))*<sup>T</sup>* is the vector of mean activities *ai*(*t*) = ⟨*ni*(*t*)⟩. *cij*(*t*) fulfills the differential equation

$$\tau\frac{d}{dt} c_{ij}(t) = -2 c_{ij}(t) + \langle (n_j(t) - a_j(t))\, F_i(\mathbf{n}) \rangle + \langle (n_i(t) - a_i(t))\, F_j(\mathbf{n}) \rangle.$$

In the stationary state, the covariance therefore fulfills

$$c_{ij} = \frac{1}{2} \langle (n_j - a_j)\, F_i(\mathbf{n}) \rangle + \frac{1}{2} \langle (n_i - a_i)\, F_j(\mathbf{n}) \rangle. \tag{23}$$

The time-lagged covariance *cij*(*t*, *s*) = ⟨*ni*(*t*)*nj*(*s*)⟩ − *ai*(*t*)*aj*(*s*) fulfills for *t* > *s* the differential equation

$$\tau\frac{d}{dt} c_{ij}(t, s) = -c_{ij}(t, s) + \langle F_i(\mathbf{n}, t)(n_j(s) - a_j(s)) \rangle. \tag{24}$$

This equation is also true for *i* = *j*, the autocovariance. The term ⟨*Fi*(**n**, *t*)(*nj*(*s*) − *aj*(*s*))⟩ has a simple interpretation: it measures the influence of a fluctuation of neuron *j* at time *s* around its mean value on the gain of neuron *i* at time *t* (Ginzburg and Sompolinsky, 1994). We now assume a particular form for the coupling between neurons

$$F_i(\mathbf{n}, t) = \phi(\mathbf{J}_i \mathbf{n}(t - d)) = \phi\left(\sum_{k=1}^{N} J_{ik}\, n_k(t - d)\right), \tag{25}$$

where **J***<sup>i</sup>* is the vector of incoming synaptic weights into neuron *i* and φ is a non-linear gain function. Assuming that the fluctuations of the total input **J***i***n** into the *i*-th neuron are sufficiently small to allow a linearization of the gain function φ, we obtain the Taylor expansion

$$F\_i(\mathbf{n}, t) = F\_i(\mathbf{a}) + \phi'(\mathbf{J}\_i \mathbf{a}) \mathbf{J}\_i(\mathbf{n}(t - d) - \mathbf{a}(t - d)),$$

where

$$\phi'(\mathbf{J}\_i \mathbf{a})\tag{26}$$

is the slope of the gain function at the point of mean input.

Up to this point the treatment of the system is identical to the work of Ginzburg and Sompolinsky (1994). Now we present an alternative approach to the linearization which takes into account the effect of fluctuations in the input. For sufficiently asynchronous network states, the fluctuations in the input **J***i***n**(*t* − *d*) to neuron *i* can be approximated by a Gaussian distribution *N*(μ, σ). In the following we consider a homogeneous random network with fixed in-degree as described in "Population-averaged covariances". As each neuron receives the same number *K* of excitatory and γ*K* inhibitory synapses, the marginal statistics of the summed input to each neuron is identical. The mean input to a neuron then is μ = *KJ*(1 − γ*g*)*a*, where *a* is the mean activity of a neuron in the network. If correlations are small, the variance of this input distribution can be approximated as the sum of the variances of the individual contributions from the incoming signals, resulting in σ<sup>2</sup> = *KJ*<sup>2</sup>(1 + γ*g*<sup>2</sup>)*a*(1 − *a*), where we used the fact that the variance of a binary variable with mean *a* is *a*(1 − *a*). This follows from a direct calculation: since *n* ∈ {0, 1}, *n*<sup>2</sup> = *n*, so that the variance is ⟨*n*<sup>2</sup>⟩ − ⟨*n*⟩<sup>2</sup> = ⟨*n*⟩ − ⟨*n*⟩<sup>2</sup> = *a*(1 − *a*). Averaging the slope φ′ of the gain function over the distribution of the input variable results in the averaged slope

$$\langle \phi' \rangle = \int_{-\infty}^{\infty} \mathcal{N}(\mu, \sigma, x)\, \phi'(x)\, dx \tag{27}$$

$$\text{with } \mathcal{N}(\mu,\sigma,x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
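The Gaussian average (27) is straightforward to evaluate numerically; a minimal sketch for the gain function used in Figure 5, with assumed example values for β, μ, and σ:

```python
import numpy as np

beta = 2.0                # gain parameter of phi(x) = 1/2 + 1/2 tanh(beta*x) (example value)
mu, sigma = 0.3, 1.0      # assumed mean and standard deviation of the summed input

def phi_prime(x):
    """Slope of the gain function, phi'(x) = beta / (2 cosh^2(beta x))."""
    return 0.5 * beta / np.cosh(beta * x) ** 2

# Equation (27): average the slope over the Gaussian input distribution.
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 20001)
dx = x[1] - x[0]
gauss = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
slope_avg = np.sum(gauss * phi_prime(x)) * dx

# Broad input fluctuations flatten the effective gain: the averaged slope
# is smaller than the tangent slope at the mean input (cf. Figure 5).
assert slope_avg < phi_prime(mu)
```

For σ → 0 the Gaussian collapses to a point mass and the two linearizations coincide; for broad input distributions the averaged slope is systematically smaller.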

The two alternative methods of linearization of φ are illustrated in **Figure 5**. In the given example, the linearization taking into account the fluctuations of the input signal results in a smaller effective slope ⟨φ′⟩ than the tangent slope φ′(*a*) taken at the mean activity *a*, which lies near the maximum of φ′. Averaging the slope φ′ over this distribution fits simulation results better than φ′(*a*) evaluated at the mean activity, as shown in **Figure 6**.

The finite slope of the non-linear gain function can be understood as resulting from the combination of a hard threshold with an intrinsic local source of noise. The inverse strength of this noise

**FIGURE 5 | Alternative linearizations of the binary neuron model.** The black curve represents the non-linear gain function φ(*x*) = 1/2 + 1/2 tanh(β*x*). The dashed gray line is its tangent at the mean input value (denoted by the diagonal cross). The solid line has the slope ⟨φ′⟩ averaged over the distribution of the fluctuating input (27). This distribution, estimated from direct simulation, is shown as black dots; the corresponding theoretical prediction of a normal distribution *N*(μ, σ) (27) is shown as the light gray curve.

determines the slope parameter β (Ginzburg and Sompolinsky, 1994). In this sense, the network model contains two sources of noise: the explicit local noise, quantified by β, and the fluctuating synaptic input, interpreted as self-generated noise on the network level, quantified by σ. Even in the absence of local noise (β → ∞), the above mentioned linearization is applicable and yields a finite effective slope ⟨φ′⟩ (27). In the latter case the resulting effective synaptic weight is independent of the original synapse strength (Grytskyy et al., 2013).

We now extend the classical treatment of covariances in binary networks (Ginzburg and Sompolinsky, 1994) by synaptic conduction delays. In (25) *Fi*(**n**, *t*) must therefore be understood as a functional acting on the function **n**(*t*′) for *t*′ ∈ (−∞, *t*], so that synaptic connections with time delay *d* can also be realized. We define an effective weight vector absorbing the gain factor as **w***i* = β*i***J***i*, with either β*i* = φ′(μ) or β*i* = ⟨φ′⟩, depending on the linearization procedure, and expand the right hand side of (24) to obtain

$$\langle F_i(\mathbf{n},t)(n_j(s) - a_j(s))\rangle = \sum_{k=1}^{N} w_{ik}\, c_{kj}(t-d,\, s).$$

Thus the cross-covariance fulfills the matrix delay differential equation

$$\tau\frac{d}{dt}\mathbf{c}(t,s) + \mathbf{c}(t,s) = \mathbf{w}\,\mathbf{c}(t-d,s). \tag{28}$$

This differential equation is valid for *t* > *s*. For the stationary solution, the differential equation only depends on the relative timing *u* = *t* − *s*

$$\tau\frac{d}{du}\mathbf{c}(u) + \mathbf{c}(u) = \mathbf{w}\,\mathbf{c}(u-d). \tag{29}$$

The same linearization applied to (23) results in the boundary condition for the solution of the previous equation

$$\mathbf{2c}(0) = \mathbf{wc}(-d) + \left(\mathbf{wc}(-d)\right)^T \tag{30}$$

or, if we split **c** into its diagonal and its off-diagonal parts **c***<sup>a</sup>* and **c**≠

$$\mathbf{2c}\_{\neq}(0) = \mathbf{w}\mathbf{c}\_{\neq}(-d) + \left(\mathbf{w}\mathbf{c}\_{\neq}(-d)\right)^{T} + \mathbf{O} \tag{31}$$
 
$$\text{with } \mathbf{O} = \mathbf{w}\mathbf{c}\_{a}(-d) + \left(\mathbf{w}\mathbf{c}\_{a}(-d)\right)^{T}.$$

In the following section we use this representation to demonstrate the equivalence of the covariance structure of binary networks to the solution for OUP with input noise.

#### **3.1. EQUIVALENCE OF BINARY NEURONS AND ORNSTEIN–UHLENBECK PROCESSES**

In the following subsection we show that the same Equations (29) and (31) for binary neurons also hold for the Ornstein-Uhlenbeck process (OUP) with input noise. In doing so here we also extend the existing framework of OUP (Risken, 1996) to synaptic conduction delays *d*. A network of such processes is described by

$$\tau\frac{d}{dt}\mathbf{r}(t) + \mathbf{r}(t) = \mathbf{w}\,\mathbf{r}(t-d) + \mathbf{x}(t), \tag{32}$$

where **x** is a vector of pairwise uncorrelated white noise with ⟨**x**(*t*)⟩*<sub>x</sub>* = 0 and ⟨*xi*(*t*)*xj*(*t* + *t*′)⟩*<sub>x</sub>* = δ*ij*δ(*t*′)ρ<sup>2</sup>. With the help of the Green's function *G* satisfying (τ *d*/*dt* + 1)*G*(*t*) = δ(*t*), namely *G*(*t*) = (1/τ) θ(*t*)*e*<sup>−*t*/τ</sup>, we obtain the solution of Equation (32) as

$$\mathbf{r}(t) = \tau G(t)\,\mathbf{r}(0) + \int_0^t G(t - t')\left(\mathbf{w}\,\mathbf{r}(t' - d) + \mathbf{x}(t')\right) dt'.$$

The equation for the fluctuations δ**r**(*t*) = **r**(*t*) − ⟨**r**(*t*)⟩*<sub>x</sub>* around the expectation value

$$\delta \mathbf{r}(t) = \int\_0^t G(t - t') (\mathbf{w} \delta \mathbf{r}(t' - d) + \mathbf{x}(t')) \, dt'$$

coincides with the noisy rate model with input noise (4) with delay *d* and convolution kernel *h* = *G*. In the next step we investigate the covariance matrix *cij*(*t*, *s*) = ⟨δ*ri*(*t* + *s*)δ*rj*(*t*)⟩*<sub>x</sub>* to show for which choice of parameters the covariance matrices of the binary model and the OUP with input noise coincide. To this end we derive the differential equation with respect to the time lag *s* for positive lags *s* > 0

$$\begin{split} \tau\frac{d}{ds}\mathbf{c}(t,s) &= \left\langle \tau\frac{d}{ds}\delta\mathbf{r}(t+s)\,\delta\mathbf{r}^T(t) \right\rangle_{\mathbf{x}} \\ &= \left\langle (\mathbf{w}\,\delta\mathbf{r}(t+s-d) - \delta\mathbf{r}(t+s) + \mathbf{x}(t+s))\,\delta\mathbf{r}^T(t) \right\rangle_{\mathbf{x}} \\ &= \mathbf{w}\,\mathbf{c}(t,s-d) - \mathbf{c}(t,s), \end{split} \tag{33}$$

where we used ⟨**x**(*t* + *s*)δ**r**<sup>*T*</sup>(*t*)⟩*<sub>x</sub>* = 0, because the noise is realized independently at each point in time and the system is causal. Equation (33) is identical to the differential equation satisfied by the covariance matrix (28) for binary neurons (Ginzburg and Sompolinsky, 1994). To determine the initial condition of (33) we need to take the limit **c**(*t*, 0) = lim*<sub>s</sub>*<sub>→+0</sub> **c**(*t*, *s*). This initial condition can be obtained as the stationary solution of the following differential equation

$$\begin{split} \tau \frac{d}{dt} \mathbf{c}(t, 0) &= \lim_{s \to +0} \left( \left\langle \tau \frac{d}{dt} \delta \mathbf{r}(t+s)\, \delta \mathbf{r}^T(t) \right\rangle_{\mathbf{x}} + \left\langle \delta \mathbf{r}(t+s)\, \tau \frac{d}{dt} \delta \mathbf{r}^T(t) \right\rangle_{\mathbf{x}} \right) \\ &= \lim_{s \to +0} \Big( \langle (\mathbf{w} \delta \mathbf{r}(t+s-d) - \delta \mathbf{r}(t+s) + \mathbf{x}(t+s)) \delta \mathbf{r}^T(t) \rangle_{\mathbf{x}} \\ &\quad + \langle \delta \mathbf{r}(t+s) (\delta \mathbf{r}^T(t-d) \mathbf{w}^T - \delta \mathbf{r}^T(t) + \mathbf{x}^T(t)) \rangle_{\mathbf{x}} \Big) \\ &= -2\mathbf{c}(t, 0) + \mathbf{w} \mathbf{c}(t, -d) + \mathbf{c}(t-d, d) \mathbf{w}^T + \mathbf{D}. \end{split}$$

Here we used that ⟨**x**(*t* + *s*)δ**r**<sup>*T*</sup>(*t*)⟩ vanishes due to independent noise realizations and causality, and

$$\begin{split} \mathbf{D} &= \lim_{s \to +0} \langle \delta \mathbf{r}(t+s)\, \mathbf{x}^T(t) \rangle_{\mathbf{x}} \\ &= \lim_{s \to +0,\, s < d} \int_0^{t+s} G(t+s-t') \Big( \mathbf{w} \underbrace{\langle \delta \mathbf{r}(t'-d)\, \mathbf{x}^T(t) \rangle_{\mathbf{x}}}_{=0 \text{ causality}} + \underbrace{\langle \mathbf{x}(t')\, \mathbf{x}^T(t) \rangle_{\mathbf{x}}}_{=\mathbf{1}\delta(t-t')\rho^2} \Big)\, dt' \\ &= \lim_{s \to +0,\, s < d} \int_0^{t+s} G(t+s-t')\, \mathbf{1}\delta(t-t')\rho^2\, dt' \\ &= \lim_{s \to +0,\, s < d} G(s)\, \mathbf{1}\rho^2 = \frac{1}{\tau}\mathbf{1}\rho^2. \end{split}$$

In the stationary state, **c** only depends on the time lag *s* and is independent of the first time argument *t*, which, with the symmetry **c**(−*d*)*<sup>T</sup>* = **c**(*d*) yields the additional condition for the solution of (33)

$$\mathbf{2c}(0) = \mathbf{wc}(-d) + (\mathbf{wc}(-d))^T + \mathbf{D}$$

or, if **c** is split into diagonal and off-diagonal parts **c***<sup>a</sup>* and **c**≠, respectively,

$$2\mathbf{c}\_{\neq}(0) = \mathbf{w}\mathbf{c}\_{\neq}(-d) + \left(\mathbf{w}\mathbf{c}\_{\neq}(-d)\right)^{T} + \mathbf{O}$$

$$2\mathbf{c}\_{a}(0) = \mathbf{w}\mathbf{c}\_{\neq}(-d) + \left(\mathbf{w}\mathbf{c}\_{\neq}(-d)\right)^{T} + \mathbf{D}$$

with **O** = **wc***a*(−*d*) + (**wc***a*(−*d*))*T*. In the equation for the autocovariance **c***<sup>a</sup>* the first two terms are contributions due to the cross covariance. In the state of asynchronous network activity with *cij* ∼ *N*<sup>−1</sup> for *i* ≠ *j* these terms are typically negligible in comparison to the third term, because Σ*<sub>k</sub>* *wik**cki* ∼ *wKN*<sup>−1</sup> = *pw*, which is typically smaller than 1 for small effective weights *w* < 1 and small connection probabilities *p* ≪ 1. In this approximation, (33) implies that the autocovariance function decays exponentially with time constant τ. With **c***a*(0) ≈ **D**/2 the approximate solution for the autocovariance is

$$\mathbf{c}_a(t) = \frac{\mathbf{D}}{2} \exp\left(-\frac{|t|}{\tau}\right). \tag{34}$$
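The prediction c*<sub>a</sub>*(0) = **D**/2 and the exponential decay (34) can be checked by simulating a single, uncoupled process; a sketch using an Euler-Maruyama discretization with example parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
tau, rho2 = 10.0, 2.0        # time constant and noise intensity (example values, ms units)
dt, steps = 0.02, 1_000_000

# Euler-Maruyama for a single uncoupled process tau*dr/dt = -r + x(t) with
# <x(t)x(t')> = rho2*delta(t-t'): the integrated noise over one step has
# standard deviation sqrt(rho2*dt)/tau.
r = np.empty(steps)
r[0] = 0.0
xi = rng.normal(0.0, np.sqrt(rho2 * dt) / tau, size=steps)
decay = 1.0 - dt / tau
for n in range(steps - 1):
    r[n + 1] = r[n] * decay + xi[n]

r = r[steps // 10:]                    # discard the initial transient
var = r.var()
assert abs(var - rho2 / (2 * tau)) < 0.15 * rho2 / (2 * tau)   # c_a(0) = D/2 = rho2/(2 tau)

# The autocovariance decays exponentially with time constant tau, Equation (34).
lag = int(round(tau / dt))
cov = np.mean(r[:-lag] * r[lag:])
assert abs(cov - var * np.exp(-1.0)) < 0.2 * var
```

The tolerances are statistical; longer simulations tighten them.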

The cross covariance then satisfies the initial condition

$$2\mathbf{c}\_{\neq}(0) = \mathbf{w}\mathbf{c}\_{\neq}(-d) + (\mathbf{w}\mathbf{c}\_{\neq}(-d))^T + \mathbf{O},$$

$$\mathbf{O} = \mathbf{w}\mathbf{D}/2 + (\mathbf{w}\mathbf{D}/2)^T,$$

which coincides with (31) for binary neurons if the diagonal matrix containing the zero time autocorrelations **c***a*(0) for binary neurons is equal to **D**/2, i.e., if the amplitude of the input noise ρ<sup>2</sup> = 2τ*a*(1 − *a*) and the effective linear coupling satisfies **w***<sup>i</sup>* = β*i***J***i*. **Figure 6** shows simulation results for population averaged covariance functions in binary networks and in networks of OUPs with input noise where the parameters of the OUP network are chosen according to the requirements derived above. The theoretical results (18) agree well with the direct simulations of both systems. For comparison, both methods of linearization, as explained above, are shown. The linearization procedure which takes into account the noise on the input side of the non-linear gain function results in a more accurate prediction. Moreover, the results derived here extend the classical theory (Ginzburg and Sompolinsky, 1994) by considering synaptic conduction delays. **Figure 8** shows the decomposition of the covariance structure for a non-zero delay *d* = 3 ms. For details of the implementation see "Implementation of binary neurons in a spiking simulator code". The explicit effect of introducing delays into the system, such as the appearance of oscillations in the time dependent covariance, is presented in **(E,F)** of **Figure 6**, differing from **(A,B)** of this figure, respectively, only in the delay (*d* = 10 ms for **(E,F)**, *d* = 0.1 ms for **(A,B)**).
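The asynchronous update dynamics of the binary model can be sketched in a few lines; the parameter values below are illustrative, not those of the figures. Since all neurons are updated at the same rate 1/τ, it suffices to update one uniformly chosen neuron per event; the exponential waiting times do not affect the stationary state statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
NE, NI = 400, 100                     # excitatory/inhibitory population sizes (example)
N, gamma = NE + NI, NI / NE           # gamma = 0.25
K, J, g, beta = 40, 0.1, 5.0, 0.5     # in-degree, weight, rel. inhibition, gain slope

def phi(x):
    """Gain function as in Figure 5 (zero threshold assumed)."""
    return 0.5 + 0.5 * np.tanh(beta * x)

# Fixed in-degree random connectivity: K excitatory and gamma*K inhibitory
# inputs per neuron, with weights J and -g*J.
W = np.zeros((N, N))
for i in range(N):
    W[i, rng.choice(NE, K, replace=False)] = J
    W[i, NE + rng.choice(NI, int(gamma * K), replace=False)] = -g * J

n = rng.integers(0, 2, N).astype(float)
states = []
for step in range(100 * N):
    i = rng.integers(N)                          # asynchronous update of one neuron
    n[i] = float(rng.random() < phi(W[i] @ n))   # up-state with probability F_i(n)
    if step >= 20 * N and step % N == 0:
        states.append(n.copy())

samples = np.concatenate(states)
a = samples.mean()
assert 0.0 < a < 0.5                  # net inhibitory input keeps the activity below 1/2
# For a binary variable the variance is exactly a(1 - a), as used for sigma^2 above.
assert np.isclose(samples.var(), a * (1 - a))
```

The last assertion checks the identity used in the linearization; it holds exactly for binary variables.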

#### **4. HAWKES PROCESSES**

In the following section we show that to linear order the covariance functions in networks of Hawkes processes (Hawkes, 1971) are equivalent to those in the linear rate network with output noise. Hawkes processes generate spikes randomly with a time density given by **r**(*t*), where neuron *i* generates spikes at a rate *ri*(*t*), realized independently within each infinitesimal time step. Arriving spike trains **s** influence **r** according to

$$\mathbf{r}(t) = \boldsymbol{\upsilon} + (h\_d \* \mathbf{J} \mathbf{s})(t),\tag{35}$$

with the connectivity matrix **J** and the kernel function *hd* including the delay. Here ν is a constant base rate of spike emission assumed to be equal for each neuron. Here we employ the implementation of the Hawkes model in the NEST simulator (Gewaltig and Diesmann, 2007). The implementation is described in "Implementation of Hawkes neurons in a spiking simulator code".

Given neuron *j* spiked at time *u* ≤ *t*, the probability of a spike in the interval [*t*, *t* + δ*t*) for neuron *i* is 1 if *i* = *j*, *u* = *t* (the neuron spikes synchronously with itself) and *ri*(*t*)δ*t* + *o*(δ*t*<sup>2</sup>) otherwise. Considering the system in the stationary state with the time averaged activity **r̄** = ⟨**s**(*t*)⟩, we obtain a convolution equation for time lags τ ≥ 0 for the covariance matrix with entries *cij*(τ), the covariance between the spike trains of neurons *i* and *j*

$$\begin{split} \mathbf{c}(\tau) &= \langle \mathbf{s}(t+\tau)\,\mathbf{s}^T(t) \rangle - \langle \mathbf{s}(t+\tau) \rangle \langle \mathbf{s}^T(t) \rangle \\ &= \langle (\delta(\tau)\mathbf{1} + \mathbf{r}(t+\tau))\,\mathbf{s}^T(t) \rangle - \bar{\mathbf{r}}\bar{\mathbf{r}}^T \\ &= \langle \mathbf{r}(t+\tau)(\mathbf{s}^T(t) - \bar{\mathbf{r}}^T) \rangle + \mathbf{D}_{\bar{r}} \\ &= \langle (\boldsymbol{\nu} + (h_d * \mathbf{J}\mathbf{s})(t+\tau))(\mathbf{s}^T(t) - \bar{\mathbf{r}}^T) \rangle + \mathbf{D}_{\bar{r}} \\ &= h_d * \mathbf{J} \langle \mathbf{s}(t+\tau)(\mathbf{s}^T(t) - \bar{\mathbf{r}}^T) \rangle + \mathbf{D}_{\bar{r}} \\ &= (h_d * \mathbf{J}\mathbf{c})(\tau) + \mathbf{D}_{\bar{r}}, \end{split} \tag{36}$$

with the diagonal matrix **D**<sub>*r̄*</sub> = δ(τ)diag(**r̄**), which has been derived earlier (Hawkes, 1971). If the rates of all neurons are equal, *r̄<sub>i</sub>* = *r̄*, all entries of the diagonal matrix are the same, **D**<sub>*r̄*</sub> = δ(τ)**1***r̄*. In the subsequent section we demonstrate that the same convolution Equation (36) holds for the linear rate model with output noise.
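Taking the expectation of (35) in the stationary state gives the well-known rate relation r̄ = ν/(1 − *J* ∫*h dt*), which equals ν/(1 − *J*) for a unit-normalized kernel. A discrete-time sketch of a single self-exciting Hawkes process with exponential kernel (delay *d* = 0 for brevity, all parameter values illustrative) reproduces this:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, J, tau = 0.01, 0.5, 10.0   # base rate (1/ms), weight, kernel time constant (examples)
dt, steps = 0.1, 1_000_000     # 100 s of simulated time

# In each step a spike is drawn with probability rate*dt; each spike
# increments the exponentially filtered input z by J/tau, so that
# rate(t) = nu + (h * J s)(t) with h(t) = (1/tau) exp(-t/tau), cf. (35).
u = rng.random(steps)
z, spikes = 0.0, 0
for k in range(steps):
    rate = nu + z
    s = u[k] < rate * dt
    spikes += s
    z += -z * dt / tau + (J / tau) * s

r_emp = spikes / (steps * dt)
r_theo = nu / (1 - J)          # stationary rate of the linear Hawkes process
assert abs(r_emp - r_theo) < 0.15 * r_theo
```

The tolerance accounts for the enlarged spike-count variance of the self-exciting process.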

#### **4.1. CONVOLUTION EQUATION FOR LINEAR NOISY RATE NEURONS**

For the linear rate model with output noise we use Equation (3) for time lags τ > 0 to obtain a convolution equation for the covariance matrix of the output signal vector **y** = **r** + **x** as

$$\begin{split} \mathbf{c}(\tau) &= \langle \mathbf{y}(t+\tau)(\mathbf{y}^T(t) - \bar{\mathbf{r}}^T) \rangle \\ &= \langle (h_d * \mathbf{w}\mathbf{y} + \mathbf{x})(t+\tau)(\mathbf{y}^T(t) - \bar{\mathbf{r}}^T) \rangle \\ &= (h_d * \mathbf{w}\mathbf{c})(\tau) + \langle \mathbf{x}(t+\tau)(\mathbf{r}^T(t) - \bar{\mathbf{r}}^T) \rangle + \langle \mathbf{x}(t+\tau)\,\mathbf{x}^T(t) \rangle \\ &= (h_d * \mathbf{w}\mathbf{c})(\tau) + \mathbf{D}, \end{split} \tag{37}$$

where we utilized that due to causality the random noise signal generated at *t* + τ has no influence on **r**(*t*), so the respective correlation vanishes. **D** is the covariance of the noise as in (11), *Dij*(τ) = ⟨*xi*(*t*)*xj*(*t* + τ)⟩ = δ*ij*δ(τ)ρ<sup>2</sup>. If ρ is chosen such that ρ<sup>2</sup> coincides with the averaged activity *r̄* in a network of Hawkes neurons and the connection matrix **w** is identical to **J** of the Hawkes network, Equations (36) and (37) are identical. Therefore the cross spectrum of both systems is given by (11).

#### **4.2. NON-LINEAR SELF-CONSISTENT RATE IN RECTIFYING HAWKES NETWORKS**

The convolution Equation (36) for the covariance matrix of Hawkes neurons is exact if no element of **r** is negative, which is in particular the case for a network of only excitatory neurons. Especially in networks including inhibitory couplings, however, the intensity *ri* of neuron *i* may assume negative values. A neuron with *ri* < 0 does not emit spikes, so the instantaneous rate is given by λ*<sub>i</sub>* = [*ri*(*t*)]<sub>+</sub> = θ(*ri*(*t*))*ri*(*t*), with the Heaviside function θ. We now take this effective non-linearity (the rectification of the Hawkes model neuron) into account in a manner similar to the linearization of the binary neurons above. If the network is in the regime of low spike rates, the fluctuations in the input of each neuron due to the Poissonian arrival of spikes are large compared to the fluctuations due to the time varying intensities **r**(*t*). Considering the same homogeneous network structure as described in "Population-averaged covariances," the input statistics is identical for each cell *i*, so the mean activity λ<sub>0</sub> = ⟨λ*<sub>i</sub>*⟩ is the same for all neurons *i*. The superposition of the synaptic inputs to neuron *i* causes an instantaneous intensity *ri* that approximately follows a Gaussian distribution $\mathcal{N}(\mu, \sigma, r_i)$ with mean $\mu = \langle r \rangle = \nu + \lambda_0 K J (1 - g\gamma)$ and standard deviation $\sigma = \sqrt{\langle r^2 \rangle - \langle r \rangle^2} = J \sqrt{\frac{\lambda_0}{2\tau} K (1 + g^2 \gamma)}$. These expressions hold for the exponential kernel (5) due to Campbell's theorem (Papoulis and Pillai, 2002), because of the stochastic Poisson-like arrival of the incoming spikes, for which the standard deviation of the spike count is proportional to the square root of the intensity λ<sub>0</sub>. The rate λ<sub>0</sub> is accessible by explicit integration over the Gaussian probability density as

$$\begin{split} \lambda\_{0} &= \int\_{-\infty}^{\infty} \mathcal{N}(\mu, \sigma, r) \, r \, \theta(r) \, dr \\ &= \frac{1}{\sqrt{2\pi}\sigma} \int\_{0}^{\infty} \exp\left(-\frac{(r-\mu)^{2}}{2\sigma^{2}}\right) r \, dr \\ &= \frac{-\sigma}{\sqrt{2\pi}} \int\_{0}^{\infty} \exp\left(-\frac{(r-\mu)^{2}}{2\sigma^{2}}\right) \frac{-(r-\mu)}{\sigma^{2}} \, dr \\ &\quad + \frac{\mu}{\sqrt{2\pi}\sigma} \int\_{0}^{\infty} \exp\left(-\frac{(r-\mu)^{2}}{2\sigma^{2}}\right) dr \\ &= \frac{\sigma}{\sqrt{2\pi}} \exp\left(-\frac{\mu^{2}}{2\sigma^{2}}\right) + \frac{\mu}{2} \left(1 - \text{erf}\left(-\frac{\mu}{\sqrt{2}\sigma}\right)\right). \end{split}$$

This equation needs to be solved self-consistently (numerically or graphically) to determine the rate in the network, as the right hand side depends on the rate λ<sup>0</sup> itself through μ and σ. Rewritten as

$$
\lambda\_0 = \frac{\sigma}{\sqrt{2\pi}} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) + \mu P\_{\mu,\sigma}(r > 0)
$$

$$
P\_{\mu,\sigma}(r > 0) = \frac{1}{2} - \frac{1}{2} \text{erf}\left(-\frac{\mu}{\sqrt{2}\sigma}\right),\tag{38}
$$

*P*μ,σ(*r* > 0) is the probability that the intensity of a neuron is above threshold and therefore contributes to the transmission of a small fluctuation in the input. A neuron for which *r* < 0 acts as if it were absent. Hence we can treat the network of rectifying neurons completely analogously to the case of linear Hawkes processes, but multiply the synaptic weight *J* or −*gJ* of each neuron with *P*μ,σ(*r* > 0), i.e., the linearized connectivity matrix is

$$\mathbf{w} = P\_{\mu,\sigma}(r > 0)\mathbf{J}.\tag{39}$$
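Equation (38), together with the expressions for μ and σ given above, determines λ<sub>0</sub> implicitly; a minimal fixed-point iteration, with assumed example parameter values (not those of the article's simulations):

```python
import math

# Assumed example parameters (rates in 1/s, tau in s):
nu, K, J, g, gamma, tau = 2.0, 100, 0.05, 5.0, 0.25, 0.01

def next_rate(lam):
    """One sweep of Equation (38): the rectified-Gaussian mean of the intensity."""
    mu = nu + lam * K * J * (1 - g * gamma)
    sigma = J * math.sqrt(lam / (2 * tau) * K * (1 + g * g * gamma))
    p_pos = 0.5 - 0.5 * math.erf(-mu / (math.sqrt(2) * sigma))
    return (sigma / math.sqrt(2 * math.pi) * math.exp(-mu * mu / (2 * sigma * sigma))
            + mu * p_pos)

lam = nu                       # start the fixed-point iteration at the base rate
for _ in range(200):
    lam = next_rate(lam)

assert lam > 0
assert abs(lam - next_rate(lam)) < 1e-9 * lam   # self-consistency reached
```

Since the right hand side is the mean of a rectified Gaussian, the iterates stay non-negative; a graphical intersection of λ<sub>0</sub> with the right hand side yields the same solution.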

**Figure 7** shows the agreement of the covariance functions obtained from direct simulation of the network of Hawkes processes and the analytical solution (21) with average firing rate

**FIGURE 7 | Covariance structure in spiking networks corresponds to OUP with output noise. (A)** Autocovariance obtained by direct simulation of the LIF (black), Hawkes (gray), and OUP (light gray) models for excitatory (dots) and inhibitory neurons (crosses). **(B)** Covariance *cEI* averaged over disjoint pairs of neurons for LIF (black dots), Hawkes (gray dots), and OUP with output noise (empty circles). **(C)** Covariance averaged over disjoint pairs of neurons of the same type. **(D)** Autocovariance of the population averaged activity. Averages in **(C,D)** over excitatory neurons as black dots, over inhibitory neurons as gray dots. Corresponding theoretical predictions (21) are plotted as light gray curves in all panels except **(A)**. Light gray diagonal crosses in **(A,D)** denote theoretical peak positions determined by the firing rate *r*¯ as *r*¯*t* (where *t* = 0.1 ms is the time resolution of the histogram).

λ<sub>0</sub> determined by (38), setting the effective strength of the noise ρ<sup>2</sup> = λ<sub>0</sub>, and choosing the linearized coupling as described above. The detailed procedure for choosing the parameters in the direct simulation is described together with the implementation of the Hawkes model in "Implementation of Hawkes neurons in a spiking simulator code".

#### **5. LEAKY INTEGRATE-AND-FIRE NEURONS**

In this section we consider a network of LIF model neurons with exponentially decaying postsynaptic currents and show its equivalence to the network of OUP with output noise, valid in the asynchronous irregular regime. A spike sent by neuron *j* at time *t* arrives at the target neuron *i* after the synaptic delay *d*, elicits a synaptic current *Ii* that decays with time constant τ*<sup>s</sup>* and causes a response in the membrane potential *Vi* proportional to the synaptic efficacy *Jij*. With the time constant τ*<sup>m</sup>* of the membrane potential, the coupled set of differential equations governing the subthreshold dynamics of a single neuron *i* is (Fourcaud and Brunel, 2002)

$$\begin{aligned} \tau_m \frac{dV_i}{dt} &= -V_i + I_i(t) \\ \tau_s \frac{dI_i}{dt} &= -I_i + \tau_m \sum_{j=1}^N J_{ij}\, s_j(t-d), \end{aligned} \tag{40}$$

where the membrane resistance was absorbed into the definitions of *Jij* and *Ii*. If *Vi* reaches the threshold *V*θ at a time point $t_k^i$, the neuron emits an action potential and the membrane potential is reset to *Vr*, where it is clamped for the refractory time τ*<sub>r</sub>*. The spiking activity of neuron *i* is described by this sequence of action potentials, the spike train $s_i(t) = \sum_k \delta(t - t_k^i)$. The dynamics of a single neuron is deterministic, but in network states of asynchronous, irregular activity and in the presence of external Poisson inputs to the network, the summed input to each cell can be well approximated as white noise (Brunel, 2000) with first moment $\mu_i = \tau_m \sum_j J_{ij} r_j$ and second moment $\sigma_i^2 = \tau_m \sum_j J_{ij}^2 r_j$, where *rj* is the stationary firing rate of neuron *j*. The stationary firing rate of neuron *i* is then given by Fourcaud and Brunel (2002)

$$r\_i^{-1} = \tau\_r + \tau\_m \sqrt{\pi} \left( F(y\_\theta) - F(y\_r) \right) \tag{41}$$

$$f(y) = e^{y^2} \left( 1 + \text{erf}(y) \right) \quad F(y) = \int^{y} f(y') \, dy'$$

$$\text{with } y\_{\theta, r} = \frac{V\_{\theta, r} - \mu\_i}{\sigma\_i} + \frac{\alpha}{2} \sqrt{\frac{\tau\_s}{\tau\_m}} \quad \alpha = \sqrt{2} \left| \zeta\left( \frac{1}{2} \right) \right|,$$

with Riemann's zeta function ζ. The response of the LIF neuron to the injection of an additional spike into afferent *j* determines the impulse response *w*<sub>ij</sub>*h*(*t*) of the system. The time integral *w*<sub>ij</sub> = ∫<sub>0</sub><sup>∞</sup> *w*<sub>ij</sub>*h*(*t*) *dt* is the DC-susceptibility, which can formally be written as the derivative of the stationary firing rate with respect to the rate of the afferent, *r*<sub>j</sub>, which, evaluated with the help of (41), yields (Helias et al., 2013, Results and App. A)

$$w\_{ij} = \frac{\partial r\_i}{\partial r\_j} = \alpha J\_{ij} + \beta J\_{ij}^2 \tag{42}$$

$$\begin{aligned} \text{with } \alpha &= \sqrt{\pi} \left( \tau\_m r\_i \right)^2 \frac{1}{\sigma\_i} \left( f(y\_\theta) - f(y\_r) \right) \\ \text{and } \beta &= \sqrt{\pi} \left( \tau\_m r\_i \right)^2 \frac{1}{2\sigma\_i^2} \left( f(y\_\theta) \frac{V\_\theta - \mu\_i}{\sigma\_i} - f(y\_r) \frac{V\_r - \mu\_i}{\sigma\_i} \right). \end{aligned}$$

In the strongly fluctuation-driven regime, the temporal behavior of the kernel *h* is dominated by a single exponential decay, whose time constant can be determined empirically. In a homogeneous random network the firing rates of all neurons are identical, *r*<sub>i</sub> = *r̄*, and follow from the numerical solution of the self-consistency Equation (41). Approximating the autocovariance function of a single spike train by a δ-peak scaled by the rate, *r̄*δ(*t*), one obtains for the covariance function **c** between pairs of spike trains the same convolution Equation (36) as for Hawkes neurons (Helias et al., 2013, cf. equation 5). As shown in "Convolution equation for linear noisy rate neurons" this convolution equation coincides with that of a linear rate model with output noise (37), where the diagonal elements of **D** are chosen to agree with the average spike rate, ρ<sup>2</sup> = *r̄*. The good agreement between the analytical cross covariance functions (21) for the OUP with output noise and direct simulation results for the LIF model is shown in **Figure 7**.
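The self-consistency Equation (41) is straightforward to evaluate numerically. The following sketch (function name and the numerical constant for √2|ζ(1/2)| are ours; times in seconds, voltages in mV; defaults taken from "Parameters of simulations") computes the stationary rate for given input statistics μ, σ:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

def lif_rate(mu, sigma, tau_m=0.02, tau_s=0.002, tau_r=0.002,
             V_th=15.0, V_r=0.0):
    """Stationary LIF firing rate from Eq. (41); a numerical sketch."""
    alpha = 2.06525       # sqrt(2)*|zeta(1/2)|, with zeta(1/2) ~ -1.46035
    shift = 0.5 * alpha * np.sqrt(tau_s / tau_m)
    y_th = (V_th - mu) / sigma + shift
    y_r = (V_r - mu) / sigma + shift
    f = lambda y: np.exp(y**2) * (1.0 + erf(y))
    F_diff, _ = quad(f, y_r, y_th)        # F(y_th) - F(y_r)
    return 1.0 / (tau_r + tau_m * np.sqrt(np.pi) * F_diff)
```

With μ = 15 mV and σ = 10 mV this yields roughly 24 Hz, close to the value *r* = 23.6 Hz quoted in "Parameters of simulations".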

#### **6. DISCUSSION**

In this work we describe the path to a unified theoretical view on pairwise correlations in recurrent networks. We consider binary neuron models, LIF models, and linear point process models. These models, each containing a non-linearity (a spiking threshold in spiking models, a non-linear sigmoidal gain function in binary neurons, strictly positive rates in Hawkes processes), are linearized, taking into account the distribution of the fluctuating input.

The work presents results for several neuron models: we derive analytical expressions for delay-coupled OUP with input and with output noise, extend the analytical treatment of stochastic binary neurons to the presence of synaptic delays, present a method that takes network-generated noise into account to determine the effective gain function, extend the theory of Hawkes processes to include delays and inhibition, and present in Equation (12) a condition for the onset of global oscillations caused by delayed feedback, generalized to feedback pathways through different eigenvalues of the connectivity.

Some results qualitatively extend the existing theory (delays, inhibition), others improve the accuracy of existing theories (linearization including fluctuations). More importantly, our approach enables us to demonstrate the equivalence of each of these models, after linear approximation, to a linear model with fluctuating continuous variables. The fact that linear perturbation theory leads to effective linear equations is of course not surprising, but the analytical procedure firstly enables a mapping between models that conserves quantitative results and secondly allows us to uncover common structures underlying the emergence of correlated activity in recurrent networks. For the commonly appearing exponentially decaying response kernel, these rate models coincide with the Ornstein-Uhlenbeck process (OUP; Uhlenbeck and Ornstein, 1930; Risken, 1996). We find that the considered models form two groups which, in linear approximation, merely differ by a matrix valued factor scaling the noise and in the choice of variables interpreted as neural activity. The difference between these two groups corresponds to the location of the noise: spiking models (LIF and Hawkes models) belong to the class with noise on the output side, added to the activity of each neuron. The non-spiking binary neuron model corresponds to an OUP where the noise is added on the input side of each neuron. The closed solution for the correlation structure of the OUP holds for both classes.

We identify different contributions to correlations in recurrent networks: the solution for output noise is split into three terms corresponding to the δ-peak in the autocovariance, the covariance caused by shared input, and the direct synaptic influence of stochastic fluctuations of one neuron on another; the latter echo terms are equal to propagators acting with delays (Helias et al., 2013). A similar splitting into echo and correlated-input terms for the case of input noise is shown in **Figure 8**. For increasing network size *N* → ∞, keeping the connection probability *p* fixed so that *K* = *pN*, and with rescaled synaptic amplitudes *J* ∼ 1/√*N* (van Vreeswijk and Sompolinsky, 1996; Renart et al., 2010), the echo terms vanish fastest. Formally this can be seen from (18): the multiplicative factor of the common covariance term Φ<sub>4</sub> does not change with *N*, while the other coefficients decrease. So ultimately all four entries of the matrix **c** have the same time dependence, determined by the common covariance term Φ<sub>4</sub>. In particular the covariance between excitation and inhibition *c*<sub>EI</sub> becomes symmetric in this limit. This finally provides a quantitative explanation of the observation made in Renart et al. (2010) that the time lag between excitation and inhibition vanishes in the limit of infinitely large networks. For a different synaptic rescaling, *J* ∼ *N*<sup>−1</sup>, while keeping ρ<sup>2</sup> constant by appropriate additional input to each neuron (see Helias et al., 2013, applied to the LIF model), all multiplicative factors decrease ∼ *N*<sup>−1</sup> and so does the amplitude of all covariances. Hence the asymmetry of *c*<sub>EI</sub> does not vanish in this limit. The same results hold for the case of output noise, where the term with Φ<sub>1</sub> describes the common input part of the covariance. In this case and for finite network size, *c*<sub>IE</sub> coincides with *c*<sub>EE</sub> and *c*<sub>EI</sub> with *c*<sub>II</sub> for *t* > 0, having a discontinuous jump at the time of the synaptic delay *t* = *d*. For time lags smaller than the delay all four covariances coincide. This is due to causality, as the second neuron cannot feel the influence of a fluctuation that happened in the first neuron less than one synaptic delay before. The covariance functions for systems corresponding to an OUP with input noise contain neither discontinuities nor sharp peaks at *t* = *d*, but *c*<sub>EI</sub> and *c*<sub>IE</sub> have maxima and minima near this location. This observation can be interpreted as a result of the stochastic nature of the binary model, where changes in the input influence the state of the neuron only with a certain probability. So the entries of **c** in this case take different values for |*t*| < *d* but show the tendency to approach each other as |*t*| grows beyond *d*. This tendency increases with network size. Our analytical solutions, (18) for input noise and (21) for output noise, hence explain the model-class dependent differences in the shape of covariance functions.

**FIGURE 8 | Different echo terms for spiking and non-spiking neurons.** Binary non-spiking neurons are shown in **(A,C)** and LIF neurons in **(B,D)**. **(A,B)** Echo terms arising from the direct influence of a neuron's output on the network, for the different neuron types (*c*<sub>EE</sub>, *c*<sub>EI</sub>, and *c*<sub>II</sub> are plotted as black dots, gray dots, and circles, respectively). **(C,D)** Contributions to the covariance evoked by correlated and common input (black dots), measured with the help of auxiliary model neurons which do not provide feedback to the network. Corresponding theoretical predictions (16) are plotted as light gray curves throughout.

The two above mentioned synaptic scaling procedures are commonly termed "strong coupling" (*J* ∼ 1/√*N*) and "weak coupling" (*J* ∼ 1/*N*), respectively. The results shown in **Figure 6** were obtained for *J* = 2/√*N* and β = 0.5, so the number of synapses required to cause a notable effect on the gain function is 1/(β*J*) = √*N*, which is small compared to the number of incoming synapses *pN*. Hence the network is in the strong coupling regime. Also note that for infinite slope of the gain function, β → ∞, the magnitude of the covariance becomes independent of the synaptic amplitude *J*, in agreement with the linear theory presented here. This finding can readily be understood by the linearization procedure, presented in the current work, that takes into account the network-generated fluctuations of the total input. The amplitude σ of these fluctuations scales linearly in *J* and the effective susceptibility depends on *J*/σ in the case β → ∞, explaining the invariance (Grytskyy et al., 2013). In the current manuscript we generalized this procedure to finite slopes β and to other models than the binary neuron model.
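This arithmetic can be checked directly with the **Figure 6** values (from "Parameters of simulations": *N* = 2000, *p* = 0.1, β = 0.5 mV<sup>−1</sup>, so *J* = 2/√*N* ≈ 0.0447 mV):

```python
import math

# Figure 6 regime: N = 2000, p = 0.1, beta = 0.5 mV^-1, J = 2/sqrt(N) mV
N, p, beta = 2000, 0.1, 0.5
J = 2.0 / math.sqrt(N)
n_eff = 1.0 / (beta * J)      # synapses needed to notably affect the gain
assert abs(n_eff - math.sqrt(N)) < 1e-9   # 1/(beta*J) = sqrt(N)
assert n_eff < p * N                      # sqrt(N) << pN: strong coupling
```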

Our approach enables us to map results obtained for one neuron model to another; in particular, we extend the theory of all considered models to capture synaptic conduction delays, and devise a simpler way to obtain solutions for systems considered earlier (Ginzburg and Sompolinsky, 1994). Our derivation of covariances in spiking networks does not rely on the advanced Wiener-Hopf method (Hazewinkel, 2002), as earlier derivations (Hawkes, 1971; Helias et al., 2013) do, but only employs elementary methods. Our results are applicable for general connectivity matrices, and for the purpose of comparison with simulations we explicitly derive population averaged results. The averages of the dynamics of the linear rate model equations are exact for random network architectures with fixed out-degree, and approximate for fixed in-degree. Still, for non-linear models the linearization for fixed in-degree networks is simpler, because the homogeneous input statistics results in an identical linear response kernel for all cells. Finally we show that the oscillatory properties of networks of integrate-and-fire models (Brunel, 2000; Helias et al., 2013) are model-invariant features of all of the studied dynamics, given that inhibition acts with a synaptic delay. We relate the collective oscillations to the pole structure of the cross spectrum, which also determines the power spectra of population signals such as the EEG, ECoG, and the LFP.

The presented results provide a further step to understand the shape and to unify the description of correlations in recurrent networks. We hope that our analytical results will be useful to constrain the inverse problem of determining the synaptic connectivity given the correlation structure of neurophysiological activity measurements. Moreover the explicit expressions for covariance functions in the time domain are a necessary prerequisite to understand the evolution of synaptic amplitudes in systems with spike-timing dependent plasticity and extend the existing methods (Burkitt et al., 2007; Gilson et al., 2009, 2010) to networks including inhibitory neurons and synaptic conduction delays.

### **ACKNOWLEDGMENTS**

We gratefully appreciate ongoing technical support by our colleagues in the NEST Initiative, especially Moritz Deger for the implementation of the Hawkes model. Binary and spiking network simulations performed with NEST (www.nestinitiative.org). Partially supported by the Helmholtz Association: HASB and portfolio theme SMHB, the Jülich Aachen Research Alliance (JARA), the Next-Generation Supercomputer Project of MEXT, and EU Grant 269921 (BrainScaleS).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 April 2013; accepted: 10 September 2013; published online: 18 October 2013.*

*Citation: Grytskyy D, Tetzlaff T, Diesmann M and Helias M (2013) A unified view on weakly correlated recurrent networks. Front. Comput. Neurosci. 7:131. doi: 10.3389/fncom.2013.00131*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Grytskyy, Tetzlaff, Diesmann and Helias. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX**

## **CALCULATION OF THE POPULATION AVERAGED CROSS COVARIANCE IN TIME DOMAIN**

We obtain the population averaged cross spectrum for the Ornstein-Uhlenbeck process with input noise by inserting the averaged connectivity matrix **w** = **M** (14) into (8). The two eigenvalues of **M** are 0 and *L* = *Kw*(1 − γ*g*). Taking these into account, we first rewrite the term

$$\begin{aligned} &\left(H\_d(\omega)^{-1}\mathbf{1} - \mathbf{M}\right)^{-1} \\ &= \det\left(H\_d(\omega)^{-1}\mathbf{1} - \mathbf{M}\right)^{-1} \begin{pmatrix} H\_d(\omega)^{-1} + Kw\gamma g & -Kw\gamma g \\ Kw & H\_d(\omega)^{-1} - Kw \end{pmatrix} \\ &= \left( (H\_d(\omega)^{-1} - 0)(H\_d(\omega)^{-1} - L) \right)^{-1} \left( H\_d(\omega)^{-1}\mathbf{1} + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} \right) \\ &= f(\omega) \left( \mathbf{1} + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} H\_d(\omega) \right), \end{aligned}$$

where we introduced *f*(ω) = (*H*<sub>d</sub>(ω)<sup>−1</sup> − *L*)<sup>−1</sup>. The corresponding transposed and complex conjugate term follows analogously. Hence we obtain the expression for the cross spectrum (17). The residue of *f*(ω) at ω = *z*<sub>k</sub>(*L*) is

$$\begin{aligned} \text{Res}(f, \omega = z\_k(L)) &= \lim\_{\omega\_1 \to \omega} \frac{\omega\_1 - \omega}{f^{-1}(\omega\_1)} \\ &\overset{\text{l'Hôpital}}{=} \lim\_{\omega\_1 \to \omega} \frac{1}{(f^{-1})'(\omega\_1)} = \left( \frac{d\left( e^{i\omega d}(1 + i\omega\tau) \right)}{d\omega} \right)^{-1} \\ &= \left( i d \, e^{i\omega d} (1 + i\omega\tau) + i\tau \, e^{i\omega d} \right)^{-1} \\ &= \left( i d L + i\tau \, e^{i\omega d} \right)^{-1}, \end{aligned}$$

where in the last step we used the condition for a pole, *H*<sub>d</sub>(*z*<sub>k</sub>)<sup>−1</sup> = *e*<sup>iz<sub>k</sub>d</sup>(1 + *iz*<sub>k</sub>τ) = *L* (see "Spectrum of the dynamics"). The residue of *H*<sub>d</sub>(ω) at *z*(0) = *i*/τ is −(*i*/τ) *e*<sup>d/τ</sup>. Using the residue theorem, we sum over all poles within the integration contour, {*z*<sub>k</sub>(*L*) | *k* ∈ ℕ} ∪ {*i*/τ}, to obtain

$$\mathbf{c}(t) = \frac{1}{2\pi} \int\_{-\infty}^{+\infty} \mathbf{C}(\omega) \, e^{i\omega t} \, d\omega = i \sum\_{z \in \{z\_k(L) \mid k \in \mathbb{N}\} \cup \{i/\tau\}} \text{Res}(\mathbf{C}(z), z) \, e^{izt} \quad \text{for } t > 0.$$

Sorting (17) to obtain four matrix prefactors and remainders with different frequency dependence, Φ<sub>1</sub>(ω) = *f*(ω)*f*(−ω), Φ<sub>2</sub>(ω) = *f*(ω)*f*(−ω)*H*<sub>d</sub>(ω), Φ<sub>3</sub>(ω) = Φ<sub>2</sub>(−ω), and Φ<sub>4</sub>(ω) = *f*(ω)*f*(−ω)*H*<sub>d</sub>(ω)*H*<sub>d</sub>(−ω), we get (18). **C**(ω) for output noise (20) is obtained by multiplying the expression for **C**(ω) for input noise with *H*<sub>d</sub>(ω)<sup>−1</sup>*H*<sub>d</sub>(−ω)<sup>−1</sup> = 1 + ω<sup>2</sup>τ<sup>2</sup>. In order to perform the back Fourier transformation one first needs to rewrite the cross spectrum so as to isolate the frequency independent term and the two terms that vanish for either *t* < *d* or *t* > *d*, as described in "Fourier back transformation,"

$$\begin{aligned} \mathbf{C}(\omega) &= f(\omega)\left(\mathbf{1} + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} H\_d(\omega)\right) \mathbf{M} \mathbf{D} \mathbf{M}^T f(-\omega) \left(\mathbf{1} + Kw \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} H\_d(-\omega)\right) \\ &\quad + f(\omega)\left(\mathbf{1} + Kw \begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} H\_d(\omega)\right) \mathbf{M} \mathbf{D} + \mathbf{D} \mathbf{M}^T f(-\omega)\left(\mathbf{1} + Kw \begin{pmatrix} \gamma g & 1 \\ -\gamma g & -1 \end{pmatrix} H\_d(-\omega)\right) + \mathbf{D} \\ &= f(\omega)\mathbf{M}\mathbf{D}\mathbf{M}^T f(-\omega) + f(\omega)\mathbf{M}\mathbf{D} + \mathbf{D}\mathbf{M}^T f(-\omega) + \mathbf{D}, \end{aligned}$$

where in the last step we used

$$\begin{pmatrix} \gamma g & -\gamma g \\ 1 & -1 \end{pmatrix} \mathbf{M} = 0,$$

which holds because the two rows of **M** are identical, obtaining (21). For each of the first three terms in the last expression the appropriate integration contour needs to be chosen, as described in "Fourier back transformation" on the example of the general expression (16).
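The algebra of this appendix lends itself to a mechanical check. The sketch below is our own verification, not part of the original derivation: it uses sympy for the matrix identity and the annihilation property above, abbreviating *H*<sub>d</sub>(ω)<sup>−1</sup> by a free symbol, and checks the residue formula numerically at a pole *z*<sub>0</sub> obtained from the principal branch of the Lambert W function; the parameter values are illustrative.

```python
import numpy as np
import sympy as sp
from scipy.special import lambertw

# symbolic check: Hinv stands for H_d(omega)^{-1}, frequency dependence dropped
Hinv, K, w, g, gam = sp.symbols('Hinv K w g gamma')
M = K * w * sp.Matrix([[1, -gam * g], [1, -gam * g]])  # averaged connectivity
A = sp.Matrix([[gam * g, -gam * g], [1, -1]])
L = K * w * (1 - gam * g)                   # non-zero eigenvalue of M
lhs = (Hinv * sp.eye(2) - M).inv()
rhs = (sp.eye(2) + K * w * A / Hinv) / (Hinv - L)   # f(w) * (1 + Kw A H_d)
assert (lhs - rhs).applyfunc(sp.simplify) == sp.zeros(2, 2)
assert A * M == sp.zeros(2, 2)              # the two rows of M are identical

# numeric check of the residue formula at a pole z0 of f (illustrative values)
tau, d, Lnum = 1.0, 0.5, 0.5
# pole condition e^{i z d}(1 + i z tau) = L, solved via the Lambert W function
z0 = -1j * (lambertw(Lnum * d / tau * np.exp(d / tau)).real / d - 1.0 / tau)
finv = lambda om: np.exp(1j * om * d) * (1 + 1j * om * tau) - Lnum
res_formula = 1.0 / (1j * d * Lnum + 1j * tau * np.exp(1j * z0 * d))
res_numeric = 1e-7 / finv(z0 + 1e-7)        # Res(f, z0) = lim (w - z0) f(w)
assert abs(finv(z0)) < 1e-9                 # z0 is indeed a pole of f
assert abs(res_numeric - res_formula) < 1e-4 * abs(res_formula)
```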

## **IMPLEMENTATION OF NOISY RATE MODELS**

The dynamics is propagated in time steps of duration Δ*t* (note that in other works we use *h* as the symbol for the computation step size, which here denotes the kernel). The product of the connectivity matrix with the vector of output variables at the end of the previous step *i* − 1 is the vector **I**(*t*<sub>i</sub>) of inputs at the current step *i*. The intrinsic time scale of the system is determined by the time constant τ. For sufficiently small time steps Δ*t* ≪ τ these inputs can be assumed to be time independent within one step. So we can use (3) or (4) and analytically convolve the kernel function *h*, assuming the input to be constant over the time interval Δ*t*. This corresponds to the method of exponential integration (Rotter and Diesmann, 1999, see App. C.6), requiring only local knowledge of the connectivity matrix **w**. Note that this procedure becomes exact for Δ*t* → 0 and for finite Δ*t* is an approximation. The propagation of the initial value *r*<sub>j</sub>(*t*<sub>i−1</sub>) until the end of the time interval takes the form *r*<sub>j</sub>(*t*<sub>i−1</sub>)*e*<sup>−Δt/τ</sup> because *h*(*t*<sub>i</sub>) = *h*(*t*<sub>i−1</sub>)*e*<sup>−Δt/τ</sup>, so we obtain the expression *r*<sub>j</sub>(*t*<sub>i</sub>) at the end of the step as

$$r\_{\dot{\jmath}}(t\_i) = e^{-\Delta t/\tau} r\_{\dot{\jmath}}(t\_{i-1}) + (1 - e^{-\Delta t/\tau}) \, I\_{\dot{\jmath}}(t\_i),\tag{43}$$

where *I*<sub>j</sub> denotes the input to the neuron *j*. For output noise the output variable of neuron *j* is *y*<sub>j</sub> = *r*<sub>j</sub> + *x*<sub>j</sub>, with the locally generated additive noise *x*<sub>j</sub>, and hence the input is *I*<sub>j</sub>(*t*<sub>i</sub>) = (**w y**(*t*<sub>i</sub>))<sub>j</sub>. In the case of input noise the output variable is *r*<sub>j</sub> and the noise is added to the input variable, *I*<sub>j</sub>(*t*<sub>i</sub>) = (**w r**(*t*<sub>i</sub>))<sub>j</sub> + *x*<sub>j</sub>(*t*<sub>i</sub>). In both cases *x*<sub>j</sub> is implemented as binary noise: in each time step, *x*<sub>j</sub> is independently and randomly chosen to be 1 or −1 with probability 0.5, multiplied with ρ/√Δ*t* to satisfy (2) for discretized time. Here the δ-function is replaced by a "rectangle" function that is constant on an interval of length Δ*t*, vanishes elsewhere, and has unit integral; the factor Δ*t*<sup>−1</sup> in the expression for ⟨*x*<sup>2</sup>⟩ ensures the integral to be unity. So far, the implementation assumes the synaptic delay to be zero. To implement a non-zero synaptic delay *d*, each object representing a neuron contains an array *b* of length *l*<sub>d</sub> = *d*/Δ*t* acting as a ring buffer. The input *I*<sub>j</sub>(*t*<sub>i</sub>) used to calculate the output rate at step *i* according to (43) is then taken from position *i* mod *l*<sub>d</sub> of this array and afterwards replaced by the input presently received from the network, so that the new input will be used only after one delay has passed. This sequence of buffer handling can be represented as

$$I\_j(t\_i) \leftarrow b[i \bmod l\_d]$$

$$b[i \bmod l\_d] \leftarrow \begin{cases} (\mathbf{w} \,\mathbf{r})\_j + x\_j & \text{for input noise} \\ (\mathbf{w} \,\mathbf{y})\_j & \text{for output noise.} \end{cases}$$

The model is implemented in Python version 2.7 (Python Software Foundation, 2008) using numpy 1.6.1 (Ascher et al., 2001) and scipy 0.9.0 (Jones et al., 2001).
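A compact sketch of this update scheme, combining exponential integration (43), the binary noise, and the delay ring buffer (function and variable names are ours, not the authors' code):

```python
import numpy as np

def simulate_noisy_rate(W, tau, rho, d, dt, steps, noise='input', seed=0):
    """Propagate the noisy linear rate model by exponential integration (43),
    with binary noise of strength rho and a ring buffer of length l_d = d/dt
    implementing the synaptic delay. A sketch; names are illustrative."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    P = np.exp(-dt / tau)                     # per-step propagator e^{-dt/tau}
    l_d = max(1, int(round(d / dt)))
    buf = np.zeros((l_d, N))                  # ring buffer b of delayed inputs
    r = np.zeros(N)
    trace = np.empty((steps, N))
    for i in range(steps):
        x = rng.choice([-1.0, 1.0], size=N) * rho / np.sqrt(dt)  # binary noise
        I = buf[i % l_d].copy()               # input that entered one delay ago
        r = P * r + (1.0 - P) * I             # exponential integration step (43)
        y = r + x if noise == 'output' else r # output variable of the neuron
        # refill the slot; it is read again only after one full delay
        buf[i % l_d] = W @ y if noise == 'output' else W @ r + x
        trace[i] = y
    return trace
```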

#### **IMPLEMENTATION OF BINARY NEURONS IN A SPIKING SIMULATOR CODE**

The binary neuron model is implemented in the NEST simulator, version 2.2.1 (Gewaltig and Diesmann, 2007), which allows distributed simulation on parallel machines and handles synaptic delays in the established framework for spiking neurons (Morrison et al., 2005). The name of the model is "ginzburg\_neuron". In NEST information is transmitted in the form of point events, which in the case of binary neurons are sent if the state of the neuron changes: one spike is sent for a down-transition and two spikes at the same time for an up-transition, so the multiplicity reflects the type of event. The logic to decode the original transitions is implemented in the function handle shown in Alg. 2. If a single spike is received, the synaptic weight *w* is subtracted from the input buffer at the position determined by the time point of the transition and the synaptic delay. In distributed simulations a single spike with multiplicity 2 sent to another machine is handled on the receiving side as two separate events with multiplicity 1 each. In order to decode this case on the receiving machine we memorize the time (*t*<sub>last</sub>) and origin (global id gid<sub>last</sub> of the sending neuron) of the last arrived spike. If both coincide with the spike under consideration, the sending neuron has performed an up-transition 0 → 1. We hence add twice the synaptic weight, 2*w*, to the input buffer of the target neuron: one contribution reflects the real change of the system state and the other compensates the subtraction of *w* after reception of the first spike of the pair. The algorithm relies on the fact that within NEST two spikes that are generated by one neuron at the same time point are delivered sequentially to the target neurons. This is assured because neurons are updated one by one: the update propagates each neuron by a time step equal to the minimal delay *d*<sub>min</sub> in the network.
All spikes generated within one update step are written sequentially into the communication buffers, and finally the buffers are shipped to the other processors (Morrison et al., 2005). Hence a pair of spikes generated by one neuron within a single update step will be delivered consecutively and will not be interspersed by spikes from other neurons with the same time stamp.
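The decoding logic can be sketched as follows (illustrative data structures, not NEST's internals; the delay lookup via pos(*t*<sub>spike</sub>, *d*, Δ*t*) is omitted for brevity):

```python
def handle(ev, st, buf):
    """Decode binary-neuron transitions from spike events (a sketch of the
    logic of Alg. 2). ev = (t_spike, gid, m, w): arrival time, sender id,
    multiplicity, synaptic weight. st['last'] memorizes (t, gid) of the
    previously handled spike; buf maps time slots to the summed input h."""
    t, gid, m, w = ev
    if m == 2:                          # up-transition delivered as one event
        buf[t] = buf.get(t, 0.0) + w
    elif st['last'] == (t, gid):        # second spike of a split pair: 0 -> 1
        buf[t] = buf.get(t, 0.0) + 2.0 * w  # +w real change, +w compensation
    else:                               # single spike: down-transition 1 -> 0
        buf[t] = buf.get(t, 0.0) - w
        st['last'] = (t, gid)
```

An up-transition thus contributes +*w* to the input buffer whether it arrives as one event of multiplicity 2 or as two split events of multiplicity 1.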

The model exhibits stochastic transitions (at random points in time) between two states. The transitions are governed by probabilities φ(*h*). Using asynchronous update (Rumelhart et al., 1986), in each infinitesimal interval [*t*, *t* + δ*t*) each neuron in the network has the probability (1/τ)δ*t* to be chosen for update (Hopfield, 1982). A mathematically equivalent formulation draws the time points of update independently for all neurons. For a particular neuron, the sequence of update points has exponentially distributed intervals with mean duration τ, i.e., it forms a Poisson process with rate τ<sup>−1</sup>. We employ the latter formulation to incorporate binary neuron models in the globally time-driven spiking simulator NEST (Gewaltig and Diesmann, 2007) and constrain the points of transition to a discrete time grid of resolution Δ*t* = 0.1 ms, with *d*<sub>min</sub> ≥ Δ*t*. This neuron state update is implemented by the algorithm shown in Alg. 1. Note that the field *h* is updated in steps of Δ*t*, while the activity state is updated only when the current time exceeds the next potential transition point. As the last step of the activity update we draw an exponentially distributed time interval to determine the new potential transition time. The potential transition time is represented with a higher resolution (on the order of microseconds) than Δ*t* to avoid a systematic bias of the mean inter-update interval. This update scheme is identical to the one used in Hopfield (1982). Note that the implementation is different from the classical asynchronous update scheme (van Vreeswijk and Sompolinsky, 1998), where in each discrete time step Δ*t* exactly one neuron is picked at random. The mean inter-update interval (time constant τ in Alg. 1) in the latter scheme is determined by τ = Δ*t* · *N*, with *N* the number of neurons in the network. For small time steps both schemes converge, so that update times follow a Poisson process.
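The second formulation can be sketched in a few lines (names are ours). The drawn points are kept at full floating-point resolution; the grid-based simulator then applies each update at the first time step whose time exceeds the drawn point:

```python
import numpy as np

def update_times(tau, t_end, rng):
    """Potential update points of one binary neuron: a Poisson process of
    rate 1/tau, i.e., exponentially distributed inter-update intervals.
    A sketch of the scheme described above."""
    t, out = rng.exponential(tau), []
    while t < t_end:
        out.append(t)
        t += rng.exponential(tau)   # draw the next potential transition time
    return np.asarray(out)
```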

At each update time point the neuron state becomes 1 with the probability given by the function φ applied to the input at that time according to (25), and 0 with probability 1 − φ. The input is a function of the whole system state and is constant between spikes, which indicate state changes. Each neuron therefore maintains a state variable *h* at each point in time, holding the summed input and being updated by adding and subtracting the input read from the ring buffer *b* at the point readpos(t) corresponding to the current time (see Morrison et al., 2005, for the implementation of the ring buffer, in particular Fig. 6). The ring buffer enables us to implement synaptic delays. For technical reasons this implementation requires a minimal delay of a single simulation time step (Morrison and Diesmann, 2008).

#### **Algorithm 1 | Update function of a binary neuron embedded in the spiking network simulator NEST.**

*The function* readpos(t) *returns a position in the ring buffer b corresponding to the current time point.*

#### **Algorithm 2 | Input spike handler of a binary neuron embedded in the spiking network simulator NEST.**

*The simulation kernel calls the handle function for each spike event to be delivered to the neuron. A spike event is characterized by the time point of occurrence t*<sub>spike</sub>*, the synaptic delay d after which the event should reach the target, the global id* gid *identifying the sending neuron, and the multiplicity m* ≥ 1*, indicating the reception of multiple spike events. The function* pos(*t*<sub>spike</sub>, *d*, Δ*t*) *returns the position in the ring buffer b to which the spike is added so that it will be read at time t* + *d by the update function of the neuron, see Alg. 1.*

The gain function φ applied to the input *h* has the form

$$\phi(h) = c\_1 h + c\_2 \frac{1}{2} \left( 1 + \tanh(c\_3(h - \theta)) \right),\tag{44}$$

where throughout this manuscript we used *c*<sub>1</sub> = 0, *c*<sub>2</sub> = 1, and *c*<sub>3</sub> = β, as defined in "Parameters of simulations".
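As a sketch, the gain function (44) with the defaults used in this manuscript (*c*<sub>1</sub> = 0, *c*<sub>2</sub> = 1, *c*<sub>3</sub> = β = 0.5; function name is ours):

```python
import numpy as np

def phi(h, c1=0.0, c2=1.0, c3=0.5, theta=0.0):
    """Gain function (44): probability that the neuron's next state is 1."""
    return c1 * h + c2 * 0.5 * (1.0 + np.tanh(c3 * (h - theta)))
```

At *h* = θ the update is unbiased, φ = 1/2; far above (below) threshold the neuron is set to 1 (0) almost surely.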

## **IMPLEMENTATION OF HAWKES NEURONS IN A SPIKING SIMULATOR CODE**

Hawkes neurons (Hawkes, 1971) were introduced in the NEST simulator in version 2.2.0 (Gewaltig and Diesmann, 2007). The name of the model is "pp\_psc\_delta". In the following we describe the implemented neuron model in general and mention the particular choices of parameters and the correspondences to the theory presented in "Hawkes processes". The dynamics of the quasi-membrane potential *u* is integrated exactly within a time step Δ*t* of the simulation (Rotter and Diesmann, 1999), expressing the voltage *u*(*t*<sub>i</sub>) at the end of time step *i* in terms of the membrane potential *u*(*t*<sub>i−1</sub>) at the end of the previous time step as

$$u(t\_i) = e^{-\Delta t/\tau} u(t\_{i-1}) + (1 - e^{-\Delta t/\tau}) \, R\_m I\_e + b(t\_i), \tag{45}$$

where *I*<sub>e</sub> is a time-step wise constant input current (equal to 0 in all simulations presented in this article) and *R*<sub>m</sub> = τ<sub>m</sub>/*C*<sub>m</sub> is the membrane resistance. The buffer *b*(*t*<sub>i</sub>) contains the summed contributions of incoming spikes, multiplied by their respective synaptic weights, which have arrived at the neuron within the interval (*t*<sub>i−1</sub>, *t*<sub>i</sub>]. *b* is implemented as a ring buffer in order to handle the synaptic delay, logically similar to "Implementation of noisy rate models" and described in detail in Morrison et al. (2005). The instantaneous spike emission rate is λ = [*c*<sub>1</sub>*u* + *c*<sub>2</sub>*e*<sup>c<sub>3</sub>u</sup>]<sub>+</sub>, where we use *c*<sub>3</sub> = 0 in all simulations presented here. The quantities in the theory "Hawkes processes," in particular in (35), are related to the parameters of the simulated model in the following way. The quantity *r* relates to the membrane potential *u* as *r* = *c*<sub>1</sub>*u* + *c*<sub>2</sub>, and the background rate ν agrees with *c*<sub>2</sub> = ν. Hence the synaptic weight *J*<sub>ij</sub> corresponds to the synaptic weight in the simulation multiplied by *c*<sub>1</sub>. For the correspondence of the Hawkes model to the OUP with output noise of variance ρ<sup>2</sup> we use (38) to adjust the background rate ν to obtain the desired rate λ<sub>0</sub> = ρ<sup>2</sup>, and we choose the synaptic weight *J* of the Hawkes model so that the linear coupling strength *w* of the OUP agrees with the effective linear weight given by (39). These two constraints can be fulfilled simultaneously by solving (38) and (39) by numerical iteration. The spike emission of the model is realized either with or without dead time; in this article we only used the latter. In the presence of a dead time, which is constrained to be larger than the simulation time step, at most one spike can be generated within a time step.
A spike is hence emitted with the probability *p*<sub>≥1</sub> = 1 − *e*<sup>−λΔt</sup>, where *e*<sup>−λΔt</sup> is the probability of the complementary event (emitting 0 spikes), implemented by comparing a uniformly distributed random number to *p*<sub>≥1</sub>. The refractory period is handled as described in Morrison et al. (2005). Without refractoriness, the number of emitted spikes is drawn from a Poisson distribution with parameter λΔ*t*, implemented in the GNU Scientific Library (Galassi et al., 2006). Reproducibility of the random sequences for different numbers of processes and threads is ensured by the concept of random number generators assigned to virtual processes, as described in Plesser et al. (2007).
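The per-step emission logic can be sketched as follows (our own function; parameter names follow the text):

```python
import numpy as np

def emit_spikes(u, dt, c1, c2, c3=0.0, dead_time=True, rng=None):
    """Spike emission of the 'pp_psc_delta'-style model for one time step:
    rate lambda = [c1*u + c2*exp(c3*u)]_+ . With a dead time, at most one
    spike is emitted, with probability p_{>=1} = 1 - exp(-lambda*dt);
    without it, the spike count is Poisson with parameter lambda*dt.
    A sketch; not the NEST source."""
    rng = rng or np.random.default_rng()
    lam = max(c1 * u + c2 * np.exp(c3 * u), 0.0)   # rectified emission rate
    if dead_time:
        return int(rng.random() < 1.0 - np.exp(-lam * dt))
    return int(rng.poisson(lam * dt))
```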

## **PARAMETERS OF SIMULATIONS**

For all simulations we used γ = 0.25, corresponding to the biologically realistic fraction of inhibitory neurons, a connection probability *p* = 0.1, and a simulation time step of Δ*t* = 0.1 ms. For binary neurons we measured the covariance functions with a resolution of 1 ms; for all other models the resolution is 0.1 ms. Simulation time is 10,000 ms for linear rate and for LIF neurons, 50,000 ms for Hawkes, and 100,000 ms for binary neurons. The covariance is obtained for a time window of ±100 ms.

The parameters for simulations of the LIF model presented in **Figure 7** and **Figure 8** are *J* = 0.1 mV, τ = 20 ms, τ<sub>s</sub> = 2 ms, τ<sub>r</sub> = 2 ms, *V*<sub>θ</sub> = 15 mV, *V*<sub>r</sub> = 0, *g* = 6, *d* = 3 ms, *N* = 8000. The number of neurons in the corresponding networks of other models is the same. Cross covariances are measured between the summed spike trains of two disjoint populations of *N*<sub>rec</sub> = 1000 neurons each. The single neuron autocovariances *a*<sub>α</sub> are averaged over a subpopulation of 100 neurons. The autocovariances of the population averaged activity, (1/*N*<sub>α</sub>)*a*<sub>α</sub> + *C*<sub>αα</sub> for population α ∈ {*E*, *I*} (shown in **Figure 7**), are constructed from the estimated single neuron population averaged autocovariances *a*<sub>α</sub> and cross covariances *C*<sub>αα</sub>. This enables us to estimate *a*<sub>α</sub> and *C*<sub>αα</sub> from the activity of a small subpopulation and still assigns the correct relative weights to both contributions. The corresponding effective parameters describing the system dynamics are μ = 15 mV, σ = 10 mV, *r* = 23.6 Hz (see (40) and the following text for details).

The parameters of the Hawkes model and of the noisy rate model with output noise yielding quantitatively agreeing covariance functions are:


The network of binary neurons shown in **Figure 8** uses θ = −3.89 mV, β = 0.5 mV<sup>−1</sup>, *J* = 0.02 mV, *d* = 3 ms (see (25), (44)), and the same *g* and τ as the noisy rate model. Covariances are measured using the signals from all neurons.

The simulation results for the network of binary neurons presented in **Figure 6** use θ = −2.5 mV, τ = 10 ms, β = 0.5 mV<sup>−1</sup>, *g* = 6, *J* ≈ 0.0447 mV, *N* = 2000, and the smallest possible synaptic delay *d* = 0.1 ms, equal to the time resolution (the same set of parameters, only with β = 1 mV<sup>−1</sup>, was used to create **Figure 5**). The cross covariances *CEE* and *CII* are estimated from two disjoint subpopulations, each comprising half of the neurons of the respective population; *cEI* is measured between two such subpopulations. For *cEE* and *cII* we used the full populations.

The parameters required for a quantitative agreement with the rate model with input noise are *w* ≈ 0.011 and ρ ≈ 2.23 √ms. We used the same parameters in **Figure 3**, where additionally results for *w* = 0.018 are shown. The population sizes are the same as for the binary network. The covariances are estimated in the same way as for the rate model with output noise. Note that the definition of noisy rate models places no restriction on the units of ρ<sup>2</sup>; these can be arbitrary and are chosen differently as required by the correspondence with either spiking or binary neurons.

## Efficient neural codes can lead to spurious synchronization

## *Massimiliano Zanin1,2\* and David Papo3*

*<sup>1</sup> Departamento de Engenharia Electrotécnica, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Lisboa, Portugal*

*<sup>2</sup> Innaxis Foundation and Research Institute, Madrid, Spain*

*<sup>3</sup> Center for Biomedical Technology, Technical University of Madrid, Madrid, Spain*

*\*Correspondence: mzanin@mzanin.com*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Klaus Wimmer, Universitat de Barcelona, Spain*

**Keywords: synchronization, neural models, boolean code, EEG/MEG, stimuli**

Experimental and computational evidence shows that cognitive function requires an optimal balance between global integrative and local functionally specialized processes (Tononi et al., 1998). This balance can be described in terms of transient short-lived episodes of synchronized activity between different parts of the brain (Friston, 2000; Breakspear, 2002). Synchronization over multiple frequency bands is thought to subserve fundamental operations of cortical computation (Varela et al., 2001; Fries, 2009), and to be one of the mechanisms mediating the large-scale coordination of scattered functionally specialized brain regions. For instance, transient synchronization of neuronal oscillatory activity in the 30–80 Hz range has been proposed to act as an integrative mechanism, binding together spatially distributed neural populations in parallel networks during sensory perception and information processing (Singer, 1995; Miltner et al., 1999; Rodriguez et al., 1999). More generally, synchrony may subserve an integrative function in cognitive functions as diverse as motor planning, working or associative memory, or emotional regulation (Varela, 1995).

Over the past 15 years, cognitive neuroscientists have tried to capture and quantify neural synchronies across distant brain regions, both during spontaneous brain activity and in association with the execution of a wide range of cognitive tasks, using neuroimaging techniques such as functional magnetic resonance imaging and electro- or magneto-encephalography. Theoretical advances in various fields, including non-linear dynamical systems theory, have made it possible to study various types of synchronization from time series (Pereda et al., 2005) and to address important issues such as determining whether observed couplings reflect a mere correlation between activities recorded at two different brain regions or rather a causal relationship (Granger, 1969), whereby one brain region drives the activity of the other.

However, not all measured synchrony may in fact represent neurophysiologically and cognitively relevant computations: various confounding effects may mislead into identifying functional connectivity, defined as the temporal correlations between spatially remote neurophysiological events, with effective connectivity, i.e., the influence one neuronal system exerts over another (Friston, 1994). For instance, measured synchrony may stem from common thalamo-cortical afferents or neuromodulatory input from ascending neurotransmitter systems, or may be the visible part of indirect effective connectivity. Other technique-specific artifactual sources of synchrony, for instance induced by volume conduction, are also well-known to cognitive neuroscientists (Stam et al., 2007).

Here, we address a further (extracranial) confounding source: the appearance of simultaneous, yet uncorrelated stimuli. We show how the activity of two groups of binary neurons, whose output code is optimized to represent rare events with short codes, can exhibit synchronization when such rare events appear, even in the absence of shared information or common computational activities.

## **1. THE MODEL**

We suppose that a neuron codifies an external stimulus with a set of spikes, to transmit information about the event to other regions of the neural system. For the sake of simplicity, we also suppose that all stimuli are drawn from a finite set of events E = {*e*1,...,*eN*}, *N* being the total number of events. Each event *i* is characterized by two strongly related features: its frequency of appearance *fi* and its importance factor *mi*. Rare events are also the most important ones: for instance, the image of a group of trees is quite common for an animal and should not attract its attention, whereas a predator appearing behind those trees is far less frequent, and the importance of a fast response to it is high. Therefore, for each event *i*, the relation *mi* = 1/*fi* is defined.

Each neuron optimizes its code to represent such an environment, i.e., it assigns a symbol *si*, drawn from an alphabet *S*, to each input event *i*. As the neuron's natural language is composed of spikes, each symbol *si* is defined as a sequence of spikes and silences, represented by a sequence of 0's and 1's of arbitrary length, forming a Boolean code. In other words, from an information science perspective, each symbol *si* is a number in its Boolean representation.

In creating the code, the neuron uses all its available knowledge about the environment, given by *fi* and *mi*, and tries to fulfill two conditions. First, the cost associated with the transmission of information should be minimized, so as few spikes as possible should be generated; this favors long symbols with few 1's and a large proportion of 0's. This condition saves energy but increases the neuron's response time. Therefore, a second condition ensures that the neuron minimizes symbol length, particularly for symbols associated with events of great importance, i.e., with low *fi* and high *mi*.

A *cost* given by:

$$C = \sum\_{i} \left[ \alpha \frac{b\_i f\_i}{l\_i} + (1 - \alpha) \, l\_i m\_i \right] \quad (1)$$

is associated with each code; it accounts for the trade-off between the two conditions and is minimized by the neuron in a training phase representing a natural selection process. The contribution of each symbol *i* to the total cost is given by two terms (see Equation 1). The first, involving the number of spikes in the symbol (*bi*), its expected frequency of appearance (*fi*), and its length (*li*), expresses the probability of the neuron spiking at a given time, and thus the expected energetic cost of the code. The second term penalizes long symbols codifying important messages. Finally, the parameter α sets the balance between the two contributions to the total cost: for α ≈ 0 (α ≈ 1) the total cost is dominated by the length of important symbols (by the energetic cost).

Two additional requirements are added. First, for different events not to be confused, all symbols must be different, i.e., *si* ≠ *sj* for *i* ≠ *j*. Second, all symbols must start with a spike (a 1) and contain at least one 0, in order to be recognizable and to avoid codes composed only of silences or spikes.

Due to the computational cost of optimizing such codes when multiple events are considered, the optimization is performed by means of a *greedy algorithm* (Cormen et al., 2001), that is, by starting with an empty set and adding one symbol at a time, making the locally optimal choice at each iteration.
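The greedy construction can be sketched as follows. This is our own minimal illustration under stated assumptions (candidate symbols are enumerated by increasing length, rare events are assigned first, and the candidate pool is truncated); it is not the authors' implementation:

```python
def candidate_symbols():
    """Yield Boolean strings starting with '1' and containing at least
    one '0', in order of increasing length (then numeric value)."""
    length = 2
    while True:
        for v in range(2 ** (length - 1)):
            s = '1' + format(v, '0{}b'.format(length - 1))
            if '0' in s:
                yield s
        length += 1

def symbol_cost(s, f, m, alpha):
    # Per-symbol contribution to Equation 1:
    # alpha * b*f/l (expected spiking cost) + (1-alpha) * l*m (latency penalty)
    b, l = s.count('1'), len(s)
    return alpha * b * f / l + (1.0 - alpha) * l * m

def greedy_code(freqs, alpha=0.1):
    """Assign symbols greedily: most important (rarest) events first,
    each taking the cheapest unused symbol from a bounded candidate pool."""
    events = sorted(range(len(freqs)), key=lambda i: freqs[i])  # rare first
    used, code = set(), {}
    for i in events:
        f = freqs[i]
        m = 1.0 / f       # importance factor m_i = 1/f_i
        best, best_c = None, float('inf')
        gen = candidate_symbols()
        for _ in range(200):   # bounded candidate pool (our own cutoff)
            s = next(gen)
            if s in used:
                continue
            c = symbol_cost(s, f, m, alpha)
            if c < best_c:
                best, best_c = s, c
        used.add(best)
        code[i] = best
    return code

code = greedy_code([0.4, 0.1], alpha=0.1)
```

In this toy run the rare event (frequency 0.1) receives the shortest admissible symbol, while the frequent event receives a longer one, in line with the trade-off encoded in Equation 1.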

### **2. RESULTS**

We now explore how a spurious synchronization between different neurons (or groups of them) can be achieved even in the absence of any information transfer.

Neurons are supposed to work independently, that is, they receive independent inputs from the environment and create their optimal code to process and transmit such information. For instance, two groups of neurons may receive two different and uncorrelated stimuli, corresponding to the image of a predator and the sound of thunder.

Following this idea, a large number of neurons are modeled and their codes created. Each neuron has its independent set of stimuli, half of them highly probable (and therefore, less important), and half of them with low probability of appearance.

Using this information, all codes are generated, and a time series for each neuron is created by presenting sequences of stimuli at random and recording the neuron's corresponding activity. Time series are divided into two parts of equal length. During the first half, neurons are stimulated by high-probability events; the opposite occurs during the second half. Following the previous example, we suppose that the organism is resting quietly at the beginning, and then spots a predator and hears thunder. Furthermore, we suppose that neurons do not respond with the same velocity to the external stimuli: each neuron receives its inputs with a delay drawn from a uniform distribution defined between 0 and 400 time steps.

**Figure 1** Left depicts the evolution of the time series generated by two groups of neurons, each composed of 500 neurons, for α = 0.1, 40 stimuli, and a transition interval of 400. Each series is clearly divided into two epochs: the first corresponds to the time window [0, 5000], in which no relevant event appears, and the second to the window [5000, 10000], in which neurons respond to rare external stimuli. As previously described, an efficient code requires important stimuli to be codified with short symbols, which, in turn, are associated with high spike densities. This effect is clearly visible in **Figure 1** Left, where the proportion of spiking neurons increases by roughly 0.05 after time 5000.

As the neural codes are independently generated for the 1000 neurons considered, with different probability distributions, and the external stimuli are also triggered independently, no synchronization is expected between the two time series. Indeed, computing Pearson's correlation coefficient between both series within the time window [0, 5000] yields a value on the order of 10<sup>−4</sup>. Nonetheless, an interesting result is obtained when the correlation is calculated by means of a sliding window; in other words, a time-varying correlation is obtained, whose value at time *t* represents the dynamics of both neural groups in the interval [*t* − 200, *t* + 200]. Intuitively, when analyzing the series near time 5000, both series share the same trend, i.e., an upward dynamics, thus leading to a positive synchronization. This effect is shown in **Figure 1** Left, black line and right scale: around time 5000 the Pearson's correlation coefficient jumps to 0.6.

[Figure 1 caption fragment:] The black solid line represents the evolution of the Pearson's correlation coefficient between both groups, calculated with a sliding window of size 400. **(C)** Average values for the four synchronization metrics, using the same event sets of panel **(A)**. All neural codes are optimized for α = 0.1.
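The sliding-window correlation analysis described above can be reproduced with a short script. The following is our own stdlib-only sketch, using synthetic series that both step up at the midpoint (mimicking the increased spiking density when rare stimuli appear):

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def sliding_correlation(x, y, half=200):
    """Time-varying correlation: the value at t covers [t-half, t+half]."""
    return [pearson(x[t - half:t + half], y[t - half:t + half])
            for t in range(half, len(x) - half)]

# Two independent noisy series that both step up at t = 500:
rng = random.Random(1)
x = [rng.random() + (0.0 if t < 500 else 1.0) for t in range(1000)]
y = [rng.random() + (0.0 if t < 500 else 1.0) for t in range(1000)]
corr = sliding_correlation(x, y, half=100)
```

Far from the step the windows contain only independent noise and the correlation fluctuates around zero; windows straddling the step pick up the shared upward trend and the correlation rises sharply, exactly as in the figure.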

To confirm this result, **Figure 1** Right reports the average synchronization level obtained in 100 realizations of the previously described process, as measured by four commonly used metrics for the assessment of synchronization in brain activity:


As can be seen in **Figure 1** Right, all four metrics present a peak around time 5000, indicating that they all detect this spurious synchronization between the two groups of neurons.

This spurious synchronization is caused by the optimization of the neural code, in which the length of important events is minimized, thus increasing the proportion of spiking neurons when rare events are presented to the system.

The example proposed in **Figure 1** Left is not very ecological, as the set of events presented in the two halves of the considered period includes only frequent ([0, 5000]) and infrequent ([5000, 10000]) events. **Figure 1** Center presents a more realistic example, in which the probability of finding rare events is continuously varied between two intermediate values. The resulting time series (gray and light red lines) are highly noisy, though it is still possible to detect some trends. The black solid line represents the evolution of the Pearson's correlation coefficient calculated over a sliding window of size 400. Even in this noisy configuration, it is possible to detect regions in which the correlation between the two time series is strongly increased; similar results were obtained with the three other considered metrics.

## **3. DISCUSSION**

In conclusion, we showed that synchronization can appear when the response of two groups of binary neurons is modulated by the simultaneous appearance of uncommon stimuli, even if both groups do not share information and are not performing a common computation. This is due to the way neural codes are constructed, i.e., to the preference for short symbols, with high spiking rates, to represent uncommon events. The present toy model is not intended to mirror actual neural functioning, but rather to draw attention to a possible source of spurious synchronization occurring at the system level of description of neural activity typical of standard neuroimaging techniques. In particular, our results show that even a measure such as the Granger causality can be fooled into signaling causal relationships in the presence of mere coincidences corresponding to no underlying computation. This confirms that claims of causality from (multiple) bivariate time series should always be taken with caution (Pereda et al., 2005), as true causality can only be assessed if the set of two time series contains all possible relevant information and sources of activities for the problem (Granger, 1980), a condition that a neurophysiological experiment can only rarely comply with. Finally, it is important to remark that the main suggestion of our model, namely that some of the correlations observed in neural activity do not correspond to genuine computation, holds true even for resting brain activity, which is operationally defined by the absence of exogenous stimulation. This is explained by the fact that resting brain activity is characterized by unobservable, endogenous activity stemming from numerous simultaneous sources, rendering spurious coincidences a plausible occurrence.

## **ACKNOWLEDGMENTS**

The authors acknowledge the use of the resources, technical expertise, and assistance provided by the CRESCO supercomputing facility of ENEA in Portici, Italy.

#### **REFERENCES**


*Received: 01 June 2013; accepted: 21 August 2013; published online: 10 September 2013.*

*Citation: Zanin M and Papo D (2013) Efficient neural codes can lead to spurious synchronization.* *Front. Comput. Neurosci. 7:125. doi: 10.3389/fncom.2013.00125*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Zanin and Papo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Impact of neuronal heterogeneity on correlated colored noise-induced synchronization

## *Pengcheng Zhou1,2, Shawn D. Burton2,3, Nathaniel N. Urban2,3 and G. Bard Ermentrout 2,4\**

*<sup>1</sup> Program in Neural Computation, Carnegie Mellon University, Pittsburgh, PA, USA*

*<sup>2</sup> Center for the Neural Basis of Cognition, Pittsburgh, PA, USA*

*<sup>3</sup> Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA*

*<sup>4</sup> Department of Mathematics, University of Pittsburgh, Pittsburgh, PA, USA*

#### *Edited by:*

*Robert Rosenbaum, University of Pittsburgh, USA*

#### *Reviewed by:*

*Tatjana Tchumatchenko, Max Planck Institute for Brain Research, Germany Eric Shea-Brown, University of Washington, USA*

#### *\*Correspondence:*

*G. Bard Ermentrout, Department of Mathematics, University of Pittsburgh, 139 University Place, Pittsburgh, PA 15260, USA e-mail: bard@pitt.edu*

Synchronization plays an important role in neural signal processing and transmission. Many hypotheses have been proposed to explain the origin of neural synchronization. In recent years, correlated noise-induced synchronization has received support from many theoretical and experimental studies. However, many of these prior studies have assumed that neurons have identical biophysical properties and that their inputs are well modeled by white noise. In this context, we use colored noise to induce synchronization between oscillators with heterogeneity in both phase-response curves and frequencies. In the low noise limit, we derive novel analytical theory showing that the time constant of colored noise influences correlated noise-induced synchronization and that oscillator heterogeneity can limit synchronization. Surprisingly, however, heterogeneous oscillators may synchronize better than homogeneous oscillators given low input correlations. We also find resonance of oscillator synchronization to colored noise inputs when firing frequencies diverge. Collectively, these results prove robust for both relatively high noise regimes and when applied to biophysically realistic spiking neuron models, and further match experimental recordings from acute brain slices.

**Keywords: synchrony, correlation, colored noise, heterogeneity, neural oscillators, phase-response curve**

## **1. INTRODUCTION**

Synchronization of neural oscillators is thought to play a critical role in sensory, motor, and cognitive processes (Sanes and Donoghue, 1993; Fries et al., 2001; Wang, 2010). In many networks, synchronization is achieved via direct coupling such as through gap junctions and chemical synapses. However, there are several systems (notably, the mammalian olfactory bulb) where the mode of coupling is less clear and neural synchrony is hypothesized to arise from partially correlated presynaptic inputs (Galán et al., 2006; Marella and Ermentrout, 2010). Indeed, in non-oscillatory networks of neurons, such correlated input is largely responsible for the output correlations of the neurons (de la Rocha et al., 2007). Thus, a natural question is: how do the properties of neurons and networks alter output correlations for a given degree of input correlation? At small input correlations, output and input correlations can be regarded as linearly proportional; this ratio is called the *susceptibility* (Shea-Brown et al., 2008). For example, de la Rocha et al. (2007) showed that the susceptibility depends on the background firing rate of the neuron. For some model systems, this susceptibility can be computed using linear response theory (which assumes small perturbations around the stationary state).

When neurons fire regularly, they can be regarded as noisy nonlinear oscillators and, as such, there are many mathematical techniques available for their analysis. In particular, the *phase-response curve* (PRC) provides a compact and useful characterization of the responses of a nonlinear oscillator to external perturbations. The PRC describes the shift in timing of, say, an action potential as a function of the timing of the input relative to the last action potential. In several studies, we have described the theoretical relationship between the shape of the PRC and the ability of *identical* neurons to transfer partially synchronized activity (Marella and Ermentrout, 2008; Abouzeid and Ermentrout, 2009). In these studies, the only source of heterogeneity considered between neural oscillators was their unshared (uncorrelated) inputs, which consisted of white noise. Recently, we extended these methods to cases in the low noise limit in which the oscillators were not identical and showed how heterogeneity in intrinsic properties could significantly degrade the output correlation in pairs receiving common inputs (Burton et al., 2012).

In this study, we extend this theory to include colored noise inputs and, further, report some surprising effects of heterogeneity. First, we derive a set of equations for the distribution of phase differences for pairs of heterogeneous oscillators driven by a partially correlated Ornstein-Uhlenbeck (OU) process (low-pass filtered noise). We next show that the theory developed for phase reduced models works well with a conductance-based biophysical model. We then show that, quite surprisingly, at low input correlations, heterogeneity can sometimes produce higher output correlations than the homogeneous case. That is, consider two distinct oscillators, A and B, such that the AA pair has a small susceptibility and the BB pair a larger susceptibility. Then, at low correlations, the susceptibility of the AB pair can sometimes exceed that of the AA pair. We confirm this somewhat counterintuitive prediction with recordings from regularly firing mitral cells of the main olfactory bulb. In addition to heterogeneity in response properties, neurons can fire at different frequencies, and such frequency differences can significantly impact correlated noise-induced synchronization (Markowitz et al., 2008; Burton et al., 2012). Here, we find that for some frequency differences between oscillators, there is an optimal time scale of correlated noise that will maximally synchronize the oscillators. We do not see this effect when the oscillators have the same frequency.

#### **2. MATERIALS AND METHODS**

#### **2.1. PHASE REDUCTION MODEL**

In Appendix, we provide a brief overview of how to reduce a general weakly perturbed limit cycle to a single differential equation for the phase of the cycle. If we assume that the original limit cycle represents repetitive firing of a single compartment neuron model that is driven by a noisy current, *I*(*t*), then we obtain:

$$\frac{d\theta}{dt} = 1 + \epsilon \Delta(\theta)\, I(t)/C\_m \tag{1}$$

where *C<sub>m</sub>* is the membrane capacitance, θ is the phase (or, typically, the time since the last spike), and Δ(θ) is the PRC of the neuron. The PRC describes the phase-dependent shift in the spike times of an oscillator receiving small perturbations. It is readily measured in neurons and other biological oscillators (Torben-Nielsen et al., 2010) and provides a compact representation of the effects of stimuli on the timing of action potentials. Δ(θ) has dimensions of milliseconds per millivolt; that is, the shift in timing of the next action potential per millivolt perturbation of the potential. Mathematically, for a given model, Δ(θ) is found by solving a certain differential equation (see Appendix). It is a periodic function of phase and, with no loss in generality, we can normalize the period to be 2π for simplicity.

#### **2.2. STATIONARY DENSITY**

Given the reduced model (Equation 1), we can now turn to the main question at hand, which is: how do oscillating heterogeneous neurons transfer correlations? We will consider two types of heterogeneity: differences in the PRC shapes and differences in natural frequencies. We drive the oscillators with correlated filtered noise. After reduction to phase variables, we obtain:

$$\theta\_1' = 1 + \epsilon \Delta\_1(\theta\_1)\, x \tag{2}$$

$$\theta\_2' = 1 + \epsilon \Delta\_2(\theta\_2)\, y + \epsilon^2 \omega \tag{3}$$

$$x' = -x/\tau + \xi\_x/\sqrt{\tau} \tag{4}$$

$$y' = -y/\tau + \xi\_y/\sqrt{\tau} \tag{5}$$

θ<sub>1</sub> and θ<sub>2</sub> are the phases of the two oscillators, and Δ<sub>1</sub>(θ) and Δ<sub>2</sub>(θ) are their PRCs. Without loss of generality, we set the natural frequency of one oscillator to 1. The parameter ω then determines the magnitude of the difference in natural frequencies between the two oscillators. We assume ε ≪ 1; thus the noise is weak and the frequency difference is small. The processes *x* and *y* are generated by an OU process with the same time constant τ. ξ<sub>x</sub> and ξ<sub>y</sub> are two correlated white noise processes, i.e., ⟨ξ<sub>x</sub>(*t*)ξ<sub>x</sub>(*t*′)⟩ = δ(*t* − *t*′), ⟨ξ<sub>y</sub>(*t*)ξ<sub>y</sub>(*t*′)⟩ = δ(*t* − *t*′), ⟨ξ<sub>x</sub>(*t*)ξ<sub>y</sub>(*t*′)⟩ = *c*δ(*t* − *t*′), where *c* is the degree of correlation.

We remark that the allowable frequency difference is *O*(ε<sup>2</sup>), which seems considerably smaller than the magnitude of the noise, which is *O*(ε). However, as the noise has zero mean, what matters is its variance, which has magnitude ε<sup>2</sup>. Thus, the scales of the frequency difference and of the synchronizing inputs (correlations in the noise) are similar. If the frequency differences are larger, then no synchronization is possible.
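Numerically, such inputs can be generated with an Euler-Maruyama discretization of Equations (4)-(5), building the correlated white-noise increments from a shared and an independent Gaussian component. A minimal sketch (our own illustration, with assumed parameter values):

```python
import math
import random

def correlated_ou(tau, c, dt, n, seed=0):
    """Euler-Maruyama simulation of x' = -x/tau + xi_x/sqrt(tau) and
    y' = -y/tau + xi_y/sqrt(tau), with <xi_x(t) xi_y(t')> = c*delta(t-t')."""
    rng = random.Random(seed)
    x = y = 0.0
    xs, ys = [], []
    s = math.sqrt(dt / tau)  # std of the integrated noise increment per step
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # a shared plus an independent Gaussian give increment correlation c
        x += -x / tau * dt + s * z1
        y += -y / tau * dt + s * (c * z1 + math.sqrt(1.0 - c * c) * z2)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With this scaling the stationary variance of each OU process is 1/2 independent of τ, and the stationary cross-correlation of *x* and *y* equals *c*, which can be checked empirically from a long run.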

Our goal is to compute the stationary distribution of the phase difference between the two neurons, since this will enable us to compute various measures of correlation and synchrony. Thus, a variable substitution will be helpful: θ = θ<sub>1</sub>, φ = θ<sub>2</sub> − θ<sub>1</sub>, so that φ is the phase difference between the two oscillators. With this change of variables, the equations become:

$$\theta' = 1 + \epsilon \Delta\_1(\theta)\, x \tag{6}$$

$$\phi' = \epsilon\, [\Delta\_2(\theta + \phi)\, y - \Delta\_1(\theta)\, x] + \epsilon^2 \omega \tag{7}$$

and *x*, *y* are as above. Let ρ(*x*, *y*, θ, φ, *t*) represent the probability density function at time *t*:

$$\Pr(X(t)\in(x, x+dx),\ Y(t)\in(y, y+dy),\ \Theta(t)\in(\theta, \theta+d\theta),$$

$$\Phi(t)\in(\phi, \phi+d\phi)) = \rho(x, y, \theta, \phi, t)\, dx\, dy\, d\theta\, d\phi \tag{8}$$

We denote the stationary density (long-time behavior as *t* → ∞) as ρ*ss*(*x*, *y*, θ, φ).

Our goal is to compute the probability density of the phase difference between the two oscillators, *R*(φ), which is:

$$R(\phi) := \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \int\_{0}^{2\pi} \rho\_{ss}(x, y, \theta, \phi)\, dx\, dy\, d\theta \tag{9}$$

If the oscillators are perfectly synchronized, then *R*(φ) will be a delta function centered at φ = 0. If the oscillators are completely independent, then *R*(φ) = 1/(2π). In Appendix, we show that *R*(φ) satisfies a simple first order boundary value problem (BVP). We present the exact equation for this in Results.

#### **2.3. ORDER PARAMETER**

Once we have the distribution of phase differences, *R*(φ), we need a scalar measure of synchronization, i.e., of the sharpness of this distribution. In this study, we use an order parameter (OP) for this purpose. We define:

$$\text{OP} = \sqrt{C^2 + S^2} \tag{10}$$

$$C = \int\_0^{2\pi} R(\phi) \cos(\phi) d\phi$$

$$S = \int\_0^{2\pi} R(\phi) \sin(\phi) d\phi$$

$$\theta = \operatorname{atan2}(C, S)$$

OP is a representation of sharpness and θ is the estimation of the peak position. For certain types of heterogeneity, *R*(φ) is peaked at φ = 0; in this case, we can show that the cross correlation of the spike times is (*R*(0) − 1/(2π))/(2π) (Burton et al., 2012). However, OP provides a better global measure of the synchrony and is not dependent on the peak being centered at 0; we will therefore use OP in our current results.
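Equation (10) is straightforward to evaluate numerically once *R*(φ) has been sampled on a grid. A minimal sketch (our own, using a simple rectangle rule):

```python
import math

def order_parameter(phis, R):
    """OP = sqrt(C^2 + S^2), Equation (10), with C and S computed by a
    rectangle rule from samples of R(phi) on a uniform grid over [0, 2*pi)."""
    dphi = phis[1] - phis[0]
    C = sum(r * math.cos(p) for p, r in zip(phis, R)) * dphi
    S = sum(r * math.sin(p) for p, r in zip(phis, R)) * dphi
    return math.sqrt(C * C + S * S)

n = 1000
dphi = 2.0 * math.pi / n
phis = [i * dphi for i in range(n)]
uniform = [1.0 / (2.0 * math.pi)] * n   # independent oscillators
peaked = [0.0] * n
peaked[0] = 1.0 / dphi                  # density concentrated at phi = 0
```

A uniform density gives OP ≈ 0, while a density concentrated at a single phase gives OP = 1, matching the two limiting cases described for *R*(φ) above.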

#### **2.4. MORRIS-LECAR MODEL**

The Morris-Lecar (ML) model (Rinzel and Ermentrout, 1989) is a simplified two-dimensional membrane model that we use to compare the phase models with a full biophysical model:

$$C\frac{dV\_1}{dt} = I\_1 - g\_L(V\_1 - V\_L) - g\_K w\_1(V\_1 - V\_K)$$

$$- g\_{Ca}\, m\_{\infty}(V\_1)(V\_1 - V\_{Ca}) + \sigma x \tag{11}$$

$$\frac{dw\_1}{dt} = \phi\, \frac{w\_{\infty}(V\_1) - w\_1}{\tau\_w(V\_1)} \tag{12}$$

$$C\frac{dV\_2}{dt} = I\_2 - g\_L(V\_2 - V\_L) - g\_K w\_2(V\_2 - V\_K)$$

$$- g\_{Ca}\, m\_{\infty}(V\_2)(V\_2 - V\_{Ca}) + \sigma y \tag{13}$$

$$\frac{dw\_2}{dt} = \phi\, \frac{w\_{\infty}(V\_2) - w\_2}{\tau\_w(V\_2)} \tag{14}$$

$$x' = -x/\tau + \xi\_x/\sqrt{\tau} \tag{15}$$

$$y' = -y/\tau + \xi\_y/\sqrt{\tau} \tag{16}$$

with ⟨ξ<sub>x</sub>(*t*)ξ<sub>x</sub>(*t*′)⟩ = δ(*t* − *t*′), ⟨ξ<sub>y</sub>(*t*)ξ<sub>y</sub>(*t*′)⟩ = δ(*t* − *t*′), and ⟨ξ<sub>x</sub>(*t*)ξ<sub>y</sub>(*t*′)⟩ = *c*δ(*t* − *t*′), *c* ∈ [0, 1]. The auxiliary functions are:

$$m\_{\infty}(V) = 0.5 \cdot (1 + \tanh((V - V\_a)/V\_b)) \tag{17}$$

$$w\_{\infty}(V) = 0.5 \cdot (1 + \tanh((V - V\_c)/V\_d)) \tag{18}$$

$$\tau\_w(V) = \frac{1}{\cosh((V - V\_c)/(2V\_d))} \tag{19}$$

The parameters used in this paper are: *V<sub>K</sub>* = −84 mV, *V<sub>L</sub>* = −60 mV, *V<sub>Ca</sub>* = 120 mV, *g<sub>K</sub>* = 8 mS/cm<sup>2</sup>, *g<sub>L</sub>* = 2 mS/cm<sup>2</sup>, *g<sub>Ca</sub>* = 4 mS/cm<sup>2</sup>, *C* = 20 μF/cm<sup>2</sup>, *V<sub>a</sub>* = −1.2 mV, *V<sub>b</sub>* = 18 mV, *V<sub>c</sub>* = 2 mV, and *V<sub>d</sub>* = 30 mV. *I*<sub>1</sub>, *I*<sub>2</sub> and φ<sub>1</sub>, φ<sub>2</sub> vary for each figure.
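The auxiliary functions and the listed parameters translate directly into code. A sketch under the assumption that the half-activation voltage appearing in the τ_w function is *V<sub>c</sub>* (the parameter list defines only *V<sub>a</sub>* through *V<sub>d</sub>*):

```python
import math

# Parameters from the text
V_a, V_b = -1.2, 18.0   # mV, for m_inf
V_c, V_d = 2.0, 30.0    # mV, for w_inf and tau_w

def m_inf(V):
    """Instantaneous Ca activation, Equation (17)."""
    return 0.5 * (1.0 + math.tanh((V - V_a) / V_b))

def w_inf(V):
    """Steady-state K activation, Equation (18)."""
    return 0.5 * (1.0 + math.tanh((V - V_c) / V_d))

def tau_w(V):
    """Voltage-dependent time scale of w, Equation (19)."""
    return 1.0 / math.cosh((V - V_c) / (2.0 * V_d))
```

Both gating curves are sigmoids that cross 0.5 at their respective half-activation voltages, and τ_w peaks (at 1) at *V<sub>c</sub>*, decaying symmetrically away from it.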

To get the phase from the noisy voltage signal generated by the ML model, we first apply the Hilbert transform (HT) to *V*(*t*), which yields a phase. However, this phase is not uniformly distributed, as it is not a temporal phase. We therefore map the HT phase to a temporal phase on the noise-free limit cycle, which gives the uniform phase distribution required by the theory. This allows us to estimate *R*(φ) for the biophysical model, where φ is the phase difference between two ML model neurons driven with partially correlated noise.

In some of the figures, we simulate the phase-reduced dynamics for the ML model. To do this, we must compute the infinitesimal PRC, Δ<sub>ML</sub>(θ). As described in Appendix, the PRC for the model is the voltage component of the solution to the adjoint equation (Equation 32). The software package XPP (Ermentrout, 2002) includes an algorithm for computing the adjoint solution for an exponentially stable limit cycle, so we simply compute various limit cycles (say, with very different parameters but similar periods) and then compute Δ<sub>ML</sub>(θ) for those specific parameters. We save the result as a lookup table and then numerically solve the phase equation.

#### **2.5. NUMERICS**

To obtain solutions to the stochastic phase and membrane equations, we use the Euler-Maruyama method. We solve the BVP for the stationary phase difference density using a custom BVP solver written in MATLAB. All codes are available by request.
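A minimal Euler-Maruyama sketch for the stochastic phase equations of Results Section 3.1 (an illustrative Python port, not the paper's MATLAB code; the unit-variance normalization of the OU inputs and all default values are assumptions):

```python
import math, random

def simulate_phase_pair(prc1, prc2, T=1000.0, dt=0.01, eps=0.5, omega=0.0,
                        tau=1.0, c=0.8, theta2_0=0.0, seed=0):
    """Euler-Maruyama for  theta1' = 1 + eps*prc1(theta1)*x(t),
    theta2' = 1 + eps^2*omega + eps*prc2(theta2)*y(t),
    with x, y unit-variance OU processes (time constant tau, correlation c).
    Returns the sampled phase differences (theta2 - theta1) mod 2*pi."""
    rng = random.Random(seed)
    th1, th2, x, y = 0.0, theta2_0, 0.0, 0.0
    a, b = math.sqrt(c), math.sqrt(1.0 - c)
    sig = math.sqrt(2.0 / tau)   # stationary OU variance = 1 (assumed scaling)
    sq = math.sqrt(dt)
    diffs = []
    for _ in range(int(T / dt)):
        # common + private white-noise increments: <dZ1 dZ2> = c*dt
        dWc, dWp1, dWp2 = rng.gauss(0.0, sq), rng.gauss(0.0, sq), rng.gauss(0.0, sq)
        dZ1 = a * dWc + b * dWp1
        dZ2 = a * dWc + b * dWp2
        x += -x / tau * dt + sig * dZ1
        y += -y / tau * dt + sig * dZ2
        th1 += (1.0 + eps * prc1(th1) * x) * dt
        th2 += (1.0 + eps * eps * omega + eps * prc2(th2) * y) * dt
        diffs.append((th2 - th1) % (2.0 * math.pi))
    return diffs
```

With *c* = 1 and identical PRCs this reproduces the Pikovsky/Teramae-Tanaka observation that identical common noise synchronizes identical oscillators from an arbitrary initial offset.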

#### **3. RESULTS**

#### **3.1. APPROXIMATION OF THE PHASE DIFFERENCE DENSITY**

Oscillators driven with a correlated fluctuating signal will exhibit a degree of synchronization that depends on the size of the signal, the strength of correlation, and the similarity of the two oscillators. Thus, for example, identical oscillators driven by small enough identical white noise will synchronize perfectly (Pikovsky et al., 1997; Teramae and Tanaka, 2004). The rate at which these identical oscillators synchronize depends on the properties of the noise - in particular, its autocorrelation (Nakao et al., 2007; Goldobin et al., 2010). In general, and especially in biological systems, there will be a great deal of heterogeneity in any pair of oscillators. For example, for neurons, there is always some source of independent noise so that the input correlation is always less than 1. The neurons may also be firing at slightly different frequencies. Finally, even if the neurons are adjusted to fire at the same frequency, their distribution of ion channels can be very different and, thus, their response to correlated signals can be quite different (Burton et al., 2012). If the fluctuating inputs are sufficiently small, then any stable limit cycle oscillator can be reduced to a so-called phase model where the dynamics are characterized by a single variable, the phase, such that the firing is considered to occur at a phase of 0 and the time between spikes is mapped onto an angle between zero and 2π. Here, we consider driven pairs of heterogeneous oscillators that receive partially correlated filtered noise. As our main examples come from neuroscience, we assume that the external inputs are implemented as currents, in which case the phase model for the pair of neural oscillators has the form:

$$\begin{aligned} \theta\_1' &= 1 + \epsilon \Delta\_1(\theta\_1)\,x(t) \\ \theta\_2' &= 1 + \epsilon^2 \omega + \epsilon \Delta\_2(\theta\_2)\,y(t) \end{aligned}$$

where *x*(*t*) and *y*(*t*) are OU processes with the same time constant, τ, and with correlation *c*; Δ<sub>1,2</sub>(θ) are the PRCs for the two oscillators; ε is a small positive parameter (characterizing the magnitude of the fluctuations); and ω accounts for the frequency difference in the unperturbed oscillators (see Materials and Methods, Equations 2–5). We are primarily interested in the distribution of the phase difference, φ := θ<sub>2</sub> − θ<sub>1</sub>. In the Appendix (Equation 62), we show that *R*(φ), the probability density function for the phase difference, satisfies a simple BVP:

$$\frac{d}{d\phi}\left\{[c\,g(\phi)-C\_1]R(\phi)\right\}+(4\pi\omega-C\_2)R(\phi)=K$$

$$R(\phi)=R(\phi+2\pi)$$

$$g(\phi)=g(\phi+2\pi)$$

$$\int\_{-\pi}^{\pi}R(\phi)d\phi=1$$

$$K=2\omega-\frac{C\_2}{2\pi}$$

The 2π-periodic function *g*(φ) and the constants, *C*<sub>1,2</sub>, depend in a complicated way on the forms of the PRCs and the time constant of the noise, τ (see Appendix). However, all quantities can be found by integrating elementary functions. If the oscillators have the same PRC, then *C*<sub>2</sub> = 0 and *g*(φ) is even symmetric. If the oscillators have the same frequency, then ω = 0. When both *C*<sub>2</sub> and ω vanish, we can immediately solve the BVP, yielding *R*(φ) = *N*/(*C*<sub>1</sub> − *cg*(φ)), where *N* is a normalization constant so that the integral is 1. This is the result found in Marella and Ermentrout (2008) for white noise, but is clearly also true for colored noise. When the oscillators are identical and there is no difference in frequencies, the phase difference density is symmetric and always peaks at 0. However, when 4πω − *C*<sub>2</sub> is nonzero, the peak of the phase difference density will generally be offset. We note that ε does not appear in the expression for *R*(φ), which says that the phase difference density is, to a first approximation, independent of the amplitude of the noise. **Figure 1** shows typical results comparing the perturbation calculation and the simulation of the Langevin equation. In **Figure 1A**, at fairly high noise (ε = 1), there is some distortion at the peak of the distribution, but as predicted from the theory, the distribution magnitude is largely independent of the magnitude of the fluctuations. **Figure 1B** shows a similar simulation, but the correlation of the noise is lower (*c* = 0.5 vs. *c* = 0.8), the noise is faster (τ = 0.25 vs. τ = 1), and the PRCs are identical. In this case, even the higher noise simulations match the theory. We once again emphasize that the perturbation expansion requires a small value of ε, but clearly, the simulations show that ε can be nearly 1 and still yield good agreement.

We note that the density of the phase differences can be related to more conventional measures of correlation. In Burton et al. (2012), we showed that the spike time cross-correlation (CC) between a pair of weakly noisy oscillators is:

$$\text{CC}(\phi) = \frac{1}{2\pi} \left[ R(-\phi) - \frac{1}{2\pi} \right] \tag{20}$$

For example, if the oscillators are asynchronous, then they have a uniform phase difference density and the cross-correlation will be 0. This calculation confirms one's intuition that *different* neurons that receive correlated noise will have spike time cross-correlations that peak off-center.
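Equation (20) is a one-line mapping from density to cross-correlation; a small sketch with a hypothetical wrapped-cosine density (the density is illustrative, not taken from the paper):

```python
import math

def cc_from_density(R, phi):
    """Equation (20): CC(phi) = [R(-phi) - 1/(2*pi)] / (2*pi)."""
    return (R(-phi) - 1.0 / (2.0 * math.pi)) / (2.0 * math.pi)

def uniform(phi):
    # asynchronous oscillators: flat phase difference density
    return 1.0 / (2.0 * math.pi)

def peaked(phi, k=0.5, phi0=0.7):
    # hypothetical normalized density with its peak shifted to phi0
    return (1.0 + k * math.cos(phi - phi0)) / (2.0 * math.pi)
```

A flat density gives CC ≡ 0 everywhere, and a density peaked at φ<sub>0</sub> gives a CC that peaks off-center (at −φ<sub>0</sub>, by the sign convention of Equation 20).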

**Figure 2** shows that we can apply the theory through two levels of simplification. The ML system is a simple, biophysically realistic model for a spiking neuron (Rinzel and Ermentrout, 1989). With different choices of parameters, the onset of oscillatory behavior can be either through a Hopf bifurcation (HB) or a saddle-node on an invariant circle (SNIC) bifurcation. The PRCs that result from these distinct bifurcations are often quite different (Brown et al., 2004; Izhikevich, 2007) and thus have quite different synchronization properties. In **Figure 2**, we tune the ML model so that each cell has the same frequency but the parameters are quite different and so the PRCs are different (see **Figure 2B**).

**FIGURE 2 | Analytical theory accurately predicts synchronization of biophysically realistic spiking neuron models. (A)** Invariant phase difference density computed from the reconstructed phase of two ML model neurons receiving partially correlated colored noise (period is 91.25 ms, τ = 5 ms, *c* = 0.8). Three cases are illustrated with either identical (homogeneous) or mixed (heterogeneous) PRCs. The "Hopf" case corresponds to a set of parameters where the oscillatory activity arises via a HB and the "SNIC" case through a SNIC bifurcation (Rinzel and Ermentrout, 1989); see Appendix for parameters. **(B)** PRCs for the two cases. **(C)** Same as **(A)**, but using simulations of the phase reduced equations. **(D)** Solutions to the BVP using the PRCs from **(B)**.

In **Figure 2C**, we show the results of a Monte Carlo simulation in which the biophysical model is driven by correlated noise. Phase is reconstructed from the voltage traces using a Hilbert transform and from these, we obtain phase difference histograms. In this figure, the correlation *c* is 0.8, τ = 5 ms, and the natural period of the oscillation is 91.25 ms. For the same degree of correlation, two HB oscillators are much better at synchronizing than are two SNIC oscillators. This result is consistent with the theory developed in Marella and Ermentrout (2008) for white noise and also for spike time correlations over fast timescales (i.e., spike synchrony) (Barreiro et al., 2010). At this high correlation, the heterogeneous HB-SNIC pair shows greatly reduced synchrony from either of the homogeneous cases and a shift in the peak *even though there is no frequency difference.* **Figure 2B** shows the two PRCs that were determined using the adjoint method. We then used these PRCs to compute the invariant densities for the corresponding phase reduced models. The invariant density is a function that describes the distribution of phase differences of the two neurons over some time interval consisting of many cycles. Thus, the peak of the invariant density indicates the most likely phase difference, and a large peak at zero phase difference would indicate that the two neurons are well synchronized. Comparison between **Figures 2A,C** shows excellent agreement. Finally, we substituted the numerically computed PRCs into our BVP and computed the invariant density. The result of this calculation is shown in **Figure 2D**. There are small differences in the amplitude, but the shapes and the shift of the densities in the heterogeneous case are almost identical. 
Thus, through two levels of reduction (first, from the full model to the phase model, and second, from the Langevin phase model to the approximate invariant density), we see that our analytical method works very well at estimating the invariant density of phase differences between neural oscillators.

#### **3.2. PRC HETEROGENEITY**

Our approximation of the invariant density, while requiring that we solve a BVP, allows us to explore the effects of heterogeneity much faster than simulating the appropriate Monte Carlo system. Thus, we will use this method to explore the effects of PRC heterogeneity, frequency differences, and the color of the noise on the ability of oscillators to synchronize. One simple global measure of synchrony/correlation for systems whose natural dynamics are periodic is the circular variance, σ<sub>circle</sub> = 1 − OP, where we define an order parameter (OP) (see Materials and Methods, Equation 10):

$$\text{OP} = \left[ \left( \int\_{-\pi}^{\pi} R(\phi) \cos \phi \, d\phi \right)^2 + \left( \int\_{-\pi}^{\pi} R(\phi) \sin \phi \, d\phi \right)^2 \right]^{\frac{1}{2}}$$

For a flat distribution, OP = 0 (σ<sub>circle</sub> = 1) and for a delta function distribution, OP = 1 (σ<sub>circle</sub> = 0). The OP is a commonly used measure for the degree of synchronization between two oscillators (Kuramoto, 2003).
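The OP can be computed from any estimate of *R*(φ) by quadrature; a minimal sketch (the grid resolution and the test densities below are illustrative choices; plain Riemann sums are spectrally accurate here because *R* is smooth and periodic):

```python
import math

def order_parameter(R, n=2000):
    """OP from a phase-difference density R(phi) on [-pi, pi]."""
    d = 2.0 * math.pi / n
    cs = sum(R(-math.pi + i * d) * math.cos(-math.pi + i * d) for i in range(n)) * d
    sn = sum(R(-math.pi + i * d) * math.sin(-math.pi + i * d) for i in range(n)) * d
    return math.hypot(cs, sn)

def circular_variance(R):
    # sigma_circle = 1 - OP
    return 1.0 - order_parameter(R)
```

A flat density gives OP = 0; a wrapped-cosine density (1 + *k* cos φ)/(2π) gives OP = *k*/2 analytically, a useful check on the quadrature.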

In general, one expects that the synchrony between two oscillators forced with correlated noise would be greatest if the oscillators are homogeneous. Certainly, if the inputs are identical (i.e., no independent or unshared noise), then identical oscillators will synchronize perfectly, while heterogeneous oscillators will not synchronize perfectly. That is, the phase density will not be a delta function. [See Burton et al. (2012) for a proof]. However, surprisingly, at low input correlations, it is possible for a heterogeneous pair of oscillators to produce greater synchrony than one (but not both) of the respective homogeneous pairs of oscillators. **Figure 3** illustrates the behavior of two separate homogeneous pairs of oscillators (blue and green lines, respectively) as the input correlation varies from 0 to 1. A third, heterogeneous pair comprised of an oscillator from each homogeneous pair is shown in red. **Figure 3A** shows the two different PRCs; pairs of oscillators with the green PRC ("PRC 1-PRC 1") produce weaker synchrony than pairs of oscillators with the blue PRC ("PRC 2-PRC 2"). This is demonstrated in **Figure 3B**, where the correlation is set to 0.8. Note that the phase difference density for PRC 2-PRC 2 pair is more peaked than that for PRC 1-PRC 1 pair, while both densities are more peaked than the heterogeneous "PRC 1-PRC 2" pair. As noted above, the peak of the heterogeneous pair is not at the origin but rather, is shifted to the left. In order to get a global measure of synchrony, we plot OP as a function of the input correlation (**Figure 3C**). As *c* → 1, both homogeneous pairs approach OP = 1 (i.e., perfect synchrony) while the heterogeneous pair never exceeds OP = 0.4. However, at low correlations, the heterogeneous pair can actually synchronize better than the PRC 1-PRC 1 pair (compare red to green lines in inset). 
That is, in the presence of low correlations, a "good synchronizer" paired with a "bad synchronizer" performs better than the homogeneous pair of bad synchronizers. This effect is not just due

**FIGURE 3 | Oscillator heterogeneity can enhance correlated noise-induced synchronization at low input correlations. (A)** Two PRCs with the form Δ<sub>*j*</sub>(θ<sub>*j*</sub>) = sin(*a<sub>j</sub>*) − sin(*a<sub>j</sub>* + θ<sub>*j*</sub>) + *b<sub>j</sub>* sin(2θ<sub>*j*</sub>), with *a*<sub>1</sub> = 0.1, *b*<sub>1</sub> = 0.32, *a*<sub>2</sub> = 0.6, and *b*<sub>2</sub> = 0.3. **(B)** *R*(φ) with different combinations of PRCs. Blue: PRC 2-PRC 2; red: PRC 1-PRC 2; green: PRC 1-PRC 1. Solid lines are theoretical predictions from the solution to the BVP and open symbols are Monte Carlo simulation results (same notation applies in following figures). Parameters used: τ = 1 and *c* = 0.8. **(C)** Synchronization as input correlation varies from 0 to 1; inset shows magnification when *c* < 0.5. **(D)** Same as **(C)**, but using the ML model. Parameters used: *I*<sub>1</sub> = 110, φ<sub>1</sub> = 0.04616, *I*<sub>2</sub> = 120, and φ<sub>2</sub> = 0.04.

to our approximate expansion as the Monte Carlo simulations show the same phenomenon. **Figure 3D** further hints that we can also see the effect in the full ML model, although the results are not as clear.

#### *3.2.1. Experimental evidence*

Could this subtle difference in the ability of neural oscillators to transfer correlation be seen in experiments? To answer this, we re-examined data from a previous study (Burton et al., 2012). Mitral cells from the mouse main olfactory bulb were injected with constant current overlaid with frozen noise to evoke noisy periodic firing. PRCs were then experimentally estimated using our previously described spike-triggered average method (Ermentrout et al., 2007). [Complete methods are provided in Burton et al. (2012)]. In this dataset, we found several examples where injecting partially correlated noise produced greater synchrony between two different mitral cells firing at the same rate than for one of the mitral cells across different trials (experimentally simulating a homogeneous pair of mitral cells). **Figure 4** illustrates an example. In **Figure 4A**, we show the voltage traces (top) of two mitral cells receiving correlated inputs, and the spike times (middle) and phase (bottom) as determined by a simple linear interpolation between spikes. **Figure 4B** shows the PRCs from each of these two cells along with their fit to the exponential-sine PRC model (see Appendix, Equation 64). In **Figure 4C**, we show the phase difference density as constructed from the linear phase interpolation of the two cells. In this example, the currents delivered through the electrodes are perfectly correlated. However,

**FIGURE 4 | Physiological neuronal heterogeneity can enhance correlated noise-induced synchronization at low input correlations. (A)** Example linear interpolation of phase between recorded spike times of two mitral cells injected with perfectly correlated colored noise. Top: experimentally recorded membrane potentials. Middle: raster plot of spike times. Bottom: phase. **(B)** Experimentally estimated PRCs for the two cells shown in **(A)**. Dashed lines are fits of the exponential-sine PRC model to the estimated PRCs. **(C)** Phase difference densities of the two cells during injection of perfectly correlated currents. Densities were calculated from pairs of 5 sec recordings. Blue and green curves show densities for homogeneous pairs of cell 1 and cell 2, respectively. The red curve shows the density for the heterogeneous pair of cell 1 with cell 2. **(Di)** Experimental and **(Dii)** theoretical OP vs. input correlation for homogeneous and heterogeneous pairs of the two mitral cells. Theoretical curves calculated by solving the BVP with the model PRC fits and τ = 5. [Note that the same results were obtained in separate calculations for τ = 3, the time scale of the noise used in Burton et al. (2012)]. **(Ei–Eii)** Mean OP (±SEM) vs. input correlation across 85 pairs of mitral cells (formed from 27 separate mitral cell recordings described in Burton et al. (2012)). For each

pair, the cell with the greatest area under its homogeneous OP vs. correlation curve was classified as the "good synchronizer" of the pair. **(Fi)** Theoretical OP vs. input correlation (with τ = 3) for each of the 85 heterogeneous pairs from **(E)** (light red lines), plotted against the theoretical OP vs. input correlation of a homogeneous pair formed from the average mitral cell PRC. Note that many (but not all) of the heterogeneous pairs exceed the homogeneous pair in the low correlation range shown. On average (dark red line), physiological heterogeneity enhances synchrony for input correlations up to ∼0.27. **(Fii)** Magnification of the homogeneous and average heterogeneous lines from **Fi** for low input correlations. **(Gi)** Percent and **(Gii)** absolute change in theoretical OP for heterogeneous vs. homogeneous bad pairs of mitral cells. Black lines plot OP changes for pairs in which heterogeneity increased synchrony at low input correlations; magenta line plots mean OP enhancement (±SEM) for these pairs. Grey lines plot OP changes for pairs in which heterogeneity did not increase synchrony. Note that heterogeneity mediates the greatest percent increase in OP at low (<0.1) input correlations, similar to experimental results shown in **(E)**. Inset shows magnification when |ΔOP| < 0.03.

unlike the simulations, the neurons themselves are intrinsically noisy, so there is a substantial component of "private" noise. Nevertheless, one can see that cell 1 (blue) synchronizes better across trials than does cell 2 (green) across trials. **Figures 4Di,Dii** show the OP as reconstructed from the experimental data and as obtained by using the computed PRCs, respectively. This shows that at low correlations, the heterogeneous pair ("1–2") can synchronize better than the "2–2" homogeneous pair (but not the "1–1" homogeneous pair). The inset in 4Dii magnifies the low *c* region.

Are the results presented in **Figures 4A–D** for a single pair of mitral cells consistent across a larger population of mitral cells? To answer this, we examined recordings from 27 regularly firing mitral cells, from which we were able to form 85 different pairs of mitral cells with highly similar (≤5 Hz difference) firing rates. For each pair of mitral cells, we computed the OP across varying input correlations for both homogeneous combinations and the heterogeneous combination. We automatically classified the mitral cell with the greatest homogeneous OP across all levels of input correlation as the "good synchronizer" of the mitral cell pair. **Figure 4E** shows the mean OP vs. correlation across the 85 good, bad, and heterogeneous mitral cell pairs. Note that, even with this relatively insensitive classification of good vs. bad synchronizers, there is a region at low input correlations where, on average, heterogeneous pairs synchronize better than the bad homogeneous pairs. This phenomenon is seen more clearly when we use the experimentally estimated PRCs and the BVP to compute the OP vs. input correlation. **Figure 4Fi** plots OP vs. *c* for all heterogeneous pairs (light red lines), the mean of the heterogeneous pairs (dark red line), and the OP for a single homogeneous pair whose PRC is the mean of all the PRCs (black line). For many cases (but not all), heterogeneity increases the OP above that achieved by a uniform population of neural oscillators with the mean PRC. **Figure 4Fii** magnifies the mean OP vs. *c* curves at low correlation; the red curve is clearly higher than the black curve.

We then quantified the degree to which physiological levels of heterogeneity [as experimentally measured in mitral cells (Burton et al., 2012)] can enhance synchrony between neural oscillators. Using the BVP and our experimentally estimated mitral cell PRCs, we calculated the percent and absolute change in OP for all 85 heterogeneous vs. homogeneous bad mitral cell pairs. That is, for the example pair in **Figure 4Dii**, we subtracted the green from the red line to calculate the absolute change in OP, and divided this difference by the green line to calculate the percent change in OP. **Figures 4Gi,Gii** plot the results of this analysis for all 85 pairs. In 26 of these pairs (plotted in black), heterogeneity enhanced synchrony at low input correlations, with a mean increase in OP (plotted in magenta; ±SEM) of up to 36%. Thus, in relative terms, physiological levels of heterogeneity can significantly enhance correlated noise-induced synchrony at low input correlations. While this relative enhancement in synchrony corresponds to an admittedly low absolute increase in OP of up to 0.01 on average (**Figure 4Gii**), we nevertheless expect this phenomenon to significantly contribute to patterns of oscillatory synchrony in the olfactory bulb and potentially other brain regions (see Discussion).

#### *3.2.2. Good vs. bad synchronizers*

When is a neuron a good vs. bad synchronizer? Here, the BVP is much simpler since we just have to compare homogeneous pairs. In this case, the probability density function can be written as:

$$R(\phi) = \frac{N}{1 - c\,g(\phi)/g(0)}\tag{21}$$

where *N* is a normalization and *g*(φ) is defined above by setting *n* = *m*. For low values of *c*, we get:

$$R(\phi) \approx N \left[ 1 + c \frac{g(\phi)}{g(0)} \right] \tag{22}$$

and integrating, we can find *N*:

$$\frac{1}{N} \approx 2\pi \left[ 1 + c \frac{1}{2\pi} \int\_0^{2\pi} g(\phi) / g(0) \, d\phi \right] \tag{23}$$

Since the two neurons are identical, the peak of *R*(φ) occurs at φ = 0 and, so, we can estimate the zero lag cross-correlation as [*R*(0) − 1/(2π)]/(2π). Using the approximations above, we see that:

$$\text{CC} \approx \frac{c}{2\pi} \left( 1 - \frac{1}{2\pi} \int\_0^{2\pi} \frac{g(\phi)}{g(0)} \, d\phi \right) := c\text{S} \tag{24}$$

That is, the cross-correlation is linearly proportional to the input correlation (for small *c*), and this factor [called the susceptibility (de la Rocha et al., 2007)] is a simple function of *g*(φ). We can maximize *S* if we can make the integral as small as possible. Note that *g*(φ) is periodic, and the integral over a period is proportional to the constant Fourier coefficient. Recall that *g*(φ) is a low-pass filtered version of *h*(φ), which is the autocorrelation function of the PRC. Thus, *h*(0) is positive and so is *g*(0). The integral of *g*(φ) is proportional to the integral of *h*(φ), which is just 2π*a*<sub>0</sub><sup>2</sup>, where *a*<sub>0</sub> is the DC component of the PRC. Hence, we can minimize the integral and maximize the correlation transfer (susceptibility) if we minimize the DC component of the PRC. This fact generalizes the conclusions in Marella and Ermentrout (2008) and Abouzeid and Ermentrout (2009), which state that more sinusoidal PRCs are the best synchronizers. For example, with a PRC of the form (sin(*a*) − sin(*x* + *a*)) exp(*C*(*x* − 2π)), we obtain the best synchrony when *a* = − arctan *C*.
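The claim that the integral of *h*(φ) over a period equals 2π*a*<sub>0</sub><sup>2</sup> (only the DC Fourier mode of the PRC survives the autocorrelation integral) is easy to check numerically. This sketch assumes the normalization *h*(φ) = (1/2π)∫Δ(θ)Δ(θ + φ)*d*θ; the paper's exact convention is in the Appendix:

```python
import math

def dc_component(prc, n=4000):
    """a0 = (1/2pi) * integral of the PRC over one period."""
    d = 2.0 * math.pi / n
    return sum(prc(i * d) for i in range(n)) * d / (2.0 * math.pi)

def autocorr_integral(prc, n=400):
    """Integral over one period of h(phi) = (1/2pi) int prc(t) prc(t+phi) dt.
    Should equal 2*pi*a0^2 for any smooth 2*pi-periodic PRC."""
    d = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):           # outer: phi grid
        for j in range(n):       # inner: t grid
            total += prc(j * d) * prc(j * d + i * d)
    return total * d * d / (2.0 * math.pi)
```

For the generic type I PRC 1 − cos φ (with *a*<sub>0</sub> = 1) the integral is 2π, while for the zero-DC type II PRC sin φ it vanishes, matching the text's characterization of good vs. bad synchronizers.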

Can we determine when a pair of oscillators will have the property that a good-bad heterogeneous pair is better than a bad-bad homogeneous pair? Since the effect is only seen at low correlations, this suggests a perturbation expansion for small *c*. We write *R*(φ) = Σ<sub>*n*</sub> *c<sup>n</sup>R<sub>n</sub>*(φ) and find that *R*<sub>0</sub> is constant, so that to first order in *c*, *R*(φ) = *R*<sub>0</sub> + *cR*<sub>1</sub>(φ). From this, we can compute OP = *c* ∫<sub>0</sub><sup>2π</sup> cos φ *R*<sub>1</sub>(φ) *d*φ. The formulas for this are not terribly useful, but we can illustrate the results with a simple example. Let Δ<sub>*j*</sub>(φ) = sin(*a<sub>j</sub>*) − sin(φ + *a<sub>j</sub>*), where *j* = 1, 2 for the two oscillators. Then:

$$\text{OP}\_{jk} = c \frac{K}{1 + \left(\tau^2 + 1\right)\left[\sin^2 a\_j + \sin^2 a\_k\right]} \tag{25}$$

Thus, for 0 ≤ *a*<sup>1</sup> < *a*<sup>2</sup> ≤ π/2, we always have OP11 > OP12 > OP22 for all τ and sufficiently small values of *c*. This provides a simple and surprising illustration that heterogeneity will improve synchrony at low correlations for very simple PRCs. We remark, however, that this phenomenon does not always hold. Pairs of PRCs can be found such that OP is always bigger for both of the homogeneous oscillator pairs than for the heterogeneous oscillator pair, as can be seen from **Figures 4F,G**.

#### *3.2.3. PRC heterogeneity tunes the sharpness and peak position of the phase difference density*

If two neurons are identical but driven with partially correlated noise, then the peak of the phase difference density will be centered at φ = 0, which means that the two oscillators will tend to have the same phase. However, with heterogeneity, the peak will shift depending on the degree of heterogeneity, just as the phase difference between two coupled oscillators will shift if they have different intrinsic frequencies. **Figure 5** shows how the peak of the phase difference density is shifted by oscillator heterogeneity. Using the two-term double sinusoidal form of the PRC (Equation 63), we keep PRC 1 constant (*a*<sub>1</sub> = 0.1, *b*<sub>1</sub> = 0.32) as we vary PRC 2 (*b*<sub>2</sub> = 0.3 is constant and *a*<sub>2</sub> varies from −π to π). From the results shown in **Figure 5**, we can conclude that heterogeneity can tune both the sharpness and peak position of the phase difference density, which might be useful in neural signal coding. We also note that OP is minimized when the peak is at ±π/2 and that "changing the sign" of the PRC (e.g., setting *a*<sub>2</sub> = π) shifts the peak but has very little effect on the OP.

#### **3.3. FREQUENCY DIFFERENCES HIGHLY LIMIT SYNCHRONIZATION**

In the above results, we assume that all oscillators have the same natural frequency, which means ω = 0. This is a somewhat unreasonable assumption for real neurons. Thus, we now study how synchronization is dependent on the frequency differences between oscillators. **Figure 6** shows the effects of frequency differences on a pair of oscillators that have different PRCs (of the two term double sinusoidal form, Equation 63) and are driven by partially correlated noise. With no frequency difference, the heterogeneity in oscillator PRCs yields a shift in the peak position (**Figure 5**), consistent with previous measurements of synchrony between irregularly firing neurons (Tchumatchenko et al., 2010). This means that, if frequency differences can shift the peak in the

opposite direction [e.g., see Figure 1C of Burton et al. (2012)], then changes in frequency could "cancel" the effects of PRC heterogeneity so that the peak of the phase difference density is at 0. This cancellation can be seen in **Figure 6** near ω = 0.2. However, this cancellation comes at a loss to precision, as seen by the decrease in OP. While not shown, we remark that the drop in OP is symmetric about ω = 0; thus, a negative frequency difference will not result in a larger OP. While it remains to be proven, we conjecture that the OP is always maximal when there is no frequency difference. This differs from the case that we looked at in the previous section where heterogeneity (in PRCs) can sometimes lead to a larger OP than homogeneity.

#### **3.4. CORRELATED NOISE-INDUCED SYNCHRONIZATION IS DEPENDENT ON THE TIME CONSTANT OF THE NOISE**

Because of the natural decay times of synapses, broadband inputs into neurons have some temporal correlations. Thus, we now explore how the temporal properties of noise interact with heterogeneities in the PRCs. **Figure 7** shows that synchronization decreases monotonically as τ increases for ω = 0, while there exists an optimal value of τ that achieves the greatest synchronization for ω = 0.5. This means synchronization of two oscillators with different frequencies (i.e., ω ≠ 0) can have a resonance with τ. Furthermore, as seen in **Figures 7B,D**, the peak of the phase difference density depends on τ only when there is a frequency difference between the two oscillators. In **Figure 8**, we explore this resonance in more detail, where *R*(φ) is plotted as τ varies. The left panels (showing the solution to the BVP and the results of Monte Carlo simulation) show that when ω = 0, the peak position of *R*(φ) is largely unchanged and the magnitude decreases monotonically with τ. There is a sharp drop off in *R*(φ) at τ ≈ 2. A different result emerges in the right panels, where a frequency difference exists (ω = 0.5). At low and high values of τ, *R*(φ) is almost flat with a distinct resonance when τ ≈ 1. We see the same resonance in the biophysical ML model when the neurons have different frequencies and different PRCs (**Figure 9**).

We can see why the frequency differences are needed for resonance by considering the simplest example of identical PRCs of the form Δ(φ) = sin *a* − sin(φ + *a*). In this case, we solve the BVP:

$$\frac{d\left[G(\phi,\tau)R(\phi)\right]}{d\phi} = \alpha \frac{1+\tau^2}{\tau}\left(R(\phi)-1\right) \tag{26}$$

**FIGURE 6 | Frequency differences limit noise-induced synchronization.** OP decreases quickly as frequency differences increase (black axis). The peak position of the phase difference density is shifted by changing frequency differences between two oscillators (grey axis). Parameters used here: *a*<sub>1</sub> = 0.1, *b*<sub>1</sub> = 0.32, *a*<sub>2</sub> = 0.6, *b*<sub>2</sub> = 0.3, τ = 1, and *c* = 0.8.

**FIGURE 7 | Frequency differences between oscillators change the dependence of synchrony on the time constant of the correlated noise. (A)** OP and **(B)** the peak position of the phase difference density for oscillators with no frequency difference (ω = 0). **(C)** OP and **(D)** the peak position of the phase difference density for oscillators with a frequency difference (ω = 0.5). Parameters used: *a*<sub>1</sub> = 0.1, *b*<sub>1</sub> = 0.32, *a*<sub>2</sub> = 0.6, *b*<sub>2</sub> = 0.3, and *c* = 0.8.

where *G*(φ, τ) = (1 + τ<sup>2</sup>) sin<sup>2</sup>(*a*) + 1 − *c* cos φ. Here, α is proportional to the frequency difference. In particular, note that when *a* = 0, *G* is independent of τ; otherwise, τ acts to weaken the correlated noise-induced synchronization as it increases the part of *G* that is not phase dependent. However,

the right side of this equation shows that the effect of the frequency difference is minimized when τ = 1, and thus we expect resonance in the OP. This effect disappears when α = 0.
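This reading of Equation (26) can be checked directly: the prefactor (1 + τ<sup>2</sup>)/τ = τ + 1/τ on the right-hand side is minimized at τ = 1 (by AM-GM), so the influence of the detuning α is weakest there. A trivial numerical confirmation (illustrative only):

```python
def detuning_factor(tau):
    """Coefficient (1 + tau^2)/tau multiplying the frequency-difference
    term on the right-hand side of Equation (26)."""
    return (1.0 + tau * tau) / tau

# scan a grid of noise time constants; the minimum sits at tau = 1
taus = [0.1 * k for k in range(1, 51)]
best_tau = min(taus, key=detuning_factor)
```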

**FIGURE 9 |** Resonance in the biophysical ML model when the neurons have different frequencies (*I*<sub>1</sub> = 120, φ<sub>1</sub> = 0.041, *I*<sub>2</sub> = 110, and φ<sub>2</sub> = 0.04616).

#### **4. DISCUSSION**

In our current study, we have extended a number of previous results describing the ability of neural oscillators to synchronize in the presence of correlated noise. Our methods are similar to those in Burton et al. (2012), with the additional aspect that we now use colored noise (an OU process). The properties of the noise show up only through a convolution of the autocorrelation function of the noise with the phase functions *h<sub>nm</sub>*(φ) that, in turn, depend only on the PRCs (see Appendix, Equation 56). Thus, we could easily generalize this work to noise with an arbitrary autocorrelation function. In addition, we have now included many more examples of the theory and shown that the conclusions from the perturbation theory continue to be valid for full biophysical models (cf. **Figure 2**). Further, we have shown that for low input correlations, heterogeneity can actually improve synchrony both pairwise and in large populations. We demonstrated that this theoretical effect can be seen in experimental recordings of regularly firing olfactory bulb mitral cells. Thus, we have significantly extended the findings presented in Burton et al. (2012), and our results on colored noise further suggest some experimentally testable phenomena, such as the resonance seen in slightly detuned oscillators (**Figures 7–9**). These novel findings and their biological implications are discussed in more detail below.

#### **4.1. HETEROGENEITY CAN IMPROVE SYNCHRONY**

We found that correlated noise can synchronize a heterogeneous pair of oscillators (comprised of a "bad synchronizer" and a "good synchronizer") better than a homogeneous pair of bad synchronizers at low levels of input correlation and verified this experimentally. We showed that good (bad) synchronizers are characterized by having a relatively low (high) DC component in their PRC. Consistent with this, oscillators with the generic "type II" PRC (i.e., sin φ) are better synchronizers than oscillators with the generic "type I" PRC (i.e., 1 − cos φ).

Several authors have previously studied the effects of heterogeneity on the transfer of correlation. As we noted in the Introduction, at low correlations, the output correlation is linearly proportional to the input correlation through a factor, *S*, called the susceptibility (de la Rocha et al., 2007; Shea-Brown et al., 2008). If we let *S*(*A*, *B*) denote the susceptibility for two neurons, *A*, *B*, then what we have found in our current study is that in some cases, *S*(*A*, *A*) > *S*(*A*, *B*) > *S*(*B*, *B*). Note that in our study, we are looking at output correlation related to spike-to-spike synchronization, whereas in many other studies of output correlation, the interest is in *spike count* correlation. We can regard our measure of synchrony as the same as spike count correlation, but over a time window that is of the order of the mean interspike interval. Shea-Brown et al. (2008) showed that for spike count correlation, *S*(*A*, *B*) = √(*S*(*A*, *A*)*S*(*B*, *B*)), and thus, trivially, we obtain *S*(*A*, *A*) > *S*(*A*, *B*) > *S*(*B*, *B*) whenever *A* is "better" than *B* at transferring correlation. We want to emphasize that their result is for long time windows (that is, as the window length tends to infinity). Which neurons are better than others at the transfer of correlation depends very strongly on the window of time over which the correlation is measured. Indeed, Barreiro et al. (2010) and Abouzeid and Ermentrout (2011) showed that type II PRCs have larger susceptibilities than type I for short time windows (i.e., spike synchrony) but the trend is reversed for large time windows (i.e., rate correlation).
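As a quick numerical illustration of the geometric-mean relation above (the susceptibility values are hypothetical, chosen only to show the ordering):

```python
import math

# Hypothetical susceptibilities for two cell types (illustrative values,
# not measurements): neuron A transfers correlation better than neuron B.
S_AA = 0.9
S_BB = 0.4

# Long-window spike-count result of Shea-Brown et al. (2008):
# the mixed-pair susceptibility is the geometric mean.
S_AB = math.sqrt(S_AA * S_BB)  # = 0.6

# The geometric mean always lies between the two homogeneous values,
# so S(A,A) > S(A,B) > S(B,B) whenever S(A,A) > S(B,B).
assert S_AA > S_AB > S_BB
```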

Interestingly, the efficiency of correlated-noise induced synchronization is also modulated by firing rate in the low input correlation regime (de la Rocha et al., 2007; Tchumatchenko et al., 2010). Given that changes in firing rate can modulate PRC shape in biophysically realistic neuron models and in real neurons (Gutkin et al., 2005; Marella and Ermentrout, 2008; Stiefel et al., 2008, 2009; Schultheiss et al., 2010; Fink et al., 2011; Burton et al., 2012), whether or not (and the degree to which) PRC heterogeneity will enhance synchrony may depend in part on the firing rate. However, in the simplest cases (such as the leaky integrate-and-fire model and the theta model), the only effect of the firing rate on the shape of the PRC is to change its amplitude. Since amplitude (but not shape) changes can be absorbed into the size of the noise, and our theory shows that the phase difference density is independent of the size of the noise (at least, if it is small enough), changes in firing rate will have no effect on the synchronization of pairs of neurons firing at the same or nearly the same rates.

The ability of cellular heterogeneity to regulate which oscillators synchronize best as a function of input correlation likely contributes to coding in many neural systems. In the olfactory bulb, where oscillatory synchrony appears to be critical to olfactory coding [for review, see Bathellier et al. (2010)], tens of "sister" mitral cells are linked to each glomerulus where they receive highly correlated afferent input (Carlson et al., 2000; Schoppa and Westbrook, 2001). Each sister mitral cell of a glomerulus may also participate in independent (i.e., unshared) lateral inhibitory circuits with non-sister mitral cells of surrounding glomeruli, mediated by local inhibitory granule cells (Dhawale et al., 2010; Tan et al., 2010). On average, sister mitral cells are thus subject to high input correlations while non-sister mitral cells are subject to low (though non-zero) input correlations (Dhawale et al., 2010). Further, we and others have demonstrated that mitral cells exhibit substantial cell-to-cell heterogeneity (Padmanabhan and Urban, 2010; Angelo and Margrie, 2011; Angelo et al., 2012; Burton et al., 2012). Based on our current results, this heterogeneity will thus act to reduce output synchrony of sister mitral cells but *enhance* output synchrony of non-sister mitral cells. Thus, in the context of the olfactory system, heterogeneity will promote encoding of combinatorial sensory information (i.e., activation of non-sister mitral cells by odor combinations).

Our results suggest that heterogeneity can only enhance correlation-induced synchronization by a moderate amount between two neural oscillators (up to 36% in BVP solutions using mitral cell PRCs). Two properties of the olfactory bulb nevertheless suggest that even this moderate effect can significantly influence patterns of oscillatory synchrony in the olfactory system. First, the reciprocal dendrodendritic connectivity between mitral cells and granule cells enables activity-dependent regulation of granule cell recruitment (Arevian et al., 2008), which can lead to amplification of granule cell-mediated correlated noise-induced synchronization (Marella and Ermentrout, 2010). Second, mitral cells separated by up to ∼2 mm can engage in lateral inhibitory interactions (Egger and Urban, 2006), thus multiplying the synchrony-enhancing effect of cellular heterogeneity across a potentially large fraction of the ∼40,000 total mitral cells per mouse olfactory bulb (Benson et al., 1984). Whether neural oscillator heterogeneity exists in, and significantly enhances, correlated-noise induced synchrony in other brain regions remains a promising topic of future research.

#### **4.2. RESONANCE**

In addition to the above findings, we found that there exists some resonance of correlated noise-induced synchronization with respect to the time scale of the noise. That is, we found a local maximum in OP as the time scale of the correlated noise varied. Surprisingly, this only occurs when there is a difference in the frequencies between the two oscillators. The requirement for the frequency difference would seem to contradict earlier work (Galán et al., 2008), where it was found that the Liapunov exponent was most negative when the noise has a particular time scale. However, when the noise is only partially correlated, the uncorrelated part of the noise causes a drift in the phase difference. The degree of this drift is also dependent on the time scale of the noise, and thus the two effects cancel. A frequency difference breaks this symmetry by adding an additional drift term, which prevents one from factoring out the resonance. A frequency difference thus leads to a dependence of OP on the time scale of the noise. We have not yet tested this idea experimentally, but it seems to be quite robust, having been found in both the simple phase models (**Figures 7**, **8**) and in the biophysical ML model (**Figure 9**).

#### **4.3. LIMITATIONS OF THE THEORY**

The analysis that we have done in this paper and in our earlier papers requires that the neurons fire almost periodically. This means that the activity of the neurons is *mean driven* rather than *fluctuation driven* so that the coefficient of variation of the interspike intervals should be small. While this may not be the case in all areas of the brain, there are many regions, such as the olfactory bulb, where the firing rate can be quite regular and synchronous as indicated by the large rhythmic local field potentials. Assuming that the neurons are firing at a fairly regular rate, it is also reasonable to ask how well the PRC describes such noisy neurons. An extensive review of the caveats of PRC theory for real neurons can be found in Smeal et al. (2010). Another issue is the actual estimation of the PRCs in the presence of noise. Several studies have shown that background synaptic activity and other forms of noise do not significantly affect the shape of the PRC (Ermentrout et al., 2011; Netoff et al., 2012).

In conclusion, we have extended our previous work to demonstrate that oscillator heterogeneity and frequency differences interact with the time scale of input noise to regulate how correlated noise synchronizes uncoupled oscillators.

#### **4.4. DATA SHARING**

All code is available on request from the authors, including Matlab and XPPAUT files.

## **ACKNOWLEDGMENTS**

This work was supported by National Institute on Drug Abuse Predoctoral Training Grant R90DA023426 and a R.K. Mellon Foundation Presidential Fellowship in the Life Sciences (Pengcheng Zhou), an Achievement Rewards for College Scientists Foundation Fellowship (Shawn D. Burton), National Institute on Deafness and Other Communication Disorders Grant 5R01DC011184-07 (Nathaniel N. Urban and G. Bard Ermentrout), and National Science Foundation grant DMS1219754 (G. Bard Ermentrout).

## **REFERENCES**

(2010). Dynamics of limit-cycle oscillators subject to general noise. *Phys. Rev. Lett.* 105:154101. doi: 10.1103/PhysRevLett.105.154101

*Phys. Rev. Lett.* 98:184101. doi: 10.1103/PhysRevLett.98.184101

I. Segev (Cambridge, MA: MIT Press), 135–169.

*Physiol. Rev.* 90, 1195–1268. doi: 10.1152/physrev.00035.2008

White, J. A., Chow, C. C., Rit, J., Soto-Treviño, C., and Kopell, N. (1998). Synchronization and oscillatory dynamics in heterogeneous, mutually inhibited neurons. *J. Comput. Neurosci.* 5, 5–16. doi: 10.1023/A:1008841325921

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 May 2013; accepted: 25 July 2013; published online: 21 August 2013. Citation: Zhou P, Burton SD, Urban NN and Ermentrout GB (2013) Impact of neuronal heterogeneity on correlated colored noise-induced synchronization. Front. Comput. Neurosci. 7:113. doi: 10.3389/fncom.2013.00113*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Zhou, Burton, Urban and Ermentrout. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX**

#### **REDUCTION TO A PHASE MODEL**

Consider a general oscillator receiving a possibly noisy time-dependent signal:

$$\frac{dX}{dt} = F(X) + \epsilon N(X, t) \tag{27}$$

Here *N*(*X*, *t*) represents the external or imposed inputs to the system. For single compartment neural models, *N* will typically only affect the membrane potential, e.g., as an injected or synaptic current. We assume that *Ẋ* = *F*(*X*) has as a solution an exponentially stable limit cycle, *U*(*t* + *T*) = *U*(*t*), and that ε is a small positive parameter characterizing the magnitude of the input. We are interested in how the phase of the limit cycle evolves in time in the presence of small inputs. The phase of a limit cycle is easy to define when a point lies *on* the limit cycle. For example, for neurons, the phase is just the time since the last spike of the cell. However, if the limit cycle is attracting, then it is also possible to define the phase of a point that is near, but not directly on, the limit cycle. Specifically, there is a function Θ(*X*) that maps a point *X* near the limit cycle to the phase that it will eventually reach as it is attracted to the limit cycle (the asymptotic phase). Clearly Θ(*U*(*t*)) = *t*. Define the phase to be θ(*t*) = Θ(*X*(*t*)), so that by the chain rule:

$$\frac{d\theta}{dt} = \nabla\_X \Theta(X) \cdot \frac{dX}{dt} \tag{28}$$

$$\dot{\Theta} = \nabla\_X \Theta(X) \cdot F(X) + \epsilon \nabla\_X \Theta(X) \cdot N(X(t), t) \tag{29}$$

$$=1+\epsilon \nabla\_X \Theta(X) \cdot N(X(t), t) \tag{30}$$

Thus, in the absence of inputs, the phase moves around the circle at constant velocity. This expression is exact, but not very helpful since it requires knowledge of the solution *X*(*t*). Kuramoto's approximation (which is valid for small ε) is to replace *X*(*t*) on the right-hand side by *U*(θ(*t*)), where *U* is the unperturbed limit cycle (Kuramoto, 2003). This closes the system, yielding:

$$\frac{d\theta}{dt} = 1 + \epsilon Z(\theta) \cdot N(U(\theta), t) \tag{31}$$

where we have defined *Z*(θ) := ∇<sub>*X*</sub>Θ(*U*(θ)). The function *Z*(θ) is the so-called adjoint function, satisfying the linear equation:

$$Z' = -\left(D\_X F(U(t))\right)^T Z(t) \tag{32}$$

with *Z*(*t*) · *U*′(*t*) = 1. Here *D<sub>X</sub>F*(*U*(*t*)) denotes the linearization of *F*(*X*) evaluated along the limit cycle.

In single compartment neuron models, inputs appear only in the voltage component of the neural oscillator in the form of external currents so that the dot product in Equation 31 becomes scalar multiplication:

$$\frac{d\theta}{dt} = 1 + \epsilon \Delta(\theta) I(U(\theta), t)/C \tag{33}$$

where *I* is the input current, *C* is the capacitance, and Δ(θ) is the voltage component of the vector *Z*. The quantity Δ(θ) is sometimes called the infinitesimal PRC and, for small perturbations, is proportional to the PRC.
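The phase-reduction recipe above can be checked numerically by measuring a PRC through direct perturbation. The sketch below uses the Poincaré (radial isochron) oscillator, *ṙ* = *r*(1 − *r*), θ̇ = 1, as an illustrative model (it is not one of the models in the paper); its isochrons are radial, so the asymptotic phase is simply atan2(*y*, *x*) and the *x*-component of *Z*(θ) is −sin θ:

```python
import math

# Vector field of the Poincare oscillator in Cartesian coordinates
# (r' = r(1 - r), theta' = 1). Illustrative model, not from the paper.
def rhs(x, y):
    r = math.hypot(x, y)
    return x * (1.0 - r) - y, y * (1.0 - r) + x

def rk4(x, y, dt, steps):
    # Classical fourth-order Runge-Kutta integration.
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y

def prc_x(theta0, eps=1e-3, dt=1e-3, steps=5000):
    """Asymptotic phase shift per unit kick applied to x at phase theta0."""
    xu, yu = rk4(math.cos(theta0), math.sin(theta0), dt, steps)
    xp, yp = rk4(math.cos(theta0) + eps, math.sin(theta0), dt, steps)
    # Angle between the two trajectories (constant once transients decay).
    dphi = math.atan2(xu * yp - yu * xp, xu * xp + yu * yp)
    return dphi / eps

# Direct perturbation recovers the known adjoint component -sin(theta).
for theta0 in (0.5, 2.0, 4.0):
    assert abs(prc_x(theta0) - (-math.sin(theta0))) < 1e-3
```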

#### **DERIVATION OF THE STATIONARY DENSITY OF PHASE DIFFERENCES**

The Langevin equations that drive the phase models (Equations 4–6) correspond to a forward Fokker-Planck (FP) equation that can be written as (Risken, 1984):

$$\begin{aligned}\frac{\partial\rho}{\partial t} = {} & \frac{1}{\tau}\left\{\frac{\partial}{\partial x}\left(\frac{1}{2}\frac{\partial}{\partial x}+x\right)+\frac{\partial}{\partial y}\left(\frac{1}{2}\frac{\partial}{\partial y}+y\right)+c\,\frac{\partial^2}{\partial x\,\partial y}\right\}\rho-\frac{\partial}{\partial\theta}\rho \\ & -\epsilon\left\{\frac{\partial}{\partial\theta}[\Delta\_1(\theta)x\rho]+\frac{\partial}{\partial\phi}[(\Delta\_2(\theta+\phi)y-\Delta\_1(\theta)x)\rho]\right\}-\epsilon^2\omega\frac{\partial}{\partial\phi}\rho\end{aligned}\tag{34}$$

When the distribution of phase differences is stationary, ∂ρ/∂*t* = 0. Our goal is to exploit the smallness of ε to compute this stationary density, from which we can compute the marginal distribution of the phase difference.

#### **ANALYTICAL SOLUTION**

We expand the steady state ρ in powers of ε:

$$\rho(\mathbf{x}, \mathbf{y}, \boldsymbol{\theta}, \boldsymbol{\phi}) = \rho\_0(\mathbf{x}, \mathbf{y}, \boldsymbol{\theta}, \boldsymbol{\phi}) + \epsilon \rho\_1(\mathbf{x}, \mathbf{y}, \boldsymbol{\theta}, \boldsymbol{\phi}) + \epsilon^2 \rho\_2(\mathbf{x}, \mathbf{y}, \boldsymbol{\theta}, \boldsymbol{\phi})\tag{35}$$

$$\iiiint \rho\_0(x, y, \theta, \phi)\,dx\,dy\,d\theta\,d\phi = 1$$

$$\iiiint \rho\_n(x, y, \theta, \phi)\,dx\,dy\,d\theta\,d\phi = 0, \quad n = 1, 2$$

We define the operator:

$$L\_0 = \frac{1}{\tau}\left\{\frac{\partial}{\partial x}\left(\frac{1}{2}\frac{\partial}{\partial x}+x\right)+\frac{\partial}{\partial y}\left(\frac{1}{2}\frac{\partial}{\partial y}+y\right)+c\,\frac{\partial^2}{\partial x\,\partial y}\right\}-\frac{\partial}{\partial\theta}\tag{36}$$

At steady state (∂ρ/∂*t* = 0), we substitute the above expansion into the FP equation and collect powers of ε. We need to go to order ε²:

$$0 = L\_0 \rho\_0 \tag{37}$$

$$0 = L\_0 \rho\_1 - \left\{ \frac{\partial}{\partial \theta} [\Delta\_1(\theta) \mathbf{x} \rho\_0] + \frac{\partial}{\partial \phi} [(\Delta\_2(\theta + \phi)\mathbf{y} - \Delta\_1(\theta)\mathbf{x})\rho\_0] \right\} \tag{38}$$

$$0 = L\_0 \rho\_2 - \left\{ \frac{\partial}{\partial \theta} [\Delta\_1(\theta) x \rho\_1] + \frac{\partial}{\partial \phi} [(\Delta\_2(\theta + \phi)y - \Delta\_1(\theta)x)\rho\_1] \right\} - \omega \frac{\partial}{\partial \phi} \rho\_0 \tag{39}$$

#### *Solving Equation 37*

Equation 37 is just a linear separable equation, independent of φ, so, by inspection:

$$\rho\_0(\mathbf{x}, \boldsymbol{y}, \boldsymbol{\theta}, \boldsymbol{\phi}) = \frac{1}{2\pi} G(\mathbf{x}, \boldsymbol{y}) R(\boldsymbol{\phi}) \tag{40}$$

where:

$$G(\mathbf{x}, \mathbf{y}) = \frac{1}{\sqrt{1 - c^2} \pi} e^{-\frac{1}{1 - c^2} (\mathbf{x}^2 + \mathbf{y}^2 - 2c\mathbf{x}\mathbf{y})} \tag{41}$$

and *R*(φ) remains to be determined. Note that *G*(*x*, *y*) is just the standard stationary solution of the multivariate OU equation. At this juncture, we remark that our main goal is to find *R*(φ), which is the marginal density of the phase difference between the two oscillators.
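As a sanity check on *G*(*x*, *y*), one can simulate the underlying pair of correlated OU processes (dx = −x/τ dt + √(1/τ) dW₁, likewise for *y*, with noise increments correlated at level *c*) and verify the stationary variance 1/2 and correlation *c*. The parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

# Euler-Maruyama simulation of two OU processes driven by partially
# correlated white noise; checks the stationary statistics implied by G.
rng = np.random.default_rng(0)
tau, c, dt, n = 1.0, 0.6, 0.02, 400_000

e1 = rng.standard_normal(n).tolist()
e2 = (c * np.asarray(e1) + np.sqrt(1 - c**2) * rng.standard_normal(n)).tolist()
amp = (dt / tau) ** 0.5
decay = 1 - dt / tau

x = y = 0.0
burn = n // 10
sx = sy = sxx = syy = sxy = 0.0
m = 0
for i in range(n):
    x = x * decay + amp * e1[i]
    y = y * decay + amp * e2[i]
    if i >= burn:  # discard the transient
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
        m += 1

var_x = sxx / m - (sx / m) ** 2
var_y = syy / m - (sy / m) ** 2
corr = (sxy / m - (sx / m) * (sy / m)) / (var_x * var_y) ** 0.5

assert abs(var_x - 0.5) < 0.05   # stationary variance of G is 1/2
assert abs(corr - c) < 0.05      # stationary correlation equals c
```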

#### *Solving Equation 38*

Both Equations 38 and 39 have the form *L*<sub>0</sub>ρ = *b*(*x*, *y*, θ). *L*<sub>0</sub> operates on the space of functions defined on *R*² × *S*¹ that are twice continuously differentiable in *x*, *y* and continuously differentiable in θ. In this space, *L*<sub>0</sub> has a one-dimensional nullspace spanned by *G*(*x*, *y*) (constant in θ), and so *L*<sub>0</sub> is not invertible. However, *L*<sub>0</sub>ρ(*x*, *y*, θ) = *b*(*x*, *y*, θ) does have a solution provided that *b*(*x*, *y*, θ) is orthogonal to the nullspace of *L*<sub>0</sub>\*, the adjoint of *L*<sub>0</sub>. Since *L*<sub>0</sub> is a standard probability operator, the nullspace of its adjoint is spanned by the constant function 1, so the solvability condition is simply that *b* integrate to zero.

Since:

$$b\_1(x, y, \theta) = \frac{xG(x,y)}{2\pi}[\Delta\_1'(\theta)R(\phi) - \Delta\_1(\theta)R'(\phi)] + \frac{yG(x,y)}{2\pi}[\Delta\_2'(\theta+\phi)R(\phi) + \Delta\_2(\theta+\phi)R'(\phi)] \tag{42}$$

we see that ∫∫∫ *b*<sub>1</sub>(*x*, *y*, θ)*dx dy d*θ = 0. Thus, *L*<sub>0</sub>ρ<sub>1</sub> = *b*<sub>1</sub> has a solution. Since:

$$L\_0[xG(x,y)] = -xG(x,y)/\tau \tag{43}$$

$$L\_0[yG(x,y)] = -yG(x,y)/\tau \tag{44}$$

we look for a solution of the form:

$$\rho\_1(\mathbf{x}, \mathbf{y}, \theta, \phi) = \frac{\mathbf{w}\_1(\theta, \phi)\mathbf{x}\mathbf{G}(\mathbf{x}, \mathbf{y}) + \mathbf{w}\_2(\theta, \phi)\mathbf{y}\mathbf{G}(\mathbf{x}, \mathbf{y})}{2\pi} \tag{45}$$

Inserting this into Equation 38, we find that *wj*(θ, φ) must satisfy:

$$\frac{\partial}{\partial\theta}w\_1(\theta,\phi) + \frac{w\_1(\theta,\phi)}{\tau} = -\Delta\_1'(\theta)R(\phi) + \Delta\_1(\theta)R'(\phi) \tag{46}$$

$$\frac{\partial}{\partial\theta}w\_2(\theta,\phi) + \frac{w\_2(\theta,\phi)}{\tau} = -\Delta\_2'(\theta+\phi)R(\phi) - \Delta\_2(\theta+\phi)R'(\phi) \tag{47}$$

*wj* must be periodic functions of θ; we defer their exact solution to later, but note that there is always a unique periodic solution to each of these equations.

#### *Solving Equation 39*

We now have:

$$b\_2(x, y, \theta) = \frac{\partial}{\partial\theta}[\Delta\_1(\theta)x\rho\_1] + \frac{\partial}{\partial\phi}[(\Delta\_2(\theta+\phi)y - \Delta\_1(\theta)x)\rho\_1] + \omega\frac{\partial}{\partial\phi}\rho\_0 \tag{48}$$

In order to solve Equation 39 for ρ<sub>2</sub>(*x*, *y*, θ, φ), we must have:

$$\begin{aligned} 0 &= \int\_{-\infty}^{\infty}\!\!\int\_{-\infty}^{\infty}\!\!\int\_0^{2\pi} b\_2(x, y, \theta)\,dx\,dy\,d\theta \\ &= 0 + \frac{\partial}{\partial\phi}\left\{\int\_{-\infty}^{\infty}\!\!\int\_{-\infty}^{\infty}\!\!\int\_0^{2\pi}\left[\frac{\Delta\_2(\theta+\phi)}{2\pi}[w\_2(\theta,\phi)y^2 + w\_1(\theta,\phi)xy]G(x,y)\right.\right. \\ &\qquad\left.\left.-\,\frac{\Delta\_1(\theta)}{2\pi}[w\_1(\theta,\phi)x^2 + w\_2(\theta,\phi)xy]G(x,y) + \frac{\omega}{2\pi}R(\phi)G(x,y)\right]dx\,dy\,d\theta\right\} \\ &= \frac{1}{4\pi}\frac{\partial}{\partial\phi}\left\{\int\_0^{2\pi}\big[\Delta\_2(\theta+\phi)[w\_2(\theta,\phi) + c\,w\_1(\theta,\phi)] - \Delta\_1(\theta)[w\_1(\theta,\phi) + c\,w\_2(\theta,\phi)]\big]d\theta + 4\pi\omega R(\phi)\right\} \\ &= \frac{1}{4\pi}\frac{\partial}{\partial\phi}\big[f(\phi) + 4\pi\omega R(\phi)\big] \end{aligned} \tag{49}$$

where:

$$f(\phi) = \int\_0^{2\pi}[\Delta\_2(\theta+\phi)v\_2(\theta,\phi) - \Delta\_1(\theta)v\_1(\theta,\phi)]d\theta \tag{50}$$

$$v\_1(\theta,\phi) = w\_1(\theta,\phi) + c\,w\_2(\theta,\phi)$$

$$v\_2(\theta,\phi) = w\_2(\theta,\phi) + c\,w\_1(\theta,\phi)$$

Given Equations 46 and 47, we see that *v*1(θ) and *v*2(θ) satisfy:

$$\begin{aligned} v\_1'(\theta) + \frac{v\_1(\theta)}{\tau} &= -[\Delta\_1'(\theta) + c\,\Delta\_2'(\theta+\phi)]R(\phi) + [\Delta\_1(\theta) - c\,\Delta\_2(\theta+\phi)]R'(\phi) := q\_1(\theta) \\ v\_2'(\theta) + \frac{v\_2(\theta)}{\tau} &= -[c\,\Delta\_1'(\theta) + \Delta\_2'(\theta+\phi)]R(\phi) + [c\,\Delta\_1(\theta) - \Delta\_2(\theta+\phi)]R'(\phi) := q\_2(\theta) \end{aligned} \tag{51, 52}$$

For Equations 51 and 52, we can write down the solutions *v*<sub>1</sub>(θ) and *v*<sub>2</sub>(θ) in terms of *q*<sub>1</sub>(θ) and *q*<sub>2</sub>(θ):

$$v\_n(\theta) = \int\_0^\infty e^{-\frac{s}{\tau}} q\_n(\theta - s)\,ds, \quad n = 1, 2 \tag{53}$$
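The exponential-filter representation in Equation 53 can be verified numerically: for the test input *q*(θ) = cos θ it should reproduce the unique periodic solution of *v*′ + *v*/τ = *q*, whose closed form is τ(cos θ + τ sin θ)/(1 + τ²). A minimal sketch (the test input and τ value are illustrative):

```python
import math

tau = 1.5

def q(theta):
    # Illustrative periodic forcing, standing in for q_n(theta).
    return math.cos(theta)

def v_filter(theta, s_max=40.0, ns=40000):
    # Trapezoidal quadrature of the filter integral, truncated at s_max
    # (the exponential kernel makes the tail negligible).
    ds = s_max / ns
    total = 0.0
    for i in range(ns + 1):
        s = i * ds
        w = 0.5 if i in (0, ns) else 1.0
        total += w * math.exp(-s / tau) * q(theta - s)
    return total * ds

def v_exact(theta):
    # Closed-form periodic solution of v' + v/tau = cos(theta).
    return tau * (math.cos(theta) + tau * math.sin(theta)) / (1 + tau**2)

for theta in (0.0, 1.0, 2.5):
    assert abs(v_filter(theta) - v_exact(theta)) < 1e-4
    # Finite-difference check that v' + v/tau = q.
    h = 1e-5
    dv = (v_exact(theta + h) - v_exact(theta - h)) / (2 * h)
    assert abs(dv + v_exact(theta) / tau - q(theta)) < 1e-8
```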


We substitute *v<sub>n</sub>*(θ) into *f*(φ):

$$f(\phi) = \int\_0^{2\pi}[\Delta\_2(\theta+\phi)v\_2(\theta) - \Delta\_1(\theta)v\_1(\theta)]d\theta$$

$$= \int\_0^\infty e^{-\frac{s}{\tau}}ds\int\_0^{2\pi}[\Delta\_2(\theta+\phi)q\_2(\theta-s) - \Delta\_1(\theta)q\_1(\theta-s)]d\theta$$

$$= \int\_0^\infty e^{-\frac{s}{\tau}}ds\int\_0^{2\pi}h(\theta,\phi,s)\,d\theta \tag{54}$$

where:

$$h(\theta, \phi, s) = \Delta\_2(\theta + \phi)q\_2(\theta - s) - \Delta\_1(\theta)q\_1(\theta - s)$$

$$= \Delta\_1(\theta - s)[c \cdot \Delta\_2(\theta + \phi) - \Delta\_1(\theta)]R'(\phi)$$

$$+ \Delta\_2(\theta + \phi - s)[-\Delta\_2(\theta + \phi) + c \cdot \Delta\_1(\theta)]R'(\phi)$$

$$+ \Delta\_1'(\theta - s)[-c \cdot \Delta\_2(\theta + \phi) + \Delta\_1(\theta)]R(\phi)$$

$$+ \Delta\_2'(\theta + \phi - s)[-\Delta\_2(\theta + \phi) + c \cdot \Delta\_1(\theta)]R(\phi) \tag{55}$$

Define:

$$g\_{mn}(\phi) = \int\_0^\infty h\_{mn}(s+\phi)e^{-\frac{s}{\tau}}ds \tag{56}$$

$$h\_{mn}(s) = \int\_0^{2\pi} \Delta\_m(\theta)\Delta\_n(\theta+s)d\theta$$

Since Δ<sub>1</sub>(θ) and Δ<sub>2</sub>(θ) are periodic functions,

$$\begin{aligned} \int\_0^{2\pi}h(\theta,\phi,s)\,d\theta ={} & \big[c\,[h\_{12}(s+\phi) + h\_{21}(s-\phi)] - [h\_{11}(s) + h\_{22}(s)]\big]R'(\phi) \\ & + \left\{c\,\frac{d}{d\phi}[h\_{12}(s+\phi) + h\_{21}(s-\phi)] - [h\_{11}'(s) - h\_{22}'(s)]\right\}R(\phi) \end{aligned} \tag{57}$$

$$\begin{aligned} f(\phi) &= \int\_0^\infty e^{-\frac{s}{\tau}}ds\int\_0^{2\pi}h(\theta,\phi,s)\,d\theta \\ &= \big[c\,[g\_{12}(\phi) + g\_{21}(-\phi)] - [g\_{11}(0) + g\_{22}(0)]\big]R'(\phi) \\ &\quad + \left\{c\,\frac{d}{d\phi}[g\_{12}(\phi) + g\_{21}(-\phi)] - [g\_{11}'(0) - g\_{22}'(0)]\right\}R(\phi) \\ &= \frac{d}{d\phi}\big[(c\,g(\phi) - C\_1)R(\phi)\big] - C\_2R(\phi) \end{aligned} \tag{58}$$

where:

$$\mathbf{g}(\phi) = \mathbf{g}\_{12}(\phi) + \mathbf{g}\_{21}(-\phi) \tag{59}$$

$$C\_1 = g\_{11}(0) + g\_{22}(0) \tag{60}$$

$$C\_2 = g\_{11}'(0) - g\_{22}'(0) \tag{61}$$

Combined with Equations 49–58, we have a boundary value problem (BVP):

$$\frac{d}{d\phi}\left[\left(\mathbf{c}\cdot\mathbf{g}(\phi)-\mathbf{C}\_{1}\right)R(\phi)\right]+\left(4\pi\omega-\mathbf{C}\_{2}\right)R(\phi)=K\tag{62}$$

$$R(\phi)=R(\phi+2\pi)$$

$$g(\phi)=g(\phi+2\pi)$$

$$\int\_{-\pi}^{\pi}R(\phi)d\phi=1$$

$$K=2\omega-\frac{\mathbf{C}\_{2}}{2\pi}$$

To solve this BVP, we need to compute *g*(φ) for a given PRC. We use two forms of the PRC in this paper:

$$
\Delta(\theta) = \sin(a) - \sin(\theta + a) + b\sin(2\theta) \tag{63}
$$

and

$$
\Delta(\theta) = A[\sin(B) - \sin(\theta + B)]e^{C(\theta - 2\pi)} \quad (\theta \in (0, 2\pi), \Delta(\theta) = \Delta(\theta + 2\pi))\tag{64}
$$

The required integrals can be computed for both PRC forms. More generally, all PRCs can be written in Fourier form and, again, the integrals are readily computed to obtain *g*(φ) (see below).
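The BVP can also be solved directly on a periodic grid. Below is a minimal finite-difference sketch of Equation 62, using *g*, *C*<sub>1</sub>, and *C*<sub>2</sub> from the double-sine PRC (Equations 75, 77, 78); the parameter values are illustrative, and this is not the authors' Matlab/XPPAUT code:

```python
import numpy as np

# Illustrative parameters (PRC shape, noise time scale, frequency difference).
tau, c, omega = 1.0, 0.3, 0.02
a1, a2, b1, b2 = 0.2, 0.4, 0.3, 0.5

# Closed forms from the double-sine PRC (Equations 77-78).
C1 = (2 * np.pi * tau * (np.sin(a1)**2 + np.sin(a2)**2)
      + 2 * np.pi * tau / (tau**2 + 1)
      + (b1**2 + b2**2) * np.pi * tau / (4 * tau**2 + 1))
C2 = 4 * np.pi * tau**2 * (b2**2 - b1**2) / (4 * tau**2 + 1)
K = 2 * omega - C2 / (2 * np.pi)

n = 401  # odd grid size avoids odd-even decoupling of central differences
phi = -np.pi + 2 * np.pi * np.arange(n) / n
h = 2 * np.pi / n
g = (4 * np.pi * tau * np.sin(a1) * np.sin(a2)
     + 2 * np.pi * tau * np.cos(phi + a2 - a1) / (tau**2 + 1)
     + 2 * b1 * b2 * np.pi * tau * np.cos(2 * phi) / (4 * tau**2 + 1))

# Discretize d/dphi[(c g - C1) R] + (4 pi omega - C2) R = K with periodic
# central differences, then solve the linear system for R on the grid.
coef = c * g - C1
A = np.zeros((n, n))
for j in range(n):
    A[j, (j + 1) % n] += coef[(j + 1) % n] / (2 * h)
    A[j, (j - 1) % n] -= coef[(j - 1) % n] / (2 * h)
    A[j, j] += 4 * np.pi * omega - C2
R = np.linalg.solve(A, np.full(n, K))

assert abs(R.sum() * h - 1.0) < 1e-8  # normalization emerges automatically
assert R.min() > 0                    # a proper density of phase differences
```

The normalization check is not an accident: summing the discrete equations makes the derivative terms telescope, leaving (4πω − *C*<sub>2</sub>)ΣR·h = 2πK, which equals 1 by the definition of K.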

#### *Small correlation approximation for R(*φ*)*

We use a BVP solver to get the numerical solution for *R*(φ), but we would like to better understand the form of *R*(φ) at low values of correlation, *c*, so we expand *R* as a series in *c*. As *K* is dependent on *c*, we must also expand *K* in *c*. Finally, we need to keep the normalization condition for *R*(φ), hence:

$$\begin{aligned} R(\phi) &= R\_0(\phi) + cR\_1(\phi) + \dots & K &= K\_0 + cK\_1 + \dots \\ R\_0(\phi) &= R(\phi)\big|\_{c=0}, & \int\_{-\pi}^{\pi} R\_0(\phi)\,d\phi &= 1 \\ \int\_{-\pi}^{\pi} R\_n(\phi)\,d\phi &= 0, \quad n \ge 1 \end{aligned} \tag{65}$$

We substitute these expressions into the BVP, Equation 62 and find after equating powers of *c*:


$$-C\_1 R\_0'(\phi) + (4\pi\omega - C\_2)R\_0(\phi) = K\_0 \tag{66}$$

$$-C\_1R\_1'(\phi) + (4\pi\omega - C\_2)R\_1(\phi) + [g(\phi)R\_0(\phi)]' = K\_1 \tag{67}$$

We can integrate both sides over [−π, π], then use the assumptions and periodicity requirements above to get:

$$K\_0 = \frac{4\pi\omega - C\_2}{2\pi}, \quad K\_1 = 0 \tag{68}$$

Rewriting these equations,

$$R\_0'(\phi) + DR\_0(\phi) = Q\_0 \tag{69}$$

$$D = \frac{C\_2 - 4\pi\omega}{C\_1}, \quad Q\_0 = -\frac{K\_0}{C\_1}$$

$$R\_1'(\phi) + DR\_1(\phi) = Q\_1(\phi) \tag{70}$$

$$Q\_1(\phi) = \frac{[g(\phi)R\_0(\phi)]'}{C\_1}$$

we see:

$$R\_0(\phi) = \frac{1}{2\pi}$$

$$[R\_1(\phi)e^{D\phi}]' = Q\_1(\phi)e^{D\phi}$$

We can use numerical methods to get the solution of *R*1(φ) given *R*0(φ) and for some choices of PRCs we can get exact expressions. For example, for the two term double sinusoidal form PRC (Equation 63) we get:

$$R\_1(\phi) = \frac{1}{C\_1}\left\{\frac{\tau}{\tau^2+1}\,\frac{\cos(\phi + a\_2 - a\_1) - D\sin(\phi + a\_2 - a\_1)}{1 + D^2} + \frac{2b\_1b\_2\tau}{4\tau^2+1}\,\frac{2\cos(2\phi) - D\sin(2\phi)}{4 + D^2}\right\} \tag{71}$$
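This closed form can be checked by finite differences against the first-order equation *R*<sub>1</sub>′ + *DR*<sub>1</sub> = [*g*(φ)*R*<sub>0</sub>(φ)]′/*C*<sub>1</sub>, with *R*<sub>0</sub> = 1/2π and *g*, *C*<sub>1</sub>, *C*<sub>2</sub> from the double-sine PRC. A sketch with illustrative parameters:

```python
import math

# Illustrative PRC parameters and frequency difference (not fits to data).
tau, omega = 1.2, 0.05
a1, a2, b1, b2 = 0.3, 0.7, 0.4, 0.6

C1 = (2 * math.pi * tau * (math.sin(a1)**2 + math.sin(a2)**2)
      + 2 * math.pi * tau / (tau**2 + 1)
      + (b1**2 + b2**2) * math.pi * tau / (4 * tau**2 + 1))
C2 = 4 * math.pi * tau**2 * (b2**2 - b1**2) / (4 * tau**2 + 1)
D = (C2 - 4 * math.pi * omega) / C1

def g(phi):
    # Double-sine form of g(phi).
    return (4 * math.pi * tau * math.sin(a1) * math.sin(a2)
            + 2 * math.pi * tau * math.cos(phi + a2 - a1) / (tau**2 + 1)
            + 2 * b1 * b2 * math.pi * tau * math.cos(2 * phi) / (4 * tau**2 + 1))

def R1(phi):
    # Closed-form first-order correction to the phase-difference density.
    psi = phi + a2 - a1
    t1 = (tau / (tau**2 + 1)) * (math.cos(psi) - D * math.sin(psi)) / (1 + D**2)
    t2 = (2 * b1 * b2 * tau / (4 * tau**2 + 1)
          * (2 * math.cos(2 * phi) - D * math.sin(2 * phi)) / (4 + D**2))
    return (t1 + t2) / C1

h = 1e-5
for phi in (0.0, 1.1, 2.7, -2.0):
    lhs = (R1(phi + h) - R1(phi - h)) / (2 * h) + D * R1(phi)
    rhs = (g(phi + h) - g(phi - h)) / (2 * h) / (2 * math.pi) / C1
    assert abs(lhs - rhs) < 1e-6  # residual of R1' + D R1 = [g R0]'/C1
```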

#### **DIFFERENT PRCS**

#### *Double-sine form PRC*

For the PRC:

$$\Delta\_m(\theta) = \sin(a\_m) - \sin(\theta + a\_m) + b\_m \sin(2\theta)$$

We have:

$$\begin{aligned} h\_{mn}(s) &= \int\_0^{2\pi}\Delta\_m(\theta)\Delta\_n(\theta+s)\,d\theta \\ &= 2\pi\sin(a\_m)\sin(a\_n) + \pi\cos(s + a\_n - a\_m) + b\_mb\_n\pi\cos(2s) \end{aligned} \tag{72}$$

$$\begin{aligned} g\_{mn}(\phi) &= \int\_0^\infty h\_{mn}(s+\phi)e^{-\frac{s}{\tau}}ds \\ &= 2\pi\tau\sin(a\_m)\sin(a\_n) + \pi\tau\,\frac{\cos(\phi + a\_n - a\_m) - \tau\sin(\phi + a\_n - a\_m)}{\tau^2 + 1} + b\_mb\_n\pi\tau\,\frac{\cos(2\phi) - 2\tau\sin(2\phi)}{4\tau^2 + 1} \end{aligned} \tag{73}$$

$$g\_{mn}'(\phi) = -\pi\tau\,\frac{\tau\cos(\phi + a\_n - a\_m) + \sin(\phi + a\_n - a\_m)}{\tau^2 + 1} - 2b\_mb\_n\pi\tau\,\frac{2\tau\cos(2\phi) + \sin(2\phi)}{4\tau^2 + 1} \tag{74}$$

$$\begin{aligned} g(\phi) &= g\_{12}(\phi) + g\_{21}(-\phi) \\ &= 4\pi\tau\sin(a\_1)\sin(a\_2) + 2\pi\tau\,\frac{\cos(\phi + a\_2 - a\_1)}{\tau^2 + 1} + 2b\_1b\_2\pi\tau\,\frac{\cos(2\phi)}{4\tau^2 + 1} \end{aligned} \tag{75}$$

$$\begin{aligned} g'(\phi) &= g\_{12}'(\phi) - g\_{21}'(-\phi) \\ &= -2\pi\tau\,\frac{\sin(\phi + a\_2 - a\_1)}{\tau^2 + 1} - 4b\_1b\_2\pi\tau\,\frac{\sin(2\phi)}{4\tau^2 + 1} \end{aligned} \tag{76}$$

$$C\_1 = g\_{11}(0) + g\_{22}(0) = 2\pi\tau\big(\sin^2(a\_1) + \sin^2(a\_2)\big) + \frac{2\pi\tau}{\tau^2 + 1} + \frac{(b\_1^2 + b\_2^2)\pi\tau}{4\tau^2 + 1} \tag{77}$$

$$C\_2 = g\_{11}'(0) - g\_{22}'(0) = \frac{4\pi\tau^2(b\_2^2 - b\_1^2)}{4\tau^2 + 1} \tag{78}$$
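The closed forms for *C*<sub>1</sub> and *C*<sub>2</sub> can be verified directly from the definitions of *h*<sub>mm</sub> and *g*<sub>mm</sub> by quadrature; a sketch with illustrative parameter values:

```python
import numpy as np

# Illustrative double-sine PRC parameters for the two oscillators.
tau = 0.8
a = [0.25, 0.6]
b = [0.35, 0.55]

ntheta = 1024
theta = 2 * np.pi * np.arange(ntheta) / ntheta  # periodic grid, no endpoint

def delta(m, th):
    return np.sin(a[m]) - np.sin(th + a[m]) + b[m] * np.sin(2 * th)

def h_mm(m, s):
    # Rectangle rule is spectrally accurate for periodic integrands.
    return (2 * np.pi / ntheta) * np.sum(delta(m, theta) * delta(m, theta + s))

def g_mm(m, phi, s_max=20.0, ns=2000):
    # Trapezoidal quadrature of the exponential filter, truncated at s_max.
    s = np.linspace(0.0, s_max, ns + 1)
    f = np.exp(-s / tau) * np.array([h_mm(m, si) for si in s + phi])
    return (s_max / ns) * (f.sum() - 0.5 * (f[0] + f[-1]))

C1_num = g_mm(0, 0.0) + g_mm(1, 0.0)
eps = 1e-3  # central difference in phi for g'(0)
C2_num = ((g_mm(0, eps) - g_mm(0, -eps)) - (g_mm(1, eps) - g_mm(1, -eps))) / (2 * eps)

C1 = (2 * np.pi * tau * (np.sin(a[0])**2 + np.sin(a[1])**2)
      + 2 * np.pi * tau / (tau**2 + 1)
      + (b[0]**2 + b[1]**2) * np.pi * tau / (4 * tau**2 + 1))
C2 = 4 * np.pi * tau**2 * (b[1]**2 - b[0]**2) / (4 * tau**2 + 1)

assert abs(C1_num - C1) < 5e-3
assert abs(C2_num - C2) < 5e-3
```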

#### *Exponential-sine form PRC*

For empirical PRCs of the exponential-sine form (cf. Equation 64), with parameters *a<sub>m</sub>*, *b<sub>m</sub>*, *c<sub>m</sub>* for oscillator *m*:

$$\Delta\_m(\theta) = a\_m[\sin(b\_m) - \sin(\theta + b\_m)]e^{c\_m(\theta - 2\pi)}$$

We have:

$$h\_{mn}(s) = B\_1\,e^{-c\_m s}\langle hc\_1, hv\_1(s)\rangle + B\_0\,e^{c\_n s}\langle hc\_0, hv\_0(s)\rangle \tag{79}$$

$$g\_{mn}(\phi) = C\_0\,e^{-\frac{\phi}{\tau}} - e^{-c\_m\phi}\langle gc\_1, gv\_1(\phi)\rangle - e^{c\_n\phi}\langle gc\_0, gv\_0(\phi)\rangle \tag{80}$$

where *s* ∈ [0, 2π) and φ ∈ [0, 2π), both extended with period 2π. Here ⟨·, ·⟩ denotes the inner product of two vectors. The above quantities are defined as:

$$B\_1 = a\_m a\_n e^{-2\pi c\_n}\,[1 - e^{-2\pi c\_m}], \qquad B\_0 = a\_m a\_n\,[1 - e^{-2\pi c\_n}]$$

$$D\_1 = c\_m + \frac{1}{\tau}, \qquad D\_0 = c\_n - \frac{1}{\tau}$$

$$k\_0 = \frac{\sin(b\_m)\sin(b\_n)}{c\_m + c\_n} - \frac{\sin(b\_m)}{1 + (c\_m + c\_n)^2}\,[(c\_m + c\_n)\sin(b\_n) - \cos(b\_n)]$$

$$k\_1 = \frac{1}{2(c\_m + c\_n)}, \quad k\_2 = \frac{1}{4 + (c\_m + c\_n)^2}, \quad k\_3 = -\frac{c\_m + c\_n}{2[4 + (c\_m + c\_n)^2]}$$

$$k\_4 = \frac{(c\_m + c\_n)\sin(b\_n)}{1 + (c\_m + c\_n)^2}, \quad k\_5 = \frac{\sin(b\_n)}{1 + (c\_m + c\_n)^2}$$

$$j\_0 = \frac{\sin(b\_m)\sin(b\_n)}{c\_m + c\_n} - \frac{\sin(b\_n)}{1 + (c\_m + c\_n)^2}\,[(c\_m + c\_n)\sin(b\_m) - \cos(b\_m)]$$

$$j\_1 = \frac{1}{2(c\_m + c\_n)}, \quad j\_2 = -\frac{1}{4 + (c\_m + c\_n)^2}, \quad j\_3 = -\frac{c\_m + c\_n}{2[4 + (c\_m + c\_n)^2]}$$

$$j\_4 = -\frac{(c\_m + c\_n)\sin(b\_m)}{1 + (c\_m + c\_n)^2}, \quad j\_5 = \frac{\sin(b\_m)}{1 + (c\_m + c\_n)^2}$$

$$hc\_1 = [k\_0, k\_1, k\_2, k\_3, k\_4, k\_5], \qquad hc\_0 = [j\_0, j\_1, j\_2, j\_3, j\_4, j\_5]$$

$$hv\_1(s) = [1, \cos(s + b\_n - b\_m), \sin(s - b\_m - b\_n), \cos(s - b\_m - b\_n), \sin(s - b\_m), \cos(s - b\_m)]$$

$$hv\_0(s) = [1, \cos(s + b\_n - b\_m), \sin(s + b\_m + b\_n), \cos(s + b\_m + b\_n), \sin(s + b\_n), \cos(s + b\_n)]$$

$$gc\_1 = \frac{B\_1}{1 + D\_1^2}\left[-\frac{1 + D\_1^2}{D\_1}k\_0,\; -D\_1k\_1,\; k\_1,\; k\_3 - D\_1k\_2,\; -k\_2 - D\_1k\_3,\; k\_5 - D\_1k\_4,\; -k\_4 - D\_1k\_5\right]$$

$$gc\_0 = \frac{B\_0}{1 + D\_0^2}\left[\frac{1 + D\_0^2}{D\_0}j\_0,\; D\_0j\_1,\; j\_1,\; j\_3 + D\_0j\_2,\; -j\_2 + D\_0j\_3,\; j\_5 + D\_0j\_4,\; -j\_4 + D\_0j\_5\right]$$

$$gv\_1(\phi) = [1, \cos(\phi + b\_n - b\_m), \sin(\phi + b\_n - b\_m), \sin(\phi - b\_m - b\_n), \cos(\phi - b\_m - b\_n), \sin(\phi - b\_m), \cos(\phi - b\_m)]$$

$$gv\_0(\phi) = [1, \cos(\phi + b\_n - b\_m), \sin(\phi + b\_n - b\_m), \sin(\phi + b\_m + b\_n), \cos(\phi + b\_m + b\_n), \sin(\phi + b\_n), \cos(\phi + b\_n)]$$

$$C\_0 = \frac{e^{-2\pi/\tau}}{1 - e^{-2\pi/\tau}}\left[(e^{-2\pi c\_m} - 1)\,\langle gc\_1, gv\_1(0)\rangle + (e^{2\pi c\_n} - 1)\,\langle gc\_0, gv\_0(0)\rangle\right]$$

*Fourier form PRC*

For the Fourier form of the PRC:

$$\Delta\_m(\theta) = \sum\_{k=-\infty}^{\infty} a\_{m,k} e^{ik\theta}$$

$$\begin{aligned} h\_{mn}(s) &= \int\_0^{2\pi} \Delta\_m(\theta) \Delta\_n(\theta + s) d\theta \\ &= \sum\_{k\_1, k\_2} a\_{m, k\_1} a\_{n, k\_2} \int\_0^{2\pi} e^{ik\_1 \theta} e^{ik\_2(\theta + s)} d\theta \end{aligned}$$

$$=2\pi \sum\_{k=-\infty}^{\infty} a\_{m,k} a\_{n,-k} e^{-iks} \tag{81}$$

$$g\_{mn}(\phi) = \int\_0^{\infty} h\_{mn}(s+\phi) e^{-\frac{s}{\tau}} ds$$

$$= 2\pi \sum\_{k=-\infty}^{\infty} a\_{m,k} a\_{n,-k} \int\_0^{\infty} e^{-ik(s+\phi)} e^{-\frac{s}{\tau}} ds$$

$$= 2\pi \sum\_{k=-\infty}^{\infty} \frac{a\_{m,k} a\_{n,-k}}{k^2 + \frac{1}{\tau^2}} (\frac{1}{\tau} - ik) e^{-ik\phi} \tag{82}$$
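Equations (81, 82) can be checked numerically for a PRC with a few Fourier modes. The coefficients below are arbitrary illustrative values (chosen conjugate-symmetric so the PRCs are real); τ and the quadrature grids are likewise my own choices, not from the text:

```python
import numpy as np

tau = 2.0
# illustrative Fourier coefficients a_{m,k}, a_{n,k} for k = -1, 0, 1
# (conjugate-symmetric, so the PRCs Delta_m and Delta_n are real-valued)
a_m = {-1: 0.20 - 0.10j, 0: 0.50, 1: 0.20 + 0.10j}
a_n = {-1: 0.10 + 0.05j, 0: 0.30, 1: 0.10 - 0.05j}

def Delta(a, theta):
    # finite Fourier series: Delta(theta) = sum_k a_k e^{i k theta}
    return sum(c * np.exp(1j * k * theta) for k, c in a.items())

def trap(y, x):
    # composite trapezoid rule on a uniform grid (integrates over the last axis)
    dx = x[1] - x[0]
    return (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1])) * dx

def g_closed(phi):
    # closed form of Eq. (82)
    return sum(2 * np.pi * a_m[k] * a_n[-k]
               * (1 / tau - 1j * k) / (k**2 + 1 / tau**2) * np.exp(-1j * k * phi)
               for k in a_m)

def g_numeric(phi):
    # direct double integral: h_mn(s) = int_0^{2pi} Delta_m(th) Delta_n(th + s) dth,
    # then g_mn(phi) = int_0^inf h_mn(s + phi) e^{-s/tau} ds (tail truncated at 20 tau)
    theta = np.linspace(0.0, 2 * np.pi, 801)
    s = np.linspace(0.0, 20 * tau, 2001)
    h = trap(Delta(a_m, theta) * Delta(a_n, theta[None, :] + (s + phi)[:, None]), theta)
    return trap(h * np.exp(-s / tau), s)
```

The two evaluations agree to quadrature accuracy, and the conjugate symmetry of the coefficients makes the closed form real up to rounding.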

## Direct connections assist neurons to detect correlation in small amplitude noises

## *E. Bolhasani, Y. Azizi and A. Valizadeh\**

*Department of Physics, Institute for Advanced Studies in Basic Sciences, Zanjan, Iran*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Germán Mato, Centro Atomico Bariloche, Argentina*
*Zachary P. Kilpatrick, University of Houston, USA*

#### *\*Correspondence:*

*A. Valizadeh, Department of Physics, Institute for Advanced Studies in Basic Sciences, Gava zang, PO Box 45195-1159, Zanjan, Iran e-mail: valizade@iasbs.ac.ir*

## **1. INTRODUCTION**

The recent advent of novel recording techniques has made it easier to record simultaneously from a large number of neurons and has provided new possibilities to relate population activity to coding and information processing in the brain (Greenberg et al., 2008; Cohen and Kohn, 2011). Many researchers suggest that studying the correlated activity of neurons in a population is essential for understanding how information is coded in the brain (Zohary et al., 1994; Abbott and Dayan, 1999; Nirenberg and Latham, 2003; Averbeck et al., 2006; Biederlack et al., 2006; Schneidman et al., 2006; Pillow et al., 2008). Correlated spiking of neurons contributes to several cognitive functions such as attention (Steinmetz et al., 2000), sensory coding (Christopher deCharms and Merzenich, 1996; Bair et al., 2001; Doiron et al., 2004; Galán et al., 2006; Schoppa, 2006) and discrimination (Stopfer et al., 1997; Kenyon et al., 2004), motor behavior (Maynard et al., 1999) and population coding (Sompolinsky et al., 2001; Averbeck et al., 2006; Josić et al., 2009). Beyond the functional effects of such correlations on neural coding, how different biological, network, or stimulus parameters tune these correlations is gradually being revealed (Shadlen and Newsome, 1998; Binder and Powers, 2001; Moreno et al., 2002; Moreno-Bote and Parga, 2006; Tchumatchenko et al., 2010b; Rosenbaum and Josić, 2011b). Correlation between neuronal activities is frequently measured by pairwise correlation coefficients and spike count correlations, and the ability of a neuronal system to transfer correlation can be quantified by the correlation transfer function (CTF), which determines the relation between the output correlation of a system under stimulus and a specific input correlation (Doiron et al., 2006; Shea-Brown et al., 2008; Rosenbaum and Josić, 2011b).

We address the effect of common stochastic inputs on the correlation of the spike trains of two neurons when they are coupled through direct connections. We show that a change in the correlation of small amplitude stochastic inputs can be better detected when the neurons are connected by direct excitatory couplings. Depending on whether the intrinsic firing rates of the neurons are identical or slightly different, symmetric or asymmetric connections can increase the sensitivity of the system to the input correlation by changing the mean slope of the correlation transfer function over a given range of input correlation. In either case, there is also an optimum value of the synaptic strength which maximizes the sensitivity of the system to changes in input correlation.

**Keywords: correlation, correlation transfer, coupling, inhomogeneity, synchrony**

A periodic common input to two (or more) uncoupled oscillators can cause coherent behavior when both oscillators lock to the external force (Pikovsky et al., 2003). A common example is the control of the circadian rhythms of humans and animals by light-dark stimulation (Roberts, 2005). For noisy inputs, the counterpart of this phenomenon appears as stochastic synchronization (SS), a general topic that addresses irregular phase locking between two noisy non-linear oscillators (Neiman et al., 1999). In nervous systems, cross-correlations can arise either from the presence of direct synaptic connections (Csicsvari et al., 1998; Barthó et al., 2004) or from shared inputs from the surrounding network or sensory layers (Binder and Powers, 2001; Türker and Powers, 2001, 2004). The effects of direct synaptic connections and of common inputs have each been widely studied, but these two sources of correlation can be present concurrently in many physical and biological systems, and their interplay can result in quite interesting phenomena. Couplings can regulate the activity of noisy oscillators, and less variability in neuronal dynamics emerges through synchronization in networks of coupled noisy oscillators (Ly and Ermentrout, 2010; Tabareau et al., 2010; Zilli and Hasselmo, 2010). Studies on the correlation of spike trains have reported increases and decreases of correlation due to the presence of excitatory and inhibitory synapses, respectively (Rosenbaum and Josić, 2011a; Ly et al., 2012). When delay in communication and the type of excitability of the neurons are taken into account, the generality of these results can be debated, since both excitatory and inhibitory synapses can be sources of synchrony and may increase correlation in different parameter ranges (Vreeswijk et al., 1994; Wang et al., 2012; Sadeghi and Valizadeh, 2013). Regarding the type of excitability, and categorizing couplings as synchronizing and desynchronizing, it has been shown that shared inputs and direct couplings can have cooperative or disruptive effects on the correlation of noisy coupled oscillators (Ly and Ermentrout, 2009).

Possible differences between the intrinsic parameters of neurons cause the message from the environment to be decoded differently by the system components. Another aim of the current study is to investigate how the correlation is transferred by two neurons when the neurons are not identical. In such a heterogeneous system, the temporal symmetry of spike correlation is lost (Tchumatchenko et al., 2010b). We will show that with small amplitude stochastic inputs, even a slight inhomogeneity in the intrinsic parameters can lead to a large reduction of the pairwise correlation coefficient in the case of uncoupled neurons. As expected, the results depend on the time bins over which the correlation is calculated: spike count correlations over long time bins are less affected by the heterogeneity, but synchrony (the alignment of action potentials within small time bins) is tightly dependent on the homogeneity of the system.

We have shown that correlated inputs and direct connections can have either cooperative or disruptive effects in different ranges of parameters. For uncoupled neurons, correlation susceptibility increases with the amplitude of noise for mildly correlated inputs (De La Rocha et al., 2007; Shea-Brown et al., 2008; Tchumatchenko et al., 2010b). We show that when direct connections are present between non-identical neurons, the mean susceptibility is no longer a monotonic function of the amplitude of the correlated noisy input. Reminiscent of stochastic resonance phenomena, an intermediate noise amplitude in this case leads to a larger sensitivity of the system to changes in input correlation. We have also shown that with monosynaptic connections between two neurons, the presence of inhomogeneity in the intrinsic firing rates of the neurons can enhance the correlation of spike trains, while for symmetric couplings maximum correlation is seen for the homogeneous system. By changing the mismatch and the synaptic strengths between the two neurons, it is possible to change the functional form of the correlation transfer function so as to optimize the mean correlation susceptibility, which is an indicator of the sensitivity of the system to the change of input correlation in different ranges. In this way, as the most important result of the current study, we will show that with direct couplings it is possible to detect correlation in small amplitude noises by increasing the sensitivity of the system to the change of correlation in small amplitude noisy inputs.

## **2. MATERIALS AND METHODS**

The system under investigation consists of two coupled leaky integrate and fire (LIF) neurons (Knight, 1972), subjected to correlated stochastic inputs (see **Figure 1**). Subthreshold dynamics of the LIF neuron obeys the following first order equation:

$$
\tau\_m \frac{dv\_i}{dt} = V\_{\text{rest}} - v\_i + I\_i + I\_{ij}, \tag{1}
$$

in which *vi* is a voltage-like variable for each neuron, labeled by *i* = 1, 2, with τ*m* = 20 ms and *V*rest = −70 mV. A severe nonlinearity is imposed on the model by considering a threshold value *vth* = −54 mV. Whenever this value is reached, the neuron *spikes* and the voltage resets to *v*reset = −60 mV [parameters taken from Troyer and Miller (1997)]. The spikes of the neurons are recorded as $x\_i(t) = \sum\_m \delta(t - t\_i^m)$, where $t\_i^m$ is the time of the *m*th spike of neuron *i* and δ(*x*) is the Dirac delta function.

Each model neuron receives a synaptic current through the direct connection from the other neuron *Iij*, and an external current *Ii* representing the sensory input or the effect of the surrounding networks. In the model equations, external current to

the neuron *i* comprises a constant (dc) and a stochastic component with amplitude σ. The stochastic inputs are sum of a common component ξ*c*(*t*) and an individual component ξ*i*(*t*):

$$I\_i(t) = (1 \pm \delta)I + \sigma \left[ \sqrt{1 - c} \xi\_i(t) + \sqrt{c} \xi\_c(t) \right],\tag{2}$$

where ξ*c*(*t*) and ξ*i*(*t*) are mutually independent Gaussian stochastic processes with zero mean and unit variance, $\langle \xi\_i(t)\xi\_j(t')\rangle = \delta\_{ij}\,\delta(t - t')$. The parameter *c* ∈ [0, 1] determines the correlation of the external currents and will be referred to as the input correlation. In the minimal model we use, inhomogeneity in the intrinsic activity rates is imposed by different constant currents, chosen as *I*1 = (1 + δ)*I* and *I*2 = (1 − δ)*I*, where δ is referred to as the parameter of inhomogeneity. With non-zero δ, neurons 1 and 2 are the high frequency (fast) and low frequency (slow) neurons, respectively. The currents are chosen suprathreshold (>14 mV) such that the neurons fire periodically at vanishing noise. Note that in this mean-driven regime the presence of small amplitude noise results in small jitters in firing times and a narrow distribution of interspike intervals.
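The decomposition in Equation 2 can be verified empirically: with ξ*i* and ξ*c* independent unit-variance processes, each input keeps unit variance and the pairwise correlation equals *c*. A minimal sketch (sample size, seed, and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
c, n = 0.3, 200_000

# independent unit-variance Gaussian samples: two individual, one common
xi_1, xi_2, xi_c = rng.standard_normal((3, n))

# stochastic parts of Equation 2 for the two neurons
eta_1 = np.sqrt(1 - c) * xi_1 + np.sqrt(c) * xi_c
eta_2 = np.sqrt(1 - c) * xi_2 + np.sqrt(c) * xi_c

# sample variance is close to 1 and sample correlation close to c
var_1 = eta_1.var()
corr_12 = np.corrcoef(eta_1, eta_2)[0, 1]
```

The √(1 − *c*) and √*c* weights are exactly what keeps the total variance fixed while the shared fraction of it equals *c*.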

Neurons are pulse coupled. Neuron *i* receives a pulse of strength *g*<sub>*ij*</sub> every time neuron *j* fires, so the synaptic current in Equation 1 can be written as *Iij* = *g*<sub>*ij*</sub>*xj*(*t*), where the synaptic strength *g*<sub>*ij*</sub> can be positive (excitatory) or negative (inhibitory). For convenience, we call the connections *g*<sub>21</sub> and *g*<sub>12</sub> the forward and backward connections, respectively. Although the external and synaptic inputs appear as currents, they are actually measured in units of the membrane potential (mV), since a factor of the membrane resistance has been absorbed into their definition.
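Putting Equations 1 and 2 together, the two pulse-coupled LIF neurons can be simulated with Euler-Maruyama integration. This is a sketch only: the integration step, run length, coupling matrix `g`, and the values of *I*, δ, σ, *c* below are illustrative choices, and the noise term follows one common convention for scaling σ with √dt:

```python
import numpy as np

# membrane parameters from the text; drive and noise values are illustrative
tau_m, V_rest, v_th, v_reset = 20.0, -70.0, -54.0, -60.0   # ms, mV
I, delta, sigma, c = 21.0, 0.02, 1.0, 0.5                  # suprathreshold drive
g = np.array([[0.0, 0.0],
              [1.0, 0.0]])    # g[i, j]: pulse from neuron j to neuron i (forward g_21 only)
dt, T = 0.05, 2000.0          # ms
rng = np.random.default_rng(1)

I_dc = np.array([(1 + delta) * I, (1 - delta) * I])        # inhomogeneous constant drive
v = np.full(2, V_rest)
spikes = [[], []]

for step in range(int(T / dt)):
    xi = rng.standard_normal(3)                            # xi[0], xi[1] individual, xi[2] common
    eta = np.sqrt(1 - c) * xi[:2] + np.sqrt(c) * xi[2]     # Equation 2 noise mix
    v += (dt * (V_rest - v + I_dc) + sigma * np.sqrt(dt) * eta) / tau_m
    fired = v >= v_th
    if fired.any():
        for j in np.flatnonzero(fired):
            spikes[j].append(step * dt)
        v[fired] = v_reset
        v += g @ fired.astype(float)   # pulse coupling: instantaneous kick of size g[i, j]

rate = [1000.0 * len(s) / T for s in spikes]               # firing rates in Hz
```

Because the kick is applied after the reset, a postsynaptic crossing is registered one time step later, which is the causal master-slave firing discussed in the Results.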

Co-fluctuations in the activity of neurons are measured over a range of timescales (for a review see Cohen and Kohn, 2011). Spike count correlation is usually measured over time scales from tens of milliseconds to seconds, while synchrony, that is, the almost precise alignment of spikes, is measured over the time scale of the typical width of an action potential. It has been shown that spike count correlation over small bins, of the order of one millisecond, is largely determined by the zero-lag conditional firing rate, which quantifies exact synchrony (Tchumatchenko et al., 2010a). In this study we focus on synchrony, describing spike counts and correlation coefficients in discrete bins of duration *T* = 0.5 ms. The correlation coefficient of the spike counts $n\_i(t) = \int\_t^{t+T} x\_i(s)\, ds$ is defined as the zero-lag cross-correlation between *n*1 and *n*2:

$$\rho\_T = \frac{\langle n\_1(t)n\_2(t)\rangle - \langle n\_1(t)\rangle \langle n\_2(t)\rangle}{\sqrt{\langle n\_1(t)^2\rangle - \langle n\_1(t)\rangle^2}\sqrt{\langle n\_2(t)^2\rangle - \langle n\_2(t)\rangle^2}}. \tag{3}$$

The dependence of the output correlation on the input correlation shows how correlation is transferred along neuronal layers in the nervous system (Rosenbaum and Josić, 2011a). Varying the input correlation while other parameters are fixed, we compute ρ*T*(*c*), the correlation of spike trains as a function of the input correlation. To study the sensitivity of the correlation of output spike trains to changes of the input correlation, we use the *mean correlation susceptibility* (MCS), the mean slope of ρ*T*(*c*) over a given range *c* ∈ [*c*1,*c*2]:

$$S\_T(c\_1, c\_2) = \frac{\Delta \rho\_T}{\Delta c}.\tag{4}$$

which gives the ratio of the change of the spike train correlation, Δρ*T* = ρ*T*(*c*2) − ρ*T*(*c*1), to the change of the input correlation, Δ*c* = *c*2 − *c*1. For two identical neurons with no direct connection, this value is equal to one when evaluated over the full range of input correlation, [0, 1].
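Equations 3 and 4 translate directly into code. The sketch below bins two spike-time arrays at *T* = 0.5 ms and computes the Pearson coefficient of the counts; `rho_of_c` in the MCS helper stands for any empirical estimate of the correlation transfer function (function and parameter names are my own):

```python
import numpy as np

def spike_count_corr(spikes_1, spikes_2, t_max, bin_ms=0.5):
    """Zero-lag correlation coefficient of binned spike counts (Equation 3)."""
    edges = np.arange(0.0, t_max + bin_ms, bin_ms)
    n1 = np.histogram(spikes_1, edges)[0].astype(float)
    n2 = np.histogram(spikes_2, edges)[0].astype(float)
    n1 -= n1.mean()
    n2 -= n2.mean()
    denom = np.sqrt((n1**2).sum() * (n2**2).sum())
    return float((n1 * n2).sum() / denom) if denom > 0 else 0.0

def mean_corr_susceptibility(rho_of_c, c1, c2):
    """MCS (Equation 4): mean slope of the correlation transfer function over [c1, c2]."""
    return (rho_of_c(c2) - rho_of_c(c1)) / (c2 - c1)
```

A linear transfer function ρ*T*(*c*) = *c* gives `mean_corr_susceptibility(lambda c: c, 0.0, 1.0) == 1`, consistent with the statement above for identical uncoupled neurons.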

## **3. RESULTS**

We first present the results for two uncoupled neurons. In **Figure 2A** we show the cross-correlation coefficient as a function of the mismatch between the intrinsic firing rates of the neurons, for low noise amplitude and different values of the input correlation. When there is no direct connection between the neurons, highly correlated inputs lead to a large output correlation in the case of identical neurons. Even a small mismatch decreases the output correlation considerably if the noise amplitude is small. In this case, even common noises lead to a relatively low output correlation in the presence of a slight inhomogeneity (e.g., δ = 0.01 in **Figure 2A**). For larger noise amplitudes, the output correlation is less sensitive to inhomogeneity (**Figure 2B**). The system is also less sensitive to inhomogeneity when the inputs are weakly correlated, where both homogeneous and inhomogeneous systems show a small output correlation. In **Figures 2C,D** we show the correlation transfer function. It can be seen that while the slope of the correlation transfer function decreases with mismatch for all values of input correlation, this dependence is only noticeable when inputs are highly (completely) correlated. Increasing the noise amplitude (while decreasing the constant input to avoid a change in the mean firing rate, as explained below) makes the output correlation less sensitive to inhomogeneity, yet the maximum sensitivity to mismatch is observed for highly correlated inputs (**Figure 2D**).

**FIGURE 2 | Correlation of spike trains for two uncoupled neurons. (A)** Correlation coefficient is plotted against inhomogeneity, the mismatch between the input currents of the neurons, for different values of input correlation and low noise amplitude σ = 1 mV. **(B)** The same results for a larger noise amplitude σ = 5 mV with the same mean firing rate as **(A)** (see materials and methods). **(C,D)** Correlation transfer function, which shows the dependence of the correlation of spike trains on the input correlation, plotted for different values of inhomogeneity for the same noise amplitudes as **(A)** and **(B)**. **(E)** Mean correlation susceptibility (MCS) for homogeneous and slightly inhomogeneous systems as a function of noise amplitude, showing the mean sensitivity of the output correlation to the change of input correlation over the range [0, 0.5]. **(F)** The geometric mean of the firing rates of the two neurons as σ is varied.

To show how sensitive the correlation of spike trains is to the input correlation, in **Figure 2E** we have plotted MCS (mean slope

of ρ*T*(*c*) as described in materials and methods) as a function of the amplitude of the stochastic input for two uncoupled neurons over the range *c* ∈ [0, 0.5], for homogeneous (δ = 0) and slightly inhomogeneous (δ = 0.02) systems. The system shows low sensitivity to the change in input correlation for small amplitude noises, and the sensitivity smoothly increases with noise amplitude. Also, the presence of inhomogeneity has a negligible effect on the mean correlation susceptibility: as noted above, for uncoupled neurons the effect of inhomogeneity is only significant when inputs are highly correlated, and since MCS is calculated over a range of weakly correlated inputs, it is almost insensitive to small inhomogeneity. While increasing the amplitude of the fluctuations, we have decreased the mean value of the input currents to keep the mean firing rate almost constant (∼64 Hz), as shown in **Figure 2F**. In this way the results observed in **Figure 2E** cannot be attributed to an increase in firing rate, which is known to increase the spike train correlation (De La Rocha et al., 2007; Shea-Brown et al., 2008). These results show that the correlation in small amplitude noises cannot be suitably detected by a system of uncoupled neurons, whether the neurons have equal firing rates or their firing rates are different. To investigate the effect of direct couplings we first consider a two-neuron motif with just one unidirectional excitatory synapse. In many cases this configuration is favored when the synapses change through spike timing-dependent plasticity (Song et al., 2000). We considered an excitatory forward coupling from the high frequency neuron (as the presynaptic) to the low frequency neuron (as the postsynaptic). In the absence of noise, any finite value of the forward coupling strength can lead to a zone of 1:1 synchrony, in which the dissimilar neurons fire in a causal master-slave fashion (Takahashi et al., 2009; Bayati and Valizadeh, 2012). In this causal limit the postsynaptic neuron fires immediately after receiving presynaptic stimulation (Woodman and Canavier, 2011; Wang et al., 2012). In our model, delays in communication have been ignored, so in the causal 1:1 synchrony zones the postsynaptic neuron fires just one simulation time step after the firing of the presynaptic neuron. Since the time bin over which the correlation is calculated contains several time steps (see materials and methods), such causal master-slave firing leads to ρ = 1 (gray curves in **Figure 3**).
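The claim that one-time-step master-slave firing still yields ρ = 1 follows directly from the binning: every slave spike falls in the same 0.5 ms bin as the corresponding master spike, so the binned counts are identical. A toy check (the 15 ms period and the spike-time arrays are illustrative, not taken from the simulations):

```python
import numpy as np

dt, bin_ms, t_max = 0.05, 0.5, 1000.0          # ms; dt is one simulation time step
master = np.arange(0.0, t_max, 15.0)           # periodic master spike times
slave = master + dt                            # slave fires one time step later

edges = np.arange(0.0, t_max + bin_ms, bin_ms)
n1 = np.histogram(master, edges)[0]
n2 = np.histogram(slave, edges)[0]
rho = np.corrcoef(n1, n2)[0, 1]                # identical binned counts give rho = 1
```

Any lag smaller than the bin width is invisible to this measure, which is why the causal locking registers as perfect zero-lag correlation.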

Stochastic inputs have non-trivial effects on the correlation of the spike trains of these two neurons. The output correlation is no longer a monotonically decreasing function of mismatch, and in the presence of noise a small mismatch can increase the output correlation (**Figure 3A**). With zero mismatch, in the presence of one excitatory connection from neuron 1 to neuron 2 and in the absence of noise, the only stable state is the phase locked state in which neuron 2 fires one time step after neuron 1 (Bayati and Valizadeh, 2012). In the presence of noise this state loses stability as follows: because of the initial phase difference between the two neurons after master-slave firing (even though the phase difference is very small, just one time step), they respond slightly differently even to common noises. The different responses of the two neurons lead to a cumulative phase difference, and if this phase difference results in the firing of neuron 2 before neuron 1 reaches threshold, the excitatory pulse from neuron 1 is desynchronizing and pushes the next firings of the two neurons further apart. The probability of the advancement of the phase of neuron 2 decreases in the presence of inhomogeneity (with *I*1 > *I*2), and with larger inhomogeneity it is less likely that the firing of neuron 2 (the low frequency neuron) precedes the firing of neuron 1 (the high frequency neuron). When neuron 2 fires before neuron 1 has reached the threshold, the excitatory pulse to the low frequency neuron is synchronizing, and if the voltage of neuron 2 is in the range [*vth* − *g*<sub>21</sub>, *vth*] at the time of the firing of neuron 1, the neurons maintain causal master-slave firing. Further increasing the inhomogeneity lowers the probability of the voltage of the low frequency neuron reaching the range [*vth* − *g*<sub>21</sub>, *vth*] at the time of the firing of the high frequency neuron, which results in a reduction of the spike train correlation. A similar argument can explain the other notable rise and fall of the correlation seen in the 1:2 locking zone of the noiseless system.

**FIGURE 3 |** […] are shown when the neurons are bidirectionally coupled by symmetric connections. In **(C)** and **(D)** the results are presented for a larger noise amplitude σ = 5 mV. The noise amplitude in **(A)** and **(B)** is σ = 1 mV. The gray curves correspond to the autonomous case, when no stochastic input is present.

With symmetric bidirectional couplings, maximum correlation is obtained when the neurons have the same firing rate (**Figure 3B**). When the neurons have equal firing rates (with *I*1 = *I*2) and in the absence of noise, each of the neurons can play the role of the master in a causal master-slave firing: in this case the connection from the master is synchronizing and the other connection has a desynchronizing effect (Bayati and Valizadeh, 2012). In the presence of small amplitude noise, the system can maintain causal locking by interchanging the roles of the two connections as synchronizing and desynchronizing. Suppose the firing of neuron 1 (master) is followed by the firing of neuron 2 (slave). The firing of neuron 2 exerts an excitatory pulse on neuron 1, but the phase advance of neuron 1 is relatively small because of the weak response of the LIF neuron at the beginning of its cycle (Mirollo and Strogatz, 1990). So it is probable that neuron 2 fires before neuron 1 reaches the threshold; then the excitatory pulse to neuron 1 is synchronizing, and neuron 1 fires immediately at the time it receives the pulse if its voltage is within the range [*vth* − *g*<sub>12</sub>, *vth*] (note that the argument also holds in the presence of an absolute refractory period, where the desynchronizing pulse from the slave neuron is ineffective). In the presence of inhomogeneity, it is the high frequency neuron that more probably plays the role of the master in a locked causal firing in the absence of the noise. In this case, in the presence of noise, inhomogeneity increases the probability that the voltage of the low frequency neuron takes a value outside the range [*vth* − *g*<sub>21</sub>, *vth*] at the time of the firing of the high frequency neuron, which reduces the correlation of spike trains, as can be seen in **Figure 3B** for small values of inhomogeneity. For larger values of inhomogeneity, a bump can be seen again, which belongs to the other main locking zone of the system in the absence of noise.

Intuitively, the relative amplitudes of the noise and the recurrent stimulations determine the behavior of the system, and the most notable results can be expected when these two sources are of the same order, i.e., when neither the external noises nor the recurrent stimulations are dominant. The results of **Figures 3A,B** are produced in this regime. For larger values of the noise amplitude, the qualitative behavior of the system becomes more similar to that of the uncoupled system, as shown in **Figures 3C,D**. For all partially correlated inputs, the correlation of the spike trains is independent of the inhomogeneity, and no signature of the locking zones is observed in the presence of large amplitude noises. It is only for common noise (*c* = 1) that the effect of the unidirectional direct connection can be seen in the presence of strong noise, in the region of the main locking zone.

In **Figure 4** we have plotted the correlation of spike trains as a function of the input correlation, to inspect the effect of changing the correlation of the stochastic inputs on the correlation of the spike trains for a fixed value of the synaptic strength. When the noise amplitude is not large, depending on the mismatch, different dependencies of the output correlation on the input correlation can be observed (**Figures 4A,B**). Notably, by changing the mismatch it is possible to generate, for example, a system with higher sensitivity to the input correlation in different ranges of input correlation, or a negative slope of ρ*T*(*c*). Comparing with the results of **Figure 3**, it can be deduced that high sensitivity to the input correlation is seen in the main locking zone (where the neurons are causally locked in the 1:1 zone in the absence of the noise), and a negative slope is seen between the two main locking zones. Again, as can be seen in **Figures 4C,D**, strong noises wash out the signature of the direct couplings, and ρ*T*(*c*) for large amplitude noises is qualitatively similar to that of uncoupled neurons.

The impact of direct connections on the detection of the input correlation of low amplitude noisy inputs is more apparent in a plot of MCS. In **Figure 5A** we have plotted *ST*(0, 0.5) as a function of noise amplitude for several values of synaptic strength, for unidirectionally coupled neurons and in the presence of a small mismatch in the intrinsic firing rates. As shown in **Figure 5A**, a forward monosynaptic connection (from the high frequency to the low frequency neuron) can considerably change the performance of the heterogeneous system in detecting a variable input correlation.

**FIGURE 5 |** […] rate of the neurons and ρ*T*(*c*) are shown for the corresponding curves in **(A)**, respectively. Shadings in **(C)** are a guide to the eye for a comparison […]

At an intermediate synaptic strength (*g*<sub>21</sub> = 1), MCS shows a faster growth and a higher maximum at relatively small noise amplitudes. Further increasing the synaptic strength or the noise amplitude reduces the performance of the system in the detection of the input correlation. With very large noise amplitudes, the effect of the direct connections is washed out and all the curves,

including that of the uncoupled neurons, merge together, and the MCS smoothly increases with noise amplitude. An overall increase of the correlation of the spike trains is an intuitive expectation when direct excitatory couplings are present in the system (although this can depend on the type of excitability of the neurons). But how can direct connections increase the sensitivity to changes in input correlation? In **Figure 5B** we have shown the geometric mean of the firing rates of the two neurons, √(ν<sub>1</sub>ν<sub>2</sub>), for the curves plotted in **Figure 5A**. Note that ν<sub>2</sub> may differ from the intrinsic firing rate of neuron 2 because of the presence of an excitatory afferent synapse. The results show that the increase in the mean correlation susceptibility cannot be attributed to an increase of the mean firing rate of the neurons, since then larger coupling constants would lead to more sensitivity, as they increase the mean firing rate of the system. A simple explanation can be found in **Figure 5C**: the degree of amplification of the output correlation depends on the input correlation. A suitable choice of the synaptic strength results in more amplification for higher input correlations and increases the slope of ρ*T*(*c*). Increasing the synaptic strength further decreases the sensitivity, due to the saturation of the correlation of the spike trains at the upper bound of the input correlation. In calculating MCS we have considered the range [0, 0.5] for the input correlation. Reducing the upper bound of this range increases the synaptic strength at which the correlation of the spike trains saturates, so the synaptic strength which gives the maximum sensitivity increases as the range over which the mean sensitivity is calculated is reduced.

[…] coupling constant is increased. Vertical dotted lines are plotted to show where the mean sensitivity is maximized.

The *best* synaptic strength, which maximizes the sensitivity, also depends on the mismatch between the intrinsic firing rates of the neurons, as can be implicitly deduced from the results shown in **Figures 3A,B**. In **Figure 5D** we have shown MCS as a function of the strength of the forward unidirectional coupling for three values of mismatch. The optimum value of the synaptic strength is larger when the intrinsic firing rates of the neurons are more different. Plots of the spike train correlation ρ for the upper limiting value of the input correlation, *c* = 0.5, again show that the maximum mean sensitivity in this range is obtained when the spike train correlation is not saturated at the upper bound of the range of *c* (**Figure 5E**).

All the results presented in this study have focused on the degree of zero-lag synchrony, which is measured by the zero-lag cross-correlation of the binned spike trains with a small bin size. In the presence of inhomogeneity and with asymmetric direct connections, it is possible that the maximum correlation of the spike trains appears at a non-zero lag. In **Figure 6** we have shown the cross-correlation coefficient of the spike trains as a function of the time lag for three values of noise strength and two values of the input correlation (*c* = 0 and *c* = 0.5). It can be seen that the maximum cross-correlation in all cases appears at zero time lag (more precisely, at a time lag equal to one simulation time step). The presence of other maxima is an indicator of the almost periodic firing of the neurons, which arises from the suprathreshold mean and the small amplitude stochastic fluctuations of the input current. Results in **Figure 6** are presented for one forward unidirectional coupling and sample values of inhomogeneity and synaptic strength. The results for other parameters are similar as long as the system is in the main locking zone in the absence of noise. This result reveals a drawback of the simplified models we have used: LIF neurons with pulsatile instantaneous couplings can be synchronized with zero phase lag even in the presence of a frequency mismatch, which is revealed as a maximum in correlation at zero lag (one simulation time step) when a small amplitude noise is added. Both mismatch and delay (synaptic and axonal) can be sources of phase lag when the neurons are modeled by limit cycle oscillators and more realistic models are used for the synaptic currents. Our results remain valid when such phase lags are small, of the order of the time bins used in the calculation of the correlation.

The above results were obtained for bidirectional symmetric couplings or for a single unidirectional coupling. To find the *best* configuration through which direct couplings can improve the performance of the system in the detection of a variable input correlation, we tested mutual couplings with different ratios of the forward *g*<sub>21</sub> and backward *g*<sub>12</sub> connections. While the synaptic cost (the sum of the two synaptic strengths) is kept constant, different configurations can be designed by changing the ratio of the coupling constants *r* = *g*<sub>21</sub>/*g*<sub>12</sub> (**Figures 7A,B**). In the absence of mismatch, the best configuration is the one that preserves symmetry, i.e., the best performance results from equal forward and backward couplings. In the presence of mismatch, on the other hand, an asymmetric arrangement of couplings in which the forward coupling (from the high-frequency neuron) is larger improves the performance of the system. Interestingly, asymmetric excitatory coupling in favor of the backward connection (from the low-frequency neuron) significantly decreases the sensitivity of the system, since it plays the role of a desynchronizing coupling, as discussed above.

## **4. DISCUSSION**

Both direct connections and common inputs can be sources of correlated activity of neurons in the nervous system. The effect of direct connections has been widely studied as a general problem in dynamical systems, and in particular in nervous systems (Kuramoto, 1991; Strogatz and Mirollo, 1991; Abbott and van Vreeswijk, 1993). Stochastic inputs are usually a source of temporal disorder, but spatial order can be induced in a neuronal pool when the neurons share stochastic inputs from common sources (Binder and Powers, 2001; Türker and Powers, 2001, 2004). Because of the possible cooperative/competitive effects of common inputs and direct connections, interesting results can be

**FIGURE 6 |** Cross-correlation coefficient of the spike trains as a function of time lag, for input correlation *c* = 0 and *c* = 0.5. The results are shown for three different noise amplitudes (shown above each panel) and two different values of the coupling from the high-frequency to the low-frequency neuron. The difference between the two curves at zero lag gives the MCS, which is indicated in the plots by arrows.

**FIGURE 7 | (A,B)** MCS is plotted against the sum of synaptic weights for **(A)** the homogeneous system δ = 0, and **(B)** the inhomogeneous system δ = 0.1. Different curves are plotted for different ratios *r* of forward and backward couplings, indicated in the legends. When the neurons have equal intrinsic firing rates, the symmetric configuration *r* = 1 shows the best performance with a suitable choice of synaptic strengths. In the inhomogeneous case, when the imbalance of couplings is in favor of the forward coupling (from the high-frequency to the low-frequency neuron), the sensitivity is considerably improved. When the backward coupling is larger, *r* < 1, the system performance is quite poor. As indicated by the axis labels, MCS is calculated over the range [0, 0.5] of the input correlation.

expected when they are concurrently present in a system (Ostojic et al., 2009; Ly and Ermentrout, 2010; Tabareau et al., 2010; Zilli and Hasselmo, 2010; Rosenbaum and Josić, 2011a; Ly et al., 2012). In this study we have numerically inspected the effect of correlated stochastic inputs on the correlation of the spike trains of two coupled LIF neurons. We have mainly focused on the correlation of the spike trains when correlated small-amplitude noises were imposed on a system of two coupled neurons that fire regularly and synchronously in the absence of noise. We have shown that such a system is highly sensitive to changes in the input correlation, and can therefore be a suitable detector of the correlation in small-amplitude noises. To study the system in a more general framework, we have considered neurons with different intrinsic firing rates. We have assumed that the neurons have equal membrane time constants, and inhomogeneity is imposed on the system by feeding the neurons with unequal suprathreshold constant currents. The inhomogeneity, determined by the difference in the mean input currents, along with the synaptic strengths, are the key parameters that specify the response of the system to stochastic inputs.

While for uncoupled neurons the output correlation is a monotonically decreasing function of inhomogeneity, for coupled neurons with low noise amplitudes the spike train correlation can, in some ranges, be increased by increasing the inhomogeneity. This result holds for sufficiently small noise amplitudes, and the system inherits this property from the *n*:*m* locking zones of the autonomous system in the absence of stochastic input. This identifies inhomogeneity as an important parameter with a non-trivial impact on the correlation of spike trains in coupled systems.

Another feature of the system is that the two sources of correlation, correlated inputs and direct excitatory connections, do not necessarily cooperate in the formation of correlated spike trains. For uncoupled neurons the output correlation is a monotonically increasing function of the input correlation, and for weakly correlated inputs the slope decreases with decreasing noise amplitude (De La Rocha et al., 2007; Shea-Brown et al., 2008) and with increasing mismatch. With different choices of the synaptic strengths and the inhomogeneity, it is possible to change the functional form of the correlation transfer (the dependence of the output correlation on the input correlation) and design a system with a different sensitivity to the input correlation. In particular, it is possible to design a system with a negative mean slope of the correlation transfer, a case in which common noises have a destructive effect on the correlation of spike trains, or a system with maximum sensitivity to changes in the input correlation over a given range, obtained by maximizing the slope of the correlation transfer. The latter suggests that direct connections can increase the sensitivity of the system to the correlation of the neurons' stochastic inputs, especially when the noises have small amplitude. We have further shown that for a homogeneous system (where the neurons have equal intrinsic firing rates), the configuration of couplings that maximizes the mean sensitivity of the system over a given range is the symmetric configuration with equal coupling constants. In the presence of inhomogeneity, on the other hand, an asymmetric configuration in which the synaptic constant from the high-frequency neuron to the low-frequency neuron is larger improves the sensitivity. In either case, there is an optimum value of the synaptic constant which maximizes the sensitivity.

Competitive learning through conventional spike timing-dependent plasticity (STDP) in feed-forward networks leads to the potentiation of the synapses which convey correlated inputs and the depression of those with uncorrelated activity (Babadi and Abbott, 2010). How does STDP change the lateral connections transverse to the path of data flow? It has been shown that in recurrent networks asymmetric connections arise through STDP, and in the presence of inhomogeneity such asymmetric change favors the connection from the high-frequency to the low-frequency neuron (Takahashi et al., 2009; Bayati and Valizadeh, 2012). Our results show that asymmetric connections can enhance the performance of inhomogeneous systems in the detection of input correlation, and, interestingly, such an optimum configuration of connections emerges through STDP (with an asymmetric profile) in inhomogeneous neuronal pools (Bayati and Valizadeh, 2012).

The type of neuronal excitability can also affect the correlation transfer in neuronal pools (Galán et al., 2008; Abouzeid and Ermentrout, 2009; Barreiro et al., 2010). The phase resetting curve (PRC) characterizes how small perturbations influence an oscillator's subsequent timing or phase. It has recently been shown that uncoupled type-II neurons, with both negative and positive regions in their PRC, transfer correlations more faithfully when the correlation is calculated over short time bins (Abouzeid and Ermentrout, 2011). Since the phase of a LIF neuron always advances in response to external pulses, the results for LIF neurons are likely to apply to type-I neurons.

The correlation of spike trains over such small time bins as we have used, *T* = 0.5 ms, is a measure of the (almost) precise alignment of the action potentials. Similar results were obtained when we repeated the experiments with *T* = 1 ms, but we expect qualitatively different results when the correlation of the spike counts is measured over time scales comparable to, or larger than, the mean inter-spike interval. Less sensitivity to the inhomogeneity is expected when the correlation is evaluated over large time bins, but the effect of direct couplings warrants further study to determine whether correlation in small-amplitude stochastic inputs can be revealed in the co-variation of spike trains of coupled neurons over large time scales.

## **ACKNOWLEDGMENTS**

The authors gratefully acknowledge Bahman Farnoudi for proofreading the manuscript, and the reviewers, Germán Mato and Zachary P. Kilpatrick, for their careful reading of the manuscript and valuable comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 24 July 2013; published online: 14 August 2013. Citation: Bolhasani E, Azizi Y and Valizadeh A (2013) Direct connections assist neurons to detect correlation in small amplitude noises. Front. Comput. Neurosci. 7:108. doi: 10.3389/fncom.2013.00108*

*Copyright © 2013 Bolhasani, Azizi and Valizadeh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A generative spike train model with time-structured higher order correlations

#### *James Trousdale<sup>1</sup>\*, Yu Hu<sup>2</sup>, Eric Shea-Brown<sup>2,3</sup> and Krešimir Josić<sup>1,4</sup>*

*<sup>1</sup> Department of Mathematics, University of Houston, Houston, TX, USA*

*<sup>2</sup> Department of Applied Mathematics, University of Washington, Seattle, WA, USA*

*<sup>3</sup> Program in Neurobiology and Behavior, University of Washington, Seattle, WA, USA*

*<sup>4</sup> Department of Biology and Biochemistry, University of Houston, Houston, TX, USA*

#### *Edited by:*

*Robert Rosenbaum, University of Pittsburgh, USA*

#### *Reviewed by:*

*John A. Hertz, Niels Bohr Institute, Denmark Sonja Grün, Research Center Juelich, Germany*

#### *\*Correspondence:*

*James Trousdale, Department of Mathematics, University of Houston, 641 PGH Building, Houston, TX 77204-3008, USA e-mail: jrtrousd@math.uh.edu*

Emerging technologies are revealing the spiking activity in ever larger neural ensembles. Frequently, this spiking is far from independent, with correlations in the spike times of different cells. Understanding how such correlations impact the dynamics and function of neural ensembles remains an important open problem. Here we describe a new, generative model for correlated spike trains that can exhibit many of the features observed in data. Extending prior work in mathematical finance, this *generalized thinning and shift* (GTaS) model creates marginally Poisson spike trains with diverse temporal correlation structures. We give several examples which highlight the model's flexibility and utility. For instance, we use it to examine how a neural network responds to highly structured patterns of inputs. We then show that the GTaS model is analytically tractable, and derive cumulant densities of all orders in terms of model parameters. The GTaS framework can therefore be an important tool in the experimental and theoretical exploration of neural dynamics.

**Keywords: correlations, spiking neurons, neuronal networks, cumulant, neuronal modeling, neuronal network model, point processes**

## **1. INTRODUCTION**

Recordings across the brain suggest that neural populations spike collectively—the statistics of their activity as a group are distinct from that expected in assembling the spikes from one cell at a time (Bair et al., 2001; Salinas and Sejnowski, 2001; Harris, 2005; Averbeck et al., 2006; Schneidman et al., 2006; Shlens et al., 2006; Pillow et al., 2008; Ganmor et al., 2011; Bathellier et al., 2012; Hansen et al., 2012; Luczak et al., 2013). Advances in electrode and imaging technology allow us to explore the dynamics of neural populations by simultaneously recording the activity of hundreds of cells. This is revealing patterns of collective spiking that extend across multiple cells. The underlying structure is intriguing: For example, higher-order interactions among cell groups have been observed widely (Amari et al., 2003; Schneidman et al., 2006; Shlens et al., 2006, 2009; Ohiorhenuan et al., 2010; Ganmor et al., 2011; Vasquez et al., 2012; Luczak et al., 2013). A number of recent studies point to mechanisms that generate such higher-order correlations from common input processes, including unobserved neurons. This suggests that, in a given recording or given set of neurons projecting downstream, higher-order correlations may be quite ubiquitous (Barreiro et al., 2010; Macke et al., 2011; Yu et al., 2011; Köster et al., 2013). Moreover, these *higher-order correlations* may impact the firing statistics of downstream neurons (Kuhn et al., 2003), the information capacity of their output (Ganmor et al., 2011; Cain and Shea-Brown, 2013; Montani et al., 2013), and could be essential in learning through spike-time dependent synaptic plasticity (Pfister and Gerstner, 2006; Gjorgjieva et al., 2011).

What exactly is the impact of such collective spiking on the encoding and transmission of information in the brain? This question has been studied extensively, but much remains unknown. Results to date show that the answers will be varied and rich. Patterned spiking can impact responses at the level of single cells (Salinas and Sejnowski, 2001; Kuhn et al., 2003; Xu et al., 2012) and neural populations (Amjad et al., 1997; Tetzlaff et al., 2003; Rosenbaum et al., 2010, 2011). Neurons with even the simplest of non-linearities can be highly sensitive to correlations in their inputs. Moreover, such non-linearities are sufficient to accurately decode signals from the input to correlated neural populations (Shamir and Sompolinsky, 2004).

An essential tool in understanding the impact of collective spiking is the ability to generate artificial spike trains with a predetermined structure across cells and across time (Brette, 2009; Gutnisky and Josić, 2009; Krumin and Shoham, 2009; Macke et al., 2009). Such synthetic spike trains are the grist for testing hypotheses about spatiotemporal patterns in coding and dynamics. In experimental studies, such spike trains can be used to provide structured stimulation of single cells across their dendritic trees via glutamate uncaging (Gasparini and Magee, 2006; Reddy et al., 2008; Branco et al., 2010; Branco and Häusser, 2011). In addition, entire populations of neurons can be activated via optical stimulation of microbial opsins (Han and Boyden, 2007; Chow et al., 2010). Computationally, they are used to examine the response of non-linear models of downstream cells (Carr et al., 1998; Salinas and Sejnowski, 2001; Kuhn et al., 2003).

Therefore, much effort has been devoted to developing statistical models of population activity. A number of flexible, yet tractable probabilistic models of joint neuronal activity have been proposed. Pairwise correlations are the most common type of interactions obtained from multi-unit recordings. Therefore, many earlier models were designed to generate samples of neural activity patterns with predetermined first and second order statistics (Brette, 2009; Gutnisky and Josić, 2009; Krumin and Shoham, 2009; Macke et al., 2009). In these models, higher-order correlations are not explicitly and separately controlled.

A number of different models have been used to analyze higher-order interactions. However, most of these models assume that interactions between different cells are instantaneous (or near-instantaneous) (Kuhn et al., 2003; Johnson and Goodman, 2009; Staude et al., 2010; Shimazaki et al., 2012). A notable exception is the work of Bäuerle and Grübel (2005), which developed such methods for use in financial applications. In these previous efforts, correlations at all orders were characterized by the increase, or decrease, in the probability that groups of cells spike together at the same time, or have a common temporal correlation structure regardless of the group.

The aim of the present work is to provide a statistical method for generating spike trains with more general correlation structures across cells and time. Specifically, we allow distinct temporal structure for correlations at pairwise, triplet, and all higher orders, and do so separately for different groups of cells in the neural population. Our aim is to describe a model that can be applied in neuroscience, and can potentially be fit to emerging datasets.

A sample realization of a multivariate generalized thinning and shift (GTaS) process is shown in **Figure 1**. The multivariate spike train consists of six marginally Poisson processes. Each event was either uncorrelated with all other events across the population, or correlated in time with an event in all other spike trains. This model was configured to exhibit activity that cascades through a sequence of neurons. Specifically, neurons with a larger index tend to fire later in a population-wide event (similar to a synfire chain (Abeles, 1991), but with variable timing of spikes within the cascade). In **Figure 1B**, we plot the "population cross-cumulant density" for three chosen neurons—the summed activity of the population triggered by a spike in a chosen cell. The center of mass of this function measures the average latency by which spikes of the neuron in question precede those of the rest of the population (Luczak et al., 2013). Finally, **Figure 1C** shows the third-order cross-cumulant density for the three neurons. The triangular support of this function is a reflection of the synfire-like cascade structure of the spiking shown in the raster plot of panel **(A)**: when firing events are correlated between trains, they tend to proceed in order of increasing index. We demonstrate the impact of such structured activity on a downstream network in section 2.2.3.

## **2. RESULTS**

Our aim is to describe a flexible multivariate point process capable of generating a range of higher-order correlation structures. To do so, we extend the *TaS* (thinning and shift) model of temporally and spatially correlated, marginally Poisson counting processes (Bäuerle and Grübel, 2005). The TaS model generalizes the SIP and MIP models (Kuhn et al., 2003), which have been used in theoretical neuroscience (Tetzlaff et al., 2008; Rosenbaum et al., 2010; Cain and Shea-Brown, 2013); the TaS model itself, however, has not been used as widely. The original TaS model is too rigid to generate a number of interesting activity patterns observed in multi-unit recordings (Ikegaya et al., 2004; Luczak et al., 2007, 2013). We therefore developed the *GTaS* model, which allows for a more diverse temporal correlation structure.

We begin by describing the algorithm for sampling from the GTaS model. This constructive approach provides an intuitive understanding of the model's properties. We then present a pair of examples, the first of which highlights the utility of the

**FIGURE 1 | (A)** Raster plot of event times for an example multivariate Poisson process **X** = (*X*<sub>1</sub>,..., *X*<sub>6</sub>) generated using the methods presented below. This model exhibits independent marginal events (blue) and population-level, chain-like events (red). **(B)** Some second order population cumulant densities (i.e., second order correlation between individual unit activities and population activity) for this model (Luczak et al., 2013). Greater mass to the right (resp. left) of τ = 0 indicates that the cell tends to lead (resp. follow) in pairwise-correlated events. **(C)** Third-order cross-cumulant density for processes *X*<sub>1</sub>, *X*<sub>2</sub>, *X*<sub>3</sub>. The quantity κ<sup>**X**</sup><sub>123</sub>(τ<sub>1</sub>, τ<sub>2</sub>) yields the probability of observing spikes in cells 2 and 3 at offsets τ<sub>1</sub>, τ<sub>2</sub> from a spike in cell 1, respectively, in excess of what would be predicted from the first and second order cumulant structure. All quantities are precisely defined in the Methods. Note: system parameters necessary to reproduce results are given in the Appendix for all figures.

GTaS framework. The second example demonstrates how sample point processes from the GTaS model can be used to study population dynamics. Next, we present the analysis which yields the explicit forms for the cross-cumulant densities derived in the context of the examples. We do so by first establishing a useful distributional representation for the GTaS process, paralleling Bäuerle and Grübel (2005). Using this representation, we derive cross-cumulants of a GTaS counting process, as well as explicit expressions for the cross-cumulant densities. After explaining the derivation at lower orders, we present a theorem which describes cross-cumulant densities at all orders.

## **2.1. GTaS MODEL SIMULATION**

The GTaS model is parameterized first by a rate λ which determines the intensity of a "mother process"—a Poisson process on R. The events of the mother process are marked, and the markings determine how each event is distributed among a collection of *N* daughter processes. The daughter processes are indexed by the set D = {1,..., *N*}, and the set of possible markings is the power set 2<sup>D</sup>, the set of all subsets of D. We define a probability distribution *p* = (*p<sub>D</sub>*)<sub>*D* ⊂ D</sub>, assigning a probability to each possible marking *D*. As we will see, *p<sub>D</sub>* determines the probability of a joint event in all daughter processes with indices in the set *D*. Finally, to each marking *D* we assign a probability distribution *Q<sub>D</sub>*, giving a family of shift (jitter) distributions (*Q<sub>D</sub>*)<sub>*D* ⊂ D</sub>. Each *Q<sub>D</sub>* is a distribution over R<sup>*N*</sup>.

The rate λ, the distribution *p* over the markings, and the family of jitter distributions (*Q<sub>D</sub>*)<sub>*D* ⊂ D</sub> define a vector **X** = (*X*<sub>1</sub>,..., *X<sub>N</sub>*) of dependent daughter Poisson processes, described by the following algorithm, which yields a single realization (see **Figure 2**):

1. Simulate the mother Poisson process of rate λ on R, generating a sequence of event times {*t<sup>j</sup>*} (**Figure 2A**).
2. For each event time *t<sup>j</sup>*, draw a marking *D<sup>j</sup>* ⊂ D according to the distribution *p<sub>D</sub>*, and project the event at time *t<sup>j</sup>* to the daughter processes with indices in *D<sup>j</sup>* (**Figure 2B**).
3. For each pair (*t<sup>j</sup>*, *D<sup>j</sup>*) generated in the previous two steps, draw **Y**<sup>*j*</sup> from *Q<sub>D<sup>j</sup></sub>*, and shift the event times in the daughter processes by the corresponding values *Y<sup>j</sup><sub>i</sub>* (**Figure 2C**).

Hence, copies of each point of the mother process are placed into the daughter processes after a shift in time. A primary difference between the GTaS model and the TaS model presented in Bäuerle and Grübel (2005) is the dependence of the shift distributions *Q<sub>D</sub>* on the chosen marking. This allows for greater flexibility in setting the temporal cumulant structure.
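The three steps above can be sketched as follows; this is a minimal illustration under our own naming conventions (`markings`, `probs`, and `shift_samplers` are hypothetical parameter names, and daughter indices are 0-based), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gtas(rate, duration, markings, probs, shift_samplers, n_daughters):
    """One realization of a GTaS process on [0, duration].

    markings       : list of tuples of daughter indices (the subsets D)
    probs          : marking probabilities p_D, summing to 1
    shift_samplers : per marking, a callable returning a length-N shift vector Y
    Returns a list of n_daughters arrays of event times.
    """
    # Step 1: mother Poisson process of intensity `rate`.
    n_events = rng.poisson(rate * duration)
    mother = np.sort(rng.uniform(0.0, duration, n_events))

    daughters = [[] for _ in range(n_daughters)]
    for t in mother:
        # Step 2: draw a marking D_j according to the distribution p.
        j = rng.choice(len(markings), p=probs)
        # Step 3: copy the event into each marked daughter, jittered by Y_i.
        y = shift_samplers[j]()
        for i in markings[j]:
            daughters[i].append(t + y[i])
    return [np.sort(np.array(d)) for d in daughters]
```

For example, delta shifts (`lambda: np.zeros(N)`) recover instantaneous joint events, while marking-dependent samplers produce the temporally structured correlations discussed below.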

## **2.2. EXAMPLES**

## *2.2.1. Relation to SIP/MIP processes*

Two simple models of correlated, jointly Poisson processes were defined in Kuhn et al. (2003). The resulting spike trains exhibit spatial correlations, but only instantaneous temporal dependencies. Each model was constructed by starting with independent Poisson processes, and applying one of two elementary point process operations: superposition and thinning (Cox and Isham, 1980). We show that both models are special cases of the GTaS model.

In the *single interaction process* (SIP), each marginal process *Xi* is obtained by merging an independent Poisson process with a common, global Poisson process. That is,

$$X\_i(\cdot) = Z\_i(\cdot) + Z\_c(\cdot), \quad i = 1, \ldots, N,$$

where *Z<sub>c</sub>* and each *Z<sub>i</sub>* are independent Poisson counting processes on R with rates λ<sub>*c*</sub> and λ<sub>*i*</sub>, respectively. An SIP model is equivalent to a GTaS model with mother process rate λ = λ<sub>*c*</sub> + ∑<sup>*N*</sup><sub>*i* = 1</sub> λ<sub>*i*</sub>, and marking probabilities

$$p\_D = \begin{cases} \frac{\lambda\_i}{\lambda} & D = \{i\} \\ \frac{\lambda\_c}{\lambda} & D = \mathbb{D} \\ 0 & \text{otherwise.} \end{cases}$$

Note that if λ<sub>*c*</sub> = 0, each spike will be assigned to a different process *X<sub>i</sub>*, resulting in *N* independent Poisson processes. Lastly, each shift distribution is equal to a delta distribution at zero in every coordinate (i.e., *q<sub>D</sub>*(*y*<sub>1</sub>,..., *y<sub>N</sub>*) ≡ ∏<sup>*N*</sup><sub>*i* = 1</sub> δ(*y<sub>i</sub>*) for every *D* ⊂ D). Thus, all joint cumulants (among distinct marginal processes) of orders 2 through *N* are delta functions of equal magnitude λ*p*<sub>D</sub>.
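A hedged sketch of the SIP construction (the rates and duration are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sip(rates, rate_c, duration):
    """Single interaction process: merge each independent train Z_i with
    one global train Z_c, so shared events appear at identical times."""
    z_c = rng.uniform(0.0, duration, rng.poisson(rate_c * duration))
    trains = []
    for lam in rates:
        z_i = rng.uniform(0.0, duration, rng.poisson(lam * duration))
        trains.append(np.sort(np.concatenate([z_i, z_c])))
    return trains
```

Every pair of trains shares the events of `z_c` exactly, which is the source of the delta-function cumulants at all orders.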

The *multiple interaction process* (MIP) consists of *N* Poisson processes obtained from a common mother process with rate λ<sub>*m*</sub> by *thinning* (Cox and Isham, 1980). The *i*th daughter process is formed by independent (across coordinates and events) deletion of events from the mother process with probability (1 − ε). Hence, an event is common to *k* daughter processes with probability ε<sup>*k*</sup>. Therefore, if we take the perspective of retaining, rather than deleting, events, the MIP model is equivalent to a GTaS process with λ = λ<sub>*m*</sub> and *p<sub>D</sub>* = ε<sup>|*D*|</sup>(1 − ε)<sup>*N* − |*D*|</sup>. As in the SIP case, the shift distributions are singular in every coordinate. Below, we present a general result (Theorem 1.1) which immediately yields as a corollary that the MIP model has cross-cumulant functions which are δ functions in all dimensions, scaled by ε<sup>*k*</sup>, where *k* is the order of the cross-cumulant.
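The MIP construction can be sketched in the same spirit; here `eps` denotes the probability that a daughter process retains a given mother event (our own variable name):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_mip(rate_m, eps, n, duration):
    """Multiple interaction process: each of n daughters keeps each event
    of a common mother process independently with probability eps."""
    mother = np.sort(rng.uniform(0.0, duration,
                                 rng.poisson(rate_m * duration)))
    keep = rng.random((n, mother.size)) < eps
    return [mother[keep[i]] for i in range(n)]
```

An event then survives in any fixed group of *k* daughters with probability `eps**k`, matching the marking probabilities given above.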

#### *2.2.2. Generation of synfire-like cascade activity*

The GTaS framework provides a simple, tractable way of generating cascading activity where cells fire in a preferred order across the population—as in a synfire chain, but (in general) with variable timing of spikes (Abeles, 1991; Abeles and Prut, 1996; Aertsen et al., 1996; Aviel et al., 2002; Ikegaya et al., 2004). More generally, it can be used to simulate the activity of *cell assemblies* (Hebb, 1949; Harris, 2005; Buzsáki, 2010; Bathellier et al., 2012), in which the firing of groups of neurons is likely to follow a particular order.

In the Introduction, we briefly presented one example in which the GTaS framework was used to generate synfire-like cascade activity (see **Figure 1**), and we present another in **Figure 3**. In what follows, we will present the explicit definition of this second model, and then derive explicit expressions for its cumulant structure. Our aim is to illustrate the diverse range of possible correlation structures that can be generated using the GTaS model.

Consider an *N*-dimensional counting process **X** = (*X*<sub>1</sub>,..., *X<sub>N</sub>*) of GTaS type, where *N* ≥ 4. We restrict the marking distribution so that *p<sub>D</sub>* ≡ 0 unless |*D*| ≤ 2 or *D* = D. That is, events are assigned either to a single daughter process, to a pair, or to all of them. For sets *D* with |*D*| = 2, we set *Q<sub>D</sub>* ∼ *N*(0, Σ), a Gaussian distribution with zero mean and some specified covariance Σ. The choice of the precise pairwise shift distributions is not important. Shifts of events attributed to a single process have no effect on the statistics of the multivariate process; this will become clear in section 2.3, when we show that a GTaS process is equivalent in distribution to a sum of independent Poisson processes. In that context, the shifts of marginal events are applied to the event times of only one of these Poisson processes, which does not impact its statistics.

It remains to define the jitter distribution for events common to the entire population of daughter processes, i.e., events marked by D. We will show that we can generate cascading activity, and analytically describe the resulting correlation structure. We will say that a random variable *T* follows the exponential distribution Exp(α) if it has probability density

$$f(t|\alpha) = \alpha e^{-\alpha t} \Theta(t),$$

where Θ(*t*) is the Heaviside step function. We generate random vectors **Y** ∼ *Q*<sub>D</sub> according to the following rule, for each *i* = 1,..., *N*:

1. Generate independent random variables *T<sub>i</sub>* ∼ Exp(α<sub>*i*</sub>), where α<sub>*i*</sub> > 0.
2. Set *Y<sub>i</sub>* = ∑<sup>*i*</sup><sub>*j* = 1</sub> *T<sub>j</sub>*.
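The two steps above amount to taking cumulative sums of independent exponential variables; a minimal sketch (with `alphas` as our illustrative parameter name):

```python
import numpy as np

rng = np.random.default_rng(3)

def cascade_shift(alphas):
    """Shift vector for a population-wide event: Y_i = T_1 + ... + T_i with
    independent T_i ~ Exp(alpha_i), so that Y_N >= ... >= Y_1 >= 0."""
    # numpy's exponential sampler is parameterized by the scale 1/alpha.
    t = rng.exponential(1.0 / np.asarray(alphas, dtype=float))
    return np.cumsum(t)
```

Each call yields one jitter vector **Y** for an event marked by the full set D.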

In particular, note that these shift times satisfy *Y<sub>N</sub>* ≥ ··· ≥ *Y*<sub>2</sub> ≥ *Y*<sub>1</sub> ≥ 0, indicating the chain-like structure of these joint events. From the definition of the model and our general result (Theorem 1.1) below, we immediately have that κ<sup>**X**</sup><sub>*ij*</sub>(τ), the second

**FIGURE 3 | An example of a six dimensional GTaS model exhibiting synfire-like cascading firing patterns. (A)** A raster-plot of spiking activity over a 100 ms window. Blue spikes indicate either marginal or pairwise events (i.e., events corresponding to markings for sets *D* ⊂ D with |*D*| ≤ 2). Red spikes indicate population-wide events which have shift-times given by cumulative sums of independent exponentials, as described in the text. Arrows indicate the location of the first spike in the cascade. **(B)** A

second-order cross-cumulant κ<sup>**X**</sup><sub>13</sub> (black line) of this model is composed of contributions from two sources: correlations due to second-order markings, which have Gaussian shifts (*c*<sup>2</sup><sub>13</sub>, dashed red line), and correlations due to the occurrence of population-wide events (*c*<sup>*N*</sup><sub>13</sub>, dashed blue line). **(C)** Density plots of the third-order cross-cumulant density for triplets **(i)** (1, 2, 3) and **(ii)** (1, 2, 4)—the latter is given explicitly in Equation (6). System parameters are given in the Appendix.

order cross-cumulant density for the process (*i*, *j*), is given by

$$\kappa\_{ij}^{\mathbf{X}}(\tau) = c\_{ij}^2(\tau) + c\_{ij}^N(\tau),\tag{1}$$

where

$$c\_{ij}^2(\tau) = \lambda p\_{\{i,j\}} \int q\_{\{i,j\}}^{\{i,j\}}(t, t + \tau) \, dt,$$

$$c\_{ij}^N(\tau) = \lambda p\_{\mathbb{D}} \int q\_{\mathbb{D}}^{\{i,j\}}(t, t + \tau) \, dt \tag{2}$$

define the contributions to the second order cross-cumulant density from the second-order, Gaussian-jittered events and from the population-level events, respectively. Correlations between spike trains in this case therefore reflect distinct contributions from second order and higher order events. The functions *q*<sup>*D*′</sup><sub>*D*</sub> denote the densities associated with the distribution *Q<sub>D</sub>*, projected onto the dimensions of *D*′. All statistical quantities are precisely defined in the Methods.

By exploiting the hierarchical construction of the shift times, we can find an expression for the joint density $q_{\mathbb{D}}$ needed to explicitly evaluate Equation (1). For a general $N$-dimensional distribution,

$$f(y_1, \ldots, y_N) = f(y_N \mid y_1, \ldots, y_{N-1})\, f(y_{N-1} \mid y_1, \ldots, y_{N-2}) \cdots f(y_2 \mid y_1)\, f(y_1). \tag{3}$$

Since $Y_1 \sim \mathrm{Exp}(\alpha_1)$, we have $f(y_1) = \alpha_1 \exp[-\alpha_1 y_1]\,\Theta(y_1)$, where $\Theta(y)$ is the Heaviside step function. Further, as $(Y_i - Y_{i-1}) \mid (Y_1, \ldots, Y_{i-1}) \sim \mathrm{Exp}(\alpha_i)$ for $i \ge 2$, the conditional densities of the $y_i$'s take the form

$$f(y_i \mid y_1, \ldots, y_{i-1}) = f(y_i \mid y_{i-1}) = \alpha_i \exp\left[-\alpha_i(y_i - y_{i-1})\right]\Theta(y_i - y_{i-1}), \quad i \ge 2.$$

Substituting this into the identity in Equation (3), we have

$$q_{\mathbb{D}}(y_1, \ldots, y_N) = \begin{cases} \alpha_1 \exp\left[-\alpha_1 y_1\right] \displaystyle\prod_{i=2}^{N} \alpha_i \exp\left[-\alpha_i(y_i - y_{i-1})\right] & y_N \ge \cdots \ge y_2 \ge y_1 \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Using Theorem 1.1 (Equation A8), we obtain the $N$th order cross-cumulant density (see the Methods),

$$\kappa^{\mathbf{X}}_{1 \cdots N}(\tau_1, \ldots, \tau_{N-1}) = \lambda p_{\mathbb{D}} \int q_{\mathbb{D}}(t,\ t + \tau_1, \ldots, t + \tau_{N-1})\, dt = \lambda p_{\mathbb{D}} \cdot \begin{cases} \displaystyle\prod_{i=1}^{N-1} \alpha_{i+1} \exp\left[-\alpha_{i+1}(\tau_i - \tau_{i-1})\right] & \tau_i \ge \tau_{i-1},\ i = 1, \ldots, N-1 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

where, for notational convenience, we define $\tau_0 = 0$. A raster plot of a realization of this model is shown in **Figure 3A**. We note that the cross-cumulant densities of arbitrary subcollections of the counting processes **X** can be obtained by finding the appropriate marginalization of $q_{\mathbb{D}}$ via integration of Equation (4). In the case that common distributions are used to define the shifts, symbolic calculation environments (e.g., Mathematica) can quickly yield explicit formulas for cross-cumulant densities. Mathematica notebooks for **Figure 1** are available upon request.
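To make the construction concrete, here is a minimal Python sketch of a cascading GTaS realization: a mother Poisson process, a marking step that is either marginal or population-wide (pairwise markings are omitted for brevity), and cascade shifts built as cumulative sums of independent exponentials. The function name and parameter values are illustrative, not taken from the Appendix.

```python
import numpy as np

def sample_cascading_gtas(lam, T, p_pop, alphas, seed=0):
    """Minimal cascading GTaS sampler (marginal + population-wide markings only).

    lam    : rate of the mother Poisson process
    p_pop  : probability a mother event is marked for the whole population
    alphas : rates (alpha_1, ..., alpha_N); the shift of process i in a
             population event is Y_i = sum_{k<=i} Exp(alpha_k), so the
             cascade sweeps through the population in order.
    Returns a list of N sorted arrays of spike times.
    """
    rng = np.random.default_rng(seed)
    N = len(alphas)
    mother = rng.uniform(0.0, T, rng.poisson(lam * T))
    spikes = [[] for _ in range(N)]
    for t in mother:
        if rng.random() < p_pop:
            # population-wide event: chain of exponential increments
            shifts = np.cumsum(rng.exponential(1.0 / np.asarray(alphas)))
            for i in range(N):
                spikes[i].append(t + shifts[i])
        else:
            # otherwise attribute the event to a single random process
            spikes[rng.integers(N)].append(t)
    return [np.sort(np.array(s)) for s in spikes]

trains = sample_cascading_gtas(lam=20.0, T=200.0, p_pop=0.3,
                               alphas=[200.0] * 6)
```

Each marginal train is Poisson; the chain structure is visible only in the joint statistics, exactly as in **Figure 3A**.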

As a particular example, we consider the cross-cumulant density of the marginal processes *X*1, *X*3. Using Equations (2, 4), we find

$$
c_{13}^{N}(\tau) = \lambda p_{\mathbb{D}}\, \Theta(\tau) \cdot \begin{cases}
\frac{\alpha_2 \alpha_3}{\alpha_3 - \alpha_2} \left\{ \exp\left[-\alpha_2 \tau\right] - \exp\left[-\alpha_3 \tau\right] \right\} & \alpha_2 \neq \alpha_3 \\
\alpha_2 \alpha_3 \tau \exp\left[-\alpha_2 \tau\right] & \alpha_2 = \alpha_3
\end{cases}.
$$

An expression for $c^2_{13}(\tau)$ may be obtained similarly using Equation (2) and recalling that $Q_{\{i,j\}} \equiv \mathcal{N}(\mathbf{0}, \Sigma)$ for all $i, j$. In **Figure 3B**, we plot these contributions, as well as the full covariance density.
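As a sanity check on the expression for $c^N_{13}$, note that $Y_3 - Y_1 = (Y_2 - Y_1) + (Y_3 - Y_2)$ is a sum of independent exponentials, so its density is exactly the hypoexponential bracket above. A small Monte Carlo sketch (with illustrative rates, not the Appendix values) confirms this:

```python
import numpy as np

# Y3 - Y1 is a sum of independent Exp(alpha2) and Exp(alpha3) variates;
# compare an empirical histogram with the hypoexponential density.
rng = np.random.default_rng(1)
a2, a3 = 100.0, 150.0
samples = rng.exponential(1 / a2, 100_000) + rng.exponential(1 / a3, 100_000)

def hypoexp_pdf(tau, a2, a3):
    """Density of Exp(a2) + Exp(a3) for a2 != a3 (the alpha2 != alpha3 case)."""
    return a2 * a3 / (a3 - a2) * (np.exp(-a2 * tau) - np.exp(-a3 * tau))

edges = np.linspace(0.0, 0.05, 26)
width = edges[1] - edges[0]
counts, _ = np.histogram(samples, bins=edges)
hist = counts / (samples.size * width)          # empirical density estimate
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - hypoexp_pdf(centers, a2, a3)))
```

The mean of the samples should also match $1/\alpha_2 + 1/\alpha_3$, the mean delay between the first and third spikes of a cascade.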

Similar calculations at third order yield, as an example,

$$\kappa^{\mathbf{X}}_{124}(\tau_1, \tau_2) = \lambda p_{\mathbb{D}} \cdot \begin{cases} \frac{\alpha_2 \alpha_3 \alpha_4}{\alpha_4 - \alpha_3} \exp\left[-\alpha_2 \tau_1\right] \left\{ \exp\left[-\alpha_3(\tau_2 - \tau_1)\right] - \exp\left[-\alpha_4(\tau_2 - \tau_1)\right] \right\} & \alpha_3 \neq \alpha_4 \\ \alpha_2 \alpha_3 \alpha_4 (\tau_2 - \tau_1) \exp\left[-\alpha_2 \tau_1 - \alpha_3(\tau_2 - \tau_1)\right] & \alpha_3 = \alpha_4 \end{cases} \tag{6}$$

where the cross-cumulant density $\kappa^{\mathbf{X}}_{124}(\tau_1, \tau_2)$ is supported only on $\tau_2 \ge \tau_1 \ge 0$. Plots of the third-order cross-cumulants for triplets (1, 2, 3) and (1, 2, 4) in this model are shown in **Figure 3C**. Note that, for the specified parameters, the conditional distribution of $Y_4$ (the shift applied to the events of $X_4$ in a joint population event) given $Y_2$ follows a gamma distribution, whereas $Y_3 \mid Y_2$ follows an exponential distribution, explaining the differences in the shapes of these two cross-cumulant densities.

General cross-cumulant densities of at least third order for the cascading model will have a form similar to that given in Equation (6), and will contain no signature of the correlation of strictly second order events. This highlights a key benefit of cumulants as a measure of dependence: although they agree with central moments up to third order, we know from Equation (23) below [or Equation (22) in the general case] that central moments necessarily exhibit a dependence on lower order statistics. On the other hand, cumulants are "pure" and quantify only dependencies at the given order which cannot be inferred from lower order statistics (Grün and Rotter, 2010).

One useful statistic for analyzing population activity through correlations is the *population cumulant density* (Luczak et al., 2013). The second order population cumulant density for cell *i* is defined by (see the Methods)

$$\kappa_{i,\text{pop}}^{\mathbf{X}}(\tau) = \sum_{j \neq i} \kappa_{ij}^{\mathbf{X}}(\tau).$$

This function is linearly related to the spike-triggered average of the population activity conditioned on that of cell $i$. In **Figure 4** we show three different second-order population cumulant functions for the cascading GTaS model of **Figure 3A**. When the second-order population cumulant for a neuron is skewed to the right of $\tau = 0$ (as is $\kappa^{\mathbf{X}}_{1,\text{pop}}$, blue line), the neuron tends to precede other cells in the population in pairwise spiking events. Similarly, skewness to the left of $\tau = 0$ ($\kappa^{\mathbf{X}}_{6,\text{pop}}$, orange line) indicates a neuron which tends to trail other cells in the population in such events. A symmetric population cumulant density indicates a neuron that is both a follower *and* a leader. Taken together, these three second-order population cumulants hint at the chain structure of the process.
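In practice, $\kappa^{\mathbf{X}}_{i,\text{pop}}$ can be estimated by binning the spike trains and summing empirical cross-covariances over the population. The following rough sketch (a hypothetical helper, not the estimator used in the paper) illustrates the idea on a toy pair of processes sharing common events, for which the mass should concentrate at zero lag:

```python
import numpy as np

def pop_cumulant_density(trains, i, T, dt=0.005, max_lag=0.05):
    """Rough empirical second-order population cumulant density for cell i:
    summed cross-covariances of binned spike counts between cell i and
    every other cell (a finite-data sketch)."""
    nbins = int(T / dt)
    counts = np.stack([np.histogram(s, bins=nbins, range=(0.0, T))[0]
                       for s in trains]).astype(float)
    L = int(max_lag / dt)
    lags = np.arange(-L, L + 1)
    kappa = np.zeros(lags.size)
    xi = counts[i] - counts[i].mean()
    for j in range(len(trains)):
        if j == i:
            continue
        xj = counts[j] - counts[j].mean()
        for m, k in enumerate(lags):
            if k >= 0:   # covariance with x_j delayed by k bins
                kappa[m] += xi[:nbins - k] @ xj[k:] / (nbins - k)
            else:
                kappa[m] += xi[-k:] @ xj[:nbins + k] / (nbins + k)
    return lags * dt, kappa / dt**2

# demo: two trains sharing common (SIP-like) events give a peak at zero lag
rng = np.random.default_rng(2)
T = 500.0
common = rng.uniform(0.0, T, rng.poisson(5.0 * T))
trains = [np.sort(np.concatenate([common,
                                  rng.uniform(0.0, T, rng.poisson(5.0 * T))]))
          for _ in range(2)]
lags, kappa = pop_cumulant_density(trains, 0, T)
```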

Greater understanding of the joint temporal statistics in a multivariate counting process can be obtained by considering higher-order population cumulant densities. We define the third-order population cumulant density for the pair $(i, j)$ to be

$$
\kappa_{ij,\text{pop}}^{\mathbf{X}}(\tau_1, \tau_2) = \sum_{k \neq i,j} \kappa_{ijk}^{\mathbf{X}}(\tau_1, \tau_2).
$$

The third-order population cumulant density is linearly related to the spike-triggered population activity, conditioned on spikes in cells $i$ and $j$ separated by a delay $\tau_1$. In **Figures 4B–D**, we present three distinct third-order population cumulant densities. Examining $\kappa^{\mathbf{X}}_{12,\text{pop}}(\tau_1, \tau_2)$ (panel **B**), we see only contributions in the region $\tau_2 > \tau_1 > 0$, indicating that the pairwise event $1 \to 2$ often precedes a third spike elsewhere in the population (i.e., with a probability above chance). The population cumulant $\kappa^{\mathbf{X}}_{34,\text{pop}}(\tau_1, \tau_2)$ has contributions in two sections of the plane (panel **C**). Contributions in the region $\tau_2 > \tau_1 > 0$ can be understood following the preceding example, while contributions in the region $\tau_2 < 0 < \tau_1$ imply that the firing of other neurons tends to precede the joint firing event $3 \to 4$. Lastly, contributions to $\kappa^{\mathbf{X}}_{16,\text{pop}}(\tau_1, \tau_2)$ (panel **D**) are limited to $0 < \tau_2 < \tau_1$, indicating an above-chance probability of joint firing events of the form $1 \to i \to 6$, where $i$ indicates a distinct neuron within the population.

**FIGURE 4 | (B)** Population cumulant for processes $X_1$, $X_2$ in the cascading GTaS process. Concentration of the mass in different regions of the plane indicates temporal structure of events correlated between $X_1$ and $X_2$ relative to the remainder of the population (see the text). **(C)** Same as **(B)**, but for processes $X_3$, $X_4$. **(D)** Same as **(B)**, but for processes $X_1$, $X_6$. System parameters are given in the Appendix.

A distinct advantage of population cumulant densities over individual cross-cumulant functions in practical applications is related to data (i.e., sample size) limitations. In many applications where the temporal structure of a collection of observed point processes is of interest, we deal with small, noisy samples. It may therefore be difficult to estimate third- or higher-order cumulants. Population cumulants partially circumvent this issue by *pooling* (Tetzlaff et al., 2003; Rosenbaum et al., 2010, 2011), or summing, responses to amplify existing correlations and average out measurement noise.

We conclude this section by noting that cascading GTaS examples can be made much more general. For instance, we can include more complex shift patterns, overlapping subassemblies within the population, different temporal progressions of the cascade, and more.

#### *2.2.3. Timing-selective network*

The responses of single neurons and neuronal networks in experimental (Meister and Berry II, 1999; Singer, 1999; Bathellier et al., 2012) and theoretical studies (Jeffress, 1948; Hopfield, 1995; Joris et al., 1998; Thorpe et al., 2001; Gütig and Sompolinsky, 2006) can reflect the temporal structure of their inputs. Here, we present a simple example that shows how a network can be selective to fine temporal features of its input, and how the GTaS model can be used to explore such examples.

As a general network model, we consider $N$ leaky integrate-and-fire (LIF) neurons with membrane potentials $V_i$ obeying

$$\frac{dV_i}{dt} = -V_i + \sum_{j=1}^{N} w_{ij} (F * z_j)(t) + w^{\text{in}} x_i(t), \quad i = 1, \dots, N. \tag{7}$$

When the membrane potential of cell $i$ reaches a threshold $V_{\text{th}}$, an output spike is recorded and the membrane potential is reset to zero, after which the evolution of $V_i$ resumes the dynamics in Equation (7). Here $w_{ij}$ is the synaptic weight of the connection from cell $j$ to cell $i$, $w^{\text{in}}$ is the input weight, and we assume time to be measured in units of the membrane time constant. The function $F(t) = \tau_{\text{syn}}^{-1} e^{-(t - \tau_{\text{d}})/\tau_{\text{syn}}}\, \Theta(t - \tau_{\text{d}})$ is a delayed, unit-area exponential synaptic kernel with time constant $\tau_{\text{syn}}$ and delay $\tau_{\text{d}}$. The output of the $i$th neuron is

$$z\_i(t) = \sum\_j \delta(t - t\_i^j),$$

where $t_i^j$ is the time of the $j$th spike of neuron $i$. In addition, the input $\{x_i\}_{i=1}^{N}$ is

$$x_i(t) = \sum_j \delta(t - s_i^j),$$

where the event times $\{s_i^j\}$ correspond to those of a GTaS counting process **X**. Thus, each input spike results in a jump of amplitude $w^{\text{in}}$ in the membrane potential of the corresponding LIF neuron. The particular network we consider has a ring topology (nearest-neighbor-only connectivity); specifically, for $i, j = 1, \ldots, N$, we let

$$w_{ij} = \begin{cases} w^{\text{syn}} & (i - j) \bmod N \in \{1,\, N - 1\} \\ 0 & \text{otherwise.} \end{cases}$$

We further assume that all neurons are *excitatory*, so that $w^{\text{syn}} > 0$.

A network of LIF neurons with synaptic delay is a minimal model that can exhibit fine-scale discrimination of temporal input patterns without precise tuning (Izhikevich, 2006); that is, without being carefully designed to do so, or being highly sensitive to modification of network parameters. To exhibit this dependence we generate inputs from two GTaS processes. The first (the *cascading model*) was described in the preceding example; to independently control the mean and variance of the relative shifts, we replace the sums of exponential shifts with sums of gamma variates. We also consider a model featuring population-level events without shifts (the *synchronous model*), where the distribution $Q_{\mathbb{D}}$ is a $\delta$ distribution at zero in all coordinates.

The only difference between the two input models is in the temporal structure of joint events. In particular, the rates, and all long timescale spike count cross-cumulants (equivalent to the total "area" under the cross-cumulant density, see the Methods) of order two and higher are identical for the two processes. We focus on the sensitivity of the network to the temporal cumulant structure of its inputs.
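A minimal forward-Euler sketch of this experiment is given below: Equation (7) on a nearest-neighbor ring, driven either by synchronous or by cascading GTaS input built from the same mother process and markings. All parameter values (weights, timescales, rates, and the use of exponential rather than gamma shifts) are illustrative placeholders, not the values of the Appendix, so the resulting amplification factor is only indicative.

```python
import numpy as np

def make_inputs(lam, T, p_pop, alphas, seed=3):
    """Synchronous vs. cascading GTaS inputs built from the SAME mother
    process and markings; only the shifts differ."""
    rng = np.random.default_rng(seed)
    N = len(alphas)
    mother = np.sort(rng.uniform(0.0, T, rng.poisson(lam * T)))
    sync = [[] for _ in range(N)]
    casc = [[] for _ in range(N)]
    for t in mother:
        if rng.random() < p_pop:                        # population-wide marking
            shifts = np.cumsum(rng.exponential(1.0 / np.asarray(alphas)))
            for i in range(N):
                sync[i].append(t)                       # Q_D = delta at zero
                casc[i].append(t + shifts[i])           # cascading shifts
        else:                                           # marginal marking
            i = int(rng.integers(N))
            sync[i].append(t)
            casc[i].append(t)
    return sync, casc

def simulate_ring_lif(inputs, T, dt=1e-3, v_th=1.0, w_syn=0.5, w_in=0.4,
                      tau_syn=0.1, tau_d=0.05):
    """Euler integration of Equation (7) on a nearest-neighbor ring; time is
    in units of the membrane time constant. Returns total output spikes."""
    N = len(inputs)
    n_steps = int(T / dt)
    d = max(int(tau_d / dt), 1)
    V = np.zeros(N)
    syn = np.zeros(N)                          # (F * z_j)(t) for each cell j
    kick = np.zeros((n_steps + d + 1, N))      # delayed recurrent impulses
    inp = np.zeros((n_steps, N))
    for i, s in enumerate(inputs):
        for t_sp in s:
            k = int(t_sp / dt)
            if 0 <= k < n_steps:
                inp[k, i] += 1.0
    n_out = 0
    for t in range(n_steps):
        syn += -dt * syn / tau_syn + kick[t] / tau_syn      # unit-area kernel
        rec = w_syn * (np.roll(syn, 1) + np.roll(syn, -1))  # ring coupling
        V += dt * (-V + rec) + w_in * inp[t]
        fired = V >= v_th
        if fired.any():
            V[fired] = 0.0
            kick[t + d, fired] += 1.0          # spikes arrive after delay tau_d
            n_out += int(fired.sum())
    return n_out

sync, casc = make_inputs(lam=10.0, T=50.0, p_pop=0.2, alphas=[20.0] * 6)
n_sync = simulate_ring_lif(sync, T=50.0)
n_casc = simulate_ring_lif(casc, T=50.0)
caf = n_casc / max(n_sync, 1)   # sketch of the cascade amplification factor
```

Because the two inputs share the mother process and markings, any difference in output rate is attributable to the temporal structure of the joint events alone.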

In **Figures 5A,B**, we present two example rasters of the nearest-neighbor LIF network receiving synchronous (left) and cascading (right) input. In the second case, there is an obvious pattern in the outputs, but the firing rate is also increased. This is quantified in **Figure 5C**, where we compare the number of output spikes fired by a network receiving synchronous input (horizontal axis) with the same quantity for a network receiving cascading input (vertical axis), over a large number of trials. On average, the cascading input increases the output rate by a factor of 1.5 over the synchronous input; we refer to this quantity as the *cascade amplification factor* (CAF).

Finally, in **Figure 5D**, we illustrate how the cascade amplification factor depends on the parameters that define the timing of spikes in the cascading inputs. First, we study the dependence on the standard deviation $\sigma_{\text{shift}}$ of the gamma variates determining the shift distribution. We note that amplification factors above 1.5 hold robustly (i.e., for a range of $\sigma_{\text{shift}}$ values), and that the amplification factors decrease with shift variance. In the inset to panel **(D)**, we show how the gain depends on the mean of the shift distribution, $\mu_{\text{shift}}$. On an individual trial, the response intensity depends strongly on the total number of input spikes; thus, to enforce a fair comparison, the mother process and markings used were identical in each trial of every panel of **Figure 5**. We note that network properties, such as the membrane properties of individual cells or synaptic timescales, may have an equally large impact on the cascade amplification factor; indeed, as we explain below, the observed behavior of the CAF results from a synergy between the timescales of the input and of interactions within the network.

These observations have simple explanations in terms of the network dynamics and input statistics. Neglecting, for a moment, population-level events, the network is configured so that correlations in activity decrease with topographic distance. Accordingly, the probability of finding neurons that are simultaneously close to threshold also decreases with distance. Under the synchronous input model, a population-level event results in a simultaneous increase of the membrane potentials of all neurons by an amount $w^{\text{in}}$, but unless the input is very strong (in which case every, or almost every, neuron will fire regardless of fine-scale input structure), the set of neurons sufficiently close to threshold to "capitalize" on the input and fire will typically be restricted to a topographically adjacent subset. Neurons which do not fire almost immediately will soon have forgotten this population-level input. As a result, the output does not significantly reflect the chain-like structure of the inputs (**Figure 5A**, right).

On the other hand, in the case of the cascading input, the temporal structure of the input and the timescale of synapses can operate synergistically. Consider a pair of adjacent neurons in the ring network, called cells 1 and 2, arranged so that cell 2 is downstream from cell 1 in the direction of the population-level chain events. When cell 1 spikes, it is likely that cell 2 will also have an elevated membrane potential. The potential is further elevated by the delayed synaptic input from cell 1. If cell 1 spikes in response to a population-level chain event, then cell 2 imminently receives an input spike as well. If the synaptic filter and time-shift of the input spikes to each cell align, then the firing probability of cell 2 will be large relative to chance. This reasoning can be carried on across the network. Hence synergy between the temporal structure of inputs and network architecture allows the network to selectively respond to the temporal structure of the inputs (**Figure 5B**, right).

In Kuhn et al. (2003), the effect of higher order correlations on the firing rate gain of an integrate-and-fire neuron was studied by driving single cells using sums of SIP or MIP processes with equivalent firing rates (first order cumulants) and pairwise correlations (second order cumulants). In contrast, in the preceding example, the two inputs have equal long time spike count cumulants, and differ only in temporal correlation structure. An increase in firing rate was due to network interactions, and is therefore a population level effect. We return to this comparison in the Discussion.

These examples demonstrate how the GTaS model can be used to explore the impact of spatio-temporal structure in population activity on network dynamics. We next proceed with a formal derivation of the cumulant structure for a general GTaS process.

#### **2.3. CUMULANT STRUCTURE OF A GTaS PROCESS**

The GTaS model defines an $N$-dimensional counting process. Following the standard description for a counting process: given $\mathbf{X} = (X_1, \ldots, X_N)$ on $\mathbb{R}^N$ and a collection of Borel subsets $A_i \in \mathcal{B}(\mathbb{R})$, $i = 1, \ldots, N$, then $\mathbf{X}(A_1 \times \cdots \times A_N) = (X_1(A_1), \ldots, X_N(A_N)) \in \mathbb{N}^N$ is a random vector where the value of each coordinate $i$ indicates the (random) number of points which fall inside the set $A_i$. Note that the GTaS model defines processes that are marginally Poisson. All GTaS model parameters and related quantities are defined in **Table 1**.


**Table 1 | Common notation used in the text.**

For each $D \subset \mathbb{D} = \{1, \ldots, N\}$, define the tail probability $\bar{p}_D$ by

$$\bar{p}_{D} = \sum_{D \subset D' \subset \mathbb{D}} p_{D'}.\tag{8}$$
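Computationally, the tail probabilities are a straightforward sum over supersets; a small sketch (with a hypothetical toy marking distribution) reads:

```python
def tail_probability(p, D):
    """Tail probability p̄_D: total probability of all markings D' ⊇ D.
    `p` maps frozenset markings to probabilities; unlisted markings
    have probability zero (a hypothetical toy distribution)."""
    D = frozenset(D)
    return sum(prob for marking, prob in p.items() if D <= marking)

# toy marking distribution on D = {1, 2, 3}
p = {frozenset({1}): 0.3, frozenset({2}): 0.2,
     frozenset({1, 2}): 0.1, frozenset({1, 2, 3}): 0.4}
print(tail_probability(p, {1}))      # ≈ 0.8 = 0.3 + 0.1 + 0.4
print(tail_probability(p, {1, 2}))   # ≈ 0.5 = 0.1 + 0.4
```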

Since $p_D$ is the probability that exactly the processes in $D$ are marked, $\bar{p}_D$ is the probability that all processes in $D$, as well as possibly other processes, are marked. An event from the mother process is assigned to daughter process $X_i$ with probability $\bar{p}_{\{i\}}$. As noted above, an event attributed to process $i$ following a marking $D \ni i$ will be marginally shifted by a random amount determined by the distribution $Q^{\{i\}}_D$, which represents the projection of $Q_D$ onto dimension $i$. Thus, the events in the marginal process $X_i$ are shifted in an independent and identically distributed (IID) manner according to the mixture distribution $Q_i$ given by

$$Q\_i = \frac{\sum\_{D \ni i} p\_D Q\_D^{\{i\}}}{\sum\_{D \ni i} p\_D}.$$

Note that IID shifting of the event times of a Poisson process generates another Poisson process of identical rate. Thus, the process $X_i$ is marginally Poisson with rate $\lambda \bar{p}_{\{i\}}$ (Ross, 1995).
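This invariance is easy to verify numerically: after thinning and IID-shifting a Poisson process, the count in a fixed window should remain Poisson, i.e., have mean rate $\lambda \bar{p}_{\{i\}}$ and Fano factor near one. A Monte Carlo sketch under illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, T, p_keep, n_trials = 50.0, 100.0, 0.4, 500
window = (30.0, 60.0)           # interior window, away from edge effects
counts = []
for _ in range(n_trials):
    events = rng.uniform(0.0, T, rng.poisson(lam * T))   # mother process
    kept = events[rng.random(events.size) < p_keep]      # thinning
    shifted = kept + rng.normal(0.0, 1.0, kept.size)     # IID shifts
    counts.append(np.sum((shifted >= window[0]) & (shifted < window[1])))
counts = np.asarray(counts, dtype=float)
rate_hat = counts.mean() / (window[1] - window[0])  # should be ~ lam * p_keep
fano = counts.var() / counts.mean()                 # should be ~ 1 (Poisson)
```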

In deriving the statistics of the GTaS counting process **X**, it will be useful to express the distribution of **X** as

$$
\begin{pmatrix} X\_1(A\_1) \\ \vdots \\ X\_N(A\_N) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \sum\_{D \ni 1} \xi(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D \ni N} \xi(D; A\_1, \dots, A\_N) \end{pmatrix} . \tag{9}
$$

Here, each $\xi(D; A_1, \ldots, A_N)$ is an independent Poisson process, and the notation $=_{\text{distr}}$ indicates that the two random vectors are equal in distribution. The process $\xi(D; A_1, \ldots, A_N)$ counts the number of points which are marked for a set $D' \supset D$ but for which, after shifting, only the points with indices $i \in D$ lie in the corresponding sets $A_i$. Precise definitions of the processes $\xi$ and a proof of Equation (9) may be found in the Appendix. We emphasize that the Poisson processes $\xi(D)$ do not directly count points marked for the set $D$, but instead points which are marked for a set containing $D$ that, after shifting, have only their $D$-components lying in the "relevant" sets $A_i$.

Suppose we are interested in calculating dependencies among a subset of daughter processes, $\{X_{i_j}\}_{i_j \in \bar{D}}$ for some set $\bar{D} \subset \mathbb{D}$, consisting of $|\bar{D}| = k$ distinct members of the collection of counting processes **X**. Then the following alternative representation will be useful:

$$\begin{pmatrix} X\_{i\_1}(A\_{i\_1}) \\ \vdots \\ X\_{i\_k}(A\_{i\_k}) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \sum\_{i\_1 \in D \subset \bar{D}} \xi\_D(A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{i\_k \in D \subset \bar{D}} \xi\_D(A\_1, \dots, A\_N) \end{pmatrix} \tag{10}$$

where

$$\xi\_D(A\_1, \ldots, A\_N) = \sum\_{\substack{D' \supset D \\ (\vec{D} \backslash D) \cap D' = \emptyset}} \xi(D'; A\_1, \ldots, A\_N).$$

We illustrate this decomposition in the cases $k = 2, 3$ in **Figure 6**. The sums in Equation (10) run over all sets $D \subset \mathbb{D}$ containing the indicated indices $i_j$ and contained within $\bar{D}$. The processes $\xi_D$ are comprised of a sum of all of the processes $\xi(D')$ (defined below Equation 9) such that $D'$ contains all of the indices in $D$, but no other indices which are part of the subset $\bar{D}$ under consideration. These sums are non-overlapping, implying that the $\xi_D$ are also independent and Poisson.

The following examples elucidate the meaning and significance of Equation (10). We emphasize that the GTaS process is a completely characterized, joint Poisson process, and we use Equation (10) to calculate cumulants of a GTaS process. In principle, any other statistics can be obtained similarly.

#### *2.3.1. Second order cumulants (covariance)*

We first generalize a well-known result about the dependence structure of temporally jittered pairs of Poisson processes, $X_1, X_2$. Assume that events from a mother process with rate $\lambda$ are assigned to two daughter processes with probability $p$. Each event time is subsequently shifted independently according to a univariate distribution $f$. The cross-cumulant density (or cross-covariance function; see the Methods for cumulant definitions) then has the form (Brette, 2009)

$$
\kappa_{12}^{\mathbf{X}}(\tau) = \lambda p \int f(t) f(t+\tau)\, dt = \lambda p (f \star f)(\tau).
$$
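The integral of this density over all lags is $\lambda p$, which can be checked by simulation: generate the jittered pair, count near-coincidences within a window much wider than the jitter, and subtract the chance level. Parameters below are illustrative:

```python
import numpy as np

# Two daughter processes share mother events with probability p; every
# event time is jittered independently by f = N(0, sigma^2). Integrating
# kappa_12 over all lags should give lam * p.
rng = np.random.default_rng(5)
lam, p, sigma, T = 20.0, 0.5, 0.01, 2000.0
shared = rng.uniform(0.0, T, rng.poisson(lam * p * T))
x1 = np.sort(np.concatenate([shared + rng.normal(0, sigma, shared.size),
                             rng.uniform(0, T, rng.poisson(lam * (1 - p) * T))]))
x2 = np.sort(np.concatenate([shared + rng.normal(0, sigma, shared.size),
                             rng.uniform(0, T, rng.poisson(lam * (1 - p) * T))]))

# count pairs with |t2 - t1| < W and subtract the chance level
W = 0.05   # several standard deviations of t2 - t1, which has sd sigma*sqrt(2)
n_pairs = np.sum(np.searchsorted(x2, x1 + W) - np.searchsorted(x2, x1 - W))
chance = x1.size * x2.size * 2 * W / T
integral_hat = (n_pairs - chance) / T    # estimate of lam * p = 10
```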

We generalize this result within the GTaS framework. At second order, Equation (10) has a particularly nice form. Following Bäuerle and Grübel (2005), we write for $i \neq j$ (see **Figure 6A**)

$$
\begin{pmatrix} X\_i(A\_i) \\ X\_j(A\_j) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \xi\_{\{i,j\}}(A\_i, A\_j) + \xi\_{\{i\}}(A\_i) \\ \xi\_{\{i,j\}}(A\_i, A\_j) + \xi\_{\{j\}}(A\_j) \end{pmatrix}. \tag{11}
$$

The process $\xi_{\{i,j\}}$ sums all $\xi(D')$ for which $\{i, j\} \subset D'$, while the process $\xi_{\{i\}}$ sums all $\xi(D')$ such that $i \in D'$, $j \notin D'$; $\xi_{\{j\}}$ is defined likewise.

Using the representation in Equation (11), we can derive the second order cumulant (covariance) structure of a GTaS process. First, we have

$$\begin{split} \mathsf{cov}[X\_{i}(A\_{i}), X\_{j}(A\_{j})] &= \mathsf{\kappa}[X\_{i}(A\_{i}), X\_{j}(A\_{j})] \\ &= \mathsf{\kappa}[\xi\_{\{i,j\}}(A\_{i}, A\_{j}), \xi\_{\{i,j\}}(A\_{i}, A\_{j})] \\ &+ \mathsf{\kappa}[\xi\_{\{i\}}(A\_{i}), \xi\_{\{i,j\}}(A\_{i}, A\_{j})] \\ &+ \mathsf{\kappa}[\xi\_{\{i,j\}}(A\_{i}, A\_{j}), \xi\_{\{j\}}(A\_{j})] \\ &+ \mathsf{\kappa}[\xi\_{\{i\}}(A\_{i}), \xi\_{\{j\}}(A\_{j})] \\ &= \mathsf{\kappa}\_{2}[\xi\_{\{i,j\}}(A\_{i}, A\_{j})] + 0 \\ &= \mathsf{E}[\xi\_{\{i,j\}}(A\_{i}, A\_{j})]. \end{split}$$

The third equality follows from the construction of the processes $\xi_D$: if $D \neq D'$, then the processes $\xi_D$, $\xi_{D'}$ are independent. The final equality follows from the observation that every cumulant of a Poisson random variable equals its mean.

The covariance may be further expressed in terms of model parameters (see Theorem 1.1 for a generalization of this result to arbitrary cumulant orders):

$$\begin{aligned} \text{cov}[X\_i(A\_i), X\_j(A\_j)] \\ = \lambda \sum\_{D' \supset \{i, j\}} p\_{D'} \int P\left(t + Y\_i \in A\_i, t + Y\_j \in A\_j \mid \mathbf{Y} \sim \mathbf{Q}\_{D'}\right) dt. \end{aligned} \tag{12}$$

In other words, the covariance of the counting processes is given by a weighted sum of the probabilities that the $(i,j)$ marginals of the shift distributions yield values in the appropriate sets. The weights are the intensities of the corresponding component processes $\xi(D')$ which contribute events to both of the processes $i$ and $j$.

In the case that $Q_D \equiv Q$, Equation (12) reduces to the solution given in Bäuerle and Grübel (2005). Using the tail probabilities defined in Equation (8), if $Q_D \equiv Q$ for all $D$, the integral in Equation (12) no longer depends on the subset $D'$, and the equation may be written as

$$\text{cov}[X_i(A_i), X_j(A_j)] = \lambda \bar{p}_{\{i,j\}} \int P\left(t + Y_i \in A_i,\ t + Y_j \in A_j \mid \mathbf{Y} \sim Q\right) dt.$$

Using Equation (12), we may also compute the second cross-cumulant density (also called the *covariance density*) of the processes. From the definition of the cross-cumulant density [Equation (24) in the Methods], this is given by

$$\begin{aligned} \kappa_{ij}^{\mathbf{X}}(\tau) &= \lim_{\Delta t \to 0} \frac{\text{cov}[X_i([0, \Delta t)), X_j([\tau, \tau + \Delta t))]}{\Delta t^2} \\ &= \lambda \sum_{D' \supset \{i, j\}} p_{D'} \int \lim_{\Delta t \to 0} \frac{P\left(t + Y_i \in [0, \Delta t),\ t + Y_j \in [\tau, \tau + \Delta t) \mid \mathbf{Y} \sim Q_{D'}\right)}{\Delta t^2}\, dt. \end{aligned} \tag{13}$$

Before continuing, we note that given a random vector **Y** = (*Y*1,..., *YN*) ∼ *Q*, where *Q* has density *q*(*y*1,..., *yN*), the vector **Z** = (*Y*<sup>2</sup> − *Y*1,..., *YN* − *Y*1) has density *qZ* given by

$$q_Z(\tau_1, \dots, \tau_{N-1}) = \int q(t, t + \tau_1, \dots, t + \tau_{N-1})\, dt. \tag{14}$$

Assuming that the distributions $Q_{D'}$ have densities $q_{D'}$, and denoting by $q^{\{i,j\}}_{D'}$ the bivariate marginal density of the variables $Y_i$, $Y_j$ under $Q_{D'}$, we have that

$$\kappa_{ij}^{\mathbf{X}}(\tau) = \lambda \sum_{D' \supset \{i, j\}} p_{D'} \int q_{D'}^{\{i, j\}}(t, t + \tau)\, dt. \tag{15}$$

According to Equation (14), the integrals present in Equation (15) are simply the densities of the variables $Y_j - Y_i$, where $\mathbf{Y} \sim Q_{D'}$.

Thus $\kappa^{\mathbf{X}}_{ij}(\tau)$, which captures the additional probability of events in the marginal processes $X_i$ and $X_j$ separated by $\tau$ units of time beyond what can be predicted from lower-order statistics, is given by a weighted sum (in this case, the lower-order statistics are the marginal intensities; see the discussion around Equation (24) of the Methods). The weights are the "marking rates" $\lambda p_{D'}$ for markings contributing events to both component processes, while the summands are the probabilities that the corresponding shift distributions yield a pair of shifts in the proper arrangement; specifically, that the shift applied to the event attributed to $X_i$ precedes the shift applied to the event mapped to $X_j$ by $\tau$ units of time. This interpretation of the cross-cumulant density is quite natural, and will carry over to higher-order cross-cumulants of a GTaS process. However, as we show next, this extension is not trivial at higher cumulant orders.

#### *2.3.2. Third order cumulants*

To determine the higher order cumulants for a GTaS process, one can again use the representation given in Equation (10). The distribution of a subset of three processes may be expressed in the form (see **Figure 6B**)

$$
\begin{pmatrix} X\_i(A\_i) \\ X\_j(A\_j) \\ X\_k(A\_k) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \xi\_{\{i,j,k\}} + \xi\_{\{i,j\}} + \xi\_{\{i,k\}} + \xi\_{\{i\}} \\ \xi\_{\{i,j,k\}} + \xi\_{\{i,j\}} + \xi\_{\{j,k\}} + \xi\_{\{j\}} \\ \xi\_{\{i,j,k\}} + \xi\_{\{i,k\}} + \xi\_{\{j,k\}} + \xi\_{\{k\}} \end{pmatrix}, \tag{16}
$$

where, for simplicity, we suppressed the arguments of the different $\xi_D$ on the right-hand side. Again, the processes in the representation are independent and Poisson distributed. The variable $\xi_{\{i,j,k\}}$ is the sum of all random variables $\xi(D)$ (see Equation 9) with $D \supset \{i, j, k\}$, while the variable $\xi_{\{i,j\}}$ is now the sum of all $\xi(D)$ with $D \supset \{i, j\}$ but $k \notin D$. The rest of the variables are defined likewise. Using properties (C1) and (C2) of cumulants given in the Methods, and assuming that $i$, $j$, $k$ are distinct indices, we have

$$\kappa(X_i(A_i), X_j(A_j), X_k(A_k)) = \kappa_3(\xi_{\{i,j,k\}}) = \mathbf{E}[\xi_{\{i,j,k\}}].$$

The second equality follows from the fact that all cumulants of a Poisson distributed random variable equal its mean. Similar to Equation (12), we may write

$$\begin{aligned} \kappa(X_i(A_i), X_j(A_j), X_k(A_k)) &= \lambda \sum_{D' \supset \{i, j, k\}} p_{D'} \int P(t + Y_i \in A_i, \\ &\quad t + Y_j \in A_j,\ t + Y_k \in A_k \mid \mathbf{Y} \sim Q_{D'})\, dt. \end{aligned}$$

The third cross-cumulant density is then given similarly to the second order function by

$$\kappa_{ijk}^{\mathbf{X}}(\tau_1, \tau_2) = \lambda \sum_{D' \supset \{i, j, k\}} p_{D'} \int q_{D'}^{\{i, j, k\}}(t, t + \tau_1, t + \tau_2)\, dt.$$

Here, we have again assumed the existence of densities $q_{D'}$, and denoted by $q^{\{i,j,k\}}_{D'}$ the joint marginal density of the variables $Y_i$, $Y_j$, $Y_k$ under $q_{D'}$. The integrals appearing in the expression for the third-order cross-cumulant density are the probability densities of the vectors $(Y_j - Y_i, Y_k - Y_i)$, where $\mathbf{Y} \sim Q_{D'}$.

#### *2.3.3. General cumulants*

Finally, consider a general subset of *k* distinct members of the vector counting process **X** as in Equation (10). The following theorem provides expressions for the cross-cumulants of the counting processes, as well as the cross-cumulant densities, in terms of model parameters in this general case. The proof of Theorem 1.1 is given in the Appendix.

**Theorem 1.1.** *Let* **X** *be a joint counting process of GTaS type with total intensity* $\lambda$*, marking distribution* $(p_D)_{D \subset \mathbb{D}}$*, and family of shift distributions* $(Q_D)_{D \subset \mathbb{D}}$*. Let* $A_1, \ldots, A_k$ *be arbitrary sets in* $\mathcal{B}(\mathbb{R})$*, and* $\bar{D} = \{i_1, \ldots, i_k\} \subset \mathbb{D}$ *with* $|\bar{D}| = k$*. The cross-cumulant of the counting processes may be written*

$$\kappa(X_{i_1}(A_1), \dots, X_{i_k}(A_k)) = \lambda \sum_{D' \supset \bar{D}} p_{D'} \int P(t\mathbf{1} + \mathbf{Y}^{\bar{D}} \in A_1 \times \dots \times A_k \mid \mathbf{Y} \sim Q_{D'})\, dt, \tag{17}$$

*where* $\mathbf{Y}^{\bar{D}}$ *represents the projection of the random vector* $\mathbf{Y}$ *onto the dimensions indicated by the members of the set* $\bar{D}$*. Furthermore, assuming that the shift distributions possess densities* $(q_D)_{D \subset \mathbb{D}}$*, the cross-cumulant density is given by*

$$\begin{aligned} \kappa\_{i\_1\ldots i\_k}^{\mathbf{X}}(\tau\_1,\ldots,\tau\_{k-1}) \\ = \lambda \sum\_{D'\supset\bar{D}} p\_{D'} \int q\_{D'}^{\bar{D}}(t, t+\tau\_1, \ldots, t+\tau\_{k-1}) dt,\end{aligned} \tag{18}$$

*where* $q\_{D'}^{\bar{D}}$ *indicates the* $k$*th order joint marginal density of* $q\_{D'}$ *in the dimensions of* $\bar{D}$*.*

An immediate corollary of Theorem 1.1 is a simple expression for the infinite-time-window cumulants, obtained by integrating the cumulant density over all time lags $\tau\_i$. From Equation (A8), we have

$$\gamma\_{i\_1\dots i\_k}^{\mathbf{X}}(\infty) = \int \cdots \int \kappa\_{i\_1\dots i\_k}^{\mathbf{X}}(\tau\_1,\dots,\tau\_{k-1})d\tau\_{k-1}\cdots d\tau\_1$$

$$= \lambda \sum\_{D'\supset \bar{D}} p\_{D'} \cdot 1 = \lambda \bar{p}\_{\bar{D}}.\tag{19}$$

This shows that the infinite time window cumulants for a GTaS process are non-increasing with respect to the ordering of sets, i.e.,

$$
\boldsymbol{\gamma}\_{i\_1\cdots i\_k}^{\mathbf{X}}(\infty) \ge \boldsymbol{\gamma}\_{i\_1\cdots i\_k i\_{k+1}}^{\mathbf{X}}(\infty).
$$

We conclude this section with a short technical remark: until this point, we have considered only the cumulant structure of sets of *unique* processes. However, one may occasionally wish to calculate a cumulant for a set of processes including repeats. Take, for example, the cumulant $\kappa(X\_1(A\_1), X\_1(A\_2), X\_3(A\_3))$. Owing to the marginally Poisson nature of the GTaS process, we would have (referring to the Methods for cumulant definitions)

$$\begin{aligned} &\kappa(X\_1(A\_1), X\_1(A\_2), X\_3(A\_3)) \\ &= \kappa\_{(2,1)}(X\_1(A\_1 \cap A\_2), X\_3(A\_3)) \quad \text{if} \quad \mathbf{X} \sim \text{GTaS.} \end{aligned} \tag{20}$$

For a general counting process **X**, it may be shown that

$$
\kappa\_{113}^{\mathbf{X}}(\tau\_1, \tau\_2) = \delta(\tau\_1)\kappa\_{13}^{\mathbf{X}}(\tau\_2) + \text{"non-singular contributions"}. \tag{21}
$$

In addition, the second order auto-cumulant density may be written (Cox and Isham, 1980)

$$
\kappa\_{ii}^{\mathbf{X}}(\tau) = r\_i \delta(\tau) + \text{"non-singular contributions"},
$$

where $r\_i$ is the stationary rate. The singular contribution shown in Equation (21) at third order is analogous to the delta contribution, proportional to the firing rate, which appears in the second-order auto-cumulant density. For a GTaS process, the non-singular contributions in Equation (21) are identically zero, as follows directly from Equation (20). Expressions similar to Equations (20, 21) hold in more general cases.

## **3. DISCUSSION**

We have introduced a general method of generating spike trains with flexible spatiotemporal structure. The GTaS model is completely analytically tractable: all statistics of interest can be obtained directly from the distributions used to define it. It is based on an intuitive method of selecting and shifting point processes from a "mother" train. Moreover, the GTaS model can be used to easily generate partially synchronous states, cluster firing, cascading chains, and other spatiotemporal patterns of neural activity.
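The select-and-shift recipe just described is straightforward to implement. The sketch below is our own minimal illustration, not the authors' code: the function name `generate_gtas` and the `markings` format (probability, index subset, shift sampler, with the probabilities summing to one) are assumptions made for this example.

```python
import numpy as np

def generate_gtas(lam, T, markings, N, rng):
    """Sample a GTaS process on [0, T] (illustrative sketch, not the authors' code).

    lam      -- rate of the "mother" Poisson process
    markings -- list of (p_D, D, shift_sampler) triples; the probabilities must
                sum to one, and shift_sampler(rng) returns one shift per member
                of the index subset D
    N        -- number of component processes
    Returns a list of N sorted spike-time arrays.
    """
    trains = [[] for _ in range(N)]
    n_mother = rng.poisson(lam * T)            # number of mother events
    times = rng.uniform(0.0, T, n_mother)      # their (unordered) times
    probs = np.array([p for p, _, _ in markings])
    labels = rng.choice(len(markings), size=n_mother, p=probs)
    for t, c in zip(times, labels):
        _, D, sampler = markings[c]
        for i, s in zip(D, sampler(rng)):      # copy the event to each i in D, shifted
            trains[i].append(t + s)
    return [np.sort(np.asarray(tr)) for tr in trains]
```

For instance, with markings `[(0.5, [0], ...), (0.5, [0, 1], ...)]`, half of the mother events are copied to both trains after a random jitter, giving marginal rates $\lambda(p\_{\{0\}} + p\_{\{0,1\}})$ and $\lambda p\_{\{0,1\}}$.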

Processes generated by the GTaS model are naturally described by cumulant densities of pairwise and higher orders. This raises the question of whether such statistics are readily computable from data, so that realistic classes of GTaS models can be defined in the first place. One approach is to fit mechanistic models to data, and to use the higher order structure that is generated by the underlying mechanisms (Yu et al., 2011). A synergistic blend of other methods with the GTaS framework may also be fruitful—for example, the CuBIC framework of Staude et al. (2010) could be used to determine relevant marking orders, and the parametrically-described GTaS process could then be fit to allow generation of surrogate data after selection of appropriate classes of shift distributions. When it is necessary to infer higher order structure in the face of data limitations, population cumulants are an option to increase statistical power (albeit at the cost of spatial resolution; see **Figure 4**).

While the GTaS model has flexible higher order structure, it is always marginally Poisson. Spiking throughout the cortex is significantly irregular (Holt et al., 1996; Shadlen and Newsome, 1998), but the level of variability differs across cells, with Fano factors ranging from below 0.5 to above 1.5, in comparison with the Poisson value of 1 (Churchland et al., 2010). Changes in variability may reflect cortical states and computation (Litwin-Kumar and Doiron, 2012; White et al., 2012). A model that would allow flexible marginal variability would therefore be very useful. Unfortunately, the tractability of the GTaS model is closely related to the fact that the marginal processes are Poisson. Therefore, an immediate generalization does not seem possible.

A number of other models have been used to describe population activity. Maximum entropy (ME) approaches also result in models with varied spatial activity; these are defined based on moments or other averaged features of multivariate spiking activity (Schneidman et al., 2006; Roudi et al., 2009). Such models are often used to fit purely spatial patterns of activity, though the techniques have been extended to treat temporal correlations as well (Tang et al., 2008; Marre et al., 2009). Generalized linear models (GLMs) have been used successfully to describe spatiotemporal patterns at second (Pillow et al., 2008) and third order (Ohiorhenuan et al., 2010). In comparison to the present GTaS method, both GLMs and ME models are more flexible. They feature well-defined approaches for fitting to data, including likelihood-based methods with well-behaved convexity properties. What the GTaS method contributes is a direct way to generate population activity with explicitly specified higher order spatiotemporal structure. Moreover, the lower order cumulant structure of a GTaS process can be modified independently of the higher order structure, though the reverse is not true.

There are a number of possible implications of such spatiotemporal structure for communication within neural networks. In section 2.2.3, we showed that these temporal correlations can play a role similar to that of the spatial correlations established in Kuhn et al. (2003) for determining network input-output transfer. Our model allowed us to examine the impact of such temporal correlations on the network-level gain of a downstream population (the cascade amplification factor). Even in a very simple network, it was clear that the strength of the response is determined jointly by the temporal structure of the input to the network and the connectivity within the network. Kuhn et al. (2003) examined the effect of higher order structure on the firing rate gain of an integrate-and-fire neuron by driving it with a mixture of SIP or MIP processes. However, in that study, only the spatial structure of higher order activity was varied. The GTaS model allows us to concurrently change the temporal structure of correlations. In addition, the precise control of the cumulants allows us to derive models which are equivalent up to a certain cross-cumulant order, when the configuration of marking probabilities and shift distributions allows it (as for the SIP and MIP processes of Kuhn et al. (2003), which are equivalent at second order).

Such patterns of activity may be useful when experimentally probing dendritic information processing (Gasparini and Magee, 2006), synaptic plasticity (Pfister and Gerstner, 2006; Gjorgjieva et al., 2011), or investigating the response of neuronal networks to complex patterns of input (Kahn et al., 2013). Spatiotemporal patterns may also be generated by cell assemblies (Bathellier et al., 2012). The firing in such assemblies can be spatially structured, and this structure may not be reflected in the activity of participating cells. Assemblies can exhibit persistent patterns of firing, sometimes with millisecond precision (Harris et al., 2002). The GTaS framework is well suited to describe exactly such activity patterns. The examples we presented can be easily extended to generate more complex patterns of activity with overlapping cell assemblies, different cells leading the activity, and other variations.

Understanding the impact of spatiotemporal patterns on neural computations remains an open and exciting problem. Progress will require coordination of computational, theoretical, and experimental work—the latter taking advantage of novel stimulation techniques. We hope that the GTaS model, as a practical and flexible method for generating high-dimensional, correlated spike trains, will play a significant role along the way.

## **4. METHODS**

#### **4.1. CUMULANTS AS A MEASURE OF DEPENDENCE**

We first define *cross-cumulants* (also called *joint cumulants*) (Stratonovich and Silverman, 1967; Kendall et al., 1969; Gardiner, 2009) and review some important properties of these quantities. Define the cumulant generating function $g$ of a random vector $\mathbf{X} = (X\_1, \ldots, X\_N)$ by

$$g(t\_1, \dots, t\_N) = \log\left(\mathbf{E}\left[\exp\left(\sum\_{j=1}^N t\_j X\_j\right)\right]\right).$$

The **r**-cross-cumulant of the vector **X** is given by

$$\kappa\_{\mathbf{r}}(\mathbf{X}) = \left. \frac{\partial^{|\mathbf{r}|}}{\partial t\_1^{r\_1} \cdots \partial t\_N^{r\_N}} g(t\_1, \dots, t\_N) \right|\_{t\_1 = \cdots = t\_N = 0}.$$

where $\mathbf{r} = (r\_1, \ldots, r\_N)$ is an $N$-vector of positive integers, and $|\mathbf{r}| = \sum\_{i=1}^N r\_i$. We will generally deal with cumulants where all variables are considered at first order, without excluding the possibility that some variables are duplicated. In this case, we define the cross-cumulant $\kappa(\mathbf{X})$ of the variables in the random vector $\mathbf{X} = (X\_1, \ldots, X\_N)$ as

$$\kappa(\mathbf{X}) := \kappa\_{\mathbf{1}}(\mathbf{X}) = \left. \frac{\partial^N}{\partial t\_1 \cdots \partial t\_N} g(t\_1, \dots, t\_N) \right|\_{t\_1 = \cdots = t\_N = 0}$$

where **1** = (1,..., 1).

This relationship may be expressed in combinatorial form:

$$\kappa(X\_1, \dots, X\_N) = \sum\_{\pi} (|\pi| - 1)! (-1)^{|\pi| - 1} \prod\_{B \in \pi} \mathbf{E} \left[ \prod\_{i \in B} X\_i \right] \tag{22}$$

where $\pi$ runs through all partitions of $\mathcal{D} = \{1, \ldots, N\}$, and $B$ runs over all blocks in a partition $\pi$. More generally, the $\mathbf{r}$-cross-cumulant may be expressed in terms of moments by expanding the cumulant generating function as a Taylor series, noting that

$$g(t\_1, \ldots, t\_N) = \sum\_{\mathbf{r}} \frac{\kappa\_{\mathbf{r}}(X\_1, \ldots, X\_N)}{\mathbf{r}!} t\_1^{r\_1} \cdots t\_N^{r\_N} \quad \text{with} \quad \mathbf{r}! = \prod\_{i=1}^N r\_i!,$$

similarly expanding the moment generating function $M(\mathbf{t}) = e^{g(\mathbf{t})}$, and matching the polynomial coefficients. Note that the $n$th cumulant $\kappa\_n$ of a random variable $X$ may be expressed as a joint cumulant via

$$\kappa\_n(X) = \kappa(\underbrace{X, \dots, X}\_{n \text{ copies of } X}).$$
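Equation (22) is concrete enough to check numerically. The following sketch (our own illustration; `partitions` and `joint_cumulant` are hypothetical helper names, not from the paper) estimates a joint cumulant from samples by enumerating set partitions and combining sample moments exactly as in Equation (22).

```python
from math import factorial

import numpy as np

def partitions(items):
    """Recursively yield every set partition of `items` as a list of blocks."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def joint_cumulant(X):
    """Sample estimate of kappa(X_1, ..., X_N) via Equation (22).

    X has shape (N, n_samples); row i holds samples of X_i.
    """
    total = 0.0
    for pi in partitions(list(range(X.shape[0]))):
        # (|pi| - 1)! (-1)^(|pi| - 1) times the product of block moments
        term = (-1.0) ** (len(pi) - 1) * factorial(len(pi) - 1)
        for block in pi:
            term *= np.prod(X[block], axis=0).mean()
        total += term
    return total
```

Passing two copies of the same sample vector recovers the variance; three copies recover the third central moment.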

We will utilize the following two principal properties of cumulants (Brillinger, 1965; Stratonovich and Silverman, 1967; Mendel, 1991; Staude et al., 2010):

(C1) Multilinearity: for any random variables $X$, $Y$, $\{Z\_i\}\_{i=2}^N$, we have

$$\begin{aligned} \kappa(aX + bY, Z\_2, \dots, Z\_N) &= a\kappa(X, Z\_2, \dots, Z\_N) \\ &+ b\kappa(Y, Z\_2, \dots, Z\_N). \end{aligned}$$

This holds regardless of dependencies amongst the random variables.

(C2) If any subset of the random variables in the cumulant argument is independent from the remaining, the cross-cumulant is zero; i.e., if $\{X\_1, \ldots, X\_{N\_1}\}$ and $\{Y\_1, \ldots, Y\_{N\_2}\}$ are sets of random variables such that each $X\_i$ is independent from each $Y\_j$, then

$$\kappa\_{(\mathbf{r}\_X, \mathbf{r}\_Y)}\left(X\_1, \ldots, X\_{N\_1}, Y\_1, \ldots, Y\_{N\_2}\right) = 0 \quad \text{for all } \mathbf{r}\_X \in \mathbb{N}\_+^{N\_1},\ \mathbf{r}\_Y \in \mathbb{N}\_+^{N\_2}.$$

To exhibit another key property of cumulants, consider a 4-vector $\mathbf{X} = (X\_1, X\_2, X\_3, X\_4)$ with non-zero fourth cumulant, and a random variable $Z$ independent of each $X\_i$. Define $\mathbf{Y} = (X\_1 + Z, X\_2 + Z, X\_3 + Z, X\_4)$. Using properties (C1), (C2) above, it follows that

$$
\kappa(Y\_1, Y\_2, Y\_3) = \kappa(X\_1, X\_2, X\_3) + \kappa\_3(Z).
$$

On the other hand, it is also true that

$$
\kappa(\mathbf{Y}) = \kappa(\mathbf{X}),
$$

that is, adding the variable *Z* to only a subset of the variables in **X** results in changes to cumulants involving only that subset, but *not* to the joint cumulant of the entire vector. In this sense, an *r*th order cross-cumulant of a collection of random variables captures exclusively dependencies amongst the collection which cannot be described by cumulants of lower order. In the example above, only the joint statistical properties of a subset of **X** were changed. As a result, the total cumulant κ(**X**) remained fixed.

From Equation (22), it is apparent that $\kappa(X\_i) = \mathbf{E}[X\_i]$, and $\kappa(X\_i, X\_j) = \mathbf{cov}[X\_i, X\_j]$. In addition, the third cumulant, like the second, is equal to the corresponding central moment:

$$\kappa(X\_i, X\_j, X\_k) = \mathbf{E}[(X\_i - \mathbf{E}[X\_i])(X\_j - \mathbf{E}[X\_j])(X\_k - \mathbf{E}[X\_k])\,\big]\,.$$

As cumulants and central moments agree up to third order, central moments up to third order inherit the properties discussed above at these orders. On the other hand, the fourth cumulant is *not* equal to the fourth central moment. Rather:

$$\begin{split} \kappa(X\_i, X\_j, X\_k, X\_l) \\ &= \mathbf{E}[(X\_i - \mathbf{E}[X\_i])(X\_j - \mathbf{E}[X\_j])(X\_k - \mathbf{E}[X\_k])(X\_l - \mathbf{E}[X\_l])] \\ &- \mathbf{cov}[X\_i, X\_j]\, \mathbf{cov}[X\_k, X\_l] - \mathbf{cov}[X\_i, X\_k]\, \mathbf{cov}[X\_j, X\_l] \\ &- \mathbf{cov}[X\_i, X\_l]\, \mathbf{cov}[X\_j, X\_k]. \end{split} \tag{23}$$

Higher cumulants have similar (but more complicated) expansions in terms of central moments. Accordingly, central moments of fourth and higher order do not inherit properties (C1), (C2).
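Equation (23), together with the invariance example above, can be verified on simulated data. In the sketch below (our own illustration with hypothetical names, not part of the original text), adding an independent common variable to only three coordinates of a quadruplicated exponential variable leaves the fourth joint cumulant of Equation (23) unchanged up to sampling error.

```python
import numpy as np

def fourth_joint_cumulant(X):
    """Equation (23): fourth joint cumulant of the four rows of X, shape (4, n)."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center each row

    def cov(a, b):
        return (Xc[a] * Xc[b]).mean()

    m4 = (Xc[0] * Xc[1] * Xc[2] * Xc[3]).mean()  # fourth central moment term
    return (m4 - cov(0, 1) * cov(2, 3)
               - cov(0, 2) * cov(1, 3)
               - cov(0, 3) * cov(1, 2))
```

For four copies of an Exp(1) variable the fourth cumulant is 6; adding independent Gaussian noise to only the first three copies should leave this value (approximately) unchanged, mirroring the property illustrated above.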

#### **4.2. TEMPORAL STATISTICS OF POINT PROCESSES**

In the Results, we present an extension of previous work (Bäuerle and Grübel, 2005) in which we construct and analyze multivariate counting processes **X** = (*X*1,..., *XN*) where each *Xi* is marginally Poisson.

Formally, a counting process $\mathbf{X}$ is an integer-valued random measure on $\mathcal{B}(\mathbb{R}^N)$. Evaluated on a subset $A\_1 \times \cdots \times A\_N$ of $\mathcal{B}(\mathbb{R}^N)$, the random vector $(X\_1(A\_1), \ldots, X\_N(A\_N))$ counts events in $N$ distinct categories whose times of occurrence fall into the sets $A\_i$. A good general reference on the properties of counting processes (marginally Poisson and otherwise) is Daley and Vere-Jones (2002).

The assumption of Poisson marginals implies that for a set $A\_i \in \mathcal{B}(\mathbb{R})$, the random variable $X\_i(A\_i)$ follows a Poisson distribution with mean $\lambda\_i \ell(A\_i)$, where $\ell$ is the Lebesgue measure on $\mathbb{R}$ and $\lambda\_i$ is the (constant) rate of the $i$th process. The processes under consideration will further satisfy a joint stationarity condition, namely that the distribution of the vector $(X\_1(A\_1 + t), \ldots, X\_N(A\_N + t))$ does not depend on $t$, where $A\_i + t$ denotes the translated set $\{a + t : a \in A\_i\}$.

We now consider some common measures of temporal dependence for jointly stationary vector counting processes. We will refer to the quantity $X\_i[0, T]$ as the *spike count* of process $i$ over $[0, T]$. The quantity $\gamma\_{i\_1 \cdots i\_k}^{\mathbf{X}}(T)$ (which we will refer to as a *spike count cumulant*) is given by

$$\gamma\_{i\_1\dots i\_k}^{\mathbf{X}}(T) = \frac{1}{T} \kappa[X\_{i\_1}[0, T], \dots, X\_{i\_k}[0, T]],$$

and measures $k$th order correlations amongst spike counts for the listed processes over windows of length $T$. At second order, $\gamma\_{ij}^{\mathbf{X}}(T)$ measures the covariance of the spike counts of processes $i$, $j$ over a common window of length $T$. The infinite window spike count cumulant quantifies dependencies in the spike counts of point processes over arbitrarily long windows, and is given by

$$\boldsymbol{\gamma}\_{i\_1\cdots i\_k}^{\mathbf{X}}(\infty) = \lim\_{T \to \infty} \boldsymbol{\gamma}\_{i\_1\cdots i\_k}^{\mathbf{X}}(T).$$
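At second order, the finite-window spike count cumulant can be estimated directly by binning spike times into windows and normalizing the count covariance by the window length. The sketch below is our own illustration with hypothetical names, not part of the original text.

```python
import numpy as np

def spike_count_cumulant2(ti, tj, T, window):
    """Estimate the second-order spike count cumulant gamma_ij(window):
    bin the spike times of the two processes into windows of the given
    length over [0, T] and divide the count covariance by the window."""
    edges = np.arange(0.0, T + window, window)
    ci, _ = np.histogram(ti, edges)
    cj, _ = np.histogram(tj, edges)
    return np.cov(ci, cj)[0, 1] / window
```

For a Poisson train paired with itself the estimate approaches the firing rate, while for two independent trains it approaches zero.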

A related measure is the $k$th order cross-cumulant density $\kappa\_{i\_1, \ldots, i\_k}^{\mathbf{X}}(\tau\_1, \ldots, \tau\_{k-1})$, defined by

$$\kappa\_{i\_1\dots i\_k}^{\mathbf{X}}(\tau\_1,\dots,\tau\_{k-1}) = \lim\_{\Delta t \to 0} \frac{1}{\Delta t^k} \kappa[X\_{i\_1}[0,\Delta t], X\_{i\_2}[\tau\_1,\tau\_1+\Delta t], \dots, X\_{i\_k}[\tau\_{k-1},\tau\_{k-1}+\Delta t]].\tag{24}$$

The cross-cumulant density should be interpreted as a measure of the likelihood, above what may be expected from knowledge of the lower order cumulant structure, of seeing events in processes $i\_2, \ldots, i\_k$ at times $\tau\_1 + t, \ldots, \tau\_{k-1} + t$, conditioned on an event in process $i\_1$ at time $t$. The infinite window spike count cumulant is equal to the total integral of the cross-cumulant density,

$$\gamma\_{i\_1\cdots i\_k}^{\mathbf{X}}(\infty) = \int \cdots \int \kappa\_{i\_1\cdots i\_k}^{\mathbf{X}}(\tau\_1,\ldots,\tau\_{k-1}) d\tau\_{k-1}\cdots d\tau\_1.$$

As an example, we again consider the familiar second-order cross-cumulant density $\kappa\_{ij}^{\mathbf{X}}(\tau)$, often referred to as the *cross-covariance density* or *cross-correlation function*. Defining the conditional intensity $h\_{ij}(\tau)$ of process $j$, conditioned on process $i$, to be

$$h\_{ij}(\tau) = \lim\_{\Delta t \to 0} \frac{1}{\Delta t} P(X\_j[\tau, \tau + \Delta t] > 0 \,|\, X\_i[0, \Delta t] > 0),$$

that is, the intensity of *j* conditioned on an event in process *i* which occurred τ units of time in the past, then it is not difficult to show that

$$
\kappa\_{ij}^{\mathbf{X}}(\tau) = \lambda\_i h\_{ij}(\tau) - \lambda\_i \lambda\_j.
$$

That is, the second order cross-cumulant density supplies the probability of observing an event attributed to process $i$, followed by one attributed to process $j$ some $\tau$ units of time later, above what would be expected from knowledge of first order statistics (given by the product of the marginal intensities, $\lambda\_i\lambda\_j$). More generally, at higher orders, the cross-cumulant density should be interpreted as a measure of the likelihood (above what may be expected from knowledge of the lower order correlation structure) of seeing events attributed to processes $i\_2, \ldots, i\_k$ at times $\tau\_1 + t, \ldots, \tau\_{k-1} + t$, conditioned on an event in process $i\_1$ at time $t$.
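This second-order relation suggests a simple estimator: histogram all pairwise lags $t\_j - t\_i$, convert the counts to a conditional intensity, and subtract the product of the marginal rates. The sketch below is our own illustration (hypothetical names; edge effects are ignored), not the authors' code.

```python
import numpy as np

def cross_covariance_density(ti, tj, T, edges):
    """Histogram estimate of kappa_ij(tau) = lambda_i * h_ij(tau) - lambda_i * lambda_j
    on the lag bins given by `edges`, ignoring boundary effects."""
    li, lj = len(ti) / T, len(tj) / T            # marginal intensity estimates
    diffs = (tj[None, :] - ti[:, None]).ravel()  # all pairwise lags t_j - t_i
    counts, _ = np.histogram(diffs, edges)
    h = counts / (len(ti) * np.diff(edges))      # conditional intensity h_ij(tau)
    return li * h - li * lj
```

Applied to a pair of trains in which process $j$ echoes every spike of process $i$ at a fixed lag, the estimate shows a sharp peak in the lag bin containing the echo and values near zero elsewhere.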

Another statistic useful in the study of a correlated vector counting process **X** is the *population cumulant density*. At second-order, the population cumulant density for *Xi* takes the form (Luczak et al., 2013)

$$\kappa\_{i,\text{pop}}^{\mathbf{X}}(\tau) = \sum\_{j \neq i} \kappa\_{ij}^{\mathbf{X}}(\tau).$$

More generally, the $k$th order population cumulant density corresponding to the processes $X\_{i\_1}, \ldots, X\_{i\_{k-1}}$ is given by

$$\kappa\_{i\_1\dots i\_{k-1},\operatorname{pop}}^{\mathbf{X}}(\tau\_1,\dots,\tau\_{k-1}) = \sum\_{j\neq i\_1,\dots,i\_{k-1}} \kappa\_{i\_1\dots i\_{k-1}j}^{\mathbf{X}}(\tau\_1,\dots,\tau\_{k-1}).\tag{25}$$

## **FUNDING**

This work was supported by NSF grants DMS-0817649, DMS-1122094, a Texas ARP/ATP award to Krešimir Josić, and by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and NSF Grant DMS-1122106 to Eric Shea-Brown.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 May 2013; paper pending published: 31 May 2013; accepted: 12 June 2013; published online: 17 July 2013.*

*Citation: Trousdale J, Hu Y, Shea-Brown E and Josić K (2013) A generative spike train model with time-structured higher order correlations. Front. Comput. Neurosci. 7:84. doi: 10.3389/fncom.2013.00084*

*Copyright © 2013 Trousdale, Hu, Shea-Brown and Josić. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

## **PROOF OF THE DISTRIBUTIONAL REPRESENTATION OF THE GTaS MODEL IN EQUATION (9)**

The construction of the GTaS model allows us to provide a useful distributional representation of the process. We describe this representation in a theorem that generalizes Theorem 1 in Bäuerle and Grübel (2005). This theorem also immediately implies that the GTaS process is marginally Poisson.

Some definitions are required: first, for subsets $A\_1, \ldots, A\_N \in \mathcal{B}(\mathbb{R})$ and $D, D' \subset \mathcal{D}$ with $D \subset D'$, let

$$M(D, D'; A\_1, \ldots, A\_N) := B\_1 \times \cdots \times B\_N \quad \text{with} \quad B\_i := \begin{cases} A\_i, & \text{for } i \in D, \\ A\_i^c, & \text{for } i \in D' \backslash D, \\ \mathbb{R}, & \text{otherwise.} \end{cases}$$

In addition, set $\mathbf{1} = (1, \ldots, 1)$ to be the $N$-dimensional vector with all components equal to unity. If $Q\_D$ is a measure on $\mathbb{R}^N$, then we define the measure $\nu(Q\_D)$ by

$$\begin{split} \nu(Q\_D)(A) &:= \int Q\_D(A - t\mathbf{1})dt \quad \text{for } A \in \mathcal{B}(\mathbb{R}^N) \\ &= \int P(\mathbf{Y} + t\mathbf{1} \in A | \mathbf{Y} \sim Q\_D)dt. \end{split} \tag{A1}$$

The measure $\nu(Q\_D)$ can be interpreted as giving the *expected* Lebesgue measure of the subset $L$ of $\mathbb{R}$ for which uniform shifts by the elements of $L$ translate a random vector $\mathbf{Y} \sim Q\_D$ into $A$. Heuristically, one may imagine sliding the vector $\mathbf{Y}$ over the whole real line, and counting the number of times every coordinate ends up in the "right" set, namely the projection of $A$ onto that dimension. In equation form, this means

$$\nu(Q\_D)(A) = \mathbf{E}\_{\mathbf{Y}}[\ell(\{t \in \mathbb{R} : \mathbf{Y} + t\mathbf{1} \in A\}) | \mathbf{Y} \sim Q\_D], \tag{A2}$$

where the subscript $\mathbf{Y}$ indicates that we take the average over the distribution of $\mathbf{Y} \sim Q\_D$. A short proof of this representation is presented below. We now present the theorem, with a proof indicating the adjustments necessary to that of Bäuerle and Grübel (2005).

**Theorem 0.** *Let* $\mathbf{X}$ *be an* $N$*-dimensional counting process of GTaS type with base rate* $\lambda$*, thinning mechanism* $p = (p\_D)\_{D \subset \mathcal{D}}$*, and family of shift distributions* $(Q\_D)\_{D \subset \mathcal{D}}$*. Then, for any Borel subsets* $A\_1, \ldots, A\_N$ *of the real line, we have the following distributional representation:*

$$
\begin{pmatrix} X\_1(A\_1) \\ \vdots \\ X\_N(A\_N) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \sum\_{D \ni 1} \xi(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D \ni N} \xi(D; A\_1, \dots, A\_N) \end{pmatrix}, \tag{A3}
$$

*where the random variables* $\xi(D; A\_1, \ldots, A\_N)$*,* $\emptyset \neq D \subset \mathcal{D}$*, are independent and Poisson distributed with*

$$\mathbf{E}[\xi(D; A\_1, \ldots, A\_N)] = \lambda \sum\_{D' \supset D} p\_{D'} \nu(Q\_{D'}) (M(D, D'; A\_1, \ldots, A\_N)).$$

*Proof.* For each marking $D' \subset \mathcal{D}$, define $\mathbf{X}^{D'}$ to be an independent TaS (Bäuerle and Grübel, 2005) counting process with mother process rate $\lambda p\_{D'}$, shift distribution $Q\_{D'}$, and markings $(p\_D^{D'})\_{D \subset \mathcal{D}}$, where $p\_D^{D'} = 1$ if $D = D'$ and is zero otherwise (i.e., the only possible marking for $\mathbf{X}^{D'}$ is $D'$). We first claim that

$$\mathbf{X} =\_{\text{distr}} \sum\_{D'} \mathbf{X}^{D'}.\tag{A4}$$

To see this, note that spikes in the mother process of $\mathbf{X}$ marked for a set $D'$ occur at rate $\lambda p\_{D'}$, which is the rate of the mother process of $\mathbf{X}^{D'}$. In addition, these event times are then shifted according to $Q\_{D'}$, exactly as they are for $\mathbf{X}^{D'}$. Thus, the distributions of event times (and hence the counting process distributions) are equivalent.

Let $A\_1, \ldots, A\_N$ be any Borel subsets of the real line. Applying Theorem 1 of Bäuerle and Grübel (2005) to each $\mathbf{X}^{D'}$ gives the following distributional representation:

$$\begin{pmatrix} X\_1^{D'}(A\_1) \\ \vdots \\ X\_N^{D'}(A\_N) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \sum\_{D \ni 1} \xi^{D'}(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D \ni N} \xi^{D'}(D; A\_1, \dots, A\_N) \end{pmatrix}, \tag{A5}$$

where the random variables $\xi^{D'}(D; A\_1, \ldots, A\_N)$ are taken to be identically zero unless $D \subset D'$. In the latter case, they are independent and Poisson distributed with

$$\begin{aligned} &\mathbf{E}\Big[\xi^{D'}(D;A\_1,\ldots,A\_N)\Big] \\ &=\lambda p\_{D'}\sum\_{D''\supset D}p\_{D''}^{D'}\nu(Q\_{D'})(M(D,D'';A\_1,\ldots,A\_N)) \\ &=\lambda p\_{D'}\nu(Q\_{D'})(M(D,D';A\_1,\ldots,A\_N)).\end{aligned}$$

The second equality above follows from the fact that $p\_{D''}^{D'} = 1$ if $D'' = D'$ and is zero otherwise.

Next, define

$$\begin{aligned} \xi(D; A\_1, \ldots, A\_N) &= \sum\_{D'} \xi^{D'}(D; A\_1, \ldots, A\_N) \\ &= \sum\_{D' \supset D} \xi^{D'}(D; A\_1, \ldots, A\_N). \end{aligned}$$

As the sum of independent Poisson variables is again Poisson with rate equal to the sum of the rates, we have that ξ(*D*; *A*1,..., *AN*) is Poisson with mean

$$\mathbf{E}[\xi(D; A\_1, \ldots, A\_N)] = \lambda \sum\_{D' \supset D} p\_{D'} \nu(Q\_{D'}) (M(D, D'; A\_1, \ldots, A\_N)).\tag{A6}$$

Finally, combining Equations (A4, A5), we may write

$$\begin{split} \begin{pmatrix} X\_1(A\_1) \\ \vdots \\ X\_N(A\_N) \end{pmatrix} &=\_{\text{distr}} \begin{pmatrix} \sum\_{D'} \sum\_{D \ni 1} \xi^{D'}(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D'} \sum\_{D \ni N} \xi^{D'}(D; A\_1, \dots, A\_N) \end{pmatrix} \\ &= \begin{pmatrix} \sum\_{D \ni 1} \sum\_{D'} \xi^{D'}(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D \ni N} \sum\_{D'} \xi^{D'}(D; A\_1, \dots, A\_N) \end{pmatrix} \\ &= \begin{pmatrix} \sum\_{D \ni 1} \xi(D; A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{D \ni N} \xi(D; A\_1, \dots, A\_N) \end{pmatrix}, \end{split}$$

which, along with Equation (A6), establishes the theorem.

A short note: the variable $\xi(D; A\_1, \ldots, A\_N)$ counts the number of points which are marked by a set $D' \supset D$ but for which, after shifting, only the points attributed to the processes with indices $i \in D$ remain in the corresponding subsets $A\_i$. Thus, to determine the number of points attributed to the $i$th process which lie in $A\_i$ (that is, $X\_i(A\_i)$), one simply sums the variables $\xi$ for all $D$ containing $i$, as in Equation (A3). Thus, the intensity of $\xi(D; A\_1, \ldots, A\_N)$,

$$
\lambda p\_{D'} \nu(Q\_{D'}) (M(D, D'; A\_1, \dots, A\_N)),
$$

is simply the expected number of such points. Keeping in mind these natural interpretations of the terms, Theorem 0 is easier to digest, and the result is not surprising.

#### **PROOF OF EQUATION (A2)**

In Equation (A2), we gave a more intuitive representation of the measure $\nu(Q\_D)$ than the one first defined in Bäuerle and Grübel (2005), which we prove here. Suppose that $Q$ is a measure on $\mathcal{B}(\mathbb{R}^d)$, and $A \in \mathcal{B}(\mathbb{R}^d)$. Then we have

$$\begin{aligned} \nu(Q)(A) &= \int Q(A - t\mathbf{1})dt \\ &= \iint \mathbf{1}\_{A - t\mathbf{1}}(\mathbf{y})Q(d\mathbf{y})dt \\ &= \iint \mathbf{1}\_{\{t \in \mathbb{R} : \mathbf{y} + t\mathbf{1} \in A\}}(t)dt\, Q(d\mathbf{y}) \\ &= \int \ell(\{t \in \mathbb{R} : \mathbf{y} + t\mathbf{1} \in A\})Q(d\mathbf{y}) \\ &= \mathbf{E}\_{\mathbf{Y}}[\ell(\{t \in \mathbb{R} : \mathbf{Y} + t\mathbf{1} \in A\})|\mathbf{Y} \sim Q], \end{aligned}$$

thus proving Equation (A2).

#### **PROOF OF THEOREM 1.1**

**Theorem 1.1.** *Let* $\mathbf{X}$ *be a joint counting process of GTaS type with total intensity* $\lambda$*, marking distribution* $(p\_D)\_{D \subset \mathcal{D}}$*, and family of shift distributions* $(Q\_D)\_{D \subset \mathcal{D}}$*. Let* $A\_1, \ldots, A\_k$ *be arbitrary sets in* $\mathcal{B}(\mathbb{R})$*, and* $\bar{D} = \{i\_1, \ldots, i\_k\} \subset \mathcal{D}$ *with* $|\bar{D}| = k$*. The cross-cumulant of the counting processes may be written*

$$\kappa(X\_{i\_1}(A\_1), \ldots, X\_{i\_k}(A\_k)) = \lambda \sum\_{D' \supset \bar{D}} p\_{D'} \int P(t\mathbf{1} + \mathbf{Y}^{\bar{D}} \in A\_1 \times \cdots \times A\_k | \mathbf{Y} \sim Q\_{D'}) dt \tag{A7}$$

*where* $\mathbf{Y}^{\bar{D}}$ *represents the projection of the random vector* $\mathbf{Y}$ *onto the dimensions indicated by the members of the set* $\bar{D}$*. Furthermore, assuming that the shift distributions possess densities* $(q\_D)\_{D \subset \mathcal{D}}$*, the cross-cumulant density is given by*

$$\begin{aligned} \kappa\_{i\_1\dots i\_k}^{\mathbf{X}}(\tau\_1,\dots,\tau\_{k-1}) \\ = \lambda \sum\_{D'\supset\bar{D}} p\_{D'} \int q\_{D'}^{\bar{D}}(t, t+\tau\_1, \dots, t+\tau\_{k-1}) dt,\quad \text{(A8)} \end{aligned}$$

*where* $q\_{D'}^{\bar{D}}$ *indicates the* $k$*th order joint marginal density of* $q\_{D'}$ *in the dimensions of* $\bar{D}$*.*

*Proof.* First, as noted in the text, we may rewrite the distributional representation of Theorem 0 (Equation A3) as

$$\begin{pmatrix} X\_{i\_1}(A\_{i\_1}) \\ \vdots \\ X\_{i\_k}(A\_{i\_k}) \end{pmatrix} =\_{\text{distr}} \begin{pmatrix} \sum\_{i\_1 \in D \subset \bar{D}} \xi\_D(A\_1, \dots, A\_N) \\ \vdots \\ \sum\_{i\_k \in D \subset \bar{D}} \xi\_D(A\_1, \dots, A\_N) \end{pmatrix} \tag{A9}$$

where

$$\xi\_D(A\_1, \ldots, A\_N) = \sum\_{\substack{D' \supset D \\ (\bar{D} \backslash D) \cap D' = \emptyset}} \xi(D'; A\_1, \ldots, A\_N). \tag{A10}$$

Repeating the description from the main text, the processes $\xi\_D$ are each comprised of a sum of the processes $\xi(D')$ (defined above, in Theorem 0) such that $D'$ contains all of the indices in $D$, but no other indices which are part of the subset $\bar{D}$ under consideration. These sums are non-overlapping, implying that the $\xi\_D$ are also independent and Poisson.

Using the representation of Equation (A9), we first find that

$$\begin{aligned} \kappa(X\_{i\_1}(A\_1), \ldots, X\_{i\_k}(A\_k)) &= \kappa\left[\sum\_{i\_1 \in D\_1 \subset \bar{D}} \xi\_{D\_1}, \ldots, \sum\_{i\_k \in D\_k \subset \bar{D}} \xi\_{D\_k}\right] \\ &= \sum\_{i\_1 \in D\_1 \subset \bar{D}} \cdots \sum\_{i\_k \in D\_k \subset \bar{D}} \kappa[\xi\_{D\_1}, \ldots, \xi\_{D\_k}], \end{aligned}$$

where we have suppressed the dependence of the variables $\xi\_D$ on the subsets $A\_i$. The first equality in the previous equation is simply the representation defined in Equation (A10), and the second follows from the multilinearity of cumulants (property (C1) in the Methods). Note that the sums are over the sets $D\_1, \ldots, D\_k$ satisfying the given conditions. Recall that, by construction, the Poisson processes $\xi\_D$ (see Equation A10) are independent for distinct marking sets. Accordingly, the cumulant $\kappa[\xi\_{D\_1}, \ldots, \xi\_{D\_k}]$ is zero unless $D\_1 = \cdots = D\_k$, by property (C2) of cumulants; that is,

$$\kappa[\xi\_{D\_1}(A\_1, \ldots, A\_N), \ldots, \xi\_{D\_k}(A\_1, \ldots, A\_N)] = \begin{cases} \kappa\_k(\xi\_{\bar{D}}(A\_1, \ldots, A\_N)) & D\_j = \bar{D} \text{ for each } j \\ 0 & \text{otherwise.} \end{cases}$$

Hence,

$$\kappa(X\_{i\_1}(A\_1), \ldots, X\_{i\_k}(A\_k)) = \kappa\_k(\xi\_{\bar{D}}(A\_1, \ldots, A\_N)) = \mathbb{E}[\xi\_{\bar{D}}(A\_1, \ldots, A\_N)], \tag{A11}$$

where we have again used the fact that all cumulants of a Poisson-distributed random variable are equal to its mean.

For what follows, taking *D*0, *D*′ ⊂ 𝔻 fixed with *D*0 ⊂ *D*′, the sets *M*(*D*, *D*′; *A*1,..., *AN*) with *D*0 ⊂ *D* ⊂ *D*′ are disjoint, and

$$\cup\_{D\_0 \subset D \subset D'} M(D, D'; A\_1, \dots, A\_N) = B\_1 \times \dots \times B\_N$$

$$\text{with } B\_i = \begin{cases} A\_i, & i \in D\_0 \\ \mathbb{R}, & i \notin D\_0 \end{cases}.\quad (A12)$$

In particular, note that the above union does not depend on *D*′. Substituting Equation (A10) into (A11), we have

$$\begin{aligned} \kappa(X\_{i\_1}(A\_1), \ldots, X\_{i\_k}(A\_k)) &= \sum\_{D' \supset \bar{D}} \mathbb{E}[\xi(D'; A\_1, \ldots, A\_N)] \\ &= \lambda \sum\_{\bar{D} \subset D} \sum\_{D' \supset D} p\_{D'} \, \nu(Q\_{D'})(M(D, D'; A\_1, \ldots, A\_N)) \\ &= \lambda \sum\_{D' \supset \bar{D}} p\_{D'} \sum\_{\bar{D} \subset D \subset D'} \nu(Q\_{D'})(M(D, D'; A\_1, \ldots, A\_N)) \\ &= \lambda \sum\_{D' \supset \bar{D}} p\_{D'} \, \nu(Q\_{D'})\left( \cup\_{\bar{D} \subset D \subset D'} M(D, D'; A\_1, \ldots, A\_N) \right) \\ &= \lambda \sum\_{D' \supset \bar{D}} p\_{D'} \int P\left( t + \mathbf{Y}^{\bar{D}} \in A\_1 \times \cdots \times A\_k \mid \mathbf{Y} \sim Q\_{D'} \right) \mathrm{d}t, \end{aligned}$$

where the third equality above is a simple exchange of the order of summation, and the fourth equality follows from the additivity of the measure ν(*QD*′) over the disjoint sets *M*(*D*, *D*′; *A*1,..., *AN*). Finally, the fifth equality makes use of the fact that the set union on the fourth line does not depend on *D*′ (as indicated by Equation A12), the definition of the measure ν(*QD*′) in Equation (A1), and the value of the set union given in Equation (A12).

This completes the proof of Equation (A7), and (A8) follows from the definition of the cross-cumulant density in Equation (24) of the Methods.

#### **OTHER DETAILS**

#### *Parameters for figures in the text*

*Figure 1.* For **Figure 1**, the GTaS process of size *N* = 6 consisted of only first order and population-level events which were assigned marking probabilities

$$p\_D = \begin{cases} 0.05 & D = \mathbb{D} \\ \frac{0.95}{6} & D = \{i\} \text{ for some } i \in \mathbb{D} \\ 0 & \text{otherwise} \end{cases}$$

The rate of the mother process was λ = 0.5 kHz, and the shift times for population level events were generated as in section 2.2.2 with

$$T\_i \sim \Gamma(2, 1) - 1, \quad i = 1, \dots, 6,$$

where the Gamma distribution has density

$$f(t|k,\theta) = \frac{1}{\Gamma(k)\theta^k} t^{k-1} e^{-t/\theta} \Theta(t).$$
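As a concrete illustration, the sampling procedure for this example can be sketched in a few lines of Python (the variable names and the 1 s duration are our own assumptions, not taken from the text): draw a rate-λ mother process, mark each event as a population event with probability 0.05, otherwise assign it to a single neuron uniformly (matching *pD* = 0.95/6 per singleton), and jitter population events by Γ(2, 1) − 1 shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 6         # number of neurons
lam = 0.5     # mother-process rate in kHz (events per ms)
T = 1000.0    # duration in ms (an assumed value)
p_pop = 0.05  # marking probability of a population-level event

# Mother process: homogeneous Poisson process on [0, T].
n_events = rng.poisson(lam * T)
mother_times = rng.uniform(0.0, T, n_events)

spikes = [[] for _ in range(N)]
for t in mother_times:
    if rng.random() < p_pop:
        # Population event: every neuron fires, jittered by Gamma(2,1) - 1.
        shifts = rng.gamma(2.0, 1.0, size=N) - 1.0
        for i in range(N):
            spikes[i].append(t + shifts[i])
    else:
        # First-order event: one neuron chosen uniformly (p = 0.95/6 each).
        spikes[int(rng.integers(N))].append(t)
```

Each mother event contributes either one spike (first-order event) or *N* jittered copies (population event), so the marginal rate of each neuron is (0.95/6 + 0.05)λ.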

*Figures 3, 4.* For **Figures 3**, **4**, the GTaS process of size *N* = 6 consisted of first and second order as well as population-level events. These events had marking probabilities

$$p\_D = \begin{cases} 0.05 & D = \mathbb{D} \\ \frac{0.95}{21} & D = \{i\}, \{i, j\} \text{ for some } i, j \in \mathbb{D} \\ 0 & \text{otherwise} \end{cases}$$

The rate of the mother process was λ = 0.5 kHz, and the shift times for population level events were generated as in section 2.2.2 with

$$T\_i \sim \text{Exp}(0.5), \quad i = 1, \ldots, 6.$$

The shift times of the second order events were drawn from an independent Gaussian distribution with each coordinate having standard deviation 5 ms.

*Figure 5.* For **Figure 5**, the network parameters were *w*in = 0.4, *w*syn = 6, τsyn = 0.1, τ<sup>d</sup> = 1.75. The GTaS input had the same size as the network (*N* = 10). As in the example of **Figures 3**, **4**, the GTaS input included first and second order as well as population level events. Here, we set

$$p\_D = \begin{cases} 0.2 & D = \mathbb{D} \\ \frac{0.95}{5} & D = \{i\}, \{i, j\} \text{ for some } i, j \in \mathbb{D} \\ 0 & \text{otherwise} \end{cases}$$

The rate of the mother process was λ = 1.5 kHz, and the shift times for population level events were generated as in section 2.2.2 with

$$T\_i \sim \Gamma(k, \theta), \quad i = 1, \dots, 6.$$

The shift parameters *k*, θ (representing shape and scale) were determined by the given shift mean μshift and standard deviation σshift as

$$\mu\_{\text{shift}} = k\theta, \quad \sigma\_{\text{shift}} = \sqrt{k\theta^2}.$$
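Equivalently, the shape and scale can be recovered from a target mean and standard deviation by inverting these two relations (θ = σ²/μ, *k* = μ²/σ²); a minimal sketch:

```python
def gamma_params_from_moments(mu_shift, sigma_shift):
    """Invert mu = k*theta and sigma = sqrt(k*theta^2) for shape k, scale theta."""
    theta = sigma_shift**2 / mu_shift   # scale
    k = mu_shift**2 / sigma_shift**2    # shape
    return k, theta
```

For example, a shift mean of 1 ms and standard deviation of 0.5 ms gives *k* = 4 and θ = 0.25.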

The shift times of the second order events were drawn from an independent Gaussian distribution with each coordinate having standard deviation 0.3 ms.

## Interareal coupling reduces encoding variability in multi-area models of spatial working memory

## *Zachary P. Kilpatrick\**

*Department of Mathematics, University of Houston, Houston, TX, USA*

#### *Edited by:*

*Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain*

#### *Reviewed by:*

*Albert Compte, Institut d'investigacions Biomèdiques August Pi i Sunyer, Spain Moritz Helias, Institute for Advanced Simulation, Germany*

#### *\*Correspondence:*

*Zachary P. Kilpatrick, Department of Mathematics, University of Houston, 651 Phillip G Hoffman Hall, Houston, 77204-3008 TX, USA e-mail: zpkilpat@math.uh.edu*

Persistent activity observed during delayed-response tasks for spatial working memory (Funahashi et al., 1989) has commonly been modeled by recurrent networks whose dynamics is described as a *bump attractor* (Compte et al., 2000). We examine the effects of interareal architecture on the dynamics of bump attractors in stochastic neural fields. Lateral inhibitory synaptic structure in each area sustains stationary bumps in the absence of noise. Introducing noise causes bumps in individual areas to wander as a Brownian walk. However, coupling multiple areas together can help reduce the variability of the bump's position in each area. To examine this quantitatively, we approximate the position of the bump in each area using a small noise expansion that also assumes weak amplitude interareal projections. Our asymptotic results show the motion of the bumps in each area can be approximated as a multivariate Ornstein–Uhlenbeck process. This shows reciprocal coupling between areas can always reduce variability, if sufficiently strong, even if one area contains much more noise than the other. However, when noise is correlated between areas, the variability-reducing effect of interareal coupling is diminished. Our results suggest that distributing spatial working memory representations across multiple, reciprocally-coupled brain areas can lead to noise cancelation that ultimately improves encoding.

**Keywords: neural field, bump attractor, spatial working memory, correlations, noise cancelation**

## **INTRODUCTION**

Persistent spiking activity has been experimentally observed in prefrontal cortex (Funahashi et al., 1989; Miller et al., 1996), parietal cortex (Colby et al., 1996; Pesaran et al., 2002), superior colliculus (Basso and Wurtz, 1997), caudate nucleus (Hikosaka et al., 1989; Levy et al., 1997), and globus pallidus (Mushiake and Strick, 1995; McNab and Klingberg, 2008) during the retention interval of visuospatial working memory tasks. Often, the subject must remember a cue's location for several seconds (Funahashi et al., 1989). Delay period neurons persistently fire in response to a preferred cue orientation as described by a bell-shaped tuning curve. Networks of these neurons, with recurrent excitation between similarly tuned neurons and broadly tuned feedback inhibition, can generate spatially localized "bumps." The position of these bumps encodes the remembered location of the cue (Compte et al., 2000).

Dynamic variability can degrade the accuracy of working memory over time though. Fluctuations in membrane voltage and synaptic conductance can lead to spontaneous spike or failure events at the edge of the bump, causing the bump to wander diffusively (Compte et al., 2000; Laing and Chow, 2001). Bump attractor networks are particularly prone to such diffusive error because bump positions lie on a line attractor where each location is neutrally stable (Amari, 1977). Interestingly, psychophysical data demonstrate that spatial working memory error scales linearly with delay time, suggesting the underlying process that degrades memory is diffusive (White et al., 1994; Ploner et al., 1998). Much theoretical work has examined network properties that might limit memory degradation. Several computational studies have explored networks built from bistable neuronal units, which sustain persistent states that are less susceptible to noise (Camperi and Wang, 1998; Koulakov et al., 2002; Goldman et al., 2003). In addition, synaptic facilitation has been shown to slow the drift of bump position due to internal variability (Itskov et al., 2011). Synaptic plasticity has also been shown to reduce diffusion of bumps (Hansel and Mato, 2013). Finally, spatially heterogeneous recurrent excitation can reduce the wandering of bumps by quantizing the line attractor, stabilizing a finite set of bump locations (Kilpatrick and Ermentrout, 2013; Kilpatrick et al., 2013).

Complementary to these possibilities, we propose that interareal coupling across multiple areas of cortex may reduce error in working memory recall generated by dynamic fluctuations. Multiple representations of spatial working memory have been identified in different cortical areas (Colby et al., 1996). This distributed representation makes working memory information readily available for motor (Owen et al., 1996) and decision-making (Curtis and Lee, 2010) tasks. In addition, this redundancy may serve to reduce degrading effects of noise. It is known that several areas involved in oculomotor delayed response tasks are reciprocally coupled to one another (Constantinidis and Wang, 2004; Curtis, 2006). We presume the representation of a spatial working memory in a single area takes the form of a bump in a recurrently coupled neural field. Projections between areas share information about bump position across the multiarea network. Recently, Folias and Ermentrout (2011) showed several novel activity patterns emerge when considering neural fields with multiple areas. In addition, recent analyses of spatiotemporal dynamics of perceptual rivalry have exploited dual population neural field models, where activity in each area represents a single percept (Kilpatrick and Bressloff, 2010; Bressloff and Webber, 2012b). In this study, we focus on activity patterns where bumps in each area have positions that remain close.

Our study mostly focuses on a dual area model of spatial working memory, where each area provides a replicate representation of the presented cue. We begin by demonstrating the neutral stability of the bump position in each area, in the absence of noise and interareal projections. Upon including noise and interareal projections, we use a small-noise expansion to derive an effective stochastic differential equation for the position of the bump in each area. The effective system is a multivariate Ornstein–Uhlenbeck process, which we can analyze using diagonalization. The variance of this stochastic process decreases as the strength of connections between areas increases. Variance reduction relies on cancelations arising due to averaging noise between both areas. Thus, when noise is strongly correlated between areas, the effect of interareal coupling is negligible. Lastly, we show this analysis extends to the case of *N* (more than two) areas and that for sufficiently strong interareal connections, variance scales as 1/*N*.

## **MATERIALS AND METHODS**

#### **DUAL AREA MODEL OF SPATIAL WORKING MEMORY**

We consider a recurrently coupled model commonly used for spatial working memory (Camperi and Wang, 1998; Ermentrout, 1998) and visual processing (Ben-Yishai et al., 1995). GABAergic inhibition (Gupta et al., 2000) typically acts faster than excitatory NMDAR kinetics (Clements et al., 1992), and we assume excitatory synapses contain a mixture of AMPA and NMDA components. Thus, we make the assumption that inhibition is slaved to excitation, as in Amari (1977). We can then describe the average activity *u*1(*x*,*t*) and *u*2(*x*, *t*) of neurons in either area by the system (Ben-Yishai et al., 1995; Folias and Ermentrout, 2011; Kilpatrick and Ermentrout, 2013)

$$\tau \mathrm{d}u\_1(x,t) = \left[ -u\_1 + w\_{11} \ast f(u\_1) + \varepsilon^{1/2} w\_{12} \ast f(u\_2) \right] \mathrm{d}t + \varepsilon^{1/2} \mathrm{d}W\_1(x,t), \tag{1a}$$

$$\tau \mathrm{d}u\_2(x,t) = \left[ -u\_2 + w\_{22} \ast f(u\_2) + \varepsilon^{1/2} w\_{21} \ast f(u\_1) \right] \mathrm{d}t + \varepsilon^{1/2} \mathrm{d}W\_2(x,t), \tag{1b}$$

where the effects of synaptic architecture are described by the convolution

$$w\_{jk} \ast f(u\_k) = \int\_{-\pi}^{\pi} w\_{jk}(x - y) f(u\_k(y, t)) \, \mathrm{d}y, \tag{2}$$

for *j*, *k* = 1, 2, so the case *j* = *k* describes recurrent synaptic connections within an area and *j* ≠ *k* describes synaptic connections between areas (interareal). Several fMRI and electrode recordings have revealed correlations between activity in multiple cortical areas during spatial working memory tasks (Constantinidis and Wang, 2004; Curtis, 2006), such as parietal and prefrontal cortex (Chafee and Goldman-Rakic, 1998). However, the strength of these correlations is often not on the order of the activity itself (di Pellegrino and Wise, 1993). For this reason, we presume the strength of interareal connections is weak: 0 ≤ ε<sup>1/2</sup> ≪ 1. Note, we could choose to make them a different magnitude than the noise, but for analytical convenience, we choose interareal connection and noise magnitude to be roughly the same. Analysis could still be performed in other cases; it would simply be more complicated. By setting τ = 1, we assume that time evolves in units of the excitatory synaptic time constant, which we presume to be roughly 10 ms (Häusser and Roth, 1997). The function *wjk*(*x* − *y*) describes the strength (amplitude of *wjk*) and net polarity (sign of *wjk*) of synaptic interactions from neurons with stimulus preference *y* to those with preference *x*. Following previous studies, we presume the modulation of the recurrent synaptic strength is given by the cosine

$$w\_{jj}(x - y) = w(x - y) = \cos(x - y), \quad j = 1, 2, \tag{3}$$

so neurons with similar orientation preference excite one another and those with dissimilar orientation preference disynaptically inhibit one another (Ben-Yishai et al., 1995; Ferster and Miller, 2000). Lateral inhibitory type network architectures are supported by anatomical studies of the delay period neurons in prefrontal cortex (Goldman-Rakic, 1995). Our general analysis will apply to any even symmetric function of the distance *x* − *y*, but we typically compute things using (Equation 3) since it eases calculations. Finally, synaptic connections from area *k* to *j* are specified by the weight function *wjk*(*x* − *y*), and we typically take this to be the function

$$w\_{jk}(\mathbf{x} - \mathbf{y}) = E\_j + M\_j \cos(\mathbf{x} - \mathbf{y}), \quad k \neq j \tag{4}$$

where *Ej* and *Mj* specify the strength of baseline excitation and modulation projecting to the *j*th area.

Output firing rates are given by taking the gain function *f*(*u*) of the synaptic input, which we usually prescribe to be the sigmoid (Wilson and Cowan, 1973)

$$f(u) = \frac{1}{1 + \mathrm{e}^{-\gamma(u - \theta)}},$$

and often take the high gain limit γ → ∞ for analytical convenience, so (Amari, 1977)

$$f(u) = H(u - \theta) = \begin{cases} 0 : u < \theta, \\ 1 : u \ge \theta. \end{cases} \tag{5}$$

Effects of noise are described by the small amplitude (0 ≤ ε ≪ 1) stochastic processes ε<sup>1/2</sup>*Wj*(*x*, *t*) that are white in time and correlated in space, so that ⟨d*Wj*(*x*, *t*)⟩ = 0 and

$$\begin{aligned} \langle \mathrm{d}W\_j(x,t)\mathrm{d}W\_j(y,s) \rangle &= C\_j(x-y)\delta(t-s)\mathrm{d}t\mathrm{d}s, \\ \langle \mathrm{d}W\_j(x,t)\mathrm{d}W\_k(y,s) \rangle &= C\_c(x-y)\delta(t-s)\mathrm{d}t\mathrm{d}s, \end{aligned}$$

describing both local and shared noise in either area, for *j* = 1, 2 with *j* ≠ *k*. For simplicity, we assume the local spatial correlations have a cosine profile *Cj*(*x*) = *cj* cos(*x*). We also typically assume the correlated noise component has a cosine profile, so *Cc*(*x*) = *cc* cos(*x*). Therefore, in the limit *cc* → 0, there are no interareal noise correlations, and in the limit *cc* → min(*c*1,*c*2), noise in each area is maximally correlated. For instance, when *c*<sup>1</sup> = *c*<sup>2</sup> = *cc* = 1, noise in each area is drawn from the same process.
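Because the correlation functions are single cosines, these noise fields can be generated exactly from two random mode amplitudes per area. The sketch below (our own construction, consistent with the stated covariances) splits each amplitude into a shared part of variance *cc* and a private part of variance *cj* − *cc*:

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_increments(x, dt, c1=1.0, c2=1.0, cc=0.5):
    """Draw one pair of increments dW_1(x), dW_2(x) with
    <dW_j dW_j> = c_j cos(x - y) dt and <dW_1 dW_2> = c_c cos(x - y) dt.

    Uses cos(x - y) = cos(x)cos(y) + sin(x)sin(y): each field is a random
    combination of the two modes, with a shared amplitude of variance cc
    and private amplitudes of variance c_j - cc (requires cc <= min(c1, c2)).
    """
    shared = rng.standard_normal(2)
    a1 = np.sqrt(cc) * shared + np.sqrt(c1 - cc) * rng.standard_normal(2)
    a2 = np.sqrt(cc) * shared + np.sqrt(c2 - cc) * rng.standard_normal(2)
    modes = np.vstack([np.cos(x), np.sin(x)])  # shape (2, len(x))
    return np.sqrt(dt) * a1 @ modes, np.sqrt(dt) * a2 @ modes
```

Averaging products of increments over many draws recovers the prescribed covariances, e.g., ⟨dW1(0)dW2(0)⟩ ≈ *cc* dt.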

#### **MULTIPLE-AREA MODEL OF SPATIAL WORKING MEMORY**

To incorporate the effects of many coupled, redundant areas encoding a spatial working memory, we consider a model with *N* areas and arbitrary synaptic architecture, given by

$$\tau \mathrm{d}u\_j(x,t) = \left[ -u\_j + w\_{jj} \ast f(u\_j) + \varepsilon^{1/2} \sum\_{k \neq j} w\_{jk} \ast f(u\_k) \right] \mathrm{d}t + \varepsilon^{1/2} \mathrm{d}W\_j(x,t), \tag{6}$$

where *uj* represents neural activity in the *j*th area, with *j* = 1,..., *N*. As before, we set τ = 1, so each time unit corresponds to the roughly 10 ms timescale of excitatory synaptic conductance. The weight function *wjk*(*x* − *y*) represents the connection from neurons in area *k* with cue preference *y* to neurons in area *j* with cue preference *x*, as described by (Equation 2). For comparison with numerical simulations, we take the weight functions to be the cosines (Equation 3) and (Equation 4) and the firing rate function to be the Heaviside (Equation 5). As in the dual area model, the noises *Wj*(*x*,*t*) are white in time and correlated in space so that ⟨d*Wj*(*x*,*t*)⟩ = 0 and

$$
\langle \mathrm{d}W\_j(\mathbf{x}, t) \mathrm{d}W\_k(\mathbf{y}, s) \rangle = C\_{jk}(\mathbf{x} - \mathbf{y}) \delta(t - s) \mathrm{d}t \mathrm{d}s,
$$

with *j*, *k* = 1,..., *N*, where local noise correlations are described when *j* = *k* and noise correlations between areas are described when *j* ≠ *k*. For comparison with numerical simulations, we consider *Cjj*(*x*) = cos(*x*) and *Cjk*(*x*) = *cc* cos(*x*) for all *j* ≠ *k*.
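One sanity check on this choice (our own, not from the text): the matrix of cosine-mode covariances, with ones on the diagonal and *cc* off the diagonal, must be positive semidefinite for the noise model to be well defined, which bounds *cc* between −1/(*N* − 1) and 1. A short sketch:

```python
import numpy as np

def mode_covariance(N, cc):
    """Covariance of the N areas' cosine-mode noise amplitudes:
    C_jj = 1, C_jk = cc for j != k."""
    return (1.0 - cc) * np.eye(N) + cc * np.ones((N, N))

# Eigenvalues are 1 + (N-1)*cc (uniform mode, multiplicity 1) and 1 - cc
# (multiplicity N-1), so a valid covariance needs -1/(N-1) <= cc <= 1.
C = mode_covariance(6, 0.5)
evals = np.linalg.eigvalsh(C)
```

For *N* = 6 and *cc* = 0.5 the eigenvalues are 3.5 and 0.5 (the latter five-fold), confirming the matrix is a valid covariance.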

#### **NUMERICAL SIMULATION OF STOCHASTIC DIFFERENTIAL EQUATIONS**

The spatially extended model (Equation 1) was simulated using an Euler–Maruyama method with a timestep of 10<sup>−4</sup>, using Riemann integration on the convolution term with 2000 spatial grid points. To compute and compare the variances ⟨Δ1(*t*)<sup>2</sup>⟩ for the dual and multiple area models, we simulated the system 5000 times. The position of the bump Δ*j* at each timestep, in each simulation, was determined by the position *x* in each area *j* at which the maximal value of *uj*(*x*, *t*) was attained. The variance was then computed at each timepoint and compared to our asymptotic calculations.
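The scheme just described can be sketched as follows; this is a minimal Python version with a coarser grid and timestep than the paper's, and with assumed values for ε, θ, and the interareal kernel parameters *E*, *M*:

```python
import numpy as np

rng = np.random.default_rng(2)

# Coarser discretization than the text's (2000 grid points, 1e-4 timestep),
# chosen so the sketch runs quickly; eps, theta, E, M are assumed values.
nx, dt, nt = 256, 1e-3, 2000
x = np.linspace(-np.pi, np.pi, nx, endpoint=False)
dx = x[1] - x[0]
eps, theta = 0.005, 0.5

W = np.cos(x[:, None] - x[None, :])                 # recurrent kernel, Equation (3)
W12 = 0.2 + 0.2 * np.cos(x[:, None] - x[None, :])   # interareal kernel, Equation (4)

def f(u):
    return (u > theta).astype(float)  # Heaviside firing rate, Equation (5)

# Start both areas at the stable bump U(x) = 2 sin(a_s) cos(x).
a = np.pi / 2 - 0.5 * np.arcsin(theta)
u1 = 2 * np.sin(a) * np.cos(x)
u2 = u1.copy()

for _ in range(nt):
    # Independent cosine-correlated noise increments in each area (c_c = 0).
    dW1 = np.sqrt(dt) * (rng.standard_normal() * np.cos(x)
                         + rng.standard_normal() * np.sin(x))
    dW2 = np.sqrt(dt) * (rng.standard_normal() * np.cos(x)
                         + rng.standard_normal() * np.sin(x))
    du1 = dt * (-u1 + dx * W @ f(u1) + np.sqrt(eps) * dx * W12 @ f(u2)) \
        + np.sqrt(eps) * dW1
    du2 = dt * (-u2 + dx * W @ f(u2) + np.sqrt(eps) * dx * W12 @ f(u1)) \
        + np.sqrt(eps) * dW2
    u1, u2 = u1 + du1, u2 + du2

# Read out bump positions as the argmax of activity, as described above.
pos1, pos2 = x[np.argmax(u1)], x[np.argmax(u2)]
```

With this small noise amplitude, both bumps remain well above threshold and their argmax positions stay near the initial center over the simulated interval.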

#### **RESULTS**

We will now study how interareal architecture affects the dynamics of bumps in multiple area stochastic neural fields. To start, we demonstrate that, in the absence of reciprocal connectivity between areas, bump attractors exist that are neutrally stable to perturbations that change their position, as has long been known (Amari, 1977; Camperi and Wang, 1998; Ermentrout, 1998). Introducing weak interareal connectivity can decrease the variability in bump position because noise that moves the bumps in opposite directions is canceled by an attractive force introduced by the connectivity. Perturbations that push bumps in the same direction are still integrated, so bumps wander due to dynamic fluctuations, but their effective variance is smaller than it would be without interareal synaptic connections. In the presence of noise correlations between areas, the effects of noise cancelation are weaker, since stochastic forcing in each area is increasingly similar. Our asymptotic analysis explains all of this through the resulting multivariate Ornstein–Uhlenbeck process.

#### **BUMPS IN THE NOISE-FREE SYSTEM**

To begin, we seek stationary solutions to Equation (1) in the absence of interareal connections and noise (ε → 0). Similar analyses have been carried out for bumps in single area populations (Ermentrout, 1998; Hansel and Sompolinsky, 1998). For this study, we assume recurrent connections are identical in all areas (*wjj* = *w*). Relaxing this assumption slightly does not dramatically alter our results. Note first that stationary solutions take the form (*u*1(*x*,*t*), *u*2(*x*,*t*)) = (*U*1(*x*), *U*2(*x*)). In the absence of any interareal connections, we would not necessarily expect the peaks of these bumps to be at the same location. However, translation invariance of the system (Equation 1) allows us to set the center of both bumps to be *x* = 0 to ease calculations. The stationary bump solutions then satisfy the system

$$U\_1 = w \ast f(U\_1), \quad U\_2 = w \ast f(U\_2), \tag{7}$$

so the shape of each bump is only determined by the local connections *w*. For *w* given by Equation (3), since *U*1(*x*) and *U*2(*x*) are assumed to be peaked at *x* = 0, then by also assuming even symmetric solutions, we find

$$U\_1(x) = \left[ \int\_{-\pi}^{\pi} \cos(y) f(U\_1(y)) \mathrm{d}y \right] \cos x,$$

$$U\_2(x) = \left[ \int\_{-\pi}^{\pi} \cos(y) f(U\_2(y)) \mathrm{d}y \right] \cos x, \tag{8}$$

where we use cos(*x* − *y*) = cos *x* cos *y* + sin *x* sin *y*, and the sine terms vanish by the assumed even symmetry. We can more easily compute the precise shape of these bumps in the case of a Heaviside firing rate function (Equation 5). There is then an identical active region for each bump such that *U*1(*x*) > θ and *U*2(*x*) > θ when *x* ∈ (−*a*, *a*), so Equations (8) become *U*1(*x*) = *U*2(*x*) = 2 sin *a* cos *x*. Applying self-consistency, *U*1(±*a*) = *U*2(±*a*) = θ, we can generate an implicit equation for the half-widths *a* of the bumps, given by 2 sin *a* cos *a* = sin(2*a*) = θ. Solving this explicitly for *a*, we find two solutions on *a* ∈ [0, π]: *au* = (1/2) sin<sup>−1</sup> θ and *as* = π/2 − (1/2) sin<sup>−1</sup> θ. Only the bump associated with *as* is stable.
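The self-consistency condition sin(2*a*) = θ can be checked numerically; a short sketch, with θ = 0.5 as an assumed example value:

```python
import numpy as np

theta = 0.5
a_u = 0.5 * np.arcsin(theta)              # narrow, unstable half-width
a_s = np.pi / 2 - 0.5 * np.arcsin(theta)  # wide, stable half-width

# Both roots satisfy the implicit equation sin(2a) = theta.
for a in (a_u, a_s):
    assert abs(np.sin(2 * a) - theta) < 1e-12

# The bump profile U(x) = 2 sin(a) cos(x) crosses threshold at x = +/- a.
def U(xv, a):
    return 2 * np.sin(a) * np.cos(xv)

assert abs(U(a_s, a_s) - theta) < 1e-12
```

The narrow root follows from sin(2*au*) = θ directly, and the wide root from the identity sin(π − φ) = sin φ.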

The bumps (Equation 7) are neutrally stable to perturbations in both directions, which can lead to encoding error once the effects of dynamic fluctuations are considered (Kilpatrick et al., 2013). Since the two areas are uncoupled, examining the bumps' stability can be reduced to studying each bump's stability individually (see Kilpatrick and Ermentrout, 2013 for details). Translations of a bump are generated by a scaling of the spatial derivative *U*′(*x*): we find *uj*(*x*,*t*) = *Uj*(*x*) + ε<sup>1/2</sup>*U*′*j*(*x*)e<sup>λ*t*</sup> is associated with a zero eigenvalue (λ = 0), corresponding to neutral stability. To see this, we plug it into the corresponding bump equation of Equation (1) in the absence of noise and interareal connections and examine the linearization

$$
\lambda U\_j'(\mathbf{x}) = -U\_j'(\mathbf{x}) + \int\_{-\pi}^{\pi} \mathbf{w}(\mathbf{x} - \mathbf{y}) f'(U\_j(\mathbf{y})) U\_j'(\mathbf{y}) d\mathbf{y}. \tag{9}
$$

Note, in the limit of infinite gain γ → ∞, a sigmoid *f* becomes the Heaviside (Equation 5), and

$$f'(U(\mathbf{x})) = \frac{\mathrm{d}H(U(\mathbf{x}))}{\mathrm{d}U} = \frac{\delta(\mathbf{x} - a)}{|U'(a)|} + \frac{\delta(\mathbf{x} + a)}{|U'(a)|},$$

in the sense of distributions. Equation (9) still holds in this case. Differentiating (Equation 7) and integrating by parts, we find

$$-\,U\_1' + w \ast \left[f'(U\_1)U\_1'\right] = 0,$$

$$-\,U\_2' + w \ast \left[f'(U\_2)U\_2'\right] = 0,\tag{10}$$

where the boundary terms vanish due to periodicity of the domain [−π, π]. Thus, the right hand side of Equation (9) vanishes, so λ = 0 is the only eigenvalue corresponding to translating perturbations. Either bump (in area 1 or 2) is therefore neutrally stable to perturbations that shift its position in either direction (rightwards or leftwards), since the bump in each area experiences no force from the other bump.

This changes when we consider the effect of interareal connectivity. Once the two areas of Equation (1) are reciprocally coupled, bumps are stable to perturbations that translate them in opposite directions of one another (see **Figure 1**). Interareal connections act as a restoring force between the two positions of each bump. We will demonstrate this in the subsequent section by deriving a linear stochastic system for the position of either bump in the presence of small noise and weak interareal connectivity. The restorative nature of interareal connectivity is revealed by the negative eigenvalue associated with the interaction matrix (Equation 15) of our stochastic system, as shown in Equation (18).

#### **NOISE-INDUCED WANDERING OF BUMPS**

Now we consider the effects of small noise on the position of bumps in the presence of weak interareal connections. We start by presuming noise generates two distinct effects on the bumps (see **Figure 2**). First, noise causes both bumps to wander away from their initial positions, while still being pulled back into place by the bump in the other area. Bump position in areas 1 and 2 will be described by the time-varying stochastic variables Δ1(*t*) and Δ2(*t*). Second, noise causes fluctuations in the shape of both bumps, described by a correction Φ*j*. To account for this, we consider the ansatz

$$u\_1 = U\_1(\mathbf{x} - \Delta\_1(t)) + \varepsilon^{1/2} \Phi\_1(\mathbf{x} - \Delta\_1(t), t) + \cdots$$

$$u\_2 = U\_2(\mathbf{x} - \Delta\_2(t)) + \varepsilon^{1/2} \Phi\_2(\mathbf{x} - \Delta\_2(t), t) + \cdots \tag{11}$$

Armero et al. (1998) originally developed this approach to analyze front propagation in stochastic PDE models. In stochastic neural fields, it has been modified to analyze wave propagation (Bressloff and Webber, 2012a) and bump wandering (Kilpatrick and Ermentrout, 2013). Plugging the ansatz (Equation 11) into the system (Equation 1) and expanding in powers of ε<sup>1/2</sup>, we find that at *O*(1), we have the bump solution (Equation 7). Proceeding to *O*(ε<sup>1/2</sup>), we find


$$\mathbf{d}\Phi - \mathcal{L}\Phi = \begin{pmatrix} \varepsilon^{-1/2}\dot{\Delta}\_1 U\_1' + \mathbf{d}W\_1\\ \varepsilon^{-1/2}\dot{\Delta}\_2 U\_2' + \mathbf{d}W\_2 \end{pmatrix} + \mathcal{K}(\mathbf{x}, t), \tag{12}$$

where *K*(*x*, *t*) is the 2 × 1 vector function

$$\mathcal{K}(\mathbf{x}, \mathbf{t}) = \begin{pmatrix} \boldsymbol{w}\_{12} \ast \left[ \boldsymbol{f}(\boldsymbol{U}\_{2}) + \boldsymbol{f}'(\boldsymbol{U}\_{2}) \boldsymbol{U}\_{2}' \cdot (\boldsymbol{\Delta}\_{2} - \boldsymbol{\Delta}\_{1}) \right] \mathrm{d}t \\ \boldsymbol{w}\_{21} \ast \left[ \boldsymbol{f}(\boldsymbol{U}\_{1}) + \boldsymbol{f}'(\boldsymbol{U}\_{1}) \boldsymbol{U}\_{1}' \cdot (\boldsymbol{\Delta}\_{1} - \boldsymbol{\Delta}\_{2}) \right] \mathrm{d}t \end{pmatrix}$$

**Φ** = (Φ1(*x*, *t*), Φ2(*x*, *t*))*<sup>T</sup>*; and *L* is the linear operator

$$\mathcal{L}\mathbf{u} = \begin{pmatrix} -u(x) + w(x) \ast \left[ f'(U\_1(x))u(x) \right] \\ -v(x) + w(x) \ast \left[ f'(U\_2(x))v(x) \right] \end{pmatrix}$$

for any vector **u** = (*u*(*x*), *v*(*x*))*<sup>T</sup>* of integrable functions. Note that the nullspace of *L* includes the vectors (*U*′1, 0)*<sup>T</sup>* and (0, *U*′2)*<sup>T</sup>*, due to Equation (10). The last terms in the right hand side vector of Equation (12) arise due to interareal connections. We have linearized them under the assumption that |Δ1 − Δ2| remains small, so

$$\begin{aligned} f(U\_j(\mathfrak{x} + \Delta\_k - \Delta\_j)) &\approx f(U\_j(\mathfrak{x})) \\ &+ f'(U\_j(\mathfrak{x}))U\_j'(\mathfrak{x}) \cdot (\Delta\_k - \Delta\_j), \end{aligned}$$

where *j* = 1, 2 and *k* ≠ *j*. To make sure that a solution to Equation (12) exists, we require the right hand side to be orthogonal to all elements of the null space of the adjoint *L*<sup>∗</sup>, which is defined by

$$\int\_{-\pi}^{\pi} \mathbf{p}^T \mathcal{L} \mathbf{u} \mathbf{d}x = \int\_{-\pi}^{\pi} \mathbf{u}^T \mathcal{L}^\* \mathbf{p} \mathbf{d}x,$$

for any integrable vector **p** = (*p*(*x*), *q*(*x*))*<sup>T</sup>*. It then follows that

$$\mathcal{L}^\* \mathbf{p} = \begin{pmatrix} -p(\mathbf{x}) + f'(U\_1(\mathbf{x}))[\mathbf{w}(\mathbf{x}) \* p(\mathbf{x})] \\ -q(\mathbf{x}) + f'(U\_2(\mathbf{x}))[\mathbf{w}(\mathbf{x}) \* q(\mathbf{x})] \end{pmatrix} . \tag{13}$$

We can show that the nullspace of *L*<sup>∗</sup> contains the vector **f**1 = (*f*′(*U*1)*U*′1, 0)*<sup>T</sup>* by plugging it into Equation (13) to yield

$$\mathcal{L}^\* \mathbf{f}\_1 = \begin{pmatrix} -f'(U\_1)U\_1' + f'(U\_1)[\mathbf{w} \* [f'(U\_1)U\_1']] \\ 0 \end{pmatrix} = \mathbf{0}$$

where **0** = (0, 0)*<sup>T</sup>* and we use Equation (10). We can also show the nullspace of *L*<sup>∗</sup> contains **f**2 = (0, *f*′(*U*2)*U*′2)*<sup>T</sup>* in the same way. Thus, we can ensure Equation (12) has a solution by taking the inner product of both sides of Equation (12) with the two null vectors to yield

$$\left\langle f'(U\_1)U\_1', \; \varepsilon^{-1/2}\dot{\Delta}\_1 U\_1' + \mathrm{d}W\_1 + w\_{12} \ast \left[ f(U\_2) + f'(U\_2)U\_2' \cdot (\Delta\_2 - \Delta\_1) \right] \mathrm{d}t \right\rangle = 0,$$

$$\left\langle f'(U\_2)U\_2', \; \varepsilon^{-1/2}\dot{\Delta}\_2 U\_2' + \mathrm{d}W\_2 + w\_{21} \ast \left[ f(U\_1) + f'(U\_1)U\_1' \cdot (\Delta\_1 - \Delta\_2) \right] \mathrm{d}t \right\rangle = 0,$$

where we define the inner product ⟨*u*, *v*⟩ = ∫<sub>−π</sub><sup>π</sup> *u*(*x*)*v*(*x*)d*x*. Therefore, the stochastic vector **Δ**(*t*) = (Δ1(*t*), Δ2(*t*))*<sup>T</sup>* obeys the multivariate Ornstein–Uhlenbeck process

$$\mathrm{d}\mathbf{\Delta}(t) = \mathbf{K}\mathbf{\Delta}(t)\mathrm{d}t + \mathrm{d}\boldsymbol{\mathcal{W}}(t), \tag{14}$$

where effects of interareal connections are described by the matrix

$$\mathbf{K} = \begin{pmatrix} -\kappa\_1 & \kappa\_1 \\ \kappa\_2 & -\kappa\_2 \end{pmatrix}, \tag{15}$$

with

$$\kappa\_1 = \frac{\langle f'(U\_1)U\_1', \varepsilon^{1/2} w\_{12} \ast \left[ f'(U\_2)U\_2' \right] \rangle}{\langle f'(U\_1)U\_1', U\_1' \rangle},$$

$$\kappa\_2 = \frac{\langle f'(U\_2)U\_2', \varepsilon^{1/2} w\_{21} \ast \left[ f'(U\_1)U\_1' \right] \rangle}{\langle f'(U\_2)U\_2', U\_2' \rangle}, \tag{16}$$

and $(w\_{12} \* f(U\_2)) \cdot U\_1'$ and $(w\_{21} \* f(U\_1)) \cdot U\_2'$ vanish upon integration since they are odd. Noise is described by the vector $\mathrm{d}\mathbf{W}(t) = (\mathrm{d}\mathcal{W}\_1, \mathrm{d}\mathcal{W}\_2)^T$ with

$$\mathrm{d}\mathcal{W}\_{1}(t) = -\varepsilon^{1/2} \frac{\langle f'(U\_{1})U\_{1}', \mathrm{d}W\_{1} \rangle}{\langle f'(U\_{1})U\_{1}', U\_{1}' \rangle},$$

$$\mathrm{d}\mathcal{W}\_{2}(t) = -\varepsilon^{1/2} \frac{\langle f'(U\_{2})U\_{2}', \mathrm{d}W\_{2} \rangle}{\langle f'(U\_{2})U\_{2}', U\_{2}' \rangle}.$$

The white noise vector $\mathbf{W}(t)$ has zero mean $\langle \mathbf{W}(t) \rangle = \mathbf{0}$ and variance described by pure diffusion, so $\langle \mathbf{W}(t)\mathbf{W}^T(t) \rangle = \mathbf{D}t$ with

$$\mathbf{D} = \begin{pmatrix} D\_1 & D\_c \\ D\_c & D\_2 \end{pmatrix} \tag{17}$$

where the associated diffusion coefficients of the variance are

$$D\_1 = \varepsilon \frac{\int\_{-\pi}^{\pi} \int\_{-\pi}^{\pi} F\_1(x) F\_1(y) C\_1(x - y) \mathrm{d}x \mathrm{d}y}{\left[ \int\_{-\pi}^{\pi} F\_1(x) U\_1'(x) \mathrm{d}x \right]^2},$$

$$D\_2 = \varepsilon \frac{\int\_{-\pi}^{\pi} \int\_{-\pi}^{\pi} F\_2(x) F\_2(y) C\_2(x - y) \mathrm{d}x \mathrm{d}y}{\left[ \int\_{-\pi}^{\pi} F\_2(x) U\_2'(x) \mathrm{d}x \right]^2},$$

where $F\_j(x) = f'(U\_j(x))U\_j'(x)$, and covariance is described by the coefficient

$$D\_{c} = \varepsilon \frac{\int\_{-\pi}^{\pi} \int\_{-\pi}^{\pi} F\_1(x) F\_2(y) C\_{c}(x - y) \, \mathrm{d}x \mathrm{d}y}{\left[ \int\_{-\pi}^{\pi} F\_1(x) U\_1'(x) \mathrm{d}x \right] \left[ \int\_{-\pi}^{\pi} F\_2(x) U\_2'(x) \mathrm{d}x \right]}.$$

In the next section, we analyze this stochastic system (Equation 14), showing how coupling between areas can reduce the variability of the bump positions $\Delta\_1(t)$ and $\Delta\_2(t)$.
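As a quick sanity check (our illustration, not part of the original analysis), the reduced system (Equation 14) is easy to simulate with an Euler–Maruyama scheme. All parameter values below are hypothetical; the sketch compares the sample variance of $\Delta\_1(t)$ with the value $D\_1 t$ that an uncoupled bump would accumulate:

```python
import numpy as np

# Illustrative (hypothetical) parameters: coupling rates and noise coefficients.
k1, k2 = 0.5, 0.5            # mean-reversion rates kappa_1, kappa_2
D1, D2, Dc = 1.0, 1.0, 0.2   # diffusion/covariance coefficients, Equation (17)

K = np.array([[-k1, k1], [k2, -k2]])   # drift matrix, Equation (15)
D = np.array([[D1, Dc], [Dc, D2]])     # noise covariance, Equation (17)
C = np.linalg.cholesky(D)              # C @ C.T = D, to draw correlated increments

rng = np.random.default_rng(0)
dt, nsteps, ntrials = 1e-3, 2000, 4000
T = dt * nsteps

# Euler-Maruyama for d Delta = K Delta dt + dW, with <dW dW^T> = D dt
delta = np.zeros((ntrials, 2))
for _ in range(nsteps):
    dW = np.sqrt(dt) * rng.standard_normal((ntrials, 2)) @ C.T
    delta += delta @ K.T * dt + dW

var1 = delta[:, 0].var()
print(var1, D1 * T)  # coupled sample variance vs. the uncoupled value D1*T
```

With these values, the printed sample variance falls well below $D\_1 T$, anticipating the variance reduction analyzed below.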

#### **EFFECT OF COUPLING ON BUMP POSITION VARIANCE**

To analyze the Ornstein–Uhlenbeck process (Equation 14), we start by diagonalizing the matrix $\mathbf{K} = \mathbf{V}\Lambda\mathbf{V}^{-1}$ using the eigenvalue decomposition

$$\Lambda = \begin{pmatrix} 0 & 0\\ 0 & -\kappa\_1 - \kappa\_2 \end{pmatrix}, \quad \mathbf{V} = \begin{pmatrix} 1 & \kappa\_1\\ 1 & -\kappa\_2 \end{pmatrix}, \quad \mathbf{V}^{-1} = \frac{1}{\kappa\_1 + \kappa\_2} \begin{pmatrix} \kappa\_2 & \kappa\_1\\ 1 & -1 \end{pmatrix},\tag{18}$$

such that $\Lambda$ is the diagonal matrix of eigenvalues; columns of $\mathbf{V}$ are right eigenvectors; and rows of $\mathbf{V}^{-1}$ are left eigenvectors. The eigenvalues $\lambda\_1, \lambda\_2$ and eigenvectors $\mathbf{v}\_1, \mathbf{v}\_2$ inform us of the effect of interareal coupling on linear stability. The eigenvalue $\lambda\_1 = 0$ corresponds to the neutral stability of the positions $(\Delta\_1, \Delta\_2)^T$ to translations in the same direction $\mathbf{v}\_1 = (1, 1)^T$. The negative eigenvalue $\lambda\_2 = -(\kappa\_1 + \kappa\_2)$ corresponds to the linear stability introduced by interareal connections: the positions $(\Delta\_1, \Delta\_2)^T$ revert to one another when perturbations translate them in opposite directions $\mathbf{v}\_2 = (\kappa\_1, -\kappa\_2)^T$.
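This eigenstructure can be confirmed numerically; the sketch below (our check, with arbitrary illustrative values of $\kappa\_1, \kappa\_2$) recovers the zero and negative eigenvalues and their eigendirections:

```python
import numpy as np

k1, k2 = 0.3, 0.7   # hypothetical asymmetric coupling rates
K = np.array([[-k1, k1], [k2, -k2]])   # Equation (15)

evals, evecs = np.linalg.eig(K)
order = np.argsort(-evals)             # lambda_1 = 0 first, lambda_2 = -(k1+k2) second
evals, evecs = evals[order], evecs[:, order]

v1 = evecs[:, 0] / evecs[0, 0]   # neutral direction, proportional to (1, 1)^T
v2 = evecs[:, 1] / evecs[0, 1]   # stable direction, proportional to (kappa_1, -kappa_2)^T
print(evals)
print(v1, v2)
```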

Diagonalizing $\mathbf{K} = \mathbf{V}\Lambda\mathbf{V}^{-1}$ using Equation (18), we can compute the mean and variance of the vector $\mathbf{\Delta}(t)$ given by Equation (14). First, note that the mean is $\langle \mathbf{\Delta}(t) \rangle = \mathrm{e}^{\mathbf{K}t}\mathbf{\Delta}(0)$ (Gardiner, 2003), which we can compute as

$$
\langle \mathbf{\Delta}(t) \rangle = \frac{1}{\kappa\_1 + \kappa\_2} \begin{pmatrix}
(\kappa\_2 + \kappa\_1 \mathrm{e}^{\lambda\_2 t}) \Delta\_1(0) + (\kappa\_1 - \kappa\_1 \mathrm{e}^{\lambda\_2 t}) \Delta\_2(0) \\
(\kappa\_2 - \kappa\_2 \mathrm{e}^{\lambda\_2 t}) \Delta\_1(0) + (\kappa\_1 + \kappa\_2 \mathrm{e}^{\lambda\_2 t}) \Delta\_2(0)
\end{pmatrix},
$$

using the diagonalization $\mathrm{e}^{\mathbf{K}t} = \mathbf{V}\mathrm{e}^{\Lambda t}\mathbf{V}^{-1}$. Since $\lambda\_2 = -(\kappa\_1 + \kappa\_2) < 0$,

$$\lim\_{t \to \infty} \langle \mathbf{\Delta}(t) \rangle = \frac{\kappa\_2 \Delta\_1(0) + \kappa\_1 \Delta\_2(0)}{\kappa\_1 + \kappa\_2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Thus, the means of $\Delta\_1(t)$ and $\Delta\_2(t)$ always relax to the same position in the long time limit, due to the linear stability introduced by connections between areas. Under the assumption that both begin at $\Delta\_1(0) = \Delta\_2(0) = 0$, the covariance matrix is given by (Gardiner, 2003)

$$
\langle \mathbf{\Delta}(t)\mathbf{\Delta}^T(t)\rangle = \int\_0^t \mathrm{e}^{\mathbf{K}(t-s)} \mathbf{D} \mathrm{e}^{\mathbf{K}^T(t-s)} \, \mathrm{d}s,\tag{19}
$$

where $\mathbf{D}$ is the covariance coefficient matrix of the white noise vector $\mathbf{W}(t)$ given by Equation (17). To compute Equation (19), we additionally need the diagonalization $\mathbf{K}^T = (\mathbf{V}^{-1})^T \Lambda \mathbf{V}^T$, so $\mathrm{e}^{\mathbf{K}^T t} = (\mathbf{V}^{-1})^T \mathrm{e}^{\Lambda t} \mathbf{V}^T$. After multiplying and integrating (Equation 19), we find the elements of the covariance matrix

$$
\langle \mathbf{\Delta}(t)\mathbf{\Delta}^T(t)\rangle = \begin{pmatrix}
\langle \Delta\_1(t)^2 \rangle & \langle \Delta\_1(t)\Delta\_2(t) \rangle \\
\langle \Delta\_1(t)\Delta\_2(t) \rangle & \langle \Delta\_2(t)^2 \rangle
\end{pmatrix}
$$

are

$$
\langle \Delta\_1(t)^2 \rangle = D\_+ t + 2\kappa\_1 r\_1(t) + \frac{\kappa\_1}{\kappa\_2} r\_2(t) \tag{20}
$$

$$
\langle \Delta\_2(t)^2 \rangle = D\_+ t - 2\kappa\_2 r\_1(t) + \frac{\kappa\_2}{\kappa\_1} r\_2(t) \tag{21}
$$

$$
\langle \Delta\_1(t) \Delta\_2(t) \rangle = D\_+ t + (\kappa\_1 - \kappa\_2) r\_1(t) - r\_2(t),
$$

where the effective diffusion coefficients are

$$D\_{+} = \frac{\kappa\_{2}^{2}D\_{1} + 2\kappa\_{1}\kappa\_{2}D\_{c} + \kappa\_{1}^{2}D\_{2}}{(\kappa\_{1} + \kappa\_{2})^{2}},\tag{22}$$

$$D\_r = \frac{\kappa\_2 D\_1 - \kappa\_1 D\_2 + (\kappa\_1 - \kappa\_2) D\_c}{(\kappa\_1 + \kappa\_2)^2},\tag{23}$$

$$D\_{-}=\frac{D\_{1}-2D\_{c}+D\_{2}}{(\kappa\_{1}+\kappa\_{2})^{2}},\tag{24}$$

so that $D\_+$ and $D\_-$ are variances of noises occurring along the eigendirections $\mathbf{v}\_1$ and $\mathbf{v}\_2$. The functions $r\_1(t), r\_2(t)$ are exponentially saturating:

$$r\_1(t) = \frac{D\_r}{\kappa\_1 + \kappa\_2} \left[ 1 - \mathbf{e}^{-(\kappa\_1 + \kappa\_2)t} \right],$$

$$r\_2(t) = \frac{\kappa\_1 \kappa\_2 D\_-}{2(\kappa\_1 + \kappa\_2)} \left[ 1 - \mathbf{e}^{-2(\kappa\_1 + \kappa\_2)t} \right].$$

The main quantities of interest to us are the variances (Equation 20) and (Equation 21), with which we can make a few observations concerning the effect of interareal connections on the variance of bump positions.

First, note that the long term variance of either bump's position $\Delta\_1(t)$ or $\Delta\_2(t)$ will be the same, described by the averaged diffusion coefficient $D\_+$, since

$$\lim\_{t \to \infty} \langle \Delta\_1(t)^2 \rangle = \lim\_{t \to \infty} \langle \Delta\_2(t)^2 \rangle = D\_+ t. \tag{25}$$

As the effective coupling strengths $\kappa\_j$ are increased, we can expect the variances $\langle \Delta\_j(t)^2 \rangle$ to approach these limits at faster rates, since the other portions of the variance decay at a rate proportional to $|\lambda\_2| = \kappa\_1 + \kappa\_2$.
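As an independent check (ours, not from the original text), the closed forms (Equations 20, 21) can be compared against direct quadrature of the covariance integral (Equation 19), writing $\mathrm{e}^{\mathbf{K}u}$ out from the diagonalization (Equation 18); all parameter values below are hypothetical:

```python
import numpy as np

# Hypothetical asymmetric rates and diffusion coefficients.
k1, k2, D1, D2, Dc = 0.4, 0.9, 1.0, 0.6, 0.1
s = k1 + k2
D = np.array([[D1, Dc], [Dc, D2]])
t = 3.0

def expK(u):
    """e^{Ku} written out from the diagonalization in Equation (18)."""
    e = np.exp(-s * u)
    return np.array([[k2 + k1 * e, k1 - k1 * e],
                     [k2 - k2 * e, k1 + k2 * e]]) / s

# Midpoint-rule quadrature of Equation (19).
n = 20000
us = (np.arange(n) + 0.5) * (t / n)
cov = sum(expK(u) @ D @ expK(u).T for u in us) * (t / n)

# Closed forms from Equations (20)-(24).
Dp = (k2**2 * D1 + 2 * k1 * k2 * Dc + k1**2 * D2) / s**2
Dr = (k2 * D1 - k1 * D2 + (k1 - k2) * Dc) / s**2
Dm = (D1 - 2 * Dc + D2) / s**2
r1 = Dr / s * (1 - np.exp(-s * t))
r2 = k1 * k2 * Dm / (2 * s) * (1 - np.exp(-2 * s * t))
var1 = Dp * t + 2 * k1 * r1 + (k1 / k2) * r2
var2 = Dp * t - 2 * k2 * r1 + (k2 / k1) * r2

print(np.allclose([var1, var2], [cov[0, 0], cov[1, 1]], atol=1e-5))  # True
```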

Next, we study the case where connections between areas are the same ($w\_{12} \equiv w\_{21} = w\_r$) and noise within areas is identical ($D\_1 \equiv D\_2 = D\_l$). Then the mean reversion rates are the same ($\kappa\_1 = \kappa\_2 = \kappa$) and the terms in Equation (23) cancel, so $D\_r = 0$. Thus, across all times $t$, the variances are identical, $\langle \Delta\_1(t)^2 \rangle = \langle \Delta\_2(t)^2 \rangle = \langle \Delta(t)^2 \rangle$, and

$$
\langle \Delta(t)^2 \rangle = \frac{D\_l + D\_c}{2} t + \frac{D\_l - D\_c}{8\kappa} \left[ 1 - e^{-4\kappa t} \right].
$$

This demonstrates the way in which correlated noise ($D\_c$) contributes to the variance. When noise within each area is fully shared ($D\_c \to D\_l$), there is no benefit to interareal coupling and $\langle \Delta(t)^2 \rangle = D\_l t$ (see Kilpatrick and Ermentrout, 2013). However, when any noise is not shared between areas ($D\_c < D\_l$), variance can be reduced by increasing the coupling strength $\kappa$ between areas. The variance $\langle \Delta(t)^2 \rangle$ is monotone decreasing in $\kappa$ since

$$\frac{\partial}{\partial \kappa} \langle \Delta(t)^2 \rangle = \frac{D\_l - D\_c}{8} \cdot \frac{(1 + 4\kappa t)\mathrm{e}^{-4\kappa t} - 1}{\kappa^2} \le 0.$$

The inequality holds because $(1 + 4\kappa t) \le \mathrm{e}^{4\kappa t}$, which is ensured by the Taylor series expansion of $\mathrm{e}^{4\kappa t}$ when $\kappa t > 0$.

Thus, variance is minimized in the limit

$$\lim\_{\kappa \to \infty} \langle \Delta(t)^2 \rangle = \frac{D\_l + D\_c}{2} t. \tag{26}$$

Therefore, strengthening interareal connections in *both* directions reduces the variance in bump position. On the other hand, in the limit of no interareal connections, we find $\lim\_{\kappa \to 0} \langle \Delta(t)^2 \rangle = D\_l t$, and the variance in a bump's position is determined entirely by local sources of noise.
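The monotone effect of $\kappa$ and its two limits can be traced numerically from the symmetric-coupling variance formula above; the values of $D\_l$, $D\_c$, and $t$ below are illustrative:

```python
import numpy as np

Dl, Dc, t = 1.0, 0.2, 2.0   # hypothetical noise levels and observation time

def var_sym(kappa):
    """Symmetric-coupling variance <Delta(t)^2> from the formula above."""
    return (Dl + Dc) / 2 * t + (Dl - Dc) / (8 * kappa) * (1 - np.exp(-4 * kappa * t))

ks = np.linspace(0.05, 5.0, 100)
print(np.all(np.diff(var_sym(ks)) < 0))   # True: monotone decreasing in kappa
print(var_sym(1e-9), Dl * t)              # kappa -> 0 recovers D_l * t
print(var_sym(1e3), (Dl + Dc) / 2 * t)    # kappa -> infinity: Equation (26)
```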

Returning to asymmetric connectivity ($\kappa\_1 \neq \kappa\_2$), we consider the case of feedforward connectivity from area 1 to area 2 ($w\_{12} \equiv 0$, so $\kappa\_1 = 0$), in which $D\_+ = D\_1$ and the formulas for the variances reduce to

$$\begin{aligned} \left< \Delta\_1(t)^2 \right> &= D\_1 t, \\ \left< \Delta\_2(t)^2 \right> &= D\_1 t - \frac{2(D\_1 - D\_c)}{\kappa\_2} \left[ 1 - \mathrm{e}^{-\kappa\_2 t} \right] \\ &+ \frac{D\_1 - 2D\_c + D\_2}{2\kappa\_2} \left[ 1 - \mathrm{e}^{-2\kappa\_2 t} \right], \end{aligned}$$

so the pure diffusive term of both variances is wholly determined by the local noise of area 1. Only the position of the bump in area 2 possesses additional mean-reverting fluctuations, which arise from local sources of noise that force it away from the position of the bump in area 1. In this situation, the variance of the bump position in area 2 is minimized in the limit

$$\lim\_{\kappa\_2 \to \infty} \langle \Delta\_1(t)^2 \rangle = \lim\_{\kappa\_2 \to \infty} \langle \Delta\_2(t)^2 \rangle = D\_1 t.$$

Comparing this with Equation (26), we see that, since $D\_c \le D\_1$, the variances $\langle \Delta\_j(t)^2 \rangle$ will always be higher in this case than in the case of very strong reciprocal coupling between both areas. Averaging information and noise between both areas decreases positional variance more than one area simply receiving noise and information from another. Similar results have recently been identified in the context of studying synchrony of reciprocally coupled noisy oscillators (Ly and Ermentrout, 2010).

One important caveat is that if area 1 has more noise than area 2, the weighting of reciprocal connectivity, $\kappa\_1$ and $\kappa\_2$, should be balanced to minimize the variance. If the average diffusion coefficient $D\_+$ is weighted too heavily toward the area with the larger variance, the area with less intrinsic noise can end up noisier than it would be without reciprocal connectivity. To see this in the extreme case of feedforward coupling, note that if $D\_2 < D\_1$, the long term variance of $\Delta\_2(t)$ grows as $D\_1 t > D\_2 t$. Thus, the variance of $\Delta\_2(t)$ increases compared with the uncoupled case, where $\langle \Delta\_2(t)^2 \rangle = D\_2 t$.

We now derive the optimal weighting of $\kappa\_1$ and $\kappa\_2$ that minimizes the long term variance (Equation 25) for general asymmetric connectivity, in the absence of correlated noise ($D\_c = 0$). To do so, we fix $\kappa\_2$ and find the $\kappa\_1$ that minimizes $D\_+$, which is

$$\kappa\_1 = \kappa\_2 \frac{D\_1}{D\_2}.$$

Thus, for identical noise ($D\_1 = D\_2$), setting $\kappa\_1 = \kappa\_2$ minimizes $D\_+$. For much stronger noise in area 2 ($D\_2 \gg D\_1$), $\kappa\_1$ should be made relatively small. In the case of noise correlations between areas ($D\_c > 0$), the optimal value of $\kappa\_1$ that minimizes (Equation 25) is

$$\kappa\_1 = \kappa\_2 \frac{D\_1 - D\_c}{D\_2 - D\_c}.$$
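This optimal weighting can be verified by brute-force minimization of $D\_+$ over $\kappa\_1$; the sketch below (with hypothetical coefficients) scans a grid and compares against the prediction for $D\_c > 0$:

```python
import numpy as np

# Hypothetical diffusion coefficients and a fixed kappa_2.
D1, D2, Dc, k2 = 1.0, 0.5, 0.1, 1.0

def Dplus(k1):
    """Long-term diffusion coefficient D_+, Equation (22)."""
    return (k2**2 * D1 + 2 * k1 * k2 * Dc + k1**2 * D2) / (k1 + k2)**2

k1s = np.linspace(0.01, 10.0, 200001)
k1_best = k1s[np.argmin(Dplus(k1s))]
k1_theory = k2 * (D1 - Dc) / (D2 - Dc)   # predicted optimum
print(k1_best, k1_theory)                # grid minimum matches the prediction
```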

#### **CALCULATING THE STOCHASTIC MOTION OF BUMPS**

We now compute the effective variances (Equation 20) and (Equation 21), considering the specific case of Heaviside firing rate functions (Equation 5) and cosine synaptic weights (Equations 3, 4). Doing so, we can compare our asymptotic results to those computed from numerical simulations. We compute the mean reversion terms $\kappa\_1$ and $\kappa\_2$ by noting the spatial derivative of each bump will be $U\_1'(x) = U\_2'(x) = -2 \sin a \sin x$ and the null vector components are

$$f'(U\_j(x))U\_j'(x) = \delta(x + a) - \delta(x - a)$$

for $j = 1, 2$. Plugging these formulae into Equation (16), we find $\kappa\_1 = \varepsilon^{1/2}M\_1$ and $\kappa\_2 = \varepsilon^{1/2}M\_2$.

We first consider the case of uncorrelated noise between areas, so $c\_c \equiv 0$, meaning $D\_c = 0$. We can compute the diffusion coefficients associated with the local noise in each area, assuming cosine spatial correlations:

$$D\_1 = \frac{c\_1 \varepsilon}{2 + 2\sqrt{1 - \theta^2}}, \quad D\_2 = \frac{c\_2 \varepsilon}{2 + 2\sqrt{1 - \theta^2}}.\tag{27}$$

We can then compute Equations (20) and (21) directly, for the case of no noise correlations between areas, by plugging in Equation (27).

For symmetric connections between areas, $\kappa = \varepsilon^{1/2}M\_1 = \varepsilon^{1/2}M\_2$, as well as identical noise, $c\_1 = c\_2 = 1$, we have $\langle \Delta\_1(t)^2 \rangle = \langle \Delta\_2(t)^2 \rangle = \langle \Delta(t)^2 \rangle$ and

**FIGURE 3 | Variance in the position of bumps as computed numerically (red shades) and from theory (blue shades) using Equation (28).** Coupling between areas is symmetric, $\sqrt{\varepsilon}w\_{12}(x) = \sqrt{\varepsilon}w\_{21}(x) = \kappa(\cos(x) + 1)$, so $\langle \Delta\_1(t)^2 \rangle = \langle \Delta\_2(t)^2 \rangle$, and there is no shared noise ($c\_c = 0$). **(A)** The increase in variance is slower for stronger amplitudes of interareal coupling $\kappa$. Notice variance climbs sublinearly for $\kappa > 0$, due to the mean-reversion caused by coupling. **(B)** Variance drops considerably more over low values of $\kappa$ than over high values. Other constituent functions and parameters are the same as in **Figure 2**.

$$\langle \Delta \left( t \right)^{2} \rangle = \frac{\varepsilon t}{4(1 + \sqrt{1 - \theta^{2}})} + \frac{\varepsilon}{16(1 + \sqrt{1 - \theta^{2}})\kappa} \left[ 1 - \mathrm{e}^{-4\kappa t} \right]. \tag{28}$$

We compare the formula (28) to results we obtain from numerical simulations in **Figure 3**, finding our asymptotic formula (28) matches quite well. In addition, we compare our results for general (possibly asymmetric) reciprocal connectivity to results from numerical simulations in **Figure 4**. We also show in **Figure 5** that, as predicted, when $\kappa\_2$ is held fixed, there is a finite optimal value of $\kappa\_1$ that minimizes the variance $\langle \Delta\_1(t)^2 \rangle$. Therefore, reciprocal connectivity in multi-area networks should be balanced in order to minimize the positional variance of the stored bump.

Next, we consider the case of correlated noise between areas, so $c\_c > 0$, meaning $D\_c > 0$. In this case, the covariance terms in $D\_+$ and $D\_-$ are non-zero. We can thus compute the diffusion coefficient associated with correlated noise:

$$D\_{c} = \frac{c\_{c}\varepsilon}{2 + 2\sqrt{1 - \theta^2}}.$$

In the case of symmetric connections between areas, $\kappa = \varepsilon^{1/2}M\_1 = \varepsilon^{1/2}M\_2$, and identical internal noise, $c\_1 = c\_2 = 1$, we have $\langle \Delta\_1(t)^2 \rangle = \langle \Delta\_2(t)^2 \rangle = \langle \Delta(t)^2 \rangle$ and

$$
\langle \Delta(t)^2 \rangle = \frac{(1+c\_c)\varepsilon}{4\left(1+\sqrt{1-\theta^2}\right)}t + \frac{(1-c\_c)\varepsilon}{16\left(1+\sqrt{1-\theta^2}\right)\kappa}\left[1-\mathrm{e}^{-4\kappa t}\right],\tag{29}
$$

which reflects the fact that interareal connections do not reduce variability as much when there are strong noise correlations $c\_c$ between areas. We demonstrate the accuracy of the theoretical calculation (Equation 29) as compared to numerical simulations in **Figure 6**. Numerical simulations also reveal that stronger noise correlations between areas diminish the effectiveness of interareal connections at reducing bump position variance.
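Equation (29) makes this diminishing benefit easy to tabulate; assuming illustrative values of $\varepsilon$, $\theta$, and $t$, one can compare weak versus strong coupling at several correlation levels $c\_c$:

```python
import numpy as np

eps, theta, t = 0.04, 0.5, 10.0   # hypothetical parameter values
root = 1 + np.sqrt(1 - theta**2)

def var29(cc, kappa):
    """Symmetric-coupling variance with correlated noise, Equation (29)."""
    return ((1 + cc) * eps * t / (4 * root)
            + (1 - cc) * eps / (16 * root * kappa) * (1 - np.exp(-4 * kappa * t)))

for cc in (0.0, 0.5, 1.0):
    weak, strong = var29(cc, 0.05), var29(cc, 0.5)
    print(cc, weak, strong, weak - strong)  # gain of stronger coupling shrinks with cc
```

For $c\_c = 1$ the printed gain vanishes: fully shared noise removes any benefit of coupling.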

#### **REDUCTION OF BUMP WANDERING IN MULTIPLE AREAS**

We now examine the effect of interareal connections in networks with more than two areas using the system (Equation 6). As with the dual area network without noise or interareal connectivity, stationary bump solutions take the form $(u\_1, \ldots, u\_N) = (U\_1(x), \ldots, U\_N(x))$, and translation invariance lets us set all bump peaks to be located at $x = 0$, so

$$U\_j = \mathbf{w} \* f(U\_j), \quad j = 1, \ldots, N. \tag{30}$$

As before, we presume $w\_{jj} = w$; relaxing this assumption does not dramatically alter our results. Linear stability analysis of bumps proceeds along similar lines to the dual area network, so we omit those calculations and summarize the results. In the absence of interareal connections, each bump is neutrally stable to perturbations in either direction. In the presence of interareal connections, all bumps are only neutrally stable to translations that move them all in the same direction. Therefore, networks with more areas provide perturbation cancelation along more directions.

To study how noise and interareal connections affect the trajectory of bump positions, we again note that noise causes all bumps to wander away from their initial position, while being pulled back into place by projections from other areas (see **Figure 7**). The position of the bump in area $j$ is described by the stochastic variable $\Delta\_j$. Noise also causes fluctuations in the shape

**FIGURE 6 | The advantage of reciprocal coupling is diminished as shared noise between areas is increased.** Numerically computed variances (red shades) match theoretical curves from Equation (29) (blue shades) very well. Reciprocal connectivity reduces variability the most when there is no correlated noise ($c\_c = 0$) between areas. As the shared noise between areas is increased in amplitude ($c\_c = 0.5, 1$), the advantage of reciprocal connectivity is diminished. When $c\_c = 1$, changing $\kappa$ does not affect the variance $\langle \Delta(t)^2 \rangle$ (see formula 29 in the limit $c\_c \to 1$). Other constituent functions and parameters are the same as in **Figure 2**.

of all the bumps, which is described by the correction terms $\Phi\_j$. Therefore, we presume the resulting state of the system satisfies the ansatz

$$u\_j = U\_j(\mathbf{x} - \Delta\_j(t)) + \varepsilon^{1/2} \Phi\_j(\mathbf{x} - \Delta\_j(t), t) + \dotsb \ ,$$

where $j = 1, \ldots, N$. Plugging this ansatz into Equation (6) and expanding in powers of $\varepsilon^{1/2}$, we find that at $\mathcal{O}(1)$ we simply have the system of Equation (30) for the bump solutions. Proceeding to $\mathcal{O}(\varepsilon^{1/2})$, we find

$$\mathbf{d}\Phi - \mathcal{L}\Phi = \mathcal{K}(\mathbf{x}, t) + \begin{pmatrix} \varepsilon^{-1/2} \dot{\Delta}\_1 U\_1' + \mathbf{d}W\_1 \\ \vdots \\ \varepsilon^{-1/2} \dot{\Delta}\_j U\_j' + \mathbf{d}W\_j \\ \vdots \\ \varepsilon^{-1/2} \dot{\Delta}\_N U\_N' + \mathbf{d}W\_N \end{pmatrix} \tag{31}$$

where *K*(*x*, *t*) is an *N* × 1 vector whose *j*th entry is

$$\mathcal{K}\_{\mathfrak{j}} = \sum\_{k \neq j} \mathsf{w}\_{jk} \ast \left[ f(U\_k) + f'(U\_k) U\_k' \cdot (\Delta\_k - \Delta\_j) \right] \mathrm{d}t;$$

$\mathbf{\Phi} = (\Phi\_1(x, t), \cdots, \Phi\_N(x, t))^T$; and $\mathcal{L}$ is the linear operator

$$\mathcal{L}\Psi = \begin{pmatrix} -\Psi\_1(x) + w \ast \left[ f'(U\_1(x)) \Psi\_1(x) \right] \\ \vdots \\ -\Psi\_N(x) + w \ast \left[ f'(U\_N(x)) \Psi\_N(x) \right] \end{pmatrix}$$

for any integrable vector $\mathbf{\Psi} = (\Psi\_1(x), \ldots, \Psi\_N(x))^T$. The nullspace of $\mathcal{L}$ is spanned by the vectors $(U\_1', 0, \ldots, 0)^T$; $(0, U\_2', 0, \ldots, 0)^T$; $\ldots$; and $(0, \ldots, 0, U\_N')^T$, which can be seen

**FIGURE 7 |** With coupling between $N = 3$ areas, the position of bumps 1 (magenta), 2 (cyan), and 3 (green) reverts to one another, so the bump positions in all areas (colored lines) stay close together. We show only the evolution of activity $u(x, t)$. All other parameters are as in **Figure 2**.

by differentiating (Equation 30). The last terms on the right hand side of Equation (31) arise due to interareal connections. We have linearized them under the assumption that $|\Delta\_k - \Delta\_j|$ remains small for all $j, k$. To ensure a solution to Equation (31), we require the right hand side to be orthogonal to all elements of the null space of the adjoint operator $\mathcal{L}^\*$. The adjoint is defined with respect to the inner product

$$\int\_{-\pi}^{\pi} \Upsilon^T \mathcal{L} \Psi \mathrm{d}x = \int\_{-\pi}^{\pi} \Psi^T \mathcal{L}^\* \Upsilon \mathrm{d}x$$

where $\Upsilon = (\Upsilon\_1(x), \ldots, \Upsilon\_N(x))^T$ is integrable. It then follows that

$$
\mathcal{L}^\* \Upsilon = \begin{pmatrix} -\Upsilon\_1(x) + f'(U\_1(x))[w \ast \Upsilon\_1] \\ \vdots \\ -\Upsilon\_N(x) + f'(U\_N(x))[w \ast \Upsilon\_N] \end{pmatrix}.
$$

The nullspace of $\mathcal{L}^\*$ contains the vectors $(f'(U\_1)U\_1', 0, \ldots, 0)^T$; $(0, f'(U\_2)U\_2', 0, \ldots, 0)^T$; $\ldots$; and $(0, \ldots, 0, f'(U\_N)U\_N')^T$, which can be shown by applying $\mathcal{L}^\*$ to them and using the formula generated by differentiating (Equation 30). Thus, to be sure (Equation 31) has a solution, we take the inner product of both sides of the equation with all $N$ null vectors and isolate the $\mathrm{d}\Delta\_j$ terms to yield the multivariate Ornstein–Uhlenbeck process

$$\mathrm{d}\mathbf{\Delta}(t) = \mathbf{K}\mathbf{\Delta}(t)\mathrm{d}t + \mathrm{d}\mathbf{W}(t),\tag{32}$$

where the effects of interareal connections are described by the matrix $\mathbf{K} \in \mathbb{R}^{N \times N}$, whose diagonal and off-diagonal entries are given by

$$K\_{jj} = -\sum\_{k \neq j} \kappa\_{jk}, \qquad K\_{jk} = \kappa\_{jk},$$

for $j = 1, \ldots, N$ and $k \neq j$, where

$$\kappa\_{jk} = \frac{\langle f'(U\_j)U\_j', \, \varepsilon^{1/2} w\_{jk} \* \left[ f'(U\_k) U\_k' \right] \rangle}{\langle f'(U\_j)U\_j', \, U\_j' \rangle},$$

and we have used the fact that $(w\_{jk} \* f(U\_k)) \cdot U\_j'$ is an odd function for all $j, k$, so these terms vanish upon integration. Stochastic forces are described by the vector

$$\mathrm{d}\mathbf{W}(t) = \begin{pmatrix} \mathrm{d}\mathcal{W}\_1(t) \\ \vdots \\ \mathrm{d}\mathcal{W}\_N(t) \end{pmatrix},$$

$$\mathrm{d}\mathcal{W}\_{j}(t) = -\varepsilon^{1/2} \frac{\langle f'(U\_{j}) U'\_{j}, \mathrm{d}W\_{j} \rangle}{\langle f'(U\_{j}) U'\_{j}, \, U'\_{j} \rangle}.$$

The white noise vector $\mathbf{W}(t)$ has zero mean $\langle \mathbf{W}(t) \rangle = \mathbf{0}$ and covariance matrix $\langle \mathbf{W}(t)\mathbf{W}^T(t) \rangle = \mathbf{D}t$, where the associated coefficients of the matrix $\mathbf{D}$ are

$$D\_{jj} = \varepsilon \frac{\int\_{-\pi}^{\pi} \int\_{-\pi}^{\pi} F\_{j}(x) F\_{j}(y) C\_{jj}(x - y) \mathrm{d}x \mathrm{d}y}{\left[ \int\_{-\pi}^{\pi} F\_{j}(x) U\_{j}'(x) \mathrm{d}x \right]^2},$$

where $F\_j(x) = f'(U\_j(x))U\_j'(x)$. These coefficients describe the variance within an area, and

$$D\_{jk} = \varepsilon \frac{\int\_{-\pi}^{\pi} \int\_{-\pi}^{\pi} F\_j(x) F\_k(y) C\_{jk}(x - y) \mathrm{d}x \mathrm{d}y}{\left[ \int\_{-\pi}^{\pi} F\_j(x) U\_j'(x) \mathrm{d}x \right] \left[ \int\_{-\pi}^{\pi} F\_k(x) U\_k'(x) \mathrm{d}x \right]},$$

which describes covariance between areas. Since correlations are symmetric, $C\_{jk}(x) = C\_{kj}(x)$ for all $j, k$, we have $D\_{jk} = D\_{kj}$ for all $j, k$.

A detailed analysis of the linear stochastic system (Equation 32) is difficult without some knowledge of the entries $\kappa\_{jk}$. However, we can make a few general statements. We note that all eigenvalues of $\mathbf{K}$ must have negative real part or be zero, due to the Gerschgorin circle theorem (Feingold and Varga, 1962), which states that all eigenvalues of a matrix $\mathbf{K}$ must lie in one of the disks with center $K\_{jj}$ and radius $\sum\_{k \neq j} |K\_{jk}|$. Since $K\_{jj} = -\sum\_{k \neq j} \kappa\_{jk}$ and $K\_{jk} = \kappa\_{jk}$, then

$$K\_{j\bar{j}} + \sum\_{k \neq j} K\_{jk} = -\sum\_{k \neq j} \kappa\_{jk} + \sum\_{k \neq j} |\kappa\_{jk}| = 0 \tag{33}$$

is the maximal possible eigenvalue, since $\kappa\_{jk} \ge 0$ for all $j, k$. Therefore, we expect $N$ eigenpairs $\lambda\_j, \mathbf{v}\_j$ associated with $\mathbf{K}$, where $\lambda\_N \le \lambda\_{N-1} \le \cdots \le \lambda\_2 \le \lambda\_1 = 0$. This means we can perform the diagonalization $\mathbf{K} = \mathbf{V}\Lambda\mathbf{V}^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues; columns of $\mathbf{V}$ are right eigenvectors; and rows of $\mathbf{V}^{-1}$ are left eigenvectors. Therefore, we can decompose the stochastic solution to Equation (32), when $\mathbf{\Delta}(0) = \mathbf{0}$, as

$$\mathbf{\Delta}(t) = \int\_0^t \mathrm{e}^{\mathbf{K}(t-s)} \, \mathrm{d}\mathbf{W}(s) = \int\_0^t \mathbf{V} \mathrm{e}^{\Lambda(t-s)} \mathbf{V}^{-1} \mathrm{d}\mathbf{W}(s).$$

Thus, as we expect, any stochastic fluctuations in Equation (32) will be integrated or decay over time due to the exponential filters $\mathrm{e}^{\lambda\_j(t-s)}$. In addition, when $\mathbf{\Delta}(0) = \mathbf{0}$, the covariance matrix can be computed as

$$
\langle \mathbf{\Delta}(t)\mathbf{\Delta}^T(t)\rangle = \int\_0^t \mathrm{e}^{\mathbf{K}(t-s)} \mathbf{D} \mathrm{e}^{\mathbf{K}^T(t-s)} \, \mathrm{d}s,\tag{34}
$$

where $\mathbf{D}$ is the matrix of diffusion coefficients for the covariance $\langle \mathbf{W}(t)\mathbf{W}^T(t) \rangle$. We now compute the covariance in the specific case of symmetric connectivity.

In the case of symmetric connectivity between areas, $w\_{jk} = w\_r$ for all $j \neq k$, so $\kappa\_{jk} = \kappa$ for all $j \neq k$. Effects of connectivity between areas are described by the symmetric matrix

$$\mathbf{K} = \kappa J\_N - N\kappa I$$

where $J\_N$ is the $N \times N$ matrix of ones and $I$ is the identity. The eigenvalues of $J\_N$ are $N$, with multiplicity one, and zero, with multiplicity $N - 1$. Thus, the largest eigenvalue of $\mathbf{K} = \kappa J\_N - N\kappa I$ is $\lambda\_1 = 0$, with associated eigenvector $\mathbf{v}\_1 = (1, \ldots, 1)^T$. All other eigenvalues are $\lambda\_j = -N\kappa$ for $j \ge 2$, with associated eigenvectors $\mathbf{v}\_j = \mathbf{e}\_1 - \mathbf{e}\_j$, where $j = 2, \ldots, N$ and $\mathbf{e}\_j$ is the unit vector with a one in the $j$th row and zeros elsewhere. Our diagonalization of the symmetric matrix $\mathbf{K} = \mathbf{K}^T = \mathbf{V}\Lambda\mathbf{V}^{-1}$ then involves the diagonal matrix $\Lambda$ of eigenvalues $\lambda\_j$; the symmetric matrix $\mathbf{V}$ whose columns $\mathbf{v}\_j$ are right eigenvectors; and the symmetric matrix $\mathbf{V}^{-1}$ whose rows are left eigenvectors. The matrix $\mathbf{V}^{-1}$ takes the form

$$\mathbf{V}^{-1} = \frac{1}{N} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & -(N - 1) & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & \cdots & 1 & -(N - 1) \end{pmatrix}.$$
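The stated spectrum of $\mathbf{K} = \kappa J\_N - N\kappa I$ is easy to confirm numerically (arbitrary illustrative $N$ and $\kappa$):

```python
import numpy as np

N, kappa = 5, 0.3   # hypothetical network size and coupling rate
K = kappa * np.ones((N, N)) - N * kappa * np.eye(N)

evals = np.sort(np.linalg.eigvalsh(K))[::-1]   # K is symmetric
print(evals)   # one zero eigenvalue, then N-1 copies of -N*kappa

ones = np.ones(N)
print(np.allclose(K @ ones, 0.0))              # v1 = (1,...,1)^T is neutral
e1_minus_e2 = np.eye(N)[0] - np.eye(N)[1]
print(np.allclose(K @ e1_minus_e2, -N * kappa * e1_minus_e2))  # stable directions
```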

We can thus compute the covariance using the diagonalization $\mathrm{e}^{\mathbf{K}t} = \mathrm{e}^{\mathbf{K}^T t} = \mathbf{V}\mathrm{e}^{\Lambda t}\mathbf{V}^{-1}$. In addition, we will assume each area receives noise with identical statistics ($D\_{jj} = D\_l$) and that there are identical noise correlations between areas ($D\_{jk} = D\_c$ for $j \neq k$), so $\mathbf{D} = (D\_l - D\_c)I + D\_c J\_N$. Multiplying and integrating (Equation 34), we find the diagonal entries (variances) of $\langle \mathbf{\Delta}(t)\mathbf{\Delta}^T(t) \rangle$ are

$$
\langle \Delta\_{j}(t)^2 \rangle = \frac{D\_l + (N-1)D\_c}{N} t + \frac{(N-1)(D\_l - D\_c)}{2N^2 \kappa} \left[ 1 - \mathrm{e}^{-2N\kappa t} \right],\tag{35}
$$

and the off-diagonal entries (true covariances) are

$$
\langle \Delta\_j(t)\Delta\_k(t)\rangle = \frac{D\_l + (N-1)D\_c}{N}t - \frac{(D\_l - D\_c)}{2N^2\kappa} \left[1 - e^{-2N\kappa t}\right].
$$

As revealed by the diffusive term in Equation (35), the system still possesses a rotational symmetry, given by the action of rotating all the bumps in the same direction, so the component of noise along this direction is not damped out by coupling. Note that the long term variance of any bump's position $\Delta\_j(t)$ will be approximately described by the averaged diffusion

$$\lim\_{t \to \infty} \langle \Delta\_j(t)^2 \rangle = \frac{D\_l + (N-1)D\_c}{N} t.$$

As the strength of coupling $\kappa$ or the number of areas $N$ is increased, the variances $\langle \Delta\_j(t)^2 \rangle$ approach this limit at a faster rate, since the other portions of the variance decay at a rate proportional to $|\lambda\_2| = N\kappa$. Note also that in the limit $D\_c \to D\_l$, the effects of coupling are negligible and the long term variance of each bump is determined by the diffusion introduced by its area's internal noise.

Returning to the full variance Equation (35) for symmetric coupling and noise, we make a few observations. First, in the limit of purely correlated noise across areas ($D\_c \to D\_l$), interareal connections have no effect, and $\langle \Delta\_j(t)^2 \rangle = D\_l t$ for all areas and arbitrary coupling strength. However, if there is any independent noise in each area ($D\_c < D\_l$), the variance $\langle \Delta\_j(t)^2 \rangle$ can always be reduced further by increasing the coupling strength or the number of areas, since

$$\frac{\mathrm{d}}{\mathrm{d}\kappa} \langle \Delta\_{j}(t)^2 \rangle = \frac{(N-1)(D\_l - D\_c)}{2N^2} \times \frac{(1+2N\kappa t)\mathrm{e}^{-2N\kappa t} - 1}{\kappa^2} \le 0,$$

where the inequality $(1 + 2N\kappa t) \le \mathrm{e}^{2N\kappa t}$ holds due to the Taylor expansion of $\mathrm{e}^{2N\kappa t}$ when $N\kappa t \ge 0$, and

$$\frac{\mathrm{d}}{\mathrm{d}N} \langle \Delta\_{j}(t)^{2} \rangle = \frac{D\_l - D\_c}{2N^3 \kappa} \left[ \left( N - 2 + 2(N-1)N\kappa t \right) \mathrm{e}^{-2N\kappa t} - \left( N - 2 + 2N\kappa t \right) \right] \le 0$$

**FIGURE 8 | (A)** Variance in the position of the bump in the first area, $\langle \Delta\_1(t)^2 \rangle$, builds up more slowly in networks with more areas $N$, and we expect similar behavior in all other areas. Fixing the strength of interareal connections, $\sqrt{\varepsilon}w\_{jk}(x) = 0.01(\cos(x) + 1)$ for $j \neq k$, we see that increasing $N$ decreases the variance $\langle \Delta\_j(t)^2 \rangle$. **(B)** As in dual area networks, increasing the level of noise correlations between areas diminishes the effectiveness of interareal connectivity as a noise cancelation mechanism. Other parameters are as in **Figure 2**.

when $N \ge 2$, since $D\_l \ge D\_c$ and due to the Taylor expansion of $\mathrm{e}^{-2N\kappa t}$. Note, we have temporarily treated $N$ as a continuous variable. Thus, the variance $\langle \Delta\_j(t)^2 \rangle$ decreases with increasing $\kappa$, and we expect it to decrease with increasing $N$.

We can compute the variance $\langle \Delta\_j(t)^2 \rangle$ explicitly in the case of Heaviside firing rate functions (Equation 5) and cosine synaptic weights (Equations 3, 4). With these assumptions, as well as identical noise to all areas ($c\_{jj} = 1$ for all $j$; $c\_{jk} = c\_c$ for $j \neq k$), we find

$$D\_l = \frac{\varepsilon}{2 + 2\sqrt{1 - \theta^2}}, \quad D\_c = \frac{c\_c \varepsilon}{2 + 2\sqrt{1 - \theta^2}},$$

so that

$$
\langle \Delta\_j(t)^2 \rangle = \frac{(1 + (N - 1)c\_c)\varepsilon}{2N\left(1 + \sqrt{1 - \theta^2}\right)} t + \frac{(N-1)(1 - c\_c)\varepsilon}{4N^2\left(1 + \sqrt{1 - \theta^2}\right)\kappa} \left[ 1 - \mathrm{e}^{-2N\kappa t} \right], \tag{36}
$$

which reflects the fact that increasing the number of areas decreases variability when noise between areas is not too strongly correlated. We demonstrate the accuracy of formula (36) in **Figure 8**. In numerical simulations, as predicted by our asymptotic calculations, the variance grows more slowly in time in networks with more areas.

#### **DISCUSSION**

We have shown that interareal coupling in multi-area stochastic networks can reduce the diffusive wandering of bumps. Since bump attractors offer a well-studied model of the persistent activity underlying spatial working memory (Compte et al., 2000), our results provide a novel suggestion for how memory networks may reduce error. Our calculations have exploited a small noise approximation for the position of the bump in each area (Armero et al., 1998; Bressloff and Webber, 2012a). Assuming connectivity between areas is weak, we have shown that the equations describing bump positions reduce to a multivariate Ornstein–Uhlenbeck process. In this formulation, we find interareal connectivity stabilizes all but one eigendirection in the space of bump position movements. Neutral stability does still exist, so stochastic forces that move bumps in all areas in the same direction do not decay away. However, sources of noise that force bumps in opposite directions create bump movements that decay with time. Thus, interareal connectivity provides a noise cancelation mechanism that operates by stabilizing the bumps in each area against stochastic forces that push them in opposite directions. Polk et al. (2012) recently explored noise correlation statistics that reduce wandering in persistent state networks. Our work complements these results by studying synaptic architectures that limit persistent state diffusion.

Storing spatial working memories with neural activity that spans multiple brain areas serves purposes other than potential noise cancelation. Delayed response tasks that lead to limb motion can generate persistent activity in the parietal cortex (Colby et al., 1996; Pesaran et al., 2002), so that motor responses can be readily executed. In addition, the superior colliculus, an area also thought to underlie directed behavioral responses, demonstrates sustained activity (Basso and Wurtz, 1997). Therefore, activity is distributed between areas providing short term information storage, like prefrontal cortex (Goldman-Rakic, 1995), and those responsible for motor responses and/or behavior. An additional effect of this delegation of activity is that reciprocal connections between areas may provide noise cancelation during the storage period of working memory. However, our work suggests that distributing working memory-serving neural activity between areas that receive strongly correlated noise will not provide cancelation as effectively.

Our work should be contrasted with several other results concerning the stabilization of networks that encode a continuous variable (Koulakov et al., 2002; Goldman et al., 2003; Cain and Shea-Brown, 2012; Kilpatrick et al., 2013). Pure integrators, which are usually line attractors, are notoriously fragile to parametric perturbations, so Koulakov et al. (2002) suggested they may be made more robust by considering networks that integrate in discrete bursts, rather than continuously. This can be implemented by considering a population of bistable neural units so that firing rate integration of a stimulus occurs in a stair-step fashion, rather than a ramp-like fashion (see, e.g., Goldman et al., 2003). Related ideas were recently implemented in a bump attractor model of spatial working memory (Kilpatrick et al., 2013), but quantization was implemented with synaptic architecture rather than single neural unit properties. As opposed to the approach of quantizing the space of possible stimulus representations, we have kept the representation space a continuum. Deleterious effects of noise are reduced by considering reciprocal connectivity between encoding areas that redundantly represent the stimulus. Due to noise cancelations, the encoding error of the network decreases as the number of areas is increased.
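The noise cancelation mechanism described in this discussion can be illustrated with a toy simulation. The sketch below is not the paper's neural field model; it is a hypothetical multivariate Ornstein–Uhlenbeck caricature of the bump positions X<sub>j</sub>, with illustrative parameter values, in which coupling pulls the positions together while each area receives independent noise. The common (mean) mode remains neutrally stable and diffuses, while the difference modes are stabilized and saturate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate OU sketch of N bump positions (hypothetical parameters):
#   dX_j = kappa * sum_k (X_k - X_j) dt + sqrt(2*D) dW_j,
# simulated with Euler-Maruyama across many independent trials.
N, kappa, D = 4, 0.5, 0.01
dt, steps, trials = 0.01, 2000, 500

X = np.zeros((trials, N))
for _ in range(steps):
    # sum_k (X_k - X_j) = N * (mean - X_j): attraction toward the group mean
    drift = kappa * N * (X.mean(axis=1, keepdims=True) - X)
    X += drift * dt + np.sqrt(2 * D * dt) * rng.standard_normal((trials, N))

common = X.mean(axis=1)        # neutrally stable mean mode: diffuses as 2*D*t/N
spread = X - common[:, None]   # stabilized difference modes: variance saturates

# After long times the freely diffusing common mode dominates the saturated
# spread around it -- the cancelation effect described in the text.
assert common.var() > 5 * spread.var()
```

Because the difference modes relax at rate *N*κ, their stationary variance stays of order *D*/(*N*κ) while the common mode's variance keeps growing linearly in time, matching the eigendirection picture above.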

#### **REFERENCES**

Bressloff, P. C., and Webber, M. A. (2012a). Front propagation in stochastic neural fields. *SIAM J. Appl. Dyn. Syst.* 11, 708–740. doi: 10.1137/110851031


Cain, N., and Shea-Brown, E. (2012). Computational models of decision making: integration, stability, and noise. *Curr. Opin. Neurobiol.* 22, 1047–1053. doi: 10.1016/j.conb.2012.04.013


a spatial working memory task. *J. Neurophysiol.* 79, 2919–2940.


excitatory-inhibitory neural fields. *Phys. Rev. Lett.* 107:228103. doi: 10.1103/PhysRevLett.107.228103


tomography study. *Cereb. Cortex* 6, 31–38. doi: 10.1093/cercor/6.1.31


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; paper pending published: 24 May 2013; accepted: 11 June 2013; published online: 01 July 2013.*

*Citation: Kilpatrick ZP (2013) Interareal coupling reduces encoding variability in multi-area models of spatial working memory. Front. Comput. Neurosci. 7:82. doi: 10.3389/fncom.2013.00082*

*Copyright © 2013 Kilpatrick. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*