# QUANTITATIVE ANALYSIS OF NEUROANATOMY

EDITED BY: Julian M. L. Budd, Hermann Cuntz, Stephen J. Eglen and Patrik Krieger

PUBLISHED IN: Frontiers in Neuroanatomy

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-796-5 DOI 10.3389/978-2-88919-796-5


## **QUANTITATIVE ANALYSIS OF NEUROANATOMY**

Topic Editors:

**Julian M. L. Budd**, University of Sussex, UK

**Hermann Cuntz**, Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society and Frankfurt Institute for Advanced Studies, Germany

**Stephen J. Eglen**, University of Cambridge, UK

**Patrik Krieger**, Ruhr University Bochum, Germany

The true revolution in the age of digital neuroanatomy is the ability to extensively quantify anatomical structures and thus investigate structure-function relationships in great detail. Large-scale projects were recently launched with the aim of providing infrastructure for brain simulations. These projects will increase the need for a precise understanding of brain structure, e.g., through statistical analysis and models.

From articles in this Research Topic, we identify three main themes that clearly illustrate how new quantitative approaches are helping advance our understanding of neural structure and function. First, new approaches to reconstruct neurons and circuits from empirical data are aiding neuroanatomical mapping. Second, methods are introduced to improve understanding of the underlying principles of organization. Third, by combining existing knowledge from lower levels of organization, models can be used to make testable predictions about a higher-level organization where knowledge is absent or poor. This latter approach is useful for examining statistical properties of specific network connectivity when current experimental methods have not yet been able to fully reconstruct whole circuits of more than a few hundred neurons.

**Citation:** Budd, J. M. L., Cuntz, H., Eglen, S. J., Krieger, P., eds. (2016). Quantitative Analysis of Neuroanatomy. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-796-5

# Table of Contents


Marcel Oberlaender


Markus Butz, Ines D. Steenbuck and Arjen van Ooyen

*Parametric Anatomical Modeling: a method for modeling the anatomical layout of neurons and their projections*

Martin Pyka, Sebastian Klatt and Sen Cheng

*Saltatory conduction in unmyelinated axons: clustering of Na+ channels on lipid rafts enables micro-saltatory conduction in C-fibers*

Ali Neishabouri and A. Aldo Faisal

## Editorial: Quantitative Analysis of Neuroanatomy

#### Julian M. L. Budd<sup>1</sup>\*, Hermann Cuntz<sup>2,3</sup>, Stephen J. Eglen<sup>4</sup> and Patrik Krieger<sup>5</sup>

<sup>1</sup> Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK, <sup>2</sup> Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt/Main, Germany, <sup>3</sup> Frankfurt Institute for Advanced Studies, Frankfurt/Main, Germany, <sup>4</sup> Department of Applied Mathematics and Theoretical Physics, Cambridge Computational Biology Institute, University of Cambridge, Cambridge, UK, <sup>5</sup> Department of Systems Neuroscience, Medical Faculty, Ruhr University Bochum, Bochum, Germany

Keywords: neuroinformatics, connectome, mathematical modeling, statistical analysis, quantitative methods, computational neuroscience

#### Edited by:

Javier DeFelipe, Cajal Institute, Spain

#### \*Correspondence:

Julian M. L. Budd j.m.l.budd@sussex.ac.uk

Received: 13 October 2015 Accepted: 26 October 2015 Published: 11 November 2015

#### Citation:

Budd JML, Cuntz H, Eglen SJ and Krieger P (2015) Editorial: Quantitative Analysis of Neuroanatomy. Front. Neuroanat. 9:143. doi: 10.3389/fnana.2015.00143

### INTRODUCTION

The true revolution in the age of digital neuroanatomy is the ability to extensively quantify anatomical structures and thus investigate structure-function relationships in great detail. Large-scale projects were recently launched with the aim of providing infrastructure for brain simulations. These projects will increase the need for a precise understanding of brain structure, e.g., through statistical analysis and models.

From articles in this Research Topic, we identify three main themes that clearly illustrate how new quantitative approaches are helping advance our understanding of neural structure and function. First, new approaches to reconstruct neurons and circuits from empirical data are aiding neuroanatomical mapping. Second, methods are introduced to improve understanding of the underlying principles of organization. Third, by combining existing knowledge from lower levels of organization, models can be used to make testable predictions about a higher-level organization where knowledge is absent or poor. This latter approach is useful for examining statistical properties of specific network connectivity when current experimental methods have not yet been able to fully reconstruct whole circuits of more than a few hundred neurons.

### RECONSTRUCTION

The first theme illustrates how novel quantitative anatomical methods are reducing the time and effort taken to reconstruct neurons and networks even when data are incomplete.

Modeling the electrophysiological computations made by single neurons requires a precise reconstruction of their morphology. Blackman et al. (2014) assessed the accuracy of neuronal reconstructions of biocytin-labeled cells against reconstructions from fluorescence-based imaging using 2-photon microscopy. The authors conclude that reconstructions of biocytin-labeled cells reproduce diameter values more accurately, which is particularly crucial for electrophysiology modeling, while faster fluorescence-imaging reconstruction methods are appropriate for tasks such as cell-type classification.

Identifying cell types based on accurate tracings is very time-consuming. From such tracings of microscopic images, it is known that retinal cell types can be identified by their laminar position in the network (Sümbül et al., 2014a). Taking advantage of this link between macroscopic and microscopic features, Sümbül et al. (2014b) show how automated volumetric reconstructions can be performed more rapidly using the fluorescence distribution directly obtained from the image stacks.

Integrating structural and functional data has always been central to reconstructing neural circuits. Here, Ullo et al. (2014) apply a novel method that uses structural data of in vitro neuronal networks to constrain estimates of functional connections underlying spiking data of the same network acquired with microelectrode arrays. The general idea could also apply to the more complex structures in vivo.

Our understanding of synaptic connectivity is largely based on measurements from brain slice preparations. However, some of the complex 3D geometry of neurons is unavoidably lost by slicing. This problem affects connectivity measurements, especially for long-range connections. Two articles address this problem. First, van Pelt et al. (2014) validate a statistical approach (implemented in the NETMORPH software) for inferring complete neuronal reconstructions from incomplete slice data. From these completed neuronal morphologies, the authors explain how this information can be used to estimate connectivity in large-scale networks. Second, Miner and Triesch (2014) use a computational model to propose how differences in the experimental procedure such as slice thickness and sampling area can explain differences observed in experimentally-derived results.

Abnormalities in subcellular organelle morphology and distribution characterize a variety of neuropathological conditions. To aid faster quantification, Perez et al. (2014) combine image processing methods with a supervised, multiresolution machine learning algorithm to automatically segment specific types of cellular organelles (mitochondria, lysosomes, nuclei, and nucleoli) from electron microscopy (EM) image stacks. The authors demonstrate how this approach should generalize to other organelle types and scale to large 3D datasets from serial electron microscopy.

### DESCRIPTION

The second theme in this Research Topic is that mathematical techniques are applied to describe the spatial properties of neurons and networks at a range of scales (Eglen et al., 2008; Hansson et al., 2013). Firstly, two papers used spatial statistics to examine spatial patterning within a region of neural tissue. Anton-Sanchez et al. (2014) studied the spatial distribution of synapses in layers I to VI of rat cortex in three dimensions. They found that synapses are distributed randomly, subject only to not physically overlapping with each other, although density variations were found between different layers. Moving from the distribution of synapses to distributions of neurons, Keeley and Reese (2014) proposed a new metric for evaluating the spatial regularity in two-dimensional distributions of neuronal somata. They suggest a normalization term for the widely used regularity index measure that compares the average and the standard deviation of the distance between cells. Applied to various genetically inbred strains, their new measure is found to be effective in detecting genes controlling spatial patterning using quantitative trait loci approaches.
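As a rough illustration of the regularity index mentioned above, the following Python sketch computes the conventional form of the measure: the mean nearest-neighbour distance divided by its standard deviation. The function names and the input format (a list of 2D soma coordinates) are assumptions made for this example, and the normalization term proposed by Keeley and Reese (2014) is not reproduced here.

```python
import math
from statistics import mean, stdev


def nearest_neighbor_distances(points):
    """Return, for each 2D point, the distance to its closest other point."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


def regularity_index(points):
    """Mean nearest-neighbour distance divided by its sample standard deviation.

    Hypothetical helper for illustration; returns infinity for a perfectly
    regular pattern, where the standard deviation is zero.
    """
    d = nearest_neighbor_distances(points)
    sd = stdev(d)
    return math.inf if sd == 0 else mean(d) / sd
```

Under this definition a nearly evenly spaced mosaic yields a high index, whereas a pattern with clumps and gaps yields a value close to or below 1.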

The papers by Anton-Sanchez et al. (2014) and Keeley and Reese (2014) both analyse the distribution of synapses and neurons by treating these objects as points in space. Polavaram et al. (2014) by contrast studied the detailed morphology of individual neurons to search for key features underlying the variability in axonal and dendritic morphologies. By comparing around 5000 neuron morphologies curated from the Neuromorpho.org repository, they discovered six main morphological classes, with clustering driven mainly by biological factors, such as cell type, rather than technical factors, such as recording laboratory.

The study by Polavaram et al. (2014) highlights the difficulties in performing quantitative analysis of data recorded across many laboratories. Instead of reanalysing raw data, Beul and Hilgetag (2014) performed a detailed literature review of rodent cortex anatomy to evaluate the evidence for a universal "canonical microcircuit" (Douglas and Martin, 2004). Beul and Hilgetag suggest that such a canonical microcircuit, proposed on the basis of data from cat striate cortex, is unlikely to apply to other regions of the cortex where the granular layer (layer 4) is reduced or absent (the "agranular areas" of cortex). Instead, they propose a revised wiring diagram, with reduced inhibition between upper and deep layers in these agranular regions.

### GENERATIVE MODELING

The third and final theme that emerged from this collection of articles concerns the use of generative methods to bridge the gap between single neurons and the overall network structure (Budd and Kisvárday, 2012). Computational models, sometimes based on simple self-organizing principles, exist that reproduce biology at a high level of detail at both the microscopic and macroscopic scales (Schneider et al., 2014). Complementing these approaches, large databases have emerged that embed the biological details captured at the microscopic scale into the context of larger-scale structures (Chiang et al., 2011). The resulting generative approaches allow for a better intuition of the underlying principles for higher-level organization and for making predictions in cases where data are currently sparse or missing.

Egger and colleagues provide a software package NeuroNet that generates a statistical connectome model of the barrel cortex while taking into account statistical measures of synapse and soma distributions as well as a small subset of complete realistic cellular morphologies for all cell types (Egger et al., 2014). The resulting statistical connectome is in line at all scales with experimental anatomical and electrophysiological measurements in barrel cortex and indicates that cortical connections are probabilistic as a function of dendrite and axon overlap.

In the study by Egger et al., registering the reconstructed morphologies and synaptic positions to the barrel outlines is essential to generate their statistical connectome. Along similar lines, the context in the circuit is the defining feature of the study by Torben-Nielsen and De Schutter, which provides a Python software tool called NeuroMaC to generate synthetic dendritic morphologies within the constraints provided by a given piece of tissue (Torben-Nielsen and De Schutter, 2014). The synthetic neuronal morphologies are grown considering both the context of large-scale circuit features and the interactions with other growing dendritic trees in the vicinity.

Aćimović and colleagues study the relationship between dendritic shape and network measures using statistical models at both micro- and macroscopic scales, albeit in a simplified setting that allows analytical solutions to be obtained (Aćimović et al., 2015). For analytical tractability, the small-scale dendritic and axonal shapes are described by density fields that predict network motif distributions and can be compared with experimental data (Song et al., 2005; Rieubland et al., 2014).

Brain disorders are often accompanied by alterations in the higher level context in which neurons are embedded. Using a model of structural plasticity, Butz et al. (2014) analyse how such global structural changes can affect network connectivity. It is argued that local homeostatic structural plasticity mechanisms can cause changes in network topology.

With the parametric anatomical modeling (PAM) technique by Pyka et al. (2014) the precise shape of macroscale anatomical structures is incorporated into neural network models. Using empirically-based mapping rules and connectivity kernels to generate realistic pathway trajectories, spatial connectivity matrices, and axonal conduction distances, Pyka and colleagues make testable predictions for interlaminar connectivity parameter distributions.

Finally, long-range connections require fast and reliable axonal signal propagation, and Neishabouri and Faisal (2014) have studied a recently observed structural formation of proteins and lipids known as lipid rafts (Pristerà et al., 2012) in thin, unmyelinated axons in the peripheral nervous system. Using realistic stochastic modeling of individual ion channels, Neishabouri and Faisal show that while action potential conduction in such systems was reliable, it did not offer any obvious gain in either conduction velocity or metabolic cost over a uniform ion channel density.

### CONCLUSION

With these articles we hope the reader will appreciate that understanding neural structure quantitatively, together with its functional relations, is more than a handle-turning exercise of known algorithms: it is a creative interdisciplinary endeavor drawing on a variety of approaches across different species, brain regions, and spatial scales. Here, authors have managed to coax information out of noisy data obtained at the extremes of methodological resolution; they have discovered new ways of describing anatomical organization; and they have arrived at novel ideas that, when implemented in a generative way, provide an anatomical framework for large-scale network models of the brain. To maximize this creativity from limited funding, there is an obvious need to provide an environment in which individual exploration, data access, collaboration, and reproducibility are made much easier through an open and shared informatics framework (Green et al., 2015).

### ACKNOWLEDGMENTS

HC was funded by the German Federal Ministry of Education and Research grant 01GQ1406.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Budd, Cuntz, Eglen and Krieger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

#### **Arne V. Blackman<sup>1</sup>, Stefan Grabuschnig<sup>2</sup>, Robert Legenstein<sup>2</sup> and P. Jesper Sjöström<sup>1,3</sup>\***

<sup>1</sup> Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK

<sup>2</sup> Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria

<sup>3</sup> Department of Neurology and Neurosurgery, Centre for Research in Neuroscience, The Research Institute of the McGill University Health Centre, Montreal General Hospital, Montreal, QC, Canada

#### **Edited by:**

Hermann Cuntz, Ernst Strüngmann Institute in Cooperation with Max Planck Society, Germany

#### **Reviewed by:**

Dirk Feldmeyer, RWTH Aachen University, Germany

Uygar Sümbül, Massachusetts Institute of Technology, USA

#### **\*Correspondence:**

P. Jesper Sjöström, Centre for Research in Neuroscience, Department of Neurology and Neurosurgery, The Research Institute of the McGill University Health Centre, Montreal General Hospital, 1650 Cedar Ave., Room L7-225, Montreal, QC H3G 1A4, Canada

e-mail: jesper.sjostrom@mcgill.ca

Accurate 3D reconstruction of neurons is vital for applications linking anatomy and physiology. Reconstructions are typically created using Neurolucida after biocytin histology (BH). An alternative inexpensive and fast method is to use freeware such as Neuromantic to reconstruct from fluorescence imaging (FI) stacks acquired using 2-photon laser-scanning microscopy during physiological recording. We compare these two methods with respect to morphometry, cell classification, and multicompartmental modeling in the NEURON simulation environment. Quantitative morphological analysis of the same cells reconstructed using both methods reveals that whilst biocytin reconstructions facilitate tracing of more distal collaterals, both methods are comparable in representing the overall morphology: automated clustering of reconstructions from both methods successfully separates neocortical basket cells from pyramidal cells but not BH from FI reconstructions. BH reconstructions suffer more from tissue shrinkage and compression artifacts than FI reconstructions do. FI reconstructions, on the other hand, consistently have larger process diameters. Consequently, significant differences in NEURON modeling of excitatory post-synaptic potential (EPSP) forward propagation are seen between the two methods, with FI reconstructions exhibiting smaller depolarizations. Simulated action potential backpropagation (bAP), however, is indistinguishable between reconstructions obtained with the two methods. In our hands, BH reconstructions are necessary for NEURON modeling and detailed morphological tracing, and thus remain state of the art, although they are more labor intensive, more expensive, and suffer from a higher failure rate due to the occasional poor outcome of histological processing. 
However, for a subset of anatomical applications such as cell type identification, FI reconstructions are superior, because of indistinguishable classification performance with greater ease of use, essentially 100% success rate, and lower cost.

**Keywords: morphology, reconstruction, cell-type classification, multicompartmental modeling, interneurons, 2-photon imaging, Neurolucida, neocortex**

#### **INTRODUCTION**

Investigations of neuronal morphology have been a key feature of neuroscience since the studies of Ramón y Cajal and before (Ramón y Cajal, 1911; Senft, 2011). More recently, the drive to explain the relationship between neural structure and function has required more accurate and quantifiable models of neural morphology. Such reconstructions are vital across subfields such as cell-type identification (Ascoli et al., 2008), connectomics (Helmstaedter, 2013), computer modeling (Vetter et al., 2001; Sarid et al., 2007; Gidon and Segev, 2012) and studies of morphology itself (Cannon et al., 1999). Depending on the scope of the study, different levels of accuracy, completeness, resolution and throughput of reconstructions may be required; this is reflected in choice of imaging and reconstruction method, from electron microscopy to fluorescence imaging (FI). The development of techniques such as biocytin labeling of physiologically recorded cells, genetic labeling, 2-photon laser-scanning microscopy (2PLSM) and digital analysis have greatly aided efforts to bridge physiology and anatomy (Ascoli, 2006; Svoboda, 2011; Thomson and Armstrong, 2011). Detailed reconstructions, in combination with physiological data, have provided valuable insight into the connectivity, structure and function of neural circuits (Douglas and Martin, 2004). Increases in the number and accessibility of reconstructed neurons promise new approaches; for example, resources such as NeuroMorpho.Org allow researchers access to a large pool of reconstructions from published studies, which can be mined for further data (Ascoli et al., 2007). Use of such interlinked datasets of 3D reconstructions may be key in "big science" initiatives such as the Human Brain Project, and for any project wishing to simulate the brain (Markram, 2013).

Currently, digital reconstructions at the single-cell and microcircuit level are most often created manually using the Neurolucida system with biocytin labeled cells (Halavi et al., 2012). This said, neuronal reconstructions are increasingly based on other methods; for example fluorescent markers have been more frequently used over the past decade, and newer studies take advantage of technologies such as 2PLSM and freeware reconstruction software such as Neuromantic (Buchanan et al., 2012; Halavi et al., 2012; Myatt et al., 2012). However, the use of different reconstruction methods may yield different results. For example, BH based reconstructions can exhibit shrinkage and distortion when compared to reconstructions from 2PLSM FI (Egger et al., 2008). As such, the choice of reconstruction method could have a significant effect in itself on the results of e.g., cell classification and computer modeling. Despite this, there has been little quantification of the effects of method choice on morphological measurements and computer simulations. Here, we compare and contrast 16 reconstructions of the same 8 cells using the currently most popular method—Neurolucida reconstruction of biocytin-filled cells—and one increasing in use—reconstructions from 2PLSM FI stacks. We identify the strengths and weaknesses of either method for specific applications, and we make recommendations as to their appropriate use.

#### **METHODS**

#### **ELECTROPHYSIOLOGY/SLICE PREPARATION**

Procedures conformed to the *UK Animals (Scientific Procedures) Act 1986* and to the standards and guidelines set in place by *the Canadian Council on Animal Care*, with appropriate licenses. Mice aged P12–P20 were anesthetized with isoflurane and decapitated. Brain dissection was performed in ice-cold artificial cerebrospinal fluid (aCSF; in mM: NaCl, 125; KCl, 2.5; MgCl2, 1; NaH2PO4, 1.25; CaCl2, 2; NaHCO3, 26; Dextrose, 25; bubbled with 95% O2/5% CO2). Acute brain slices (visual cortex, near-coronal, 300 µm thick) were prepared with a Leica VT1200S vibratome, and incubated in 37°C aCSF for up to 1 h, after which they were allowed to cool to room temperature. Patch-clamp recordings were then performed in slices in the whole-cell configuration at 32–34°C. Patch pipettes (4–6 MΩ) were produced with a P-1000 electrode puller (Sutter Instruments) from medium-wall capillaries, and held internal solution containing, in mM: KCl, 5; K-Gluconate, 115; K-HEPES, 10; MgATP, 4; NaGTP, 0.3; Na-Phosphocreatine, 10; for imaging/reconstruction: 10–40 µM Alexa Fluor 594 and 0.5–1.0% w/v Biocytin. Internal solution was adjusted with KOH to pH 7.2–7.4. Primary visual cortex was targeted based on the presence of a granular layer 4. All recordings were performed in layer 5 (L5), identified by the presence of large L5 pyramidal cell (PC) somata. L5 PCs were targeted based on a thick apical dendrite; interneurons (INs) were targeted based on small, rounded somata, and were verified by fast-spiking response to rheobase current injection. PCI-6229 boards (National Instruments, Austin, TX) were used for data acquisition, with custom software (Sjöström et al., 2001) running in Igor Pro 6 (WaveMetrics Inc., Lake Oswego, OR). All recordings were made in current clamp and were filtered at 5–6 kHz and acquired at 10 kHz. Neurons were patched at 400X or 600X magnifications using a SliceScope (see below, Scientifica Ltd.) with infrared video Dodt contrast. All recordings were made in the C57BL/6 strain. Electrophysiology procedures were used solely to ascertain cell health, fill cells with dyes and verify cell-type online by inspection of spiking properties.

#### **HISTOLOGICAL PROCESSING AND NEUROLUCIDA RECONSTRUCTION**

After recording, slices were histologically processed to enable biocytin-based reconstructions. Slices were fixed in 4% paraformaldehyde/4% sucrose in phosphate-buffered saline (PBS; pH 7.2–7.4) overnight at 4°C. The following day, slices were washed for 3 × 15 min in PBS. Subsequently, slices were permeabilized in pre-cooled 100% methanol at −20°C for 5–10 min. Slices were then washed in PBS a further 3 × 10 min. Endogenous peroxidases were blocked in 1% H2O2 for 15 min at room temperature. Further 3 × 5 min PBS washes were performed. Slices were then incubated with Vectastain ABC elite kit (Vector Labs) overnight at 4°C. The next day, slices were washed a further 3 × 10 min in PBS, and incubated with ImmPact SG Peroxidase substrate (Vector Labs) to initiate the staining reaction. The staining was stopped with PBS when developed (around 10 min). Further 3 × 5 min PBS washes were performed, and slices were mounted/coverslipped in Mowiol (Sigma-Aldrich). Filled neurons in mounted and coverslipped slices were reconstructed using the Neurolucida system (MBF Bioscience) with a 100× oil-immersion objective. Resulting Neurolucida DAT files were converted to SWC using the freeware NLMorphologyConverter (www.neuronland.org).

#### **2-PHOTON IMAGING AND FLUORESCENCE RECONSTRUCTION**

2PLSM (Denk et al., 1990) was performed using a workstation custom built from a SliceScope (Scientifica) microscope fitted with an MDU (Scientifica), with photomultipliers in epifluorescence configuration. Scanners were Thorlabs GVSM002/M 5-mm galvanometric mirrors. A MaiTai BB (Spectraphysics) Ti:Sa laser tuned to 800–820 nm was used for Alexa 594 excitation. Uniblitz LS6ZM2/VCM-D1 shutters were used to gate the laser, while laser power level was controlled manually using a polarizing beam splitter (Melles Griot PBSH-450-1300-100 with AHWP05M-980 half-wave plate) and monitored using a power meter (Melles Griot 13PEM001/J) after a fraction of the beam was picked off with a glass slide.

PCI-6110 boards (National Instruments) were used to acquire imaging data using custom versions of ScanImage v3.5–3.7 (Pologruto et al., 2003) in Matlab (MathWorks, Natick, MA). 3D image stacks with slices of 512 × 512 pixels were acquired at 2 ms/line with z-steps of 1–2 µm. To reduce noise, each slice of the stack was an average of three frames. Resulting TIFF stacks were subsequently 3D-median filtered for inspection and for figures, but not for the reconstruction process. Stack brightness and contrast were altered in MacBiophotonics ImageJ (www.macbiophotonics.ca). Parameters were chosen to allow visualization and manual tracing of neurites with the least possible artificial enlargement of diameters. Registration of stacks was performed manually in Neuromantic (http://www.reading.ac.uk/neuromantic) and reconstruction of neurons was performed in this environment.

#### **MORPHOLOGICAL ANALYSIS**

Images of reconstructed cells (e.g., **Figure 2**) were rendered using NEURON. Quantitative analysis of reconstructions in SWC format was performed using either L-measure (Scorcioni et al., 2008), for which details of each function are available at http://cng.gmu.edu:8080/Lm/help/index.htm, or with our custom software qMorph written in Igor Pro, previously described in Buchanan et al. (2012). In L-measure, results are for the entire cell (axons and dendrites pooled together). The L-measure function "Length" refers to average compartment length, so in **Table 1** we have referred to this as "Compartment length" for clarity. Custom software was used to create density maps, convex hulls and Sholl analysis (Sholl, 1953). Prior to analysis, morphologies were rotated slightly (16.97 ± 5.36° on average) to align apical dendrite/pial surface directly upward. Morphologies were aligned on the soma for all analyses.

To create density maps, each compartment of a reconstruction was represented by a 2D Gaussian aligned on its XY center, with its amplitude proportional to compartment length and its sigma fixed to 25 µm. These Gaussians were summed to create a smoothed 2D projection of morphology (density map). Axon and dendrite were treated separately. Individual density maps were peak normalized to enable averaging across reconstructions. Symmetry in density maps is a result of mirroring of reconstructions; however, analyses on individual cells were performed on non-mirrored data. Ensemble maps for axon and dendrite were normalized, assigned color lookup tables and merged with a logical OR (e.g., **Figure 2**). Gamma correction was used to better visualize weak densities.
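The density-map construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the qMorph implementation: the function names, the grid extent and the grid spacing are assumptions chosen for the example, and only the 25 µm sigma, the length-proportional amplitude and the peak normalization come from the text.

```python
import math

def density_map(compartments, extent=200.0, step=10.0, sigma=25.0):
    """Sum one 2D Gaussian per compartment and peak-normalize the result.

    compartments: list of (x, y, length) tuples, XY centers and lengths in µm.
    extent and step (µm) define a square grid and are illustrative assumptions.
    Returns (grid, axis), where grid[i][j] is the density at (axis[j], axis[i]).
    """
    n = int(round(2 * extent / step)) + 1
    axis = [-extent + k * step for k in range(n)]
    grid = [[0.0] * n for _ in range(n)]
    for cx, cy, length in compartments:
        for i, y in enumerate(axis):
            for j, x in enumerate(axis):
                r2 = (x - cx) ** 2 + (y - cy) ** 2
                # Amplitude proportional to compartment length, fixed sigma.
                grid[i][j] += length * math.exp(-r2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid, axis

def average_maps(maps):
    """Element-wise mean of several peak-normalized maps of identical size."""
    n_maps = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n_maps for j in range(cols)]
            for i in range(rows)]
```

Peak normalization before averaging means that each reconstruction contributes equally to the ensemble map, regardless of its total process length.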

Convex hulls were created for each reconstruction based on 2D projections of axonal and dendritic arbors, using the giftwrapping algorithm, also known as the Jarvis march (Jarvis, 1973). Ensemble hulls are convex hulls of all hulls of a certain type, including mirror images. Sholl analysis was performed in radial coordinates, moving in increasing 6.5µm steps from *r* = 0, with the origin centered on the cell soma, and counting the number of compartments crossing a given radius. Sholl diagrams are averaged without normalization. Maximum value is the maximum number of crossings, whilst critical radius is the radius at which the maximum number of crossings was found. Maximum Sholl radius is the furthest radius with at least one crossing (the enclosing radius).
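As a rough illustration of the two procedures above, the sketch below implements a gift-wrapping (Jarvis march) hull and a simple Sholl count. It is not the authors' Igor Pro code. The crossing rule (a compartment is counted when its two end points lie on either side of a radius) and the use of 2D projected coordinates relative to the soma are assumptions made for this example.

```python
import math

def jarvis_march(points):
    """Gift-wrapping convex hull of 2D points, returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    start = pts[0]  # leftmost point is always on the hull
    hull, current = [], start
    while len(hull) <= len(pts):  # bound guards against degenerate input
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            c = cross(current, candidate, p)
            # Take p if it lies to the right of current->candidate,
            # or is collinear with it but farther away.
            if c < 0 or (c == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull

def sholl(compartments, step=6.5):
    """Count compartments crossing circles of radius 0, step, 2*step, ...

    compartments: list of ((x1, y1), (x2, y2)) with the soma at the origin.
    Returns (radii, counts, max_value, critical_radius, enclosing_radius).
    """
    dists = [(math.hypot(*a), math.hypot(*b)) for a, b in compartments]
    r_max = max(max(d) for d in dists)
    radii, counts = [], []
    k = 0
    while k * step <= r_max:
        r = k * step
        radii.append(r)
        counts.append(sum(1 for d1, d2 in dists
                          if min(d1, d2) < r <= max(d1, d2)))
        k += 1
    max_value = max(counts)
    critical_radius = radii[counts.index(max_value)]
    enclosing_radius = max((r for r, c in zip(radii, counts) if c > 0),
                           default=0.0)
    return radii, counts, max_value, critical_radius, enclosing_radius
```

With this crossing rule, a compartment that curves across a circle and back between its end points is not counted, which is usually negligible when compartments are short relative to the 6.5 µm step.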

Process diameters were calculated using L-measure to obtain averages of cells (axon and dendrite measured separately). Diameters of visually matched locations between reconstructions of the same cells with different methods were measured manually in Neuromantic.

#### **STATISTICAL COMPARISONS**

Results are reported as mean ± s.e.m. unless otherwise stated. Comparisons were made using paired samples *t*-test for equal means, unless otherwise stated. No corrections for multiple comparisons were applied, as for the purposes of this paper we feel it is more important and preferable to highlight potential differences between methods than to overlook them. Statistical tests were carried out in Igor Pro, Microsoft Excel and/or JMP (SAS). At least three animals were used for each group analyzed, and *n*cell = *n*animal (Aarts et al., 2014). Significance levels *p* < 0.05, *p* < 0.01 and *p* < 0.001 are denoted by one, two, and three stars respectively.
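For readers unfamiliar with the reporting convention, the following sketch shows how mean ± s.e.m., a paired-samples *t* statistic and the star notation relate to one another. The function names are hypothetical, and the *p*-value itself is assumed to come from a statistics package, as it did in the original analysis (Igor Pro, Excel or JMP).

```python
import math

def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)

def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for equal means."""
    diffs = [x - y for x, y in zip(a, b)]
    m, sem = mean_sem(diffs)
    return m / sem, len(diffs) - 1

def stars(p):
    """Map a p-value to the one-, two- or three-star notation."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

Because each cell is reconstructed with both methods, the test operates on within-cell differences, which removes between-cell variability from the comparison.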

#### **DATA CLUSTERING**

Multidimensional hierarchical data clustering was performed on the first two principal components of standardized data in JMP using Ward's method and the Euclidean distance as linkage metric; or normal mixtures iterative clustering, which is based on the expectation-maximization algorithm (http://www.jmp.com/support/help/Normal\_Mixtures.shtml).

Prior to clustering, we performed principal component analysis on all variables listed in **Table 1**. In order to achieve fair weighting of morphological features in clustering, we identified pairs of variables in the resulting correlation matrix where *r* > 0.8, and excluded the variable which had the lower loading value in PCA (Tsiola et al., 2003). Clustering of morphologies was thus performed on the first 2 principal components of 27 measured parameters. From L-measure, we used Diameter, Length, PathDistance, Branch\_Order, Taper\_1, Contraction, Daughter\_Ratio, Parent\_Daughter\_Ratio, Partition\_asymmetry, Bif\_ampl\_local, Helix, Fractal\_Dim. From our custom software qMorph, we used distance to center of axonal cloud, angle to center of axonal cloud, most distal axonal compartment x-coordinate, most distal axonal compartment y-coordinate, most distal dendritic compartment x-coordinate, angle to most distal dendritic compartment, axon hull x-center, axon hull width, dendritic hull x-center, dendritic hull y-center, dendritic hull width, axon Sholl max value, axon Sholl critical radius, dendrite Sholl critical radius, axon Sholl maximum/enclosing radius.
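The variable-selection step can be sketched as follows. This is an assumed reading of the procedure rather than the JMP workflow itself: the PCA loadings are taken as given (supplied by the caller), the variable names in the usage note are hypothetical, and the threshold is applied to *r* as stated in the text rather than to |*r*|.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_correlated(columns, loadings, threshold=0.8):
    """Drop one variable from each pair with r > threshold.

    columns: dict mapping variable name -> list of values (one per cell).
    loadings: dict mapping variable name -> PCA loading (assumed precomputed).
    Within a correlated pair, the variable with the lower absolute loading
    is excluded. Returns the names of the retained variables, in order.
    """
    names = list(columns)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if pearson(columns[a], columns[b]) > threshold:
                dropped.add(a if abs(loadings[a]) < abs(loadings[b]) else b)
    return [name for name in names if name not in dropped]
```

For example, if a hypothetical `hull_width` and `max_radius` correlate at *r* ≈ 1 and `hull_width` has the higher loading, only `max_radius` is removed, so that arbor extent is not counted twice in the clustering.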

#### **SIMULATIONS**

All simulations were performed in NEURON 7.2 (Hines and Carnevale, 1997). Plots were created using a combination of Matlab and Igor Pro.

To explore the differences in the electrical behavior of FI and BH reconstructions of the same original cell, we studied active back propagation of APs and passive forward propagation of EPSPs along the apical dendrite of NEURON models based on these reconstructions. During a simulation, the peak potential at every segment along a path from the soma to the apical tuft was recorded and was plotted against the distance of the recording site from the origination point of the apical dendrite. The distance was measured as the Euclidean distance between the two points in space, and a path from soma to the tip was picked by hand.

#### **Model initialization**

In order to build a model from the reconstructions, the active and passive membrane properties from the model of Stuart and Häusser (2001) were used. The passive membrane properties were initialized with specific membrane and axial resistivities $R_M$ of 12,000 $\Omega\,\mathrm{cm}^2$, $R_A$ of 150 $\Omega\,\mathrm{cm}$ and a specific membrane capacitance $C_M$ of 1 $\mu\mathrm{F}/\mathrm{cm}^2$. Active membrane conductances constituted by mechanisms for fast sodium and slow potassium currents were uniformly distributed over the membrane with $g_{\mathrm{Na}} = 30\ \mathrm{pS}/\mu\mathrm{m}^2$ and $g_{\mathrm{Kv}} = 50\ \mathrm{pS}/\mu\mathrm{m}^2$ in dendrites and at the soma. To avoid end-effects, the sodium conductance in basal dendrites and apical oblique dendrites was reduced to $g_{\mathrm{Na}} = 8\ \mathrm{pS}/\mu\mathrm{m}^2$. In dendrites, all conductances and the capacitance were multiplied by 2 to account for spines. The axon was treated as completely myelinated without spike initiating regions, with $g_{\mathrm{Na}} = 10\ \mathrm{pS}/\mu\mathrm{m}^2$, $g_{\mathrm{Kv}} = 0\ \mathrm{pS}/\mu\mathrm{m}^2$ and a reduced $C_M$ of 0.04 $\mu\mathrm{F}/\mathrm{cm}^2$.

#### **Table 1 | Morphometry.**

Morphological measures used for comparison of reconstruction methods (**Figures 2**, **3**) in pyramidal and basket cells. Measures were generated using either in-house software (see Methods) or L-measure (listed as function names from the software to reduce ambiguity). Comparisons with significance levels p < 0.05 and p < 0.01 are highlighted in green and yellow respectively. (–) indicates unitless measures such as counts and ratios.

#### **Backpropagation of APs**

To standardize across reconstructions, a rheobase spike was generated and recorded. All backpropagation simulations were performed by replaying this spike at the soma. For spike generation, a spike-initiating hillock was added to the reconstruction PC FI 2 (20130205) with $g_{\mathrm{Na}} = 10000\ \mathrm{pS}/\mu\mathrm{m}^2$ and $g_{\mathrm{Kv}} = 500\ \mathrm{pS}/\mu\mathrm{m}^2$. The rheobase spike was then triggered by injection of a 5 ms current of 1.0215 nA.

#### **Forward propagation of EPSPs**

For EPSP generation, an alpha-synapse with a $\tau_{\mathrm{rise}}$ of 0.3 ms, a $\tau_{\mathrm{fall}}$ of 3 ms and a $g_{\max}$ of 5 nS was used. This was inserted at a dendritic location with prominent surrounding morphology, to ensure that it could reliably be positioned at an identical location for both the BH and the FI reconstructions of the same neuron.
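To give a sense of the synaptic conductance time course implied by these parameters, the sketch below assumes a difference-of-exponentials waveform normalized so that its peak equals the stated maximum conductance. The exact functional form used in the simulations is not given in the text, so both the form and the function names here are assumptions.

```python
import math

def time_to_peak(tau_rise=0.3, tau_fall=3.0):
    """Time (ms) at which a difference-of-exponentials conductance peaks."""
    return (tau_rise * tau_fall / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)

def synaptic_conductance(t, tau_rise=0.3, tau_fall=3.0, g_max=5.0, onset=0.0):
    """Conductance (nS) at time t (ms), scaled so that the peak equals g_max."""
    if t < onset:
        return 0.0
    dt = t - onset
    t_peak = time_to_peak(tau_rise, tau_fall)
    # Normalization factor: value of the unscaled waveform at its peak.
    norm = math.exp(-t_peak / tau_fall) - math.exp(-t_peak / tau_rise)
    return g_max * (math.exp(-dt / tau_fall) - math.exp(-dt / tau_rise)) / norm
```

Under this assumption the conductance peaks roughly 0.77 ms after onset, so the synaptic input is brief relative to typical membrane time constants.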

#### **Length constants**

Length constants were determined by injecting a 300-ms-long constant current of 50 pA at matched locations (as with the EPSPs above). When steady state was reached (we arbitrarily picked *t* = 149 ms), the membrane voltage was plotted vs. distance from injection site. Length constants, λ, were measured by fitting exponentials to these plots in Igor Pro.
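The exponential fit can be illustrated with a log-linear least-squares sketch. This stands in for the Igor Pro fit and is not the authors' procedure: it assumes that voltages are expressed as positive deflections from rest and decay toward zero, and the function name is hypothetical.

```python
import math

def fit_length_constant(distances, voltages):
    """Fit V(x) = V0 * exp(-x / lam) by linear regression on ln(V).

    distances: distance from the injection site (µm).
    voltages: steady-state depolarization relative to rest (mV, all > 0).
    Returns (lam, v0).
    """
    ys = [math.log(v) for v in voltages]
    n = len(distances)
    mx = sum(distances) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances, ys))
    slope = sxy / sxx          # equals -1 / lam for an exponential decay
    lam = -1.0 / slope
    v0 = math.exp(my - slope * mx)
    return lam, v0
```

A nonlinear fit with a free offset would be more robust when the voltage does not decay to baseline within the sampled distance, at the cost of one additional parameter.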

#### **RESULTS**

#### **MORPHOMETRIC COMPARISON OF RECONSTRUCTION METHODS**

Neocortical L5 pyramidal cells (PCs) and basket cells (BCs) were targeted based on soma shape and were subsequently identified by spiking properties (data not shown) and morphology. We filled cells with both biocytin and Alexa 594, and reconstructed using Neurolucida software on BH tissue and Neuromantic software on 2PLSM FI stacks, resulting in two morphological reconstructions of each cell (see Methods and **Figure 1**). Subjectively, reconstructions appeared similar with both methods, although BH allowed tracing of horizontal axonal/dendritic collaterals for longer distances (**Figure 2A**), perhaps because thin distal processes dye-filled so slowly that BH but not FI distal tips were readily visualized. In addition, BH involves an amplification step that further improves visualization of poorly labeled processes. PCs were identified by their characteristic apical dendrite, and their axons were largely confined to L5 with the occasional ascending process. BCs were characterized by axonal and dendritic arbors ramifying extensively within L5, with few processes venturing outside this layer.

We quantitatively analyzed morphology with L-measure, a freely available software package for morphological analysis (Scorcioni et al., 2008). Comparison of measurements for entire cells (see **Table 1**) revealed a wider arbor width for BH reconstructions of BCs (*p* < 0.05), and smaller depth (*p* < 0.01) and somatic surface area (*p* < 0.05) for BH reconstructions of PCs (**Table 1**). Whilst a wider arbor width for BH BC reconstructions likely reflects the greater ease of tracing distal collaterals with this method, the smaller depth and somatic surface area of BH PC reconstructions are likely due to shrinkage during fixation and differences in software soma modeling, respectively.

Examination of branch-level and bifurcation-level measures (**Table 1**, see Methods), using L-measure highlighted the general similarity of reconstructions, as most metrics were indistinguishable (**Table 1**). That said, parent-daughter ratio, defined as the ratio of process diameter between daughter and parent at each bifurcation point, was significantly lower for BH PC reconstructions (*p* < 0.05). Local bifurcation amplitude (angle between two new branches at a bifurcation) was also significantly larger for BH BC reconstructions (*p* < 0.05; **Table 1**).

When quantifying morphology, it is often useful to separately analyze axonal and dendritic segments. For example, axonal morphology is thought to be more important than dendritic morphology for IN classification (Markram et al., 2004; Ascoli et al., 2008; DeFelipe et al., 2013). As previously described (Buchanan et al., 2012), we also analyzed morphology by comparison of axonal and dendritic convex hulls and density maps using custom software (**Figure 2B**; **Table 1**; see Methods). Whilst reconstruction with BH allowed tracing of more distal collaterals, reflected by significant differences in mean axon hull width (*p* < 0.01) and distance from soma to the furthest axonal compartment (*p* < 0.05) for PCs, and both axonal (*p* < 0.05) and dendritic (*p* < 0.05) hull width and distance from soma to the furthest dendritic compartment (*p* < 0.05) for BCs, most other measures derived this way were indistinguishable between reconstruction methods (for full detail see **Table 1**). This suggests that FI and BH may perform similarly for cell classification and morphometry that does not rely chiefly on thin distal tips of arborizations. In addition, indistinguishable measures included the relative density and hull centers of axonal and dendritic arbors, indicating that both methods are in fact comparable in revealing the majority of axonal and dendritic morphology.

Angle to the center of the dendritic density cloud was significantly but only slightly different between FI and BH reconstructions for PCs (*p* < 0.05; **Table 1**), but not for BCs. Although significant, this may be a spurious finding, since reconstructions were manually aligned to point straight up, which may introduce human error and a bias. However, this remained significant even when we tried to carefully account for any bias, so we report this as is.

Sholl analysis (Sholl, 1953) is a classical quantitative method used to analyze neuronal morphology based upon the number of crossings made by processes over concentric circles of increasing radius, usually centered on the soma. Sholl analysis indicated that both methods yielded largely similar reconstructions (**Figure 2C**); differences in maximum value and critical radius (see Methods) were not significant for either cell type (**Table 1**). However, the furthest radius with at least one crossing was larger with BH for axon but not dendrite in PCs, and dendrite but not axon for BCs (**Table 1**). This probably reflects both the capacity to visualize more distal processes with BH, and shrinkage or compression of BH-processed slices after coverslipping. Compression results in smaller depth of BH reconstructions and in expansion in the XY axes (see **Table 1**).

Overall, whilst BH allows better reconstruction of very distal processes, seen in e.g., wider arbor extents and maximum Sholl radii, reconstructions were largely indistinguishable between methods (**Table 1**), indicating that both methods are suitable for analysis of morphology. Although FI/2PLSM based reconstructions are limited by the extent of imaging captured, it may be possible to recover more distal processes using this method by capturing images from a wider area, even if there does not appear to be fluorescence signal when viewing online (see area imaged for FI reconstructions, **Figure 2B**).

When investigating neural circuits, it is vital to properly identify anatomical cell type as, for example, synaptic features may differ widely at connections between different cells (Ascoli et al., 2008; Blackman et al., 2013; DeFelipe et al., 2013). We explored the impact of reconstruction method on cell classification using multidimensional hierarchical clustering of all reconstructions from both methods (see Methods and **Figures 3A,B**). This approach independently segregated reconstructions into two major clusters, each containing exclusively BCs or PCs. Within the two BC and PC clusters, however, reconstructions from BH or FI did not further segregate into distinct sub-clusters. Taken together, these results suggest that both reconstruction methods produce enough detail to reliably classify different neuronal types, while at the same time being so similar in terms of outcome that the choice of method does not impact cell classification appreciably. This said, a pair of reconstructions of the same cell using BH and FI formed a nearest-linkage neighbor in only one case (BC 2; **Figure 3A**), highlighting that whilst classification performance was similar between methods, there were still appreciable morphological differences between reconstructions of the same cell completed with BH or FI. Clustering of all reconstructions into two groups using the expectation-maximization algorithm (normal mixtures clustering in JMP) also separated PCs and BCs with no errors (**Figure 3B**). Whilst clustering of morphologies resulted in two major cell classes here, it should be noted that both PCs (Groh et al., 2010) and BCs (Markram et al., 2004) may consist of further subtypes.

#### **RECONSTRUCTIONS FROM 2PLSM HAVE LARGER PROCESS DIAMETER**

When creating 3D reconstructions of neurons to be used for e.g., computer modeling, it is important for these to be as accurate as possible, as even quite subtle structural differences can have quite dramatic effects on biophysical properties (Vetter et al., 2001; Schaefer et al., 2003). For example, differences in process diameter between reconstructions will affect membrane surface area, process volume, number of ion channels, axial resistance, length constant, and in turn propagation of electrical signals. Changes in laser power during acquisition of fluorescence images and image processing prior to reconstruction when using 2PLSM/FI may have affected reconstructed process diameter. Comparison of reconstructions based on FI or biocytin histology (BH) revealed a significant trend for those created using 2PLSM/FI to have larger process diameter than those based on BH (**Figure 4**).
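To make the dependence on diameter concrete, the sketch below evaluates the textbook steady-state length constant of an infinite passive cylinder, $\lambda = \sqrt{(d/4)\,R_M/R_A}$, using the passive parameters given in Methods. It is an idealization offered for illustration only: real dendrites branch and taper, the factor-of-2 spine correction is ignored, and the function name is hypothetical. The example diameters are the mean dendritic values quoted in the Figure 4 legend (1.65 µm vs. 0.84 µm).

```python
import math

def length_constant_um(diameter_um, r_m=12000.0, r_a=150.0):
    """Length constant (µm) of an infinite passive cylinder.

    diameter_um: process diameter in µm.
    r_m: specific membrane resistivity in ohm cm^2 (value from Methods).
    r_a: axial resistivity in ohm cm (value from Methods).
    """
    d_cm = diameter_um * 1e-4          # convert µm to cm
    lam_cm = math.sqrt(r_m * d_cm / (4.0 * r_a))
    return lam_cm * 1e4                # convert cm back to µm

lam_thin = length_constant_um(0.84)    # about 410 µm
lam_thick = length_constant_um(1.65)   # about 574 µm
ratio = lam_thick / lam_thin           # equals sqrt(1.65 / 0.84), about 1.40
```

Because the length constant scales with the square root of diameter, a roughly twofold difference in diameter corresponds to an approximately 40% difference in the length constant of an idealized cable.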

We compared differences in average process diameter between the two reconstruction methods using L-measure. Diameter was consistently significantly larger for reconstructions made using FI for axonal and dendritic compartments of both cell types (**Figure 4B**). Differences in process diameter between reconstruction methods were investigated in more detail by comparing the diameter of many individually matched compartments for each PC dendrite using manual measurements (**Figures 4C,D**). All but one of the matched segments had a larger diameter when reconstructed from 2PLSM stacks (*n* = 25; *n* = 5 cells; FI vs. BH, 1.80 ± 0.15 µm vs. 0.91 ± 0.09 µm; *p* < 0.001). Taken together, these results show that FI reconstructions consistently exaggerate compartment diameter, on average and also typically for individual compartments.

independently segregated all reconstructed cells into two major clusters, each exclusively containing PCs or BCs. Further subclusters did not segregate reconstructions from FI or BH. Taken together, this indicates their similarity for morphological cell classification. Each label on the y-axis is a

separated BCs from PCs. Crosses denote BCs, and dots PCs. As in **(A)**, coloring indicates reconstruction method (blue or yellow = FI; green or red = BH). Ovals denote the region where 90% of observations in each cluster are expected to fall.

axon 1.20 ± 0.14 µm vs. 0.67 ± 0.04 µm, p < 0.05; dendrite 1.65 ± 0.17 µm vs. 0.84 ± 0.03 µm, p < 0.01) and BCs (n = 3 cell pairs; axon 0.89 ± 0.04 µm vs. 0.55 ± 0.04 µm, p < 0.05; dendrite 1.40 ± 0.16 µm vs. 0.71 ± 0.03 µm, p < 0.05). Average diameters for entire cells are found in **Table 1**. **(C)** Differences in diameter for manually matched dendritic locations using either method (see **Figure 3A**). All but one

#### **EFFECT OF RECONSTRUCTION METHOD ON SINGLE-CELL MODELING**

A major use of 3D reconstructions of neurons is in single-cell and network modeling, using software such as NEURON (for review, see Brette et al., 2007). Differences between reconstruction methods, particularly in features such as process diameter, are expected to have considerable effects on the results of such modeling (Vetter et al., 2001; Tsay and Yuste, 2002; Acker and White, 2007). Complete morphological reconstruction may be vital for accurate simulation of features such as PC coincidence detection (Schaefer et al., 2003) or responses to stimulation such as whisker deflection (Sarid et al., 2013). To quantify these effects, we examined the effect of reconstruction method choice on single-cell modeling of action potential backpropagation (bAP) and EPSP forward propagation in the NEURON simulation environment (**Figure 5**), comparing models of the same cells based on morphologies generated using either BH or FI. To investigate bAP simulations, we generated a rheobase spike at the soma of each model and recorded the resulting peak potentials in the apical dendrite at given distances away from the soma (Methods, **Figures 5A,B**). Interestingly, whilst models based on FI reconstructions exhibited a small trend for smaller depolarizations, this was indistinguishable between methods at all locations (**Figure 5B**). The effect of reconstruction method on modeling may thus be subtle and dependent on which aspects one is investigating. We should also point out that these findings might depend on the choice of model parameters; modifying the degree of dendritic excitability, for example, is not unlikely to bring out other differences.

that FI reconstructions consistently suffer from exaggerated process diameters. The upper and lower dotted lines indicate ±2SD and the 95% limits of agreement (SD = 0.64 µm). Linear regression (not shown) identified a significant slope (0.56; p < 0.05), showing that FI reconstruction overestimates diameters more for larger diameters. *p < 0.05, **p < 0.01.

Next, we investigated simulation of EPSP forward propagation. Here, we generated simulated EPSPs using the same parameters (see Methods) at matched locations on FI and BH reconstructions of the same cells, and measured resulting peak depolarizations across the morphology. Ensemble averaging of results revealed that simulations in FI reconstructions yielded smaller depolarizations (**Figure 5C**; areas where *p* < 0.05 indicated by bar). As EPSPs were generated at different distances from the soma in different cells, normalization of results to the somato-synaptic distance revealed the differences better, with FI reconstructions generating considerably smaller depolarizations (peak potential; BH vs. FI; 15.65 ± 1.63 mV vs. 6.27 ± 0.33 mV; *p* < 0.01; other areas of significance where *p* < 0.05 indicated by black bar in **Figure 5C**).

As systematic differences in process diameter may be expected to affect the spatial rate of voltage decay for both bAPs and EPSPs (Segev, 1998), we measured the length constant in each reconstruction (see Methods) and compared this between BH and FI. Surprisingly, the length constant did not vary significantly between methods (λ<sub>BH</sub> = 308.518 ± 46.319 µm vs. λ<sub>FI</sub> = 321.128 ± 65.185 µm, *p* = 0.80), despite FI systematically overestimating process diameters (see above). Presumably, this was because of other non-systematic differences between reconstruction methods and general variability that overshadowed the effect of diameter on length constant.

Overall, whilst differences in simulated bAPs were marked but not systematically different, there was a dramatic and consistent difference between methods in EPSP simulation, with FI reconstructions exhibiting smaller depolarizations in response to the same simulated synaptic stimulation. We therefore conclude that FI reconstructions are generally not suitable for multicompartmental computer modeling.

depolarization and local differences for the FI reconstruction. Arrows indicate the location of simulated synapses. Distal branches of morphologies are slightly cropped for clarity. **(B)** Ensemble averages of

simulated synapse. Region of significance is indicated by black bar (paired t-test, p < 0.05).

#### **DISCUSSION**

In this paper, we have quantified the effect of reconstruction method choice on morphometry and computer modeling by direct comparison of cells reconstructed using two commonly used methods. The first method, BH, has been well established for many years and is widely considered state of the art, for several good reasons. The other method, FI, is rapidly gaining in popularity, which is why it is important to know its pitfalls as well as its advantages in comparison to BH. By comparing these two methods, we have identified strengths and limitations of each method for such purposes, and we can in turn make recommendations as to the suitability of each for different applications. According to our results, FI is as a rule of thumb preferable for cell-type classification scenarios, whilst BH is superior for multicompartmental modeling and other applications requiring highly detailed tracing of thin arborizations with accurate diameter measurements.

#### **QUANTITATIVE MORPHOLOGICAL ANALYSIS AND CELL-TYPE CLASSIFICATION**

One of the most common uses of 3D reconstructions such as those compared here is analysis of morphology, particularly in order to establish cell type. For example, axonal morphology is often cited as the most important determinant of cortical IN cell type (Markram et al., 2004; Wang et al., 2004; Toledo-Rodriguez et al., 2005; Ascoli et al., 2008). Increasingly, many properties of neural circuits such as synapse type and ion channel expression are found to be dependent on anatomical cell class (Blackman et al., 2013); therefore it is vital to accurately verify morphological type in any study where there may be cell-type-specific differences.

Our results indicate that FI and BH reconstructions are equal in providing an accurate representation of local morphology, with most morphological measures being indistinguishable between the two (**Table 1**). Unsupervised clustering results in successful separation of cell type in both methods (**Figure 3**). Whilst both methods appear to generate equivalent results for this purpose, FI reconstructions may confer a number of benefits that make them preferable in cell classification. Firstly, FI reconstructions, due to the ability to monitor FI online during electrophysiology experiments, effectively have a 100% yield for most purposes, as compared to the 50–80% yield of BH in our hands, which is dependent on post-recording histology (**Figure 1**). The lower yield of BH is highly dependent on the experimenter's experience and training with this state-of-the-art method, as well as on other factors such as cell type and age of the brain tissue. Although the yield can clearly be improved with experience and training, it will never reach 100%. FI-based reconstructions, however, are in our hands quite straightforward and are in fact an excellent training opportunity for volunteering undergraduate students who are just starting to work in a lab. In addition, with FI, cell type may also be subjectively identified online whilst recording, increasing the throughput of electrophysiology experiments targeting a particular cell type. Furthermore, the unwanted distortions and shrinkage seen with BH reconstructions are avoided when using FI.

With all methodological comparisons, it is important to consider the costs involved. As FI reconstructions do not require histological processing or a dedicated setup for reconstruction, and image stacks can be acquired at the same time as electrophysiological recording, the time to generate a single reconstruction is much less than with BH, which can translate into saving running costs. Furthermore, FI reconstructions require less auxiliary equipment and use of consumables than BH reconstructions, resulting in lower cost per reconstruction. FI reconstructions do, however, require the initial high setup cost of the laser-scanning microscope, so this reasoning only applies for labs that already have access to 2PLSM or to confocal imaging. In our eyes, these benefits, together with the almost equal performance of FI and BH in revealing local morphology, make FI the preferred method in studies focusing on cell-type classification. This said, some cell types may extend over much larger areas than those described here (Lichtman and Denk, 2011). Whilst increasing fluorophore concentration, fill time and area imaged may increase the visible extent of FI reconstructions (see **Figure 2**), our results show that BH reconstructions reveal more distal processes (**Table 1**; e.g., hull width, max. Sholl radius, etc.), and therefore may be preferable if reconstruction over large distances is required. Even so, FI of axonal arborizations ranging several millimeters has successfully been carried out (see for example Pressler and Strowbridge, 2006; Williams et al., 2007), suggesting that this problem is possible to overcome by fine-tuning the FI reconstruction method. Mapping connectivity on larger scales using FI may be possible with whole-brain methods such as serial two-photon tomography (Ragan et al., 2012; Osten and Margrie, 2013).

#### **MULTICOMPARTMENTAL COMPUTER MODELING**

Another major use of 3D reconstructions is in single-cell multicompartmental modeling. In this application, accuracy is paramount; even subtle differences in morphology may have considerable effects on both passive and active properties of neurons and models (Segev et al., 1995; Vetter et al., 2001). For example, dendritic morphology is thought to play a key role in the level of coupling in cortical pyramidal cell coincidence detection (Schaefer et al., 2003). Our results reveal that differences in morphology resulting from reconstruction method choice alone have large and significant effects on simulation of EPSP propagation. FI reconstructions consistently exhibit much smaller depolarizations than BH reconstructions (**Figure 5**).

The major contributing factor to these results is likely the large differences in dendritic diameter obtained between the two methods. Differences in measured process diameter alone would affect models of e.g., synaptic efficacy (Holmes, 1989) and voltage attenuation (Stuart and Spruston, 1998). Our results show that FI reconstructions consistently and significantly have larger process diameters, both on average and for matched compartments. As both BH and FI methods allow visualization of spines and axonal varicosities, a lack of spine detection is unlikely to be the cause of the larger diameters seen in FI. This finding is not unexpected, however, since increasing the laser power during acquisition of 2PLSM fluorescence images typically results in an apparent thickening of dendrites and axons. Neurite diameters obtained with 2PLSM are also subjectively affected by brightness/contrast settings during the reconstruction procedure, with a tendency for broadening of diameters when adjusting look-up tables to compensate for weak fluorescence. This problem seems much smaller with BH, presumably because the contrast produced with the histological amplification process is generally quite sufficient in and of itself. Due to the wavelength used, the theoretical resolution limit of light microscopes is also better than that of 2PLSM. This difference is compounded by the typical usage of high numerical aperture oil-immersion objectives with BH.

Although we have not tested this, we suspect that the neurite thickening problem might be considerably smaller with confocal microscopy than with 2PLSM, since its resolution limit is much better. It would be interesting to see a side-by-side comparison of FI reconstructions from 2PLSM and confocal microscopy stacks.

As diameter appears to be the main contributing factor for differences in computer modeling between FI and BH reconstructions of the same cells, it may be possible to correct for this, assuming that the differences are systematic. Preliminary results using a correction factor determined from differences in diameter of matched compartments suggest that it is possible to recover EPSP amplitudes in FI reconstructions to the levels seen with BH by manipulating diameter alone (data not shown). However, whilst it may be possible to determine specific correction parameters for a particular setup and experimenter by directly comparing diameter differences, these parameters may not be the same in alternate situations. For example, wide inter-experimenter differences in diameter and simulation results have been described when reconstructing from multiphoton data (Losavio et al., 2008). Another important factor to consider is that without technically demanding dendritic recordings, it is difficult to ascertain completely the ground truth, i.e., which of BH or FI is closer to reality. This said, the higher resolution and better signal-to-noise ratio found with BH justify its position as a gold standard, and as such BH reconstructions can be considered a benchmark.

Because of the factors described above, and the large differences between EPSP modeling with FI and BH reconstructions, we recommend the use of BH in all multicompartmental modeling applications. This is further supported by the greater morphological detail revealed in BH reconstructions; it has been shown that even small differences in dendritic arborization may have large effects on the physiological properties of pyramidal cells (Schaefer et al., 2003), and simulations of such properties should therefore be based on the most accurate and complete morphological reconstructions possible. In contrast to neurite diameters and number of branches, the distortions and shrinkage seen with BH reconstructions are not likely to affect simulations much, and are therefore less of an issue for modeling than for morphometric applications (Schaefer et al., 2003). Until resolution-limit-breaking FI reconstruction methods (see below) become commonplace, BH-based reconstructions are likely to remain state of the art for all multicompartmental computer-modeling applications.

#### **ALTERNATIVE APPROACHES AND IMPROVEMENTS**

In this study we have chosen to focus on two commonly used methods to reconstruct detailed morphologies of single neurons, in order to provide a broadly applicable comparison of their strengths and weaknesses. However, a range of alternative methods are becoming increasingly available which may offer means to address some of the problems identified here, although these are often far more expensive, technically demanding and time-consuming.

For FI reconstructions, a key issue identified in this study is a potential lack of accuracy at levels of high detail, due to scattering of laser light in brain tissue, effects of image processing and a worse resolution limit than light microscopy. FI under the diffraction limit is however possible with super-resolution techniques such as structured illumination microscopy (SIM) or stimulated emission depletion (STED) (Hell, 2007; Ding et al., 2009; Evanko, 2009) and such methods potentially offer the ability to produce reconstructions at a detail suitable for accurate NEURON modeling using 2PLSM, although this would incur higher costs. An alternative way to create highly detailed reconstructions from FI is to use microinjection of fluorescent dyes in fixed tissue followed by confocal microscopy with deconvolution, although with this method anatomy cannot be combined with electrophysiology (Dumitriu et al., 2011). As noted above, confocal FI imaging may in general produce reconstructions with different properties to the 2PLSM derived reconstructions used here.

In contrast, a potential shortcoming of BH reconstructions identified in this study is the propensity to be affected by tissue distortions and deformations, particularly in the z-axis. Furthermore, there is a risk with BH of reconstructing from incompletely processed tissue—especially when a novice is first learning to use the technique—which may skew results. Recently, an improved biocytin staining protocol with slow dehydration and using the embedding medium Eukitt has been shared, which preserves some cytoarchitectonic features and allows for easier shrinkage correction in all dimensions (Marx et al., 2012). Compared with the far more common method used here, this may result in more realistic morphologies and allow for layer and area-specific morphometry without the use of markers such as cytochrome c oxidase. This method would also presumably result in even more accurate morphologies to be used in NEURON modeling. This said, it is not currently widely used and requires many more reagents than the standard protocol used in this study.

#### **CONCLUDING REMARKS**

In this study, we have quantitatively compared reconstructions from two popular methods (FI and BH) and identified consistent and significant differences in aspects of their resulting morphologies and use in computer modeling. Whilst both methods perform similarly for many morphological applications including cell classification, BH reconstructions reveal more distal neurites but suffer from compression and distortion artifacts. In computer modeling, FI reconstructions result in smaller simulated EPSPs, primarily due to the systematically larger diameters of cells reconstructed with this method. Therefore, care must be taken in reconstruction method choice for a particular application. In modeling studies particularly, mixing reconstructions from different methods may introduce measurable differences that do not reflect the underlying physiology and anatomy. In our hands, BH reconstructions are the gold standard for accuracy; however, FI reconstructions are preferable for cell classification applications due to lower cost, higher throughput, and ease of use.

#### **AUTHOR CONTRIBUTIONS**

Arne V. Blackman carried out experiments and morphometry. Stefan Grabuschnig did NEURON simulations. All authors contributed to analysis. Arne V. Blackman wrote the manuscript with input from co-authors.

#### **ACKNOWLEDGMENTS**

We thank Alanna Watt, Tom Mrsic-Flogel, Sonja Hofer, Simon Schultz, Julia Oyrer and Mark Farrant for help and useful discussions. We thank Michael Häusser and Troy Margrie for lending their Neurolucida setups, and Scientifica for lending electrophysiology and imaging equipment. This work was supported by a BBSRC Industrial CASE studentship BB/H016600/1 (Arne V. Blackman) that was co-funded by Scientifica, EU FP7 Future Emergent Technologies grant 243914 "Brain-i-nets" (Robert Legenstein and P. Jesper Sjöström), European Union project #269921 "BrainScaleS" (Stefan Grabuschnig and Robert Legenstein), CIHR OG 126137 (P. Jesper Sjöström), and NSERC DG 418546-12 (P. Jesper Sjöström).

#### **REFERENCES**


collaterals in the inner molecular layer. *J. Neurosci.* 27, 13756–13761. doi: 10.1523/JNEUROSCI.4053-07.2007

**Conflict of Interest Statement:** We would like to declare financial and infrastructure support from Scientifica that was provided as part of a BBSRC Industrial CASE Studentship (BB/H016600/1 - "*Using Novel Technology to Elucidate Neocortical Microcircuits with Multiple Simultaneous Whole-Cell Recordings*"). Scientifica cofunded Arne V. Blackman's stipend by £11,100.00 over 3 years and also provided the Sjöström lab with a combined 2-photon imaging and whole-cell recording rig for free over 4 years.

#### *Received: 04 April 2014; accepted: 23 June 2014; published online: 11 July 2014.*

*Citation: Blackman AV, Grabuschnig S, Legenstein R and Sjöström PJ (2014) A comparison of manual neuronal reconstruction from biocytin histology or 2-photon imaging: morphometry and computer modeling. Front. Neuroanat. 8:65. doi: 10.3389/fnana.2014.00065*

#### *This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Blackman, Grabuschnig, Legenstein and Sjöström. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Automated computation of arbor densities: a step toward identifying neuronal cell types

#### *Uygar Sümbül<sup>1,2</sup>\*†‡, Aleksandar Zlateski<sup>3</sup>‡, Ashwin Vishwanathan<sup>1</sup>, Richard H. Masland<sup>2,4</sup> and H. Sebastian Seung<sup>5</sup>*

*<sup>1</sup> Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA*

*<sup>2</sup> Department of Ophthalmology, Harvard Medical School, Boston, MA, USA*

*<sup>3</sup> Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA*

*<sup>4</sup> Department of Neurobiology, Harvard Medical School, Boston, MA, USA*

*<sup>5</sup> Princeton Neuroscience Institute and Computer Science Department, Princeton University, Princeton, NJ, USA*

#### *Edited by:*

*Hermann Cuntz, Ernst Strüngmann Institute in Cooperation with Max Planck Society, Germany*

#### *Reviewed by:*

*Karl Farrow, Neuroelectronics Research Flanders, Belgium*

*Marta Costa, University of Cambridge, UK*

#### *\*Correspondence:*

*Uygar Sümbül, Grossman Center for the Statistics of Mind and Department of Statistics, Columbia University, New York, NY 10027, USA*

*e-mail: uygar@stat.columbia.edu*

#### *†Present address:*

*Uygar Sümbül, Grossman Center for the Statistics of Mind and Department of Statistics, Columbia University, New York, NY, USA*

*‡These authors have contributed equally to this work.*

The shape and position of a neuron convey information regarding its molecular and functional identity. The identification of cell types from structure, a classic method, relies on the time-consuming step of arbor tracing. However, as genetic tools and imaging methods make data-driven approaches to neuronal circuit analysis feasible, the need for automated processing increases. Here, we first establish that mouse retinal ganglion cell types can be as precise about distributing their arbor volumes across the inner plexiform layer as they are about distributing the skeletons of the arbors. Then, we describe an automated approach to computing the spatial distribution of the dendritic arbors, or arbor density, with respect to a global depth coordinate based on this observation. Our method involves three-dimensional reconstruction of neuronal arbors by a supervised machine learning algorithm, post-processing of the enhanced stacks to remove somata and isolate the neuron of interest, and registration of neurons to each other using automatically detected arbors of the starburst amacrine interneurons as fiducial markers. In principle, this method could be generalizable to other structures of the CNS, provided that they allow sparse labeling of the cells and contain a reliable axis of spatial reference.

**Keywords: cell types, classification, retinal ganglion cells, reconstruction, stratification, laminar structures**

#### **1. INTRODUCTION**

The classification of neuronal types is far from complete. Advances in genetic engineering for sparse and specific labeling (Gong et al., 2003; Wickersham et al., 2006, 2007; Kim et al., 2008; Chung et al., 2013; Ke et al., 2013) offer improved data acquisition and molecular identification of neuronal classes. However, the need for structural information has not diminished because what defines a true neuronal type is not clear when only molecular information is available. One challenge facing a successful classification is to ensure that every cell type is represented in the sample set. For the structural approach, dense reconstruction of tissues imaged by electron microscopy offers a solution to this completeness problem (Denk and Horstmann, 2004; Hayworth et al., 2006; Bock et al., 2011). On the other hand, electron microscopy is not yet capable of either obtaining large enough sample sets to capture the biological variability within individual cell types, or imaging cells with very large neuronal arbors. Light microscopy offers high throughput imaging and a large field of view to complement electron microscopy. However, the time-intensive tracing step represents a bottleneck of the overall program.

Recently, it was shown that neurons in the mammalian retina can achieve submicron precision in their laminar positioning (Sümbül et al., 2014). This was done by combining an arbor density formalism (Stepanyants and Chklovskii, 2005) with a neurite-based registration system for sparsely labeled neurons. The ensuing arbor density classification suggests that a robust classification of all mammalian retinal ganglion cells is within reach. However, this study and many other previous attempts (Sun et al., 2002; Badea and Nathans, 2004; Kong et al., 2005; Coombs et al., 2006; Völgyi et al., 2009) depend on manual tracing of individual neuronal arbors, which is a time-intensive task. Tracing a neuronal arbor creates a "skeleton representation" of the arbor, which consists of interconnecting line segments going through the dendrites. The thickness of dendrites along the line segments is often ignored because tissue preparation artifacts can result in unreliable estimates. In contrast, a volumetric representation includes both the skeleton and the dendrite thickness along the skeleton. Here, we propose an automated method using volumetric analysis to aid the classification of neuron types. At the heart of our approach is the simple observation that while the arbor density representation in Sümbül et al. (2014) requires a precise characterization of laminar positioning, it does not utilize detailed descriptions of arbors. In particular, we demonstrate that volumetric stratification precision of neurons can match the trace-based precision in the mammalian retina. Our method is designed for sparse imaging scenarios. It does not address the problem of separating the arbors of overlapping neurons from each other, for which tracing may still be required. Kim et al. (2014) recently used volumetric analysis and semi-manual arbor reconstruction to identify bipolar and starburst amacrine cells in an electron microscopy setting.

Volumetric reconstruction of neuroanatomy from an image stack involves obtaining a digital representation of the neuronal arbor (i.e., a voxel is "white" if it belongs to the cell, and "black" otherwise), and registering this representation to other neuronal structures to achieve a comparative description. As a first step to reconstruct a sparsely labeled neuron, we use a convolutional network (LeCun et al., 1998), which is a supervised machine-learning architecture, to enhance the image quality and suppress the acquisition noise. Although robust and accurate reconstruction of neuronal morphology is still a largely unsolved problem, it has become a bottleneck only recently as a result of the advances in high-throughput imaging. The demand for automated reconstruction prompted the Digital Reconstruction of Axonal and Dendritic Morphology Challenge (DIADEM challenge) (Brown et al., 2011). The challenge helped disseminate many novel approaches (Bas and Erdogmus, 2011; Chothani et al., 2011; Narayanaswamy et al., 2011; Turetken et al., 2011; Wang et al., 2011; Zhao et al., 2011). We anticipate that some of these approaches may be preferable to the convolutional network module of our method depending on the imaging conditions. A common problem is that when labeling is not sparse enough, cells other than the neuron of interest are also reconstructed. Our solution is to apply a post-processing routine to remove extraneous objects after the initial reconstruction step.

In the mammalian retina, the dendrites of the starburst amacrine interneuron form two parallel surfaces in the inner plexiform layer, which serve as fiducial marks (Haverkamp and Wässle, 2000). When the tissue is not flattened to preserve internal structure, it assumes a wavy form under the microscope. We solve this problem by digitally flattening (unwarping) the stack with the guidance of starburst surfaces after the imaging is done. Finally, we obtain a common depth coordinate by registering the starburst surfaces from different stacks to each other.

#### **2. MATERIALS AND METHODS**

#### **2.1. THE DATASET**

We use the retinal ganglion cells (RGCs) from a recent study on the classification of retinal cell types (Sümbül et al., 2014). The dataset was obtained by confocal microscopy at a voxel size of 0.4µm×0.4µm×0.5µm. This dataset also includes the relative positions of On and Off starburst amacrine interneurons for each RGC, by staining for choline acetyltransferase (Haverkamp and Wässle, 2000), thereby allowing a stratification analysis of RGCs based on starburst amacrine arbors. The methodological bottleneck of that study was the semi-automated tracing of RGC arbors, which required an average time of 40 min per trace with experienced tracers. The full dataset includes five strongly defined cell types, which have consistent and specific functional, molecular, and structural identifiers. We focus here on this subset, and omit the stacks where labeling is too dense (i.e., existence of many neurites in close proximity from more than one neuron) or too dim for fully automated analysis. In a few cases, the starburst surfaces were weakly stained; these were also omitted. After this culling, two neuron types did not have enough representatives for statistical analysis and were omitted altogether. The final dataset comprises 50 neurons that form three molecularly, physiologically, and structurally homogeneous cell types.

The JAM-B neurons express the junction adhesion molecule *JAM-B*, respond to offset of upward moving stimuli, and their arbors are asymmetric in the dorsal-ventral axis (in the central retina) (Kim et al., 2008). The W3 neurons express the *TYW3* gene, are sensitive to local edges, and have one of the smallest arbor sizes in the mammalian retina (Kim et al., 2010). The BDa neurons express the *FSTL4* gene, are On-Off direction sensitive, and arborize twice (Kim et al., 2010). Finally, these cell types are known to stratify at characteristic depths in the inner plexiform layer with submicron precision [distance from the On starburst surface: 15.6µm (JAM-B), 5.5µm (W3), 0.3µm (BDa); BDa neurons stratify again 0.3µm distal to the Off starburst surface] (Sümbül et al., 2014).

#### **2.2. VOLUMETRIC RECONSTRUCTION OF SPARSELY LABELED NEURONS FROM MANUAL TRACES**

We use the concept of *simple pixel* from digital topology (Bertrand and Malandain, 1994) to probe whether neuronal mass attains the stratification precision achieved by the arbor traces (skeletons). A simple pixel is defined as a pixel that does not change the topology of the digital image when its value is flipped (i.e., flipping it does not create or remove objects, holes, splits, or mergers). Similar approaches were previously used in the reconstruction of dense electron microscopy images of neuronal tissue (Jain et al., 2010; Helmstaedter et al., 2013). Specifically, we inflate the individual traces by respecting the topology of the traces (via simple pixel characterization), and the geometry of the neurons (via thresholding the brightness values in the raw image). We use 60% of the maximum brightness value in an image stack as the threshold. We iterate the inflation process 62 times, potentially inflating by a single layer of voxels at each step, so that somata as large as (62 × 2 + 1) × 0.4µm = 50µm in diameter are properly characterized. **Algorithm 1** presents a pseudocode of the steps. The resulting three-dimensional binary stacks are *seemingly* perfect characterizations of neuronal morphology based on the raw image stacks and the arbor traces (**Figure 1**) because they respect both the tree structure (through tracing, **Figure 1B**), and the dendritic widths (through inflation, **Figure 1D**). The caveat is that the resulting volumetric representations depend on the laborious task of (semi-) manual tracing.
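The inflation described above can be illustrated with a deliberately simplified sketch. It is not Algorithm 1: it works in two dimensions rather than three, uses 8-connectivity for the foreground and 4-connectivity for the background, tests simplicity with the classical connectivity-number criterion for that pair, and flips candidate pixels sequentially in raster order. All function names are hypothetical.

```python
# Neighbour offsets in cyclic order, starting from a 4-neighbour (east).
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def fg(mask, r, c):
    """True if (r, c) is inside the image and foreground."""
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c]


def is_simple(mask, r, c):
    """Connectivity-number test: the pixel is simple iff the number equals 1."""
    x = [1 if fg(mask, r + dr, c + dc) else 0 for dr, dc in RING]
    n = 0
    for k in (0, 2, 4, 6):  # the four 4-neighbours
        a, b, d = 1 - x[k], 1 - x[(k + 1) % 8], 1 - x[(k + 2) % 8]
        n += a - a * b * d
    return n == 1


def inflate(trace, image, threshold, rounds):
    """Grow the trace by at most one pixel layer per round."""
    mask = [row[:] for row in trace]
    for _ in range(rounds):
        snapshot = [row[:] for row in mask]  # defines this round's layer
        changed = False
        for r in range(len(mask)):
            for c in range(len(mask[0])):
                if mask[r][c] or image[r][c] < threshold:
                    continue  # already foreground, or too dim
                touches = any(fg(snapshot, r + dr, c + dc) for dr, dc in RING)
                if touches and is_simple(mask, r, c):
                    mask[r][c] = True
                    changed = True
        if not changed:
            break
    return mask


def count_components(mask):
    """Number of 8-connected foreground objects."""
    seen, count = set(), 0
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            if mask[r][c] and (r, c) not in seen:
                count += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dr, dc in RING:
                        p = (y + dr, x + dc)
                        if fg(mask, *p) and p not in seen:
                            seen.add(p)
                            stack.append(p)
    return count


# Two separate seeds on a uniformly bright image stay separate after inflation.
trace = [[False] * 7 for _ in range(5)]
trace[2][1] = trace[2][5] = True
image = [[1.0] * 7 for _ in range(5)]
inflated = inflate(trace, image, 0.6 * 1.0, rounds=10)
```

Because each flipped pixel is simple with respect to the current mask, the two seeds grow without merging, which is the property that plain thresholded dilation would not guarantee.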

**Algorithm 1 | Pseudocode for topologically constrained inflation of a trace. Binary operations on same-size arrays are to be interpreted elementwise. ¬ (⊕) denotes negation (exclusive-or).** *imdilate* **dilates its first argument using its second argument as the kernel.** *nnz* **returns the number of nonzero (true) entries in an array. Matlab notation is used in the array on line 12.**

**Algorithm** *Inflating a trace.*

**Figure 1 |** … manually traced arbor **(B)**, the inflated trace after one round of topologically constrained inflation **(C)**, and the inflated trace after 62 rounds of topologically constrained inflation **(D)**. In each panel, large image: *xy* projection, bottom: *zy* projection, right: *xz* projection. Scale bar, 40µm; bottom-right, *xy* projection image in panel D.

#### **2.3. AUTOMATED ENHANCEMENT AND POST-PROCESSING OF RGC ARBORS**

Various approaches have been developed recently for automated reconstruction of neuronal morphology from sparsely labeled image stacks (Al-Kofahi et al., 2002, 2008; Schmitt et al., 2004; Zhang et al., 2007; Losavio et al., 2008; Peng et al., 2010, 2011; Srinivasan et al., 2010; Bas and Erdogmus, 2011; Turetken et al., 2011; Wang et al., 2011; Xie et al., 2011; Choromanska et al., 2012; Turetken et al., 2012; Gala et al., 2014). While these methods can capture the geometrical layout of neuronal arbors, imperfections in tissue handling and imaging (e.g., non-uniform labeling of neurites, high density labeling, low signal-to-noise ratio images) often result in topological errors such as missing branches and extraneous structures. On the other hand, blurring and projection operations are robust against local mistakes. Therefore, topological imperfections in the reconstruction may be acceptable for cell type identification purposes so long as the general morphology of a neuron is captured properly.

As a first step, we use the convolutional network based enhancement of RGC arbors reported in Sümbül et al. (2014). A convolutional network is a feed-forward network of convolutional filters whose outputs are transformed by a non-linearity (e.g., sigmoid). An advantage of such a supervised machine learning approach is that it does not have free parameters to adjust. Rather, the paradigm depends on the existence of a labeled training set through which the various parameters are automatically optimized. The network is trained to transform noisy gray-scale images of sparsely labeled neurons into cleaner binary images. Here, we improve the architecture and filter sizes, and provide an efficient implementation that does not need specialized hardware (http://www.github.com/zlateski/znn3). The resulting network has 8 layers with 8 perceptrons in each hidden layer except for the last hidden layer, which is a fully connected layer of 100 perceptrons. The filter sizes within each layer are identical and are as follows: 5 × 5 × 1, 5 × 5 × 1, 3 × 3 × 3, 5 × 5 × 1, 3 × 3 × 3, 3 × 3 × 3, 1 × 1 × 1, 1 × 1 × 1. Therefore, the overall patch size to decide whether the central voxel of the patch belongs to a neurite or not is 19 × 19 × 7 voxels (7.6µm × 7.6µm × 3.5µm). The network has all-to-all connectivity between subsequent layers, and is trained by backpropagation learning (LeCun et al., 1998).
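The 19 × 19 × 7 patch size quoted above follows from the listed filter sizes if the filters are applied with unit stride and no pooling, which is an assumption here since the text does not state the stride. Under that assumption each layer widens the patch by (filter size − 1) along each axis, as the short check below illustrates; the variable and function names are hypothetical.

```python
# Filter sizes (x, y, z) of the eight layers, as listed in the text.
FILTERS = [(5, 5, 1), (5, 5, 1), (3, 3, 3), (5, 5, 1),
           (3, 3, 3), (3, 3, 3), (1, 1, 1), (1, 1, 1)]
VOXEL_UM = (0.4, 0.4, 0.5)  # voxel size of the dataset, in micrometres


def receptive_field(filters):
    """Patch size in voxels, assuming stride-1 filters and no pooling."""
    return tuple(1 + sum(f[axis] - 1 for f in filters) for axis in range(3))


patch_voxels = receptive_field(FILTERS)
patch_um = tuple(round(n * s, 1) for n, s in zip(patch_voxels, VOXEL_UM))
```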

When the density of labeling is not low enough, somata and neurites of other neurons appear in the image stacks. On the other hand, the reconstructed arbors may have breaks due to dim/inhomogeneous labeling. Therefore, we devise a simple post-processing routine to isolate the neuron of interest. The algorithm uses connected component analyses and basic morphological image operations (i.e., opening and dilation) to remove extraneous structures and somata. In particular, the algorithm detects the largest object in the image stack and removes the objects that are smaller than a given size and farther from the largest object than a given distance. Somata are removed by locating and removing the white regions that are large enough to fully enclose a given ellipsoid (**Algorithm 2**). While soma size is known to carry information on neuronal identity, it is a weak classifier (Sun et al., 2002; Coombs et al., 2006; Völgyi et al., 2009).

**Algorithm 2 | Pseudocode for post-processing a binary volume. Various binary operations are as defined in Algorithm 1. Matlab notation is used for brevity.** *bwlabeln* **returns an array the same size as its argument, where voxels are assigned different values** *iff* **they belong to different objects (26-connectivity).** *regionVolumes* **returns a list of object sizes.** *bwareaopen* **removes from its first argument all objects whose volumes are smaller than the second argument.** *imopen* **performs a morphological opening operation on its first argument using a cubic kernel whose edge length is given by the second argument.**

**Algorithm** *Post-processing*

#### **Input:**

1. inStack (*m* × *n* × *p*), dilationRadius, sizeThreshold, searchRadius (scalar)

#### **Output:** outStack (*m* × *n* × *p*)
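The object-filtering rule described in the text (keep the largest object, drop objects that are both smaller than a size threshold and farther than a search radius from it) can be sketched as follows. This is an illustrative simplification, not Algorithm 2: foreground voxels are represented as a set of integer coordinates, distances are brute-force Euclidean distances in voxel units, and the soma-removal step based on morphological opening is omitted. Function and parameter names are hypothetical.

```python
from itertools import product
from math import dist

# 26-connectivity: all neighbours that differ by at most 1 along each axis.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def components(voxels):
    """Split a set of (x, y, z) voxels into 26-connected objects."""
    remaining = set(voxels)
    objects = []
    while remaining:
        seed = remaining.pop()
        obj, stack = {seed}, [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    obj.add(n)
                    stack.append(n)
        objects.append(obj)
    return objects


def isolate_neuron(voxels, size_threshold, search_radius):
    """Keep the largest object plus every object that is large or nearby."""
    objects = components(voxels)
    largest = max(objects, key=len)
    kept = set(largest)
    for obj in objects:
        if obj is largest:
            continue
        gap = min(dist(a, b) for a in obj for b in largest)
        if len(obj) < size_threshold and gap > search_radius:
            continue  # small and far away: treated as extraneous
        kept |= obj
    return kept


# Hypothetical toy volume: a main branch, a nearby fragment, a distant speck,
# and a distant but sizeable object.
main = {(i, 0, 0) for i in range(5)}
near = {(6, 0, 0)}
speck = {(20, 0, 0)}
big_far = {(30, j, 0) for j in range(4)}
cleaned = isolate_neuron(main | near | speck | big_far,
                         size_threshold=3, search_radius=3)
```

Keeping nearby fragments regardless of size is what lets the routine tolerate the breaks caused by dim or inhomogeneous labeling.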


The final image stacks may include axonal projections from other neurons, imperfectly suppressed noise, missing small branches, extraneous branches from other neurons, and splits/mergers of the neuronal arbor depending on the image quality and the sparsity of labeling in the tissue. Nevertheless, the next few subsections demonstrate that the reconstruction quality is high enough to study stratification patterns and probe neuronal identity.

#### **2.4. QUASI-CONFORMAL UNWARPING OF VOLUMETRIC DATA AND LAMINAR REGISTRATION**

We use the automatically detected starburst surfaces in individual stacks as fiducial marks (**Figure 2**). We find quasi-conformal mappings that independently transform the detected starburst surfaces into flat surfaces as described in Sümbül et al. (2014) to maximally preserve local angles within the surfaces (Levy et al., 2002). The two flattened surfaces are registered to each other in-plane by matching the *xy* coordinates of the patch in which both starburst layers are the flattest. We extend the resulting transformation to other points in the image stack by using local polynomial approximations (quadratic in *xy*, linear in *z*). In particular we apply the transformation to individual voxels of the binary three-dimensional representation of a neuron, rather than its trace points. The transformed voxels are scaled and shifted in *z* so as to place the flattened On starburst surface at *z* = 0µm and the flattened Off starburst surface at *z* = 12µm. **Figure 3** depicts the dramatic effect of unwarping on a BDa neuron. Finally, the histogram of depth positions of the voxels (depth profile) is obtained by gridding onto a Cartesian grid with a resolution of 0.5µm (**Figure 3D**). The gridding step uses a Kaiser-Bessel kernel (Jackson et al., 1991) to maintain accuracy in laminar registration, and applies weights to individual voxels to compensate for the distance fluctuations between warped voxels. Note that if the arbor density function is obtained by blurring in *xy* only (and not in *z*), then the depth profile is the projection of the three-dimensional arbor density function.
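The final two steps above, rescaling depth so that the On and Off starburst surfaces sit at 0µm and 12µm and then histogramming voxel depths at 0.5µm resolution, can be sketched as below. This is a simplified illustration that assumes the stack has already been flattened; it replaces the Kaiser-Bessel gridding and per-voxel weights with plain unweighted binning, and the depth range and function names are hypothetical.

```python
def to_common_depth(z, z_on, z_off, spacing=12.0):
    """Linearly map depth so that z_on -> 0 and z_off -> spacing (micrometres)."""
    return spacing * (z - z_on) / (z_off - z_on)


def depth_profile(depths, z_min=-6.0, z_max=18.0, step=0.5):
    """Normalized histogram of voxel depths (plain binning, no gridding kernel)."""
    n_bins = int(round((z_max - z_min) / step))
    counts = [0] * n_bins
    for z in depths:
        i = int((z - z_min) // step)
        if 0 <= i < n_bins:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical flattened stack in which the On surface lies at z = 10 and the
# Off surface at z = 40 (arbitrary units before rescaling).
mid_depth = to_common_depth(25.0, z_on=10.0, z_off=40.0)
profile = depth_profile([0.1, 0.2, 6.0])
```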

#### **2.5. STATISTICAL MEASURES AND OTHER METRICS**

The peak position of a depth profile is the signed distance from the On starburst layer at which the profile achieves its maximum value. [The On (Off) layer is located at *z* = 0µm (*z* = 12µm).] For the bistratified BDa cells, a second peak position is also reported. This second peak position is defined as the depth value at least 6µm away (half the distance between starburst layers) from the first peak position, at which the remaining profile achieves its maximum value.
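The two peak definitions above translate directly into code. The following is a minimal sketch assuming the profile is sampled on a regular depth grid; the function name and the toy profile are hypothetical.

```python
def peak_positions(depths, profile, min_separation=6.0):
    """Return (first_peak, second_peak) depths of a profile.

    The second peak is the maximum of the profile restricted to depths at
    least `min_separation` micrometres away from the first peak.
    """
    first = max(range(len(profile)), key=profile.__getitem__)
    far = [i for i in range(len(profile))
           if abs(depths[i] - depths[first]) >= min_separation]
    second = max(far, key=profile.__getitem__) if far else None
    return depths[first], (depths[second] if second is not None else None)


# Hypothetical bistratified profile on a 0.5 um grid from -6 um to 18 um.
depths = [-6.0 + 0.5 * i for i in range(49)]
profile = [0.0] * 49
profile[13] = 1.0   # z = 0.5 um: main peak near the On surface
profile[14] = 0.9   # z = 1.0 um: shoulder of the main peak, must be ignored
profile[37] = 0.8   # z = 12.5 um: second stratum near the Off surface
first_peak, second_peak = peak_positions(depths, profile)
```

The minimum separation prevents the shoulder of the first peak from being reported as the second stratum.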

We assume that the peak positions of the depth profiles of individual neurons of a given type are independent and identically distributed (i.i.d.) with $N(\mu, \sigma^2)$. The distribution of the sample variance of *n* i.i.d. $N(\mu, \sigma^2)$ observations is given by $\chi^2_{n-1}\left(t(n-1)/\sigma^2\right)(n-1)/\sigma^2$, where $\chi^2_{n-1}(t)$ denotes the chi-squared distribution with *n* − 1 degrees of freedom. The symmetrical 95% confidence interval for σ, given the sample standard deviation *s*, is

$$\left[\sqrt{\frac{(n-1)s^2}{X_{n-1}^{-2}(0.975)}}, \sqrt{\frac{(n-1)s^2}{X_{n-1}^{-2}(0.025)}}\right],\tag{1}$$

where $X_{n-1}^{-2}$ denotes the inverse cumulative distribution function of $\chi^2_{n-1}$.
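As a hypothetical worked example of Equation (1) (the numbers are illustrative and are not taken from the dataset), consider *n* = 10 cells with a sample standard deviation of *s* = 1µm. The 2.5% and 97.5% quantiles of the chi-squared distribution with 9 degrees of freedom are approximately 2.700 and 19.023, so that

$$\left[\sqrt{\frac{9 \times 1^2}{19.023}},\ \sqrt{\frac{9 \times 1^2}{2.700}}\right] \approx [0.69,\ 1.83]\ \mu\text{m}.$$

The interval is not centered on *s* because the chi-squared distribution is skewed; it is symmetrical only in the sense that each tail carries 2.5% of the probability.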

We use the Brown-Forsythe test to infer whether the different reconstruction methods return significantly different variance values for the peak stratification position of cells of the same type.
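For readers unfamiliar with the Brown-Forsythe test, the sketch below computes its test statistic from first principles: a one-way ANOVA *F* statistic applied to the absolute deviations of each observation from its group median. It is an illustration only; the *p*-value, which would come from an *F* distribution with (*k* − 1, *N* − *k*) degrees of freedom, is not computed, and the function name and sample values are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """ANOVA F statistic on absolute deviations from each group's median."""
    z = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical peak positions (in micrometres) from two reconstruction methods.
method_a = [1, 2, 3, 4, 5]
method_b = [2, 4, 6, 8, 10]
w = brown_forsythe_statistic(method_a, method_b)
```

Statistical packages typically provide this test directly, for example as the median-centered variant of Levene's test.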

We define the "signal" in each normalized depth profile for cells of a given type as the average normalized profile over cells of that type. Then, the "noise" in each profile is the difference of the profile from the "signal" component. We define the signal-to-noise ratio (SNR) for a cell type as the average, over all cells of that type, of the Euclidean norm of the signal divided by the Euclidean norm of the noise.
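The SNR definition above can be written out in a few lines. This sketch assumes the profiles have already been normalized and sampled on a common depth grid, and that no profile coincides exactly with the average (which would give a zero noise norm); the function names and toy profiles are hypothetical.

```python
from math import sqrt


def euclidean_norm(values):
    return sqrt(sum(v * v for v in values))


def type_snr(profiles):
    """Average over cells of ||signal|| / ||profile - signal||."""
    n = len(profiles)
    signal = [sum(column) / n for column in zip(*profiles)]
    ratios = []
    for profile in profiles:
        noise = [p - s for p, s in zip(profile, signal)]
        ratios.append(euclidean_norm(signal) / euclidean_norm(noise))
    return sum(ratios) / n


# Two hypothetical two-bin profiles of the same cell type.
snr_loose = type_snr([[1.0, 0.0], [0.0, 1.0]])
snr_tight = type_snr([[3.0, 1.0], [1.0, 3.0]])
```

Profiles that agree more closely with the type average yield a larger ratio, which is why higher SNR indicates a more stereotyped distribution.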

**Figure 2 |** … **RGC arbors (green), and detection of starburst surfaces (red).** Left: *xy* **(A)**, *xz* **(B)**, and *zy* **(C)** projections of the raw image of an RGC. Right: *xy* **(D)**, *xz* **(E)**, and *zy* **(F)** projections of the RGC. Starburst surfaces within a slab are shown and starburst somata are removed for better visualization. Scale bar, 40µm; lower-right, panel **D**.

The Crest factor is defined as the peak amplitude of a profile divided by the root-mean-square value of the profile. That is, it is the ratio of the peak value to the root-mean-square average of the profile. It indicates how extreme the peak is in a given depth profile. Since narrow and sharp peaks in a profile where the "background" regions are small make it easy to detect a cell type in the presence of many cell types, we use the Crest factor as a figure of merit for the different approaches analyzed in this paper.
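As a small illustration of the Crest factor definition (peak amplitude divided by root-mean-square value), consider the sketch below; the function name and the toy profiles are hypothetical.

```python
from math import sqrt


def crest_factor(profile):
    """Peak amplitude divided by the root-mean-square value of the profile."""
    rms = sqrt(sum(v * v for v in profile) / len(profile))
    return max(abs(v) for v in profile) / rms


sharp = crest_factor([0.0, 0.0, 0.0, 4.0])  # one narrow peak, no background
flat = crest_factor([1.0, 1.0, 1.0, 1.0])   # no peak at all
```

A flat profile attains the minimum value of 1, while concentrating the same mass into fewer bins raises the Crest factor.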

#### **3. RESULTS**

#### **3.1. PROJECTIONS OF VOLUMETRIC DATA PRESERVE THE STRATIFICATION PRECISION OF RGCs**

We obtain the volumetric reconstructions of all 50 neurons in the dataset by inflating their manually reconstructed traces as described in **Algorithm 1**. These volumetric reconstructions were unwarped and registered to obtain depth profiles of all neurons in the dataset. **Figure 4** shows the average profiles for each neuron type generated from the volumetric reconstructions together with the average profiles of the traces (skeletons). Two qualitative observations emerge: (i) The peak positions of the average profiles are preserved across the two methods. The distribution of mass along the skeletons of neurons preserves the peak stratification depth of the skeletons. (ii) The peaks of the normalized profile averages are lower in the volume-based approach because the branches close to the soma are typically thicker than the distal branches. **Table 1** tabulates the Crest factors for both methods, and quantifies the observation that the trace profiles have slightly sharper peaks. Profiles with sharper peaks are preferable when identifying cell types in the presence of a heterogeneous dataset, similar to spectroscopy.

The specificity of stratification peaks, not just their average, is important to be able to identify cell types. The sample standard deviations of the peak position for each cell type do not change significantly between the skeleton-based and volume-based depth profiles (Brown-Forsythe test; see **Tables 1**, **2** for individual *n* and *p*-values). This suggests that neurons of a given type are as precise about distributing their neuronal volume in depth as they are about distributing their skeleton-based presence.

**Table 1** also tabulates the mean SNR values over the three cell types using both the trace profiles and the trace-based volumetric profiles (Methods). Higher SNR values indicate stereotypical distributions. While *n* = 3 pairs are too few to probe statistical significance with rigor, the trace-based volumetric profiles do not seem to have lower SNR values than the trace profiles (right-sided Wilcoxon signed rank test, *n* = 3 pairs, *p* = 0.75).

#### **3.2. AN AUTOMATED METHOD TO PROBE RGC IDENTITY**

The stereotypy of the profiles of volume-based reconstructions obtained by inflating manual traces suggests that it may be possible to avoid the laborious task of manual tracing altogether for cell type identification purposes. For comparison, we begin by implementing simple thresholding: Each image stack is thresholded at 60% of its maximum brightness value. Then, somata are removed and the resulting binary stack is unwarped and registered as described in the Methods. The results are not impressive: The extraneous structures and imaging artifacts contaminate many stacks significantly. As a simple proxy, we observe that the mistakes perturb the depth profiles enough to create spurious peaks far away from the original stratification peak in 11 out of 50 stacks (**Figure 5A**). Therefore, this simple approach is not suitable for automation. This is especially clear for cells that stratify close to the ganglion cell layer. After removing the part of the profiles where *z* < −6µm, we find that the stratification stereotypy of the depth profiles is essentially preserved (**Tables 1**, **2**). Nevertheless, even in this restricted region, the presence of objects that do not belong to the neuron makes type identification harder, as reflected by the low SNR values and the Crest factors (**Table 1** and **Figure 5C**).

While the threshold-based approach may allow for cell type identification when the image stacks are sparsely labeled and have very low noise, insufficient suppression of the background noise and failure to isolate the neuron of interest from other structures prevent it from working reliably on our dataset. Therefore, we apply the convolutional network described in the Methods on the image stacks to suppress the background noise, retain the neuronal structures, and connect the occasionally disconnected neurite pieces. Subsequently, we apply the postprocessing routine (Methods) to remove the extraneous structures from the image stack that are not critically close to the neuron of interest. Notably, no manual labor is used in this scheme.

A drawback is that the automated approach occasionally causes splits and mergers in the reconstruction and includes extraneous structures. On the other hand, the depth profiles (one-dimensional arbor densities that serve as proxies for the three-dimensional arbor densities) identify the stratification peaks correctly (**Figure 5B**). Moreover, the sample standard deviation of the peak position did not change significantly in any of the three neuron types (Brown-Forsythe test; see **Tables 1**, **2** for individual *n* and *p*-values). The Crest factors for this automated method are lower than those of the trace profiles, but they are roughly the same as those of the trace-based volumetric profiles. Lastly, the mean SNR value for the automated method is lower than that for the trace-based approaches, but it is higher than the threshold method's mean SNR value (**Table 1**).

#### **4. DISCUSSION**

Identifying and providing experimental access to homogeneous cell types of nervous systems is a prerequisite to understanding the fundamental principles of brain function in health and disease. Recently, it was shown that a method using a neurite-based registration system and an arbor density representation of neurons is capable of robustly identifying the mammalian RGC types in a highly heterogeneous sample set (Sümbül et al., 2014). Notably, that study relied on traces of neuronal arbors, which are time-consuming to obtain. Here, we show that the spatial distribution of the arbor volume attains a stratification precision similar to that of the arbor trace. Based on this observation, we describe an automated method that can remove the time-intensive tracing step in identifying cell types. We anticipate our approach to be useful in integrating structural information into studies that investigate the molecular or functional dynamics of neurons, as well as purely anatomical pursuits.

**FIGURE 4 | Depth profiles of trace-based volumetric reconstructions maintain the stereotypy attained by the depth profiles of arbor traces while having lower peaks. (A)** Depth profiles of the arbor traces. **(B)** Depth profiles of the topology-preserving inflations of the traces.

**Table 1 | Mean and standard deviation values for the peak positions and norms of the depth profiles.**

*Values are given as mean ± SD. The numbers of samples (n-values) are denoted in parentheses next to the cell type names. (\*) Peaks at z < −6 μm are not considered in the calculation of stratification means and standard deviations.*

We quantify the stratification precision as the standard deviation of the peak position of the depth profiles. We do not observe significant differences between the stratification precisions of the depth profiles of the traces and the volumes obtained by inflating the traces or by our automated method (**Table 2**). This suggests that the depth distribution of the overall mass can be as stereotyped as that of the skeletal mass. Another observation suggesting volumetric stereotypy is the lack of a significant difference between the mean SNR values of the normalized depth profiles of the traces and the volumes obtained by inflating the traces.


*The first row indicates the 95% confidence interval of the reported standard deviation values of the peak positions based on the traces. The remaining rows are the p-values of the Brown-Forsythe test of equal variance between the indicated method and the trace method. The n-values are denoted in parentheses next to the cell type names. (\*) Peaks at z < −6 μm are not considered.*

We have argued that the presented method can be useful in identifying cell types using three-dimensional arbor densities. However, we have not attempted a formal classification of the cells used in this study. While **Figure 5C** and **Tables 1**, **2** clearly suggest that such an attempt would be successful, classification becomes a hard task only in the presence of a highly heterogeneous dataset. On the other hand, considering that the automated approach can maintain the stratification precision attained by the trace-based analysis and that the arbor density representation in Sümbül et al. (2014) used substantial in-plane blurring (and no axial blurring), it is plausible that arbor densities generated from the output of our automated method can be classified successfully not only in the presently studied dataset of three cell types, but also in a more heterogeneous sample set.

We observe that the peak values of the normalized volumetric profiles are smaller than those of the normalized trace profiles. This can be explained by the fact that branches closer to the soma are typically thicker than the distal branches, presumably to minimize signal propagation delays while keeping arbor volume to a minimum (Chklovskii and Stepanyants, 2003).

Dim or inhomogeneous labeling of neurites, denser (not sparse enough) labeling of neurons, and high noise levels often result in imperfect reconstructions with the current state-of-the-art automated approaches. Our convolutional network implementation is not immune to such imperfections, either. Removal of failing image stacks decreases the throughput of the overall method. On the other hand, standard approaches in machine learning, such as boosting and training deeper networks with larger training data, suggest ways of increasing the throughput by providing better noise suppression and better reconstruction of arbor topology. Moreover, while other automated reconstruction methods often require manual tuning of free parameters, they can also be used in place of our convolutional network implementation (Al-Kofahi et al., 2002, 2008; Schmitt et al., 2004; Zhang et al., 2007; Losavio et al., 2008; Peng et al., 2010, 2011; Srinivasan et al., 2010; Bas and Erdogmus, 2011; Turetken et al., 2011, 2012; Wang et al., 2011; Xie et al., 2011; Choromanska et al., 2012; Gala et al., 2014).

While we investigate retinal ganglion neurons in this study, our approach only assumes (i) the existence of an arbor marker specific to a cell type and (ii) a method of labeling cells sparsely in a laminar structure. Therefore, it is readily extendible to other neuron classes of the retina. In particular, the same fiducial marks (starburst amacrine cells) and very similar sparse labeling methods can be used to study the classification and co-stratification of bipolar and amacrine cell classes. The effort required to trace a neuron increases as the complexity of its arbor increases. Hence, the potential impact of our method is higher for neurons whose total dendritic lengths are larger. Cortical neurons are typically much larger than retinal neurons, and classifying them is an impending problem (Ascoli et al., 2008). Traditionally, obtaining datasets of cortical neurons that capture their diversity has been a practical challenge. However, recent advances in tissue clarification and a multiplicity of genetic or viral methods (Gong et al., 2003; Wickersham et al., 2006, 2007; Kim et al., 2008; Chung et al., 2013; Ke et al., 2013) enable high-throughput structural imaging of such neurons. Therefore, we speculate that our approach can be useful in automating the discovery and identification of cortical cell types if the two requirements mentioned above are met.

#### **DATA SHARING STATEMENT**

The ZNN library is available at http://www.github.com/zlateski/znn3. The subset of trace files, the associated automatically detected starburst surfaces, the trace-based reconstructions, and the software used in this study are available at http://www.github.com/uygarsumbul/volumetricRGC. The original dataset of trace files is available at http://www.github.com/uygarsumbul/rgc. Raw image stacks are available upon request.

#### **AUTHOR CONTRIBUTIONS**

All authors contributed to the conception of the study and the editing of the manuscript. Uygar Sümbül designed and performed the analysis. Aleksandar Zlateski conceived and implemented the ZNN package. Uygar Sümbül and Ashwin Vishwanathan wrote an initial draft of the manuscript.

#### **FUNDING**

We are grateful for financial support from the Harvard Neuro-Discovery Center, the U.S. Army Research Office (W911NF-12-1-0594), DARPA (HR0011-14-2-0004), NIH/NINDS, the Howard Hughes Medical Institute, the Gatsby Charitable Foundation, and the Human Frontier Science Program.

#### **REFERENCES**


neuron tracing from 3D confocal images. *Cytometry A* 73, 36–43. doi: 10.1002/cyto.a.20499


Zhao, T., Xie, J., Amat, F., Clack, N., Ahammad, P., Peng, H., et al. (2011). Automated reconstruction of neuronal morphology based on local geometrical and global structural models. *Neuroinformatics* 9, 247–261. doi: 10.1007/s12021-011-9120-3

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 July 2014; accepted: 06 November 2014; published online: 25 November 2014.*

*Citation: Sümbül U, Zlateski A, Vishwanathan A, Masland RH and Seung HS (2014) Automated computation of arbor densities: a step toward identifying neuronal cell types. Front. Neuroanat. 8:139. doi: 10.3389/fnana.2014.00139*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Sümbül, Zlateski, Vishwanathan, Masland and Seung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Functional connectivity estimation over large networks at cellular resolution based on electrophysiological recordings and structural prior

#### *Simona Ullo<sup>1</sup>\*, Thierry R. Nieus<sup>2</sup>, Diego Sona<sup>1</sup>, Alessandro Maccione<sup>2</sup>, Luca Berdondini<sup>2†</sup> and Vittorio Murino<sup>1,3†</sup>*

*<sup>1</sup> Department of Pattern Analysis and Computer Vision, Istituto Italiano di Tecnologia, Genova, Italy*

*<sup>2</sup> Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genova, Italy*

*<sup>3</sup> Department of Computer Science, University of Verona, Verona, Italy*

#### *Edited by:*

*Patrik Krieger, Ruhr University Bochum, Germany*

#### *Reviewed by:*

*Gordon William Arbuthnott, Okinawa Institute of Science and Technology, Japan Jean-Philippe Thivierge, University of Ottawa, Canada*

#### *\*Correspondence:*

*Simona Ullo, Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy e-mail: simona.ullo@iit.it*

*†These authors have contributed equally to this work.*

Although many structural and functional aspects of brain organization have been extensively studied in neuroscience, we are still far from a clear understanding of the intricate structure-function interactions occurring in the multi-layered brain architecture, where billions of different neurons are involved. While structure and function can individually convey a large amount of information, probably only a combined study of these two aspects can shed light on how brain circuits develop and operate at the cellular scale. Here, we propose a novel approach for refining functional connectivity estimates within neuronal networks using the structural connectivity as prior. This is done at the mesoscale, dealing with thousands of neurons while reaching, at the microscale, an unprecedented cellular resolution. The High-Density Micro Electrode Array (HD-MEA) technology, combined with fluorescence microscopy, offers the unique opportunity to acquire structural and functional data from large neuronal cultures approaching the granularity of the single cell. In this work, an advanced method based on probabilistic directional features and heat propagation is introduced to estimate the structural connectivity from the fluorescence image, while functional connectivity graphs are obtained from the cross-correlation analysis of the spiking activity. Structural and functional information are then integrated by reweighting the functional connectivity graph based on the structural prior. Results show that the resulting functional connectivity estimates are more coherent with the network topology, as compared to standard measures purely based on cross-correlations and spatio-temporal filters. We finally use the obtained results to gain some insights into which features of the functional activity are more relevant to characterize actual neuronal interactions.

**Keywords: connectomics, structural connectivity, functional connectivity, high-density Micro Electrode Array, electrophysiology, graph heat kernel, probabilistic directional feature, Von Mises distribution**

#### **1. INTRODUCTION**

Brain processing is widely recognized to be distributed over a wide range of different scales, involving an impressive number of cells with heterogeneous phenotypes that are structurally and functionally organized in a sophisticated and still unclear architecture. Disentangling the intricate contributions of single neurons constituting large brain circuits from the strongly correlated phenomena shaping brain function is one of the biggest challenges in neuroscience. To complicate things further, most of the neuronal processing taking place in the nervous system is characterized by limited observability and still requires further improvement of existing neurotechnologies. Indeed, while direct measurements are only possible at very small scales (i.e., monitoring the intracellular potential of a few single neurons or up to a few hundreds of neurons with 2-photon microscopy), larger scale mechanisms can commonly be observed through indirect non-invasive modalities (i.e., brain imaging), but at the cost of losing single-cell resolution. Given these two opposite experimental approaches that have characterized neuroscientific research over the last decades, what still remains unanswered is how to bridge the structural and functional aspects observed at the different scales.

In the last few decades efforts have been put forward for the investigation of the so-called *connectome*, i.e., the reconstruction of the neural connectivity at different scales (Sporns et al., 2005; Leergaard et al., 2012). The term *connectomics* has a very broad scope, ranging from single-neuron interplays (*microscale* connectomics) to pathways between large brain regions (*macroscale* connectomics, Yap et al., 2010). Reconstructing the brain connectome across these scales is fundamentally important to understand the constituent parts of the nervous system, their multiple interactions and the advanced cognitive functions that they support, both in normal and pathological neurodegenerative conditions. By promoting the analysis of different aspects of brain behavior, connectomic studies typically involve two complementary forms of information: structure and function.

In the literature these two aspects are usually studied separately. Part of the efforts focuses on a dense reconstruction of the *structural connectivity*, while complementary studies address the analysis of synchronous patterns of neuronal activation for estimating the *functional connectivity*.

However, structure and function are tightly interrelated. By looking at fine-scale interactions, we are learning that the functional properties of single neurons are strongly driven by their anatomical interconnections with other cells, dendritic arborizations, and synaptic distributions. At the same time, single-neuron physical links affect the expression of functional patterns throughout the entire network by placing constraints on which functional interactions are more likely to occur. Consequently, it is becoming crucial to combine a detailed description of the anatomical connectivity patterns with physiological parameters to capture the way functional properties emerge from structural configurations at the cellular scale.

This work addresses this challenge by proposing a combined structural and functional analysis of large neuronal networks that are functionally resolved at an unprecedented resolution, approaching the scale of single neurons.

The joint study of structure and function has been recently gaining interest in the context of brain imaging modalities (Rykhlevskaia et al., 2008), where it is possible to observe large-scale interactions. Recent attempts address the estimation of functional connectivity guided by the structural connectivity as prior (Deligianni et al., 2011; Chen et al., 2013; Zhu et al., 2013). The underlying hypothesis is that the functional connectivity must reflect the existence of structural paths connecting functionally linked regions (Honey et al., 2010). However, *macroscale* approaches are not suitable for single-neuron resolution as they deal with large areas (billions of neurons) that make any fine-grained analysis unfeasible.

On the other hand, *microscale* connectomics achieves good resolution by focusing on single or few cells, but loses the information on network-wide topology and interplays. A new branch of investigation is emerging that studies the so-called *mesoscale*, which, in principle, could overcome the limitations of *micro* and *macro* studies. Mesoscale connectomics refers to the analysis of connectivity at the level of neuronal circuits with a micrometric spatial resolution (Sporns, 2012). Interestingly, high-level functions such as learning and memory build on stratified non-linear mechanisms that can be particularly witnessed at this scale (Jimbo et al., 1999; Marom and Eytan, 2005). Although there is still no clear indication about the possibility of bridging the gap between the different scales at which the brain is currently investigated, there are studies highlighting the role of specific neurons (hub neurons) in determining emergent network dynamics (Bonifazi et al., 2009).

Thanks to recent technological advances, it is now possible to collect high-resolution structural and functional information at the mesoscale from cultured neuronal networks. This enables the development of new methodologies for a combined structural and functional analysis at this scale. In particular, novel generations of active Micro Electrode Arrays (MEAs), such as the High-Density MEA (HD-MEA) chips introduced by Berdondini et al. (2009), allow the electrical activity of neuronal networks to be recorded from thousands of electrodes at sub-millisecond resolution and at the granularity of the single cell. The combination of such high-resolution functional data with fluorescence microscopy imaging can enable the unprecedented mapping of both activity and structure of neural assemblies at a cellular level. Indeed, relatively sparse neuronal cultures (grown on-chip by seeding a few thousand cells) allow the acquisition of detailed spatio-temporal recordings of neuronal activity and of the topographic distribution of neurons with respect to the electrode array. This provides the unique chance of correlating functional activity with neuronal topology over large assemblies.

This work proposes a computational framework for the joint analysis of functional and structural connectivity at the mesoscale which takes advantage of the remarkable spatial resolution offered by HD-MEAs.

In particular, we start from the reasonable hypothesis that the presence of a strong structural connection makes a functional connection more likely to occur. The influence of the network topology on the functional behavior has been already proven on a theoretical level (Kriener et al., 2009). Furthermore, distance and strength of cross-correlation have been proven to be related also *in vivo* (Hirase et al., 2001) and *in vitro* (Shlens et al., 2006). However, experimental studies at neuronal resolution covering large networks are typically more difficult to carry out due to both technological constraints and problem complexity. Here, we address this task by developing a set of computational algorithms that enables the combined structural and functional analysis of networks with thousands of neurons.

This could not be done on conventional MEAs that typically integrate 60–256 microelectrodes, and where existing studies are typically limited to the analysis of network-wide electrophysiological activity. Consequently, the absence of any anatomical evidence to support functional hypotheses strongly limits the potential of this analysis. A few recent attempts addressing multimodal studies at the mesoscale have been presented in the literature. Abdoun et al. (2011) introduced the NeuroMap software tool for handling MEA recordings co-registered with fluorescence images. However, in this tool, the image is used only for visualization purposes. Another multimodal study has been proposed by Becchetti et al. (2012) for differentiating the functional activity of excitatory and inhibitory neurons from MEA recordings and GAD67-GFP imaging. Their method is based on a manual extraction of the structural information (i.e., visually distinguishing excitatory from inhibitory cells) and lacks any further characterization of the network anatomy (e.g., the topology). As no structural connectivity information is available, the assessed statistical properties of the electrophysiological signals only account for local functional dynamics, discarding more complex network interactions. Furthermore, in both cases the use of standard MEAs (up to a few hundred electrodes) offers poor spatial resolution. Unlike HD-MEAs, these systems cannot monitor both single-cell activity and wide network dynamics at the same time.

In this paper, we propose a framework for integrating multimodal information acquired on HD-MEAs with the aim of refining the estimate of functional connectivity using the structural connectivity as prior. Specifically, we localize neurons with respect to the electrode array and estimate the structural connectivity of the electrodes to compute the topological distance along the paths connecting them. This is used as structural information to refine a rough estimate of functional connectivity based only on cross-correlations. As extensively suggested in the literature (Feldt et al., 2011), graph theory is used to support the analysis by describing the network connectivity with graph representations. Neuronal networks perfectly fit into this framework as it provides the flexibility to characterize both structure and function from anatomical and electrophysiological observations (Bullmore and Sporns, 2009).

An overview of the proposed approach is provided in **Figure 1**. Structural connectivity maps are first estimated from fluorescence images of the neuronal culture by using local directional features and heat propagation (Ullo et al., 2013). The obtained prior on the existing anatomical links is then used to refine the estimate of functional connectivity which is obtained from cross-correlation measures of the electrophysiological signals, as introduced by Maccione et al. (2012). In this fashion, the anatomical information offers a reference space facilitating the interpretation of the observed functional interactions.

The contributions of this paper are twofold. First, we introduce a computational framework capable of estimating the structural connectivity of large neuronal assemblies and we show how more reliable estimates of functional connectivity can be obtained by incorporating such structural information as prior. Second, we use the obtained results to formulate new hypotheses on relevant features of the electrophysiological activity that can better characterize functional interactions between neurons.

#### **2. MATERIALS AND METHODS**

#### **2.1. ELECTROPHYSIOLOGICAL RECORDINGS AND CELL CULTURE STAINING**

Cell cultures were recorded by means of High-Density Micro Electrode Arrays (HD-MEAs). These commercially available devices (www.3brain.com) have been extensively described in Imfeld et al. (2008) and Berdondini et al. (2009). Briefly, high-density MEAs allow simultaneous extracellular recordings from 4096 square electrodes (pitch = 42 μm) arranged in a 64 × 64 layout (2.7 × 2.7 mm<sup>2</sup> active area) at a sampling rate of about 7 kHz per channel.
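The array geometry stated above can be checked with a few lines of arithmetic. The sketch below is only illustrative: zero-based (row, column) indexing and the convention that the first electrode is centred half a pitch from the array corner are assumptions, not specifications of the device.

```python
PITCH_UM = 42.0   # electrode pitch stated in the text, in micrometres
GRID_SIZE = 64    # electrodes per side (64 x 64 layout stated in the text)

def electrode_center_um(row, col, pitch_um=PITCH_UM, grid_size=GRID_SIZE):
    """Return the (x, y) centre of electrode (row, col) in micrometres.

    Assumes zero-based indices and an origin at the corner of the array.
    """
    if not (0 <= row < grid_size and 0 <= col < grid_size):
        raise ValueError("electrode index outside the array")
    return ((col + 0.5) * pitch_um, (row + 0.5) * pitch_um)

n_electrodes = GRID_SIZE * GRID_SIZE        # total number of electrodes
side_mm = GRID_SIZE * PITCH_UM / 1000.0     # side length of the active area
```

The product 64 × 42 μm gives a side of 2.688 mm, consistent with the 2.7 mm quoted for the active area, and 64 × 64 gives the 4096 electrodes.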

Primary hippocampal neurons from rat embryos at E18 were dissociated by enzymatic digestion and seeded on HD-MEAs previously sterilized and coated with polylysine adhesion factor (Maccione et al., 2010, 2012). Drops of 30–50 μL were seeded over the active area of the chip at a nominal low concentration of 100–150 cells/μL. After 2–3 weeks in the incubator, cultures develop a sparse interconnected network structure showing synchronous functional activity. Extracellular electrophysiological recordings of neuronal signals were acquired at 18–19 Days *In Vitro*. Spontaneous activity was recorded for 10–15 min as control condition, followed by another 10–15 min recording under chemical stimulation by adding 30 μM bicuculline.

After electrophysiological recordings, neuronal tissues were fixed on the chip array in 4% paraformaldehyde for 20 min and stained with NeuN for neuronal nuclei and β3-tubulin for axonal and dendritic arborization (Maccione et al., 2012). Cultures were then inspected under a microscope, collecting multiple fields at 20× magnification with a micro positioning stage. The acquired portions were then stitched together using Adobe Photoshop CS3 and the open source free software Fiji (http://fiji.sc/Fiji).

#### **2.2. MULTIMODAL DATASET DESCRIPTION**

The combination of the HD-MEA technology with immunofluorescence microscopy results in multimodal datasets, each consisting of a high-resolution fluorescence image (i.e., the *structural* data) and a set of electrophysiological recordings (i.e., the *functional* data)<sup>1</sup>.

For the purpose of our experiments, two different neuronal networks were cultured on HD-MEAs under the same experimental conditions. **Figure 5A** shows the fluorescence images of the two cultures. In each neuronal culture about one thousand cells were grown, showing a strong degree of structural connectivity. As we aim at investigating the excitatory functional connectivity, we focus on the analysis of the electrophysiological recordings with added bicuculline, a blocker of the inhibitory pathway. This choice limits the number of potential inhibitory connections and is a desirable condition since the cross-correlation (as defined by Equation 3) is only designed to detect excitatory functional connections (Garofalo et al., 2009). The raw electrical signal recorded by each electrode was encoded, after spike detection (Maccione et al., 2009), as a sparse vector of size *fs* × *tr*, where *fs* is the sampling frequency and *tr* is the recording time interval. The whole network recording is arranged in a sparse matrix, where the indexing (*i*, *j*) refers to the electrode at row *i* and column *j* in the electrode array. Each electrode (*i*, *j*) is then associated with a vector containing the time stamps of the corresponding spiking activity. This encoding of the spiking activity is used as input data for estimating the functional connectivity (see Section 2.4).
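One way to picture this encoding is sketched below. It is a minimal illustration under stated assumptions, not the data format used in the study: the sparse vector is stored as its length together with the sorted sample indices that contain a spike, the recording is a dictionary keyed by (*i*, *j*), and the sampling rate of 7000 Hz and the spike times are hypothetical values.

```python
import math

def encode_spike_train(spike_times_s, fs_hz, duration_s):
    """Encode spike times as a sparse binary vector of length fs * tr.

    Returns (length, indices), where indices are the sorted sample
    positions that hold a spike. Spikes outside the recording are dropped.
    """
    length = int(round(fs_hz * duration_s))
    indices = sorted({math.floor(t * fs_hz)
                      for t in spike_times_s if 0 <= t < duration_s})
    return length, indices

# Hypothetical spike times in seconds for two electrodes (illustration only).
recording = {
    (0, 0): [0.25, 0.5],
    (0, 1): [1.0, 2.5],   # the spike at 2.5 s lies outside the 2 s recording
}
encoded = {electrode: encode_spike_train(times, fs_hz=7000, duration_s=2.0)
           for electrode, times in recording.items()}
```

Storing only the indices of the non-zero entries keeps memory use proportional to the number of spikes rather than to *fs* × *tr*, which matters when thousands of electrodes are recorded for several minutes.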

The presented case studies were selected as representative in terms of number of neurons, density of connections and number of functionally-correlated signals. Detailed information on each dataset is reported in **Table 1**. First order statistics on cell culture dynamics are in line with previous studies (Maccione et al., 2012).

#### **2.3. STRUCTURAL CONNECTIVITY ANALYSIS FROM FLUORESCENCE IMAGES**

Dissociated neuronal cultures show extensive and fuzzy connectivity that makes structural analysis computationally hard. To tackle this challenge, a method based on heat propagation is used to estimate the structural connectivity of neuronal assemblies with dense connectivity, as reported in Ullo et al. (2013).

The method provides a description of the network topology in terms of a graph where nodes correspond to the electrodes and edges represent structural connections. In fact, this provides the common reference frame to relate the functional signal recorded by the HD-MEA to the network anatomy.

<sup>1</sup>Imaging and electrophysiological datasets will be available on CARMEN (https://portal.carmen.org.uk) upon request of access credentials to the corresponding author.

Maps of electrode connectivity are determined using a Graph Heat Kernel (GHK) framework (Belkin and Niyogi, 2003; Bai et al., 2010) based on probabilistic directional features (Ullo et al., 2013). These features encode the local directionality of the neurites within small patches of the image corresponding to the MEA electrodes. A feature consists of a histogram with 8 entries, each representing the probability of the current electrode being connected to one of its adjacent neighbors (horizontal, vertical, and diagonal adjacency are all considered).

In its general formulation, a GHK allows one to estimate the structure of a graph by computing the amount of heat that propagates from a source to a destination node. The intuition behind the use of a GHK for structural connectivity estimation can be explained by first considering the lattice formed by the regular MEA structure. A weighted graph can be defined on this lattice where the electrodes are nodes and edge weights are given by their degree of connectivity, i.e., by the values of the corresponding probabilistic directional features (see **Figure 2**). If we placed a certain amount of heat on a *seed* node and let it propagate through the graph, heat propagation would favor the edges having higher weights, i.e., those corresponding, in principle, to stronger connections. As a result, the amount of heat reaching a destination electrode from the seed could be considered as an estimate of the strength of their connectivity. Repeating this propagation for all seed electrodes, we can obtain an estimate of the whole-network structural connectivity. Only electrodes having neurons in their recording area are considered as seeds, as they are the ones substantially contributing to the electrophysiological activity.
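The intuition of heat favoring high-weight edges can be illustrated with a small numerical sketch. This is not the GHK implementation of Ullo et al. (2013): it assumes an undirected graph with symmetric weights, integrates the heat equation d*h*/d*t* = −*Lh* (with *L* the graph Laplacian) by explicit Euler steps rather than evaluating the kernel in closed form, and uses hypothetical weights.

```python
def heat_after_diffusion(weights, seed, steps=200, dt=0.01):
    """Diffuse one unit of heat from `seed` over a weighted undirected graph.

    `weights` maps a node pair (a, b) to a non-negative edge weight.
    Explicit Euler integration; dt times the largest weighted degree
    should stay well below 1 for the scheme to remain stable.
    """
    nodes = sorted({n for edge in weights for n in edge})
    heat = {n: 0.0 for n in nodes}
    heat[seed] = 1.0
    for _ in range(steps):
        flow = {n: 0.0 for n in nodes}
        for (a, b), w in weights.items():
            # Heat moves from the hotter to the cooler node, scaled by the weight.
            delta = w * (heat[a] - heat[b])
            flow[a] -= delta
            flow[b] += delta
        for n in nodes:
            heat[n] += dt * flow[n]
    return heat

# Seed electrode (1, 1) with one strong and one weak neighbor (hypothetical weights).
toy_weights = {((1, 1), (1, 2)): 0.8, ((1, 1), (1, 0)): 0.1}
heat = heat_after_diffusion(toy_weights, seed=(1, 1))
```

After the same diffusion time, the neighbor reached through the stronger edge holds more heat than the one reached through the weaker edge, while the total heat is conserved. Reading the heat at each destination as a connectivity strength, and repeating the procedure for every seed, mirrors the whole-network estimate described above.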

Further details on the structural analysis will be provided in the following sections.

#### *2.3.1. Probabilistic directional features.*

A preprocessing pipeline is first run to detect neuronal nuclei and reconstruct the electrode array from the image as reported in Ullo et al. (2012). Specifically, the MEA reconstruction makes it possible to compute an electrode-based partition of the image, i.e., a partition into small patches corresponding to the electrode areas (see **Figure 3A**). The proposed directional features are then extracted from each patch with the aim of obtaining the probability of connection between neighboring electrodes as explained by **Figures 3B,C**. The features characterize the local configuration of neurites' orientations using a directional Von Mises Mixture (VMM) model fitted to a number of line segments.

Von Mises distributions are widely used to describe directional statistics on the circle (Mardia and Jupp, 2000) and are defined by two parameters: mean μ and concentration κ. The larger the concentration, the more tightly the points cluster around the mode at θ = μ.
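For reference, the standard Von Mises density on the circle, which the text does not write out, has the form

$$f(\theta \mid \mu, \kappa) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi I_0(\kappa)},$$

where $I_0$ is the modified Bessel function of the first kind of order zero. For κ = 0 the density is uniform on the circle, and for large κ it approaches a Gaussian centered at μ with variance 1/κ.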

**FIGURE 2 | Heat propagation.** A heat source is placed at the seed electrode and propagated according to the probability of connection defined by the directional feature. Heat propagation favors directions with higher probability of connection. The adjacent electrodes (numbered from 1 to 8) are reached by a different amount of heat according to the seed feature, as described by the colormap.

In our framework, segments approximating real neurites are detected in each image patch using the Hough Transform. A different Von Mises distribution is then fitted to each of the segment endpoints. The goal is to describe the main neurite orientations inside the patch and the corresponding uncertainty in each of the given directions. Uncertainty is associated with the angles at which a neurite exits/enters the patch and is due to the approximation of real neurites by line segments (which can be affected by errors caused by noise, blurring, etc.). To fit the parameters of the Von Mises (VM) distribution, we compute the angle θ*<sub>A</sub>* (θ*<sub>B</sub>*) as the projection of the segment endpoint *A* (*B*) onto the circle circumscribing the patch, as shown in **Figure 3B**. The angle defines the mean μ*<sub>A</sub>* (μ*<sub>B</sub>*) of the VM fitted at endpoint *A* (*B*), which represents the most probable angle at which the neurite enters/exits the patch. To model the uncertainty of this orientation we compute the distance *d<sub>A</sub>* (*d<sub>B</sub>*) between endpoint *A* (*B*) and the boundary of the patch. The larger this distance, the higher the uncertainty that the neurite crosses the boundary exactly at the estimated angle. Consequently, the concentration parameter κ*<sub>A</sub>* (κ*<sub>B</sub>*) is set to be inversely proportional to the distance *d<sub>A</sub>* (*d<sub>B</sub>*).
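The endpoint-to-parameter mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the patch is taken to be a square centered at the origin, the projection onto the circumscribing circle is taken to be radial (so the angle is simply that of the endpoint seen from the patch center), and the function name, the `scale` constant and the `eps` guard are hypothetical choices, not values from the article.

```python
import math

def endpoint_von_mises(x, y, half_side, scale=1.0, eps=1e-6):
    """Return (mu, kappa) for one segment endpoint inside a square patch.

    Assumptions (not from the article): the patch is centred at the origin,
    mu is the angle of the endpoint seen from the patch centre, and
    kappa = scale / distance_to_border (eps avoids division by zero).
    """
    mu = math.atan2(y, x) % (2.0 * math.pi)
    dist_to_border = half_side - max(abs(x), abs(y))
    kappa = scale / (dist_to_border + eps)
    return mu, kappa

# Endpoint A lies close to the right border, endpoint B deep inside the patch
mu_a, kappa_a = endpoint_von_mises(9.0, 0.0, half_side=10.0)
mu_b, kappa_b = endpoint_von_mises(-2.0, -2.0, half_side=10.0)
```

With these inputs the endpoint near the border receives the larger concentration, i.e., the lower angular uncertainty, which is the behavior the text describes.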

For a patch with *n* segments, 2*n* VM distributions are fitted to the data and used to define the VMM model. As the Hough Transform assigns each segment a vote depending on its evidence on the image, votes are used to define mixture proportions. As a result, segments with stronger evidence are assigned a higher weight in the mixture model. An example of a VMM model is shown in **Figure 3C**.

Finally, the obtained probability distribution is discretized in the 8 neighboring directions. This is done by computing the area under the probability density function of the VMM model in 8 different sectors of the circle, as shown in **Figure 3C**. This results in a histogram in which each entry represents the probability of the current electrode being connected to its neighbors.
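A minimal sketch of this discretization step is given below, assuming that the 8 sectors have equal width and are centered on the directions k·45°; the sector layout, the function names and the example mixture are illustrative assumptions. The Bessel function is evaluated by its power series and the sector areas by a midpoint rule, both chosen only to keep the example dependency-free.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (fine for moderate x)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_pdf(theta, mu, kappa):
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def vmm_histogram(components, n_sectors=8, steps=200):
    """Area of a Von Mises mixture in equal sectors centred on k * 2*pi / n_sectors.

    components: list of (weight, mu, kappa); weights (e.g. Hough votes) are
    normalised here so that the mixture proportions sum to 1.
    """
    total_w = sum(w for w, _, _ in components)
    width = 2.0 * math.pi / n_sectors
    h = width / steps
    hist = []
    for k in range(n_sectors):
        start = k * width - width / 2.0
        area = 0.0
        for j in range(steps):  # midpoint rule inside the sector
            theta = start + (j + 0.5) * h
            area += h * sum(w / total_w * von_mises_pdf(theta, mu, kappa)
                            for w, mu, kappa in components)
        hist.append(area)
    return hist

# Hypothetical patch: a strong component pointing at 0 rad, a weaker one at pi
hist = vmm_histogram([(3.0, 0.0, 4.0), (1.0, math.pi, 2.0)])
```

The resulting list plays the role of the 8-bin histogram: its entries sum to one, and the largest entry corresponds to the direction of the dominant mixture component.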

#### *2.3.2. Graph heat kernel*

The heat kernel specifies how the information flows across a network or a manifold in time. Generally speaking, the goal of the heat kernel is to reduce the dimensionality of high-dimensional data lying on sub-manifolds, so it is related to the concept of spectral clustering (Luxburg, 2007). Similarly, it can be used to geometrically characterize the structure of a graph residing on a manifold by defining its pattern of geodesic distances (Bai et al., 2010).

Given a weighted graph *G* = (*V*, *E*, *W*), where *V* is the set of nodes, *E* ⊆ *V* × *V* is the set of edges, and *W* the matrix of edge weights, the heat diffusion on *G* is defined by the *heat equation*:

$$\left(L_G + \frac{\partial}{\partial t}\right) h_t = 0, \tag{1}$$

where *ht* is the heat distribution at time *t* and *LG* is the *Graph Laplacian* operator (Belkin and Niyogi, 2003). In particular, *LG* = *D* − *A*, where *A* is the symmetric *adjacency matrix* defined on graph *G*, and *D* is the diagonal *degree matrix* whose diagonal elements are given by *D*(*x*, *x*) = Σ<sub>*y*∈*V*</sub> *A*(*x*, *y*).

As the time derivative of the kernel is determined by the graph Laplacian, the solution of the heat equation is obtained by exponentiating the Laplacian eigensystem over time. According to spectral graph theory, the *heat kernel* has the following eigendecomposition (Bai et al., 2010):

$$h_t(x, y) = \sum_{i=1}^{|V|} e^{-\lambda_i t} \phi_i(x) \phi_i(y), \tag{2}$$

where *ht*(*x*, *y*) is the heat kernel element for nodes *x* and *y*, and λ*<sub>i</sub>* and φ*<sub>i</sub>* are the *i*th eigenvalue and eigenvector of the Graph Laplacian, respectively.

The heat kernel *ht*(*x*, *y*) is the solution of the heat equation with heat source placed at point *x* at time *t* = 0, and represents the amount of heat at point *y* after time *t*.
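For a symmetric Laplacian, the eigendecomposition in Equation (2) is equivalent to the matrix exponential exp(−*t* *LG*). The sketch below uses that equivalence and evaluates the exponential with a truncated Taylor series plus scaling and squaring. This is a swapped-in technique chosen to keep the example free of external libraries, not the authors' implementation, and all function names are hypothetical.

```python
import math

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric weighted adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heat_kernel(adj, t, squarings=10, terms=12):
    """h_t = exp(-t L): Taylor series of exp(-t L / 2**s), then s squarings."""
    n = len(adj)
    lap = laplacian(adj)
    small = [[-t * lap[i][j] / (2 ** squarings) for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Two-node check: the analytic kernel entries are 0.5 * (1 +/- exp(-2 w t))
h2 = heat_kernel([[0.0, 0.5], [0.5, 0.0]], t=1.0)
# Three-node chain with a strong edge (0-1) and a weak edge (1-2)
h3 = heat_kernel([[0.0, 1.0, 0.0], [1.0, 0.0, 0.2], [0.0, 0.2, 0.0]], t=1.0)
```

In the three-node chain, more of the heat seeded at node 0 reaches node 1 than node 2, and each row sums to one because heat is conserved on the graph. For a 4096-node lattice a sparse eigensolver or a dedicated matrix-exponential routine would be the practical choice.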

The heat kernel solution is generally computed in two steps: (1) the manifold is approximated by the adjacency graph *A* computed from data points and incorporating neighborhood information, and (2) the weighted graph Laplacian is used to estimate the real manifold, optimally preserving such neighborhood information (Belkin and Niyogi, 2003).

In our application, the weighted adjacency matrix *A* is obtained from the probabilistic directional features, by defining each element *A*(*x*, *y*) as the histogram value for the edge connecting electrode *x* to electrode *y* (defining their probability of being connected by a neurite). Due to the way features are defined, histogram values are not symmetric in the two directions, so the matrix *A* needs to be symmetrized, as required by the GHK formulation. This is done by summing the probability contributions in the two edge directions.

The output of the heat kernel is a |*V*|×|*V*| adjacency matrix (4096 × 4096 in our case) indicating the electrode connectivity in terms of the amount of heat propagated after time *t* from a seed electrode. Matrix weights are normalized (divided by the maximum value in the matrix) to obtain the final structural connectivity map. In practice, this matrix is quite sparse, as only electrodes having neurons in their recording area are taken into consideration as nodes. This restricts the connectivity estimation to actual neurons lying on electrodes that can contribute to the electrical activity recorded by the HD-MEA.
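The symmetrization of the feature-based adjacency matrix and the max-normalization of the kernel output are both one-line matrix operations. A small sketch, with hypothetical function names and made-up directed weights, and assuming a zero diagonal (no self-connections):

```python
def symmetrize(a):
    """Sum the contributions of the two edge directions: A_sym = A + A^T.

    Assumes a zero diagonal; a non-zero self-weight would be doubled.
    """
    n = len(a)
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]

def normalize_by_max(m):
    """Divide every entry by the largest entry of the matrix."""
    peak = max(max(row) for row in m)
    if peak == 0:
        return [row[:] for row in m]
    return [[v / peak for v in row] for row in m]

# Made-up directed histogram values between three electrodes
directed = [[0.0, 0.3, 0.0],
            [0.1, 0.0, 0.5],
            [0.0, 0.2, 0.0]]
sym = symmetrize(directed)
scaled = normalize_by_max(sym)
```

Here the same normalization helper is applied to the symmetrized matrix only for illustration; in the pipeline described above it is applied to the heat kernel output.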

In the GHK framework, heat propagation is regulated by the time parameter *t*. **Figure 4** shows the influence of this parameter on the final estimate. It can be observed that, as *t* grows, a larger portion of the graph is explored, resulting in the overlap of multiple feature contributions (in addition to the initial seed feature). While this makes it more likely to introduce false positives, it also makes it possible to discover new branches in the network connectivity and to compensate for imprecise and noisy local contributions. Hence, setting the value of *t* is a trade-off that strongly depends on the size of the considered domain (in our case the 64 × 64 matrix of electrodes).

#### **2.4. FUNCTIONAL CONNECTIVITY ANALYSIS FROM SPIKING ACTIVITY**

A cross-correlation based approach for functional connectivity estimation applied to electrophysiological recordings of *in vitro* populations has been recently validated on the HD-MEA recording system in Maccione et al. (2012). Cross-correlations are computed between pairs of electrode signals to obtain a first rough estimate of functional connectivity. For each pair of electrodes (*x*, *y*) (with at least one spike to ensure presence of activity) the following cross-correlation function (*cross-correlogram*) is evaluated among their spike trains:

$$C_{xy}(\tau) = \frac{1}{\sqrt{N_x N_y}} \sum_{s=1}^{N_x} \sum_{t_l=\tau-(\Delta_\tau/2)}^{\tau+(\Delta_\tau/2)} x(t_s)\, y(t_s + t_l) \tag{3}$$

with *Nx* (*Ny*) being the number of spikes in train *x* (*y*), *ts* the spike occurrence time in train *x*, τ the time lag, and Δ<sub>τ</sub> the time window in which synchronous spikes in train *y* are counted. Δ<sub>τ</sub> is set at 0.5 ms.
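Equation (3) amounts to counting, for each lag, the spike pairs whose time difference falls inside a window centered on that lag, and dividing by the geometric mean of the spike counts. A brute-force sketch is shown below; spike trains are represented as lists of spike times in ms, and the function name and toy data are assumptions made for illustration.

```python
import math

def cross_correlogram(x_spikes, y_spikes, lags, window):
    """Normalised cross-correlogram of two spike trains (times in ms).

    For each lag, counts pairs (ts in x, ty in y) with ty - ts inside
    [lag - window/2, lag + window/2] and divides by sqrt(Nx * Ny).
    Brute force, O(Nx * Ny * len(lags)); fine for a small example.
    """
    n_x, n_y = len(x_spikes), len(y_spikes)
    if n_x == 0 or n_y == 0:
        return [0.0 for _ in lags]
    norm = math.sqrt(n_x * n_y)
    values = []
    for lag in lags:
        lo, hi = lag - window / 2.0, lag + window / 2.0
        count = sum(1 for ts in x_spikes for ty in y_spikes if lo <= ty - ts <= hi)
        values.append(count / norm)
    return values

# Toy trains: every spike of y follows a spike of x by 2.1 ms
x = [10.0, 20.0, 30.0]
y = [12.1, 22.1, 32.1]
lags = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
c = cross_correlogram(x, y, lags, window=0.5)
peak = max(c)
peak_lag = lags[c.index(peak)]
```

In this toy case all three pairs fall in the bin centered at 2.0 ms, so the correlogram is zero everywhere except for a peak of 1.0 at that lag.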

The resulting normalized cross-correlations are then post-processed using a filtering strategy to remove false positives that are incompatible with biological priors. In particular, the maximum propagation velocity for *in vitro* biological preparations (400 mm/s, Bonifazi et al., 2005) is used to discard physiologically implausible links, i.e., links whose correlation peak latency is shorter than the minimum delay that this velocity allows over the inter-electrode distance. Such a physiological filter also accounts for delayed spikes in post-synaptic cells. Cross-correlations are then thresholded to retain only statistically significant links. To this aim, the cross-correlation of jittered spike trains [by ±5 ms, thus maintaining the same Inter Spike Interval (ISI) distributions] is computed as a null model and a significance threshold *Cs* is defined using a non-parametric statistical test at *p*-value *p* = 0.05. This shuffling procedure, also called dithering (Grün and Rotter, 2010), is repeated 100 times on each randomly selected pair of channels. The jitter was drawn uniformly from the ±5 ms time interval.
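The two filtering steps can be sketched as follows. The velocity value, the ±5 ms uniform jitter, the 100 repetitions and *p* = 0.05 come from the text; everything else is an assumption made for illustration, in particular the function names, the use of the 95th percentile of surrogate peaks as the non-parametric threshold, and the toy spike trains.

```python
import math
import random

MAX_VELOCITY_MM_PER_S = 400.0  # value quoted in the text (Bonifazi et al., 2005)

def min_latency_ms(distance_mm):
    """Shortest delay compatible with the maximum propagation velocity."""
    return 1000.0 * distance_mm / MAX_VELOCITY_MM_PER_S

def is_plausible(distance_mm, peak_latency_ms):
    return abs(peak_latency_ms) >= min_latency_ms(distance_mm)

def jitter(spikes, rng, half_width_ms=5.0):
    """Shift every spike by an independent uniform offset in [-5, +5] ms."""
    return sorted(t + rng.uniform(-half_width_ms, half_width_ms) for t in spikes)

def peak_correlation(x, y, lags, window):
    """Peak of the normalised cross-correlogram over the given lags."""
    norm = math.sqrt(len(x) * len(y))
    best = 0.0
    for lag in lags:
        lo, hi = lag - window / 2.0, lag + window / 2.0
        count = sum(1 for tx in x for ty in y if lo <= ty - tx <= hi)
        best = max(best, count / norm)
    return best

def significance_threshold(x, y, lags, window, n_surrogates=100, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the peaks obtained from jittered surrogates."""
    rng = random.Random(seed)
    peaks = sorted(peak_correlation(jitter(x, rng), jitter(y, rng), lags, window)
                   for _ in range(n_surrogates))
    index = min(n_surrogates - 1, max(0, round((1.0 - alpha) * n_surrogates) - 1))
    return peaks[index]

# Toy pair: y repeats x with a fixed 2 ms delay
x = [20.0 * i for i in range(20)]
y = [t + 2.0 for t in x]
lags = [0.5 * k for k in range(-10, 11)]
observed = peak_correlation(x, y, lags, window=0.5)
c_s = significance_threshold(x, y, lags, window=0.5)
keep_link = observed > c_s and is_plausible(distance_mm=0.4, peak_latency_ms=2.0)
```

For two electrodes 0.4 mm apart, the velocity bound implies a minimum latency of 1 ms, so a 2 ms peak passes the physiological filter while a 0.5 ms peak would not.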

The thresholded functional graph, weighted by the cross-correlation values, relies entirely on the recorded electrophysiology, discarding the valuable information coming from the structural modality.

#### **2.5. COMBINING STRUCTURAL AND FUNCTIONAL INFORMATION**

We build on the hypothesis that functional co-activation commonly relies on anatomical connections to refine the estimate of Functional Connectivity (FC) from our Structural Connectivity (SC) prior. In order to coherently combine structural and functional information, the refinement process starts from the unthresholded cross-correlation values obtained after spatio-temporal filtering.

As a first step, we observe that the functional connectivity estimates are unaware of the actual neuronal distribution on the array of electrodes. Due to noise affecting the spike detection and/or strong dendritic arborization (whose activity can be, in some cases, detected by the MEA), electrodes with no neuron in their recording area are sometimes included in the graph. As the proposed structural analysis is capable of retrieving a unique correspondence between neurons and electrodes, the first refinement stage consists in discarding such nodes from the FC graph. Additionally, neuronal correlations have been shown to decay with physical distance (Vincent et al., 2013). Nevertheless, cross-correlation measures do not strictly reflect this behavior, owing to noise and random co-activations, and the resulting FC graphs frequently present a substantial number of long-range links that are improbable, given the underlying network topology. Thresholding strategies, used to select a subset of presumably relevant links, are typically based on purely empirical observations due to the absence of any ground truth information.

We take advantage of the relationship between functional correlation and structural distance, to define the second step of our refinement strategy, called *reweighting*.

Specifically, the FC values (i.e., normalized cross-correlation peaks) associated with the functional graph are reweighted based on the distance of the corresponding nodes. This measure, called *structural distance*, is the Euclidean distance computed along the shortest path connecting the nodes on the structural graph. In principle, we want our algorithm to penalize functional links according to this value. To this aim, a functional link connecting electrodes *x* and *y* with cross-correlation peak defined as:

$$C_P(x, y) = \max_{\tau} C_{xy}(\tau), \tag{4}$$

is reweighted according to the following formula:

$$W(x, y) = \widetilde{C}_P(x, y)^{(1 + d_{xy})} \tag{5}$$

with *dxy* being the structural distance and *C̃<sub>P</sub>*(*x*, *y*) being the cross-correlation peak normalized in the interval [0, 1]. As the distance *dxy* is also normalized in the same interval [0, 1], the exponent lies in [1, 2], so a weight never exceeds the normalized peak and decreases as the structural distance grows. The resulting weights *W* thus reflect our initial hypothesis.
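A minimal sketch of the two ingredients of the reweighting step is given below: the structural distance as a shortest-path length (computed here with Dijkstra's algorithm, an assumption, since the text does not name the algorithm) and Equation (5). The graph representation, the function names, the toy electrode coordinates and the normalization constant `d_max` are all hypothetical.

```python
import heapq
import math

def structural_distance(edges, source, target):
    """Length of the shortest path between two nodes (Dijkstra's algorithm).

    edges: dict mapping node -> list of (neighbour, euclidean_edge_length).
    Returns math.inf when the two nodes are not connected.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in edges.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf

def reweight(c_peak_norm, d_norm):
    """Equation (5): W = C ** (1 + d), with C and d both normalised to [0, 1]."""
    return c_peak_norm ** (1.0 + d_norm)

# Toy structural graph on electrode coordinates (row, col), unit pitch
edges = {
    (0, 0): [((0, 1), 1.0)],
    (0, 1): [((0, 0), 1.0), ((1, 2), math.sqrt(2.0))],
    (1, 2): [((0, 1), math.sqrt(2.0))],
}
d = structural_distance(edges, (0, 0), (1, 2))
d_max = 4.0                      # hypothetical normalisation constant
w = reweight(0.8, d / d_max)
```

How unreachable node pairs are treated is not specified in the text; with this sketch an infinite distance would drive any weight below 1 to zero.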

Finally, a threshold of statistical significance for the estimated functional links is determined by applying the reweighting process to the null model introduced in Section 2.4. A statistical significance test at *p*-value *p* = 0.05 is then used to define a significance threshold *Ws*. As will be discussed in Section 3, results of the refined and thresholded FC graphs show that, by incorporating the structural information as prior, it is possible to provide estimates of functional connectivity more coherent with the network topology.

#### **2.6. CLASSIFICATION OF FUNCTIONAL CONNECTIONS**

After the reweighting and thresholding of the initial FC graph, a subset of the original functional links is discarded. We want to investigate whether the two classes of discarded and retained connections are characterized by distinguishable functional features. First of all, this would show that the structural prior is not only imposing an *a priori* constraint on the functional connectivity but is effectively selecting links that behave differently from a functional point of view. Second, this investigation could give some insights into how to effectively detect functional links purely from the analysis of the electrophysiological activity. Former studies (e.g., Ostojic et al., 2009) are informative on how the cross-correlation function is affected by variations of the network background activity, the synaptic strengths and the local network connectivity. In line with these studies, we compute a set of features of the cross-correlogram (i.e., cross-correlation peak, time lag of the peak, spread of the cross-correlation function) that might be informative on the occurrence of actual functional connections. For each discarded/retained link, functional features are computed from the analysis of the original spike trains. As they are not affected by our structural prior, this makes it possible to highlight any intrinsic property of the functional activity capable of revealing real co-activations. The sets of discarded/retained links are then regarded as two classes and a linear Support Vector Machine (SVM) (Duda et al., 2000) is trained/tested on these data to quantify the discriminative power of different combinations of features. The functional features considered in this study are the following:

- *CP*: the cross-correlation peak;
- *CO*: a peak-related measure linked to *CP* through the mean firing rates and the length of the recording session;
- *C*τ: the time lag of the peak;
- *CH*: the spread of the cross-correlation function, measured as an entropy;
- *MFRx*, *MFRy*: the mean firing rates of the two electrodes.


Features *CP* and *CO* are related to the strength of a given functional link, whereas the time lag of the peak (*C*τ) will likely be proportional to the closeness of the correlated nodes. The spread of the cross-correlation function can be informative about the nature of a given link: broad functions would likely correspond to unreliable and noisy cross-correlations. As previously shown in Maccione et al. (2012), this feature is also related to the link length. Here, instead of resorting to a Gaussian fit of the cross-correlation functions (Maccione et al., 2012), the spread is measured in terms of the more general entropy measure (*CH*) quantified as

$$C_H = -\sum_{\tau} C_n(\tau) \, \log_2 C_n(\tau) \tag{6}$$

with *Cn*(τ) = *C*(τ) / Σ<sub>τ</sub> *C*(τ).
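Equation (6) is the Shannon entropy, in bits, of the correlogram normalized to unit sum. A short sketch (the function name is hypothetical, and bins with zero value are skipped under the usual convention 0 · log 0 = 0):

```python
import math

def correlogram_entropy(c_values):
    """Entropy (bits) of a cross-correlogram normalised to sum to one."""
    total = sum(c_values)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for c in c_values:
        if c > 0:  # convention: 0 * log2(0) = 0
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

sharp = correlogram_entropy([0.0, 0.0, 1.0, 0.0, 0.0])  # single narrow peak
flat = correlogram_entropy([1.0, 1.0, 1.0, 1.0])        # evenly spread
```

A correlogram concentrated in one bin has zero entropy, whereas one spread evenly over four bins has an entropy of 2 bits, which matches the reading of *CH* as a measure of spread.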

We also included the mean firing rates (*MFRx*, *MFRy*), which are typically used to quantify first-order statistics of cell culture dynamics. In principle, the firing rates cannot be regarded as effective predictors of correlated activity; however, at higher firing rates the probability of coincident events (i.e., correlated activities) increases. In addition, *CO* is related to *CP* through the geometric mean of *MFRx* and *MFRy* (by the relation *CO* = *CP* · *T* · √(*MFRx* · *MFRy*), with *T* being the length of the recording session), which further motivates the investigation of the interplay between all the features determining the cross-correlation function.
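The relation between the two strength features can be recovered from the normalization in Equation (3), under the assumption (not stated explicitly in the surviving text) that *CO* denotes the unnormalized coincidence count at the correlogram peak. Writing the spike counts as $N_x = T \cdot MFR_x$ and $N_y = T \cdot MFR_y$:

$$C_P = \frac{C_O}{\sqrt{N_x N_y}} = \frac{C_O}{T \sqrt{MFR_x \, MFR_y}} \quad\Longrightarrow\quad C_O = C_P \, T \, \sqrt{MFR_x \, MFR_y}.$$

Under this reading, two links with the same normalized peak *CP* can differ in *CO* simply because their electrodes fire at different rates.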

To evaluate the performance of the classifier, we adopt a cross validation (CV) procedure. The original dataset is split into two complementary subsets used, respectively, for training and testing the classifier. Specifically, a standard 10-fold CV is carried out that consists in subdividing the original dataset into ten subsets, the training is performed on 9/10 of them and the accuracy (i.e., performance) of the linear SVM classifier is evaluated on the tenth. This procedure is repeated ten times by alternating the tested subset. The performance of the linear SVM is then quantified as the mean and standard deviation of the accuracies obtained by the 10-fold CV procedure.
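The 10-fold procedure can be sketched as follows. To keep the example self-contained, the classifier is a stand-in that always predicts the majority class of the training set; it is not a linear SVM, and the function names, the class sizes and the use of the sample standard deviation are assumptions.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Mean and standard deviation of the accuracy over k held-out folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Stand-in classifier (NOT a linear SVM): predicts the majority training class
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, feature_vector):
    return model

# Made-up class sizes: 80 discarded (label 0) and 20 retained (label 1) links
labels = [0] * 80 + [1] * 20
features = [[0.0] for _ in labels]
mean_acc, std_acc = cross_validate(features, labels, fit_majority, predict_majority)
```

With this baseline the mean accuracy equals the fraction of the larger class, 0.8 here, which is the chance level that the article's footnote defines as max(*nDIS*, *nRET*)/(*nDIS* + *nRET*). A real classifier would be plugged in through the `fit` and `predict` arguments.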

#### **3. RESULTS AND DISCUSSION**

#### **3.1. FUNCTIONAL CONNECTIVITY ESTIMATION FROM STRUCTURAL PRIOR**

The proposed approach was applied to the analysis of the two HD-MEA datasets described in Section 2.2. The structural connectivity of the network was first estimated by running the GHK algorithm for each *seed* electrode, i.e., for each electrode having at least one neuron in its recording area. Propagation time was chosen experimentally and set to *t* = 25, taking into account the extent of the domain, i.e., the 64 × 64 initial lattice defined on the MEA structure. The value provides a good trade-off between the capability of the system of exploring the graph and the introduction of spurious connections due to the extended contribution of neighboring features. For further details on the study of the time parameter the reader is referred to Ullo et al. (2013). The estimated SC graphs are shown in **Figure 5B** and both reflect the strong degree of connectivity of the networks (5570 and 7808 SC links were estimated for Chip-253 and Chip-250, respectively).

Functional connectivity graphs were computed for the two neuronal cultures using the cross-correlation algorithm, followed by spatio-temporal filtering and thresholding, as described in Section 2.4. The resulting FC maps are provided in **Figure 6A**, where functional links are color-coded based on the value of the cross-correlation peak. Both graphs, even after selecting only the statistically significant connections, present a substantial number of long-range links. As suggested in Section 2.5, electrodes without any neuron in their recording area were first removed from the functional graphs. This reduced the number of functional links by 22% in the case of Chip-253 and by 37% for Chip-250.

The proposed reweighting method was then applied to investigate which of the remaining functional links actually relied on a structural path. It should be noted that using the shortest path between pairs of electrodes favors shorter structural connections, which are more likely to be direct or, in general, morphologically plausible. Although this does not guarantee that the chosen path is the one actually active, we assume that, statistically, minimal paths are the most probable ones (Vincent et al., 2013). **Figure 7A** shows the distribution of functional weights with and without the graph refinement. A substantial decrease in the number of functional connections can be observed as a result of incorporating the structural information into the initial FC estimates. To better highlight this effect, weight vs. distance scatter plots are also provided in **Figure 7B**. The plots refer to the initial functional links obtained after spatio-temporal filtering and after removing the electrodes without neurons. Points are color-coded using a heat colormap based on the link's cross-correlation. It can be observed that higher correlations correspond to shorter paths and that an increase in the structural distance weakens the corresponding functional correlation. The red line represents the significance threshold *Cs* obtained from the statistical test. Although applying this threshold would discard 37.1% of connections for Chip-250 and 68.3% for Chip-253, the result still presents many long-range functional links.

Unfortunately, after the statistical significance test, there is no easy or intuitive way for neuroscientists to make a further distinction between relevant and spurious connections. More conservative estimates of functional connectivity are sometimes provided by ranking the estimated links according to their cross-correlation values and selecting the strongest *K* (e.g., *K* = 100, Maccione et al., 2012). However, the problem of determining a satisfactory value for *K* still remains. Ideally, we would like to threshold the FC graph in order to privilege short-range connections while penalizing long-range ones. At the same time, we want to allow the selection of links with substantial correlation even on long distances.

**FIGURE 6 | Functional connectivity graphs. (A)** Functional connectivity graphs estimated by computing cross-correlations on the pairwise electrophysiological signals and applying spatio-temporal filters and a significance threshold based on dithering. Functional links are color-coded based on the value of their correlation peak. **(B)** The same graphs obtained after removing electrodes without any neuron and reweighting based on the structural distance. The refined FC graph shows a better coherence with the network topology.

The proposed reweighting formula meets these requirements, as shown by the scatter plots of **Figure 7C**. Points are plotted according to their new weight but they maintain the initial color of **Figure 7B**. This highlights how the significance threshold applied to the reweighted graph can effectively discard functional links that are too distant, even when they have significant correlation. **Figure 6B** shows the refined functional graphs for the two networks under study obtained after applying the significance threshold *Ws*. Results present an overall decrease in the number of FC links by 86.5% for Chip-253 and 83.7% for Chip-250 with respect to the initial cross-correlation estimates. As opposed to the functional connectivity graph shown in **Figure 6A**, the introduced strategy automatically selects functional connections that are more coherent with the structural topology of the network. This can be observed most clearly in the case of Chip-253, where the culture presents a clear clustering into two subnetworks that are almost completely separated from each other. Nevertheless, the initial estimate of the FC graph, relying only on the electrophysiological signals, included a massive number of links connecting the two subnetworks. Thanks to the reweighting process we are able to filter out such FC links, retaining only those that are coherent with the structural prior or show a substantially strong correlation. The introduced reweighting formula penalizes correlation with distance while modulating the contribution of the structural prior, based on the amount of evidence on the functional co-activation. Thanks to the proposed formulation, lower correlations are more strongly penalized with distance (thus imposing a stronger structural prior), whereas higher correlations are less influenced by the neuronal displacement as the functional evidence prevails. This provides a more conservative estimate of functional connectivity, as compared to other weighting functions. For instance, in the case of a negative exponential function [i.e., *CP*(*x*, *y*) *e*<sup>−*dxy*</sup>], at a given distance, higher correlations would be more penalized than smaller ones. This would imply a substantial influence of the structural prior, even in the presence of strong evidence of functional co-activation. On the contrary, with our approach structural and functional information are combined preserving the contribution of clear functional observations without imposing too strong a structural prior. **Table 2** summarizes the quantitative results on the structural and functional analysis of the considered datasets.
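The difference between the two weighting functions can be checked numerically. The sketch below compares, at a fixed normalized distance, the fraction of the original correlation that each function retains; the function names and the sample values are illustrative assumptions.

```python
import math

def power_reweight(c, d):
    """Reweighting of Equation (5): C ** (1 + d)."""
    return c ** (1.0 + d)

def exponential_reweight(c, d):
    """Negative exponential alternative mentioned in the text: C * exp(-d)."""
    return c * math.exp(-d)

d = 0.5
for c in (0.9, 0.2):
    kept_power = power_reweight(c, d) / c
    kept_exp = exponential_reweight(c, d) / c
    print(f"C={c}: power keeps {kept_power:.3f}, exponential keeps {kept_exp:.3f}")
# C=0.9: power keeps 0.949, exponential keeps 0.607
# C=0.2: power keeps 0.447, exponential keeps 0.607
```

The exponential removes the same fraction of every correlation, so in absolute terms the strongest links lose the most. The power-law form retains the fraction *C*<sup>*d*</sup>, which grows with the correlation, so strong links are largely preserved while weak ones are suppressed.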

#### **3.2. RELEVANT FEATURES FOR FUNCTIONAL ANALYSIS**

We want to investigate whether the functional features of discarded/retained FC links can suggest new hypotheses on the way neurons functionally interact. To this purpose, we want to assess and compare the relevance of commonly used functional features for the classification of FC links belonging to the two classes (discarded or retained links, according to the structural prior). Some of these functional features are directly computed from the cross-correlation function. The cross-correlation peak *CP*, its time lag *C*<sub>τ</sub> and the spread *CH* are reported in **Figure 8A**. Then, as indicated in Section 2.6, *CO* is computed from *CP*, *MFRx*, and *MFRy*. To gain some insights into the potential discriminative power of each feature, we first compared their distributions across the two classes of discarded/retained connections. Results are shown in the box plots of **Figure 8B**. As intuitively expected, the distribution of correlation peaks *CP* is significantly different from one class to the other, as this feature is directly involved in the FC graph estimation. The features *CH* and *C*<sub>τ</sub> show smaller values in the retained dataset, indicating that the reweighting procedure was effective in selecting cross-correlation functions with reduced spreads and time lags. The latter result shows that the retained features actually correspond to more reliable (i.e., lower entropy) and more physiological (i.e., the peak is closer to the integration time of synaptic events) functional links. Then, subsets of the considered features were used to train and test the SVM classifier. The corresponding ranked accuracies (mean ± std on 10-fold cross-validation) are reported in **Figure 8C** and confirm that *CP* is the most significant feature (x-axis: 1–4). Interestingly, **Figure 8C** shows that when *CP* is removed from the tested features (x-axis: 5 onward) a reasonable level of accuracy can still be achieved by the linear SVM. This holds true for different combinations of features (**Figure 8C**, x-axis: 5–7 for Chip-250; x-axis: 5–9 for Chip-253), indicating that other features are also informative for discriminating retained from filtered links. Specifically, the mean firing rates (*MFRx*, *MFRy*) can be alternatively combined with the *CO*, *CH*, and *C*<sub>τ</sub> features, still yielding a good discriminative power. Finally, the computed accuracies reach a plateau (**Figure 8C**, x-axis: 8–17 for Chip-250; x-axis: 11–17 for Chip-253) that corresponds to the noise level of the classifier (i.e., the chance of a random classification). Indeed, based on **Table 2**, the noise level (*ACC*<sub>η</sub>) was computed analytically<sup>2</sup> and matched with the corresponding plateaus (Chip-250: *ACC*<sub>η</sub> = 74.2%; Chip-253: *ACC*<sub>η</sub> = 82.6%). The plateau region is characterized by single features as well as subgroups of features (e.g., *CO*, *C*<sub>τ</sub>, {*MFRx*, *MFRy*} and *CH*) that are ineffective for discriminating the retained from the discarded links. In conclusion, apart from *CP*, we found that different combinations of features can also be effective in discriminating the retained from the discarded links, thus motivating the development of alternative algorithms that incorporate this information to improve the detection of structurally-coherent functional connectivity maps.

#### **Table 2 | Quantitative structural/functional information.**

#### **3.3. GENERAL DISCUSSION AND PERSPECTIVES**

The study of the relationship between structure and function at the mesoscale, taking advantage of multielectrode arrays and fluorescence microscopy, has to face two important issues: (i) a limited optical resolution for the structural description, and (ii) the need to resolve single neurons from extracellular recordings on the functional side. Given these limitations in resolution, a central concept in our approach is to take advantage of the combination of *partial* descriptions of structure and function to generate a refined estimate of the network activity. Furthermore, to place our study in the best conditions to properly validate the proposed ideas, we adopt low-density cell cultures and high-density MEAs. This combination typically makes it possible to record a single unit from each electrode of the 4096-electrode array (this holds for about 90% of the electrodes). This setting minimizes the shared variance that would arise from recording many neurons from a single electrode. On the other hand, the issue of cross-talk that might arise from recording the same neuron from many nearby electrodes is minimized by the low-density culture condition and by the electrode density of the CMOS-MEA, which provides a small inter-electrode separation of 21 μm. Finally, we deliberately use low-density cell cultures because they make it possible to validate the proposed framework by identifying single neurons and estimating their connectivity within large neuronal networks. However, in principle, the basic concepts of the presented methodology might also be applied to denser cell cultures, to *ex vivo* brain tissue preparations or even to *in vivo* experimental studies on subsets of neural populations expressing fluorescent markers. This would be feasible upon the adoption of sufficiently high-resolution microscopy and recording techniques. For instance, having higher plating densities would imply a much larger number of structural connections. In this case, the problem complexity would lie in a correct and reliable encoding of the local neuritic architecture. As the proposed local directional features are capable of dealing with complex structures showing many crossing and branching neurites, it would be possible to apply the same feature-based analysis, despite the increase in the computational load. The heat kernel propagation could then be used to estimate the structural connectivity even in such denser neuronal preparations.

#### **4. CONCLUSIONS**

Although functional analysis at the mesoscale is typically carried out with coarse or absent structural information, thanks to the HD-MEA technology and to the proposed structural analysis, it was possible to move a step forward in relating network-scale functional and structural data at cellular resolution.

In this paper, we presented a computational framework capable of estimating structural and functional connectivity graphs from immunofluorescence images and electrophysiological recordings of *in vitro* neuronal networks cultured on HD-MEAs. As functional correlation and structural distance have been shown to be related both theoretically and in different experimental conditions (Hirase et al., 2001; Shlens et al., 2006; Kriener et al., 2009; Vincent et al., 2013), we introduced a reweighting strategy that refines correlation-based measures of functional connectivity using the acquired structural prior. Such refined estimates were then used to investigate the role of different functional features in actual neuronal interactions. Our analysis showed that the combination of structure and function yields reliable functional connectivity graphs that are more coherent with the network topology and, as a consequence, with the known distance-dependent neuronal behavior. The classification results also revealed how different combinations of features can be more informative than others when targeting the detection of correlated functional activities.

Thanks to the cellular resolution offered by the HD-MEA technology, the proposed approach made it possible, for the first time, to obtain a full characterization of the structural and functional connectivity at the mesoscale with single-cell granularity. This first attempt at combining structure and function at this level paves the way toward a deeper understanding of the low-level functions of complex circuits from which higher-level brain behaviors emerge.

Further investigation will target the analysis of more advanced reweighting techniques, based on a probabilistic modeling of the relationship between cross-correlation and structural distance or other relevant features of the structural graph. Complementary future work will address the analysis of dissociated networks with selective immunofluorescence staining to separate the contributions of inhibitory and excitatory subnetworks and study their structure-function interplay.

#### **REFERENCES**


<sup>2</sup>*ACC*<sup>η</sup> = *max*(*nDIS*, *nRET*)/(*nDIS* + *nRET*) with nDIS/nRET corresponding to the cardinalities of the discarded/retained data sets.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; accepted: 05 November 2014; published online: 20 November 2014.*

*Citation: Ullo S, Nieus TR, Sona D, Maccione A, Berdondini L and Murino V (2014) Functional connectivity estimation over large networks at cellular resolution based on electrophysiological recordings and structural prior. Front. Neuroanat. 8:137. doi: 10.3389/fnana.2014.00137*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Ullo, Nieus, Sona, Maccione, Berdondini and Murino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Axonal and dendritic density field estimation from incomplete single-slice neuronal reconstructions

#### *Jaap van Pelt <sup>1</sup> \*, Arjen van Ooyen <sup>1</sup> and Harry B. M. Uylings <sup>2</sup>*

*<sup>1</sup> Computational Neuroscience Group, Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands*

*<sup>2</sup> Department of Anatomy and Neuroscience, VU University Medical Center, Amsterdam, Netherlands*

#### *Edited by:*

*Patrik Krieger, Ruhr University, Germany*

#### *Reviewed by:*

*Artur Luczak, University of Lethbridge, Canada Gillian Queisser, Goethe Center for Scientific Computing, Germany*

#### *\*Correspondence:*

*Jaap van Pelt, Computational Neuroscience Group, Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, Netherlands e-mail: j.van.pelt@vu.nl*

Neuronal information processing in cortical networks critically depends on the organization of synaptic connectivity. Synaptic connections can form when axons and dendrites come in close proximity of each other. The spatial innervation of neuronal arborizations can be described by their axonal and dendritic density fields. Recently we showed that potential locations of synapses between neurons can be estimated from their overlapping axonal and dendritic density fields. However, deriving density fields from single-slice neuronal reconstructions is hampered by incompleteness because of cut branches. Here, we describe a method for recovering the lost axonal and dendritic mass. This so-called completion method is based on an estimation of the mass inside the slice and an extrapolation to the space outside the slice, assuming axial symmetry in the mass distribution. We validated the method using a set of neurons generated with our NETMORPH simulator. The model-generated neurons were artificially sliced and subsequently recovered by the completion method. Depending on slice thickness and arbor extent, branches that have lost their outside parents (orphan branches) may occur inside the slice. Not connected anymore to the contiguous structure of the sliced neuron, orphan branches result in an underestimation of neurite mass. For 300μm thick slices, however, the validation showed a full recovery of dendritic and an almost full recovery of axonal mass. The completion method was applied to three experimental data sets of reconstructed rat cortical L2/3 pyramidal neurons. The results showed that in 300μm thick slices intracortical axons lost about 50% and dendrites about 16% of their mass. The completion method can be applied to single-slice reconstructions as long as axial symmetry can be assumed in the mass distribution. This opens up the possibility of using incomplete neuronal reconstructions from open-access data bases to determine population mean mass density fields.

**Keywords: neuronal morphology, reconstruction, slices, density fields, cut branches, recovery**

#### **INTRODUCTION**

Cognition emerges from electrical activity dynamics in neuronal networks in the brain. These networks consist of a large number of neurons from a multitude of cell types connected to each other via synapses. Neurons innervate space through their axonal and dendritic arborizations and synapses may be formed when axonal and dendritic arbors are sufficiently close in space (Peters, 1979; Mishchenko et al., 2010). The shape of neuronal arborizations is therefore a crucial determinant of synaptic connectivity in the brain. Neurons show a large variability in shape but maintain characteristics typical for their cell type. The distribution of dendritic and axonal mass in space can be described as a mass density field, indicating at each location in space the amount of dendritic and axonal mass (in terms of length or volume). Averaging these density fields over a population of neurons gives a statistical representation of how this cell type distributes its mass in space.

Potential locations for synaptic connections between neurons can be found by searching all spatial locations where axonal and dendritic branches are sufficiently close to each other (Van Pelt et al., 2010). In a recent study we showed that the number of potential locations can also be derived from the overlap between axonal and dendritic density fields (Van Pelt and Van Ooyen, 2013). Thus, for creating neuronal networks with neurons at different locations in space and originating from a variety of cell types, knowledge about their population mean density fields is sufficient to estimate the number of potential synapse locations between these neurons. Note, however, that for estimating actual connectivity additional knowledge is required of the probability that a synapse will develop at a potential location (e.g., Mishchenko et al., 2010). Thanks to open-access data bases neuronal reconstructions from a large number of cell types have now become widely available [e.g., NeuroMorpho.org (Ascoli, 2006) and SenseLab (Shepherd et al., 1997)]. These data could, in principle, be used for calculating population mean density fields for each cell type represented in the data base. However, many of the reconstructions originate from stained neurons in single slices with thicknesses up to about 300μm. With axonal and dendritic arborizations extending beyond the spatial boundaries of single slices, all these reconstructions have cut endings and are thus incomplete (**Figure 1**). This incompleteness hampers the use of these reconstructions for density field estimations.

In the present study a method is developed to estimate the lost axonal and dendritic parts by extrapolating densities calculated from the observed parts inside the slices to the space outside the slices, assuming axial symmetry in the axonal and dendritic mass distribution. This so-called completion procedure is described in the section Materials and Methods and applied to three different data sets of rat cortical layer 2/3 pyramidal neurons for which axial symmetry can be assumed. A vital part of the study is the validation of the method. To this end, 50 neurons generated with our NETMORPH simulator (Koene et al., 2009) were artificially sliced with varying slice thicknesses, and subsequently subjected to the completion procedure. Comparison of the original mass distributions with the sliced ones and the completed ones showed the level of recovery obtained.

For slices with a thickness of 300μm or more, the validation study showed that the completion procedure resulted in an (almost) full recovery of the dendritic and axonal mass density fields from the incompletely reconstructed neurons. An important aspect in the recovery of cut arborizations is the occurrence of orphan branches, i.e., branches which have lost their parents located outside the slice. These orphan branches are no longer part of the contiguous reconstructed structure and thus result in an underestimate of the mass of the arborization inside the slice, which cannot be recovered by the completion method. This especially occurred for larger arborizations (axons) in thin slices of 100 or 200μm thickness.

#### **MATERIALS AND METHODS**

#### **MASS DENSITIES IN 3D SPACE**

The completion method is based on the calculation of the neuronal mass densities inside the slices and extrapolating these densities to the space outside the slices. For this extrapolation it is assumed that the mass densities of the neuronal arborizations are axially symmetric (**Figure 2A**). This is a plausible assumption for pyramidal neurons, which have the apical main shaft as symmetry axis. Using an axial-radial coordinate system, the neuronal mass is determined for a given height and a given radius, thus summing all the mass in a ring as is illustrated in **Figure 2A**. For a fully intact neuron, the radius of the integration ring ranges from zero up to the maximal radial extension of the arborizations. The total mass distribution of a neuron becomes a function of height and radius. Evidently, in the case of pyramidal neurons, these distributions can also be obtained separately for axons and (apical and basal) dendrites. For a sliced neuron, its radial extension in the XY direction of the slice can become as large as in the intact neuron, but in the Z-direction it is limited by the boundary planes of the slice. Also, the integration ring will be complete for small radii but becomes incomplete at one or two sides for larger radii. In the latter case the ring is reduced to two separate ring segments (**Figure 2B**). The estimation of the mass densities is then restricted to the remaining parts of the integration rings.

**neuron extending its apical (dark blue) and basal (red) dendritic branches in the XY direction of the slice, and (B) a side view of the slice with the neuron having many branches cut in the Z-direction of the slice.** The cut endings are indicated by colored dots with the lost branches indicated in light blue. Axonal branches are not shown in these cartoon images.

#### **METHOD FOR ESTIMATING MASS DENSITIES OUTSIDE THE SLICES**

With the assumption of axial symmetry, the densities in the fractional parts of the integration rings inside the slice can be extrapolated to their complementary parts outside the slice. The volume of the integration ring inside and outside the slice is fully determined by the position of its center (i.e., the symmetry axis), the radius of the integration ring, and the thickness of the slice. Three situations can be distinguished. The ring can be fully contained in the slice, it can extend beyond the slice on one side, or it can extend beyond the slice on both sides (**Figures 2C–E**). The neuronal masses can only be obtained for the volume fraction of the ring inside the slice. For an integration ring with radius *r* (pointing to the center of the ring), the volume fraction *F*(*r*) of the ring area (or volume) within the slice can be expressed in terms of the thickness of the slice (*T*), the position of the ring center within the slice (*H*) and the radius of the ring (*r*).


A. Ring fully contained in the slice:

$$H > r \quad \text{and} \quad T - H > r.$$

$$\text{Ring volume fraction: } F(r) = 1.$$

B. Ring extends on one side of the slice:

$$H > r \quad \text{and} \quad T - H \le r, \quad \text{with} \quad \varphi = \arccos\left(\frac{T - H}{r}\right).$$

$$\text{Ring volume fraction: } F(r) = \frac{\beta r \delta r}{2\pi r \delta r} = \frac{\beta}{2\pi} = \frac{2\pi - 2\varphi}{2\pi} = \frac{\pi - \arccos\left(\frac{T - H}{r}\right)}{\pi}.$$

C. Ring extends on both sides of the slice:

$$H \le r \quad \text{and} \quad T - H \le r, \quad \text{with} \quad \alpha = \arccos\left(\frac{H}{r}\right) \quad \text{and} \quad \gamma = \arccos\left(\frac{T - H}{r}\right).$$

$$\text{Ring volume fraction: } F(r) = \frac{2\beta r \delta r}{2\pi r \delta r} = \frac{\beta}{\pi} = \frac{\pi - \alpha - \gamma}{\pi} = \frac{\pi - \arccos\left(\frac{H}{r}\right) - \arccos\left(\frac{T - H}{r}\right)}{\pi}.$$
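As an illustration, the cases above can be folded into a single expression: taking α = 0 when the ring does not reach the boundary at distance *H*, and γ = 0 when it does not reach the boundary at distance *T* − *H*, gives *F*(*r*) = (π − α − γ)/π throughout. The following Python sketch is a minimal rendering under that reading. The function name `ring_volume_fraction` and its argument names are hypothetical and are not taken from the authors' software. The sketch also covers the mirror image of the one-sided case (a ring crossing only the boundary at distance *H*), which follows by symmetry.

```python
import math


def ring_volume_fraction(r, H, T):
    """Fraction F(r) of an integration ring of radius r lying inside the slice.

    H is the distance from the ring centre to one boundary plane and T is the
    slice thickness, so T - H is the distance to the opposite boundary plane.
    """
    if not 0.0 <= H <= T:
        raise ValueError("ring centre must lie inside the slice")
    if r <= 0.0:
        return 1.0  # a degenerate ring on the axis is always inside
    # Half-angle of the arc cut off beyond each boundary (0 if not reached).
    alpha = math.acos(H / r) if H <= r else 0.0
    gamma = math.acos((T - H) / r) if (T - H) <= r else 0.0
    return (math.pi - alpha - gamma) / math.pi
```

For a ring centered midway in a 100μm thick slice (*H* = 50, *T* = 100), a radius of 100μm gives α = γ = π/3 and hence *F* = 1/3, so only one third of the ring is observed.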

symmetry is assumed. The distribution of mass is obtained by summing all the mass per integration ring (yellow-orange ring) at a given axial position (height) and with a given radius. The final distribution is given as a function of radius and height. **(B)** Within the restricted space of a slice, the integration ring becomes fractionated when it exceeds the slice boundary. **(C–E)** Three different conditions for the integration ring in a slice of thickness *T*. The integration ring has its center at a distance *H* from

and a thickness δ*r*. It can **(C)** be fully contained in the slice, **(D)** extend the slice on one of its sides, or **(E,B)** extend on both of its sides. **(F)** Effect of slicing of a neuron leaving a contiguous structure within the slice (dark blue), lost branches outside the slice (red) and orphan branches within the slice (light blue and marked in dashed ellipses). These orphan branches have lost their parent branches and therefore their connection to the dark blue contiguous structure.

An estimate of the mass in a full ring *M*<sub>ring</sub>(*r*, *h*, δ*r*, δ*h*) with radius *r* and at height *h* and with radial thickness δ*r* and height δ*h* can be obtained by dividing the experimentally observed neuronal mass *M*<sub>obs</sub>(*r*, *h*, δ*r*, δ*h*) in the part of the ring inside the slice by the ring volume fraction. Thus,

$$M_{\rm ring}(r, h, \delta r, \delta h) \approx \frac{M_{\rm obs}(r, h, \delta r, \delta h)}{F(r)}.$$

For the mass density in the ring *D*<sub>ring</sub>(*r*, *h*) we now obtain

$$D_{\rm ring}(r,h) = \frac{M_{\rm ring}(r,h,\delta r,\delta h)}{V_{\rm ring}} = \frac{M_{\rm ring}(r,h,\delta r,\delta h)}{2\pi r \, \delta r \, \delta h}$$

with *V*<sub>ring</sub> the volume of the full ring. In our analysis, integration rings with a radial thickness of δ*r* = 1μm and height δ*h* = 1μm are used.
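The two expressions above can be sketched in a few lines of Python. This is an illustrative outline only: the input layout (a list of sample points, each carrying a piece of neurite length, with the soma at the origin and the symmetry axis along Y), the function names, and the choice to evaluate *F*(*r*) at the midpoint radius of each bin are all assumptions made for the example. The ring volume fraction is passed in as a callable `fraction_of`.

```python
import math
from collections import defaultdict


def bin_mass_in_rings(samples, dr=1.0, dh=1.0):
    """Sum observed neurite length per (radial bin, height bin).

    samples: iterable of (x, y, z, length) tuples, with the soma at the
    origin and the symmetry axis along Y (assumed layout, not a file format).
    """
    observed = defaultdict(float)
    for x, y, z, length in samples:
        r = math.hypot(x, z)  # radial distance to the Y-axis
        key = (math.floor(r / dr), math.floor(y / dh))
        observed[key] += length
    return dict(observed)


def ring_densities(observed, fraction_of, dr=1.0, dh=1.0):
    """Completed density per ring: D = (M_obs / F(r)) / (2*pi*r*dr*dh)."""
    densities = {}
    for (i, j), m_obs in observed.items():
        r_mid = (i + 0.5) * dr  # assumption: evaluate at the bin midpoint
        m_ring = m_obs / fraction_of(r_mid)  # extrapolate to the full ring
        densities[(i, j)] = m_ring / (2.0 * math.pi * r_mid * dr * dh)
    return densities
```

With δ*r* = δ*h* = 1μm, an observed length of 2μm in a ring of midpoint radius 3.5μm that lies half inside the slice (*F* = 0.5) is completed to 4μm, which corresponds to a density of 4/(2π · 3.5) μm per μm³.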

#### **ORPHAN BRANCHES**

Neuronal arborizations may extend beyond the boundaries of a slice but more distal parts may bend back into the slice (**Figure 2F**). Such branches have lost their parent branches and are called orphan branches. Orphan branches are no longer connected with the proximal parts of the arborization inside the slice. As the reconstruction procedure quantifies only the contiguous parts of the arborization inside the slice, the orphan branches are lost. In the calculation of the mass densities from the contiguous part inside the slice, this then results in an underestimation of the neuronal mass. As the completion procedure extrapolates from this neuronal mass, it does not solve the problem of lost orphan branches.
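For model-generated neurons, where the full tree is known, the distinction between contiguous, orphan, and lost parts can be made explicit. The sketch below is a simplified, node-based illustration and not the authors' procedure: it assumes the tree is given as a hypothetical `parent` mapping (the root maps to `None`) plus a Z-coordinate per node, and it ignores the partial segments that straddle a boundary plane. A node belongs to the contiguous structure only if it and all of its ancestors lie inside the slice.

```python
def classify_nodes(parent, z, z_min, z_max):
    """Split tree nodes into contiguous, orphan and lost sets after slicing.

    parent: dict node -> parent node (the root maps to None)
    z:      dict node -> Z-coordinate
    """
    inside = {n: z_min <= z[n] <= z_max for n in z}
    connected = {}  # node -> True if node and all its ancestors are inside
    for node in z:
        # Walk towards the root, collecting nodes with unknown status.
        path = []
        cur = node
        while cur is not None and cur not in connected:
            path.append(cur)
            cur = parent[cur]
        ok = True if cur is None else connected[cur]
        # Propagate the status back down from the topmost unknown node.
        for n in reversed(path):
            ok = ok and inside[n]
            connected[n] = ok
    contiguous = {n for n in z if connected[n]}
    orphan = {n for n in z if inside[n] and not connected[n]}
    lost = {n for n in z if not inside[n]}
    return contiguous, orphan, lost
```

In a slice spanning Z = −50 to 50μm, a branch that runs 0 → 40 → 60 → 45μm leaves the slice at the third node and re-enters at the fourth; the fourth node is then classified as orphan because its parent is lost.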

#### **SLICE THICKNESS AND POSITION OF NEURONS INSIDE A SLICE**

For the completion procedure it is important to know the thickness of the slice and the position of the cell body with respect to both boundary planes of the slice (Z-axis). Although the thickness of the slices is provided in most reconstructions, this is not the case for the cell's position in the slice, so that we had to estimate the cell's position from the reconstructions themselves. For this estimation we used the fact that all cut terminal tips have approximately the same Z-coordinate. To this end we calculated the Z-coordinates of all terminal tips of both axons and dendrites in the reconstruction and obtained a frequency distribution along the Z-axis. When the distribution showed a sudden increase in frequency (number of tips) at one or both of its ends, this was interpreted as the effect of cutting. When a frequency increase was observed at the low-Z end of the distribution, the cut part of the cell's arborization was aligned at the low-Z end of the slice. When a frequency increase was observed at the high-Z end of the distribution, the cut part of the cell's arborization was aligned at the high-Z end of the slice. Finally, when a frequency increase was observed at both sides of the distribution, it indicated the actual thickness of the slice. The thickness information and the estimated cell positions were used in the completion procedure.
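The tip-based estimate can be illustrated with a simple histogram heuristic. The text does not quantify what counts as a sudden increase in tip frequency, so the criterion used here (an end bin holding at least `factor` times the mean count of the interior bins) and the default values of `bin_width` and `factor` are assumptions chosen purely for illustration, as is the function name.

```python
import math


def detect_cut_ends(tip_z, bin_width=10.0, factor=3.0):
    """Flag the low-Z and high-Z ends of a tip distribution as cut or not.

    bin_width and factor are illustrative choices; the source text does not
    specify a numerical criterion for a "sudden increase" in tip frequency.
    """
    z_lo, z_hi = min(tip_z), max(tip_z)
    n_bins = max(1, math.ceil((z_hi - z_lo) / bin_width))
    counts = [0] * n_bins
    for z in tip_z:
        i = min(int((z - z_lo) / bin_width), n_bins - 1)
        counts[i] += 1
    if n_bins < 3:
        return False, False, counts  # too few bins to judge
    interior = counts[1:-1]
    baseline = max(sum(interior) / len(interior), 1.0)
    threshold = factor * baseline
    return counts[0] >= threshold, counts[-1] >= threshold, counts
```

If only the low-Z end is flagged, the lowest tip coordinate would be aligned with the low-Z boundary of the slice; if both ends are flagged, the distance between the two accumulations gives an estimate of the slice thickness, as described above.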

#### **POPULATION AVERAGES OF THE MASS DENSITIES**

The mass density completion procedure was applied to the arborizations of individual neurons because the procedure depends on the position of the individual neuron inside the slice, which varies from neuron to neuron. To obtain a population average of the estimated mass densities, the cells were aligned by their somata and all the cells were rotated in such a way that their apical main stem was pointing into the Y-direction.
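Once every cell has been completed individually and brought into the common frame, the population average reduces to a bin-wise mean. The sketch below assumes, for illustration only, that each cell's density field is stored as a dictionary keyed by (radial bin, height bin); bins absent from a cell are treated as zero density.

```python
from collections import defaultdict


def population_mean(fields):
    """Bin-wise mean of per-cell density fields (missing bins count as zero)."""
    n_cells = len(fields)
    total = defaultdict(float)
    for field in fields:
        for key, value in field.items():
            total[key] += value
    return {key: value / n_cells for key, value in total.items()}
```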

#### **ALIGNMENT PROCEDURE**

For the alignment of the apical dendrite, the orientation of the apical main stem was needed. As the data formats in the reconstruction files generally do not distinguish between apical main stem, apical tuft, and oblique branches, this information needed to be derived from the supplied data. To this end, an iterative procedure was applied of pruning terminal line pieces off the apical dendrite, until it was reduced to a single segment with one terminal line piece. The tip coordinate of this terminal line piece together with the exit coordinate from the soma of the apical dendrite provided the alignment line piece with which to rotate the whole cell in the XY plane (i.e., around the Z-axis) in such a way that the projection of the alignment line piece onto the XY plane was pointing into the Y-direction. Alternatively, a rotation in the XYZ space could have been applied so that the alignment line pieces were all pointing into the Y-direction. However, such a rotation would also have changed the Z-coordinates of the terminal tips in the arborizations and would have hampered the use of tip coordinates for estimating the cell's position in the slice. As a consequence, the orientation of the alignment segment could maintain a slightly tilted angle with respect to the Y-axis, while the Y-axis itself was taken as symmetry axis for the calculation of the mass distribution. Given the wide spread of arborization mass, the effect of a possibly slightly tilted orientation on the final estimated mass distribution was assumed to be negligible.
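The in-plane rotation can be sketched as follows. The example assumes that the soma exit point and the tip of the pruned apical stem are already known, and it rotates all points about the soma exit point; the function name and the choice of rotation center are illustrative assumptions. Z-coordinates are left untouched, which is the property the procedure relies on for estimating the cell's position in the slice.

```python
import math


def align_to_y(points, soma_exit, stem_tip):
    """Rotate points about the Z-axis so the stem's XY projection points to +Y.

    points:    list of (x, y, z) tuples
    soma_exit: (x, y, z) where the apical dendrite leaves the soma
    stem_tip:  (x, y, z) tip of the pruned apical main stem
    """
    vx = stem_tip[0] - soma_exit[0]
    vy = stem_tip[1] - soma_exit[1]
    phi = math.atan2(vx, vy)  # counter-clockwise angle that maps (vx, vy) onto +Y
    c, s = math.cos(phi), math.sin(phi)
    ox, oy = soma_exit[0], soma_exit[1]
    rotated = []
    for x, y, z in points:
        dx, dy = x - ox, y - oy
        rotated.append((ox + c * dx - s * dy, oy + s * dx + c * dy, z))
    return rotated
```

A stem running from (0, 0, 0) to (3, 0, 5) is mapped onto (0, 3, 5): its projection now points along Y, while its Z-component, and hence its residual tilt out of the XY plane, is preserved.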

#### **VALIDATION**

Crucial for the completion procedure is its validation, i.e., whether the masses of the completed neurons are equal to those of the original non-sliced neurons. Such a validation is not possible for experimental data sets of sliced neurons, but can be done for a set of model-generated neurons. For this validation, we used a set of 50 neurons, generated with our NETMORPH simulator (Koene et al., 2009). These model neurons were subsequently sliced according to several slice thicknesses, followed by the density field completion procedure. The masses of the original and of the completed neurons were subsequently compared. Because in this case the mass of orphan branches was also known, the comparisons were made from completions with and without inclusion of orphan branches.

#### **DATA SETS**

#### *NETMORPH-generated neurons*

The data set of neuronal arborizations used for validating the mass completion procedure was obtained with our simulator NETMORPH (Koene et al., 2009). Fifty random neuron morphologies were generated with growth parameters optimized on a set of rat cortical L2/3 pyramidal neurons, reconstructed by Svoboda (Shepherd and Svoboda, 2005) and made available by the NeuroMorpho.org database (Ascoli, 2006). This same data set was also used and described in an earlier study (Van Pelt and Van Ooyen, 2013).

#### *Svoboda data set*

This dataset consists of 11 young adult (25–36 days PN) Sprague Dawley rat somatosensory barrel cortex L2/3 pyramidal neurons, reconstructed by Svoboda (Shepherd and Svoboda, 2005) from 300μm thick slices, and made available by the NeuroMorpho.org database (Ascoli, 2006).

#### *Markram data set*

This dataset consists of 33 young (13–15 days PN) Wistar rat somatosensory cortex L2/3 pyramidal neurons, reconstructed by Wang et al. (2002) from 300μm thick slices, and made available by the NeuroMorpho.org database (Ascoli, 2006).

#### *Parnavelas-Uylings data set*

This dataset originates from a study on basal dendritic development in female Sprague-Dawley rat visual cortex pyramidal and non-pyramidal neurons (Parnavelas and Uylings, 1980; Uylings et al., 1994). Reconstructions of 153 Golgi stained pyramidal dendrites from slices with a thickness of about 120μm were obtained from layer 2/3 at different ages of postnatal cortical development (10, 14, 18, 24, 30, and 90 days PN).

#### **RESULTS**

#### **VALIDATION—DATA NETMORPH**

For one of the NETMORPH-generated neurons an example is given in **Figure 3** to demonstrate the impact of slicing on the remaining morphology within the slice.

To validate the completion procedure, the 50 NETMORPH-generated neurons were artificially sliced with slice thicknesses of 100, 200, 300, 400, 500, 600, 1000, or 2000μm. First, the neurons were aligned with their apical main stem pointing into the Y-direction, and with the slicing perpendicular to the Z-axis. For the position of the neurons in the slice, two choices could be made. With the somata positioned in the center of the XY plane of the slice, the Z-coordinate was either put in the center of the slice (thus with equal distances to both cutting sides of the slice), or was selected uniformly at random within an 80% Z-range (between 10 and 90% of the slice thickness) or within a 60% Z-range (between 20 and 80% of the slice thickness). After slicing, the axial-radial mass distribution was obtained for each individual neuron followed by the completion procedure, which was applied excluding or including orphan branches. The results of the completion procedure were finally summed and averaged over all the NETMORPH neurons in the data set.
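The artificial slicing step can be illustrated with a short Python sketch. It is not the code used in the study: the function names are hypothetical, neurites are assumed to be given as straight line segments, and `clipped_length` returns all length inside the slab, which corresponds to the case in which orphan branches are included. `soma_offset` draws the distance of the soma to the lower boundary, either centrally or uniformly at random within the central fraction `z_range` of the slice thickness.

```python
import math
import random


def clipped_length(p, q, z_min, z_max):
    """Length of the straight segment p-q lying between z_min and z_max."""
    length = math.dist(p, q)
    dz = q[2] - p[2]
    if dz == 0.0:
        return length if z_min <= p[2] <= z_max else 0.0
    # Parameter values (0 at p, 1 at q) where the segment meets each plane.
    t_a = (z_min - p[2]) / dz
    t_b = (z_max - p[2]) / dz
    t_lo = max(0.0, min(t_a, t_b))
    t_hi = min(1.0, max(t_a, t_b))
    return length * max(0.0, t_hi - t_lo)


def soma_offset(thickness, z_range=None, rng=random):
    """Distance of the soma to the lower boundary plane of the slice.

    z_range=None gives the central position; z_range=0.8 draws uniformly
    between 10 and 90% of the thickness, z_range=0.6 between 20 and 80%.
    """
    if z_range is None:
        return 0.5 * thickness
    margin = 0.5 * (1.0 - z_range) * thickness
    return rng.uniform(margin, thickness - margin)
```

For a 300μm thick slice and an 80% range, the soma is never closer than about 30μm to a cutting plane.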

The results for a slice thickness of 200μm, 80% range random Z-coordinates of the somata, and ignoring orphan branches are shown in **Figure 4**. **Figure 4A** displays the axial-radial mean dendritic mass distribution, in the top part for several axial positions (summed per 50μm height steps) and in the bottom part summed over all heights. The dashed histograms present the masses of the sliced dendrites, while the solid lines present the completed dendritic mass distributions. Both distributions coincide at small radial distances from the axis, but the completed ones become increasingly higher at larger radial distances. Evidently this is caused by the larger correction factors at higher radial distances. The distribution for the non-sliced original full dendrites is also included in the bottom panel of **Figure 4A** as a red solid line. The total mass of 5264μm of the dashed (sliced) distribution deviated 19.7% from the original dendritic mass of 6552μm, while the completion procedure resulted in a mass of 6399μm, deviating only 2.3% from the original mass (**Table 1**). **Figure 4B** displays the mean mass distribution as a function of the radial distance to the soma of the sliced dendrites (dashed), of the completed ones (black solid line), and of the original non-sliced dendrites (red solid line). This distribution is related to a 3D Sholl diagram which counts the number of intersections with a set of concentric spheres (in practice, however, the Sholl method is most frequently applied to 2D projections, see Uylings and Van Pelt (2002) for a discussion on Sholl diagrams). The tail in both distributions originates from the apical dendrites. **Figure 4C** shows the positions of the somata in the Z-direction of the slice (filled circles) as well as the distribution of Z-coordinates of the terminal tips per neuron. The effect of cutting is clearly seen in the accumulation of terminal tips at the boundary layers of the slice. 
**Figures 4D–F** show the findings for the mean axonal mass distribution. Note that panel 4D extends over larger axial (Y-axis) and radial distances than panel 4A for the dendritic mass distribution. The summed radial distribution shows a substantial difference between the sliced axonal distribution (dashed) and the completed one (black solid line), with a mean total axonal mass per neuron of 4239μm for the sliced axon and of 8531μm for the completed mass, while the original axon (red solid line) had a total length of 10655μm. Thus, in 200μm thick slices, 60.2% of the original axonal mass was lost, while the completion procedure reduced the axonal loss to 19.9% (**Table 1**). Comparison of the black and red solid lines in the bottom panel of **Figure 4D** clearly shows the level of axonal recovery obtained in the case of 200μm thick slices. The difference in the level of recovery between dendrites and axons appears to be related to the fraction of conserved mass in the slice (see the results shown in **Figure 5E**).

their respective radial distributions, which are scaled according to the numbers in the upper right corner in each bin. Thus the mass distribution at a Y-position of -50μm has a maximum mass of 330μm per radial bin of 10μm. The bottom graph in **(A)** shows the radial dendritic mass distribution summed over all axial Y-positions. The black curve for the completed mass distribution has a strong overlap with the red curve for the full non-sliced dendrites. **(B)** Distribution of dendritic mass as a function of the radial distance to the soma (this distribution is similar to a of the dendritic tip Z-coordinates per neuron. With a binning of 10μm each frequency per bin is plotted as a horizontal line piece centered in the bin and symmetrical around the vertical position axis. The tips of these line pieces are subsequently connected to each other. The frequency scale is indicated by the small bar left underneath the panel. The plot clearly shows how the number of tips (length of the horizontal line piece) can accumulate at the boundary planes, indicating the presence of cut endings. Panels **(D–F)** show similar results for the axonal arborizations.
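The loss percentages quoted for the 200μm thick slices follow directly from the mean lengths given in the text, with loss defined as the missing fraction of the original length. The short Python check below reproduces them; the function name is ours and is used only for illustration.

```python
def mass_loss_percent(original, remaining):
    """Percentage of the original mass that is missing."""
    return 100.0 * (original - remaining) / original


# Mean lengths (in micrometers) quoted in the text for 200 um thick slices.
dendrite_sliced = round(mass_loss_percent(6552, 5264), 1)     # 19.7
dendrite_completed = round(mass_loss_percent(6552, 6399), 1)  # 2.3
axon_sliced = round(mass_loss_percent(10655, 4239), 1)        # 60.2
axon_completed = round(mass_loss_percent(10655, 8531), 1)     # 19.9
print(dendrite_sliced, dendrite_completed, axon_sliced, axon_completed)
```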

**Table 1** summarizes the effects of slicing on the dendrites and axons of 50 NETMORPH-generated neurons, depending on slice thickness and soma positions in the slice. Also shown are the results of the completion algorithm, with and without taking the orphan branches into account. The original mean (apical and basal) dendritic and axonal lengths are 6552 and 10655μm, respectively.

As expected, slicing has a significant effect on the length of axons and dendrites. For instance, centrally located neurons in 100μm thick slices lose 34.5% of their dendritic length and 85.9% of their axonal length; in 200μm thick slices the neurons lose 12.0% of their dendritic length and 61.1% of their axonal length; and in 300μm thick slices they lose 4.0% of their dendritic length and 37.3% of their axonal length. A significant recovery of the lost parts was obtained by applying the completion procedure. For instance, when orphan branches were not included, dendritic loss in 100μm thick slices was reduced from 34.5% down to 7.4%, and axonal loss from 85.9 to 61.1%. When orphan branches were included, dendritic loss was further reduced to 4.5% and axonal loss to 7.0%. In 300μm thick slices, dendritic recovery of centrally located neurons excluding orphan branches resulted in 0% dendritic loss and 7.9% axonal loss, and including orphan branches in 0% dendritic loss and 0.8% axonal loss.

*Results are shown for 50 sliced NETMORPH neurons for several slice thicknesses, with central or random positions of somata within the slices, and excluding or including orphan branches. The original neurons had a mean dendritic length of 6552*μ*m and a mean axonal length of 10655*μ*m.*

**FIGURE 5 | (A,B)** Plots of mass loss (%) vs. slice thickness of NETMORPH generated neurons centrally located in the artificial slice. (**A**, solid green curve) Loss of axonal mass by slicing. (**A**, dashed green curve) Remaining axonal mass loss after completion without orphan branches. (**A**, dotted green curve) Remaining axonal mass loss after completion including orphan branches. (**B**, solid red curve) Loss of dendritic mass by slicing of NETMORPH generated neurons centrally located in the slice. (**B**, dashed red curve) Remaining dendritic mass loss after completion without orphan branches. (**B**, dotted red curve) Remaining dendritic mass loss after completion including orphan branches. **(C,D)** Plots of mass loss by slicing and recovery by completion vs. the mean radial center-of-mass of the sliced axonal and dendritic arborizations. The 50 NETMORPH generated neurons were artificially sliced by 300μm thick slices. The neuronal somata were located uniformly at random within an 80% range of the slice thickness. The original 50 NETMORPH generated neurons had a mean dendritic radial center of mass at 70μm and a mean axonal radial center of mass at 204μm. **(C)** Scatterplot of the mass loss of the individual sliced axonal (green) and dendritic (red) arborizations. **(D)** Scatterplot of the recovery by completion of the individual sliced axonal (green) and dendritic (red) arborizations. **(E)** Plot of recovery values vs. the mass conserved in the slices. The data points (excluding orphan branches) are labeled with the slice thickness (in μm). The positioning of the somata in the slices is indicated by circles for the central position, by squares for the 80% range and by drops for the 60% range random positioning. Axonal data are plotted in green and dendritic data in red. Overlap of data points is indicated by a star.

When the neurons were randomly placed in the slice, the losses were slightly higher for the dendrites, but more or less similar for the axons because the larger extent of the axons makes them less sensitive to the precise position in the slice. Also with random placement the completion procedure was able to greatly reduce the loss. For instance, in the case of a Z-range of 10–90%, the dendritic loss of 41.1% in 100μm thick slices was reduced by completion without orphan branches to 9.3%, and with orphan branches to 3.1%. Axonal loss of 86.1% was reduced by completion without orphan branches to 61.3%, and with orphan branches to 3.5%.

The results in **Table 1** show that the completion procedure that takes orphan branches into account is able to fully recover the original mass, with loss values around zero. Small negative loss values also occur, indicating an overcompensation, which can be expected when, as a result of statistical fluctuations in the spatial distribution of the arbors, the mass densities inside the slice are somewhat larger than outside the slice. The full recovery of dendritic and axonal mass when orphan branches are included can be considered as a validation of the completion procedure. In the absence of knowledge about orphan branches, which is usually the case, the completion procedure is still able to recover dendritic mass to values very close to the original mass, with a deviation of less than 2.3% in the case of 200μm thick slices, and less than 1.6% in 300μm thick slices.

Intracortical axons extend their branches at large distances from the slice and their loss in 100μm and 200μm thick slices is substantial (up to 85.9 and 61.1%, respectively). Although the completion procedure significantly reduces these losses (to 61.1% in 100μm thick slices, to 25.6% in 200μm thick slices, and to 7.9% in 300μm thick slices), knowledge of orphan branches is required for a full recovery.

The loss of dendritic and axonal mass by slicing depends on the spatial extent of dendritic and axonal arbors in relation to the slice thickness. The NETMORPH neurons have a mean radial center of dendritic mass at 70μm, and a mean radial center of axonal mass at 204μm. The dependence of mass loss on slice thickness, as well as the results from the completion procedure, is depicted in **Figure 5**, which displays the mass loss of neurons with their somata in the center of the slice. The figure clearly shows that for 300μm thick slices, completion without orphan branches results in a full recovery of the dendrites (dashed red curve in 5A), with a loss of 0% (**Table 1**), and an almost full recovery of the axons (dashed green curve in 5B), with a loss of 7.9% (**Table 1**).

**Figure 5C** shows a scatter plot of the mass losses of the axonal and dendritic arborizations of each individual neuron vs. the radial center-of-mass of these arborizations after slicing. These data were obtained for neurons with their somata positioned uniformly at random within an 80% range of the slice thickness of 300μm. Thus the closest distance of a soma to the cutting plane is 30μm. **Figure 5D** shows a scatter plot of the recovery results vs. the radial center-of-mass of the individual axons and dendrites within the slice. **Figures 5C,D** reveal a large scatter in the individual data points, originating from the large variability in morphologies of the axonal and dendritic arborizations and the effect of cutting. The larger extent of axonal arborizations in comparison with dendritic arborizations (as reflected by the larger values for their radial center-of-masses) also causes larger values for their mass losses by slicing, as shown in **Figure 5C**. Nevertheless, for these 300μm thick slices, there is almost full recovery of both axons (95.7%) and dendrites (98.3%). Note that the axonal recovery is better than shown in **Figure 5A** because of the different positions of the somata in the slices. The uniform random positions of somata are more realistic than a central positioning. The findings in **Figure 5** thus show that the completion procedure results in (almost) full recovery of the mass distributions of axonal and dendritic arborizations when they were cut by 300μm thick slices. The amount of scatter in the data points also appears to depend on the locations of the somata in the slices. In Supplementary Figure S5, the results are also shown for centrally located neurons, and for uniform random placements in a 60% range of the 300μm thick slices.

Whether recovery results are related to the fraction of the original neuronal mass conserved in the slice is shown in **Figure 5E** using the data in **Table 1** (excluding orphan branches). Both the axonal (green) and dendritic (red) data points show a clear dependency of recovery result on the fraction of conserved mass. The data points are labeled by the section thickness and the positioning scheme used. Averaged over the three positioning schemes, we observe for the dendrites that a conserved mass of 62.2% (100μm) relates to a recovery of 91.8%, a conserved mass of 84.1% (200μm) relates to a recovery of 98.1%, and a conserved mass of 92.5% (300μm) relates to a recovery of 99.2%. For the axons, a conserved mass of 14.3% (in 100μm thick slices) relates to a recovery of 39.9%, a conserved mass of 39.1% (200μm) relates to a recovery of 76.4%, and a conserved mass of 61.6% (300μm) relates to a recovery of 95.3%. The figure clearly shows how the red dendritic data points for 100–300μm thick slices intermingle with the green axonal data points for 300–1000μm thick slices. Thus dendrites and axons show similar relationships obtained for different slice thicknesses, which leads us to the conclusion that recovery results relate to the fraction of conserved mass, while this relation is independent of the slice thickness.

#### **DATA SVOBODA**

The 11 rat parietal cortical L2/3 pyramidal neurons were reconstructed from 300μm thick slices (Shepherd and Svoboda, 2005). **Figure 6** shows a selection of four of these neurons as projections on the XY plane and the YZ plane. A full display of all 11 neurons is shown in Supplementary Figure S1. While the XY projections show the full extent of the arbors, the YZ projections clearly show the accumulation of terminal tips at the intersection of the XY and YZ planes.

The axial-radial mass distributions of the reconstructed neurons after mass completion are shown in **Figures 7A,D** for (apical and basal) dendrites and axons, respectively. Particularly the summed radial distributions in **Figures 7A,D** and the radial to soma distributions in **Figures 7B,E** show the extent of the corrections, which were much larger for the axons than for the dendrites. **Figures 7C,F** clearly show the cut endings, which appear as an accumulation of cut terminal tips at one side of the arborizations. This accumulation was used as a criterion to align the neurons within their slices. The total length of the completed mass distributions is shown in **Table 2A**. The mean dendritic length increased from 7865 to 9289μm, and the mean axonal length from 4692 to 9118μm. The radial center-of-mass of the reconstructed dendrites was equal to 76μm and that of the reconstructed axons 195μm. Based on the validation results with the NETMORPH neurons (**Figure 5**) and the thickness of the slices in the Svoboda data set (300μm), we may expect an almost full recovery of the axonal and dendritic mass. This implies that by slicing 15% of the dendritic mass was lost, and 49% of the intracortical axonal mass (within an uncertainty range of a few percent).

#### **DATA MARKRAM**

The 33 Wistar rat somatosensory cortical L2/3 pyramidal neurons were reconstructed by Wang et al. (2002) from 300μm thick slices, and are shown in **Figure 8** as projections on the XY plane and the YZ plane. While the XY projections show the full extent of the arbors, the YZ projections clearly show the accumulation of terminal tips at the intersection of the XY and YZ planes.

The axial-radial mass distributions of the original neurons and after mass completion are shown in **Figures 9A,D** for (apical and basal) dendrites and axons, respectively. Particularly the summed radial distributions in **Figures 9A,D** and the radial to soma distributions in **Figures 9B,E** show the extent of the corrections, which were much larger for the axons than for the dendrites. **Figures 9C,F** clearly show the cut endings as an accumulation of cut terminal tips at one side of the arborizations, which was used as a criterion to align the neurons within their slices. The length of the completed mass distributions is shown in **Table 2B**. The mean dendritic length increased from 3790 to 4567μm, and the mean axonal length from 3211 to 6177μm. The radial center-of-mass of the reconstructed dendrites was 59μm and that of the reconstructed axons 160μm. Based on the validation results with the NETMORPH neurons (**Figure 5**) and the thickness of the slices in the Markram data set (300μm), we may expect an almost full recovery of the axonal and dendritic mass. This implies that by slicing 17% of the dendritic mass was lost, and 48% of the intracortical axonal mass (within an uncertainty range of a few percent).

#### **DATA PARNAVELAS-UYLINGS**

The data from Parnavelas and Uylings originate from a study of (basal) dendritic development in rat visual cortex pyramidal neurons (Uylings et al., 1994). Golgi stained dendrites were reconstructed from slices with a thickness of about 120μm. Reconstructions were obtained from 153 layer 2/3 pyramidal cells at different ages of cortical development, i.e., at 10, 14, 18, 24, 30, and 90 days postnatal (PN). **Figure 10** shows a selection of the 24 L2/3 pyramidal reconstructions at the age of 90 days PN. A full display of all the neurons is given in Supplementary Figure S4.

The results of the completion procedure for the 90 days PN data set are shown in **Figure 11**. With a mean length of 2022μm for the reconstructed basal dendrites and 2532μm for the completed masses, the mass loss by slicing becomes 20%. However, the validation study (**Table 1**) has shown that for 100 and 200μm thick slices dendritic recovery still leaves a deficit of 7.1 and 1.7%, respectively, for somata within a 60% range of the slice thickness. Based on these validation findings, we may expect that the outcome of the completion procedure of 2532μm deviates less than 7% from the original mean dendritic mass, thus in the range of 2532–2723μm. The outcomes of the completion procedure for the other age groups are listed in **Table 2C**.
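The arithmetic behind the quoted range can be retraced with a short sketch. It assumes that "deviates less than 7%" means the completed length is at most 7% below the true length; this reading is an interpretation, but it reproduces the upper limit of 2723μm given above. Variable names are illustrative.

```python
# Worked arithmetic for the 90 days PN group, using the values quoted in the text.
measured_um = 2022.0    # mean reconstructed basal dendritic length
completed_um = 2532.0   # mean length after the completion procedure
max_deficit = 0.07      # assumed residual deficit, taken from the validation study

# Loss by slicing, expressed relative to the completed length.
loss_fraction = (completed_um - measured_um) / completed_um

# If completed >= (1 - max_deficit) * true_length,
# then true_length <= completed / (1 - max_deficit).
upper_bound_um = completed_um / (1.0 - max_deficit)

print(f"loss by slicing: {100 * loss_fraction:.1f}%")
print(f"plausible range: {completed_um:.0f} to {upper_bound_um:.0f} um")
```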

#### **DISCUSSION**

The aim of this study was to explore the feasibility of density field completion of incompletely reconstructed neurons. The underlying idea was that when the density of the arborizations within the slices can be estimated, an extrapolation to the space outside the slice is possible under the assumption of axial symmetry. The method for this extrapolation is based on simple geometrical considerations.

#### **OTHER APPROACH FOR RECOVERING CUT ARBORIZATIONS**

The problem of incomplete reconstructions by slicing has recently also been studied by Hill et al. (2012). The algorithm they devised for morphological repair was to derive a statistical growth model from the intact parts of the arborizations and then to regrow the cut portions. Using Bayesian spatial distributions, cut dendrites were regrown point by point. Axons were separately repaired by pasting subtrees from the intact parts. Invariance to axial rotations was also assumed. While Hill et al. (2012) attempted to recover the individual branches outside the slice volume, our approach aimed at recovering the population mean axonal and dendritic mass density fields by extrapolating the observed mass distributions within the slices to the outside space by simple geometrical relationships. Our approach is not model-based, and the only assumption used is that of axial symmetry in the mass distributions.

#### **VALIDATION OF THE COMPLETION METHOD**

For the validation of the completion procedure we used a set of 50 neurons generated with our NETMORPH simulator. The neurons were subsequently sliced with different slice thicknesses, and subjected to the completion procedure. Comparison of the sliced masses and completed masses then showed how well the completion procedure was able to recover the lost masses. It turned out that a complete recovery was indeed possible provided that all the original mass of the sliced neurons within the slices was used. This outcome validated the completion procedure. However, the original mass within a slice includes the contiguous part seen from the soma, as well as the mass of orphan branches that lost their connection with the contiguous part within the slice. Because the orphan branches are not included in experimental reconstructions, they remain a missing part of the reconstructed neurons and therefore affect the degree of mass recovery. However, the number of orphan branches depends on the slice thickness and the spatial extent of the arborizations. From the validation data it turned out that this missing part for dendrites in the case of 300μm thick slices was negligibly small, while for intracortical axons the missing part was less than 8% for centrally located neurons (**Figure 5**). For randomly located neurons in 300μm thick slices a full (i.e., better than 98%) dendritic recovery and an almost full axonal recovery (i.e., better than 95%) was obtained by the completion procedure.

#### **DEPENDENCE ON SOMA LOCATIONS IN SLICES**

The validation study revealed that the mean loss of mass in a population of neurons depends on the positions of the somata within the slice. However, a systematic trend could not be derived from **Table 1**, because the outcomes for the three different location options also depended on the slice thickness and the extent of arborization (dendritic or axonal). Apparently, these geometrical parameters all play a role in the final outcome.

**Table 2 | Outcomes of the completion procedure applied to (A) the dataset of Svoboda, (B) the dataset of Markram, and (C) the Parnavelas-Uylings datasets of all age groups in the developmental study, viz. 10, 14, 18, 24, 30, and 90 days PN.**


*The data represents the actual measured axonal and dendritic length from the reconstructions, the recovered length after the completion procedure, and an estimate of the mass loss (%). Note that the mass loss (%) is estimated relative to the recovered length (and not relative to the true length, which is unknown).*

#### **RELATION BETWEEN CONSERVED MASS IN SLICE AND RECOVERY RESULT**

Recovery results appear to relate to the fraction of conserved mass in the slice. Both the dendritic and the axonal data show a similar relationship. In particular the observation that dendritic data points in this relation for 100μm thick slices coincide with axonal data points for 300μm thick slices, and for 200 and 300μm thick slices are intermingled with axonal data points for 400–1000μm thick slices shows that this relationship is independent of the slice thickness itself. Thus, the fraction of conserved mass in the slice determines the level of recovery obtained. For instance, a recovery result better than about 95% requires a conserved mass in the slice larger than about 60%. This relation may be of practical value as it provides guidance for the required slice thickness which will be different for axons and dendrites. Whether a similar relationship between conserved mass and recovery result is obtained when the method is applied to experimentally fully reconstructed neurons instead of NETMORPH generated neurons is still an open question.
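The 95% and 60% rule of thumb can be checked against the averaged values reported in the Results (completion without orphan branches). The sketch below only tabulates those six published points; it does not fit or interpolate a curve, and the tuple layout is an illustrative choice.

```python
# (arbor type, slice thickness in um, conserved mass %, recovery %),
# averaged over the three positioning schemes as quoted in the Results.
points = [
    ("dendrites", 100, 62.2, 91.8),
    ("dendrites", 200, 84.1, 98.1),
    ("dendrites", 300, 92.5, 99.2),
    ("axons", 100, 14.3, 39.9),
    ("axons", 200, 39.1, 76.4),
    ("axons", 300, 61.6, 95.3),
]

# Points that reach a recovery of at least 95%.
well_recovered = [p for p in points if p[3] >= 95.0]

# Smallest conserved mass among those points.
min_conserved = min(p[2] for p in well_recovered)

for arbor, thickness, conserved, recovery in sorted(points, key=lambda p: p[2]):
    print(f"{arbor:9s} {thickness:4d} um  conserved {conserved:5.1f}%  recovery {recovery:5.1f}%")
print(f"smallest conserved mass with recovery >= 95%: {min_conserved}%")
```

Sorting by conserved mass places the 100μm dendritic point (62.2%) next to the 300μm axonal point (61.6%), which is the coincidence noted above.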

#### **APPLICATION TO EXPERIMENTAL DATA SETS**

Based on the positive outcomes of the validation study, we have analyzed three data sets of reconstructed neurons, two of which were obtained from the NeuroMorpho.org database and one was provided by one of the authors of this study.

**FIGURE 10 | A selection of rat visual cortex layer 2/3 pyramidal basal dendrites at the age of 90 days PN, reconstructed by Parnavelas-Uylings (Uylings et al., 1994), and plotted as projections onto the XY and YZ plane.** A full display of all the neurons is given in Supplementary Figure S4.

The Svoboda data set (Shepherd and Svoboda, 2005) resulted in a recovery of the mean dendritic length of 9289μm (from the actual measured value of 7865μm, indicating a loss by slicing of 15%). Axonal length was recovered up to 9118μm (from the actual measured value of 4692μm, indicating a mass loss of 49%), but this outcome may still be about 5% from the true value (see discussion in the previous two paragraphs).

The Markram data set (Wang et al., 2002) resulted in a recovery of the mean dendritic length of 4567μm (from the actual measured value of 3790μm, indicating a loss by slicing of 17%). Axonal length was recovered up to 6177μm (from the actual measured value of 3211μm, indicating a mass loss of 48%), but this outcome may still be about 5% from the true value (see discussion in the previous two paragraphs).

The Parnavelas-Uylings data set (Uylings et al., 1994) resulted in a recovery of the mean basal dendritic length of the 90 days age group of 2532μm (from the actual measured value of 2022μm, indicating a loss by slicing of 20%). These dendritic reconstructions were, however, made from 120 μm thick slices and the recovery result may still be about 7% from the true value (see **Table 1**, dendritic deviation for 100 μm thick slices with random positions in 20–80% range).
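The loss percentages quoted for the three data sets follow from the measured and completed lengths when, as stated in the note to **Table 2**, the loss is taken relative to the completed (recovered) length. A minimal sketch of that calculation, with illustrative names:

```python
# (measured length, completed length) in um, as quoted in the text.
datasets = {
    "Svoboda dendrites": (7865.0, 9289.0),
    "Svoboda axons": (4692.0, 9118.0),
    "Markram dendrites": (3790.0, 4567.0),
    "Markram axons": (3211.0, 6177.0),
    "Parnavelas-Uylings basal dendrites, 90 days PN": (2022.0, 2532.0),
}

def estimated_loss_percent(measured_um, completed_um):
    """Mass loss by slicing, relative to the completed length."""
    return 100.0 * (completed_um - measured_um) / completed_um

losses = {name: estimated_loss_percent(m, c) for name, (m, c) in datasets.items()}

for name, loss in losses.items():
    print(f"{name}: {loss:.1f}%")
```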

The outcomes of the three data sets are not directly comparable, because of different ages of the rats used (25–36 days PN for the Svoboda data, 13–15 days PN for the Markram data, and 90 days PN for the Parnavelas-Uylings data), and the restriction in the last data set to basal dendrites only.

It is interesting to note that the Svoboda and the Markram data sets show similar mass losses for axons (48–49%) and dendrites (15–17%) in 300μm thick slices. Apparently, the mean mass loss is not so sensitive to the difference in the mean radial center-of-masses of the cut axons (195 and 160μm) and cut dendrites (76 and 59μm) in the Svoboda and Markram data sets, respectively. This is also in line with the more or less uncorrelated scatter of the individual NETMORPH neuron data in **Figure 5C**. Nevertheless, the axonal and dendritic losses do differ significantly in all these cases.

#### **SHRINKAGE OF THE TISSUE**

The described completion/recovery method, and thus the quality of its outcomes, is independent of homogeneous tissue shrinkage. The 3D metrical properties of the arborizations, however, are affected by shrinkage. In quantifying the 3D geometry of neuronal arborizations, one needs to take tissue shrinkage into account, which occurs in histological and staining procedures. The extent of shrinkage is different for different staining techniques, and is even different for different Golgi techniques. For the Markram data set based on HRP staining, Wang et al. (2002) reported a 25% shrinkage of the slice thickness and ∼10% anisotropic shrinkage along the X- and Y-axes. Only shrinkage of thickness was corrected. Shrinkage correction in the Svoboda data set (Shepherd and Svoboda, 2005) was not reported. No shrinkage correction was applied for the Parnavelas-Uylings data set (Uylings et al., 1994). Uylings et al. (1986, 1989) summarized tissue shrinkage values for different Golgi staining procedures. Linear shrinkage between 5 and 10% was reported for Golgi-Cox and rapid Golgi staining, and of 5–20% for Golgi-Kopsch staining. Thus, depending on histological and staining procedures, tissue shrinkage is an important issue in quantitative studies of 3D neuronal arborizations. Correction for shrinkage is only possible when the actual amount of anisotropic shrinkage in X, Y, and Z direction is known.

#### **AXIAL SYMMETRY**

A key assumption in the completion procedure is axial (rotational) symmetry in the distribution of axonal and dendritic mass. Because such symmetry is often assumed for pyramidal neurons, we have selected cortical layer 2/3 pyramidal neuron reconstructions for this study. While this assumption may be reasonably valid for basal dendritic arborizations that locally innervate space, it may not be valid for non-pyramidal dendrites (Parnavelas and Uylings, 1980) and axons that extend their arbors not only to local but also to remote locations. Clearly, single slice axonal reconstructions visualize only the local part of axonal arbors. Because the completion procedure recovers only this local axonal part, the axial symmetry assumption may still be valid.

#### **DENSITY FIELDS**

The paper dealt with the estimation of the axial-radial distribution of axonal and dendritic mass. The calculation of axonal and dendritic densities is a straightforward extension and proceeds by dividing the mass in an integration ring (see **Figure 2**) by the volume of the ring (see Materials and Methods; see also Van Pelt and Van Ooyen, 2013).
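As an illustration of this division, the sketch below converts the mass (arbor length) found in one integration ring into a density. It assumes the ring is an annulus around the symmetry axis with a rectangular cross-section in the axial-radial plane; the function names and the example numbers are hypothetical and not taken from the paper.

```python
import math

def ring_volume(r_inner, r_outer, z_low, z_high):
    """Volume (um^3) of an annular ring around the axis with a rectangular cross-section."""
    return math.pi * (r_outer**2 - r_inner**2) * (z_high - z_low)

def ring_density(mass_um, r_inner, r_outer, z_low, z_high):
    """Arbor length per unit volume (um per um^3) in one axial-radial bin."""
    return mass_um / ring_volume(r_inner, r_outer, z_low, z_high)

# Hypothetical bin: 50 um of dendrite between radii 10 and 20 um, over 10 um of height.
example_density = ring_density(50.0, 10.0, 20.0, 0.0, 10.0)
print(f"{example_density:.6f} um/um^3")
```

Because ring volume grows with radius, equal amounts of mass in an inner and an outer ring correspond to a lower density in the outer ring.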

#### **DENSITY FIELDS AND SYNAPTIC CONNECTIVITY ESTIMATION**

Recently, we have shown that potential synaptic connectivity between neurons in a network can be estimated from their axonal and dendritic density fields (Van Pelt and Van Ooyen, 2013; McAssey et al., 2014). Thus, for constructing neuronal networks and their inter-neuron connectivity one does not need as many neuronal reconstructions as there are neurons in the network, but can use the population mean density fields instead. The availability of density fields for a variety of neuronal cell types is thus important. While for a large variety of cell types reconstructions have become available in open-access data bases, their incompleteness hampers a full use of the data. When with our method full density fields can be recovered from incomplete single-slice reconstructions, the open-access data become even more valuable, as they now can also be used for building neuronal networks and connectivity studies.

It has to be noted that the estimation of the number of synapses not only requires the number of potential synapse locations but also the probability that synapses actually are formed at these locations. A detailed EM study of the hippocampal neuropil by Mishchenko et al. (2010) showed that this probability was variable and dependent on ultrastructural details, such as dendritic circumference and actual axo-dendritic touches. Helmstaedter (2013) emphasizes in his review on dense neural circuit reconstruction that in "mapping neuronal circuits, it is important to detect synaptic contacts between neurons, but it is in many cases even more important to be able to exclude synaptic connectivity between neurons to determine the structure of a wiring diagram." Clearly, the estimation of potential synapse locations is only one, but still crucial, factor in estimating synaptic connectivity.

Nevertheless, the realism of network connectivity estimates based on overlapping axonal and dendritic arborizations has recently been demonstrated by Hill et al. (2012) and Van Ooyen et al. (2014). In a statistical study, Hill et al. (2012) found that random alignment of axonal and dendritic arbors provides a sufficient foundation for specific functional connectivity to emerge in local neural microcircuits. In a computational study, Van Ooyen et al. (2014) found that the synaptic connectivity emerging between neurons that grow out in the absence of any guidance cues showed a good agreement with available experimental data on spatial locations of synapses on dendrites and axons, number of synapses by which neurons are connected, connection probability between neurons, distance between connected neurons, and pattern of synaptic connectivity.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnana.2014.00054/abstract

Supplementary Figures give a full display of the neurons obtained from the NeuroMorpho.org database and used for the application of the completion method. Additional scatter plots are given of the loss of mass by slicing and recovery by completion vs. the radial center-of-mass in the axonal and dendritic mass distribution of the sliced neurons.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 April 2014; accepted: 07 June 2014; published online: 25 June 2014. Citation: van Pelt J, van Ooyen A and Uylings HBM (2014) Axonal and dendritic density field estimation from incomplete single-slice neuronal reconstructions. Front. Neuroanat. 8:54. doi: 10.3389/fnana.2014.00054*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 van Pelt, van Ooyen and Uylings. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Slicing, sampling, and distance-dependent effects affect network measures in simulated cortical circuit structures

#### *Daniel C. Miner\* and Jochen Triesch*

*Department of Neuroscience, Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany*

#### *Edited by:*

*Patrik Krieger, Ruhr University Bochum, Germany*

#### *Reviewed by:*

*Jason N. MacLean, University of Chicago, USA Mark D. McDonnell, University of South Australia, Australia*

#### *\*Correspondence:*

*Daniel C. Miner, Department of Neuroscience, Frankfurt Institute for Advanced Studies, Ruth-Moufang-Strasse 1, 60438 Frankfurt am Main, Germany e-mail: miner@fias.uni-frankfurt.de*

The neuroanatomical connectivity of cortical circuits is believed to follow certain rules, the exact origins of which are still poorly understood. In particular, numerous nonrandom features, such as common neighbor clustering, overrepresentation of reciprocal connectivity, and overrepresentation of certain triadic graph motifs have been experimentally observed in cortical slice data. Some of these data, particularly regarding bidirectional connectivity are seemingly contradictory, and the reasons for this are unclear. Here we present a simple static geometric network model with distance-dependent connectivity on a realistic scale that naturally gives rise to certain elements of these observed behaviors, and may provide plausible explanations for some of the conflicting findings. Specifically, investigation of the model shows that experimentally measured nonrandom effects, especially bidirectional connectivity, may depend sensitively on experimental parameters such as slice thickness and sampling area, suggesting potential explanations for the seemingly conflicting experimental results.

**Keywords: cortical networks, graph theory, nonrandom connectivity, network topology, common neighbor, motifs, cortical slices**

#### **1. INTRODUCTION**

Synaptic connectivity forms the anatomical substrate which gives rise to our cognitive abilities. It has been shown that much of the lateral recurrent connectivity of the cortex is significantly nonrandom. That is to say that the statistics of local connectivity do not follow those of a directed Erdős-Rényi graph, i.e., a graph in which all possible connections exist with equal and independent probability (Erdős and Rényi, 1960). For example, Holmgren et al. (2003), Song et al. (2005), and Ko et al. (2011) note the presence of greater than expected bidirectional connectivity, a feature that has been suggested as a key requirement for the sort of large-scale recurrent excitation that is seen and computation that is believed to take place in the neocortex (Douglas et al., 1995). Lefort et al. (2009), on the other hand, note no excess of bidirectional connectivity. Song et al. (2005) additionally note greater than expected counts of certain triangular or triadic network motifs (three-neuron connectivity patterns) (Milo et al., 2002). Yoshimura et al. (2005) examine specific microstructure, including bidirectional connections, within cortical columns. Perin et al. (2011) note greater than expected common neighbor clustering, a phenomenon in which pairs of neurons sharing a greater number of common neighbors are more likely to be connected themselves, while Perin et al. (2013) further examine the structural implications of this above-chance common neighbor clustering. Morgan and Soltesz (2008), Litwin-Kumar and Doiron (2012), and McDonnell and Ward (2014) highlight some of the functional implications of clustering in balanced cortex-like networks. Rubinov and Sporns (2010) provide an overview of graph measures that might be applied to brain networks. The abundance of nonrandom features suggests that there may be some computational or metabolic advantage to the particular connectivity structure of the cortex. It is an open question which nonrandom features are developed as a result of direct genetic programming, neural plasticity under structured input, and spontaneous self-organization (Prill et al., 2005).
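To make "greater than expected bidirectional connectivity" concrete, the following sketch compares the number of reciprocally connected pairs in a directed graph with the number expected under the Erdős-Rényi null model, where each directed edge exists independently with probability $p$ and a pair is therefore reciprocal with probability $p^2$. The function name and the test graphs are illustrative assumptions, not the measure used in the cited studies.

```python
import numpy as np

def reciprocity_ratio(adjacency):
    """Observed reciprocal pairs divided by the count expected in a directed Erdős-Rényi graph."""
    a = np.asarray(adjacency, dtype=bool).copy()
    np.fill_diagonal(a, False)                    # ignore self-connections
    n = a.shape[0]
    p = a.sum() / (n * (n - 1))                   # empirical connection probability
    observed = np.logical_and(a, a.T).sum() / 2   # each reciprocal pair is counted twice
    expected = p**2 * n * (n - 1) / 2             # independent edges: P(both directions) = p^2
    return observed / expected

# A random directed graph with independent edges should give a ratio near 1.
rng = np.random.default_rng(0)
er_graph = rng.random((400, 400)) < 0.1
ratio_er = reciprocity_ratio(er_graph)

# A toy graph whose only two edges form one reciprocal pair gives a ratio well above 1.
toy = np.zeros((4, 4), dtype=bool)
toy[0, 1] = toy[1, 0] = True
ratio_toy = reciprocity_ratio(toy)

print(ratio_er, ratio_toy)
```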

The connectome, which we take here to refer to the microscale, or neuron-and-synapse, connectivity of the brain (Sporns et al., 2005), is a detailed and difficult thing to study. Numerous methods exist for its study, including (but not limited to) increasingly detailed histological techniques (Kleinfeld et al., 2011, for example) and, more commonly, as they allow access to synaptic strengths and dynamics in addition to structure, electrophysiological recordings. We focus here on the most common implementation of the latter, involving the preparation of and recording from *in vitro* slices of cortical tissue. Though it provides more information about individual connections, the overall picture provided by electrophysiological techniques is affected by sampling biases and constraints (Seung, 2009). Traditionally, the primary concern regarding such biases and constraints has been accurate reconstruction of very small sections of circuitry. However, as techniques improve and the available sections get larger and more densely sampled, and in particular as statistical network measures are utilized more and more, it becomes important to study the effect of these biases and constraints on the network measures as well.

We examine here a simple model for horizontal connectivity in the cortex under intersomatic distance-dependent connection constraints. This simple distance-dependence results in the formation of several nonrandom features including, but not limited to, common neighbor clustering, excess reciprocal or bidirectional connectivity, and an overrepresentation of certain triadic motifs. We perform virtual slicing and sampling on this model, similar to what would be done in a physiological experiment, and examine how the results depend on slice thickness and the size of the sampling area from which cells are probed. We find, encouragingly, that such complex nonrandom features can be seeded (if not fully instantiated to the degree at which they are experimentally observed) by such a simple distance-dependent phenomenon. We also find, more troublingly, that the observed representation of some of these features depends strongly on interactions of scale between the connectivity profiles, the cortical structures, and the slicing and sampling thereof. We discuss in this paper the implications of these phenomena and conclude that in order to correctly interpret data on cortical connectivity and its nonrandom features, close attention has to be paid to the exact experimental parameters such as slice thickness and sampling area.

#### **2. MATERIALS AND METHODS**

Our model is designed to represent a virtual slab of cortical layer V in rodents. The slab's dimensions are 1000 × 1000µm, with a thickness of 300µm (the lattermost dimension describing the approximate thickness of layer V of the rodent cortical sheet; Schüz and Palm, 1989) (see **Figure 1**). We assume a cortical neuronal density of at least 20000 excitatory neurons per cubic mm, resulting in a total population of 6000 neurons, which are populated into the volume in a random, uniform fashion. This is a slight reduction in neuronal density from biological values, but is sufficient to demonstrate the phenomena we wish to explore and is necessary for rapid computational tractability. Though it is known that horizontal cortical axonal projections can reach lengths of several millimeters (Hirsch and Gilbert, 1991), we choose to focus on local, sub-millimeter connectivity, as this is the scale of the microstructure typically being examined in network measure studies of cortical wiring. Various connectivity models, ranging in complexity from simple piecewise dense and sparse connectivity radii (Voges et al., 2010a,b) to detailed reconstructions based on axonal and dendritic structure (Stepanyants et al., 2008; Kleinfeld et al., 2011), have been produced from experimental data. We select a continuous radial function for distance-dependent connectivity as a solution between these two extremes. Our profile is a Gaussian with a half-width of 200µm. This particular profile is chosen as a middle ground between the results of Song et al. (2005), who find no distance dependence up to a scale of 80–100µm, and the results of Holmgren et al. (2003) and Perin et al. (2011), who find exponential distance dependence at a scale of 150–300µm. The Gaussian compromise coarsely approximates both the flat top of the former result and the decay of the latter.
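The slab geometry and connectivity profile described above can be sketched as follows. The text does not specify whether "half-width" denotes the standard deviation or the half-width at half-maximum of the Gaussian; this sketch assumes the former, and all names and the random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

SLAB_UM = np.array([1000.0, 1000.0, 300.0])   # X, Y, Z extent of the virtual slab
DENSITY_PER_MM3 = 20000                       # excitatory neurons per cubic mm

# 1 mm^3 = 1e9 um^3, so the slab volume is 0.3 mm^3.
volume_mm3 = np.prod(SLAB_UM) / 1e9
n_neurons = int(round(DENSITY_PER_MM3 * volume_mm3))

# Uniform random placement of somata inside the slab.
positions = rng.uniform(0.0, SLAB_UM, size=(n_neurons, 3))

HALF_WIDTH_UM = 200.0   # assumption: interpreted here as the Gaussian standard deviation

def connection_profile(distance_um, sigma_um=HALF_WIDTH_UM):
    """Unnormalized Gaussian connection weight as a function of intersomatic distance."""
    return np.exp(-np.asarray(distance_um) ** 2 / (2.0 * sigma_um**2))

print(n_neurons, positions.shape, connection_profile([0.0, 100.0, 200.0, 400.0]))
```

Under this reading the profile is nearly flat over the first 80 to 100µm and falls off substantially by 300µm, in line with the compromise described in the text.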

To produce the model graph, first, a 6000 × 6000 element distance matrix is constructed, with each element representing the Euclidean distance between a pair of neurons. The boundary conditions are non-periodic, corresponding to slice boundary truncation. The connectivity profile function is then applied to each element, producing an unnormalized probability matrix, with each entry representing the pairwise connection probability. Self-connection probabilities are set to zero. The matrix is flattened into a vector and then the cumulative sum of the vector is taken and normalized, producing a cumulative distribution function (CDF). A lookup table is generated that maps each interval in the CDF to a particular pair of neurons.
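The construction of the CDF and its lookup table can be illustrated as follows. This is a sketch under stated assumptions: the names `build_pair_cdf` and `lookup_pair` are hypothetical, self-pairs are simply skipped (equivalent to giving them zero probability), and a plain Python list stands in for the flattened matrix.

```python
import itertools
import math
from bisect import bisect_right


def build_pair_cdf(positions, profile):
    """Return (pairs, cdf) over all ordered neuron pairs with i != j.

    pairs[k] is the ordered pair whose CDF interval ends at cdf[k].
    """
    pairs, weights = [], []
    for i, j in itertools.permutations(range(len(positions)), 2):
        distance = math.dist(positions[i], positions[j])  # Euclidean, non-periodic
        pairs.append((i, j))
        weights.append(profile(distance))
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point round-off
    return pairs, cdf


def lookup_pair(u, pairs, cdf):
    """Map a number u in [0, 1) to the pair whose CDF interval contains it."""
    return pairs[bisect_right(cdf, u)]
```

For 6000 neurons the pair list has about 36 million entries, so an array library would be the practical choice; the logic is unchanged.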

The network is treated as a directed graph. A global connection fraction $F_C$ is chosen upon model initialization, and the model is populated by generating random numbers in the interval [0, 1] against the CDF and instantiating the edge mapped to the CDF interval in which each random number falls (rejecting already-instantiated edges) until the total number of edges reaches $N_{\text{edges}} = F_C \times (N_{\text{nodes}}^2 - N_{\text{nodes}})$.
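A minimal sketch of this population step is given below, assuming the candidate pairs and their unnormalized weights are already available as lists. The function names are illustrative and not taken from the original code; a Python set is used so that re-drawn edges are rejected automatically.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def target_edge_count(n_nodes, connection_fraction):
    """Number of directed edges implied by the global connection fraction."""
    return round(connection_fraction * (n_nodes ** 2 - n_nodes))


def sample_edges(pairs, weights, n_edges, seed=0):
    """Draw distinct directed edges with probability proportional to weight."""
    if n_edges > sum(1 for w in weights if w > 0):
        raise ValueError("more edges requested than pairs with nonzero weight")
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        # Find the CDF interval containing the random number.
        index = min(bisect_right(cdf, rng.random()), len(pairs) - 1)
        edges.add(pairs[index])  # adding an existing edge is a no-op (rejection)
    return edges
```

Rejection sampling slows down as the graph fills, but at connection fractions of a few percent most draws produce a new edge.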

Two sequential reduction procedures are then performed on the graph in order to simulate experimental sampling of the network. The first procedure simulates slicing. The virtual volume of the network is truncated along the X axis in **Figure 1** to correspond to the dimensions of a typical slice (50–500µm, depending on the experiment). Edges and nodes that fall outside the truncation region are eliminated from the graph. The second procedure roughly simulates probing and sampling. In this procedure, a subset of *N*sample nodes is randomly selected from a centered cylindrical volume of radius *r*sample within the slice (50–300µm, depending on the experiment), and a subgraph is constructed from these nodes and their respective edges. This subgraph is then taken to be equivalent to an electrophysiologically obtained sample. An example geometry of this virtual slicing and sampling is shown in **Figure 1**.
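The two reduction procedures can be sketched as below. Several geometric details are assumptions made for illustration: the slice is centered on the X midpoint of the slab, the sampling cylinder has its axis along X and is centered in the Y-Z plane, and all function names are hypothetical.

```python
import random


def slice_nodes(positions, thickness_um, slab_x_um=1000.0):
    """Indices of neurons inside a slice centered on the X midpoint (assumed)."""
    lo = (slab_x_um - thickness_um) / 2.0
    hi = lo + thickness_um
    return [i for i, p in enumerate(positions) if lo <= p[0] <= hi]


def sample_nodes(positions, candidates, radius_um, n_sample,
                 center_yz=(500.0, 150.0), seed=0):
    """Randomly pick up to n_sample candidates inside a centered cylinder.

    ASSUMPTION: the cylinder axis runs along X through center_yz.
    If fewer than n_sample neurons are available, all of them are taken.
    """
    cy, cz = center_yz
    inside = [i for i in candidates
              if (positions[i][1] - cy) ** 2 + (positions[i][2] - cz) ** 2
              <= radius_um ** 2]
    rng = random.Random(seed)
    return sorted(rng.sample(inside, min(n_sample, len(inside))))


def induced_edges(edges, nodes):
    """Keep only the directed edges whose endpoints both survive."""
    keep = set(nodes)
    return {(i, j) for (i, j) in edges if i in keep and j in keep}
```

Applying `slice_nodes`, then `sample_nodes`, then `induced_edges` yields the subgraph that plays the role of an electrophysiological sample.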

For any selected network, be it complete, a virtual slice, or a virtual sample, we compare properties against ensembles of two types of control graphs. The first control is a comparison against a purely random graph. It is a directed Erdős-Rényi graph (Erdős and Rényi, 1960) parametrized by the same number of nodes and number of edges as the selected network.
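As an illustration, a directed random graph with a fixed number of nodes and edges can be drawn as follows. This is a sketch with a hypothetical function name; NetworkX offers `gnm_random_graph(n, m, directed=True)` for the same purpose.

```python
import itertools
import random


def directed_gnm(n_nodes, n_edges, seed=0):
    """Directed random graph with exactly n_edges edges and no self-loops.

    Every set of n_edges ordered pairs is equally likely.
    """
    rng = random.Random(seed)
    all_pairs = list(itertools.permutations(range(n_nodes), 2))
    return set(rng.sample(all_pairs, n_edges))
```

Enumerating all ordered pairs is fine for samples of about 100 neurons; for the full network a rejection-based draw avoids building the list.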

The second control is a graph that naturally and randomly attains the amount of overrepresented bidirectional connections induced by the distance-dependent connectivity, but contains no higher order effects. It is essentially a modified directed Erdős-Rényi-like graph parametrized by the number of nodes and the two independent probabilities of unidirectional connections and reciprocal or bidirectional connections. More explicitly, from the model graph, the fraction of node pairs that are unidirectionally connected and the fraction of node pairs that are bidirectionally connected are calculated. A new graph is then randomly populated with the same fractions of unidirectionally connected and bidirectionally connected node pairs in an Erdős-Rényi-like fashion. This controls against an overrepresentation of motifs driven solely by excess bidirectional connectivity while preserving overrepresentation of motifs driven by higher order or more subtle forms of clustering.
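One way to realize such a bidirectionality-corrected control is sketched below. It matches the exact counts of unidirectional and bidirectional pairs rather than their probabilities, which is a simplifying assumption, and the function names are hypothetical.

```python
import itertools
import random


def pair_counts(edges):
    """Return (unidirectional pairs, bidirectional pairs) of a directed graph."""
    bidirectional = sum(1 for (i, j) in edges if i < j and (j, i) in edges)
    unidirectional = len(edges) - 2 * bidirectional
    return unidirectional, bidirectional


def bidirectionality_matched_control(n_nodes, edges, seed=0):
    """Random graph with the same pair counts but no higher order structure."""
    n_uni, n_bi = pair_counts(edges)
    rng = random.Random(seed)
    unordered = list(itertools.combinations(range(n_nodes), 2))
    chosen = rng.sample(unordered, n_uni + n_bi)  # random order, no repeats
    control = set()
    for k, (i, j) in enumerate(chosen):
        if k < n_bi:
            control.update({(i, j), (j, i)})  # reciprocal pair
        elif rng.random() < 0.5:
            control.add((i, j))  # single edge, random direction
        else:
            control.add((j, i))
    return control
```

Because each unordered pair is used at most once, the control has the same pair counts as the model graph by construction.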

The Python package NetworkX (Hagberg et al., 2008) and a publicly available software script that counts triadic motifs in a directed graph (Levenson and van Liere, 2011) are used to assist in the construction and analysis of graphs.

We will make comparisons between different sample and slice sizes based on overall connection fraction, bidirectional connection fraction, triadic motif count, and common neighbor clustering. We will demonstrate that sampling scale has a notable effect on how such properties are observed.

#### **3. RESULTS**

We select a global target connection fraction of 0.025 for the 1000 × 1000 × 300µm layer V slab, as this produces a local connection fraction of 0.1 for a medium-sized slice and sample, as observed in numerous layer V slice studies (Thomson and Deuchars, 1997; Thomson et al., 2002). We select three slice thicknesses (in addition to the complete network) and three sampling radii with 100 neuron subsamples (except in the case of small sections, in which case the maximum number of neurons in the section is sampled). We will examine the complete network and complete slice statistics, as well as the sample statistics for each condition, and note how they vary. Unless otherwise specified, we average over five network samples.
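As a quick check of the scales involved, the population size and the target number of directed edges follow from the stated density, slab dimensions, and connection fraction (taking the density as exactly 20000 per cubic mm, as the stated population of 6000 implies):

$$\begin{aligned} N_{\text{nodes}} &= 20000\ \text{mm}^{-3} \times (1 \times 1 \times 0.3)\ \text{mm}^{3} = 6000, \\ N_{\text{edges}} &= F_C \times (N_{\text{nodes}}^2 - N_{\text{nodes}}) = 0.025 \times (6000^2 - 6000) = 899850. \end{aligned}$$

This corresponds to roughly 150 outgoing connections per neuron on average (0.025 × 5999).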

The global connection fraction and bidirectional connection fraction for each condition are given in **Tables 1**, **2**. We note that in general, for a given slice size, the overall connection fraction decreases with increasing sampling radius. This is a direct result of local clustering due to the distance-dependent connection probability. Similarly, we note that as sampling radius increases, the number of bidirectional connections over chance (as compared to an Erdős-Rényi graph) increases. This is also a result of local clustering due to the distance-dependent connection probability.
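The two quantities reported in the tables can be computed as in the following sketch. The chance level for bidirectional pairs is taken as the expectation for independent edges at the observed connection fraction, which is an assumption about how the comparison is made; the function names are illustrative.

```python
def connection_fraction(n_nodes, edges):
    """Fraction of possible directed connections that are present."""
    return len(edges) / (n_nodes * (n_nodes - 1))


def bidirectional_excess(n_nodes, edges):
    """Observed reciprocal pairs divided by the independent-edge expectation.

    With connection fraction p, independent edges give an expected
    p**2 * n * (n - 1) / 2 reciprocal pairs.
    """
    if not edges:
        raise ValueError("graph has no edges; the ratio is undefined")
    p = connection_fraction(n_nodes, edges)
    observed = sum(1 for (i, j) in edges if i < j and (j, i) in edges)
    expected = p ** 2 * n_nodes * (n_nodes - 1) / 2
    return observed / expected
```

A ratio above one indicates more reciprocal pairs than expected by chance.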

We examine the common neighbor behavior in **Figures 3**–**6**. The common neighbor effect is measured as follows. Pairs of neurons sharing each possible number of commonly connected neighbors (up to some maximum value) are counted, ignoring directionality (see **Figure 2**). For each number of commonly connected neighbors, the number of connected neuron pairs is divided by the total number of neuron pairs, resulting in a connection probability conditioned on the number of common neighbors. The steeper the slope of this measure as a function of the number of common neighbors, the stronger the effect (Perin et al., 2011). For an Erdős-Rényi graph, this common neighbor effect measure will have, on average, a slope of zero and a value equal to the overall connection probability (up until the maximum number of neighbors). Common neighbor clustering should not be confused with more traditional clustering measures (Watts and Strogatz, 1998; Fagiolo, 2007). Common neighbor effect is taken here as an undirected measure for two reasons: alignment with the convention of Perin et al. (2011), and because our simple structural model has no directional preference, and can thus make no prediction about it. In an actual biological or more complex simulated system, it is likely that in and out (to and from) common neighbor effects would produce different results, as is suggested in the supplementary material of Perin et al. (2011).
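The measure can be sketched as follows. It is assumed here that a pair counts as connected when an edge exists in at least one direction, in line with ignoring directionality; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def common_neighbor_profile(n_nodes, edges):
    """Connection probability conditioned on the number of common neighbors.

    Returns {k: P(pair connected | pair shares k neighbors)}, treating the
    graph as undirected.
    """
    neighbors = defaultdict(set)
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    connected = defaultdict(int)
    total = defaultdict(int)
    for i, j in combinations(range(n_nodes), 2):
        k = len(neighbors[i] & neighbors[j])  # shared neighbors of the pair
        total[k] += 1
        if j in neighbors[i]:
            connected[k] += 1
    return {k: connected[k] / total[k] for k in sorted(total)}
```

Plotting the returned values against k gives curves of the kind discussed in this section; a rising curve indicates a common neighbor effect.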

**Figure 3** shows the total common neighbor effect for each entire slice. We note, firstly, that the slope of the common neighbor clustering increases with decreasing section size, and secondly, that the saturation point decreases with decreasing section size. We speculate that this occurs due to the truncation of connections that occurs upon slicing, and the resulting tendency of only nearby neurons to be well-connected. Similarly, for each individual slice thickness (**Figures 4**–**6**), the saturation point increases with decreasing sampling radius. The overall effect also becomes less pronounced for the smaller (in this case, 100 neuron) samples, as would be expected. The strength of common neighbor clustering is sensitive to both the neuronal and connection densities, and the size of the distance-dependent connection probability, particularly as it relates to the sampling scale. It is the sensitivity to the relationship between these scales that we wish to emphasize in these results.

Experimental data (Perin et al., 2011) show an above-chance common neighbor effect stronger than the one demonstrated by our model for similar sampling conditions, suggesting the presence of additional clustering mechanisms in the cortex beyond the simple geometric ones examined in our model. One prediction our model makes is that after a linear or near-linear rise in connection probability as a function of common neighbor count, the connection probability saturates for some large number of common neighbors. It can be extrapolated, despite the increased common neighbor effect seen in physiological data, that this sort of turnover and saturation effect will still necessarily occur for a large number of common neighbors given a sufficiently thorough sampling of a section of cortical tissue.

We examine the counts of occurrences of directed triadic motifs (possible directed triangular subgraph configurations; see **Figure 7**) in the simulated tissue sections compared with Erdős-Rényi random graphs for complete sections and for a sampled 300µm slice in **Figures 8**, **9** (which is representative of sliced and sampled behavior, as it is observed that sliced and sampled behavior does not vary much between slice sizes, only between sample radii). We note an excess of motifs with bidirectional connections. This is trivially expected from distance-dependent connection probabilities: since each direction in an edge is treated independently, many minimally separated nodes will be bidirectionally connected and, more generally, inter-group connectivity will be increased among tight groups of neurons. Furthermore, it is trivially the case that given an excess of bidirectional connections, triads containing them will be overrepresented. We wish to correct for this second effect, and do so via the bidirectionality-corrected control described in the Materials and Methods section and elucidated below.
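For illustration, a brute-force census of directed triads can be written as below. This is not the motif-counting script cited in the Methods: the function names are hypothetical, and the classes are labeled by a canonical bit pattern rather than by the motif numbers of **Figure 7**. Dividing each count by the mean count over an ensemble of control graphs gives the over- or underrepresentation.

```python
from collections import Counter
from itertools import combinations, permutations


def triad_code(nodes, edges):
    """Canonical label of the directed subgraph on three nodes.

    The label is the smallest adjacency bit pattern over all six
    relabelings, so isomorphic triads share the same label.
    """
    best = None
    for a, b, c in permutations(nodes):
        ordered = ((a, b), (b, a), (a, c), (c, a), (b, c), (c, b))
        code = tuple(int(e in edges) for e in ordered)
        if best is None or code < best:
            best = code
    return best


def triad_census(n_nodes, edges):
    """Count how often each of the 16 triad classes occurs."""
    return Counter(triad_code(t, edges)
                   for t in combinations(range(n_nodes), 3))
```

The cost grows with the cube of the number of nodes, so this form is only practical for samples of roughly 100 neurons.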

We examine triadic motif counts against bidirectionality-corrected random graphs for complete sections and for a sampled 300µm slice in **Figures 10**, **11**. Again, sliced and sampled behavior does not vary much between slice sizes, only between sample radii. We note that even after bidirectionality correction, excesses of closed-loop (i.e., connected on all sides) triadic motifs containing bidirectionally connected pairs remain. Of interest as well is the excess of closed but non-bidirectional triadic motifs (numbers 10 and 11) remaining. We note, in general, that motifs 10–16 remain overrepresented, a phenomenon seen as well in Song et al. (2005). An underrepresentation of motif 8, which is observed in Song et al. (2005) with a similar strength to the aforementioned overrepresentations, is not seen in our model. However, the purpose of this paper is not to fully analyze the more subtle effects of distance-dependent clustering, but rather to examine the implications of similar clustering occurring at the same spatial scale as variations in sampling. We note, firstly, that as slice size decreases, the statistics of the complete slice approach the statistics of the sample. This follows logically from the fact that the sample occupies an increasing fraction of the slice by volume for a smaller slice. Along similar lines, we note that thinner slices exhibit less variation in the counts between sampling radii. For a sufficiently thin slice, one could hypothetically move from a three-dimensional to a two-dimensional reference model, approximating a sheet. We also note that post-bidirectionality correction in the control, the variation between slice sizes and sample radii is smaller than it was pre-bidirectionality correction in the control. This is a strong indicator that any motif surveys undertaken would benefit from using a bidirectionality or similar (as in Song et al., 2005) correction on the control in order to maximize consistency and universality in results.

**FIGURE 4 | Common neighbor clustering for 500 µm slice: pairwise connection probability as a function of number of commonly connected neighbors.** Error bars indicate standard error of the mean. Average over five populations.

**FIGURE 6 | Common neighbor clustering for 100 µm slice: pairwise connection probability as a function of number of commonly connected neighbors.** Error bars indicate standard error of the mean. Average over five populations.

#### **4. DISCUSSION**

As we are able to access larger and denser subsamples of the connectome, complex network measures (Rubinov and Sporns, 2010) are becoming an increasingly important way of understanding both its structure and function. Such measures have already been applied to the complete connectome of *C. elegans* (Varshney et al., 2011). While elements of this study are highly telling, they do not provide a direct comparison to cortical slice studies, which are subsampled portions of a very different structure, even if the individual elements are similar. Currently, cortical slice studies provide some of the best information we have about the wiring structure of the cortex on a microscopic scale.

In order to understand this microstructure, it is very important to study and examine the statistics of connectivity at scales of tens to hundreds of µm; this will be vital to understanding the self-organizational and computational principles underlying the structure of the brain (Prill et al., 2005; Sporns et al., 2005; Seung, 2009). However, at the same time, extreme care must be taken, as relatively small variations in section size and sampling density can lead to significantly differing results, as this is also the scale at which naturally occurring simple clustering may occur, and at which the statistical transition from microstructure to macrostructure may take place as well.

It is thus of great importance that experimenters take this into account and, accordingly, provide all available information regarding neuron type and approximate density, sampling space distribution, slice thickness, and other parameters that might lead to sampling biases. Various studies of such microstructure have shown conflicting results. Reiterating, Song et al. (2005) and Holmgren et al. (2003) noted an excess of bidirectional connectivity in layer V and layer II/III, respectively. However, Lefort et al. (2009) noted no such excess. It is possible that this could be a result of sampling from different parts of the cortex which exhibit significantly different micro-organization, or that small differences in sectioning size and sampling procedure could lead to such differences. It is this latter concern that we would like to emphasize.

We have not reproduced the sampling procedures used in these studies exactly, but rather provided a generic sampling simulation from which we can gain some qualitative insight into real-world experimental results. Examining the aforementioned studies, we note that Song et al. (2005) used a 300µm slice (Sjöström et al., 2001) with a roughly ellipsoid sampling area with radii of approximately 100 and 50µm on the major and minor axes, respectively. Holmgren et al. (2003) also used a 300µm slice, recording in an irregular shape out to a maximal radius of nearly 300µm. Our model does not reproduce the high degree of excess bidirectional connections observed under these parameters, but it does result in an above-chance representation. Lefort et al. (2009), who noted no excess of bidirectional connections, used a 300µm slice as well, further subdividing these into 100µm sections, which would correspond to a centered recording radius of 50µm. This is a radius at which our model does not exhibit a noteworthy excess of bidirectional connectivity, which suggests an explanation for why their results appear potentially at odds with other cortical slice studies.

Our model demonstrating this concern is a simple graph model that, while it does not completely reproduce the nonrandom features noted in electrophysiological surveys, does reproduce some of them at a presumably natural scale. It is our belief that such a model provides a more reasonable, realistic, and general baseline for measuring the statistics of nonrandom cortical connectivity than a simple Erdős-Rényi graph. Certain observed complex features have been necessarily excluded to avoid an overly *ad-hoc* model. For example, our model does not reproduce the common neighbor clustering asymmetry in the in- and out-degree noted in the supplementary materials of Perin et al. (2011).

That the examined features depend so sensitively on section size in the presence of order 100µm scale clustering should be both enlightening and concerning, particularly when most sampling procedures operate around this scale. Other factors such as neuronal type and local density almost certainly play into such effects as well. The model is not exhaustive, and numerous parameters, including the exact size and form of the connection probability profile and neuronal connection densities, could be varied. The thrust of the example provided in this paper is not to provide an exhaustive catalog of scenarios, but to demonstrate how sensitive the observed nonrandom effects of clustering mechanisms are to small variations in sampling. With this brief and simple demonstration in mind, the authors encourage experimenters to include all available information about neuronal and connection density and scale, as well as the full extent of exact sampling techniques in any study of such nonrandom features so that they can be best understood in the context of a complete graph.

#### **AUTHOR CONTRIBUTIONS**

Dr. Miner performed the programming, analysis, and initial writing. Research direction was shared. Significant background expertise and guidance was provided by Dr. Triesch, as was significant input into the writing and revision process.

#### **FUNDING**

This work was supported by the Quandt Foundation and the LOEWE-Program Neuronal Coordination Research Focus Frankfurt (NeFF).

#### **ACKNOWLEDGMENT**

The authors would like to thank Christoph Hartmann for his assistance.

#### **SUPPLEMENTAL DATA**

Elements of analysis source code to be released on the web via corresponding author's webpage. http://fias.uni-frankfurt.de/neuro/triesch/members/

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 July 2014; accepted: 19 October 2014; published online: 05 November 2014.*

*Citation: Miner DC and Triesch J (2014) Slicing, sampling, and distance-dependent effects affect network measures in simulated cortical circuit structures. Front. Neuroanat. 8:125. doi: 10.3389/fnana.2014.00125*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Miner and Triesch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A workflow for the automatic segmentation of organelles in electron microscopy image stacks

#### *Alex J. Perez 1,2\*, Mojtaba Seyedhosseini 3, Thomas J. Deerinck1, Eric A. Bushong1, Satchidananda Panda4, Tolga Tasdizen3 and Mark H. Ellisman1,2,5\**

*<sup>1</sup> Center for Research in Biological Systems, National Center for Microscopy and Imaging Research, University of California, San Diego, La Jolla, CA, USA*

*<sup>2</sup> Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA*

*<sup>3</sup> Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA*

*<sup>4</sup> Regulatory Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA*

*<sup>5</sup> Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA*

#### *Edited by:*

*Julian Budd, University of Sussex, UK*

#### *Reviewed by:*

*Kevin Briggman, National Institutes of Health, USA Anna Kreshuk, University of Heidelberg, Germany Hanspeter Pfister, Harvard University, USA*

#### *\*Correspondence:*

*Alex J. Perez and Mark H. Ellisman, National Center for Microscopy and Imaging Research, Center for Research in Biological Systems, University of California, San Diego, Biomedical Sciences Building, Room 1000, 9500 Gilman Drive, Dept. Code 0608, La Jolla, CA 92093, USA e-mail: aperez@ncmir.ucsd.edu; mellisman@ucsd.edu*

Electron microscopy (EM) facilitates analysis of the form, distribution, and functional status of key organelle systems in various pathological processes, including those associated with neurodegenerative disease. Such EM data often provide important new insights into the underlying disease mechanisms. The development of more accurate and efficient methods to quantify changes in subcellular microanatomy has already proven key to understanding the pathogenesis of Parkinson's and Alzheimer's diseases, as well as glaucoma. While our ability to acquire large volumes of 3D EM data is progressing rapidly, more advanced analysis tools are needed to assist in measuring precise three-dimensional morphologies of organelles within data sets that can include hundreds to thousands of whole cells. Although new imaging instrument throughputs can exceed teravoxels of data per day, image segmentation and analysis remain significant bottlenecks to achieving quantitative descriptions of whole cell structural organellomes. Here, we present a novel method for the automatic segmentation of organelles in 3D EM image stacks. Segmentations are generated using only 2D image information, making the method suitable for anisotropic imaging techniques such as serial block-face scanning electron microscopy (SBEM). Additionally, no assumptions about 3D organelle morphology are made, ensuring the method can be easily expanded to any number of structurally and functionally diverse organelles. Following the presentation of our algorithm, we validate its performance by assessing the segmentation accuracy of different organelle targets in an example SBEM dataset and demonstrate that it can be efficiently parallelized on supercomputing resources, resulting in a dramatic reduction in runtime.

**Keywords: serial block-face scanning electron microscopy, 3D electron microscopy, electron microscopy, automatic segmentation, image processing, organelle morphology, neuroinformatics**

#### **INTRODUCTION**

Advances in instrumentation for 3D EM are fueling a renaissance in the study of quantitative neuroanatomy (Peddie and Collinson, 2014). Data obtained from techniques such as SBEM (Denk and Horstmann, 2004) provide unprecedented volumetric snapshots of the *in situ* biological organization of the mammalian brain across a multitude of scales (**Figure 1A**). When combined with breakthroughs in specimen preparation (Deerinck et al., 2010), such datasets reveal not only a complete view of the membrane topography of cells and organelles, but also the location of cytoskeletal elements, synaptic vesicles, and certain macromolecular complexes.

Harnessing the power of these emerging 3D techniques to study the structure of whole cell organellomes is of critical importance to the field of neuroscience. Abnormal organelle morphologies and distributions within cells of the nervous system are characteristic phenotypes of a growing number of neurodegenerative diseases. Aberrant mitochondrial fragmentation is believed to be an early and key event in neurodegeneration (Knott et al., 2008; Campello and Scorrano, 2010), and changes in mitochondrial structure have been observed in Alzheimer's disease (AD) neurons from human biopsies (Hirai et al., 2001; Zhu et al., 2013). Additionally, altered nuclear or nucleolar morphologies have been observed in a host of pathologies, including AD (Mann et al., 1985; Riudavets et al., 2007), torsion dystonia (Kim et al., 2010), and Lewy body dementia (Gagyi et al., 2012).

Our ability to quantify and understand the details of these subcellular components within the context of large-scale 3D EM datasets is dependent upon advances in the accuracy, throughput, and robustness of automatic segmentation routines. Although a number of studies have extracted organelle morphologies from SBEM datasets via manual segmentation (Zhuravleva et al., 2012; Herms et al., 2013; Holcomb et al., 2013; Wilke et al., 2013; Bohórquez et al., 2014), their applications are limited to only small subsets of the full stack due to the notoriously high labor cost associated with manual segmentation (**Figure 1B**).

Automatic segmentations generated based on thresholds or manipulations of the image histogram (Jaume et al., 2012; Vihinen et al., 2013) may require extensive manual editing of their results to achieve the accurate quantification of single organelle morphologies.

The development of computationally advanced methods for the automatic segmentation of organelles in 3D EM stacks has led to increasingly accurate results (Vitaladevuni et al., 2008; Narashima et al., 2009; Smith et al., 2009; Kumar et al., 2010; Seyedhosseini et al., 2013a). Recently, Giuly and co-workers proposed a method to segment mitochondria utilizing patch classification followed by isocontour pair classification and level sets (Giuly et al., 2012). Lucchi et al. (2010, 2012) developed an approach that trains a classifier to detect supervoxels that are most likely to belong to the boundary of the desired organelle. An approach to automatically segment cell nuclei, using the software package ilastik to train a random forest voxel classifier followed by morphological post-processing and object classification, has also been proposed (Sommer et al., 2011; Tek et al., 2014). Though they yield impressive results, many current approaches utilize assumptions about the 3D morphology of the organelle target. This is problematic not only because it makes their expansion to the segmentation of other organelles non-trivial, but also because the typical SBEM dataset contains a heterogeneous mixture of organelle morphologies across multiple cell types. Therefore, there is a clear need for a robust method to accurately segment various organelles in SBEM stacks without any *a priori* assumptions about organelle morphology.

In this work, we present a method for the robust and accurate automatic segmentation of morphologically and functionally diverse organelles in EM image stacks. Organelle-specific pixel classifiers are trained using the cascaded hierarchical model (CHM), a state-of-the-art, supervised, multi-resolution framework for image segmentation that utilizes only 2D image information (Seyedhosseini et al., 2013b). A series of tunable 2D filters are then applied to generate accurate segmentations from the outputs of pixel classification. In the final processing step, 3D connected components are meshed together in a manner that minimizes the deleterious effects of local and global imaging artifacts. Finally, we demonstrate that our method can be easily and efficiently scaled up to handle the segmentation of all organelles in teravoxel-sized 3D EM datasets.

#### **MATERIAL AND METHODS**

The description and validation of our method are arranged into three sections. In the first section, the workflow is described in detail. In the second, the robustness and accuracy of our method are validated by applying it to four different organelle targets (mitochondria, lysosomes, nuclei, and nucleoli) from a test SBEM dataset. In the third section, we describe experiments that demonstrate how our method can be easily scaled up to accommodate the segmentation of teravoxel-sized datasets.

#### **THE PROPOSED METHOD**

#### *Image alignment and histogram specification*

All individual images of the input SBEM stack are converted to the MRC format and appended to an 8-bit MRC stack using the IMOD programs *dm2mrc* and *newstack*, respectively (Kremer et al., 1996). Sequential images within the stack are then translationally aligned to one another in the XY-plane using the cross-correlational alignment algorithm of the IMOD program *tiltxcorr*. To ensure consistency throughout the stack, the histograms of all images are matched to that of the first image in the stack using a MATLAB (The MathWorks, Inc., Natick, MA, U.S.A.) implementation of the exact histogram specification algorithm (Coltuc et al., 2006).

#### *Generation of training images and labels*

Once an organelle target has been selected by the experimenter, the next step is to generate a set of organelle-specific training images and labels to subsequently train a CHM pixel classifier. A set of N seed points, P, is selected throughout the processed SBEM stack in locations that possess at least one instance of the desired organelle, such that:

$$P_i = (x_i, y_i, z_i) \quad \forall\, i \in \{1, \dots, N\}$$

These points should be chosen in a manner that yields a wide distribution throughout the stack. After the selection of seed points, every instance of the chosen organelle is manually segmented in a Q × R pixel tile centered at each Pi. Following manual segmentation, all tiles are extracted from the full SBEM stack using the IMOD program *boxstartend*. The extracted tiles will serve as training images, Ti. Binary training labels, Bi, are generated from each Ti by applying the corresponding manual segmentation as a mask using the IMOD program *imodmop*. Thus, the final outputs from training data generation are (1) a stack of 8-bit, grayscale training images, Ti, and (2) a stack of corresponding binary organelle masks, Bi. Both stacks are of size Q × R × N. A flow chart illustrating this process is shown in **Figure 2**.
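The geometry of tile extraction around a seed point can be sketched as follows. This is an illustration only and does not reproduce the behavior of *boxstartend*; the function name is hypothetical, and shifting a tile inward when a seed point lies near the image border is an assumption.

```python
def tile_bounds(seed, tile_size, image_size):
    """Half-open pixel bounds (x0, x1, y0, y1, z) of a Q x R tile around a seed.

    seed is (x, y, z), tile_size is (Q, R), image_size is (width, height).
    ASSUMPTION: tiles that would cross the border are shifted inward so
    that every tile keeps the full Q x R size.
    """
    x, y, z = seed
    q, r = tile_size
    width, height = image_size
    x0 = min(max(x - q // 2, 0), width - q)
    y0 = min(max(y - r // 2, 0), height - r)
    return x0, x0 + q, y0, y0 + r, z
```

For example, with hypothetical 500 × 500 pixel tiles in a 4000 × 4000 pixel slice, a seed at (1000, 800) on slice 5 yields the bounds (750, 1250, 550, 1050, 5).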

#### *Training organelle pixel classifiers with the cascaded hierarchical model*

The CHM consists of bottom-up and top-down steps cascaded in multiple stages (Seyedhosseini et al., 2013b). The bottom-up step occurs in a user-specified number of hierarchical levels, L. At each level, the input stacks Ti and Bi are sequentially downsampled and a classifier is trained based on features extracted from the downsampled data as well as information from all lower levels of the hierarchy. After classifiers have been trained at all levels, the top-down path combines the coarse contextual information from higher levels into a single classifier that is applicable to images at native resolution. This whole process is then cascaded in a number of stages, S, where the output classifier from the previous stage serves as the input classifier for the subsequent stage. The final output is a pixel classifier, C<sub>S,L</sub>, that is applicable to images at the native pixel size of Ti and Bi. For optimal results, the number of stages chosen should be greater than one. The exact number of stages and levels chosen depends on a host of factors, including the size of Ti and Bi and the computational resources available to the experimenter.

#### *Probability map generation*

The generation of a set of training data for mitochondrial automatic segmentation is shown here. First, a set of seed points, Pi, is selected such that a wide distribution throughout the volume is achieved (bottom left). Tiles of size Q × R centered at each seed point are extracted to serve as training images, Ti. All instances of the desired organelle target are manually

manual segmentations are then used as masks to binarize each Ti such that pixels of value one correspond to pixels of Ti that are positive for the desired organelle. This process is repeated N times to yield stacks of training images and their corresponding training labels, Bi. These stacks are then used to train a CHM classifier, CS,L, with the desired number of stages, S, and levels, L.

#### **Algorithm 1 | Organelle segmentation using tiled input images.**

In the next step, a stack of test images, Ij, is selected for pixel classification. Depending on the goals of the experiment, these images may be full slices of the SBEM volume or extracted subvolumes. Prior to pixel classification, each Ij is split into an *m* × *n* array of tiles such that the dimensions of each tile are roughly equivalent to the lateral dimensions of the training stacks, Q × R (step 3 of **Algorithm 1**). Tiling is performed with an overlap of U pixels between adjacent tiles. The choice of U is dependent on the size of the training stacks as well as the organelle target; in general, ideal values of U should fall in the range of 2–10% of Q and R. The previously generated CHM pixel classifier, CS,L, is then applied to each tile, yielding *m* × *n* probability map tiles (step 5 of **Algorithm 1**). All processed tiles are then stitched together to yield a final probability map, Mj (step 7 of **Algorithm 1**). When stitching, the pixels in Mj that correspond to regions of overlap between adjacent tiles are set to the maximum intensity pixel from all contributing tiles. Finally, Mj is normalized such that each pixel value lies in the range [0, 1], with one representing the highest probability (step 8 of **Algorithm 1**). This process is then repeated over each Ij to yield the final stack of probability maps.
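The tiling, maximum-intensity stitching, and normalization steps described in this section can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the way the overlap U is split between neighboring tiles, and the use of min-max scaling for the normalization step are all assumptions, and `classifier` is a hypothetical stand-in for the trained CHM pixel classifier.

```python
def tile_bounds(length, n_tiles, overlap):
    """Split [0, length) into n_tiles intervals; neighbours share `overlap` pixels."""
    # Nominal, non-overlapping tile edges.
    edges = [round(k * length / n_tiles) for k in range(n_tiles + 1)]
    half = overlap // 2
    bounds = []
    for k in range(n_tiles):
        # Extend each tile past its nominal edges, clipped to the image.
        start = max(0, edges[k] - half)
        stop = min(length, edges[k + 1] + (overlap - half))
        bounds.append((start, stop))
    return bounds


def stitch_max(tiles, row_bounds, col_bounds, height, width):
    """Stitch probability tiles, keeping the maximum value where tiles overlap."""
    # Starting from 0.0 assumes that probabilities are non-negative.
    stitched = [[0.0] * width for _ in range(height)]
    for r, (r0, r1) in enumerate(row_bounds):
        for c, (c0, c1) in enumerate(col_bounds):
            tile = tiles[r][c]
            for y in range(r0, r1):
                for x in range(c0, c1):
                    value = tile[y - r0][x - c0]
                    if value > stitched[y][x]:
                        stitched[y][x] = value
    return stitched


def normalize(prob_map):
    """Rescale a map to [0, 1] (min-max scaling is an assumption of this sketch)."""
    lo = min(min(row) for row in prob_map)
    hi = max(max(row) for row in prob_map)
    if hi == lo:
        return [[0.0 for _ in row] for row in prob_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in prob_map]


def classify_and_stitch(image, m, n, overlap, classifier):
    """Tile `image` into an m x n array, classify each tile, stitch, normalize."""
    height, width = len(image), len(image[0])
    row_bounds = tile_bounds(height, m, overlap)
    col_bounds = tile_bounds(width, n, overlap)
    tiles = [
        [classifier([row[c0:c1] for row in image[r0:r1]]) for (c0, c1) in col_bounds]
        for (r0, r1) in row_bounds
    ]
    return normalize(stitch_max(tiles, row_bounds, col_bounds, height, width))
```

Because each tile is classified independently, the calls to `classifier` in this sketch could be distributed across separate processes without changing the stitched result.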

#### *Binarization of probability maps*

Each probability map, Mj, is binarized by evolving active contours (Chan and Vese, 2001) at automatically determined initial positions. For an unsupervised determination of the initial positions, the probability map Mj is first thresholded using Otsu's multilevel method (Otsu, 1979) with G unique gray levels (step 9 of **Algorithm 1**). The output from this operation is Oj, a map in which each pixel of Mj has been classified into one of G unique levels, with the zeroth level corresponding to the approximate background. This map is then binarized by thresholding Oj at a pixel intensity of G, yielding a mask of initial positions, Kj (step 10 of **Algorithm 1**). This binary mask is then made smaller by applying two iterations of morphological shrinking (step 11 of **Algorithm 1**) and used to initialize the evolution of active contours with a number of iterations and smoothing factor specified by α and λ, respectively (step 12 of **Algorithm 1**). Each 2D connected component of Kj serves as a unique initial position for contour evolution. For best results, α should be at least 50. The choice of λ depends largely on the organelle target and pixel size of the test images, but in general should fall in the range of 0–8. Larger values of λ can be used when the pixel size is small. If the pixel size is too large (i.e., above 10 nm/pixel), smoothing should be turned off by setting λ to zero. The value of G significantly alters the results, and its choice is dependent on the goals of the experimenter. Low values of G tend to emphasize true positives at the risk of retaining false positives. As G is increased, false positives are more readily removed, but so are true positives. The final output from this process is SEGj, the organelle segmentation of the input grayscale image, Ij. An illustration of this process is shown for two test images in **Figure 3**.
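The construction of the initial-position mask Kj from the level map Oj can be illustrated with a small pure-Python sketch. It assumes that Oj has already been computed and stores integer levels 0 to G − 1, that the mask keeps only the highest level (as in the caption of Figure 3), and that a 4-neighbor binary erosion is an acceptable stand-in for the morphological shrinking operator, which the text does not specify. All function names are hypothetical, and the active contour evolution itself is not shown.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def highest_level_mask(levels, n_levels):
    """Keep only pixels assigned to the highest of the n_levels Otsu levels."""
    top = n_levels - 1
    return [[1 if v == top else 0 for v in row] for row in levels]


def erode(mask):
    """One pass of 4-neighbour binary erosion; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy, dx in NEIGHBOURS
            ):
                out[y][x] = 1
    return out


def initial_positions(levels, n_levels, n_shrink=2):
    """Mask of initial positions: highest level, then n_shrink erosion passes."""
    mask = highest_level_mask(levels, n_levels)
    for _ in range(n_shrink):
        mask = erode(mask)
    return mask


def count_components(mask):
    """Count 4-connected components; each would seed one contour evolution."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:  # iterative flood fill
                    cy, cx = stack.pop()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count
```

In this sketch, small high-probability blobs disappear during the erosion passes and therefore never seed a contour, which illustrates one way in which the shrinking step could suppress small false positives.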

#### *Meshing*

Each output SEGj is converted to the MRC format and appended to an MRC stack. Contours are drawn around each 2D connected component using the IMOD program *imodauto*. The output contours are then three-dimensionally meshed together using the program *imodmesh*, and separate 3D connected components are sorted into different objects using the program *imodsortsurf*. Meshing is performed using the low resolution option to reduce the effect of translational artifacts between subsequent image slices.

#### **EXPERIMENTAL VALIDATION**

#### *Tissue processing, image acquisition, and preprocessing*

The suprachiasmatic nucleus (SCN) of one 3-month-old, male C57BL/6J mouse was harvested and prepared for SBEM using a standard protocol (Wilke et al., 2013). The resin-embedded tissue was mounted on an aluminum specimen pin and prepared for SBEM imaging as previously described (Holcomb et al., 2013). Imaging was performed by detection of backscattered electrons (BSE) using a Zeiss Merlin scanning electron microscope equipped with a 3View ultramicrotome (Gatan). The SBEM image stack was acquired in ultrahigh vacuum mode using an accelerating voltage of 1.9 kV, a pixel dwell time of 500 ns, and a spot size of 1.0. Sectioning was performed with a cutting thickness of 30 nm. BSE images were acquired at 800x magnification with a raster size of 32,000 pixels × 24,000 pixels, yielding a pixel size of 3.899 nm/pixel. A total of 1283 serial images were acquired, resulting in an image stack with tissue dimensions of roughly 124.8 × 93.6 × 38.5 μm (∼450,000 μm<sup>3</sup>). The specimen was then removed from the chamber, and an image of a diffraction grating replica specimen (Ted Pella, Redding, CA, U.S.A.) was acquired for calibration of the lateral pixel size. Low magnification images of the block-face were acquired before and after sectioning. Image alignment was performed as described in Section Image Alignment and Histogram Specification. Following alignment, the stack was downsampled in the XY-plane by a factor of two, yielding a final stack with pixel dimensions of 16,000 × 12,000 × 1283 and pixel sizes of 7.799 nm/pixel and 30 nm/pixel in the lateral and axial dimensions, respectively. Since preliminary results did not demonstrate noticeable differences in the output of our method between the native resolution stack and the downsampled stack, downsampling was performed to reduce processing time. Exact histogram specification was performed as previously described.

All image alignment and pre-processing steps were performed on a custom workstation (Advanced HPC, San Diego, CA, U.S.A.) with the following configuration: Xeon X5690 3.47 GHz CPU, 48 GB RAM, 32 TB HDD, NVIDIA Quadro FX 3800, CentOS release 6.2.
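As a quick consistency check, the reported tissue dimensions follow directly from the raster size, the lateral pixel size, and the cutting thickness. The arithmetic below uses only the values quoted in this section:

$$32{,}000 \times 3.899\ \text{nm} \approx 124.8\ \mu\text{m}, \qquad 24{,}000 \times 3.899\ \text{nm} \approx 93.6\ \mu\text{m}, \qquad 1283 \times 30\ \text{nm} \approx 38.5\ \mu\text{m}$$

$$124.8\ \mu\text{m} \times 93.6\ \mu\text{m} \times 38.5\ \mu\text{m} \approx 4.5 \times 10^{5}\ \mu\text{m}^{3}$$

Halving the lateral sampling gives 16,000 × 12,000 pixels per slice at roughly 2 × 3.899 ≈ 7.80 nm/pixel, which agrees with the reported 7.799 nm/pixel to within rounding of the calibrated pixel size.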

#### *Automatic segmentation*

The four types of organelles targeted for automatic segmentation were mitochondria, lysosomes, nuclei, and nucleoli. These targets were chosen because they are morphologically and texturally diverse, and thus pose a significant test of the robustness of our method.

**FIGURE 3 | The binarization of probability maps using active contours initialized by a multi-level Otsu threshold yields accurate segmentation results.** Colorized maps, M, of a nucleus **(A)** and lysosomes **(D)** generated by applying Otsu's method with multiple levels to probability maps obtained by CHM pixel classification. Each color corresponds to a unique level of the threshold. Six gray levels (*G* = 6) were used for the nucleus and four (*G* = 4) were used for the lysosomes. Initial positions **(B,E)** were determined by selecting pixels corresponding to only the highest levels of each threshold followed by two iterations of morphological shrinking. Output segmentations **(C,F)** were obtained by evolving active contours about each of the initial positions in **(B,E)** with 100 iterations and a smoothing factor of 8 (α = 100, λ = 8). In the case of the lysosome images, note that a myelinated axon that was originally detected by the classifier as a false positive (**D**, arrow) has been removed from the final segmentation by the application of our method (**F**, arrow).

For each organelle target, 90 seed points were placed throughout the SBEM stack as described in Section Generation of Training Images and Labels. Training data and labels were created using the values shown in **Table 1**. Of the 90 tiles generated for each organelle, 50 were randomly selected for use in training a CHM classifier; the other 40 were set aside to use as test data for validation. Organelle-specific CHM classifiers were trained using the values shown in **Table 1**. The performances of all classifiers were evaluated by preparing receiver operating characteristic (ROC) curves (Fawcett, 2006). Each classifier was then used to generate probability maps of the 40 test images corresponding to its organelle. Segmentation was performed as described in Section Binarization of Probability Maps using the values shown in **Table 1**. All training, pixel classification, and segmentation steps were performed on the National Biomedical Computation Resource (NBCR) cluster, rocce.ucsd.edu (http://rocce-mgr.ucsd.edu/).

#### *Validation of the active contour segmentation of CHM probability maps*

Evaluation metrics were computed for each set of organelle-specific test images by comparing their segmentations with manually segmented ground truth. For each stack, the confusion matrix consisting of the number of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) pixels was computed and used to calculate the true positive rate (TPR), false positive rate (FPR), precision, accuracy, and *F*-value, such that:

$$\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$

$$\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}$$

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FN} + \text{FP} + \text{TN}}$$

$$\text{F-value} = \frac{2 \times \text{Precision} \times \text{TPR}}{\text{Precision} + \text{TPR}}$$
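The five metrics defined above can be computed from a pair of binary masks as in the following sketch. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the masks are assumed to be flattened sequences of 0/1 pixel labels.

```python
def confusion_counts(segmentation, ground_truth):
    """Count TP, FP, TN, FN over two flat sequences of 0/1 pixel labels."""
    tp = fp = tn = fn = 0
    for s, g in zip(segmentation, ground_truth):
        if s and g:
            tp += 1
        elif s and not g:
            fp += 1
        elif not s and g:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Returning 0.0 for an empty denominator is a convention chosen for this sketch.
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(tp, fp, tn, fn):
    """TPR, FPR, precision, accuracy, and F-value, following the equations above."""
    tpr = _ratio(tp, tp + fn)
    fpr = _ratio(fp, fp + tn)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    f_value = _ratio(2 * precision * tpr, precision + tpr)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "precision": precision,
        "accuracy": accuracy,
        "F": f_value,
    }
```

For a whole stack, the four counts would be accumulated over all slices before the ratios are taken, in line with the per-stack confusion matrix described in the text.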

This analysis was then repeated with segmentations generated from the same probability maps, but with a number of different unsupervised binarization algorithms: (1) Minimum error thresholding (Kittler and Illingworth, 1986), (2) Maximum entropy thresholding (Kapur et al., 1985), and (3) Otsu's single-level method (Otsu, 1979). The performance of each algorithm, as quantified with the above metrics, was compared against that of our own method for each organelle target.

Since ground truth was available, the pixel intensity threshold that maximized the *F*-value of each probability map with respect to its corresponding ground truth was determined by computing the *F*-value at incrementally increasing thresholds over the range [0, 1] and taking the maximum value.
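A threshold sweep of this kind can be sketched as follows. The step count, the use of `>=` when thresholding, and the function names are assumptions, since the text does not state the increment that was used; the inputs are flattened sequences of probabilities and 0/1 ground-truth labels.

```python
def f_value_at(probabilities, ground_truth, threshold):
    """F-value of the binarization `p >= threshold` against 0/1 ground truth."""
    tp = fp = fn = 0
    for p, g in zip(probabilities, ground_truth):
        predicted = p >= threshold
        if predicted and g:
            tp += 1
        elif predicted and not g:
            fp += 1
        elif g:
            fn += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probabilities, ground_truth, steps=100):
    """Sweep thresholds 0, 1/steps, ..., 1 and return (threshold, F) at the maximum."""
    best_t, best_f = 0.0, -1.0
    for k in range(steps + 1):
        t = k / steps
        f = f_value_at(probabilities, ground_truth, t)
        if f > best_f:  # strict comparison keeps the lowest maximizing threshold
            best_t, best_f = t, f
    return best_t, best_f
```

Running `best_threshold` once per test image yields one optimal threshold per image; the spread of those values across images is the quantity of interest here.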



#### **SCALE-UP TO TERAVOXEL-SIZED DATASETS**

#### *Determination of optimal downsampling levels for different organelles*

Since the segmentation of entire SBEM datasets is computationally intensive, we first determined the degree to which input images could be downsampled before segmentation results were adversely affected. Downsampled versions of each set of training images, training labels, and test images were prepared for all four organelle targets. Downsampling was performed by factors of two, three, four, and five, yielding pixel sizes of roughly 15.59, 23.39, 31.19, and 38.99 nm/pixel, respectively. CHM classifiers with two stages and two levels were trained for each set of downsampled, organelle-specific training images and labels. Probability maps were computed with *m* = 2, *n* = 2, and *U* = 20. Segmentations were generated using the active contour method with *G* = 2, α = 100, and λ = 0. For each set of output segmentations, evaluation metrics were computed as described in Section Validation of the Active Contour Segmentation of CHM Probability Maps.

#### *Segmentation of organelles from a full SBEM stack*

The entire test dataset was laterally downsampled by a factor of eight, yielding a final stack with dimensions of 4000 × 3000 × 1283 pixels. The corresponding CHM classifiers generated in Section Determination of Optimal Downsampling Levels for Different Organelles were applied to produce stacks of probability maps at this pixel size for nuclei, nucleoli, and mitochondria. Processing was performed using an 8 × 6 tile array with an overlap of 20 pixels between adjacent tiles. Tiling, pixel classification, stitching, and binarization were performed using one CPU for each input image. One hundred total CPUs were used, such that 100 images were processed in parallel to expedite processing. All steps were performed on the National Biomedical Computation Resource (NBCR) cluster, rocce.ucsd.edu. Following probability map generation, all images were appended to organelle-specific MRC stacks, and contours and surface renderings were generated as described in Section Meshing.

#### **COMPARISON TO A PREVIOUSLY PUBLISHED ALGORITHM**

The results of our approach to nuclear automatic segmentation were validated by comparison with the results obtained by the algorithm of Tek et al. (2014). The full dataset was first downsampled to isotropic voxel dimensions (30 × 30 × 30 nm), resulting in a stack of size 4029 × 3120 × 1283 voxels. Training data and images consisted of a 500 × 500 × 50 subvolume of the downsampled stack containing two adjacent nuclei. Ground truth data were generated by manual segmentation of all neuronal, glial, and endothelial cell nuclei across fifty consecutive slices from the center of the dataset. A CHM pixel classifier with two stages and two levels was trained and applied to all images in the stack. Similarly, an ilastik voxel classifier was trained using all possible features with the same training images serving as input (Sommer et al., 2011). This classifier was subsequently applied to all images in the downsampled stack. CHM probability maps were binarized using the proposed method. The ilastik probability maps were binarized by thresholding at the level *p* = 0.5, followed by the application of the object detection algorithm of Tek and colleagues with Vth1 and Vth2 set to 25 and 10,000, respectively (Tek et al., 2014).

The source code for CHM and all related scripts are available to download from http://www.sci.utah.edu/software/chm.html. The training images, training labels, and test images used in this study have also been made available to download at this URL.

#### **RESULTS**

ROC curves for each organelle-specific CHM classifier are shown in **Figure 4**. In comparison to those for the other organelle classifiers, the ROC curve for the lysosomal classifier (**Figure 4B**) demonstrates a sparseness of data points with a low FPR. This is due to the extreme electron density of the lysosomal compartment and the number of other features in EM images that closely approximate it. Myelin sheaths (**Figure 3D**), plasma membranes, and other organelles cut *en face* can resemble the lysosomal compartment in both pixel intensity and texture and are frequently detected as false positives. Therefore, intelligent post-processing routines that utilize size and morphology are needed to separate lysosomes from such false positives.

applied to a 500 × 500 pixel test image **(A)**, generating the probability map shown in **(B)**. Note that regions of pixels corresponding to the Golgi apparatus (yellow arrows) were detected in the probability map. The Golgi apparatus can often confuse mitochondrial pixel classifiers because it has a texture very similar to that of the mitochondrial matrix. The results of binarization of the probability map using maximum entropy **(C)** and Otsu's single-level method **(D)** are shown. Using these techniques, regions of the Golgi are permitted into the final segmentation as false positives. The resultant segmentation obtained by our method of binarization with active contours (*G* = 2, α = 100, λ = 8) is shown in **(E)**. Instances of the Golgi apparatus were automatically removed during processing. This segmentation (*F* = 0.863, accuracy = 0.985) is a highly faithful representation of the ground truth **(F)**.

**diverse organelle targets.** The application of our method to different organelle targets yields consistent results without the need to significantly change the input parameters. Shown here are test images, each of size 500 × 500 pixels, and their corresponding probability maps, segmentations,

transparent overlay of the segmentation onto the test image. The evaluation metrics for each test image are as follows: Mitochondria, *F* = 0.844, accuracy = 0.984; lysosomes, *F* = 0.872, accuracy = 0.997; nuclei, *F* = 0.971, accuracy = 0.971; nucleoli, *F* = 0.91, accuracy = 0.977.

A comparison of our proposed active contour binarization method to the other methods tested is shown in **Figure 5** using mitochondria as an example. Since the Golgi apparatus can sometimes display a texture similar to that of the mitochondrial matrix, the presence of this organelle can confuse the mitochondrial classifier (**Figures 5A,B**, arrows). Segmentations generated with the maximum entropy algorithm (**Figure 5C**, recall = 0.992, precision = 0.498, *F* = 0.670, accuracy = 0.948) and Otsu's single-level method (**Figure 5D**, recall = 0.958, precision = 0.687, *F* = 0.812, accuracy = 0.977) retain elements of the Golgi apparatus as false positives. However, probability map binarization using the proposed active contour method eliminates these false positives (**Figure 5E**, recall = 0.908, precision = 0.804, *F* = 0.863, accuracy = 0.985) when compared to the ground truth (**Figure 5F**). Output probability maps and active contour segmentations from example test images of each organelle are shown in comparison to their corresponding ground truth in **Figure 6**.

The segmentation evaluation metrics for each full stack of 40 organelle-specific test images are shown in **Table 2**. The proposed active contour segmentation method resulted in a superior recall for all four organelles and a superior *F*-value for mitochondria, lysosomes, and nucleoli when compared to the other segmentation methods. The *F*-value for nuclear segmentation is negligibly better using Otsu's single-level method. The lack of distinction between these two binarization methods for nuclei is due largely to the already high quality of nuclear probability maps. The accuracy values obtained for each stack using active contour segmentation were 0.985, 0.997, 0.972, and 0.979 for mitochondria, lysosomes, nuclei, and nucleoli, respectively.

A histogram of the probability map pixel intensity thresholds that maximize the *F*-value for each test image is shown in **Figure 7**. The wide spread of optimal threshold values for each organelle demonstrates the importance of using an unsupervised algorithm for probability map binarization, such as the one proposed here. Simply setting a single fixed pixel intensity threshold for all probability maps would yield poor segmentations for a number of test images. This is especially true in very large SBEM images, where alterations in staining or focus may occur differentially throughout regions of the image stack.

The results of our downsampling experiment are shown in **Figure 8**. The resultant *F*-value for segmentation of nuclei and nucleoli remains remarkably consistent across the whole range of pixel sizes tested. The *F*-values for mitochondria and lysosomes exhibit substantial reductions at pixel sizes greater than ∼15 nm/pixel, corresponding to an overall downsampling of the original SBEM stack by a factor of four. The persistence of a high *F*-value across all scales tested for nuclei and nucleoli is likely due to their larger size and more regular texture in comparison to the other organelles. This is especially true for mitochondria, whose cristae architectures may differ dramatically from region to region.

**Table 2 | Segmentation evaluation metrics for the tested organelle targets using various methods of probability map binarization.**

The wall clock time and random access memory (RAM) required for CHM classifier training and pixel classification for each organelle at each level of downsampling are shown in **Table 3**. The time and RAM required for probability map binarization are not shown because they are negligible with respect to training and classification. These results indicate that, in cases where segmentation accuracy is not dramatically affected, a vast amount of time and computational resources can be saved by downsampling the input image stacks. Simple extrapolation of pixel classification times shows that the time required by a single CPU to apply a nuclear pixel classifier to our full test dataset would be reduced from ∼5.9 to ∼0.4 years when the input data are downsampled by a factor of 10.

**FIGURE 8 | Input images can be downsampled to various degrees before the segmentation results are negatively affected.** Each organelle-specific stack was downsampled by factors of two, four, six, eight, and ten. Separate classifiers were trained at each different pixel size and segmentations were generated for each stack using our method. Here, the *F*-value of each resultant stack is compared across the different pixel sizes obtained after downsampling. The *F*-value of nuclei (blue) and nucleoli (magenta) is remarkably independent of the level of downsampling across all levels tested. The *F*-values for mitochondria (red) and lysosomes (green) significantly decline as the level of downsampling is increased.

*The dimensions of the stack of training images and labels used to train the classifier are given. The values for pixel classification correspond to the average values required to generate a probability map for one tile of roughly 60 μm<sup>2</sup> at the tissue level (1000 × 1000 pixels at 2x downsampling). Values are reported as the mean and standard deviation (N = 40 for each). Time is reported as the wall clock time for the indicated process.*

These time and memory requirements were dramatically reduced by implementing tiling and processing over multiple CPUs. During segmentation of the full, downsampled dataset, the average processing time per 500 × 500 tile was 3.28 ± 0.39 min (average and standard deviation, *N* = 600), with no significant difference in average time between organelles. By utilizing parallel processing with 100 CPUs, probability maps for the entire stack were generated in roughly 33 h. An example full slice and its corresponding nuclear probability map are shown in **Figures 9A,C**. **Figures 9B,D** depict additional probability maps of mitochondria and nucleoli, respectively. The full slice probability maps of these other organelles were computed in a manner similar to that of the nuclei.
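The reported wall clock time for the full stack can be checked with a back-of-the-envelope estimate. The sketch below assumes an 8 × 6 tile array per image, 1283 images, the mean per-tile time of 3.28 min, perfect load balancing across 100 CPUs, and negligible stitching and binarization overhead; the function name is hypothetical.

```python
def parallel_wall_clock_hours(tiles_per_image, n_images, minutes_per_tile, n_cpus):
    """Idealized wall clock time, in hours, for classifying every tile in a stack."""
    total_cpu_minutes = tiles_per_image * n_images * minutes_per_tile
    return total_cpu_minutes / n_cpus / 60.0


# 48 tiles per image, 1283 images, 3.28 min per tile, 100 CPUs.
hours = parallel_wall_clock_hours(8 * 6, 1283, 3.28, 100)
print(f"Estimated wall clock time: {hours:.1f} h")
```

Under these assumptions the estimate comes to about 33.7 h for one organelle, which is in line with the roughly 33 h reported.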

When applied to the segmentation of nuclei from the full SCN dataset following downsampling to isotropic voxel dimensions, the proposed method achieved a precision, recall, and *F*-value of 0.976, 0.977, and 0.977, respectively. Similarly, the method of Tek et al. (2014) achieved a precision, recall, and *F*-value of 0.976, 0.542, and 0.697, respectively, when applied to the same dataset using the same training data. Due to an already high precision and low number of false positives, the final object classification step performed by Tek and coworkers was omitted. Evaluation metrics were computed using fifty consecutive manually annotated slices as ground truth.

A surface rendering of a full SCN neuron containing renderings of its nucleus, nucleolus, and mitochondria is shown in **Figure 10**. The plasma membrane of the neuron was manually segmented by a trained neuroanatomist. The surface renderings of all organelles were automatically generated, with minor manual corrections applied.

#### **DISCUSSION**

As recently as a few years ago, the notion of reconstructing and morphologically characterizing the organelle networks of even a few whole cells was considered a monumental challenge (Noske et al., 2008). The advent and widespread adoption of high throughput, volumetric EM techniques has threatened to change that notion, with the caveat that our ability to segment and analyze data must first catch up with our ability to collect it. With that goal in mind, this study aimed to develop a method for the accurate automatic segmentation of organelles in EM image stacks that: (1) could be easily adapted to any organelle of interest, and (2) could be applied to teravoxel-sized datasets in a computationally efficient manner.

Since it does not make any large-scale, *a priori* assumptions about the morphology of the segmentation target, the proposed method can be applied to segment diverse organelles with ease. The only geometrical properties assumed throughout the method are boundary smoothness and a cross-sectional area that is sufficient to prevent the removal of true positives following binary shrinking. Both of these assumptions are valid for virtually all organelles under practical imaging conditions. CHM classifiers can be trained for any dataset or organelle target if given the proper training data, and the output segmentations from our method can be tuned to the demands of unique experiments. For example, decreasing the number of gray levels, G, used in the multi-level Otsu thresholding step will emphasize true positives at the expense of including false positives, which can often be excluded by post-processing filters. Additionally, it is easier to remove false positives by manual correction or crowd-sourcing (Giuly et al., 2013) than it is to add missing true positives.

The proposed method performed favorably when compared to a recently published algorithm for the automatic segmentation of cell nuclei (Tek et al., 2014). It is interesting to note that the performance of our method was very similar when trained using either images from consecutive slices of the same nuclei (precision = 0.976, recall = 0.977) or single slice images from a variety of nuclei (precision = 0.973, recall = 0.968). This similarity demonstrates the robustness of the CHM pixel classifier for this task. It is likely that the segmentation results obtained by applying the method of Tek and colleagues to the SCN dataset could be strengthened by training an ilastik voxel classifier against a greater diversity of nuclei.

Another advantage of the proposed method lies in its scalability to full datasets. The generation of probability maps from small tiles of the input image minimizes the required RAM. Additionally, it allows for computation to be easily expedited by parallelizing the processing of individual tiles across multiple CPUs. Our demonstration that accurate results for certain organelles can be achieved on downsampled stacks also helps expedite processing. One can envision an experiment in which a teravoxel-sized SBEM stack collected at high resolution for axon tracking can then be downsampled and have its nuclei or mitochondria automatically segmented at a fraction of the computational cost that would have been required at its native resolution. As innovative methods to rapidly acquire even larger datasets continue to be developed (Mohammadi-Gheidari and Kruit, 2011; Helmstaedter et al., 2013; Marx, 2013), this reduction in computational cost will prove critical.

Although it is beyond the scope of this paper, a number of 3D post-processing steps that would lead to further improvements in the results of automatic segmentation can be proposed. A simple size exclusion filter could be applied to 3D connected components to remove false positives that do not fall within the possible size range for the given organelle. A scan over every segmented slice of each 3D component could be performed to look for aberrant spikes or troughs in 2D metrics such as perimeter or area. The locations of these spikes and troughs would indicate slices on which a poor segmentation occurred, and these slices could be correspondingly removed and replaced by interslice interpolations. The application of such processes to the output from our method will be the subject of future development.
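The two post-processing ideas outlined above could be prototyped along the following lines. This is a hypothetical sketch rather than part of the published workflow: the function names, the local-median criterion, and the default window and tolerance values are all assumptions, and the replacement of flagged slices by interslice interpolation is not shown.

```python
from statistics import median


def size_filter(component_volumes, min_volume, max_volume):
    """Indices of 3D components whose volume lies within the plausible size range."""
    return [
        i for i, volume in enumerate(component_volumes)
        if min_volume <= volume <= max_volume
    ]


def find_aberrant_slices(areas, window=2, tolerance=0.5):
    """Flag slices whose 2D area deviates strongly from the local median.

    `areas` holds the cross-sectional area of one 3D component on each slice.
    A slice is flagged when its area differs from the median of its neighbours
    (up to `window` slices on either side) by more than `tolerance`, expressed
    as a fraction of that median.
    """
    flagged = []
    for k, area in enumerate(areas):
        lo = max(0, k - window)
        hi = min(len(areas), k + window + 1)
        neighbours = areas[lo:k] + areas[k + 1:hi]
        if not neighbours:
            continue
        reference = median(neighbours)
        if reference > 0 and abs(area - reference) / reference > tolerance:
            flagged.append(k)
    return flagged
```

The median is used as the reference because a single spike or trough among the neighbours barely shifts it. Slices near the ends of a component have fewer neighbours, so they are more prone to spurious flags and may need separate handling.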

In conclusion, this paper introduces novel methods for the automatic segmentation of organelles from EM image stacks that are both robust and able to handle datasets of any size. These tools fill a critical need by allowing for the quantitative analysis of volumetric EM datasets at a scale between that of current connectomics approaches (Briggman and Denk, 2006; Anderson et al., 2011; Bock et al., 2011; Briggman et al., 2011; Kleinfeld et al., 2011; Varshney et al., 2011; Helmstaedter et al., 2013; Kim et al., 2014) and that afforded by genetically encoded markers for small molecule localization (Shu et al., 2011; Martell et al., 2012; Boassa et al., 2013).

#### **AUTHORS AND CONTRIBUTORS**

Alex J. Perez, Mojtaba Seyedhosseini, Tolga Tasdizen, Satchidananda Panda, and Mark H. Ellisman designed research. Alex J. Perez, Mojtaba Seyedhosseini, Thomas J. Deerinck, and Eric A. Bushong performed research. Alex J. Perez and Mojtaba Seyedhosseini analyzed data. Alex J. Perez wrote the paper.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Christopher Churas for his assistance with CHM and Anna Kreshuk and Stuart Berg for their assistance with ilastik. This work was supported by grants from the following entities: the National Institute of General Medical Science (NIGMS) under award P41 GM103412 to Mark H. Ellisman, the National Institute of Neurological Disorders and Stroke under award number 1R01NS075314 to Mark H. Ellisman and Tolga Tasdizen, the National Biomedical Computation Resource (NBCR) with support from NIGMS under award P41 GM103426, the National Institutes of Health (NIH) under award RO1 EY016807 to Satchidananda Panda, and Fellowship support (Alex J. Perez) from the National Institute on Drug Abuse under award 5T32DA007315-11.

#### **REFERENCES**


genetically encoded probes for correlated light and electron microscopy: implications for Parkinson's disease pathogenesis. *J. Neurosci*. 33, 2605–2615. doi: 10.1523/JNEUROSCI.2898-12.2013


Zhuravleva, E., Gut, H., Hynx, D., Marcellin, D., Bleck, C. K. E., Genoud, C., et al. (2012). Acyl coenzyme A thioesterase Them5/Acot15 is involved in cardiolipin remodeling and fatty liver development. *Mol. Cell. Biol.* 32, 2685–2697. doi: 10.1128/MCB.00312-12

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 July 2014; accepted: 19 October 2014; published online: 07 November 2014.*

*Citation: Perez AJ, Seyedhosseini M, Deerinck TJ, Bushong EA, Panda S, Tasdizen T and Ellisman MH (2014) A workflow for the automatic segmentation of organelles in electron microscopy image stacks. Front. Neuroanat. 8:126. doi: 10.3389/fnana.2014.00126*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Perez, Seyedhosseini, Deerinck, Bushong, Panda, Tasdizen and Ellisman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Three-dimensional distribution of cortical synapses: a replicated point pattern-based analysis

#### *Laura Anton-Sanchez<sup>1</sup>\*, Concha Bielza<sup>1</sup>, Angel Merchán-Pérez<sup>2,3</sup>, José-Rodrigo Rodríguez<sup>2,4</sup>, Javier DeFelipe<sup>2,4</sup> and Pedro Larrañaga<sup>1</sup>*

*<sup>1</sup> Departamento de Inteligencia Artificial, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain*

*<sup>2</sup> Laboratorio Cajal de Circuitos Corticales, Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain*

*<sup>3</sup> Departamento de Arquitectura y Tecnología de Sistemas Informáticos, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain*

*<sup>4</sup> Instituto Cajal, Consejo Superior de Investigaciones Científicas, Madrid, Spain*

#### *Edited by:*

*Julian Budd, University of Sussex, UK*

#### *Reviewed by:*

*Suzana Herculano-Houzel, Universidade Federal do Rio de Janeiro, Brazil Rolf Turner, University of Auckland, New Zealand*

#### *\*Correspondence:*

*Laura Anton-Sanchez, Departamento de Inteligencia Artificial, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, Boadilla del Monte, 28660 Madrid, Spain e-mail: l.anton-sanchez@upm.es*

The biggest problem when analyzing the brain is that its synaptic connections are extremely complex. Generally, the billions of neurons making up the brain exchange information through two types of highly specialized structures: chemical synapses (the vast majority) and so-called gap junctions (a substrate of one class of electrical synapse). Here we are interested in exploring the three-dimensional spatial distribution of chemical synapses in the cerebral cortex. Recent research has shown that the three-dimensional spatial distribution of synapses in layer III of the neocortex can be modeled by a random sequential adsorption (RSA) point process, i.e., synapses are distributed in space almost randomly, with the only constraint that they cannot overlap. In this study we hypothesize that RSA processes can also explain the distribution of synapses in all cortical layers. We also investigate whether there are differences in both the synaptic density and spatial distribution of synapses between layers. Using combined focused ion beam milling and scanning electron microscopy (FIB/SEM), we obtained three-dimensional samples from the six layers of the rat somatosensory cortex and identified and reconstructed the synaptic junctions. A total volume of tissue of approximately 4500 μm<sup>3</sup> and around 4000 synapses from three different animals were analyzed. Different samples, layers and/or animals were aggregated and compared using RSA replicated spatial point processes. The results showed no significant differences in the synaptic distribution across the different rats used in the study. We found that RSA processes described the spatial distribution of synapses in all samples of each layer. We also found that the synaptic distribution in layers II to VI conforms to a common underlying RSA process with different densities per layer. Interestingly, the results showed that synapses in layer I had a slightly different spatial distribution from the other layers.

**Keywords: spatial distribution of synapses, neocortex, dual-beam electron microscopy, FIB/SEM, replicated spatial point patterns, random sequential adsorption, 3D Ripley's** *K* **function, Besag's** *L* **function**

#### **1. INTRODUCTION**

A very dense network of neuronal and glial processes occupies the space between the cell bodies of the neurons, glia, and blood vessels. This is commonly referred to as "the neuropil." Given that most synapses are found here and the neuropil accounts for the largest volume of the cerebral cortex, it follows that most synaptic interactions take place in the neuropil (Alonso-Nanclares et al., 2008). The majority of these synapses are chemical synapses (for simplicity's sake referred to as synapses), which are identified at the electron microscope level by the following elements: synaptic vesicles in the presynaptic axon terminal adjacent to the presynaptic density, a synaptic cleft (with electron-dense material in the cleft) and densities on the cytoplasmic faces of the pre- and postsynaptic membranes.

One major issue in cortical circuitry is to ascertain how synapses are distributed and whether or not synaptic connections are specific (DeFelipe et al., 2002b). To understand the anatomical design principles of cortical circuits, it is essential to analyze the ultrastructure of all components of the neuropil and in particular the number and spatial distribution of synapses. Furthermore, synaptic size plays an important role in the functional properties of synapses (Schikorski and Stevens, 1997; Takumi et al., 1999; Lüscher et al., 2000; Tarusawa et al., 2009). Thus, numerous researchers have been trying to find simple and accurate methods for estimating the distribution, size and number of synapses. To this end, two sampling procedures are currently available: one is based on serial reconstructions and the other on single sections. Clearly, serial reconstruction should be the method of choice for the challenging task of unraveling the extraordinary complexity of the nervous system. Indeed, serial sectioning transmission electron microscopy is a well-established and mature technology for collecting three-dimensional data from ultrathin sections of brain tissue (Stevens et al., 1980; Harris et al., 2006; Hoffpauir et al., 2007; Mishchenko et al., 2010; Bock et al., 2011). It is based on imaging ribbons of consecutive sections with a conventional transmission electron microscope (TEM). However, the major limitation is that it is extremely time-consuming and difficult to obtain long series of ultrathin sections, often making it impossible to reconstruct large volumes of tissue. Hence, the recent development of automated electron microscopy techniques is a vital step forward in the study of neuronal circuits (Briggman and Denk, 2006; Knott et al., 2008; Merchán-Pérez et al., 2009).

Using combined focused ion beam (FIB) milling and scanning electron microscopy (SEM), we have recently shown (Merchán-Pérez et al., 2014) that synapses in the neuropil of layer III of the rat somatosensory cortex show a nearly random spatial distribution, with the only constraint that they cannot overlap in space. This distribution can be modeled by a random sequential adsorption (RSA) process (Evans, 1993), where synapses are given a random position in space and assigned a certain size derived from experimental data.

The aim of this research was to explore the three-dimensional distribution of synapses in the cerebral cortex as a whole and, particularly, to find out whether there is a general pattern of distribution of synapses for the six cortical layers and to identify any possible similarities and differences between layers. To do this, we studied the density of synapses and their spatial distribution as follows. First, we analyzed the synaptic density in each of the six layers of the somatosensory cortex and examined whether there were significant differences between layers. Second, we performed spatial modeling to test whether each sample from different neocortical layers conforms to an RSA model. Third, we used replicated spatial point patterns to analyze similarities and differences in the synaptic spatial distribution between groups of samples of each cortical layer.

Finally, note that we have used postnatal day 14 Wistar rats since we intend to integrate these data with other anatomical, molecular and physiological data that have already been collected from the same cortical region of the P-14 Wistar rats. The final goal is to create a detailed, biologically accurate model of the brain within the framework of the Blue Brain Project (http://bluebrain.epfl.ch/).

#### **2. MATERIALS AND METHODS**

#### **2.1. TISSUE PREPARATION AND THREE-DIMENSIONAL ELECTRON MICROSCOPY**

Three male Wistar rats sacrificed on postnatal day 14 were used for this study. They were handled in accordance with the guidelines for animal research set out in European Union Directive 2010/63/EU, and all procedures were approved by the Spanish National Research Council (CSIC) local ethics committee. Animals were administered a lethal intraperitoneal injection of sodium pentobarbital (40 mg/kg) and were intracardially perfused with 2% paraformaldehyde and 2.5% glutaraldehyde in 0.1M phosphate buffer. The brain was then extracted from the skull and vibratome sections (150 microns thick) were obtained, processed for electron microscopy and flat-embedded in Araldite according to a previously described protocol (Merchán-Pérez et al., 2009, 2014). Three-dimensional brain tissue samples were obtained from flat-embedded vibratome sections using a combined focused ion beam/scanning electron microscope (FIB-SEM). This electron microscope (Neon40 EsB, Carl Zeiss NTS GmbH, Oberkochen, Germany) combines a high-resolution field emission SEM column with a focused gallium ion beam which mills the sample surface, removing thin layers of material on a nanometer scale.

Stacks of serial sections were obtained from the six cortical layers (see **Table 1**). Samples from layer III were used in a previous study (Merchán-Pérez et al., 2014). To select the exact location of the samples in the different cortical layers, we first obtained plastic semithin sections (2 μm thick) from the block surface, which we stained with toluidine blue. These sections were then photographed with a light microscope. The last of these light microscope images (corresponding to the section immediately adjacent to the block face) was then collated with low power SEM photographs of the block surface. In this way, we were able to accurately identify the regions of the neuropil to be studied. To obtain each sample from the selected location, the FIB was positioned perpendicular to the block surface. Next, a trench (approximately 30 μm wide, 20 μm high and 15 μm deep) was excavated on the block surface. Since the SEM column is positioned at an angle of 54° to the FIB column, the distal face of the trench can be imaged with the SEM after each milling cycle. The milling/imaging cycle was then set to remove 20 nm of material from the distal face of the trench. After removing each slice, the milling process was paused and the freshly exposed surface was imaged with a 1.8 kV acceleration potential using the in-column energy selective backscattered electron detector (EsB). The milling and imaging processes were sequentially repeated to acquire long series of images by means of a fully automated procedure, outputting a stack of images that represented a three-dimensional sample of the tissue. The total number of serial sections per sample ranged from 189 to 363 (mean = 258.6), the imaged field of view was approximately 7.6 × 5.7 microns, and image resolution in the XY plane ranged from 3.7 to 11.10 nm/pixel. Z-axis resolution (section thickness) was 20 nm. In this way, the total tissue volume that was actually milled away during the milling/imaging cycles was relatively small, and we were able to obtain multiple samples from different layers or from neighboring regions within the same layer.

**Table 1 | Animal ID, volume, counts, and density of synaptic junctions per sample in each layer of the somatosensory cortex.**

*Total quantities and mean for each layer are shown.*

Synaptic junctions within each stack of serial sections were visualized, automatically segmented and reconstructed in three dimensions using Espina software (Morales et al., 2011). In order to calculate the number of synapses per unit volume, we applied a three-dimensional unbiased counting frame (Howard and Reed, 2005). Espina software output the volume of the unbiased counting frame, the number of synaptic junctions inside the frame, the spatial position of the centroids or centers of gravity of the synaptic junctions, and an estimation of their sizes using Feret's diameter (the diameter of the smallest sphere circumscribing the synaptic junction). Brain tissue shrinks during processing for electron microscopy, especially during osmication and plastic embedding. To estimate the shrinkage in our samples, we measured the surface area and thickness of the vibratome sections before and after they were processed for electron microscopy (Oorschot et al., 1991; Merchán-Pérez et al., 2009). The estimated linear, area and volume shrinkage factors were 0.90, 0.81, and 0.73, respectively. To obtain an estimate of the pre-processing values, all measured distances, areas and volumes were divided by their corresponding shrinkage factor. After correcting for tissue shrinkage, the samples that were subsequently used for spatial statistical analysis consisted of a cloud of points representing the centers of gravity or centroids of synaptic junctions. Each of these points had an associated Feret's diameter as an estimation of the size of each synaptic junction.
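The shrinkage correction described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only values taken from the text are the three shrinkage factors. Because a density is a count divided by a volume, dividing the volume by its factor is equivalent to multiplying the density by that factor.

```python
# Shrinkage factors reported in the text (linear, area, volume).
LINEAR_SHRINKAGE = 0.90
AREA_SHRINKAGE = 0.81
VOLUME_SHRINKAGE = 0.73


def correct_length(measured):
    """Estimate the pre-processing length from a measured length."""
    return measured / LINEAR_SHRINKAGE


def correct_area(measured):
    """Estimate the pre-processing area from a measured area."""
    return measured / AREA_SHRINKAGE


def correct_volume(measured):
    """Estimate the pre-processing volume from a measured volume."""
    return measured / VOLUME_SHRINKAGE


def correct_density(measured_density):
    """Density is count / volume, so correcting the volume scales the density down."""
    return measured_density * VOLUME_SHRINKAGE
```

Note that the area and volume factors are approximately the square and the cube of the linear factor (0.90² = 0.81, 0.90³ ≈ 0.73).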

#### **2.2. SPATIAL STATISTICS**

Within the field of spatial statistics, spatial point processes are mathematical models that describe the arrangement of elements randomly or irregularly distributed in space. A spatial point pattern is defined as a realization of a spatial point process (see Illian et al. (2008) for a good introduction to the topic). The elements in the pattern are represented by point coordinates in the appropriate dimension. In this study, our elements are synaptic junctions located in three dimensions.

Spatial point process statistics provides the tools to characterize patterns in terms of the number and distribution of the elements. To do this, two aspects are mainly analyzed: intensity (average number of points per unit volume, denoted by λ) and inter-point interactions, closely related to distances between points.

#### *2.2.1. Synapse density in different layers*

The most important numerical summary characteristic for a point process is the intensity λ. Point intensity is the simplest distributional property and is similar to the use of the sample mean in classical statistics. Thus, the first step in our analysis was to estimate the synaptic density of each layer and, more specifically, to study whether there were significant differences between synaptic densities in different layers of the somatosensory cortex. We used the simulation process described below along with a multiple mean comparison test.

We calculated a fixed-volume sampling box to extract subsamples from the original experimental samples. The x, y, z dimensions of this box were equal to the smallest x, y, z dimensions of the experimental samples, so the box could be applied to any of the samples without exceeding their boundaries. We then used this box to extract centroids from randomly selected samples of each layer at random locations. We repeated this process 50 times for each layer, thus obtaining 50 different synaptic densities per layer. See **Figure 1**.
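A minimal Python sketch of this subsampling procedure follows. It assumes each sample is a rectangular block with one corner at the origin, stored as a list of centroid coordinates plus its (x, y, z) extents; the function names and this data layout are illustrative and not taken from the study.

```python
import random


def box_dimensions(sample_sizes):
    """Smallest x, y and z extents over all samples, so the box fits in any sample."""
    return tuple(min(size[k] for size in sample_sizes) for k in range(3))


def subsample_density(points, sample_size, box, rng):
    """Place the box at a random location inside one sample and return its density."""
    origin = [rng.uniform(0.0, sample_size[k] - box[k]) for k in range(3)]
    inside = sum(
        all(origin[k] <= p[k] <= origin[k] + box[k] for k in range(3))
        for p in points
    )
    return inside / (box[0] * box[1] * box[2])


def layer_densities(samples, box, n_draws=50, seed=0):
    """Repeat the extraction n_draws times over randomly chosen samples of one layer.

    samples: list of (points, sample_size) pairs belonging to the same layer.
    """
    rng = random.Random(seed)
    densities = []
    for _ in range(n_draws):
        points, size = rng.choice(samples)
        densities.append(subsample_density(points, size, box, rng))
    return densities
```

With `n_draws=50` this yields the 50 densities per layer that enter the comparison between layers.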

To study whether there were significant differences between synaptic densities of the different layers, we performed a multiple mean comparison test on the 50 extracted densities for each of the six cortical layers. Because not all of the necessary assumptions for ANOVA were satisfied (data were normally distributed but homoscedasticity was not met, i.e., the variance of data in each layer was not the same), we used the Kruskal-Wallis test and then applied the Mann-Whitney test with the Bonferroni method to adjust the *p*-values for pair-wise comparisons.
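The Bonferroni step of this procedure is simple enough to sketch directly. With six layers there are 15 pair-wise comparisons, and each raw *p*-value is multiplied by the number of comparisons and capped at 1. The sketch below covers only the adjustment; the raw Mann-Whitney *p*-values are assumed to come from a statistics package, and the function name is hypothetical.

```python
from itertools import combinations

LAYERS = ["I", "II", "III", "IV", "V", "VI"]
PAIRS = list(combinations(LAYERS, 2))  # all pair-wise layer comparisons


def bonferroni_adjust(raw_p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.

    raw_p_values: dict mapping a pair of layers to its unadjusted p-value.
    """
    m = len(raw_p_values)
    return {pair: min(1.0, p * m) for pair, p in raw_p_values.items()}
```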

#### *2.2.2. Modeling of spatial point processes*

Merchán-Pérez et al. (2014) recently showed that the RSA model adequately describes the spatial distribution of synaptic junctions in layer III. The second step in the analysis of the entire cerebral cortex was to test the RSA model for each of our samples from layer I to VI.

Although virtually all cortical synapses can be accurately identified as asymmetric (or Gray's type I) and symmetric (or Gray's type II) using FIB/SEM (Merchán-Pérez et al., 2009), we considered synaptic junctions as a whole. This was because it was not feasible to test RSA models for such a small number of symmetric synapses (they accounted for less than 10% of the total number of synapses found in any cortical layer). Furthermore, as reported previously (Merchán-Pérez et al., 2014), results were similar when all synapses (asymmetric and symmetric) were studied as a single group and when only asymmetric synapses were analyzed. Thus, for simplicity's sake, we will use synaptic junctions to refer to both types of synapses.

An RSA process (Evans, 1993) is a type of hard-core process, i.e., two points cannot be placed closer than a minimum distance, where locations are chosen randomly, subject only to the distance constraint. These minimum distances can be fixed or, as in our case, calculated according to a probability density function. Considering that the synaptic junctions cannot overlap, and therefore the minimum distances between synapses are limited by the size of the junctions at least, the RSA process is particularly well suited here. We have used Feret's diameter of each synaptic junction as an estimate of its size. As in Merchán-Pérez et al. (2014) for layer III, we found that Feret's diameters in all layers were lognormally distributed.

To test the RSA models we used one of the summary characteristics most commonly used in the analysis of spatial point processes, namely Ripley's *K* function and, particularly, a common transformation of it, Besag's *L* function (Ripley, 1977).

Ripley's *K* function for a distance *d*, *K*(*d*), is defined as the expected number of other points of the process within a distance *d* of a typical point of the process divided by the intensity. The Miles-Lantuéjoul-Stoyan-Hanisch translation edge-correction is often used to estimate *K*(*d*) (Ohser, 1983; Baddeley et al., 1993):

$$\hat{K}(d) = \frac{\mathit{vol}(B)^2}{N(B)^2} \sum_{\mathbf{x}_k \in B} \sum_{\mathbf{x}_l \neq \mathbf{x}_k} \frac{\mathbf{1}\{||\mathbf{x}_k - \mathbf{x}_l|| \le d\}}{\gamma_B(\mathbf{x}_k - \mathbf{x}_l)},\tag{1}$$

where **1**{·} denotes the indicator function, || · || is the Euclidean distance, *N*(*B*) is the number of points falling in a region *B* ⊂ ℝ<sup>3</sup>, *x<sub>k</sub>*, *k* = 1,..., *N*(*B*) are the observed points, *vol*(*B*) is the volume of the region *B* and γ<sub>*B*</sub> is the 'set covariance', γ<sub>*B*</sub>(*x<sub>k</sub>* − *x<sub>l</sub>*) = *vol*({*x*|*x* + *x<sub>k</sub>* − *x<sub>l</sub>* ∈ *B*}) = *vol*(*B* ∩ (*B* − (*x<sub>k</sub>* − *x<sub>l</sub>*))).
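To make Equation (1) concrete, the following Python sketch evaluates the translation-corrected estimator for a single distance *d*. It assumes the region *B* is an axis-aligned rectangular box, in which case the set covariance reduces to the product of (side − |shift|) over the three axes. The function names are illustrative, and the double loop is written for clarity rather than speed.

```python
import math


def set_covariance(box, h):
    """Volume of the intersection of the box with a copy of itself shifted by h."""
    vol = 1.0
    for side, shift in zip(box, h):
        vol *= max(0.0, side - abs(shift))
    return vol


def k_translation(points, box, d):
    """Translation edge-corrected estimate of Ripley's K at distance d (Equation 1)."""
    n = len(points)
    vol = box[0] * box[1] * box[2]
    total = 0.0
    for k, xk in enumerate(points):
        for l, xl in enumerate(points):
            if k == l:
                continue  # a point is not its own neighbor
            h = [a - b for a, b in zip(xk, xl)]
            if math.sqrt(sum(c * c for c in h)) <= d:
                total += 1.0 / set_covariance(box, h)
    return vol ** 2 / n ** 2 * total
```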


The homogeneous spatial Poisson point process, also known as complete spatial randomness (CSR), is considered as the reference model in spatial point process statistics, since it represents a boundary condition between regular and clustered patterns. A random pattern, where a point is equally likely to occur at any location regardless of the locations of other points, follows a CSR process. The patterns known as regular patterns show repulsion, i.e., the distances between points are larger than expected in a random pattern of the same intensity. Furthermore, patterns where points tend to be closer than they should be for a given intensity are known as clustered patterns.

The three-dimensional CSR process has the following expression for the *K* function (a clustered pattern curve will be shifted to the left, whereas a regular pattern curve will be shifted to the right):

$$K_{\rm CSR}(d) = \frac{4}{3}\pi d^3. \tag{2}$$

Besag's *L* function is a commonly used transformation of the *K* function. The 3D expression is:

$$L(d) = \sqrt[3]{\frac{3}{4\pi}K(d)}.\tag{3}$$

This transformation converts the CSR *K* function to the straight line *L*<sub>CSR</sub>(*d*) = *d*, making the plots much easier to assess visually. The transformation approximately stabilizes the variance of the estimator, also facilitating deviation assessment. For the *L* function, a regular pattern curve will be below the diagonal (CSR) and a clustered pattern will be above.
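As a quick check of Equations (2) and (3), the short Python sketch below (with illustrative function names) shows that applying the *L* transformation to the CSR *K* function returns the distance itself.

```python
import math


def k_csr(d):
    """Theoretical K function of a three-dimensional CSR process (Equation 2)."""
    return 4.0 / 3.0 * math.pi * d ** 3


def besag_l(k_value):
    """Three-dimensional Besag L transformation of a K value (Equation 3)."""
    return (3.0 * k_value / (4.0 * math.pi)) ** (1.0 / 3.0)
```

For example, `besag_l(k_csr(250.0))` gives 250 up to floating-point rounding, which is the diagonal against which regular and clustered patterns are compared.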

The expression of Ripley's *K* function for the RSA process is analytically unknown, so we have to use RSA simulations. To simulate an RSA process we need to know its intensity and the probability density function of the minimum distances between points. In our case, we need the synaptic density λ and the μ and σ parameters of the lognormal distribution of Feret's diameters. An RSA process simulation starts with an empty window to which spheres, whose radii follow the lognormal distribution fitted using Feret's diameters, are added randomly one at a time. If the new simulated synapse intersects with any existing sphere, the new sphere is rejected, and another sphere is generated with another location and radius. The process continues until the target intensity is reached.
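The simulation loop just described can be sketched as follows. This is a simplified illustration, not the code used in the study: the function name and the `max_tries` safeguard are hypothetical, the target intensity is expressed as a target number of points in the window, and each radius is assumed to be half of a Feret's diameter drawn from the fitted lognormal distribution.

```python
import math
import random


def simulate_rsa(n_points, box, mu, sigma, seed=0, max_tries=1_000_000):
    """Add non-overlapping spheres one at a time until n_points are placed.

    box: (x, y, z) extents of the simulation window.
    mu, sigma: parameters of the lognormal distribution of diameters.
    Returns a list of (x, y, z, radius) tuples.
    """
    rng = random.Random(seed)
    spheres = []
    tries = 0
    while len(spheres) < n_points:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("target intensity not reached; window may be too crowded")
        center = [rng.uniform(0.0, side) for side in box]
        radius = rng.lognormvariate(mu, sigma) / 2.0  # assumption: radius = diameter / 2
        # Reject the candidate if it intersects any sphere already placed.
        if all(math.dist(center, s[:3]) >= radius + s[3] for s in spheres):
            spheres.append((center[0], center[1], center[2], radius))
    return spheres
```

A rejected candidate is simply discarded, and a fresh location and radius are drawn on the next pass through the loop. The target count would be the rounded product of the intensity λ and the window volume.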

For example, **Figure 2** shows the *K* and *L* summary functions of experimental sample 1 from Layer I (blue), the average of 99 RSA simulations performed for this sample (green) and the functions for a CSR process (red). Each RSA simulation had the same intensity as the original sample, and the size of simulated synapses was calculated according to the lognormal distribution fitted using Feret's diameters of all the synapses of the sample. Generally, the *K* functions were very similar to each other across all distances for all the samples. Moreover, for short distances (200–300 nm), the *L* functions of the samples and RSA processes were well below the diagonal line (CSR) representing the empty space around centroids which should not contain any centroid (non-overlapping synapse constraint). From about 400 nm onwards, the *L* functions of both models and experimental samples were again very similar to each other.

To test differences between two summary functions we used simulation-based envelopes. The statistical rationale of this common procedure is to be found in Monte Carlo testing. Taking the advice of Baddeley et al. (2014), we transformed the *K* function into the *L* function and used global envelopes since we had no prior information about the range of spatial interaction. Note that Monte Carlo tests "are strictly invalid, and probably conservative, if parameters have been estimated from the data" (Diggle, 2003). To overcome this obstacle, we adjusted an RSA process for each sample *j* in each layer *i* (*i* = I,..., VI) and estimated the parameters λ̂<sub>*ij*</sub>, μ̂<sub>*ij*</sub> and σ̂<sub>*ij*</sub> using only the remaining samples of the same layer. The sizes of the simulated synapses were calculated according to the lognormal distribution fitted using Feret's diameters of these remaining (*m<sub>i</sub>* − 1) samples in layer *i*, where *m<sub>i</sub>* is the number of samples in layer *i*. If *vol<sub>it</sub>* denotes the volume of sample *t* in layer *i*, then

$$\hat{\lambda}_{ij} = \frac{\sum_{\substack{t=1 \\ t \neq j}}^{m_i} \lambda_{it} \, \mathit{vol}_{it}}{\sum_{\substack{t=1 \\ t \neq j}}^{m_i} \mathit{vol}_{it}}. \tag{4}$$
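Reading Equation (4) as the volume-weighted mean density of the remaining samples of the layer, the estimate reduces to the total number of synapses divided by the total volume of those samples, because density times volume is the synapse count. The Python sketch below follows that reading; the function name and the list-based inputs are illustrative.

```python
def leave_one_out_intensity(counts, volumes, j):
    """Intensity estimate for sample j using only the other samples of the layer.

    counts[t] and volumes[t] are the number of synapses and the volume of sample t.
    """
    total_count = sum(c for t, c in enumerate(counts) if t != j)
    total_volume = sum(v for t, v in enumerate(volumes) if t != j)
    return total_count / total_volume
```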

**FIGURE 2 |** *K* (left) and *L* (right) functions of the experimentally observed data (blue) along with the theoretical CSR (red) and the average of 99 RSA process simulations fitted for sample 1 (green). The *L* function of the experimentally observed sample is positioned well below the diagonal (CSR) for short distances and is fairly close to the diagonal for larger distances.

The RSA null hypothesis was tested as follows. For each sample, we performed 99 RSA simulations with the described parameters. We calculated the average *L* function of all these simulations and took this average, *L*¯, to be an estimate of the theoretical mean value of the *L* summary statistic for the RSA model. The global envelope is a region of constant width 2*wmax*, where *wmax* is determined as the furthest deviation between *L*¯ and any of the *L* functions of a separate set of 99 RSA simulations with the same parameters at any distance *d* along the horizontal axis. We rejected the null hypothesis if the *L* function of the sample lay outside the envelope for any value of *d* (see Section 3.2 and **Figure 3**).
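The envelope construction can be sketched in Python as follows, assuming every *L* function has already been evaluated on the same grid of distances and is stored as a list of values. The function names are illustrative; in the study itself the envelopes were computed with the **spatstat** package.

```python
def mean_curve(curves):
    """Pointwise average of a set of L functions evaluated on a common grid."""
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(curves[0]))]


def global_envelope_test(observed, sims_for_mean, sims_for_width):
    """Return (rejected, l_bar, w_max) for a global envelope of constant half-width.

    sims_for_mean: simulated L functions used to estimate the mean L function.
    sims_for_width: a separate set of simulated L functions used to find w_max.
    """
    l_bar = mean_curve(sims_for_mean)
    # Furthest deviation between the mean and any simulated curve at any distance.
    w_max = max(
        abs(value - mean)
        for curve in sims_for_width
        for value, mean in zip(curve, l_bar)
    )
    # Reject if the observed curve leaves the band l_bar ± w_max anywhere.
    rejected = any(abs(value - mean) > w_max for value, mean in zip(observed, l_bar))
    return rejected, l_bar, w_max
```

Using two separate sets of simulations, one for the mean and one for the width, mirrors the two sets of 99 RSA simulations described above.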

We analyzed spatial patterns using R software and the **spatstat** package (Baddeley and Turner, 2005; Baddeley, 2010). We obtained the translation edge-correction estimator of Ripley's *K* function in three dimensions for both the observed samples and the RSA simulations using the *K3est* function included in the **spatstat** package and we directly calculated the *L* functions from *K* functions using Equation (3). To compute the simulation envelopes of the *L* functions we used the *envelope.pp3* function, also included in the **spatstat** package. We used this function with the three-dimensional point pattern for each sample and 198 three-dimensional point patterns of RSA simulations performed for that sample.

#### *2.2.3. Replicated spatial point patterns*

Replicated spatial point patterns are a particular situation in the spatial point processes field where different patterns are considered as instances of the same process and are said to form a group. In our case we have several samples of each layer of the somatosensory cortex, so we conducted an analysis in the context of replicated patterns.

Let *n<sub>ij</sub>* (*j* = 1,..., *m<sub>i</sub>*) be the number of synapses for the *j*th sample in the *i*th group (*i* = 1,..., *g*). Given an estimate of the *K* function for each sample *j* in each group *i*, *K̂<sub>ij</sub>*(*d*), the estimated mean function for each group is defined as

$$\bar{K}_i(d) = \frac{\sum_{j=1}^{m_i} w_{ij} \hat{K}_{ij}(d)}{\sum_{j=1}^{m_i} w_{ij}}, \ i = 1, \ldots, g. \tag{5}$$

**FIGURE 3 | Analysis of spatial patterns using global envelopes (sample 1 for each layer of the somatosensory cortex).** The *L* functions of the experimentally observed samples are shown in blue, and the averages of 99 RSA simulations are shown in green. The shaded area represents the envelopes of values calculated from a separate set of 99 RSA simulations. We do not reject the RSA null hypothesis for any sample because no observed *L* function lies outside the envelope for any value of distance *d*. The results for all samples in the study were the same (see Supplementary material). Dashed red lines show the theoretical value for CSR (for the purpose of visual comparison only).

Different weights *w<sub>ij</sub>* have been proposed in the literature for function aggregation; see Pawlas (2011) for a review. Myllymäki et al. (2012) chose to use *w<sub>ij</sub>* = *n<sub>ij</sub>*<sup>2</sup> to aggregate *K* functions together with linear mixed models to investigate the spatial structure of epidermal nerve fibers. Jafari-Mamaghani et al. (2010) used *w<sub>ij</sub>* = *n<sub>ij</sub>* to study the three-dimensional distribution of pyramidal neurons in the mouse barrel cortex. The weight *w<sub>ij</sub>* = *n<sub>ij</sub>* was also recommended by Diggle (2003). In this paper, we also chose this option.
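With the weights set to the number of synapses per sample, the group mean of Equation (5) is a count-weighted average of the per-sample *K* functions. A minimal Python sketch, assuming each estimated *K* function is stored as a list of values on a common grid of distances (the function name is illustrative):

```python
def group_mean_k(k_curves, counts):
    """Weighted mean K function of one group, using the synapse counts as weights.

    k_curves[j] holds the estimated K function of sample j on a common distance grid;
    counts[j] is the number of synapses in sample j.
    """
    total = sum(counts)
    return [
        sum(w * curve[i] for w, curve in zip(counts, k_curves)) / total
        for i in range(len(k_curves[0]))
    ]
```

For instance, two samples with 1 and 3 synapses and *K* values of 1.0 and 3.0 at some distance give a group mean of (1 × 1.0 + 3 × 3.0)/4 = 2.5 at that distance.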

We performed the Diggle test (Diggle et al., 1991, 2000) to study similarities and differences between groups of replicated data. This test uses a bootstrap procedure to check whether there are significant differences between empirical *K* functions of independent replicates. Using 5000 bootstrap iterations, we studied whether there were differences between the study animals and between different cortical layers.

It is scientifically correct to construct an aggregated estimator of the *K* function without assuming a common intensity across all replicates because the *K* function is defined as independent of the intensity. This assumes that the hypothesis of a common *K* function and varying intensity is plausible, as would be the case if the replicates were different intensity versions of a common underlying process (Diggle, 2013). To test if this applied in our case, we adjusted a global spatial model for groups of replicates in which the Diggle test found no significant differences. Then we applied different random thinning procedures (i.e., randomly deleting points from the original model) and introduced a cross-validation technique to honestly estimate the goodness-of-fit of the resulting models.

More explicitly, assume that *A*, *B*, and *C* were the groups where the Diggle test found no significant differences, and let *m<sub>A</sub>*, *m<sub>B</sub>*, and *m<sub>C</sub>* be the number of samples in each group. We adjusted the global spatial model RSA<sub>*global*</sub> with parameters μ<sub>*global*</sub>, σ<sub>*global*</sub>, and λ<sub>*global*</sub>. Parameters μ<sub>*global*</sub> and σ<sub>*global*</sub> were obtained by fitting the lognormal distribution of Feret's diameters considering all synapses of all samples from groups *A*, *B*, and *C* and were used to estimate the size of the synapses in the global model. Let λ<sub>*ij*</sub> be the synaptic density for the *j*th sample in the *i*th group. We chose λ<sub>*global*</sub> such that λ<sub>*global*</sub> > λ<sub>*ij*</sub> for all *i*, *j*, i.e., we considered a global model that was *denser* than each of the samples separately (we made λ<sub>*global*</sub> 1% denser than the maximum sample density).

Our goal, then, was to check whether groups *A*, *B*, and *C*, whose *K* functions were found not to be significantly different, were different thinned versions of a common underlying process. In other words, we wanted to find out whether the processes that described the spatial distribution of samples from groups *A*, *B*, and *C* were different thinned versions of the global spatial model RSA<sub>*global*</sub>.

To do this, we ran 198 *dense* RSA<sub>*global*</sub> simulations with the estimated parameters μ<sub>*global*</sub>, σ<sub>*global*</sub>, and λ<sub>*global*</sub>. Then we thinned each of these *dense* simulations for each sample in each group. We used a cross-validation technique to check if these simulations had the same spatial distribution as the experimentally observed sample. Specifically, we applied the following cross-validation process for each sample *j* (test sample) in each group *i*:

1. We estimated the intensity λ̂<sub>*ij*</sub> from all samples in group *i* excluding sample *j*.

2. We randomly thinned each of the 198 *dense* simulations until it reached the density λ̂<sub>*ij*</sub>. Thus we obtained a set of 198 thinned RSA<sub>*ij*</sub> simulations for sample *j* of group *i*. These simulations were like the original simulations but had a density equal to the intensity estimation for the test sample. This process is shown in **Figure 4**.

3. Finally, we again used simulation-based envelopes to test for differences in the spatial distributions of the thinned simulations and the experimentally observed sample. We used 99 simulations to estimate the theoretical mean value of the *L* function for the RSA<sub>*ij*</sub> model. We used the other 99 to calculate the maximum absolute difference from this theoretical mean value, which is necessary to build the envelope.
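The random thinning used in this cross-validation process can be sketched as follows. Deleting points at random until a target intensity is reached is equivalent to keeping a uniformly random subset of the required size, which is what the Python sketch does. The function name is illustrative, and the number of points to keep is assumed to be the rounded product of the target intensity and the window volume.

```python
import random


def thin_to_intensity(points, volume, target_intensity, rng):
    """Randomly delete points from a dense simulation until the target intensity is reached."""
    n_keep = round(target_intensity * volume)
    if n_keep > len(points):
        raise ValueError("the dense simulation is sparser than the target intensity")
    # Keeping a uniform random subset is equivalent to deleting points at random.
    return rng.sample(points, n_keep)
```

Applying this function to each of the 198 dense simulations with the leave-one-out intensity of the test sample yields the set of thinned simulations that feeds the envelope test.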

#### **3. RESULTS**

We obtained 25 samples from the six layers of the somatosensory cortex of three 14-day-old rats by FIB/SEM microscopy. We had a total reconstructed tissue volume of approximately 4500 μm<sup>3</sup> containing almost 4000 3D reconstructions of synapses. For each of these synapses, we had information on its 3D position (center of gravity or centroid) and an estimate of its size based on Feret's diameter. We obtained the density of each sample, that is, the number of synapses per unit volume, and the mean density for each layer (**Table 1**).

#### **3.1. SYNAPSE DENSITY IN DIFFERENT LAYERS**

The density of the samples ranges from 0.382 synapses/μm<sup>3</sup> in a sample of layer VI to 1.382 synapses/μm<sup>3</sup> in a sample of layer II. The overall mean density is 0.870 synapses/μm<sup>3</sup> across all layers. See **Table 1** for details. As shown in **Figure 5**, the mean density of layer I is 0.794 synapses/μm<sup>3</sup>, whereas layers II and III have mean densities of 1.098 and 0.940 synapses/μm<sup>3</sup>, respectively. The mean density then increases to its maximum of 1.222 synapses/μm<sup>3</sup> in layer IV, drops again in layer V (0.828 synapses/μm<sup>3</sup>) and reaches its minimum in layer VI (0.466 synapses/μm<sup>3</sup>).

Following the simulation and mean comparison process described in Section 2.2.1, we looked for significant differences between the densities of the different layers of the somatosensory cortex. Using the Kruskal-Wallis test we found that there were differences between the densities of the layers (*p*-value ≤ 2.2 × 10<sup>−16</sup>), which is consistent with a recent work (Crandall, 2013). Pairwise comparisons revealed that there was no significant difference between the densities of layers I vs. V or between the densities of layers II vs. III.

**FIGURE 4 | Diagram of the random thinning process for three groups of replicated point patterns, A, B, and C, for which the Diggle test did not find significant differences.** Our goal is to check if these groups are differentially thinned versions of a common underlying RSA process. Random thinning of *dense* simulations is performed for each experimentally observed sample *j* in each group *i* (test sample, shown in blue). Random thinning continues until we reach the intensity λˆ*ij*, estimated from all samples in group *i* excluding sample *j*. Then, for each experimentally observed sample *j* in each group *i*, we used simulation-based envelopes to test for differences in the spatial distributions of the thinned RSA simulations and the sample (we used 99 thinned simulations to estimate the *L* function for the RSA*ij* model and the other 99 to calculate the maximum deviation necessary to build the envelope).

**FIGURE 5 | (Left) Mean synaptic density of the six layers of the somatosensory cortex.** The synaptic density of the six layers is significantly different. However, we found no significant differences between the densities of layers I vs. V or between the densities of layers II vs. III. **(Right)** Mean distance to nearest synapse for each layer. Nearest synapse distances are significantly different in the six layers of the somatosensory cortex, but we found no significant differences between distances of layers I vs. V, I vs. VI, II vs. III, and III vs. V.

**Table 2 | Mean distances from a synapse to its nearest neighbor and mean Feret's diameters.**

*Nearest neighbor distances are measured between centroids of synaptic junctions. Feret's diameters are an estimate of the size of synaptic junctions (diameter of the smallest sphere circumscribing each junction).*

Apart from density analysis, one of the first steps often performed to explore the spatial distribution of a spatial pattern is to obtain the distance to the nearest neighbor. So, in addition to the location and Feret's diameters of synapses of each sample, which were on average 404.73 nm, we measured the distance of each synapse to its nearest synapse. The mean distances to nearest neighbor measured between centroids of synaptic junctions ranged from 533.78 nm in a sample of layer II to 794.63 nm in a sample of layer VI, and the overall mean distance to the nearest synapse was 641.58 nm. This information is shown in **Table 2**. Using the Kruskal-Wallis test we found that there were significant differences between the distances to the nearest synapse between layers of the somatosensory cortex (*p*-value ≤ 2.2 × 10<sup>−16</sup>). We applied the Mann-Whitney test and adjusted the *p*-values using the Bonferroni method for pair-wise comparisons. There were no significant differences for layers I vs. V, I vs. VI, II vs. III, and III vs. V. Notice that we found no differences between the synaptic densities of layers I vs. V and II vs. III either (see **Figure 5**).
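As a rough illustration of the two computations mentioned in this paragraph, the following sketch measures nearest-neighbor distances between centroids and applies a Bonferroni adjustment to pairwise *p*-values. The function names, the centroid coordinates, and the raw *p*-values are hypothetical and chosen only for the example; the brute-force search is an assumption for clarity, not necessarily how the distances were obtained in the study.

```python
import math
from itertools import combinations

def nearest_neighbor_distances(centroids):
    """Distance from each 3D centroid to its closest other centroid (brute force)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(centroids) if j != i)
        for i, p in enumerate(centroids)
    ]

def bonferroni(raw_p, n_comparisons):
    """Multiply each raw p-value by the number of comparisons, capping at 1."""
    return {pair: min(1.0, p * n_comparisons) for pair, p in raw_p.items()}

# Hypothetical centroids in nm, for illustration only (not data from the study).
centroids = [(0, 0, 0), (600, 0, 0), (600, 500, 0), (0, 0, 700)]
nn = nearest_neighbor_distances(centroids)
mean_nn = sum(nn) / len(nn)

# Six cortical layers give 15 pairwise comparisons.
layers = ["I", "II", "III", "IV", "V", "VI"]
pairs = list(combinations(layers, 2))

# Hypothetical raw p-values for two of the pairs.
raw_p = {("I", "II"): 0.001, ("I", "V"): 0.02}
adjusted = bonferroni(raw_p, len(pairs))
```

With six layers there are 15 pairwise comparisons, so under this adjustment a raw *p*-value must fall below roughly 0.05/15 ≈ 0.0033 to remain significant at the 0.05 level.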

#### **3.2. MODELING OF SPATIAL POINT PROCESSES**

A recent paper (Merchán-Pérez et al., 2014) analyzed the three-dimensional spatial distribution of synapses in the somatosensory cortex. Merchán-Pérez and colleagues fitted CSR and RSA models, showing that RSA processes modeled the synaptic distribution more adequately. However, this study was limited to layer III of the somatosensory cortex. Here we extend this analysis to all layers of the cortex.
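To make the RSA (random sequential adsorption) idea concrete, the sketch below places spheres one at a time at uniformly random positions and rejects any candidate that would overlap a sphere already placed. It is a minimal illustration under stated assumptions: the box size, the lognormal parameters `mu` and `sigma`, and the function name `simulate_rsa` are hypothetical, edge effects are ignored, and the study itself reports using the **spatstat** package rather than code like this.

```python
import math
import random

def simulate_rsa(n_target, box, mu, sigma, rng, max_attempts=100_000):
    """Place up to n_target non-overlapping spheres in a box, one at a time."""
    placed = []  # each entry is (x, y, z, diameter)
    attempts = 0
    while len(placed) < n_target and attempts < max_attempts:
        attempts += 1
        diameter = rng.lognormvariate(mu, sigma)
        center = tuple(rng.uniform(0, side) for side in box)
        # Accept only if the candidate overlaps none of the spheres placed so far.
        if all(math.dist(center, s[:3]) >= (diameter + s[3]) / 2 for s in placed):
            placed.append((*center, diameter))
    return placed

# Hypothetical setup: a 3 x 3 x 3 um box (27 um^3), coordinates in nm.
# 23 spheres corresponds to about 0.87 synapses/um^3 in this volume.
rng = random.Random(0)
spheres = simulate_rsa(23, (3000.0, 3000.0, 3000.0), mu=5.9, sigma=0.4, rng=rng)
```

The `max_attempts` guard is a design choice for the sketch: at high densities an RSA process can stall, so the loop stops rather than running indefinitely.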

To test the null hypothesis of RSA we used simulation-based envelopes. **Figure 3** shows the envelopes of the first sample of each layer of the somatosensory cortex (the envelopes for all samples are shown in the Supplementary material). The averages of the *L* functions of 99 RSA simulations performed for each sample are represented in green. The shaded area is a region of constant width 2*wmax*. The width *wmax* was calculated with a separate set of 99 RSA simulations as described in Section 2.2.2 using the **spatstat** package. The dashed red lines show the theoretical value for CSR for visual comparison only.

The null hypothesis is rejected if the *L* function of the experimentally observed sample (blue) lies outside the envelope for any value of distance *d*. The *L* functions of samples 2 and 7 from layer III and sample 2 from layer IV were very close to the upper boundary of the envelope at a distance of about *d* = 300 nm but did not lie outside the envelope. The remaining samples were completely within the envelope for all values of *d*. So, we did not reject the RSA model for any of the 25 analyzed samples.
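The envelope test described above can be sketched as follows, assuming the *L* curves have already been evaluated on a common grid of distances. The code assumes the standard three-dimensional transform L(d) = (3K(d)/(4π))<sup>1/3</sup>, under which L(d) = d for CSR, and it assumes that *wmax* is the largest absolute deviation of the second set of simulations from the mean of the first set. Section 2.2.2 gives the authors' actual definition, so treat both as assumptions. All names and toy values are hypothetical.

```python
import math

def l_from_k(k_values):
    """3D L function: L(d) = (3 K(d) / (4 pi))^(1/3), so L(d) = d under CSR."""
    return [(3.0 * k / (4.0 * math.pi)) ** (1.0 / 3.0) for k in k_values]

def mean_curve(curves):
    """Pointwise mean of several curves evaluated on the same distance grid."""
    return [sum(values) / len(values) for values in zip(*curves)]

def envelope_test(observed, sims_for_mean, sims_for_width):
    """Return (rejected, w_max) for a constant-width envelope around the mean L."""
    center = mean_curve(sims_for_mean)
    w_max = max(
        max(abs(l - c) for l, c in zip(curve, center))
        for curve in sims_for_width
    )
    rejected = any(abs(o - c) > w_max for o, c in zip(observed, center))
    return rejected, w_max

# Toy curves on a three-point distance grid (illustrative values only).
sims_a = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
sims_b = [[1.1, 2.0, 3.0], [1.0, 1.8, 3.0]]
inside, w = envelope_test([1.0, 2.1, 3.0], sims_a, sims_b)
outside, _ = envelope_test([1.0, 2.0, 3.5], sims_a, sims_b)
```

Using one set of simulations for the center and a separate set for the width, as the text describes, keeps the estimate of the expected curve independent of the estimate of its variability.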

#### **3.3. REPLICATED SPATIAL POINT PATTERNS**

Taking advantage of the fact that we had several samples of each layer of the somatosensory cortex, we used replicated spatial point patterns in order to detect similarities and differences between groups. Because we had seen that synaptic densities between layers of the somatosensory cortex were different, we used the *K* function because it does not depend on intensity. We aggregated the *K* functions of each group using the number of synapses, as explained in Section 2.2.3 [*wij* = *nij*, Equation (5)] (Diggle et al., 1991; Diggle, 2013).
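As an illustration of the aggregation step, the sketch below computes a weighted pointwise mean of the *K* functions of a group, with weights equal to the number of synapses in each sample (*wij* = *nij*). Equation (5) is not reproduced in this section, so the weighted-mean form used here is an assumption, and the function name and toy values are hypothetical.

```python
def aggregate_k(k_curves, n_points):
    """Pointwise mean of K curves, weighted by the number of points per sample."""
    total = sum(n_points)
    grid_size = len(k_curves[0])
    return [
        sum(n * curve[t] for n, curve in zip(n_points, k_curves)) / total
        for t in range(grid_size)
    ]

# Two toy samples evaluated on a two-point distance grid (illustrative values).
k_group = aggregate_k([[1.0, 4.0], [3.0, 8.0]], n_points=[100, 300])
```

Here the sample with 300 points contributes three times the weight of the sample with 100 points, so the aggregated curve lies closer to the second sample.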

As discussed, we performed the Diggle test to compare different groups of *K* functions (Diggle et al., 1991, 2000). The first step was to check whether there were any differences between the three animals. We applied the Diggle test to *g* = 3 groups of sizes *m*<sub>1</sub> = 12, *m*<sub>2</sub> = 9 and *m*<sub>3</sub> = 4 and obtained a *p*-value = 0.724. Thus, we did not detect differences between animals in the study. **Figure 6** shows the aggregated *K* and *L* functions for each of the three animals. After ruling out differences between animals, we studied whether there were differences in the synaptic distribution between layers.

Considering each layer of the cortex as a group of replicates, we calculated the aggregated *L* function of each group by transforming the aggregated *K* function of the group [Equation (5)]. **Figure 7** shows the *L* function of each observed sample in each layer as dashed blue lines, the aggregated *L* function of each layer in dark blue and the average of 99 RSA simulations fitting the RSA model for all the samples of the layer in green. We calculated the parameters λˆ*i*, μˆ*i*, and σˆ*i* of the RSA*i* model for each layer *i*, *i* = I,..., VI, calculating the volume-weighted average of the parameters λ*ij* of each sample *j* in layer *i* and fitting the lognormal distribution of Feret's diameters using all synapses in this layer. **Figure 7** also shows the envelope obtained using a separate set of 99 RSA simulations with the same parameters, as explained in Section 2.2.2. For visual comparison, we added the theoretical *L* function for a random pattern (dashed red diagonal). Because all the aggregated *L* functions were within the boundaries of the envelopes, we did not reject the RSA model for any layer of the somatosensory cortex.
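The per-layer parameter estimates described above might be computed along the following lines. This is only a sketch: the lognormal fit uses the mean and the population standard deviation of the log-diameters (the maximum-likelihood estimates), which is an assumption because the text does not state which estimator was used, and all names and numbers are hypothetical.

```python
import math
import statistics

def volume_weighted_intensity(intensities, volumes):
    """Average of per-sample intensities, weighted by sample volume."""
    return sum(lam * v for lam, v in zip(intensities, volumes)) / sum(volumes)

def fit_lognormal(diameters):
    """Estimate (mu, sigma) of a lognormal from the logs of the diameters."""
    logs = [math.log(d) for d in diameters]
    return statistics.fmean(logs), statistics.pstdev(logs)

# Hypothetical layer with two samples (intensities in synapses/um^3, volumes in um^3).
lam_layer = volume_weighted_intensity([1.0, 0.8], [100.0, 300.0])

# Hypothetical diameters whose logs are exactly 5, 6, and 7.
mu_hat, sigma_hat = fit_lognormal([math.exp(5), math.exp(6), math.exp(7)])
```

Note that a volume-weighted average of per-sample intensities equals the total number of synapses divided by the total volume, so larger samples influence the layer estimate more.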

**FIGURE 6 | Aggregated** *K* **and** *L* **functions for each animal.** The Diggle test found no significant differences between the three animals used in the study.

Applying the Diggle test for *g* = 6 groups of sizes *m*<sub>1</sub> = 2, *m*<sub>2</sub> = 3, *m*<sub>3</sub> = 10, *m*<sub>4</sub> = 3, *m*<sub>5</sub> = 3, and *m*<sub>6</sub> = 4, we obtained a *p*-value of 0.002. Thus, we could conclude that there were differences between the six layers of the cortex. To better understand synaptic spatial distribution, we applied the Diggle test six times with *g* = 2 groups, each time forming a group with the *K* functions of all samples of one layer and the other group with the *K* function of all samples of the remaining layers. In this analysis, the group of samples from layer I was the only one significantly different from the other group (samples from layers II to VI) with a *p*-value of 0.009. The Diggle test found no significant differences between groups of replicates formed by layers II to VI (*g* = 5, *p*-value = 0.1176). Moreover, the Diggle test found no significant differences between the distribution of samples from layers II to VI in pair-wise comparisons of these layers. **Figure 8** shows the aggregated *K* and *L* functions of all six layers (the two identified groups are shaded differently, i.e., layer I in green and layers II to VI in violet). Layer I functions are slightly shifted to the right compared to the other layers, so the repulsion in the spatial distribution of its synapses appears to be greater.

In Section 3.1 we saw that layers of the somatosensory cortex did not have a common synaptic density, so we wanted to find out whether we had different thinned versions of a common underlying process in layers from II to VI (Diggle, 2013). We did this analysis introducing for the first time in this context a crossvalidation technique to honestly estimate the goodness-of-fit of the resulting models.

(shown in different shades of violet). Layer I (green) is significantly different from other layers.

With the simulation and thinning process described in Section 2.2.3, we performed 198 *dense* RSA*global* simulations with a volume of 300 μm<sup>3</sup> and a density of 1.4 synapses/μm<sup>3</sup> (λ*global* = 1.4, a density greater than the density of any of the samples), i.e., each RSA*global* simulation had 420 synapses. For each sample *j* (test sample) in group *i* (we had a group consisting of layers II to VI), we calculated the synaptic density of its RSA*ij* model using the remaining samples of the same layer [Equation (4)]. **Table 3** shows the estimated intensity λˆ*ij* for each experimental sample. For each sample, we randomly thinned each of the 198 *dense* RSA*global* simulations until they had the estimated intensity λˆ*ij*. The sizes of the simulated synapses were calculated using the lognormal distribution fitted using Feret's diameters of all samples of the group. **Table 3** also shows these parameters. Note that μ*global* and σ*global* are equal because all these layers were modeled as a common RSA*global* process. **Figure 9** shows one *dense* RSA*global* simulation for the group of layers II to VI and two thinned RSA simulations for two different samples in the study.
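The thinning step can be sketched as below: a leave-one-out intensity is estimated from the other samples of the layer, and a dense simulation is then reduced to that intensity by keeping a uniformly random subset of its points. Equation (4) is not reproduced in this section, so the pooled count-over-volume form is an assumption. The dense pattern here is drawn uniformly at random purely as a stand-in for an RSA*global* simulation, and all names and counts are hypothetical.

```python
import random

def leave_one_out_intensity(counts, volumes, j):
    """Intensity pooled over all samples of a layer except sample j (assumed form)."""
    n = sum(c for k, c in enumerate(counts) if k != j)
    v = sum(vol for k, vol in enumerate(volumes) if k != j)
    return n / v

def thin_to_intensity(points, volume, target_intensity, rng):
    """Keep a uniformly random subset so that count / volume matches the target."""
    n_keep = min(len(points), round(target_intensity * volume))
    return rng.sample(points, n_keep)

rng = random.Random(1)
volume = 300.0            # um^3, as in the dense simulations described above
sides = (10.0, 6.0, 5.0)  # hypothetical box dimensions giving 300 um^3

# Stand-in for a dense simulation: 420 points, i.e., 1.4 points/um^3.
dense = [tuple(rng.uniform(0, s) for s in sides) for _ in range(420)]

# Hypothetical layer with three samples; sample 0 is the test sample.
target = leave_one_out_intensity([270, 255, 285], [300.0, 300.0, 300.0], j=0)
thinned = thin_to_intensity(dense, volume, target, rng)
```

Because the test sample is excluded when estimating its own target intensity, the thinned simulations are not tuned to the sample they are later compared against.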

We validated the RSA*ij* model with the test sample *j* using simulation-based envelopes. To do this, we used the function *envelope.pp3* included in the **spatstat** package. The *L* functions of sample 7 from layer III and sample 2 from layer IV touched the upper boundary of the envelope slightly at distances around 200–300 nm but did not lie outside the envelope. However, sample 1 from layer IV did lie just outside the envelope at distances around 300–400 nm (envelopes for all RSA*ij* models are shown in Supplementary material). The remaining samples were completely within the envelope. Thus, for all 23 samples in layers II to VI except sample 1 in layer IV, we did not reject the null hypothesis of RSA, i.e., we validated the hypothesis that the synaptic distributions of layers II to VI of the somatosensory cortex are different thinned versions of a common underlying RSA process.

#### **4. DISCUSSION**

Historically, spatial point processes have been more related to applications in which data collection tended to be costly (e.g., forestry). For this reason, the study of several independent samples as realizations of the same process was not usually considered. Recently, the field of replicated point patterns has been growing strongly since technological advances have simplified sampling, particularly 3D sampling. In fact, much of the research on replicated point patterns is related to biological issues, including applications to neuroanatomical data (Diggle et al., 1991, 2000; Baddeley et al., 1993; Wager et al., 2004; Jafari-Mamaghani et al., 2010; Burguet et al., 2011; Myllymäki et al., 2012; Burguet and Andrey, 2014). Indeed, neuroanatomical data in the form of spatial point patterns is fundamental for revealing the spatial architecture of the different brain regions at all levels of analysis, from light microscopy (e.g., spatial distribution of neurons) to electron microscopy (e.g., spatial distribution of synapses). In this paper, we performed an analysis in the context of replicated point patterns by exploiting the fact that we have been able to obtain a relatively large number of samples containing the spatial distribution of synapses in the neuropil from several layers of the rat cerebral cortex. Using the Diggle test (Diggle et al., 1991, 2000) we detected groups of replicates (groups of patterns considered as instances of the same process) whose spatial distribution was found not to be significantly different. Then we modeled these groups using a global RSA replicated spatial point process. In order to collect and explain the variability in each group's synaptic density, we introduced a thinning procedure in the global model. To honestly estimate the goodness-of-fit of the resulting models, we used for the first time in this context a cross-validation technique for models within each group of replicates.

**Table 3 | Estimated intensity λˆ*ij* for samples in layers II to VI using only the remaining samples of the same layer [Equation (4)].**

μ*global and* σ*global parameters are the same because layers II to VI form a group, and they were obtained using Feret's diameters of all samples of the group. We thinned RSAglobal simulations modeled with* λ*global* = *1.4 and parameters* μ*global and* σ*global until we reached the estimated intensity* λˆ*ij for each sample.*

Our results confirm the assumption that the spatial distribution of synaptic junctions in the neuropil is nearly random, with the only constraint that synapses cannot overlap in space—a scenario that can be modeled by an RSA process. This model had already been suggested for layer III synapses (Merchán-Pérez et al., 2014) and is now extended to all neocortical layers. We found that the spatial distribution of synapses in all samples of each layer can be described by RSA processes. We also found that the spatial distribution of synapses in the neuropil of layers II to VI follows a common underlying RSA process with different synaptic densities. Interestingly, the results showed that the synaptic spatial distribution in layer I is slightly different than in other layers, suggesting that, although an RSA process suitably fits layer I synaptic distribution, the repulsion in the spatial distribution of synapses in this layer is slightly higher than in the other layers.

Since the synaptic density in the cerebral cortex changes with age, e.g., Rakic et al. (1986, 1994); Bourgeois and Rakic (1993); DeFelipe et al. (1997), and we used P-14 rats, the conclusion of this study regarding spatial distribution may not be applicable at other time points during development. Note, however, that the spatial distribution of synapses follows the same pattern in different cortical layers in spite of significant differences in their synaptic densities. Furthermore, our preliminary results in the adult human cerebral cortex also suggest that the spatial distribution of synapses is nearly random (Blazquez-Llorca et al., 2013). Therefore, random spatial distribution of synapses is probably a common general pattern of cortical synaptic organization. Nevertheless, further studies in other cortical areas, species and ages would be necessary to verify these conclusions.

The assumption that the distribution of synapses in the neuropil of layers I to VI follows an RSA model with different intensities (synaptic densities) per layer has several interesting implications. First, the position of a given synapse in the neuropil is practically independent of the position of neighboring synapses, so they can be arbitrarily close to one another with the only physical constraint that they cannot overlap. Second, the density of synapses varies by layers and also locally. Importantly, early studies of the cerebral cortex proposed that the density of synapses was relatively constant throughout the cortical layers, as well as across different cortical areas and different species. This uniformity in synaptic density led O'Kusky and Colonnier (1982) to propose that it probably reflects the optimal number of synapses and that it may be due to some limiting metabolic or structural factor. However, most comparisons were only qualitative and not based on statistical analyses. It now appears that, using appropriate stereological counting methods (disector or size-frequency methods; see DeFelipe et al., 1999), there are significant differences in the estimated number of synapses per volume between certain layers in several species (reviewed in DeFelipe et al., 2002a). In this study, we also found using FIB/SEM that there may be significant differences between certain cortical layers. This method has the advantage that it provides the actual number of synapses per volume instead of estimations based on the analysis of single electron microscope images (Merchán-Pérez et al., 2009).

Our results showed no significant differences in the synaptic distribution between the different rats used in the study, and RSA processes properly described the spatial distribution of synapses in all cortical layers. This argues in favor of a common general principle of synaptic organization. However, the mean density of synapses across the six layers was significantly different, with the exception of layers I vs. V and layers II vs. III. This is an important observation in terms of connectivity, as these differences or similarities in density of synapses between layers may provide us with some fundamental rules to generate hypothetical circuits in order to gain a better understanding of cortical organization. This also means that, due to physical constraints, the volume of the neuropil that the dendritic tree of a given neuron occupies may vary depending on the density of neurons in the layer where this neuron is located. In turn, its chances of establishing synapses would be greater the more neuropil volume it occupies. This idea was put forward by Von Economo (1926) in his interpretation of Nissl's observation in terms of the evolutionary significance of the differences between species in cortical neuronal density (Nissl, 1898). Nissl observed that "in the mole and dog, cortical neurons were more crowded than in man." Von Economo proposed that the greater the separation between neurons, the richer the fiber plexus between them will be, increasing the chance for neuronal interactions. Thus, the larger separation of neurons in humans compared to other species could be construed as a sign of a greater complexity of the connections between neurons. Using this approach, several authors have identified an inverse relationship in the adult cerebral cortex between neuronal density and the number of synapses per neuron in different cortical areas/layers/species, but this principle does not appear to be generally applicable (DeFelipe et al., 2002a). Since in this study we found no significant differences in the density of synapses in layer I vs. V (the density of neurons in layer V is much greater than in layer I), or between layer II vs. III (the density of neurons in layer III is much less than in layer II; work in preparation), this principle does not appear to be applicable to the 14-day-old rat somatosensory cortex either. In this regard it is important to keep in mind that the dendrites present in the neuropil of a given layer belong to both local neurons and neurons located below and above that layer, as dendrites, particularly those of pyramidal cells, may cross several layers during their ascending course toward layer I, whereas their basal dendrites may invade the layer underneath. It follows that the number of synapses that a given neuron receives cannot be predicted solely on the basis of the synaptic density of the layer in which it is located.

Finally, the application of FIB/SEM to analyze the neuropil also revealed the existence of local variability in the synaptic density within each layer. This local variability would be the product of mere chance and can be explained (and modeled) by RSA processes. The between-layers variability, however, cannot be put down to chance, except possibly for the differences between layers I and V and between layers II and III. This would imply, as previously suggested (Merchán-Pérez et al., 2014), that spatial specificity in the neocortex is scale dependent. It is well known that at the macroscopic and mesoscopic scales the mammalian nervous system is a highly ordered and stereotyped structure, where connections are established in a highly specific and ordered way, like, for example, the connecting pathways of the visual system. Even at the microscopic level, it is clear that different areas and layers of the cortex receive specific inputs (Nieuwenhuys, 1994). At the ultrastructural level, however, our results seem to indicate that number and distribution of synapses follow a nearly random pattern. This could mean that, as the axon terminals reach their destination, the spatial resolution that they achieve is fine enough to find a specific cortical layer but not to make a synapse on a smaller target, such as a specific dendritic branch or dendritic spine within that layer. For example, axon terminals from a certain thalamic nucleus reach specific areas and layers of the cerebral cortex but, once there, they would form synapses randomly among their possible targets to a greater or lesser extent depending on particular classes of the postsynaptic neurons. 
For instance, studies by White and colleagues performed on the mouse somatosensory cortex found a specificity of synaptic connections by combining anterograde degeneration of thalamic axonal fibers with the retrograde transport of horseradish peroxidase to identify the projection sites of pyramidal cells (reviewed in White, 1989). They examined at the electron microscope level pyramidal cells projecting to ipsilateral cortical areas, to the thalamus and to the striatum and they found that each of these populations of pyramidal cells receives a characteristic proportion of their layer IV dendritic synapses from thalamocortical axon terminals. Corticothalamic cells receive the greatest number of thalamocortical synapses, corticocortical cells the next highest number, and corticostriatal cells the least. Therefore, at the synaptic scale, the specificity of connections would rely not on spatial cues but on other mechanisms such as molecular or activity-dependent cues.

#### **ACKNOWLEDGMENTS**

This work was supported by grants from the following entities: the Spanish Ministry of Economy and Competitiveness (grants TIN2013-41592-P to Laura Anton-Sanchez, Concha Bielza, and Pedro Larrañaga, BFU2012-34963 to Javier DeFelipe, and SAF 2010-18218), CIBERNED CB06/05/0066 to Javier DeFelipe, the Cajal Blue Brain Project (Spanish partner of the Blue Brain Project initiative from EPFL) to Laura Anton-Sanchez, Concha Bielza, Angel Merchán-Pérez, José-Rodrigo Rodríguez, Javier DeFelipe, and Pedro Larrañaga, and the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project) to Concha Bielza, Angel Merchán-Pérez, Javier DeFelipe, and Pedro Larrañaga.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnana.2014.00085/abstract

#### **REFERENCES**


versus non-specificity of synaptic connections. Remarks, main conclusions and general comments and discussion. *J. Neurocytol.* 31, 387–416. doi: 10.1023/A:1024142513991


Ripley, B. D. (1977). Modelling spatial patterns. *J. R. Stat. Soc. B* 39, 172–212.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 02 August 2014; published online: 26 August 2014. Citation: Anton-Sanchez L, Bielza C, Merchán-Pérez A, Rodríguez J-R, DeFelipe J and Larrañaga P (2014) Three-dimensional distribution of cortical synapses: a replicated point pattern-based analysis. Front. Neuroanat. 8:85. doi: 10.3389/fnana.2014.00085 This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Anton-Sanchez, Bielza, Merchán-Pérez, Rodríguez, DeFelipe and Larrañaga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The patterning of retinal horizontal cells: normalizing the regularity index enhances the detection of genomic linkage

#### *Patrick W. Keeley<sup>1,2</sup> and Benjamin E. Reese<sup>1,3</sup>\**

<sup>1</sup> Neuroscience Research Institute, University of California at Santa Barbara, Santa Barbara, CA, USA

<sup>2</sup> Department of Molecular, Cellular and Developmental Biology, University of California at Santa Barbara, Santa Barbara, CA, USA

<sup>3</sup> Department of Psychological and Brain Sciences, University of California at Santa Barbara, Santa Barbara, CA, USA

#### *Edited by:*

Julian Budd, University of Sussex, UK

*Reviewed by:*

Stephen Eglen, University of Cambridge, UK

Peter Gerard Fuerst, University of Idaho, USA

#### *\*Correspondence:*

Benjamin E. Reese, Neuroscience Research Institute, University of California at Santa Barbara, Santa Barbara, CA 93106-5060, USA e-mail: breese@psych.ucsb.edu

Retinal neurons are often arranged as non-random distributions called "mosaics," as their somata minimize proximity to neighboring cells of the same type. The horizontal cells serve as an example of such a mosaic, but little is known about the developmental mechanisms that underlie their patterning. To identify genes involved in this process, we have used three different spatial statistics to assess the patterning of the horizontal cell mosaic across a panel of genetically distinct recombinant inbred strains. To avoid the confounding effect of cell density, which varies twofold across these different strains, we computed the "real/random regularity ratio," expressing the regularity of a mosaic relative to a randomly distributed simulation of similarly sized cells. To test whether this latter statistic better reflects the variation in biological processes that contribute to horizontal cell spacing, we subsequently compared the genomic linkage for each of these two traits, the regularity index, and the real/random regularity ratio, each computed from the distribution of nearest neighbor (NN) distances and from the Voronoi domain (VD) areas. Finally, we compared each of these analyses with another index of patterning, the packing factor. Variation in the regularity indexes, as well as their real/random regularity ratios, and the packing factor, mapped quantitative trait loci to the distal ends of Chromosomes 1 and 14. For the NN and VD analyses, we found that the degree of linkage was greater when using the real/random regularity ratio rather than the respective regularity index. Using informatic resources, we narrowed the list of prospective genes positioned at these two intervals to a small collection of six genes that warrant further investigation to determine their potential role in shaping the patterning of the horizontal cell mosaic.

**Keywords: nearest neighbor, Voronoi domain, packing factor, retinal mosaic, QTL, recombinant inbred strain, haplotype, principal component analysis**

#### **INTRODUCTION**

The organizing principles by which neurons of a given type are distributed within a structure in the central nervous system have gone largely unexplored. The retina is the primary exception to this, where neuronal populations have been shown to be arranged in non-random distributions known as "mosaics" (Wässle and Riemann, 1978). The patterning present in these mosaics arises from local interactions between neighboring cells of the same type that prohibit close proximity, and can be simulated using minimal distance spacing rules constraining random distributions of cells (Eglen, 2006). While the biological processes that underlie these spacing rules have been elucidated, and may vary depending upon the type of neuron (Reese and Keeley, 2014), the molecular mechanisms responsible for their execution have only recently been addressed (Kay et al., 2012).

Several spatial statistics have been employed to study the orderliness of such retinal mosaics, including the analysis of nearest neighbor (NN) distances and Voronoi domain (VD) areas (Cook, 1996; Galli-Resta et al., 1997). The frequency distribution of these measures for many orderly retinal mosaics approximates a Gaussian distribution, whereas those derived from random simulations of cells have a more Poisson distribution. One commonly used shorthand for describing the "regularity" in such orderly distributions has been to determine the mean NN distance or VD area within a sampled field and divide it by the SD (Wässle and Riemann, 1978; Raven and Reese, 2002). Commonly described as the "regularity index," such computed ratios will be larger for Gaussian distributions relative to density-matched random distributions. In this manner, real retinal mosaics have been shown to be more regular than random distributions, and the magnitude of the regularity index is assumed to have some biological relevance for the orderliness in such mosaics.
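The regularity index described above is simply the mean of a set of measurements divided by their standard deviation. A minimal sketch follows; the function name and the toy distances are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text says only "the SD".

```python
import statistics

def regularity_index(values):
    """Mean divided by the sample standard deviation (NN distances or VD areas)."""
    return statistics.fmean(values) / statistics.stdev(values)

# Hypothetical NN distances in um for one sampled field.
nn_distances = [20.0, 22.0, 24.0, 26.0, 28.0]
ri = regularity_index(nn_distances)  # mean 24 divided by SD sqrt(10)
```

A tighter distribution of distances around the same mean gives a smaller SD and therefore a larger index, which is why more orderly mosaics score higher.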

We have argued elsewhere that the degree of regularity in a retinal mosaic should be assessed relative to a density-matched random distribution of similarly sized cells, rather than to a random distribution of points (Reese and Keeley, 2014), because the physical size of the cells constrains spatial positioning. As either density or soma size increases, so does the degree of regularity achieved by a random distribution of cells, and it is the difference from such a random simulation that should be critical for understanding the processes contributing to the formation of regularity. In the present study, we have explored this relationship explicitly, analyzing the regularity indexes of the mosaic of horizontal cells in the mouse retina across three strains of mice, the C57BL/6J strain, the A/J strain, and their F1 cross, each of which varies in the density of these cells. We show that by normalizing each regularity index relative to density-matched random distributions constrained by soma size, achieved by computing a "real/random regularity ratio," we enhance the differences between the strains that should more acutely reflect the biological processes contributing to this patterning.

A recent study demonstrated that variation in the NN regularity index for the mosaic of cholinergic amacrine cells across a panel of 25 genetically distinct recombinant inbred mouse strains can be mapped to a discrete genomic locus. This suggests that the regularity index of a mosaic reflects a biological process or processes at work. Indeed, that study identified a candidate genetic contributor that, when rendered non-functional, reduced the mosaic regularity of the neuronal population (Keeley et al., 2014b). In the present study, we examined the NN regularity index and VD regularity index of the population of horizontal cells across this same panel of recombinant inbred strains. We then sought validation of the above normalization procedure for the regularity index, asking whether the real/random regularity ratio showed heightened linkage to the variation in genotype across the strains. Finally, we compared the results to another measure of spatial patterning, the "packing factor" (Rodieck, 1991). Using all three measures, we demonstrate that the variation in the patterning of horizontal cells is associated with two genomic loci on Chromosomes (Chrs) 1 and 14.

#### **MATERIALS AND METHODS**

Adult retinas, between 1 and 3 months of age, were examined from the following strains: the C57BL/6J (B6/J hereafter) and A/J parental strains, the B6AF1 cross, and 25 strains of the AXB/BXA recombinant inbred strain-set. The data for these strains were derived from digitized images from a previous study examining the variation in horizontal cell number across these strains; details of the tissue harvesting, immunofluorescence, and microscopy are provided therein (Whitney et al., 2011). All retinal tissues harvested from mice were collected in accord with AVMA guidelines and under authorization by the Institutional Animal Use and Care committee at the University of California, Santa Barbara.

Each retina was sampled at four central and four peripheral locations surrounding the optic nerve head (i.e., two fields in each retinal quadrant). The sampled fields were 225,802 sq. μm in area, with an aspect ratio of 1:1.25, and had a total number of horizontal cells ranging from 93 to 413, depending upon the strain (mean = 205). The X,Y coordinates of every calbindin-positive horizontal cell were determined, from which we computed the NN distance and VD area for every cell in each field, excluding those cells along the border with uncertain NN distances or VD areas. The regularity index for each statistic was calculated for each field by dividing the mean NN distance or VD area by the SD. The eight regularity indexes for a given retina were then averaged to produce the average regularity index for a given animal (sampling only one retina per mouse), with multiple animals being sampled for each strain. The number of mice sampled in each strain is indicated in the histograms. For each real field, a random field, being 225,625 sq. μm in area (475 μm × 475 μm), matched in density and constrained by average soma diameter (9.1 ± 0.7 μm; mean ± SD), was generated and similarly analyzed, to permit a direct comparison to the regularity index that would be achieved from a random distribution of horizontal cells of the same density. The real/random regularity ratio was computed by dividing the regularity index for a given mouse by the average regularity index of its density-matched random simulations. Additionally, for each sampled field, the packing factor was calculated. A value on a bounded scale between 0 and 1, the packing factor describes the extent to which a mosaic approximates a hexagonal lattice, with a value of 0 representing a random mosaic of dimensionless points and a value of 1 representing a perfect lattice. The packing factor was calculated by dividing the effective radius derived from the density recovery profile by the theoretical maximum radius that could be achieved by a lattice of the same density (Rodieck, 1991).
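The NN regularity index and its real/random normalization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the border-exclusion rule (a cell is dropped when its NN distance exceeds its distance to the nearest field edge), the sequential rejection scheme used to enforce a minimum soma spacing, and the `n_sims` parameter are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def nn_distances(points, width, height):
    """Nearest-neighbor (NN) distance of each cell, skipping border cells.

    Assumed rule: a cell is skipped when its NN distance exceeds its distance
    to the closest field edge, because its true neighbor may lie outside.
    """
    kept = []
    for i, (x, y) in enumerate(points):
        nearest = min(
            math.hypot(x - px, y - py)
            for j, (px, py) in enumerate(points)
            if j != i
        )
        edge = min(x, y, width - x, height - y)
        if nearest <= edge:
            kept.append(nearest)
    return kept


def regularity_index(values):
    """Regularity index: mean divided by (sample) standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)


def random_hard_core_field(n_cells, width, height, soma_diameter, rng):
    """Place n_cells at random, rejecting any candidate position that lies
    closer than one soma diameter to an already placed cell."""
    points = []
    while len(points) < n_cells:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if all(math.hypot(x - px, y - py) >= soma_diameter for px, py in points):
            points.append((x, y))
    return points


def real_random_ratio(real_points, width, height, soma_diameter, n_sims=8, seed=0):
    """Regularity index of the real field divided by the average index of
    density-matched, soma-size-constrained random simulations."""
    rng = random.Random(seed)
    real_ri = regularity_index(nn_distances(real_points, width, height))
    random_ris = []
    for _ in range(n_sims):
        sim = random_hard_core_field(len(real_points), width, height, soma_diameter, rng)
        random_ris.append(regularity_index(nn_distances(sim, width, height)))
    return real_ri / statistics.mean(random_ris)
```

Calling `real_random_ratio(points, 475.0, 475.0, 9.1)` on the coordinates of one field would return a value above 1 whenever the real mosaic is more regular than its random counterparts. The same structure applies to the VD regularity index once Voronoi domain areas are substituted for NN distances.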

The variation in the regularity indexes, the real/random regularity ratios, and the packing factor across the recombinant inbred and parental strains was mapped to the variation in strain haplotype across the genome using the simple interval mapping tool of GeneNetwork<sup>1</sup>, yielding a likelihood ratio statistic (LRS) for assessing linkage between phenotype and genotype. GeneNetwork runs 2000 permutations of the strain data to compute suggestive (*p* < 0.63) and significant (*p* < 0.05) thresholds for the LRS as another means of assessing the relative probabilities that any quantitative trait locus (QTL) contains a causal gene contributing to the variation in each trait. GeneNetwork also runs 2000 bootstrap tests to determine the relative robustness of each QTL detected. Principal component analysis (PCA) was also performed in GeneNetwork to determine the eigenvector that best accounts for the variance across the NN real/random regularity ratio, VD real/random regularity ratio, and packing factor; before performing the PCA, GeneNetwork normalized the data such that the distribution of each trait had a mean of zero and a SD of one. The first principal component derived from this analysis was then used as a novel quantitative trait that was subsequently mapped. All datasets were deposited in the AXB/BXA phenotypes database of GeneNetwork under accession ID #10282 (horizontal cells, nearest neighbor regularity index), #10283 (horizontal cells, Voronoi domain regularity index), #10288 (horizontal cells, nearest neighbor real/random regularity ratio), #10289 (horizontal cells, Voronoi domain real/random regularity ratio), #10291 (horizontal cells, packing factor), and #10292 (horizontal cells, patterning PCA). All positional data are relative to the NCBI37/mm9 build of the mouse genome.
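The logic of permutation-derived thresholds can be illustrated with a simplified single-marker regression. The sketch below is not GeneNetwork's implementation: the function names, the 0/1 genotype coding, and the use of marker regression in place of interval mapping are assumptions made for illustration. It relies only on the standard result that, for a normal linear model, the likelihood ratio statistic equals n ln(RSS<sub>null</sub>/RSS<sub>marker</sub>), and it takes the suggestive and significant thresholds as the 37th and 95th percentiles of the genome-wide maximum LRS across permutations.

```python
import math
import random


def lrs(genotypes, phenotypes):
    """Likelihood ratio statistic for one marker under simple regression:
    LRS = n * ln(RSS_null / RSS_marker) = -n * ln(1 - r^2)."""
    n = len(phenotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    if sgg == 0 or spp == 0:
        return 0.0  # uninformative marker or constant trait
    r2 = min(sgp * sgp / (sgg * spp), 1.0 - 1e-12)  # guard against log(0)
    return -n * math.log(1.0 - r2)


def permutation_thresholds(markers, phenotypes, n_perm=2000, seed=0):
    """Shuffle strain phenotypes, record the genome-wide maximum LRS for each
    permutation, and read thresholds from that null distribution."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(lrs(m, shuffled) for m in markers))
    maxima.sort()
    suggestive = maxima[min(n_perm - 1, round(0.37 * n_perm))]   # p < 0.63
    significant = maxima[min(n_perm - 1, round(0.95 * n_perm))]  # p < 0.05
    return suggestive, significant
```

Here `markers` is a list of per-marker genotype vectors (one entry per strain, hypothetically coded 0 for a *B* allele and 1 for an *A* allele) and `phenotypes` holds the strain means of the trait being mapped. Because the thresholds depend on the trait values themselves, a reordering of the strains can shift them, as noted for the significant threshold in the Results.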

#### **RESULTS**

The population of horizontal cells exhibits substantial variation across different mouse strains, showing a nearly twofold variation in number. The B6/J strain contains nearly twice as many horizontal cells as does the A/J strain, while their F1 cross (B6AF1) falls almost equally between them (Raven et al., 2005a). **Figure 1** illustrates sample fields taken from each of these three strains, along with density- and size-matched random simulations for direct comparison. It is immediately apparent that the patterning in the real mosaics is distinct from those random simulations, their cells being more regularly distributed. Less obvious from the sample fields is any difference in their regularity, but if we compute the NN regularity index for multiple retinas from each strain, they appear to differ, with the lowest density A/J strain being the most regular, having an average NN regularity index of 5.30, followed by the B6AF1 strain, having a slightly lower regularity index of 5.00, while the B6/J strain, containing the greatest density of horizontal cells, had the lowest regularity index of 4.56 (**Figure 2A**). A similar difference was seen for the VD regularity index, as the A/J, B6AF1, and B6/J strains achieved average regularity indexes of 6.14, 5.62, and 5.15, respectively (**Figure 2B**).

<sup>1</sup>http://www.genenetwork.org

**Figure 1 |** ... matched random distributions constrained by soma size, are shown for direct comparison. Calibration bar = 100 μm.

The regularity indexes for density-matched, soma-size constrained, random simulations are also shown, for each of the same mice, in **Figures 2A,B** as de-saturated colored symbols. These random simulations have regularity indexes extending from ∼2.0 to 3.0, being more regular than theoretical random distributions associated with dimensionless points (Cook, 1996). Note though that these random distributions differ between the strains, climbing as a function of increasing cell density, due to the space-occupying nature of the cells (although to a lesser extent for the VD analysis). As a consequence, the differences in the patterning between the strains, relative to what they would achieve were they randomly distributed, must be under-appreciated when comparing their regularity indexes alone, particularly for the NN analysis. If, however, we normalize each regularity index by taking into consideration this density-dependency of the random simulations (dividing the former by the latter to compute the real/random regularity ratio), the strain differences become more conspicuous (**Figures 2C,D**).

To examine further the utility of this real/random regularity ratio, we have computed the regularity indexes for real and simulated random fields for each of the 25 recombinant inbred strains of mice of the AXB/BXA strain-set. The means and SE for each strain are summarized in **Table 1**. Their regularity indexes vary from 4.54 to 5.59 for the NN analysis and from 5.02 to 6.12 for the VD analysis (**Figures 3A,B**), with some strains having regularity indexes higher than the parental A/J strain or lower than the parental B6/J strain. As indicated above, however, these strains also vary in their average horizontal cell densities, and so the real magnitude of the differences in these regularity indexes, relative to random distributions of cells, is obscured. If those regularity indexes are normalized, as above, by computing the real/random regularity ratio (**Figures 3C,D**; **Table 1**), a conspicuous change is revealed in the strain distribution pattern, indicated by the arrows linking the bars in the histograms in **Figure 3**. This is most readily apparent by considering the relative positioning of the parental A/J strain in the NN analysis (green bar in **Figures 3A,C**), which has migrated to the higher extreme of this strain distribution pattern, there being one other strain with a higher real/random regularity ratio (the BXA2 strain). In short, the normalization has led to a re-ordering of the strains according to how regular they are relative to their density-matched random simulations.

While **Figures 2A,B** might suggest that the difference in regularity is related to density, **Figures 4A,B** show no such relationship between the NN or VD regularity index and average horizontal cell density across this entire collection of strains. While there are slight negative relationships between the regularity indexes and density, they are non-significant (NN regularity index vs. density, *r* = –0.36, *p* = 0.06; VD regularity index vs. density, *r* = –0.33, *p* = 0.08). Many strains with identical average densities have conspicuously different regularity indexes, making clear that the differences in regularity index between the parental and F1 strains shown in **Figure 2** should be independent of their differences in density. Such variation in regularity should reflect differences in the effectiveness by which horizontal cells space themselves apart. Normalizing the regularity index yields a reordering of the strains (**Figures 3C,D**) that should better portend the actions of biological processes underlying this variation in the regularity of the horizontal cell mosaic.

Further validation of this view is provided by mapping the variation in regularity across the 25 strains to the variation in haplotype composition across their genomes. **Figure 5A** shows the resultant whole genome map for the NN regularity index, indicating the presence of two QTL that each surpass the suggestive threshold defined by permutation testing (gray horizontal dashed line in **Figure 5A**), positioned at the distal ends of Chrs 1 and 14. At each of these loci, the presence of *A* alleles is associated with an increase in the NN regularity index. The QTL on Chr 1 is associated with an LRS score of 18.67, just below the significant threshold (pink horizontal dashed line in **Figure 5A**). The QTL on Chr 14 is associated with an LRS score of 13.51. **Figure 5B**, by contrast, shows the whole genome map produced using the NN real/random regularity ratio. Variation in this ratio trait also maps to the same pair of loci, but the locus on Chr 1 is now associated with an LRS score of 23.26 that far surpasses the significant threshold, the latter having declined slightly, both a consequence of the strain reordering. Minimal changes were observed with the linkage to the locus on Chr 14.

Variation in the VD regularity index across these strains also mapped to these same genomic loci on Chrs 1 and 14, with the presence of *A* alleles at each locus being associated with an increase in VD regularity index; the QTL on Chr 14, however, had an LRS score that was now higher than the QTL on Chr 1, the latter failing to reach the suggestive threshold (**Figure 5C**). Mapping the variation in the VD real/random regularity ratio increased the LRS scores for each of these QTL (from 13.04 to 14.41 for the peak on Chr 14; from 10.35 to 12.32 for the peak on Chr 1), elevating that on Chr 1 to surpass the suggestive threshold (**Figure 5D**). These results suggest that the real/random regularity ratio may be an effective means for comparing regularity across strains that exhibit a large variation in density.

The packing factor is another measure of spatial patterning, describing how well a mosaic approximates a hexagonal lattice, which takes into consideration the density of each field. We therefore asked whether the variation in packing factor across the recombinant inbred strains would correlate to either of the regularity indexes or regularity ratios, and whether this variation would map to the same genomic loci or might identify entirely novel ones. The packing factor, like the regularity indexes, varied between the parental strains, with the A/J strain having a higher degree of packing than the B6/J strain (0.35 vs. 0.32), as well as across the recombinant inbred strains, ranging from 0.31 to 0.36 (**Figure 6A**). The packing factor across the strains was positively, and significantly, correlated to both regularity indexes and ratios, though to a greater extent for the regularity ratios (**Figure 6B**; *r* = 0.69, *p* = 2.3 × 10<sup>−5</sup> for NN real/random regularity ratio; and **Figure 6C**; *r* = 0.70, *p* = 1.3 × 10<sup>−5</sup> for VD real/random regularity ratio), although this association was not as great as the correlation between the two regularity ratios themselves (*r* = 0.83, *p* = 4.3 × 10<sup>−9</sup>). This variation in packing factor across the recombinant inbred strains mapped to the same genomic locus on Chr 14, associated with an LRS score of 16.45 approaching the significant threshold (**Figure 6D**). A hint of the locus on Chr 1 was also detected, although the LRS score at this locus did not pass the suggestive threshold. As before, the presence of *A* alleles at each of these loci was associated with an increase in the packing factor trait.


**Table 1 | The averages and SE for the two regularity indexes, the two regularity ratios, and the packing factor for the two parental strains, the F1 strain, and the 25 recombinant inbred strains.**

The difference in this map and those achieved by using the regularity ratios as traits might suggest that the spatial measures of regularity vs. packing are each modulated somewhat independently by distinctive biological processes. Yet given the high degree of correlation between these three traits (the NN and VD regularity ratios and the packing factor), we wondered whether the differences in these whole genome maps reflected the actions of a few unusual strains; the BXA4 strain, for example, had the lowest packing factor of any strain analyzed, yet had regularity ratios that were in the top quarter of all strains (**Figures 6B,C**). We consequently performed PCA to determine if a single component could account for most of the variance observed across the three traits. Indeed, the first principal component accounted for over 80% of the total variance (**Figure 7A**), with each of the three traits contributing equally. The distribution of recombinant inbred strains along the first principal component is shown in **Figure 7B**. Whole genome mapping of this new quantitative trait revealed the two previously elucidated QTL on Chr 1 and Chr 14, with LRS scores of 18.15 and 16.00, respectively, with the former QTL crossing the significant threshold and the latter falling slightly beneath (**Figure 7C**). We conclude that both loci contain genetic variants that contribute to the difference in horizontal cell patterning across the recombinant inbred and parental strains, but that there is no basis for concluding that the two loci modulate distinctive biological processes, for instance, one that modulates local spacing vs. another that coordinates patterning across larger distances.
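The PCA step can be illustrated with a short sketch that standardizes each trait, forms the correlation matrix, and extracts the leading eigenvector by power iteration. This is an assumption-laden illustration rather than GeneNetwork's procedure: the function names are hypothetical, and the starting vector presumes that the traits are positively correlated, as they are here.

```python
import math
import statistics


def leading_eigen(corr, n_iter=200):
    """Leading eigenvector of a correlation matrix by power iteration, plus
    the proportion of total variance it explains (eigenvalue / trace)."""
    k = len(corr)
    vec = [1.0 / math.sqrt(k)] * k  # assumes positively correlated traits
    for _ in range(n_iter):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(k) for j in range(k))
    return vec, eigenvalue / k


def first_principal_component(traits):
    """traits: one list of strain means per trait, all of equal length.
    Returns the loadings, the proportion of variance, and per-strain scores."""
    z = []
    for values in traits:
        mean, sd = statistics.mean(values), statistics.stdev(values)
        z.append([(v - mean) / sd for v in values])  # mean 0, SD 1
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    loadings, proportion = leading_eigen(corr)
    scores = [sum(loadings[i] * z[i][s] for i in range(k)) for s in range(n)]
    return loadings, proportion, scores
```

As a consistency check, feeding `leading_eigen` the three pairwise correlations reported above (0.83, 0.69, and 0.70) gives a first component explaining roughly 83% of the variance with nearly equal loadings, in line with the "over 80%" and equal contributions described in the text.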

It is of further interest to note that most of the variation observed across the recombinant inbred strains could be attributed to the magnitude of the effects at these two loci, regardless of which trait was analyzed. For instance, the additive effect of *A* alleles at the loci on Chrs 1 and 14 upon the NN regularity ratio was 0.25 and 0.19, respectively (e.g., **Figure 5B**), their summed effects equaling 79% of the range in this ratio across the strains (**Figure 3C**; **Table 2**). The same summed additive effects for the VD regularity ratio equaled 73%, while that for the packing factor equaled 71% (**Table 2**). Such a large proportion of the variation in all three traits being attributed to only two genomic loci may explain the somewhat step-like (rather than smooth) progression in trait values across the strains shown in **Figures 3C,D** and **6A**, as well as that observed for the first principal component in **Figure 7B**.

A list of all candidate protein-coding genes at these two loci is presented in **Table 3**. For each gene, a summary of protein structure and function was obtained from the Uniprot database<sup>2</sup>, while genetic variants present between the two parental strains were obtained from the Wellcome Trust Sanger Institute's Mouse Genomes Project<sup>3</sup>. Top candidates had known functional roles in cell-to-cell communication, cytoskeletal rearrangement or transcriptional regulation, as well as genetic variants in regulatory regions (such as upstream, downstream, splicing, or untranslated regions), which may alter gene expression, and/or in protein-coding regions, which may affect protein sequence and function. The QTL on Chr 1 is extremely narrow, and of the genes at this locus, *Esrrg* and *Ush2a* were the most compelling (**Figure 7D**). *Esrrg* belongs to a family of constitutively active nuclear receptors that modulate transcription by binding to estrogen response elements, and is expressed in the nervous system, including the retina, throughout development (Hermans-Borgmeyer et al., 2000). *Ush2a* encodes the protein Usherin, mutations in which can lead to Usher syndrome, a developmental disease affecting both the visual and auditory pathways (Liu et al., 2007). Usherin is a single-pass transmembrane protein with a large extracellular domain that contains many fibronectin- and laminin-like domains, bearing similarity to Megf10 and Megf11, both of which have been shown to affect horizontal cell regularity (Kay et al., 2012). While *Ush2a* is thought to be expressed solely in photoreceptor cells in adult retinas, it is also expressed in prenatal development, potentially in developing horizontal cells. The QTL on Chr 14 encompasses a larger interval: top candidates include *Farp1, Dock9, Zic2,* and *Itgbl1* (**Figure 7E**). *Farp1* and *Dock9* both play a role in cytoskeletal dynamics, and could be potential regulators of horizontal cell movement, a suggested mechanism for achieving mosaic regularity. *Zic2* is expressed in the embryonic and early postnatal retina and has been shown to affect various processes, including retinal ganglion cell axon pathfinding and progenitor cell proliferation (Herrera et al., 2003; Watabe et al., 2011). Finally, *Itgbl1* encodes a secreted integrin-like protein that has several EGF-like repeat domains, which resemble those of the Megf proteins, although little is known about this protein aside from these structural similarities.

<sup>2</sup>http://www.uniprot.org/

<sup>3</sup>http://www.sanger.ac.uk/resources/mouse/genomes/

**Figure 5 |** ... of Chr 14. The yellow bars indicate the bootstrap analysis (being the proportion of bootstrap samples mapping to a given locus), assessing the robustness of the mapping to any genomic locus. **(B)** Whole genome map for the NN real/random regularity ratio, with all conventions as in **(A)**. Note that the LRS score associated with the peak of the QTL on Chr 1 has increased, while the significant threshold has declined. **(C,D)** Whole genome maps (following the conventions as in **A**) for the VD regularity index and VD regularity ratio reveal the same two QTL on Chrs 1 and 14. Note that the LRS scores associated with each QTL increased after mapping the regularity ratio.

#### **DISCUSSION**

The non-random distribution of like-type neurons within a structure has been considered a defining feature of neuronal populations (Cook, 2003), yet the molecular mechanisms that establish these "mosaics" are relatively unknown. Spatial statistics such as the regularity index and the packing factor can be used to describe the orderliness of these distributions. By treating these statistics as quantifiable traits, one can map such variation in spatial patterning across mouse strains to the variation in haplotype structure of their genomes. This, in turn, facilitates the pursuit of candidate genes and their variants that regulate cell patterning. The present study has adopted this approach, revealing two distinct genomic loci on Chrs 1 and 14 that control the patterning of the horizontal cell mosaic.

The regularity index, being either the mean NN distance or VD area divided by the SD of those values, has a lower bound defined by a theoretical random distribution of dimensionless points, but has no upper bound, as the variance in either measure approaches zero. While it effectively describes the spatial order of a two-dimensional point pattern, it has generally been used to indicate simply that a mosaic is more regular than a random point pattern. It has seen little comparative application other than demonstrations that the regularity index is altered by experimental or genetic manipulations. Interpreting such alterations, of course, requires a consideration of whether the manipulation also changes the number of elements in the mosaic, a common variable following such perturbations, a variable also observed across different strains of mice (Keeley et al., 2014a).
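The lower bound can be made explicit for the NN statistic. As a worked illustration (a standard result for a homogeneous Poisson point process, ignoring edge effects, and not a derivation taken from this study), let $D$ denote the density of dimensionless points. The NN distance $r$ then follows a Rayleigh distribution, and the resulting regularity index does not depend on density:

$$
f(r) = 2\pi D\, r\, e^{-\pi D r^{2}}, \qquad
\langle r \rangle = \frac{1}{2\sqrt{D}}, \qquad
\mathrm{SD}(r) = \sqrt{\frac{4-\pi}{4\pi D}}, \qquad
\frac{\langle r \rangle}{\mathrm{SD}(r)} = \sqrt{\frac{\pi}{4-\pi}} \approx 1.91 .
$$

Any space-occupying constraint, such as a minimum soma spacing, raises the index of a random simulation above this value, and increasingly so at higher densities, which is the density dependence that the real/random regularity ratio is meant to remove.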

In the present study, we consider directly the role that density plays in constraining a random simulation of horizontal cells, and how computing the regularity index without taking this into consideration underestimates the differences in regularity between mosaics. Indeed, we go on to show that, by calculating the real/random regularity ratio, we map more robust QTL with stronger linkage on Chrs 1 and 14. These results would suggest that the real/random regularity ratio, be it derived from the NN analysis or the VD analysis, more acutely discriminates strains by the actions of biological processes that space cells apart.

**Figure 7 |** ... **Figure 5A**. Both previously identified QTL, on Chrs 1 and 14, have high LRS scores, with the former passing the significant threshold and the latter ... 14, which were narrowed down to six top candidate genes based on bioinformatic analysis.

<sup>a</sup>From *Table 1*.

<sup>b</sup>From *Figures 5B,D* and *6D*.

#### **THE REAL/RANDOM REGULARITY RATIO**

This transformation of the data, creating the real/random regularity ratio, is not without its caveats and limitations. For instance, this ratio will by necessity change as a function of development: before the retina has achieved its adult size, but after the cells have approached their mature diameters, the regularity index of random simulations for these denser mosaics in smaller (younger) retinas will be larger, yielding lower real/random regularity ratios relative to more mature retinas, even when the patterning of the real distributions does not differ (e.g., see Raven and Reese, 2002, for a comparison of real distributions vs. random simulations of the horizontal cell mosaic in B6/J at 3 weeks of age). Likewise, some experimental or genetic manipulations that alter the patterning of a retinal mosaic also affect somal size substantially (Cantrup et al., 2012). In such instances, particularly where cell density and retinal area change as well, the calculation of the real/random regularity ratio would likely provide little additional insight into understanding the factors controlling nerve cell spacing.

In the present study, we have compared the patterning of horizontal cells across mature mouse retinas that show little variation in retinal area but conspicuous twofold variation in cell number, consequently yielding large differences in horizontal cell density (the slight differences in retinal area across the strains, like the slight differences in age, show no significant correlation with density, regularity index, regularity ratio, or packing factor). While horizontal cells are notoriously plastic (Poché and Reese, 2009), and can hypertrophy to in excess of twice their normal somal area, for example, in the absence of *Pten* (Cantrup et al., 2012), they do not exhibit any change in soma size across the present strains (**Figure 1**). We have, consequently, carried out this transformation of the regularity index where only cell density varies across the strains. That this transformation might be meaningful for understanding the biology of mosaic order is suggested by the reordering of the strains, yielding stronger linkage between phenotype and genotype (**Figure 5**). While it is true that the QTL on Chrs 1 and 14 were detected without transforming the data in this manner, in one of the cases (the QTL on Chr 1 for VD regularity index), it was sub-threshold and likely would have gone unexamined in the absence of the other analyses. The present results, therefore, would substantiate the principle that where density varies conspicuously in the absence of other differences in the population, correcting for this effect of density upon spacing should enhance detection of genomic linkage.

Across the recombinant inbred strains, computing the regularity ratio had a greater effect on reordering the strains in the NN analysis than the VD analysis (**Figure 3**), since random simulations for the latter measure varied less across fields with different densities (**Figure 2**). This fact, due to the relatively greater constraining effect of soma size upon the linear NN measure than upon the areal VD measure, along with the high correlation between the two measures of regularity, might lead one to believe that the VD regularity index is a sufficient and complete measure of mosaic patterning. However, despite the more modest changes in strain order between the VD regularity index and regularity ratio, the ratio statistic still increased the strength of linkage at each QTL. And while both regularity indexes were correlated, the QTL on Chr 1 was most prominent using the NN spatial statistic. These results suggest that both the NN and VD regularity ratios provide valuable information about horizontal cell regularity, and should be used in a complementary fashion.

#### **THE PACKING FACTOR**

We also compared the results obtained using the real/random regularity ratio with those from the packing factor analysis (Rodieck, 1991), which also varied across the strain-set. One attraction of the packing factor is that it has both a lower and an upper bound, ranging from 0 (a random distribution of dimensionless points) to 1 (being a perfect hexagonal matrix). Another is that this measure is normalized for density, as it is the ratio of the effective radius to the maximal radius permissible for a hexagonal lattice of identical density (Rodieck, 1991). We found that the variation in packing factor across the strains showed the strongest linkage on Chr 14, whereas the QTL on Chr 1 failed to cross the suggestive threshold. The packing factor, therefore, did not completely recapitulate the same genome maps that were generated using the real/random regularity ratios, particularly that for the NN analysis. It is important to bear in mind that the packing factor is not a measure of regularity (Rodieck, 1991); rather, it is a distinctive measure of patterning, and therefore might be indicative of distinctive biological processes that contribute to nerve cell patterning. Because the horizontal cell mosaic in the mouse retina does not evidence the presence of higher order patterning as found in a lattice, even with substantial jitter (Reese and Keeley, 2014), and since the packing factor was so strongly correlated with either regularity ratio, the present results would suggest that the measures of regularity and packing are assessing similar qualities present in these real mosaics. The PCA would support this conclusion. By using this PCA to reduce the dimensionality of the data, a single trait emerges, capturing most of the variation in patterning across the recombinant inbred and parental strains. Mapping the variation in this principal component, we recapitulated both genomic loci, each showing strong linkage (evidenced through permutation testing) and reproducibility (evidenced by bootstrap testing). Each locus must contain genetic variants that affect the patterning of horizontal cells, presumably by modulating their intercellular spacing at the local level.
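The density normalization built into the packing factor rests on the spacing of a perfect hexagonal lattice of the same density. As a small worked sketch (the function name is hypothetical, and we assume here that the "maximal radius" corresponds to the nearest-neighbor spacing of such a lattice), each lattice point occupies an area of (√3/2)·d², so a lattice of density D has spacing d = √(2/(√3·D)):

```python
import math


def hexagonal_spacing(density):
    """Nearest-neighbor spacing of a perfect hexagonal lattice.

    Each lattice point owns an area of (sqrt(3) / 2) * d**2 = 1 / density,
    so d = sqrt(2 / (sqrt(3) * density)).
    """
    return math.sqrt(2.0 / (math.sqrt(3.0) * density))


# Average field in this study: 205 cells in 225,802 sq. um.
mean_density = 205 / 225802            # cells per sq. um
max_spacing = hexagonal_spacing(mean_density)
print(f"{max_spacing:.1f} um")         # about 35.7 um
```

The effective radius obtained from the density recovery profile of a real field is then compared against this maximal value; Rodieck (1991) should be consulted for the exact form of that comparison.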

#### **GENETIC VARIANTS MODULATE HORIZONTAL CELL PATTERNING INDEPENDENT OF DENSITY**

The regularity of the horizontal cell mosaic is not significantly correlated with the variation in horizontal cell density, and the mapping of the variation in horizontal cell patterning does not coincide with the genomic locus on Chr 13 mapped by the variation in horizontal cell number or density (Whitney et al., 2011). Horizontal cells are known to interact with one another as they assemble their mosaics and initiate their differentiation (Raven et al., 2005b; Poché et al., 2008; Huckfeldt et al., 2009), and they use Megf10 and Megf11 to drive that assembly into regular distributions (Kay et al., 2012). The present QTL mapping exercise indicates that genetic variants on Chrs 1 and 14 should modulate this process. Of the candidate genes at these loci, the gene at the very peak of the QTL on Chr 1, *Ush2a*, is notable because of the structural similarities of the Usherin protein to the Megf proteins. While it is well-known for its structural role in the connecting cilium of developing photoreceptors, it is expressed in developing retina at very early stages (Huang et al., 2002), during the period when horizontal cells are being generated, and well before outer segment formation. Whether it is expressed transiently in developing horizontal cells remains to be seen. Additionally, the guanine nucleotide exchange factor Farp1 is a particularly intriguing candidate at the peak of the QTL on Chr 14 due to its role in dendritic morphogenesis (Cheadle and Biederer, 2012), as well as its interaction with Sema6A and PlexA4 proteins (Zhuang et al., 2009), both shown to be involved in horizontal cell development (Matsuoka et al., 2012). Future studies will examine the candidates at these two genomic loci in further detail to understand the genetic and molecular control of horizontal cell patterning in the developing retina.

#### **AUTHOR CONTRIBUTIONS**

Patrick W. Keeley and Benjamin E. Reese designed research; Patrick W. Keeley performed research; Patrick W. Keeley and Benjamin E. Reese analyzed data; Patrick W. Keeley and Benjamin E. Reese wrote the paper.

#### **ACKNOWLEDGMENTS**

We thank Christopher Do and Léa Tran-Le for assistance in computing the spatial statistics. This research was supported by a grant from the NIH (EY-019968).

#### **REFERENCES**

Cantrup, R., Dixit, R., Palmesino, E., Bonfield, S., Shaker, T., Tachibana, N., et al. (2012). Cell-type specific roles for PTEN in establishing a functional retinal architecture. *PLoS ONE* 7:e32795. doi: 10.1371/journal.pone.0032795


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; accepted: 22 October 2014; published online: 21 October 2014. Citation: Keeley PW and Reese BE (2014) The patterning of retinal horizontal cells: normalizing the regularity index enhances the detection of genomic linkage. Front. Neuroanat. 8:113. doi: 10.3389/fnana.2014.00113*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Keeley and Reese. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Statistical analysis and data mining of digital reconstructions of dendritic morphologies

#### *Sridevi Polavaram, Todd A. Gillette, Ruchi Parekh and Giorgio A. Ascoli\**

*Department of Molecular Neuroscience, Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA, USA*

#### *Edited by:*

*Hermann Cuntz, Ernst Strüngmann Institute in Cooperation with Max Planck Society, Germany*

#### *Reviewed by:*

*Corinne Teeter, Allen Institute for Brain Science, USA; Marcel Oberlaender, Max Planck Institute for Biological Cybernetics, Germany*

#### *\*Correspondence:*

*Giorgio A. Ascoli, Molecular Neuroscience Department, Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study, George Mason University, MSN 2A1, 4400 University Dr. Fairfax, VA 22030, USA e-mail: ascoli@gmu.edu*

Neuronal morphology is diverse among animal species, developmental stages, brain regions, and cell types. The geometry of individual neurons also varies substantially even within the same cell class. Moreover, specific histological, imaging, and reconstruction methodologies can differentially affect morphometric measures. The quantitative characterization of neuronal arbors is necessary for in-depth understanding of the structure-function relationship in nervous systems. The large collection of community-contributed digitally reconstructed neurons available at NeuroMorpho.Org constitutes a "big data" research opportunity for neuroscience discovery beyond the approaches typically pursued in single laboratories. To illustrate this potential and the related challenges, we present a database-wide statistical analysis of dendritic arbors enabling the quantification of major morphological similarities and differences across broadly adopted metadata categories. Furthermore, we adopt a complementary unsupervised approach based on clustering and dimensionality reduction to identify the main morphological parameters leading to the most statistically informative structural classification. We find that specific combinations of measures related to branching density, overall size, tortuosity, bifurcation angles, arbor flatness, and topological asymmetry can capture anatomically and functionally relevant features of dendritic trees. The reported results only represent a small fraction of the relationships available for data exploration and hypothesis testing enabled by sharing of digital morphological reconstructions.

**Keywords: L-Measure (RRID:nif-0000-00003), NeuroMorpho.Org (RRID:nif-0000-00006), neuroinformatics, dendritic topology, cluster analysis, cellular neuroanatomy**

#### **INTRODUCTION**

The diversity of neuronal morphologies can have broad and profound functional consequences in the nervous system, which have only begun to be understood. Dendritic geometry directly impacts (and mediates) computational processes such as signal integration, coincidence detection, and logical operations (London and Häusser, 2005). The location, orientation, and shape of neural arbors enable (and strongly affect) network connectivity, providing the anatomical substrate to investigate structure-function relationship at the circuitry level (Shepherd and Svoboda, 2005; Briggman and Denk, 2006; Kajiwara et al., 2008; Weiler et al., 2008; Burgalossi et al., 2011; Ropireddy and Ascoli, 2011; Brown et al., 2012). These areas of scientific investigation apply to the morphological differences observed both within and between neuron types across animal species, developmental stages, and brain regions (**Figure 1**).

Three-dimensional digital reconstructions of axonal and dendritic arbors, combined with neuroinformatics tools and computational approaches, allow considerable opportunities for data processing, analysis, and modeling at both cellular- and systems-level (Parekh and Ascoli, 2013). The open availability of these reconstructions in databases such as NeuroMorpho.Org (**Figure 2**) enables re-analysis of shared data (Ascoli, 2007). As of version 5.6, the repository contained over 10,000 reconstructions contributed by 120 laboratories from 21 species, 85 brain regions and 123 cell types, representing more than 240,000 hours of manual tracing. NeuroMorpho.Org users can browse the data by animal species, brain region, cell type, and contributing lab. The "search by" option can be used to select and combine specific metadata criteria (**Figure 2**, left panel top) from a drop-down menu of categories such as developmental stage, experimental condition, and reconstruction method. The morphometry search functionality (**Figure 2**, left panel bottom) allows users to find neurons matching any combination of more than 20 morphometric criteria. From the resulting summary list of neurons (**Figure 2**, middle panel), individual pages for each reconstruction can be retrieved, thus displaying related metadata, a link to the associated publication, and the pre-computed morphometrics (**Figure 2**, right panel). Each reconstruction is downloadable as the standardized version along with the original contributed version. The log files detailing the changes made during the standardization process are available for download as well. From the individual neuron pages, users can also launch an animation file and an interactive 3D viewer.

**FIGURE 1 | Sample of NeuroMorpho.Org reconstructions representing the anatomical diversity of dendritic and axonal trees.** Each image is labeled (clockwise from its right side) with the somatic brain region, neuron types, total arbor length, and arbor width. Somata: red; axons: gray; (basal) dendrites: green; apical dendrites: magenta. NeuroMorpho.Org IDs of these neurons from left to right: 06787, 04183, 04457, 06312, 05713, 04477, 00779, 06216, 00777, 05491, 00888, 06904, 06141, 06295, 07707, 07763, 00690, 00606.

**FIGURE 2 |** [...] Alternatively, reconstructions can be selected by a morphometric search (**left** [...] individual neuron page (**right panel**).

Quantitative morphometry of neuronal reconstructions is often used for shape analysis (Uylings and van Pelt, 2002; Van Ooyen et al., 2002; Rocchi et al., 2007), also in conjunction with biologically-inspired computational simulations (Ascoli et al., 2001; Van Ooyen, 2011). For example, statistical distributions of morphological features are used in stochastic growth algorithms for generating virtual trees (Van Pelt et al., 1997; Donohue and Ascoli, 2008; Koene et al., 2009; Evans and Polavaram, 2013; Memelli et al., 2013). Moreover, statistical analyses of neuronal reconstructions facilitate and support theoretical investigations. These studies for instance provided evidence for optimal wiring principles of neuronal arbors (Wen and Chklovskii, 2008) and their power law distributions, which may relate to synaptic input sampling (Lee and Stevens, 2007; Snider et al., 2010; Teeter and Stevens, 2011; Cuntz et al., 2012).

This study uses the L-Measure software tool (Scorcioni et al., 2008) to extract morphometric data from neuronal arbors for large scale statistical analyses of available data. L-Measure computes simple statistics of morphometric features as well as their frequency distribution and inter-dependence (e.g., how arbor length varies with path distance from the soma). This tool has been used in a broad range of applications, including multidimensional analysis of neuronal shape (Costa et al., 2010; Zawadzki et al., 2012) and comparative studies of sensory neurons in the fly (Ting et al., 2014) and of respiratory neurons in the pre-Bötzinger complex (Koizumi et al., 2013). In conjunction with L-Neuron (Ascoli and Krichmar, 2000), L-Measure has also been employed to generate and validate a large-scale model of the dentate gyrus with half a million neurons (Schneider et al., 2012). L-Measure has also enabled analysis of non-neuronal arbors such as arterial vasculature (Wright et al., 2013), and was integrated into other digital reconstruction and analysis systems, such as the Farsight toolkit (http://farsight-toolkit.org) and Vaa3D (https://code.google.com/p/vaa3d).

With the first successes in high-throughput automatic digital neuronal tracing (Chiang et al., 2011) and overall increasing volumes of published and shared reconstructions (Halavi et al., 2012), "big data" opportunities for knowledge mining are starting to emerge. On the one hand, this increasing availability of shared data may foster remarkable discoveries. On the other, the heterogeneous source of data and disparate experimental conditions also pose non-trivial challenges to database-wide analyses. As a step toward large database analysis, here we utilize exploratory data analysis to quantify morphological similarities and differences across broadly diverse dendritic arbors. In the process, we recognize several critical limitations when pooling together widely non-uniform data sets. Consequently, we propose selection criteria and methodological choices aimed to maximize the potential biological relevance of the results. With such a research design, dimensionality reduction and unsupervised clustering reveal tentative morphological relationships between specific neuron types involving branching density, topology, size, and tortuosity. At the same time, we identify the most delicate factors in both data and metadata that must be considered to optimize the impact of future large-scale morphological investigations.

#### **METHODS**

#### **SELECTION OF DATASETS AND MORPHOMETRIC FEATURES FOR ANALYSIS**

The entire pool of 10,004 reconstructions downloaded from NeuroMorpho.Org v5.6 was screened for a pre-determined set of inclusion criteria to improve interpretability of the results. Specifically, in order to be considered for analysis, digital neuron reconstructions had to (a) belong to the "control" experimental condition; (b) contain at least four dendritic bifurcations; (c) include branch-path information and not just bifurcation connectivity; and (d) have non-zero depth range (eliminating two-dimensional tracings). The 7,143 reconstructions matching these characteristics were analyzed by their NeuroMorpho.Org metadata assignments to specific animal species, brain region, and cell type. Subsequently, for the purpose of the chi-square testing of the cluster analysis (see below), groups of fewer than 40 neurons in any metadata combination of species, brain region, cell type, and lab of origin were excluded to ensure sufficient statistical power (Yates et al., 1999). This further selection reduced the number of reconstructions to 5,099, divided into 45 unique metadata groups.
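The two-stage screening described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Reconstruction` record and its field names are hypothetical and do not reflect the NeuroMorpho.Org schema or the authors' actual scripts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Reconstruction:
    # Hypothetical record; field names are illustrative only.
    condition: str          # experimental condition, e.g. "control"
    n_bifurcations: int     # number of dendritic bifurcations
    has_branch_paths: bool  # True if intermediate tracing points are present
    depth_range: float      # extent of the arbor along the depth axis
    group: tuple            # (species, brain region, cell type, lab)


def passes_inclusion(r):
    """Inclusion criteria (a)-(d) listed in the text."""
    return (r.condition == "control"
            and r.n_bifurcations >= 4
            and r.has_branch_paths
            and r.depth_range > 0)


def select_for_clustering(recs, min_group_size=40):
    """Keep included reconstructions whose metadata group is large enough."""
    included = [r for r in recs if passes_inclusion(r)]
    counts = Counter(r.group for r in included)
    return [r for r in included if counts[r.group] >= min_group_size]
```

In this reading, the first function corresponds to the reduction from the full download to the 7,143 analyzed reconstructions, and the second to the further reduction to the groups used in cluster analysis.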

Because of the major differences between axonal and dendritic morphology, and the remarkable abundance of reconstructed dendrites relative to axons, only dendritic arbors were included in this study. Focusing on a more consistent and comparable dataset allows addressing more biologically relevant questions. Moreover, this choice reduces the errors due to incomplete reconstructions, which are considerably more severe for projection axons than for dendrites.

L-Measure allows extraction of approximately 100 distinct features from each neuron (see http://cng.gmu.edu:8080/Lm for full listing and detailed definitions). Of these, all measures related to branch diameter were excluded due to their strong dependence on imaging resolution, optical magnification, and other experimental details causing excessive inter-laboratory variability (Scorcioni et al., 2004). All other features were subjected to cross-correlation analysis, and those with correlation greater than 80% were sequentially eliminated one at a time (re-running the cross-correlation at each step) as they were considered highly redundant with the rest of the features. This selection left 27 features (**Table 1**) that were used for the remainder of the analysis. Dendritic arbor size measures consisted of total length, number of tips, height, width, and depth. Bifurcation measures included average partition asymmetry as well as amplitude, tilt, and torque angles measured locally with the adjacent tracing points or remotely with the preceding and following bifurcations or terminations. Branch measures consisted of length, tortuosity, and fractal dimension. Lastly, local measures included branch order, terminal degree, path distance from soma, and helicity.

#### **Table 1 | Coefficients of variation of all L-Measure derived morphometric features.**


*A detailed description of each metric is provided at http://cng.gmu.edu:8080/Lm/help/index.htm.*

#### **PRINCIPAL COMPONENT ANALYSIS (PCA) AND CLUSTER ANALYSIS**

In order to reduce the dimensionality of the morphometric space for unsupervised clustering, PCA was run on the feature dataset using the "*prcomp*" routine in R (v. 2.15.1). This transformation rotates all extracted measures (27 features for 5,099 arbors) such that the first dimensions in the new space capture the most variance (in decreasing order). Prior to PCA, all features were normalized by their respective standard deviations, and the features with absolute skewness greater than unity (17/27) were log-transformed. Negatively skewed distributions were inverted and distributions with negative values were shifted prior to log-transformation. These steps ensure an approximately normal distribution of the input features, as assumed by PCA and subsequent clustering. The resulting first 17 components, accounting for 95% of the variance, were retained for cluster analysis.
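The preprocessing steps can be sketched as follows. Several details are assumptions rather than the authors' procedure: "inverted" is read as reflection about the maximum, the shift maps the minimum to 1, skewness is the population moment estimate, and the log-transformation is applied before standardization.

```python
import math
from statistics import mean, pstdev


def skewness(xs):
    """Population (moment) skewness of a list of values."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def log_transform_if_skewed(xs, limit=1.0):
    """Log-transform a feature whose absolute skewness exceeds `limit`."""
    g = skewness(xs)
    if abs(g) <= limit:
        return list(xs)
    if g < 0:
        # Assumption: reflect about the maximum so the long tail points right.
        top = max(xs)
        xs = [top - x for x in xs]
    # Assumption: shift only when needed, so that the minimum maps to 1.
    shift = 1.0 - min(xs) if min(xs) <= 0 else 0.0
    return [math.log(x + shift) for x in xs]


def standardize(xs):
    """Rescale to zero mean and unit standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]
```

Each feature column would pass through `log_transform_if_skewed` and then `standardize` before being handed to the PCA routine.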

Next, the dendritic arbors were clustered based on their principal morphometric components to seek a shape-based classification independent of *a priori* metadata grouping. We selected a model-based approach, in which mixtures of Gaussians are found that together have maximal likelihood to fit the data. A cluster is the collection of arbors that are most likely to come from the same multivariate Gaussian (referred to as a cluster model). We used the R "*MCLUST*" package (Fraley and Raftery, 2006) for estimating optimal model parameters and selecting the most likely model type given the dataset. The model types include spherical, ellipsoidal (with a diagonal covariance matrix), and ellipsoidal with orientation (indicating correlation between dimensions). This flexibility makes model-based clustering a more suitable choice than other popular methods (e.g., K-means) for analysis of heterogeneous data sets collated from different experiments, labs, and conditions. Not only are clusters not limited to fit spherically symmetric distributions, but also each cluster is allowed to have its own distinct variance, shape, and orientation.

MCLUST fits each candidate model by Expectation Maximization (EM) and selects among the fitted models using the Bayesian information criterion (BIC). The BIC is based on the log likelihood of the cluster model, but includes a penalty for the number of parameters weighted by the log of the dataset size. Thus, goodness of fit is balanced against model simplicity according to the following equation, whereby the largest value determines the best model:

$$\text{BIC} = 2 \cdot \ln \widehat{L} - k \cdot \ln \left( n \right) \tag{1}$$

Here, $\widehat{L}$ is the maximized likelihood computed on the marginal likelihood $P(y \mid M_i)$ of the candidate model $M_i$ given the observed data $y = (y_1, \ldots, y_n)$; $k$ is the number of free parameters to be estimated; and $n$ is the number of data points.

The specification of MCLUST model types and parameters is coded by one of three letters in each of three positions. The three positions represent the model size, shape, and orientation variables, respectively. Letter "E" indicates that the parameters are equivalent across all clusters, "V" signifies variable parameter values, and "I" denotes that the corresponding parameter is not applicable. For example, "EII" indicates spherical Gaussians (no shape or orientation) with equal size among clusters, which corresponds to the traditional K-means method. Similarly, the "VVV" model type indicates varying size, shape, and orientation parameters. This latter model was determined to be optimal for the data analyzed here, despite the greater BIC penalty implied by its larger number of free parameters. Thus, the BIC provides information theory-derived evidence that the performance of simple uniform spherical (K-means-like) clustering is sub-optimal for the data used in this study.
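To make the trade-off between fit and complexity concrete, the sketch below counts the free parameters of three of the model types and evaluates the BIC in the larger-is-better form, 2 ln L minus k ln n, which matches the convention of maximizing the BIC used in the text. The function names are illustrative, only three model types are covered, and the parameter counts follow the standard Gaussian mixture parameterization rather than anything reported by the authors.

```python
import math


def n_params(model, d, G):
    """Free parameters of a G-component Gaussian mixture in d dimensions."""
    mixing = G - 1              # mixing proportions sum to one
    means = G * d               # one mean vector per cluster
    if model == "EII":          # single shared spherical variance
        cov = 1
    elif model == "VII":        # one spherical variance per cluster
        cov = G
    elif model == "VVV":        # full covariance matrix per cluster
        cov = G * d * (d + 1) // 2
    else:
        raise ValueError("model type not covered by this sketch")
    return mixing + means + cov


def bic(log_likelihood, k, n):
    """BIC in the larger-is-better form: 2 ln(L) - k ln(n)."""
    return 2.0 * log_likelihood - k * math.log(n)
```

With 17 retained components and 6 clusters, `n_params` gives 108 free parameters for EII against 1,025 for VVV, so the VVV model must improve the log likelihood substantially to overcome its larger penalty.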

Cluster distances from the center of coordinates were measured by Z score to account for relative variance. Pairwise cluster distances were computed as the distances between the corresponding centers normalized by the cluster scatters, which are defined as averaged distance of the cluster points from the respective cluster center (Dunn, 1973). The associations among clusters and metadata groups were assessed using the chi-square test of independence, using the (marginal) frequencies of group and cluster occurrences to calculate the expected association matrix, and computing the Bonferroni-corrected *p*-values of the observed co-occurrences from the standardized residuals.
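The association test can be sketched as below. The exact residual definition used by the authors is not given, so this example assumes adjusted standardized residuals, a two-sided normal approximation for the per-cell p-values, and a Bonferroni factor equal to the number of cells in the contingency table.

```python
import math


def residual_p_values(table):
    """table: rows are metadata groups, columns are clusters, entries are counts.
    Assumes at least two rows and two columns with non-zero totals.
    Returns (z, p): adjusted standardized residuals and
    Bonferroni-corrected two-sided p-values, one per cell."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n_cells = len(row_tot) * len(col_tot)
    z, p = [], []
    for i, row in enumerate(table):
        z_row, p_row = [], []
        for j, observed in enumerate(row):
            # Expected count under independence of group and cluster.
            expected = row_tot[i] * col_tot[j] / n
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            zij = (observed - expected) / math.sqrt(var)
            raw = math.erfc(abs(zij) / math.sqrt(2))  # two-sided normal tail
            z_row.append(zij)
            p_row.append(min(1.0, raw * n_cells))
        z.append(z_row)
        p.append(p_row)
    return z, p
```

A positive residual marks over-representation of a group in a cluster and a negative residual marks under-representation; for a 45-by-6 table the Bonferroni factor in this sketch would be 270.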

#### **RESULTS**

#### **VARIABILITY OF DENDRITIC MORPHOLOGY AND COMPARISON BY METADATA**

To quantify the heterogeneity of the data, we computed the coefficient of variation (CV) for each of the 27 measured features over the entire set of 7,143 neurons as well as over the subset of 5,099 neurons used in cluster analysis (**Table 1**). Tortuosity, fractal dimension, and tilt angle are the least variable features, with a CV of less than 10%. In contrast, size measures are the most variable, with a CV close to or greater than unity. This apparent distinction between "local" (branch-level) vs. "global" (neuron-level) features may reflect both the effect of biological constraints (e.g., varying dimensions of different species from insects to humans) and experimental conditions (slice vs. whole-animal preparations). Most other metrics display intermediate CV values.
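As a reminder of the quantity being compared, the CV is the standard deviation divided by the mean. The minimal sketch below uses the sample standard deviation, which is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(xs) / mean(xs)
```

Because the CV is dimensionless, it allows features measured in different units, such as angles in degrees and lengths in micrometers, to be compared on the same scale.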

Dendritic morphologies were then compared across species, cell types, and brain regions. The corresponding metadata information for each reconstruction in NeuroMorpho.Org was organized hierarchically (**Figure 3**), forming groups with a sufficient number of neurons to enable statistical comparison of the results (at least 55 for species, 300 for brain regions, and 100 for cell types). Groups with fewer reconstructions were combined into "others" together with the reconstructions missing the detailed metadata information at the corresponding level of the hierarchy (marked as "not reported" in NeuroMorpho.Org).

The "leaf" nodes in each of the three metadata hierarchies (12 for species, 14 for brain regions, and 10 for cell types) were compared with a selection of representative morphometric features (**Figure 4**). In a limited set of cases, individual groups could be distinguished from the rest or from each other. For example, blowfly and cat reconstructions stood out against the neurons of all other species for their large topological asymmetry and Z span, respectively. The dendritic arbors of magnopyramidal cells tended to have extensive total length but low fractal dimension, whereas granule cells displayed opposite characteristics. At the same time, most groups show extensive overlap of their morphometric variance, preventing firm statistical conclusions. Part of the reason for such broad distributions is likely due to the non-uniform nature of archive-wide data sets pooled together across different experiments and laboratories. It is also clear that these metadata dimensions are not mutually independent because of evolutionary constraints (e.g., bony fishes lack a neocortex) and the finite sample of reconstructions (e.g., all monostratified ganglion cells came from the mouse retina). More generally, while popular in comparative anatomy, such a pairwise approach lacks the ability to reveal multivariate effects that are unavoidable given the nonrandom association between metadata groups and experimental conditions.

#### **EXTRACTING PRIMARY MORPHOLOGICAL FEATURES BY PCA AND CLUSTER MODELS**

In order to overcome the above limitations, we adopted an unsupervised clustering approach following dimensionality reduction with PCA. The first step is to reduce the initial parameter space to fewer orthogonal dimensions capturing most of the data variability. In mathematical terms, PCA identifies the linearly independent combinations of variables ordered by the amount of variance they explain. From the (27) original morphometric features, the first 17 dimensions of PCA covered 95% of the data variance and were used for cluster analysis.
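The choice of how many components to retain can be sketched as a cumulative sum over the per-component variances. The function name and the example values in the usage note are illustrative and are not the variances obtained in this study.

```python
def n_components_for(variances, target=0.95):
    """variances: per-component variances, sorted in decreasing order.
    Returns the smallest number of leading components whose cumulative
    share of the total variance reaches `target`."""
    total = sum(variances)
    running = 0.0
    for k, v in enumerate(variances, start=1):
        running += v
        if running / total >= target:
            return k
    return len(variances)
```

For instance, `n_components_for([4, 3, 2, 1], target=0.6)` returns 2, because the first two components together hold 70% of the total variance while the first alone holds only 40%.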

The first 6 of these principal components were responsible for three quarters of the variance and displayed distinctive compositions of their primary morphometric features (**Table 2**). Identifying the heaviest contributors in the linear combination of morphometric features of each principal component ("loadings") is useful to aid subsequent interpretation of the results. The first component (PC1) is positively loaded on bifurcation angles and negatively on branch path length, reflecting high branching density. The morphometric features most descriptive of PC2 and PC3 are respectively overall size and branch tortuosity. Together, the first three components capture the majority of the data variance. The simplest morphological descriptors of PC4, PC5, and PC6 are arbor flatness (related to torque angle), fractal dimension (or "space filling"), and topological asymmetry (the average normalized sub-tree partition at bifurcation points), respectively.

In order to produce the most informative statistical model, unsupervised clustering selects the optimal number of clusters as well as their parameters, by maximizing the BIC. These data were best fit to six clusters with varying size, shape, and orientation (**Figure 5**). The numerical difference between this model and the variant with constant cluster shape, however, was minimal (and is undetectable in **Figure 5A**). The same model type, moreover, performed nearly as well with five or seven clusters as indicated by the absence of a clear peak in the BIC plot. We experimented with these alternative model variants and numbers of clusters and found no substantial differences in findings. At the same time, the data were *not* adequately described by traditional spherical clusters, even if with unequal sizes (**Figure 5A**).

Since six clusters correspond to the maximum value for both top model types, we selected this number as the most suitable for exploratory analysis. Such a choice, nevertheless, should not be taken to reflect a ground truth that only six "true" classes exist within the data. This selection simply maximizes the inter-similarity of co-clustered classes relative to classes in other clusters given the scope, size, quality, and composition of the available dataset. To determine if further differences exist between classes that associate with the same cluster, it would be appropriate to run the same analysis on a subset of the data (sub-clustering). This additional analysis, however, requires larger datasets to meet the selection criteria based on a minimum number of reconstructions in each dataset.

**FIGURE 4 | [...] features within each main metadata dimension.** Crosshairs represent medians and quartile ranges of each group corresponding to the leaf nodes in the hierarchies shown in **Figure 3**. Dotted lines indicate "other" groups with merged data. **(A)** Differences in arbor depth and topological asymmetry among species. **(B)** Differences in arbor width and average bifurcation angle among brain regions. **(C)** Differences in fractal dimension and total arbor length among cell types.

#### **Table 2 | Primary morphometric loading (with absolute values of 0.25 or higher) of the first six principal components of the dendritic arbors used in cluster analysis.**

The two-dimensional data projection on the first and second components illustrates the relative discrimination of clusters by branching density and arbor size (**Figure 5B**). Cluster ranking by variance-normalized distance from the center of coordinates (**Figure 5C**) allows for focused analysis on clusters farther from the origin (*a–d*), and thus morphologically distinctive, relative to those closer (*e* and *f*) to the origin. The six clusters contain respectively 585 (*a*), 1488 (*b*), 762 (*c*), 555 (*d*), 818 (*e*), and 891 (*f*) reconstructions. Pairwise distances (**Figure 5D**) reveal that one and the same cluster (*b*) is both the farthest from cluster *a* and the closest to cluster *e*.

**FIGURE 5 |** [...] ellipsoidal clusters. Among those, the models accounting for unequal orientation (EEV, VEV, and VVV) performed better, especially with unequal size (VEV and VVV). The highest BIC value was attained at 6 clusters with varying size, [...] distances normalized by the corresponding scatters. Farthest distances are in green and nearest are in red.

#### **STATISTICAL ASSOCIATIONS BETWEEN CLUSTERS AND METADATA COMBINATIONS**

Unsupervised cluster models segregate neuronal reconstructions solely based on morphological features. This classification is thus complementary to, and independent of, the metadata associated with each reconstruction. The correspondence between the six morphological clusters and the 45 unique metadata groups characterized by species, brain region, neuron type, and lab of origin can shed light on the most important morphometric signatures of each metadata group. The 45-by-6 chi-square contingency matrix (**Table 3**) reports the probabilities that the observed over- and under-representations of associations between morphological clusters and metadata groups would be due to chance if the observed numerical compositions of each cluster and group were independent of each other. For example (first data row in **Table 3**), pyramidal neurons from mouse primary somatosensory cortex in Smit-Rigter's archive are significantly over-represented in cluster *a* (*p* < 0.0002 = 10<sup>−3.73</sup>) and significantly under-represented in cluster *b* (*p* < 0.001 = 10<sup>−3.05</sup>). In contrast, the proportion of these same neurons in cluster *d* is within the range expected from the sizes of this metadata group and morphological cluster.

Interestingly, each and every metadata group is over-represented in, and thus associated with, at least one of the six morphological clusters. The majority (38/45) are associated with exactly one cluster, and all of the remaining (7/45) are each split between just two clusters. Most possible metadata/cluster pairs deviated significantly from the random distribution expected from the "null hypothesis": 53 out of 270 were significantly over-represented and 87 out of 270 significantly under-represented. This overall partition of metadata groups in distinct clusters constitutes a remarkable outcome for a fully unsupervised classification method. Certain metadata groups are over-represented in one morphological cluster and under-represented in all other clusters, such as ganglion cells from mouse retina in Masland's archive (cluster *a*) and pyramidal cells from human prefrontal cortex in Jacobs' archive (cluster *b*). Other metadata groups are over-represented in one morphological cluster, but otherwise scattered throughout all other clusters per the respective numerical abundance, such as pyramidal cells from monkey frontal lobe in Luebke's archive (cluster *d*) and motoneurons from rat brainstem in Cameron's archive (cluster *e*).

*The Bonferroni adjusted p-values obtained by the chi-square test of independence are converted for ease of comparison into log<sub>10</sub> values, inverting the sign for over-represented (green) cells. The color gradient shows the interaction strength. Non-significant (p > 0.05) associations are indicated with NS.*

Several observations can be made that transcend individual archive identities. All rodent retinal ganglion cell groups are associated with cluster *a*, whereas fish and salamander retinal ganglion cell groups are associated with cluster *f*. The relative cluster positions in the first two principal components and the corresponding morphological loadings (**Figure 5B** and **Table 2**) suggest that the retinal ganglion cells are larger and more densely branched in rodents than in non-mammals. Neocortex pyramidal cell groups are distributed across all clusters, with preference mostly based on species (most notably, human in *b*, rodents in *c*, and monkey in *d*). All rodent non-cortical and non-pyramidal cell groups are found in cluster *f* (along with salamander and fish retinal ganglion cells). Such metadata heterogeneity, together with this cluster's minimal distance from the morphological center (**Figure 5C**), suggests a putative "catch-all" role for cluster *f*, which makes it broadly representative of the whole dataset.

In several cases, the split of a metadata group into two morphological clusters reflects previously reported relations. For example, three groups of pyramidal neurons from the (anterior, middle, and posterior) human insular gyrus in the Jacobs' archive divided between clusters *b* and *e* according to structural differences related to the subject's gender (Anderson et al., 2009). Similarly, mouse primary somatosensory pyramidal cells are overrepresented in both clusters *a* and *d*, consistent with the reported differences between young and adult animals (Smit-Rigter et al., 2012). The grouping of neurons from younger mice with retinal ganglion cells (in cluster *a*) and from the older mice with pyramidal cells of larger mammals, such as monkey, elephant, and human (in cluster *d*), could be expected since the former groups are characterized by the shortest branch path length and the latter groups by the largest. The scattered clustering of pyramidal neurons, however, does not necessarily reflect existing biological relations, but might rather result from the combination of the choice of analysis algorithms, selection of parameters, and experimental differences.

The other splits of metadata groups between two clusters (**Table 3**) similarly revealed differences likely due to experimental procedures, such as staining protocol or slicing direction, which were not recognized in the original reports (Anderson et al., 1995; Soloway et al., 2002; Goldberg et al., 2003; MacLean et al., 2005; Nikolenko et al., 2007; Woodruff et al., 2009). For example, the separate clustering of different mouse S1 pyramidal cell datasets can be explained by the differences between intracellular biocytin injection (e.g., Yuste's archive) and bulk Golgi staining (e.g., Brumberg's archive). While the mechanisms underlying the different visualization by these techniques are not yet fully understood (Thomson and Armstrong, 2011), the histological labeling information is available as metadata in NeuroMorpho.Org, thus aiding interpretation.

A complementary way to examine the associations between morphological clusters and metadata groups is to systematically analyze the composition of each cluster in terms of its associated groups, broken down by fraction of group, fraction of cluster, and neuron count (**Table 4**). For example (first data row in **Table 4**), 33% of the mouse S1 pyramidal cells from the Smit-Rigter archive are in cluster *a*, accounting for only 3% of this cluster (17 out of 560 neurons). The sums of cluster fractions in **Table 4** correspond to the proportion of neurons in each cluster (e.g., 97% for cluster *a*) made up by the cluster's associated metadata groups (green entries in **Table 3**). The remaining portions of the clusters are composed of neurons falling outside of their associated cluster. Notably, the blowfly tangential cell group is associated with cluster *a*. Moreover, clusters *b* and *c* are exclusively associated with human pyramidal cell groups (in which only basal dendrites are reconstructed) and rodent neocortex cell groups, respectively.

#### **PAIRWISE MORPHOMETRIC COMPARISONS OF NEURON GROUPS IDENTIFIED BY CLUSTER ANALYSIS**

Exploratory inspection of neuronal clusters in the 6-dimensional space of principal morphometric components together with the association between clusters and metadata groups (**Tables 3**, **4**) suggested closer inspection of specific morphological features in selected pairs of neuronal groups defined by their species, brain region, and cell type. The first example pertains to rodent retinal ganglion cells (**Figure 6**), which are characterized by high branching density and related morphological features (e.g., wide bifurcation angles). These neurons, pooled from mice and rats in four different archives, constitute 80% of cluster *a*, the farthest away from the center (**Figure 5C** and **Table 4**). At the opposite end along the first principal component is cluster *b*, entirely made of human pyramidal basal dendrites. Visual inspection (**Figure 6B**) reveals the distinctive shapes of rodent ganglion cells and human basal dendrites. Statistical analysis of the two main morphological loadings of PC1 (bifurcation amplitude and branch path length) confirmed the considerable difference between these two neuron groups, even when including those found in clusters other than *a* and *b* (**Figure 6C**).

The second most prominent group in cluster *a* is constituted by blowfly tangential sensory neurons. These neurons share with the rodent ganglion cells not only comparable branching density properties captured by PC1 (low branch path length and high bifurcation angle), but also similar distributions on PC2 through PC5 and all corresponding morphological features loading on those dimensions. These include measures of size (e.g., total dendritic length and spanned volume), of space filling (fractal dimension and tortuosity), and of arbor planarity (torque and tilt angles). Such tight alignment on the first five principal components along with the morphological co-clustering suggests a structural basis for the functional commonalities between blowfly tangential cells and retinal ganglion cells, both of which process motion-sensitive visual information (Kong et al., 2005; Cuntz et al., 2008).

Nevertheless, rotation on the sixth principal component exposed a surprising and nearly perfect separation between retinal ganglion cells and blowfly tangential cells (**Figure 6A**). Since the main morphological feature loading on PC6 is topological asymmetry (the average partition of terminal degree over all bifurcations), we compared the distribution of this measure between the two neuron classes (**Figure 6D**). This analysis demonstrated that blowfly tangential neurons have much more asymmetric bifurcations than ganglion cells (and most typical neurons). Interestingly, the data projection over the first and sixth principal components (**Figure 6A**) also suggested a linear relationship between topological asymmetry and branching density in rodent retinal ganglion cells but not in other groups. The Pearson correlation coefficients for branching density and asymmetry index (*R* = −0.50) and for bifurcation amplitude remote and asymmetry (*R* = 0.51) are both statistically highly significant (*p* < 10<sup>−10</sup>).
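To illustrate the asymmetry measure discussed above, the following Python sketch computes a mean partition asymmetry for a binary tree. It assumes the commonly used definition in which a bifurcation whose two subtrees carry r and s terminal tips contributes |r − s| / (r + s − 2), with a value of 0 when r = s = 1. The text does not spell out the formula, so this definition, the nested-tuple tree encoding, and the function names are assumptions made for the example rather than the authors' implementation.

```python
def tips(tree):
    """Number of terminal tips; a terminal branch is encoded as None."""
    if tree is None:
        return 1
    left, right = tree
    return tips(left) + tips(right)

def partition_asymmetries(tree, out=None):
    """Collect the partition asymmetry of every bifurcation in the tree."""
    if out is None:
        out = []
    if tree is not None:
        left, right = tree
        r, s = tips(left), tips(right)
        # Assumed convention: asymmetry is 0 when both subtrees are single tips.
        out.append(0.0 if r + s == 2 else abs(r - s) / (r + s - 2))
        partition_asymmetries(left, out)
        partition_asymmetries(right, out)
    return out

def mean_asymmetry(tree):
    """Average over all bifurcations (the tree must contain at least one)."""
    values = partition_asymmetries(tree)
    return sum(values) / len(values)

# Two hypothetical four-tip trees: fully balanced versus maximally one-sided.
balanced = ((None, None), (None, None))
one_sided = (None, (None, (None, None)))

balanced_value = mean_asymmetry(balanced)
one_sided_value = mean_asymmetry(one_sided)
```

Under this convention the balanced tree scores 0 and the one-sided tree scores 2/3, so higher values correspond to the more asymmetric branching attributed to the blowfly tangential cells.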

Rotating the data along the first and third principal components (related to branching density and tortuosity, respectively) revealed another distinct relationship across pyramidal cells from different species, brain regions, and developmental stages (**Figure 7**). Specifically, neocortical pyramidal cells from rodents (cluster *c*) and primates (cluster *d*) display a trend of increasing branch tortuosity with increasing branch density (**Figure 7A**). Visual examination of morphologies selected from the corresponding clusters in the PC1-PC3 scatter plot demonstrates a correspondence in the increase of branch density and branch tortuosity (**Figure 7B**). The least tortuous trees, and many of the primate neurons, are noted to be incomplete reconstructions, in which only dendrites proximal to the soma are traced. In contrast, the dendrites of rodent neocortical pyramidal neurons tend to be fully reconstructed in both apical and basal arbors.

#### **Table 4 | Continued**

*Associations between metadata groups and morphological clusters are quantified as fraction of the group, fraction of the cluster, and absolute neuron count of group/cluster intersection. Within cluster, groups are arranged in ascending order of the group fraction.*

(top), blowfly tangential cells (middle), and human basal pyramidal cells (bottom). NeuroMorpho.Org IDs of these neurons from left to right: 06464, 05352, 05405, 06652, 01895, 06640, 03723, 03724, 03722. **(C)** Rodent ganglion cells have larger amplitude angles compared to human basal pyramidal cells (and most other cell classes). **(D)** Rodent ganglion cells also display shorter branch length, corresponding to higher branching density. **(E)** The blowfly neurons, while sharing similar branch path length and amplitude angles with the retinal cells, have higher topological asymmetry.

#### **CRITICAL ASSESSMENT OF POTENTIAL CONFOUNDS**

In the course of the iterative process of data inspection, hypothesis formulation, research design, and quantitative analysis, we encountered numerous challenges pertaining to data validation, curation, and standardization across labs. After a preliminary exploration of the entire content of NeuroMorpho.Org, we decided to include in our study only approximately half of the available neurons. Specifically, we chose to avoid multi-lab analysis of axons, because of the extreme dependence of axonal morphology on experimental conditions. In our early analysis attempt that did not segregate axons from dendrites, biological findings became practically impossible to disentangle from major artifacts. This selection in effect defines a minimal set of requirements for reliably comparing neural arbors.

Moreover, we excluded measures related to branch diameter (branch power ratios, surface areas, occupied volume, etc.) due to their strong sensitivity to inter-laboratory variability in labeling or staining, imaging resolution or optical magnification, and other experimental details affecting tracing conditions (Scorcioni et al., 2004). Furthermore, most reconstructed cells originate from preparations in acute brain slices (*in vitro*). In the primary somatosensory region of rat neocortex (S1), this common preparation may result in trimming off more than 50% of the dendritic arbor (Oberlaender et al., 2012). These slicing artifacts impact larger brains to a greater extent, as reflected by the fact that human cells are only represented by basal dendrites. In addition to species differences, trimming effects also depend on animal age, slicing thickness and orientation, and the depth of electrode penetration in the tissue. For these reasons, when mining the cluster analysis results, we took particular care to report as "biological" (**Figures 6**, **7**) only findings that were not based on size or on any morphometrics significantly affected by trimming artifacts. Instead, we identified correlations based on measures such as branching density, tortuosity, and branch angles, all of which have been previously found to be consistent between *in vitro* and *in vivo* preparations (Pyapali et al., 1998).

On the one hand, this judicious design allowed the independent reproduction of findings reported in prior publications. These included several cases of "split metadata groups" into two morphological clusters, which reflected structural differences related to the subject's gender (Anderson et al., 2009) or developmental stage (Smit-Rigter et al., 2012). On the other hand, experimental artifacts still contributed to clustering, and other splits of metadata groups between two clusters (**Table 3**) revealed differences likely due to staining protocol or slicing direction, which were not recognized or discussed in the original reports (Anderson et al., 1995; Soloway et al., 2002; Goldberg et al., 2003; MacLean et al., 2005; Nikolenko et al., 2007; Woodruff et al., 2009). Thus, database-wide analyses can reveal potential confounds not easily pinpointed in individual studies.

One of the most common artifacts of tissue processing is shrinkage, and this factor is also highly variable among labs. Shrinkage differentially affects the slice planar and perpendicular dimensions (the latter typically producing a larger effect).

**linear relationship between PC1 and PC3. (A)** The majority (71%) of cluster *c* consists of rodent cortical pyramidal cells, whereas a similar proportion of cluster *d* (72%) corresponds to primate pyramidal cells, which tend to be only partially reconstructed. **(B)** Sample images of incomplete primate pyramidal cells in the top row (1–4) and rodent cortical pyramidal cells in the bottom (5–8). The numbers indicate their corresponding position in the cluster plot illustrating the progressive increase in branching density and tortuosity in both clusters. The NeuroMorpho.Org IDs of these neurons from left to right: 01821, 01526, 01627, 01623, 09630, 09474, 02569, 00266.

Thinner slices tend to shrink more and so do preparations from younger animals. The duration of the experimental procedure may also impact shrinkage, as do the bathing and embedding media. Shrinkage can be measured in all dimensions and it can therefore be compensated for by multiplying the resulting position coordinates by an appropriate correction factor. However, this post-processing operation also exacerbates noise due to light diffraction and other experimental errors. These sources of errors tend to be larger in the direction corresponding to the depth of the slice ("Z"), which is usually estimated through a piezocontroller in the motorized stage. Moreover, shrinkage typically varies both within and between sections, and an accurate calibration therefore requires multiple repeated measurements that add to the already demanding labor intensity of digital reconstruction. For these reasons, shrinkage is not always measured, reported or corrected for. This variability across published studies further worsens the numerous sources of differences due to experimental processing.
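The compensation step described above, multiplying position coordinates by a correction factor, can be sketched in Python as follows. The per-axis factors, the toy branch, and the function names are hypothetical values chosen for illustration; they are not measurements from any of the analyzed datasets.

```python
def correct_shrinkage(points, factors):
    """Scale (x, y, z) coordinates by per-axis correction factors."""
    fx, fy, fz = factors
    return [(x * fx, y * fy, z * fz) for x, y, z in points]

def path_length(points):
    """Sum of Euclidean distances between consecutive traced points."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in zip(points, points[1:])
    )

# Hypothetical branch traced as three points (arbitrary length units).
branch = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0)]

# Hypothetical factors: no planar shrinkage, depth shrunk to half its size.
factors = (1.0, 1.0, 2.0)

corrected = correct_shrinkage(branch, factors)
length_before = path_length(branch)
length_after = path_length(corrected)
```

Because the correction is a plain multiplication, any measurement error in a coordinate is scaled by the same factor, which is one way to see why the correction amplifies noise along the depth axis.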

In light of the above considerations, we specifically looked for potential shrinkage-related confounds in the clustering results. Out of 56 unique combinations of clusters, metadata groups, and corresponding published articles, only 14 reported shrinkage estimates or mentioned shrinkage at all. Of those, a mere 5 applied the corresponding correction to the data. Unsurprisingly given the limited sample, we found no statistically significant association between either corrected or uncorrected values and clustering. Next, we examined slicing thickness, which was reported in 49 (out of 56) cases (with median 200 µm). Values varied broadly from 80 to 400 µm, with 85% of them falling between 120 and 350 µm. No statistical association was found between clustering and these values. The lack of explicit shrinkage information prevents firm conclusions and leaves open the possibility that some of the findings we report may be ultimately due to slicing artifacts. However, the low coefficient of variation of measurements typically sensitive to shrinkage, especially tortuosity and fractal dimension (**Table 1**), suggests that the noise related to shrinkage (as opposed to that affecting size measures) may affect most of the analyzed data to a similar degree.

Fully assessing the potential usefulness of the reported results will require additional investigation. For example, morphologically detailed electrophysiological simulations might be useful to explore how the observed relations between datasets (**Figure 6**) or between morphological variables (**Figure 7**) could affect the input/output relationship of individual neurons (e.g., Scorcioni et al., 2004; Komendantov and Ascoli, 2009). Similarly, the effect of these morphological relations on potential network connectivity could be studied by embedding the digital reconstructions in an appropriate three-dimensional model of the surrounding neural tissue (e.g., Chiang et al., 2011; Ropireddy and Ascoli, 2011). The continuous expansion of the available pool of neuronal reconstructions will also enable the future validation and refinement of these results with additional or independent datasets.

#### **DISCUSSION**

This work illustrates how shared morphological data can lead to new observations of potential neurobiological interest by enabling statistical quantification of commonalities and differences among neuron groups. However, our results also demonstrate the challenges of working with large-scale datasets from heterogeneous sources, even after extensive effort in metadata curation and management as well as in data standardization and selection. Direct analysis of selected morphometric features among large neuron groups organized by the main metadata dimensions of species, brain region, and cell type failed to reveal meaningful patterns beyond the well-known variability of neuronal shape. At the same time, systematic pairwise examination of all 45 neuronal groups with distinct species, brain region, cell type, and lab of origin for each of the 27 main morphological features would produce more than 50,000 comparisons, raising questions of scientific interpretation and statistical significance.
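The scale of the multiple-comparison problem mentioned above can be checked with simple arithmetic. The sketch below is an illustration only: reading "pairwise" as ordered pairs is an assumption that happens to reproduce a count above 50,000, and the Bonferroni threshold is a generic textbook adjustment added for context, not a procedure reported in the study.

```python
from math import comb

groups = 45      # neuronal groups, as stated in the text
features = 27    # main morphological features, as stated in the text

unordered_pairs = comb(groups, 2)        # each pair counted once
ordered_pairs = groups * (groups - 1)    # each pair counted in both directions

tests_unordered = unordered_pairs * features
tests_ordered = ordered_pairs * features

# Generic Bonferroni-adjusted per-test threshold for a family-wise alpha of 0.05
# (assumed here purely for illustration).
alpha = 0.05
bonferroni_threshold = alpha / tests_ordered
```

Counting each pair once gives 26,730 feature-wise comparisons and counting ordered pairs gives 53,460; either way, a per-test significance threshold on the order of 10<sup>−6</sup> would be needed under such a correction, which illustrates the interpretive difficulty raised in the text.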

To overcome these issues, we adopted principal component analysis to identify the most discriminant morphological features throughout the dataset, and model-based cluster analysis to segregate neuron groups solely on the basis of the morphometric characteristics. This approach allowed rigorous examination of the statistical associations between clusters and metadata and inspection of the most informative morphological measurements on the basis of their principal component loadings. The results revealed morphological differences between specific cell types and animal species that were robust to lab provenance while retaining considerable sensitivity to developmental stages and fine regional location as well as to the original experimental conditions. For example, neocortical pyramidal cells from rodents and primates alike display a trend of increasing branch tortuosity with increasing branch density (**Figure 7A**). This distinct relationship, holding across different species, brain regions, and developmental stages, appears robust to slicing artifacts as demonstrated by the co-alignment of both partially reconstructed and fully reconstructed neurons (**Figure 7B**).
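As a rough illustration of the first step of this approach, the following dependency-free Python sketch extracts the leading principal component of a small feature matrix by power iteration and reports the fraction of variance it explains. It is not the authors' pipeline: the function name, the toy data, and the use of power iteration on an unstandardized covariance matrix are assumptions for the example, and the model-based clustering step is not shown.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_variance_fraction) for the leading PC.

    rows: list of equal-length numeric feature vectors, one per neuron.
    Uses power iteration on the sample covariance matrix; the all-ones start
    vector must not be orthogonal to the leading component.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance captured along the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Hypothetical two-feature data in which the second feature is twice the first.
toy = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
loadings, explained = first_principal_component(toy)
```

The loadings returned here play the role described in the text: the features with the largest absolute loadings on a component are the ones most informative along that dimension. In practice features measured in different units would usually be standardized first.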

The primary features of dendritic morphology corresponded to branching density, size, space filling, and bifurcation asymmetry. Of these features, size is likely to be the most dramatically impacted by differential trimming artifacts from brains of varying size. Nevertheless, the most interesting biological findings were based on branch- or bifurcation-level observations. Rodent retinal ganglion cells stood out for their extreme branching density, and clustered together with other neuron types involved in primary sensory processing as well as with developing pyramidal cells from the somatosensory cortex of 6–9 day-old rats. Moreover, the results also highlighted species differences within the same cell types by differentiating retinal cells of rodents from those of fish and amphibians. Specifically, ganglion cells have denser branching and wider bifurcation angles in rodents than in nonmammalian vertebrates (**Figures 5B**, **6**, **Table 2**). This observation is based on pooling of mouse and rat data from four different labs in one cluster, and of fish and salamander data from two different labs in the other, and we failed to find any methodological reasons that could explain these morphological differences.

Blowfly tangential sensory neurons are similar to the rodent ganglion cells in many morphological features (e.g., low branch path length, comparable fractal dimension, tortuosity, and arbor planarity), possibly providing a geometric correlate for their similar function in processing motion-sensitive visual information (Kong et al., 2005; Cuntz et al., 2008). Nevertheless, retinal ganglion cells and blowfly tangential cells can also be neatly distinguished due to the much more asymmetric bifurcations of the latter neurons (**Figure 6A**) relative to those of the former (and of most typical neurons). Interestingly, cluster analysis also suggested a linear relationship between topological asymmetry and branching density in rodent retinal ganglion cells but not in other groups, pointing to a previously unrecognized peculiar morphological signature of this class only.

The branching density of mature cortical pyramidal cells, in contrast, was at the opposite end relative to ganglion cells (also demonstrating the effect of developmental changes) and displayed a distinctive correlation with branch tortuosity. Adult neocortex pyramidal cells represent the largest population in NeuroMorpho.Org and come from a broad range of animals, anatomical subregions, layers, and experimental conditions, enabling certain morphological differentiations (e.g., rodent S1 vs. primate M1). Non-cortical neurons, including striatal, olfactory, and others, were distinguished by the smaller size and larger variability of their dendritic arbors.

Several recent investigations have adopted similar analysis designs and strategies for dimensionality reduction, mainly for the purpose of exploratory neuron type classification (e.g., Kong et al., 2005; McGarry et al., 2010; Santana et al., 2013). Alternative approaches to develop automated machine-learning classifiers for identifying neuron types also promise to be effective for large data sets. The present exploratory study used multivariate morphometric analysis to identify the most informative morphological features that distinguish between neuron groups organized by their metadata. We predict that statistical morphometric mining will also prove to be useful for developing quantitative hypotheses and for designing computational models of dendritic growth. At the same time, we discussed the considerable challenge of pooling together data from disparate experimental conditions, and the resulting analysis limitations.

Generation of standardized morphological data across laboratories and research designs could yield much more powerful large-scale data mining. In particular, we are convinced that better clustering would result from more consistent data collection. Systematic reliability assessment of experimental protocols can maximize morphological reproducibility and minimize tracing artifacts (e.g., Dercksen et al., 2014). Any such improvements would also help refine cluster analysis by reducing variability. Unfortunately, the arguably "ideal" experimental choices (*in vivo* labeling, reconstructions at the resolution limit of light, systematic measurement and compensation of tissue shrinkage, serial tracing across histological sections, etc.) also correspond to the most labor-intensive conditions for manual or semi-manual morphological reconstructions. This tension between quality, sample size, and research cost underscores the need and desirability of fully automated and broadly applicable tracing technologies (Brown et al., 2011; Donohue and Ascoli, 2011).

#### **ACKNOWLEDGMENTS**

We thank Dr. Diek Wheeler, Dr. Rubén Armañanzas, and Mr. David Hamilton for feedback on an earlier version of the manuscript. This work was supported in part by NIH grant R01 NS39600 and a Keck NAKFI award from the National Academy of Science. Publication of this article was funded in part by the George Mason University Libraries Open Access Publishing Fund.

#### **REFERENCES**


classes and reconstructing laboratories. *J. Comp. Neurol.* 473, 177–193. doi: 10.1002/cne.20067


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 June 2014; accepted: 06 November 2014; published online: 04 December 2014.*

*Citation: Polavaram S, Gillette TA, Parekh R and Ascoli GA (2014) Statistical analysis and data mining of digital reconstructions of dendritic morphologies. Front. Neuroanat. 8:138. doi: 10.3389/fnana.2014.00138*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Polavaram, Gillette, Parekh and Ascoli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Towards a "canonical" agranular cortical microcircuit

#### **Sarah F. Beul<sup>1</sup>\* and Claus C. Hilgetag<sup>1,2</sup>**

<sup>1</sup> Department of Computational Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

<sup>2</sup> Department of Health Sciences, Boston University, Boston, MA, USA

#### **Edited by:**

Patrik Krieger, Ruhr University Bochum, Germany

#### **Reviewed by:**

Giorgio Innocenti, Karolinska Institutet, Sweden

Kathleen S. Rockland, Boston University School of Medicine, USA

#### **\*Correspondence:**

Sarah F. Beul, Department of Computational Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246 Hamburg, Germany e-mail: s.beul@uke.de

Based on regularities in the intrinsic microcircuitry of cortical areas, variants of a "canonical" cortical microcircuit have been proposed and widely adopted, particularly in computational neuroscience and neuroinformatics. However, this circuit is founded on striate cortex, which manifests perhaps the most extreme instance of cortical organization, in terms of a very high density of cells in highly differentiated cortical layers. Most other cortical regions have a less well differentiated architecture, stretching in gradients from the very dense eulaminate primary cortical areas to the other extreme of dysgranular and agranular areas of low density and poor laminar differentiation. It is unlikely for the patterns of inter- and intra-laminar connections to be uniform in spite of strong variations of their structural substrate. This assumption is corroborated by reports of divergence in intrinsic circuitry across the cortex. Consequently, it remains an important goal to define local microcircuits for a variety of cortical types, in particular, agranular cortical regions. As a counterpoint to the striate microcircuit, which may be anchored in an exceptional cytoarchitecture, we here outline a tentative microcircuit for agranular cortex. The circuit is based on a synthesis of the available literature on the local microcircuitry in agranular cortical areas of the rodent brain, investigated by anatomical and electrophysiological approaches. A central observation of these investigations is a weakening of interlaminar inhibition as cortical cytoarchitecture becomes less distinctive. Thus, our study of agranular microcircuitry revealed deviations from the well-known "canonical" microcircuit established for striate cortex, suggesting variations in the intrinsic circuitry across the cortex that may be functionally relevant.

**Keywords: cytoarchitecture, intrinsic circuitry, interlaminar connectivity, striate cortex, structural variation**

#### **INTRODUCTION**

The cerebral cortex is arguably one of the most complex physical systems. Untangling the intricate relations of the myriad elements of the gray matter is one of the formidable challenges of science, as already stated by Santiago Ramón y Cajal:

"Devotion to the cerebral hemispheres, enigma of enigmas, was old in me. . .the supreme cunning of the structure of the gray matter is so intricate that it defies and will continue to defy for many centuries the obstinate curiosity of investigators. That apparent disorder of the cerebral jungle, so different from the regularity and symmetry of the spinal cord and of the cerebellum, conceals a profound organization of the utmost subtlety which is at present inaccessible." (Cajal, 1937)

Decades later, the picture has become more refined, but a comprehensive understanding of the cortical microarchitecture still remains a fundamental scientific challenge. A crucial step was the recognition that the cerebral cortex is locally structured into horizontal compartments ("layers") as well as vertical units ("columns") which both may be of functional relevance. Traditionally, the isocortex has been characterized in the context of a six-layered scheme (Brodmann, 1909; Vogt, 1910; von Economo, 2009), as opposed to three-layered allocortex. This scheme is, however, subject to substantial variation in the relative prominence of layers and disrupted in a considerable number of cortical areas. Nonetheless, and in spite of his acknowledgment that "the distinction of six layers can be both arbitrary and conventional" (von Economo, 2009), already von Economo himself asserted that "on practical grounds, we retain the six-layer division" (von Economo, 2009). Indeed, the simplified concept of a uniformly six-layered isocortex has prevailed (Zilles and Amunts, 2012) and become generally accepted.

The radial organization of the cortex became a subject of interest when vertical columns spanning all cortical layers were proposed to exist (Lorente de Nó, 1949; Mountcastle, 1957), with uniform columns repeating across the cortex to form an intermediate-level neural substrate for information processing. Within these columns, connectivity across cortical layers appeared stereotypical (Szentagothai, 1978; Gilbert and Wiesel, 1983). While there is still considerable debate about the existence, precise definition and the extent of heterogeneity in the cellular composition of cortical columns (Rakic, 2008; da Costa and Martin, 2010; Rockland, 2010; Smith, 2010a,b,c,d; Carlo and Stevens, 2013; Herculano-Houzel et al., 2013), the concept of radial cortical organization was later extended to the notion of a "canonical" microcircuit (Douglas et al., 1989; Douglas and Martin, 1991, 2004), as a generic template of intrinsic cortical circuitry. The computations performed by such a fundamental neuronal circuit are thought to be prescribed by the intrinsic circuitry within a cortical column, with functional specificity added by patterns of axonal inputs and outputs to and from the column. Substantial work has been devoted to the computational performance and theoretical properties of the "canonical" microcircuit (e.g., Douglas et al., 1989, 1995; Haeusler and Maass, 2007; George and Hawkins, 2009; Haeusler et al., 2009; Wagatsuma et al., 2011; Bastos et al., 2012; Habenschuss et al., 2013). In the primate prefrontal cortex, the "canonical" microcircuit was shown to be subject to modifications from the striate circuit (Heinzle et al., 2007; Godlove et al., 2014). More generally, abundant data is available on variants of intrinsic connectivity in cortical regions such as prefrontal cortex (Melchitzky et al., 2001), somatosensory cortex (Lübke and Feldmeyer, 2007; Petersen, 2007; Lefort et al., 2009; Feldmeyer et al., 2013) or auditory cortex (Barbour and Callaway, 2008; Oviedo et al., 2010; Watkins et al., 2014). Nonetheless, the notion of a "canonical" microcircuit, which has gained popularity especially in the computational neuroscience community and has also inspired neuroengineering solutions (e.g., Merolla et al., 2014), is still largely based on work in one particular cortical area, striate cortex. Moreover, much of this work has concentrated on the cat and non-human primate brain (Douglas and Martin, 2007a). Striate cortex is not only special in the amount of probing it has undergone, but is also exceptional in its cytoarchitectonic differentiation. Striate cortex is the cortical region with the highest neuron density, sporting numbers substantially higher than the remainder of the cortex (Schüz and Palm, 1989; Collins et al., 2010; Cahalane et al., 2012; Herculano-Houzel et al., 2013).
The number of (sub)layers that can be identified is also higher in striate cortex than in other regions of the cortex. Instead of all parts of the cortex being uniformly differentiated, cytoarchitectonic differentiation changes gradually across the cortex (Sanides, 1970; von Economo, 2009; Zilles and Amunts, 2012), as illustrated in **Figure 1A** for the human brain. The spectrum of differentiation ranges from striate cortex, the most clearly eulaminate area, to agranular areas that lack the inner granular layer (L4), and have few identifiable sublayers as well as very low neuron density. In between these two extremes, one can find areas that are still eulaminate, but without the remarkable clarity of differentiation or dense packing of neurons found in striate cortex, such as prestriate cortex, as well as dysgranular areas with a lower density of neurons, a dissolving inner granular layer and fewer identifiable sublayers. Quantitative differences in many aspects of the structural organization of cortical tissue have been extensively demonstrated (e.g., Beaulieu and Colonnier, 1989; Defelipe et al., 1999; Dombrowski et al., 2001; Yáñez et al., 2005; Collins et al., 2010).

The variation in local cortical structure needs to be taken into account when describing a "canonical" microcircuit, because it is unlikely for the patterns of inter- and intra-laminar connections to be uniform in spite of strong variations of their structural substrate. Indeed, experimental results, for example from rodent barrel cortex, demonstrate that intrinsic connectivity is not uniform across the cortex (Sato et al., 2008; Meyer et al., 2013; Reyes-Puerta et al., 2014). Heterogeneity in cytoarchitectonic differentiation has been shown to have consequences for other aspects of structural connectivity in the brain. The laminar patterns of extrinsic connections which link brain regions along white matter pathways are strongly associated with the relative cytoarchitectonic differentiation of the connected areas (Barbas, 1986; Barbas and Rempel-Clower, 1997; Medalla and Barbas, 2006; Hilgetag and Grant, 2010; Beul et al., 2014). The stereotypic laminar patterns that have been found in non-human primate and cat cortex (**Figure 1B**) show distinctly infra- and supragranular origins and terminations for projections between areas of weak differentiation and areas of strong differentiation, while these patterns change gradually towards multilaminar origin and termination profiles as the difference in differentiation between the connected areas becomes less pronounced.

Since the variation of cytoarchitectonic differentiation is an aspect of cortical organization that is insufficiently considered in discussions of intrinsic circuitry, we here want to raise awareness of the importance of architectonic differences, by providing a first approximation of general features of intrinsic circuitry in agranular regions of the cerebral cortex. We do this by assimilating information from the available literature on inter- and intralaminar connectivity in the agranular frontal cortex of the rodent brain, in order to present a tentative model of intrinsic circuitry in cortical regions at the opposite end of the differentiation spectrum from the regions that such models have predominantly considered so far. Accounting for this variation is crucial for applying insights gained from such model circuits in a realistic way, for example in the biological grounding of *in silico* experiments (e.g., Merolla et al., 2014).

In the following review, we briefly introduce current accounts of the "canonical" microcircuit, and then highlight a report of experimental results that reveal variation in interlaminar inhibition across cortical regions of distinct cytoarchitecture (Kätzel et al., 2011). Subsequently, we present the result of the literature survey we performed regarding data that can shed light on the intrinsic microcircuitry in agranular cortex. We chose to concentrate on the rodent brain, capitalizing on the relative abundance of experimental data available for this popular animal model. In comparison, fewer studies report on intrinsic circuitry in nonhuman primates, and only a small proportion of those considered agranular cortical regions, which are relatively infrequent in the primate brain. By focusing on the rodent brain, we can therefore provide a more detailed sketch of the intrinsic circuitry in agranular cortex without having to incorporate data across a wide range of species, which would have been a more uncertain approach.

#### **INTRINSIC CIRCUITRY IN GRANULAR CORTEX**

Over the last decades, general features of intrinsic circuitry in striate cortex have emerged from studies in the cat and nonhuman primate. Connections are proposed to form a processing loop across cortical layers, where recurrent excitation and inhibition are interlinked, which leads to amplification of inputs into the cortical column and appropriate modulation of the ensuing activity (Markram et al., 2004; Douglas and Martin, 2004, 2007a; Bannister, 2005; Lübke and Feldmeyer, 2007). To probe the local microcircuitry, diverse experimental methods with different degrees of sensitivity and reliability have been used. Two investigations that supplied the most comprehensive data on cat striate cortex employed electrophysiological and morphological approaches, respectively. Thomson et al. (2002) used dual intracellular recordings to characterize synaptic connections across cortical layers. Binzegger et al. (2004) reconstructed the morphology of neurons in striate cortex in three dimensions and estimated the number of synaptic contacts between different cell types. Both data sets have been adapted and used in various studies, for example, in the construction of computational models (e.g., Haeusler and Maass, 2007; Haeusler et al., 2009; Bastos et al., 2012; Du et al., 2012; Potjans and Diesmann, 2014). But even though the same model system, cat striate cortex, was considered across these studies, there currently exists no definite scheme of this area's intrinsic circuitry. There are, for example, diverging data on whether recurrent excitation occurs between L3 and L5 or between L4 and L3 (cf. Thomson et al., 2002; Thomson and Bannister, 2003 and Binzegger et al., 2004; Douglas and Martin, 2004).

Such discrepancies may be reconciled by future experimental findings. In contrast, reports of differences in interlaminar activation patterns across cortical regions point towards the existence of genuine variations in intrinsic circuitry across the brain. Kätzel et al. (2011) used genetically targeted photostimulation to comprehensively map inhibitory-to-excitatory connectivity in three distinct regions of mouse cortex. They assessed intra- and interlaminar connectivity in striate cortex, primary somatosensory and primary motor cortex. As mentioned before, striate cortex is by far the most differentiated cortical region, even in the rodent brain (Herculano-Houzel et al., 2013), where it is less well differentiated than for example in the primate. Primary somatosensory cortex, although still clearly eulaminate, is already much less dense and comprises fewer distinguishable sublayers, while primary motor cortex is even less cytoarchitectonically differentiated (Collins et al., 2010; Herculano-Houzel et al., 2013). Primary motor cortex thus ranges in the lower end of the differentiation spectrum with dysgranular cortical regions, although it is sometimes classified as agranular (lacking an inner granular layer, L4): see Shipp (2005) and García-Cabezas and Barbas (2014) for an extensive discussion of this issue. Other than probing connectivity in three cortical regions processing different modalities, this study can, therefore, be used to demonstrate potential differences regarding intrinsic circuitry in three areas occupying different positions in the differentiation spectrum. While Kätzel et al. (2011) report relatively uniform patterns of intralaminar inhibition across these three cortical regions, interlaminar inhibitory-to-excitatory connectivity differed substantially (**Figure 2**). In striate cortex, considerable interlaminar inhibition was observed between all layers (L2/3, L4, L5/6). 
In primary somatosensory cortex, a similar pattern of interlaminar inhibition was reported, but without inhibition between L2/3 and L5/6. In primary motor cortex, in contrast, no substantial inhibition between L2/3, L4, and L5/6 was evident. Thus, across the three sampled regions, interlaminar inhibitory-to-excitatory connectivity was progressively weaker in less cytoarchitectonically differentiated areas. Interpreting the results this way, we assume that they reflect genuine variation in the presence of interlaminar inhibition, and not the impact of other aspects of structural variation across the studied areas. For example, systematic differences in cellular morphology across the sampled areas could lead to skewed results from applying the same measurement approach to all areas. Nonetheless, these observations support the notion that intrinsic circuitry cannot be uniform in the face of considerable variation of the structural substrate, as is the case in regions of the cerebral cortex with profoundly differing cytoarchitectonic differentiation.

#### **TENTATIVE INTRINSIC CIRCUITRY OF THE AGRANULAR CORTEX**

**Figure 3** summarizes our review of the available literature on intrinsic interlaminar circuitry in the agranular frontal cortex of the rodent brain and compares it with a recent rendering of the intrinsic circuitry in striate cortex. Excitatory-to-excitatory connections from L2/3 to L5 have clearly been demonstrated in rat agranular frontal cortex by measuring excitatory postsynaptic currents (EPSC) in monosynaptically coupled pyramidal neurons in L5, induced by stimulation in L2/3 (Kang, 1995; Otsuka and Kawaguchi, 2008, 2009, 2011; Hirai et al., 2012). One of these paired recording studies (Otsuka and Kawaguchi, 2009) additionally demonstrated the existence of excitatory-to-inhibitory connections from L2/3 to L5, a finding also reported by Apicella et al. (2012) in mouse motor cortex. The experiments of Hirai et al. (2012) showed that the excitatory-to-excitatory connections from L2/3 to L5 are reciprocated by connections from L5 pyramidal cells to L2/3 pyramidal cells. This observation is confirmed in medial entorhinal cortex of the rat (van Haeften et al., 2003), which can be considered agranular since its layer IV ("lamina dissecans") is mainly acellular (Andersen et al., 2007). The microscopy study of van Haeften et al. (2003) traced the processes of pyramidal cells in the deep layers ramifying in superficial layers, and identified the synaptic contacts made by those neurons. The analysis revealed excitatory-to-excitatory, as well as excitatory-to-inhibitory, connections from deep to superficial layers.

Considering the trend of weakening inhibitory-to-excitatory connectivity in cytoarchitectonically less differentiated areas (Kätzel et al., 2011, see above), we consider it likely that there exists no substantial interlaminar inhibition of excitatory neurons in rodent agranular frontal cortex, which is reflected in our tentative circuit diagram. The study by van Haeften et al. (2003) in medial entorhinal cortex, which reports an absence of inhibitory-to-excitatory synapses from deep to superficial layers, supports the same conclusion. Van Haeften et al. furthermore report that only a small percentage of the observed synapses could potentially be classified as inhibitory-to-inhibitory, thus giving little evidence for such a connection from deep to superficial layers. Considering the reciprocal inhibitory-to-inhibitory connection from superficial to deep layers, we could find no studies reporting either on the absence or presence of such a connection. In the circuit diagram (**Figure 3**), we did not include connections which could only be inferred from exclusively morphological results (e.g., Kawaguchi, 1993, 1995; Kawaguchi and Kubota, 1997; Kubota et al., 2011), since we did not consider data on the spatial spread of axon collaterals sufficiently reliable to demonstrate a functional connection, given that synapse formation has been shown to be highly specific (e.g., Kozloski et al., 2001; Brown and Hestrin, 2009). For these reasons, **Figure 3B** indicates no inhibitory interlaminar connections, although the validity of this assessment of course remains contingent upon further experimental data.

By contrast, there is abundant evidence for rich intralaminar connectivity including excitatory-to-inhibitory and inhibitory-to-excitatory connections (Kang, 1995; Somogyi et al., 1998; Kawaguchi and Kondo, 2002; Barthó et al., 2004; Otsuka and Kawaguchi, 2009; Fino and Yuste, 2011; Kätzel et al., 2011). Therefore, we assumed a stereotypical pattern of connectivity within deep and superficial layers as illustrated in **Figure 3B**.

The intrinsic circuitry we have sketched here thus comprises interlaminar excitatory connections that connect neuronal populations from both upper and lower layers to excitatory as well as inhibitory neuron populations in the complementary cortical layers. Within upper and lower layers, intralaminar connections reciprocally connect excitatory and inhibitory neuron populations. This intrinsic interlaminar circuitry is strikingly similar to the simplified original circuit diagram for the striate cortex of Douglas et al. (1989), and allows for recurrent excitation and inhibition. These physiological interactions were inferred to underlie essential computational mechanisms in striate cortex (Douglas et al., 1995; Douglas and Martin, 2007b, 2009). The microcircuitry as we sketch it here should accordingly be able to support elemental neural functions, such as the amplification of weak inputs through positive feedback or gain control and signal normalization through negative feedback.
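The connectivity described in this section can be written down as a small directed graph, which makes it easy to check its properties mechanically. The following is a minimal sketch, not part of the original review: the population names (`E_upper`, `I_lower`, and so on) and the helper functions are hypothetical, and the graph encodes only the presence or absence of connections as stated in the text, with no connection strengths.

```python
# Hypothetical encoding of the tentative agranular circuit as a directed graph.
# Populations: excitatory (E) and inhibitory (I) neurons in upper and lower layers.

def build_tentative_circuit():
    """Return the set of (source, target) population pairs described in the text."""
    edges = set()
    for layer, other in (("upper", "lower"), ("lower", "upper")):
        # Intralaminar: E and I populations are reciprocally connected.
        edges.add((f"E_{layer}", f"I_{layer}"))
        edges.add((f"I_{layer}", f"E_{layer}"))
        # Interlaminar: only excitatory populations project to the other layers,
        # targeting both excitatory and inhibitory populations there.
        edges.add((f"E_{layer}", f"E_{other}"))
        edges.add((f"E_{layer}", f"I_{other}"))
    return edges


def is_interlaminar(source, target):
    """True if source and target populations lie in different layer groups."""
    return source.split("_")[1] != target.split("_")[1]


circuit = build_tentative_circuit()

# No inhibitory population projects across layers in the tentative diagram.
inhibitory_interlaminar = [
    (s, t) for (s, t) in circuit if s.startswith("I_") and is_interlaminar(s, t)
]

# Recurrent excitation: upper and lower excitatory populations excite each other.
recurrent_excitation = (
    ("E_upper", "E_lower") in circuit and ("E_lower", "E_upper") in circuit
)
```

In this encoding the list `inhibitory_interlaminar` is empty and `recurrent_excitation` is true, which mirrors the two properties emphasized in the text: no interlaminar inhibition, but a loop that permits recurrent excitation and, via the intralaminar connections, inhibition.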

#### **DISCUSSION**

The starting question of this review was whether there exists a generic template of intrinsic microcircuitry in the cortex, despite pronounced regional differences in cytoarchitectonic organization. The answer depends strongly on how broadly the concept of stereotypy is framed (Silberberg et al., 2002), but even for the cortical region studied most intensely in this context, striate cortex, there exists as yet no consensus on a detailed "canonical" microcircuit. Moreover, differences in circuitry have been reported across the cortex, which are consistent with the changes in the structural substrate in which intrinsic connectivity is embedded. In order to account for these structural differences, we propose a tentative circuit diagram for the agranular frontal cortex of the rodent brain, an agranular region which is strikingly opposed to striate cortex in its cytoarchitectonic organization. Our review of the existing literature points to an intrinsic circuit that features excitatory-to-excitatory and excitatory-to-inhibitory connections from upper layers to lower layers, as well as from lower layers to upper layers (**Figure 3B**), but shows no interlaminar inhibitory-to-inhibitory or inhibitory-to-excitatory connections. This circuit is based on multiple approaches for structural and functional circuit investigation (such as electrophysiological paired recordings using microstimulation, anatomical tracing experiments, or examination of morphological features using light and electron microscopy), with different caveats and varying levels of reliability. Importantly, the information was drawn from studies whose primary goal was not necessarily the characterization of interlaminar circuitry. Our circuit diagram is therefore subject to debate and should be modified in the light of future information. In compiling the circuit diagram, we engaged in some common simplifications regarding the anatomical substrate in which the connections are placed.
In studying intrinsic circuitry, distinct sublayers are often collapsed, as for example when layers 5A, 5B and 6 are considered collectively as "infragranular" layers. This treatment may be misleading, since different (sub)layers have been shown to be involved in distinct processing circuits (for example, Lübke and Feldmeyer, 2007). The same caveat holds for the merging of diverse neuron types into the two main classes of inhibitory and excitatory neurons. It discards a wealth of functionally relevant information about morphological and physiological differences between neuron types, as well as about cell type specific connectivity (Kozloski et al., 2001; Silberberg et al., 2002; Thomson and Bannister, 2003; Kampa et al., 2006; Otsuka and Kawaguchi, 2008, 2009, 2011; Brown and Hestrin, 2009; Xu and Callaway, 2009; Apicella et al., 2012; Hirai et al., 2012). Not to disambiguate such significant anatomical features introduces additional uncertainty about the validity of any intrinsic circuit diagram. Moreover, note that a description of general layer-to-layer connectivity within a column, as we propose here, does not necessarily reflect synaptic circuits formed by individual neurons across layers, as, for example, Binzegger et al. (2004) have estimated. Thus, there may exist functionally relevant differences between the average laminar interconnections described here and the specific laminar microcircuits formed within these average patterns. A further dimension that is missing from many descriptions of local microcircuitry is an estimation of connection strength. However, with current technology, structural measures of strength, such as the frequency of connections from one cell type onto another or the number of involved synapses and their morphology, can only be obtained by arduous manual labor. 
Moreover, the translation of structural into functional strength, as expressed by the amplitude of evoked postsynaptic currents, is opaque: number, size, morphology and position of synapses matter, as do numerous molecular mechanisms regulating synaptic function at both the pre- and postsynaptic site. In addition, the impact of evoked currents on postsynaptic cell function depends on many further factors. All these aspects are not static, but can potentially change on short time scales (Squire et al., 2008; Buonomano and Maass, 2009; Dityatev et al., 2010; Eroglu and Barres, 2010; Silver, 2010; Ribrault et al., 2011; Arnsten et al., 2012; Camiré and Topolnik, 2012; Caroni et al., 2012; Cortés-Mendoza et al., 2013; Dallérac et al., 2013; Vitureira and Goda, 2013; Chevaleyre and Piskorowski, 2014).

Although the proposed intrinsic circuitry for agranular cortex is still speculative, the issue we address remains crucial (Marcus et al., 2014). There has to be variation in intrinsic circuitry across the cerebral cortex, because the composition of the cortex is not uniform, but highly variable on a number of dimensions. We are convinced that a better understanding of the intrinsic cortical circuitry is essential for an improved comprehension of its physiology, and has to take into account differences in the cortical structural substrate. We hope that we have provided a starting point for discussion which will lead to the synthetization of new insights from available data or further experimental efforts to elucidate circuitry outside of striate cortex, taking structural variation into consideration.

#### **ACKNOWLEDGMENTS**

Supported by DFG grant SFB 936/A1. We thank K.A.C. Martin and H. Barbas for helpful comments on the manuscript.

#### **REFERENCES**


network interactions and extracellular features. *J. Neurophysiol.* 92, 600–608. doi: 10.1152/jn.01170.2003


biocytin labelling in vitro. *Cereb. Cortex* 12, 936–953. doi: 10.1093/cercor/12.9.936


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; accepted: 19 December 2014; published online: 14 January 2015*.

*Citation: Beul SF and Hilgetag CC (2015) Towards a "canonical" agranular cortical microcircuit. Front. Neuroanat. 8:165. doi: 10.3389/fnana.2014.00165*

*This article was submitted to the journal Frontiers in Neuroanatomy*.

*Copyright © 2015 Beul and Hilgetag. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

### Generation of dense statistical connectomes from sparse morphological data

#### *Robert Egger<sup>1,2,3†</sup>, Vincent J. Dercksen<sup>4†</sup>, Daniel Udvary<sup>1,2,3</sup>, Hans-Christian Hege<sup>4</sup> and Marcel Oberlaender<sup>1,3,5</sup>\**

*<sup>1</sup> Computational Neuroanatomy Group, Max Planck Institute for Biological Cybernetics, Tuebingen, Germany*

*<sup>2</sup> Graduate School of Neural Information Processing, University of Tuebingen, Tuebingen, Germany*

*<sup>3</sup> Bernstein Center for Computational Neuroscience, Tuebingen, Germany*

*<sup>4</sup> Department of Visual Data Analysis, Zuse Institute Berlin, Berlin, Germany*

*<sup>5</sup> Digital Neuroanatomy Group, Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA*

#### *Edited by:*

*Patrik Krieger, Ruhr University Bochum, Germany*

#### *Reviewed by:*

*Jochen Ferdinand Staiger, University Medicine Goettingen, Germany*

*Sean L. Hill, International Neuroinformatics Coordinating Facility, Sweden*

*Jaap Van Pelt, VU University Amsterdam, Netherlands*

#### *\*Correspondence:*

*Marcel Oberlaender, Computational Neuroanatomy Group, Max Planck Institute for Biological Cybernetics, Spemannstraße 38-44, Tuebingen 72076, Germany e-mail: marcel.oberlaender@tuebingen.mpg.de*

*†These authors have contributed equally to this work.*

Sensory-evoked signal flow, at cellular and network levels, is primarily determined by the synaptic wiring of the underlying neuronal circuitry. Measurements of synaptic innervation, connection probabilities and subcellular organization of synaptic inputs are thus among the most active fields of research in contemporary neuroscience. Methods to measure these quantities range from electrophysiological recordings over reconstructions of dendrite-axon overlap at light-microscopic levels to dense circuit reconstructions of small volumes at electron-microscopic resolution. However, quantitative and complete measurements at subcellular resolution and mesoscopic scales to obtain all local and long-range synaptic in/outputs for any neuron within an entire brain region are beyond present methodological limits. Here, we present a novel concept, implemented within an interactive software environment called *NeuroNet*, which allows (i) integration of sparsely sampled (sub)cellular morphological data into an accurate anatomical reference frame of the brain region(s) of interest, (ii) up-scaling to generate an average dense model of the neuronal circuitry within the respective brain region(s) and (iii) statistical measurements of synaptic innervation between all neurons within the model. We illustrate our approach by generating a dense average model of the entire rat vibrissal cortex, providing the required anatomical data, and illustrate how to measure synaptic innervation statistically. Comparing our results with data from paired recordings *in vitro* and *in vivo*, as well as with reconstructions of synaptic contact sites at light- and electron-microscopic levels, we find that our *in silico* measurements are in line with previous results.

**Keywords: Peters' rule, barrel cortex, cortical column, thalamus, axon, dendrite**

#### **INTRODUCTION**

One of the major challenges in neuroscience is to relate results from structural and functional measurements across multiple spatial scales. Current anatomical approaches provide information on synaptic connectivity at either macroscopic, i.e., between brain regions (e.g., using bulk injections of retro/anterograde agents, Oh et al., 2014), mesoscopic, i.e., between cell types (e.g., using transgenic animal models, Wickersham et al., 2007), microscopic, i.e., between small numbers of individual neurons (e.g., using multi-electrode recordings in acute brain slices *in vitro*, Feldmeyer et al., 1999; Perin et al., 2011) or nanoscopic scales, i.e., reconstructing synaptic contact sites within small volumes (e.g., using electron microscopy in dense, Briggman et al., 2011, or sparsely labeled tissue, Schoonover et al., 2014). While all of these approaches have provided important structural information about the neuronal circuitry, results obtained at different scales (and often even at the same scale when obtained by different methods) are largely incompatible. This prevents the generation of wiring diagrams that provide quantitative and complete information on the number and subcellular location of all synaptic in/outputs for any neuron within and across brain areas (commonly referred to as "dense connectome").

At present, methods that allow for measurements of synaptic connectivity at sufficiently high resolution (i.e., (sub)cellular levels) can be grouped into three main categories: First, electrophysiological approaches determine connectivity between pairs (or small numbers) of neurons using simultaneous patch-clamp recordings (e.g., Feldmeyer et al., 1999; Lefort et al., 2009), or combinations of single neuron recordings with optical stimulation, such as glutamate uncaging (Callaway and Katz, 1993; Schubert et al., 2007) or channelrhodopsin-assisted circuit mapping (Petreanu et al., 2009). Often, these approaches are combined with labeling the recorded neurons, allowing for reconstruction of the respective soma locations, dendrite morphologies and putative contact sites at light-microscopic levels (Feldmeyer et al., 1999, 2002; Sun et al., 2006; Frick et al., 2008; da Costa and Martin, 2011). Paired recording/reconstruction approaches are, however, limited to acute brain slices *in vitro*, where slice thicknesses of 300μm result in substantial cutting of dendrites (Oberlaender et al., 2012a) and axons (Oberlaender et al., 2011), limiting these measurements to close-by neurons.

Second, electron-microscopic approaches, such as serial block face scanning (SBFSEM, Denk and Horstmann, 2004) or ion-beam techniques (Merchan-Perez et al., 2009), allow for automated imaging of small tissue volumes containing sparsely (Lang et al., 2011) or densely labeled (Briggman et al., 2011) neuronal structures. Whereas technical issues of these microscope systems, which currently prevent imaging of larger volumes (e.g., an entire cortical column), may be overcome in the near future (Mikula et al., 2012), annotation and reconstruction of the rapidly increasing image data remain the major bottleneck, limiting these approaches to tissue samples of at most 0.5 × 0.5 × 0.5 mm<sup>3</sup> (Helmstaedter, 2013). Despite great progress in automated tracing (Kim et al., 2014), crowd sourcing of manual annotation (Helmstaedter et al., 2011) and combinations of manual and automated tools (Takemura et al., 2013), generation of complete dense connectomes (i.e., wiring diagrams that specify all in/outputs to a neuron) will require reconstructions of entire brain areas, spanning volumes of several cubic millimeters to centimeters, spatial scales that are multiple orders of magnitude beyond the present limits of these techniques.

Third, statistical approaches allow determination of cell type- and/or location-specific connectivity patterns by measuring structural overlap between reconstructed axons and dendrites of individual (Lubke et al., 2003) or bulk-labeled neurons (Meyer et al., 2010). Such approaches are commonly referred to as application of Peters' rule (White, 1979), but the validity of predicting synaptic connectivity by axo-dendritic overlap remains controversial (Mishchenko et al., 2010). This controversy arises primarily because a quantitative and coherent framework to measure structural overlap has so far been missing. Specifically, Peters' rule is often misinterpreted in a binary fashion, namely if dendrites and axons of two neurons overlap within a certain volume, it is assumed they are connected (Brown and Hestrin, 2009). In contrast, if dendrites and axons do not overlap, there will be no contact, which is the strongest implication of this approach. However, independent of the spatial scale at which the overlap is measured, within the respective overlap volume, dendrites and axons from other (unstained) neurons will be present and are equally likely to be connected to the stained neurons. Thus, overlap can never be taken as evidence for a connection, but has to be interpreted as a probability for a connection with respect to all present neurons instead.

Here, we present a novel approach, implemented within an interactive software environment called *NeuroNet (NN)*, which formulates a coherent framework to measure structural overlap between two neurons, yielding connection probabilities with respect to all neurons present in the overlapping volume. This quantitative version of Peters' rule requires generation of an average dense model of the neuronal circuitry; dense referring to the fact that every neuron within the model of the brain structure of interest (i) has to be distributed according to measured 3D soma distributions, (ii) is represented by a complete 3D reconstruction of soma/dendrites/axon found at the respective location and (iii) contains information of cell type, as well as subcellular distributions of dendritic spines, diameters and axonal boutons (**Figure 1A**). *NN* allows integrating such anatomical data into a common reference frame that describes the average geometry, as well as its variability across animals, of the brain region(s) of interest (**Figure 1B**). Within the resolution of the reference frame, *NN* further allows calculation of synaptic innervation between any two neurons in the model, always taking all other neurons within the respective overlap volumes into account (**Figure 1C**). The resultant dense "statistical" connectome yields pairwise connection probabilities, numbers of putative synaptic contacts and subcellular synapse distributions for all neurons within an entire brain region, allowing for comparison of these *in silico* measurements with electrophysiological, light- and electron-microscopic data.
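The idea of measuring overlap "with respect to all neurons present in the overlapping volume" can be illustrated with a small numerical sketch. The code below is not the *NN* implementation, and the exact formula used by *NN* is not given in this passage. It assumes, for illustration only, that the boutons of presynaptic neuron *i* in a voxel are shared among postsynaptic neurons in proportion to each neuron's fraction of all postsynaptic targets in that voxel, and that the number of contacts is Poisson distributed. All function names, voxel identifiers and numbers are hypothetical.

```python
import math


def innervation(boutons_i, targets_j, targets_all):
    """Expected number of contacts from neuron i onto neuron j.

    Each argument maps a voxel id to a count in that voxel:
    boutons of neuron i, postsynaptic targets of neuron j, and
    postsynaptic targets of ALL neurons (including j), respectively.
    """
    total = 0.0
    for voxel, boutons in boutons_i.items():
        all_targets = targets_all.get(voxel, 0.0)
        if all_targets <= 0.0:
            continue  # no postsynaptic structures in this voxel
        # Share of neuron j among all possible targets in this voxel.
        share_j = targets_j.get(voxel, 0.0) / all_targets
        total += boutons * share_j
    return total


def connection_probability(expected_contacts):
    """Probability of at least one contact, ASSUMING Poisson-distributed contacts."""
    return 1.0 - math.exp(-expected_contacts)


# Hypothetical toy data: axon of i and dendrite of j overlap only in voxel "v1".
boutons_i = {"v1": 4.0, "v2": 2.0}
targets_j = {"v1": 10.0, "v3": 5.0}
targets_all = {"v1": 1000.0, "v2": 800.0, "v3": 500.0}

I_ij = innervation(boutons_i, targets_j, targets_all)
p_ij = connection_probability(I_ij)
```

In this toy example the two neurons overlap in one voxel, yet the expected number of contacts is small because neuron *j* contributes only 1% of the targets there. This is the contrast to the binary reading of Peters' rule: overlap yields a probability, not a connection.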

We illustrate our approach using the vibrissal part of rat primary somatosensory cortex (i.e., barrel cortex, vS1), present the required anatomical data and compare our *in silico* measurements of cell type-specific local (i.e., within a layer 4 (L4) barrel) and long-range (i.e., between thalamus and L4, L5, and L6 in vS1) innervation with previous results. Because our *in silico* measurements match previous *in vitro*/*vivo* data, we conclude that our concept of generating an average dense network model and providing a coherent framework to calculate Peters' rule in terms of innervation probabilities is an accurate alternative to any currently available connectivity mapping method. In addition, our approach now opens the possibility to investigate location-specific differences of connectivity within a population, as well as presence of higher-order connectivity patterns within and across cell types.

#### **METHODS**

#### **DESIGN OF** *NeuroNet* **SOFTWARE**

**FIGURE 1 | Generating dense statistical connectomes. (A)** Generating a dense statistical connectome of a brain or brain region requires a standardized 3D reference frame of this brain region. The reference frame is used to register all anatomical data obtained from different experiments to a common coordinate system. Anatomical data to be collected from the brain region of interest: Number and 3D distribution of excitatory and inhibitory neuron somata; 3D reconstructions of representative samples of dendrites and axons of excitatory and inhibitory neuron cell types; determination of postsynaptic target densities, e.g., spine densities and dendrite surfaces, and presynaptic bouton densities for excitatory and inhibitory neuron cell types. **(B)** Anatomical data are assembled into a complete 3D network model. First, based on their 3D location, excitatory and inhibitory neuron somata are assigned to different anatomical substructures of the brain regions and to cell types. Next, somata of all cell types are replaced with dendrite and axon morphologies of the respective cell types. **(C)** Innervation from neuron *i* to neuron *j* is computed in 3D at a resolution determined by the anatomical variability of the 3D reference frame. This computation takes all possible postsynaptic targets of neuron *i* in addition to neuron *j* into account.

The interactive software environment *NN* is implemented as an extension package for the *Amira* visualization software (FEI-Visualization Sciences Group, 2014), allowing for 3D visualization of anatomical input data, dense neuronal networks and synaptic connectivity measurements (Dercksen et al., 2012). *NN* comprises three major building blocks. First, the interface between *NN* and the anatomical input data is implemented as a *NeuralNetworkSpecification* data object. The user creates such a data object as a first step (initialized as an empty network) and loads all required input data (see specifications of data and format below). The *NeuralNetworkSpecification* object encapsulates all required anatomical data and can be saved to disk. Second, a network assembly module called *NeuronDistributor* takes the *NeuralNetworkSpecification* object as its input, integrates all anatomical data and performs an up-scaling operation to generate an average dense model of the network. The output of the *NeuronDistributor* module is a *SpatialGraphSet* data object, containing a list of transformed morphologies with an associated cell type. This *SpatialGraphSet* can be saved to disk. Third, a connectivity computation module called *NeuralNetworkAnalyzer* takes as input the *NeuralNetworkSpecification* and the *SpatialGraphSet* to calculate axo-dendritic overlaps between individual neurons. This compute module offers a query interface and selection/visualization options. The output generated by the *NeuralNetworkAnalyzer* includes a dense statistical connectome as represented by an innervation matrix *I<sub>ij</sub>* (for all selected neuron pairs *i* and *j*), as well as aggregate statistics about cell type- and location-specific connectivity, such as the convergence, divergence, connection probabilities, average number of synapses per cell or per cell type, and information about the number of neurons per cell type. These data can be saved as *AmiraMesh* tables or text files.
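To make the relation between an innervation matrix and the aggregate statistics named above more concrete, the following sketch derives a few such quantities from a toy matrix. It does not reproduce the *NeuralNetworkAnalyzer* or any *Amira* interface. The definitions used here are assumptions made for illustration: a pairwise connection probability of 1 − exp(−*I*) (Poisson assumption), divergence as the expected number of postsynaptic partners of a presynaptic neuron, and convergence as the expected number of presynaptic partners of a postsynaptic neuron.

```python
import math


def pair_probabilities(I):
    """Pairwise connection probabilities, ASSUMING Poisson-distributed contacts."""
    return [[1.0 - math.exp(-value) for value in row] for row in I]


def aggregate_statistics(I):
    """Summary statistics for an innervation matrix I[i][j] (rows: presynaptic i)."""
    P = pair_probabilities(I)
    n_pre, n_post = len(I), len(I[0])
    # Expected number of postsynaptic partners per presynaptic neuron (assumed definition).
    divergence = [sum(row) for row in P]
    # Expected number of presynaptic partners per postsynaptic neuron (assumed definition).
    convergence = [sum(P[i][j] for i in range(n_pre)) for j in range(n_post)]
    # Expected number of contacts received by each postsynaptic neuron.
    contacts_per_post = [sum(I[i][j] for i in range(n_pre)) for j in range(n_post)]
    mean_probability = sum(divergence) / (n_pre * n_post)
    return {
        "divergence": divergence,
        "convergence": convergence,
        "contacts_per_postsynaptic_neuron": contacts_per_post,
        "mean_connection_probability": mean_probability,
    }


# Hypothetical 2 x 2 innervation matrix (expected contacts from neuron i onto neuron j).
I = [
    [0.0, 0.5],
    [1.0, 0.0],
]
stats = aggregate_statistics(I)
```

For cell type-specific statistics, the same sums would be restricted to the rows and columns belonging to the cell types of interest.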

All routines of *NN* are implemented in C++ and the software is available for download online at http://www.zib.de/software/neuronet, including a manual for installation/usage and an exemplary dataset for testing the software. Downloads are available for Windows and Linux operating systems. *NN* supports multi-threaded computation using the OpenMP libraries. Computations presented in the Results section were performed on a desktop computer with 8 CPUs and 48 GB RAM. Hardware requirements depend on the size (number of neurons, dendritic/axonal lengths) of the neuronal network. For example, calculating connectivity between thalamus and all excitatory neurons within a single cortical column required ∼12 GB of RAM. In contrast, for networks containing several hundreds of thousands of neurons (e.g., for entire vS1), we recommend a compute server with at least 64 CPUs and 500 GB RAM.

#### **ANATOMICAL INPUT DATA**

Mandatory anatomical input data to *NN* comprise: 1. a standardized 3D reference frame, 2. 3D distributions of excitatory and inhibitory neuron somata, 3. representative samples of cell type-specific complete 3D morphological reconstructions and 4. measurements of cell type-specific subcellular distributions of soma/dendrite surface areas, dendritic spines and axonal boutons. In the following we introduce the formats for presenting the respective data to *NN*, provide example datasets for rat vS1 and review methodological approaches that allowed generating these example datasets (all anatomical data used in the Results section were acquired using experimental procedures carried out in accordance with the animal welfare guidelines of the Max Planck Society).

#### *Standardized 3D reference frame*

The most important prerequisite to assemble average dense models of the neuronal circuitry is the definition of a standardized 3D reference frame that allows integration of anatomical data obtained from many animals. In general, the reference frame describes the 3D geometry of the brain region(s) of interest in terms of anatomical landmarks. Further, it specifies the variability of these landmarks across animals, which serves as a resolution limit of the average circuit model. More specifically, the 3D reference frame has to describe (i) the boundaries of the brain region(s) of interest, (ii) anatomical substructures within these regions, and (iii) a global and/or multiple local coordinate systems. The latter reflects the general scenario that brain areas have irregular and/or curved boundaries and sub-structures.

In case of rat vS1, the 3D reference frame has been generated by reconstructing the pial surface of entire rat cortex, the white matter tract (WM) and the circumferences of 24 cortical barrel columns (i.e., each representing one of the large facial whiskers on the animal's snout, Woolsey and Van der Loos, 1970), using high-resolution 3D images of the left hemisphere of Wistar rats at an age of 28 days (Egger et al., 2012). Repeating these reconstructions for 12 animals of the same strain and age, we superimposed all geometries using rigid transformations, minimized the distances between the respective center locations of the 24 barrel columns and calculated the average column center locations, column diameters and orientations, as well as the average 3D surfaces of the pia and WM above and below vS1, respectively (**Figure 2A**). The column centers are given with respect to a global coordinate system, where the z-axis is defined as the shortest perpendicular axis between the center of the barrel column representing the D2 whisker and the pial surface above the column. The x-axis points from the D2 center toward the center of the first adjacent rostral column (i.e., along the whisker row toward D3). The y-axis points approximately toward the first adjacent caudal column (i.e., along the whisker arc toward C2).
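The construction of the global coordinate system can be sketched in a few lines of vector arithmetic. The code below is an illustration under stated assumptions, not the procedure used by Egger et al. (2012): the nearest point on the pial surface is assumed to be known already, the x-axis is obtained by removing the z-component from the direction toward the neighboring column center, and the y-axis is completed with the right-hand rule (the text only states that y points approximately toward C2). All coordinates are made-up placeholders.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def normalize(a):
    length = math.sqrt(dot(a, a))
    return scale(a, 1.0 / length)


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def column_frame(column_center, nearest_pia_point, neighbor_center):
    """Orthonormal frame (x, y, z) anchored at a barrel column center.

    z: from the column center toward the nearest point on the pia.
    x: toward the neighboring column center, made perpendicular to z.
    y: completes a right-handed system (ASSUMED sign convention).
    """
    z = normalize(sub(nearest_pia_point, column_center))
    toward_neighbor = sub(neighbor_center, column_center)
    in_plane = sub(toward_neighbor, scale(z, dot(toward_neighbor, z)))
    x = normalize(in_plane)
    y = cross(z, x)
    return x, y, z


# Placeholder coordinates in micrometers (not measured values).
d2_center = (0.0, 0.0, 0.0)
pia_above_d2 = (0.0, 0.0, 500.0)
d3_center = (400.0, 0.0, 50.0)

x_axis, y_axis, z_axis = column_frame(d2_center, pia_above_d2, d3_center)
```

The same function could be applied to each of the other columns to obtain a local z-axis, which is the only axis the text requires for the local coordinate systems.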

Because the pial and WM surfaces are curved, the orientation of each barrel column is tilted with respect to the (D2) z-axis. Therefore, we determined 23 additional local coordinate systems (i.e., for each barrel column), using the same approach used to determine the global D2 coordinate system. The final standardized reference frame of rat vS1 thus comprises the average pial and WM surfaces, 24 column center coordinates and diameters with respect to the global D2 coordinate system and 24 z-axes, representing local coordinate systems that define the orientation of each barrel column within the curved cortex. We further determined the variability of these anatomical landmarks across animals. The standard deviations (SDs) of the column center locations were on average ∼90μm, of the pia-WM distances ∼100μm and of the column orientations ∼4.5 degrees (Egger et al., 2012). Thus, the geometry was remarkably preserved across animals and we defined the resolution limit of our average network model accordingly as 50μm. Consequently, the volume comprising the standardized reference frame of rat vS1 was superimposed with a grid of 50 × 50 × 50μm<sup>3</sup> voxels and a local z-axis was calculated for each voxel by interpolating from the respective nearest barrel column axes.
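The per-voxel interpolation of a local z-axis can be illustrated with a minimal sketch. The text states only that each voxel axis is interpolated from the nearest barrel column axes; the inverse-distance weighting of the `k` nearest columns used here, and the names `local_z_axis` and `normalize`, are assumptions made for illustration and not the published procedure. The column coordinates are made-up values.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def local_z_axis(voxel_center, columns, k=2):
    """Interpolate a unit z-axis at voxel_center from the k nearest column axes.

    columns: list of (center_xyz, unit_axis_xyz) tuples in the global frame.
    Inverse-distance weighting is an assumption, not the published scheme.
    """
    nearest = sorted(
        (math.dist(voxel_center, center), axis) for center, axis in columns
    )[:k]
    # A voxel that coincides with a column center inherits that column's axis.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    blended = [
        sum(w * axis[i] for w, (_, axis) in zip(weights, nearest))
        for i in range(3)
    ]
    return normalize(blended)

# Two hypothetical columns: an upright one at the origin, a tilted neighbor.
columns = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((400.0, 0.0, 0.0), normalize((0.1, 0.0, 1.0))),
]
axis = local_z_axis((200.0, 0.0, 0.0), columns)
```

Halfway between the two columns the interpolated axis is tilted less than the neighboring column axis, which is the behavior one would expect from any smooth interpolation scheme.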

The 3D reference frame of rat vS1 is presented to *NN* as follows: (1) A spreadsheet (*csv* file) contains information about the barrel column geometries with respect to the global coordinate system, i.e., the 3D center locations, column radii and a unit vector pointing along the respective orientation. Each column is further assigned a unique identifier (substructure) label. (2) A 3D vector field (*AmiraMesh vector field*) containing unit vectors at 50μm resolution pointing toward the curved pial surface. In general, such vector fields should be sampled at the resolution of the 3D reference frame. (3) 3D boundary surfaces (*AmiraSurface* format) describing the 3D volume of the brain region (here: pial and WM surfaces). Additional boundary surfaces of anatomical substructures can be provided, e.g., borders of cytoarchitectonic cortical layers. *NN* currently supports the reference frame of vS1, but can be easily extended to other brain areas that can be described by 3D boundary surfaces and global and/or local coordinate systems. The resolution (i.e., voxel grid used for computations in *NN*, see below) can be adjusted to any value as determined by the inter-animal variability of the respective reference frame.

**FIGURE 2 | Anatomical data used for generating dense statistical connectomes of rat vibrissal cortex (vS1). (A)** Left: Rat vS1 contains segregated anatomical structures, called barrels, which are arranged somatotopically to the pattern of the large facial whiskers. Right top: Tangential view of barrels in the standardized rat vS1 cortex (see inset on left). These barrels provide natural landmarks for registration of anatomical data into the standardized reference frame. Bottom: Semi-coronal view of barrel columns in 3D. Pial and white matter (WM) surfaces delineate the vertical cortical boundaries in 3D. **(B)** 3D distribution of excitatory (left) and inhibitory (right) neuron somata with respect to cortical barrel columns in rat vS1. Center: Close-up view of neuron somata in insets in left and right panels. **(C)** Left: 3D dendrite reconstructions of 10 excitatory (black) and 5 inhibitory (green) cell types. Right: 3D dendrite (black) and axon (blue) reconstruction of an excitatory L5 slender-tufted pyramidal neuron. **(D)** Close-up views of the soma and dendrite surface reconstructions of an excitatory (black, top) and an inhibitory (green, bottom) neuron, corresponding to the dendrite morphologies marked with an asterisk (**\***) in **(C)**. **(E)** Determination of dendritic spines, dendrite surface and axonal boutons of a L4 spiny stellate neuron. Top: z-projection of a 50 μm thick section containing the soma, dendrites and axon branches. Center: From left to right: Close-up view of dendrite branch in left inset in top panel; close-up view of dendrite segment in inset in panel to the left; digital reconstruction of dendrite surface and spine locations of dendrite segment in panel to the left. Bottom left: Close-up view of axon branch in right inset in top panel. Bottom right: Close-up view of axon segment in inset in bottom left panel, with digital reconstruction of axon and bouton locations along the axon (shifted for visualization).

#### *3D soma distributions*

The second anatomical prerequisite to generate an average dense model of the neuronal circuitry consists of measurements of the number and 3D distribution of excitatory and inhibitory neuron somata for the entire brain region(s) of interest. These distributions have to be obtained with respect to, and at the resolution of, the anatomical reference frame. In case of rat vS1, we stained 50 μm thick histological sections, cut tangentially to the D2 barrel column axis from the pia toward the WM, for NeuN (Mullen et al., 1992) and GAD67 (Kaufman et al., 1986; Kobayashi et al., 1987; Julien et al., 1990) to reveal all excitatory and inhibitory neurons, respectively. Using automated soma detection software (Oberlaender et al., 2009b), we determined the 3D center locations of all excitatory/inhibitory neuron somata for entire rat vS1 of four Wistar rats (age 28–29 days, Meyer et al., 2013, **Figure 2B**). For each counting dataset, we superimposed a 50 μm voxel grid and generated two 3D somata distributions for excitatory and inhibitory neurons, respectively (i.e., number of somata in 10<sup>3</sup> per mm<sup>3</sup>). The two average soma density fields are provided to *NN* as 3D images (*AmiraMesh* format). We further determined the number of neurons per thalamic barreloid (Land et al., 1995; Meyer et al., 2013), which provide whisker-specific input to the respective barrel column (Brecht and Sakmann, 2002).

#### *Cell type-specific 3D morphologies*

The third prerequisite to generate an average dense model of the neuronal circuitry consists of reconstructions of complete 3D soma/dendrite/axon morphologies. The morphological dataset has to be representative of the brain region, fulfilling two criteria: (1) objective classification approaches should reveal all axo-dendritic cell types (i.e., dendrite as well as axon projection patterns are similar within, but significantly different between cell types) reported for the brain region(s) of interest (see Narayanan et al., under review, for excitatory cell types in rat vS1), and (2) spatial sampling of neurons should be performed at the resolution of the anatomical reference frame (i.e., revealing location-dependent differences in morphology, spatial distribution and overlap of different cell types). For each cell type, a number of properties are defined using a spreadsheet (*csv* file) with predefined format: (1) whether the cell type is excitatory or inhibitory, (2) whether the morphology should be rotated during network assembly, i.e., if dendrites display asymmetric projections, such as polar dendrites pointing toward the center of a substructure (e.g., L4ss, Egger et al., 2008), (3) whether the reconstructions contain only axon or dendrites/axon, (4) whether the cell type has somata within and/or outside sub-structures (e.g., L4ss are only located inside the column, but not in septa between columns, Staiger et al., 2004; Bruno and Sakmann, 2006; Egger et al., 2008), and (5) the density of presynaptic contact sites (i.e., boutons) per μm axon, differentiated by sub-structures, in particular one value each for boutons in infragranular, granular and supragranular layers of vS1. Finally, the spatial distribution of each cell type is determined by 3D boundary surfaces that describe the (sub)region(s) where the cell type is found. 
If more than one cell type is present within such a 3D region, the relative frequency of morphologies from each cell type within the overlap region is specified using spreadsheets (*csv* files) with predefined format.

In case of rat vS1, we labeled individual neurons with Biocytin using cell-attached recordings *in vivo* (Pinault, 1996; Narayanan et al., 2014). After cutting the brain into 100μm thick vibratome sections (i.e., tangential to the D2 barrel column axis, from the pia toward the WM), manual tracing software (e.g., *NeuroLucida*) or custom-designed semi-automated imaging and tracing systems (Oberlaender et al., 2007, 2009a; Dercksen et al., 2014) allow reconstruction of complete 3D morphologies with respect to the anatomical reference frame of rat vS1. Doing so, we reconstructed 153 excitatory neurons across the entire cortical depth (i.e., from L2 to L6) and used objective classification approaches to subdivide our sample into 10 axo-dendritic excitatory cell types (**Figure 2C**, Narayanan et al., under review). Because we obtained morphologies for every 50 μm of cortical depth, our spatial sampling is regarded as representative for rat vS1. Further, the 10 excitatory cell types represent all morphological classes that have been reported to date for rat vS1: L2 pyramids (L2, *n* = 16) and L3 pyramids (L3, *n* = 30) (Brecht et al., 2003; Staiger et al., 2014); L4 star pyramids (L4sp, *n* = 15), L4 spiny-stellates (L4ss, *n* = 22) and L4 pyramids (L4py, *n* = 7) (Staiger et al., 2004); L5 slender-tufted pyramids (L5st, *n* = 18) and L5 thick-tufted pyramids (L5tt, *n* = 16) (Hallman et al., 1988; Larkman and Mason, 1990); L6 corticocortical pyramids (L6cc, *n* = 11), L6 corticothalamic pyramids (L6ct, *n* = 13) and L6 inverted pyramids (L6inv, *n* = 5) (Kumar and Ohana, 2008). Consequently, sampling ∼1% of all excitatory neurons located within a barrel column of rat vS1 is regarded as representative for all cell type-specific soma/dendrite/axon morphologies. 
Further, we reconstructed the cortical parts of thalamocortical axons (with respect to the reference structures of vS1, *n* = 14), labeled *in vivo* in the ventral posterior medial nucleus (VPM) of rat vibrissal thalamus (Oberlaender et al., 2012b). Similarly, axo-dendritic cell types of inhibitory interneurons (INH) need to be defined. **Figure 2C** illustrates five axo-dendritic INH types, as previously reported (Helmstaedter et al., 2009; Koelbl et al., 2013) and kindly provided by Moritz Helmstaedter, Dirk Feldmeyer and Hanno S. Meyer. At this point, it remains to be investigated whether these classes can be regarded as representative of rat vS1 in terms of the above stated criteria. INH morphologies are thus used purely for illustration of our approach throughout the present article. Further, in contrast to the excitatory dataset, INH morphologies were obtained by recording/labeling in acute brain slices *in vitro*. The total number of morphologies used in the subsequent application examples is 371 (153 excitatory and 204 inhibitory neurons from vS1 and 14 thalamocortical neurons from VPM).

*NN* expects these morphologies to be organized into folders according to [sub-structure label (e.g., barrel column ID)]/[cell type folder name]. The morphologies are specified either as *Amira SpatialGraphs* (Dercksen et al., 2014) or in the NEURON *hoc* language (Hines and Carnevale, 1997). If presented as *SpatialGraphs*, the branches comprising the morphologies have to be labeled as *Soma*, *ApicalDendrite*, *BasalDendrite*, or *Axon,* respectively. If specified in the *hoc* language, branches have to be labeled *soma*, *apical* for apical dendrites, *dendrite* for basal dendrites, or *axon,* respectively. Each cell type is represented twice, both as an axon cell type and a dendrite cell type. This implementation allows including long-range connections from cell types located in other brain regions (e.g., VPM axons, where soma/dendrites are located in the thalamus). The number of these long-range axon morphologies is specified in *NN* using a spreadsheet (*csv* file) with predefined format. In case of VPM axons, the number of morphologies innervating a respective barrel column is determined from cell counts in thalamus (i.e., the number of neurons per whisker-specific barreloid, Meyer et al., 2013).

#### *Subcellular morphological statistics*

The final anatomical prerequisite to generate an average dense model of the neuronal circuitry consists of measurements of the density of postsynaptic target sites (PSTs), i.e., spines along dendrites of excitatory neurons and surface areas of somata and dendrites of excitatory/inhibitory neurons for all cell types present within the brain region(s) of interest. 3D reconstruction of soma and dendrite diameters of excitatory and inhibitory neurons was performed manually using *NeuroLucida* software (**Figure 2D**). Dendritic spine densities and axonal bouton densities were determined manually from high-resolution 3D image stacks (92 × 92 × 200 nm<sup>3</sup> voxel size) along skeleton tracings of *in vivo* labeled neurons of all cell types (**Figure 2E**). These data are grouped by morphological cell type.

Connections between cell types are specified in *NN* using a spreadsheet (*csv* file) with predefined format. For each possible connection between two cell types, the presynaptic cell type, postsynaptic cell type, as well as the normalized number of PSTs per μm2 area, and/or per μm branch length is defined, based on measured values (using the methods stated above) for each cell type and substructure (soma, apical dendrite, or basal dendrite). This meta-connectivity list thus specifies general knowledge of whether two cell types can in principle connect to each other and at which substructures. For example, inhibitory interneurons may specifically innervate somata and dendritic shafts of excitatory neurons. Thus, connections from interneuron to excitatory cell types can be specified in the meta-connectivity list such that PSTs are exclusively calculated by the surface areas of the excitatory somata and dendrites (i.e., soma/dendrite surface-specific PSTs). In contrast, connections from excitatory to excitatory cell types may be specified in the meta-connectivity list such that PSTs are calculated exclusively by the spine densities (i.e., dendrite length-specific PSTs).

#### **DATA INTEGRATION AND UP-SCALING TO GENERATE AVERAGE DENSE CIRCUIT MODELS**

Upon availability of the above described anatomical data in appropriate formats, *NN* automatically generates an average dense representation of the neuronal circuitry of the brain region defined by the reference frame (**Figure 3**). First, the cell type-specific boundary surfaces are integrated (**Figure 3A** shows a subsample of the cell type-boundaries) into the 3D reference frame. Next, the excitatory and inhibitory somata distributions are registered into the 3D reference frame. Excitatory and inhibitory soma positions are generated for all voxels in the soma density grid by multiplying the respective density values with the voxel volume (e.g., 50<sup>3</sup> μm<sup>3</sup>) and rounding to the nearest integer. 3D soma locations within a voxel are drawn from a uniform distribution. Based on the 3D location, each soma is further assigned to a unique substructure (barrel column) and cell type (**Figure 3B**). Each soma is assigned to the barrel column (modeled as a cylinder) that is closest to the 3D soma position. To determine the cell type, first the region containing the soma is determined by identifying its location with respect to the cell type boundary surfaces. The cell type is then selected randomly based on the relative frequency of cell types within this region (as specified by the respective *csv* file). Soma/dendrite morphologies are then placed at all computed soma positions (**Figure 3C**). For each soma, a dendrite morphology is chosen at random from all morphologies fulfilling the following criteria: (1) the cell type of the morphology is the same as the cell type assigned to the soma, (2) the morphology is registered to the sub-structure (e.g., column) that is closest to the new soma location, and (3) the soma location of the morphology is not further away from the new soma location than one voxel of the reference frame resolution (i.e., in case

**FIGURE 3 | Network assembly process. (A)** Standardized 3D reference frame of rat vibrissal cortex, with 3D organization of horizontal (i.e., barrel columns) and vertical (i.e., layers) structures. Every point in this brain region can be assigned to a barrel column and a cortical layer with 50 μm precision. **(B)** 3D distribution of 530,000 somata of 10 excitatory and 5 inhibitory cell types. **(C)** Replacement of somata with cell type-specific 3D dendrite morphologies. **(D)** Replacement of somata with cell type-specific 3D axon morphologies. Shown here: Thalamocortical axons from VPM (black), intracortical axons of inhibitory interneurons (green). **(E)** Top: Close-up view of inset in **(B)**. Center: Close-up view of inset in **(C)**, showing the dendrites of a single L4 spiny stellate (L4ss) neuron (red) next to all dendrites from all cell types in the neighboring barrel column. Bottom: Close-up view of inset in **(D)**, showing a single thalamocortical VPM axon (blue) next to all axons from two cell types in the neighboring barrel column.

of rat vS1, the original soma location of the morphology and its location within the model are within ±50 μm along the z-axis of the respective column). The latter step guarantees that potential location-specific morphological properties are preserved within the resolution limit of the reference frame. Lastly, the morphologies are transformed as follows: (i) translation of the morphology to the new soma location; (ii) rotation around the soma, such that the vertical orientation is preserved and optionally (iii) cells with asymmetric projection patterns (e.g., polar dendrites) are rotated such that their orientation is retained (e.g., L4ss are rotated around the column axis to preserve projections toward the barrel column center). Third, axon morphologies of each cell type are inserted to match the number of somata/dendrites for each cell type (**Figure 3D**). For each soma, an axon morphology is chosen at random from all morphologies fulfilling the following criteria: (1) the cell type of the morphology is the same as the cell type assigned to the soma, and (2) the morphology is registered to the substructure (e.g., column) that is closest to the soma location. In contrast to dendrite morphologies, axon morphologies are not transformed to new soma locations, because rotation/translation could result in a loss of location-specific projection patterns (e.g., L4ss neurons in vS1 display axons confined to the respective barrel column containing the soma (Egger et al., 2008), and hence translations would result in inappropriate innervation of septal areas). Long-range axons innervating the modeled brain region (i.e., their somata are located elsewhere) are registered in the same way as cell types with somata inside the brain region of interest, preserving their vertical and horizontal projection patterns with respect to the reference frame at 50 μm resolution. 
Then, long-range axons are up-scaled (i.e., duplicated) until the number of morphologies specified for this cell type (i.e., in the input *csv* file) is reached (e.g., VPM axons are up-scaled to meet the average number of somata per thalamic barreloid, e.g., 311 for the D2 whisker, Meyer et al., 2013). The result of the network assembly step is a dense representation of the neuronal circuitry of an entire brain region, where each neuron of a measured 3D soma distribution is represented by dendrite/axon morphologies of the appropriate cell type and location/orientation within the resolution of the geometrical reference frame (**Figure 3E**).
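The soma generation and cell type assignment steps of the assembly can be sketched as follows. This is a simplified illustration, not the *NN* implementation: the function names, the example density and the relative frequencies of the three L4 cell types are hypothetical, and the assignment to barrel columns and the morphology selection are omitted.

```python
import random

VOXEL_EDGE_UM = 50.0
# 50 um edge length corresponds to 1.25e-4 mm^3 per voxel.
VOXEL_VOLUME_MM3 = (VOXEL_EDGE_UM / 1000.0) ** 3

def somata_in_voxel(density_per_mm3, voxel_origin, rng):
    """Place round(density * voxel volume) somata uniformly inside one voxel."""
    count = round(density_per_mm3 * VOXEL_VOLUME_MM3)
    return [
        tuple(o + rng.uniform(0.0, VOXEL_EDGE_UM) for o in voxel_origin)
        for _ in range(count)
    ]

def pick_cell_type(relative_frequency, rng):
    """Draw one cell type according to its relative frequency in the region."""
    types = list(relative_frequency)
    weights = [relative_frequency[t] for t in types]
    return rng.choices(types, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical voxel with 80 x 10^3 somata per mm^3, i.e., 10 somata.
somata = somata_in_voxel(80_000.0, (100.0, 150.0, 500.0), rng)

# Hypothetical relative frequencies inside one cell type boundary region.
frequencies = {"L4ss": 0.6, "L4sp": 0.3, "L4py": 0.1}
labels = [pick_cell_type(frequencies, rng) for _ in somata]
```

Drawing the cell type per soma from the relative frequencies means that the realized cell type counts fluctuate around the specified proportions from one generated network to the next.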

#### **CALCULATION OF STATISTICAL SYNAPTIC INNERVATION AT SUBCELLULAR LEVELS**

The dense statistical connectome $I_{ij}$ is computed as follows: First, for each presynaptic neuron *i* its axon is converted into a 3D bouton density at the resolution of the reference frame by clipping the axon of neuron *i* with all six faces of each voxel, summing up the length of the respective axon branches within the voxel and multiplying this value by the cell type- and substructure-specific bouton length density. Second, each postsynaptic neuron *j* is converted into a 3D PST density at the resolution of the reference frame by clipping the soma and dendrites of neuron *j* with all six faces of each voxel, summing up the length and the surface area of the respective dendrite branches and the soma and multiplying these values by the connection-specific PST length or area density. Dendrite and soma surface areas are computed from the diameter values along the branches using trapezoidal integration. The 3D PST densities of each postsynaptic neuron *j* for connections with neurons of cell type $T(i)$ of the presynaptic neuron *i* in the voxel centered on $\vec{x}$ are determined as the sum of two terms ($PST_{spines} + PST_{surface}$):

$$\begin{aligned} PST_{j}\left(\vec{x}, T(i)\right) &= \sum_{\text{labels } L} l_{j,L}(\vec{x}) \cdot \lambda_{T(i),T(j)}(L) \\ &+ \sum_{\text{labels } L} a_{j,L}(\vec{x}) \cdot \alpha_{T(i),T(j)}(L) \end{aligned}$$

Here, label $L$ refers to a subcellular structure of the postsynaptic neuron, i.e., soma, basal dendrite or apical dendrite. $l_{j,L}(\vec{x})$ is the total length of all compartments of label $L$ of neuron *j* inside the voxel centered on $\vec{x}$ (in μm). $\lambda_{T(i),T(j)}(L)$ is the length PST density (e.g., 1 spine per μm basal dendrite) for connections from neurons of type $T(i)$ to neurons of type $T(j)$ onto target structures with label $L$ (in μm<sup>−1</sup>), as provided by spine density measurements and specified in the meta-connectivity spreadsheet. $a_{j,L}(\vec{x})$ is the total surface area of all compartments of label $L$ of neuron *j* inside the voxel centered on $\vec{x}$ (in μm<sup>2</sup>). $\alpha_{T(i),T(j)}(L)$ is the surface PST density (e.g., 0.4 PSTs per μm<sup>2</sup> soma surface) for connections from neurons of type $T(i)$ to neurons of type $T(j)$ onto target structures with label $L$ (in μm<sup>−2</sup>). Whereas spine and bouton distributions can be measured (e.g., using the methods stated above), we derived surface PST densities by assuming that the total number of boutons $B_{all}(\vec{x})$ from all presynaptic cell types $T(i)$ should match the number of total PSTs from all cell types $T(j)$:

$$\sum_{i,j} PST_{surface,j}(\vec{x}, T(i)) = B_{all}(\vec{x}) - PST_{spines}(\vec{x})$$

Reducing this equation to 1 dimension (i.e., collapsing the 3D densities to the z-axis), we fit the respective surface PST density values $\alpha_{T(i),T(j)}$ using standard least squares algorithms (see fitting result in the online meta-connectivity list).
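The two calculations described above, the per-voxel PST count and the one-dimensional least squares fit of surface PST densities, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fit is restricted to two surface classes so that the normal equations can be solved in closed form, and all numerical inputs except the two example densities quoted in the text (1 spine per μm basal dendrite, 0.4 PSTs per μm<sup>2</sup> soma surface) are made up. The parametrization actually used for the published fit is not specified here.

```python
def pst_in_voxel(lengths_um, areas_um2, length_density, area_density):
    """PST count of one neuron in one voxel: sum over labels of l*lambda + a*alpha."""
    labels = set(lengths_um) | set(areas_um2)
    spines = sum(
        lengths_um.get(L, 0.0) * length_density.get(L, 0.0) for L in labels
    )
    surface = sum(
        areas_um2.get(L, 0.0) * area_density.get(L, 0.0) for L in labels
    )
    return spines + surface

def fit_two_surface_densities(area_a, area_b, target):
    """Least squares fit of target[z] ~ alpha_a * area_a[z] + alpha_b * area_b[z].

    target[z] stands for B_all(z) - PST_spines(z) in depth bin z; area_a and
    area_b are summed surface areas of two target classes in the same bins.
    """
    saa = sum(a * a for a in area_a)
    sbb = sum(b * b for b in area_b)
    sab = sum(a * b for a, b in zip(area_a, area_b))
    sat = sum(a * t for a, t in zip(area_a, target))
    sbt = sum(b * t for b, t in zip(area_b, target))
    det = saa * sbb - sab * sab  # zero only if the two profiles are collinear
    alpha_a = (sat * sbb - sbt * sab) / det
    alpha_b = (sbt * saa - sat * sab) / det
    return alpha_a, alpha_b

# PST count for one voxel; the apical density of 0.5 per um is hypothetical.
pst = pst_in_voxel(
    lengths_um={"basal": 120.0, "apical": 40.0},
    areas_um2={"soma": 300.0},
    length_density={"basal": 1.0, "apical": 0.5},
    area_density={"soma": 0.4},
)

# Synthetic depth profiles generated with alpha_a = 0.4 and alpha_b = 0.1.
area_a = [100.0, 200.0, 300.0, 150.0]
area_b = [50.0, 40.0, 80.0, 120.0]
target = [0.4 * a + 0.1 * b for a, b in zip(area_a, area_b)]
alpha_a, alpha_b = fit_two_surface_densities(area_a, area_b, target)
```

With noise-free synthetic input the fit recovers the densities used to generate the profiles; with measured profiles the residual indicates how well the assumption of matching bouton and PST numbers holds along the cortical depth.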

Third, the precision (across-animal variability) of the geometrical reference frame determines the voxel resolution, i.e., the smallest scale at which axo-dendritic overlap can be calculated between morphologies obtained in different animals. Thus, locations of somata/dendrites/axons within a voxel cannot be further resolved and proximity of boutons and PSTs within a voxel cannot be used to estimate synaptic innervation. Instead, we assume that all PSTs within a voxel are equally likely to receive any bouton in the same voxel (i.e., independent synapse formation at resolutions smaller than the accuracy of the reference frame). The probability that neuron *j* is targeted by a bouton within the voxel centered on $\vec{x}$ is then given by:

$$p_{j}(\vec{x}, T(i)) = \frac{PST_{j}(\vec{x}, T(i))}{PST_{all}(\vec{x}, T(i))}$$

Here, $PST_{all}(\vec{x}, T(i))$ refers to the total number of potential postsynaptic contact sites for connections with presynaptic cells of type $T(i)$ in the voxel centered on $\vec{x}$, i.e.,

$$PST_{all}(\vec{x}, T(i)) = \sum_{j} PST_{j}(\vec{x}, T(i))$$

If $B_i$ boutons from neuron *i* are present in the voxel at $\vec{x}$, the probability that neuron *j* is targeted by *n* of these boutons is given by the binomial distribution:

$$P(n; p_j, B_i) = \binom{B_i}{n} p_j^n (1 - p_j)^{B_i - n}$$

Average values for $B_i$ and $p_j$ in our networks are $O(10^1)$ to $O(10^2)$ and $O(10^{-3})$, respectively. Given the ∼5 orders of magnitude difference between $B_i$ and $p_j$, we can approximate the binomial distribution by a Poisson distribution (i.e., $B_i \to \infty$ and $p_j \to 0$):

$$P\left(n; \tilde{I}_{ij}(\vec{x})\right) = \frac{\tilde{I}_{ij}^{n}(\vec{x})}{n!} \exp\left(-\tilde{I}_{ij}(\vec{x})\right)$$

Here, we defined the average innervation $\tilde{I}_{ij}(\vec{x})$ from neuron *i* to neuron *j* in the voxel at $\vec{x}$:

$$\tilde{I}_{ij}(\vec{x}) := B_{i}(\vec{x}) \cdot p_{j}(\vec{x})$$
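The quality of the Poisson approximation in the stated parameter regime can be checked numerically. The sketch below compares the binomial and Poisson probabilities for one illustrative pair of values within the quoted orders of magnitude (50 boutons, targeting probability 10<sup>−3</sup>); these specific values and the function names are chosen for illustration only.

```python
import math

def binomial_pmf(n, p, b):
    """Probability that n of b boutons target a neuron with per-bouton probability p."""
    return math.comb(b, n) * p**n * (1.0 - p) ** (b - n)

def poisson_pmf(n, mean):
    """Poisson probability of n synapses for a given mean innervation."""
    return mean**n / math.factorial(n) * math.exp(-mean)

b_i = 50      # boutons of neuron i in the voxel (illustrative value)
p_j = 1e-3    # probability that a bouton targets neuron j (illustrative value)
innervation = b_i * p_j  # mean of the approximating Poisson distribution

# Largest absolute difference between the two distributions for n = 0..5.
max_abs_diff = max(
    abs(binomial_pmf(n, p_j, b_i) - poisson_pmf(n, innervation))
    for n in range(6)
)
```

For these values the two distributions differ by less than 10<sup>−4</sup> at every synapse count considered, which is small compared with the probabilities of zero or one synapse.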

The connectivity statistics between any two neurons *(i,j)* can thus be described by the 3D scalar field $\tilde{I}_{ij}(\vec{x})$. The probability of finding a connection between any two neurons *i* and *j* within a specific voxel located at $\vec{x}$ is further given by:

$$p_{ij}(\vec{x}) = 1 - P(n = 0; \tilde{I}_{ij}(\vec{x})) = 1 - \exp(-\tilde{I}_{ij}(\vec{x}))$$

Because we assume that synapses in different voxels are formed independently of one another, the total probability of finding a connection between two neurons *i* and *j* is:

$$\begin{aligned} p_{ij} &= 1 - \prod_{\vec{x}} P(n = 0; \tilde{I}_{ij}(\vec{x})) = 1 - \exp\left(-\sum_{\vec{x}} \tilde{I}_{ij}(\vec{x})\right) \\ &= 1 - \exp(-I_{ij}) \end{aligned}$$

Here, $I_{ij} := \sum_{\vec{x}} \tilde{I}_{ij}(\vec{x})$ is the total (i.e., summed over all voxels) average innervation from neuron *i* to neuron *j*. Intuitively, $I_{ij}$ is the expected number of synapses connecting neuron *i* to neuron *j*.
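The chain from voxel-wise densities to a pairwise connection probability can be sketched as follows. The sparse dictionary representation keyed by voxel index, the function names and all numbers are assumptions for illustration; they are not the data structures used by *NN*.

```python
import math

def total_innervation(boutons_i, pst_j, pst_all):
    """Sum B_i(x) * PST_j(x) / PST_all(x) over all voxels where both are present.

    Each argument maps a voxel index (ix, iy, iz) to a count in that voxel.
    """
    total = 0.0
    for voxel, boutons in boutons_i.items():
        if voxel in pst_j and pst_all.get(voxel, 0.0) > 0.0:
            total += boutons * pst_j[voxel] / pst_all[voxel]
    return total

def connection_probability(innervation):
    """Probability of at least one synapse for a Poisson mean `innervation`."""
    return 1.0 - math.exp(-innervation)

# Made-up example: the axon and the dendrite overlap in two of three voxels.
boutons_i = {(0, 0, 0): 40.0, (0, 0, 1): 10.0, (0, 1, 1): 5.0}
pst_j = {(0, 0, 0): 100.0, (0, 0, 1): 50.0}
pst_all = {(0, 0, 0): 100_000.0, (0, 0, 1): 50_000.0, (0, 1, 1): 80_000.0}

i_ij = total_innervation(boutons_i, pst_j, pst_all)  # 40*0.001 + 10*0.001
p_ij = connection_probability(i_ij)
```

Voxels that contain boutons of neuron *i* but no PSTs of neuron *j* contribute nothing to the sum, so only the axo-dendritic overlap volume has to be visited.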

#### **CALCULATION OF STATISTICAL SYNAPTIC INNERVATION AT CELL TYPE LEVELS**

Using the innervation matrix $I_{ij}$ for all pairs of neurons in the network, analyses can be extended to the population level, allowing comparison with pairwise connectivity measurements performed *in vitro/vivo*. *In silico*, pairwise connectivity between two populations (pre: *A* and post: *B*) can be described by three experimentally accessible parameters: the convergence $C_b$, i.e., the fraction of the presynaptic population connected to a single postsynaptic neuron $b \in B$, the divergence $D_a$, i.e., the fraction of the postsynaptic population targeted by a single presynaptic neuron $a \in A$, and the connection probability $P_{AB}$, i.e., the probability that any two neurons $a \in A$, $b \in B$ are connected. We can now define these three quantities in terms of the neuron-to-neuron connection probability $p_{ij} = 1 - \exp(-I_{ij})$ introduced above:

$$C_b = \langle p_{ab} \rangle_{a \in A}$$

$$D_a = \langle p_{ab} \rangle_{b \in B}$$

$$P_{AB} = \langle p_{ab} \rangle_{a \in A, b \in B}$$

Here, $\langle \cdots \rangle_{a \in A}$ is the ensemble average across all neurons *a* in population *A*, etc. Additionally, we can compute the distribution of the number of synapses per connection $n_{AB}$ between these two populations by averaging across the individual synapse number distributions $n_{ij} := P(n; I_{ij})$:

$$n_{AB} = \langle n_{ab} \rangle_{a \in A, b \in B} = \langle \mathrm{Poisson}(I_{ab}) \rangle_{a \in A, b \in B}$$
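The population-level quantities follow directly from the innervation matrix, as the following sketch shows. The nested-list matrix layout, the function names and the example values are assumptions for illustration.

```python
import math

def population_statistics(innervation):
    """Convergence, divergence and connection probability from I_ab.

    innervation[a][b] is I_ab for presynaptic a in A and postsynaptic b in B.
    """
    p = [[1.0 - math.exp(-i_ab) for i_ab in row] for row in innervation]
    n_pre, n_post = len(p), len(p[0])
    # C_b: average over presynaptic neurons, one value per postsynaptic neuron.
    convergence = [
        sum(p[a][b] for a in range(n_pre)) / n_pre for b in range(n_post)
    ]
    # D_a: average over postsynaptic neurons, one value per presynaptic neuron.
    divergence = [sum(row) / n_post for row in p]
    # P_AB: average over all pairs.
    connection_probability = sum(map(sum, p)) / (n_pre * n_post)
    return convergence, divergence, connection_probability

def synapse_number_distribution(innervation, n_max):
    """Average of the Poisson(I_ab) distributions over all pairs, n = 0..n_max."""
    pairs = [i_ab for row in innervation for i_ab in row]
    return [
        sum(i**n / math.factorial(n) * math.exp(-i) for i in pairs) / len(pairs)
        for n in range(n_max + 1)
    ]

# Made-up innervation values for two presynaptic and two postsynaptic neurons.
innervation = [[0.0, 1.0], [2.0, 0.5]]
convergence, divergence, p_ab = population_statistics(innervation)
n_ab = synapse_number_distribution(innervation, n_max=10)
```

Because a pair is unconnected exactly when it has zero synapses, the first entry of the averaged synapse number distribution equals one minus the connection probability, which provides a simple consistency check.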

**RESULTS**

#### **APPLICATION EXAMPLE 1: DENSE 3D MODEL OF RAT vS1**

Based on the anatomical input data (**Figure 2**) specified in the Methods section, we used *NN* to generate an average dense model of entire rat vS1 (**Figure 3**). The model consists of 10 excitatory and 5 inhibitory axo-dendritic cell types, in 24 barrel columns. The total volume of the vS1 model was 6.4 mm<sup>3</sup> (Egger et al., 2012).

First, the average 3D distributions of excitatory and inhibitory somata were registered to the reference frame and somata were placed and assigned to cell types (**Figure 3B**) and anatomical substructures as described above (i.e., each soma carries four labels: the nearest barrel column, whether the soma is inside the column or within the septum, the cell type, excitatory or inhibitory). The total number of neurons within the model was 529926, with 462436 being excitatory and 67490 being inhibitory. Neuron numbers and their 3D distributions are within the mean ± SD (529715 ± 39104) of the measured soma distributions at 50 μm resolution (Meyer et al., 2013). Next, *NN* replaced each soma by appropriate 3D soma/dendrite/axon morphologies, using the upscaling routines specified in the Methods section (**Figures 3C–E**). The somata and dendrites of each neuron were converted into 3D PST surface densities, reflecting the respective surface areas multiplied with connection-specific PST distributions. Likewise, dendrites of excitatory neurons and axons of all neurons were converted into 3D PST spine and bouton distributions, respectively (see meta-connectivity list online for all values). The resultant total soma/dendrite surface area (i.e., of all neurons in rat vS1) was 1.9 × 10<sup>10</sup> μm<sup>2</sup>. The total number of spines was 5.2 × 10<sup>9</sup>, and the total number of boutons was 6.4 × 10<sup>9</sup>.

The average bouton (synapse) density across entire rat vS1 was 1 bouton per μm<sup>3</sup>, which matches previous measurements (0.94 ± 0.12 synapses per μm<sup>3</sup>) of synapse densities using electron-microscopic tomography on small tissue (∼200 μm<sup>3</sup>) volumes of rat vS1 (Merchan-Perez et al., 2014). Hence, the up-scaled model of entire rat vS1 resembles the average structural organization of this brain region at mesoscopic (geometry within 50 μm inter-animal variability), microscopic (cellular distributions within 7% inter-animal variability) and nanoscopic (bouton densities) scales. Consequently, within the margins specified by the respective inter-animal variability (SDs of geometry, soma distribution, cell type-specific dendrite/axon projections, and spine/bouton densities), we consider the dense 3D model of rat vS1 as a precise average representation of this particular piece of neuronal tissue.
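The summary figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in the text (model volume, bouton count, neuron counts and the measured mean ± SD); the variable names are arbitrary.

```python
# Model volume of 6.4 mm^3 expressed in um^3 (1 mm^3 = 1e9 um^3).
volume_um3 = 6.4 * 1e9
total_boutons = 6.4e9

# Average bouton density in boutons per um^3.
bouton_density = total_boutons / volume_um3

# Excitatory and inhibitory counts should add up to the reported total.
excitatory, inhibitory, total_neurons = 462436, 67490, 529926
counts_consistent = excitatory + inhibitory == total_neurons

# Relative inter-animal variability of the measured soma counts (SD / mean).
soma_count_cv = 39104 / 529715

# The model total should lie within one SD of the measured mean.
within_one_sd = abs(total_neurons - 529715) <= 39104
```

The ratio of SD to mean of the measured soma counts is slightly above 7%, consistent with the inter-animal variability quoted for the cellular distributions.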

#### **APPLICATION EXAMPLE 2: STATISTICAL CONNECTOME OF RAT vS1**

Within the dense model of rat vS1, we used *NN* to determine structural overlap of PSTs and presynaptic boutons between all pairs of neurons, always taking all neurons present in the respective overlap volumes into account. **Figure 4** illustrates this process on the example of one L4ss neuron (*j*) being innervated by one thalamocortical axon (*i*) originating in VPM (**Figures 3C–E**). First, *NN* determines the bounding box (BB) around the dendrites of the postsynaptic neuron (**Figure 4A** left) and calculates the number of PSTs for each 50μm voxel within the BB. In case of VPM neurons innervating L4ss (i.e., excitatory cell types), PSTs are limited to spines (Schoonover et al., 2014) as specified in the meta-connectivity input file (see Methods). The exemplary L4ss neuron comprises a total of 4640 spines, with a maximum of 523 spines per voxel (**Figure 4A** right). Second, *NN* determines the number of presynaptic boutons present in any voxel where dendrites and axons of the two neurons overlap. For the present example, the particular VPM axon has a total of 2964 boutons in the overlap volume, with up to 94 boutons per voxel.

**FIGURE 4 | Computation of statistical innervation between neurons in dense networks. (A)** Left: VPM axon (blue) and L4ss dendrite (red) from **Figures 3C–E**. The grid used for computing bouton, spine and dendrite surface densities is shown for scale. Right: Calculation of the 3D innervation density $\tilde{I}_{ij}(\vec{x})$ from the VPM axon to the L4ss dendrite. The gray-colored squares in the grid represent the maximum projection of the respective pre/postsynaptic quantity. Scale bar shows maximum value of the respective pre/postsynaptic quantity in the grid. Above each scale bar, the total number of pre/postsynaptic elements in the grid is shown. **(B)** Resulting subcellular 3D innervation density $\tilde{I}_{ij}(\vec{x})$. **(C)** Left top: Connection probability from neuron *i* to neuron *j* as a function of the total innervation *I*<sub>*ij*</sub>. Bottom: Possible range of the number of synapses from neuron *i* to neuron *j*, *n*<sub>*ij*</sub> (95th percentile for *n* > 0) as a function of the total innervation *I*<sub>*ij*</sub>. Right: Four possible synapse distributions and their probability of occurrence for the innervation from the VPM axon to the L4ss dendrite, computed from the 3D innervation density in **(B)**.

However, within the overlap volume, dendritic spines originating from other excitatory neurons are present, and these are as likely to be contacted by the VPM boutons as the spines of the exemplary L4ss neuron. The total number of spines within the BB of the overlap volume was 2.1 × 10<sup>7</sup>, with a maximum of 130,000 spines per voxel. Furthermore, VPM axons could also target somata and/or dendritic shafts of inhibitory interneurons (Staiger et al., 1996, as specified in the meta-connectivity input file), where a total of 1.8 × 10<sup>6</sup> PSTs on inhibitory surfaces are present within the BB of the overlap volume, with a maximum of 13,500 surface PSTs per voxel. Consequently, the 3D innervation field $\tilde{I}_{ij}(\vec{x})$ between the dendrites of the L4ss neuron (*j*) and the axon of the VPM neuron (*i*) was determined with respect to all other potential PSTs (i.e., excitatory and inhibitory) present in the overlap volume. In addition, the number of all available target sites (2.3 × 10<sup>7</sup>) was four orders of magnitude larger than the number of spines/boutons from the individual neurons, justifying the approximation of the binomial connection probability by a Poisson distribution.
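The normalization by all PSTs in a voxel can be sketched in a few lines of Python. The sketch assumes that the innervation in a voxel is the bouton count of the presynaptic axon multiplied by the fraction of all PSTs in that voxel that belong to the postsynaptic neuron; the exact expression is defined in the Methods, so this form is an assumption. The function name `innervation_field` and all voxel counts are hypothetical, illustrative values, not data from the model.

```python
def innervation_field(boutons_i, psts_j, psts_all):
    """Per-voxel innervation from axon i to dendrite j.

    Assumed form (not taken verbatim from the paper):
        I_ij(x) = boutons_i(x) * psts_j(x) / psts_all(x)
    Each argument is a dict mapping a voxel index to a count.
    """
    field = {}
    for voxel, n_boutons in boutons_i.items():
        n_targets = psts_j.get(voxel, 0)
        n_all = psts_all.get(voxel, 0)
        # Only voxels in which axon and dendrite overlap contribute.
        if n_targets > 0 and n_all > 0:
            field[voxel] = n_boutons * n_targets / n_all
    return field


# Hypothetical counts for three 50 um voxels (illustrative only).
boutons_i = {(0, 0, 0): 94, (1, 0, 0): 40, (2, 0, 0): 10}
psts_j = {(0, 0, 0): 120, (1, 0, 0): 523}
psts_all = {(0, 0, 0): 130000, (1, 0, 0): 143500, (2, 0, 0): 90000}

field = innervation_field(boutons_i, psts_j, psts_all)
total_innervation = sum(field.values())  # sum over all overlap voxels
print(field, total_innervation)
```

Because the denominator counts the PSTs of every neuron in the voxel, even a voxel with many boutons and many spines of the two neurons contributes only a small innervation value.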

The resultant 3D innervation field $\tilde{I}_{ij}(\vec{x})$ between the two exemplary neurons is shown in **Figure 4B**. Summing across all voxels results in the total innervation *I*<sub>*ij*</sub> = 0.66, with a maximal innervation of 0.11 per voxel. This innervation number corresponds to a pairwise connection probability of *p*<sub>*ij*</sub> = 0.48, and to a range of putative synapses between *i* and *j* of *n*<sub>*ij*</sub> = 0–3 (**Figure 4C** left). Thus, even though the axonal arbor of the example VPM neuron displays substantial overlap with the dendritic arbor of the example L4ss neuron, the probability of these two neurons being connected according to our quantitative implementation of Peters' rule is less than 50%. Because there are on the order of 1000 other potential postsynaptic target neurons projecting dendrites into the overlap region, approaches that calculate connectivity from structural overlap without normalization by the total number of PSTs (e.g., Brown and Hestrin, 2009) will result in gross overestimation of connection probabilities.

In consequence, we argue that structural axo-dendritic overlap should never be calculated from sparse morphological data alone and that connectivity measurements by Peters' rule should not be presented in a binary fashion (i.e., overlap equals connectivity, no overlap equals no connectivity). Instead, structural overlap in the present form results in innervation measurements at subcellular (reference frame) resolution, which can be converted into pairwise connection probabilities and a range of putative synapse numbers. In the case of the present example, the overlap between 2964 VPM boutons and 4640 L4ss spines thus did not result in a connection probability of 1. Instead, the probability that the two neurons were unconnected was 52%, that they were connected by a single synapse was 34%, and that they were connected by two or three synapses was 12% and 2%, respectively (**Figure 4C** right).
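The conversion from innervation to connection probability can be illustrated with a short Python sketch. It assumes, following the Poisson approximation mentioned above, that the number of synapses between two neurons is Poisson distributed with a mean equal to the total innervation, so that the connection probability is one minus the probability of zero synapses. The function names are hypothetical.

```python
import math


def connection_probability(innervation):
    """Probability of at least one synapse, assuming a Poisson synapse count."""
    return 1.0 - math.exp(-innervation)


def synapse_count_pmf(innervation, max_count):
    """Poisson probabilities P(n = 0 .. max_count) for mean `innervation`."""
    return [
        math.exp(-innervation) * innervation ** n / math.factorial(n)
        for n in range(max_count + 1)
    ]


I_ij = 0.66  # total innervation of the VPM-to-L4ss example pair
p_ij = connection_probability(I_ij)
pmf = synapse_count_pmf(I_ij, 3)

print(round(p_ij, 2))              # connection probability
print([round(q, 3) for q in pmf])  # P(0), P(1), P(2), P(3) synapses
```

With the rounded value of 0.66 this gives a connection probability of about 0.48, and probabilities of about 0.52 and 0.34 for zero and one synapse. Small differences from the percentages quoted above for two and three synapses are possible because the sketch starts from the rounded innervation value.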

#### **APPLICATION EXAMPLE 3: COMPARISON OF** *IN SILICO* **WITH** *IN VITRO/VIVO* **CONNECTIVITY**

In the following, we compare our *in silico* measurements of pairwise connection probabilities and putative synaptic contact sites with previously reported measurements in rat vS1 using (i) paired recording/reconstruction between L4ss neurons *in vitro* (Feldmeyer et al., 1999; Petersen and Sakmann, 2000), (ii) dual recordings and correlation analysis between VPM and L4, L5A, L5B, and L6 neurons *in vivo* (Bruno and Sakmann, 2006; Constantinople and Bruno, 2013), and (iii) electron-microscopic reconstructions of synaptic contact sites between VPM and individual L4ss neurons (Schoonover et al., 2014). For comparison, we restricted *in silico* connectivity measurements between the respective cell types to neurons located within a single barrel column (D2, **Figures 5A–C**) and averaged connectivity measurements across all neurons of the respective D2 populations.

The D2 column comprised 17810 excitatory neurons including 4657 neurons of L4 cell types (2480 L4ss; 1707 L4sp; 470 L4py), 1386 L5st, 1103 L5tt, 1391 L6cc, 767 L6inv, and 4048 L6ct neurons. Further, the D2 column model contained 2545 inhibitory neurons and 311 thalamocortical axons originating in the D2 barreloid (Meyer et al., 2013) of the VPM. Computing the innervation *I*<sub>*ij*</sub> for all pairs of VPM and L4, L5st, L5tt, and L6 neurons, respectively, as well as for all pairs of L4ss neurons, allowed calculating the respective neuron-to-neuron connection probabilities *p*<sub>*ij*</sub> and the average distribution of the number of synapses per connection *n*<sub>*AB*</sub> (**Figure 5D**). Further, we computed the cell type averages of (i) convergence between L4ss neurons, as well as between VPM and L4, L5st, L5tt, and L6 neurons in our D2 column model, and (ii) the 99th percentile of the number of putative synapses, and compared these numbers to experimental results (**Figure 5E**). The *in silico* L4ss-to-L4ss convergence measurements yielded a value of 0.31 ± 0.10, compared to 0.31–0.36 as measured *in vitro* (Feldmeyer et al., 1999; Petersen and Sakmann, 2000). VPM-to-L4 convergence was 0.40 ± 0.13 (*in silico*), compared to 0.43 ± 0.08 (*in vivo*). VPM-to-L5st convergence was 0.29 ± 0.10 (*in silico*), compared to 0.17 ± 0.12 (*in vivo*). VPM-to-L5tt convergence was 0.38 ± 0.10 (*in silico*), compared to 0.44 ± 0.17 (*in vivo*) and VPM-to-L6 convergence was 0.19 ± 0.09 (*in silico*), compared to 0.09 ± 0.14 (*in vivo*) (Bruno and Sakmann, 2006; Constantinople and Bruno, 2013). The *in silico* measurements of pair-wise connection probabilities matched the previously reported cell type-specific values within one SD. Interestingly, even though somata of the different cell types intermingled within and across cortical layers, our model predicted cell type-specific differences in synaptic connectivity within layers (e.g., VPM to L5st vs. L5tt).
These findings are in line with previous reports that revealed that synaptic connectivity is in general cell type- and not layer-specific (Shepherd et al., 2005; Brown and Hestrin, 2009). To further evaluate how the sample size of morphological reconstructions affects our connectivity estimates, we repeated these measurements and progressively increased the number of VPM axons used for up-scaling from 1 to 14. We found that increasing the sample size beyond ∼5 VPM axons did not change our results (**Figure 5F**), indicating that at least 5 axon reconstructions are required to capture the variability of projection patterns (at 50 μm resolution) within a cell type.

Finally, the range of putative synapses per connection for L4ss-to-L4ss connections was 1–5 (*in silico*), compared to 2–5 (*in vitro*, Feldmeyer et al., 1999). For VPM-to-L4 connections, the range was 1–6 (*in silico*), compared to 1–6 (*in vivo*, Schoonover et al., 2014). Whereas the *in silico* ranges of putative synapses per connection matched the previous *in vitro/vivo* results, our predictions showed that the most likely scenario for interconnected L4ss should be that they share only a single synaptic connection. However, reconstructions from paired-recordings revealed a more bimodal distribution, i.e., pairs of L4ss share either no contacts, or if they are connected, they share more than one contact (Feldmeyer et al., 1999). This potential discrepancy could arise from limitations in identifying weakly connected L4ss (i.e., just one synaptic contact) using paired-recordings, or could indicate that our assumption of independent synapse formation is not justified for L4ss.

**FIGURE 5 |** […] distribution of the number of synapses per connection *n*<sub>*ij*</sub> for the four postsynaptic cell types in **(B)** and the two presynaptic cell types in **(C)**. **(E)** Comparison of pair-wise connectivity statistics in the model D2 column (*in* […] thalamocortical input to these four cell types as a function of the VPM axon sample size. Bottom: Standard deviation of the convergence of […] sample size.

#### **APPLICATION EXAMPLE 4: ANALYSIS OF HIGHER-ORDER CONNECTIVITY PATTERNS**

Because the average dense model of rat vS1 resembles the structural organization of this neuronal tissue at meso-, micro- and nanoscopic scales (see Application example 1) and structural overlap measurements within the model reproduced cell type-specific pairwise connectivity measurements (see Application example 3), we investigated whether the resultant dense statistical connectome can be used to investigate higher-order connectivity patterns beyond pairwise measurements.

The simplest higher-order pattern to be investigated is connectivity between three neurons (Sporns and Kotter, 2004; Song et al., 2005), in the following referred to as triplet motifs. To investigate these motifs, we calculated the innervation matrix *I*<sub>*ij*</sub> (i.e., dense statistical connectome) for the population of L4ss neurons within the D2 barrel column and randomly selected three neurons from the matrix (**Figures 6A–B**). The six entries specifying innervation between the three neurons in the *I*<sub>*ij*</sub> matrix yield connectivity statistics about each possible connection in terms of triplet motifs. Triplet motifs are illustrated as triangles of nodes (i.e., each node representing one of the three respective neurons, **Figure 6C**), connected by uni- and/or bidirectional edges (i.e., each edge representing synaptic connections between two neurons, and the direction specifies pre- and postsynaptic partners, respectively). For example, the innervation from neuron 1 to neuron 2 is determined by the matrix entry *I*<sub>12</sub> = 0.68, which corresponds to a pairwise connection probability of *p*<sub>12</sub> = 0.49. This can be interpreted as the probability that the triplet motif contains an edge from node 1 to node 2. Conversely, the probability that this particular edge is missing is 1 − *p*<sub>12</sub> = 0.51.

**FIGURE 6 | Higher-order connectivity in dense statistical connectomes. (A)** The connection matrix between L4ss neurons of the D2 barrel in rat vibrissal cortex. Each entry represents the innervation *I*<sub>*ij*</sub> between pre- and postsynaptic neurons *i* and *j*. Connections between three neurons are highlighted. **(B)** Zoom into the connection matrix (see box in **A**) around the matrix entry representing the connection from neuron 1 to neuron 2. **(C)** Left: Innervation between three example L4ss neurons (highlighted in **A**), and the respective connection probabilities and strengths (see also **Figure 4C**). Right: One possible configuration of a three-neuron motif between these three neurons. Bottom: Summation over all configurations resulting in this motif (motif ID 7) gives the total probability of occurrence of this motif for these three neurons and the L4ss network, respectively. **(D)** Probability of finding each non-redundant three-neuron motif, calculated from the pairwise innervation. All 16 non-redundant motifs are listed at the bottom. Top: Motif distribution for the three neurons from **(C)**. Bottom: Motif distribution for the L4ss network from **(A)**. **(E)** Deviation of motif occurrence probability from expected value based on the average connection probability of L4ss neurons. Top: Three neurons from **(C)**. Bottom: L4ss network from **(A)**.

In general, three nodes can be connected in 64 different ways, because each of the six possible directed connections is either present or absent (2<sup>6</sup> = 64). However, many of these configurations are redundant (e.g., 1 connected to 2 with no other edge present is the same motif as 2 connected to 3 with no other edge present). Thus, the 64 configurations can be reduced to 16 triplet motifs, of which 7 contain three edges (three-connected), 6 contain two edges (two-connected), 2 contain one edge (one-connected) and 1 motif (no edges) represents the absence of any connections between the three neurons (**Figures 6D–E**). Using the pairwise connection probabilities for the three example neurons (i.e., *p*<sub>12</sub>, *p*<sub>21</sub>, *p*<sub>13</sub>, *p*<sub>31</sub>, *p*<sub>23</sub>, *p*<sub>32</sub>) allows computing the probability of finding each triplet motif by multiplying the probabilities of finding/not finding each of the six possible directed connections. For example, the probability that the three neurons are connected according to motif 7 (i.e., three-connected by unidirectional edges) is computed as follows:

$$p = (1 - p_{12}) \cdot p_{21} \cdot (1 - p_{13}) \cdot p_{31} \cdot (1 - p_{23}) \cdot p_{32} = 0.092$$

There are five other possibilities of arranging connections between these three neurons that result in the same triplet motif. Thus, the total probability of finding this triplet motif among these three neurons is the sum over these six individual connection arrangements, resulting in a total probability of *p*<sub>123</sub> = 0.146 (**Figure 6C**).
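The summation over equivalent connection arrangements can be sketched by brute force in Python: enumerate all 64 configurations, compute the probability of each from the six pairwise probabilities (assuming independent edges, as in the product above), and group configurations that differ only by a relabeling of the three neurons. The function names are hypothetical, and all pairwise probabilities except the value 0.49 quoted in the text are made-up illustrative numbers. The motif keys produced here are not the motif IDs used in Figure 6.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
PAIRS = [(a, b) for a in NODES for b in NODES if a != b]  # six directed edges


def canonical(edges):
    """Smallest relabeled edge tuple over all node permutations (motif identity)."""
    return min(
        tuple(sorted((perm[a], perm[b]) for a, b in edges))
        for perm in permutations(NODES)
    )


def motif_spectrum(p):
    """Map each non-redundant motif to its probability.

    p[(a, b)] is the probability of a connection from neuron a to neuron b;
    the six edges are treated as independent.
    """
    spectrum = {}
    for present in product((False, True), repeat=len(PAIRS)):
        prob = 1.0
        for pair, is_present in zip(PAIRS, present):
            prob *= p[pair] if is_present else 1.0 - p[pair]
        edges = [pair for pair, is_present in zip(PAIRS, present) if is_present]
        key = canonical(edges)
        spectrum[key] = spectrum.get(key, 0.0) + prob
    return spectrum


# Hypothetical probabilities; only p[(0, 1)] = 0.49 is taken from the text.
p = {(0, 1): 0.49, (1, 0): 0.30, (0, 2): 0.25,
     (2, 0): 0.40, (1, 2): 0.35, (2, 1): 0.20}
spectrum = motif_spectrum(p)

# Motif with edges 2->1, 3->1, 3->2 (zero-based node labels here).
feedforward = canonical([(1, 0), (2, 0), (2, 1)])
print(len(spectrum), spectrum[feedforward])
```

Grouping by the canonical form yields the 16 non-redundant motifs, and the feed-forward motif collects exactly six of the 64 configurations, matching the six arrangements summed in the text.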

In the same way, we calculated the probability of occurrence for each of the 16 possible non-redundant triplet motifs, illustrated as a motif spectrum (Sporns and Kotter, 2004, **Figure 6D top**). Further, we extended the motif analysis to the entire population of L4ss neurons in the D2 model, by repeating the motif probability calculations 10 times for 2000 randomly selected neuron triplets. Each triplet was allowed to share at most one neuron with any other triplet. For each triplet, we computed the motif spectrum as described for the example neurons, and averaged these spectra to obtain the distribution of triplet motifs within the L4ss network (**Figure 6D bottom**). Finally, we compared the triplet motif spectrum of the L4ss network in a D2 barrel column with the distribution expected when assuming uniform connectivity. This scenario represents the case where average pairwise connection probabilities are known (e.g., *p* = 0.31 between L4ss neurons, as determined statistically or by paired recordings) and connectivity within the population is assumed to be homogeneous (i.e., lack of variability within a population caused by cell- and/or location-specific morphological variations).

The "uniform" spectra of triplet motifs deviated substantially from those predicted by the dense statistical connectome (**Figure 6E**). For example, motif 2 (unidirectional loop) is much less likely (∼30%) than when assuming uniform connectivity, whereas the remaining three-connected motifs are in general more likely. In contrast, two-connected motifs are in general less likely. Thus, the average dense model of the L4ss network yields high-order connectivity patterns that are significantly (*p* < 0.0001, z-score > 5 for all motifs except for motifs 8 and 15) different from a uniformly connected random network with equal pairwise connection probability.
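The comparison with uniform connectivity can be sketched for a single motif, the unidirectional loop. The Python sketch below averages the loop probability over randomly sampled triplets of a toy network and compares it with the value expected if every pair had the same mean connection probability. The network is entirely hypothetical: soma positions are random, and innervation is assumed to decay with distance purely to create heterogeneous pairwise probabilities. Unlike the procedure described above, the sketch does not restrict how many neurons two triplets may share.

```python
import math
import random


def loop_probability(p, a, b, c):
    """Probability that neurons a, b, c form a unidirectional loop."""
    def one_direction(x, y, z):
        # x -> y -> z -> x present, all three reverse edges absent
        return (p[x][y] * p[y][z] * p[z][x]
                * (1 - p[y][x]) * (1 - p[z][y]) * (1 - p[x][z]))
    # The loop can run in either of two directions.
    return one_direction(a, b, c) + one_direction(a, c, b)


rng = random.Random(0)
n = 60  # hypothetical network size
xy = [(rng.random(), rng.random()) for _ in range(n)]  # hypothetical soma positions


def pair_probability(i, j):
    # Hypothetical rule: innervation decays with distance, p = 1 - exp(-I).
    innervation = 1.5 * math.exp(-math.dist(xy[i], xy[j]) / 0.3)
    return 1 - math.exp(-innervation)


p = [[0.0 if i == j else pair_probability(i, j) for j in range(n)]
     for i in range(n)]

triplets = [rng.sample(range(n), 3) for _ in range(2000)]
observed = sum(loop_probability(p, *t) for t in triplets) / len(triplets)

mean_p = sum(p[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
uniform = 2 * mean_p ** 3 * (1 - mean_p) ** 3  # same motif, uniform connectivity

print(observed, uniform, observed / uniform)
```

The printed ratio indicates how far the heterogeneous toy network departs from the uniform expectation for this motif; it says nothing about the magnitude of the effect in the L4ss network.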

#### **DISCUSSION**

In the present study, we introduced a novel quantitative approach for measuring synaptic connectivity at subcellular resolution and mesoscopic scales. The measurements are based on sparse morphological datasets, integrated into a common anatomical reference frame that allows up-scaling to an average dense model of the neuronal circuitry and determining axo-dendritic overlap between any two neurons in the model. Illustrating our approach for excitatory thalamo- and intracortical circuits in rat vS1, we (i) defined the mandatory anatomical information required to generate average dense circuit models, (ii) introduced the interactive software environment *NN* to calculate Peters' rule with respect to all neurons present in axo-dendritic overlap volumes, and (iii) found that our cell type-specific *in silico* measurements are in line with previously reported *in vitro/vivo* data.

#### **PREVIOUS APPROACHES TO GENERATE AVERAGE NEURONAL NETWORK MODELS**

In recent years, multiple approaches began integrating morphological data to generate anatomically well-constrained neuronal network models. However, compared to *NN*, where synaptic connectivity is measured within the circuit model itself, previous approaches require synaptic connectivity data as input. For example, *neuroConstruct* (Gleeson et al., 2007) connects randomly distributed neurons to networks using average pairwise connection probabilities, thereby neglecting for example location-specific differences in connectivity. *BlueBuilder* (Kozloski et al., 2008), developed within the BlueBrainProject (Markram, 2006), generates neuronal networks, where *in vitro* labeled dendrite and axon morphologies are integrated into an idealized cortical column (i.e., neglecting column-specific geometry and soma distributions) and putative dendrite-axon contacts (at a predefined distance) are pruned until they match predefined connectivity statistics (originating from paired-recordings *in vitro,* Ramaswamy et al., 2012).

Therefore, we argue that our approach can be regarded as more general for investigating structural organization principles of the neuronal circuitry. First, the present concept relies on definition of a standardized 3D reference frame that describes the average geometry of the brain structure (and substructures) of interest. Consequently, no assumptions about the mesoscopic organization of neuronal circuits are required. For example, in case of rat vS1, we previously reported that each cortical barrel column has a specific diameter, height and orientation, and barrel columns representing whiskers located within different rows along the animals' snout have substantially deviating volumes (Egger et al., 2012). Such whisker row-specific organization patterns may substantially influence connectivity, e.g., increased connectivity between columns in the same row compared to across whisker rows, an effect that would be missed by assuming that cortical columns are elementary and uniform structural building blocks (Markram, 2006).

Second, the up-scaling to a dense average circuit model is based on measured 3D distributions of excitatory and inhibitory neurons. Consequently, no assumptions about the microscopic (i.e., cellular) organization of the neuronal circuits are required. For example, in case of rat vS1, we previously reported that separation between individual barrel columns is only present within the distribution of excitatory neurons in L4, where neuron densities are significantly lower in the septum, compared to densities in barrel columns (Meyer et al., 2013). In contrast, neither excitatory distributions in superficial and infragranular layers, nor densities of inhibitory somata throughout the cortical sheet displayed differences between columns and septa. Such excitatory/inhibitory location-specific cellular organization patterns may substantially influence connectivity, e.g., the relative fraction of excitatory to inhibitory connections may be higher within the L4 barrel compared to septa and/or other layers (van Vreeswijk and Sompolinsky, 1996), effects that would be missed by assuming uniform and/or randomly distributed neuron somata (Rockel et al., 1980; Carlo and Stevens, 2013).

Finally, connectivity measurements are based upon complete 3D reconstructions of *in vivo* labeled neurons. Consequently, no assumptions about (sub)cellular organization of the neuronal circuits are required. For example, in case of rat vS1, we previously reported that axons of excitatory neurons are in general not confined to the dimensions of a single cortical column (Oberlaender et al., 2011). Thus, extrapolation of dendrite/axon morphologies from *in vitro* labeling/reconstruction (Hill et al., 2012; Ramaswamy et al., 2012) will miss cell type and/or location-specific horizontal axonal projection patterns, resulting in assessments of connectivity by structural overlap that are biased toward close-by neurons (e.g., within columns compared to across columns). Further, substantial cutting of dendrites/axons during multi-electrode recordings *in vitro* will result in unsystematically hampered measurements of pairwise connection probabilities (i.e., depending on cell type, location and distance of the recorded neurons), questioning whether constraining connectivity within neuronal network models by such data (Lefort et al., 2009; Perin et al., 2011) will result in anatomically realistic representations of the neuronal circuitry.

In summary, because organizational principles of the neuronal circuitry are generally influenced by brain region- and species-specific mesoscopic, cellular and subcellular quantities, generation of well-constrained network models should not be based on assumptions, but on measurements of these quantities instead. Assessments of these quantities provide information about the respective variability across animals, making it possible to determine (i) the appropriate resolution for connectivity measurements within an average representation of the neuronal circuitry and (ii) how representative the average model is (i.e., in terms of SDs of (sub)cellular properties).

#### **VALIDITY OF PETERS' RULE**

The validity of measuring synaptic innervation by structural overlap between dendrites and axons has been a matter of controversy (Stepanyants and Chklovskii, 2005; Shepherd et al., 2005; Mishchenko et al., 2010; Briggman et al., 2011). Specifically, reconstructions at electron-microscopic resolution provided evidence that proximity of axons and dendrites at submicron resolution in general does not imply that the two neurons form synaptic contacts (Mishchenko et al., 2010). Further, pairwise connection probabilities obtained by paired-recordings *in vitro* were considered to contradict measurements of structural overlap after reconstructing morphologies of the respective neuron pairs (Shepherd et al., 2005; Brown and Hestrin, 2009).

However, to date, neither the appropriate spatial resolution to apply Peters' rule, nor a coherent framework to obtain structural overlap in terms of connection probabilities with respect to all neurons projecting dendrites into the overlapping volume existed. We provide both. First, the resolution for determining structural overlap within an average network model (i.e., integration of morphological data from different animals) is defined by the inter-animal variability of the geometrical reference frame used to integrate the data. Increasing the voxel size will provide less accurate connectivity estimates (i.e., cells or cell types that do not overlap at 50 μm resolution may overlap at 100 μm scales). In contrast, decreasing the voxel size below the precision of the registration framework would imply inappropriate accuracy. Hence, implications of synaptic innervation below the resolution limit, or even at submicron resolution, are beyond the limits of Peters' rule. Instead, measurements of subcellular synapse locations remain exclusive to reconstructions at electron-microscopic levels (but see Druckmann et al., 2014; Schoonover et al., 2014).
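The dependence of structural overlap on voxel size can be illustrated with a toy Python example. The coordinates below are hypothetical bouton and spine positions (in micrometers), chosen so that the two neurons share no voxel at 50 μm resolution but do share one at 100 μm resolution; the function names are likewise hypothetical.

```python
def occupied_voxels(points, voxel_size):
    """Set of voxel indices (ix, iy, iz) that contain at least one point."""
    return {tuple(int(c // voxel_size) for c in point) for point in points}


def overlap_voxels(axon_points, dendrite_points, voxel_size):
    """Voxels occupied by both the axon and the dendrite at this resolution."""
    return (occupied_voxels(axon_points, voxel_size)
            & occupied_voxels(dendrite_points, voxel_size))


# Hypothetical bouton and spine coordinates in micrometers.
boutons = [(10.0, 20.0, 30.0), (40.0, 45.0, 5.0)]
spines = [(60.0, 20.0, 30.0), (90.0, 80.0, 70.0)]

print(len(overlap_voxels(boutons, spines, 50.0)))   # no shared voxel at 50 um
print(len(overlap_voxels(boutons, spines, 100.0)))  # one shared voxel at 100 um
```

In this toy case the coarser grid reports overlap that the finer grid does not, which is the sense in which larger voxels give less accurate connectivity estimates.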

Second, we illustrate that in general, millions of potential postsynaptic target sites (PSTs) from unstained neurons are present within the overlap volume of two stained neurons. Hence, when normalizing innervation by the total number of PSTs, the resultant innervation and pairwise connection probabilities are small. In case of the exemplary calculation between the dendrites of one L4ss and one thalamocortical VPM axon in rat vS1, overlap between ∼4500 spines and ∼3000 boutons did not result in a connection probability of one, but instead there is a 52% chance that the two neurons are unconnected. Hence, connectivity measurements by structural overlap have to be performed with respect to *all* neurons, for example using the present approach of generating an average dense model of the brain region of interest. Consequently, the absence of synaptic contacts at touching dendrites and axons in sparsely labeled tissue should not be regarded as a violation of Peters' rule.

#### **HIGHER-ORDER CONNECTIVITY IN DENSE STATISTICAL AND ELECTRON-MICROSCOPIC CONNECTOMES**

In addition to illustrating that pairwise connection probabilities determined by structural overlap are in line with measurements using conventional recording/reconstruction techniques, we provide a strategy that allows investigation of higher-order connectivity patterns within dense statistical connectomes. Using the example of the population of L4ss neurons located within a barrel of rat vS1, we determined the probabilities of obtaining all possible three-neuron (triplet) motifs and compared the resultant motif spectra with those to be expected from randomly connected networks that have the same average pairwise connection probability. Interestingly, we found that the two spectra displayed significant deviations. For example, unidirectional triplets (i.e., recurrent loops) are much less likely to occur within the L4ss population compared to randomly connected networks. In contrast, other triplet configurations were significantly more likely. Arguably, such deviations can be considered as evidence for specificity in the organization of the neuronal circuitry, for example caused by inhomogeneous distributions of somata (e.g., excitatory soma density decreases from the barrel center toward the borders), dendrites and axons (e.g., polar dendrite morphologies pointing toward the barrel center).

Hence, we suggest using statistical spectra of higher-order motifs as a definition of cell type-specific "structural fingerprints" for the respective neuronal circuits. Comparing these fingerprints with dense connectomes obtained at electron-microscopic resolution will indicate whether such cell type-specific higher-order patterns can be explained by the meso- and microscopic organization of the network, or whether additional specificity originates at nanoscopic scales. In consequence, not the absence of synapses between touching dendrites/axons, but deviations of higher-order connectivity patterns observed in statistical and electron-microscopic dense connectomes should be considered as evidence for violations of statistical network organization.

#### **CONCLUSION**

We present a novel concept for measuring pairwise and high-order connectivity patterns at subcellular resolution and mesoscopic scales. We provide the required software to generate average dense circuit models, to calculate structural overlap, and to convert these measurements into dense statistical connectomes. Further, we describe the anatomical data necessary to assess structural organizational principles of the neuronal circuitry without assumptions about homogeneity at meso/microscopic and subcellular scales. Given that the required anatomical data is available, we consider our approach as generalizable to other brain structures and species. This sets the stage to generate well-constrained network models that allow simulating sensory-evoked signal flow to provide unprecedented insight into the interplay between the structural organization and function of the respective local and long-range neuronal circuits.

#### **ACKNOWLEDGMENTS**

We thank Bert Sakmann for discussions and his generous support. Funding was provided by the Max Planck Florida Institute for Neuroscience (Marcel Oberlaender), the Studienstiftung des deutschen Volkes (Robert Egger), the Bernstein Center for Computational Neuroscience, funded by German Federal Ministry of Education and Research Grant BMBF/FKZ 01GQ1002 (Daniel Udvary, Robert Egger and Marcel Oberlaender), the Max Planck Institute for Biological Cybernetics (Daniel Udvary, Robert Egger and Marcel Oberlaender), the Werner Reichardt Center for Integrative Neuroscience (Marcel Oberlaender), the Max Planck Institute of Neurobiology (Vincent J. Dercksen) and the Zuse Institute Berlin (Vincent J. Dercksen and Hans-Christian Hege).

#### **REFERENCES**


connectivity in neocortical neural microcircuits. *Proc. Natl. Acad. Sci. U.S.A.* 109, E2885–E2894. doi: 10.1073/pnas.1202128109


structure-function relationship of individual cortical neurons. *J. Vis. Exp.* e51359. doi: 10.3791/51359


tracing from single, genetically targeted neurons. *Neuron* 53, 639–647. doi: 10.1016/j.neuron.2007.01.033

Woolsey, T. A., and Van der Loos, H. (1970). The structural organization of layer IV in the somatosensory region (SI) of mouse cerebral cortex. The description of a cortical field composed of discrete cytoarchitectonic units. *Brain Res.* 17, 205–242. doi: 10.1016/0006-8993(70)90079-X

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 July 2014; accepted: 22 October 2014; published online: 10 November 2014.*

*Citation: Egger R, Dercksen VJ, Udvary D, Hege H-C and Oberlaender M (2014) Generation of dense statistical connectomes from sparse morphological data. Front. Neuroanat. 8:129. doi: 10.3389/fnana.2014.00129*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Egger, Dercksen, Udvary, Hege and Oberlaender. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Context-aware modeling of neuronal morphologies

#### *Benjamin Torben-Nielsen<sup>1</sup>\* and Erik De Schutter<sup>1,2</sup>*

<sup>1</sup> Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son, Japan

<sup>2</sup> Theoretical Neurobiology and Neuroengineering, University of Antwerp, Wilrijk, Belgium

#### *Edited by:*

Hermann Cuntz, Ernst Strüngmann Institute in Cooperation with Max Planck Society, Germany

#### *Reviewed by:*

Ivan Soltesz, University of California at Irvine, USA Arjen Van Ooyen, VU University Amsterdam, Netherlands

#### *\*Correspondence:*

Benjamin Torben-Nielsen, Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa 904-0495, Japan e-mail: btorbennielsen@gmail.com

Neuronal morphologies are pivotal for brain functioning: physical overlap between dendrites and axons constrains the circuit topology, and the precise shape and composition of dendrites determine the integration of inputs to produce an output signal. At the same time, morphologies are highly diverse and variant. The variance, presumably, originates from neurons developing in a densely packed brain substrate where they interact (e.g., repulsion or attraction) with other actors in this substrate. However, when studying neurons their context is never part of the analysis and they are treated as if they existed in isolation. Here we argue that to fully understand neuronal morphology and its variance it is important to consider neurons in relation to each other and to other actors in the surrounding brain substrate, i.e., their context. We propose a context-aware computational framework, NeuroMaC, in which large numbers of neurons can be grown simultaneously according to growth rules expressed in terms of interactions between the developing neuron and the surrounding brain substrate. As a proof of principle, we demonstrate that by using NeuroMaC we can generate accurate virtual morphologies of distinct classes both in isolation and as part of neuronal forests. Accuracy is validated against population statistics of experimentally reconstructed morphologies. We show that context-aware generation of neurons can explain characteristics of variation. Indeed, plausible variation is an inherent property of the morphologies generated by context-aware rules.
We speculate about the applicability of this framework to investigate morphologies and circuits, to classify healthy and pathological morphologies, and to generate large quantities of morphologies for large-scale modeling.

**Keywords: dendrite, morphology, computational modeling, growth cone, extracellular space**

#### **INTRODUCTION**

Neuronal morphology is important for brain functioning. The interplay between dendritic and axonal morphology limits the microcircuits (Peters and Payne, 1993), and the shape and composition of dendrites define how inputs are integrated to produce outputs (London and Häusser, 2005; Silver, 2010; Torben-Nielsen and Stiefel, 2010). As such, it is not surprising that changing morphological traits and morphological anomalies are implicated in neuro-developmental and degenerative diseases (Kaufmann and Moser, 2000; Dierssen and Ramakers, 2006). Nevertheless, neurons come in all shapes and sizes. The diversity is said to express the difference between neuron classes while variation represents the intra-class differences (Soltesz, 2005). Diversity originates from the genetic make-up of neurons (Jan and Jan, 2010; Tavosanis, 2012). By contrast, the variance can be assumed to originate from interactions between the developing neuron and the brain substrate, its context (McAllister, 2000; Scott and Luo, 2001; Landgraf and Evers, 2005; Jan and Jan, 2010; Tavosanis, 2012). Indeed, in both axonal (Mortimer et al., 2008) and dendritic (Gao, 2007; Cove et al., 2009) development a plethora of microscopic interactions have been revealed to influence branching patterns and "guide" the direction of growth. Thus, a neuron's context holds the key to understanding morphological variance.

Unfortunately, the context surrounding a neuron has historically been neglected in the analysis and quantification of morphologies. In a highly influential work, Hillman argued that dendritic morphologies could be described completely and accurately by a finite set of morphometric descriptors (Hillman, 1979). Thus, the idea was born that careful description of morphometrics measured from isolated neurons would be sufficient to characterize neuronal morphology. Later, when digital reconstructions became more common practice, this idea inspired the way neurons are represented digitally: a pure representation of the morphology itself without any information about the context. Currently, a digital representation consists of a set of points in three dimensions with additional information on how they are linked to each other, as is done in the *de facto* standard SWC format (Cannon et al., 1998).
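To make the point about context-free representations concrete, the following minimal sketch parses SWC-style text into a tree of points. It assumes the commonly used seven-column SWC layout (index, type, x, y, z, radius, parent index, with -1 marking the root); the helper names `SWCPoint` and `parse_swc` and the coordinates are made up for illustration and are not part of NeuroMaC.

```python
from collections import namedtuple

# One SWC sample point: identifier, structure type, position, radius, parent identifier.
SWCPoint = namedtuple("SWCPoint", "index swc_type x y z radius parent")


def parse_swc(text):
    """Parse SWC-formatted text into a dict mapping point index -> SWCPoint."""
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and header comments
            continue
        idx, swc_type, x, y, z, radius, parent = line.split()[:7]
        points[int(idx)] = SWCPoint(int(idx), int(swc_type), float(x),
                                    float(y), float(z), float(radius), int(parent))
    return points


# Toy example (made-up coordinates): a soma with a two-point dendrite.
EXAMPLE = """# index type x y z radius parent
1 1 0.0 0.0 0.0 10.0 -1
2 3 0.0 0.0 40.0 2.0 1
3 3 0.0 30.0 70.0 1.5 2
"""

tree = parse_swc(EXAMPLE)

# Recover the connectivity: parent index -> list of child indices.
children = {}
for point in tree.values():
    children.setdefault(point.parent, []).append(point.index)
```

Every field describes the neuron itself; nothing in such a record refers to layers, boundaries or neighboring cells, which is the limitation discussed above.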

As a consequence, morphometric features used to quantify and analyze morphologies (such as the order and degree of points in the neuronal tree or neurite lengths) relate to the neuron itself and are unable to describe any characteristic of the context. Hence, statistical approaches to analyze morphologies and their variance that use these morphometric features are bound to fail to describe neuronal morphologies correctly as contextual influences including boundaries, capillaries and other neurons, cannot be taken into account. Indeed, in earlier work it was shown that the variance in morphometric features can be so high that no statistical model can be constructed to accurately describe the limited data (Torben-Nielsen et al., 2008).

An alternative, albeit in practice closely related to the pure statistical approach to studying neuronal morphologies, is the so-called "generative approach" (Ascoli et al., 2001; van Pelt and Uylings, 2002; Stiefel and Sejnowski, 2007; Torben-Nielsen and Cuntz, 2014). In this approach virtual morphologies are generated *de novo* using morphogenetic algorithms. In most cases, these algorithms adhere to the ideas proposed by Hillman and sample from statistical distributions representing morphometric features to generate a morphology (Eberhard et al., 2006; Lindsay et al., 2007; Torben-Nielsen and Cuntz, 2014). Clearly, these methods can mimic statistical properties of the data set but fail to capture contextual influences and plausible variation (but see Samsonovich and Ascoli, 2003). Notable exceptions exist and target specific characteristics of the context. Luczak proposed a generative method based on diffusion-limited aggregation to illustrate how competition over resources and the spatial distribution thereof could shape dendritic morphologies (Luczak, 2006). In another work, Cuntz and colleagues proposed a generative approach based on high-level wiring constraints. By generating multiple virtual morphologies in the same volume, competition over resources could be mimicked (Cuntz et al., 2010). In previous work, we demonstrated that self-referential contextual cues (e.g., self-avoidance, soma-tropism, and membrane stiffness) could be used to explain some characteristics of dendritic morphologies (Memelli et al., 2013). Recently, CX3D was designed to simulate neuronal development based on intrinsic and extrinsic, contextual factors (Zubler et al., 2013).

In this work we argue that in order to fully understand neuronal morphologies we need to break with the view that neurons can be treated as independent, isolated entities. Therefore, we propose a new approach to study morphologies in which large numbers of virtual morphologies are generated simultaneously *de novo* while embedded in a virtual brain substrate, resulting in a mechanistic – in contrast to a statistical – description of morphologies. In this approach, morphologies are generated by repeatedly extending simulated, phenomenological growth cones that are guided by interactions with other actors in the brain substrate.

We designed and implemented a prototype of the proposed computational framework, NeuroMaC ("Neuronal Morphologies and Circuits"). We showcase the functionality of our framework related to single neuron morphologies by synthesizing spinal cord motor neurons, hippocampal granule cells and cortical layer 5 (L5) pyramidal neurons. All results are validated against publicly available, experimentally reconstructed morphologies.

#### **MATERIALS AND METHODS OUTLINE**

The rationale behind our proposed framework is based on two key experimental findings. The first is that the genetic make-up of a neuron determines its shape to a large extent. In cell culture experiments, neurons have a recognizable morphology, albeit one that differs from *in situ* occurrences (Banker and Cowan, 1977; Kriegstein and Dichter, 1984). Second, the genetic make-up of neurons also appears to outline a blueprint of neurons in terms of interactions with the substrate in which they develop. Growth is mainly determined by growth-cones that contain filopodia-like structures that sense the molecules present in the extracellular matrix. Sensation of these molecules then influences when a growth cone branches or terminates as well as the direction of elongation (Itoh et al., 1993; Scott and Luo, 2001; Mortimer et al., 2008; Jan and Jan, 2010).

We extrapolate these key findings to operational concepts in our framework that simulates phenomenological growth cones called fronts. Broadly speaking, fronts contain growth rules that can be expressed in terms of interactions with other agents present in the substrate. Interactions are always "local" in the sense that a front is able to sample its direct surrounding. As such, fronts are a simple metaphor for biological growth cones.

**Figure 1** outlines the concepts underlying NeuroMaC. Based on the "local" nature of sensing and sampling of fronts we can decompose the simulated brain volume into small sub volumes (SVs). Each SV has full knowledge about all contained fronts and contextually relevant actors in the substrate, e.g., boundaries and other neurons amongst others. All SVs repeatedly extend all active fronts contained inside their spanned volume. Because fronts also have a physical dimension with a location and a radius, extending fronts creates the simulated neurites by creating a frustum between the initial position of a front and the new position after extension. Details about the construction rules of fronts are provided in the next section and for now it suffices to understand that – in line with the behavior of growth cones – fronts can extend, branch or terminate, and that they can use contextual cues to influence these actions. Once the active fronts are extended, the SVs perform the crucial step of checking and resolving structural overlaps while simultaneously recording locations of putative synapses. As a result, generation of morphologies and construction of a circuit (without structural overlaps) can be performed in one pass.

#### **NeuroMaC**

We designed and implemented NeuroMaC in accordance with the rationale and key concepts outlined above. Here we describe in depth the components of the proposed framework.

#### *Multi-agent architecture and parallelization*

NeuroMaC is designed as a multi-agent system, that is, different components of the framework work autonomously and communicate with each other through messages. A multi-agent system allows straightforward parallelization with the number of computing cores to ensure scalability. NeuroMaC has two agent types: one central administration agent and multiple SV-agents.

The administration agent performs all internal housekeeping. It reads a configuration file (**Table 1**) that defines the simulation and system-specific settings. Subsequently the administration agent decomposes the brain substrate into smaller SVs and initializes the SVs. During initialization each SV is assigned a space it controls together with all environmental details required for the fronts to develop. The administration agent maintains a central clock to synchronize updating of fronts in each SV. A clock ensures that irrelevant issues such as execution time on the computing resource do not bias simulated growth. In case an updated front moves outside of the space covered by a particular SV, the administration agent brokers the migration of that front to the appropriate new host SV. All updates inside an SV are communicated to the central agent, which compiles a centralized output file containing all neuronal morphologies.
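The decomposition and migration steps can be sketched as follows. This is only an illustration under simple assumptions (a regular axis-aligned grid with its origin at zero); the function names `decompose` and `host_sv` are hypothetical and the actual NeuroMaC bookkeeping may differ. The dimensions and grid counts are those of the exemplar configuration in Table 1.

```python
import itertools


def decompose(dim_xyz, xa, ya, za):
    """Split the box [0, dim_xyz] into xa * ya * za axis-aligned sub volumes (SVs).

    Returns a dict mapping an (i, j, k) grid index to its (lower, upper) corners.
    """
    steps = [dim_xyz[0] / xa, dim_xyz[1] / ya, dim_xyz[2] / za]
    svs = {}
    for i, j, k in itertools.product(range(xa), range(ya), range(za)):
        lower = (i * steps[0], j * steps[1], k * steps[2])
        upper = ((i + 1) * steps[0], (j + 1) * steps[1], (k + 1) * steps[2])
        svs[(i, j, k)] = (lower, upper)
    return svs


def host_sv(xyz, dim_xyz, counts):
    """Return the grid index of the SV containing xyz, or None if xyz is outside."""
    index = []
    for coord, dim, n in zip(xyz, dim_xyz, counts):
        if not 0.0 <= coord <= dim:
            return None
        # Points exactly on the upper substrate boundary belong to the last SV.
        index.append(min(int(coord / (dim / n)), n - 1))
    return tuple(index)


dim = [6000.0, 1800.0, 1410.0]
counts = (6, 6, 1)
svs = decompose(dim, *counts)

# A front that crosses x = 1000 changes host SV and would have to be migrated.
old_host = host_sv((990.0, 100.0, 800.0), dim, counts)
new_host = host_sv((1010.0, 100.0, 800.0), dim, counts)
needs_migration = old_host != new_host
```

In this sketch the check for a changed host index plays the role of the test that triggers migration; in the framework the hand-over itself is brokered by the administration agent.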

The SV agents perform the same behavior in parallel. The number of these agents can scale with the number of available computing nodes; more nodes result in smaller decomposed volumes and faster run times. Conceptually, SVs represent the direct neighborhood surrounding a developing growth cone. Distal parts of the brain substrate are of no concern to a growth cone as all contextual cues are sensed in the direct vicinity. SVs contain all local information about the substrate itself, e.g., boundaries, laminar structure, same and other neuron structures, etc. Diffusible molecules in the extracellular space can promote long-distance interactions and, while we do not simulate diffusion explicitly, the effect of contextual cues can propagate from SV to SV so that these are also locally available for growth cones. Any cue not on the hosting SV or on one of the direct neighbors is summarized (averaged) and only this information is revealed to active fronts. This measure is valid because it is irrelevant for an active front to know the exact locations of very distant cues.

During each general time step SVs execute the algorithm listed in **Figure 1B**. However, just before the algorithm is executed, each SV communicates with its neighbors to query their contained volume. This is needed because, if an active front is close to an SV boundary (e.g., close enough that it might interact with a neurite contained in a neighboring SV), it also has to sense the neighboring substrate. During the main algorithm, SVs call each active front inside their volume, in randomized order, to compute its next location (see next section). Once the SV receives the updated front, it performs several checks. First, it checks if the new location of the front is still inside the volume it spans. If not, the front is migrated to another SV. Otherwise, the SV checks whether the new front physically overlaps with existing fronts and neurites. Overlap is tested between two fronts and their associated line segments, that is, the line segment between a front and its parent. If the minimal distance between two such line segments is smaller than the sum of the radii of both associated fronts we consider this to be an overlap. Unless the radius of a front is drastically smaller than that of its parent front, this method yields adequate results. When a potential overlap is detected, the SV will try to resolve it by randomly perturbing the front's location. If the conflict cannot be resolved in a predetermined number of attempts, the front is terminated at its previous position. When all active fronts are updated and validated, the corresponding newly formed neurites are communicated to the administration agent. Putative synapse locations are computed in the same way (and at the same time) as the structural overlaps, with the difference that a maximally allowed distance is set by the user that reflects the pre-synaptic bouton and post-synaptic spine size. Although rudimentary, this method yields a list of putative synapse locations that can be pruned in a post-processing step (Hill et al., 2012; but also see van Pelt et al., 2010).
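The overlap test described above amounts to a segment-to-segment distance computation. The sketch below is one possible way to write it, using a standard closest-point construction for two line segments; it is not taken from the NeuroMaC source, and the names `segment_distance`, `overlaps`, `putative_synapse` and `resolve`, as well as the Gaussian perturbation, are assumptions made for illustration.

```python
import numpy as np


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimal distance between the 3D line segments p1-q1 and p2-q2."""
    p1, q1, p2, q2 = (np.asarray(v, dtype=float) for v in (p1, q1, p2, q2))
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:              # both segments degenerate to points
        s = t = 0.0
    elif a <= eps:                         # first segment is a point
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= eps:                       # second segment is a point
            t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
        else:
            b = d1 @ d2
            denom = a * e - b * b          # vanishes for parallel segments
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:                    # clamp t and recompute s
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))


def overlaps(seg_a, seg_b):
    """Each segment is (front_xyz, parent_xyz, front_radius)."""
    return segment_distance(seg_a[1], seg_a[0], seg_b[1], seg_b[0]) < seg_a[2] + seg_b[2]


def putative_synapse(seg_a, seg_b, synapse_distance):
    """Same distance test, but against a user-set maximal distance."""
    return segment_distance(seg_a[1], seg_a[0], seg_b[1], seg_b[0]) < synapse_distance


def resolve(new_xyz, parent_xyz, radius, others, attempts, jitter, rng):
    """Perturb new_xyz until it is overlap-free; None means 'terminate the front'."""
    candidate = np.asarray(new_xyz, dtype=float)
    for _ in range(attempts + 1):          # the unperturbed position is tried first
        if not any(overlaps((candidate, parent_xyz, radius), o) for o in others):
            return candidate
        candidate = np.asarray(new_xyz, dtype=float) + rng.normal(scale=jitter, size=3)
    return None


rng = np.random.default_rng(2)
neighbor = ((10.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0)   # existing neurite segment
free = resolve((5.0, 50.0, 0.0), (5.0, 90.0, 0.0), 1.0, [neighbor], 2, 1.0, rng)
```

As noted in the text, comparing a single distance against the sum of the two front radii is an approximation that degrades when a front is much thinner than its parent.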

#### *Growth cones as cellular automata*

In NeuroMaC fronts are phenomenological implementations resembling biological growth cones. An active front is a front that is still developing; an inactive front becomes a continuation point, branching point or a terminal tip. As such, neurites are represented by frusta connecting subsequent fronts (**Figure 1C**; Cannon et al., 1998; Ascoli et al., 2007).

#### **Table 1 | Exemplar configuration file used in NeuroMaC.**

```
[system]
# framework related settings
seed = 2
proxy_sub_port = 5599
proxy_pub_port = 5560
pull_port = 55002
time_out = 10000
# simulation related settings
no_cycles = 105
out_db = models/L5_pyramid/forest_Z8.db
synapse_distance = 5
# attempts to resolve overlap-conflicts
avoidance_attempts = 2

[substrate]
# settings about the simulated brain substrate
dim_xyz = [6000.0,1800.0,1410]
# volume decomposition into xa x ya x za SVs
xa=6
ya=6
za=1
# laminar structure
virtual_LAYER = {6:[[0,0,0],[2000,2000,471]],\
                 5:[[0,0,471],[2000,2000,826]],\
                 4:[[0,0,826],[2000,2000,1090]],\
                 3:[[0,0,1090],[2000,2000,1192]],\
                 2:[[0,0,1192],[2000,2000,1311]],\
                 1:[[0,0,1311],[2000,2000,1406]]}
#pia as boundary
pia = models/L5_pyramid/pia_forest.pkl

[cell_type_1]
# settings related to the growth rules
no_seeds = 100
algorithm = Full_detailed
location = [[250,250,800],[5750,1550,1180]]
soma_radius = 10
```

The configuration follows the Python ConfigParser structure. Parameters are pooled in several sections and parameter values can take the form of executable Python statements. A description is in the main text.
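As an illustration of how a configuration such as the one in Table 1 could be read, the sketch below uses the Python 3 standard-library module `configparser` on an abbreviated copy of the file. It is not the NeuroMaC loader: the helper `literal` is hypothetical, and it swaps in `ast.literal_eval`, which accepts only Python literals, whereas the text states that values may be arbitrary executable Python statements.

```python
import ast
import configparser

# Abbreviated copy of the exemplar configuration (ports, paths and layers omitted).
CONFIG_TEXT = """
[system]
seed = 2
no_cycles = 105
synapse_distance = 5
avoidance_attempts = 2

[substrate]
dim_xyz = [6000.0,1800.0,1410]
xa=6
ya=6
za=1

[cell_type_1]
no_seeds = 100
algorithm = Full_detailed
location = [[250,250,800],[5750,1550,1180]]
soma_radius = 10
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)


def literal(section, key):
    """Evaluate a configuration value as a Python literal (list, number, ...)."""
    return ast.literal_eval(parser.get(section, key))


dim_xyz = literal("substrate", "dim_xyz")
n_svs = (parser.getint("substrate", "xa")
         * parser.getint("substrate", "ya")
         * parser.getint("substrate", "za"))

# Every section named cell_type_<n> describes one population of neurons.
cell_sections = [s for s in parser.sections() if s.startswith("cell_type_")]
seed_box = literal("cell_type_1", "location")   # corners of the soma placement box
n_seeds = parser.getint("cell_type_1", "no_seeds")
```

With these settings the substrate is split into 6 × 6 × 1 = 36 SVs and 100 somata are seeded inside the box given by `location`.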

Fronts have a dual identity. On the one hand they are physical structures with a location and radius in space. On the other hand, a front is a cellular automaton-like machine that contains its own growth rules describing how and when it should extend, branch or terminate (see **Table 2** for an example). When an active front is not terminating, it produces either one or two new fronts; the old front becomes inactive and the newly formed front(s) become(s) active. The location of the new front is computed in accordance with a front's construction rules and locally available information. Information can be everything that is contained in the SV. For instance, homotypic (Grueber et al., 2005; Marks and Burke, 2007; Memelli et al., 2013) and same-type (Scott and Luo, 2001; Jan et al., 2003) cues can be used, or the transient laminar information through which a front might travel (Hevner et al., 2003; Chen et al., 2005). The aforementioned cues have a direct biophysical interpretation, but more phenomenological cues such as (directional) information related to a boundary can also be used in our framework. A biological counterpart thereof could be envisioned to be Reelin secreted by Cajal-Retzius cells (Frotscher, 1998; Marin-Padilla, 1998). Construction rules define how the front interacts with these other inhabitants of the SV: no interaction, repulsion or attraction. Hence, the context is used as a guidance cue (**Figure 1C**). The influence of these cues can be distance-dependent, mimicking gradients of secreted molecules (Mortimer et al., 2008). In addition, fronts can also modify the substrate by secreting entities: phenomenological representations of secreted molecules that can in turn be used as a guidance cue (Hentschel and van Ooyen, 1999).
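How attraction, repulsion and distance dependence might combine into a single growth direction can be sketched with plain vector arithmetic. The code below is a hedged illustration, not the NeuroMaC API: `unit`, `cue_vector` and `next_position` are hypothetical helpers, and the exponential decay is merely one assumed shape for a gradient.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length (a zero vector is returned unchanged)."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def cue_vector(front_xyz, cue_xyz, strength, length_scale=None):
    """Directional influence of one cue on a front.

    strength > 0 attracts, strength < 0 repels and strength == 0 means no
    interaction. If length_scale is given, the influence decays exponentially
    with distance (an assumed gradient shape).
    """
    offset = np.asarray(cue_xyz, dtype=float) - np.asarray(front_xyz, dtype=float)
    weight = strength
    if length_scale is not None:
        weight *= np.exp(-np.linalg.norm(offset) / length_scale)
    return weight * unit(offset)


def next_position(front_xyz, parent_xyz, cues, step, rng, noise=1.0):
    """Combine current heading, a random component and all cues into one step."""
    front_xyz = np.asarray(front_xyz, dtype=float)
    heading = unit(front_xyz - np.asarray(parent_xyz, dtype=float))
    direction = heading + noise * unit(rng.normal(size=3))
    for cue_xyz, strength, length_scale in cues:
        direction = direction + cue_vector(front_xyz, cue_xyz, strength, length_scale)
    return front_xyz + step * unit(direction)


rng = np.random.default_rng(2)
soma = (0.0, 0.0, 0.0)

# Repulsion by the soma only, without noise: the front keeps moving away from it.
straight = next_position((0.0, 0.0, 10.0), soma, [(soma, -0.4, None)], 40.0, rng, noise=0.0)

# Soma repulsion plus a distance-dependent attraction to a point on a boundary.
cues = [(soma, -0.4, None), ((0.0, 0.0, 200.0), 1.0, 100.0)]
guided = next_position((0.0, 0.0, 10.0), soma, cues, 40.0, rng)
```

The weights play the same role as the scaling factors passed to `normalize_length` in the growth rules: they set the relative influence of each cue before the summed direction is rescaled to a fixed elongation length.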

#### **IMPLEMENTATION**

We implemented a prototype in Python and use ZeroMQ (Hintjens, 2013) to send messages between the components because it has the ability to buffer large messages and operate asynchronously. The algorithm underlying the behavior of an active front is a Python script and is the only part that has to be implemented by an end-user. This prototype is available on https://groups.oist.jp/cnu/neuromac.

Combined, the salient features of NeuroMaC are: (1) Context-aware generation of virtual morphologies that will not overlap with one another in space; (2) The ability to detect and record synapses on the fly; and (3) Straightforward scalability and parallelization to generate large numbers of morphologies at the same time.

#### **RESULTS**

In order to validate the proposed framework we generated sets of virtual neuronal morphologies and compared them to the statistics of experimentally reconstructed morphologies. We validate NeuroMaC by demonstrating that we can (1) generate morphologies in isolation as current state-of-the-art approaches do, (2) populate a space by generating a forest of non-overlapping and interacting hippocampal granule cells, and (3) generate fully context-aware morphologies that interact with the environment (L5 pyramidal neurons in a laminar architecture). We selected these neuron types because motor neurons and hippocampal granule neurons are often used in algorithmic generation; pyramidal neurons were chosen because of their higher morphological complexity and assumed context-dependence. The experimentally reconstructed neurons were downloaded from NeuroMorpho.org (Ascoli et al., 2007). We took two motor neuron archives, the Burke archive (*N* = 6, Cullheim et al., 1987) and the Fyffe archive (*N* = 8, Alvarez et al., 1998). The granule neurons come from the Lee archive (*N* = 25, Carim-Todd et al., 2009). Pyramidal neurons are layer 5, secondary motor cortex neurons and come from the Kawaguchi archive (*N* = 10, Hirai et al., 2012).

#### **MOTOR NEURONS IN ISOLATION**

Motor neurons have a relatively straightforward morphology that, from the point of view of an external observer, is fairly context-independent (**Figures 2A–C**). We devised a purely phenomenological growth rule to mimic the final morphology consisting of two sub rules: one rule for the initial front (=the soma) and one rule for all other fronts. The full Python code of the growth rule is listed in **Table 2**. At the soma ("front.order == 0"), multiple stems are created in random directions around the soma. Once the stems are created fronts can bifurcate with a probability inversely proportional to the branching order, terminate with a small probability or extend otherwise. When a front grows outside the assigned substrate space it is terminated. Current heading and a random component set the direction of every new front; extending fronts are additionally repelled by the soma (**Table 2**). Typical resultant virtual morphologies are shown in **Figures 2D–F**.

#### **Table 2 | Complete Python code used to implement the growth rules underlying the generated motor neurons (illustrated in Figure 2).**

```
import numpy as np

from growth_procs import unit_sample_on_sphere,\
   direction_to,\
   gradient_to,\
   normalize_length,\
   get_entity,\
   get_eigen_entity,\
   prepare_next_front
L_NORM = 40 # fixed-size elongations
def extend_front(front,seed,constellation):
   if front.order == 0 : # this is the soma, create the stems
      new_fronts = []
      for i in range(np.random.randint(8,17)):
         rnd_dir = unit_sample_on_sphere()
         new_pos = front.xyz + normalize_length(rnd_dir,L_NORM)
         new_front = prepare_next_front(front,new_pos,\
                set_radius=8.0,add_order=True)
         new_front.swc_type=2
         new_fronts.append(new_front)
      return new_fronts
   else:
      # Follow a simple branching rule in all other cases
      bif_prob = 0.6 / (front.order*2.5)
      if front.order > 5 :
         bif_prob = 0.03
      if np.random.random() > bif_prob: # continue a front
         # random component
         rnd_dir = unit_sample_on_sphere()
         # current heading
         heading = front.xyz - front.parent.xyz
         # soma-tropism, sample direction away from the soma
         soma_dir = -1.0 * normalize_length(direction_to(front,\
                     [front.soma_pos],what = "nearest"),0.4)
         # combine all influences on the new direction of growth
         new_dir = normalize_length(heading,1.0) + soma_dir + rnd_dir
         new_pos = front.xyz + normalize_length(new_dir,L_NORM)
         new_front = prepare_next_front(front,new_pos,\
                      radius_factor = 0.9,add_order = False)
         # terminate with a small probability once the path is long enough
         if np.random.random() < 0.06 and front.path_length >= 600:
             return []
         return [new_front]
      else: # branch a front, generate two child fronts
         new_fronts = []
         for i in range(2):
             rnd_dir = unit_sample_on_sphere()
             heading = front.xyz - front.parent.xyz
             new_dir = normalize_length(heading,1.5) + rnd_dir
             new_pos = front.xyz + normalize_length(new_dir,L_NORM)
             new_front = prepare_next_front(front,new_pos,\
                         radius_factor = 0.7,add_order = True)
             new_fronts.append(new_front)
         return new_fronts
```

NeuroMaC contains auxiliary functions to build fronts and sample the context; these functions are first imported. The main function "extend\_front" is called by the sub volume and contains the actual growth rules. In this example, a single contextual cue, soma-tropism, is used.
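To see what "inversely proportional to the branching order" means numerically, the branching rule of Table 2 can be isolated into a small function. The wrapper name `bifurcation_probability` is introduced here for illustration only; the constants are the ones that appear in Table 2.

```python
def bifurcation_probability(order):
    """Per-step branching probability of a non-somatic front (order >= 1)."""
    if order > 5:
        return 0.03          # fixed low probability for high-order branches
    return 0.6 / (order * 2.5)


# Tabulate the probability for the first few centrifugal orders.
table = {order: round(bifurcation_probability(order), 3) for order in range(1, 8)}
for order, prob in table.items():
    print(order, prob)
```

The probability falls from 0.24 at order 1 to 0.048 at order 5 and is then held at 0.03, so most bifurcations occur close to the soma.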

**Figure 2 | (A–C)** Exemplar experimentally reconstructed spinal cord alpha motor neurons [**A,B** from the Fyffe archive (Alvarez et al., 1998), **C** from the Burke archive (Cullheim et al., 1987)]. **(D–F)** Virtual morphologies generated by NeuroMaC. **(G–I)** Quantitative comparison. Population morphometrics are shown for the Burke ("Burke") and Fyffe ("Fyffe") archives and for the generated neurons ("Syn"). **(G)** Euclidean distance between the soma and the terminal tips. **(H)** Branching order of each branching point in all morphologies. **(I)** Occurrence of branching points in each morphology as a function of Euclidean distance (i.e., Sholl-intersections, see main text). See **Table 3** for detailed statistics of these (and other) morphometrics.

Visual inspection shows high resemblance between the exemplar and generated motor neuron morphologies. We then checked a global morphometric, namely the Euclidean distance between the soma and terminal tips (**Figure 2G**), and two two-dimensional local metrics: "order", which expresses the occurrences of branching points as a function of branching order (**Figure 2H**), and "Sholl-like", a quick implementation of the Sholl metric that measures branch points as a function of Euclidean distance from the soma (**Figure 2I**). Trends contained in the experimentally reconstructed neurons (labeled "Burke" and "Fyffe") are replicated by the generated neurons (labeled "Syn"). We quantify the distribution by the median (M) and median absolute deviation (MAD) because the shape of the resultant distribution of the measures is unknown a priori and does not necessarily follow a normal distribution. Spread of the distribution is quantified with the interquartile range (IQR). Quantification is listed in **Table 3**. From the quantification we can see that there is a fair difference between the exemplar archives and that the generated neurons fit well between the values of the exemplars.
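The summary statistics and the "Sholl-like" metric are straightforward to compute. The following sketch uses NumPy and made-up numbers; `robust_summary` and `sholl_like` are hypothetical helper names and the bin width is an arbitrary choice, not a value taken from the paper.

```python
import numpy as np


def robust_summary(values):
    """Median (M), median absolute deviation (MAD) and interquartile range (IQR)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    q1, q3 = np.percentile(values, [25, 75])
    return {"M": float(median), "MAD": float(mad), "IQR": float(q3 - q1)}


def sholl_like(branch_points, soma, bin_width, max_distance):
    """Count branch points per shell of Euclidean distance from the soma."""
    offsets = np.asarray(branch_points, dtype=float) - np.asarray(soma, dtype=float)
    distances = np.linalg.norm(offsets, axis=1)
    edges = np.arange(0.0, max_distance + bin_width, bin_width)
    counts, _ = np.histogram(distances, bins=edges)
    return edges, counts


# Made-up tip distances with one outlier: M and MAD are barely affected by it.
summary = robust_summary([1.0, 2.0, 3.0, 4.0, 100.0])

# Made-up branch point coordinates around a soma at the origin.
edges, counts = sholl_like([(0, 0, 10), (0, 0, 45), (30, 40, 0)],
                           soma=(0, 0, 0), bin_width=40.0, max_distance=80.0)
```

Unlike the mean and standard deviation, these statistics make no assumption about the shape of the underlying distribution, which is the motivation given above.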

Both visual inspection and the quantitative measures show a good correspondence between the experimentally reconstructed and generated morphologies. These results are on par with the previously published results (Memelli et al., 2013), and hence we can conclude that by using NeuroMaC we can create sets of neurons generated in isolation.

#### **A FOREST OF HIPPOCAMPAL GRANULE NEURONS**

Next we set out to generate granule cells, both in isolation and in a "forest" setting, that is, many neurons packed in one volume with all neurons being generated simultaneously. Three experimentally reconstructed exemplar morphologies are shown in **Figures 3A–C**. We devised a straightforward construction rule in a similar vein to the rule used for the virtual motor neurons. Once the soma and two initial branches are created, branching occurs with a probability that decreases with the centrifugal order of the front. The direction of growth is determined by repulsion away from same-neuron dendrites, the current heading of a dendrite, and the direction towards the superficial part of the substrate, which in this case is the superficial part of the dentate gyrus. A random component is added to all growth directions as well. We generated two sets of virtual morphologies, namely a set in which each neuron was generated in isolation (*N* = 25, **Figures 3D–F** are representative examples) and one set in which 100 morphologies were generated simultaneously in a "forest" (**Figure 3G**). The growth instructions were kept identical in both sets. The simulated volume, however, was increased 20-fold in the forest setting (i.e., 1300 μ × 300 μ × 225 μ, with 225 μ being a plausible depth of the dentate gyrus). Note that in the "forest" setting, developing morphologies interact indirectly with each other through overlap-prevention.

**Table 3 |** Distributions of observed morphometrics are given by the median (M), median absolute deviation (MAD) and inter-quartile range (IQR). Values are shown for the generated ("synthetic") morphologies and the morphologies originating from the Burke and Fyffe archives (see main text).

Visually the generated morphologies bear strong resemblance to the exemplar ones. We then measured the Euclidean distance between soma and terminal tips and the maximum order in a tree (**Figures 3H,I**), as well as the two-dimensional "Order" and "Sholl-like" metrics (**Figures 3J,K**) for the set of exemplar morphologies ("Lee") and the sets of morphologies generated in isolation ("Syn") and in a forest setting ("Forest"). To avoid biases introduced by an unequal number of samples, we randomly picked 25 morphologies from the forest and computed the appropriate features from this subset. The histograms indicate similar trends in all data sets. Quantification of all measured morphometrics is provided in **Table 4**. It is interesting to note that the variance in the morphologies generated in a forest setting is higher. This observation results from the fact that all these neurons are generated simultaneously. As a result, some branches would overlap with each other. Overlaps are detected and an attempt is undertaken to resolve the overlap. However, if no quick resolution is found, the branch is terminated. In the forest setting, the somata are close to each other and some conflicts in the proximal branches could not be resolved and caused very small Euclidean length and low maximal order in rare cases (**Figures 3H,I**, left-most red bars). The two-dimensional metrics indicate a good match in the topological and geometrical distribution of branch points (**Figures 3J,K**).

Even though the neurons in the forest setting were densely packed (**Figure 3G**) no overlaps occurred as neurite locations were either corrected or terminated during the validity checks performed by the SVs. Therefore, we conclude that with NeuroMaC we can generate forests of non-overlapping, plausible morphologies.

#### **CONTEXT-AWARE L5 PYRAMIDAL NEURONS**

As a final demonstration of the capabilities of NeuroMaC, we generated context-dependent layer 5 pyramidal neuron morphologies. Three exemplar morphologies are shown in **Figure 4A**. By visually examining these morphologies, we can observe some morphological traits such as a difference in "height" but these traits are hard to relate to their context. However, from canonical circuit information, we know that the somata are located in layer 5, that their basal dendrites remain mainly in L5 and may extend a bit into L4, that their apical dendrite extends to the superficial parts and ends close to the pia (in L1) after branching extensively in layers L3–L1, and that oblique dendrites sprout from the apical trunk in L4. The remarkable difference in "height" of the apical tree is a clear signature of this context dependence as more superficially located pyramidal cells cannot extend as far as more deeply positioned ones.

We designed construction rules that take these canonical, contextual traits based on laminar structure into account. A truncated code snippet is listed in **Table 5** to indicate particular context-dependent growth rules. Note that the growth rules are different for basal and apical dendrites, and that the apical growth rules are further divided into rules for L5/L4, oblique dendrites, and the dendrites in L3/L2/L1. At the soma, we generate an appropriate number of basal stems and one apical stem. The basal dendrite branches with a probability inversely proportional to the centrifugal order; at orders higher than 6 no branching is allowed. Termination of a basal branch occurs with a small probability or when a branch grows outside the limiting volume. Direction of growth is again influenced by the heading and same-neuron repulsion and an additional random factor. The apical branch is contextually aware and the construction rules change depending on the layer it is in (**Table 5**, "extend\_apical\_front"). Layer-dependent behavior is biologically feasible because in cortex some transcription factors are exclusively expressed in layer-specific neurons (Hevner et al., 2003; Chen et al., 2005). In layers 5 and 4, oblique dendrites can sprout and grow away from their initial branch point at the apical trunk. In subsequent layers (3, 2, and 1) apical neurites can branch with layer-specific probabilities as long as a maximum increase in order has not occurred yet in one layer. Same-neuron repulsion, current heading, a distance-dependent attraction to the pia, and a random component determine the direction of growth in the superficial layers 3–1. Apical neurites can terminate as soon as they reach layer 3 (and later 2 and 1) with a small probability. All apical neurites are terminated if the pia is closer than 35 μ away.
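The layer-dependent switching of rules can be sketched as a lookup on the height of a front. This is an illustrative sketch, not the code of Table 5: the z-ranges are read off the `virtual_LAYER` entry of the exemplar configuration (Table 1), the pia is assumed to be a flat surface at the top of layer 1 (the framework loads it from a file), and `layer_of`, `apical_rule` and the rule labels are hypothetical.

```python
# z-range (lower, upper) of each layer, taken from virtual_LAYER in Table 1.
LAYER_Z = {6: (0, 471), 5: (471, 826), 4: (826, 1090),
           3: (1090, 1192), 2: (1192, 1311), 1: (1311, 1406)}
PIA_Z = 1406.0             # assumption: flat pia at the top of layer 1
PIA_STOP_DISTANCE = 35.0   # from the text: terminate within 35 μ of the pia


def layer_of(z):
    """Return the layer number containing height z, or None outside all layers."""
    for layer, (low, high) in LAYER_Z.items():
        if low <= z < high:
            return layer
    return None


def apical_rule(z):
    """Name the (hypothetical) rule set that applies to an apical front at height z."""
    if PIA_Z - z < PIA_STOP_DISTANCE:
        return "terminate"                 # too close to the pia
    layer = layer_of(z)
    if layer in (5, 4):
        return "trunk_with_obliques"       # obliques may sprout from the trunk
    if layer in (3, 2, 1):
        return "tuft_branching"            # layer-specific branching probabilities
    return "no_rule"                       # not covered by the description above


# Follow an apical front as it climbs from the soma layer towards the pia.
trajectory = [(z, layer_of(z), apical_rule(z)) for z in (800, 900, 1100, 1200, 1350, 1380)]
```

Because the rule depends on absolute position in the substrate rather than on distance from the soma, two neurons with somata at different depths produce apical trees of different height under identical rules.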

Two sets of morphologies are generated; again one with neurons in isolation (*N* = 10 to match the sample size in the Kawaguchi archive) and one with 100 simultaneously generated morphologies in a forest setting. The volume in the "forest" setting was a rectangular volume of size 6000 μ × 1800 μ × 1400 μ, where 1400 μ is the estimated depth of L5 in the exemplar data. All morphologies from the former set are plotted in **Figure 4B** along with the canonical virtual laminar architecture in which they grew (blue line: pia, red dashed lines: layer boundaries. Layer 1 is at the top and layer 5 at the bottom; layer 6 is not shown). The forest from the latter set is plotted in **Figure 4C**.

**Figure 3 | (A–C)** Experimentally reconstructed exemplar granule cells. **(D–F)** Representative virtual morphologies generated in isolation. **(G)** Forest of 100 simultaneously generated, non-overlapping granule cells. **(H–K)** Quantitative comparison. Population morphometrics are shown for the Lee archive ("Lee"), synthetic neurons generated in isolation ("Syn") and as part of a forest ("Forest"). **(H)** Euclidean distance between the soma and the terminal tips. **(I)** Maximum order in a tree. **(J)** Occurrence of branching points as a function of branching order. **(K)** Occurrence of branching points in each morphology as a function of Euclidean distance (i.e., Sholl-intersections). See **Table 4** for a detailed quantification of these (and other) morphometrics.

Visually, the generated neurons clearly exhibit the morphological traits summarized above. Furthermore, we compared the total number of branch points (**Figure 4D**), the Euclidean distance to the terminal tips (**Figure 4E**) and the total length (**Figure 4F**). A quantification of all measured morphometrics is listed in **Table 6**. The basal and apical dendrites are treated separately in these measures. The basal trees show great correspondence with the exemplar morphologies in terms of the Euclidean distance to the terminal tips and the total length of the dendritic trees. The number of branch points in the generated neurons is markedly higher than in the exemplar ones: the range is [19,39] for the Kawaguchi archive versus [20,52] and [19,53] for the neurons generated in isolation and in the forest setting, respectively. Given the good match in total length and in Euclidean distance to the tips, we speculate that the simple branching and termination rules are not sufficient for the basal trees, although the low number of branch points in the exemplar data can also result from incomplete reconstructions (Anwar et al., 2009, but also see Section "Discussion").
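The three morphometrics compared here (number of branch points, total length, and Euclidean distance to the terminal tips) can be computed from a tree stored in an SWC-like form. The sketch below is illustrative only: the dictionary layout and the function name `morphometrics` are assumptions, and the choice not to count the root (soma) as a branch point is a convention adopted for this example, not necessarily the one used in the paper.

```python
import math
from collections import defaultdict


def morphometrics(nodes, root_id=1):
    """Compute simple morphometrics of a tree.

    nodes: dict mapping node id -> (x, y, z, parent_id); parent_id is -1 for the root.
    Returns (number of branch points, total length, list of root-to-tip distances).
    """
    children = defaultdict(list)
    for node_id, (_, _, _, parent) in nodes.items():
        if parent != -1:
            children[parent].append(node_id)

    # Branch points: non-root nodes with two or more children.
    n_branch = sum(1 for n, kids in children.items()
                   if n != root_id and len(kids) >= 2)

    # Total length: sum of the Euclidean lengths of all parent-child segments.
    total_length = sum(math.dist(nodes[n][:3], nodes[p][:3])
                       for p, kids in children.items() for n in kids)

    # Terminal tips: nodes without children; distance is measured from the root.
    tips = [n for n in nodes if n not in children]
    tip_distances = [math.dist(nodes[n][:3], nodes[root_id][:3]) for n in tips]
    return n_branch, total_length, tip_distances


# Toy tree: a trunk of two segments with one side branch at node 2.
toy = {1: (0, 0, 0, -1), 2: (0, 10, 0, 1), 3: (0, 20, 0, 2), 4: (10, 10, 0, 2)}
n_branch, total_length, tip_distances = morphometrics(toy)
```

Applying the same function separately to the basal (SWC type 3) and apical (SWC type 4) subtrees would mirror the separate treatment of the two dendrite types in the text.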

Considering the apical trees, we observe a mismatch in the Euclidean distances and the total length between the exemplar and the generated morphologies. We attribute both to a difference in the oblique dendrites. As seen in **Figure 4E** (left panel, "Kawaguchi"), there is a peak of terminals in the apical dendrite that terminate close to the soma. While the generated data also display a second peak due to terminals of the oblique dendrites, this peak is less pronounced and shifted to greater Euclidean distances. We speculate that in the exemplar dendrites, more oblique dendrites sprouted more proximally than in our model. Given a major thalamic synaptic pathway in cortex projecting to layer 4 and synapsing onto oblique dendrites (Meyer et al., 2010; Oberlaender et al., 2012), it is not unreasonable to think that the oblique dendrites mainly sprout in layer 4, as in our model. However, as noted, an SWC file does not contain any contextual information, so the true dimensions of the laminar architecture of the animals from which the neurons were reconstructed remain a guess. Moreover, we consider the ability of NeuroMaC to construct context-dependent dendrites a strength, even if no context-dependent information related to the exemplar morphologies was directly available. The fact that the apical trees generated by NeuroMaC all reach L1, and grow no further, is a clear illustration of this context-dependence.

Generated morphologies can be generated in isolation or in a forest setting. Presentation as in *Table 3*. Values shown for the morphologies from the Lee archive and the morphologies generated in "Isolation" and in the "Forest" settings (see main text).

Our results indicate a clear and valid context-dependence, which is similar to the morphological traits in the exemplar data. Therefore, we can conclude that the generated morphologies exhibit context-dependent morphological traits that match the traits discovered in the exemplar data.

#### **DISCUSSION**

We started this work with the observation that there is a large discrepancy between the way neuronal morphologies are studied (in isolation) and the way they develop and take their shape (in interaction with a dense surrounding substrate). From experimental studies it appears that the surrounding brain substrate, the context of all neurons, plays a pivotal role in shaping the morphology and resultant brain circuits. To overcome this discrepancy, we proposed a new computational framework, NeuroMaC, to study how neuronal morphologies emerge from interactions with other actors in the brain substrate.

We opted for a phenomenological framework for the sake of conceptual simplicity and to curb computational costs. Construction rules are conceptually related to the genetic make-up of a neuron and express how a neuron has to grow in terms of repulsive or attractive interactions with the surrounding substrate. A phenomenological framework requires fewer computational resources than biologically and physically detailed ones. Moreover, the design of NeuroMaC as a multi-agent system ensures scalability with the number of available processors. As a consequence of these design choices, NeuroMaC can be used to generate large numbers of interacting morphologies simultaneously. This feature is unrivaled. CX3D, an existing computational tool, aims to simulate the whole of cortical development, from migration through polarization and differentiation to dendrite and axon formation. However, the main version is serial (i.e., not parallel), which limits its applicability for generating multiple full morphologies at the same time. NETMORPH, a tool capable of generating large cortical networks (Koene et al., 2009), adopts a strategy in which a volume is populated by adding neurons that are generated in isolation. The topology of neurons is based on a mechanistic growth rule, but the geometry assigned to embed the topology in space is statistically sampled from exemplar data. Hence, in NETMORPH all neurons are independent and not based on any contextual cues (van Ooyen et al., 2014). It has to be noted, however, that exemplar data contain morphologies that are shaped through contextual interactions; therefore, if a model succeeds in reproducing morphological traits, it implicitly captures some of these interactions. Historically, ArborVitae (Senft and Ascoli, 1999) was proposed to generate large networks of neurons simultaneously and with some phenomenological interaction based on resource competition. While promising initial results were generated, this tool is no longer in development. Hence, NeuroMaC is currently the only computational framework to study explicitly how neurons grow together while interacting with the environment.

We demonstrated that by using NeuroMaC we can generate plausible neuronal morphologies with construction rules based on local interactions, which inhabit the same simulated substrate and have no physical overlaps. In the current work, construction rules underlying the growth of morphologies are a crude approximation of the hypothesized growth rules used by neurons. The aim of this work was not so much the generation of the most "realistic" morphologies or morphological traits but rather showcasing the power and usability of our new framework. As such, we illustrated that construction rules expressed in terms of repellents and attractors are a useful metaphor to study morphologies.

NeuroMaC can be used in any desired way on the continuum between small and large spatial scales and their associated level of biological detail. At one end of this continuum it can be used to study the effects of detailed, biologically plausible construction rules. This way, studies can be conducted investigating how particular construction rules representing biophysical processes influence morphological traits. At the other end of the continuum, one could opt to use less detailed rules to generate full morphologies and, because putative synapse locations are recorded as well, the resultant circuits. Of course, highly detailed construction rules can also be used (at little extra computational cost) to generate full circuits, and any "intermediate" level of detail can be implemented as well. However, while it is possible to compute the propagation of microscopic rules to the meso-scale circuit, it can be a tedious task to analyze the whole circuit at large for traces of the underlying microscopic interactions. Another noteworthy feature of NeuroMaC is that it supports a mixed methodology with respect to the growth rules. That is, existing context-independent neurogenetic algorithms can be implemented in a straightforward fashion so that they can be used as growth rules. As such, a simulated brain substrate could be populated by morphologies grown in accordance with different methodologies.

One important observation is that our virtual morphologies generated in a forest setting exhibit a larger variance than present in the exemplar data (**Figures 3I,J** and **4D,F**). This effect is smaller but still present in the neurons generated in isolation. We turn to the data sets of experimentally reconstructed neurons to explore the issue of variance.

#### **Table 5 | Code snippet illustrating the growth rules to generate layer 5 pyramidal neurons.**

```
import numpy as np

# NOTE: get_entity, direction_to, normalize_length, prepare_next_front and
# APICAL_NORM are assumed to be supplied by the NeuroMaC framework.
# Bodies marked with "..." are elided in the original snippet.

def extend_front(front, seed, constellation):
    if front.order == 0:
        # soma: create one apical stem and a number of basal stems
        new_fronts = []
        apical_front = create_apical_branch(front, constellation)
        basal_fronts = create_basal_branches(front, constellation)
        new_fronts.append(apical_front)
        new_fronts.extend(basal_fronts)
        return new_fronts
    elif front.swc_type == 3:
        # basal dendrite: extend for a limited, randomized number of cycles
        if front.update_cycle <= np.random.randint(35, 47):
            return extend_basal_front(front, constellation)
        else:
            return []
    else:
        return extend_apical_front(front, constellation)

def create_apical_branch(front, constellation):
    # create one branch in direction of the pia
    pia = get_entity("pia", constellation)
    dir_to_pia = direction_to(front, pia, what="nearest")
    new_dir = normalize_length(dir_to_pia, 3.0)
    new_pos = front.xyz + normalize_length(new_dir, APICAL_NORM)
    new_front = prepare_next_front(front, new_pos,
                                   set_radius=1.0, add_order=True)
    return new_front

def create_basal_branches(front, constellation):
    for i in range(np.random.randint(5, 11)):
        # construct a number of basal branches
        ...

def extend_basal_front(front, constellation):
    # branch, continue or terminate
    ...

def extend_apical_front(front, constellation):
    # terminate branches too close to the pia
    # sprout oblique dendrites in L5 or L4
    if front.layer in (4, 5) and not front.oblique:
        ...
    # special rule for oblique dendrites
    if front.oblique:
        # compute next location and return
        # continue or terminate oblique branch?
        ...
    # layer specific rules for fronts in different layers
    if front.layer >= 3:
        # branch, continue or terminate
        ...
    if front.layer == 2:
        # branch, continue or terminate
        ...
    if front.layer == 1:
        # branch, continue or terminate
        ...
```
The code is incomplete and serves merely to illustrate some of the context-dependent cues, such as growth direction toward the pia and layer specificity (for the apical tree).

We can start by assuming that the exemplar data are a good representation of all neurons. In that case, our generated data exhibit too much variation. Here the explanation would be that the branching rules used are too simple and that branching probability and termination also depend on both intrinsic and extrinsic signals. Intrinsic signals could be mediated through the production and transport of actin filaments that are required for scaffolding the neuronal membrane (Graham and van Ooyen, 2004). A detailed, mechanistic rule based on these intrinsic properties has been proposed (van Pelt and Verwer, 1986; van Pelt and Schierwagen, 2004) and could be used in our framework. Extrinsic signals are inherently context-dependent. Experimental work has demonstrated that the presence of specific molecules in the extra-cellular space influences branching and termination properties (Itoh et al., 1993; Dimitrova et al., 2008). While we did not address biologically plausible termination and branching conditions, we did use the contextual laminar architecture as a cue to set layer-specific branching probabilities, and fronts in close proximity to the pia were terminated. Another way of restricting virtual morphologies is by generating them inside a limited space, as applied here to the neurons generated in isolation. In such cases, a neurite terminates once it leaves the designated space (Cuntz et al., 2010; Memelli et al., 2013). This might explain in part why the neurons generated in isolation and in a limited space show less variance (**Figures 3H,I** and **4E,F**). However, since one of the future goals of this work is to generate full circuits, and because synapse occurrence is proportional to structural overlap between axons and dendrites (Peters and Feldman, 1976), we cannot constrain the space when generating large ensembles of neurons simultaneously (as in the forest setting, **Figures 3G** and **4C**). Therefore, future work will also focus on the design of proper rules for branching and termination.

**Table 6 | Quantitative description of experimentally reconstructed L5 pyramidal neurons and their generated counterparts.**

Generated morphologies can be generated in isolation or in a forest setting. Basal and apical dendrites are treated separately. Presentation as in *Table 3*. Values shown for the morphologies from the Kawaguchi archive and the morphologies generated in "Isolation" and in the "Forest" setting (see main text).

We can also start the argument by assuming that the exemplar data are not representative of all neurons. It has been demonstrated that reconstructed neurons contain many biases related to reconstruction methods and to selection by the experimenter (Horcholle-Bossavit et al., 2000; Kaspirzhny et al., 2002; Szilágyi and De Schutter, 2004; Steuber et al., 2004). For instance, the experimenter might select only "typical" neurons that are labeled well in the slice, which leads to a strong bias in the data. Also, neurons at the edge of a slice are more likely to be selected for technical reasons, while precisely these neurons might be affected by the slice preparation in that neurites might be cut. Because these biases are not documented, it is hard to estimate their effect on the sample. One further option remains to explain the large variance in the generated data: the construction rules may be incomplete. Clearly, the rules employed in this work are phenomenological and only crudely mimic morphological traits, and are therefore incomplete. Nevertheless, assuming the rules are correct has interesting implications, mainly because of the predictive power associated with a mechanistic model. Having a mechanistic explanation of neuron morphology has the advantage that morphological traits of various kinds can be predicted. For instance, age has an influence on morphologies and makes classifying neurons of varying age into the correct classes nearly impossible (but see da Fontoura Costa et al., 2002). With a mechanistic model, morphologies corresponding to a certain age could be generated and serve as ground truth. Similarly, to assess pathological cases, simulated knock-outs could be predicted. Predictions, in turn, could be used to validate the phenomenological construction rules: predict the outcome of a particular knock-out and compare the resultant traits *in silico* and *in vitro.*

In conclusion, we designed, implemented and validated a new computational framework in accordance with a paradigm shift in the study of neuronal morphologies: away from studying morphologies in isolation and toward studying neuronal morphologies as participants in their neuronal context. We demonstrated the potential of this new framework to study variation in neuronal morphology through a "generative" approach. Future research will focus on the generation and emergence of complete microcircuits.

#### **REFERENCES**


*Networks, IWANN '99, Alicante*, Vol. I, *Lecture Notes in Computer Science 1606* (Alicante: Springer), 25–33.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 June 2014; accepted: 20 August 2014; published online: 05 September 2014.*

*Citation: Torben-Nielsen B and De Schutter E (2014) Context-aware modeling of neuronal morphologies. Front. Neuroanat. 8:92. doi: 10.3389/fnana.2014.00092 This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Torben-Nielsen and De Schutter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The effects of neuron morphology on graph theoretic measures of network connectivity: the analysis of a two-level statistical model

Jugoslava Aćimović<sup>1</sup>\*, Tuomo Mäki-Marttunen<sup>1,2</sup> and Marja-Leena Linne<sup>1</sup>

<sup>1</sup> Computational Neuroscience Group, Department of Signal Processing, Tampere University of Technology, Tampere, Finland, <sup>2</sup> Psychosis Research Centre, Institute of Clinical Medicine, University of Oslo, Oslo, Norway

We developed a two-level statistical model that addresses the question of how properties of neurite morphology shape the large-scale network connectivity. We adopted a low-dimensional statistical description of neurites. From the neurite model description we derived the expected number of synapses, node degree, and the effective radius, the maximal distance between two neurons expected to form at least one synapse. We related these quantities to the network connectivity described using standard measures from graph theory, such as motif counts, clustering coefficient, minimal path length, and small-world coefficient. These measures are used in a neuroscience context to study phenomena ranging from synaptic connectivity in small neuronal networks to large-scale functional connectivity in the cortex. For these measures we provide analytical solutions that clearly relate different model properties. Neurites that sparsely cover space lead to a small effective radius. If the effective radius is small compared to the overall neuron size, the obtained networks share similarities with uniform random networks, as each neuron connects to a small number of distant neurons. Large neurites with densely packed branches lead to a large effective radius. If this effective radius is large compared to the neuron size, the obtained networks have many local connections. In between these extremes, the networks maximize the variability of connection repertoires. The presented approach connects the properties of neuron morphology with large-scale network properties without requiring heavy simulations with many model parameters. The two-step procedure provides an easier interpretation of the role of each modeled parameter. The model is flexible and each of its components can be further expanded. We identified a range of model parameters that maximizes variability in network connectivity, a property that might affect the network's capacity to exhibit different dynamical regimes.

Keywords: network connectivity, neuron morphology, theoretical model, neurite density field, graph theory, motifs

### 1. Introduction

We analyze how the low-resolution properties of single neuron morphology constrain the connectivity within a large population of neurons. We develop a two-level framework that includes details of single cell morphology while allowing the analysis of large populations of neurons as well as the derivation of compact analytical expressions for most of the considered aspects of morphology and connectivity. The presented framework can further be extended to take into account additional aspects of neuronal morphology and additional properties of connectivity.

#### Edited by:

Stephen Eglen, University of Cambridge, UK

#### Reviewed by:

Arnd Roth, University College London, UK

Raoul-Martin Memmesheimer, Radboud University Nijmegen, Netherlands

#### \*Correspondence:

Jugoslava Aćimović, Computational Neuroscience Group, Department of Signal Processing, Tampere University of Technology, PO Box 553, 33101 Tampere, Finland jugoslava.acimovic@tut.fi

> Received: 25 September 2014 Accepted: 18 May 2015 Published: 10 June 2015

#### Citation:

Aćimović J, Mäki-Marttunen T and Linne M-L (2015) The effects of neuron morphology on graph theoretic measures of network connectivity: the analysis of a two-level statistical model. Front. Neuroanat. 9:76. doi: 10.3389/fnana.2015.00076

In this work, single neurons and neurites are modeled statistically. Each axon and each dendrite is represented by a single neurite field, a probability distribution that describes the density of the neurite branches within a limited area of the neurite. Each neuron thus consists of one neurite field for the dendrite, one for the axon, and a parameter that determines the average distance between the dendrite and axon centers. The adopted neurite field model has been discussed in the literature. Snider et al. (2010) and Teeter and Stevens (2011) propose a universal method to describe different neuronal types based on the description of neurite fields of dendrites. Cuntz (2012) demonstrates how realistic neuronal morphologies arise when dendrite segments, distributed according to Snider et al. (2010), are connected using the optimal wiring principle. In van Pelt and van Ooyen (2013), the realism of the obtained synaptic distributions and connectivity probabilities was tested for neurons modeled using density fields.

The use of graph theoretic measures to quantify neuronal connectivity is a methodology adopted from classical studies of network theory. In various studies and different contexts, it has been demonstrated how such measures can distinguish between functionally different network types. The methodology has been applied to very different networks, from computer networks to social networks, and from gene regulatory networks to neuroanatomy (Boccaletti et al., 2006). Theoretical studies, on the other hand, focus on the analysis of generic networks of coupled oscillators, demonstrating how statistical properties of network connectivity change the overall dynamics of the complex system. A particularly interesting question in such studies is the search for connectivity that optimizes some aspects of network functionality. Commonly addressed concepts include small-world networks, which minimize the average distance between network nodes while maximizing the cooperation across the node neighborhood. Another concept is the scale-free network, which places system dynamics on the edge between order and disorder, thus maximizing the repertoire of dynamical regimes that a system can exhibit as well as the information diversity in the system (Boccaletti et al., 2006; Mäki-Marttunen et al., 2011). Small-world networks were first introduced in Watts and Strogatz (1998), and then addressed in other studies, also in the neuroscience context (Boccaletti et al., 2006; Herzog et al., 2007; Kriener et al., 2009; Voges et al., 2010; Sporns, 2011; McAssey et al., 2014). They were often examined in the context of large-scale recordings of whole-brain activity, or of the anatomical large-scale connectivity between brain regions (Sporns, 2011). For the smaller-scale networks of individual neurons it is relatively difficult to estimate the small-world property, as it requires tracking the synaptic connectivity between neurons in large populations (particularly in order to estimate path lengths). Most of the studies present in the literature examine theoretical concepts through mathematical models, or analyze functional connectivity estimated from recordings. In our previous study, we examined a large repertoire of connectivity measures, aiming to find a consistent descriptor of connectivity that has implications for network dynamics (Mäki-Marttunen et al., 2013). Two measures stood out: the clustering coefficient for networks with binary distribution of node degrees, and the maximal eigenvalue for networks with more variability in the in-degree distribution.

In this study, we primarily focus on the estimation of motif counts (Milo et al., 2002). Motifs represent minimal networks with structured connectivity and are as such suitable for experimental studies. In three previous studies, the non-random distribution of motifs was demonstrated in small networks of pyramidal cells (Song et al., 2005; Perin et al., 2011), and also in networks of interneurons (Rieubland et al., 2014). The implications of these non-random features of connectivity are yet to be explained. Using a theoretical model, we derived closed-form expressions for motif counts that do not depend on the network size, but only on the average density of neurons. In addition, the clustering coefficient, which was already found to significantly affect the network activity (Mäki-Marttunen et al., 2013), can be straightforwardly computed from motif counts, as demonstrated in what follows.
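To illustrate how a clustering coefficient follows from motif counts, the sketch below counts two three-node motifs in an undirected graph (closed triangles and connected triples) and combines them as 3 × triangles / triples. This is the common undirected (transitivity) definition, used here as an assumption; the definition used for the directed networks in this study may differ, and the function names are hypothetical.

```python
from itertools import combinations


def triad_counts(adj):
    """Count triangles and connected triples in an undirected simple graph.

    adj: dict mapping node -> set of neighbouring nodes (no self-loops).
    Returns (triangles, connected_triples).
    """
    closed = 0
    triples = 0
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2           # paths of length 2 centred on node
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                closed += 1                   # each triangle is seen from 3 centres
    return closed // 3, triples


def clustering_from_motifs(triangles, triples):
    """Global clustering coefficient from the two motif counts."""
    return 3.0 * triangles / triples if triples else 0.0


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangles, triples = triad_counts(adj)
clustering = clustering_from_motifs(triangles, triples)
```

Because both counts are local, the same ratio can in principle be estimated from small sampled groups of neurons, which is what makes motif-based measures attractive experimentally.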

A relatively large part of the paper is dedicated to the analytical approach to solving the considered two-level model as well as to the obtained closed-form solutions. Understanding different levels of organization in neuronal systems and the interaction between those levels is a frequently discussed issue in the computational neuroscience literature (Frégnac et al., 2007; Deco et al., 2008). Even detailed single-level models can become computationally expensive and complex, and combining them into multilevel models leads to an explosion in complexity that can obscure the interactions between particular model components. A suggested alternative is the mean-field approximation of each level before linking it to higher levels of organization (Deco et al., 2008; Sompolinsky, 2014). The presented study complies with this methodology. We first analyze the level of neurons in order to derive simple properties relevant for the network level in the model. In this way, the dimensionality of that level is compressed, which provides the possibility of deriving simpler expressions for the second-level characteristics.

Several approximations were adopted when constructing the model of this paper. The neurite structure is described statistically and the fine details of neurite structure are lost. The fine patterns of synaptic distribution are also averaged out. The organization of neurons in the space is chosen to be simple and corresponds to cell cultures more so than to the cortical tissue. Finally, the activity-dependent synaptic reorganization is not considered in this study. Synapses are formed solely based on geometry, and the obtained connectivity corresponds more to potential connectivity as defined in Stepanyants and Chklovskii (2005). In the discussion, we will address some relevant properties of neuronal systems that are not part of the model, and propose a way to incorporate them in the presented framework.

The main results of this study are the analytical expressions for several frequently addressed network measures, including motif counts, clustering coefficient, and path length between network nodes. We particularly addressed motif counts, as they represent the smallest possible networks with structured connectivity. As they capture only the local properties of connectivity, they can be measured experimentally, as demonstrated in Song et al. (2005), Perin et al. (2011), and Rieubland et al. (2014). In addition, the clustering coefficient can be straightforwardly computed from motif counts. From the clustering coefficient and path length, we computed small-world coefficients using two definitions from the literature (Watts and Strogatz, 1998; Telesford et al., 2011). The addressed connectivity measures depend on several model parameters. Some of the parameters contribute as multiplicative constants, while others show nonlinear relations to the considered measures. The most interesting parameter is the ratio between the effective radius of a neurite and the distance between the axon and dendrite centers of the same neuron. The effective radius is the maximal distance that permits a connection between two neurons. Depending on this ratio, a network can have a connectivity similar to a uniform random network or similar to a locally coupled network. The most interesting situations are in between these two extremes, where the network increases variability in its connectivity repertoire.
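Two commonly used forms of the small-world coefficient can be sketched as follows. The first compares clustering and path length with those of an equivalent random network; the second follows the form proposed by Telesford et al. (2011), which additionally uses the clustering of an equivalent lattice. Whether these match the exact expressions used in this study is an assumption, and the function and argument names are hypothetical.

```python
def small_world_sigma(C, L, C_rand, L_rand):
    """sigma = (C / C_rand) / (L / L_rand); values well above 1 suggest small-worldness."""
    return (C / C_rand) / (L / L_rand)


def small_world_omega(C, L, C_latt, L_rand):
    """omega = L_rand / L - C / C_latt; values near 0 suggest small-worldness."""
    return L_rand / L - C / C_latt


# Illustrative numbers only (not taken from the paper).
sigma = small_world_sigma(C=0.5, L=3.0, C_rand=0.1, L_rand=2.5)
omega = small_world_omega(C=0.5, L=3.0, C_latt=0.6, L_rand=2.5)
```

Here `C` and `L` are the clustering coefficient and mean path length of the analyzed network, while the reference values come from random and lattice networks with the same number of nodes and edges.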

#### 2. Methods

To address the principal goal of this study, namely to analyze how neuronal morphology can affect connectivity in large networks, we constructed a two-level model. The first level specifies the anatomic properties of each neurite statistically, by defining a probability distribution of neurite branches. The probability distribution is non-zero only within a limited area, the support of the neurite distribution. This low-resolution description of neurites was already analyzed in several studies (Snider et al., 2010; Teeter and Stevens, 2011; van Pelt and van Ooyen, 2013). It depends on a small number of parameters, four for two-dimensional neurites, and is suitable for the analysis of large-scale network connectivity. The second level defines the properties of the neuronal population. In order to emphasize neuron morphology, we selected the simplest network model, a two-dimensional virtually infinite-size network with a uniform distribution of neurons. Every pair of sufficiently close axon-dendrite branches forms synapses, and the number of synapses is proportional to the axon-dendrite overlap (Peters' rule; Peters and Feldman, 1976; Peters et al., 1991). The obtained synapses correspond to potential connectivity as defined in Stepanyants and Chklovskii (2005). Activity-dependent synapse formation and pruning were not considered in this study, although they have been shown to play an important role in remodeling synaptic patterns. Including the activity-dependent mechanisms would require a dynamical model with a more complex synapse formation rule, eventually also described statistically. Activity-induced modifications of neurite distribution might also be considered. In this study, we wanted to analyze a simpler model where the role of morphology was emphasized, as it is the most stable among several properties that shape the connectivity in large networks. The concepts presented here can be combined with models of other relevant mechanisms, including the models of network activity, e.g., the one described in Mäki-Marttunen et al. (2013).
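As a hedged illustration of Peters' rule for the simplest case, suppose both neurite fields are uniform densities on disks of radii `R_a` and `R_d` whose centers are a distance `d` apart. If "overlap" is taken to mean the integral of the product of the two densities (an assumption made for this sketch), it reduces to the area of the lens-shaped intersection divided by the product of the two disk areas. The proportionality constant `k` and the function names are hypothetical.

```python
import math


def lens_area(d, R_a, R_d):
    """Area of the intersection of two disks whose centres are d apart."""
    if d >= R_a + R_d:
        return 0.0                                # disks do not touch
    if d <= abs(R_a - R_d):
        return math.pi * min(R_a, R_d) ** 2       # smaller disk lies inside the larger
    alpha = math.acos((d * d + R_a * R_a - R_d * R_d) / (2 * d * R_a))
    beta = math.acos((d * d + R_d * R_d - R_a * R_a) / (2 * d * R_d))
    kite = 0.5 * math.sqrt((-d + R_a + R_d) * (d + R_a - R_d)
                           * (d - R_a + R_d) * (d + R_a + R_d))
    return R_a ** 2 * alpha + R_d ** 2 * beta - kite


def expected_synapses(d, R_a, R_d, k=1.0):
    """Synapse count proportional to the overlap of two uniform neurite fields."""
    overlap = lens_area(d, R_a, R_d) / (math.pi * R_a ** 2 * math.pi * R_d ** 2)
    return k * overlap
```

In this uniform case the expected synapse count falls monotonically with `d` and vanishes once `d` reaches `R_a + R_d`, which is the geometric intuition behind a maximal connection distance.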

The first part of the Methods section gives a detailed description of the analyzed model. The second part presents the analysis of the neurite distribution and shows how its properties determine first-order connectivity statistics under the adopted synapse connectivity rule. In the third part, we present closed-form analytical expressions for two network measures and an iterative method to obtain another measure frequently addressed in the literature (Sporns, 2011).

#### 2.1. Model Description

The model consists of several components, including a neuronal population description, single neuron and single neurite description, and the rule for establishing contacts between neurons (i.e., potential synapses). All these components are illustrated in **Figure 1**.

#### 2.1.1. Population of Neurons (Figure 1A)

Neurons are distributed randomly in a two-dimensional space of size L × L, where L is chosen to be much bigger than the neuron size, thus making the space around each neuron virtually infinite. The population of neurons is homogeneous: all neurons have identical properties and are randomly oriented in space. The neurons are uniformly distributed in space with density 1/l<sup>2</sup>, i.e., a square of size l × l contains on average one neuron, which gives a total of N = L<sup>2</sup>/l<sup>2</sup> neurons. To avoid boundary effects, the edges of the surface are wrapped to form a torus and provide virtually infinite space (illustrated in **Figure 1A**). The model corresponds to the arrangement of neurons in dissociated neuronal cultures. A model of the cortical tissue, on the other hand, requires a non-uniform arrangement of neurons that should follow the distribution of the considered cell types across layers. In addition, a non-random orientation of neurons could be imposed.
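The population layout can be sketched in a few lines: N = L²/l² somata are drawn uniformly in the L × L square, and distances are measured on the torus obtained by wrapping the edges. The function names and the fixed random seed are assumptions made for illustration only.

```python
import math
import random


def place_neurons(L, l, seed=0):
    """Draw N = L^2 / l^2 soma positions uniformly in an L x L square."""
    rng = random.Random(seed)
    n = round((L / l) ** 2)
    return [(rng.uniform(0, L), rng.uniform(0, L)) for _ in range(n)]


def torus_distance(p, q, L):
    """Euclidean distance between p and q when the square's edges are wrapped."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, L - dx)   # take the shorter way around in x
    dy = min(dy, L - dy)   # take the shorter way around in y
    return math.hypot(dx, dy)


somata = place_neurons(L=100.0, l=10.0)
```

With the wrapped metric, a neuron near one edge sees neighbors across the opposite edge, so every neuron has statistically the same surroundings.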

#### 2.1.2. Neuron and Neurite Models (Figure 1B)

All of the neurons in the model are identical and consist of two neurite fields, one for the (basal) dendrite and one for the axon. The dendrite is centered at the soma and the axon center is at a distance Δ<sub>ad</sub> from the soma. For the uniform distribution of somata and the random orientation of axons, the distribution of axon centers is the same as that of the somata. The neurites are modeled statistically, as a distribution of neurite segments on a finite area, the distribution support. In this study, we considered circular supports with a radius R<sub>a</sub> for axons and R<sub>d</sub> for dendrites, where R<sub>a</sub> ≥ R<sub>d</sub>. We analyzed cases with uniform and truncated Gaussian distributions of neurites, described by density functions p<sub>a</sub>(x, y) for axons and p<sub>d</sub>(x, y) for dendrites. The expression for the uniform distribution is given by Equation (1) and for the truncated Gaussian by Equation (2), with parameters (x<sub>a/d</sub>, y<sub>a/d</sub>), the coordinates of the axon and dendrite centers, and σ<sub>a</sub>, σ<sub>d</sub>, the variances along both axes.

$$p\_{a/d}(x, y) = \begin{cases} \frac{1}{R\_{a/d}^2 \pi}, & (x - x\_{a/d})^2 + (y - y\_{a/d})^2 \le R\_{a/d}^2 \\ 0, & \text{else} \end{cases} \tag{1}$$

$$p\_{a/d}(x, y) = \begin{cases} \frac{1}{2\pi \sigma\_{a/d}^2 C\_{a/d}} \exp\left(-\frac{(x - x\_{a/d})^2 + (y - y\_{a/d})^2}{2\sigma\_{a/d}^2}\right), & (x - x\_{a/d})^2 + (y - y\_{a/d})^2 \le R\_{a/d}^2 \\ 0, & \text{else} \end{cases} \tag{2}$$

Here,

$$C\_{a/d} = 1 - \exp\left(-\frac{R\_{a/d}^2}{2\sigma\_{a/d}^2}\right)$$

are the normalization coefficients that compensate for the cut-off part of the Gaussians. The presented results can be extended to more general forms of density distributions and elliptic distribution supports.
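As a quick numerical illustration of Equations (1) and (2), the following Python sketch evaluates both densities and checks on a grid that each integrates to approximately one over its support. The function names and the grid-based check are assumptions made for this example only.

```python
import numpy as np

def uniform_density(x, y, xc, yc, R):
    """Equation (1): constant density on a disc of radius R, zero elsewhere."""
    inside = (x - xc) ** 2 + (y - yc) ** 2 <= R ** 2
    return np.where(inside, 1.0 / (np.pi * R ** 2), 0.0)

def truncated_gaussian_density(x, y, xc, yc, R, sigma):
    """Equation (2): isotropic Gaussian cut off at radius R and renormalized."""
    r2 = (x - xc) ** 2 + (y - yc) ** 2
    C = 1.0 - np.exp(-R ** 2 / (2.0 * sigma ** 2))   # normalization coefficient
    g = np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2 * C)
    return np.where(r2 <= R ** 2, g, 0.0)

def integrate_on_grid(density, R, n=801, **params):
    """Crude Riemann-sum integral of a density centered at the origin."""
    xs = np.linspace(-R, R, n)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs)
    return float(density(X, Y, 0.0, 0.0, R, **params).sum() * h * h)
```

For example, `integrate_on_grid(truncated_gaussian_density, 1.0, sigma=0.5)` should return a value close to one, which is the role of the coefficient C.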

#### 2.1.3. Neurite Segments and Density Fields (Figure 1C)

We introduce the maximal number of neurite segments, N<sub>a</sub> for axons and N<sub>d</sub> for dendrites, for two reasons. First, this concept allows us to compute the expected number of synapses between an axon-dendrite pair, which is an important first step in the derivation of the considered connectivity measures. Second, it connects the individual neurites with the statistical description of neurite fields, which is illustrated in **Figure 1C**. Each neurite is discretized into segments of length 2D. In what follows we will call D the unit length of a neurite, so each neurite segment is two units long. If the total length of a neurite is L<sub>a/d</sub>, then L<sub>a/d</sub> = 2DN<sub>a/d</sub>. The neurite field describes the probability of finding every neurite segment inside the neurite support, and it can be obtained by superimposing many neurites. We assume that the dendrite center coincides with the soma center as we represent all dendrite branches with the same density field.

#### 2.1.4. Potential Synapse Formation Rule (Figure 1D)

We adopted a simple rule that forms synapses between a pair of neurons independently of the other neurons in the population: the number of obtained synapses is proportional to the overlap between the two neurites (Peters' rule; Peters and Feldman, 1976; Peters et al., 1991). Consider a dendrite-axon pair. For each dendrite segment we examine its near neighborhood, a ball of radius D centered at the segment center (delineated with a blue circle in **Figure 1D**). If any axon segment is present in this ball, a potential synapse between these segments is established. If there is more than one axon segment, only one of them, selected at random, will form a potential synapse with the dendrite segment. Consequently, every dendrite segment can form at most one potential synapse with the considered axon, but it can simultaneously form potential synapses with other axons that cross its near neighborhood. In the example in **Figure 1D**, the near neighborhood of a dendrite segment is crossed by two axons and two potential synapses are formed (the blue arrows indicate positions of the potential synapses). This is a rather mild constraint on the number of synapses, and in a large population of neurons the number of synapses per neurite can become unrealistically high. Still, it is a reasonable assumption when analyzing potential connectivity, as we are interested in estimating the number of all possible contact places, which is much larger than the number of actually formed synapses. Alternative rules that take into account all the available segments from all the proximal axons can also be defined.
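The rule can be illustrated with a short Python sketch that counts potential synapses for one dendrite-axon pair. It assumes, for illustration only, that each segment is represented by the coordinates of its center and that an axon segment is "present in the ball" when its center lies within distance D of the dendrite segment center; the function name is hypothetical. The random choice among several candidate axon segments does not change the count, so it is omitted here.

```python
import numpy as np

def count_potential_synapses(dendrite_segments, axon_segments, D):
    """Count dendrite segments that have at least one axon segment center
    within distance D (at most one potential synapse per dendrite segment)."""
    dend = np.asarray(dendrite_segments, dtype=float).reshape(-1, 2)
    axon = np.asarray(axon_segments, dtype=float).reshape(-1, 2)
    if len(dend) == 0 or len(axon) == 0:
        return 0
    # Pairwise distances between every dendrite and every axon segment center.
    diff = dend[:, None, :] - axon[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A dendrite segment contributes one synapse if any axon segment is close enough.
    return int((dist <= D).any(axis=1).sum())
```

Applying the same function separately to every axon that crosses a dendrite reproduces the property that one dendrite segment may form potential synapses with several different axons.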

#### 2.2. The Methodology Used to Analyze Neurites: Connectivity between Axon-dendrite Pairs

#### 2.2.1. Expected Number of Synapses per Neurite

From the neurite description and the adopted synapse formation rule we derived the expression for the expected number of synapses per neurite (S̄, Equation 3). The details of the derivation are given in Supplementary Material 1. The same expression was already proposed in the literature to estimate the number of synapses from neurite density fields (Peters et al., 1991; Liley and Wright, 1994; van Pelt and van Ooyen, 2013). In van Pelt and van Ooyen (2013), an equivalent equation was derived using less strict assumptions about the distribution of the axonal field than the ones adopted in our study.

$$\overline{S} = N\_a N\_d D^2 \pi \iint\_{\Omega\_a \cap \Omega\_d} p\_a(x, y) \, p\_d(x, y) \, dx \, dy \tag{3}$$

Substituting the expressions for the neurite field distributions into this equation gives the final formula for the expected number of synapses:

$$\overline{S} = \frac{N\_a N\_d D^2}{R^2 \pi} \cdot \phi(\rho, \eta, M) = \frac{4N\_a N\_d D^2}{\Delta^2 \pi} \rho^2 \phi(\rho, \eta, M). \tag{4}$$

Here, R = (R<sub>a</sub> + R<sub>d</sub>)/2 is the average neurite radius, Δ is the distance between the considered axon-dendrite pair of two proximal neurons, ρ = Δ/(2R) = Δ/(R<sub>a</sub> + R<sub>d</sub>) is the normalized distance between the axon-dendrite pair, η = (R<sub>a</sub> − R<sub>d</sub>)/(R<sub>a</sub> + R<sub>d</sub>) is the asymmetry index that accounts for the different sizes of the axons and dendrites, and M is the set of parameters that determine the distribution of neurite segments. M is the empty set for a uniform distribution and M = {σ, k<sub>σ</sub>} for the considered case of a truncated Gaussian distribution. Here, σ = σ<sub>d</sub>/(2R) is the normalized dendrite distribution variance, and k<sub>σ</sub> = σ<sub>d</sub>/σ<sub>a</sub> is the ratio between the dendrite and axon variances. In what follows, the function φ(ρ, η, M) will be called the distance-dependent expected number of synapses, as it describes the dependency between the expected number of synapses and the axon-dendrite distance. This function can be evaluated analytically for the uniform distribution and numerically for the truncated Gaussians; all relevant derivations are given in Supplementary Material 1 and the function is further discussed in the Results section. The only requirement for this function is that it be invertible, at least partially. Similarly, the function ρ<sup>2</sup>φ(ρ, η, M) will be called the size-dependent expected number of synapses, as it describes the dependency on the average neurite size.
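For uniform neurite fields the integrand in Equation (3) is constant on the overlap of the two supports, so the expected number of synapses reduces to an overlap-area computation. The Python sketch below works under that observation. The expression used in `phi_uniform` is not quoted from the Supplementary Material; it is obtained here by equating Equations (3) and (4), and all function names are illustrative.

```python
import math

def overlap_area(Ra, Rd, delta):
    """Area of intersection of two discs with radii Ra, Rd and center distance delta."""
    if delta >= Ra + Rd:
        return 0.0                                  # supports do not touch
    if delta <= abs(Ra - Rd):
        return math.pi * min(Ra, Rd) ** 2           # smaller disc fully inside
    a = Ra ** 2 * math.acos((delta ** 2 + Ra ** 2 - Rd ** 2) / (2 * delta * Ra))
    b = Rd ** 2 * math.acos((delta ** 2 + Rd ** 2 - Ra ** 2) / (2 * delta * Rd))
    c = 0.5 * math.sqrt((-delta + Ra + Rd) * (delta + Ra - Rd)
                        * (delta - Ra + Rd) * (delta + Ra + Rd))
    return a + b - c

def expected_synapses_uniform(Na, Nd, D, Ra, Rd, delta):
    """Equation (3) with the uniform densities of Equation (1)."""
    pa = 1.0 / (math.pi * Ra ** 2)
    pd = 1.0 / (math.pi * Rd ** 2)
    return Na * Nd * D ** 2 * math.pi * pa * pd * overlap_area(Ra, Rd, delta)

def phi_uniform(rho, eta):
    """phi(rho, eta) implied by Equations (3) and (4) for uniform fields.
    Evaluated for R = 1, i.e., Ra = 1 + eta, Rd = 1 - eta, delta = 2 * rho."""
    Ra, Rd = 1.0 + eta, 1.0 - eta
    return overlap_area(Ra, Rd, 2.0 * rho) / (Ra ** 2 * Rd ** 2)
```

With these definitions, `expected_synapses_uniform` and the right-hand side of Equation (4) evaluated with `phi_uniform` agree to rounding error, which is a convenient consistency check on the normalized parameters ρ and η.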

#### 2.2.2. Computation of Node Degree and Effective Radius from Neurite Field Distributions (Figure 2A)

Two neurons are expected to connect if their axon-dendrite pair has S̄ ≥ 1. The expected number of synapses depends on the model parameters (N<sub>a</sub>, N<sub>d</sub>, D, Δ, R) and the normalized parameters ρ, η, and M. First we fix all the parameters except Δ (and ρ), and then we find the maximal axon-to-dendrite distance Δ<sub>max</sub> (and ρ<sub>max</sub>) that satisfies the condition S̄ ≥ 1. This maximal distance is called **the effective radius** of a neurite and its computation is illustrated in **Figure 2A**. The circle centered in the neurite with the radius equal to the effective radius is called **the connectivity area**. The effective radius integrates the properties of both the axon and the dendrite, and is consequently equal for both types of neurites. Once it is computed, it simplifies the analysis of network connectivity. Every neuron can be represented as two circles of radius Δ<sub>max</sub> with the distance between the circle centers being Δ<sub>ad</sub>. Different network connectivity measures are computed from the intersection of pairs of circles for several neurons.

$$\begin{aligned} \overline{S} \ge 1 &\Rightarrow \frac{N\_a N\_d D^2}{R^2 \pi} \phi(\rho, \eta, M) \ge 1 \\ &\Rightarrow \rho \le \phi^{-1} \left( \frac{R^2 \pi}{N\_a N\_d D^2}, \eta, M \right) \\ &\Rightarrow \Delta \le 2R \cdot \phi^{-1} \left( \frac{R^2 \pi}{N\_a N\_d D^2}, \eta, M \right) \\ &\Rightarrow \Delta\_{\text{max}} = 2R \cdot \phi^{-1} \left( \frac{R^2 \pi}{N\_a N\_d D^2}, \eta, M \right) \end{aligned} \tag{5}$$

The function φ(·) has to be invertible with respect to its first argument. Here, φ<sup>−1</sup>(x, η, M) denotes the inverse of φ with respect to the argument x, with η and M considered as constants. In the case of the uniform distribution, the function φ is monotonic without discontinuities only for η ≤ ρ < 1. The analysis of this case, shown in the Results section, confirms that the general conclusions still apply.

Finally, the node degree, equal for all the neurons, can be computed as a function of the effective radius. The average number of output connections for a neuron is equal to the average number of dendrite centers within the connectivity area of its axon

$$n\_{\text{degree}} = \frac{\Delta\_{\text{max}}^2 \pi}{l^2} = \frac{4R^2 \pi}{l^2} \psi \left(\frac{R^2 \pi}{2N\_a N\_d D^2}, \eta, M\right), \tag{6}$$

$$\text{where} \quad \psi(x, \eta, M) = \left(\phi^{-1}(x, \eta, M)\right)^2.$$
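When φ has no closed-form inverse, the effective radius can be obtained numerically. The following Python sketch inverts a continuous, decreasing φ by bisection and then evaluates the node degree from Δ<sub>max</sub>. It is an illustration under assumptions: the threshold follows Equation (5) as written, bisection is a choice made for this example rather than the method of the original study, and the function names and the callable `phi` are hypothetical.

```python
import math

def invert_decreasing(phi, target, lo, hi, tol=1e-12):
    """Solve phi(rho) = target on [lo, hi], assuming phi is continuous
    and monotonically decreasing on that interval."""
    if not (phi(hi) <= target <= phi(lo)):
        raise ValueError("target is outside the range of phi on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > target:
            lo = mid        # phi still too large: the solution lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def effective_radius(phi, Na, Nd, D, R, rho_lo, rho_hi):
    """Delta_max = 2 R phi^{-1}(R^2 pi / (Na Nd D^2)), following Equation (5)."""
    target = R ** 2 * math.pi / (Na * Nd * D ** 2)
    return 2.0 * R * invert_decreasing(phi, target, rho_lo, rho_hi)

def node_degree(delta_max, l):
    """Expected number of dendrite centers inside the connectivity area."""
    return delta_max ** 2 * math.pi / l ** 2
```

For the uniform distribution the search interval would be the monotonic branch η ≤ ρ < 1; the `ValueError` corresponds to parameter choices for which the inverse does not exist.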

#### 2.2.3. Constraints on Model Parameters

So far, no constraints on model parameters were imposed, but obviously a random choice in the 8-dimensional space (D, N<sub>a</sub>, N<sub>d</sub>, R, η, ρ, σ, k<sub>σ</sub>) can lead to unrealistic morphologies. In this work, we will not search for biologically realistic parameters using reconstructed neurons or detailed simulations of neurites, e.g., using the NETMORPH toolbox (Koene et al., 2009). This will be addressed in our future work. Here, we only give a set of weak conditions necessary for having feasible morphologies.

FIGURE 2 | (A) Definition of the effective radius and the connectivity area. The effective radius is the distance between an axon center A1 and a dendrite center BX that satisfies the condition S̄(Δ) = 1. Every point within the connectivity area of A1 is at a distance smaller than Δ<sub>max</sub> from A1. (B) Normalized coordinate system. The polar coordinate system is fixed to the representative neuron N1 defined by its axon center A1 and dendrite center B1. The coordinate center is in the axon center A1, and the coordinate axis goes from A1 to the dendrite center B1. The angular coordinate is measured counterclockwise from the coordinate axis. All radial coordinates are normalized, i.e., divided by Δ<sub>ad</sub>, so that B1 has coordinates (0, 1) and BX coordinates (α<sub>X</sub>, r<sub>max</sub>), where r<sub>max</sub> = Δ<sub>max</sub>/Δ<sub>ad</sub>. (C) 2-Node motif counts. The panel illustrates two steps in the computation of the expected numbers of 2-node motifs. In the first step, the position of the dendrite center B2 is chosen within the connectivity area of axon A1. In the second step, axon A2 is chosen on the circle of radius 1 around B2 (the red dashed line). The function κ<sub>1</sub> gives the probability that A2 falls within the connectivity area of B1; κ<sub>1</sub> is determined by the angle between points B2, I and J. (D) 3-Node motif counts. In the first step, the positions of two dendrite centers, B2 and B3, are chosen within the connectivity area of axon A1. The second step defines the position of axon center A2, placed on the circle of radius 1 around B2 (the dashed red line). Intersections of this circle with the connectivity areas of dendrites B1 and B3 define functions κ<sub>1</sub> (the red line), κ (the orange line), and λ (the purple line), which are determined by the angles ∠B2IJ, ∠B2KL, and ∠B2JK, respectively. The expected number of motifs for all three-node motifs can be computed considering different positions of A2 with respect to the connectivity areas of B1 and B3, and as a combination of functions κ<sub>1</sub>, κ, and λ. (E) M2 and M9 counts: Computation of the expected numbers of M2 and M9 requires additional steps. In the first step, the axon center A3 is chosen within the connectivity area of B1. In the second step, the dendrite center B2 is chosen in the connectivity area of A1 but outside the connectivity area of A3 (dark green area). In the third step, the dendrite center B3 is chosen on the circle of radius 1 around A3 (the dashed red line), but outside of the connectivity area of A1 (unshaded part of the dashed red line). The fourth step is identical to the second step in (D).

**Condition 1: Upper bound for the number of neurite segments.** **Figure 1C** illustrates the discretization of neurites into segments of length 2D. A circle of radius D is circumscribed around each such segment. As shown in **Figure 1C**, these circles overlap only immediately after the branching points. As we assume that D is small compared to the average segment between two branching points, we can also assume a small number of overlapping circles compared to the total number of circles covering a neurite. If, in addition, we assume that the neurite segments should not be too densely packed, and that the neurites tend to avoid self-intersections, we derive the following upper bound for the number of neurite segments:

$$N\_d \le \frac{R\_d^2 \pi}{D^2 \pi}, \quad N\_a \le \frac{R\_a^2 \pi}{D^2 \pi}.$$

The right-hand sides of these inequalities give the approximate number of circles of radius D that fit inside a neurite of radius R<sub>d/a</sub>. For the truncated Gaussian we have an additional relation:

$$N\_a \le N\_a \cdot f\left(\frac{R\_a}{\sqrt{2}\sigma\_a}\right) \le \frac{R\_a^2 \pi}{D^2 \pi}, \quad N\_d \le N\_d \cdot f\left(\frac{R\_d}{\sqrt{2}\sigma\_d}\right) \le \frac{R\_d^2 \pi}{D^2 \pi}.$$

If we replace the parameters (R<sub>a</sub>, R<sub>d</sub>, σ<sub>a</sub>, σ<sub>d</sub>) with the normalized parameters (R, η, σ, k<sub>σ</sub>), the relation becomes:

$$N\_a \le N\_a \cdot f\left(\frac{(1+\eta)k\_\sigma}{2\sqrt{2}\sigma}\right) \le \left(\frac{(1+\eta)\,R}{D}\right)^2,$$

$$N\_d \le N\_d \cdot f\left(\frac{1-\eta}{2\sqrt{2}\sigma}\right) \le \left(\frac{(1-\eta)R}{D}\right)^2. \tag{7}$$

The function f(x) = x<sup>2</sup>/(1 − exp(−x<sup>2</sup>)) is derived in the Supplementary Material (see Supplementary Material 1, derivation of Equation 4) for the upper bound of N<sub>a</sub>. The relation for N<sub>d</sub> follows from the same analysis when switching the roles of dendrites and axons.

**Condition 2: Weak lower bound for the number of neurite segments.** Each neurite should have at least one connected straight fiber. If the neurite radius is R<sub>d/a</sub>, the fiber length should be at least 2R<sub>d/a</sub>. Clearly, a better approximation for a single fiber would be an elliptic support with a longer diagonal equal to R<sub>a/d</sub> and a shorter one much smaller than R<sub>a/d</sub>. But, if we only consider the circular support of neurites, as is done in this study, the single fiber of length 2R<sub>d/a</sub> is approximated with a circle of radius R<sub>d/a</sub>. Therefore, we have

$$N\_a \ge \frac{2R\_a}{2D}, \quad N\_d \ge \frac{2R\_d}{2D}.\tag{8}$$
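Conditions 1 and 2 can be combined into a simple feasibility check for the number of segments of one neurite. The Python sketch below is a hedged illustration: the function names are hypothetical, and the truncated Gaussian case uses the un-normalized form of the relation, N · f(R/(√2 σ)) ≤ R<sup>2</sup>/D<sup>2</sup>, with the neurite's own radius and σ.

```python
import math

def f(x):
    """f(x) = x^2 / (1 - exp(-x^2)), defined here for x > 0."""
    return x ** 2 / (-math.expm1(-x ** 2))

def segment_count_feasible(N, R_neurite, D, sigma=None):
    """Check Conditions 1 and 2 for a neurite with N segments.
    sigma=None selects the uniform field; otherwise the truncated Gaussian."""
    lower = R_neurite / D              # Condition 2: N >= 2R / (2D)
    upper = (R_neurite / D) ** 2       # Condition 1: N <= R^2 pi / (D^2 pi)
    if N < lower:
        return False
    if sigma is None:
        return N <= upper
    return N * f(R_neurite / (math.sqrt(2.0) * sigma)) <= upper
```

Because f(x) ≥ 1 for all x > 0, the truncated Gaussian bound is always at least as restrictive as the uniform one.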

**Condition 3: Connected network.** In order to have a connected network the following relation between the model parameters has to hold:

$$n\_{\text{degree}} \ge 1 \Rightarrow \psi\left(\frac{R^2 \pi}{2N\_a N\_d D^2}, \eta, M\right) \ge \frac{l^2}{2R^2 \pi}.\tag{9}$$

**Condition 4: The inverse of function** φ**.** The model parameters should be in the range of values where the inverse of φ exists:

$$0 \le \frac{R^2 \pi}{2N\_a N\_d D^2} \le \phi\_{\text{max}}(\eta, M). \tag{10}$$

**Condition 5: Upper bound for the expected number of synapses.** As each dendrite segment accommodates at most one synapse with a proximal axon, the upper bound of S̄ can be estimated as the total number of circles of radius D that can be placed inside the axon-dendrite intersection area:

$$\overline{S} \le \frac{|\Omega\_a \cap \Omega\_d|}{D^2 \pi}.$$

In cases when the number of neurite segments is much smaller than the neurite radius, this upper bound allows more than one synapse per neurite segment, so a stricter constraint should be imposed:

$$\overline{S} \le \min\{N\_a, N\_d\} \Rightarrow \phi(\rho, \eta, M) \le \frac{R^2 \pi}{2D^2} \cdot \max\left\{\frac{1}{N\_a}, \frac{1}{N\_d}\right\}.\tag{11}$$

#### 2.3. The Methodology Used to Analyze Networks: Statistical Measures of Network Connectivity

We analyze network connectivity by computing standard statistical measures, such as motifs, clustering coefficient, harmonic path length, and two versions of the small-world coefficient. Most of the section is dedicated to motifs, and the expression for the clustering coefficient follows directly from it. The harmonic path length is computed using an iterative procedure. Small-world coefficients are adopted from the literature (Watts and Strogatz, 1998; Telesford et al., 2011) and will only be described in brief. We compute the connectivity measures for one fixed cell, the neuron N1, which is the representative of all the neurons in the homogeneous population. We consider all the other neurons (N2, N3, ..., N<sub>k</sub>) that can form different connectivity patterns with N1.

#### 2.3.1. Coordinate System and Normalization (Figure 2B)

The polar coordinate system is fixed to the neuron N1, with the axon center A1 and the dendrite center B1. The center of the coordinate system is in A1 and the coordinate axis follows the direction from A1 to B1. The angular coordinate is measured counterclockwise with respect to the coordinate axis and takes values α ∈ [−π, π]. The radial coordinates are normalized, i.e., divided by Δ<sub>ad</sub>, so that B1 has the coordinates (0, 1), and a dendrite center BX on the edge of the connectivity area has the coordinates (α<sub>X</sub>, r<sub>max</sub> = Δ<sub>max</sub>/Δ<sub>ad</sub>)<sup>1</sup>. **Figure 2B** illustrates the described coordinate system<sup>2</sup>.

<sup>1</sup>To simplify the explanations in the text we sometimes use the notation for neurite centers when talking about the corresponding neurites. For example, A1 could stand for "the center of the axon of neuron N1" but also for "the axon of neuron N1." Since all neurons have equal properties, the only parameters that distinguish them are the coordinates of their centers.

<sup>2</sup>The notation r is simultaneously used for radial coordinates in the coordinate system of axon A1 and the axon-dendrite distances between A1 and other dendrites, because these distances are at the same time radial coordinates in the coordinate system of A1.

#### 2.3.2. Notation

The symbol B<sub>R<sub>x</sub></sub>(X) is used to denote "a ball" or "a circular neighborhood." The subscript indicates the normalized radius, and the center of the ball is given between brackets. If the center X has the coordinates (α<sub>x</sub>, r<sub>x</sub>), the ball B<sub>R<sub>x</sub></sub>(X(α<sub>x</sub>, r<sub>x</sub>)) is the set of all points with coordinates (α, r) whose distance from X satisfies

$$\sqrt{r^2 + r\_x^2 - 2r \, r\_x \cos(\alpha - \alpha\_x)} \le R\_x.$$

This notation is also used to mark the connectivity area of a neurite; for example, B<sub>r<sub>max</sub></sub>(A) is the connectivity area of an axon centered in A. If we replace the inequality in the expression above with an equality, the expression corresponds to the edge of the ball, the circle C<sub>R<sub>x</sub></sub>(X).
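The ball notation translates directly into a membership test in polar coordinates. A minimal Python sketch follows; the function names are illustrative and not part of the original notation.

```python
import math

def polar_distance(alpha, r, alpha_x, r_x):
    """Distance between the points (alpha, r) and (alpha_x, r_x), law of cosines."""
    d2 = r * r + r_x * r_x - 2.0 * r * r_x * math.cos(alpha - alpha_x)
    return math.sqrt(max(d2, 0.0))   # guard against tiny negative rounding errors

def in_ball(alpha, r, alpha_x, r_x, R_x):
    """True if the point (alpha, r) lies in the ball of radius R_x around (alpha_x, r_x)."""
    return polar_distance(alpha, r, alpha_x, r_x) <= R_x
```

For instance, with the axon center A1 at the origin, `in_ball(0.0, 1.0, 0.0, 0.0, r_max)` tells whether the dendrite center B1(0, 1) falls inside the connectivity area of its own axon, which happens exactly when r_max ≥ 1.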

#### 2.3.3. Expected Number of Two-Node Motifs (Figure 2C)

**Figure 2C** illustrates the two-step method for the computation of two-node motifs. We consider two connected two-node motifs, i.e., whether two neurons have a unidirectional (N1 → N2) or a bidirectional (N1 ↔ N2) connection. For the bidirectional motif we will use the notation M1-2, and for the unidirectional the notation M2-2. In the first step (the left side of **Figure 2C**), the position of the dendrite center B2 is chosen inside the connectivity area of axon A1, which, according to the definition of the connectivity area, results in the connection N1 → N2. From the model definition, the axon-dendrite distance in a neuron is fixed to Δ<sub>ad</sub> (1 in the normalized coordinate system) and the orientation of the neuron is random in the 2D space. Therefore, for a fixed B2 the axon center A2 can take any position on the circle of radius 1 centered in B2, C<sub>1</sub>(B2), with equal probability. This circle is shown as a red dashed line on the right side of **Figure 2C**. Given the set of possible positions of A2, we can compute the probability that A2 falls inside the connectivity area of B1, which would give a bidirectional connection between the two neurons. This probability is proportional to the part of the circle C<sub>1</sub>(B2) that falls inside the connectivity area around B1 (highlighted in **Figure 2C**), and is described by the function κ<sub>1</sub>. If A2 is outside the connectivity area of B1, the resulting motif will be the unidirectional connection N1 → N2.

From this analysis we can estimate the probability that neuron N2 forms a unidirectional or a bidirectional motif with the neuron N1. To compute the expected number of two-node motifs for N1 we should consider all the possible positions of B2 (and consequently A2) within the connectivity area of A1, which is done by integrating over all the coordinates B2(α<sub>2</sub>, r<sub>2</sub>) inside the ball B<sub>r<sub>max</sub></sub>(A1). In addition, the expression obtained for the motif M2-2 is multiplied by two, as we should consider two directions of the connection, N1 → N2 and N2 → N1. The obtained expected numbers of motifs are given by the following expressions:

$$\mathcal{N}\_{M1-2} = \frac{\Delta\_{ad}^2}{2l^2\pi} \int\_0^{r\_{\text{max}}} \int\_{-\pi}^\pi \kappa\_1(\alpha, r) \, r \, d\alpha \, dr,\tag{12}$$

$$\mathcal{N}\_{M2-2} = \frac{\Delta\_{ad}^2}{l^2\pi} \int\_0^{r\_{\text{max}}} \int\_{-\pi}^\pi (2\pi - \kappa\_1(\alpha, r)) \, r \, d\alpha \, dr.$$

If the effective radius is larger than the axon-dendrite distance in a neuron (Δ<sub>max</sub> > Δ<sub>ad</sub>), the dendrite center B1 falls inside the connectivity area of its axon A1. In the considered model, the dendrite centers coincide with the somata and, in the general case, they should not be dimensionless. We neglect the finite size of the somata, assuming it to be much smaller than the size of the neurite field and the connectivity area. If the somata are not negligible, a correction needs to be applied in order to exclude the possibility that some dendrite center overlaps with B1. The correction coefficients for all 2-node and 3-node motifs are given in Supplementary Material 2.

#### 2.3.4. The Definition of κ<sub>1</sub> and κ

The function κ<sub>1</sub> describes the probability that A2 falls inside the connectivity area of B1, B<sub>r<sub>max</sub></sub>(B1), and is proportional to the intersection between this connectivity area and the circle C<sub>1</sub>(B2). The intersection is determined by the angle ∠B2IJ shown in **Figure 2C**; this angle is entirely determined by the coordinates of the dendrite centers B1(0, 1) and B2(α<sub>2</sub>, r<sub>2</sub>). Similarly, we can define a more general function κ if we replace B1 with some other dendrite center B3(α<sub>3</sub>, r<sub>3</sub>) with arbitrarily chosen coordinates. This way we have κ<sub>1</sub>(α<sub>2</sub>, r<sub>2</sub>) = κ(α<sub>2</sub>, r<sub>2</sub>, 0, 1)<sup>3</sup>. The function κ is shown by the orange line in **Figure 2D**, and it is equal to the angle ∠B2KL shown in the same panel:

$$\begin{aligned} \kappa'(\alpha\_2, r\_2, \alpha\_3, r\_3) &= 2 \arccos\left(\frac{1 - r\_{\text{max}}^2 + d\_{23}^2}{2d\_{23}}\right), \\ d\_{23} &= \|B2B3\| = \sqrt{r\_2^2 + r\_3^2 - 2r\_2r\_3\cos(\alpha\_2 - \alpha\_3)}. \end{aligned}$$

One special case has to be considered when defining κ′. If the distance between the dendrite centers is smaller than or equal to Δ<sub>max</sub> − Δ<sub>ad</sub>, i.e., if the circle C<sub>1</sub>(B2) lies entirely within the connectivity area of the other dendrite, the function κ′(·) becomes complex because the argument of the arccosine falls outside the interval [−1, 1]. However, the intersection angle in this case is 2π. This special case is taken into account in the final definition of κ(·):

$$\kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) = \begin{cases} \kappa'(\alpha\_2, r\_2, \alpha\_3, r\_3), & |r\_{\max} - 1| < \|B2B3\| < r\_{\max} + 1 \\ 2\pi, & \|B2B3\| \le r\_{\max} - 1, \ r\_{\max} \ge 1 \\ 0, & \|B2B3\| \ge r\_{\max} + 1 \\ 0, & \|B2B3\| \le 1 - r\_{\max}, \ r\_{\max} < 1 \end{cases} \tag{13}$$
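Equation (13) and the two-node motif integrals of Equation (12) can be evaluated numerically. The Python sketch below is one possible illustration, not the authors' implementation: it uses a midpoint rule on a polar grid, clamps the arccosine argument against rounding, and all function names and grid sizes are assumptions.

```python
import math

def kappa(alpha2, r2, alpha3, r3, r_max):
    """Equation (13): angle of the part of the unit circle around B2 that lies
    inside the connectivity area (radius r_max) around B3."""
    d2 = r2 * r2 + r3 * r3 - 2.0 * r2 * r3 * math.cos(alpha2 - alpha3)
    d = math.sqrt(max(d2, 0.0))
    if d >= r_max + 1.0:
        return 0.0                                   # circle entirely outside
    if d <= abs(r_max - 1.0):
        return 2.0 * math.pi if r_max >= 1.0 else 0.0
    arg = (1.0 - r_max ** 2 + d * d) / (2.0 * d)
    return 2.0 * math.acos(max(-1.0, min(1.0, arg)))

def two_node_motifs(r_max, delta_ad, l, n_r=200, n_a=400):
    """Midpoint-rule estimate of the expected numbers of bidirectional (M1-2)
    and unidirectional (M2-2) motifs, Equation (12)."""
    dr = r_max / n_r
    da = 2.0 * math.pi / n_a
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_a):
            a = -math.pi + (j + 0.5) * da
            total += kappa(a, r, 0.0, 1.0, r_max) * r    # kappa_1 = kappa(., ., 0, 1)
    integral = total * dr * da
    area = math.pi * r_max ** 2                          # integral of r over the ball
    n_bi = delta_ad ** 2 / (2.0 * l ** 2 * math.pi) * integral
    n_uni = delta_ad ** 2 / (l ** 2 * math.pi) * (2.0 * math.pi * area - integral)
    return n_bi, n_uni
```

A useful sanity check is that the bidirectional count plus half of the unidirectional count equals Δ<sub>max</sub><sup>2</sup>π/l<sup>2</sup>, the expected number of dendrite centers inside the connectivity area of A1.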

#### 2.3.5. Three-Node Connectivity Patterns (Figure 2D)

**Figure 2D** describes the two-step procedure needed to evaluate the expected number of the majority of three-node motifs. In the first step, two dendrite centers B2 and B3 are placed inside the connectivity area of the axon A1, which ensures the connections from N1 to N2 and N3. In the second step, the position of the axon center A2 is chosen on the circle C<sub>1</sub>(B2) around the dendrite center B2. The intersections of this circle with the connectivity areas around B1 and B3 determine possible connectivity patterns between the three neurons, and the lengths of these intersections are proportional to the probabilities of the connectivity patterns.

<sup>3</sup>The functions κ<sub>1</sub> and κ are introduced as two separate notions in order to simplify the notation in the equations that follow.

The intersection C<sub>1</sub>(B2) ∩ B<sub>r<sub>max</sub></sub>(B1) defines the function κ<sub>1</sub>, as in the case of 2-node motifs, which corresponds to the angle ∠B2IJ in **Figure 2D** and is colored red. The intersection C<sub>1</sub>(B2) ∩ B<sub>r<sub>max</sub></sub>(B3) defines the function κ, a generalization of κ<sub>1</sub>, which is shown in orange in **Figure 2D** and corresponds to the angle ∠B2KL. If the circle and both connectivity areas intersect, the function λ is non-zero. This is shown in purple in **Figure 2D** and corresponds to the angle ∠B2KJ.

If A2 falls inside the connectivity area around B1, but outside of the connectivity area around B3, the neuron N2 will have a bidirectional connection with N1 but no connection toward N3 (although it is possible that it receives a connection from N3). The probability for this is proportional to the function (κ<sub>1</sub> − λ). If A2 falls inside the connectivity area of B3, but outside the one of B1, the neuron N2 receives a unidirectional connection from N1, and also forms a connection with N3 (which might be unidirectional or bidirectional, depending on the position of axon A3). Finally, if A2 falls within the intersection between the two connectivity areas, neuron N1 has a bidirectional connection with N2 and at least a unidirectional connection to N3.

The same analysis is repeated for the intersections between the circle C<sub>1</sub>(B3), which defines the possible positions of the axon center A3, and the connectivity areas around B1 and B2. This gives the probabilities for the remaining connections. Finally, the following probabilities correspond to the connectivity patterns between the three neurons:

$$\begin{aligned} N\_2 \to N\_1, N\_3 &: \ \frac{1}{2\pi} \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3), \\ N\_2 \to N\_1, \ N\_2 \not\to N\_3 &: \ \frac{1}{2\pi} \left( \kappa\_1(\alpha\_2, r\_2) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \right), \\ N\_2 \to N\_3, \ N\_2 \not\to N\_1 &: \ \frac{1}{2\pi} \left( \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \right), \\ N\_2 \not\to N\_1, N\_3 &: \ \frac{1}{2\pi} \left( 2\pi - \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) - \kappa\_1(\alpha\_2, r\_2) + \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \right), \\ N\_3 \to N\_1, N\_2 &: \ \frac{1}{2\pi} \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2), \\ N\_3 \to N\_1, \ N\_3 \not\to N\_2 &: \ \frac{1}{2\pi} \left( \kappa\_1(\alpha\_3, r\_3) - \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2) \right), \\ N\_3 \to N\_2, \ N\_3 \not\to N\_1 &: \ \frac{1}{2\pi} \left( \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) - \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2) \right), \\ N\_3 \not\to N\_1, N\_2 &: \ \frac{1}{2\pi} \left( 2\pi - \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) - \kappa\_1(\alpha\_3, r\_3) + \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2) \right). \end{aligned} \tag{14}$$

The expressions on the right are divided by 2π, as the full circle corresponds to the probability 1.
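The four outcomes for the outgoing connections of N2 partition the full circle, so their probabilities must sum to one. A small Python sketch of Equation (14) for N2 makes this explicit; the function name and dictionary keys are illustrative, and the values of κ<sub>1</sub>, κ, and λ are assumed to be supplied by the caller.

```python
import math

def n2_pattern_probabilities(kappa1, kappa, lam):
    """Probabilities of the outgoing-connection patterns of N2, Equation (14).
    kappa1, kappa and lam are arc angles in radians with lam <= min(kappa1, kappa)."""
    two_pi = 2.0 * math.pi
    return {
        "N2->N1 and N2->N3": lam / two_pi,
        "N2->N1 only": (kappa1 - lam) / two_pi,
        "N2->N3 only": (kappa - lam) / two_pi,
        "neither": (two_pi - kappa - kappa1 + lam) / two_pi,
    }
```

The probabilities for N3 follow from the same function after swapping the roles of the two dendrite centers when κ<sub>1</sub>, κ, and λ are evaluated.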

#### 2.3.6. Definition of λ

The first step is to find the angular coordinates of the intersection points between the circle C<sub>1</sub>(B2) and the edges of the two connectivity areas, C<sub>r<sub>max</sub></sub>(B1) and C<sub>r<sub>max</sub></sub>(B3). These points are indicated as I, J, K, and L in **Figure 2D**. The same is done for the intersections between C<sub>1</sub>(B3) and the edges of the connectivity areas around B1 and B2. The following list summarizes these angles:

$$\begin{aligned} I(\varphi\_1^{21}), \ J(\varphi\_2^{21}) & \colon & \mathcal{C}\_1(B2) \cap \mathcal{C}\_{r\_{\max}}(B1), \\ K(\varphi\_1^{23}), \ L(\varphi\_2^{23}) & \colon & \mathcal{C}\_1(B2) \cap \mathcal{C}\_{r\_{\max}}(B3), \\ \varphi\_1^{31}, \ \varphi\_2^{31} & \colon & \mathcal{C}\_1(B3) \cap \mathcal{C}\_{r\_{\max}}(B1), \\ \varphi\_1^{32}, \ \varphi\_2^{32} & \colon & \mathcal{C}\_1(B3) \cap \mathcal{C}\_{r\_{\max}}(B2). \end{aligned}$$

The angles ϕ<sub>1,2</sub><sup>21</sup> and ϕ<sub>1,2</sub><sup>31</sup> always exist, as the corresponding intersections exist for every B2 and B3 inside the connectivity area of A1. The intersections ϕ<sub>1,2</sub><sup>23</sup> and ϕ<sub>1,2</sub><sup>32</sup> exist when r<sub>max</sub> ≥ 1, but for r<sub>max</sub> < 1 an additional condition on the coordinates of B2 and B3 has to be imposed.

The function λ depends on the length of the arc between these angles, which is independent of the choice of the reference coordinate system. The simplest equations are obtained if we translate the coordinate system from A1 to B2, then rotate it to have the coordinate axis in the direction from B2 to B1. The new coordinate center is B2, while B1 maintains the zero angular coordinate. First, the translation requires the following coordinate transform:

$$
\tilde{r}\cos(\tilde{\alpha}) = r\cos(\alpha) - r\_2\cos(\alpha\_2), \ \tilde{r}\sin(\tilde{\alpha}) = r\sin(\alpha) - r\_2\sin(\alpha\_2).
$$

Second, the rotation is done by subtracting the angular coordinate of B1 in the translated system, equal to τ(0, 1, α<sub>2</sub>, r<sub>2</sub>), from all other angles. The relations between the original coordinates and the coordinates in the translated-then-rotated system are:

$$\begin{aligned} \tilde{r} &= \sqrt{r^2 + r\_2^2 - 2r \, r\_2 \cos(\alpha - \alpha\_2)}, \\ \tilde{\alpha} &= \tau(\alpha, r, \alpha\_2, r\_2) - \tau(0, 1, \alpha\_2, r\_2), \\ \tau(\alpha, r, \alpha\_2, r\_2) &= \arctan\left(\frac{r \sin(\alpha) - r\_2 \sin(\alpha\_2)}{r \cos(\alpha) - r\_2 \cos(\alpha\_2)}\right). \end{aligned}$$
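The translate-then-rotate transform can be sketched in Python as follows. One deliberate substitution is made and should be read as an assumption of this example: the two-argument `math.atan2` is used in place of the plain arctangent so that the angle is resolved in the correct quadrant. The function names are illustrative, and the result is wrapped to the interval [−π, π).

```python
import math

def tau(alpha, r, alpha2, r2):
    """Angular coordinate of the point (alpha, r) after moving the origin to (alpha2, r2)."""
    return math.atan2(r * math.sin(alpha) - r2 * math.sin(alpha2),
                      r * math.cos(alpha) - r2 * math.cos(alpha2))

def to_b2_frame(alpha, r, alpha2, r2):
    """Coordinates of (alpha, r) in the system centered in B2 = (alpha2, r2)
    whose axis points from B2 to B1 = (0, 1)."""
    d2 = r * r + r2 * r2 - 2.0 * r * r2 * math.cos(alpha - alpha2)
    r_new = math.sqrt(max(d2, 0.0))
    a_new = tau(alpha, r, alpha2, r2) - tau(0.0, 1.0, alpha2, r2)
    a_new = (a_new + math.pi) % (2.0 * math.pi) - math.pi   # wrap to [-pi, pi)
    return a_new, r_new
```

By construction, B1 itself maps to angular coordinate zero in the new system, as required by the description of the rotation.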

Function τ updates the angular coordinates after the translation of the coordinate system to (α2,r2). In the new coordinate system the intersecting angles between C1(B2) and Brmax (B3) are given as

$$\begin{split} \tilde{\varphi}\_{1,2}^{23} &= \tilde{\alpha}\_3 \mp \arccos \left( \frac{1 - r\_{\max}^2 + \tilde{r}\_3^2}{2\tilde{r}\_3} \right) \\ &= \tau(\alpha\_3, r\_3, \alpha\_2, r\_2) \mp \frac{1}{2} \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) .\end{split}$$
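The change of coordinates and the intersection-angle formula above can be sketched numerically as follows. This is only an illustration: the function names `tau`, `to_b2_frame`, and `intersection_angles` are hypothetical, B1 is taken at (0, 1) in the frame of A1 (as suggested by the argument of τ(0, 1, α2, r2)), and `atan2` is used instead of the plain arctangent to keep the correct quadrant, which is an assumption about the intended branch.

```python
import math

def tau(alpha, r, alpha2, r2):
    """Angular coordinate of the point (alpha, r) after moving the origin to (alpha2, r2)."""
    return math.atan2(r * math.sin(alpha) - r2 * math.sin(alpha2),
                      r * math.cos(alpha) - r2 * math.cos(alpha2))

def to_b2_frame(alpha, r, alpha2, r2):
    """Polar coordinates (alpha_tilde, r_tilde) in the frame centred at B2,
    rotated so that B1 = (0, 1) has zero angular coordinate."""
    r_tilde = math.sqrt(r * r + r2 * r2 - 2.0 * r * r2 * math.cos(alpha - alpha2))
    alpha_tilde = tau(alpha, r, alpha2, r2) - tau(0.0, 1.0, alpha2, r2)
    return alpha_tilde, r_tilde

def intersection_angles(alpha_tilde, r_tilde, r_max):
    """Angles at which the unit circle around the origin crosses the circle of
    radius r_max centred at (alpha_tilde, r_tilde); None if there is no crossing."""
    if r_tilde == 0.0:
        return None
    c = (1.0 - r_max ** 2 + r_tilde ** 2) / (2.0 * r_tilde)
    if abs(c) > 1.0:
        return None
    half = math.acos(c)  # half of the arc length kappa
    return alpha_tilde - half, alpha_tilde + half
```

For example, `to_b2_frame(0.0, 1.0, alpha2, r2)` returns a zero angle for any B2, which reflects the choice of the rotated axis pointing from B2 to B1.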

All the relevant intersection angles are:

$$\begin{aligned} \mathcal{C}\_{1}(\mathsf{B2}) \cap \mathcal{C}\_{r\_{\max}}(\mathsf{B1}) & \colon & \tilde{\varphi}\_{1,2}^{21} = \tau\left(0, 1, \alpha\_{2}, r\_{2}\right) \mp \frac{1}{2} \kappa\_{1}(\alpha\_{2}, r\_{2}), \\ \mathcal{C}\_{1}(\mathsf{B2}) \cap \mathcal{C}\_{r\_{\max}}(\mathsf{B3}) & \colon & \tilde{\varphi}\_{1,2}^{23} = \tau\left(\alpha\_{3}, r\_{3}, \alpha\_{2}, r\_{2}\right) \mp \frac{1}{2} \kappa\left(\alpha\_{3}, r\_{3}, \alpha\_{2}, r\_{2}\right), \\ \mathcal{C}\_{1}(\mathsf{B3}) \cap \mathcal{C}\_{r\_{\max}}(\mathsf{B1}) & \colon & \tilde{\varphi}\_{1,2}^{31} = \tau\left(0, 1, \alpha\_{3}, r\_{3}\right) \mp \frac{1}{2} \kappa\_{1}(\alpha\_{3}, r\_{3}), \end{aligned}$$

Frontiers in Neuroanatomy | www.frontiersin.org June 2015 | Volume 9 | Article 76 |

$$\begin{aligned} \mathcal{C}\_1(B3) \cap \mathcal{C}\_{r\_{\max}}(B2) & \colon & \tilde{\varphi}\_{1,2}^{32} = \tau(\alpha\_2, r\_2, \alpha\_3, r\_3) \mp \frac{1}{2} \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) = \tilde{\varphi}\_{1,2}^{23} .\end{aligned}$$

Obtaining the length of the intersection arc from these angles requires considering each possible mutual position of the four angles. This problem was solved using the following procedure. The four angles were sorted from smallest to largest into a vector of angles ϕ̃(α2, r2, α3, r3). The sorted angles partition the circle C1(B2) into four arcs. For each arc we evaluated the distance between its middle point and the two centers B1 and B3. If both distances are smaller than rmax, the whole arc belongs to the intersection area Brmax(B1) ∩ Brmax(B3). The lengths of all the arcs that passed this test were summed up to obtain the function λ′(α2, r2, α3, r3). This function is non-zero when all three circles intersect. If dendrites B1 and B3 do not overlap, the function is zero. The function can be expressed as

$$\begin{aligned} \lambda' &= \sum\_{\text{Cond.}} |\phi\_i - \phi\_j| \cdot h(r\_{\text{max}} - d\_{i,j}^1) \cdot h\left(r\_{\text{max}} - d\_{i,j}^3\right), \\ \text{Cond.} &: i = 1..4, \ j = \text{mod}(i, 4) + 1, \\ d\_{i,j}^k &= \sqrt{1 + \tilde{r}\_k^2 - 2\tilde{r}\_k \cos\left(\frac{\phi\_i + \phi\_j}{2}\right)}, \ k = 1, 3. \end{aligned}$$

The function h(·) is the Heaviside function, equal to one if the argument is positive and equal to zero otherwise. The variables d<sup>1</sup><sub>i,j</sub> and d<sup>3</sup><sub>i,j</sub> are the distances from the middle points of the four arcs to the dendrite centers B1 and B3, respectively. The variables ϕ<sub>i</sub> are the sorted angles from the vector ϕ̃.
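The sorting-and-midpoint procedure described above can be illustrated with a short Python sketch. It is not the authors' implementation: the names `boundary_angles` and `arc_in_both` are hypothetical, the two disc centers are passed as Cartesian coordinates in the frame centered at B2 (an assumption made for brevity), and the wrap-around arc and the cases without crossings are handled explicitly, which the formula above does not spell out.

```python
import math

TWO_PI = 2.0 * math.pi

def boundary_angles(center, r_max):
    """Angles on the unit circle (centred at the origin) that lie on the
    circle of radius r_max around `center`; empty list if there is no crossing."""
    x, y = center
    d = math.hypot(x, y)
    if d == 0.0:
        return []
    c = (1.0 - r_max ** 2 + d ** 2) / (2.0 * d)
    if abs(c) >= 1.0:
        return []  # the unit circle is entirely inside or entirely outside the disc
    phi, half = math.atan2(y, x), math.acos(c)
    return [(phi - half) % TWO_PI, (phi + half) % TWO_PI]

def arc_in_both(center1, center3, r_max):
    """Length of the part of the unit circle lying inside both discs of radius r_max."""
    angles = sorted(boundary_angles(center1, r_max) + boundary_angles(center3, r_max))
    if not angles:
        arcs = [(0.0, TWO_PI)]  # no crossings: test the circle as a single arc
    else:
        # consecutive sorted angles delimit the arcs; the last arc wraps around 2*pi
        arcs = [(a, (angles[(i + 1) % len(angles)] - a) % TWO_PI)
                for i, a in enumerate(angles)]
    total = 0.0
    for start, length in arcs:
        mid = start + 0.5 * length
        px, py = math.cos(mid), math.sin(mid)
        inside1 = math.hypot(px - center1[0], py - center1[1]) < r_max
        inside3 = math.hypot(px - center3[0], py - center3[1]) < r_max
        if inside1 and inside3:  # product of the two Heaviside factors
            total += length
    return total
```

Because a whole arc is either inside or outside each disc, testing a single midpoint per arc is sufficient, which is the idea behind the Heaviside factors in the expression for λ′.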

If C1(B2) does not intersect with dendrite B1 or B3, the function λ′ is not defined, and the extension of the definition given by Equation (15) is needed. The first case in the list corresponds to the situation when all three circles intersect and the length of the intersection arc is between 0 and 2π. When ‖B2B3‖ ≤ rmax − 1, the circle C1(B2) is inside Brmax(B3) and λ = 2π. On the contrary, when ‖B2B3‖ ≤ 1 − rmax, the area Brmax(B3) is inside C1(B2) and the function is zero. It is also zero when ‖B2B3‖ ≥ 1 + rmax, i.e., when the circle and the area are missing each other.

$$\lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) = \begin{cases} \lambda'(\alpha\_2, r\_2, \alpha\_3, r\_3), & \|B1B2\| > r\_{\max} - 1 \ \text{\&} \ |r\_{\max} - 1| < \|B2B3\| < 1 + r\_{\max} \\ \kappa\_1(\alpha\_2, r\_2), & \|B1B2\| > r\_{\max} - 1 \ \text{\&} \ \|B2B3\| \le r\_{\max} - 1 \\ \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3), & \|B1B2\| \le r\_{\max} - 1 \ \text{\&} \ |r\_{\max} - 1| < \|B2B3\| < r\_{\max} + 1 \\ 2\pi, & \|B1B2\| \le r\_{\max} - 1 \ \text{\&} \ \|B2B3\| \le r\_{\max} - 1 \\ 0, & \|B2B3\| \le 1 - r\_{\max} \\ 0, & \|B2B3\| \ge r\_{\max} + 1 \end{cases} \tag{15}$$

#### 2.3.7. Minimal Set of Connectivity Patterns Needed to Describe Three-Node Motifs, the Definition of Central Node in a Connectivity Pattern

To compute the expected numbers of three-node motifs, one has to analyze all the possible connectivity patterns between the three neurons N1, N2, and N3, each represented by two circular connectivity areas, one for the dendrite and one for the axon. **Figure 3A** shows the standard schematic representation of the 3-node motifs (Milo et al., 2002), and **Figure 3B** shows all the possible connectivity patterns between N1, N2, and N3 that correspond to each of the motifs<sup>4</sup>. We will demonstrate how this full list of patterns can be reduced to 10 representative ones, sufficient to compute the expected counts for all the motifs. These 10 patterns are shown in red in the table and are also marked with the star symbol. The choice of the patterns is somewhat arbitrary and an alternative set can also be adopted, which should not affect the obtained expected numbers of motifs. Reduction to the minimal set of patterns also ensures that each pattern is counted only once.
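The pattern counts quoted in footnote 4 (64 labeled configurations, 54 of them connected, grouped into 13 motifs) can be checked by brute-force enumeration. The sketch below is an independent illustration rather than part of the original analysis; the helper names are hypothetical and the motif classes are identified only up to relabeling of the nodes, without assigning them the labels M1 to M13.

```python
from itertools import product, permutations

PAIRS = [(0, 1), (0, 2), (1, 2)]
# connection type for a pair (a, b): no connection, a->b, b->a, or bidirectional
TYPES = ["none", "fwd", "bwd", "both"]

def edges(config):
    """Directed edge set of a labelled three-node graph, one connection type per pair."""
    e = set()
    for (a, b), t in zip(PAIRS, config):
        if t in ("fwd", "both"):
            e.add((a, b))
        if t in ("bwd", "both"):
            e.add((b, a))
    return frozenset(e)

def is_connected(e):
    """Three nodes are (weakly) connected when at least two of the three pairs are linked."""
    return len({frozenset(p) for p in e}) >= 2

def canonical(e):
    """Canonical form under relabelling of the nodes; equal forms mean the same motif."""
    return min(tuple(sorted((p[a], p[b]) for a, b in e))
               for p in permutations(range(3)))

all_graphs = [edges(c) for c in product(TYPES, repeat=3)]
connected = [e for e in all_graphs if is_connected(e)]

classes = {}
for e in connected:
    key = canonical(e)
    classes[key] = classes.get(key, 0) + 1

class_sizes = sorted(classes.values())  # number of labelled patterns per motif
```

The sorted class sizes reproduce the multiplicities listed in the footnote: one motif with 1 pattern, one with 2, five with 3, and six with 6 patterns.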

First, we need to introduce the notion of the **central node** in a motif; suppose it is N1. If N1 is central to the motifs M1, M3, M5, M6, M8, M10, M12, and M13, both dendrite centers B2 and B3 belong to the connectivity area of axon A1, i.e., N3 ← N1 → N2 has to be included in the connectivity pattern. If N1 is central to the motifs M4, M7, and M11, the situation is inverse: both axon centers A2 and A3 have to belong to the connectivity area of dendrite B1, i.e., N3 → N1 ← N2 has to be included in the pattern. If N1 is central to the motifs M2 and M9, the neuron N1 is on the path from N3 to N2, i.e., N3 → N1 → N2 has to be part of the pattern.

The definition of the central node for the three groups of motifs is chosen to emphasize the similarities between the connectivity patterns and to enable selection of the minimal set of patterns. For example, the central node for M11 can be defined the same way as for M1, but the adopted definition emphasizes the similarity between M11 and M6. Following the definition of the central node, all of the patterns are divided into three sets, shown as three columns in **Figure 3B**. Column i contains connectivity patterns where neuron Ni represents the central node. Since all of the neurons in the network have the same properties, the motif counts do not depend on the choice of the central node. Therefore, for counting all the motifs that include the neuron N1, it is sufficient to count the motifs where N1 is central and multiply the obtained counts by a coefficient.

Motifs M1, M4, M8, and M13 have one possible pattern with N1 as the central node; M2, M3, M5, M6, M7, M9, and M10 have two; M11 and M12 have four, but only two should be considered as the other two are repeated in columns two and three. If we further analyze the pairs of patterns that appear in column one, it is evident that one of them can be obtained from the other by switching the positions of N2 and N3. Therefore, it is sufficient to consider only one of them, and it is irrelevant which one is chosen (here, we selected the first one). The reason is the following: in order to create patterns from the first group, the dendrite centers B2 and B3 have to be inside the connectivity area of axon A1. To compute all the motif counts, we have to consider every possible position of B2 and B3 within Brmax(A1). Consequently, both choices of coordinates, B2 = (αa, ra), B3 = (αb, rb) and B2 = (αb, rb), B3 = (αa, ra), are considered, as well as both connectivity patterns that correspond to a certain motif. It can also happen that B2 = B3 or B2 = B1 or B3 = B1, but the number of such examples is negligible, as shown in Supplementary Material 2. To count all the occurrences of M2 and M9, we put one dendrite center (B2 or B3) in the connectivity area of axon A1, and one axon center (A3 or A2) in the connectivity area of dendrite B1. Regardless of the neuron numbering, this is sufficient to take into account every appearance of these two motifs.

<sup>4</sup>The following calculation confirms that all patterns are included in the table. Each pair of nodes forms one of the four types of connections (2 in one direction, 1 bidirectional, no connection); this gives 4<sup>3</sup> = 64 motifs, and 54 of them are connected motifs. In the table, some patterns repeat. Each motif M1, M4, and M8 corresponds to 3 different patterns. M2, M3, M5, M7, M10, and M12 correspond to 6 patterns each. M6 and M11 correspond to 3 different patterns, M9 to 2, and M13 to 1 pattern. This gives 54 patterns in total.

Next, consider motifs M1 and M4. One of them is obtained from the other by switching the orientation of all the connections. This is equivalent to exchanging dendrites and axons: if motif M1 requires B2 and B3 inside the connectivity area of A1, then motif M4 requires A2 and A3 inside the connectivity area of B1. Connectivity areas of dendrites and axons are equal, which means that counts for M1 and M4 must be equal, N(**M1**) = N(**M4**). The same holds for motifs M3 and M7, and also for M6 and M11. Consequently, M4, M7, and M11 do not need to be considered separately. This completes the search for the minimal set of patterns that are shown in red in **Figure 3B**.

Once the counts for the 10 representative patterns are computed, the final motif counts are obtained by multiplying them with the following coefficients: **3 for M2, M3, M5, M7, M10, and M12, 1.5 for M1, M4, M6, M8, and M11, 1 for M9, 0.5 for M13.** The first set of motifs is multiplied by 3 in order to take into account three possible choices of the central node. There is no need to take into account two different patterns for each central node because that is already accounted for by considering all the possible coordinates of B2 and B3, as described in a previous paragraph. Motifs M1, M4, M8 are multiplied by 3/2, because each central node corresponds to only one pattern. Consequently, the procedure that takes into account all possible positions of B2 and B3 leads to counting every pattern twice. Closer inspection of the patterns for M6, M10, and M11 shows that each pattern in the table in **Figure 3B** repeats twice, e.g., for M6, pattern 1 for N1 as the central node is equal to pattern 2 for N2 as the central node. If we multiply the motif counts for central node N1 by 3, in order to take into account other choices of central nodes, we actually consider each pattern twice. So the counts should be additionally divided by 2. Next, motifs M9 and M13 are circular and any choice of the central node gives the same pattern. So there is no need to multiply the counts obtained for N1 by 3. In addition, M13 has only one pattern that corresponds to N1 as the central node, so the count should be additionally divided by 2.
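As a small numerical cross-check, the coefficients listed above turn out to equal half of the number of patterns per motif given in footnote 4. The following sketch only tabulates the two sets of numbers from the text and applies the scaling; the names `PATTERNS`, `COEFFICIENT`, and `total_motif_counts` are illustrative, and the input counts in any usage are placeholders rather than results.

```python
# Number of labelled patterns per motif, as quoted in footnote 4.
PATTERNS = {"M1": 3, "M4": 3, "M8": 3,
            "M2": 6, "M3": 6, "M5": 6, "M7": 6, "M10": 6, "M12": 6,
            "M6": 3, "M11": 3, "M9": 2, "M13": 1}

# Multiplicative coefficients listed in the text.
COEFFICIENT = {"M2": 3.0, "M3": 3.0, "M5": 3.0, "M7": 3.0, "M10": 3.0, "M12": 3.0,
               "M1": 1.5, "M4": 1.5, "M6": 1.5, "M8": 1.5, "M11": 1.5,
               "M9": 1.0, "M13": 0.5}

def total_motif_counts(representative_counts):
    """Scale counts of representative patterns (dict keyed by motif name)
    to full motif counts using the coefficients above."""
    return {m: COEFFICIENT[m] * n for m, n in representative_counts.items()}

# Every coefficient equals half of the corresponding number of patterns.
half_rule_holds = all(COEFFICIENT[m] == PATTERNS[m] / 2 for m in PATTERNS)
```

The factor of one half is consistent with the remark in the text that integrating over all positions of B2 and B3 counts every pattern twice.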

#### 2.3.8. The Expected Number of Motifs M1, M3, M5, M6, M8, M10, M12, and M13

The expressions for the expected number of 3-node motifs are obtained by combining Equations (14) with the procedure for computing the expected number of 2-node motifs. Equations (14) give probabilities for different types of connections from N2 to N1 and N3, and also from N3 to N1 and N2. The probability for each connectivity pattern from **Figure 3** is obtained by multiplying the probability of the appropriate connection from N2 to N1 and N3 with the probability of the connection from N3 to N1 and N2. These probabilities are defined for any pair of coordinates of B2 and B3. In order to form any of the listed motifs, B2 and B3 have to be inside the connectivity area of A1, which defines the range of their coordinates: in the coordinate system fixed to A1, the angular coordinates α2 and α3 take all the possible values and the radial coordinates r2 and r3 have to be smaller than rmax. As in the case of 2-node motifs, we should integrate the expressions for the probabilities of connectivity patterns over all the possible coordinates for both B2 and B3, i.e., over two pairs of coordinates. This results in a quadruple integral, and the coefficient in front of the integral is the square of the coefficient obtained for the 2-node motifs.

The following expression (Equation 16) gives the expected number of the representative connectivity patterns for the motifs from this group. The total motif counts are obtained by multiplying them with the coefficients given in the previous section.

$$\mathcal{N}\_{\rm Mi} = \frac{\Delta\_{\rm ad}^4}{l^4} \int\_{\alpha\_2 = -\pi}^{\pi} \int\_{r\_2 = 0}^{r\_{\rm max}} \int\_{\alpha\_3 = -\pi}^{\pi} \int\_{r\_3 = 0}^{r\_{\rm max}} n\_i(\alpha\_2, r\_2, \alpha\_3, r\_3) \, r\_2 \, r\_3 \, dr\_2 \, dr\_3 \, d\alpha\_2 \, d\alpha\_3 \tag{16}$$

The expression NMi corresponds to the motif Mi, and depends on the function ni(α2, r2, α3, r3):

$$\begin{aligned}
n\_1(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \left(2\pi - \kappa\_1(\alpha\_2, r\_2) - \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) + \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3)\right) \\
&\quad \cdot \left(2\pi - \kappa\_1(\alpha\_3, r\_3) - \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) + \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_3(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \left(\kappa\_1(\alpha\_2, r\_2) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3)\right) \\
&\quad \cdot \left(2\pi - \kappa\_1(\alpha\_3, r\_3) - \kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) + \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_5(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \left(\kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3)\right) \\
&\quad \cdot \left(2\pi - \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) - \kappa\_1(\alpha\_3, r\_3) + \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_6(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \, \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \\
&\quad \cdot \left(2\pi - \kappa\_1(\alpha\_3, r\_3) - \kappa(\alpha\_3, r\_3, \alpha\_2, r\_2) + \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_8(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \left(\kappa\_1(\alpha\_2, r\_2) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3)\right) \cdot \left(\kappa\_1(\alpha\_3, r\_3) - \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_{10}(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \left(\kappa\_1(\alpha\_2, r\_2) - \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3)\right) \cdot \left(\kappa(\alpha\_2, r\_2, \alpha\_3, r\_3) - \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_{12}(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \, \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \cdot \left(\kappa\_1(\alpha\_2, r\_2) - \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2)\right), \\
n\_{13}(\alpha\_2, r\_2, \alpha\_3, r\_3) &= \frac{1}{4\pi^2} \, \lambda(\alpha\_2, r\_2, \alpha\_3, r\_3) \cdot \lambda(\alpha\_3, r\_3, \alpha\_2, r\_2).
\end{aligned}$$

From the definition of κ, κ1, and λ, all the functions ni have discontinuities and therefore cannot be integrated straightforwardly. The problem was solved by dividing the entire domain of integration into sub-domains where the functions are continuous. Then, the integration was performed for each sub-domain and the total motif count is obtained by summing up all of the obtained values. The details are presented in Supplementary Material 2.

#### 2.3.9. The Expected Number of Motifs M4, M7, M11

From the previous discussion, these values are equal to the expected number of motifs M1, M3, and M6, respectively.

#### 2.3.10. The Expected Number of Motifs M2 and M9 (Figure 2E)

The computations for motifs M2 and M9 require a four-step procedure illustrated in **Figure 2E**. First, the axon center A3, given by coordinates (α<sub>3</sub><sup>a</sup>, r<sub>3</sub><sup>a</sup>), is chosen inside the connectivity area of dendrite B1. Next, the dendrite center B2 with coordinates (α2, r2) is chosen inside the connectivity area of A1, but outside of the connectivity area of A3 (the dark green area in **Figure 2E**). This results in the connectivity pattern N3 → N1 → N2, a necessary condition for both motifs M2 and M9. In the third step, the dendrite center B3(α3, r3) is chosen on the circle C1(A3), but outside the connectivity area of A1, i.e., in the domain D(B3) = C1(A3) \ Brmax(A1). This way, the bidirectional connection between N1 and N3 is avoided. If C1(A3) entirely belongs to the connectivity area of A1, motifs M2 and M9 are impossible. Therefore, an additional condition for the coordinates of A3 is: r<sub>3</sub><sup>a</sup> > rmax − 1. In the final step, axon A2 is chosen on the circle C1(B2). Same as before, the intersection between this circle and the connectivity areas of B1 and B3 defines the probabilities to form motifs M2 and M9. These probabilities are expressed using functions κ1, κ, and λ. Motif M2 emerges if A2 falls outside of both connectivity areas, while M9 emerges if A2 falls inside the connectivity area of B3, but outside the one of B1.

The expected numbers of motifs M2 and M9 are computed similarly as before. The probabilities of the representative connectivity patterns are integrated over all possible positions of A3 and B2. In addition, we have to take into account all the positions of B3, which adds a fifth integral to the equations. The easiest way to evaluate this innermost integral is by translating the coordinate system from A1 to A3, which simplifies the expressions for the coordinates of B3 in D(B3) = C1(A3) \ Brmax(A1). The outer quadruple integral is evaluated in the coordinate system of A1. The obtained expressions for the expected motif counts are:

$$\begin{split} \mathcal{N}\_{\mathrm{M2}/\mathrm{M9}} &= \frac{\Delta\_{\mathrm{ad}}^{4}}{l^{4}} \int \int\_{\left(\alpha\_{3}^{a},r\_{3}^{a}\right)} \int \int\_{\left(\alpha\_{2},r\_{2}\right)} n\_{2/9}(\alpha\_{2},r\_{2},\alpha\_{3}^{a},r\_{3}^{a}) \, r\_{2} \, r\_{3}^{a} \, dr\_{2} \, dr\_{3}^{a} \, d\alpha\_{2} \, d\alpha\_{3}^{a}, \\ n\_{2}(\alpha\_{2},r\_{2},\alpha\_{3}^{a},r\_{3}^{a}) &= \frac{1}{4\pi^{2}} \int\_{\mathcal{D}(\mathrm{B3})} \left(2\pi-\kappa\_{1}(\alpha\_{2},r\_{2})-\kappa(\alpha\_{3},r\_{3}(\alpha\_{3}),\alpha\_{2},r\_{2})+\lambda(\alpha\_{2},r\_{2},\alpha\_{3},r\_{3}(\alpha\_{3}))\right) \, d\alpha\_{3}, \\ n\_{9}(\alpha\_{2},r\_{2},\alpha\_{3}^{a},r\_{3}^{a}) &= \frac{1}{4\pi^{2}} \int\_{\mathcal{D}(\mathrm{B3})} \left(\kappa(\alpha\_{3},r\_{3}(\alpha\_{3}),\alpha\_{2},r\_{2})-\lambda(\alpha\_{2},r\_{2},\alpha\_{3},r\_{3}(\alpha\_{3}))\right) \, d\alpha\_{3}. \end{split}$$

#### 2.3.11. Clustering Coefficient (CC)

Clustering coefficient quantifies the density of connections in the local neighborhood of each network node. The percent of connected neighbors is estimated for each network node, and the average over all nodes represents the clustering coefficient (Watts and Strogatz, 1998; Boccaletti et al., 2006). A global measure related to the clustering coefficient is transitivity (Watts and Strogatz, 1998; Boccaletti et al., 2006) which estimates the number of triangles among all the connected triplets in a network. Here, we consider a simple case of identical neurons (network nodes) uniformly distributed in a planar space without boundaries. The clustering coefficient of the resulting network is identical to the local clustering coefficient of each node. Similarly, the global transitivity measure reduces to the measure evaluated for a single node. We employ one possible extension of the original clustering coefficient (for undirected networks) to the case of directed networks (Boccaletti et al., 2006; Sporns, 2011; Telesford et al., 2011; Mäki-Marttunen et al., 2013):

$$\begin{split} \text{CC}\_{N\_1} &= \frac{1}{4 \cdot n\_{\text{neighbors}} (n\_{\text{neighbors}} - 1)} \\ &\times \sum\_{i=2}^{\mathcal{N}} \sum\_{j=2}^{i-1} (M\_{1i} + M\_{i1}) \cdot (M\_{1j} + M\_{j1}) \cdot (M\_{ij} + M\_{ji}). \end{split} \tag{18}$$

The expression holds for a network of N nodes where each node has nneighbors neighbors. The values Mij describe the presence or absence of a connection between nodes i and j: Mij = 1 if a connection from i to j exists and Mij = 0 otherwise. This equation can be re-written as a linear combination of motif counts. We can group all pairs of neighbors of node N1 according to the motif they form. The number of pairs in each group is equal to the corresponding motif count. Each motif count should be multiplied with the coefficient determined by the product from the summation above. Clearly, if a motif has two unconnected nodes (like M1 or M2) the coefficient is zero. For M5 and M9 the coefficient is 1, for M6, M10, M11 it is 2, for M12 it is 4, and for M13 it is 8. From the previous derivations, the expected motif counts are given by the values 3NM5 for M5, 1.5NM6 for M6, NM9 for M9, 3NM10 for M10, 1.5NM11 for M11, 3NM12 for M12, 0.5NM13 for M13. The number of neighbors can be expressed using the expected 2-node motif counts, nneighbors = NM1−2 + NM2−2, as the sum of unidirectional and bidirectional connections that start or end in N1. The equation for the expected clustering coefficient becomes

$$\begin{aligned} CC\_n &= 3\mathcal{N}\_{M5} + 3\mathcal{N}\_{M6} + \mathcal{N}\_{M9} + 6\mathcal{N}\_{M10} + 3\mathcal{N}\_{M11} + 12\mathcal{N}\_{M12} + 4\mathcal{N}\_{M13}, \\ CC\_d &= 4(\mathcal{N}\_{M1-2} + \mathcal{N}\_{M2-2})(\mathcal{N}\_{M1-2} + \mathcal{N}\_{M2-2} - 1), \\ CC &= CC\_n / CC\_d. \end{aligned} \tag{19}$$
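The numerator coefficients of Equation (19) follow from multiplying the per-motif weights of Equation (18) by the factors that convert representative-pattern counts into motif counts. The short sketch below reproduces this arithmetic; the function and dictionary names are illustrative, and the motif counts passed to it are assumed to be the representative counts NMi obtained elsewhere.

```python
# Weight of each motif in the sum of Equation (18), as listed in the text.
CC_WEIGHT = {"M5": 1, "M9": 1, "M6": 2, "M10": 2, "M11": 2, "M12": 4, "M13": 8}

# Factors converting representative-pattern counts into full motif counts.
MULTIPLIER = {"M5": 3.0, "M6": 1.5, "M9": 1.0, "M10": 3.0,
              "M11": 1.5, "M12": 3.0, "M13": 0.5}

# Resulting coefficients in the numerator CC_n of Equation (19).
CC_COEFFICIENT = {m: CC_WEIGHT[m] * MULTIPLIER[m] for m in CC_WEIGHT}

def clustering_coefficient(n_motif, n_uni, n_bi):
    """Expected clustering coefficient from representative 3-node motif counts
    (dict keyed "M5", "M6", ...) and the expected 2-node motif counts."""
    numerator = sum(CC_COEFFICIENT[m] * n_motif[m] for m in CC_COEFFICIENT)
    neighbors = n_uni + n_bi
    return numerator / (4.0 * neighbors * (neighbors - 1.0))
```

With all seven representative counts set to one and five neighbors in total, the numerator is 32 and the denominator is 80, so the sketch returns 0.4.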

#### 2.3.12. Path Length

The path length PLi,j from neuron Ni to neuron Nj is equal to the minimal number of edges on a **traversable** path between them. If the neurons are unconnected, then PLi,j = ∞. If PLi,j = k > 1, no direct connection between the two neurons exists. Instead, the path from one of them to the other goes through k − 1 other neurons. We compute the harmonic path length (Watts and Strogatz, 1998; Boccaletti et al., 2006; Mäki-Marttunen et al., 2011), the harmonic mean over the shortest path lengths for all the pairs of neurons in the network. In the population of identical, randomly oriented and uniformly distributed neurons, this coefficient becomes equal to the harmonic path length computed for one fixed neuron, for example neuron N1, as follows

$$PL^{-1} = \frac{1}{\mathcal{N} - 1} \sum\_{i=2}^{\mathcal{N}} \frac{1}{PL\_{1,i}}.$$

Instead of computing the harmonic mean we use an equivalent expression for the expected harmonic path length

$$PL^{-1} = \sum\_{k=1}^{+\infty} \frac{1}{k} \cdot P(PL = k). \tag{20}$$

Here, P(PL = k) is the probability that the shortest path from N1 to some other node goes through k direct edges, i.e., through k − 1 other nodes. For sufficiently large networks, the mean converges toward the expected value, which should hold for the considered model. In the derivations that follow, all the coordinates are expressed in the coordinate system fixed to neuron N1, as described before. In this coordinate system, the path length from N1 to a specific neuron NX depends only on the radial but not on the angular coordinate of NX, so we can fix the angular coordinate to αX = 0 and consider only the neurons along the coordinate axis.
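The equivalence between the harmonic mean over individual path lengths and the sum over the path length distribution in Equation (20) can be illustrated on a small directed graph. The sketch below is a generic example with hypothetical function names; it uses breadth-first search for the shortest paths and empirical frequencies in place of the model probabilities P(PL = k).

```python
import math
from collections import Counter, deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search on a directed graph given as {node: [successors]};
    nodes that cannot be reached from `source` keep an infinite path length."""
    dist = {node: math.inf for node in adjacency}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def inverse_harmonic_pl(lengths):
    """Mean of 1/PL over the given path lengths (1/inf contributes zero)."""
    return sum(1.0 / pl for pl in lengths) / len(lengths)

def inverse_harmonic_pl_from_distribution(lengths):
    """The same quantity written as sum over k of P(PL = k) / k,
    with P estimated by the empirical frequencies of the finite lengths."""
    counts = Counter(pl for pl in lengths if pl != math.inf)
    return sum(count / len(lengths) / k for k, count in counts.items())
```

Unreachable nodes are kept in the normalization but contribute nothing to either sum, which matches the convention PLi,j = ∞ for unconnected neurons.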

The probability of the shortest path length P(PL = k) is computed using the following expression

$$P\left(PL = k\right) = \frac{2\pi\Delta\_{ad}^2}{l^2\mathcal{N}} \int\_{r\_X} \left(P(PL \le k \mid r\_X) - P(PL \le k - 1 \mid r\_X)\right) r\_X \, dr\_X. \tag{21}$$

where the integration is done over the radial coordinate rX. The integrated function is the joint distribution of path length and radial coordinate. The joint distribution is expressed as the product of the shortest path length distribution conditioned on the radial coordinate and the probability that a dendrite center has such a radial coordinate. The probability of having a radial coordinate rX is simply expressed as the number of dendrite centers within a ring of radius rX divided by the total number of neurons N. The path length distribution conditioned on the radial coordinate is expressed using another conditional probability, P(PL ≤ k | rX). For the fixed radial coordinate, this probability shows how likely it is that the shortest path length of the considered neuron does not exceed k.

The last conditional probability is obtained from the following analysis. Consider a neuron NX and fix the radial coordinate of its dendrite center to rX. If it has the shortest path length at most k, then there must be at least one other neuron that connects to NX, i.e., that has its axon center within the connectivity area of the dendrite BX, and that has the shortest path length no bigger than k − 1. Clearly, this is the complement of the statement that every neuron either does not connect to NX or has the shortest path length bigger than k − 1. If we express this formally, as probabilities of the described events, and consider all neurons independent of each other, we can write the conditional probability as

$$P\left(PL \le k \mid r\_X\right) = 1 - \left(1 - \nu(k - 1 \mid r\_X)\right)^{\mathcal{N}-2}.\tag{22}$$

The last equation depends on the auxiliary function ν(k − 1 | rX). If we consider one particular neuron with fixed coordinates, the probability that it connects to NX and has the path length at most equal to k − 1 is described by ν(k − 1 | rX). Finally, this function can be expressed as a function of the conditional probability

$$\nu(k-1\mid r\_X) = \frac{\Delta\_{ad}^2}{2\pi l^2 \mathcal{N}} \int\_{r} \int\_{\alpha} P(PL \le k-1\mid r) \cdot \kappa(\alpha, r, 0, r\_X) \cdot r \, dr \, d\alpha. \tag{23}$$

The expressions for the conditional probability and the ν-function form a pair of iterative equations that should be computed for all feasible values of k. The definition of connectivity area gives the initial condition for these equations

$$P\left(PL \le 1 \mid r\_X\right) = \begin{cases} 1 & r\_X \le r\_{\max} \\ 0 & r\_X > r\_{\max} \end{cases}$$

The obtained expressions are different from the methodology used for motif counts or clustering coefficient. The harmonic path length represents a global measure of network structure and consequently depends on the total number of neurons in the population. The equations derived here are carefully analyzed in Supplementary Material 3. Every step in the presented procedure is illustrated. An equivalent model is simulated and the results from the theoretical model (from this section) and the simulated model are shown alongside.

#### 2.3.13. The Small-World Coefficient

The clustering coefficient and the shortest path length are sufficient for the computation of the small-world coefficient. We consider two different definitions. The classical definition of the small-world coefficient (Watts and Strogatz, 1998) is the following:

$$SW\_{\text{ws}} = \frac{CC/CC\_{random}}{PL/PL\_{random}}.$$

Here, the clustering coefficient CC and the shortest path length PL of the considered network are compared to those of a uniform random network. In a small-world network, the clustering coefficient should be relatively high, similarly to the situation in lattice networks, while the path length should be short, similarly to the case of uniform random networks. Therefore, the SW coefficient should be close to one for the uniform random networks and much bigger than one for the small-world networks.

Additionally, we consider another definition from the literature, introduced in Telesford et al. (2011), that compares a network with both uniform random and locally coupled networks:

$$SW\_q = \frac{PL\_{random}}{PL} - \frac{CC}{CC\_{local}}.\tag{24}$$

For a network similar to the uniform random one, the first term PLrandom/PL should be close to one while the second term CC/CClocal becomes very small, as uniform random networks have a much smaller clustering coefficient than locally coupled networks. Therefore, SWq is positive and close to one. For a network similar to a locally coupled network, the first term is small, as the PL of such networks is much larger than in random networks, while the second term is close to one. The coefficient SWq is then negative and close to minus one. In the case of small-world networks, both the first and the second term are close to one and SWq is close to zero.
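The two small-world coefficients and their limiting behavior can be written down directly. The sketch below uses hypothetical function names and made-up input values chosen only to illustrate the three regimes discussed above; it is not tied to any results of the model.

```python
def sw_watts_strogatz(cc, pl, cc_random, pl_random):
    """Classical small-world coefficient: clustering and path length,
    each taken relative to a uniform random network."""
    return (cc / cc_random) / (pl / pl_random)

def sw_telesford(cc, pl, cc_local, pl_random):
    """Equation (24): path length compared with a uniform random network,
    clustering compared with a locally coupled network."""
    return pl_random / pl - cc / cc_local

# Illustrative (made-up) values for the three regimes.
random_like = sw_telesford(cc=0.01, pl=3.0, cc_local=0.5, pl_random=3.0)   # close to +1
local_like = sw_telesford(cc=0.5, pl=30.0, cc_local=0.5, pl_random=3.0)    # close to -1
small_world = sw_telesford(cc=0.5, pl=3.0, cc_local=0.5, pl_random=3.0)    # zero
```

Note that the second definition is a difference of two ratios, so it is bounded in practice between roughly minus one and one, whereas the classical coefficient is an unbounded ratio.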

#### 2.3.14. Locally Coupled Networks

The locally coupled networks are generated to correspond to the extreme situation in our model, the overlapping axon and dendrite centers (Δad = 0). The N nodes are uniformly distributed in the two-dimensional space (of size L × L) with the same density as before, i.e., 1/l². The two-dimensional space is projected on a torus to avoid boundary effects. The number of nodes is sufficiently larger than the maximal considered node degree. A node is connected to every other node inside its connectivity area, which gives the node degree according to Equation (6). A network generated this way has only bidirectional connections and can express only motifs M2-2, M8, and M13; we call it a "strictly locally coupled network." To increase variability in motif counts and still maintain the properties of a locally coupled network, we removed 10% of the connections, selected at random, and re-established them with the nearest neurons outside the connectivity area; we refer to the result as a "locally coupled network with 10% of non-local connections."
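A strictly locally coupled network of this kind can be generated along the following lines. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the connectivity radius is passed directly as a parameter instead of being derived from Equation (6), and the 10% rewiring step is not included.

```python
import math
import random

def torus_distance(p, q, size):
    """Euclidean distance on a square of side `size` with opposite edges identified."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, size - dx)
    dy = min(dy, size - dy)
    return math.hypot(dx, dy)

def strictly_local_network(n_nodes, size, radius, seed=0):
    """Place nodes uniformly at random on the torus and connect every pair
    closer than `radius` in both directions."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0.0, size), rng.uniform(0.0, size)) for _ in range(n_nodes)]
    adj = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if torus_distance(pos[i], pos[j], size) <= radius:
                adj[i].add(j)  # every connection is reciprocated,
                adj[j].add(i)  # so only bidirectional links occur
    return pos, adj
```

With unit node density, the mean degree of such a network is close to the area of the connectivity disc, π·radius², provided the radius is small compared with the size of the square.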


#### 2.3.15. Uniform Random Networks

These networks are generated in a standard way. Each connection is set with the probability p = ndegree/N, independently of other connections. Clearly, the finite size of these networks raises some issues. In the analyzed model networks, the total number of nodes is explicitly considered only when computing the path length through the network. Otherwise, the network is considered virtually infinite: there are no boundary conditions and each node has an equal number of available neighbors. In the locally coupled networks, as described above, a comparable model is provided by choosing a large enough network and projecting it on a torus. In uniform random networks, the problem is somewhat more difficult because the network size determines the probability of connection, the parameter that affects all considered network measures. In the results presented here, we fix the network size, so that the probability of connection varies solely with the node degree.
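The generation of a uniform random network with connection probability p = ndegree/N can be sketched as follows. The function name is hypothetical, and the exclusion of self-connections is an assumption that the text does not state explicitly.

```python
import random

def uniform_random_network(n_nodes, n_degree, seed=0):
    """Directed network in which every ordered pair (i, j) with i != j is
    connected independently with probability p = n_degree / n_nodes."""
    rng = random.Random(seed)
    p = n_degree / n_nodes
    return {i: {j for j in range(n_nodes) if j != i and rng.random() < p}
            for i in range(n_nodes)}
```

For a fixed network size, changing `n_degree` is the only way to change the connection probability, which mirrors the choice made for the results presented here.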

### 3. Results

The results of the model analysis are divided into two parts, similarly to the model description. In the first part, the properties of neurite morphology are related to the connectivity between pairs of neurons. Quantitative measures such as the expected number of synapses, the effective radius of the connectivity area, and the node degree are derived as functions of the neurite model parameters. In the second part, the concept derived in the first step, the effective radius, is related to the typical measures used to quantify connectivity in networks, motif counts, clustering coefficient, path length, and small-world coefficient. This way we divided the initial question, how the properties of neurite morphology affect connectivity in large networks, into two easier goals that better explain the role of different aspects of the model.

#### 3.1. The Expected Number of Synapses

In this section, we show how the expected number of synapses S̄ depends on the neuron model parameters. We give the general expression for this dependency in the Methods Section by Equation (4). The derivation of S̄ is given in Supplementary Material 1. We consider neurites with circular support, i.e., with neurite segments distributed inside a circle of radius R<sub>a</sub> for axons and R<sub>d</sub> for dendrites, and with one of two forms of distribution, uniform or truncated Gaussian. The truncated Gaussians have equal variances along the two dimensions and zero cross-covariance, which simplifies computations. Neurite distributions are described by the parameter set M, which is an empty set for the uniform distribution and contains the normalized parameters M = {σ, k<sub>σ</sub>} for the truncated Gaussians. The presented methodology can be applied in more general situations, for neurites with elliptic support and a general form of truncated Gaussian distribution.

According to Equation (4), S̄ depends linearly on the number of axon and dendrite segments, N<sub>a</sub> and N<sub>d</sub>, and also on the square of the unit length D. It has a non-trivial dependency on the axon-dendrite distance Δ, on the average neurite size R, and on their ratio, the normalized axon-dendrite distance ρ. In addition, it depends on the asymmetry index η, the parameter that quantifies asymmetry between the size of axons and dendrites. This parameter takes values from the interval η ∈ [0, 1]; for η = 0 the dendrite and axon radii are the same (R<sub>a</sub> = R<sub>d</sub>), and for η → 1 the axons are much bigger than the dendrites (R<sub>a</sub> ≫ R<sub>d</sub>). In the considered model, the axons are always bigger than the dendrites. Finally, S̄ depends on the neurite density distributions and the set of normalized parameters M.

#### 3.1.1. The Expected Number of Synapses as a Function of Axon-Dendrite Distance (Figures 4A–D)

We first show how S̄ depends on the axon-dendrite distance and on the normalized axon-dendrite distance by fixing all the other parameters. This way the expected number of synapses becomes proportional to the function φ(ρ, η, M), consequently called the distance-dependent expected number of synapses. This is illustrated in the left column in **Figures 4A–D**. Different panels correspond to different distributions of neurite segments, which are indicated on each panel along with the distribution parameters. The x-axis in **Figures 4A–D** shows the axon-dendrite distance (Δ ∈ [0, R<sub>a</sub> + R<sub>d</sub>]) and the normalized axon-dendrite distance (ρ ∈ [0, 1]). Four different cases in each panel correspond to different values of the asymmetry index (values for the asymmetry index and the color code are indicated in **Figure 4**).

**Figure 4A** illustrates the expected number of synapses obtained when both the axon and dendrite have a uniform distribution of neurite segments. In this case, the function is determined solely by the overlap between neurites, i.e., by the parameters that determine the overlap: the (normalized) axon-dendrite distance and the average neurite size. For ρ ≤ η, i.e., Δ ≤ R<sub>a</sub> − R<sub>d</sub>, the dendrite is entirely inside the axon and the expected number of synapses is maximal. As the axon-dendrite distance increases further, the overlap between the two neurites decreases until it vanishes for ρ > 1, i.e., for Δ > R<sub>a</sub> + R<sub>d</sub>.
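The three regimes described for uniform neurites (full containment, partial overlap, no overlap) can be illustrated with the standard formula for the intersection area of two circles. The sketch below is only an illustration: it assumes the normalization R<sub>a</sub> = 1 + η, R<sub>d</sub> = 1 − η, and Δ = 2ρ (i.e., an average radius of one), and it returns the overlap as a fraction of the dendrite area rather than the function φ itself, whose scaling is defined in the Methods Section.

```python
import math


def _clamp(x, lo=-1.0, hi=1.0):
    """Guard acos against arguments slightly outside [-1, 1] due to rounding."""
    return max(lo, min(hi, x))


def circle_overlap_area(r1, r2, d):
    """Intersection area of two circles with radii r1, r2 and center distance d."""
    if d >= r1 + r2:
        return 0.0                           # circles do not overlap
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2    # smaller circle entirely inside
    a1 = r1 ** 2 * math.acos(_clamp((d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1)))
    a2 = r2 ** 2 * math.acos(_clamp((d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2)))
    prod = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(prod, 0.0))


def overlap_fraction(rho, eta):
    """Overlap of axon and dendrite discs as a fraction of the dendrite area.
    Assumed normalization: R_a = 1 + eta, R_d = 1 - eta, distance = 2 * rho."""
    r_a, r_d = 1.0 + eta, 1.0 - eta
    return circle_overlap_area(r_a, r_d, 2.0 * rho) / (math.pi * r_d ** 2)


# Hypothetical example values for eta = 0.3
values = [overlap_fraction(rho, 0.3) for rho in (0.1, 0.5, 0.8, 1.2)]
```

In this sketch the fraction stays at one while ρ ≤ η, decreases monotonically for η < ρ < 1, and is zero once ρ exceeds one, which is the qualitative shape described for **Figure 4A**.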

**Figures 4B–D** show three typical results obtained for axons and dendrites modeled as truncated Gaussians. When neurite segments are evenly distributed across the neurite support, i.e., when the distribution variances are similar to or larger than the neurite radii, the size of the axon-dendrite overlap dominantly determines the shape of the distance-dependent expected number of synapses. The resulting function, shown in **Figure 4B**, is somewhat similar to the case obtained for the uniform density distributions from **Figure 4A**. For ρ ≤ η the function slowly decreases (unlike the case in **Figure 4A** where it is constant), while for ρ > η it decreases faster until it becomes zero. If one of the variances is similar to the average neurite size and the other is much smaller, the expected number of synapses behaves like the example in **Figure 4C**. The decrease from the maximal value to zero is much faster than in the case of **Figure 4B**. The presented example resembles a bell-shaped curve, but for some other model parameters the decrease can be even faster and result in a step function. The reason for this behavior is the following: one of the neurites has a small distribution variance, which means that the majority of its segments are concentrated around the center of the neurite field. In this case, an increase in the distance between neurite centers decreases the distance-dependent expected number of synapses much faster than in the example in **Figure 4B**. When the neurite centers are close, the majority of neurite segments can form synapses, which gives maximal connectivity. For all axon-dendrite distances at which the neurite with small variance stays inside the area of the other neurite, the number of synapses is high. However, when it approaches the edge of the other neurite, the majority of its segments become unavailable for creating synapses, so the expected number of synapses quickly decreases. If both the axon and dendrite have small variances, the expected number of synapses is a very narrow bell-shaped curve, as shown in **Figure 4D**. Both neurites have a majority of segments located around the neurite centers. As soon as those centers move apart, the probability of connection drops to almost zero. In this case, the neurite asymmetry index does not affect the expected number of synapses as much as in the other cases because narrow distributions effectively decrease neurite radii.

#### 3.1.2. The Expected Number of Synapses as a Function of the Average Neurite Size (Figures 4E–H)

The relation between S̄ and the average neurite size is examined by fixing all the parameters except R. The dependency is described by the function ρ<sup>2</sup>φ(ρ, η, M), named the size-dependent expected number of synapses, and illustrated in the right column in **Figures 4E–H**. The x-axis shows the inverse of the normalized axon-dendrite distance on the interval 1/ρ ∈ [1, +∞) and the average neurite radius on the interval R ∈ [Δ/2, +∞). The same neurite distributions and the same values of the asymmetry index are considered as in **Figures 4A–D**.

The size-dependent expected number of synapses is determined by two opposing mechanisms. An increase in the average neurite size leads to an increasing overlap between the two neurites, from zero for R = Δ/2 to the maximal overlap containing the entire dendrite field for R = Δ/(2η). The increasing overlap leads to an increasing expected number of synapses. At the same time, the increase in the average neurite radius leads to a decrease in the normalized axon-dendrite distance, the variable that reflects the distribution of neurite segments. As the neurite size increases, the fixed number of segments gets distributed over a larger area, so that the probability of finding a neurite segment per unit area decreases. Eventually, this probability approaches zero as the average neurite size becomes very big. Clearly, the smaller probability of finding two neurite segments within the same unit area decreases the expected number of potential synapses.

For small values of the average neurite size, the first effect is dominant and the expected number of synapses increases with R. For larger neurites the second effect dominates and the expected number of synapses decreases with increasing R. The same arguments hold for all the neurite distributions that we examined, as illustrated in **Figures 4E–H**.

#### 3.1.3. Properties of the Distance-Dependent Expected Number of Synapses (Figure 5)

Two additional aspects of the distance-dependent expected number of synapses should be analyzed for the truncated Gaussian neurites: its maximal value, obtained when the axon and dendrite centers overlap (ρ = 0), and the value obtained when the axon and dendrite edges touch from the inside (ρ = η). When the axon-dendrite overlap is maximal (ρ ≤ η), the expected number of synapses slowly decreases as the distance between the neurite centers increases, but when the overlap is smaller than the maximum (ρ > η) the decrease becomes faster. The point of change is marked with a star in **Figures 5A,B**, which repeat the examples from **Figures 4B,C**. The neurites in **Figure 5A** have more evenly distributed neurite segments, so the size of the axon-dendrite overlap has a bigger effect on the expected number of synapses and on the shape of the function φ(ρ, η, M). For the truncated Gaussian neurites, the function φ(ρ, η, M) is always invertible and the effective radius can be computed (see Equation 5). The situation is different for neurites with the uniform distribution of segments, where the point ρ = η marks the transition from the constant to the monotonically decreasing part of the function. The constant segment is not invertible, therefore we consider only the monotonically decreasing part, i.e., the function obtained for ρ > η.

**Figures 5C,D** illustrate the range of maximal values for the distance-dependent expected number of synapses, obtained when the two neurites overlap maximally. For the truncated Gaussian neurites the maximal value is given by the following equation obtained for ρ = 0 (see Supplementary Material 1):

$$\begin{split} \phi\_{\max}(\eta, M) = \phi(0, \eta, M) &= \frac{k\_{\sigma}^{2}}{1 + k\_{\sigma}^{2}} \cdot \frac{\pi}{8\sigma^{2}} \\ &\cdot \frac{1 - \exp\left(-\frac{(1 - \eta)^{2} \cdot (k\_{\sigma}^{2} + 1)}{8\sigma^{2}}\right)}{\left(1 - \exp\left(-\frac{(1 + \eta)^{2} k\_{\sigma}^{2}}{2\sigma^{2}}\right)\right) \cdot \left(1 - \exp\left(-\frac{(1 - \eta)^{2}}{2\sigma^{2}}\right)\right)} . \end{split} \tag{25}$$
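For readers who want to explore the parameter dependence numerically, Equation (25) can be transcribed term by term as in the sketch below. The function name is hypothetical, and the use of `math.expm1` to evaluate the 1 − exp(−x) factors accurately for small x is our choice, not something stated in the paper.

```python
import math


def phi_max(eta, sigma, k_sigma):
    """Equation (25) as printed above, transcribed term by term.
    eta: asymmetry index; sigma, k_sigma: normalized Gaussian parameters."""
    s2 = sigma ** 2
    k2 = k_sigma ** 2
    prefactor = k2 / (1.0 + k2) * math.pi / (8.0 * s2)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    numerator = -math.expm1(-(1.0 - eta) ** 2 * (k2 + 1.0) / (8.0 * s2))
    denom_axon = -math.expm1(-(1.0 + eta) ** 2 * k2 / (2.0 * s2))
    denom_dendrite = -math.expm1(-(1.0 - eta) ** 2 / (2.0 * s2))
    return prefactor * numerator / (denom_axon * denom_dendrite)


# Logarithm of the maximum for the three parameter pairs used in the figures
log_values = {
    (sigma, k): math.log(phi_max(0.1, sigma, k))
    for sigma, k in [(0.5, 1.0), (10.0, 50.0), (0.5, 5.0)]
}
```

Taking the logarithm, as in the last lines, compresses the large range of values that the function covers across the parameter space.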

**Figure 5C** shows log(φ<sub>max</sub>) for the asymmetry index η = 0.1 and a wide range of values for σ and k<sub>σ</sub>; the logarithm is used because the function varies strongly over the given range of parameters. **Figure 5D** illustrates the range of values for log(φ<sub>max</sub>) obtained for different asymmetry indices, i.e., it shows the difference between log(φ<sub>max</sub>) obtained for η = 0.8 and for η = 0.1. Blue areas in **Figure 5D** correspond to the cases in which the distance-dependent expected number of synapses decreases with the increase of the asymmetry index. The white lines that parcel the parameter space (k<sub>σ</sub>, σ) mark the regions that give different types of functions. The upper left region corresponds to narrow dendrites, wider axons, and a distance-dependent expected number of synapses whose form ranges from a step function to a bell-shaped function. The upper right region with high amplitudes corresponds to narrow axons and dendrites that give the narrow bell-shaped expected number of synapses shown in **Figure 4D**. The lower left region marks the parameter space that gives functions similar to those obtained for the uniformly distributed neurites; examples are shown in **Figure 4B** and in **Figure 5A**. The lower right triangular region corresponds to narrow axons and wider dendrites and an expected number of synapses whose form ranges from step-like to bell-shaped functions. In this case, the function maximum depends strongly on the asymmetry index, as indicated by the large values in **Figure 5D**. As the asymmetry index increases, the size of the dendrite decreases compared to the axon size and the dendrite segments become more concentrated in a small area around the center, which increases the probability of forming a synapse. The example in **Figure 4C** and in **Figure 5B** is picked near the border between the two regions, close to the dashed white line. The dashed line indicates a gradual transition between the regions<sup>5</sup>.

#### 3.2. Effective Radius and Node Degree (Figure 6)

The effective radius Δ<sub>max</sub> is the maximal distance between an axon-dendrite pair expected to connect with at least one synapse and is given by Equation (5). In this section we analyze the properties of the inverse distance-dependent expected number of synapses φ<sup>−1</sup>, which maps the non-linear dependency between the effective radius and the model parameters. **Figure 6** shows how this function depends on the variable z, which is related to several model parameters as z = πR<sup>2</sup>/(2N<sub>a</sub>N<sub>d</sub>D<sup>2</sup>). The range of values for z is determined from Equation (5) and is equal to

$$\begin{aligned} 1 \le \frac{1}{z} \phi(\rho, \eta, M) \le \frac{1}{z} \phi\_{\text{max}}(\eta, M) \\ \implies z \le \phi\_{\text{max}}(\eta, M) = \phi(0, \eta, M). \end{aligned}$$

<sup>5</sup>Function φ(ρ, η, M) varies strongly with the model parameters. In order to ensure an accurate numerical evaluation of the function, it has to be scaled down with a fixed coefficient before integration, then multiplied by the same coefficient after integration. Very small values of σ and large k<sub>σ</sub> (very narrow Gaussians) might cause numerical errors even after the scaling (in the form of glitches for some values of ρ), which then requires additional scaling of the function. In any case, such narrow Gaussians likely correspond to unrealistic morphologies.

**Figure 6** (caption, partially recovered). The node degree is a quadratic function of the effective radius in the considered model; it depends on the square of the inverse distance-dependent expected number of synapses. Asymmetry index values: η ∈ {0.1, 0.3, 0.6, 0.8}. The distribution of neurite segments is either uniform or truncated Gaussian with parameters (σ = 0.5, k<sub>σ</sub> = 1), (σ = 10, k<sub>σ</sub> = 50), or (σ = 0.50, k<sub>σ</sub> = 5).

The inverse distance-dependent expected number of synapses, and consequently the effective radius, exists on the interval z ∈ [0, φ<sub>max</sub>] for every distribution that we analyzed in this work. **Figure 6A** shows the case with uniform distribution of neurite segments, where the function decreases almost linearly from one for z = 0 to η for z = π/(1 + η)<sup>2</sup>. **Figures 6B–D** illustrate examples with a truncated Gaussian distribution of neurite segments (the same examples are shown in **Figures 4**, **5**). In all those examples the effective radius decreases with z, but with non-linearities that are most visible for z around zero and around the maximum. When z increases, the ratio N<sub>a</sub>N<sub>d</sub>/R<sup>2</sup> decreases. This ratio is proportional to the average number of axon-dendrite pairs of segments per unit area. A decrease of N<sub>a</sub>N<sub>d</sub>/R<sup>2</sup> decreases the expected number of synapses everywhere, and further from the neurite center the expected number of synapses can decrease below one. Consequently, the effective radius becomes smaller.

The node degree is a quadratic function of the effective radius (see Equation 6) described by the function ψ(z, η, M), which is the square of the inverse distance-dependent expected number of synapses. This function is shown in **Figures 6E–H** for the same examples as those in **Figures 6A–D**<sup>6</sup>.
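The numerical inversion outlined in footnote 6 (grid evaluation, nearest grid value, first-order correction) can be sketched as follows. The function names are hypothetical, and the test function `(1 - rho)**2` is only a stand-in for φ chosen because its inverse is known in closed form; it is not one of the neurite models.

```python
import math


def invert_on_grid(phi, z, step=0.001):
    """Approximate phi^{-1}(z) on [0, 1]: find the grid point rho_0 whose
    phi value is closest to z, then apply a first-order correction."""
    n = round(1.0 / step)
    grid = [i * step for i in range(n + 1)]
    values = [phi(r) for r in grid]
    i0 = min(range(n + 1), key=lambda i: abs(values[i] - z))
    # Finite-difference slope at rho_0 (backward difference, forward at 0)
    if i0 > 0:
        slope = (values[i0] - values[i0 - 1]) / step
    else:
        slope = (values[1] - values[0]) / step
    if slope == 0.0:
        return grid[i0]  # flat segment: no correction possible
    return grid[i0] + (z - values[i0]) / slope


def psi(phi, z):
    """Square of the inverse function, as used for the node degree."""
    return invert_on_grid(phi, z) ** 2


# Stand-in decreasing function with a known inverse: rho = 1 - sqrt(z)
rho_est = invert_on_grid(lambda rho: (1.0 - rho) ** 2, 0.3)
```

The first-order correction makes the result considerably more accurate than the grid resolution alone, provided the function is smooth and strictly monotonic near the solution.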

#### 3.3. Motif Distribution

Equation (12) for 2-node motifs and Equations (16, 17) for 3-node motifs were numerically integrated using the Matlab built-in function quad2d<sup>7</sup>. The exception is the innermost integral in Equation (17), which was computed using the simple trapezoid method in order to increase the speed of computations. The obtained results were additionally verified by simulating the equivalent model in Matlab, then counting motifs from the simulations.

#### 3.3.1. The Expected Number of Motifs (Figure 7)

**Figure 7A** summarizes the expected motif counts for all 2-node and 3-node motifs. Each column in the color-coded matrix corresponds to one motif, while each row corresponds to one value of the normalized effective radius (r<sub>max</sub>), obtained by dividing the effective radius (Δ<sub>max</sub>) by the axon-dendrite distance in a neuron (Δ<sub>ad</sub>). We consider a wide range of values for the normalized effective radius, r<sub>max</sub> ∈ {0.1, 0.3, 0.5, 0.7, 1, 1.7, 2, 5, 10}. The schematic representation of each motif is plotted at the top of the corresponding column.

<sup>7</sup>Matlab version R2014a

The motifs that have identical expected counts are represented by the same column (e.g., M1 and M4). Each motif count is normalized by the total number of same-size motifs, i.e., M1-2 and M2-2 are divided by the total number of 2-node motifs and motifs M1–M13 are divided by the total number of 3-node motifs. Normalization removes parameters that act as multiplicative constants in the expressions for motif counts, i.e., it removes the coefficient Δ<sub>ad</sub><sup>2</sup>/l<sup>2</sup> for the 2-node motifs and Δ<sub>ad</sub><sup>4</sup>/l<sup>4</sup> for the 3-node motifs. The normalized expected motif counts depend only on the normalized effective radius.

The first two columns in the color-coded matrix correspond to the 2-node motifs. For the smaller values of the normalized effective radius (r<sub>max</sub> ≤ 2) most of the connections are unidirectional, as indicated by the higher percentage of motif M2-2 in the second column. For the two biggest values of the normalized effective radius most of the connections become bidirectional and the fraction of motif M1-2 increases to over 50%.
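In a simulated network, the normalized 2-node motif counts can be obtained by classifying each connected pair of nodes as bidirectional (M1-2) or unidirectional (M2-2). The sketch below assumes the network is given as a 0/1 adjacency matrix with `adj[i][j] = 1` for a connection from i to j; the function name and the example matrix are hypothetical.

```python
def two_node_motif_fractions(adj):
    """Return the fractions of bidirectional (M1-2) and unidirectional
    (M2-2) motifs among all connected node pairs."""
    n = len(adj)
    bidirectional = 0
    unidirectional = 0
    for i in range(n):
        for j in range(i + 1, n):
            links = adj[i][j] + adj[j][i]
            if links == 2:
                bidirectional += 1
            elif links == 1:
                unidirectional += 1
    total = bidirectional + unidirectional
    if total == 0:
        return 0.0, 0.0  # no connected pairs at all
    return bidirectional / total, unidirectional / total


# Hypothetical 4-node example: 0 <-> 1, 0 -> 2, 3 -> 2
example = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
m1_2, m2_2 = two_node_motif_fractions(example)
```

In this example one of the three connected pairs is bidirectional, so the two fractions are 1/3 and 2/3.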

The 3-node motifs are shown in columns 3–15, arranged according to the increasing number of connections (in one direction, i.e., a bidirectional coupling counts twice). For the smallest values of r<sub>max</sub> the motifs with two unidirectional connections are dominant. As the parameter increases, the percentage of the motifs with one bidirectional and one unidirectional connection (M3 and M7) increases. The middle range of values for the normalized effective radius (r<sub>max</sub> between 1 and 2, encircled with the dashed white line in the figure) is the most interesting as it gives the biggest variability of motif counts. For these values, almost all of the motifs are present in the network structure. For the biggest values of r<sub>max</sub>, most of the nodes form bidirectional connections and the motifs with bidirectional couplings become dominant. It should be noted that motifs M3 and M7 appear for all values of the normalized effective radius except the smallest one. They contain one bidirectional connection and one unidirectional connection between the three nodes and seem to be the most feasible connectivity pattern for the considered type of network (with uniformly distributed and randomly oriented neurons). In contrast, the cyclic pattern of motif M9 almost never appears in these networks.

These conclusions are additionally illustrated in **Figures 7B–D**, which show the motif percentages for three representative values of the normalized effective radius. **Figure 7B** shows the case for r<sub>max</sub> = 0.3, when the motifs with a small number of unidirectional connections (M2, M1, and M4) dominate. **Figure 7C** illustrates the middle range of values for the normalized effective radius (example: r<sub>max</sub> = 1.7), which enables the biggest variability of motifs. **Figure 7D** shows the case for the biggest r<sub>max</sub>, when the motifs with bidirectional connections dominate.

#### 3.3.2. Comparison with the Uniform Random and the Locally Coupled Networks (Figure 8)

For comparison, motif counts are computed for the uniform random and for the locally coupled networks described in the Methods Section. The networks are simulated for N = 3600 nodes. Node degrees are computed according to Equation (6). The values of the normalized effective radius are the same as those considered in **Figure 7**, i.e., r<sub>max</sub> ∈ {0.1, 0.3, 0.5, 0.7, 1, 1.7, 2, 5, 10}. The axon-dendrite distance in a neuron is fixed to Δ<sub>ad</sub> = 1, and the parameter that determines the neuron density is l ∈ {0.3, 0.5}. For l = 0.3, the square of edge Δ<sub>ad</sub> contains about 11 somata (a denser network). For l = 0.5 that square contains 4 somata (a sparser network). For each value of the node degree we generated a uniform random network, a strictly locally coupled network, and a locally coupled network with 10% of non-local connections. The construction of these networks is described in the Methods Section. Each connection in the uniform random network is established with equal probability (that depends on the selected node degree) and independently of other connections. In the strictly locally coupled network, each node is connected to all the nodes within its connectivity area, so that all connections are bidirectional. The second example of the locally coupled network is similar to the first one, but 10% of all the connections are removed and re-established with the closest nodes outside of the connectivity area.

<sup>6</sup>Function φ<sup>−1</sup>(z, η, M) is computed using a simple method. The values of φ(ρ, η, M) are computed on the interval [0, 1] with a resolution of 0.001. For each value z the closest value of φ(ρ), φ(ρ<sub>0</sub>), is found. Then φ<sup>−1</sup>(z) is computed from the approximation φ(ρ) = φ(ρ<sub>0</sub>) + (φ<sup>−1</sup>(z) − ρ<sub>0</sub>) · φ′(ρ<sub>0</sub>) ⇒ φ<sup>−1</sup>(z) = ρ<sub>0</sub> + (φ(ρ) − φ(ρ<sub>0</sub>))/φ′(ρ<sub>0</sub>). The derivative is also estimated using the simple equation φ′(ρ + dρ) = (φ(ρ + dρ) − φ(ρ))/dρ.

**Figure 8** shows the comparison between our model and the simulated uniform random and locally coupled networks. The color maps show t-scores computed using the Matlab function ttest.m. For a simulated network, we obtained motif counts for every node (3600 values) and tested whether this sample has a mean value statistically equal to the expected motif count obtained from our model. The cases that pass the test are marked with crossed pink squares in the figure. Clearly, most of the cases have significantly different motif counts than our model. The positive t-scores indicate that our model gives more motifs of a certain type than the simulated network, while the negative scores indicate fewer motifs in our model compared to the simulated network (t-scores obtained from Matlab are multiplied by −1). Also, we set the values outside of the interval [−500, 500] to ±500, to emphasize the values closer to zero. In some cases, certain motifs do not appear in the simulated network. We marked them with gray crossed squares in the figure. When our model gives zero expected number of motifs, the case is marked with both gray and pink squares.

**Figure 8** (caption, partially recovered). t-scores comparing motif counts in our model with those in the simulated networks. All the values outside the interval [−500, 500] are set to ±500. The scores are shown as color maps. (E,F) Comparison with the locally coupled networks with 10% non-local connections.

The color maps in **Figure 8** are in the same format as in **Figure 7**. The color bar at the bottom of the figure explains the color code. The motif types are indicated on the x-axis, and the values of the normalized effective radius are indicated on the y-axis. The first row (**Figures 8A,B**) shows the comparison with the uniform random networks, the second row (**Figures 8C,D**) is the comparison with strictly locally coupled networks, and the third row (**Figures 8E,F**) is the comparison with locally coupled networks with 10% non-local connections. The first column corresponds to the denser population (l = 0.3), and the second column to the sparser population (l = 0.5).

For almost all the cases shown in **Figure 8**, the number of bidirectional motifs (M1-2) in our model is larger than in the uniform random networks and smaller than in the locally coupled networks, while the opposite holds for the unidirectional motifs (M2-2). Similarly, the number of 3-node motifs with two unidirectional connections (M1 and M2) is smaller in our model than in the uniform random networks, and larger than in the locally coupled networks. In contrast, the motifs with solely bidirectional connections (M8 and M13) are almost always more frequent than in the uniform random networks and less frequent than in the locally coupled networks. The motifs with three or four connections are in most cases more frequent in our model than in both the uniform random and the locally coupled networks. The exceptions are motifs M5 and M9, which become less frequent than in uniform random networks for a sufficiently big r<sub>max</sub>. As the normalized effective radius increases, our model forms more bidirectional connections and the motifs that require three unidirectional connections become less likely (this is more visible for M9, as it is rare in our model anyway). For even higher values of the effective radius, motif M10, which is not very frequent in our model, becomes underrepresented compared to both types of networks.
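The t-scores displayed in **Figure 8** follow the usual one-sample t-statistic, with the sign flipped and the values clipped to [−500, 500] as described in the text. The sketch below is an illustration in Python rather than the Matlab `ttest` call used in the paper; the function name is hypothetical, and the handling of a sample with zero variance is our assumption.

```python
import math


def signed_clipped_t_score(counts, expected, limit=500.0):
    """One-sample t-statistic of per-node motif counts against the model's
    expected count, multiplied by -1 and clipped to [-limit, limit].
    Positive values mean the model predicts more motifs than simulated."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    if var == 0.0:
        # Assumption: with no variability, saturate unless the means agree
        if mean == expected:
            return 0.0
        return limit if expected > mean else -limit
    t = (mean - expected) / math.sqrt(var / n)
    t = -t  # sign convention used for the figure
    return max(-limit, min(limit, t))


# Hypothetical per-node counts and model expectation
t_example = signed_clipped_t_score([1, 2, 3, 4, 5], expected=4.0)
```

Clipping keeps a few extreme scores from dominating the color scale, so that differences near zero remain visible.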

For the expected node degree of approximately 25–35% of all nodes (cases: r<sub>max</sub> = 5, l = 0.3 and r<sub>max</sub> = 10, l = 0.5), our model contains dense local connectivity with many bidirectional connections. Eventually, most of the 3-node motifs become less frequent than in the uniform random networks, except the three highly connected motifs M8, M12, and M13. The motif M13 becomes more represented than in the locally coupled networks with 10% non-local connections, indicating very dense local connectivity in our model for these values of model parameters. In contrast, motif M8 with two bidirectional connections is always less frequent in our model than in the locally coupled networks.

The last set of model parameters, r<sub>max</sub> = 10, l = 0.3, gives very high connectivity; the probability of connection reaches 0.97 in the network with 3600 nodes. The obtained results are not consistent with the rest of the analysis, as in this case both simulated networks contain a high number of the most connected motif M13, while many other motif types become less frequent than in our model. We wanted to show this case to illustrate the effect of the finite simulation size. Our model allows analysis for any value of the model parameters, but in the simulated networks, the model size determines the maximal range of feasible parameters.
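The connection probabilities quoted for these parameter sets can be checked with a short calculation. Equation (6) is not reproduced in this section, so the sketch assumes that the node degree equals the area of a disc of radius Δ<sub>max</sub> = r<sub>max</sub> · Δ<sub>ad</sub> multiplied by the neuron density 1/l<sup>2</sup>; the function names are hypothetical.

```python
import math


def expected_node_degree(r_max, l, delta_ad=1.0):
    """Assumed form of the node degree: disc area times density 1 / l**2."""
    delta_max = r_max * delta_ad
    return math.pi * delta_max ** 2 / l ** 2


def connection_probability(r_max, l, n_nodes, delta_ad=1.0):
    """Node degree expressed as a fraction of the network size."""
    return expected_node_degree(r_max, l, delta_ad) / n_nodes


for r_max, l in [(5, 0.3), (10, 0.5), (10, 0.3)]:
    p = connection_probability(r_max, l, n_nodes=3600)
    print(f"r_max={r_max}, l={l}: p = {p:.2f}")
```

Under this assumption the three cases give approximately 0.24, 0.35, and 0.97, in line with the 25–35% range and the value 0.97 quoted in the text for N = 3600.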

#### 3.4. Clustering Coefficient, Path Length and Small-world Coefficient (Figure 9)

Once the motif counts are obtained, the clustering coefficient follows from Equation (19). For comparison, we also evaluated the clustering coefficient for the uniform random and for the locally coupled networks with 10% of non-local connections (see the Methods Section). The clustering coefficients are computed from the motif counts. Motifs in random and locally coupled networks are computed in a standard way, by counting the connectivity patterns. Those counts are used in Equation (19) instead of the N<sub>Mi</sub> values. In the numerator of the equation, the motif counts are multiplied by the coefficients 1 for M5 and M9, 2 for M6, M10, and M11, 4 for M12, and 8 for M13 in order to take into account bidirectional connections in some of the motifs, in the same way as in the standard expression for the clustering coefficient (Equation 18).

The simulated random and locally coupled networks have N = 3600 or N = 4900 nodes and model parameters Δ<sub>ad</sub> = 1 and l ∈ {0.3, 0.5}. We consider only values from 0.3 to 10 for the normalized effective radius. For r<sub>max</sub> = 0.1 the obtained networks are sparsely connected, possibly with many isolated cells. This causes a bias in the computation of the clustering coefficient and we omit these examples.

**Figures 9A,B** show the clustering coefficient for our model (red line), and also for the uniform random (blue) and the locally coupled network (turquoise) for two populations, a denser one (l = 0.3) and a sparser one (l = 0.5). For most of the values of the normalized effective radius (r<sub>max</sub>), the clustering coefficient of our model lies between those of the uniform random and locally coupled networks. Only for the two largest values of r<sub>max</sub> in the sparser population does the clustering coefficient become bigger than the one in the locally coupled networks. For small values of r<sub>max</sub>, axon and dendrite centers are relatively far apart and connect with different groups of cells. As r<sub>max</sub> increases, the axon and dendrite centers approach each other and the cells that connect to the axon become closer to those that connect to the dendrite, which makes connections between them more probable. This increases the number of motifs that contribute to the clustering coefficient. The distance between the axon and dendrite increases the effective area of the neighborhood, which might be a reason for the cases with a higher clustering coefficient than in locally coupled networks. In the locally coupled networks, the dendrite and axon centers are identical and the neighborhood is defined by a single circle around that center. It should be noted that Rieubland et al. (2014) report a higher clustering coefficient estimated from the experimental data than the one computed from the simulated uniform random and locally coupled networks. The example obtained for l = 0.3 and r<sub>max</sub> = 10 demonstrates that the cut-off effect present in smaller networks (see **Figure 8** obtained for N = 3600) disappears when comparing our model with bigger simulated networks (for N = 4900). An extensive comparison between our "infinite-size" model and the finite-size simulated networks is presented in Supplementary Material 3.

The expected harmonic path length obtained using the iterative procedure (Equations 20–23; see Methods) is shown in **Figures 9C,D**. For all the considered model parameters the harmonic path length is slightly bigger in our model than in the uniform random network and smaller than in the locally coupled network. The computations used in this study result in somewhat smaller values for the harmonic path length than those obtained when simulating the equivalent model. This is a consequence of the finite simulation size (see the analysis presented in Supplementary Material 3). Consequently, the harmonic path length obtained from the numerical simulations differs more from the harmonic path length in the uniform random network, but is still smaller than the harmonic path length in the locally coupled network.

Finally, we computed the small-world coefficients from the clustering coefficients and path lengths. Two definitions of this coefficient are computed: the standard Watts-Strogatz definition (SWws, see Watts and Strogatz, 1998), shown in **Figure 9E**, and an alternative definition SW<sup>q</sup> from Telesford et al. (2011), shown in **Figure 9F**. The standard version compares our model to the uniform random networks and should be large for small-world networks. The alternative definition compares our model to both the uniform random and the locally coupled networks, and should be around zero for small-world networks. Both considered populations (l = 0.3 and l = 0.5) maximize SWws for the normalized effective radius rmax = 0.7. The alternative coefficient SW<sup>q</sup> is closest to zero for rmax = 1.7 and rmax = 2, although for the denser population (l = 0.3) it stays above zero for all values of rmax. The parameter rmax in the interval [1, 2] also maximizes the repertoire of possible motif counts, as shown in **Figure 7**.
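The two coefficients can be illustrated with a short sketch. It assumes the commonly used forms of these measures: the ratio (C/C_rand)/(L/L_rand) for the standard coefficient, and L_rand/L − C/C_latt for the coefficient of Telesford et al. (2011), where C and L are the clustering coefficient and path length of the examined network, and the subscripts refer to the uniform random and the locally coupled (lattice-like) reference networks. The function and argument names are hypothetical, and the study's own computation (which uses the harmonic path length) may differ in detail.

```python
def small_world_ws(c, l, c_rand, l_rand):
    """Standard small-world coefficient (assumed form).

    Large values (>> 1) indicate clustering well above the random
    reference while path length stays comparable to it.
    """
    return (c / c_rand) / (l / l_rand)


def small_world_telesford(c, l, c_latt, l_rand):
    """Alternative coefficient after Telesford et al. (2011) (assumed form).

    Values near 0 indicate small-world structure; positive values are
    closer to the random reference, negative values to the lattice.
    """
    return l_rand / l - c / c_latt
```

The second form makes explicit why a value near zero is the target: the network must match the random reference in path length and the locally coupled reference in clustering at the same time.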

#### 4. Discussion

We presented a two-level statistical model that examines how properties of single neurons and neurites constrain the connectivity in a neuronal population. The connectivity is quantified using standard graph-theoretic measures such as motif counts, clustering coefficient, harmonic path length, and the two definitions of small-world coefficient. Neurites are represented as neurite fields in accordance with the model already addressed in the literature (Snider et al., 2010; Teeter and Stevens, 2011; Cuntz, 2012; van Pelt and van Ooyen, 2013; McAssey et al., 2014). Such a model provides a low-resolution and low-dimensional representation of neurites. The entire neuron model has three components: the neurite field of the axon, the neurite field of the dendrite, and the parameter that maps the distance between the axon and dendrite centers. The population of neurons is uniformly distributed in two-dimensional space with the density of neurons defined by a model parameter. This resembles the experiments with dissociated cortical cultures, and is often used in theoretical studies. Finally, the synapse formation rule is entirely based on the proximity of axons and dendrites (Peters' rule, Peters et al., 1991; Peters and Feldman, 1976), and no activity-dependent synapse reorganization is considered. Consequently, we consider only the potential connectivity as defined in Stepanyants and Chklovskii (2005). The synapse formation rule, as well as the population properties, are selected to emphasize the role of neuron morphology and make a clear link between the morphology and connectivity.

#### 4.1. Summary of the Findings

We first introduced the notion of effective radius of neurites, which is the maximal distance between an axon-dendrite pair of two neurons expected to connect with at least one synapse. The effective radius, the expected number of synapses, and the expected node degree are expressed as functions of neurite parameters. The expected number of synapses linearly depends on the density of neurite distribution, but non-linearly on the neurite size and the distance between the axon and dendrite centers. We considered several choices of neurite distributions, including the uniform distribution and several cases of the truncated Gaussian distribution with different distribution parameters. When both axon and dendrite are evenly distributed within the distribution support the expected number of synapses decreases almost linearly with the axon-dendrite distance. This has also been observed in the experimental studies (Rieubland et al., 2014), and in the modeling studies that reproduce realistic neuronal morphologies (Hill et al., 2012).

Next, we expressed the considered connectivity measures as functions of the normalized effective radius, which is the effective radius divided by the distance between axon and dendrite centers of the same neuron. We derived the closed-form expressions for the 2- and 3-node motifs. These motifs represent the minimal-size networks with structured connectivity that can be studied experimentally. The experimental study of path lengths requires recording of a much bigger population of neurons, which can easily become infeasible. The expected motif counts are expressed in the form of multiple integrals that are evaluated numerically. The obtained results vary significantly for different values of the normalized effective radius. For most of the considered values of the normalized effective radius, the unidirectional 2-node motifs are more frequent than the bidirectional motifs. This resembles the statistics of 2-node motifs in the uniform random networks. For large values of the normalized effective radius, the bidirectional motifs become dominant, similar to the locally coupled networks. Additional comparison shows that our model always expresses more bidirectional motifs than the uniform random networks and fewer than the locally coupled networks. The opposite holds for the unidirectional motifs.
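The dependence of the 2-node motifs on the normalized effective radius can be explored with a small simulation. The sketch below is a simplified stand-in for the model, not the authors' implementation: it assumes that neuron i connects to neuron j whenever the axon center of i lies within the effective radius `r_eff` of the dendrite center of j, that each axon center is displaced from its dendrite center by a fixed distance `d` in a uniformly random direction, that self-connections are excluded, and that the population occupies a square without periodic boundaries. All names and parameters are illustrative; the normalized effective radius corresponds to `r_eff / d`.

```python
import numpy as np


def simulate_network(n, side, d, r_eff, seed=0):
    """Boolean adjacency matrix a[i, j] = True if neuron i connects to j."""
    rng = np.random.default_rng(seed)
    dend = rng.uniform(0.0, side, size=(n, 2))      # dendrite centers
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)   # random axon orientation
    axon = dend + d * np.column_stack((np.cos(angle), np.sin(angle)))
    # dist[i, j]: distance from the axon center of i to the dendrite center of j
    dist = np.linalg.norm(axon[:, None, :] - dend[None, :, :], axis=2)
    a = dist < r_eff
    np.fill_diagonal(a, False)                      # assumption: no autapses
    return a


def dyad_counts(a):
    """Return (unidirectional, bidirectional) counts of connected node pairs."""
    a = np.asarray(a, dtype=bool)
    n_bi = int(np.triu(a & a.T, k=1).sum())
    n_uni = int(np.triu(a | a.T, k=1).sum()) - n_bi
    return n_uni, n_bi
```

Setting `d = 0` makes the axon and dendrite centers coincide, which recovers a locally coupled network in which every connection is bidirectional; increasing `d` at fixed `r_eff` lowers the normalized effective radius and shifts the balance toward unidirectional pairs.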

The sparsely connected 3-node motifs (with 2 unidirectional connections) are dominant for the small normalized effective radius, which resembles the 3-node motif distribution in the uniform random networks. For the large normalized effective radius, the densely connected motifs (with two or three bidirectional connections) become frequent, which is typical for the locally coupled networks. For all considered values of the normalized effective radius, our model exhibits fewer sparsely connected motifs than the uniform random networks and more than the locally coupled networks. The opposite holds for the motif with the maximal connectivity (i.e., with three bidirectional connections). Between these extremes, we can identify the range of values for the normalized effective radius that maximizes the variability in connection repertoires on the micro-scale. For these values, almost all the motifs are present in the network, which is not the case in the uniform random and the locally coupled networks that favor certain motifs. The analysis of the clustering coefficient, harmonic path length, and the small-world coefficient shows that the same range of values results in the small-world coefficient closest to the one of the small-world networks. For the normalized effective radius between 1 and 2, the clustering coefficient is close to the one of the locally coupled networks, and the path length is somewhat longer than the one of the uniform random networks.

#### 4.2. Axons and Dendrites Modeled as Neurite Fields

We adopted several approximations when choosing models for the individual neurons and for the populations of neurons. In what follows we will additionally motivate the adopted choices. The coarse representation of neurites, reduced to the distribution of neurite segments, neglects the fine details of the neurite tree structure, including the non-random orientation of neurite segments, the branching patterns, or any correlation in the structure of neurite branches. Previous studies suggest that this low-resolution neurite description still captures relevant dendrite properties at the level of the whole neuron morphology (Snider et al., 2010; Teeter and Stevens, 2011). In this work, we also used density fields to represent axons, which better describes the properties of neurons in cell cultures than in the three-dimensional tissue. In the cortical tissue, the axons are elongated and branched structures that cover a large area compared to dendrites. In most cases, just a single axonal branch passes through the dendritic field (Braitenberg and Schüz, 1998). The axon density field can be interpreted as uncertainty of the position of individual axonal branches within the space covered by the axon. This complies with our model, where the principal axonal orientation is random, and the neurite field describes the additional randomness of position of the axonal branches with respect to the principal orientation. In systems with non-random principal orientation of axons, or in neurons for which the correlation between the axonal branches cannot be approximated, a model that describes each branch might be more suitable. For example, a neurite field description of dendrites can be combined with axons modeled in NETMORPH (Koene et al., 2009). Still, as long as both dendrites and axons cover a limited space, the effective radius can be derived as well as the expressions for the considered measures of network connectivity. In such cases, the expression for the effective radius might have a more complex dependency on the neurite properties.

#### 4.3. Potential Synapses Estimated from the Neurite Fields

An important issue related to this modeling approach is addressed in van Pelt and van Ooyen (2013). This study systematically examines several aspects of connectivity, including the number of synapses per neurite, the number of synapses between pairs of neurons, and the connectivity per neuron. Those aspects are evaluated for neurites with realistic branching trees and also for neurites described by the neurite density distribution. The paper finds agreement between realistic and neurite field based descriptions of neurons when estimating the expected number of synapses. However, disagreement arises when computing the expected number of synapses per connected axon-dendrite pair. To overcome the problem, the authors proposed an empirical mapping function between the connectivity obtained from detailed simulated morphologies and the connectivity computed using density fields obtained by averaging over detailed simulated morphologies.

The model examined in our study derives the average connectivity from neurite distributions, and therefore might suffer from the issues indicated in van Pelt and van Ooyen (2013). We can adopt the same method to overcome the problem, and apply an empirical mapping function to Equation (5), which defines the effective radius. On the right side of the first inequality, instead of one there will be a constant dependent on the empirical mapping function. This constant will be added to the expression for the effective radius, but the rest of the analysis will not be affected. As a result, the optimal range of values for the normalized effective radius might be shifted from the interval [1, 2]. Alternatively, the expected number of synapses can be obtained from a more realistic model of neurites, e.g., from the reconstructed cells from neuroimaging studies or from detailed morphologies simulated using NETMORPH (Koene et al., 2009). As long as the obtained function is at least piecewise invertible, the effective radius can be computed from it, and the results for the expected motif counts and for the other considered measures still apply.

#### 4.4. Relation Between the Actual and the Potential Number of Synapses

We derive all the network measures from the potential connectivity, but potential connectivity does not fully explain the actual connectivity. The obtained potential number of synapses (**Figure 4**), in both its range of values and its functional form, is in agreement with other studies that estimate the connectivity from neuronal morphology (Hill et al., 2012; van Pelt and van Ooyen, 2013), but it cannot fully explain the actual synaptic connectivity reported in Markram et al. (1997) and Fares and Stepanyants (2009). **Figure 4** indicates the adequate range of values for the properly selected model parameters. The distance-dependent expected number of synapses [φ(ρ,η, M)] is smaller than 3 for the typical examples presented in **Figure 4**, with much bigger values obtained only for the very narrow (and unrealistic) neurite fields. To obtain the expected number of potential synapses, it should be multiplied by a coefficient that depends on the model parameters and is not greater than φmax. The obtained expected number of synapses per connection reaches 10 synapses or less. Although the range of values is (roughly) accurate for the properly selected model parameters, the distribution of synapse counts does not agree with that reported in Fares and Stepanyants (2009). That study demonstrates that the distribution of potential synapses between a connected axon-dendrite pair has much higher variance than the distribution of actual synapses. The authors proposed a cooperative model of synapse formation, described by a sigmoid function, that establishes actual synapses only between axon-dendrite pairs with a sufficient number of potential contacts, while it rules out the pairs with few contacts. This correction can be incorporated in our study, similarly to the mapping function discussed in the previous paragraph, by applying the proposed sigmoid function to the left side of the first inequality in Equation (5).
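The effect of such a cooperative rule can be illustrated with a generic sigmoid. The sketch below is only a schematic example: the text states that the rule is described by a sigmoid function, but the logistic form, the function name, and the `threshold` and `steepness` values used here are hypothetical and are not taken from Fares and Stepanyants (2009).

```python
import math


def actual_connection_probability(n_potential, threshold=3.0, steepness=1.0):
    """Logistic sigmoid mapping a potential synapse count to a probability.

    Hypothetical parameters: `threshold` is the count at which the
    probability equals 0.5, `steepness` controls how sharply pairs with
    few potential contacts are suppressed.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (n_potential - threshold)))
```

With a mapping of this shape, axon-dendrite pairs with only a few potential contacts are rarely converted into actual connections, while pairs with many potential contacts almost always are, which narrows the distribution of synapse counts relative to the purely potential case.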

Finally, corrections proposed in van Pelt and van Ooyen (2013) and Fares and Stepanyants (2009) can be combined. First, the empirical mapping function from van Pelt and van Ooyen (2013) can be used to convert the synapse counts obtained from the neurite fields to the values that would be obtained by simulating detailed morphologies. Then, the cooperative rule from Fares and Stepanyants (2009) can be used to convert the number of potential synapses to the counts of actual synapses. All these operations will somewhat alter the functional form of Equation (5) and, consequently, the expression for the effective radius and how it depends on the neurite parameters. Eventually, an additional parameter might be introduced to describe the connectivity area. The computation of the network measures can then be done following the same method described in this study.

#### 4.5. Alternative Potential Synapse Formation Rules

We considered a simple potential synapse formation rule based on a proximity criterion: axon and dendrite segments form contacts if they lie at a distance smaller than the average dendritic spine length. The only constraint is that a dendritic segment cannot form potential synapses with more than one segment of the same neighboring axon. Still, it can form potential synapses with the segments of other axons. A more realistic rule would require that each dendritic segment connects to at most one among all the proximal segments of all the axons; this may better reflect the connectivity in cortical tissue (Braitenberg and Schüz, 1998) and also reduce the total number of potential synapses per neuron. In the current model, the number of potential synapses is controlled by the choice of model parameters (see Methods). An alternative potential synapse formation rule would allow a wider range of model parameters. Implementing the alternative rule would likely result in a more complex relation between the effective radius and the neurite parameters. Still, if we consider one particular dendrite, all the axons that connect to it have to be at a finite distance from it, and the effective radius is always finite. The alternative rule would alter the criterion for connectivity: a neuron would not connect to all the neurons inside its connectivity area, but just to some of them, according to selection criteria derived from the potential synapse formation rule.

Activity-dependent synaptic rearrangement is not considered in this study, although it represents an important mechanism in shaping the synaptic patterns. We focus on the most stable aspects of neuronal connectivity, those governed by the morphology of neurite trees. As indicated in the literature (Stepanyants et al., 2002), remodeling of neurite branches requires a longer time scale than formation or removal of the individual synapses. The synaptic connectivity derived from neuromorphology can be considered as an additional constraint in the process of the activity-dependent synaptic rearrangement. It is reasonable to expect that the networks with larger diversity of motif counts retain larger variability of the connectivity also in the presence of the activity-dependent synaptic changes.

#### 4.6. Comparison with the Experimentally Observed Motif Counts

The presented study focuses on a statistical description of neuronal connectivity and the constraints to connectivity imposed by low-resolution properties of neuronal morphology. The considered problem was solved analytically. We established the functional dependencies between the considered connectivity descriptors and the parameters that describe neuronal morphology and the organization of neuronal population. In order to solve the described problem, we had to approximate several mechanisms that significantly influence the formation and maintenance of synaptic contacts. Those include the details of neurite structure, the realistic organization of neurons in the cortical tissue (as we considered a model that corresponds to organization in cell cultures), and most importantly the fine tuning of connectivity patterns through synaptic plasticity. Consequently, certain differences between the results obtained from our model and the corresponding experimental findings are expected.

The studies in Markram et al. (1997), Song et al. (2005), and Perin et al. (2011) examined the connectivity between cortical layer 5 pyramidal neurons and reported over-representation of bidirectional motifs compared to the uniform random networks. The study in Rieubland et al. (2014) addressed the connectivity between molecular layer interneurons in the cerebellum and found no significant difference compared to the uniform random networks. In Markram et al. (1997), 30% of the observed connections were bidirectional and 70% unidirectional. This corresponds to the distribution of unidirectional and bidirectional connections obtained in our model for the normalized effective radius close to 1.7. Our results show that values of the normalized effective radius smaller than 2 give less than 50% of bidirectional connections, while the opposite holds for the normalized effective radius larger than 2. For almost every choice of the model parameter value, the number of bidirectional connections exceeds the one of the uniform random networks, similar to Song et al. (2005) and Perin et al. (2011). A recent study (Cossell et al., 2015) examined the role of bidirectional connections in sensory information processing. They found that neurons with correlated responses to visual stimuli often connect with strong bidirectional couplings, while the majority of neurons exhibit weakly correlated or uncorrelated responses to visual stimuli and connect with unidirectional couplings.

Three studies (Song et al., 2005; Perin et al., 2011; Rieubland et al., 2014) reported the distribution of 3-node motifs in cortical neuronal networks. In Song et al. (2005), the authors defined the optimal transitive connectivity rule stating that "if node N1 connects to N2, and N2 connects to N3 (in any direction), the probability that N1 connects to N3 significantly exceeds the chance level." Motifs M1, M5, M6, M9, M10, M11, M12, and M13 have been found in the data more often than in the uniform random networks. In addition, motif M3 was less frequent than in the uniform random networks. The study in Perin et al. (2011) confirms the same connectivity rule and finds motifs M1, M5, M6, and M11 to be overrepresented in the data compared to the locally coupled networks. In Rieubland et al. (2014), the preference for transitive motifs is also confirmed, with motifs M1 and M5 being overrepresented compared to both the uniform random and the locally coupled networks. Our model suggests the optimal range of values for the normalized effective radius that supports formation of the reported motifs (particularly, M5, M6, M10–M13), i.e., the interval rmax ∈ [1, 2]. Outside of this interval, some of these motifs become rare. Contrary to Song et al. (2005), we rarely observe the loop motif M9; the same motif is also rare in the locally coupled networks. Motif M12 becomes relatively frequent in our model for sufficiently big values of rmax. Although it is not reported in all experimental studies, it also has transitive connectivity. We frequently observe motifs M3 and M7, more frequently than in both the locally coupled and the uniform random networks. Such motifs can be formed between three neurons if two of them fall inside the connectivity area of the third one in such a way that one is close to the center of the connectivity area and the other is close to its border. The neuron close to the center of the connectivity area is likely to form a bidirectional connection present in motifs M3 and M7. The neuron close to the border of the connectivity area is likely to form the remaining unidirectional connection.

Finally, it should be mentioned that our model cannot predict missing connections and disconnected motifs, like those analyzed in Rieubland et al. (2014), or the anti-clustering coefficient emphasized in the same study. This is a consequence of the definition of connectivity area and the fact that all dendrites within the connectivity area of an axon synapse to that axon. A different synapse formation rule, allowing that some of the dendrites within the connectivity area remain disconnected from the considered axon, like the alternative rule described in a previous paragraph, would allow analysis of the missing connections and the additional motifs discussed in the literature.

#### 4.7. Limitations of the Experimental Studies

Connectivity measures obtained from experimental studies are to some extent affected by the adopted experimental protocols. A recent modeling study (Miner and Triesch, 2014) examined the possible bias in the connectivity measures introduced by sampling and finite size of the slices. Our model can also be used to examine the effects of the finite size of the considered neuronal population. The analytical results presented in our study are derived for an infinite-size population of neurons. In contrast, simulation of the equivalent model can only be done for a finite number of neurons. Comparison between the analytical and the simulated results illustrates the bias induced when estimating the properties of a large neuronal circuit using a small subpopulation. In the following section, we give two examples of this issue. We illustrate a case where the finite network size affects the computation of motif counts. We also discuss how the reduction of model size affects the path-length and the small-world coefficient computations.

#### 4.8. The Effects of the Finite Model Size

In most of the derivations presented in this study, the network size is not explicitly considered, i.e., we treat the model as if it were infinite. An exception is the path length, a global measure of the network structure that has to depend on the model size. In our study, the information about model size is, however, introduced only in the later steps of the path length computation. (In)finite model size becomes an issue if we want to compare our model to a simulated, and therefore finite-size, network. In **Figure 8**, we compare the expected motif counts obtained from our model to those obtained from the uniform random and the locally coupled networks. The result shown for the biggest value of the effective radius is biased due to the finite number of neurons in the network. While our model does not suffer from this effect, the two simulated networks do. The large effective radius leads to a large number of neighbors; in the considered case, those neighbors represent 97% of all the network nodes. Clearly, both the uniform random and the locally coupled networks become densely connected, close to all-to-all connectivity, so the results obtained in this case visibly deviate from all the other examples.

In our model, the harmonic path length is computed using the iterative equations derived in the Methods section. The obtained harmonic path length is somewhat smaller than the result computed by simulating the equivalent model. In Supplementary Material 3, we analyze the steps in the computation of the path length and identify all the differences between the analytically solved and the simulated model. We first compute all the intermediate steps and probabilities defined by the iterative procedure. Then, we calculate all those intermediate probabilities from the simulated model and compare them to the results of the iterative procedure. The finite size of the model imposes the maximal distance between any pair of neurons. As we approach this maximal distance, the intermediate probabilities computed from simulations converge to zero. In contrast, the intermediate probabilities obtained using the iterative method do not contain the information about the model size, but instead describe an infinite-size model. Next, we compute the distribution of path lengths from the intermediate probabilities. This is possible only if we cut off the intermediate probabilities at the maximal allowed distance between a pair of neurons in the model, and therefore artificially introduce the model size. Consequently, for the larger values of the effective radius the probabilities obtained from the iterative equations drop faster than the same probabilities obtained from the simulations. The harmonic path length obtained from the iterative equations is therefore somewhat smaller than the result of simulations.

The connectivity measures most affected by model size are the small-world coefficients. As the size of simulated networks increases, the small-world coefficient computed using the definition from Watts and Strogatz (1998) increases. At the same time, the coefficient from Telesford et al. (2011) decreases and becomes closer to zero. In both cases, larger analyzed networks are more likely to be classified as small-world networks. The impact of numerical methods, model size, and number of simulation iterations is discussed in Supplementary Material 3. It should also be noted that the two considered definitions of the small-world coefficient lead to somewhat different conclusions. While the coefficient from Telesford et al. (2011) suggests that networks obtained for the effective radius in the interval [1, 2] have the connectivity closest to the small-world networks, the definition from Watts and Strogatz (1998) points at smaller values of the effective radius, namely the interval [0.7, 1]. The interval [1, 2] also maximizes the diversity in the obtained expected motif counts, so the results obtained using the coefficient from Telesford et al. (2011) better agree with the motif analysis. On the other hand, this coefficient seems to be more sensitive to the methodology used to compute the harmonic path length, although both considered methods (our iterative method and the numerical simulations) give qualitatively similar results.

#### 4.9. Related Modeling Studies

Two previous modeling studies, Herzog et al. (2007) and Voges et al. (2010), use a similar neuron description to address the same problem, i.e., how the coarse scale properties of neuronal morphology shape the connectivity in large networks. They examined a neuron model that reproduces patchy connections observed in the cortex. Axons are modeled as Gaussian neurite fields with the axon center displaced from the soma in order to capture the long-distance connectivity in the considered networks. A neuron is allowed to connect to other neurons close to its soma and also to the neurons close to its displaced axon field. The generated networks exhibit small-world properties suggesting optimized wiring in the cortex. In our model, both axons and dendrites are described by neurite fields, but axons can connect only to the dendrites that are sufficiently close to the axonal field. Our model is constructed to capture general properties of neuronal morphology suggested by Snider et al. (2010), while the studies in Herzog et al. (2007) and Voges et al. (2010) focus on the specific types of pyramidal cells with long patchy projections and the neuronal connectivity derived from this property. In a recent study (McAssey et al., 2014), a similar model that uses neurite density fields to represent axons and dendrites is analyzed through simulations. The authors carefully fitted the density fields using the reconstructed neuronal morphologies fed to the simulator (Koene et al., 2009). They demonstrated the realistic distribution of potential synapses and the optimal properties of the obtained networks treated as weighted graphs. The results suggest that these networks possess properties similar to the small-world networks.

The model we considered in this study is very similar to those described in Herzog et al. (2007), Voges et al. (2010), and McAssey et al. (2014), but we opted for a different approach to analyzing the model. Instead of simulating the model for different parameter sets, we derived the analytical solution that allows us to fully understand the significance of the individual model parameters. We introduced the concepts of effective radius and connectivity area. Through these concepts we mapped the parameters of the individual neurons to a combined parameter that further determines the network-level properties. Additional work should be done to estimate this parameter from the experimental data, an issue that will be a subject of our future studies.

The two-level statistical model analyzed in this study can be seen as a framework to connect single neuron properties with the network-level organization. The main question is how to reduce the number of parameters in the neuron model so that it can be embedded more easily into the network-level model. Ideally, the single neuron model should include as many details as possible, which are then reduced using averaging and statistical description into a lower-dimensional representation. The lower-dimensional representation should make it possible to clearly track the most crucial aspects of the neuron model when embedded into the network. We followed this methodology by introducing the concept of effective radius. The adopted methodology provides flexibility in the selection of model components and allows easier modification of the presented framework to include new aspects of neurons and neuronal populations.

### Acknowledgments

This work was supported by Academy of Finland project 132877, Foundation of Tampere University of Technology (M.-L.L).

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnana.2015.00076/abstract

### References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Aćimović, Mäki-Marttunen and Linne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Homeostatic structural plasticity can account for topology changes following deafferentation and focal stroke

#### *Markus Butz <sup>1</sup> \*, Ines D. Steenbuck <sup>2</sup> and Arjen van Ooyen <sup>3</sup>*

*<sup>1</sup> Simulation Lab Neuroscience - Bernstein Facility for Simulation and Database Technology, Institute for Advanced Simulation, Jülich Aachen Research Alliance, Forschungszentrum Jülich, Jülich, Germany*

*<sup>2</sup> Student of the Medical Faculty, University of Freiburg, Freiburg, Germany*

*<sup>3</sup> Department of Integrative Neurophysiology, VU University Amsterdam, Amsterdam, Netherlands*

#### *Edited by:*

*Stephen Eglen, University of Cambridge, UK*

#### *Reviewed by:*

*Volker Steuber, University of Hertfordshire, UK Mika Rubinov, University of Cambridge, UK*

#### *\*Correspondence:*

*Markus Butz, Simulation Lab Neuroscience - Bernstein Facility for Simulation and Database Technology, Institute for Advanced Simulation, Jülich Aachen Research Alliance, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, 52428 Jülich, Germany e-mail: m.butz@fz-juelich.de*

After brain lesions caused by tumors or stroke, or after lasting loss of input (deafferentation), inter- and intra-regional brain networks respond with complex changes in topology. Not only areas directly affected by the lesion but also regions remote from the lesion may alter their connectivity, a phenomenon known as diaschisis. Changes in network topology after brain lesions can lead to cognitive decline and increasing functional disability. However, the principles governing changes in network topology are poorly understood. Here, we investigated whether homeostatic structural plasticity can account for changes in network topology after deafferentation and brain lesions. Homeostatic structural plasticity postulates that neurons aim to maintain a desired level of electrical activity by deleting synapses when neuronal activity is too high and by providing new synaptic contacts when activity is too low. Using our Model of Structural Plasticity, we explored how local changes in connectivity induced by a focal loss of input affected global network topology. In accordance with experimental and clinical data, we found that after partial deafferentation, the network as a whole became more random, although it maintained its small-world topology, while deafferentated neurons increased their betweenness centrality as they rewired and returned to the homeostatic range of activity. Furthermore, deafferentated neurons increased their global but decreased their local efficiency and developed longer-tailed degree distributions, indicating the emergence of hub neurons. Together, our results suggest that homeostatic structural plasticity may be an important driving force for lesion-induced network reorganization and that the increase in betweenness centrality of deafferentated areas may serve as a biomarker for brain repair.

**Keywords: topology, deafferentation, focal retinal lesion, neuronal network model, structural plasticity, homeostatic plasticity, stroke, epileptogenesis**

#### **1. INTRODUCTION**

Repair of brain networks following lesions, stroke or neurodegeneration goes along with massive rewiring of connections. Rewiring is brought about by synapse formation and deletion, dendritic remodeling, and axonal sprouting, pruning and re-routing (structural plasticity) (Butz et al., 2009b). Network rewiring induced by lesions or neuronal loss contributes to changes in network topology associated with tumors (Bartolomei et al., 2006; Honey and Sporns, 2008), stroke (van Meer et al., 2012; Yin et al., 2013), and neurodegenerative diseases, including Alzheimer's disease (Stam et al., 2009; Sanz-Arigita et al., 2010) and multiple sclerosis (He et al., 2009; Tewarie et al., 2014). Interestingly, in all these pathologies, brains become more randomly connected or lose complexity of hierarchical structure (Tewarie et al., 2014). Increasing randomness and decreasing betweenness centrality (a topological measure for the importance of neurons in a network) correlate with network degeneration and decline in cognitive function (Bosma et al., 2009; Schoonheim et al., 2013). An important aspect of network rewiring is diaschisis (von Monakov, 1914; Andrews, 1991), the phenomenon that brain regions not directly affected by the primary lesion but deafferentated by the lesion change their connectivity. Extending this early concept of diaschisis, recent studies analysing neuroimaging data (e.g., from stroke patients) using graph theoretical methods have revealed complex changes in global network topology after brain lesions (Honey and Sporns, 2008; Alstott et al., 2009; Carter et al., 2012; van Meer et al., 2012; Rehme and Grefkes, 2013). These studies showed that while brain networks as a whole generally become more random following network rewiring, the deafferentated areas themselves increase their betweenness centrality (Wang et al., 2010)—an unexpected result because random networks tend to have nodes with low betweenness centrality. 
Changes in topology after brain damage have mostly been reported for inter-area connectivity (Wang et al., 2010), but both global inter-area connectivity and local intra-area connectivity rewire after lesions (Murphy and Corbett, 2009; Winship and Murphy, 2009).

Topology changes in inter-area and intra-area connectivity are poorly understood, partly because of a lack of understanding of the principles governing structural plasticity. An elegant way to study structural plasticity after deafferentation is the experimental paradigm of focal retinal lesions (Eysel et al., 1980; Keck et al., 2008; Yamahachi et al., 2009). In this paradigm, the primary lesion is made in the eye so that no damage of brain tissue overlays the massive cortical reorganization following deafferentation (Darian-Smith and Gilbert, 1994; Keck et al., 2008; Yamahachi et al., 2009; Keck et al., 2011; Marik et al., 2014). Compared with the brain, the eye is also better accessible for lesioning, and because of retinotopy, the retinal lesion leads to a well-defined deafferentated lesion projection zone (LPZ) in the primary visual cortex. Recently, we postulated that the need of neurons to maintain homeostasis of their average electrical activity may act as a driving force for structural plasticity (Butz and van Ooyen, 2013) (see also van Ooyen and van Pelt, 1994; van Ooyen et al., 1995; Butz et al., 2008, 2009a; Tetzlaff et al., 2010; van Ooyen, 2011). We developed a novel computational model, called Model of Structural Plasticity (MSP) (Butz and van Ooyen, 2013; Butz et al., 2014), in which neurons create new dendritic spines and axonal boutons when neuronal activity is below a homeostatic set-point, and delete spines and boutons when activity is above the set-point. Synapses are formed by merging spines and boutons. 
Using MSP, we showed (Butz and van Ooyen, 2013) that homeostatic structural plasticity, without any additional forms of Hebbian plasticity, can account for the changes observed in the visual cortex after focal retinal lesions: an increased dendritic spine turnover in the center of the LPZ (Keck et al., 2008), an overshoot in axonal sprouting from the peri-LPZ into the LPZ (Yamahachi et al., 2009), and a functional retinotopic remapping (Giannikopoulos and Eysel, 2006; Keck et al., 2008). In MSP, changes in topology arising from structural plasticity do not require any goal-directed network process but emerge solely from a local neuronal mechanism aimed at restoring neuronal firing rates.

Here, we investigated how local changes in connectivity brought about by homeostatic structural plasticity altered intra-area connectivity. Currently, there are no experimental studies available on intra-area topology changes after brain damage or deafferentation, but we found remarkable similarities between our model results and observed changes in inter-area connectivity, especially after subcortical stroke. As a direct result of network rewiring after focal deafferentation, the model network as a whole first increased its small-worldness and then became more random and consequently less small-world. At the same time that the whole network became more random, the deafferentated neurons themselves increased their betweenness centrality if network repair was successful. The increase in betweenness centrality may therefore serve as a biomarker for brain repair after deafferentation. The decrease in small-worldness of the whole network was associated with a decrease in local but an increase in global efficiency of the deafferentated neurons, with efficiency defined as the average inverse of shortest paths between neurons. Our modeling results strongly resemble experimental and clinical data showing that during the course of post-stroke reorganization, interregional networks become more random, while areas that lost input as a consequence of the infarct increase their betweenness centrality (Wang et al., 2010). Thus, our model of homeostatic structural plasticity, even though at first sight a model for intra-area reorganization, may provide valuable insights into the mechanisms underlying inter-area topology changes during brain repair.

#### **2. MATERIALS AND METHODS**

#### **2.1. THE MODEL AT A GLANCE**

Our Model of Structural Plasticity (MSP) (Butz and van Ooyen, 2013; Butz et al., 2014) represents synapses not merely as synaptic weight factors but as composed of two complementary synaptic elements: an axonal element representing axonal boutons or terminals, and a dendritic element representing any postsynaptic specialization on the dendrite (e.g., a dendritic spine). Synaptic elements develop independently of their matching element in an activity-dependent manner. A neuron creates new synaptic elements when its level of electrical activity is below a homeostatic set-point and decreases the number of elements when its activity exceeds this set-point. In addition, neurons need a minimum level of activity to form synaptic elements. Newly formed elements are vacant and available for synapse formation. Vacant axonal and dendritic elements can connect to form a new synapse. Synaptic elements of adjacent neurons are more likely to connect than those of more distant neurons. Vacant synaptic elements that are not used for synapse formation decay spontaneously with a certain rate. Existing synapses can break up if an element bound in a synapse is removed by the hosting neuron. The complementary synaptic element of the broken-up synapse becomes vacant and available for synapse formation again, which enables structural rewiring of neuronal networks. The algorithm proceeds in three steps. First, electrical activity is computed for every neuron. Second, numbers of synaptic elements are updated depending on the current average level of electrical activity of each neuron, which may cause the breaking of synapses. Third, vacant synaptic elements are recombined to form new synapses. Changes in electrical activity and number of synaptic elements proceed on a continuous timescale, whereas the breaking and formation of synapses take place at discrete time steps.

#### **2.2. NEURON MODEL**

The same network and neuron model was used as in Butz and van Ooyen (2013), with *n*<sup>ex</sup> = 320 excitatory and *n*<sup>in</sup> = 80 inhibitory Izhikevich neurons (Izhikevich, 2003). Inhibitory neurons only differ from excitatory ones in the sign of synaptic transmission. Excitatory neurons were placed with some jitter on a 20 × 16 grid with a spatial distance between two grid points of 150 µm. Inhibitory neurons were placed evenly between the excitatory neurons. Electrical activity is modeled by two differential equations, one for the membrane potential *v* and one for a recovery variable *u* enabling re-polarization after an action potential:

$$\frac{dv}{dt} = k\_1 v^2 + k\_2 v + k\_3 - u + I\_{\text{syn}} + I\_{\text{ext}}$$

$$\frac{du}{dt} = a(bv - u) \tag{1}$$

where *v* and *u* are in mV, *t* is in ms, *k*<sub>1</sub> = 0.04 mV<sup>−1</sup>ms<sup>−1</sup>, *k*<sub>2</sub> = 5 ms<sup>−1</sup>, and *k*<sub>3</sub> = 140 mVms<sup>−1</sup>. Every time a neuron fires (*v* ≥ 30 mV), *v* and *u* are reset:

$$\text{if } v \ge 30 \text{ mV, then } \begin{cases} v \gets c \\ u \gets u + d \end{cases} \tag{2}$$

where *a* = 0.1 ms<sup>−1</sup>, *b* = 0.2 ms<sup>−1</sup>, *c* = −65 mV, and *d* = 2 mVms<sup>−1</sup>. Synaptic input *I*<sub>syn</sub> has a fixed strength of 1 mVms<sup>−1</sup> for every synapse. Synaptic input arriving at the postsynaptic neuron is low-pass filtered by an exponential filter function *h*(*t*) = exp(−*t*/μ) with decay constant μ = 5 ms. External input *I*<sub>ext</sub> is permanently delivered as white noise with mean 5 mVms<sup>−1</sup> and standard deviation 1 mVms<sup>−1</sup>, following Izhikevich (2003) and Butz and van Ooyen (2013).

Intracellular calcium concentration is used as a low-pass filtered average of the firing frequency of each neuron (Butz and van Ooyen, 2013). Every time a neuron fires, calcium concentration is increased by β = 0.001 ms<sup>−1</sup> and then decreases exponentially to zero with decay time τ<sub>Ca</sub> = 10000 ms.
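As an illustration only, the neuron and calcium dynamics described above can be integrated with a simple forward-Euler scheme. The sketch below is not the implementation used in this study: the step size `dt`, the initial state, the per-step sampling of the noise, and the name `simulate_neuron` are assumptions, and synaptic input is omitted so that a single isolated neuron is simulated.

```python
import random

def simulate_neuron(i_ext_mean=5.0, i_ext_sd=1.0, t_max_ms=1000.0, dt=0.5, seed=0):
    """Forward-Euler sketch of one isolated Izhikevich neuron with a calcium trace."""
    rng = random.Random(seed)
    k1, k2, k3 = 0.04, 5.0, 140.0        # coefficients of Equation 1
    a, b, c, d = 0.1, 0.2, -65.0, 2.0    # parameter values quoted in the text
    beta, tau_ca = 0.001, 10000.0        # calcium increment per spike, decay time (ms)
    v, u, ca = c, b * c, 0.0             # initial state: an assumption
    spikes = 0
    for _ in range(int(t_max_ms / dt)):
        i_ext = rng.gauss(i_ext_mean, i_ext_sd)  # noise redrawn every step (assumption)
        dv = k1 * v * v + k2 * v + k3 - u + i_ext
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        ca -= dt * ca / tau_ca                   # exponential decay toward zero
        if v >= 30.0:                            # spike: reset as in Equation 2
            v = c
            u += d
            ca += beta
            spikes += 1
    return spikes, ca
```

Calling `simulate_neuron()` returns the spike count and the final calcium level; because calcium only grows by β per spike and otherwise decays, the returned level can never exceed β times the number of spikes.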

#### **2.3. MODEL OF STRUCTURAL PLASTICITY**

We used our model of structural plasticity (MSP), which is described in detail in Butz and van Ooyen (2013) and Butz et al. (2014). The model proceeds in three steps: (1) updating electrical activity, as described above; (2) updating the number of synaptic elements and, if synaptic elements were deleted, breaking the affected synapses; and (3) the formation of new synapses.

#### *2.3.1. Update of synaptic elements and breaking of synapses*

We applied Gaussian growth curves (**Figure 1**) for the number *A*<sub>*i*</sub> of axonal elements, the number *D*<sup>ex</sup><sub>*i*</sub> of excitatory dendritic elements and the number *D*<sup>in</sup><sub>*i*</sub> of inhibitory dendritic elements:

$$\frac{dz\_i}{dt} = \nu \left( 2 \exp\left( -\left( \frac{\left[ Ca^{2+} \right]\_i - \xi\_z}{\zeta\_z} \right)^2 \right) - 1 \right)$$

$$\xi\_z = \frac{\eta\_z + \epsilon}{2}$$

$$\zeta\_z = \frac{\eta\_z - \epsilon}{2\sqrt{-\ln(1/2)}} \tag{3}$$

where ν is the growth rate and ε is the homeostatic set-point, at which *dz*/*dt* = 0. The variable *z* needs to be replaced by the respective type of synaptic element *A*, *D*<sup>ex</sup>, or *D*<sup>in</sup>. If the calcium concentration [Ca<sup>2+</sup>]<sub>*i*</sub> (a measure for the average electrical activity of the neuron) is higher than ε, synaptic elements are removed; if it is lower than that, synaptic elements are formed. However, there is also a minimum calcium concentration required for the formation of elements: η<sub>*A*</sub> for axonal elements and η<sub>*D*</sub> for dendritic elements. If the concentration is lower than η<sub>*A*</sub>, axonal elements are removed; if it is lower than η<sub>*D*</sub>, dendritic elements are removed. The center and width of the Gaussian-shaped growth curve are given by ξ and ζ, respectively.

*2.3.1.1. Parameters of activity-dependent changes in synaptic elements.* For all types of elements, we chose ν = 10<sup>−4</sup> ms<sup>−1</sup>. As in Butz and van Ooyen (2013), we studied three cases with different sets of growth curves (**Figure 1**): (1) η<sub>*A*</sub> = 0.4, η<sub>*D*</sub> = 0.1, ε = 0.7; (2) η<sub>*A*</sub> = η<sub>*D*</sub> = 0.1, ε = 0.7; and (3) η<sub>*A*</sub> = 0.1, η<sub>*D*</sub> = 0.4, ε = 0.7. The first case is referred to as the physiological case because it best reproduces experimental findings on dendritic spine and axonal bouton dynamics in the primary visual cortex after focal retinal lesion (Butz and van Ooyen, 2013). The other two cases are aberrant cases. The second case is called the recurrent case because network repair is brought about by massive recurrent connections in the LPZ. The third case is called the no-repair case because with this choice of growth parameters, neurons are not able to restore their electrical activity back to the homeostatic set-point.

**FIGURE 1 | Depending on the neuronal growth curves for the change** *dD/dt* **in number of dendritic elements and the change** *dA/dt* **in number of axonal elements, network reorganization after lesions leads to different network topologies.** Changes in the number of elements are dependent on the time-averaged neuronal electrical activity as measured by the cell's intracellular calcium concentration [Ca<sup>2+</sup>]. **(A)** If the minimal activity for dendritic element formation is lower than that for axonal element formation (η<sub>*D*</sub> = 0.1, η<sub>*A*</sub> = 0.4, respectively), networks reorganize in a physiological manner, with axonal and dendritic element dynamics (Butz and van Ooyen, 2013) resembling experimental observations (Keck et al., 2008). **(B)** If dendritic and axonal elements can already grow at low activity levels (η<sub>*D*</sub> = η<sub>*A*</sub> = 0.1), we obtain strongly recurrently connected networks after a lesion. **(C)** If dendritic elements need high levels of activity (η<sub>*D*</sub> = 0.4, η<sub>*A*</sub> = 0.1), no network repair takes place, i.e., no restoration of activity levels. We replaced the homeostatic set-point ε = 0.7 by a homeostatic range of 0.65 ≤ ε̄ ≤ 0.75, in which no change in number of axonal or dendritic elements takes place. We chose ν = 10<sup>−4</sup> ms<sup>−1</sup>.

Since with discrete synaptic elements there is no solution where all neurons are exactly at the homeostatic set-point, neurons will continue to rewire their connectivity at a low rate. To stop network rewiring when neurons are close to the homeostatic set-point ε, we replaced the set-point by a homeostatic range ε̄ = [0.65..0.75]. In this range, neurons do not initiate activity-dependent changes in number of synaptic elements; i.e., *dz*/*dt* = 0 if 0.65 ≤ [Ca<sup>2+</sup>] ≤ 0.75.
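A minimal sketch of the growth rule may help to make the roles of the parameters concrete. It assumes that Equation 3 has the Gaussian form 2·exp(−((Ca − ξ)/ζ)²) − 1 scaled by the growth rate ν, so that the curve crosses zero exactly at the minimum activity η and at the set-point ε, and it applies the homeostatic range as a hard override. The function name `growth_rate` and its argument names are hypothetical.

```python
import math

def growth_rate(ca, eta, eps=0.7, nu=1e-4, homeostatic_range=(0.65, 0.75)):
    """Change dz/dt in the number of synaptic elements at calcium level `ca`."""
    lo, hi = homeostatic_range
    if lo <= ca <= hi:
        return 0.0                                   # no change inside the range
    xi = (eta + eps) / 2.0                           # center of the Gaussian
    zeta = (eta - eps) / (2.0 * math.sqrt(-math.log(0.5)))  # width of the Gaussian
    return nu * (2.0 * math.exp(-((ca - xi) / zeta) ** 2) - 1.0)
```

With the physiological parameters, `growth_rate(ca, eta=0.4)` would describe axonal elements and `growth_rate(ca, eta=0.1)` dendritic elements: both are negative above the homeostatic range and below their respective η, and positive in between.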

In addition to activity-dependent changes in synaptic elements, vacant synaptic elements decay spontaneously with a very slow time constant of τ*vac* = 10 updates in connectivity.

*2.3.1.2. Breaking of synapses.* Since network connectivity is updated at discrete time steps but synaptic elements change continuously over time due to the activity-dependent growth rules, it can happen that a neuron has more outgoing synapses than axonal elements or more incoming synapses than dendritic elements at the time of the next update in network connectivity. In that case, the neuron has to delete the surplus of synapses and to update connectivity.

To update connectivity, the algorithm needs to select which synapses are to be removed. All synapses have an equal chance of being deleted. Note, however, that multiple synapses can co-exist from neuron *j* to *i* and that the more synapses there are, the higher the chance that a synapse between neuron *j* and *i* will be deleted. The probability *P*<sup>del</sup><sub>*i*,*j*</sub> for synapse deletion between neuron *j* and *i* is computed by the following master equation that captures four different cases:

$$P\_{i,j}^{del} = \frac{W\_{i,j}}{\sum W\_{k,l}} \tag{4}$$

For deletion of incoming synapses, we need to distinguish between excitatory and inhibitory synapses in Equation 4. For deletion of incoming excitatory synapses of neuron *i* ∈ {*In* ∪ *Ex*}, we sum up *W*<sub>*k*,*l*</sub> over all *l* ∈ {*Ex*}. For deletion of incoming inhibitory synapses of neuron *i* ∈ {*In* ∪ *Ex*}, we sum up *W*<sub>*k*,*l*</sub> over all *l* ∈ {*In*}. For deletion of outgoing excitatory synapses of excitatory presynaptic neuron *j* ∈ {*Ex*}, all synapses are considered to any postsynaptic neuron *k* ∈ {*In* ∪ *Ex*}. Thus, we sum up *W*<sub>*k*,*l*</sub> over all *k* ∈ {*In* ∪ *Ex*}. The same holds true for outgoing inhibitory synapses with *j* ∈ {*In*}.

Sequentially, outgoing and incoming excitatory and inhibitory synapses were selected for deletion. For every type of synapse, the accumulated sum of *P*<sup>del</sup><sub>*i*,*j*</sub> (see description of Equation 4 for the range of *i* and *j*) gave a probability distribution from which we drew the required number of synapses to be deleted. The selected synapse was deleted by reducing the respective entry *W*<sub>*i*,*j*</sub> in the connectivity matrix by one. It can happen that more than one synapse is selected for deletion from the same connection *j* to *i*. In that case, the implementation of the algorithm made sure that the number of synapses to be deleted did not exceed *W*<sub>*i*,*j*</sub>. Whenever a neuron deletes a synaptic element that is bound in a synapse, the complementary synaptic element on the other neuron remains and becomes vacant again.
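The selection of synapses for deletion can be sketched as a draw from the accumulated probabilities of Equation 4. The example below handles one neuron and one synapse type at a time and is an assumption-laden illustration rather than the original code: the dictionary representation, the name `delete_synapses`, and the choice to draw synapses one at a time (recomputing the sum after each deletion, which guarantees that no connection loses more synapses than it has) are all hypothetical.

```python
import random

def delete_synapses(weights, n_delete, rng):
    """Remove `n_delete` synapses, each existing synapse being equally likely.

    weights: dict mapping a connection (i, j) to its synapse count W[i][j].
    Returns a new dict; the input is left unchanged.
    """
    w = dict(weights)
    for _ in range(n_delete):
        total = sum(w.values())
        if total == 0:
            break                          # nothing left to delete
        r = rng.random() * total           # uniform draw on [0, total)
        acc = 0
        for key, count in w.items():
            acc += count                   # accumulated (unnormalized) probability
            if r < acc:
                w[key] = count - 1         # connections with more synapses are hit more often
                break
    return w
```

For example, `delete_synapses({(0, 1): 3, (0, 2): 1}, 2, random.Random(0))` removes two synapses in total, and the connection with three synapses is three times as likely to be chosen on the first draw.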

#### *2.3.2. Synapse formation*

For synapse formation, the algorithm checked whether a neuron gained vacant synaptic elements, i.e., whether the total number of synaptic elements exceeded the number of bound synaptic elements of this type. Matching vacant synaptic elements (vacant excitatory axonal elements *A*<sup>vac</sup><sub>*j*</sub>, *j* ∈ {*Ex*}, with vacant excitatory dendritic elements *D*<sup>ex,vac</sup><sub>*i*</sub>, and vacant inhibitory axonal elements *A*<sup>vac</sup><sub>*j*</sub>, *j* ∈ {*In*}, with vacant inhibitory dendritic elements *D*<sup>in,vac</sup><sub>*i*</sub>) were randomly connected among each other with probability density function *P*<sup>form</sup>. The probability *P*<sup>form</sup><sub>*i*,*j*</sub> for forming new synapses between neuron *j* and *i* depended on the number of vacant synaptic elements they offered and on the Euclidean distance between neuron *j* and *i*:

$$P\_{i,j}^{\text{form}} = \begin{cases} \dfrac{A\_j^{\text{vac}} \, D\_i^{\text{ex},\text{vac}}}{\sum\_{\iota \in \{\text{Ex}\}} A\_\iota^{\text{vac}} \sum\_{\kappa \in \{\text{Ex} \cup \text{In}\}} D\_\kappa^{\text{ex},\text{vac}}} K\_{i,j} & \text{if } j \in \{\text{Ex}\} \\ \dfrac{A\_j^{\text{vac}} \, D\_i^{\text{in},\text{vac}}}{\sum\_{\iota \in \{\text{In}\}} A\_\iota^{\text{vac}} \sum\_{\kappa \in \{\text{Ex} \cup \text{In}\}} D\_\kappa^{\text{in},\text{vac}}} K\_{i,j} & \text{if } j \in \{\text{In}\} \end{cases} \quad \text{with } i \in \{\text{Ex} \cup \text{In}\} \tag{5}$$

where *K*<sub>*i*,*j*</sub> is the Euclidean distance-dependent likelihood (kernel function) that neuron *j* connects to neuron *i* at all, irrespective of the number of vacant elements *i* and *j* offer. As in our previous work on MSP (Butz and van Ooyen, 2013; Butz et al., 2014), we applied either a flat kernel *K*<sub>*i*,*j*</sub> = 1 (creating random networks) or a two-dimensional Gaussian kernel (creating small-world networks):

$$K\_{i,j,\ i \neq j} = e^{-\frac{(\text{pos}\_{xj} - \text{pos}\_{xi})^2 + (\text{pos}\_{yj} - \text{pos}\_{yi})^2}{\sigma^2}} \tag{6}$$

with *pos*<sub>*xi*</sub> the x-coordinate and *pos*<sub>*yi*</sub> the y-coordinate of postsynaptic neuron *i*, and *pos*<sub>*xj*</sub> and *pos*<sub>*yj*</sub> the coordinates of presynaptic neuron *j*. The probability for autapse connections (i.e., a neuron connecting to itself) was set to zero (*K*<sub>*i*,*j*</sub> = 0 for *i* = *j*). For these simulations, we chose σ = 1 × 150 µm, where 150 µm is the distance between two grid points. Because *K* only depends on the Euclidean distance between neurons and since neurons do not migrate, *K* remains fixed.
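Because the kernel is fixed, it can be computed once before the simulation starts. The following sketch evaluates the Gaussian kernel of Equation 6 for a list of neuron positions given in micrometres; the function name `gaussian_kernel` and the list-of-lists representation are illustrative choices and not part of the original model code.

```python
import math

def gaussian_kernel(positions, sigma=150.0):
    """Return K[i][j] for neurons located at `positions` (x, y) in micrometres."""
    n = len(positions)
    K = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j:                     # autapses are excluded: K[i][i] stays 0
                dist_sq = (xj - xi) ** 2 + (yj - yi) ** 2
                K[i][j] = math.exp(-dist_sq / sigma ** 2)
    return K
```

With σ equal to the grid spacing, direct grid neighbors receive a kernel value of exp(−1) ≈ 0.37 and neurons two grid points apart a value of exp(−4) ≈ 0.018, which illustrates how strongly the kernel favors local connections.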

For every update in connectivity, the smaller of the total numbers of vacant axonal and vacant dendritic elements, determined separately for excitatory and inhibitory elements, set how many new excitatory and inhibitory synapses, respectively, could at most be formed (so-called potential synapses). Thus, the number of excitatory and inhibitory potential synapses equaled

$$M^{\text{PotSyn},\text{ex}} = \min \left( \sum\_{\iota \in \{\text{Ex}\}} A\_{\iota}^{\text{vac}}, \sum\_{\kappa \in \{\text{Ex}\cup\text{In}\}} D\_{\kappa}^{\text{ex},\text{vac}} \right)$$

$$M^{\text{PotSyn},\text{in}} = \min \left( \sum\_{\iota \in \{\text{In}\}} A\_{\iota}^{\text{vac}}, \sum\_{\kappa \in \{\text{Ex}\cup\text{In}\}} D\_{\kappa}^{\text{in},\text{vac}} \right) \tag{7}$$

for every update in connectivity.

From the distribution *P*<sup>form</sup>, the algorithm chose at most *M*<sup>PotSyn,ex</sup> excitatory and *M*<sup>PotSyn,in</sup> inhibitory connections at which new synapses were created. The respective entries *W*<sub>*i*,*j*</sub> in the connectivity matrix were then increased by one. A connection was chosen by drawing a random number from a uniform distribution and comparing it to the accumulated probabilities *P*<sup>form</sup><sub>*i*,*j*</sub> for all excitatory connections and all inhibitory connections of the entire network. The chosen connection was the first one whose accumulated probability was not exceeded by the random number. If, for this try, the random number exceeded all accumulated probabilities, no synapse was formed. Hence, not necessarily all of the potential synapses were formed.

Additionally, synapse formation needed to fulfill the condition that the number *W*<sup>+</sup><sub>*i*,*j*</sub> of newly formed synapses from neuron *j* to *i* did not exceed the number of vacant synaptic elements that neuron *j* and *i* offered:

$$W\_{i,j}^+ \le \begin{cases} \min\left(A\_j^{\text{vac}}, D\_i^{\text{ex}, \text{vac}}\right) & \text{if } j \in \{\text{Ex}\} \\ \min\left(A\_j^{\text{vac}}, D\_i^{\text{in}, \text{vac}}\right) & \text{if } j \in \{\text{In}\} \end{cases} \quad \text{with } i \in \{\text{Ex} \cup \text{In}\} \tag{8}$$

In every update, this condition was checked and synapse formation infringing this condition was rejected. Alternatively, update of connectivity can also be implemented in a purely local fashion (Butz and van Ooyen, 2013).
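The formation step can be summarized in a short sketch for one synapse type (e.g., excitatory). It is a simplified reading of Equations 5, 7, and 8 under several assumptions that are not stated in the text: the formation probabilities are computed once per connectivity update, the remaining vacant elements are tracked per neuron (a slightly stricter check than the pairwise bound of Equation 8, which it implies), and the names `form_synapses`, `a_vac`, `d_vac`, and `kernel` are hypothetical.

```python
import random

def form_synapses(a_vac, d_vac, kernel, rng):
    """Return {(i, j): number of new synapses from presynaptic j to postsynaptic i}.

    a_vac[j]: vacant axonal elements of neuron j; d_vac[i]: vacant dendritic
    elements of neuron i; kernel[i][j]: distance-dependent likelihood K.
    """
    total_a, total_d = sum(a_vac), sum(d_vac)
    new = {}
    if total_a == 0 or total_d == 0:
        return new
    pairs, cumulative, acc = [], [], 0.0
    for i, d in enumerate(d_vac):
        for j, a in enumerate(a_vac):
            p = a * d * kernel[i][j] / (total_a * total_d)   # Equation 5
            if p > 0.0:
                acc += p
                pairs.append((i, j))
                cumulative.append(acc)
    rem_a, rem_d = list(a_vac), list(d_vac)      # vacant elements still available
    for _ in range(min(total_a, total_d)):       # potential synapses, Equation 7
        r = rng.random()
        for (i, j), c in zip(pairs, cumulative):
            if r < c:                            # first accumulated value not exceeded
                if rem_a[j] > 0 and rem_d[i] > 0:    # otherwise the try is rejected
                    rem_a[j] -= 1
                    rem_d[i] -= 1
                    new[(i, j)] = new.get((i, j), 0) + 1
                break                            # if r exceeds all values: no synapse
    return new
```

Since the kernel values are at most one, the accumulated probabilities generally sum to less than one, so some tries form no synapse; this reproduces the statement that not all potential synapses are necessarily realized.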

#### **2.4. MODELING DEAFFERENTATION**

We grew every model network from scratch, i.e., starting with zero connectivity and zero synaptic elements. Networks were formed by exactly the same growth rules that were effective after the lesion. However, in order to grow networks from scratch, it was necessary to use initially a higher level of external input. We used *I*<sub>ext</sub> = 8 mVms<sup>−1</sup> for the first 500 updates in connectivity and then lowered it gradually down to 5 mVms<sup>−1</sup> according to *I*<sub>ext</sub>(*T*) = ((8 − 5)/(1 + exp((*T* − 500)/200)) + 5) mVms<sup>−1</sup>. At *T* = 8000, we removed the input of a circumscribed area, the lesion projection zone (LPZ), by setting *I*<sub>ext,LPZ</sub>(*T*) = 0 (for *T* ≥ 8000) permanently. The LPZ spans from *x*<sub>1</sub> = 5 × 150 µm to *x*<sub>2</sub> = 12 × 150 µm and from *y*<sub>1</sub> = 5 × 150 µm to *y*<sub>2</sub> = 12 × 150 µm (cf. **Figure 5**) for all simulations and all cases (cf. Section 2.3.1, Update of synaptic elements and breaking of synapses). We refer to the rest of the network with intact input as "intact zone." Every simulation is continued for another *T* = 12000 updates in connectivity. As in our previous work (Butz and van Ooyen, 2013), we matched 1000 updates in connectivity with 14 days post-lesion. Thus, simulations predict the time course of network rewiring for 24 weeks after the lesion.
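The input schedule and the conversion from connectivity updates to post-lesion time are simple enough to check by hand; the sketch below does so in code. The function names are hypothetical, and treating the LPZ boundaries as inclusive is an assumption.

```python
import math

GRID = 150.0  # distance between two grid points in micrometres

def i_ext(T):
    """Mean external input (mV/ms) at connectivity update T during network growth."""
    return (8.0 - 5.0) / (1.0 + math.exp((T - 500.0) / 200.0)) + 5.0

def in_lpz(x, y):
    """True if position (x, y) lies in the lesion projection zone (inclusive bounds assumed)."""
    return 5 * GRID <= x <= 12 * GRID and 5 * GRID <= y <= 12 * GRID

def updates_to_weeks(n_updates, days_per_1000_updates=14.0):
    """Convert a number of connectivity updates into weeks post-lesion."""
    return n_updates * days_per_1000_updates / 1000.0 / 7.0
```

The sigmoid passes through its midpoint at *T* = 500, where `i_ext(500)` equals 6.5, and has effectively reached 5 well before the lesion at *T* = 8000. Likewise, 12000 updates × 14 days per 1000 updates = 168 days, i.e., the 24 weeks stated above.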

#### **2.5. TOPOLOGY MEASUREMENTS**

A neuronal network can be seen as a graph, with neurons as nodes and synapses as edges or links between nodes. Since the presynaptic neuron always activates the postsynaptic neuron (and never the other way round), we regard the graph as directed. In order to describe changes in network topology after a focal loss of input, we assessed the following graph theoretical measures at every update in connectivity. To reduce the complexity of the assessment, we considered only the topology of the excitatory synaptic connections *W*<sup>ex,ex</sup> between the *n*<sup>ex</sup> excitatory neurons. For the graph theoretical assessments, the Brain Connectivity Toolbox by Rubinov and Sporns was used (Rubinov and Sporns, 2010).

#### *2.5.1. Weighted characteristic path length*

The characteristic path length *L* measures the average shortest path from one (excitatory) neuron to any other (excitatory) neuron in the network. Path length is defined as the number of connections that needs to be traveled to go from one neuron (possibly via intermediate neurons) to any other neuron:

$$L = \frac{1}{n^{\rm ex}} \sum\_{i}^{n^{\rm ex}} L\_i = \frac{1}{n^{\rm ex}} \sum\_{i}^{n^{\rm ex}} \frac{\sum\_{j, \, j \neq i}^{n^{\rm ex}} d\_{ij}}{n^{\rm ex} - 1} \tag{9}$$

On top of this definition, a direct connection between two neurons in a weighted network is considered "shorter" the stronger the weight of the connection is. For our network model, we take the number of synapses *W*<sup>ex,ex</sup><sub>*i*,*j*</sub> between two directly linked neurons *j* and *i*, with *i*, *j* ∈ *Ex*, as the weight of the connection and the inverse 1/*W*<sup>ex,ex</sup><sub>*i*,*j*</sub> as the length *l*<sub>*i*,*j*</sub> of the connection. The shortest path *d*<sub>*i*,*j*</sub> is then the smallest sum of connection lengths that lead from neuron *j* to *i* via any intermediate neurons. We calculated the weighted characteristic path length according to Rubinov and Sporns (2010). Additionally, in order to study the connectivity between subnetworks, we used Equation 9 to compute the average path lengths from neurons in the intact zone (with intact input) to neurons in the LPZ (deprived of input) and vice versa.
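To illustrate the definition, the weighted characteristic path length can be computed for a small synapse-count matrix with the Floyd-Warshall algorithm. This is a didactic sketch and not the toolbox routine used in the study; the function names are hypothetical, and the convention that `W[i][j]` counts synapses from neuron *j* to neuron *i* follows the notation of the text.

```python
import math

def shortest_paths(W):
    """d[i][j]: length of the shortest path from neuron j to neuron i (lengths 1/W)."""
    n = len(W)
    d = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else math.inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):                    # allow neuron k as an intermediate station
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def characteristic_path_length(W):
    """Equation 9: mean shortest path over all ordered pairs of distinct neurons."""
    n = len(W)
    d = shortest_paths(W)
    return sum(d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
```

For a directed ring 0 → 1 → 2 → 0 in which the connection 1 → 2 consists of two synapses and the others of one, the six shortest paths have lengths 1, 1, 0.5, 1.5, 1.5, and 2, giving *L* = 7.5/6 = 1.25. If any neuron cannot be reached from another, this measure becomes infinite.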

#### *2.5.2. Weighted clustering coefficient*

The clustering coefficient is an indication of how strongly neurons in a network are interconnected. It measures how many pairs of neurons *j* and *h* that are both connected to node *i* are also connected to each other, relative to all pairs of neurons connected to *i*:

$$C = \frac{1}{n^{\rm ex}} \sum\_{i}^{n^{\rm ex}} C\_i = \frac{1}{n^{\rm ex}} \sum\_{i}^{n^{\rm ex}} \frac{\sum\_{j,h}^{n^{\rm ex}} a\_{ij} a\_{ih} a\_{jh}}{k\_i (k\_i - 1)} \tag{10}$$

where *aij*, *aih*, *ajh* ∈ {0, 1} (1 if a connection between the respective neurons exists and 0 if not) and *ki* is the number of neurons that neuron *i* is connected to. For weighted directed networks, the clustering coefficient can be computed according to the formalism by Fagiolo (2007). We computed the clustering coefficient at every update in connectivity according to the implementation by Rubinov and Sporns (2010). In addition to the averaged clustering coefficient of the entire network, we also computed the clustering coefficient averaged over either the LPZ neurons only or over the intact zone neurons only.
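As a point of reference, the binary form of Equation 10 can be evaluated directly. The sketch below assumes a symmetric 0/1 adjacency matrix and therefore does not implement the weighted, directed formalism of Fagiolo (2007) that was used for the results; setting *C*<sub>*i*</sub> = 0 for neurons with fewer than two neighbors is an additional assumption, and the function name is hypothetical.

```python
def clustering_coefficient(adj):
    """Mean binary clustering coefficient (Equation 10) for a symmetric 0/1 matrix."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        neighbours = [j for j in range(n) if j != i and adj[i][j]]
        k = len(neighbours)
        if k < 2:
            continue                      # C_i taken as 0 with fewer than two neighbors
        # ordered pairs (j, h) of neighbors of i that are themselves connected
        links = sum(adj[j][h] for j in neighbours for h in neighbours if j != h)
        total += links / (k * (k - 1))
    return total / n
```

A fully connected triangle yields *C* = 1, whereas a chain of three neurons yields *C* = 0 because the two neighbors of the middle neuron are not connected to each other.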

#### *2.5.3. Small-world parameter*

To estimate the small-worldness of networks, we applied the formalism by Humphries and Gurney (2008):

$$S = \frac{\gamma}{\lambda} = \frac{C/C^{\rm rand}}{L/L^{\rm rand}} \tag{11}$$

We replaced the clustering coefficient *C* and the characteristic path length *L* by the version for weighted directed graphs as described above. To obtain the normalized clustering coefficient γ and the normalized characteristic path length λ, *C* and *L* were divided by *C*<sup>rand</sup> and *L*<sup>rand</sup>, respectively, taken from an Erdős-Rényi random graph generated with the same number of neurons and synapses as in the deafferentated networks at every update in connectivity.
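The normalization can be sketched as follows. How multiple synapses between the same pair of neurons were treated when generating the random reference graph is not specified here, so the generator below, which scatters the given number of synapses uniformly over all ordered pairs of distinct neurons and allows repeats, is only one possible reading; both function names are hypothetical.

```python
import random

def random_reference(n, n_synapses, rng):
    """Random synapse-count matrix with n neurons and n_synapses synapses, no autapses."""
    W = [[0] * n for _ in range(n)]
    for _ in range(n_synapses):
        i, j = rng.sample(range(n), 2)    # two distinct neurons, so i != j
        W[i][j] += 1
    return W

def small_world(C, L, C_rand, L_rand):
    """Small-world parameter S = gamma / lambda of Equation 11."""
    gamma = C / C_rand                    # normalized clustering coefficient
    lam = L / L_rand                      # normalized characteristic path length
    return gamma / lam
```

A network that is four times more clustered than its random reference but has the same characteristic path length obtains *S* = 4, while a network indistinguishable from the reference obtains *S* = 1.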

#### *2.5.4. Betweenness centrality*

Betweenness centrality measures the importance of neurons in the network. Betweenness centrality of a neuron is calculated by counting, for every pair of other neurons, the shortest paths between them that pass through this neuron and dividing this count by the total number of shortest paths between that pair. Global betweenness centrality is the sum over the betweenness centrality of all neurons:

$$BC\_{\rm global} = \sum\_{i}^{n^{\rm ex}} \sum\_{k \neq i \neq l} \frac{\sigma\_{kl}(i)}{\sigma\_{kl}} \tag{12}$$

where σ<sub>*kl*</sub> is the total number of (possibly multiple) shortest paths between neuron *k* and neuron *l*, and σ<sub>*kl*</sub>(*i*) is the number of shortest paths that go via neuron *i*. Shortest paths are based on weighted excitatory connections *W*<sup>ex,ex</sup><sub>*i*,*j*</sub>, and global betweenness centrality was computed by the formalism for weighted directed networks by Brandes (2001) as implemented by Rubinov and Sporns (2010).
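For readers unfamiliar with the algorithm of Brandes (2001), the following sketch shows its structure for the simpler case of an unweighted directed graph, in which shortest paths are found by breadth-first search. It is therefore a simplification of the weighted variant used for the results, and the adjacency-list representation and function name are illustrative assumptions.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of every node; adj[v] lists the successors of node v."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        stack = []
        preds = [[] for _ in range(n)]    # predecessors on shortest paths from s
        sigma = [0] * n                   # number of shortest paths from s
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        while stack:                      # accumulate dependencies in reverse order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

In a diamond-shaped graph with connections 0 → 1, 0 → 2, 1 → 3, and 2 → 3, the two shortest paths from neuron 0 to neuron 3 are split equally between the intermediate neurons, so each obtains a betweenness centrality of 0.5 and the global value of Equation 12 is `sum(betweenness(adj))` = 1.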

#### *2.5.5. Local efficiency*

Local efficiency *E*<sub>loc,*i*</sub> measures how well the neighbors of neuron *i*, i.e., other neurons that directly form a synapse with *i*, are interconnected and is therefore related to the clustering coefficient. For this, the average of the inverse shortest path lengths *d*<sub>*jh*</sub>(*G*<sup>ex</sup><sub>*i*</sub>) between any two excitatory neighboring neurons *j* and *h* of neuron *i* is computed, using only paths of the subgraph *G*<sup>ex</sup><sub>*i*</sub> consisting of all the excitatory neighbors of *i* but not of *i* itself (Latora and Marchiori, 2001):

$$\begin{split} E\_{\text{loc}} &= \frac{1}{n^{\text{ex}}} \sum\_{i}^{n^{\text{ex}}} E\_{\text{loc},i} \\ &= \frac{1}{n^{\text{ex}}} \sum\_{i}^{n^{\text{ex}}} \frac{\sum\_{j,h,\ j,h \neq i}^{n^{\text{ex}}} a\_{ij} a\_{jh} \left[ d\_{jh} (G\_i^{\text{ex}}) \right]^{-1}}{k\_i (k\_i - 1)} \end{split} \tag{13}$$

where *aij*, *ajh* ∈ {0, 1} (1 if a connection between the respective neurons exists and 0 if not) and *ki* is the number of neurons that neuron *i* is connected to. We used the weighted, directed version of local efficiency (Rubinov and Sporns, 2010).
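The following Python sketch illustrates the idea behind Equation 13 for a binary, undirected graph: for each node, shortest paths are searched only within the subgraph formed by its neighbors, and the inverse path lengths are averaged over all ordered neighbor pairs. The binary undirected setting, the adjacency-dictionary input, and the function names are assumptions for illustration; the study used the weighted, directed variant.

```python
from collections import deque


def _hop_distances(adj, source, allowed):
    """Breadth-first hop counts from source, visiting only nodes in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_efficiency(adj):
    """Return (mean local efficiency, per-node values).

    adj maps each node to the set of its neighbours (no self-loops).
    Nodes with fewer than two neighbours get a local efficiency of 0.
    """
    per_node = {}
    for i, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            per_node[i] = 0.0
            continue
        total = 0.0
        for j in neighbours:
            # Paths are restricted to the neighbours of i, which excludes i itself.
            dist = _hop_distances(adj, j, neighbours)
            total += sum(1.0 / d for h, d in dist.items() if h != j)
        per_node[i] = total / (k * (k - 1))
    mean = sum(per_node.values()) / len(per_node)
    return mean, per_node
```

Neighbor pairs that cannot reach each other inside the subgraph simply contribute nothing to the sum, which mirrors the treatment of unconnected nodes in the efficiency measures.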

#### *2.5.6. Global efficiency*

Global efficiency *Eglob* is related to the inverse of the characteristic path length, but with the advantage that it can also be meaningfully computed for unconnected graphs. Whereas the path length between unconnected nodes is infinite (cf. Equation 9), the inverse is zero and therefore adds neutrally to global efficiency (Latora and Marchiori, 2001; Achard and Bullmore, 2007):

$$E_{\text{glob}} = \frac{1}{n^{\text{ex}}} \sum_{i}^{n^{\text{ex}}} E_{\text{glob},i} = \frac{1}{n^{\text{ex}}} \sum_{i}^{n^{\text{ex}}} \frac{\sum_{j,\, j \neq i}^{n^{\text{ex}}} d_{ij}^{-1}}{n^{\text{ex}} - 1} \tag{14}$$

where *Eglob*,*<sup>i</sup>* is the efficiency of node *i* and *nex* is the number of excitatory neurons. We used the version of this equation for weighted, directed graphs (Rubinov and Sporns, 2010). Note that local efficiency and clustering coefficient as well as global efficiency and characteristic path length are closely related but not identical measures. Local and global efficiency are frequently used in clinical studies and are therefore presented here in addition to clustering coefficient and characteristic path length.
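A minimal Python sketch of Equation 14 for a binary directed graph is given below. Unreachable pairs never enter the sum of inverse distances, which corresponds to a contribution of zero. Hop counts instead of weighted distances and the function name are assumptions for illustration, not the weighted implementation used in the study.

```python
from collections import deque


def global_efficiency(n, edges):
    """Return (global efficiency, per-node efficiencies) for a directed graph.

    edges is an iterable of (pre, post) pairs over nodes 0..n-1, with n >= 2.
    """
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
    per_node = []
    for i in range(n):
        dist = {i: 0}
        queue = deque([i])
        while queue:  # breadth-first search gives hop distances d_ij
            u = queue.popleft()
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes are absent from dist and add 0 to the sum.
        inv_sum = sum(1.0 / d for j, d in dist.items() if j != i)
        per_node.append(inv_sum / (n - 1))
    return sum(per_node) / n, per_node
```

For the chain 0 → 1 → 2, the per-node efficiencies are 0.75, 0.5, and 0, although the characteristic path length of this graph would be infinite for every pair that runs against the direction of the chain.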

#### **3. RESULTS**

#### **3.1. PHYSIOLOGICAL NETWORK REWIRING**

In our previous work (Butz and van Ooyen, 2013), we postulated activity-dependent growth curves for axonal and dendritic elements that gave rise to the same kind of network rewiring as observed in primary visual cortex after focal retinal lesions. With these growth curves (referred to as physiological growth curves), in which axonal elements required higher levels of electrical activity than dendritic elements to grow out (η*<sup>A</sup>* = 0.4, η*<sup>D</sup>* = 0.1), the LPZ recovered from the outside to the inside and the turnover of dendritic elements was surprisingly similar to the experimental data on dendritic spine turnover (Butz and van Ooyen, 2013). In the present study, we investigated how network topology changes in response to a focal loss of input, with neurons rewiring their inputs (and outputs) locally in order to restore a desired level of electrical activity. Our modeling results show that networks employing physiological growth curves return to a homeostatic range in electrical activity (**Figure 2A**) and, as a result of compensatory rewiring, become more randomly connected, as indicated by a lower value of the small-world parameter *S* (**Figure 2B**) measured over the entire network. Although random networks have no nodes of particular importance and hence a low betweenness centrality, neurons in the LPZ have a higher betweenness centrality after network rewiring than before the lesion (**Figure 2C**).

The decrease in small-world parameter *S* is determined by the course of the clustering coefficient γ and the characteristic path length λ. While λ converges to one, γ decreases markedly (**Figure 3**) and is thereby responsible for networks becoming more random. The decrease in clustering is not immediate but sets in between 6 and 8 weeks after the lesion. As will be shown below, it takes some time until network reorganization has managed to restore neuronal activities to their homeostatic range. During this time period, there is a temporary drop in characteristic path length below one, which contributes to a temporary rise in *S* (**Figure 2B**). However, after about 16 weeks, λ reaches stable values around one. From the same time on, *S* stabilizes at lower levels than in control networks without lesions.

**FIGURE 2 | Physiological case.** Compensatory network rewiring renders neuronal networks more random and increases their betweenness centrality. **(A)** Average electrical activities, as measured by the mean calcium concentration of the respective area, are restored to the homeostatic range for neurons in the LPZ (red) and the intact zone (green). Neurons corresponding to the LPZ in a non-lesioned network do not alter their calcium concentration (control, black). **(B)** Networks become more random after deafferentation, as indicated by a decrease in small-world parameter *S* (red) measured over the entire network, whereas control networks show no change in small-worldness (black). **(C)** At the same time, betweenness centrality increases in the LPZ (red) but decreases in the intact zone (green). Betweenness centrality of neurons corresponding to the LPZ in a non-lesioned network remains stable (control, black). Means over five simulations per scenario. Shadings of the curves indicate standard deviations.

From our previous work on modeling cortical rewiring after focal retinal lesions (Butz and van Ooyen, 2013), we know that functional network repair can be brought about by an ingrowth of connections from the intact zone to the LPZ, which has also been observed experimentally (Darian-Smith and Gilbert, 1994; Yamahachi et al., 2009). For physiological network repair to go along with functional retinotopic remapping (as shown in mice; Keck et al., 2008), we found that it is important that the majority of new connections impinging on deafferentated neurons originates from intact areas and transmits electrical activity from the intact zone to the LPZ. Here, we further investigate whether the changes in global topology parameters express this ingrowth of connections. For this, we first focus on the activity-dependent changes in synapse numbers and connectivity between the intact zone and the LPZ. The first 6 weeks are dominated by a loss of synapses originating from the LPZ (**Figure 4A**). This is a direct consequence of neuronal activities being low and calcium concentrations being below η*<sup>A</sup>* = 0.4 (**Figure 4B**), which causes axonal elements to be removed. By contrast, axonal elements from the intact zone form additional synapses with the LPZ right from the onset of the lesion. Between 6 and 8 weeks after the lesion, most neurons in the LPZ have reached calcium levels of 0.4 and start forming additional axonal elements, connecting to targets in the LPZ as well as the intact zone. The number of recurrent synapses from the LPZ to the LPZ at no time exceeds the number of synapses from the intact zone to the LPZ, as required for a functional remapping to emerge.
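The calcium ranges that govern element growth in the physiological case, as listed in the caption of Figure 4, can be summarized in a small lookup function. The sketch below is only a reading aid: the function name and the returned labels are hypothetical, the thresholds are the values quoted in the text (0.1, 0.4, 0.65, 0.75), and the handling of values that fall exactly on a boundary is an assumption, since the caption does not specify it.

```python
def growth_regime(calcium, eta_d=0.1, eta_a=0.4, homeo_low=0.65, homeo_high=0.75):
    """Classify a neuron's calcium concentration into a growth regime.

    Thresholds follow the physiological case described in the text;
    boundary handling (< versus <=) is an assumption of this sketch.
    """
    if calcium < eta_d or calcium > homeo_high:
        return "lose_both"  # axonal and dendritic elements are lost
    if calcium < eta_a:
        return "dendritic_only"  # only dendritic elements are formed
    if calcium < homeo_low:
        return "axonal_and_dendritic"  # both element types are formed
    return "homeostatic"  # element numbers do not change


# Example: an LPZ neuron recovering over time passes through the regimes.
trajectory = [0.05, 0.2, 0.5, 0.7]
regimes = [growth_regime(c) for c in trajectory]
```

Read this way, the time at which LPZ neurons cross 0.4 marks the switch from purely dendritic growth to the formation of new axonal elements described above.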

The change in λ and γ as shown in **Figure 3** is measured over the entire network. We further want to understand whether the course of γ and λ is caused by the changing connectivity between the intact zone and the LPZ. For this, we assessed γ and λ for the set of LPZ and intact zone neurons separately. We can distinguish three phases in the time course of both parameters. These phases arise from the interaction between the loss of connections from the LPZ and the formation of new connections from the intact zone. The initial phase lasts for the first 4 weeks after the lesion and is dominated by a loss of connections from the LPZ. This is reflected by a decrease in γ, especially of LPZ neurons but to a lesser extent also of intact zone neurons (**Figure 4C**). At the same time, λ of paths from the LPZ to the intact zone increases (**Figure 4D**) due to the loss of connections from the LPZ to the intact zone. Conversely, λ of paths from the intact zone to the LPZ decreases because new connections are being formed originating from the intact zone.

**FIGURE 4 | In the physiological case, compensatory network rewiring relies on the formation of new synapses from the intact zone to the LPZ. (A)** Synapse numbers from the intact zone to the LPZ increase (green), while synapse numbers from the LPZ to the intact zone decrease (red). **(B)** All neurons in the intact zone (green) and most neurons in the LPZ (red) return to the homeostatic range following deafferentation. Neurons lose axonal and dendritic elements if their calcium concentration is lower than 0.1 or higher than 0.75 (dark gray background). Neurons form only dendritic elements if their calcium concentration is greater than 0.1 but lower than 0.4 (gray), and form both axonal and dendritic elements if their calcium concentration is greater than 0.4 but lower than 0.65 (light gray). The homeostatic range, in which synaptic element numbers do not change, spans from 0.65 to 0.75. The diagram helps to match changes in topology with the current level of electrical activity. **(C)** The normalized average clustering coefficient γ of neurons in the LPZ (including connections with the entire network) decreases while neuronal activities are very low (<0.1) and increases as soon as activities of LPZ neurons are greater than 0.1. The first bump in clustering is brought about by ingrowing synapses from the intact zone into the LPZ, whereas the second rise in clustering is caused predominantly by new synapses within the LPZ, which are formed when calcium concentrations of LPZ neurons exceed 0.4. The γ of neurons in the intact zone (considering all their connections to any neuron in the entire network) decreases continuously after a temporary rise. **(D)** Average shortest paths from neurons in the intact zone to neurons in the LPZ show a steady decrease (green), while average path lengths from LPZ to intact zone neurons return to initial levels after a tri-phasic increase and decrease. **(E)** The clustering coefficient with no normalization (Equation 10) does not show a decrease for intact zone neurons as the normalized clustering coefficient γ does. **(F)** No differences were found between the characteristic path length and the normalized characteristic path length λ. Green curve indicates changes in clustering coefficient of intact zone neurons with the entire network. Means over five simulations per scenario. Shadings of the curves in **(A,C–F)** indicate standard deviations.

During the second phase, roughly between 4 and 8 weeks, we see a temporary increase in γ of both the LPZ and the intact zone neurons (**Figure 4C**). This increase essentially contributes to the temporary increase in small-worldness of repairing networks as shown in **Figure 2B**. During this phase, the decrease in number of connections from the LPZ slows down, while new connections from the intact zone are still being formed. During this second phase, especially λ for paths from the LPZ to the intact zone shows a rapid decrease (**Figure 4D**). This rapid decrease is brought about by a few new connections that are formed as soon as LPZ neurons reach calcium levels of 0.4 (**Figure 4B**). This happens already slightly before the average number of synapses from the LPZ to the intact zone increases significantly at about 6 weeks after lesion.

A third phase can be distinguished from 8 weeks after the lesion onwards, when LPZ neurons start forming outgoing connections again. Especially the recurrent connections inside the LPZ (**Figure 4A**) lead to an increase in γ of LPZ neurons (**Figure 4C**), while neurons in the intact zone show a decrease in γ after the temporary rise. However, γ of the LPZ neurons is not strictly increasing over time; between 8 and 12 weeks after the lesion, γ decreases a second time before it finally increases toward a stable level. We can explain this fluctuation in γ by the ongoing replacement of connections during this period. Only if all neurons in the LPZ have reached calcium levels beyond 0.4, and hence contribute to axonal element and (outgoing) synapse formation, does the clustering coefficient strictly increase until rewiring comes to a standstill. During the third phase, λ of paths from intact zone to LPZ further decreases (**Figure 4D**). This further decrease is brought about by additional connections inside the LPZ, contributing to network repair and shortening paths to neurons in the LPZ. The decrease in path lengths to the LPZ also explains the increasing betweenness centrality of LPZ neurons, since betweenness centrality by definition is a measure of how many shortest paths go via certain nodes. As shown in **Figure 4D**, λ of paths from the LPZ to the intact zone takes on values of a randomized network.

Interestingly, the absolute clustering of neurons in the intact zone shows very little change (**Figure 4E**), implying that the particular course of γ arises from changes in the number of connections and their clustering in comparison with a randomized network. By contrast, the changes in clustering of the LPZ (**Figure 4E**) as well as the characteristic path length for both the LPZ and the intact zone (**Figure 4F**) show similar courses for the non-normalized and normalized values. Therefore, we may conclude that networks become more random because of the increase in number of connections, whereas the increase in betweenness centrality (as a result of decreasing path lengths from the intact zone to the LPZ) is a consequence of added specific projections from the intact zone to the LPZ.

#### **3.2. ABERRANT NETWORK REWIRING**

Network repair does not in all cases lead to the formation of synapses from the outside to the inside and a functional reorganization of connectivity. In our previous study (Butz and van Ooyen, 2013), we identified three different cases of network rewiring depending on the relative values of the growth parameters η*<sup>A</sup>* and η*D*. For η*<sup>A</sup>* > η*D*, we observed network repair in line with the experimental data (physiological case); for η*<sup>A</sup>* = η*<sup>D</sup>* = 0.1, we observed network repair brought about by massive recurrent connections (recurrent case); and for η*<sup>D</sup>* > η*A*, we observed no network repair at all (no-repair case).

The network rewiring occurring in the last two cases is referred to as aberrant network rewiring. **Figure 5** depicts the most evident differences in the layout of connections after compensatory network rewiring between the physiological and the recurrent case and shows the no-repair case for the sake of completeness.

Whereas in the physiological case (**Figure 5A**) most of the newly formed synapses from the intact zone terminate in the LPZ, we do not see this ingrowing of new synapses in the recurrent case (**Figure 5B**) or in the no-repair case (**Figure 5C**). In the recurrent case and the no-repair case, new synapses from anywhere in the intact zone predominantly connect to neurons in the intact zone in the direct vicinity of the LPZ. In the physiological and the recurrent case, but not in the no-repair case, LPZ neurons contribute to network repair by forming additional synapses. However, there is an important difference between the physiological and the recurrent case in where LPZ neurons project to. LPZ neurons in the physiological case form new connections to neurons in the LPZ and preferentially to those in its center (inset **Figure 5A**), whereas LPZ neurons in the recurrent case also project to neurons in the intact zone and show less projection preference (inset **Figure 5B**). A marked difference between the physiological and the recurrent case is seen in the loss of synapses originating from the LPZ. Whereas many synapses are lost in the physiological case, almost no synapses originating from the LPZ are eliminated in the recurrent case. Therefore, network repair in the recurrent case is brought about by addition of new synapses, whereas in the physiological case network repair goes along with a replacement of synapses. The no-repair case shows a considerable loss of synapses originating from the LPZ. Neurons in the LPZ are not able to raise their activity beyond η*<sup>A</sup>* (**Figure 6C**) and therefore lose axonal elements and outgoing synapses as a direct consequence of the growth rules.

physiological and the recurrent case. Insets in the middle column illustrate the axonal projection pattern of an individual neuron in the LPZ. In the physiological case, neurons at the border of the LPZ connect to neurons more central in the LPZ, whereas in the recurrent case neurons have less preference for particular targets. The right column shows that many synapses originating from the LPZ are deleted in the physiological case but not in the recurrent case. All measurements are based on the difference between the number of synapses present before (*T*<sup>0</sup> = 7950) and after the lesion (*T*<sup>1</sup> = 20000 updates in connectivity, corresponding to 24 weeks after lesion), separately for excitatory and inhibitory synapses. Only excitatory neurons and excitatory to excitatory connections were used in the topological assessments.

Whereas physiological network repair goes along with increasing randomness of network connectivity, as indicated by a decrease in small-world parameter *S*, we do not see a considerable change in *S* in the recurrent case (**Figure 6A**). Interestingly, in networks that lack repair after lesions, we even see an increase in *S*. The strongest increase in betweenness centrality of neurons in the LPZ is observed in the recurrent case (**Figure 6B**) and sets in much earlier (after about 2 weeks) than in the physiological case due to a mere addition of synapses rather than a replacement of synapses. Betweenness centrality goes to zero (**Figure 6B**) when neurons do not return to the homeostatic range in activity (**Figure 6C**). Given that in the recurrent case, neurons in the LPZ restore their activity most quickly and completely and with the strongest increase in betweenness centrality, we may conclude that the increase in betweenness centrality is an indicator for the success of network repair in terms of restoring neuronal activity.

Local and global efficiency are additional measures quantifying changes in network topology (Equations 13 and 14). Global efficiency indicates how efficiently information can travel through the entire network; i.e., global efficiency is the averaged sum of the inverse of the shortest paths between any two neurons in the entire network. By contrast, local efficiency of neuron *i* measures how efficiently information can be exchanged among neurons that are connected to neuron *i*; i.e., local efficiency is the averaged sum of the inverse of the shortest path between any two neurons connected to neuron *i* (excluding neuron *i* itself). Especially in sparsely connected networks, efficiency as a topology measure is preferred over characteristic path length and clustering coefficient because it can be meaningfully computed also for unconnected neurons. In the physiological case, we observe a decrease in local efficiency (**Figure 7A**) but an increase in global efficiency (**Figure 7C**) relative to the efficiencies before the lesion. Both local and global efficiency go through an initial phase in which they decrease, reaching a minimum at about 6 weeks after the lesion. The global efficiency recovers and finally even exceeds its initial level, whereas the local efficiency recovers little and remains lower than before the lesion. By contrast, in the recurrent case, both local and global efficiency increase immediately after the lesion and exceed by far their initial levels and the levels in the physiological case. A drop in local and global efficiency is observed when no network repair takes place. The intact zone does not show a considerable change in either local or global efficiency (**Figures 7B,D**). The ratio of local to global efficiency indicates the relative amount of local clustered and global long-range connectivity. The stronger increase in global than in local efficiency in the physiological case reflects the increasing randomness (cf. **Figure 2B**), whereas recurrent networks with a strong increase in global and local efficiency become even more small-world (cf. **Figure 6A**).

The stronger increase in local efficiency in the recurrent case compared with the physiological case is brought about by the massive formation of partly recurrent connections originating from the LPZ. In fact, the number of recurrent synapses in the LPZ exceeds by far the number of synapses from the intact zone to the LPZ and from the LPZ to the intact zone (**Figure 8A**). The high number of recurrent synapses leads to a strong increase in clustering coefficient (**Figure 8B**). The clustering coefficient of the LPZ after rewiring even exceeds that of the intact zone; the latter does not change notably after the lesion. Remarkably, the average shortest paths from the intact zone to the LPZ and those from the LPZ to the intact zone strongly decrease simultaneously (**Figure 8C**).

#### **3.3. CHANGES IN DEGREE DISTRIBUTION RESULTING FROM NETWORK REWIRING**

The different types of network rewiring have a direct impact not only on global network topology but also on the local degree distributions of neurons. Before the lesion, neurons of the intact zone and the LPZ have the same in- and out-degree distribution, in the physiological case (**Figures 9A,B**) as well as in the recurrent case (**Figures 10A,B**). The distributions in the physiological case are slightly more tailed than in the recurrent case. After the lesion in the physiological case, the center of the in-degree distribution of the LPZ neurons shifts to the right (**Figure 9C**), indicating the presence of neurons with high in-degrees. The centers of the out-degree distributions of LPZ and intact zone neurons do not change, but the distributions as a whole become more fat-tailed (**Figure 9D**). Thus, compensatory network rewiring generates more hub-like neurons in the LPZ, but the majority of neurons in the LPZ and the intact zone maintains its out-degree. In the recurrent case, LPZ neurons shift the centers of their in- and out-degree distributions completely to the right (**Figure 10**), but the distributions do not become more fat-tailed. The in- and out-degree distributions become markedly different from the ones before the lesion and from the degree distributions of the intact zone neurons. We may conclude that due to the massive recurrent connections, the LPZ neurons separate from the intact zone in terms of degree distribution.

**FIGURE 8 | The growth rules in the recurrent case, whereby axonal and dendritic elements can already form at low neuronal activity, have a considerable impact on network topology after the lesion. (A)** A strong increase in synapse numbers within the LPZ (black) is seen after the lesion. **(B)** The surplus of recurrent synapses in the LPZ gives rise to an increasing clustering coefficient of LPZ neurons (red) that even exceeds the clustering of the intact zone (green). In computing the average clustering coefficients over the excitatory neurons of the intact zone and the LPZ, we considered all excitatory connections from the entire network. **(C)** Average path lengths from neurons in the intact zone to neurons in the LPZ (green) and vice versa (red) show a steady decrease after deafferentation. Means over five simulations per scenario. Shadings of the curves indicate standard deviations.

decreases and later increases but remains lower than its initial value. If

global efficiency of neurons in the intact zone **(D)**.
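The in- and out-degree distributions compared in this section can be tabulated directly from a list of synapses. The Python sketch below counts, for a chosen group of neurons such as the LPZ or the intact zone, how many neurons have each in-degree and out-degree. The function name and the edge-list input format are hypothetical, and each list entry is counted as one synapse, which is an assumption about how multiple synapses between the same pair of neurons are treated.

```python
from collections import Counter


def degree_histograms(edges, nodes):
    """Return (in-degree histogram, out-degree histogram) for `nodes`.

    edges is an iterable of (pre, post) pairs over the whole network;
    only the degrees of neurons in `nodes` are tabulated, but synapses
    to and from neurons outside the group are still counted.
    """
    in_deg = {v: 0 for v in nodes}
    out_deg = {v: 0 for v in nodes}
    for pre, post in edges:
        if pre in out_deg:
            out_deg[pre] += 1
        if post in in_deg:
            in_deg[post] += 1
    # Map each degree value to the number of neurons that have it.
    return Counter(in_deg.values()), Counter(out_deg.values())
```

Comparing the histograms of one group before and after rewiring shows whether the center of a distribution has shifted or whether only its tail has grown.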

#### **3.4. IMPACT OF INITIAL TOPOLOGY ON NETWORK REPAIR**

Network repair is not dependent on a particular initial network topology. The networks considered so far have a high clustering and a low characteristic path length before the lesion (small-world networks). However, even random networks with low initial clustering and characteristic path length fully restore their average electrical activity (in terms of calcium concentration) back to the homeostatic range, regardless of whether growth rules of the physiological case (**Figure 11A**) or the recurrent case (**Figure 11D**) are used. In random networks, the activity of the LPZ does not decrease as strongly as in clustered networks, because vacant axonal elements are available from anywhere in the network and network repair is immediately effective. The fastest restoration of electrical activity is seen for the recurrent case with random networks (**Figure 11D**). In addition to the availability of axonal elements from anywhere in the network, in the recurrent case neurons with low activity also provide their own vacant axonal elements, contributing to fast network repair.

Irrespective of growth rules and initial network topology, restoration of firing rates is accompanied by an increase in betweenness centrality (**Figures 11B,E**). For the physiological and the recurrent case, betweenness centrality reaches higher absolute values in small-world networks than in random networks. However, the greatest increase in betweenness centrality is seen for the recurrent case with random networks. Interestingly, for all scenarios studied (**Figures 11A,D**), the strongest increase in betweenness centrality is associated with the fastest restoration of electrical activity. We conclude that the increase in betweenness centrality is a generic effect of compensatory network rewiring because it is independent of initial connectivity and strongly correlates with effectiveness of network repair, in terms of speed and completeness of restoring electrical activity. Moreover, the physiological case with small-world networks is the only scenario in which topology becomes more random (**Figure 11C**). In all other scenarios (**Figures 11C,F**), we see only little change and initially random networks become only slightly more structured (small increase in *S*) after the lesion.

#### **4. DISCUSSION**

We postulated that network repair after focal deafferentation is brought about by a local neuronal mechanism that aims to maintain homeostasis of neuronal electrical activity by adapting the neuron's number of input and output connections (homeostatic structural plasticity). In the model in which we studied the implications of this mechanism for network topology after deafferentation, we found that local changes in number of synaptic connections, as governed by homeostatic structural plasticity, led to pronounced alterations in global network topology, especially in the connectivity between intact and deafferentated areas. While local connections in the LPZ were massively eliminated, new connections from the intact zone grew into the LPZ, helping deafferentated neurons to restore their level of activity (see also Butz and van Ooyen, 2013). This replacement of short- by long-range connections lowered the clustering coefficient and reduced the characteristic path length, making the network more random than before the lesion. At the same time, neurons in the LPZ enhanced their betweenness centrality. Furthermore, LPZ neurons increased their global but decreased their local efficiency and acquired longer-tailed degree distributions, indicating the emergence of hub neurons.

So far, only very few models have addressed dynamic changes in network topology after brain lesions. Li et al. (2013) described changes in topology merely phenomenologically and did not include any neuronal mechanism such as the formation and deletion of synapses. Others have applied neural mass models with various rules of plasticity and assessed by graph theoretical methods the changes in inter-area connectivity in response to lesions and degeneration (Rubinov et al., 2009; Stam et al., 2010; de Haan et al., 2012). In contrast with these more abstract models, our neuronal network model is more detailed and strongly inspired by the notion that neurons after a permanent loss of input, e.g., after focal retinal lesions, aim to restore their firing rates homeostatically by morphological adaptations such as the replacement of dendritic spines and axonal boutons. Therefore, we can derive predictions on how morphological alterations in individual neurons rewire intra-area connectivity in response to lesions or lasting loss of input. Insight into intra-cortical topology changes after loss of input is particularly important because local topographic features influence restoration of vision in humans (Sabel et al., 2011, 2013; Gall et al., 2013).

As yet, there are no experimental studies on dynamic changes in intra-area or inter-area network topology after focal retinal lesions, the experimental paradigm our model is most closely linked to (cf. Butz and van Ooyen, 2013). However, massive rewiring of synaptic connections not only occurs after focal retinal lesions in the visual cortex (Keck et al., 2008; Yamahachi et al., 2009; Marik et al., 2014) but also accompanies functional recovery after focal or subcortical stroke (Carmichael, 2003, 2006; Cramer, 2008). The findings from our model provide useful predictions also for focal or subcortical stroke, because the way a subcortical stroke affects cortical motor networks is essentially a deprivation of inputs from the lesioned subcortical to the intact cortical motor areas. Indeed, brain regions deafferentated by stroke show a restoration of electrical activity to normal levels in chronic patients, as measured by fMRI, that goes along with persistent changes in inter-area topology (Sharma et al., 2009). We hypothesize that the lesion-induced topology changes in intra-area connectivity, which have not yet been investigated, may follow the same underlying rules as the observed changes in inter-area connectivity after focal stroke (Wang et al., 2010).

Lesion-induced structural plasticity does not always lead to restoration of impaired functions, and miswiring of brain circuits after lesions may even give rise to post-traumatic epilepsy (Topolnik et al., 2003). An additional interesting outcome of our model is that homeostatic structural plasticity can overcompensate a loss of input, resulting in pronounced oscillatory network activity that may account for the emergence of post-traumatic epilepsy (Butz and van Ooyen, 2013).

#### **4.1. FROM MICRO- TO MACRO-SCOPIC**

Remarkably, network reorganization in the model shows striking similarities with intracortical network reorganization on a mesoscopic scale [e.g., a retinotopic remapping with filling of the LPZ from the outside to the inside (Butz and van Ooyen, 2013); mesoscopic defined as in Liljenstroem, 2001] and may also account for macroscopic network changes after, for example, focal, subcortical stroke. An impairment of motor function of the hand after subcortical stroke coincides with a loss in effective connectivity of inter-area cortical motor networks, especially between pre-motor and primary motor cortices in the hemisphere ipsilateral to the stroke site (Grefkes et al., 2008). Conversely, restoration of electrical activity and functional recovery are associated with increasing effective connectivity from prefrontal to motor cortices (Sharma et al., 2009). The functional effects are thought to arise from a loss of local connections within the motor network and the formation of additional long-range excitatory connections from prefrontal to motor areas (Sharma et al., 2009). In our model, we also observe the local removal of connections from the deafferentated neurons in the LPZ and an ingrowth of more long-range connections from the intact zone into the LPZ. Note that in the model as well as in the brain, the loss of local connections is not a mere consequence of degeneration caused by the primary lesion but a secondary effect due to network reorganization (Rehme et al., 2011).

In stroke patients, even brain regions remote from the lesion site change their topology (Carmichael, 2006). However, not the entire brain changes its topology but only those regions directly connected to the primary lesion site (Carmichael et al., 2001; Dancause et al., 2005; Rehme and Grefkes, 2013). Provided that connected brain regions become deafferentated by the primary lesion site, homeostatic structural plasticity, as revealed by our modeling study, may account for the observed changes in macroscopic topology after a lesion, i.e., an increased randomness of network connectivity and an increased or decreased betweenness centrality of particular regions (Wang et al., 2010; Shi et al., 2013). Wang et al. (2010) reported an increase in betweenness centrality of brain regions that became deafferentated by a subcortical stroke, namely the ipsilesional primary motor area and the contralesional cerebellum. Note that the predominant cortico-pontocerebellar fiber tract crosses in the brainstem to the contralateral side, and a subcortical lesion will therefore deafferentate the contralesional cerebellum. By contrast, the contralesional primary motor cortex and the ipsilesional cerebellum decreased their betweenness centrality. The latter two regions are not directly affected by deafferentation but are still involved in reorganization, since especially contralateral areas seem to support their homotopic regions by compensatory sprouting during stroke recovery (Carter et al., 2010). An increase in betweenness centrality of deafferentated brain regions and a decrease in betweenness centrality of brain regions supporting recovery perfectly match with our model findings, so we hypothesize that the observed topological changes in the brain of stroke patients may be accounted for by homeostatic structural plasticity. Furthermore, in the model, the increase in betweenness centrality has proven to be the strongest indicator of network repair under different conditions. Therefore, increasing betweenness centrality could be a biomarker for brain repair after lesions such as stroke. In future work, we intend to implement homeostatic structural plasticity in a large-scale model of micro- and macroscopic connectivity containing multiple brain regions (Potjans and Diesmann, 2014).

#### **4.2. TIME COURSE OF NETWORK REPAIR**

The time course of changes in topology in our self-repairing network model shows remarkable similarities with the time course of topology changes during brain repair, especially in patients with subcortical stroke. Subcortical stroke involves, apart from damage to a circumscribed volume of brain tissue, a loss of input to other brain regions, particularly those of the motor network. The model predicts a pronounced increase in small-worldness of the entire network during the initial phase of compensatory network rewiring, before the network eventually becomes more random. Indeed, brain networks after subcortical stroke increase their small-world property in the subacute phase (about 1 week post-infarct) (van Meer et al., 2012) and thereafter become continuously more random. As in our model, the change in small-worldness of brain networks is brought about by a marked change in clustering.

From our model we further predict that right after the lesion, local as well as global efficiency drops markedly as a result of loss of connections. The decrease in efficiency is in agreement with changes in network topology observed after stroke (Honey and Sporns, 2008; Alstott et al., 2009). In the model, local efficiency always remains lower than before the lesion, but global efficiency increases markedly and reaches values higher than before the lesion. Strikingly, even in well-recovered stroke patients, brain networks are found with low local but high global connectivity (Rehme and Grefkes, 2013). However, brain network topology with low local and high global efficiency may contribute to less stable performance of sensorimotor skills (Rehme and Grefkes, 2013).
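To make the two measures concrete, the following is a minimal sketch of global and local efficiency on an unweighted, undirected toy graph. It assumes the common definitions (global efficiency as the mean inverse shortest path length over all node pairs, local efficiency as the mean global efficiency of each node's neighbourhood subgraph). The ring network, the removed edge and all function names are hypothetical illustrations and are not taken from the model described here.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def global_efficiency(adj):
    """Mean of 1/d(i, j) over all ordered pairs i != j (unreachable pairs add 0)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        dist = shortest_path_lengths(adj, i)
        total += sum(1.0 / d for j, d in dist.items() if j != i)
    return total / (n * (n - 1))

def local_efficiency(adj):
    """Mean global efficiency of the subgraph induced by each node's neighbours."""
    values = []
    for node in adj:
        nbs = set(adj[node])
        sub = {u: [v for v in adj[u] if v in nbs] for u in nbs}
        values.append(global_efficiency(sub))
    return sum(values) / len(values)

# Hypothetical toy network: a ring of six nodes, and the same ring with the
# edge between nodes 0 and 1 removed to mimic a loss of connections.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
lesioned = {i: [j for j in nbs if {i, j} != {0, 1}] for i, nbs in ring.items()}

print(global_efficiency(ring), global_efficiency(lesioned))
```

In this toy case, removing a single edge lowers global efficiency because paths between the two former neighbours now have to go around the ring.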

Interestingly, in the model we observe only small changes in topology within the first 4 weeks. Network repair in stroke patients is also not immediate. From monkey studies it is well known that it takes about 7–14 days after stroke until axonal sprouting occurs, and new connections are not visible before 28 days (Carmichael, 2003), with lesion-induced network rewiring continuing for at least 3–6 months (Carmichael, 2006; Cramer, 2008). The time course of axonal sprouting in the experiment is comparable to the time course of axonal element formation in our model and also matches the physiological time course of structural plasticity in mice (Keck et al., 2008; Butz and van Ooyen, 2013). The model illustrates that network repair after deafferentation and stroke can be brought about by local, homeostatic growth rules.

The time course of network repair is determined by the relation between the growth curve parameters η*<sup>A</sup>* and η*<sup>D</sup>*. If axonal elements require more activity to form than dendritic elements (i.e., η*<sup>A</sup>* > η*<sup>D</sup>*), networks will show a compensatory growth of connections from the intact zone into the LPZ. However, if axonal and dendritic elements grow at the same low level of activity, deafferentated neurons will literally pull themselves up by their own bootstraps by forming massive recurrent connections to restore activity to the homeostatic set-point (Butz and van Ooyen, 2013). By contrast, the course of reorganization is not crucially dependent on the particular choice of parameters for neuronal electrical activity (Figure S1), as long as the network is able to reach a homeostatic equilibrium before the external input is removed. Other parameters, such as the width of the kernel σ = 150 µm, have been chosen in agreement with experimental findings (De Paola et al., 2006). Likewise, the decay time of intracellular calcium was chosen to be of the same order of magnitude as measured experimentally (Hofer et al., 2011).

#### **4.3. HOMEOSTATIC STRUCTURAL PLASTICITY vs. SYNAPTIC SCALING**

The notion that neurons strive to restore their level of electrical activity after loss of input is now widely accepted (Hengen et al., 2013; Keck et al., 2013). Even in stroke, the need of neurons to restore electrical activity to a homeostatic set-point may be an underlying principle of recovery (Avramescu et al., 2009). Today, the predominantly discussed mechanism for maintaining homeostasis in electrical activity is synaptic scaling (Turrigiano and Nelson, 1998). However, synaptic scaling restores activity within 48 h, yet network reorganization continues massively for several months. Therefore, synaptic scaling cannot be the only mechanism involved in network reorganization after deafferentation and stroke. Moreover, Hengen et al. (2013) showed that firing rates in V1 after focal retinal lesions restore within the first 48 h but drop again thereafter before they slowly rise again. This finding is in line with previous reports on the extended time course of network repair (up to 12 months) after focal retinal lesions and the restoration of electrical activity from the outside to the inside of the LPZ (Giannikopoulos and Eysel, 2006). In the first 48 h after the lesion, homeostatic synaptic scaling may upregulate firing rates (Keck et al., 2013), but the continuing structural changes in connectivity beyond 48 h may alter activity levels and may bring neurons again outside their homeostatic range of activity. Rewiring connectivity may provide a straightforward explanation for the experimental observation that activity levels drop again after 48 h (Hengen et al., 2013) and slowly recover over several weeks (Giannikopoulos and Eysel, 2006; Hu et al., 2009, 2010). 
In the model, the removal of connections is accounted for by the minimal levels of activity needed for maintenance and formation of axonal and dendritic synaptic elements (η*<sup>A</sup>* and η*<sup>D</sup>*, respectively); required minimal levels of activity have not been discussed in recent concepts of homeostatic plasticity (but see van Ooyen et al., 1996). In our model as well as in visual cortex after focal retinal lesions, activity slowly recovers over periods of weeks and months (Giannikopoulos and Eysel, 2006; Butz and van Ooyen, 2013). We hypothesize that the increase in firing rates is due to the ingrowth of long-range connections from intact regions, which may be guided by homeostatic structural plasticity. Therefore, we postulate that homeostatic plasticity is more than just synaptic scaling and needs to be extended to encompass structural plasticity, including the reorganization of synaptic connections. With the present model, we have provided growth rules that can govern homeostatic structural plasticity and that can lead to physiologically realistic network reorganization on a microscopic, mesoscopic (Butz and van Ooyen, 2013), and macroscopic level.

#### **4.4. A POTENTIAL ROLE OF HOMEOSTATIC PLASTICITY IN EPILEPTOGENESIS**

Partial deafferentation, as caused by focal stroke for example, can lead to epileptiform activity and seizures (Topolnik et al., 2003; Avramescu and Timofeev, 2008). It has been discussed that homeostatic synaptic plasticity may contribute to post-traumatic epileptogenesis in chronically isolated cortex (Houweling et al., 2005). Synaptic scaling (Turrigiano and Nelson, 1998), a well-studied mechanism for homeostatic synaptic plasticity, is known to generate epileptiform activity (Froehlich et al., 2008). However, synaptic plasticity does not include the rewiring of networks and acts on timescales of hours rather than weeks or months. Although a previous modeling study (Houweling et al., 2005) has suggested that anatomical network rewiring is not required for epileptiform activity to occur, we argue that without network rewiring an important aspect of lesion-induced plasticity is left out. For example, models without structural plasticity cannot account for the clinical observation that although spontaneous seizures are most frequent within months after the lesion, they can occur up to 5 years post-lesion (Temkin, 2001). Therefore, we propose that synaptic scaling may account for spontaneous seizures early after the lesion but that for the pathogenesis of post-traumatic epilepsy months after the lesion, homeostatic structural plasticity may be a more suitable explanation (see also van Oss and van Ooyen, 1997).

In the model, a change in the value of just a single parameter, namely the level of activity needed for axonal elements to form (η*<sup>A</sup>*), leads to massive recurrent connections, which, as we showed in a previous study (Butz and van Ooyen, 2013), can generate strongly synchronized activity patterns comparable to epileptiform activity. In an *in vitro* injury model of epilepsy, Srinivas et al. (2007) showed that epileptogenesis goes along with a marked increase in connectivity [also supported by findings on recurrent mossy fiber sprouting in an organotypic cell culture model of hippocampal epilepsy (Kharatishvili et al., 2007)] and that the shape of the degree distribution of the neurons changes from power-law to Gaussian. Interestingly, in the recurrent case of our model, which generates epileptiform activity after network reorganization, the degree distribution of LPZ neurons is much less tailed than in the physiological case after network repair. Therefore, we hypothesize that the way brain networks rewire after lesions determines whether or not patients develop post-traumatic epilepsy. This notion is further supported by the finding that the shape of the lesion can affect epileptogenesis (Volman et al., 2011), since it is more likely that the shape of the lesion can influence epileptogenesis with growth of new connections than with synaptic scaling. Importantly, our model predicts that the sensitivity of axonal outgrowth to low levels of activity might be decisive for whether recurrent connections with epileptiform activity, or physiological network repair with normal activity patterns, emerge after brain lesions. This insight may help find novel molecular targets for pharmacological treatments to prevent post-traumatic epilepsy, which are urgently needed as post-traumatic epilepsy is often impervious to medical treatment (Herman, 2002; van Breemen et al., 2007).

#### **4.5. HOMEOSTATIC STRUCTURAL PLASTICITY AS AN ORGANIZING PRINCIPLE FOR BRAIN REPAIR**

Homeostatic structural plasticity is a new concept for network reorganization, with large implications for understanding and stimulating brain repair after lesions. Models of homeostatic structural plasticity can help integrate recent clinical findings on changing brain topology after a variety of pathologies, including stroke, Alzheimer's disease and multiple sclerosis. These models can assist us in uncovering the mechanisms underlying functional reorganization and in finding biomarkers for successful brain repair, such as an increased betweenness centrality of brain regions deprived of input from primary lesion sites. Most importantly, however, homeostatic structural plasticity puts functional reorganization of brain networks into a different light. The predominant dogma of plasticity is still Hebbian plasticity, with its "fire together, wire together" slogan (Hebb, 1949). With Hebbian plasticity, enforcing (synchronous) activity strengthens synapses. By contrast, the homeostatic nature of structural plasticity implies the need for a moderate level of activity, because the formation of axonal and dendritic structures is maximal for activity levels slightly below a desired set-point of electrical activity. We postulate that the brain has the highest plasticity for recovery when neurons and brain regions, especially those supporting deafferentated regions in the recovery process, have not yet returned to their homeostatic equilibrium. We might call this initial phase a critical period for brain repair, in analogy to critical periods in neural development. During network development, too, neurons shape their connectivity until desired activity levels are reached (Tetzlaff et al., 2010). As a consequence, in neurorehabilitation treatment, not only stimulation by physical training or direct electrical stimulation but also pauses in treatment may be important. 
Stimulation may increase electrical activity beyond the homeostatic set-point, inducing pruning of existing synaptic connections, whereas treatment pauses may lower activity and bring activity levels into an optimal range for the formation of new connections (Butz et al., 2009a). Moreover, network reorganization does not always need to be functional; as our model suggests, post-traumatic epilepsy could be the result of mis-wiring or over-compensation. Treatments must therefore focus more on the time course and current state of network repair. Lastly, large-scale computer models, such as those developed in the context of the Human Brain Project (www.humanbrainproject.eu), will, once structural plasticity has been incorporated, be valuable tools in finding and testing treatment strategies for patients with brain damage.

#### **ACKNOWLEDGMENTS**

This work was supported by the Helmholtz Association through the Helmholtz Portfolio Theme "Supercomputing and Modeling for the Human Brain" and by the NETFORM project (grant number 635.100.017, awarded to Arjen van Ooyen) of the Computational Life Sciences program of the Netherlands Organization for Scientific Research (NWO).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnana.2014.00115/abstract

#### **REFERENCES**


evaluation by graph analysis of synchronization matrices. *Clin. Neurophysiol.* 117, 2039–2049. doi: 10.1016/j.clinph.2006.05.018


states in humans. *J. Physiol. (Lond.)* 591(Pt 1), 17–31. doi: 10.1113/jphysiol.2012.243469


neurite outgrowth. *Neural Process. Lett.* 3, 123–130. doi: 10.1007/BF00420281


in primary visual cortex. *Neuron* 64, 719–729. doi: 10.1016/j.neuron.2009.11.026

Yin, D. D., Wu, J., Xie, C. B., Cao, Z. P., Lü, Q., and Zhang, R. Q. (2013). [Comparison of three-dimensional fluorescence characteristics of two isomers: phenanthrene and anthrancene]. *Guang Pu Xue Yu Guang Pu Fen Xi* 33, 3263–3268.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 July 2014; accepted: 24 September 2014; published online: 16 October 2014.*

*Citation: Butz M, Steenbuck ID and van Ooyen A (2014) Homeostatic structural plasticity can account for topology changes following deafferentation and focal stroke. Front. Neuroanat. 8:115. doi: 10.3389/fnana.2014.00115*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Butz, Steenbuck and van Ooyen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Parametric Anatomical Modeling: a method for modeling the anatomical layout of neurons and their projections

#### *Martin Pyka<sup>1,2</sup>\*, Sebastian Klatt<sup>1,3</sup> and Sen Cheng<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, Mercator Research Group "Structure of Memory," Ruhr-University Bochum, Bochum, Germany*

*<sup>2</sup> Faculty of Psychology, Ruhr-University Bochum, Bochum, Germany*

*<sup>3</sup> Faculty of Electrical Engineering and Information Technology, Ruhr-University Bochum, Bochum, Germany*

#### *Edited by:*

*Hermann Cuntz, Goethe-University and Ernst Strüngmann Institute (in Cooperation with Max Planck Society), Germany*

#### *Reviewed by:*

*Alberto Mazzoni, Scuola Superiore Sant'Anna, Italy; Peter Jedlicka, Goethe University, Germany*

#### *\*Correspondence:*

*Martin Pyka, Mercator Research Group "Structure of Memory", Department of Psychology, GA 04/40, Universitätsstr. 150, 44801 Bochum, Germany e-mail: martin.pyka@gmx.de*

Computational models of neural networks can be based on a variety of different parameters. These parameters include, for example, the 3d shape of neuron layers, the neurons' spatial projection patterns, spiking dynamics and neurotransmitter systems. While many well-developed approaches are available to model, for example, the spiking dynamics, there is a lack of approaches for modeling the anatomical layout of neurons and their projections. We present a new method, called Parametric Anatomical Modeling (PAM), to fill this gap. PAM can be used to derive network connectivities and conduction delays from anatomical data, such as the position and shape of the neuronal layers and the dendritic and axonal projection patterns. Within the PAM framework, several mapping techniques between layers can account for a large variety of connection properties between pre- and post-synaptic neuron layers. PAM is implemented as a Python tool and integrated in the 3d modeling software Blender. We demonstrate on a 3d model of the hippocampal formation how PAM can help reveal complex properties of the synaptic connectivity and conduction delays, properties that might be relevant to uncover the function of the hippocampus. Based on these analyses, two experimentally testable predictions arose: (i) the number of neurons and the spread of connections are heterogeneously distributed across the main anatomical axes, (ii) the distribution of connection lengths in CA3-CA1 differs qualitatively from those in DG-CA3 and CA3-CA3. Models created by PAM can also serve as an educational tool to visualize the 3d connectivity of brain regions. The low-dimensional yet biologically plausible parameter space renders PAM suitable to analyse allometric and evolutionary factors in networks and to model the complexity of real networks with comparatively little effort.

**Keywords: 3d model, functional morphology, hippocampal formation, Blender, NEST, connection patterns, conduction latencies, brain anatomy**

#### **INTRODUCTION**

Computational simulations of neural networks have become an important tool to untangle the relationship between the function of a network and its structural properties. There are several levels on which artificial neural networks can capture properties of the biological ideal. At the neuronal level, these are, for example, the spiking dynamics (Dayan and Abbott, 2001), dendritic morphology (London and Häusser, 2005; Cuntz et al., 2010), and the rules underlying structural (Butz and van Ooyen, 2013) and spike-timing dependent plasticity (Morrison et al., 2008). At the network level, connections between neurons and their spatial distances are of particular importance. They can have an influence on conduction delays, which in turn can be functionally important (Carr and Konishi, 1988; Blumberg, 1989; Bartos et al., 2002; Maex and De Schutter, 2003; Soleng et al., 2003; Gong and van Leeuwen, 2007; Buzsáki, 2010; Hu et al., 2012).

Temporal dynamics of neural activity and plasticity rules can be mathematically described with comparatively great accuracy and they can be efficiently translated into a programming language. Several well-established tools, like Neuron (Hines and Carnevale, 2001), GENESIS (Beeman, 2005), NEST (Eppler et al., 2008), and Brian (Goodman and Brette, 2008) can be used for this purpose. In order to integrate spatial properties into the network, e.g., location-dependent connections and conduction latencies, specialized tools have been developed. For instance, in Neuroconstruct (Gleeson et al., 2007), neurons with realistic morphologies or abstract probability distributions can be imported or generated. They can be either manually placed in space or distributed based on user-defined functions or across simple geometric shapes (e.g., a cube). Recently, a new tool called NeuralSyns (Sousa and Aguiar, 2014) was presented, which allows the processing of up to 10<sup>7</sup> synapses and the real-time visualization of spiking activity and connections. Neurons can be placed in space and connected with each other using procedural approaches. For simulations in which the topographical arrangement of neurons is of predominant importance, the topology toolbox for NEST (Eppler et al., 2008) and Topographica (Bednar, 2009) provide helpful tools to set up 2d sheets of neurons and to connect them with each other using pre-defined kernel functions.

These tools have proven to be of great value in models of local regions in the brain and of connection principles that do not rely on the anatomy of biological brain regions (Gouwens and Wilson, 2009; Rothman et al., 2009; Bednar, 2012; Azizi et al., 2013; Helias et al., 2013; Mattioni and Le Novère, 2013; Stevens et al., 2013). However, they barely support the integration of large-scale anatomical properties obtained from histological and imaging data or tracer studies. In fact, converting anatomical knowledge to a formal description of connections and conduction latencies between a large number of neurons proves to be a very hard problem, as axonal and dendritic projections and the location and orientation of neurons follow complex non-linear patterns. This problem is, for example, very apparent in the hippocampal formation. Besides global axes (such as anterior-posterior, dorsal-ventral) along which the shape of the different layers of the hippocampus can be described, projection patterns within these layers (e.g., CA3, CA1) follow local axes (e.g., proximal-distal, septal-temporal) (Andersen et al., 2006). The orientation of these local axes, however, depends on the shape of the hippocampus along the global axes (e.g., **Figures 1C**, **5C**). Topological relations between CA1 and entorhinal layers remain roughly preserved. However, connection distances between those layers vary widely as different parts of CA1 and entorhinal cortex have different non-linear tracks and therefore varying distances to each other (Van Strien et al., 2009).

This dependency between the shape, location and projection pathways of neural layers on the one hand and connections and path lengths on the other can be found essentially throughout the brain. The visual pathways are a good example of topological mapping between two distant layers that differ in their anatomical shape (Rodieck, 1979). Cortical layers connect to their immediate neighbors but also to more distant regions via long myelinated axons forming the white matter (Passingham and Wise, 2012).

More and more detailed knowledge about biological neural networks becomes available through numerous independent studies and large initiatives like the Human Brain Project (Markram, 2012) or the data portal of the Allen Brain Atlas (Jones et al., 2009). By contrast, currently available tools for creating 3d neural networks do not provide the possibility to efficiently make use of the vast amount of data that are publicly available.

With Parametric Anatomical Modeling (PAM), we propose a technique and a Python implementation to close this gap. The basic idea of PAM is to trace neural, synaptic and intermediate layers from anatomical data and relate those layers to each other. With a set of mapping techniques, complex relationships between those layers can be defined to determine how axonal and dendritic projections traverse through space and where synapses are formed. A powerful feature of PAM is that spatial relations between and within layers can be combined to derive connections and distances between neurons. Furthermore, two- and three-dimensional experimental data (e.g., gene expression maps, marked neurons) can be integrated in the model to describe neural density or functional properties of neurons. As a side effect, neural networks created using PAM are of high educational value as the depiction of neural layers created from anatomical data along with the selective visualization of axons, dendrites and synapses can be explored in 3d and clearly demonstrate how the layers are wired.

**FIGURE 1 | Illustration of basic concepts in PAM. (A)** *2d* layers define the location of neurons, their projection directions and which neurons form synapses. Probability functions for pre- and post-synaptic neurons are applied on the surface of the synaptic layer to determine connections between the two neuron groups. **(B)** A layer is defined as a 2d manifold (a deformed 2d surface) in 3d Euclidean space (upper part). Each point on the surface is therefore described by x, y, and z coordinates. The relative positions on the flattened surface can be described in uv-coordinates (lower part) which may correspond to anatomical axes. This example depicts a rough sketch of CA1-3. **(C)** A simplified example of the outlined idea for the visual pathway from the retina (green surface) to the lateral geniculate nucleus (LGN, yellow) (proportions do not match). Different mapping techniques (see chapter "Mapping") allow a location-dependent mapping of a neuron position in the pre-layer onto the synaptic layer (red layer) of the left and right LGN, respectively.

In the following, we first introduce the principles that PAM is built on and then an implementation of PAM in the 3d software Blender. Subsequently, we apply PAM to build a model of the hippocampal formation in the rat. Finally, we show that such models can lead to new insights about brain structures and, potentially, functions.

#### **THE MODEL**

#### **THE BASIC CONCEPTS**

The most important concept in PAM is the "layer" (**Figure 1**). A layer is a two-dimensional grid-like structure that can be deformed in 3d space to resemble any anatomical layer in the brain. Layers are the structures that can be directly created from anatomical data to denote, for instance, the location of pre- and post-synaptic neurons, synaptic layers (SLs) and intermediate layers that help to define the trajectories of axons and dendrites. Using a simple set of mapping techniques (see below), various relations between layers can be described in order to create location-dependent trajectories of neurons in 3d space. These trajectories are used to determine connections and distances between neurons which may affect the transmission delay. As will become apparent in the following, when we use PAM to implement a model of the hippocampal formation, complex connection patterns between layers can be expressed easily.

Note that the wiring of the network is defined solely on the level of layers and not for single neurons. This approach corresponds to the notion that in real networks 3d patterns define where in space precursor cells proliferate and in which directions axon and dendrite cones grow. PAM uses these low-dimensional but biologically plausible categories to define the architecture of neural networks. Groups of identical neurons are then distributed over the layer with a given density. Their connections to other neurons are a result of their relative location to other neurons and their projection direction across intermediate layers and SLs. PAM does not include a developmental component such as structural development through gene regulatory networks or cell migration. Instead, our approach aims to understand the functional implications of the developed structure.

A network is defined by mappings between a pre-synaptic and a post-synaptic neural layer (green and yellow layers in **Figures 1A,C**) on an SL (red layer in **Figures 1A,C**). A layer can be involved in an arbitrary number of mappings and the same layer can be both the pre- and post-synaptic layer of a mapping to account for recurrent connectivity. Thereby, projections of different neuron groups located on the same layer to different regions in the network can be described. The definition of relations between layers is a general form that describes how dendrites and axons traverse through layers until they form synapses. With these definitions, the corresponding position on the intermediate layers and the SL can be computed for any point on a pre- or post-synaptic layer.

Connections between pre- and post-synaptic neurons are determined by probability functions that define for any relative position on the SL its probability for generating synapses (**Figure 1A**, red layer). Using probability functions reflects the assumption that all neurons on a layer that belong to the same neuron type have the same genetic code and emerged through cell proliferation. The individual morphological structure of each neuron is an instance of a general connectivity pattern that the neuron encodes influenced by other factors, such as structural and synaptic plasticity. The probability functions represent the general connectivity pattern.
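The role of the probability functions can be sketched as follows. This is a minimal illustration under stated assumptions, not PAM's actual interface: both kernels are assumed to be isotropic Gaussians in the uv-coordinates of the SL, the connection weight is assumed to be the overlap of the two kernels, and all names (`gaussian_kernel`, `overlap`, `draw_targets`) are hypothetical.

```python
import math
import random

def gaussian_kernel(uv, center, sigma):
    """Unnormalised isotropic Gaussian evaluated at uv on the synaptic layer."""
    du = uv[0] - center[0]
    dv = uv[1] - center[1]
    return math.exp(-(du * du + dv * dv) / (2.0 * sigma * sigma))

def overlap(pre_uv, post_uv, sigma_pre, sigma_post, samples):
    """Mean product of the pre- and post-synaptic kernels over sample points."""
    products = [
        gaussian_kernel(uv, pre_uv, sigma_pre) * gaussian_kernel(uv, post_uv, sigma_post)
        for uv in samples
    ]
    return sum(products) / len(products)

def draw_targets(pre_uv, post_uvs, sigma_pre, sigma_post, samples, n_synapses, rng):
    """Draw post-synaptic partners with probability proportional to kernel overlap."""
    weights = [overlap(pre_uv, post, sigma_pre, sigma_post, samples) for post in post_uvs]
    return rng.choices(range(len(post_uvs)), weights=weights, k=n_synapses)

# Hypothetical example: sample points on a unit uv-square of the synaptic layer,
# one pre-synaptic neuron and three post-synaptic neurons mapped onto it.
samples = [(i / 20.0, j / 20.0) for i in range(21) for j in range(21)]
pre_uv = (0.5, 0.5)
post_uvs = [(0.5, 0.5), (0.6, 0.5), (0.95, 0.95)]
weights = [overlap(pre_uv, post, 0.1, 0.1, samples) for post in post_uvs]
targets = draw_targets(pre_uv, post_uvs, 0.1, 0.1, samples, 10, random.Random(1))
```

Because the kernels are defined per layer rather than per neuron, every neuron of the same type shares the same kernel shape and only its mapped position on the SL differs.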

A special feature of layers in PAM is that any point on the layer can be described by its xyz-coordinates in Euclidean space and in surface-coordinates, commonly called uv-coordinates in 3d graphics. uv-coordinates are generated when the 3d mesh of a layer is unfolded until all points of the layer are mapped onto a 2d plane (**Figure 1B**). As we will see in the next chapters, this transformation can be used to determine distances and connections on the surface level. Moreover, it allows anatomical properties to be described either along the spatial axes (xyz) or along anatomical axes (uv) which are not necessarily straight (like the proximal-distal axis in the hippocampal formation).

#### **MAPPING**

A central feature of PAM is that through various mapping techniques spatial relations on the surface of layers and spatial distances between layers can be combined to compute connections and distances between neurons. The top row in **Figure 2** depicts the four types of mappings between two layers. In the following, we explain each mapping in more detail and outline its use cases.

#### *Topological mapping*

When two layers have the same internal topology (e.g., identical number and ordering of vertices and definition of quads and triangles), for any point on the first layer its corresponding position on the second layer can be directly computed. This mapping technique is useful whenever topological relations between neurons should be preserved independently of the origin and target location of axons and dendrites in space. The most obvious example of this is the mapping between photoreceptor cells in the retina to V1, where intermediate layers could be used to lay out the realistic trajectory of the fibers to the visual cortex. But also the mapping of the dentate gyrus layer on a SL around CA3 in the hippocampus could make use of topological similarities to constrain the axonal projections along the septo-temporal axis.
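One way to picture topological mapping is through barycentric coordinates: a point is stored as a face index plus three weights, and the same index and weights are evaluated on the second mesh. The sketch below assumes triangulated layers that share one face list; the function names and the two small meshes are hypothetical and do not reproduce PAM's implementation.

```python
def barycentric_to_point(triangle, weights):
    """Weighted sum of a triangle's three vertices (weights sum to 1)."""
    return tuple(
        sum(w * vertex[axis] for w, vertex in zip(weights, triangle))
        for axis in range(3)
    )

def topological_map(face_index, weights, target_vertices, faces):
    """Map a point given as (face, barycentric weights) onto the target layer.

    Both layers share `faces` (the same vertex indices per triangle), so the
    same face index and weights identify the corresponding point.
    """
    triangle = [target_vertices[i] for i in faces[face_index]]
    return barycentric_to_point(triangle, weights)

# Two hypothetical layers with identical topology but different shapes.
faces = [(0, 1, 2), (1, 3, 2)]
flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
curved = [(0.0, 0.0, 5.0), (2.0, 0.0, 6.0), (0.0, 2.0, 6.0), (2.0, 2.0, 8.0)]

weights = (0.5, 0.25, 0.25)
p_flat = barycentric_to_point([flat[i] for i in faces[0]], weights)
p_curved = topological_map(0, weights, curved, faces)
```

The neighbourhood relations of points are preserved by construction, regardless of how differently the two layers are shaped or positioned in space.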

#### *Normal mapping*

Any point *p* on a layer *X* is mapped on another layer *Y* by computing the intersection between the line normal to *X* through the point *p* and layer *Y*. If there is no intersection, there is no connection. This mapping technique can be used when the projection direction of neurons solely depends on the layer they are located in (e.g., cortical layers). Furthermore, this mapping technique can be helpful to selectively map subareas of a layer onto certain target regions (e.g., connections from the lateral and medial entorhinal cortex to different parts of the dentate gyrus, see exemplary demonstration section).
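A minimal sketch of normal mapping, assuming that layer *Y* is given as a list of triangles and that the normal of *X* at *p* is already known. The intersection test follows the standard Möller–Trumbore scheme, applied here to a full line (both directions along the normal); the function names are hypothetical and not part of PAM.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_triangle_intersection(origin, direction, tri, eps=1e-12):
    """Return (t, point) where origin + t * direction hits tri, else None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < eps:          # line is parallel to the triangle's plane
        return None
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * dot(e2, q)
    return t, tuple(o + t * d for o, d in zip(origin, direction))

def normal_map(p, normal, target_triangles):
    """Closest intersection of the normal line through p with the target layer."""
    hits = [line_triangle_intersection(p, normal, tri) for tri in target_triangles]
    hits = [hit for hit in hits if hit is not None]
    if not hits:
        return None           # no intersection means no connection
    return min(hits, key=lambda hit: abs(hit[0]))[1]

# Hypothetical target layer: a single triangle in the plane z = 2.
layer_y = [((-1.0, -1.0, 2.0), (3.0, -1.0, 2.0), (-1.0, 3.0, 2.0))]
hit = normal_map((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), layer_y)
miss = normal_map((10.0, 10.0, 0.0), (0.0, 0.0, 1.0), layer_y)
```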

#### *Euclidean mapping*

Euclidean mapping computes for a given point *p* on the first layer the closest point on the next layer. Such a mapping can be useful when the relative position of neurons on the first layer and its proximity to the target layer determine their entry direction on the target layer. This can be helpful if the curvature of layers in space does not allow a reliable mapping between layers based on normal mapping.
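As a rough illustration, Euclidean mapping can be approximated by a nearest-vertex search. Note the simplification: the sketch returns the closest mesh vertex of the target layer rather than the exact closest point on its surface, and `euclidean_map` is a hypothetical name.

```python
import math

def euclidean_map(p, target_vertices):
    """Approximate the closest point on the target layer by its closest vertex."""
    return min(target_vertices, key=lambda vertex: math.dist(p, vertex))

# Hypothetical target layer given by three vertices at z = 1.
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
closest = euclidean_map((0.9, 0.1, 0.0), target)
```

For densely sampled layers the vertex approximation is usually close to the true closest point; for coarse meshes a point-to-triangle projection would be needed instead.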

#### *Random mapping*

The random mapping maps a point *p* on one layer to an arbitrary location on the next layer. This mapping is useful when the projection kernels of neurons are well-defined while the axonal or dendritic projections through space are randomly distributed across brains.
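The Euclidean and random mappings can be sketched in a few lines. As a simplifying assumption, the target layer is represented here by a list of sample points (for example its vertices) rather than by a continuous surface, so the Euclidean result is only as fine as the sampling. The function names are hypothetical.

```python
import math
import random


def euclidean_map(p, target_points):
    """Return the sampled target-layer point closest to p."""
    return min(target_points, key=lambda q: math.dist(p, q))


def random_map(target_points, rng):
    """Return an arbitrary sampled point of the target layer."""
    return rng.choice(target_points)


target_layer = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
closest = euclidean_map((0.9, 0.0, 0.0), target_layer)
arbitrary = random_map(target_layer, random.Random(0))
```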

#### *Distance calculation*

The connection distance between a pre- and a post-synaptic neuron along the axon and dendrite is an important piece of information, for instance, when conduction latencies should be part of the network simulation. PAM includes several methods to measure the distance between a neuron and a synapse incorporating spatial distances on a layer and between layers (**Figure 2**). UV-distances are needed when neurites grow along a certain layer that is curved in 3d space. The projections of pyramidal cells in CA3, for example, traverse the stratum oriens and stratum radiatum in CA3 and CA1, which has a strong effect on the overall pathlength between pre- and post-synaptic neurons (Andersen et al., 2006). A similar effect can also be found in the projections of pyramidal cells in the cortical layers V and VI (Passingham and Wise, 2012).

Euclidean distances between two layers correspond to connections along the shortest paths in space. These can be connections through the whole nervous system, like thalamo-cortical connections or sensory pathways but also more direct connections between cortical layers.

Note that the distances computed by any of these methods represent estimates of the lower bounds since the convoluted morphology of real dendrites and axons may result in longer pathways and therefore longer latencies between two endpoints.
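How the distance components combine can be shown with a small sketch. It assumes that each part of a connection is available as a polyline of 3d points: the path from the pre-synaptic neuron through the mapped points on intermediate layers, the path along the SL up to the synapse, and the path from the synapse to the post-synaptic neuron. The names and the decomposition into three polylines are illustrative assumptions, not PAM's internal representation.

```python
import math


def path_length(points):
    """Length of a polyline given as a list of 3d points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def connection_length(pre_path, on_layer_path, post_path):
    """Lower-bound estimate of the axo-dendritic path between two neurons."""
    return path_length(pre_path) + path_length(on_layer_path) + path_length(post_path)


pre_path = [(0, 0, 0), (0, 0, 3)]          # neuron -> entry point on the SL
on_layer_path = [(0, 0, 3), (4, 0, 3)]     # along the SL to the synapse
post_path = [(4, 0, 3), (4, 0, 5)]         # synapse -> post-synaptic neuron
total = connection_length(pre_path, on_layer_path, post_path)
```

Sampling the on-layer path more densely brings its polyline length closer to the distance along a curved layer, but the result remains a lower bound for the reasons given above.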

Electrophysiological studies have shown some variability in the conduction latency per mm (Ferster and Lindström, 1983; Swadlow, 1994; Soleng et al., 2003). The assumption in PAM is that this variability emerges as a result of variability in the development that may lead to, e.g., different neurite lengths, different degrees of myelination, etc. To account for this variability, the conversion from connection length to conduction latency in PAM introduces a certain degree of variance based on experimental data.

#### **CONNECTIVITY KERNELS**

Synaptic connections have to follow functional as well as anatomical constraints. Synapses have a physical location in space and, more often than not, pre-synaptic neurons connect preferentially to post-synaptic neurons in certain locations (Passingham and Wise, 2012). To model both the connectivity preference and the spatial distribution of synaptic connections, we employ the following method. Pre- and post-synaptic neurons are assigned spatial locations in the SL in uv-coordinates, *z*<sub>pre</sub> and *z*<sub>post</sub>, respectively. This position is somewhat arbitrary and becomes meaningful only together with the connectivity kernel *p*(*z*|*z*<sub>neuron</sub>) that determines the probability of a neuron forming synapses in location *z* (**Figure 3**). Roughly speaking, the kernel models the reach of the dendritic tree or the axon, and the density with which synapses are formed. An arbitrary number of parameters can be integrated in the kernels to further parameterize them. In general, the shape of the kernel might depend on the position of the neuron in xyz- and/or uv-coordinates and can be defined by the user. PAM currently includes a few connectivity kernels, such as a 2d-Gaussian distribution, or a 1d-Gaussian distribution along a local anatomical axis. The user can easily add new kernel functions (e.g., a power law distribution) by creating a Python module in the kernel folder (see gaussian.py in the code for a template).

**FIGURE 3 | Two examples for connectivity kernels.** Arbitrary connectivity kernels can be defined to generate synapses between pre- and post-synaptic neurons. Kernel functions are mapped onto the synaptic layer and define the probability for a neuron to form synapses at a relative position in the synaptic layer. Illustrated are two different kernel functions (green shading) for two pre-synaptic neurons (green dots) and the potential post-synaptic partners (yellow dots), which have their own connectivity kernels (not shown). The joint probability of pre- and post-synaptic kernels determines if and where a synapse is formed.

To make the problem of determining synaptic connections more tractable, we assume that the probability of having a synaptic connection is the product of the pre- and post-synaptic connectivity kernels.

$$\operatorname{p}\left(z|z_{\text{pre}},z_{\text{post}}\right) = \operatorname{p}\left(z|z_{\text{pre}}\right)\operatorname{p}\left(z|z_{\text{post}}\right) \tag{1}$$

The task of finding synapses is equivalent to sampling from this distribution, which is simple to implement.
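A minimal sketch of this sampling, under several assumptions: the SL is reduced to a single u-coordinate discretized on a regular grid, both kernels are unnormalized 1d Gaussians, and the product in Equation (1) is normalized implicitly by the weighted sampling. The function names are hypothetical and do not correspond to PAM's kernel modules.

```python
import math
import random


def gaussian_kernel(z, center, sigma):
    """Unnormalized 1d Gaussian connectivity kernel p(z | z_neuron)."""
    return math.exp(-0.5 * ((z - center) / sigma) ** 2)


def sample_synapse_locations(z_pre, z_post, grid, sigma_pre, sigma_post, n, rng):
    """Draw n synapse locations from the product of the pre- and post-synaptic kernels."""
    weights = [
        gaussian_kernel(z, z_pre, sigma_pre) * gaussian_kernel(z, z_post, sigma_post)
        for z in grid
    ]
    if sum(weights) == 0.0:      # kernels do not overlap on the grid
        return []
    return rng.choices(grid, weights=weights, k=n)


grid = [i / 100 for i in range(101)]
locations = sample_synapse_locations(0.3, 0.5, grid, 0.1, 0.1, 1000, random.Random(0))
mean_location = sum(locations) / len(locations)
```

With two Gaussians of equal width, the product peaks midway between the two neuron positions, so the sampled synapse locations cluster around 0.4 in this toy setup.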

The general form of the function also allows us to define connectivity kernels in which the position of the neuron on the surface influences the shape of the kernel. Thereby, anatomical axes (e.g., the proximal-distal axes in the hippocampal formation) can be integrated in the definition of the kernel (see Discussion for more details).

However, in a network with realistic numbers of neurons and synapses, the computation of synaptic connections can be very time consuming. If every potential connection between *n* pre-synaptic and *m* post-synaptic neurons is evaluated at *c* spatial locations, the computational effort scales as *O(cnm)*, a large number even in a small rat brain. In some cases where modeling the connectivity precisely is important, there might be no other alternative. If, on the other hand, the details are less important than the gross features of the connectivity, we can use an approximate sampling algorithm that provides a trade-off between mathematical accuracy and computational efficiency.

If synaptic connections are formed sparsely, we can save computational time by systematically skipping partners that have a very low connection probability. The specific algorithm is as follows.

Step 1: The SL is divided into *c* bins (**Figure 4**), where *c* is chosen appropriately depending on the number of neurons and the size of the connectivity kernels.

Step 2: Each post-synaptic neuron is mapped onto the SL and added to every bin *z<sub>i</sub>* in which the connectivity kernel exceeds a certain threshold, i.e., *p*(*z<sub>i</sub>*|*z*<sub>post</sub>) ≥ *p*<sub>0</sub> (see Section Methods). The values *p*(*z<sub>i</sub>*|*z*<sub>post</sub>) are stored with the neuron id in the bin for later use.

Step 3: Each pre-synaptic neuron is mapped on the SL and we sample as many times from its connectivity kernel *p*(*z*|*z*<sub>pre</sub>) as we need to generate synapses for this pre-synaptic neuron (**Figure 4B**). Each sample yields a bin in the SL, *z<sub>j</sub>*, in which the pre-synaptic neuron forms a connection. This sampling can be further sped up by skipping low-probability bins, for which *p*(*z*|*z*<sub>pre</sub>) ≤ *p*<sub>1</sub>.

Step 4: From each bin in the previous step, we determine the post-synaptic neuron to connect by sampling from *p*(*z*<sub>post</sub>|*z<sub>j</sub>*). These probabilities are related to the probabilities stored in Step 2 through Bayes' theorem

$$\operatorname{p}\left(z_{\text{post}}|z_{\text{j}}\right) = \operatorname{p}\left(z_{\text{j}}|z_{\text{post}}\right)\frac{\operatorname{p}\left(z_{\text{post}}\right)}{\operatorname{p}\left(z_{\text{j}}\right)}\tag{2}$$

Since *p*(*z*<sub>post</sub>) is the same for every post-synaptic neuron, *p*(*z*<sub>post</sub>|*z<sub>j</sub>*) ∝ *p*(*z<sub>j</sub>*|*z*<sub>post</sub>).

The computational cost of this algorithm scales with the number of synapses *s* = α*nm*, which is significantly better than the exact algorithm because for large networks α is generally a small value.
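Steps 1 to 4 can be sketched as follows. This is an illustration under stated assumptions rather than PAM's implementation: the SL is one-dimensional with coordinates in [0, 1], both kernels are unnormalized Gaussians, and bins that contain no post-synaptic candidate are skipped in Step 3 (an addition made here so that Step 4 always has something to sample from). All function and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


def gaussian(z, center, sigma):
    return math.exp(-0.5 * ((z - center) / sigma) ** 2)


def build_post_bins(post_positions, bin_centers, sigma_post, p0):
    """Step 2: store (post id, kernel value) in every bin where the kernel >= p0."""
    bins = defaultdict(list)
    for post_id, z_post in enumerate(post_positions):
        # For simplicity every bin is tested; a real implementation would
        # visit only the bins near z_post.
        for j, z in enumerate(bin_centers):
            p = gaussian(z, z_post, sigma_post)
            if p >= p0:
                bins[j].append((post_id, p))
    return bins


def connect(pre_positions, post_positions, n_bins, synapses_per_pre,
            sigma_pre, sigma_post, p0, p1, rng):
    bin_centers = [(j + 0.5) / n_bins for j in range(n_bins)]        # Step 1
    bins = build_post_bins(post_positions, bin_centers, sigma_post, p0)
    synapses = []
    for pre_id, z_pre in enumerate(pre_positions):
        # Step 3: sample bins from the pre-synaptic kernel, skipping bins
        # with probability <= p1 and (assumption) bins without candidates.
        candidates = [(j, gaussian(z, z_pre, sigma_pre))
                      for j, z in enumerate(bin_centers)]
        candidates = [(j, p) for j, p in candidates if p > p1 and bins.get(j)]
        if not candidates:
            continue
        chosen = rng.choices([j for j, _ in candidates],
                             weights=[p for _, p in candidates],
                             k=synapses_per_pre)
        for j in chosen:
            # Step 4: pick the partner in proportion to its stored kernel value.
            ids, probs = zip(*bins[j])
            post_id = rng.choices(ids, weights=probs, k=1)[0]
            synapses.append((pre_id, post_id, j))
    return synapses


synapses = connect(pre_positions=[0.2], post_positions=[0.2, 0.8], n_bins=20,
                   synapses_per_pre=50, sigma_pre=0.05, sigma_post=0.05,
                   p0=0.01, p1=0.01, rng=random.Random(1))
```

In this toy example the post-synaptic neuron at 0.8 is never selected, because its kernel does not reach any bin that the pre-synaptic kernel covers above the threshold.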

#### **METHODS**

#### **IMPLEMENTATION OF THE FRAMEWORK**

PAM is a general approach to generate artificial neural networks based on anatomical data. To apply this technique, tools are needed to model and define the relationships between the layers. Therefore, we developed the functionality for defining parametric anatomical models (PAMs) in the open source 3d software Blender <sup>1</sup>. Using an existing 3d software for creating PAMs has the advantage that most of the tools for creating 3d layers are already implemented. **Figure 5** lists some of the functions that are of particular relevance for creating PAMs and that are generally implemented in most 3d tools. Most importantly, duplication of layers makes it easy to map points between layers with arbitrary, but identical, shapes (**Figure 5F**). Furthermore, it is important for PAM that 3d shapes can be unfolded to assign non-linear axes to the object (**Figure 5C**). The development of neuroscientific tools, such as Py3DN (Aguiar et al., 2013) and BrainBlend (Pyka et al., 2009), and of tools for other disciplines, such as BioBlender (Andrei et al., 2012) and MORSE <sup>2</sup>, as Blender add-ons suggests that Blender could become a unifying Python-based platform for developing scientific tools.

to generate, bins are sampled following the connectivity kernel of the pre-synaptic side. **(C)** In each selected bin, a post-synaptic neuron is randomly selected, incorporating the probabilities for the post-synaptic neurons.

<sup>1</sup>http://www.blender.org

<sup>2</sup>http://www.openrobots.org/wiki/morse/

**FIGURE 5 | Some functions of Blender that are important for PAMs. (A)** Various modeling techniques and non-destructive modifiers (like the Mirror- or Subdivision-modifier) allow an efficient creation of *3d* models of anatomical regions. **(B)** Anatomical slices along with transparency values can be displayed for easier *3d* tracking of neural layers (here the Hippocampus Brain Atlas). **(C)** *3d* objects can be unwrapped on a *2d* plane to assign non-linear anatomical axes to them. **(D)** Textures can be used to define the probability distribution of neurons or synapses along xyz- or uv-axes. **(E)** Using the shrink-fatten operator, layers can be easily generated from existing layers. **(F)** Duplicates of layers make it easy to map locations of one layer on other layers, as their internal ordering of vertices and edges is the same.

There were additional reasons for implementing PAM in the Blender environment. Because of its strong support for Python and its open application programming interface (API), Blender can be used as an integrated development environment for creating new tools and amending existing tools such as NEST. PAM for Blender consists of a set of add-ons and Python modules that extend the functionality of Blender to generate and relate anatomical layers to each other and to create neural networks for the network simulator NEST. These tools along with example files and video tutorials are freely available <sup>3</sup>. In the following, we give a short introduction to the available tools by explaining the workflow for creating PAMs.

#### *Creating anatomical layers*

First, layers need to be created that define the location for the cell bodies of neurons and for their synapses. Depending on the brain region, intermediate layers might be included to describe important landmarks for the trajectories of neurites. Since 2d and 3d images can be imported into Blender, atlas data or anatomical data, such as histological images or 3d data acquired through computer tomography or magnetic resonance imaging, can be used to support the modeling process. The depiction of metric units within the modeling environment allows the 3d structures to be modeled with the correct scaling. All layers can be automatically unfolded to make uv-coordinates for the layers available. This part relies on Blender's internal tools and requires some modeling skills. However, once a brain region has been modeled as layers, it can serve as a template for a variety of neural network models.

#### *Setting neural parameters*

The traced anatomical layers already allow first inferences. For example, the user can obtain the surface area of the layers and calculate the total number of neurons hosted by the layer, given for example the neural density per mm<sup>2</sup>. For each neuron group in a layer, the number of neurons that should be used in the simulation can be defined. The PAM add-on for Blender provides a user interface for calculating the surface area and number of neurons and for visualizing the connectivity kernels on the SL (**Figure 6**). However, everything can also be set up using Python scripts. The cell bodies of the neurons are usually homogeneously distributed over the surface. Additionally, using built-in functions of Blender, 2d and 3d textures (like gene expression maps or gene marker data) can be mapped on the surface of the layers to determine the location-dependent density of neurons.

<sup>3</sup>http://cns.mrg1.rub.de/index.php/software (will be available upon acceptance of article).
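The estimate of neuron numbers from the surface area can be sketched outside Blender as follows. The sketch assumes a triangulated layer with coordinates in mm; the density value and the scale factor in the example are placeholders chosen for illustration, not measured data, and the function names are hypothetical.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3d triangle from the cross product of two edge vectors."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cx = ab[1] * ac[2] - ab[2] * ac[1]
    cy = ab[2] * ac[0] - ab[0] * ac[2]
    cz = ab[0] * ac[1] - ab[1] * ac[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def layer_area(vertices, faces):
    """Total surface area of a triangulated layer."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


def neuron_count(area_mm2, density_per_mm2, scale=1.0):
    """Number of neurons on a layer, optionally scaled down for simulation."""
    return round(area_mm2 * density_per_mm2 * scale)


# A 1 mm x 1 mm square layer made of two triangles (toy geometry).
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
area = layer_area(vertices, faces)
n_neurons = neuron_count(area, density_per_mm2=250000, scale=0.001)
```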

#### *Creating mappings*

Each layer can host several neuron types which in turn can connect to several regions. Each mapping is defined by




Note that a 3d-layer can have multiple roles in the definition of a mapping. For example, it can be pre- and post-SL, and technically even the SL at the same time. Therefore, recurrent connections can be described using the same syntax as feedforward connections. We provide PAM modules for defining connections and computing the mapping, the synapses between neurons and their connection lengths. Furthermore, neurons and connections can be visualized to obtain a qualitative impression of the setup and to manually adjust the connectivity kernels. Several video tutorials and a wiki on the project website document how connections in PAM can be defined to rebuild connectivity patterns of real neural networks (http://cns.mrg1.rub.de/index.php/software).

#### *Export connectivities and distances*

Connections and distances between neurons can be exported as a CSV file or as a Python pickle file for further processing in an arbitrary environment. The generation of connections and conduction delays is separated from the generation of neural properties to allow users to work with the simulation environment that meets their demands. After connections in Blender are defined on the level of layers using PAM, connections and distances between neurons can be computed based on the given numbers of neurons and synapses per layer and the projection kernels for axons and dendrites. As a proof-of-concept, we implemented an importer for the neural network simulator NEST to run neural network simulations based on networks generated by PAM (see Results).

#### **IMPLEMENTATION OF THE HIPPOCAMPAL MODEL**

In the following, we demonstrate how PAM can be used to model connectivity patterns and distances between neurons based on neuroanatomical data. Note that we make no claim that this hippocampus model is complete. In our analyses, we focus on the connections between DG and CA3, CA3 and CA3, and CA3 and CA1 to reveal that with PAM structurally important features of the anatomy can be identified and incorporated in a computational model.

The projection patterns of the connections between entorhinal cortex and the hippocampal formation are also a very good example to demonstrate the benefits of PAM (e.g., Figures 3–34 and 3–41 in Andersen et al., 2006), but we do not feel confident enough in our understanding of how axonal projections from the entorhinal cortex exactly enter the hippocampal formation and how the axonal projections of CA1 and subiculum project back in 3d space. Therefore, we limit ourselves to modeling the topographic relations between entorhinal cortex and the hippocampal formation in PAM. Getting the spatial form of the axonal projections right can be accomplished in the future by adding additional intermediate layers.

#### *Data*

The hippocampus [including dentate gyrus (DG), CA3, CA1 and subiculum] and the entorhinal cortex (medial EC, lateral EC, perirhinal cortex) were modeled based on publicly available data. The neural layers were traced in alignment with the atlas data of the Rat Hippocampus Atlas <sup>4</sup> (Kjonigsen et al., 2011) and the 3d surface model <sup>5</sup> by Ropireddy et al. (2012). The neural layers of the hippocampal formation were first traced slice by slice in coronal sections. When three-dimensional shapes are modeled in this way, regions between the slices can become very irregular due to misalignment and deformation of the slices. Therefore, in a second step the neural layers were recreated by placing vertices and edges along the natural shape of the neural layers.

Subsequently, synaptic and intermediate layers were created to define the connections. We based the model on the following references: Andersen et al. (2006) for the overall picture, Van Strien et al. (2009) for more detailed information, and the hippocampome project <sup>6</sup>. Additionally, the Allen Brain Atlas (Jones et al., 2009), which contains a fully annotated atlas for the mouse, was consulted to extrapolate data in cases where rat data were not available to us.

#### *Demonstration of the modeling advantages*

In the following, we explain how previously hard to define connectivity patterns can be generated with PAM based on anatomical data.

*Connections from entorhinal cortex to DG.* Neurons in the superficial layers of the lateral and medial entorhinal cortex (LEC and MEC) project to DG (and to CA3 and CA1). More specifically, the lateral and caudomedial part of the LEC/MEC network projects to the septal half of DG and two more rostral areas project to the third and fourth quarter of the dentate gyrus in the temporal half (Andersen et al., 2006). To mimic the grid-like density of pyramidal cells in the entorhinal cortex (Ray et al., 2014), a Voronoi-like procedural texture was generated to define the location-dependent density of neurons on this layer (**Figure 5D**). Of course, textures generated from tracer studies could be used here to generate this effect more accurately.

Intermediate layers were placed to sketch out the perforant pathway. Three subareas of the entorhinal cortex connect to different parts of the dentate gyrus. **Figure 7A** shows the mapping for the most caudal and lateral band to the septal portion of DG. To construct the mapping, we took advantage of the different mapping techniques in PAM. The complex relationships described in the following are also explained in a video (see http://cns.mrg1.rub.de/index.php/software). First, we created a layer (IL1) that served as a mask for the caudal lateral part of LEC/MEC, which we wanted to map to the septal portion of DG. Between LEC/MEC and IL1, we used normal mapping to ensure that only those neurons located in the caudal lateral part of LEC/MEC project to the SL. In general, normal mapping is a helpful tool to project only a subgroup of neurons onto another layer or to change the mesh structure of the layer. Additional intermediate layers (IL2 and IL3) were added to define the geometry of the perforant path. The mesh topology of IL2 and IL3 was identical to that of IL1, while their shapes were deformed to match the connection pathways of the LEC/MEC neurons. The mesh topology is used to define the relative position of the neuron projections in each layer, and the shape of the layer defines the position in 3d space. The mapping between layers IL1, IL2, and IL3 was topological since they all had the same mesh topology. From IL3, neuronal projections enter the SL using Euclidean mapping. Topological mapping could not be used here, as SL is a copy of the DG layer and, therefore, does not have the same mesh topology as IL3. Instead, using Euclidean mapping, neuronal projections on IL3, which are distributed along the septo-temporal axis, enter SL at the most posterior point of DG. From there, the connectivity kernel defines that synapses for a particular neuron can be generated along the whole proximo-distal axis of the SL (see green area on SL in **Figure 7A**). Normal mapping is used between SL and DG to include just the upper septal part of DG.

Note that the spatial form of the axonal projections is only roughly sketched out in this example. In a similar manner, projections from entorhinal cortex regions to different portions of CA3 and CA1 could be modeled, but are currently not included in this model.

*Intra-hippocampal connections.* Granule cells in DG project to pyramidal cells in CA3, which in turn have recurrent connections and projections to CA1 cells. While connections of CA3 neurons cover nearly the entire proximal-distal axis in the hippocampal loop, their coverage along the septo-temporal axis is restricted (Ropireddy and Ascoli, 2011). Using PAM, a SL was placed between DG and CA3 (**Figure 7B**). As neural projections from DG should enter the SL on their shortest path and traverse along the proximal-distal axis, Euclidean mapping was used between DG and the SL. We also used Euclidean mapping between the SL and CA3, as the SL also does not have the same mesh topology as CA3 and we wanted to be sure that every CA3 neuron projects to the SL.

For the recurrent and forward projections of CA3, a SL covering CA3 and CA1 was created with normal-based mapping (**Figure 8A**). Since the SL is very close to CA3 and CA1, normal-based and Euclidean mapping yield very similar results, in particular with large connection kernels on the SL.

*Output projections.* Pyramidal cells on the more proximal part of CA1 project to more distal parts of the subiculum and vice versa (Amaral et al., 1991). In PAM, this can easily be modeled by creating a copy of the CA1 layer, mirroring it in the caudal-rostral axis and deforming it to a SL over the subiculum (**Figure 8B**). As neural projections from CA1 are mapped via topological mapping on the SL, the mesh layout of the SL is used to describe the projection targets of CA1 on subiculum. Since the meshes of the SL and subiculum do not have the same topology, normal mapping is used.

<sup>4</sup>http://cmbn-approd01.uio.no/zoomgen/hippocampus/home.do

<sup>5</sup>http://krasnow1.gmu.edu/cn3/hippocampus3d/

<sup>6</sup>http://www.hippocampome.org (the authors are aware that this project is still in an alpha stage; therefore, data were checked in the provided reference material).

CA1 and subiculum cells project back to the deep layers of LEC and MEC roughly maintaining the topological order of the cells along the septo-temporal and proximal-distal axis (Amaral et al., 1991). In PAM, this can be achieved, for instance, by normal-based mapping between the subiculum layer and an intermediate layer, which contains more subdivisions. Here, normal-based mapping is just used for a simple 1-to-1 mapping from one layer onto another with a different internal organization. A copy of this layer is deformed to match the SL close to the entorhinal cortex. Because of topologically identical shapes, projections between the intermediate layer and the SL can be directly determined. From the SL, normal-based mapping provides the link to the entorhinal layer (**Figure 9**).

of neurons based on texture information. Bottom: Conceptual view on the connection between LEC/MEC and DG. Colors match the 3d depiction of the top image. Using normal mapping, only a subset of

#### *Neuron and synapse numbers*

In order to demonstrate that PAM can model anatomically relevant features, we will compare patterns of connectivity matrices and connection length distributions. To assess these data, it is not crucial to include realistic numbers of neurons and synapses into the model. However, the ratios of neuron numbers in different regions in our model match experimental estimates (Amaral et al., 1990; West et al., 1991; Mulders et al., 1997; Cutsuridis et al., 2010). The total numbers of neurons were scaled by a factor of 0.001 (**Table 1**). The numbers of synapses per post-synaptic neuron are roughly based on experimental estimates but scaled up to allow for spike-propagation in the hippocampal loop (see last experiment).

synaptic layer coats CA3. Neurons in Dentate Gyrus (DG) project directly onto the synaptic layer where they can build synapses along the entire proximal-distal axis.

#### **RESULTS**

#### **QUALITATIVE VIEW ON THE MODEL**

The Python implementation of PAM contains functions to visualize connections, unconnected neurons and synapse locations. **Figure 10** shows the reconstructed neural layers of the hippocampal model and some visualizations of the connections computed by PAM using the intermediate layers described in the previous chapter.

#### **THE IMPORTANCE OF LAYER MORPHOLOGY AND DISTANCE CALCULATIONS**

A crucial question is whether two key features of PAM, the modeling of the 3d shape of neuronal layers and the realistic calculation of connection distances, are important for the inferred connectivity patterns and distances. For illustration, we compared the connectivity and distance matrices describing DG-CA3, recurrent CA3, and CA3-CA1 connections in two models of the hippocampal formation. The reconstructed model incorporates the realistic shapes of neuronal layers in the hippocampal formation and was reconstructed in PAM from anatomical data (**Figures 10**, **11**, left 3d model). This model is contrasted with a simple model that approximates the gross anatomical shape of the hippocampal formation as two half tubes (**Figure 11**, right 3d model). The simple model represents what can be generated with previously available tools such as, for example, NeuralSyns. In both models, equal numbers of neurons were homogeneously distributed over the layers and connected with equal numbers of synapses and the same connectivity kernels were used.

#### *Impact on connectivity matrices*

We examined how the morphology of the neural layers affected the connectivity between neurons. One way to visualize the connectivity is to plot the connectivity matrices between two layers, where the neurons are sorted along the septo-temporal axis of the hippocampus (**Figure 11**, scatter plots). Both models produce connectivity matrices that are very sparse and locally restricted, as evident in the concentration of connections around the diagonal. However, in the reconstructed model, the spread along the diagonal of the connectivity matrix is wider than in the simple model. To investigate this spread in more detail, we computed the index differences between the pre-synaptic neurons and their post-synaptic targets in four regions along the septo-temporal axis (**Figure 11**, histograms). The index differences were calculated as the difference between the pre-synaptic index and the post-synaptic index. The connectivity spread in the reconstructed model is significantly wider than in the simple model for all anatomical subdivisions. In addition, there is another marked difference between the simple model and the reconstructed model. In the simple model, the distributions remain equal along the septo-temporal axis whereas variations are recognizable in the reconstructed model. For example, neurons in the most septal part of DG project to wider areas in CA3 than DG neurons in more temporal parts do (**Figure 11**, red vs. orange, yellow, or green distribution). A similar pattern of septo-temporal heterogeneity is seen for recurrent CA3 and CA3-CA1 connections.
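The index-difference analysis can be sketched as below. The sketch assumes that connections are available as (pre index, post index) pairs with both populations already sorted along the septo-temporal axis, and that the four regions are equal-sized blocks of pre-synaptic indices. The exact region boundaries used for Figure 11 are not stated here, so this partition is an assumption, as are the function name and the toy data.

```python
def index_differences_by_region(connections, n_pre, n_regions=4):
    """Group pre-minus-post index differences by the region of the pre-synaptic neuron."""
    regions = [[] for _ in range(n_regions)]
    for pre, post in connections:
        region = min(pre * n_regions // n_pre, n_regions - 1)
        regions[region].append(pre - post)
    return regions


# Toy connection list for 10 pre-synaptic neurons (hypothetical data).
connections = [(0, 0), (0, 2), (5, 4), (9, 9)]
differences = index_differences_by_region(connections, n_pre=10)
```

A histogram of each sub-list then corresponds to one of the per-region distributions whose widths are compared between the two models.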

The reason for the wider connection spread in the reconstructed model is indeed the anatomical shape of the neural layers. For example, while in the simple model, the length of CA3 at the proximal and distal part is equal, in the anatomical hippocampus, CA3 is longer at the distal end than at its proximal end. Since neurons are homogeneously distributed across the proximal-distal axis, any segment of CA3 along the septo-temporal axis must contain more neurons in the distal part than in the proximal part (**Figure 12**). Furthermore, since neurons in both models form the same number of connections, the projection of neurons must spread further along the septo-temporal axis in the reconstructed model than in the simple model. The other observation, that the connectivity spread in the reconstructed model depends on the septo-temporal location, can be accounted for by a change in the proximal-distal asymmetry along the septo-temporal axis.

#### *Significance of distance computation technique*

**Table 1 | Number of neurons and connections in the hippocampal formation used in this study.**

*The numbers are scaled based on anatomical studies in rats (Amaral et al., 1990; West et al., 1991; Mulders et al., 1997; Cutsuridis et al., 2010). Estimated neuron numbers are given in parentheses. For the connectivity, the numbers in the columns denote the number of incoming connections that one cell receives from the regions listed in the leftmost column. The estimated number of incoming connections as reported in the literature (if available) is given in brackets. For example: each DG neuron in our model receives input from 38 sEC neurons. In the rat DG, it is estimated that each neuron receives input from 3520 sEC neurons (Cutsuridis et al., 2010).*

Next, we studied how the connection distance depends on the morphology and the distance computation model in three different scenarios. In the first and second scenario, we used the reconstructed model and computed distances between neurons, respectively, based on the mapping techniques unique to PAM and based on Euclidean distance, which was available in previous tools. In the third scenario, representing the conventional approach, we used Euclidean distance in the simple model. Distance histograms were generated for DG-CA3, CA3-CA3, and CA3-CA1 connections (**Figure 13**).

All pair-wise comparisons between distance distributions created from PAMs and the other models using the Kolmogorov–Smirnov test showed significant differences (*p* < 10<sup>−50</sup>). These exceptionally low *p*-values are largely due to the large sample size and belie the comparatively small differences in some of the pairwise comparisons. However, for the CA3-CA1 connections, the distributions of distances are qualitatively different. The advantage of layer-based distance calculation becomes the most apparent for these connections as the axons of CA3 pyramidal cells project along the proximal-distal axis of the cornu ammonis regions rather than traversing directly to the CA1 target neurons (**Figure 13**, bottom).
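The influence of sample size on the *p*-value can be illustrated with a small, self-contained sketch. It computes the two-sample Kolmogorov–Smirnov statistic from the empirical distribution functions and uses the textbook asymptotic series for the *p*-value. This is an approximation chosen for illustration and not necessarily the routine used for the analysis above; the sample sizes and the statistic value in the example are made up.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic p-value 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lambda^2).

    Large-sample approximation only; unreliable when lambda is very small.
    """
    lam = d * math.sqrt(n * m / (n + m))
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# The same small difference (D = 0.05) at two sample sizes.
p_small_sample = ks_pvalue_asymptotic(0.05, 100, 100)
p_large_sample = ks_pvalue_asymptotic(0.05, 100_000, 100_000)
```

With 100 samples per group the difference is far from significant, whereas with 100,000 samples per group the same statistic yields a *p*-value many orders of magnitude below 10<sup>−50</sup>.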

#### **EFFECT OF SYNAPTIC DELAYS IN NEURAL NETWORK SIMULATIONS**

As a proof-of-concept, we imported the calculated connectivity and distance matrix from the hippocampal loop [superficial EC (sEC), DG, CA3, CA1, Sub, deep EC (dEC)] into a NEST simulation (Gewaltig and Diesmann, 2007). Due to the lack of detailed knowledge about the projections between entorhinal and hippocampal areas, connection distances represent only a rough average between the minimal and maximal spatial distances between neurons in the entorhinal cortex and dentate gyrus, CA1 and subiculum. In analogy to our previous more abstract model (Pyka and Cheng, 2014), all neurons were modeled as excitatory Izhikevich neurons (Izhikevich, 2003) (*a* = 0.02, *b* = 0.2, *r*<sub>1</sub> = −65, *r*<sub>2</sub> = 8) without STDP or any other sort of adaptation. Connection weights were manually adjusted to match activity levels observed in experimental studies (Leutgeb et al., 2004; Vazdarjanova and Guzowski, 2004; Tashiro et al., 2007). The weights were sEC-DG: 9 mV, DG-CA3: 5 mV, CA3-CA3: 4 mV, CA3-CA1: 5 mV, CA1-Sub: 4 mV, CA1-dEC: 4 mV, Sub-dEC: 4 mV (see also Supplemental data: hippocampus\_nest/hippocampus.py).

To convert connection distances to conduction delays, the connection distances calculated in PAM were multiplied by 4.36 ms/mm according to experimental measurements of conduction latencies in rat CA3 axons (Soleng et al., 2003). Since variability and neuron-type-specific differences are not incorporated in our model, our results need to be confirmed once more information about conduction latencies and neuron-types becomes available.
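The conversion itself is a single multiplication, sketched below. The optional `rel_sd` parameter is a hypothetical addition that shows how multiplicative variability could be introduced; it is switched off by default, in line with the simulation described here, and its Gaussian form is an assumption.

```python
import random

MS_PER_MM = 4.36  # conduction latency per mm (Soleng et al., 2003)


def distance_to_delay(distance_mm, ms_per_mm=MS_PER_MM, rel_sd=0.0, rng=None):
    """Convert a connection distance in mm to a conduction delay in ms."""
    delay = distance_mm * ms_per_mm
    if rel_sd > 0.0:
        # Hypothetical multiplicative jitter; clipped so delays stay non-negative.
        rng = rng or random.Random()
        delay *= max(0.0, rng.gauss(1.0, rel_sd))
    return delay


delay_ms = distance_to_delay(2.0)  # a 2 mm connection
```

A 2 mm connection thus corresponds to a delay of 8.72 ms when no variability is applied.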

We then simulated neural activity in this network by injecting input currents created by Poisson noise into sEC for 10 ms with 50 mV. The currents are sufficient to drive spiking activity in sEC (**Figure 14**). After some delay, these spikes in sEC in turn drive spiking activity in downstream CA3, and so on. The spiking activity finally completes the tri-synaptic loop and reaches the output layer, dEC, after around 120 ms (**Figure 14**), which is comparable to the period of theta oscillations at about 6–12 Hz.
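As a quick check of this comparison, the sketch below converts the quoted theta band into periods and the loop traversal time into an equivalent frequency. The variable names are ours; the 120 ms and 6–12 Hz values are taken from the text.

```python
def period_ms(frequency_hz):
    """Period in milliseconds of an oscillation with the given frequency."""
    return 1000.0 / frequency_hz


theta_band_hz = (6.0, 12.0)   # theta band quoted in the text
loop_time_ms = 120.0          # approximate loop traversal time in the simulation

longest_period_ms = period_ms(theta_band_hz[0])   # about 167 ms
shortest_period_ms = period_ms(theta_band_hz[1])  # about 83 ms
equivalent_frequency_hz = 1000.0 / loop_time_ms   # about 8.3 Hz

print(shortest_period_ms, longest_period_ms, equivalent_frequency_hz)
```

A traversal time of about 120 ms thus lies inside the 83–167 ms range of theta periods.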

#### **DISCUSSION**

With PAM, we introduced a technique to use anatomical data to build large-scale artificial neural networks with realistic connectivity and conduction delays. In PAM, neural networks are represented by layers, which are related to each other with a set of mapping techniques. The combination of different mapping types allows us to model complex neuronal projections, e.g., between entorhinal cortex and dentate gyrus. PAM offers the unique capability to have local as well as global anatomical axes influence the connectivity patterns. Furthermore, it can combine distances between layers and within layers to calculate connection distances between neurons.

#### **FEATURES OF PAM AND PREDICTIONS**

PAM is a very efficient approach to model large-scale network structures with complex wiring patterns as it mimics an important property of neural networks and biological structures in general: it indirectly describes the neural network through a lower-dimensional encoding. Spatial structures are defined for the placement and mappings of neurons and projections, rather than specifying the location and connectivity of each neuron. This low-dimensional encoding has two practical advantages. First, for the given complexity of real networks, the human effort it takes to describe position and projection directions is comparatively low. Second, the amount of data generated by the encoding is also low given the complexity of networks that can be created with PAM. Even though PAM does not include a model of the developmental process, it can be used efficiently to represent snapshots of the neural network at certain developmental stages due to the low-dimensional encoding. We believe that in combination with other properties of networks, such as neural dynamics, plasticity, and external inputs, PAM could be a valuable contribution toward a complete description of nervous systems for computational models.

We deliberately kept the computation of connections and connection distances separate from the neural network simulator, so that PAM is compatible with a wide range of simulation engines. Researchers who already feel comfortable with the simulator of their choice can therefore add PAM to their workflow for generating the neural network. The connection data generated by PAM can be exported as CSV files or as binaries using the pickle module of Python, which can then be imported by many other programs. An import script for NEST is included in the downloadable package of PAM.
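To illustrate what such an exchange could look like, the sketch below writes and reads a small connection list with Python's standard `csv` and `pickle` modules. The row layout `(pre, post, distance_mm)` and the file names are hypothetical and are not the actual export format of PAM, which is documented in the downloadable package.

```python
import csv
import os
import pickle
import tempfile

# Hypothetical export: one row per connection as (pre, post, distance_mm).
connections = [(0, 3, 0.42), (0, 7, 1.10), (1, 3, 0.95)]


def write_csv(path, rows):
    """Write one connection per line."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)


def read_csv(path):
    """Read connections back, restoring integer ids and float distances."""
    with open(path, newline="") as fh:
        return [(int(pre), int(post), float(dist))
                for pre, post, dist in csv.reader(fh)]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "connections.csv")
    pkl_path = os.path.join(tmp, "connections.pkl")

    write_csv(csv_path, connections)
    with open(pkl_path, "wb") as fh:
        pickle.dump(connections, fh)

    from_csv = read_csv(csv_path)
    with open(pkl_path, "rb") as fh:
        from_pickle = pickle.load(fh)

print(from_csv == connections, from_pickle == connections)
```

CSV is readable by almost any program, whereas pickle preserves Python types but should only be loaded from trusted sources.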

Based on the reconstruction of the hippocampal formation and computation of connection properties in PAM, we can derive two predictions about the structural properties of the hippocampus. First, the spread of connections is higher at the most septal locations in the hippocampus than at the more temporal locations (**Figure 11**). Second, conduction delays in CA3-CA1 connections have a higher variability than those in CA3 recurrent or DG-CA3 connections (**Figure 13**). Both predictions can be readily tested experimentally. Future modeling studies are needed to analyse the functional consequences of these anatomical properties.

Furthermore, we found in a preliminary simulation that the total synaptic delay in the hippocampal formation in the current model is close to the period of the theta oscillation, which dominates the local field potential in the hippocampal formation. While the precise relationship between synaptic delays and theta oscillations needs to be ascertained in the future, we speculate at this point that there might be a correspondence between these two parameters that could account for inter-species differences in theta frequencies. As the neuron distances and, hence, the conduction delays scale with the size of the hippocampus, we predict a relationship between brain size and the frequency of hippocampal theta, consistent with comparative studies of theta oscillations across nine species (Blumberg, 1989). This allometric relationship cannot be easily explained in models that generate theta oscillations within an isolated subregion (Crotty et al., 2012). It has to be noted that several mechanisms have already been proposed for theta and that, conceptually, the theta frequency does not need to match the traveling time of spikes in the hippocampal loop (e.g., the frequency could be higher than suggested by the loop). However, allometric measures might constrain the range in which spike oscillations can be observed. This might be particularly relevant when brain sizes differ by several orders of magnitude.

#### **MORPHOLOGY-BASED vs. KERNEL-BASED CONNECTIONS**

For networks with spatial dependencies, it is common to use either kernel-based or morphology-based methods to compute the connectivity and connection lengths between neurons. Kernel-based methods use two- or three-dimensional mathematical functions to define the probability for a neuron to form synapses as a function of the spatial distance to the soma of the neuron. This method provides a fast and efficient way to connect neurons with each other and is widely used in software packages like Neuroconstruct (Gleeson et al., 2007), NeuralSyns (Sousa and Aguiar, 2014), Topographica (Bednar, 2009), or NEST Topology (Eppler et al., 2008).
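A kernel-based scheme of this kind can be sketched in a few lines. The example below assumes a two-dimensional Gaussian kernel whose peak connection probability is 1 at zero distance, random neuron positions in a unit square, and an arbitrary width `sigma`; none of these choices are taken from the packages cited above.

```python
import math
import random


def gaussian_kernel(distance, sigma):
    """Connection probability as a function of distance (peak 1 at distance 0)."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


def connect(pre_positions, post_positions, sigma, rng):
    """Draw each possible connection independently with kernel probability."""
    connections = []
    for i, pre in enumerate(pre_positions):
        for j, post in enumerate(post_positions):
            d = math.dist(pre, post)  # Euclidean distance (Python 3.8+)
            if rng.random() < gaussian_kernel(d, sigma):
                connections.append((i, j, d))
    return connections


rng = random.Random(0)  # fixed seed for reproducibility
pre = [(rng.random(), rng.random()) for _ in range(20)]
post = [(rng.random(), rng.random()) for _ in range(20)]
conns = connect(pre, post, sigma=0.2, rng=rng)
print(len(conns), "connections out of", len(pre) * len(post), "possible pairs")
```

Here the connection length is simply the straight-line distance between somata, which is the assumption that the layer-based distances in PAM are meant to relax.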

On the other hand, increasing effort is invested into generating networks based on realistic morphologies (Halavi et al., 2012). Tools to analyse morphologies, like the commercial software Neurolucida or Py3DN (Sousa and Aguiar, 2014), provide the data to generate artificial neurons with realistic morphology (Ascoli et al., 2001; Eberhard et al., 2006; Cuntz et al., 2011) and to generate networks, e.g., with NETMORPH (Koene et al., 2009) or NeuGen (Eberhard et al., 2006). This line of research is primarily motivated by the fact that the axonal and dendritic morphology can have a functional influence on the dynamics of the neuron (London and Häusser, 2005).

PAM combines kernel-based methods to determine synapses on a SL with structural properties of brain regions, which determine the connection patterns and their lengths. However, different types of branching morphologies of axonal and dendritic trees are not incorporated in this model. The focus of PAM lies more on an efficient translation of large-scale network morphologies to more abstract network simulations, like NEST or Brian, in contrast to GENESIS and NEURON. The unique contribution of PAM here is that topological relations between distant layers, properties along local and global anatomical axes, and anatomical data about connection pathways and cell- and synapse-densities can be modeled and converted into artificial neural networks.

However, these features do not necessarily exclude the incorporation of morphological data. In fact, we believe that approaches for generating neuron morphologies (e.g., Koene et al., 2009; Cuntz et al., 2011) could be amended by anatomical cues derived from PAM to guide the growth of dendrites and axons along layers specified in a 3d model. Thereby, a low-dimensional encoding for neuron morphologies and network morphologies could be generated that would allow the study of neural networks on different abstraction levels.

#### **LIMITATIONS OF PAM AND FUTURE PLANS**

Creating PAMs requires more effort than setting up more commonly used network simulations, because the 3d model has to be built from reconstructions of the anatomical data. However, we think that this effort is justified since our results show that the morphology of the neuronal layers and the mapping of connections have significant effects on connectivity and conduction delays. In addition, once the 3d model has been built, it can be used to generate an arbitrary number of network models for functional simulations due to the parametric nature of the 3d model in PAM. Parameters such as the number of neurons and synapses will have to be adjusted depending on the scientific question pursued and the computational power available.

Currently, neurons and synapses are constrained to the neuronal layer, the surface of a 2d manifold in 3d space. For future versions, we plan to implement a parametric way to add a variable offset to the locations of neurons and synapses in a biologically plausible manner. Furthermore, although connections and distances can be calculated on the order of 10<sup>5</sup> neurons and 10<sup>6</sup> synapses in a reasonable amount of time, we have not yet exploited all possibilities to increase the computational speed. More work will be invested to allow the generation of networks with realistic numbers of neurons and synapses.

#### **POTENTIAL APPLICATIONS OF PAM**

We hope that PAM will help close the current gap between the computational models of neural networks, which tend to be rather abstract (Cheng, 2013), and the anatomical data, which are highly detailed. For a multitude of species, including humans, high-quality structural data of the central nervous system are continuously collected and refined (e.g., Jones et al., 2009; Markram, 2012). These data proved useful in studying, for example, network properties (Soleng et al., 2003; Mason and Verwoerd, 2007; Van Strien et al., 2009), functional correlates of structural properties (Carr and Konishi, 1988; Lavenex and Amaral, 2000; Buzsáki and Moser, 2013) or genotype-phenotype relationships (Lein et al., 2007; Thompson et al., 2008). However, few neural network models are generated from structural data, possibly because an effective and powerful method to formally describe and translate those data into neural models has been missing so far.

An intriguing possibility that PAM offers is to study the precise functional effect of brain lesions. Since the network models are derived from spatial anatomical data, any kind of local modification in the biological network can be reproduced in the virtual network, and vice versa. For example, controlled or known brain lesions can be simulated anatomically correctly in a virtual network. Alternatively, insights about a virtual lesion in a network simulation based on PAM can serve as prediction for *in vivo* studies. For instance, there is an ongoing debate about the functional differences between the septal and temporal part (or dorsal and ventral part in primates) of the rodent hippocampus (Thompson et al., 2008; Fanselow and Dong, 2010; Segal et al., 2010). With detailed projection patterns between neural layers, the septal and temporal part of the hippocampus can be analyzed separately in computational models generated by PAM. In general, we think that computational and experimental studies of neural networks could be more tightly integrated, if they shared a common anatomical reference frame.

The anatomical reference frame provided by PAM might prove useful in investigating the relationship between the size and form of a brain structure on the one hand and its impact on connectivity patterns, conduction distances, self-organizing processes and function on the other. This is particularly relevant to understanding the emergence of certain networks from an evolutionary point of view. For example, allometric factors can be incorporated into the analysis of networks in scaled-up versions of brain structures recreated with PAM. By recreating homologous regions from different species, the functional influence of the scale and the form of a network could be dissociated with the help of computational models.

PAM could be used as a powerful educational and documentation tool. The possibility to visually explore and manipulate the reconstructed model of the hippocampus has helped us tremendously to better understand the structure of the hippocampus and the projection patterns of its neurons. In Blender, it is possible to rotate the 3d model in all three directions, remove certain layers, color layers, including making them partially transparent, and to do many more things. In addition, the Python implementation of PAM includes tools that can facilitate the understanding of the synaptic connectivity patterns. It provides functions to visualize connections and synapse locations. The description of anatomical layers and mappings between those layers as provided by PAM, could serve to collect and document knowledge about neuron locations, densities and axonal and dendritic projections.

#### **CONCLUSION**

We have proposed a new modeling technique, PAM, that can generate neural networks with connectivity patterns and connection distances that are consistent with experimentally measured layer morphologies and complex projection patterns. PAM can also serve as a tool for collecting, systemizing, and visualizing anatomical data. Using a common reference frame for anatomical data would greatly facilitate the transfer of such data and increase their potential impact. It is therefore our hope that computational and experimental neuroscientists alike will find PAM a useful tool for their research.

#### **INFORMATION SHARING STATEMENT**

The Python implementation of PAM reported in this article, along with some example files of the hippocampal model and the exported data for NEST, can be found in the Supplemental Materials. PAM is under further development. The most recent version, a Wiki, and videos can be found at http://cns.mrg1.rub.de/index.php/software. We invite the reader to examine the code and contribute to the project.

#### **AUTHOR CONTRIBUTIONS**

Martin Pyka developed the PAM technique, created the hippocampal model, Python implementation of PAM, data analyses, writing the manuscript. Sebastian Klatt Python implementation of PAM, writing the manuscript. Sen Cheng developed the PAM technique, data analyses, writing the manuscript.

#### **ACKNOWLEDGMENTS**

This work was supported by a grant (SFB 874, project B2) from the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and a grant from the Stiftung Mercator.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnana.2014.00091/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 May 2014; accepted: 20 August 2014; published online: 15 September 2014.*

*Citation: Pyka M, Klatt S and Cheng S (2014) Parametric Anatomical Modeling: a method for modeling the anatomical layout of neurons and their projections. Front. Neuroanat. 8:91. doi: 10.3389/fnana.2014.00091*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Pyka, Klatt and Cheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Saltatory conduction in unmyelinated axons: clustering of Na<sup>+</sup> channels on lipid rafts enables micro-saltatory conduction in C-fibers

#### *Ali Neishabouri<sup>1,2</sup>\* and A. Aldo Faisal<sup>1,2,3</sup>*

*<sup>1</sup> Brain and Behaviour Lab, Department of Computing, Imperial College London, London, UK*

*<sup>2</sup> Brain and Behaviour Lab, Department of Bioengineering, Imperial College London, London, UK*

*<sup>3</sup> Faculty of Medicine, MRC Clinical Sciences Centre, London, UK*

#### *Edited by:*

*Julian Budd, University of Sussex, UK*

#### *Reviewed by:*

*Maarten H. P. Kole, Netherlands Institute for Neuroscience, Netherlands Hugh Robinson, University of Cambridge, UK Stephen Waxman, Yale, USA*

#### *\*Correspondence:*

*Ali Neishabouri, Department of Computing, Royal School of Mines, Imperial College London, South Kensington Campus, SW7 2AZ London, UK e-mail: ali.neishabouri@imperial.ac.uk*

The action potential (AP), the fundamental signal of the nervous system, is carried by two types of axons: unmyelinated and myelinated fibers. In the former, the action potential propagates continuously along the axon, as established in large-diameter fibers. In the latter, the AP jumps between the nodes of Ranvier, which are discrete, anatomically specialized regions containing very high densities of sodium ion (Na<sup>+</sup>) channels. Therefore, saltatory conduction is thought of as the hallmark of myelinated axons, enabling faster and more reliable propagation of signals than in unmyelinated axons of the same outer diameter. Recent molecular anatomy showed that in C-fibers, the very thin (0.1 µm diameter) axons of the peripheral nervous system, Nav1.8 channels are clustered together on lipid rafts that float in the cell membrane. This localized concentration of Na<sup>+</sup> channels resembles in structure the ion channel organization at the nodes of Ranvier, yet it is currently unknown whether this translates into an equivalent phenomenon of saltatory conduction or into related functional benefits and efficiencies. Therefore, we modeled biophysically realistic unmyelinated axons with both conventional and lipid-raft-based organization of Na<sup>+</sup> channels. We find that APs are reliably conducted in a micro-saltatory fashion along lipid rafts. Comparing APs in unmyelinated fibers with and without lipid rafts did not reveal any significant difference in either the metabolic cost or the AP propagation velocity. By investigating the efficiency of AP propagation over Nav1.8 channels, we find, however, that the specific inactivation properties of these channels significantly increase the metabolic cost of signaling in C-fibers.

**Keywords: C-fiber, Nav1.8, Hodgkin-Huxley, action potential, axon, lipid raft, saltatory conduction**

#### **1. INTRODUCTION**

Propagation of action potentials (APs) in axons relies on the concerted action of membrane-spanning, selectively permeable ion channels (Hodgkin and Huxley, 1952). Myelinated axons feature a highly structured distribution of voltage-gated ion channels, with a characteristic clustering of Na<sup>+</sup> channels at the nodes of Ranvier. Saltatory conduction (Huxley and Stämpfli, 1949; Fitzhugh, 1962) in myelinated axons refers to the rapid propagation of the electrical waveform from each node to the next (the AP seems to jump between nodes). This mode of conduction allows faster (Rushton, 1951; Waxman and Bennett, 1972) and more reliable (Kuriscak et al., 2002) propagation of signals than in unmyelinated axons. In contrast, only unmyelinated axons, which generally feature uniformly distributed ion channels (Black et al., 1981), are found at diameters approaching the physical limit to axon diameter (*d*) (Waxman and Bennett, 1972) of 0.1 µm (Faisal et al., 2005), thus making the high connection densities of mammalian cortex possible.

The number of ion channels in a neuron's membrane is usually thought to be large enough to justify combining the individual channel conductances into a continuous measure of overall conductivity (Dayan and Abbott, 2001), as originally done by Hodgkin and Huxley (1952). However, in the case of thin axons, the number of ion channels may be too small for these approximations to be valid. Faisal and Laughlin (2007) showed that in order to accurately model thin axons, the behavior of individual ion channels needs to be taken into account. Channel noise in very thin axons has a large effect, limiting the miniaturization of fibers by imposing a lower limit on axon diameter at 0.1 µm (Faisal et al., 2005). The conceptual transition from conductivity (per surface area) to density of channels, with each channel having only two possible conductance values corresponding to its open and closed states, involves investigating the effects of possible non-uniformities in the distribution of ion channels across the membrane.

Based on observations from the neonatal rat optic nerve, Waxman et al. (1989) hypothesized that action potentials could be propagated along thin (*d* ≈ 0.2 µm) axons by "jumping" between individual Na<sup>+</sup> channels placed a few microns apart. This postulated mode of propagation would be the analog of saltatory conduction in myelinated axons (Huxley and Stämpfli, 1949) and was termed micro-saltatory conduction. Faisal and Laughlin (2007) showed that probabilistic gating of ion channels due to thermodynamic fluctuations, or channel noise (reviewed by White et al., 2000), makes micro-saltatory conduction between individual Na<sup>+</sup> channels impossible. This is because the very small diameters required for individual Na<sup>+</sup> channels to have a measurable effect on the membrane potential make the axon overly sensitive to stochastic opening of Na<sup>+</sup> channels, resulting in an excessive spontaneous AP rate.

In C-fibers, Pristerà et al. (2012) have recently discovered that Nav1.8 channels, the voltage-gated Na<sup>+</sup> channels of these 0.1 µm diameter unmyelinated axons (Sangameswaran et al., 1996; Baker, 2005), are packed tightly together on lipid rafts. Lipid rafts are "dynamic, nanoscale, sterol-sphingolipid-enriched, ordered assemblies of proteins and lipids" (Pike, 2006; Coskun and Simons, 2010; Simons and Gerl, 2010). They play a role in organizing the cell membrane and act as hubs for functional localization of proteins (Pristerà et al., 2012). They also intervene in the trafficking, clustering and electrophysiological properties of ion channels, and have an effect on cell excitability (reviewed by Pristerà and Okuse, 2012). They are typically 0.1–0.3 µm long (Personal communication, Amber Finn and Kenji Okuse, 2011) and placed ∼5–10 µm apart (Pristerà et al., 2012). Dissociating lipid rafts and Nav1.8 channels in DRG neurons is correlated with impaired neuronal excitability (Pristerà et al., 2012).

C-fibers are very thin unmyelinated peripheral axons responsible for transmitting nociceptive pain sensations (Lawson, 2002). A variety of Na<sup>+</sup> channels are found on the membrane of C-fibers, including TTX-sensitive Nav1.6 (Black et al., 2002) and Nav1.7 (Black et al., 2012) channels. Voltage clamp experiments have shown that these TTX-sensitive channels are involved in amplifying subthreshold depolarizations and are active during APs (Vasylyev and Waxman, 2012). The slow-activating, slow-inactivating Nav1.8 channels play a crucial part in the generation and propagation of APs in these fibers (Akopian et al., 1999; Renganathan et al., 2001; Lai et al., 2003). As a result, these TTX-resistant channels are of particular interest for treating neuropathic pain symptoms (reviewed by Scholz and Woolf, 2002). The clustering of Na<sup>+</sup> channels on lipid rafts resembles the structure of nodes of Ranvier in myelinated fibers, and may permit micro-saltatory conduction in those thin axons. Here, we investigate whether this mode of propagation is indeed possible, and what its potential benefits are in terms of the basic constraints faced by neural fibers.

#### **2. MATERIALS AND METHODS**

We investigated the effects of the lipid-raft clustering of Na<sup>+</sup> channels on the function of neural fibers, using both deterministic and stochastic simulations. In stochastic simulations, the conformational changes of ion channels were individually modeled (Faisal, 2012). Simulations were based on biophysical data from Baker (2005). Computations were carried out with the Modigliani stochastic simulator (Faisal et al., 2002, 2005) on a Linux PC with an Intel Core i7 processor. We used the binomial algorithm because it allows accurate simulations that are less computationally intensive than the Gillespie algorithm (Faisal, 2010). Membrane capacitance was set to 0.81 µF cm<sup>−2</sup>, axial resistance to 70 Ω cm, and the membrane leak conductance to 0.14 mS cm<sup>−2</sup>. The leak, Na<sup>+</sup> and K<sup>+</sup> reversal potentials were −61.14, 79.6, and −85 mV, respectively.
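The idea behind a binomial update can be sketched for the simplest case of a population of two-state (closed/open) channels: within a short time step, the number of channels that switch state is drawn from a binomial distribution rather than simulating every transition event. The sketch below is a simplified illustration under assumed rates and step size; it is not the Modigliani implementation, and the multi-state channel models used in this study require more transitions per step.

```python
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def step(n_open, n_total, alpha, beta, dt, rng):
    """Advance a population of two-state channels by one time step.

    alpha: closed-to-open rate, beta: open-to-closed rate (both per ms).
    alpha * dt and beta * dt must be well below 1 for the update to be valid.
    """
    n_closed = n_total - n_open
    opening = binomial(n_closed, alpha * dt, rng)
    closing = binomial(n_open, beta * dt, rng)
    return n_open + opening - closing


rng = random.Random(1)              # fixed seed for reproducibility
n_total, n_open = 200, 0
alpha, beta, dt = 0.5, 1.5, 0.01    # illustrative values, not from the paper

trace = []
for _ in range(2000):
    n_open = step(n_open, n_total, alpha, beta, dt, rng)
    trace.append(n_open)

# After the initial transient the open fraction fluctuates around
# alpha / (alpha + beta) = 0.25.
mean_open_fraction = sum(trace[1000:]) / (1000 * n_total)
print(round(mean_open_fraction, 3))
```

With few channels, as in a 0.1 µm axon, the fluctuations around the mean are large relative to the mean itself, which is why individual channels need to be modeled.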

Our C-fiber model axon contains only two types of voltage-gated ion channels. We use a model of TTX-resistant Na<sup>+</sup> channels (Nav1.8) based on physiological data from Baker (2005). The instantaneous Na<sup>+</sup> conductance in the model is given by $g_{\mathrm{Na}^+} = \bar{g}_{\mathrm{Na}^+} \times m^3 h$, where $\bar{g}_{\mathrm{Na}^+} = 1.25$ mS cm<sup>−2</sup>. *m* and *h* follow the classical Hodgkin and Huxley (1952) dynamics, with rates

$$
\begin{aligned}
\alpha_m &= \frac{3.83}{1 + \exp\left((V_m + 2.58)/(-11.47)\right)}, &
\beta_m &= \frac{6.894}{1 + \exp\left((V_m + 61.2)/19.8\right)}, \\
\alpha_h &= 0.013536 \times \exp\left(-(V_m + 105)/46.33\right), &
\beta_h &= \frac{0.61714}{1 + \exp\left((V_m - 21.8)/(-11.998)\right)}.
\end{aligned}
$$

This transforms into the 8-state Na<sup>+</sup> channel model for stochastic simulations.

We also used the model for fast K<sup>+</sup> channels given by Baker (2005). The instantaneous K<sup>+</sup> conductance in the model is given by $g_{\mathrm{K}^+} = \bar{g}_{\mathrm{K}^+} \times n^4$, where $\bar{g}_{\mathrm{K}^+} = 0.17$ mS cm<sup>−2</sup>, and the kinetic rates for *n* are given by

$$
\alpha_n = \frac{0.00798\,(V_m + 72.2)}{1 - \exp\left((-72.2 - V_m)/1.1\right)}, \qquad
\beta_n = \frac{0.0142\,(-55 - V_m)}{1 - \exp\left((V_m + 55)/10.5\right)}.
$$

This transforms into a 5-state channel for stochastic simulations (for details, see Faisal et al., 2002).
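For readers who wish to evaluate the deterministic channel models, the rate expressions for both channel types can be transcribed as follows. This is a sketch under the assumption that the membrane potential is in mV and the rates are in ms<sup>−1</sup> (the units are not stated explicitly above); the helper `_linoid`, which fills in the removable singularities of the K<sup>+</sup> rates, and the steady-state function are ours.

```python
import math

G_NA_MAX = 1.25  # maximal Na+ conductance, value and unit as stated in the text
G_K_MAX = 0.17   # maximal K+ conductance, value and unit as stated in the text


def alpha_m(v):
    return 3.83 / (1.0 + math.exp((v + 2.58) / -11.47))


def beta_m(v):
    return 6.894 / (1.0 + math.exp((v + 61.2) / 19.8))


def alpha_h(v):
    return 0.013536 * math.exp(-(v + 105.0) / 46.33)


def beta_h(v):
    return 0.61714 / (1.0 + math.exp((v - 21.8) / -11.998))


def _linoid(x, scale):
    """x / (1 - exp(-x / scale)), with the limit value at x = 0 filled in."""
    if abs(x) < 1e-9:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def alpha_n(v):
    return 0.00798 * _linoid(v + 72.2, 1.1)


def beta_n(v):
    return 0.0142 * _linoid(-55.0 - v, 10.5)


def steady_state(alpha, beta, v):
    """Steady-state value of a gating variable at potential v."""
    a, b = alpha(v), beta(v)
    return a / (a + b)


def g_na(m, h):
    """Instantaneous Na+ conductance for gating variables m and h."""
    return G_NA_MAX * m ** 3 * h


def g_k(n):
    """Instantaneous K+ conductance for gating variable n."""
    return G_K_MAX * n ** 4


for v in (-80.0, -40.0, 0.0):
    print(v,
          round(steady_state(alpha_m, beta_m, v), 4),
          round(steady_state(alpha_h, beta_h, v), 4),
          round(steady_state(alpha_n, beta_n, v), 4))
```

The guard in `_linoid` matters only exactly at −72.2 mV and −55 mV, where the K<sup>+</sup> rate expressions evaluate to 0/0 if coded literally.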

**FIGURE 1 | Schematic view of axonal models. (A)** The null-hypothesis axon. Both Na<sup>+</sup> and K<sup>+</sup> channels are uniformly distributed along the axon. **(B)** Axon with Na<sup>+</sup> channels clustered together on lipid rafts. We model this axon by placing a compartment containing a high density of Na<sup>+</sup> channels at regular distances in between compartments containing only K<sup>+</sup> channels.

We simulated both uniformly distributed channels and channels clustered on lipid rafts placed regularly along the 0.1 µm diameter C-fiber axon (see **Figure 1**). For the axon model with uniformly distributed Na<sup>+</sup> channels, we use a single Na<sup>+</sup> channel conductance of 20 pS, which translates into a density of 56.25 µm<sup>−2</sup>, and a single K<sup>+</sup> channel conductance of 17 pS, which translates into a density of 10 µm<sup>−2</sup>. Single channel conductance values are putative (based on typical values for ion channels). For the clustered Na<sup>+</sup> channel model, we kept the density of K<sup>+</sup> channels constant in the region between lipid rafts. For comparisons between the axon with uniformly distributed Na<sup>+</sup> channels and the one with lipid rafts, the overall Na<sup>+</sup> density was kept constant (which requires 900 µm<sup>−2</sup> within the rafts for 0.2 µm long rafts placed 3 µm apart). We also simulated different cluster configurations (length, distance) based on previous work (Zeng and Tang, 2009), although the results from those cannot be directly compared to the uniform axon. On the lipid rafts there are no K<sup>+</sup> channels. For all lipid raft configurations, i.e., all values of raft length *l* and distance between rafts *L*, we simulated an axon long enough to contain 100 lipid rafts. At each trial, we injected a small current step twice. The first evoked AP was only used to ensure the ion channels were properly initialized. We only used data from the second AP of each trial.
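The raft density that keeps the overall Na<sup>+</sup> channel density constant follows from the fraction of axon length covered by rafts. The sketch below assumes that the stated distance is measured between the edges of neighbouring rafts, so that one period has length *l* + *L*; under this assumption the values given above are reproduced. The estimate of the channel count per raft additionally assumes a cylindrical membrane surface and is ours.

```python
import math


def raft_density(uniform_density, raft_length, gap):
    """Density inside the rafts that preserves the mean density along the axon.

    gap is taken as the edge-to-edge distance between neighbouring rafts
    (an assumption), so one spatial period is raft_length + gap.
    """
    return uniform_density * (raft_length + gap) / raft_length


def channels_per_raft(density, diameter, raft_length):
    """Channel count on one raft, treating the axon as a cylinder."""
    return density * math.pi * diameter * raft_length


# Values from the text: 56.25 channels per µm² when uniformly distributed,
# 0.2 µm long rafts placed 3 µm apart on a 0.1 µm diameter axon.
density_in_rafts = raft_density(56.25, raft_length=0.2, gap=3.0)
n_per_raft = channels_per_raft(density_in_rafts, diameter=0.1, raft_length=0.2)
print(round(density_in_rafts, 2), round(n_per_raft, 1))
```

The resulting raft density is approximately 900 µm<sup>−2</sup>, i.e., 16 times the uniform density, which corresponds to a few tens of Na<sup>+</sup> channels per raft under the cylinder assumption.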

The width of each AP is measured between the points at half-maximal amplitude. The metabolic cost of AP propagation is usually defined as the number of ATP molecules the Na<sup>+</sup>-K<sup>+</sup>-ATPase needs to reverse the Na<sup>+</sup> influx (Alle et al., 2009; Sengupta et al., 2010). However, we chose to express this measure in terms of Na<sup>+</sup> charge rather than converting it into a number of ATP molecules, because the Na<sup>+</sup> charge is given directly by the amount of current crossing the membrane and does not require additional assumptions about how the Na<sup>+</sup> influx is reversed.
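Both measures can be computed from sampled traces, as the following sketch shows. It uses synthetic Gaussian-shaped voltage and current traces with made-up amplitudes, and it assumes that the half-maximal amplitude is taken relative to the resting potential; neither the traces nor this convention are taken from our simulations.

```python
import math


def trapezoid(y, dt):
    """Integrate uniformly sampled values y with the trapezoidal rule."""
    return dt * (sum(y) - 0.5 * (y[0] + y[-1]))


def width_at_half_height(v, dt, v_rest):
    """Time between the first and last samples at or above half amplitude."""
    half = v_rest + 0.5 * (max(v) - v_rest)
    above = [i for i, x in enumerate(v) if x >= half]
    return (above[-1] - above[0]) * dt


def na_charge(i_na, dt):
    """Total Na+ charge: time integral of the Na+ current magnitude."""
    return trapezoid([abs(i) for i in i_na], dt)


# Synthetic example (made-up numbers): Gaussian-shaped AP and Na+ current.
dt, sigma, t0 = 0.01, 0.5, 5.0                     # ms
times = [k * dt for k in range(1001)]
shape = [math.exp(-(t - t0) ** 2 / (2 * sigma ** 2)) for t in times]
v_rest = -60.0                                     # mV
voltage = [v_rest + 100.0 * s for s in shape]      # mV
i_na = [-2.0 * s for s in shape]                   # pA, inward current negative

ap_width = width_at_half_height(voltage, dt, v_rest)   # ms
charge = na_charge(i_na, dt)                           # pA x ms = fC
print(round(ap_width, 2), round(charge, 3))
```

For a Gaussian waveform the width should be close to 2.355 times `sigma` and the charge close to the peak current times `sigma` times the square root of 2π, which provides a simple check of the two functions.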

#### **3. RESULTS**

**Figure 2** illustrates the propagation of an action potential in a C-fiber axon with uniformly distributed Na<sup>+</sup> channels, using both deterministic and stochastic simulations. In deterministic simulations, the AP waveform is kept constant while propagating through the axon. The activation profile of Na<sup>+</sup> channels, and hence the Na<sup>+</sup> current, is also the same at all points along the axon, as we expect.

In stochastic simulations using discrete, stochastic ion channels, there is considerable variability in the Na<sup>+</sup> current crossing the axon, as shown by the profile of the Na<sup>+</sup> current (**Figure 2D**). In addition, the stochastic opening of each discrete channel produces a minimum current flow, determined by the single channel conductance, that is larger than the minimum conductance allowed in the deterministic model. This is visible in the absence of blue bands of low Na<sup>+</sup> current in **Figure 2D**, although they are present at the beginning and end of the AP in **Figure 2C**.

#### **3.1. MICROSALTATORY CONDUCTION ALONG LIPID RAFTS**

Clustering Na<sup>+</sup> channels on putative lipid rafts still allows AP conduction (**Figure 3**). In both deterministic (**Figures 3A,C**) and stochastic (**Figures 3B,D**) simulations, the AP is sustained by the Na<sup>+</sup> current in lipid rafts alone. Plotting the profile of the AP waveform (**Figure 3E**) shows bumps in the waveform, corresponding to the placement of Na<sup>+</sup> channel lipid rafts.

The height of the AP is only slightly lower outside of lipid rafts. This is because *L* is much smaller than the axon's length constant λ. Therefore, the membrane potential over the inter-raft region is roughly constant, and equal to that over lipid rafts. Note that in myelinated axons, the amplitude of APs over the internodal regions is also not much lower than the amplitude in nodes of Ranvier (Bakiri et al., 2011). This is also confirmed by our simulations of a mammalian myelinated axon model (see **Figure 3F**), based on data from McIntyre et al. (2002).
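An order-of-magnitude estimate of λ can be obtained from the standard passive cable expression λ = √(*d* *R*<sub>m</sub> / (4 *R*<sub>a</sub>)). The sketch below uses the leak conductance from the Methods to obtain the specific membrane resistance and assumes an axial resistivity of 70 Ω cm. It describes the resting membrane only; open channels during an AP lower the membrane resistance and thus shorten the effective length constant, so the result is an upper-bound estimate and not a value reported in this study.

```python
import math


def length_constant_um(diameter_um, g_leak_mS_cm2, axial_resistivity_ohm_cm):
    """Passive cable length constant in µm: sqrt(d * R_m / (4 * R_a))."""
    d_cm = diameter_um * 1e-4                   # µm to cm
    r_m = 1.0 / (g_leak_mS_cm2 * 1e-3)          # specific membrane resistance, Ω cm²
    lam_cm = math.sqrt(d_cm * r_m / (4.0 * axial_resistivity_ohm_cm))
    return lam_cm * 1e4                         # cm to µm


# 0.1 µm axon, 0.14 mS/cm² leak conductance, 70 Ω cm axial resistivity (assumed unit).
lam = length_constant_um(0.1, 0.14, 70.0)
inter_raft_distance_um = 3.0
print(round(lam, 1), round(lam / inter_raft_distance_um, 1))
```

Under these assumptions the passive length constant is on the order of 160 µm, roughly fifty times the 3 µm inter-raft distance.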

AP waveforms are slightly wider over lipid rafts in both deterministic (**Figure 3A**) and stochastic (**Figure 3B**) simulations. This effect is due to the reopening of Na<sup>+</sup> channels in the repolarizing phase of the AP (**Figure 8**). We investigated the influence of the length (*l*) of rafts, and the distance between them (*L*), on the shape of the action potential. The results are plotted in **Figure 4**. Both AP width and height are affected by the size and placement of lipid rafts. Longer rafts increase the width of APs almost linearly (**Figure 4A**). The greater number of Na<sup>+</sup> channels in longer lipid rafts also pushes the peak of APs toward the Na<sup>+</sup> reversal potential, increasing it from ∼55 to 75 mV (**Figure 4B**). Increasing the distance between lipid rafts, on the other hand, reduces the width of APs. With 0.2 µm long rafts 10 µm apart, the AP width is halved compared to placing the rafts only 1 µm apart (**Figure 4C**). The height of APs is also reduced by moving lipid rafts further apart, this time in an almost linear fashion (**Figure 4D**), which is expected since the change is no longer limited by the Na<sup>+</sup> reversal potential.

Changes in the height and width of APs are expected because of the changes in the overall Na<sup>+</sup> channel density that is caused by changes in the size or placement of lipid rafts, and not the clustering of Na<sup>+</sup> channels *per se*. We treat this question in Section 3.5.

#### **3.2. CHANGE IN SHAPES OF APs RESULTS IN LOWER METABOLIC COSTS**

The change in the shape of APs results directly in changes in their metabolic cost (**Figure 5**). Shortening lipid rafts lowers the amount of Na<sup>+</sup> charge crossing the membrane and thus the cost in ATP associated with pumping Na<sup>+</sup> ions back out of the cell (**Figure 5A**). Increasing the distance between rafts also reduces the metabolic cost. The profile of the variation in metabolic cost closely follows that of the change in AP width, suggesting that width, rather than height, determines the metabolic cost of firing APs (**Figures 4A,C**, **5**).

We can now compare the metabolic cost of APs in axons with clustered Na<sup>+</sup> channels to the cost in axons where Na<sup>+</sup> channels are uniformly distributed (**Figure 6**). Experimental results (Amber Finn, personal communication) suggest *l* = 0.1–0.3 µm long lipid rafts placed *L* ≈ 3 µm apart. The overall density of Na<sup>+</sup> channels is kept constant by assuming a density of 900 µm<sup>−2</sup> in the lipid rafts. In our simulation, this does not result in a significant change of metabolic cost. The metabolic cost for propagating APs over axons with both clustered and uniformly distributed Na<sup>+</sup> channels is ∼13 fC µm<sup>−1</sup> for the 0.1 µm diameter axon in deterministic simulations. In stochastic simulations, the opening of a channel means that a conductance equal to that of the single channel is added to the membrane. This minimum current due to the discrete nature of ion channel conductance has an impact on the metabolic cost of APs. Stochastic simulations yield a median value of 16 fC µm<sup>−1</sup> per AP in axons with clustered or uniformly distributed channels.
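To relate these charge values to ATP, the Na<sup>+</sup> charge per unit length can be converted into a number of ions and then into ATP molecules. The sketch below assumes the standard stoichiometry of the Na<sup>+</sup>/K<sup>+</sup> pump (three Na<sup>+</sup> ions extruded per ATP hydrolysed) and that all Na<sup>+</sup> entry is eventually pumped out. The conversion is an illustration and not part of the original analysis, and the function names are hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per monovalent ion
NA_IONS_PER_ATP = 3                    # Na+/K+ pump stoichiometry (assumed)


def na_ions_per_um(charge_fC_per_um: float) -> float:
    """Number of Na+ ions corresponding to a charge given in fC per um."""
    return charge_fC_per_um * 1e-15 / ELEMENTARY_CHARGE_C


def atp_per_um(charge_fC_per_um: float) -> float:
    """ATP molecules needed to pump that Na+ load back out, per um of axon."""
    return na_ions_per_um(charge_fC_per_um) / NA_IONS_PER_ATP


# Charge values quoted in the text for the 0.1 um diameter axon.
for label, charge in (("deterministic", 13.0), ("stochastic median", 16.0)):
    print(f"{label}: {charge} fC/um -> {na_ions_per_um(charge):.0f} Na+ ions, "
          f"{atp_per_um(charge):.0f} ATP per um per AP")
```

Because the conversion is linear, any relative change in Na<sup>+</sup> charge carries over unchanged to the estimated ATP cost.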

Increasing *L*, and partially compensating by increasing the density of Na<sup>+</sup> channels in lipid rafts, lowers the Na<sup>+</sup> charge crossing the membrane: 0.2 µm long rafts placed 8 µm apart (Zeng and Tang, 2009) are more metabolically efficient than uniformly placed Na<sup>+</sup> channels (∼11 fC cm<sup>−1</sup> in deterministic simulations). Shortening the lipid rafts to 0.1 µm reduces the Na<sup>+</sup> charge even further, while maintaining the axon's capacity to propagate APs.

#### **3.3. PROPAGATION VELOCITY OVER CLUSTERED Na+ CHANNELS**

Because C-fibers have a very small diameter, it is extremely difficult to obtain intracellular data from them, and we can therefore only estimate their propagation velocity using extracellular recordings (Tigerholm et al., 2014). These estimates cannot be reliably linked to axonal diameter. C-fiber axons are known for their very low conduction velocities. The conduction velocity is estimated to be 69 cm s<sup>−1</sup> for a 0.25 µm diameter axon (Tigerholm et al., 2014).

Deterministic simulations yield a velocity of ∼11 cm s<sup>−1</sup> in both the axon with uniformly distributed Na<sup>+</sup> channels, and over clustered Na<sup>+</sup> channels (**Figure 7**). In stochastic simulations, we obtained a comparable median value. However, as was the case with the metabolic cost of APs, shortening lipid rafts or increasing the distance between them resulted in a reduction of the AP propagation velocity.

This difference can be attributed to the lowered inward ionic current. In axons, membrane current not only depolarizes the local membrane, but it also serves to drive the waveform of APs forward. A lower membrane current will result in slower depolarization of the membrane segment "ahead," and thus in slower AP propagation. Increasing *L*, and partially compensating by increasing the density of Na<sup>+</sup> channels as done in Section 3.2, reduces the metabolic cost, and accordingly the propagation velocity of APs. In the most extreme case we considered, with 0.2 µm long lipid rafts placed 8 µm apart, the median propagation velocity was ∼7 cm s<sup>−1</sup>. In this axon, stochastically simulated APs fail to propagate in 3 trials out of 20.
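With only 20 trials, the propagation failure rate in this extreme case is known only roughly. The sketch below computes a Wilson score interval for a proportion of 3 out of 20 to give a sense of that uncertainty. The choice of interval and the 95% level (z = 1.96) are assumptions made for illustration and are not part of the original analysis; the function name is hypothetical.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    Returns (lower, upper) bounds on the underlying failure probability.
    """
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p_hat = failures / trials
    denom = 1.0 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z ** 2 / (4.0 * trials ** 2)
    )
    return centre - half_width, centre + half_width


lower, upper = wilson_interval(failures=3, trials=20)
print(f"observed failure rate: {3 / 20:.2f}")
print(f"approximate 95% interval: {lower:.2f} to {upper:.2f}")
```

The interval is wide, so the observed rate is best read as an indication that propagation becomes unreliable in this configuration rather than as a precise failure probability.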

#### **3.4. IONIC MECHANISMS BEHIND THE REDUCTION OF METABOLIC COST**

In order to find the mechanism behind the reduction of metabolic cost, we plotted the instantaneous Na<sup>+</sup> current and number of open Na<sup>+</sup> channels over the course of an AP (**Figure 8**). In the more metabolically efficient axons (green and red lines and shades), APs are shorter (**Figure 8A**, red and green curves) than in the axon with uniformly distributed Na<sup>+</sup> channels, or the axon with 0.2µm long lipid rafts placed 3µm apart.

The shortening of APs is due to a shorter period of Na<sup>+</sup> current activity (**Figure 8B**) in metabolically efficient axons. The Na<sup>+</sup> current seems to be relatively constant over the course of the AP. However, plotting the number of open Na<sup>+</sup> channels (**Figure 8C**) reveals that Na<sup>+</sup> conductance reaches its peak near the peak of the AP. In the repolarizing phase, the number of open Na<sup>+</sup> channels decreases markedly due to inactivation. But approximately halfway through repolarization, Na<sup>+</sup> channels can open again. This late reopening causes the "bump" in the repolarizing phase of the AP waveform. In the more metabolically efficient axons, the number of open Na<sup>+</sup> channels decreases faster, the reopening is less pronounced, and the end of Na<sup>+</sup> current is reached earlier. This explains the lower overall transfer of Na<sup>+</sup> charge in these axons.

We can also explain the higher metabolic cost of APs in stochastic simulations compared to deterministic simulations of the same axon (**Figure 6**). In stochastic simulations, there is more reactivation of Na<sup>+</sup> channels in the repolarizing phase than in deterministic simulations (**Figure 8C**). This effect is due to the "positive feedback" of Na<sup>+</sup> channels: the random opening of any Na<sup>+</sup> channel prolongs the repolarizing phase, and makes the opening of other Na<sup>+</sup> channels more likely.

**FIGURE 8 | Action potential and sodium current waveform in uniform channel density axons (Blue, STD shaded light blue, deterministic results in blue dotted line), in 0.2 µm long clusters placed 3 µm apart (Black, STD shaded gray, deterministic results in black dotted line), in 0.2 µm long clusters placed 8 µm apart (Red, STD shaded light red, deterministic results in red dotted line) and in 0.1 µm long clusters placed 8 µm apart (Dotted green line) in a 0.1 µm diameter C-fiber axon. (A)** The action potential waveforms, **(B)** instantaneous Na<sup>+</sup> current, and **(C)** open Na<sup>+</sup> channels in a single compartment. Although there are few Na<sup>+</sup> channels open in the repolarizing phase of the AP, the larger difference between *Vm* and *E*Na<sup>+</sup> creates a Na<sup>+</sup> current comparable to that of the earlier stages.

#### **3.5. THE OBSERVED EFFECTS IN RAFTS ARE CAUSED BY LOWER Na+ CHANNEL DENSITY**

The effect of clustering Na<sup>+</sup> channels on the shape and metabolic cost of APs could simply be due to lower overall Na<sup>+</sup> channel densities. To verify whether the clustering of Na<sup>+</sup> channels was in fact responsible for the reduced metabolic cost, we simulated axons with uniformly distributed Na<sup>+</sup> channels while varying the density of these channels, and plotted the resulting metabolic cost (**Figure 9**).

There is no noticeable difference between the metabolic cost of APs in the axon with uniformly distributed Na<sup>+</sup> channels and the metabolic cost of APs propagating along an axon with Na<sup>+</sup> channels clustered on lipid rafts if both axons have the same overall Na<sup>+</sup> channel density (**Figure 9A**). Similarly, the propagation velocity of APs is the same in both types of axons if the Na<sup>+</sup> channel density is kept constant (**Figure 9B**). Our results show that the observed effect on the metabolic cost is due solely to the reduced equivalent Na<sup>+</sup> channel density. That is, reducing the density of Na<sup>+</sup> channels in the uniformly distributed channels model produces the same short and metabolically efficient APs as in the clustered model.

**FIGURE 9 | Metabolic cost and action potential velocity as a function of Na+ channel density in a 0.1 µm diameter C-fiber axon. (A)** Metabolic cost and **(B)** action potential velocity as a function of Na<sup>+</sup> channel density in both axons with uniformly distributed Na<sup>+</sup> channels (black) and axons with clustered Na<sup>+</sup> channels (red). Although the overall density of Na<sup>+</sup> channels clearly has an impact on both metabolic cost and velocity, there is no detectable effect from clustering Na<sup>+</sup> channels in lipid rafts.

Using our data, we can estimate the efficiency of AP propagation in C-fiber axons. Here we define the efficiency as the ratio (Na<sup>+</sup> influx needed to charge the membrane capacitance to AP peak)/(Na<sup>+</sup> influx per AP). The Na<sup>+</sup> influx needed to charge the membrane capacitance to AP peak is given by *dCmV*, where *V* is the AP amplitude. The effective Na<sup>+</sup> influx per AP can be estimated using our simulation data. For the 0.1 µm diameter C-fiber axon, we have plotted the results in **Figure 10**.
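The capacitive minimum can be estimated with a short calculation. The sketch below assumes that the expression *dCmV* stands for the membrane area per unit length of a cylinder (πd) multiplied by the specific capacitance and the AP amplitude. The factor π, the specific capacitance of 1 µF cm<sup>−2</sup> and the amplitude of 100 mV are all assumptions chosen for illustration and are not values taken from the model; the function names are hypothetical.

```python
import math


def capacitive_min_charge_fC_per_um(diameter_um: float,
                                    cm_uF_per_cm2: float,
                                    ap_amplitude_mV: float) -> float:
    """Minimum charge per um of axon needed to charge the membrane
    capacitance by `ap_amplitude_mV` (cylindrical membrane assumed)."""
    # 1 uF/cm^2 = 1e-6 F / 1e8 um^2 = 1e-14 F/um^2 = 10 fF/um^2
    cm_fF_per_um2 = cm_uF_per_cm2 * 10.0
    area_um2_per_um = math.pi * diameter_um   # membrane area per um of length
    amplitude_V = ap_amplitude_mV * 1e-3
    return area_um2_per_um * cm_fF_per_um2 * amplitude_V  # fF * V = fC


def inefficiency_factor(actual_fC_per_um: float, minimum_fC_per_um: float) -> float:
    """Actual Na+ charge per AP divided by the capacitive minimum."""
    return actual_fC_per_um / minimum_fC_per_um


q_min = capacitive_min_charge_fC_per_um(diameter_um=0.1,
                                        cm_uF_per_cm2=1.0,     # assumed
                                        ap_amplitude_mV=100.0)  # assumed
print(f"capacitive minimum: {q_min:.3f} fC/um")
print(f"inefficiency factor for 13 fC/um: {inefficiency_factor(13.0, q_min):.0f}")
```

With these placeholder values the ratio for the ∼13 fC µm<sup>−1</sup> reported in Section 3.2 comes out at a few tens, the same order of magnitude as the factors discussed for **Figure 10**; the exact figure depends on the capacitance and AP amplitude actually used in the simulations.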

The axon with uniformly distributed ion channels consumes almost 50 times the capacitive minimum current necessary to charge its membrane to AP peak. The least inefficient axon in this figure is still 20 times more expensive than the theoretical minimum. To check whether this inefficiency is specific to the axon, we simulated a simple spherical membrane using the same ion channel densities and physiological data as the axon. We then compare the AP waveform in the spherical compartment and the axon in **Figure 11**. The AP in the axon is wider than both the AP simulated in our spherical compartment, and recorded APs from DRG cells (Baker, 2005). Note that the recorded APs were elicited by a rather long current injection in the cell. In the soma, the effective Na<sup>+</sup> current is ∼20 times the capacitive minimum current. This inefficiency factor is much larger than even notably inefficient axons such as the squid giant axon (Hodgkin, 1975; Vetter et al., 2001). This inefficiency seems to be due to incomplete inactivation of Na<sup>+</sup> channels. Indeed, plotting the *β<sub>h</sub>* function used for Na<sup>+</sup> channels in this model reveals a significantly delayed inactivation compared to that used for Na<sup>+</sup> channels in the squid giant axon, for instance (**Figure 12**).

**FIGURE 11 | (A)** Simulated and recorded action potentials and **(B)** sodium current waveform in a uniform channel density axon (Black) and in a soma (Red). The green curve in **(A)** is reproduced from Figure 6 in Baker (2005). The recorded AP is elicited by a long period of current injection, and therefore the membrane potential before the AP is not representative of the true resting potential, reported to be −58 mV. The AP is wider and more metabolically expensive in the axon.

To confirm this, we simulated a spherical membrane compartment using physiological data from the C-fiber axon model, but with the same *β<sub>h</sub>* function as the squid giant axon Na<sup>+</sup> channels. The shape of the AP waveform and Na<sup>+</sup> current in this model is plotted in **Figure 11**. Action potentials in this model are much shorter than with the original kinetics for Nav1.8, and the total amount of Na<sup>+</sup> current crossing the membrane is correspondingly smaller. This results in a higher efficiency for the revised kinetics model, the inefficiency factor (effective Na<sup>+</sup> current over minimum Na<sup>+</sup> current) being ∼3, compared to ∼17 for the soma with original Nav1.8 kinetics. These calculations take into account the difference in the amplitudes of APs between the two models.
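For readers unfamiliar with the role of the inactivation rate function, the sketch below evaluates the classic Hodgkin-Huxley squid giant axon rates for the inactivation gate *h*, together with the steady-state value and time constant they imply. It assumes the usual modern sign convention, in which `v_mV` is the depolarization relative to rest and rates are in ms<sup>−1</sup> at 6.3 °C. The Nav1.8 rate functions of the C-fiber model are not given in this section and are deliberately not reproduced here.

```python
import math


def alpha_h(v_mV: float) -> float:
    """Squid-axon rate of recovery from inactivation (per ms)."""
    return 0.07 * math.exp(-v_mV / 20.0)


def beta_h(v_mV: float) -> float:
    """Squid-axon rate of inactivation (per ms)."""
    return 1.0 / (math.exp((30.0 - v_mV) / 10.0) + 1.0)


def h_inf(v_mV: float) -> float:
    """Steady-state fraction of non-inactivated channels."""
    a, b = alpha_h(v_mV), beta_h(v_mV)
    return a / (a + b)


def tau_h(v_mV: float) -> float:
    """Time constant of the inactivation gate (ms)."""
    return 1.0 / (alpha_h(v_mV) + beta_h(v_mV))


for v in (0.0, 30.0, 60.0, 100.0):
    print(f"v = {v:5.1f} mV: beta_h = {beta_h(v):.3f} /ms, "
          f"h_inf = {h_inf(v):.3f}, tau_h = {tau_h(v):.2f} ms")
```

A *β<sub>h</sub>* that rises later or more slowly with depolarization leaves more channels available during repolarization, which is the mechanism the text proposes for the wide, costly APs obtained with the original kinetics.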

#### **4. DISCUSSION**

We find that microsaltatory conduction is possible in C-fiber axons with Na<sup>+</sup> channels attached to lipid rafts, i.e., action potentials (APs) can propagate from one cluster of Na<sup>+</sup> channels to the next in thin C-fiber axons. We also show how late reactivation of Na<sup>+</sup> channels affects the average shape of AP waveforms and increases the metabolic cost of APs in thin axons. Reducing the density of Na<sup>+</sup> channels in both axons with uniformly spaced Na<sup>+</sup> channels and axons with clustered Na<sup>+</sup> channels results in shorter and therefore more metabolically efficient APs.

the whole biophysically relevant range.

Varying the length of lipid rafts (*l*) and the distance between them (*L*) affects the shape of AP waveforms because of the associated changes in the Na<sup>+</sup> channel density. Smaller rafts, as well as rafts placed further apart from each other, result in smaller Na<sup>+</sup> channel densities and reduced AP width. This is due to reduced reactivation of Na<sup>+</sup> channels in the repolarizing phase of the AP. Because in this phase the membrane potential is already far from the Na<sup>+</sup> reversal potential (*E*Na<sup>+</sup>), a large current, comparable to the current at the peak of the AP, crosses the membrane through any randomly reactivated Na<sup>+</sup> channel.

The reactivation of even a small number of channels maintains the membrane potential in a depolarized state longer. This in turn opposes the repolarization of the membrane, leaving more time for the possible opening of other channels. This positive feedback effect makes APs slightly wider in stochastic simulations, where the possible stochastic opening of channels is taken into account. The discretization of ion channel conductances amplifies this effect, by increasing the minimum conductance. Since the effect of the opening of each channel is bigger in smaller axons (Faisal et al., 2005), reactivation of Na<sup>+</sup> channels results in slightly wider action potentials in thinner axons.

Our simulations lead to two new findings regarding the metabolic cost of propagating APs in C-fibers. First, incomplete inactivation of Nav1.8 channels, the primary voltage-gated Na<sup>+</sup> channels in C-fibers, leads to a long-lasting Na<sup>+</sup> current. This in turn creates very wide APs, which are metabolically very expensive. The Na<sup>+</sup> charge transfer necessary for one AP in a spherical membrane using these Na<sup>+</sup> channels is 17 times the minimum charge transfer needed to depolarize the membrane to AP peak. This value is higher than the factor of 4 previously obtained for squid giant axon channels (Hodgkin, 1975; Attwell and Laughlin, 2001), and much higher than that of very metabolically efficient channel kinetics (Alle et al., 2009; Sengupta et al., 2010). However, the latter kinetics were obtained at higher temperatures, and these comparisons should only be used as an illustration.

Although incomplete inactivation has been shown to allow fast spiking (Carter and Bean, 2009), it is not clear why slow firing fibers such as C-fibers exhibit the same phenomenon. C-fibers are quiet in the absence of stimulation, and their firing rate does not seem to exceed ∼2 Hz (Campero et al., 2004; Obreja et al., 2010). Presumably, the very slow firing rates of these high-threshold fibers reduce the impact of the metabolic cost of signaling in C-fibers. The very wide APs may have a functional role by ensuring a strong post-synaptic response (Klein and Kandel, 1980; Augustine, 2001), and thus prioritize APs carried by C-fibers. Another explanation may be that incomplete inactivation plays a role in ensuring transmission of APs in noise-prone thin fibers. Nav1.8 channels are not the only channels expressed on C-fibers (Black et al., 2002, 2012; Vasylyev and Waxman, 2012), and there is evidence that other channel types are present uniformly along these axons. It is possible that these channels allow for lower Nav1.8 densities. The role of the Nav1.8 channels could then be to ensure a wide action potential, and clustering them together would lower the overall metabolic cost. More detailed simulations are needed to test this hypothesis.

Second, we find that the cost of propagating APs in axons is significantly higher than that of an AP in a spherical membrane compartment. In our simulations, the cost of propagating action potentials in axons is roughly three times the cost estimated at the soma. The higher cost is associated with wider APs in the axon than in the soma. Our simulations use the same Na<sup>+</sup> channel kinetics in the soma model and in the axon, and the broadening effect can thus only be attributed to the spatial arrangement of the membrane, as opposed to channel kinetics (Hallermann et al., 2012, for instance, use different channel kinetics in their model, which leads to narrower APs in axons).

Lipid rafts play a role in organizing trafficking and localization of proteins on the membrane and the clustering of Na<sup>+</sup> channels over lipid rafts may be beneficial in this context. Lipid rafts may allow colocalization of ionic pumps and Na<sup>+</sup> sensitive channels with Nav1.8 channels, which may have some beneficial results on the cell's ionic homeostasis by placing ion pumps near the sources of current.

Because there is no myelin sheath around C-fiber axons, the membrane capacitance and leak conductance are too high for Na<sup>+</sup> channel clusters to be placed at distances on the order of the axon's length constant (λ ≈ 200 µm). In our simulations, the maximum distance *Lmax* between lipid rafts that allowed action potential propagation was ∼20 µm. Because the lipid rafts are so close together, the waveform of the action potential is virtually unchanged compared to the waveform in an axon with uniformly distributed Na<sup>+</sup> channels. This is in stark contrast with myelinated axons, where the myelin sheath lowers the capacitance and leak conductance of the membrane. As a result, nodes of Ranvier can be placed much further apart.
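The dependence of the length constant on membrane leak can be made concrete with the standard cable-theory expression λ = sqrt((d/4)·(R<sub>m</sub>/R<sub>i</sub>)). The sketch below uses placeholder values for the specific membrane resistance R<sub>m</sub> and axial resistivity R<sub>i</sub>; these are generic textbook-scale numbers assumed for illustration, not the parameters of the C-fiber model, and the function name is hypothetical.

```python
import math


def length_constant_um(diameter_um: float,
                       rm_ohm_cm2: float,
                       ri_ohm_cm: float) -> float:
    """Passive cable length constant, lambda = sqrt((d / 4) * (Rm / Ri)),
    returned in micrometres."""
    if min(diameter_um, rm_ohm_cm2, ri_ohm_cm) <= 0:
        raise ValueError("all parameters must be positive")
    diameter_cm = diameter_um * 1e-4
    lambda_cm = math.sqrt((diameter_cm / 4.0) * (rm_ohm_cm2 / ri_ohm_cm))
    return lambda_cm * 1e4


# Placeholder parameters (assumed, not taken from the model).
RI_OHM_CM = 100.0

for rm in (10_000.0, 40_000.0, 160_000.0):
    # A higher Rm corresponds to a lower leak conductance per unit area.
    lam = length_constant_um(diameter_um=0.1, rm_ohm_cm2=rm, ri_ohm_cm=RI_OHM_CM)
    print(f"Rm = {rm:>9.0f} ohm cm^2 -> lambda = {lam:.0f} um")
```

Since λ grows only with the square root of R<sub>m</sub>, a fourfold reduction in leak conductance doubles the length constant, which illustrates why insulating the membrane lets excitable sites sit further apart.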

#### **FUNDING**

Ali Neishabouri was supported by the UK Engineering and Physical Sciences Research Council (EPSRC). A. Aldo Faisal is supported by the Human Frontiers in Science Program Grant (HFSP RPG00022/2012).

#### **REFERENCES**


White, J. A., Rubinstein, J. T., and Kay, A. R. (2000). Channel noise in neurons. *Trends Neurosci.* 23, 131–137. doi: 10.1016/S0166-2236(99)01521-0

Zeng, S., and Tang, Y. (2009). Effect of clustered ion channels along an unmyelinated axon. *Phys. Rev. E* 80:021917. doi: 10.1103/PhysRevE.80.021917

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 May 2014; accepted: 15 September 2014; published online: 13 October 2014.*

*Citation: Neishabouri A and Faisal AA (2014) Saltatory conduction in unmyelinated axons: clustering of Na*<sup>+</sup> *channels on lipid rafts enables micro-saltatory conduction in C-fibers. Front. Neuroanat. 8:109. doi: 10.3389/fnana.2014.00109*

*This article was submitted to the journal Frontiers in Neuroanatomy.*

*Copyright © 2014 Neishabouri and Faisal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
