# NEUROMODULATION OF EXECUTIVE CIRCUITS

EDITED BY: M. Victoria Puig, Allan T. Gulledge, Evelyn K. Lambe and Guillermo Gonzalez-Burgos PUBLISHED IN: Frontiers in Neural Circuits

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-707-1 DOI 10.3389/978-2-88919-707-1



## Topic Editors:

**M. Victoria Puig,** Hospital del Mar Medical Research Institute, Spain **Allan T. Gulledge,** Geisel School of Medicine at Dartmouth College, USA **Evelyn K. Lambe,** University of Toronto, Canada **Guillermo Gonzalez-Burgos,** University of Pittsburgh, USA

Dopaminergic projections (red) from the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) to the prefrontal cortex (PFC, mammals) or the nidopallium caudolaterale (NCL, birds) and striatum in the brain of a primate (human), a rodent (rat), and a bird (pigeon). Cortical areas (pallial in birds) across species are shaded in gray, the hatched area denotes the PFC/NCL regions while striatal areas are shaded in blue. Note that, in all species, dopamine neurons in the VTA and SNc project to several subregions of the PFC/NCL and striatum. Taken from: Puig MV, Rose J, Schmidt R and Freund N (2014) Dopamine modulation of learning and memory in the prefrontal cortex: insights from studies in primates, rodents, and birds. Front. Neural Circuits 8:93. doi: 10.3389/fncir.2014.00093

Cover image: Like drops of water on the surface of a lake, neural processing generates peaks and valleys of activity that undulate across the brain. Neuromodulatory transmitters regulate this ongoing process, tuning brain circuits to optimize perception and behavior. Photo credit: the Puig laboratory at Hospital del Mar Medical Research Institute in Barcelona.

High-order executive tasks involve the interplay between frontal cortex and other cortical and subcortical brain regions. In particular, the frontal cortex, striatum and thalamus interact via parallel fronto-striatal "loops" that are crucial for the executive control of behavior. In all of these brain regions, neuromodulatory inputs (e.g. serotonergic, dopaminergic, cholinergic, adrenergic, and peptidergic afferents) regulate neuronal activity and synaptic transmission to optimize circuit performance for specific cognitive demands. Indeed, dysregulation of neuromodulatory input to fronto-striatal circuits is implicated in a number of neuropsychiatric disorders, such as schizophrenia, depression, and Parkinson's disease. However, despite decades of intense investigation, how neuromodulators influence the activity of fronto-striatal circuits to generate the precise activity patterns required for sophisticated cognitive tasks remains unknown. In part, this reflects the complexity of the cellular microcircuits in these brain regions (i.e. heterogeneity of neuron subtypes and connectivity), cell-type specific expression patterns for the numerous receptor subtypes mediating neuromodulatory signals, and the potential interaction of multiple signaling cascades in individual neurons. This Research Topic includes 10 original research articles and seven review articles addressing the role of neuromodulation in executive function at multiple levels of analysis, ranging from the activity of single voltage-dependent ion channels to computational models of network interactions in cortex-striatum-thalamus systems.

**Citation:** Puig, M. V., Gulledge, A. T., Lambe, E. K., Gonzalez-Burgos, G., eds. (2016). Neuromodulation of Executive Circuits. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-707-1


## Editorial: Neuromodulation of executive circuits

#### M. Victoria Puig<sup>1</sup>\*, Allan T. Gulledge<sup>2</sup>, Evelyn K. Lambe<sup>3</sup> and Guillermo Gonzalez-Burgos<sup>4</sup>

*<sup>1</sup> Integrative Pharmacology and Systems Neuroscience Research Group, Hospital del Mar Medical Research Institute, Barcelona, Spain, <sup>2</sup> Department of Physiology and Neurobiology, Geisel School of Medicine at Dartmouth College, Lebanon, NH, USA, <sup>3</sup> Department of Physiology, University of Toronto, Toronto, ON, Canada, <sup>4</sup> Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA*

#### Keywords: neuromodulation, executive function, prefrontal cortex, striatum, cognition

The executive control of behavior involves functional interactions between the frontal cortex and other cortical and subcortical brain regions, in particular with the striatum and thalamus, via parallel fronto-striatal-thalamic loops. In all of these brain regions, neuronal excitability and synaptic transmission are regulated by serotonergic, dopaminergic, cholinergic, adrenergic, and peptidergic neuromodulatory afferent systems that are critical for optimizing cognitive task performance. By contrast, dysfunctional neuromodulation of fronto-striatal circuits is implicated in various neuropsychiatric and neurodegenerative disorders, such as schizophrenia, depression, and Parkinson's disease. Yet, despite decades of intense investigation, it remains poorly understood how neuromodulators influence the flow of neural activity in fronto-striatal circuits to facilitate cognition. Crucial pending questions in the field include (but are not limited to): (1) how the heterogeneity of neuron subtypes and their connectivity contributes to the complexity of the underlying cellular microcircuits that are substrates of neuromodulator effects; (2) whether the numerous receptor subtypes mediating the neuromodulator effects have cell-type specific expression patterns and effects; (3) how multiple intracellular signaling cascades mediating neuromodulator receptor effects interact in individual neurons; (4) how neuromodulators control the strength and plasticity of synaptic inputs onto different neuron types in fronto-striatal circuits; and (5) to what extent cellular, circuit and system level effects of neuromodulators are conserved across species. This Research Topic includes 10 original research articles and seven review articles addressing the role of neuromodulation in executive function at multiple levels of analysis, ranging from the activity of single voltage-dependent ion channels to computational models of network interactions in cortex-striatum-thalamus systems.

#### Edited by:

*Manuel S. Malmierca, University of Salamanca, Spain*

#### Reviewed by:

*Livia De Hoz, Max Planck Institute of Experimental Medicine, Germany*

> \*Correspondence: *M. Victoria Puig mpuig3@imim.es*

Received: *18 May 2015* Accepted: *22 September 2015* Published: *08 October 2015*

#### Citation:

*Puig MV, Gulledge AT, Lambe EK and Gonzalez-Burgos G (2015) Editorial: Neuromodulation of executive circuits. Front. Neural Circuits 9:58. doi: 10.3389/fncir.2015.00058*

Using cell-attached recordings of single channel and ensemble currents, Gorelova and Seamans (2015) show that dopamine (DA) D1/D5 receptors enhance persistent Na<sup>+</sup> current in the soma and dendrites, but not in the axon initial segment of layer 5 pyramidal cells (L5PCs) in the rat prefrontal cortex (PFC). This finding suggests a subcellular compartment-specific regulation of excitability in PFC L5PCs. Vitrac et al. (2014) find that DA D2 family receptors also modulate L5PC activity in the mouse primary motor cortex. They report that D2 receptor activation, by either systemic or intracortical administration of the D2 agonist quinpirole, enhances the firing of putative L5PCs in vivo. However, Dembrow and Johnston (2014) review recent evidence suggesting that neuromodulation of PFC L5PC activity by DA, serotonin (5HT), acetylcholine (ACh), or metabotropic glutamate receptors (mGluRs) may increase or decrease the probability of L5PC firing depending on their long-distance projection targets. Consistent with this hypothesis, Stephens et al. (2014) report that 5HT, via both 5-HT1A and 5-HT2A receptors, differentially regulates L5PC activity in the PFC based on both their long-distance projection targets and their activity state (e.g., at rest, during current-induced firing, or with simulated synaptic input).

Neuromodulators can also influence synaptic signaling and plasticity in executive circuits. Ruan et al. (2014) examined spike-timing-dependent plasticity of glutamate synaptic inputs onto L5PCs in mouse PFC and found that interactions of D1/D5 and D2 DA receptors enable Hebbian and anti-Hebbian forms of NMDA receptor dependent plasticity. In their mini review, Arroyo et al. (2014) highlight recent work using optogenetic tools to address the nicotinic ACh receptor (nAChR)-mediated effects produced by selective stimulation of cholinergic axons, including studies assessing the mechanisms underlying nAChR-mediated fast synaptic transmission in cortical circuits. Bloem et al. (2014) also review studies of cholinergic modulation in PFC, focusing on how nAChRs affect signal processing in PFC microcircuits, and proposing that ACh neuromodulation of PFC circuit function is critical for attention via ACh actions on different nAChR subtypes localized in interneurons and PCs of different cortical layers.

Puig et al. (2014), reviewing DA neuromodulation of learning and memory processes across a spectrum of animal models, including birds, rodents, humans, and non-human primates, propose a highly conserved role for DA across mammals that also evolved comparatively, albeit independently, in the avian brain. Chandler et al. (2014) review the heterogeneity of DA and norepinephrine (NE) midbrain neurons, and the specific roles of subpopulations of both DA and NE neurons in PFC-dependent cognitive tasks and in mental disorders. The review by Clark and Noudoost (2014) focuses on how DA in the PFC influences the interaction between neuronal activity in PFC and in other cortical regions in non-human primates, proposing that changes in catecholamine levels in the PFC contribute to attention and working memory function. Studying the non-human primate (marmoset) brain, Shukla et al. (2014) examined the expression of mRNAs for all of the 13 members of the 5HT receptor family, finding layer- and region-specific 5HT receptor expression in cortex and subcortical structures that suggests precise co-localization of different classes of receptors with 5HT and 5HT axons. The mini review by Miguelez et al. (2014) further explores the localization of 5HT receptor subtypes in various divisions of the basal ganglia in rodents, monkeys, and humans and discusses the physiological and behavioral effects of their manipulations in relation to the potential role of 5HT in the motor and cognitive disturbances in Parkinson's disease.

Carli and Invernizzi (2014) review the crucial role 5HT and DA play in executive function and attention, focusing on the effects of 5HT and DA receptor manipulation on behavioral disturbances produced in rodents by disrupting glutamate signaling in the PFC via local NMDA receptor antagonist administration. Using a computational network model, Morita and Kato (2014) explore the possibility that DA neurons, believed to compute reward prediction errors, convey this signal to cortico-striatal circuits in part via progressive increases of DA in the striatum that controls the decay of synaptic potentiation produced during performance of reward-associated navigation tasks. Dasgupta et al. (2014) similarly used simulations in a computational model network to test the hypothesis that, to generate goal-directed control of behavior, reward-based learning (dependent on cortex-striatum-thalamus circuits) cooperates with correlation-based learning (dependent on cerebellum-thalamus-cortex circuits). Their model suggests a crucial role for neuromodulation of thalamic function in the integration of these processes. The impact of neuromodulation in the different thalamic nuclei and associated circuits is reviewed in detail by Varela (2014), who focuses on the role of midline and intralaminar groups of thalamic nuclei that may play important and specific roles in shaping executive function. Finally, Crittenden et al. (2014) report results of experiments testing the effects of overexpression of the vesicular ACh transporter in mouse brain, to assess if enhanced ACh signaling increases catecholamine levels/release, and thus modulates amphetamine-induced stereotypical behaviors that are a relevant model of behavioral alterations by drug abuse in humans.

Together, the articles summarized above demonstrate the elegant precision with which neuromodulators target specific neural circuits and subcircuits to facilitate cognition. While details of receptor expression, signaling cascades, and effector systems remain to be fully elucidated, the work highlighted in this collection demonstrates that the functional interactions between the frontal cortex and other cortical and subcortical brain regions are exquisitely sensitive to fine tuning by local release of neuromodulators. The data reported and summarized in these articles show evidence that this tuning involves neuron subtype-specific receptor expression, as well as receptor-specific effects within certain neuronal subtypes or subcellular compartments. A challenge for future studies will be linking such neuromodulatory effects at the level of the synapse or neuron with their role in plasticity at the systems level, described in the articles investigating fronto-striatal-dependent learning and behavior in animals and computational models. Fortunately, many of the cellular and system level effects appear to be conserved across mammalian and non-mammalian species, highlighting the importance of the themes addressed in this Research Topic for understanding fronto-striatal system function and dysfunction in psychiatric and neurological brain disorders.

## References


by NMDA receptor hypofunction in the 5-choice serial reaction time task. Front. Neural Circuits 8:58. doi: 10.3389/fncir.2014.00058


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Puig, Gulledge, Lambe and Gonzalez-Burgos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Cell-attached single-channel recordings in intact prefrontal cortex pyramidal neurons reveal compartmentalized D1/D5 receptor modulation of the persistent sodium current

## **Natalia Gorelova and Jeremy K. Seamans \***

Department of Psychiatry and Brain Research Centre, University of British Columbia, Vancouver, BC, Canada

#### **Edited by:**

Allan T. Gulledge, Geisel School of Medicine at Dartmouth, USA

#### **Reviewed by:**

Bruce P. Bean, Harvard Medical School, USA Christian Alzheimer, Universität Erlangen-Nürnberg, Germany

#### **\*Correspondence:**

Jeremy K. Seamans, Department of Psychiatry and Brain Research Centre, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC V6T 2B5, Canada e-mail: jeremy.seamans@ubc.ca

The persistent Na<sup>+</sup> current (INap) is believed to be an important target of dopamine modulation in prefrontal cortex (PFC) neurons. While past studies have tested the effects of dopamine on INap, the results have been contradictory, largely because of difficulties in measuring INap using somatic whole-cell recordings. To circumvent these confounds, we used the cell-attached patch-clamp technique to record single Na<sup>+</sup> channels from the soma, proximal dendrite (PD) or proximal axon (PA) of intact prefrontal layer V pyramidal neurons. Under baseline conditions, numerous well-resolved Na<sup>+</sup> channel openings were recorded that exhibited an extrapolated reversal potential of 73 mV and a slope conductance of 14–19 pS, and were blocked by tetrodotoxin (TTX). While the channels were similar in most respects across compartments, the propensity to exhibit prolonged bursts lasting >40 ms was many-fold greater in the axon than in the soma or dendrite. Bath application of the D1/D5 receptor agonist SKF81297 shifted the ensemble current activation curve leftward and increased the number of late events recorded from the PD but not the soma or PA. However, the greatest effect was on prolonged bursting, where the D1/D5 receptor agonist increased the occurrence of bursts 3-fold in the PD and nearly 7-fold in the soma, but not at all in the PA. As a result, D1/D5 receptor activation equalized the probability of prolonged burst occurrence across the proximal axosomatodendritic region. Therefore, D1/D5 receptor modulation appears to be targeted mainly to Na<sup>+</sup> channels in the PD/soma and not the PA. By circumventing the pitfalls of previous attempts to study the D1/D5 receptor modulation of INap, we demonstrate conclusively that D1/D5 receptor activation can increase the INap generated proximally; however, questions remain as to how D1/D5 receptor activation modulates Na<sup>+</sup> currents in the more distal initial segment, where most of the INap is normally generated.

**Keywords: prefrontal cortex, Na**<sup>+</sup> **channels, single channel recordings, persistent Na**<sup>+</sup> **current, dopamine, D1/D5 receptors**

## **INTRODUCTION**

Dopamine modulates a number of cognitive functions mediated by the prefrontal cortex (PFC) while dysregulation of the mesocortical dopamine system is thought to occur in psychiatric conditions. One current that plays an important role in shaping PFC activity is the persistent Na<sup>+</sup> current (*I*Nap). *I*Nap is similar to the fast transient Na<sup>+</sup> current but tends to activate at a lower voltage and inactivates more slowly (French and Gage, 1985; Patlak and Ortiz, 1985; French et al., 1990; Alzheimer et al., 1993; Taylor, 1993; Astman et al., 2006). *I*Nap strongly regulates intrinsic excitability, membrane oscillations (White et al., 1998; Hu et al., 2009), synaptic amplification (Stuart and Sakmann, 1995) and persistent activity (Durstewitz et al., 2000) while computational modeling has suggested that *I*Nap neuromodulation can profoundly affect overall network activity (Durstewitz et al., 2000; Durstewitz and Seamans, 2008).

A number of studies have reported that dopamine modulates *I*Nap in PFC neurons but the issue has been quite contentious. The inconsistencies may stem largely from the limitations of the techniques commonly used to study *I*Nap. In the initial papers, sharp intracellular pipettes were used (Geijo-Barrientos and Pastore, 1995; Yang and Seamans, 1996), which create a considerable shunt around the electrode and provide extremely poor voltage control. A subsequent study used whole-cell patch-clamp recordings (Gorelova and Yang, 2000), which provided better but still imperfect voltage control given the expansive dendritic arbor of deep layer PFC pyramidal neurons. One way to circumvent this problem, employed by Maurice et al. (2001), was to use dissociated cells in which the neurites were enzymatically and mechanically severed. However, given the diameter of the axon relative to the soma, if the axon is >10 µm in length, its voltage is still difficult to control from a somatic electrode (White et al., 1995). A second drawback is that key intracellular cascades could be disrupted or lost in the dissociation procedure, which is potentially serious given the dramatic differences in *I*Nap in the presence vs. absence of various intracellular molecules (Ma et al., 1994, 1997; Fleidervish et al., 2008). A final problem is that each Na<sup>+</sup> channel subtype tends to be distributed nonuniformly throughout the axonal-somato-dendritic region (Raman and Bean, 1997; Smith et al., 1998; Caldwell et al., 2000; Goldin, 2001; Rush et al., 2005; Osorio et al., 2010). Since all studies of dopamine modulation of *I*Nap to date have recorded exclusively from the soma, the issue of compartmentalized modulation has not been experimentally addressed.

The only solution to this myriad of potential artifacts and complications is to employ a technique where one can record in different cellular compartments with perfect voltage control while leaving intracellular signaling cascades untouched. This is possible with the cell-attached recording configuration. In the present study we performed cell-attached recordings from the soma, proximal apical dendrite (PD) and the proximal axon (PA) of deep layer PFC neurons. Using this approach we tested the effects of a D1/D5 receptor agonist on multiple aspects of Na<sup>+</sup> channel gating in hopes of gaining new insights into this controversial issue.

## **METHODS**

#### **SLICE PREPARATION**

The use and care of animals, as well as the protocol for slice preparation from anesthetized rats, were approved by the University of British Columbia Animal Care Committee.

Slices containing the medial prefrontal cortex (mPFC) were prepared from brains of 16–26 day old Sprague-Dawley rats. Animals were anesthetized with isoflurane and killed by decapitation. The brain was quickly removed and placed in ice-cold oxygenated (95% O<sub>2</sub>, 5% CO<sub>2</sub>) cutting solution containing (in mM): 120 NaCl, 20 NaHCO3, 10 HEPES, 3 NaOH, 2.5 KCl, 9 MgCl2, 0.5 CaCl2, 25 D-glucose, 0.4 L-ascorbic acid. Coronal slices containing mPFC were cut on a vibratome at 300 µm. Dissected slices were kept at room temperature in a holding chamber in continuously oxygenated artificial cerebrospinal fluid (ACSF) containing (in mM): 125 NaCl, 25 NaHCO3, 2.5 KCl, 2 CaCl2, 1 MgCl2, 1.25 NaH2PO4, 25 D-glucose, 0.4 L-ascorbic acid and 0.01 CNQX. ACSF of the same composition was used for recording. After >1 h incubation, slices were transferred to a recording chamber and perfused with continuously oxygenated ACSF at a rate of 1–1.5 ml/min. Recordings were made at room temperature.

#### **PHARMACOLOGICAL AGENTS**

Stock solutions of CNQX, AP5 (Ascent Scientific, Princeton, USA), TTX (Alomone Labs, Israel) and SKF81297 (Sigma) were prepared in water, aliquoted and stored frozen at −30 °C. Each drug was thawed and diluted to an appropriate concentration immediately before application.

#### **SINGLE CHANNEL RECORDINGS**

Layer V pyramidal cells were visualized in brain slices using infrared differential interference contrast optics (Axioskop Zeiss). Recordings were made from cell bodies, proximal apical dendrites (PD, 5–10 µm from soma) and proximal axons (PA, axon initial segment, 3–15 µm from soma) (**Figure 1A**). Pipettes were brought next to the neuron and very weak positive pressure was used to clean the surface before seal formation. Single channel recordings were made in cell-attached configuration. Patch pipettes were made from thick-walled borosilicate glass capillaries with an outer diameter of 1.5 mm. The internal surface of the glass capillaries was treated with Sigmacote and allowed to dry at room temperature for at least 3 days before being used for manufacturing patch pipettes. This treatment significantly reduced capacitance and improved the quality of the seal, which approached values >40 GΩ. To reduce the number of single channels in a patch we used pipettes with resistances of 15–25 MΩ when filled with patch solution. The pipette solution for recording Na<sup>+</sup> channels contained the following (in mM): 130 NaCl, 3 KCl, 2 CaCl2, 2 MgCl2, 0.1 CdCl2, 0.02 CNQX, 0.05 AP5, 10 D-glucose, 5 tetraethylammonium chloride, 1 4-AP and 10 HEPES with a pH of 7.3. The pipette solution for recording delayed rectifier K<sup>+</sup> channels contained the following (in mM): 150 KCl, 10 HEPES, 2 CaCl2, 2 MgCl2, 10 D-glucose with a pH of 7.4.

Command voltage protocols were generated and single-channel currents were acquired using an Axopatch 200B amplifier with a Digidata 1320A analog-to-digital interface (Axon Instruments, CA). Capacitive transients were minimized using built-in circuits of the amplifier. Data were low-pass filtered at 2 or 5 kHz and digitized at 50 kHz. The root mean square (RMS) noise was usually between 0.125 and 0.25 pA. Patches were held 20–40 mV more negative than the resting membrane potential and stepped to potentials 20–80 mV more positive than the resting membrane potential.

#### **DATA ANALYSIS**

Data were analyzed using Clampfit 9.0 and 10.4 (pClamp package, Axon Instruments). Residual capacitive transients were nullified by off-line subtraction. For detection of single channels, state transitions with a minimum duration threshold of 0.05 ms were used. A list of idealized channel events was created and used for further analysis. For deriving single channel conductances, well-resolved, square-shaped unitary events were chosen, and the amplitudes of 15–25 unitary events measured at a given membrane potential were plotted against membrane potential for each patch. To calculate a slope conductance and extrapolated reversal potential, a linear regression analysis was performed in Statistica. For calculating the conductance of channels entering the prolonged bursting mode, we used the following depolarizing voltage ramp: from a holding potential 40 mV more negative than resting membrane potential, the voltage was slowly increased to 80 mV more positive than resting membrane potential at a rate of 0.2 mV/ms. Traces without channel openings were averaged and this average trace was used for leak subtraction.
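The regression step described above can be illustrated with a minimal sketch. It assumes the unitary current follows i = g(V − E<sub>rev</sub>), so that the slope of amplitude against membrane potential is the conductance and the intercept with the voltage axis is the extrapolated reversal potential. The function name and the data points are hypothetical (noise-free values generated for a 16 pS channel reversing at +73 mV); this is not the authors' Statistica workflow.

```python
def fit_slope_conductance(voltages_mV, amplitudes_pA):
    """Ordinary least-squares line through (V, i) points.

    Returns (conductance in pS, extrapolated reversal potential in mV).
    The slope has units pA/mV = nS, so it is multiplied by 1000 to give pS.
    """
    n = len(voltages_mV)
    mean_v = sum(voltages_mV) / n
    mean_i = sum(amplitudes_pA) / n
    sxx = sum((v - mean_v) ** 2 for v in voltages_mV)
    sxy = sum((v - mean_v) * (i - mean_i)
              for v, i in zip(voltages_mV, amplitudes_pA))
    slope_nS = sxy / sxx
    intercept_pA = mean_i - slope_nS * mean_v
    reversal_mV = -intercept_pA / slope_nS  # voltage at which i = 0
    return slope_nS * 1000.0, reversal_mV


# Hypothetical, noise-free amplitudes (inward currents are negative).
voltages = [-50.0, -40.0, -30.0, -20.0, -10.0]
amplitudes = [0.016 * (v - 73.0) for v in voltages]

conductance_pS, reversal_mV = fit_slope_conductance(voltages, amplitudes)
```

With real patches the amplitudes are noisy, and the reversal potential is an extrapolation well beyond the measured voltage range, so small errors in the slope translate into larger errors in the intercept.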

Ensemble-average traces were constructed by averaging 60 individual sweeps. The peak current at each potential was then converted to a conductance assuming a Na<sup>+</sup> reversal potential of +60 mV. Least square fits to the Boltzmann function:

$$\chi = \frac{A}{1 + \exp\left[\left(V_m - V_{1/2}\right)/k\right]} + C$$

were made in Clampfit for each individual patch as well as for groups of patches.
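As a rough illustration of the two steps above, the sketch below converts a peak ensemble current to a conductance using the assumed Na<sup>+</sup> reversal potential of +60 mV, and evaluates the Boltzmann function with the parameters A, V<sub>1/2</sub>, k and C. The function names and the numerical values are hypothetical, and no fitting is performed here (the fits in the study were made in Clampfit). Note that, with the sign convention written above, an activation curve that rises with depolarization corresponds to a negative k.

```python
import math

NA_REVERSAL_MV = 60.0  # reversal potential assumed in the text


def peak_current_to_conductance(peak_pA, v_mV, e_rev_mV=NA_REVERSAL_MV):
    """Chord conductance g = I / (V - E_rev); pA / mV gives nS."""
    return peak_pA / (v_mV - e_rev_mV)


def boltzmann(v_m, a, v_half, k, c):
    """Boltzmann function A / (1 + exp((Vm - V1/2) / k)) + C."""
    return a / (1.0 + math.exp((v_m - v_half) / k)) + c


# Hypothetical example: a -2 pA peak ensemble current at -40 mV.
g_nS = peak_current_to_conductance(-2.0, -40.0)

# Hypothetical parameters; k < 0 makes the curve increase with voltage.
half_point = boltzmann(-40.0, a=1.0, v_half=-40.0, k=-5.0, c=0.0)
low = boltzmann(-60.0, a=1.0, v_half=-40.0, k=-5.0, c=0.0)
high = boltzmann(-20.0, a=1.0, v_half=-40.0, k=-5.0, c=0.0)
```

A leftward shift of the activation curve, as reported for the D1/D5 agonist, would appear in this parameterization as a more negative V<sub>1/2</sub>.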

A repeated measures ANOVA was used for statistical analysis of the voltage dependence of brief late Na<sup>+</sup> channel openings in the PD, soma and PA. A Student's *t*-test was used to determine the significance of the effect of the D1/D5 receptor agonist on ensemble currents. For statistical analysis of the effect of the D1/D5 receptor agonist on the brief late Na<sup>+</sup> channel openings and their gating, Student's *t*-tests with Holm-Bonferroni correction for multiple comparisons were performed. The values in the text and figures are presented as mean ± SEM. The degrees of freedom are presented as the subscripts to *F* and *t*.

## **RESULTS**

#### **BASELINE CHARACTERISTICS OF UNITARY AND ENSEMBLE NA**<sup>+</sup> **CHANNEL CURRENTS IN mPFC NEURONS**

The present study includes 22 cell-attached recordings from the soma, 34 from the PD and 13 from the proximal axon (PA; **Figure 1**). Even though all recordings were performed in cell-attached mode, inward currents are shown as downward for consistency.

The ability to analyze and compare cell-attached recordings from different sites or under different conditions requires a reasonably accurate knowledge of the transmembrane potential. This can be difficult for cell-attached recordings. Following each recording, we therefore applied suction to attain whole-cell mode and quickly recorded the membrane potential. The average resting voltage at break-in was −72.1 ± 0.7 mV, *n* = 23. The membrane potential at break-in was used as a correction in all of the analyses described below.

Since the pipette solution contained blockers of K<sup>+</sup> currents (TEA, CsCl), Ca<sup>2+</sup> currents (CdCl), AMPA currents (CNQX) and NMDA currents (AP5), the remaining inward single channel openings were assumed to be Na<sup>+</sup> currents. Accordingly, when the selective Na<sup>+</sup> channel blocker TTX (1 µM) was included in the patch solution, no inward single channel openings were observed (*n* = 11, not shown). An example of Na<sup>+</sup> channel gating in a PD cell-attached recording is shown in **Figure 1B**. From a presumed holding potential of −100 mV, 80 mV voltage steps produced early channel openings as well as multiple late channel openings. Openings included single brief openings, short bursts of brief openings as well as prolonged burst openings. The amplitudes of the brief yet fully resolved late (>20 ms after voltage step initiation) single openings were quantified across a family of voltage steps and plotted against the membrane potential (**Figure 1C**). The slope of the regression line gave the conductance of unitary openings, and its extrapolation gave the reversal potential. The reversal potentials calculated for 11 patches were between +68.8 and +79.1 mV, with an average of +73.1 ± 0.96 mV, *n* = 11. This is very close to the calculated Nernst equilibrium potential for Na<sup>+</sup> at 25°C, which would be +66.9 to +76 mV with an external Na<sup>+</sup> concentration (i.e., the patch solution) of 135 mM and an assumed internal Na<sup>+</sup> concentration of 7–10 mM.
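The Nernst range quoted above can be checked with a short calculation. The sketch below uses the standard Nernst equation with the concentrations and temperature given in the text; the function name and argument layout are illustrative choices.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mv(conc_out_mm, conc_in_mm, temp_c=25.0, valence=1):
    """Nernst equilibrium potential in mV for an ion of the given valence."""
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (valence * F) * math.log(conc_out_mm / conc_in_mm)


# 135 mM Na+ in the patch pipette; internal Na+ assumed to be 7-10 mM.
e_na_internal_10 = nernst_mv(135.0, 10.0)
e_na_internal_7 = nernst_mv(135.0, 7.0)

print(f"E_Na = {e_na_internal_10:.1f} to {e_na_internal_7:.1f} mV")
```

The same function returns 0 mV when the concentrations on both sides of the membrane are equal.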

Using the same approach we also calculated the slope conductances of the Na<sup>+</sup> channels recorded from the three regions. The average conductance of late single events recorded from PD recordings was 15.2 ± 3.8 pS, *n* = 8, from somatic recordings was 15.1 ± 3.2 pS, *n* = 8 and from PA recordings was 16.1 ± 4.8 pS, *n* = 7. While combining many patches in this manner was useful in that it produced robust overall estimates, it could occlude subtle differences in the individual slope conductances present in a given patch. While most patches had Na<sup>+</sup> channels with conductances of ∼16 pS, there were a few patches from the axon and dendrite (but not soma) that exhibited a slope conductance of ∼19 pS. These conductance values are very consistent with past studies of *I*Nap in cultured cortical pyramidal neurons (Magistretti et al., 1999a,b; Magistretti and Alonso, 2006).
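As an illustration of how a slope conductance and an extrapolated reversal potential follow from a linear fit of unitary amplitude against voltage, here is a minimal sketch. The amplitudes are synthetic, generated from an assumed 16 pS channel reversing at +73 mV; they are not data from the recordings, and the analysis in the study was run in Statistica rather than with this code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic amplitudes (pA) for an assumed 16 pS (0.016 nS) channel, E_rev = +73 mV.
voltages_mv = [-50.0, -40.0, -30.0, -20.0, -10.0]
amplitudes_pa = [0.016 * (v - 73.0) for v in voltages_mv]

slope_ns, intercept_pa = fit_line(voltages_mv, amplitudes_pa)
conductance_ps = 1000.0 * slope_ns       # pA/mV = nS, converted to pS
reversal_mv = -intercept_pa / slope_ns   # voltage at which the fitted current is zero

print(f"slope conductance = {conductance_ps:.1f} pS, reversal = {reversal_mv:.1f} mV")
```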

In addition to the brief late openings, the channels sometimes exhibited prolonged burst openings that could last several hundreds of milliseconds (**Figure 1B**). To attain a measure of the conductance of channels displaying sustained burst openings, we exploited the prolonged nature of these bursts and recorded channel openings during depolarizing voltage ramps from a transmembrane potential of −120 mV to 0 mV. An example of one of these prolonged burst openings recorded from a dendritic patch during a depolarizing ramp is shown in **Figure 1D**. Regression analysis yielded a slope conductance of 16.5 pS for the patch shown in **Figure 1D** and an average value of 16.7 ± 2.96 pS for 5 additional PD patches. This conductance value was very consistent with what was obtained from single events shown in **Figure 1C**. Therefore, the present results suggest that *I*Nap in layer V PFC neurons can be produced by a population of ∼16 pS Na<sup>+</sup> channels that enter a distinct prolonged gating mode, consistent with past studies in neurons from other cortical regions (Alzheimer et al., 1993).

All patches contained multiple channels as manifest by the appearance of overlapping multiple openings at the beginning of the depolarizing steps. To combine or compare data obtained from different patches we estimated the number of channels in each patch using peak current variance methods. Assuming that all Na<sup>+</sup> channels within a patch are independent and have uniform conductance and open probability, the number of channels (*N*) and the peak open probability (*P*o) can be derived as follows (Kimitsuki et al., 1990; Astman et al., 2006):

$$\begin{array}{rcl} N & = & I_{\text{peak}}/\left(i P_{\text{o}}\right)\\ P_{\text{o}} & = & 1 - \sigma_{\text{peak}}^2/\left(i I_{\text{peak}}\right) \end{array}$$

where *I*peak is the average Na<sup>+</sup> current value at the peak, σ<sup>2</sup>peak is the peak Na<sup>+</sup> current variance and *i* is the unitary single channel current amplitude.
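The two relations above can be evaluated directly. The sketch below checks them against values generated from a binomial model with a known channel count; the input numbers are synthetic and the function name is an illustrative choice.

```python
def estimate_channels(i_peak_pa, var_peak_pa2, unitary_pa):
    """Return (N, Po) from the mean peak current, its variance and the unitary current.

    Current magnitudes are used, so inward currents should be passed as
    positive numbers.
    """
    p_open = 1.0 - var_peak_pa2 / (unitary_pa * i_peak_pa)
    n_channels = i_peak_pa / (unitary_pa * p_open)
    return n_channels, p_open


# Synthetic check: 8 independent channels, Po = 0.5, unitary current 1.5 pA.
true_n, true_po, unitary = 8, 0.5, 1.5
mean_peak = true_n * unitary * true_po                        # binomial mean
variance = true_n * unitary ** 2 * true_po * (1.0 - true_po)  # binomial variance

n_est, po_est = estimate_channels(mean_peak, variance, unitary)
print(f"N = {n_est:.1f}, Po = {po_est:.2f}")
```

The estimate degrades as the peak open probability approaches 1, because the variance term then becomes small relative to measurement noise.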

To estimate the number of channels in each patch we measured the amplitude of the peak current during a 60 mV depolarizing step as well as the later unitary single channel currents that occurred from 20 ms to the end of the step. Across the entire data set, there was an average of 5.9 ± 0.8, *n* = 16 channels/patch in somatic patches, 7.6 ± 1.5, *n* = 18 channels/patch in PD patches and 9.5 ± 1.8, *n* = 13 channels/patch in PA patches. For the analysis of late openings, we normalized the number of openings and open probability obtained for each patch based on the estimated number of channels in the patch.

The late channel openings were counted starting 20 ms after the beginning of the depolarizing step. The average number of late openings per channel per sweep was calculated by dividing the number of all late openings by the number of channels in the patch and by the number of depolarizing sweeps. The open probability of late openings was calculated as the ratio of the total open time during depolarizing steps to the total time of the depolarization, divided by the estimated number of channels in the patch. To obtain the voltage dependence of late openings, patches were held 20 mV more negative than the resting membrane potential and stepped to potentials 20–80 mV more positive than the resting membrane potential in 5 mV intervals (corrected based on the resting membrane potential at break-in).
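The two normalizations described above amount to simple ratios. The following sketch spells them out; the function names and the example counts are hypothetical and serve only to show the arithmetic.

```python
def late_openings_per_channel_per_sweep(n_late_openings, n_channels, n_sweeps):
    """Number of late openings normalized by channel count and sweep count."""
    return n_late_openings / (n_channels * n_sweeps)


def late_open_probability(total_open_ms, step_ms, n_sweeps, n_channels):
    """Total open time over total depolarization time, per channel."""
    total_step_ms = step_ms * n_sweeps
    return total_open_ms / total_step_ms / n_channels


# Hypothetical patch: 6 channels, 100 sweeps of 80 ms, 120 late openings,
# 78 ms of summed open time.
openings = late_openings_per_channel_per_sweep(120, n_channels=6, n_sweeps=100)
p_open = late_open_probability(78.0, step_ms=80.0, n_sweeps=100, n_channels=6)

print(f"openings/channel/sweep = {openings:.2f}, Po = {p_open:.6f}")
```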

To derive mean values of the number of openings, dwell time and *P*<sup>o</sup> we combined data from different patches in 5 mV bins. The mean number of openings, dwell time and *P*<sup>o</sup> for events recorded from the three regions are shown in **Figure 2**. We included in the analysis all late single openings or late openings that appeared as a part of brief bursts. Bursts with durations longer than 40 ms were excluded from this analysis but will be dealt with below. For all regions the largest number of openings was observed at an estimated transmembrane voltage of −30 to −40 mV. The mean number of openings was not significantly different for the three areas (*F*(2,8) = 0.98, *p* = 0.41). The mean dwell time progressively increased with larger step voltages and attained an asymptote at ∼−20 mV. Again the three regions did not differ in terms of mean dwell time (*F*(2,8) = 1.77, *p* = 0.22). Finally the mean *P*<sup>o</sup> peaked at ∼−30 mV and also did not show a difference between the regions after Holm-Bonferroni correction for multiple comparisons (*F*(2,8) = 4.3, *p* = 0.049).

Next we characterized the ensemble currents produced by summing over numerous single sweeps (**Figure 3A**). For these experiments patches were held 40 mV below rest and a series of voltage steps 20–80 mV above rest were delivered. Even for patches with the smallest N, an ensemble current could always be observed by averaging hundreds of traces following a voltage step to −20 mV. However, for constructing I-V plots we only used patches containing more than 6 channels. **Figure 3B** describes the I-V relationship of the ensemble current depicted in **Figure 3A**. We used two approaches to calculate the average half activation voltage (*V*mid) for each region. First, Boltzmann fits to the normalized conductances for each patch were performed and the average *V*mid was then calculated. The resultant *V*mid values were not different between regions: −16.1 ± 1.11 mV, *n* = 6 for the PD vs. −16.4 ± 2.65 mV, *n* = 5 for the soma vs. −16.5 ± mV, *n* = 5 for the PA (*F*(2,14) = 0.04, *p* = 0.96). Second, for each region we combined the normalized conductance values from all single patches into a single plot and then performed the Boltzmann fits (**Figure 3C**). The obtained values of *V*mid were similar to those from the first approach: −16.4 ± 0.65 mV for the PD, −16.2 ± 1.04 mV for the soma and −16.14 ± 0.72 mV for the PA.

#### **D1/D5 RECEPTOR MODULATION OF UNITARY AND ENSEMBLE NA**<sup>+</sup> **CHANNEL CURRENTS IN mPFC NEURONS**

Prior to analyzing the effects of the D1/D5 receptor agonist SKF81297 on Na<sup>+</sup> channel gating, it was important to determine whether the drug affected the membrane potential, since a change in voltage would alter all voltage-dependent measurements. To test this, the K<sup>+</sup> reversal potential was analyzed under baseline conditions and following the administration of SKF81297 (3–5 µM) in the bath. The delayed rectifier K<sup>+</sup> current was chosen because it is very prominent in cell-attached recordings from mPFC neurons in the absence of TEA. To measure changes in K<sup>+</sup> reversal potential, we recorded the delayed rectifier K<sup>+</sup> channel using a patch solution with a potassium concentration of 150 mM. This was close to the internal potassium concentration, thereby bringing the K<sup>+</sup> reversal potential in the patch close to 0 mV. We used the following ramping voltage protocol: from the resting membrane potential, the voltage was slowly increased to 120 mV more positive than the resting membrane potential at a rate of 0.2 mV/ms. Delivering such ramping protocols allowed us to directly record the reversal potential of the current with an accuracy of ±0.5 mV. In 5 patches tested, the K<sup>+</sup> reversal potential changed by less than 1 mV (range −0.8 to +0.6 mV) following D1/D5 receptor agonist administration (**Figure 4**). This indicated that any impact of SKF81297 on membrane potential was negligible and should not contaminate our analysis of its effects on *I*Nap.

The effect of the D1/D5 receptor agonist on Na<sup>+</sup> channel gating was assessed in two ways. Since it was difficult to attain a viable patch with unwavering seal resistance for more than ∼15 min, there was usually insufficient opportunity to measure Na<sup>+</sup> channel gating across a variety of voltage steps under baseline and SKF81297 conditions in the same patch. Therefore, we either tested a single voltage step under baseline conditions and following SKF81297 in a single patch, or we performed a series of voltage steps in one group of patches under control conditions and repeated the same voltage steps in a different group of patches that received SKF81297 immediately upon seal stabilization.

The average ensemble response from a single PD patch under baseline and SKF81297 conditions is shown in **Figure 5A** for a voltage step to a transmembrane potential of −20 mV. It shows a moderate increase in the ensemble current in response to D1/D5 receptor stimulation. **Figure 5B** represents group data for the patches from the three regions. The amplitudes of the ensemble currents were increased by SKF81297 in PD patches by 28 ± 8%, *n* = 6, in somatic patches by 23 ± 17%, *n* = 6 and in PA patches by 25 ± 7%, *n* = 5.

**FIGURE 2 |** [...] triangles). The SEM is given by the corresponding colored lines. **(A)** The number of late Na<sup>+</sup> channel openings (per channel, per sweep) (N), **(B)** dwell time or **(C)** open probability (Po) of late Na<sup>+</sup> channel openings for each region as a function of transmembrane voltage.

The normalized conductances were then plotted as a function of voltage for the group of patches recorded under baseline conditions and a different group of patches recorded in the presence of 3 µM SKF81297. Boltzmann fits revealed that SKF81297 shifted the Na<sup>+</sup> current activation curve leftward in all three regions (**Figure 5C**). The same analysis was rerun in a slightly different manner in that the Boltzmann fits were performed first on each patch and then the results were combined. This also showed that SKF81297 had a significant effect on *V*mid in the PD (−16.3 ± 2.7 mV, *n* = 6 in control vs. −22.6 ± 3.6 mV, *n* = 7 in SKF81297, *t*<sup>11</sup> = 3.79, *p* < 0.01), the soma (−16.1 ± 4.4 mV, *n* = 5 in control vs. −21.4 ± 3.2 mV, *n* = 5 in SKF81297 *t*<sup>8</sup> = 2.43, *p* < 0.05) and the PA (−16.6 ± 2.5 mV, *n* = 5 in control vs. −21.6 ± 0.75 mV, *n* = 5 in SKF81297, *t*<sup>8</sup> = 4.44, *p* < 0.01). There were no significant differences in the average maximal current amplitudes between the control group of patches and the patches treated with SKF81297 (PD: 7.3 ± 4.8 pA, *n* = 6 in control vs. 10.3 ± 5.2 pA, *n* = 7 in SKF81297, *t*<sup>11</sup> = 1.1, *p* = 0.29) (soma: 6.7 ± 1.8 pA, *n* = 5 in control vs. 6.1 ± 1.8 pA, *n* = 5 in SKF81297, *t*<sup>8</sup> = 0.54, *p* = 0.6) (PA:15.1 ± 8.2 pA, *n* = 5 in control vs. 14.04 ± 2.9 pA, *n* = 5 in SKF81297, *t*<sup>8</sup> = 0.21, *p* = 0.84). Therefore, based on this analysis of ensemble currents, D1/D5 receptor activation caused a greater Na<sup>+</sup> current for the same voltage step because it produced a leftward shift in activation, rather than an absolute increase in the peak channel conductance. 
In these experiments, the average membrane potential at break-in was −71.8 ± 0.7 mV, *n* = 18 for all cells in the control group and did not differ significantly from the average membrane potential at break-in for all cells treated with SKF81297 (−71.2 ± 0.4 mV, *n* = 18) (*t*<sup>34</sup> = −0.8, *p* = 0.43).

Next we investigated the effects of SKF81297 on multiple late single channel openings. In these experiments we utilized 100 ms and 550 ms depolarizing voltage steps. Since we did not find any difference in the late Na<sup>+</sup> channel openings between these two groups, they were pooled. The number of late openings was calculated by dividing the number of all late openings by the number of channels in the patch and by the number of depolarizing sweeps and then scaled to an 80 ms length of sweep. **Figure 6A** shows example traces from a dendritic patch under baseline conditions and following the activation of D1/D5 receptors by SKF81297. Across the population of patches recorded at voltage steps to transmembrane potentials of −40 to −50 mV, SKF81297 significantly increased the number of openings in the PD (0.17 ± 0.036 in control vs. 0.24 ± 0.042 in SKF81297, *t*<sup>9</sup> = 5.96, *p* = 0.0001), but not the soma (0.24 ± 0.04 in control vs. 0.29 ± 0.051 in SKF81297, *t*<sup>8</sup> = 1.15, *p* = 0.14) or PA (0.148 ± 0.021 in control vs. 0.18 ± 0.035 in SKF81297, *t*<sup>6</sup> = 1.84, *p* = 0.057) (**Figure 6B**). SKF81297 also significantly increased *P*<sup>o</sup> in the PD (0.00165 ± 0.00054 in control vs. 0.00217 ± 0.00068 in SKF81297, *t*<sup>9</sup> = 3.75, *p* < 0.002) and the soma (0.00155 ± 0.0003 in control vs. 0.00217 ± 0.00053 in SKF81297, *t*<sup>8</sup> = 2.41, *p* = 0.045) but not the PA (0.00168 ± 0.00044 in control vs. 0.00160 ± 0.00028 in SKF81297, *t*<sup>6</sup> = −0.33, *p* = 0.37) (**Figures 6B,C**). The overall dwell time did not differ under baseline vs. SKF81297 (**Figures 6B–D**) in the PD (0.663 ± 0.081 ms in control vs. 0.656 ± 0.068 ms in SKF81297, *t*<sup>9</sup> = 0.31, *p* = 0.38), the soma (0.519 ± 0.058 ms in control vs. 0.621 ± 0.072 ms in SKF81297, *t*<sup>8</sup> = 2.52, *p* = 0.05) or the PA (0.944 ± 0.158 ms in control vs. 0.982 ± 0.119 ms in SKF81297, *t*<sup>6</sup> = −1.27, *p* = 0.12). Therefore, D1/D5 receptor stimulation mainly increased the probability that Na<sup>+</sup> channels open in the PD and to a lesser extent in the soma.

**FIGURE 4 | Testing the effects of SKF81297 on membrane potential based on an analysis of K**<sup>+</sup> **channels**. To get a surrogate measure of transmembrane voltage in cell-attached mode, the reversal potential for delayed rectifier K<sup>+</sup> channel openings was used. For these experiments, the patch solutions were altered by removing K<sup>+</sup> channel blockers and matching the [K+] in the patch pipette to the intracellular concentration, yielding a reversal potential near 0 mV. Voltage ramps started at the resting membrane potential and moved to +120 mV depolarized from rest (bottom schematic). The resting membrane potential for the presented cell was −80 mV. Multiple continuous openings were evoked. These openings started as outward but flipped to inward as the patch was depolarized. The reversal occurred at a transmembrane potential of −3.45 mV (top). Following the bath application of SKF81297 (3–5 µM), the reversal occurred at a transmembrane potential of −3.43 mV (bottom). Black and red lines are single sweeps. Sweeps with channel openings across a wide range of voltages were chosen. The background current was subtracted.

To confirm that the above effect of SKF81297 on Na<sup>+</sup> channel gating in the PD was due to D1/D5 receptor activation, we tested whether the D1/D5 receptor antagonist SCH23390 could block the effect of SKF81297 by applying SCH23390 (3 µM) 10 min before application of SKF81297 (3 µM). As can be seen in **Figure 7**, when the D1/D5 receptor agonist was applied in the presence of the D1/D5 receptor antagonist, no increase was observed in the number of late channel openings (0.24 ± 0.0049 in SCH23390 control vs. 0.21 ± 0.052 in SKF81297 + SCH23390, *t*<sup>4</sup> = 2.23, *p* = 0.09), the channel open probability (0.0015 ± 0.0003 in SCH23390 control vs. 0.0014 ± 0.0003 in SKF81297 + SCH23390, *t*<sup>4</sup> = 4.7, *p* = 0.009) or the dwell time (0.457 ± 0.037 ms in SCH23390 control vs. 0.486 ± 0.073 ms in SKF81297 + SCH23390, *t*<sup>4</sup> = 0.66, *p* = 0.55).

Next we compared the effects of the D1/D5 receptor agonist on the number of single openings vs. short bursts (prolonged bursts will be dealt with separately). For these analyses, short bursts were defined as multiple events occurring within an interval <2 ms and with a total duration of less than 40 ms. SKF81297 had marginal but non-significant effects on the number of isolated single events in the PD (0.082 ± 0.015 in control vs. 0.097 ± 0.021 in SKF81297, *t*<sup>9</sup> = 1.44, *p* = 0.09), the soma (0.089 ± 0.016 in control vs. 0.094 ± 0.019 in SKF81297, *t*<sup>8</sup> = 0.48, *p* = 0.32) and the PA (0.049 ± 0.033 in control vs. 0.062 ± 0.014 in SKF81297, *t*<sup>6</sup> = 2.02, *p* = 0.05). In contrast, the D1/D5 receptor agonist affected various burst metrics as shown in **Table 1**. Specifically, D1/D5 receptor stimulation significantly increased the total number of short bursts but only in the PD (0.034 ± 0.0093 in control vs. 0.053 ± 0.014 in SKF81297, *t*<sup>9</sup> = 6.07, *p* = 0.00009, **Table 1**). It also increased the total number of events that occurred within all the recorded bursts, but again only in the PD (0.097 ± 0.028 in control vs. 0.146 ± 0.033 in SKF81297, *t*<sup>9</sup> = 5.43, *p* = 0.0002, **Table 1**). In contrast, the D1/D5 receptor agonist did not affect the average number of events/burst (2.51 ± 0.48 in control vs. 2.78 ± 0.75 in SKF81297, *t*<sup>9</sup> = 0.95, *p* = 0.09, **Table 1**). Thus the most likely explanation for these results is that SKF81297 enhanced the propensity of the Na<sup>+</sup> channel to open in bursts.

Finally we analyzed the effect of SKF81297 on prolonged bursts. First we analyzed the probability of a channel entering the prolonged burst mode in patches subjected to 50 mV depolarizing steps in control and during D1/D5 agonist application. For each region we calculated the number of prolonged openings for all patches and divided this number by the total number of traces multiplied by the number of channels in each patch. The probability of prolonged bursts was higher during D1/D5 agonist application than in control in the PD (0.000282 in control vs. 0.000783 in SKF81297) and the soma (0.000212 in control vs. 0.001575 in SKF81297) but not the PA (0.000562 in control vs. 0.000631 in SKF81297). The low probability of prolonged bursts prevented us from performing statistical comparisons on these data. To overcome this, we calculated the probability of prolonged openings in control patches and in a separate group of patches that were exposed to the D1/D5 receptor agonist. Each patch was subjected to a series of steps to several membrane potentials, totaling ∼1000 traces for each patch. The probability of prolonged bursts was calculated for each patch. Even though prolonged bursts were many fold more prevalent in the PA than the PD or soma under baseline conditions, SKF81297 increased the mean probability of their occurrence only in the PD (*t*<sup>12</sup> = 6.42, *p* = 0.0003) and the soma (*t*<sup>10</sup> = 2.36, *p* = 0.01) but not the PA (*t*<sup>10</sup> = −0.59, *p* = 0.48) (**Figure 8**). In fact, the D1/D5 receptor agonist brought the prevalence of prolonged bursts in the PD and soma to the level of the PA under baseline conditions (**Figure 8**) and therefore selectively boosted the relative impact of *I*Nap in these regions. 
This tendency to promote prolonged bursting was the most significant effect of the D1/D5 receptor agonist on *I*Nap overall yet is very consistent with the conclusion above, that the drug also increased the propensity of Na<sup>+</sup> channels to open in shorter bursts.

**FIGURE 5 |** [...] (right, red). The resting membrane potential for this patch recorded after break-in is given by the dotted line in the bottom schematic. **(B)** Change in the average ensemble Na<sup>+</sup> current amplitude evoked by a 50 mV voltage step above rest in single patches by SKF81297. Each dot represents the [...] steps to various transmembrane potentials for PD, somatic and PA patches. Each dot represents the normalized peak conductance for a single patch. Lines represent Boltzmann fits under control conditions (blue) and in SKF81297 (red). Average half activation is given in the insets.

Given these findings, it is of interest to consider how the recorded channels might contribute to the whole cell *I*Nap under control conditions and following SKF81297. The total *I*Nap current for each region was estimated based on Na<sup>+</sup> channel kinetics for steps to −20 mV using the following equation:

$$I_{\rm Nap} = N \cdot (P_{\rm oB} + P_{\rm oL}) \cdot i,$$

where *N* is the total number of channels, *P*oB is the open probability for brief openings, *P*oL is the open probability for prolonged burst openings and *i* is the unitary current amplitude. Although we estimated the number of channels in each patch from the actual recordings and used this value as a means to make conclusions about the single channel properties, our experimentally derived values for *N* were not used in the calculation of the whole-cell *I*Nap, since we were not exactly certain of the area of our patches and because the cytoskeletal properties of each region may differently affect the number of channels/patch (Kole et al., 2008). Rather, the determination of *N* was based on the published properties of cortical pyramidal neurons. Assuming the soma of a layer V cortical pyramidal cell is ∼20 µm wide and 25 µm long, it possesses a total surface area of 1099 µm<sup>2</sup> (for a cone). A cylindrical PD ∼5 µm in diameter would have a surface area of 314 µm<sup>2</sup> for a 20 µm length, whereas an axon ∼1.2 µm in diameter would have a surface area of 75 µm<sup>2</sup> for a 20 µm length. Sodium channel density has been estimated to be 5 per µm<sup>2</sup> for the soma and PD (Hu et al., 2009). The estimates of sodium channel density in the initial segment vary from three fold to 40–50 fold higher than that of the soma depending on the methods used (Colbert and Pan, 2002; Kole et al., 2008; Hu et al., 2009; Fleidervish et al., 2010). For our calculations we used a 10 times higher density of sodium channels in the PA compared to the soma, yielding 50 channels per µm<sup>2</sup>. Therefore, we estimate there would be 5495 Na<sup>+</sup> channels at the soma, 1570 Na<sup>+</sup> channels in the first 20 µm of the PD and 3750 Na<sup>+</sup> channels in the first 20 µm of the PA. In our recordings, the average unitary current amplitude at −20 mV across all the patches was 1.55 pA. *P*oL was calculated using the following equation:

$$P_{\rm oL} = \sum_{\rm j} T_{\rm j} \Big/ \sum_{\rm j} \left(t_{\rm j} \cdot n_{\rm j}\right),$$

where *T*<sub>j</sub> is the total time of all prolonged burst openings for the patch *j*, *t*<sub>j</sub> is the total duration of all recorded −20 mV steps for the patch *j* and *n*<sub>j</sub> is the number of channels in patch *j*.
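A pooled estimate of this kind divides the summed burst time by the summed channel-time across patches, so that patches with more channels or longer recordings carry more weight. The sketch below assumes that reading of the equation; the patch values are hypothetical.

```python
def pooled_prolonged_open_probability(patches):
    """Pooled PoL from (burst_open_ms, recorded_ms, n_channels) tuples, one per patch."""
    total_open_ms = sum(open_ms for open_ms, _, _ in patches)
    total_channel_ms = sum(rec_ms * n for _, rec_ms, n in patches)
    return total_open_ms / total_channel_ms


# Hypothetical patches: (time in prolonged bursts, total step time, channels).
example_patches = [
    (120.0, 50_000.0, 6),
    (0.0, 40_000.0, 8),
    (90.0, 60_000.0, 5),
]
p_ol = pooled_prolonged_open_probability(example_patches)
print(f"pooled PoL = {p_ol:.6f}")
```

Pooling in this way keeps patches without any prolonged bursts in the denominator, which matters when the events are as rare as described in the text.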

The sums for each region were calculated across all patches subjected to 50 mV depolarizing steps in control and during D1/D5 receptor agonist application. This gave *P*oL values of 0.00028 for the PD, 0.00021 for the soma and 0.00042 for the PA. Values of *P*oB were 0.00165, 0.00155 and 0.00168 for the PD, soma and PA respectively.

Based on these values, under control conditions the contribution of brief late openings to the total *I*Nap would be ∼4 pA for the PD, 13.2 pA for the soma and 9.8 pA for the PA while the contribution of prolonged burst openings would be ∼0.68 pA for the PD, 1.79 pA for the soma and 2.44 pA for the PA. The combined contribution of brief late openings and prolonged burst openings would be expected to produce a total *I*Nap of ∼4.68 pA for the PD, 14.99 pA for the soma and 12.2 pA for the PA. The total *I*Nap across the three regions would be ∼31.9 pA, a value that is comparable to that obtained previously in acutely dissociated cortical pyramidal cells (see Maurice et al., 2001).
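The channel counts behind these estimates follow from the stated geometry and densities. The sketch below reproduces them; the somatic area is taken as quoted in the text, the dendrite and axon are treated as open cylinders (lateral area only), and areas are rounded to whole µm² before multiplying, which is an assumption made here to match the rounded areas given in the text.

```python
import math


def cylinder_lateral_area_um2(diameter_um, length_um):
    """Lateral surface area of a cylinder (no end caps), in square micrometres."""
    return math.pi * diameter_um * length_um


SOMA_AREA_UM2 = 1099               # as quoted in the text
DENSITY_SOMA_PD = 5                # channels per um^2 (Hu et al., 2009)
DENSITY_PA = 10 * DENSITY_SOMA_PD  # tenfold higher density assumed for the PA

pd_area = round(cylinder_lateral_area_um2(5.0, 20.0))  # about 314 um^2
pa_area = round(cylinder_lateral_area_um2(1.2, 20.0))  # about 75 um^2

channels = {
    "soma": SOMA_AREA_UM2 * DENSITY_SOMA_PD,
    "PD": pd_area * DENSITY_SOMA_PD,
    "PA": pa_area * DENSITY_PA,
}
print(channels)
```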

**TABLE 1 |** [...] For each region, the left column is the percent change from baseline and the right column is the p-value as determined by paired sample t-tests. The degrees of freedom are 9 for the dendrite, 8 for the soma and 6 for the axon. Alpha levels were determined by Holm-Bonferroni correction.

These calculations were repeated using values obtained from the same patches in the presence of SKF81297. The *P*oB values during D1/D5 receptor agonist application were 0.00217, 0.00217 and 0.0016 for the PD, soma and PA respectively, and the calculated *P*oL values were 0.00078, 0.0015 and 0.00047 for the PD, soma and PA respectively. In this case, the *I*Nap produced by brief late openings would now be ∼5.28 pA for the PD, 18.48 pA for the soma and 9.3 pA for the PA, whereas the *I*Nap resulting from prolonged burst openings would be ∼1.9 pA for the PD, 12.8 pA for the soma and 2.73 pA for the PA. The total *I*Nap in SKF81297 would therefore be ∼7.18 pA for the PD, 31.28 pA for the soma and 12 pA for the PA and when combined across the three regions would produce a total *I*Nap of ∼50.46 pA. This represents a ∼60% increase over control. Furthermore, under control conditions prolonged bursts would contribute only ∼15% of total *I*Nap, whereas following SKF81297, the contribution of prolonged bursts would increase to ∼35%.
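The whole-cell estimates can be recomputed from the channel counts, open probabilities and unitary current given in the text. The sketch below does so for both conditions; the data layout and function name are illustrative, and small differences from the quoted totals reflect rounding of the per-region values.

```python
UNITARY_PA = 1.55  # average unitary current at -20 mV
CHANNELS = {"PD": 1570, "soma": 5495, "PA": 3750}

# (PoB, PoL) per region, as given in the text.
P_OPEN = {
    "control": {"PD": (0.00165, 0.00028), "soma": (0.00155, 0.00021), "PA": (0.00168, 0.00042)},
    "SKF81297": {"PD": (0.00217, 0.00078), "soma": (0.00217, 0.0015), "PA": (0.0016, 0.00047)},
}


def inap_components_pa(condition):
    """Return (brief, prolonged) contributions to I_Nap in pA, summed over regions."""
    brief = sum(CHANNELS[r] * P_OPEN[condition][r][0] * UNITARY_PA for r in CHANNELS)
    prolonged = sum(CHANNELS[r] * P_OPEN[condition][r][1] * UNITARY_PA for r in CHANNELS)
    return brief, prolonged


totals = {}
burst_fraction = {}
for condition in P_OPEN:
    brief, prolonged = inap_components_pa(condition)
    totals[condition] = brief + prolonged
    burst_fraction[condition] = prolonged / (brief + prolonged)
    print(f"{condition}: total {totals[condition]:.1f} pA, "
          f"prolonged bursts {100 * burst_fraction[condition]:.0f}%")

increase = totals["SKF81297"] / totals["control"] - 1.0
print(f"increase over control: {100 * increase:.0f}%")
```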

## **DISCUSSION**

The present study investigated the effects of the D1/D5 receptor agonist SKF81297 on single Na<sup>+</sup> channel gating recorded from the PD, soma and PA of deep layer mPFC neurons in acute brain slices. We found that SKF81297 shifted the activation of the early transient channel openings to more negative potentials in all three regions, while increasing the *P*<sup>o</sup> of late openings and increasing prolonged burst probability mainly in the PD and to a lesser extent in the soma. As estimated above, these effects would lead to an increase in the whole-cell *I*Nap.

*I*Nap was first demonstrated in neocortical neurons by Stafstrom et al. (1982, 1985). It was initially thought that a prolonged Na<sup>+</sup> current could be produced by a window current attributable to the overlap between steady-state activation and inactivation (Attwell et al., 1979). Subsequently, *I*Nap has been commonly interpreted to result from brief forays of the fast Na<sup>+</sup> channel into a persistent or "noninactivating" gating mode during as little as 1% or less of all depolarizations (French and Gage, 1985; Patlak and Ortiz, 1985; French et al., 1990; Alzheimer et al., 1993; Taylor, 1993; Astman et al., 2006). It was proposed that in cortical layer V pyramidal cells, *I*Nap was generated primarily by Na<sup>+</sup> channels in the axon (Astman et al., 2006) and was attributed to the presence of Nav 1.6 channels (Caldwell et al., 2000; Hu et al., 2009) which enter the noninactivating gating mode more frequently and produce a significantly larger *I*Nap than Nav1.1–1.2 channels localized in the soma and dendrites (Raman and Bean, 1997; Smith et al., 1998; Goldin, 2001; Rush et al., 2005). However, data obtained from Nav 1.6 knock-out mice revealed that although a large proportion of *I*Nap in layer V PFC cells is attributable to Na<sup>+</sup> channels containing the Nav 1.6 subunit, Na<sup>+</sup> channels with Nav1.1−1.2 subunits also contribute to *I*Nap (Maurice et al., 2001).

In the present study, the early transient current recorded in the PA displayed the same half activation as the early transient current recorded from the soma and PD. Yet Na<sup>+</sup> channels recorded from the PA displayed significantly larger open probabilities than those in the PD and soma. Specifically, under control conditions the average probability of prolonged bursts in the PD and soma was 10 times lower than that of the late single or short burst openings. In contrast, in the PA the average probability of prolonged bursts was comparable to that of the late single or short burst openings. This implies that the prolonged bursts make a far greater contribution to the total *I*Na<sup>+</sup> in the PA. It also suggests that while our recordings were in close proximity to each other, the regions were still functionally segregated in terms of their complement of Na<sup>+</sup> channels.

In studies of Na<sup>+</sup> channel gating in cultured entorhinal layer II neurons, the average conductance of persistent burst openings was higher than that of early openings responsible for the transient Na<sup>+</sup> current (∼20 vs. ∼15 pS) (Magistretti et al., 1999a,b; Magistretti and Alonso, 2006). While we did detect subgroups of Na<sup>+</sup> channels with different conductance levels, we found that channels with a conductance of ∼16 pS could produce persistent openings. Magistretti et al. (2003) showed that single Na<sup>+</sup> channels can exhibit at least three "bursting states" of different mean durations but that each Na<sup>+</sup> channel operates predominantly in a specific gating mode for protracted periods (Magistretti et al., 1999b; Magistretti and Alonso, 2006). These observations raise the perennial question of whether *I*Nap is mediated by differential gating in a common pool of Na<sup>+</sup> channels or whether distinct Na<sup>+</sup> channels are responsible for *I*Nat and *I*Nap. Magistretti et al. (1999b) argued for the possibility of something in between, as a subgroup of transient Na<sup>+</sup> channels may undergo some form of modulation to enter prolonged persistent gating modes. Supporting this contention, Szulczyk et al. (2012) recently showed that activation of D1/D5 dopamine receptors increased the availability of the fast Na<sup>+</sup> current without affecting current amplitude through a cAMP/PCA mechanism in mPFC neurons recorded in cell-attached mode. The present data also support the predictions of Magistretti et al. (1999b). On one hand, the single channel openings themselves were little changed as the single channel amplitude and dwell times in control and SKF81297 conditions were not significantly different. 
In spite of this, D1/D5 receptor activation significantly increased the number of openings as well as the propensity of the Na<sup>+</sup> channels to open in short and especially prolonged bursts in the PD and soma. In fact, prolonged burst probability increased three fold in the PD and nearly seven fold in the soma, which effectively brought the probabilities to the levels observed under baseline conditions in the PA. Thus D1/D5 receptor stimulation created a more uniform *I*Nap in mPFC neurons by equalizing basal differences in burst propensity across the soma, axon and dendrite.

Our estimates of the contribution of late openings of Na<sup>+</sup> channels in the PD, soma and PA to the whole cell *I*Nap show that activation of D1/D5 receptors can lead to a significant increase in the whole cell *I*Nap. Although useful as a means to help contextualize the significance of the single channel data, there are some caveats to these estimates that should be borne in mind. First, our estimates of *I*Nap from the PA are not a reliable indicator of the total *I*Nap produced in the initial segment. As shown by Astman et al. (2006), most of the *I*Nap in cortical pyramidal neurons is generated in the distal portion of the initial segment, well beyond where we recorded. In the proximal region of the axon, Nav 1.2 is dominant, rather than Nav 1.6, which is found more distally (Hu et al., 2009). On the other hand, our estimates of *I*Nap from the soma and PD do not fall prey to this issue since the density of Na<sup>+</sup> channels does not tend to increase as one moves away from the soma into the dendrites. Therefore, if the peak whole cell *I*Nap recorded from the soma is ∼300 pA (Astman et al., 2006), and we estimate that the three proximal compartments collectively generate a ∼30 pA *I*Nap, then the distal initial segment of the axon must generate the remaining 90% of the whole cell *I*Nap. This conclusion is well in line with that of Astman et al. (2006). Hence, in order to attain a comprehensive understanding of dopamine modulation of Na<sup>+</sup> currents in mPFC neurons, similar cell-attached recordings from Na<sup>+</sup> channels in the distal axonal initial segment are still required. A second important point is that the "whole-cell" *I*Nap may not always be the key variable of interest, as *I*Nap generated in unique compartments might independently contribute to different aspects of signal processing. 
While the distal Nav 1.6 channels were proposed to be the main spike triggers, Nav 1.2 channels may primarily aid in spike back propagation from the axon to soma (Hu et al., 2009). Dendritic Na<sup>+</sup> channels might have a completely different function. For example, in an intact brain, overall membrane conductance is expected to be greater during periods of enhanced network activity, making neurons less electrically compact. This will result in a greater attenuation of synaptic potentials approaching the soma and axon along the apical dendrite. This may be one situation where D1/D5 receptor modulation plays a particularly important role, given the dramatic increase in the propensity of dendritic Na<sup>+</sup> channels to burst following SKF81297.
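The compartment arithmetic in the preceding paragraph can be checked with a short sketch. The two current values are the approximate figures quoted above (∼300 pA for the whole cell, ∼30 pA for the three proximal compartments); the function and variable names are illustrative only.

```python
def remaining_fraction(whole_cell_pa, proximal_pa):
    """Fraction of the whole-cell current not accounted for by the proximal compartments."""
    if whole_cell_pa <= 0:
        raise ValueError("whole-cell current must be positive")
    return (whole_cell_pa - proximal_pa) / whole_cell_pa


# Approximate values quoted in the text (pA).
distal_fraction = remaining_fraction(300.0, 30.0)
print(f"Share attributed to the distal initial segment: {distal_fraction:.0%}")
```

The estimate is only as good as the two inputs, both of which are approximate.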

#### **THE EFFECTS OF DOPAMINE ON** *I*Nap **IN THE CONTEXT OF PAST WHOLE-CELL PATCH-CLAMP STUDIES**

While there is a growing consensus that dopamine acting via D1/D5 receptors increases the excitability of deep layer mPFC neurons, the present data shed some light on the sharp disagreement about whether this change in excitability is related to a change in *I*Nap. Initially, Geijo-Barrientos and Pastore (1995) used sharp intracellular recordings in the absence of blockers of other ionic currents to show that dopamine reduced a persistent inward current with properties consistent with *I*Nap. Because other ion channels were not blocked, however, it was difficult to attribute the change directly to *I*Nap modulation. Soon after, Yang and Seamans (1996) used similar recording techniques but found that D1/D5 receptor agonists increased the TTX-sensitive Na<sup>+</sup> plateau potential. A problem with this study was that since sharp somatic electrodes were used, it was impossible to control the voltage of the axo-somato-dendritic region adequately, and although various ion channel blockers were used, the nature of the modulation could not be precisely ascertained. Subsequently, Gorelova and Yang (2000) employed whole-cell patch-clamp recordings in the presence of blockers of most K<sup>+</sup> and Ca<sup>2+</sup> channels. They found that D1/D5 receptor agonists shifted the activation of the whole cell *I*Nap leftward and slowed inactivation. Although much better voltage control could be attained with patch electrodes, it was still impossible to control voltage changes in the tiny axon and dendrites from the somatic electrode. Furthermore, a space clamp error by definition means that there is a difference in the potential from the clamped soma to the more distal neurites and therefore a flow of current. In extended pyramidal neurons under these conditions, that flow of current can resemble *I*Nap (White et al., 1995). Maurice et al. (2001) then attempted to circumvent these issues by performing recordings in acutely dissociated mPFC neurons. 
While they achieved much better voltage control than in past studies, even a length of axon as short as 10 µm can be difficult to control from a somatic pipette (White et al., 1995). In addition, the reported absence of an effect of D1/D5 receptor agonists on *I*Nap could potentially have been the result of a loss/disruption of critical molecules needed for D1/D5 receptor modulation during the enzymatic/mechanical dissociation procedure. While Maurice et al. (2001) provided clear evidence that the D1-PKA pathway was functionally intact and able to modulate the fast Na<sup>+</sup> current in the dissociated neurons, the D1/D5 mediated increase in excitability of intact PFC neurons is thought to be mediated via a PKC and not a PKA mechanism (Chen et al., 2007). A PKC dependent increase in *I*Nap was also reported by Astman et al. (1998) who showed that PKC activation via phorbol esters greatly increased *I*Nap in somatosensory cortical neurons.

Finally, a more recent attempt to address the issue was made by Rotaru et al. (2007). They took a different approach, investigating the D1/D5 receptor modulation of the EPSP amplification that is mediated mainly by *I*Nap (Stuart and Sakmann, 1995). They reported that a D1/D5 receptor agonist reduced the amplification of EPSP waveforms and concluded that this was due to a reduction in *I*Nap. While these authors showed in separate experiments that other currents, including *I*<sup>h</sup>, could impact EPSP amplification, they did not investigate the effects of a D1/D5 receptor agonist on EPSP amplification in the presence of an *I*<sup>h</sup> blocker. Since D1/D5 agonists increase *I*<sup>h</sup> (Rosenkranz and Johnston, 2006), this could potentially explain the apparent reduction in amplification by a D1/D5 agonist. The simultaneous modulation of *I*<sup>h</sup> and *I*Nap by D1/D5 receptor stimulation may be held within a tight balance, and small differences in experimental procedures could conceivably shift the balance and thereby contribute to the differences across past studies.

The present study was designed to circumvent these past issues by using cell-attached recordings and showed that D1/D5 agonists increased *I*Nap mainly by promoting more robust bursting behavior in the PD and soma. Although these recordings were not subject to the issues that affected past studies, we did not record from the distal initial segment, where most of the *I*Nap is generated. Therefore, while the present data are quite clear in terms of how D1/D5 receptor activation modulates Na<sup>+</sup> channels proximal to the soma, general statements about how D1/D5 receptors modulate *I*Nap overall, and under various realistic conditions, await future investigations.

#### **ACKNOWLEDGMENTS**

This research was supported by grants from CIHR.

## **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 July 2014; accepted: 08 January 2015; published online: 12 February 2015*. *Citation: Gorelova N and Seamans JK (2015) Cell-attached single-channel recordings in intact prefrontal cortex pyramidal neurons reveal compartmentalized D1/D5 receptor modulation of the persistent sodium current. Front. Neural Circuits 9:4. doi: 10.3389/fncir.2015.00004*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2015 Gorelova and Seamans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Dopamine control of pyramidal neuron activity in the primary motor cortex via D2 receptors

## *Clément Vitrac<sup>1,2</sup>, Sophie Péron<sup>1,2</sup>, Isabelle Frappé<sup>1,2,3</sup>, Pierre-Olivier Fernagut<sup>4,5</sup>, Mohamed Jaber<sup>1,2,3</sup>, Afsaneh Gaillard<sup>1,2</sup> and Marianne Benoit-Marand<sup>1,2</sup>\**

<sup>1</sup> Laboratoire de Neurosciences Expérimentales et Cliniques, INSERM, U1084, Poitiers, France

<sup>2</sup> Laboratoire de Neurosciences Expérimentales et Cliniques, Université de Poitiers, Poitiers, France

<sup>3</sup> CHU de Poitiers, Poitiers, France

<sup>4</sup> Institut des Maladies Neurodégénératives, UMR 5293, Université de Bordeaux, Bordeaux, France

<sup>5</sup> CNRS, Institut des Maladies Neurodégénératives, UMR 5293, Bordeaux, France

#### *Edited by:*

Allan T. Gulledge, Geisel School of Medicine at Dartmouth, USA

#### *Reviewed by:*

Kuei Y. Tseng, Rosalind Franklin University of Medicine and Science, USA

Vikaas Singh Sohal, University of California at San Francisco, USA

#### *\*Correspondence:*

Marianne Benoit-Marand, Laboratoire de Neurosciences Expérimentales et Cliniques, Université de Poitiers, Bâtiment B36, 1 rue Georges Bonnet BP 633, 86022 Poitiers CEDEX, France

e-mail: marianne.benoit.marand@univ-poitiers.fr

The primary motor cortex (M1) is involved in the control of fine voluntary movements. Previous studies have shown the existence of a dopamine (DA) innervation in M1 of rats and monkeys that could directly modulate M1 neuronal activity. However, none of these studies has described the precise distribution of DA terminals within M1 functional regions or quantified the density of this innervation. Moreover, the precise role of DA in pyramidal neuron activity remains unclear owing to conflicting results from previous studies regarding D2 effects on M1 pyramidal neurons. In this study we assessed in mice the neuroanatomical characteristics of DA innervation in M1 using unbiased stereological quantification of DA transporter-immunostained fibers. We demonstrated for the first time in mice that DA innervates the deep layers of M1, preferentially targeting the forelimb representation area of M1. To address the functional role of the DA innervation in M1 neuronal activity, we performed electrophysiological recordings of single-neuron activity in vivo and pharmacologically modulated D2 receptor activity. Local D2 receptor activation by quinpirole enhanced the spike firing rate of pyramidal neurons without changing the spike firing pattern. Altogether, these results indicate that DA innervation in M1 can increase neuronal activity through D2 receptor activation and suggest a potential contribution to the modulation of fine forelimb movement. Given the demonstrated role for DA in fine motor skill learning in M1, our results suggest that altered D2 modulation of M1 activity may be involved in the pathophysiology of movement disorders associated with disturbed DA homeostasis.

**Keywords: motor cortex, dopamine, mice, unbiased stereology,** *in vivo* **electrophysiology**

#### **INTRODUCTION**

The primary motor cortex (M1) is involved in the control of fine voluntary movements and in novel motor skill learning (Hosp et al., 2011). It integrates inputs from the premotor cortex and drives excitatory outputs to the spinal cord and the basal ganglia via glutamatergic pyramidal neurons. The indirect regulation of motor function by dopamine (DA) through the modulation of basal ganglia activity has been widely described (Alexander et al., 1986; Lang and Lozano, 1998; Murer et al., 2002; Dejean et al., 2012). In addition, neuroanatomical studies have shown the existence of a direct DA innervation from the midbrain to M1 that could directly modulate M1 neuronal activity (Descarries et al., 1987; Gaspar et al., 1991; Raghanti et al., 2008).

Indeed, Gaspar et al. (1991) suggested the presence of such an innervation in the most superficial layers in human M1 using a tyrosine hydroxylase (TH) immunostaining to visualize monoaminergic fibers. In rats, Descarries et al. (1987) showed a dopaminergic innervation in cortical areas such as the cingulate cortex (Cg), or in the deep layers of M1, by using 3H-DA labeling. More recently, Hosp et al. (2011) described in rats direct projections from the ventral tegmental area (VTA) to M1. Although detectable dopaminergic tissue levels can be measured in the motor cortex, this DA innervation remains weak compared with other structures such as the striatum or nucleus accumbens. For instance, Godefroy et al. (1991) showed that DA concentration in the somatomotor cortex is about 50 times lower than in the striatum. However, the functional implication of DA in the motor cortex and other cortical regions, such as the prefrontal and cingulate cortices, has been well documented despite low tissue and extracellular DA levels (Awenowicz and Porter, 2002; Lopez-Avila et al., 2004; Schweimer and Hauber, 2006; Hosp et al., 2009; Molina-Luna et al., 2009). DA acts via five different receptors grouped in two classes, D1-like and D2-like, which differentially modulate adenylyl cyclase (Jaber et al., 1996). In the last three decades, studies using *in situ* hybridization (Camps et al., 1990; Mansour et al., 1990; Gaspar et al., 1995; Santana et al., 2009) showed a wide distribution of the DA receptors in rodents. In the cortex, D1 receptors are localized in layer VI whereas D2 receptors are localized primarily in layer V (Weiner et al., 1991; Gaspar et al., 1995), which contains the principal output pathway to all other cortical areas and to subcortical targets such as the striatum or the pyramidal tract. Taken together, these data suggest that DA receptors could play a direct role in modulating the activity of M1.

Awenowicz and Porter (2002) and Huda et al. (2001) reported *in vivo*, in rats and cats respectively, that DA application decreases pyramidal neuron activity via both D1 and D2 receptors. More recently, Hosp et al. (2009) showed a transient reduction in M1 excitability following the injection of a D2 antagonist, but not a D1 antagonist, in rats *in vivo*. Moreover, specific dopaminergic deafferentation of M1 impairs motor skill learning (Hosp et al., 2011) and is associated with decreased long term potentiation (LTP) that is mimicked by reversible blockade of D2 receptors (Molina-Luna et al., 2009). These data suggest that D2 receptors could potentiate basal activity of M1 neurons. Even though a DA projection was reported in M1, the literature lacks quantification of this innervation. Moreover, functional studies are still conflicting regarding the involvement of D1 receptors in the modulation of M1 neuronal activity, and even though the literature agrees on the involvement of D2 receptors, results diverge as to whether their effect on M1 activity is excitatory or inhibitory. Unfortunately, none of these studies was performed in mice; this is of interest given the substantial number of transgenic mouse models targeting the DA system and often used as models of psychiatric or neurodegenerative disorders.

The aim of this study was to assess the neuroanatomical distribution of DA innervation in M1 in mice, and to evaluate the functional role of this innervation on M1 neuronal activity. To this end, we first characterized anatomically DA fiber density in M1 by using the DA transporter (DAT) as a specific marker of DA terminals. In order to precisely quantify this innervation, we performed an unbiased stereological quantification of DAT labeled fibers in M1. Secondly, since all previous studies consensually point to an involvement of D2 receptors in M1, we have tested the direct influence of DA on M1 neuronal activity through this receptor. For that purpose, we performed electrophysiological recordings of M1 neuronal activity while pharmacologically modulating D2 receptors. Our study indicates that DA innervates M1 in mice and is able to enhance the activity of pyramidal neurons in this structure.

## **MATERIALS AND METHODS**

#### **ANIMALS AND SURGERY**

All experiments were conducted in accordance with the guidelines of the French Agriculture and Forestry Ministry (decree 87849) and of European Union Directive (2010/63/EU). Adequate measures were taken to minimize animal pain as well as the number of animals used. Female C57/BL6 mice (3–6 months at the time of experiments, Janvier, France) were housed in ventilated cages and kept under a 12 h dark/light cycle. Animals had access to food and water *ad libitum*.

Before surgery, mice were deeply anesthetized with urethane (1.8 g/kg) injected intraperitoneally (i.p.) before being secured to a stereotaxic frame (LPC, France) and maintained at 37–38 °C with a heating pad. A mouse brain stereotaxic atlas (Paxinos and Franklin, 2001) was used to guide electrode and pipette placements. Throughout the experiment, the efficiency of anesthesia was determined by examining the tail pinch reflex. Additional urethane (0.25 g/kg, i.p.) was administered when necessary.

#### **ELECTROPHYSIOLOGICAL PROCEDURES**

Electrophysiological single unit activity was recorded in M1 using electrodes pulled from borosilicate glass capillaries (GC 150 F, Harvard Apparatus, England) with a P-97 Flaming/Brown puller (Sutter Instrument, USA). The tip of the electrode was broken to a diameter of 2 μm, and the electrode was filled with a 0.4 M NaCl solution containing 2.5% neurobiotin (Vector Labs, USA). Electrodes had an *in vivo* resistance of 12–20 MΩ. Recording electrodes were lowered in M1 (1.3–1.5 mm lateral and 1.0–1.5 mm anterior to bregma) to a depth of between 0.65 and 1 mm from the brain surface.

Neuronal activity was amplified 10 times, filtered (bandwidth: 300 Hz–10 kHz), and further amplified 100 times (Multiclamp 700-B, Axon Instruments, USA). The signal was digitized (Micro 1401 mk II, Cambridge Electronic Design, England) and acquired on computer using Spike 2 software. Recorded neurons were juxtacellularly labeled with neurobiotin (Vector Labs, USA) as described elsewhere (Pinault, 1996). Briefly, positive 250 ms current pulses were applied at 2 Hz with increasing currents (1–5 nA) until they drove cell firing for at least 5 min. Immediately after the neurobiotin injection, mice were transcardially perfused with 0.9% NaCl and 4% paraformaldehyde (PFA). Brains were collected and post-fixed for 24 h at 4 °C in 4% PFA and cryoprotected overnight in 30% saccharose at 4 °C. Serial coronal sections (40 μm) containing M1 were cut using a cryostat (CM 3050 S, Leica, Germany). To reveal neurobiotin, sections were rinsed three times in 0.1 M phosphate buffer saline (PBS), processed for 1 h with a blocking solution (3% bovine serum albumin (BSA), 0.3% Triton X-100 in PBS) and incubated overnight at 4 °C with Streptavidin Alexa 568 (Invitrogen, USA) diluted 1:800 in PBS containing 3% BSA and 0.3% Triton X-100. Sections were then rinsed three times in PBS before being mounted on gelatin-coated slides, air-dried and coverslipped with DePeX (VWR, USA).

Antidromic stimulation of the striatum ipsilateral to the recording site was performed using a concentric bipolar electrode (SNEX-100, Rhodes Medical Instruments, USA) implanted in the dorsolateral striatum (2 mm lateral and 0.2 mm anterior to the bregma, depth of 1.85 mm from the brain surface). Electrical stimulations (0.5 ms, 600–800 μA) were applied every 5 s using an external stimulator (DS3; Digitimer, England) triggered by a 1401 Plus system (Cambridge Electronic Design, England).

#### **DRUG APPLICATION**

Systemic administration of D2 receptor ligands was performed through an i.p.-implanted needle connected to a syringe filled with either a D2 agonist (quinpirole, 0.5 mg/kg, Sigma, USA), a D2 antagonist (haloperidol, 0.5 mg/kg, Sigma, USA) or 0.9% NaCl. Drug injections were performed after a 30 min baseline recording and electrophysiological activity was monitored for 45 min following the injection.

Local intracortical drug administration was performed using a glass pipette pulled from a glass capillary (GC 100 FS, Harvard Apparatus, England) filled with either quinpirole 100 μM, quinpirole 1 μM or artificial cerebrospinal fluid (ACSF) that was lowered close to the tip of the recording pipette. After a 5 min baseline recording, the drug was applied by air pressure and neuronal firing was monitored for another 15 min.

#### **ANALYSIS OF ELECTROPHYSIOLOGICAL DATA**

The recordings were analyzed offline. Action potential (AP) duration was measured from the time when the AP begins to the time when baseline is recovered. In order to assess the pharmacological modulation of neuronal activity, AP firing rate was analyzed before and after pharmacological treatments over periods of 10 min or 1 min for i.p. and intracortical drug injection, respectively. AP durations, neuron responsiveness to striatal stimulation, and firing frequencies were analyzed using Spike 2 7.0 (Cambridge Electronic Design, England). AP firing patterns were analyzed using NeuroExplorer burst analysis (maximum interval to start a burst = 40 ms, maximum interval to end a burst = 10 ms, minimum interval between bursts = 20 ms, minimum duration of a burst = 5 ms and minimum number of spikes in a burst = 2).
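The burst criteria listed above can be illustrated with a small sketch. It is a simplified interval-based detector written for illustration, not NeuroExplorer's implementation; the function names and the exact order of the merge and filter steps are assumptions. Default thresholds are the values given in the text, converted to seconds.

```python
def detect_bursts(spike_times, max_start=0.040, max_end=0.010,
                  min_gap=0.020, min_duration=0.005, min_spikes=2):
    """Return (first_index, last_index) pairs for bursts in sorted spike times (s)."""
    bursts = []
    i, n = 0, len(spike_times)
    while i < n - 1:
        # A burst starts when an inter-spike interval is short enough.
        if spike_times[i + 1] - spike_times[i] <= max_start:
            j = i + 1
            # Extend the burst while successive intervals stay below max_end.
            while j < n - 1 and spike_times[j + 1] - spike_times[j] <= max_end:
                j += 1
            bursts.append((i, j))
            i = j + 1
        else:
            i += 1

    # Merge bursts separated by less than the minimum inter-burst interval.
    merged = []
    for first, last in bursts:
        if merged and spike_times[first] - spike_times[merged[-1][1]] < min_gap:
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))

    # Discard bursts that are too short or contain too few spikes.
    return [(a, b) for a, b in merged
            if spike_times[b] - spike_times[a] >= min_duration
            and b - a + 1 >= min_spikes]


def percent_spikes_in_bursts(spike_times, **criteria):
    """Percentage of all spikes that fall inside detected bursts."""
    if not spike_times:
        return 0.0
    bursts = detect_bursts(spike_times, **criteria)
    in_burst = sum(b - a + 1 for a, b in bursts)
    return 100.0 * in_burst / len(spike_times)


# Hypothetical spike train (seconds), for illustration only.
example_spikes = [0.0, 0.008, 0.016, 0.5, 1.0, 1.03, 2.0]
example_bursts = detect_bursts(example_spikes)
example_percent = percent_spikes_in_bursts(example_spikes)
```

The percentage of spikes in bursts returned by the second function corresponds to the quantity used later to characterize firing patterns.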

#### **IMMUNOHISTOCHEMICAL PROCEDURES**

Three mice were deeply anesthetized with chloral hydrate (400 mg/kg). They were then perfused transcardially with 0.9% NaCl and 1% PFA. Brains were removed, post-fixed in 1% PFA at 4 °C for 24 h and cryoprotected overnight in 30% saccharose. Brains were serially cut in six sets of coronal sections (40 μm) using a vibrating microtome (MICROM HM 650V, Thermo Scientific, France). Free-floating sections were kept at −20 °C in glucose 0.19%, ethylene glycol 37.5% and sodium azide 0.25% in PBS 0.05 M.

For each brain, one of the six sets of sections was randomly chosen for DAT immunohistochemical processing. Sections were rinsed three times in 0.1 M Tris-buffered saline (TBS), treated with 0.6% H2O2 in TBS for 15 min, rinsed three times in TBS, and incubated for 90 min in blocking solution (10% donkey serum, 0.3% Triton X-100 in TBS). Sections were incubated for 48 h at 4 °C with primary antibody (rabbit anti-DAT, 1:5000, gift from Pr Bertrand Bloch, CNRS UMR5293) diluted in blocking solution. Sections were rinsed three times in TBS and incubated for 1 h in the secondary antibody (donkey anti-rabbit biotin SP, Jackson Immuno Research, USA) diluted 1:500 in TBS containing 5% donkey serum and 0.3% Triton X-100. Sections were rinsed three times in TBS, incubated in 0.5% avidin–biotin complex (Vector Labs, USA) in TBS, rinsed three times in TBS and processed with 3,3′-diaminobenzidine (Sigma, USA) and 0.33% H2O2. Sections were mounted, air-dried, and coverslipped in DePeX (VWR, USA).

#### **STEREOLOGICAL ANALYSIS**

Cingulate cortex was defined anteriorly from 2.58 mm to the bregma to posteriorly −0.82 mm to the bregma, as defined by Paxinos and Franklin (2001). The medial boundaries are defined by the medial line of the brain and the lateral boundaries are defined by the presence of horizontal cortical layers. M1 was defined anteriorly from 1.1 mm bregma to posteriorly −0.94 mm to the bregma from layers I to VI, as defined in a stereotaxic atlas. The relatively narrow layer IV and thick layer V defined the lateral and medial boundaries of M1, and ventral boundaries consisted of the most dorsal part of the corpus callosum. The deep layers of M1 were defined as the most ventral half of M1 (from 500 μm to the surface to the dorsal outline of the corpus callosum), as defined by Lev and White (1997). For the total number of sections containing M1, we sampled every sixth section, starting with a section randomly selected from the first six sections, to generate a set of distributed sections within each sample. After the DAT immunohistochemistry, the average final thickness of the sections was 11.97 ± 0.38 μm (i.e., a shrinkage of ∼70% during processing). The stereological analysis used was described previously by Mouton et al. (2002). Each section was scanned by a camera (Orca-R2, Hamamatsu Electronic, Japan) connected to a microscope (DM 5500, Leica, Germany). Then, virtual sphere probes were scanned on the *Z* axis of M1 and Cg using the Mercator Software (Explora Nova, France). Each sphere was 4 μm radius and contained in a 10 μm × 10 μm square, spacing between each square was 50 μm × 50 μm. Spheres were visualized as a series of concentric circles of changing circumferences upon focusing through the tissue. Finally, the intersections between the outline boundary of the sphere and the fibers were counted at each focal plane. To avoid artifacts due to border effects, upper and lower guard zones of 1 μm were kept for each section. 
The total length of fibers is calculated according to the following equation:

$$L = 2 \cdot \Sigma Q \cdot [v/a] \cdot F_1 \cdot F_2 \cdot F_3$$

where *L* = total length of linear feature (in μm), Σ*Q* = sum of intersections between fibers and spheres, *F*<sub>1</sub> = 1/section sampling fraction (the section sampling fraction being 1/6), *F*<sub>2</sub> = 1/area sampling fraction, *F*<sub>3</sub> = 1/thickness sampling fraction, and *v*/*a* = the ratio of the volume of one sampling box to the surface area of one spherical probe. All values are given as the mean ± SE. Calculated values are corrected for the 70% shrinkage due to section processing.
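As a minimal sketch of how such an estimate is assembled, the code below assumes the spherical-probe estimator takes the form L = 2 · ΣQ · (v/a) · F1 · F2 · F3, where the factor 2 is the standard stereological constant for length estimation from surface intersections (an assumption here). Only the 4 μm probe radius and the 1/6 section sampling fraction come from the text; the intersection count, box volume, and the area and thickness factors are hypothetical placeholders, and the function names are illustrative.

```python
import math


def sphere_surface_area(radius_um):
    """Surface area of one spherical probe, in square micrometers."""
    return 4.0 * math.pi * radius_um ** 2


def total_fiber_length(sum_q, v_over_a, f1, f2, f3):
    """Estimated total fiber length (um), assuming L = 2 * sum_q * (v/a) * F1 * F2 * F3."""
    return 2.0 * sum_q * v_over_a * f1 * f2 * f3


# Probe radius from the text; every other input below is a hypothetical placeholder.
probe_area = sphere_surface_area(4.0)      # um^2
box_volume = 10.0 * 10.0 * 10.0            # um^3, assumed sampling-box volume
length_um = total_fiber_length(
    sum_q=100,                             # hypothetical intersection count
    v_over_a=box_volume / probe_area,
    f1=6.0,                                # 1 / (1/6) section sampling fraction
    f2=25.0,                               # hypothetical 1 / area sampling fraction
    f3=1.5,                                # hypothetical 1 / thickness sampling fraction
)
print(f"Estimated length: {length_um / 1e6:.3f} m")
```

Because each sampling fraction enters as a reciprocal, sparser sampling scales the raw intersection count up by a correspondingly larger factor.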

#### **DETERMINATION OF THE DOPAMINERGIC FIBERS DISTRIBUTION WITHIN M1**

To determine the rostrocaudal and mediolateral extent of dopaminergic fibers within M1, photomicrographs of sections that previously underwent stereological analysis were used to determine the surface area occupied by DAT labeled fibers. On each section, the results were plotted as the occupied surface in μm<sup>2</sup> relative to the anteroposterior axis. Measures were performed using ImageJ 1.47v.

#### **STATISTICAL ANALYSIS**

Statistical analyses were performed using the Mann–Whitney test for independent data, and a two-way ANOVA with Bonferroni post-tests when comparing drug effects over time.

### **RESULTS**

#### **ANATOMICAL DISTRIBUTION OF THE DOPAMINERGIC TERMINALS IN M1**

DA fibers were labeled using DAT immunostaining in order to visualize the dopaminergic innervation in M1 (**Figures 1A,B,D**) and Cg (**Figures 1C,E**). Dopaminergic fibers were present in the deep layers of M1. In M1 and Cg, these fibers were long, tortuous and thin with tangles and branches. Stereology was used to precisely evaluate the extent of this innervation.

The mean total length of dopaminergic fibers was 1.89 ± 0.22 m in M1 and 3.64 ± 0.56 m in Cg. The dopaminergic innervation density, calculated as the total fiber length divided by the volume of the structure, was 0.54 ± 0.01 m/mm<sup>3</sup> in M1 and 2.18 ± 0.20 m/mm<sup>3</sup> in Cg. Thus, according to this stereological approach, DA innervation is 4.4 times higher in Cg than in M1. However, since the dopaminergic fibers in M1 were found mostly in the deep layers (**Figure 1D**), we performed a stereological quantification of the dopaminergic innervation in the deep layers of M1 defined as the deepest half of M1 (**Figure 1B**). Total dopaminergic fiber length in the deep layers of M1 was 1.39 ± 0.06 m. This length is not statistically different from the total length of dopaminergic fibers found in the entire volume of M1 (*p* = 0.097), confirming our initial observation that dopaminergic terminals in this structure are mostly restricted to the deep cortical layers. The density of DA terminals in the deep layers of M1 was then estimated at 1.38 ± 0.17 m/mm<sup>3</sup>. Therefore, when restricting the analysis to the specific region innervated by DA in M1, the dopaminergic innervation density is of the same order of magnitude as in Cg.
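As a rough consistency check on the figures reported above, the sketch below recomputes two ratios from the group means: the share of total M1 fiber length found in the deep layers, and the deep-layer M1 density relative to Cg. It uses only the means quoted in the text; ratios of group means need not equal means of per-animal ratios, so these values are indicative only. Variable names are illustrative.

```python
# Group means quoted in the text (fiber length in m, density in m/mm^3).
length_m1_total = 1.89
length_m1_deep = 1.39
density_m1_deep = 1.38
density_cg = 2.18

# Share of the total M1 fiber length located in the deep layers.
deep_share = length_m1_deep / length_m1_total

# Deep-layer M1 innervation density expressed relative to Cg.
deep_vs_cg = density_m1_deep / density_cg

print(f"Deep-layer share of M1 fiber length: {deep_share:.1%}")
print(f"Deep-layer M1 density / Cg density: {deep_vs_cg:.2f}")
```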

To further characterize the neuroanatomical distribution of dopaminergic innervation, we measured the distribution of DA fibers within M1. Differences appeared in the rostrocaudal distribution of DA fibers. Indeed, the area innervated by DA fibers is maximal between 0.2 and 1.10 mm anterior to the bregma (**Figure 1F**). Furthermore, regarding the mediolateral distribution of dopaminergic fibers in M1 (**Figure 1G**), we observed that only this area, which corresponds to the forelimb representation area (Tennant et al., 2011), is innervated over the whole mediolateral extent of the structure.

Altogether, these data show that DA innervates the deep layers of mouse M1 with a rostrocaudal gradient. The density of this innervation in M1 deep layers is comparable to that of Cg. It has been well described that DA could modulate Cg neuronal activity (Lopez-Avila et al., 2004; Schweimer and Hauber, 2006). Thus, our results further suggest that the density of DA innervation in M1 deep layers could be sufficient to significantly impact neuronal activity.

#### **ELECTROPHYSIOLOGICAL CHARACTERISTICS OF RECORDED NEURONS**

We addressed the functional role of D2 receptors in M1 neuronal activity by electrophysiological single unit recordings in anesthetized mice (**Figure 2A**). Ninety-seven neurons in 56 mice were recorded in deep layers (**Figure 2B**). In order to investigate D2 effects on M1 output neurons, we focused our experiments on pyramidal neurons, although local-circuit inhibitory neurons are also present (Markram et al., 2004). Previous studies have established the electrophysiological characteristics of pyramidal neurons in rat prefrontal cortex (PFC). Pyramidal neurons exhibit low firing frequencies (between 0.1 and 5 Hz; Hajos et al., 2003) and AP durations above 0.95 ms (Mallet et al., 2005; Tseng et al., 2006). We analyzed these physiological characteristics in the 97 neurons recorded in this study; however, in our conditions, no clear bi-modal distribution emerged from this analysis that would have allowed us to discriminate between cortical neuronal populations (inhibitory interneurons and excitatory pyramidal neurons; **Figure 2C**). Regarding firing patterns, we found that 83 neurons presented doublets or triplets (**Figure 2A**) and a bursty discharge pattern (34.47 ± 2.44% of spikes in burst). In order to determine an inclusion criterion specific to our experimental conditions, we analyzed the electrophysiological characteristics of neurons identified as projection neurons by their antidromic response to the stimulation of the ipsilateral striatum (**Figure 3A**). Neurons that presented antidromic responses were considered as pyramidal. We recorded nine antidromically responding neurons and four neurons that did not respond to the striatal stimulation. Responsive and non-responsive neurons were statistically different regarding their firing pattern (*p* < 0.01). 
Indeed, all neurons responding to the antidromic stimulation presented at least 25% of their spikes in bursts (ranging from 25 to 68%) whereas the non-responding neurons presented at most 8.8% of their spikes in bursts (ranging from 0 to 8.8%; **Figure 3B**). Thus, in our experimental conditions, the percentage of spikes in bursts is the best electrophysiological characteristic for identifying a neuron as pyramidal. Using this characteristic as a criterion, 30 neurons presenting at least 15% spikes in burst were included in the study and referred to as "putative pyramidal neurons".

potential shape (averaged over 5 min recording), the action potential duration is measured between the two dashed lines. **(B)** Schematic representation of the distribution of recorded neurons in M1 1.4 mm anterior to Bregma, neurobiotin-labeled neurons (red dots) and non-labeled neurons (black dots). Photomicrograph shows a representative example of a neurobiotin-labeled neuron. Scale bar represents 20 μm. **(C)** Distribution of the mean frequency (Hz) and AP duration (ms).
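The inclusion criterion described above can be illustrated with a minimal Python sketch. The 15% threshold is taken from the text; the burst definition used here (consecutive spikes separated by at most `max_isi_s`) and its 10 ms default are assumptions made only for illustration, since this section does not restate how bursts were detected. The function names are hypothetical.

```python
def percent_spikes_in_bursts(spike_times_s, max_isi_s=0.010):
    """Return the percentage of spikes that belong to a burst.

    ASSUMPTION: a burst is any run of >= 2 spikes whose inter-spike
    intervals are all <= max_isi_s. The 10 ms default is illustrative,
    not the criterion used in the study.
    """
    n = len(spike_times_s)
    if n == 0:
        return 0.0
    times = sorted(spike_times_s)
    in_burst = [False] * n
    for i in range(1, n):
        # A short interval marks both neighbouring spikes as burst spikes.
        if times[i] - times[i - 1] <= max_isi_s:
            in_burst[i - 1] = True
            in_burst[i] = True
    return 100.0 * sum(in_burst) / n


def is_putative_pyramidal(burst_percent, threshold_percent=15.0):
    """Apply the 15% spikes-in-burst inclusion threshold from the text."""
    return burst_percent >= threshold_percent


# Hypothetical spike train: one triplet, one doublet, five isolated spikes.
example_train = [0.0, 0.005, 0.010, 0.5, 1.0, 1.5, 2.0, 2.004, 3.0, 4.0]
example_percent = percent_spikes_in_bursts(example_train)
example_included = is_putative_pyramidal(example_percent)
```

In this hypothetical train, five of ten spikes fall in bursts, so the unit would pass the 15% threshold; a perfectly regular train would not.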

#### **EFFECTS OF DOPAMINE D2 RECEPTOR AGONIST AND ANTAGONIST ON PUTATIVE PYRAMIDAL NEURON ACTIVITY IN M1** *In Vivo*

To study the effects of DA on M1 neuronal activity, we recorded the AP firing rate of putative pyramidal neurons in the deep layers of M1 and their response to the D2 agonist quinpirole or the D2 antagonist haloperidol. We first performed intraperitoneal (i.p.) injections of quinpirole (0.5 mg/kg; *n* = 5), haloperidol (0.5 mg/kg; *n* = 5) or 0.9% saline (*n* = 5; **Figure 4**). D2 receptor activation by quinpirole raised the firing rate of putative pyramidal neurons to more than 200% of its baseline value (from 1.46 ± 0.39 Hz to 3.44 ± 0.81 Hz, two-way ANOVA *F*(2,60) = 15.11, *p* < 0.001). There was no statistically significant effect of D2 receptor blockade by haloperidol on AP firing rate.
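As a quick arithmetic check on the mean firing rates reported above for systemic quinpirole (1.46 Hz before, 3.44 Hz after), the post-injection rate expressed relative to baseline and the corresponding relative increase work out as follows. Only the two group means are used, so this is a rough illustration and not a reanalysis of the data.

$$
\frac{3.44\ \text{Hz}}{1.46\ \text{Hz}} \approx 2.36 \quad (\approx 236\%\ \text{of baseline}),
\qquad
\frac{3.44 - 1.46}{1.46} \approx 1.36 \quad (\approx 136\%\ \text{increase}).
$$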

These effects could be due to a network effect, particularly via the basal ganglia. To avoid the indirect network effects of DA and address the direct effect of D2 activation on M1 activity, we performed intracortical injections of quinpirole 100 μM, quinpirole 1 μM or ACSF (**Figures 5A,B**). Because i.p. injections of haloperidol produced no significant modifications, we did not test the pyramidal neuron responses to intracortical injections of the D2 antagonist. Consistent with the results obtained after i.p. injections, local D2 receptor activation by quinpirole (100 or 1 μM) enhanced the firing rate of putative pyramidal neurons (respectively: two-way ANOVA *F*(4,28) = 5.24, *p* < 0.001; two-way ANOVA *F*(4,36) = 3.98, *p* < 0.01). Quinpirole (1 μM) also increased spike firing rates from 1.53 ± 0.44 Hz to 2.47 ± 0.62 Hz (**Figure 5C**). Furthermore, analysis of the neuronal AP firing pattern revealed that the number of bursts, but not the percentage of spikes in burst, was increased by D2 receptor activation (data not shown). These results indicate that DA can enhance pyramidal neuron firing rates, but does not modulate firing patterns. Taken together, these results show that DA exerts a direct role on M1 neuronal activity by enhancing neuronal firing rate via D2 receptors.

#### **DISCUSSION**

In this study, we demonstrated for the first time in mice that DA innervates the deep layers of M1. We also established that these fibers preferentially target the forelimb representation area of M1. To address the functional role of DA on M1 neuronal activity, we performed electrophysiological recordings of single neuron activity *in vivo* and pharmacologically modulated D2 receptors. We demonstrated that D2 receptor activation by quinpirole enhanced pyramidal neuron spike firing rates. Our results also show that this increase was not due to an extracortical network effect, but was locally mediated in M1.

#### **ANATOMICAL CHARACTERIZATION OF DA INNERVATION OF M1 IN MICE**

Although TH immunolabeling is commonly used to reveal dopaminergic fibers (Gaspar et al., 1991; Busceti et al., 2008), TH is an enzyme common to the synthesis of all catecholamines, and as such does not allow one to distinguish between adrenergic and dopaminergic fibers. Thus, to specifically target dopaminergic fibers, we used a DAT antibody. DAT distribution has already been shown to be restricted to dopaminergic regions (Ciliax et al., 1995). Our results in mice showing the existence of a dopaminergic innervation of M1 are in accordance with previous studies conducted in different species including rat (Descarries et al., 1987), monkey (Raghanti et al., 2008) and human (Gaspar et al., 1991; Raghanti et al., 2008). Moreover, this study provides for the first time a precise and direct quantification of this innervation in M1 and Cg using an unbiased stereological approach. This quantification allowed us to precisely detail the distribution of DA fibers at different levels of M1. Our data complement previous observations by showing that the density of dopaminergic innervation is similar in the deep layers of M1 and in Cg. The functional significance of DA in Cg has been well established (Lopez-Avila et al., 2004). Previous studies showing the existence of D1 and D2 receptors in M1 (Camps et al., 1990; Mansour et al., 1990; Gaspar et al., 1995; Santana et al., 2009), together with our present results, provide anatomical evidence suggesting that DA can exert a direct influence on M1 neuronal activity.

**FIGURE 3 | Electrophysiological characteristics of antidromically identified neurons. (A)** Representative electrophysiological recording trace of a cortical neuron responding to the striatal stimulation by an antidromic spike (left). The occurrence of a spontaneous AP just before the stimulation collides with the antidromic spike, resulting in the absence of the antidromic response after the stimulation (right). **(B)** Neurons were divided into two groups according to whether they responded (black dots) or not (white dots) to the striatal stimulation. The graphs show the individual data (large dots) as well as the mean ± SEM of electrophysiological characteristics: mean frequency (Hz), AP duration (ms) and percentage of spikes included in a burst (%). \*\*p < 0.01.

#### **DA MODULATION OF M1 NEURONAL ACTIVITY** *IN VIVO*

We investigated the hypothesis that DA directly modulates M1 activity using single unit electrophysiological recordings in anesthetized mice and showed that DA has a direct influence on putative pyramidal neuron activity in M1. In our experiments, D2 receptor activation increased neuronal spike firing rate by enhancing the number of spikes, but not the percentage of spikes in bursts. Our results are consistent with a previous study showing in rats that a local injection of haloperidol induced an increase of motor threshold and a reduced size of motor maps, suggesting an excitatory role of D2 receptor activation in M1 (Hosp et al., 2009).

Awenowicz and Porter (2002) previously reported the involvement of the two types of DA receptors in a synergistic manner in rat motor cortex. Their study showed a global inhibitory effect on pyramidal neuron activity following iontophoretic DA (0.1 M) administration. The discordance between their results and ours could be explained by the difference in the local injection procedure (iontophoresis versus pressure ejection). Although their study showed a DA effect on M1 electrophysiological activity, one must consider the possible perturbations in neuronal activity induced by iontophoretic injection. Indeed, it was recently shown that high current injections near neurons can lead to decreased neuronal firing rates (Moore et al., 2011).

Our results showing enhanced putative pyramidal neuron activity after D2 receptor activation are consistent with the finding that quinpirole acting on D2 receptors increases the excitability of layer V pyramidal neurons in the PFC of adult mice (Gee et al., 2012). This study, performed in brain slices, demonstrated an excitatory effect of D2 receptor activation on PFC pyramidal neurons by the induction of a calcium-channel-dependent after-depolarization.

However, other scenarios might also contribute to the effects of D2 agonists on motor cortex excitability. On one hand, DA effects on putative pyramidal neuron activity might be local, but indirect via the modulation of cortical inhibitory interneurons. Indeed, in primate PFC, DA axons establish direct contacts with interneurons expressing parvalbumin (Sesack et al., 1998). More recently, Santana et al. (2009) reported that inhibitory interneurons in rat PFC express D1 and D2 receptors. Moreover, electrophysiological studies from mouse and rat PFC slices suggest that D2 receptor activation inhibits GABA interneurons (Xu and Yao, 2010), resulting in a decreased GABA release probability and a reduction of inhibitory postsynaptic currents (Seamans et al., 2001). Although these studies were conducted in prepubertal animals, they suggest that D2 receptor agonists could decrease the activity of inhibitory interneurons, thus indirectly enhancing pyramidal neuron activity.

On the other hand, DA effects observed in this study might be exerted directly on pyramidal neurons. Indeed, a recent study in PFC showed that pyramidal neurons in rats express the D2 receptor mRNA (Santana et al., 2009). Thus, DA may directly enhance pyramidal neuron activity by activating D2 receptors.

Additionally, our pharmacological data cannot rule out an effect of D2 agonists on D2 autoreceptors located on dopaminergic terminals. The presynaptic modulation of DA release by D2 agonists might induce postsynaptic D1 as well as D2 receptor modulation. However, in our conditions, since the D2 agonist would directly stimulate the postsynaptic D2 receptors, the presynaptic inhibition of DA release would mainly result in a decrease of D1 receptor stimulation.

#### **FUNCTIONAL AND PATHOLOGICAL CONSIDERATIONS**

Finally, it is interesting to note that our study shows that DA innervation in mouse M1 specifically targets an area that corresponds to the forelimb representation (Tennant et al., 2011). DA in motor cortex is known to regulate novel motor skill learning (Molina-Luna et al., 2009; Hosp et al., 2011). Furthermore, recent studies in rats showed that unilateral disruption of DA projections to M1 leads to a reduction of the forelimb representation map associated with a reduction of intracortical microstimulation-induced distal forelimb movements (Viaro et al., 2011) and impairs motor skill learning (Molina-Luna et al., 2009; Hosp et al., 2011). Thus, these studies suggest a potential role of DA in the modulation of forelimb representation in M1. Considering pathological conditions, patients with *de novo* Parkinson's disease (PD), a neurodegenerative disorder caused mainly by disruption of the DA nigrostriatal pathway, show abnormally high grip force in a precision lifting task (Fellows and Noth, 2004). Moreover, Gaspar et al. (1991) have shown that PD patients have altered dopaminergic innervation of motor cortex. Disruption of fine motor skills may involve the degeneration of dopaminergic terminals in M1. Taken together, these results suggest a role for DA in fine motor skill control of the forelimb. Interestingly, studies on human M1 also reported that LTP cannot be induced in PD patients (Morgante et al., 2006) as long as they are off dopaminergic medication (Huang et al., 2011). Furthermore, Morgante et al. (2006) indicated that abnormal motor cortex plasticity may underlie the development of L-DOPA induced dyskinesia in PD patients. These results suggest that DA could be a key component in M1 plasticity.

### **CONCLUSION**

In conclusion, our study provides for the first time a precise description of the dopaminergic projections to M1 in mice, with a stereological quantification of DA innervation density and fiber distribution within M1. In addition, we show that local application of a D2 agonist increases the firing activity of putative pyramidal neurons. The exact mechanisms of this modulation remain to be elucidated and the role of D1 receptors has yet to be considered. Nevertheless, these results constitute a new step towards understanding the mechanisms by which DA modulates M1 activity and suggest that altered local D2 modulation may be involved in pathophysiological conditions associated with disturbed DA homeostasis.

#### **ACKNOWLEDGMENT**

This work was funded by grants from the Fondation de France, FEDER No. 33552 and the CPER 5.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 03 December 2013; accepted: 10 February 2014; published online: 28 February 2014.*

*Citation: Vitrac C, Péron S, Frappé I, Fernagut P-O, Jaber M, Gaillard A and Benoit-Marand M (2014) Dopamine control of pyramidal neuron activity in the primary motor cortex via D2 receptors. Front. Neural Circuits 8:13. doi: 10.3389/fncir.2014.00013 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Vitrac, Péron, Frappé, Fernagut, Jaber, Gaillard and Benoit-Marand. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Subcircuit-specific neuromodulation in the prefrontal cortex

## **Nikolai Dembrow \* and Daniel Johnston**

Center for Learning and Memory, The University of Texas at Austin, Austin, TX, USA

#### **Edited by:**

Allan T. Gulledge, Geisel School of Medicine at Dartmouth, USA

#### **Reviewed by:**

Rodrigo Andrade, Wayne State University School of Medicine, USA Yasuo Kawaguchi, National Institute for Physiological Sciences, Japan

#### **\*Correspondence:**

Nikolai Dembrow, Center for Learning and Memory, The University of Texas at Austin, 1 University Station Stop C7000, Austin, TX 78712-0805, USA e-mail: nikolai@mail.clm.utexas.edu

During goal-directed behavior, the prefrontal cortex (PFC) exerts top-down control over numerous cortical and subcortical regions. PFC dysfunction has been linked to many disorders that involve deficits in cognitive performance, attention, motivation, and/or impulse control. A common theme among these disorders is that neuromodulation of the PFC is disrupted. Anatomically, the PFC is reciprocally connected with virtually all neuromodulatory centers. Recent studies of PFC neurons, both in vivo and ex vivo, have found that subpopulations of prefrontal projection neurons can be segregated into distinct subcircuits based on their long-range projection targets. These subpopulations differ in their connectivity, intrinsic properties, and responses to neuromodulators. In this review we outline the evidence for subcircuit-specific neuromodulation in the PFC, and describe some of the functional consequences of selective neuromodulation on behavioral states during goal-directed behavior.

**Keywords: neuromodulation, projection neurons, prefrontal cortex**

#### **INTRODUCTION**

The prefrontal cortex (PFC) guides experience-driven, goal-directed behavior. Hallmarks of PFC damage include incapacity to suppress impulsive responses and inability to switch strategies when a previously learned rule is no longer successful (Milner, 1963; Shallice and Burgess, 1991; Aron et al., 2004). Similar deficits are observed in non-human primates performing rule-guided tasks after the PFC is lesioned or inactivated (Brozoski et al., 1979; Dias et al., 1996). Although rodents do not exhibit goal-directed behaviors as sophisticated as those observed in primates, disrupting functionally analogous regions of the rodent PFC impairs performance in a variety of tasks designed to test executive function: temporal control (Risterucci et al., 2003; Narayanan and Laubach, 2006; Narayanan et al., 2013), attention (Broersen and Uylings, 1999; Chudasama et al., 2005; Kahn et al., 2012), working memory (Floresco et al., 1997; Dias and Aggleton, 2000; Lee and Kesner, 2003), and strategy shifting (Ragozzino et al., 1999a,b, 2003; Rich and Shapiro, 2007, 2009). Different components of PFC function may be mediated by different PFC subregions (well reviewed in Robbins, 1996; Uylings et al., 2003; Kesner and Churchwell, 2011). Elucidating the precise cellular constituents and mechanism(s) underlying PFC function, and how it exerts top-down control over other brain regions, remains an important area of exploration.

One critical component of PFC function is the contribution of neuromodulatory inputs. How neuromodulation contributes to the executive control of goal-directed behavior has been largely examined on two separate levels: actions of neuromodulators on generic neurons and/or synapses within the PFC, and the effects of neuromodulators on network activity in conjunction with behavioral performance. The goal of this review is to begin to bridge these two levels of analysis by detailing recent advances in mapping out connectivity, neuromodulatory responses and the intrinsic properties of different classes of projection neurons in the rodent PFC.

#### **NEUROMODULATION AND THE PREFRONTAL CORTEX**

The efficacy with which the PFC drives behavior is highly sensitive to the actions of neuromodulators. The best studied among these are noradrenaline (NA), acetylcholine (ACh), serotonin (5-HT), and dopamine (DA). Other neuromodulators (histamine, adenosine, and many neuropeptides) can also alter PFC function, but for the purposes of this mini review we will focus on these four. The primary source of neuromodulators in the PFC is terminals originating from subcortical neuromodulatory systems (**Figure 1A**). Infusing neuromodulators or their receptor agonists/antagonists directly into the PFC changes behavioral performance (Febvret et al., 1991; Broersen et al., 1995; Ragozzino and Kesner, 1998; Mao et al., 1999; Wall et al., 2001; Winstanley et al., 2003; Bang and Commons, 2012; Yang et al., 2013). Optimal PFC function occurs within a tight range of neuromodulatory action: both too little and too much of a given neuromodulator will impair task performance (Broersen et al., 1995; Zahrt et al., 1997; Ragozzino and Kesner, 1998; Mao et al., 1999; Granon et al., 2000; Wall et al., 2001; Winstanley et al., 2003; Vijayraghavan et al., 2007; Wang et al., 2007; Yang et al., 2013).

Anatomically, the PFC is reciprocally connected with these neuromodulatory centers (**Figure 1A**). While none of the neuromodulatory centers exclusively targets the PFC, there is a topographical organization to these outputs (Berger et al., 1991; Bang et al., 2012; Zaborszky et al., 2013). For example, individual LC inputs, but not BF inputs, preferentially target either the ventral mPFC or dorsal mPFC (Chandler and Waterhouse, 2012). The PFC neuromodulatory inputs may be specialized in some cases. The PFC is one of few cortical regions that receive input from both the medial and dorsal portions of the RN (Bang et al., 2012). Similarly, dopaminergic fibers in the rodent PFC originate from both the VTA and SN (Berger et al., 1991). The density of cholinergic fibers, and of acetylcholinesterase (the enzyme responsible for removing extracellular acetylcholine), is highest in the mPFC, suggesting that cholinergic input is particularly tightly controlled there (Werd et al., 2010; Zaborszky et al., 2012). It is important to note that in addition to the neuromodulatory substance each center produces, some of their projections can also contain fast inhibitory (GABAergic) and/or excitatory (glutamatergic) transmitters (Febvret et al., 1991; Hur and Zaborszky, 2005; Bang and Commons, 2012; Chandler and Waterhouse, 2012). Thus, neuromodulatory centers may act on the PFC over multiple time scales.

In addition to receiving input from subcortical neuromodulatory systems, glutamatergic outputs from PFC selectively target specific neuron populations within each neuromodulatory center. In the VTA, prefrontal inputs synapse upon the dopaminergic neurons that project back to the PFC, but not upon those projecting to the accumbens. Conversely, prefrontal inputs synapse onto GABAergic neurons projecting to nucleus accumbens, but not those projecting to the PFC (Carr and Sesack, 2000). Prefrontal inputs to LC synapse onto the dendrites of noradrenergic neurons in the peri-LC region (Luppi et al., 1995). In the dorsal RN, prefrontal inputs synapse primarily onto GABAergic interneurons, although they also synapse onto serotonergic neurons (Jankowski and Sesack, 2004; Commons et al., 2005). Similarly, prefrontal projections to the BF synapse onto inhibitory parvalbumin-positive interneurons, but not cholinergic projection neurons, in the horizontal limb of the BF (Zaborszky et al., 1997). Consistent with selective innervation of neuromodulatory centers, PFC stimulation promotes burst firing in VTA and increased activity in LC, but inhibits firing in dorsal raphe nuclei (DRN) and BF (Overton et al., 1996; Tong et al., 1996; Jodo and Aston-Jones, 1997; Jodo et al., 1998; Celada et al., 2001). As such, the PFC is able to regulate its own neuromodulatory input by driving or inhibiting subcortical centers.

In addition to regulating its own neuromodulatory input, the PFC may also alter the output of neuromodulatory centers to other brain areas. This provides an interesting means by which the PFC might exert a more global "top-down" control of behavior. A small population of PFC neurons may be responsible for this output, as individual neurons within the PFC innervate more than one neuromodulatory center. For instance, a small population of PFC neurons project to both the RN and the VTA (Gabbott et al., 2005; Vázquez-Borsetti et al., 2009, 2011). Similarly, a subset of PFC neurons project to both RN and LC (Lee et al., 2005). The extent to which these projections represent a means to exert top-down control over other brain regions represents an exciting area of exploration for future studies.

## **PFC PROJECTION NEURONS**

By using optogenetic stimulation in vivo, several studies have demonstrated that the PFC can alter behavior. In one important study, Warden et al. tested the effects of optogenetically driving the PFC during a forced-swim test (Warden et al., 2012). Driving PFC output to the DRN promoted active escape, while driving PFC output to the lateral habenula inhibited escape behavior. These results suggest that different subsets of PFC output neurons drive distinct, even mutually antagonistic, behaviors. Other groups have shown that PFC output to the amygdala, striatum, and DRN shifts behavioral output (Challis et al., 2014; Vialou et al., 2014). But what is the identity of these output neurons, and what electrophysiological properties and connectivity patterns do they exhibit?

To better understand how the PFC *exerts top-down control* over downstream targets, it is useful to identify and characterize the neurons that provide output from the PFC. Most of this work has been done in the rodent medial prefrontal cortex. Cytoarchitectonically, the rodent PFC differs from the primate PFC in that it is agranular cortex, meaning that it lacks a granule cell layer 4. Despite this, supragranular pyramidal neurons (in layers 2–3) can be demarcated from infragranular pyramidal neurons (in layers 5–6) by a band of thalamocortical fibers in deep layer 3 (Kubota et al., 2007; Cruikshank et al., 2012; Hirai et al., 2012).

Output neurons of the PFC are broadly divided into two categories: (1) pyramidal tract, or PT neurons, and (2) intratelencephalic, or IT neurons (Molnár and Cheung, 2006; Shepherd, 2013). PT neurons project subcortically via the pyramidal tracts projecting to ipsilateral striatum, thalamus, and/or brainstem. PT neurons are located within the infragranular layers. Unlike motor and sensory cortex, both PT and IT L5 neurons in the PFC are distributed throughout L5A and L5B (Dembrow et al., 2010; Hirai et al., 2012; Ueta et al., 2013, but see Cowan and Wilson, 1994). IT neurons are present in both supragranular and infragranular layers of PFC. They make long-range projections to ipsilateral perirhinal cortex, amygdala and striatum, as well as to the contralateral striatum and cortex (Gabbott et al., 2005; Hirai et al., 2012). The IT and PT categories express disparate transcription factors during development that guide their different long-range projections (Molyneaux et al., 2007, 2009; Fame et al., 2011). Recently, it has become evident that L5 PT and IT neurons within rodent PFC possess distinct intrinsic properties, local connectivity, and long-range inputs. Although most of these differences have been characterized in rodents, different categories of PFC pyramidal neurons are also present in humans and non-human primates (Foehring et al., 1991; Tasker et al., 1996; Chang and Luebke, 2007). PT and IT neuron categories can be further subdivided into groups based on gene expression, specific projection targets and laminar distribution. IT neurons are particularly diverse (Molyneaux et al., 2009). PT neurons project to the thalamus or spinal cord depending upon whether they are in L5A or 5B, respectively (Hirai et al., 2012; Ueta et al., 2013).

PT and IT neurons are connected within the PFC differently (Schematic **Figure 1B**). Most of this work has been done by Kawaguchi and colleagues in the cortical subregion immediately dorsal to, or within, the most dorsal part of mPFC. L2/3 IT and L5 IT neurons receive inputs from other IT neurons, but very infrequently from PT neurons (Morishima and Kawaguchi, 2006). In contrast, PT neurons receive inputs from both L2/3 and L5 IT neurons, as well as from other PT neurons. PT neurons exhibit higher rates of reciprocal connections (where two PT neurons mutually excite one another) than do IT neurons (Morishima and Kawaguchi, 2006; Morishima et al., 2011). Paired recordings of PT-like and IT-like neurons (categorized by their morphology) suggest that PT to PT connections display more synaptic augmentation (Wang et al., 2006). Such synaptic specializations may underlie the robustness of behavior-dependent persistent activity of neurons in the PFC, as compared with other cortical areas (Hempel et al., 2000; Wang et al., 2006, 2008). PT and IT neurons receive different inhibitory inputs from local interneurons as well. PT and IT neurons seem to be equivalently connected to fast spiking interneurons (Otsuka and Kawaguchi, 2013); however, PT neurons receive stronger inhibition from parvalbumin-positive fast spiking interneurons (Lee et al., 2014). Therefore, PT neurons may represent a final convergence point for numerous local excitatory and inhibitory synaptic inputs.

Equally important to the connections they make and receive, PT and IT neurons exhibit subpopulation-specific intrinsic electrophysiological properties. Such differences cause PT and IT neurons to respond to time-varying signals differently (Dembrow et al., 2010). When injected with a sinusoidal current with increasing frequency, PT neurons respond most strongly in the theta-frequency range (4–10 Hz), while IT neurons respond optimally to slower (<2 Hz) signals (**Figure 2**). The distinct subthreshold physiological properties of PT and IT neurons are consistent with differences in the hyperpolarization-activated cyclic nucleotide gated cation current (*h*-current) in these neurons. Blocking *h*-current changes the subthreshold properties of both neuron types, abolishing differences in the time-dependent membrane filtering both at the soma and dendrite (Dembrow et al., 2010; Kalmbach et al., 2013). In the apical dendrites, where *h*-channels are preferentially targeted in pyramidal neurons in the hippocampus and somatosensory cortex (Magee, 1999; Williams and Stuart, 2000; Berger et al., 2001), subthreshold differences between IT and PT neurons are more pronounced (Kalmbach et al., 2013). As a result of *h*-current related properties, PT neurons integrate dendritic inputs over a narrow time window, and are thus preferentially responsive to coincident inputs. On the other hand, IT neurons summate over wider time windows, allowing them to better integrate nonsynchronous input.
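The contrast between theta-band resonance and low-pass behavior can be illustrated with a toy linear membrane model. This is only a sketch under stated assumptions, not the analysis used in the cited studies: the slow, restorative action of *h*-current is approximated by a phenomenological inductive branch (a series resistance `r_h` and inductance `l_h`) in parallel with a passive leak and capacitance, and every parameter value below is hypothetical, chosen only so that the "PT-like" cell peaks in the theta range.

```python
import math


def impedance_magnitude(freq_hz, r_leak, c_m, r_h=None, l_h=None):
    """|Z(f)| in ohms for a leak resistance and capacitance in parallel,
    optionally with an inductive branch (r_h in series with l_h) that
    stands in for h-current. ASSUMPTION: linearized toy model."""
    omega = 2.0 * math.pi * freq_hz
    admittance = 1.0 / r_leak + 1j * omega * c_m
    if r_h is not None and l_h is not None:
        admittance += 1.0 / (r_h + 1j * omega * l_h)
    return abs(1.0 / admittance)


def peak_frequency(freqs_hz, **cell):
    """Frequency on the grid at which the impedance magnitude is largest."""
    return max(freqs_hz, key=lambda f: impedance_magnitude(f, **cell))


# Frequency grid from 0.1 to 20 Hz, mimicking a slow sinusoidal sweep.
freqs = [0.1 * k for k in range(1, 201)]

# Hypothetical parameters (ohms, farads, henries).
pt_like = dict(r_leak=100e6, c_m=200e-12, r_h=50e6, l_h=2.5e6)
it_like = dict(r_leak=100e6, c_m=200e-12)  # no h-like branch

pt_peak_hz = peak_frequency(freqs, **pt_like)
it_peak_hz = peak_frequency(freqs, **it_like)
```

With these illustrative values the cell with the h-like branch behaves as a band-pass filter whose peak lies within 4–10 Hz, whereas the purely passive cell is a low-pass filter whose largest response sits at the slowest frequency tested. Removing the inductive branch, loosely analogous to blocking *h*-current, abolishes the difference between the two.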

PT and IT neurons in PFC also express different active properties. IT neurons have a lower threshold for action potential initiation, and greater action potential half-width than PT neurons (Dembrow et al., 2010). These differences are also observed in anaesthetized animals in vivo (Cowan and Wilson, 1994). Once driven to spike, PT and IT neurons exhibit differing firing patterns. In response to a long (10 s) depolarizing square step of current sufficient to drive action potentials, PT neurons show spike frequency acceleration. In contrast, IT neurons show significant spike frequency accommodation (Morishima and Kawaguchi, 2006; Otsuka and Kawaguchi, 2008; Dembrow et al., 2010). In other cortical regions, the acceleration in spiking is caused by a "D"-type potassium current (Miller et al., 2008). The source of IT spike accommodation is less clear. Enhancing small conductance calcium-activated potassium channel (SK) type currents can contribute to spike frequency accommodation (Pedarzani et al., 2005). IT neurons display a pronounced slow afterhyperpolarization (Kalmbach et al., 2013), which may be partially caused by calcium-sensitive potassium channels (but see Gulledge et al., 2013). Alternatively, differences in accommodation may be caused by *m*-current, sodium-dependent potassium current, sodium pump activity, or differences in the inactivation recovery time of sodium channels that drive the spikes (Schwindt et al., 1989; Santini and Porter, 2010; Gulledge et al., 2013).
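Spike frequency acceleration and accommodation during a current step can be told apart by comparing early and late inter-spike intervals. The sketch below is one simple way to do this and is not the measure used in the cited studies: the index definition, the `tolerance` value, and the function names are all assumptions introduced for illustration.

```python
def adaptation_index(spike_times_s):
    """Normalized difference between the last and first inter-spike interval.

    ASSUMPTION: illustrative index in [-1, 1]. Positive values mean the
    intervals lengthen (accommodation); negative values mean they shorten
    (acceleration).
    """
    times = sorted(spike_times_s)
    if len(times) < 3:
        raise ValueError("need at least three spikes (two intervals)")
    first_isi = times[1] - times[0]
    last_isi = times[-1] - times[-2]
    return (last_isi - first_isi) / (last_isi + first_isi)


def firing_pattern(spike_times_s, tolerance=0.1):
    """Label a spike train; `tolerance` is a hypothetical dead band."""
    index = adaptation_index(spike_times_s)
    if index > tolerance:
        return "accommodation"
    if index < -tolerance:
        return "acceleration"
    return "regular"


# Hypothetical spike times (s) during a depolarizing current step.
pt_like_train = [0.0, 0.4, 0.7, 0.9, 1.0]   # intervals shorten
it_like_train = [0.0, 0.1, 0.3, 0.6, 1.0]   # intervals lengthen
pt_label = firing_pattern(pt_like_train)
it_label = firing_pattern(it_like_train)
```

Using only the first and last intervals keeps the example short; averaging several early and late intervals would be more robust to spike-time jitter.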

The importance of differences in ion channel expression in PT and IT neurons is highlighted by observations that manipulating these ion channels alters working memory performance. Manipulations of *h*-current within the PFC alter working memory task performance in both monkeys and rodents. Removal of the hyperpolarization-activated cyclic nucleotide-gated channel 1 (HCN1) subunit from the mPFC impaired performance on a delayed alternation task (Thuault et al., 2013), while *h*-channel blockade, or HCN1 knockdown, improved memory performance (Wang et al., 2007). Similarly, both SK channel and *m*-current blockade can enhance working memory function (Brennan and Arnsten, 2008; Wang et al., 2011). Differences in ion channel expression in prefrontal PT and IT neurons likely contribute to their functional role within executive circuits.

## **PROJECTION-SPECIFIC NEUROMODULATION**

PT and IT neurons also respond differently to neuromodulation. Neuromodulators change both subthreshold and suprathreshold responses in PT and IT neurons. In the presence of muscarinic activation, PT neurons display a subtle reduction in their subthreshold resonance (Dembrow et al., 2010). More strikingly, PT neurons shift into a persistent firing-primed state, wherein they respond to a brief suprathreshold input with persistent firing lasting tens of seconds (**Figure 2**, #4). While cholinergic modulation enhances the afterdepolarization in IT neurons, it causes no change in their subthreshold resonance and results in little, if any, persistent firing. Thus, PT and IT neurons respond to cholinergic input differently. Similarly, metabotropic glutamate receptor group I activation causes both PT and IT neurons to exhibit a slow afterdepolarization, but causes a long-lasting reduction in *h*-related parameters only in PT neurons (**Figure 2**, #2: Kalmbach et al., 2013). Alpha-2A noradrenergic modulation alters *h*-related properties as well. As a result, noradrenergic and metabotropic glutamate receptor activation shifts PT neurons from preferentially responding to coincident inputs to more broadly tuned integrators, effectively making them similar to IT neurons. Importantly, alpha-2A adrenergic modulation increases the input resistance of both PT and IT neurons, increasing their action potential output in response to depolarization (**Figure 2**, #3: Dembrow et al., 2010). Similarly, adenosine hyperpolarizes both IT-like and PT-like neurons via the A1 receptor, although the amount of hyperpolarization is greater in IT neurons (van Aerde et al., 2013). In all of these cases, the responses of PT and IT neurons to neuromodulatory stimulation are constrained by their differential patterns of ion channel expression.

Alternatively, the difference in neuromodulatory responses can be a function of cell-type-specific expression of various receptor subtypes in IT and PT neurons. PT neurons are inhibited by serotonin via 5-HT1A receptors (**Figure 2**, #5), while IT neurons are excited by serotonin via 5-HT2A receptors (Avesar and Gulledge, 2012). Interestingly, 2A-dependent excitation also occurred in supragranular IT neurons that projected contralaterally, while other L2/3 pyramidal neurons were inhibited by serotonin (Avesar and Gulledge, 2012). Consistent with this, in BAC mice in which green fluorescent protein is driven by 5-HT2A receptor expression, labeling in the neocortex was most dense in L5A (Weber and Andrade, 2010), a sublayer enriched with IT-like neurons in sensory and motor cortical regions (Reiner et al., 2003; Anderson et al., 2010; Groh et al., 2010).

Dopaminergic modulation also depends on long-range projection types. Reports on the effects of DA in PFC neurons have been complicated by the diversity of response types, which may be due to several factors: dopamine's instability, diverse actions on interneurons, effects on glutamatergic transmission, and the diversity of DA receptor subtypes. The recent generation of BAC mice selectively expressing reporter genes in neurons that express different DA receptor subtypes has clarified some of this ambiguity. L5 neurons expressing D1 receptors exhibit the physiological and anatomical hallmarks of IT neurons, while D2 receptor-expressing L5 neurons have properties consistent with PT neurons (Gee et al., 2012; Seong and Carter, 2012). Further, D1 agonists enhance the firing responses of IT-like neurons via PKA (**Figure 2**, #3). Conversely, prolonged optogenetic activation of glutamatergic inputs paired with the D2 agonist quinpirole generates a long-lasting afterdepolarization that can produce persistent firing in PT-like, but not IT-like, projection neurons (**Figure 2**, #4). It remains less clear whether all IT neurons are D1 receptor positive, or whether D1 receptors are limited to specific subpopulations of IT neurons (e.g., those projecting to the contralateral cortex versus amygdala). Similarly, not all PT neurons may be D2-receptor positive. An earlier study in rats examining receptor mRNA expression in different projection neurons reported that corticothalamic, corticocortical and corticostriatal neurons express D1 and/or D2 receptors, while D2 receptors are absent from corticopontine, corticospinal, and corticothalamic neurons (Gaspar et al., 1995), a result at odds with data from the BAC mice. Further studies will be needed to clarify these discrepancies, and to test whether the expression of other dopamine receptor subtypes (D3, D4, D5) is segregated by projection subtype.

The importance of selective modulation of IT neurons in PFC has been recently highlighted in several in vivo studies. Mice trained in an operant delay task, in which they learned to nosepoke for food 20 s after a light stimulus, were unable to perform correctly timed responses when D1-positive neurons in the PFC were photoinactivated (Narayanan et al., 2012). Conversely, stimulating the D1-positive neurons enhanced temporal precision of behavior. These data are in line with evidence that infusion of D1 antagonists into the PFC impairs temporal precision in the same task in rats. There may also be a D1-sensitive IT subcircuit important for driving food consumption. Infusion of a D1 antagonist into the PFC alters consumption (Touzani et al., 2010; Nair et al., 2011), while feeding activates D1-positive neurons in the PFC. Optogenetically stimulating these neurons increases food intake, while bilaterally inactivating them reduces food intake (Land et al., 2014). The downstream target of these neurons is the ipsilateral amygdala. Combined, these studies suggest that the disparate effects of neuromodulatory transmitters may reflect differential expression of receptor subtypes and ionic mechanisms in prefrontal neurons projecting to specific downstream brain regions.

#### **FUTURE DIRECTIONS**

PFC-neuromodulatory circuits are beginning to be mapped at the cellular and subcellular level. Rather than uniformly increasing or decreasing activity, the effect of neuromodulators on prefrontal neurons depends upon their long-range targets. Understanding how these modulatory systems contribute to information flow in the PFC will be important for understanding how the PFC exerts top-down control of behavior. This map, however, represents an initial step towards elucidating how these dynamic and plastic systems function (Marder, 2012). Future studies will need to identify the specific neuron subtypes contributing to mnemonic persistent activity, and how neuromodulatory systems selectively regulate synaptic connections and intrinsic excitability within this network. Most importantly, complex models that take into account differences in connectivity, information processing, and long range connections to downstream targets will be necessary to elucidate how the PFC drives goal-directed behaviors.

#### **ACKNOWLEDGMENTS**

Work was supported by NIMH grants MH048432 and MH094839 and Memory and Cognitive Disorders Award from the McKnight Foundation to Daniel Johnston and Michael Mauk, and a Brain and Behavior Foundation Young Investigator award from the Walter K. Sartory Estate to Nikolai Dembrow. Thanks to Dr. Brian Kalmbach for discussion and helpful comments on the manuscript.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 April 2014; accepted: 05 May 2014; published online: 05 June 2014*. *Citation: Dembrow N and Johnston D (2014) Subcircuit-specific neuromodulation in the prefrontal cortex. Front. Neural Circuits 8:54. doi: 10.3389/fncir.2014.00054 This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Dembrow and Johnston. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Activity-dependent serotonergic excitation of callosal projection neurons in the mouse prefrontal cortex

## *Emily K. Stephens<sup>1,2‡</sup>, Daniel Avesar<sup>1,2†‡</sup> and Allan T. Gulledge<sup>1,2</sup>\**

<sup>1</sup> Department of Physiology and Neurobiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA <sup>2</sup> Program in Experimental and Molecular Medicine, Dartmouth College, Hanover, NH, USA

#### *Edited by:*

Guillermo Gonzalez-Burgos, University of Pittsburgh, USA

#### *Reviewed by:*

Rodrigo Andrade, Wayne State University School of Medicine, USA Jean-Claude Béïque, University of Ottawa, Canada

#### *\*Correspondence:*

Allan T. Gulledge, Department of Physiology and Neurobiology, Geisel School of Medicine at Dartmouth, One Medical Center Drive, Dartmouth-Hitchcock Medical Center, Borwell 704E, Lebanon, NH 03756, USA e-mail: allan.gulledge@dartmouth.edu

## *†Present address:*

Daniel Avesar, Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403, USA

‡Emily K. Stephens and Daniel Avesar have contributed equally to this work.

Layer 5 pyramidal neurons (L5PNs) in the mouse prefrontal cortex respond to serotonin (5-HT) according to their long-distance axonal projections; 5-HT1A (1A) receptors mediate inhibitory responses in corticopontine (CPn) L5PNs, while 5-HT2A (2A) receptors can enhance action potential (AP) output in callosal/commissural (COM) L5PNs, either directly (in "COM-excited" neurons), or following brief 1A-mediated inhibition (in "COM-biphasic" neurons). Here we compare the impact of 5-HT on the excitability of CPn and COM L5PNs experiencing variable excitatory drive produced by current injection (DC current or simulated synaptic current) or with exogenous glutamate. 5-HT delivered at resting membrane potentials, or paired with subthreshold depolarizing input, hyperpolarized CPn and COM-biphasic L5PNs and failed to promote AP generation in COM-excited L5PNs. Conversely, when paired with suprathreshold excitatory drive generating multiple APs, 5-HT suppressed AP output in CPn L5PNs, enhanced AP generation in COM-excited L5PNs, and generated variable responses in COM-biphasic L5PNs. While COM-excited neurons failed to respond to 5-HT in the presence of a 2A receptor antagonist, 32% of CPn neurons exhibited 2A-dependent excitation following blockade of 1A receptors. The presence of pharmacologically revealed 2A receptors in CPn L5PNs was correlated with the duration of 1A-mediated inhibition, yet biphasic excitatory responses to 5-HT were never observed, even when 5-HT was paired with strong excitatory drive. Our results suggest that 2A receptors selectively amplify the output of COM L5PNs experiencing suprathreshold excitatory drive, while shaping the duration of 1A-mediated inhibition in a subset of CPn L5PNs. Activity-dependent serotonergic excitation of COM L5PNs, combined with 1A-mediated inhibition of CPn and COM-biphasic L5PNs, may facilitate executive function by focusing network activity within cortical circuits subserving the most appropriate behavioral output.

**Keywords: serotonin, 5-HT2A receptor, 5-HT1A receptor, prefrontal cortex, executive function, pyramidal neuron, mouse**

## **INTRODUCTION**

The prefrontal cortex (PFC) provides "top-down" executive control of behavior, and prefrontal processing is profoundly influenced by a variety of neuromodulatory transmitters, including serotonin (5-HT; for review, see Robbins and Roberts, 2007; Puig and Gulledge, 2011). Deficits in serotonergic input to the PFC impair executive control in rats (Geyer et al., 1976; Winstanley et al., 2004a; Koot et al., 2012), monkeys (Clarke et al., 2004, 2005), and humans (LeMarquand et al., 1998; Worbe et al., 2014), as typified by inappropriate response selection, impulsivity, and/or perseverative behavior. Serotonergic mechanisms in the PFC are also implicated in episodic (Bekinschtein et al., 2013) and working (Williams et al., 2002) memory.

Cortical neurons respond to 5-HT primarily through three postsynaptic receptor subtypes: metabotropic Gi/o-coupled 5-HT1A (1A) and Gq-coupled 5-HT2A (2A) receptors that are expressed in subpopulations of excitatory (pyramidal) and inhibitory (non-pyramidal) cortical neurons, and ionotropic 5-HT3 receptors preferentially expressed in subpopulations of non-pyramidal neurons (Morales and Bloom, 1997; Willins et al., 1997; Aznar et al., 2003; Puig et al., 2004, 2010; Santana et al., 2004; Lee et al., 2010; Weber and Andrade, 2010). In cortical pyramidal neurons, postsynaptic 1A and 2A receptors mediate opposing inhibitory and excitatory responses, respectively (for review, see Puig and Gulledge, 2011), and are directly implicated in a variety of psychiatric diseases. For instance, 1A receptor density in the human PFC is inversely correlated with anxiety (Tauscher et al., 2001), while 1A agonists have anxiolytic and antidepressant effects (Goldberg and Finnerty, 1979; Feighner and Boyer, 1989; Akimova et al., 2009; Hesselgrave and Parsey, 2013). On the other hand, excessive activation of cortical 2A receptors contributes to the etiology of schizophrenia (Geyer and Vollenweider, 2008; Benekareddy et al., 2010), and 2A receptors are preferred targets for atypical antipsychotics (Meltzer et al., 1989; González-Maeso and Sealfon, 2009) and hallucinogens (Willins and Meltzer, 1997; Vollenweider et al., 1998). However, little is known regarding how cortical 1A and 2A receptors interact to facilitate normal cognitive function.

Layer 5 pyramidal neurons (L5PNs) are a major source of output from the PFC to distal cortical and subcortical brain regions. While many cortical pyramidal neurons express both 1A and 2A receptors (Ashby et al., 1994; Amargós-Bosch et al., 2004; Béïque et al., 2004; Santana et al., 2004; Wędzony et al., 2008; Moreau et al., 2010), most display purely inhibitory or excitatory responses to 5-HT (Davies et al., 1987; Araneda and Andrade, 1991; Tanaka and North, 1993; Foehring et al., 2002; Zhong and Yan, 2011), although biphasic responses involving both receptor subtypes also occur (Araneda and Andrade, 1991; Puig et al., 2005). We recently revealed that, in the mouse medial prefrontal cortex (mPFC), the direction of serotonergic modulation of L5PNs is correlated with their long-distance axonal projections (Avesar and Gulledge, 2012). 5-HT, acting at 2A receptors, selectively excites callosal/commissural (COM) projection neurons that innervate the contralateral cerebral hemisphere. While most COM neurons show a purely excitatory response to 5-HT ("COM-excited" neurons), a subpopulation of COM neurons responds to 5-HT with biphasic responses in which 1A-mediated inhibition is followed by 2A-dependent excitation ("COM-biphasic" neurons). On the other hand, 5-HT generates only 1A-dependent inhibition in brainstem-projecting corticopontine (CPn) neurons. Here we have tested the interaction of 5-HT receptors in COM and CPn L5PNs experiencing variable extrinsic excitatory drive. Our results suggest that selective, activity-dependent serotonergic regulation of cortical projection neurons may facilitate executive function by focusing network activity in circuits subserving the most appropriate behavioral output.

## **MATERIALS AND METHODS**

#### **ANIMALS**

Experiments involved C57BL/6J (6-to-8-week-old) male and female mice according to methods approved by the Institutional Animal Care and Use Committee of Dartmouth College.

#### **RETROGRADE LABELING**

Red or green fluorescent beads (Retrobeads, Lumafluor Inc.) were injected unilaterally into either the prelimbic cortex (to label COM neurons) or the pons (to label CPn neurons) using age-appropriate coordinates (Paxinos and Franklin, 2004). Animals were anesthetized throughout surgeries with vaporized isoflurane (∼2%). Following craniotomy, a microsyringe was lowered into the brain region of interest, and 300–700 nL of undiluted Retrobead solution was injected over a 10 min period. Animals were allowed to recover from surgery for at least 72 h before use in electrophysiological experiments. The location of dye injection was confirmed *post hoc* in coronal sections of the mPFC or brainstem.

#### **SLICE PREPARATION**

Following isoflurane anesthesia and decapitation, brains were quickly removed into artificial cerebrospinal fluid (ACSF) containing, in mM: 125 NaCl, 25 NaHCO3, 3 KCl, 1.25 NaH2PO4, 0.5 CaCl2, 6 MgCl2, and 25 glucose, saturated with 95% O2/5% CO2. Coronal brain slices (250 μm thick) of the mPFC were cut using a Leica VT 1200 slicer and stored in a chamber filled with ACSF containing 2 mM CaCl2 and 1 mM MgCl2. Slices were stored at 35°C for ∼45 min, then kept at room temperature for up to 8 h prior to use in experiments.

#### **ELECTROPHYSIOLOGY**

Slices were transferred to a recording chamber continuously perfused with oxygenated ACSF at 35–36°C and visualized with an Olympus BX51WI microscope. Whole-cell current-clamp recordings of L5PNs were made with patch pipettes (5–7 MΩ) filled with, in mM, 135 K-gluconate, 2 NaCl, 2 MgCl2, 10 HEPES, 3 Na2ATP, and 0.3 NaGTP (pH 7.2 with KOH). Epifluorescence illumination (Cairn Research; 470 or 530 nm LEDs) was used to identify labeled COM or CPn neurons in the prelimbic cortex for whole-cell recording. CPn neuron somata are exclusively found in layer 5, while COM neurons reside in both layers 5 and 2/3 (Morishima and Kawaguchi, 2006). In targeting layer 5 COM neurons along the narrowing dorsal–ventral axis of the medial cortex, we targeted COM neurons in the lateral half of labeled neurons (at least 250 μm from the pia) but above layer 6, as identified by higher-density somata and the presence of "inverted" pyramidal neurons (Van Brederode and Snyder, 1992). Data were acquired with Axograph software (Axograph Company) using a BVC-700 amplifier (Dagan Corporation) and an ITC-18 digitizer (HEKA Instruments). Membrane potentials were sampled at 25 kHz, filtered at 5 kHz, and corrected for a liquid junction potential of +12 mV.

5-HT (100 μM) was dissolved in ACSF and loaded into a patch pipette placed ∼50 μm from the targeted soma. After whole-cell break-in, neurons were initially classified as 5-HT- "inhibited," "excited," or "biphasic" based on their response to 5-HT (delivered for 1 s at ∼10 PSI) during periods of continuous AP generation (∼6 Hz) evoked by DC current injection through the recording electrode. Neurons referred to as "COM-excited" or "COM-biphasic" were classified based on this initial response to 5-HT alone, regardless of their responsiveness to 5-HT during other manipulations (e.g., 5-HT responses generated at resting membrane potentials; RMPs). Serotonergic inhibition was quantified as the duration of AP cessation, while excitatory responses were quantified as the peak increase in instantaneous spike frequency (ISF) relative to the average baseline firing frequency. Biphasic 5-HT responses were defined as a brief inhibition lasting at least 10 times the average baseline interspike interval, followed by an increase in AP frequency of at least 1 Hz. In some experiments, 5-HT receptors were selectively blocked with 1A (WAY 100635, 30 nM; Sigma-Aldrich) and/or 2A (MDL 11939, 500 nM; Tocris Bioscience) antagonists.
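The response criteria above can be illustrated with a short sketch. This is not the analysis code used in the study. The function name `classify_response`, its arguments, and the reuse of the biphasic thresholds (a pause of at least 10 baseline interspike intervals, an increase of at least 1 Hz) for the purely inhibited and purely excited labels are assumptions made for illustration only.

```python
from statistics import mean


def classify_response(spike_times, t_app, pause_factor=10.0, min_increase_hz=1.0):
    """Label a 5-HT response from spike times (s) around application time t_app (s).

    Hypothetical helper: thresholds for 'inhibited' and 'excited' are assumed
    to match those stated in the text for biphasic responses.
    """
    pre = [t for t in spike_times if t < t_app]
    post = [t for t in spike_times if t >= t_app]
    if len(pre) < 2:
        raise ValueError("need at least two baseline spikes")

    # Average baseline interspike interval (ISI) and firing frequency.
    base_isi = mean([b - a for a, b in zip(pre, pre[1:])])
    base_rate = 1.0 / base_isi

    # ISIs from the last baseline spike onward.
    train = [pre[-1]] + post
    isis = [b - a for a, b in zip(train, train[1:])]
    if not isis:  # firing never resumed after 5-HT
        return {"label": "inhibited", "pause_s": float("inf"), "peak_increase_hz": 0.0}

    # Inhibition: the longest spike-free gap after application.
    pause = max(isis)
    inhibited = pause >= pause_factor * base_isi

    # Excitation: peak instantaneous spike frequency (1/ISI) relative to
    # baseline, measured after the pause when a pause is present.
    later = isis[isis.index(pause) + 1:] if inhibited else isis
    peak_increase = max((1.0 / d for d in later), default=base_rate) - base_rate
    excited = peak_increase >= min_increase_hz

    if inhibited and excited:
        label = "biphasic"
    elif inhibited:
        label = "inhibited"
    elif excited:
        label = "excited"
    else:
        label = "unresponsive"
    return {"label": label, "pause_s": pause, "peak_increase_hz": peak_increase}
```

Measuring excitation only after the pause keeps the long silent interval from masking a later increase in firing, which is what separates a biphasic response from a purely inhibitory one under these assumed thresholds.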

Somatic current injection was used to simulate excitatory synaptic input. The synaptic current waveform was modeled in NEURON (freely available at http://www.neuron.yale.edu) using a ball and stick "pyramidal" neuron with AMPA conductances (exponential rise and decay of 0.2 and 2 ms, respectively, a 500 pS maximum conductance, and reversing at 0 mV) placed at 1 μm intervals along a 1000 μm-long spiny dendrite (as in Gulledge et al., 2012). Synaptic currents generated by activation of all synaptic inputs at randomized timings twice within the 1500-ms-long simulation were recorded with a simulated voltage clamp (−70 mV) at the soma. The resulting current waveform was loaded into Axograph and used as a template for somatic current injections. Because of intrinsic cell-to-cell variability in input resistance and excitability, for each neuron the synaptic waveform was scaled in amplitude to generate ∼7 APs during baseline trials. The simulated synaptic current was then delivered 29 times at 3 s intervals, the exception being the sixth trial which was delayed 3 s due to application of 5-HT (100 μM, 1 s).
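As a rough illustration of how such a current template might be built, the sketch below sums dual-exponential AMPA-like conductances activated at random times and converts them to a clamp current at −70 mV. It is a simplified point-neuron approximation, not the NEURON model described above: it ignores dendritic cable filtering entirely, and the peak-normalized kernel (of the kind used by dual-exponential synapse models), the 0.1 ms time step, and all function and parameter names are assumptions.

```python
import math
import random


def dual_exp_kernel(tau_rise, tau_decay, dt, length_ms):
    """Dual-exponential conductance time course, normalized to a peak of ~1."""
    t_peak = (tau_rise * tau_decay) / (tau_decay - tau_rise) * math.log(tau_decay / tau_rise)
    norm = 1.0 / (math.exp(-t_peak / tau_decay) - math.exp(-t_peak / tau_rise))
    n = int(round(length_ms / dt))
    return [norm * (math.exp(-i * dt / tau_decay) - math.exp(-i * dt / tau_rise))
            for i in range(n)]


def simulated_synaptic_current(n_syn=1000, events_per_syn=2, duration_ms=1500.0,
                               dt=0.1, g_max_pS=500.0, e_rev_mV=0.0,
                               v_hold_mV=-70.0, seed=0):
    """Return the summed synaptic current (pA) sampled every dt ms.

    Assumption: every synapse sees the holding potential directly
    (no dendritic attenuation), unlike the ball-and-stick model in the text.
    """
    rng = random.Random(seed)
    n = int(round(duration_ms / dt))
    kernel = dual_exp_kernel(0.2, 2.0, dt, 20.0)  # rise 0.2 ms, decay 2 ms
    g_total = [0.0] * n                           # summed conductance in pS
    for _ in range(n_syn * events_per_syn):
        start = rng.randrange(n)                  # random activation time
        for k, w in enumerate(kernel):
            if start + k >= n:
                break
            g_total[start + k] += g_max_pS * w
    # I = g * (V - E_rev); pS * mV = fA, so scale by 1e-3 to obtain pA.
    return [g * (v_hold_mV - e_rev_mV) * 1e-3 for g in g_total]
```

The clamp current is inward (negative) under this sign convention, so a template used for current-clamp injection would be inverted and then scaled per neuron, as described above for reaching ∼7 APs in baseline trials.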

In other experiments, exogenous glutamate (1 mM; dissolved in ACSF) was focally applied from a patch pipette positioned near the proximal apical dendrite (∼50 μm from the soma). Depending on the experiment, the duration of glutamate puffs (8 to 20 ms) was adjusted (in 1 ms increments) to generate "just-subthreshold" (i.e., the maximum puff duration failing to generate APs) or reliably suprathreshold responses producing one or more APs. Single applications, or bursts of 5 applications (at 200 ms intervals), were delivered twenty times at 0.05 Hz, and 5-HT (100 μM, 1 s) was applied from a second patch pipette midway between the fifth and sixth trials.

#### **STATISTICAL ANALYSES**

Data are presented as mean ± SEM. Comparisons across cell groups utilized one-way ANOVAs (with Bonferroni or Tukey-Kramer post-tests), while comparisons within groups were made using 2-tailed Student's *t*-tests (paired or unpaired), or repeated measures ANOVA with IBM SPSS Statistics (version 21). Significance was defined as *p* < 0.05.
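For readers less familiar with these summary statistics, the following minimal sketch shows how a mean ± SEM and a paired *t* statistic are computed. It is not the SPSS workflow used here; the function names are hypothetical, and the sketch returns only the statistic and degrees of freedom, leaving the *p*-value to a *t* distribution lookup (for example `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    m, sem = mean_sem(diffs)  # test the mean difference against zero
    return m / sem, len(diffs) - 1
```

A paired design is the natural choice when the same neurons are measured in two conditions (for instance, at RMP and during firing), because it removes cell-to-cell variability from the comparison.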

### **RESULTS**

#### **ACTIVITY-DEPENDENT SEROTONERGIC EXCITATION**

To explore the interaction of 5-HT and extrinsic depolarizing drive, we made whole-cell recordings from labeled COM and CPn L5PNs in slices of mouse mPFC. COM and CPn neurons were identified by the presence of fluorescent beads (Retrobeads) that had been injected into the contralateral prelimbic cortex, or ipsilateral pons, several days prior to slice preparation (**Figure 1A**). 5-HT (100 μM for 1 s) was delivered either at RMPs, or paired with suprathreshold DC current injection (**Figure 1B**). As previously observed (Avesar and Gulledge, 2012), 5-HT applied during current-induced activity inhibited AP generation in CPn neurons (*n* = 9), and either increased AP frequency ("excited"; *n* = 13), or generated "biphasic" (inhibitory-excitatory) responses (*n* = 6), in COM L5PNs (**Table 1**). When applied to these same neurons at RMPs, 5-HT rapidly hyperpolarized CPn (by 3.5 ± 0.4 mV) and COM-biphasic neurons (by 3.8 ± 0.7 mV), but depolarized COM-excited L5PNs by 3.3 ± 0.2 mV (**Figure 1C**). Unlike hyperpolarizing responses at RMP, serotonergic depolarization of COM-excited neurons developed slowly, with the latency to peak depolarization (28 ± 3 s) being significantly delayed relative to the latency of peak excitation for 5-HT responses occurring during DC-current-induced activity (11 ± 1 s; *n* = 13; *p* < 0.05, paired Student's *t*-test; **Table 1** and **Figure 1D**), confirming that serotonergic excitation of COM neurons is facilitated by coincident excitatory drive (see also Araneda and Andrade, 1991; Zhang and Arsenault, 2005).

The duration of inhibitory serotonergic responses in CPn and COM-biphasic neurons was also sensitive to activity state, albeit in opposite directions. In CPn neurons, the durations of spike cessations (37 ± 2 s) during current-induced depolarization were longer than were hyperpolarizations generated at RMPs (28 ± 4 s; *p* < 0.05, paired Student's *t*-test; **Table 1**, **Figure 1D**). On the other hand, inhibitory responses in COM-biphasic neurons were longer at RMPs (25 ± 4 s) than they were during current-induced activity (18 ± 3 s; *p* < 0.05, paired Student's *t*-test; **Table 1**; **Figure 1D**). This differential effect of activity state on inhibitory serotonergic responses in CPn and COM-biphasic neurons likely reflects the larger driving force for 1A-driven potassium conductances at depolarized potentials (Andrade and Nicoll, 1987), and activity-dependent recruitment of 2A-mediated excitation in depolarized COM-biphasic neurons. These data demonstrate that serotonergic regulation of COM and CPn neuron excitability is influenced by the level of coincident excitatory drive.

To further explore the interaction of 5-HT and excitatory drive in L5PNs, we applied 5-HT to neurons receiving suprathreshold simulated synaptic input generated via somatic current injection (see Materials and Methods and **Figure 2A**). Simulating synaptic drive allowed us to deliver an equivalent excitatory stimulus to each neuron (generating ∼7 APs) while avoiding the potentially confounding presynaptic effects of 5-HT on transmitter release (e.g., Kruglikov and Rudy, 2008; Troca-Marín and Geijo-Barrientos, 2010). In CPn neurons (*n* = 13), application of 5-HT produced a 98 ± 2% decrease in the number of APs generated by the simulated synaptic current, from 7.3 ± 0.7 APs in baseline conditions to a low of 0.2 ± 0.2 APs occurring 3.5 ± 0.3 s after 5-HT application, with recovery to 7.0 ± 1.0 APs during the final trial (**Figure 2B**; blue symbols). On the other hand, 5-HT application resulted in a 59 ± 7% increase in the number of APs in COM-excited L5PNs (*n* = 30), with the peak increase occurring 8.3 ± 1 s after 5-HT application. In these neurons, 5-HT increased the number of APs from 7.5 ± 0.3 APs in baseline conditions to a peak of 11.8 ± 0.6 APs after 5-HT application, with output returning to 6.9 ± 0.5 APs on the last trial (**Figure 2B**; red symbols). Finally, 5-HT generated a transient 59 ± 12% decrease in AP number in COM-biphasic L5PNs (*n* = 13), from 7.0 ± 0.5 APs in baseline conditions to a low of 3.4 ± 1 APs occurring 6.5 ± 2 s after 5-HT, with recovery to 6.2 ± 0.7 APs during the final trial (**Figure 2B**; orange symbols). The maximum effects of 5-HT on AP output were significantly different among CPn, COM-excited, and COM-biphasic L5PNs (*p* < 0.05, ANOVA, **Figure 2B**).

We also monitored RMPs (as measured just prior to synaptic current injections) and the integrals of voltage responses to simulated synaptic currents (**Figure 2B**). 5-HT increased response integrals in COM-excited L5PNs, but only during the first two trials immediately after 5-HT application (trials 6 and 7; **Figure 2B**). The mean increase in response integral was 12 ± 1% over baseline (*p* < 0.05). Conversely, 5-HT induced longer-lasting decreases in response integrals in CPn neurons (mean peak change was −21 ± 2%, lasting ∼20 s; *p* < 0.05), and transient dips in response integrals in COM-biphasic L5PNs (mean peak change was −15 ± 3% relative to baseline; *p* < 0.05).
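One plausible way to obtain such measures is sketched below, under stated assumptions: the response integral is taken as the trapezoidal area of the voltage trace relative to the RMP sampled shortly before the stimulus, and the percent change is expressed relative to the mean of the baseline trials. The exact procedure used in the study is not described in the text, so the function names, the integration window (stimulus onset to end of trace), and the 10 ms RMP offset are illustrative choices.

```python
def response_integral(voltage_mV, dt_ms, stim_index, rmp_offset_ms=10.0):
    """Return (RMP in mV, response integral in mV*ms) for one trial.

    Assumptions: RMP is the single sample rmp_offset_ms before stimulus
    onset, and the integral runs from stimulus onset to the end of the trace.
    """
    rmp_index = stim_index - int(round(rmp_offset_ms / dt_ms))
    if rmp_index < 0:
        raise ValueError("trace does not include the pre-stimulus RMP sample")
    rmp = voltage_mV[rmp_index]
    rel = [v - rmp for v in voltage_mV[stim_index:]]
    # Trapezoidal rule on the RMP-subtracted voltage.
    area = sum((a + b) / 2.0 for a, b in zip(rel, rel[1:])) * dt_ms
    return rmp, area


def percent_change(value, baseline_values):
    """Percent change of a trial value relative to the mean of baseline trials."""
    base = sum(baseline_values) / len(baseline_values)
    return 100.0 * (value - base) / base
```

With a 25 kHz sampling rate, `dt_ms` would be 0.04. Subtracting the per-trial RMP separates changes in the evoked response from shifts in resting potential, which are reported separately above.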

5-HT hyperpolarized CPn and COM-biphasic L5PNs by 2.9 ± 0.3 mV and 2.5 ± 0.5 mV, respectively. Hyperpolarizing responses occurred rapidly after 5-HT application, with latency to peak hyperpolarization of 4.4 ± 0.5 s in CPn neurons, and 3.7 ± 0.5 s in COM-biphasic neurons (i.e., peak hyperpolarization occurred within the first two post-5-HT trials). Conversely, 5-HT depolarized COM-excited L5PNs by 2.1 ± 0.3 mV, with peak depolarization occurring 8 ± 1 s after 5-HT application, a latency similar to the latency of peak serotonergic excitation observed in these same neurons during DC-current-induced AP generation (*p* = 0.55; paired Student's *t*-test; see also **Figure 3**). Surprisingly, 5-HT-induced depolarization of COM-excited neurons occurred only after the initial post-5-HT exposure to simulated synaptic currents (**Figure 2B**). 3 s after 5-HT application, at the start of trial 6, but before simulated synaptic currents were applied, COM-excited neurons were *hyperpolarized* relative to baseline values (by 0.6 ± 0.2 mV; *p* < 0.05), even as AP output increased moments later in response to that same trial's simulated synaptic input. By the beginning of the very next trial, trial 7, RMPs were significantly depolarized relative to baseline potentials (0.8 ± 0.3 mV; *p* < 0.05, **Figure 2B**), suggesting that serotonergic depolarization of COM-excited neurons may be facilitated by exogenous excitatory drive. We verified this by comparing the latency-to-peak-excitation (determined by the timing of peak increases in spike rate and/or peak depolarization of the RMP) across COM-excited L5PNs experiencing different levels of excitatory drive (**Figure 3**). Pairing 5-HT with suprathreshold DC current, or simulated synaptic currents, significantly reduced latencies to peak excitatory responses (One-way ANOVA; *p* < 0.05), with peak increases in AP generation occurring significantly earlier than peak depolarization of the RMP (Tukey-Kramer *post hoc* test, *p* < 0.05).

were injected unilaterally into either the contralateral prelimbic cortex (to label COM L5PNs) or into the ipsilateral pons (to label CPn L5PNs). Dashed green lines represent the axons of cortical projection neurons conveying Retrobeads to the somata of pyramidal neurons in the mPFC. **(B)** Responses of labeled COM-excited (red), COM-biphasic (orange), or CPn (blue) neurons to focally applied 5-HT (green bar) delivered during periods of current-induced action potential (AP) generation (top) or at resting membrane potentials (RMP, bottom). Middle plots show ISF over time. Dashed-lines indicate 0 Hz (middle) n = 6), and CPn (blue; n = 9) L5PNs. Responses for each neuron were resampled at 1 Hz, and population data plotted as mean ± SEM for each resulting time point. **(D)** Comparisons of latencies to peak 5-HT responses across neuron subtypes (top) and durations of 5-HT-induced inhibition in CPn and COM-biphasic neurons (bottom), for 5-HT responses generated during periods of AP generation (opaque bars) or at RMPs (semi-opaque bars). Asterisks indicate significant differences (p < 0.05) between responses occurring at RMPs or while firing.

Asterisks indicate p < 0.05 vs 5-HT responses occurring during DC-evoked firing, paired Student's t-test.

To confirm that delayed serotonergic depolarization results from the interaction of 5-HT and simulated synaptic drive, rather than reflecting a slower, time-dependent mechanism, we performed additional experiments in COM-excited neurons, in which the resumption of simulated synaptic currents was delayed by an additional 3 s (until trial 7, 6 s after 5-HT application; **Figure 4**). With this additional delay, in which simulated synaptic input was not delivered during trial 6, RMPs remained at baseline levels even at the beginning of trial 7 (6 s following 5-HT application), but depolarized sharply (by 1.5 ± 0.4 mV; *n* = 21; *p* < 0.05) after resumption of simulated synaptic input, as measured at the beginning of trial 8 (9 s after 5-HT application). These results further demonstrate that serotonergic excitation of COM-excited neurons is facilitated by extrinsic excitatory drive.

Since previous studies in the rat PFC have found that serotonergic excitation of L5PNs is preferentially facilitated when paired with strong, rather than weak, depolarizing drive (Araneda and Andrade, 1991; Zhang and Arsenault, 2005), in a subset of COM-excited L5PNs (*n* = 14) we tested the interaction of 5-HT and subthreshold simulated synaptic drive by scaling current amplitudes to 80% of those necessary to evoke a single AP (**Figure 5**). 5-HT was focally applied after five baseline subthreshold trials, and subthreshold current injections resumed for an additional 24 trials. Under these conditions, application of 5-HT failed to promote AP generation by the simulated synaptic input (**Figure 5A**). Instead, 5-HT significantly *reduced* response integrals (by 9.2 ± 1.4%; *p* < 0.05; **Figures 5B,C**) and transiently *hyperpolarized* neurons (by 0.7 ± 0.1 mV; *p* < 0.05; **Figures 5B,C**). These results confirm that serotonergic excitation of COM-excited L5PNs is facilitated by strong, but not weak, excitatory drive.

We next tested the impact of 5-HT on the excitability of COM-excited L5PNs experiencing a second form of excitatory drive: focal application of exogenous glutamate (1 mM). In initial experiments, the duration of glutamate application (7 to 20 ms) was adjusted to generate either reliably suprathreshold responses (i.e., producing one or two APs per trial; **Figure 6**) or "just-subthreshold" responses (**Figure 7**). 5-HT (100 μM, 1 s) was applied midway between the fifth and sixth of fourteen trials delivered at 0.05 Hz. When paired with single suprathreshold applications of glutamate, 5-HT increased the number of glutamate-induced APs in about half of COM-excited neurons (*n* = 5 of 9), from 1.6 ± 0.3 APs in baseline conditions to 4.2 ± 1.1 APs following 5-HT application (**Figure 6A**). In the remaining four COM-excited neurons, 5-HT reduced the mean number of APs from 2.4 ± 0.2 in baseline conditions to 0.8 ± 0.8 APs after 5-HT application. Across all COM-excited L5PNs tested (*n* = 9), there was no significant effect of 5-HT on AP genesis or RMPs, and no immediate effect of 5-HT on response integrals, although we did observe a slowly developing and highly variable increase in response integral (**Figure 6B**; *p* < 0.05).

**FIGURE 2 | 5-HT modulates neuronal responses to simulated synaptic input. (A)** Responses of COM-excited (red), COM-biphasic (orange), and CPn (blue) L5PNs to somatic current injections simulating a barrage of excitatory synaptic input under baseline conditions (top voltage traces), after focal 5-HT application (middle voltage traces), or about one minute after 5-HT application (lower voltage traces). Injected currents (bottom traces) were scaled to generate approximately 7 action potentials (APs) in baseline conditions. **(B)** Plots of the number of APs generated by simulated synaptic input (top), changes in RMPs (middle), and percent changes in response integrals (bottom), for COM-excited (red), COM-biphasic (orange), and CPn (blue) L5PNs. 5-HT was focally applied for 1 s at the time indicated by the green bar. Gray bars indicate time-points for data shown in **A**. Green symbols indicate data from experiments in COM-excited L5PNs in which no 5-HT was applied. Asterisks indicate significant changes from baseline (p < 0.05). Black arrows point out COM-excited responses during trial 6 (immediately after 5-HT). Note that, while the number of APs and response integrals increase immediately during trial 6, RMP, as measured 10 ms prior to simulated synaptic input, was hyperpolarized relative to baseline in trial 6. Only following the initial post-5-HT suprathreshold current injection (trial 6) did RMPs depolarize, as observed at the beginning of trial 7 (red arrow).

These results suggest that serotonergic excitation of COM-excited neurons may be activity dependent, with the threshold for excitatory responses being multiple APs. To test whether serotonergic excitation might be enhanced by more robust glutamatergic drive, in a different group of COM-excited neurons we paired 5-HT with bursts of five glutamate applications delivered at 200 ms intervals (5 Hz), with bursts delivered at 0.05 Hz. When 5-HT was paired with suprathreshold glutamate exposure (in which APs resulted from at least three of the five glutamate applications per trial), 5-HT enhanced AP output in all neurons tested (*n* = 8). 5-HT increased glutamate-evoked output from 7.8 ± 0.8 APs in baseline conditions to a peak of 12.5 ± 1.4 APs after 5-HT application (**Figure 6B**; *p* < 0.05). 5-HT also depolarized COM-excited neurons by 2.4 ± 0.6 mV (*p* < 0.05), but only after the initial post-5-HT suprathreshold glutamate application. Finally, 5-HT increased response integrals in trials 6 and 7 by 26 ± 4% and 14 ± 5%, respectively (*n* = 8; *p* < 0.05 for each).
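The timing of this pairing protocol can be laid out as a small sketch. The rates, counts, and intervals below are taken from the text; placing trial 1 at t = 0 s and the function names are assumptions made for illustration only.

```python
# Timing sketch for the glutamate/5-HT pairing protocol.
# Values come from the text; the time origin (trial 1 at t = 0 s) and all
# function names are illustrative assumptions, not the authors' code.

TRIAL_RATE_HZ = 0.05      # trials (single applications or bursts) at 0.05 Hz
N_TRIALS = 14             # fourteen trials per experiment
BURST_SIZE = 5            # five glutamate applications per burst
BURST_INTERVAL_S = 0.2    # 200 ms between applications (5 Hz)


def trial_onsets(n_trials=N_TRIALS, rate_hz=TRIAL_RATE_HZ):
    """Return the onset time (s) of each trial."""
    period_s = 1.0 / rate_hz
    return [i * period_s for i in range(n_trials)]


def burst_times(trial_onset_s, n=BURST_SIZE, interval_s=BURST_INTERVAL_S):
    """Return the times (s) of each glutamate application within one burst."""
    return [trial_onset_s + k * interval_s for k in range(n)]


def serotonin_onset(onsets, before_trial=6):
    """Return the time (s) midway between trial before_trial - 1 and before_trial.

    Trials are numbered from 1, as in the text.
    """
    previous = onsets[before_trial - 2]
    following = onsets[before_trial - 1]
    return (previous + following) / 2.0


onsets = trial_onsets()
ht_time = serotonin_onset(onsets)
first_post_ht_burst = burst_times(onsets[5])
```

Under these assumptions the 5-HT application falls about 10 s before the first post-5-HT trial, which matches the delay the authors cite when discussing COM-biphasic neurons.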

We also paired 5-HT with subthreshold glutamate applications, delivered individually (*n* = 9) or in bursts of five applications at 200 ms intervals (*n* = 8; **Figure 7**). In none of these experiments did 5-HT application lead to AP genesis in response to glutamate. Further, 5-HT failed to immediately enhance the integral of voltage responses to single (*p* = 0.30) or multiple (*p* = 0.22) glutamate applications. 5-HT did induce slight depolarization of the RMP during trials of single glutamate applications (in trial 7, by 0.9 ± 0.3 mV; *p* < 0.05), but not in response to bursts of glutamate, although the long (20 s) inter-trial interval limited our ability to detect subtle changes in RMP over time.
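The response-integral comparisons reported here can be sketched as follows. This section does not define the integral, so the sketch assumes it is the area of the voltage trace relative to RMP, computed with the trapezoidal rule, and that percent change is taken relative to a baseline value. All names and sample values are hypothetical.

```python
# Sketch of a response-integral calculation.
# Assumption: the integral is the area of (V - RMP) over the response window.
# This definition and every name below are illustrative, not from the source.


def response_integral(voltages_mv, rmp_mv, dt_s):
    """Trapezoidal integral (mV*s) of the voltage trace relative to RMP."""
    total = 0.0
    for v0, v1 in zip(voltages_mv, voltages_mv[1:]):
        total += 0.5 * ((v0 - rmp_mv) + (v1 - rmp_mv)) * dt_s
    return total


def percent_change(value, baseline):
    """Percent change of value relative to a non-zero baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# Hypothetical trace: 1 s held 10 mV above an RMP of -70 mV, sampled every 0.1 s.
trace_mv = [-60.0] * 11
baseline_integral = response_integral(trace_mv, rmp_mv=-70.0, dt_s=0.1)
change = percent_change(12.6, baseline_integral)
```

With these made-up numbers the baseline integral is 10 mV·s, and a post-5-HT integral of 12.6 mV·s would correspond to a 26% increase.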

Finally, we assessed the interaction of 5-HT and glutamatergic excitatory drive in COM-biphasic L5PNs (*n* = 6; **Figure 8**). As observed in COM-excited neurons, 5-HT failed to promote AP generation in response to subthreshold bursts of glutamate application (five applications at 5 Hz, repeated at 0.05 Hz). Instead, 5-HT hyperpolarized COM-biphasic L5PNs by 2.9 ± 0.8 mV (*p* < 0.05), and decreased response integrals by 39 ± 7% (*p* < 0.05). When paired with suprathreshold bursts of glutamate, 5-HT again failed to promote AP generation in COM-biphasic neurons (*p* = 0.8), or change response integrals (*p* = 0.4), but did induce transient hyperpolarization of 3.5 ± 0.8 mV (*p* < 0.05). The lack of obvious serotonergic inhibition in COM-biphasic L5PNs during these experiments may be the result of the long duration (∼10 s) between 5-HT application and the first post-5-HT glutamate trial, as in these neurons the inhibitory response to 5-HT during suprathreshold DC current injection lasted only 24 ± 6.6 s (*n* = 6). Together, these results confirm that serotonergic excitation of COM L5PNs is activity-dependent, promoted by suprathreshold, but not subthreshold, extrinsic excitatory drive.

#### **INTERACTION OF 1A AND 2A RECEPTORS**

We next used pharmacological approaches to test the interaction of 1A and 2A receptors in generating serotonergic responses in COM and CPn neurons. As previously reported (Avesar and Gulledge, 2012), serotonergic inhibition of CPn neurons was blocked by the 1A antagonist WAY 100635 (WAY, 30 nM; **Figure 9A**; *n* = 31). In the presence of WAY, 32% of CPn neurons (*n* = 10 of 31) exhibited an excitatory response to 5-HT during DC-current-evoked AP generation (**Figure 9B**). However, the magnitude of this excitation (61 ± 14% over baseline frequencies; *n* = 10) was less robust than that observed in COM-excited L5PNs (135 ± 12% over baseline frequencies; *n* = 59; *p* < 0.05, Student's *t*-test). Pharmacologically revealed excitation in CPn neurons was blocked by additional bath application of MDL 11939 (MDL, 500 nM; *n* = 7; **Figure 9A**), confirming the presence of functional 2A receptors in this subpopulation of CPn neurons.

Since 2A-dependent excitation is facilitated by extrinsic depolarizing drive, and given that 5-HT can increase the gain of L5PNs in the rat PFC (Araneda and Andrade, 1991; Zhang and Arsenault, 2005), we hypothesized that 2A receptors in CPn neurons might contribute to serotonergic responses when CPn neurons are driven with strong, rather than weak, excitatory drive. To test whether this was the case, we applied 5-HT to CPn neurons under two levels of excitatory drive. DC current was adjusted to generate low- (4.9 ± 0.4 Hz) or high- (9.1 ± 0.5 Hz) frequency baseline firing rates (**Figure 9C**). In both conditions, 5-HT generated hyperpolarizing responses without delayed biphasic excitation (**Figures 9C,D**). Surprisingly, *post hoc* pharmacological detection of 2A receptors was correlated with the duration of inhibitory responses only when 5-HT was paired with low-, rather than high-, frequency baseline AP generation (**Table 2**). When 5-HT was delivered during periods of low-frequency firing, the mean durations of serotonergic inhibition were 21 ± 3 s and 31 ± 3 s in "2A-responsive" and "2A-unresponsive" CPn neurons, respectively (*p* < 0.05; Student's *t*-test). On the other hand, when 5-HT was delivered during periods of high-frequency AP generation, the durations of spike inhibition were similar in 2A-responsive (17 ± 4 s) and 2A-unresponsive (20 ± 2 s) CPn neurons. In no cases did CPn firing rates increase above baseline levels following 1A-dependent inhibition. These results suggest that 2A receptors in CPn neurons can moderate serotonergic inhibition during periods

**FIGURE 4 | Serotonergic depolarization of COM-excited neurons is facilitated by simulated synaptic drive. (A)** Top traces: responses of a COM-excited L5PN to simulated synaptic currents delivered before (trial 5; left traces), immediately following (trial 6; middle traces), or 3 s after, 5-HT application (trial 7; right traces). Bottom traces: responses in a COM-excited L5PN in which no simulated synaptic current was delivered during trial 6 (middle trace; "blank" trial). The green bar indicates 5-HT application between trials 5 and 6 (see panel **B**). **(B)** Plots of the number of action potentials (APs) generated by simulated synaptic current injection (top), changes in RMPs (middle), and percent changes in voltage response integrals (bottom) in COM-excited L5PNs experiencing blank trials during trial 6 (red symbols; n = 21). Data are superimposed on those data from COM-excited L5PNs receiving simulated synaptic current on all trials (from **Figure 2B**, gray; n = 30). Timing of 5-HT application shown by green bar. Responses during trials 5, 6, and 7 are indicated with black, red, and blue bars, respectively. Black arrows indicate increases in AP number and response integral, but no change in RMP, during trial 7 when neurons were not exposed to current injection during trial 6. The red arrow points out the rapid depolarization of RMPs observed at the start of trial 8. Asterisks indicate significant changes from baseline values (p < 0.05).

**FIGURE 5 | Subthreshold simulated synaptic input does not facilitate serotonergic responses in COM-excited L5PNs. (A)** Responses of a COM-excited L5PN experiencing suprathreshold simulated synaptic input (top; action potentials truncated) or subthreshold simulated synaptic input (bottom). Baseline responses (trial 5, black) are superimposed with responses immediately after 5-HT application (trial 6, red), or 3 s later (trial 7, blue), as indicated in panel **C**. For subthreshold trials, current intensities were scaled to 80% of the minimum current necessary to elicit a single spike. **(B)** Plots of cumulative response integrals (binned at 100 ms intervals) for trials using suprathreshold (left) or subthreshold (right) simulated synaptic currents during trials 5 (black), 6 (red), and 7 (transparent blue). **(C)** Plots of changes in RMPs (top) and response integrals (bottom) for trials using subthreshold simulated synaptic input (red symbols; n = 14) superimposed on data from COM-excited L5PNs (from **Figure 2B**) that experienced suprathreshold simulated synaptic input (gray symbols; n = 30). The timing of 5-HT application is indicated by the green bar. Asterisks indicate significant changes from baseline values (p < 0.05).

L5PNs to single glutamate applications that were not facilitated by 5-HT application (top traces; n = 4 of 9) and responses of COM-excited L5PNs to single glutamate application in neurons that were facilitated by 5-HT application (middle traces; n = 5 of 9). Blue traces at left show baseline responses to glutamate, while red traces show responses to glutamate after 5-HT application, and green traces show responses to glutamate approximately two minutes after 5-HT exposure ("wash"). Bottom traces show responses to bursts of five glutamate applications (5 Hz) in baseline conditions (blue traces), after 5-HT application (red traces), and in wash

bursts of suprathreshold glutamate (gray symbols; top; p < 0.05). RMPs were significantly depolarized in neurons receiving bursts of glutamate only after the initial post-5-HT suprathreshold glutamate application (red arrow; middle; p < 0.05) while response integrals significantly increased immediately after 5-HT application in these neurons (bottom). 5-HT had no significant effect on AP generation or RMP in neurons receiving single suprathreshold applications of glutamate, but was associated with a slowly developing and highly variable increase in response integral. Asterisks indicate significant changes from baseline values (p < 0.05).

of limited excitatory drive, but that under normal conditions they do not generate excitatory or biphasic responses to 5-HT.

Finally, we tested whether 1A receptors participate in shaping excitatory serotonergic responses in COM-excited L5PNs (**Figure 10**). When COM-excited neurons were exposed to the 2A antagonist MDL (*n* = 10), inhibitory responses were never revealed. Further, baseline firing frequencies were not correlated with the magnitude of serotonergic excitation of COM neurons (*p* = 0.22; data not shown). Thus, despite a significant proportion of COM neurons exhibiting both 1A- and 2A-receptor-mediated responses to 5-HT (i.e., COM-biphasic L5PNs), COM-excited neurons appear to respond to 5-HT solely via activation of 2A receptors.

#### **DISCUSSION**

#### **ACTIVITY-DEPENDENT EXCITATION OF L5PNs**

Our results demonstrate that serotonergic excitation of COM neurons is enhanced when 5-HT is paired with suprathreshold extrinsic excitatory drive. This activity-dependent facilitation of 2A-mediated excitation appears to require more than one or two APs, as 5-HT did not consistently enhance the number of APs generated by single suprathreshold applications of glutamate. Our results are consistent with findings by Araneda and Andrade (1991) and Zhang and Arsenault (2005), who, in rat L5PNs, observed that 5-HT preferentially enhanced AP generation from strong depolarizing stimuli. Our results go further, showing that in the mouse mPFC, 5-HT preferentially enhances AP output in COM-excited neurons receiving significant suprathreshold depolarizing drive, but does not boost responses to subthreshold input.

We also found that 1A-receptor-dependent serotonergic inhibition is influenced by concurrent excitatory drive, albeit to a lesser extent than is serotonergic excitation, and in opposite directions in CPn and COM-biphasic neurons. The duration of 5-HT-induced spike cessation in CPn neurons was prolonged relative to hyperpolarizing responses generated at RMPs, even in CPn neurons exhibiting 2A-dependent responses in the presence of WAY (discussed below). This effect is not unexpected, given the greater driving force at depolarized membrane potentials for the G-protein-coupled inwardly rectifying potassium channels (GIRK channels) that underlie 1A-dependent inhibition (Andrade and Nicoll, 1987). On the other hand, in COM-biphasic neurons, hyperpolarizing responses generated at RMPs persisted longer than inhibition of APs during periods of suprathreshold DC current injection. This likely reflects the activity-dependent contribution of 2A receptors to serotonergic responses in these neurons; 2A-dependent excitation is expected to compete with, and limit the duration of, 1A-dependent inhibition when neurons experience suprathreshold excitatory drive.

In our experiments, we used brief (1 s) applications of exogenous 5-HT to characterize serotonergic responses in cortical neurons. During wakefulness, release of endogenous 5-HT likely generates longer-lasting stimulation of 5-HT receptors (Portas and McCarley, 1994; Sakai and Crochet, 2001). While our results suggest that COM-excited and most CPn neurons would respond to prolonged 5-HT exposure with purely excitatory and inhibitory responses, respectively, it is less clear how COM-biphasic and 2A-expressing CPn neurons might respond to tonic serotonergic stimulation. 2A receptors display a lower affinity for 5-HT, and are more prone to agonist-induced desensitization, than are 1A receptors (for review, see Zifa and Fillion, 1992), suggesting that tonic release of low concentrations of 5-HT may preferentially inhibit overall cortical output. However, 1A and 2A receptors are susceptible to heterologous desensitization (Zhang et al., 2001; Carrasco et al., 2007), raising the possibility of complex interaction among 1A and 2A signaling *in vivo*. Additional studies will be necessary to explore the response of COM and CPn neurons to tonic exposure to physiological concentrations of 5-HT, and to endogenous release of 5-HT *in vivo*.


(red), and approximately two minutes after 5-HT application ("wash"; green). Dashed-lines indicate RMP. **(B)** Plots of the number of action potentials (APs;


Asterisk indicates p < 0.05 vs CPn neurons without revealed 2A responses.

#### **INTERACTION OF 5-HT RECEPTORS**

The antagonistic interplay of 1A and 2A receptors in regulating behavior is well established (Berendsen and Broekkamp, 1990; Darmani et al., 1990; Willins and Meltzer, 1997; Carli et al., 2006), and direct interaction of 1A and 2A receptors within individual cortical pyramidal neurons has long been hypothesized (Ashby et al., 1994; Martín-Ruiz et al., 2001; Amargós-Bosch et al., 2004). Yet, while many L5PNs in the rodent PFC express both 1A and 2A receptors (Martín-Ruiz et al., 2001; Santana et al., 2004; Wędzony et al., 2008; Vázquez-Borsetti et al., 2009), most respond to 5-HT with unidirectional, 1A- or 2A-mediated, responses (Araneda and Andrade, 1991; Tanaka and North, 1993; Spain, 1994; Zhang, 2003; Benekareddy et al., 2010; Avesar and Gulledge, 2012), with 1A-mediated inhibition predominating in the mature cortex (Béïque et al., 2004; Puig et al., 2004, 2005). The prevalence of 1A-mediated inhibition may reflect the greater abundance of CPn L5PNs relative to COM L5PNs (Hattox and Nelson, 2007), and/or the activity-dependence of 2A-mediated excitation, as pyramidal neurons are generally quiescent *in vitro* and have reduced excitatory drive under anesthesia *in vivo* (Hentschke et al., 2005).

subthreshold glutamate applications. Asterisks indicate significant changes from baseline values (p < 0.05).

Our results also confirm that a significant proportion of CPn neurons (∼32%) express functional 2A receptors capable of generating modest excitatory responses when 1A receptors are pharmacologically blocked. This proportion is comparable to the proportion of rat prelimbic neurons found to coexpress 1A and 2A mRNA (∼41%; Santana et al., 2004) or protein (∼38%; Wędzony et al., 2008), and is similar to the proportion of COM neurons exhibiting 1A-mediated biphasic inhibition (∼35%; Avesar and Gulledge, 2012). Yet, even when present, 2A receptors have only limited impact in shaping serotonergic responses in CPn neurons. We found the presence of 2A receptors to influence the duration of inhibitory responses when 5-HT was delivered during periods of low, but not high, frequency AP generation. This contrasts with the activity-dependence of serotonergic excitation in COM neurons, and with the hypothesis that 2A receptors generally increase the gain of cortical pyramidal neuron output (Araneda and Andrade, 1991; Zhang and Arsenault, 2005). While more studies will be necessary to evaluate the role of 2A receptors in regulating excitability in CPn neurons, it is also possible that 2A expression primarily serves alternative functions, such as regulation of synaptic transmission and/or dendritic excitability in ways not readily observable in our experiments (Carr et al., 2002; Yuen et al., 2008; Zhong et al., 2008; Troca-Marín and Geijo-Barrientos, 2010).

One cortical circuit influenced by both 1A and 2A receptors provides positive feedback to serotonergic neurons in the dorsal raphe; injection of the selective 2A agonist DOI into the mPFC increases the local release of 5-HT (Martín-Ruiz et al., 2001; Puig et al., 2003). Since 2A receptors are expressed in subpopulations of brainstem projection neurons (Martín-Ruiz et al., 2001; Santana et al., 2004), one possibility is that DOI directly excites cortico-raphe neurons. However, 2A-dependent increases in 5-HT release required, and were mimicked by, intracortical glutamatergic transmission (Martín-Ruiz et al., 2001; Puig et al., 2003), suggesting a role for indirect excitation of cortico-raphe neurons from the directionally selective synaptic connectivity between 2A-expressing COM L5PNs and brainstem-projecting neurons (Morishima and Kawaguchi, 2006). Martín-Ruiz et al. (2001) also demonstrated that 1A receptor activation can suppress the effects of DOI, confirming an antagonistic relationship between 1A and 2A receptors in regulating cortical circuits, and the primacy of 1A receptors in regulating the overall output of L5PNs projecting to the brainstem.

#### **MECHANISMS OF SEROTONERGIC REGULATION OF L5PNs**

Although there is general consensus that 1A-dependent inhibition of cortical pyramidal neurons is mediated by Gi/o-coupled GIRK channels (Andrade et al., 1986; Lüscher et al., 1997), the mechanisms responsible for 2A-dependent excitation of cortical neurons remain uncertain. We previously observed that 2A-dependent excitation remains intact in the presence of blockers of fast synaptic transmission (Avesar and Gulledge, 2012). Similarly, 5-HT can enhance AP generation resulting from current injection or exogenous glutamate, suggesting that 5-HT has direct effects on the intrinsic excitability of COM L5PNs (see also Béïque et al., 2007). Yet, the ionic mechanisms responsible for intrinsic 2A-dependent excitation remain mysterious. One possibility is that 2A receptor activation suppresses potassium conductances facilitated by depolarization, including "M-like" currents (McCormick and Williamson, 1989; Tanaka and North, 1993; Zhang, 2003) and the calcium-dependent potassium conductances associated with slow afterhyperpolarizations (Araneda and Andrade, 1991; Pedarzani and Storm, 1993; Villalobos et al., 2005, 2011), although this latter effect appears to be limited at physiological temperatures (Spain, 1994; see also Gulledge et al., 2013). Another attractive possibility is that 2A receptors enhance voltage-sensitive cationic conductances similar to, or perhaps identical to, those mediating cholinergic excitation in prefrontal L5PNs (Haj-Dahmane and Andrade, 1996, 1999; Yan et al., 2009; but see Dasari et al., 2013). Less likely are direct actions of 5-HT on voltage-gated sodium or calcium channels, as 2A receptors are generally considered negative regulators of these conductances (Bayliss et al., 1995; Carr et al., 2002). While the ionic effectors mediating serotonergic excitation remain unknown, the ability to selectively target 2A-excited L5PNs (i.e., COM L5PNs) in the mouse mPFC may facilitate future studies focusing on 2A-dependent postsynaptic signal transduction.

#### **FUNCTIONAL IMPLICATIONS OF ACTIVITY-DEPENDENT SEROTONERGIC EXCITATION OF COM NEURONS**

Serotonergic input to the PFC facilitates behavioral inhibition, and depletion of prefrontal 5-HT produces impulsive behaviors in both animals and humans (Harrison et al., 1997; Winstanley et al., 2004b; Worbe et al., 2014). Studies in rodents have dissociated the roles of 1A and 2A receptors in regulating impulsive behavior, finding that 2A receptor agonists enhance (Koskinen et al., 2000; Winstanley et al., 2003; Blokland et al., 2005), while 1A agonists and 2A antagonists suppress (Higgins et al., 2003; Winstanley et al., 2003; Carli et al., 2004, 2006; Blokland et al., 2005; Fletcher et al., 2007) impulsivity. Thus, the effect of 5-HT on cortical circuits will depend, in part, on the net balance of 2A-dependent excitation of COM-excited neurons and 1A-dependent inhibition of COM-biphasic and CPn neurons. Our results suggest that pharmacological enhancement of 2A-dependent excitation, without enhanced 1A-dependent inhibition, may contribute to impulsivity via non-specific amplification of intracortical networks. On the other hand, blockade of 2A receptors, or activation of 1A receptors, is expected to reduce overall cortical drive. Given that most L5PNs in the adult neocortex exhibit 1A-mediated inhibition (Béïque et al., 2004; Avesar and Gulledge, 2012), and that 5-HT directly excites subpopulations of GABAergic interneurons via ionotropic 5-HT3 receptors (Morales and Bloom, 1997; Puig et al., 2004; Lee et al., 2010), increased cortical 5-HT might be expected to reduce impulsivity by limiting output from the PFC. This may well be the case, as serotonergic tone in the mPFC is negatively correlated with impulsivity (Barbelivien et al., 2008), and selective 5-HT reuptake inhibitors (SSRIs) that boost 5-HT levels in the mPFC (Jordan et al., 1994) generally reduce impulsivity (Baarendse and Vanderschuren, 2012). The activity-dependent serotonergic excitation of COM L5PNs described here may further help reduce impulsivity by restricting 2A-dependent amplification of cortical output to behaviorally relevant circuits.

#### **ACKNOWLEDGMENTS**

This work was supported by PHS grant R01 MH099054 (Allan T. Gulledge). The authors thank Vicky Puig for comments on the manuscript, Ken Orndorff for help with microscopy, and Corey Hill for technical assistance.

#### **REFERENCES**


volunteers. *Am. J. Psychiatry* 158, 1326–1328. doi: 10.1176/appi.ajp.158.8.1326


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 April 2014; accepted: 29 July 2014; published online: 26 August 2014. Citation: Stephens EK, Avesar D and Gulledge AT (2014) Activity-dependent serotonergic excitation of callosal projection neurons in the mouse prefrontal cortex. Front. Neural Circuits 8:97. doi: 10.3389/fncir.2014.00097*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Stephens, Avesar and Gulledge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Dopamine-enabled anti-Hebbian timing-dependent plasticity in prefrontal circuitry

## *Hongyu Ruan<sup>1,2‡</sup>, Taixiang Saur<sup>1,2†‡</sup> and Wei-Dong Yao<sup>1,2</sup>\**

<sup>1</sup> Harvard Medical School – New England Primate Research Center, Southborough, MA, USA

<sup>2</sup> Department of Psychiatry, Beth Israel Deaconess Medical Center, Boston, MA, USA

#### *Edited by:*

M. Victoria Puig, Massachusetts Institute of Technology, USA

#### *Reviewed by:*

Guillermo Gonzalez-Burgos, University of Pittsburgh, USA Marco A. Huertas, University of Texas Health Science Center at Houston, USA

#### *\*Correspondence:*

Wei-Dong Yao, Harvard Medical School – New England Primate Research Center, 1 Pine Hill Drive, Southborough, MA 01772-9102, USA e-mail: wei-dong\_yao@hms.harvard.edu

#### *†Present address:*

Taixiang Saur, McLean Hospital, Belmont, MA, USA

‡Hongyu Ruan and Taixiang Saur have contributed equally to this work.

Spike timing-dependent plasticity (STDP) of glutamatergic synapses is a Hebbian associative plasticity that may underlie certain forms of learning. A cardinal feature of STDP is its dependence on the temporal order of presynaptic and postsynaptic spikes during induction: pre–post (positive) pairings induce t-LTP (timing-dependent long-term potentiation) whereas post–pre (negative) pairings induce t-LTD (timing-dependent long-term depression). Dopamine (DA), a reward signal for behavioral learning, is believed to exert powerful modulations on synapse strength and plasticity, but its influence on STDP has remained incompletely understood. We previously showed that DA extends the temporal window of t-LTP in the prefrontal cortex (PFC) from +10 to +30 ms, gating Hebbian t-LTP. Here, we examined DA modulation of synaptic plasticity induced at negative timings in layer V pyramidal neurons in mouse medial PFC slices. Using a negative timing STDP protocol (60 post–pre pairings at 0.1 Hz, Δt = −30 ms), we found that DA applied during post–pre pairings did not produce LTD, but instead enabled robust LTP. This anti-Hebbian t-LTP depended on GluN2B-containing NMDA receptors. Blocking D1- (D1Rs), but not D2- (D2Rs) class DA receptors or disrupting cAMP/PKA signaling in pyramidal neurons also abolished this atypical t-LTP, indicating that it was mediated by postsynaptic D1R-cAMP/PKA signaling in excitatory synapses. Unlike DA-enabled Hebbian t-LTP that requires suppression of GABAergic inhibition and cooperative actions of both D1Rs and D2Rs in separate PFC excitatory and inhibitory circuits, DA-enabled anti-Hebbian t-LTP occurred under intact inhibitory transmission and only required D1R activation in excitatory circuits. Our results establish DA as a potent modulator of coincidence detection during associative synaptic plasticity and suggest a mechanism by which DA facilitates input-target association during reward learning and top-down information processing in PFC circuits.

**Keywords: STDP, Hebbian, dopamine, glutamate, reward, learning**

## **INTRODUCTION**

Spike timing-dependent plasticity (STDP) is a Hebbian synaptic learning rule that may underlie neural circuit remodeling and behavioral adaptations (Bi and Poo, 2001; Dan and Poo, 2006; Caporale and Dan, 2008; Feldman, 2012; Ganguly and Poo, 2013). In its canonical form, STDP depends on the temporal order and narrow window of presynaptic and postsynaptic spikes: pairings of pre–post spikes induce long-term potentiation (t-LTP) whereas post–pre spike pairings induce long-term depression (t-LTD; Magee and Johnston, 1997; Markram et al., 1997; Bi and Poo, 1998). At many synapses, induction of Hebbian STDP depends on postsynaptic *N*-methyl-D-aspartate receptors (NMDARs), a classical coincidence detector of presynaptic and postsynaptic discharges and a source of intracellular Ca<sup>2+</sup> influx needed for synaptic modifications (Caporale and Dan, 2008; Feldman, 2012). Different NMDAR subunits may differentially contribute to STDP; for example, GluN2A and GluN2B subunits have been shown to mediate t-LTP and t-LTD, respectively, in cultured hippocampal synapses (Gerkin et al., 2007), consistent with the different channel biophysics, synaptic localizations, and signaling mechanisms associated with these subunits (Riccio and Ginty, 2002; Cull-Candy and Leszkiewicz, 2004; Lau and Zukin, 2007). Opposite to classical Hebbian STDP, atypical forms of STDP have also been observed at some synapses, where pre–post spikings drive t-LTD and post–pre spikings drive t-LTP (Han et al., 2000; Fino et al., 2005; Safo and Regehr, 2005; Letzkus et al., 2006; Lu et al., 2007; Fino et al., 2008). These STDP variants, referred to as anti-Hebbian, are relatively rare but also often depend on NMDARs, particularly anti-Hebbian t-LTP (Letzkus et al., 2006).
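The sign convention described above can be illustrated with a minimal sketch. The exponential window used here is a common modeling convention and is not taken from this article. The amplitudes, time constants, and the treatment of anti-Hebbian plasticity as a simple sign flip are all hypothetical simplifications chosen for illustration.

```python
import math

# Illustrative STDP window. The exponential form and every parameter value
# are assumptions for this sketch; none are measurements from the study.


def stdp_weight_change(dt_ms, a_plus=1.0, a_minus=0.5,
                       tau_plus_ms=20.0, tau_minus_ms=20.0,
                       anti_hebbian=False):
    """Return a relative weight change for one spike pairing.

    dt_ms is t_post - t_pre: positive for pre-post pairings and negative
    for post-pre pairings.
    """
    if dt_ms == 0:
        return 0.0
    if dt_ms > 0:
        # Hebbian rule: pre before post gives potentiation (t-LTP).
        change = a_plus * math.exp(-dt_ms / tau_plus_ms)
    else:
        # Hebbian rule: post before pre gives depression (t-LTD).
        change = -a_minus * math.exp(dt_ms / tau_minus_ms)
    # Anti-Hebbian variants are modeled here as a plain sign reversal.
    return -change if anti_hebbian else change


hebbian_pre_post = stdp_weight_change(10.0)
hebbian_post_pre = stdp_weight_change(-30.0)
anti_hebbian_post_pre = stdp_weight_change(-30.0, anti_hebbian=True)
```

In this toy model a post–pre pairing at −30 ms yields depression under the Hebbian rule and potentiation under the anti-Hebbian rule, mirroring the qualitative distinction drawn in the text.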

The quantitative rules of STDP are profoundly influenced by neuromodulation (Lin et al., 2003; Couey et al., 2007; Seol et al., 2007; Pawlak et al., 2010; Cassenaer and Laurent, 2012). A particularly important neuromodulator is dopamine (DA), believed to encode a reward signal during behavioral reinforcement and learning (Schultz, 2002; Wise, 2004). Recent studies suggest that DA, via the activation of D1 (D1Rs)- and D2 (D2Rs)-class receptors, is required for STDP induction in striatal medium spiny neurons (Pawlak and Kerr, 2008; Shen et al., 2008). DA has also been shown to broaden the temporal window of t-LTP at hippocampal (Zhang et al., 2009) and prefrontal cortex (PFC; Xu and Yao, 2010) synapses and, remarkably, convert t-LTD into t-LTP in cultured hippocampal neurons. At both synapses, DA-driven extension of the t-LTP timing window is mediated by postsynaptic D1R-cAMP/PKA signaling and is likely the result of a decreased t-LTP induction threshold (Zhang et al., 2009), suggesting an important role for DA in the control of associability of pre–post coincident stimuli that trigger STDP.

In many brain regions, LTP (including t-LTP) at glutamate synapses often cannot be induced when endogenous local GABAergic transmission is left unblocked, supporting a role for native GABAergic networks in constraining the excitability and plasticity of excitatory circuits (Wigstrom and Gustafsson, 1983; Bissiere et al., 2003; Meredith et al., 2003; Liu et al., 2005). Interestingly, DA can remove the powerful inhibitory constraint in both lateral amygdala and medial PFC (mPFC), gating t-LTP induction at glutamate synapses on principal cells (Bissiere et al., 2003; Xu and Yao, 2010). The dopaminergic gating is mediated through a mechanism by which DA decreases GABA release by acting on D2Rs localized at presynaptic GABAergic terminals of a subset of PFC interneurons (Mrzljak et al., 1996; Chiu et al., 2010; Xu and Yao, 2010).

In this study, we investigated DA modulation of STDP in the mouse mPFC, an association cortex that mediates cognition, reward, and memory (Fuster, 2008). Many of these functions are regulated by DA and mediated by synaptic strength in PFC excitatory circuits (Seamans and Yang, 2004). We previously reported that DA, via cooperative activation of D2Rs in inhibitory circuits and D1Rs in excitatory circuits, enables t-LTP in layer V PFC pyramidal neurons over a positive timing window of 0 to +30 ms. We now extend our earlier work by examining DA modulation of STDP at negative timing. Our results indicate that DA drives t-LTP at −30 ms, enabling a form of anti-Hebbian t-LTP that depends on postsynaptic D1-cAMP/PKA signaling and GluN2B-containing NMDARs in pyramidal neurons. In contrast to the high susceptibility of Hebbian t-LTP to GABAergic inhibition, DA-enabled anti-Hebbian t-LTP can be induced under intact inhibitory transmission.

#### **MATERIALS AND METHODS**

All procedures were conducted in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals and with an approved IACUC protocol from the Harvard Medical Area Standing Committee on Animals. Coronal slices (300 μm) were cut from the mPFC (containing the anterior cingulate or prelimbic cortices) of C57BL/6J mice (postnatal day 30–50) with a Leica VT1200 vibratome (Xu et al., 2009; Xu and Yao, 2010). Slices were incubated at room temperature in oxygenated artificial cerebrospinal fluid (ACSF) containing (in mM) 126 NaCl, 2.5 KCl, 2.5 CaCl2, 1.2 MgCl2, 25 NaHCO3, 1.2 NaH2PO4, and 25 D-glucose for at least 1 h before electrophysiological recording. Slices were then transferred to a recording chamber and secured with a harp during recording.

Somatic whole-cell patch-clamp recordings were performed on individual layer V PFC pyramidal neurons using an Axoclamp 2B amplifier (Molecular Devices). All recordings were made at 32°C, maintained with a TC344 Dual Automatic Temperature Controller (Harvard Apparatus). Cells were visualized with an Olympus BX51WI upright microscope under infrared illumination and recognized by their pyramidal shapes. Presynaptic stimuli (0.033 Hz, 200 μs), where necessary, were delivered at superficial layers II/III with a concentric tungsten electrode (FHC). In current-clamp recordings, pipettes were filled with (in mM) 130 K-gluconate, 8 NaCl, 10 HEPES, 0.4 EGTA, 2 Mg-ATP, and 0.25 GTP-Tris, pH 7.25 (with KOH) and recordings were made at the resting membrane potential of the cell. Input resistance was monitored throughout the experiment from the voltage response to a −200 pA hyperpolarizing current. In voltage-clamp experiments, electrodes were filled with (in mM) 142 Cs-gluconate, 8 NaCl, 10 HEPES, 0.4 EGTA, 2.5 QX-314 [*N*-(2,6-dimethylphenylcarbamoylmethyl)triethylammonium bromide], 2 Mg-ATP, and 0.25 GTP-Tris, pH 7.25 (with CsOH). Neurons were voltage clamped at −60 or −30 mV unless specified otherwise. Picrotoxin, (2R)-amino-5-phosphonopentanoate (APV), MK-801, 6-cyano-7-nitroquinoxaline-2,3-dione (CNQX), NVP-AAM077, and ifenprodil, where indicated, were either included in ACSF throughout experiments or added after baseline recordings were established. DA at 100 μM (in the presence of 20 μM ascorbic acid) was made fresh on the day of experiments. Drugs (e.g., DA or its agonists/antagonists) applied during STDP induction were washed in approximately 4 min before the start of pre–post or post–pre spike pairings and washed out approximately 12 min thereafter with a gravity-driven perfusion system (Harvard Apparatus). For intracellular dialysis of PKI (6–22; PKA inhibitor 6–22 amide; Calbiochem), we waited for at least 10 min after the patch rupture to allow its diffusion to synapses. Signals were filtered at 1 kHz, digitized at 10–50 kHz, and analyzed with pClamp 9.2 (Molecular Devices) or Mini Analysis 6 (Synaptosoft).

All data are expressed as mean ± SEM. Statistical analysis was performed using unpaired Student's *t*-tests or one-way ANOVA followed by Dunnett's *post hoc* tests, as specified in individual figures.
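As an illustration of the summary statistics named above, the following minimal sketch computes mean ± SEM and an unpaired, equal-variance (Student's) *t* statistic using only the Python standard library. The per-cell values are hypothetical and are not data from this study; the function names are likewise illustrative.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

def student_t(a, b):
    """Unpaired, equal-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2

# Hypothetical normalized PSP amplitudes (% of baseline), one value per cell.
control = [98.0, 104.0, 110.0, 95.0, 103.0]
with_da = [128.0, 141.0, 135.0, 150.0, 131.0]

control_mean, control_sem = mean_sem(control)
da_mean, da_sem = mean_sem(with_da)
t_stat, dof = student_t(with_da, control)
print(f"control: {control_mean:.1f} ± {control_sem:.1f}%")
print(f"DA:      {da_mean:.1f} ± {da_sem:.1f}%")
print(f"t = {t_stat:.2f}, df = {dof}")
```

Converting the *t* statistic to a *P* value requires the *t*-distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) could be used for that step.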

## **RESULTS**

### **DA ENABLES t-LTP IN NATIVE PFC CIRCUITS OVER A 60-ms TEMPORAL WINDOW**

We performed whole-cell recordings from visually identified layer V pyramidal cells on mPFC slices (**Figure 1A**). Postsynaptic potentials (PSPs), evoked by extracellular stimuli at layer II/III, were recorded at the resting membrane potential (−67.8 ± 1.0 mV). This was nearly identical to the reversal potential of inhibitory postsynaptic currents (IPSCs) in this preparation (∼−67 mV; Xu and Yao, 2010). At this resting level, PSPs were excitatory, mediated primarily by α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptors (AMPARs), and with little contamination by inhibitory postsynaptic potentials (IPSPs) evoked as a result of excitation of local or feedforward inhibitory pathways (Xu and Yao, 2010).
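For orientation, the chloride equilibrium potential implied by the recording solutions can be estimated with the Nernst equation. The sketch below is a back-of-the-envelope calculation, not the authors' measurement: it assumes that intracellular Cl<sup>−</sup> equals the 8 mM NaCl in the pipette solution (complete dialysis), counts extracellular Cl<sup>−</sup> from the ACSF salts, and ignores ion activities and liquid junction potentials.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_mV(conc_out_mM, conc_in_mM, valence, temp_C):
    """Nernst equilibrium potential in mV for an ion of the given valence."""
    temp_K = temp_C + 273.15
    return 1000.0 * R * temp_K / (valence * F) * math.log(conc_out_mM / conc_in_mM)

# Extracellular Cl- from the ACSF recipe (mM): NaCl + KCl + 2*CaCl2 + 2*MgCl2.
cl_out = 126 + 2.5 + 2 * 2.5 + 2 * 1.2
# Assumed intracellular Cl- (mM): the 8 mM NaCl in the pipette solution.
cl_in = 8.0

e_cl = nernst_mV(cl_out, cl_in, valence=-1, temp_C=32.0)
print(f"[Cl-]o = {cl_out:.1f} mM, E_Cl = {e_cl:.1f} mV")
```

Under these assumptions the estimate is several millivolts more negative than the reported IPSC reversal potential. A difference in this direction is not unexpected, because GABA<sub>A</sub> receptor channels are also permeable to bicarbonate, which shifts the reversal potential positive to the chloride equilibrium potential.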

Following a 10–15 min baseline recording, t-LTP was induced by 60 pairs (0.1 Hz) of presynaptically elicited PSPs and postsynaptic action potentials (APs) with variable pre–post (positive) or post–pre (negative) spike timing intervals (Δts; **Figure 1B**). The specificity, efficiency, and underlying mechanism of this STDP protocol to induce t-LTP at positive Δts have been established (Xu and Yao, 2010). Confirming our previous results, under conditions of unblocked GABAergic transmission (the GABAA receptor blocker picrotoxin was omitted from the extracellular bath), pre–post pairings at Δt = +30 ms did not induce a significant change in the amplitude of PSPs [105.7 ± 10.4%; *P* > 0.05 vs. baseline (101.5 ± 2.1%); **Figure 1C**]. However, when DA (100 μM) was added to the bath during pre–post pairings, the same protocol produced a lasting and significant increase in PSP amplitude (139.8 ± 6.4%; *P* < 0.01 vs. baseline; **Figure 1D**). Extending this finding to the negative Δt direction, we found that a classical t-LTD protocol (60 post–pre pairings, 0.1 Hz, Δt = −30 ms) did not induce LTD [93.5 ± 5.9%; *P* > 0.05 vs. baseline (99.8 ± 0.3%); **Figure 1E**], but instead induced significant LTP [132.0 ± 1.3%; *P* < 0.05 vs. baseline (99.5 ± 0.9%); **Figure 1F**] when DA was applied to the extracellular bath during post–pre pairings. At a more extended negative timing interval (Δt = −50 ms), the presence of DA had no significant effect on the outcome of synaptic plasticity (Saur and Yao, data not shown). The DA-enabled t-LTP induced by post–pre pairings at −30 ms was not caused by a delayed potentiation of PSPs by DA itself because bath-applied DA in the absence of PSP-AP pairings produced a reversible depression of PSPs (Xu and Yao, 2010). In addition, DA had little effect on the intrinsic excitability of these neurons (Xu and Yao, 2010). This atypical form of t-LTP is opposite to the canonical Hebbian t-LTP driven by pre–post spike pairs and can thus be considered anti-Hebbian. Together, our data indicate that DA opens up a 60-ms temporal window (from −30 to +30 ms) that is otherwise closed for Hebbian and anti-Hebbian synaptic plasticity in native PFC circuits.
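The induction protocol described above (60 pairings at 0.1 Hz with a fixed Δt) can be written out as an explicit event schedule. The sketch below is illustrative only: it defines Δt as the postsynaptic AP time minus the presynaptic stimulus time, which is an assumption, since the exact reference points (e.g., PSP onset vs. AP peak) are not restated here. The function name is hypothetical.

```python
def pairing_schedule(n_pairs=60, rate_hz=0.1, delta_t_ms=-30.0):
    """Return a list of (pre_time_s, post_time_s) for each pairing.

    delta_t_ms = t_post - t_pre: positive for pre-post, negative for post-pre.
    The earlier event of each pair is placed on the 1/rate_hz grid.
    """
    interval_s = 1.0 / rate_hz
    dt_s = delta_t_ms / 1000.0
    schedule = []
    for i in range(n_pairs):
        t_ref = i * interval_s
        if dt_s >= 0:
            schedule.append((t_ref, t_ref + dt_s))   # pre leads post
        else:
            schedule.append((t_ref - dt_s, t_ref))   # post leads pre
    return schedule

post_pre = pairing_schedule(delta_t_ms=-30.0)
first_pre, first_post = post_pre[0]
last_pre, last_post = post_pre[-1]
print(f"first pair: post at {first_post:.3f} s, pre at {first_pre:.3f} s")
print(f"last pair starts at {last_post:.0f} s "
      f"(about {last_post / 60:.1f} min after the first)")
```

With these parameters the last pairing begins 590 s after the first, so the induction period lasts roughly 10 min.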

### **DA-ENABLED ANTI-HEBBIAN LTP IS MEDIATED BY D1Rs, BUT NOT D2Rs, AND CAN BE INDUCED UNDER INTACT GABAergic TRANSMISSION BY D1R ACTIVATION ALONE**

We next investigated the DA receptor class(es) that mediate the negative-timing t-LTP (**Figure 2**). Under intact inhibitory transmission (**Figure 2A**), selective blockade of D1Rs by SCH23390 (10 μM; added to the perfusion bath 1 min before DA application) completely abolished the DA-enabled t-LTP at −30 ms (96.8 ± 4.6%; *P* > 0.05 vs. baseline; **Figures 2B,F**), suggesting a mandatory role for D1Rs in this t-LTP. In contrast, blocking D2Rs by including haloperidol (2 μM) during DA application failed to block this t-LTP (134.3 ± 6.1%; *P* > 0.05 vs. DA; **Figures 2C,F**), suggesting that D2Rs did not contribute to this DA-enabled t-LTP. This result was unexpected because we and others had previously shown that DA-enabled t-LTP induced at positive timings requires activation of D2Rs when GABAergic transmission is left unblocked, through a mechanism by which DA acts on presynaptic D2Rs at local GABAergic terminals to suppress inhibitory transmission (Bissiere et al., 2003; Xu and Yao, 2010). Thus, our result suggests that DA-enabled t-LTP induction at −30 ms did not require suppression of the endogenous GABAergic inhibition. Indeed, application of the D1R agonist SKF81297 (2 μM) alone (129.2 ± 7.0%; **Figures 2D,F**) in the absence of picrotoxin was sufficient to mimic the effect of DA in enabling t-LTP at Δt = −30 ms, whereas the D2R agonist quinpirole (10 μM) alone was insufficient (102.0 ± 4.4%; **Figures 2E,F**). Thus, like DA-enabled positive-timing t-LTP, DA-enabled negative-timing t-LTP is mediated by D1Rs; but unlike positive-timing LTP, the negative-timing t-LTP does not seem to be constrained by GABAergic transmission.

To further evaluate the role of GABAergic inhibition in negative-timing t-LTP, we compared the magnitude of DA-enabled −30 ms t-LTP in the absence and presence of picrotoxin at different time points following post–pre pairings (**Figure 3**). In the presence of picrotoxin, 60 pairs of postsynaptic APs and presynaptic EPSPs (excitatory PSPs) induced neither t-LTP nor t-LTD without bath-applied DA (95.2 ± 7.8%; **Figure 3A**), suggesting that this low-frequency, single-spike protocol was inefficient for LTD induction at −30 ms under control conditions. In contrast, when DA was supplied during pairings, this protocol induced robust t-LTP (146.0 ± 8.1%; **Figure 3B**). However, a direct comparison of this DA-enabled t-LTP with that in the absence of picrotoxin revealed a delayed occurrence of PSP potentiation when picrotoxin was omitted (**Figure 3C**). These experiments suggest some potential constraining effects of GABAergic inhibition on the development phase of t-LTP. Whether this was due to a transient potentiation of IPSPs following post–pre pairings that would shunt EPSPs or an inhibition of the t-LTP induction/expression mechanism by GABAergic transmission remains to be determined. Nevertheless, our data suggest that Hebbian and anti-Hebbian t-LTP in the PFC depend on different DA receptor subtypes and display differential susceptibility to endogenous GABAergic circuit inhibition.

### **DA-ENABLED ANTI-HEBBIAN LTP IS MEDIATED BY POSTSYNAPTIC D1R-cAMP/PKA SIGNALING IN PYRAMIDAL CELLS**

We next investigated the signaling mechanism underlying D1R-dependent anti-Hebbian t-LTP (**Figure 4**). Our previous study demonstrated that DA acts on D1Rs and downstream cAMP/PKA signaling in pyramidal neurons to drive t-LTP at Δt = +30 ms, an extended and normally ineffective timing interval (Xu and Yao, 2010). We hypothesized that a similar signaling mechanism, i.e., the postsynaptic D1R-cAMP/PKA pathway at excitatory synapses on pyramidal neurons, mediates the anti-Hebbian t-LTP, and thus studied SKF81297-enabled t-LTP at Δt = −30 ms in the presence of picrotoxin (50 μM). Under these conditions, GABAAR-mediated inhibitory influence was blocked and effects of DA receptors were limited to excitatory synapses. Bath application of SKF81297 (2 μM) during post–pre pairings enabled significant t-LTP (162.9 ± 21.26%; **Figure 4A**), thus fully mimicking the enabling effect of DA (**Figure 4D**). As expected, quinpirole (10 μM) failed to enable t-LTP at −30 ms (100.9 ± 3.5%; **Figures 4B,D**), further supporting that D1Rs, but not D2Rs, in pyramidal cells of excitatory microcircuits mediate this negative-timing t-LTP. Importantly, loading postsynaptic neurons with PKI (6–22) (20 μM), a membrane-impermeable inhibitory peptide of PKA, completely abolished the SKF81297-enabled −30 ms t-LTP (94.18 ± 14.98%; **Figures 4C,D**), suggesting that this t-LTP depends on postsynaptic cAMP/PKA signaling. Taken together, our results indicate that, similar to DA-enabled Hebbian t-LTP at +30 ms, DA-enabled anti-Hebbian t-LTP at −30 ms depends on postsynaptic D1Rs and downstream cAMP/PKA signaling in pyramidal cells.

### **DA-ENABLED ANTI-HEBBIAN t-LTP DEPENDS ON GluN2B-CONTAINING NMDARs**

Conventional LTP and classical Hebbian t-LTP, including DA-enabled positive-timing t-LTP illustrated in our previous study (Xu and Yao, 2010), depend on postsynaptic NMDARs (Caporale and Dan, 2008). Including the NMDAR antagonist APV (50 μM) in the bath completely abolished DA-enabled t-LTP at −30 ms (96.9 ± 6.7%; **Figure 5A**), indicating that this anti-Hebbian t-LTP is also NMDAR-dependent. GluN2A and GluN2B subunits have been suggested to play differential roles in LTP and LTD (Liu et al., 2004; Massey et al., 2004; but see Berberich et al., 2005; Weitlauf et al., 2005; Morishita et al., 2007). Thus, we further investigated which of these subunits might mediate DA-enabled negative-timing t-LTP, using ifenprodil, a GluN2B-specific inhibitor, and NVP-AAM077, a GluN2A-preferring competitive antagonist (Auberson et al., 2002). Previous studies have shown that at 0.4 μM or lower, NVP-AAM077 selectively inhibits GluN2A-NMDAR-mediated currents in response to synaptically released glutamate in rodent hippocampal and PFC synapses (Weitlauf et al., 2005; Zhao et al., 2005; Gerkin et al., 2007). We found that at 0.4 μM, NVP-AAM077 did not prevent SKF81297-enabled t-LTP at −30 ms (166.4 ± 14.96%; **Figures 5B,D**), suggesting that GluN2A is not required to support this negative-timing t-LTP. In contrast, ifenprodil (3 μM; 107.2 ± 13.71%; **Figures 5C,D**) completely blocked SKF81297-enabled t-LTP at −30 ms, suggesting that the negative-timing t-LTP depended on GluN2B. Together, our analysis indicates that DA-enabled anti-Hebbian t-LTP is mediated by GluN2B-containing NMDARs.

### **MODULATION OF SYNAPTIC GluN2A- AND GluN2B-NMDAR CURRENTS BY SKF81297**

GluN2A-NMDARs and GluN2B-NMDARs exhibit different channel conductance, kinetics, and subcellular localizations and are differentially required for t-LTP and t-LTD, respectively (Gerkin et al., 2007). A recent study also indicates that these NMDAR subtypes in the hippocampus are differentially modulated by D1Rs: GluN2B-NMDAR-mediated synaptic currents are potentiated whereas GluN2A-NMDAR currents are depressed (Varela et al., 2009). Because DA/D1R enables t-LTP at both +30 and −30 ms, normally ineffective timings, and GluN2B-NMDARs are required for DA/D1R-enabled t-LTP at −30 ms, it is possible that D1R activation enables t-LTP at these timings by enhancing GluN2B-NMDAR currents. To evaluate this possibility, we examined the modulation of synaptic GluN2A- and GluN2B-mediated NMDAR currents by D1R activation in PFC pyramidal neurons (**Figure 6**).

We recorded NMDAR-mediated excitatory postsynaptic currents (EPSCs) at −30 mV, a depolarized potential that permitted the removal of Mg<sup>2+</sup> blockade of NMDAR channels. Picrotoxin (50 μM) and CNQX (20 μM) were included in the extracellular bath to block GABAA receptor- and AMPA receptor-mediated responses, respectively. EPSCs recorded under these conditions were mediated predominantly by NMDARs, as MK-801 (20 μM), an open-channel NMDAR blocker, use-dependently inhibited synaptically evoked EPSCs (**Figure 6A**). In addition, the NMDAR-EPSCs were composed mainly of GluN2A and GluN2B currents, as sequential applications of NVP-AAM077 (0.4 μM) and ifenprodil (3 μM) nearly completely abolished the total NMDAR-EPSC (**Figure 6B**). Further supporting that GluN2A- and GluN2B-NMDAR currents were properly isolated, the NVP-AAM077-insensitive component (presumably GluN2B-NMDAR current) showed slower rise and decay compared to the ifenprodil-insensitive component (presumably GluN2A-NMDAR current; **Figure 6C**).

Following a 5–10 min baseline recording, D1Rs were activated by adding SKF81297 (2 μM) to the bath for 10 min, a protocol similar to that for t-LTP induction. SKF81297 produced a sustained and significant suppression of GluN2A-EPSCs (64.16 ± 6.93%; *P* < 0.05 vs. baseline), but a very modest, statistically insignificant reduction of GluN2B-EPSCs (89.94 ± 3.48%; *P* > 0.05; **Figures 6D,E**). These data suggest that D1R activation facilitates t-LTP at various timing intervals not by enhancing GluN2A- or GluN2B-mediated NMDAR currents, and that additional signaling mechanisms downstream of NMDAR-mediated Ca<sup>2+</sup> influx must be involved.
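The percentages above express EPSC amplitude relative to the pre-drug baseline. A minimal sketch of that normalization follows. The amplitude values and the averaging windows are hypothetical, since the text does not specify which time points were averaged, and amplitudes are treated as absolute magnitudes.

```python
import statistics

def percent_of_baseline(times_min, amplitudes_pA, baseline_window, test_window):
    """Mean amplitude in test_window as a percentage of the mean in baseline_window.

    Windows are (start, end) in minutes: start inclusive, end exclusive.
    """
    def window_mean(window):
        start, end = window
        selected = [a for t, a in zip(times_min, amplitudes_pA) if start <= t < end]
        if not selected:
            raise ValueError(f"no samples in window {window}")
        return statistics.mean(selected)

    return 100.0 * window_mean(test_window) / window_mean(baseline_window)

# Hypothetical EPSC magnitudes (pA); time 0 marks the start of drug application.
times_min = [-5, -4, -3, -2, -1, 20, 21, 22, 23, 24]
amplitudes_pA = [98.0, 102.0, 100.0, 101.0, 99.0, 65.0, 63.0, 64.0, 66.0, 62.0]

pct = percent_of_baseline(times_min, amplitudes_pA, (-5, 0), (20, 25))
suppression = 100.0 - pct
print(f"{pct:.1f}% of baseline ({suppression:.1f}% suppression)")
```

On this convention, a value of 64% of baseline corresponds to a 36% suppression, and a value near 90% to a reduction of about 10%.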

### **A CIRCUITRY-BASED MODEL OF DA MODULATION OF PFC SYNAPTIC PLASTICITY**

In summary, combined with our previous work (Xu and Yao, 2010), the above experiments support a working model in which DA drives both Hebbian and anti-Hebbian t-LTP in native PFC circuits (**Figure 7**). Under resting physiological conditions where GABAergic transmission is intact and the basal (tonic) DA level is low, no t-LTP can be elicited in layer V output neurons. When the DA level rises (as is expected during attentional or motivational arousal), t-LTP is enabled across a temporal window that ranges from −30 to +30 ms. DA suppresses inhibitory transmission by acting at D2Rs on GABAergic terminals to gate positive-timing Hebbian t-LTP. This D2R-mediated disinhibition alone is sufficient to drive t-LTP at Δt = +10 ms. However, induction of t-LTP at +30 ms, a substantially extended, normally ineffective positive timing, also requires activation of the postsynaptic D1R-cAMP/PKA pathway in pyramidal neurons, suggesting a need for cooperative actions of D1Rs and D2Rs in separate inhibitory and excitatory microcircuits. In contrast, DA-enabled t-LTP at −30 ms requires only the activation of postsynaptic D1R-cAMP/PKA signaling in excitatory microcircuits, regardless of the presence of endogenous GABAergic inhibition. Thus, DA "opens" a 60 ms timing window that is otherwise "closed" for associative synaptic plasticity in prefrontal circuits.
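The working model summarized above can be restated as a small lookup table. The sketch below is only a compact encoding of the statements in the text for intact GABAergic transmission; it covers only the timing intervals reported here or in the earlier study, and the names `Outcome`, `WITH_DA`, and `predicted_outcome` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional

class Outcome(NamedTuple):
    t_ltp: bool                         # is t-LTP expected?
    required_receptors: FrozenSet[str]  # DA receptors reported as required

# Timings (ms) described in the text, with DA present and inhibition intact.
WITH_DA = {
    10: Outcome(True, frozenset({"D2R"})),          # disinhibition alone suffices
    30: Outcome(True, frozenset({"D1R", "D2R"})),   # cooperative D1R + D2R action
    -30: Outcome(True, frozenset({"D1R"})),         # postsynaptic D1R alone
    -50: Outcome(False, frozenset()),               # no significant effect of DA
}

def predicted_outcome(delta_t_ms: int, da_present: bool) -> Optional[Outcome]:
    """Return the model's prediction, or None if the timing is not described."""
    if delta_t_ms not in WITH_DA:
        return None
    if not da_present:
        return Outcome(False, frozenset())  # no t-LTP when the DA level is low
    return WITH_DA[delta_t_ms]

enabled = [dt for dt, outcome in WITH_DA.items() if outcome.t_ltp]
window_ms = max(enabled) - min(enabled)
print(f"DA-enabled window: {min(enabled)} to +{max(enabled)} ms ({window_ms} ms wide)")
```

Returning `None` for intervals that were not examined keeps the table from implying results that the experiments do not cover.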

## **DISCUSSION**

### **POSTSYNAPTIC D1 RECEPTORS AS COINCIDENCE MODULATORS**

The present study highlights a profound modulation of the quantitative rules of STDP by DA in the mouse PFC. The results support the notion that postsynaptic D1Rs, coupled to downstream cAMP/PKA signaling, are potent modulators of coincidence detection during associative synaptic plasticity. The normal temporal window for t-LTP induction in PFC excitatory synapses is approximately 10 ms (0 to +10 ms), which is extended by DA to +30 ms (Xu and Yao, 2010) and −30 ms (this study), resulting in a six-fold broadening. As in other synapses, NMDARs mediate DA-enabled t-LTP at both positive and negative timings across the window in these PFC synapses. However, activation of D1Rs by SKF81297 suppresses, rather than potentiates, both GluN2A- and GluN2B-mediated NMDAR currents. This result suggests that DA extends the t-LTP window not by modulating NMDAR channels *per se*, but by acting on downstream signaling mechanisms that control t-LTP induction, similar to that seen in hippocampal neurons (Zhang et al., 2009).

The D1R-mediated inhibition of GluN2A-NMDARs and GluN2B-NMDARs contrasts with the result from CA1 pyramidal cells in the mouse hippocampus, where these currents are oppositely regulated by D1Rs (Varela et al., 2009). Brain region differences in NMDAR compositions (Zhao et al., 2005; Wang et al., 2008) and DA signaling details, as well as variations in experimental conditions, might contribute to the discrepancy. The suppression of both GluN2A-NMDAR and GluN2B-NMDAR currents by SKF81297 seems surprising because previous studies have shown that low-concentration SKF81297 potentiates synaptic NMDAR-EPSCs (Seamans et al., 2001). However, DA modulation of NMDARs in the PFC is known to be complex, and many factors, including drug types and concentrations, influence the result (Seamans and Yang, 2004). For example, SKF81297 is known to exert an inverted-U dose-dependent modulation of NMDAR activity, where low doses potentiate but high doses inhibit it (Seamans and Yang, 2004). It would be important in the future to further determine the factors that contribute to D1 modulation of PFC NMDARs under different conditions.

**FIGURE 7** (legend, partial). t-LTP is absent when tissue DA level is minimal. When its concentration rises, DA can gate Hebbian t-LTP across a timing window of 0 → +30 ms. However, the mechanisms of t-LTP induction at different timings vary: at Δt = +10 ms, DA gates t-LTP induction through suppression of presynaptic GABA release by activating D2Rs at GABAergic terminals. At Δt = +30 ms, DA gates t-LTP induction through both suppression of presynaptic GABA release via D2Rs and postsynaptic activation of cAMP/PKA signaling downstream of D1Rs, highlighting the need for concurrent activation of both D1Rs and D2Rs in separate excitatory and inhibitory circuits. In contrast, negative-timing t-LTP can be gated by DA as well, but this form of anti-Hebbian t-LTP can be induced by activating postsynaptic D1Rs alone without the need to suppress GABAergic transmission involving presynaptic D2Rs in inhibitory circuits. Consequently, circuit cooperativity is not necessary for DA-enabled anti-Hebbian t-LTP. PN, pyramidal neurons; IN, interneurons; L2/3, layer 2/3; L5, layer 5.

What downstream mechanisms might be targeted by DA to drive t-LTP at negative timings? The dependence of −30 ms t-LTP on GluN2B-NMDARs, but not GluN2A-NMDARs, indicates that DA acts on GluN2B-mediated cellular signaling. Perhaps due to their unique subcellular localization, i.e., extrasynaptic (Bliss and Schoepfer, 2004; which is yet to be confirmed in the PFC by ultrastructural studies), GluN2B-NMDARs have been considered especially suitable for detection of post–pre spiking pairs, transducing negatively correlated synaptic activity patterns to LTD (Gerkin et al., 2007). Compared to GluN2A-NMDARs, GluN2B-NMDARs undergo a slower Mg<sup>2+</sup> unblockade by back-propagating APs (bAPs; Clarke and Johnson, 2006), have a lower open channel probability (Chen et al., 1999), and permit less Ca<sup>2+</sup> influx, favoring the induction of LTD possibly by activating protein phosphatases 1 (PP1) and 2B (PP2B/calcineurin; Mulkey et al., 1994; Morishita et al., 2001). DA can inhibit PP1 and activate CaMKII, an essential signaling molecule required for most forms of LTP (Malenka and Bear, 2004), in the synapse through the D1R-cAMP/PKA-Inhibitor I/DARPP-32 pathway (Greengard et al., 1999), thus converting a "would-be-LTD" elicited by negative timing stimuli to LTP. Not necessarily mutually exclusive, D1R-cAMP/PKA signaling could also modulate voltage-sensitive dendritic ion conductances (Seamans and Yang, 2004) to influence the non-linear interaction of bAPs and subsequent EPSPs (Johnston et al., 1999), generating Ca<sup>2+</sup> influx patterns that favor t-LTP. Regardless of the mechanisms, our data indicate that DA has a potent role in postsynaptic coincidence detection during STDP, markedly broadening the temporal window for timing-dependent LTP induction.

### **HEBBIAN vs. ANTI-HEBBIAN t-LTP IN PFC CIRCUITS**

In Hebb's (1949) original postulate, a lasting increase in synaptic strength occurs if repeated presynaptic firing precedes and contributes to firing of postsynaptic cells. The canonical form of STDP, especially the "LTP arm," is considered Hebbian because plasticity is induced by repeated pairings of pre–post discharges. In this regard, the DA-enabled t-LTP at −30 ms in our study is "anti-Hebbian." Similar forms of anti-Hebbian t-LTP have also been observed at several other synapses, including distal synapses between layer II/III and V pyramidal neurons in the somatosensory cortex (Letzkus et al., 2006), excitatory synapses onto striatal medium spiny neurons and cholinergic interneurons (Fino et al., 2005, 2008), and synapses between cultured hippocampal neurons (Zhang et al., 2009). Importantly, the anti-Hebbian t-LTP described here and elsewhere (Letzkus et al., 2006) depends on activation of postsynaptic NMDARs, suggesting that it is still associative by nature. This STDP variant contrasts sharply with a non-associative, NMDAR-independent form of anti-Hebbian LTP in hippocampal interneurons that depends on hyperpolarization and Ca<sup>2+</sup>-permeable AMPARs (Lamsa et al., 2007).

Our protocol for anti-Hebbian t-LTP involves pairing 60 post–pre spikes at 0.1 Hz with Δt = −30 ms, a straightforward correlate of our t-LTP protocols (60 pre–post pairs at 0.1 Hz, +10 to +30 ms). Interestingly, while the positive-timing protocols are effective in inducing robust LTP in the absence of DA, the negative-timing protocol is ineffective in inducing LTD. Given that similar negative-timing protocols are effective in LTD induction in other DA target areas (Pawlak et al., 2010), the inability of our protocol to induce LTD in the PFC is surprising. Our data suggest that PFC plasticity mechanisms are rather unique. Future studies are needed to establish effective t-LTD protocols in the PFC under control conditions, and it will be interesting to see whether such t-LTD can be converted to anti-Hebbian t-LTP by DA, as is the case for hippocampal synapses (Zhang et al., 2009).

Our study provides evidence that Hebbian and anti-Hebbian t-LTP are differentially regulated by GABAergic inhibitory circuits. LTP, both conventional high-frequency stimulation (HFS) induced and positive timing-dependent, is susceptible to GABAergic inhibition (Wigstrom and Gustafsson, 1983; Bissiere et al., 2003; Meredith et al., 2003; Liu et al., 2005; Tully et al., 2007), suggesting that Hebbian LTP is constrained by the inhibitory network under native conditions. Consistent with this view, we recently showed that the induction of positive-timing t-LTP in PFC layer V neurons requires suppression of GABAergic transmission (Xu and Yao, 2010). In contrast, our current findings indicate that negative-timing t-LTP can be induced, albeit with a more delayed time course, without suppressing endogenous inhibitory transmission, suggesting that GABAergic circuits have a less constraining effect on anti-Hebbian t-LTP. The differential effects of GABA on Hebbian and anti-Hebbian t-LTP may be attributed to differences in the timing of GABA release in pre–post and post–pre pairings. In our experiments, GABA release is likely associated with activation of the cortical feedforward inhibitory pathway by presynaptic layer II/III stimulation. Although GABA is unlikely to influence dendritic membrane properties at resting state because of the near identical resting membrane potential and Cl<sup>−</sup> reversal potential in PFC pyramidal neurons, it may differentially impact dendritic depolarization during pre–post or post–pre pairings. Specifically, GABA-mediated IPSPs shunt EPSPs on the rising phase of bAPs during pre–post pairings whereas IPSPs curtail EPSPs on the falling tail of bAPs during post–pre pairings. As a consequence, GABA exerts different effects on EPSP, bAP, and their non-linear summation under the two timing conditions, resulting in differential activation of NMDARs and Ca<sup>2+</sup> influx dynamics that could dictate whether LTP or LTD will be induced. Indeed, GABA has been shown to influence dendritic depolarization and modify the balance of NMDARs and voltage-sensitive Ca<sup>2+</sup> channels at corticostriatal synapses, where it controls the polarity of STDP (Paille et al., 2013). We note, however, that all our experiments were conducted in the absence of GABAB receptor antagonists; thus, potential effects of these receptors, especially presynaptic autoreceptors (Davies et al., 1991), in anti-Hebbian t-LTP cannot be excluded.

### **PHYSIOLOGICAL RELEVANCE OF ANTI-HEBBIAN t-LTP**

The DA hypothesis of reward learning posits that DA serves as an instructing signal that enables and/or facilitates synaptic modifications to reinforce ongoing associative adaptive behaviors and mnemonic processes (Schultz, 2002; Wise, 2004). The profound effects of DA on STDP in the PFC support the emerging tri-component STDP learning rule that neuromodulators can potently influence the gating, polarity, shape, timing window, and other quantitative parameters of STDP (Pawlak et al., 2010). Importantly, our results suggest that the effect of DA is always facilitating, regardless of the temporal order of pre- vs. postsynaptic spiking. This provides a mechanism of spatial and temporal binding of active but not necessarily causally correlated inputs to activated DA afferents to strengthen these inputs. Anti-Hebbian t-LTP may serve to strengthen late-spiking inputs which would otherwise have been weakened under Hebbian STDP, attaching necessary motivational salience to these inputs. Prefrontal layer V neurons receive inputs from other cortical regions as well as thalamocortical and hippocampal pathways and process top-down information from these regions. Implementation of both Hebbian and anti-Hebbian t-LTP by these neurons may prove advantageous in the effective association and integration of cortical, thalamic, and hippocampal information to guide behavioral adaptation. However, in computational models that assign importance to STDP for learning and memory, generation of both LTP and LTD is typically considered relevant. Thus, mechanisms that can weaken the potentiated synapses on these neurons should exist. Additional studies will be required to define how timing of DA release, local concentration and dynamics of DA transients, and DA receptor distributions at target dendritic spines shape the STDP window and polarity, in particular t-LTD. 
Incorporating these mechanistic details can improve the current neural network models (Baras and Meir, 2007; Florian, 2007; Izhikevich, 2007; Fremaux et al., 2010) of learning and reward, which in turn, will deepen our understanding of the roles of DA in normal reward and motivation as well as in pathological conditions, such as addiction, depression, and schizophrenia.
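To make the contrast with canonical STDP concrete, the toy sketch below compares a textbook asymmetric STDP rule with a DA-gated rule in which any pairing inside the ±30 ms window potentiates. It is not one of the cited network models and not a fit to the data reported here: the amplitudes, the time constant, the flat window shape, and the function names are all illustrative assumptions.

```python
import math

def canonical_stdp(delta_t_ms, a_plus=0.01, a_minus=0.01, tau_ms=20.0):
    """Textbook asymmetric STDP: pre-before-post potentiates, post-before-pre depresses."""
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_ms)
    return 0.0

def da_gated_rule(delta_t_ms, da_present, gain=0.01, window_ms=30.0):
    """Toy DA-gated rule: with DA, pairings within +/- window_ms potentiate;
    without DA, or outside the window, the weight is unchanged."""
    if da_present and abs(delta_t_ms) <= window_ms:
        return gain
    return 0.0

delta_ts = [-30.0, -10.0, 10.0, 30.0]
canonical = [canonical_stdp(dt) for dt in delta_ts]
gated = [da_gated_rule(dt, da_present=True) for dt in delta_ts]

for dt, c, g in zip(delta_ts, canonical, gated):
    print(f"dt = {dt:+.0f} ms: canonical {c:+.4f}, DA-gated {g:+.4f}")
```

In this caricature, late-spiking inputs (negative Δt) are weakened by the canonical rule but strengthened by the DA-gated rule, which is the qualitative point made above. Note that the gated rule contains no depression term, so a network model built on it alone would need a separate mechanism to weaken synapses.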

#### **AUTHOR CONTRIBUTIONS**

Hongyu Ruan, Taixiang Saur, and Wei-Dong Yao designed research; Hongyu Ruan and Taixiang Saur performed research; Hongyu Ruan, Taixiang Saur, and Wei-Dong Yao analyzed data; and Wei-Dong Yao wrote the paper.

#### **ACKNOWLEDGMENTS**

We thank members of the laboratory for comments and discussions and Ms. Donna Reed for editorial assistance. This study was supported by National Institutes of Health grant DA032283 (Wei-Dong Yao) and National Center for Research Resources grant OD011103 (New England Primate Research Center). We thank Dr. Y. P. Auberson at Novartis Pharma AG, Basel for the generous gift of NVP-AAM077.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 February 2014; accepted: 02 April 2014; published online: 23 April 2014. Citation: Ruan H, Saur T and Yao W-D (2014) Dopamine-enabled anti-Hebbian timing-dependent plasticity in prefrontal circuitry. Front. Neural Circuits 8:38. doi: 10.3389/fncir.2014.00038*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Ruan, Saur and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Nicotinic modulation of cortical circuits

#### **Sergio Arroyo†, Corbett Bennett† and Shaul Hestrin\***

Department of Comparative Medicine, Stanford University School of Medicine, Stanford, CA, USA

#### **Edited by:**

Allan T. Gulledge, Geisel School of Medicine at Dartmouth, USA

#### **Reviewed by:**

Bruno Cauli, CNRS and UPMC, France Randy M. Bruno, Columbia University, USA

#### **\*Correspondence:**

Shaul Hestrin, Department of Comparative Medicine, Stanford University School of Medicine, Edwards R314, 300 Pasteur Drive, Stanford, CA 94305, USA e-mail: shestrin@stanford.edu

†These authors have contributed equally to this work.

The ascending cholinergic neuromodulatory system sends projections throughout cortex and has been shown to play an important role in a number of cognitive functions including arousal, working memory, and attention. However, despite a wealth of behavioral and anatomical data, understanding how cholinergic synapses modulate cortical function has been limited by the inability to selectively activate cholinergic axons. Now, with the development of optogenetic tools and cell-type specific Cre-driver mouse lines, it has become possible to stimulate cholinergic axons from the basal forebrain (BF) and probe cholinergic synapses in the cortex for the first time. Here we review recent work studying the cell-type specificity of nicotinic signaling in the cortex, synaptic mechanisms mediating cholinergic transmission, and the potential functional role of nicotinic modulation.

**Keywords: cholinergic, nicotinic receptors, interneuron, volume transmission, optogenetics**

## **INTRODUCTION**

Cholinergic axons from the basal forebrain (BF) innervate the entire cortex and are the main source of cortical acetylcholine (ACh; Mesulam et al., 1983; Rieck and Carey, 1984; Rye et al., 1984; Saper, 1984; Eckenstein et al., 1988). Endogenously released ACh activates metabotropic muscarinic and/or ionotropic nicotinic acetylcholine receptors (nAChRs) expressed on cortical neurons. In this review, we will focus on nAChR activation in the cortex.

Nicotinic receptors are pentameric proteins composed of particular combinations of subunits α2–α7 and β2–β4 (Cordero-Erausquin et al., 2000; Dani and Bertrand, 2007). In the cortex, two main types of nAChRs predominate: the low-affinity homomeric α7 receptor and the high-affinity heteromeric α4β2 receptor, though the α5 subunit is expressed to a lesser extent as well (Winzer-Serhan and Leslie, 2005; Kassam et al., 2008). Because these receptors exhibit distinct cationic permeabilities, agonist affinities, and desensitization properties (Dani and Bertrand, 2007), phasic activation of cholinergic BF axons can produce a temporally complex pattern of nAChR-dependent activation in cortical neurons depending on the identity and proportion of receptor subtypes being expressed.

## **CELL-TYPE SPECIFICITY OF NICOTINIC RECEPTOR EXPRESSION**

Several studies applying exogenous cholinergic agonists have demonstrated that only a fraction of cortical cells express functional nAChRs (summarized in **Figure 1**). In the supragranular layers, nicotinic receptors are expressed exclusively in inhibitory cells, including all L1 interneurons (Christophe et al., 2002; Gulledge et al., 2007) and a heterogeneous subset of L2/3 interneurons that co-express one or more of the following biochemical markers: vasoactive intestinal peptide (VIP), cholecystokinin, calretinin, calbindin, and neuropeptide Y (Porter et al., 1999; Gulledge et al., 2007). However, in two of the most prominent classes of inhibitory cells, parvalbumin (PV)-expressing and somatostatin (SOM)-expressing interneurons, nAChR expression is either absent or sparse (Porter et al., 1999; Gulledge et al., 2007). Interestingly, many if not all nAChR-expressing interneurons also express the ionotropic serotonergic receptor (5HT3, Férézou et al., 2002; Lee et al., 2010). Given that cholinergic cells in the BF and serotonergic cells in the raphe nucleus are both more active during wakefulness than during non-rapid eye movement sleep (Wu et al., 2004; Lee et al., 2005), the cortical targets on which these neuromodulatory systems converge may play a role in producing the pattern of activity associated with wakefulness.

Less is known about the pattern of nAChR expression in the lower cortical layers. Nicotinic receptors are expressed presynaptically on thalamocortical axons in L4 (Gil et al., 1997; Disney et al., 2007) where they have been shown to enhance sensory responses (Disney et al., 2007). In L5, nicotinic responses have been reported in low-threshold spiking (LTS; Xiang et al., 1998; but see Porter et al., 1999; Gulledge et al., 2007) but not fast-spiking (FS) interneurons (Xiang et al., 1998; Porter et al., 1999; Gulledge et al., 2007). Thus, in both supra- and infragranular cortex PV+ interneurons do not exhibit postsynaptic nicotinic responses, suggesting that some rules for nAChR expression in GABAergic cells may be shared between the upper and lower layers (Gulledge et al., 2007). Interestingly, in contrast to pyramidal cells in the supragranular layers, nicotinic responses have been demonstrated in L6 pyramidal neurons (Kassam et al., 2008) and L5 pyramidal neurons (Zolles et al., 2009; Poorthuis et al., 2013), although responses in L5 pyramidal neurons have not been universally reported (Porter et al., 1999; Gulledge et al., 2007).

## **BASAL FOREBRAIN (BF) CHOLINERGIC AXONS TARGET SPECIFIC CORTICAL CELL TYPES**

The properties of α7 and non-α7 receptors and their pattern of expression in cortical cells suggest that postsynaptic nicotinic responses may vary in their kinetics. In order to study the properties of nAChR-mediated responses in cortex, it is necessary to record responses to selective activation of cholinergic fibers. Several recent studies have used optogenetic tools to probe cholinergic synapses throughout the brain, including the hippocampus (Gu and Yakel, 2011), thalamus (Sun et al., 2013), interpeduncular nucleus (Ren et al., 2011), and striatum (English et al., 2012). In the cortex, we have recently shown that L1 interneurons, L2/3 late-spiking (LS) interneurons, and L2/3 choline acetyltransferase (ChAT)-expressing interneurons (a class of cells that also express VIP) exhibit nicotinic responses following photostimulation of channelrhodopsin-2 (ChR2)-expressing BF axons (Arroyo et al., 2012). The endogenous nicotinic response in L1 and L2/3 LS cells was mediated both by α7 and non-α7 nAChRs, while the responses in L2/3 ChAT/VIP-expressing cells exhibited only non-α7 receptor responses.

By eliciting endogenous release of ACh from BF cholinergic axons, we were able to characterize the cholinergic synapse in the cortex for the first time and identify the time course of nAChR-mediated responses. Interestingly, the kinetics of the responses mediated by α7 and non-α7 nAChRs differed by an order of magnitude (α7: rise time ∼3 ms, decay tau ∼5 ms; non-α7: rise time ∼35 ms, decay tau ∼200 ms; **Figure 2A**; Arroyo et al., 2012). Although the peak amplitude of the fast α7 response was often larger, more charge was transferred via the slower non-α7 response, leading to a slow barrage of disynaptic inhibition in upper layer pyramidal neurons and FS cells (Arroyo et al., 2012).
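The relationship between kinetics and charge transfer can be illustrated with a minimal sketch. It assumes, purely for illustration, that each response component follows a difference-of-exponentials waveform and that the reported rise times can be treated loosely as rise time constants; the peak amplitudes and the function name `charge_for_peak` are hypothetical and are not taken from the recordings.

```python
import math


def charge_for_peak(tau_rise_ms, tau_decay_ms, peak_pa):
    """Charge (pA*ms, i.e. fC) carried by a difference-of-exponentials
    current scaled to a given peak amplitude.

    Assumed waveform shape (an illustration, not a fit to data):
        exp(-t / tau_decay) - exp(-t / tau_rise), with tau_decay > tau_rise.
    """
    # Time at which the unscaled waveform reaches its maximum.
    t_peak = (tau_rise_ms * tau_decay_ms / (tau_decay_ms - tau_rise_ms)) * math.log(
        tau_decay_ms / tau_rise_ms
    )
    shape_peak = math.exp(-t_peak / tau_decay_ms) - math.exp(-t_peak / tau_rise_ms)
    # The integral of the unscaled waveform from 0 to infinity is tau_decay - tau_rise.
    return peak_pa * (tau_decay_ms - tau_rise_ms) / shape_peak


# Time constants from the text; treating "rise time" as a time constant is an assumption.
FAST = {"tau_rise_ms": 3.0, "tau_decay_ms": 5.0}
SLOW = {"tau_rise_ms": 35.0, "tau_decay_ms": 200.0}

# Hypothetical peak amplitudes: the fast component is given twice the slow peak.
fast_charge = charge_for_peak(peak_pa=20.0, **FAST)
slow_charge = charge_for_peak(peak_pa=10.0, **SLOW)

# Charge carried per unit of peak amplitude, slow relative to fast.
charge_ratio_per_unit_peak = charge_for_peak(peak_pa=1.0, **SLOW) / charge_for_peak(
    peak_pa=1.0, **FAST
)

if __name__ == "__main__":
    print(f"fast component charge: {fast_charge:.0f} fC")
    print(f"slow component charge: {slow_charge:.0f} fC")
    print(f"slow/fast charge per unit peak: {charge_ratio_per_unit_peak:.1f}")
```

Under these assumptions the slow component carries far more charge per unit of peak amplitude than the fast component, so a fast response with a larger peak can still transfer less total charge.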

## **MECHANISMS UNDERLYING NICOTINIC TRANSMISSION IN THE CORTEX**

Cholinergic cells in the BF project throughout the cortex where they form a dense web of presynaptic varicosities spanning all cortical layers. Numerous anatomical studies observed that a large fraction of these varicosities are not directly adjacent to postsynaptic structures, leading to the hypothesis that the cholinergic system operates primarily by diffuse release of neurotransmitter into the extracellular space ("volume transmission") (Mrzljak et al., 1993; Umbriaco et al., 1994; Lendvai and Vizi, 2008; Yamasaki et al., 2010), though others have emphasized the presence of classical synaptic contacts (Turrini et al., 2001).

The presence of both a slow nicotinic response mediated by the high affinity non-α7 receptor and a fast response mediated by the low affinity α7 receptor (Arroyo et al., 2012) led us to hypothesize that these two response components might be mediated by volume transmission and classical synaptic transmission, respectively. We performed several lines of experiments to test this possibility.

The trial-to-trial variability of synaptic responses depends in part on the number of release sites mediating transmission between presynaptic fibers and the postsynaptic cell (Manabe et al., 1993). Because non-synaptic receptors activated by volume transmission can sample release from many presynaptic sites, this form of signaling should be characterized by low variability (Szapiro and Barbour, 2007). In cells exhibiting dual component excitatory postsynaptic currents (EPSCs; **Figure 2A**) we found that the response variability of the slow component was severalfold smaller than that of the fast component as quantified by the coefficient of variation (CV; **Figures 2B, C**, Bennett et al., 2012). Moreover, the amplitudes of the fast and slow response components were not correlated across single trials (**Figure 2D**). These data are consistent with the notion that the slow response component is mediated by ACh release from many non-synaptic release sites while the fast response component is mediated by relatively fewer release sites onto classical postsynaptic terminals.
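The dependence of variability on the number of release sites can be sketched with a simple binomial release model. The model is an assumption introduced here for illustration: it treats release sites as independent, with a uniform release probability and quantal size, and the site counts and probability used below are hypothetical. Under that model the mean response is N·p·q and the variance is N·p·(1−p)·q², so CV = sqrt((1−p)/(N·p)), which falls as N grows.

```python
import math
import statistics


def binomial_cv(n_sites, p_release):
    """CV of the summed response for n_sites independent release sites.

    Assumes a simple binomial model with uniform release probability and
    quantal size; the quantal size cancels out of the CV.
    """
    return math.sqrt((1.0 - p_release) / (n_sites * p_release))


def empirical_cv(amplitudes):
    """CV of measured amplitudes: population standard deviation / mean."""
    return statistics.pstdev(amplitudes) / statistics.mean(amplitudes)


# Hypothetical site counts: few sites for a classical synaptic contact,
# many sites sampled by receptors sensing diffuse (volume) release.
few_sites_cv = binomial_cv(n_sites=4, p_release=0.3)
many_sites_cv = binomial_cv(n_sites=100, p_release=0.3)

if __name__ == "__main__":
    print(f"CV with 4 sites:   {few_sites_cv:.2f}")
    print(f"CV with 100 sites: {many_sites_cv:.2f}")
```

In this sketch a response that samples many sites shows a severalfold smaller CV than one driven by a handful of sites, which is the qualitative pattern described for the slow and fast components; it does not by itself establish where the receptors are located.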

**FIGURE 2 | Synaptic mechanisms underlying cholinergic transmission**. **(A)** Example dual-component response recorded under voltage clamp. Note the fast α7-mediated response followed by the slower non-α7 response. Inset, fast component is displayed on an expanded timescale. **(B)** Response amplitude for the slow component is plotted against the response amplitude of the fast component for two cells. Note that the fast component exhibits much more variability in amplitude relative to the slow component. **(C)** Variability of the two response components quantified as the coefficient of variation (CV). **(D)** Example single trial responses to photostimulation demonstrate a reliable slow component across trials in which the fast component varied widely. Orange traces represent trials in which a fast component was not detectable. **(E)** Dual-component nicotinic responses before and after application of the AChE inhibitor ambenonium. Inset, expanded timescale reveals no effect of the AChE blocker on the fast component. Blue circles and ticks represent photostimulation.

Responses mediated by volume transmission are highly sensitive to perturbations of transmitter clearance (Szapiro and Barbour, 2007). We found that application of an AChE inhibitor drastically prolonged the decay of the slow but not the fast nicotinic response (**Figure 2E**, Bennett et al., 2012). Moreover, application of exogenous AChE selectively attenuated the slow response (Bennett et al., 2012). Together, these data suggest that the fast and slow nicotinic responses are mediated by distinct synaptic mechanisms.

A conclusive determination of synaptic or non-synaptic transmission requires detailed anatomical reconstruction of receptor localization relative to presynaptic varicosities and a characterization of the kinetics of α7 and non-α7 receptors. To date, no anatomical study has examined the spatial relationship between nicotinic receptor subtypes and cholinergic varicosities in the cortex. Furthermore, though we were able to estimate the kinetics of α7 receptors for a range of ACh concentrations using nucleated patches, we did not observe non-α7 receptor responses in this preparation, and no previous studies report the kinetics of natively expressed non-α7 receptors.

Given the lack of anatomical data, we cannot exclude the possibility that α7 receptors are located perisynaptically and not at classical postsynaptic specializations, since both of these arrangements could produce high variability and insensitivity to AChE perturbation. Similarly, our data do not definitively rule out the possibility that non-α7 receptor-mediated currents are synaptic. However, the synapse mediating this response would have to fulfill several specific criteria. To explain our AChE perturbation results, the synaptic cleft would have to be constructed such that activation of postsynaptic receptors is primarily limited by hydrolysis of ACh by AChE rather than diffusion. This is remarkable given that diffusion of neurotransmitter out of a conventional synaptic cleft is extremely fast (concentration decay *t*<sub>1/2</sub> ∼0.15 ms; Eccles and Jaeger, 1958). Moreover, the slow rise time of the non-α7 receptor-mediated EPSC (20–80% in 35 ms) would require that these receptors exhibit exceptionally slow activation kinetics. Since both synaptic and nonsynaptic cholinergic varicosities are found in cortex, we believe that a more parsimonious explanation of our data is that non-α7 nicotinic receptors are located extrasynaptically where they bind ACh diffusing from nonsynaptic release sites.

## **FUNCTIONAL CONSEQUENCES OF NICOTINIC RECEPTOR ACTIVATION IN THE CORTEX**

Numerous studies have demonstrated that activation of nicotinic receptors is critical for normal cognition. Administration of nicotine has been shown to enhance working memory and attention and to alleviate the cognitive deficits observed in multiple neuropsychiatric conditions (Levin, 2002). Furthermore, loss of the β2 nAChR subunit, a necessary component of the high affinity non-α7 cortical nAChR (α4β2), has been shown to impair both learning (assayed by a passive avoidance task; Picciotto et al., 1995) and attention (assayed by the 5 choice serial reaction time test, 5CSRTT; Cordero-Erausquin et al., 2000; Guillem et al., 2011). Though knockout of the α7 nAChR subunit does not affect gross neurological function (Orr-Urtreger et al., 1997) or performance on the 5CSRTT (Grottick and Higgins, 2000; Howe et al., 2010; Guillem et al., 2011), recent evidence suggests that activating α7 nAChRs may alleviate the cognitive impairments associated with Alzheimer's disease and schizophrenia (Levin, 2013). Recently, it was shown that optogenetic activation of BF cholinergic axons in visual cortex enhanced performance on a visual discrimination task, while silencing BF cholinergic cells impaired performance (Pinto et al., 2013). However, whether this effect was mediated by nicotinic or muscarinic receptors was not investigated.

Several mechanisms have been proposed to account for the behavioral enhancements associated with nAChR activation. First, it has been suggested that nAChR activation may lead to amplification of sensory responses by modulating release from thalamocortical terminals. Indeed, in brain slices preserving thalamocortical connections, it was shown that release from thalamocortical terminals is enhanced by nicotine (Gil et al., 1997). A recent study extended this finding by showing that iontophoresis of nicotine in primate visual cortex augments responses to visual stimuli (Disney et al., 2007). Interestingly, nAChRs are present on thalamocortical axons targeting excitatory but not inhibitory cells in L4 (Disney et al., 2007; Kruglikov and Rudy, 2008), suggesting that ACh may play a role in modulating the balance of excitation and inhibition elicited by sensory stimuli.

Another line of studies suggests that nAChR activation may shape the spatiotemporal pattern of inhibition in cortex by differentially modulating the excitability of distinct classes of interneurons. Our data demonstrate that activation of cholinergic axons in brain slices elicits disynaptic inhibition in both pyramidal neurons and inhibitory FS cells (Arroyo et al., 2012). This nAChR-dependent inhibition of FS cells is consistent with a recent study showing that cholinergic activation following foot shock inhibits spiking in L2/3 PV+ neurons in auditory cortex (Letzkus et al., 2011). In this study, the authors show that a fraction of L1 interneurons exhibit an nAChR-dependent increase in spiking after foot shock and suggest that these cells mediate the inhibition observed in PV+ cells; however, whether other nAChR-expressing interneurons in L2/3 play a role in mediating cortical disinhibition was not definitively ruled out. Indeed, two recent studies suggest that another population of nAChR-expressing cells, VIP+ interneurons, preferentially targets SOM-expressing interneurons in the visual cortex (Pfeffer et al., 2013) and barrel cortex (Lee et al., 2013) and, to a lesser degree, PV+ interneurons (Dávid et al., 2007; Hioki et al., 2013; Pi et al., 2013). Thus, it is likely that nAChR activation produces disinhibition via both L1 interneurons (Christophe et al., 2002; Letzkus et al., 2011; Jiang et al., 2013) and L2/3 VIP+ interneurons (Lee et al., 2013; Pfeffer et al., 2013).

The substantial difference in kinetics between α7 and non-α7 nicotinic receptors together with their cell-type specific expression suggests that these two nAChRs may play distinct roles in modulating cortical activity. For example, temporally precise excitation mediated by α7 receptors may synchronize activity in α7 receptor-expressing interneurons. In contrast, slow excitation mediated by non-α7 receptors may facilitate modulatory pathways that unfold over longer time scales. Indeed, nAChR-expressing interneurons have been implicated in a number of slow processes, including inhibition mediated by postsynaptic GABA<sub>B</sub> receptors (Tamás et al., 2003), reduction of synaptic efficacy by activation of presynaptic GABA<sub>B</sub> receptors (Oláh et al., 2009; Chittajallu et al., 2013), and regulation of cerebral blood flow (Cauli et al., 2004).

## **FUTURE DIRECTIONS**

Ultimately, understanding how nAChR activation modulates cortical activity will require a more complete understanding of (1) the patterns of activity in cortically projecting cholinergic axons during behavior; (2) the functional roles of nAChR-expressing cortical neurons and their subsequent modulation by endogenously released ACh; and (3) the respective impact of fast and slow nicotinic modulation on cortical circuits. The recent proliferation of Cre-driver lines has allowed investigators to begin to probe the function of various classes of cortical cells, including some cell types known to express nAChRs. However, further work is needed to uncover how the function of these cortical neurons is modulated by activation/silencing of cholinergic fibers and blockade of specific receptor subtypes. Given the well-established role for nicotinic signaling in numerous neuropsychiatric diseases, a better understanding of the mechanisms underlying nicotinic modulation of cortical activity holds promise for the development of more effective therapeutic interventions.

#### **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 January 2014; accepted: 10 March 2014; published online: 28 March 2014. Citation: Arroyo S, Bennett C and Hestrin S (2014) Nicotinic modulation of cortical circuits. Front. Neural Circuits 8:30. doi: 10.3389/fncir.2014.00030*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Arroyo, Bennett and Hestrin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Cholinergic modulation of the medial prefrontal cortex: the role of nicotinic receptors in attention and regulation of neuronal activity

#### **Bernard Bloem<sup>1,2</sup>, Rogier B. Poorthuis<sup>3</sup> and Huibert D. Mansvelder<sup>1</sup>\***

<sup>1</sup> Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam, Vrije Universiteit, Amsterdam, Netherlands

<sup>2</sup> McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

<sup>3</sup> Max Planck Institute for Brain Research, Frankfurt am Main, Germany

#### **Edited by:**

Evelyn K. Lambe, University of Toronto, Canada

#### **Reviewed by:**

Vinay V. Parikh, Temple University, USA

Craig Edward Brown, University of Victoria, Canada

#### **\*Correspondence:**

Huibert D. Mansvelder, Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam, Vrije Universiteit, de Boelelaan 1085, 1081 HV, Amsterdam, Netherlands e-mail: h.d.mansvelder@vu.nl

Acetylcholine (ACh) release in the medial prefrontal cortex (mPFC) is crucial for normal cognitive performance. Although many studies have examined how ACh affects neuronal processing in the mPFC and thereby influences attention behavior, much remains unknown about how this occurs. Here we will review the evidence that cholinergic modulation of the mPFC plays a role in attention and we will summarize the current knowledge about the relationship between ACh receptors (AChRs) and behavior and how ACh receptor activation changes processing in the cortical microcircuitry. Recent evidence implicates fast phasic release of ACh in cue detection and attention. This review will focus mainly on the fast ionotropic nicotinic receptors and less on the metabotropic muscarinic receptors. Finally, we will review limitations of the existing studies and address how innovative technologies might push the field forward in order to gain insight into the relation between ACh, neuronal activity and behavior.

**Keywords: acetylcholine, nicotinic receptors, medial prefrontal cortex, attention, neurophysiology**

## **INTRODUCTION**

The prefrontal cortex (PFC) is thought to be important for the highest cognitive processes, including executive functioning (Alvarez and Emory, 2006; Euston et al., 2012), working memory (Funahashi, 2013), decision making (Euston et al., 2012), retrieval from long term memory (Rugg et al., 1996; Tomita et al., 1999), social behavior (Forbes and Grafman, 2010; Avale et al., 2011), emotion (Davidson and Irwin, 1999; Wallis, 2007), personality (Damasio et al., 1994; Kennis et al., 2013) and attention (Miller and Cohen, 2001; Euston et al., 2012). It is thought that subregions mediate different functions. In rodents, the medial part of the PFC (mPFC) has been shown to be important for goal-directed action (Killcross and Coutureau, 2003), working memory (Rossi et al., 2012) and attention (Muir et al., 1996; Passetti et al., 2002; Totah et al., 2009; Euston et al., 2012). This part of the PFC roughly corresponds to the dorsolateral PFC in humans and other primates (Uylings et al., 2003; Vertes, 2004, 2006; Farovik et al., 2008). Lesions of this region result in severe attentional deficits (Muir et al., 1996; Passetti et al., 2003; Kahn et al., 2012) and neuroimaging and electrophysiological studies have shown that this part of the brain is involved in behavioral tasks requiring sustained attention (Gill et al., 2000; Totah et al., 2009; Bentley et al., 2011). Moreover, increasing attentional load by reducing stimulus saliency or introducing distracters increases neuronal activity in the mPFC (Gill et al., 2000).

The PFC receives dense cholinergic innervation, and acetylcholine (ACh) is thought to play an important role in the PFC, especially in behavior requiring attention. ACh is a neurotransmitter that is produced in a small number of cells, but has widespread effects throughout the brain (Woolf and Butcher, 2011). Most important for ACh release in the cortex is the basal forebrain, a brain area composed of several cholinergic nuclei, including the nucleus basalis, the septum, the substantia innominata and the diagonal band of Broca (Mesulam, 1995; Zaborszky et al., 1999; Woolf and Butcher, 2011). In addition, ACh is produced in some midbrain nuclei, namely the pedunculopontine nucleus and laterodorsal tegmental area (Mesulam et al., 1983), and in sparsely distributed cholinergic interneurons (Eckenstein and Baughman, 1984; von Engelhardt et al., 2007). In contrast to its local production, the effects ACh exerts on brain networks are strong and widely distributed. Almost all regions of the brain are innervated by cholinergic neurons and many neurons and glial cells express ACh receptors (AChRs; Van der Zee and Luiten, 1999; Van der Zee and Keijser, 2011; Picciotto et al., 2012). However, it is currently not known how specific the projections of the neurons in the basal forebrain are (Fournier et al., 2004; Chandler and Waterhouse, 2012; Chandler et al., 2013).

To study the effects of ACh on behavior and cognition, researchers have used techniques to measure ACh levels, such as microdialysis and amperometry, and methods to manipulate the cholinergic system, using pharmacology, specific cholinergic lesions and optogenetic manipulations of ACh release. Together, these results indicate that ACh is crucial for attention (Bentley et al., 2011; Klinkenberg et al., 2011), arousal (Metherate et al., 1992; Détári et al., 1999; Platt and Riedel, 2011), learning and memory (Kilgard and Merzenich, 1998; Hasselmo, 2006; Gu et al., 2012) and the sleep-wake cycle (Deurveilher and Semba, 2011; Lin et al., 2011; Platt and Riedel, 2011). It is thought that the effect of ACh depends on its target areas (Everitt and Robbins, 1997; Bentley et al., 2011). In relation to the mPFC, ACh seems mostly involved in attention. Therefore, findings relevant to the role of ACh in attention will be discussed here.

Many studies have demonstrated that pharmacological interventions targeting the cholinergic system or lesions of the basal forebrain affect attention (Jones and Higgins, 1995; Mirza and Stolerman, 2000; Risbrough et al., 2002; Robbins, 2002; Pattij et al., 2007), in addition to other cognitive functions. However, due to the lack of specificity of these methods, it is hard to draw firm conclusions from these studies, since many processes and brain structures are manipulated simultaneously. Fortunately, it has more recently become possible to manipulate the cholinergic system with greater precision. Studies using local cholinergic lesions or drug administrations and local cholinergic measurements have provided a clearer picture of the role of ACh in the mPFC.

In this review, we will evaluate the evidence that ACh release in the mPFC is involved in attention. We review the role of AChRs, and in particular nicotinic acetylcholine receptors (nAChRs), in attention, as well as the way in which receptor activation modulates local neuronal activity. In addition, we will address the modulation of these processes by nicotine and smoking and the role of the cholinergic modulation of the mPFC in neuropsychiatric disorders. Finally, an outlook is provided concerning the new possibilities to study the role of ACh release in the mPFC, its relation to behavior and the mechanisms through which this occurs.

## **ACETYLCHOLINE IN THE MEDIAL PREFRONTAL CORTEX (mPFC)**

Several lines of evidence indicate that the cholinergic innervation of the mPFC is specifically involved in attention. First, local cholinergic lesions, using the specific immunotoxin 192 immunoglobulin G (IgG)-saporin, result in severely compromised performance in sustained attention tasks (Gill et al., 2000; Chudasama et al., 2004; Dalley et al., 2004). In addition, attention-related increases in neuronal activity in the mPFC were absent after cholinergic lesions (Gill et al., 2000).

Secondly, microdialysis studies indicate that attentional tasks are accompanied by increases in ACh concentrations in the mPFC (Passetti et al., 2000; Dalley et al., 2001) that are correlated to the current attentional demands (Kozak et al., 2006). Moreover, recent technological advances (Parikh et al., 2004) made it possible to measure ACh release on a finer timescale. This has revolutionized our understanding of the cholinergic modulation of cortical processes. In particular, the group of Martin Sarter (Parikh et al., 2007; Howe et al., 2013) demonstrated that, whereas cholinergic signaling was traditionally considered to be slow and tonic, there are actually fast transients of ACh in the mPFC during attention tasks. During cues that were detected, rapid elevations in ACh concentrations were observed in the mPFC, whereas in motor cortex, these "transients" were absent. These findings have demonstrated ACh release in relation to a specific cognitive operation and demonstrated that this attentional process involves ACh in the mPFC.

Furthermore, whereas most pharmacological studies concerning the role of AChRs affect many cognitive operations at the same time and cannot differentiate the effects on different brain regions, local infusion of pharmacological agents in the mPFC (Hahn et al., 2003b; Chudasama et al., 2004) can demonstrate an involvement of specific receptors in that region in a certain task. With this method, several groups have demonstrated important roles of the nicotinic (nAChR; Hahn et al., 2003b) and muscarinic (mAChR; Robbins, 2002; Chudasama et al., 2004) receptors in the mPFC in attentional processes.

Finally, it should be noted that the relationship between the mPFC and the basal forebrain is reciprocal. Whereas other cortical areas are also innervated by the basal forebrain, the mPFC is the major source of cortical projections to the basal forebrain (Zaborszky et al., 1997). Hence, it seems that the mPFC is located in a special position with regard to the basal forebrain and that the mPFC-basal forebrain system is critical in mediating sustained attention.

Given the important role of the cholinergic modulation of the mPFC in healthy individuals and the crucial involvement in many neuropsychiatric disorders, it is of great importance to understand the mechanisms by which ACh contributes to cognition and how it influences processing in the microcircuit underlying cognition. Although we know that the mPFC and ACh play crucial roles in the ability to focus our attention, very little is known about the exact mechanisms. In particular, the recently discovered phasic cholinergic modulation is very poorly understood. There have been many studies on tonic effects of ACh, suggesting that ACh acts as a neuromodulator and affects attention by increasing the excitability of networks (Picciotto et al., 2012). However, the recent finding that ACh is involved in attention not only through a tonic neuromodulatory role, but also in the mediation of specific cognitive events in single trials, namely cue detection, has raised the question of how short phasic ACh release affects processing in the mPFC network. Recent studies have shed light on how short applications of ACh affect processing in cortical networks and on the role these receptors play in attention. Because the timescale of nAChRs matches well with the timescale of the observed phasic release of ACh, most of this review will be devoted to the role of nAChRs in the modulation of processing and the enhancement of attention.

## **CHOLINERGIC INNERVATION OF THE MEDIAL PREFRONTAL CORTEX (mPFC)**

In order to understand the effects of ACh on cortical processing, it is crucial to first know the patterns of innervation. When antibodies for the ACh-generating enzyme, choline acetyltransferase (ChAT), became available in the 1980s, it quickly became clear that the entire cortical mantle is innervated densely with cholinergic axons (Kimura et al., 1980; Bigl et al., 1982; Mesulam et al., 1983; Woolf et al., 1983; Eckenstein and Baughman, 1984; Eckenstein et al., 1988; Wenk, 1997). It was demonstrated that most cholinergic axons originate from the basal forebrain, although cholinergic neurons are also present in the cortex itself (Eckenstein and Baughman, 1984; von Engelhardt et al., 2007). In addition, the PFC receives some fibers from the pedunculopontine nucleus and the laterodorsal tegmental area (Mesulam et al., 1983; Eckenstein et al., 1988), although the functional significance of this is unknown. Although the entire cortex is innervated by cholinergic axons, there are laminar differences. In general, layers I–III and layer V are most strongly innervated and layer IV the least. This is due to a layer specificity in the projections of the basal forebrain (Eckenstein et al., 1988). There are differences in this pattern between cortical areas, however, and in the PFC a clear laminar pattern is absent (Eckenstein et al., 1988).

In addition to the pattern of innervation, it is also crucially important to determine the mode of transmission. Recently it has been shown that there is both tonic and phasic cholinergic signaling in the mPFC (Parikh et al., 2007). Moreover, it has long been debated whether ACh functions through volume or synaptic transmission (Smiley et al., 1997; Sarter et al., 2009). Both aspects of transmission are crucial for determining the effects of ACh on the mPFC. Recent evidence indicates that most likely both are present (Parikh et al., 2007; Bennett et al., 2012) and that there is a complex interplay of tonic and phasic release, and volume and synaptic transmission, making the precise release parameters crucial for determining the effects on the mPFC.

## **ACETYLCHOLINE RECEPTORS**

There are two types of AChRs: the nAChR and mAChR. Both receptors allow ACh to change the electrical activity of the target cells and to affect other processes through intracellular signaling cascades (Dajas-Bailador and Wonnacott, 2004; Gulledge and Stuart, 2005; Intskirveli and Metherate, 2012; Thiele, 2013; Yakel, 2013). However, these receptors function in fundamentally different ways. The nAChR is a pentameric ionotropic receptor, belonging to the cystine-loop superfamily of receptors (Gotti and Clementi, 2004; Changeux, 2012). When ACh binds nAChRs, the channel opens and a direct cationic inward current occurs, which depolarizes the membrane. In contrast, the mAChR is a G-protein coupled receptor and functions through an intracellular signaling cascade (Bubser et al., 2012).

### **MUSCARINIC ACETYLCHOLINE RECEPTORS**

There are five different types of mAChRs (M1–M5), all of which are G-protein coupled receptors (Bubser et al., 2012). They can be divided into two principal types, based on the intracellular α subunit type of the G-protein they are bound to. The first main group is made up of the M1, M3 and M5 receptors which interact with Gq/11 proteins, whereas the second group includes M2 and M4 and interacts with Gi/o proteins (Brown, 2010).

In the cortex, mainly M1, M2 and M4 are present (Levey et al., 1991), although M4 has a considerably lower expression than the first two. Through a variety of intracellular signaling cascades, mAChR activation affects the functioning of many ion channels, resulting in changed conductances of mainly potassium and calcium channels (Thiele, 2013). In general, M1 activation results in a lower potassium conductance, whereas M2 and M4 activation results in an increase of potassium conductance and a decrease of calcium conductance. Gulledge and colleagues (Gulledge and Stuart, 2005; Gulledge et al., 2007, 2009) have demonstrated that cortical layer V pyramidal neurons are strongly modulated by M1 receptors in a complex fashion. Phasic ACh application hyperpolarized and/or depolarized these neurons, whereas tonic presence of ACh had the opposite effect. Importantly, the effects of mAChR binding, which are mediated by intracellular signaling pathways, have a slow timescale compared to the effects mediated by nAChRs, which result in a direct inward current with a fast onset and a shorter duration (Gulledge et al., 2007).

### **NICOTINIC ACETYLCHOLINE RECEPTORS**

nAChRs are ligand-gated ion channels with a pentameric structure, i.e., they are composed of five subunits. There are 12 neuronal subunits (α2–α10 and β2–β4) (Gotti and Clementi, 2004) and, consequently, there are many types of receptors that can be formed (Gotti et al., 2006). There are two main subfamilies of nAChRs. The first is the homopentameric receptors that are formed by five α subunits. Both ACh and nicotine, an exogenous ligand of the nAChR, bind to the interfaces of the opposite sides of the α subunits. Second, there are heteropentameric receptors that are composed of two α subunits, carrying the principal ligand binding site, and two β subunits, containing the complementary binding site (Gotti et al., 2006). In addition, there is a fifth subunit that does not contribute to ligand binding but which can nevertheless influence the characteristics of the receptor. In the cerebral cortex, there are only two main types of receptors present (Alkondon and Albuquerque, 2004). First, there are homopentameric receptors composed of five α7 subunits. Secondly, there are heteromeric receptors that contain two α4 subunits, two β2 subunits and a fifth subunit, which can be α4, β2 or α5 (Albuquerque et al., 2009). There are important differences between the different nAChRs and this also holds true for the two types present in the cerebral cortex.

All nAChRs are cation-selective channels, permitting a flow of Na<sup>+</sup>, K<sup>+</sup> and Ca<sup>2+</sup>, thereby depolarizing the membrane. However, there are substantial differences in the conductances for these individual ions in the different receptor types (Fucile, 2004). It has been shown that especially the homopentameric α7 nAChR is permeable to calcium and that the addition of the α5 subunit to the heteropentameric α4β2 nAChR greatly increases its calcium conductance (Fucile, 2004). Calcium conductance is an interesting property of nAChRs because it links nAChR activation to intracellular signaling pathways (Dajas-Bailador and Wonnacott, 2004; Gubbins et al., 2010) and because it mediates the effect of presynaptic nAChR stimulation on increased neurotransmitter release (Sharma and Vijayaraghavan, 2003; Dickinson et al., 2008). Although the α4β2 nAChR has a substantially lower calcium conductance, it should be noted that activation of this receptor can also induce intracellular calcium signaling through its association with voltage operated calcium channels (VOCCs; Dajas-Bailador and Wonnacott, 2004). Another important difference between the two main groups of nAChRs is their affinity for ACh (Clarke et al., 1985). In contrast to the heteropentameric receptors, which have a nanomolar affinity for ACh, homopentameric receptors have an affinity in the micromolar range (Gotti et al., 2006). This is one of the reasons why it has been suggested that homopentameric α7 receptors are located in synapses and that α4β2\* nAChRs (\* denotes the presence of a fifth accessory subunit) are located extrasynaptically and are activated by volume transmission (Bennett et al., 2012).
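To make the nanomolar versus micromolar affinity difference more tangible, the following is a minimal sketch that assumes simple equilibrium binding described by the Hill-Langmuir equation. The dissociation constants and the Hill coefficient of 1 are hypothetical values chosen only to represent the two affinity ranges; they are not taken from the cited studies, and receptor occupancy is not the same as channel activation or desensitization.

```python
def fractional_occupancy(conc_nm: float, kd_nm: float, hill: float = 1.0) -> float:
    """Hill-Langmuir fraction of receptors bound at a given agonist concentration (nM)."""
    if conc_nm < 0 or kd_nm <= 0 or hill <= 0:
        raise ValueError("concentration must be >= 0; Kd and Hill coefficient must be > 0")
    return conc_nm**hill / (conc_nm**hill + kd_nm**hill)


# Hypothetical dissociation constants, assumed only for illustration.
KD_HETEROMERIC_NM = 100.0      # stands in for a "nanomolar" affinity
KD_HOMOMERIC_NM = 100_000.0    # stands in for a "micromolar" affinity (100 µM)

if __name__ == "__main__":
    # Compare occupancy of the two assumed receptor classes across concentrations.
    for conc_nm in (10.0, 100.0, 1_000.0, 100_000.0):
        high = fractional_occupancy(conc_nm, KD_HETEROMERIC_NM)
        low = fractional_occupancy(conc_nm, KD_HOMOMERIC_NM)
        print(f"{conc_nm:>9.0f} nM ACh: heteromeric {high:.3f}, homomeric {low:.3f}")
```

Under these assumed values, a low ambient ACh concentration would occupy a sizeable fraction of the high-affinity receptors while leaving the low-affinity receptors almost unbound, which is the intuition behind linking heteromeric receptors to volume transmission and α7 receptors to the high concentrations expected near release sites.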

An interesting property related to the differences in affinity is the desensitization of both types of receptors. Whereas the α7 nAChR desensitizes fast to high concentrations of ACh (McGehee and Role, 1995), a radically different picture emerges when looking at desensitization at low agonist concentrations. At agonist concentrations that are insufficient for receptor activation, desensitization can be observed in high-affinity α4β2\* nAChRs. This process has been termed "high-affinity desensitization", to distinguish it from "classical desensitization" (Giniatullin et al., 2005). In other words, the α7 nAChR desensitizes quickly to high agonist concentrations, and the α4β2\* nAChR desensitizes much more slowly but also in response to much lower ACh concentrations (Mansvelder et al., 2002). Desensitization is an important property of nAChRs because it has been shown that realistic concentrations of nicotine, after the smoking of only one cigarette (Henningfield et al., 1996; Matta et al., 2007; Rose et al., 2010), desensitize high-affinity nAChRs in the ventral tegmental area (VTA) and thereby contribute to the addictive properties of nicotine (Mansvelder et al., 2002; Wooltorton et al., 2003).

There are also important differences in the timescale of the currents that are flowing through the channels and the pharmacological profile of the receptors. Hence, the two main types of nAChRs can be distinguished easily based on their sensitivities to particular pharmacological agents and the timescale of their activation (McGehee and Role, 1995).

Finally, the accessory α5 subunit has an important influence on the heteropentameric receptor. In addition to the already mentioned increase in Ca<sup>2+</sup> conductance, this subunit has also been shown to increase conductance and the sensitivity to nicotine (Ramirez-Latorre et al., 1996), to prolong inward currents in response to persistent nicotine application (Bailey et al., 2012) and potentially to influence the receptor localization (Gotti and Clementi, 2004). Furthermore, recently it was also demonstrated that the α5 subunit influences the expression of the α4 subunit in the VTA (Chatterjee et al., 2013).

## **ROLE OF NICOTINE RECEPTORS IN BEHAVIOR**

During attention tasks there is a release of ACh in the mPFC which is associated both with attentional effort and with cue detection (Passetti et al., 2000; Parikh et al., 2007). Recently, mice lacking specific nicotinic subunits were tested in the 5-choice serial reaction time task (5-CSRTT; Robbins, 2002), an attentional task for rodents in which the animals have to respond to five different cue lights by making a nosepoke in the corresponding hole in order to obtain food rewards. The results indicate that β2 subunits in the prelimbic cortex are necessary for cue detection, as mice lacking these subunits make more errors of omission in this task and reexpression of these subunits in the prelimbic cortex was sufficient to rescue behavior (Guillem et al., 2011). This is the first time that attention problems have been demonstrated in these mice. Although the authors did not find altered behavior in mice lacking the α7 subunit, others have reported that α7 knock-outs do have attentional deficits, as shown by an increase in omissions and a decrease in accuracy (Young et al., 2004, 2007; Hoyle et al., 2006). A possible explanation for this discrepancy is that in these latter experiments the mice performed more trials. Hence, it could be that the demands on sustained attention were higher, thereby revealing the phenotype. Moreover, in the experiments of Guillem et al. (2011) the mice made relatively more omissions, making it possible that the differences were masked by a ceiling effect. Nevertheless, the fact that they did find an effect on omissions in the β2 knock-out mice suggests that they were able to measure differences in attention behavior between different phenotypes and that the phenotype of α7 knock-outs is probably more subtle.
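Because the studies discussed here distinguish effects on omissions from effects on accuracy, it may help to see how these two measures are commonly computed from trial counts. The sketch below assumes the conventional definitions (accuracy over responded trials only, omissions over all completed trials); the cited studies may define or normalize these measures differently, and the `SessionCounts` type and function names are hypothetical. Premature responses and other 5-CSRTT measures are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCounts:
    """Trial outcomes of one hypothetical 5-CSRTT session."""
    correct: int     # nosepoke in the cued hole
    incorrect: int   # nosepoke in a non-cued hole
    omissions: int   # no response within the response window


def accuracy_percent(s: SessionCounts) -> float:
    """Correct responses as a percentage of trials with a response."""
    responded = s.correct + s.incorrect
    if responded == 0:
        raise ValueError("accuracy is undefined when no responses were made")
    return 100.0 * s.correct / responded


def omission_percent(s: SessionCounts) -> float:
    """Omitted trials as a percentage of all completed trials."""
    total = s.correct + s.incorrect + s.omissions
    if total == 0:
        raise ValueError("omission rate is undefined for an empty session")
    return 100.0 * s.omissions / total


if __name__ == "__main__":
    session = SessionCounts(correct=60, incorrect=15, omissions=25)
    print(f"accuracy:  {accuracy_percent(session):.1f} %")
    print(f"omissions: {omission_percent(session):.1f} %")
```

With these definitions the two measures are largely independent: an animal can miss many cues (high omissions) while still responding correctly whenever it does respond (high accuracy), which is why a manipulation can affect one measure without the other.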

Although the role of the β2\* nAChRs in attention behavior has not been tested before with the use of mice lacking these subunits, there have been attempts to study them using a pharmacological approach. In other studies using the same behavioral task, it was found that pharmacological blockade of β2\* nAChRs did not affect task performance in rats (Grottick and Higgins, 2000; Hahn et al., 2011) and in mice (Pattij et al., 2007). Therefore it was concluded that these receptors are not involved in cue detection. There are several possible explanations for the discrepancy between these findings. First, there could be species differences explaining the lack of effect in rats. Secondly, differences could be due to the concentration of antagonist applied and residual effects of ACh through nAChRs. It is not completely known how high the antagonist concentration is in the mPFC when it is administered systemically. In addition, in electrophysiological recordings there is not a full blockade of the inward currents (Guillem et al., 2011; Poorthuis et al., 2013a) after local ACh application in the presence of the β2\* nAChRs antagonist, dihydro-β-erythroidine (DHβE), that was used in the rat studies. In addition, knocking out genes can induce compensatory effects and developmental changes. Indeed, it is known that mice lacking β2 subunits have an upregulation of muscarinic excitability (Tian et al., 2011).

Interestingly, it has also been demonstrated that the α5 subunit, which is present on layer VI pyramidal neurons, is necessary for normal attention behavior (Bailey et al., 2010). In contrast to β2 knock-out mice, mice lacking the α5 subunit have a reduced accuracy in the 5-CSRTT and only a small, but not significant, effect on omissions. Since α5 and β2 subunits form nAChRs only on layer VI pyramidal cells, it could be that the effect on omissions is dependent on nAChRs that do not have the α5 subunit. In contrast, the effect on accuracy in α5 knock-out mice could be due to differences that are due to the role of the α5 subunit in development, as mice lacking this subunit have neurons with shorter apical dendrites (Bailey et al., 2012). Alternatively, it could be that β2\* nAChR are specifically involved in the mediation of the effects of cholinergic transients, whereas α5β2\* are more important for tonic effects of ACh. This could well be the case, since that would mean that the timescale of their activation would match the release mode.

In addition to the knock-out approach to probe the involvement of specific receptors in this task, other studies have also used pharmacological methods. Most of these have used systemic administration of nicotinic and/or muscarinic drugs and are hard to interpret since nAChRs throughout the brain are activated. However, a small number of studies have infused cholinergic drugs locally into the mPFC, thereby generating important data regarding the cholinergic modulation of this brain area. In one study, nicotine was infused systemically or locally into the mPFC or hippocampus and attention behavior in the 5-CSRTT was compared between these conditions (Hahn et al., 2003b). This study elegantly showed that the effects of systemic nicotine on the accuracy in the task could also be observed after local infusion of nicotine. In contrast to what one would expect on the basis of studies using knock-out mice (Guillem et al., 2011), they did not find that nicotine in the mPFC could replicate the effects of systemic nicotine on omissions. There was no effect of nicotine on the dorsal hippocampus. The same authors also performed another study in which they investigated the contribution of heteromeric and homomeric nAChRs to the effects of nicotine on the 5-CSRTT using the specific antagonists DHβE and methyllycaconitine (MLA; Hahn et al., 2011). Based on co-application of these antagonists and nicotine, they concluded that the effects of nicotine are mediated by α7 nAChRs and not by β2\* nAChR. A more recent study, in which nicotinic agonists were used, shows however that the attention enhancing effects of nicotine are also seen with specific β2\* nAChR agonists, but not with α7 nAChRs agonists (Young et al., 2013).

To summarize, although there is plenty of evidence showing that prefrontal ACh is crucial for attention behavior and that nAChRs are involved in performance during the 5-CSRTT, it is currently not completely clear what the roles of the different types of receptors are and how exactly they change the number of omissions and accuracy. Interpreting the results is complicated by the many small differences in task design and by the problems with interpreting systemic administration and knockout studies. Nevertheless, recent results clearly show an involvement of the β2\* nAChRs in cue detection during the 5-CSRTT (Guillem et al., 2011).

## **CHOLINERGIC MODULATION OF CORTICAL CIRCUITRY**

The cortex is a six-layered structure (I–VI) (Douglas and Martin, 2004), although the rodent PFC misses the classical input layer IV (Uylings et al., 2003). In addition, there is a second organizational principle, called cortical columns (Mountcastle, 1997; Markram et al., 2004) in which neurons often have similar receptive field properties. Although the existence of cortical columns in all regions of the cortex is controversial (Horton and Adams, 2005), it is a useful concept to understand processing in the cortical circuitry. Within these different layers, there are excitatory, glutamatergic pyramidal neurons and inhibitory, GABAergic interneurons. These are thought to modulate processing locally by inhibiting the activity of the pyramidal neurons, thereby shaping processing in the local microcircuitry (Markram et al., 2004; Huang et al., 2007; Isaacson and Scanziani, 2011). Both of these groups of neurons can be further divided into many subclasses on the basis of morphology, electrophysiological firing pattern, projection targets and molecular characteristics (Ascoli et al., 2008; DeFelipe et al., 2013).

Although it is not known how exactly information is processed in cortical circuits, many studies have looked into the connectivity and information flow in the cortical circuitry of primary sensory areas (Armstrong-James et al., 1992; Thomson et al., 2002; Hirsch and Martinez, 2006; Feldmeyer, 2012; Constantinople and Bruno, 2013). It remains to be seen whether these findings can be generalized to higher order cortical areas such as the PFC. Based on this work, a general model of information flow within cortical circuits has been proposed. To describe processing, it is useful to describe the direction of information flow in the cortical hierarchy. Conceptually this is easiest to understand in the visual cortex (Hubel and Wiesel, 1977). In this system, there is a clear hierarchy of cortical areas that process visual information in which the receptive field properties get bigger and more complex throughout the visual system (Hubel and Wiesel, 1962, 1965; Moran and Desimone, 1985; Felleman and Van Essen, 1991). There are three different possible "directions" in which processing can occur (Lamme et al., 1998). First, there is feedforward processing, meaning that sensory information entering the cortex is processed according to these hierarchical steps in a bottom up fashion. In contrast, there is feedback processing (Lamme et al., 1998; Lamme and Roelfsema, 2000), referring to a modulation of the processing of incoming information by hierarchically higher brain areas. Examples are top-down attention, predictions and expectations (Lamme and Roelfsema, 2000). Finally, there is lateral processing (Lamme et al., 1998) referring to horizontal integration or competition at a given level of the hierarchy (Gilbert and Wiesel, 1989; Adesnik and Scanziani, 2010).

In sensory cortical areas, feedforward information enters the cortex from the thalamus and targets layer IV (Castro-Alamancos and Connors, 1997; Douglas and Martin, 2004). Layer IV excitatory neurons project to the superficial layers II and III, which subsequently send information to the deep layer V (Gilbert and Wiesel, 1979; Thomson et al., 2002; Thomson and Bannister, 2003). Layer V innervates layer VI and sends a signal back to the superficial layers. Also, this layer and layer VI project strongly to subcortical structures such as the thalamus and the basal ganglia (Gabbott et al., 2005; Olsen et al., 2012). For this reason, they are sometimes referred to as the cortical output layers. In contrast, layers II and III project mainly to other cortical areas (Adesnik and Scanziani, 2010; Little and Carter, 2012). Finally, layer I is very different from the other layers, since the density of neurons is extremely low (Meyer et al., 2010) and all neurons are GABAergic interneurons (Jiang et al., 2013). It is thought that thalamic feedback signals are sent to layer I and that this modulates processing in the cortical column (Rubio-Garrido et al., 2009; Letzkus et al., 2011; Cruikshank et al., 2012).

As stated before, this model is based on information from sensory cortical areas and it remains to be determined whether it holds for the mouse mPFC. Furthermore, it is a simplified model since, for example, layers V and VI of the barrel cortex also receive monosynaptic inputs from the thalamus (Agmon and Connors, 1991; Constantinople and Bruno, 2013). One important difference between the PFC and the sensory cortices is that the rodent PFC does not have a layer IV. Instead, inputs from higher order thalamic relay nuclei (Sherman, 2012) target layers II/III and V. In addition, the superficial layers are modulated, like other cortical areas, by nonspecific thalamic projections (Little and Carter, 2012). Another feature of the PFC which distinguishes it from other cortical areas is the strong recurrent connectivity (Wang et al., 2006) and persistent firing outlasting stimulus presentations (Zhang and Séguéla, 2010; Yang et al., 2013) that can be observed in this area. Hence, we are only beginning to understand how information flows in the cortical microcircuitry. Nevertheless, a picture is emerging of how ACh modulates the flow of information in the cortex.

On a network level, basal forebrain stimulation in anesthetized animals results in a desynchronized state of field potentials (Goard and Dan, 2009; Kalmbach et al., 2012) and neuronal firing in the basal forebrain is correlated with a reduction in low frequency and an increase of high frequency oscillations in the cortex (Duque et al., 2000; Manns et al., 2000). Since these frequency bands are related to the state of arousal and cortical activation (Uhlhaas et al., 2008; Deco and Thiele, 2009; Wang, 2010; Cachope et al., 2012), ACh has long been considered a neuromodulator that is involved in setting the state of arousal. Mechanistically, it was shown that ACh activated cortical mAChRs on pyramidal neurons (Gulledge et al., 2009), thereby shifting firing modes from bursting to tonic and changing low frequency high amplitude oscillatory activity to high frequency low amplitude on a network level (Metherate et al., 1992).

Other studies have looked at the effect of ACh on the direction of the flow of information in the cortex. Again, these studies have been performed in sensory areas because in these regions, neuronal responses could be related to sensory stimulation. One of the dominant effects that has repeatedly been demonstrated is the enhancement of feedforward thalamic input into the sensory cortical areas. In layer IV of the visual cortex, ACh increases the gain and reliability of neuronal responses (Goard and Dan, 2009; Soma et al., 2012, 2013), an effect which is mediated by heteromeric nAChRs (Roberts et al., 2005; Disney et al., 2007). In the barrel cortex, a similar effect was observed (Oldford and Castro-Alamancos, 2003). In layers II and III, the picture is more complex. In general, cholinergic modulation reduces firing rate in these layers by increasing GABAergic inhibition through mAChRs and nAChRs (Disney et al., 2012; Alitto and Dan, 2013; Soma et al., 2013), although reliability of encoding and modulation by presented stimuli sometimes increased at the same time (Goard and Dan, 2009; Soma et al., 2013). Interestingly, it has recently been reported that the cortical depression associated with whisker trimming is accompanied by an increase of heteromeric receptors on interneurons in layer II/III and that blocking these receptors can prevent the cortical depression. This suggests that heteromeric receptors in layer II/III are required for regulating the responsiveness of the somatosensory cortex (Brown et al., 2012). Intracortical projections, which are thought to connect superficial layers between different cortical columns, are also inhibited by ACh through mAChRs (Kimura and Baughman, 1997). Based on this finding and the reduced activity in the superficial layers, it has been suggested that ACh reduces horizontal processing through cortico-cortical interactions (Hasselmo and Giocomo, 2006).
Indeed, it has been observed in slices, in vivo animal experiments and in humans that the spatial spread of excitation in response to stimuli is reduced in the presence of elevated levels of ACh (Kimura et al., 1999; Silver et al., 2008). This effect could have a sharpening effect on tuning curves of receptive fields and the discriminability of sensory stimuli (Roberts et al., 2005; Thiele et al., 2012). Also, the combination of reduced lateral interactions and an increased sensitivity to thalamic inputs could increase the network's sensitivity to incoming information and increase the signal to noise ratio. This effect is also observed with enhanced attention (Briggs et al., 2013). Therefore, this could be one of the core mechanisms through which ACh modulates selective attention (Hasselmo and Giocomo, 2006; Deco and Thiele, 2011; Hasselmo and Sarter, 2011). The effect of ACh on the deeper layers V and VI is less understood in functional terms. However, also in deep layers both pyramidal neurons and interneurons are modulated by nAChRs and mAChRs (Gulledge et al., 2007; Kassam et al., 2008; Poorthuis et al., 2013a) and both response suppression and facilitation can be observed (Soma et al., 2013). Finally, in layer I, all interneurons contain heteromeric and/or homomeric nAChRs (Christophe et al., 2002; Alitto and Dan, 2013). Since these neurons inhibit both layer I-III interneurons and layer II/III pyramidal cells, the effect of cholinergic layer I activation is complex and can inhibit as well as disinhibit pyramidal cells in deeper layers (Letzkus et al., 2011; Arroyo et al., 2012; Bennett et al., 2012; Cruikshank et al., 2012; Jiang et al., 2013).

## **CHOLINERGIC MODULATION OF THE MEDIAL PREFRONTAL CORTEX**

Despite the fact that the effects of ACh, as described above, are found in sensory cortices, there are reasons to believe that the cholinergic modulation of the mPFC occurs in a similar manner. Autoradiographical measurements of the localization of mAChRs and nAChRs do not show big differences in receptor localization between different cortical regions (Clarke et al., 1984, 1985; Spencer et al., 1986). In addition, there is evidence that some of the principles outlined above also hold true for the mPFC. For instance, also in the mPFC layer V pyramidal neurons are prominently modulated by M1 (Gulledge et al., 2009) whereas layer II–III pyramidal neurons are not. Moreover, also in the mPFC the release of other neuromodulators is strongly increased by nicotinic stimulation (dos Santos Coura and Granon, 2012).

In contrast to other cortical regions, where thalamic axons target mainly layer IV, in the mPFC they target layers III and V (Rotaru et al., 2005), as layer IV is nonexistent. It has been demonstrated that after lesioning of the thalamic nucleus targeting the PFC, the mediodorsal thalamus (MDT), there is a 40% reduction of high affinity binding sites, suggesting a strong heteromeric nAChR presence on the thalamocortical terminals (Gioanni et al., 1999). In addition, this study demonstrated that nicotine induces a strong glutamate release in the PFC and that an iontophoretic nicotine application enhanced the response to MDT stimulation in all layers. Moreover, it was demonstrated that nicotine increases spontaneous release of glutamate from thalamic inputs onto layer V neurons (Lambe et al., 2003). In contrast, in layer II/III mAChRs and nAChRs seem to have opposing effects on glutamatergic inputs, although the percentage of neurons modulated in this layer is rather low (Vidal and Changeux, 1993). Given these findings and the increase of coding reliability that is observed in sensory areas after nAChR stimulation (Disney et al., 2007; Goard and Dan, 2009; Soma et al., 2012), one could speculate that an enhancement of thalamocortical processing is a dominant effect of nAChR stimulation in the mPFC. Interestingly, heteromeric receptors on these terminals were not reexpressed in the study by Guillem et al. (2011), suggesting that it is unlikely that β2\*-nAChRs on thalamic inputs play a role in cue detection in this task.

In addition to these presynaptic receptors, β2\*-nAChRs were also found postsynaptically on cells in the mPFC (**Figure 1**). It was found that there is a strong presence of α4β2α5 nAChRs on pyramidal cells in layer VI and α4β2\* nAChRs on interneurons in all layers (Poorthuis et al., 2013a; Poorthuis and Mansvelder, 2013). Given the finding that reexpression of β2 subunits in the prelimbic cortex could rescue the phenotype of β2 knockout mice, it is most likely that these receptors are crucial for cue detection in the 5-CSRTT. This would suggest that during a sustained attention task, ACh increases inhibition in the mPFC through nAChRs and increases pyramidal cell activity in layer VI. These pyramidal neurons feed back to the thalamic inputs of the mPFC (Gabbott et al., 2005). In the visual cortex these layer VI pyramidal neurons have been shown to modulate the gain of incoming thalamic information (Olsen et al., 2012). It would be interesting to disentangle the contribution of prelimbic interneurons and layer VI pyramidal cells in an attention task to further narrow down the specific β2\* nAChRs that are required for cue detection. Homomeric receptors were also found in pyramidal cells of the mPFC in a layer and neuronal subtype specific manner. Interestingly, α7 receptors were reported to be present on layer V pyramidal neurons (Poorthuis et al., 2013a). To the best of our knowledge, this is the first demonstration of a homomeric nAChR presence on layer V pyramidal cells. During development, there is a transient upregulation of the expression of the α5 subunit in the cortex (Winzer-Serhan and Leslie, 2005). During the first months there is a particularly high expression in layer VI, with a peak around 2 weeks after birth. It was shown that this is also the case in the PFC and that these α5 expressing neurons are pyramidal neurons projecting to the MDT (Kassam et al., 2008). In addition, some cells in layers II-V express this accessory subunit. These cells are thought to be interneurons, based on electrophysiological recordings and post-hoc single cell reverse transcription polymerase chain reaction (RT-PCR; Porter et al., 1999).

As in other cortical areas, non-fast spiking interneurons are modulated by mAChR and nAChR stimulation (Kawaguchi, 1997; Gulledge et al., 2007; Poorthuis et al., 2013a). In contrast, it is unclear how exactly fast spiking interneurons are modulated by ACh. There have been reports that fast spiking interneurons are unresponsive to cholinergic stimulation (Kawaguchi, 1997; Gulledge et al., 2007) but it has also been published that fast spiking interneurons are inhibited through mAChRs in layer V of the visual cortex (Xiang et al., 1998), that mAChR activation inhibits GABA release from fast spiking cells on pyramidal cells in the somatosensory cortex (Kruglikov and Rudy, 2008) and that α7 nAChRs are present on fast spiking interneurons in layers I-V. In layer I all neurons have nAChRs, as described above. A consequence of the nicotinic stimulation of interneurons is that nicotine has been shown to increase the inhibition of layer V pyramidal neurons (Couey et al., 2007). Hence, interneurons in all layers, except for layer VI, contain a mixed profile of nAChRs. This includes both fast spiking and non-fast spiking interneurons, although there are differences in nAChRs in these two populations in the different layers.

Together, these results show that the models of cholinergic modulation from sensory areas are at least useful to understand the cholinergic modulation of the mPFC. Nevertheless, in order to understand the way AChRs mediate the effects of phasic ACh release in the mPFC, it will be crucial to study the receptor localization and their effects on network physiology in more detail.

Given these findings, one could speculate about the functional role of nAChRs in the modulation of mPFC activity by ACh. It seems that nAChR stimulation results in an increase of the inhibitory tone of the mPFC network. In addition, there seems to be a strong increase in the processing of thalamic information. Together this could mean that nAChR stimulation would "reset" the network so that new incoming information can be processed. This would fit well with the model that was proposed by Sarter and colleagues (Sarter et al., 2005; Howe et al., 2013) in which short increases in ACh would mediate an attentional shift, or more precisely: a shift from perceptual attention to the activation of response rules allowing the expression of a behavioral response. Furthermore, as in sensory cortices, the data support the model that ACh reduces the functional connectivity of corticocortical projections. In other words, also in the mPFC there is an increased drive from the thalamus whereas the superficial layers, which mediate most of the corticocortical connectivity, are inhibited. In the deep layers, it was recently found that nAChR activation increases spontaneous activity in acute brain slices. Based on the connectivity of layers V and VI, this would suggest that the activation of nAChRs in the mPFC by ACh increases the drive from this region on subcortical structures. Since layer V strongly connects to the striatum, it could be that the activation of this layer is important in the initiation of the behavioral response after the mPFC has detected the cue. In contrast, layer VI projects back to the MDT, which could modulate the gain of the thalamic inputs. To determine the effects of activation of these layers, it will be necessary to perform *in vivo* experiments in which the activity in different layers is measured and/or manipulated.

Since it is known that the basal forebrain gets activated in response to salient events (Lin and Nicolelis, 2008) and that there are strong projections to this region from subcortical areas like the nucleus accumbens (St. Peters et al., 2011) and the amygdala (Jolkkonen et al., 2002), it seems that phasic cholinergic signaling in the mPFC is important for signaling salient information. In other words, when important information regarding potential rewards or dangers is presented or expected, ACh might update the internal goals, the direction of attention and the content of working memory, and/or induce a change in behavior.

It remains to be determined how this links to the effects of ACh on sustained attention. It could be that ACh influences sustained attention through this fast signaling mode and that when sustained attention fades, this is reflected by a reduction in the size or frequency of cholinergic transients. Alternatively, the effects of ACh on sustained attention might be independent of fast cholinergic transients and instead related to tonic release of ACh. Finally, there might be a complex interplay between tonic and phasic effects.

## **EXOGENOUS nAChR ACTIVATION: ACTIVATION AND DESENSITIZATION BY NICOTINE**

Although the endogenous ligand for nAChRs is ACh, many people use a drug that contains an exogenous ligand for this receptor, namely nicotine, by smoking tobacco. Since there is evidence that nicotine influences attentional performance (Mirza and Stolerman, 2000; Hahn et al., 2003a; Levin et al., 2006; Heishman et al., 2010) and that at least part of these effects is mediated by prefrontal nAChRs in rats (Hahn et al., 2003c), it is interesting to see how realistic concentrations of nicotine affect cholinergic signaling through nAChRs in the mPFC. It was found (Poorthuis et al., 2013b) that nicotine activates nAChRs and thereby influences network activity, although the main effect of nicotine is actually a desensitization of nAChRs. Heteromeric nAChRs in particular desensitize strongly in the presence of 300 nM nicotine, a concentration that is present in the brain for over 10 min after smoking just one cigarette. For this reason, it was concluded that nicotine interferes strongly with cholinergic signaling through nAChRs. In addition to the activating and desensitizing properties of nicotine when it binds to nAChRs, it has also been shown that nicotine can induce persistent changes in gene expression in multiple brain areas, including the mPFC (Mychasiuk et al., 2013), and that it strongly influences the presence of high affinity nicotine receptors in the brain (Marks et al., 1992; Buisson and Bertrand, 2001). The mechanisms behind this are still controversial (Vallejo et al., 2005; Govind et al., 2012), but the effect itself has been firmly established.

At the behavioral level, although the evidence for an effect of nicotine on attention is strong, the precise conditions under which this can be observed are still under debate. Although nicotine seems to improve cognition in certain patient populations, including those with schizophrenia, ADHD and dementias (Newhouse et al., 2004; Potter and Newhouse, 2008; D'Souza and Markou, 2012), the evidence for an attention enhancing effect in healthy populations is scarce (Newhouse et al., 2004; Heishman et al., 2010). Moreover, people who are addicted to smoking function better when they are not in a state of abstinence (Kleykamp et al., 2005; Vossel et al., 2011), although nicotine seems to reduce a cognitive deficit associated with abstinence rather than truly improve attention. Importantly, in humans it is unlikely that smokers represent an unbiased sample of the population. Rather, attentional problems or other cognitive deficits might already be present (Rigbi et al., 2008). Also, mutations in the genes coding for the nAChR subunits influence smoking behavior itself (Picciotto and Kenny, 2013). Animal work provides a way to circumvent these problems. In sustained attention tasks, many groups have shown that acute nicotine administration can improve performance (Grottick and Higgins, 2000; Stolerman et al., 2000; Hahn et al., 2003a; Young et al., 2013), although there are still some discrepancies between the different findings (Mirza and Stolerman, 1998; Robbins, 2002). Importantly, the age at which nicotine is administered and the duration of administration have been found to be important parameters (Counotte et al., 2012b). Rats that received nicotine during adolescence had attentional difficulties in adulthood, an effect that was not observed when nicotine was delivered during adulthood (Counotte et al., 2011, 2012a).

There seem to be big differences between acute and chronic nicotine administration. Especially at an early age, the network is prone to adapt quickly. Because nicotine use in humans often starts during puberty and continues over prolonged periods, it is likely that the effects of nicotine on cognition in humans are different from what was observed in slices. For this reason it is hard to explain the cognitive effects of smoking from the data on desensitization. Nevertheless, these data suggest that nicotine does not exert its effects as an agonist, but rather as an agent that desensitizes β2\* nAChRs. Recently, several groups have started disentangling the activating and desensitizing effects of nicotine in attention. Levin and Rezvani have administered nAChR antagonists and an agonist that mainly desensitizes high affinity nAChRs and found that attention can be improved by these drugs (Levin et al., 2013; Rezvani et al., 2013). This would suggest that the attention enhancing effects of nicotine are actually mediated by a desensitization of nAChRs. It raises the question, however, of why mice lacking β2\* nAChRs were shown to have an attentional deficit and why administration of the nAChR antagonist mecamylamine increases the number of omissions (Pattij et al., 2007). To conclude, although there is a lot of evidence that nicotine influences attentional performance, it is still under debate under which exact conditions it improves or impairs attention and through which mechanisms it does so.

## **THE ROLE OF CHOLINERGIC MODULATION OF THE MEDIAL PREFRONTAL CORTEX (mPFC) IN NEUROPSYCHIATRIC AND NEURODEGENERATIVE DISORDERS**

There are many neuropsychiatric disorders associated with dysfunctions in the cholinergic system and the mPFC. It is beyond the scope of this review to detail all mechanisms of these disorders, but findings relating to the role of the mPFC, ACh and attention will be briefly highlighted.

Given the studies mentioned above, it is no surprise that attention deficit hyperactivity disorder (ADHD) is associated with dysfunctions in the mPFC and the cholinergic system. ADHD is characterized, among other things, by decreased top-down control, inattention and impulsive acts, all of which are strongly linked to the mPFC and ACh (Robbins, 2002; Sarter and Paolone, 2011; Ohmura et al., 2012). Furthermore, nicotine itself can increase cognitive performance in ADHD patients (Newhouse et al., 2004; Levin et al., 2006), and clinical trials have recently begun to test the efficacy of nAChR subtype specific agonists in increasing cognitive performance in ADHD patients (Bain et al., 2013; Jucaite et al., 2014).

In addition to ADHD, schizophrenia is also associated with disturbances in the cholinergic system and the mPFC (Weinberger and Berman, 1996; Minzenberg et al., 2009; Brooks et al., 2011, 2012). Schizophrenia patients have deficits in PFC dependent cognition, such as working memory (Forbes et al., 2009) and behavioral flexibility (Leeson et al., 2009), and have alterations in the microcircuitry of the PFC, in particular in interneurons (Lewis et al., 2005; Uhlhaas and Singer, 2010). In addition, multiple ACh receptor types have been linked to the disease (Raedler et al., 2003; Wallace and Bertrand, 2013). Although the relation is far from clear, a number of observations have been made that establish a link between schizophrenia and the α7 nAChR. First, it is expressed to a lower degree in schizophrenia patients (Guan et al., 1999; Young and Geyer, 2013). Moreover, in mice this receptor is linked to sensorimotor deficits that are also found in schizophrenia patients and their healthy family members (Martin and Freedman, 2007). Also, the part of the genome coding for this receptor is linked to schizophrenia. Finally, it is known that schizophrenia patients engage in heavy nicotine-seeking behavior, which could compensate for the lower expression of α7 receptors, and that nicotine, in addition to more selective α7 agonists, can improve cognitive functioning in these patients (Olincy et al., 2006; Wallace and Bertrand, 2013).

Another psychiatric disorder associated with nAChRs in particular is addiction. Of all drugs, nicotine is used most extensively and it is associated with a significant social and economic burden for society (Dani and Balfour, 2011; De Biasi and Dani, 2011; Picciotto and Kenny, 2013). Fundamentally, addiction is not an attentional disorder. However, addiction is linked to changes in the functioning of the mPFC and in behavioral control (Van den Oever et al., 2010; Goldstein and Volkow, 2011), and it has been shown that attention is impaired after nicotine exposure (Counotte et al., 2011). Moreover, people using nicotine often report attentional benefits, although it is not clear to what extent this is due to relief from withdrawal symptoms or to acute effects (Heishman et al., 2010).

Finally, given the fact that lesion, electrophysiological and pharmacological studies strongly indicate that ACh is a key neurotransmitter in memory function (Deiana et al., 2011), it is not surprising that another disorder strongly linked to cholinergic functioning is Alzheimer's disease (AD). Because of reports (Davies and Maloney, 1976) of strong cholinergic cell loss in the septum and basal forebrain of Alzheimer's patients, early theories of AD emphasized a cholinergic involvement. As it later became clear that cholinergic cell loss does not occur in the early stages of the disorder, this loss cannot account for AD as an etiological factor (Pinto et al., 2011; Schliebs and Arendt, 2011). However, widespread cholinergic cell loss is still considered a major aspect of AD (Micheau and Marighetto, 2011). Another important link between AD and cholinergic signaling is through the nAChR (Buckingham et al., 2009; Jürgensen and Ferreira, 2010). It has been found that AD patients have strongly reduced levels of cortical α4β2 nAChRs (Kellar et al., 1987; Sparks et al., 1998; Perry et al., 2000). In addition, it was demonstrated that the major constituent of the extracellular plaques, amyloid-beta, can directly interact with nAChRs and interfere with their functioning (Dineley, 2007). Although there are still a lot of questions about these interactions and about cholinergic cell loss in AD, it is clear that cholinergic dysfunction plays an important role in the memory and attention problems in AD patients (Brousseau et al., 2007; Pinto et al., 2011). Finally, drugs that inhibit the breakdown of ACh, acetylcholinesterase inhibitors (AChEI), were demonstrated to have beneficial effects on AD patients, with improvements in memory and attention (Brousseau et al., 2007; Pinto et al., 2011).

## **SHINING NEW LIGHT ON THE CHOLINERGIC SYSTEM**

As discussed above there are important limitations that are inherent to the approach that was taken by most studies. Concerning electrophysiological experiments, it is well known that the spatial and temporal parameters of ACh application are crucial in determining the electrophysiological effects. Given our lack of knowledge about the transmission modes and concentrations of ACh surrounding the receptors, it is very hard to estimate what the effects of ACh on neuronal activity are. In order to advance our knowledge about the way ACh modulates processing in the mPFC it will be crucial to manipulate ACh release from cholinergic terminals, because this is the only way in which we can monitor the postsynaptic effects that occur with realistic cholinergic stimulation. When it comes to the role of ACh in behavior, there are also certain limitations with the pharmacological and knock-out approach. Pharmacology suffers from a lack of specificity, as it stimulates receptors throughout the body and also here the temporal aspects of receptor activation are far from what is physiologically relevant. As mentioned before, animals lacking specific receptors often show compensatory and developmental effects and therefore do not allow us to study the role of receptors in the normal situation.

Fortunately, there are new methods that will allow us to advance our understanding of the cholinergic modulation of the mPFC by manipulating ACh release from cholinergic neurons themselves and by measuring the release of ACh and the activity of the cholinergic innervation. Two methods that will be crucial are optogenetics (Zhang et al., 2007; Fenno et al., 2011; Yizhar et al., 2011) and the measurement of presynaptic activity with new genetically encoded calcium indicators (Chen et al., 2013; Kaifosh et al., 2013).

Optogenetics makes use of genetically encoded opsins that allow experimenters to stimulate or inhibit the activity of specific populations of neurons. The neurons that are affected can be defined by their genetic background, their location, their projection targets or a combination of these (Josh Huang and Zeng, 2013). Using this method it will be possible to determine the effect of ACh release in specific brain structures. Since release can be both inhibited and stimulated at specific time points during behavioral tests, it will be possible to determine the effects of different release modes in specific brain regions. In addition, electrophysiological effects of ACh release can be measured using *in vitro* or *in vivo* preparations. The power of this approach has already been demonstrated in a number of studies that investigated polysynaptic effects of ACh release (Arroyo et al., 2012; Bennett et al., 2012).

In addition, very sensitive genetically encoded calcium indicators have been developed (Chen et al., 2013) that make it possible to measure presynaptic activity. In other words, if these indicators are expressed in cholinergic neurons of the basal forebrain, it will be possible to measure the activity of their axons in the cortex. This will most likely lead to breakthroughs in our knowledge about the activity of these neurons, as at the moment very little is known about the activity of these fibers. Recently, a similar approach was used on the GABAergic projections from the basal forebrain to the hippocampus, thereby showing for the first time when these axons are active during behavior (Kaifosh et al., 2013).

These methods will make it possible to address key questions in the field of the cholinergic modulation of the cortex. First of all, they will make it possible to investigate when ACh is released and through what kind of signaling mode this occurs. In other words, we will be able to find out what the role is of tonic and phasic release of ACh. In addition, the spatial specificity of cholinergic signaling can finally be addressed. At the moment there is a scarcity of information regarding the degree of specificity of ACh release. For example, currently it is unknown whether ACh release occurs simultaneously throughout the PFC or whether it can be restricted to specific prefrontal areas such as the prelimbic cortex. Moving from a general notion of a role of ACh in attention towards an understanding of when and where exactly ACh is released will be a crucial step towards understanding the cholinergic system.

Since there are multiple sources of ACh, this approach will make it possible to study the role of the basal forebrain, midbrain cholinergic areas and cortical cholinergic interneurons separately. Moreover, cholinergic neurons only make up a small percentage of cortical projections from the basal forebrain (Gritti et al., 1997; Zaborszky et al., 1999; Gritti et al., 2003), and the genetic approach will allow studying the role of these other projections to the cortex, in an approach similar to that of Kaifosh et al. (2013). Using optogenetics and genetically encoded calcium indicators will allow researchers to disentangle the role of different cholinergic and basal forebrain neuronal populations.

In the field of neurophysiology, too, big advances are to be expected with the development of optical methods. Many of the questions that remained after experiments in acute brain slices can now finally be addressed. In order to understand how ACh modulates processing in the mPFC we will need to deliver ACh in a realistic manner. If we can make cholinergic axons release ACh themselves then we will make a huge step forward in this respect. As mentioned before, several papers have been published in which this was done (Arroyo et al., 2012; Bennett et al., 2012). It will be necessary to investigate how nicotine affects currents through nAChRs when ACh is not applied in the bath or with a puff pipette but instead released from cholinergic axons.

Finally, the combination of calcium indicators, which allow us to measure presynaptic activity, and *in vivo* electrophysiology makes it possible to correlate neuronal spiking and field potential dynamics with ACh release. Again, this is expected to provide exciting new insights into the role of ACh in cognition and the cortical mechanisms underlying it.

## **ACKNOWLEDGMENTS**

Huibert D. Mansvelder received funding from the ERC StG "BrainSignals", the Dutch Fund for Economic Structure Reinforcement (FES, 0908 "NeuroBasic PharmaPhenomics project") and EU 7th Framework Programme (HEALTH-F2-2009-242167 'SynSys').

## **REFERENCES**


effects of basal forebrain 192-IgG-saporin lesions and intraprefrontal infusions of scopolamine. *Learn. Mem.* 11, 78–86. doi: 10.1101/lm.70904


nicotine in rats. *Psychopharmacology (Berl)* 168, 271–279. doi: 10.1007/s00213-003-1438-6


function in schizophrenia. *Arch. Gen. Psychiatry* 66, 811–822. doi: 10.1001/archgenpsychiatry.2009.91


for the pathophysiology of schizophrenia. *Schizophr. Bull.* 34, 927–943. doi: 10.1093/schbul/sbn062


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 December 2013; paper pending published: 31 January 2014; accepted: 20 February 2014; published online: 11 March 2014.*

*Citation: Bloem B, Poorthuis RB and Mansvelder HD (2014) Cholinergic modulation of the medial prefrontal cortex: the role of nicotinic receptors in attention and regulation of neuronal activity. Front. Neural Circuits 8:17. doi: 10.3389/fncir.2014.00017 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Bloem, Poorthuis and Mansvelder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Dopamine modulation of learning and memory in the prefrontal cortex: insights from studies in primates, rodents, and birds

## *M. Victoria Puig<sup>1</sup>\*, Jonas Rose<sup>1,2</sup>\*, Robert Schmidt<sup>3</sup> and Nadja Freund<sup>4</sup>*

<sup>1</sup> The Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA

<sup>2</sup> Animal Physiology, Institute of Neurobiology, University of Tübingen, Tübingen, Germany

<sup>3</sup> BrainLinks-BrainTools, Department of Biology, Bernstein Center Freiburg, University of Freiburg, Freiburg, Germany

<sup>4</sup> Department of Psychiatry and Psychotherapy, University of Tübingen, Tübingen, Germany

#### *Edited by:*

Guillermo Gonzalez-Burgos, University of Pittsburgh, USA

#### *Reviewed by:*

Onur Gunturkun, Ruhr University Bochum, Germany

Min Wang, Yale University, USA

#### *\*Correspondence:*

M. Victoria Puig and Jonas Rose, The Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

e-mail: mvpuig@mit.edu

In this review, we provide a brief overview of the current knowledge about the role of dopamine transmission in the prefrontal cortex during learning and memory. We discuss work in humans, monkeys, rats, and birds in order to provide a basis for comparison across species that might help identify crucial features and constraints of the dopaminergic system in executive function. Computational models of dopamine function are introduced to provide a framework for such a comparison. We also provide a brief evolutionary perspective showing that the dopaminergic system is highly preserved across mammals. Even birds, following a largely independent evolution of higher cognitive abilities, have evolved a comparable dopaminergic system. Finally, we discuss the unique advantages and challenges of using different animal models for advancing our understanding of dopamine function in the healthy and diseased brain.

**Keywords: prefrontal cortex, learning and memory, dopamine receptors, executive function, working memory, neuromodulation, evolution**

## **INTRODUCTION**

A major function of executive control is the flexible adaptation to our ever-changing environment. The executive circuits of the brain must, therefore, not only monitor and maintain current behavioral goals but also incorporate new goals and rules. This updating can come in the form of a quick integration of previously acquired knowledge when, for example, a well-known stimulus informs an animal of a change in reward contingencies. In many cases, however, such updating requires new learning, for example when a new stimulus is encountered for the first time. Executive functions are commonly ascribed to the prefrontal cortex (PFC) and frontostriatal networks. The function of these circuits relies heavily on neuromodulation, in particular on dopamine (DA). The aim of this review is to outline the contribution of DA and its receptors in the PFC to learning and memory processes across different species.

We will first introduce studies in the mammalian brain in the sections on humans, non-human primates, and rodents. Due to the challenges of investigating the role of DA transmission in human PFC, we focus the human section on studies utilizing systemic injections of DA agents and impairments of DA transmission in patients with a variety of neurological and psychiatric disorders. The non-human primate and rodent sections review behavioral studies conducted during local manipulations of the DA system in the PFC. While the dopaminergic system in different mammalian species follows largely the same organization, some conceptual and terminological differences can make a comparison of data across species difficult (**Box 1**). For a comparative perspective, we will then outline behavioral studies conducted in birds where local manipulations of the DA system were implemented in a structure equivalent to the mammal PFC, the nidopallium caudolaterale (NCL; Jarvis et al., 2005). Such a comparison is of particular interest given the large evolutionary gap between these species. The lines of birds and mammals separated around 300 million years ago, long before many of the cognitive functions attributed to the PFC evolved (Jarvis et al., 2005; Reiner et al., 2005; Jarvis, 2009; Rose et al., 2009a). In spite of this distance, birds and mammals (with the exception of humans and apes) are largely on par when it comes to cognitive abilities (Emery and Clayton, 2004; Kirsch et al., 2008, 2009). This implies a parallel or convergent evolution of cognition between the species (Emery and Clayton, 2004; Güntürkün, 2012). As a result of this independent evolution, we see stark differences in brain organization between birds and mammals (Jarvis et al., 2005). Most notably, the avian telencephalon does not show the laminar organization of the mammalian cortex. However, other organizational principles were preserved or evolved independently in both lines. 
This can be taken as a hint of narrow neurobiological constraints in the evolution of a given cognitive ability (Colombo and Broadbent, 2000; Güntürkün, 2005a).

### **ANATOMY OF THE DOPAMINE SYSTEM IN THE PREFRONTAL CORTEX**

The anatomy of the dopaminergic system is very similar between all mammals and birds (for extensive review, see Durstewitz et al., 1998, 1999b; Björklund and Dunnett, 2007). DA neurons can be identified by the expression of several catecholamine-synthesizing enzymes, tyrosine hydroxylase (TH), aromatic amino acid decarboxylase (AADC), and dopamine-β-hydroxylase (DBH). With modern immunohistochemical techniques it has been possible to map out in detail the location of DA neurons and their specific projections. DA neurons originate in several neighboring midbrain nuclei, of which the substantia nigra pars compacta (SNc; A9) and the ventral tegmental area (VTA; A10) project to the forebrain. The total number of TH-positive cells in VTA and SNc (bilateral count) is ∼20,000–30,000 in mice and ∼40,000–45,000 in rats. This number increases considerably in primates, to 160,000–320,000 in monkeys and 400,000–600,000 in young humans. DA neurons send afferents to many target areas, including several regions of the frontal cortex, with the striatum being the most densely innervated target (Björklund and Dunnett, 2007; **Figure 1**). PFC-projecting DA neurons are intermingled in VTA and SNc both in primates and in rodents. However, the PFC in primates is much more extensively innervated by midbrain DA afferents than in rodents (Thierry et al., 1973; Lindvall et al., 1978; Swanson, 1982; Descarries et al., 1987; Lewis and Sesack, 1997; Björklund and Dunnett, 2007).

#### **BOX 1 | Conceptual/terminological differences between species.**

When comparing the function of prefrontal DA across species it is important to clarify the terminology used in the different fields of research. As reviewed here, prefrontal DA plays an important role in learning and memory and an extensive body of literature is concerned with its role particularly in working memory (WM). In general, the term WM is strongly associated with its original definition by Baddeley and Hitch (1974), who famously proposed that systems for sensory storage (phonological loop, visuospatial sketchpad, and more recently, an episodic buffer) are governed by a central executive (Baddeley, 1992, 2000). The gist of this definition is that an interconnected neural system allows the brief storage of information and, importantly, its manipulation.

In primates, a seminal contribution to the understanding of this system was the discovery by Fuster and Alexander (1971) of "delay cells" in the PFC. These neurons show increased activity during the delay period of WM tasks, maintaining the memory of a stimulus. Consequently, in primates including humans, WM is often modeled as "active memory" (Zipser et al., 1993; Durstewitz, 2009), a system that holds information in memory by sustaining neural activity for a few critical seconds.

Research in rodents commonly uses a broader definition of WM that refers to "a collection of processes that include the temporary storage of information, as well as executive functions that mediate the manipulation and retrieval of trial-unique information to guide action after both short (seconds) and longer (minutes to hours) delays" (Phillips et al., 2004; see also: Mizumori et al., 1987; Floresco and Phillips, 2001). Importantly, this definition includes a much larger range of delays (seconds to many hours) compared to what is typically used in humans and non-human primates (seconds). Consequently, in rodents, the definition of WM does not necessarily refer to active memory maintenance by delay cells but might rely on different mechanisms that could be classified as learning mechanisms in primates. Thus, it is important to pay attention to the specific paradigms and definitions used when comparing results across species.

The definition of WM typically used in avian research was developed in parallel to the definition in humans (Honig, 1978). Both concepts are largely comparable with the exception that no phonological loop is conceptualized in birds. The delay durations in avian research are largely comparable to those in the primate literature and active information maintenance by delay activity is generally assumed to be the key mechanism of WM (Miller et al., 1996; Güntürkün, 2005a).

Taken together, there are fundamental terminological differences between species and it is important to keep these in mind when comparing results across species. In particular, the vast differences in delay duration used in different paradigms could potentially engage distinct neural mechanisms – what is called WM in one species might be viewed as a learning mechanism in another.

Postsynaptically, DA exerts its actions within the PFC/NCL via receptors grouped in two major families, D1-like receptors (D1 and D5 in mammals; D1A and D1B in birds) and D2-like receptors (D2, D3, and D4 in mammals and birds). D1-like receptors are expressed to a greater extent than D2-like receptors (Lidow et al., 1991; Durstewitz et al., 1998; Seamans and Yang, 2004; de Almeida et al., 2008; Santana et al., 2009; de Almeida and Mengod, 2010). In birds, the D1-like family is extended to include an additional receptor (D1D; Callier et al., 2003; Kubikova et al., 2010). Both families are G-protein-coupled receptors that exert slow changes of activity in the cells and act as functional neuromodulators. D1-like receptors show low affinity for DA, whereas D2-like receptors show higher affinity (Seamans and Yang, 2004). For the sake of clarity, we will abbreviate D1-like and D2-like receptors as D1R and D2R, respectively, and will point to a specific receptor subtype whenever necessary.
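The consequence of this affinity difference can be made concrete with a simple single-site binding relation, in which the fraction of occupied receptors equals [DA]/([DA] + K<sub>d</sub>). The following Python sketch is only an illustration under stated assumptions: the dissociation constants `KD_D1_NM` and `KD_D2_NM` and the example concentrations are hypothetical placeholders chosen to represent "low" and "high" affinity, and are not values taken from the studies cited above.

```python
# Hypothetical dissociation constants (nM), chosen only to illustrate
# "low affinity" (D1-like) versus "high affinity" (D2-like) receptors.
KD_D1_NM = 1000.0
KD_D2_NM = 10.0


def fractional_occupancy(concentration_nm: float, kd_nm: float) -> float:
    """Single-site binding: fraction of receptors occupied at equilibrium."""
    if concentration_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return concentration_nm / (concentration_nm + kd_nm)


if __name__ == "__main__":
    # Illustrative DA concentrations (nM), also hypothetical.
    for da_nm in (10.0, 100.0, 1000.0):
        d1 = fractional_occupancy(da_nm, KD_D1_NM)
        d2 = fractional_occupancy(da_nm, KD_D2_NM)
        print(f"[DA] = {da_nm:7.1f} nM  D1-like: {d1:.3f}  D2-like: {d2:.3f}")
```

Under these assumed constants, a low DA concentration occupies a substantial fraction of the high-affinity receptors while leaving the low-affinity receptors largely unoccupied, whereas a much higher concentration is needed to engage the low-affinity family to the same degree.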

Interestingly, dopaminergic signaling in the PFC depends on brain maturation and the PFC is the brain structure that matures last (Gogtay and Thompson, 2010). Analyses of human postmortem brain tissue reveal that the levels of mRNA expression of the D2R and D5R subtypes in PFC are highest in neonates and infants and decrease with age, whereas the D1R subtype mRNA expression and protein levels increase with age and are highest in adulthood (Rothmond et al., 2012). By contrast, both in rats and non-human primates, densities of the D1R and D2R subtypes peak during adolescence and decrease in adulthood (Rosenberg and Lewis, 1994; Andersen et al., 2000). In songbirds, D1R and D2R subtypes in the song nuclei increase with age and peak during adolescence (Kubikova et al., 2010). The developmental patterns of related brain regions in non-songbirds are still unclear.

## **NEUROPHYSIOLOGY OF DA NEURONS**

"Classic" DA neurons show phasic activations (short duration bursts of action potentials) following unpredicted reward coding a quantitative "prediction error" signal, namely the difference between received and predicted reward value. A reward that is better than predicted elicits an activation (positive prediction error response), a fully predicted reward draws no response, and a reward that is worse than predicted induces a decrease in activity (negative error response; Schultz et al., 1993; Schultz, 2007, 2013). These prediction error responses of DA cells have been closely related to reinforcement learning models which assign a functional role of DA in modulating cortico-striatal inputs through a reward-prediction error teaching signal (Schultz, 1997, 2002; Morris et al., 2004, 2006; Pan et al., 2005, 2008). In fact, fast DA release consistent with these reward predicting signals of DA neurons has been measured in nucleus accumbens during associative learning (Phillips et al., 2003; Day et al., 2007). Besides "classic" reward-prediction error responses, phasic DA cell firing patterns also include responses to salient and aversive sensory stimuli (Horvitz, 2000; Joshua et al., 2008; Brischoux et al., 2009; Matsumoto and Hikosaka, 2009).
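The prediction error described above, the difference between received and predicted reward value, can be illustrated with a minimal Python sketch. It assumes a simple Rescorla-Wagner style update as one common formalization of a prediction error teaching signal; the function names, the learning rate and the reward values are illustrative choices and are not taken from the studies cited here.

```python
def prediction_error(received: float, predicted: float) -> float:
    """Reward prediction error: received minus predicted reward value."""
    return received - predicted


def update_prediction(predicted: float, received: float,
                      learning_rate: float = 0.1) -> float:
    """Move the prediction toward the outcome in proportion to the error."""
    return predicted + learning_rate * prediction_error(received, predicted)


def simulate_learning(reward: float = 1.0, n_trials: int = 50,
                      learning_rate: float = 0.1):
    """Repeatedly deliver the same reward and track the error on each trial."""
    predicted = 0.0
    errors = []
    for _ in range(n_trials):
        errors.append(prediction_error(reward, predicted))
        predicted = update_prediction(predicted, reward, learning_rate)
    return predicted, errors


if __name__ == "__main__":
    final_prediction, errors = simulate_learning()
    print(f"first-trial error: {errors[0]:.3f}")
    print(f"last-trial error:  {errors[-1]:.3f}")
    print(f"final prediction:  {final_prediction:.3f}")
```

In this sketch an unpredicted reward yields a positive error, the error shrinks toward zero as the reward becomes predicted, and omitting a predicted reward (`prediction_error(0.0, 1.0)`) yields a negative error, mirroring the three response types described in the text.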

Dopamine neurons also exhibit tonic firing driven by pacemaker-like membrane currents (Grace and Bunney, 1984; Grace, 1991; Goto et al., 2007). The functional relevance of this tonic DA release is unknown. Transient suppression of tonic spiking in DA neurons follows the omission of expected reward, somehow implicating this spiking pattern in reward-based learning (Tobler et al., 2003). Recent work has shown that DA release in the striatum increases gradually (ramps up) as rats expect distant reward, perhaps providing motivational drive (Howe et al., 2013). However, these types of signals have not been described in PFC.

Which of these DA signals reaches the PFC remains unclear. While phasic DA prediction error signals could be used as a signal to transiently boost working memory (WM) of the corresponding stimuli (Cohen et al., 2002; O'Reilly et al., 2002), it has also been argued that mostly slower, tonic DA signals are relevant in PFC. Moreover, the phasic components of DA cell firing might be transmitted via corelease of glutamate (Seamans and Yang, 2004; Lavin et al., 2005; Castner and Williams, 2007; Sheynikhovich et al., 2013). For computational models of DA function in PFC this has two main consequences. Firstly, the timescales of tonic DA would constrain functional roles to rather general cognitive states such as arousal or attention. Secondly, DA function in PFC circuits should be carefully contrasted with known features of the putatively fast, phasic signals of the nigrostriatal system.

In general, heterogeneity among DA cells points to additional functional aspects that are not covered by classic reinforcement learning descriptions (Berridge, 2007; Redgrave et al., 2008; Bromberg-Martin et al., 2010; Morris et al., 2010). While functional roles of VTA and SNc neurons share common properties (Ilango et al., 2014), overall evidence for different functional groups among DA cells has been emerging (Brischoux et al., 2009; Matsumoto and Hikosaka, 2009; Lammel et al., 2012; Watabe-Uchida et al., 2012). Moreover, the heterogeneity in DA cell activity patterns is probably related to heterogeneity in the anatomical pathways; DA neurons contribute to reward or aversion depending on whether they are activated from the laterodorsal tegmentum or the lateral habenula, respectively (Lammel et al., 2012). For these reasons, it has been difficult to dissociate the behavioral correlates of DA release between the projection pathways to the striatum and PFC.

### **HUMAN STUDIES**

Investigating the direct role of DA signaling in human PFC during learning and memory poses several challenges and, consequently, only a few studies address this question. In humans, DA receptor agonists and antagonists cannot be injected locally into the PFC and have to be administered systemically. Our knowledge about the role of DA transmission in the human PFC, therefore, comes from studies combining imaging of the brain with other manipulations such as systemic pharmacology or transcranial magnetic stimulation, from genetic profiling, and from work in patients with neurological and psychiatric disorders.

For instance, a recent fMRI study has revealed a connection between context-dependent WM and dopaminergic signaling in human PFC (D'Ardenne et al., 2012). The authors first used fMRI to show that the dorsolateral PFC was involved in the encoding of the context. Selective disruption of activity in this region with transcranial magnetic stimulation adversely impacted performance of the participants, causally implicating PFC in context encoding. PFC activity during the task was then found to correlate with phasic responses in the VTA and SNc. Based on these results, the authors suggest that phasic DA signals regulate the encoding and updating of context representations in the PFC.

In the 1970s, it was postulated that hypofrontality (i.e., decreased blood flow in the PFC) underlies mental disorders and impaired cognitive function (Ingvar and Franzén, 1974). In the context of schizophrenia, it was proposed that an excess of DA in the mesolimbic system causes the positive symptoms via hyperstimulation of D2R in the basal ganglia, whereas the cognitive and negative symptoms follow insufficient D1R activation in the frontal cortex (Abi-Dargham and Moore, 2003; Abi-Dargham, 2004). We now know that DA hypofrontality by itself cannot fully explain schizophrenia or other complex mental disorders. Studies of impaired PFC dopaminergic signaling and genetic profiling in these patients, however, have provided valuable information about the role of PFC DA in learning and memory. For example, schizophrenia patients exhibit imbalances in PFC dopaminergic signaling as determined by imaging approaches (Seeman, 1987; Okubo et al., 1997; Thompson et al., 2014), and show deficits in learning and WM (Kalkstein et al., 2010) that correlate with genetic variations in DA-related genes (Glatt et al., 2003; Vereczkei and Mirnics, 2011). In Parkinson's disease (PD) patients, degeneration of neurons in the SNc results in decreased phasic and tonic PFC DA levels (Scatton et al., 1983; Moustafa and Gluck, 2011), which could explain the cognitive impairments present along with the motor deficits (Narayanan et al., 2013). A more direct involvement of DA in PFC-dependent memory processes was established in PD patients with and without DA medication. In a spatial WM task, subjects had to find tokens in boxes presented on a screen. Subjects who were off the DA precursor levodopa (L-DOPA) made more errors (checking boxes that had already been opened) compared to when they had received L-DOPA, indicating that DA is required for proper spatial WM performance. Surprisingly, visual learning and memory were not affected by L-DOPA in this task (Lange et al., 1992).
Similarly, L-DOPA withdrawal did not affect the performance of PD patients in an N-back task, where WM is assessed when subjects are presented with a series of stimuli and have to indicate when a stimulus is the same as the one n steps back (Mattay et al., 2002). However, in PD patients undergoing deep brain stimulation surgery, microstimulation of the SN disrupts reinforcement learning in a two-alternative probability learning task (Ramayya et al., 2014). Furthermore, research conducted in attention deficit hyperactivity disorder (ADHD) patients, who also display learning and memory deficits, has also provided some insight into the role of DA in learning and memory (Brown, 2006; Alderson et al., 2013). In these patients, the size of the PFC is reduced (Seidman et al., 2005), and genes involved in dopaminergic pathways are altered (Gizer et al., 2009). Taken together, the results from work in schizophrenia, PD, and ADHD patients point to abnormal DA transmission as being responsible for behavioral deficiencies in some learning and memory tasks that depend heavily on PFC function.

Genetic studies have also provided valuable insight into the contribution of the DA system to learning and memory. Individuals with the Val/Val catechol-*O*-methyltransferase (COMT, an enzyme that deactivates catecholamines) polymorphism [Val(108/158)Met] exhibit higher COMT activity that correlates with lower DA levels in the PFC (Chen et al., 2004), and have a slightly higher risk of developing schizophrenia (Sagud et al., 2010). Moreover, Val/Val carriers perform worse in the Wisconsin card sorting test (WCST) compared to carriers of the Met allele (Egan et al., 2001; Malhotra et al., 2002). The WCST consists of a battery of cognitive tasks that include WM, sensitivity to reinforcement, and behavioral flexibility. In addition, brain imaging studies indicate that Val/Val carriers need greater PFC activity to perform WM tasks (Egan et al., 2001; de Frias et al., 2010). Stress is another factor that should be taken into consideration. Healthy human subjects under stress perform poorly in WM tasks (Olver et al., 2014) and exhibit exacerbated levels of PFC DA measured by positron emission tomography (PET; Lataster et al., 2011). In line with this finding, subjects with the above-mentioned Val/Val COMT alleles and correspondingly reduced levels of PFC DA perform better under stress during WM (Buckert et al., 2012).

Early evidence for the involvement of D1R in WM processes comes from work by Müller et al. (1998) that showed that systemic injections of pergolide, a combined D1R/D2R agonist, but not bromocriptine, a D2R agonist, facilitated WM performance in a delayed matching task with delays of 2–16 s. These results implicated D1R and not D2R in WM modulation. The important role of D1R in WM is also suggested by the correlation between the decrease of D1R binding in the lateral PFC and the decrease in WM performance with age (Bäckman et al., 2011). However, in another study, bromocriptine was shown to improve spatial WM while the D2R antagonist haloperidol (a typical antipsychotic drug) impaired it (Luciana and Collins, 1997). Other experiments, though, did not find a general effect of bromocriptine on spatial memory (Kimberg et al., 1997; Müller et al., 1998), nor did binding of the D2R radioligand [11C]FLB457 correlate with performance on the WCST (Takahashi et al., 2008).

Positron emission tomography studies in humans with the radiolabeled D1R antagonist [11C]SCH23390 have revealed an inverted-U relationship between D1R binding in the PFC and performance on the WCST (Takahashi et al., 2008). An inverted-U relationship means that an optimal level of D1R activation is required for best performance and, thus, levels below and above this optimum impair performance. These experiments were meant to confirm results obtained in monkeys (see below). Further support for an inverted-U relationship between D1R density and WM comes from patients with schizophrenia. Deficits in WM have been associated with both decreased and increased densities of PFC D1R in these patients (Okubo et al., 1997; Abi-Dargham and Moore, 2003). Taken together, receptor studies in humans point to an important role of PFC D1R in WM with an optimal level of activation needed for best performance. By contrast, the involvement of D2R needs further elucidation.
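The shape of an inverted-U relationship can be illustrated with a short sketch. The Gaussian form, the function name `wm_performance`, and the `optimum` and `width` parameters are assumptions chosen purely for illustration; they are not fitted to any of the data cited above, and activation is in arbitrary units.

```python
import math


def wm_performance(d1_activation, optimum=1.0, width=0.5):
    """Hypothetical inverted-U: performance peaks at an optimal D1R activation.

    A Gaussian bump is used only because it is a simple curve with a single
    maximum; the real dose-response relationship is not specified here.
    """
    return math.exp(-((d1_activation - optimum) ** 2) / (2.0 * width ** 2))


# Performance at activation levels below, at, and above the assumed optimum.
levels = [0.3, 1.0, 1.7]
scores = [wm_performance(level) for level in levels]
```

Under these assumptions, activation levels equally far below and above the optimum give equally reduced performance, which is the qualitative pattern the text describes.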

### **NON-HUMAN PRIMATE STUDIES**

The use of invasive approaches in monkeys has provided valuable insights into the crucial role of PFC DA and its receptors in several higher-order executive functions. In fact, global 6-hydroxydopamine (6-OHDA)-induced depletions of DA in the lateral PFC of monkeys established early on the critical role of DA in WM (Brozoski et al., 1979). Later, a series of studies showed that there is an increase of extracellular DA in the PFC during WM tasks (Watanabe et al., 1997) that exerts its actions via local D1R (Sawaguchi and Goldman-Rakic, 1991, 1994; Williams and Goldman-Rakic, 1995; Murphy et al., 1996; Collins et al., 1998; Robbins, 2000; Seamans and Yang, 2004; Castner and Williams, 2007; Arnsten et al., 2010). More specifically, local injections of D1R antagonists, but not D2R antagonists, into the lateral PFC of monkeys caused deficits in oculomotor delayed-response tasks; monkeys were less accurate in making memory-guided saccades to remembered locations on the screen. We note that the WM component of the task in these studies was on the order of 1.5 to 6 s, comparable to the human literature. More recent work has shown that an optimal level of D1R tone is required for adequate WM performance, and this may be particularly vulnerable to changes in arousal state such as fatigue or stress (Arnsten et al., 2010; Arnsten, 2011). Thus, either too much (under stress) or too little (during fatigue) D1R stimulation impairs performance following an inverted-U shaped curve (Arnsten et al., 1994, 2010; Cai and Arnsten, 1997; Arnsten and Goldman-Rakic, 1998; Goldman-Rakic et al., 2000; Williams and Castner, 2006; Vijayraghavan et al., 2007; Arnsten, 2012). These reports in monkeys agree well with both the deleterious effects of stress on WM performance and the inverted-U relationship between D1R binding and cognitive capabilities reported in human subjects. This inverted-U modulation of D1R also occurs at the level of single PFC neurons engaged in WM.
A D1R agonist modulates persistent activity during memory delays following an inverted-U response, whereby low levels of D1R stimulation enhance spatial tuning whereas high levels reduce it (Vijayraghavan et al., 2007). By contrast, D2R have little effect on delay activity and instead modulate the motor component of the task, suggesting some contribution of PFC D2R to motor control function (Wang et al., 2004). Systemic injections of D1R agonists and antagonists also alter the performance of monkeys during WM tasks, but these studies have been reviewed elsewhere (Castner and Williams, 2007).

One general question is why detrimental effects of the "wrong" DA concentration are present in the system in the first place. In other words, what could be functional reasons for decreasing WM performance? Speculatively, these could occur in situations in which the contribution of PFC to behavior is reduced anyway. For example, in high stress, fight or flight mode, behavioral control could be directed to subcortical areas to emphasize speed (Arnsten, 2012; Avery et al., 2013). Alternatively, the fine-tuning of DA concentration could be used to control the "randomness" of behavior to emphasize exploitation or exploration of certain behaviors (Sutton, 1998; Doya, 2002; Parush et al., 2011; Humphries et al., 2012). Specifically, D1R activation might push the PFC toward an exploitation mode by protecting the WM content against distractors (Durstewitz and Seamans, 2002, 2008). In contrast, based on both computational and experimental approaches, D2R activation has been proposed to support behavioral flexibility (exploration; Floresco and Magyar, 2006; Durstewitz and Seamans, 2008; Puig and Miller, 2014). Because selective stimulation of D1R or D2R seems problematic in physiological situations, differences in receptor affinities may instead produce D2R-dominated states (very low and very high DA) and D1R-dominated states (intermediate DA). While these properties are also well-suited to support the on- and offset of WM-related persistent activity (**Box 2**), it remains unclear whether the timescales of DA modulation of the PFC firing are fast enough (Cohen et al., 2002; O'Reilly et al., 2002; Seamans and Yang, 2004; Lavin et al., 2005; Sheynikhovich et al., 2013).
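The idea of controlling the "randomness" of behavior is often formalized in reinforcement learning with a softmax action-selection rule, sketched below. Treating the inverse temperature `beta` as a quantity that a neuromodulator might set is an assumption made for illustration, and the action values are arbitrary numbers rather than data.

```python
import math


def softmax_policy(action_values, beta):
    """Choice probabilities under a softmax rule.

    beta (inverse temperature, hypothetical parameter) sets the randomness:
    small beta gives near-uniform choices (exploration), large beta
    concentrates choice on the best-valued action (exploitation).
    """
    best = max(action_values)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(beta * (v - best)) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


values = [1.0, 0.5, 0.2]                      # arbitrary illustrative values
explore = softmax_policy(values, beta=0.5)    # choices spread across actions
exploit = softmax_policy(values, beta=10.0)   # choices focused on the best action
```

With a low `beta` the three actions are chosen with similar probabilities, whereas a high `beta` makes the highest-valued action dominate.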

The monkey lateral PFC has also been implicated in associative stimulus-response learning (Asaad et al., 1998; Pasupathy and Miller, 2005; Histed et al., 2009; Antzoulatos and Miller, 2011; Puig and Miller, 2012, 2014). Reward-prediction error responses of DA cells might be critically involved in these learning processes (Schultz, 1998, 2007, 2013; see above). Consistent with this role in reward prediction, phasic DA release occurs in nucleus accumbens that is dynamically modified by associative learning (Phillips et al., 2003; Day et al., 2007). Thus, it is plausible that these DA signals also play a role in modulating PFC-dependent learning. Indeed, Puig and Miller (2012, 2014) have recently shown that PFC D1R and D2R contribute to stimulus-response learning. Monkeys performed an oculomotor delayed response task where they learned, by trial and error, associations between visual cues and saccades to a right or left target (**Figure 2A**). Local microinjections of both D1R and D2R antagonists (SCH23390 and eticlopride, respectively) impaired the learning performance of the monkeys, who made more errors and needed more correct trials to learn the associations. The learning impairments correlated with a decrease of neural information about the associations in single prefrontal neurons during both the cue and memory delay (1 s) epochs of the trial. Notably, blocking D1R impaired learning more than blocking D2R, whereas blocking D2R led to more perseverative errors (**Figures 2B,C**). This suggests that PFC D1R contribute to learning more than D2R, whereas the latter are more involved in cognitive flexibility.
These complementary roles of D1R and D2R in PFC function agree well with the computational models mentioned earlier that propose that D1R activation helps stabilize new representations once an effective strategy has been identified (exploitation) whereas D2R activation destabilizes PFC network states favoring the exploration of new strategies (i.e., flexible processing; Durstewitz et al., 2000a; Seamans and Yang, 2004; Floresco and Magyar, 2006; Durstewitz and Seamans, 2008).

Contrary to the prominent role of DA in WM and associative learning, PFC DA does not influence familiar associations. Blockade of D1R and D2R in the lateral PFC does not cause

#### **BOX 2 | Computational perspectives on DA, WM, and PFC persistent activity.**

Models of DA effects in the PFC can be categorized based on their level of biophysical detail and their assumed DA release patterns. Furthermore, while the neuropsychological definitions of WM seem not always to be consistent across species (**Box 1**), computational studies often focus on the mechanisms underlying persistent activity during delay periods.

An influential early model of DA action in the PFC (Durstewitz et al., 2000a; see also: Durstewitz et al., 1999a), bridged the gap between DA-induced conductance changes and functional roles. In small networks of multi-compartment models of pyramidal cells and interneurons, increased DA levels changed various intrinsic ionic as well as synaptic conductances. Through a differential effect on cells in high and low activity states, these changes lead to a better separation of the network response to target and distractor patterns. In particular, the network ability to maintain a robust representation of the target pattern for more than one second was improved by increased levels of DA. This feature could be a central function of DA release in PFC, to support persistent activity related to WM.

In a similar approach, increasing the dominance of feedback inhibition in the network resulted in an inverted-U shaped function of DA concentration and persistent activity, suggesting a close relation to well-known inverted-U shaped relations between DA levels and behavioral performance (Seamans and Yang, 2004). Overall, the ability of DA to enhance persistent activity has been verified on different modeling levels, ranging from detailed Hodgkin-Huxley-like compartmental models (Durstewitz et al., 2000b), over extended integrate-and-fire type descriptions (Brunel and Wang, 2001), to more abstract rate models (Chadderdon and Sporns, 2006). However, it remains unclear which level of model detail is necessary to capture all relevant factors of the extremely complex cellular and synaptic effects of DA in the PFC (Seamans and Yang, 2004). It has been argued that the fundamental underlying principle of changing the signal-to-noise ratio is the strengthening of both excitatory and inhibitory transmission (Cohen et al., 2002); in some cases this is achieved through changes in ionic and synaptic conductance (Durstewitz et al., 2000a), and in others through simple changes in the gain of the neural activation function (Servan-Schreiber et al., 1990). Mechanistically, D1R and D2R have been argued to be essential for changing the dynamics of PFC networks during WM. In the state space of PFC pyramidal and interneuron firing rates, baseline and persistent WM activity form two separate attractors. The level of DA controls the distance between these attractors as well as the structure of the underlying energy landscape, and thereby also the probability that noise causes a switch between the two regimes (Durstewitz and Seamans, 2002). Still, besides the support of persistent activity, there are other aspects of DA function in PFC that might not be captured by the same principles.
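The two-attractor picture can be illustrated with a deliberately minimal rate model. This is a toy sketch, not a reimplementation of any of the cited models: it uses a single recurrent population, treats the gain of the activation function as a stand-in for DA level (following the gain-modulation idea mentioned above), and all parameter values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def delay_activity(gain, cue_on, w=1.0, theta=0.5, cue_amp=0.6,
                   tau=0.1, dt=0.001, t_end=6.0):
    """Firing rate at the end of the delay for one recurrent population.

    tau * dr/dt = -r + sigmoid(gain * (w * r + input - theta))
    A cue (amplitude cue_amp) is applied from t = 1 s to t = 2 s if cue_on.
    All parameter values are hypothetical and chosen only for illustration.
    """
    r = 0.0
    for step in range(int(t_end / dt)):
        t = step * dt
        inp = cue_amp if (cue_on and 1.0 <= t < 2.0) else 0.0
        drive = gain * (w * r + inp - theta)
        r += (dt / tau) * (-r + sigmoid(drive))  # forward Euler step
    return r


def memory_signal(gain):
    """Difference in end-of-delay activity between cue and no-cue trials."""
    return delay_activity(gain, cue_on=True) - delay_activity(gain, cue_on=False)


low_gain_memory = memory_signal(gain=2.0)    # single attractor: cue is forgotten
high_gain_memory = memory_signal(gain=8.0)   # two attractors: cue is held
```

With low gain the model has one stable state, so activity after the cue relaxes back to it and no trace of the cue remains. With higher gain a low (baseline) and a high (persistent) state coexist, and a transient cue switches the population into the persistent state for the rest of the delay.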

While most previous modeling studies focused on the role of prefrontal DA on WM, a recent study emphasized that DA also affects long-term plasticity in the PFC (Sheynikhovich et al., 2013). Through a multi-compartment model of a PFC neuron (modified from Durstewitz et al., 2000a) they demonstrated that DA can control both the sign and amplitude of long-term plasticity. Potential functional roles of DA-mediated long-term plasticity in PFC could lie in the learning of complex high-dimensional representation of task rules and context (Mante et al., 2013; Rigotti et al., 2013). This would also expand the functional role from WM to a more fundamental role in shaping cognitive processes. The interaction of such structural changes with the other roles of DA in changing PFC activity and oscillatory patterns during WM remains one important direction for future computational approaches.

any behavioral deficit in monkeys remembering highly familiar stimulus-response associations (Puig and Miller, 2012, 2014; **Figures 2A,D**). This agrees with the hypothesis that DA is essential for the early stages of learning, but with extended training DA appears to play a decreasing role. So there may be a transition from goal-directed to habit-based instrumental performance likely orchestrated by the basal ganglia (Wickens et al., 2007; Graybiel, 2008).

A series of investigations carried out by the groups of AC Roberts and TW Robbins have shown in monkeys that DA depletions in another region of the PFC, the orbitofrontal cortex (OFC), disrupt conditioned reinforcement (i.e., when previously neutral stimuli in the environment become associated with reward). After DA depletions restricted to the OFC, monkeys were insensitive to conditioned reinforcers and persisted in responding in the absence of reward, resembling the compulsive responding of drug addicts (Walker et al., 2009). The OFC is also critical for reversal learning, the ability to switch responding to a previously non-reinforced stimulus upon learning (Robbins and Roberts, 2007; Kehagia et al., 2010). After excitotoxic lesions of the OFC, monkeys were able to learn novel stimulus-reward associations, but showed marked perseverative deficits in their ability to reverse the associations (Clarke et al., 2008). Interestingly, this was sensitive to serotonin but not DA depletions (Clarke et al., 2004, 2005, 2007). In contrast, DA, but not serotonin, depletions in the caudate nucleus disrupt reversal learning, revealing striking neurochemical dissociations between the DAergic and serotonergic neuromodulatory systems in fronto-striatal circuits (Clarke et al., 2011, 2014). The role of specific DA receptors in these effects has not been explored, so this important piece of information is missing. In this regard, one study showed that systemic blockade of D2R, but not D1R, impairs reversal learning in monkeys without affecting new learning (Lee et al., 2007). However, administration of drugs in this study was systemic, making the specific contribution of PFC D1R and D2R to the reported effects unclear.

### **RODENT STUDIES**

Separate populations of PFC pyramidal neurons that express only D1R or only D2R, each with unique morphological and physiological properties, have been identified in mice (Gee et al., 2012; Seong and Carter, 2012). This is similar to the well-established direct and indirect pathways in the basal ganglia, which express D1R and D2R, respectively (Albin et al., 1989; Alexander and Crutcher, 1990; Smith et al., 1998; Gerfen and Surmeier, 2011). In fact, a recent study has demonstrated that selective (optogenetic) activation of D1R-expressing neurons in the striatum (direct pathway) promotes reinforcement learning, whereas selective activation of D2R-expressing neurons (indirect pathway) induces transient punishment (Kravitz et al., 2012). However, the specific contribution of D1R- and D2R-expressing neurons in the PFC to learning has yet to be elucidated.

Early work in rats demonstrated, as in monkeys, that elevating or depleting DA in the PFC impaired spatial WM performance (Simon, 1981; Bubser and Schmidt, 1990; Murphy et al., 1996). In keeping with studies in monkeys, there is a phasic release of DA into the PFC during delayed response tasks, the magnitude of DA efflux being predictive of memory accuracy

**FIGURE 2 | D1R and D2R in the monkey lateral PFC modulate associative learning but not highly familiar associations. (A)** Delayed associative learning and memory task. Animals fixated to start a trial. A cue object was followed by a brief memory delay and presentation of two target dots. Saccade to the target associated with the cue was rewarded with juice drops. Trials were blocked in pairs of novel cues (80% of trials) and pairs of familiar cues (20% of trials). When performance on novel trials reached the learning criteria (80% correct and 30 correct trials per novel cue), novel cues were replaced and a new block of trials started. Monkeys first completed several Baseline blocks (Bas; first green lines). Then, 3 μl of either saline (controls; n = 20 sessions), a D1R antagonist (30 μg of SCH23390; n = 30 sessions), or a D2R antagonist (high concentration, 30 μg of eticlopride, n = 10 sessions; low concentration, 1 μg of eticlopride, n = 26 sessions) were pressure-injected in the left lateral PFC (Inj, injection block). Drugs were injected after different numbers of baseline blocks in different sessions (S1–S2) to account for any confounds generated by a systematic behavior of the monkeys. We classified blocks as baseline, "early" (injection block and first two postinjection blocks), or "late" (postinjection blocks 3–5). **(B)** Average learning rates across sessions. We measured the learning rate of each block of trials by fitting a sigmoid distribution to the performance of the monkeys on novel trials using a logistic regression model. Learning rates were the slopes of the fitted distributions. Learning rates decreased significantly after the injection of both D1R and D2R antagonists compared to baseline and post-saline blocks. The D2R antagonist reduced learning rates less than the D1R antagonist. **(C)** Average percent of perseverative errors (consecutive error trials of the same cue). Perseverative errors increased significantly after the injection of both D1R and D2R antagonists compared to baseline and post-saline blocks. The high concentration of the D2R antagonist elicited more perseveration than the other treatments. **(D)** Average percent correct of familiar trials during the baseline, early, and late blocks of trials. Dashed line depicts the 80% threshold used as part of the learning criteria. DA antagonists did not affect the performance of familiar associations. Shown are the mean and SEM. Two-way ANOVA for treatment and blocks as factors. \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001, Tukey's least significant difference post hoc test. Modified from Puig and Miller (2012, 2014).
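The learning-rate measure in panel (B) can be illustrated with a small sketch that fits a logistic curve to trial-by-trial outcomes and reads off its slope. This is not the authors' analysis code: the Newton-Raphson fit, the function names, and the two outcome sequences are assumptions, and the sequences are synthetic values invented for illustration rather than recorded data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learning_rate(trials, outcomes, n_iter=25, ridge=1e-6):
    """Slope of P(correct) = sigmoid(b0 + b1 * (trial - mean_trial)).

    Fitted by Newton-Raphson on the log-likelihood; trial numbers are
    centered for numerical conditioning, which leaves the slope unchanged.
    """
    mean_x = sum(trials) / len(trials)
    xs = [x - mean_x for x in trials]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p              # gradient terms
            g1 += (y - p) * x
            h00 += w                 # negative Hessian terms
            h01 += w * x
            h11 += w * x * x
        h00 += ridge                 # tiny ridge keeps the system invertible
        h11 += ridge
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1


trials = list(range(1, 21))
# Synthetic outcomes (1 = correct, 0 = error), invented for illustration only.
fast_block = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
slow_block = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]

fast_rate = learning_rate(trials, fast_block)
slow_rate = learning_rate(trials, slow_block)
```

A block in which performance rises quickly yields a steeper fitted curve, and therefore a larger learning rate, than a block in which performance improves gradually.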

(Floresco and Phillips, 2001; Phillips et al., 2004). Moreover, these DA actions are mediated by D1R. Zahrt et al. (1997) reported that overstimulation of PFC D1R with a D1R agonist induced deleterious effects in spatial WM of rats performing a delayed alternation task, an effect reversed by pretreatment with a D1R antagonist. Rats were required to alternate between two arms to obtain a reward, with a delay between trials of 5–30 s. Another study using a comparable range of delays (0–16 s) found that intra-PFC infusions of a D1R agonist, but not a D2R antagonist, could disrupt or facilitate performance in a task designed to account for the contribution of attention to WM. Importantly, this work suggested that different levels of DA may be required for different cognitive processes (Chudasama and Robbins, 2004). Seamans and Floresco used a delayed response variant of the radial-arm maze task to demonstrate, also in rats, that other types of "WM" with comparatively longer delays (on the order of 30 min to several hours) are also sensitive to manipulations of PFC D1, but not D2, receptors (Seamans et al., 1998; Floresco and Phillips, 2001; Floresco and Magyar, 2006; Floresco, 2013). We note that some of these studies aimed at directly testing whether inadequate activation of PFC D1R in rodents caused the same detrimental effects on WM previously reported in monkeys, where memory delays were on the order of a few seconds. Thus, and as pointed out previously (**Box 1**), it seems that studies across species have not reached a consensus in defining what "WM" is. However, altogether, these studies implicate PFC D1R in different types of "short-term" memory.

Also on par with primate studies, insufficient or excessive activation of PFC D1R impairs the performance of rats in WM tasks following an inverted-U shaped curve (Seamans et al., 1998; Mizoguchi et al., 2009; Floresco, 2013). Interestingly, this has been recently extended to a more holistic view of the role of D1R/D2R in cortico-striatal circuits. Transgenic mice with selective and reversible overexpression of D2R in the striatum exhibit poor WM abilities that correlate with exacerbated PFC D1R activation (Kellendonk et al., 2006; Li et al., 2011). In contrast with the monkey literature, though, rodent work has suggested that PFC D2R could play a role in WM. Druzin et al. (2000) reported that intra-PFC infusions of a D2R agonist disrupt performance of rats in a delayed-response task and that this D2R modulation of WM may be linear (i.e., lower/higher levels of D2R activation are associated with better/poorer performance). Thus, PFC D2R could also contribute to WM but following different principles of operation from D1R (i.e., linear vs. an inverted-U modulation; Williams and Castner, 2006; Floresco, 2013). So, perhaps the effects of the D2R agonist bromocriptine observed in human studies can be attributed in part to PFC D2R.

Furthermore, D4R may be key for emotional learning. In rats, activation of D4, but not D1, receptor subtypes in the medial PFC strongly potentiates the salience of emotional associative fear memories. Furthermore, individual neurons in the medial PFC actively encode emotional learning, and this depends on D4R activation (Laviolette et al., 2005). Conversely, stimulation of D1R and not D4R blocks the recall of previously learned emotionally relevant information suggesting, again, that D1R help shape memories. So, PFC D1R and D4R may play discrete roles (memory vs. learning) in the acquisition of emotional associations (Lauzon et al., 2009).

D1R and D2R exert complex modulatory actions on the activity of PFC neurons, as shown by *in vitro* recordings in PFC slices of rodents (for an extensive review, see Seamans and Yang, 2004). Briefly, DA tends to enhance spiking via D1R through Na<sup>+</sup>, K<sup>+</sup>, and Ca<sup>2+</sup> currents (Yang and Seamans, 1996; Gorelova and Yang, 2000), an effect also observed in PFC slices of monkeys (Henze et al., 2000; González-Burgos et al., 2002). Conversely, DA decreases spiking via D2R, possibly through modulation of glutamatergic receptors and Na<sup>+</sup> conductances (Gulledge and Jaffe, 1998, 2001; Gorelova and Yang, 2000; Tseng and O'Donnell, 2004). Moreover, stimulation of PFC D2R can also induce an afterdepolarization mediated by L-type Ca<sup>2+</sup> channels and NMDA receptors (Gee et al., 2012). Besides these contributions of DA to the modulation of PFC activity, several rodent studies have also provided evidence that PFC neurons shape the activity of DA neurons. For example, Takahashi et al. (2011) found that OFC inactivation impaired state-value representations in VTA DA cell activity, in particular the effect of the animal's own action plan on the state value. Furthermore, Jo et al. (2013) showed that PFC inactivation increases the DA response to reward-predicting stimuli. This matches a series of computational modeling studies in which PFC becomes part of the system that determines the value of the current state and propagates this information to the DA system (e.g., Frank et al., 2001; O'Reilly and Frank, 2006; Hazy et al., 2007). Although this supports a general role of PFC in shaping DA cell activity, the specific contribution during behavior depends on the corresponding firing patterns of the PFC neurons that affect DA cells.

### **BIRD STUDIES**

Higher cognitive abilities evolved largely independently in birds and mammals. This parallel evolution gave rise to several crucial differences in neural organization. While avian and mammalian striatum and pallium are homologous (derived from a common ancestor), there are considerable differences in the organization of the pallium (Jarvis et al., 2005). For instance, the avian telencephalon does not have a pallial commissure comparable to the mammalian corpus callosum. The most notable difference, however, is the lack of the typical cortical lamination in the avian pallium (Jarvis et al., 2005). In other words, in spite of a shared evolutionary ancestry and a similar functionality, the avian and mammalian "cortex" look entirely different: what has evolved into layers in the mammalian brain might have evolved into different regions in the avian brain (Jarvis et al., 2013). Other organizational principles were preserved or independently evolved. For instance, a recent analysis of the avian connectome revealed a very similar network organization between birds and mammals (Shanahan et al., 2013). Using graph theory, the authors found that the telencephalon in both birds and mammals has a comparable organization into modular, small-world networks with a connective core of hub nodes. The most relevant hub here is the "prefrontal" hub. While the avian brain has no homolog of the mammalian PFC, it has a functional analog (a structure with comparable functionality): the NCL. A detailed comparison between both structures has been provided elsewhere (Güntürkün, 2005a,b; Kirsch et al., 2008). Briefly, PFC and NCL are centers of multimodal integration that are closely connected to all secondary sensory and motor regions (Kröner and Güntürkün, 1999).

Much like the PFC, the NCL is involved in WM as revealed by lesion studies (Mogensen and Divac, 1982; Güntürkün, 1997) and single cell recordings in pigeons during Go/Nogo tasks (Diekamp et al., 2002). Recently, an elegant study demonstrated that single neurons in the NCL of crows maintain memory information in two versions of a delayed match to sample task (DMS; Veit et al., 2014), the classical paradigm of WM research in primates. The animals were trained to view a sample image and indicate this image among similar images following a short delay (1–2.3 s). Similar experiments revealed an involvement of NCL in other cognitive functions such as categorization (Kirsch et al., 2009), the integration of time-to-reward with reward amount (Kalenscher et al., 2005), and executive control over what information is maintained in WM (Rose and Colombo, 2005). Another hallmark of prefrontal function, the processing of rules that guide behavior, was recently reported in the NCL of crows (Veit and Nieder, 2013). The authors used the same paradigm that was used in the original demonstration of such processes in primate PFC, a modified DMS task (Wallis et al., 2001). They report that single neurons in the NCL represent behavioral rules that instruct the animals how to respond to subsequent stimuli, a result that mirrors the original findings in the PFC.

Like the PFC, the NCL is the prime cortical (pallial) target of dopaminergic innervation (Durstewitz et al., 1999b). As in mammals, these projections arise in VTA and SNc (Waldmann and Güntürkün, 1993; **Figure 1**). Dopaminergic projections to the avian telencephalon show two distinct anatomical features (Wynne and Güntürkün, 1995). One type, "en passant" projections, is also found in the mammalian brain. These axons travel through the telencephalon, contacting a large number of dendrites and somata of predominantly smaller target neurons. The other type, "baskets," has not been reported in the mammalian brain. Here, individual fibers densely wrap around the somata and initial dendrites of predominantly larger cells. Interestingly, this type of innervation might be functionally comparable to the pattern of innervation in the mammalian cortex. In mammals, large pyramidal neurons lie mainly in deeper layers and are targeted by DA terminals through their proximal (in primates also distal) dendrites. The basket structures might be a way to generate a similar innervation of larger cells in the absence of cortical organization (Durstewitz et al., 1999b). As in the mammalian PFC, the avian NCL contains members of both DA receptor families, with a considerably lower density of D2 compared to D1 receptors (Dietl and Palacios, 1988; Durstewitz et al., 1998).

Overall, the role of DA in the avian brain is largely comparable to its role in the mammalian brain. DA is involved in motor control and learning, and in birds it also contributes to the acquisition and control of birdsong (Rieke, 1980, 1981; Güntürkün, 2005a; Fee and Goldberg, 2011). Even though birdsong is a major focus of avian research, here we will only briefly refer to this work. It has been reviewed extensively elsewhere and the main focus of the song literature is the role of DA in basal ganglia circuits (Kubikova et al., 2010; Fee and Goldberg, 2011; Simonyan et al., 2012). To our knowledge, no study has recorded avian dopaminergic neurons during learning, so there is no direct evidence for reward prediction error coding in avian DA neurons. However, several studies provide indirect evidence for temporal difference (TD) learning in birds. The only study that recorded from single DA neurons in the VTA of songbirds showed that DA neurons are strongly modulated by social context. The authors interpret this result in the light of "approval" – positive feedback from the females that the male subjects sang to (Yanagihara and Hessler, 2006). Later work confirmed that such social context activity is involved in modulating the singing-related activation of the song system (Hara et al., 2007). Further evidence comes from behavioral studies. Pigeons learn a simple discrimination task faster if they receive a larger reward for correct discrimination than if they receive a smaller contingent reward. This difference in learning rate can be predicted by different reward prediction errors due to the different reward magnitudes (Rose et al., 2009b). Furthermore, injections of D1R antagonists in the striatum abolish this effect (Rose et al., 2013). Interestingly, the birds are still able to learn the discrimination but the learning rate is no longer modulated by the contingent reward magnitude. Learning shows an average rate with a slight decrease in performance on a large reward and a slight increase in performance with a small reward.
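The prediction-error account of the reward-magnitude effect can be illustrated with a minimal Rescorla-Wagner sketch. The learning rate `alpha`, the reward magnitudes, the trial count, and the response criterion are illustrative assumptions, not parameters fitted in the cited pigeon studies. The sketch only shows that one update rule yields larger value updates, and therefore faster acquisition, when the contingent reward is larger.

```python
def value_trajectory(reward, alpha=0.1, n_trials=20):
    """Rescorla-Wagner update: V <- V + alpha * (reward - V)."""
    value = 0.0
    history = []
    for _ in range(n_trials):
        prediction_error = reward - value   # reward prediction error (delta)
        value += alpha * prediction_error   # update scaled by the learning rate
        history.append(value)
    return history


small = value_trajectory(reward=1.0)  # hypothetical small reward
large = value_trajectory(reward=3.0)  # hypothetical large reward

# A fixed, hypothetical response criterion is crossed earlier with the larger reward.
criterion = 0.5
trials_small = next(i + 1 for i, v in enumerate(small) if v >= criterion)
trials_large = next(i + 1 for i, v in enumerate(large) if v >= criterion)
print(trials_small, trials_large)
```

In this toy setting, removing the dependence of the update on reward magnitude (for example, by clamping `reward` to a common value) makes both trajectories identical, which is one way to picture the loss of magnitude-dependent learning rates after D1R blockade.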

As in the mammalian PFC, DA in the avian NCL is critically involved in mechanisms of learning and memory. DA levels in the PFC of monkeys increase during WM tasks (Watanabe et al., 1997) and, consistently, microdialysis in the NCL of pigeons shows an increase in DA during a DMS task with a delay (4 s) compared to the same task without a delay (Karakuyu et al., 2007).

Furthermore, injections of a D1R agonist (SKF81297) into the NCL and striatum improve performance on a DMS task (Herold et al., 2008). Interestingly, these injections were only beneficial on days with low performance; if the animals performed well, agonist injections disrupted performance. These findings are in line with the mammalian literature showing that DA modulates performance following an inverted-U shaped curve, where too much or too little D1R activation is detrimental to performance. It also complements nicely the reports showing that humans with genetically lower levels of DA in PFC are less susceptible to the detrimental effects of stress on WM (see Human Studies). In addition, and again in line with the mammalian literature on WM, injections of a D1R antagonist (SCH23390) into the NCL disrupt the ability of pigeons to focus their attention over longer periods of time and to ignore distracting stimuli (Rose et al., 2010).
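The baseline dependence of the agonist effect follows directly from an inverted-U relationship, as the following sketch illustrates. The Gaussian shape, the `optimum` and `width` values, and the size of the agonist-induced shift are hypothetical choices made for illustration; the source specifies only that too little or too much D1R activation is detrimental.

```python
import math


def wm_performance(d1_activation, optimum=1.0, width=0.5):
    """Hypothetical inverted-U: performance peaks at an optimal D1R activation."""
    return math.exp(-((d1_activation - optimum) ** 2) / (2 * width ** 2))


def agonist_effect(baseline, agonist_boost=0.4):
    """Change in performance when an agonist adds a fixed amount of D1R activation."""
    return wm_performance(baseline + agonist_boost) - wm_performance(baseline)


low_day = agonist_effect(baseline=0.4)   # below the optimum: the agonist helps
good_day = agonist_effect(baseline=1.0)  # at the optimum: the agonist hurts
print(f"low-performance day: {low_day:+.2f}, good day: {good_day:+.2f}")
```

The same fixed increment in activation moves a low baseline toward the peak and a near-optimal baseline past it, so the sign of the drug effect depends on where the animal starts on the curve.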

In a recent study, Herold et al. (2012) assessed the expression of different DA receptor types in the NCL of pigeons trained on different cognitive tasks. This approach allowed the dissociation of changes in receptor expression due to WM (using a DMS task), stimulus selection (a stimulus-response task), or general task components such as reward and response selection. It is noteworthy that the mammalian D1R family is extended in the avian brain. In addition to D1A (D1) and D1B (D5) receptors, the avian brain also contains the receptor D1D. The authors report that general task components have no influence on D1R expression in the NCL. However, WM components increase expression of D1B and stimulus-response learning increases expression of D1A and D1D receptors. None of the task components affected the expression of D2R. These results demonstrate an involvement of DA receptors in the NCL not only in WM but also in learning mechanisms (Herold et al., 2012). In line with these results, microinjections of a D1R antagonist (SCH23390) to the NCL of pigeons resulted in severe disruptions of discrimination reversal learning (Diekamp et al., 2000). This result is in contrast to the finding that DA in caudate nucleus, but not the OFC, of monkeys is required for reversal learning (see Non-Human Primate Studies; Clarke et al., 2004, 2005, 2007, 2011).

## **CONCLUSION**

Despite decades of intense research, we are only now starting to comprehend the specific roles of DA in several PFC-dependent learning and memory processes. A main obstacle in understanding the complex DA modulation of PFC function, both at anatomical and physiological levels, is the outstanding heterogeneity and specificity of the DA system itself. Therefore, a cross-species comparison may help to identify general principles of DA function in the PFC. Each model species discussed here provides its unique advantages and challenges. Certainly, one of the main goals of studying the dopaminergic system is to expand our understanding of the healthy and diseased human brain in order to develop better treatments for neurological and psychiatric disorders with abnormal DA transmission. Since research in humans poses many technical constraints, non-human primates offer an alternative to study complex behavior and higher cognitive functions. In contrast, rodents can be manipulated extensively with a variety of genetic/optogenetic approaches, but their cognitive abilities might not be sufficient to address higher cognitive functions of humans. Finally, studying the avian brain offers an evolutionary perspective that might help identify crucial features and constraints of the dopaminergic system. Indeed, the crucial role of DA in executive function is highlighted by the fact that the independent evolution of higher-order cognition in birds gave rise to a largely comparable DA function – even in the absence of cortical layering.

Some major findings have been consistently replicated in different species, establishing their robustness. First, both elevating and depleting DA levels in PFC impair performance in WM tasks. Second, PFC DA modulates WM via D1R. The potential involvement of D2R in WM is more controversial. Third, PFC D1R modulate WM following an inverted-U shaped curve. That is, an optimal level of D1R activation is required for adequate WM performance, and this is sensitive to changes in arousal state such as fatigue or stress. Recent studies in monkeys point to interesting extensions of these findings, which still need to be confirmed in other species and in other paradigms. They showed that the inverted-U curve modulation of WM may also occur at the level of spiking in PFC neurons, and that both PFC D1R and D2R play relevant roles in associative learning but not associative memory.

Clearly, more work will be necessary to fully understand the role of different receptor subtypes present in the avian and mammalian brains in learning and memory processes. In order to succeed, and as underscored in this review, researchers working in different disciplines and with different species will need to reach a consensus on how to define different types of learning and memory processes, paying particular attention to WM-related concepts and terminology (**Box 1**). Computational modeling could provide such unified definitions and hypotheses that are testable across species.

Importantly, recent investigations conducted in rodents have highlighted the close interaction between D1 and D2 receptors present in cortico-striatal circuits. In addition, separate populations of pyramidal neurons have been identified in the rat PFC that preferentially express only D1R or D2R, similarly to the D1R-expressing direct and D2R-expressing indirect pathways of the basal ganglia. Although the specific contribution of these PFC neuron populations to learning and memory has yet to be elucidated, the use of genetic and invasive approaches in rodents is proving to be an excellent source of information. However, non-human primate models are better suited to gain deeper insights into the role of DA in more sophisticated tasks that are closer to the human cognitive repertoire. Unfortunately, genetic manipulations and invasive approaches such as optogenetics are just beginning to be developed in primates. A rapid advancement in the development of techniques applicable to humans is especially necessary, since human studies on the role of DA in learning and memory have been particularly scarce. In this regard, it is now possible to measure DA release with accurate timescales by molecular fMRI (Lee et al., 2014). We hope that these emerging technical advances in primates will allow a more detailed understanding of the roles of D1R and D2R in higher-order executive function. This will be particularly important for the development of adequate drug therapies for patients with disorders that show disrupted prefrontal DA signaling such as schizophrenia, PD, and ADHD.

#### **ACKNOWLEDGMENTS**

This work was supported by the BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG, grant no. EXC 1086, to Robert Schmidt) and the Volkswagen Foundation (Freigeist fellowship to Jonas Rose). We also acknowledge support from the Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Tuebingen University (to Jonas Rose and Nadja Freund).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 18 July 2014; published online: 05 August 2014. Citation: Puig MV, Rose J, Schmidt R and Freund N (2014) Dopamine modulation of learning and memory in the prefrontal cortex: insights from studies in primates, rodents, and birds. Front. Neural Circuits 8:93. doi: 10.3389/fncir.2014.00093 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Puig, Rose, Schmidt and Freund. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## New perspectives on catecholaminergic regulation of executive circuits: evidence for independent modulation of prefrontal functions by midbrain dopaminergic and noradrenergic neurons

### *Daniel J. Chandler, Barry D. Waterhouse and Wen-Jun Gao\**

Department of Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA, USA

#### *Edited by:*

Evelyn K. Lambe, University of Toronto, Canada

#### *Reviewed by:*

Rita J. Valentino, The Children's Hospital of Philadelphia, USA

Amy F. T. Arnsten, Yale University School of Medicine, USA

#### *\*Correspondence:*

Wen-Jun Gao, Department of Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA

e-mail: wgao@drexelmed.edu

Cognitive functions associated with prefrontal cortex (PFC), such as working memory and attention, are strongly influenced by catecholamine [dopamine (DA) and norepinephrine (NE)] release. Midbrain dopaminergic neurons in the ventral tegmental area and noradrenergic neurons in the locus coeruleus are major sources of DA and NE to the PFC. It is traditionally believed that DA and NE neurons are homogeneous with highly divergent axons innervating multiple terminal fields and once released, DA and NE individually or complementarily modulate the prefrontal functions and other brain regions. However, recent studies indicate that both DA and NE neurons in the mammalian brain are heterogeneous with a great degree of diversity, including their developmental lineages, molecular phenotypes, projection targets, afferent inputs, synaptic connectivity, physiological properties, and behavioral functions. These diverse characteristics could potentially endow DA and NE neurons with distinct roles in executive function, and alterations in their responses to genetic and epigenetic risk factors during development may contribute to distinct phenotypic and functional changes in disease states. In this review of recent literature, we discuss how these advances in DA and NE neurons change our thinking of catecholamine influences in cognitive functions in the brain, especially functions related to PFC. We review how the projection-target specific populations of neurons in these two systems execute their functions in both normal and abnormal conditions. Additionally, we explore what open questions remain and suggest where future research needs to move in order to provide a novel insight into the cause of neuropsychiatric disorders related to DA and NE systems.

**Keywords: catecholamine, dopamine, norepinephrine, prefrontal cortex, executive function**

## **INTRODUCTION**

The prefrontal cortex (PFC) is involved in a number of cognitive and executive functions in both primates and rodents, including working memory and sustained and flexible attention (Dalley et al., 2004; Arnsten, 2009; Bari and Robbins, 2013), and is therefore critical in guiding behavior in a complex and dynamic world. Importantly, PFC is innervated and strongly modulated by a number of anatomically and neurochemically distinct pathways. Of particular interest are the afferent fibers arising in the dopaminergic ventral tegmental area (VTA) and noradrenergic locus coeruleus (LC). The anatomical characteristics of these two catecholamine nuclei, as well as the cellular, physiological, and behavioral consequences of their activation, have been well characterized and reviewed in the past [dopamine (DA): Seamans and Yang, 2004; Bjorklund and Dunnett, 2007; Grace et al., 2007; Schultz, 2007; Bromberg-Martin et al., 2010; Ungless and Grace, 2012; Roeper, 2013; norepinephrine (NE): Dahlstroem and Fuxe, 1964; Morrison et al., 1978; Grzanna and Molliver, 1980; Swanson, 1982; Berridge and Waterhouse, 2003; Devilbiss and Berridge, 2006; Devilbiss et al., 2006; Arnsten, 2007; Chandler and Waterhouse, 2012; Chandler et al., 2013]. It is important to note that these two systems vary to a degree between rodents and primates. In particular, DA fibers in the primate PFC are known to arise from both the substantia nigra and VTA (Porrino and Goldman-Rakic, 1982; Haber and Fudge, 1997). In addition, in contrast to the popular view that DA-containing fibers project selectively to PFC in primates, the heaviest cortical DA projection actually terminates in motor and premotor cortices in the primate brain, while there seems to be a preferential DA projection to frontal and temporal areas in the rat with a minimal contribution to primary sensory and motor areas (Lewis et al., 1987; Berger et al., 1991). Furthermore, the distribution of DA-containing fibers among the cortical layers differs between species such that in primates, layer I is most densely innervated throughout the majority of the cortical mantle, whereas layers I through III are most densely innervated in the rat, and this occurs preferentially in cingulate and entorhinal cortices (Berger et al., 1991). Despite the inter-species differences in DA projections to cortex (for a more detailed review, see Berger et al., 1991; Haber and Fudge, 1997), we will focus on recent findings describing the functional organization and neuronal diversity within VTA and LC and how these attributes relate to the execution of distinct behaviors maintained by prefrontal and non-prefrontal neural circuits. We will also consider how these two systems act synergistically within their terminal fields to mutually guide several aspects of complex behaviors. Finally, although there have been many more recent breakthroughs in understanding dopaminergic neuromodulation of prefrontal circuits, we will discuss how these advances can serve as a guide to similarly transform our thinking about the LC-PFC pathway.

## **DIVERSE FUNCTIONS AND PROPERTIES OF VTA DA NEURONS**

It is well established that the VTA includes both DA and non-DA neurons which project heavily to both PFC and the nucleus accumbens (NAc; Swanson, 1982; Lammel et al., 2008). This projection system has been strongly linked to normal cognitive function and motivated behavior, as well as pathological deviations in these operations such as schizophrenia, attention deficit hyperactivity disorder (ADHD) and addiction (Goldman-Rakic, 1994; Goldman-Rakic and Selemon, 1997; Volkow et al., 2011, 2012). In both rodents and primates, the actions of prefrontal cortical DA are known to vary according to an "inverted U" dose response function, such that too little or too much DA impairs PFC network functions and working memory task performance (Arnsten and Goldman-Rakic, 1998; Arnsten and Li, 2005; Robbins and Arnsten, 2009). It is also known that the firing properties of VTA dopaminergic neurons are plastic such that they are capable of remaining in a silent hyperpolarized state, maintaining irregular tonic discharge, and firing phasically in response to environmental stimuli under different behavioral conditions. However, because DA seems to execute distinct operations in different terminal fields (i.e., reward and reinforcement in NAc and enhancement of working memory in PFC), it raises the question of whether or not the cells which provide DAergic innervations to these regions are anatomically distinct from one another, and whether or not these cells can be differentially activated under different circumstances. Indeed, previous studies have provided evidence for functional specialization of mesocortical DA neurons (Bannon et al., 1982; Chiodo et al., 1984). This issue was recently further illustrated and detailed by Lammel et al. (2008) who found that in rodent, PFC and NAc are in fact innervated by distinct subsets of VTA neurons, and that these cells are physiologically and phenotypically distinct from one another. 
Specifically, the neurons that project to NAc were found to discharge slowly and have their firing rate suppressed by application of DA, whereas those that project to PFC discharged more rapidly and did not respond to DA application. These discharge properties can be explained by the fact that PFC projection cells lack mRNA coding for the DA D2 autoreceptor, which inhibits firing of DA neurons. Taken together, these findings suggest that NAc and PFC, which are engaged in unique aspects of motivated behavior, receive input from anatomically distinct subsets of DA-containing neurons whose firing patterns appear to be under differential control.

A follow-up to this study showed that each of these subsets of DA-containing neurons in VTA is likewise unique in its afferent regulation, and that these distinct circuits support different types of behaviors. Specifically, it was shown that afferents from the laterodorsal tegmental nucleus innervate dopaminergic VTA neurons which in turn project to NAc and elicit reward, whereas afferents from the lateral habenula (LHb) synapse on dopaminergic VTA neurons that innervate mPFC and drive aversion (**Figure 1**). The conclusion from this work is that the VTA is comprised of neurochemically similar but anatomically and functionally distinct neurons that mediate discrete aspects of motivated behaviors (Lammel et al., 2012). It is interesting to note that activation of dopaminergic neurons with projections to mPFC results in conditioned place aversion, whereas a much greater body of literature suggests that DA in the PFC plays an important role in electrophysiological and behavioral indices of working memory (Goldman-Rakic, 1994; Stevens et al., 1998; Goldman-Rakic et al., 2004; Arnsten and Li, 2005; Arnsten, 2007; Driesen et al., 2008).

These dual roles for DA in the PFC could potentially be explained by the existence of anatomically and functionally discrete subsets of VTA DA neurons that innervate different cortical layers: for example, DA neurons involved in working memory may project primarily to cortical layers that interact with primary sensory cortices to facilitate the transmission of sensory information between regions so that the representation of a stimulus can be maintained even in its absence. Aversion and emotional operations maintained by DA, on the other hand, may involve the activation of DA neurons that innervate cortical layers which maintain connections with limbic structures rather than sensory structures. In this way, activation of these two pathways could result in DA release and modulation of functionally distinct prefrontal microcircuits that mediate unique operations and behaviors. Alternatively, these unique functions could be attributed to a common pool of VTA neurons that do not selectively target functionally distinct cortical layers but, depending on their pattern and level of activation, engage different receptor subtypes to elicit distinct circuit properties. For example, during modest levels of VTA output, such as in response to salient stimuli, the D2 receptor will be activated due to its higher affinity for the transmitter. Then, during elevated levels of VTA activation, such as during periods of stress, the lower affinity D1 receptor becomes engaged. Thus, because of different receptor affinities and post-synaptic actions, DA release would produce different effects on cellular physiology and PFC circuit properties (Arnsten, 2007, 2009).
Based on the inverted-U dose response function for DA actions, and the differential roles of its receptors in working memory functions, modest DA release in response to a salient stimulus is likely to strengthen measures of working memory for that stimulus, whereas excessive activation of the DA D1 receptor impairs behavioral indices of working memory (Seamans and Yang, 2004; Arnsten, 2009). During such periods when PFC is inhibited, emotional centers such as amygdala may instead take over and drive more survivalist "fight or flight" behaviors (Arnsten, 2009). In such an organization, the aversion described by Lammel et al. (2012) may have been reflective of hyperdopaminergic tone at the upper limit of the physiologic range in PFC as a result of optogenetic stimulation, thereby limiting prefrontal operations and allowing other limbic circuits to guide such a specific behavior instead. Interestingly, Bromberg-Martin et al. (2010) have hypothesized that in the primate brain, DA cells arising from substantia nigra and VTA differentially innervate orbitofrontal cortex and dorsolateral PFC to convey value and salience, respectively, to these structures. This proposal fits well with our central hypothesis that specific subpopulations of neurons arising from the midbrain and hindbrain nuclei are capable of executing unique actions in distinct terminal fields.
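The receptor-affinity argument made above for a common pool of VTA neurons can be made concrete with a simple one-site binding sketch, in which fractional occupancy is [DA] / ([DA] + K<sub>d</sub>). The dissociation constants and DA concentrations below are hypothetical round numbers chosen only to encode the stated assumption that D2 receptors have higher affinity than D1 receptors; they are not measured values from the cited work.

```python
def occupancy(da_conc_nM, kd_nM):
    """Fraction of receptors bound under simple one-site binding."""
    return da_conc_nM / (da_conc_nM + kd_nM)


# Hypothetical dissociation constants: lower Kd means higher affinity.
KD_D2_NM = 10.0     # assumed high-affinity D2 receptor
KD_D1_NM = 1000.0   # assumed low-affinity D1 receptor

# Hypothetical DA concentrations for modest versus elevated VTA output.
for label, da in [("modest release", 20.0), ("elevated release", 2000.0)]:
    d2 = occupancy(da, KD_D2_NM)
    d1 = occupancy(da, KD_D1_NM)
    print(f"{label}: D2R occupancy {d2:.2f}, D1R occupancy {d1:.2f}")
```

Under these assumptions, modest release occupies mostly D2 receptors, while D1 receptors become substantially occupied only at the higher concentration, which is the pattern the affinity-based account requires.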

## **DIVERSITY OF NE NEURONS IN THE LC NUCLEUS**

Prefrontal circuits and operations are also subject to regulation by output from the LC-noradrenergic system. Like DA, the actions of NE vary according to an inverted-U dose response function such that too little or too much noradrenergic transmission yields a less than optimal neuronal response to sensory stimuli (Berridge and Waterhouse, 2003; Devilbiss and Waterhouse, 2004; Devilbiss et al., 2012). Importantly, the pattern of LC activation correlates highly with behavioral state in both primates and rodents such that during periods of fatigue, LC discharge is absent or slow. During periods of active waking and in conjunction with behavioral tasks that are cognitively demanding, the LC discharges faster with phasic bursts in response to relevant stimuli. During periods of stress and agitation, the nucleus discharges at a very high tonic rate and sensory-driven phasic responses are lost (Aston-Jones and Bloom, 1981a; Valentino and Foote, 1988; Aston-Jones et al., 1994; Berridge and Waterhouse, 2003; Aston-Jones and Cohen, 2005a,b). Likewise, too much NE in PFC synapses activates the α1 receptor, impairing PFC function in a manner similar to excessive activation of the D1 receptor (Arnsten and Dudley, 2005; Arnsten, 2007, 2009).

Interestingly, behavioral and electrophysiological studies of LC in both primate (Aston-Jones et al., 1994) and rodent (Bouret and Sara, 2004) have shown that LC is highly plastic in response to stimuli that drive its activation. Previous work had suggested a more simplistic role for the LC-NE system in arousal and the sleep-waking cycle. However, attended stimuli that predict reward have been found to elicit a robust phasic discharge of LC cells, while distracters of the same or different modality do not (Aston-Jones et al., 1994). Importantly, the response to a reward-predicting stimulus is rapidly lost and shifted to a new stimulus when the reward-contingency is changed (Foote et al., 1980; Aston-Jones et al., 1994; Rajkowski et al., 1994; Bouret and Sara, 2004; Aston-Jones and Cohen, 2005a). These data suggest that LC may have a more complex role in attention and cognition than simply serving as a generalized alerting or wake-promoting structure (Aston-Jones and Bloom, 1981a; Rajkowski et al., 1994; Berridge and Waterhouse, 2003; Berridge, 2008). Aston-Jones and Cohen, for example, have proposed that LC integrates goal-oriented sensory information from the PFC to shift the nucleus between tonic and stimulus-driven phasic modes of discharge. These tonic and phasic modes of discharge then sensitize terminal fields to detect non-specific and specific stimuli, respectively, thereby guiding labile versus sustained modes of attention (Aston-Jones and Cohen, 2005b).

Importantly, it has long been thought that LC is the sole source of NE to the neocortex (Loughlin et al., 1982, 1986a,b; Berridge and Waterhouse, 2003; Agster et al., 2013), and that its neurons project to their terminal fields indiscriminately; i.e., a single neuron is just as likely to innervate functionally dissimilar regions as those that have common function (Loughlin et al., 1982, 1986a,b). Recent behavioral evidence, however, seems to suggest that the LC-NE system exerts unique influences on operations in distinct prefrontal terminal fields. Specifically, in rodents, NE-specific lesions of mPFC impair extradimensional shifting, a behavior in which animals must reorient their attentional reserves to novel stimuli to obtain food reward, but not reversal learning, an OFC-dependent behavior in which animals must reorient attention to familiar but previously irrelevant stimuli (McGaughy et al., 2008b; Newman et al., 2008). On the basis of these findings and the observation that both behaviors are noradrenergically regulated (McGaughy et al., 2008a,b; Seu et al., 2009; Snyder et al., 2012), we postulated that OFC and mPFC must be innervated by distinct subsets of LC neurons: if both regions received input from a common pool of LC neurons, injection of 6-OHDA into mPFC would lead to the retrograde degeneration of the axons in mPFC, the cell bodies in LC, as well as anterograde degeneration of axon collaterals innervating OFC. Indeed, we have recently shown that these two regions, as well as anterior cingulate cortex, a third anatomically and functionally distinct prefrontal region, are in fact innervated by anatomically distinct subsets of LC neurons (Chandler and Waterhouse, 2012; Chandler et al., 2013). Additionally, another recent publication from our laboratory demonstrated that the density of noradrenergic release points is not uniform throughout the forebrain (Agster et al., 2013).
Specifically, NE varicosity density is significantly higher in PFC than in sensory, motor, and thalamic regions, further supporting the hypothesis that NE may have unique roles and execute distinct operations in functionally and anatomically disparate projection fields (**Figure 2**). These findings suggest that the LC-NE projection to PFC subregions may subserve distinct behavioral roles, similar to what is suggested by the organization of the mesolimbic and mesocortical dopaminergic pathways described by Lammel et al. (2008, 2012). It has also recently been demonstrated by Robertson et al. (2013) that, contrary to the long-held notion that LC is the sole source of NE-containing fibers to the forebrain in rodents, the insular cortex is innervated by non-LC derived NE terminals, i.e., sub-coeruleus, A1, and A2 cell groups (**Figure 2**). Such findings challenge the classical view that NE acts uniformly and synchronously within its terminal fields (Aston-Jones and Bloom, 1981a,b; Rajkowski et al., 1994). Specifically, NE release in insular cortex may be achieved through activation of LC, or by activation of the functionally and anatomically distinct sub-coeruleus, A1, or A2 cell groups. The different anatomical connectivities and physiological attributes of these various noradrenergic nuclei suggest that NE can be released into PFC under unique sensory or environmental circumstances. The finding that PFC is the only cortical structure in this study to be innervated by non-LC NE fibers suggests that the transmitter may maintain unique roles in prefrontal versus non-prefrontal cortical function. Because these non-LC noradrenergic cell groups receive sensory information from the viscera and are involved in homeostatic and interoceptive functions, they form an autonomic circuit and a direct route for the release of NE into prefrontal structures that affect vigilance and decision making. This pathway bypasses the LC and provides a means for asynchronous release of NE in the forebrain from multiple brainstem structures. Such an organization would therefore impose changes in prefrontal physiology without affecting properties of other terminal fields and argues that NE discretely modulates anatomically and functionally distinct terminal networks. Such a hypothesis could be tested by electrically or optogenetically stimulating these non-coerulear noradrenergic cell groups while sampling NE release in prefrontal versus non-prefrontal terminal fields by microdialysis or fast scan voltammetry.

#### **RECIPROCAL CONNECTIONS BETWEEN LC AND VTA**

Despite the heterogeneous and varied roles for DA and NE in prefrontal cortical function that have already been discussed, an added layer of complexity emerges when taking into account the reciprocal connections maintained between VTA and LC (El Mansari et al., 2010). It is reasonable to expect that as both of these pathways are activated in response to different behavioral circumstances, each will produce some effect on the other. This raises the question of whether these systems work cooperatively to produce behavioral modifications that require output from both systems, or whether they act competitively to drive distinct and opposite behavioral outcomes. It has been shown that electrical stimulation of LC results in an excitation followed by a brief inhibition of midbrain dopamine (DA) neurons through an α1-receptor-dependent mechanism (Grenhoff et al., 1993). Furthermore, lesions of LC have been shown to reduce basal and amphetamine-induced release of DA in the NAc (Lategan et al., 1990). Interestingly, anatomical evidence has shown that there is also a monosynaptic projection from VTA to LC (Beckstead et al., 1979), and that stimulation of VTA increases the concentration of NE metabolites in PFC (Deutch et al., 1986). Furthermore, previous studies indicated that both NE and DA provide essential modulatory influences on prefrontal functions (Mingote et al., 2004; Arnsten and Li, 2005; Aston-Jones and Cohen, 2005b; Morilak et al., 2005; Rossetti and Carboni, 2005; Drouin et al., 2006).

**FIGURE 2 | Distinct brain regions are differentially innervated by noradrenergic neurons in multiple brainstem nuclei.** Recent findings from our laboratory (Chandler and Waterhouse, 2012; Agster et al., 2013; Chandler et al., 2013) show that individual LC neurons innervate multiple functionally distinct cortical terminal fields, and that the highest density of NE varicosities in the brain occurs in PFC. A recent finding by Robertson et al. (2013) also challenged the longstanding notion that LC is the sole source of NE to cortex by demonstrating the existence of NE-containing fibers in insular cortex derived from a rhombomere distinct from that in which LC develops, suggesting that this region has privileged access to autonomic and visceral information while the rest of the cortical mantle does not.

How do these two systems coordinate their activities to appropriately regulate prefrontal functions, and what happens when this coordination becomes unbalanced? Essentially, how does one system affect the output of the other under normal conditions and in disease states? DA and NE are critical for maintaining normal, adaptive behaviors (Arnsten and Goldman-Rakic, 1998; Dalley et al., 2004; Aston-Jones and Cohen, 2005a; Arnsten, 2007; McGaughy et al., 2008a,b). Increasing or decreasing either transmitter severely limits exploratory behavior. VTA and LC neurons that release DA and NE, respectively, are both activated by salient stimuli, and the strength of activation appears to be related to the values of stimuli used for predicting future behavior (Horvitz, 2000; Stuber et al., 2008; Sara and Bouret, 2012). However, existing evidence suggests DA and NE may contribute to different functions, with DA being related to reward assessment and error prediction and NE being related to arousal and/or vigilance. This suggests that their roles in motivated behavior are segregated in that they reflect different influences of reward on behavior. It has been postulated that DA neurons are more sensitive to the incentive value of reward information, whereas NE neurons are more sensitive to the arousing aspects of reward information (Bouret et al., 2012). Similarly, during a working memory task, NE and DA systems contribute synergistically or complementarily to modulating the persistent activity needed for cue, delay, and response signaling within the PFC circuitry. Specifically, as others (Robbins and Arnsten, 2009; Arnsten, 2009) have proposed, with optimal levels of NE or DA release under alert, non-stressed conditions, PFC neurons fire during the delay period following cues for preferred but not non-preferred directions.
NE enhances delay-related firing in response to cues in preferred directions by stimulating α2A-receptors (increasing the "signal"), whereas DA weakens delay-related firing in response to cues in non-preferred directions by stimulating D1 receptors (decreasing the "noise"). This proposal is supported by the effects of administering appropriate concentrations of the α2A-receptor agonist guanfacine or the D1 receptor agonist SKF81297. In contrast, with high levels of NE and DA release, as would occur during stress, NE engages the lower-affinity α1-receptors and reduces mnemonic stimulus-evoked neuronal firing. Interestingly, the impact of adrenergic receptor activation in non-prefrontal cortical regions such as sensory and motor cortices seems to be opposite to that in prefrontal regions: α1-receptor activation increases neuronal responsiveness to sensory-driven inputs, whereas α2-receptor activation suppresses stimulus-evoked discharge (Arnsten, 2000, 2007, 2009). Similarly, high DA induces excessive D1 receptor stimulation and suppresses cell firing as well. Indeed, administration of the α1-receptor agonist phenylephrine (Mao et al., 1999) or a high concentration of SKF81297 (Williams and Goldman-Rakic, 1995) can mimic the effects of high NE and DA levels, respectively.
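The "signal" and "noise" account above can be made concrete with a small numerical sketch. The following Python snippet is purely illustrative and is not a model taken from the cited studies: the function names, the gain and suppression parameters, and the firing rates are all hypothetical values chosen only to show how boosting the preferred-direction response (attributed above to α2A-receptor stimulation) and suppressing the non-preferred response (attributed above to D1 receptor stimulation) would each sharpen the tuning of delay activity.

```python
def delay_firing(pref_rate, nonpref_rate, alpha2a_gain=1.0, d1_suppression=0.0):
    """Return (preferred, non-preferred) delay firing rates in spikes/s.

    alpha2a_gain    multiplies the preferred-direction response (the "signal").
    d1_suppression  is the fraction removed from the non-preferred response
                    (the "noise").
    Both parameters are hypothetical stand-ins, not measured quantities.
    """
    if not 0.0 <= d1_suppression <= 1.0:
        raise ValueError("d1_suppression must lie between 0 and 1")
    return pref_rate * alpha2a_gain, nonpref_rate * (1.0 - d1_suppression)


def selectivity_index(pref_rate, nonpref_rate):
    """Contrast between preferred and non-preferred firing (0 means untuned)."""
    return (pref_rate - nonpref_rate) / (pref_rate + nonpref_rate)


if __name__ == "__main__":
    # Illustrative baseline: 20 spikes/s (preferred) vs. 10 spikes/s (non-preferred).
    conditions = {
        "baseline": {},
        "NE via alpha2A (signal up)": {"alpha2a_gain": 1.5},
        "DA via D1 (noise down)": {"d1_suppression": 0.5},
        "both": {"alpha2a_gain": 1.5, "d1_suppression": 0.5},
    }
    for name, kwargs in conditions.items():
        pref, nonpref = delay_firing(20.0, 10.0, **kwargs)
        print(f"{name}: selectivity = {selectivity_index(pref, nonpref):.2f}")
```

In this toy setting either manipulation alone raises the selectivity index, and the two together raise it further, which is one simple way to read the proposal that the two catecholamines act in a complementary fashion at optimal levels.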

It is also important to recognize that DA and NE levels in PFC are constantly fluctuating as a function of arousal level and ongoing behavioral contingencies. As the relative levels of these transmitters in the extracellular space change, so too will their impact on cellular function. Importantly, the impact of these transmitter systems on post-synaptic cellular physiology is often characterized one at a time, i.e., the impact of DA or the impact of NE on specific parameters of neuronal or circuit function. However, under physiological conditions, it is likely that these two transmitters, as well as many other neuromodulatory agents and transmitter substances, interact simultaneously throughout the brain and spinal cord via activation of a number of membrane-bound receptors on neurons and glia. A first step in addressing the issue of neuromodulator interactions and influences on complex circuit functions would be to consider the net effects of simultaneous administration of two or more modulatory substances on synaptically driven discharge of target neurons. There is already strong evidence for synapse- and cell-type-specific modulation of local cortical circuitry in the PFC by both DA and NE (Gao et al., 2001, 2003; Gao and Goldman-Rakic, 2003; Wang et al., 2013). Thus, the PFC is a likely candidate for studies focused on the combined impact of DA and NE on transmission at single synapses and response properties of identified neurons.

## **SUMMARY**

Taken together the findings reviewed here suggest that both noradrenergic and dopaminergic nuclei contain heterogeneous sets of neurons whose properties vary according to terminal field projection targets, and that these two catecholamine pathways act synergistically or complementarily in order to affect executive function and motivated behaviors via connections with specified forebrain circuits as well as by maintaining reciprocal excitatory connections with one another. Because there exists a range of concentrations for both DA and NE in PFC at which behavior and cellular physiology are optimized, and too far below or above this range is detrimental to behavioral outcomes, it seems that these two systems are both required for the normal maintenance and execution of prefrontal operations. Likewise, because these two pathways are reciprocally excitatory, it is likely that activation of one pathway by external or internal stimuli recruits the other indirectly. Such an arrangement would benefit complex behaviors, i.e., a task requiring sustained attention is also dependent on motivational state. It may be the case that VTA efferents to NAc and PFC work in concert with LC inputs to PFC and primary sensory and motor cortical regions. For example, during a period of vigilance in a particular behavioral task, LC activation and NE release may optimize PFC and sensory cortical function with respect to signal to noise ratios of stimulus evoked pyramidal neuron responses, while DA release from VTA promotes a transient working memory association – mnemonic – of that stimulus. Together, these two transmitter systems work synergistically to allow the animal to selectively focus on and remember the relevance of a reward associated stimulus. 
Upon the successful execution of a behavioral trial and reward retrieval, VTA signals NAc to elicit reward, reinforcing the behavior and causing the animal to continue focusing on that specific stimulus to predict and retrieve the next reward. Thereafter, when a behavioral contingency is changed, the NAc signals VTA that an expected reward has not occurred. The reciprocal connections between VTA and LC may then alter their collective output in PFC, thereby decreasing the sensitivity of PFC and primary sensory networks to that specific stimulus by an NE-mediated decrease in signal-to-noise ratio, as well as a decrease in working memory for that stimulus. Consequently, in the absence of reinforcement and reward, the animal is able to sample alternative behavioral strategies through sensitization to previously irrelevant stimuli. Once a new strategy is identified, VTA signals NAc to promote reward, thereby shifting the reciprocal connections between LC and VTA back to a mode which favors sensory discrimination and working memory of the new reward-predictive stimulus (**Figure 3**). This is an intriguing possibility given that in the rodent, NAc and striatum seem to be largely devoid of LC-derived fibers, and primary sensory and motor cortical areas are not heavily innervated by DA fibers (Berger et al., 1991; Berridge and Waterhouse, 2003). Hence, VTA may preferentially modulate reward through its projections to NAc, LC may preferentially modulate sensory and motor processes through its projections to more posterior cortical areas, and these two catecholamine nuclei may work synergistically in PFC to affect attention, working memory, and cognitive functions that drive complex behavior (**Figure 4**).

Therefore, reciprocal connections between these two nuclei may be important for maintaining activity states in each nucleus that are sufficient for appropriately guiding ongoing behavior. In the absence of these reciprocal connections, the projection from VTA to NAc might be sufficient for keeping an animal motivated to perform a task or execute a specific behavior, but attention toward a specific stimulus used to guide that behavior may be minimal. Conversely, the projection from LC to PFC might be adequate for resolving specific stimuli, but insufficient for attending to them and achieving a desirable outcome in the absence of the motivational drive provided by the dopaminergic projection from VTA to NAc.

Additionally, as discussed earlier, the VTA maintains a projection to PFC which has been shown to promote aversion (Lammel et al., 2012) rather than motivation or reward. Interestingly, it is known that certain stressors elicit greater release and metabolism of DA in PFC than in other forebrain regions (Deutch and Roth, 1990), suggesting that mesocortical DA may play an integral role in the cognitive aspects of the stress response. Importantly, it is also known that high levels of DA and NE in PFC impair cognition, and that elevation of these catecholamines occurs during exposure to stressors. During stressor-induced activation of the LC (Valentino and Foote, 1987, 1988; Curtis et al., 1999; Berridge and Waterhouse, 2003; Berridge, 2008; Devilbiss et al., 2012), the VTA would be the target of increased noradrenergic transmission from the LC-VTA pathway, thereby providing a means for VTA to contribute to the expression of aversive behaviors much in the same way that LHb neurons influence VTA activity and DA release within the PFC (Lammel et al., 2012, 2013). Methods similar to those used by Lammel et al. (2012) could be employed to identify such functional connections between VTA and LC and to determine how the reciprocal connections between these two nuclei influence physiological properties, release, and consequently PFC-related cognitive function and behavior.

Importantly, these recent findings on the neurobiology of the VTA, as well as the recent identification of non-LC derived NE-containing terminals in insular cortex, represent a way forward for advancing the study of the LC-NE pathway. As this system has long been viewed as homogeneous, with fairly uniform, synchronous actions across its efferent domain and on behavior by way of a highly divergent network of axon collaterals, the demonstration that it is in fact more heterogeneous than previously recognized would transform the prevailing notions about the postulated contributions of the LC-NE system to forebrain operations. Importantly, we have recently provided anatomical evidence that LC neurons innervate their terminal fields on a functional rather than random basis (Chandler and Waterhouse, 2012; Chandler et al., 2013), and experiments are currently underway to test the hypothesis that cells with discrete terminal fields express different molecular profiles and unique physiological attributes. Such data would provide evidence that the LC efferent system is capable of differential release and asynchronous NE actions across its terminal fields in the same way that DA release is governed by specified VTA projection patterns. Additionally, the recent demonstration that certain regions of PFC are innervated by non-LC NE-containing fibers (Robertson et al., 2013) supports the view that NE maintains distinctive roles in prefrontal circuit operations as dictated by activation of source nuclei (sub-coeruleus, A1, A2) that give rise to NE-PFC projections. Such an organization would therefore prompt noradrenergic modulatory actions in prefrontal circuits without affecting other cortical regions, a mode of operation similar to that proposed for the VTA-DA system on the basis of its divergent mesocortical and mesolimbic projections.

Identification of specific afferents to LC cells with specified outputs, as has been shown in the DA system (Lammel et al., 2008, 2012), will further the collective understanding of the role of LC in maintaining discrete behavioral operations rather than acting as a homogeneous and uniform modulator of the activity in LC projection fields. Optogenetic approaches may provide a means of characterizing anatomically, neurochemically, and functionally specific pathways into and out of LC that maintain distinct roles, and of demonstrating that NE release is capable of producing unique actions in different terminal fields under diverse circumstances. Because the LC-NE and VTA-DA systems maintain reciprocal anatomical connections and appear to act synergistically and complementarily to guide behavior, advances in the study of one of these catecholamine pathways will by necessity impact study of the other. Going forward it will be important to consider the differences as well as the similarities between these two systems. Nevertheless, the results of recent studies of the VTA show that heterogeneity is quite apparent in the nucleus (Lammel et al., 2008, 2012), and our recent work on the anatomy of the LC-PFC projections shows that the LC is at least anatomically organized to allow for similar heterogeneity. As such, anatomical, molecular, and physiological heterogeneity in catecholamine nuclei may be a fundamental principle of their organization, and future studies of these structures and their efferent domains may provide a framework for better understanding acquired or genetically transmitted abnormalities of the VTA-DA and LC-NE systems that result in maladaptive behaviors, including those expressed in addiction, ADHD, schizophrenia, and post-traumatic stress disorder.


## **FUTURE PERSPECTIVE AND FUNCTIONAL IMPLICATIONS OF THE DIVERSITIES IN CATECHOLAMINERGIC INNERVATION OF PFC**

The diverse innervation of PFC by subsets of DA and NE neurons is certainly an important conceptual advance in our understanding of these two systems. But several questions remain. How are these two systems affected when PFC function and structure are altered in response to genetic and epigenetic factors? How do disease states affect each of these systems and their interactions? Are all cells within these nuclei equally responsive to genetic and environmental insult, or is it possible that cells with different terminal fields are differentially susceptible to certain forms of stressors? For example, evidence suggests that in Alzheimer's and Parkinson's diseases, LC neurons degenerate selectively (Gesi et al., 2000; Grimm et al., 2004; Weinshenker, 2008; Szot et al., 2010; McMillan et al., 2011; Miguelez et al., 2011). It may be that such degeneration targets LC-PFC projection neurons specifically and that this selective degeneration plays a role in the cognitive decline associated with these diseases. Further exploration of the properties of specified groups of LC-cortical projection neurons could help determine the susceptibility of these organizations to pharmacological, environmental, or genetic insults that manifest in symptoms of neuropsychiatric or neurodegenerative disease associated with noradrenergic function. Similarly, it remains to be determined whether PFC projection neurons in the LC are more sensitive to stressors or the actions of psychostimulant drugs (e.g., methylphenidate) as compared to LC cells with different efferent domains. Furthermore, based on the published data on VTA neurons, we expect that subtypes of LC neurons with unique profiles and terminal field projection patterns receive different sets of afferent inputs, e.g., GABAergic versus glutamatergic, cortical versus subcortical, as well as dopaminergic, serotoninergic, or cholinergic afferents. Answers to these questions will provide novel insights into the operation of these systems and their collective impact on adaptive and maladaptive behavior.

heavily to the entire cortical mantle, including PFC and primary sensory and motor areas, but not to the striatum or NAc (Berridge and Waterhouse, 2003). VTA on the other hand innervates NAc and PFC, but provides only sparse innervation to more posterior cortical areas (Berger et al., 1991). Therefore, during periods of arousal and vigilance, when LC and VTA discharge is

beneficial during behavioral tasks which require sustained attention, as DA in NAc (green) will facilitate reward, NE in cortex (red) will alter the signal-to-noise ratio of pyramidal neurons to optimize them to specific stimuli, and both catecholamines in PFC (yellow) will work synergistically to facilitate working memory and attention to relevant stimuli.

## **ACKNOWLEDGMENTS**

This study was supported by NIDA DA017960 to Barry D. Waterhouse, Drexel HCEP fund to Daniel J. Chandler, Drexel Cure grant 002766-002 to Wen-Jun Gao and Barry D. Waterhouse, and NIH R01MH085666 to Wen-Jun Gao.

#### **REFERENCES**


major depressive disorder. *CNS Neurosci. Ther.* 16, e1–e17. doi: 10.1111/j.1755-5949.2010.00146.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 January 2014; accepted: 27 April 2014; published online: 21 May 2014. Citation: Chandler DJ, Waterhouse BD and Gao W-J (2014) New perspectives on catecholaminergic regulation of executive circuits: evidence for independent modulation of prefrontal functions by midbrain dopaminergic and noradrenergic neurons. Front. Neural Circuits 8:53. doi: 10.3389/fncir.2014.00053*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Chandler, Waterhouse and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The role of prefrontal catecholamines in attention and working memory

## *Kelsey L. Clark and Behrad Noudoost\**

*Department of Cell Biology and Neuroscience, Montana State University, Bozeman, MT, USA*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*Albert Compte, IDIBAPS (Institut d'Investigacions Biomèdiques August Pi i Sunyer), Spain Christos Constantinidis, Wake Forest University, USA Simon Jacob, Charité - Universitätsmedizin Berlin, Germany*

#### *\*Correspondence:*

*Behrad Noudoost, Department of Cell Biology and Neuroscience, Montana State University, Room 01 Lewis Hall, Bozeman, MT 59717, USA e-mail: bnoudoost@montana.edu*

While much progress has been made in identifying the brain regions and neurochemical systems involved in the cognitive processes disrupted in mental illnesses, to date, the level of detail at which neurobiologists can describe the chain of events giving rise to cognitive functions is very rudimentary. Much of the intense interest in understanding cognitive functions is motivated by the hope that it might be possible to understand these complex functions at the level of neurons and neural circuits. Here, we review the current state of the literature regarding how modulations in catecholamine levels within the prefrontal cortex (PFC) alter the neuronal and behavioral correlates of cognitive functions, particularly attention and working memory.

**Keywords: dopamine, reward, top-down control, pathophysiology, frontal eye field, V4, extrastriate cortex**

## **INTRODUCTION**

Attention, working memory, impulse control, and other "top-down" cognitive functions have long been known to depend on the prefrontal cortex (PFC) (Ghent et al., 1962; Chao and Knight, 1998; D'Esposito and Postle, 1999). Many of these cognitive functions are disrupted in mental disorders such as attention deficit hyperactivity disorder (ADHD), Parkinson's disease, and schizophrenia. Studies in human and non-human primates have implicated prefrontal catecholamines in control of cognitive functions. Notably, drugs altering catecholamine signaling have been used to treat the symptoms of some of these mental illnesses. Consequently, an imbalance in prefrontal catecholamines has long been a suspected cause of the cognitive component of these mental illnesses. Our goal is to review studies examining the contribution of prefrontal catecholamines to cognitive tasks and their dysfunction. Due to known differences between rodents and primates (Berger et al., 1991), this review will be focused on studies in human and non-human primates. Among catecholamines, the main focus will be on dopamine (DA); however, the role of norepinephrine (NE) will also be briefly addressed. We survey the evidence implicating prefrontal catecholamines as the neurochemical mediator of the neural and behavioral signatures of attention and working memory, and link these neurobiological findings to the etiology and treatment of cognitive impairments in mental disorders.

### **EFFECTS OF DA WITHIN PFC**

The importance of prefrontal DA in delayed-response tasks was established very early on (Brozoski et al., 1979), and much work has since gone into unraveling the details of this dependence (see **Table 1**). The PFC receives DA-ergic projections from both the ventral tegmental area (VTA) and the substantia nigra (Porrino and Goldman-Rakic, 1982; Levitt et al., 1984; Goldman-Rakic et al., 1992). DA neurons in the VTA and substantia nigra exhibit both tonic activity and phasic responses associated with the expectation of reward (Schultz et al., 1993) or reward prediction errors (Schultz, 1998). While DA neurons are activated by the spatial cue in the working memory tasks discussed below (since it signals the availability of a reward in the near future), this activation differs from that observed in PFC itself in that it does not reflect the cue position, nor does it continue throughout the delay period (Schultz et al., 1993). Therefore, the incoming DA-ergic input to PFC does not directly encode the remembered stimulus, but could potentially serve to "tune" the prefrontal network for optimal activity.

In order to understand the effects of prefrontal DA release on neural activity, first let us consider DA receptors and the anatomy of DA-ergic terminals in PFC. DA receptors are G-protein-coupled receptors, modulating neuronal activity via intracellular signaling cascades rather than directly inducing either excitatory or inhibitory postsynaptic currents (Yang and Seamans, 1996; Lachowicz and Sibley, 1997; Missale et al., 1998). The five types of DA receptor are commonly divided into two classes: the D1 family (comprising D1 and D5 receptors) and the D2 family (D2, D3, and D4 receptors) (Missale et al., 1998; Seamans and Yang, 2004). Expression of D1 receptors (D1Rs) is enriched in the PFC of both primates and rodents, suggesting an important role in specifically prefrontal circuit functions (Lidow et al., 1991; Goldman-Rakic et al., 1992). Within PFC, D1Rs are expressed in both superficial and deep cortical layers, while expression of the less abundant D2Rs is limited to the infragranular layers (Lidow et al., 1991). Although this bilaminar distribution pattern is evident at birth, layer three undergoes a dramatic post-natal increase in the density of DA innervation, which is then subject to layer-specific remodeling and decreases in DA axon density during adolescence (Lewis and Harris, 1991; Rosenberg and Lewis, 1995; Lewis, 1997); during this period performance on delayed response tasks improves and becomes more dependent on the PFC (Alexander and Goldman, 1978). Goldman-Rakic and colleagues used immunohistological staining, Golgi impregnation and electron microscopy to examine DA-ergic synapses in the PFC. They found that DA-ergic boutons were part of synaptic triads, in which the DA-positive bouton formed a symmetric synapse, while an unlabeled asymmetric synapse (of the type associated with excitatory inputs) contacted the same dendritic spine (**Figure 1A**). Many of the postsynaptic neurons appear to be pyramidal cells. However, targets of DA-ergic projections include both pyramidal cells and fast-spiking interneurons (Goldman-Rakic et al., 1989; Sesack et al., 1995, 1998). D1Rs can also be located outside of synapses (Smiley et al., 1994), suggesting at least some slow timecourse effects as DA diffuses to these more remote sites of action. Some DA axonal varicosities also appear to be localized outside of synaptic specializations (Smiley and Goldman-Rakic, 1993), and may contribute to extrasynaptic "volume transmission" effects (Zoli et al., 1998). This anatomy seems conducive to dopamine playing a modulatory role, regulating the efficacy or strength of prefrontal signals originating elsewhere.

#### **Table 1 | Studies examining the contribution of prefrontal catecholamines to the behavioral and neural correlates of working memory in non-human primates.**

*Studies are divided by neuromodulator (dopamine or norepinephrine) and specific receptor (D1R, D2R, alpha-2A) where applicable. Abbreviations: MPTP, 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine, which kills dopaminergic neurons in the substantia nigra; 5-OHDA, 5-hydroxydopamine, which selectively kills dopaminergic and noradrenergic neurons; FEF, Frontal Eye Field; dlPFC, dorsolateral prefrontal cortex; vlPFC, ventrolateral prefrontal cortex; SNR, signal-to-noise ratio; MGS, memory-guided saccade.*

The role of dopamine in modulating glutamatergic activity, suggested by the presence of synaptic triads, has been directly tested in slices using dual whole-cell patch clamp recordings, examining the effect of DA application on synaptic transmission between neurons. These experiments revealed that DA reduces the reliability of excitatory neurotransmission by reducing the probability of glutamate release presynaptically (Gao et al., 2001). Consistent with this finding, the consequences of this reduced reliability of synaptic transmission can be read out in the synchronous activity of neighboring prefrontal neurons *in vivo* (**Figure 1B**): iontophoretic application of a D1R antagonist introduces a peak in the cross-correlogram between prefrontal pyramidal neurons, as the reliability of transmission from mutual inputs increases (Castner and Williams, 2007). Conversely, iontophoretic application of a D1R agonist decreases synaptic efficacy, disrupting common excitatory input and eliminating existing cross-correlogram peaks for neighboring prefrontal neurons. Interestingly, DA may have different direct effects on excitatory and inhibitory neurons within PFC (Gao and Goldman-Rakic, 2003; Jacob et al., 2013). Using iontophoresis to examine the effects of DA on prefrontal visual responses during a visual detection task, Jacob and colleagues found two distinct types of DA-ergic modulation (Jacob et al., 2013). One group of PFC neurons, which included all the modulated narrow-spiking, putatively inhibitory neurons, was inhibited by DA; these showed short onset latency of DA effects (∼10 ms), with no change in signal-to-noise ratio (SNR) or inter-trial variability. A second set of prefrontal neurons was excited by DA application, displaying an increase in SNR and decrease in inter-trial variability; this effect was slower (∼200 ms) and observed only in broad-spiking, putatively pyramidal neurons.
These direct effects of DA on the excitability of individual neurons of different types will then interact at the population level—for example, the activity of inhibitory neurons helps shape the tuning of excitatory neurons during working memory (Rao et al., 2000; Constantinidis et al., 2002).
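To illustrate why a change in the reliability of transmission from shared inputs should appear as a change in the cross-correlogram peak, the following Python sketch simulates two neurons driven by a common input. It is a hedged toy simulation, not the analysis used in the cited recordings: the Bernoulli spiking model, the `reliability` parameter (the probability that a common-input spike drives each neuron), and all rates are assumptions introduced here for illustration only.

```python
import numpy as np


def simulate_pair(n_bins, p_common, reliability, p_background, rng):
    """Simulate two binary spike trains that share one common input.

    Each common-input spike is transmitted to each neuron independently with
    probability `reliability`; each neuron also fires independent background
    spikes with probability `p_background` per bin. All values are hypothetical.
    """
    common = rng.random(n_bins) < p_common

    def neuron():
        driven = common & (rng.random(n_bins) < reliability)
        background = rng.random(n_bins) < p_background
        return (driven | background).astype(int)

    return neuron(), neuron()


def cross_correlogram(a, b, max_lag):
    """Count coincident spikes of trains a and b at lags -max_lag..max_lag (in bins)."""
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    counts = []
    for lag in lags:
        if lag >= 0:
            counts.append(int(np.sum(a[: n - lag] * b[lag:])))
        else:
            counts.append(int(np.sum(a[-lag:] * b[: n + lag])))
    return lags, np.array(counts)


def zero_lag_excess(reliability, seed=0, n_bins=200_000, max_lag=20):
    """Zero-lag coincidence count minus the mean count at all non-zero lags."""
    rng = np.random.default_rng(seed)
    a, b = simulate_pair(n_bins, p_common=0.02, reliability=reliability,
                         p_background=0.02, rng=rng)
    lags, counts = cross_correlogram(a, b, max_lag)
    return counts[lags == 0][0] - counts[lags != 0].mean()


if __name__ == "__main__":
    for r in (0.8, 0.2):
        print(f"reliability={r}: zero-lag excess = {zero_lag_excess(r):.0f}")
```

In this sketch, lowering `reliability` shrinks the zero-lag peak toward the flat background, which is the qualitative pattern described above for D1R agonist application, while raising it restores the peak, as described for the D1R antagonist.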

In addition to modulating neural activity within PFC, DA also alters activity-dependent plasticity (Gurden et al., 2000; Pawlak and Kerr, 2008; Zhang et al., 2009). Such plasticity has long been the focus of study in relation to addiction; however, these changes may also play a more general role in associative learning. Recent work in rodent PFC suggests that both D1R and D2R signaling contribute to changes in plasticity, with D2R actions on inhibitory interneurons gating potentiation, and D1Rs on postsynaptic neurons controlling the size of the temporal window during which coincident spikes induce potentiation (Xu and Yao, 2010). These effects of DA on plasticity, and the known firing of DA inputs to PFC in response to prediction errors, suggest an influence of prefrontal DA on learning, and indeed a recent study reveals just such an effect (Puig and Miller, 2012). Local injection of a D1R antagonist into ventrolateral prefrontal cortex (vlPFC) was found to impair monkeys' acquisition of novel visuomotor associations, without impairing performance on familiar associations. This behavioral effect was accompanied by changes in prefrontal activity, again observed for novel but not familiar cues: selectivity of individual neurons for the upcoming motor response decreased, while synchronous discharge and low-frequency LFP power increased. Human experiments have also linked phasic activity in midbrain DA nuclei and PFC with context-dependent working memory performance (D'Ardenne et al., 2012). Thus, phasic discharge of prefrontal DA inputs may be particularly important during the learning of novel associations and tasks, or contextual switching between rules.

Williams and Goldman-Rakic used iontophoresis in a behaving monkey to extend these findings on the cellular effects of DA to its role in prefrontal circuits during a spatial working memory task (Williams and Goldman-Rakic, 1995).

Iontophoretic application of a D1R antagonist during spatial working memory selectively enhances delay-period activity representing the remembered location. The effect is dose-dependent, with enhanced delay activity for an intermediate level of D1R antagonist, but suppression of both delay and visual activity in the same cell when a greater concentration of the drug is applied.

This "inverted-U" dose-dependency, in which an intermediate level of DA signaling produces more selective memory activity, has also been observed for a D1R agonist: low doses suppressed only responses to non-preferred locations, enhancing the spatial tuning of delay activity, while higher doses suppressed activity altogether (Vijayraghavan et al., 2007). How to reconcile the apparently contradictory findings that a low dose of either a D1R agonist (Vijayraghavan et al., 2007) or a D1R antagonist (Williams and Goldman-Rakic, 1995) improves the selectivity of PFC delay activity? Presumably the answer lies in the original level of DA-ergic tone in the PFC neurons being studied (**Figure 1C**), although it remains unclear why these studies would have a systematic difference in baseline prefrontal DA-ergic stimulation. The known elevation of prefrontal DA by stress (Thierry et al., 1976; Roth et al., 1988; Abercrombie et al., 1989) further raises questions as to how the stresses affecting laboratory animals may impact their baseline DA-ergic tone, and thus the effects of pharmacological agents. Also note that the basis for improved delay selectivity appears to differ for the two agents: D1R antagonists may improve tuning by increasing the level of delay firing for the preferred location, while the D1R agonist selectively reduces firing for the non-preferred cue locations.

presumably by disrupting the efficacy of common input. In the bottom plot, the neurons did not show evidence of common input during the control

back toward optimal levels, improving performance; D1R antagonists will move DA levels further from optimal, impairing performance.

Several biologically-plausible neurocomputational models have been developed to incorporate DA or NE-ergic modulation of prefrontal activity (Servan-Schreiber et al., 1998; Durstewitz et al., 2000; Brunel and Wang, 2001; Chadderdon and Sporns, 2006; Eckhoff et al., 2009; Avery et al., 2013). One such model (Chadderdon and Sporns, 2006) of task-oriented behavioral selection incorporates such disparate brain regions as early visual areas, inferotemporal cortex, PFC, basal ganglia, and anterior cingulate cortex. At the heart of this model is a mechanism that simulates exogenously induced changes in prefrontal DA release, which is thought to underlie the updating and maintenance functions of working memory. More recently, Avery et al. (2013) constructed a model of PFC designed to capture the effects of signaling through both DA (D1) and NE (alpha-2A and alpha-1) receptors. Both of these models were able to reproduce the "inverted-U" effect of catecholamine signaling, with impaired working memory representations when levels were too high or too low. The former model incorporated changes in prefrontal DA levels over the course of a delayed match to sample task, using these changes to switch the prefrontal network between states of updating based on current inputs vs. maintaining previous inputs, while the latter instead examined the effects of tonic DA and NE tone on network behavior. The extent to which fluctuations in PFC DA levels during different task epochs occur or contribute to task performance remains experimentally unproven.
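
The "inverted-U" relationship that these models reproduce can be made concrete with a deliberately simple sketch. It is not taken from any of the cited models: it assumes a Gaussian-shaped dependence of delay-period selectivity on effective D1R stimulation, and the names `delay_selectivity` and `drug_effect`, the optimum of 1.0, and the width of 0.5 are hypothetical choices made only for illustration.

```python
import math

def delay_selectivity(da_level, optimum=1.0, width=0.5):
    """Hypothetical inverted-U: selectivity peaks when da_level equals optimum."""
    return math.exp(-((da_level - optimum) ** 2) / (2 * width ** 2))

def drug_effect(baseline, shift):
    """Change in selectivity when a drug shifts effective D1R stimulation.

    A positive shift stands in for an agonist, a negative shift for an
    antagonist (both are assumptions of this sketch).
    """
    return delay_selectivity(baseline + shift) - delay_selectivity(baseline)

# Assumed baselines: below the optimum, and above it (e.g. under stress).
for label, baseline in [("low baseline tone", 0.6), ("high baseline tone", 1.4)]:
    agonist = drug_effect(baseline, +0.2)
    antagonist = drug_effect(baseline, -0.2)
    print(f"{label}: agonist {agonist:+.3f}, antagonist {antagonist:+.3f}")

# A large agonist dose overshoots the optimum even from a low baseline.
print(f"low baseline, high agonist dose: {drug_effect(0.6, 1.5):+.3f}")
```

Under these assumptions, a small agonist-like shift improves selectivity only when baseline tone is below the optimum, and a small antagonist-like shift improves it only when baseline tone is above the optimum, which is one way to read the apparently contradictory agonist and antagonist findings discussed earlier.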

The effect of DA on the activity of prefrontal neurons is complicated, involving multiple mechanisms of direct and indirect action through D1Rs and D2Rs, affecting presynaptic release, NMDA, GABA, AMPA, Na<sup>+</sup>, Ca<sup>2+</sup>, and K<sup>+</sup> currents, among others (Seamans and Yang, 2004). Various studies have reported either primarily inhibitory (e.g., Pirot et al., 1992), excitatory (e.g., Henze et al., 2000), or heterogeneous (Jacob et al., 2013) effects of DA on PFC neurons. The main points we wish to emphasize here are that DA acts as a neuromodulator, altering the efficacy of synaptic input to prefrontal neurons, and that there is some optimal level of DA-ergic stimulation for a neuron to experience, with greater or lesser DA signaling leading to an erosion of task-related activity.

### **DOPAMINE, REWARD, AND VISUAL ATTENTION**

Given the known firing of prefrontal DA afferents in response to reward expectation (Schultz, 2013), and the ability of expected reward to modulate responses throughout the brain, we cannot discuss the role of DA in prefrontal control of cognitive functions without considering the effects of reward and to what extent they can be separated from the other roles of prefrontal DA signaling. In the following sections we first discuss the difficulties in parsing the behavioral effects and neural signatures of attention and reward. We review the known role of both prefrontal DA and reward in modulating responses in visual cortex, and the evidence for and against prefrontal neurons receiving DA input themselves representing reward value. The evidence suggests that prefrontal DA contributes to both representations of target value and to the behavioral and neural signatures of attention, although further studies will be needed to determine if DA's roles in these processes are dissociable.

#### **DISSOCIATING NEURAL SIGNATURES OF ATTENTION AND REWARD**

DA release is associated with reward cues or expectation (Schultz, 2002). The involvement of DA signaling in both attention and reward raises the question of how these mechanisms overlap or diverge. Indeed, many behavioral tasks manipulate attention or reward in such a way that these two properties cannot truly be distinguished from one another (Maunsell, 2004). Consider a typical study seeking to identify a neural correlate of reward size (**Figure 2A**). One stimulus is placed within the neuron's response field (RF), a second outside it; the relative size of the reward associated with the two locations is then varied, either in blocks or from trial to trial based on some cue. A neuron that displays greater activity when the high-reward stimulus appears in its RF is typically reported as encoding reward expectation or value. The same logic applies to reward probability, although this manipulation must be done in blocks. Now consider a typical "attention" task: multiple stimuli appear onscreen, and one of them must be monitored for a behavioral response—again, the selected location may be held constant over a block of trials or varied from trial to trial based on a cue. Sometimes the task occurs only or more frequently at the cued stimulus—in other cases the animal is explicitly trained not to respond to changes at the uncued location; in either version of the attention task, reward is exclusively or predominantly associated with the stimulus at the attended location, and yet in these studies a difference in firing rate is attributed to the locus of attention rather than an expected reward. Conversely, the "reward" activity we described in the previous experimental design could be attributed to attentional modulation, given that on a behavioral level the expectation of a reward attracts attention (Posner, 1980). Importantly, many areas reflecting attentional modulation in their neural activity also exhibit reward-dependent activity (**Figure 2B**). 
A recent study of the effects of reward on activity in primary visual cortex showed that the strength of reward-size modulation across cells was strongly correlated with their modulation by attention, suggesting that the neural sources of these effects may be overlapping, if not identical (Stănişor et al., 2013, discussed further in the next section). Since this critique originally appeared a decade ago, many studies that experimentally manipulate reward values acknowledge the potential confound, or even explicitly attribute their findings to attention (Kennerley and Wallis, 2009), but few attempt to dissociate the two processes. Even studies using paradigms designed to differentiate representations of reward from general behavioral salience (Leathers and Olson, 2012) have proven controversial (Leathers and Olson, 2013; Newsome et al., 2013), or generated results that suggest reward cues can drive attentional allocation in ways that prove detrimental to task performance (Peck et al., 2009).

#### **FIGURE 2 | Interactions between attention and reward. (A)** A schematic illustration of typical tasks used to study reward and attention, and how the differences in potential reward and neural activity are similar between the two paradigms. Consider two studies conducted in V1 (Stănişor et al., 2013 and McAdams and Reid, 2005). To study the effect of reward size in the Stănişor task (schematically illustrated in the top panel), two potential targets appear, with colors indicating different reward values. Neural activity recorded at this point in the task reflects the relative value of the target in the RF (higher activity when the RF target offered a greater reward than the non-RF target); a subsequent cue instructs the monkey which target to saccade to. In the McAdams and Reid attentional paradigm (bottom panel), a cue indicates which of two stimuli should be monitored for a change, which instructs an eye movement response to a separate location. Changes at the uncued location must be ignored, and will never lead to rewards. Neural activity is higher when the stimulus in the RF is cued. In both cases higher expected reward value for the stimulus in the RF is associated with greater neural activity. **(B)** An overview of brain areas in which neural activity reflecting both attentional modulation and reward value has been reported. Only a single study is cited for each area; reward studies are in gray, attention studies in black. Dotted outlines represent structures not located on the cortical surface, either within sulci or deeper within the brain. Abbreviations: PMC, premotor cortex; vlPFC, ventrolateral prefrontal cortex; dlPFC, dorsolateral prefrontal cortex; SC, superior colliculus; BG, basal ganglia; LIP, lateral intraparietal area; SEF, supplementary eye field; ACC, anterior cingulate cortex; FEF, frontal eye field.

#### **MODULATION OF VISUAL REPRESENTATIONS BY PFC DA AND REWARD**

Although many studies have examined the effect of DA-ergic agents on prefrontal activity, and prefrontal activity has long been believed to modulate responses in visual cortex during attention and working memory, until recently no one had directly examined the effect of locally manipulating prefrontal DA signaling on visual responses in other cortical areas. Noudoost and Moore (2011b) examined the long-range effects of altering prefrontal DA signaling on visual responses in extrastriate area V4. V4, like much of visual cortex, receives direct projections from the Frontal Eye Field (FEF) part of the PFC, an area strongly implicated in controlling spatial attention (Moore and Fallah, 2004; Armstrong et al., 2009; Clark et al., 2012), and it is believed that these projections may be the source of the changes in activity observed in V4 during the deployment of covert attention (Moore and Armstrong, 2003; Awh et al., 2006; Noudoost et al., 2010, 2014; Squire et al., 2013; Clark et al., 2014). Noudoost and Moore examined the effects of manipulating either D1Rs or D2Rs on V4 visual responses during a passive fixation task, and their effect on saccadic target selection in a free-choice task (Noudoost and Moore, 2011a,b). While both D1R and D2R manipulations increased the monkey's tendency to choose the saccade target in the affected region of space, biasing saccadic target selection, only D1Rs had an impact on V4 visual responses. Local injection of a D1R antagonist into the FEF enhanced the strength of visual signals in V4: response magnitude increased, orientation selectivity was enhanced, and trial-to-trial variability decreased (**Figure 3**). All of these changes are also observed in V4 when covert spatial attention is directed to the V4 neuron's RF (Moran and Desimone, 1985; McAdams and Maunsell, 2000; Reynolds et al., 2000; Mitchell et al., 2007). The reason for the differing effects of FEF D1R and D2R manipulations on V4 activity, but common effects on target selection, may lie in the patterns of receptor expression within the FEF. D1Rs are expressed in both the supragranular layers, which project to V4, and infragranular layers, which contain neurons projecting to motor areas such as the superior colliculus. In contrast, D2Rs are primarily expressed in the infragranular layers. This pattern of expression could account for both receptors influencing target selection, while only D1Rs alter V4 responses (Noudoost and Moore, 2011c).

Neurophysiological experiments in V1 have provided a direct comparison of the effects of attention and reward on visual cortical responses (Stănişor et al., 2013). Visual responses were shown to be modulated by the relative reward value of the RF stimulus; moreover, the magnitude of this modulation was strongly correlated with the strength of attentional modulation during a later time window in the same task, and the onset latencies of the two effects were indistinguishable. Like attentional modulation, the neural effects of reward value were dramatically enhanced in the presence of a second stimulus. Human fMRI experiments have also demonstrated a D1R-dependent reward modulation of visual cortical activity (Arsenault et al., 2013). These effects of reward on visual cortex may not be attributable to the PFC; they could result from a bottom-up influence of DA-ergic changes in LGN signaling (Zhao et al., 2002), or from direct DA release from midbrain projections (Lewis et al., 1987). However, several aspects of the findings make PFC a likely source of this reward-induced modulation: the strong correlation with attention in the Stănişor study, the presence of this modulation even in trials without a visual stimulus in the Arsenault study, the lower density of DA-ergic projections to visual cortex (Berger et al., 1988), and the proven ability of DA-ergic PFC activity to modulate representations in visual cortex (Noudoost and Moore, 2011b).

### **REPRESENTATION OF REWARD VALUE BY PFC NEURONS AND THE ROLE OF DA IN THIS REPRESENTATION**

Multiple studies have looked for representations of reward value in PFC. Leon and Shadlen (1999) examined the effect of centrally cued reward size on FEF and dlPFC responses during a memory-guided saccade task. They found an effect of reward size on responses in dlPFC, but not FEF; this reward-size-dependent dlPFC activity continued throughout the delay period. Interestingly, the presence of reward-size information in dlPFC responses was dependent on the simultaneous maintenance of a spatial memory: in a variant of the task in which the reward cue appeared before the spatial cue, no reward-size information was present until after the subsequent spatial cue appeared. However, findings by Ding and Hikosaka suggest that the FEF will also represent reward-size information under certain conditions: specifically when the reward is tied to a particular location (Ding and Hikosaka, 2006). Using an asymmetrically rewarded memory-guided saccade paradigm, in which the relative value of the two target locations varied between blocks of trials, they found that about 1/3 of FEF neurons were selective for the location of the larger reward during the cue period. This may reflect the stronger retinotopic organization of the FEF in comparison to dlPFC (Suzuki and Azuma, 1983; Bruce et al., 1985; Funahashi et al., 1989). Interestingly, this reward modulation did not persist into the delay period, precisely the time in which the dlPFC representation of reward was observed by Leon and Shadlen, and the period whose activity predicts an FEF neuron's ability to distinguish targets from distractors (Armstrong et al., 2009). This pattern of reward modulation contrasts starkly with the response properties of the DA neurons projecting to PFC, again emphasizing the role of DA-ergic activity as a modulator rather than a simple driver or inhibitor of prefrontal activity.

Spatially-specific representation of reward values in the FEF and rich DA-ergic inputs to this area raise the hypothesis that FEF DA could serve as a mechanism for reward-dependent selection of visual targets. Indeed, Soltani et al. pursued this idea and tested the behavioral effects of perturbing DA-ergic activity within the FEF of monkeys performing a saccadic choice task and simulated the effects using a biologically-plausible cortical network (Soltani et al., 2013). They found that manipulation of FEF activity either by blocking D1Rs or by stimulating D2Rs increased the tendency to choose targets in the RF of the affected site. These effects of DA manipulation could be described purely in terms of motor biases; however, DA manipulation also altered the influence of choice history, and hence reward history, on subsequent target choices. The effects of choice history were also differently altered by the two DA receptors: D1R manipulation decreased the tendency to repeat choices on subsequent trials, whereas the D2R manipulation increased that tendency. This altered impact of choice history indicates that manipulating FEF DA influences the value of saccadic targets based on prior reward experience. The network simulation results suggest that D1Rs influence target selection mainly through their effects on the strength of inputs to the FEF and on recurrent connectivity, whereas D2Rs influence the excitability of FEF output neurons. Altogether, these results reveal dissociable DA-ergic mechanisms influencing target selection in which D1Rs and D2Rs differentially alter saccadic target selection by virtue of their effects in different cortical layers (Noudoost and Moore, 2011c). The network model revealed that DA-ergic modulation of the afferents to the FEF could alter reward-dependent choice. 
Based on this model one might predict, for example, that after blocking D1Rs within the FEF, the form and time constant of reward integration would be altered such that the impact of previous rewards on current choices could be increased or decreased.
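
The prediction about the "form and time constant of reward integration" can be illustrated with a minimal sketch. This is not the cortical network model of Soltani et al. (2013): it assumes a simple exponential (leaky) integrator of past rewards feeding a logistic choice rule, and the names `integrate_rewards` and `p_choose_left`, the parameters `tau` and `beta`, and the reward sequences are all hypothetical.

```python
import math

def integrate_rewards(rewards, tau):
    """Exponentially weighted sum of past rewards (oldest first).

    The most recent trial has weight 1; a trial k steps back has weight
    exp(-k / tau), so tau sets how far back reward history reaches.
    """
    decay = math.exp(-1.0 / tau)
    value = 0.0
    for reward in rewards:
        value = decay * value + reward
    return value

def p_choose_left(left_rewards, right_rewards, tau, beta=1.0):
    """Logistic choice between two targets based on integrated reward."""
    diff = integrate_rewards(left_rewards, tau) - integrate_rewards(right_rewards, tau)
    return 1.0 / (1.0 + math.exp(-beta * diff))

# Assumed history: left target rewarded on three early trials,
# right target rewarded only on the most recent trial.
left_history = [1, 1, 1, 0, 0, 0]
right_history = [0, 0, 0, 0, 0, 1]

short_memory = p_choose_left(left_history, right_history, tau=1.0)
long_memory = p_choose_left(left_history, right_history, tau=10.0)
print(f"P(left) with short time constant: {short_memory:.2f}")
print(f"P(left) with long time constant:  {long_memory:.2f}")
```

With a short time constant the single recent reward dominates and the right target is favored; with a long time constant the earlier rewards still carry weight and the left target is favored. A change in the integration time constant of this kind is one hypothetical way in which the impact of previous rewards on current choices could be increased or decreased.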

DA is a neuromodulator known to play a crucial role in reward-dependent behavior. Prefrontal neurons, which receive rich DA-ergic input from areas representing expected rewards, play a pivotal role in top-down modulation of cortical activity. Prefrontal DA (Noudoost and Moore, 2011b) and reward (Stănişor et al., 2013) can both modulate representation of targets within visual areas, mimicking some of the signatures of top-down visual attention. The questions of whether manipulation of PFC DA changes reward-dependent behavior, the degree to which signatures of attention and reward expectation in visual areas are dissociable, and whether DA-mediated PFC activity is the link for established behavioral interactions between attention and reward, remain to be answered.

## **NOREPINEPHRINE**

DA is not the only neuromodulator whose levels are critical for prefrontal function during cognitive tasks: NE also appears to be crucial to normal PFC activity. The PFC receives NE input from the locus coeruleus (Porrino and Goldman-Rakic, 1982; Levitt et al., 1984). The tonic firing of locus coeruleus NE neurons reflects arousal state, with low rates during slow wave sleep or drowsiness, moderate rates during waking, and high rates in response to acute stress. They also display phasic firing in response to behaviorally relevant stimuli during normal waking, but this phasic firing can extend to irrelevant distractors during fatigue or stress (Aston-Jones et al., 1999). Like the DA projections described above, NE inputs to PFC show a bilaminar targeting pattern (Morrison et al., 1982; Levitt et al., 1984; Lewis and Morrison, 1989). NE binds to high affinity alpha-2 adrenoreceptors, and to lower affinity alpha-1 and beta receptors (Molinoff, 1984). Alpha-2 receptors are found on dendritic spines in the superficial layers of PFC; although they can function both pre- and post-synaptically, their postsynaptic activity appears to underlie the benefits of alpha-2A agonists on working memory and other cognitive tasks (Arnsten and Cai, 1993; Wang et al., 2007). Like dopamine, there appears to be an optimal, intermediate level of NE signaling in PFC. The higher levels of NE associated with stress may impair PFC function through actions at the lower affinity alpha-1 receptors in the superficial layers (Arnsten et al., 1999; Birnbaum et al., 1999; Mao et al., 1999), and beta receptors localized on dendritic spines in the intermediate layers (Aoki et al., 1998; Ramos et al., 2005). Intracellularly, the actions of D1 (Vijayraghavan et al., 2007), alpha-2A (Wang et al., 2007), and beta1 receptors may converge on the cAMP signaling pathway (Gamo and Arnsten, 2011). 
Studies of the contributions of prefrontal NE to cognitive function led to the development of alpha-2A agonist guanfacine as a treatment for ADHD (Hunt et al., 1995; Taylor and Russo, 2001; Biederman et al., 2008; Gamo and Arnsten, 2011).

#### **HUMAN STUDIES OF PFC CATECHOLAMINES IN NORMAL AND ABNORMAL COGNITIVE FUNCTION**

One of the reasons for focusing on prefrontal catecholamines, as opposed to, for example, prefrontal N-methyl-D-aspartate (NMDA) receptor or gamma-aminobutyric acid (GABA) signaling (the proper functioning of which is certainly also vital to working memory and other prefrontal functions), is that the catecholamine systems appear to be implicated in multiple disorders involving prefrontal dysfunction. Here we briefly canvass the literature linking prefrontal catecholamines to Parkinson's disease, schizophrenia, and ADHD, before turning to studies of their contribution to normal cognition in humans.

The loss of DA neurons in Parkinson's disease produces cognitive deficits in addition to the more outwardly apparent motor symptoms (Lees and Smith, 1983; Taylor et al., 1986; Morris et al., 1988; Owen et al., 1992, 1993; Postle et al., 1997). It seems likely that at least some of these cognitive effects are directly due to a loss of DA-ergic input to PFC, and can thus provide insight into the normal contribution of DA to these functions. Accordingly, multiple studies use the withdrawal of L-dopa or other dopaminergic medications in Parkinson's patients to evaluate the effect of reduced DA signaling on various cognitive tasks (**Table 2**). Results generally indicate impaired spatial working memory in the absence of sufficient DA (Lange et al., 1992; Mattay et al., 2002). They also confirm findings suggesting that increased prefrontal activity, measured with fMRI or blood flow, may reflect less efficient processing in these tasks, showing greater dlPFC activation in the hypo-DA-ergic state (Cools et al., 2002), and a correlation between increases in PFC activity and error rates on the working memory task (Mattay et al., 2002). Interestingly, in early Parkinson's disease patients DA loss is more pronounced in specific anatomical regions, with dramatic DA depletion in the putamen and dorsal caudate, while DA levels in the ventral striatum are relatively spared (Kish et al., 1988; Agid et al., 1993). These regions of the basal ganglia also differ in their prefrontal connectivity, the dorsal regions forming a circuit with dlPFC while the ventral striatum is connected to orbitofrontal cortex (Alexander et al., 1986). The consequences of this segregation and differential susceptibility to Parkinson's-induced DA losses can be seen in the effect of medication withdrawal on two tasks selected to differentially engage the dlPFC and the orbitofrontal cortex (Dias et al., 1996; Cools et al., 2001). 
Performance on task-set switching, which is thought to depend on dlPFC and parietal circuits, was impaired following medication withdrawal; in contrast, patients' performance on a reversal learning task, which depends upon orbitofrontal cortex, actually improved when off medication. This reinforces the notion of an optimal level of DA signaling: when disease-induced DA depletion affects circuits to different degrees, medication that increases DA globally and optimizes the level in one circuit may produce above-optimal levels in other areas, with corresponding behavioral deficits.

DA is also implicated in the etiology of schizophrenia (origins of this idea reviewed in Baumeister and Francis, 2002). Although the "dopamine hypothesis" of schizophrenia has existed for decades, development of theoretical frameworks to link the pharmacological and neurobiological findings to the phenomenology of the disorder is ongoing, for example the aberrant salience theory of psychosis (Kapur, 2003; Kapur et al., 2005). Clinically effective antipsychotics appear to primarily target the D2 receptor (Seeman and Lee, 1975), and hyperstimulation of subcortical D2Rs is still considered a likely cause of the positive symptoms of the disorder; in contrast, a cortical, and specifically prefrontal, DA deficit may contribute to the cognitive symptoms (Abi-Dargham, 2004; Guillin et al., 2007). Associations have been found between schizophrenia and genetic variations in DA receptors (Glatt et al., 2003; Jönsson et al., 2004), and the COMT gene discussed below (Egan et al., 2001, reviewed in Harrison and Weinberger, 2005). COMT genotype has also been associated with the ability of antipsychotics to improve working memory performance (Weickert et al., 2004). (However, DA is not the only neurochemical system genetically linked to schizophrenia—see Mowry and Gratten, 2013). Schizophrenic patients display deficits in working memory tasks (Park and Holzman, 1992; Fleming et al., 1995; Morice and Delahunty, 1996; Keefe et al., 1997), and neurocognitive deficits have been shown to predict clinical outcomes (Green, 1996). Patients also show abnormal, typically excessive, PFC activation during these tasks (Manoach et al., 1999; Callicott, 2000; Barch et al., 2001; Perlstein et al., 2001). 
The laminar distribution of DA-ergic innervation of PFC appears altered (Akil et al., 1999), and there is some evidence for changes in prefrontal D1R density (Abi-Dargham et al., 2002, 2012)—although the absence of such effects in postmortem studies may indicate that expression levels are normalized by medication (Laruelle et al., 1990; Meador-Woodruff et al., 1997).

ADHD is one of the most common psychiatric disorders, affecting ∼3–7% of the US population. Clinically, ADHD is characterized by inattention, impulsivity, and hyperactivity (American Psychiatric Association, 2013). In laboratory settings, ADHD patients' inattention and impulsivity lead to deficits in tasks measuring spatial attention (Friedman-Hill et al., 2010), working memory (Alderson et al., 2013), and oculomotor response inhibition (Rommelse et al., 2008; Goto et al., 2010). These cognitive tasks have long been linked to prefrontal function (D'Esposito and Postle, 1999; Miller, 2000). Given this link, it is unsurprising that patients with ADHD show structural and functional differences in prefrontal size, projection strength, resting connectivity, and activity during cognitive tasks (Seidman et al., 2005; Arnsten, 2006; Kieling et al., 2008). Several lines of evidence more specifically implicate prefrontal catecholamine function as an underlying cause and potential therapeutic target. Genetic linkage studies confirm potential contributions of both DA and NE to the disorder (reviewed in Gizer et al., 2009). Associated genes include DA receptors D1, D4, and D5 (Sunohara et al., 2000; Tahir et al., 2000; Kustanovich et al., 2004; Bobb et al., 2005; Mill et al., 2006; Wu et al., 2012), the DA transporter (DAT) (Durston et al., 2005; Mill et al., 2006), the NE transporter, the NE alpha-2A receptor (Xu et al., 2001; Roman et al., 2003), and DA beta-hydroxylase, an enzyme which converts DA to NE (Daly et al., 1999; Roman et al., 2002; Kopecková et al., 2006). Many of the medications currently prescribed to treat ADHD alter catecholamine transmission (Arnsten, 2009). Stimulants such as amphetamine, lisdexamfetamine, and methylphenidate block both DA and NE transporters. 
In rats, methylphenidate (Ritalin®) has been shown to increase DA and NE release, particularly in the PFC (Berridge et al., 2006), and improve performance on a delayed alternation task used to assess prefrontal function in rodents. These performance benefits were blocked by co-administration of either an alpha-2A or D1R antagonist, neither of which impaired performance in isolation, suggesting that both DA-ergic and noradrenergic signaling contribute to methylphenidate's cognitive effects (Arnsten and Dudley, 2005). Atomoxetine blocks the NE transporter, producing increases in both NE and DA in the PFC (Bymaster et al., 2002), while guanfacine is an alpha-2A receptor agonist.

Numerous studies have examined dopamine's contribution to cognitive performance by administering various DA agonists or antagonists to healthy volunteers (see **Table 2**). Unfortunately there is no D1R-selective drug available for use in humans; D1R effects have had to be inferred by comparing the effects of mixed agonists to those of D2R selective agents. A number of studies have reported the ability of DA-ergic drugs to alter performance on spatial working memory or delayed response tasks, although the studies' findings differ with respect to the relative contribution of D1Rs and D2Rs (Luciana et al., 1998; Müller et al., 1998) and whether the effects are limited to spatial working memory or apply to a broader range of memory and attention tasks (Luciana et al., 1998; Kimberg and D'Esposito, 2003). Some of this variability is probably attributable to an interaction between drug action and subjects' baseline DA-ergic tone (see discussion of the "inverted-U" action of DA above). Indeed, the action of these drugs in healthy volunteers has been shown to depend on their baseline working memory capacity (Kimberg and D'Esposito, 2003; Mattay et al., 2003). It may even depend on the subject's recent behavior: training on a working memory task, half an hour a day for 5 weeks, is sufficient to improve capacity measurements and decrease prefrontal D1R binding potential, suggesting DA receptor expression may be modulated by the demands of habitual tasks (McNab et al., 2009).

#### **Table 2 | Continued**

*Studies are grouped based on methodology: drug administration, PET, effects of genetic polymorphisms, and medication withdrawal in Parkinson's patients. Abbreviations and drug actions: DA, dopamine; bromocriptine, a D2 agonist; pergolide, an agonist for both D1 and D2 receptors; haloperidol, non-specific DA antagonist; methylphenidate, amphetamine, and dextroamphetamine: stimulants producing an increase in PFC DA and NE release; sulpiride, D2 antagonist; guanfacine, alpha-2A agonist; clonidine, alpha-2 agonist; SCH23390, D1 receptor antagonist; DAT1, dopamine transporter gene; COMT, catechol-O-methyltransferase gene; 5-HTT, serotonin transporter gene; DARPP-32, dopamine- and cAMP-regulated neuronal phosphoprotein gene; DRD2, dopamine receptor D2 gene; WM, working memory; PET, positron emission tomography; dlPFC, dorsolateral prefrontal cortex.*

Genetic polymorphisms related to DA processing or signaling have also been linked to cognitive phenotypes, in both neurotypical and patient populations. One of the most extensively studied is a polymorphism in the catechol-O-methyltransferase (COMT) gene. COMT is an enzyme that breaks down DA following synaptic release; its activity is especially important for determining DA levels in the PFC, which has comparatively few DATs (Gogos et al., 1998). A common polymorphism producing a valine-to-methionine substitution alters enzyme activity: the Val-allele has higher enzymatic activity, presumably reducing prefrontal DA levels, while the Met-allele has lower activity, theoretically resulting in higher basal DA (Chen et al., 2004); however these presumed effects of COMT genotype on basal PFC DA levels have never been directly verified in humans. It should also be noted that the effects of many DA-related polymorphisms on working memory may be mediated by the striatum in addition to the PFC (Cools et al., 2008). Met-allele homozygotes show lower prefrontal activity during an n-back working memory task than heterozygotes, who in turn have lower prefrontal activation than Val-allele homozygotes (Egan et al., 2001). Amphetamine, which like other stimulants causes release of DA and NE in PFC (Kuczenski and Segal, 1992; Moghaddam et al., 1993; Berridge et al., 2006; Narendran et al., 2014), reduces prefrontal activity during the 3-back task in Val homozygotes, while increasing prefrontal activity and impairing performance for Met homozygotes on the same task (Mattay et al., 2003). These results are consistent with an inverted-U relationship between prefrontal DA levels and function, where Val homozygotes have slightly sub-optimal basal DA levels due to their increased enzymatic breakdown of DA, while Met homozygotes have higher basal DA levels, such that the additional DA release following amphetamine administration is detrimental to PFC function. 
Interestingly, Val-allele homozygotes show more perseverative errors on the Wisconsin card-sorting task, but no overall differences in working memory performance or other cognitive measures (Egan et al., 2001; Mattay et al., 2003; Zilles et al., 2012); this absence of baseline differences in working memory based on COMT genotype suggests compensatory changes in other aspects of DA signaling (although see Goldberg et al., 2003). The effects of COMT genotype on prefrontal activity during working memory have been shown to interact additively with another polymorphism, a variable number tandem repeat polymorphism identified in the 3′ untranslated region of the DAT gene (Bertolino et al., 2006; Caldú et al., 2007).
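The inverted-U account above can be illustrated with a toy calculation. The sketch below is not a model taken from the cited studies: the Gaussian-shaped curve, the basal DA values assigned to each genotype, and the size of the amphetamine-induced increase are all hypothetical numbers, chosen only to show how the same increase in DA could benefit one genotype and impair the other.

```python
import math


def pfc_function(da_level, optimum=1.0, width=0.5):
    """Toy inverted-U: function peaks at `optimum` and declines on either side.

    The Gaussian shape and its parameters are illustrative assumptions.
    """
    return math.exp(-((da_level - optimum) ** 2) / (2 * width ** 2))


# Hypothetical basal prefrontal DA levels (arbitrary units).
# Val/Val is placed below the optimum, Met/Met slightly above it.
BASAL_DA = {"Val/Val": 0.7, "Met/Met": 1.1}

# Hypothetical increase in DA after amphetamine (arbitrary units).
AMPHETAMINE_BOOST = 0.4


def amphetamine_effect(genotype):
    """Change in toy PFC function when DA rises by AMPHETAMINE_BOOST."""
    base = BASAL_DA[genotype]
    return pfc_function(base + AMPHETAMINE_BOOST) - pfc_function(base)


if __name__ == "__main__":
    for genotype in BASAL_DA:
        print(genotype, round(amphetamine_effect(genotype), 3))
```

With these assumed values, the change is positive for Val/Val (DA moves toward the optimum) and negative for Met/Met (DA moves past it), which is the qualitative pattern the inverted-U hypothesis predicts.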

Performance in attention and working memory tasks is impaired in ADHD, Parkinson's disease, and schizophrenia, as well as under stress or in normal aging. Considering the evidence for a contribution of prefrontal catecholamines to these cognitive functions, imbalance in the prefrontal level of these neuromodulators has long been a suspected cause of the cognitive impairments observed in these disorders. More recently, genetic association studies have demonstrated links between prefrontal catecholamines and the etiology of these diseases, as well as how patients respond to treatment. Despite numerous studies examining the link between prefrontal DA or NE and cognitive function in these disorders, we are still far from treatments that fully restore cognitive function. This gap may be due partly to individual variation in the underlying pathology and partly to our incomplete understanding of the neural mechanisms underlying normal cognitive function. Even in cases where the mechanisms are well understood, clinically we lack the means to target specific anatomical or chemical subsets of neurons. However, basic research on the mechanisms of prefrontal function has produced some therapeutic advances, e.g., the introduction of guanfacine for the treatment of ADHD patients, and a more complete understanding of how prefrontal catecholamine signaling underlies cognition may produce further clinical applications.

#### **CONCLUSIONS AND FUTURE DIRECTIONS**

The link between prefrontal catecholamines and cognitive deficits in multiple neurological disorders makes understanding their role in prefrontal function particularly critical. While much progress has been made in elucidating the role of prefrontal catecholamines in cognitive function, crucial questions remain. Is the effect of prefrontal DA mediated entirely via reward expectation, or do basal PFC DA levels modulate working memory and attention performance in a manner dissociable from upcoming rewards? Do PFC DA levels fluctuate significantly over the course of attention and working memory tasks, and do these fast changes in DA signaling contribute to behavioral performance? Although the true "neural mechanism" of working memory maintenance or covert attentional deployment is the pattern of task-related neural activity, driven by spatially tuned glutamatergic and GABAergic responses, these population dynamics are enabled by appropriate DA and NE "tone" within these prefrontal circuits; whether more temporally or spatially localized changes in catecholamine signaling also contribute to task performance (Chadderdon and Sporns, 2006) remains uncertain. More reliable, temporally precise and continuous measures of local DA levels would be an important first step in addressing these questions.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 January 2014; accepted: 19 March 2014; published online: 08 April 2014. Citation: Clark KL and Noudoost B (2014) The role of prefrontal catecholamines in attention and working memory. Front. Neural Circuits 8:33. doi: 10.3389/fncir.2014.00033*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Clark and Noudoost. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## mRNA expression profile of serotonin receptor subtypes and distribution of serotonergic terminations in marmoset brain

## *Rammohan Shukla<sup>1,2</sup>, Akiya Watakabe<sup>1,2</sup> and Tetsuo Yamamori<sup>1,2</sup>\**

*<sup>1</sup> Division of Brain Biology, National Institute for Basic Biology, Okazaki, Japan*

*<sup>2</sup> Department of Basic Biology, Graduate University for Advanced Studies (SOKENDAI), Okazaki, Japan*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*Guadalupe Mengod, IIBB-CSIC-IDIBAPS-CIBERNED, Spain; Etienne Sibille, University of Pittsburgh, USA*

#### *\*Correspondence:*

*Tetsuo Yamamori, Division of Brain Biology, National Institute for Basic Biology, 38 Nishigonaka, Myodaiji, Okazaki, Aichi, 444-8585, Japan e-mail: yamamori@nibb.ac.jp*

To better understand serotonin function in the primate brain, we examined the mRNA expression patterns of all 13 members of the serotonin receptor (*5HTR*) family by *in situ* hybridization (ISH) and the distribution of serotonergic terminations by serotonin transporter (SERT) protein immunohistochemical analysis. Ten of the 13 *5HTR*s showed significant mRNA expression in the marmoset brain. Our study shows several new features of the organization of serotonergic systems in the marmoset brain. (1) The thalamus expressed only a limited number of receptor subtypes compared with the cortex, hippocampus, and other subcortical regions. (2) In the cortex, there are layer-selective and area-selective mRNA expressions of *5HTR*s. (3) Highly localized mRNA expressions of *5HT1F* and *5HT3A* were observed. (4) There was a conspicuous overlap of the mRNA expressions of receptor subtypes known to have somatodendritic localization of receptor proteins with dense serotonergic terminations in the visual cortex, the central lateral (CL) nucleus of the thalamus, the presubiculum, and the medial mammillary nucleus of the hypothalamus. This suggests a high correlation between serotonin availability and receptor expression at these locations. (5) The *5HTR*s show differences in mRNA expression pattern between the marmoset and mouse cortices, whereas the patterns of the two species were very similar in the hippocampus. We discuss the possible roles of *5HTR*s in the marmoset brain revealed by the analysis of their overall mRNA expression patterns.

#### **Keywords: mRNA expression, serotonin receptors, SERT, marmoset, mouse, comparison**

## **INTRODUCTION**

Serotonin is an important neurotransmitter with multiple neuromodulatory functions in the central nervous system (CNS) (Millan et al., 2008; Lesch and Waider, 2012). Its receptors consist of 13 genetically, pharmacologically, and functionally distinct subtypes belonging to seven subfamilies (Alexander et al., 2011). All serotonergic receptors (*5HTR*s) are metabotropic G protein-coupled receptors except for *5HT3A*, which is ionotropic. Serotonergic innervations in mammalian CNS originate from the median and dorsal raphe nuclei of the mesencephalon (Moore et al., 1978; Bowker et al., 1983). Previous studies demonstrate that the termination patterns in mammalian subcortical regions are very similar across species (for thalamus, see Lavoie and Parent, 1991; for basal ganglia, see Lavoie and Parent, 1990 and Wallman et al., 2011). The difference in serotonin-dependent modulation among species therefore depends largely on the receptor type present in each locus.

To date, the distribution of serotonin and its receptors has been examined by immunohistochemical analysis, receptor ligand autoradiography, and *in situ* hybridization (ISH) in rodents (Mengod et al., 1996), nonhuman primates (Lidow et al., 1989; Hornung et al., 1990; Wilson and Molliver, 1991), and humans (Burnet et al., 1995; Raghanti et al., 2008). The detailed mRNA expression profiles of all the serotonin receptor genes in mice (Lein et al., 2007) and for some brain areas in humans (Shen et al., 2012) are now publicly available in the Allen Brain Atlas (ABA) (ABA, 2009, 2012). Our previous study has shown that *5HT1B* and *5HT2A* are abundant in the visual cortex of macaque monkeys but not in rodents (Watakabe et al., 2009). This species difference demonstrates the importance of exploring the expression profiles of serotonin and its receptors in primates. In view of the heterogeneity of serotonin receptor subtypes, we wanted to obtain an integrated view of serotonergic modulation in primates by compiling the expression profiles of all the subtypes along with the termination pattern of serotonergic projections in the primate, which may contribute to an understanding of serotonin function in the primate brain.

For this purpose, we chose the common marmoset (*Callithrix jacchus*), a species of small New World monkey that has attracted the interest of many biomedical researchers because of its small size and ease of breeding (Mansfield, 2003). Moreover, the marmoset is the only nonhuman primate that can be used for generating germline-transmitted transgenic lines (Sasaki et al., 2009). In this study, we examined (1) the mRNA expression profiles of all the known serotonin receptor subtypes by ISH of *5HTR*s and (2) the serotonergic projection pattern by immunohistochemical analysis of the serotonin transporter (SERT) in various brain regions of the marmoset. Here, we discuss the differences and similarities in ISH patterns between selected mouse and marmoset brain areas and the publicly available human data set from the ABA (Shen et al., 2012).

Serotonergic terminations were particularly pronounced in the primary visual cortex (V1), the central lateral (CL) nucleus of the thalamus, the presubiculum, and the mammillary nucleus (MM) of the hypothalamus, where terminations overlapped with the abundant expressions of selected *5HTR* subtypes. Overall, when compared with mice, the serotonin receptor expression patterns in the marmoset brain were largely different in cortex but similar in hippocampus. The thalamus, which gates sensory information (Monckton and McCormick, 2002; Min, 2010), showed less receptor diversity than the cortex and hippocampus, which integrate sensory information.

## **MATERIALS AND METHODS**

### **ETHICS STATEMENT**

All the experiments were conducted in accordance with the guidelines of the National Institutes of Health, and the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, and were approved by the Animal Care and Use Committee in the National Institutes of Natural Sciences. We made all efforts to minimize the number of animals used and their suffering.

#### **EXPERIMENTAL ANIMAL, TISSUE PREPARATION, AND SECTIONING**

Five brains of the adult common marmoset (*Callithrix jacchus*) (two males: 2 years 6 months and 3 years 5 months; three females: 1 year 9 months, 2 years, and 2 years 1 month) were used for confirmation of the mRNA expression patterns and their reproducibility. To avoid any chance of ambiguity owing to technical issues, the data presented in this paper were collected from one female marmoset (6 years 2 months old). We observed no individual difference in mRNA expression patterns. For tissue fixation, the animal was deeply anesthetized with Nembutal (100 mg/kg body weight, intraperitoneally) and perfused intracardially with saline (0.9% NaCl) and then with 4% paraformaldehyde in 0.1 M phosphate buffer. The brains were post-fixed for 5 h at room temperature and then cryoprotected with 30% sucrose in 0.1 M phosphate buffer at 4°C. The two hemispheres were sectioned separately, and approximately 600 coronal sections of 40 μm thickness encompassing the regions from the frontal cortex to the tectum were prepared from each hemisphere. All 13 serotonin receptor genes (**Table 1**) were examined for their expression patterns using an ISH technique. Two sets of tissue sections were immunohistochemically stained for SERT and Nissl stained for laminar identification. For mice, data were collected from three male (46 weeks) and two female (42 and 35 weeks) B6 mice. The presubiculum, which showed expression of *5HT1F* (see Results), was best visualized in sagittal sections of the mouse brain; we therefore prepared sagittal sections of the mouse brain. Because the visual (VIS), somatosensory (SS), and somatomotor (MO) areas cover the major part of the mouse brain and have analogous areas in the marmoset brain, these areas were selected for comparison between the mouse and marmoset brains.

## **ISH**

Both the sense and antisense digoxigenin (DIG)-labeled riboprobes used in this study were prepared from plasmids containing PCR-amplified fragments of marmoset *5HTR*s, histidine decarboxylase (*HDC*) and *GAD67* genes. For *VgluT1*, riboprobes previously used for monkey ISH were used (Komatsu et al., 2005). To confirm the specificity of the antisense probes, the sense probes were used as the control in all the experiments. Details of the probes designed for the marmoset are shown in **Table 1** and those for the mouse are shown in Table S1. Single- and double-colored ISH were performed using the methods described in the papers of our group (Watakabe et al., 2007, 2009; Takaji et al., 2009). Briefly, free-floating sections were treated with proteinase K (5 μg/mL) for 30 min at 37°C, acetylated, then incubated in a hybridization buffer [5X SSC, 2% blocking reagent (Roche Diagnostics, Basel, Switzerland), 50% formamide, 0.1% N-lauroylsarcosine, 0.1% SDS] containing 0.5 μg/mL DIG-labeled riboprobes at 65°C for the *5HT3A* receptor gene and 60°C for the others. The sections were sequentially treated in 2X SSC/50% formamide/0.1% N-lauroylsarcosine for 15 min at 60°C twice, 30 min at 37°C in RNase buffer [10 mM Tris-HCl (pH 8.0), 1 mM ethylenediaminetetraacetic acid (EDTA), 500 mM NaCl] containing 20 μg/mL RNase A (Sigma Aldrich, Saint Louis, MO), 15 min at 37°C in 2X SSC/0.1% N-lauroylsarcosine twice, and 15 min at 37°C in 0.2X SSC/0.1% N-lauroylsarcosine twice. The hybridization probe was detected with an alkaline phosphatase-conjugated anti-DIG antibody using a DIG nucleic acid detection kit (Roche Diagnostics).
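The post-hybridization washes listed above for single-color ISH can be written down as a small schedule, which makes the sequence and total bench time easy to check. This is only a bookkeeping sketch: the `WashStep` class and the `total_minutes` helper are hypothetical names introduced for illustration, and the durations, temperatures, and repeat counts are transcribed from the paragraph above (the final step is labeled generically as a low-salt SSC wash).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashStep:
    """One post-hybridization wash (hypothetical helper type)."""
    solution: str
    minutes: int      # duration of a single wash
    temp_c: int       # temperature in degrees Celsius
    repeats: int = 1  # number of times the wash is performed


# Steps transcribed from the methods text, in order.
POST_HYB_WASHES = [
    WashStep("2X SSC / 50% formamide / 0.1% N-lauroylsarcosine", 15, 60, repeats=2),
    WashStep("RNase buffer with 20 ug/mL RNase A", 30, 37),
    WashStep("2X SSC / 0.1% N-lauroylsarcosine", 15, 37, repeats=2),
    WashStep("low-salt SSC / 0.1% N-lauroylsarcosine", 15, 37, repeats=2),
]


def total_minutes(steps):
    """Sum of wash time across all steps, counting repeats."""
    return sum(step.minutes * step.repeats for step in steps)


if __name__ == "__main__":
    for step in POST_HYB_WASHES:
        print(f"{step.repeats} x {step.minutes} min at {step.temp_c} C: {step.solution}")
    print("Total wash time (min):", total_minutes(POST_HYB_WASHES))
```

The total counts incubation time only; handling time between solutions is not included.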

For double-colored ISH, the sections were cut to 15 or 20 μm thickness. The hybridization and washing were carried out as described above, except that both DIG- and fluorescein-labeled probes were used for the hybridization. After blocking in 1% blocking buffer (Roche Diagnostics) for 1 h, the probes were detected in two different ways. For the detection of fluorescein probes, the sections were incubated with an anti-fluorescein antibody conjugated with horseradish peroxidase (Jackson ImmunoResearch Laboratories, West Grove, PA: #200-032-037, 1:4000 in the blocking buffer) for 3 h at room temperature. After washing in TNT buffer [0.1 M Tris-HCl (pH 7.5), 0.15 M NaCl, 0.1% Tween20] 3 times for 15 min, the sections were treated with 1:100 diluted TSA-Plus reagents (Perkin Elmer, Boston, MA) for 30 min following the manufacturer's instructions, and the fluorescein signals were converted to dinitrophenol (DNP) signals. After washing with TNT buffer 3 times for 10 min, the sections were incubated overnight at 4°C with an anti-DNP antibody conjugated with Alexa 488 (1:500, Molecular Probes, Life Technologies Corporation, Carlsbad, CA) in 1% blocking buffer for the fluorescence detection of the DNP signals. At this point, an anti-DIG antibody conjugated with alkaline phosphatase (1:1000, Roche Diagnostics) was also incubated for the detection of the DIG probes. The sections were washed 3 times in TNT buffer, once in TS 8.0 [0.1 M Tris-HCl (pH 8.0), 0.1 M NaCl, 50 mM MgCl<sub>2</sub>], and the alkaline phosphatase activity was detected using an HNPP fluorescence detection kit (Roche Diagnostics) following the manufacturer's instructions. This substrate was incubated for 30 min and the incubation was stopped in PBS containing 10 mM EDTA.


#### **Table 1 | Summary of ISH probes for 13 serotonin receptor genes,** *HDC* **and** *GAD67* **in the marmoset.**

*Note that owing to unavailability of the marmoset-specific 5HT1E sequence in the public database, the 5HT1E primers were designed using the macaque 5HT1E sequence. The hybridization temperature for 5HT3A was 65°C and that for others was 60°C. The amplicon includes the primer sequence. F indicates forward and R indicates reverse.*

### **SERT IMMUNOHISTOCHEMISTRY**

Immunohistochemical analysis was conducted essentially in accordance with the protocol previously reported (Sakata et al., 2002). Briefly, we used antisera raised against SERT (1:12000) as primary antibodies and biotinylated goat anti-rabbit IgG (1:1000) as secondary antibodies (all supplied by Immunostar, Inc., USA). The free-floating sections were incubated consecutively in PBS containing 1% H<sub>2</sub>O<sub>2</sub> for 10 min at room temperature, and then in PBS with 0.2% Triton X-100 (PBST) and 5% normal goat serum (serum of the species of the secondary antibody) for 60 min at room temperature. This was followed by overnight incubation in a buffer containing 1% normal goat serum and the primary antibody at 4°C. After incubation with the biotinylated secondary antiserum for 2 h at room temperature, the sections were processed with an avidin-biotinylated horseradish peroxidase complex (1:200; Vectastain ABC Elite kit, Vector Laboratories, Burlingame, CA, USA) in PBST at room temperature for 1 h and the immunoreaction was visualized by staining with nickel-enhanced coloring solution (0.2 mg/mL diaminobenzidine: DAB, 0.03% H<sub>2</sub>O<sub>2</sub>, 0.03% nickel chloride in TBS).

## **DATA QUANTIFICATION**

#### **Table 2 | Arbitrary values assigned for different levels of expression in cortical brain areas.**

++++*, high;* +++*, moderately high;* ++*, low; and* +*, very low levels of expression.* ± *was assigned to areas of uncertain level of expression. The superscripts "S" and "VS" denote sparse and very sparse expressions, respectively. The numbers from 1 to 6 denote the layers of the cortex. The abbreviations of the cortical areas are the same as those mentioned in the main text.*

Representative areas and regions were identified by referring to the stereotaxic atlas of the marmoset brain (Palazzi and Bordier, 2008; Yuasa et al., 2010; Paxinos et al., 2011) and Nissl staining. The intensity of hybridization signals of different genes varied across different areas of the brain. We present the intensity of the signals as mRNA expression level rated as very low (+), low (++), moderately high (+++), or high (++++) by visual inspection (**Tables 2, 3**). To show the weak signals, the images were adjusted to different contrast levels. In some instances, this enhanced the noise from the adjacent white matter. The true signals can be clearly differentiated from the noise on the basis of size and color (see Figures S8A–D). Because DIG-based ISH provides cellular resolution, we also distinguished dense and disperse expression profiles for relevant regions. To provide a more objective comparison of the laminar distribution of expression between the mouse and marmoset cortices, we analyzed the optical densities of ISH signals using ImageJ image analysis software (Abramoff et al., 2004) (Figures S7A–C). After making the contrast level the same for all images of the same gene, individual images were inverted and optical density was measured using the straight line tool that sampled all layers of the cortex. To subtract the background noise, the optical density of either layer I or white matter (the region where there was no expression above background level) was taken as the control.
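The line-profile measurement described above can be sketched in a few lines of Python. This is an illustration of the general idea rather than the ImageJ procedure itself: it assumes 8-bit gray values sampled along a line from the pial surface to the white matter, uses the inverted gray value as a stand-in for optical density, and takes the mean of a user-chosen background stretch (for example, samples falling in layer I) as the control. The function names and the sample values are hypothetical.

```python
def invert(gray_values, max_value=255):
    """Invert gray values so that darker ISH staining gives larger numbers."""
    return [max_value - v for v in gray_values]


def background_subtracted_profile(gray_values, background_indices, max_value=255):
    """Return the inverted line profile minus the mean background level.

    gray_values        -- gray values sampled along a line across the cortex
    background_indices -- positions along the line treated as background
                          (e.g., samples in layer I or white matter)
    """
    inverted = invert(gray_values, max_value)
    background = [inverted[i] for i in background_indices]
    background_mean = sum(background) / len(background)
    return [v - background_mean for v in inverted]


if __name__ == "__main__":
    # Hypothetical 8-bit samples from pia (left) to white matter (right).
    line_profile = [250, 248, 180, 120, 200, 240]
    # Assume the first two samples fall in layer I and serve as background.
    corrected = background_subtracted_profile(line_profile, range(0, 2))
    print(corrected)
```

Values near zero then indicate signal at background level, and larger values indicate stronger staining at that depth. Small negative values can occur where a sample is lighter than the mean background.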

#### **RESULTS**

We examined the mRNA expression patterns of all 13 known serotonin receptor subtypes. We found significant expressions of 10 of them; we were unable to detect the expressions of *5HT1D*, *5HT3B*, and *5HT5A* mRNAs in the marmoset brains examined. *5HT3A* mRNA was exclusively expressed in the CA fields of the hippocampus. *5HT1F* mRNA was expressed only in layer VI of V1, the presubiculum, and the lateral mammillary body (LM) of the hypothalamus. In general, the expression patterns of all the genes differed in both the intensity and density of ISH signals throughout the marmoset brain. Most of the examined nuclei showed overlapping expressions of multiple *5HTR* subtypes. In the cerebral cortex, most subtypes of *5HTR* were expressed, whereas we found only limited *5HTR* subtypes in the thalamus. The termination pattern obtained by SERT immunohistochemical analysis in our study was similar to those obtained in previous studies of marmosets (Hornung et al., 1990; Hornung and Celio, 1992) and squirrel monkeys (Lavoie and Parent, 1991). Below, we first describe the patterns of expression of *5HTR* mRNAs, across cortical areas. We then describe their expression patterns in the hippocampus, thalamus, superior colliculus, hypothalamus, amygdala, striatum, and substantia nigra. We also compared anti-SERT immunoreactivity with *5HTR* mRNA expression profiles.

#### **SEROTONIN RECEPTOR mRNA EXPRESSION IN CORTICAL AREAS**

To examine the expression profiles in the association and sensory areas of different lobes of the cortex in the rostrocaudal axis, we examined areas 46 and 6, the primary motor cortex (M1), the primary somatosensory cortex (S1), the inferotemporal gyrus (ITG), area V5 (MT), the temporal cortex (TE), the primary visual cortex (V1), and the secondary visual cortex (V2). Besides these six-layered areas, we also examined the cingulate (CG) and entorhinal (Er) cortices, which are four-layered areas. In these cortical areas, nine of the ten serotonin receptor genes (i.e., excluding *5HT3A*) were expressed. We noted that several *5HTR* subtypes exhibited gradients in expression profiles in the sensory and association areas. The most conspicuous example was the V1-V2 border (**Figures 3A–F**), which has the most differentiated architecture of the primate cortex. *5HT2A*, a gene abundantly expressed in the middle layer, also showed a marked difference in mRNA expression level between S1 and M1 (**Figure 1**, c5, d5).

Despite such differences in mRNA expression level between areas, a few *5HTR* subtypes exhibited similarities in their laminar expressions across areas when compared with their expression in the upper, middle, and lower layers. In addition, a few *5HTR*s showed sporadic expression across the cortex. *5HT1A*, *5HT6*, *5HT1E*, and *5HT4* were all generally expressed in the upper layers irrespective of the area (**Figures 1**, **2**, see a1–4 to k1–4). This group of genes shared several similar characteristic features in their expression profiles. Compared with *5HT1A* and *5HT6*, both *5HT1E* and *5HT4* were less abundant in layer II. To test our hypothesis of dense expression in excitatory neurons and sparse expression in inhibitory neurons, we performed the double hybridization of *5HT1A*, *5HT1E*, *5HT4*, and *5HT6* using excitatory (*VgluT1*) and inhibitory (*GAD67*) neuronal markers in V1. Indeed, our results indicated the presence of *5HT1A* and *5HT6* in excitatory neurons and that of *5HT4* in inhibitory neurons (**Figure 4**). We were unable to obtain signals for *5HT1E* using either of the markers. In the frontal (areas 46 and 6) and temporal (ITG and TE) association areas, *5HT1A* and *5HT6* were expressed from layers II through V, but their mRNA expression levels in layer IV of ITG and TE were much lower. In contrast to the widespread expression in the association areas, in early sensory areas, such as S1, V1, and V2, their expression was mostly limited to layer II. The area difference was conspicuous for *5HT1A* and *5HT6* but not for *5HT1E* and *5HT4*.

#### **Table 3 | Arbitrary values assigned for different levels of expression in subcortical brain areas.**

++++*, high;* +++*, moderately high;* ++*, low;* +*, very low levels of expression.* ± *was assigned to areas of uncertain level of expression. The superscript "S" denotes sparse expression.*

are indicated on the left. Note that all images of a given gene are grouped together and presented at the same contrast level. Scale bar: 100 μm.

Layers identified by Nissl staining (not shown) are indicated on the left. Note that all images of a given gene are grouped together and presented at the same contrast level. Scale bar: 100 μm.

*5HT2A* mRNA was expressed at various levels from layers III to V throughout the neocortical areas. Its expression was more abundant in lower tiers of layer III and relatively sparse in layers IV and V. *5HT2C* was expressed sparsely in layers II and V. Although *5HT2A* and *5HT2C* expressions overlapped in layer V, they generally exhibited opposite patterns of layer and area distributions: *5HT2A* was highly expressed in V1 whereas *5HT2C* showed a gradient in expression from being rostrally high to caudally low and was almost undetectable in V1 and V2. In the entorhinal cortex, the two genes were expressed in a complementary manner; unlike in other areas, *5HT2A* was present in layer II and lower layers V and VI (**Figure 2**, k5), whereas *5HT2C* was expressed in layers I and III (**Figure 2**, k6), where *5HT2A* expression was low. We performed double hybridization of *5HT2A* with *GAD67* and *VgluT1* neuronal markers in V1. Because the expression of *5HT2C* was scant in V1, we performed its double hybridization in sections from the frontal cortex and observed layer V encompassing all areas of the frontal cortex covered in the section. *5HT2A* was mainly expressed in *VgluT1*-positive excitatory neurons (**Figure 5**), and almost all the cells expressing *5HT2C* were *GAD67*-positive inhibitory neurons (**Figure 5**).

and *VgluT1* neuronal markers in marmoset V1. *5HT1A* in layers II **(A)** and IVcβ **(B)**, *5HT6* in layer II **(E)**, and *5HT1F* in layer VI **(C)** were not expressed in

cells but not in *VgluT1*-positive excitatory cells. The arrow heads indicate the positive signals and coexpressions. Scale bar, 50 μm.

The expression levels of *5HT1B*, *5HT1F*, and *5HT7* mRNAs were low throughout the neocortical areas. However, *5HT1B* mRNA was abundantly expressed in V1 (**Figures 2**, h7 and **3D**) and significantly in V2 (**Figure 2**, i7); a higher intensity of *5HT1F* mRNA signals was observed in layer VI of V1 (**Figures 2**, h9 and **3E**) and *5HT7* mRNA was expressed at a moderately high level in layer IV of area ITG (**Figure 2**, f8). Note that the increase in the expression level of *5HT7* overlapped with the enhanced serotonergic terminations at ITG (**Figure 3G**). *5HT1B* was also sparsely expressed in layer V of M1 (**Figure 1**, c7) and CG (**Figure 2**, j7). In the entorhinal cortex, *5HT1B* and *5HT7* showed similar expression patterns, that is, highly expressed in layer II and moderately expressed in lower layers.

**FIGURE 5 | Double ISH of** *5HT2C* **and** *5HT2A* **(red, DIG) with** *GAD67* **and** *VgluT1* **neuronal markers (green, FITC).** *5HT2A* with *GAD67* in layer III of V1 **(A)**, *5HT2A* with *VgluT1* in layer III of V1 **(B)**, *5HT2C* with *GAD67* in layer V of frontal cortex **(C)** and *5HT2C* with *VgluT1* in layer V of frontal cortex **(D)**. The arrows indicate the positive signals and coexpressions. Scale bar, 50 μm. Note that the density of *VgluT1*-positive excitatory neurons we observed in layer V is lower than in other layers **(D)**, which is consistent with the result shown in another report (Gittins and Harrison, 2004).

## **MARMOSET V1 IS CHARACTERIZED BY SEROTONERGIC PROJECTIONS AND EXPRESSION OF A GROUP OF 5HTR SUBTYPES**

*5HT1B* and *5HT2A* showed high expression levels selectively in V1, and *5HT1A* and *5HT1F* were specifically expressed in V1 (**Figure 3**). The high expression levels of *5HT1B* and *5HT2A* in V1 were previously reported in macaques (Watakabe et al., 2009) and marmosets (Takahata et al., 2012). In the present study, we found a relatively low-level, thin band-like pattern of expression of *5HT1A* in layer IV Cβ (**Figure 3C**), which differed from that of macaques, and the expression level of *5HT1F* was moderate to high in layer VI (**Figure 3E**), whereas it was observed to be very low in macaques. When examined by double ISH with excitatory *VgluT1* or inhibitory *GAD67* neuronal marker probes, both *5HT1A* and *5HT1F* were found to be exclusively expressed in excitatory neurons (**Figures 4B,C**). We also observed that serotonergic projections were dense in layers IV and VI (**Figure 3B** and Figure S1A), where these four subtypes were expressed. The expressions of *5HT1A*, *5HT1B*, and *5HT2A* overlapped with highly dense serotonergic terminations in layer IV and that of *5HT1F* overlapped with moderately dense terminations in layer VI (**Figure 3B**). The expressions of the four genes and the serotonin terminations formed sharp boundaries between V1 and V2 (**Figures 3A–F**).

### **SEROTONIN RECEPTOR mRNA EXPRESSIONS IN HIPPOCAMPUS**

The hippocampal region consists of the dentate gyrus (DG), CA fields, and subiculum (S) (**Figure 6**). It was densely innervated by serotonergic terminals in the areas with no receptor expression and stratum lacunosum moleculare (Slm) (**Figure 6K**). Interestingly, the expressions of *5HTR* mRNAs in the hippocampus were highly subregion-specific. *5HT1A*, *5HT6*, *5HT1E*, and *5HT4* mRNAs, which are expressed in the cortical upper layer, were all abundantly expressed in the DG and pyramidal cell layer from CA3 to CA1. Among them, *5HT1A* mRNA showed particularly prominent expression throughout these structures, whereas the other *5HTR* mRNAs exhibited relatively weak expression in CA3.

mRNA expressions **(A**–**J)** and immunohistochemical staining with anti-SERT antibody **(K)** in CA1 and CA3 fields, dentate gyrus (DG), presubiculum (PS), subiculum (S), and stratum lacunosum moleculare (Slm) of hippocampal

corresponding similarity of expressions and innervations in the mouse (see Figures S5C,D,F). Images are adjusted at contrasts that show the clearest image for each *5HTR*. Scale bar, 200 μm.

In contrast to this group of genes, *5HT2A* and *5HT2C* mRNAs as well as *5HT3A* mRNA exhibited characteristically scattered expressions in the polymorph layer of DG (*5HT2A*) and CA fields (*5HT2C* and *5HT3A*) (**Figures 6E,F,J**). Note that these three mRNAs showed very low expression levels in granule cells, no higher than the expression level of the sense probe, which showed nonspecific faint background staining in DG. Such scattered expression suggests that they are expressed in inhibitory neurons. Indeed, by double ISH we confirmed that the *5HT2C* and *5HT3A* mRNAs in the hippocampus were expressed in a subset of *GAD67*-positive inhibitory neurons (data not shown). The observation that the expression distribution and density differed among *5HT2A*, *5HT2C*, and *5HT3A* mRNAs (Figure S2) suggests that they are expressed in different types of cell.

Despite dense projection by serotonergic terminals, *5HT1F* was the only subtype expressed in the presubiculum above a moderate level. Other receptor types were distributed sparsely and expressed only at low levels (Figure S2, S).

## **SEROTONIN RECEPTOR mRNA EXPRESSION IN THALAMUS, HYPOTHALAMUS, AND AMYGDALA**

Regarding subcortical regions, we examined the thalamus, hypothalamus, amygdala, caudate, septum, ventral striatum, and superior colliculus. Overall, the repertoires of *5HTR* subtypes expressed were quite limited in the thalamus, and as in V1 of the cortex, many regions showed conspicuous overlap between mRNA expression and serotonergic termination as described below.

We examined the expression patterns in a few conspicuous nuclei (as described below) belonging to various groups of the thalamus. Overall, in terms of the number of receptor types expressed, the thalamus showed the least receptor diversity (see **Table 3**). We did not observe the expressions of *5HT1E*, *5HT1F*, *5HT3A*, and *5HT4* in any subnuclei at levels above the background level. The serotonergic terminations into the thalamus were heterogeneous and showed laterally low and medially high gradations (see Figures S1B,E). Both the medial geniculate nucleus (MG) (Figure S3K) and the lateral geniculate nucleus (LG) (Figure S4K) had moderate and heterogeneous serotonergic terminations.

*5HT1A* showed a high level of mRNA expression in the CL nucleus (**Figure 7A**), which overlapped with the dense serotonergic termination in CL (**Figure 7K**, also see Figures S1B,C). In sharp contrast, *5HT1B* showed little expression in CL but was expressed at high levels from nuclei lateral dorsal (LD), ventral lateral (VL), and mediodorsal (MD) cortices to CL, where the *5HT1A* mRNA expression levels were very low to low. *5HT2A* and *5HT2C* were both sparsely expressed in CL and showed little expression from nuclei medial and lateral cortices to CL (**Figures 7E,F**). *5HT2C* was also expressed near the midline thalamic nuclei where the serotonergic projections were dense (Figures S1D,E). *5HT6* and *5HT7* were expressed in CL, VL, LA, and MD from very low to low and from low to moderately high levels, respectively.

The overall expression patterns of all the *5HTR* subtypes were similar in the posterior nuclei including the medial, lateral, and inferior pulvinar (Figure S2), the medial geniculate nucleus (Figures S2, S3), and the ventral posterior nuclei including the ventral posterior lateral (VPL) and ventral posterior medial (VPM) nuclei (Figures S2, S4). In the lateral geniculate nucleus (LG), *5HT1A* and *5HT6* were expressed at very low levels, *5HT7* at a low level (Figure S2), and *5HT1B* at a high level (Figures S2, S4). Finally, in the reticular nucleus (RT), *5HT1B*, *5HT2A*, and *5HT2C* were expressed at moderately high levels and *5HT1A* from very low to low levels (Figure S2).

**FIGURE 7 | ISH expression profiles of** *5HTR***s in thalamus.** *5HTR* mRNA expressions **(A–J)** and immunohistochemical staining with anti-SERT antibody **(K)** in central lateral (CL), mediodorsal (MD), lateral dorsal (LD), and ventral lateral (VL) thalamic nuclei. The black arrowheads in **(A)**, **(E)**, and **(F)** show the overlap of *5HT1A* **(A)**, *5HT2A* **(E)**, and *5HT2C* **(F)** expressions with corresponding dense serotonergic projections at CL **(K)** (also see Figure S1), whereas the white arrowheads in **(B)** show the corresponding mismatch between *5HT1B* expression and projections at CL **(K)**. Images are adjusted at contrasts that show the clearest image for each *5HTR*. Scale bar, 200μm.

Within the hypothalamic nuclei, the mammillary nucleus exhibited conspicuous heterogeneity of *5HTR* mRNA expressions (**Figure 8**). Such heterogeneity corresponded to the density of serotonergic projections (**Figure 8K**). The medial part of the mammillary nucleus (MM) received denser serotonergic projections than the retro-hypothalamus (RH), lateral hypothalamus (LH), and LM nucleus, which lie dorsal, lateral, and ventrolateral to MM, respectively (**Figure 8**, reference). The distribution of *5HTR* mRNAs was specific in these regions and conspicuously overlapped with the serotonergic projections: *5HT2A* and *5HT7* mRNAs were densely expressed in MM but were absent in RH, LH, and LM (**Figures 8E,J**), and *5HT6* mRNA was also more highly expressed in MM, although it was expressed in both RH and MM. In contrast, we observed a moderately high expression level of *5HT1A* mRNA, very low to low expression levels of *5HT1B* mRNA, and a high expression level of *5HT2C* mRNA in RH, LH, and LM but not in MM. *5HT1E*, *5HT3A*, and *5HT4* mRNAs were expressed at insignificant levels.

There was some ambiguity in assigning the localization of *5HT1F* mRNA expression, which was at a high level exclusively in the nucleus lateral to MM; this nucleus could be either LM or the ventral tuberomammillary nucleus (VTM) (**Figure 8D**). VTM, which is part of the tuberomammillary nucleus (TM), shows the densest population of histaminergic neurons and can be identified using histidine decarboxylase (*HDC*) as a marker (Ericson et al., 1987; Sakai et al., 2010). If present in histaminergic neurons, *5HT1F* could directly modulate these neurons. To examine this possibility and locate *5HT1F* expression, we performed ISH of *5HT1F* and *HDC* in adjacent sections (**Figure 9**). Our result shows that *5HT1F* and *HDC* were expressed in a complementary manner, suggesting that *5HT1F* is expressed exclusively in LM.

**FIGURE 8 | ISH expression profiles of** *5HTR***s in hypothalamus.** *5HTR* mRNA expressions **(A–J)** and immunohistochemical staining with anti-SERT antibody **(K)** in lateral (ML), medial (MM), and ventral tuberomammillary (VTM) nuclei of hypothalamus. We observed a striking complementary relationship between the *5HT2A* **(E)** and *5HT2C* **(F)** expressions and an overlap of *5HT2A* and *5HT7* expressions with projections at MM. Note that the *5HT1A* **(A)** expression that overlapped with serotonergic innervations in CL (**Figure 7A**) did not match with the projections at MM. Images are adjusted at contrasts that show the best image for each *5HTR*. Scale bar, 100μm.

The amygdala consists of several subnuclei connected with each other (**Figure 10**). *5HT1F* and *5HT3A* showed no detectable signals above the background in the amygdala. ISH signals of other *5HTR* subtypes were generally observed in most parts of the amygdala, although signals were heterogeneous and not as pronounced as those in the mammillary nucleus. *5HT1A*, *5HT4*, *5HT6*, and *5HT7* mRNAs showed high expression levels in the cortical amygdaloid nucleus (Co), where there were dense serotonergic projections. *5HT1A* mRNA was highly expressed in the basolateral (BLa), basomedial (BMa), and Co nuclei and not expressed in La. *5HT2A* mRNA was expressed only in La and not in BLa, BMa, or Co. *5HT2C* was expressed densely in the medial amygdaloid nucleus (Me), and the expression became very sparse toward La. *5HT1B* mRNA was faintly expressed and *5HT1E* mRNA was homogeneously expressed at low to moderately high levels across all the nuclei. *5HT4* and *5HT7* mRNAs were generally expressed toward the medial part, mostly in Co. The *5HT6* mRNA expression levels were high in Co and low in other nuclei.

#### **SEROTONIN RECEPTOR mRNA EXPRESSIONS IN SUPERIOR COLLICULUS**

The *5HTR* subtypes expressed in the superior colliculus (SC) (**Figure 11**) were similar to those in MD, the adjacent substructure of the thalamus. In SC, we did not find any significant expression of *5HT1F*, *5HT3A*, or *5HT4*. All the other *5HTR* subtypes were sparsely expressed at various levels. The serotonergic projections in SC were moderately dense and appeared to overlap with *5HT6* expression in the zonal layer (Zo). *5HT1A* was mostly expressed in superficial layers including the zonal layer, superficial gray (SuG) layer, and optic nerve layer (Op), and its expression levels ranged from moderately high to high depending on the cell type. *5HT2A* and *5HT1B* were expressed at very low and low levels, respectively, in Zo and SuG. *5HT1E* was exclusively expressed in Zo at a low level. *5HT2C* was expressed across the superior colliculus at a moderately high level; its expression was generally dense in Zo and SuG. *5HT6* was expressed at a moderately high level in two tiers, densely in Zo and SuG, and sparsely in the intermediate gray (InG) layer. Finally, *5HT7* was expressed at a low level in InG.

## **SEROTONIN RECEPTOR mRNA EXPRESSIONS IN CAUDATE AND SEPTUM**

In the caudate, medial septum (MS), and lateral septum (LS) (from right to left in **Figure 12**), the serotonergic projections varied and showed no apparent overlap with *5HTR* expression. In the caudate, *5HT1F* and *5HT3A* were not expressed. *5HT1E* and *5HT7* were faintly expressed. The mRNA expression levels were low for *5HT1A*, moderately high for *5HT1B*, and moderately high to high for *5HT6* and *5HT4*. *5HT2C*, at moderately high mRNA expression levels, was densely expressed toward the medial part (Figure S2) and more scattered toward the lateral part of the caudate (**Figure 12F**).

In the septum, *5HT1F* and *5HT3A* were not expressed. *5HT1A* showed sparse but significant expression in both the medial septum (MS) and lateral septum (LS). *5HT1B* showed a moderately high mRNA expression level, *5HT1E* and *5HT7* were expressed at low levels, and *5HT6* was faintly expressed in the lateral septum. *5HT4* was generally expressed at moderately high to high levels in the medial septum. *5HT2A* was exclusively expressed in the medial septum at a moderately high level, and, in a complementary manner, *5HT2C* was expressed at a moderately high level in the lateral septum (indicated by arrowheads in **Figures 12E,F**).

### **SEROTONIN RECEPTOR mRNA EXPRESSIONS IN VENTRAL STRIATUM**

We examined the *5HTR* expression patterns in the internal globus pallidus (iGP), external globus pallidus (eGP), substantia nigra pars reticulata (SNr), and substantia nigra pars compacta (SNc), representing the ventral striatum. The serotonergic projections in these regions were again heterogeneous. In SNc, the projection density increased near the inferior regions where the expression was generally denser. In the globus pallidus (Figure S2), a small repertoire of *5HTR* subtypes was expressed, and we did not detect signals above the background level for *5HT1B*, *5HT1F*, *5HT4*, *5HT3A*, or *5HT7* in either nucleus. All the detected *5HTR* subtypes were sparsely expressed in these nuclei. The mRNA expression levels were very low for *5HT1A* and low for *5HT1E*, *5HT2A*, and *5HT6* in both the iGP and eGP. Interestingly, *5HT2C* was expressed in the iGP and eGP at high and very low levels, respectively (Figure S2).

In the substantia nigra (**Figure 13** and Figure S2), *5HT1F* and *5HT3A* were not expressed. In SNc, *5HT2C* and *5HT4* mRNAs were expressed sparsely, whereas mRNAs of the other *5HTR*s were expressed densely. The levels of expression were very low for *5HT2A*, low for *5HT1A* and *5HT1B*, and moderately high for *5HT1E*, *5HT2C*, *5HT4*, *5HT6*, and *5HT7*. In SNr, all the *5HTR*s were expressed sparsely at very low levels except *5HT2C*, which was expressed sparsely but at a high level (**Figure 13**).

## **DISCUSSION**

We report the mRNA localization of all the 10 *5HTR*s that are expressed, as well as the distribution of serotonin terminations in the marmoset brain. Besides confirming the published results of numerous previous studies, the present study notably demonstrates several new findings about the organization of serotonergic systems. On the basis of our findings we discuss the possible roles of *5HTR*s in the marmoset brain, as revealed by our analysis of overall expression patterns.

mRNA expressions **(A–J)** and immunohistochemical staining with anti-SERT antibody **(K)** in basomedial (BMa), basolateral (BLa), cortical (Co), lateral (La), and medial (Me) amygdaloid nuclei of amygdala. Note the overlap of the expressions with serotonergic projections **(K)** near Co. Images are adjusted at contrasts that show the clearest image for each *5HTR*. Scale bar, 200μm.

## **TECHNICAL CONSIDERATIONS**

In our present study we were unable to obtain results for *5HT1D*, *5HT3B*, and *5HT5A*. When we checked their expression patterns in the human data set (ABA, 2012), we were unable to find expression of *5HT1D* and *5HT3B*, suggesting that the absence of expression found in our study is not due to artifact. *5HT5A* is found in the frontal cortex at low levels in both humans (ABA, 2012) and mice (Goodfellow et al., 2012, Figure S6). On the basis of this finding, we cannot exclude the possibility that ISH using our *5HT5A* probes failed to detect low signals. We also encountered some constant background signals associated with the expression of *5HT1F* and *5HT1E*, and we were unable to detect signals for *5HT1E* when testing for its presence using excitatory or inhibitory neuronal markers for double hybridization. On the basis of our previous study (Watakabe et al., 2007), we consider that the low mRNA expression levels of *5HT1F* and *5HT1E* might be the reason for the granular background. Likewise, the lower mRNA expression level and higher GC content of *5HT5A* (63.41%) compared with the other *5HTR*s might be the reason for the failure to detect its ISH signals.
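For readers unfamiliar with the GC content figure quoted for *5HT5A*, the following minimal sketch shows how such a percentage is conventionally computed from a nucleotide sequence. It is an illustration only: the function name `gc_content` and the toy sequence are hypothetical and are not taken from the probes used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T/U bases of seq."""
    # Normalise case and ignore anything that is not a standard base
    # (whitespace, gaps, ambiguity codes such as N).
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical toy sequence, NOT a real probe: 16 of its 20 bases are G or C.
toy_probe = "GCGCGGCCATGCGCTAGGCC"
print(f"GC content: {gc_content(toy_probe):.2f}%")
```

Under this definition, a value of 63.41% means that roughly 63 of every 100 bases in the sequence are G or C.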

#### **OVERLAP OF SEROTONIN RECEPTOR mRNA DISTRIBUTION AND SEROTONERGIC TERMINATIONS**

Serotonergic projections in the marmoset brain were generally associated with serotonin receptor expressions. Our data show a marked overlap of the mRNA expressions of most *5HTR*s with serotonergic terminations in the visual cortex (**Figure 3**), the subiculum (**Figure 6I**), the CL nucleus of the thalamus (**Figures 7A,E,F**, also see Figures S1B–E), the medial mammillary nucleus (**Figures 8E,J**), the cortical amygdaloid nucleus of the amygdala (**Figure 10**), and the midline thalamic nuclei (Figure S1). All the subtypes that showed overlaps, except *5HT1B*, have somatodendritic localization of their receptor proteins (Table S2), suggesting a strong correlation between serotonin availability and receptor expression.

**FIGURE 11 | ISH expression profiles of** *5HTR***s in superior colliculus.** *5HTR* mRNA expressions **(A–J)** and immunohistochemical staining with anti-SERT antibody **(K)** in zonal layer (Zo), superficial gray (SuG), optic nerve layer (Op), and intermediate gray (InG) of superior colliculus (SC). Images are adjusted at contrasts that show the clearest image for each *5HTR*. Scale bar, 100μm.

Interestingly, none of the *5HTR*s were expressed in layer I, where serotonergic terminations were present and were relatively dense in certain areas (**Figure 3**). Likewise, in both the mouse and the marmoset, no serotonergic terminations were found in the pyramidal layer of the hippocampus, where all the *5HTR*s are expressed; instead, they were more prominent in Slm (**Figure 6** and Figure S5). Both layer I of the cortex (Shipp, 2007) and Slm of the hippocampus (Maccaferri, 2011) receive the apical tuft of pyramidal cell dendrites. This mismatch suggests that the major target of serotonergic terminations in the supragranular layer of the cortex and in the hippocampus is the apical dendritic tuft of neurons, which is known to increase the gain of pyramidal neurons (Larkum et al., 2004).

### **CORTICAL EXPRESSIONS OF 5HTRs AND CIRCUITRY IMPLICATIONS**

In summary, the upper (supragranular), middle, and lower (infragranular) layers showed quite different patterns of *5HTR* expressions. This feature of *5HTR*s having different mRNA expression patterns in different layers suggests distinct roles of *5HTR*s in the primate cortex that presumably affect the function of each layer.

Large varicose serotonergic fibers originating from the median raphe nucleus (MRN) have been reported to project to the supragranular layers in the marmoset (Hornung et al., 1990) and macaque (Wilson and Molliver, 1991). These innervations form synapses with supragranular inhibitory neurons in a basket-like pattern in macaques and chimpanzees but not in humans (Raghanti et al., 2008), and in both cats and marmosets such a basket-like pattern is observed in calbindin-positive (CB+) interneurons (Hornung and Celio, 1992). In the rat hippocampus, innervation of CB+ inhibitory neurons has also been reported (Freund et al., 1990). The interneurons are likely to inhibit the nearby pyramidal cells, as has been demonstrated in many locations of the cortex (Sheldon and Aghajanian, 1990; Ropert and Guy, 1991; Foehring et al., 2002).

We report expression of *5HT4* mRNA in *GAD67*-positive inhibitory neurons and the expressions of *5HT1A* and *5HT6* mainly in *VgluT1*-positive excitatory neurons in the upper layers of V1 (**Figure 4**). Thus, *5HT4*, which has excitatory cellular effects (Table S2), might indirectly inhibit neighboring pyramidal neurons, and *5HT1A*, which has an inhibitory cellular effect, might be recruited to directly inhibit pyramidal neurons. *5HT6*, which has an excitatory cellular effect, can similarly be supposed to excite pyramidal neurons.

**FIGURE 12 | ISH expression profiles of** *5HTR***s in caudate and septum.** *5HTR* mRNA expressions **(A–J)** and immunohistochemical staining with anti-SERT antibody **(K)** in the caudate (Cd) nucleus, medial septum (MS), and lateral septum (LS). Note that the arrowheads for **(E)** and **(F)** show the presence and absence of *5HT2A* and *5HT2C* expression, respectively, in the medial septum. Images are adjusted at contrasts that show the clearest image for each *5HTR*. Scale bar, 200μm.

Direct and indirect inhibition might be recruited separately, depending on the two different populations of terminal axons originating from different raphe nuclei with their unique behavioral consequences. MRN forms a direct synaptic contact with neuronal somata, whereas DRN has a widespread effect through volume or extrasynaptic transmission (Törk, 1990; Michelsen et al., 2007). The MRN innervation forms synaptic contact with CB+ interneurons (as mentioned above), which on the basis of our findings seem to express *5HT4*. Interestingly, *5HT4* has also been detected in certain CB+ enteric neurons of rodents (Poole et al., 2006). Our observation of *5HT1A* expression mainly in excitatory neurons is based on visual inspection in V1, but previous reports have shown that in layer II of the monkey prefrontal cortex (PFC) 83% of *5HT1A* is expressed in *VgluT1*-positive excitatory neurons and 43% of the remaining inhibitory neurons are found in CB+ interneurons. This suggests that *5HT1A* may be recruited by both MRN and DRN in PFC.

The extrasynaptic localization of *5HT1A* receptors (Riad et al., 2000) supports the idea of direct inhibition of pyramidal neurons expressing *5HT1A* (**Figure 4**) by volume transmission triggered by DRN. In summary, *5HT4* might be recruited in synaptic-indirect inhibition of pyramidal neurons by the stimuli originating from MRN whereas *5HT1A* might be recruited in extrasynaptic-direct inhibition of pyramidal neurons by the stimuli originating from DRN.

#### **THALAMIC NUCLEI PROJECTING TO THE CORTEX SHOW LESS RECEPTOR DIVERSITY**

In thalamic nuclei projecting to the cortex, only *5HT1A*, *5HT1B*, *5HT6*, and *5HT7* were prominently expressed. *5HT1A* and *5HT1B* have inhibitory cellular effects (Table S2), whereas *5HT6* and *5HT7* have excitatory cellular effects (Table S2). This suggests that the cortically projecting thalamic nuclei maintain a balance between excitatory and inhibitory effects on inputs and outputs by recruiting only a limited subgroup of *5HTR*s. *5HT2C* and *5HT2A* were expressed in addition to these four *5HTR* subtypes in the CL, which projects to the striatum (Van der Werf et al., 2002), and in the RT, which receives inputs from the cortex (Smith, 2008). Taken together, our data suggest that the regions of the thalamus that gate afferent information to the cortex have fewer *5HTR* subtypes (see **Table 3** and Figure S2), whereas the cortex, which integrates sensory information, has more *5HTR* subtypes. In line with our findings, physiological data collected from the ferret thalamus (Monckton and McCormick, 2002) also suggest that serotonin has less influence (direct postsynaptic inhibitory) on the primary sensory nuclei than on the intralaminar nuclei.

### **COMPLEMENTARY EXPRESSION OF 5HT2A AND 5HT2C**

Many studies have suggested independent, reciprocal, opposing, and balancing functional features associated with *5HT2A* and *5HT2C* receptors (Popova and Amstislavskaya, 2002; Winstanley et al., 2004; Nonogaki et al., 2006; Aloyo et al., 2009; Halberstadt et al., 2009). In the hypothalamo-pituitary-testicular system, the neural control of male sexual motivation and arousal involves the facilitative action of *5HT2A* and the suppressive action of *5HT2C* in a reciprocal manner (Popova and Amstislavskaya, 2002). In the hypothalamus of obese Ay mice, *5HT2A* and *5HT2C* receptors are suggested to have reciprocal roles in the regulation of feeding and energy homeostasis (Nonogaki et al., 2006). The complementary expression of *5HT2A* and *5HT2C* observed in the hypothalamus in our study (**Figures 8E,F**) is consistent with the findings of Popova and Amstislavskaya (2002) and Nonogaki et al. (2006) in nonprimates. Besides the hypothalamus, the septum (**Figures 12E,F**) and entorhinal cortex (**Figure 2**, k5,k6) also showed complementarity. In V1, there was an enriched expression of *5HT2A* in contrast to the scant expression of *5HT2C* (**Figure 2**).

*5HT2A* is expressed in 86–100% of upper layer glutamatergic cells and in 13–31% of inhibitory cells in the monkey and human PFC (De Almeida and Mengod, 2007). Similarly, in the marmoset and macaque V1, it is also mostly expressed in excitatory neurons (Watakabe et al., 2009; Nakagami et al., 2013, **Figure 5**). In contrast, the expression of *5HT2C* was scant and was mostly detected in the inhibitory neurons (**Figure 5**) of layer V. In rats, *5HT2C* is primarily expressed in excitatory neurons in the PFC (Puig et al., 2010). This difference may reflect a species difference between the marmoset and rat, or a difference in the equivalent ages of the animals used. In rats there is high expression of *5HT2C* in layers IV and V until P14, and after P56 the expression level becomes low and is limited to layer V (Li et al., 2004; Jang et al., 2012). Overall, our data support the functional complementarity between *5HT2A* and *5HT2C* suggested in previous pharmacological studies.

#### **SPORADIC AND HIGHLY LOCALIZED EXPRESSIONS OF 5HT1F AND 5HT3A**

*5HT1F* is expressed only in layer VI of V1 (**Figure 3**), the presubiculum (**Figure 6**), and LM of the hypothalamus (**Figure 9**). In V1 and the presubiculum, its expression overlapped with dense serotonergic terminations, again suggesting a high turnover rate of serotonin at these sites. In mouse V1, a recent study has shown that layer VI works as a major mediator of cortical gain modulation (Olsen et al., 2012). Our previous work shows the role of *5HT1B* in increasing the signal-to-noise ratio and of *5HT2A* in gain control in V1 (Watakabe et al., 2009). In this report, we have shown the expression of *5HT1F* in excitatory neurons of layer VI. Together, these findings suggest possible recruitment of the *5HT1F* receptor present in layer VI to support the visual gain function in the marmoset.

The mammillary body, which includes MM and LM (Vann, 2010) (**Figure 8**), appears to lack interneurons in primates (Veazey et al., 1982), whereas the TM, which surrounds the mammillary body, is composed of inhibitory neurons only. Surprisingly, the members of the *5HT1* family, which have inhibitory cellular effects (Table S2), are not expressed in the mammillary body, except *5HT1F*. This suggests that serotonin primarily functions to facilitate the excitation of the mammillary body in MM, as revealed by the dense serotonergic innervations and the expression of *5HT2A*, *5HT6*, and *5HT7* receptors with excitatory cellular effects (Table S2), but hyperpolarizes LM by recruiting *5HT1F*, thus balancing the overall excitation of the mammillary body. Overall, the sporadic regional localization of *5HT1F* receptors in the marmoset brain may be related to the mediation of gain modulation or balancing functions.

The expression profile of *5HT3A* we obtained in the cortex was different from that observed in mice, where it was associated with cortical interneurons. *5HT3A* accounts for nearly 30% of all interneurons and is suggested to be involved in shaping the cortical circuit in rodents (Rudy et al., 2011). In addition, Jakab and Goldman-Rakic (2000) showed the *5HT3A* receptor at the cell body of cortical neurons in macaques. There may be species differences in the expression pattern of *5HT3A* in the cortex between marmosets and other species. In our present study, we examined *5HT3A* expression using several probes of *5HT3A*, but except for the probes mentioned in the results (shown in **Table 1**) we observed high background signal intensities for all probes. The working probe was found to be expressed only in GABAergic interneurons in the CA fields of the hippocampus (**Figure 6J**). Therefore, we cannot exclude the possibility that the differences observed in our marmoset study are due to different isoforms generated by alternative splicing, because two splice variants of *5HT3A* are found in humans, which exhibit similar pharmacological and electrophysiological profiles when expressed as homomers (Hannon and Hoyer, 2008).

## **COMPARISON OF 5HTR mRNA EXPRESSION BETWEEN DIFFERENT SPECIES**

*5HT1A* was expressed in the marmoset, but not in the macaque, in layer IV of V1. The expression is also lacking in human V1 (ABA, 2012). It is tempting to correlate this difference with species-specific physiological differences, such as the dichromatic vision observed in some marmosets (Solomon, 2002; Surridge et al., 2003), compared with the trichromatic vision in humans and macaques (Surridge et al., 2003). Besides this difference, features such as the expression of *5HT1A* and *5HT6* in the upper layer, the V1-specific expression of *5HT1B*, the enriched expression of *5HT2A* in V1, the rostral decrease in the expression of *5HT2C*, the low expression level of *5HT7*, and the absence of expression of *5HT3A* (as discussed above) in the cortex were very similar to those in humans (ABA, 2012). Besides these similarities, the upper layer expression of *5HT1A*, which has been observed in the marmoset (in the present study), macaque, and human (De Almeida and Mengod, 2008), is also observed in the rat PFC (Goodfellow et al., 2009), and the expression of *5HT7* mRNA, which is observed prominently in the thalamus and at low levels in the cortex, is similarly observed in rodents (Gustafson et al., 1996). Together, the expressions of the *5HT1A* and *5HT7* receptor subtypes in the cortex seem to be conserved between rodents and primates.

In the hippocampus there was a surprising similarity in the expression patterns observed between marmosets and mice. In both species, except for *5HT2C* and *5HT3A*, the expression of all the *5HTR*s was limited to the pyramidal layer (**Figure 6** and Figure S5), suggesting that the majority of serotonin receptors are recruited for the modulation of glutamatergic transmission in the hippocampus. The serotonergic projections in both species (as discussed above) were dense at Slm (**Figure 6** and Figure S5K). The overlap between serotonergic terminations and *5HT1F* observed in the presubiculum, the specific expression of *5HT2A* in the polymorph layer of DG, and the high overall expression level of *5HT1A* observed in the marmoset were very similar to those in mice (**Figure 6** and Figure S5). In the thalamus, again the number of receptor subtypes expressed was smaller than that in the cortex (ABA, 2009).

Besides the conspicuous differences in the overall mRNA expression levels of *5HTR*s (Figure S7), which were low in mice, there are some notable differences between the mouse and marmoset expression profiles observed in the cortex. *5HT1E* found in the marmosets (**Figure 1**) was not detected in the mice (ABA, 2009), and the enriched and specific expressions of *5HT1A*, *5HT1B*, *5HT1F*, and *5HT2A* found in V1 of the marmosets (**Figure 3**) were also not observed in the mice (Figure S6). *5HT4* observed in inhibitory neurons of the marmosets was scarcely expressed in the mouse cortex (Figure S6). *5HT3A* is expressed in the cerebral cortex of macaques (Jakab and Goldman-Rakic, 2000) but was not observed in our study of the marmosets. In mice it was expressed mainly in upper layers including layer I (Figure S6), where there was no expression of any *5HTR*s in the marmoset. Other expression patterns observed exclusively in the mice were as follows: the expression of *5HT1D* in layer 6b of SS (Figure S6, c2), the sparse expression of *5HT1B* in layer 4 of SS (Figure S6, b2), and the abundant expression of *5HT1F* in MO (Figure S6, d3).

Taken together, the mRNA expression pattern of *5HTR*s in the marmoset, as compared with that in the mouse, shows some significant differences in the cortex. This suggests certain primate-specific roles of *5HTR*s and the usefulness of the marmoset as a primate model in further studies of serotonergic modulation of higher brain functions that are specific to primates.

#### **ACKNOWLEDGMENTS**

We thank Drs. Yasuke Komatsu and Yuki Nakagami for help with marmoset handling and operation and Dr. Kathleen S. Rockland for critical reading. This work was supported by Scientific Research on Innovative Areas "Neural Diversity and Neocortical Organization" from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to Tetsuo Yamamori) and "Strategic Research Program for Brain Science (Highly Creative Animal Model Development for Brain Science)" from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Reference figures are used with consent from the National Center for Neurology and Psychiatry, Kodaira, Japan.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncir.2014.00052/abstract

## **REFERENCES**


autoradiography and *in situ* hybridization studies of new ligands and newly identified receptors. *Histochem. J.* 28, 747–758. doi: 10.1007/BF02272148


Nakagami, Y., Watakabe, A., and Yamamori, T. (2013). Monocular inhibition reveals temporal and spatial changes in gene expression in the primary visual cortex of marmoset. *Front. Neural Circuits* 7:43. doi: 10.3389/fncir.2013.00043


Poole, D. P., Xu, B., Koh, S. L., Hunne, B., Coupar, I. M., Irving, H. R., et al. (2006). Identification of neurons that express 5-hydroxytryptamine4 receptors in intestine. *Cell Tissue Res.* 325, 413–422. doi: 10.1007/s00441-006-0181-9


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 March 2014; accepted: 25 April 2014; published online: 19 May 2014. Citation: Shukla R, Watakabe A and Yamamori T (2014) mRNA expression profile of serotonin receptor subtypes and distribution of serotonergic terminations in marmoset brain. Front. Neural Circuits 8:52. doi: 10.3389/fncir.2014.00052*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Shukla, Watakabe and Yamamori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Interaction between the 5-HT system and the basal ganglia: functional implication and therapeutic perspective in Parkinson's disease

#### *Cristina Miguelez<sup>1,2</sup>, Teresa Morera-Herreras<sup>1</sup>, Maria Torrecilla<sup>1</sup>, Jose A. Ruiz-Ortega<sup>1,2</sup> and Luisa Ugedo<sup>1</sup>\**

*<sup>1</sup> Department of Pharmacology, Faculty of Medicine and Dentistry, University of the Basque Country UPV/EHU, Leioa, Spain*

*<sup>2</sup> Department of Pharmacology, Faculty of Pharmacy, University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*Bruno Pierre Guiard, University of Paris XI, France; Karen Jaunarajs, University of Alabama, USA; Kristin Briana Dupre, National Institutes of Health, USA*

#### *\*Correspondence:*

*Luisa Ugedo, Department of Pharmacology, Faculty of Medicine and Dentistry, University of the Basque Country UPV/EHU, Barrio Sarriena sn, 48940 Leioa, Spain e-mail: luisa.ugedo@ehu.es*

The neurotransmitter serotonin (5-HT) has a multifaceted function in the modulation of information processing through the activation of multiple receptor families, including G-protein-coupled receptor subtypes (5-HT1, 5-HT2, 5-HT4–7) and ligand-gated ion channels (5-HT3). The largest population of serotonergic neurons is located in the midbrain, specifically in the raphe nuclei. Although the medial and dorsal raphe nucleus (DRN) share common projecting areas, in the basal ganglia (BG) nuclei serotonergic innervations come mainly from the DRN. The BG are a highly organized network of subcortical nuclei composed of the *striatum* (caudate and putamen), *subthalamic nucleus* (STN), internal and external *globus pallidus* (or entopeduncular nucleus in rodents, GPi/EP and GPe) and *substantia nigra* (*pars compacta*, SNc, and *pars reticulata*, SNr). The BG are part of the cortico-BG-thalamic circuits, which play a role in many functions like motor control, emotion, and cognition and are critically involved in diseases such as Parkinson's disease (PD). This review provides an overview of serotonergic modulation of the BG at the functional level and a discussion of how this interaction may be relevant to treating PD and the motor complications induced by chronic treatment with L-DOPA.

#### **Keywords: 5-HT, basal ganglia, electrophysiology, Parkinson's disease, L-DOPA induced dyskinesia**

Serotonergic innervation in the brain originates from the raphe nuclei. Both the medial and the dorsal raphe nucleus (DRN) project to common areas implicated in motor control, such as the thalamus. Nevertheless, the basal ganglia (BG) nuclei receive serotonergic afferents predominantly from the DRN (reviewed in Di Matteo et al., 2008). The BG contain serotonin (5-HT) and its metabolite 5-hydroxyindoleacetic acid (5-HIAA) (Palkovits et al., 1974; Saavedra, 1977; Lavoie and Parent, 1990), the 5-HT transporter (SERT) and serotonergic receptors (from 5-HT1 to 5-HT7). These serotonergic receptors are unevenly expressed across the BG, and their distribution also differs between species. Here, we review the evidence supporting the serotonergic system as a modulator of BG function. Both physiological and pathological conditions are analyzed from the basic and clinical points of view.

#### **PHYSIOLOGICAL SEROTONERGIC MODULATION OF THE BASAL GANGLIA**

In accordance with its neuroanatomical distribution (as summarized in **Table 1**), 5-HT physiologically modulates BG nuclei activity by acting on serotonergic receptors.

#### **STRIATUM**

The striatum is the main input nucleus of the BG and a key neural substrate for motor function. Several studies have shown that 5-HT affects striatal function. In fact, both DRN stimulation and local administration of 5-HT into the striatum inhibit the vast majority of striatal cells (Olpe and Koella, 1977; Davies and Tongroach, 1978; Yakel et al., 1988). However, using intracellular recordings, some researchers have reported striatal excitatory postsynaptic potentials after DRN stimulation, as well as a 5-HT-induced increase in the firing rate of medium spiny neurons (MSN) (Vandermaelen et al., 1979; Park et al., 1982; Stefani et al., 1990; Wilms et al., 2001). Stimulation of presynaptic 5-HT1A and 5-HT1B receptors inhibits striatal 5-HT release (Gerber et al., 1988; Knobelman et al., 2000), and these receptors also control the release of other neurotransmitters in the striatum. Accordingly, 5-HT1A receptor activation decreases glutamate release from corticostriatal projections (Antonelli et al., 2005; Mignon and Wolf, 2005; Dupre et al., 2011, 2013). On the other hand, activation of 5-HT1B receptors indirectly stimulates the *substantia nigra pars compacta* (SNc) by decreasing GABA release from the *substantia nigra pars reticulata* (SNr), which in turn increases striatal dopamine levels (Gerber et al., 1988).

The 5-HT2 receptor family produces an inhibitory action on striatal neuron activity, mainly by modulating MSN (el Mansari et al., 1994; el Mansari and Blier, 1997). Moreover, Rueter et al. (2000) have shown that 5-HT2C receptors exert tonic inhibitory control over MSN membrane excitability. Other *in vivo* studies, however, have shown contradictory results suggesting that the effect of serotonergic drugs depends on the area of the striatum analyzed (Wilms et al., 2001). 5-HT2 receptor activation indirectly reduces the activity of striatal MSN by enhancing the inhibitory tone of cholinergic interneurons over these output neurons. The increased release of acetylcholine is due to activation of cholinergic interneurons mainly through 5-HT2C receptors, although the involvement of 5-HT6 and 5-HT7 receptors has also been demonstrated (Bonsi et al., 2007; Blomeley and Bracci, 2009). In addition, the activation of 5-HT2C receptors located on fast-spiking interneurons increases their excitability, causing an enhancement of GABAergic postsynaptic inhibition that also decreases the activity of striatal projecting neurons (Blomeley and Bracci, 2009).

+++*, strong;* ++*, moderate;* +*, weak. r, rodent; m, monkey; h, human. EP, entopeduncular nucleus; GPe, external segment of the globus pallidus; GPi, internal segment of the globus pallidus; STN, subthalamic nucleus; SNc, substantia nigra pars compacta; SNr, substantia nigra pars reticulata.*

#### **SUBTHALAMIC NUCLEUS**

5-HT exerts a complex effect in the *subthalamic nucleus* (STN), which is considered to be a powerful excitatory drive in the BG motor circuit. Both pharmacological lesion of the DRN and 5-HT depletion increase STN firing frequency and burst activity *in vivo* (Liu et al., 2007; Aristieta et al., 2013). Activation of 5-HT1A receptors has been reported to decrease STN excitability, whereas activation of 5-HT2C and 5-HT4 receptors increases it (Flores et al., 1995; Stanford et al., 2005; Xiang et al., 2005; Shen et al., 2007; Aristieta et al., 2013). In addition, activation of 5-HT1B receptors inhibits synaptic activity of STN neurons (Barwick et al., 2000; Shen and Johnson, 2008).

#### **GLOBUS PALLIDUS**

The *globus pallidus* (GP) has two segments: the external GP (GPe), which has a central position in the BG loop, and the internal GP (GPi/EP), which, together with the SNr, forms the output structures of the BG. In the GPe, 5-HT depletion decreases the firing frequency and increases the proportion of bursty and irregular neurons (Delaville et al., 2012b). In contrast, local application of 5-HT or selective serotonin reuptake inhibitor (SSRI) administration excites most GPe neurons (Querejeta et al., 2005; Zhang et al., 2010; Wang et al., 2013). These findings have been further confirmed by a patch-clamp recording study in which 5-HT perfusion produced a reversible depolarization of the GP neuron membrane potential, thereby increasing the firing rate of these neurons (Chen et al., 2008). *In vivo* studies indicate that the stimulatory effect of 5-HT on GPe neurons is mediated by the activation of 5-HT4 or 5-HT7 postsynaptic receptors, but not 5-HT2C and 5-HT3 receptors (Bengtson et al., 2004; Kita et al., 2007; Chen et al., 2008; Hashimoto and Kita, 2008). In contrast, 5-HT can decrease the presynaptic release of glutamate and GABA from the subthalamopallidal and striatopallidal terminals, respectively, through 5-HT1B receptors (Querejeta et al., 2005). In addition, 5-HT has been proposed to modulate the inhibitory and excitatory responses of GPe neurons to electrical stimulation of the motor cortex in awake monkeys (Kita et al., 2007). In fact, 5-HT suppresses GABAergic inhibitory responses to cortical stimulation through presynaptic 5-HT1B receptors and glutamatergic excitatory responses involving presynaptic or postsynaptic 5-HT1A receptors (Kita et al., 2007).

Few studies have been conducted to investigate the effects of 5-HT on the GPi/EP nucleus. Recently, it has been shown that intra-EP administration of a 5-HT2 receptor agonist promotes oral movements and inhibits EP neuronal activity in dopamine-depleted rats (Lagiere et al., 2013).

#### **SUBSTANTIA NIGRA**

Together with the GPi, the SNr constitutes the principal output nucleus of the BG and plays a relevant role in movement initiation. In this nucleus, 5-HT induces mostly an inhibitory effect *in vivo* (Dray et al., 1976; Collingridge and Davies, 1981), while 5-HT depletion decreases firing rate and increases burst activity of SNr neurons (Delaville et al., 2012a). Electrophysiological studies carried out in brain slices indicate that 5-HT not only excites SNr neurons acting directly on 5-HT2C receptors (Rick et al., 1995; Stanford and Lacey, 1996; Stanford et al., 2005) but also disinhibits SNr neurons by reducing GABA release from striatonigral terminals via presynaptic 5-HT1B receptor stimulation (Stanford and Lacey, 1996). A recent electrophysiological study reveals that presynaptic 5-HT1B receptor activation gates STN excitatory inputs to the SNr and reduces burst firing activity of the SNr, and therefore may be critically involved in movement control (Ding et al., 2013).

The role of 5-HT transmission in modulating the activity of dopaminergic SNc neurons is still unclear. Although the effect of 5-HT input seems to be inhibitory (Sinton and Fallon, 1988; Arborelius et al., 1993), chemical lesion of the DRN does not significantly alter SNc activity and DRN electrical stimulation only inhibits spontaneous activity in a subset of neurons (Kelland et al., 1990). Further, SSRI administration does not modulate SNc activity (Prisco and Esposito, 1995), and 5-HT depletion has been shown to either decrease or have no significant effect on SNc neuron excitability (Kelland et al., 1990; Minabe et al., 1996). Non-selective 5-HT2 receptor antagonists stimulate SNc neurons (Ugedo et al., 1989), whereas 5-HT4 receptor antagonists selectively prevent the stimulatory effect induced by haloperidol in this brain area (Lucas et al., 2001).

#### **IMPLICATION OF THE SEROTONERGIC SYSTEM IN PARKINSON'S DISEASE**

In the parkinsonian state and during subsequent replacement therapy with L-DOPA, the serotonergic system adapts to the lack of dopamine by undergoing anatomical and functional changes.

#### **SEROTONERGIC SYSTEM IN PARKINSON'S DISEASE AND PARKINSONIAN ANIMAL MODELS**

Parkinson's disease (PD) is a neurodegenerative disease typified by loss of dopaminergic neurons in the SNc and subsequent dopamine depletion in the striatum. In patients with PD, it is generally accepted that serotonergic neurotransmission decreases in advanced stages of the disease (Haapaniemi et al., 2001; Kerenyi et al., 2003) since the DRN, in addition to other nuclei, undergoes degeneration (Halliday et al., 1990; Jellinger, 1990). Moreover, 5-HT and 5-HIAA concentrations, as well as SERT expression, are reduced in several BG nuclei (Scatton et al., 1983; Raisman et al., 1986; D'Amato et al., 1987; Chinaglia et al., 1993; Kerenyi et al., 2003; Guttman et al., 2007; Kish et al., 2008; Rylander et al., 2010). Regarding receptor expression, 5-HT1A is decreased and 5-HT2C is increased in some BG nuclei (Fox and Brotchie, 2000; Ballanger et al., 2012) (**Figure 1**). The densities of other serotonergic receptors (5-HT1B/D, 5-HT3, and 5-HT4), however, are not modified by the dopaminergic loss (Steward et al., 1993; Reynolds et al., 1995; Wong et al., 1996; Castro et al., 1998). Overall, this dysfunctional serotonergic neurotransmission may be linked to the high prevalence of depressive symptoms in parkinsonian patients (Reijnders et al., 2008).

In animal models of parkinsonism, the changes occurring after dopaminergic lesion have not been equally reproduced by different research groups. The discrepancies between these studies may be due to the different protocols used for inducing the parkinsonian state, including the age of the animals, site of injection, concentration of the toxin, and the time between surgery and performing the studies. Several researchers have reported hyperinnervation (Zhou et al., 1991; Rozas et al., 1998; Balcioglu et al., 2003; Maeda et al., 2003), while others found no sprouting (Prinz et al., 2013), or even a decrease in striatal serotonergic fibers after dopaminergic damage (Takeuchi et al., 1991; Rylander et al., 2010). Along the same lines, striatal 5-HT levels have been found to be increased (Commins et al., 1989; Zhou et al., 1991; Karstaedt et al., 1994; Balcioglu et al., 2003), unchanged (Breese et al., 1984; Carta et al., 2006), or decreased (Frechilla et al., 2001; Aguiar et al., 2006, 2008). As detailed in **Figure 1**, studies performed in different animal models report unequal modification in serotonergic receptor expression across the BG nuclei. On the other hand, the DRN also undergoes adaptive changes after dopaminergic degeneration, such as increased 5-HT1A expression in MPTP monkeys (Frechilla et al., 2001) or weaker inhibitory effects of 5-HT1A agonists on neuron activity in rats (Wang et al., 2009). Electrophysiological studies using different 6-hydroxydopamine (6-OHDA) lesion models have shown increased basal firing rate of serotonergic cells in the parkinsonian state (Zhang et al., 2007a; Kaya et al., 2008; Wang et al., 2009; Prinz et al., 2013), while others show decreases (Guiard et al., 2008) or no changes (Miguelez et al., 2011).

In spite of the disparity of results, it seems clear that, to varying extents, the serotonergic system is affected in parkinsonian conditions. More clinical and preclinical studies using the same experimental models and larger sample sizes would help to clarify the role of the serotonergic system in each stage of PD.

**serotonergic receptor expression in pathological states.** Changes found in serotonergic receptor density in parkinsonian (left boxes) and dyskinetic (right boxes) patients or animal models compared to control conditions. Each nucleus and its modifications in receptor expression are encoded with the same color. GABAergic inhibitory pathways are represented in dark blue and

connections are indicated in green and serotonergic pathways in brown. DRN, *dorsal raphe nucleus*; GPi (EP), internal segment of the *globus pallidus (entopeduncular nucleus)*; GPe, external segment of the *globus pallidus*; STN, *subthalamic nucleus*; SNc, *substantia nigra pars compacta*; SNr, *substantia nigra pars reticulata*. r, rodent; m, monkey; h, human.

#### **SEROTONERGIC SYSTEM IN L-DOPA INDUCED DYSKINESIA**

The dopamine precursor L-DOPA is the most effective pharmacological treatment for PD, but it does not stop the progression of the disease. Moreover, long-term administration of L-DOPA induces motor complications, known as L-DOPA induced dyskinesias (LID), which have been related to adaptive changes of the serotonergic system. For example, a recent publication revealed that patients who had developed dyskinetic movements showed significant serotonergic hyperinnervation in the GPe and caudate, in comparison to non-dyskinetic individuals (Rylander et al., 2010). Such sprouting was directly correlated with the severity of motor complications. In contrast, other studies have shown that striatal *postmortem* 5-HT content and SERT levels did not differ significantly between dyskinetic and non-dyskinetic cases (Calon et al., 2003; Kish et al., 2008), and chronic L-DOPA treatment did not influence SERT expression (Politis et al., 2010). As for serotonergic receptors, a study performed in PD patients who had received L-DOPA treatment showed increased 5-HT1A expression in several cortical areas, while no modification in the striatum, GP, SN, or thalamus was reported (Huot et al., 2012b). In the SNr, 5-HT2C expression has also been observed to be raised in those patients (Fox and Brotchie, 2000).

The use of animal models has provided valuable data to better understand the physiopathological mechanisms of LID. The most widely used models include non-human primates injected with MPTP and rodent models with unilateral dopaminergic loss chronically treated with L-DOPA. Although differences may arise from the methodological protocols, such models are considered to reproduce symptoms and molecular changes resembling those observed in PD patients and to respond efficiently to antidyskinetic therapy (Iderberg et al., 2012). It is now well known that exogenously administered L-DOPA can be stored, transformed into dopamine, and released from serotonergic terminals to multiple brain regions, including the striatum, in an uncontrolled manner, producing a non-physiological stimulation of sensitized dopaminergic receptors (Arai et al., 1995; Carta et al., 2007; Yamada et al., 2007; Navailles et al., 2010b, 2013). Lesions of the DRN consistently prevent the expression of dyskinesia (Carta et al., 2007; Eskow et al., 2009) or dopamine release after an acute L-DOPA injection (Navailles et al., 2010b). This interaction between serotonergic and dopaminergic systems is reciprocal, as 5-HT levels also decrease after L-DOPA administration, and L-DOPA itself can antagonize the effect of serotonergic agents (Bartholini et al., 1968; Everett and Borcherding, 1970; Commissiong and Sedgwick, 1979; Borah and Mohanakumar, 2007; Navailles et al., 2010a; Riahi et al., 2011; Miguelez et al., 2013). In dyskinetic animals, SERT expression has been found to be up-regulated (Rylander et al., 2010), not modified (Prinz et al., 2013), or decreased (Nevalainen et al., 2011). Serotonergic receptor expression in the BG is unevenly modified with L-DOPA treatment: 5-HT2A and 5-HT1B receptor expression is increased (Zhang et al., 2008; Riahi et al., 2011, 2013; Huot et al., 2012c), while 5-HT1A receptor expression is increased (Huot et al., 2012a) or does not change (Riahi et al., 2012) (**Figure 1**). The primary modifications occurring in the serotonergic system are thought to take place at the terminal level because no changes in the number of serotonergic neurons (Rylander et al., 2010; Inden et al., 2012) or in 5-HT or dopamine levels in the DRN of dyskinetic rats have been reported (Bishop et al., 2012).

#### **CLINICAL RELEVANCE**

Although motor complications appear in the majority of patients who receive chronic treatment with L-DOPA, an effective pharmacological tool for avoiding or treating LID expression is still missing. In this sense, 5-HT1A/1C receptors, which are involved in the regulation of the ectopic dopamine release, are envisaged as promising targets. In 6-OHDA-lesioned rats and MPTP monkeys chronically treated with L-DOPA, 5-HT1A/1C receptor agonists reduce expression of LID without impairing the L-DOPA-induced improvement in motor performance (Bibbiani et al., 2001; Ba et al., 2007; Dupre et al., 2007). Furthermore, administration of the 5-HT1A agonist 8-OH-DPAT also prevents the L-DOPA-induced increase in extracellular dopamine (Nahimi et al., 2012). Other drugs that modulate 5-HT neurotransmission have shown efficacy against LID. Thus, a recent study has revealed that treatment with the 5-HT precursor 5-hydroxytryptophan reduces the appearance of LID in L-DOPA-primed rats (Tronci et al., 2013). The 5-HT2A receptor inverse agonist ACP-103 reduces tremor in rodents and LID in MPTP monkeys (Vanover et al., 2008). Acute and prolonged SSRI treatment attenuates the severity and development of LID in L-DOPA-primed and naive rats without interfering with motor improvement, an effect that may be mediated in part by 5-HT1A receptors (Bishop et al., 2012; Conti et al., 2014). In contrast, in PD patients, while buspirone, a partial 5-HT1A agonist, ameliorates dyskinesia (Kleedorfer et al., 1991; Bonifati et al., 1994), sarizotan, another 5-HT1A receptor agonist, failed to improve it compared with placebo (Goetz et al., 2008) and significantly increased *off* time (Goetz et al., 2007).

## **CONCLUDING REMARKS**

The effects of 5-HT in the BG depend on the specific nucleus and its receptor distribution. 5-HT inhibits MSN in the striatum through either direct or indirect activation of serotonergic receptors, and also inhibits the STN and SNr *in vivo*. In contrast, in the GPe the overall effect of 5-HT is excitatory. In other nuclei such as the EP or SNc the net effect is still not well understood.

The physiological serotonergic modulation may be modified in pathological conditions where the BG nuclei are highly affected. Here, we provide data regarding the alteration of the serotonergic system in PD, pointing out important discrepancies about the relationship between the serotonergic and dopaminergic systems in pathological states. In this regard, key methodological differences such as the use of different animal species and models, pharmacological treatments or stage of the disease in PD patients may explain these inconsistencies.

In summary, the serotonergic system is implicated in the modulation of BG activity and in the etiopathology of PD and LID. However, although results from preclinical studies indicate that serotonergic drugs may be suitable for treating LID, this has yet to be supported by clinical trials. Accordingly, further investigation is required to determine the most suitable serotonergic target to treat these motor disturbances.

## **ACKNOWLEDGMENTS**

This study was supported by the grants IT747-13, PI12/00613, UPV/EHU UFI11/32.

## **REFERENCES**


striatum with extensive dopaminergic denervation. *Neurosci. Lett.* 343, 17–20. doi: 10.1016/S0304-3940(03)00295-7


firing of the midbrain raphe nuclei 5-HT neurons and a decrease of their response to 5-HT(1A) receptor stimulation in the rat. *Neuroscience* 159, 850–861. doi: 10.1016/j.neuroscience.2008.12.051


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 January 2014; accepted: 27 February 2014; published online: 17 March 2014.*

*Citation: Miguelez C, Morera-Herreras T, Torrecilla M, Ruiz-Ortega JA and Ugedo L (2014) Interaction between the 5-HT system and the basal ganglia: functional implication and therapeutic perspective in Parkinson's disease. Front. Neural Circuits 8:21. doi: 10.3389/fncir.2014.00021*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Miguelez, Morera-Herreras, Torrecilla, Ruiz-Ortega and Ugedo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Serotoninergic and dopaminergic modulation of cortico-striatal circuit in executive and attention deficits induced by NMDA receptor hypofunction in the 5-choice serial reaction time task

## *Mirjana Carli\* and Roberto W. Invernizzi*

*Laboratory of Neurochemistry and Behavior, Department of Neuroscience, IRCCS-Istituto di Ricerche Farmacologiche "Mario Negri," Milano, Italy*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*Albert Adell, Spanish Council for Scientific Research, Spain; Sam Barnes, University of California, San Diego, USA*

#### *\*Correspondence:*

*Mirjana Carli, Laboratory of Neurochemistry and Behavior, Department of Neuroscience, IRCCS-Istituto di Ricerche Farmacologiche "Mario Negri," Via Giuseppe La Masa 19, Milano 20156, Italy e-mail: mirjana@marionegri.it*

Executive functions are an emergent property of neuronal processing in circuits encompassing frontal cortex and other cortical and subcortical brain regions such as basal ganglia and thalamus. Glutamate serves as the major neurotransmitter in these circuits, where glutamate receptors of the NMDA type play a key role. Serotonin and dopamine afferents are in a position to modulate intrinsic glutamate neurotransmission along these circuits and in turn to optimize circuit performance for specific aspects of executive control over behavior. In this review, we focus on the 5-choice serial reaction time task, which provides various measures of attention and executive control over performance in rodents, and on the ability of prefrontocortical and striatal serotonin 5-HT1A, 5-HT2A, and 5-HT2C as well as dopamine D1- and D2-like receptors to modulate different aspects of executive and attention disturbances induced by NMDA receptor hypofunction in the prefrontal cortex. These behavioral studies are integrated with findings from microdialysis studies. These studies illustrate the control of attention selectivity by serotonin 5-HT1A, 5-HT2A, 5-HT2C, and dopamine D1- but not D2-like receptors and a distinct contribution of these cortical and striatal serotonin and dopamine receptors to the control of different aspects of executive control over performance such as impulsivity and compulsivity. An association between NMDA antagonist-induced increase in glutamate release in the prefrontal cortex and attention is suggested. Collectively, this review highlights the functional interaction of serotonin and dopamine with NMDA-dependent glutamate neurotransmission in the cortico-striatal circuitry for specific cognitive demands and may shed some light on how dysregulation of neuronal processing in these circuits may be implicated in specific neuropsychiatric disorders.

**Keywords: 5-HT receptors, DA receptors, NMDA receptor, PFC, dorsal striatum, attention, executive functions, GLU release**

## **INTRODUCTION**

The integrated activity across frontal cortex and other cortical and sub-cortical brain regions supports a number of cognitive processes subsumed under the term "executive function." These cognitive processes comprise: selective allocation of attentional resources; maintenance, retrieval, and manipulation of information in working memory; formulation and planning of appropriate sequences of actions; inhibition of inappropriate responses; and decision-making on the basis of positive or negative outcomes. Neuropsychological evidence has suggested that executive functioning is critically dependent on the frontal cortex (Fuster, 2009) and indeed the terms executive function and frontal lobe function have often been used interchangeably. However, numerous studies in healthy human subjects, monkeys, and rats suggest that executive processes are not an exclusive property of frontal cortex but are mediated by networks incorporating multiple cortical regions (posterior/parietal and prefrontal) as well as cortico-striatal-thalamic circuitry linking regions of the frontal cortex via basal ganglia to the thalamus. The executive dysfunctions associated with basal ganglia disorders have provided further evidence that fronto-striatal circuitry rather than discrete frontal regions may be important in mediating these functions.

The neural activity in the cortico-striatal circuitry is modulated by a diversity of neurochemical influences, each contributing to its functional integrity in a specific manner. Glutamate serves as the major excitatory neurotransmitter in the brain. Given the multiplicity of its receptor subtypes, a particular neuron's response to glutamate is determined by the presence and organization of diverse receptor subtypes: ionotropic N-methyl D-aspartate (NMDA), AMPA and kainate receptors, and metabotropic mGlu receptors. The NMDA receptors are especially interesting, as various studies show that they are able to support persistent firing of cortical neurons (Compte et al., 2000; Wang, 2013). Evidence drawn from studies with rodents, monkeys and humans using multidisciplinary approaches has suggested that neuronal signaling via glutamatergic NMDA receptors plays a central role in prefrontal cortex (PFC) activity and its cognitive functions such as working memory, attention, and reversal learning (Malhotra et al., 1996; Moghaddam and Adams, 1998; Honey et al., 2003, 2004; Amitai and Markou, 2010; Neill et al., 2010; Arnsten et al., 2012; Pehrson et al., 2013; Wang et al., 2013).

The cortico-striatal circuitry receives innervation from all of the major ascending neurotransmitter systems, which include dopamine (DA), noradrenaline (NE), serotonin (5-HT), and acetylcholine (ACh). Notably, studies manipulating the activity of the ascending neurotransmitter systems have demonstrated a rather selective role of these neuromodulatory systems in executive functions (Robbins, 2013). DA appears to play a role in the stabilization of representations in processes such as working memory and attention control, while NE contributes by enhancing the signal in cognitive operations of the PFC. 5-HT has been shown to contribute to some of the processes implicated in cognitive flexibility and impulsivity. The ACh innervation of the PFC has been implicated in attention and spatial working memory. Among these neuromodulatory pathways, DA and 5-HT have received special attention for their putative involvement in the pathophysiology of neuropsychiatric disorders such as schizophrenia, where cognitive functioning is an important indicator of outcome (Green et al., 2004; Lewis, 2004; Gold et al., 2007; Luck and Gold, 2008).

The overlap and convergence of DAergic and 5-HTergic forebrain projections with glutamatergic projections provide a framework for a complex neuronal interaction, which could support various cognitive functions. Underlying the complexity of DA- and 5-HT-glutamate interaction is the co-localization of DA and 5-HT receptors with glutamate receptors within cortico-striatal circuitry. Thus, it is apparent that specific components of executive functions may be the result of convergence points between NMDA receptor signaling and the activity in these neuromodulatory systems. The two classes of DA receptors, D1-like (D1 and D5) and D2-like (D2, D3 and D4), all belong to the G-protein coupled receptors (GPCR); the D1-like receptors couple to the stimulatory Gs protein while the D2-like receptors couple to the inhibitory Gi/Go protein. So far, seven families of serotonin receptors have been identified, each with numerous subtypes. With the exception of the 5-HT3 receptor, a ligand-gated ion channel, the remaining receptors belong to the superfamily of GPCR. The electrophysiological, biochemical and behavioral characteristics of the interactions between DA and NMDA receptors and between 5-HT and NMDA receptors have been studied and reviewed extensively (Aghajanian and Marek, 2000; David et al., 2005; Castner and Williams, 2007; Tritsch and Sabatini, 2012; Celada et al., 2013; de Bartolomeis et al., 2013).

Here we will first discuss the role of prefrontocortical NMDA receptors in attention and executive control and in cortico-striatal activity. Next, we will review a series of our systematic studies comparing the performance of animals after pharmacological manipulation of DA and 5-HT receptor activity locally in the medial PFC (mPFC) or in the dorsomedial striatum (dm-STR). In these animals, glutamatergic activity was perturbed by blockade of NMDA receptors in the PFC during a task that entails selective attention and tight organization of a complex response sequence for optimal performance (Carli et al., 1983; Robbins, 2002) and that engages fronto-striatal-thalamic circuitry (Christakou et al., 2001; Chudasama and Muir, 2001; Rogers et al., 2001; Chudasama et al., 2003a). Finally, we will illustrate our findings that suggest an association between the NMDA receptor antagonist-induced increase in glutamate release and attention deficit.

## **NMDA RECEPTORS IN THE PFC, ATTENTION, EXECUTIVE CONTROL, AND CORTICO-STRIATAL ACTIVATION**

Attention allows the subject to engage with its environment by selecting information relevant for its behavior. The relevant information is selected by top-down modulation of neural activity in posterior cortical areas by signals arising from the PFC (Buschman and Miller, 2007; Saalmann et al., 2007; Noudoost et al., 2010). Various lines of evidence demonstrate that persistent firing of pyramidal cells not only supports working memory (Funahashi et al., 1989; Wang et al., 2013) but also contributes to the process of attentional selection (Lebedev et al., 2004). Activation of NMDA receptors on local recurrent synapses, rather than AMPA receptor stimulation, has been shown to support persistent neuronal activity within the mPFC during the delay period in a working memory task (Wang et al., 2013), but their contribution to attention-induced firing is unclear. However, the attention-driven improvements in signal stability and noise correlation in the macaque visual cortex (area V1) have been shown to depend on a high NMDA/AMPA receptor ratio (Herrero et al., 2013).

In our studies in rats we have focused on the control of attention, specifically on the process of input selection, that is, the selection of task-relevant inputs for further processing (Luck and Gold, 2008; Lustig et al., 2013). This aspect of attention is somewhat distinct from that in which the emphasis is on the selective activation and maintenance of task-appropriate rules (Luck and Gold, 2008; Gilmour et al., 2013). The most common experimental paradigms used for examining input selection processes of attention are continuous performance tasks, among which is the 5-choice serial reaction time (5-CSRT) task (Lustig et al., 2013). As in most cognitive tasks, successful performance requires the contribution of several factors other than the control of attention and may thus also tap executive control processes.

#### **5-CHOICE SERIAL REACTION TIME TASK**

For rats (Carli et al., 1983) (**Figure 1**), the requirement during 5-CSRT task performance is to sustain spatial attention divided among five locations in order to detect a brief visual stimulus over a large number of trials. Performance is characterized in terms of accuracy of visual discrimination, omissions, speed of responding and by different aspects of executive control such as premature and perseverative responses (see Robbins, 2002 for a detailed description and discussion of these performance measures). The main measure of selective spatial attention in the 5-CSRT task is accuracy of visual discrimination. Correct responses are rewarded by a food pellet, while incorrect responses or failure to respond within the allotted time (omission) result in a few seconds of darkness (time-out period). Accuracy is independent of omissions and is relatively impervious to potential confounds such as changes in motor activity or motivation (see Robbins, 2002). Premature responses, which occur before the onset of the visual stimulus, may arise as a consequence of the animal not being able to wait for a reward-related cue. These "impulsive" responses measure an aspect of response inhibition that is related to response selection but also to action restraint during waiting and could be considered a type of motor impulsivity (Evenden, 1999; Dalley et al., 2011). Nose poke responses made after the correct target detection has been performed are defined as perseverative responses and are considered an indicator of "compulsivity." Perseverative responses constitute persistence in an initially rewarded behavior such as a nose poke (even though it is no longer rewarded) and may be regarded as an inability to alter behavior in reaction to changing task demands, thus representing a measure of behavioral flexibility. Premature and perseverative responses result in a time-out. Responses during the time-out are not usually reported, even though they may constitute an additional parameter reflecting compulsivity (Amitai and Markou, 2010). Finally, a measure of response latency (i.e., mean latency to make a correct response) likely reflects decision time, as long as changes in motivation and motor status are ruled out.
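The performance measures described above can be illustrated with a minimal sketch. It assumes the common convention that accuracy is the percentage of correct responses among trials with a response (correct plus incorrect), which is what makes it independent of omissions; the class name `SessionCounts`, the function `summarize` and the example counts are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class SessionCounts:
    """Raw counts from one hypothetical 5-CSRT session (illustrative only)."""
    correct: int
    incorrect: int
    omissions: int
    premature: int
    perseverative: int


def summarize(counts: SessionCounts) -> dict:
    """Return percentage accuracy and omissions plus raw executive-control counts."""
    responded = counts.correct + counts.incorrect
    completed = responded + counts.omissions
    # Assumed convention: accuracy uses only trials with a response,
    # so it does not change when the number of omissions changes.
    accuracy = 100.0 * counts.correct / responded if responded else float("nan")
    # Assumed convention: omissions are relative to all trials with a stimulus.
    omission_pct = 100.0 * counts.omissions / completed if completed else float("nan")
    return {
        "accuracy_pct": accuracy,
        "omission_pct": omission_pct,
        "premature": counts.premature,
        "perseverative": counts.perseverative,
    }


# Hypothetical session, not data from any cited experiment.
session = SessionCounts(correct=80, incorrect=10, omissions=10,
                        premature=5, perseverative=3)
print(summarize(session))
```

Under this convention, raising the number of omissions while leaving correct and incorrect counts unchanged leaves the accuracy value untouched, which is the sense in which the two measures are independent.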

## **NMDA RECEPTORS IN THE mPFC AND 5-CSRT TASK PERFORMANCE**

The NMDA glutamate receptor is a ligand-gated ion channel composed of multiple subunits, which responds rapidly to glutamate by conducting cation currents that depolarize neurons. In the cerebral cortex NMDA receptors are preferentially expressed by pyramidal neurons, particularly in layers II, III, V, and VI, but also in excitatory and inhibitory axon terminals (Conti et al., 1997), particularly on parvalbumin-positive (PV+) GABA interneurons (Huntley et al., 1994). Changes in cortical NMDA transmission have consequences for other neurotransmitters locally (for example GABA) and distally (for example DA and 5-HT).

The selective blockade of NMDA receptors located in the mPFC by the competitive NMDA receptor antagonist 3-(R)-2 carboxypiperazin-4-propyl-1-phosphonic acid (R-CPP) has a profound impact on rats' performance in the 5-CSRT task (**Table 1**). The performance impairment is characterized by a deficit in accuracy, increased omissions and correct response latency, and by a concomitant loss of executive control in the form of increased premature and perseverative responses. These effects are robust (about a 20% decrease in accuracy, while the numbers of premature and perseverative responses are increased 2- to 3-fold) and consistent across many independent experiments. This pattern of effects resembles that seen after lesions of the mPFC (Muir et al., 1996; Passetti et al., 2002) and clearly implicates NMDA receptor signaling in the mPFC in the successful performance of the task.

The effects of systemic administration of non-competitive NMDA receptor antagonists such as phencyclidine (PCP), dizocilpine and ketamine on rats' performance in the 5-CSRT task appear to be highly dependent on the type of treatment regimen used. First exposure to these drugs often leads to nonspecific effects in some animals, such as ataxia and head weaving, which are incompatible with the performance of this task, while after repeated exposures these effects subside and rats start to show the characteristic deficit in performance: decreased accuracy and increased impulsivity and compulsivity (Grottick and Higgins, 2000; Higgins et al., 2003b; Le Pen et al., 2003; Amitai et al., 2007; Auclair et al., 2009; Amitai and Markou, 2010; Smith et al., 2011). In contrast, rats tested after a wash-out period from sub-chronic PCP treatment do not show any performance deficit in the 5-CSRT task. However, Barnes et al. (2012b), using a 5-choice continuous performance task (5C-CPT), which is a version of the 5-CSRT task specifically designed to add non-targets to which the subject must inhibit responding, were able to show an attention/vigilance deficit, but only when the attentional load was increased.

#### **Table 1 | Effects of blockade of frontocortical NMDA receptors on attentional performance.**

*Data from: <sup>a</sup>Mirjana et al., 2004; <sup>b</sup>Murphy et al., 2005; <sup>c</sup>Pehrson et al., 2013. mPFC, medial prefrontal cortex; PrL PFC, prelimbic prefrontal cortex; InF PFC, infralimbic prefrontal cortex; ACC, anterior cingulate cortex.*

↓*, decrease;* ↑*, increase; 0, no effect; nr, not reported*

Attentional impairment almost certainly accounts for the accuracy deficits observed in this task after injections of R-CPP (10–50 ng/side). However, the accuracy of rats in this task also depends on the temporal organization of behavior, as responses initiated late are more likely to be incorrect. Naïve rats make the majority of nose poke responses in the holes (about 80%, almost all correct) in a narrow time window (0–0.8 s) of stimulus presentation (Passetti et al., 2002). By analogy with what has been reported for mPFC-lesioned rats, it cannot be excluded that the temporal distribution of responses of R-CPP-injected rats was more random across a much larger time window, suggesting that they are "distracted/disorganized" (Passetti et al., 2002). A commission error in the 5-CSRT task may be the result of a faulty decision process, distraction or an inability to hold the planned response "on-line." Thus, it cannot be excluded that additional deficits in response selection and working memory, together with increased distractibility/disorganization, may account for the accuracy deficit after R-CPP. Impaired response selection is an important component of attentional deficit, and the correct response latency may reflect the speed of processing involved in the input selection mechanisms of attention, in the decisional processes of response selection, or both. Since correct and incorrect responses in this task have the same motor requirements, the slowing of correct but not incorrect responses after R-CPP rules out motor impairment and could suggest a slowing of input selection processing. However, dysfunctional mechanisms of stimulus detection, most likely due to distraction or temporal disorganization, may certainly contribute. Thus, it could be argued that on occasions when the animals were able to overcome "distraction" and respond correctly, they did so at the cost of slower responding. This indicates that animals injected with R-CPP, when they correctly detected the visual stimulus, could hold "on-line" a mental representation of the planned response well after the visual stimulus had disappeared. In line with this suggestion are observations that control animals compensate for the decreased salience of the visual stimulus by increasing their correct response latencies (Carli, 2006b). That deficits in working memory may not completely account for accuracy impairments is also suggested by the findings of Chudasama et al. (2005), who, using a combined attention-working memory task, showed that rats with PFC lesions were impaired on the attentional but not on the working memory component of the task.
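The argument about the temporal distribution of responses can be made concrete with a small sketch: given response latencies measured from stimulus onset, it computes the fraction that falls inside the narrow 0–0.8 s window mentioned above. The function name `fraction_within_window` and both latency lists are hypothetical illustrations, not data from Passetti et al. (2002) or from the R-CPP experiments.

```python
def fraction_within_window(latencies_s, window_end_s=0.8):
    """Fraction of latencies (seconds from stimulus onset) in [0, window_end_s]."""
    if not latencies_s:
        raise ValueError("no latencies supplied")
    inside = sum(1 for t in latencies_s if 0.0 <= t <= window_end_s)
    return inside / len(latencies_s)


# Hypothetical latencies, for illustration only.
focused = [0.3, 0.45, 0.5, 0.6, 0.7, 0.75, 0.4, 0.55, 1.2, 2.0]
dispersed = [0.3, 0.9, 1.5, 2.2, 0.6, 3.1, 4.0, 0.7, 2.8, 1.1]

print(fraction_within_window(focused))    # most responses fall in the window
print(fraction_within_window(dispersed))  # responses spread over a wider range
```

A lower fraction for the second list is the kind of pattern that would be read as temporally "disorganized" responding; whether R-CPP-injected rats actually show it is, as the text notes, not established.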

In addition to the accuracy deficit, R-CPP-injected rats made more omissions. This may suggest that the rats did not orient their attention to the stimulus presentation array in time or were engaging in some other behavior, thus missing the stimulus presentation. The accuracy and omission deficits were completely abolished by prolonging the stimulus duration (see Figure 3 in Mirjana et al., 2004). Because the frequency of stimulus presentation was regular relative to each trial initiation, when the stimulus duration was increased the position of the visual target in both space and time was emphasized, thus facilitating accurate responding. The mean latency to collect the earned reward, which represents an additional measure of motivation and/or motor function, was not affected by R-CPP. Together, these findings rule out the possibility that the R-CPP-induced impairments in accuracy and omissions were a consequence of hyperactivity, poor motivation or a failure to make associations or remember the general rules of the task.

Impulsivity and perseveration are both intimately related to executive attentional processes that enable accurate response selection in the face of distraction and interference (Shallice, 1982; Robbins, 1996). Increasing the duration of the target stimulus reduces premature and perseverative responses, while decreasing it increases them, suggesting that these responses in the 5-CSRT task may be under attentional control (Christakou et al., 2001; Carli, 2006b). However, the R-CPP-induced increase in anticipatory and perseverative responses persisted even when the longer stimulus helped alleviate the accuracy and omission deficits (see Figure 3 in Mirjana et al., 2004). It may be argued that there was a primary deficit of response inhibition, making the animals "impulsive" and "compulsive." Increased impulsivity in this task has been reported after highly arousing stimuli such as brief presentation of loud white noise during the waiting period (Carli et al., 1983), which may lead to attentional deficit (Carli et al., 1983). The inverted U-shaped function linking arousal and performance (Yerkes and Dodson, 1908) has been shown in human subjects performing a 5-CSRT task under conditions of elevated arousal (Wilkinson, 1963). The hypo-function of NMDA receptors in the mPFC may thus lead to a behavioral profile compatible with a state of hyper-arousal. A possible contribution of NMDA antagonist-induced NE release (Lena et al., 2007) to the state of hyper-arousal and consequent impairment in attention cannot be excluded, as high levels of tonic NE activity are associated with an inability to focus attention (Aston-Jones et al., 2000).

The increased perseverative responding, which is in line with that reported after excitotoxic lesions of the mPFC (Muir et al., 1996), could be the result of R-CPP preventing the suppression of responses that were once effective for obtaining reward. The perseverative deficit was not general; it was specifically directed to the stimulus array holes and not to the panel of the food magazine.

Evidence for functional heterogeneity of the rat PFC and of the NMDA receptors therein was provided by Passetti et al. (2002) and Chudasama et al. (2003b), who reported that impairments in attentional accuracy after lesions to the mPFC (Muir et al., 1996; Passetti et al., 2002) are mainly reproduced by lesions confined to more dorsal (Cg1) aspects of the PFC, sparing the prelimbic (PrL) and infralimbic (InF) sub-regions. However, the attentional deficit induced by R-CPP injections confined to PrL or InF was less well localized (Murphy et al., 2005). A recent study comparing systemic administration of the NMDA antagonist dizocilpine with its local application into the anterior cingulate cortex (ACC) (area Cg1) in rats performing a 3-choice version of the task reported that, while systemic administration of dizocilpine affected accuracy and omissions, local application increased omissions without concomitant changes in accuracy. This finding would suggest separable roles for NMDA receptors in the PFC and ACC in the control of attention (Pehrson et al., 2013). The PrL sub-region of the PFC has been shown to be particularly involved in perseverative responding (Passetti et al., 2002; Chudasama et al., 2003b), whereas lesions or blockade of NMDA receptors in the InF sub-region mainly affect premature "impulsive" responding (Chudasama et al., 2003b; Murphy et al., 2005). However, in the study by Murphy et al. (2005) perseverative responses were not affected by blockade of NMDA receptors in the PrL. The failure to see changes in this behavior may reflect the fact that, in contrast to studies in which perseverative errors are followed by darkness and time-out (Passetti et al., 2002; Chudasama et al., 2003b; Mirjana et al., 2004), in the study by Murphy et al. (2005) they had no consequences. This may suggest that NMDA receptors are implicated in the control of behaviors that are relevant for success and not of those without consequences.

#### **NMDA RECEPTORS IN THE mPFC AND CORTICO-STRIATAL ACTIVATION**

Numerous studies in rodents show that acute or repeated administration of NMDA antagonists such as PCP, dizocilpine and ketamine consistently leads to disinhibition of the firing of pyramidal neurons (Jackson et al., 2004), most probably by decreasing the activity of GABA interneurons (Homayoun and Moghaddam, 2007), whose response depends on the firing pattern of pyramidal cells (Thomson, 2000; Shi and Zhang, 2003). NMDA receptor antagonists increase glutamate, 5-HT and NE release in the PFC (Moghaddam et al., 1997; Moghaddam and Adams, 1998; Abekawa et al., 2006; Lena et al., 2007; Lopez-Gil et al., 2007). Systemic PCP and dizocilpine also reduce extracellular GABA in the mPFC (Yonezawa et al., 1998) and there is evidence that glutamate release is inhibited by GABA (Pende et al., 1993; Bonanno et al., 1997; Perkinton and Sihra, 1998). Similarly, intra-mPFC infusion of R-CPP in conscious rats increased glutamate efflux within this brain area (Ceglia et al., 2004; Abekawa et al., 2006; Calcagno et al., 2006, 2009) and lowered GABA levels (Calcagno et al., 2009; Agnoli et al., 2013). The glutamate increase elicited by R-CPP is suppressed by TTX added to the medium perfusing the microdialysis probe, suggesting that neuronal activity is required. Thus, the effect of R-CPP on extracellular glutamate may be mediated by direct or indirect suppression of cortical GABAergic transmission, which in turn enhances the release of glutamate. Employing a dual-probe microdialysis technique, we confirmed and extended these findings, showing that R-CPP infused in the mPFC also raised extracellular levels of cortical DA, whereas in the dm-STR extracellular levels of GABA were increased together with those of glutamate and DA (Agnoli et al., 2013). These data are summarized in **Table 2**.

The activation of glutamate neurotransmission in the mPFC and the increased firing of pyramidal projection neurons may drive the increase in endogenous glutamate release in the dm-STR, which in turn may increase GABA and DA release. The NMDA/GABA interaction regulates DA levels in the PFC and striatum (Balla et al., 2009). Reducing GABA transmission in the mPFC with the GABAA receptor antagonist bicuculline, or infusing glutamate, increases DA release in the dorsal-STR, and these effects are abolished by intracortical infusion of dizocilpine or the GABAA agonist muscimol (Matsumoto et al., 2003). It is conceivable that glutamate, by activating NMDA receptors on striatal medium spiny GABA neurons or interneurons, facilitates GABA release (Morari et al., 1993, 1994, 1996; Young and Bradford, 1993).

#### **Table 2 | Effects of R-CPP infused in the mPFC on glutamate (GLU), GABA and dopamine (DA) release in the PFC and dm-STR.**

*Data from: Ceglia et al., 2004; Calcagno et al., 2006, 2009; Carli et al., 2011a,b; Agnoli et al., 2013.*

↑*, increase;* ↓*, decrease.*

Systemic administration of NMDA receptor antagonists such as ketamine and PCP had no significant effect on extracellular glutamate in the striatum (Lillrank et al., 1994; Moghaddam et al., 1997), and either caused no change in or increased basal GABA release, while inhibiting K<sup>+</sup>-evoked GABA release (Lillrank et al., 1994; Hondo et al., 1995). These findings suggest that different NMDA receptor antagonists may have different effects on extracellular glutamate and GABA depending on the route of administration and the brain region considered.

These findings in rats are paralleled by data from functional magnetic resonance imaging (fMRI) in human subjects showing that an NMDA receptor antagonist such as ketamine, at doses that cause specific behavioral impairment in the executive component of a working memory task (Honey et al., 2003), increases the BOLD response in a brain system comprising frontal cortex, parietal cortex, putamen, and caudate nucleus (Honey et al., 2004) and increases glutamine, a putative marker of glutamate release (Rowland et al., 2005). Some more recent studies assessing PFC activation and global connectivity within a working memory network during rest or during task performance have reported an increased or decreased ketamine-associated activation, respectively (Driesen et al., 2013a,b).

## **SEROTONIN/NMDA RECEPTORS INTERACTION AND ATTENTION PERFORMANCE**

#### **CORTICAL 5-HT RECEPTORS**

The functions of 5-HT are afforded by the concerted actions of multiple 5-HT receptor subtypes and, as shown repeatedly, 5-HT through its receptor subtypes exerts diverse, often antagonistic actions on the same behavioral response. Several lesion and pharmacological studies have attempted to define the role of 5-HT and its various receptors in different aspects of the 5-CSRT task (for a review of these studies see Robbins, 2002).

The mPFC receives extensive 5-HT innervation from the dorsal (DR) and median (MR) raphe nuclei and contains several 5-HT receptors, with a particular abundance of the 5-HT1A, 5-HT2A and 5-HT2C subtypes (Azmitia and Segal, 1978; Steinbusch, 1984; Blue et al., 1988; Jakab and Goldman-Rakic, 1998, 2000; Barnes and Sharp, 1999; Clemett et al., 2000; Pandey et al., 2006). In the PFC the 5-HT1A and 5-HT2A receptors are expressed throughout cortical regions, with a greater proportion of expression on pyramidal neurons than on GABA interneurons (Santana et al., 2004). The 5-HT2C receptors are mainly expressed on pyramidal neurons (Clemett et al., 2000; Puig et al., 2010) and not in fast-spiking interneurons (Puig et al., 2010), but another immunohistochemical study using a different antibody shows more than 50% of the 5-HT2C receptors on GABA neurons (Liu et al., 2007). These 5-HT receptors have been extensively characterized in terms of their localization to pyramidal and GABA interneurons as well as biochemically and electrophysiologically, and a detailed review of their impact on cortical neuron activity can be found in Celada et al. (2013).

Stimulation of 5-HT1A receptors by 8-OH-DPAT inhibits NMDA-mediated synaptic excitation in the rat visual cortex (Edagawa et al., 1998) and suppresses glutamate signaling in the PFC by reducing NMDA and AMPA receptor currents (Cai et al., 2002). *In vitro* studies show that activation of 5-HT1A receptors reduces the NMDA-evoked elevation of glutamate release, while their blockade has the opposite effect (Matsuyama et al., 1996; Maura and Raiteri, 1996). In *in vivo* studies, PFC application of 8-OH-DPAT does not affect NMDA-evoked glutamate release, while the 5-HT1A receptor antagonist WAY100135 enhances basal and NMDA-evoked glutamate release in the striatum (Dijk et al., 1995). Additionally, 5-HT1A partial agonists and full antagonists attenuate working memory deficits as well as psychotomimetic effects induced by NMDA antagonists (Harder and Ridley, 2000; Wedzony et al., 2000). Stimulation of 5-HT1A somatodendritic autoreceptors in the DR or blockade of post-synaptic 5-HT1A receptors in the hippocampus remediates the spatial learning deficit induced by blockade of NMDA receptors (Carli et al., 1998).

Activation of 5-HT2A/2C receptors by DOI enhances the firing of pyramidal neurons (Puig et al., 2003) and 5-HT release dependent on activation of AMPA receptors (Martin-Ruiz et al., 2001) and increases glutamate levels in the somatosensory cortex (Scruggs et al., 2000, 2003). Activation of 5-HT2A/2C receptors in the PFC modulates GABAA receptor currents (Feng et al., 2001) and increases GABA release (Abi-Saab et al., 1999). Blockade of 5-HT2A receptors reduces NMDA antagonist-induced *fos* expression (Habara et al., 2001), motor hyperactivity (Gleason and Shannon, 1997; Martin et al., 1997, 1998; Swanson and Schoepp, 2002), forced swimming immobility (Corbett et al., 1999) and disruption of pre-pulse inhibition (PPI) (Varty et al., 1999). Blockade of 5-HT2C receptors enhances NMDA antagonist-induced motor hyperactivity and DA release (Hutson et al., 2000).

#### **PERFORMANCE IN THE 5-CSRT TASK**

The effects of 5-HT1A, 5-HT2A, and 5-HT2C receptor agents after systemic or intra-mPFC injections on attention and executive deficits induced by R-CPP (50 ng/side) injected in the mPFC are summarized in **Table 3**.

#### **ACCURACY**

The behavioral manifestation of the functional interaction of 5-HT1A, 5-HT2A and 5-HT2C receptors with NMDA receptors in the mPFC is the demonstration that the selective 5-HT1A receptor agonist 8-OH-DPAT and the 5-HT2A receptor antagonist M100907, as well as the 5-HT2C receptor agonist Ro60-0175, reversed the attentional performance deficit due to blockade of NMDA receptors in the mPFC, albeit in distinct manners (Mirjana et al., 2004; Carli et al., 2006a; Calcagno et al., 2009). Microinjections of 8-OH-DPAT or M100907 in the mPFC prevent the accuracy deficit (Carli et al., 2006a). Clearly, the functional opposition between the two 5-HT receptor subtypes on accuracy suggests that the improvement produced by M100907 and 8-OH-DPAT might reside in their opposite activity on common cellular substrates (Araneda and Andrade, 1991; Ashby et al., 1994; Celada et al., 2013). The 5-HT1A but not the 5-HT2A or 5-HT2C receptors appear to be involved in decision processes in this task, as 8-OH-DPAT but not M100907 or Ro60-0175 reduced correct response latency and omissions (Mirjana et al., 2004; Carli et al., 2006a; Calcagno et al., 2009). The DA system, and in particular D1 receptors in the PFC and in the dm-STR, has been shown to impact decision processes in this task (Granon et al., 2000; Robbins, 2002; see Table 2 in Agnoli et al., 2013). The fact that 8-OH-DPAT infused in the mPFC increases DA efflux in this cortical region (Sakaue et al., 2000) may at least in part explain its effects on speed and omissions.

#### **Table 3 | Summary of the effects of intra-mPFC R-CPP in combination with 5-HT1A, 5-HT2A, and 5-HT2C agents, an mGlu2/3 agonist and various antipsychotics on attention and executive control and glutamate (GLU) and serotonin (5-HT) release in the mPFC.**

*Data from: <sup>a</sup>Mirjana et al., 2004; <sup>b</sup>Ceglia et al., 2004; <sup>c</sup>Carli et al., 2006a; <sup>d</sup>Calcagno et al., 2006; <sup>e</sup>Calcagno et al., 2009; <sup>f</sup>Pozzi et al., 2011; <sup>g</sup>Baviera et al., 2008; <sup>h</sup>Carli et al., 2011b; <sup>i</sup>Carli et al., 2011a.*

↓*, decrease;* ↑*, increase; 0, reversal of R-CPP-induced effect.*

Injections of 8-OH-DPAT and M100907 in the mPFC in control rats had no effect on accuracy, which is in contrast to what has been reported by other studies. The effects of 5-HT1A agonists on accuracy in normal rats performing the task under basal conditions depend on whether the 5-HT1A somatodendritic autoreceptors or post-synaptic receptors are activated. Systemic 8-OH-DPAT impaired accuracy and this effect was abolished by 5,7-dihydroxytryptamine lesion or by blockade of 5-HT1A receptors in the DR with the selective 5-HT1A antagonist WAY100635 (Carli and Samanin, 2000). In contrast, Winstanley et al. (2003a) report a facilitation of accuracy after systemic or intra-cortical injections of 8-OH-DPAT. The role of 5-HT2A receptors in accuracy is much less clear, as systemic M100907 had no effect (Winstanley et al., 2004a) and intra-mPFC injection facilitated accuracy at long but not short stimulus durations (Winstanley et al., 2003a). The 5-HT2C receptors do not appear to control accuracy in normal rats, as no effect on accuracy has been reported after 5-HT2C receptor agonists or antagonists (Higgins et al., 2003a; Winstanley et al., 2004a; Fletcher et al., 2007, 2011).

#### **IMPULSIVITY AND COMPULSIVITY**

In contrast to the effects of the 5-HT2A receptor antagonist, which reduced premature responses but not perseverative over-responding after either systemic or intra-cortical injection (Mirjana et al., 2004; Carli et al., 2006a), activation of 5-HT1A receptors in the mPFC had no effect on premature responses but decreased perseverative over-responding (see **Table 3**). 5-HT acting on 5-HT2A receptors segregated to apical dendrites of pyramidal neurons (Jakab and Goldman-Rakic, 1998) and to GABA interneurons specialized in the perisomatic inhibition of pyramidal cells (Jakab and Goldman-Rakic, 2000) can affect excitatory input (Aghajanian and Marek, 1997) and, by acting on 5-HT1A receptors in the axon hillock (DeFelipe et al., 2001; Czyrak et al., 2003), can suppress the generation of action potentials along the axon and influence the activity in subcortical projection areas. Thus, by finely tuning the complex activity of glutamatergic pyramidal neurons, 5-HT may differently influence distinct aspects of executive control. These results clearly demonstrate the selectivity of executive control processes and indicate that impulsivity and compulsivity may be dissociated by 5-HT1A and 5-HT2A receptor mechanisms in the mPFC.

The effects of systemic M100907 and Ro60-0175 on R-CPP-induced impulsivity (Mirjana et al., 2004; Calcagno et al., 2009) are consistent with studies showing similar effects of these compounds on impulsivity but not compulsivity induced by systemic injections of the NMDA antagonists dizocilpine and Ro63-1908 (Higgins et al., 2003b; Fletcher et al., 2011). In contrast, the 5-HT2C antagonist SB242084 increased premature responses even in control rats and tended to enhance dizocilpine-induced impulsivity (Higgins et al., 2003b).

Previous studies have suggested that enhanced impulsivity in the 5-CSRT task is associated with increased 5-HT turn-over (Puumala and Sirvio, 1998) and release in the PFC (Dalley et al., 2002) and activation of 5-HT2A/2C receptors by DOI (Koskinen et al., 2000). However, global forebrain 5-HT depletion consistently results in enhanced impulsivity (Soubrié, 1986; Harrison et al., 1997; Carli and Samanin, 2000; Mobini et al., 2000). This apparent discrepancy may be resolved by 5-HT exerting inhibitory activity on impulsivity through 5-HT2C but not 5-HT2A receptors since decreasing their activity leads to impulsivity (Higgins et al., 2003b; Winstanley et al., 2004a; Fletcher et al., 2007). This suggestion is further supported by findings that activation of 5-HT2C receptors decreases while their suppression increases premature responding in the 5-CSRT task under various conditions such as when the waiting period is increased (Carli, 2006b; Fletcher et al., 2007) or premature responding is enhanced by NMDA receptor antagonists (Higgins et al., 2003b; Calcagno et al., 2009).

Like systemic NMDA receptor antagonists, intra-mPFC infusion of R-CPP enhances DA release in the mPFC (**Table 2**) (Moghaddam et al., 1997; Del Arco and Mora, 1999; Feenstra et al., 2002). Increasing DA transmission by d-amphetamine increases perseverative responses in the 5-CSRT task (Baunez and Robbins, 1999). Although microdialysis studies show that 8-OH-DPAT increases DA release in the mPFC (Arborelius et al., 1993; Sakaue et al., 2000), it actually reduces the rise in cortical DA release induced by d-amphetamine, stress and isolation rearing (Rasmusson et al., 1994; Kuroki et al., 1996; Ago et al., 2002) and attenuates d-amphetamine-induced motor activation (Przegalinski and Filip, 1997). The D2 receptor antagonist haloperidol also decreases R-CPP-induced perseverative responding (Baviera et al., 2008) (**Table 3**). It is plausible that 8-OH-DPAT could decrease perseverative responding through its action on DA mechanisms. However, the effects of 8-OH-DPAT were due to activation of 5-HT1A receptors in the mPFC, as the selective 5-HT1A antagonist WAY100635 completely blocked the effects of 8-OH-DPAT on the accuracy deficit and perseverative responding (Carli et al., 2006a).

## **COMPARISON WITH mGLU2/3 RECEPTORS**

It is worth noting that the effects of M100907 on R-CPP-induced impairments in 5-CSRT task performance and on the increase in glutamate release in the mPFC (see **Table 3**) are mimicked by the mGlu2/3 receptor agonist LY379268 (Pozzi et al., 2011). 5-HT-evoked excitatory post-synaptic currents are similarly inhibited by the 5-HT2A antagonist M100907 and by the mGlu2/3 receptor agonists (1S,3S)-ACPD and LY354740, and enhanced by the mGlu2/3 antagonist LY341495 (Aghajanian and Marek, 1999, 2000; Marek et al., 2000). Activation of 5-HT2A receptors by DOI or LSD increases excitatory post-synaptic currents and potentials, glutamate release and c-fos in the PFC, and induces the head-twitch response (Aghajanian and Marek, 2000; Gewirtz and Marek, 2000; Klodzinska et al., 2002; Zhai et al., 2003; Gonzalez-Maeso et al., 2008). All these effects are blocked by 5-HT2A antagonists or by mGlu2/3 agonists. This functional analogy may be based in part on the anatomical overlap of mGlu2, particularly in apical dendrites of lamina V, with the richest distribution of 5-HT2A receptors (Blue et al., 1988; Aghajanian and Marek, 1999; Marek et al., 2000, 2001), but may also derive from the mGlu2 receptor forming a complex with the 5-HT2A receptor through a specific transmembrane helix (Gonzalez-Maeso et al., 2008).

#### **DORSAL STRIATAL 5-HT RECEPTORS**

The 5-HT afferents arising mainly in the DR nucleus (Steinbusch, 1984) innervate all components of the basal ganglia circuitry (Lavoie and Parent, 1991). The fact that 5-HT modulates not only DA but also GABA and glutamate neurotransmission in the dorsal striatum and output regions of the basal ganglia (Nicholson and Brotchie, 2002) suggests a 5-HTergic regulation of action selection and motor control (Di Matteo et al., 2008), but little is known about the contribution of these afferents to cognitive function.

Among the various 5-HT receptor subtypes present within dorsal striatum the 5-HT2A and 5-HT2C receptors are particularly abundant (Barnes and Sharp, 1999). They are equally distributed on medium spiny neurons (MSN) forming the direct striatonigral and the indirect striatopallidal output projections but also on GABA and cholinergic (ACh) interneurons (Ward and Dorsa, 1996; Eberle-Wang et al., 1997). These 5-HT2 receptor subtypes play a prominent role in the modulation of striatal DA function (Abdallah et al., 2009; Navailles and De Deurwaerdere, 2011) and excite striatal ACh and fast spiking GABA interneurons (Blomeley and Bracci, 2005, 2009). The 5-HT2 receptor antagonists administered within the striatum block DA-mediated oral activity (Plech et al., 1995), synergize D1-induced locomotor activity (Bishop and Walker, 2003) and cause retrograde amnesia in rats (Prado-Alcala et al., 2003a,b). Loss of 5-HT2C receptors enhances behavioral sensitivity to D1 receptor activation (Abdallah et al., 2009).

#### **PERFORMANCE IN THE 5-CSRT TASK**

The effects of 5-HT2A and 5-HT2C receptor agents injected in the dm-STR on attention and executive deficit induced by R-CPP (50 ng/side) injections in the mPFC are summarized in **Table 4**.

#### **ACCURACY**

Activation of 5-HT2C or blockade of 5-HT2A receptors in the dm-STR reduces the accuracy deficit induced by R-CPP. These data concur with the data discussed above showing opposing roles of these receptors on the neurochemical processes that support the 5-CSRT task performance deficits induced by NMDA receptor antagonists. However, cortical 5-HT2A receptors exert a much more effective control over attention, as a much higher dose of M100907 had to be administered in the dm-STR than in the mPFC to achieve an effect on accuracy. The PFC shows much higher levels of 5-HT2A hybridization signals than the dorsal striatum (Pompeiano et al., 1994) and there is a substantial and reciprocal control of the activity of DR cortical 5-HT neurotransmission by the PFC (Hajos et al., 1998, 1999; Celada et al., 2001). This control has an important functional role; for example, 5-HT depletion abolishes the facilitatory effects of M100907 on accuracy and its ability to prevent R-CPP-induced glutamate release in the mPFC (Winstanley et al., 2004a; Calcagno et al., 2009); it also plays a role in stress-induced activation of the DR (Amat et al., 2005). The DR, from which the 5-HT projection to the dorsal striatum originates, does not receive reciprocal innervation from the striatum, indicating no direct modulation by striatal feedback (Casanovas et al., 1999).

**Table 4 | The effects of serotonin and dopamine receptor agents injected in the dm-STR on attention and executive deficits induced by R-CPP injections in the mPFC.**

*Data from <sup>a</sup>Agnoli and Carli, 2011; <sup>b</sup>Agnoli et al., 2013.*

*↓, decrease; ↑, increase; 0, reversal of R-CPP-induced effect.*

#### **IMPULSIVITY AND COMPULSIVITY**

In contrast to the lack of effect of systemic or intracortical M100907 and Ro60-0175 on perseverative responding, M100907 and Ro60-0175 administered locally in the dm-STR reduced perseverative responding caused by R-CPP. As systemic injections of these 5-HT2 agents had no effect on R-CPP-induced perseverative over-responding, it is conceivable that the reduction of perseverative responding in one brain area, such as the dm-STR, is compensated by opposite effects in other brain regions. In fact, M100907 injected in the ventral tegmental area (VTA) further enhanced perseverative responding caused by blockade of NMDA receptors in the mPFC (Agnoli and Carli, 2012). These findings are in keeping with evidence that different 5-HT receptor subtypes have distinct roles in the modulation of perseverative responses, depending on the type of cognitive process engaged by the task and on the brain area. For example, 5-HT in the PFC is not essential for higher-order shifting of attentional set, while it is critical for flexible responding in a reversal learning task (Clarke et al., 2005). The 5-HT2A and 5-HT2C receptors exert functionally opposing actions on perseverative responding in a spatial-reversal task (Boulougouris et al., 2008), while suppression of 5-HT2C receptors in the orbitofrontal cortex but not in the PFC decreases perseverative errors in a similar reversal-learning task (Boulougouris and Robbins, 2010).

It is worth noting that the ability of intra-STR M100907 and Ro60-0175 to remove impulsivity and compulsivity induced by blockade of mPFC NMDA receptors is remarkably similar to what was found after systemic or intra-dm-STR injections of the D2 receptor antagonist haloperidol (Baviera et al., 2008; Agnoli et al., 2013). It should be noted that blockade of NMDA receptors in the mPFC increases glutamate, GABA, and DA release in the dm-STR (see **Table 2**). As pointed out above, substantial neurochemical and behavioral evidence supports the suggestion that 5-HT can influence DA's effects in the striatum, which may be of relevance for the observed analogy in the behavioral effects of 5-HT2A and 5-HT2C receptor agents and D2 receptor antagonists. However, as M100907 and Ro60-0175 but not haloperidol had additional effects on accuracy deficit, other, likely non-D2, mechanisms in the dm-STR may also contribute. Local infusion of M100907 has been shown to decrease basal and MPTP-stimulated glutamate levels in the dorsal striatum and ameliorate behavioral impairment of MPTP-treated mice (Ansah et al., 2011).

Activation of 5-HT receptors in the striatum elicits predominantly inhibitory responses in the medium spiny (MS) projection neurons (el Mansari et al., 1994; el Mansari and Blier, 1997). 5-HT, through 5-HT2C receptors, excites striatal ACh interneurons, which in turn inhibit the glutamatergic input to MS projection neurons (Pakhotin and Bracci, 2007). Notably, changes in the firing activity of ACh interneurons encode behaviorally relevant information (Morris et al., 2004; Yamada et al., 2004). Activation of 5-HT2C receptors strongly increases the firing of GABAergic interneurons in the striatum, which potently inhibit striatal output (Blomeley and Bracci, 2009). Thus, 5-HT2 receptor subtypes, through a likely action on glutamate, ACh and GABA mechanisms in the dm-STR, may integrate the glutamate cortico-striatal inputs critical for the different aspects of performance in the 5-CSRT task.

### **DOPAMINE/NMDA RECEPTORS INTERACTION AND ATTENTION PERFORMANCE**

#### **DORSAL STRIATAL D1- AND D2-LIKE RECEPTORS**

DA receptors are broadly expressed in the brain, with a distribution largely matching the density of innervating DA fibers (Bentivoglio and Morelli, 2005). Among the DA receptors, the D1 and D2 receptor subtypes display the most widespread distribution and the highest expression levels. They are most prominent in the dorsal and ventral striatum, olfactory tubercle, and cortex (Bentivoglio and Morelli, 2005).

In the striatum, the D1 and D2 receptors are segregated to the two MSN output populations forming the direct striatonigral and indirect striatopallidal pathways, respectively. However, both D1 and D2 receptors are expressed in a subset of MSNs in the striatum. Whether the cooperative effects of D1 and D2 receptors observed in some studies (Perreault et al., 2011) arise from complex network interactions or from their co-localization in some MSNs is unclear. Striatal interneurons, although proportionally few (5–10% of the total neuronal population), exert a powerful influence on striatal output. Five distinct GABA subtypes (distinguished by different neuropeptide expression, synthetic enzymes and calcium binding proteins) and one type of ACh interneuron are present. D2 and D1-like (D5) receptors are expressed in ACh interneurons, while some GABA interneurons express D1-like (D5) receptors. The D2 receptor is also expressed on presynaptic terminals of DA afferents as well as in glutamatergic cortical and thalamic afferents. D1 receptors have also been found in a small number of presynaptic glutamatergic terminals (Bentivoglio and Morelli, 2005).

Numerous studies have demonstrated that DA, through pre- and post-synaptic D1 and D2 receptors, modulates the probability of release at glutamate, GABA and ACh terminals, ionotropic glutamate and GABA receptor function and trafficking, and postsynaptic excitability and synaptic integration in striatal projection neurons and interneurons as well as in cortical pyramidal cells and interneurons. DA bi-directionally modulates synaptic NMDA receptors through its D1- and D2-like receptors, but the responses of individual neurons across brain areas and the intracellular pathways recruited vary greatly. These studies (for a comprehensive review see Surmeier et al., 2010; Tritsch and Sabatini, 2012) reveal the complex nature and consequences of this modulation on neural networks implicated in motor, cognitive, and motivational processes (Di Chiara, 2005; Dunnett, 2005; Robbins, 2005; Berridge, 2007; Salamone and Correa, 2012).

#### **PERFORMANCE IN THE 5-CSRT TASK**

The effects of D1- and D2-like receptor antagonists SCH23390 and haloperidol injected in the dm-STR on accuracy and executive deficits induced by R-CPP injections in the mPFC are summarized in **Table 4**.

#### **ACCURACY**

In rats in which accuracy was impaired by intra-mPFC injections of R-CPP (50 ng/side), suppression of D1-like receptor activity in the dm-STR by an antagonist such as SCH23390 prevented the accuracy deficit. It could be argued that in rats performing the 5-CSRT task at a very high level of efficiency (i.e., high accuracy) the activity at dorsal striatal D1-like receptors may already be at a maximum and any further activation, such as that produced by intra-mPFC injections of R-CPP (**Table 2**), may have detrimental effects. On the other hand, concomitant infusion of SCH23390 and R-CPP in the dm-STR at individually ineffective doses had detrimental effects on accuracy (Agnoli and Carli, 2011). Thus, suppression of D1 receptor function in the dm-STR has positive or detrimental effects on accuracy depending on whether corticostriatal neurotransmission is increased or decreased. Interestingly, another study also reported that systemic SCH23390 tends to improve the accuracy of rats with excitotoxic lesions of the mPFC (Passetti et al., 2003b).

These findings stand rather alone, as in the majority of published studies the effects of D1-like agents were examined in normal rats performing the task under baseline conditions. The picture that emerges is that the detrimental effects of SCH23390 on the accuracy of rats performing the task under basal conditions depend on dose, brain area, and baseline level of accuracy (Granon et al., 2000; Pezze et al., 2007). In rats performing the task at a relatively high level of accuracy (between 80 and 90%), SCH23390 injected in the mPFC or NAC impairs accuracy (Granon et al., 2000; Pezze et al., 2007), while the same dose (100 ng) injected in the dm-STR had no effect (Agnoli and Carli, 2011). Similarly, the effect of SKF38393, a D1-like receptor agonist, on accuracy is also baseline dependent; injected in the dm-STR or given systemically, it impairs the accuracy of rats performing at high levels of efficiency (about 90% correct), whereas injected in the mPFC it boosts accuracy but only in poorly performing rats (about 70% correct) (Granon et al., 2000). SKF38393 injected in the NAC boosts the accuracy of rats performing at less than 80% correct but only at the lowest dose tested (100 ng), while the same dose injected in the dorsolateral striatum had no effect. These discrepancies in the effects of D1 receptor agents on accuracy are far from surprising, as it has been repeatedly shown that the effects of D1 receptor manipulation on task performance depend on the optimal levels of DA for the particular task (Sawaguchi and Goldman-Rakic, 1991; Arnsten, 1997; Zahrt et al., 1997; Granon et al., 2000; Pezze et al., 2003; Chudasama and Robbins, 2004; Robbins, 2005).
This is reminiscent of the Yerkes-Dodson principle, based on the inverted U-shaped function relating levels of arousal/activation to the efficiency of behavioral performance (Robbins, 2005), and also of the inverted U-shaped function relating D1 receptor activation and NMDA-EPSC changes (Seamans and Yang, 2004; Trantham-Davidson et al., 2004; Tritsch and Sabatini, 2012). As for the PFC, an inverted U-shaped function may relate D1 receptor stimulation in the dm-STR to the efficiency of attentional functioning.
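
The baseline dependence described above can be illustrated with a toy calculation. The sketch below is not a model fitted to any of the reviewed data: the quadratic form, the function names, and every parameter value (optimum, peak accuracy, curvature, size of the antagonist-induced reduction) are hypothetical choices made only to show how a single inverted U-shaped function can yield opposite effects of the same D1 receptor manipulation.

```python
def accuracy(d1_activation, optimum=1.0, peak=90.0, curvature=40.0):
    """Toy inverted U: percent correct as a quadratic function of D1 activation.

    All parameters are hypothetical and in arbitrary units.
    """
    return peak - curvature * (d1_activation - optimum) ** 2


def effect_of_antagonist(baseline_activation, reduction=0.3):
    """Change in accuracy when D1 activation is lowered by `reduction`."""
    lowered = baseline_activation - reduction
    return accuracy(lowered) - accuracy(baseline_activation)


# Supra-optimal activation (assumed here to stand for enhanced
# cortico-striatal drive): lowering D1 activity moves toward the optimum.
over = effect_of_antagonist(1.5)

# Sub-optimal activation: the same reduction moves away from the optimum.
under = effect_of_antagonist(0.8)

print(f"supra-optimal baseline: {over:+.1f} points")
print(f"sub-optimal baseline:   {under:+.1f} points")
```

Under these assumptions the same reduction in D1 activation raises accuracy when the starting point lies to the right of the optimum and lowers it when the starting point lies to the left, which is the qualitative pattern the inverted U-shaped account predicts.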

Contrasting with the findings above, suppression of D2-like receptor activity in the dm-STR by local injections of haloperidol has no effect in control rats and is unable to recover the accuracy deficit caused by blockade of NMDA receptors in the mPFC (Agnoli et al., 2013). A similar lack of effect on the R-CPP-induced accuracy deficit is observed after systemic haloperidol (Baviera et al., 2008). However, the doses used in these studies were effective in reversing other R-CPP-induced effects (see below). That D2-like receptors in the dorsal striatum are unlikely to be involved in governing accuracy is supported by data showing that their activation by quinpirole, an agonist at these receptors, has no effect whether injected in the dorsomedial or dorsolateral striatum (Pezze et al., 2007; Agnoli et al., 2013). Doses of haloperidol and quinpirole higher than those reported in the studies by Baviera et al. (2008) and Agnoli et al. (2013) cannot be tested in the 5-CSRT task, as rats stop responding or make mostly omissions.

Systemic haloperidol does not allow for the precise definition of the locus of D2 receptor suppression. However, after systemic administration haloperidol binds a comparable proportion of D2 receptors in the striatum (caudate-putamen) and in the frontal cortex (Mukherjee et al., 2001), and the protein structure of the D2 receptors throughout the brain is similar and so is their *in vitro* affinity (Seeman and Ulpian, 1983). It is worth noting that a chemically different D2-like receptor antagonist, l-sulpiride, injected in the mPFC had no effect on accuracy (Granon et al., 2000). Although these findings may suggest that cortical D2-like receptors do not contribute to accuracy, further studies are necessary to better delineate the role of PFC D2 receptors in attention. However, l-sulpiride given systemically or in the NAC impairs accuracy in control rats but prevents the accuracy deficit in rats bearing excitotoxic lesions of the mPFC (Passetti et al., 2003b; Pezze et al., 2009). L-sulpiride does not discriminate between D2 and D3 receptor subtypes (Missale et al., 1998). The D3 receptors are present at very low levels in the mPFC and dorsal striatum but are particularly abundant in the NAC and limbic regions (Sokoloff et al., 1990; Bentivoglio and Morelli, 2005), and hence the possibility that D3 receptors in the NAC may account for the effect of l-sulpiride cannot be excluded. However, as systemic or intra-NAC nafadotride, a preferential D3 (compared to D2) receptor antagonist (Sautel et al., 1995), has no effect on accuracy (Besson et al., 2010), the precise contribution of D3 receptors to the control of accuracy has yet to be fully disclosed.

Intra-dm-STR injections of haloperidol or SCH23390 did not reduce the R-CPP-induced increase in omissions and correct response latencies. It is unlikely that the increased proportion of omissions was due to a change in motivation, as the latency to collect the food, which is a more direct measure of motivation, was not affected. This increase in omissions may indicate an inability to maintain voluntary control over sustained performance due to motor hyperactivity. However, haloperidol and SCH23390 did not reduce R-CPP-induced motor hyperactivity (Agnoli, 2011). SKF38393 speeded correct responses and decreased omissions when injected in the dm-STR (see Table 2 in Agnoli et al., 2013), similar to what was reported after intra-mPFC injection of this compound, while systemic SKF38393 decreases correct response latencies (Passetti et al., 2003a). These findings are broadly consistent with a general performance scaling function of tonic DA activity (Cagniard et al., 2006) and with evidence from other reaction time tasks that striatal DA is implicated in decisional processes (Carli et al., 1985, 1989; Robbins and Brown, 1990; Brown and Robbins, 1991).

#### **IMPULSIVITY AND COMPULSIVITY**

Dorsomedial striatal D1-like and D2-like receptors play an important role in the expression of impulsivity in the 5-CSRT task, as both SCH23390 and haloperidol injected in the dm-STR dose-dependently reversed R-CPP-induced premature over-responding, a proxy of impulsivity. Blockade of these D1- and D2-like receptors in control conditions had no effect or decreased premature responses, depending on the dose employed and the number of premature responses made by rats under the control condition (Agnoli and Carli, 2011; Agnoli et al., 2013), while their activation by SKF38393 and quinpirole, respectively, increased premature responses (Agnoli et al., 2013). The finding that R-CPP-induced motor hyperactivity was not affected by SCH23390 and haloperidol (Agnoli, 2011) suggests that their ability to decrease R-CPP-induced impulsivity is not due to an alteration in motor activity and helps dissociate impulsivity from changes in motor activity.

These findings question the prevailing hypothesis that impulsivity can be mostly attributed to the mesolimbic and not the nigrostriatal DA system, as d-amphetamine-induced impulsivity in the 5-CSRT task was abolished by ventral striatal but not dorsal striatal DA depletion (Cole and Robbins, 1989; Baunez and Robbins, 1999). In addition, other studies had shown that D1- but not D2-like receptors in the NAC core contribute to impulsivity under basal conditions (Pezze et al., 2007), whereas D2-like receptors in the NAC appear to come into play only under perturbed conditions such as those induced by amphetamine or in rats made impulsive by excitotoxic lesion of the PFC (Cole and Robbins, 1987; Pattij et al., 2007; Pezze et al., 2009). In the case of high-impulsive rats, the D2/3 antagonist nafadotride alleviated or exacerbated impulsivity depending on whether it was injected in the core or shell sub-region of the NAC, respectively (Besson et al., 2010). However, DA depletion in the dorsal striatum reversed impulsivity in the 5-CSRT task induced by lesions to the subthalamic nucleus (Baunez and Robbins, 1999), and D2 receptor availability in the dorsal striatum was associated with impulsiveness in methamphetamine-dependent subjects (Lee et al., 2009). Thus, the modulation of impulsivity by DA mechanisms in the dorsal striatum may be detected in particular conditions.

Haloperidol but not SCH23390 injected in the dm-STR reduces perseverative responding induced by blockade of NMDA receptors in the mPFC. The effects of systemic and intra-dm-STR injected haloperidol are remarkably similar; both reduce perseverative and premature over-responding but not the accuracy deficit (see **Tables 3**, **4**). These findings are in accord with a study showing that the "compulsive" stimulus-bound perseveration of monkeys after frontal ablation is also alleviated by haloperidol (Ridley et al., 1993). However, other studies report that l-sulpiride, after either systemic or intra-NAC injections, had no effect in rats made compulsive by excitotoxic lesions of the mPFC (Passetti et al., 2003b; Pezze et al., 2009). The causes of the lack of effect of this D2/D3 antagonist are not clear and may depend on various factors, such as whether emitting a perseverative response leads to a behavioral consequence or not, the brain area, or other variables; for example, it has been repeatedly shown that similar pharmacological manipulations increase perseverative responding when these responses lead to time-out but not when they are without consequences (Harrison et al., 1999; Robbins, 2002; Mirjana et al., 2004; Winstanley et al., 2004a; Murphy et al., 2005).

On the other hand, in normal rats performing the 5-CSRT task at baseline conditions, activation of D2-like receptors in the dm-STR dose-dependently increases perseverative responding (Agnoli et al., 2013), similar to what was found after injections of similar doses of quinpirole in the NAC core but not after injections in the dorsolateral striatum (Pezze et al., 2007). That the perseverative responses in the 5-CSRT task may be modulated by the nigrostriatal DA system is also suggested by the paradoxical increase in these responses after dorsal striatal DA depletion (Baunez and Robbins, 1999), most likely due to the supersensitivity of D2 receptors after 6-hydroxydopamine lesion (Ungerstedt, 1971). These findings are in agreement with other studies linking changes in D2 receptor function at various nodes of the cortico-striatal circuit to flexible modification of behavior. Although it cannot be assumed that perseverative errors in the 5-CSRT task and those made in other tasks, such as set-shifting, reversal learning or working memory, represent the same psychological process, Floresco et al. (2006) report an increased number of perseverative errors after blockade of D2-like receptors in the mPFC in a maze-based set-shifting task, while Goto and Grace (2005) report that PFC-dependent perseveration in a task requiring an egocentric response strategy depends on tonic DA release and D2-like receptor stimulation in the striatum. In addition, mice over-expressing D2 receptors in the striatum make more perseverative errors in a working memory task (Kellendonk et al., 2006). D2 receptor stimulation by quinpirole increases perseverative but not learning errors of rats performing a spatial reversal task (Boulougouris et al., 2009). The probability of perseverative responses of monkeys performing a three-choice reversal task is also related to D2 receptor availability in the dorsal striatum (Groman et al., 2011).

The lack of effects of D1-like receptor agents injected either systemically or in the mPFC, NAC or dm-STR on perseverative responding in the 5-CSRT task (Granon et al., 2000; Pezze et al., 2007; Agnoli and Carli, 2011; Barnes et al., 2012a; Agnoli et al., 2013) contrasts with evidence that D1-like receptors in the mPFC or NAC control perseverative-type errors in set-shifting and working memory tasks (Zahrt et al., 1997; Ragozzino, 2002; Haluk and Floresco, 2009). Thus, both DA receptor subtypes act in a cooperative manner to control a component of set-shifting, such as the ability to disengage from the previously effective but now inappropriate strategy, whereas in the 5-CSRT task they appear to control separate cognitive processes, such as those engaged by accuracy of visual discrimination and perseverative responding. However, the perseverative over-responding in the 5-CSRT task may result from a deficit in the selection and integration of an adequate response in a long sequence leading to reward, rather than from the inability to flexibly adapt to shifts between rules, strategies and sets. The organization of complex sequences of actions and the ordering of movements within a sequence implicate the dorsal striatum with its DA afferents (Graybiel, 1998; Hikosaka et al., 1998; Bailey and Mair, 2006; Jin and Costa, 2010; Jin et al., 2014). Notably, the D1-striatonigral and D2-striatopallidal basal ganglia pathways show concomitant activity during action selection and initiation but behave differently during the execution of action sequences (Jin et al., 2014).

## **ATTENTION IMPAIRMENT AND GLUTAMATE RELEASE IN THE mPFC**

One of the characteristics of the microdialysis technique is the possibility to deliver drugs through the probe while collecting neurotransmitters generated and secreted by cells. In our microdialysis studies, unilateral perfusion of 100 µM R-CPP through the probe in the mPFC for 1 h evoked a marked and reliable increase of glutamate, 5-HT and DA and a reduction of GABA therein (**Tables 2**, **3**). However, the non-competitive NMDA receptor antagonists dizocilpine and ketamine increased cortical 5-HT efflux after bilateral but not unilateral infusion into the mPFC (Lopez-Gil et al., 2012). Although different administration techniques were used in behavioral (intraparenchymal injection) and microdialysis (perfusion through the probe) studies, the total amounts of R-CPP delivered were similar (see Discussion in Calcagno et al., 2009). In addition, extracellular glutamate increased to a similar extent after R-CPP perfusion through the probe or intraparenchymal injection of the drug (50 ng/side) at the same dose as used in behavioral studies (see Figure S2 in Calcagno et al., 2009). This strengthens the link between microdialysis and behavioral data.

The proposal that excessive prefronto-cortical glutamate release plays a key role in cognitive deficit stems from the study by Moghaddam and Adams (1998) and is fuelled by a series of observations summarized in **Table 3**. While impulsivity and compulsivity do not appear to be associated with glutamate release in the PFC, **Table 3** illustrates a tight association between the ability of several compounds to prevent R-CPP-induced attention deficits in the 5-CSRT task and their ability to suppress the stimulation of glutamate release in the rat mPFC. The first evidence for this association was obtained with the selective 5-HT2A receptor antagonist M100907. It was found that the same systemic doses of M100907 that prevented the attention deficit in the 5-CSRT task abolished the R-CPP-induced glutamate release in the mPFC (Ceglia et al., 2004). However, another study failed to observe such an interaction (Adams and Moghaddam, 2001). The perfusion of M100907 through the probe mimicked the effect of systemic injection in suppressing the R-CPP- (Ceglia et al., 2004) and dizocilpine-induced rise of extracellular glutamate in the mPFC (Lopez-Gil et al., 2007). These findings indicate that cortical 5-HT2A receptors may play a major role and that the stimulation of glutamate release may play a role in the attentional performance deficits caused by NMDA receptor blockade. Cortical 5-HT1A and 5-HT2A receptors co-localize in most pyramidal neurons of the mPFC (Santana et al., 2004) and exert opposite effects on their excitability (Araneda and Andrade, 1991; Ashby et al., 1994), head-twitch behavior (Darmani et al., 1990) and cortical dopamine release induced by D2 receptor blockade (Ichikawa et al., 2001). On this basis, it is expected that 5-HT1A receptor stimulation ameliorates the attention deficit induced by R-CPP (Carli et al., 2006a) by a mechanism similar to that of M100907.
This was confirmed by showing that intracortically perfused 8-OH-DPAT, a relatively selective 5-HT1A receptor agonist, shared with M100907 the ability to prevent R-CPP-induced glutamate release in the mPFC (Calcagno et al., 2006). WAY100635 antagonized the effect of 8-OH-DPAT on glutamate release, suggesting a selective involvement of 5-HT1A receptors (Calcagno et al., 2006). These data were recently confirmed by showing that dizocilpine-induced release of glutamate and 5-HT in the mPFC was suppressed by BAY x 3702, a 5-HT1A receptor agonist (Lopez-Gil et al., 2009), and strengthen the suggestion that excessive glutamate in the mPFC is deleterious for attentional performance. Further support comes from studies showing that the 5-HT2C receptor agonist Ro60-0175 mimicked M100907 in suppressing R-CPP-evoked glutamate release, while the 5-HT2C receptor antagonist SB242084 prevented the effect of M100907 on glutamate (Calcagno et al., 2009). This is not surprising in view of the well-recognized functional opposition between 5-HT2A and 5-HT2C receptors (Millan et al., 1998; Gobert and Millan, 1999) and suggests that 5-HT2C receptors play a major role in controlling the effect of R-CPP on cortical glutamate release. Interestingly, the R-CPP-induced rise of extracellular glutamate and 5-HT in the rat mPFC was prevented by M100907, and 5-HT depletion abolished these effects (Calcagno et al., 2009). Likewise, endogenous 5-HT is necessary for M100907 to inhibit motor activity induced by dizocilpine in mice (Martin et al., 1998). Although the effect of R-CPP on 5-HT is not related to its ability to impair attention or executive control (see **Table 3**), it could be argued that enhanced 5-HT tone on cortical 5-HT1A receptors may contribute to the ability of M100907 to counteract the effect of R-CPP on glutamate. However, the failure of WAY100635 to prevent the effect of M100907 on R-CPP-induced glutamate release (Calcagno et al., 2009) rules this out.
Thus, it is likely that M100907 suppresses glutamate release induced by R-CPP by enhancing the action of endogenous 5-HT on 5-HT2C receptors. Taken together, these findings suggest that an imbalance in the control exerted by endogenous 5-HT on different receptor subtypes, rather than an action at a single receptor, determines the effect of NMDA antagonists on glutamate release and behavior.

The role of glutamate release in attention performance is further supported by data showing that the activation of pre-synaptic mGlu2/3 receptors, which suppress glutamate release, was sufficient to reduce R-CPP-induced accuracy deficits in the 5-CSRT task (**Table 3**). Similarly, the stimulation of mGlu2/3 receptors prevented the working memory impairment induced by PCP in the T-maze (Moghaddam and Adams, 1998).

Antipsychotic drugs show a complex pharmacology involving actions at different neurotransmitter receptors, including agonist, antagonist, or partial agonist interactions with 5-HT1A, 5-HT2A, and 5-HT2C receptors (Arnt and Skarsfeldt, 1998), which may influence the effects of NMDA receptor antagonists on attention and cortical glutamate. In the 5-CSRT task, clozapine, olanzapine and low doses of sertindole prevented R-CPP-induced impairment of correct responses and impulsivity but had no effects on compulsivity (Baviera et al., 2008; Carli et al., 2011a,b), resembling the effect of M100907. These antipsychotics block 5-HT2A receptors with high affinity (Arnt and Skarsfeldt, 1998), which likely played a major role in their effects on attention. The effect of aripiprazole in the 5-CSRT task resembled that of 8-OH-DPAT (see **Table 3**), suggesting an involvement of 5-HT1A receptor stimulation (Carli et al., 2011b). Regardless of their precise mechanism of action, we found that clozapine, olanzapine, sertindole (low doses), and aripiprazole, which share the ability to counteract the attention deficit induced by R-CPP, consistently suppressed R-CPP-evoked glutamate release in the mPFC, while 0.1 mg/kg haloperidol, which occupies most brain D2 receptors (Mukherjee et al., 2001), and 2.5 mg/kg sertindole did not reverse attention deficits and had no effect on glutamate release (**Table 3**). In line with these findings, other studies showed that clozapine and olanzapine prevented dizocilpine-induced glutamate release (Lopez-Gil et al., 2007). Although haloperidol at doses of 0.3 mg/kg and higher reversed R-CPP and dizocilpine effects on glutamate (Lopez-Gil et al., 2009; Carli et al., 2011b), at 0.3 mg/kg rats stop responding or make mostly omissions, so its effects in the 5-CSRT task could not be reliably assessed. It should be emphasized that the same doses of drugs were used in our behavioral and microdialysis studies.
This contributes to support the proposal that excessive glutamate release in the mPFC is deleterious for attention.

## **CONCLUSIONS**

In this review, special emphasis was given to the distinct processes that govern the performance of rats in the 5-CSRT task. It is apparent that the input selection process of attention and executive control over impulsive and perseverative responding may be the result of the integration of NMDA receptor function and the activity in 5-HT and DA receptor systems along the nodes of the cortico-striatal circuitry.

Blockade of NMDA receptors in the mPFC induces a profound deficit in rats' performance in the 5-CSRT task, characterized by impaired attention, increased impulsivity and perseverative responding, and hyperactivation of cortico-striatal transmission. The reviewed studies show that these deficits are differentially responsive to pharmacological manipulations of 5-HT and DA receptor activity in the mPFC and dm-STR, and that increased cortical glutamate release and cortico-striatal transmission are associated specifically with impaired attention but not with enhanced impulsivity and perseverative responding.

Direct comparison of the effects of various 5-HT1A, 5-HT2A, and 5-HT2C agonists and antagonists most clearly implicates these 5-HT receptors in the mPFC in the preservation of the input selection process of attention. Impulsivity in the 5-CSRT task, which has been definitely linked to changes in 5-HT function (Dalley and Roiser, 2012), is best controlled by suppression of 5-HT2A or activation of 5-HT2C receptors. In contrast, perseverative response deficits appear to be responsive to activation of 5-HT1A receptors in the mPFC and to the suppression of 5-HT2A and activation of 5-HT2C receptors in the dm-STR and VTA. In view of the well-recognized control of striatal and cortical DA function by 5-HT1A and 5-HT2 receptors and the similar effects of a D2-like receptor antagonist such as haloperidol on the perseverative response deficit, it is likely that the control of perseverative responding by these 5-HT receptors results from a functional interaction with D2-like receptor mechanisms. Manipulation of 5-HT receptors in another task putatively employed to evaluate similar processes confirms that, depending on the cortical area, 5-HT2A and 5-HT2C receptors exert functionally opposing actions on perseverative responding (Boulougouris et al., 2008; Boulougouris and Robbins, 2010). Together, these studies highlight the complexity but also the specificity of the influences that 5-HT exerts on prefrontal control of attention and executive functions, depending on the receptor subtype, brain area, and specific processes engaged by the task.

The studies reviewed here also show a clear-cut dissociation in the roles played by dm-STR D1-like and D2-like receptors in the control of accuracy and perseverative responding. There is a definite relation between D1 receptor and attention but this relationship is not linear as it can be influenced by many factors such as the levels of baseline performance and optimal levels of DA for the performance of a particular task (Robbins, 2005). While accuracy is not responsive to D2-like receptor activity, the suppression of D1 receptor activity may improve or impair accuracy depending on the activity in the cortico-striatal transmission. The sensitivity of input selection process of attention to D1-like receptor manipulation in the dm-STR is in marked contrast to the lack of effect on processes underlying a form of behavioral flexibility such as that indexed by perseverative responses. Although there appears to be some overlap between D1-like and D2-like receptors in the modulation of certain domains of behavioral flexibility such as that involved in the ability to flexibly adapt to shift between rules, strategies, and sets (Floresco and Jentsch, 2011) the studies reviewed here clearly show that a different form of behavioral flexibility, which may result from the inability to select and integrate an adequate response in a long sequence leading to reward, is under control of D2-like but not D1-like receptor activity in the dm-STR. The two dorsal striatal DA receptor subtypes appear to act in a cooperative manner to control a different component of executive control such as impulsivity.

The suggestion emerging from this review is that the differential modulation of attention and executive functions by the 5-HT and DA systems highlights a degree of specificity for these "nonspecific" neurochemical pathways. These systems integrate the information conveyed by cortical pyramidal neurons at the level of functional modules, which are engaged selectively to optimize the operations necessary for the attentional and executive control over performance. The PFC controls the activity of these neurochemical pathways, which in turn modulate the PFC itself, suggesting that this reciprocal control is essential for cognition.

The impairment in the 5-CSRT task performance by NMDA receptor antagonist administration in the mPFC may represent a model of attentional and executive dysfunction useful to explore the role of brain circuits and neurotransmitter systems in the cognitive symptoms of neuropsychiatric disorders.

#### **REFERENCES**


spatial reversal learning in rats. *Neuropsychopharmacology* 33, 2007–2019. doi: 10.1038/sj.npp.1301584


the prefrontal cortex are dependent on activation of glutamatergic neurotransmission. *Neuropharmacology* 42, 752–763. doi: 10.1016/S0028-3908(02)00029-1


expression in discrete regions of rat brain. *Eur. J. Pharmacol.* 417, 189–194. doi: 10.1016/S0014-2999(01)00926-8


transmission and motor function. *Neuroscience* 72, 89–97. doi: 10.1016/0306-4522(95)00556-0


modulation through activation of a gamma-aminobutyric acidB (GABAB) receptor subtype. *Brain Res.* 604, 325–330. doi: 10.1016/0006-8993(93)90384-Y


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 March 2014; accepted: 14 May 2014; published online: 11 June 2014. Citation: Carli M and Invernizzi RW (2014) Serotoninergic and dopaminergic modulation of cortico-striatal circuit in executive and attention deficits induced by NMDA receptor hypofunction in the 5-choice serial reaction time task. Front. Neural Circuits 8:58. doi: 10.3389/fncir.2014.00058*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Carli and Invernizzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits

#### *Kenji Morita<sup>1</sup>\* and Ayaka Kato<sup>2</sup>*

*<sup>1</sup> Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan*

*<sup>2</sup> Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*David J. Margolis, Rutgers University, USA; Eleftheria Kyriaki Pissadaki, University of Oxford, UK*

#### *\*Correspondence:*

*Kenji Morita, Physical and Health Education, Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan e-mail: morita@p.u-tokyo.ac.jp*

It has been suggested that the midbrain dopamine (DA) neurons, receiving inputs from the cortico-basal ganglia (CBG) circuits and the brainstem, compute reward prediction error (RPE), the difference between reward obtained or expected to be obtained and reward that had been expected to be obtained. These reward expectations are suggested to be stored in the CBG synapses and updated according to RPE through synaptic plasticity, which is induced by released DA. These together constitute the "DA=RPE" hypothesis, which describes the mutual interaction between DA and the CBG circuits and serves as the primary working hypothesis in studying reward learning and value-based decision-making. However, recent work has revealed a new type of DA signal that appears not to represent RPE. Specifically, it has been found in a reward-associated maze task that striatal DA concentration primarily shows a gradual increase toward the goal. We explored whether such ramping DA could be explained by extending the "DA=RPE" hypothesis by taking into account biological properties of the CBG circuits. In particular, we examined effects of possible time-dependent decay of DA-dependent plastic changes of synaptic strengths by incorporating decay of learned values into the RPE-based reinforcement learning model and simulating reward learning tasks. We then found that incorporation of such a decay dramatically changes the model's behavior, causing gradual ramping of RPE. Moreover, we further incorporated magnitude-dependence of the rate of decay, which could potentially be in accord with some past observations, and found that near-sigmoidal ramping of RPE, resembling the observed DA ramping, could then occur. Given that synaptic decay can be useful for flexibly reversing and updating the learned reward associations, especially in case the baseline DA is low and encoding of negative RPE by DA is limited, the observed DA ramping would be indicative of the operation of such flexible reward learning.

**Keywords: dopamine, basal ganglia, corticostriatal, synaptic plasticity, reinforcement learning, reward prediction error, flexibility, computational modeling**

## **INTRODUCTION**

The midbrain dopamine (DA) neurons receive inputs from many brain regions, among which the basal ganglia (BG) are particularly major sources (Watabe-Uchida et al., 2012). In turn, the DA neurons send their axons to a wide range of regions, with again the BG being one of the primary recipients (Björklund and Dunnett, 2007). This anatomical reciprocity between the DA neurons and the BG has been suggested to have a functional counterpart (**Figure 1A**) (Doya, 2000; Montague et al., 2004; Morita et al., 2013). Specifically, the BG (in particular, the striatum) represents reward expectations, or "values" of stimuli or actions (Kawagoe et al., 2004; Samejima et al., 2005), and presumably influenced by inputs from it, the DA neurons represent the temporal-difference (TD) reward prediction error (RPE), the difference between reward obtained or expected to be obtained and reward that had been expected to be obtained (Montague et al., 1996; Schultz et al., 1997; Steinberg et al., 2013). In turn, released DA induces or significantly modulates plasticity of corticostriatal synapses (Calabresi et al., 1992; Reynolds et al., 2001; Shen et al., 2008) so that the values of stimuli or actions stored in these synapses are updated according to the RPE (**Figure 1B**). Such a suggested functional reciprocity between the DA neurons and the cortico-BG (CBG) circuits, referred to as the "DA=RPE" hypothesis here, has been guiding research on reward/reinforcement learning and value-based decision-making (Montague et al., 2004; O'Doherty et al., 2007; Rangel et al., 2008; Glimcher, 2011).

Recently, however, Howe et al. (2013) have made an important finding that challenges the universality of the "DA=RPE" hypothesis. Specifically, they have found that, in a reward-associated spatial navigation task, DA concentration in the striatum [in particular, the ventromedial striatum (VMS)] measured by fast-scan cyclic voltammetry (FSCV) primarily shows a gradual increase toward the goal, in both rewarded and unrewarded trials. The "DA=RPE" hypothesis would, in contrast, predict that striatal DA shows a phasic increase at an early timing (beginning of the trial and/or the timing of conditioned stimulus) and also shows a later decrease, rather than an increase, in the case of unrewarded trials (cf. Niv, 2013).

**FIGURE 1 | Mutual interaction between dopamine (DA) and the cortico-basal ganglia (CBG) circuits, and its suggested functional counterpart. (A)** DA neurons in the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc) receive major inputs from the basal ganglia (BG) and the brainstem. In turn, DA released from these neurons induces plastic changes of synapses in the CBG circuits, in particular, corticostriatal synapses (indicated by the dashed ellipse). This mutual interaction between DA and the CBG circuits has been suggested to implement the algorithm of reinforcement learning as follows. (1) States or actions are represented in the cortex or the hippocampus, and receiving inputs from them, neurons in the BG, in particular, medium spiny neurons in the striatum represent values (reward expectations) of the states/actions, with these values stored in the strengths of the corticostriatal synapses. (2) The DA neurons receive inputs from the BG, as well as inputs from the brainstem, which presumably convey the signal of obtained reward, and compute reward prediction error (RPE). (3) Then, released DA, representing the RPE, induces plastic changes of the corticostriatal synapses, which implement the update of the values (reward expectations) according to the RPE. **(B)** Presumed implementations of processes (1) and (3).

In most existing theories based on the "DA=RPE" hypothesis, it is assumed that neural circuits in the brain implement mathematical reinforcement learning algorithms perfectly. Behind this requirement of perfection, it is usually assumed, often implicitly, that DA-dependent plastic changes of synaptic strength, which presumably implement the update of reward expectations according to RPE, are quite stable and are kept constant without any decay. In reality, however, synapses might change much more dynamically; more specifically, plastic changes might decay over time. Indeed, decay of synaptic potentiation has been observed at least in some experiments examining (presumably) synapses from the hippocampal formation (subiculum) to the ventral striatum (nucleus accumbens) in anesthetized rats (Boeijinga et al., 1993) and in experiments examining synapses in hippocampal slices (Gustafsson et al., 1989; Xiao et al., 1996). Also, active dynamics of structural plasticity of spines have recently been revealed in cultured slices of hippocampus (Matsuzaki et al., 2004). Moreover, the functional relevance of the decay of synaptic strength has also recently been put forward (Hardt et al., 2013, 2014). In light of these findings and suggestions, in the present study we explored through computational modeling whether the observed gradual ramping of DA can be explained by extending the "DA=RPE" hypothesis to take into account such possible decay of plastic changes of the synapses that store learned values. (Please note that we have tried to describe the basic idea of our modeling in the Results so that it can be followed without referring to the Methods.)

## **METHODS**

#### **INCORPORATION OF DECAY OF LEARNED VALUES INTO THE REINFORCEMENT LEARNING MODEL**

We considered a virtual spatial navigation (unbranched "I-maze") task as illustrated in **Figure 2A**. It was assumed that in each trial subject starts from *S*<sub>1</sub>, and moves to the neighboring state in each time step until reaching *S*<sub>*n*</sub> (goal), where reward *R* is obtained, and subject learns the values of the states through the TD learning algorithm (Sutton and Barto, 1998). For simplicity, first we assumed that there is no reward expectation over multiple trials. Specifically, in the calculation of RPE at *S*<sub>1</sub> and *S*<sub>*n*</sub> in every trial, the value of the "preceding state" or the "upcoming state" was assumed to be 0, respectively; later, in the simulations shown in **Figure 4**, we did consider reward expectation over multiple trials. According to the TD learning, RPE (TD error) at *S*<sub>*i*</sub> in trial *k* (= 1, 2, . . .), denoted as δ<sub>*i*</sub>(*k*), is calculated as follows:

$$
\delta_i(k) = R_i(k) + \gamma V_i(k) - V_{i-1}(k),
$$

where *V*<sub>*i*</sub>(*k*) and *V*<sub>*i*−1</sub>(*k*) are the values of *S*<sub>*i*</sub> and *S*<sub>*i*−1</sub> in trial *k*, respectively, *R*<sub>*i*</sub>(*k*) is the reward obtained at *S*<sub>*i*</sub> in trial *k* [*R*<sub>*n*</sub>(*k*) = *R* and *R*<sub>*i*</sub>(*k*) = 0 in the other states], and γ (0 ≤ γ ≤ 1) is the time discount factor (per time step). This RPE is used for updating *V*<sub>*i*−1</sub>(*k*) as follows:

$$
V_{i-1}(k+1) = V_{i-1}(k) + \alpha \delta_i(k),
$$

where α (0 ≤ α ≤ 1) represents the learning rate. At the goal (*S*<sub>*n*</sub>) where reward *R* is obtained, these equations are calculated as follows (**Figure 2Ba**):

$$
\delta_n(k) = R + 0 - V_{n-1}(k)
$$

$$
\begin{aligned}
V_{n-1}(k+1) &= V_{n-1}(k) + \alpha \delta_n(k) \\
&= V_{n-1}(k) + \alpha \{ R - V_{n-1}(k) \},
\end{aligned}
$$

given that *V*<sub>*n*</sub>(*k*) = 0 (representing that reward expectation across multiple trials is not considered as mentioned above). In the limit of *k* → ∞ (approximating the situation after many trials) where *V*<sub>*n*−1</sub>(*k*) = *V*<sub>*n*−1</sub>(*k* + 1) ≡ (denoted as) *V*<sup>∞</sup><sub>*n*−1</sub>, the above second equation becomes

$$
\begin{aligned}
V_{n-1}^{\infty} &= V_{n-1}^{\infty} + \alpha \left( R - V_{n-1}^{\infty} \right), \\
\therefore \ V_{n-1}^{\infty} &= R
\end{aligned}
$$

and therefore

$$
\delta_n(k) \to R + 0 - R = 0.
$$

Similarly, δ<sub>*n*−*j*</sub> (*j* = 1, 2, 3, ...) can be shown to converge to 0 in the limit of *k* → ∞. This indicates that as learning converges, there exists no RPE at any states except for the start (*S*<sub>1</sub>), at which δ<sub>1</sub>(*k*) in the limit of *k* → ∞ is calculated to be γ<sup>*n*−1</sup>*R*.

**FIGURE 2 |** [...] spatial navigation (unbranched "I-maze") task associated with reward. In each trial, subject starts from *S*<sub>1</sub> (start), and moves to the neighboring state [...] out at each state according to the reinforcement learning model: (I) RPE δ<sub>*i*</sub> = *R*<sub>*i*</sub> + γ*V*<sub>*i*</sub> − *V*<sub>*i*−1</sub> is calculated, where *R*<sub>*i*</sub> is reward obtained at *S*<sub>*i*</sub> (*R*<sub>*i*</sub> = 0 unless *i* = *n*); *V*<sub>*i*</sub> and *V*<sub>*i*−1</sub> are the values of state *S*<sub>*i*</sub> and *S*<sub>*i*−1</sub>, respectively; γ (0 ≤ γ ≤ 1) is the time discount factor, and (II) the calculated RPE is used to update the value of *S*<sub>*i*−1</sub>: *V*<sub>*i*−1</sub> → κ(*V*<sub>*i*−1</sub> + αδ<sub>*i*</sub>), where α (0 ≤ α ≤ 1) is the learning rate and κ (0 ≤ κ ≤ 1) is the decay factor: κ = 1 corresponds to the case of the standard reinforcement model without decay, and κ < 1 corresponds to the case with decay. The bottom-right inset shows the same computations at the goal (*S*<sub>*n*</sub>): note that *V*<sub>*n*</sub> is assumed to be 0, indicating that reward is not expected after the goal in a given trial [reward expectation over multiple trials is not considered here for simplicity; it is considered later in the simulations shown in **Figure 4** (see the Methods)]. **(B)** Trial-by-trial changes of *V*<sub>*n*−1</sub> (value of *S*<sub>*n*−1</sub>) in the simulated task shown in **(A)**. **(a)** The case of the standard reinforcement learning model without decay [κ = 1 in **(A)**]. *V*<sub>*n*−1</sub> (indicated by the brown bars) gradually increases from trial to trial, and eventually converges to the value of reward (*R*) after many trials while RPE at the goal (δ<sub>*n*</sub> = *R* + 0 − *V*<sub>*n*−1</sub>) converges to 0. **(b)** The case of the model incorporating the decay [κ < 1 in **(A)**]. *V*<sub>*n*−1</sub> does not converge to *R* but instead converges to a smaller value, for which the RPE-based increment (αδ<sub>*n*</sub>, indicated by the red dotted/solid rectangles) balances with the decrement due to the decay (indicated by the blue arrows). RPE at the goal (δ<sub>*n*</sub>) thus remains to be positive even after many trials. **(C)** The solid lines show the eventual (asymptotic) values of RPE after the convergence of learning at all the states from the start (*S*<sub>1</sub>) to the goal (*S*<sub>7</sub>) when there are 7 states (*n* = 7) in the model incorporating the decay, with varying **(a)** the learning rate (α), **(b)** the time discount factor (γ), **(c)** the decay factor (κ), or **(d)** the amount of the reward obtained at the goal (*R*) [unvaried parameters in each panel were set to the middle values (i.e., α = 0.6, γ = 0.8<sup>1/6</sup>, κ = 0.75, and *R* = 1)]. The dashed lines show the cases of the model without decay.

Let us now introduce time-dependent decay of the value of the states into the model, in such a way that the update of the state value is described by the following equation (instead of the one described in the above):

$$
V_{i-1}(k+1) = \kappa \{ V_{i-1}(k) + \alpha \delta_i(k) \},
$$

where κ (0 < κ ≤ 1) represents the decay factor (κ = 1 corresponds to the case without decay). At the goal (*S*<sub>*n*</sub>), this equation is calculated as follows (**Figure 2Bb**):

$$
\begin{aligned}
V_{n-1}(k+1) &= \kappa \{ V_{n-1}(k) + \alpha \delta_n(k) \} \\
&= \kappa V_{n-1}(k) + \alpha \kappa \{ R - V_{n-1}(k) \}.
\end{aligned}
$$

In the limit of *k* → ∞ where *V*<sub>*n*−1</sub>(*k*) = *V*<sub>*n*−1</sub>(*k* + 1) ≡ (denoted as) *V*<sup>∞</sup><sub>*n*−1</sub>, this equation becomes

$$
\begin{aligned}
V_{n-1}^{\infty} &= \kappa V_{n-1}^{\infty} + \alpha \kappa \{ R - V_{n-1}^{\infty} \} \\
&\Leftrightarrow \{ 1 - \kappa (1 - \alpha) \} V_{n-1}^{\infty} = \alpha \kappa R \\
&\Leftrightarrow V_{n-1}^{\infty} = \alpha \kappa R / \{ 1 - \kappa (1 - \alpha) \},
\end{aligned}
$$

and therefore

$$
\begin{aligned}
\delta_n(k) &\to R + 0 - \left[ \alpha \kappa R / \{ 1 - \kappa (1 - \alpha) \} \right] \\
&= \left[ (1 - \kappa) / \{ 1 - \kappa (1 - \alpha) \} \right] R \quad (k \to \infty),
\end{aligned}
$$

which is positive if κ is less than 1. This indicates that if there exists decay of the state values, positive RPE remains to exist after learning effectively converges, contrary to the case without decay mentioned above. Similarly, as for the value of *V*<sub>*n*−2</sub>(*k*) in the limit of *k* → ∞, which we denote *V*<sup>∞</sup><sub>*n*−2</sub>,

$$
\begin{aligned}
V_{n-2}^{\infty} &= \kappa V_{n-2}^{\infty} + \alpha \kappa \{ \gamma V_{n-1}^{\infty} - V_{n-2}^{\infty} \} \\
&\Leftrightarrow \{ 1 - \kappa (1 - \alpha) \} V_{n-2}^{\infty} = \alpha \kappa \gamma V_{n-1}^{\infty} \\
&\qquad = \alpha^2 \kappa^2 \gamma R / \{ 1 - \kappa (1 - \alpha) \} \\
&\Leftrightarrow V_{n-2}^{\infty} = \alpha^2 \kappa^2 \gamma R / \{ 1 - \kappa (1 - \alpha) \}^2,
\end{aligned}
$$

and therefore

$$
\begin{aligned}
\delta_{n-1}(k) &\to 0 + \gamma V_{n-1}^{\infty} - V_{n-2}^{\infty} \\
&= \left[ \alpha \kappa \gamma (1 - \kappa) / \{ 1 - \kappa (1 - \alpha) \}^2 \right] R \quad (k \to \infty).
\end{aligned}
$$
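As a quick numerical check of the fixed point above (an illustration added here, using the middle parameter values of **Figure 2C**: α = 0.6, κ = 0.75, γ = 0.8<sup>1/6</sup> ≈ 0.963, and *R* = 1), the asymptotic value and the RPEs near the goal would be approximately:

$$
\begin{aligned}
1 - \kappa (1 - \alpha) &= 1 - 0.75 \times 0.4 = 0.7, \\
V_{n-1}^{\infty} &= \alpha \kappa R / 0.7 = 0.45 / 0.7 \approx 0.643, \\
\delta_{n}^{\infty} &= (1 - \kappa) R / 0.7 = 0.25 / 0.7 \approx 0.357, \\
\delta_{n-1}^{\infty} &= \alpha \kappa \gamma (1 - \kappa) R / 0.7^2 \approx 0.45 \times 0.963 \times 0.25 / 0.49 \approx 0.221.
\end{aligned}
$$

With these numbers the RPE-based increment, αδ<sup>∞</sup><sub>*n*</sub> ≈ 0.214, is exactly offset by the decay: κ(0.643 + 0.214) ≈ 0.643.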

Similarly, in the limit of *k* → ∞, the following hold for *j* = 1, 2, 3, ..., *n* − 2:

$$
\begin{aligned}
V_{n-j}(k) \to V_{n-j}^{\infty} &= \alpha^j \kappa^j \gamma^{j-1} R / \{ 1 - \kappa (1 - \alpha) \}^j, \\
\delta_{n-j}(k) \to \delta_{n-j}^{\infty} &= \left[ \alpha^j \kappa^j \gamma^j (1 - \kappa) / \{ 1 - \kappa (1 - \alpha) \}^{j+1} \right] R.
\end{aligned}
$$

At the start of the maze (*S*<sub>1</sub>) (*j* = *n* − 1), the value of the "preceding state" is assumed to be 0 given that reward expectation across multiple trials is not considered as mentioned above, and thus the following hold in the limit of *k* → ∞:

$$
\begin{aligned}
V_{n-j}(k) \to V_{n-j}^{\infty} &= \alpha^j \kappa^j \gamma^{j-1} R / \{ 1 - \kappa (1 - \alpha) \}^j, \\
\delta_{n-j}^{\infty} &= 0 + \gamma V_{n-j}^{\infty} - 0 = \gamma V_{n-j}^{\infty} \\
&= \alpha^j \kappa^j \gamma^j R / \{ 1 - \kappa (1 - \alpha) \}^j.
\end{aligned}
$$

The solid lines in **Figure 2C** show δ<sup>∞</sup><sub>*i*</sub> for all the states from the start (*S*<sub>1</sub>) to the goal (*S*<sub>7</sub>) when there are 7 states (*n* = 7), while varying the learning rate (α) (**Figure 2Ca**), time discount factor (γ) (**Figure 2Cb**), decay factor (κ) (**Figure 2Cc**), or the amount of reward (*R*) (**Figure 2Cd**) (unvaried parameters in each panel were set to the middle values: α = 0.6, γ = 0.8<sup>1/6</sup>, κ = 0.75, and *R* = 1); the dashed lines show δ<sup>∞</sup><sub>*i*</sub> in the model without the decay for comparison. As shown in the figures, in the cases with decay, the eventual (asymptotic) values of RPE after the convergence of learning entail gradual ramping toward the goal under a wide range of parameters. Also notably, as seen in the "κ = 0.87" line in **Figure 2Cc**, depending on parameters, a peak at the start and a ramp toward the goal could coexist.
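The trial-based model above can also be checked numerically. The following Python sketch is an illustration added here and is not the authors' MATLAB code; the function names `simulate_imaze` and `asymptotic_rpe`, and the choice of 500 trials, are assumptions made for this example. It iterates the update with decay on a 7-state maze and compares the RPEs of the final trial with the closed-form asymptotic expressions.

```python
import numpy as np


def simulate_imaze(n=7, alpha=0.6, gamma=0.8 ** (1 / 6), kappa=0.75,
                   R=1.0, n_trials=500):
    """Trial-based TD learning with decay on an unbranched maze S_1..S_n.

    V[i] is the value of S_i. V[0] (the "preceding state" of S_1) and
    V[n] (the goal) stay at 0 because reward expectation across trials
    is not considered. Returns delta_1..delta_n from the last trial.
    """
    V = np.zeros(n + 1)
    delta = np.zeros(n + 1)
    for _ in range(n_trials):
        for i in range(1, n + 1):
            reward = R if i == n else 0.0
            # RPE on arriving at S_i
            delta[i] = reward + gamma * V[i] - V[i - 1]
            if i >= 2:
                # update the value of S_{i-1}, then apply the decay factor
                V[i - 1] = kappa * (V[i - 1] + alpha * delta[i])
    return delta[1:].copy()


def asymptotic_rpe(n=7, alpha=0.6, gamma=0.8 ** (1 / 6), kappa=0.75, R=1.0):
    """Closed-form asymptotic RPEs delta_1..delta_n (assumes n >= 2)."""
    D = 1.0 - kappa * (1.0 - alpha)
    out = np.empty(n)
    for i in range(1, n + 1):
        j = n - i  # distance from the goal
        common = (alpha * kappa * gamma) ** j * R
        if i == 1:
            # start of the maze: the preceding value is 0
            out[i - 1] = common / D ** j
        else:
            out[i - 1] = common * (1.0 - kappa) / D ** (j + 1)
    return out


if __name__ == "__main__":
    print(np.round(simulate_imaze(), 4))
    print(np.round(asymptotic_rpe(), 4))
```

Under these assumptions the simulated and closed-form RPEs should agree closely, and with decay they increase from *S*<sub>2</sub> toward the goal. Passing `kappa=1.0` recovers the standard model, in which only the RPE at the start remains.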

#### **MAGNITUDE-DEPENDENT RATE OF THE DECAY OF LEARNED VALUES**

We also considered cases where the rate of decay of learned values depends on the current magnitude of values so that larger values are more resistant to decay. We constructed a time-step-based model, in which decay with such magnitude-dependent rate was incorporated. Specifically, we again considered a model of the same I-maze task (**Figure 2A**) and assumed that RPE is computed at each time step *t* as follows:

$$
\delta(t) = R(t) + \gamma V(S(t)) - V(S(t-1)),
$$

where *S*(*t*) is the state at time step *t* and *V*(*S*(*t*)) is its value, and *R*(*t*) and δ(*t*) are obtained reward and RPE at time step *t*, respectively. γ is the time discount factor (per time step). According to this RPE, the value of state *S*(*t* − 1) was assumed to be updated as follows:

$$
V(S(t-1)) \to V(S(t-1)) + \alpha \delta(t),
$$

where α is the learning rate. We then considered the following function of value *V*:

$$
\kappa(V) = 1 - (1 - \kappa_1) \exp(-V / \kappa_2),
$$

where κ<sub>1</sub> and κ<sub>2</sub> are parameters, and assumed that the value of every state decays at each time step as follows:

$$
V \to (\kappa(V))^{1/n} \times V \quad \text{(for the value of states without update according to RPE), or}
$$

$$
V \to (\kappa(V))^{1/n} \times (V + \alpha \delta) \quad \text{(for the value of state with update according to RPE).}
$$

**Figure 3Ba** shows the function κ(*V*) with a fixed value of κ<sub>1</sub> (κ<sub>1</sub> = 0.6) and various values of κ<sub>2</sub> [κ<sub>2</sub> = ∞ (lightest gray lines), 1.5 (second-lightest gray lines), 0.9 (dark gray lines), or 0.6 (black lines)], and **Figure 3Bb** shows the decay of learned values in each of these cases (with 7 time steps per trial assumed). For each of these cases, we simulated 100 trials of the I-maze task shown in **Figure 2A** with 7 states, assuming γ = 0.8<sup>1/6</sup> and α = 0.5 and without considering reward expectation over multiple trials, and the eventual values of RPE are presented as the solid lines in **Figure 3Bc**. Notably, the time-step-based model described above is not exactly the same as the trial-based model described in the previous section even for the case where the rate of decay is constant: in the time-step-based model, upon the calculation of RPE, δ(*t*) = *R*(*t*) + γ*V*(*S*(*t*)) − *V*(*S*(*t* − 1)), *V*(*S*(*t*)) has suffered decay (*n* − 1) times, rather than *n* times (which corresponds to a whole trial), since it was last updated.
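The time-step-based model with magnitude-dependent decay can be sketched in a few lines. The Python code below is an illustration added here rather than the authors' implementation; the names `kappa_of` and `simulate_timestep_imaze` are hypothetical, and the decay factor is evaluated at the value before the RPE-based update, which is one reading of the expressions above.

```python
import numpy as np


def kappa_of(V, kappa1=0.6, kappa2=0.6):
    """kappa(V) = 1 - (1 - kappa1) * exp(-V / kappa2).

    kappa2 = np.inf gives a constant (magnitude-independent) rate kappa1.
    """
    return 1.0 - (1.0 - kappa1) * np.exp(-V / kappa2)


def simulate_timestep_imaze(n=7, alpha=0.5, gamma=0.8 ** (1 / 6),
                            kappa1=0.6, kappa2=0.6, R=1.0, n_trials=100):
    """Time-step-based TD learning on S_1..S_n with decay at every step.

    V[0] (preceding state of S_1) and V[n] (goal) stay at 0, i.e., reward
    expectation over multiple trials is not considered.
    Returns the RPEs delta(1)..delta(n) of the last simulated trial.
    """
    V = np.zeros(n + 1)
    delta = np.zeros(n + 1)
    for _ in range(n_trials):
        for t in range(1, n + 1):
            reward = R if t == n else 0.0
            delta[t] = reward + gamma * V[t] - V[t - 1]
            # per-step decay factors, evaluated before the update (assumption)
            rate = kappa_of(V, kappa1, kappa2) ** (1.0 / n)
            if t >= 2:
                V[t - 1] += alpha * delta[t]
            # every state value decays once per time step
            V[1:n] *= rate[1:n]
    return delta[1:].copy()


if __name__ == "__main__":
    for k2 in (np.inf, 1.5, 0.9, 0.6):
        print(k2, np.round(simulate_timestep_imaze(kappa2=k2), 3))
```

Setting `kappa1=1.0` removes the decay, in which case the RPE at the goal should vanish after learning; with decay it stays positive.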

**FIGURE 3 | Decay of learned values with magnitude-dependent rate leads to sigmoidal ramping of RPE resembling the observed DA ramping. (A)** was reprinted by permission from Macmillan Publishers Ltd: Nature (Howe et al., 2013), copyright (2013). **(A)** DA ramping in the ventromedial striatum observed in the experiments (Howe et al., 2013). **(B) (a)** Presumed magnitude-dependence of the rate of decay of learned values raised to the power of the number of time steps in a trial. The horizontal black dashed line at 1 represents the case without decay, and the horizontal lightest-gray solid line at 0.6 represents the case of decay with a constant (magnitude-independent) rate. The three curved lines indicate three different degrees of magnitude-dependence of the rate of decay. **(b)** Decay of learned values under the different degrees of magnitude-dependence of the rate of decay [line colors (brightnesses) correspond to those in panel **(a)**]. **(c)** The solid lines indicate the values of RPE after 100 trials at all the states from the start (*S*<sub>1</sub>) to the goal (*S*<sub>7</sub>) in the simulated I-maze task shown in **Figure 2A** with 7 states (*n* = 7) in the model incorporating the decay with magnitude-dependent/independent rate, with varying the magnitude-dependence [line colors (brightnesses) correspond to those in **(a,b)**]. The dashed line shows the case of the model without decay.

#### **SIMULATION OF MAZE TASKS WITH REWARDED AND UNREWARDED GOALS**

As a simplified model of the T-maze free-choice task with rewarded and unrewarded goals used in the experiments (Howe et al., 2013) (see the Results for explanation of the task), we considered a free-choice task as illustrated in **Figure 4A**, where each state represents a relative location on the path expected to lead to, or the path after passing, the rewarded or unrewarded goal or at either of the goals in each trial. We assumed that subject moves to the neighboring state in each time step, and chooses one of the two possible actions (leading to one of the two goals) at the branch point (*S*<sub>5</sub>), while learning the values of each state-action pair (*A*<sub>1</sub>, *A*<sub>2</sub>, ···: there is assumed to be only a single action "moving forward" in the states other than the branch point), according to one of the major reinforcement (TD) learning algorithms called Q-learning (Watkins, 1989) (for the reason why we have chosen Q-learning, see the Results), with additionally incorporating the decay of learned values with magnitude-dependent rate. Specifically, at each time step *t*, RPE is computed as follows:

$$
\delta(t) = R(t) + \gamma Q(A(t)) - Q(A(t-1)) \quad \text{(at states other than } S_5\text{)}
$$

$$
\delta(t) = R(t) + \gamma \max\{ Q(A_5), Q(A_6) \} - Q(A(t-1)) \quad \text{(at } S_5\text{)},
$$

where *A*(*t*) is the state-action pair at time step *t* and *Q*(*A*(*t*)) is its value, and γ is the time discount factor (per time step). There were assumed to be *N* = 25 time steps per trial, including the inter-trial interval, and γ was set to γ = 0.8<sup>1/25</sup>. According to this RPE, the value of the previous state-action pair is updated as follows:

$$
Q(A(t-1)) \to Q(A(t-1)) + \alpha \delta(t),
$$

where α is the learning rate, which was set to 0.5. We then assumed that the value of every state-action pair (denoted as *Q*) decays at each time step as follows:

$$
Q \to (\kappa(Q))^{1/N} \times Q,
$$

where κ(*Q*) is the function introduced above, and κ<sub>1</sub> and κ<sub>2</sub> were set to κ<sub>1</sub> = 0.6 and κ<sub>2</sub> = 0.6. At the branch point (*S*<sub>5</sub>), one of the two possible actions (*A*<sub>5</sub> and *A*<sub>6</sub>) is chosen according to the following probability:

$$
\begin{aligned}
\text{Prob}(A_5) &= 1 / \left( 1 + \exp\left( -\beta \left( Q(A_5) - Q(A_6) \right) \right) \right), \\
\text{Prob}(A_6) &= 1 / \left( 1 + \exp\left( -\beta \left( Q(A_6) - Q(A_5) \right) \right) \right) \\
&= 1 - \text{Prob}(A_5),
\end{aligned}
$$

where Prob(*A*<sub>5</sub>) is the probability that action *A*<sub>5</sub> is chosen, and β is a parameter determining the degree of exploration vs. exploitation upon choice (as β becomes smaller, choice becomes more and more exploratory); β was set to 1.5. In the simulations of this model, we considered reward expectation over multiple trials; specifically, we assumed that at the first time step in every trial, subject moves from the last state in the previous trial to the first state in the current trial, and RPE computation and value update are done in the same manner as in the other time steps.
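The building blocks of this Q-learning model (RPE at the branch point, value update, per-step decay, and the choice rule) can be written compactly. The Python sketch below is an illustration added here, not the authors' code; all function names are hypothetical, the constants are the parameter values stated above, and the decay function assumes non-negative values of *Q*.

```python
import math

GAMMA = 0.8 ** (1 / 25)     # time discount factor per time step (N = 25)
ALPHA = 0.5                 # learning rate
BETA = 1.5                  # exploration vs. exploitation parameter
KAPPA1, KAPPA2 = 0.6, 0.6   # parameters of the decay function
N_STEPS = 25                # time steps per trial


def decay_factor(q):
    """kappa(Q) = 1 - (1 - kappa1) * exp(-Q / kappa2), assuming Q >= 0."""
    return 1.0 - (1.0 - KAPPA1) * math.exp(-q / KAPPA2)


def decay(q):
    """One time step of decay: Q -> kappa(Q)**(1/N) * Q."""
    return decay_factor(q) ** (1.0 / N_STEPS) * q


def prob_a5(q_a5, q_a6, beta=BETA):
    """Probability of choosing A5 over A6 at the branch point."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a5 - q_a6)))


def branch_point_step(q_prev, q_a5, q_a6, reward=0.0):
    """Q-learning RPE on arriving at S_5 and the updated previous value."""
    delta = reward + GAMMA * max(q_a5, q_a6) - q_prev
    q_prev_new = q_prev + ALPHA * delta
    return delta, q_prev_new


if __name__ == "__main__":
    print(prob_a5(1.0, 0.25))
    print(branch_point_step(q_prev=0.5, q_a5=1.0, q_a6=0.25))
    print(decay(1.0))
```

A full simulation would additionally need the state space of **Figure 4A**, which is not reproduced in this sketch.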

In addition to the simulations of the Q-learning model, we also conducted simulations of the model with a different algorithm called SARSA (Rummery and Niranjan, 1994) (the results shown in **Figure 4F**), for which we assumed the following equation for the computation of RPE at the branch point (*S*5):

$$
\delta(t) = R(t) + \mathcal{Y}Q(A\_{\text{chosen}}) - Q(A(t-1)),
$$

where *A*<sub>chosen</sub> is the action that is actually chosen (either *A*<sub>5</sub> or *A*<sub>6</sub>), instead of the equation for Q-learning described above. In the simulations shown in **Figure 4C**, reward *R*(*t*) was assumed to be 1 only at one of the goals (*S*<sub>8</sub>) and set to 0 otherwise, whereas in the simulations shown in **Figure 4E** and **Figure 4F**, *R*(*t*) was assumed to be 1 and 0.25 at the two goals (*S*<sub>8</sub> and *S*<sub>9</sub>, respectively) and set to 0 otherwise. In addition to the modeling and simulations of the free-choice task, we also conducted simulations of a forced-choice task, which could be regarded as a simplified model of the forced-choice task examined in the experiments (Howe et al., 2013). For that, we considered sequential movements and action selection in the same state space (**Figure 4A**) but randomly determined the choice (*A*<sub>5</sub> or *A*<sub>6</sub>) at the branch point (*S*<sub>5</sub>) in each trial rather than using the choice probability function described above (while RPE of the Q-learning type, taking the max of *Q*(*A*<sub>5</sub>) and *Q*(*A*<sub>6</sub>), was still assumed), and reward *R*(*t*) at the two goals was set to 1 (large reward) and 0.25 (small reward). In each of the conditions, 1000 trials were simulated, with initial values of *Q*(*A*) set to 0 for every state-action pair *A*. We did not specifically model sessions, but we considered that the 1000 trials were divided into 25 "pseudo-sessions," each consisting of 40 trials, so as to calculate the average and s.e.m. of the mean RPE in individual pseudo-sessions across the 25 pseudo-sessions, which are shown by the solid and dashed lines in **Figures 4Ca,Db,Ea,Fb** (in these figures, the average ± standard deviation of RPE in individual trials across trials is also shown by the error bars). **Figures 4Cb,Eb** show the RPE in the 401st to 440th trials. In the simulations of 1000 trials for **Figures 4C,D,E** by the Q-learning model with decay, negative RPE did not occur.
By contrast, negative RPE occurred rather frequently in the SARSA model (**Figure 4F**). The proportion of trials in which the rewarded goal (*S*<sub>8</sub>) was chosen (i.e., the proportion of correct trials) was 65.6, 64.5, and 64.5% in the simulations of 1000 trials for **Figures 4C,E,F**, respectively. The simulations in the present work were conducted using MATLAB (MathWorks Inc.), and the program codes will be submitted to ModelDB (https://senselab.med.yale.edu/modeldb/).
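
The difference between the two forms of RPE at the branch point can be illustrated with a short sketch. This is not the authors' MATLAB program; the function names and the numerical values of *Q* below are hypothetical and chosen only to show how the SARSA-type RPE can become negative when the lower-valued action is chosen, while the Q-learning-type RPE does not depend on the choice.

```python
def rpe_q_learning(reward, gamma, q_options, q_previous):
    """TD error of the Q-learning type: uses the maximum value over the options."""
    return reward + gamma * max(q_options.values()) - q_previous


def rpe_sarsa(reward, gamma, q_options, chosen, q_previous):
    """TD error of the SARSA type: uses the value of the action actually chosen."""
    return reward + gamma * q_options[chosen] - q_previous


# Hypothetical values for illustration only (not taken from the simulations).
q_at_branch = {"A5": 0.6, "A6": 0.2}  # Q(A5), Q(A6) at the branch point S5
q_prev = 0.5                          # Q(A(t-1)), value of the preceding action

# No reward is delivered at the branch point, so R(t) = 0; gamma = 1 is assumed.
delta_q = rpe_q_learning(0.0, 1.0, q_at_branch, q_prev)
delta_sarsa = rpe_sarsa(0.0, 1.0, q_at_branch, "A6", q_prev)

print("Q-learning RPE:", delta_q)
print("SARSA RPE when A6 is chosen:", delta_sarsa)
```

With these illustrative numbers the Q-learning-type RPE is positive, whereas the SARSA-type RPE is negative because the chosen action has a lower value than the preceding one.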

#### **RESULTS**

#### **DECAY OF PLASTIC CHANGES OF SYNAPSES LEADS TO RAMPING OF RPE-REPRESENTING DA SIGNAL**

We will first show how the standard reinforcement learning algorithm called TD learning (Sutton and Barto, 1998) works and what pattern of RPE is generated by using a virtual reward learning task, and thereafter we will consider effects of possible decay of plastic changes of synapses storing learned values. We considered a virtual spatial navigation task as illustrated in **Figure 2A**. In each trial, the subject starts from *S*<sub>1</sub> and moves to the neighboring state in each time step until reaching the goal (*S*<sub>*n*</sub>), where reward *R* is obtained (an unbranched "I-maze," rather than a branched "T-maze," was considered first for simplicity). Based on the prevailing theories of neural circuit mechanisms for reinforcement learning (Montague et al., 1996; Doya, 2000), we have made the following assumptions: (1) different spatial locations, or "states," denoted as *S*<sub>1</sub> (= start), *S*<sub>2</sub>, ···, *S*<sub>*n*</sub> (= goal, where reward *R* is obtained), are represented by different subpopulations of neurons in the subject's brain (hippocampus and/or cortical regions connecting with it), and (2) "values" of these states are stored in the changes (from the baseline) in the strength of synapses between the state-representing neurons in the cortex/hippocampus and neurons in the striatum (cf. Pennartz et al., 2011), and thereby the value of a given state *S*, denoted as *V*(*S*), is represented by the activity of a corresponding subpopulation of striatal neurons.
We have further assumed, again based on the current theories, that the following pair of computations are carried out at each state (*S*<sub>*i*</sub>, *i* = 1, 2, ..., *n*) in the DA-CBG system: (I) DA neurons receive (indirect) impacts from the striatal neurons through basal ganglia circuits, and compute the TD RPE: δ<sub>*i*</sub> = *R*<sub>*i*</sub> + γ*V*<sub>*i*</sub> − *V*<sub>*i*−1</sub>, where *R*<sub>*i*</sub> is reward obtained at *S*<sub>*i*</sub> (*R*<sub>*i*</sub> = 0 unless *i* = *n*); *V*<sub>*i*</sub> and *V*<sub>*i*−1</sub> are the "values" (meaning reward expectations after leaving the states) of state *S*<sub>*i*</sub> and *S*<sub>*i*−1</sub>, respectively; and γ (0 ≤ γ ≤ 1) is a parameter defining the degree of temporal discount of future rewards called the time discount factor, and (II) the RPE is used to update the value of the previous state (i.e., *S*<sub>*i*−1</sub>) through DA-dependent plastic changes of striatal synapses: *V*<sub>*i*−1</sub> → κ(*V*<sub>*i*−1</sub> + αδ<sub>*i*</sub>), where α (0 ≤ α ≤ 1) represents the speed of learning called the learning rate, and κ (0 ≤ κ ≤ 1) is a parameter for the time-dependent decay; we first considered the case of the standard reinforcement learning model without decay (the case with κ = 1).

**(A)** Simulated free-choice T-maze task with rewarded and unrewarded

#### **FIGURE 4 | Continued**

specifically modeled, and thus there does not exist a particular state that corresponds to the start of each trial. **(B)** Temporal evolution of the DA concentration in the ventromedial striatum in the experiments (Howe et al., 2013). **(a)** Average DA for rewarded (blue) or unrewarded (red) trials. **(b)** Individual trials. **(C)** Temporal evolution of the RPE in the simulations of the model incorporating the decay of learned values with magnitude-dependent rate. **(a)** The thick solid blue and red lines indicate the average, across 25 "pseudo-sessions" (see the Methods), of the mean RPE for rewarded and unrewarded trials in each pseudo-session consisting of 40 trials, respectively. The dotted lines (nearly overlapped with the solid lines) indicate these averages ± s.e.m. across pseudo-sessions. The error bars indicate the average ± standard deviation of RPE in individual trials across trials. The vertical dotted, dashed, and solid gray lines correspond to the lines in **(A)**, indicating *S*<sub>1</sub>, *S*<sub>5</sub> (branch point), and *S*<sub>8</sub> or *S*<sub>9</sub> (goal) in the diagram, respectively. **(b)** Examples of the temporal evolution of RPE in individual trials in the simulations. **(Da)** DA concentration in the forced-choice task in the experiments (Howe et al., 2013). The left red vertical line indicates the branch (choice) point, while the right red line indicates another (unbranched) turning point in the M-maze used in the experiments. **(b)** RPE in the simulations of the simplified forced-choice task by the model. Configurations are the same as those in **(Ca)** except for the colors: light-green and dark-green indicate the large-reward and small-reward cases, respectively. **(E)** RPE in another set of simulations, in which it was assumed that goal-reaching (trial completion) is in itself internally rewarding, specifically, *R*(*t*) in the calculation of RPE (δ(*t*)) at the rewarded goal and the unrewarded goal was assumed to be 1 (external + internal rewards) and 0.25 (internal reward only) [rather than 1 and 0 as in the case of **(C)**], respectively. Configurations are the same as those in **(C)**. **(F) (a)** DA concentration in the dorsolateral striatum in the experiments (Howe et al., 2013). **(b)** RPE in the model incorporating the algorithm called SARSA instead of Q-learning, which was assumed in the simulations shown in **(C,Db,E)**. It was assumed that goal-reaching (trial completion) is in itself internally rewarding in the same manner as in **(E)**. Configurations are the same as those in **(Ca)**.

Assume that initially the subject does not expect to obtain reward after completion of the maze run in individual trials and thus the "values" of all the states are 0. When reward is then introduced into the task and the subject obtains reward *R*<sub>*n*</sub> = *R* at the goal (*S*<sub>*n*</sub>), positive RPE δ<sub>*n*</sub> = *R* + γ*V*<sub>*n*</sub> − *V*<sub>*n*−1</sub> = *R* + 0 − 0 = *R* occurs, and it is used to update the value of *S*<sub>*n*−1</sub>: *V*<sub>*n*−1</sub> → 0 + αδ<sub>*n*</sub> = α*R*. Then, in the next trial, the subject again obtains reward *R* at the goal (*S*<sub>*n*</sub>) and positive RPE occurs; this time, the RPE amounts to δ<sub>*n*</sub> = *R* + γ*V*<sub>*n*</sub> − *V*<sub>*n*−1</sub> = *R* + 0 − α*R* = (1 − α)*R*, and it is used to update the value of *S*<sub>*n*−1</sub>: *V*<sub>*n*−1</sub> → α*R* + αδ<sub>*n*</sub> = (2α − α<sup>2</sup>)*R*. In this way, the value of *S*<sub>*n*−1</sub> (*V*<sub>*n*−1</sub>) gradually increases from trial to trial, and accordingly the RPE occurring at the goal (δ<sub>*n*</sub> = *R* − *V*<sub>*n*−1</sub>) gradually decreases. As long as *V*<sub>*n*−1</sub> is smaller than *R*, positive RPE should occur and *V*<sub>*n*−1</sub> should increase in the next trial, and eventually, *V*<sub>*n*−1</sub> converges to *R*, and RPE (δ<sub>*n*</sub>) converges to 0 (**Figure 2Ba**) (see the Methods for mathematical details). Similarly, values of the preceding states except for the initial state (*V*<sub>*n*−1</sub>, *V*<sub>*n*−2</sub>, ···; except for *V*<sub>1</sub>) also converge to *R* and RPE at these states (δ<sub>*n*−1</sub>, δ<sub>*n*−2</sub>, ···; except for δ<sub>1</sub>) converges to 0.
Thus, from the prevailing theories of neural circuit mechanisms for reinforcement learning, it is predicted that the DA neuronal response at the timing of reward and the preceding timings except for the initial timing, representing the RPE δ<sub>*n*</sub>, δ<sub>*n*−1</sub>, δ<sub>*n*−2</sub>, ···, appears only transiently when reward is introduced into the task (or the amount of reward is changed), and after that transient period the DA response appears only at the initial timing, as shown by the dashed lines in **Figure 2C**, which indicate eventual (asymptotic) values of RPE in the case with 7 states, with various parameters. The gradual ramping of DA signal observed in the actual reward-associated spatial navigation task (Howe et al., 2013) therefore cannot be explained by the DA=RPE hypothesis standing on the standard reinforcement (TD) learning algorithm (Niv, 2013).
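
The trial-by-trial argument above can be written compactly. As a sketch (the trial index $k$ and the superscript notation are introduced here for illustration and do not appear in the original text), with $V_n = 0$, no decay, and all values initially 0, the value of $S_{n-1}$ after $k$ rewarded trials and the RPE at the goal on the $k$-th trial are

$$
V_{n-1}^{(k)} = V_{n-1}^{(k-1)} + \alpha\left(R - V_{n-1}^{(k-1)}\right) = \left[1 - (1-\alpha)^{k}\right]R,
\qquad
\delta_{n}^{(k)} = R - V_{n-1}^{(k-1)} = (1-\alpha)^{k-1}R .
$$

For $k = 1$ and $k = 2$ this gives $V_{n-1} = \alpha R$ and $V_{n-1} = (2\alpha - \alpha^{2})R$, matching the two steps worked out above; for $0 < \alpha \le 1$ the RPE at the goal shrinks geometrically toward 0 while $V_{n-1}$ approaches $R$.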

Let us now assume that DA-dependent plastic changes of synaptic strengths are subject to time-dependent decay so that learned values stored in them decay with time. Let us consider a situation where *V*<sub>*n*−1</sub> (value of *S*<sub>*n*−1</sub>) is smaller than *R* and thus positive RPE occurs at *S*<sub>*n*</sub>. If there is no decay, *V*<sub>*n*−1</sub> should be incremented exactly by the amount of this RPE multiplied by the learning rate (α) in the next trial, as seen above (**Figure 2Ba**). If there is decay, however, *V*<sub>*n*−1</sub> should be incremented by the amount of α × RPE but simultaneously decremented by the amount of decay. By definition, RPE (δ<sub>*n*</sub> = *R* − *V*<sub>*n*−1</sub>) decreases as *V*<sub>*n*−1</sub> increases. Therefore, if the rate (or amount) of decay is constant, *V*<sub>*n*−1</sub> could initially increase from its initial value 0 given that the net change of *V*<sub>*n*−1</sub> per trial (i.e., α × RPE − decay) is positive, but then the net change per trial becomes smaller and smaller as *V*<sub>*n*−1</sub> increases, and eventually, as α × RPE becomes asymptotically equal to the amount of decay, the increase of *V*<sub>*n*−1</sub> should asymptotically terminate (**Figure 2Bb**). Even at this asymptotic limit (approximating the situation after many trials), RPE at the goal (δ<sub>*n*</sub>) remains positive, because it should be equal to the amount of decay divided by α. Similarly, RPE at the timings preceding reward (δ<sub>*n*−1</sub>, δ<sub>*n*−2</sub>, ···) also remains positive (see the Methods for mathematical details). The situation is thus quite different from the case without decay, in which RPE at the goal and the preceding timings except for the initial timing converges to 0 as seen above.
The solid lines in **Figure 2C** show the eventual (asymptotic) values of RPE in the I-maze task (**Figure 2A**) with 7 states in the case of the model with decay, the amount of which is assumed to be proportional to the current magnitude of the state value (synaptic strength) (i.e., the *rate* of decay is constant, not depending on the magnitude), while varying the learning rate (α) (**Figure 2Ca**), the time discount factor (γ) (**Figure 2Cb**), the decay factor (κ) (**Figure 2Cc**), or the amount of reward (*R*) (**Figure 2Cd**). As shown in the figures, under a wide range of parameters, RPE entails gradual ramping toward the goal, and the ramping pattern is proportionally scaled with the amount of reward (**Figure 2Cd**).
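
The contrast between the cases with and without decay can be reproduced qualitatively in a few lines. The following is a minimal sketch, not the authors' MATLAB program. It assumes that the update *V*<sub>*i*−1</sub> → κ(*V*<sub>*i*−1</sub> + αδ<sub>*i*</sub>) is applied once per state transition, that *V*<sub>*n*</sub> = 0, that the value preceding *S*<sub>1</sub> is 0, and it uses illustrative parameter values (α = 0.5, γ = 1, κ = 0.9) that are not taken from the paper. Under these assumptions, the fixed point of the update gives an asymptotic RPE at the goal of (1 − κ)*R* / (1 − κ(1 − α)).

```python
def simulate_i_maze(n_states=7, n_trials=1000, alpha=0.5, gamma=1.0,
                    kappa=1.0, reward=1.0):
    """Run TD learning on an unbranched maze; return the RPEs of the last trial.

    values[i] holds V(S_{i+1}). The value of the goal state is never updated
    and stays 0, because no state follows the goal.
    """
    values = [0.0] * n_states
    deltas = [0.0] * n_states
    for _ in range(n_trials):
        for i in range(n_states):
            r = reward if i == n_states - 1 else 0.0   # reward only at the goal
            v_prev = values[i - 1] if i > 0 else 0.0   # assumed 0 before S_1
            deltas[i] = r + gamma * values[i] - v_prev  # TD RPE
            if i > 0:
                # Assumed update with decay factor kappa (kappa = 1: no decay).
                values[i - 1] = kappa * (v_prev + alpha * deltas[i])
    return deltas


no_decay = simulate_i_maze(kappa=1.0)    # standard TD learning
with_decay = simulate_i_maze(kappa=0.9)  # illustrative decay factor

print("RPE without decay:", [round(d, 3) for d in no_decay])
print("RPE with decay:   ", [round(d, 3) for d in with_decay])
```

In this sketch the RPE without decay vanishes everywhere except at the initial state, whereas with decay it stays positive and increases from the second state toward the goal, in line with the description above.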

#### **EXPLANATION OF THE OBSERVED GRADUALLY RAMPING DA SIGNAL**

As shown so far, the experimentally observed gradual ramping of DA concentration toward the goal could potentially be explained by incorporating the decay of plastic changes of synapses storing learned values into the prevailing hypothesis that the DA-CBG system implements the reinforcement learning algorithm and DA represents RPE. In the following, we will see whether and how detailed characteristics of the observed DA ramping can be explained by this account. First, the experimentally observed ramping of DA concentration in the VMS entails a nearly sigmoidal shape (**Figure 3A**) (Howe et al., 2013), whereas the pattern of RPE/DA ramping predicted from the above model (**Figure 2C**) is just convex downward, with the last part (just before the goal) being the steepest. We explored whether this discrepancy can be resolved by elaborating the model. In the model considered above, we assumed decay with a constant (magnitude-independent) rate. In reality, however, the rate of decay may depend on the magnitude of learned values (synaptic strengths storing the values). Indeed, it has been shown in hippocampal slices that longer tetanus trains cause a larger degree of long-term potentiation, which tends to exhibit less decay (Gustafsson et al., 1989). Also, in the experiments examining (presumably) direct inputs from the hippocampal formation (subiculum) to the nucleus accumbens (Figure 6A of Boeijinga et al., 1993), decay of potentiation appears to be initially slow and then accelerated. We constructed an elaborated model incorporating decay with magnitude-dependent rate, which could potentially be in accord with these findings. Specifically, in the new model we assumed that larger values (stronger synapses) are more resistant to decay (see the Methods for details).
We simulated the I-maze task (**Figure 2A**) with this model, and examined the eventual values of RPE after 100 trials, with systematically varying the magnitude-dependence of the rate of decay (**Figures 3Ba,b**). **Figure 3Bc** shows the results. As shown in the figure, as the magnitude-dependence of the rate of decay increases so that larger values (stronger synapses) become more and more resistant to decay, the pattern of RPE ramping changes its shape from purely convex downward to nearly sigmoidal. Therefore, the experimentally observed nearly sigmoidal DA ramping could be better explained by tuning such magnitude-dependence of the rate of decay.
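
The idea of a magnitude-dependent rate of decay can be sketched as follows. The actual functional form used in the model is given in the Methods and is not reproduced here; the exponential form of `decay_factor` below, its parameters `kappa1` and `kappa2`, and all numerical values are hypothetical and serve only to illustrate a decay factor that approaches 1 (no decay) as the stored value grows.

```python
import math


def decay_factor(value, kappa1=0.9, kappa2=0.5):
    """Hypothetical decay factor: equals kappa1 at value 0 and approaches 1
    as the value grows, so larger values are more resistant to decay."""
    return 1.0 - (1.0 - kappa1) * math.exp(-value / kappa2)


def asymptotic_rpe(n_states=7, n_trials=100, alpha=0.5, gamma=1.0,
                   reward=1.0, kappa1=0.9, kappa2=0.5):
    """TD learning on the unbranched maze with magnitude-dependent decay;
    returns the RPEs of the last simulated trial."""
    values = [0.0] * n_states   # value of the goal state stays 0
    deltas = [0.0] * n_states
    for _ in range(n_trials):
        for i in range(n_states):
            r = reward if i == n_states - 1 else 0.0
            v_prev = values[i - 1] if i > 0 else 0.0
            deltas[i] = r + gamma * values[i] - v_prev
            if i > 0:
                updated = v_prev + alpha * deltas[i]
                # The decay factor is evaluated at the updated value (assumption).
                values[i - 1] = decay_factor(updated, kappa1, kappa2) * updated
    return deltas


# A smaller kappa2 means a stronger dependence of the decay rate on magnitude.
for kappa2 in (1e12, 1.0, 0.25):
    rpe = asymptotic_rpe(kappa2=kappa2)
    print(kappa2, [round(d, 3) for d in rpe])
```

A very large `kappa2` recovers the constant-rate case, so the printed profiles can be compared to see how the shape of the ramp changes as the magnitude-dependence is strengthened.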

Next, we examined whether the patterns of DA signal observed in the free-choice task (Howe et al., 2013), specifically, cue (tone)—reward association T-maze task, can be reproduced by our model incorporating the decay. In that task, subject started from the end of the trunk of letter "T". As the subject moved forward, a cue tone was presented. There were two different cues (1 or 8 kHz) indicating which of the two goals lead to reward in the trial. Subject was free to choose either the rewarded goal or the unrewarded goal. In the results of the experiments, subjects chose the rewarded ("correct") goal in more than a half (65%) of trials overall, indicating that they learned the cue-reward association and made advantageous choices at least to a certain extent. During the task, DA concentration in the VMS was shown to gradually ramp up, in both trials in which the rewarded goal was chosen and those in which the unrewarded goal was chosen, with higher DA concentration at late timings observed in the rewarded trials (**Figure 4Ba**). We tried to model this task by a simplified free-choice task as illustrated in **Figure 4A**, where each state represents a relative location on the path expected to lead to, or the path after passing, the rewarded or unrewarded goal or at either of the goals in each trial. The VMS, or more generally the ventral striatum receives major dopaminergic inputs from the DA neurons in the ventral tegmental area (VTA), whose activity pattern has been suggested (Roesch et al., 2007) to represent a particular form of RPE defined in one of the major reinforcement (TD) learning algorithms called Q-learning (Watkins, 1989). Therefore, we simulated sequential movements and action selection in the task shown in **Figure 4A** by using the Q-learning model incorporating the decay of learned values with magnitude-dependent rate (see the Methods for details).

Given that the model's parameters are appropriately tuned, the model's choice performance can become comparable to the experimental results (about 65% correct), and the temporal evolution of the RPE averaged across rewarded trials and also the average across unrewarded trials can entail gradual ramping during the trial (**Figure 4Ca**), reproducing a prominent feature of the experimentally observed DA signal. In the experiments (Howe et al., 2013), the authors have shown that the moment-to-moment level of DA during the trial is likely to reflect the proximity to goal (location in the maze) rather than elapsed time. Although our model does not have a description of absolute time and space, the value of RPE in our model is uniquely determined depending on the state, which is assumed to represent relative location in the maze, and thus given that the duration of DA's representation of RPE co-varies with the duration spent in each state, our model could potentially be consistent with the observed insensitivity to elapsed time. A major deviation of the simulated RPE/DA from the experimentally observed DA signal is that the difference between rewarded trials and unrewarded trials is much larger in the simulation results, as seen by comparing **Figure 4Ba** and **Figure 4Ca**. We will explore how this could be addressed below. **Figure 4Cb** shows examples of the temporal evolution of RPE in individual trials in the simulations. As seen in the figure, ramping can occur in a single trial at least for a certain fraction of trials, although more varied patterns, including ramping peaked at earlier times, transient patterns, and patterns with more than one peak, also frequently appear (see **Figure 4Bb** for comparison with the experimental results).
Looking closely at the simulation results (**Figure 4Cb**), there exist oblique stripe patterns from top right to bottom left (especially clearly seen for blue colors), indicating that upward or downward deviation of RPE values, first occurring at the timing of goal and the preceding timing due to presence or absence of reward, propagates to earlier timings (to the left in the figure) in subsequent trials (to the bottom). The reason for the appearance of such a pattern is that RPE is used to update the value of the state-action pair at the previous timing. This pattern is a prediction from the model and is expected to be experimentally tested, although the difference in DA signal around the timing of goal between rewarded and unrewarded trials was much smaller in the experiments, as mentioned above, and thus finding such a pattern, even if it exists, would not be easy.

In the study that we modeled (Howe et al., 2013), in addition to the free-choice task, the authors also examined a forced-choice task, in which subject was pseudo-randomly forced to choose one of the goals associated with high or low reward in each trial. The authors have then found that DA ramping was strongly biased toward the goal with the larger reward (**Figure 4Da**). We considered a simplified model of the forced-choice task, represented as state transitions in the diagram shown in **Figure 4A** with the two goals associated with large and small rewards and the choice in each trial determined (pseudo-)randomly (see the Methods for details). We conducted simulations of this task by using our model with the same parameters used in the simulations of the free-choice task, and found that the model could reproduce the bias toward the goal with the larger reward (**Figure 4Db**).

#### **EXPLANATION OF FURTHER FEATURES OF THE OBSERVED DA SIGNAL**

Although our model could explain the basic features of the experimentally observed DA ramping to a certain extent, there is also a major drawback as mentioned in the above. Specifically, in our simulations of the free-choice task, gradual ramping of the mean RPE was observed in both the average across rewarded trials and the average across unrewarded trials, but there was a prominent difference between these two (**Figure 4Ca**). In particular, whereas the mean RPE for rewarded trials ramps up until subject reaches the goal, the mean RPE for unrewarded trials ramps up partway but then drops to 0 after passing the branch point. In the experiments (Howe et al., 2013), the mean RPE for rewarded trials and that for unrewarded trials did indeed differentiate later in a trial (**Figure 4Ba**), but the difference was much smaller, and the timing of differentiation was much later, than the simulation results. The discrepancy in the timing could be partially understood given that our model describes the temporal evolution of RPE, which is presumably first represented by the activity (firing rate) of DA neurons whereas the experiments measured the concentration of DA presumably released from these neurons and thus there is expected to be a time lag, as suggested from the observed difference in latencies of DA neuronal firings (Schultz et al., 1997) and DA concentration changes (Hart et al., 2014). The discrepancy in the size of the difference between rewarded and unrewarded trials, however, seems not to be explained in such a straightforward manner even partially. In the following, we would like to present a possible explanation for it.

In the simulations shown in the above, it was assumed that the unrewarded goal is literally not rewarding at all. Specifically, in our model, we assumed a positive term representing obtained reward (*R*(*t*) > 0) in the calculation of RPE (δ(*t*)) at the rewarded goal, but not at the unrewarded goal [where *R*(*t*) was set to 0]. In reality, however, it would be possible that reaching a goal (completion of a trial) is in itself internally rewarding for subjects, even if it is the unrewarded goal and no external reward is provided. In order to examine whether incorporation of the existence of such internal reward could improve the model's drawback that the difference between rewarded and unrewarded trials is too large, we conducted a new simulation in which a positive term representing obtained external or internal reward (*R*(*t*) > 0) was included in the calculation of RPE (δ(*t*)) at both the rewarded goal and the unrewarded goal, with its size four times larger in the rewarded goal [i.e., *R(t)* = 1 or 0.25 at the rewarded or unrewarded goal, respectively; this could be interpreted that external reward of 0.75 and internal reward of 0.25 are obtained at the rewarded goal whereas only internal reward of 0.25 is obtained at the unrewarded goal]. **Figure 4E** shows the results. As shown in **Figure 4Ea**, the mean RPE averaged across unrewarded trials now remains to be positive after the branch point and ramps up again toward the goal (arrowheads in the figure), and thereby the difference between rewarded and unrewarded trials has become smaller than the case without internal reward. Neural substrate of the presumed positive term (*R*(*t*)) representing internal reward is not sure, but given the suggested hierarchical reinforcement learning in the CBG circuits (Ito and Doya, 2011), such inputs might originate from a certain region in the CBG circuits that controls task execution and goal setting (in the outside of the part that is modeled in the present work).

In the study that we modeled (Howe et al., 2013), DA concentration was measured in both the VMS and the dorsolateral striatum (DLS), and there was a difference between them. In the VMS, nearly constant-rate ramping starts just after the trial onset, and rewarded and unrewarded trials differentiate only in the last period, as we have seen above (**Figure 4Ba**). In the DLS, by contrast, initial ramping looks less prominent than in the VMS, while rewarded and unrewarded trials appear to differentiate somewhat earlier than in the VMS (**Figure 4Fa**). The VMS and DLS, or more generally the ventral striatum and dorsal striatum, are suggested to receive major dopaminergic inputs from the VTA and the substantia nigra pars compacta (SNc), respectively (Ungerstedt, 1971), though things should be more complicated in reality (Björklund and Dunnett, 2007; Bromberg-Martin et al., 2010). Both VTA and SNc DA neurons have been shown to represent RPE, but they may represent different forms of RPE used for different reinforcement (TD) learning algorithms. Specifically, it has been empirically suggested, albeit in different species, that VTA and SNc DA neurons represent RPE for Q-learning (Roesch et al., 2007) and SARSA (Morris et al., 2006), respectively; these two algorithms differ in whether the maximum value of all the choice options (Q-learning) or the value of the actually chosen option (SARSA) is used for the calculation of RPE [see (Niv et al., 2006) and the Methods]. Conforming to this suggested distinction, so far we have assumed Q-learning in the model and compared the simulation results with the DA concentration in the VMS that receives major inputs from the VTA. The emerging question, then, is whether simulation results become more comparable to the DA concentration in the DLS if we instead assume SARSA in the model. We explored this possibility by conducting a new simulation, and found that this would indeed be the case.
**Figure 4Fb** shows the simulation results of the model with SARSA, which also incorporated the internal reward upon reaching the unrewarded goal introduced above. Compared with the results with Q-learning (**Figure 4Ea**), initial ramping looks less prominent, and rewarded and unrewarded trials differentiate earlier. These two differences could be said to be in line with the experimentally observed differences between the VMS and DLS DA concentrations as described above, although again the difference between rewarded and unrewarded trials is larger, and the timing of differentiation is earlier, in the model than in the experiment.

Intriguingly, in the study that has shown the representation of RPE for Q-learning in VTA DA neurons (Roesch et al., 2007), DA neurons increased their activity in a staggered manner from the beginning of a trial (before cue presentation) toward reward, with the activity in the middle of the increase shown to entail the characteristics of RPE. It is tempting to guess that such a staggered increase of VTA DA neuronal firing actually has the same mechanistic origin as the gradual increase of VMS DA concentration in the study that we modeled (Howe et al., 2013). Consistent with this possibility, in a recent study that has simulated the experiments in which VTA DA neurons were recorded (Roesch et al., 2007) by using a neural circuit model of the DA-CBG system (Morita et al., 2013), the authors have incorporated decay of learned values, in a similar manner to the present work, in order to reproduce the observed temporal pattern of DA neuronal firing, in particular, the within-trial increase toward reward (although it was not the main focus of that study and also the present work does not rely on the specific circuit structure/mechanism for RPE computation proposed in that study).

## **DISCUSSION**

While the hypothesis that DA represents RPE and DA-dependent synaptic plasticity implements update of reward expectations according to RPE has become widely appreciated, recent work has revealed the existence of gradually ramping DA signal that appears not to represent RPE. We explored whether such DA ramping can be explained by extending the "DA=RPE" hypothesis by taking into account possible time-dependent decay of DA-dependent plastic changes of synapses storing learned values. Through simulations of reward learning tasks by the RPE-based reinforcement learning model, we have shown that incorporation of the decay of learned values can indeed cause gradual ramping of RPE and could thus potentially explain the observed DA ramping. In the following, we discuss limitations of the present work, comparisons and relations with other studies, and functional implications.

#### **LIMITATIONS OF THE PRESENT WORK**

In the study that has found the ramping DA signal (Howe et al., 2013), it was shown that the peak of the ramping signals was nearly as large as the peak of transient responses to unpredicted reward. By contrast, in our simulations shown in **Figure 4E**, average RPE for all the trials at state *S*<sub>5</sub> is about 0.158, which is smaller than RPE for unpredicted reward of the same size in our model (it is 1.0). This appears to deviate from the results of the experiments. However, there are at least three potential reasons that could explain the discrepancy between the experiments and our modeling results, as we describe below.

First, in the experiments, whereas there was only a small difference between the peak of DA response to free reward and the peak of DA ramping during the maze task when averaged across sessions, the slope of the regression line between these two values (DA ramping / DA to free reward) in individual sessions (Extended Data Figure 5a of Howe et al., 2013) is much smaller than 1 (it is about 0.26). Indeed, that figure shows that there were rather many sessions in which the peak of DA response to free reward was fairly large (>15 nM) whereas the peak of DA ramping during the maze task was not large (<15 nM), while far fewer sessions exhibited the opposite pattern. How the large variability in DA responses in the experiments reflects heterogeneity of DA cells and/or other factors is unclear, but it might be possible to regard our model as a model of cells or conditions in which the response to free reward was fairly large whereas ramping during the maze task was not large. Second, it is described in Howe et al. (2013) (legend of Extended Data Figure 5a) that DA response to free reward was compared with DA ramping measured from the same probes during *preceding* behavioral training in the maze. Given that the same type of reward (chocolate milk) was used in the task and as free reward, and that the measurements of DA response to deliveries of free reward were made after the measurements of DA ramping during 40 maze-task trials in individual sessions, we would think that there possibly existed effects of satiety. Third, the degree of unpredictability of the "unexpected reward" in the experiments could matter. Specifically, it seems possible that there were some sensory stimuli that immediately preceded reward delivery and informed the subjects of it, such as sounds (generated in the device for reward supply) or smells.
In such a case, conventional RPE models without decay predict that, after some experience of free reward, RPE of nearly the same size as that of RPE generated upon receiving ultimately unpredictable reward is generated at the timing of the sensory stimuli (unless time discount is extremely severe: size becomes smaller only due to time discount), and no RPE is generated at the timing of actual reward delivery. In contrast, and crucially, our model with decay predicts that, after some experience of free reward, RPE generated at the timing of the sensory stimuli is significantly smaller than RPE generated upon receiving ultimately unpredictable reward, and positive RPE also occurs upon receiving reward but it is also smaller than in the ultimately unpredictable case (if the timing of the sensory stimuli is one time-step before the timing of reward in the model with the parameters used for **Figure 4**, RPE values at those two timings after 15 experiences are about 0.87 and 0.16, respectively; these two RPEs are about 0.99 and 0.00 in the case without decay). The mechanism of this can be schematically understood from **Figure 2Bb** by viewing *V*<sub>*n*−1</sub> (bar height) and δ<sub>*n*</sub> (space above the bar) as RPEs at the timings of the preceding sensory stimuli and the actual reward delivery, respectively (as for the former, except for time discount); they are both smaller than the reward amount ("*R*"), which is the size of RPE generated upon receiving this reward ultimately unpredictably. With these considerations, we would think that the discrepancy between the experiments and the model in the relative sizes of the peak DA response to free reward and the peak DA ramping in the maze task could potentially be explained.

Other than the point described above, there are at least six fundamental limitations of our model. First, our model's behavior is sensitive to the magnitude of rewards. As shown in the Results, in our original model assuming decay with a constant rate, overall temporal evolution of RPE is proportionally scaled according to the amount of reward (**Figure 2Cd**). However, such a scalability no longer holds for the elaborated model incorporating the magnitude-dependent rate of decay, because the assumed magnitude-dependence (**Figure 3Ba**) is sensitive to absolute reward amount. Consequently, the patterns of RPE shown in **Figures 3** and **4** will change if absolute magnitude of rewards is changed. In reality, it is possible that magnitude-dependence of the rate of decay of learned values (synaptic strength) itself can be changed, in a longer time scale, depending on the average magnitude of rewards obtained in the current context. Second, whereas the free-choice task used in the experiments (Howe et al., 2013) involved cue-reward association, our simplified model does not describe it. Because of this, the state in our model is assumed to represent relative location on the path expected to lead to, or the path after passing, the rewarded or unrewarded goal or at either of the goals in each trial (as described before), but not absolute location since the absolute location of rewarded/unrewarded goal in the experiments was determined by the cue, which changed from trial to trial. Third, our model only has abstract representation of relative time and space, and how they are linked with absolute time and space is not defined. Fourth, validity of our key assumption that plastic changes of synapses are subject to time-dependent decay remains to be proven. 
There have been several empirical suggestions for the (rise and) decay of synaptic potentiation (Gustafsson et al., 1989; Boeijinga et al., 1993; Xiao et al., 1996) and spine enlargement (Matsuzaki et al., 2004) in the time scale of minutes, which could potentially fit the time scale of the maze task simulated in the present study, but we are currently unaware of any reported evidence for (or against) the occurrence of decay of DA-dependent plastic changes of synapses in animals engaged in tasks like the one simulated in the present study. Also, we assumed simple equations for the decay, but they would need to be revised in future works. For example, any plastic changes will eventually decay back to 0 according to the models in the present work, but in reality at least some portion of the changes is likely to persist for a long term as shown in the experiments referred to in the above. Fifth, regarding the origin of the ramping DA signal and its potential relationships with the DA=RPE hypothesis, there are potentially many possibilities, and the mechanism based on the decay of learned values proposed in the present study is no more than one of them (see the next section for two of other possibilities). Sixth, potential modulation of DA release apart from DA neuronal firing is not considered in the present study. We have assumed that the observed ramping DA signal in the striatum (Howe et al., 2013) faithfully reflects DA neuronal firing, which has been suggested to represent RPE. However, as pointed out previously (Howe et al., 2013; Niv, 2013), whether it indeed holds or not is yet to be determined, because DA neuronal activity was not measured in that study and DA concentration can be affected by presynaptic modulations of DA release, including the one through activation of nicotinic receptors on DA neuronal axons by cholinergic interneurons (Threlfell et al., 2012), and/or saturation of DA reuptake. 
Addressing these limitations would be interesting topics for future research.

#### **COMPARISONS AND RELATIONS WITH OTHER STUDIES**

Regarding potential relationships between the ramping DA signal in the spatial navigation task and the DA=RPE hypothesis, a recent theoretical study (Gershman, 2014) has shown that DA ramping can be explained in terms of RPE given nonlinear representation of space. This is an interesting possibility, and it is entirely different from our present proposal. The author has argued that his model is consistent with important features of the observed DA ramping, including the dependence on the amount of reward and the insensitivity to time until the goal is reached. Both of these features could also potentially be consistent with our model, although there are issues regarding the sensitivity of model's behavior to reward magnitude and the lack of representation of absolute time and space, as we have so far described. It remains to be seen whether the limitations of our model, including the large difference between rewarded and unrewarded trials, are not the case with his model. Notably, these two models are not mutually exclusive, and it is possible that the observed DA ramping is a product of multiple factors. Also, the possible correspondence between the differential DA signal in the ventral vs. dorsal striatum and Q-learning vs. SARSA mentioned in the Results could also hold with Gershman's model.

It has also been shown (Niv et al., 2005) that the conventional reinforcement learning model (without decay) can potentially explain ramping of averaged DA neuronal activity observed in a task with probabilistic rewards (Fiorillo et al., 2003), if it is assumed that positive and negative RPEs are asymmetrically represented by increase and decrease of DA neuronal activity from the baseline, with the dynamic range of the decrease narrower than that of the increase due to the lowness of the baseline firing rate. This mechanism did not contribute to the ramping of RPE in our simulations, because such asymmetrical representation was not incorporated into our model; actually, negative RPE did not occur in the 1000-trials simulations of our Q-learning model for **Figures 4C,Db,E**, while negative RPE occurred rather frequently in the SARSA model (**Figure 4Fb**). Notably, according to the mechanism based on the asymmetrical RPE representation by DA (Niv et al., 2005), ramping would not appear in the I-maze task where reward is obtained in every trial without uncertainty (**Figure 2A**) because negative RPE would not occur in such a situation, different from the cases of the decay-based mechanism proposed in the present work and the mechanism proposed by Gershman (Gershman, 2014) mentioned above. Experimental examination of the I-maze would thus be potentially useful to distinguish mechanisms that actually operate. In the meantime, the mechanism based on the asymmetrical RPE representation by DA is not mutually exclusive with the other two, and two or three mechanisms might simultaneously operate in reality.
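
To make the contrast concrete for a task in which reward is obtained in every trial, the following sketch simulates TD learning on a linear chain of states with reward at the final state, with and without decay of learned values. It is a simplified illustration rather than the model used for the figures: the chain length, the parameter values, the per-time-step decay of all values, and the function name `steady_state_rpes` are assumptions.

```python
def steady_state_rpes(n_states=7, n_trials=500, decay=0.01,
                      alpha=0.5, gamma=0.97, reward=1.0):
    """Return the RPE at each state of a linear track on the final trial.

    Reward is delivered on every trial at the last state, so there is no
    outcome uncertainty. All learned values decay by a constant fraction
    at every time step (an illustrative assumption).
    """
    values = [0.0] * (n_states + 1)   # extra entry: terminal state with value 0
    rpes = [0.0] * n_states
    for _ in range(n_trials):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            rpes[s] = r + gamma * values[s + 1] - values[s]  # TD error
            values[s] += alpha * rpes[s]                     # TD update
            values = [v * (1.0 - decay) for v in values]     # decay of all values
    return rpes


ramp = steady_state_rpes(decay=0.01)
flat = steady_state_rpes(decay=0.0)
print("with decay:", ["%.3f" % x for x in ramp])
print("no decay:  ", ["%.3f" % x for x in flat])
```

In this sketch the RPE stays positive and grows toward the goal when decay is present, whereas without decay it vanishes at every state once learning has converged, even though no negative RPE ever occurs.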

#### **FUNCTIONAL IMPLICATIONS**

Given that the observed DA ramping is indicative of decay of learned values as we have proposed, what is the functional advantage of such decay? Decay would naturally lead to forgetting, which is rather disadvantageous in many cases. However, forgetting can instead be useful in certain situations, in particular, where environments are dynamically changing and subjects should continually overwrite old memories with new ones. Indeed, it has recently been proposed that decay of plastic changes of synapses might be used for active forgetting (Hardt et al., 2013, 2014). Inspired by this, here we propose a possible functional advantage of synaptic decay specifically for the DA-CBG system involved in value learning. In value learning, active forgetting is required when associations between rewards and preceding sensory stimuli are changed, such as the case of reversal learning in which cue-reward association is reversed unpredictably. In theory, flexible reversal of learned association should be possible based solely on RPE without any decay: old association can be erased by negative RPE first, and new association can then be learned by positive RPE. However, in reality there would be a problem due to a biological constraint. Specifically, it has been indicated that the dynamic range of DA neuronal activity toward the negative direction from the baseline firing rate is much narrower than the positive side, presumably for the sake of minimizing energy cost (c.f., Laughlin, 2001; Bolam and Pissadaki, 2012; Pissadaki and Bolam, 2013), and thereby DA neurons can well represent positive RPE, but perhaps not negative RPE (Bayer and Glimcher, 2005) (see also Potjans et al., 2011).
This indication has been challenged by subsequent studies: it has been shown (Bayer et al., 2007) that negative RPE was correlated with the duration of pause of DA neuronal firing, and a recent study using FSCV (Hart et al., 2014) has shown that DA concentration in the striatum in fact symmetrically encoded positive and negative RPE in the range tested in that study. Nevertheless, it could still be possible that representation of negative RPE by DA is limited in case the baseline DA concentration is low. In such a case, synaptic decay could be an alternative or additional mechanism for erasing old, already irrelevant cue-reward associations so as to enable flexible reversal/reconstruction of associations, with possibly the rate of decay itself changing appropriately (i.e., speeding up just after the reversal/changes in the environments) through certain mechanisms (e.g., monitoring of the rate of reward acquisition). We thus propose that decay of learned values stored in the DA-dependent plastic changes of CBG (corticostriatal) synapses would be a feature of the DA-CBG circuits, which endows the reinforcement learning system with flexibility, in a way that is also compatible with the minimization of energy cost.
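
The proposed role of decay in reversal can be sketched with a toy calculation. It assumes that negative RPE is clipped at a small floor (standing in for a limited downward dynamic range of DA), that reward is omitted on every trial after the reversal, and that decay is applied once per trial. The function name `trials_to_forget`, the floor, and all other parameter values are hypothetical.

```python
def trials_to_forget(decay, neg_rpe_floor=-0.05, alpha=0.5,
                     v_start=1.0, threshold=0.1, max_trials=1000):
    """Count unrewarded trials until an old cue value drops below threshold.

    Assumption (illustrative only): negative RPE cannot be signalled below
    neg_rpe_floor, so unlearning by negative RPE alone is slow.
    """
    v = v_start
    for trial in range(1, max_trials + 1):
        rpe = 0.0 - v                         # reward omitted after the reversal
        signalled = max(rpe, neg_rpe_floor)   # limited downward dynamic range
        v += alpha * signalled                # update driven by the signalled RPE
        v *= (1.0 - decay)                    # decay of the learned value
        if v < threshold:
            return trial
    return max_trials


slow = trials_to_forget(decay=0.0)
fast = trials_to_forget(decay=0.05)
print("trials to forget without decay:", slow)
print("trials to forget with decay:   ", fast)
```

With these assumed values the old association is erased in fewer trials when decay is present, which is the sense in which decay could substitute for a weakly represented negative RPE.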

With such consideration, it is suggestive that DA ramping was observed in the study using the spatial navigation task (Howe et al., 2013) but not in many other studies (though there could be symptoms as we discussed above). Presumably, it reflects that the spatial navigation task is ecologically more relevant, for rats, than many other laboratory tasks. In the wild, rats navigate to forage in dynamically changing environments, where flexibility of learning would be pivotal. Moreover, the overall rate of rewards in wild foraging would be lower than in many laboratory tasks, and given the suggestion that the rate of rewards is represented by the background concentration of DA (termed tonic DA) (Niv et al., 2007), tonic DA in foraging rats is expected to be low and thus representation of negative RPE by DA could be limited as discussed above. The rate of decay of learned values would therefore be adaptively set to be high so as to turn on the alternative mechanism for flexible learning, and it would manifest as the prominent ramping of DA/RPE in the task mimicking foraging navigation (even if the rate of rewards is not that low in the task, different from real foraging). If this conjecture is true, changing the volatility of the task, mimicking changes in the volatility of the environment, may induce adaptive changes in the rate of decay of learned values (synaptic strengths), which could cause changes in the property of DA ramping (c.f., **Figure 2Cc**): a testable prediction of our model.

Apart from the decay, DA ramping can also have more direct functional meanings. Along with its roles in plasticity induction, DA also has significant modulatory effects on the responsiveness of recipient neurons. In particular, DA is known to modulate the activity of the two types of striatal projection neurons in opposite directions (Gerfen and Surmeier, 2011). Then, given that DA neurons compute RPE based on value-representing BG inputs, on which the activity of striatal neurons has direct and/or indirect impacts, ramping DA, presumably representing a gradual increase of RPE according to our model, would modulate the activity of striatal neurons and thereby eventually affect the computation of RPE itself. Such closed-loop effects (c.f., **Figure 1A**) can potentially cause rich nonlinear phenomena through recurrent iterations. Exactly what happens depends on the precise mechanism of RPE computation, while the present work does not assume a specific mechanism for it so that the results presented so far can generally hold. Just as an example, however, when the model of the present study is developed into a model of the DA-CBG circuit based on a recently proposed mechanism for RPE computation (Morita et al., 2012, 2013; Morita, 2014), consideration of the effects of DA on the responsiveness of striatal projection neurons can lead to an increase in the ratio of correct trials, indicating occurrence of positive feedback (unpublished observation). This could potentially represent self-enhancement of internal value or motivation (c.f., Niv et al., 2007). Such an exciting possibility is also expected to be explored in future work.

### **AUTHOR CONTRIBUTIONS**

Kenji Morita conceived and designed the research. Kenji Morita and Ayaka Kato performed the modeling, calculations, and simulations. Kenji Morita drafted the manuscript. Ayaka Kato commented on the manuscript, and contributed to its revision and elaboration.

## **ACKNOWLEDGMENTS**

This work was supported by Grant-in-Aid for Scientific Research on Innovative Areas "Mesoscopic Neurocircuitry" (No.25115709) of The Ministry of Education, Science, Sports and Culture of Japan and Strategic Japanese - German Cooperative Programme on "Computational Neuroscience" (project title: neural circuit mechanisms of reinforcement learning) of Japan Science and Technology Agency to Kenji Morita.

## **REFERENCES**


Xiao, M. Y., Niu, Y. P., and Wigström, H. (1996). Activity-dependent decay of early LTP revealed by dual EPSP recording in hippocampal slices from young rats. *Eur. J. Neurosci.* 8, 1916–1923. doi: 10.1111/j.1460-9568.1996.tb01335.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 January 2014; accepted: 24 March 2014; published online: 09 April 2014. Citation: Morita K and Kato A (2014) Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front. Neural Circuits 8:36. doi: 10.3389/fncir.2014.00036*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Morita and Kato. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control

## *Sakyasingha Dasgupta<sup>1,2</sup>\*, Florentin Wörgötter<sup>1,2</sup> and Poramate Manoonpong<sup>2,3</sup>*

*<sup>1</sup> Institute for Physics - Biophysics, George-August-University, Göttingen, Germany*

*<sup>2</sup> Bernstein Center for Computational Neuroscience, George-August-University, Göttingen, Germany*

*<sup>3</sup> Center for Biorobotics, Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark*

#### *Edited by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

#### *Reviewed by:*

*Kenji Morita, The University of Tokyo, Japan; Bernd Porr, University of Glasgow, UK*

#### *\*Correspondence:*

*Sakyasingha Dasgupta, Bernstein Center for Computational Neuroscience, George-August-University, Friedrich-Hund Platz 1, 37077 Göttingen, Germany; e-mail: sdasgup@gwdg.de*

Goal-directed decision making in biological systems is broadly based on associations between conditional and unconditional stimuli. This can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). A number of computational and experimental studies have well established the role of the basal ganglia in reward-based learning, whereas the cerebellum plays an important role in developing specific conditioned responses. Although viewed as distinct learning systems, recent animal experiments point toward their complementary role in behavioral learning, and also show the existence of substantial two-way communication between these two brain structures. Based on this notion of co-operative learning, in this paper we hypothesize that the basal ganglia and cerebellar learning systems work in parallel and interact with each other. We envision that such an interaction is influenced by a reward modulated heterosynaptic plasticity (RMHP) rule at the thalamus, guiding the overall goal-directed behavior. Using a recurrent neural network actor-critic model of the basal ganglia and a feed-forward correlation-based learning model of the cerebellum, we demonstrate that the RMHP rule can effectively balance the outcomes of the two learning systems. This is tested using simulated environments of increasing complexity with a four-wheeled robot in a foraging task in both static and dynamic configurations. Although modeled with a simplified level of biological abstraction, we clearly demonstrate that such an RMHP-induced combinatorial learning mechanism leads to more stable and faster learning of goal-directed behaviors, in comparison to the individual systems. Thus, in this paper we provide a computational model for adaptive combination of the basal ganglia and cerebellum learning systems by way of neuromodulated plasticity for goal-directed decision making in biological and bio-mimetic organisms.

**Keywords: decision making, recurrent neural networks, basal ganglia, cerebellum, operant conditioning, classical conditioning, neuromodulation, correlation learning**

## **1. INTRODUCTION**

Associative learning by way of conditioning, forms the main behavioral paradigm that drives goal-directed decision making in biological organisms. Typically, this can be further classified into two classes, namely, classical conditioning (or correlation-based learning) (Pavlov, 1927) and operant conditioning (or reinforcement learning) (Skinner, 1938). In general, classical conditioning is driven by associations between an early occurring conditional stimulus (CS) and a late occurring unconditional stimulus (US), which lead to conditioned responses (CR) or unconditioned responses (UR) in the organism (Clark and Squire, 1998; Freeman and Steinmetz, 2011). The CS here acts as a predictor signal such that, after repeated pairing of the two stimuli, the behavior of the organism is driven by the CR (adaptive reflex action) at the occurrence of the predictive CS, much before the US arrives. The overall behavior is guided on the sole basis of stimulus-response (S-R) associations or correlations, without any explicit feedback in the form of rewards or punishments from the environment. In contrast to such classically conditioned reflexive behavior acquisition, operant conditioning provides an organism with adaptive control over the environment with the help of explicit positive or negative reinforcements (evaluative feedback) given for corresponding actions. Over sufficient time, this enables the organism to respond with good behaviors, while avoiding bad or negative behaviors. As such within the computational learning framework, this is usually termed reinforcement learning (RL) (Sutton and Barto, 1998).

At a behavioral level, although the two conditioning paradigms of associative learning appear to be distinct from each other, they seem to occur in combination, as suggested by several animal behavioral studies (Rescorla and Solomon, 1967; Dayan and Balleine, 2002; Barnard, 2004). Behavioral studies with rabbits (Lovibond, 1983) demonstrate that the strength of operant responses can be influenced by simultaneous presentation of classically conditioned stimuli. This was further elaborated upon in the behavior of fruit flies (Drosophila), where both classical and operant conditioning predictors influence the behavior at the same time and in turn improve the learned responses (Brembs and Heisenberg, 2000). On a neuronal level, this relates to the interaction between the reward modulated action selection at the basal ganglia and the correlation based delay conditioning at the cerebellum. Although the classical notion has been to regard the basal ganglia and the cerebellum as primarily responsible for motor control, increasing evidence points toward their role in non-motor specific cognitive tasks like goal-directed decision making (Middleton and Strick, 1994; Doya, 1999). Interestingly, recent experimental studies (Neychev et al., 2008; Bostan et al., 2010) show that the basal ganglia and cerebellum not only form multi-synaptic loops with the cerebral cortex, but that two-way communication between the structures also exists via the thalamus (**Figure 1A**), along with substantial disynaptic projections to the cerebellar cortex from the subthalamic nucleus (STN) of the basal ganglia and from the dentate nucleus (cerebellar output stage) to the striatum (basal ganglia input stage) (Hoshi et al., 2005). This suggests that the two structures are not separate entities performing distinct functional operations (Doya, 2000a), but are linked together forming an integrated functional network.
Such integrated behavior is further illustrated in the timing and error prediction studies of Dreher and Grafman (2002) showing that the activation of the cerebellum and basal ganglia are not specific to switching attention, as previously believed, because both these regions were activated during switching between tasks as well as during the simultaneous maintenance of two tasks.

Based on this compelling evidence, we formulate the *neural combined learning hypothesis*, which proposes that goal-directed decision making occurs with a parallel adaptive combination (balancing) of the two learning systems (**Figure 1B**) to guide the final action selection. As evident from experimental studies (Haber and Calzavara, 2009), the thalamus potentially plays a critical role in integrating the neural signals from the two subnetworks while having the ability to modulate behavior through dopaminergic projections from the ventral tegmental area (VTA) (García-Cabezas et al., 2007; Varela, 2014). The motor thalamic (Mthal) relay nuclei, specifically the VA-VL (ventral anterior and ventral lateral) regions, receive projections from the basal ganglia (inputs from the globus pallidus) as well as the cerebellum (inputs from the dentate nucleus) (Jones et al., 1985; Percheron et al., 1996). This can be further segregated, with the ventral anterior and the anterior region of the ventrolateral nucleus (VLa) receiving major inputs from the globus pallidus internus (GPi), while the posterior region of the ventrolateral nucleus (VLp) receives primary inputs from the cerebellum (Bosch-Bouju et al., 2013). Recent studies using molecular markers were able to distinguish the VA and VL nuclei in rats (Kuramoto et al., 2009), which had hitherto been difficult, the two having been considered a single overlapping area, the VA-VL complex. Interestingly, despite apparent anatomical segregation of information in the basal ganglia and cerebellar territories, similar ranges of firing rate and movement related activity are observed in the Mthal neurons across all regions (Anderson and Turner, 1991). Furthermore, some experimental studies based on triple labeling techniques found zones of overlapping projections, as well as interdigitating foci of pallidal and cerebellar labels, particularly in border regions of the VLa (Sakai et al., 2000). In light of this evidence, it is plausible that the basal ganglia and cerebellar circuitries not only form an integrated functional network, but that their individual outputs are combined by a subset of the VLa neurons, which in turn project to the supplementary and pre-supplementary motor cortical areas (Akkal et al., 2007) responsible for goal-directed movements. We envision that such a combined learning mechanism may be driven by reward modulated heterosynaptic plasticity (neuromodulation by way of dopaminergic projections) at the thalamus.

*Figure 1 caption (fragment):* ... conditioned reflexive behaviors. Adapted and modified from Doya (2000a). **(B)** Combinatorial learning framework with parallel combination of ICO ...

In this study, input correlation learning (ICO) in the form of a differential Hebbian learner (Porr and Wörgötter, 2006) was implemented as an example of delay conditioning in the cerebellum, while a reservoir network (Jaeger and Haas, 2004) based continuous actor-critic learner (Doya, 2000b) was implemented as an example of reward based conditioning in the basal ganglia. Taking advantage of the individual learning mechanisms, the combined framework can learn the appropriate goal-directed control policy for an agent<sup>1</sup> in a fast and robust manner, outperforming the singular implementation of the individual components.

*Figure 1 caption (fragment):* ... *Ocom* controls the agent behavior (policy) while sensory feedback from the agent is sent back to both the learning mechanisms in parallel.

Although there have been a number of studies which have applied the two different conditioning concepts for studying self-organizing behavior in artificial agents and robots, they have mostly been applied separately to generate specific goal-directed behaviors (Morimoto and Doya, 2001; Verschure and Mintz, 2001; Hofstoetter et al., 2002; Prescott et al., 2006; Manoonpong et al., 2007; Soltoggio et al., 2013). In our previous work (Manoonpong et al., 2013) we motivated a combined approach of the two learning concepts on a purely algorithmic level without any adaptive combination between the two. To the best of our knowledge, in this paper we present for the first time a biologically plausible approach to model an adaptive combination of the cerebellar and basal ganglia learning systems, where they indirectly interact through sensory feedback. In this manner they work as a single functional unit to guide the behavior of artificial agents. We test our neural combined learning hypothesis within the framework of goal-directed decision making using a simulated wheeled robot situated in environments of increasing complexity designed as part of static and dynamic foraging tasks (Sul et al., 2011). Our results clearly show that the proposed mechanism enables the artificial agent to successfully learn the task in the different environments with changing levels of interaction between the two learning systems. Although we take a simplified approach of simulated robot based goal-directed learning, we believe our model covers a reasonable level of biological abstraction that can help us better understand the closed-loop interactions between these two neural subsystems as evident from experimental studies, and also provide a computational model of such combined learning behavior, which has hitherto been missing.

We now give a brief introduction to the neural substrates of the cerebellum and the basal ganglia with regard to classical and operant conditioning. Using a broad high-level view of the anatomical connections of these two brain structures, we motivate how goal-directed behavior is influenced by the respective structures and their associated neuronal connections. The individual computational models with implementation details of the two interacting learning systems are then presented in the Materials and Methods Section followed by results and discussion.

<sup>1</sup>Agent here refers to any artificial or biological organism situated in a given environment.

#### **1.1. CLASSICAL CONDITIONING IN THE CEREBELLUM**

The role of the Cerebellum and its associated circuitry in the acquisition and retention of anticipatory responses (sensory predictions) with Pavlovian delay conditioning has been well established (Christian and Thompson, 2003; Thompson and Steinmetz, 2009). Although most of the classical conditioning studies are primarily based on eye-blink conditioning (Yeo and Hesslow, 1998), recent experimental studies have established the essential role of the cerebellum in learning and memory of goal-directed behavioral responses (Burguiere et al., 2010). In **Figure 2A** a highly simplified control structure of the major cerebellar pathways and their relative function is indicated. The Inferior Olive relays the US signal to the cerebellar cortex through the climbing fibers and then induces plasticity at the synaptic junctions of the mossy fibers carrying the CS information (Herreros and Verschure, 2013). Repeated CS-US pairings gradually lead (through synaptic consolidation) to the acquisition of the CR with a drop in the firing activity of the Purkinje cells (output from the cerebellar cortex). The cerebral cortex projects to the lateral cerebellum via pontine nuclei relays (Allen and Tsukahara, 1974; Lisberger and Thach, 2013; Proville et al., 2014) which in turn have projections back to the cerebral cortex through relays in the thalamus (ventro-lateral nucleus), thus projecting the conditioned responses from the cerebellum to the motor cortical areas (Stepniewska et al., 1994; Sakai et al., 2000). In essence, the cerebellar action modulates or controls the motor activity of the animal which produces changes in its goal oriented behavior. The goal oriented behaviors can typically involve both attraction toward or avoidance of specific actions (generally referred to as adaptive reflexes) involving both sensory predictions and motor control, toward which the cerebellum makes a major contribution. 
It is also important to note that although numerous experimental and computational studies demonstrate the function of the cerebellum in classical conditioning or correlation learning (Kim and Thompson, 1997; Woodruff-Pak and Disterhoft, 2008), a possible role of the cerebellum toward supervised learning (SL) has also been widely suggested (Doya, 1999; Kawato et al., 2011). Typically within the paradigm of SL a training or instructive signal acts as a reference toward which the output of a network (movements) is compared, such that the error generated acts as the driver signal to induce plasticity within the network in order to find the correct mapping between the sensory input stimuli and the desired outputs (Knudsen, 1994). Using the classical conditioning paradigm, it has been suggested that the instructive signal that supervises the learning is the input activity associated with the US. As such, the SL model of the cerebellum considers that the climbing fibers from the inferior olive provide the error signal (instructive activity) for the Purkinje cells. Coincident inputs from the inferior olive and the granule cells lead to plasticity at the granule-to-Purkinje synapses (Doya, 2000a). Although there have been experimental studies to validate the SL description of the cerebellum (Kitazawa et al., 1998), it has been largely directed toward considering the cerebellum as an internal model of the body and the environment (Kawato, 1999). Furthermore, Krupa et al. (1993) observed that even when the red nucleus (relay between motor cortex and cerebellum) was inactivated, learning proceeded with no CR being expressed. Thus, this demonstrates that no error signal based on the behavior was needed for learning to occur. Instead, the powerful climbing fiber activity evoked by the US, acting as a template, could cause the connection strengths of sensory inputs that are consistently correlated with it to increase. Subsequently, after sufficient repetition, the activity of these sensory inputs alone would drive the UR pathway.
As such, in this work we directly consider correlation learning as the basis of classical conditioning in the cerebellum without taking into consideration SL mechanisms and do not explicitly consider the US relay from the inferior olive as an error signal.
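
As an illustration of correlation learning without a behavioral error signal, the following sketch applies a differential Hebbian update in the spirit of input correlation learning (Porr and Wörgötter, 2006): the weight of the predictive (CS) input changes in proportion to the CS activity multiplied by the temporal change of the US input. The exponential CS trace, the pulse timings, the learning rate, and the function name `ico_weight_change` are assumptions made for this example only.

```python
def ico_weight_change(cs_onset, us_onset, us_duration=5, n_steps=100,
                      mu=0.1, trace_decay=0.9):
    """Return the CS weight after one CS-US pairing.

    Update rule (sketch): dw = mu * cs_trace * (us[t] - us[t-1]).
    The CS leaves an exponentially decaying trace (an assumption) so that
    it can overlap in time with the later US.
    """
    w = 0.0
    trace = 0.0
    prev_us = 0.0
    for t in range(n_steps):
        if t == cs_onset:
            trace = 1.0                                   # CS arrives as an impulse
        us = 1.0 if us_onset <= t < us_onset + us_duration else 0.0
        w += mu * trace * (us - prev_us)                  # correlate CS with change in US
        prev_us = us
        trace *= trace_decay                              # CS trace fades over time
    return w


w_forward = ico_weight_change(cs_onset=10, us_onset=20)   # CS precedes US
w_backward = ico_weight_change(cs_onset=40, us_onset=20)  # CS follows US
print("CS before US:", w_forward)
print("CS after US: ", w_backward)
```

In this sketch the weight grows only when the CS precedes the US, because the CS trace is larger at US onset than at US offset; no comparison between the produced behavior and a target is involved.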

#### **1.2. REWARD LEARNING IN THE BASAL GANGLIA**

In contrast to the role of the cerebellum in classical conditioning, the basal ganglia and its associated circuitry possess the necessary anatomical features (neural substrates) required for a reward-based learning mechanism (Schultz and Dickinson, 2000). In **Figure 2B** we depict the main anatomical connections of the cortical basal ganglia circuitry. It is comprised of the striatum (consisting of most of the caudate and the putamen, and of the nucleus accumbens), the internal (medial) and external (lateral) segments of the globus pallidus (GPi and GPe respectively), the subthalamic nucleus (STN), the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc) and pars reticulata (SNr). The input stage of the basal ganglia is the striatum connected via direct cortical projections. Previous studies have not only recognized the striatum as a critical structure in the learning of stimulus-response behaviors, but also established it as the major location which projects to as well as receives efferent connections from (via direct and indirect multi-synaptic pathways) the dopaminergic system (Joel and Weiner, 2000; Kreitzer and Malenka, 2008). The processing of rewarding stimuli is primarily modulated by the dopamine neurons (DA system in **Figure 2B**) of the VTA and SNc, with numerous experimental studies (Schultz and Dickinson, 2000) demonstrating that changes in dopamine neurons encode the prediction error in appetitive learning scenarios, and associative learning in general (Puig and Miller, 2012). **Figure 2B** (right) shows the idealized reciprocal architecture of the striatal and dopaminergic circuitry. Here sensory stimuli arrive as input from the cortex to the striatal network. Excitatory as well as inhibitory synapses project from the striatum to the DA system which in turn uses the changes in the activity of DA neurons to modulate the activity in the striatum.
Such DA activity also acts as the neuromodulatory signal to the thalamus, which receives indirect connections from the striatum via the GPi, SNr and VTA (Varela, 2014). Computational modeling of such dopamine-modulated reward learning behavior is particularly well reflected by the Temporal Difference (TD) algorithm (Sutton, 1988; Suri and Schultz, 2001), as well as in the action selection based computational models of the basal ganglia (Gurney et al., 2001; Humphries et al., 2006). In the context of basal ganglia modeling, Actor-Critic models (explained further in the next section) of TD learning (Houk et al., 1995; Joel et al., 2002) have been extensively used. They create a functional separation between two sub-networks: the critic (modeling striatal and dopaminergic activity) and the actor (modeling striatal to motor thalamus projections). The TD learning rule uses the prediction error (TD error) between two subsequent predictions of the net weighted sum of future rewards, based on current input and actions, to modulate critic weights via long-term synaptic plasticity. The same prediction error signal (dopaminergic projections) is also used to modulate the synaptic weights at the actor, whose output controls the actions taken by the agent. Here, the mechanism of action selection can typically be regarded as the neuromodulation process occurring at the striatum, which then reaches the motor thalamic regions via projections from the output stages of the basal ganglia, namely GPi/GPe and SNr (Gurney et al., 2001; Houk et al., 2007) (**Figure 2B**).

## **2. MATERIALS AND METHODS**

#### **2.1. COMBINATORIAL LEARNING WITH REWARD MODULATED HETEROSYNAPTIC PLASTICITY**

According to the neural combined learning hypothesis for successful goal-directed decision making, the underlying neural machinery of animals combines the outputs of the basal ganglia and cerebellar learning systems at the thalamus, with a reward modulated balancing (neuromodulation) between the two, to achieve net sensory-motor adaptation. Thus, here we develop a system for the parallel combination of the input correlation-based learner (ICO) and the reward-based learner (actor-critic) as depicted in **Figure 1B**. The system works as a dual learner where the individual learning mechanisms run in parallel to guide the behavior of the agent. Both systems adapt their synaptic weights independently (as per their local synaptic modification rules) while receiving the same sensory feedback from the agent (environmental stimuli) in parallel. The final action that drives the agent is calculated as a weighted sum (**Figure 3** red circle) of the individual learning components. This can be described as follows:

$$
o\_{com}(t) = \xi\_{ico} o\_{ico}(t) + \xi\_{ac} o\_{ac}(t) \tag{1}
$$

where *oico*(*t*) and *oac*(*t*) are the outputs at time step *t* of the input correlation-based learner and the actor-critic reinforcement learner, respectively, and *ocom*(*t*) represents the combined action at time step *t*. The key parameters that govern the learning behavior are the synaptic weights of the output neuron projections from the individual components (ξ*ico* and ξ*ac*). These govern the degree of influence of the two learning systems on the net action of the agent. Previously, a simple and straightforward approach was taken in Manoonpong et al. (2013), where an equal contribution (ξ*ico* = ξ*ac* = 0.5) of ICO and actor-critic RL for controlling an agent was considered. Although this can lead to successful solutions in certain goal-directed problems, it is sub-optimal due to the lack of any adaptive balancing mechanism. Intuitively, for associative learning problems with immediate rewards the ICO system learns quickly, whereas in distal reward based goal-directed problems the ICO learner can provide guidance to the actor-critic learner. Thus, depending on the type of problem, the right balance between the two learners needs to be achieved in an adaptive manner.

While there is evidence for direct communication (Bostan et al., 2010) or combination of the subcortical loops from the cerebellum and the basal ganglia (Houk et al., 2007), a computational mechanism underlying this combination has not been presented so far. Here we propose, for the first time, an adaptive combination mechanism of the two components, modeled in the form of a reward modulated heterosynaptic plasticity (RMHP) rule, which learns the individual synaptic weights (ξ*ico* and ξ*ac*) for the projections from these two components. It is plausible that such a combination occurs at the VA-VL region of the motor thalamic nuclei, which has both pallido-thalamic (basal ganglia) and cerebello-thalamic projections (Sakai et al., 2000). Furthermore, a few previous experimental studies (Desiraju and Purpura, 1969; Allen and Tsukahara, 1974) suggested that individual neurons of the VL (nearly 20%) integrate signals from the basal ganglia and the cerebellum along with some weak cerebral inputs<sup>2</sup>. Based on biological evidence of dopaminergic projections at the thalamus from the basal ganglia circuit (García-Cabezas et al., 2007; Varela, 2014) as well as cerebellar projections to the thalamic ventro-lateral nucleus (Bosch-Bouju et al., 2013) (see Figures 42–47 in Lisberger and Thach, 2013), we consider here that such dopaminergic projections act as the neuromodulatory signal and trigger the heterosynaptic plasticity (Ishikawa et al., 2013). A large number of such heterosynaptic plasticity mechanisms contribute toward a variety of neural processes involving associative learning and development of neural circuits in general (Bailey et al., 2000; Chistiakova and Volgushev, 2009). Although there is currently no direct experimental evidence of heterosynaptic plasticity at thalamic nuclei, it is highly plausible that such interactions could occur on synaptic afferents as observed in the amygdala and the hippocampus (Vitureira et al., 2012).
Here, we use the instantaneous reward signal as the modulatory input in order to induce heterosynaptic changes at the thalamic junction. Similar approaches have also been used in some previous theoretical models of reward modulated plasticity (Legenstein et al., 2008; Hoerzer et al., 2012). Although the dopaminergic projections from the VTA to the Mthal are primarily believed to encode a reward prediction error (RPE) signal (Schultz and Dickinson, 2000), there exists considerable diversity in the VTA neuron types, with a subset of these dopaminergic neurons directly responding to rewards (Cohen et al., 2012). Similar variability has also been observed in single DA neuron recordings from memory-guided saccadic tasks performed with primates (Takikawa et al., 2004). This suggests that although most dopaminergic neurons respond to reward-predicting conditional stimuli, some may not strictly follow the canonical RPE coding (Cohen et al., 2012). It is important to note that, within this model, it is equally possible to use the reward prediction error (TD error, Equation 12) and still learn the synaptic weights of the two components in a stable manner, albeit with a negligibly slower weight convergence due to continuous weight changes (see Supplementary Figure 1).

Based on this RMHP rule, the ICO and actor-critic RL weights are learned at each time step as follows:

$$
\Delta \xi\_{ico}(t) = \eta r(t) [o\_{ico}(t) - \bar{o}\_{ico}(t)] o\_{ac}(t), \tag{2}
$$

$$
\Delta \xi\_{ac}(t) = \eta r(t)[o\_{ac}(t) - \bar{o}\_{ac}(t)]o\_{ico}(t). \tag{3}
$$

Here *r*(*t*) is the current time step reward signal received by the agent, while *o*¯*ico*(*t*) and *o*¯*ac*(*t*) denote the low-pass filtered version of the output from the ICO learner and the actor-critic learner, respectively. They are calculated as:

$$
\bar{o}\_{ico}(t) = 0.9\bar{o}\_{ico}(t-1) + 0.1o\_{ico}(t),
$$

$$
\bar{o}\_{ac}(t) = 0.9\bar{o}\_{ac}(t-1) + 0.1o\_{ac}(t). \tag{4}
$$

The plasticity model used here is based on the assumption that the net policy performance (agent's behavior) is influenced by a single global neuromodulatory signal. This relates to the dopaminergic projections to the ventro-lateral nucleus in the thalamus as well as connections from the amygdala, which can carry reward-related signals that influence overall action selection. The RMHP learning rule correlates three factors: (1) the reward signal, (2) the deviations of the ICO and actor-critic learner outputs from their mean values, and (3) the actual ICO and actor-critic outputs. The correlations are used to adjust their respective synaptic weights (ξ*ico* and ξ*ac*). Intuitively, the heterosynaptic plasticity rule can also be viewed as a homeostatic mechanism (Vitureira et al., 2012): Equation 2 tells the system to increase the ICO learner's weight (ξ*ico*) when the ICO output is coincident with a positive reward, while the third factor (*oac*) tells the system to increase ξ*ico* more (or less) when the actor-critic learner weights (ξ*ac*) are large (or small), and vice versa for Equation 3. This ensures that, overall, the weights of the two learning components change at largely the same rate. Additionally, in order to prevent uncontrolled divergence in the learned weights, homeostatic synaptic normalization is carried out as follows:

$$
\xi\_{ico}(t) = \frac{\xi\_{ico}(t)}{\xi\_{ico}(t) + \xi\_{ac}(t)},
$$

$$
\xi\_{ac}(t) = \frac{\xi\_{ac}(t)}{\xi\_{ico}(t) + \xi\_{ac}(t)}.\tag{5}
$$

This ensures that the synaptic weights always add up to one and 0 < ξ*ico*, ξ*ac* < 1. In general, this plasticity rule operates on a very slow time scale, which is governed by the learning rate parameter η. Typically, convergence and stabilization of the weights are achieved by setting η much smaller than the learning rates of the two individual learning systems (ICO and actor-critic). For a more detailed view of the implementation of the adaptive combinatorial learning mechanism, interested readers should refer to Algorithm 2 in the Supplementary Material.
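The RMHP update of Equations 1–5 can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is Algorithm 2 in the Supplementary Material). The class name `RMHPCombiner`, the default `eta`, the zero initial values of the filtered outputs, and the divide-by-zero guard are assumptions made for illustration. The third factor of Equation 2 is taken to be *oac*(*t*), as the text describes.

```python
from dataclasses import dataclass


@dataclass
class RMHPCombiner:
    """Hypothetical sketch of the adaptive combiner (Equations 1-5)."""
    eta: float = 0.001      # assumed value; the text only requires eta to be small
    xi_ico: float = 0.5     # assumed equal initial weights
    xi_ac: float = 0.5
    o_ico_bar: float = 0.0  # low-pass filtered outputs, assumed to start at zero
    o_ac_bar: float = 0.0

    def combine(self, o_ico: float, o_ac: float) -> float:
        """Equation 1: weighted sum of the two learner outputs."""
        return self.xi_ico * o_ico + self.xi_ac * o_ac

    def update(self, r: float, o_ico: float, o_ac: float) -> None:
        """One RMHP step driven by the instantaneous reward r."""
        # Equation 4: low-pass filtered (running mean) outputs.
        self.o_ico_bar = 0.9 * self.o_ico_bar + 0.1 * o_ico
        self.o_ac_bar = 0.9 * self.o_ac_bar + 0.1 * o_ac
        # Equations 2 and 3: reward x deviation from mean x other learner's output.
        xi_ico = self.xi_ico + self.eta * r * (o_ico - self.o_ico_bar) * o_ac
        xi_ac = self.xi_ac + self.eta * r * (o_ac - self.o_ac_bar) * o_ico
        # Equation 5: normalization so that the weights sum to one.
        total = xi_ico + xi_ac
        if total != 0.0:  # guard added for this sketch only
            xi_ico, xi_ac = xi_ico / total, xi_ac / total
        self.xi_ico, self.xi_ac = xi_ico, xi_ac
```

With zero reward the weights stay at their current values, which reflects the role of *r*(*t*) as a gate on plasticity in Equations 2 and 3.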

#### **2.2. INPUT CORRELATION MODEL OF CEREBELLAR LEARNING**

In order to model classical conditioning of adaptive motor reflexes<sup>3</sup> in the cerebellum, we use a model-free, correlation-based, predictive control learning rule called input correlation learning (ICO) (Porr and Wörgötter, 2006). ICO learning provides a fast and stable mechanism to acquire and generate sensory predictions for adaptive responses based solely on the correlations between incoming stimuli. The ICO learning rule (**Figure 3** right) takes the form of an unsupervised synaptic modification mechanism using the cross-correlation between the incoming predictive input stimuli (predictive here means that the signals occur early) and a single reflex signal (late occurring). As depicted in **Figure 3** right, cortical perceptual input in the form of predictive signals (CS) represents the mossy fiber projections to the cerebellar microcircuit, while the climbing fiber projections from the inferior olive that modulate the synaptic weights in the deep cerebellar nucleus are depicted in a simplified form with the differential region (*d*/*dt*).

<sup>2</sup>It is also plausible that integration of activity arising in the basal ganglia and cerebellum might take place in thalamic nuclei other than the VL-VA, since pallidal as well as cerebellar fibers are known histologically to terminate not only in the VL-VA but also in other structures (Mehler, 1971).

<sup>3</sup>The reflex signal is typically a default response to an unwanted situation. This acts as the unconditional stimulus, occurring later in time than the predictive conditional stimulus.

The goal of the ICO mechanism is to behave as a forward model system (Porr and Wörgötter, 2006) that uses the sensory CS to predict the occurrence of the innate reflex signal (external predefined feedback signaling unwanted scenarios), thus letting the agent react in an anticipatory manner to avoid the basic reflex altogether. Based on a differential Hebbian learning rule (Kolodziejski et al., 2008), the synaptic weights in the ICO scheme are modified using heterosynaptic interactions of the incoming inputs, depending on their order of occurrence. In general, the plastic synapses of the predictive inputs are strengthened if they precede the reflex signal and are weakened if their order of occurrence is reversed. As a result, the ICO learning rule drives the behavior depending on the timing of correlated neural signals. This can be formally represented as:

$$
o\_{ico}(t) = \rho\_0 x\_0(t) + \sum\_{j=1}^{K} \rho\_j(t) x\_j(t). \tag{6}
$$

Here, *oico* represents the output neuron activation of the ICO system driven by the superposition of the plastic *K*-dimensional predictive inputs *xj*(*t*) = *x*1(*t*), *x*2(*t*),..., *xK*(*t*)<sup>4</sup> (differentially modified) and the fixed innate reflex signal *x*0(*t*). The synaptic strength of the reflex signal is represented by ρ0 and is fixed to the constant value of 1.0 in order to signal the innate response to the agent. Using the cross-correlations between the input signals, our differential Hebbian learning rule modifies the synaptic connections as follows:

$$
\Delta \rho\_j(t) = \mu x\_j(t) \frac{d}{dt} x\_0(t). \tag{7}
$$

Here, μ defines the learning rate and is typically set to a small value to allow slow growth of the synaptic weights, with convergence occurring once the reflex signal *x*0 = 0 (Porr and Wörgötter, 2006). Thus, ICO learning allows the agent to predict the primary reflex and successfully generate early, adaptive actions. However, no explicit feedback on the goodness of behavior is provided to the agent, and thus only an anticipatory response can be learned, without an explicit notion of how well the action allows reaching a desired (rewarding) goal location. As depicted in **Figure 3**, the output from the ICO learner is directly fed into the RMHP unit, envisioned to be part of the ventro-lateral thalamic nucleus (Akkal et al., 2007; Bosch-Bouju et al., 2013).
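A discrete-time reading of Equations 6 and 7 can be sketched as follows. This is a hedged illustration, not the authors' code. The class name `ICOLearner`, the finite-difference approximation of *d*/*dt*, the default `mu`, and the choice to compute the output before updating the weights are all assumptions.

```python
from typing import List, Sequence


class ICOLearner:
    """Hypothetical sketch of input correlation learning (Equations 6-7)."""

    def __init__(self, n_inputs: int, mu: float = 0.01, rho0: float = 1.0):
        self.mu = mu                      # learning rate (assumed value)
        self.rho0 = rho0                  # fixed reflex weight, 1.0 in the text
        self.rho: List[float] = [0.0] * n_inputs  # plastic predictive weights
        self._prev_x0 = 0.0               # previous reflex sample for the derivative

    def step(self, x: Sequence[float], x0: float, dt: float = 1.0) -> float:
        """Return o_ico(t) for predictive inputs x and reflex x0, then learn."""
        # Equation 6: reflex term plus weighted predictive inputs.
        output = self.rho0 * x0 + sum(r * xj for r, xj in zip(self.rho, x))
        # Equation 7: correlate each predictive input with d/dt of the reflex,
        # approximated here by a backward finite difference (an assumption).
        dx0 = (x0 - self._prev_x0) / dt
        self._prev_x0 = x0
        self.rho = [r + self.mu * xj * dx0 for r, xj in zip(self.rho, x)]
        return output
```

In this sketch a weight grows only when its predictive input is active while the reflex signal is rising, and it stops changing once the reflex signal stays constant (for example at zero), in line with the convergence condition stated above.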

#### **2.3. ACTOR-CRITIC RESERVOIR MODEL OF BASAL-GANGLIA LEARNING**

TD learning (Sutton, 1988; Suri and Schultz, 2001), in the framework of actor-critic reinforcement learning (Joel et al., 2002; Wörgötter and Porr, 2005), is the most established computational model of the basal ganglia. As explained in the previous section, the TD learning technique is particularly well suited for replicating or understanding how reward related information is formed and transferred by the mid-brain dopaminergic activity.

The model consists of two sub-networks, namely, the adaptive critic (**Figure 3** left, bottom) and the actor (**Figure 3** left, above). The critic is adaptive in the sense that it learns to predict the weighted sum of future rewards, taking into account the current incoming sensory stimuli and the actions (behaviors) performed by the agent within a particular environment. The difference between the predicted "value" of the sum of future rewards and the actual measure acts as the temporal difference (TD) prediction error signal that provides an evaluative feedback (or reinforcement signal) to drive the actor. Eventually the actor learns to perform the proper set of actions (policy<sup>5</sup>) that maximize the weighted sum of future rewards as computed by the critic. The evaluative feedback (TD error signal) in general acts as a measure of goodness of behavior that, over time, lets the agent learn to anticipate reinforcing events. Within this computational framework, the TD prediction error signal and learning at the critic are analogous to the dopaminergic (DA) activity and the DA-dependent long-term synaptic plasticity in the striatum (**Figure 2B**), while the remaining parts of the striatal circuitry can be envisioned as the actor, which uses the TD-modulated activity to generate actions that drive the agent's behavior.

Inspired by the reservoir computing framework (Maass et al., 2002; Jaeger and Haas, 2004), here we use a chaotic random recurrent neural network (RNN) (Sussillo and Abbott, 2009; Rajan et al., 2010) as the adaptive critic (cortico-striatal circuitry and the DA system) connected to a feed-forward neural network, serving the purpose of the part of the striatum that performs action selection (Gurney et al., 2001) and then relays it to the motor thalamus via projections from the globus pallidus and substantia nigra. This provides an effective framework to model a continuous actor-critic reinforcement learning scheme, which is particularly suited for goal-directed learning in continuous state-action problems, while at the same time maintaining a reasonable level of biological abstraction (Fremaux et al., 2013). Here, the reservoir network can be envisioned as analogous to the cortex and its inherent recurrent connectivity structure, with the readout neurons serving as the striatum and the plastic projections from the recurrent layer as the modifiable cortico-striatal connections (Hinaut and Dominey, 2013). The reservoir network is constructed as a generic network model of *N* recurrently connected neurons with high sparsity (refer to Supplementary Material for details) and fixed synaptic connectivity. The connections within the recurrent layer are drawn randomly in order to generate a sparsely connected network of inhibitory and excitatory synapses. A subset of the reservoir neurons receives input connections (fixed synaptic strengths) as external driving signals, and the network has an additional output layer of neurons that learns to produce a desired response based on synaptic modification of the weights from the reservoir to the output neurons. The input connections along with the large recurrently connected reservoir network represent the main cortical microcircuit-to-striatum connections, while the output layer neural activity can be envisioned as striatal neuronal responses.
In this case, the reservoir critic provides an input (sensory stimuli) driven dynamic network with a large repertoire of signals that is used to predict the value function *v* (average sum of future rewards). *v*(*t*) approximates the accumulated sum of the future rewards *r*(*t*) with a given discount factor γ (0 ≤ γ < 1)<sup>6</sup> as follows:

$$v(t) = \sum\_{i=1}^{\infty} \gamma^{i-1} r(t+i). \tag{8}$$

<sup>4</sup>This x(t) is different from the neural state activation vector **x**(*t*) of Equation 9.

<sup>5</sup>In reinforcement learning, policy refers to the set of actions performed by an agent that maximizes its average future reward.
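To make the discounting in Equation 8 concrete, the following short Python sketch evaluates the sum over a finite window of future rewards. The function name `discounted_return` and the truncation to a finite list are assumptions for illustration; the model itself learns an estimate of this quantity rather than computing it from known future rewards.

```python
from typing import Sequence


def discounted_return(future_rewards: Sequence[float], gamma: float) -> float:
    """Truncated version of Equation 8.

    future_rewards[0] is r(t+1), future_rewards[1] is r(t+2), and so on,
    so the reward i steps ahead is weighted by gamma ** (i - 1).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return sum((gamma ** i) * r for i, r in enumerate(future_rewards))
```

For instance, a single reward of 1 arriving three steps ahead with γ = 0.9 contributes 0.9² = 0.81 to the value at time *t*.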

In our model, the membrane potential at the soma (at time *t*) of the reservoir neurons, resulting from the incoming excitatory and inhibitory synaptic inputs, is given by the *N*-dimensional vector of neuron state activations, **x**(*t*) = *x*1(*t*), *x*2(*t*),..., *xN*(*t*). The input to the reservoir network, consisting of the agent's states (sensory input stimuli from the cerebral cortex), is represented by the *K*-dimensional vector **u**(*t*) = *u*1(*t*), *u*2(*t*),..., *uK*(*t*). The recurrent neural activity within the dynamic reservoir varies as a function of its previous state activation and the current driving input stimuli. The recurrent network dynamics are given by:

$$\tau \dot{\mathbf{x}}(t) = -\mathbf{x}(t) + g \mathbf{W}\_{sys} \mathbf{z}(t) + \mathbf{W}\_{in} \mathbf{u}(t) + \mathbf{b},\tag{9}$$

$$
\hat{v}(t) = \tanh(\mathbf{W}\_{out}\mathbf{z}(t)),
\tag{10}
$$

$$z\_i(t) = \tanh(\alpha x\_i(t) + \beta). \tag{11}$$

The parameters **W***in* and **W***sys* denote the input to reservoir synaptic weights and the recurrent connection weights within the reservoir, respectively. The parameter *g* (Sompolinsky et al., 1988) acts as the scaling factor for the recurrent connection weights, allowing different dynamic regimes, from stable to chaotic, to be present in the reservoir. Similar to Sussillo and Abbott (2009), we select *g* such that the network exhibits chaotic dynamics as spontaneous behavior before learning and maintains stable dynamics after learning, with the help of feedback connections and neuronal activation homeostasis via intrinsic plasticity (Triesch, 2005; Dasgupta et al., 2013a). The RNN does not explicitly model action potentials, but describes neuronal firing rates, wherein the continuous variable *zi* is the instantaneous firing rate of the reservoir neurons and is calculated as a non-linear saturating function of the state activation *xi* (Equation 11). The output layer consists of a single neuron whose firing rate *v*ˆ(*t*) is calculated at time *t* based on Equation 10, as a non-linear transformation of the weighted projections of the internal reservoir neuron firing rates **z**(*t*). Here the parameter **W***out* denotes the *N* × *K* dimensional reservoir to output connection synaptic weights. Each unit in the network also receives a constant bias signal *bi*, represented in Equation 9 as the *N*-dimensional vector **b**. The overall time scale of the RNN and the leak rates of individual reservoir neurons are controlled by the parameter τ.
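The reservoir dynamics of Equations 9–11 can be sketched with a forward-Euler discretization. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the connection probability `p_connect`, the Gaussian weight scaling 1/sqrt(p·N), the dense uniform input weights, the zero bias, and the values of `g`, `tau` and `dt` are all hypothetical; the actual settings are given in the Supplementary Material. The single output neuron is represented by one weight per reservoir unit.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def make_reservoir(n: int, k: int, p_connect: float = 0.1,
                   seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    """Draw fixed random weights (assumed distributions, for illustration)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(p_connect * n)
    w_sys = [[rng.gauss(0.0, std) if rng.random() < p_connect else 0.0
              for _ in range(n)] for _ in range(n)]
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    b = [0.0] * n
    return w_sys, w_in, b


def firing_rates(x: Sequence[float], alpha: float = 1.0,
                 beta: float = 0.0) -> List[float]:
    """Equation 11: z_i = tanh(alpha * x_i + beta)."""
    return [math.tanh(alpha * xi + beta) for xi in x]


def reservoir_step(x: Sequence[float], u: Sequence[float], w_sys: Matrix,
                   w_in: Matrix, b: Sequence[float], g: float = 1.5,
                   tau: float = 1.0, dt: float = 0.1) -> List[float]:
    """One forward-Euler step of Equation 9."""
    z = firing_rates(x)
    x_new = []
    for i, xi in enumerate(x):
        recurrent = sum(w * zj for w, zj in zip(w_sys[i], z))
        drive = sum(w * uj for w, uj in zip(w_in[i], u))
        x_new.append(xi + (dt / tau) * (-xi + g * recurrent + drive + b[i]))
    return x_new


def readout(w_out: Sequence[float], z: Sequence[float]) -> float:
    """Equation 10: value estimate of the single output neuron."""
    return math.tanh(sum(w * zi for w, zi in zip(w_out, z)))
```

With all weights and inputs set to zero, a state of 1.0 decays to 0.9 after one step with `dt / tau = 0.1`, which shows the leak term of Equation 9 in isolation.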

Based on the TD learning principle, the primary goal of the reservoir critic is to predict *v*(*t*) such that the TD error δ is minimized over time. At each time point *t*, δ is computed from the current (*v*ˆ(*t*)) and previous (*v*ˆ(*t* − 1)) value function predictions (reservoir output), and the current reward signal *r*(*t*), as follows:

$$
\delta(t) = r(t) + \gamma \hat{v}(t) - \hat{v}(t-1). \tag{12}
$$
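As a worked illustration of Equation 12, consider purely hypothetical numbers (not taken from the experiments): a discount factor γ = 0.9, a previous prediction *v*ˆ(*t* − 1) = 0.40, a current prediction *v*ˆ(*t*) = 0.50, and no reward at this step, *r*(*t*) = 0. Then

$$
\delta(t) = 0 + 0.9 \times 0.50 - 0.40 = 0.05,
$$

a small positive error, because the discounted current prediction is slightly higher than the previous one. If instead a reward *r*(*t*) = 1 were received with the same predictions, the error would be

$$
\delta(t) = 1 + 0.9 \times 0.50 - 0.40 = 1.05.
$$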

The output weights **W***out* are calculated using the recursive least squares (RLS) algorithm (Haykin, 2002) at each time step, while the sensory stimuli **u**(*t*) are being fed into the reservoir. **W***out* are calculated such that the overall TD-error (δ) is minimized. We implement the online RLS algorithm using a fixed forgetting factor (λ*RLS* < 1) as given in **Algorithm 1**.

As proposed in Triesch (2005) and Dasgupta et al. (2013a), we implement a generic intrinsic plasticity mechanism based on the Weibull distribution for unsupervised adaptation of the reservoir neuron non-linearity, using a stochastic descent algorithm to adapt the scale α and shape β parameters of the saturating function in Equation 11. This allows the reservoir to homeostatically maintain a stable firing rate while at the same time driving the neuron activities to a non-chaotic regime. It is also important to note that one of the primary assumptions of the basic TD learning rule is a Markovian one, which considers future sensory cues and rewards to depend only on the current sensory cue, without any memory component. The use of a reservoir critic (due to the inherent fading temporal memory of recurrent networks; Lazar et al., 2007) breaks this assumption. As a result, this design principle extends our model to problems with short-term dependence of immediate sensory stimuli on the preceding history of stimuli and reward (see **Figure 4** for a simulated example of local temporal memory in reservoir neurons).

The actor (**Figure 3** left above) is designed as a single stochastic neuron, such that for a one dimensional action generation the output (*Oac*) is given as:

#### **Algorithm 1: Online RLS algorithm for learning reservoir to output neuron weights.**

*Initialize*: **W***out* = 0, the exponential forgetting factor (λ*RLS*) is set to a value less than 1 (we use 0.85), and the auto-correlation matrix ρ is initialized as ρ(0) = **I**/β, where **I** is the unit matrix and β is a small constant.

*Repeat*: At time step *t*

Step 1: For each input signal **u**(*t*), the reservoir neural firing rate vector **z**(*t*) and network output *v*ˆ(*t*) are calculated using Equation 11 and Equation 10.

Step 2: Online error *e*(*t*) calculated as: *e*(*t*) ← δ(*t*)

Step 3: Gain vector **K**(*t*) is updated as: **K**(*t*) ← ρ(*t* − 1)**z**(*t*) / [λ*RLS* + **z***T*(*t*)ρ(*t* − 1)**z**(*t*)]

Step 4: Update the auto-correlation matrix ρ(*t*): ρ(*t*) ← (1/λ*RLS*) [ρ(*t* − 1) − **K**(*t*)**z***T*(*t*)ρ(*t* − 1)]

Step 5: Update the instantaneous output weights **W***out*(*t*): **W***out*(*t*) ← **W***out*(*t* − 1) + **K**(*t*)*e*(*t*)

Step 6: *t* ← *t* + 1

*Until*: The maximum number of time steps is reached.
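Steps 3–5 of Algorithm 1 can be sketched in plain Python as below. This is an illustrative reading rather than the authors' code. It assumes the standard RLS form in which the factor 1/λ*RLS* scales the whole bracket in Step 4. It also assumes a single output neuron, so that the output weights form a vector with one entry per reservoir unit. The function names `rls_init` and `rls_step` and the default β are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rls_init(n: int, beta: float = 0.01) -> Tuple[List[float], Matrix]:
    """Initialize W_out = 0 and rho(0) = I / beta (beta is a small constant)."""
    w_out = [0.0] * n
    rho = [[(1.0 / beta if i == j else 0.0) for j in range(n)] for i in range(n)]
    return w_out, rho


def rls_step(w_out: Sequence[float], rho: Matrix, z: Sequence[float],
             error: float, lam: float = 0.85) -> Tuple[List[float], Matrix]:
    """One online RLS update; error is e(t), set to the TD error in Step 2."""
    n = len(z)
    # Step 3: gain vector K = rho z / (lambda + z^T rho z).
    rho_z = [sum(rho[i][j] * z[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(z[i] * rho_z[i] for i in range(n))
    gain = [v / denom for v in rho_z]
    # Step 4: rho <- (rho - K z^T rho) / lambda.
    zt_rho = [sum(z[i] * rho[i][j] for i in range(n)) for j in range(n)]
    rho_new = [[(rho[i][j] - gain[i] * zt_rho[j]) / lam for j in range(n)]
               for i in range(n)]
    # Step 5: W_out <- W_out + K e.
    w_new = [w + k * error for w, k in zip(w_out, gain)]
    return w_new, rho_new
```

In a full loop, `z` would come from the reservoir firing rates (Equation 11) and `error` from the TD error (Equation 12) at each time step.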

<sup>6</sup>The discount factor helps assign decreasing value to rewards further away in the future as compared to the current reward.

(not shown here). Spontaneous dynamics then unfolds in the system based on Equation 9. The lower right panel plots the activity of 5

$$
o\_{ac}(t) = \epsilon(t) + \sum\_{i=1}^{K} w\_i(t) u\_i(t), \tag{13}
$$

where *K* denotes the dimension (total number) of sensory stimuli (**u**(*t*)) to the agent being controlled. The parameter *wi* denotes the synaptic weights for the different sensory inputs projecting to the actor neuron. Stochastic noise is added to the actor via ε(*t*), which is the exploration quantity updated at every time step. This acts as a noise term, such that initially exploration is high, and the agent needs to navigate the environment more if the expected cumulative future reward *v*(*t*) is sub-optimal. However, as the agent learns to successfully predict the maximum cumulative reward (value function) over time, the net exploration decreases. As a result, ε(*t*) gradually tends toward zero as the agent starts to learn the desired behavior (correct policy). Using Gaussian white noise σ (zero mean and standard deviation one) bounded by the minimum and maximum limits of the value function (*vmin* and *vmax*), the exploration term is modulated as follows:

$$\epsilon(t) = \Omega \sigma(t) \cdot \min\left[0.5, \max\left(0, \frac{v\_{max} - \hat{v}(t)}{v\_{max} - v\_{min}}\right)\right]. \tag{14}$$

Here, Ω is a constant scale factor selected empirically (see Supplementary Material for details). The actor learns to produce the correct policy by an online adaptation (**Figure 3** left above) of its synaptic weights *wi* at each time step as follows:

$$
\Delta w\_i(t) = \tau\_a \delta(t) u\_i(t) \epsilon(t), \tag{15}
$$

where τ*a* is the learning rate such that 0 < τ*a* < 1. Instead of using the direct reward *r*(*t*) to update the input to actor neuron synaptic weights, using the TD error (i.e., the error of an internal reward) allows the agent to learn successful behavior even in delayed reward scenarios (where the reward is not given uniformly at each time step but is delivered as a constant value after a set of actions has been performed to reach a specific goal). In general, once the agent learns the correct behavior, the exploration term ε(*t*) should become zero, as a result of which no further weight change (Equation 15) occurs and *oac*(*t*) represents the desired action policy, without any additional noise component.
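The stochastic actor of Equations 13–15 can be sketched as follows. This is a hedged illustration, not the authors' implementation. The class name `StochasticActor`, the defaults for `tau_a`, `omega`, `v_min` and `v_max`, the zero initial weights, and the use of Python's `random.gauss` for the Gaussian noise σ are assumptions; the scale factor Ω is chosen empirically in the original work.

```python
import random
from typing import List, Optional, Sequence


class StochasticActor:
    """Hypothetical sketch of the single-neuron actor (Equations 13-15)."""

    def __init__(self, n_inputs: int, tau_a: float = 0.01, omega: float = 1.0,
                 v_min: float = 0.0, v_max: float = 1.0,
                 seed: Optional[int] = None):
        self.w: List[float] = [0.0] * n_inputs  # assumed zero initial weights
        self.tau_a = tau_a
        self.omega = omega
        self.v_min = v_min
        self.v_max = v_max
        self._rng = random.Random(seed)
        self._eps = 0.0  # exploration term used in the most recent action

    def exploration(self, v_hat: float) -> float:
        """Equation 14: noise scaled down as v_hat approaches v_max."""
        sigma = self._rng.gauss(0.0, 1.0)
        scale = min(0.5, max(0.0, (self.v_max - v_hat) / (self.v_max - self.v_min)))
        return self.omega * sigma * scale

    def act(self, u: Sequence[float], v_hat: float) -> float:
        """Equation 13: exploration noise plus weighted sensory input."""
        self._eps = self.exploration(v_hat)
        return self._eps + sum(w * ui for w, ui in zip(self.w, u))

    def learn(self, delta: float, u: Sequence[float]) -> None:
        """Equation 15: weight change gated by TD error and exploration."""
        self.w = [w + self.tau_a * delta * ui * self._eps
                  for w, ui in zip(self.w, u)]
```

When the critic's estimate reaches `v_max`, the exploration term in this sketch is exactly zero, so the output is deterministic and `learn` leaves the weights unchanged, which mirrors the behavior described above for a converged policy.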

the network exhibits considerable fading memory of the brief incoming input stimuli.

## **3. RESULTS**

In order to test the performance of our bio-inspired adaptive combinatorial learning mechanism, and validate the interaction through sensory feedback, between reward-based learning (basal ganglia) and correlation-based learning (cerebellum) systems, we employ a simulated, goal-directed decision making scenario of foraging behavior. This is carried out within a simplified paradigm of a four-wheeled robot navigating an enclosed environment, with gradually increasing task complexity.

#### **3.1. ROBOT MODEL**

The simulated wheeled robot NIMM4 (**Figure 5**) consists of a simple body design with four wheels whose collective degree of rotation controls the steering and the overall direction of motion. It is provided with two front infrared sensors (*IR*1 and *IR*2) which can be used to detect obstacles to its left or right side, respectively. Two relative orientation sensors (μ*G* and μ*B*) are also provided, which continuously measure the angle of deviation of the robot with respect to the green (positive) and blue (negative) food sources. They are calibrated to take values in the interval [−180°, 180°], with the angle of deviation μ*G*,*B* = 0° when the respective goal is directly in front of the robot; μ*G*,*B* is positive when the goal locations are to the right of the robot and negative in the opposite case. In addition, NIMM4 has two relative position sensors (*DG*,*B*) that calculate its relative straight-line distance to a goal, taking values in the interval [0, 1], with the respective sensor reading tending to zero as the robot gets closer to the goal location and vice versa.

**FIGURE 5 | Simulated mobile robot system for goal-directed behavior task. (Top)** The mobile robot NIMM4 with different types of sensors. The relative orientation sensor μ is used as state information for the robot. **(Bottom)** Variation of the relative orientation μ*G* to the green goal. The front left and right infrared sensors *IR*1 and *IR*2 are used to detect obstacles in front of the robot. Direction control for the robot is maintained using the quantity *Usteering* calculated by the individual learning components (ICO and actor-critic) and then fed to the robot wheels to generate forward motion or steering behavior. Sensors *DG* and *DB* measure the straight-line distance to the goal locations.

#### **3.2. EXPERIMENTAL SETUP**

The experimental setup (**Figure 6**) consists of a bounded environment with two different food sources (desired vs. punishing) located at fixed positions. The primary task of the robot is to navigate the environment such that, eventually, it learns to steer toward the food source that leads to positive reinforcements (green spherical ball in **Figures 6A–C**) while avoiding the goal location that provides negative reinforcements or punishments (blue spherical ball), within a specific time interval. The main task is designed as a continuous state-action problem with a distal reward setup (reinforcement zone in **Figure 6**), such that the robot starts at a fixed spatial location with a random initial orientation ([−60°, 60°]) and receives the positive or negative reinforcement signal only within a specific distance (*DG*,*B* = 0.2) of the two goal locations. Within this boundary, it receives a continuous reward of +1 at every time step for the green goal and a continuous punishment of −1 for the blue goal. At other locations in the environment, no reinforcement signal is given to the robot.
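The distal reward scheme described above can be written as a small function of the two distance sensor readings. This is only a sketch. The function name `goal_reinforcement` and the inclusive comparison at the zone boundary are assumptions, and the two zones are taken not to overlap.

```python
def goal_reinforcement(d_green: float, d_blue: float,
                       zone_radius: float = 0.2) -> float:
    """Reinforcement from the two goals, given distance readings in [0, 1].

    Returns +1 inside the green goal's zone, -1 inside the blue goal's zone,
    and 0 elsewhere. The zones are assumed not to overlap.
    """
    if d_green <= zone_radius:
        return 1.0
    if d_blue <= zone_radius:
        return -1.0
    return 0.0
```

For example, `goal_reinforcement(0.15, 0.8)` gives `1.0`, while a robot far from both goals, as in `goal_reinforcement(0.6, 0.7)`, receives `0.0`.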

The experiments are further divided into three different scenarios of, foraging without an obstacle (case I), with single obstacle (case II) and a dynamic foraging scenario (case III), demonstrating different degrees of reward modulated adaptation between the two learning systems in different environments. In all scenarios, the robot can continuously sense its angle of deviation to the two goals with μ*G*,*<sup>B</sup>* always active. This acts as a Markov decision process (MDP) such that, the next sensory state of the robot depends on the sensory information for the current state of the robot and the action it performed, and is conditionally independent of all the previous sensory states and actions. Detecting the obstacle results in negative reinforcement (continuous −1 signal) triggered by the front infrared sensors (*IR*1,<sup>2</sup> > 1.0). Furthermore, hitting the boundary wall in the arena results

**FIGURE 6 | Three different scenarios for the goal-directed foraging task. (A)** Environmental setup without an obstacle. Green and blue objects represent the two food sources with positive and negative rewards, respectively. The red dotted circle indicates the region where the turning reflex response (from the ICO learner) kicks in. The robot starts from, and is reset to, the same position, with a random orientation at the beginning of each trial episode. **(B)** Environmental setup with an obstacle. In addition to the previous setup, a large obstacle is placed in the middle of the environment. The robot needs to learn to avoid it and reach the rewarding food source. Collisions with the obstacle (triggered by *IR*<sup>1</sup> and *IR*2) generate negative rewards (−1 signal) for the robot. **(C)** Environmental setup with dynamic switching of the two objects. This is an extended version of the first scenario. After every 50 trials the reward zones are switched, such that the robot has to adjust dynamically to the new positively reinforced location (food) and learn a new trajectory from the starting location.

in a negative reinforcement signal (−1), with the robot being reset to the original starting location. Although the robot is provided with relative distance sensors, sensory stimuli (state information) are provided using only the angle of deviation sensors and the infrared sensors. The reinforcement zone (distance of *D<sub>G,B</sub>* = 0.2) is also used as the zone of reflex to trigger a reflex signal for the ICO learner. Fifty runs were carried out for each setup in all cases. Each run consisted of a maximum of 150 trials. The robot was reset if the maximum simulation time of 15 s was reached, if it reached one of the goal locations, or if it hit a boundary wall, whichever occurred first.

#### **3.3. CEREBELLAR SYSTEM: ICO LEARNING SETUP**

The cerebellar system in the form of ICO learning (**Figure 3** right) was set up as follows: μ<sub>*G,B*</sub> were used as predictive signals (CS). Two independent reflex signals (*x*<sub>0,*B*</sub> and *x*<sub>0,*G*</sub>, see Equation 6) were configured, one for the blue food source and the other for the green food source (US). The setup was designed following the principles of delay conditioning experiments, in which an overlap between the CS and the US stimuli must exist for learning to take place. The reflex signal was designed (measured in terms of the relative orientation sensors of the robot) to elicit a turn toward a specific goal once the robot comes within the reflex zone (inside the dotted circle in **Figures 6B,C**). Irrespective of the kind of goal (desired or undesired), the reflex signal drives the robot toward it with a turn proportional to the deviations defined by μ<sub>*G,B*</sub>, i.e., large deviations cause sharper turns. The green and blue balls were placed such that there was no overlap between the reflex areas; hence only one reflex signal, one per goal, was triggered at a time. In other words, the goal of the ICO learner is simply to learn to steer toward a food location without any knowledge of its worth. This is representative of an adaptive reflexive behavior as observed in rodent foraging studies, wherein the behavior is guided without explicit rewards and driven only by conditioning between the CS-US stimuli, such that the robot or animal learns to favor certain spots in the environment without any knowledge of their worth. The weights of the ICO learner, ρμ*<sup>G</sup>* and ρμ*<sup>B</sup>* (Equation 6), with respect to the green and blue goals were initialized to 0.0. If the positive derivative of the reflex signal exceeds a predefined threshold, the weights change; otherwise they remain static. A larger change in ρμ*<sup>G</sup>* than in ρμ*<sup>B</sup>* would therefore mean that the robot is drawn more strongly toward the green goal.
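
The thresholded weight change described above can be illustrated with a short sketch. It assumes the standard input correlation (ICO) form, in which a weight changes in proportion to the product of its predictive input and the temporal derivative of the reflex signal; Equation 6 itself is not reproduced here, and the function name `ico_update`, the learning rate, and the threshold value are hypothetical.

```python
def ico_update(rho, cs, reflex_now, reflex_prev, lr=0.01, threshold=0.0):
    """Return the updated ICO weight for one predictive (CS) input.

    rho          current weight (initialized to 0.0 in the text)
    cs           predictive signal, e.g. an angle-of-deviation reading
    reflex_now   reflex (US) signal at the current time step
    reflex_prev  reflex (US) signal at the previous time step
    lr, threshold are illustrative values, not the study's settings.
    """
    d_reflex = reflex_now - reflex_prev  # discrete-time derivative of the reflex
    if d_reflex > threshold:
        # Correlate the predictive input with the rise of the reflex signal.
        rho += lr * cs * d_reflex
    return rho
```

Note that no reward term appears in the update: the weight grows whenever the predictive signal precedes a rising reflex, regardless of whether the goal is rewarding or punishing.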

#### **3.4. BASAL GANGLIA SYSTEM: RESERVOIR ACTOR-CRITIC SETUP**

The basal ganglia system in the form of a reservoir-based actor-critic learner was set up such that the inputs to the critic and actor networks (**Figure 3** left) consisted of the two relative orientation sensor readings μ*<sup>G</sup>* and μ*<sup>B</sup>* and the front left and right infrared sensors (*IR*<sup>1</sup> and *IR*2) of the robot (**Figure 4**). Although the robot also contains relative distance sensors, these were not used as state information inputs. This makes the task less trivial, since sufficient but not complete information was provided to the actor-critic RL network. The reservoir network for the critic consisted of *N* = 100 neurons and one output neuron that estimates the value function *v*(*t*) (Equation 10). Reservoir input weights *Win* were drawn from a uniform distribution over [−0.5, 0.5], while the reservoir recurrent weights *Wsys* were drawn from a Gaussian distribution with mean 0 and standard deviation *g*<sup>2</sup>/*N* (see Equation 9). Here *g* acts as the scaling factor for *Wsys*; the network was designed with only 10% internal connectivity in *Wsys* and a scaling factor of 1.2. The reward signal *r*(*t*) (Equation 12) was set to +1 when the robot comes close to the green ball (within the reflex/reinforcement zone) and to −1 when it comes close to the blue ball. A negative reward of −1 was also given for any collision with the boundary walls or the obstacle. At all other locations within the environment, the robot receives no explicit reward signal. Thus, the setup is designed with a delayed reward scenario in mind, such that earlier actions lead to a positive or negative reward only when the robot enters the respective reinforcement/reflex zone. The synaptic weights of the actor with respect to the two orientation sensors (*w*μ*<sup>G</sup>* and *w*μ*<sup>B</sup>* ) were initialized to 0.0, while the weights with respect to the infrared sensors (*wIR*<sup>1</sup> and *wIR*<sup>2</sup> ) were initialized to 0.5 (Equation 13). After learning, a high value of *w*μ*<sup>G</sup>* and a low value of *w*μ*<sup>B</sup>* would drive the robot toward the green goal location and away from the blue goal. The weights of the infrared sensor inputs effectively control the turning behavior of the robot when it encounters an obstacle (higher *wIR*1: right turn; higher *wIR*2: left turn). The parameters of the adaptive combinatorial network are summarized in Supplementary Tables 1–3.
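
As an illustration of the initialization just described, the following sketch builds the input and recurrent weight matrices and the initial actor weights. It is a minimal sketch, not the authors' code: the function names, the nested-list representation, the seed, and the per-connection Bernoulli sampling used to obtain roughly 10% connectivity are assumptions. The spread of the Gaussian follows the text literally (the square of *g* divided by *N*, stated there as the standard deviation).

```python
import random


def init_reservoir(n_inputs=4, n_neurons=100, g=1.2, connectivity=0.1, seed=0):
    """Return (w_in, w_sys) as nested lists of floats.

    n_inputs=4 corresponds to the two orientation and two infrared inputs.
    """
    rng = random.Random(seed)
    # Input weights: uniform over [-0.5, 0.5].
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_neurons)]
    # Recurrent weights: sparse, zero-mean Gaussian. The spread follows the
    # text literally; each connection exists with probability `connectivity`.
    spread = g ** 2 / n_neurons
    w_sys = [[rng.gauss(0.0, spread) if rng.random() < connectivity else 0.0
              for _ in range(n_neurons)]
             for _ in range(n_neurons)]
    return w_in, w_sys


def init_actor_weights():
    """Initial actor weights as stated in the text (key names are hypothetical)."""
    return {"mu_G": 0.0, "mu_B": 0.0, "IR_1": 0.5, "IR_2": 0.5}
```

Starting the orientation weights at zero means the actor has no initial preference for either goal, while the non-zero infrared weights give it a default turning response to obstacles.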

#### **3.5. CASE I: FORAGING WITHOUT OBSTACLE**

In the simplest foraging scenario, the robot was placed in an environment with two possible food sources (green and blue) and without any obstacle in between (**Figure 6A**). In this case the green food source provided positive reward while the blue food source provided negative reward. The goal of the combined learning mechanism was to make the robot successfully steer toward the desired food source. **Figure 7A** shows simulation snapshots of the behavior of the robot as it explores the environment. As observed from the trajectory of the robot, it initially explored extensively and moved randomly around the environment, but eventually it learned to move solely toward the green goal. This can be analyzed further by looking at the development of the synaptic weights of the different learning components, as depicted in **Figure 8**. As observed in **Figure 8C**, owing to the simple correlation mechanism of the ICO learner (cerebellar system), the ICO weights adapt faster than those of the actor. Because of random exploration (**Figure 9B**) in the beginning, if the blue goal is visited more frequently, the reflexive pull toward the blue goal (ρμ*<sup>B</sup>*) is greater than that toward the green goal (ρμ*<sup>G</sup>*). However, after sufficient exploration, as the robot starts reaching the green goal more frequently, ρμ*<sup>G</sup>* also starts developing. This is counteracted by the actor weights (basal ganglia system), wherein there is a larger increase in *w*μ*<sup>G</sup>* (orientation sensor input representing the angle of deviation from the green goal) than in *w*μ*<sup>B</sup>* (orientation sensor input representing the angle of deviation from the blue goal). This results from the increased positive rewards received from the green goal (**Figure 9A**), which cause the TD-error to modulate the actor weights (Equation 15) accordingly. At the same time, no significant change is seen in the infrared sensor input weights (**Figure 8B**), because in this scenario the infrared sensors are triggered only on collisions with the boundary wall and remain dormant otherwise. Recall that the infrared sensor weights were initialized to 0.5.
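
To make the role of the TD-error concrete, the sketch below shows the textbook one-step temporal-difference error and a generic exploration-correlated actor update. It is an illustration under stated assumptions rather than a transcription of Equations 12 and 15, which are not reproduced in this section; the function names, the discount factor, and the learning rate are hypothetical.

```python
def td_error(reward, v_now, v_prev, gamma=0.98):
    """One-step temporal-difference error: r + gamma * v(t) - v(t-1)."""
    return reward + gamma * v_now - v_prev


def actor_update(weight, delta, noise, sensor_input, lr=0.005):
    """Return the actor weight after one TD-modulated update.

    The weight moves in the direction of the exploration noise when the
    TD-error is positive, and against it when the TD-error is negative.
    """
    return weight + lr * delta * noise * sensor_input
```

Under this form a sensor input that is never active (such as an infrared reading of zero away from walls) leaves its weight unchanged, which is consistent with the flat infrared weights reported for this scenario.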

Over time, as the robot moves more toward the desired food source, the ICO weights also stabilize, with the reflex toward the green goal being much stronger. This also leads to a reduction of the exploration noise (**Figure 9B**), and the actor weights eventually converge to a stable value (**Figures 8A,B**). Here, the slow RMHP rule performs a balancing act between the two learning systems, initially weighting the actor-critic learner more heavily and then switching toward the ICO system once the individual learning rules have converged. **Figure 9C** shows the development of the value function (*v*(*t*)) at each trial, as estimated by the critic. As observed, the critic initially underestimates the total value because of high exploration and random navigation in the environment. However, as the different learning rules converge, the value function starts to reflect the total accumulated reward, stabilizing after 25 trials (each trial consisted of approximately 1000 time steps).
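
The balancing act between the two learners can be pictured as a weighted sum of their outputs, with the two weights adapted slowly by a reward-gated rule. The sketch below uses one simple reward-gated Hebbian-style form chosen for illustration; the RMHP equations of the model are not reproduced in this section, so the function names, the running-mean arguments, and the learning rate are all assumptions.

```python
def combine(o_ico, o_ac, xi_ico, xi_ac):
    """Combined motor command as a weighted sum of the two learner outputs."""
    return xi_ico * o_ico + xi_ac * o_ac


def rmhp_step(xi_ico, xi_ac, o_ico, o_ac, mean_ico, mean_ac, reward, lr=0.005):
    """Return updated (xi_ico, xi_ac) after one reward-gated update.

    Each weight changes only when a reward is present, in proportion to the
    deviation of its own learner's output from a running mean, multiplied by
    the other learner's output.
    """
    xi_ico += lr * reward * (o_ico - mean_ico) * o_ac
    xi_ac += lr * reward * (o_ac - mean_ac) * o_ico
    return xi_ico, xi_ac
```

Because the update is gated by the reward, the combination weights stay fixed outside the reinforcement zones and change only in steps when a reward or punishment arrives.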

This is also clearly observed from the change in the orientation sensor readings shown in **Figure 9D**. Although the sensor readings change considerably at first, after learning the orientation sensor toward the green goal (μ*G*) records a positive angle, while the orientation sensor for the blue goal (μ*<sup>B</sup>*) records considerably lower, negative angles. This indicates that the robot learns to move stably toward the positively rewarded food source and away from the negatively rewarded blue food source. Although this is the simplest foraging scenario, the development of the RMHP weights ξ*ico* and ξ*ac* (**Figure 8D**) depicts the adaptive combination of the basal ganglia and cerebellar learning systems for goal-directed behavior control. Here the cerebellar system (namely ICO) acts as a fast adaptive reflex learner that guides and shapes the behavior of the

**FIGURE 8 | Synaptic weight change curves for the static foraging tasks without obstacle and with single obstacle. (A)** Change in the synaptic weights for the actor-critic RL learner. Here *w*μ*<sup>G</sup>* corresponds to the input weights of the orientation sensor toward the green goal and *w*μ*<sup>B</sup>* corresponds to the input weights of the orientation sensor toward the blue goal. **(B)** Change in the weights of the two infrared sensor inputs of the actor. *wIR*<sup>1</sup> is the left IR sensor weight and *wIR*<sup>2</sup> is the right IR sensor weight. **(C)** Change in the synaptic weights of the ICO learner. ρμ*<sup>G</sup>* is the CS stimulus weight for the orientation sensor toward green and ρμ*<sup>B</sup>* is the CS stimulus weight for the orientation sensor toward blue. **(D)** Learning curve of the RMHP combined learning mechanism, showing the change in the weights of the individual components. ξ*ico* is the weight of the ICO network output (depicted in red). ξ*ac* is the weight of the actor-critic RL network output (depicted in black). **(E–H)** Change in the weights corresponding to the single obstacle static foraging task. In all the plots the gray shaded region marks the region of convergence for the respective synaptic weights. Three different timescales exist in the system, with the ICO learning being the fastest, actor-critic RL being intermediate, and the adaptive combined learning being the slowest. (See text for more details.)


reward-based learning system. Although both individual systems eventually converge to provide the correct weights toward the green goal, the higher strength of the ICO component (ξ*ico*) leads to a good trajectory irrespective of the starting orientation of the robot. This is further illustrated in the simulation video showing three scenarios (only ICO, only actor-critic, and combined learning); see Supplementary Movie 1.

#### **3.6. CASE II: FORAGING WITH SINGLE OBSTACLE**

In order to evaluate the efficacy of the two learning systems and their cooperative behavior, the robot was now placed in a slightly modified environment (**Figure 6B**). As in the previous case, the robot still starts from a fixed location with initial random orientations. However, it now has to overcome an obstacle placed directly in front (field of view), in order to reach the rewarding food source (green goal). Collisions with the obstacle, during learning, resulted in negative rewards (−1) triggered by the front left (*IR*1) and right (*IR*2) infrared sensors. This influenced the actor-critic learner to modulate the actor weights via

exploration steadily decreases and the value function prediction also reaches near convergence at 25 trials (1 trial approximates 1000 time steps). The thick black line represents the average value calculated over 50 runs of the experiment with standard deviation given by the shaded region. **(D)** Plots of the two orientation sensor readings (in degrees) for the green (μ*G*) and the blue (μ*B*) goals, averaged over 50 runs. During initial exploration the angle of the deviation of the robot from the two goals changes randomly. However, after convergence of the learning rules, the orientation sensor readings stabilize with small positive angle of deviation toward the green goal and large negative deviation from the blue goal. This shows that post learning, the robot steers more toward the green goal and away from the blue goal. Here the thick lines represent average values and the shaded regions represent standard deviation.

TD-error and generate turning behavior around the obstacles. In parallel, the ICO system still learns only a default reflexive behavior of being attracted toward either of the food sources by a magnitude proportional to its proximity to them (same as case I), irrespective of the associated rewards. As observed from the simulation snapshots in **Figure 7B**, after initial random exploration, the robot learns the correct trajectory to navigate around the obstacle and reach the green goal. From the synaptic weight development curves for the actor neuron (**Figure 8E**), it is clearly observed that although there is initially a competition between *w*μ*<sup>G</sup>* and *w*μ*<sup>B</sup>*, after sufficient exploration, as the robot gets more positive rewards by moving to the green food source, the *w*μ*<sup>G</sup>* weight becomes larger in magnitude and eventually stabilizes.

Concurrently, in **Figure 8F**, it can be observed that, unlike in the previous case, the left infrared sensor input weight *wIR*<sup>1</sup> becomes considerably higher than *wIR*<sup>2</sup>. This indicates that the robot learns the correct behavior of turning right to avoid the obstacle and reach the green goal. Interestingly, however, in contrast to the simple case (no obstacle), the ICO learner tries to pull the robot more toward the blue goal, as seen from the weight development of ρμ*<sup>G</sup>* and ρμ*<sup>B</sup>* in **Figure 8G**. This behavior can be attributed to the fact that, because the robot reaches the blue object in the beginning, the fast ICO learner provides high weights for a reflexive pull toward the blue rather than the green goal. As learning proceeds and the robot learns to move toward the desired location (driven by the actor-critic system), the ρμ*<sup>G</sup>* weight also increases; however, the ICO learner still continues to favor the blue goal. As a result, in order to learn the correct behavior, the combined learning system needs to favor the actor-critic mechanism over the naive reflexes from the ICO. This is clearly observed from the balancing between the two, as depicted by the ξ*ico* and ξ*ac* weights in **Figure 8H**. Following the stabilization of the individual learning system weights, the combined learner gives a much higher weighting to the actor-critic RL system. Thus, in this scenario, despite the added complexity of an obstacle, the reward modulated plasticity (RMHP rule) learns to balance the two interacting learning systems such that the robot still makes the correct decisions over time (see the simulation run in Supplementary Movie 2).

#### **3.7. CASE III: DYNAMIC FORAGING (REVERSAL LEARNING)**

A number of modeling as well as experimental studies of decision making (Sugrue et al., 2004) have considered the behavioral effects of associative learning mechanisms on dynamic foraging tasks as compared to static ones. Thus, in order to test the robustness of our learning model, we changed the original setup (**Figure 6C**) such that initially a positive reward (+1) is given for the green object and a negative reward (−1) for the blue one. This enables the robot to learn to move toward the green object while avoiding the blue object. However, after every 50 trials the sign of the rewards was switched, such that the blue object now yielded positive reward and the green goal the opposite. As a result, the learning system needs to adapt quickly to the new situation and learn to navigate to the correct target. As observed in **Figure 10B**, the robot initially performs random exploration, receiving a mixture of positive and negative rewards; after sufficient trials, however, it reaches a stable configuration (exploration drops to zero) and, concurrently, receives positive rewards (**Figure 10A**). This corresponds to the previous case of learning to move toward the green goal. Once the rewards were switched, the robot obtained negative reward when it moved to the green object. As a consequence, the exploration gradually increased again and the robot once more exhibited random movements. After successive trials, a new stable configuration was reached, with the exploration dropping to zero and the robot now receiving more positive rewards, this time for the other target (blue object). This is depicted more clearly in the simulation snapshots in **Figure 7C** (beginning: random exploration; learn 1: reaching green goal; learn 2: reaching blue goal).
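
The reward reversal can be sketched as a simple schedule over the trial index. This is a hypothetical helper, not part of the study's code: the name `reward_signs` and the 0-based trial index are assumptions, and reading "after every 50 trials" as a periodic toggle (so that the signs flip back after a further 50 trials) is one interpretation of the text.

```python
def reward_signs(trial, switch_every=50):
    """Return (green_reward, blue_reward) for a 0-based trial index.

    Green starts as the rewarded goal; the signs are inverted after every
    `switch_every` trials (periodic toggling is an assumption).
    """
    flipped = (trial // switch_every) % 2 == 1
    return (-1.0, 1.0) if flipped else (1.0, -1.0)
```

For example, `reward_signs(49)` still rewards the green goal, whereas `reward_signs(50)` rewards the blue goal.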

In order to understand how the combined learning mechanism handles this dynamic switching, in **Figure 11** we plot the synaptic weight developments of the different components.

Initially the robot behavior is shaped by the ICO weights (**Figure 11B**) which learn to steer the robot to the desired location, such that the reflex toward green object (ρμ*<sup>G</sup>* ) is stronger than that toward the blue object (ρμ*<sup>B</sup>* ). Furthermore, as the robot

**FIGURE 10 | Temporal development of the reward and exploration noise for the dynamic foraging task. (A)** Change in the reward signal (r) over time. Between 3 × 10<sup>4</sup> time steps and 5 × 10<sup>4</sup> time steps the robot learns the initial task of reaching the green goal, receiving positive rewards (+1) successively. However, after 50 trials (approximately 5 × 10<sup>4</sup> to 5.5 × 10<sup>4</sup> time steps) the reward signals were changed, causing the robot to receive negative rewards (−1) as it drives to the green goal. After around 10 × 10<sup>4</sup> time steps, as the robot learns to steer correctly toward the new desired location (blue goal), it successively receives positive rewards. **(B)** Change in the exploration noise over time. There is random exploration in the beginning of the task and after switching the reward signals (pink shaded regions), followed by stabilization and a decrease in exploratory noise once the robot learns the correct behavior (gray shaded region). In both plots the thick dashed line (black) marks the point of reward switch.

receives more positive rewards, the basal ganglia system starts influencing its behavior by steadily increasing the actor weights toward the green object (**Figure 11A**, *w*μ*<sup>G</sup>* , *wIR*<sup>1</sup> > *w*μ*<sup>B</sup>* , *wIR*<sup>2</sup> ). This eventually causes the exploration noise to decrease to zero, and the robot learns a stable trajectory toward the desired food source. This corresponds to the initial stable region of the synaptic weights between 2 × 10<sup>4</sup> and 6 × 10<sup>4</sup> time steps in **Figures 11A–C**. Interestingly, the adaptive RMHP rule tries to balance the influence of the two learning systems, eventually weighting the ICO learner more heavily. This is similar to the behavior observed in the no-obstacle static scenario (**Figure 8D**). After 50 trials (5 × 10<sup>4</sup> time steps), the reward signs were inverted, which causes the exploration noise to increase. As a result, the synaptic weights try to adapt once again and influence the behavior of the robot, now toward the blue object. In this scenario, although the actor weights eventually converge to the correct configuration of *w*μ*<sup>B</sup>* greater than *w*μ*<sup>G</sup>*, the cerebellar reflexive behavior remains biased toward the green object (the previously learned stable trajectory). This can be explained by the fact that the cerebellar or ICO learner has no knowledge of the type of reinforcement received from the food sources and just naively tries to attract the robot to a goal when it is close enough to it (within the zone of reflex). As a result, the RMHP rule tries to balance the contributions of both learning mechanisms (**Figure 11D**) by increasing the strength of the actor-critic RL component as compared to the ICO learner

**FIGURE 11 | Synaptic weight change curves for the dynamic foraging task. (A)** Change in the synaptic weights for the actor-critic RL learner. Here *w*μ*<sup>G</sup>* corresponds to the input weights of the orientation sensor toward the green food source (spherical object) and *w*μ*<sup>B</sup>* corresponds to the input weights of the orientation sensor toward the blue. **(B)** Change in the synaptic weights of the ICO learner. ρμ*G* is the CS stimulus weight for the orientation sensor toward green and ρμ*<sup>B</sup>* is the CS stimulus weight for the orientation sensor toward blue. **(C)** Change in the weights of the two infrared sensor inputs to the actor. *wIR*1 is the left IR sensor weight and *wIR*2 is the right IR sensor weight. Modulation of the IR sensor weights initially and during the period from 7 × 10<sup>4</sup> to 9 × 10<sup>4</sup> time steps can be attributed to the high degree of exploration during this time, wherein the robot has considerable collisions with the boundary walls, triggering these sensors (see **Figure 7C**). **(D)** Learning curve of the RMHP combined learning mechanism showing the change in the weights of the individual

component (ξ*ac* > ξ*ico*). This lets the robot now learn the opposite behavior of stable navigation toward the blue food source, causing the exploration noise to decrease once again. Thus, through the adaptive combination of the different learning systems, modulated by the RMHP mechanism, the robot was able to deal with dynamic changes in the environment and complete the foraging task successfully (see the simulation run in Supplementary Movie 3).

Furthermore, as observed from the rate of success on the dynamic foraging task (**Figure 12A**), the RMHP based adaptive combinatorial learning mechanism clearly outperforms the individual systems (only ICO or only actor-critic RL). Here the rate of success was calculated as the percentage of times the robot was able to successfully complete the first task of learning to reach the green food source (green colored bars), and then after switching

components. ξ*ico* is the weight of the ICO network output (depicted in red) and ξ*ac* is the weight of the actor-critic RL network output (depicted in black). Here the ICO weights converge initially for the first part of the task but fail to re-adapt upon the change of reward signals. This is counterbalanced by the correct evolution of the actor weights. As a result, although the combinatorial learner initially places a higher weight on the ICO network, after the task switch the change in reinforcements causes the actor-critic RL system to receive higher weights and drive the actual behavior of the robot. The inlaid plots show a magnified view of the two synaptic weights between 9.5 × 10<sup>4</sup> and 10 × 10<sup>4</sup>. The plots show that the weights do not change in a fixed continuous manner, but increase or decrease in a step-like fashion corresponding to the specific points of reward activation (**Figure 10A**). In all the plots the gray shaded region marks the region of convergence for the respective synaptic weights, and the thick dashed line (black) marks the point of reward switch. (See text for more details.)

of the reward signals, the percentage of times it successfully reached the blue food source (blue colored bars). Furthermore, in order to test the influence of the RMHP rule, we tested the combined learning both with equal weighting of the ICO and actor-critic systems and with a plasticity-induced weighting of the two individual learning components. We observed that although the combined learning mechanism with equal weights works well for the initial static case of learning to reach the green goal, its performance drops considerably after the reward signals are switched and re-adaptation is required. Such performance was also observed in our previous work (Manoonpong et al., 2013) using a simple combined learning model of a feed-forward actor-critic (radial basis function) and ICO learning. However, in this work we show that the combination of a recurrent neural network actor-critic with ICO learning, using the RMHP

rule, was able to re-adapt the synaptic weights and combine the two systems effectively. The learned behavior greatly outperforms the previous case and shows a high success rate both for the initial navigation to the green goal location and, after the reinforcement signals are switched, for the subsequent navigation to the blue goal location.

In **Figure 12B**, we plot the average time taken to learn the first and second parts of the dynamic foraging task. The learning time was calculated as the number of trials required for successful completion of the task (i.e., successively reaching the green or blue goal/food source location), averaged over 50 runs of the experiment. The combined learning mechanism with RMHP learns the task in fewer trials than the individual learning systems. However, there was a significant increase in the learning time after the reward signals were switched. This can be attributed to the fact that, once exploration has dropped to zero and a stable configuration has been reached, the robot needs to perform more random exploration in order to change the strength of the synaptic connections enough for the opposite action of steering to the blue goal to be learned. Furthermore, as expected from the relatively fast learning rate of the ICO system, it learned the tasks much more quickly than the actor-critic system; however, its individual performance was less reliable than that of the actor-critic system, as observed from the success rate (**Figure 12A**). Taken together, our model of an RMHP-induced combination mechanism provides a much more stable and faster decision making system than the individual systems or a simple naive parallel combination of the two.
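
The two performance measures can be computed from per-run records along the following lines. This is a minimal sketch with hypothetical names and data layout: each run is assumed to report whether the task was completed and, if so, after how many trials. Averaging the learning time over successful runs only is also an assumption, since the text does not say how unsuccessful runs enter the average.

```python
from statistics import mean


def success_rate(outcomes):
    """Percentage of runs in which the task was completed.

    `outcomes` is a non-empty list of booleans, one per run.
    """
    return 100.0 * sum(outcomes) / len(outcomes)


def mean_learning_time(trials_to_learn):
    """Average number of trials needed, over runs that learned the task.

    `trials_to_learn` holds one entry per run: a trial count, or None if
    the run never reached the success criterion.
    """
    learned = [t for t in trials_to_learn if t is not None]
    return mean(learned) if learned else None
```

Reporting both measures matters here because, as noted above, a learner can be fast but unreliable: the ICO system scores well on learning time but poorly on success rate.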

## **4. DISCUSSION**

Numerous animal behavioral studies (Lovibond, 1983; Brembs and Heisenberg, 2000; Barnard, 2004) have pointed to an interactive role of classical and operant conditioning in guiding the decision making process for goal-directed learning. A number of these psychology experiments provide compelling evidence that both birds and mammals can effectively learn to perform sophisticated tasks when trained using a combination of these mechanisms (Staddon, 1983; Shettleworth, 2009; Pierce and Cheney, 2013). The feeding behavior of Aplysia has also been used as a model system to compare classical and operant conditioning at the cellular level (Brembs et al., 2004; Baxter and Byrne, 2006) and to study how predictive memory can be acquired by the neuronal correlates of the two learning paradigms (Brembs et al., 2002).

In the case of the mammalian brain, recent experimental evidence (Neychev et al., 2008; Bostan et al., 2010) points toward the existence of direct communication and interactive combination between the neural substrates of the reward learning and delay conditioning learning systems, namely the basal ganglia and the cerebellum. However, the exact mechanism by which these two neural systems interact is still largely unknown. A few experimental studies suggest that such communication could exist via the thalamus (Sakai et al., 2000), through which reciprocal connections from these two areas connect with the cortical areas in the brain (see **Figure 1**) (McFarland and Haber, 2002; Akkal et al., 2007). We therefore hypothesize in this paper (neural combined learning) that such a combination is driven by a reward modulated heterosynaptic plasticity (Legenstein et al., 2008; Hoerzer et al., 2012), triggered by dopaminergic projections (García-Cabezas et al., 2007; Varela, 2014) existing at the thalamus, that dynamically combines the output from the two areas and drives the overall goal-directed behavior of an organism. It is important to note that thalamic projections carrying basal ganglia and cerebellar inputs could also eventually converge onto a single pyramidal cell via relay neurons at the motor cortex. Furthermore, as the motor and frontal cortical regions, together with the striatum, have been observed to receive particularly dense dopaminergic projections from the midbrain areas (VTA) (Hosp et al., 2011), it is plausible that the proposed neuromodulatory heterosynaptic plasticity could also occur directly at the cortex (Ni et al., 2014). We model the classical delay conditioning paradigm observed in the cerebellum with the help of input correlation learning (Porr and Wörgötter, 2006), while reward-based learning modulated by prediction errors is modeled using a temporal difference model of actor-critic learning. Using a simple robot model and three scenarios of increasing complexity for a foraging task, we demonstrate that the neural combinatorial learning mechanism can effectively and robustly enable the robot to move toward a desired food source and to avoid a negatively rewarded, undesired food source, while remaining considerably robust to dynamic changes in the environmental setup.

Although there have been a few robot studies, trying to model basal ganglia behavior (Gurney et al., 2004; Prescott et al., 2006) and cerebellar learning for classical conditioning (Verschure and Mintz, 2001; Hofstoetter et al., 2002), to the best of our knowledge they have only been applied individually. In this study, for the first time, we show how such a combined mechanism can be implemented using a wheeled robot that leads to a more efficient decision making strategy. Although designed with a simplified level of biological abstraction, our model sheds light toward the way basal gangliar and cerebellar structures in the brain indirectly interact with each other through sensory feedback. Furthermore, our model of the critic based on a reservoir network, takes into account the strong reciprocal recurrent connections in the cortex that provide input to the striatal system (this is analogous to the output layer in our model) while being modulated by dopaminergic neural activity (TD-error). Such reservoir models of the basal ganglia system have also been previously implemented in the context of learning language accusation (Hinaut and Dominey, 2013) or for modeling the experimentally observed varying timescales of neural activity of domapinergic neurons (Bernacchia et al., 2011). Specifically in this work, the reservoir also provides a fading memory of incoming sensory stimuli (Dasgupta et al., 2014) that can enable the robot to deal with partially observable state space problems as shown previously in Dasgupta et al. (2013b). As a result such a recurrently connected network typically outperforms non-linear feed-forward models of the critic (Morimoto and Doya, 1998). 
Although beyond the scope of the current article, our work with the reservoir-based critic offers new insights into how large recurrent networks can be trained in a non-supervised manner using reward modulation and a simple recursive least squares algorithm. This has hitherto been a difficult problem, with only a few simple models existing that work on synthetic data (Hoerzer et al., 2012) or require supervised components (Koprinkova-Hristova et al., 2010).
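The notion of fading memory mentioned above can be illustrated with a minimal sketch. It is not the reservoir used in this study: the network size, the uniform random weights, the row-sum scaling factor of 0.9, and the function names `make_reservoir` and `pulse_response` are all assumptions chosen for illustration. Scaling each row of the recurrent matrix so that its absolute sum stays below one is a sufficient (not necessary) condition for the influence of a past input to decay over time.

```python
import math
import random


def make_reservoir(n, scale=0.9, seed=0):
    """Random recurrent weights; every row is rescaled so that the sum of
    absolute values equals `scale` (< 1), which guarantees contraction."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        total = sum(abs(v) for v in row)
        weights.append([v * scale / total for v in row])
    return weights


def step(weights, state, u, w_in):
    """One update of a tanh reservoir driven by a scalar input u."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, state)) + wi * u)
        for row, wi in zip(weights, w_in)
    ]


def pulse_response(n_steps=51, n=20, seed=0):
    """Drive the reservoir with a single input pulse at t = 0 and return
    the largest absolute unit activity at every time step."""
    rng = random.Random(seed + 1)
    weights = make_reservoir(n, seed=seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    state = [0.0] * n
    trace = []
    for t in range(n_steps):
        u = 1.0 if t == 0 else 0.0  # hypothetical sensory pulse
        state = step(weights, state, u, w_in)
        trace.append(max(abs(x) for x in state))
    return trace


if __name__ == "__main__":
    trace = pulse_response()
    for t in (0, 1, 2, 5, 10):
        print(t, trace[t])
```

The pulse remains visible in the network state for several steps after the input has ended and then fades, which is the property a critic can exploit when the current sensory reading alone does not identify the state.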

In the context of goal-directed behavior, one may also draw a similarity between the basic reflexive mechanism learned by the cerebellum (Yeo and Hesslow, 1998) and innate or intrinsic motivations in biological organisms, in contrast to the more extrinsic motivations (in the form of reinforcing evaluative feedback) provided by the striatal dopaminergic system of the basal ganglia (Boedecker et al., 2013). Our hypothesis is that an organism making decisions in a dynamic environment, wherein certain behaviors result in basic reflexes (based on CS-US conditioning) while others lead to specific rewards or punishments, needs a mechanism that can effectively combine these in order to accomplish the desired goal. Our neuromodulation scheme, namely the RMHP rule, provides such an adaptive combination that guides the behavior of the robot over time in order to achieve stable goal-directed objectives. In particular, our RMHP-based combined learning model provides evidence that cooperation between reinforcement learning and correlation learning systems can enable agents to perform fast and stable reversal learning (adaptation to dynamic changes in the environment). Such combination mechanisms could be crucial in dealing with navigation scenarios involving contrasting or competing goals, with gradual or sudden changes to environmental conditions. Furthermore, this could also point toward possible adaptation or maladaptation between the basal ganglia and cerebellum in neurological movement disorders like dystonia (Neychev et al., 2008), which typically involve both these brain structures.
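As a rough illustration of how a reward-modulated Hebbian rule can balance two learning systems, the following sketch updates one combination weight per system. It is a generic form of this family of rules under stated assumptions, not a reproduction of the implementation used in the experiments: the learning rate, the use of a running mean of the combined output, and the function names `combined_output` and `rmhp_step` are hypothetical.

```python
def combined_output(weights, outputs):
    """Weighted sum of the outputs of the individual learning systems."""
    return sum(w * o for w, o in zip(weights, outputs))


def rmhp_step(weights, outputs, reward, mean_combined, eta=0.1):
    """One reward-modulated Hebbian update (illustrative form).

    Each weight changes in proportion to the reward, to the deviation of
    the combined output from its running mean, and to the output of the
    system that the weight scales.
    """
    fluctuation = combined_output(weights, outputs) - mean_combined
    return [w + eta * reward * fluctuation * o for w, o in zip(weights, outputs)]


if __name__ == "__main__":
    # Hypothetical values: only the first system is active on this step.
    weights = [0.5, 0.5]
    outputs = [1.0, 0.0]
    print(rmhp_step(weights, outputs, reward=1.0, mean_combined=0.2))
    print(rmhp_step(weights, outputs, reward=-1.0, mean_combined=0.2))
```

In this form, a system whose activity coincides with above-average combined output under positive reward gains influence, and the same coincidence under negative reward reduces it, which is one way to obtain the kind of reversal described above.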

Overall, our computational model based on the combinatorial learning hypothesis shows that the learning systems of the basal ganglia and the cerebellum can indeed adaptively balance each other's output in order to deal with changes in environment, reward conditions, and dynamic modulation of prelearned decisions. Although here we modeled a novel reward modulation between the two systems, no direct feedback (interaction) between the cerebellum and basal ganglia was provided. In the future we plan to include such direct communication between the two in the form of inhibitory feedback, as evident from recent experimental studies (Bostan et al., 2010). However, in its current form, we envision such an adaptive combinatorial learning approach to have a wide impact on bio-mimetic agents, providing better solutions to decision making problems in both static and dynamic situations, as well as showing how the neuromodulation of executive circuits in the brain can effectively balance output from different areas. While our combined learning model verifies that the adaptive combination of the learning systems of the basal ganglia and the cerebellum leads to effective goal-directed behavior control in an artificial system, it would be interesting to further investigate this combination in biological systems, particularly in terms of the underlying neuronal correlates. Williams and Williams (1969) observed pigeons pecking at an illuminated key in a Skinner box; their results suggest that the desired key-pecking behavior (CR) may be shaped (autoshaping) not only by operant conditioning but also by classical conditioning, since imposing an omission schedule on the key-light, key-peck association did little to revoke the conditional pecking response. Hence, it seems that the remaining occasional pairings of the key-light CS with the food US are adequate to drive the pecking behavior (CR), which thus emerges from classical conditioning.
Based on these principles, several animal behavioral studies have observed similar autoshaping effects even in rodents (Cleland and Davey, 1983; Meyer et al., 2014), where multiple sources of information, e.g., colored lights or sound (conditioned stimuli), food (reward or unconditioned stimuli), and response levers or keys, shape and guide the animal responses over time toward desired behaviors. Although both the basal ganglia (Winstanley et al., 2005) and the cerebellum (Klopf, 1988) have been studied with regard to such behaviors, this has largely been carried out separately. However, our results on artificial systems indicate that their combined learning produces more efficient goal-directed behaviors, especially in reversal learning (dynamic foraging) scenarios. As such, future neurobiological (combining lesion and tracing studies) and animal psychology experiments could investigate classical conditioning (correlation-based learning) in the cerebellum, operant conditioning (reward-based learning) in the basal ganglia, and their combination for goal-directed behavior control in animals like rodents or birds. Furthermore, although we specifically investigated goal-directed behaviors in this study, there is widespread evidence of habit learning (Yin and Knowlton, 2006) and motor-skill learning (Salmon and Butters, 1995) in both these brain structures, with implications for neurodegenerative diseases like Parkinson's disease (Redgrave et al., 2010). Future experimental studies based on this combined learning hypothesis could investigate how such a combination and interaction between the two learning systems influences goal-directed decision making vs. habitual behaviors, and how possible imbalances between them affect neurodegenerative diseases (de Wit et al., 2011).

## **AUTHOR CONTRIBUTIONS**

Conceived and designed the experiments: Sakyasingha Dasgupta, Poramate Manoonpong, and Florentin Wörgötter. Performed the experiments: Sakyasingha Dasgupta. Analyzed the data: Sakyasingha Dasgupta and Poramate Manoonpong. Wrote the paper: Sakyasingha Dasgupta. Read and commented on the paper: Poramate Manoonpong and Florentin Wörgötter.

## **ACKNOWLEDGMENTS**

This research was supported by the Emmy Noether Program (DFG, MA4464/3-1), the Federal Ministry of Education and Research (BMBF) by a grant to the Bernstein Center for Computational Neuroscience II Göttingen (01GQ1005A, project D1) and the International Max Planck Research School for Physics of Biological and Complex Systems scholarship.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncir.2014.00126/abstract

#### **REFERENCES**


Haykin, S. S. (2002). *Adaptive filter theory*. Upper Saddle River, NJ: Prentice Hall.


*1998. Proceedings., 1998 IEEE/RSJ International Conference on* (Victoria, BC), 1721–1726.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 June 2014; accepted: 30 September 2014; published online: 28 October 2014.*

*Citation: Dasgupta S, Wörgötter F and Manoonpong P (2014) Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control. Front. Neural Circuits 8:126. doi: 10.3389/fncir.2014.00126*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Dasgupta, Wörgötter and Manoonpong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Thalamic neuromodulation and its implications for executive networks

### *Carmen Varela\**

Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA

#### *Edited by:*

Guillermo Gonzalez-Burgos, University of Pittsburgh, USA

#### *Reviewed by:*

Robert P. Vertes, Florida Atlantic University, USA

Randy M. Bruno, Columbia University, USA

#### *\*Correspondence:*

Carmen Varela, Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Building 46, Room 5233, 43 Vassar Street, Cambridge, MA 02139, USA e-mail: carmenv@mit.edu

The thalamus is a key structure that controls the routing of information in the brain. Understanding modulation at the thalamic level is critical to understanding the flow of information to brain regions involved in cognitive functions, such as the neocortex, the hippocampus, and the basal ganglia. Modulators contribute the majority of synapses that thalamic cells receive, and the highest fraction of modulator synapses is found in thalamic nuclei interconnected with higher order cortical regions. In addition, disruption of modulators often translates into disabling disorders of executive behavior. However, modulation in thalamic nuclei such as the midline and intralaminar groups, which are interconnected with forebrain executive regions, has received little attention compared to sensory nuclei. Thalamic modulators are heterogeneous with regard to their origin, the neurotransmitter they use, and their effect on thalamic cells. Modulators also share some features, such as having small terminal boutons and activating metabotropic receptors on the cells they contact. I will review anatomical and physiological data on thalamic modulators with two goals: first, to determine to what extent the evidence supports similar modulator functions across thalamic nuclei; and second, to discuss the current evidence on modulation in the midline and intralaminar nuclei in relation to their role in executive function.

**Keywords: modulators, acetylcholine, serotonin, dopamine, noradrenaline, histamine, midline, intralaminar**

#### **INTRODUCTION AND KEY TERMS**

#### **THALAMIC AFFERENTS: DRIVERS AND MODULATORS**

All the forebrain structures that contribute to cognitive functions receive input from the thalamus, which is a critical point for the routing of information and gateway control. Thalamic cells receive two general types of afferents, drivers and modulators. Thalamic drivers are afferents that target proximal dendrites with relatively large synaptic boutons, reliably evoke spikes in thalamic cells, and whose function is thought to be the faithful transmission of the spike message relayed by thalamic cells to postsynaptic structures. In contrast, modulators are those afferents that target primarily distal dendrites and influence spike transmission by adjusting the cellular and synaptic mechanisms underlying spike generation; by doing so, they are thought to fine-tune the message relayed by thalamic cells and control its probability of transmission (reviewed in Sherman and Guillery, 1998; Guillery and Sherman, 2002). It should be noted that this distinction between drivers and modulators is largely based on evidence from the sensory thalamus, which has critical relay functions. Outside of the sensory thalamus, the evidence (still scarce and mostly anatomical) suggests that the anatomical features that distinguish drivers and modulators are present in all thalamic nuclei, although the functional correlates regarding spike generation and transmission still need to be characterized for many thalamic regions. For example, nuclei outside the primary sensory thalamus receive afferents with driver morphology from multiple sources (Baldauf et al., 2005; Masterson et al., 2009). These drivers converging onto individual cells may contribute to spike generation like the drivers in sensory thalamus, but each of them could also contribute to subthreshold modulation that is integrated across all drivers to generate an output, something that will need to be tested. 
Similarly, some modulators outside of the primary sensory thalamus share features of drivers (such as the large cholinergic afferents in some higher order nuclei, reviewed below). Therefore, the definition of drivers and modulators that is used here is an operational definition that may need refinement as we learn more about the thalamus.

In every thalamic nucleus studied to date, modulator synapses are found to constitute the vast majority of inputs to a given relay cell. The innervation by modulators is particularly dense in the midline and intralaminar groups of thalamic nuclei, both interconnected with executive areas such as the medial prefrontal cortex (mPFC) and basal ganglia. mPFC and the basal ganglia have been extensively studied, including the effect of modulators on these regions. Surprisingly, the midline and intralaminar nuclei are largely unexplored territory compared to neocortex, basal ganglia, or the sensory thalamic nuclei. Even some basic questions, such as the cell response properties or the modulator effects on these thalamic nuclei, remain unanswered. This review will first discuss anatomical and physiological results on modulators across the thalamus. In the second part, it will review recent evidence that highlights the importance of midline and intralaminar nuclei in executive functions, and the role of modulators in these nuclei. The objective is to point out important gaps in knowledge and untested hypotheses regarding the function of modulators in the thalamus. Recent technological developments (optogenetics, pharmacogenetics, clearing techniques such as "clarity") provide powerful tools to address many open questions that must be answered in order to elucidate the role of thalamic modulation in executive networks.

Modulators constitute a heterogeneous group of afferents that nevertheless share some anatomical and physiological properties across the thalamus (reviewed in Sherman and Guillery, 1998; **Figure 1A**). Modulators originate in a variety of brain regions and use various neurotransmitters (summarized schematically in **Figure 1B**). Examples of thalamic modulators include cholinergic, serotonergic, dopaminergic, and noradrenergic afferents from the brainstem, histaminergic afferents from the hypothalamus, and glutamatergic afferents from layer VI of neocortex.

The first part of this review will discuss the anatomy and physiology of six chemically defined modulators across the thalamus. GABAergic inputs to thalamic nuclei, which originate primarily from diencephalic sources, will not be considered in this review. Furthermore, many neuroactive peptides (including orexins) co-localize with neurotransmitter systems in the thalamus (reviewed in Jones, 2007), and can have wider effects than neurotransmitters, for example, on gene expression, synaptogenesis, local blood flow, etc. Because of their broad spectrum of actions they fall far from the scope of this review. Similarly, other unconventional neurotransmitters like endocannabinoids, purines, and nitric oxide are present in the terminals of some thalamic afferents (reviewed in Jones, 2007), but their effects will not be examined here.

#### **THALAMIC NUCLEI: FIRST AND HIGHER ORDER**

Guillery (1995) distinguished two groups of thalamic nuclei: "First order" are those nuclei that receive drivers from ascending afferent pathways, and transmit information that arrives at the thalamus for the first time. Nuclei in the other group were named "higher order," and are those that relay information that has gone through the thalamus at least once (through a first order nucleus). The main feature that distinguishes higher from first order nuclei is that at least some of their driver input originates in layer V of neocortex; for this reason, they are thought to participate in cortico-cortical communication (Theyel et al., 2010). The first order group includes the ventral posterior, the ventral part (parvocellular) of the medial geniculate nucleus, the dorsal lateral geniculate nucleus (LGN), and the anterior thalamic nuclei, which receive somatosensory, auditory, visual, and mammillary afferents, respectively. There is evidence of layer V neocortical input for most of the other thalamic nuclei. These higher order nuclei have projections to higher order cortical regions (Clascá et al., 2012) and accumulating evidence points to the role of these nuclei in cognitive processes. See **Figure 2** for a schematic representation of thalamic nuclei at three anteroposterior levels of the rat thalamus, color-coded to indicate the first or higher order nature of each nucleus.

**Figure 1.** **(A)** [...] representation of the features that distinguish drivers and modulators in first order (left) and higher order (right) nuclei. Note the higher fraction of modulators in higher order nuclei, where some modulators have large synapses contacting proximal dendrites. In both first and higher order nuclei, modulators activate ionotropic and metabotropic receptors and, in those in [...] repetitive stimulation. **(B)** Approximate location of the brain regions that provide modulator afferents to the thalamus, color-coded for the neurotransmitter they use. The outline of a rodent brain is used for convenience, although the diagram combines results from different species (see text for details).

Modulators contribute a large proportion of the synapses that thalamic cells receive, their axonal terminals have thin and diffuse branches, and their terminals contain round small vesicles (they are called RS terminals, for their "Round" vesicles and "Small" size). Most RS terminals (the typical modulator morphology) contact the distal and intermediate parts of the dendrites. In cells reconstructed from thalamic first order nuclei, RS terminals form 40–80% of the synapses in distal and intermediate dendrites (Wilson et al., 1984; Raczkowski et al., 1988; Liu et al., 1995). The location of RS contacts far from the soma is consistent with them having a weak effect on spike generation. Still, in these studies the focus was on identifying terminal types, since RS terminals are likely to correspond to modulators. When modulators are identified by their neurotransmitter (reviewed below), additional terminal types and dendritic targets can be identified. For example, some modulator terminals (e.g., some cholinergic terminals in higher order nuclei) contact proximal dendrites, overlapping with driver synapses, and can be fairly large; and yet other modulators (serotonergic, noradrenergic, histaminergic) form "en passant" synapses, with little morphological specialization.

**Figure 2.** [...] Paxinos and Watson, 2004) and their abbreviated names. First order nuclei are labeled in blue, higher order in red, and nuclei that have not been [...] not include the medial geniculate nucleus (more posterior), which includes a first order region (the ventral portion) as well as higher order regions (dorsal portion).

In first order thalamus, cortical layer VI and cholinergic inputs account for the majority of modulators. Each of these inputs contributes almost 50% of the RS terminals in the cat LGN (Erisir et al., 1997b). Also, after injections of retrograde tracer in first order thalamic nuclei, and staining for the tracer and for acetylcholine markers in the brainstem, the majority of cells are double-labeled (Mesulam et al., 1983; Hallanger et al., 1987; Lee et al., 1988; Steriade et al., 1988). The proportion of retrogradely labeled cholinergic brainstem cells was on the order of 70–85% when the retrograde tracer was injected in first order nuclei like the ventral posterior, LGN, and medial geniculate (Steriade et al., 1988), suggesting that most of the brainstem modulators to these nuclei originate in cholinergic cells.

In higher order nuclei, the overall number of modulator synapses is almost twice the number in first order nuclei (Van Horn and Sherman, 2007). This difference could result from an increased number of modulator axons sent to higher order nuclei, or it could reflect an increased number of synapses per axon. It could also indicate the existence of additional afferent centers providing extra modulator input to higher order cells. Consistent with the latter, the proportion of brainstem cells projecting to higher order nuclei that were cholinergic was roughly 60% in the cat ventral anterior, ventral lateral, and anterior ventral; 45% in the macaque lateral posterior and pulvinar nuclei; and as low as 28% in the cat mediodorsal (Steriade et al., 1988), although the same study found the fraction to be 82% in the cat lateral posterior, a proportion more similar to the first order thalamus. Only 25% of brainstem cells retrogradely labeled from tracer injections in the intralaminar centromedian and parafascicular nuclei were cholinergic (Paré et al., 1988). Overall, the evidence suggests that additional brainstem modulators (in addition to cholinergic) project to higher order nuclei.

Within the higher order nuclei, the midline and intralaminar groups are densely interconnected with executive areas (mPFC, basal ganglia). Additional higher order nuclei outside the midline and intralaminar project to executive regions (Vertes et al., 2014). For example, the anterior nuclei are highly interconnected with the cingulate and retrosplenial cortices, and with mPFC. The motor thalamus (ventral anterior, ventral lateral, and ventral medial nuclei) has projections to the basal ganglia and motor cortices. The anterior and motor groups have been studied mainly in the context of their roles in episodic memory and motor control, and little is known about their participation in executive function. For this reason, the second part of this review will focus on the midline and intralaminar nuclei.

#### **THALAMIC MODULATORS**

#### **GLUTAMATE: LAYER VI CORTICOTHALAMIC MODULATORS**

Layer VI afferents are the most studied of thalamic modulators. The evidence indicates that they form a complex network from layer VI sublaminae to first and higher order thalamic nuclei; they are topographically and functionally organized, and have an important role in sensory gain control.

#### *Origin*

Thalamic glutamatergic modulators originate in layer VI of neocortical areas (Jacobson and Trojanowski, 1975; Kaitz and Robertson, 1981; Kelly and Wong, 1981; Abramson and Chalupa, 1985; Giguere and Goldman-Rakic, 1988; Conley and Raczkowski, 1990; Ojima, 1994; Bourassa et al., 1995; Bourassa and Deschênes, 1995; Lévesque and Parent, 1998; Wang et al., 1999; Murphy et al., 2000; Kakei et al., 2001; Killackey and Sherman, 2003; Cappe et al., 2007; Briggs, 2010). Allocortical areas also send afferents to the thalamus (Price and Slotnick, 1983; Cornwall and Phillipson, 1988; Groenewegen, 1988; Risold et al., 1997; McKenna and Vertes, 2004; Cenquizca and Swanson, 2006; Varela et al., 2014), although their glutamatergic nature needs confirmation. Dekker and Kuypers (1976) reported the presence of small terminals in the thalamus after injection of tritiated amino acids in hippocampus, which suggests that they are modulators, but the driver/modulator nature of hippocampo-thalamic projections remains to be investigated with modern techniques.

In neocortex, about 30–50% of the pyramidal cells in layer VI project to the thalamus (Thomson, 2010), and the anatomy of corticothalamic projections suggests a high degree of topographic precision in the function of layer VI compared to other modulators (Murphy et al., 1999; Hazama et al., 2004). Layer VI also contains cortico-cortical projecting cells, but corticothalamic cells do not project to other cortical areas (Petrof et al., 2012). In addition, different subdivisions of layer VI project to first and higher order nuclei (Conley and Raczkowski, 1990; Bourassa et al., 1995; Bourassa and Deschênes, 1995; Killackey and Sherman, 2003), and the organization of projections increases in complexity in monkeys compared to rodents. In rats, pyramidal cells in the upper portion of layer VI of primary sensory cortices project to their corresponding first order nucleus (LGN, ventral posterior), while the lower layer VI projects to the higher order (posterior medial and lateral posterior nuclei). Axons from lower layer VI frequently branch to innervate both the first and higher order nuclei in rat (Bourassa et al., 1995; Bourassa and Deschênes, 1995). In prosimians (galago), lower layer VI cells do not branch and, instead, different subsets of cells provide input to the LGN and the pulvinar nuclei (Conley and Raczkowski, 1990). Of the three tiers of layer VI in macaques, only the upper and lower have corticothalamic projections. Each of these two sublaminae is part of a distinct functional network, with the upper layer targeting the magnocellular layers in LGN, as well as their cortical targets in layer IVCalpha. The lower layer VI sublamina projects to parvocellular LGN cells, as well as to their target, layer IVCbeta (Thomson, 2010; Briggs and Usrey, 2011). Whether functional classes in other nuclei are similarly organized in parallel circuits with layer VI remains an open question. 
It would be particularly interesting to investigate the functional organization in higher order cortical regions (mPFC, higher order sensory areas) of different animal groups, since these areas become relatively enlarged through evolution (Krubitzer and Seelke, 2012) and may gain in network complexity as well.

Higher order nuclei receive layer VI inputs from multiple cortical areas, and we know less about the specific sublaminae within layer VI that contribute afferents to higher order nuclei. One possibility is that layer VI feedback follows a similar pattern to that observed in first order. This would mean that corticothalamic afferents reciprocating a thalamocortical projection would have an upper layer VI component, whereas non-reciprocal corticothalamic projections would originate in lower layer VI. There is evidence of this arrangement in the somatosensory system, where the posterior medial nucleus receives input from upper layer VI of the non-barrel cortex to which it projects, and also from the lower layer VI of primary somatosensory cortex, a main target of the ventral posterior nucleus (Killackey and Sherman, 2003). Similar results have been reported for the macaque mediodorsal nucleus (Giguere and Goldman-Rakic, 1988), which receives upper layer VI input from mPFC as part of a reciprocal connection, but receives both upper and lower layer VI inputs from areas of the cingulate cortex that get only sparse mediodorsal afferents.

There is little information regarding the contributions from the contralateral hemisphere to the corticothalamic projections. Small terminals (potential layer VI projections) have been reported in the contralateral mediodorsal nucleus after unilateral tracer injections in mPFC (Négyessy et al., 1998). Contralateral projections were also demonstrated from the motor cortex to several motor, intralaminar, and somatosensory thalamic nuclei (Molinari et al., 1985; Alloway et al., 2008).

#### *Local network organization*

One of the key features that distinguish layer VI glutamatergic inputs from other glutamatergic inputs (e.g., layer V and noncortical drivers) is the dendritic location of their synapses. Cortical modulators target mostly distal dendrites in both first and higher order nuclei (Robson, 1983; Kultas-Ilinsky and Ilinsky, 1991; Erisir et al., 1997a; Wang et al., 1999; Bartlett et al., 2000). In fact, the glutamatergic modulators contact the relay cells in more distal locations than other modulators (Erisir et al., 1997a).

The arborization pattern of individual axons is quite distinct, and *in vivo* results indicate that their geometrical shape is linked to the cell's response properties. Individual axons from layer VI cells form terminal arbors with a plate-like (Ojima, 1994: ventral portion of the medial geniculate nucleus; Kakei et al., 2001: ventral anterior and lateral nuclei) or rod-like morphology (Bourassa et al., 1995: ventral posterior nucleus; Bourassa and Deschênes, 1995: LGN; Rockland, 1996: pulvinar nucleus). Bourassa et al. (1995) and Bourassa and Deschênes (1995) did not find a consistent arborization pattern in the posterior medial and lateral posterior nuclei. However, they did report that axonal plexuses were always in the horizontal plane in the lateral posterior nucleus, and showed examples of both rod and plate-like configurations. In the LGN, the orientation of the rod-like corticothalamic terminals correlates with the response properties of the cells of origin, with the orientation of the terminals being either parallel or perpendicular to the orientation preference of the cells of origin (Murphy et al., 1999); the functional correlates of these arborization patterns need to be tested in other first and in higher order nuclei.

#### *In vitro results*

Layer VI corticothalamic afferents have a direct depolarizing effect on relay cells (Scharfman et al., 1990; Reichova and Sherman, 2004; Miyata and Imoto, 2006), and an indirect hyperpolarizing effect through the activation of the thalamic reticular nucleus (TRN; Landisman and Connors, 2007; Lam and Sherman, 2010). The direct excitatory effect is mediated by both ionotropic and metabotropic receptors (mGluRs). Although with exceptions, group I mGluRs are postsynaptic, and groups II and III are localized in presynaptic terminals (Niswender and Conn, 2010). Of the two group I mGluRs, mGluR1 contributes to the corticothalamic excitatory postsynaptic potentials (EPSPs) in the LGN, ventral posterior, and posterior medial nuclei (McCormick and von Krosigk, 1992; Turner and Salt, 2000; Reichova and Sherman, 2004). Instead, groups II and III mediate presynaptic inhibition of corticothalamic responses, both the direct EPSP (Turner and Salt, 1999; Alexander and Godwin, 2005) and the inhibitory postsynaptic potentials evoked by the TRN (Salt and Turner, 1998; Turner and Salt, 2003). The inhibitory component from the TRN can also be diminished by cholinergic input (Lam and Sherman, 2010). Since activation of mGluRs increases with the intensity of stimulation, presynaptic inhibition through group II receptors could prevent over-activation or saturation of thalamic responses. Recent evidence indicates that mGluRs can also be active with relatively low frequency of stimulation, which brings up the possibility of their involvement throughout the response curve of relay cells (Viaene et al., 2013). Another property of layer VI corticothalamic synapses is that the direct response facilitates following repetitive stimulation. The facilitation is the result of both presynaptic and postsynaptic mechanisms (Miyata and Imoto, 2006; Sun and Beierlein, 2011), and it is stronger for the EPSPs evoked on relay compared to TRN cells (Alexander et al., 2006; Jurgens et al., 2012).
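The frequency dependence of facilitation described above can be sketched with a simple phenomenological model. This is a minimal illustration under stated assumptions, not a fit to corticothalamic data: it captures only a presynaptic component, ignores depression, and the baseline release variable, the decay time constant, and the function name `facilitating_epsps` are hypothetical.

```python
import math


def facilitating_epsps(spike_times_ms, baseline_u=0.1, tau_f_ms=200.0, amp=1.0):
    """Relative EPSP amplitude for each presynaptic spike.

    A release variable u is incremented after every spike and relaxes back
    to its baseline between spikes with time constant tau_f_ms, so closely
    spaced spikes evoke progressively larger responses.
    """
    u = baseline_u
    last_t = None
    amplitudes = []
    for t in spike_times_ms:
        if last_t is not None:
            # Exponential decay of the facilitation accumulated so far.
            decay = math.exp(-(t - last_t) / tau_f_ms)
            u = baseline_u + (u - baseline_u) * decay
        amplitudes.append(amp * u)
        u += baseline_u * (1.0 - u)  # spike-triggered increment
        last_t = t
    return amplitudes


if __name__ == "__main__":
    fast = facilitating_epsps([i * 100.0 for i in range(10)])   # 10 Hz train
    slow = facilitating_epsps([i * 2000.0 for i in range(10)])  # 0.5 Hz train
    print("10 Hz, last/first amplitude:", fast[-1] / fast[0])
    print("0.5 Hz, last/first amplitude:", slow[-1] / slow[0])
```

With these illustrative parameters, the response grows along the 10 Hz train and stays essentially constant at 0.5 Hz, mirroring the dependence on repetitive stimulation.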

The activation of postsynaptic mGluRs is critical for one of the proposed functions of corticothalamic modulators: switching the firing mode of relay cells (McCormick and von Krosigk, 1992; Godwin et al., 1996). Relay cells in the thalamus fire spikes in two modes, burst and tonic (Jahnsen and Llinás, 1984). In tonic mode, relay cells respond in a linear fashion to their inputs, while burst firing is non-linear but provides better detectability (Sherman, 2001). Burst firing relies on the activation of a transient (T-type), low threshold, calcium current. Changes in membrane potential determine the de-inactivation and activation state of the calcium channels responsible for burst firing (Jahnsen and Llinás, 1984; Gutierrez et al., 2001). De-inactivation of the T current takes about 100 ms, which falls within the timeframe of mGluR responses. The relatively slow dynamics of mGluRs lead to slow changes in the membrane potential that can influence the firing mode. Thus, layer VI activation of a relay cell would make it more likely to fire spikes in tonic mode, facilitating faithful signal transmission (Sherman, 2001).
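The dependence of firing mode on the recent history of the membrane potential can be illustrated with a toy model of the T-channel inactivation gate. All numerical values are assumptions chosen only so that recovery is largely complete after roughly 100 ms of hyperpolarization: the Boltzmann half-inactivation voltage and slope, the 33 ms time constant, the 0.5 availability threshold, and the function names `h_inf`, `h_after`, and `firing_mode` are hypothetical and do not come from the studies cited above.

```python
import math


def h_inf(v_mv, v_half=-75.0, slope=4.0):
    """Steady-state fraction of T channels that are de-inactivated
    (available) at membrane potential v_mv; illustrative Boltzmann curve."""
    return 1.0 / (1.0 + math.exp((v_mv - v_half) / slope))


def h_after(h0, v_mv, duration_ms, tau_ms=33.0):
    """Gate value after holding the cell at v_mv for duration_ms, using the
    exact solution of first-order kinetics dh/dt = (h_inf - h) / tau."""
    target = h_inf(v_mv)
    return target + (h0 - target) * math.exp(-duration_ms / tau_ms)


def firing_mode(h, threshold=0.5):
    """Crude readout: enough available T current means the next
    depolarization would trigger a burst, otherwise tonic spikes."""
    return "burst" if h > threshold else "tonic"


if __name__ == "__main__":
    h_depolarized = h_inf(-60.0)                    # e.g., sustained depolarization
    h_brief = h_after(h_depolarized, -85.0, 10.0)   # 10 ms hyperpolarization
    h_long = h_after(h_depolarized, -85.0, 100.0)   # 100 ms hyperpolarization
    for label, h in (("depolarized", h_depolarized),
                     ("10 ms hyperpolarized", h_brief),
                     ("100 ms hyperpolarized", h_long)):
        print(label, round(h, 3), firing_mode(h))
```

In this sketch a slow, sustained depolarization of the kind attributed to mGluR activation keeps the gate inactivated and the cell in tonic mode, whereas only a hyperpolarization lasting on the order of 100 ms restores enough T current for a burst.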

#### *Systems level*

Most of the *in vivo* studies on corticothalamic projections have been done in the visual system in anesthetized preparations (recent reviews include Cudeiro and Sillito, 2006; Sillito et al., 2006; Briggs and Usrey, 2011), and only recently in awake animals (Olsen et al., 2012; Pais-Vieira et al., 2013). In the visual system, layer VI corticothalamic projections can influence center-surround strength without changing the spatial selectivity of receptive fields (Rivadulla et al., 2002; Jones et al., 2012). An important aspect of the corticothalamic input is that it is topographically and functionally organized, meaning that specific functional types of LGN cells (X, Y, W or parvocellular, magnocellular, koniocellular) will be influenced by layer VI cells with similar response properties. However, the effect on the firing rate of relay cells is reversed depending on the overlap of on–off receptive field regions. For example, an on-center relay cell with a receptive field overlapping with the "off" portion of a corticothalamic receptive field would receive an excitatory influence from cortex, whereas if the overlapping fields were of the same sign, the influence would be inhibitory (Wang et al., 2006). Topographically organized effects are also observed in the somatosensory system, where activation of layer VI cells produced opposite effects on simultaneously recorded neighboring thalamic barreloids. During layer VI activation, cells in non-aligned thalamic barreloids were suppressed and less selective to preferred whisker stimulation. Instead, during activation of layer VI, responses in the topographically aligned barreloid were selectively increased to preferred whisker stimulation, leading to an increase in spatial tuning selectivity (Temereanca and Simons, 2004). Enhanced responses were also seen in thalamic barreloids after activation of topographically aligned regions in motor cortex, which could contribute to sensory gating and anticipatory responses in cortex and thalamus during active whisking (Lee et al., 2008; Pais-Vieira et al., 2013).

The results from sensory systems demonstrate contributions to sensory processing, but corticothalamic inputs are found in every thalamic nucleus, which implies functions beyond specific sensory modalities. Layer VI cells receive input from all cortical layers and could serve to integrate processed cortical information with the direct input from the thalamus (Thomson, 2010). On the other hand, the effect of corticothalamic inputs on membrane potential points to a gain control system. There is evidence in support of the gain control hypothesis in the mouse visual cortex (Olsen et al., 2012), in which optogenetic manipulation (activation and inhibition) of layer VI scaled the tuning curves of cortical cells up and down without changes in response selectivity (**Figure 3**). Stimulation of layer VI linearly reduced cortical responses to the presentation of full-field gratings moving in different directions (**Figures 3A–C**), while inhibition of layer VI increased cortical responses (**Figures 3D–F**). This linear modification of the cortical tuning curves was found to result from the effect of layer VI on other cortical layers and on thalamic LGN cells. However, the effect on tuning curves was not tested in LGN, and the role of layer VI on gain control deserves further exploration at the thalamic level. In particular, although other modulators have an effect on membrane potential and could influence thalamic gain, the topographic and functional organization of the corticothalamic projection suggests that layer VI provides a more precise control than other modulators. Along these lines, corticothalamic projections could carry out topographically specific, top-down gain control in sensory nuclei as a function of ongoing neocortical processing. It has also been suggested that they could implement predictive modulation (Sillito et al., 2006) in expectation of stimulus arrival or stimulus changes, such as when processing a moving stimulus. Future experiments should test these hypotheses, and step beyond sensory cortices to explore the role of layer VI in other thalamocortical networks.

**FIGURE 3 | Layer VI contributes to gain control in mouse visual cortex. (A)** Response of a layer V cell (spike rasters and peri-stimulus histograms) to visual stimuli with and without photostimulation of layer VI; black line above raster indicates stimulus presentation, blue indicates the time of optogenetic activation of layer VI. Visual stimuli were full-field gratings drifting in different directions (arrows); scale bar, 40 spikes/s. **(B)** Tuning curves for the cell in **(A)**, including the responses to nine stimulus directions, with (blue) and without (black) photostimulation of layer VI. **(C)** Population tuning curve with (blue) and without (black) photostimulation of layer VI; the population tuning curve was generated by first circularly shifting the stimulus direction for each unit so that the maximal response occurred at 0°. The responses were then normalized to this peak response and averaged (n = 55 units). **(D–F)** same as **(A–C)** but during photosuppression of layer VI, and using a cell from layer IV as example; scale bar in **(D)** 50 spikes/s; population tuning curve in **(F)** is the average of n = 52 units. (Reprinted from Olsen et al., 2012, with permission from Macmillan Publishers.)
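The population tuning curve procedure described in the Figure 3 caption (circular shift so that the peak sits at 0°, normalization to the peak, averaging across units) can be sketched as follows. This is a minimal illustration and not the analysis code of Olsen et al. (2012): the function names are hypothetical and the firing rates in the usage example are made-up toy numbers. The last helper shows why a purely multiplicative gain change leaves the peak-normalized shape of a tuning curve, and hence its selectivity, unchanged.

```python
def circular_shift_to_peak(responses):
    """Rotate a tuning curve (list of rates, one per direction) so that
    the maximal response sits at index 0, i.e. at 0 degrees."""
    peak_index = max(range(len(responses)), key=lambda i: responses[i])
    return responses[peak_index:] + responses[:peak_index]


def normalize_to_peak(responses):
    """Divide every response by the peak response of the same curve."""
    peak = max(responses)
    return [r / peak for r in responses]


def population_tuning_curve(units):
    """Average the aligned, peak-normalized tuning curves of all units."""
    aligned = [normalize_to_peak(circular_shift_to_peak(u)) for u in units]
    return [sum(column) / len(aligned) for column in zip(*aligned)]


def apply_gain(responses, gain):
    """Multiplicative (linear) scaling of a tuning curve."""
    return [gain * r for r in responses]


if __name__ == "__main__":
    # Toy firing rates (spikes/s) for four directions; not real data.
    unit_a = [2.0, 4.0, 10.0, 4.0]
    unit_b = [8.0, 2.0, 1.0, 2.0]
    print(population_tuning_curve([unit_a, unit_b]))

    # Halving the gain changes response amplitude but not the normalized shape.
    print(normalize_to_peak(apply_gain(unit_a, 0.5)))
    print(normalize_to_peak(unit_a))
```

Note that normalizing each curve to its own peak removes the gain difference by construction; to display the amplitude change between conditions, the manipulated curve would instead be normalized to the control peak, which is an assumption about presentation rather than something stated in the caption.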

#### **ACETYLCHOLINE**

Cholinergic systems have been broadly involved in state regulation (sleep–wake cycle, attention) and may contribute to state dependent changes in information routing in neocortex. The thalamus receives cholinergic input from a variety of sources that preferentially innervate higher order nuclei and, through these nuclei, could contribute to cholinergic mediated modulation in neocortex. In the thalamus, cholinergic terminals can have large synaptic boutons (with potentially strong postsynaptic effects), and the effect on relay cells can be circuit specific, determined by the cell's projection target.

#### *Origin*

Cholinergic input to the thalamus originates mainly in the pedunculopontine (PPT) and the laterodorsal tegmental (LDT) nuclei (Saper and Loewy, 1980; Mesulam et al., 1983; Sofroniew et al., 1985; Woolf and Butcher, 1986). Cholinergic neurons in the PPT and LDT are intermingled with non-cholinergic neurons but, after injection of retrograde tracers in the thalamus, most of the retrograde tracer is found in choline acetyltransferase positive neurons, suggesting that the non-cholinergic cells project sparsely to the thalamus (Mesulam et al., 1983; Sofroniew et al., 1985). Besides the PPT and LDT afferents, some thalamic nuclei (the mediodorsal, anterior ventral, anterior medial, and anterior intralaminar nuclei) receive cholinergic projections from the basal forebrain (Hallanger et al., 1987; Parent et al., 1988; Steriade et al., 1988; Heckers et al., 1992; Gritti et al., 1998), a region otherwise projecting to cortical areas and to the TRN (Saper, 1984; Hallanger et al., 1987). The parabigeminal nuclei provide additional cholinergic input to the LGN of cats and monkeys, a projection that is both ipsi- and contralateral in cats and strictly contralateral in the tree shrew (De Lima and Singer, 1987; Fitzpatrick et al., 1988, 1989; Smith et al., 1988; Bickford et al., 2000). Lastly, cholinergic neurons from the entopeduncular nucleus (Kha et al., 2000) and *substantia nigra* (*pars reticulata*; Kha et al., 2001) send axons to the rat ventral lateral and ventral medial nuclei, both part of the motor thalamus. Within the diencephalon, the medial habenula contains cholinergic neurons (Levey et al., 1987; Heckers et al., 1992), but its efferents appear to be directed outside the dorsal thalamus (Vincent et al., 1980).

#### *Local network organization*

PPT and LDT cholinergic projections have preferential targets within the thalamus. Sensory nuclei (LGN, ventral posterior, and the medial geniculate nuclei) receive most of their cholinergic afferents from PPT, whereas higher order nuclei and the anterior group have a LDT component (Woolf and Butcher, 1986; Hallanger et al., 1987; Smith et al., 1988; Steriade et al., 1988). This additional LDT innervation may contribute to the higher density of cholinergic fibers observed in some higher order compared to first order nuclei (Parent and Descarries, 2008).

Within the higher order group, the mediodorsal, the lateral posterior, ventral anterior, ventral lateral, laterodorsal, and posterior nuclei receive a substantial fraction of their cholinergic input from LDT. The two latter nuclei receive about two thirds of their brainstem cholinergic input from PPT and a third from LDT. Within the intralaminar, the central lateral seems to be primarily targeted by PPT, while the central medial has a large component from LDT (Woolf and Butcher, 1986; Hallanger et al., 1987). Anterograde tracers have also demonstrated LDT projections to the midline nuclei (Kuroda and Price, 1991); however, the relative contribution of PPT and LDT to the midline cholinergic innervation was not addressed in this study.

At least some of the cholinergic brainstem axons have collaterals that innervate more than one nucleus in the dorsal thalamus (Uhlrich et al., 1988; Shiromani et al., 1990; Bolton et al., 1993), and can innervate the TRN as well (Spreafico et al., 1993). In some cases, the axons remain within nuclei of a particular sensory modality; e.g., the collaterals that innervate the LGN, lateral posterior, and pulvinar nuclei in cat (Uhlrich et al., 1988). There are other patterns of collateral projections, e.g., those that branch into several of the midline nuclei, or to midline and intralaminar (Bolton et al., 1993), or to LGN and intralaminar nuclei (Shiromani et al., 1990). More localized projections have been documented in the visual thalamus. Here, some axons terminate only in the LGN or only in the lateral posterior and pulvinar nuclei. Axons within the LGN distribute terminals across laminae or inside individual laminae (Uhlrich et al., 1988). It should be noted that in this study axons were not identified as cholinergic; however, results from retrograde tracer studies (see introduction) suggest that most or all of the reconstructed axons were cholinergic.

Cholinergic cells projecting to the thalamus can have branches to extra-thalamic regions as well. PPT projects both to the LGN and to the superior colliculus (Billet et al., 1999). Similarly, subsets of cells in PPT and LDT that project to the thalamus also project to the pontine reticular formation (Semba et al., 1990) and to the basal forebrain (Losier and Semba, 1993). The collaterals of cholinergic projections may contribute to the multi-regional coordination of state changes brought about by this system.

The ultrastructure of cholinergic terminals has been studied in a few first order – LGN, ventral posterior –, and higher order – anterior ventral, mediodorsal, parafascicular – nuclei (Hallanger et al., 1990; Kuroda and Price, 1991; Parent and Descarries, 2008). One feature of the LGN PPT terminals is that they contain the enzyme nitric oxide synthase (Cucchiaro et al., 1988; Hallanger et al., 1990; Bickford et al., 1993; Erisir et al., 1997a). In fact, cholinergic afferents may be the main, or even the sole, source of nitric oxide in the thalamus; although some serotonergic cells in the raphe express nitric oxide synthase, they do not project to the thalamus (Simpson et al., 2003). In the LGN, PPT terminals form asymmetric synapses on proximal and distal dendrites of relay cells, often in the vicinity of driver synapses, and occasionally in the soma. Compared to the LGN, the cholinergic terminals in the ventral posterior nucleus are sparser, smaller, and they establish asymmetric synapses on small dendrites (farther from the soma; Hallanger et al., 1990). The main difference between the ultrastructure of cholinergic terminals in first order and higher order nuclei is the much larger size in higher order. In both the mediodorsal and parafascicular nuclei, they can reach more than 2 μm (Hallanger et al., 1990; Kuroda and Price, 1991). In the mediodorsal nucleus, 90% of LDT boutons were larger than 1 μm, which is a size range more typical of drivers. In the ventral anterior nucleus, cholinergic terminals were less than 1 μm, but still larger on average than the terminals reported in the LGN and ventral posterior nuclei in the same preparation, suggesting a stronger effect on cells of higher order nuclei. The cholinergic terminals in the anterior ventral nucleus contacted dendrites of various sizes (often small dendrites and rarely somas), and they made occasional symmetric synapses in addition to the most common asymmetric contacts (Hallanger et al., 1990). 
The presence of nitric oxide synthase was not tested in higher order nuclei.

The larger cholinergic terminal size and fiber density in higher order nuclei may result in stronger postsynaptic effects on higher order compared to first order relay cells, something that can have important implications in cortical regions. As an example, association neocortical areas (those receiving afferents from higher order nuclei) present greater attentional modulation than primary cortical regions (Bender and Youakim, 2001; Maunsell and Cook, 2002), a function in which the cholinergic system may be involved. The attentional modulation observed in neocortex could reflect modulation at the thalamic level. Indeed, the evidence suggests that higher order nuclei, such as the pulvinar nucleus, have stronger attentional modulation than first order like the LGN (Bender and Youakim, 2001), and contribute to corticocortical synchronization during attentional tasks (Saalmann et al., 2012). Future manipulation experiments of higher order nuclei while observing the effect on attentional modulation in thalamus and cortex simultaneously, will help clarify the causal contribution of the thalamus to attentional modulation in cortical regions.

Another open question is the origin of the large cholinergic terminals. Higher order nuclei receive a substantial projection from the LDT, and one possibility is that LDT axons provide the larger terminals observed in the thalamus. A further point related to the terminal size is that large terminal size is commonly associated with drivers and not modulators. Cholinergic afferents with large terminals could have a strong effect on spike generation probability on higher order cells (e.g., in the mediodorsal and parafascicular nuclei) because, in addition to having a large size, cholinergic terminals in these cells contact dendritic regions that are close to the soma. Both the lateral mediodorsal nucleus and LDT have been suggested to participate in oculomotor control (Kuroda and Price, 1991) and it is possible that the LDT projection represents a driver input to the mediodorsal nucleus.

#### *In vitro and systems level*

Cholinergic activation depolarizes the majority of thalamic cells (Sillito et al., 1983; Francesconi et al., 1988; Curró Dossi et al., 1991), although some relay cells, as well as thalamic interneurons, are hyperpolarized by cholinergic agonists (McCormick and Prince, 1986; McCormick and Pape, 1988; Hu et al., 1989; Murphy et al., 1994; Zhu and Heggelund, 2001; Varela and Sherman, 2007). In general, relay cells that are hyperpolarized by acetylcholine are in higher order nuclei (MacLeod et al., 1984; Mooney et al., 2004; Varela and Sherman, 2007; Beatty et al., 2009). Interestingly, at least in one higher order nucleus (the parafascicular), the sign of the cholinergic effect correlates with the projection target of the cell. Relay cells projecting to neocortex are depolarized by cholinergic agonists, whereas those projecting to striatum are inhibited (Beatty et al., 2009). This result has key implications for the function of thalamostriatal projections in behavioral flexibility, and will be discussed in the second part of this review. It also raises the possibility that the depolarizing or hyperpolarizing effect of modulators may be pathway specific in other nuclei; given the variety of modulator effects in higher order nuclei (**Figure 4**), the correlation between modulator effect and projection target needs to be tested for pathways from these nuclei.

Mixed responses, in which a hyperpolarization is followed by depolarization, have also been reported. This combined response was observed in the lateral posterior nucleus, in interneurons of the LGN (Zhu and Heggelund, 2001), and in a subset of cells of the ventral medial nucleus (MacLeod et al., 1984). It was also reported in about half of the cells in the guinea pig lateral and medial geniculate nuclei (McCormick and Prince, 1987; McCormick, 1992), and could represent species differences, with depolarization being the most common response in rat first order nuclei.

Overall, cholinergic-evoked depolarization (whether by itself or as part of a mixed response) is mediated by ionotropic and muscarinic (M1, M3) receptors (Zhu and Uhlrich, 1997, 1998; Mooney et al., 2004; Varela and Sherman, 2007), whereas the M2 muscarinic receptor is responsible for the hyperpolarization of GABAergic cells (McCormick and Prince, 1986; Zhu and Heggelund, 2001).

Aside from the effect on membrane potential, other effects of acetylcholine at the thalamic level have not been extensively studied. Results outside the thalamus suggest that there is much to be explored regarding the functions of the cholinergic system in the thalamus (Picciotto et al., 2012), especially in behaving animals. In the slice preparation, acetylcholine affects neurotransmitter release and synaptic strength in intracortical and thalamocortical synapses (Favero et al., 2012), changes that can be important during the implementation of bottom-up and top-down attentional regulation (Varela, 2013) and can only be studied in the behaving animal. In addition, results from a head-restrained preparation show that the effects on membrane potential observed in the slice may vary *in vivo* throughout the sleep–wake cycle. Iontophoretic application of cholinergic agonists in the LGN depolarized cells during wakefulness, as expected from the *in vitro* results, but had heterogeneous effects during slow-wave sleep (Marks and Roffwarg, 1989). Lastly, cholinergic activation enhances thalamocortical information transmission through nicotinic receptors located along the axons of the thalamocortical pathway (Kawai et al., 2007), a result that remains to be investigated in thalamic projections to other targets, like the basal ganglia and hippocampus.

**FIGURE 4 |** … thalamic nuclei from whole-cell patch clamp experiments in rat slices; data are color-coded according to the overall effect on excitability. (… Varela and Sherman, 2007, 2009; with permission from the American Physiological Society and from Oxford University Press.)

#### **SEROTONIN**

Serotonergic afferents to the thalamus have not received much attention, in spite of the critical involvement of serotonin in the control of the sleep–wake cycle and in disorders like depression (Monti, 2011; Kupfer et al., 2012). In the thalamus, serotonergic afferents preferentially target higher order nuclei, where they have heterogeneous effects on membrane potential and could evoke changes in firing mode throughout the sleep–wake cycle.

#### *Origin and local network organization*

The serotonergic axons innervating the thalamus have their origin in the medial and lateral divisions of the dorsal raphe (De Lima and Singer, 1987; Vertes, 1991; Gonzalo-Ruiz et al., 1995; Vertes et al., 1999, 2010; Kirifides et al., 2001), and in the median raphe (Gonzalo-Ruiz et al., 1995; Vertes et al., 1999). The projections do not always overlap; for example, the median raphe projects most heavily to the lateral mediodorsal nucleus, while the medial mediodorsal nucleus receives serotonergic input from the dorsal raphe (Groenewegen, 1988).

As with the cholinergic input, the distribution of serotonergic fibers within the thalamus is not uniform. The preferential targets are the midline and intralaminar nuclei, and, more generally, the higher order nuclei. The rest of the dorsal thalamus receives sparse innervation with the exception of the LGN (Cropper et al., 1984; Lavoie and Parent, 1991; Vertes, 1991; Vertes et al., 1999, 2010). There is some evidence of local differences in innervation density within nuclei. The heaviest serotonergic innervation in the LGN is generally found in structures receiving input from W-ganglion cells (Ueda and Sano, 1986; Mize and Payne, 1987; Fitzpatrick et al., 1989), although others have found uniform innervation across the LGN and lateral posterior and pulvinar nuclei (Morrison and Foote, 1986).

Serotonergic afferents form asymmetric synapses along the dendrites (distal and proximal) of thalamic cells (Pasik et al., 1988; Liu and Jones, 1991). They also form atypical contacts (Liu and Jones, 1991), meaning that they do not present all the morphological specializations of a synapse, only a close membrane apposition.

#### *In vitro and systems level*

Serotonin depolarizes thalamic cells in first order nuclei, such as the LGN, the ventral portion of the medial geniculate, the ventral posterior, and the anterior dorsal nuclei (Pape and McCormick, 1989; McCormick and Pape, 1990; Chapin and Andrade, 2001a; Monckton and McCormick, 2002). The depolarization results, at least in part, from changes in the voltage-dependence of the hyperpolarization-activated current, Ih (Pape and McCormick, 1989; McCormick and Pape, 1990; Chapin and Andrade, 2001b; Monckton and McCormick, 2002). Subsets of cells in higher order nuclei are either depolarized or hyperpolarized, and the proportion of cells that show one or the other response varies between species (Monckton and McCormick, 2002; Varela and Sherman, 2009). When compared in the same preparation, the depolarization is much stronger in higher order than in first order areas (Varela and Sherman, 2009), consistent with the denser innervation in those nuclei. Overall, both acetylcholine and serotonin inhibit a subset of cells specifically in higher order nuclei, while the effect is mostly depolarizing in first order (**Figure 4**). The inhibition of cells in higher order means that, when active, these modulators could switch some cells to burst mode, which can contribute to the finding of more bursting in higher compared to first order nuclei (Ramcharan et al., 2005). In addition to the effect on the membrane potential, there is evidence that serotonin affects the response properties of some relay cells. Cells in the midline and intralaminar nuclei have a strong slow afterhyperpolarization (sAHP) that can last several seconds after a train of spikes. Serotonin depolarizes cells in these nuclei and inhibits the sAHP through 5-HT-7 receptors (Goaillard and Vincent, 2002).

There is little information from *in vivo* preparations on the role of serotonin in thalamic function. Activation of the dorsal raphe nucleus was reported to inhibit LGN cells in the anesthetized preparation (Kayama et al., 1989). However, this was observed after several seconds of stimulation, and could result from changes in synaptic plasticity somewhere else in the brain (Lesch and Waider, 2012). Another report in the anesthetized preparation (Grasso et al., 2006) found that serotonergic agonists infused in the motor thalamus (ventral anterior and ventral lateral nuclei) produced an inhibition of the discharge of these cells, consistent with the *in vitro* findings in higher order nuclei. The systems level approach to serotonergic function in the thalamus remains essentially uninvestigated. The study of serotonin outside the thalamus hints at critical roles for this neurotransmitter, from synapse development and plasticity to the learning of fear responses (Lesch and Waider, 2012). Future experiments should characterize the effect of serotonergic afferents on sensory responses, and on the response mode of thalamic cells across sleep states. Much like brainstem cholinergic centers, cells in the raphe change their activity as a function of state (Monti, 2011). Many of the raphe cells are REM-OFF, suggesting a reduction in serotonergic tone in the thalamus during REM, a reduction that can selectively affect the firing mode of higher order cells. An intriguing idea is that changes in firing mode in higher order nuclei could contribute to the selective activation of higher order cortical areas during REM, an activation that is thought to underlie dreaming (Hobson et al., 1998).

#### **NORADRENALINE**

As with serotonin, studies of noradrenergic modulation in the thalamus are fairly limited and much remains to be investigated. Recent evidence offers important cues that could instigate further research on this neurotransmitter; these results suggest a role of thalamic noradrenaline in sensory gating and in certain motor and executive disorders.

#### *Origin and local network organization*

The cells that provide noradrenergic afferents to the brain are located in the locus coeruleus (LC) and in the brainstem reticular formation. The thalamus receives its noradrenergic input mostly from cells in the LC – many of which also contain galanin (Simpson et al., 1997). Additional projections have been reported for the midline paraventricular nucleus from the A5 noradrenergic region in the brainstem (Swanson and Hartman, 1975; Morrison and Foote, 1986; Byrum and Guyenet, 1987; De Lima and Singer, 1987; Simpson et al., 1997; Vogt et al., 2008).

As with acetylcholine and serotonin, there are regional differences in the innervation of thalamic nuclei. For example, the LGN is virtually free of noradrenergic fibers, while the lateral posterior and pulvinar nuclei are densely innervated (Morrison and Foote, 1986). In the somatosensory thalamus, noradrenergic innervation is denser in the posterior medial nucleus (higher order) compared to the ventral posterior nucleus (Simpson et al., 1999). Therefore, similar to other modulators, the results in the sensory thalamus point to a more prominent role of noradrenaline in higher compared to first order nuclei. However, the limited evidence from the midline and intralaminar nuclei suggests that they receive sparse noradrenergic innervation, except for the midline paraventricular nucleus (Swanson and Hartman, 1975). Regarding ultrastructure, noradrenergic terminals in the thalamus are small, and, like serotonergic terminals, do not seem to form well differentiated synapses (Nothias et al., 1988).

#### *In vitro and systems level*

Noradrenaline applied *in vitro* to the LGN, medial geniculate, TRN, anterior ventral, and the paratenial nuclei, evoked a slow depolarization, which in turn reduced burst firing and promoted tonic activity (McCormick and Prince, 1988). The authors found that the depolarization was caused by a decrease in a potassium leak current and by changes in the voltage sensitivity of the Ih current. The Ih current could then remain active at resting membrane potentials and make it more difficult for cells to switch to burst mode (Pape and McCormick, 1989; McCormick and Pape, 1990). The effect of noradrenaline on the response properties of relay cells was tested in paratenial neurons, in which noradrenaline reduced the sAHP and decreased spike frequency adaptation (McCormick and Prince, 1988).

*In vivo*, in the anesthetized preparation, iontophoretic application of noradrenergic agonists inhibits thalamic cells in the motor thalamus (ventral anterior and ventral medial nuclei; Grasso et al., 2006). The sign of the effect is the opposite of that found by *in vitro* experiments, where depolarization was common. More research is needed to clarify if the different results indicate the variability of the responses across thalamic nuclei, or an effect of the anesthesia. Evidence from the awake preparation suggests that, although depolarization predominates in the somatosensory thalamus, inhibitory responses are fairly common too. Responses to whisker stimulation increased in most cells of the ventral posterior nucleus during stimulation of the LC, although between 20% (Moxon et al., 2007) and almost 40% (Devilbiss and Waterhouse, 2011) of the cells showed a suppression of their response. In particular, phasic stimulation of the LC had a permissive or "gating" effect in some cells, facilitating the response to a stimulus that the cell would otherwise not respond to in the absence of LC stimulation. Stimulation of the LC also enhanced the synchronization of sensory responses between simultaneously recorded cells in the ventral posterior nucleus, with potential implications for temporal summation at the cortical level (Devilbiss and Waterhouse, 2011). Furthermore, noradrenaline changed the synaptic strength of intracortical and thalamocortical synapses in the slice preparation (Favero et al., 2012). In this study, noradrenaline facilitated thalamocortical relative to intracortical transmission in the input layers of cortex, a result that has implications for the routing of external vs. internal information during the sleep–wake cycle (Varela, 2013).

Aside from the effects on sensory gating, recent evidence suggests the involvement of thalamic noradrenaline modulation in executive and motor disorders. Infusion of noradrenergic agonists (but not serotonin) in the mediodorsal nucleus disrupts prepulse inhibition; prepulse inhibition paradigms are used as indicators of sensorimotor gating disruption in neuropsychiatric disorders, and it was suggested that noradrenergic activation in the mediodorsal nucleus reproduces some of the sensorimotor gating deficits observed in these disorders (Alsene et al., 2011). Likewise, noradrenaline may be critical for the normal function of the motor thalamus, which is suggested by the specific decrease of this neurotransmitter in the motor thalamus of the symptomatic MPTP (methyl-phenyl-tetrahydropyridine) primate model of Parkinson disease (Pifl et al., 2013). Overall, the available evidence indicates that noradrenergic modulation in the thalamus can influence sensory responses and, potentially, has considerable clinical relevance.

#### **DOPAMINE**

Dopamine is one of the thalamic modulators most directly involved in disease. The degeneration of dopaminergic cells in the *substantia nigra pars compacta* links this modulator to the pathogenesis of Parkinson disease. In addition, the role of dopaminergic cells from the ventral tegmental area (VTA) in reward signaling is thought to contribute to addiction, and to the symptomatology of disorders such as schizophrenia and depression. The thalamus does not receive strong dopaminergic innervation from the *substantia nigra*, but it gets dopamine afferents from the VTA and additional mesencephalic and diencephalic regions. Also, dopaminergic terminals are often near thalamic terminals at their targets (e.g., neocortex, striatum), indicating that at least some of the thalamic dopaminergic modulation may occur not at the soma, but at the terminal site.

#### *Origin and local network organization*

There is a wide range of brain areas, particularly in the primate, that provide dopaminergic input to the thalamus, including the hypothalamus, zona incerta, the VTA, the periaqueductal gray, and the lateral parabrachial nucleus, all of which project bilaterally to most nuclei of the macaque thalamus (Hughes and Mullikin, 1984; Sánchez-González et al., 2005). Dopaminergic projections to the thalamus from the *substantia nigra* are minimal, although there are non-dopaminergic projections from this region (Kuroda and Price, 1991; Sánchez-González et al., 2005; Melchitzky et al., 2006; Kusnoor et al., 2012). Some afferents, like those from the VTA, project broadly across the thalamus, whereas others have restricted projections, like those from the hypothalamus and zona incerta, which have dense projections to the midline thalamus. Most of the projections to the midline do not express the dopamine transporter, and it has been suggested that the absence of the transporter could make the effect of dopamine less temporally and spatially restricted in these nuclei. The absence of the dopamine transporter has clinical implications as well, because this transporter is the point of action of drugs (amphetamines) and toxins (MPTP), suggesting that the midline dopaminergic afferents would be relatively protected against these substances compared to other nuclei (Sánchez-González et al., 2005).

There are important species differences in the density of thalamic dopaminergic innervation, with the primate thalamus having substantially higher densities compared to the rat (García-Cabezas et al., 2009). Dopaminergic fibers in the thalamus of primates often display higher densities than in cortex, and the density is highest in the motor and midline thalamus, and the lateral posterior nucleus (Sánchez-González et al., 2005); the lowest densities are found in sensory first order nuclei (LGN, medial geniculate, and ventral posterior nuclei). In primates, dopaminergic terminals contact the presynaptic dendrites of thalamic interneurons, raising the possibility that the denser dopaminergic innervation in primates is related to the increased number of interneurons in these animals (García-Cabezas et al., 2009).

#### *In vitro and systems level*

Outside of the thalamus, two types of dopaminergic receptors, D1 and D2, are often segregated in functional circuits, something that has yet to be explored in detail in the thalamus. Along these lines, D2 receptors are highly expressed in midline and intralaminar nuclei (Rieck et al., 2004; Piggott et al., 2007), and D1 and D2 receptors mediate different effects on membrane potential in different nuclei. D1 mediates the depolarization of rat LGN cells in slices (Govindaiah and Cox, 2005), and D2 the hyperpolarization of most cells in the mediodorsal nucleus (Lavin and Grace, 1998). Furthermore, in the mediodorsal nucleus, D2 can influence the cells' response properties, by facilitating the occurrence of low threshold burst spikes and increasing the sAHP (Lavin and Grace, 1998). Other dopaminergic receptors are present in the presynaptic terminals of thalamic afferents; for example, D4 can presynaptically and selectively decrease the inhibitory input from the *globus pallidus* to the TRN (Govindaiah et al., 2010).

*In vivo*, the results of iontophoretic application of dopamine were found to be dose-dependent, with dopamine facilitating visual responses at low doses and inhibiting responses at higher doses (Albrecht et al., 1996; Zhao et al., 2001, 2002). The inhibition at higher doses could result from the activation of local interneurons or TRN cells. Iontophoresis of D1 agonists suppressed visual responses in these studies, something in contrast to the depolarization seen in slices (Govindaiah and Cox, 2005); the use of more selective agonists and antagonists could help resolve the differences and characterize the effect of dopamine in sensory evoked responses.

The relatively weak dopaminergic innervation of the rat thalamus may have discouraged research on the function of this modulator at the thalamic level. However, the importance of dopamine modulation on thalamic function should not be underestimated. First, the dramatic increase in dopaminergic innervation in the primate thalamus compared to the rodent thalamus points to the evolutionary relevance of this system; it also suggests that dopamine may be specifically relevant for those functions that gain in importance through evolution, such as higher order cognitive functions. And, second, dopaminergic and thalamic synapses often converge on the same postsynaptic targets outside of the thalamus (Kuroda et al., 1996), suggesting that thalamic dopaminergic modulation may be more likely to occur at the level of thalamic terminals than at the soma.

#### **HISTAMINE**

Very little is known about the modulator functions of histamine in the thalamus, with most of the evidence coming from studies in the LGN. The activity of histaminergic cells varies across the sleep–wake cycle suggesting that, similar to serotonin, noradrenaline, and acetylcholine, this modulator may be involved in the regulation of general changes of activity across states of vigilance. However, the effect of histamine on the excitability of thalamic cells, and the selective modulation of thalamostriatal terminals by histamine suggest more complex functions that need to be investigated.

#### *Origin and local network organization*

Histaminergic input arises from the tuberomammillary nucleus of the hypothalamus (Manning et al., 1996; Blandina et al., 2012). In the cat LGN, histaminergic fibers have a preference for zones innervated by the W-cell system (Uhlrich et al., 1993), although their distribution is more homogeneous in the macaque LGN (Manning et al., 1996; Wilson et al., 1999). No clear synaptic contacts were observed, only *en passant* swellings, which hint at a diffuse modulation mechanism (Uhlrich et al., 1993; Wilson et al., 1999).

#### *In vitro and systems level*

*In vitro*, LGN cells are depolarized by histamine. The response has two components, the main one being an increase in input resistance mediated by H1 receptors. The second component is a smaller depolarization, which is observed after blockade of H1 receptors, is mediated by H2 receptors, and is associated with a decrease in input resistance (McCormick and Williamson, 1991). These *in vitro* results in the LGN are consistent with the effect of activating the tuberomammillary region *in vivo*, which results in increased firing in LGN cells, with no change of their spatial frequency tuning (Uhlrich et al., 2002). Conversely, a study testing iontophoretically applied histamine in the anterior and intralaminar nuclei found an inhibition of baseline firing (Sittig and Davidowa, 2001). More research is needed to characterize the effects of histamine across the thalamus and identify the receptors that mediate the responses in different nuclei. There are additional histaminergic receptors in thalamic cells (H3, H4), but evidence of their function is limited (Strakhova et al., 2009). In particular, H3 presynaptic receptors could be important in the modulation of thalamostriatal terminals, where they are expressed; these receptors selectively facilitate thalamostriatal – and not corticostriatal – synapses during repetitive stimulation (Ellender et al., 2011).

Cells of the tuberomammillary nucleus are only active during wakefulness and their degree of activation correlates with alertness levels (Takahashi et al., 2006), suggesting that the function of histamine in the thalamus may relate to attentional levels and state-related changes through the sleep–wake cycle.

## **THALAMIC MODULATORS AND EXECUTIVE FUNCTION**

The data reviewed in the previous section suggest that modulators contribute to the function of virtually all thalamic nuclei and may be critical in higher order nuclei. These nuclei receive a higher proportion of modulators than first order nuclei, have cell populations with heterogeneous responses to modulators, and are interconnected with brain regions that are themselves under strong modulator control.

One feature that characterizes higher order thalamic nuclei is the complexity of their projections. Whereas sensory nuclei have relatively confined projection targets within neocortex, higher order nuclei project to multiple regions within and outside of neocortex. Targets include the basal ganglia, hippocampus, hypothalamus, and amygdala. Among them, mPFC and the striatum have been identified as key structures in the control of executive function. Although a few other thalamic nuclei project to these two areas, the following section will focus on the modulation of two groups of nuclei that have strong connections with mPFC and the striatum: the midline and the intralaminar groups (Hoover and Vertes, 2007; Galvan and Smith, 2011). The midline group includes, ventrally, the reuniens and rhomboid nuclei, and, more dorsally, the paratenial, paraventricular, and mediodorsal nuclei. This group is defined primarily by its position along the midline of the thalamus, and the mediodorsal, the reuniens, and the paratenial nuclei also originate from the same pronuclear mass during development (Jones, 2007). The intralaminar nuclei follow an anteroposterior axis, with the rostral part including the central lateral, paracentral, and central medial nuclei. The parafascicular nucleus, together with the centromedian nucleus in primates, constitute the caudal components of the intralaminar group and are the main source of thalamic input to the striatum (Galvan and Smith, 2011). The midline and intralaminar nuclei have other projection targets (e.g., the hippocampus and amygdala), and modulators in these nuclei can therefore influence networks beyond those directly involved in executive function.

Many open questions remain regarding the function of the midline and intralaminar nuclei. In most cases we lack even basic information, such as which area (or areas) drives these nuclei, or what are the receptive field properties of their cells. Nonetheless, some of the nuclei have been implicated in functional loops in which modulators play a critical role. I will review those here.

One of the first functions proposed for the midline and intralaminar nuclei, and in which modulators are involved, was state maintenance. Moruzzi and Magoun's (1949) classic study raised the possibility that the intralaminar nuclei could mediate the effect of the reticular activating system on the neocortex during wakefulness. The cortical projections of the midline and intralaminar nuclei, which innervate the superficial layers of multiple regions, gave support to the idea that the reticular activating system could influence neocortex through the activation of the midline-intralaminar thalamus. This is consistent with the disruption of consciousness that follows damage to this thalamic region in humans (Llinás et al., 1998), as well as with the improvement that follows deep brain stimulation of the intralaminar nuclei in patients in the minimally conscious state (Schiff et al., 2007). Likewise, brainstem cholinergic and monoaminergic regions promote wakefulness through their effect on multiple regions (Lee and Dan, 2012), and they innervate midline and intralaminar nuclei extensively (reviewed above). On the other hand, the traditional view of a brainstem-midline/intralaminar-neocortex network that implements wakefulness has recently been challenged (Fuller et al., 2011). According to Fuller et al. (2011), one of the relevant networks for state regulation starts with parabrachial glutamatergic afferents that project to the basal forebrain, which then influences neocortical state directly, and could also do so indirectly through the thalamus (Hallanger et al., 1987; Buzsaki et al., 1988; Parent et al., 1988; Steriade et al., 1988; Gritti et al., 1998). The role of the intralaminar and midline nuclei in state maintenance needs further clarification. New experimental strategies to manipulate the activity of specific pathways (Xu and Südhof, 2013) offer more selective approaches to address this question.

Regarding cognitive behavior, the modulation of the midline and intralaminar nuclei may be important in rewarded behavior. These nuclei have high densities of dopamine D2 receptors compared to other parts of the thalamus (Rieck et al., 2004; Piggott et al., 2007); dopamine can influence the midline and intralaminar nuclei locally, but dopaminergic modulation of midline output is likely to also occur at striatal and mPFC targets. Paraventricular and dopaminergic terminals converge, in close spatial relation, onto the same cells in the nucleus accumbens, although that close relation was not found in mPFC (Pinto et al., 2003). In contrast, centromedian terminals in the striatum were not found on the same postsynaptic dendrites as dopaminergic terminals (Smith et al., 1994). In mPFC, mediodorsal afferents converge on the same layer V cells contacted by dopaminergic axons, with the mediodorsal input being more distal to the soma (Kuroda et al., 1996). The anatomical data indicate that the paraventricular nucleus has the closest synaptic relation with dopaminergic terminals. The paraventricular nucleus has been suggested to participate in dopamine-mediated reward associations (Igelstrom et al., 2010; Choi et al., 2012; Martin-Fardon and Boutrel, 2012). Kelley et al. (2005) proposed that the paraventricular nucleus is an important component of the network controlling food-related, goal-directed behavior. The paraventricular nucleus would integrate energy and circadian information from the hypothalamic orexin system and relay it to the striatum to regulate dopamine levels and feeding behavior, a hypothesis that has recently received support in rats (Choi et al., 2012). In fact, the paraventricular is the thalamic nucleus with the densest orexinergic innervation (Sakurai, 2007), and the effect of these peptides on paraventricular networks deserves further investigation.
The role of other nuclei in the midline and intralaminar groups (which also respond to orexins) on rewarded behavior is largely unexplored (Purvis and Duncan, 1997; Bayer et al., 2002).

Recent evidence points to another important function of the caudal intralaminar group in behavioral flexibility and task switching in relation to sensory demands (Galvan and Smith, 2011). Lesions or inactivation of the parafascicular nucleus impair tasks that require behavioral flexibility and prevent the local increase in acetylcholine that occurs in the dorsal striatum during task shifting (Brown et al., 2010; Kato et al., 2011). Thalamostriatal afferents evoke a burst-pause firing pattern in striatal cholinergic interneurons; the cholinergic burst transiently silences corticostriatal afferents (**Figure 5**), and is followed by a facilitation of the striatopallidal output, which is thought to contribute to action suppression through the motor thalamus. This brief overriding of corticostriatal input, followed by the biased activation of the striatopallidal "no-go" pathway, is thought to suppress ongoing motor output and allow for the selection of a different action (Ding et al., 2010). A complementary line of evidence indicates that intralaminar cells respond with burst discharges to a variety of stimuli, particularly to unexpected and salient stimuli, and could therefore play an important role in shifting attention and behavior under unexpected or changing conditions (Matsumoto et al., 2001), which would contribute to the deficits in cue-triggered responses observed after intralaminar lesions (Hembrook and Mair, 2011). An important experiment will be to determine if it is specifically the burst firing mode in intralaminar cells that evokes burst-pause firing in striatal cholinergic interneurons. Acetylcholine selectively hyperpolarizes intralaminar cells that project to striatum (Beatty et al., 2009) and this modulator could be critical in influencing behavioral flexibility at the thalamic level by keeping intralaminar cells in burst mode. Also interesting is that thalamostriatal projections from the caudal intralaminar nuclei are largely segregated from thalamocortical projections, whereas in the rostral intralaminar and outside of the intralaminar group, projections often collateralize to cortex and striatum (Smith et al., 2009). This suggests that modulation may occur selectively and independently for the thalamostriatal and thalamocortical caudal intralaminar networks.

**FIGURE 5 | Thalamostriatal projections gate corticostriatal inputs in mouse slices. (A)** Left, diagram of the experimental preparation: medium spiny neuron (MSN) recorded in the striatum while corticostriatal projections are activated, with or without preceding stimulation of thalamostriatal projections. Right, activation (downward arrows) of corticostriatal input evokes a train of EPSPs in an MSN. **(B)** Corticostriatal EPSPs are reduced when thalamostriatal stimulation precedes the corticostriatal stimulation by 25 ms. **(C)** Overlay of corticostriatal EPSPs before and after (blue) thalamostriatal activation to illustrate the changes in amplitude. **(D)** Overlay of corticostriatal EPSPs before and after (red) thalamostriatal activation, but with a long delay (1 s) between the thalamostriatal and corticostriatal activation. [Reprinted from (Ding et al., 2010), with permission from Elsevier.]

Within the midline group, a few studies implicate the nucleus reuniens in behavioral flexibility and other cognitive processes (reviewed in: Cassel et al., 2013). In a water maze task, Dolleman-van der Weel et al. (2009) observed that reuniens lesions in rats did not alter memory acquisition, but made the animals more "impulsive" during retrieval. In probe trials, animals searched for the platform in the correct location, but, in contrast to controls, soon switched to searching all over the pool. Impulsive responses were also observed after reuniens lesions in rats in a multiple choice visual-response task (Prasad et al., 2013), although not in a similar task used by Hembrook and Mair (2011). Consistent with a role in behavioral flexibility, inactivation of reuniens produced deficits in behavioral paradigms that required response inhibition, like the passive avoidance task (Davoodi et al., 2011) and a task that required switching from egocentric to allocentric navigation strategies (Cholvin et al., 2013). An important confound is that inactivation of reuniens can have additional effects, such as impairment of working memory (Hallock et al., 2013) and enhancement of memory generalization (Xu and Südhof, 2013), which could produce impairments in cognitive flexibility. Outside of reuniens, there is some evidence that the mediodorsal nucleus may contribute to behavioral flexibility; inactivation of this nucleus leads to perseverative errors in a task that required rats to switch from egocentric to cue-discrimination strategies (Block et al., 2007). More research is needed to clarify the role of the midline thalamus in behavioral flexibility and to begin the exploration of thalamic modulation on this function. Evidence from mPFC (a major target of the midline nuclei) indicates an important role for dopamine in behavioral flexibility (Floresco and Magyar, 2006) and makes this modulator an inviting starting point.

## **SUMMARY AND FUTURE DIRECTIONS**

Most modulators have relatively similar properties within first order thalamic nuclei, but differ in either their anatomical or functional features between first and higher order nuclei. **Table 1** summarizes the key anatomical and physiological findings in first and higher order nuclei, as well as those specific to the midline and intralaminar areas. Higher order nuclei receive glutamatergic modulators from the lower sublamina of layer VI, they receive cholinergic input with a larger LDT component than first order nuclei, they have subsets of cells that are hyperpolarized by acetylcholine and serotonin, and they receive denser projections from brainstem modulators (cholinergic, serotonergic, noradrenergic, and dopaminergic). Many higher order nuclei have not been extensively studied, and further research is needed to advance our understanding of the similarities and differences across nuclei, and to fully characterize their functional implications.

One crucial aspect that has been minimally investigated in the thalamus is the integration of modulator and driver inputs in individual dendrites (although see Crandall and Cox, 2013). The view of thalamic cells as relays has been so prevalent in the literature, that complementary conceptual frameworks have been weakened or not even considered. Thinking of thalamic cells as relays is important to understand thalamic function, but other views are necessary and will stir further progress on our understanding of the thalamus. The careful organization of thalamic modulator and driver synapses along the dendrites of thalamic cells suggests an important role for thalamic cells in the integration of inputs. Corticothalamic modulators have small terminals that tend to contact relatively distal parts of the dendrites of thalamic cells. Other modulators (cholinergic, serotonergic) spread their synapses along the proximal dendrites, falling within the area of termination of drivers. The overlap of synapses in proximal dendrites may facilitate the modulation of drivers and of voltage dependent channels (such as IT) located in those dendritic regions (Destexhe et al., 1998). The overlap between drivers and modulators is particularly relevant in higher order nuclei; these nuclei have drivers of multiple origins (Baldauf et al., 2005; Masterson et al., 2009) that could be modulated independently, something that needs to be investigated. Furthermore, the arrangement of synapses along the dendrites of thalamic cells may be important to ensure adequate interactions between modulators. Recording from thalamic dendrites is feasible (Williams and Stuart, 2000), and recent advancements in multicolor optogenetics (Klapoetke et al., 2014) allow the specific manipulation of multiple modulator populations. 
Studying the interaction of multiple modulators on individual dendrites is critical to figuring out their relative contribution to cell physiology, their influence on other inputs and, ultimately, the computational functions of thalamic cells.

**Table 1 | Summary.**

By far, the most broadly studied effect of thalamic modulators has been the effect on membrane potential. This focus is well justified, since changing the membrane potential switches thalamic cells between the linear "tonic" mode of response (at depolarized levels) and the non-linear "burst" mode (at hyperpolarized membrane potentials). The tonic mode is thought to be an accurate mode of information transmission, whereas the burst mode has a higher signal-to-noise ratio and can be more effective at indicating a change in incoming information. This has important implications for the gating functions of thalamic nuclei through the sleep–wake cycle, and for the generation of oscillatory rhythms in thalamocortical networks. Rhythmic burst firing due to abnormal inhibition has been suggested to interfere with thalamic function and contribute to the pathophysiology of neuropsychiatric disorders, such as schizophrenia (Llinás et al., 1999; Lisman, 2012). Some modulators (acetylcholine, serotonin) specifically inhibit cells in higher order nuclei, and dysfunction of these modulators could contribute to abnormal rhythmicity in these nuclei. The effect of modulators on membrane potential also has implications for gain control, as suggested by layer VI modulation results in the visual system. Gain control at the thalamic level could represent a form of top-down control on earlier stages of the visual pathway, like the LGN, which receive layer VI afferents from the cortical regions that they project to. Future experiments will determine if layer VI projections to the thalamus can have gain control functions in higher order nuclei. These nuclei receive reciprocal (from cortical regions they project to) and nonreciprocal (from cortical regions they receive driver input from) layer VI afferents that could contribute, respectively, to top-down and bottom-up mechanisms of gain control. Furthermore, beyond the role of modulators on excitability, there is evidence that modulators influence the response properties of relay cells through the modification of the voltage-dependence of membrane conductances (e.g., the blockade of sAHP by serotonin and noradrenaline; McCormick and Prince, 1988; Goaillard and Vincent, 2002). However, the effect of these changes in the encoding of information in the awake animal is not known.
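The dependence of firing mode on membrane potential discussed in this section can be illustrated with a toy sketch. It rests on the well-established property that the low-threshold calcium current (IT) is inactivated at depolarized potentials and recovers only after sustained hyperpolarization. The class and function names, the resting potential, the de-inactivation threshold, and the recovery time below are all illustrative assumptions, not values taken from the studies cited here, and the sketch is not a biophysical model.

```python
from dataclasses import dataclass

# Illustrative assumptions, not measurements from the cited studies.
DEINACTIVATION_MV = -65.0  # potential below which I_T recovers from inactivation
RECOVERY_MS = 100.0        # time spent below that potential needed for recovery


@dataclass
class RelayCell:
    """Toy thalamic relay cell that tracks time spent hyperpolarized."""

    hyperpolarized_ms: float = 0.0

    def hold(self, membrane_mv: float, duration_ms: float) -> None:
        """Hold the cell at a membrane potential for a given duration."""
        if membrane_mv < DEINACTIVATION_MV:
            self.hyperpolarized_ms += duration_ms
        else:
            # Depolarization inactivates I_T again, so the timer resets.
            self.hyperpolarized_ms = 0.0

    def response_mode(self) -> str:
        """Mode of the response to the next suprathreshold input."""
        return "burst" if self.hyperpolarized_ms >= RECOVERY_MS else "tonic"


def mode_after(modulator_shift_mv: float,
               resting_mv: float = -62.0,
               duration_ms: float = 200.0) -> str:
    """Firing mode after a modulator shifts the membrane potential."""
    cell = RelayCell()
    cell.hold(resting_mv + modulator_shift_mv, duration_ms)
    return cell.response_mode()


if __name__ == "__main__":
    print("depolarizing modulator:", mode_after(+5.0))
    print("hyperpolarizing modulator:", mode_after(-8.0))
    print("brief hyperpolarization:", mode_after(-8.0, duration_ms=50.0))
```

In this sketch a depolarizing modulator leaves the cell in tonic mode, whereas a hyperpolarizing modulator switches it to burst mode only if the hyperpolarization lasts long enough, which is one way to picture why modulators that hyperpolarize subsets of higher order cells could bias those cells toward bursting.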

Undoubtedly, much remains to be learned about thalamic modulation from the systems perspective. Brainstem modulators experience state dependent changes in activity, and, within states, modulators could contribute to further fine-tuning, e.g., to variations in alertness. Investigation of higher order nuclei in different behavioral states could be an effective starting point. For example, during wakefulness, higher order relay cells are more likely to fire bursts than first order relay cells (Ramcharan et al., 2005), as predicted from the *in vitro* data reviewed here. However, we do not know what changes occur in higher order nuclei throughout the sleep–wake cycle, although their strong innervation from brainstem state regulation centers suggests stronger modulations than in first order nuclei.

Within the higher order nuclei, the midline and intralaminar groups stand out as essential components of the executive networks that engage mPFC and basal ganglia. Research in these thalamic groups has lagged behind the study of the sensory thalamus. Recent evidence suggests that these nuclei have a role in modulator regulated behaviors, such as behavioral flexibility and reward directed behavior. Research in this part of the thalamus is essential to understanding executive behavior and disease. The thalamus has been a successful therapeutic target for deep brain stimulation in a number of neurological conditions, such as essential tremor (Lyons, 2011). Treatments for disorders of executive function (schizophrenia, depression) will not be able to take the thalamus into account until we understand the role of nuclei like the midline and intralaminar in executive networks.

### **ACKNOWLEDGMENT**

I thank M. J. Galazo and A. Rosenberg for their helpful feedback on this manuscript.

#### **REFERENCES**


nucleus of the thalamus in the cat. *Neuroscience* 85, 149–178. doi: 10.1016/S0306-4522(97)00573-3


Lee, S., Carvell, G. E., and Simons, D. J. (2008). Motor modulation of afferent somatosensory circuits. *Nat. Neurosci.* 11, 1430–1438. doi: 10.1038/nn.2227


to rat thalamocortical neurones in vitro. *Neuroscience* 122, 459–469. doi: 10.1016/j.neuroscience.2003.08.014


Zhu, J. J., and Uhlrich, D. J. (1998). Cellular mechanisms underlying two muscarinic receptor-mediated depolarizing responses in relay cells of the rat lateral geniculate nucleus. *Neuroscience* 87, 767–781. doi: 10.1016/S0306-4522(98)00209-7

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 March 2014; accepted: 07 June 2014; published online: 24 June 2014. Citation: Varela C (2014) Thalamic neuromodulation and its implications for executive networks. Front. Neural Circuits 8:69. doi: 10.3389/fncir.2014.00069 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Varela. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Severe drug-induced repetitive behaviors and striatal overexpression of VAChT in ChAT-ChR2-EYFP BAC transgenic mice

## **Jill R. Crittenden\*, Carolyn J. Lacey, Tyrone Lee, Hilary A. Bowden and Ann M. Graybiel**

Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

#### **Edited by:**

Evelyn K. Lambe, University of Toronto, Canada

#### **Reviewed by:**

Margaret Davis, National Institutes of Health, USA; Emmanuel Valjent, Inserm, France

#### **\*Correspondence:**

Jill R. Crittenden, Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, 46-6133, 77 Massachusetts Avenue, Cambridge, MA 02139, USA e-mail: jrc@mit.edu

In drug users, drug-related cues alone can induce dopamine release in the dorsal striatum. Instructive cues activate inputs to the striatum from both dopaminergic and cholinergic neurons, which are thought to work together to support motor learning and motivated behaviors. Imbalances in these neuromodulatory influences can impair normal action selection and might thus contribute to pathologically repetitive and compulsive behaviors such as drug addiction. Dopamine and acetylcholine can have either antagonistic or synergistic effects on behavior, depending on the state of the animal and the receptor signaling systems at play. Semi-synchronized activation of cholinergic interneurons in the dorsal striatum drives dopamine release via presynaptic nicotinic acetylcholine receptors located on dopamine terminals. Nicotinic receptor blockade is known to diminish abnormal repetitive behaviors (stereotypies) induced by psychomotor stimulants. By contrast, blockade of postsynaptic acetylcholine muscarinic receptors in the dorsomedial striatum exacerbates drug-induced stereotypy, exemplifying how different acetylcholine receptors can also have opposing effects. Although acetylcholine release is known to be altered in animal models of drug addiction, predicting whether these changes will augment or diminish drug-induced behaviors thus remains a challenge. Here, we measured amphetamine-induced stereotypy in BAC transgenic mice that have been shown to overexpress the vesicular acetylcholine transporter (VAChT) with consequent increased acetylcholine release. We found that drug-induced stereotypies, consisting of confined sniffing and licking behaviors, were greatly increased in the transgenic mice relative to sibling controls, as was striatal VAChT protein. These findings suggest that VAChT-mediated increases in acetylcholine could be critical in exacerbating drug-induced stereotypic behaviors and promoting exaggerated behavioral fixity.

**Keywords: amphetamine, dopamine, acetylcholine, striatum, striosome, stereotypy, drug addiction**

#### **INTRODUCTION**

Acetylcholine is a key intercellular signaling molecule that is released from neurons in the central and peripheral nervous systems as well as from non-neuronal cell types such as immune and epithelial cells (Grando et al., 2007). Imbalances in CNS acetylcholine have been documented in neurologic disorders including Alzheimer's disease and Parkinson's disease, and also in drug addiction. Enzymes required for acetylcholine synthesis (ChAT, choline acetyltransferase), break-down (AChE, acetylcholinesterase) and vesicular packaging (VAChT) are particularly abundant in the striatum (Graybiel et al., 1986; Zhou et al., 2001), a subcortical brain region that is important for motor and motivational control and for habit formation (Jog et al., 1999; Graybiel, 2008; Yin et al., 2009). Cholinergic interneurons comprise only 1–2% of the total number of striatal neurons but their processes, along with cholinergic input fibers from brainstem nuclei (Dautan et al., 2014), span the striatum (Graybiel et al., 1986; Kawaguchi, 1992). Furthermore, cholinergic interneurons are thought to correspond to the tonically active neurons (TANs) that undergo semi-synchronous patterns of fire-pause-rebound activity upon presentation of learned or salient sensory cues (Kawaguchi, 1993; Aosaki et al., 1995; Matsumoto et al., 2001; Inokawa et al., 2010; Schulz et al., 2011; Zhao et al., 2011; Doig et al., 2014). The activity of these interneurons is controlled by intrinsic membrane activity as well as a variety of inputs, including excitatory inputs from the cerebral cortex (Reynolds and Wickens, 2004; Doig et al., 2014) and the sensory-responsive parafascicular nucleus of the thalamus (Lapper and Bolam, 1992), local inhibitory input (Gonzales et al., 2013; Doig et al., 2014), and modulatory inputs from cholinergic and dopaminergic fibers (Aosaki et al., 1994; Dautan et al., 2014). Altogether, these data are consistent with the notion that the cue-related activity of cholinergic interneurons of the striatum serves to re-bias action selection driven by cortico-basal ganglia circuits and their thalamic links (Minamimoto et al., 2009; Ding et al., 2010).

Cholinergic interneurons are frequently located at the borders between striosomes (a.k.a. patches) and matrix, two striatal compartments that have different input-output connections (Gerfen, 1984; Jimenez-Castellanos and Graybiel, 1989; Langer and Graybiel, 1989; Eblen and Graybiel, 1995; Kincaid and Wilson, 1996; Fujiyama et al., 2011; Watabe-Uchida et al., 2012; Gerfen et al., 2013) and that are impacted differentially by psychomotor stimulants (Graybiel et al., 1990; Canales and Graybiel, 2000; Capper-Loup et al., 2002; Horner et al., 2012; Jedynak et al., 2012) and disease (Crittenden and Graybiel, 2011). Preferential disruption of striosomes by toxins or genetic targeting influences the severity of drug-induced stereotypy (Tappe and Kuner, 2006; Liao et al., 2008; Murray et al., 2013). Moreover, ablation of cholinergic interneurons in the striatum blocks the drug-induced striosome-to-matrix gene induction ratio (Saka et al., 2002) and can increase drug-induced stereotypy (Aliane et al., 2011). Together, these data suggest that the cholinergic system mediates interactions between the two striatal compartments (Miura et al., 2008), and the balance between drug-induced hyperlocomotion and restricted, repetitive behaviors (Canales and Graybiel, 2000).

Overexpression of VAChT augments vesicular loading and release of acetylcholine *in vitro* (Song et al., 1997). Moreover, transgenic mouse models that carry multiple copies of *Slc18a3*, the gene encoding VAChT, show an increase in evoked release of acetylcholine in hippocampal slices (Nagy and Aubert, 2012; Kolisnyk et al., 2013b). Transgenic mouse and rat lines that are engineered to drive gene expression in cholinergic cells typically carry exogenous copies of the cholinergic gene locus (Eiden, 1998) with an inactivated *Chat* gene but an intact *Slc18a3* gene. Accordingly, ChAT-ChR2-EYFP BAC mice, which were selected for high-level expression of channelrhodopsin in cholinergic neurons (Zhao et al., 2011), have been shown to overexpress VAChT (Kolisnyk et al., 2013b). Evaluation of ChAT-ChR2-EYFP BAC mice demonstrated that they have normal metabolic rate and baseline locomotor activity but reduced performance in tests for attention, spatial memory, cue-guided memory and working memory (Kolisnyk et al., 2013b). Deletion of VAChT in the prefrontal cortex of mice also disrupts cognitive function, as reflected by reduced reversal learning and attention-task performance (Kolisnyk et al., 2013a). Thus, both hypofunction and hyperfunction of VAChT are associated with impairments in cognitive function.

Here, we show that ChAT-ChR2-EYFP BAC mice have elevated striatal VAChT and abnormally severe confined stereotypies when treated with high doses of D-amphetamine. By contrast, they showed mild hypersensitivity to low doses of D-amphetamine and their behavioral responses to saline injection were relatively normal. These data are consistent with the proposal that the regulation of acetylcholine release is especially important for balancing the response to extreme dopamine stimulation.

### **MATERIALS AND METHODS**

#### **MICE**

The Committee on Animal Care at the Massachusetts Institute of Technology approved all procedures. ChAT-ChR2-EYFP BAC mice were genotyped from tissue assayed by Transnetyx, Inc. for the presence of *EYFP*. ChAT-ChR2-EYFP BAC transgenic mice (Zhao et al., 2011) were obtained from Prof. Guoping Feng on a C57BL/6J genetic background and crossed to a line on a mixed FVB/N and 129S4 background. Offspring were intercrossed to maintain the line by crossing *EYFP*-positive mice to *EYFP*-negative mice at every generation such that mice homozygous for the transgene were never generated. For the data reported here, ChAT-ChR2-EYFP BAC hemizygous mice were compared to BAC-negative, sibling wildtype mice. Transgenic and control mice were tested in parallel by an experimenter blinded to genotype. All experimental mice were male and group-housed with sibling controls under a standard light-dark cycle (lights on at 7 am and off at 7 pm), with free access to food and water. Mice were between 3 and 11 months of age at the time of testing.

For the viral vector experiment to evaluate cholinergic neuropil, male mice from the Cre knock-in line, B6;129S6-ChAT<tm2(Cre)Lowl>/J (Rossi et al., 2011), were obtained from Jackson Laboratories and used directly for experimentation.

#### **INTRACEREBRAL VIRAL INJECTION**

Adeno-associated virus (rAAV5EF1a-DIO-hChR2(E123T/T159C)-mCherry) was packaged and purified by the Gene Therapy Center Vector Core at The University of North Carolina at Chapel Hill and estimated by dot blot to be at a concentration of 4 × 10<sup>12</sup> virus molecules/ml.

Mice were anesthetized by injection (i.p., 10 ml/kg) with a mixture of ketamine (120 mg/kg) and xylazine (16 mg/kg) in saline. Mice were mounted onto a stereotactic frame and small burr holes were made bilaterally (AP = 0.9 mm and ML = −1.9 mm and +1.9 mm, relative to bregma). A NanoFil microsyringe (World Precision Instruments) was lowered to deliver 0.5 µl of virus solution at each of two sites on each side of the brain (DV = 2.0 mm and 2.7 mm), at a rate of 0.1 µl/min. Eight weeks after surgery, mice were transcardially perfused and brain sections were obtained for immunohistological examination as described below.

#### **IMMUNOLABELING AND HISTOLOGY**

Mice were deeply anesthetized with Euthasol (pentobarbital sodium and phenytoin sodium from Virbac AH Inc.) and perfused transcardially with 15 ml of saline followed by 60 ml of 4% paraformaldehyde in 0.1 M NaKPO4, pH 7.4. Brains were postfixed overnight in 4% paraformaldehyde and then cryoprotected by submersion overnight in 20% glycerin in 0.1 M NaKPO4, pH 7.4. Frozen, 24-µm thick, transverse brain sections were cut on a sliding microtome. Immunoreactivity was assessed by standard methods. Briefly, free-floating sections were incubated with primary antisera (anti-VAChT AB1588 from Millipore, 1:100 dilution; anti-ChAT AB144P from Millipore, 1:200 dilution; polyclonal anti-CalDAG-GEFI, 1:5,000 dilution; Crittenden et al., 2004). For immunofluorescence, sections were incubated with secondary antibodies coupled to ALEXA594 or ALEXA488 (Invitrogen Corp., anti-rabbit, 1:250 dilution) and then were mounted and coverslipped with Vectashield media (Vector Laboratories). For non-fluorescent immunohistochemistry, sections were incubated with a biotinylated secondary antibody (Vector Laboratories, anti-guinea pig and anti-goat, 1:500 dilution), the signal was amplified and visualized by the Vectastain Peroxidase ABC System (Vector Laboratories), and then sections were mounted and coverslipped with Eukitt (Electron Microscopy Sciences). Brightfield images were obtained on an Olympus BX61 microscope, and confocal images were obtained on a Nikon C2 microscope. EYFP and ALEXA488 fluorophores were excited with a 488 nm solid-state laser and emission was transmitted through a 560 nm short pass filter plus a 510 ± 42 nm band pass filter. mCherry and ALEXA594 were excited with a 561 nm solid-state laser and emission was transmitted through a 648 nm short pass filter plus a 593 ± 20 nm band pass filter. High-resolution confocal images were made by summing 5 µm (**Figures 2D–F**) or 7.5 µm (**Figures 2G–I**) stacks of images taken at 0.5 µm intervals. Images were processed and analyzed with Fiji software (Schindelin et al., 2012). To ensure that the fluorescence signal from cholinergic neuropil was within a striosome, and not derived from matrix tissue above or below the image plane, care was taken only to stack images in which the CalDAG-GEFI immunoreactivity was consistently low within striosomal borders.

#### **IMMUNOBLOTTING**

Mice were sacrificed by cervical dislocation, and the striatum and overlying cerebral cortex were dissected on a cold plate prior to freezing on dry ice and storage at −80°C. To prepare tissue lysates, frozen tissue was homogenized in ice-cold modified RIPA buffer (50 mM Tris pH 8.0, 150 mM NaCl, 1% Triton X-100, 0.1% sodium dodecyl sulfate, 1% sodium deoxycholate) with protease inhibitor, sodium fluoride, activated sodium orthovanadate and PMSF and centrifuged at 16,000 × g for 10 min to pellet insoluble material. The protein concentrations of supernatants were determined by bicinchoninic acid assays (Pierce). For detection of VAChT, lysates were not boiled prior to resolving proteins by SDS-PAGE. Gel-resolved proteins were transferred to PVDF membrane by using the Invitrogen iBlot. Immunoblotting was accomplished by standard methods. Blots were incubated overnight with antisera against VAChT (139103 from Synaptic Systems, 1:500 dilution). Blots were rinsed and incubated with horseradish peroxidase-coupled secondary antibody (Santa Cruz Biotechnology, Inc. anti-rabbit, 1:5,000 dilution) prior to immunodetection with Immobilon Western (Millipore) according to the manufacturer's instructions. Blots were subsequently incubated with anti-β-tubulin III (T8578 from Sigma-Aldrich Co., 1:10,000 dilution) and horseradish peroxidase-coupled secondary antibody (Santa Cruz Biotechnology, Inc. anti-mouse, 1:10,000) to normalize total protein loading, as described in the Statistics subsection below.

#### **DRUGS**

D-amphetamine (Sigma) was prepared fresh daily by dissolving in saline, and mice were injected with 10 ml/kg (i.p.) for doses of 2.5 mg/kg/day or 7.0 mg/kg/day.
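The dosing arithmetic implied by a fixed injection volume of 10 ml/kg can be sketched as follows. This is an illustrative calculation only; the function names and the 30 g body mass used in the example are assumptions and are not taken from the study.

```python
def stock_concentration_mg_per_ml(dose_mg_per_kg, volume_ml_per_kg=10.0):
    """Drug concentration needed so that the fixed injection volume delivers the dose."""
    return dose_mg_per_kg / volume_ml_per_kg


def injection_volume_ml(body_mass_g, volume_ml_per_kg=10.0):
    """Volume to inject for a mouse of the given body mass (grams)."""
    return (body_mass_g / 1000.0) * volume_ml_per_kg


if __name__ == "__main__":
    for dose in (2.5, 7.0):  # mg/kg, the two doses stated in the text
        conc = stock_concentration_mg_per_ml(dose)
        print(f"{dose} mg/kg at 10 ml/kg -> {conc:.2f} mg/ml solution")
    # Hypothetical 30 g mouse, chosen only for illustration
    print(f"30 g mouse -> {injection_volume_ml(30.0):.2f} ml injected")
```

Under these assumptions, the low and high doses correspond to solutions of 0.25 mg/ml and 0.70 mg/ml, respectively.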

#### **BEHAVIOR EVALUATION**

Locomotor activity was measured using an activity monitoring system and TruScan software (Coulbourn Instruments). Mice were administered saline or drug individually in the activity monitors, which consisted of 25 cm square × 40 cm high arenas surrounded by Plexiglas walls. The floor consisted of a removable plastic drop pan that was cleaned between sessions. A sensor ring, housing 16 infrared beams to detect horizontal movements, surrounded the arena. The animal position was measured every 100 ms, and the software calculated the distance traveled in 5-min bins.
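One plausible way to derive distance traveled in 5-min bins from positions sampled every 100 ms is sketched below. The internal algorithm of the TruScan software is not described here, so this is an assumption: the sketch simply sums straight-line displacements between consecutive samples, and the function and variable names are hypothetical.

```python
import math

SAMPLE_INTERVAL_S = 0.1   # one position sample every 100 ms (from the text)
BIN_DURATION_S = 300.0    # 5-min bins (from the text)


def distance_per_bin(positions, sample_interval_s=SAMPLE_INTERVAL_S,
                     bin_duration_s=BIN_DURATION_S):
    """Sum Euclidean step lengths between consecutive (x, y) samples per time bin.

    positions: list of (x, y) coordinates in cm, one per sample.
    Returns a list of distances (cm), one entry per bin.
    """
    steps_per_bin = round(bin_duration_s / sample_interval_s)
    totals = []
    for i in range(1, len(positions)):
        x0, y0 = positions[i - 1]
        x1, y1 = positions[i]
        bin_index = (i - 1) // steps_per_bin
        while len(totals) <= bin_index:
            totals.append(0.0)
        totals[bin_index] += math.hypot(x1 - x0, y1 - y0)
    return totals


if __name__ == "__main__":
    # Synthetic track: 1 cm per sample along x for 10 min (6000 steps)
    track = [(float(i), 0.0) for i in range(6001)]
    print(distance_per_bin(track))
```

A real pipeline would likely also filter out sub-threshold jitter before summing, which this sketch omits.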

Mice were placed into the activity monitors and given 60 min to habituate prior to injection. Following injection, mice were placed back in the monitors for an additional 85 min of data collection. On days 1–3, mice were injected with saline for habituation to the treatment and environment. On days 4–10, mice were treated with D-amphetamine. Mice were then given 7 days of drug wash-out, with no treatment or handling, prior to a final D-amphetamine challenge treatment. Each mouse received the same dose of D-amphetamine on each day, and separate cohorts of mice were used for the low- and high-dose D-amphetamine treatment experiments. To test for sensitization to the injection procedure itself, one cohort of D-amphetamine (7.0 mg/kg) treated mice was given a challenge dose of saline, 12 days after the D-amphetamine challenge. Mice were videotaped while in the monitors on saline day 1, D-amphetamine treatment day 1, and on the challenge day. Video recordings were 2 min long each, and they were taken at 50 and 80 min after injection. An experimenter blinded to genotype scored the videotapes using a keyboard scoring system with the public domain software JWatcher™, version 1.0 (University of California, Los Angeles, CA, USA, and Macquarie University, Sydney, Australia).<sup>1</sup> Individual keys were assigned to score resting, grooming, locomotion, rearing, sniffing air, sniffing/licking floor, sniffing/licking wall, no sniffing/licking and highly confined stereotypy.
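As a rough illustration of how keypress scoring can be converted into time spent per behavior within a 2-min recording, consider the sketch below. It does not reproduce the JWatcher file format or its analysis routines; the event-log layout, the function name, and the simplifying assumption that each keypress starts a behavior lasting until the next keypress are all hypothetical.

```python
def time_budget(events, session_s=120.0):
    """Return seconds spent in each behavior during one scored recording.

    events: list of (onset_time_s, behavior_label), sorted by onset time.
    Assumes behaviors are mutually exclusive and each lasts until the next onset.
    """
    durations = {}
    for idx, (onset, behavior) in enumerate(events):
        # The last scored behavior is assumed to continue until the recording ends
        end = events[idx + 1][0] if idx + 1 < len(events) else session_s
        durations[behavior] = durations.get(behavior, 0.0) + (end - onset)
    return durations


if __name__ == "__main__":
    # Hypothetical scoring log for one 2-min video
    log = [
        (0.0, "locomotion"),
        (30.0, "sniffing/licking wall"),
        (90.0, "locomotion"),
    ]
    for behavior, seconds in time_budget(log).items():
        print(f"{behavior}: {seconds:.0f} s ({100 * seconds / 120.0:.0f}%)")
```

If sniffing/licking and postural behaviors were scored as parallel channels, each channel would need its own log under this scheme.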

#### **STATISTICS**

To quantify immunoblot signals, VAChT immunoreactive bands were selected in ImageJ software (Schneider et al., 2012), and their density was measured and normalized to neuron-specific tubulin bands on the same blot. Results were averaged across three independent brain samples and compared by two-tailed Student's *t*-tests.
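The normalization and comparison described above can be sketched as follows. The band densities are made-up placeholder values, not data from this study, and the sketch assumes three samples per genotype and an equal-variance (pooled) Student's *t*-test; the function names are hypothetical. Only the *t* statistic and degrees of freedom are computed; with 4 degrees of freedom, the two-tailed critical value at *p* = 0.05 is 2.776.

```python
import math
from statistics import mean, variance


def normalize_to_loading_control(target_density, control_density):
    """Divide each target band density by the loading-control band from the same lane."""
    return [t / c for t, c in zip(target_density, control_density)]


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Placeholder densities (arbitrary units), for illustration only
    tg = normalize_to_loading_control([900.0, 1100.0, 1000.0], [500.0, 520.0, 480.0])
    wt = normalize_to_loading_control([400.0, 450.0, 380.0], [510.0, 490.0, 500.0])
    t, df = student_t(tg, wt)
    T_CRIT_DF4 = 2.776  # two-tailed, alpha = 0.05, df = 4
    print(f"t = {t:.2f}, df = {df}, significant = {abs(t) > T_CRIT_DF4}")
```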

Distance traveled and rearing activity graphs are shown as the mean + standard error of the mean for each genotype cohort of mice. Summed distance traveled data (bar graphs in **Figures 3–5**) from the TruScan recording system were compared between genotypes by unpaired, two-tailed Student's *t*-tests. Stereotypy scores between genotype cohorts were compared by Mann-Whitney *U*-tests. Significance criteria were set at *p* < 0.05.
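For readers unfamiliar with the rank-based comparison used for stereotypy scores, a minimal sketch of the Mann-Whitney *U* statistic is given below. It computes only *U* (with average ranks for ties), not the *p*-value, and the example scores are invented for illustration; the function name is hypothetical.

```python
def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2).

    Ties receive the average of the ranks they span.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


if __name__ == "__main__":
    # Invented stereotypy durations (s) for two small cohorts
    controls = [5.0, 12.0, 20.0, 8.0]
    transgenic = [60.0, 95.0, 110.0, 40.0]
    print("U =", mann_whitney_u(controls, transgenic))
```

The resulting *U* would then be compared against tabulated critical values, or converted to a *p*-value, for the given group sizes.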

### **RESULTS**

#### **OVEREXPRESSION OF VAChT IN THE STRIATUM OF ChAT-ChR2-EYFP BAC TRANSGENIC MICE**

VAChT protein products were elevated in striatum of ChAT-ChR2-EYFP BAC transgenic mice, as illustrated by immunohistochemistry in brain slices (**Figures 1A,B**) and by immunoblot quantitation (**Figure 1E**). VAChT immunoreactivity was observed in the cell bodies of cholinergic interneurons in both genotypes and in small puncta throughout the striatum, consistent with VAChT function in cell bodies and nerve terminals of cholinergic neurons. Differentially intense VAChT immunoreactive bands were observed at ∼70 kDa and ∼40 kDa, approximating the size of products that were previously confirmed to be missing in lysates from VAChT knockout mice (Nagy and Aubert, 2012). The intermediate size bands, which were not elevated in the BAC transgenic mice, were presumed to be nonspecific. Immunoreactivity for ChAT appeared to be similar in striatum of ChAT-ChR2-EYFP BAC transgenic mice, relative to controls (**Figures 1C,D**). Our results are consistent with reports that, relative to controls, mRNA for VAChT is elevated 20-fold and mRNA for ChAT is unchanged in the striatum of ChAT-ChR2-EYFP BAC transgenic mice (Kolisnyk et al., 2013b).

<sup>1</sup>http://www.jwatcher.ucla.edu/

#### **CHOLINERGIC NEUROPIL DISTRIBUTION IN STRIOSOME AND MATRIX COMPARTMENTS**

The overexpression of enhanced yellow fluorescent protein (EYFP) in cholinergic neurons of the ChAT-ChR2-EYFP BAC transgenic mice provided an opportunity to detect fine neuronal processes that could not be fully labeled with traditional methods (Matsuda et al., 2009). To observe how cholinergic neuropil was distributed relative to the striosome and matrix compartments, we labeled striatal sections from the ChAT-ChR2-EYFP BAC transgenic mice with red immunofluorescence for the matrix marker, CalDAG-GEFI (Kawasaki et al., 1998), to compare to the pattern of EYFP fluorescence. We observed that EYFP-positive cholinergic interneuron somata (arrows in **Figures 2A,C,D** and **2F**) were frequently in close proximity to one another and to striosomal borders (**Figures 2A–F**) and, although predominantly in the matrix, were occasionally located within a striosome. Thick, sparsely-spiny, EYFP-positive processes emanated from EYFP-positive cell bodies and were presumed to be dendrites (arrowhead in **Figures 2A,F**). These putative cholinergic interneuron dendrites sometimes appeared to cross striosome-matrix borders, but were nevertheless more prevalent in the matrix than in the striosomes (**Figures 2A–F**). In low-magnification images of the medial striatum, striosomes appeared as EYFP-poor zones (**Figure 2A**), presumably owing to the matrix enrichment of these EYFP-positive dendrites, as well as matrix-preferring EYFP-positive cholinergic fibers from brainstem nuclei (Dautan et al., 2014). By high-resolution confocal microscopy, however, very thin, EYFP-positive processes with varicosities, presumed to be axons, could be visualized in both matrix and striosome compartments (**Figures 2D–F**), as indicated previously by ChAT immunostaining analyzed with light microscopy (Graybiel et al., 1986), but even more clearly seen here with confocal microscopy. Thus the cholinergic neuropil of the striatum, although intense in the matrix compartment, is differentiated, with potential acetylcholine release sites abundant in both striosomes and matrix and dense dendritic arbors in the matrix compartment.

To determine whether the distribution pattern of cholinergic neuropil that we observed might be unique to the ChAT-ChR2-EYFP BAC transgenic mice, we examined cholinergic interneurons in a ChAT-Cre knock-in line (Rossi et al., 2011) that does not have duplication of the cholinergic gene locus. To label the cholinergic interneurons in this line, we injected, into the striatum of the mice, an adeno-associated virus that carries a Cre-dependent gene encoding the red fluorophore, mCherry (described in Section Materials and Methods). Eight weeks after viral injection, we obtained brain sections and double-labeled them by green immunofluorescence for the matrix marker, CalDAG-GEFI. Compared to the ChAT-ChR2-EYFP BAC mice, we observed fewer cholinergic interneurons labeled by this method, and the cholinergic neuropil appeared less dense in most regions. However, the compartmental distribution of labeled processes was similar in the two lines (**Figures 2G–I**). In the ChAT-Cre knock-in line, the thick, mCherry-positive processes (presumed dendrites) crossed compartment borders but were more abundant in the CalDAG-GEFI-positive zones (matrix), than in the striosomes (CalDAG-GEFI-poor). By contrast, the very fine mCherry-positive processes with varicosities (presumed axons) appeared similarly dense between the striosome and matrix compartments (**Figures 2G–I**). These results again suggest that inputs to the cholinergic interneuron dendrites are enriched in the striatal matrix (Herkenham and Pert, 1981; Graybiel et al., 1986; Sadikot et al., 1992; Fujiyama et al., 2006; Raju et al., 2006), whereas potential axon-terminal release sites from cholinergic interneurons are dense in both striosomes and matrix.

#### **SEVERE DRUG-INDUCED STEREOTYPY IN ChAT-ChR2-EYFP BAC TRANSGENIC MICE**

To test whether the ChAT-ChR2-EYFP BAC transgenic mice had abnormal responses to psychomotor stimulants, we measured locomotion and stereotypy in transgenic mice and sibling controls treated with low or high doses of D-amphetamine. To measure locomotion, each mouse was placed into an activity monitor fitted with infrared photobeams to monitor mouse movements and calculate distance traveled. To measure drug-induced stereotypy, we video-recorded the mice for 2 min at 50 and at 80 min post-injection, and a rater blinded to genotype scored the frequency and duration of each behavior observed.

To habituate the mice to the activity chamber and also to gather baseline behavior data, we injected the mice with saline (10 ml/kg, i.p.) for three consecutive days prior to drug treatment. We found no differences in behavior between the ChAT-ChR2-EYFP BAC transgenic mice and sibling controls injected with saline in the novel chamber (**Figures 3A,B**). In response to the first injection of low-dose D-amphetamine (2.5 mg/kg, i.p.), the ChAT-ChR2-EYFP BAC transgenic mice showed a tendency for a greater locomotor response than their sibling controls, but this effect did not reach statistical significance (**Figure 4A**). The time spent in slow versus fast locomotion was significantly less for the transgenic mice, reflecting the tendency for their increase in distance traveled (**Figure 4B**). To test for drug sensitization, we treated the mice for an additional 6 days with the same dose of D-amphetamine followed by a 7-day drug washout period with no treatment, and then measured their response to a drug challenge at the same dose level. On the challenge day, both transgenic and control mice showed evidence of sensitization in that they began locomotion much sooner after drug injection than they did on the first day (**Figure 4C**). The total distance traveled did not appear increased on challenge day relative to the first day of treatment, which is likely related to the increase in pausing for wall-sniffing in the sensitized mice (**Figure 4D** compared to **4B**). On the challenge day, the transgenic mice showed evidence of drug hyperresponsivity based on significantly more time spent in fast locomotion than controls (**Figure 4D**). Altogether, the ChAT-ChR2-EYFP BAC mice showed slight hypersensitivity to low-dose D-amphetamine, both in their response to acute treatment and after repeated treatment inducing drug sensitization.

The response of the transgenic mice to high doses of D-amphetamine (7.0 mg/kg) was strikingly different from that of their wildtype siblings. The distance traveled scores of the transgenic mice started to fall sharply at the 20 min post-injection time point (**Figure 5A**), as they began to engage in severe and confined stereotypic behaviors such as sniffing the floor or the wall in the corners of the monitors (**Figure 5B**). After repeated high-dose D-amphetamine treatment, both transgenic and control mice developed sensitized responses to the drug, indicated by their short latencies to the onset of locomotion after drug injection (**Figure 5C**) and increase in severe stereotypy (**Figure 5D**), relative to day 1. In summary, the ChAT-ChR2-EYFP BAC mice had more severe confined stereotypic behavior, in both the naïve and the drug-sensitized state, than the corresponding control mice (**Figures 5B,D**).

Considering that cholinergic interneurons are reported to be responsive to learned and salient cues, we also tested whether the mice that had been sensitized to high-dose D-amphetamine would show a sensitized locomotor response to saline injection only. After drug sensitization, both ChAT-ChR2-EYFP BAC and control mice showed a sharper response to saline injection than they did prior to drug sensitization (**Figure 6**, compared to **Figure 3A**), but there were no apparent differences between genotypes. Thus, the transgenic mice did not exhibit blockade either of behavioral sensitization to D-amphetamine injection itself or of the capacity to become sensitized to cues associated with injection of the drug.

### **DISCUSSION**

Our findings point to an abnormal behavioral phenotype in BAC transgenic ChAT-ChR2-EYFP mice in which the mice exhibit excessively severe amphetamine-induced stereotypy. It is likely that this phenotype derives from overexpression of VAChT in these mice. Consistent with the finding that *Slc18a3* transcription is elevated (Kolisnyk et al., 2013b), we observed significantly higher VAChT protein immunolabeling in the striatum and cerebral cortex of hemizygous transgenic mice than in their littermate controls, as determined both by immunoblotting and by immunohistochemistry. Thus, this BAC mouse line is likely to have increased stimulated acetylcholine release in the striatum, based on findings of increased release of acetylcholine in hippocampal slices from this line and a second, similar line (B6.eGFPChAT) (Nagy and Aubert, 2012; Kolisnyk et al., 2013b). VAChT immunoreactivity appeared to be particularly abundant in the lateral striatum, which contains cholinergic interneurons as well as cholinergic afferent fibers from the pedunculopontine nucleus (Dautan et al., 2014), and excitatory inputs from sensorimotor regions of the thalamus (Lanciego et al., 2004) and neocortex (Kincaid and Wilson, 1996). Notably, this VAChT-enriched region appears to be near to the ventrolateral zone that induced the highest levels of oral stereotypy in a mapping study made by local amphetamine injections across the striatum (Dickson et al., 1994).

The VAChT overexpression finding is congruent with chromosomal insertions of multiple copies of the ChAT-ChR2-EYFP BAC construct (Zhao et al., 2011), in which the first part of the *Chat* coding region was replaced by the ChR2-EYFP cassette, but the *Slc18a3* gene, which is located within intron 1 of *Chat* (Eiden, 1998), was not altered. A similar BAC construct design has been used to generate numerous transgenic rodent lines in order to drive gene expression in cholinergic cells. Such transgenic lines include a channelrhodopsin line (Ren et al., 2011; Zhao et al., 2011), a ribosomal L10a marker TRAP line (Doyle et al., 2008; Heiman et al., 2008), fluorescent reporter lines (Gong et al., 2003), a tau-GFP line (Grybko et al., 2011), mouse Cre lines (Gong et al., 2007), and a rat Cre line (Witten et al., 2011). ChAT BAC lines selected for high transgene expression levels would be expected to have correspondingly high expression of VAChT. Knock-in lines in which the transgene is targeted to the endogenous ChAT locus (Rossi et al., 2011), or BAC transgenic lines in which VAChT is specifically inactivated (Ting and Feng, 2014), could be the exceptions. It is well-recognized that the genetic background (Thomsen and Caine, 2011), as well as the sex, age and housing conditions of mice, influence their responses to psychomotor stimulants. The phenotype of the BAC mice described here highlights the importance of controlling for the possibility that BAC transgenic mice carry extra copies of genes or have a gene mutation caused by chromosomal insertion of the BAC. Abnormal responses to cocaine were discovered for a Drd2-EGFP BAC mouse line as well (Kramer et al., 2011), although this phenotype is reported to be sensitive to genetic background and is dependent on homozygosity for the BAC insertion (Chan et al., 2012).

The fluorophore overexpression in the ChAT BAC transgenic mice, and virus-injected ChAT-Cre knock-in mice, permitted us to observe the cholinergic neuropil in fine detail. The dendrites that originated from the fluorophore-labeled cholinergic interneurons were sparsely spiny and were more prevalent in the matrix compartment than in the striosomes of the dorsomedial striatum (Graybiel et al., 1986). This finding complements evidence that the thalamic parafascicular nucleus preferentially targets the striatal matrix, and is a major source of input to the dendritic shafts of cholinergic interneurons (Herkenham and Pert, 1981; Lapper and Bolam, 1992; Sadikot et al., 1992; Fujiyama et al., 2006; Raju et al., 2006). The compartmentalized distribution of EYFP-positive dendrites was most obvious in the striosome-rich medial striatum, a region specifically implicated in the cholinergic regulation of drug-induced stereotypy (Aliane et al., 2009). In contrast to the matrix-enrichment of dendrites from the cholinergic interneurons, very thin, EYFP-positive cholinergic processes, with abundant varicosities typical of axons, extended throughout both the matrix and striosomes, a finding strongly extending the original observation of this fine intra-striosomal neuropil (Graybiel et al., 1986). This apparent innervation of striosomes by the fine fibers of striatal cholinergic neurons contrasts with the reported minimal innervation of striosomes by inputs arising from cholinergic neurons in the brainstem, which strongly and preferentially innervate the matrix compartment (Dautan et al., 2014). If verified by further colabeling of these two sources of cholinergic input to the striatum, these results together would suggest that direct cholinergic innervation of striosomes likely arises specifically from the cholinergic interneurons.

AChE, the main degradative enzyme of acetylcholine, is enriched in the matrix in humans (Graybiel and Ragsdale, 1978), bringing up the further possibility that acetylcholine signaling is more transient in the matrix than in striosomes. Although such differential AChE distributions are scarcely visible in the rodent striatum, preferential striosomal expression of c-Fos is induced by high-dose amphetamine treatment in both rodents and monkeys *in vivo* (Graybiel et al., 1990; Canales and Graybiel, 2000; Saka et al., 2004; Horner and Keefe, 2006; Jedynak et al., 2012) and this compartmentalized pattern is disrupted by ablation of cholinergic interneurons (Saka et al., 2002). Together with our immunofluorescence findings, these observations suggest that acetylcholine release could directly and differentially influence the striosome and matrix compartments.

The ChAT-ChR2-EYFP BAC transgenic mice suffered severe stereotypy, both after acute administration of D-amphetamine and after repeated administration of this drug. There is abundant evidence that changes in acetylcholine levels in the striatum are linked to drug-induced stereotypy. *In vivo* microdialysis studies in behaving rats show that acute, binge methamphetamine treatment protocols that induce high levels of stereotypy lead to changes in acetylcholine release in the dorsal striatum, relative to pre-drug levels. Rats that are given prolonged, repeated drug treatments that result in tolerance to the stereotypy-inducing effects of methamphetamine have less repression of acetylcholine release (Kuczenski and Segal, 2001). By contrast, the levels of acetylcholine in the ventral striatum are not different in rats that show tolerance versus sensitization to drug-induced stereotypy (Kuczenski and Segal, 2001), supporting the idea that drug-induced stereotypy is related to activity in the dorsal striatum. Other microdialysis studies, however, suggest that rats exhibiting a sensitized stereotypic response to amphetamine have an *increase* in striatal acetylcholine (Bickerdike and Abercrombie, 1997), rather than a decrease. The reasons for these opposing effects of psychomotor stimulants on acetylcholine levels are unclear (Kuczenski and Segal, 2001), but the studies nevertheless converge to show a strong correlation of behavioral stereotypy with changes in striatal acetylcholine.

This relationship between acetylcholine and drug-induced stereotypy is still not understood at a mechanistic level. Pharmacologic studies indicate that stereotypy can be influenced by postsynaptic muscarinic receptors as well as presynaptic nicotinic acetylcholine receptors. One potential mechanistic link comes from the fact that activation of presynaptic β2 nicotinic acetylcholine receptors, under certain conditions, can enhance dopamine release from terminals in the dorsal striatum (Zhou et al., 2001; Perez et al., 2009; Threlfell et al., 2012) and that drug-induced stereotypy is associated with co-activation of D1 and D2-type dopamine receptors (Capper-Loup et al., 2002). Moreover, repeated nicotine administration induces stereotypy in rats and also enhances behavioral responses to cocaine (Collins and Izenwasser, 2004). Notably, two studies show that DHβE administration in mice reduces sensitization of stereotypies in response to repeated drug administration, but does not change the stereotypy in response to first-time drug exposure (Karler et al., 1996; Metaxas et al., 2012), suggesting that β2 nicotinic receptors are important for sensitization of stereotypy, but not for the acute stereotypic response. Collectively, these findings raise the possibility that ChAT-ChR2-EYFP BAC transgenic mice are "born sensitized" by virtue of having molecular abnormalities that are similar to those in sensitized animals, and that this predisposition biases them toward exhibiting an increased response to their first D-amphetamine exposure, without being so severe as to occlude further sensitization.

VAChT is reported to be up-regulated in post-mortem striatal samples from human methamphetamine users (Siegal et al., 2004), suggesting that humans exposed to drugs of abuse might have abnormal acetylcholine release akin to that in rodent models of drug addiction (Bickerdike and Abercrombie, 1997; Kuczenski and Segal, 2001) and in transgenic mice with VAChT overexpression (Nagy and Aubert, 2012; Kolisnyk et al., 2013b). Considering the findings reported here of increased responses to a habit-forming psychomotor stimulant in mice that overexpress VAChT, dysregulation of VAChT in humans might be directly related to drug addiction. Reversible AChE inhibitors are prescribed for medical conditions of reduced acetylcholine function including myasthenia gravis and Alzheimer's disease (Nair and Hunter, 2004). AChE inhibitors are also under investigation for the treatment of methamphetamine addiction (De La Garza et al., 2012) and motor tics in Tourette syndrome (Cubo et al., 2008). Despite the widespread use of compounds that reduce the breakdown of acetylcholine, there is incomplete information about the effects of augmenting acetylcholine release. The distinction between these two approaches for elevating acetylcholine may be an important one, considering that the phasic release of acetylcholine is thought to be important for cognitive attention (Sarter et al., 2009) and that there are profound differences between tonic and phasic release of dopamine (Goto et al., 2007; Schultz, 2007). Whether stimulation or overexpression of VAChT could be beneficial for particular medical conditions remains to be tested.

The functions of acetylcholine in the striatum depend upon a multitude of factors including the differential activation and dynamics of acetylcholine receptor subtypes, the striatal region under study, and the firing patterns of the dopamine-containing neurons innervating the striatum (Morris et al., 2004; Perez et al., 2009; Threlfell and Cragg, 2011; Zhang and Sulzer, 2012). Moreover, these neuromodulatory influences are themselves associated with functions in numerous neural circuits and cell types. We highlight the severe stereotypic behavior of the ChAT-ChR2-EYFP BAC transgenic mice because it may strikingly reflect the power of the striatal cholinergic system to influence repetitive behaviors induced by habit-forming drugs. This strong overexpression phenotype suggests that the cholinergic system is poised to regulate responses to intense dopaminergic stimulation, conditions engendered by drug use. Cell signaling mechanisms across the body are relevant to an animal's response to drugs of abuse, and how the loss of frontal control contributes to addiction (Feil et al., 2010) may be related to how cue-sensitivity, motivation, and memory are integrated (Flagel et al., 2009).

#### **ACKNOWLEDGMENTS**

We thank Anne Huang, Michael Riad, Jannifer Lee, Sarah Osmulski, Dr. Dan Hu and Dr. Kyle Copps for technical assistance and Alexander McWhinnie for assistance in graphics. We thank Dr. Yasunobu Murata and Prof. Martha Constantine-Patton for use of the confocal microscope. We thank Prof. John Reynolds and Dr. Jonathan Ting for critical comments on the manuscript. This work was supported by The Simons Center for the Social Brain, The CHDI Foundation, The James and Pat Poitras Research Fund, The National Institute of Child Health and Development (R37-HD028341), The Nancy Lurie Marks Family Foundation, and The Stanley Center for Psychiatric Research at the Broad Institute, via a grant to Edward Scolnick.

#### **REFERENCES**


patterned distributions of nigrostriatal projection neurons and striatonigral fibers. *Exp. Brain Res.* 74, 227–238. doi: 10.1007/bf00248855


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2014; paper pending published: 17 April 2014; accepted: 12 May 2014; published online: 28 May 2014*.

*Citation: Crittenden JR, Lacey CJ, Lee T, Bowden HA and Graybiel AM (2014) Severe drug-induced repetitive behaviors and striatal overexpression of VAChT in ChAT-ChR2-EYFP BAC transgenic mice. Front. Neural Circuits 8:57. doi: 10.3389/fncir.2014.00057*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Crittenden, Lacey, Lee, Bowden and Graybiel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Corrigendum: Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits

#### *Kenji Morita<sup>1</sup>\* and Ayaka Kato<sup>2</sup>*

*<sup>1</sup> Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan*

*<sup>2</sup> Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan*

*\*Correspondence: morita@p.u-tokyo.ac.jp*

#### *Edited and reviewed by:*

*M. Victoria Puig, Massachusetts Institute of Technology, USA*

**Keywords: dopamine, basal ganglia, corticostriatal, synaptic plasticity, reinforcement learning, reward prediction error, flexibility, computational modeling**

**A corrigendum on Figure 2Cd of**

**Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits**

*by Morita, K., and Kato, A. (2014). Front. Neural Circuits 8:36. doi: 10.3389/fncir.2014.00036*

While preparing organized program code for this article (Morita and Kato, 2014) for submission to a public database after publication, we noticed an error in the code for generating **Figure 2Cd**, written by one of the authors, Kenji Morita. Specifically, although the RPE values at *S*<sub>1</sub> for the cases with decay (i.e., the leftmost points of the three solid lines) should be proportional to the amount of reward, as shown in the formula for calculating them:

(at the start of the maze (*S*<sub>1</sub>), *j* = *n* − 1)

$$\begin{aligned} \delta_{n-j} &= 0 + \gamma V_{n-j} - 0 \\ &= \gamma V_{n-j} \\ &= \alpha^{j} \kappa^{j} \gamma^{j} R / \{1 - \kappa (1 - \alpha)\}^{j} \end{aligned}$$

(in the right-bottom of page 4),

where "*R*" represents the amount of reward, they were incorrectly plotted as an equal value in **Figure 2Cd** (indicated by the red circle in the left ("Error") panel of the figure attached to this Corrigendum) because "*R*" was mistakenly dropped (i.e., effectively assumed to be 1 in all the cases) in the code. We have corrected the code and made the corrected **Figure 2Cd** [the right ("Corrected") panel of the figure attached to this Corrigendum]. There is no need to change the text explaining **Figure 2Cd** in the Methods, Results, and the figure legend. We sincerely apologize for the inconvenience. Lastly, we would like to take this opportunity to announce that the program (MATLAB) codes for this article (with the correction described above) are now available on ModelDB (Accession: 153573): http://senselab.med.yale.edu/modeldb/ShowModel.asp?model=153573
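The kind of error described in this Corrigendum can be illustrated with a minimal sketch. It is not the authors' MATLAB code. It assumes the closed-form expression for the RPE at the start of the maze reads δ = α^j κ^j γ^j R / {1 − κ(1 − α)}^j with j = n − 1, where α is the learning rate, γ the time discount factor, and κ the decay factor. The function names and all parameter values below are hypothetical placeholders chosen only for illustration.

```python
def rpe_at_start(R, n, alpha, gamma, kappa):
    """RPE at the start of the maze under the assumed closed form, with j = n - 1."""
    j = n - 1
    return (alpha * kappa * gamma) ** j * R / (1.0 - kappa * (1.0 - alpha)) ** j


def rpe_at_start_without_R(R, n, alpha, gamma, kappa):
    """Same expression with R dropped (effectively R = 1), mimicking the described error."""
    return rpe_at_start(1.0, n, alpha, gamma, kappa)


if __name__ == "__main__":
    # Hypothetical parameter values, not taken from the article.
    n, alpha, gamma, kappa = 7, 0.5, 0.97, 0.99
    for R in (0.5, 1.0, 2.0):
        correct = rpe_at_start(R, n, alpha, gamma, kappa)
        dropped = rpe_at_start_without_R(R, n, alpha, gamma, kappa)
        print(f"R={R}: with R -> {correct:.6f}, with R dropped -> {dropped:.6f}")
```

With R included, the value at the start scales linearly with the reward amount. With R dropped, every reward amount yields the same value, which is the pattern described for the "Error" panel.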

## **REFERENCES**

Morita, K., and Kato, A. (2014). Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. *Front. Neural Circuits* 8:36. doi: 10.3389/fncir.2014.00036

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 April 2014; accepted: 23 April 2014; published online: 16 May 2014.*

*Citation: Morita K and Kato A (2014) Corrigendum: Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front. Neural Circuits 8:48. doi: 10.3389/fncir.2014.00048*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Morita and Kato. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Corrigendum: "The role of prefrontal catecholamines in attention and working memory"

## *Behrad Noudoost\* and Kelsey L. Clark*

*Department of Cell Biology and Neuroscience, Montana State University, Bozeman, MT, USA \*Correspondence: bnoudoost@montana.edu*

#### *Edited and reviewed by:*

*M. Victoria Puig, IMIM-Hospital del Mar Medical Research Institute, Spain*

**Keywords: dopamine, reward, top-down control, frontal eye field, V4**

#### **A corrigendum on**

### **The role of prefrontal catecholamines in attention and working memory**

*by Clark, K. L., and Noudoost, B. (2014). Front. Neural Circuits 8:33. doi: 10.3389/fncir.2014.00033*

On Page 4, second column, first paragraph, the sentence currently reading (errors in bold):

"One group of PFC neurons, which included all the modulated narrow-spiking, putatively inhibitory neurons, was inhibited by DA; these showed short onset latency of DA effects (∼10 **ms**), with no change in signal-to-noise ratio (SNR) or inter-trial variability. A second set of prefrontal neurons was excited by DA application, displaying an increase in SNR and decrease in inter-trial variability; this effect was slower (∼200 **ms**) and observed only in broad-spiking, putatively pyramidal neurons."

Both instances of "ms" (milliseconds) in this sentence should be changed to seconds, "s." Correct text will read:

"One group of PFC neurons, which included all the modulated narrow-spiking, putatively inhibitory neurons, was inhibited by DA; these showed short onset latency of DA effects (∼10 s), with no change in signal-to-noise ratio (SNR) or inter-trial variability. A second set of prefrontal neurons was excited by DA application, displaying an increase in SNR and decrease in inter-trial variability; this effect was slower (∼200 s) and observed only in broad-spiking, putatively pyramidal neurons."

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 September 2014; accepted: 11 November 2014; published online: 02 December 2014.*

*Citation: Noudoost B and Clark KL (2014) Corrigendum: "The role of prefrontal catecholamines in attention and working memory". Front. Neural Circuits 8:142. doi: 10.3389/fncir.2014.00142*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Noudoost and Clark. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
