# NEURAL CIRCUITRY OF BEHAVIORAL FLEXIBILITY: DOPAMINE AND RELATED SYSTEMS

EDITED BY: Gregory B. Bissonette and Matthew R. Roesch PUBLISHED IN: Frontiers in Behavioral Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-795-8 DOI 10.3389/978-2-88919-795-8


# **NEURAL CIRCUITRY OF BEHAVIORAL FLEXIBILITY: DOPAMINE AND RELATED SYSTEMS**

Topic Editors:

**Gregory B. Bissonette,** University of Maryland, USA **Matthew R. Roesch,** University of Maryland, USA

Decades of research have identified a role for dopamine neurotransmission in prefrontal cortical function and flexible cognition. Abnormal dopamine neurotransmission underlies many cases of cognitive dysfunction. New techniques using optogenetics have allowed for ever more precise functional segregation of areas within the prefrontal cortex, which underlie separate cognitive functions. Learning theory predictions have provided a very useful framework for interpreting the neural activity of dopamine neurons, yet even dopamine neurons present a range of responses, from salience to prediction error signaling. The functions of areas like the lateral habenula have only recently been described, and their role, presumed to be substantial, remains largely unknown. Many other neural systems, such as cortical GABAergic interneurons, interact with the dopamine system, making it critical to understand those systems and their interactions with dopamine in order to fully appreciate dopamine's role in flexible behavior. Advances in human clinical research, like exome sequencing, are driving experimental hypotheses which will lead to fruitful new research directions, but how do (or should) these clinical findings inform basic research?

Following new information from these techniques, we may begin to develop a fresh understanding of human disease states which will inform novel treatment possibilities. However, we need an operational framework with which to interpret these new findings. Therefore, the purpose of this Research Topic is to integrate what we know of dopamine, the prefrontal cortex and flexible behavior into a clear framework, which will illuminate clear, testable directions for future research.

**Citation:** Bissonette, G. B., and Roesch, M. R., eds. (2016). Neural Circuitry of Behavioral Flexibility: Dopamine and Related Systems. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-795-8

# Table of Contents

*Editorial: Neural Circuitry of Behavioral Flexibility: Dopamine and Related Systems*

Gregory B. Bissonette and Matthew R. Roesch

*Optogenetic silencing of locus coeruleus activity in mice impairs cognitive flexibility in an attentional set-shifting task*

Kathrin Janitzky, Michael T. Lippert, Achim Engelhorn, Jennifer Tegtmeier, Jürgen Goldschmidt, Hans-Jochen Heinze and Frank W. Ohl

*Acute physical exercise improves shifting in adolescents at school: evidence for a dopaminergic contribution*

Timo Berse, Kathrin Rolfes, Jonathan Barenberg, Stephan Dutke, Gregor Kuhlenbäumer, Klaus Völker, Bernward Winter, Michael Wittig and Stefan Knecht

*Neural correlates of rules and conflict in medial prefrontal cortex during decision and feedback epochs*

Gregory B. Bissonette and Matthew R. Roesch

*Infusion of D1 Dopamine Receptor Agonist into Medial Frontal Cortex Disrupts Neural Correlates of Interval Timing*

Krystal L. Parker, Rafael N. Ruggiero and Nandakumar S. Narayanan

*Cholinergic and ghrelinergic receptors and KCNQ channels in the medial PFC regulate the expression of palatability*

Marc A. Parent, Linda M. Amarante, Kyra Swanson and Mark Laubach

*Individual variability in behavioral flexibility predicts sign-tracking tendency*

Helen M. Nasser, Yu-Wei Chen, Kimberly Fiscella and Donna J. Calu

*Individual differences in the influence of task-irrelevant Pavlovian cues on human behavior*

Sara Garofalo and Giuseppe di Pellegrino

*Basal forebrain motivational salience signal enhances cortical processing and decision speed*

Sylvina M. Raver and Shih-Chieh Lin

*Ongoing behavioral state information signaled in the lateral habenula guides choice flexibility in freely moving rats*

Phillip M. Baker, Sujean E. Oh, Kevan S. Kidder and Sheri J. Y. Mizumori

*To Act or Not to Act: Endocannabinoid/Dopamine Interactions in Decision-Making*

Giovanni Hernandez and Joseph F. Cheer

*Glycogen synthase kinase-3b inhibition in the medial prefrontal cortex mediates paradoxical amphetamine action in a mouse model of ADHD*

Yi-Chun Yen, Nils C. Gassen, Andreas Zellner, Theo Rein, Rainer Landgraf, Carsten T. Wotjak and Elmira Anderzhanova

*Normal neurochemistry in the prefrontal and cerebellar brain of adults with attention-deficit hyperactivity disorder*

Dominique Endres, Evgeniy Perlov, Simon Maier, Bernd Feige, Kathrin Nickel, Peter Goll, Emanuel Bubl, Thomas Lange, Volkmar Glauche, Erika Graf, Dieter Ebert, Esther Sobanski, Alexandra Philipsen and Ludger Tebartz van Elst

*Association of oxytocin level and less severe forms of childhood maltreatment history among healthy Japanese adults involved with child care*

Rie Mizuki and Takeo Fujiwara

# Editorial: Neural Circuitry of Behavioral Flexibility: Dopamine and Related Systems

Gregory B. Bissonette<sup>1,2</sup>\* and Matthew R. Roesch<sup>1,2</sup>

*<sup>1</sup> Department of Psychology, University of Maryland, College Park, MD, USA, <sup>2</sup> Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, USA*

Keywords: Dopamine, Pavlovian Instrumental Transfer, Sign and goal tracking, set shifting, endocannabinoid system, basal forebrain non-cholinergic neurons, motivational salience, ADHD

**The Editorial on the Research Topic**

#### **Neural Circuitry of Behavioral Flexibility: Dopamine and Related Systems**

Dopamine neurotransmission has long been identified as a key component of prefrontal cortical function and flexible cognition; however, flexible behavior does not depend solely on dopamine. Together, multiple systems allow fast and flexible behavior in the face of a constantly changing environment. This Research Topic illuminates work by researchers who are leading efforts to understand how different neural systems yield flexible behavior.

The psychostimulant amphetamine is, paradoxically, a useful treatment for Attention Deficit Hyperactivity Disorder (ADHD) symptoms. Here, Yen and colleagues demonstrate in a mouse model of ADHD that the calming effect of amphetamine is due to decreased GSK3B, initiated through NMDA receptor signaling (Yen et al.).

Stress is known to be partly regulated by oxytocin, and oxytocin levels in adults may be influenced by the severity of childhood maltreatment. The inconsistent results on this topic are tackled by Mizuki and Fujiwara, who demonstrate a correlation between the severity of childhood maltreatment and diminished oxytocin levels in humans (Mizuki and Fujiwara).

A key challenge in combatting drug addiction is the incentive motivational property of a stimulus, which is known to develop through Pavlovian Instrumental Transfer (PIT). Garofalo and di Pellegrino were able to show, for the first time in humans, that the responses of Sign-Trackers were biased by reward-paired cues while those of Goal-Trackers were not (Garofalo and di Pellegrino). This information is critical for developing individualized treatment plans for maladaptive behaviors.

Physical exercise is known to promote better cognition, but the mechanism remains unknown. In their article, Berse and colleagues identified a particular single nucleotide polymorphism in the Dopamine Transporter (DAT1/SLC6A3) gene which contributed to improved cognition under physical exercise conditions in adolescent humans (Berse et al.). These data suggest the usefulness of physical exercise in promoting flexible cognition in adolescents.

Altered neurochemistry in the anterior cingulate cortex (ACC) is thought to be a root cause of the behavioral symptoms of ADHD. However, when Endres and colleagues conducted a large-scale single-voxel proton magnetic resonance spectroscopy study, they identified no neurometabolic differences in ACC or cerebellum between control and experimental groups (Endres et al.). These results fail to replicate an earlier and smaller experiment, and the authors propose that the difficulty in diagnosing ADHD may lie in a number of false-negative studies.

Flexible behavior is known to depend on the medial prefrontal cortex (mPFC). Bissonette and Roesch demonstrate how neural correlates of rules and conflict are encoded and signaled by mPFC neurons as rats modify their behavior in the face of changing contingencies (Bissonette and Roesch). Importantly, these populations of neurons also modified their activity based on feedback, significantly increasing activity for rewarded outcomes and significantly decreasing activity for non-rewarded choices. These results demonstrate how the mPFC signals the need for behavior to become flexible.

Edited and reviewed by: *Nuno Sousa, University of Minho, Portugal*

\*Correspondence: *Gregory B. Bissonette gbissone@umd.edu*

Received: *16 December 2015* Accepted: *11 January 2016* Published: *28 January 2016*

#### Citation:

*Bissonette GB and Roesch MR (2016) Editorial: Neural Circuitry of Behavioral Flexibility: Dopamine and Related Systems. Front. Behav. Neurosci. 10:6. doi: 10.3389/fnbeh.2016.00006*

Raver and Lin present a very cogent review of the role that the basal forebrain (BF) plays in signaling arousal, attention and decision-making. They discuss relevant literature focusing on how a population of BF non-cholinergic neurons encodes motivational salience through ensemble burst firing (Raver and Lin). This review focuses attention on how the BF salience system is a critical part of signaling attention towards relevant stimuli, promoting flexible behavior.

Most animal research uses reward-based learning paradigms, requiring animals to consume their reward. In this Research Topic, Parent and colleagues demonstrate how cholinergic and ghrelinergic signaling, both working through KCNQ channels in the mPFC, are critical for regulating the duration of licking (Parent et al.). These findings are important for interpreting data from studies that manipulate the mPFC and use reward-based learning paradigms.

Sign-tracking in animals is associated with increased sensitivity to food cues. Nasser and colleagues demonstrate that sign-tracking, but not goal-tracking, rats are more sensitive to food-associated cues and that the degree of sign-tracking correlates with a failure to suppress responses during devaluation (Nasser et al.). These data suggest that natural variation among animals in their tendency towards either sign- or goal-tracking may be correlated with drug addiction vulnerability.

The locus coeruleus (LC), which provides noradrenergic projections to the cortex, is important for attention. Janitzky and colleagues demonstrate that optogenetically suppressing LC activity during a set-shifting task impaired performance on reversal and extra-dimensional set-shifting but not on learning of intra-dimensional shifts or compound discriminations (Janitzky et al.). These results suggest a role for the LC not during initial learning, but during situations which require flexible shifts in behavior.

The lateral habenula (LHb) acts as a gatekeeper for information about negative outcomes being transmitted to the midbrain dopamine system. Baker and colleagues provide a thorough review of the LHb literature and present data to support a hypothesis whereby LHb neural activity is important for the maintenance of goal-directed actions to new response types during changing contingencies (Baker et al.). This information is necessary to guide future research on how LHb activity may be important for flexible behavior.

Particular behavioral responses need to be organized over time, allowing appropriate actions to occur at the right time. Parker, Ruggiero and Narayanan demonstrate that infusion of a D1-type dopamine receptor agonist into the medial frontal cortex leads to altered interval timing performance by impacting ramping patterns and by differentially affecting field potential activity, without impacting baseline firing rates of neurons (Parker et al.). These data provide further support that frontal cortical neural activity works best with a basal level of dopamine.

Many neural systems interact while animals engage in complex behaviors and make decisions. Hernandez and Cheer provide a thorough review of the literature surrounding the endocannabinoid system and how endocannabinoid and dopamine systems may interact while animals make decisions (Hernandez and Cheer). This review summarizes the literature around how different aspects of the endocannabinoid system may modulate dopamine release.

Together, this collection of 13 works of original research and reviews provides both a new infusion of knowledge and a broad review of the neural systems underlying flexible behavior. Crossing a range of fields and using a diverse set of tools and subjects, this body of work has illuminated some exciting future avenues of research.

# AUTHOR CONTRIBUTIONS

GB and MR conceived of the Research Topic, edited manuscripts, and worked on this editorial.

# FUNDING

We would like to acknowledge our funder, National Institute on Drug Abuse (MRR DA031695; MRR DA040993).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Bissonette and Roesch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Optogenetic silencing of locus coeruleus activity in mice impairs cognitive flexibility in an attentional set-shifting task

Kathrin Janitzky<sup>1,2</sup>\*<sup>†</sup>, Michael T. Lippert<sup>2†</sup>, Achim Engelhorn<sup>2</sup>, Jennifer Tegtmeier<sup>2</sup>, Jürgen Goldschmidt<sup>2,3</sup>, Hans-Jochen Heinze<sup>1,2,3</sup> and Frank W. Ohl<sup>2,3,4</sup>

*<sup>1</sup> Department of Neurology, University of Magdeburg, Magdeburg, Germany, <sup>2</sup> Systems Physiology of Learning, Leibniz Institute of Neurobiology, Magdeburg, Germany, <sup>3</sup> Center for Behavioral Brain Sciences, Magdeburg, Germany, <sup>4</sup> Systems Physiology, Institute of Biology, University of Magdeburg, Magdeburg, Germany*

The locus coeruleus (LC) is the sole source of noradrenergic projections to the cortex and is essential for attention-dependent cognitive processes. In this study we used unilateral optogenetic silencing of the LC in an attentional set-shifting task (ASST) to evaluate the influence of the LC on prefrontal cortex-dependent functions in mice. We expressed the halorhodopsin eNpHR 3.0 to reversibly silence LC activity during task performance, and found that silencing selectively impaired learning of those parts of the ASST that most strongly rely on cognitive flexibility. In particular, extra-dimensional set-shifting (EDS) and reversal learning were impaired, suggesting an involvement of the medial prefrontal cortex (mPFC) and the orbitofrontal cortex. In contrast, those parts of the task that are less dependent on cognitive flexibility, i.e., compound discrimination (CD) and the intra-dimensional shifts (IDS), were not affected. Furthermore, attentional set formation was unaffected by LC silencing. Our results therefore suggest a modulatory influence of the LC on cognitive flexibility, mediated by different frontal networks.

#### Edited by:

*Gregory B. Bissonette, University of Maryland, USA*

#### Reviewed by:

*Juan Mena-Segovia, Rutgers University, USA Zackary Adam Cope, University of California, San Diego, USA*

\*Correspondence:

*Kathrin Janitzky kathrin.janitzky@med.ovgu.de*

*† These authors have contributed equally to this work.*

Received: *23 July 2015* Accepted: *12 October 2015* Published: *04 November 2015*

#### Citation:

*Janitzky K, Lippert MT, Engelhorn A, Tegtmeier J, Goldschmidt J, Heinze H-J and Ohl FW (2015) Optogenetic silencing of locus coeruleus activity in mice impairs cognitive flexibility in an attentional set-shifting task. Front. Behav. Neurosci. 9:286. doi: 10.3389/fnbeh.2015.00286*

Keywords: optogenetics, halorhodopsin, B6.Cg-Tg(Th-cre)1Tmd/J hemizygous mice, locus coeruleus, attentional set-shifting task, cognitive flexibility, extra-dimensional set-shifting, prefrontal cortex

# INTRODUCTION

The Attentional Set Shifting Task (ASST), an animal analog of the ID/ED task, was designed to dissociate two forms of frontocortically based behavioral flexibility in rodents: reversal learning and set shifting (Birrell and Brown, 2000). The ASST requires animals to initially learn a rule and form an "attentional set" within the same stimulus dimension before an extra-dimensional shift (EDS) is performed. During the EDS, mice have to switch their attention to a previously irrelevant dimension (Kos et al., 2011). Here we used a modified version of the ASST (Bissonette et al., 2008) for reliable set shifting in mice. Since attentional set shifting depends on successful prior formation of an attentional set, the task was designed with internal construct validation (Young et al., 2010) by recording performance as the ratio of EDS trials to trials in the last intra-dimensional shift (IDS). If this ratio exceeds one, sufficient set formation can be assumed (Garner et al., 2006; Bissonette et al., 2008).
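The set-formation check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that "trials" means the number of trials an animal needed to reach criterion in each stage, which the text does not state explicitly.

```python
def set_formation_ratio(eds_trials: int, last_ids_trials: int) -> float:
    """Ratio of EDS trials to trials in the last IDS stage.

    Assumption: both counts are trials needed to reach criterion.
    """
    if last_ids_trials <= 0:
        raise ValueError("last_ids_trials must be positive")
    return eds_trials / last_ids_trials


def set_formed(eds_trials: int, last_ids_trials: int) -> bool:
    """A ratio above one is taken as evidence of sufficient set formation."""
    return set_formation_ratio(eds_trials, last_ids_trials) > 1.0
```

For example, a mouse needing 18 trials in the EDS and 9 trials in the last IDS would give a ratio of 2 and count as having formed an attentional set.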

Regulation of attention and behavioral flexibility are important functions attributed to the networks of the prefrontal cortex (PFC). There is evidence for a distribution of different cognitive functions measured by the ASST within specific regions of the PFC. In rodents for example, reversal learning, when reinforcement contingencies are altered within a single stimulus domain, recruits and engages the orbitofrontal cortex (OFC), whereas attentional set-shifting, in which attention is reallocated to a previously irrelevant perceptual dimension, depends on the medial prefrontal cortex (mPFC; Hamilton and Brigman, 2015).

Prefrontal networks are strongly modulated by catecholamines. In particular, there is a dense noradrenergic innervation of the PFC by the locus coeruleus (LC), a small nucleus in the brainstem that is the sole source of norepinephrine (NE) in the cortex (Aston-Jones and Cohen, 2005). NE in the PFC seems to be required for cognitive flexibility (Milstein et al., 2007; McGaughy et al., 2008; Chamberlain and Robbins, 2013). In situations that require disengaging attention from a previously relevant dimension that has lost its relevance, the PFC is required for rapid and successful adaptation of behavior in the changing environment (Lapiz and Morilak, 2006; Tait et al., 2007).

Lesion studies have shown that noradrenergic deafferentation of the mPFC causes selective impairment of EDS performance (McGaughy et al., 2008), and therefore support a special role for cortical NE in cognitive flexibility when the animal shifts from attending to one perceptual dimension to attending to another (Lapiz and Morilak, 2006; Lapiz et al., 2007; Tait et al., 2007; Desteno and Schmauss, 2008; McGaughy et al., 2008). Since those prior studies were based on irreversible lesion techniques or long-lasting pharmacological manipulations, they were unable to differentiate between effects on the acquisition phase, the memory consolidation phase, or even long-term plastic changes. Considering that small changes of catecholaminergic activity in the PFC profoundly affect cognitive functions, the current study applied optogenetic LC silencing during brief periods of task performance in TH-Cre mice to investigate the effects on learning new strategies independently of memory consolidation. Optogenetic silencing can be restricted to the acquisition phase alone and does not interfere with the functions of the LC during the remaining time. Since the use of TH-Cre mice allows TH-positive neurons of the LC to be targeted in a highly specific manner, and since microbial halorhodopsins enable the inhibition of those neurons on a short time scale with excellent reversibility, optogenetic methods are well suited to studying the role of the LC in complex behavior (Carter et al., 2010; de Lecea et al., 2012). In summary, we used short-term optogenetic LC silencing and studied the effects on cognitive flexibility in the attentional set-shifting task.

# MATERIALS AND METHODS

#### Animals

For the experiments we used naive male B6.Cg-Tg(Th-cre)1Tmd/J hemizygous mice purchased from Jackson Laboratories (n = 14), aged 10 weeks at the date of surgery. These mice express Cre-recombinase under the control of the tyrosine hydroxylase (TH) promoter and therefore in the noradrenergic neurons of the LC. All procedures conformed to European Council guideline 86/609/EEC and were approved by local authorities.

#### Surgery and Optogenetic Transduction

Virus injection and fiber implantation were performed as described previously (Carter et al., 2010). Briefly, mice were anesthetized, and one hole was drilled above the LC (AP 5.5 mm, ML −0.85 mm). A pulled glass pipette was lowered to a depth of 3.7 mm (relative to Bregma) and 500 nl of virus solution (AAV2-Ef1a-DIO-eNpHR 3.0-EYFP, generously provided by Karl Deisseroth through the UNC vector core, 1.5 × 10<sup>12</sup> particles/ml) was injected into the tissue at a rate of 50 nl/min. The virus was allowed to diffuse into the tissue for an additional 10 min after the end of injection, before the pipette was removed. Immediately afterwards, an optic fiber (220 µm, 0.37 NA, Doric Lenses) was implanted 400 µm above the center of the injection. We transduced the LC unilaterally to avoid overly strong effects of bilateral silencing on general arousal. All animals were allowed to recover and express sufficient amounts of opsin for at least 3 weeks before testing.
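The injection parameters given above imply an infusion time and a nominal particle count that can be checked with simple arithmetic. The following Python sketch only restates the reported volume, rate, and titer; the function and constant names are illustrative, and the particle count is a nominal figure that ignores any loss in the pipette or tissue.

```python
VOLUME_NL = 500.0          # injected volume reported in the text
RATE_NL_PER_MIN = 50.0     # infusion rate reported in the text
TITER_PER_ML = 1.5e12      # viral particles per ml reported in the text
NL_PER_ML = 1e6            # unit conversion: 1 ml = 1,000,000 nl


def injection_duration_min(volume_nl: float, rate_nl_per_min: float) -> float:
    """Time needed to infuse the full volume at a constant rate."""
    return volume_nl / rate_nl_per_min


def particles_delivered(volume_nl: float, titer_per_ml: float) -> float:
    """Nominal number of viral particles contained in the injected volume."""
    return volume_nl / NL_PER_ML * titer_per_ml


duration = injection_duration_min(VOLUME_NL, RATE_NL_PER_MIN)   # 10 min
particles = particles_delivered(VOLUME_NL, TITER_PER_ML)        # about 7.5e8
```

With the additional 10 min of diffusion time, the pipette therefore stays in place for roughly 20 min per injection.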

#### Optogenetic Silencing

Before the experiment, mice were connected to the laser through a fiber optic cable (220 µm core diameter, 0.37 NA) and a fiber optic rotary joint (Doric Lenses). To induce optical silencing of noradrenergic cells in the test group, yellow laser light at 589 nm from a DPSS laser (CNI Lasers) was used. Due to the sharp decline in activation of eNpHR 3.0 by red-shifted light (Zhang et al., 2007), it is possible to use spectrally close red laser light (658 nm) in the control group, instead of resorting to non-functional optogenetic constructs. Such an EYFP control would have required the use of an additional viral vector with different expression and membrane integration properties. The output power of the laser was adjusted to yield an illumination intensity of 10–15 mW/mm<sup>2</sup> at the depth of the LC (Yizhar et al., 2011).

#### Open Field Test

To test effects of LC suppression on locomotion, all animals were placed in an open field box (50 × 50 cm<sup>2</sup>, inner sector: 25 × 25 cm<sup>2</sup>) for 9 min. The animals were videotaped and their position was tracked with a custom program written in MATLAB (The MathWorks). The open field test was performed three times during the experimental period. The first test took place 1 week before surgery. The second test was conducted 3 weeks after virus injection in the absence of light stimulation. These first two sessions served as habituation to the open field environment and as a screen for potential detrimental effects of surgery. The third test was conducted 2 days before the ASST. During this session, laser light-induced LC silencing was administered for 1 min each at 2, 5, and 8 min after trial start (589 nm in all mice). Behavior in the third session was analyzed separately for the pre-silencing, silencing, and post-silencing periods, repeated three times each, to screen for potential effects of LC silencing on ongoing behavior or rebound effects after LC silencing. The influence of optogenetic silencing on locomotion was measured by total distance traveled and time spent inactive. As a measure of anxiety, the time spent in the center section of the open field was recorded.
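The three open field measures can be derived from tracked positions along the following lines. This Python sketch is not the authors' MATLAB program: it assumes positions in cm with the origin at one corner of the arena, a fixed frame rate, a centered inner sector, and a hypothetical speed threshold for "inactive", none of which are specified in the text.

```python
import math

ARENA_CM = 50.0                              # side length of the open field
CENTER_CM = 25.0                             # side length of the inner sector
SILENCING_ONSETS_S = (120.0, 300.0, 480.0)   # light on at 2, 5, and 8 min
SILENCING_DURATION_S = 60.0                  # 1 min of light per epoch


def in_center(x: float, y: float) -> bool:
    """True if (x, y) lies in the inner sector (assumed to be centered)."""
    lo = (ARENA_CM - CENTER_CM) / 2.0
    hi = lo + CENTER_CM
    return lo <= x <= hi and lo <= y <= hi


def light_on(t_s: float) -> bool:
    """True if time t_s (seconds from trial start) falls in a silencing epoch."""
    return any(on <= t_s < on + SILENCING_DURATION_S for on in SILENCING_ONSETS_S)


def open_field_metrics(positions, fps, speed_threshold_cm_s=1.0):
    """Total distance, inactive time, and center time from (x, y) samples.

    speed_threshold_cm_s is a hypothetical cutoff below which a frame-to-frame
    step counts as inactivity.
    """
    dt = 1.0 / fps
    distance = 0.0
    inactive = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        distance += step
        if step / dt < speed_threshold_cm_s:
            inactive += dt
    center = sum(dt for x, y in positions if in_center(x, y))
    return {"distance_cm": distance, "inactive_s": inactive, "center_s": center}
```

Restricting `positions` to the frames for which `light_on` is true (or false) would give the same measures separately for silencing and non-silencing periods.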

#### Attentional Set-shifting Task (ASST)

We used a modified version of the attentional set-shifting task (ASST; Bissonette et al., 2008) for reliable set shifting in mice. Since prior studies showed that a single intra-dimensional discrimination was insufficient to form an attentional set in mice, we included additional presentations of the same relevant dimension (7 IDS stages) and an additional reversal stage after the compound discrimination (CDrev) to strengthen the formation of an attentional set. Furthermore, we only tested shifting from the dimension odor to the dimension digging material to ensure better comparability, because prior studies indicated a tendency for better performance when the shift was from material to odor (Kos et al., 2011). Using digging materials of different size and shape, but without a difference in material composition (chemically identical materials in both bowls), helped to avoid discrimination by odor and forced the animal to form an attentional set in the intended dimension.

The odors consisted of commercially available household spices mixed into the digging material (odors and materials used in each trial are tabulated in **Figure 1D**). Digging material always contained a small amount of reward powder to preclude an olfactory-guided search for the reward itself.

The box we used was similar to those used in previous studies (Colacicco et al., 2002). Briefly, the apparatus consisted of three compartments. A waiting compartment (30 × 30 cm<sup>2</sup>) was separated by a Plexiglas door from two choice compartments (15 × 15 cm<sup>2</sup>, **Figure 1A**). Inside each choice compartment there was one digging bowl. One of the bowls was typically baited with a reward (Kellogg's Honey Loops breakfast cereal). During the task, the mouse was allowed to move from the waiting compartment to the choice compartments after the door was opened.

Before the main experiment, mice were habituated to the box over 3 days for two 10-min periods a day, spaced 3 h apart. During this time, the animals were free to explore and inspect the box and the bowls. During this habituation phase, both bowls contained a small piece of reward cereal. On habituation day 1, only the reward was placed in the bowls. On habituation day 2, the reward was placed on top of digging material. On habituation day 3, the reward was covered by digging material. Home-cage-type litter was used as digging material during habituation. Under these conditions, mice quickly learned to dig for the reward. Each session was terminated as soon as the mouse had found both food pieces or after 10 min had elapsed.

FIGURE 1 | Experimental Setup. (A) Size and shape of the ASST box. The mouse was placed in the waiting compartment, the door was removed and the mouse allowed access to choice compartments. (B) Optical Setup. The mouse was connected to a fiber optic cable (FOC) that was connected to a fiber optic rotary joint (FRJ). Light was generated by a PC-controlled laser. (C) Example image of mouse in the box during control light illumination. (D) Table of odors and materials used for the different stages of the ASST: M1 and M2: wood pellets of different size, M3 and M4: aluminum foil pellets of different size, M5 and M6: cat litter of different size, M7 and M8: bark mulch of different size, M9 and M10: silica gel of different size, M11 and M12: plastic pellets of different size, M13 and M14: nuts and bolts, M15 and M16: metal shucks of different size, M17 and M18: gum Arabic of different size. Different olfactory stimuli were offered by various sweet dried herbs: O1, oregano; O2, parsley; O3, marjoram; O4, basil; O5, rosemary; O6, dill; O7, whitethorn; O8, stinging nettle; O9, lemon balm; O10, thyme; O11, ribgrass; O12, chamomile; O13, chives; O14, savory; O15, yarrow; O16, lime; O17, fennel; O18, mint.

During the ASST, only one of the bowls contained the reward. Two stimulus dimensions, digging material (size/shape) and odor, were used to induce set formation. The ASST lasted 12 days. Each day, a different discrimination had to be performed (**Figure 1D**). The ASST started with a simple discrimination (SD), where only one dimension (the relevant dimension, two different odors) was presented in home cage type litter. On the next day, compound discrimination (CD) followed, where the same relevant dimension (odor) had to be discriminated, but two different digging materials were introduced as an irrelevant dimension. On the following day, compound discrimination reversal (CDrev) had to be performed. In this condition, reward contingencies were reversed, i.e., the previously unrewarded odor was now rewarded. Over the following 7 days of the task, intra-dimensional shifts (IDS I–VII) had to be performed. In these stages, different compound stimuli were presented, but the "relevant" dimension remained the same. In the course of these repetitions, mice form an attentional set to attend to the relevant dimension (odor). After the last IDS, an intra-dimensional shift reversal (IDS VIIrev) was introduced, in which the reward contingencies of IDS VII were reversed, i.e., the rewarded and unrewarded odors were exchanged. On the final day, mice had to perform the extra-dimensional shift (EDS), where new types of compound stimuli were presented, but the "relevant" dimension was changed (from odor to digging material). Throughout the ASST, the learning criterion was six consecutive hits. No more than 31 trials per day were conducted, each lasting until the reward had been retrieved, the mouse started to dig in the wrong bowl, or a maximum of 3 min had elapsed.
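The learning criterion described above (six consecutive hits within at most 31 trials per day) can be illustrated with a minimal Python sketch. The function name, the Boolean coding of trial outcomes, and the example session are assumptions made for illustration only; in particular, the sketch assumes that every trial that is not a hit (a dig in the wrong bowl or a 3-min timeout) resets the run of consecutive hits, which the text does not state explicitly.

```python
from typing import Optional, Sequence

CRITERION_RUN = 6   # six consecutive hits (from the text)
MAX_TRIALS = 31     # at most 31 trials per day (from the text)


def trials_to_criterion(outcomes: Sequence[bool],
                        run_length: int = CRITERION_RUN,
                        max_trials: int = MAX_TRIALS) -> Optional[int]:
    """Return the 1-based trial on which the criterion is met, or None.

    `outcomes` holds one Boolean per trial (True = hit). Assumption: any
    non-hit, including a timeout, resets the streak.
    """
    streak = 0
    for trial, hit in enumerate(outcomes[:max_trials], start=1):
        streak = streak + 1 if hit else 0
        if streak == run_length:
            return trial
    return None  # criterion not reached within the daily trial limit


# Hypothetical session: two errors, then six correct choices in a row.
session = [True, False, True, False, True, True, True, True, True, True]
print(trials_to_criterion(session))
```

A return value of `None` marks a day on which the criterion was not reached within the trial limit; how such days entered the analysis is not specified in the text.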

# Optogenetic Silencing During the ASST

The optogenetic approach was based on a previous study (Carter et al., 2010). Briefly, to achieve optogenetic silencing of the LC, we connected the mice to a fiber optic cable (200 µm core diameter, 0.39 NA, Doric Lenses) attached to a fiber optic rotary joint (FRJ\_1x1, Doric Lenses). The joint was connected to a 589 nm yellow laser (CNI) in the test group (n = 7) and a 658 nm laser in the control group (n = 7) (**Figure 1B**). Light application was controlled by a custom program written in LabView. Within each ASST trial, laser illumination was switched on at the start of the trial and switched off when the trial ended or after a period of 1 min, whichever occurred first (**Figure 1C**). Illumination limited to 1 min was chosen over continuous light to avoid opsin desensitization. Twenty seconds after the start of illumination, the sliding door was opened and the mouse was allowed to enter the choice compartments.
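The illumination timing described above can be summarized in a short Python sketch. It is not the LabView program used in the study; the function names are hypothetical, and the sketch assumes that trial duration is measured from light onset.

```python
from typing import Tuple

DOOR_DELAY_S = 20.0   # door opens 20 s after light onset (from the text)
MAX_LIGHT_S = 60.0    # light is switched off after at most 1 min (from the text)


def illumination_window(trial_duration_s: float) -> Tuple[float, float]:
    """Return (light_on, light_off) in seconds relative to trial start.

    Assumption: the trial clock starts at light onset.
    """
    if trial_duration_s < 0:
        raise ValueError("trial duration must be non-negative")
    return 0.0, min(trial_duration_s, MAX_LIGHT_S)


def lit_time_after_door_opening(trial_duration_s: float) -> float:
    """Seconds of illumination that remain once the door has opened."""
    _, light_off = illumination_window(trial_duration_s)
    return max(0.0, light_off - DOOR_DELAY_S)


print(illumination_window(45.0))            # trial shorter than 1 min
print(lit_time_after_door_opening(180.0))   # trial running to the 3-min limit
```

Under these assumptions, at most 40 s of any trial are illuminated after the door opens, regardless of how long the trial lasts.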

# Histology

After completion of the experiments, the mice were perfused and the brains were removed for histology. The brains were sectioned into coronal slices (50 µm) and immunohistochemical analyses were performed. Noradrenergic cells were immunostained with a primary antibody against TH (polyclonal rabbit anti-TH ab112, Abcam). An anti-rabbit secondary antibody labeled with Alexa Fluor 546 was used. The expression of the optogenetic construct and its co-localization with TH-positive neurons were confirmed by fluorescence microscopy (**Figures 2A,B**). Due to the proximity of the LC to the ventricle and often poor tissue preservation, it was difficult to reconstruct the exact fiber position for all animals from histological sections. We therefore resorted to microcomputed tomography (CT) scans, collected before perfusion, to confirm fiber position (**Figure 2C**).

# RESULTS

# Open Field

The influence of optogenetic silencing on locomotion was measured by total distance (pre-silence: 749.9 ± 70.8 cm; silencing: 675.2 ± 36.1 cm; post-silence: 722.4 ± 73.7 cm; p > 0.5, F = 0.36, One-way ANOVA) and time spent inactive (pre-silence: 67.4 ± 2.5%; silencing: 86.8 ± 1.2%; post-silence: 68.9 ± 2.8%; p > 0.5, F = 0.15, One-way ANOVA). As a measure of anxiety we calculated the time spent in the center section of the open field (pre-silence: 8.3 ± 1.5%; silencing: 10.8 ± 2.5%; post-silence: 12.6 ± 2.5%; p = 0.4, F = 0.95, One-way ANOVA, see **Figure 3**). These results show no significant locomotor or anxiety-related effect of silencing. Furthermore, in the post-silencing period, no increased anxiety or locomotion over the pre-silenced period was evident.

# ASST

#### Impaired SD, CDrev and EDS by Acute Unilateral LC Silencing

All mice readily performed the task despite being connected to the optical setup. Shortly after the sliding door was opened, mice entered the choice compartments and began exploring.

A Two-way ANOVA with factors experimental stage and silencing revealed a significant effect of experimental stage and silencing [F(11, 72) = 4.8, p << 0.01; F(1, 72) = 18.7, p << 0.01, interaction F(11, 72) = 1.8, p = 0.07]. For further analyses, we divided the task stages into two groups: one group most strongly relying on cognitive flexibility (SD, CDrev, EDS) and one containing the remaining stages that rely less on cognitive flexibility. In the group requiring cognitive flexibility, more trials to criterion were needed in the treatment group [18.3 ± 1.4 trials (mean ± S.E.M.)] than in the control group (12.2 ± 1 trials). This effect was found to be highly significant (t = 3.6, df = 4, p = 0.01, one-sided unpaired t-test). In contrast, no significant effect of silencing was found in the group of experimental stages requiring less cognitive flexibility (silenced: 12 ± 0.8 trials; control: 10.6 ± 0.7, t = 1.33, df = 16, p > 0.1) (**Figures 4A,B**).

Successful set formation was validated by comparing the ratio of trials needed to reach criterion in the EDS as compared to the IDS VII stages. Both groups showed EDS/IDS VII ratios exceeding unity (control group: 1.44; treatment group: 1.94, p < 0.01 one-sided t-test; **Figure 4C**), indicating successful set formation.
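The ratio used above to validate set formation can be written out as a small Python sketch. The function names and the example values are hypothetical. The text does not state whether the reported group ratios are ratios of group means or means of per-animal ratios, so the sketch shows the per-animal calculation only.

```python
def shift_cost_ratio(eds_trials: float, ids_trials: float) -> float:
    """Ratio of trials to criterion in the EDS to those in the last IDS."""
    if ids_trials <= 0:
        raise ValueError("IDS trials to criterion must be positive")
    return eds_trials / ids_trials


def indicates_set_formation(ratio: float) -> bool:
    """A ratio above unity means the EDS was harder than the last IDS."""
    return ratio > 1.0


# Hypothetical animal: 15 trials to criterion in the EDS, 10 in IDS VII.
ratio = shift_cost_ratio(eds_trials=15, ids_trials=10)
print(ratio, indicates_set_formation(ratio))
```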

During the extra-dimensional set shift (EDS), LC-silenced mice needed significantly more trials to criterion (15.6 ± 2.2, mean ± S.E.M.) than control mice (10.3 ± 1.0; p < 0.05, Welch test). Comparable latencies to dig of LC-silenced and control mice during EDS performance (median in both groups: 32 s, mean ± S.E.M.: silenced 41.8 ± 8.9; control 35 ± 4.8, p = 0.5, Wilcoxon Rank-Sum Test) indicate that this difference was not caused by changes in running speed or response latency.
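For readers unfamiliar with the Welch test, the following Python sketch shows how the test statistic and the Welch–Satterthwaite degrees of freedom follow from group means and standard errors. It is an illustration, not a reproduction of the original analysis: the function name is hypothetical, and the group size of n = 7 per group is an assumption taken from the group sizes given in the methods. The p-value is not computed because that would require the t distribution, which is not available in the Python standard library.

```python
import math
from typing import Tuple


def welch_from_sem(mean_a: float, sem_a: float, n_a: int,
                   mean_b: float, sem_b: float, n_b: int) -> Tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite df from means and S.E.M.s."""
    var_a, var_b = sem_a ** 2, sem_b ** 2   # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# EDS trials to criterion as reported above; n = 7 per group is an assumption.
t, df = welch_from_sem(15.6, 2.2, 7, 10.3, 1.0, 7)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the two groups differ in variance, the resulting degrees of freedom fall below the n<sub>a</sub> + n<sub>b</sub> − 2 of a pooled-variance t-test, which is the reason for preferring the Welch variant in this situation.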

# DISCUSSION

Several studies support the hypothesis that the noradrenergic innervation of the PFC is critical for functions like attention and working memory (Devauges and Sara, 1990; Lapiz and Morilak, 2006; Chamberlain and Robbins, 2013). Furthermore, noradrenergic projections to the PFC might be critical for the ability to rapidly switch attention between stimuli and stimulus categories, leading to cognitive flexibility (Lapiz et al., 2007; Tait et al., 2007; McGaughy et al., 2008). Recently, optogenetic methods have started to allow reversible manipulation of neuronal activity with high genetic and temporal precision. Optogenetic methods are well suited to selectively study the role of a small nucleus like the LC that can be targeted through a Th-Cre mouse line. The temporal fidelity of optogenetics allowed us to restrict silencing to a specific task phase, namely the acquisition phase, in order to investigate the effects of LC silencing in the context of a task that requires learning of new strategies and cognitive flexibility.

Because of the role of the LC in general arousal (Aston-Jones and Bloom, 1981; Lapiz and Morilak, 2006), we conducted initial open field experiments to assess potential side effects of the optogenetic silencing on locomotion. Changes in running speed or anxiety might have nonspecific effects on ASST performance. Our results do not show a significant effect of unilateral silencing on locomotion. This supports previous observations that unilateral suppression of the LC has relatively few nonspecific behavioral effects (Carter et al., 2010; Alsene and Bakshi, 2011). Importantly, after the cessation of silencing, no increase in locomotor activity over the pre-silenced period was evident. This finding suggests that if rebound firing in the LC occurs after light offset, it is likely weak and has little behavioral impact. This absence of generalized side effects makes optogenetic unilateral LC silencing especially suited for the study of cognitive functions in the context of a behavioral task. In contrast, stimulation has much more pronounced, general effects on locomotion that can overlap with task-relevant functions or even lead to a complete behavioral arrest (Carter et al., 2010).

The task used, the ASST, is an animal analog of the ID/ED task and examines attentional set-shifting and reversal learning in mice (Owen et al., 1991; Garner et al., 2006; Bissonette et al., 2008; McGaughy et al., 2008). The ASST allows internal validation of attentional set formation by comparing the performance of animals in the last IDS vs. EDS stages. Poorer EDS vs. IDS performance would suggest that an attentional set to the initial relevant stimulus dimension was formed (Young et al., 2010). Attentional set formation is further indicated by improved performance from IDS I to IDS VII. Our results are in accordance with these requirements; hence, it can be assumed that the mice did indeed form an attentional set in the silenced as well as the control group. Successful set formation in the silenced group demonstrates that LC silencing does not impair general cognitive functions important for learning those parts of the task that are less dependent on cognitive flexibility. Set formation was even stronger in the silenced group as compared to controls, supporting our hypothesis that LC activity is mainly necessary during the acquisition of those parts of the task that require cognitive flexibility. An attentional set is a bias to attend to a particular stimulus dimension as a result of previous experience (Colacicco et al., 2002), and the cost of forming an attentional set is that it limits cognitive flexibility and therefore interferes with the ability to solve new, inconsistent problems. Hence, during the final EDS, when mice had to shift their attention to the previously irrelevant dimension that now predicts reward, they will take longer if the attentional set is more rigid or cognitive flexibility is reduced. Since LC silencing during task performance did not generally impair learning but impaired EDS performance, which requires cognitive flexibility, the "attentional shift cost", expressed as the ratio of EDS to last IDS trials, is greater (Chase et al., 2012).

We found that silencing of the LC impaired performance particularly in those stages of the task that require cognitive flexibility, namely initial learning (SD), first reversal stage (CDrev) and extra-dimensional shift (EDS). During SD, the animal has to build a strategy to complete the task. Hence, a switch of attentional focus is required to establish a correct association between cue and reward. During CDrev, the learned strategy has to be abandoned and a new association has to be formed, and during EDS this association has to be performed even outside of the so far relevant stimulus dimension. This EDS stage, which required the highest cognitive flexibility, was also the most strongly affected stage in LC-silenced mice.

Different prefrontal subregions are required for these different cognitive functions. While reversal learning relies on an intact orbitofrontal cortex (OFC), the EDS requires mPFC-related functions. LC neurons are capable of modulating neuronal activity in each of these prefrontal subregions (Chandler and Waterhouse, 2012). Therefore, our present results are in accordance with lesion studies showing that selective noradrenergic deafferentation of the mPFC impaired only EDS performance (McGaughy et al., 2008). Since LC silencing modulates mPFC- as well as OFC-related functions, the observed interference of silencing with reversal learning and set shifting is not unexpected. As demonstrated in the initial study from which we adapted the optogenetic method, LC silencing by eNpHR 3.0 results in diminished norepinephrine release from LC terminals in the PFC (Carter et al., 2010). It is likely that this noradrenergic deficit is directly responsible for the observed effect on cognitive flexibility.

Interestingly, our results show an impact of LC silencing not only on EDS and CDrev performance, but also on SD performance. The observed impairment in SD learning is in accordance with findings from mPFC-lesioned mice (Bissonette et al., 2008). This finding from Bissonette et al. was one further reason why we included SD in the cognitive flexibility group. Based on reported correlations between SD and EDS performance, a common mechanism might be responsible, for example the reduction of monoaminergic activity within the mPFC, resulting in diminished cognitive flexibility (Colacicco et al., 2002). Our data suggest that LC silencing during task performance selectively interferes with acquisition, and that inhibition of noradrenergic support, in the absence of the compensatory mechanisms that are seen after lesions, induces deficits of initial learning and reversal learning in addition to impaired EDS. Therefore, our results strengthen the hypothesis that NE is recruited under conditions of unexpected uncertainty (Yu and Dayan, 2005; McGaughy et al., 2008). Given the observed effects on SD, EDS and reversal learning, it is likely that LC silencing is capable of modulating neuronal activity in different prefrontal subregions.

As mentioned above, prior studies based on lesions and pharmacological manipulations have already indicated that the noradrenergic LC is important for PFC-dependent functions. So far, irreversible lesion techniques or long-lasting pharmacological manipulations were unable to differentiate between effects on the acquisition phase and effects on memory consolidation. Furthermore, after LC lesion with N-(2-chloroethyl)-N-ethyl-2-bromobenzylamine (DSP-4), functional recovery of LC noradrenergic neurons was reported, based on sprouting of the remaining noradrenergic axons that compensate for the decreased noradrenaline in specific brain regions (Srinivasan and Schmidt, 2004). After DSP-4 lesion, an increased concentration of NE was found in the prefrontal cortex, which indicates a gradual functional recovery (Srinivasan and Schmidt, 2004). A further post-lesion compensatory mechanism is that decreased NE levels in the PFC after LC lesion lead to changes in the adrenergic receptor profile and therefore influence the spontaneous firing rate of mPFC pyramidal neurons (Wang et al., 2010).

The advantage of optogenetic silencing is that it is reversible and therefore no compensatory mechanisms are to be expected. Furthermore, it can be restricted to a particular phase alone and does not interfere with the functions of the LC during other times.

The observed interference of LC silencing with the acquisition phase of the task is in accordance with prior studies investigating reversible bilateral functional inactivation of the LC by means of stereotaxic local microinjection of lidocaine. Khakpour-Taleghani and colleagues also showed significantly impaired acquisition of spatial memory, but no effect on consolidation and/or retention, in the Morris water maze task (Khakpour-Taleghani et al., 2009). Therefore, our results support the hypothesis that the noradrenergic system of the LC may play a more important role in acquisition than in consolidation and retrieval of memory.

Making use of the ASST, our study helps to define more precisely the specific role of the LC in frontal cortex dependent functions. From the results of this study, it is not possible to conclusively attribute the observed effects to a particular frontal region. Based on the variety of projections, it seems likely that the LC modulates larger parts of the frontal network. A study testing local silencing of LC terminals in various frontal regions could therefore provide more detailed information about the regions that are affected and the circumstances under which they are recruited. In summary, our study demonstrates a specific influence of LC function on the acquisition phase of ASST stages that strongly rely on cognitive flexibility. Based on the anatomical projection pattern of the LC, this influence likely originates in the diverse network of frontal cortex and its noradrenergic LC projections.


## ACKNOWLEDGMENTS

This work was supported by the Deutsche Forschungsgemeinschaft (DFG). We thank Cathleen Knape for expert technical assistance.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Janitzky, Lippert, Engelhorn, Tegtmeier, Goldschmidt, Heinze and Ohl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Acute physical exercise improves shifting in adolescents at school: evidence for a dopaminergic contribution

Timo Berse<sup>1</sup>\*, Kathrin Rolfes<sup>2</sup>, Jonathan Barenberg<sup>1</sup>, Stephan Dutke<sup>1</sup>, Gregor Kuhlenbäumer<sup>3</sup>, Klaus Völker<sup>2</sup>, Bernward Winter<sup>4</sup>, Michael Wittig<sup>5</sup> and Stefan Knecht<sup>6</sup>

<sup>1</sup> Psychology of Learning in Education and Instruction Group, Institute for Psychology in Education, Department of Psychology, University of Münster, Münster, Germany, <sup>2</sup> Performance Physiology Group, Institute for Sports Medicine, Department of Medicine, University of Münster, Münster, Germany, <sup>3</sup> Molecular Neurobiology Group, Department of Neurology, University of Kiel, Kiel, Germany, <sup>4</sup> Department of Social Sciences, Catholic University of Applied Sciences North Rhine-Westphalia, Münster, Germany, <sup>5</sup> Genetics and Bioinformatics, Institute of Clinical Molecular Biology, University of Kiel, Kiel, Germany, <sup>6</sup> Preventive and Rehabilitative Neurology at Mauritius Hospital and Institute of Clinical Neuroscience and Medical Psychology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

Giselle Petzinger, University of Southern California, USA

Nelson Kenneth Totah, Max Planck Institute for Biological Cybernetics, Germany

#### \*Correspondence:

Timo Berse, Psychology of Learning in Education and Instruction Group, Institute for Psychology in Education, University of Münster, Fliednerstr. 21, 48149 Münster, Germany timo.berse@psy.uni-muenster.de

> Received: 06 May 2015 Accepted: 10 July 2015 Published: 28 July 2015

#### Citation:

Berse T, Rolfes K, Barenberg J, Dutke S, Kuhlenbäumer G, Völker K, Winter B, Wittig M and Knecht S (2015) Acute physical exercise improves shifting in adolescents at school: evidence for a dopaminergic contribution. Front. Behav. Neurosci. 9:196. doi: 10.3389/fnbeh.2015.00196

The executive function of shifting between mental sets demands cognitive flexibility. Based on evidence that physical exercise fosters cognition, we tested whether acute physical exercise can improve shifting in an unselected sample of adolescents. Genetic polymorphisms were analyzed to gain more insight into possibly contributing neurophysiological processes. We examined 297 students aged between 13 and 17 years in their schools. Physical exercise was manipulated by an intense incremental exercise condition using bicycle ergometers and a control condition which involved watching an infotainment cartoon while sitting calmly. The order of conditions was counterbalanced between participants. Shifting was assessed by a switching task after both conditions. Acute intense physical exercise significantly improved shifting as indicated by reduced switch costs. Exercise-induced performance gains in switch costs were predicted by a single nucleotide polymorphism (SNP) targeting the Dopamine Transporter (DAT1/SLC6A3) gene, suggesting that the brain dopamine system contributed to the effect. The results demonstrate the potential of acute physical exercise to improve cognitive flexibility in adolescents. The field conditions of the present approach suggest applications in schools.

Keywords: physical exercise, acute intense exercise, executive functions, shifting, cognitive flexibility, adolescents, dopamine, gene

# Introduction

Physical exercise is not only capable of fostering physical health but also of improving brain functioning and cognitive processes (Hillman et al., 2008). Among cognitive processes, the executive functions of working memory (cf. Baddeley, 1996) were shown to especially benefit from physical exercise in an early meta-analysis (Colcombe and Kramer, 2003). Colcombe and Kramer reviewed studies with healthy older adults engaging in aerobic fitness training. In a more recent review, Barenberg et al. (2011) also included studies on younger participants and distinguished chronic exercise and acute exercise on the intervention side. Chronic exercise studies implement activity programs involving repeated exercise sessions, whereas acute exercise studies apply only a single exercise session. On the side of the executive functions, Barenberg et al. (2011) distinguished tasks demanding inhibition (of prepotent responses), shifting (between mental sets), and updating (of working memory content). The latter differentiation was adopted from the latent variable analysis by Miyake et al. (2000), who had shown that inhibition, shifting, and updating are moderately correlated but clearly separable executive functions. Barenberg et al. (2011) revealed that acute exercise can be beneficial for executive functioning measured after exercise. However, consistent positive effects emerged only for inhibition performance. Updating had not been studied at that point, and shifting was shown to be unrelated to acute exercise in four studies. However, the generalizability of the latter studies may be questioned. Two of them investigated healthy adults in small samples with N < 19 (Tomporowski and Ganio, 2006; Coles and Tomporowski, 2008), one studied depressive adults (Kubesch et al., 2003) and one examined overweight children (Tomporowski et al., 2008). In contrast, two more recent studies demonstrated acute exercise effects on shifting in young adults (Berse et al., 2014; Barenberg et al., 2015). Thus, the evidence of acute exercise effects on shifting performance is still inconclusive. Moreover, no study has examined this question in adolescents so far (Verburgh et al., 2014).

Behavioral neuroscience substantially adds to the field, because it provides methods to gain insight into possibly contributing processes on the neurophysiological level. So far, acute physical exercise was shown to stimulate a variety of physiological processes. It is supposed to elevate the monoamines norepinephrine, serotonin (Meeusen and Piacentini, 2001), epinephrine and dopamine (Winter et al., 2007), neurotrophic factors (Ferris et al., 2007; Rasmussen et al., 2009), and blood oxygenation (Ekkekakis, 2009; Rooks et al., 2010). Executive functions were shown to rely on a prefrontal cortex-basal ganglia network balancing cognitive flexibility and cognitive stability. The network is modulated by dopaminergic signaling. Baseline dopamine functioning seems to vary among individuals, and differential responses to dopaminergic manipulations of the system were observed (Hazy et al., 2006; O'Reilly and Frank, 2006; Cools and D'Esposito, 2011). Combining the above-mentioned neurophysiological findings reveals the fronto-striatal dopamine system underlying cognitive flexibility to be the most likely candidate to mediate between acute physical exercise and shifting. One study investigated the possible dopaminergic contribution to exercise effects on shifting in humans. Stroth et al. (2011) implemented a genetic marker of central dopamine availability in their study and found evidence for an involvement of dopamine in chronic exercise effects on shifting. Comparable approaches in acute exercise studies are missing.

Concerning further potential mediators, norepinephrine might also play a role. It seems to regulate arousal and cortical activity across different executive functions (Logue and Gould, 2014). This might argue for a nonspecific involvement in acute exercise effects on shifting. Other nonspecific effects could also be due to blood oxygenation. Epinephrine predicted inhibition performance during exercise in a study by McMorris et al. (2009). It is questionable, however, whether epinephrine also contributes to shifting after exercise. The nerve growth factor brain-derived neurotrophic factor (BDNF) and the serotonin system have not been shown to account for shifting demands so far (Alfimova et al., 2012; Logue and Gould, 2014).

Against this background, the present study addressed the question of whether acute physical exercise fosters shifting in adolescents. We hypothesized that exercise improves shifting performance in 13- to 17-year-old adolescents, so that participants shift more efficiently between task sets after exercise compared to a control condition. It was expected that the dopamine system contributed to the effect. This was tested by the analysis of genetic polymorphisms. Genetic polymorphisms reflect individual differences in baseline neurophysiological functioning. If the fronto-striatal dopamine system contributed to acute physical exercise effects on shifting, the corresponding dopaminergic polymorphisms should predict differential effectiveness of the manipulation. Moreover, we explored the predictive value of further non-genetic individual difference variables.

# Materials and Methods

#### Participants

A total of 297 adolescents (158 male) took part in the study. Their mean age was 14.8 years (SD = 0.9; range = 13–17). In accordance with the Declaration of Helsinki, written informed consent was obtained from all students and their parents. Participation was voluntary and could be discontinued at any time during the experiment. The procedure had been approved by the ethical committee of the German Psychological Society. Twenty-nine participants failed to complete the study, mostly due to illness in at least one of the experimental sessions or due to concerns about sports capability.

#### Materials and Variables

#### Health Screening

Sports physicians conducted a brief anamnesis and examination to ensure that participants were capable of attending intense physical exercise. They also assessed body mass index and waist-to-hip ratio. A self-developed health questionnaire was given to collect information about consumption of caffeine, cigarettes, alcohol, other drugs, and medication affecting the central nervous system. Furthermore, participants indicated their habitual physical activity on a self-developed five-point scale.

#### Fitness Test

Participants underwent a field test running protocol on a marked track in the gym to assess individual fitness levels. They started running at a speed of 8 km/h. Every 3 min, speed was increased by 2 km/h until exhaustion. Running speed was controlled by an acoustic signal indicating when to reach the next mark. Medically trained personnel took capillary blood samples from the ear lobes at the beginning of the test, after each speed level and after 3 and 5 min of recovery to determine blood lactate concentrations. Blood lactate analysis was done using a photometric method and a commercially available kit (EKF Diagnostic, Magdeburg, Germany). While capillary blood samples were collected, participants rated their perceived exertion on the Borg-Scale (Borg, 1970) ranging from 6 (no exhaustion at all) to 20 (complete exhaustion). Heart rate was continuously measured by a chest strap sensor (T31 coded system, Polar Electro, Germany). Individual speed at the anaerobic threshold represents a fitness indicator, which was employed as an individual difference variable and to check the intensity of our exercise manipulation.
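The incremental running protocol described above (start at 8 km/h, plus 2 km/h every 3 min) amounts to a simple step function, sketched below in Python. The function name is hypothetical, and the sketch assumes that each speed level lasts exactly 3 min with no pause for blood sampling, which the text does not specify.

```python
START_SPEED_KMH = 8.0      # initial running speed (from the text)
STEP_KMH = 2.0             # increment per level (from the text)
STEP_DURATION_MIN = 3.0    # duration of each level (from the text)


def target_speed(elapsed_min: float) -> float:
    """Target running speed in km/h after `elapsed_min` minutes of the test.

    Assumption: levels follow each other without breaks.
    """
    if elapsed_min < 0:
        raise ValueError("elapsed time must be non-negative")
    completed_levels = int(elapsed_min // STEP_DURATION_MIN)
    return START_SPEED_KMH + STEP_KMH * completed_levels


for minute in (0.0, 2.9, 3.0, 7.5):
    print(minute, target_speed(minute))
```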

#### Physical Exercise

Physical exercise was manipulated in two conditions, a control condition and an acute intense exercise condition. In the control condition, participants were instructed to watch an infotainment cartoon episode on human body functions on the computer screen while sitting in a relaxed position. The condition was designed to resemble an educational setting with low cognitive and physical load.

The intense exercise condition was chosen according to findings in recent reviews on the exercise-cognition relation. Chang et al. (2012) illustrated that the highest exercise intensities were most effective when cognitive performance was measured after a delay following exercise. Furthermore, Lambourne and Tomporowski (2010) showed that cycling interventions yielded larger effects on cognition than running protocols. So, participants performed two bouts of intense exercise on a bicycle ergometer (Ergo Bike Premium 8i, Daum Electronics, Germany). The intervention was based on an interval protocol described by Meyer et al. (1997). Participants aimed at cycling at a speed of 70 revolutions per minute (rpm) with a corridor of 60–80 rpm denoted acceptable. Pedaling resistance was raised by 25 Watt every 10 s. The procedure started with a 3 min warm up at 25 Watt. Then, the two bouts followed with a 3 min recovery at 25 Watt in between. Exercise bouts were terminated and recovery started when participants indicated exhaustion or cycling speed dropped below 60 rpm. The protocol ended with another 3 min recovery phase at 25 Watt.

The duration of treatments was comparable between the experimental conditions, with the exercise condition lasting approximately 10–14 min depending on individual performance. Physical activation was controlled by assessing blood lactate concentrations before and after both interventions. Furthermore, heart rate was continuously measured. Maximum heart rate at the end of the second exercise bout served as an indicator of willingness for exertion.

#### Shifting

We applied a switching task requiring predictable shifts between mental sets in alternating runs. Our task employing nonverbal stimuli (Baadte and Dutke, 2013) was a modification of the number-letter task (Rogers and Monsell, 1995), which proved to be a good indicator of shifting (Miyake et al., 2000). In the switching task, stimuli appear in a predictable clockwise sequence in the four corners of a computer screen. Presentation starts in the top left corner, continues in the top right corner and so on. Stimuli vary in two dimensions, color (blue or yellow) and shape (circle or triangle) in our modification of the Rogers and Monsell task. Participants use the same two buttons to decide on these stimuli. If a stimulus appears in the top half of the computer screen (top left corner or top right corner) participants indicate the shape with buttons representing circle and triangle. If a stimulus appears in the bottom half of the screen (bottom right corner or bottom left corner) they indicate the color with the same buttons now representing blue and yellow. The switching task encompasses a predictable switch between task sets in every second trial. All in all, in half of the trials a switch is demanded (top left and bottom right) and in the other half a given set is to be maintained (top right and bottom left). Switch trials differ from no-switch trials in that they require executive resources to a greater extent. Typically, this leads to higher latencies in switch trials compared to no-switch trials (Monsell, 2003). Switch costs are calculated as the speed and accuracy differences, respectively, between switch trials and no-switch trials. Participants completed 96 trials in six blocks (16 trials per block), with the first block serving as a practice block. Inter-stimulus interval was 1200 ms, maximum response time was 5000 ms.
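The trial structure and the switch cost calculation described above can be sketched in Python as follows. The sketch is illustrative: the names, the trial tuples and the example data are hypothetical, and restricting the latency measure to correct trials is an assumption that the text does not state. Following the text, the top left and bottom right positions are treated as switch trials.

```python
from statistics import mean
from typing import Iterable, Tuple

# Clockwise presentation order, starting in the top left corner (from the text).
POSITIONS = ("top_left", "top_right", "bottom_right", "bottom_left")
SWITCH_POSITIONS = {"top_left", "bottom_right"}   # first trial after a task change


def trial_type(trial_index: int) -> str:
    """Classify a 0-based trial index as 'switch' or 'no-switch'."""
    position = POSITIONS[trial_index % len(POSITIONS)]
    return "switch" if position in SWITCH_POSITIONS else "no-switch"


def switch_costs(trials: Iterable[Tuple[int, float, bool]]) -> Tuple[float, float]:
    """Return (latency cost in ms, error-rate cost).

    Each trial is (trial_index, rt_ms, correct). Assumption: latencies are
    averaged over correct trials only.
    """
    rts = {"switch": [], "no-switch": []}
    errors = {"switch": [], "no-switch": []}
    for index, rt_ms, correct in trials:
        kind = trial_type(index)
        errors[kind].append(0 if correct else 1)
        if correct:
            rts[kind].append(rt_ms)
    rt_cost = mean(rts["switch"]) - mean(rts["no-switch"])
    error_cost = mean(errors["switch"]) - mean(errors["no-switch"])
    return rt_cost, error_cost


# Hypothetical data for eight trials; trial 2 was answered incorrectly.
example = [(0, 900, True), (1, 700, True), (2, 950, False), (3, 720, True),
           (4, 880, True), (5, 690, True), (6, 920, True), (7, 710, True)]
rt_cost, error_cost = switch_costs(example)
print(rt_cost, error_cost)
```

Positive values on both measures indicate that switch trials were slower and more error-prone than no-switch trials; smaller values after exercise than after the control condition would correspond to the reduced switch costs examined in this study.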

#### Genetic Polymorphisms

All genotypes were determined using genomic deoxyribonucleic acid obtained from white blood cells using standard procedures. Variable number tandem repeats (VNTRs) were genotyped using polymerase chain reaction amplification and manual scoring of the repeat numbers on high resolution agarose gel images. Single nucleotide polymorphisms (SNPs) were analyzed using a custom assay on a Sequenom mass array platform. Potentially relevant SNPs and VNTRs were analyzed. We included SNPs and VNTRs targeting the dopamine system (SNPs: rs6277, rs1800497, rs2283265, rs6280, rs936461, rs46000, rs37020, rs27072; VNTRs: 48bp-DRD4, 40bp-DAT1), the glutamate system (SNP: rs7301328), the serotonin system (SNPs: rs1352250, rs4570625), monoamines in general (SNP: rs6323), and BDNF (SNP: rs6265).

#### Mood Valence and Arousal

Participants rated their mood valence and arousal after the physical exercise conditions on the Self-Assessment Manikin (SAM; Lang, 1985), a nonverbal self-report measure of emotional state.

#### Intelligence

Two subtests (verbal retention and verbal processing capacity) of the German Berliner Intelligenzstruktur-Test [Berlin Intelligence Structure Test for gifted adolescents] (BIS-HB; Jäger et al., 2006) were given as measures of intelligence.

#### Attention Deficit Hyperactivity Disorder (ADHD)

Occurrence of ADHD symptoms was self-assessed on the German Selbstbeurteilungsbogen für Aufmerksamkeitsdefizit- /Hyperaktivitätsstörungen [Self-Assessment of Attention Deficit Hyperactivity Disorder] (SBB-ADHS; Döpfner et al., 2008). The questionnaire consists of the subscales attention deficit, hyperactivity, and impulsivity.

#### Design and Procedure

We investigated the influence of the within-subjects factor physical exercise (control, exercise) on performance in the switching task. The order of conditions was counterbalanced between participants. Adolescents were randomly assigned to one of the orders (control-exercise or exercise-control). Except for the fitness test, which was conducted in the gym, the present study was run in a mobile laboratory, which was installed in a rebuilt motor coach. The mobile laboratory ensured a certain standard of experimental control in the field study. Participants were tested in groups of at most 12 students. The procedure encompassed a pre-experimental session, two experimental sessions, and a post-experimental session. Sessions took place at 1-week intervals except for the post-experimental session, which was conducted 3 weeks after the second experimental session. The whole procedure took place during the regular school schedule and starting times were held constant within groups across sessions. Adolescents were instructed to abstain from caffeine and nicotine in preparation for data acquisition.

In the pre-experimental session, participants underwent the health screening and filled in the health and ADHD questionnaires. They also completed the fitness test, and the switching task was explained and practiced in a short version. Experimental sessions 1 and 2 were structured comparably. The treatment (control or exercise) was followed by the switching task (96 trials version). Blood lactate concentrations were measured before treatment (control or exercise), after treatment, and after the switching task. Valence and arousal were assessed after treatment. In the post-experimental session, participants completed the intelligence scales, and a venous blood sample was taken for gene analysis. Finally, participants received a personal fitness test evaluation for their participation and were debriefed.

#### Results

#### Preliminary Analyses Health Screening

We used exclusion criteria similar to those used in the acute intense exercise study by Winter et al. (2007): daily consumption of more than five cups of coffee, more than nine cigarettes, or more than 50 g alcohol. Other predefined critical incidents were intake of other (illegal) drugs or medication affecting the central nervous system during the last month. Descriptive analysis revealed that 14 participants had to be excluded because of critical consumption of cigarettes (n = 2), cannabis (n = 5), methylphenidate (n = 4), and antihistamines (n = 3).

#### Shifting

Error rates were inspected to check for participants' understanding of the task and compliance with the experimental procedure. In accordance with our pilot studies, we observed high accuracy in both no-switch and switch trials (M > 95% correct responses). Extremely high frequencies of errors (higher than three times the interquartile range) were found in 27 participants. They were excluded from further analyses as it was reasonable to conclude that they did not comply with the task. Finally, 227 participants formed the sample for hypothesis testing. **Table 1** shows the descriptive statistics for accuracy and speed data of the switching task.
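One way to read the exclusion rule above is as the conventional "extreme outlier" fence, in which a value is flagged when it exceeds the third quartile by more than three interquartile ranges. The text does not spell out the reference point or the quartile method, so both are assumptions in this sketch, and the function name and data are hypothetical.

```python
from statistics import quantiles


def extreme_error_outliers(error_counts):
    """Return indices of participants whose error count exceeds Q3 + 3 * IQR.

    Assumptions (not stated in the source): the fence is anchored at the
    third quartile, and quartiles use the 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(error_counts, n=4, method="inclusive")
    fence = q3 + 3 * (q3 - q1)
    return [i for i, errors in enumerate(error_counts) if errors > fence]


# Invented error counts for nine participants; only the last one is extreme.
flagged = extreme_error_outliers([1, 2, 2, 3, 3, 4, 4, 5, 40])
```

A different quartile convention or anchoring the criterion at the median would shift the fence, so the number of flagged participants depends on these choices.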

#### Manipulation Check

We inspected participants' blood lactate concentrations before and after treatment (control, exercise) and after the switching task to check whether our manipulation was effective. **Table 2** shows that participants were physically active in the exercise condition and non-active in the control condition. Furthermore, lactate concentrations after treatment were contrasted to the lactate concentrations at the individual anaerobic threshold assessed in the fitness test. The pattern of differences revealed that in the control condition participants were below the lactate concentration measured at their anaerobic thresholds whereas in the exercise condition participants clearly outreached their thresholds. This pattern confirmed that exercise was done at a high intensity.

Speed is given in response latencies in milliseconds, accuracy in percentage of correct answers.

#### TABLE 2 | Blood lactate concentration (N = 227).

Lactate concentration was measured in mmol/l. Contrast was calculated as the lactate concentration after treatment (control, exercise) minus the lactate concentration at the individual anaerobic threshold in the fitness test.

#### Genetic Polymorphisms

Genetic polymorphisms [see Section Genetic Polymorphisms (Materials and variables)] were successfully determined in a subgroup of the final sample (N = 131). Some school classes were unable to attend (parts of) the post-experimental session due to problems in the class schedule. All SNPs were tested for Hardy-Weinberg equilibrium. The SNP targeting monoamines in general (rs6323) was in disequilibrium and excluded from further analysis.
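The Hardy-Weinberg check mentioned for the SNPs is commonly done with a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The source does not say which test was used, so the following is a generic sketch with hypothetical function and argument names, not the authors' procedure.

```python
import math


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    Arguments are observed genotype counts (two homozygotes and the
    heterozygote). Returns (chi2, p_value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B

    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


in_equilibrium = hardy_weinberg_test(25, 50, 25)
no_heterozygotes = hardy_weinberg_test(50, 0, 50)
```

A marker would be treated as being in disequilibrium when the p-value falls below the chosen threshold; for small samples an exact test is often preferred over the chi-square approximation.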

#### Hypothesis Testing

We conducted a 2 × 2 mixed factorial analysis of variance to test the effect of the within-subjects factor physical exercise (control, exercise) on performance in the switching task, controlling for the influence of the between-subjects factor order of conditions (control-exercise, exercise-control). The speed data revealed a significant effect of physical exercise on switch costs, F(1, 225) = 4.09, p = 0.044, η<sub>p</sub><sup>2</sup> = 0.02, with lower costs after the exercise treatment compared to the control treatment (see **Table 1**). We did not observe significant differences with respect to the no-switch trials, F(1, 225) = 0.11, p = 0.737, and switch trials, F(1, 225) = 1.28, p = 0.258. With regard to the accuracy data, the main effect of physical exercise was non-significant for all three measures (all F(1, 225) < 0.1).
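For readers who want to relate the reported F statistic to the effect size, partial eta squared can be recovered from F and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The helper below is an illustrative sketch; the function name is hypothetical, and the source does not state how its effect sizes were computed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of exercise on switch costs: F(1, 225) = 4.09.
effect_size = round(partial_eta_squared(4.09, 1, 225), 2)
```

For F(1, 225) = 4.09 this gives approximately 0.018, which rounds to the 0.02 reported for the switch-cost effect.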

Taking into account the between-subjects factor order of conditions yielded significant interactions of physical exercise (control, exercise) and order of conditions (control-exercise, exercise-control) on all three speed measures: switch costs [F(1, 225) = 5.66, p = 0.018, η<sub>p</sub><sup>2</sup> = 0.03], no-switch trials [F(1, 225) = 23.80, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.10], and switch trials [F(1, 225) = 28.09, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.11]. **Figures 1, 2** depict the pattern of the interaction effects. In the order control-exercise, reaction times were faster after the exercise treatment compared to the control treatment for both no-switch trials and switch trials (see **Figure 1**). In the order exercise-control, a reversed pattern was observed, with lower reaction times following the control treatment. Taken together, it becomes clear that no-switch and switch reaction times were lower in the second experimental session compared to the first experimental session. The numerically highest difference resulted for switch trials in the condition order control-exercise. Consequently, switch costs (see **Figure 2**) were lower after exercise when the exercise treatment was conducted in the second experimental session and the control treatment in the first. In the order exercise-control no significant differences emerged. The accuracy measures showed no significant interaction effects: switch costs [F(1, 225) = 0.74, p = 0.391], no-switch trials [F(1, 225) = 0.98, p = 0.324], and switch trials [F(1, 225) = 3.30, p = 0.071]. No main effect of condition order was observed on any measure, neither speed nor accuracy.

#### Analysis of Genetic Polymorphisms

Potential genetic predictors of the exercise effect were investigated by means of a multiple regression analysis. To obtain a measure for the performance gains after exercise, we subtracted performance after the exercise treatment from performance after the control treatment for the speed measures (no-switch trials, switch trials, and switch costs). The higher the value, the higher the exercise-induced gain. Switch trial gain was significantly higher than no-switch trial gain, t(226) = 2.00, p = 0.047. Consequently, exercise-induced switch cost gain was positive. A stepwise regression procedure was run, and normality of the criterion variables as well as multicollinearity, homoscedasticity, and outliers of the predictors were checked. We did not find significant predictors of the no-switch trial gain. **Table 3** shows the significant predictors for the switch cost gain and **Table 4** displays the significant predictors for the switch trial gain.
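The gain scores described above follow directly from the condition means. The sketch below uses invented latencies and hypothetical key names; it also shows why a larger gain on switch trials than on no-switch trials implies a positive switch cost gain, because the switch cost is the difference between the two trial types.

```python
def exercise_gains(control_rt, exercise_rt):
    """Exercise-induced gains (control minus exercise) for one participant.

    Both arguments map "no_switch" and "switch" to mean latencies in ms.
    Positive values mean faster responses after exercise.
    """
    gains = {
        kind: control_rt[kind] - exercise_rt[kind]
        for kind in ("no_switch", "switch")
    }
    # Switch cost = switch latency minus no-switch latency, so the gain in
    # switch costs equals the switch gain minus the no-switch gain.
    gains["switch_cost"] = gains["switch"] - gains["no_switch"]
    return gains


# Invented example values, not data from the study.
gains = exercise_gains(
    control_rt={"no_switch": 600, "switch": 760},
    exercise_rt={"no_switch": 590, "switch": 720},
)
```

In this example the switch cost falls from 160 ms after the control treatment to 130 ms after exercise, which is the 30 ms switch cost gain.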

Switch cost gain was predicted by the DAT1/SLC6A3 (rs46000) polymorphism, with carriers of the A allele benefitting more than homozygous C carriers. Switch trial gain was predicted by the DAT1/SLC6A3 (rs46000) polymorphism, with A carriers benefitting more than homozygous C carriers, and by the DRD2/ANKK1 (rs1800497) polymorphism, with T carriers benefitting more than homozygous C carriers.

#### Further Analysis

Finally, further individual difference variables were analyzed in a multiple regression analysis predicting performance gains after exercise. Further predictors encompassed sex, age, body mass index, waist-to-hip ratio, self-reported habitual physical activity, valence and arousal after exercise, ADHD symptoms, intelligence, fitness, and willingness for exertion. Only one significant predictor resulted for no-switch benefits. No-switch benefits were predicted by intelligence [b = 2.71, SE = 0.78, b\* = 0.22, t(1, 149) = 2.79, p = 0.006, R<sup>2</sup><sub>adj</sub> = 0.04].

#### Discussion

The present study investigated the impact of acute physical exercise on shifting in 13–17-year-old adolescents. Although there has been evidence that acute exercise has the potential to foster executive functioning across different age groups when tasks are conducted after exercise (e.g., Chang et al., 2012; Verburgh et al., 2014), we identified two open questions. Based on Miyake et al. (2000) and their finding that executive functioning is not a unitary concept but can be differentiated into separable executive functions, it has remained unclear whether exercise specifically enhances shifting performance. Additionally, studies focusing on adolescents were, to our knowledge, lacking.

We expected adolescents to shift more efficiently between task sets after exercise compared to a control condition. Results were in accordance with our hypothesis. We observed significantly lower switch costs after exercise compared to the control condition (see **Table 1**). Although no significant differences were obtained in switch trials and no-switch trials, inspection of means in **Table 1** and the significantly higher exercise-induced switch trial gain compared to the no-switch trial gain suggest that the reduction in switch costs could be mainly traced back to numerically lower reaction times in switch trials after exercise. Switch trials make higher demands on executive shifting performance than no-switch trials. Accordingly, our study demonstrated for the first time that acute exercise can foster shifting performance in adolescents and that this effect was specific for shifting demands rather than for non-shifting task demands, which were controlled for in no-switch trials.

FIGURE 1 | The interaction of physical exercise (control, exercise) and order of conditions (control-exercise, exercise-control) on the speed measures of no-switch trials and switch trials. Post-hoc tests were Bonferroni corrected. \*\*p < 0.01.

Integrating the present experiment into former attempts to demonstrate acute exercise effects on shifting, it becomes clear that studies varied with regard to exercise mode and intensity. Both the present attempt and two recent studies in adults that yielded an effect so far (Berse et al., 2014; Barenberg et al., 2015) employed incremental and finally intense ergometer cycling (with aerobic and anaerobic demands). In contrast, studies which failed to demonstrate this effect (Kubesch et al., 2003; Tomporowski and Ganio, 2006; Coles and Tomporowski, 2008) implemented moderate ergometer cycling. Tomporowski et al. (2008), who also did not find exercise effects on shifting, used a running protocol. In line with recent reviews (Lambourne and Tomporowski, 2010; Chang et al., 2012) the present work suggests that intense (compared to moderate) exercise as well as ergometer cycling (compared to running) is more likely to evoke executive benefits than other exercise interventions. Our findings are also in line with a recent meta-analysis which argued against age-specificity of acute exercise effects on executive functioning in general (Verburgh et al., 2014). Furthermore, the analysis of individual differences in the present study did not show any influence of age on the performance gains after exercise.

#### TABLE 3 | Regression of exercise-induced switch cost gain (N = 131), b = unstandardized regression coefficient, SE = standard error, b\* = standardized regression coefficient.

Dopamine Transporter (DAT1)/SLC6A3 rs46000 is coded 0 = C/C, 1 = C/A or A/A. R<sup>2</sup> for the model is 0.03, adjusted R<sup>2</sup> is 0.03.

#### TABLE 4 | Regression of exercise-induced switch trial gain (N = 131), b = unstandardized regression coefficient, SE = standard error, b\* = standardized regression coefficient.

rs46000 targeting the Dopamine transporter (DAT1) is coded 0 = C/C, 1 = C/A or A/A. rs1800497 targeting Dopamine D2 receptor density (DRD2)/ankyrin repeat and kinase domain containing 1 (ANKK1) is coded 0 = C/C, 1 = C/T or T/T. R<sup>2</sup> for the whole model is 0.08, the corresponding adjusted R<sup>2</sup> is 0.06.

Another aim of the present study was to analyze genetic polymorphisms to gain more insight into possible contributing processes on the neurophysiological level. There is reason to assume that the dopamine system contributes to possible acute exercise effects on shifting, though this has not been explicitly tested so far. In our study, switch cost gains were predicted by a polymorphism targeting the dopamine transporter DAT1/SLC6A3 (rs46000) and switch trial gains were predicted by DAT1/SLC6A3 (rs46000) and a polymorphism targeting dopamine D2 receptor DRD2/ANKK1 (rs1800497). We did not find genetic predictors for changes in no-switch performance. The results support the theory of a dopaminergic contribution to exercise-induced shifting gains. At least, this applies to the DAT1/SLC6A3 polymorphism (rs46000), which predicted the switch cost gains, a measure that reflected improvements in executive rather than non-executive task demands.

The pattern of results matches findings from behavioral neuroscience. The dopamine transporter functions in the brain are region-specific with higher impact on subcortical rather than prefrontal areas. Concerning subcortical areas, the dopamine transporter releases and clears extracellular dopamine in the substantia nigra; in the striatum, its main function is dopamine reuptake (Madras et al., 2005). The striatum is part of the fronto-striatal network described earlier which forms the basis of executive functioning. Two basic behavioral processes are balanced: cognitive stability and cognitive flexibility (Cools and D'Esposito, 2011). Shifting, operationalized in the switching task, demands cognitive flexibility, and the contribution of the striatum and the dopamine system to this process was recently outlined by Klanker et al. (2013). The dopamine transporter gene was associated with resting-state connectivity in the network which in turn predicted executive performance (Gordon et al., 2015). The fronto-striatal network underlying executive functioning was also shown to be sensitive to manipulations. This was demonstrated using dopaminergic agents (Cole et al., 2013; Costa et al., 2014). Even benefits in cognitive training of updating, which demands cognitive flexibility, were shown to be determined by striatal dopamine with training gains related to dopamine transporter polymorphisms (see the review by Bäckman and Nyberg, 2013). Exercise proved to elevate dopamine levels in humans (Winter et al., 2007) and in the striatum of rats (Hattori et al., 1994). As our exercise intervention resembled the one used by Winter et al. (2007) and switch cost gains in the present study were predicted by a dopamine transporter polymorphism, there is reason to conclude that acute, intense physical exercise fostered shifting performance by manipulating dopaminergic neurotransmission.
More specifically, exercise seems to affect the fronto-striatal network by influencing striatal dopamine metabolism. We observed that carriers of the A allele of the DAT1/SLC6A3 polymorphism (rs46000) benefitted more from physical exercise than homozygous C carriers. Direct investigations are necessary at this point to reveal the exact underlying neurophysiological mechanism. We propose, however, tying in with Cools and D'Esposito (2011), that carriers of the A allele possess a suboptimal resting-state connectivity for cognitive flexibility, which conversely implies more suitable connectivity for cognitive stability. Recent results of Cummins et al. (2012) support this idea: the authors found reduced cognitive stability in an inhibition task in C carriers of the DAT1/SLC6A3 polymorphism (rs46000). In summary, resting-state connectivity of A carriers might be optimal for cognitive stability and suboptimal for cognitive flexibility (with C carriers behaving the other way around). Our results suggest that the possible suboptimal resting-state connectivity for cognitive flexibility in carriers of the A allele can be improved by physical exercise.

The DRD2/ANKK1 polymorphism (rs1800497) also exerts influence on dopaminergic signaling by affecting D2 receptor density in the striatum (Pohjalainen et al., 1998; Ritchie and Noble, 2003). It was an additional predictor of exercise-induced switch trial gains in the present study. As the switch trial performance is not adjusted for non-executive task demands, this indicator is a less specific measure for cognitive flexibility. Beyond the executive shifting demands, switch trials (in contrast to switch costs) comprise storage processes and the motor reaction amongst others. The motor reaction is also influenced by dopaminergic signaling and relies on striatal neurons (Kreitzer and Berke, 2011). This is why the DRD2/ANKK1 polymorphism (rs1800497) could not be compellingly related to executive shifting gains following exercise in the present study. However, the existing literature concerning this polymorphism suggests an involvement in cognitive flexibility (e.g., Stelzel et al., 2010; Markett et al., 2011; Wishart et al., 2011).

We did not find evidence for a contribution of the serotonin system, the glutamate system, or BDNF. Thus, our study demonstrated the involvement of the dopamine system in acute exercise effects on shifting rather than the involvement of other neurophysiological mechanisms.

The further analysis revealed only one significant non-genetic predictor. Performance gains in no-switch trials were predicted by verbal intelligence. Results indicated that more intelligent participants improved their no-switch performance after physical exercise to a greater extent than less intelligent participants. This finding was unexpected, and replication and further analysis are desirable.

Regarding limitations of the present account, one might criticize the small effect size of the exercise effect limiting practical relevance. However, the size of the effect is not only influenced by the power of the intervention but also by the choice of the control condition, which is often a matter of debate in exercise experiments. Most often, unspecified resting conditions are implemented with the disadvantage that it is unclear what participants really do beyond being physically inactive. The control condition in our study, however, was specified in that participants were asked to attend to an infotainment cartoon episode resembling an everyday multimedia learning situation. However, this situation might have involved more activity than a resting and/or relaxation situation. Thus, our control condition implies a more conservative test for the exercise condition as it was shown that engaging in cognition activates behavioral and neural resources via dopaminergic networks (Boehler et al., 2011). So, the control condition might have had a beneficial effect on shifting as well and may have diminished the measured effect size. Another limitation might refer to the design. To control for the repeated measurements, the order of conditions was counterbalanced. Unexpectedly, the intervention interacted with the order of conditions. The exercise condition proved to be particularly effective when it was conducted in the second experimental session. This might be due to the lack of a baseline that could demonstrate the effectiveness of physical exercise in the first experimental session. Alternative explanations refer to a differential effectiveness of physical exercise in the time course of repeatedly conducting the switching task (which might reflect a learning process). This question should be examined in future studies as well (cf. Barenberg et al., 2015).

To sum up, the present study found a moderate positive effect of acute intense physical exercise on shifting performance and considerable evidence for a dopaminergic contribution to the effect in contrast to other neurophysiological explanations. From an application perspective, the findings encourage efforts to foster physical activity in adolescents. Even acute physical exercise lasting only 10–14 min proved to foster executive shifting performance, an important skill for academic achievement (Yeniad et al., 2013). The field conditions with the experimental procedure taking place in schools during the normal school schedule assure validity of the results for school-based interventions.

### References


#### Author Contributions

SK, SD, KV, and BW designed and planned the study, KR and TB collected the data. Behavioral data was analyzed by TB and KR, gene data was analyzed by GK and MW. All authors took responsibility for interpreting the data. TB drafted the manuscript. All authors revised and approved the manuscript.

#### Acknowledgments

We thank Albert Fromme, Marianne Lambrecht, Jürgen Stork, and Lothar Thorwesten for supporting the data acquisition. This research was supported by the German Federal Ministry of Education and Research, grant 01GJ0810 to SK, KV, and SD. We acknowledge support by the Open Access Publication Fund of the University of Münster.

#### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbeh.2015.00196

effects of dopamine neuromodulations on resting-state network connectivity. Neuroimage 78, 59–67. doi: 10.1016/j.neuroimage.2013.04.034


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Berse, Rolfes, Barenberg, Dutke, Kuhlenbäumer, Völker, Winter, Wittig and Knecht. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neural correlates of rules and conflict in medial prefrontal cortex during decision and feedback epochs

Gregory B. Bissonette<sup>1,2</sup>\* and Matthew R. Roesch<sup>1,2</sup>

<sup>1</sup> Department of Psychology, University of Maryland, College Park, College Park, MD, USA, <sup>2</sup> Program in Neuroscience and Cognitive Science, University of Maryland, College Park, College Park, MD, USA

The ability to properly adjust behavioral responses to cues in a changing environment is crucial for survival. Activity in the medial prefrontal cortex (mPFC) is thought both to represent rules that guide behavior and to detect and resolve conflicts between rules when contingencies change. However, while lesion and pharmacological studies have supported a crucial role for mPFC in this type of set-shifting, how mPFC represents current rules or detects and resolves conflict between different rules remains unclear. Here, we directly address the role of rat mPFC in shifting rule-based behavioral strategies using a novel behavioral task designed to tease apart neural signatures of rules, conflict and direction. We demonstrate that activity of single neurons in rat mPFC represents distinct rules. Further, we show increased firing on high conflict trials in a separate population of mPFC neurons. Reduced firing in both populations of neurons was associated with poor performance. Moreover, activity in both populations increased and decreased firing during the outcome epoch when reward was and was not delivered on correct and incorrect trials, respectively. In addition, outcome firing was modulated by the current rule and the degree of conflict associated with the previous decision. These results promote a greater understanding of the role that mPFC plays in switching between rules, signaling both rule and conflict to promote improved behavioral performance.

#### Edited by:

Susan J. Sara, Collège de France, France

#### Reviewed by:

Kevin D. Beck, Rutgers, New Jersey Medical School, USA

Phillip Michael Baker, University of Washington, USA

#### \*Correspondence:

Gregory B. Bissonette, Department of Psychology, University of Maryland, College Park, 1120 Biology-Psychology Building, College Park, 20742 Maryland, USA gbissone@umd.edu

Received: 17 June 2015 Accepted: 18 September 2015 Published: 06 October 2015

#### Citation:

Bissonette GB and Roesch MR (2015) Neural correlates of rules and conflict in medial prefrontal cortex during decision and feedback epochs. Front. Behav. Neurosci. 9:266. doi: 10.3389/fnbeh.2015.00266

Keywords: set-shifting, mPFC, rule encoding, conflict, in vivo electrophysiology

### Introduction

The inability to alter behavioral responding in order to adapt behavior to changing situations is a hallmark of many human psychiatric disorders (Gold et al., 2008, 2009; Strauss et al., 2011). Patients who suffer from deficits in flexible behavior are able to learn information and form rules which instruct and guide choices, but lack the ability to alter their choices when contingencies change (Cools et al., 2000; Shamay-Tsoory et al., 2004). Appropriate use of behavior-guiding rules can lead to effective behavioral flexibility, enabling animals to successfully navigate an ever changing world (Harlow, 1949; Roesch et al., 2010). Patients with many disorders, including schizophrenia (Elliott et al., 1995; Pantelis et al., 1999), Parkinson's disease (Gauntlett-Gilbert et al., 1999; Monchi et al., 2004; Dirnberger and Jahanshahi, 2013) or drug addiction (Lyvers and Yakimoff, 2003) struggle with this ability, as studied on the Wisconsin Card Sorting Task (WCST). The WCST requires individuals to discriminate relevant from irrelevant information while sorting cards based on color, shape or number (Nelson, 1976; Prentice et al., 2008). Patients with the aforementioned disorders can, for example, sort cards by shape, and ignore irrelevant features like color and number, but when the sorting rule changes, they struggle to sort by number and ignore color and shape.

The animal literature clearly indicates medial prefrontal cortex (mPFC) is critical for some aspect of attentional set-shifting (Dias et al., 1996a,b; Birrell and Brown, 2000; Colacicco et al., 2002; Bissonette et al., 2008; Roy et al., 2010), a function captured in the shift between sorting parameters during the WCST. Few studies have attempted to record from single neurons in mPFC while animals learned or shifted between rule strategies, and the majority of these studies occur in primates (White and Wise, 1999; Wallis et al., 2001; Bunge et al., 2003; Muhammad et al., 2006; Durstewitz et al., 2010). Yet it is not clear how these representations in mPFC develop, or how they ultimately guide behavior via downstream behavioral circuits.

Interference work has shown us that mPFC plays some role in forming associations between stimuli, responses, and outcomes so that one can learn the contingencies necessary to perform these types of tasks (Boettiger and D'Esposito, 2005; Oliveira et al., 2007). Rat dorsal mPFC (mainly prelimbic cortex) mediates spatial working memory and visual object information, along with cross-modal switching involving spatial location, visual objects and spatial locations with motor responses (Seamans et al., 1995; Kesner et al., 1996; Ragozzino et al., 1998, 1999a,b). Although mPFC lesions impact set-shifting, they do not impair initial learning (Dias et al., 1996a; Birrell and Brown, 2000; Bissonette et al., 2008), suggesting that mPFC is not essential for rule learning, but is critical when rule contingencies change. Such a deficit might reflect a misrepresentation of rules after shifts and/or the inability to detect errors and resolve conflict between competing rules. Consistent with these hypotheses, neurophysiological work in behaving animals has shown us that activity in mPFC encodes expected value, future actions, stimulus-response associations, and is spatially selective (Nieder et al., 2002; Horst and Laubach, 2009, 2012; Narayanan and Laubach, 2009; Balleine and O'Doherty, 2010). Further, neural ensemble firing in mPFC reflects distinct active states during set-shifting, which is temporally related to behavioral performance (Durstewitz et al., 2010; Roy et al., 2010; Antzoulatos and Miller, 2011).

Here we furthered the investigation of mPFC's role in attentional set-shifting by recording in rats as they performed a two-direction set-shifting task during which behavior is guided by odors or spatial cue lights. We found that a subset of mPFC neurons fired more strongly for one rule over another, and that activity in this population of neurons briefly increases early in a rule block, possibly signaling a need for shifting between rule-based strategies. Interestingly, a separate population of neurons represented both the response direction and the conflict inherent in the task, firing more for high conflict, low certainty trials than for low conflict, more certain trials. Finally, we show that all of these neural subtypes multiplex information by encoding both rewarded and non-rewarded outcomes differently. Together, these data suggest that some mPFC neurons encode one rule preferentially over another while other mPFC neurons are more active during high conflict trials during decision and feedback epochs, further supporting a role for increased attention signals in mPFC and showing how these attention signals mediate mPFC rule encoding.

### Materials and Methods

#### Subjects

Five singly housed adult male Long-Evans rats (175–200 g) were obtained from Charles River Labs (Wilmington, Massachusetts) and tested at the University of Maryland, College Park, in accordance with university and National Institutes of Health guidelines and with approval from the University of Maryland, College Park Institutional Animal Care and Use Committee.

#### Set-Shifting Task and Analysis

Rats were required to nosepoke and follow a cue to a well for a fluid reward (10% sucrose). Training rats to nosepoke, wait the required delay periods (500 ms pre-cue, 500 ms cue, 1000 ms pre-fluid delay), and respond to fluid wells for reward took approximately 2 weeks. Rats were then trained to respond to left or right direction lights for reward (approximately 2 weeks of training) and to respond left or right for two distinct olfactory cues (approximately 2 weeks of training). Once rats were proficient at both light and odor responses, they were given a week of training in which odors and lights were presented together, yet on any given day, only one "rule" was to be followed (e.g., on Monday, follow lights and ignore odors; on Tuesday, follow odors and ignore lights). Once rats were performing above an 80% success rate on both a light and an odor rule day, they underwent surgical implantation of recording electrodes. The recording chamber is identical to those used in previous work (Roesch et al., 2007; Roesch and Bryden, 2011). One wall panel has a central odor port (1″ wide) with cue lights (EiKO 20.11 lumen bulbs) placed 3″ on either side of the odor port, such that rats, when fully nosepoking, can still see a light to their left or right. The fluid wells are 1¾″ below the odor port and 1¼″ to both the left and right side of it. The cartoon in **Figures 1A,B**, which also provides trial-type information, gives an approximate representation of the size and position. For photos of the odor panel, please see Roesch et al. (2007) and Roesch and Bryden (2011).

During the set-shifting task, we combined the presentation of both cues so that animals received simultaneous light and odor information (**Figure 1A**). When a house light was illuminated, rats were required to nosepoke and hold for 500 ms in order to receive 500 ms of simultaneous and random odor and direction light pairing. Rats were required to wait the entire 500 ms duration of the cues, then to respond in the direction indicated by one cue dimension (odor or light), but not the other. Once a response was made, rats were required to hold in the fluid well for 1000 ms before the outcome was presented (either sucrose reward, or not). After a rat exited the fluid well after consumption of reward, the house lights went dark and the rats experienced a 5 s inter-trial interval (ITI). In the event of an error, once rats exited the fluid well, an additional 3 s penalty was added to the 5 s ITI.
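The trial timing described above can be summarized as a small configuration object. This is only an illustrative sketch: the class and field names are hypothetical, and only the durations come from the text. Movement and reward-consumption times are not specified in the text, so the helper returns a lower bound on trial duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialTiming:
    """Trial epochs in milliseconds, taken from the task description.

    The class and attribute names are hypothetical; only the values are
    from the text.
    """
    pre_cue_hold_ms: int = 500      # nosepoke hold before cue onset
    cue_ms: int = 500               # simultaneous odor + light presentation
    pre_fluid_delay_ms: int = 1000  # hold in the fluid well before outcome
    iti_ms: int = 5000              # inter-trial interval
    error_penalty_ms: int = 3000    # added to the ITI after an error

    def iti_after(self, correct: bool) -> int:
        """ITI following a trial, including the error penalty if any."""
        return self.iti_ms if correct else self.iti_ms + self.error_penalty_ms

    def minimum_trial_ms(self, correct: bool) -> int:
        """Lower bound on trial length; excludes movement and consumption."""
        return (self.pre_cue_hold_ms + self.cue_ms
                + self.pre_fluid_delay_ms + self.iti_after(correct))
```

Under these values, an error trial lasts at least 3 s longer than a correct trial, which is the only timing difference between the two outcomes described in the text.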

There were two main trial types (**Figure 1B**): trials where both cues indicated the same direction (e.g., right light and right odor), labeled as compatible trials, and incompatible trials, where the cues' directions were in conflict with each other (e.g., right light and left odor). Rats began a day with the correct rule being the same as the final rule from the previous day and were to shift to the other rule in block two. Rules were counterbalanced daily. Performance was monitored by a running average of 20 trials. Once rats reached 80% correct in block one, an additional 40 trials were added onto their total before rules were shifted, to avoid rats anticipating a rule shift and to collect enough post-criterion data from each trial-type to analyze. In addition to collecting trials to criterion, we classified errors as perseverative or regressive: a perseverative error was continued responding to the incorrect rule on incompatible trial-types after the switch and before any correct new-rule trials, while regressive errors were errors on incompatible trial-types after the first correct response in the new rule block but before the rat reached criterion.
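The criterion and error definitions above can be sketched in code. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, and it assumes that the "first correct response" separating perseverative from regressive errors is counted on incompatible trials only, since compatible trials are correct under either rule.

```python
from typing import List, Optional, Tuple


def trials_to_criterion(correct: List[bool], window: int = 20,
                        threshold: float = 0.8) -> Optional[int]:
    """Return the 1-based trial count at which the running average over the
    last `window` trials first reaches `threshold`, or None if it never does."""
    for end in range(window, len(correct) + 1):
        if sum(correct[end - window:end]) / window >= threshold:
            return end
    return None


def classify_shift_errors(trials: List[Tuple[bool, bool]],
                          criterion_trial: Optional[int] = None) -> Tuple[int, int]:
    """Count (perseverative, regressive) errors in the post-shift block.

    Each trial is (is_incompatible, is_correct). Only errors on incompatible
    trials before `criterion_trial` are counted.
    """
    perseverative = regressive = 0
    seen_correct = False  # assumption: set by the first correct incompatible trial
    limit = len(trials) if criterion_trial is None else criterion_trial
    for is_incompatible, is_correct in trials[:limit]:
        if not is_incompatible:
            continue  # compatible trials cannot distinguish the two rules
        if is_correct:
            seen_correct = True
        elif seen_correct:
            regressive += 1
        else:
            perseverative += 1
    return perseverative, regressive
```

Passing the value returned by `trials_to_criterion` as `criterion_trial` restricts the error counts to the pre-criterion portion of the block, matching the definition of regressive errors given above.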

#### Surgical Procedures

All surgical procedures were performed after training on the task. Five rats had a drivable bundle of 10, 25 µm diameter FeNiCr (iron, nickel, chromium) wires chronically implanted in the left or right hemisphere in dorsal mPFC at the top of prelimbic cortex (3.3 mm anterior to bregma, ±0.6 mm laterally, and 3.0 mm ventral to the brain surface) (**Figure 1C**; Bryden et al., 2012; Burton et al., 2013). After testing, rats were transcardially perfused with buffered 4% paraformaldehyde, with brains postfixed at 4 °C. Freezing microtome sections (50 µm) were cut and stained with Thionin. Cannula locations and electrode placements were verified under light microscope and drawn onto plates adapted from the rat brain atlas (Paxinos and Watson, 2004). If electrodes had been implanted into the wrong areas, rats would have been excluded from the study, though none were.

#### Data Acquisition and Analysis

Experiments were performed in a behavioral chamber previously described (Schoenbaum and Roesch, 2005). We performed daily screening of active wires, and advanced the electrode assembly by ∼80 µm per day at the end of the recording session to record from a different neuronal population. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Dallas, TX). Signals from the electrode wires were amplified 20 times by an op-amp headstage (Plexon, HST/8o50-G20-GR), located on the electrode array. Immediately outside the chamber, signals were passed through a differential pre-amplifier (Plexon, PBX2/16sp-r-G50/16fp-G50), where the single unit signals were amplified 50 times and filtered at 150–9000 Hz. The single unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz and amplified at 1–32 times. Waveforms >2.5:1 signal-to-noise were extracted from active channels and recorded to disk. Neurons were sorted using Offline Sorter and Neuroexplorer (Burton et al., 2013), and exported for analysis in Matlab (Bissonette et al., 2013).
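As a worked example of the amplification chain described above, the stage gains multiply, so the overall gain depends on the final programmable stage. The constant and function names below are hypothetical; only the gain values and the 40 kHz digitization rate come from the text.

```python
HEADSTAGE_GAIN = 20          # op-amp headstage on the electrode array
PREAMP_GAIN = 50             # differential pre-amplifier outside the chamber
MAP_GAIN_RANGE = (1, 32)     # final gain range at the acquisition processor
SAMPLING_RATE_HZ = 40_000    # digitization rate


def total_gain(map_gain: float) -> float:
    """Overall amplification for a given final-stage gain setting."""
    low, high = MAP_GAIN_RANGE
    if not low <= map_gain <= high:
        raise ValueError(f"final-stage gain must lie in [{low}, {high}]")
    return HEADSTAGE_GAIN * PREAMP_GAIN * map_gain


def sample_interval_us() -> float:
    """Time between successive samples, in microseconds."""
    return 1_000_000 / SAMPLING_RATE_HZ
```

With these values the total gain ranges from 1,000 to 32,000, and samples are spaced 25 µs apart.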

We used a least-squares multiple regression to determine the number of cells where firing rate was significantly correlated with either the response direction or rule block when variance for the two remaining factors was accounted for. To achieve this, we compared a base model (k = 2; where k = the number of parameters) to a complex model (k = 3) in two separate iterations, where Y = firing rate (spikes/s) during the 500 ms cue epoch, Rule = coded as (−1 = odor rule) (1 = light rule), Direction = coded as (−1 = right) (1 = left), Compatibility = coded as (−1 = compatible) (1 = incompatible).

We began by finding cells for which a single factor model led to significant change in neural activity. These neurons were grouped according to regressor. Following this, we calculated the number of cells that provided a significant improvement of fit (via incremental F-test) when the second parameter in model 2 was added to the single factor model. Counts of correlated cells were compared via chi-square (p < 0.05). When analyzing neural activity during behavior, mean firing rate during the cue epoch for populations of neurons was compared via ANOVA, with post hoc t-tests when appropriate. Data were checked for normal distribution by KS test and chi-square goodness of fit using Matlab functions kstest and chi2gof. Behavior results were analyzed with two-way ANOVA or one-way ANOVA with post hoc t-tests when appropriate, or with planned t-tests. Means and Standard Error of the Mean (SEM) are provided for t-tests as well.
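The nested-model comparison described above can be sketched as follows. This is an illustrative reimplementation in pure Python, not the authors' Matlab code: the function names are hypothetical, and it assumes that the parameter count k includes the intercept (so the base model is an intercept plus one ±1-coded regressor, and the complex model adds a second regressor). The sketch returns only the F statistic; a p-value would come from the F distribution with (1, n − k) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the design is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(y: Sequence[float], regressors: Sequence[Sequence[float]]) -> float:
    """Residual sum of squares of a least-squares fit with an intercept."""
    rows = [[1.0] + [float(reg[i]) for reg in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def incremental_f(y: Sequence[float], base: Sequence[Sequence[float]],
                  added: Sequence[float]) -> float:
    """F statistic for adding one regressor to the base model."""
    n = len(y)
    k_full = len(base) + 2  # intercept + base regressors + added regressor
    rss_base = rss(y, list(base))
    rss_full = rss(y, list(base) + [added])
    return (rss_base - rss_full) / (rss_full / (n - k_full))
```

For example, with firing rates `y` and regressors `rule` and `direction` coded as ±1 per trial, `incremental_f(y, [direction], rule)` asks whether rule explains firing once direction is accounted for, and `incremental_f(y, [rule], direction)` asks the converse.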

#### Results

Rats readily learned to discriminate sensory cues and follow the appropriate rule. In 51 total sessions (out of 86), rats reached criterion in both the first and second rule blocks. **Figure 1C** demonstrates that shifting from rule one to rule two required significantly more trials and was challenging for the rats (t-test, Rule Means 26.3 vs. 60.3, SEM 5.5 and 14.8, respectively, p < 0.001, t = −5.3, df = 50, Cohen's d = −1, effect size = −0.44). When analyzing errors (**Figure 1D**) in behavior during the rule shift, we observed that the majority of errors were of the regressive type, as compared to perseverative (t-test, Means 1.2 and 15.6, SEM 0.5 and 4.8, p < 0.001, t = −6, df = 50, Cohen's d = −1.32, effect size = −0.6; nested t-test, Means 1.4 and 19.8, SEM 0.4 and 7, p < 0.05, t = 2.6, df = 4), suggesting that on trials where the cues presented conflicting information, rats "regressed" to the initial rule, even after having completed correct conflict trials during the second block. Importantly, these errors suggest that rats were attending to both rule dimensions after a shift occurred. To determine if one rule modality was represented more than another, we broke down the rule neurons by their preferred rule modality (odor or light). We found no differences in numbers of neurons representing either light or odor rule (37 preferred light rule, 42 preferred odor rule, 2-sample z-test to compare sample proportions, p = 0.4). In addition, rats showed no behavioral difference whether they started on odor rule and shifted to lights (32.1 trials to switch, SEM 7.9) or started on light rule and shifted to odor rule (36.6 trials to switch, SEM 8.2). Additionally, there was a significant difference in reaction times (time from cue offset to when rats left the odor port) between compatible and incompatible trial-types (**Figure 1E**; t-test, p < 0.001, t = 4.0, df = 50, Cohen's d = 0.6, effect size = 0.3).
Together, these data demonstrate that rats learned the initial rule in block one, and had difficulty switching to the new rule in block two. Additionally, there was an effect of trial-type on reaction time, demonstrating that rats took longer to make choices when rules were in "conflict" with each other.
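The text reports both Cohen's d and an "effect size" for each t-test without stating how the latter was computed. Most of the reported pairs appear consistent with the standard conversion from d to a correlation-type effect size, r = d / √(d² + 4), which assumes equal group sizes. The sketch below illustrates that conversion; treating it as the authors' method is an assumption, and the function name is hypothetical.

```python
import math


def cohens_d_to_r(d: float) -> float:
    """Convert Cohen's d to an effect-size correlation r = d / sqrt(d^2 + 4).

    Assumes equal group sizes. Whether the source used this conversion is
    an assumption made for illustration.
    """
    return d / math.sqrt(d * d + 4.0)
```

For instance, d = −1 gives r of about −0.447 (reported: −0.44), and d = 0.6 gives r of about 0.287 (reported: 0.3).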

We recorded single unit activity from 245 neurons during 51 sessions when rats (n = 5, **Figure 2A**) completed rule shifts. Multiple linear regression analysis of neural firing during the cue epoch allowed us to categorize neurons by activity according to different task features, as shown in **Figure 2B**. Of the 245 neurons, 108 neurons (46%, a significant percentage, χ² = 1168, p < 0.001) were modulated by rule (**Figure 2B**, light blue wedge) or direction. Seventy-nine of those were modulated by rule but not direction or compatibility. Compatibility, that is, whether cues were compatible (i.e., low conflict) or incompatible (high conflict), was reflected in the activity of 28 neurons (**Figure 2B**, yellow wedge), or 11%, all of which were also modulated by direction. In only 1 neuron was activity modulated by direction independent of the rule and compatibility parameters. Thus, the regression analysis divided neurons into two main groups: (1) neurons whose activity reflected the current rule; (2) neurons whose activity reflected response direction and compatibility of the cues. Though different numbers of neurons were recorded from each rat (Rat 1: 53, Rat 2: 45, Rat 3: 32, Rat 4: 50, Rat 5: 65), the proportional representation of each neural subgroup as identified in the regression analysis was equivalent. χ² analysis of the proportion of neurons observed in each subgroup revealed no significant differences in the representation of each animal's data (χ² = 60, p = 0.3) (Rule cells: Rat 1: 25%, Rat 2: 40%, Rat 3: 28%, Rat 4: 24%, Rat 5: 42%; Conflict cells: Rat 1: 8%, Rat 2: 9%, Rat 3: 9%, Rat 4: 18%, Rat 5: 13%).
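The comparison of light-preferring and odor-preferring rule neurons reported above (37 vs. 42, two-sample z-test, p = 0.4) can be illustrated with a pooled two-proportion z-test. This sketch assumes each count is expressed as a proportion of the 79 rule-only neurons and that a pooled standard error and two-sided p-value were used; the text does not give these details, and the function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test for proportions.

    Returns (z, two-sided p-value) under the normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided
```

Calling `two_proportion_z(37, 79, 42, 79)` under these assumptions gives a p-value a little above 0.4, in line with the reported value.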

#### Rule Encoding in mPFC

**Figure 3A** plots the neural activity for rule-only neurons for preferred and non-preferred rules, averaged over response direction and compatibility (n = 79). For each neuron, "preferred rule" is defined by the rule that elicited the maximal response before averaging. Thus, it is no surprise when averaging activity over all neurons that activity elicited during the preferred rule (dark blue line) is significantly higher than during its non-preferred rule (dark red line) when presented with the cues (purple shaded epoch, F(1,156) = 5.33, p < 0.05). This result confirms the results of our regression analysis. Interestingly, however, we also see that activity is significantly higher during the pre-cue delay, before light and odor cues were ever presented (yellow shaded epoch, F(1,156) = 4.58, p < 0.05). Post hoc t-tests supported the ANOVA results for both pre-cue and cue epochs (pre-cue: Means 0.25 and 0.17, SEM 0.015 and 0.013, p < 0.001, t = 13.2, df = 78, Cohen's d = 0.7, effect size = 0.32; cue: Means 0.24 and 0.17, SEM 0.015 and 0.013, p < 0.001, t = 13.4, df = 78, Cohen's d = 0.6, effect size = 0.3), with epoch data plotted as inset bar graphs, demonstrating that mPFC rule neurons robustly encoded one rule over another, even in anticipation of the cues.

Next we asked how many neurons were modulated during the trial block. To do this we averaged activity over the first 3 trials and compared that firing to a subsequent 3 trial window that slid in 1 trial increments. We found that 34 (43% of rule cells) were modulated within the trial block. Modulation within a rule block occurred with equal regularity in block one (n = 19) or block two (n = 15). To find if mPFC rule cell encoding was modulated before behavioral changes occurred or if they followed changes to behavior, we found the first trial when the neural activity was significantly different from the start of the preferred block. In addition, we plotted the average trial when those rats reached their behavioral criterion. **Figure 3B** demonstrates that in block one, a subset of rule-only cells significantly modulated activity well before reaching criterion (t-test, p < 0.001, t = −3.8, df = 6, Cohen's d = −2.2, effect size = −0.74). The same is true for rule-only cells observed in block two (t-test, p < 0.001, t = −9.7, df = 5, Cohen's d = −4, effect size = −1.0). These data suggest that a population of rule-only neurons began to represent the correct rule strategy before the animal's behavior adjusted appropriately.
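The sliding-window comparison described above can be sketched as follows. The text does not state which statistical test compared each window with the first three trials, nor whether windows could overlap the baseline, so this sketch takes the comparison as a caller-supplied function and assumes windows begin after the baseline trials. All names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def first_modulated_trial(
    rates: Sequence[float],
    differs: Callable[[Sequence[float], Sequence[float]], bool],
    width: int = 3,
) -> Optional[int]:
    """Return the 1-based first trial of the earliest `width`-trial window
    that `differs` from the baseline (the first `width` trials), or None.

    Windows slide in 1-trial steps and, by assumption, start after the
    baseline trials.
    """
    baseline = rates[:width]
    for start in range(width, len(rates) - width + 1):
        window = rates[start:start + width]
        if differs(baseline, window):
            return start + 1
    return None
```

As a usage example, `differs` could be a placeholder rule such as `lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b)) > 1.0`. That threshold is purely illustrative and stands in for whatever significance test is appropriate.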

To better visualize the time course of rule selectivity within these neurons we plotted the mean firing rate (**Figures 4A–C**) for these rule-modulated neurons early (first two correct trials), middle (next eight correct) and late (last ten correct) while subtracting the baseline activity for each of those trials and plotting activity during preferred rule (black, thick line) and non-preferred rule (gray, thin line). We chose the first two trials as our early time point because in the analysis above, neurons would significantly change firing within 8–16 trials (**Figure 3B**). Thus, by examining the first 2 trials of each of 4 trial types we can examine activity preceding the shift, allowing us to determine when changes in neural signals might develop, both during the pre-cue epoch and the cue epoch. Inset bar graphs represent the firing rate difference for pre-cue and cue epochs at the different stages in a rule shift (+/− SEM). Early in a block, neural activity of rule-modulated neurons was not significantly different during the pre-cue epoch (t-test, Means 0.63 and 0.41, SEM 0.09 and 0.07, p = 0.3, t = 1, df = 33, Cohen's d = 0.1, effect size = 0.1) and was significantly higher during the cue epoch (t-test, p < 0.01, t = 4.6, df = 33, Cohen's d = 0.5, effect size = 0.23) for preferred vs. non-preferred rule blocks. In the "middle" phase (trials 3–10), neural activity for the preferred rule block was significantly elevated compared to the non-preferred rule block during both the pre-cue epoch (t-test, Means 0.5 and 0.4, SEM 0.06 and 0.05, p < 0.01, t = 3.6, df = 33, Cohen's d = 0.41, effect size = 0.2) and the cue epoch (t-test, Means 0.6 and 0.4, SEM 0.08 and 0.76, p < 0.01, t = 2.8, df = 33, Cohen's d = 0.3, effect size = 0.15).
Late in a rule block (last 10 trials), activity for preferred rules was still stronger than non-preferred rules during pre-cue epoch (t-test, Means 0.51 and 0.33, SEM 0.07 and 0.05, p < 0.001, t = 4.0, df = 33, Cohen's d = 0.5, effect size = 0.25) and cue epoch (t-test, Means 0.5 and 0.4, SEM 0.06 and 0.06, p < 0.01, t = 4.2, df = 33, Cohen's d = 0.3, effect size = 0.14).

Interestingly, mPFC rule-only neurons exhibited elevated activity in middle trials, compared to early or late trials. Preferred rule activity during the middle phase was elevated compared to early trials (t-test, Means 0.46 and 0.54, SEM 0.04 and 0.03, p < 0.001, t = 4.5, df = 33, Cohen's d = 0.32, effect size = 0.2) during both pre-cue and cue epochs (t-test, Means 0.46 and 0.58, SEM 0.05 and 0.04, p < 0.05, t = 3.0, df = 33, Cohen's d = 0.4, effect size = 0.2) (black lines, **Figures 4A,B**). This was not true for non-preferred rule activity, which was unmodulated during pre-cue (t-test, Means 0.41 and 0.4, SEM 0.06 and 0.05, p = 0.87, t = 0.15, df = 33, Cohen's d = 0.02, effect size = 0.01) or for cue activity (t-test, Means 0.41 and 0.45, SEM 0.07 and 0.07, p = 0.18, t = 2.7, df = 33, Cohen's d = 0.17, effect size = 0.1) (gray lines, **Figures 4A,B**). Additionally, pre-cue and cue epoch activity were significantly decreased (t-test, Means 0.6 and 0.4, SEM 0.07 and 0.05, p < 0.01, t = 3.0, df = 33, Cohen's d = 0.22, effect size = 0.11) late compared with middle trials in the preferred rule block (black lines, **Figures 4B,C**). The same was true among non-preferred rule activity, which was significantly decreased on late compared to middle (t-test, Means 0.5 and 0.37, SEM 0.06 and 0.05, p < 0.01, t = 2.7, df = 33, Cohen's d = 0.2, effect size = 0.14) trials (gray lines, **Figures 4B,C**). Thus, in addition to a divergence between preferred and non-preferred rules, there appears to be a second signal, in which activity in the preferred rule increases early in the block and wanes with learning. Increases in firing that take several trials to develop are consistent with previous reports demonstrating that it takes several trials for changes in attention to be engaged after reward prediction errors (RPEs) have been detected. These data suggest that mPFC rule neurons may be important for signaling the need to shift and the appropriate rule. Because neural data for rule neurons changed over the course of their preferred rule, we hypothesized that the change in rule signal may be related to improved behavioral performance. To this end, we plotted the average neural firing rate during the cue epoch for preferred and non-preferred rule blocks against behavioral performance (percent correct) for the session. **Figure 4D** demonstrates a significant negative correlation (r = −0.34, p < 0.05) for rule neurons in the preferred rule block, but not in the non-preferred rule block (**Figure 4E**) (r = 0.1, p = 0.8). Together, these data suggest that mPFC rule representations strengthen over the course of the block, as the difference between activity for preferred and non-preferred rules diverges over time.

neuron's preferred rule was in block one or two. Significance of at least p < 0.05 denoted with asterisk <sup>∗</sup>

Since activity of rule-modulated neurons on correct trials changes over a block and was correlated with performance, we hypothesized that failure to properly reflect the current rule might underlie erroneous decisions. To test this hypothesis, we plotted activity on error (**Figure 5A**; thin dashed) and correct (**Figure 5A**; thick solid) trials during both preferred (**Figure 5A**; dark blue) and non-preferred (**Figure 5A**; dark red) rule blocks. Consistent with our hypothesis, we found that activity was diminished on error trials during both the pre-cue (F(3,135) = 9.67, p < 0.001) and the cue epochs (F(3,135) = 6.52, p < 0.001), with post hoc t-tests revealing significant differences between preferred rule and non-preferred rule correct and error activity during pre-cue (Means 0.3 and 0.23, SEM 0.02 and 0.02, p < 0.01, t = 4, df = 33, Cohen's d = 0.6, effect size = 0.3, inset) and cue epoch (Means 0.3 and 0.22, SEM 0.02 and 0.02, p < 0.001, t = 3.4, df = 33, Cohen's d = 0.6, effect size = 0.3), respectively, and lower activity for non-preferred errors than preferred rule errors (Means 0.3 and 0.23, SEM 0.02 and 0.2, p < 0.05, t = 2.4, df = 33, Cohen's d = 0.5, effect size = 0.24; light blue vs. light red bars, inset) during the cue epoch. More importantly, on errors, the difference between preferred rule and non-preferred rule was not significant during either epoch (pre-cue: t-test, Means 0.23 and 0.22, SEM 0.01 and 0.01, p = 0.3, t = 0.9, df = 33, Cohen's d = 0.14, effect size = 0.1, inset; cue: t-test, Means 0.22 and 0.2, SEM 0.02 and 0.02, p = 0.2, t = 1.4, df = 33, Cohen's d = 0.21, effect size = 0.11, inset). These data suggest that when activity in mPFC was low and rule selectivity was reduced, rats incorrectly followed the wrong rule.

Surprisingly, neural activity in this population of neurons was also modulated during the outcome phase. **Figure 5B** shows activity of the same neurons aligned to fluid-well entry, and plots activity through the 1000 ms pre-fluid delay (green bar) to reward delivery (blue bar). Neural activity of rule neurons on both preferred and non-preferred rule blocks was significantly lower after erroneous choices during both the pre-fluid delay (F(3,135) = 4.03, p < 0.01) and outcome epoch (F(3,135) = 7.41, p < 0.001) compared to correct choices and compared to baseline (F(1,135) = 5.33, p < 0.05). Post hoc t-tests identified significant differences between both pre-fluid delay and outcome epochs for correct vs. error trials (preferred rule correct vs. error, Means 0.2 and 0.13, SEM 0.02 and 0.02, p < 0.001, t = 3, df = 33, Cohen's d = 0.5, effect size = 0.24; non-preferred rule, Means 0.21 and 0.12, SEM 0.03 and 0.02, p < 0.001, t = 3.3, df = 33, Cohen's d = 0.65, effect size = 0.31, inset). Significant differences also exist during the outcome epoch for preferred vs. non-preferred rules on correct trials (preferred rule, Means 0.21 and 0.15, SEM 0.3 and 0.2, p < 0.05, t = 4.8, df = 33, Cohen's d = 0.77, effect size = 0.36; non-preferred rule, Means 0.15 and 0.12, SEM 0.02 and 0.02, p < 0.05, t = 3.5, df = 33, Cohen's d = 0.55, effect size = 0.3, inset), but not on error trials (Means 0.11 and 0.09, SEM 0.02 and 0.02, p = 0.8, t = 0.42, df = 33, Cohen's d = 0.12, effect size = 0.04, inset) or during the pre-fluid delay epoch (Means 0.19 and 0.18, SEM 0.023 and 0.02, p = 0.2, t = 0.6, df = 33, Cohen's d = 0.11, effect size = 0.1 and Means 0.13 and 0.01, SEM 0.02 and 0.02, p = 0.9, t = 0.4, df = 33, Cohen's d = 0.04, effect size = 0.1, correct and error trials, respectively, inset bar graph).

It might be argued that activity during both the pre-fluid delay and outcome epochs reflects differences in licking. However, this interpretation does not hold for several reasons. First, we will show that rats do not lick differently during rewards delivered in different rule blocks. Second, licking rapidly increased immediately after entering the fluid well, while neural activity declined. Increasing licking activity was only observed during anticipation of reward delivery, as lick rates slowed once the fluid was delivered. Furthermore, on error trials activity rapidly decreased during the pre-fluid delay even though rats were licking during this period. Finally, on error trials, firing rates dropped below baseline during the outcome epoch. Thus, during both baseline and outcome epochs there was no licking, but activity significantly differed.

These arguments are supported by **Figure 5C**, which plots lick activity, measured by photobeam breaks in the fluid well. There was a significant main effect of correctness (F(3,135) = 45.85, p < 0.001) during the pre-fluid delay epoch. Post hoc t-tests demonstrate that rats begin licking on correct trials before reward delivery equally for both preferred and non-preferred rules (p = 0.5). Rats also lick throughout the pre-fluid delay on error trials, despite receiving feedback that their choice was incorrect (house lights turn off upon wrong fluid-well entry), though significantly less than on correct trials (p < 0.001). The same is true of the outcome epoch, where there was a significant main effect of correctness (F(3,135) = 212, p < 0.001), where error responses elicited significantly decreased neural activity compared to correct responses for preferred or non-preferred rules (p < 0.001, t = 17.8, df = 33, Cohen's d = 4.3, effect size = 0.91). These data support the idea that differences in firing between correct vs. incorrect and preferred vs. non-preferred rules do not reflect differences in licking behavior during the trial feedback. Furthermore, increased licking during the pre-fluid delay on error trials supports the notion that rats, albeit to a lesser degree, still anticipated reward on those trials, suggesting that decreased firing observed with erroneous outcomes might reflect a worse-than-expected outcome.

red) rule blocks for correct (solid) and error (dashed) trials with inset bar graphs showing data from pre-cue and cue epochs for correct trials and error trials (faded blue and red bars). Though neural activity is the same as correct trials at baseline, it is significantly reduced during pre-cue and cue epochs. (B) Aligning the data from (A) to fluid-well entry and plotting pre-fluid delay and reward delivery epochs, we see that neural activity of rule-modulated neurons also reflects trial outcomes, being significantly reduced on error trials and highest during preferred rule correct trials. (C) Licking aligned to fluid-well entry. Reward is delivered 1 s later on correct trials. Licking did not significantly differ between rules. Significance of at least p < 0.05 denoted with asterisk <sup>∗</sup>.

#### Conflict Encoding in mPFC

We observed 29 neurons (12%) that encoded direction in our task, 28 of which also reflected the conflict inherent in the different trial-types. **Figure 6A** illustrates the average neural activity for direction neurons, broken down into preferred direction (thick lines), non-preferred direction (thin lines) and by trial type, with cues in compatible (green) or incompatible directions (yellow). There was a significant effect of trial-type and direction (F(3,115) = 11.04, p < 0.001) on firing during the cue epoch, but not during the pre-cue epoch (F(3,115) = 0.74, p = 0.53). Post hoc t-tests of firing during the cue epoch revealed elevated firing for incompatible trials in both the preferred and non-preferred directions (Means 0.14 and 0.2, SEM 0.01 and 0.01, p < 0.05, t = 2.2, df = 28, Cohen's d = 0.5, effect size = 0.24) compared to compatible trial-types (inset bar plots; **Figure 6A**, yellow vs. green). This effect was true when a correct response was in the neuron's preferred direction (**Figure 6A**, thick yellow vs. thick green and inset, Means 0.14 and 0.19, SEM 0.013 and 0.008, t-test, p < 0.05, t = 2.3, df = 28, Cohen's d = 0.52, effect size = 0.25) and the non-preferred direction (thin yellow vs. thin green and inset, t-test, Means 0.09 and 0.12, SEM 0.01 and 0.012, p < 0.05, t = 2.2, df = 28, Cohen's d = 0.53, effect size = 0.3). To investigate whether activity of direction neurons reflected outcomes, we aligned the data to fluid-well entry (**Figure 6B**), allowing comparison of pre-fluid and outcome epochs. There was no significant difference between directions or compatibility during the pre-fluid delay (F(3,115) = 0.25, p = 0.9) but there was during the outcome epoch (F(3,115) = 4.29, p < 0.01).
Post hoc t-tests revealed a significant difference of activity on preferred direction incompatible trials during the outcome epoch, compared to preferred direction compatible trials (Means 0.2 and 0.14, SEM 0.01 and 0.01, p < 0.001, t = −2.04, df = 27, Cohen's d = −0.5, effect size = −0.24) and compatible non-preferred direction (Means 0.17 and 0.12, SEM 0.02 and 0.01, p < 0.001, t = −4.2, df = 28, Cohen's d = −0.2, effect size = −0.2). There were no differences in how the rats licked (**Figure 6C**) during the pre-fluid epoch (F(3,115) = 1.78, p = 0.2) or the outcome epoch (F(3,115) = 0.53, p = 0.7), suggesting that differences in firing between compatible and incompatible trials cannot merely reflect differences in licking.

FIGURE 6 | Conflict and direction activity in the mPFC. (A) Top panel plots direction neurons based on preferred direction (thick) and non-preferred direction (thin) and by trial-type, compatible (green) and incompatible (orange) with inset bar graphs showing data during both pre-cue and cue epochs for correct and error (faded green and yellow) trials. While direction neurons were more active for one direction over another, activity in these neurons was more active when directional cues were in conflict than when they were compatible with each other. (B) Direction and conflict data demonstrates elevated activity for preferred direction incompatible trials during the outcome epoch, but not pre-fluid epoch. (C) Plots licking data demonstrating no difference in lick rate between trial-types. (D) Collapsing the data from (A) into compatible or incompatible trials, we see that directional neurons fire more for incompatible trial-types than for compatible trial-types when presented with cue information. (E) Activity plotted by compatibility averaged across direction during the pre-fluid and outcome epochs demonstrates a significant elevation of activity on correct incompatible trial-types compared to compatible trial-types but not during the pre-fluid epoch. (F) Plots licking rate by compatibility, and demonstrates no difference in lick rate between compatible or incompatible trial-types in either the pre-fluid or outcome epochs. Significance denoted of at least p < 0.05 with asterisk <sup>∗</sup> .

To further illustrate differences in neural activity for incompatible and compatible trials, we collapsed across directions and plotted neural activity for compatible and incompatible trial-types. **Figure 6D** shows the same data as **Figure 6A**, but collapsed across directions to display compatible vs. incompatible activity. There was no significant difference in neural activity between compatible and incompatible trial-types during the pre-cue epoch (Means 0.12 and 0.11, SEM 0.01 and 0.01, t-test, p = 0.14, t = 1.5, df = 28, Cohen's d = 0.1, effect size = 0.1, inset), and activity was significantly elevated on incompatible trial-types during the cue epoch (Means 0.16 and 0.12, SEM 0.007 and 0.01, t-test, p < 0.01, t = 4.0, df = 28, Cohen's d = 0.44, effect size = 0.22, inset). Plotting compatibility during trial outcomes (**Figure 6E**), we observed that there was no significant difference in neural activity during the pre-fluid delay (Means 0.12 and 0.13, SEM 0.03 and 0.02, t-test, p = 0.2, t = −1.3, df = 28, Cohen's d = −0.14, effect size = −0.1, inset) but there was a significant difference in neural activity during the outcome epoch (Means 0.12 and 0.15, SEM 0.01 and 0.01, t-test, p < 0.01, t = −4.2, df = 27, Cohen's d = −0.21, effect size = −0.1, inset). There was no difference in lick rate (**Figure 6F**) during either the pre-fluid period (Means 4.8 and 4.8, SEM 0.3 and 0.3, t-test, p = 0.8, t = 0.2, df = 27, Cohen's d = 0.02, effect size = 0.01, inset) or the outcome epoch (Means 6.6 and 6.7, SEM 0.3 and 0.3, t-test, p = 0.9, t = 0.9, df = 27, Cohen's d = 0.03, effect size = 0.02, inset).

Increased firing on incompatible trials might be necessary to resolve high conflict when two rules oppose each other. To test this hypothesis we compared firing on error trials to correct responses made in the same response direction (**Figure 7A**). We observed a significant main effect of correctness during the cue epoch (F(3,115) = 4.25, p < 0.01) but not during the pre-cue epoch (F(3,115) = 0.97, p = 0.4). Post hoc t-tests revealed a significant reduction in activity for error trials, compared to correct trial activity, in both the preferred and non-preferred directions (p < 0.05, t = 2.3, df = 27, Cohen's d = 0.52, effect size = 0.25, inset bar graphs) during the cue epoch. Thus, when firing was low, rats tended to make errors.

Neural activity of the same neurons, aligned to fluid well entry, is illustrated in **Figure 7B**. Activity was not significantly different between correct and incorrect trials during the pre-fluid delay (F(3,115) = 2.1, p = 0.1), though it was significantly different during the outcome epoch (F(3,115) = 5, p < 0.01), with post hoc t-tests revealing significant differences between preferred and non-preferred direction correct and error activity during the outcome epoch (Means 0.24 and 0.3, SEM 0.02 and 0.02, p < 0.001, t = 6.9, df = 27, Cohen's d = 1.5, effect size = 0.6 and Means 0.16 and 0.1, SEM 0.02 and 0.01, p < 0.01, t = 3.3, df = 27, Cohen's d = 0.72, effect size = 0.34, respectively, inset). In addition to signaling whether or not reward was delivered, activity during the outcome epoch was significantly higher in the preferred direction (as defined during cue epoch) relative to the non-preferred direction (Means 0.18 and 0.12, SEM 0.02 and 0.02, t-test, inset, p < 0.05, t = 2.1, df = 27, Cohen's d = 0.54, effect size = 0.26). As highlighted above, these differences cannot reflect differences in licking; licking was not different between preferred and non-preferred directions and neural activity during the outcome epoch was reduced relative to the baseline epoch (**Figure 7C**).

#### Discussion

Numerous studies have supported a critical role for the mPFC in managing the shift between rule-based responses (Dias et al., 1996a,b; Birrell and Brown, 2000; Colacicco et al., 2002; Bissonette et al., 2008). These studies have investigated rule shifting across a number of different sensory modalities and response strategies including shifts from cued to egocentric responses in maze tasks (Ragozzino et al., 1999a,b), operant tasks (Floresco et al., 2008; Durstewitz et al., 2010) and shifts between different cued responses (digging tasks) (Birrell and Brown, 2000; Colacicco et al., 2002; Bissonette et al., 2008). Across nearly all of these tasks, rats were able to respond reliably to rule-based response options, but showed impairment when contingencies shifted so that previously reliable response options were no longer predictive of a rewarded outcome. Critically, nearly any perturbation to the prefrontal cortex has elicited these deficits, including lesion studies (Dias et al., 1996b; Birrell and Brown, 2000; Bissonette et al., 2008), pharmacological inactivation (Stefani et al., 2003; Floresco et al., 2008), pharmacological interventions focusing on the role of dopamine (DA; Ragozzino, 2002; Floresco et al., 2006), GluN2B (Stefani and Moghaddam, 2006; Dalton et al., 2011; Brigman et al., 2013; Marquardt et al., 2014) and norepinephrine (Tait et al., 2007), genetic manipulations (Brigman et al., 2013; Bissonette et al., 2014) and, recently, optogenetic manipulations (Cho et al., 2015).

Though there is a wealth of research demonstrating a critical role for mPFC in mediating rule shifting, there is a dearth of recording studies in rats identifying how the prefrontal cortex may be accomplishing this task. Primate studies (White and Wise, 1999; Wallis et al., 2001; Bunge et al., 2003; Muhammad et al., 2006; Cromer et al., 2010; Antzoulatos and Miller, 2011) dominate the field, with only one rat study (Durstewitz et al., 2010) suggesting that rat prefrontal neurons switch encoding as ensembles, together representing a rule. In our task, we were able to counterbalance directional responding evenly across all rule blocks, and are thus able to dissociate neural correlates of direction, conflict and rules. Further, by presenting two separate external cues, we control for the possibility that neurons representing a directional response are active during the epoch once cues are presented, and not an unknown time before. Our data support these previous results, and further demonstrate that individual neurons in the rat mPFC more strongly encode one rule block over another. Additionally, our data suggest that not all neurons are involved in abstract rule encoding. In our task, a subset of mPFC neurons was directionally tuned, such that activity reflected a preferred response direction. Notably, these neurons not only represented the response direction, but activity of these neurons reflected heightened conflict on challenging trial-types (i.e., conflict between two rules). Thus, while activity of some neurons represented the current rule block, a separate population of neurons was more active on incompatible trial-types, when more attentional resources were needed in order to decide which response option was appropriate. All of these electrophysiological comparisons are possible because of the rigorous nature of this behavioral task.

FIGURE 7 | Conflict activity on error trials. (A) Population histogram of conflict neurons on incompatible trial-types (orange) plotted by direction (full color are preferred direction, faded color are non-preferred directions in bar graph insets) showing decreased neural activity on error trials (dashed red or faded red for inset bar graphs) during the cue epoch. (B) Aligning data to fluid-well entry demonstrates a significant reduction in activity during pre-fluid delay and after reward should have been delivered, suggesting these neurons also represent the outcome of the trials. (C) Licking did not significantly differ between directions. Asterisks (<sup>∗</sup>) denote significance of at least p < 0.05.

Neurons in the mPFC which were selective for one rule over another were active not only during cue presentation, but also during the pre-cue epoch. This pattern of activity differed from the conflict-neuron population, where activity could only diverge once the directional information was presented. This pattern of activity suggests that rule neurons may be calling up the preferred rule representation and holding it online prior to the instruction to guide downstream areas to select particular actions.

These data suggest that the critical role for mPFC rule encoding is separate from the role of the conflict-signaling neural populations. Medial PFC rule neurons may signal the shift between rules and quickly pass this information to downstream areas such as the dorsal striatum, which handle the specific behavioral responses. If this is true, neurons in dorsal striatum might also reflect some measure of abstract rule representation, and rule encoding should increase just as mPFC rule signaling decreases. Such a signal would allow mPFC to influence specific action selection in dorsal striatum, and would provide the necessary "shift" signal to striatal neurons. Indeed, two disconnection studies and a recent electrophysiological study support this notion. Functional disconnection of the ventral striatum (nucleus accumbens) from mPFC with bupivacaine in a strategy shifting task led to a significant increase in the number of trials required to shift response strategies in a plus maze task (Block et al., 2007). Recently, disconnection of the prelimbic region of rat mPFC from the dorsomedial striatum was shown to disrupt cue-guided behavioral switching (Baker and Ragozzino, 2014). Importantly, this study demonstrated that contralateral disconnection of prelimbic cortex and dorsomedial striatum did not impact a cued-association task which did not require switching between rules, suggesting an important role in task switching for this circuit.

Another recent study also demonstrated a role for dorsomedial striatum in set-shifting, while identifying robust neural correlates of rule encoding in dorsal striatal neurons. Dorsal striatal direction neurons reflected directional conflict, though interestingly in opposite sign to the results observed in mPFC in this manuscript. Additionally, the directional conflict signal in mDS was resolved as animals improved performance, demonstrating a possible neural mechanism by which rules guiding behavior impact action selection (Bissonette and Roesch, 2015). Recent research has also supported a role for dorsal striatum in mediating more abstract aspects of cognition, potentially as a site for disruption due to psychiatric illness (Ragozzino, 2007; Miller et al., 2015). Perhaps mPFC conflict neuron activity levels reflect the need for attention on especially salient and challenging trial-types, providing an alerting function for other neural regions to dedicate more resources to resolving the trial at hand.

Interestingly, the selectivity of rule encoding neurons took several trials to develop and was stronger earlier in the block compared to later. Riding on top of this signal was a general increase in activity that was strongest early in the rule block, and waned over the trial block. This was true in both the preferred and non-preferred rule blocks, though activity was more robust in the preferred rule block, and improvement in behavioral success was correlated with decreasing neural firing in a neuron's preferred rule block. Changing rule encoding has been observed before (Durstewitz et al., 2010), where prefrontal neural ensembles reorganized prior to rats shifting behavior. We observed the same phenomenon among our rule neurons, which rapidly modulated their neural activity to change between different rule states. In fact, the change in activity levels occurred well before rats reached criteria, and even preceded the 20-trial window in which rats reached behavioral criteria. These data imply that mPFC rule-neurons rapidly modify their activity levels to quickly reflect the changed contingency, and that behavioral adaptation may rely upon downstream areas receiving, recognizing, and storing this modified mPFC activity state.

A separate population of neurons was responsible for encoding not only the response direction, but also for representing the inherent conflict within the task. That is, directionally tuned neurons were more active on incompatible trial-types. These trial-types presumably require additional attentional resources to respond correctly, and increased directional signaling on incompatible trials may reflect the need for additional attentional resources. Such a signal may reflect mPFC's role in attentional tasks, especially the role for preparatory attention (Totah et al., 2012). In this case, attention for specific stimulus-responses may reflect not just the stimulus-response aspects of well-trained rats (Corbit and Balleine, 2003) but also the increased attention necessary to successfully complete more challenging trial-types. It may be the role of a subset of mPFC neurons to report the occurrence of an incongruous trial to downstream areas where potential directional responses may be more rigorously evaluated.

Across the board, neural activity during the decision period was diminished when rats made erroneous choices. Rule encoding in either the preferred or non-preferred rule blocks was diminished on error trials, compared to correct response outcomes. This pattern was also observed for the neurons modulated by the degree of conflict; cue-evoked activity was diminished prior to errors made in both the preferred and non-preferred directions. These data suggest that, when mPFC neurons are not actively engaged during task performance and do not accurately represent the correct rule block, or signal the presence of high conflict, rats were more likely to make an error.

Remarkably, both conflict neurons and rule neurons also were active during outcome epochs. Prefrontal cortical neurons are known to multiplex different forms of information across categories (Rainer et al., 1999; Cromer et al., 2010), and several studies have reported prediction error type responses in mPFC and ACC (Brown and Braver, 2005; Totah et al., 2009; Alexander and Brown, 2011; Bryden et al., 2011; Hayden et al., 2011). Activity during the outcome phase of this task represents a potential feedback mechanism by which mPFC neurons may modulate activity patterns based on outcomes and what preceded them (Laubach et al., 2015). Such signals may arise from and/or inform ventral tegmental area (VTA) DA neurons whose activity also reflects reward prediction errors (RPEs; Schultz and Dickinson, 2000; Waelti et al., 2001; Roesch et al., 2007; Glimcher, 2011). This theory is consistent with anatomy showing that mPFC and VTA are reciprocally connected via mesocortical DA projections from VTA to mPFC (Gabbott et al., 2005; Bjorklund and Dunnett, 2007; Hoover and Vertes, 2007), and cortico-tegmental projections back to VTA (Carr and Sesack, 1996; Vertes, 2004).

Importantly, and unlike DA neurons, activity of mPFC reflects more than simple signed prediction errors. In neurons modulated by conflict and rule during the decision period, we observed higher activity for incompatible over compatible responses, and preferred over non-preferred rules, respectively. Elevated firing during high conflict (incompatible) trials might reflect unexpected reward delivery. That is, on incompatible trials, rats might not expect reward as strongly, thus, when it was delivered, it was surprising and elicited a strong positive prediction error. Although this theory is plausible, it is not consistent with the fact that reward predictions, as measured by anticipatory licking, did not significantly differ between correct compatible and incompatible trials. Unlike conflict signals, neurons that carried information about the current rule during the outcome phase cannot be readily explained in terms of unexpected reward delivery. In addition, rats did not lick differently during odor compared to light rule trial blocks. Taken together, these data suggest that single neurons in mPFC are informing downstream regions what direction was chosen, what rule was being followed and the degree of conflict associated with making that decision. Specifically, these results might be signaled to mDS which has been shown to signal action specific RPEs (Stalnaker et al., 2012).

In conclusion, this report is among the first to demonstrate neural correlates of distinct rules in rodent mPFC. Because of the controlled nature of this task, we were also able to separate direction from conflict encoding. Further, because rats were required to maintain head position at the central nosepoke we can clearly dissociate neural correlates related to rule encoding during the pre-cue and cue epochs as opposed to signals related to body position and already planned actions, which were only encoded after cues were presented. Importantly, these results suggest that neural representations of rules in mPFC are not directionally tuned and are strongest earlier in rule blocks when rules need to be distinguished. Other neurons participate in signaling the correct direction. In addition these neurons fire more strongly on high conflict trials, when the two rules opposed each other. Finally, all these task parameters—rule, conflict, and direction—were reflected after the decision, during the outcome phase of the task. Attentional set-shifting is complicated and requires several cognitive control functions that govern which rule should be followed, how much attention is necessary to shift and override competing responses, and whether or not the response that was just made was correct, difficult, and consistent with current rules. Remarkably, mPFC participated in all of these functions during performance of our set-shifting task.

## Author Contributions

G.B.B. and M.R. conceived the experiments, G.B.B. performed the experiments and analysis, and G.B.B. and M.R. wrote the manuscript.

# Funding

We would like to thank our funding source DA031695 (MRR).


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bissonette and Roesch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Infusion of D1 Dopamine Receptor Agonist into Medial Frontal Cortex Disrupts Neural Correlates of Interval Timing

#### Krystal L. Parker, Rafael N. Ruggiero and Nandakumar S. Narayanan\*

*Neurology, University of Iowa, Iowa City, IA, USA*

Medial frontal cortical (MFC) dopamine is essential for the organization of behavior in time. Our prior work indicates that blocking D1 dopamine receptors (D1DR) attenuates temporal processing and low-frequency oscillations by MFC neuronal networks. Here we investigate the effects of focal infusion of the D1DR agonist SKF82958 into MFC during interval timing. MFC D1DR agonist infusion impaired interval timing performance without changing overall firing rates of MFC neurons. MFC ramping patterns of neuronal activity that reflect temporal processing were attenuated following infusion of MFC D1DR agonist. MFC D1DR agonist infusion also altered MFC field potentials by enhancing delta activity between 1 and 4 Hz and attenuating alpha activity between 8 and 15 Hz. These data support the idea that the influence of D1-dopamine signals on frontal neuronal activity adheres to a U-shaped curve, and that cognition requires optimal levels of dopamine in frontal cortex.

#### Edited by:

*Gregory B. Bissonette, University of Maryland, USA*

#### Reviewed by:

*Min W. Jung, Korea Advanced Institute for Science and Technology, South Korea Bruno B. Averbeck, National Institute of Mental Health, USA*

#### \*Correspondence:

*Nandakumar S. Narayanan nandakumar-narayanan@uiowa.edu*

> Received: *17 August 2015* Accepted: *19 October 2015* Published: *10 November 2015*

#### Citation:

*Parker KL, Ruggiero RN and Narayanan NS (2015) Infusion of D1 Dopamine Receptor Agonist into Medial Frontal Cortex Disrupts Neural Correlates of Interval Timing. Front. Behav. Neurosci. 9:294. doi: 10.3389/fnbeh.2015.00294*

Keywords: medial frontal cortex, dopamine, Parkinson's disease, interval timing, cognition

# INTRODUCTION

Frontal dopamine signaling is crucial for memory and cognition (Goldman-Rakic, 1998). Dysfunction of medial frontal dopamine is involved in diseases such as ADHD, schizophrenia, and Parkinson's disease (Cools et al., 2001; Abi-Dargham et al., 2002; Bellgrove et al., 2005; Narayanan et al., 2013a). Cognition requires optimal levels of frontal dopamine signaling (Cools and D'Esposito, 2011). There are two major classes of dopamine receptors: D1 and D2. Of these, D1 type dopamine receptors (D1DR) have been specifically implicated in cognition (Goldman-Rakic et al., 2004; Kim et al., 2015). Both agonists and antagonists of D1DRs impair neural correlates of cognitive processes such as working memory and attention (Williams and Goldman-Rakic, 1995; Granon et al., 2000; Vijayraghavan et al., 2007).

Here, we study how manipulating D1DRs influences neural activity during timing tasks. Timing is an executive function that requires working memory for temporal rules as well as attention to the passage of time (Parker et al., 2013a; Merchant and de Lafuente, 2014). Timing is well-suited to study human diseases of cognition because it involves common dopamine-dependent frontal cortical mechanisms in humans and rodents (Merchant et al., 2008; Narayanan et al., 2013b; Parker et al., 2013a, 2015; Merchant and de Lafuente, 2014). We focus on the MFC because rodents lack lateral frontal regions and because of MFC's homologies between rodents and humans (Laubach, 2011). Blocking D1DRs in the MFC impairs performance on interval timing tasks, in which subjects initiate a motor response several seconds after an instructional cue (Narayanan et al., 2012). In MFC, there are two prominent neuronal correlates of temporal processing: (1) ramping activity, or neuronal firing that consistently changes over the temporal interval (Durstewitz, 2003; Kim et al., 2013; Parker et al., 2014; Xu et al., 2014) and (2) low-frequency oscillations around ∼4 Hz triggered by the instructional stimuli (Parker et al., 2014, 2015). Blocking D1DRs attenuates both ramping activity and ∼4 Hz oscillations (Parker et al., 2014). It is unclear how these two correlates of medial frontal cortex activity are influenced by MFC D1DR agonists. Previous data demonstrating that D1DR agonists impair cognitive performance and neuronal correlates of cognition in frontal cortex predict that D1DR agonists should attenuate both ramping activity and ∼4 Hz oscillations.

**Abbreviations:** MFC, Medial frontal cortex; D1DR, D1 dopamine receptors; PD, Parkinson's disease.

To test this idea, we recorded neural activity and field potentials from MFC following infusion of SKF82958 (Gilmore et al., 1995), a D1DR agonist, into MFC of animals performing an interval timing task. We report that focal infusion of D1DR agonist attenuates MFC ramping neuronal activity and increases cue-dependent delta activity. These data support the idea that temporal processing by single neurons, like mnemonic processing, requires optimal levels of frontal D1DR signaling.

# MATERIALS AND METHODS

Seven male Long-Evans rats (aged 2 months; 200–225 g) were trained to perform an interval timing task. Animals were motivated by water restriction, while food was available ad libitum. Rats consumed 10–15 mL of water during each behavioral session and additional water (5–10 mL) was provided 1–3 h after each behavioral session in the home cage. Single housing and a 12 h light/dark cycle were used; all experiments took place during the light cycle. Rats were maintained at ∼90% of their free-access body weight during the course of these experiments and received one day of free access to water per week. All procedures were approved by the Animal Care and Use Committee at the University of Iowa.

Rats were trained in interval timing tasks using standard operant procedures described in detail previously (Parker et al., 2014, 2015). First, animals learned to make operant lever presses to receive liquid rewards. After fixed-ratio training, animals were trained in a 12 s fixed-interval timing task in which rewards were delivered for responses after a 12 s interval (**Figure 1A**). Rewarded presses were signaled by a click and an "off" houselight. Each rewarded trial was followed by an intertrial interval of 6–12 s, randomly chosen. Intertrial intervals concluded with an "on" houselight signaling the beginning of the next trial. Early responses occurring before 12 s were not reinforced. The houselight was turned on at trial onset and lasted until the onset of the intertrial interval.

Interval timing performance was evaluated by the timing efficiency, or the number of lever presses occurring in the 11–12 s timed interval in comparison to all lever presses occurring from 1 to 12 s of the trial. Curvature was also used to evaluate performance during the interval timing task. Curvature indexes increase as animals' responses are guided by time and measure the deviation of the cumulative response record from a straight line (Fry et al., 1960; Parker et al., 2015). Curvature of time-response histograms is a robust measure of animals' timing as it is independent of response rate as responses are controlled by time (Caetano and Church, 2009; Parker et al., 2014).
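The two behavioral measures described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and the 1 s bin width are hypothetical, and the curvature index is written in one common form of the index of curvature, I = [(n − 1)R<sub>n</sub> − 2ΣR<sub>i</sub>] / (nR<sub>n</sub>), where R<sub>i</sub> is the cumulative response count at the end of bin i and the sum runs over the first n − 1 bins. Whether the original analysis used exactly this form is an assumption.

```python
def timing_efficiency(press_times, start=1.0, end=12.0, window=1.0):
    """Fraction of presses within [start, end] s that fall in the last `window` s."""
    in_interval = [t for t in press_times if start <= t <= end]
    if not in_interval:
        return float("nan")
    late = [t for t in in_interval if t >= end - window]
    return len(late) / len(in_interval)


def curvature_index(press_times, interval=12.0, n_bins=12):
    """Index of curvature of the cumulative response record.

    Returns 0 for a constant response rate (straight cumulative record) and
    approaches (n_bins - 1) / n_bins when all presses occur in the final bin.
    The bin count is an illustrative assumption.
    """
    width = interval / n_bins
    counts = [0] * n_bins
    for t in press_times:
        if 0 <= t < interval:
            counts[min(int(t // width), n_bins - 1)] += 1
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    if total == 0:
        return float("nan")
    return ((n_bins - 1) * total - 2 * sum(cumulative[:-1])) / (n_bins * total)


# Hypothetical press times (s from trial onset) pooled over trials.
well_timed = [8.2, 9.5, 10.4, 11.1, 11.3, 11.6, 11.8, 11.9]
print(timing_efficiency(well_timed), curvature_index(well_timed))
```

Because the index is normalized by the total response count, doubling every bin count leaves it unchanged, which is the sense in which curvature is independent of overall response rate.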

Rats trained in the 12 s interval timing task were implanted with a microwire electrode array and a 33-gauge infusion cannula (Plastics One) in the MFC according to procedures described previously (Parker et al., 2014). Briefly, animals were anesthetized using ketamine (100 mg/kg) and xylazine (10 mg/kg). A surgical level of anesthesia was maintained with ketamine supplements (10 mg/kg). Under aseptic surgical conditions, the scalp was retracted, and the skull was leveled between bregma and lambda. A single craniotomy was drilled over the area above the MFC and four holes were drilled for skull screws. A microelectrode array was implanted in MFC (coordinates from bregma: AP: +3.2, ML ± 1.2, DV −3.5 @ 12° in the lateral plane). Electrode ground wires were wrapped around the skull screws. The infusion cannula was then lowered to target the neurons being recorded (coordinates from bregma: AP: +0.3, ML ± 1.0, DV −4.6 @ 40° in the lateral plane; targeting bregma coordinates AP +3.2, ML ± 1.0, DV −3.4 in the center of the recording array). The craniotomy was sealed with cyanoacrylate ("SloZap," Pacer Technologies, Rancho Cucamonga, CA) accelerated by "ZipKicker" (Pacer Technologies), and methyl methacrylate (i.e., dental cement; AM Systems, Port Angeles, WA). Following implantation, animals recovered for one week before being reacclimatized to behavioral and recording procedures.

At the beginning of recording experiments, rats received a saline infusion into the MFC ∼45 min prior to neurophysiological recording according to procedures described previously (Parker et al., 2013b, 2014). On the subsequent day rats received an infusion of the D1DR agonist SKF82958 into the MFC. Infusion was conducted by inserting an injector into the guide cannula and 0.5µL of infusion fluid was delivered at a rate of 30µL/h (0.5µL/min) via a syringe infusion pump (KDS Scientific, Holliston, MA). After injections were complete, the injector was left in place for 2 min to allow for diffusion. Statistical comparisons between saline and MFC SKF82958 infusion sessions made no assumption that identical neurons were recorded on subsequent days.

Neuronal ensemble recordings in the MFC were made using a multi-electrode recording system (Plexon, Dallas, TX). Putative single neuronal units were identified on-line using an oscilloscope and audio monitor. The Plexon off-line sorter was used to analyze the signals after the experiments and to remove artifacts. Spike activity was analyzed for all cells that fired at rates above 0.1 Hz. Statistical summaries were based on all recorded neurons. No subpopulations were selected or filtered out of the neuron database. Wide-band signals were recorded using wide-band boards with bandpass filters between 0.07 and 8000 Hz and sampled at 40,000 Hz. Principal Component Analyses (PCA) and waveform shapes were used for spike sorting. Single units were identified as having (1) consistent waveform shape, (2) separable clusters in PCA space, (3) a consistent refractory period of at least 2 ms in interspike interval histograms, and (4) consistent firing rates around behavioral events (as measured by a runs test of firing rates across trials around behavioral events; neurons with |z| scores > 4 were considered "nonstationary" and were excluded). Analysis of neuronal activity and quantitative analysis of basic firing properties were carried out using NeuroExplorer (Nex Technologies, Littleton, MA), and with custom routines for MATLAB. Peri-event rasters and average histograms were constructed around light on, lever release, lever press, and lick. Microwire electrode arrays were comprised of 16 electrodes. In each animal, one electrode without single units was reserved for local referencing, yielding 15 electrodes per rat. Local field potentials (LFPs) were recorded from 4 of these electrodes per rodent. LFP channels were analog filtered between 0.7 and 100 Hz online, sampled at 1000 Hz and recorded in parallel with single unit channels using a wide-band board. 
Consistent with our prior work, although examples of individual neurons are shown under different drug conditions (control and MFC D1DR agonist), our statistical analyses assume that these populations of neurons are independent (Narayanan and Laubach, 2006, 2008; Narayanan et al., 2013b).
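The stationarity criterion in point (4) above can be illustrated with a Wald-Wolfowitz runs test. The sketch below is an assumption about how such a test might be set up, not the authors' routine: per-trial firing rates are dichotomized about their median, runs of consecutive same-side trials are counted, and the count is compared with its expectation under random ordering. The function name and the median split are hypothetical choices; only the |z| > 4 exclusion threshold comes from the text.

```python
import math
from statistics import median


def runs_test_z(trial_rates):
    """z score of the Wald-Wolfowitz runs test on per-trial firing rates."""
    med = median(trial_rates)
    # Trials exactly at the median carry no above/below information; drop them.
    above = [r > med for r in trial_rates if r != med]
    n1 = sum(above)
    n2 = len(above) - n1
    if n1 == 0 or n2 == 0:
        return float("nan")
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance <= 0:
        return float("nan")
    return (runs - expected) / math.sqrt(variance)


def is_nonstationary(trial_rates, threshold=4.0):
    """Apply the |z| > 4 exclusion criterion described in the text."""
    z = runs_test_z(trial_rates)
    return (not math.isnan(z)) and abs(z) > threshold


# Hypothetical example: firing rate steps from 1 Hz to 5 Hz halfway through a session.
drifting = [1.0] * 20 + [5.0] * 20
print(runs_test_z(drifting), is_nonstationary(drifting))
```

A unit whose rate drifts over the session produces far fewer runs than expected (a large negative z), which is the typical signature of an unstable recording.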

We defined ramping activity as firing rate that progressed uniformly over the interval. We quantified this in two ways: linear regression and PCA. Ramping neurons are described as those with a significant relationship of firing rates over trials vs. time in a linear regression model (Kim et al., 2013). For regression, firing rates were binned into 2 s bins. Secondly, PCA was used to identify dominant patterns of neuronal activity using orthogonal basis functions from peri-event histograms during the 12 s interval (Paz et al., 2005; Narayanan and Laubach, 2009; Bekolay et al., 2014; Narayanan and Laubach, 2014). All neurons from 6 animals per session (control and MFC D1DR agonist sessions) were included in PCA, and the first 500 ms of the interval was excluded due to stimulus-related activity. The same principal components were projected onto control and MFC SKF sessions, and the weights were compared via a t-test (Chapin and Nicolelis, 1999; Narayanan and Laubach, 2014).
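The regression criterion for ramping can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: spike times are binned into 2 s bins over the 12 s interval as described, every trial contributes one firing-rate value per bin, and the slope of rate against bin-center time is tested against zero. The function names are hypothetical, and the two-sided p value uses a normal approximation to Student's t distribution (reasonable when many trials are pooled) so that the example needs only the standard library.

```python
import math
from statistics import NormalDist, mean


def bin_rates(spike_times, interval=12.0, bin_width=2.0):
    """Firing rate (Hz) in consecutive bins of one trial."""
    n_bins = int(round(interval / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        if 0 <= t < interval:
            counts[min(int(t // bin_width), n_bins - 1)] += 1
    return [c / bin_width for c in counts]


def ramping_fit(trials, interval=12.0, bin_width=2.0):
    """Least-squares slope of firing rate vs. time and an approximate p value.

    `trials` is a list of spike-time lists (s from trial onset).
    """
    xs, ys = [], []
    for spikes in trials:
        for i, rate in enumerate(bin_rates(spikes, interval, bin_width)):
            xs.append((i + 0.5) * bin_width)  # bin center
            ys.append(rate)
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:
        return slope, (0.0 if slope != 0 else 1.0)
    # Normal approximation to the t distribution (assumption; see lead-in).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(slope / se)))
    return slope, p
```

A neuron would then be labeled as ramping when `p` falls below the chosen significance level; an exact t-based p value would be preferable for small trial counts.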

Time-frequency calculations were computed using custom-written MATLAB routines (Cavanagh et al., 2009). Time-frequency measures were computed by multiplying the fast Fourier transformed (FFT) power spectrum of LFP data with the FFT power spectrum of a set of complex Morlet wavelets (defined as a Gaussian-windowed complex sine wave: $e^{i2\pi tf}\,e^{-t^{2}/(2\sigma^{2})}$, where $t$ is time, $f$ is frequency (which increased from 1 to 50 Hz in 50 logarithmically spaced steps), and $\sigma$ defines the width (or "cycles") of each frequency band, set according to $4/(2\pi f)$), and taking the inverse FFT. The end result of this process is identical to time-domain signal convolution, and it resulted in: (1) estimates of instantaneous power (the squared magnitude of the analytic signal, defined as $z(t)$; power time series: $p(t) = \mathrm{real}[z(t)]^{2} + \mathrm{imag}[z(t)]^{2}$); and (2) phase (the phase angle), defined as $\varphi(t) = \arctan(\mathrm{imag}[z(t)]/\mathrm{real}[z(t)])$. Each epoch was then cut in length surrounding the event of interest (−500 to +2000 ms). Power was normalized by conversion to a decibel (dB) scale ($10 \times \log_{10}[\mathrm{power}(t)/\mathrm{power}(\mathrm{baseline})]$) from a pre-stimulus baseline of −500 to −300 ms, allowing a direct comparison of effects across frequency bands. Statistical significance was computed via a paired t-test comparing saline and D1DR agonist sessions in three frequency bands: delta (1–4 Hz), theta (4–8 Hz), and alpha (8–15 Hz).
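The wavelet decomposition and decibel normalization described above can be sketched as follows. This is a minimal illustration, not the original MATLAB routines: it uses direct time-domain convolution (which the text notes is equivalent to the FFT-based procedure), truncates each wavelet at ±3σ, and uses hypothetical function names; the truncation and the helper names are assumptions.

```python
import cmath
import math


def log_spaced(lo, hi, n):
    """n logarithmically spaced frequencies from lo to hi (inclusive)."""
    return [lo * (hi / lo) ** (k / (n - 1)) for k in range(n)]


def morlet(f, fs, cycles=4.0):
    """Complex Morlet wavelet exp(i*2*pi*f*t) * exp(-t^2 / (2*sigma^2))."""
    sigma = cycles / (2.0 * math.pi * f)   # width, set to 4 / (2*pi*f) as in the text
    half = int(math.ceil(3.0 * sigma * fs))  # truncate at +/- 3 sigma (assumption)
    return [cmath.exp(2j * math.pi * f * (k / fs))
            * math.exp(-((k / fs) ** 2) / (2.0 * sigma ** 2))
            for k in range(-half, half + 1)]


def wavelet_power(signal, f, fs):
    """Instantaneous power p(t) = real[z(t)]^2 + imag[z(t)]^2 at frequency f."""
    w = morlet(f, fs)
    half = len(w) // 2
    n = len(signal)
    power = []
    for t in range(n):
        z = 0j
        for k, wk in enumerate(w):
            idx = t - (k - half)  # convolution: sum over tau of w[tau] * x[t - tau]
            if 0 <= idx < n:
                z += wk * signal[idx]
        power.append(z.real ** 2 + z.imag ** 2)
    return power


def to_db(power, base_start, base_stop):
    """10 * log10(power / mean baseline power); baseline given as sample indices."""
    base = sum(power[base_start:base_stop]) / (base_stop - base_start)
    return [10.0 * math.log10(p / base) if p > 0 else float("-inf") for p in power]
```

For an epoch spanning −500 to +2000 ms sampled at 1000 Hz, the −500 to −300 ms baseline corresponds to sample indices 0 to 200, and band power can be obtained by averaging the decibel values over the frequencies from `log_spaced(1.0, 50.0, 50)` that fall within each band.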

When experiments were complete, rats were sacrificed by injections of 100 mg/kg sodium pentobarbital, and transcardially perfused with 10% formalin. Brains were post fixed in a solution of 10% formalin and 20% sucrose before being sectioned on a freezing microtome. Brain slices were mounted on gelatin-subbed slides and stained for cell bodies using DAPI. Histological reconstruction was completed using post mortem analysis of electrode and cannula placements and confocal microscopy in each animal. These data were used to determine electrode and cannula placement within the MFC.

#### RESULTS

We trained seven rats to perform an interval timing task and implanted recording electrodes and an infusion cannula into the MFC (**Figures 1A,B**). After recovery from surgery, we focally infused the D1DR agonist SKF82958 into the MFC (MFC D1DR agonist sessions) prior to neuronal recordings during interval timing tasks. Compared to saline sessions, rodents had similar numbers of lever presses in MFC D1DR agonist infusion sessions (153.2 ± 16.8 vs. 141.5 ± 24.7 in saline sessions; **Figure 2A**), and acquired similar numbers of rewards (82.8 ± 5.3 vs. 75.8 ± 12.1 in saline sessions; **Figure 2B**). MFC D1DR agonist infusion significantly decreased how efficiently animals responded at interval end [% of responses between 11 and 12 s; 0.08 ± 0.2 vs. 0.20 ± 0.03 in saline sessions; t(5) = 3.4, p < 0.02; **Figure 2D**]. Interval timing performance was measured using a curvature index where a higher curvature corresponds to higher deviations from a straight line (24). Here, we found a flatter curvature of the time-response histogram in MFC D1DR agonist sessions in comparison to saline sessions [0.23 ± 0.04 vs. 0.32 ± 0.03; t(5) = 2.63, p < 0.05; **Figure 2C**]. Taken together, these data indicate that focal MFC D1DR agonist decreases how efficiently animals guide their responses in time without changing lever pressing or motivation (**Figure 2E**).

In seven animals, we recorded 41 neurons in sessions with MFC D1DR agonist, and 47 neurons in saline sessions. Focal MFC SKF82958 infusion did not change overall firing rate (2.9 ± 0.5 vs. 3.6 ± 0.9 Hz in saline sessions). We identified stimulus-related neurons and neurons related to lever pressing by paired t-tests of firing rate before and after stimulus and lever press (Parker et al., 2014). By these criteria, similar fractions of MFC neurons were stimulus- or press-related in MFC D1DR agonist sessions compared to saline sessions (stimulus: 3 vs. 4 in saline sessions; press: 5 vs. 4 in saline sessions). These results provide evidence that SKF82958 did not change the basic neuronal properties of MFC.

Ramping activity, or neuronal activity that consistently increases or decreases over a temporal interval, is a key correlate of temporal processing in MFC (**Figure 3A**) (Durstewitz, 2003; Kim et al., 2013). Our recent work has shown that MFC D1DR blockade attenuates ramping activity (Parker et al., 2014). Here we investigated how MFC D1DR agonists influence ramping activity. We identified ramping activity using linear regression to identify neurons with a significant linear fit. Only 1 neuron had a significant linear fit in MFC D1DR agonist sessions, compared with 7 neurons in saline sessions (2 vs. 15% in saline sessions; χ<sup>2</sup> = 4.3, p < 0.04; **Figure 3B**). When neural data were shuffled in time, no significant differences in ramping neurons were observed (3 neurons/6% in saline sessions vs. 2 neurons/5% in SKF; χ<sup>2</sup> = 0.09, p < 0.76). Ramping activity is also readily identified by principal component analysis, in which ramping components are PC1 (Narayanan and Laubach, 2009; Bekolay et al., 2014; Parker et al., 2014). Consistent with prior work, ramping activity was PC1 and explained 28% of variance among neuronal ensembles in saline sessions (**Figure 3C**). PC1 loaded more significantly onto saline sessions compared to MFC D1DR agonist sessions [**Figure 3D**; t(86) = 2.3, p < 0.03]. Taken together, these data indicate that MFC D1DR agonists attenuated ramping activity in MFC.

FIGURE 3 | MFC D1DR agonist attenuates ramping activity. (A) Example of an MFC ramping neuron recorded in control (blue) sessions. In green, the same putative neuron from the control session is shown in MFC D1DR agonist sessions. Top plot is a raster plot depicting the activity of a single neuron selected from each condition. Each row is a trial from the experiment and each dot is an action potential from a single neuron. (B) There was significantly less ramping activity in MFC D1DR agonist sessions, as identified by the number of neurons with a significant linear fit via regression. Statistical comparisons assumed that independent populations were recorded in control and D1DR blockade sessions. (C) Principal component analysis in control sessions revealed that ramping activity was the most prominent pattern of neural activity among MFC neurons (PC1). (D) To directly compare ramping activity, we projected PCs from control sessions onto MFC D1 agonist sessions. PC1 explained significantly less variance in MFC D1DR agonist sessions, while PC2 and 3 were unchanged. Asterisk represents significance at *p* < 0.05 via a *t*-test. All statistics treated each session independently. Taken together, these data suggest that MFC D1DR agonist attenuates ramping activity of neurons.
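The regression-based identification of ramping neurons can be illustrated with a short Python sketch. It fits a least-squares line to binned firing rates and, as a stand-in for the parametric significance test (whose details are not given in this section), estimates a p-value by shuffling the rates across time bins, loosely echoing the time-shuffled control mentioned above. The firing rates, bin width, shuffle count, and function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def ramping_test(bin_times, rates, n_shuffles=1000, seed=0):
    """Return (slope, p), with p estimated by shuffling rates across time bins.

    The shuffle-based p-value is an assumption of this sketch, swapped in for
    the regression significance test used in the paper.
    """
    rng = random.Random(seed)
    observed = ols_slope(bin_times, rates)
    shuffled = list(rates)
    at_least_as_steep = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(ols_slope(bin_times, shuffled)) >= abs(observed):
            at_least_as_steep += 1
    p_value = (at_least_as_steep + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical trial-averaged firing rates (Hz) in 0.5 s bins over a 12 s window.
bin_times = [0.5 * k for k in range(24)]
ramping_rates = [2.0 + 0.4 * t for t in bin_times]  # steadily increasing
flat_rates = [3.0 + (k % 2) for k in range(24)]      # alternates 3, 4, 3, 4 Hz

ramp_slope, ramp_p = ramping_test(bin_times, ramping_rates)
flat_slope, flat_p = ramping_test(bin_times, flat_rates)
is_ramping = ramp_p < 0.05
```

Counting the neurons flagged this way in each condition gives the two proportions that the chi-square comparison in the text is based on.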

During timing tasks, there is a burst of ∼4 Hz field potential activity after the instructional stimulus (Parker et al., 2014, 2015). This activity is dependent on MFC dopamine and attenuated with D1DR blockade (Parker et al., 2014, 2015). We examined how this activity changed in MFC D1DR agonist sessions among 17 LFP channels across 7 animals. Consistent with prior work in saline sessions, the instructional stimulus was followed by a burst of delta (1–4 Hz), theta (4–8 Hz) and alpha (8–15 Hz) activity (**Figures 4A,B**). In MFC D1DR agonist sessions, delta activity increased [t(16) = 2.7, p < 0.02; 1–4 Hz 0–1 s after the cue], theta activity did not change (4–8 Hz 0–1 s after the cue), and alpha activity was attenuated [t(16) = 3.1, p < 0.01; 8–12 Hz 0–1 s after the cue], when compared to saline sessions (**Figures 4C,D**). Taken together, these data indicate that D1DR agonist SKF82958 in MFC disrupts performance of interval timing tasks as well as neuronal correlates of temporal processing in MFC.

FIGURE 4 | MFC D1DR agonists influence oscillatory patterns of MFC LFP during interval timing tasks. (A) Event-related potentials from all LFP channels in 7 rodents (17 channels) revealed a stimulus-triggered peak in the MFC after stimulus onset. MFC ERP is unaffected by MFC D1DR agonists. (B) Time-frequency analysis revealed a burst of activity from 4 to 15 Hz following the onset of the cue. (C,D) Infusions of MFC D1DR agonist significantly increased delta activity between 1 and 4 Hz and attenuated alpha activity between 8 and 15 Hz while theta activity was unaffected. Asterisk represents significance at *p* < 0.05 via a *t*-test.

# DISCUSSION

In the present manuscript, we recorded MFC neuronal ensembles while focally infusing the D1DR agonist SKF82958 into MFC during performance of an interval timing task. We found that MFC SKF82958 infusion decreased the efficiency of animals' responses during interval timing, attenuated MFC ramping activity, and altered MFC field potentials by enhancing delta activity between 1 and 4 Hz and attenuating alpha activity between 8 and 15 Hz. These data, in combination with our prior work, suggest that optimal frontal D1DR signaling is central to the temporal control of action (Parker et al., 2013b, 2014, 2015).

Combined with our work showing that MFC D1DR blockade disrupts MFC activity, our result is consistent with the idea that fluid cognition requires optimal dopamine in frontal cortex (Goldman-Rakic, 1998). This "U-shaped curve" has been shown for D1DR signaling for cognitive tasks in rats and primates (Zahrt et al., 1997; Granon et al., 2000; Vijayraghavan et al., 2007). In particular, both performance of working memory tasks as well as mnemonic delay activity of frontal neurons requires optimal frontal D1DR (Vijayraghavan et al., 2007). Optimal levels of dopamine are critical for efficient neural transmission in frontal cortex (Kroener et al., 2009).

We extend this line of work to two neuronal correlates of interval timing: ramping activity and low-frequency oscillations. Ramping activity encodes temporal processing in parietal, temporal, and frontal brain regions (Reutimann et al., 2004; Janssen and Shadlen, 2005; Narayanan and Laubach, 2009; Kim et al., 2013). Our results suggest that ramping activity predicts response time on a trial-by-trial basis (Parker et al., 2014) and, like delay activity during working memory tasks, depends on optimal MFC D1DR dopamine signaling. With both MFC D1DR antagonist SCH23390 and MFC D1DR agonist SKF82958, ramping activity is broadly attenuated, clearly demonstrating that ramping signals in MFC adhere to the same U-shaped curve as mnemonic activity during working memory tasks from primate lateral frontal cortex (Goldman-Rakic et al., 2004; Vijayraghavan et al., 2007). It is unclear, however, why ramping activity is attenuated. During timing tasks, dopamine neurons phasically fire to reward-predictive stimuli at the beginning of the task (Kobayashi and Schultz, 2008). However, endogenous, tonic dopamine levels can also affect frontal function (Kroener et al., 2009; Cools and D'Esposito, 2011). Future studies will manipulate dopamine neurons while recording from frontal neuronal ensembles to address this issue.

TABLE 1 | Summary of MFC spectral activity in delta, theta and alpha bands 0–1 s after cue onset.

*SCH data are extracted from previously published dataset in Parker et al. (2014).*

While ramping activity adheres to a U-shaped curve, dopamine's influence on spectral properties of MFC field potentials is more complex (**Table 1**). Delta activity was increased with MFC D1 agonists, while alpha was attenuated and theta was unchanged. MFC ramping neurons are coherent with ∼4 Hz activity (Narayanan et al., 2013b; Parker et al., 2014). In this study, delta activity was increased but ramping activity did not improve, indicating that coupling between ramping neurons and ∼4 Hz activity might also depend on optimal D1DR signaling. Our data suggest that only alpha activity strictly follows a U-shaped curve (**Figure 3**; **Table 1**). Alpha activity in frontal cortex has been associated with top-down processing (Sauseng et al., 2005), and the decreased alpha we see in this study may be reflective of decreased executive control in MFC D1DR agonist sessions.

Low-frequency oscillations are a key mechanism of cognitive control (Cavanagh and Frank, 2014). Rodents and humans have common low-frequency activity in delta, theta, and alpha bands during cognitive tasks (Narayanan et al., 2013b; Parker et al., 2015). These spectral activities require MFC dopamine (Parker et al., 2015). Because low-frequency oscillations can be readily observed with scalp EEG, the finding that they are sensitive to optimal frontal dopamine is of particular translational significance. In humans with Parkinson's disease, frontal dopamine may be facilitated early in the disease and profoundly influenced by the treatment (Cools et al., 2001, 2010). Several human diseases, such as schizophrenia, ADHD, and drug addiction (Abi-Dargham et al., 2002; Heijtz et al., 2007), involve dysfunctional frontal dopamine signaling. Our findings predict that in human diseases with disrupted frontal D1DR dopamine signaling, spectral activity in MFC will also be disrupted (Parker et al., 2015).

These data involve several limitations. First, we administered a pharmacological agonist with a complex receptor profile (Gilmore et al., 1995). Dopamine signaling is complex and depends on state, history, and network properties (Seamans and Yang, 2004). MFC D1DR agonists also likely cause rapid receptor internalization (Ryman-Rasmussen et al., 2005). D1DR are G-protein-coupled receptors with diverse intracellular targets via cAMP (Kim et al., 2015). D1DR signaling can potentiate inputs of local cortical networks onto pyramidal neurons (Yang and Seamans, 1996). Thus, the functional outcome of MFC D1DR agonists on neuronal activity is quite complex. When specifically stimulating frontal neurons expressing D1DRs, interval timing performance is not disrupted but slightly enhanced (Narayanan et al., 2012). Future studies will explore the detailed action of D1DR signaling aimed at new therapies for human diseases.

## FUNDING

This work was funded by The National Institute of Neurological Disorders and Stroke R01 NS078100/K08 NS078100, The National Institute of Mental Health, NARSAD Brain and Behavior Foundation, and grant #2014/22817-1, São Paulo Research Foundation (FAPESP).

### REFERENCES

properties and therapeutic implications. Neuropharmacology 34, 481–488. doi: 10.1016/0028-3908(95)00014-W


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Parker, Ruggiero and Narayanan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cholinergic and ghrelinergic receptors and KCNQ channels in the medial PFC regulate the expression of palatability

Marc A. Parent<sup>1,2</sup>, Linda M. Amarante<sup>3</sup>, Kyra Swanson<sup>3</sup> and Mark Laubach<sup>3</sup>\*

<sup>1</sup> The John B. Pierce Laboratory, New Haven, CT, USA, <sup>2</sup> Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA, <sup>3</sup> Department of Biology and Center for Behavioral Neuroscience, American University, Washington, DC, USA

The medial prefrontal cortex (mPFC) is a key brain region for the control of consummatory behavior. Neuronal activity in this area is modulated when rats initiate consummatory licking and reversible inactivations eliminate reward contrast effects and reduce a measure of palatability, the duration of licking bouts. Together, these data suggest the hypothesis that rhythmic neuronal activity in the mPFC is crucial for the control of consummatory behavior. The muscarinic cholinergic system is known to regulate membrane excitability and control low-frequency rhythmic activity in the mPFC. Muscarinic receptors (mAChRs) act through KCNQ (Kv7) potassium channels, which have recently been linked to the orexigenic peptide ghrelin. To understand if drugs that act on KCNQ channels within the mPFC have effects on consummatory behavior, we made infusions of several muscarinic drugs (scopolamine, oxotremorine, physostigmine), the KCNQ channel blocker XE-991, and ghrelin into the mPFC and evaluated their effects on consummatory behavior. A consistent finding across all drugs was an effect on the duration of licking bouts when animals consume solutions with a relatively high concentration of sucrose. The muscarinic antagonist scopolamine reduced bout durations, both systemically and intra-cortically. By contrast, the muscarinic agonist oxotremorine, the cholinesterase inhibitor physostigmine, the KCNQ channel blocker XE-991, and ghrelin all increased the durations of licking bouts when infused into the mPFC. Our findings suggest that cholinergic and ghrelinergic signaling in the mPFC, acting through KCNQ channels, regulates the expression of palatability.

Keywords: KCNQ, muscarinic, ghrelin, reward, licking, prefrontal

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

Ranier Gutierrez, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Mexico

Alfredo Fontanini, Stony Brook University, USA

> \*Correspondence: Mark Laubach mark.laubach@american.edu

Received: 28 August 2015 Accepted: 08 October 2015 Published: 26 October 2015

#### Citation:

Parent MA, Amarante LM, Swanson K and Laubach M (2015) Cholinergic and ghrelinergic receptors and KCNQ channels in the medial PFC regulate the expression of palatability. Front. Behav. Neurosci. 9:284. doi: 10.3389/fnbeh.2015.00284

# INTRODUCTION

Consummatory behavior modulates neuronal activity in the medial prefrontal cortex (mPFC) of rats and primates (Petykó et al., 2009, 2015; Bouret and Richmond, 2010; Horst and Laubach, 2012, 2013). For example, a recent study from our group (Horst and Laubach, 2013) found that population activity in the rostral prelimbic cortex was strongly modulated at the moment when rats initiated licking. These changes in spike activity were coterminous with 4–8 Hz phase locking in simultaneously recorded field potentials. It is possible that these signals are used to monitor the consequences of ongoing orolingual actions and are integrated with gustatory information, which has recently been shown to be encoded by neurons in the mPFC (Jezzini et al., 2013), to control reward-guided behaviors.

In a related recent study, we developed an operant incentive contrast task to study how rats learn to maximize consumption of rewarding solutions relative to less rewarding options (Parent et al., 2015). Pharmacological and optogenetic inactivations of the rostral prelimbic area shortened bouts of licking when rats consumed relatively high, but not low, levels of sucrose. Classic (Davis, 1973) and more recent investigations (e.g., Dwyer, 2012) have established that the duration of licking bouts in animals ingesting varying quantities of sucrose reflects the relative reward value (aka subjective value) of the solutions. Following inactivation of the mPFC, animals responded as if they were naive to the task. We interpreted these findings as evidence for the rostral mPFC being crucial for the expression of incentive contrast and for the deployment of learned feeding strategies.

In the incentive contrast task, performance depends on the animal's ability to attend to changes in reward value (stop consumption when solution switches to the low value) and their motivation to consume (drive to consume a rewarding solution of high caloric content). Attention and motivation are partially driven by acetylcholine (Voytko, 1996; Robbins, 2002; Chudasama et al., 2004; Bloem et al., 2014) and ghrelin (Kojima et al., 1999; Nakazato et al., 2001), respectively, in the brain. The mPFC has receptors for both of these neuromodulatory neurotransmitters (van der Zee and Luiten, 1999; Hou et al., 2006; Mani et al., 2014). Activation of muscarinic acetylcholine receptors (mAChRs) has been shown to change neuronal excitability (Brown and Passmore, 2009; Santini and Porter, 2010) via activation of the Gq/G11-PLC-linked intracellular cascades (Suh and Hille, 2002; Zhang et al., 2003; Delmas and Brown, 2005). This cascade ultimately increases the excitability of neurons via closure of Kv7 (KCNQ) potassium channels and inhibition of the M-current. Modulation of M-currents in mPFC specifically has been shown to regulate mPFC-dependent behaviors, such as in fear conditioning tasks (Santini and Porter, 2010). Increases in neuronal excitability by mAChRs may increase the influence of synaptic input to this region and provide an efficient mechanism for engagement of mPFC during arousal and attention. To our knowledge, no group has examined the role of mACh receptors or KCNQ channels in the control of consummatory behavior by the mPFC.

The orexigenic peptide ghrelin has recently been shown to enhance the excitability of dopamine neurons in the substantia nigra pars compacta (Shi et al., 2013). The G-protein-mediated activation of the intracellular pathway responsible for modulation of KCNQ channels by mAChR overlaps with ghrelinergic modulation of excitability. Activation of the ghrelin receptor (growth hormone secretagogue receptor, GHS-R) triggers the same intracellular pathway as mAChRs, ultimately closing KCNQ channels (Shi et al., 2013). The published functional impact of mPFC M-current manipulation on excitability and behavior, together with the potential co-regulation of these same effector KCNQ channels by mAChR and GHS-R, suggests that consumption in a task that is dependent on mPFC may be regulated by both of these neurotransmitter systems. As with the muscarinic system described above, no group has examined the role of ghrelin receptors in the mPFC with regard to the control of consummatory behavior.

Here, we demonstrate that both systemic injections and mPFC infusions of the muscarinic receptor antagonist scopolamine decreased the duration of licking bouts during access to high value sucrose solutions when provided alternating access to high and low value solutions. These results are similar to what has been previously reported following reversible inactivation of mPFC (Parent et al., 2015), and suggest that blocking mACh receptors with scopolamine disrupts the same elements of neuronal processing that are affected by total cortical inactivation via muscimol. Exactly the opposite result was obtained when cholinergic tone was enhanced locally in mPFC with infusion of the cholinesterase inhibitor physostigmine (aka eserine), activation of mAChRs with the agonist oxotremorine, and blocking KCNQ channels linked to mAChRs with XE-991. Furthermore, infusion of ghrelin, which acts on the same KCNQ channels as the muscarinic system, enhanced the same measure of palatability (bout duration) only when the high value sucrose solution was available. All four of these manipulations effectively block KCNQ channels and increase neuronal excitability (at least in brain slices; Guan et al., 2011; Pafundo et al., 2013). These stimulatory manipulations all selectively increased the duration of licking bouts when a high value sucrose reward was available, and had no impact on licking for a lower value solution. As the duration of licking bouts is thought to reflect the palatability (or hedonic value) of ingested fluids (Davis, 1973), the present study is the first to implicate cholinergic and ghrelinergic signaling in the mPFC, acting through KCNQ channels, in the expression of palatability.

# MATERIALS AND METHODS

All procedures carried out in this set of experiments were approved by the Animal Use and Care Committee of the John B. Pierce Laboratory and conform to the guidelines set forth for the Ethical Treatment of Animals (National Institutes of Health).

### Animals

Twenty-five Long-Evans rats of 350–450 grams were used in this study. Animals were housed individually and kept on a 12/12 h light/dark cycle switching at 7:00 AM and 7:00 PM. Upon arrival, animals were given 1 week of habituation to their new environment with free access to rat chow followed by daily handling for 1 week. After habituation and initial daily handling, animals had regulated access to food to maintain their body weights at approximately 90% of their free-access weights. Rats typically received 14–18 g of food each day around 5 pm and were weighed daily throughout the period of training and testing in the incentive contrast licking task. Animals had free access to water throughout the experiments. Of the rats used in this study, three rats were removed either due to improper surgical placement of cannulas or drastic changes in behaviors following central infusions that permanently altered baseline behavioral performance following multiple drug infusions.

### Behavioral Apparatus

All animals were trained in sound-attenuating behavioral boxes (ENV-008; Med Associates) containing a single horizontally placed spout located on one wall at 6.5 cm from the floor and a house light at the top of the box. Control of pumps and behavioral quantification was done using a MedPC system version IV (Med Associates). The licking spout was custom built to allow the convergence of two independent solution lines stemming from two independent pumps at a single point (John B. Pierce Laboratory Instruments Shop). Licking was tracked optically as breakage of an infrared beam by the tongue between a custom built emitter/detector placed directly in front of the licking spout (John B. Pierce Laboratory Instruments Shop). Movement of the animal during licking was restricted via placement of two walls on either side of the spout. Solution lines were connected to 60 ml syringes and solution was made available to animals by lick-triggered, single speed pumps (PHM-100; Med Associates) which drove syringe plungers. Each lick activated a pump which delivered roughly 0.029 ml of fluid per pump activation, or an average of 9.7 microliters of fluid per lick.

### Behavioral Task

The incentive contrast licking task used in these experiments is the same as described previously (Parent et al., 2015). Briefly, animals were placed into the operant chamber for 30 min and had constant access to the spout. Two independent pumps delivered sucrose solution to the same spout and were loaded with syringes containing either high value 20% sucrose solution (wt/vol) or low value 4% sucrose solution (wt/vol). After animals were placed into the behavioral box, the MedPC script was started, causing the house light to turn on. Licking at the spout initiated a 30-s epoch of access to the high value solution. Each lick was recorded, and a lick occurring after the end of the 30-s epoch triggered a 30-s epoch of access to the low value sucrose solution. These epochs of access continually switched back and forth between pumps and provided alternating access to high and low value solutions. At the end of the 30 min session, the house light turned off and animals stopped receiving sucrose solution. Quantification of behavior was implemented via analysis of both licking counts and metrics of licking microstructure such as duration of licking bouts, number of licking bouts, and intra-bout licking rates.
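The alternation rule described above amounts to a small state machine driven by lick times. The following Python sketch shows one way to express it; it is not the MedPC script used in the study, and the function name, the example timestamps, and the handling of a lick that falls exactly on an epoch boundary are assumptions.

```python
EPOCH_S = 30.0  # epoch length stated in the text


def label_licks(lick_times_s, epoch_s=EPOCH_S):
    """Assign each lick to a 'high' or 'low' value epoch.

    The first lick opens a high-value epoch. Once an epoch has run for epoch_s
    seconds, the next lick opens an epoch on the other pump. Treating a lick
    that lands exactly on the boundary as opening the next epoch is an
    assumption of this sketch.
    """
    labels = []
    current = None  # no epoch is active until the first lick
    epoch_end = None
    for t in lick_times_s:
        if current is None:
            current, epoch_end = "high", t + epoch_s
        elif t >= epoch_end:
            current = "low" if current == "high" else "high"
            epoch_end = t + epoch_s
        labels.append(current)
    return labels


# Hypothetical lick timestamps (seconds from session start).
licks = [0.0, 10.0, 29.0, 31.0, 50.0, 62.0, 100.0]
labels = label_licks(licks)
```

Note that under this rule epochs are not locked to a fixed 30-s clock: if the animal pauses past the end of an epoch, the next epoch starts only when licking resumes.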

### Behavioral Data Analysis

Analysis of licking was carried out via custom scripts written in MATLAB. Detection and quantification of licking bouts were done as in previous studies (Gutierrez et al., 2010; Horst and Laubach, 2013; Parent et al., 2015). Specifically, bouts were defined as having at least three licks within 300 ms and with an inter-bout interval of 0.5 s or longer. The first 10 epochs during each behavioral session were used to analyze licking microstructure. Statistical analyses were performed using R<sup>1</sup>.
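The bout criteria above can be turned into a short routine. The sketch below is written in Python for illustration (the original analysis used custom MATLAB scripts that are not reproduced here). It adopts one reading of the definition: a gap of 500 ms or more separates groups of licks, and a group counts as a bout if it contains three consecutive licks spanning 300 ms or less. That reading, the definition of intra-bout rate as inter-lick intervals per second, and the example timestamps are assumptions.

```python
MAX_SPAN_MS = 300  # three licks must fall within this window
MIN_GAP_MS = 500   # minimum inter-bout interval


def find_bouts(lick_times_ms):
    """Group sorted lick timestamps (ms) into bouts.

    Assumed interpretation: a gap of MIN_GAP_MS or more ends a group, and a
    group is a bout if it has three consecutive licks within MAX_SPAN_MS.
    """
    groups, current = [], []
    for t in lick_times_ms:
        if current and t - current[-1] >= MIN_GAP_MS:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)

    bouts = []
    for g in groups:
        has_triplet = any(g[i + 2] - g[i] <= MAX_SPAN_MS for i in range(len(g) - 2))
        if has_triplet:
            bouts.append(g)
    return bouts


def bout_summary(bouts):
    """Bout count, mean bout duration (ms), and mean intra-bout lick rate (Hz).

    Expects at least one bout.
    """
    durations = [b[-1] - b[0] for b in bouts]
    rates = [(len(b) - 1) / ((b[-1] - b[0]) / 1000.0) for b in bouts]
    n = len(bouts)
    return n, sum(durations) / n, sum(rates) / n


# Hypothetical lick timestamps in milliseconds.
licks_ms = [0, 140, 280, 420, 1500, 1640, 3000, 3150, 3290, 3430, 3570]
bouts = find_bouts(licks_ms)
n_bouts, mean_duration_ms, mean_rate_hz = bout_summary(bouts)
```

In this example the isolated pair of licks at 1500 and 1640 ms is excluded because it never reaches three licks, leaving two bouts.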

### Surgery

Prior to cannulation, animals were given 2–3 days of free access to rat chow and water. Animals were initially anesthetized using 3.5% isoflurane gas followed by intraperitoneal injections of ketamine and xylazine. The scalp was shaved clean and animals were injected with a bolus of carprofen subcutaneously. Animals were placed into a stereotaxic frame using non-penetrating ear bars and the skull was covered with iodine for 1 min. Iodine was wiped clean from the scalp and the eyes were covered with ophthalmic ointment to prevent drying over the span of the surgery. Lidocaine (0.3 ml) was injected under the scalp and an incision was made longitudinally along the skull. The skin was retracted laterally and all tissue was cleaned from the surface of the scalp. The skull was leveled by adjusting the stereotaxic apparatus to ensure bregma and lambda were within the same horizontal plane. Four screws were placed in the parietal skull plates for support of the guide cannulas. Craniotomies were drilled bilaterally in the frontal skull plates over the medial prefrontal cortex and 26 gauge guide cannulas with dummy cannulas were inserted into the medial prefrontal cortex at 1 mm dorsal to the target coordinate (AP: +3.6; ML: ±1.4 at 12° from the midsagittal plane; DV: −4.0). Later, 33 gauge injection cannulas were used which extended 1 mm past the tip of the guide cannulas. Craniotomies were sealed and implants initially secured with cyanoacrylate and accelerator. The entire intra-cranial implants were then secured to the skull screws and covered with methyl methacrylate dental cement. Skin surrounding the implant was cleaned and maintained taut via placement of a metal suture placed posteriorly to the implant. The wound was covered in antibiotic ointment and rats were injected intraperitoneally with the antibiotic enrofloxacin.

Following surgery, once animals were able to maintain an upright posture and move around the recovery cage, the animals were placed back into the animal housing facility and were provided water containing the enrofloxacin antibiotic as well as carprofen for pain management for 2 days. Full access to food was provided. Animals were checked and weighed daily for 1 week following surgery. To prevent the removal of dummy cannulas during grooming, Kwik Cast silicon sealant was placed over the dummy cannula caps and removed when access to the cannulas was needed. After 1 week, animals' body weights returned to presurgical levels, restricted access to rat chow was reinstated, and animals continued with daily behavioral testing sessions.

### Drug Infusions

Following recovery from surgery and a period of retraining in the task with restricted food access, a series of controls were performed on all rats. First, animals were exposed to the same duration and levels of isoflurane gas used during infusion of drug on test day as an initial gas control session. Second, a PBS control was carried out where the same volume of vehicle without drug was injected intraperitoneally or infused into the mPFC while the animals were anesthetized under isoflurane gas. Finally, on test day, animals were anesthetized via isoflurane gas and drug was injected intraperitoneally or infused centrally into the mPFC. Following test day, recovery sessions were carried out. Each rat received between 1 and 4 sessions of drug infusions during the time of this study, and took on average 2.6591 (SD = 1.8165) sessions to return to a baseline level of performance on the task.

<sup>1</sup>http://www.r-project.org

Frontiers in Behavioral Neuroscience | www.frontiersin.org October 2015 | Volume 9 | Article 284 |

Drugs used in this study included scopolamine, physostigmine, oxotremorine, XE-991, and ghrelin. All drugs were obtained from Tocris and made into solutions using sterile PBS with pH 7.4. Doses were based on published studies: systemic scopolamine—Sánchez-Resendis et al. (2012); intracortical scopolamine—Santini et al. (2012); physostigmine—Herremans et al. (1997); oxotremorine—Desai and Walcott (2006); XE-991—Santini and Porter (2010); ghrelin—Naleid et al. (2005).

### Confirmation of Cannula Placement

At the termination of experiments, animals were initially anesthetized with isoflurane gas and injected intraperitoneally with Euthasol. Animals were transcardially perfused first with 200 ml of cold saline solution followed by 200 ml of cold 4% paraformaldehyde. Brains were removed and post-fixed in a mixture containing 4% paraformaldehyde, 20% sucrose, and 20% glycerol. Brains were then cut into 100 µm-thick coronal slices using a freezing microtome. Brain sections were mounted onto gelatin-coated slides and Nissl stained via treatment with thionin. Thionin-treated slices were dried through a series of alcohol steps and cleared with xylene. Slides were covered with Permount and coverslipped. Sections were imaged using a Tritech Research scope (BX-51-F), Moticam Pro 282B camera, and Motic Images Plus 2.0 software. The most ventral point of the injection bolus was compared against the Paxinos and Watson atlas to confirm coordinates.

# RESULTS

### Systemic Effects of the Muscarinic Antagonist Scopolamine

Scopolamine was administered systemically over a range of doses (PBS, 0.1 mg/kg, 0.3 mg/kg, 1.0 mg/kg) with intraperitoneal (IP) injections. Independent, one-way repeated measures analyses of variance (ANOVAs) were performed between control and drug administration sessions on epochs of access to high and low value sucrose. ANOVAs were carried out on descriptors of licking microstructure (e.g., mean duration of licking bouts, mean number of bouts) and mean lick counts across all 30-s epochs within a daily 30 min session. Global metrics of consummatory behavior were also tested with ANOVAs, including total licks within a daily session and time spent engaged in the task prior to satiation. Effects of drugs were compared across sessions to avoid the potential confounding factor of satiety.
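For readers less familiar with the design, the F statistic of a one-way repeated-measures ANOVA can be computed by partitioning the total sum of squares into condition, subject, and error terms. The Python sketch below does this for made-up data (the study's analyses were run in R). With four dose levels, error degrees of freedom of 24 correspond to nine animals in a complete design; that animal count is inferred here from the degrees of freedom reported in the following paragraph and is an assumption, as are all values in the example.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[i][j] is the measurement for subject i under condition j.
    Returns (F, df_condition, df_error).
    """
    n = len(data)      # subjects
    k = len(data[0])   # conditions (e.g., doses)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # subject-by-condition residual

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Made-up licks-per-epoch values for 9 hypothetical rats x 4 doses (vehicle plus
# three drug doses): subject offset + dose effect + a small residual.
dose_effect = [40, 38, 30, 20]
data = [[i + dose_effect[j] + ((i * j) % 3 - 1) for j in range(4)] for i in range(9)]

f_stat, df_cond, df_error = rm_anova_oneway(data)
```

Removing the between-subject sum of squares from the error term is what distinguishes this from an ordinary one-way ANOVA: consistent differences between animals do not inflate the denominator of F. Converting F to a p-value requires the F distribution, which is left to a statistics package.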

Clear effects of systemic scopolamine were apparent across the range of doses that were tested. During the 30-s epochs with access to either high or low levels of liquid sucrose, there was a significant decrease in mean licks per epoch (**Figure 1A**; HVS: [F(3,24) = 17.21, p < 0.001], LVS: [F3,24] = 5.84, p < 0.01), number of bouts per epoch (**Figure 1B**; HVS: [F(3,24) = 19.18, p < 0.001], LVS: [F3,24] = 4.02, p < 0.05), and duration of licking bouts (**Figure 1C**; HVS: [F(3,24) = 6.37, p < 0.01], LVS: [F3,24] = 5.77, p < 0.01) with increasing doses of scopolamine. There was also a significant decrease in the total number of licks (**Figure 1D**; [F(3,24) = 7.04, p < 0.01]) and a slight, yet insignificant, increase in the duration of time required to reach satiety within a session (**Figure 1E**; [F(3,24) = 0.91, p = 0.44]). Post hoc Tukey tests between PBS and the three drug levels found a significant change

sepoch at 0.3 mg/kg and 1.0 mg/kg systemic scopolamine. (C) Mean duration of licking bouts in epochs decreases at 1.0 mg/kg scopolamine. (D) Total number of licks across both high and low reward epochs combined in daily sessions were reduced at the 1.0 mg/kg dose of scopolamine. (E) There was a dose-dependent increase in the time spent by rats engaging in the task under injections of scopolamine. (F) Systemic injections of scopolamine did not alter the intra-bout licking rate, regardless of the given dose. <sup>∗</sup>p < 0.05.

in licking, specifically in the mean number of licks and number of bouts during access to the high value sucrose solution, beginning at the 0.3 mg/kg dose (p < 0.05). The 1.0 mg/kg dose strongly affected consumption of both the high and low reward solutions (p < 0.05 for all measures shown in **Figures 1A–D**). These reductions in consummatory behavior, especially at the higher dose of scopolamine, were independent of any effects on sensorimotor abilities, as there were no significant changes in the intra-bout licking rate at any dose injected (**Figure 1F**).
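The microstructural measures reported above (licks, number of bouts, bout duration, and intra-bout licking rate) can be derived from lick timestamps roughly as sketched below. This is a minimal illustration: the bout criteria (`max_ili`, `min_licks`) are assumed values chosen for the example and are not the criteria used in this study, and the function names are hypothetical.

```python
def lick_bouts(lick_times, max_ili=0.5, min_licks=3):
    """Group lick timestamps (in seconds) into bouts.

    max_ili: longest inter-lick interval allowed within a bout (assumed value).
    min_licks: fewest licks that count as a bout (assumed value).
    """
    bouts, current = [], []
    for t in sorted(lick_times):
        # A pause longer than max_ili closes the current bout
        if current and t - current[-1] > max_ili:
            if len(current) >= min_licks:
                bouts.append(current)
            current = []
        current.append(t)
    if len(current) >= min_licks:
        bouts.append(current)
    return bouts

def microstructure(lick_times, **criteria):
    """Summarize licking microstructure for one epoch or session."""
    bouts = lick_bouts(lick_times, **criteria)
    durations = [b[-1] - b[0] for b in bouts]
    n_intervals = sum(len(b) - 1 for b in bouts)
    total_dur = sum(durations)
    return {
        "n_licks": len(lick_times),
        "n_bouts": len(bouts),
        "mean_bout_duration": total_dur / len(bouts) if bouts else 0.0,
        # Inverse of the mean inter-lick interval within bouts (licks/s)
        "intra_bout_rate": n_intervals / total_dur if total_dur > 0 else 0.0,
    }

# Hypothetical timestamps: two bouts followed by one isolated lick
example = microstructure([0.0, 0.15, 0.30, 0.45, 2.0, 2.15, 2.30, 5.0])
```

Separating the intra-bout rate from bout number and duration is what allows a drug effect on consumption to be distinguished from a purely sensorimotor effect on the licking rhythm.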

# Prefrontal Effects of the Muscarinic Antagonist Scopolamine

Having established systemic effects of scopolamine in the incentive contrast licking task, we next examined effects of local infusions of scopolamine within the mPFC. We focused on the same rostral region that contains licking-entrained neuronal activity (Horst and Laubach, 2013) and leads to the loss of incentive contrast effects and temporally fragmented licking when inactivated with muscimol (Parent et al., 2015). **Figure 2A** depicts cannula locations for all rats across all drug infusions into mPFC. Infusion of scopolamine (10 µg in 1 µl) resulted in a decrease in mean licks per epoch (**Figure 2B**; [F(1,6) = 18.68, p < 0.01]) and duration of licking bouts (**Figure 2C**; [F(1,6) = 39.18, p < 0.001]) during access to the high value solution. The effects on other measures were much less dramatic in comparison to the systemic data described above. While there was a decrease in the mean number of bouts initiated during access to the high value solution following infusion of scopolamine, this decrease did not reach significance (**Figure 2D**). During epochs of access to the low value solution there was a strong trend toward an increased number of bouts (**Figure 2D**; [F(1,6) = 4.96, p = 0.068]) that were found to be of significantly shorter duration (**Figure 2C**; [F(1,6) = 22.04, p < 0.01]). Overall, there was a significant increase in the length of time spent engaged in the task before reaching satiety (**Figure 2E**; [F(1,6) = 6.47, p < 0.05]), and a marginal decrease in licking throughout the entire session (**Figure 2F**; [F(1,6) = 2.61, p = 0.16]). While fluid intake for each session was recorded for each drug treatment in this study, only central infusions of scopolamine produced a significant change in volume consumed throughout the session, as measured by the average volume of fluid consumed per high value sucrose epoch divided by the average volume consumed for low value epochs in a given session (paired t-test: [t(6) = 3.6669, p < 0.05]). For central infusions of scopolamine, high value sucrose intake decreased while low value sucrose intake increased.

# Prefrontal Effects of Physostigmine and Oxotremorine

If blockade of cholinergic signaling decreases consumption by reducing the ability of mPFC to contribute to the regulation of motivated behavior, it may be possible to augment the ability of rats to optimally negotiate the task via the upregulation of cholinergic tone locally within mPFC. This hypothesis was tested via the infusion of physostigmine, a classic cholinesterase inhibitor. Inhibition of acetylcholinesterase blocks the degradation of acetylcholine and generally increases cholinergic tone without specificity for cholinergic receptor subtypes. Infusion of 10 µM physostigmine into mPFC augmented behaviors related to consumption and palatability during access to the high value sucrose. There was a significant increase in the mean number of licks per 30-s epoch (**Figure 3A**; [F(1,6) = 8.57, p < 0.05]). While there was only a trend toward a decrease in the number of bouts for the high value sucrose (**Figure 3B**; [F(1,6) = 4.66, p = 0.075]), there was a significant increase in the duration of licking bouts during sessions with physostigmine infusions (**Figure 3C**; [F(1,6) = 7.89, p < 0.05]).

FIGURE 2 | Central infusions of scopolamine into mPFC reduce performance on consummatory contrast task. (A) Central infusions of drugs across all rats were targeted to the medial prefrontal cortex. (B) There was a dramatic decrease in mean number of licks across epochs of access to the high value sucrose solution. (C) Scopolamine decreased the duration of licking bouts during access to both low and high value sucrose solutions. (D) There was a trending decrease and increase in the number of bouts performed within epochs of access to high and low value sucrose solutions, respectively. (E) Animals spent significantly more time engaged in the task following central infusions of scopolamine. (F) Scopolamine infusions led to a trending decrease in the total lick counts during daily sessions. <sup>∗</sup>p < 0.05.

There was also a trending increase in the time spent licking during the session before reaching satiety (**Figure 3D**; [F(1,6) = 3.04, p = 0.132]). Physostigmine infusions did not alter the total number of licks emitted during the session (**Figure 3E**). Licking microstructure for the low value sucrose remained unchanged during physostigmine infusion sessions.

The impact of scopolamine on consummatory behavior suggested that the increased consumption during the task following infusion of physostigmine may be rooted in modulation of muscarinic receptors. To explore this hypothesis, we infused the non-specific muscarinic receptor agonist oxotremorine into mPFC. While the mean number of licks per epoch remained unchanged during sessions with 10 µM oxotremorine infusions (**Figure 4A**), the total number of licks occurring during the session for the high value sucrose greatly increased ([F(1,4) = 8.04, p < 0.05]). Similar to the effects of physostigmine, infusion of oxotremorine showed a trend toward a decrease in the number of bouts for the high value sucrose solution (**Figure 4B**; [F(1,6) = 7.47, p = 0.052]). Infusion of oxotremorine significantly increased the duration of licking bouts for the high value sucrose solution (**Figure 4C**; [F(1,4) = 13.28, p < 0.05]). There was no significant change in the time spent engaged in the task (**Figure 4D**), nor was there a significant effect on total licks emitted across the entire session with oxotremorine infusions (**Figure 4E**).

# Prefrontal Effects of the KCNQ Channel Blocker XE-991

A critical downstream effector of muscarinic receptor activation within neurons is the KCNQ (Kv7.1) type potassium channel (Delmas and Brown, 2005; Brown and Passmore, 2009). Activation of these potassium channels decreases neuronal activity and promotes neuronal synchrony in populations of neurons (e.g., in mPFC: Pafundo et al., 2013). Binding of acetylcholine to muscarinic receptors ultimately drives closure of KCNQ channels. This action drives neuronal depolarization and increased neuronal excitability. Given the link between muscarinic receptors and KCNQ channels, alteration of KCNQ channel tone via direct pharmacological manipulations should alter consumption within our task. To test this hypothesis, 10 µM XE-991, a specific KCNQ channel blocker, was infused into mPFC. XE-991 significantly increased consumption during access to the high value sucrose solution via an increase in mean lick count (**Figure 5A**; [F(1,12) = 17.42, p < 0.01]). While there was no significant difference in the number of bouts emitted (**Figure 5B**), there was a significant increase in mean bout duration (**Figure 5C**; [F(1,12) = 13.81, p < 0.01]). There was no change in time spent licking during sessions with infusions of XE-991 (**Figure 5D**). Infusions of XE-991 did, however, produce a significant increase in the total number of licks emitted throughout the behavioral session (**Figure 5E**; [F(1,12) = 6.202, p < 0.05]).

# Prefrontal Effects of Ghrelin

Ghrelinergic modulation of intrinsic excitability in neurons is mediated via the same intracellular signaling pathway as the muscarinic modulatory system, specifically KCNQ channels (Li et al., 2013). Due to the presence of ghrelinergic receptors within the mPFC (Zigman et al., 2006) and the influence of muscarinic receptors on consumption reported above, ghrelin was infused centrally into the mPFC and its influence on behavior in the consummatory contrast task was tested. Similar to increased cholinergic tone, muscarinic receptor activation, and KCNQ channel inhibition, infusion of 1 µM ghrelin into the mPFC increased consumption of the high value sucrose solution via an increase in the mean number of licks (**Figure 6A**; [F(1,8) = 24.61, p < 0.01]) and total number of licks across the session (**Figure 6E**; [F(1,8) = 14.60, p < 0.01]). On average there were significantly fewer bouts for the high value sucrose solution (**Figure 6B**; [F(1,8) = 7.85, p < 0.05]). There was a marginal effect of increased mean duration of licking bouts for the high value sucrose solution (**Figure 6C**; [F(1,8) = 4.58, p = 0.065]). Similar to infusion of oxotremorine and XE-991, there was no effect of ghrelin on consumption during epochs of access to the low value sucrose solution, nor was there a change in the time spent engaged in the task (**Figure 6D**).

# DISCUSSION

## Summary and Interpretation of Findings in the Present Study

In the present study, we found that decreasing cholinergic tone at muscarinic receptors with scopolamine, both systemically and locally within the mPFC, paralleled the results found following inactivation of mPFC using muscimol in an incentive contrast licking task (Parent et al., 2015). Decreased muscarinic tone in mPFC impairs performance on the task by decreasing the duration of licking bouts, yielding a decreased rate of consumption. Further, we found that augmenting cholinergic tone locally within mPFC using physostigmine, as well as more specifically via direct application of the muscarinic receptor agonist oxotremorine, yielded an increase in task performance with greater consumption of the high value reward. A major downstream effector of muscarinic receptor activation is the KCNQ (Kv7.1) potassium channel. Binding of acetylcholine to muscarinic receptors drives KCNQ channels into a closed conformation, yielding neuronal depolarization and increased excitability. Blocking KCNQ channels with XE-991 drove an increase in task performance that paralleled what occurred following enhancement of cholinergic tone using physostigmine and oxotremorine. Finally, as the orexigenic peptide ghrelin has recently been shown to act on the same KCNQ channels (Shi et al., 2013), we evaluated its actions within the mPFC in some of the same animals, and found similar behavioral effects to the drugs that enhanced cholinergic tone and blocked KCNQ channels.

In all cases, the behavioral effects of the drugs were selective to the relatively higher concentration of sucrose that was tested (20%) and altered the same microstructural measure of licking, bout duration. Previous studies have found that bout duration increases in proportion to the concentration of sucrose (or other sapid nutrients) in the ingested solutions (Davis, 1973). Bout duration has thus been considered to reflect how palatable the solutions are to the animal (e.g., Davis and Perez, 1993) and to reflect the relative reward value of a given solution (Grigson et al., 1993). Therefore, we conclude that cholinergic and ghrelinergic receptors and KCNQ channels in the medial PFC regulate the expression of palatability.

Our interpretation uses the phrase "expression of palatability" and not palatability per se. This is to emphasize the "readout" side of the control of consummatory behavior, and not the encoding of taste information or relative reward value, which has been proposed for other brain areas (agranular insular cortex and basolateral amygdala) and can be assayed using different behavioral measures, such as orofacial reactions (Grill and Norgren, 1978). The temporal control of consummatory behavior involves regulation of sensorimotor and autonomic/visceral systems. Sensorimotor control of consumption is regulated by a part of the medial agranular cortex (Yoshida et al., 2009) that is immediately adjacent to the mPFC area (rostral prelimbic cortex) that was the focus of the present study. Autonomic and visceral controls have been more traditionally emphasized for the prelimbic area (and the adjacent infralimbic cortex) (Terreberry and Neafsey, 1983) through its connections with the hypothalamus and autonomic midbrain and brainstem, as reviewed below. For example, the mPFC area that was studied here was recently shown to be involved in the regulation of breathing (Hassan et al., 2013). The rostral mPFC is well placed to coordinate the sensorimotor and autonomic motor systems through its descending projections (see Gabbott et al., 2005 for review).

# Potential Neuronal Mechanism of Cholinergic and Ghrelinergic Regulation of Palatability

The drugs that we tested might have altered the animals' bout durations due to effects of the drugs on the ability of the mPFC to emit theta-range rhythms that normally accompany the initiation of consummatory behavior in rodents. Several recent studies have reported that neurons in the mPFC exhibit changes in firing rate around the initiation of licking (Petykó et al., 2009, 2015; Horst and Laubach, 2013). One of these studies (Horst and Laubach, 2013) also reported that phasic changes in field potentials occur when rats initiate and terminate bouts of licks. The fields showed enhanced phase locking near the licking rhythm, between 6 and 8 Hz, a frequency range that is normally associated with "theta" in rodents. This rhythm might reflect a temporal synchronization of network-level activity that could serve to monitor the outcome of licking (Gutierrez et al., 2006) or could reflect a transient encoding of reward expectancy (van Wingerden et al., 2010), as proposed for similar signals in the orbitofrontal cortex.

Theta can be generated in several ways in the frontal cortex, by hippocampal inputs (which do not synchronize with licking: Vanderwolf, 1969), thalamocortical inputs (Hughes and Crunelli, 2007), and NMDA receptor-mediated spiking by layer 5 pyramidal neurons that are coupled to theta bursts by layer 2/3 pyramidal cells (Carracedo et al., 2013). These rhythms are enhanced by cholinergic agonists that act on the M current (Marrion, 1997), generated by KCNQ channels (Delmas and Brown, 2005; Brown and Passmore, 2009). In vitro slice physiology has shown that the application of the selective KCNQ channel blocker, XE-991, increases neuronal excitability, especially in response to low frequency inputs (<10 Hz; Guan et al., 2011; see also Pafundo et al., 2013 for effects in prefrontal cortical slices). Theta activity can be generated intracortically by regular spiking neurons in layer V (Carracedo et al., 2013). These cells are temporally gated by coterminous lower-frequency delta rhythms generated by intrinsic bursting cells (Carracedo et al., 2013). A disruption of the precise temporal interactions between these neurons, by any drug that acts on KCNQ channels or alters extracellular transmitters that act to regulate these channels, would thus disrupt the normal control of rhythmic behaviors that depend on neuronal processing within the cortical area of interest and/or within a collection of brain areas that control consummatory behavior in a coordinated manner.

We must point out that the interpretation of our findings is based on in vitro slice physiology studies, and not on in vivo studies done in awake, behaving animals. Testing the implications of our findings will require new experiments that combine neuronal recordings with local infusions of muscarinic drugs and KCNQ channel blockers as well as optogenetic and chemogenetic manipulations of cholinergic activity (e.g., ChAT rats) in the mPFC.

# Potential Neuronal Circuits for the Regulation of Palatability

The mPFC region examined in the present study is one part of a large brain network that encodes the value of foods and regulates consummatory behavior. We have emphasized a role of the mPFC in the expression of palatability. There are neurons in the mPFC that are modulated by sensory (taste) properties of foods (Jezzini et al., 2013). However, a more likely candidate for encoding taste information or retrieving values determined by taste information from memory is the agranular insular cortex (AIC), which is classically considered as "taste cortex" (Yamamoto et al., 1989). The AIC contains neurons that encode the palatability of tastants (Grossman et al., 2008) and respond more vigorously and with shorter latencies to specific tastants compared to the mPFC (Jezzini et al., 2013). The source of these palatability signals within the AIC may be the basolateral amygdala (Grossman et al., 2008), which projects to both the AIC and the mPFC (Hoover and Vertes, 2007; Reppucci and Petrovich, 2015).

Several studies have implicated the AIC in the reward-guided control of action (DeCoteau et al., 1997; Ragozzino and Kesner, 1999; Balleine and Dickinson, 2000; Kesner and Gilbert, 2007; Gardner and Fontanini, 2014; Kusumoto-Yoshida et al., 2015), but not other mPFC-dependent behaviors such as action timing (Smith et al., 2010) and delayed alternation (Horst and Laubach, 2009). This region seems to be involved in the retrieval of outcome values that are encoded by the basolateral amygdala (BLA; Parkes and Balleine, 2013). However, as reversible inactivations of the AIC and mPFC have comparable effects on palatability-driven feeding (Baldo et al., 2015), we propose that the two areas work together with BLA to regulate consummatory behavior, by enabling the conversion of reward values into control signals that guide action selection (e.g., lick now or later in the incentive contrast licking task).

Anatomical tract-tracing studies have reported heavy interconnections between the mPFC and AIC (Gabbott et al., 2003) and there are significant inputs from the BLA to the region of mPFC that was the focus of the present study (Bacon et al., 1996). Inputs from BLA terminate on parvalbumin interneurons in the mPFC (Gabbott et al., 2006), which regulate the dynamics of neuronal activity in the mPFC (Dilgen et al., 2013). These connections could both provide value signals to the mPFC and shape the timing of neuronal activity associated with the initiation of consummatory behavior, as described by Horst and Laubach (2013). The BLA also directly innervates corticospinal neurons in the mPFC (Gabbott et al., 2012), which are associated with the autonomic nervous system (Gabbott et al., 2005). Through these connections, information about the palatability of an ingested food or fluid may be processed in the mPFC and modulated by cholinergic tone and cerebrospinal levels of ghrelin to control consummatory behavior.

In addition to its corticospinal connections, the mPFC sends dense projections to autonomic and feeding-related centers in the hypothalamus (Floyd et al., 2001), midbrain (Floyd et al., 2000), and brainstem (Gabbott et al., 2005), including a recently described projection to a trigeminal relay in the brainstem (Iida et al., 2010). The target of the mPFC in the lateral hypothalamus has recently been shown to contain neurons that encode palatability-related information (Li et al., 2013) and to become phasically active in relation to licking behavior (Tandon et al., 2012). Another major output of the mPFC is the ventral striatum, a region associated with encoding reward values (Bissonette et al., 2013) and controlling food seeking behaviors (Taha and Fields, 2005). Cholinergic or ghrelinergic modulation of any of these projections, acting through KCNQ channels, could influence neuronal activity in these subcortical centers to regulate the expression of palatability. This neuronal circuit interpretation of our findings could be tested in new studies that involve multi-site neuronal recordings and opto-/chemo-genetic perturbations of neuronal recordings at the specific times when animals initiate consummatory actions.

# Prefrontal vs. Hypothalamic Effects of Ghrelin

A novel finding of the present study is that the direct administration of ghrelin into the mPFC alters a specific behavioral measure of palatability (i.e., the duration of licking bouts). This finding is in contrast to a recent study in which ghrelin was infused into the ventricles near the ventral hypothalamus (Overduin et al., 2012). The Overduin study found that ghrelin increases overall intake but does not increase measures of palatability. The difference between these findings is likely due to actions of ghrelin on different brain areas (hypothalamus vs. mPFC). Feeding centers in the hypothalamus contain neurons such as the agouti-related peptide-secreting (AgRP) neurons that are sensitive to ghrelin but do not influence palatability (Denis et al., 2015). Our finding that ghrelin is able to influence the expression of palatability may simply be due to ghrelin acting on the same ion channels that the muscarinic cholinergic system acts on (KCNQ channels) and the subsequent modulation of consummatory-related neuronal activity in the mPFC (i.e., increases in firing and increased gain of transmission in the licking (theta) frequency). This interpretation of our results could be tested in future studies that combine neuronal recordings with local drug infusions or opto-/chemo-genetic manipulations of neurons with ghrelin receptors in the mPFC and hypothalamus.

# FUNDING

Financial Support: National Science Foundation grant 1121147, National Institutes of Health grant DK099792-01A1, and two grants from the Klarman Family Foundation to ML.



**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Parent, Amarante, Swanson and Laubach. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Individual variability in behavioral flexibility predicts sign-tracking tendency

#### Helen M. Nasser<sup>1,2</sup>, Yu-Wei Chen<sup>1</sup>, Kimberly Fiscella<sup>1</sup> and Donna J. Calu<sup>1,2</sup>\*

*<sup>1</sup> Behavioral Neuroscience Research Branch, Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services, Baltimore, MD, USA, <sup>2</sup> Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA*

Sign-tracking rats show heightened sensitivity to food- and drug-associated cues, which serve as strong incentives for driving reward seeking. We hypothesized that this enhanced incentive drive is accompanied by an inflexibility when incentive value changes. To examine this, we tested rats in Pavlovian outcome devaluation or second-order conditioning prior to the assessment of sign-tracking tendency. To assess behavioral flexibility, we trained rats to associate a light with a food outcome. After the food was devalued by pairing with illness, we measured conditioned responding (CR) to the light during an outcome devaluation probe test. The level of CR during the outcome devaluation probe test correlated with the rats' subsequent tracking tendency, with sign-tracking rats failing to suppress CR to the light after outcome devaluation. To assess Pavlovian incentive learning, we trained rats on first-order (CS+, CS−) and second-order (SOCS+, SOCS−) discriminations. After second-order conditioning, we measured CR to the second-order cues during a probe test. Second-order conditioning was observed across all rats regardless of tracking tendency. The behavioral inflexibility of sign-trackers has potential relevance for understanding individual variation in vulnerability to drug addiction.

#### Edited by:

*Gregory B. Bissonette, University of Maryland, USA*

#### Reviewed by:

*Caitlin Anne Orsini, University of Florida, USA*

*Nicholas W. Simon, University of Pittsburgh, USA*

#### \*Correspondence: *Donna J. Calu*

*dcalu@som.umaryland.edu*

Received: *14 August 2015* Accepted: *12 October 2015* Published: *03 November 2015*

#### Citation:

*Nasser HM, Chen Y-W, Fiscella K and Calu DJ (2015) Individual variability in behavioral flexibility predicts sign-tracking tendency. Front. Behav. Neurosci. 9:289. doi: 10.3389/fnbeh.2015.00289*

Keywords: sign-tracking, outcome devaluation, second-order conditioning, behavioral flexibility, Pavlovian incentive learning

# INTRODUCTION

Addiction is a chronically relapsing disorder that develops only in a subset of individuals who engage in recreational drug use. Approximately 15–30% of individuals who try drugs of abuse transition to addiction (Anthony et al., 1994). The behavior of addicted individuals is characterized by a heightened motivation for drugs and an inflexibility characterized by a persistence to seek and take drugs despite negative consequences (American Psychiatric Association, 2013). Animal procedures exploring individual variability in natural reward seeking during Pavlovian lever autoshaping have identified phenotypic behavioral differences that predict vulnerability to cue-driven drug seeking (Tomie, 1996; Flagel et al., 2009; Saunders and Robinson, 2010; Saunders et al., 2013; Yager and Robinson, 2013). Such studies have demonstrated that heightened incentive motivation for natural reward-associated cues serves as an informative predictor of heightened motivation for drug seeking.

During Pavlovian lever autoshaping, where the extension and retraction of a lever precedes the delivery of food reward, rats show individual differences in conditioned responding (CR). Sign-tracking rats preferentially approach and contact the lever, while goal-tracking rats preferentially approach and contact the food cup (Hearst, 1974; Boakes, 1977; Flagel et al., 2007). Sign-tracking rats show heightened cue-directed and/or cue-driven motivation for food-, cocaine-, opioid-, and nicotine-associated cues as compared to goal-tracking rats (Robinson and Flagel, 2009; Flagel et al., 2010; Saunders and Robinson, 2010, 2013; Yager and Robinson, 2010; Saunders et al., 2013; Yager et al., 2015). These findings suggest that sign-trackers show heightened incentive motivation, which is behavior driven by the transfer of reinforcing properties of the reward to the reward-predictive cue, for both natural- and drug-reward associated cues. This heightened incentive motivation driven by drug-associated cues persists in sign-tracking rats even when their drug seeking is punished by an aversive footshock barrier in front of the drug-associated response apparatus (Saunders et al., 2013). This finding suggests that after drug exposure sign-tracking individuals also fail to appropriately adjust reward seeking in response to punishment. An open question is whether such inflexibility in sign-trackers is driven by too much incentive value attributed to the initially appetitive cue, and/or whether there are deficits in incorporating information about changing incentive value of actions, outcomes, and/or their associated cues.

To begin to address this, in Experiment 1, we examined whether individual variability in behavioral flexibility after outcome devaluation predicts tracking phenotype. Pavlovian outcome devaluation (Holland and Straub, 1979) is a procedure that examines the extent to which a previously reward-paired conditioned stimulus (CS) drives CR after the unconditioned stimulus (US) has been devalued through pairing with illness, and thus is the ideal Pavlovian procedure to examine the flexibility of cue-driven behavior when incentive value of the outcome changes. It is thought that a reduction in CR in the outcome devaluation probe test is driven by stimulus-outcome associations, that is, the ability of the CS to evoke a representation of the current incentive value of the US to drive flexible behavior. However, a failure to display a flexible reduction in CR after outcome devaluation could instead be explained by enhanced incentive value attributed to the initially appetitive cue, for which enhanced stimulus-response associations would effectively mask any stimulus-outcome driven learning after outcome devaluation.

To address this possibility, in Experiment 2, we examined whether individual variability in the expression of previously acquired appetitive incentive value, as assessed by Pavlovian second-order conditioning, predicts the tracking phenotype. Pavlovian second-order conditioning (Pavlov, 1927; Rizley and Rescorla, 1972; Holland and Rescorla, 1975) is a procedure that examines the extent to which a previously reward-paired CS alone is able to support CR to a novel second-order conditioned stimulus (SOCS), and thus is the ideal Pavlovian procedure to examine the expression of previously acquired appetitive incentive value. It is thought that heightened responding to the SOCS is the result of the association between first- and second-order cues (stimulus-stimulus) and/or associations between the second-order cue and conditioned responses previously evoked by the first-order cue (stimulus-response associations), the latter of which is evidence of Pavlovian incentive learning (Rizley and Rescorla, 1972; Rescorla, 1980; McDannald et al., 2013).

After testing rats in Pavlovian outcome devaluation or second-order conditioning procedures, we determined their tracking phenotype by screening them in the lever autoshaping procedure. We predicted that a failure to suppress CR after outcome devaluation (Experiment 1) and heightened CR to second-order cues (Experiment 2) would predict the sign-tracking phenotype.

# MATERIALS AND METHODS

### Subjects

Male Long-Evans rats (Charles River Laboratories, Wilmington, MA; 250–325 g at time of arrival; Experiment 1: between-subject design, total n = 60; Experiment 2: within-subject design, total n = 24) were singly housed and maintained on a reverse 12 h light/dark cycle (lights off at 8 a.m.). Rats had ad libitum access to water and standard laboratory chow for 2 days (Experiment 2) and 6 days (Experiment 1) before being food restricted to 85% of their baseline ad libitum body weight. Once all rats reached 85% of their baseline body weight they were maintained at 85–90% throughout the behavioral experiments, which were performed in accordance with the "Guide for the care and use of laboratory animals" (8th edition, 2011, US National Research Council). Experimental protocols were approved by the Intramural Research Program (NIDA) Animal Care and Use Committee.

Behavioral experiments were conducted in individual standard experimental chambers (25 × 27 × 30 cm; Med Associates) that were enclosed in a sound-resistant shell. For both experiments, rats were housed in the animal facility, transferred to the experimental chambers prior to the training sessions, and returned to the facility at the end of the sessions.

# Experiment 1: Pavlovian Outcome Devaluation

#### Apparatus

For Experiment 1, each chamber had one 6 W white cue light located 10 cm above the floor in the center of the wall and a speaker 1 cm above the cue light. A red houselight (6 W bulb covered by red lens) was located at the top of the same wall. The opposite wall was outfitted with a recessed food cup (with photobeam detectors) 2 cm above the floor grid attached to a programed pellet dispenser, which delivered 45 mg food pellets containing 12.7% fat, 66.7% carbohydrate, and 20.6% protein (catalog #1811155; Test Diet 5TUL). The red houselight was illuminated at the start of each training session and was extinguished at the end of each session. Two retractable levers were located on either side of the food cup 6 cm above the floor. These levers remained retracted during the light conditioning phase of the experiment.

#### Phase I: Pavlovian Light Conditioning

#### TABLE 1 | Summary of experimental designs.

*The right arrow (→) signifies paired presentations. A and B are visual conditioned stimuli (for Experiment 1 we used a steady light and for Experiment 2 we used steady or blinking cue lights, counterbalanced for CS+ and CS−), X and Y are auditory stimuli (white noise or a tone, counterbalanced for SOCS+ and SOCS−), L1 refers to lever 1 and L2 refers to lever 2 (location counterbalanced), food signifies the delivery of two food pellet US; LiCl signifies lithium chloride injections, suc signifies the delivery of 0.5 mL of 10% sucrose.*

A summary of our experimental design can be found in **Table 1**. Behavioral training began with a session that reduces the novelty of the unconditioned stimuli. We exposed rats to a single 64 min magazine training session consisting of eight trials in which two 45 mg pellets [Testdiet Purified Rodent Tablet (5TUL)] were delivered (0.5 s apart) on a VI 225 s schedule (200–250 s) for 16 trials. Following magazine training, we trained rats in eight daily 64 min light conditioning sessions, consisting of 16 trials of a 10 s cue light CS. At CS offset two pellets spaced 0.5 s apart were immediately delivered to the food cup on a VI 225 s schedule (200–250 s). We recorded the number of nose pokes and time spent in the food cup. We provided chow in the homecage and returned the rats to the animal facility after daily light conditioning sessions.
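As a rough illustration of the trial timing implied by the VI 225 s schedule (200–250 s), the sketch below draws 16 intervals for one session. The uniform distribution, the function name `draw_vi_intervals`, and the seed are assumptions made for illustration only; the text states the schedule mean and range but not how intervals were sampled or which software controlled the sessions.

```python
import random


def draw_vi_intervals(n_trials=16, low_s=200.0, high_s=250.0, seed=None):
    """Draw inter-trial intervals (in seconds) for a variable-interval schedule.

    Assumption: intervals are uniform on [low_s, high_s]. The source gives
    only the mean (225 s) and the range (200-250 s), not the distribution.
    """
    rng = random.Random(seed)
    return [rng.uniform(low_s, high_s) for _ in range(n_trials)]


intervals = draw_vi_intervals(seed=0)

# Approximate session length: the intervals plus sixteen 10 s CS presentations.
session_min = (sum(intervals) + 16 * 10.0) / 60.0
print(f"{len(intervals)} trials, about {session_min:.1f} min")
```

With a mean interval of 225 s plus a 10 s CS, 16 trials span roughly 63 min, which is in line with the stated 64 min session length.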

#### Phase II: Conditioned Taste Aversion Training

One day after the final light conditioning session, we devalued the pellets used during light conditioning in a conditioned taste aversion (CTA) procedure that took place in the rats' homecage over 4 days. We divided the rats into paired (n = 30) and unpaired (n = 30) groups. We habituated all rats in their homecage to the ceramic ramekins used to present food pellets during subsequent CTA procedures. Across the 4 days of CTA, we exposed all rats to both pellets and lithium chloride (LiCl) induced gastric malaise in order to equate experience between the paired and unpaired groups; however, explicit pairing of pellets with illness occurred only in the paired group. On the first and third days of CTA, we gave paired rats 10 min of homecage access to 100 pellets in ceramic ramekins followed immediately by LiCl injection (0.3 M, 5 mL/kg, i.p.), whereas we gave the unpaired rats only the LiCl injection (0.3 M, 5 mL/kg, i.p.). On the second and fourth days, we gave unpaired rats 10 min of homecage access to 100 pellets in ceramic ramekins, and paired rats remained in the homecage with no intervention. We gave all rats standard homecage chow (amount based on 85% body weight with compensation for CTA pellet consumption) 6 h after pellet access and/or injections each day during this phase of the experiment to prevent association of LiCl-induced illness with homecage chow.

#### Phase III: Outcome Devaluation Probe Test

One day after the final day of the CTA procedure, we conducted an outcome devaluation probe test. During this 64 min extinction session, the 10 s cue light was illuminated on a VI 90 s schedule (60–120 s) for 16 trials, but no pellets were delivered. We recorded time spent in the food cup during the pre-CS (10 s in 5 s bins), CS (10 s in 5 s bins), post-CS (1.5 s post-CS, when reward was previously delivered), and post-reward (5 s post reward) periods. Three hours after the probe test, we gave rats 10 min access to fifty 45 mg pellets (same as the US used during conditioning and in CTA), which we placed in the magazine of the operant chambers, and we recorded the number of pellets consumed to assess generalization of taste aversion from homecage to experimental chamber. We performed a post-probe homecage consumption test the next day, in which we gave all rats 10 min homecage access to 100 pellets, and we recorded the number of pellets consumed. Rats whose pellet consumption fell more than three standard deviations from the group mean during homecage or chamber tests were excluded from the study (paired n = 2, unpaired n = 1).
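The three-standard-deviation exclusion rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `flag_outliers` is hypothetical, the sample standard deviation is used, and the candidate outlier is included when computing the group mean and standard deviation. The text does not specify these details, and the pellet counts below are made up for the example.

```python
from statistics import mean, stdev


def flag_outliers(pellets_consumed, n_sd=3.0):
    """Flag values lying more than n_sd standard deviations from the group mean.

    Assumptions (not stated in the text): sample SD (n - 1 denominator),
    and each value is compared against statistics that include it.
    """
    group_mean = mean(pellets_consumed)
    group_sd = stdev(pellets_consumed)
    if group_sd == 0:
        return [False] * len(pellets_consumed)
    return [abs(x - group_mean) > n_sd * group_sd for x in pellets_consumed]


# Hypothetical unpaired group: 29 rats eat all 50 pellets, one rat eats none.
example_group = [50] * 29 + [0]
flags = flag_outliers(example_group)
```

One design caveat: when the outlier is included in the mean and SD, a value can exceed three standard deviations only if the group is reasonably large, so the rule behaves differently for small groups.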

#### Phase IV: Lever Autoshaping Procedure (Sign-tracking Screening Procedure)

After the outcome devaluation probe test, in order to reduce the novelty of the sucrose US, we gave rats a single 75 min magazine training session, during which 0.5 mL of 10% sucrose was delivered into the food cup on a VI 90 s schedule (60–120 s) for 25 trials. Subsequently, we trained rats for 5 days in autoshaping sessions (∼75 min per session) in which there were 25 CS+ and 25 CS− presentations occurring on a VI 90 s schedule (60–120 s). CS+ trials consisted of the insertion of a retractable lever (left or right, counterbalanced) paired with a 5150 Hz tone for 10 s, after which the lever was retracted and 0.5 mL of sucrose was delivered to the food cup. CS− trials consisted of the insertion of a retractable lever (left or right, counterbalanced) paired with a 12,163 Hz tone for 10 s, after which the lever was retracted and no reward was delivered.

# Experiment 2: Pavlovian Second-order Conditioning

#### Apparatus

For Experiment 2, each experimental chamber had two 6 W white cue lights located 10 cm above the floor to the left and right of a recessed food cup (with photobeam detectors) 2 cm above the grid floor on the center of the wall. The food cup was attached to a programed pellet dispenser, which delivered the same 45 mg food pellets used in Experiment 1 (catalog #1811155; Test Diet 5TUL). In the center of the opposite wall a speaker was located 8 cm above the floor. A red houselight (6 W bulb covered by red lens) was located at the top of the same wall. The red houselight was illuminated at the start of each training session and was extinguished at the end of each session.

#### Phase I: Pavlovian Light Conditioning (First-order Conditioning)

A summary of our experimental design can be found in **Table 1**. Behavioral training began with two sessions that reduce the novelty of the unconditioned and conditioned stimuli. We exposed rats to a single 64 min magazine training session consisting of eight trials in which two 45 mg pellets (Testdiet Purified Rodent Tablet 5TUL) were delivered 0.5 s apart to the food cup on a VI 240 s schedule (60–420 s intervals). The same day we exposed rats to a single 32 min light pre-exposure session consisting of four presentations of each of the two cue lights (flashing and steady) and no pellets on a VI 240 s schedule (60–420 s intervals). After the magazine training and light pre-exposure sessions, we gave rats 12 daily 64 min light conditioning sessions, consisting of 16 trials. Each session consisted of eight CS+ trials, in which a 10 s cue light (flashing or steady light) was rewarded with two pellets delivered on the 9th and 10th second of the CS (stimulus type was counterbalanced across rats), and eight CS− trials, in which the alternate cue light (flashing or steady light) was not rewarded. These lights were counterbalanced by side (left or right of the food cup) and stimulus (flashing or steady). CS+ and CS− trials were intermixed to enhance discrimination between the two cue lights. We recorded the number of food cup entries during the conditioning phase. We provided chow in the homecage and returned the rats to the animal facility after daily light conditioning sessions.

#### Phase II: Second-order Conditioning

After 12 days of first-order conditioning (FOC), to reduce the novelty of the second-order conditioned stimuli, we exposed rats to a single 32 min pre-exposure session consisting of eight intermixed trials of second-order cues: four trials of a 10 s tone (5150 Hz, 2 dB) cue presentation, and four trials of 10 s white noise (82 dB) cue presentation. After the pre-exposure session, second-order conditioning started. Second-order conditioning involved three 64 min daily sessions of 16 trials on a VI 240 s schedule (60–420 s intervals). During second-order conditioning there were two types of FOC cues presented within a session. Elemental FOC trials served as "reminder" trials of Pavlovian first-order light discrimination conditioning, and were simply reward-paired first-order CS+ trials and unrewarded CS− trials that were presented to remind rats of the discrimination between first-order stimuli during the second-order conditioning phase. Compound FOC trials were simply the second-order trials in which auditory SOCS+ or SOCS− cues were paired with the respective first-order CS+ or CS−. There were eight first-order cue "elemental" reminder trials and eight second-order conditioning trials: four trials of low tone immediately followed by a 10 s light cue (flash or steady) and four trials of noise immediately followed by the alternate 10 s light cue (flash or steady), with order of stimuli counterbalanced. During second-order conditioning sessions we recorded the number of food cup entries.

#### Phase III: Second-order Conditioning Probe Test

After 3 days of second-order conditioning, we exposed rats to a single 64 min probe test session consisting of 16 trials presented on a VI 240 s schedule (60–420 s intervals). There were eight intermixed first-order cue presentations (CS+ and CS− light cues) and separately eight intermixed second-order cue presentations (SOCS+ and SOCS− auditory cues) all presented alone, with the second-order cues presented in a block prior to first-order cues. During second-order conditioning probe test, we recorded the number of food cup entries and video for later scoring of rearing and head jerk behaviors as described below in the response measures section.

#### Phase IV: Lever Autoshaping Procedure (Sign-tracking Screening Procedure)

After the second-order conditioning probe test, in order to reduce the novelty of the sucrose US, we exposed rats to a single 75 min magazine training session, during which 0.5 mL of 10% sucrose was delivered into the food cup on a VI 90 s schedule (60–120 s) for 25 trials. Subsequently, we gave rats five sessions of autoshaping (∼75 min per session) in which there were 25 CS+ and 25 CS− presentations occurring on a VI 90 s schedule (60–120 s). CS+ trials consisted of the insertion of a retractable lever (left or right, counterbalanced) for 10 s, after which the lever was retracted and 0.5 mL of sucrose was delivered to the food cup. CS− trials consisted of the insertion of a retractable lever (left or right, counterbalanced) for 10 s, after which the lever was retracted and no reward was delivered. At the start of each session the red houselight was turned on and at the end of each session the houselight was turned off.

#### Response Measures

#### **Food cup behavior**

For Experiment 1, Pavlovian outcome devaluation Phases I and III, the primary measure of appetitive conditioning was the percentage of time the rat spent in the food cup during the 10 s CS and during the 10 s interval immediately before each CS (pre-CS), as determined by interruption of the photocell beam in the magazine. For the outcome devaluation probe test we also measured the percentage of time the rat spent in the food cup during the 1.5 s post-CS (time when reward was delivered during conditioning). For Experiment 2, Pavlovian second-order conditioning Phases I-III, the primary measure of appetitive conditioning was the number of food cup responses during the 10 s CS and during the 10 s interval immediately before each CS (pre-CS), as determined by interruption of the photocell beam in the magazine. Previous studies (Holland, 1977) have indicated that in FOC most food cup behavior occurs during the last 5 s of a 10 s CS, and thus we examined data in 5 s bins.
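The percent-time measure for a given analysis window can be sketched as below. This is an illustrative reconstruction, not the authors' analysis code: the function name `percent_time_in_window`, the representation of photobeam interruptions as non-overlapping `(start, end)` intervals in seconds, and the example timestamps are all assumptions.

```python
def percent_time_in_window(beam_breaks, window_start, window_end):
    """Percentage of a time window during which the photobeam was interrupted.

    beam_breaks: list of (start_s, end_s) tuples, assumed non-overlapping.
    """
    occupied = 0.0
    for start, end in beam_breaks:
        # Length of the overlap between this interruption and the window
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            occupied += overlap
    return 100.0 * occupied / (window_end - window_start)


# Hypothetical trial with CS onset at 100 s and CS offset at 110 s.
breaks = [(92.0, 93.0), (106.0, 109.5)]
pre_cs_10s = percent_time_in_window(breaks, 90.0, 100.0)     # full pre-CS period
cs_last_5s = percent_time_in_window(breaks, 105.0, 110.0)    # last 5 s bin of the CS
post_cs = percent_time_in_window(breaks, 110.0, 111.5)       # 1.5 s post-CS period
```

The same function covers the 5 s bins, the 10 s periods, and the 1.5 s post-CS period simply by changing the window boundaries.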

#### **Pellet consumption**

For Experiment 1, Outcome devaluation, the consumption of food pellets during taste aversion training and subsequent devaluation tests was determined by counting the number of pellets remaining in the ramekin and bedding after 10 min (for homecage test) or by counting the number of pellets remaining in the magazine and tray beneath the experimental chamber floor after 10 min (in chamber test).

#### **Rearing and head jerk scored from video**

Two experimenters were blinded to the experimental groups and independently scored a subset (33%) of the same videos to confirm accuracy and consistency of video scoring for rearing and head jerk measures. Video scores obtained by each individual were significantly correlated for both measures (rearing r<sup>2</sup> = 0.86, p < 0.001; head jerk r<sup>2</sup> = 0.91, p < 0.001).
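For readers who want to reproduce this kind of inter-rater check, the squared Pearson correlation between two scorers can be computed as in the sketch below. The function name `r_squared` and the example scores are hypothetical, and the p-value calculation is omitted.

```python
def r_squared(scores_a, scores_b):
    """Squared Pearson correlation between two equally long lists of scores."""
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    sxx = sum((a - mean_a) ** 2 for a in scores_a)
    syy = sum((b - mean_b) ** 2 for b in scores_b)
    return sxy ** 2 / (sxx * syy)


# Hypothetical rearing counts from two scorers for the same four videos.
scorer_1 = [1, 2, 3, 4]
scorer_2 = [2, 4, 6, 8]
agreement = r_squared(scorer_1, scorer_2)
```

Note that a high correlation shows consistent ranking between scorers but does not by itself rule out a constant offset between them, as the example (one scorer counting exactly double) illustrates.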

#### **Rearing behavior**

For Experiment 2, Second-order conditioning Phase III, rearing (conditioned orienting observed to light stimuli) was a second measure of appetitive conditioning. We observed and scored rearing behavior in 1 s intervals for the pre-cue (10 s) and cue (10 s) periods. Rearing was defined as standing with both forepaws off the grid (Holland, 1977).

#### **Head jerk behavior**

For Experiment 2, Second-order conditioning Phase III, head jerk (conditioned orienting observed to auditory stimuli) was a third measure of appetitive conditioning to first- and second-order cues. We observed and scored head jerk behavior in 1 s intervals for the pre-cue (10 s) and cue (10 s) periods. Head jerk was defined as short rapid horizontal or vertical movements of the head usually oriented toward the food magazine or source of the audio output. Simultaneous displays of head jerk and hindquarter movement or rearing were scored as head jerk or not head jerk (Holland, 1977).

#### **Lever autoshaping behavioral measures**

Behavioral characterization of sign- and non-sign trackers was identical for Experiments 1 and 2 and was based on a Pavlovian Conditioned Approach analysis (Meyer et al., 2012). The primary measure of tracking tendency was the average of three difference score measures that make up the composite sign-tracking score, which ranges from −1.0 to +1.0. These three difference score measures of sign-tracking behavior are: (1) preference score, (2) latency score, and (3) probability score, which were calculated for each lever autoshaping session. The preference score (ranges from −1.0 to 1.0) was the number of lever presses during the CS, minus the number of food cup responses during the CS, divided by the sum of these two measures. The latency score (ranges from −1.0 to 1.0) was the average latency to make a food cup response during the CS, minus the latency to lever press during the CS, divided by the duration of the CS (10 s). The probability score (ranges from −1.0 to 1.0) was the probability the rat will lever press minus the probability the rat will make a food cup response, determined on a trial by trial basis and averaged across the session to determine probability of each response. The composite ST score (ranges from −1.0 to 1.0) was determined for each session and was the average of the preference score, latency score, and probability score. Sign-tracking (ST) was defined by a composite score ranging from +0.5 to +1.0 and non-sign tracking (non-ST) was defined by a score ranging from +0.49 to −1.0; the non-ST group comprised intermediate rats with scores ranging from +0.49 to −0.49 and goal-tracking rats with scores ranging from −0.5 to −1.0. Generally speaking, sign-tracking rats prefer and press the lever at a higher frequency, shorter latency, and higher probability than they respond at the food cup. Goal-tracking rats prefer and respond at the food cup more frequently, at a shorter latency, and higher probability than they respond at the lever. Intermediate rats tend not to have a clear preference for the lever or the food cup, responding at similar levels, latencies and probabilities at the food cup and lever. The final composite tracking score used to characterize the individual rats as ST (≥0.5) or non-ST (<0.5) was the average composite tracking score across sessions 3–5 of autoshaping.
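The composite score described above can be written out as a short sketch. The function names, the handling of a session with no responses at all (returning a preference score of 0), and the example numbers are assumptions for illustration; the formulas themselves follow the definitions given in the text.

```python
from statistics import mean

CS_DURATION_S = 10.0  # duration of the lever CS stated in the text


def preference_score(lever_presses, food_cup_responses):
    """(lever presses - food cup responses) / (sum of both), during the CS."""
    total = lever_presses + food_cup_responses
    if total == 0:
        return 0.0  # assumption: no responding is treated as no preference
    return (lever_presses - food_cup_responses) / total


def latency_score(food_cup_latency_s, lever_latency_s):
    """(mean food cup latency - mean lever latency) / CS duration."""
    return (food_cup_latency_s - lever_latency_s) / CS_DURATION_S


def probability_score(p_lever, p_food_cup):
    """P(lever press) - P(food cup response), each averaged across a session."""
    return p_lever - p_food_cup


def composite_score(pref, lat, prob):
    """Session composite: the average of the three difference scores."""
    return (pref + lat + prob) / 3.0


def final_tracking_score(session_scores):
    """Average composite score across autoshaping sessions 3-5."""
    return mean(session_scores[2:5])


def classify(final_score):
    """ST if the final composite score is >= 0.5, otherwise non-ST."""
    return "ST" if final_score >= 0.5 else "non-ST"


# Hypothetical session: 40 lever presses, 10 food cup responses,
# latencies of 2 s (lever) and 8 s (food cup), probabilities 0.9 and 0.3.
session = composite_score(
    preference_score(40, 10),
    latency_score(8.0, 2.0),
    probability_score(0.9, 0.3),
)
```

In this hypothetical session each component equals 0.6, so the composite is 0.6 and a rat averaging that value over sessions 3–5 would be classified as ST.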

#### Statistical Analyses

The behavioral data were analyzed using the SPSS statistical software (IBM) by ANOVAs and t-tests, and significant main effects and interaction effects (p < 0.05) were followed by Bonferroni post-hoc tests. All statistical analyses of the food cup behavior were done on the raw data counts or number of entries. The dependent measures and the factors used in the statistical analyses are described in the results section below.
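As a reminder of what a Bonferroni correction does, the sketch below multiplies each raw p-value by the number of comparisons and caps the result at 1. This is a generic illustration with a hypothetical function name and made-up p-values; it is not the SPSS procedure itself, whose exact settings are not described in the text.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: raw p times the number of comparisons, capped at 1."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw p-values from three post-hoc comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
significant = [p < 0.05 for p in adjusted]
```

Equivalently, each raw p-value can be compared against 0.05 divided by the number of comparisons.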

# RESULTS

# Experiment 1: Pavlovian Outcome Devaluation

#### Phase I: Pavlovian Light-food Conditioning

We trained all rats that a light predicted delivery of a food reward. All rats increased their food cup entries in response to the light CS over the course of eight training sessions, while the response during the pre-CS period remained low and relatively stable. This response curve did not differ between the later determined paired and unpaired groups (**Figure 1A**). We analyzed the data using a mixed ANOVA, with between subject factor of Pairing (Unpaired, Paired) and within subject factors of Session (1-8), and CS epoch (pre-CS last 5 s, CS last 5 s). There was a main effect of Session [F(7, 385) = 80.8, p < 0.05] and CS epoch [F(1, 55) = 453.8, p < 0.05], as well as Session × CS epoch interaction [F(7, 385) = 132.8, p < 0.05], but no main effect of Pairing [F(1, 55) = 0.6, p = 0.43] nor interaction of Pairing × Session [F(7, 385) = 1.7, p > 0.05], Pairing × CS epoch [F(1, 55) = 0.4, p > 0.05] or Pairing × Session × CS epoch [F(7, 385) = 0.9, p > 0.05].

#### Phase II: Conditioned Taste Aversion Training

After light-food conditioning, homecage exposure to the food and illness was either Paired or Unpaired for the two groups. Paired rats readily reduced consumption of the pellets as compared to unpaired rats (**Figure 1B**). We analyzed data using a mixed ANOVA with between subjects factor of Pairing (Unpaired, Paired) and within subjects factor of Trial (Day 1, Day 2), in which we found a main effect of Pairing [F(1, 55) = 167.9, p < 0.05] and Trial [F(1, 55) = 211.7, p < 0.05] and a Pairing × Trial interaction [F(1, 55) = 373.5, p < 0.05].

#### Phase III: Outcome Devaluation Probe Test

After taste aversion training, we assessed conditioned food cup responding to the light under extinction conditions (**Figure 1C**). In accordance with other devaluation studies (Pickens et al., 2003), we presented only results from the first eight probe test trials. Data for CS and pre-CS responding were analyzed using mixed ANOVAs with between subject factor of Pairing (Unpaired, Paired) and within subject factor of CS epoch (pre-CS, CS). The analysis of the 10 s pre-CS and CS periods showed that there was a main effect of CS [F(1, 55) = 155.8, p < 0.05] and a nearly significant main effect of Pairing [F(1, 55) = 3.7, p = 0.06]. There was no significant interaction of CS epoch and Pairing [F(1, 55) = 2.57, p = 0.12]. Overall, we found a modest reduction in food cup behavior during the light cue in the Paired group, consistent with the reduced value of the food outcome established during the CTA phase.

To further confirm the strength of aversion to the pellets after the critical probe test we analyzed data from two additional consumption tests in the absence of LiCl injections; one in the homecage (**Figure 1B** post) and the other in the experimental chamber (data not shown). We analyzed the consumption data from the post-probe homecage test using an ANOVA, with between subject factor of Pairing (Unpaired, Paired). There was a main effect of Pairing [F(1, 55) = 8125.0, p < 0.05]. We analyzed the consumption data from the post-probe chamber test using an ANOVA, with between subject factors of Pairing (Unpaired, Paired). There was a main effect of Pairing [F(1, 55) = 4434.4, p < 0.05; data not shown; mean ± SEM; Unpaired = 50.0 ± 0.03, Paired = 2.3 ± 0.7].

#### Phase IV: Lever Autoshaping Procedure (Sign-tracking Screening Procedure)

After the probe test, we screened rats in the lever autoshaping procedure to determine their Tracking tendency. Rats' performance on three lever measures (contact, latency, and probability) and three food cup measures (contact, latency, and probability) is shown in **Figure 2**. We analyzed the data using six separate mixed repeated measures ANOVAs, one for each measure, using a between subjects factor of Tracking group (non-ST, ST) and within subject factors of CS (CS−, CS+) and Session (1–5). The main effects and interactions are reported in **Table 2**. Importantly, the critical CS × Session × Tracking group interactions were significant for all six measures of CR.

## Individual Differences in Outcome Devaluation Probe Test

In order to understand whether performance of individual rats in outcome devaluation relates to tracking tendency we conducted linear correlation analyses using performance during the outcome devaluation probe test and the composite tracking score (**Figure 3**). For paired rats we observed a significant positive correlation between food cup CR in the last 5 s of the CS during outcome devaluation probe test and the later determined tracking score (r<sup>2</sup> = 0.15, p < 0.05; **Figure 3** top right); paired rats that responded at highest levels to the devalued stimulus tended to fall toward the sign-tracking end of the continuum. The relationship between food cup CR during the Post-CS period of outcome devaluation probe test and the tracking score was also positively correlated (r<sup>2</sup> = 0.15, p < 0.05; **Figure 3** top right). There was no such relationship between food cup CR during the pre-CS of outcome devaluation probe test and the tracking score (r<sup>2</sup> = 0.02, p = 0.3; data not shown). We also did not observe any correlations in the Unpaired group during the CS or Post-CS period (r<sup>2</sup> = 0.10, p = 0.09 and r<sup>2</sup> = 0.08, p = 0.15, respectively; **Figure 3** bottom). These results suggest that rats that respond more to the CS after outcome devaluation go on to engage in more sign-tracking behaviors during Pavlovian lever autoshaping.

Based on this behavioral correlation, we conducted further statistical analyses of each phase of Experiment 1, now including the between subject factor of Tracking tendency, to confirm the predictive relationship between flexibility in response to devalued outcomes and tracking tendency. To account for the possibility of unequal variance between our tracking groups for the CS and reward period during probe test we ran Levene's test for equality of variance [F-critical(1, 55) = 5.3, type I error rate α = 0.05]. The test statistics for food cup responding in paired non-sign-tracking and sign-tracking rats were less than F-critical [F(1, 55) = 3.9, p = 0.06 and F(1, 55) = 1.5, p = 0.23, CS and Post-CS period, respectively]. Therefore, there was insufficient evidence for unequal variance between non-sign-tracking and sign-tracking paired rats in food cup responding, so degrees of freedom did not need adjustment for the following analyses.
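The logic of the variance check can be sketched with a hand-rolled Levene statistic. This is an illustration under assumptions rather than the SPSS computation used in the study: deviations are taken from each group's mean (the classic form of the test), the function name `levene_w` and the example data are hypothetical, and the statistic is compared against a critical value supplied by the caller instead of being converted to a p-value.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group's mean.

    Requires at least two groups with some spread in their deviations;
    otherwise the denominator is zero.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(x - g_mean) for x in g])
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical percent-time values for two tracking groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
w = levene_w(group_a, group_b)

f_critical = 5.3  # critical value reported in the text
variances_differ = w > f_critical
```

When W stays below the critical value, as in the analyses reported here, the equal-variance assumption is retained and no adjustment of degrees of freedom is made.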

In phase I of light-food training, all rats increased their food cup entries in response to the light CS over the course of eight training sessions, and this response curve did not differ between the later determined sign-tracking and non-sign-tracking tendency groups (**Figure 4A**). We analyzed the data using a mixed ANOVA, with between subject factor of Tracking tendency (non-ST, ST) and within subject factors of Session (1–8), and CS epoch (pre-CS last 5 s, CS last 5 s). There was a main effect of Session [F(7, 385) = 72.8, p < 0.05] and CS epoch [F(1, 55) = 418.3, p < 0.05], as well as Session × CS epoch interaction [F(7, 385) = 120.9, p < 0.05] but no main effect of Tracking tendency [F(1, 55) = 3.3, p = 0.08], and no interactions of Tracking tendency × Session [F(7, 385) = 1.7, p > 0.05], Tracking tendency × CS epoch [F(1, 55) < 0.1, p > 0.05], or Tracking tendency × Session × CS epoch [F(7, 385) = 1.5, p > 0.05].

In phase II, the CTA developed similarly in both sign-tracking and non-sign-tracking rats (**Figure 4B**). For taste aversion training we analyzed the data using a mixed ANOVA, with between subjects factors of Pairing (Unpaired, Paired) and Tracking tendency (non-ST, ST) and within subject factor of Trial (Day 1, Day 2). We found main effects of Pairing [F(1, 53) = 155.9, p < 0.05] and Trial [F(1, 53) = 192.5, p < 0.05] and a significant interaction of Trial × Pairing [F(1, 53) = 328.2, p < 0.05]. There was no significant main effect of Tracking tendency [F(1, 53) = 1.43, p = 0.24], nor were there interactions of Tracking tendency with Pairing [F(1, 53) < 0.01, p = 0.96], Trial [F(1, 53) = 0.36, p = 0.55], or Trial × Pairing [F(1, 53) < 0.01, p = 0.94].

In phase III, we focused our analysis on food cup behavior during the last 5 s of the CS. We found that the non-sign-tracking paired rats reduced their food cup responding, while paired and unpaired sign-trackers showed similar levels of food cup responding in probe test (**Figure 4C**). We analyzed this using a mixed ANOVA with between subject factors of Pairing (Unpaired, Paired) and Tracking tendency (non-ST, ST) and a within subject factor of CS epoch (pre-CS, CS). We found main effects of CS [F(1, 53) = 140.5, p < 0.05] and Pairing [F(1, 53) = 6.2, p < 0.05], as well as a Pairing × Tracking tendency interaction [F(1, 53) = 6.2, p < 0.05]. One-way ANOVA of food cup behavior during the Post-CS period, during which time the food pellets were previously delivered in the conditioning phase, showed the same critical Pairing × Tracking tendency interaction [F(1, 53) = 6.2, p < 0.05]. Bonferroni post-hoc comparisons confirmed that paired ST rats spent more time in the food cup relative to paired non-ST rats during the CS and Post-CS periods [F(1, 53) = 5.03, p < 0.05 and F(1, 53) = 7.71, p < 0.05, respectively]. That is, sign-tracking paired rats were less flexible than non-sign-tracking paired rats.

Notably, we did not see evidence for extinction learning differences between tracking groups when repeating the above analysis on food cup responding during the CS with the additional factor of Time (Trial 1–4, Trial 5–8) (data not shown). Importantly, the difference between non-ST paired and unpaired groups was evident in the first trial block (Time × Pairing × Sign-tracking tendency [F(1, 53) = 4.7, p < 0.05; mean ± SEM non-ST Unpaired: 51.3 ± 9.4%; non-ST Paired: 26.2 ± 3.1%, post-hoc: p < 0.05]), which was not true for ST paired and unpaired (ST Unpaired: 42.7 ± 4.2%; ST Paired: 46.0 ± 5.6%, post-hoc: p > 0.05). In addition, both tracking groups extinguished at a similar rate, as evidenced by failure to see interaction for Time × Tracking [F(1, 53) = 0.186, p > 0.05] and by similar terminal levels of food cup responding between both unpaired groups in the second block of trials (mean ± SEM non-ST Unpaired: 20.6 ± 3.0%; ST Unpaired: 21.3 ± 4.0%; non-ST Paired: 9.5 ± 3.0%; ST Paired: 15.7 ± 3.7%). Overall, this suggests that the sign-trackers display less flexible behavior after reward devaluation than non-sign-trackers.

#### TABLE 2 | Experiment 1. Phase IV: Lever autoshaping procedure, summary table of analyses for lever and food cup measures (contact, latency, and probability).

test post-CS period with composite tracking score determined from lever autoshaping (right).

To further confirm the strength of aversion to the pellets after the critical probe test we performed two additional consumption tests in the absence of LiCl injections; one in the homecage (**Figure 1C**, post) and the other in the experimental chamber. We analyzed the consumption data from the post-probe homecage test using an ANOVA, with between subject factors of Pairing (Unpaired, Paired) and Tracking tendency (non-ST, ST). There was a main effect of Pairing [F(1, 53) = 7478.0, p < 0.05], but no main effect of Tracking tendency [F(1, 53) = 1.0, p > 0.05] and no interaction effect of Pairing and Tracking [F(1, 53) = 1.0, p > 0.05]. We analyzed the consumption data from the post-probe chamber test using an ANOVA, with between subject factors of Pairing (Unpaired, Paired) and Tracking tendency (non-ST, ST). There was a main effect of Pairing [F(1, 53) = 4105.3, p < 0.05; data not shown; mean ± SEM; Unpaired non-ST = 50 ± 0.0, Unpaired ST = 50 ± 0.1, Paired non-ST = 1.4 ± 0.7, Paired ST = 3.0 ± 1.1], but no significant main effect of Tracking tendency [F(1, 53) = 1.1, p > 0.05] or interaction of Pairing and Tracking [F(1, 53) = 1.2, p > 0.05].

vs. non-sign-tracking (*n* = 21). Percent time spent in food cup (mean ± SEM) during the last 5 s of the 10 s Pre-CS and CS period of light-food conditioning. (B) Phase II: Pellet consumption during conditioned taste aversion training and post-probe homecage consumption test. Number of pellets consumed (mean ± SEM) in 10 min conditioned taste aversion training sessions (trial 1 and 2) and during post-probe homecage consumption test session (post). Paired or Unpaired data is separated by the later determined between subjects factor of tracking tendency. (C) Phase III: Overall effect of outcome devaluation during probe test. Percent time spent in food cup (mean ± SEM) during outcome devaluation probe test separated by tracking tendency during CS (left) and post-CS (right) for Unpaired and Paired groups. \*Different percent time spent in food cup between Unpaired and Paired groups within tracking tendency, *p* < 0.05. Paired non-sign-tracking (*n* = 12); Paired sign-tracking (*n* = 16) vs. Unpaired non-sign-tracking (*n* = 9); Unpaired sign-tracking (*n* = 20). #Different in % time spent in food cup between Paired sign-trackers and Paired non-sign-trackers groups, *p* = 0.05.

# Experiment 2: Pavlovian Second-order Conditioning

#### Phase I: Pavlovian First-order Light Discrimination Conditioning

We trained all rats in a first-order light CS+, CS− discrimination. All rats increased food cup entries in response to the CS+ over the course of 12 training sessions. In comparison food cup entries in response to the CS− over the course of 12 training sessions were lower than during the CS+ and remained relatively stable. Conditioned food cup entries during both the pre-CS+ and pre-CS− periods were low to start and remained relatively stable (**Figure 5A**). We focused our analysis on food cup entries during the last 5 s of the CS, a time when most food cup behavior occurs during FOC (Holland, 1977, 1980). We analyzed the data using a within subjects repeated measures ANOVA, using factors of Session (1–12), CS epoch (Pre-CS, CS), and CS Discrimination (CS−, CS+). There were main effects of Session [F(11, 253) = 8.7, p < 0.05], CS epoch [F(1, 23) = 353.2, p < 0.05], and CS Discrimination [F(1, 23) = 132.3, p < 0.05]. There were significant interactions of Session × CS epoch [F(11, 253) = 22.8, p < 0.05], Session × CS Discrimination [F(11, 253) = 12.0, p < 0.05] and CS epoch × CS Discrimination [F(1, 23) = 200.4, p < 0.05] as well as a significant Session × CS epoch × CS Discrimination interaction [F(11, 253) = 11.8, p < 0.05].

#### Phase II: Pavlovian Second-order Auditory Discrimination Conditioning

After first-order light discrimination conditioning, we trained all rats in a second-order auditory SOCS+, SOCS− discrimination (**Figures 5B–D**). We focused our analysis of food cup entries during second-order cues across the entire 10 s because most behavior (food cup, rear, head jerk) during second-order conditioning tends to be distributed more evenly across the CS period (e.g., Holland, 1977) in contrast to first-order cues, to which most food cup behavior occurs during the last 5 s of the CS (Holland, 1977, 1980).

All rats increased their food cup entries in response to the auditory SOCS+ over the course of three training sessions, while conditioned food cup entries during the pre-SOCS+ period were low and remained relatively stable. In contrast, food cup entries in response to the auditory SOCS− did not increase over the course of three training sessions and remained relatively stable, while food cup entries during the pre-SOCS− periods were also low and remained stable (**Figure 5B**). Data for SOCS and pre-SOCS responding were analyzed using a within subjects ANOVA, including within subject factors of Session (1–3), SOCS epoch (pre-SOCS, SOCS) and SOCS Discrimination (SOCS−, SOCS+). The analysis of food cup entries showed a main effect of SOCS epoch [F(1, 23) = 88.4, p < 0.05]. There were no main effects of Session or SOCS Discrimination [F(2, 46) = 0.3, p = 0.8, F(1, 23) = 0.2, p = 0.7, respectively] but there were significant interactions of Session × SOCS epoch [F(2, 46) = 3.2, p < 0.05] and Session × SOCS Discrimination [F(2, 46) = 4.3, p < 0.05] as well as a significant interaction of Session × SOCS epoch × SOCS Discrimination [F(2, 46) = 4.1, p < 0.05]. This suggests that we observed acquisition of food cup CR to the auditory SOCS+ associated with the previously reward-paired CS across all rats.

For second-order conditioning we analyzed both compound and elemental FOC trials. The food cup entries during FOC compound and elemental trials are shown in **Figures 5C,D**, respectively. For both compound and elemental FOC cues, food cup entries during CS+ trials remained high and stable over the course of three training sessions, while food cup entries during the pre-CS+ period were low and remained relatively stable. In contrast, food cup entries in response to the FOC CS− trials over the course of the three training sessions were lower than during the CS+ and remained relatively stable, and food cup entries during the pre-CS− period were also low and remained stable. We analyzed the first-order compound and elemental data using a single within subjects ANOVA, including within subjects factors of Session (1–3), CS epoch (pre-CS, CS), CS Discrimination (CS−, CS+) and Stimulus type (elemental, compound). The analysis of food cup responding showed main effects of Session [F(2, 46) = 5.5, p < 0.05], CS epoch [F(1, 23) = 46.8, p < 0.05] and CS Discrimination [F(1, 23) = 51.2, p < 0.05]. There were significant interactions of Session × CS epoch [F(2, 46) = 3.7, p < 0.05] and CS epoch × CS Discrimination [F(1, 23) = 74.7, p < 0.05]. The interactions of Session × CS Discrimination [F(2, 46) = 0.2, p > 0.05] and Session × CS epoch × CS Discrimination [F(2, 46) = 0.6, p > 0.05] were not significant. There was no main effect of Stimulus type [F(1, 23) = 0.1, p > 0.05] nor any other significant interactions (Stimulus type × Session [F(2, 46) = 1.0, p > 0.05], Stimulus type × CS epoch [F(1, 23) = 0.1, p > 0.05], Stimulus type × CS Discrimination [F(1, 23) = 0.1, p > 0.05], Stimulus type × Session × CS epoch [F(2, 46) = 2.4, p > 0.05], Stimulus type × Session × CS Discrimination [F(1, 23) = 0.4, p > 0.05], Stimulus type × CS epoch × CS Discrimination [F(2, 46) < 0.1, p > 0.05], or Stimulus type × Session × CS epoch × CS Discrimination [F(2, 46) = 1.3, p > 0.05]).
This indicated that food cup responding to the light first-order CS+ was maintained even when it was presented in compound with second-order cues but without food reward. That is, food cup responding to the reward-paired first-order cue did not extinguish during the course of second-order conditioning.

#### Phase III: Second-order Probe Test

After second-order discrimination training, we assessed food cup entries, rearing and head jerk responses to the first-order conditioned stimuli (FOCS) and second-order conditioned stimuli (SOCS) under extinction conditions (**Figure 6**). We focused our analysis of food cup entries to the last 5 s of the light FOCS, a time when most food cup behavior occurs (Holland, 1977). We focused our analysis of rearing to the first 5 s of a 10 s visual FOCS, as rearing is more frequent during this time period for visual cues (Holland, 1977, 1980). Accordingly, we report food cup entries for only the last 5 s of both FOCS and SOCS trials, and rearing data for only the first 5 s of the FOCS trials. For auditory second-order conditioning, rearing and head jerk were analyzed across the entire 10 s of the SOCS trials because orienting behavior during second-order conditioning tends to be distributed more evenly across the CS periods (Holland, 1977). In accordance with Experiment 1, we present only results from the first half of the session, that is, the first two trials of each type of stimulus. Data for food cup entries, rearing, and head jerk during first-order and second-order conditioning were analyzed in six separate within subjects ANOVAs including within subjects factors of CS epoch (pre-CS, CS) and CS Discrimination (CS+, CS−).


Overall we observed more food cup entries to FOCS+ trials relative to FOCS− trials while pre-CS responding during both FOCS+ and FOCS− trials was low (**Figure 6A** left), and the same pattern of responding was seen on SOCS+ and SOCS− trials (**Figure 6A** right). The separate analyses of food cup entries for FOCS and SOCS trials showed main effects of CS epoch [F(1, 23) = 98.4 and F(1, 23) = 16.5, FOC and SOC respectively, p < 0.05] and CS Discrimination [F(1, 23) = 96.0 and F(1, 23) = 5.9, FOC and SOC respectively, p < 0.05], as well as a CS epoch × CS Discrimination interaction [F(1, 23) = 88.7, p < 0.05].

Inconsistent with rearing being primarily considered a conditioned response to light cues, percent time spent rearing during light stimuli throughout probe test was lower to FOCS+ relative to FOCS− trials. Pre-CS responding during both FOCS+ and FOCS− trials was also low (**Figure 6B** left). In contrast, percent time spent rearing was greater to auditory SOCS+ trials relative to SOCS− trials while pre-CS responding during both SOCS+ and SOCS− trials was low (**Figure 6B** right). The analysis for percent time spent rearing during the light FOCS and during the auditory SOCS trials showed main effects of CS epoch [F(1, 23) = 61.1 and F(1, 23) = 15.5, FOC and SOC respectively, p < 0.05] and of CS Discrimination for FOCS trials [F(1, 23) = 7.6, p < 0.05]. While there was no main effect of CS Discrimination for SOCS trials [F(1, 23) = 0.03, p > 0.05], there were significant interactions of CS epoch × CS Discrimination for both FOCS and SOCS trials [F(1, 23) = 12.6, and F(1, 23) = 6.8, FOCS and SOCS respectively, p < 0.05].

Consistent with head jerk being primarily considered a conditioned response to auditory cues, the percent time spent head jerking was similar for light FOCS+ and FOCS− trials relative to pre-CS responding for both FOCS+ and FOCS− (**Figure 6C** left), and more time was spent head jerking to auditory SOCS+ trials relative to SOCS− trials. Percent time spent head jerking during pre-CS for both SOCS+ and SOCS− cues was low (**Figure 6C** right). While the analysis for percent time spent head jerking during both FOCS and SOCS trials showed main effects of CS epoch [F(1, 23) = 41.6 and F(1, 23) = 83.4, FOC and SOC respectively, p < 0.05], there was only a main effect of CS Discrimination for SOCS trials [F(1, 23) = 7.6, p < 0.05]. The CS epoch × CS Discrimination interaction only reached significance for SOCS trials [F(1, 23) = 6.6, p < 0.05].

#### Phase IV: Lever Autoshaping Procedure (Sign-tracking Screening Procedure)

After the probe test, we screened rats in the lever autoshaping procedure to determine their Tracking tendency. Rats' performance on three lever measures (contact, latency, and probability) and three food cup measures (contact, latency, and probability) is shown in **Figure 7**. Due to a food cup malfunction during the first session of lever autoshaping that resulted in food cup responding that was greater than three standard deviations outside of the mean for four rats, food cup data from those four rats for all food cup measures were excluded. To maintain the most accurate graphical representation of the lever- and food cup-directed behaviors across all rats, we only eliminated these food cup data during the first lever autoshaping session in which there was a malfunction, shown in **Figure 7**. To maintain integrity of our within subject statistical analysis, the data for food cup measures of those four rats were excluded across all five autoshaping sessions. We analyzed the data using six separate sets of mixed repeated measures ANOVAs, using within subject factors of CS (CS−, CS+) and Session (1–5) and between subjects factor of Tracking group (non-ST, ST). The six analyses were on three lever measures (contact, latency, and probability) and three food cup measures (contact, latency, and probability). The main effects and interactions are reported in **Table 3**. Importantly, as in Experiment 1, the critical CS × Session × Tracking group interactions were significant for all six measures of CR.

# Lack of Individual Differences in Second-order Conditioning Probe Test

Given our a priori prediction that there is individual variability in responding to second-order cues as it relates to tracking tendency, we conducted linear correlation analyses using food cup, rearing, and head jerk CR during the SOCS trials of the second-order conditioning probe test and the later determined composite tracking score (as assessed across lever autoshaping, **Figure 7**). Similar to previous studies (Holland, 1977) we analyzed food cup behavior during the last 5 s of a 10 s CS and we analyzed head jerk and rearing across the entire 10 s of the CS because most orienting behavior during second-order cues tends to be evenly distributed across the CS periods (e.g., Holland, 1977). We did not observe any significant correlations between any of the second-order conditioning measures (food cup, rearing or head jerking) to the SOCS+ (an auditory second-order cue associated with the previously reward-paired first-order cue) and the later determined composite tracking score [SOCS+ and tracking score; r<sup>2</sup> = 0.04, r<sup>2</sup> = 0.05, r<sup>2</sup> = 0.05, food cup, rearing, and head jerk respectively, p > 0.05, **Figure 8** (top row)]. Furthermore, we found no correlation between second-order conditioned responses to the SOCS− (an auditory second-order cue associated with the unrewarded first-order cue) and the later determined composite tracking score [SOCS− and tracking score; r<sup>2</sup> = 0.05, r<sup>2</sup> = 0.06, r<sup>2</sup> = 0.02, food cup, rearing and head jerk, respectively, p > 0.05, **Figure 8** (bottom row)]. Taken together, these results suggested that there is no relationship between CR to the auditory second-order SOCS+ and tracking tendency during lever autoshaping. To confirm that both sign- and non-sign tracking rats expressed the second-order CS discrimination, we include in the Supplementary Materials parallel analyses to Experiment 1 using between subjects factor of Tracking tendency (see Supplementary Figures S1, S2, S3 and accompanying text). This analysis confirmed the negative finding that variability in Pavlovian incentive learning as assessed by second-order conditioning does not relate to tracking phenotype.

TABLE 3 | Experiment 2. Phase IV: Lever autoshaping procedure, summary table of analyses for lever and food cup measures (contact, latency, and probability).
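The correlation analysis described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the sample size of 24 is an assumption inferred from the ANOVA error degrees of freedom, and no data from the study are reproduced. The first function computes r<sup>2</sup> for paired scores (for example, a probe-test CR measure and the composite tracking score); the second converts a reported r<sup>2</sup> into the t statistic used to test a Pearson correlation against zero.

```python
from math import sqrt


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def t_from_r_squared(r2, n):
    """Return (|t|, df) for testing a Pearson correlation against zero."""
    df = n - 2
    return sqrt(r2 * df / (1.0 - r2)), df


N_RATS = 24  # assumption: not stated in this section

# Largest r^2 reported for the probe-test correlations.
t_value, df = t_from_r_squared(0.06, N_RATS)
print(round(t_value, 2), df)
```

Under the assumed sample size, the largest reported r<sup>2</sup> of 0.06 corresponds to |t| of about 1.19 on 22 degrees of freedom, below the two-tailed 0.05 critical value of roughly 2.07, which agrees with the reported p > 0.05.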

# DISCUSSION

We found in Experiment 1 that performance during outcome devaluation probe test correlated with subsequent tracking tendency, such that sign-tracking rats that formed the food-illness association during the CTA phase failed to reduce responding to the light CS during probe test. We found that the marginally significant overall outcome devaluation effect, that is, the difference between paired and unpaired groups, was carried exclusively by the non-sign-tracking rats, which reduced food cup responding to the light CS after devaluation of the food. This phenotypic difference was not due to differences between tracking groups in acquisition of the light-food association or the food-illness association, as both groups similarly acquired these associations in light conditioning and CTA phases, respectively. Nor was this phenotypic difference the result of a non-specific difference in conditioned food cup responding or extinction of that response during probe test, as evidenced by a failure to see any differences between tracking groups in the unpaired conditions. Therefore, the failure of sign-trackers to suppress CR after reward devaluation is likely due to an inability to use stimulus-outcome associations to guide appropriate responding to CS based on the current value of the US. Notably, the finding from our study, which assesses sensitivity to outcome devaluation prior to and outside of the context of autoshaping, stands in contrast to prior studies, which find that several lever-directed behaviors are sensitive to outcome devaluation when the US devaluation occurs in or just after exposure to the autoshaping context itself (Cleland and Davey, 1982; Derman and Delamater, 2014). We suggest that the inflexibility observed in the present study is an additional feature of the sign-tracking phenotype, which is consistent with a recent study that observes resistance to Pavlovian extinction in sign-trackers (Ahrens et al., 2015).
With relevance to inflexibility in drug-seeking, a prior study demonstrates that sign-trackers respond more than goal-trackers to cocaine-paired cues even in the presence of an aversive shock barrier (Saunders and Robinson, 2013). Here we demonstrated that even prior to drug experience, the later determined sign-tracking rats showed difficulty adjusting their cue-driven natural reward seeking behavior after the reward had been devalued. However, a failure to display a flexible reduction in CR after outcome devaluation could be explained instead by enhanced incentive value attributed to the initially appetitive cue, for which enhanced stimulus-response associations would effectively mask any stimulus-outcome driven learning after outcome devaluation.

To address this possibility, in Experiment 2, we examined whether individual variability in the expression of learned appetitive incentive value, as assessed by Pavlovian second-order conditioning, predicts the tracking phenotype. We observed evidence for second-order conditioning across all rats with three different measures of CR (food cup, rearing, head jerk). Performance during second-order conditioning probe test did not correlate with tracking tendency. Despite the lack of a relationship between these factors, we confirmed that both tracking groups expressed second-order cue discrimination in food cup and head jerk CR. Taken together, results from Experiments 1 and 2 suggest that sign- and non-sign-tracking rats learn equally well to attribute value to the previously rewarded first-order CS, which is then similarly able to support second-order conditioning to a novel auditory cue. Thus, sign-tracking rats appear to have specific difficulty displaying flexible behavior to a first-order CS when the US associated with it is devalued.

# Theoretical and Methodological Considerations

Here we used Pavlovian outcome devaluation and second-order conditioning procedures to test two forms of incentive learning. In outcome devaluation the acquired appetitive incentive value of the US was manipulated through pairing with an aversive experience. Prior work suggests that the reduction in CR after outcome devaluation is dependent on stimulus-outcome (S-O) associations mediated by a CS-evoked representation of current value of the US (Holland and Straub, 1979; Colwill and Motzkin, 1994; Gallagher et al., 1999; Pickens et al., 2003). In second-order conditioning the acquired appetitive incentive value of the first-order CS was tested directly by pairing with a novel second-order CS. Previous demonstrations of second-order conditioning assert that a change in CR to a second-order cue is either the result of that cue forming a stimulus-stimulus (S-S) association with a first-order cue (Rizley and Rescorla, 1972; Holland and Straub, 1979) or is dependent on stimulus-response (S-R) associations evoked by the second-order cue that are independent of the first-order cue. The latter S-R associative mechanism is evidence for Pavlovian incentive learning that is insensitive to extinction of the first-order cue or to reward devaluation (Holland and Rescorla, 1975; Holland and Straub, 1979; Holland, 1981; McDannald et al., 2013, but also see Rizley and Rescorla, 1972; Rescorla, 1973; Rashotte et al., 1977; Rescorla, 1980; Nairne and Rescorla, 1981; Rescorla, 1982).

While we did not assess which associative mechanism supports learning in our second-order conditioning procedure, the different types of conditioned response (e.g., food cup entry, head jerk, rearing) can inform whether second-order responding is mediated by S-R or S-S associations (Holland and Rescorla, 1975; Holland and Straub, 1979; McDannald et al., 2013). For instance, acquisition of food cup entry and head jerk responses to second-order cues is evidence for S-R associations, while acquisition of rearing is evidence for S-S associations. We found increased food cup, head jerk, and rearing to the second-order SOCS+ at test, suggesting the formation of both S-R and S-S associations. However, because we did not see evidence for successful discrimination of first-order cues with rearing, we cannot confirm successful S-S driven responding to second-order cues. In addition, if second-order conditioning in our study relied on S-S associations, which are more dependent on the current incentive value of the first-order stimulus than are S-R associations, we would have expected to see a similar relationship between second-order conditioning probe test and tracking tendency to what we observed in the outcome devaluation experiment. Thus, we infer that S-R associations, which are evidence for incentive learning that is insensitive to reward devaluation, are likely the key mechanism mediating learning in our second-order conditioning procedure (Holland and Rescorla, 1975; McDannald et al., 2013).

Another consideration is that we did not observe the tracking-related differences in incentive learning previously identified with conditioned reinforcement procedures (Robinson and Flagel, 2009; Yager and Robinson, 2010, 2013). Previous studies have found that sign-tracking rats show greater conditioned reinforcement effects than goal-tracking rats (Robinson and Flagel, 2009; Yager and Robinson, 2013; Yager et al., 2015), that is, for sign-tracking but not goal-tracking rats, a previously reward-associated Pavlovian cue alone serves as a better reinforcer for the acquisition of a new conditioned instrumental response. It has been suggested from this, together with the observation that Pavlovian lever cues attract sign-trackers to a greater degree than goal-trackers, that sign-trackers attribute greater incentive salience to reward-predictive cues (Robinson and Flagel, 2009). This conclusion stands in contrast to the present results in which we found sign- and non-sign-trackers learn equally well to attribute incentive value to the previously rewarded first-order CS, which is then able to support conditioning to a second-order cue.

There are several theoretical and methodological considerations that may account for this disparity. The first is that two different procedures, conditioned reinforcement and second-order conditioning, have been used to examine the ability of a previously reward-paired cue to support acquisition of CR in new associative learning. Importantly, the associative mechanisms that support CR and the form of the conditioned response itself differ in these two procedures. In conditioned reinforcement, instrumental action results in the reward-predictive stimulus, whereas in second-order conditioning a Pavlovian second-order stimulus precedes the reward-predictive stimulus, independent of action. Individual rats may differ in the extent to which a Pavlovian cue can support Pavlovian vs. instrumental incentive learning known to be mediated by different associative learning mechanisms (Lopez et al., 1992; Corbit and Balleine, 2003).

Another methodological difference between this and prior sign-tracking studies examining incentive learning is that we observed fewer goal-tracking rats, and thus categorized behavior either as sign- or non-sign-tracking. Prior studies using similar two-lever autoshaping procedures (CS+, CS−) to the one used in the present study have also proven very effective in generating sign-tracking in rats (Boakes, 1977; Davey and Cleland, 1982; Kearns and Weiss, 2004; Holland et al., 2014). Prior studies employing single-lever autoshaping typically observe more bimodal distributions of tracking behavior and thus focus the comparison of individual differences in conditioned reinforcement to the two extremes of the tracking continuum, sign- and goal-tracking (Robinson and Flagel, 2009; Saunders and Robinson, 2010; Yager and Robinson, 2010, 2013; Yager et al., 2015). While we observed that the behavior of the non-sign-tracking group is significantly different from that of the sign-tracking group (**Figures 2**, **7**; **Tables 2**, **3**), the pattern of behavior in non-sign-trackers during lever autoshaping closely resembles that of intermediate rats (Flagel et al., 2009), and thus it is possible that our use of non-sign-trackers prevents us from observing differences in expression of previously acquired Pavlovian incentive value that relate to tracking phenotype. However, the sign- and non-sign-tracking behavioral distinction used here was sufficient to observe differences between the two tracking groups in learning when incentive value changes as assessed by outcome devaluation.

Finally, in this study we determined the individual rats' tracking phenotype after assessing two forms of Pavlovian incentive learning, in contrast to prior studies in which sign- and goal-trackers are identified prior to incentive learning behavioral assessments (Flagel et al., 2009; Robinson and Flagel, 2009; Saunders and Robinson, 2010; Yager and Robinson, 2010, 2013; Yager et al., 2015). Here the goal was to limit differences in the individual rats' experience with conditioned and unconditioned stimuli in order to identify whether individual differences in incentive learning mapped onto the tracking phenotypes. We did not see evidence for differences between tracking groups in food cup or rearing (data not shown) behaviors during FOC of the light-food association. Because our assessment of incentive learning occurred in rats that have very similar experience with conditioned and unconditioned stimuli, we may procedurally limit our ability to observe phenotypic differences in the capacity of first-order incentive cues to support new associative learning. However, insofar as tracking phenotype is a behavioral trait and not a behavioral state, the time at which tracking phenotype is identified would not be expected to drive differences between results of our study and others.

# Candidate Brain Mechanisms Underlying Individual Differences in Incentive Learning

The behavioral results of the present study suggest that both sign- and non-sign-tracking rats attribute similar levels of appetitive incentive value to reward-paired cues, while only non-sign-tracking rats are able to flexibly adjust behavior in response to reward-paired cues for which the associated reward had been devalued. The brain circuits mediating Pavlovian outcome devaluation, second-order conditioning, and sign-tracking have considerable overlap. The functional impact of lesioning or disrupting activity in basolateral amygdala (BLA) and nucleus accumbens (NAc) has been demonstrated in each of the paradigms used in the present study. Just as pre-training NAc lesions impair single-reinforcer outcome devaluation (Singh et al., 2010) and acquisition of second-order conditioning (McDannald et al., 2013), they also impair acquisition but not maintenance of sign-tracking behavior (Chang et al., 2012), which is consistent with a role for NAc in acquisition of initial incentive value [however see Chang and Holland (2013) for lack of core and shell alone effects in lever-directed behavior]. Pre-training BLA lesions impair acquisition of incentive value to first-order cues in both outcome devaluation and second-order conditioning procedures (Hatfield et al., 1996). In contrast, BLA lesions do not interfere with the acquisition of sign-tracking behavior, but instead impact the maintenance of previously acquired sign-tracking behavior observed during lever autoshaping (Chang et al., 2012). Disconnection lesions of BLA and NAc that eliminate communication between these two areas impair both second-order conditioning (Setlow et al., 2002) and acquisition and maintenance of sign-tracking behavior (Chang et al., 2012), which shows a common function for the BLA to NAc circuit in mediating attribution of incentive value to conditioned stimuli.

While caution should be taken when attempting to infer from our behavioral results what brain regions might account for individual differences reported here, the present finding of intact reward devaluation effects in non-sign-trackers, but not in sign-trackers, suggests that BLA's reciprocal interactions with more specialized areas, such as orbitofrontal cortex or insular cortex, known to be critical for the expression of stimulus-outcome learning and goal-directed action (Pickens et al., 2003; Johnson et al., 2009; Parkes and Balleine, 2013) may be differentially involved in the two tracking groups. The interaction between insular cortex and NAc in retrieval of incentive value for goal-directed action has also been established (Parkes et al., 2015) and may be of interest with relevance to individual differences.

# Relevance of Tracking-Related Individual Differences for Understanding Addiction Vulnerability

Rodent studies that evaluate individual differences in sign- and goal-tracking behavior have demonstrated that heightened incentive motivation for natural rewards serves as an informative predictor of heightened motivation for drug rewards (Tomie, 1996; Flagel et al., 2009; Robinson and Flagel, 2009; Saunders and Robinson, 2010; Saunders et al., 2013; Yager and Robinson, 2013; Yager et al., 2015). Such pre-clinical procedures aimed at assessing behavioral markers of addiction-vulnerable individuals prior to drug-exposure may have relevance for human addiction. A promising recent study establishes a paradigm for assessing sign- and goal-tracking behaviors in humans (Garofalo and di Pellegrino, 2015); however, the link between this procedure and human addiction has yet to be established.

A prominent theme in the addiction field is to understand whether the aberrant behavior of the addicted individual existed prior to drug-experience or whether it was drug-induced. The behavioral results presented here showed that inflexibility to changes in incentive value is evident prior to drug-experience in sign-tracking individuals, which previous studies have shown to have a greater sensitivity to drug-associated discrete cues. Studies directly examining the effects of amphetamine exposure on sign- and goal-tracking behaviors are mixed, sometimes resulting in more sign-tracking behaviors (Doremus-Fitzwater and Spear, 2011; Robinson et al., 2015) or in other studies more goal-tracking behaviors (Simon et al., 2009; Holden and Peoples, 2010). With relevance to the current study, prior cocaine exposure interferes with both stimulus-outcome mediated behavior in outcome devaluation (Schoenbaum and Setlow, 2005) and acquisition and use of learned incentive value to support second-order conditioning (Saddoris and Carelli, 2014). Taken together, it is likely a complex interplay of pre-existing individual differences that predispose addiction vulnerability together with drug-induced neuroadaptations that drive the seemingly aberrant behavior of drug addicted individuals. Here we used classic conditioning procedures with well-defined psychological and neurobiological underpinnings in order to determine whether individual variability in incentive processes maps in a meaningful way onto the tracking phenotypes. Accounting for individual differences is likely a useful tool for understanding the brain basis for variability in natural and drug-reward seeking behaviors.

# AUTHOR CONTRIBUTIONS

YC and HN contributed equally to this work. YC, HN, and KF acquired the data; YC and HN analyzed the data; DC, YC, and HN designed the experiments and interpreted the data; DC conceived and supervised the project; DC, YC, HN, and KF contributed to the write-up of the final version.

# FUNDING

NIDA IRP.

# ACKNOWLEDGMENTS

The work was supported by the Intramural Research Program of the National Institute on Drug Abuse. The authors declare that they do not have any conflicts of interest (financial or otherwise) related to the data presented in this manuscript. We would like to acknowledge Michael McDannald, Charles Pickens, and Guillermo Esber for their willingness to engage in thoughtful discussions, provide insightful feedback, and comment on the manuscript.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbeh.2015.00289

## REFERENCES


devaluation effects on Pavlovian conditioned responding. Front. Integr. Neurosci. 4:126. doi: 10.3389/fnint.2010.00126


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Nasser, Chen, Fiscella and Calu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Individual differences in the influence of task-irrelevant Pavlovian cues on human behavior

Sara Garofalo<sup>1,2,3</sup>\* and Giuseppe di Pellegrino<sup>1</sup>

<sup>1</sup> Center for Studies and Research in Cognitive Neuroscience, Department of Psychology, University of Bologna, Cesena, Italy, <sup>2</sup> Department of Psychiatry, University of Cambridge, Cambridge, UK, <sup>3</sup> Behavioural and Clinical Neuroscience Institute, Department of Psychology, University of Cambridge, Cambridge, UK

Pavlovian-to-instrumental transfer (PIT) refers to the process of a Pavlovian reward-paired cue acquiring incentive motivational properties that drive choices. It represents a crucial phenomenon for understanding cue-controlled behavior, and it has both adaptive and maladaptive implications (e.g., drug-taking). In animals, individual differences in the degree to which such cues bias performance have been identified in two types of individuals that exhibit distinct Conditioned Responses (CR) during Pavlovian conditioning: Sign-Trackers (ST) and Goal-Trackers (GT). Using an appetitive PIT procedure with a monetary reward, the present study investigated, for the first time, the extent to which such individual differences might affect the influence of reward-paired cues in humans. In a first task, participants learned an instrumental response leading to reward; then, in a second task, a visual Pavlovian cue was associated with the same reward; finally, in a third task, PIT was tested by measuring the preference for the reward-paired instrumental response when the task-irrelevant reward-paired cue was presented, in the absence of the reward itself. In ST individuals, but not in GT individuals, reward-related cues biased behavior, resulting in an increased likelihood to perform the instrumental response independently paired with the same reward when presented with the task-irrelevant reward-paired cue, even if the reward itself was no longer available (i.e., stronger PIT effect). This finding has important implications for developing individualized treatment for maladaptive behaviors, such as addiction.

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

Jeremy J. Clark, University of Washington, USA; Donna J. Calu, National Institute on Drug Abuse, USA

#### \*Correspondence:

Sara Garofalo, Department of Psychiatry, University of Cambridge, Forvie Site, Robinson Way, Cambridge CB2 0SR, UK garofalosara56@gmail.com

> Received: 07 April 2015 Accepted: 08 June 2015 Published: 24 June 2015

#### Citation:

Garofalo S and di Pellegrino G (2015) Individual differences in the influence of task-irrelevant Pavlovian cues on human behavior. Front. Behav. Neurosci. 9:163. doi: 10.3389/fnbeh.2015.00163

Keywords: Pavlovian-to-instrumental transfer, cue-controlled behavior, Sign-Tracker, Goal-Tracker, reinforcement learning

# Introduction

Goal-directed behavior can be variably influenced by external and internal factors which impact the values and priorities assigned to rewards and goals (Doya, 2008). One of the simplest and most effective mechanisms for influencing choice is reinforcement learning. Reinforcement learning allows animals to connect spatially and/or temporally related events in order to predict future events. Given the complexity of the animal's environment, learning that an arbitrary cue (e.g., a sound) is predictive of a certain goal (e.g., obtain a reward, such as food) allows the animal to learn a flexible response that facilitates achievement of the goal itself. In most cases such cue-controlled behavior is adaptive; for example, it helps one obtain food when hungry (Perks and Clifton, 1997; Holmes et al., 2010). However, an inflexible association can lead to perseverance in the same choice even if the goal itself is no longer available, or has negative long-term consequences (Holmes et al., 2010). For example, a cue associated with drugs can induce relapse even when the drug is not voluntarily sought, and a sign associated with food can induce craving in the absence of hunger, leading to compulsive over-eating (Volkow et al., 2008). These biases on voluntary choice are also implemented in marketing strategies, such as advertisements, to influence consumer behavior (Smeets and Barnes-Holmes, 2003; Bray et al., 2008; de Wit and Dickinson, 2009). Cue-controlled behaviors have been interpreted as the endpoint of an initial intentional seeking behavior (of a reward), which leads to habitual, and ultimately compulsive, conduct characterized by a loss of control over behavior (Everitt and Robbins, 2005).
This interesting framework proposes that the transition from intentional volition to habit and compulsion can be explained by interactions between Pavlovian and instrumental learning processes: a reward acts as an instrumental reinforcer by enhancing actions that are able to produce it, while Pavlovian learning confers incentive salience to cues (Conditioned Stimuli or CS) closely associated with the reward (Everitt and Robbins, 2005). Such cues can elicit craving and motivation towards the associated reward, thus biasing choice. Well-known evidence of this effect can be found in the so-called Pavlovian-to-Instrumental Transfer (PIT) effect (Estes, 1943, 1948). PIT captures the ability of a Pavlovian cue (i.e., a CS associated with a reward) to increase the likelihood of an instrumental response independently paired with the same (specific-PIT), or a similar (general-PIT), reward (Rescorla and Solomon, 1967; de Wit and Dickinson, 2009; Holmes et al., 2010). This effect emerges without any formal association between Pavlovian and instrumental contingencies, and even when the reward itself is no longer available (Talmi et al., 2008). PIT has been mainly studied in non-human animals (Rescorla and Solomon, 1967; Lovibond, 1981; Colwill and Rescorla, 1988; Balleine, 1994; Rescorla, 1994a, 1997, 2000; Delamater, 1995, 1996; Holland et al., 2002; Corbit and Balleine, 2003; Holland and Gallagher, 2003; Holland, 2004; Delamater and Holland, 2008; for review, see Dickinson and Balleine, 1994, 2002; Holmes et al., 2010), but some recent studies have also reported this effect in humans (Paredes-Olay et al., 2002; Hogarth et al., 2007, 2010, 2013a,b; Bray et al., 2008; Allman et al., 2010; Nadler et al., 2011; Prévost et al., 2012; Lovibond and Colagiuri, 2013).

An important, but still neglected, aspect in the human literature about PIT concerns individual differences. In the animal literature, the extent to which a Pavlovian cue becomes attractive and exerts a biasing effect varies between individuals. In particular, Sign-Trackers (ST) and Goal-Trackers (GT) have been shown to have different learning styles, consisting of a tendency to attribute more or less incentive salience to Pavlovian reward-associated cues. In a typical Pavlovian conditioning paradigm, a CS (e.g., lever presentation) is paired with a reward (e.g., food pellet), which is delivered in a different spatial position. In such a situation, two different Conditioned Responses (CR; i.e., learned responses to a previously neutral stimulus) might be expressed. Some animals approach and engage the CS (the Sign) itself and, only after its termination, reach the location of reward delivery; other animals, upon CS presentation, immediately engage the location of reward delivery (the Goal), even if it is not yet available. The first CR has been categorized as Sign-Tracking behavior, while the second CR has been categorized as Goal-Tracking behavior. ST and GT can be conceived of as different learning styles, expressed through a specific CR during Pavlovian learning. ST behavior is thought to arise from the attribution of incentive salience to Pavlovian reward-paired cues, which consequently become a powerful source of motivation for future behavior (Flagel et al., 2011). In ST, incentive stimuli become attractive, eliciting approach towards them and promoting potentially maladaptive cue-controlled behaviors; ST individuals, indeed, are generally more vulnerable to addiction and relapse (Tomie et al., 1998; Flagel et al., 2008; Robinson and Flagel, 2009). 
The ST and GT profiles do not seem to be limited to the CR expressed, but are also associated with differences in traits such as impulsivity; ST individuals are characterized by higher levels of impulsive behavior compared to GT individuals (Tomie et al., 2000; Flagel et al., 2009).

A deeper investigation into individual differences in attributing incentive salience to reward-paired stimuli would thus be important for understanding and reducing the propensity to develop maladaptive behaviors.

The aim of the present study was to investigate individual differences in human PIT. Specifically, the present study explored, for the first time in humans, whether individual differences in the propensity to approach and engage a Sign (cue-predicting reward) or a Goal (reward) are predictive of cue-controlled behavior. To this end, a typical PIT experimental design was used, comprising three tasks. In the first phase, participants performed an Instrumental Conditioning task, in which they were presented with two possible choices, one paired with an actual monetary win (Rewarded Choice) and the other paired with a neutral outcome (Unrewarded Choice). In a subsequent session, participants performed a Pavlovian Conditioning task, during which they learned to associate a specific visual cue with an actual monetary win (CS+), and another visual cue with a neutral outcome (CS−). During this phase, eye-movements were recorded and subsequently analyzed in order to identify the expressed CR and characterize participants as ST or GT. Mirroring previous studies conducted in animals (Boakes, 1977; Flagel et al., 2007, 2008, 2011; Saunders and Robinson, 2013), in which the CR is identified based on the amount of approaching behavior expressed during CS presentation, in the present study ST and GT participants were distinguished based on a learned oculomotor CR. Specifically, the tendency to direct contiguous eye-gazes toward the location where the visual CS (Sign) or the reward (Goal) would be presented was measured. Finally, PIT was tested in an extinction phase (without any rewards), during which participants had to choose between the same two options given during instrumental conditioning, while presented with the task-irrelevant CS.
In this final phase, PIT would be observed if presentation of the CS+, compared to the CS−, enhanced instrumental responses to the choice rewarded during instrumental conditioning (Congruent Choice), relative to the previously unrewarded choice (Incongruent Choice). If consistent with animal literature, this effect should be stronger in ST individuals than in GT individuals, possibly indicating a stronger biasing effect of Pavlovian cues over behavior in the first group relative to the second.

# Method

#### Participants

Forty-five volunteers (27 female; 2 left-handed; mean age = 24.87, sd = 2.5; mean education = 17.53, sd = 1.5) with no history of neurological diseases were recruited from the student population at the University of Bologna. All participants gave written informed consent to take part in the experiment and received payment corresponding to the amount earned during the tasks. The study was conducted in accordance with institutional guidelines and the 1964 Declaration of Helsinki. It was approved by the Ethics Committee for Psychological Research at the University of Bologna.

#### Stimuli and Procedure

The whole experiment consisted of three tasks. The same visual background was used in all three tasks. Four black squares (4 cm<sup>2</sup>) were displayed on a 17-inch color monitor with a black background. The squares were highlighted by a white frame and positioned as follows: top center, bottom center, right center, left center. Two black-and-white fractal images (balanced for luminance, complexity and color saturation) were used as Pavlovian cues (CS) and presented within the top center square. An image of a 10 euro cent coin was used as the reward, and a light-yellow circle (equally sized) was used as the neutral outcome (no-reward). Both these visual cues appeared within the bottom center square (**Figure 1**). A computer running Presentation software (Neurobehavioral Systems, Albany, CA, USA) controlled stimulus presentation. On arrival, participants were comfortably seated in a silent room and their position was centered relative to the screen, at a viewing distance of 60 cm from the eye-tracker and 75 cm from the screen. The eye-tracker was positioned under the screen, and was centered relative to both the screen and the participant. Eye-movements and behavioral responses were collected throughout the experiment and stored for offline analysis. Participants were asked to remain as still as possible to avoid confounding effects on eye-movements. The whole experiment was conducted in a dark room to facilitate eye-movement recording. The experimental session began with calibration of the eye-tracker device, during which the participant fixated nine specific points on the computer screen. The experimental session followed the standard paradigm for testing PIT.
It was composed of three tasks administered in succession: an Instrumental Conditioning task, in which participants learned a response-contingent reward; a Pavlovian Conditioning task, in which participants learned a cue-contingent reward; and a PIT task, during which the influence of irrelevant Pavlovian cues on instrumental responding was tested. In each task, participants were required to pay attention to the screen and follow the instructions reported at the beginning of the task. A few example trials were always performed and, if necessary, further clarifications were given before beginning each task. At the end of the experimental session, participants completed the Barratt Impulsiveness Scale (BIS-11; Patton et al., 1995). Previous studies on animals reported an association between Sign-Tracking behavior and reduced impulse control (Flagel et al., 2011). Thus, this measure allowed further investigation into the differences between ST and GT individuals.

#### Instrumental Conditioning Task

Participants were instructed to choose between two squares to gain a reward. One square was paired with an actual monetary win (Rewarded Choice), while the other was paired with a neutral outcome (Unrewarded Choice). The right and left squares were presented in white and indicated as possible choices to be selected by a mouse click. The mouse pointer was centrally positioned before each choice, so as not to encourage a specific choice. Only one square was associated with a reward following a partial reinforcement schedule, so that between one reward and the next a variable interval between 4 and 12 s was always associated with no-reward. After each choice, a corresponding neutral image (light-yellow circle) or reward image (10 euro cent coin) appeared for 1 s in the bottom square (**Figure 1A**). Participants were aware that they would receive an actual payment corresponding to the number of coins collected during the task. The association between square and outcome was counterbalanced across subjects. The rationale of this task was to make participants learn an association between a specific response (left or right square) and the reward; thus, participants would get a higher frequency of Rewarded Choices if they learned the correct association. The task lasted about 6 min, during which subjects were free to perform as many choices as they wished, with no time pressure.
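The partial reinforcement schedule described above can be illustrated with a minimal sketch. It assumes that, after each reward, a no-reward interval is drawn uniformly from 4 to 12 s and that the first reward is available immediately; the uniform draw, the function name `simulate_schedule`, and the click times are illustrative assumptions rather than reported details of the task.

```python
import random

def simulate_schedule(click_times, rewarded_flags, lo=4.0, hi=12.0, seed=0):
    """Return one outcome ('coin' or 'neutral') per click.

    click_times    -- click timestamps in seconds, in ascending order
    rewarded_flags -- True where the click was on the reward-paired square
    lo, hi         -- bounds of the no-reward interval (4-12 s in the text);
                      drawing it uniformly is an assumption of this sketch
    """
    rng = random.Random(seed)
    next_available = 0.0  # assumed: the first reward is available at once
    outcomes = []
    for t, on_rewarded_square in zip(click_times, rewarded_flags):
        if on_rewarded_square and t >= next_available:
            outcomes.append("coin")
            # Start a new no-reward interval after every delivered reward.
            next_available = t + rng.uniform(lo, hi)
        else:
            outcomes.append("neutral")
    return outcomes

# Hypothetical participant clicking the reward-paired square every 2 s.
clicks = [float(s) for s in range(0, 60, 2)]
flags = [True] * len(clicks)
outcomes = simulate_schedule(clicks, flags)
```

Under these assumptions, clicks on the reward-paired square made during the no-reward interval produce the neutral outcome, so a Rewarded Choice is not rewarded on every click.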

#### Pavlovian Conditioning Task

In each trial, one of two possible visual cues (fractal images) appeared for 5 s within the top square, followed by a white patch within the bottom square. Upon presentation of the patch, participants were instructed to press the left-Ctrl button on the keyboard as quickly as possible to remove the patch and discover the outcome hidden below. To perform this button press, participants did not need to remove their gaze from the screen. The outcome was then presented for 1 s. One fractal was associated with a reward (10 euro cent coin) on 80% of trials (CS+), while the other fractal was associated with no-reward (light-yellow circle) on all trials (CS−; **Figure 1B**). The task consisted of 40 trials (20 per condition) with a variable inter-trial interval between 0.5 and 4 s. Participants were aware that they would receive an actual payment corresponding to the number of coins collected during the task. The association between visual cue and outcome was counterbalanced across subjects. The whole task lasted around 6 min.
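As an illustration of the trial structure just described, the following sketch builds a 40-trial list. It assumes that exactly 16 of the 20 CS+ trials are rewarded (rather than each trial being rewarded with probability 0.8), that the inter-trial interval is drawn uniformly, and that trial order is fully randomized; none of these details are stated in the text, and `build_pavlovian_trials` is a hypothetical helper.

```python
import random

def build_pavlovian_trials(n_per_cue=20, p_reward=0.8,
                           iti_range=(0.5, 4.0), seed=1):
    """Return a shuffled list of trial dicts with cue, outcome and ITI."""
    rng = random.Random(seed)
    n_rewarded = round(n_per_cue * p_reward)  # 16 of 20 CS+ trials
    template = (
        [{"cue": "CS+", "outcome": "coin"}] * n_rewarded
        + [{"cue": "CS+", "outcome": "neutral"}] * (n_per_cue - n_rewarded)
        + [{"cue": "CS-", "outcome": "neutral"}] * n_per_cue
    )
    # Copy each template entry and attach an inter-trial interval in seconds
    # (uniform sampling is an assumption of this sketch).
    trials = [dict(t, iti=rng.uniform(*iti_range)) for t in template]
    rng.shuffle(trials)
    return trials

trials = build_pavlovian_trials()
```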

The Pavlovian speeded reaction time response described above ("press the button upon patch presentation") has been successfully used in previous studies (Talmi et al., 2008) and was introduced to obtain a behavioral measure of Pavlovian conditioning. The main reason for using a speeded response was to mirror PIT studies on animals, in which Pavlovian conditioning is measured by a behavior performed to gain the reward (e.g., latency of the first nose-poke or frequency of nose-pokes; Dickinson et al., 2000; Holland, 2004; Corbit and Balleine, 2005). The rationale here is to observe faster reaction times when a reward was predicted (CS+ condition) than when a neutral outcome was predicted (CS− condition). To avoid a possible instrumental influence on the task, participants were explicitly told that, in this task, the reward was not contingent on their response. It was demonstrated that, if no answer was given, the patch would disappear anyway after 1.5 s, revealing the outcome. Importantly, this speeded reaction time response

To identify ST and GT CR, eye-movements were recorded in order to evaluate contiguous eye-gazes directed toward the "Sign" (top center square) and the "Goal" (bottom center square). Mirroring animal studies, these two CR were subsequently used to distinguish participants as ST or GT, depending on the tendency to direct eye-gaze toward the Sign or the Goal during the 5 s of CS presentation (Flagel et al., 2011).

#### Pavlovian-to-Instrumental Transfer (PIT) Task

Participants received exactly the same instructions as in the Instrumental Conditioning phase, requiring them to choose between the right and left white squares. The task was identical to the Instrumental Conditioning task, except in two respects: first, the task-irrelevant Pavlovian CS were presented sequentially within the top square, changing every 30 s; second, the task was performed entirely in extinction, so all choices always led to no-reward (**Figure 1C**). Extinction is a standard procedure for assessing PIT, both in human and animal research, since it allows one to test the influence of Pavlovian cues on instrumental responding without the confounding effects of the reward (Rescorla, 1994a,b; Corbit et al., 2001; Bray et al., 2008; Talmi et al., 2008). Indeed, the rationale here is to test the ability of a task-irrelevant Pavlovian cue to drive choices (presumably, towards the response previously associated with a reward) even if the reward is not available anymore. The PIT task lasted about 6 min, during which subjects were free to perform as many choices as they wished, with no time pressure.

#### Eye Tracking

Eye movements were recorded in a dimly lit room using a Pan/Tilt optic eye-tracker (Eye-Track ASL-6000) which registers real-time gaze at 50 Hz. Data acquired during the Pavlovian Conditioning task were analyzed offline using EyeNal Analysis Software (ASL). Dwell time during the 5 s of CS presentation was then measured for two specific areas of interest (AOI): "Sign", corresponding to the 4 cm square at the top center, plus a 1 cm margin; "Goal", corresponding to the 4 cm square at the bottom center, plus a 1 cm margin. Dwell time was defined as the amount of time during which a series of contiguous fixations remained within the same AOI.
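The dwell-time definition above can be sketched as follows, assuming fixations have already been detected and labeled with the AOI they fall in. The input format, the function names `dwells` and `total_dwell`, and the example fixations are hypothetical; this is not the EyeNal implementation, and summing all dwells per AOI within a trial is an assumption about how the per-trial value was obtained.

```python
from itertools import groupby

def dwells(fixations):
    """Collapse temporally ordered fixations into dwells.

    fixations -- list of (aoi, duration_ms) tuples; aoi is 'Sign', 'Goal',
                 or None when gaze falls outside both areas of interest
    Returns (aoi, dwell_ms) pairs, one per run of contiguous fixations
    that remain within the same AOI.
    """
    result = []
    for aoi, run in groupby(fixations, key=lambda f: f[0]):
        if aoi is not None:
            result.append((aoi, sum(duration for _, duration in run)))
    return result

def total_dwell(fixations, aoi):
    """Sum all dwells on one AOI within a trial (an assumed aggregation)."""
    return sum(ms for label, ms in dwells(fixations) if label == aoi)

# Hypothetical fixations recorded during one 5 s CS presentation.
trial = [("Sign", 300), ("Sign", 260), (None, 200), ("Goal", 400), ("Sign", 160)]
sign_ms = total_dwell(trial, "Sign")
goal_ms = total_dwell(trial, "Goal")
```

In this example the first two fixations form a single dwell on the Sign, because they are contiguous and stay within the same AOI.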

#### Sign-Tracker and Goal-Tracker Categorization

Participants were categorized as ST or GT based on the oculomotor CR expressed during the Pavlovian Conditioning task. Previous studies used approaching and engaging behaviors during Pavlovian Conditioning to identify ST and GT. In these studies, the numbers of contacts with the Sign (i.e., lever) and the Goal (i.e., food tray) were compared to obtain an index of behavior, and divide the subjects into ST (i.e., high probability to engage the lever) and GT (i.e., high probability to engage the food-tray) individuals (Flagel et al., 2007, 2008, 2011; Robinson and Flagel, 2009; Saunders and Robinson, 2013; Robinson et al., 2014). This method was adapted in the present experiment by calculating contiguous eye-gazes (Dwell Time) toward the cue (Sign) and the reward (Goal) AOI, during CS presentation (see above). ST behavior has been defined as a CR to approach and engage "the cue or sign that indicates impending reward delivery"; while GT behavior has been defined as a tendency to "engage the location of unconditioned cue delivery, even though it is not available until conditioned cue termination" (Flagel et al., 2011). Thus, a learned oculomotor CR towards the location of the Sign or the Goal is a practical method for distinguishing between ST and GT individuals. On this basis, an eye-gaze index was created based on the Dwell Time spent on the Sign and Goal locations. An individual dwell is defined as the time period during which a fixation or series of temporally contiguous fixations remain within an AOI. That is, an individual dwell is defined as the sum of the durations across all fixations within the current AOI, from entry to exit. To compute fixations, EyeNal ASL was used, which defines a fixation if the observer's gaze position remains within a diameter of 0.5° of visual angle for at least 120 ms (six consecutive samples, at 50 Hz sampling rate; Eye-Analysis software Manual, v. 1.41, Applied Science Laboratories, 2007).
The Dwell Time spent on the Sign and Goal locations was calculated for each trial and then averaged for each participant. The eye-gaze index was calculated as the difference between the Dwell Time on Sign and the Dwell Time on Goal, divided by the total Dwell Time, i.e., (Sign − Goal)/(Sign + Goal), so that a higher value corresponded to a higher Dwell Time toward the Sign (Sign-Tracking behavior) and a lower value corresponded to a higher Dwell Time toward the Goal (Goal-Tracking behavior). Since the interest here was to disentangle two reward-specific CR, only CS+ trials in the second half of the task were considered, when contingency learning was more established. Based on this index, the top and bottom 50% of the total sample were categorized as ST (eye-gaze index between 0.38 and 1.00) and GT (eye-gaze index between −1.00 and 0.27), respectively.
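The index and the split into the top and bottom halves of the sample can be sketched as below. The function names, the handling of a zero total Dwell Time, the treatment of values tied with the median, and the participant values are assumptions made for illustration only; they are not taken from the study.

```python
from statistics import median

def eye_gaze_index(sign_dwell, goal_dwell):
    """(Sign - Goal) / (Sign + Goal), ranging from -1 (Goal) to +1 (Sign)."""
    total = sign_dwell + goal_dwell
    if total == 0:
        return None  # assumption: undefined if neither AOI was looked at
    return (sign_dwell - goal_dwell) / total

def median_split(indices):
    """Label each participant 'ST' (above the median) or 'GT' (otherwise).

    indices -- dict mapping participant id to eye-gaze index
    Values equal to the median are labeled 'GT' here; how ties were
    handled in the study is not reported.
    """
    cut = median(indices.values())
    return {p: ("ST" if value > cut else "GT") for p, value in indices.items()}

# Hypothetical mean indices from CS+ trials in the second half of the task.
indices = {"p1": 0.9, "p2": 0.4, "p3": 0.1, "p4": -0.6}
groups = median_split(indices)
```

With these illustrative values the median is 0.25, so `p1` and `p2` are labeled ST and `p3` and `p4` are labeled GT.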

# Results

#### ST and GT CR

To ensure that the oculomotor responses used to categorize ST and GT individuals were learned CRs, eye-gaze indices were separately analyzed for CS+ and CS− trials in the first and second halves of the Pavlovian Conditioning task. Two separate mixed-effects models with Group (ST/GT) and Hemiblock (1/2) as independent variables were performed for CS+ and CS− conditions. The eye-gaze index described above was the dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results from CS+ trials showed a significant interaction effect (F(1,42) = 14.75; two-tailed p = 0.0004; partial η<sup>2</sup> = 0.26). Bonferroni-corrected post hoc tests revealed a significant difference (p = 0.003) between ST (mean = 0.35; sd = 0.77) and GT (mean = −0.06; sd = 0.79) in the second Hemiblock (**Figure 2A**). No other post hoc comparisons were significant (ps > 0.15). Results from CS− trials did not show any significant effects (ps > 0.05; **Figure 2B**). Overall, these results indicate two important points: first, a bias toward either the Sign or the Goal is a learned CR, since it is not present at the beginning of the task but emerges later in time, when contingencies have been learned (**Figure 2A**); second, this looking bias is specific to the reward-paired cue (CS+), as no differences were observed for the unpaired cue (CS−; **Figure 2B**). **Figure 2A** shows that, at the beginning of the Pavlovian task, no tendency is evident during CS+ presentation, whereas towards the end ST show higher Dwell Time towards the Sign (eye-gaze index increases) while GT show higher Dwell Time towards the Goal (eye-gaze index decreases). **Figure 2B**, on the other hand, shows that the same pattern is not observable during the presentation of the neutral stimulus (CS−).

**Figure 2** (caption fragment): (Panel E) shows visual exploratory behavior in the two groups (ST = Sign-Trackers; GT = Goal-Trackers) throughout the task. Bars indicate standard error of the mean. \*p < 0.05; \*\*p < 0.01.

To further test that this behavior is a reward-specific CR, the eye-gaze index was also directly compared between CS+ and CS− trials from the second hemiblock (when contingencies had been learned) within each group. Two separate paired t-tests were performed for the ST and GT groups, using Condition (CS+/CS−) as the independent variable and the eye-gaze index as the dependent variable. In both groups a significant difference between the two conditions was found. The ST group showed a significantly higher eye-gaze index in the CS+ condition than in the CS− condition (t(21) = 1.69; one-tailed p = 0.03; Cohen's d = 0.19), indicating a greater tendency to direct contiguous eye-gazes towards the Sign during CS+ trials than during CS− trials (**Figure 2C**). The GT group showed a significantly lower eye-gaze index in the CS+ condition than in the CS− condition (t(21) = 2.21; one-tailed p = 0.01; Cohen's d = 0.24), indicating a greater tendency to direct contiguous eye-gazes towards the Goal during CS+ trials than during CS− trials (**Figure 2D**).

Given the specific spatial locations of the Sign and the Goal in the present paradigm, visual exploratory behavior was also considered by analyzing the total dwell time spent on the top and the bottom portions of the screen, in order to exclude the presence of a spatial bias that could account for ST and GT behavior. A mixed-effects model was used, with Group (ST/GT) and AOI (Top/Bottom) as independent variables and Total Dwell Time as dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results showed a marginal main effect of AOI (F(1,42) = 4.01; two-tailed p = 0.05; partial η<sup>2</sup> = 0.09), with more Dwell Time spent on the Top of the screen (mean = 0.76; sd = 0.91) than on the Bottom (mean = 0.41; sd = 0.64) in both groups (**Figure 2E**). Neither group differences nor interaction effects emerged (ps > 0.87). These results strengthen the evidence that the behavioral differences observed between ST and GT cannot be ascribed to a mere spatial bias towards the upper or the lower part of the screen. The general difference in time spent looking at the Top and the Bottom of the screen is compatible with the fact that dwell time was calculated during the 5 s of CS presentation. These results thus indicate that both groups spent more time visually exploring the region of the screen where a stimulus was being presented (Top), rather than where there was no stimulus (Bottom). No difference in this spatial bias was found between the two groups (**Figure 2E**).

Taken together, the last two analyses demonstrated that group differences in the tendency to direct contiguous eye-gazes to the location of the Sign or the Goal cannot be ascribed to a mere spatial bias, but rather reflect a learned reward-related CR.

#### Instrumental Conditioning

To ensure that instrumental conditioning was successful in both the ST and the GT groups, so that all participants learned which response leads to a reward, the numbers of choices (mouse clicks) made on the two white squares were compared. Choosing the square associated with reward was considered a Rewarded Choice, and choosing the square associated with no-reward was considered an Unrewarded Choice. A mixed-effects model was used, with Choice (Rewarded/Unrewarded) and Group (ST/GT) as independent variables and the number of choices as the dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results showed a main effect of Choice (F(1,42) = 20.88; two-tailed p < 0.0001; partial η<sup>2</sup> = 0.33), with Rewarded Choices (mean = 32.80; sd = 9.38) occurring more frequently than Unrewarded Choices (mean = 22.09; sd = 9.10; **Figure 3A**). Neither group differences nor interaction effects emerged (ps > 0.55). These results indicate that the ST and GT groups learned to discriminate between the rewarding and non-rewarding choices equally well.

#### Pavlovian Conditioning

To ensure that Pavlovian learning occurred in both ST and GT groups, reaction times to patch presentation were analyzed. If participants correctly learned to discriminate between the two Pavlovian cues, faster reaction times should be observed for CS+ trials relative to CS− trials. A mixed-effects model was used, with Condition (CS+/CS−) and Group (ST/GT) as independent variables, and reaction times as the dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results showed a significant main effect of Condition (F(1,42) = 110.24; two-tailed p = 0.0001; partial η<sup>2</sup> = 0.72), with faster reaction times for CS+ trials (mean = 306.33; sd = 44.41) relative to CS− trials (mean = 351.21; sd = 50.05; **Figure 3B**). Neither group differences nor interaction effects emerged (ps > 0.29). These results indicate that participants generally reacted more quickly to the patch on trials with the reward-paired cue (CS+) than on trials with the unpaired cue (CS−). This reward-specific response facilitation indicates successful Pavlovian conditioning in both ST and GT.

#### Pavlovian-to-Instrumental Transfer

To test for PIT, the numbers of Congruent choices (associated with the reward during Instrumental Conditioning) and Incongruent choices (associated with no-reward during Instrumental Conditioning) during CS+ and CS− presentation were compared. A response index was calculated as the probability of selecting the Congruent choice minus the probability of selecting the Incongruent choice, i.e., (number of Congruent choices − number of Incongruent choices)/(total number of choices). Higher values correspond to a higher probability of making the Congruent choice, while lower values correspond to a higher probability of making the Incongruent choice. A mixed-effects model was used, with Condition (CS+/CS−) and Group (ST/GT) as independent variables and the response index, described above, as the dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results showed a significant Condition × Group interaction (F(1,42) = 8.22; two-tailed p = 0.006; partial η<sup>2</sup> = 0.16). Bonferroni-corrected post hoc comparisons revealed a significant difference (p = 0.001) between CS+ (mean = 0.18; sd = 0.12) and CS− (mean = 0.04; sd = 0.13) only in the ST group, and a significant difference (p = 0.04) between ST (mean = 0.18; sd = 0.12) and GT (mean = 0.08; sd = 0.12) during CS+ (**Figure 4A**). No other comparisons were significant (ps > 0.13). These results indicate that the ST group was more likely to choose the congruent option when they saw the task-irrelevant CS+ than when they saw the CS−, thus revealing a PIT effect. Critically, this bias was stronger in ST than in GT individuals.

Bars indicate standard error of the mean. \*p < 0.05; \*\*\*p < 0.001.
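The response index used in this analysis can be sketched as follows. The choice counts are hypothetical, and summarizing PIT for one participant as the CS+ index minus the CS− index is an illustrative assumption; the study itself tested the Condition effect within a mixed-effects model.

```python
def response_index(n_congruent, n_incongruent):
    """(Congruent - Incongruent) / total choices, ranging from -1 to +1."""
    total = n_congruent + n_incongruent
    if total == 0:
        return None  # assumption: undefined if no choice was made
    return (n_congruent - n_incongruent) / total

def pit_difference(counts):
    """CS+ index minus CS- index for one participant (illustrative summary).

    counts -- {'CS+': (congruent, incongruent), 'CS-': (congruent, incongruent)}
    """
    return response_index(*counts["CS+"]) - response_index(*counts["CS-"])

# Hypothetical choice counts for one participant.
counts = {"CS+": (59, 41), "CS-": (52, 48)}
idx_plus = response_index(*counts["CS+"])
idx_minus = response_index(*counts["CS-"])
effect = pit_difference(counts)
```

A positive difference means that the Congruent choice was preferred more strongly while the CS+ was displayed than while the CS− was displayed.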

While the first analysis on PIT focused on the overall effect, a second analysis divided the task into three equal blocks of 2 min (four trials) to check for differences in task performance over time. A mixed-effects model was used, with Condition (CS+/CS−), Group (ST/GT) and Block (1/2/3) as independent variables, and the response index as the dependent variable. Subjects were modeled as a random effect. Assumptions of normal distribution, independence of residuals and sphericity were verified. Results showed a significant main effect of Condition (F(1,42) = 6.39; two-tailed p = 0.02; partial η<sup>2</sup> = 0.13), a significant Condition × Group interaction (F(1,42) = 7.69; two-tailed p = 0.008; partial η<sup>2</sup> = 0.15), and a significant Block × Group interaction (F(1.27,53.32) = 50.61; two-tailed p < 0.001; partial η<sup>2</sup> = 0.5; **Figures 4B,C**). Bonferroni-corrected post hoc tests on the Condition × Group interaction revealed a significant difference (p = 0.003) between CS+ and CS− in the ST group but not the GT group, and a significant difference (p = 0.02) between ST and GT groups in CS+ trials (**Figures 4B,C**). Bonferroni-corrected post hoc tests on the Block × Group interaction revealed a significant difference (p < 0.0001) between ST and GT groups in the third block, but not in the first and second blocks (**Figures 4B,C**). **Figures 4D,E** show the number of responses.

In line with the results of the first analysis, these results showed that, unlike the GT group, the ST group was more likely to choose the congruent option when they saw the task-irrelevant CS+ than when they saw the CS−, throughout the entire PIT task. The only effect of time revealed by this analysis was in the last block, where a group difference in responses emerged. Since this difference was unrelated to the displayed stimulus (CS+/CS−), it does not constitute a difference in PIT. This result instead indicates that the ST and GT groups differed in the proportion of congruent choices made towards the end of the task.

#### Impulsiveness

To further investigate differences between ST and GT individuals, self-reported impulsiveness, as rated by the BIS-11 questionnaire (Patton et al., 1995), was compared between the two groups. A two-sample t-test was performed using Group (ST/GT) as the independent variable and BIS-11 scores as the dependent variable. Results revealed a significant difference between the two groups (t(28.75) = 2.06; two-sided p = 0.04), with the ST group (mean = 61.0; sd = 9.91) showing higher impulsiveness than the GT group (mean = 54.09; sd = 8.86; **Figure 5**). This finding is consistent with previous studies showing significantly higher levels of impulsiveness in ST as compared to GT (Tomie et al., 2000; Flagel et al., 2009).
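The fractional degrees of freedom reported above are consistent with an unequal-variance (Welch) two-sample t-test, although the text does not name the variant. The sketch below shows how such a statistic and its Welch-Satterthwaite degrees of freedom could be computed. It is an illustration under that assumption: the function name and the toy scores are hypothetical, and the group sizes of the study are not reproduced here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Toy scores for illustration only; these are NOT data from the study.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
t_value, df_value = welch_t(group_a, group_b)
print(f"t({df_value:.2f}) = {t_value:.2f}")
```

In practice the same test is available as `scipy.stats.ttest_ind(a, b, equal_var=False)`, which also returns the p-value.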

# Discussion

Motivated behavior is characterized by a wide span of interindividual differences in both human and non-human animals. In the present study, the PIT paradigm was used to examine individual differences in the excitatory influence that signals associated with reward can exert on human choices. PIT is a well-known procedure for testing the ability of a Pavlovian reward-paired cue to acquire incentive motivational properties and influence instrumental performance (Estes, 1943, 1948; Rescorla and Solomon, 1967; de Wit and Dickinson, 2009; Holmes et al., 2010). Here, participants performed a standard PIT paradigm composed of three tasks: an Instrumental Conditioning task, during which response-outcome associations were learned; a Pavlovian Conditioning task, during which stimulus-outcome associations were learned; and a PIT task, in which the ability of a Pavlovian cue to drive instrumental responses was tested. Individual differences were characterized by two distinct oculomotor CRs exhibited during Pavlovian Conditioning, corresponding to two different learning styles previously identified and described in the animal literature: Sign-Tracking (ST) and Goal-Tracking (GT; Estes, 1943, 1948; Boakes, 1977; Flagel et al., 2011). In the present study, ST behavior consisted of a tendency to direct contiguous eye-gazes towards the cue (CS) that indicated impending reward delivery (Sign); in contrast, GT behavior was characterized by a tendency to direct contiguous eye-gazes towards the location of reward (US) delivery (Goal), even though the reward was not available until CS termination. An eye-gaze index was based on the emergence of these two behavioral patterns during presentation of the reward-paired stimulus (CS+) in the second half of the task (when contingencies had been learned), and a median split was used to categorize participants as ST or GT.
Importantly, the present results demonstrate that this oculomotor CR was: (i) acquired over time (i.e., learned), since a specific CR towards the Sign or the Goal only emerged towards the end of the task, when stimulus-reward associations had been acquired selectively during the presentation of reward-paired cues (CS+; **Figures 2A,B**); and (ii) reward specific, since the CR was only evident when participants saw the reward-related cue (CS+) and not when they saw the neutral cue (CS−; **Figures 2C,D**). As expected, the task-irrelevant CS had a much stronger influence on the ST group than on the GT group during the PIT task.
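The median-split categorization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the index formula (a normalized difference between dwell time on the Sign and on the Goal), the rule for values equal to the median, and the participant data are all assumptions introduced here for the example.

```python
from statistics import median


def gaze_index(time_on_sign: float, time_on_goal: float) -> float:
    """Hypothetical index in [-1, 1]: positive favors the Sign, negative the Goal."""
    total = time_on_sign + time_on_goal
    return 0.0 if total == 0 else (time_on_sign - time_on_goal) / total


def median_split(indices: dict) -> dict:
    """Label each participant ST (index above the median) or GT (at or below it)."""
    cutoff = median(indices.values())
    return {pid: ("ST" if value > cutoff else "GT") for pid, value in indices.items()}


# Made-up dwell times in seconds as (sign, goal); NOT data from the study.
dwell = {"p1": (3.0, 1.0), "p2": (1.0, 3.0), "p3": (2.5, 1.5), "p4": (0.5, 3.5)}
indices = {pid: gaze_index(*times) for pid, times in dwell.items()}
groups = median_split(indices)
print(groups)
```

A median split always yields groups of roughly equal size, so the labels describe relative rather than absolute tendencies within the sample.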

Group differences in the PIT effect are not attributable to differences in the strength of Instrumental or Pavlovian learning between the groups, which could have potentially induced a bias towards the rewarded choice in the Instrumental Conditioning task, or a stronger influence of the reward-paired cue in the second Pavlovian Conditioning task. Analyses of both the number of rewarded choices during Instrumental Conditioning, and reaction times during Pavlovian Conditioning, exclude such a possibility by revealing that both the ST and GT groups learned the response-outcome and stimulus-outcome contingencies equally well (**Figure 3**). Consequently, differences in the PIT effect cannot be explained by group differences in the ability to learn either the instrumental or the Pavlovian contingencies. In line with the animal literature (Robinson and Flagel, 2009), the Pavlovian cue (CS+) was clearly predictive of reward, since it elicited faster reaction times during Pavlovian conditioning than the neutral stimulus (CS−) did in both groups, along with a CR corresponding to the behavioral profile of each group (ST/GT).

Since the ''Sign'' and the ''Goal'' had specific spatial locations (the top and bottom portions of the screen, respectively), it is important to rule out the possibility that spatial biases in gaze direction might account for the difference in the PIT effect between groups. A bias towards looking at the top of the screen might result in a stronger influence of the Sign on the ST group just because they spent more time looking at it. Analysis of visual exploratory behavior during Pavlovian Conditioning, however, revealed that the ST and GT groups did not differ in the total amount of time spent looking at the top and bottom of the screen (**Figure 2E**). Critically, behavioral differences only emerged during CS+ trials towards the end of the task, once the association between the cue and the reward had been learned. Consequently, it is concluded that there was no a priori bias in gaze direction; rather, such a bias emerged during the Pavlovian Conditioning task as a learned reward-specific CR.

Moreover, a recent study (Trick et al., 2011) directly investigated the relation between fixation times during Pavlovian learning and the PIT effect. The authors found that fixation times during Pavlovian learning increased with uncertainty (that is, more attention was paid to stimuli with uncertain outcome probabilities, e.g., 50%, than to stimuli with more certain outcome probabilities, e.g., 90%). In contrast, the PIT effect increased with the probability of reward (that is, it was stronger for stimuli associated with a high probability of reward, e.g., 90%, than for stimuli associated with uncertain outcomes, e.g., 50%, or a low probability of reward, e.g., 10%). Thus, Trick et al. (2011) concluded that the behavioral influence exerted by the CS (i.e., the PIT effect) is dissociated from attention to Pavlovian stimuli in humans (see Kaye and Pearce, 1984, for similar findings in animals). Instead, PIT is linked to the predictive value acquired by stimuli during learning.
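The dissociation described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that outcome uncertainty is captured by the variance of a Bernoulli outcome, p(1 − p); this proxy is an assumption introduced here and is not the measure used by Trick et al. (2011). Under it, uncertainty peaks at 50% and is identical for 10% and 90%, whereas reward probability increases monotonically.

```python
def outcome_uncertainty(p_reward: float) -> float:
    """Bernoulli variance p * (1 - p), used here as an assumed proxy for uncertainty."""
    return p_reward * (1.0 - p_reward)


# Reward probabilities used as examples in the text.
probabilities = [0.1, 0.5, 0.9]
uncertainty = {p: outcome_uncertainty(p) for p in probabilities}

# Cue expected to attract the longest fixations if attention tracks uncertainty.
most_uncertain = max(probabilities, key=outcome_uncertainty)
# Cue expected to yield the strongest PIT effect if PIT tracks reward probability.
most_predictive = max(probabilities)

print(most_uncertain, most_predictive)
```

The two quantities single out different cues (50% versus 90%), which is the pattern that separates an attention-based account from a predictive-value account of PIT.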

ST behavior has been explained as a consequence of attributing incentive salience to reward-paired cues (Pavlovian CS), arising from the interaction between previous experience (reinforcement learning processes) and individual propensities (Berridge, 2001; Berridge and Robinson, 2003; Flagel et al., 2011). This incentive salience motivates reward-related action (Tomie et al., 2000; Flagel et al., 2008; Robinson and Flagel, 2009). In the present study, ST and GT groups differed in the extent to which Pavlovian reward-paired cues biased their behavior: relative to the GT group, the ST group showed an increased likelihood of performing the instrumental response independently paired with the same reward when presented with the task-irrelevant reward-paired cue, even if the reward itself was no longer available (i.e., a stronger PIT effect; **Figure 4A**). Therefore, reward-paired cues exerted a stronger influence on the behavior of ST individuals, as predicted. Importantly, time course analysis revealed that this effect occurred early and remained stable throughout the entire PIT test session (**Figures 4B,C**), thereby suggesting that the group difference in the PIT effect most likely reflects greater attribution of incentive salience to reward cues in ST than in GT individuals. A group difference in the overall amount of congruent responses (during both CS+ and CS− presentation, thus not reflecting PIT) emerged towards the end of the task (**Figures 4B,C**).

Previous studies have found an association between ST behavior and other traits, such as higher levels of behavioral impulsivity and a greater propensity to develop addiction (Tomie et al., 1998; Flagel et al., 2008; Robinson and Flagel, 2009). In line with these studies, the present study found lower self-reported impulse control in the ST group than in the GT group (**Figure 5**). These findings seem to corroborate the idea that ST and GT behaviors are just one expression of a broader profile of individual differences, which might be clinically relevant. Many studies have reported that ST individuals are more impulsive and prone to develop potentially maladaptive behaviors, such as addiction (Tomie et al., 1998; Robinson and Flagel, 2009; Flagel et al., 2011). For example, the propensity to sign-track is associated with a stronger effect of psychomotor sensitization, a higher susceptibility to a form of cocaine-induced plasticity that may contribute to the development of addiction (Flagel et al., 2008). Furthermore, ST behavior in relation to a specific Pavlovian cue (i.e., a cue predicting monetary reward) is also predictive of the propensity to attribute incentive salience to other reward-paired cues, such as food-related or drug-related cues (e.g., cocaine and alcohol; Uslaner et al., 2006; Cunningham and Patel, 2007; Flagel et al., 2008; Clark et al., 2013). The extent to which such individual differences might play a role in the development of addiction and in the propensity to relapse is not yet clear, but their implications for developing individually targeted treatment programs are promising.

It should be noted that some recent studies highlighted a more complex scenario relating ST and GT behaviors to addiction. While ST individuals are more susceptible to the influence of discrete cues, GT individuals are more influenced by contextual cues, which can motivate drug-seeking behavior (Robinson et al., 2014). Consequently, these learning styles seem to reflect differences in the kinds of triggers to which the individual is susceptible (e.g., discrete/contextual), rather than a propensity to addiction per se. This finding emphasizes that there are diverse pathways to addiction, and has remarkable implications for the development of personalized treatments in the future.

But what exactly is the mechanism underlying the attribution of incentive salience to discrete stimuli, such as Pavlovian cues? A large amount of evidence points to the role of dopaminergic transmission within circuits known to be involved in addiction. The core of the nucleus accumbens, for example, was reported to be involved in ST behavior, and mediates the reinstatement of drug-seeking and drug-taking behavior (Flagel et al., 2007, 2008, 2011; Clark et al., 2013). Furthermore, various studies have supported the involvement of the mesolimbic dopamine system in the emergence of ST behavior. ST individuals are characterized by stronger dopaminergic gene expression and increased levels of dopamine in the nucleus accumbens (correlated with the vigor with which the CR is performed; Flagel et al., 2007, 2008). Even if differences in basic dopaminergic levels cannot fully account for differences in dopamine responsiveness, it has been argued that higher reward-related dopamine release before conditioning might increase attribution of incentive salience to reward-related cues (Wyvell and Berridge, 2000, 2001). Additionally, Flagel et al. (2011) directly demonstrated that dopaminergic transmission is not involved in all forms of learning, but it is necessary for the acquisition of a sign-tracking CR, playing a crucial role in the assignment of incentive salience to reward-related cues. The same study also showed that dopaminergic prediction-error signals, coded by activity in the nucleus accumbens, are present in ST individuals, but not in GT individuals. In the present study, a similar mechanism might occur: high levels of dopamine release might boost attribution of incentive salience to reward-related cues, increasing their ability to motivate and drive behavior.

Future studies might further investigate individual differences in the influence of Pavlovian cues on behavior by taking additional measures into account, such as phasic dopamine levels, psychophysiological indices (e.g., galvanic skin response and heart rate) and neuroimaging measurements. These methods would allow better comparisons between human and non-human animal research on individual differences in ST/GT behavior and learning styles. A general limitation in the standard PIT paradigm is that the ''Sign'' and the ''Goal'' are presented in distinct spatial locations. Thus, unrelated spatial biases in gaze direction might obscure the effect of interest. Although the analysis conducted in this study already confirmed that the present findings cannot be accounted for by any a priori difference in spatial bias between groups, another way to control for this possibility would be to replicate the experiment with the spatial positions of the ''Sign'' and the ''Goal'' inverted in the three tasks.

In conclusion, the individual differences demonstrated here offer a promising direction for further investigating the degree to which incentive salience is attributed to environmental stimuli associated with rewards, as well as the link between this process and maladaptive behaviors, ranging from overeating to pathological gambling and addiction (Saunders and Robinson, 2013). Further, the present findings have important implications for the treatment of impulse-control disorders. Overall, these individual differences in PIT offer new insights into the mechanisms underlying the transition from intentional to habitual/compulsive behavior.

## Author Contributions

All authors conceived of and designed the experiment; S. G. programmed the task, ran the experiment, analyzed the data, wrote the main manuscript text and prepared the figures; all authors read, corrected and approved the final manuscript.

### Acknowledgments

The authors thank Sara Moroni, Francesca Casadei and Chiara Lancioni for helping with data collection. This work was supported by grants from the Ministero Istruzione Università Ricerca (PRIN 2010, protocol number: 2010XPMFW4\_009) awarded to GdP.

### References


stress-induced corticosterone release and mesolimbic levels of monoamines. Pharmacol. Biochem. Behav. 65, 509–517. doi: 10.1016/s0091-3057(99)00241-5


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Garofalo and di Pellegrino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Basal forebrain motivational salience signal enhances cortical processing and decision speed

Sylvina M. Raver and Shih-Chieh Lin\*

Neural Circuits and Cognition Unit, Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA

The basal forebrain (BF) contains major projections to the cerebral cortex, and plays a well-documented role in arousal, attention, decision-making, and in modulating cortical activity. BF neuronal degeneration is an early event in Alzheimer's disease (AD) and dementias, and occurs in normal cognitive aging. While the BF is best known for its population of cortically projecting cholinergic neurons, the region is anatomically and neurochemically diverse, and also contains prominent populations of non-cholinergic projection neurons. In recent years, increasing attention has been dedicated to these non-cholinergic BF neurons in order to better understand how non-cholinergic BF circuits control cortical processing and behavioral performance. In this review, we focus on a unique population of putative non-cholinergic BF neurons that encodes the motivational salience of stimuli with a robust ensemble bursting response. We review recent studies that describe the specific physiological and functional characteristics of these BF salience-encoding neurons in behaving animals. These studies support the unifying hypothesis whereby BF salience-encoding neurons act as a gain modulation mechanism of the decision-making process to enhance cortical processing of behaviorally relevant stimuli, and thereby facilitate faster and more precise behavioral responses. This function of BF salience-encoding neurons represents a critical component in determining which incoming stimuli warrant an animal's attention, and is therefore a fundamental and early requirement of behavioral flexibility.

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

David A. Leopold, National Institutes of Health, USA

Benjamin Hayden, University of Rochester, USA

\*Correspondence:

Shih-Chieh Lin shih-chieh.lin@nih.gov

Received: 28 August 2015 Accepted: 28 September 2015 Published: 12 October 2015

#### Citation:

Raver SM and Lin S-C (2015) Basal forebrain motivational salience signal enhances cortical processing and decision speed. Front. Behav. Neurosci. 9:277. doi: 10.3389/fnbeh.2015.00277

Keywords: nucleus basalis, behavioral flexibility, attention, decision making, rat, gain modulation

# INTRODUCTION

The mammalian basal forebrain (BF) is one of the most prominent cortically projecting neuromodulatory systems, with dense projections throughout the entire cerebral cortex, including prefrontal cortical areas (Gritti et al., 1997; Henny and Jones, 2008; Zaborszky et al., 2015). BF is an important structure implicated in attention, arousal, and in the control of cortical activity and plasticity (Everitt and Robbins, 1997; Wenk, 1997; Kilgard and Merzenich, 1998; Weinberger, 2003; Froemke et al., 2007). BF neuronal degeneration often occurs as an early event in Alzheimer's disease (AD; Whitehouse et al., 1982; Grothe et al., 2012) and some forms of dementia (Cummings and Benson, 1984; Grothe et al., 2012). BF impairment has also been implicated in normal cognitive aging (Gallagher and Colombo, 1995). In recent years, deep brain stimulation of BF targets has emerged as a potential novel therapy to alleviate dementia-related cognitive impairments (Freund et al., 2009; Hescham et al., 2013; Salma et al., 2014). Because of BF's important role in normal cognitive functioning and in age-related diseases, understanding BF circuitry is an important topic in neuroscience.

Despite the historical focus of BF studies on its cholinergic neurons, recent studies have begun to reveal the heterogeneity of neuronal dynamics and the functional significance of different non-cholinergic elements in the BF (for a brief review, see Lin et al., 2015). In this review, we focus on a specific population of putative non-cholinergic neurons in the BF that have been extensively studied in recent years (Lin et al., 2006; Lin and Nicolelis, 2008; Avila and Lin, 2014a,b; Nguyen and Lin, 2014). These studies highlight the functional significance of this group of putative non-cholinergic BF neurons in the decision-making process via the encoding of motivational salience, which supports a fundamental aspect of behavioral flexibility.

In the first part of this article (Section 1), we discuss how the anatomical and neurochemical complexity of the BF extends far beyond the cholinergic neurons that have historically been the focus of study. In Section 2, we review recent studies that identify a unique population of putative non-cholinergic BF neurons that encodes the motivational salience of stimuli with a robust bursting response and discuss their neurochemical identity. In Section 3, we review previous BF single unit studies in behaving animals and suggest that this group of salience-encoding BF neurons has been widely described but interpreted under different circuit identities. In Section 4, we review the key features of salience-encoding BF neurons that have been revealed by recent studies. Finally, in Section 5, we propose a unifying hypothesis about the functional significance and neurochemical identity of BF salience-encoding neurons. We propose that these salience-encoding BF neurons serve as a gain-modulation mechanism to augment cortical processing of behaviorally relevant stimuli, and to modulate the speed of the decision process that enables flexible and adaptive behavior.

# SECTION 1: BF IS A NEUROCHEMICALLY AND ANATOMICALLY COMPLEX REGION

BF has traditionally been defined by the presence of cortically projecting magnocellular cholinergic neurons that provide most of the cholinergic input to the cerebral cortex (Meynert, 1872; Mesulam et al., 1983). The cortically-projecting cholinergic neurons do not reside in a single well-defined nucleus, but rather are distributed throughout a collection of brain regions that extend along both the anterior-posterior and dorso-ventral axes with a complex geometry (**Figure 1**; Gritti et al., 1997; Zaborszky et al., 2015). The regions containing cholinergic neurons can be broadly divided into two major divisions: an anterior division projecting to the hippocampus, that includes the medial septum and vertical band of Broca, and a posterior division projecting to the cerebral cortex and amygdala, that includes the substantia innominata (SI), the horizontal diagonal band of Broca (HDB), the magnocellular preoptic area (MCPO), and the nucleus basalis of Meynert (NBM; Meynert, 1872; Mesulam et al., 1983; Gritti et al., 2006; Zaborszky et al., 2015). Cortically-projecting neurons in the posterior BF division are also found throughout the posterior ventral pallidum (VP; Gritti et al., 2006; Zaborszky et al., 2015). The anterior division is commonly referred to as the medial septum, while the posterior division is commonly referred to as the BF. The current review focuses on the posterior division only and adopts this narrower definition of the term BF.

Despite the historical focus of BF studies on its cholinergic neurons, neuroanatomical studies in the last two decades have made it clear that BF contains more than just cholinergic neurons and is instead a neurochemically heterogeneous region. In addition to cholinergic neurons, the BF contains an equally prominent number of GABAergic and glutamatergic cortically projecting neurons that are spatially intermixed with cholinergic neurons and co-distributed throughout the BF (**Figure 1**; Freund and Gulyás, 1991; Freund and Meskenaitet, 1992; Gritti et al., 1997; Hur and Zaborszky, 2005; Henny and Jones, 2008; Zaborszky et al., 2015). While non-cholinergic BF neurons have historically been overlooked in the literature, their potential functional significance has been suspected in BF lesion studies: cholinergic-specific lesions of the BF produce limited behavioral and cognitive impairments, and do not capture the scope and severity of the impairments produced by non-selective BF lesions that also affect non-cholinergic neurons (Dunnett et al., 1991; Page et al., 1991; Muir et al., 1993; Wenk et al., 1994; Berntson et al., 2002). The functional significance of non-cholinergic BF neurons has received increasing attention in recent years (Sarter and Bruno, 2002; Lin and Nicolelis, 2008; Avila and Lin, 2014a; Nguyen and Lin, 2014; Kim et al., 2015) as studies have begun to reveal the heterogeneity of neuronal dynamics and the functional significance of different non-cholinergic elements in the BF (for a brief review, see Lin et al., 2015). The neurochemical heterogeneity in BF highlights the importance of identifying and characterizing the distinct component populations of BF circuits, especially in distinguishing the contribution of cholinergic neurons from that of non-cholinergic BF neurons.

The complex geometry of the BF also intersects at different subregions with several other macrosystems, such as the ventral striatopallidal system and the extended amygdala, that have input-output connectivity patterns distinct from that of the BF (Gritti et al., 1997; Heimer, 2000). The spatial overlap with other macrosystems, as well as the anatomical heterogeneity between different sub-regions of the BF, adds further layers of complexity to the study of BF, and can become a source of confusion. It is therefore essential for studies to report the exact locations of their experimental investigations within the large BF complex, so that the functional contributions of BF can be distinguished from those of overlapping macrosystems.

# SECTION 2: BF BURSTING NEURONS REPRESENT A UNIQUE POPULATION OF PUTATIVE NON-CHOLINERGIC BF NEURONS

Recent studies have identified a unique population of BF neurons that forms a physiologically and functionally homogeneous ensemble, and that has been referred to as BF bursting neurons or salience-encoding BF neurons in the literature (Lin et al., 2006; Lin and Nicolelis, 2008; Avila and Lin, 2014a,b; Nguyen and Lin, 2014). The BF bursting neurons are characterized by three defining features. First, these neurons have low tonic firing rates (1–10 Hz) that remain unchanged across the different phases of the sleep-wake cycle (**Figures 2A,B**; Lin et al., 2006; Lin and Nicolelis, 2008). Second, the activities of these neurons are highly correlated with each other, and are punctuated by phasic ensemble bursting events that involve most BF bursting neurons (**Figure 2C**; Lin et al., 2006; Lin and Nicolelis, 2008). Third, these neurons show highly similar phasic bursting responses to motivationally salient stimuli that are distinct from those of other recorded neurons in this region (**Figure 2D**; Avila and Lin, 2014b; more discussion in the next section). The large amplitude action potentials with broad and complex waveforms (Avila and Lin, 2014b) of BF bursting neurons are consistent with the properties of large, magnocellular cortically projecting neurons previously described in the BF (Gritti et al., 1993, 1997). Furthermore, the short latencies with which BF bursting neurons modulate cortical activity (Nguyen and Lin, 2014) are consistent with the conduction delays of a direct BF projection to the cerebral cortex (Aston-Jones et al., 1985; Reiner et al., 1987). BF bursting neurons thus form a functionally and physiologically homogeneous population, most likely as a component of the BF corticopetal projection network. Recordings in the medial septum (MS) region do not find similar bursting neurons (Zhang et al., 2011), suggesting that neurons in the MS and BF regions do not share the same properties.

FIGURE 1 | Both cholinergic and non-cholinergic BF cortically projecting neurons are co-distributed across broad regions. 3D distribution of neurons in the rat basal forebrain (BF), labeled by retrograde tracer injections into frontal and posterior cortical areas, with each row representing one experiment. The left column shows cortically projecting cholinergic (CH) neurons only; the right column shows the distribution of non-cholinergic (nch) cortically projecting neurons. Insets show the locations of retrograde tracer injections in frontal and posterior cortical locations. Each cortical target, marked by a different color, receives projections (in corresponding colors to injection locations) from BF neurons distributed along a considerable rostro-caudal and dorso-ventral extent. Note that non-cholinergic projection neurons outnumber cholinergic neurons, and both cholinergic and non-cholinergic projection neurons are intermingled throughout the entire extent of the BF. Light gray structures are the corpus callosum and external capsule. Arrows show orientation (A, anterior; L, lateral; M, medial; P, posterior). Adapted from Zaborszky et al. (2015), reprinted with permission.

Multiple lines of indirect evidence suggest that BF bursting neurons do not match the known properties of BF cholinergic neurons. First, the constant firing rates in BF bursting neurons across different arousal states (**Figures 2A,B**) stand in contrast to those of BF cholinergic neurons, whose firing rates are significantly higher during waking and REM sleep compared to slow-wave sleep (SWS; Lee et al., 2005; Hangya et al., 2015). Second, the instantaneous firing rates of BF bursting neurons within the bursts rarely exceed 80 Hz (Lin et al., 2006; Lin and Nicolelis, 2008), which is significantly slower than cholinergic BF neurons that can fire calcium bursts with much faster intraburst frequencies (100–200 Hz or higher; Alonso et al., 1996; Lee et al., 2005; Hangya et al., 2015). Third, the temporal dynamics of BF bursting neurons in response to primary reinforcers do not match those of optogenetically identified BF cholinergic cells. A recent report (Hangya et al., 2015) reveals that cholinergic neurons can be precisely activated by primary reinforcers with very short latencies (15–40 ms), which is markedly faster than the BF bursting response to primary reinforcers that takes place between 50–200 ms after reinforcer delivery (Lin and Nicolelis, 2008; Avila and Lin, 2014a). These lines of evidence suggest that BF bursting neurons likely represent a unique group of non-cholinergic BF neurons.
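The intraburst-frequency criterion above rests on the instantaneous firing rate, i.e., the reciprocal of each inter-spike interval. The sketch below shows one way this quantity could be computed and compared against the 80 Hz ceiling mentioned in the text. It is illustrative only: the function names and spike trains are made up, and a single threshold is not presented here as a sufficient classifier of cell type.

```python
def instantaneous_rates(spike_times_ms):
    """Instantaneous firing rate in Hz for each inter-spike interval (1000 / ISI in ms)."""
    return [
        1000.0 / (later - earlier)
        for earlier, later in zip(spike_times_ms, spike_times_ms[1:])
    ]


def peak_rate_within(spike_times_ms, ceiling_hz=80.0):
    """Return the peak instantaneous rate and whether it stays at or below the ceiling."""
    peak = max(instantaneous_rates(spike_times_ms))
    return peak, peak <= ceiling_hz


# Made-up spike times in milliseconds; NOT recordings from any study.
slow_burst = [0, 20, 40, 60]  # 20 ms intervals
fast_burst = [0, 5, 10, 15]   # 5 ms intervals

print(peak_rate_within(slow_burst))
print(peak_rate_within(fast_burst))
```

With these toy trains, the 20 ms intervals correspond to 50 Hz (below the ceiling) and the 5 ms intervals to 200 Hz (within the faster range the text attributes to cholinergic calcium bursts).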

In addition to the corticopetal cholinergic neurons, BF contains prominent populations of GABAergic and glutamatergic cortically projecting cells (Gritti et al., 1997; Henny and Jones, 2008; Zaborszky et al., 2015) that are likely candidates for the identity of the BF bursting neurons. The GABAergic BF neurons present an intriguing possibility because BF GABAergic projections to the cortex are ideally positioned to enhance cortical activity due to their preferential innervation of intracortical interneurons (Freund and Gulyás, 1991; Freund and Meskenaitet, 1992; Henny and Jones, 2008). While many cortically projecting GABAergic BF neurons also express the calcium binding protein parvalbumin (PV; Gritti et al., 1997), it appears unlikely that the BF bursting neurons correspond to the BF cortically projecting PV+ GABAergic neurons. Recent studies demonstrated that optogenetically tagged PV+ GABAergic BF neurons have sustained firing rates greater than 30 Hz (Kim et al., 2015) and brief action potentials (McKenna et al., 2013), which are at odds with the low tonic activity (1–10 Hz) and broad action potential waveforms of BF bursting neurons. Furthermore, the firing rates of these PV+ GABAergic BF projection neurons differ across the phases of the sleep-wake cycle, with activity between 25–50 Hz in wake and REM sleep that drops to less than 25 Hz during slow wave sleep (Kim et al., 2015), which further differentiates the activity of these neurons from that of the BF bursting neurons, whose firing rates are not modulated by arousal states (**Figures 2A,B**; Lin et al., 2006; Lin and Nicolelis, 2008). Besides PV+ GABAergic neurons, other populations of GABAergic projection neurons exist in BF and can be identified by their expression of the potassium channel Kv2.2 (Hermanstyne et al., 2010) or the neurokinin-3 receptor (Furuta et al., 2004). Another possibility is that BF bursting neurons represent direct glutamatergic BF projections to the cortex (Hur and Zaborszky, 2005).
Together, the studies reviewed here suggest that BF bursting neurons are unlikely to be cholinergic or PV+ GABAergic BF projection neurons, and that they instead represent another group of non-cholinergic BF corticopetal neurons whose neurochemical identity remains to be defined.

# SECTION 3: DIFFERING INTERPRETATIONS OF BF SALIENCE-ENCODING NEURONS IN THE LITERATURE

Perhaps the most distinct and best-characterized property of BF bursting neurons is their ability to encode the motivational salience of primary reinforcers and reinforcer-predictive cues using phasic bursting responses. In the rodent BF, Lin and colleagues have demonstrated that BF bursting neurons respond to both primary reward (water or a sucrose solution; **Figure 3A**; Lin and Nicolelis, 2008; Avila and Lin, 2014a,b; Nguyen and Lin, 2014) and punishment (a quinine solution; Lin and Nicolelis, 2008). As an animal learns the associative relationship between the reinforcers and the preceding conditioned stimuli (CS), the CSs that predict reward (CS+) and those that predict punishment (CS−) both acquire the ability to elicit robust bursting in BF neurons (**Figure 3B**). Given that the phasic bursting response is similarly elicited by the CS, irrespective of its sensory modality (auditory or visual), associated motor response (Go or Nogo), or hedonic valence (reward or punishment), the bursting response likely encodes the motivational salience of the stimulus (Lin and Nicolelis, 2008).

The phasic bursting responses of BF neurons to motivationally salient stimuli have in fact been widely described in both the non-human primate and the rodent BF literatures. In non-human primates, DeLong (1971) first described neurons in the primate SI/NBM region that fire with different response patterns and at tonically lower rates than the neighboring neurons in the globus pallidus (GP; **Figure 3C**), and that show bursting responses to the presentation of a juice reward (**Figure 3D**; Richardson and DeLong, 1991). These reinforcement-active neurons not only show graded response amplitudes according to reward amount (Richardson and DeLong, 1991), but also robustly burst to aversive stimuli, such as air puffs (**Figure 3E**; Richardson and DeLong, 1991). SI/NBM neurons were subsequently found to respond to the sensory cues that predict rewards, in addition to the primary reinforcers themselves (**Figure 3F**). An example is seen in **Figure 3F**, which shows bursting activity of a primate NBM neuron to reward-predicting stimuli, regardless of whether the cue instructs the animal to make a movement (Go) or refrain from making a movement (Nogo) in order to obtain reward (Richardson and DeLong, 1991). Neurons distributed throughout the SI, NBM, and HDB nuclei of the BF therefore appear to reflect the reinforcing nature of rewards and their predictive stimuli (Wilson and Rolls, 1990). Subsequent studies confirmed that reward-related NBM neurons do not encode the sensory qualities of the reward-predicting cues (Wilson and Rolls, 1990; Richardson and DeLong, 1991).

More recent studies in the rodent BF have identified response patterns similar to those of the non-human primate BF bursting neurons (Tindell et al., 2005, 2009; Lin and Nicolelis, 2008; Smith et al., 2011; Tingley et al., 2014). **Figure 3G** shows examples of such neurons from the Aldridge group that respond with phasic bursting responses to conditioned stimuli that are associated with reward and Go responses (CS+) or with no reward and Nogo responses (CS−). Rodent BF neurons also respond to primary reinforcers with similar responses regardless of whether animals receive appetitive outcomes, such as a sucrose solution or pellet (Tindell et al., 2005; Lin and Nicolelis, 2008), or aversive outcomes, such as a hypertonic salt solution or quinine (Tindell et al., 2005, 2009; Lin and Nicolelis, 2008; Smith et al., 2011). Similarly, **Figure 3H** shows the entire neuronal population recorded in the BF region by the Nitz group (Tingley et al., 2014), and reveals an overrepresentation of neurons with phasic bursting responses to CS onset.

It is important to note that salience-encoding BF neurons are also influenced by hedonic valence. For example, Lin and Nicolelis (2008) showed that, in a Go/Nogo task, the initial phasic bursting response to both CS+ and CS−, which encodes motivational salience, is followed by a sustained phase of activity modulation that is excitatory in rewarded (Go) trials and inhibitory in punishment (Nogo) trials (**Figure 3B**). Sustained responses of BF bursting neurons reflecting the hedonic valence of the predicted outcome are also reported in other studies (Wilson and Rolls, 1990; Richardson and DeLong, 1991; Tindell et al., 2006, 2009) and appear to track the updated value of the expected outcome (Tindell et al., 2009; Smith et al., 2011). Future studies will need to address how motivational salience and hedonic valence information coexist in the same neuronal population.

The prevalence of salience-encoding neurons in the BF literature shows that this is a prominent neuronal population widely present in both rodents and non-human primates. Despite their prevalence, BF salience-encoding neurons have often been interpreted very differently in the literature, either as the BF cholinergic neurons (Wilson and Rolls, 1990; Richardson and DeLong, 1991; Tingley et al., 2014) or as ventral pallidal (VP) neurons that form part of the ventral striatopallidal system (Tindell et al., 2005, 2009; Smith et al., 2011). As described in Section 2, multiple physiological and functional features of these salience-encoding neurons differ from those of cholinergic BF neurons, including their bursting characteristics, their lack of modulation by sleep-wake states (**Figures 2A,B**), and their response latencies to reinforcers (Lin et al., 2006; Lin and Nicolelis, 2008; Hangya et al., 2015). On the other hand, although the location of BF salience neurons overlaps with the caudal VP, bursting neurons have been found both above and below the caudal VP region, broadly corresponding to regions that contain cortically projecting BF neurons (Lin et al., 2006; Lin and Nicolelis, 2008; Avila and Lin, 2014a,b; Nguyen and Lin, 2014). Moreover, unlike other neurons in this region that encode movement and better resemble neurons in the striatopallidal circuit, salience-encoding BF neurons are concerned primarily with motivationally salient events and not with movement (**Figure 2D**; Avila and Lin, 2014b).

In this context, the unique contribution of Lin and colleagues is the identification of salience-encoding neurons as a physiologically and functionally homogeneous neuronal population in the BF, which highlights the importance of distinguishing BF salience-encoding neurons from the other neurons in this region. More importantly, Lin and colleagues suggest that these neurons are non-cholinergic BF neurons that project to the cerebral cortex (Lin et al., 2006; Lin and Nicolelis, 2008; Avila and Lin, 2014b), which stands in stark contrast with previous interpretations that attribute this phenotype to either cholinergic BF neurons or VP neurons. These differing interpretations underscore the anatomical and neurochemical heterogeneity of the BF, as salience-encoding neurons represent but one functionally and physiologically homogeneous population among many others that respond to different behavioral events and play key roles in value-laden decisions. These differing accounts also underscore the importance of determining, in future studies, the neurochemical identity and the projection targets of salience-encoding BF neurons.

# SECTION 4: KEY FEATURES OF THE BF SALIENCE-ENCODING NEURONS IN THE DECISION-MAKING PROCESS

In this section, we highlight several key features of BF bursting neurons and describe how BF bursting activity quantitatively modulates behavioral responses and cortical processing. These features are instrumental in understanding the functional significance of BF bursting neurons in the decision-making process.

The first key property of BF salience-encoding neurons is that their bursting responses to sensory stimuli are not innate, but are instead acquired through associative learning (Lin and Nicolelis, 2008). As neutral sensory stimuli acquire motivational salience through associative learning, they become conditioned stimuli (CSs) that reliably predict reward or punishment and can robustly elicit behavioral responses; simultaneously, the CSs also acquire the ability to elicit phasic bursting responses. The BF bursting response, however, is absent following other clearly perceptible but not motivationally salient stimuli. **Figure 4A** provides an example of BF neurons that display phasic bursting responses to previously learned motivationally salient cues, but at the same time show no response to a perceptually salient house light that the animal has not yet learned to associate with reward. Furthermore, as the association between stimuli and their predictive outcomes is reversed through extinction training, BF bursting responses to cues quickly diminish as cues lose their motivational salience (Lin and Nicolelis, 2008). These response patterns indicate that the BF bursting response is not required for the perception of a sensory cue, and its influence on the decision making process must take place after the initial perception stage.

The second key property of BF salience-encoding neurons is that the bursting response is tightly coupled with the success of behavioral responses to motivationally salient cues. In a near-threshold auditory detection task, BF neurons displayed phasic bursting responses to tones when animals made correct behavioral responses (Hit; **Figure 4B**), even when tones were presented at or below the detection threshold. In contrast, when animals failed to respond to the tone, BF neurons were not activated (Miss; **Figure 4B**; Lin and Nicolelis, 2008). Furthermore, within trials in which animals successfully detected and responded to the tone, the amplitude of the BF bursting response scaled with the animals' response latency (**Figure 4B**; Lin and Nicolelis, 2008). These results suggest that successful responses to the CS are associated with, and perhaps require, the BF motivational salience signal, which likely facilitates the execution of the correct behavioral response based on perceived cues. Consistent with this interpretation, in the Go/Nogo task, incorrect "false-alarm" responses in Nogo trials were associated with higher BF activity compared with correct Nogo responses (Lin and Nicolelis, 2008, Supplemental Figure S4).

The third key property of BF bursting neurons is that the strength of the BF motivational salience signal is quantitatively coupled with faster and more precise decision speeds. To determine the quantitative relationship between the BF salience signal and decision speed, Avila and Lin (2014a) investigated whether the BF bursting amplitude is capable of influencing the earliest readout of behavioral responses to the CS using the metric of simple reaction time (RT). In a reward-biased simple RT task, the motivational salience of two auditory cues was manipulated by the magnitude of associated rewards. The cue that predicted a large reward elicited faster RTs and stronger BF bursting amplitudes and, importantly, the magnitude of RT modulation was quantitatively accounted for by the modulation of BF bursting amplitudes (**Figure 4C**; Avila and Lin, 2014a). The relationship between the BF bursting response and RT was found to be causal, as augmenting the strength of the BF bursting response with BF electrical stimulation increased decision speed (Avila and Lin, 2014a). These findings suggest that the BF bursting response may serve as a gain modulation signal of the decision making process to enhance the speed of responding to motivationally salient cues.

The fourth key property is that the BF bursting response enhances cortical processing at least in part by generating an event-related potential (ERP) response in the frontal cortex (**Figure 4D**; Nguyen and Lin, 2014). To better understand how the BF motivational salience signal modulates downstream cortical processing, Nguyen and Lin (2014) studied the relationship between the BF bursting response and the ERP response in the frontal cortex using an auditory oddball task (**Figure 4D**). The amplitude and timing of BF bursting and the prominent frontal ERP response were tightly coupled with each other (**Figure 4D**), and such coupling was observed on a trial-by-trial basis (Nguyen and Lin, 2014). Furthermore, the frontal ERP response was associated with local field potential (LFP) responses localized to deep cortical layers of the frontal cortex, coincident with the target layers of BF projections (Henny and Jones, 2008). Such layer-specific LFP response patterns are also recreated by BF electrical stimulation with a delay of 5–10 ms (Nguyen and Lin, 2014), consistent with the conduction delay from the BF to the frontal cortex (Aston-Jones et al., 1985; Reiner et al., 1987). These observations suggest that the frontal ERP/LFP response likely represents the first step by which the BF motivational salience signal enhances cortical processing of a perceived stimulus to facilitate correct behavioral responses.

# SECTION 5: HYPOTHESIS

Based on the studies reviewed above, we propose a unifying hypothesis that the BF salience-encoding neurons serve as a signal-amplifying, or gain-modulation, mechanism for motivationally salient cues (**Figure 5A**). The hypothesis includes three key components: (1) A unique population of putative non-cholinergic BF neurons encodes the motivational salience of stimuli with a phasic bursting response (Lin et al., 2006; Lin and Nicolelis, 2008); (2) The BF motivational salience signal is rapidly broadcast to the cerebral cortex to enhance cortical processing (Nguyen and Lin, 2014); and (3) This modulation results in faster and more precise decision speed (Avila and Lin, 2014a).

This hypothesis addresses a fundamental question in neuroscience: how the brain filters meaningful from meaningless stimuli to execute responses only to stimuli that are behaviorally relevant. Animals are constantly faced with a barrage of incoming sensory stimuli; however, most of the stimuli are not motivationally salient, do not carry any behavioral consequence, and need not be responded to. For the subset of stimuli that are motivationally salient, which may or may not be perceptually salient, the brain requires an internal gain modulation mechanism to amplify their processing and ensure correct and efficient behavioral responses. We propose that this is the main behavioral function of this unique population of non-cholinergic BF bursting neurons: to serve as a fast and powerful gain modulation mechanism that facilitates behavioral responses to environmental stimuli and that operates based on the motivational, but not perceptual, salience of the stimuli.

This gain-modulation hypothesis can also be conceptualized in a decision model (**Figure 5B**). Simple decision making processes have been commonly modeled as activity accumulation in a hypothetical decision unit, such as the drift-diffusion model or the linear rise to threshold model (Ratcliff and Rouder, 1998; Reddi and Carpenter, 2000; Ratcliff, 2001). Once the activity of this decision unit reaches a threshold, a decision is made and a behavioral response, such as the RT response, is observed. The studies reviewed here suggest that the BF bursting response serves as a gain modulation signal that modulates the rate of activity accumulation in the decision unit. A stronger BF bursting response, such as that generated in response to a stimulus with high motivational salience, increases the rate of activity accumulation and, in turn, increases decision speed and generates a faster RT distribution. Data collected by Lin and colleagues support this hypothesis (**Figure 5B**): stimuli with greater motivational salience produce stronger bursting responses in putative non-cholinergic BF neurons (Lin and Nicolelis, 2008), which in turn enhance activity within cortical networks (Lin et al., 2006; Nguyen and Lin, 2014) and increase the speed and precision of decision making (Avila and Lin, 2014a). On the other hand, the absence of BF bursting in the near-threshold auditory detection task is coupled with the absence of a behavioral response, and likely reflects a lack of internal amplification, such that activity in the decision unit never reaches the decision threshold (Lin and Nicolelis, 2008).
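The accumulation-to-threshold account above can be illustrated with a minimal simulation sketch. It is not the model or fitting procedure used in the studies reviewed here: the function names (`simulate_rt`, `rt_distribution`), the multiplicative `gain` term standing in for the strength of the BF bursting response, and all parameter values (drift, threshold, noise, time step, time limit) are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_rt(drift, threshold=1.0, noise=0.1, dt=0.001, max_time=2.0, rng=None):
    """Accumulate noisy activity in a hypothetical decision unit.

    Returns the time (in seconds) at which activity first crosses
    `threshold`, or None if it never does within `max_time`.
    All parameter values are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    activity, t = 0.0, 0.0
    while t < max_time:
        # One Euler step of a drift-diffusion process:
        # deterministic drift plus Gaussian noise scaled by sqrt(dt).
        activity += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if activity >= threshold:
            return t
    return None


def rt_distribution(gain, base_drift=2.0, n_trials=200, seed=0):
    """Simulate crossing times when the accumulation rate is scaled by `gain`.

    `gain` is a hypothetical stand-in for BF bursting amplitude.
    Trials that never reach threshold are dropped.
    """
    rng = random.Random(seed)
    rts = [simulate_rt(gain * base_drift, rng=rng) for _ in range(n_trials)]
    return [rt for rt in rts if rt is not None]


if __name__ == "__main__":
    for gain in (1.0, 1.5):
        rts = rt_distribution(gain)
        print(f"gain={gain}: mean RT={statistics.mean(rts):.3f} s, "
              f"SD={statistics.stdev(rts):.3f} s, n={len(rts)}")
    print("crossings with gain=0:", len(rt_distribution(0.0)))
```

Under these assumed parameters, a larger gain yields shorter and less variable crossing times, while a gain of zero leaves the decision unit below threshold within the time limit, loosely mirroring the absence of a behavioral response when BF bursting is absent.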

The specific cortical mechanisms that underlie the transference of the BF motivational salience signal into a rapid and precise behavioral response remain to be determined, and should be the focus of future experiments. However, the ability of BF bursting neurons to rapidly enhance cortical activity and decision speed is consistent with a disinhibition mechanism mediated by GABAergic BF cortically projecting neurons. Anatomical data show that corticopetal GABAergic neurons preferentially innervate inhibitory interneurons in the neocortex (Freund and Gulyás, 1991; Freund and Meskenaite, 1992; Henny and Jones, 2008). As these cortical GABAergic interneurons in turn each contact multiple excitatory pyramidal neurons, inhibition of interneuron activity by BF corticopetal projections would have the net result of inducing potent and widespread cortical excitation. Indeed, this disinhibition mechanism has been previously suggested to account for the ability of the BF's non-cholinergic population to gate cortical information processing (Dykes, 1997; Sarter and Bruno, 2002). Additional experiments that confirm the neurochemical identity of the BF salience neurons and their projection targets are needed to test this disinhibition hypothesis, as a direct glutamatergic BF projection to the cortex (Hur and Zaborszky, 2005) remains a possibility.

The BF's ability to encode the motivational salience of a stimulus is a critical component in determining whether or not to attend to incoming sensory information, and is therefore a fundamental and early requirement of adaptive and flexible behavior. Indeed, animals can flexibly respond to the same stimulus depending on its associated motivational salience. The associated motivational salience can be dynamically adjusted through associative learning and rapidly reversed by extinction (Lin and Nicolelis, 2008). As such, the putative non-cholinergic BF salience-encoding neurons represent an important neural circuit that is instrumental in behavioral flexibility. Future experiments should be designed to test the specific contributions of the BF motivational salience signal in guiding flexible and adaptive behavior, and to provide a clearer understanding of the functions of this BF population in age-related diseases and normal cognitive aging.

# FUNDING

This work was supported by the Intramural Research Program of the National Institute on Aging, National Institutes of Health.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Raver and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Ongoing behavioral state information signaled in the lateral habenula guides choice flexibility in freely moving rats

#### Phillip M. Baker, Sujean E. Oh, Kevan S. Kidder and Sheri J. Y. Mizumori\*

Department of Psychology, University of Washington, Seattle, WA, USA

The lateral habenula (LHb) plays a role in a wide variety of behaviors ranging from maternal care, to sleep, to various forms of cognition. One prominent theory with ample supporting evidence is that the LHb serves to relay basal ganglia and limbic signals about negative outcomes to midbrain monoaminergic systems. This makes it likely that the LHb is critically involved in behavioral flexibility as all of these systems have been shown to contribute when flexible behavior is required. Behavioral flexibility is commonly examined across species and is impaired in various neuropsychiatric conditions including autism, depression, addiction, and schizophrenia; conditions in which the LHb is thought to play a role. Therefore, a thorough examination of the role of the LHb in behavioral flexibility serves multiple functions including understanding possible connections with neuropsychiatric illnesses and additional insight into its role in cognition in general. Here, we assess the LHb's role in behavioral flexibility through comparisons of the roles its afferent and efferent pathways are known to play. Additionally, we provide new evidence supporting the LHb contributions to behavioral flexibility through organization of specific goal directed actions under cognitively demanding conditions. Specifically, in the first experiment, a majority of neurons recorded from the LHb were found to correlate with velocity on a spatial navigation task and did not change significantly when reward outcomes were manipulated. Additionally, measurements of local field potential (LFP) in the theta band revealed significant changes in power relative to velocity and reward location. In a second set of experiments, inactivation of the LHb with the gamma-aminobutyric acid (GABA) agonists baclofen and muscimol led to an impairment in a spatial/response based repeated probabilistic reversal learning task. 
Control experiments revealed that this impairment was likely due to the demands of repeated switching behaviors, as rats were unimpaired on initial discrimination acquisition or retention of probabilistic learning. Taken together, these novel findings complement the other work discussed here in supporting a role for the LHb in action selection when cognitive or emotional demands are increased. Finally, we discuss ways in which a better understanding of the LHb can be obtained through additional examination of behavioral flexibility tasks.

Keywords: cognitive flexibility, serotonin, dopamine, lateral habenula, reversal learning, spatial navigation, learning and memory

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

Alicia Izquierdo, University of California, Los Angeles, USA

Stan Floresco, University of British Columbia, Canada

> \*Correspondence: Sheri J. Y. Mizumori mizumori@uw.edu

Received: 01 September 2015 Accepted: 19 October 2015 Published: 04 November 2015

#### Citation:

Baker PM, Oh SE, Kidder KS and Mizumori SJY (2015) Ongoing behavioral state information signaled in the lateral habenula guides choice flexibility in freely moving rats. Front. Behav. Neurosci. 9:295. doi: 10.3389/fnbeh.2015.00295

# INTRODUCTION

Multiple decades of research have led to an understanding of many brain areas involved in the ability to switch ongoing behaviors when contingencies change. Changes in behavior can range from a reversal of appetitive or aversive responses, to the adaptation of behavior following subtle environmental cues such as changing seasons. With this range of behavioral flexibility required in complex organisms, it is not surprising that multiple neural systems participate in one or a number of types of related behaviors. In general, behavioral flexibility requires a complex series of neural processes including recognizing environmental cues as well as the internal state of the animal, choosing an appropriate response based on this information, and analyzing the outcome of that choice based on previous expectations in order to plan future behavior. While it is evident that individual forebrain and midbrain systems uniquely control specific functions that enable behavioral flexibility (e.g., outcome analysis or action selection) or determine the current type of behavioral flexibility, e.g., reversal learning vs. set-shifting, other systems appear to play more general roles across many forms of adaptive behavior. Among these are two monoamine neurotransmitter systems: the dopamine (DA) and serotonin (5-HT) systems.

The DA system has been implicated in nearly all aspects of behavioral flexibility performance, from action selection to recognizing a change in outcomes (Spirduso et al., 1985; Barnéoud et al., 2000; Ragozzino, 2002; Lee et al., 2007; De Steno and Schmauss, 2009; Kehagia et al., 2010). For example, striatal DA is required for the initiation of motivated actions, as selective dopaminergic lesions to the medial forebrain bundle result in impaired memory for learned motor programs, which is interpreted as impaired top-down movement control (Ridley et al., 2006). DA release within the striatum is also observed when animals experience reward predictive cues or unexpected changes in reward expectations, which is thought to relate to motivational aspects of rewarding actions (Stuber et al., 2008; Wassum et al., 2012; Volman et al., 2013). Additionally, prefrontal DA release is selectively increased on the reversal but not repeated performance of a spatial reversal learning task (van der Meulen et al., 2007). 5-HT contributes to behavioral flexibility tasks in a manner complementary to DA through tracking of expectation for behavioral flexibility. For example, neural activity in the dorsal raphe (DRN; which includes many forebrain projecting 5-HT neurons) tracks ongoing behaviors in relation to upcoming outcomes (Bromberg-Martin et al., 2010; Inaba et al., 2013; Liu et al., 2014). Decreasing 5-HT availability through either excitotoxic lesions or tryptophan depletion impairs behavioral flexibility, while increasing it with selective serotonin reuptake inhibitors (SSRIs) enhances it (Bari et al., 2010; Brown et al., 2012; Izquierdo et al., 2012; Wallace et al., 2014). This has led many to suggest that 5-HT signaling contributes to reward or aversive learning, especially when behavioral expectancy signals must be updated for future behavioral choices, such as when risk is involved in decision making (Doya, 2008; Robbins and Arnsten, 2009; Bari et al., 2010; Liu et al., 2014).
Additionally, Groman et al. (2013) provided evidence that these systems interact with one another in a complementary fashion during behavioral flexibility, such that a balanced increase between 5-HT and DA levels in the orbitofrontal cortex and striatum is correlated with ideal reversal learning performance.

Despite well-established roles for monoamine systems in behavioral flexibility, it is not well understood how forebrain structures that are involved in behavioral flexibility influence the DA and 5-HT systems. There is growing interest in understanding how information is processed by structures which influence the DA and 5-HT systems. However, this is a difficult problem since, in order to contribute to a wide variety of behaviors, these systems must receive information about the current behavioral and emotional state of an animal in order to organize and reinforce beneficial behaviors. One key structure that is poised to relay both current behavioral and emotional/internal state information is the lateral habenula (LHb), which can influence both the DA and 5-HT systems via direct and indirect connections (Lecourtier et al., 2008; Goncalves et al., 2012; Sego et al., 2014). It has been suggested that the LHb possesses two separate streams of information comprising the medial and lateral portions of the LHb (Hikosaka, 2010; Proulx et al., 2014). Neural recording studies in animals performing complex behavioral tasks that require behavioral flexibility have yet to resolve whether different regions of the LHb respond to different aspects of behavior. Below we describe, and then test, a possible role for the LHb in regulating DA and/or 5-HT modulation when animals must flexibly switch ongoing behaviors as contingencies change.

Many potential roles for the LHb in behavior have been proposed based on the diverse effects observed during either neural recording or after experimental manipulation. Early reports of the behavioral role of the LHb included a role in olfactory processing that is still not well defined, as well as roles in mating behavior and aversive or reward learning (Sutherland, 1982). More recent work has focused on the role of the LHb in aversive responses. One such proposal, which has recently gained prominence, is a role in inhibiting DA neurons in response to aversive outcomes or predictions during Pavlovian learning (Hikosaka, 2010; Proulx et al., 2014). The role of the LHb is less well understood in goal directed behaviors that rely on behavioral flexibility to obtain a desired outcome. Recent reports and new data presented below support our hypothesis that the LHb plays a role in the execution of specific goal directed actions when the use of complex strategies, or switching of strategies, is required. Specifically, we propose that the LHb signals to brainstem monoamine systems information about the ongoing behavioral state of an animal for the purpose of organizing adaptive actions aimed at receiving rewards or avoiding punishment. Based on data presented within, it is likely that, at least in the rat, this information about behavioral states is broadcast to both the serotonergic and dopaminergic systems, where it is further integrated with additional input distinct to each system. This view is supported by the role that both DA and 5-HT are known to play in goal directed activity in general as well as in behavioral flexibility specifically. The aim of this review is to synthesize a diverse body of research aimed at understanding how the LHb functions when animals are required to change ongoing or innate actions in order to receive reward or avoid punishment.
Our interpretation will emphasize the known role of afferent and efferent structures of the LHb and how they inform the role this structure plays in behavioral flexibility. Additionally, preliminary experiments in our own lab are discussed in relation to this hypothesis of LHb function. Finally, novel means of testing this hypothesis are discussed in relation to state-of-the-art techniques now available to dissect circuit function.

# FUNCTIONAL ANATOMY OF THE LATERAL HABENULA (LHb)

The LHb can be divided into as many as 10 subregions based on either streams of input and output or identities of neuronal protein expression (Andres et al., 1999; Geisler et al., 2003; Aizawa et al., 2012; Wagner et al., 2014). Due to the already relatively small size of the habenular complex itself, however, it is often treated somewhat more homogeneously given the practical challenges in isolating such small subdivisions. Recent advances in promoter-driven Cre mouse lines, however, offer a way forward in addressing subregion-specific contributions to LHb function at a basic level. The LHb is often divided into a medial segment and a lateral segment based on the targets of projection neurons: the medial portion mainly targets the median raphe and DRN, and the lateral portion largely projects to the rostromedial tegmental nucleus (RMTg; Kim and Chang, 2005; Proulx et al., 2014). It is worth noting, however, that this division is based mainly on interest in the LHb control of monoamine structures, particularly in cognition, and this to some extent has minimized our appreciation of the prominent projections to the posterior hypothalamus, dorsal tegmentum and periaqueductal gray identified in both rats and mice (Araki et al., 1988; Quina et al., 2015). These latter projections are also largely segregated, although somewhat overlapping (Harris et al., 2014). It is promising for future studies that the overall cytoarchitecture and circuitry appear similar between mice and rats (Geisler et al., 2003; Goncalves et al., 2012; Wagner et al., 2014; Quina et al., 2015), although some differences between rodents and primates have been observed (Parent et al., 1981; Araki et al., 1984; Hong and Hikosaka, 2008).

# LHb Afferent Structures and their Role in Behavioral Flexibility

Insight into the potential types of functions to which the LHb contributes may be obtained by understanding the roles that LHb afferent systems play in behavioral flexibility (**Figure 1**). The LHb receives input from many different brain areas, which are often divided into several major categories: the basal ganglia, the hypothalamic areas, and the limbic cortical systems (Sutherland, 1982; Lecourtier and Kelly, 2007; Hikosaka, 2010). The LHb also receives DA and 5-HT input from the ventral tegmental area and the median raphe (MRN), making these connections with monoaminergic systems symmetrical (Beckstead et al., 1979; Skagerberg et al., 1984; Vertes et al., 1999). Overall, the patterns of connectivity of the LHb raise many possibilities for its involvement in a wide variety of behavioral flexibility functions. For a more complete view of all afferent and efferent connections of the LHb, see Lecourtier and Kelly (2007) or Quina et al. (2015).

The main input from the basal ganglia arises from the entopeduncular nucleus (EPN; Nagy et al., 1978; Araki et al., 1984). In rodents, a majority of EPN fibers, especially in the rostral portion of the nucleus, project to the entirety of the LHb (Parent et al., 1981; Araki et al., 1984; Vincent and Brown, 1986). In monkeys, this projection appears to originate from a unique and restricted region of the internal globus pallidus (GPi), mainly from the dorsal and ventral borders (Parent et al., 1981; Hong and Hikosaka, 2008). It is not well understood why this species difference exists. Interestingly, in rats, neurons from the EPN projecting to the LHb contain both glutamate and GABA (Araki et al., 1984; Shabel et al., 2014).

The basal ganglia plays a prominent role in behavioral flexibility ranging from response reversal learning, to inhibiting ongoing actions, to switching foraging patches when resources become scarce (Schwartzbaum and Donovick, 1968; Seamans and Phillips, 1994; Hills, 2006; Bryden et al., 2012). However, less is known about the role of the most immediate structure projecting to the LHb, the EPN. In monkeys, neurons recorded from the GPi respond to reward consumption, reward predictive cues or probabilities, and exploratory behavior (Hong and Hikosaka, 2008; Joshua et al., 2009). In rodents however, to date only one paper has examined firing properties of EPN neurons in freely moving animals (Benhamou and Cohen, 2014). This study found that a majority of cells had lower firing rates similar to those observed in monkey GPi neurons which project to the LHb when exploring an open field. The proportion of cells in this group roughly matches proportions of LHb projecting neurons in the EPN (∼66%; van der Kooy and Carter, 1981). This could mean similar functions are served by these EPN neurons in rats as is observed with monkeys in relation to goal directed behavior, however, it remains a speculation at this point (Benhamou and Cohen, 2014). In terms of understanding behavioral flexibility, it is difficult to isolate a role for the EPN in rats due to the lack of in vivo electrophysiological studies as well as confounds with motor effects and sensory motor integration observed following experimental manipulation (Dacey and Grossman, 1977; Scheel-Krüger et al., 1981; Sarkisov et al., 2003; Schwabe et al., 2009). However, these confounds can be controlled for by unilateral or sequential lesion techniques, especially those that use fiber sparing methods (Lutjens et al., 2011). 
Using these methods, it is known that the EPN contributes to active avoidance operant behavior when animals are given periods of safe pressing for reward intermixed with times when a cue indicates that presses result in shock. EPN lesions result in continued pressing during the shock period (Margules, 1971; Chavez-Martinez et al., 1987). Accurate performance on this task requires the integration of emotional information (shock avoidance) with appetitive signals (hunger) raising the possibility that a downstream target of the EPN such as the LHb may also be involved in the integration of emotional and motivational information especially when one considers LHb connections with the limbic system.

Another major projection to the LHb originates in the hypothalamus, including the lateral preoptic area (LPOA) and the lateral hypothalamus (Herkenham and Nauta, 1977; Parent et al., 1981). These hypothalamic areas are known for their role in emotional arousal, cue associations and feeding behavior (Stratford and Wirtshafter, 2012; Sohn et al., 2013; Cole et al., 2015). The lateral hypothalamus has also been connected with attention and learning based on cues for both positive and negative outcomes (Ono et al., 1986). Further, orexin/hypocretin and melanin-concentrating hormone neurons originating in the lateral hypothalamus likely project to the LHb, as staining for these receptors or compounds is found in the LHb (Skofitsch et al., 1985; Peyron et al., 1998). This is interesting in relation to behavioral flexibility, as the lateral hypothalamus orexin neurons are proposed to be critical for a flexible arousal system in the brain (Kosse and Burdakov, 2014). Melanin-concentrating hormone also participates in feeding behaviors as well as emotional regulation and stress (Hervieu, 2003; Saito and Nagasaki, 2008). It is not known to date whether the LHb also contributes to these behaviors. An additional potential contributor to the LHb role in behavioral flexibility is the LPOA. Several studies have reported that neurons within the LPOA responded similarly to cues that predicted either positive or negative outcomes, suggesting a role in attention or arousal as would be required during behavioral flexibility (Linseman, 1974; Ono and Nakamura, 1985). Based on the behaviors in which both major LHb afferent systems, the EPN and hypothalamic areas, are involved, the LHb stands in an ideal position to integrate sensory/motor, reward, arousal, and emotion/stress related information to guide behavior as both the internal and external states of the animal change. This integrated signal can then be relayed to midbrain areas coherently and quickly.

Identified projections from frontal cortical areas to the LHb also support LHb involvement in behavioral flexibility (Greatrex and Phillipson, 1982; Kim and Lee, 2012). Projections from the prelimbic and infralimbic regions of the medial prefrontal cortex (mPFC) are largely confined to the medial portions of the LHb, while the anterior cingulate cortex (ACC) and insular cortex project to more lateral areas (Kim and Lee, 2012). The mPFC is known to be important when established behavioral strategies must be overridden, as is required in a number of behavioral flexibility tasks (Seamans et al., 1995; Dalley et al., 2004; Ragozzino, 2007; Shaw et al., 2013). Typically, these mPFC deficits manifest as perseverations on the previous reward contingencies, which are interpreted as an inability to inhibit the previously relevant behavior (Dias and Aggleton, 2000; Ragozzino et al., 2003). In contrast, the ACC contributes to general discrimination learning mechanisms, as both its lesion and its temporary inactivation result in non-specific error patterns and delayed learning (Dias and Aggleton, 2000; Ragozzino and Rozman, 2007; Kosaki and Watanabe, 2012). Neurons recorded from the ACC show greater activation with higher task demands, supporting its role in difficult tasks that require switching behaviors (Johnston et al., 2007). Cognitively demanding tasks are known to result in sustained tonic DA signaling (Abercrombie et al., 1989; Phillips et al., 2004). Thus, input from the ACC and mPFC likely influences the role of the LHb in controlling monoamine projections discussed below. Specifically, we propose that prefrontal information about behavioral context and task difficulty is relayed to the LHb, where it becomes integrated with other input such as reward and effort (from basal ganglia) to influence monoamine resources. 
This possibility is supported by a study which showed that LHb inhibition or excitation in vivo resulted in regionally specific changes in tonic dopamine levels (Lecourtier et al., 2008).

# Efferent Connections of the LHb to the DA and 5-HT Systems and their Role in Behavioral Flexibility

The dopaminergic system has been connected with behavioral flexibility and reinforcement learning for many years (Roberge et al., 1980; Schultz, 1998; Heyser et al., 2000; Kehagia et al., 2010). The DA system is mainly contained within two areas, the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA). The LHb can strongly influence DA neurons in both areas, as even a single LHb electrical stimulation pulse can inhibit DA firing in both structures for as long as 250 ms (Christoph et al., 1986). It is thought that this effect is due to both direct excitatory projections onto GABAergic interneurons and indirect projections via the RMTg (Brinschwitz et al., 2010; Balcita-Pedicino et al., 2011; Goncalves et al., 2012). Both the VTA and SNc project to numerous limbic system and cortical areas that influence behavioral flexibility (Fallon, 1981; Swanson, 1982; Oades and Halliday, 1987). A number of excellent reviews are available on this topic (Floresco and Magyar, 2006; Kehagia et al., 2010; Klanker et al., 2013), and so only a selected number of studies will be highlighted here. Historically, depletion of DA projections to the forebrain was found to cause a reduction in the ability to initiate goal directed actions. However, reflexes and automatic motor movement remained undisturbed, suggesting that DA plays a critical role in goal directed actions via the forebrain. These findings were commonly related to Parkinson's disease, as a gross depletion of DA is a hallmark of that condition. However, in addition to the motor symptoms of Parkinson's disease, deficits in behavioral flexibility are also common. This led to interest in DA contributions to behavioral flexibility.

In general, DA neurons themselves, both within the VTA and SNc, respond to reward predictive cues or reward/punishment (Schultz, 1998; Matsumoto and Hikosaka, 2009). In addition, several basal ganglia areas require DA input in order for animals to successfully enact a number of behavioral flexibility tasks. Nucleus accumbens depletion of DA using the neurotoxin 6-OHDA impairs spontaneous exploratory behavior, discrimination learning and reversal learning (Taghzouti et al., 1985). In contrast, DA depletion in the dorsomedial striatum had only a minor effect on reversal learning as evidenced by slight increases in the magnitude of difference between lesioned and sham animals (O'Neill and Brown, 2007). However, more substantial lesions of mouse dorsal striatum have been found to impair rule switches from a turn to a cue based strategy on a water-based U-shaped maze (Darvas et al., 2014). Using in vivo cyclic voltammetry, striatal DA has been found to signal reward predictive cues and unexpected rewards (Aragona et al., 2009; Brown et al., 2011). Thus the striatum represents a possible node in a network that includes the LHb and the DA system to signal when expected events begin or when expectations are violated and new behaviors must be implemented. Another possible actor in this network is the prefrontal cortex. Medial prefrontal 6-OHDA lesions were found to impair the ability of animals to acquire a response set indicating that DA facilitates set or strategy formation (Crofts et al., 2001). Similarly, using in vivo microdialysis it was found that DA levels increased in the mPFC during both the acquisition and switching of a brightness/texture discrimination as well as when reward was given unpredictably (Stefani and Moghaddam, 2006). However, when reward was given on all arm entries in a non-contingent predictable manner, no changes were observed (Stefani and Moghaddam, 2006). 
Overall, it is likely that the basal ganglia and prefrontal cortex work together to signal when behaviors are to be learned or performed and when these learned behaviors must be changed due to changes in reward outcomes.

Recently, there has been a growing interest in how specific firing patterns of DA neurons affect goal directed behavior. Two commonly studied modes of DA transmission are tonic and burst firing. Evidence suggests that burst firing plays a key role in reward prediction and learning in several brain areas (Schultz, 1998; Matsumoto and Hikosaka, 2009; Brown et al., 2011). Tonic firing on the other hand, is thought to influence the plasticity in various circuits (Frank, 2005; Goto and Grace, 2005; Dreyer et al., 2010). One particularly interesting study blocked burst firing in DA cells by selectively removing their N-methyl-D-aspartate (NMDA) receptors. This resulted in a reduced ability to learn and reverse a cue based reward association on a t-maze (Zweifel et al., 2009). The LHb may differentially affect tonic and burst firing aspects of DA transmission contributing to unique control over the role of DA in goal directed actions. In support of this hypothesis, inhibition of the LHb in awake and behaving animals resulted in a sustained (∼1 h) increase in DA in the prefrontal cortex, nucleus accumbens and the lateral striatum (although with different magnitudes and time courses) suggesting a role in tonic neurotransmission (Lecourtier et al., 2008). Stimulation of the LHb during receipt of a reward was shown to block reward-induced DA neuron excitation and shift preferences to the alternative choice suggesting the LHb also plays a role in burst transmission (Stopper et al., 2014). More studies are needed to determine how the LHb might act on tonic and burst firing modes of DA within the same task.

Analogous to the dopaminergic system, the serotonergic system also has a complex role in behavioral flexibility. There are two main 5-HT nuclei in the brain, the DRN and MRN, which together project to a majority of other brain structures (Bobillier et al., 1979; Waterhouse et al., 1986; Sim and Joseph, 1992; Vertes et al., 1999; Vasudeva et al., 2011). Limited research has examined the role of the DRN and MRN themselves in behavioral flexibility. One study indicated that electrolytic lesions of the MRN caused an impairment on an egocentric reversal learning task without affecting initial acquisition (Wirtshafter and Asin, 1986). However, most research has focused on anxiety/depressive-like or ingestive behaviors in relation to MRN functions (Wirtshafter, 2001; Andrade et al., 2013; López Hill et al., 2013; Zangrossi and Graeff, 2014). Likewise, there are limited data on the effect of DRN manipulation on behavioral flexibility. Recording from DRN neurons in monkeys has revealed tonic changes in firing rate associated with ongoing goal directed behaviors that continue until after receipt of the reward (Nakamura et al., 2008; Bromberg-Martin et al., 2010). Additionally, electrical stimulation of the DRN can reinforce instrumental behavior (Corbett and Wise, 1979; Rompre and Miliaressis, 1985). More recent work using optogenetic targeting of DRN subpopulations suggests that both 5-HT and non-5-HT mechanisms contribute to reinforcing instrumental behaviors (Liu et al., 2014; McDevitt et al., 2014). These data suggest that the DRN serves to reinforce specific goal directed actions.

While the specific contributions of the MRN and DRN to behavioral flexibility remain relatively unknown, the impact of systemic 5-HT manipulation has been much more extensively examined in behavioral flexibility tasks (Evers et al., 2007; Lapiz-Bluhm et al., 2009; Baker et al., 2011; Mohler et al., 2011; Pennanen et al., 2013; Barlow et al., 2015). The selective 5-HT reuptake inhibitors (SSRI) citalopram or fluoxetine or deletion of the serotonin transporter have been shown to enhance behavioral flexibility selectively when animals are required to switch from either a learned or pre-potent behavior (Brigman et al., 2010; Brown et al., 2012). Additionally, serotonin depletion through tryptophan deprivation has been shown to impair behavioral flexibility in humans (Evers et al., 2007), however, in rats either no deficits have been observed or at higher doses or using parachloroamphetamine, a more fundamental deficit in reinforcement learning has occurred (Masaki et al., 2006; van der Plasse and Feenstra, 2008; Izquierdo et al., 2012). Specific serotonin receptors have also been examined in behavioral flexibility as they are commonly modulated by atypical antipsychotics and thus may offer therapeutic potential to patients who experience deficits in behavioral flexibility. Systemic injection of 5-HT2A or 5-HT2C receptor antagonists have impaired or improved the ability to reverse a spatial strategy respectively, while systemic injection of a 5-HT2A antagonist improved switching between visual cue and response guided strategies (Boulougouris et al., 2008; Baker et al., 2011). In addition, 5-HT6 and 5-HT7 receptor antagonists have been shown to improve behavioral flexibility in both control animals and disease models (Mohler et al., 2011; Nikiforuk, 2012; Nikiforuk and Popik, 2013; Wallace et al., 2014). 
The nature of these 5-HT effects suggests that the 5-HT system, through its various projections and receptors, plays diverse roles in behavioral flexibility depending on the specific conditions of a given experiment. Nonetheless, despite this diversity of action within the 5-HT system, an overriding role in behavioral flexibility is clearly evident which suggests that both general as well as specific forms of input to the 5-HT nuclei may be required during behavioral flexibility tasks.

Overall, common to both the DA and 5-HT system is their well-documented role in goal directed learning both when an animal must initially learn a behavioral discrimination and when a switch in behavior is required. Both types of learning require both information about the internal state of the animal, e.g., motivation, and motor action planning and execution. As demonstrated in the following discussion of LHb manipulations as well as the results of our own research presented herein, the LHb likely informs both monoamine systems of the ongoing or recently chosen relevant behavior so that this information can be used by the DA and 5-HT nuclei to achieve the specific functions ascribed to each.

# A ROLE FOR THE LHb IN BEHAVIORAL FLEXIBILITY

To our knowledge, no experiments have examined the role of the LHb in behavioral flexibility tasks. However, Matsumoto and Hikosaka (2007) found that LHb neurons tracked reversals in task contingencies. Apart from behavioral flexibility specifically, several lines of evidence support a role for the LHb in ongoing goal directed activity. The sole study looking at LHb neurons while a rat performed a behavior used a pellet chasing task. The authors found that a majority of neurons tracked velocity while the animals performed the task (Sharp et al., 2006). This could be interpreted as support for LHb's role in tracking goal directed behavior. Early reports of the effects of lesions to the LHb were typically performed using electrolytic lesions and often included damage to the medial habenula and surrounding thalamic nuclei/interpeduncular nucleus. In experiments under these conditions, it was found that rats were unable to switch behaviors or maintain behaviors when contingencies were changed using appetitive rewards (Thornton and Evans, 1984; Thornton and Davies, 1991). Another study by this group revealed an interesting interaction between stress and goal directed activity. Specifically, rats were given a one way active avoidance test in which they were required to climb onto an escape platform in order to avoid a shock. At low shock intensities, no differences were observed between controls and lesioned animals. However, when either the shock intensity was increased or the platform raised, lesioned animals showed a deficit in escape latency (Thornton and Bradbury, 1989). This suggests that when either stress (internal state) or effort (goal directed action) is increased, the LHb is needed for effective behavioral responses. It is unknown to date whether these effects related to stress or effort might also affect performance on more standard tasks of behavioral flexibility.

More recent studies have used fiber sparing lesions or temporary inactivation to restrict the extent of damage to adjacent areas as it was suggested that related damage may have contributed to effects observed in earlier LHb lesion studies (Wilcox et al., 1986; Thornton et al., 1994). Using fiber sparing excitotoxic lesions selectively in the LHb, an effect on hippocampal dependent learning has been observed in both the Morris water maze and in a spatial recognition task (Goutagny et al., 2013; Mathis et al., 2015). It was also found that LHb inactivation using GABA agonists impaired performance on a cue guided version of the water maze after the initial spatial memory test. One possibility is that animals were unable to alter their behavior after performing the previous test as would be expected if the LHb is important for behavioral flexibility (Mathis et al., 2015). At the very least, it supports a role for the LHb in hippocampal dependent spatial memory. Furthermore, Stopper and Floresco (2014) found that inactivation of the LHb was sufficient to disrupt both probability and temporal discounting. This deficit manifested as animals choosing equally either option in the two choice task (i.e., at chance levels), a result that could be interpreted as an inability to reorganize the appropriate behavior based on the specific cues in the environment. This view is in support of the proposed role of the LHb in organizing adaptive actions. Additionally since discounting tasks rely on choices determined by both subjective and objective value, the role of the LHb is not solely to signal punishment but rather LHb seems to have a richer role that includes decisions related to choice preference.

Based on the extant literature, then, one possibility is that when learning is either stressful or requires additional effort, either cognitive or physical, the LHb relays important information from forebrain areas such as the EPN and limbic areas perhaps to guide decision making relevant to adaptive strategic choices. To begin to probe the role of the LHb in behavioral flexibility under cognitively demanding circumstances, we undertook the following set of studies to clarify its important role when animals must switch from an ongoing to a newly relevant strategy.

The first experiment used in vivo extracellular recordings to address the role that the LHb plays in both spatial memory and behavioral flexibility when the external environment changes in a number of different ways. This task has been found to elicit reward prediction error (RPE) signals within the VTA of freely behaving rats raising the likelihood that these signals may be found in the LHb during this task as well (Puryear et al., 2010; Jo et al., 2013). Specifically, animals were taught to navigate a radial arm maze in order to collect rewards in which alternating arms had either a large or small reward. In the second half of a test session, the contingencies of the task were changed by switching to darkness, omitting some of the rewards, or reversing the reward contingencies. Three different manipulations were administered in order to probe if the LHb responded differently in a number of behavioral contexts or whether it played a more common role in each version of the task.

A second set of experiments examined the role of the LHb in a repeated probabilistic reversal learning maze task via inactivation with the GABA agonists baclofen and muscimol. In experiment 2a, rats were trained on a T-maze to make egocentric or spatial discriminations for 10 consecutive trials after which the contingencies were reversed. The correct arm was rewarded on 80% of the choices while the incorrect arm was never reinforced. This reward schedule was chosen for several reasons. First, it made the task more difficult than a deterministic reversal task causing the animals to commit more errors for analysis. Additionally, because the reward was probabilistic, error patterns could be examined for sensitivity to positive and negative reinforcement following LHb inactivation further revealing the role of the LHb in behavioral flexibility. In order to examine whether the effects observed in experiment 2a were due to general learning or recall effects or rather due to flexible behavior per se, a control experiment 2b was carried out in which inactivation of the LHb was administered either during initial acquisition of the probabilistic task, or during recall of the contingency on the following day. All of these experiments in varying ways required animals to be flexible in their behavior, thereby allowing for an examination of how the LHb contributes to this ability in rats.

### MATERIALS AND METHODS

### Subjects

Twelve male Long-Evans rats (350–500 g, Simonsen Laboratories) were used in experiment 1, and 31 male Long-Evans rats (350–500 g, Charles River) were used in experiments 2a and 2b. All rats were individually housed in a temperature-controlled environment with a 12 h light/dark cycle. All experiments were conducted during the light phase. All subjects were given food and water ad libitum and handled for at least 5 days before behavioral testing began. During behavioral testing, rats were maintained at 85–90% of their maximum free feeding body weight. All animal care was conducted according to guidelines established by the National Institutes of Health and approved by the University of Washington's Institutional Animal Care and Use Committee.

#### Experiment 1: Differential-Reward, Spatial Memory Task

Behavioral training on a differential reward spatial memory task was conducted on an 8-arm radial maze as described previously (Puryear et al., 2010). The black Plexiglas maze consisted of a central platform (19.5 cm dia) that was elevated 79 cm off the ground with eight radially-extending arms (58 × 5.5 cm), see **Figure 2A**. At the end of each maze arm was a small receptacle that contained, on alternating arms, either a small (0.2 mL) or large (0.6 mL) amount of "reward" (50% diluted Ensure chocolate milk). Each maze arm was hinged such that access to the rewards was remotely controlled by moving the proximal segment up or down, connecting or disconnecting the ends of arms from the central platform. The maze was surrounded by black curtains with several visual cues for orientation (**Figures 2A,B**).

represented in red. Rats received reinforcement on 80% of trials for correct choices and 0% for incorrect choices. When rats chose the correct arm 10 trials in a row, contingencies were reversed.


Rats were habituated to the radial arm maze through free exploration, initially with randomly placed puddles of reward, then with rewards only at the ends of arms. Once the animals consistently visited the ends of arms, training of the differential reward spatial memory task began. Each session consisted of two blocks of five trials. Each trial consisted of a study phase and a test phase. During the study phase of each trial, four of the eight arms (two large-reward and two small-reward arms) were pseudorandomly selected and presented individually. After presentation of the fourth arm, the test phase began by making all maze arms accessible at once. The rat was required to collect the remaining rewards. Revisits to previously visited ends of arms within a trial were coded as errors. When the animal returned to the central platform after visiting all eight arms, the arms were lowered so that the rat was confined to the platform, and the experimenter re-baited the arms. The locations of differentially rewarded arms were held constant for each rat throughout training but were counterbalanced across rats. Once rats made an average of one or fewer errors per trial on a training day, they underwent a surgical procedure for the implantation of recording electrodes. Training ranged from 20–40 sessions across rats.
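The error coding and training criterion described above can be illustrated with a minimal Python sketch. The function names, the representation of a trial as a list of arm indices (0 to 7), and the treatment of an empty training day are assumptions made for illustration; they are not taken from the original analysis software.

```python
from typing import List


def count_revisit_errors(visits: List[int]) -> int:
    """Count errors in one trial: each re-entry to an arm end that was
    already visited within the same trial counts as one error."""
    seen = set()
    errors = 0
    for arm in visits:
        if arm in seen:
            errors += 1
        else:
            seen.add(arm)
    return errors


def meets_training_criterion(trials: List[List[int]],
                             max_mean_errors: float = 1.0) -> bool:
    """Return True when the mean number of errors per trial on a training
    day is at or below the criterion (one or fewer errors per trial)."""
    if not trials:
        return False  # assumption: a day with no trials cannot meet criterion
    total = sum(count_revisit_errors(t) for t in trials)
    return total / len(trials) <= max_mean_errors


if __name__ == "__main__":
    day = [[0, 3, 5, 6, 1, 3, 2, 4, 7], [2, 4, 6, 0, 1, 3, 5, 7]]
    print(count_revisit_errors(day[0]), meets_training_criterion(day))
```

In this sketch a trial's visit list includes both study-phase and test-phase arm entries, since the text codes revisits "within a trial" without separating the two phases.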

During recording sessions, block 1 consisted of four baseline trials, in which reward locations were kept identical to those used during initial training. One of three experimental manipulations was conducted during the four trials of block 2: reward switch, reward omission, or darkness. Large and small reward locations were switched during "reward switch" trials. In "reward omission" trials, two pseudorandomly chosen rewards (one large, one small) were omitted during the study phase. Reward switch and omission create conditions in which the animal encounters larger than expected rewards, smaller than expected rewards, and the unexpected absence of rewards. In the darkness condition, maze lights were turned off to eliminate access to visual cues. On average, rats were exposed to 10 switch and omission sessions and seven darkness sessions.

#### Experiments 2a and 2b: Repeated Probabilistic Reversal Learning

Eleven rats were trained on a modified T-maze with return arms so that rats could freely return to the start location after rewards were collected (**Figures 2C,D**). The maze was controlled by custom built robotics and software to open and close arms and deliver rewards (z-basic, Elba corp., Beaverton, OR, USA). Initially, rats were trained to alternate reward arms for 10 trials (i.e., access was allowed for a single alternating arm per trial). Then rats were given 10 free choice trials (i.e., simultaneous access to both arms) in order to determine if rats had a strong choice bias. No rats tested displayed a strong choice bias across days. Once animals completed the initial training session in less than 30 min for two consecutive days, probabilistic reversal training began (4–8 days). The initial training arm was randomly selected for each rat. On the first day of training, animals were allowed to freely choose either arm, with the correct arm resulting in a one pellet reward 80% of the time and the incorrect arm never resulting in reward. Either choice resulted in a 10 s intertrial interval (ITI; rat was located at the goal area) to control for the time it took to consume the reward following a correct choice in which a pellet was delivered. Rats then returned to the stem of the T-maze via a return track. Each animal continued to choose either arm until it chose the correct arm 10 times in a row. The animal was then removed from the maze. The following day, the opposite arm was designated the correct arm and animals were required to reverse their choice in order to receive reinforcement. Again, animals were allowed to freely choose either arm until 10 consecutive correct choices were made, at which time they were removed and returned to the colony. For every subsequent day, the initially correct arm was pseudorandomly chosen and was switched whenever a rat made 10 consecutive correct choices. A session continued for 2 h or until 200 trials were completed. 
Animals were not tested with inactivation or control injection until they were able to complete at least two reversals and 200 trials for two consecutive days. Once testing began, animals were randomly assigned to receive a local LHb infusion of either saline or baclofen/muscimol 6 min before the test began, followed by the opposite treatment the following day in a repeated measures design. Animals' performance was examined for any effects of order of treatment.
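The contingency rules of the repeated probabilistic reversal sessions (80% reinforcement of the correct arm, no reinforcement of the incorrect arm, a reversal after 10 consecutive correct choices, and a 200-trial limit) can be sketched as a small simulation. This is an illustrative model only: the 2 h time limit and the ITI are not modeled, and the `win_stay_lose_shift` agent, the function names, and the seeding scheme are hypothetical choices made here, not a description of how the rats behaved or of the maze control software.

```python
import random


def win_stay_lose_shift(last, rng):
    """Hypothetical agent: repeat a rewarded choice, otherwise switch arms."""
    if last is None:
        return rng.choice(["left", "right"])
    choice, rewarded = last
    if rewarded:
        return choice
    return "right" if choice == "left" else "left"


def run_session(choose, p_reward=0.8, criterion=10, max_trials=200, seed=0):
    """Simulate one session; return (number of reversals, trial log).

    Each log entry is (choice, is_correct, rewarded)."""
    rng = random.Random(seed)
    correct_arm = rng.choice(["left", "right"])
    streak = 0        # consecutive correct choices, rewarded or not
    reversals = 0
    last = None
    log = []
    for _ in range(max_trials):
        choice = choose(last, rng)
        is_correct = choice == correct_arm
        # Correct arm pays off with probability p_reward; incorrect arm never.
        rewarded = is_correct and rng.random() < p_reward
        log.append((choice, is_correct, rewarded))
        streak = streak + 1 if is_correct else 0
        if streak == criterion:
            correct_arm = "right" if correct_arm == "left" else "left"
            reversals += 1
            streak = 0
        last = (choice, rewarded)
    return reversals, log


if __name__ == "__main__":
    n_reversals, trials = run_session(win_stay_lose_shift, seed=1)
    print(n_reversals, len(trials))
```

Note that the reversal criterion counts consecutive correct choices rather than consecutive rewarded choices, matching the task description, so an unrewarded correct choice does not reset the streak.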

One possibility is that any results observed in experiment 2a are due to an inability of rats to learn discriminations in general, or an impaired ability to recall contingencies learned previously. In order to test these possibilities, an additional control study (experiment 2b) was performed in which the LHb was inactivated during either acquisition of an initial discrimination or retention of that discrimination on the following day. Specifically, once animals were acclimated to the maze, rats (n = 20) were trained to receive 80% reinforcement on the arm opposite their innate bias observed during the maze acclimation stage. Once animals chose the correct arm 10 trials in a row, they were removed from the maze and returned to the colony. The following day, the opposite treatment (saline or baclofen and muscimol) was given and animals performed the same discrimination to 10 consecutive correct trials in the same manner as the previous day. In one group of animals, LHb inactivation occurred during initial acquisition with saline treatment during retention trials, while another group received the reverse injection schedule.

#### Stereotaxic Surgery

For experiment 1, recording tetrodes were constructed from 20 µm lacquer-coated tungsten wires (California Fine Wire). Tetrodes were placed in custom made drives and impedances were measured at 1 kHz then, if necessary, gold-plated or replaced such that final impedances were 0.2–1.2 MΩ. In both experiments rats were deeply anesthetized with isoflurane, followed by administration of an antibiotic (Baytril, 5 mg/kg) and analgesic (Ketoprofen, 1 mg/kg). The skull was exposed and holes were stereotaxically drilled to allow for implantation of either recording electrodes (A-P: −3.5, M-L: ± 0.9, and D-V: 4–5 mm) or guide cannula (A-P: −3.5, M-L: ± 0.9, and D-V: 4.35 mm) dorsal to the LHb. Six animals in experiment 1 were implanted with a 6-tetrode, linear bundle drive unilaterally, and six animals with two 2-tetrode microdrives bilaterally. A reference electrode was also implanted near the anterior cortex (ventral to the brain surface 1–2 mm), and a ground screw was secured to the skull. The drives were then fixed to the skull with screws and acrylic cement. For experiments 2a and 2b, 31 rats were implanted with bilateral guide cannula (Plastics One, Roanoke, VA, USA) aimed 1 mm above the LHb (A-P: −3.5, M-L: ± 0.9, and D-V: −4.35 mm). Rats were allowed to recover for 5 days with free access to food and water. After recovery, rats were returned to a food restricted diet.

Experiment 1 rats were retrained until they completed 10 trials within an hour for two consecutive days. During retraining, tetrodes were slowly lowered to the LHb, no more than 320 µm/day. Once in the target region tetrodes were lowered in 40 µm increments in search of units, no more than 200 µm/day. Once a unit was found, recordings were conducted. At this point, experimental manipulations were also introduced in the behavioral task (see task description). Tetrodes were left in the same location for up to three sessions in an attempt to record units across multiple experimental conditions. For experiment 2, rats were placed on food restriction following recovery and then began training procedures.

#### Microinjection Procedure

A day before microinjection in experiments 2a and 2b, the injection cannula (Plastics One, Roanoke, VA, USA), which extended 1 mm beyond the guide, was inserted into the guide cannula and left in place for 1 min. This was done to control for any initial mechanical damage done by the injector. On a test day, rats were injected with either a combination of baclofen and muscimol (Bac/Mus, Sigma; GABA<sub>B</sub> and GABA<sub>A</sub> agonists, respectively) in 0.9% saline, or vehicle. Both injections used a volume of 0.2 µL (50 ng/0.2 µL baclofen and muscimol) and a 0.15 µL/min infusion rate. This is similar to other LHb inactivation studies that used baclofen and muscimol (Stopper and Floresco, 2014; Mathis et al., 2015). The injection cannula was connected to a 10 µL syringe (Hamilton) via polyethylene tubing (PE 20) using an infusion pump (KD Scientific).

#### Data Collection and Analysis

In experiment 1 all cellular recordings were conducted using a Cheetah data acquisition system (Neuralynx). Cell signals were filtered between 0.6 and 6 kHz, and digitized at 32 kHz. Neuronal spikes were recorded for 2 ms after a voltage deflection exceeded a predetermined threshold on any of the four channels of a tetrode (500–7000× amplification). Animal position data were sampled at 30 Hz via a ceiling mounted video camera that tracked LEDs attached to a preamplifier on the animal's head. Signals were manually sorted using Offline Sorter (Plexon, Inc.), which allows segregation of spikes based on clustering parameters such as spike amplitude, spike duration, and waveform principal components. Cells were further analyzed if the waveform amplitude was at least 1.5 times that of the background cellular activity, and if the cluster boundaries were consistent across the session. The behavioral correlates of unit activity were analyzed using custom Matlab software (MathWorks Inc., Natick, MA, USA). Position data were used to manually place event flags to mark various aspects of behavior throughout the task including reward encounter, animal turns, inbound movement, trial starts, and errors. Given our hypothesis that the LHb regulates VTA dopamine cell responses to reward, reward-related responding in LHb cells was evaluated using methods similar to those used to identify VTA reward responses in prior studies (e.g., Jo et al., 2013; Puryear et al., 2010). In short, neural data were organized into peri-event histograms (PETHs) that were centered around the time of reward encounters (±2.5 s; 50 ms bins). Cells were considered to be reward related if peak (or valley) firing occurred within ±150 ms of reward encounters, and the mean firing rate of the ±150 ms window around the reward encounter was over 150% or under 75% of the mean session firing rate. Throughout the course of the experiment, it became clear that the LHb contained velocity correlated cells. Thus, firing rates of LHb neurons were correlated with the velocity of the animals as they traversed the maze. Based on animal tracking data, "instantaneous" velocity of the animal was determined by dividing the distance between two points by the inverse of the video sampling rate (Gill and Mizumori, 2006; Puryear et al., 2010; Mizumori et al., 2004). Each cell's firing rate was then correlated with these velocity measures (Pearson's linear correlation; α = 0.05) within the range of 1–30 cm/s. Velocity analysis did not include times when the animal was not moving, for example during reward consumption.
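The velocity analysis described above can be illustrated with a minimal Python sketch. It is not the custom Matlab software used in the study: the function names, the assumption that positions are already in centimeters, and the pairing of one firing-rate value with each inter-frame interval are all hypothetical choices made for illustration. The significance test (α = 0.05) is omitted.

```python
import math

FRAME_RATE_HZ = 30.0        # video sampling rate reported in the text
V_MIN, V_MAX = 1.0, 30.0    # cm/s range over which correlations were computed


def instantaneous_speed(xs, ys, frame_rate=FRAME_RATE_HZ):
    """Speed (cm/s) between consecutive tracked positions given in cm."""
    dt = 1.0 / frame_rate   # inverse of the sampling rate, in seconds
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])]


def pearson_r(a, b):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def velocity_correlation(xs, ys, rates):
    """Correlate firing rate with speed, keeping only samples in 1-30 cm/s.

    rates[i] is the (assumed) firing rate in spikes/s during the interval
    between position samples i and i + 1.
    """
    speeds = instantaneous_speed(xs, ys)
    kept = [(v, r) for v, r in zip(speeds, rates) if V_MIN <= v <= V_MAX]
    kept_speeds, kept_rates = zip(*kept)
    return pearson_r(kept_speeds, kept_rates)
```

Samples below 1 cm/s are dropped by the range filter, which mirrors the exclusion of stationary periods such as reward consumption.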

For local field potential (LFP) analysis, signals from each tetrode within the LHb were analyzed in the following manner. Power was calculated using the multitaper Fourier analysis function mtspecgramc from the Chronux toolbox (Mitra and Bokil, 2007; Bokil et al., 2010), using a 500 ms window with a 50 ms step. The resulting spectrogram was filtered for the theta frequency band (4–8 Hz) and binned relative to event timestamps. Values were interpolated where possible; otherwise they were set to NaN. The mean was taken over the theta frequency band using the Matlab function nanmean, which excludes NaN values. Finally, values were converted to dB using the relation 10 × log<sub>10</sub>(µV<sup>2</sup>/Hz) and the mean was taken over the bins of each event occurrence, again using nanmean. Analysis of LFP velocity correlates matched that of unit analysis. Reward responses were analyzed by comparing the 200 ms around reward encounter with another 200 ms time window 1800 ms after the reward encounter, a time found to have similar velocity to the reward encounter. Comparisons for significant changes around the reward encounter were tested using a Student's t-test with Bonferroni corrections for multiple comparisons. Proportions of responding LFP signals were analyzed for significant increases above chance with chi square tests. A two-way analysis of variance (ANOVA) was used to test for differences around the time of reward between blocks one and two to measure whether reward manipulations change reward approach responses.
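The post-spectrogram steps above (theta-band averaging that ignores NaN values, conversion to dB, and averaging across event occurrences) might be sketched as follows. This is an illustrative Python outline, not the Matlab/Chronux pipeline itself: the multitaper spectrogram is assumed to be computed elsewhere, and the data layout (`event_spectrograms[e][t]` holding one row of power values per frequency) and all function names are assumptions.

```python
import math

THETA_BAND_HZ = (4.0, 8.0)   # theta band used in the text


def nanmean(values):
    """Mean that skips NaN entries; NaN if nothing remains."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


def theta_power_db(power_row, freqs, band=THETA_BAND_HZ):
    """Mean band power (µV²/Hz) for one time bin, converted to dB."""
    in_band = [p for p, f in zip(power_row, freqs) if band[0] <= f <= band[1]]
    mean_power = nanmean(in_band)
    if math.isnan(mean_power) or mean_power <= 0:
        return math.nan
    return 10.0 * math.log10(mean_power)


def event_averaged_theta(event_spectrograms, freqs):
    """Average theta power (dB) per peri-event time bin across events.

    event_spectrograms[e][t] is the power row (one value per frequency)
    for event occurrence e and time bin t (hypothetical layout).
    """
    n_bins = len(event_spectrograms[0])
    return [nanmean([theta_power_db(event[t], freqs)
                     for event in event_spectrograms])
            for t in range(n_bins)]
```

Bins that could not be interpolated stay NaN and simply drop out of both averages, which is the behavior the text attributes to nanmean.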

For experiment 2, an error analysis was conducted to determine whether inactivation caused changes in the ability to initially inhibit the previously correct choice pattern (perseverative errors) and/or the likelihood that an animal maintained the new choice pattern once selected and reinforced (regressive errors). The first trial of reversal learning was not counted as a perseverative error, but served as initial negative feedback. The following trials were divided into blocks of four trials. If a rat continued to choose the previous location in at least three of the four trials, the block was counted as perseveration. Once the rat made two correct choices in a given block, all subsequent errors were counted as regressive errors as in previous studies (Brown et al., 2012). Additionally, an analysis of win-stay and lose-shift probabilities was carried out. Win-stay probability is the likelihood that a rat will choose the correct arm if it was rewarded in the immediately preceding trial (the number of subsequent correct choices/the total number of preceding rewarded correct choices). Lose-shift probability is the frequency with which the rat shifted to the other choice when the correct arm was not rewarded on the previous trial (the number of subsequent incorrect choices/the total number of preceding unrewarded correct choices). This was observed on only a minority of trials. These measures are thought to represent sensitivity to positive and negative reinforcement, respectively (Means and Holsten, 1992; Bari et al., 2010; Amodeo et al., 2012). The effects of LHb inactivation were assessed in terms of the number of trials per reversal, total number of reversals completed, and all error measures; these parameters were analyzed using a repeated measures Student's t-test. For experiment 2b, a two-way ANOVA was used to test for order of treatments as well as stage of discrimination.
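As a hedged illustration of the error and feedback measures defined above, the following Python sketch scores one reversal. Several details are assumptions rather than statements from the text: errors are counted per trial (not per block), errors in the first block with two or more correct choices are treated as regressive, a trailing partial block is scored with the same rule, and the function names and boolean input format are hypothetical.

```python
def classify_errors(correct):
    """Split the errors of one reversal into (perseverative, regressive).

    correct: list of booleans, one per trial, in trial order.
    """
    trials = correct[1:]            # first reversal trial is feedback only
    perseverative = regressive = 0
    persevering = True
    for start in range(0, len(trials), 4):
        block = trials[start:start + 4]
        errors = block.count(False)
        if persevering and errors >= 3:
            perseverative += errors  # previous location chosen >= 3 of 4
        else:
            persevering = False      # two or more correct choices reached
            regressive += errors
    return perseverative, regressive


def win_stay_lose_shift(correct, rewarded):
    """Win-stay and lose-shift probabilities after correct-arm choices."""
    wins = stays = losses = shifts = 0
    for i in range(len(correct) - 1):
        if not correct[i]:
            continue                 # only correct-arm choices are scored
        if rewarded[i]:
            wins += 1
            stays += int(correct[i + 1])
        else:
            losses += 1
            shifts += int(not correct[i + 1])
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift
```

Because reinforcement was probabilistic, a correct choice can go unrewarded, which is why the lose-shift denominator counts unrewarded correct choices only.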

#### Histology

After the completion of all recording sessions, tetrode locations and cannula placements were verified with marking lesions. Rats were deeply anesthetized with 4% isoflurane, and each tetrode (Experiment 1) was marked by passing a 15 µA current through each tetrode wire for 15 s. The animals were then given an overdose of sodium pentobarbital and transcardially perfused with 0.9% saline and a 10% formaldehyde solution. Brains were stored in a 30% sucrose in 10% formalin solution at 4 °C for 1 week. The brains were frozen, and then cut in coronal sections (45 µm) on a freezing microtome. The sections were mounted on gelatin-coated slides, stained with cresyl violet, and examined under light microscopy. Only cells verified to be recorded in LHb were included in the data analysis. In experiments 2a and 2b, only cannula placements within the LHb were included in the analysis. For experiment 1, the locations of recorded cells were determined using standard histological reconstruction methods.

#### RESULTS

#### Histology

Histological results are summarized in **Figure 3**. Of the 12 animals implanted in experiment 1, tetrode placement in the LHb was confirmed in six. In the LHb, a total of 36 unique units were recorded throughout this task. Many cells were recorded for multiple sessions (up to three) in an attempt to capture their responses under various experimental manipulations. Units were considered the same cell if they were recorded at the same depth and had comparable waveforms. There was minimal ambiguity in this selection process, with signals in the LHb being relatively sparse and often only one cell being recorded per session. Of the 31 rats implanted in experiment 2, 19 had bilateral cannula placements within the LHb, completed the study, and were included in the analysis. Of the remaining rats, two did not meet training criteria during experiment 2a and were removed prior to injection. An additional rat had to be removed from the study due to complications following surgery. An additional two rats in experiment 2a had misplacements in the hippocampus (dorsal to the LHb) and completed six and three reversals, respectively, indicating no clear effect of a restricted hippocampal inactivation on the task. Of the seven rats with misplacements in experiment 2b, five rats had placements in the mediodorsal thalamus. Interestingly, in each of these cases, rats did not complete the task and qualitatively engaged in freezing behavior or a refusal to move. The remaining two anterior placements did not show any signs of impairment.

## Experiment 1

#### Behavior

Rats were exposed to one of three randomly chosen reward manipulations during the second block of trials in a daily session. The total number of errors was compared between the two blocks of trials. On sessions in which rewards were switched in the second block, animals tended to make more errors in the second block (5.93 ± 1.14) than in the first block (3.93 ± 0.68); however, this difference was not significant, t<sub>14</sub> = 1.36, p > 0.05. Similarly, when rewards were omitted, rats tended to make more errors in the second block (4.77 ± 1.36) than in the first block (2.23 ± 0.63), but this difference was not significant, t<sub>12</sub> = 1.45, p > 0.05. Finally, the number of errors made by rats run in darkness during the second block (4.00 ± 1.49) did not differ significantly from that in the first block (3.00 ± 0.94), t<sub>5</sub> = 0.47, p > 0.05.

#### Mean Firing Rates

Sample LHb traces are shown in **Figure 4A**. Mean firing rates ranged from 0.5–107.6 spikes/s. **Figure 4B** shows the distribution of mean firing rates for LHb cells recorded in the study. Over half the sessions contained units with an average firing rate of less than 10 spikes/s. However, 44% of neurons had mean firing rates over 10 Hz. The wide range of average firing rates suggests that there were multiple cell types recorded throughout this study in accordance with other in vivo rodent or primate studies which have found population averages around 10 Hz or slightly below (Sharp et al., 2006; Matsumoto and Hikosaka, 2007; Aizawa et al., 2013; Goutagny et al., 2013).

#### Reward and Consumption: Single Unit Data

Only 2 (of 36) recorded cells showed firing that correlated with reward. Thus, here we provide only qualitative accounts for each cell. One neuron met criteria for a negative RPE cell, shown in **Figure 5A**. The cell was significantly inhibited at the time of reward encounter, and was excited when rewards were omitted (see "Materials and Methods" Section for criteria).

Another neuron was found to track both velocity and reward consumption, shown in **Figure 5B**. The cell was significantly correlated with velocity (Pearson's r = 0.90, p < 0.001). It also exhibited firing when the animal was not moving, but consuming reward, with a qualitatively differential duration according to reward size. Excitation was not observed during reward omission, and the cell started firing only after the animal started to move.
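For reference, the reward-related criterion from the Materials and Methods (a PETH spanning ±2.5 s in 50 ms bins, a peak or valley within ±150 ms of reward encounter, and a window rate above 150% or below 75% of the session mean) could be expressed as in the Python sketch below. The function names and input format are hypothetical, pairing the peak with the 150% threshold and the valley with the 75% threshold is an interpretive assumption, and the sketch is not the analysis code used in the study.

```python
import math

BIN_S = 0.05            # 50 ms bins
HALF_WINDOW_S = 2.5     # PETH spans +/-2.5 s around reward encounter
REWARD_WINDOW_S = 0.15  # +/-150 ms criterion window


def build_peth(spike_times, event_times):
    """Mean firing rate (spikes/s) per bin, aligned to event times (s)."""
    n_bins = round(2 * HALF_WINDOW_S / BIN_S)
    counts = [0] * n_bins
    for event in event_times:
        for t in spike_times:
            idx = math.floor((t - event + HALF_WINDOW_S) / BIN_S)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return [c / (len(event_times) * BIN_S) for c in counts]


def is_reward_related(peth, session_rate):
    """Apply the peak/valley and 150%/75% criteria to one cell's PETH."""
    centers = [-HALF_WINDOW_S + (i + 0.5) * BIN_S for i in range(len(peth))]
    window = [peth[i] for i, c in enumerate(centers)
              if abs(c) <= REWARD_WINDOW_S]
    window_rate = sum(window) / len(window)
    peak_in_window = max(window) == max(peth)
    valley_in_window = min(window) == min(peth)
    excited = peak_in_window and window_rate > 1.5 * session_rate
    inhibited = valley_in_window and window_rate < 0.75 * session_rate
    return excited or inhibited
```

With 50 ms bins, the ±150 ms window covers the six bins centered between −125 and +125 ms.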

#### Movement-Related Single Unit Responses

Overall, 66% of LHb cells (23/36) were significantly correlated with animal running speed. Of these running speed cells about half (12/23) showed positive correlations while the other half showed negative correlations (11/23). Example unit data are shown in **Figures 6A,B**. **Figure 6C** shows a scatterplot of the stability of these correlations between blocks. Units included in the plot were found to be significantly correlated with animal running speed across both trial blocks. Different colors/shapes indicate the experimental manipulation conducted during the session. No differences were observed in the numbers of cells that were positively correlated, negatively correlated, or uncorrelated with velocity across switch, omission, and darkness manipulations (χ<sup>2</sup> = 0.39, p > 0.05, χ<sup>2</sup> = 0.12, p > 0.05, and χ<sup>2</sup> = 2.11, p > 0.05, respectively). In fact, many of these cells were recorded for multiple sessions and comparable correlations were found across experimental manipulations.

LHb during a recording session. (C) Frequency count of firing rate ranges for all recorded LHb units.

#### LFP Responses

The theta frequency band (4–8 Hz) of the LFP data was analyzed for sessions containing units in the LHb (raw trace shown in **Figure 4C**). Fifty-two individual LFP signals were recorded across six rats. In general, theta power (dB) was found to be significantly correlated with animal running speed (velocity) as well as reward approach. Using instantaneous velocity correlations and modulation of theta power around the time of the reward, 46 of the 52 LFP signals were identified with either or both measures (**Figure 7**). Specifically, 12 were found to correlate with reward only, 15 with both reward and velocity, and another 21 with velocity only. Chi square analysis revealed proportions to be significantly above chance, χ<sup>2</sup> = 6.31, p < 0.05; χ<sup>2</sup> = 9.67, p < 0.05; and χ<sup>2</sup> = 17.76, p < 0.05, respectively.

LFP signals that significantly correlated with velocity (e.g., **Figure 8A**) were more likely to have positive (n = 32) than negative (n = 4) correlates, χ<sup>2</sup> = 12.83, p < 0.05. To examine the stability of velocity correlations, sessions were then grouped by tetrode location, such that a tetrode held at the same depth for multiple sessions would be considered a single "unit". Grouped in this way, 18 out of 22 "units" were found to be significantly correlated with the animals' running speed. **Figure 6D** shows the stability of these correlations across blocks, indicating that LHb theta did not respond to reward or environmental changes. Specifically, from block 1 to block 2, only seven signals significantly changed their velocity correlations while 29 remained the same.

Of the 27 reward responsive signals, there was no difference in the proportions of positive (n = 13) and negative (n = 14) correlated signals, χ<sup>2</sup> = 0.07, p > 0.05 (see **Figures 8B–D**). Further, the magnitude of response for both the positive and negative responses did not change across conditions from the first to second block, indicating that the location of the reward or approach to it may have determined the LFP reward response rather than the size or presence of the reward itself. Specifically, an effect of time (±200 ms) around the reward was observed (F<sub>4,200</sub> = 4.33, p < 0.05) but not of block (F<sub>1,200</sub> = 0.19, p > 0.05) or an interaction, F<sub>4,200</sub> = 0.81, p > 0.05.

## Experiment 2a

Inactivation of the LHb during repeated probabilistic reversal learning was performed in order to test the specific contributions of the LHb to behavioral flexibility, as manifest through analysis of errors committed during performance. Six rats with good bilateral cannula placements were included in the final analysis. As shown in **Figures 9A,B**, comparison of inactivation (48.5 ± 9.3) with saline control injections (35.5 ± 7.4) revealed that inactivation of the LHb resulted in an increase in trials to criterion for initial acquisition (t<sub>5</sub> = 4.01, p < 0.05), and a decrease in the number of reversals completed over the 200 trial session, t<sub>5</sub> = 7.00, p < 0.01 (2.3 ± 0.2 and 4.7 ± 0.4, respectively). Additionally, as revealed in **Figure 9C**, the number of trials to complete a given discrimination did not differ across acquisition or any reversal for either saline or Bac/Mus treatment. Rather, LHb inactivation resulted in a consistently higher number of trials to criterion across discrimination stages, with an effect of treatment (F<sub>1,15</sub> = 11.24, p < 0.05), no effect of discrimination stage (F<sub>2,15</sub> = 0.06, p > 0.05), and no interaction effect (F<sub>2,15</sub> = 0.11, p > 0.05).

No effect of order of treatment was observed on total number of reversals completed (saline day one (4.7 ± 0.3) vs. saline day two (4.7 ± 0.9), t<sub>4</sub> = 0.00, p > 0.05) and (Bac/Mus day one (2.3 ± 0.3) vs. Bac/Mus day two (2.3 ± 0.3), t<sub>4</sub> = 0.00, p > 0.05) so the treatments were collapsed into single groups of saline and Bac/Mus for further analysis. Overall, no differences were observed between saline (2712.0 ± 183.3) and Bac/Mus (3274.0 ± 702.8) treatment in terms of the total time it took animals to complete the task (t<sub>5</sub> = 0.65, p > 0.05), revealing no gross changes in motor or sensory activity during the task.

An analysis of errors was conducted to further probe the deficit in the probabilistic reversal learning task following LHb inactivation (**Figure 10**). The deficit in discrimination performance was due to an increase across multiple error types. Perseverative errors rose numerically following Bac/Mus treatment (7.4 ± 3.0) compared with saline treatment (1.8 ± 0.4), but this increase was not significant, t<sub>5</sub> = 1.87, p > 0.05. There was a significant increase in regressive errors (saline = 5.5 ± 0.6 vs. Bac/Mus = 12.1 ± 1.7), t<sub>5</sub> = 3.41, p < 0.05. To assess how LHb inactivation affected sensitivity to reward feedback, win-stay and lose-shift ratios were also analyzed. LHb inactivation led to a decrease in the win-stay ratio (saline = 0.79 ± 0.01, Bac/Mus = 0.57 ± 0.06, t<sub>5</sub> = 4.01, p < 0.05) and an increase in the lose-shift ratio (saline = 0.27 ± 0.01, Bac/Mus = 0.54 ± 0.04, t<sub>5</sub> = 5.77, p < 0.01).

## Experiment 2b

Because LHb inactivation affected the first daily discrimination in experiment 2a, one possibility is that learning in general is affected by the manipulation and that the effects are not due to the requirement for flexibility per se. To address this, an additional control experiment was run to test the effects of LHb inactivation on initial probabilistic learning in rats that had not been trained to perform flexibly prior to testing (**Figure 11**). Two groups of animals were run, receiving either Bac/Mus on acquisition and saline on retention the following day (n = 7), or saline on acquisition and Bac/Mus during retention (n = 6). A two-way ANOVA revealed a significant effect of discrimination stage (F<sub>1,11</sub> = 6.32, p < 0.05) but no effect of either treatment order (F<sub>1,11</sub> = 0.19, p > 0.05) or an interaction (F<sub>1,11</sub> = 0.33, p > 0.05). Specifically, Bac/Mus (83.71 ± 14.85) and saline (72.83 ± 7.23) treated rats required a similar number of trials to reach acquisition criterion as well as during retention (saline = 50.43 ± 5.87, Bac/Mus = 52.00 ± 11.82). However, overall acquisition (78.69 ± 8.14) took significantly more trials to criterion than retention (51.15 ± 5.78). Additionally, no differences in the seconds per trial completed (saline = 24.60 ± 4.79, Bac/Mus = 28.19 ± 4.76, t<sub>11</sub> = 0.53, p > 0.05) were observed during acquisition. However, during retention the seconds per trial completed after LHb inactivation (36.03 ± 4.46) was significantly higher than under control (23.48 ± 2.45) conditions, t<sub>11</sub> = 2.57, p < 0.05 (**Figure 11C**). This change in the time per trial is in contrast with effects of inactivation on this measure across both experiments 2a and 2b where otherwise no difference was observed.

platform. (B) Example unit negatively correlated with velocity under the same conditions. The green line is the session average firing rate for each neuron. The red line is the average velocity of the rat across the peri-event time histogram. (C) Scatterplot showing the stability of individual neuron movement correlations prior to and after reward manipulation. Cells are divided into those that showed an overall significant correlation with velocity during the session and those that did not show a significant correlation for the session. (D) LFP correlation of Theta power with velocity across blocks. Channels are divided into the three possible reward manipulations that were experienced by rats.

# DISCUSSION

This study reveals for the first time that the LHb is involved when animals are required to express learned flexible behavior. Experiment 1 revealed that a majority of neurons in the LHb track ongoing movement, often with high correlations (>0.9) with running speed. The population of movement correlates is split, with half of these cells being positive correlates and half negative. In addition, an analysis of theta rhythms recorded simultaneously with LHb unit activity revealed that running speed information is also represented at the population level. The approach to the reward area also resulted in a significant increase in theta power in 52% of the recorded LFP signals. Neither theta reward nor velocity power correlates changed when reward contingencies were manipulated in the second block, whether by reversal of reward placements, omission of reward, or darkness. This pattern of effects suggests that the tracking of movement and reward approach by these cells/population signals is more related to ongoing behavior than to reward-specific responses during the task. In contrast, only 2 out of 36 unique units recorded were found to be reward related: one was linked to consummatory behavior in addition to velocity, and the other exhibited activity suggestive of a code for an RPE.

In experiment 2a, LHb inactivation during repeated probabilistic reversals of a spatial/response task resulted in impairment in performance that was consistent across multiple reversals, and not related to nonspecific sensory or motor impairment. Animals required more trials to criteria per completed discrimination, suggesting a general impairment in reward discrimination learning when animals need to be flexible in their behavior. Additionally, an increase in regressive errors was observed along with changes in win-stay and lose-shift ratios which suggest a reduction in both reward and non-reward sensitivity. In experiment 2b, however, no effects of LHb inactivation were observed on either the initial acquisition of a probabilistic reward discrimination or on the retention of that discrimination on the following day. Overall, the combined results of these experiments indicate that the LHb tracks ongoing behavioral information for the purpose of facilitating processes that are required for behavioral flexibility. Specifically, LHb may track ongoing behavior so that actions toward goals are optimized. For this reason the LHb may only be required when behavioral strategies must be used in cognitively demanding tasks in order to track ongoing information about specific motor behaviors currently being performed. Specifically, in non-aversive situations, LHb may be involved to the extent that behavioral response strategies are needed to achieve a goal; it may track and direct behaviors to follow dynamic goal information.

The general contribution of the LHb to expressing learned goal directed behavioral activity under demanding conditions is supported by related findings. Mathis et al. (2015) found that when the LHb was inactivated or if excitatory transmission was blocked with the AMPA receptor antagonist CNQX, rats were unable to express a learned spatial memory of the escape platform location in the Morris water maze and instead showed thigmotaxis. Thigmotaxis is a behavioral strategy initially used when rats are first placed into the maze, one that is characterized by a preference to remain close to the perimeter of an environment. The stress of the water maze may be sufficiently demanding in this instance to require the LHb. Inactivation of the LHb in well trained rats also disrupted both delay and probability discounting by inducing random patterns of choice which could be interpreted as a default or guessing mode (Stopper and Floresco, 2014). These studies may relate to other findings that LHb optogenetic activation can promote active, passive and conditioned behavioral avoidance, which suggests that LHb activity is important for learning specific behaviors in response to external/internal stimuli (Stamatakis and Stuber, 2012).

The role for the LHb in utilizing context or behavioral state information to guide behavior can be further illuminated by comparing it with a recent proposal for basolateral amygdala function put forth by Wassum and Izquierdo (2015). The authors propose that the role of the basolateral amygdala is to assign an integrated value signal to specific stimuli in order to guide adaptive responses. We propose a more fundamental role for the LHb in behavioral flexibility in which it provides the current behavioral state of the animal in order to properly select the appropriate actions within that context. This difference is best exemplified by examining the effects of inactivation of these structures during discounting behaviors. As outlined above, inactivation of the LHb leads to a guessing mode during delay and probability discounting where each choice is selected roughly half of the time (Stopper and Floresco, 2014). Inactivation or DA manipulation of the basolateral amygdala in both delay and probability discounting causes rats to prefer the smaller certain or immediate reward once either the probability decreases or the delay increases (Winstanley et al., 2004; Churchwell et al., 2009; Ghods-Sharifi et al., 2009; Larkin et al., 2015). Crucially, however, discounting curves in both tasks increased rather than decreased in slope, indicating a change in preference rather than entering a guessing mode. This helps to distinguish the proposed role for the LHb in identifying the current behavioral state of the animal from a role in valuation of specific actions as is suggested for the basolateral amygdala (Wassum and Izquierdo, 2015).

The mechanism by which neural signals within the LHb translate to an ability to switch behaviors under cognitively demanding conditions in freely moving rodents remains somewhat uncertain. One possible mechanism that has been proposed is through signaling RPEs as has been seen with LHb neurons in head-fixed monkeys (Matsumoto and Hikosaka, 2007; Proulx et al., 2014; Stopper et al., 2014). Although it is exciting to confirm the presence of RPEs in LHb cells, this is a considerably smaller proportion of cells than expected, as Matsumoto and Hikosaka (2007) found over 80% of primate LHb cell activity to be related to rewards. Considering the differences in the animals and task used, there are a number of reasons why this could be the case. The original task used by Matsumoto and Hikosaka was much more Pavlovian in nature compared to our maze based tasks. While LHb neurons did show some excitation during unrewarded trials, these neurons showed much greater responses to the cues that predict reward omissions. They also showed high levels of responding during the first trial where a reward was omitted; however, once the animal knows whether or not it will be rewarded and the outcome is congruous with the expectation, there is little change from baseline at the actual outcome. Given that the animals in experiment 1 were highly trained, this may explain why we failed to observe more responses directly at the time of reward. However, this cannot completely account for the lack of observed RPE signals as in every case, the rat could not predict which rewards would be switched or omitted during a given session. In addition, the task used in primates requires the subject to be head-fixed, which would abolish movement related neuronal activity. It could be argued that in the present study, if the rat LHb was tracking some sort of discrete cue, movement-related activity of the LHb cells somehow masked these signals. This is not particularly probable, as the task was designed without explicit cues. However, it may be the case that movement itself is the most reliable cue for when rewards will be received.

significant increase on the trials to criterion for the initial acquisition of the test day. (B) There was also a decrease in the total number of completed reversals following LHb inactivation. (C) LHb inactivation resulted in a higher number of trials to criterion across completed discriminations. ∗p < 0.05, ∗∗p < 0.01.

In our task, the animal is very well trained, and although navigation is goal directed, some aspects of the task are highly predictable and reliable. For example, all rewards are the same distance away from the center; therefore, once a choice is made, animal trajectory becomes perhaps the most reliable reward predictor. Movement correlates in rat LHb have been previously reported during a pellet-chasing task, which encourages the animals to run in semi random trajectories (Sharp et al., 2006). They found ∼10% of recorded neurons to be significantly correlated with running speed as compared to our 66% of neurons and LFP theta signals. If the LHb is tracking reward cues, animal movement may be overrepresented in this task. This would suggest that movement itself can serve as a reward predictive stimulus in freely moving animals. This is supported by the finding that despite only finding 2 of 36 neurons related to reward consumption, over half of the LFP signals recorded showed an increase in theta power during reward approach that was not related to consumption. Theta power synchrony is thought to be an effective means for relaying information between brain areas (Panzeri et al., 1999; Fries, 2005) and has been found to be related to behavioral performance between the LHb and the hippocampus (Goutagny et al., 2013). This raises the possibility that the LHb may also use theta synchrony to relay velocity/reward approach information to other areas, such as the VTA (Kim et al., 2012). Indeed, velocity or reward related approach neural correlates have been observed in the radial arm spatial memory task previously, suggesting this information may be important for reward learning (Puryear et al., 2010). A similar interpretation for strong velocity correlated neural activity has been proposed for another major afferent system of VTA DA neurons, the lateral dorsal tegmentum. The latter neurons were postulated to regulate reward responses of DA neurons according to the learned behaviors needed to obtain rewards (Redila et al., 2015).

To dissociate predictive movement from movement per se, future studies should include an open field component. If a proportion of the movement correlates found in the present study were actually reward predicting cues, then a subpopulation of these cells would not exhibit velocity correlates if recorded during general ambulation. To further investigate reward related responses, future studies should also consider using a task featuring explicit cues for rewards in order to observe LHb responses to reward predicting cues. Regardless of the interpretation of the movement correlates, it is clear that a proportion of LHb cells are heavily modulated by ongoing speed of the animal in freely navigating rats. LHb cells recorded in this study showed either positive or negative running speed correlates, which suggests that there may be subpopulations that code for different movement parameters. Either type of velocity code could inform other structures and/or other LHb cells of ongoing behavior in anticipation of reward encounters. If the primary function of the LHb is to suppress movement during unfavorable conditions, as Hikosaka (2010) proposes, it would be adaptive for movement suppression networks to be informed of ongoing behavior. In this way, velocity information could bias action specific learning, and discourage actions that lead to negative outcomes. Velocity information could also be helpful for more specific control over movement suppression such as the timing of the suppression.

Experiment 2 showed a deficit in behavioral flexibility following inactivation. One possible cause of this effect is the connection of the LHb to the hippocampus. The hippocampus has been proposed to signal context and context changes important in adaptive decision making (Mizumori et al., 2004; Smith and Mizumori, 2006; Kim and Frank, 2009; Bachevalier et al., 2015). LHb cells have been found to be more active and to phase lock during hippocampal theta than during slow wave sleep across the sleep wake cycle (Goutagny et al., 2013). Theta generated within the LHb was also highly synchronized with hippocampal theta, and this synchrony was linearly correlated with performance on a memory task. The hippocampus is thought to be important for reversing probabilistically learned tasks in humans (Shohamy et al., 2009; Dickerson et al., 2011; Delgado and Dickerson, 2012). This effect is not due to any spatial aspects of the task, which suggests that the hippocampus is important for applying higher order signals to goal directed behavior, such as would be necessary when reward contingencies change probabilistically. This could account for the effects observed in experiment 2, although this is largely speculative. Support for this account of a loss of context, however, can be found in the fact that in the present study, animals' win-stay and lose-shift ratios fell to around 50%, suggesting they were likely guessing. This phenomenon has also been seen in both delay and probability discounting following LHb inactivation (Stopper and Floresco, 2014). In experiment 1, theta power was correlated with velocity, which also supports the idea that this velocity related information may serve as a predictive stimulus during behavioral flexibility through cross talk with context related signals in the hippocampus. Further research should determine if this interaction could account for deficits observed following LHb inactivation. How the LHb, hippocampus, and midbrain monoaminergic systems interact during behavioral flexibility clearly requires more research.
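To make the win-stay and lose-shift measures concrete, the following is a minimal sketch of one common way to compute them from a trial-by-trial record. The function name, the input format, and the exact definition (proportion of rewarded trials followed by the same choice, and proportion of unrewarded trials followed by a different choice) are assumptions for illustration; the analysis code used in the present study is not described here.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) proportions for one session.

    choices: sequence of choice labels, one per trial (e.g., "L" or "R").
    rewards: sequence of truthy/falsy outcomes, one per trial.
    Both definitions below are illustrative assumptions.
    """
    win_stay = win_total = 0
    lose_shift = lose_total = 0
    # Pair each trial with the choice made on the following trial.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            win_total += 1
            win_stay += next_choice == prev_choice
        else:
            lose_total += 1
            lose_shift += next_choice != prev_choice
    ws = win_stay / win_total if win_total else float("nan")
    ls = lose_shift / lose_total if lose_total else float("nan")
    return ws, ls
```

Under this definition, a subject choosing at random between two options would be expected to produce values near 0.5 on both measures, which is the pattern interpreted above as guessing.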

Ten subnuclei have been described in the LHb (Geisler et al., 2003), but their behavioral relevance has not been studied. In the present study, although each area was not systematically examined due to the limitations of using a movable microdrive to record signals, no differences between either LFP or single units recorded medially or laterally were observed. This suggests that the information broadcast by the LHb at least in the case of appetitively driven behavioral flexibility, is more or less uniform and likely additional input into target structures is needed in order to achieve any required signal specificity. This view is supported by the general effect observed in experiment 2a in which a broad increase in both trials to criterion as well as error types was observed. More specifically, results support that the LHb contributes to behavioral flexibility by regulating action selection rather than a more specific influence on changing behavior in response to positive or negative reinforcement given that both win-stay and lose-shift behavior fell to nearly chance levels. Understanding responses of the LHb in a wide variety of tasks as well as using more targeted recording techniques is necessary to elaborate both the types of behavior it is involved in as well as whether signal specificity exists within the LHb as the anatomy might suggest.

Based on several recent technological advances, we feel that the LHb is on the cusp of giving up many of the secrets about its underlying behavioral and physiological functions. Proulx et al. (2014) have outlined how optogenetics combined with transgenic Cre-driver mouse lines promises a new understanding of how various sub regions and their respective afferent and efferent connections contribute to LHb behaviors. Other recent work sought to understand precisely how the LHb contributes to monoamine output during behavior and is ongoing (Shen et al., 2012; Stamatakis and Stuber, 2012; Stopper et al., 2014). The present results further suggest that movement of the animal during behavior should also be taken into account when addressing the role of the LHb in learning and memory functions. Due to the discovery of synchrony between the hippocampus and the LHb during memory related tasks (Goutagny et al., 2013), the wealth of information known about hippocampal function in learning and memory as well as behavioral flexibility especially in relation to spatial information offers a promising means of examining LHb contributions to well delineated cognitive systems. By combining the latest functional neuroanatomical techniques with a range of complex behavioral tasks, including those examining behavioral flexibility, the possibility to answer some very longstanding fundamental questions about LHb functions is within reach (Sutherland, 1982).

Simultaneous with the recent upsurge of neuroscience research on LHb function, it has become clear that this region is of interest for its relevance in multiple psychiatric disorders including addiction, depression (Lecca et al., 2014; Proulx et al., 2014), and to a lesser extent, aspects of bipolar disorder (Savitz et al., 2011), schizophrenia (Shepard et al., 2006), and Parkinson's Disease (Luo et al., 2015). This is primarily due to LHb connectivity between the limbic forebrain and dopaminergic and serotonergic systems, which are strongly associated with these disorders. Common to these diseases is also an impairment in behavioral flexibility (Berman et al., 1986; Morice, 1990; Koerts et al., 2009; Dickstein et al., 2010; Walshaw et al., 2010; Nesic et al., 2011; van Holst and Schilt, 2011). Indeed, interventions such as deep brain stimulation of the LHb have already been performed on patients with promising results (Sartorius et al., 2010). However, it is as yet unknown whether behavioral flexibility is also affected by this treatment. Regardless, the fact that the LHb has been connected with pathological mood states as well as other maladaptive changes in behavior makes it likely that these conditions interact with the presently proposed role of the LHb in signaling current behavioral state to organize action selection. Support for this interaction comes from findings that rodent models of depression as well as antidepressant treatment lead to changes in both the input to and output from the LHb (Shabel et al., 2014). However, how these changes might affect the current findings of largely velocity related neural firing and reward approach theta oscillations remains unknown. This serves to highlight that a deeper understanding of the LHb is essential for more refined therapies.

# ACKNOWLEDGMENTS

The authors would like to acknowledge the contributions of several people who assisted in various aspects of this study: Van A. Redila, Nikita T. Francis and Summer A. Raynor for assisting with neural recording and behavioral testing, Shawn W. Heide for programming contributions to the neural data analysis, and Jessica Heide for figure designs. This work was supported by the following: NIDA grant T32 DA07278-20 (PMB), University of Washington Mary Gates Award (KSK), NIMH grant R01MH58755 (SJYM), and a University of Washington Research Royalty Fund grant (SJYM).


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Baker, Oh, Kidder and Mizumori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# To Act or Not to Act: Endocannabinoid/Dopamine Interactions in Decision-Making

Giovanni Hernandez <sup>1</sup>\* and Joseph F. Cheer <sup>2,3</sup>

<sup>1</sup> Faculté de Pharmacie, Université de Montréal, Montréal, Quebec, QC, Canada, <sup>2</sup> Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, Maryland, MD, USA, <sup>3</sup> Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, MD, USA

Decision-making is an ethologically adaptive construct that is impaired in multiple psychiatric disorders. Activity within the mesocorticolimbic dopamine system has been traditionally associated with decision-making. The endocannabinoid system, through its actions on inhibitory and excitatory synapses, modulates dopamine activity and decision-making. The aim of this brief review is to present a synopsis of available data obtained when the endocannabinoid system is manipulated and dopamine activity recorded. To this end, we review research using different behavioral paradigms to provide further insight into how this ubiquitous signaling system biases dopamine-related behaviors to regulate decision-making.

Keywords: endocannabinoids, dopamine, pre-clinical studies, decision-making, reward

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

J. Wayne Aldridge, University of Michigan, USA Thomas J. Anastasio, University of Illinois at Urbana-Champaign, USA

#### \*Correspondence:

Giovanni Hernandez giovannih@gmail.com

Received: 12 August 2015 Accepted: 19 November 2015 Published: 17 December 2015

#### Citation:

Hernandez G and Cheer JF (2015) To Act or Not to Act: Endocannabinoid/Dopamine Interactions in Decision-Making. Front. Behav. Neurosci. 9:336. doi: 10.3389/fnbeh.2015.00336

# SCOPE AND INTRODUCTION

When presented with several alternatives and when deciding which course of action to take, organisms have to integrate different pieces of information. These pieces of information include, but are not limited to, the size of the reward, risk, physiological states and expected time to obtain a reward. After the integration of these pieces of information, it is anticipated that the organism will choose the option with the highest value. Given the combination of different variables at the moment of making a decision, the value of the reward rarely represents its objective properties; rather, it represents subjective desirability. The representation of idiosyncratic values of different goal objects is encoded in different neural systems (Kable and Glimcher, 2007). Numerous studies on decision-making have shown that dopamine (DA) plays a critical role in the representation of multiple variables underlying the perceived reward value and reward-seeking. This is not surprising, given the array of structures to which DA neurons project and the different modalities of DA release. For some researchers, DA phasic firing encodes reward value (Tobler et al., 2005; Kobayashi and Schultz, 2008). Others have emphasized the role of DA release in different brain areas in the modulation of different processes involved in reward valuation, for example, in the modulation of reward sensitivity (Wise and Rompre, 1989) or reward-gain (Hernandez et al., 2010; Hernandez and Cheer, 2012), and in the modulation of effort cost and vigor (Niv et al., 2007; Salamone et al., 2007). Still others have put an emphasis on the role of DA release in incentive salience (Berridge and Robinson, 1998).

Given the profusion of processes in which DA firing and release are involved, it is of great interest to understand how other brain networks alter the activity of the DAergic system. One system that is pivotal in the modulation of different circuitries is the endocannabinoid (eCB) system. The eCB system, through its interaction with excitatory and inhibitory afferents to the ventral tegmental area (VTA), has proven to be critical in fine-tuning decision-making processes (Melis and Pistis, 2012). Here, we will review the effects of activation and inactivation of eCBs on diverse appetitive behavioral paradigms and how these manipulations alter the behavior and accompanying DA dynamics.

# BRIEF INTRODUCTION TO THE eCB SYSTEM

Ever since the cloning of the central cannabinoid receptor CB1R (Matsuda et al., 1990) and the isolation of the first endogenous cannabinoid (eCB) (Devane et al., 1992), the eCB system has been of great interest to neuroscientists. CB1Rs are the most abundant G<sub>i/o</sub>-coupled receptors found in the brain (Herkenham et al., 1991; Howlett et al., 2002) and they modulate a wide array of functions and processes, ranging from motor control to decision-making.

The eCB system is composed of cannabinoid receptors (CB1R, CB2R), their endogenous ligands and the enzymes that degrade them. The ligands most thoroughly studied and characterized are anandamide and 2-arachidonoyl glycerol (2-AG). eCBs are lipid-derived metabolites that are produced "on-demand" by postsynaptic cells and immediately released. The signal that the cell uses to start the biochemical cascade for eCB synthesis is, in general, an enhancement in intracellular Ca<sup>2+</sup> concentration. This increase in Ca<sup>2+</sup> is due to cell depolarization or mobilization of intracellular Ca<sup>2+</sup> stores (for a detailed review on the synthesis of eCBs, see Di Marzo, 2006). Once produced and released, eCBs act retrogradely, mostly onto CB1Rs localized on excitatory (glutamatergic) and inhibitory (GABAergic) terminals (Elphick and Egertová, 2001; Wilson and Nicoll, 2001; Alger, 2002). The activation of these receptors produces molecular changes leading to the closing of Ca<sup>2+</sup> (N- and P/Q-type) channels (Twitchell et al., 1997) and/or opening of K<sup>+</sup> channels (Mackie et al., 1995). The effect of these changes at the cellular level is to reduce the probability of neurotransmitter release (Maejima et al., 2001; Lupica and Riegel, 2005), thus influencing both short- and long-term forms of synaptic plasticity (Alger, 2002). After eCBs reach their target, they are rapidly degraded. Specifically, fatty acid amide hydrolase (FAAH) participates in anandamide degradation (Di Marzo et al., 1994) whereas monoacylglycerol lipase (MAGL) degrades 2-AG (Dinh et al., 2002; De Petrocellis et al., 2004).

## eCB ACTIONS IN THE MESOCORTICOLIMBIC REWARD SYSTEM

eCBs modulate decision-making in part by curbing the activity of excitatory and inhibitory neurotransmission along the mesocorticolimbic pathway. eCBs are an important neural substrate involved in decision-making processes (Wise and Rompre, 1989; Koob, 1992; Chao and Nestler, 2004) and in the processing of rewarding stimuli (Wise and Rompre, 1989; Salamone and Correa, 2002; Schultz, 2010; Hernandez et al., 2012). eCB activation of CB1Rs at the level of the ventral tegmental area (VTA), the site of origin of the mesocorticolimbic dopaminergic (DA) neurons, increases DA burst firing (French, 1997; Gessa et al., 1998; Wu and French, 2000). As a consequence, it facilitates DA release in terminal areas such as the nucleus accumbens (NAc) and the prefrontal cortex (Chen et al., 1990; Pennartz et al., 1994; Tanda et al., 1997; Cheer et al., 2007a; Oleson et al., 2012). Such a change in DA excitability is significant because empirical evidence implicates these neurons in the encoding of the subjective value of the reward (Tobler et al., 2005; Roesch et al., 2007; Kobayashi and Schultz, 2008; Roesch and Bryden, 2011; Lak et al., 2014). When the size or delay of a reward is manipulated, DA neurons fire at a higher rate for the cues that predict the subjectively more valuable reward (i.e., larger reward or shorter delay; Roesch et al., 2007). When effort and delay to obtain a reward are manipulated, phasic DA release in the NAc is higher at the cue that predicts lesser exertion. Phasic DA release also increases at cues that predict an immediate reward (Day et al., 2011). Importantly, when DA signaling is disrupted, changes in behavior ensue so that subjects no longer adapt their behavior according to changes in reward contingencies (Cardinal et al., 2001; Ghods-Sharifi and Floresco, 2010; Stopper et al., 2014).

Given that DA neurons do not express CB1Rs (Herkenham et al., 1991; Matsuda et al., 1993; Julian et al., 2003), the modulation of their activity and release by eCBs comes indirectly from the activation of CB1R present on afferents to DA cell bodies. Under conditions of relatively high neural activity, DA neurons release eCBs (Alger, 2002; Melis et al., 2004). These molecules retrogradely bind to CB1R on presynaptic terminals to dampen the activity of DA afferents. This reduction in the activity of DA inputs allows DA neurons to regulate their activity levels (Melis et al., 2004, 2006; Marinelli et al., 2007). The precise mechanism by which eCBs facilitate DA burst firing in a behaving animal remains to be fully established. One possibility is that DA burst firing is the result of the net effect of eCBs on the combined probabilities of glutamate and GABA release (Lupica and Riegel, 2005; Melis and Pistis, 2007). Under normal resting circumstances, approximately 50% of DA neurons are under inhibitory GABAergic drive (Grace and Bunney, 1984) rendering them insensitive to excitatory inputs. Direct activation of CB1R on GABA neurons reduces inhibitory drive on DA neurons, making them more susceptible to excitatory inputs and therefore, more prone to fire in bursts (Overton and Clark, 1997; Zweifel et al., 2009). Nonetheless, activation of CB1R on glutamatergic neurons also reduces the probability of glutamate release. This reduction would have a dual effect; it would diminish the excitatory inputs to DA neurons (Melis et al., 2004), which would curtail burst firing, but it could also reduce GABAergic inhibitory drive onto DA neurons. Indeed, glutamate plays a major role in the maintenance of DA inhibitory drive by acting on NMDA receptors located in GABA neurons most likely through GluN2A receptors (Bergeron and Rompré, 2013; Hernandez et al., 2015). A reduction in glutamate release probability, therefore, adds to an overall decrease in DA inhibitory drive. 
Although eCBs can lower the probability of glutamate release, the effect is limited due to the greater relative presence of CB1R on GABAergic vs. glutamatergic terminals (Mackie, 2005). The combined effect of decreased glutamate release and CB1R-induced activation of GABA neurons would result in a net reduction in the number of DA neurons firing in a slow tonic manner. Under these conditions, DA neurons are ready to fire in bursts once NMDA receptors are activated (Overton and Clark, 1997; Zweifel et al., 2009; but see Lobb et al., 2010 for an alternative mechanism).

eCB-induced disinhibition of DA neurons in the VTA can be produced intrinsically by acting on GABAergic interneurons or extrinsically via GABAergic afferents (Lupica et al., 2004). This distinction is made possible by the general type of GABA receptor involved. GABAergic interneurons preferentially target GABA<sub>A</sub> receptors located on VTA DA neurons, whereas GABAergic afferents preferentially target GABA<sub>B</sub> receptors (Johnson and North, 1992; Sugita et al., 1992). In vitro experiments show that the excitatory effect of the CB1R agonist HU-210 is occluded by application of the GABA<sub>A</sub> receptor antagonist bicuculline or the CB1R antagonist rimonabant to the slice (Cheer et al., 2000). Similarly, perfusion of the CB1R agonist WIN55,212-2 decreases electrically evoked inhibitory postsynaptic currents (IPSCs) in a GABA<sub>A</sub> receptor-dependent manner (Szabo et al., 2002), whereas application of the CB1R antagonist rimonabant prevents this effect. In addition to this intrinsic mechanism for the eCB-dependent disinhibition of VTA DA neurons, an extrinsic disinhibition mechanism has been hypothesized which acts predominantly on GABA afferents targeting GABA<sub>B</sub> receptors (Riegel and Lupica, 2004). Here, the application of the CB1R agonist WIN55,212-2 decreases the amplitude of the GABA<sub>B</sub>-mediated IPSCs in a CB1R-dependent fashion. However, immunocytochemical investigations have not yet identified the origin of such VTA GABA afferents (Mátyás et al., 2008). 
Further electrophysiological research points towards the: (a) NAc, a critical brain area mediating appetitive behaviors via the integration of inputs from cortical and limbic structures (Mogenson et al., 1980); (b) ventral pallidum, a region that plays a part in the differentiation of wanting, liking, and prediction components of a reward (Smith et al., 2011); and (c) rostromedial tegmental nucleus (RMTg), a small node that plays a pivotal role in processing both aversive and appetitive stimuli (Jhou et al., 2009b).

The projection of medium spiny neurons (MSN) of the NAc to the VTA was one of the first afferents proposed (Walaas and Fonnum, 1980; Sugita et al., 1992; Kalivas et al., 1993). It was hypothesized that these axon terminals converged onto DA neurons and directly inhibited DA activity (Einhorn et al., 1988; Rahman and McBride, 2000). However, recent evidence using genetic and optogenetic tools is at odds with this notion. Optical activation of NAc MSNs demonstrated that these axons mainly synapse onto non-DA neurons, and that these connections are fast inhibitory connections mediated by GABA<sub>A</sub> receptors (Xia et al., 2011). Moreover, it was demonstrated that CB1-expressing neurons in the NAc are fast-spiking interneurons, not MSNs, a conclusion obtained via the use of a knock-in mouse line in which CB1-expressing neurons also expressed the fluorescent protein td-Tomato (Winters et al., 2012). These results imply that synaptic projections from the NAc to the VTA should not be affected by CB1R signaling, although further research utilizing more sophisticated retrograde labeling techniques is needed.

In vivo electrophysiological studies show that GABA projections coming from the ventral pallidum (VP; Aguilar et al., 2015) and RMTg (Lecca et al., 2011, 2012) are sensitive to cannabinoid manipulations, and they modulate VTA DA neural firing. Inhibiting the degradation of eCBs in the VP decreased VTA DA neural activity observed following chronic treatment with the NMDA glutamate receptor antagonist phencyclidine (Aguilar et al., 2015). Likewise, manipulation of the RMTg nucleus has a profound effect on DA neural firing. The RMTg receives dense, mostly glutamatergic inputs from the lateral habenula (Jhou et al., 2009a,b), an area that encodes aversive stimulation (Matsumoto and Hikosaka, 2009). This nucleus mediates the inhibitory effect of the lateral habenula on midbrain DA neurons (Jhou et al., 2009a,b). The RMTg neurons that project to the VTA form inhibitory synapses, so that activation of this input via electrical stimulation inhibits DA firing (Lecca et al., 2011). Systemic injection of a CB1R agonist produces a long-lasting decrease in the firing rate of GABA neurons located in the RMTg. The administration of a CB1R antagonist, which on its own is devoid of effects on the firing rate of GABA neurons, minutes before the agonist prevents the inhibition of RMTg GABA neurons. In vitro recordings demonstrate that a reduction in the amplitude of excitatory postsynaptic currents is the mechanism underlying the inhibition of GABA neurons. In addition to a decrease in excitatory postsynaptic currents, the CB1R agonist produced a significant increase in paired-pulse ratio, suggesting that the CB1R agonist produced a reduction in glutamate release through activation of presynaptic receptors (Lecca et al., 2011). As expected, the inhibition of GABA neurons in the RMTg correlates with an increase in firing of DA neurons in the VTA (Lecca et al., 2011, 2012).

These electrophysiological studies suggest that eCB modulation of afferents to the VTA potently regulates DA activity via multiple mechanisms. The modulation of DA responses has important implications for decision-making processes. If, by their phasic firing and release, DA neurons integrate the subjective reward value (Lak et al., 2014), then eCB signaling is crucial during reward evaluation and can alter the weight of the variables used during goal assessment. Once different alternatives are weighed, and different goals are assessed, subjects have to start an action according to their assessment; such a course of action is believed to represent the option with the highest expected subjective preference. Thus, reward-seeking can be used as a proxy to infer the subjective reward value and changes in decision-making. In the following section, we will review empirical evidence that shows how altering DA signaling via CB1R manipulations biases goal-directed behavior.

## eCBs AND BSR

Several organisms will work to deliver electrical pulse trains to different brain areas via insulated macro electrodes (Olds and Milner, 1954; Olds, 1962; Shizgal and Murray, 1989). The effect of the electrical stimulation that leads organisms to seek and reinitiate the stimulation is called brain stimulation reward (BSR). Since its discovery, BSR has become the paradigm of choice for studying the neural reward circuitry and goal-directed responses. The rewarding signal that arises as a result of the delivery of electrical pulses shares properties with natural rewards. BSR can compete and summate with natural rewards (Conover and Shizgal, 1994), and BSR can be degraded in a similar way as natural rewards (Hernandez et al., 2011). These characteristics strongly suggest that the behavior maintained by pulses of electrical brain stimulation is far from being rigid or habitual responding (Hernandez et al., 2011), but denotes the subject's integration of different pieces of information regarding the value of different outcomes. During intracranial self-stimulation (ICSS), the experimental subject has to choose between pressing the lever to trigger electrical pulses or engaging in competing activities, e.g., exploring the box, sniffing or resting. The time allocated to each activity by the experimental subject will depend on the perceived value of the stimulation.

Can the reward induced by the electrical pulses change by altering DA neurotransmission? ICSS was the first paradigm implemented to study different reward substrates and the role of DA in reward (Crow, 1972a,b). Indeed, the electrical pulse trains injected by the electrode produce an increase in DA cell firing and DA release (Moisan and Rompré, 1998; Hernández and Shizgal, 2009). A large body of evidence shows that reward induced by electrical brain stimulation is highly sensitive to changes in VTA DA neurotransmission, as measured by the curve-shift paradigm. In this experimental preparation, a series of stimulation parameters (pulse frequencies or currents) that drive response rate from a maximal to a minimal level in an S-shaped manner is used (Miliaressis et al., 1986). Drugs that enhance DA levels, such as the DA transporter blocker GBR12909, produce a leftward displacement of the curve that relates operant performance to stimulation parameters (Rompré and Bauco, 1990). Thus, it is inferred that these drugs boost the rewarding effect of electrical brain stimulation. Opposite effects are obtained with DA receptor antagonists like haloperidol and raclopride (Nakajima and Baker, 1989).
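The logic of the curve-shift paradigm can be sketched numerically. In the minimal example below, response rate is modeled as a logistic function of log pulse frequency, and the curve's position is summarized by the frequency that sustains half-maximal responding (labeled `m50_hz` here). The logistic form, the slope and maximum-rate values, and the function names are illustrative assumptions; the cited studies specify only that the curve is S-shaped.

```python
import math


def response_rate(freq_hz, m50_hz, slope=8.0, max_rate=60.0):
    """Hypothetical S-shaped rate-frequency function (responses per minute)."""
    x = math.log10(freq_hz) - math.log10(m50_hz)
    return max_rate / (1.0 + math.exp(-slope * x))


def estimate_m50(freqs, rates):
    """Estimate the frequency giving half of the highest observed rate.

    Uses linear interpolation on a log-frequency axis between the two
    sampled points that bracket the half-maximal rate.
    """
    half = max(rates) / 2.0
    points = sorted(zip(freqs, rates))
    for (f0, r0), (f1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 > r0:
            frac = (half - r0) / (r1 - r0)
            log_f = math.log10(f0) + frac * (math.log10(f1) - math.log10(f0))
            return 10 ** log_f
    return None  # the sampled curve never crosses half-maximal responding


# Assumed frequency sweep and assumed curve positions (not data from any study).
freqs = [10, 16, 25, 40, 63, 100, 158]
baseline = [response_rate(f, m50_hz=50.0) for f in freqs]
shifted = [response_rate(f, m50_hz=35.0) for f in freqs]

# A leftward displacement appears as a lower estimated half-maximal frequency.
print(estimate_m50(freqs, baseline), estimate_m50(freqs, shifted))
```

In this sketch a leftward displacement means that weaker stimulation suffices to sustain the same level of responding, which is the pattern taken above as evidence of reward enhancement.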

What are the consequences of manipulating CB1Rs on ICSS? Since CB1R agonists increase DA output (Ng Cheong Ton et al., 1988; Chen et al., 1990), it was expected that they would potentiate the rewarding signal that arises from the electrical stimulation, whereas CB1R antagonists would do the opposite. However, research from different groups yielded inconsistent evidence. The first reports using ∆<sup>9</sup>-THC showed a reward enhancement effect that was dependent on the rat strain; such differences between strains correlated with differences in DA efflux in the NAc. Lewis rats showed the larger behavioral effect as well as the higher DA release following the administration of ∆<sup>9</sup>-THC. In contrast, Fischer and Sprague-Dawley rats showed a minimal behavioral effect and modest DA increments (Chen et al., 1991; Lepore et al., 1996). Several other studies using Long-Evans or Sprague-Dawley rats have found a decrease in reward pursuit or no effect (Stark and Dews, 1980; Vlachou et al., 2007), whereas others have found different results depending on the dose of ∆<sup>9</sup>-THC used. At low doses (0.1 mg/kg) a facilitation of reward is seen, whereas at higher doses (1 mg/kg) a hindrance of reward is obtained (Katsidoni et al., 2013). Similar puzzling effects were observed with other CB1R agonists (Arnold et al., 2001; Antoniou et al., 2005). Using indirect agonists, such as inhibitors of the enzymes that degrade eCBs (Vlachou et al., 2006; Kwilasz et al., 2014), has yielded a lack of effect or a decrease in reward pursuit (Arnold et al., 2001; Deroche-Gamonet et al., 2001; Vlachou et al., 2006).

These disparate results obtained in ICSS experiments using the curve-shift paradigm could be due to genetic differences, as Gardner's experiments suggest (Chen et al., 1991). Another explanation could be that systemic injections of these compounds produce an indiscriminate activation of all brain areas containing CB1Rs. Given that CB1Rs are the most abundant G-protein-coupled receptors in the brain (Herkenham et al., 1990), such broad activation is problematic for studying the neural underpinnings of reward evaluation and reward-seeking. These processes most likely require the activation of eCB synthesis and release to be region-, neuron- or even synapse-specific (Solinas et al., 2008). Thus, a wide activation of CB1Rs might give rise to negative or dysphoric effects that counteract their positive action on reward-seeking (Panagis et al., 2014). However, these explanations do not resolve why, when other experimental testing procedures are used (e.g., progressive ratio), CB1R agonists and antagonists produce behaviorally consistent results, even with systemic injections and dose ranges similar to the ones used in ICSS experiments.

An alternative possibility relies on findings that the effects of CB1R agonists on DA release in the NAc are moderate at best when contrasted with other DA agonists or DA receptor blockers. Such modest DA release is problematic for traditional curve-shift paradigms used in ICSS experiments. The curve-shift paradigm lacks the dimensionality to differentiate changes in the relative reward strength, the only dimension measured in this experimental preparation, from changes in the costs (opportunity and effort) of obtaining a goal object. All these variables contribute to goal evaluation, and different researchers have shown that they are modulated by changes in DA efflux (Wise and Rompre, 1989; Salamone and Correa, 2002; Hernandez et al., 2012). So when using a two-dimensional perspective, non-measured changes on the "hidden" dimension can be misconstrued as an effect on the subjective reward intensity.

Why is this methodological distinction important? If DA release does not modulate the relative value of a reward, then moderate changes in DA release would produce unreliable changes in curves relating behavior and stimulation intensity, as is the case with CB1R agonists. When using the "mountain model" (Arvanitogiannis and Shizgal, 2008), a testing paradigm that measures opportunity cost in addition to changes in stimulation strength, CB1R antagonists produce consistent decreases in opportunity cost. This reduction correlates with a consistent decrease in DA release (Trujillo-Pisanty et al., 2011). This effect mimics that of DA receptor antagonists (Trujillo-Pisanty et al., 2013), and it is consistent with, but opposite in direction to, results obtained with non-specific and specific DA transporter blockers (Hernandez et al., 2010; Hernandez and Cheer, 2012).

# eCBs AND MOTIVATION

When electrical stimulation is used to study the effects of different manipulations on the eCB system, the results appear contradictory. With the inclusion of a third variable in the measuring paradigm, these are reconciled with the rest of the scientific research implicating CB1Rs in reward modulation, which relates the motivation for obtaining different classes of rewards as evaluated by several schedules of reinforcement. One of these is the progressive ratio, in which the requirement to acquire a single reward is increased exponentially within a single session until the experimental organism stops responding. The "breakpoint" is the ratio at which the subject stops responding. It is assumed that this schedule measures the relation between response effort and the value of a particular reward (Hodos, 1961). Thus, inferences about the willingness of the organism to work to obtain a goal object can be drawn.
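The progressive ratio logic described above can be sketched in a few lines. This is only an illustration: the text says the requirement grows exponentially but gives no formula, so the rule `round(5 * e^(0.2 * step) - 5)` used here is an assumption (one commonly used exponential progression), and `max_effort` is a hypothetical stand-in for the most responses a subject will emit for one reward.

```python
import math


def pr_requirement(step: int) -> int:
    """Responses required to earn the reward at a given 1-based step.

    Assumption: an exponential rule, round(5 * e^(0.2 * step) - 5).
    The source text only states that the requirement increases exponentially.
    """
    return max(1, round(5 * math.exp(0.2 * step) - 5))


def breakpoint_ratio(max_effort: int, max_steps: int = 100) -> int:
    """Last ratio completed by a hypothetical subject that stops responding
    as soon as the requirement exceeds max_effort responses per reward."""
    last_completed = 0
    for step in range(1, max_steps + 1):
        required = pr_requirement(step)
        if required > max_effort:
            break
        last_completed = required
    return last_completed


# First eight response requirements under the assumed rule.
schedule = [pr_requirement(step) for step in range(1, 9)]
```

Under this sketch, a manipulation that raises the breakpoint corresponds to a larger `max_effort`, that is, a subject willing to complete higher ratios for the same reward.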

If cannabinoid agonists are used in conjunction with this schedule, they increase breakpoints. Thus, the experimental subjects are willing to lever-press more for a single reward. This effect has been consistent across different classes of rewards (Higgs et al., 2005; Solinas and Goldberg, 2005; Ward and Dykstra, 2005; Gamaleddin et al., 2012; Jones and Kirkham, 2012; Oleson et al., 2012), and equally consistent but opposite effects have been obtained with CB1R antagonists (Solinas and Goldberg, 2005; Ward and Dykstra, 2005; Maccioni et al., 2008; Rasmussen and Huskinson, 2008; Xi et al., 2008; Gamaleddin et al., 2012; Hernandez and Cheer, 2012). Recent research shows that inhibiting the degradation of 2-AG, but not that of anandamide, increases breakpoints. Moreover, intra-VTA inhibition of 2-AG degradation facilitates reward-seeking and phasic DA release (Oleson et al., 2012).

# eCBs AND DISCOUNTING

The value of a goal object depends on how distant in the future it is. When an organism is deciding among different goal objects, it has to factor into its computations how distant in the future the different goal objects are and make a decision based on the best-perceived alternative. At the decision point, the organism will select the option with the highest perceived value. Temporal discounting can be measured by allowing experimental subjects to choose between two alternatives: one that delivers an immediate but small reward vs. another that delivers a larger but delayed reward. Under this arrangement, and when questioned about future choices, human and non-human subjects show a preference for the larger distant reward over the immediate small one. However, as time passes, the difference between the small and large reward becomes less prominent and preference switches (Ainslie, 1975). This change occurs because immediate rewarding outcomes have a greater subjective value than delayed ones. Self-control is exercised when the delayed option is still preferred, whereas impulsivity takes place if the immediate option is chosen (Rachlin and Green, 1972).
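The preference switch described above can be illustrated with a small numerical sketch. It assumes a hyperbolic discount function, V = A / (1 + kD), which is a common modeling choice rather than something specified in this paragraph. The reward sizes, the extra delay and the discount rate `k` are hypothetical values chosen only to make the reversal visible.

```python
def discounted_value(amount: float, delay_s: float, k: float) -> float:
    """Hyperbolically discounted value V = A / (1 + k * D) (assumed form)."""
    return amount / (1.0 + k * delay_s)


def preferred(time_to_small_s: float,
              small: float = 2.0,
              large: float = 4.0,
              extra_delay_s: float = 5.0,
              k: float = 1.0) -> str:
    """Option with the higher discounted value when the small reward is
    time_to_small_s away and the large reward arrives extra_delay_s later.
    All default values are illustrative, not taken from the source."""
    v_small = discounted_value(small, time_to_small_s, k)
    v_large = discounted_value(large, time_to_small_s + extra_delay_s, k)
    return "large-delayed" if v_large > v_small else "small-immediate"


# Far from both rewards the large one wins; once the small reward is
# immediately available, preference reverses.
far_choice = preferred(10.0)
near_choice = preferred(0.0)
```

With these illustrative parameters the two discounted values cross when the small reward is 4 s away, so the same subject appears self-controlled when choosing in advance and impulsive when the small reward is at hand.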

DA firing and release are critically important for temporal discounting. DA phasic firing correlates positively with the magnitude of reward and decreases in a hyperbolic fashion with reward delay (Kobayashi and Schultz, 2008). Similarly, DA release in the NAc shows patterned release at the cue that predicts different delays, with a decrease that correlates with the length of the delay (Saddoris et al., 2015). When phasic DA release is measured at reward delivery, it is higher for the larger reward at small to moderate delays, and then decreases to a level comparable to that of the small immediate reward (Hernandez et al., 2014).

By pharmacologically manipulating the DA system during intertemporal choice tasks, different studies have shown that acute challenges with drugs that increase DA availability produce an increase in self-control: experimental subjects more often choose the large delayed reward over the immediate small one (Cardinal et al., 2000; Wade et al., 2000; Winstanley et al., 2003, 2005; van Gaalen et al., 2006; Bizot et al., 2007). Conversely, drugs that interfere with DA availability produce an increase in impulsive choice (Wade et al., 2000; van Gaalen et al., 2006; Floresco et al., 2008): experimental subjects more often choose the small immediate reward over the large delayed one. As a modulator of the DA system, eCB signaling produces remarkable results. Acute activation of CB1Rs with ∆<sup>9</sup>-THC leads to increased self-control that is blocked by CB1R antagonists. Interestingly, CB1R antagonists attenuate the effect of DA agonists (Wiskerke et al., 2011). When given alone, CB1R antagonists do not exert a significant influence on self-control. These results suggest that the eCB system does not play a role in baseline temporal discounting (Pattij et al., 2007; Wiskerke et al., 2011; Hernandez et al., 2014).

Although acute increases in DA release increase self-control, the opposite happens when subjects have chronic experience with different drugs of abuse that directly or indirectly alter DA neurotransmission (Di Chiara and Imperato, 1988). Chronic drug exposure produces plastic changes in the mesolimbic circuitry and in other brain areas and neurotransmitter systems that underlie addiction (Nestler, 2001; Everitt and Robbins, 2005; Kalivas and Volkow, 2005). As stated above, eCBs participate in the modulation of synaptic plasticity in the VTA (Melis et al., 2004; Haj-Dahmane and Shen, 2010), where they modulate DA neuron excitability (Lupica and Riegel, 2005; Maldonado et al., 2006). eCBs play a critical role in the increase of phasic DA release observed after the administration of different types of drugs of abuse (Cheer et al., 2007b). They are necessary for the development and expression of sensitization (Viganò et al., 2004; Corbillé et al., 2007; Azizi et al., 2009; Li et al., 2009; Blanco et al., 2014; Mereu et al., 2015). Also, eCB signaling is required for conditioned drug seeking and relapse (De Vries et al., 2001; De Vries and Schoffelmeer, 2005; Maldonado et al., 2006) as well as cue-induced reinstatement (De Vries et al., 2001, 2003; Xi et al., 2006). Therefore, the eCB system is likely to play a role in the impulsive behavior observed in drug addiction.

Our laboratory recently found that eCB signaling is a canonical component in the development of impulsive choice caused by chronic cocaine exposure. Specifically, after experimental subjects were sensitized to the effects of cocaine, they behaved impulsively in an intertemporal choice task (Mendez et al., 2010; Hernandez et al., 2014; Smethells and Carroll, 2015). The pattern of DA release in the NAc during the task correlates with behavioral performance. Before sensitization, higher DA release is observed for the larger reward when the delay is below 10 s. After sensitization has taken place, phasic release for the small immediate reward is comparatively higher regardless of the delay. Importantly, blockade of CB1Rs before cocaine exposure not only prevented impulsive choice but also eliminated maladaptive patterns of phasic DA release. More importantly, from a therapeutic perspective, CB1R blockade reverted the changes in self-control observed following cocaine sensitization (Hernandez et al., 2014).

#### FUTURE DIRECTIONS

The research showcased in the present review demonstrates that the eCB system, via modulation of phasic DA release, plays important roles in decision-making processes. eCB signaling is critical for adjudicating value to different rewards as well as for activating, organizing and maintaining goal-directed behaviors. This happens under normal circumstances and is usurped when decision-making processes are compromised. However, the current state of this body of research is only a first step that will lead to a better understanding of the potential reach of the eCB system in decision-making processes. To further our knowledge, it is important to map each eCB action in all of the relevant circuits thoroughly. This requires elucidating the exact localization of CB1 receptors and their active ligands on cell-type specific nodes and under temporally-resolved circumstances. Such a targeted approach will greatly enhance our current understanding of the anatomical frameworks engaged in decision-making processes. With this information in hand, it will be possible to create models that more accurately predict changes in behavior and the underlying neurochemistry.

#### ACKNOWLEDGMENTS

This review was supported by NIH grant DA022340 to JFC and by a Groupe de Recherche sur le Système Nerveux Central Herbert H. Jasper fellowship to GH.

#### REFERENCES


1 cannabinoid receptors and dopaminergic systems in the rat basal ganglia. Neuroscience 119, 309–318. doi: 10.1016/s0306-4522(03)00070-8


glutamate to cannabinoid receptors. Neuron 31, 463–475. doi: 10.1016/s0896-6273(01)00375-0


Olds, J. (1962). Hypothalamic substrates of reward. Physiol. Rev. 42, 554–604.


in rats: role of glutamate in the nucleus accumbens. J. Neurosci. 26, 8531–8536. doi: 10.1523/jneurosci.0726-06.2006


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Hernandez and Cheer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Glycogen synthase kinase-3β inhibition in the medial prefrontal cortex mediates paradoxical amphetamine action in a mouse model of ADHD

Yi-Chun Yen†, Nils C. Gassen, Andreas Zellner, Theo Rein, Rainer Landgraf, Carsten T. Wotjak‡ and Elmira Anderzhanova\*‡

#### Edited by:

Gregory B. Bissonette, University of Maryland, USA

#### Reviewed by:

Gregg Stanwood, Vanderbilt University, USA; Walter Adriani, Istituto Superiore di Sanita', Italy

#### \*Correspondence:

Elmira Anderzhanova, Max Planck Institute of Psychiatry, Kraepelinstrasse 2, 80804-D Munich, Germany Tel: +49-89-30622-604 Fax: +49-89-30622-610 anderzhanova@psych.mpg.de

#### †Present address:

Yi-Chun Yen, Duke-NUS Graduate Medical School, Neuroscience and Behavioral Disorders Research Program, 169857 Singapore, Singapore

‡ These authors have contributed equally to this work.

Received: 30 October 2014 Accepted: 27 February 2015 Published: 20 March 2015

#### Citation:

Yen Y-C, Gassen NC, Zellner A, Rein T, Landgraf R, Wotjak CT and Anderzhanova E (2015) Glycogen synthase kinase-3β inhibition in the medial prefrontal cortex mediates paradoxical amphetamine action in a mouse model of ADHD. Front. Behav. Neurosci. 9:67. doi: 10.3389/fnbeh.2015.00067

Max Planck Institute of Psychiatry, Munich, Germany

Psychostimulants show therapeutic efficacy in the treatment of attention-deficit hyperactivity disorder (ADHD). It is generally assumed that they ameliorate ADHD symptoms by interfering with monoaminergic signaling. We combined behavioral pharmacology, neurochemistry and molecular analyses to identify mechanisms underlying the paradoxical calming effect of amphetamine in low trait anxiety behavior (LAB) mice, a novel multigenetic animal model of ADHD. Amphetamine (1 mg/kg) and methylphenidate (10 mg/kg) elicited similar dopamine and norepinephrine release in the medial prefrontal cortex (mPFC) and in the striatum of LAB mice. In contrast, amphetamine decreased, while methylphenidate increased, locomotor activity. This argues against changes in dopamine and/or norepinephrine release as mediators of the paradoxical effects of amphetamine. Instead, the calming activity of amphetamine corresponded to the inhibition of glycogen synthase kinase 3β (GSK3β) activity, specifically in the mPFC. Accordingly, not only systemic administration of the GSK3β inhibitor TDZD-8 (20 mg/kg), but also local microinjections of TDZD-8 and amphetamine into the mPFC, but not into the striatum, decreased locomotor activity in LAB mice. Amphetamine effects seem to depend on NMDA receptor signaling, since pre- or cotreatment with MK-801 (0.3 mg/kg) abolished the effects of amphetamine (1 mg/kg) on locomotion and on the phosphorylation of GSK3β at the level of the mPFC. Taken together, the paradoxical calming effect of amphetamine in hyperactive LAB mice concurs with a decreased GSK3β activity in the mPFC. This effect appears to be independent of dopamine or norepinephrine release, but contingent on NMDA receptor signaling.

Keywords: ADHD, mouse models, prefrontal cortex, amphetamine, NMDA receptor, dopamine, GSK3β

# Introduction

Attention deficit hyperactivity disorder (ADHD) is a frequent psychiatric disorder with a prevalence of up to 8% in Western populations (Faraone and Mick, 2010). It appears in three types of presentation (inattentive, hyperactive/impulsive, and combined) and shows a high degree of psychiatric comorbidity, e.g., with unipolar depression and autism spectrum disorder (Faraone et al., 2006; Piñeiro-Dieguez et al., 2014). It is generally assumed that alterations in monoamine signaling are causally involved in the etiology of ADHD. This assumption is based on the paradoxical therapeutic efficacy of monoamine-releasing drugs (amphetamine, methylphenidate, and the selective norepinephrine re-uptake inhibitor reboxetine) in the treatment of ADHD. Presynaptic dopamine D2 receptor-mediated downregulation of dopamine release, resulting from a transient initial increase, was proposed as the mechanism underlying the paradoxical effect (Seeman and Madras, 1998).

Despite a conceivable involvement of the striatum, which mediates dopamine-dependent locomotion, clinical evidence and the effectiveness of cognitive therapy support the idea that the prefrontal cortex (PFC) rather than the striatum plays a critical role in ADHD pathogenesis and therapy (Mattes, 1980; Barkley, 1997). This is supported by recent functional imaging data showing frontal hypoactivity in patients with ADHD (Dickstein et al., 2006). The impairment of N-methyl-D-aspartate (NMDA) receptor signaling in the medial PFC (mPFC) is believed to underlie the frontal hypoactivity and the insufficient control over vigilance and executive functions in ADHD. Disturbances in cortical norepinephrine and dopamine neurotransmission may mediate the deficit in NMDA receptor signaling (Makris et al., 2009; Arnsten and Pliszka, 2011). This suggests postsynaptic mechanisms, including intracellular signaling pathways, as mediators of hyperactivity and of the paradoxical action of dopamine-releasing drugs.

Glycogen synthase kinase 3β (GSK3β) was originally identified as a glycogen synthase-inhibiting enzyme, a key element in the regulation of glucose metabolism. It also interacts with β-catenin and is intimately involved in processes of neurogenesis and neuroprotection. The non-canonical β-arrestin2/PP2A/Akt-GSK3β pathway plays a significant role in the mediation of dopamine D2/3 receptor- and amphetamine-dependent behavior. Dopamine release and activation of dopamine D2 receptors result in a decrease in Akt activity and a corresponding decrease in GSK3β phosphorylation, leading to its activation (Beaulieu et al., 2004, 2005). Both pharmacologically-induced and innate experimental hyperactivity can be temporarily recovered by GSK3β inhibition (Emamian, 2012; Mines et al., 2013). Changes in the state of constitutively active GSK3β have been considered both a pathogenetic factor and a therapeutic means in bipolar disorder and schizophrenia (Jope and Roh, 2006; Emamian, 2012). Accordingly, antipsychotics inhibit GSK3β (Emamian et al., 2004; Li et al., 2007).

GSK3β is subject to numerous upstream regulators acting via β-arrestin2/Akt and Wnt/β-catenin, DAG/Ca<sup>2+</sup>-PKC, Trk R/insulin R/BDNF R-PI3K/PDK1, and PPI1-mediated pathways. Recent findings showed that not only dopamine, but also serotonin, adenosine, and brain-derived neurotrophic factor are among these upstream regulators (Jope and Roh, 2006; Kaidanovich-Beilin and Woodgett, 2011). The intracellular pathways that can be activated by NMDA receptors also target GSK3β (Svenningsson et al., 2003; Peineau et al., 2007; Li et al., 2009; Beurel et al., 2011; Xi et al., 2011; Liu et al., 2013).

Animal models of ADHD may help to further elucidate the biological basis of the disorder and provide deeper insight into the processes underlying effective therapeutic intervention. Most ADHD models are based on selected mutations in monoaminergic systems (Russell, 2011; Leo and Gainetdinov, 2013). However, selective breeding strategies may better resemble the multigenetic background of the disorder. We have recently validated a novel inbred ADHD mouse model, i.e., low trait anxiety-related behavior (LAB) mice (Yen et al., 2013). LAB mice were originally bred in contrast to normal trait anxiety-related behavior (NAB) and high trait anxiety-related behavior (HAB) mice (Krömer et al., 2005). Selection was based on mouse exploratory behavior on the elevated plus-maze. LAB mice display a clear preference for the open arms (>60%), whereas NAB spend 20–40% and HAB <20% of the total time in the open arms. The open arm preference cannot be solely explained by low anxiety levels, but seems to reflect increased novelty seeking (Yen et al., 2013). LAB mice of both genders show hyperactivity both in home cages (Krömer et al., 2005) and in the open field (OF; Yen et al., 2013). Remarkably, this hyperactivity becomes even stronger upon repeated exposures, thus arguing against a novelty-driven phenomenon (Yen et al., 2013). The LAB mouse endophenotype is characterized by an increase in acoustic startle responses and by impaired social recognition and spatial memory. Compared to HAB mice, LAB mice display changes in metabolic pathways both in the periphery and in the brain (Krömer et al., 2005; Kessler et al., 2007; Filiou et al., 2011). In terms of predictive validity, LAB mice show a paradoxical calming response to amphetamine in non-toxic doses (0.5–2.0 mg/kg, i.p.) (Yen et al., 2013).
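The open-arm criteria quoted above can be written as a small classification rule. This is a sketch, not the breeding protocol itself: the function name is hypothetical, the thresholds are the percentages given in the text, and values between 40% and 60%, for which the text states no line, are returned as unclassified.

```python
from typing import Optional


def classify_line(open_arm_percent: float) -> Optional[str]:
    """Map percent time in the open arms to the line whose range contains it.

    Thresholds follow the text: LAB > 60%, NAB 20-40%, HAB < 20%.
    The 40-60% band is not assigned in the text, so None is returned there.
    """
    if not 0.0 <= open_arm_percent <= 100.0:
        raise ValueError("open_arm_percent must lie between 0 and 100")
    if open_arm_percent > 60.0:
        return "LAB"
    if 20.0 <= open_arm_percent <= 40.0:
        return "NAB"
    if open_arm_percent < 20.0:
        return "HAB"
    return None
```

For example, an animal spending 75% of the test in the open arms falls in the LAB range, whereas one spending 50% matches none of the stated ranges.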

The aim of the present study was to elucidate the neurochemical and molecular basis of the paradoxical effect of amphetamine in our mouse model of ADHD (i.e., LAB mice) with a special focus on the role of GSK3β. To this end, (1) we examined locomotor activity in the OF and employed in vivo microdialysis in order to compare behavioral effects of amphetamine (and methylphenidate) in LAB and HAB mice with drug-related changes in the dopamine and norepinephrine levels in the mPFC and the striatum. We assessed (2) the efficiency of the dopamine D2 receptor function in LAB mice by measuring behavioral and neurochemical effects of haloperidol. We explored (3) the effects of amphetamine treatment on the phosphorylation of GSK3β in the two brain structures under study; and (4) the effects of systemic and local inhibition of GSK3β on locomotor activity. Finally, we examined (5) potential involvement of glutamate signaling via NMDA receptors in the mechanisms of the calming effect of amphetamine.

# Material and Methods

### Animals

Male HAB, LAB and normal trait anxiety-related behavior (NAB) mice are selectively bred from Swiss CD1 mice (Charles River, Sulzfeld, Germany) in the Max Planck Institute of Psychiatry. Hyperactivity of LAB mice is observed in single- and group-housed animals during light and dark phases of the diurnal cycle and appears in two types: (i) LAB-Intermediate (LAB-I) mice are animals with non-habituating locomotion slightly exceeding the ambient activity in NAB and HAB mice; and (ii) LAB-Strong (LAB-S) mice show 3-fold higher locomotor activity (Yen et al., 2013). All experiments presented here were performed in LAB-I mice, which represent the majority of the offspring (>60%) and most closely fulfill the criteria for an animal model of ADHD (Yen et al., 2013). For the sake of clarity, the abbreviation "LAB" is used instead of "LAB-I" throughout the manuscript. All mice were single-housed under standard laboratory conditions with a reversed 12/12 h light/dark cycle (light on at 9 pm), temperature 23 ± 1°C, and food and water ad libitum, starting approximately 2 weeks before the experiments. We performed basal locomotor tests with all mice in order to exclude LAB-S animals, followed by verification of the calming response to amphetamine at an age of 2.0–2.5 months. To meet the 3Rs rule of animal welfare, we tested mice repeatedly (3–4 times), which was possible owing to the stability of the endophenotypes (Yen et al., 2013). Given the intertrial intervals (5–10 days), animals were available for subsequent tests at an age of 3–6 months. Nonetheless, to exclude any confounding effect of the repeated exposure of animals to the OF and/or a carryover effect of injections and previous treatments, we always analyzed the basal activity measured before any treatment (first 20 min of the OF test). Experiments with between-line comparisons were performed at the same time. The number of animals in the experimental groups varied from 4 to 13. The exact sample size is indicated in the figures/figure legends. All experiments were carried out according to the European Community Council Directive 2010/63/EEC, and efforts were made to minimize animal suffering. All experimental procedures were approved by the local government of Upper Bavaria (55.2.1.54-2532-188-12).

#### Drugs and Doses

d-Amphetamine hemisulfate (Amph), methylphenidate hydrochloride, lithium chloride (LiCl), TDZD-8, and dimethyl sulfoxide (DMSO) were from Sigma-Aldrich (USA). Haloperidol stock solution was from Ratiopharm GmbH (Germany). Amphetamine, methylphenidate, LiCl powders and haloperidol stock solution were dissolved in saline. TDZD-8 stock solution was prepared in 100% DMSO and then diluted in saline to achieve a final DMSO concentration of 0.5% v/v. All working solutions were prepared freshly before each experiment. Amphetamine (0.5, 1, and 2 mg/kg) and methylphenidate (3, 10, and 30 mg/kg) were injected in doses that did not induce toxic effects or result in stereotypic behavior. Haloperidol was injected at a dose of 1 mg/kg to reproduce catalepsy (Boulay et al., 2000; McOmish et al., 2012). The LiCl dose (100 mg/kg) was chosen as a trade-off between the specific antimanic and the gustatory/digestive tract aversive effects of this drug (Gould et al., 2007). The TDZD-8 dose (20 mg/kg) was selected on the basis of our preliminary experiments (not shown) and a previous report (Beaulieu et al., 2004). All drugs were injected intraperitoneally (i.p.) in a volume of 100 µl/10 g of body weight. The doses for amphetamine (1.8 ng/0.5 µl/side) and TDZD-8 (1.1 ng/0.5 µl/side) microinjections were chosen on the basis of previous reports (Prasad et al., 1999; Chen et al., 2004; Ramirez et al., 2010). Control groups received the respective vehicle injection/microinjection, saline or 0.5% DMSO.
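The dosing arithmetic implied by this paragraph can be made explicit. The sketch below uses only the stated injection volume (100 µl per 10 g body weight) and the stated DMSO concentrations. The function names and the 30 g example body weight are hypothetical, and the resulting working concentrations are derived here rather than reported by the authors.

```python
# Injection volume from the text: 100 µl per 10 g body weight = 10 µl/g.
INJECTION_VOLUME_UL_PER_G = 10.0


def injection_volume_ul(body_weight_g: float) -> float:
    """Volume to inject i.p. for an animal of the given body weight."""
    return INJECTION_VOLUME_UL_PER_G * body_weight_g


def working_concentration_mg_per_ml(dose_mg_per_kg: float) -> float:
    """Solution concentration that delivers the dose at the fixed volume.

    10 µl/g is numerically the same as 10 ml/kg, so the concentration is
    dose (mg/kg) divided by 10 ml/kg.
    """
    volume_ml_per_kg = INJECTION_VOLUME_UL_PER_G
    return dose_mg_per_kg / volume_ml_per_kg


def dilution_factor(stock_percent: float = 100.0,
                    final_percent: float = 0.5) -> float:
    """Fold dilution needed to go from stock to final DMSO concentration."""
    return stock_percent / final_percent
```

Under these assumptions a 30 g mouse receives 300 µl, amphetamine at 1 mg/kg corresponds to a 0.1 mg/ml solution, and a 100% DMSO stock must be diluted 200-fold to reach 0.5% v/v.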

#### Behavioral Tests

All behavioral experiments were performed during the active phase of the diurnal cycle between 10 am and 6 pm.

**Locomotor activity** was assessed in the OF test by measurement of the distance traveled with the automatic TruScan Photo Beam Activity system (Coulbourn Instruments, Whitehall, PA, USA) as described previously (Yen et al., 2013). Basal activity was measured within 20 min prior to systemic drug administration. After an i.p. injection (which lasted less than 1 min), mice were returned to the test arena and recording was continued for 1 or 2 h. In the case of local mPFC and striatum treatment, recording started 2–3 min after microinjections and lasted for 1 h. Data were analyzed in 5 min bins; in some cases, we report mean data corresponding to either 20 min intervals or the entire observation period.
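The binning described above can be sketched as follows. This is an illustration under stated assumptions, not the TruScan software's procedure: the input is assumed to be a list of hypothetical `(time_min, distance_cm)` increments, and the function names are invented for this example.

```python
from typing import List, Sequence, Tuple


def bin_distance(samples: Sequence[Tuple[float, float]],
                 bin_min: float,
                 total_min: float) -> List[float]:
    """Sum distance increments into consecutive bins of bin_min minutes.

    samples holds (time_min, distance_cm) pairs; samples outside the
    interval [0, total_min) are ignored.
    """
    n_bins = int(total_min // bin_min)
    bins = [0.0] * n_bins
    for time_min, distance_cm in samples:
        index = int(time_min // bin_min)
        if 0 <= index < n_bins:
            bins[index] += distance_cm
    return bins


def merge_bins(bins: Sequence[float], factor: int) -> List[float]:
    """Collapse every `factor` adjacent bins, e.g. four 5 min bins into 20 min."""
    return [sum(bins[i:i + factor]) for i in range(0, len(bins), factor)]


# Hypothetical increments recorded during a 20 min basal period.
example = [(0.5, 10.0), (4.9, 5.0), (5.0, 7.0), (19.9, 3.0), (20.0, 100.0)]
five_min = bin_distance(example, bin_min=5.0, total_min=20.0)
twenty_min = merge_bins(five_min, factor=4)
```

Keeping the 5 min bins as the primary representation means the 20 min intervals and the whole-session totals can both be derived from the same data.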

**Stereotypic behavior assessment** was done according to the rating scale of Havemann et al. (1986): 0 (no stereotypies); 1 (discontinuous sniffing); 2 (continuous sniffing); 3 (discontinuous licking); 4 (continuous licking); 5 (discontinuous gnawing); 6 (continuous gnawing).

**Catalepsy test** was performed as described (Sanberg, 1980). The procedure was modified by adjusting the horizontal bar so that the mouse's forepaws were placed 3 cm above the floor level. The latency of the mouse to descend from this inconvenient posture was recorded by a trained observer; immobility for more than 5 min was scored as 300 s.

#### Brain Microdialysis

**Surgery, probe implantation, and microdialysis** were done as described before (Anderzhanova et al., 2013). Microdialysis guide cannulas (Microbiotech/se AB, Sweden) were implanted into the right mPFC (coordinates: AP 2.20 mm, ML 0.35 mm, and DV −1.50 mm) or right striatum (coordinates: AP 0.50 mm, ML 2.00 mm, and DV −2.25 mm) in accordance with the Paxinos and Franklin Mouse Brain Atlas (Paxinos and Franklin, 2001) under isoflurane (Abbot, India)/Metacam® (Boehringer Ingelheim GmbH, Germany) anesthesia. Recovery lasted for 1 week and included Metacam® supplementation (0.25 mg/100 ml) with drinking water. Microdialysis probes (o.d. 0.2 mm, cuprophane membrane 2 mm in length, MAB 4.15.2.Cu, Microbiotech/se AB, Sweden) were inserted under slight isoflurane anesthesia and then continuously perfused with sterile artificial cerebrospinal fluid (concentrations, in mM: NaCl 145, KCl 2.7, CaCl<sub>2</sub> 1.2, MgCl<sub>2</sub> 1.0, Na<sub>2</sub>HPO<sub>4</sub> 2.0, pH = 7.4). Microdialysis fractions (20 min) were collected during experimental days 1, 2 and 3 at a flow rate of 1.0 µl/min. In order to minimize the number of animals, microdialysis experiments lasted for three consecutive days during which different pharmacological treatments were given in the same order for each animal. Amphetamine and methylphenidate were injected on the first day 5 h apart, allowing recovery of both behavior and catecholamine levels. Amphetamine and MK-801 or saline were injected on the second day, and haloperidol on the third day.

**Monoamine assays**. Dopamine, norepinephrine, and homovanillic acid (HVA) contents were determined by reverse-phase HPLC with electrochemical detection (UltiMate3000 CoulochemIII, ThermoFisher, USA). All reagents used for the mobile phase were of analytical grade (Carl Roth GmbH or MERCK KGaA, Germany). Monoamines were separated on an analytical column (C18, 150 mm × 3 mm, 3 µm, YMC Triart, YMC Europe GmbH, Germany) at a flow rate of 0.4 ml/min. The potentials of the working electrodes were set at −150 mV and +220 mV; the guard cell potential was set at +350 mV. Monoamine concentrations were calculated by external standard curve calibration using the peak area for quantification. The detection limits for norepinephrine and dopamine were 0.032 and 0.040 nM, respectively; therefore, a few data sets were excluded from analysis.
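External standard calibration with a detection limit can be sketched as a least-squares line through the standards followed by inversion for each sample. The code below is a minimal illustration, assuming a linear detector response; the standard concentrations and peak areas are hypothetical, and only the detection limits (0.032 nM for norepinephrine, 0.040 nM for dopamine) come from the text.

```python
from typing import Optional, Sequence, Tuple

# Detection limits stated in the text (nM).
LOD_NOREPINEPHRINE_NM = 0.032
LOD_DOPAMINE_NM = 0.040


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_nm(peak_area: float, slope: float, intercept: float,
                     lod_nm: float) -> Optional[float]:
    """Invert the calibration line; return None below the detection limit."""
    conc = (peak_area - intercept) / slope
    return conc if conc >= lod_nm else None


# Hypothetical dopamine standards (nM) and their peak areas (arbitrary units).
standards_nm = [0.1, 0.5, 1.0, 2.0]
peak_areas = [2.0, 6.0, 11.0, 21.0]
slope, intercept = fit_line(standards_nm, peak_areas)
```

Returning `None` for sub-threshold values mirrors the exclusion of data sets that fell below the detection limit, rather than reporting them as zero.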

Basal levels were not corrected by an in vitro recovery examination. The values were stable across 3 days of measurement both in the mPFC and in the striatum. One-way ANOVA did not reveal any differences between the day groups in HAB or LAB mice (Fs < 0.74, ps > 0.40).

#### Drug Microinjections

Custom-designed stainless steel injection cannulas (23G) were bilaterally implanted into the mPFC (AP 1.90 mm, ML ± 0.40 mm, and DV −2.00 mm; **Figure 5A**) and the striatum (AP 0.50 mm, ML ± 2.00 mm, and DV −3.00 mm; **Figure 5D**) (Paxinos and Franklin, 2001). Recovery took 1–2 weeks. Injections were done directly before OF tests under slight isoflurane anesthesia. Injections were performed by means of a cannula (0.3 mm o.d.), which was connected to a microliter syringe (65RNR 10.0 µL SYR, Hamilton Bonaduz AG, Switzerland) via calibrated tubing. Once inserted, the injection cannula protruded from the guide cannula by 1 mm, thus reaching the prelimbic mPFC or the dorso-lateral striatum. Different injection sets were used for drug and vehicle.

#### Tissue Isolation from Frozen Brains

Animals were i.p. injected with saline, amphetamine (1 mg/kg), and MK-801 (0.3 mg/kg) in accordance with the corresponding protocols and examined in the OF test. Sixty minutes after the injections, animals were lightly anesthetized and decapitated, brains removed, immediately frozen on dry ice, and kept at −80°C. Biopsy punches of the mPFC and the striatum were done with pre-chilled stainless steel sample corers (diameters of 0.8 mm and 1.0 mm for the mPFC and striatum, respectively) (Fine Science Tools GmbH, Germany) from coronal sections of the brains at −20°C (Tzigaret et al., 1993). The withdrawn regions are schematically depicted in **Figures 4D,E**. Specimens from the right and left hemispheres were pooled together and stored at −80°C prior to western blot analysis.

#### Western Blot Analysis

Western blot analysis was performed as described previously (Zschocke et al., 2011). Protein extracts were obtained by lysing the brain punches in 62.5 mM Tris, 2% SDS and 10% sucrose, completed with protease (Sigma, P2714) and phosphatase (Roche, 04906837001) inhibitor cocktail. Samples were sonicated and heated at 95°C for 10 min. SDS-PAGE was carried out to separate proteins. Proteins were electrotransferred onto nitrocellulose membranes. Blots were placed in Tris-buffered saline (TBS), supplemented with 0.05% Tween (Sigma, P2287, USA) and 5% non-fat milk for 1 h at room temperature and then incubated with primary antibody (diluted in TBS/0.05% Tween) overnight at 4°C. The following primary antibodies were used: GSK3β (1:1000, Cell Signaling, #9315), phospho-Ser9-GSK3β (1:1000, Cell Signaling, #9323), Actin (1:5000, Santa Cruz Biotechnologies, sc-1616), HSC70 (1:5000, Santa Cruz Biotechnologies, sc-7298). Subsequently, blots were washed and probed with the respective horseradish peroxidase- or fluorophore-conjugated secondary antibody for 1 h at room temperature. The immuno-reactive bands were visualized either by using ECL detection reagent (Millipore, Billerica, MA, USA, WBKL0500) or directly by excitation of the respective fluorophore. Determination of the band intensities was performed with ChemiDoc MP (BioRad, CA, USA). The Western blot protocols were adapted to process the small quantities of biological material that could be obtained by punching brain structures. The low protein content and the corresponding adjustments of the Western blot image reader sensitivity influenced the image quality in particular cases. Nonetheless, the sample amounts were sufficient for reliable quantification. Comparison of optical density revealed a difference between the experimental groups.

The phosphorylation index was used as an indicator of kinase activity. Drug and line effects were evaluated in comparison to either saline- (**Figures 4G,I,K**) and amphetamine- (**Figure 7D**) treated groups or NAB mice (**Figure 4F**). Drug-induced dose-dependent changes were evaluated after normalization to vehicle-treated specimens.
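The quantification step above can be sketched in code. The text does not define the phosphorylation index explicitly, so this example assumes it is the ratio of the phospho-Ser9-GSK3β band intensity to the total GSK3β band intensity, and that normalization means dividing by the mean of the vehicle-treated group. The function names and intensity values are hypothetical.

```python
from typing import List, Sequence


def phosphorylation_index(phospho_signal: float, total_signal: float) -> float:
    """Assumed definition: phospho-Ser9-GSK3β intensity / total GSK3β intensity."""
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return phospho_signal / total_signal


def normalize_to_vehicle(treated: Sequence[float],
                         vehicle: Sequence[float]) -> List[float]:
    """Express treated-group indices relative to the mean vehicle index."""
    vehicle_mean = sum(vehicle) / len(vehicle)
    return [value / vehicle_mean for value in treated]


# Hypothetical indices: vehicle animals average 0.5, treated animals 1.0 and 1.5.
relative = normalize_to_vehicle([1.0, 1.5], [0.4, 0.6])
```

Because Ser9 phosphorylation inhibits GSK3β, a relative index above 1 under this assumed definition would indicate lower kinase activity than in vehicle-treated tissue.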

#### Mass Spectrometry-Based GSK3β Kinase Assay

Determination of GSK3β kinase activity in the presence of TDZD-8, LiCl, methylphenidate, amphetamine, or vehicle was done as in Bowley et al. (2005) with slight modifications. The kinase reaction buffer was supplemented with 1 mM ATP, 0.2 mM DTT, protease inhibitor (Sigma, P2714), phosphatase inhibitor (Roche, 04906837001), and 45 ng of recombinant GSK3β (BPS Bioscience, CA, USA). A MALDI-ToF mass spectrometer (Ultraflex I, Bruker Daltonics) was used for analysis. The ratio between the concentrations of phosphorylated and total substrate 2B-Sp was used as an index of kinase activity.

#### Histology

Cryo-sections of 25 µm obtained from target or punched brain regions were stained with cresyl violet (Carl Roth GmbH, Germany) and verified under a microscope using the Paxinos and Franklin mouse atlas (Paxinos and Franklin, 2001). Schematic representations of the targeted regions are shown in **Figures 2A,D**, **4D,E**, **5A,D**. When probe/cannula placement was found to be out of the targeting area, the respective samples/animals were discarded before analysis.

#### Experiments' Outline

Different cohorts of LAB, NAB and HAB mice were used in each experiment. The sizes of groups are indicated in the Figures.

**Experiment 1.1.** Comparison of the mean basal locomotor activities (20 min OF test in drug-naïve animals) between LAB, NAB, and HAB mice. **Experiment 1.2.** Between-line (HAB, NAB and LAB mice) comparison of amphetamine (0.5, 1.0, and 2.0 mg/kg, i.p.) and methylphenidate (3, 10, 30 mg/kg, i.p.) dose-response effects on the gain or loss in the mean locomotor activity within 60 min after drug administration (OF test).

**Experiment 2.1.** Evaluation of acute neurochemical changes in the mPFC of LAB and HAB mice evoked by systemic administration of amphetamine (1 mg/kg, i.p.; this dose is used in the next experiments) and methylphenidate (10 mg/kg) (microdialysis). **Experiment 2.2.** Evaluation of acute neurochemical changes in the striatum of LAB and HAB mice evoked by systemic administration of amphetamine and methylphenidate (microdialysis). **Experiment 2.3.** Examination of the locomotor activity dynamics within 60--120 min after systemic amphetamine and methylphenidate treatment (OF test).

**Experiment 3.1.** Comparison of haloperidol (1 mg/kg, i.p.) effects on the loss in the mean locomotor activity within 60 min after drug administration in LAB and HAB mice (OF test). **Experiment 3.2.** Examination of cataleptogenic activities of amphetamine and haloperidol in LAB mice (descent latency measurement at 60 min after drug administration). **Experiment 3.3.** Evaluation of acute neurochemical changes in the mPFC after haloperidol treatment in LAB and HAB mice (microdialysis).

**Experiment 4.1.** Comparison of the locomotor activity within 60 min after GSK3β inhibitors (TDZD-8, 20 mg/kg and LiCl, 100 mg/kg, i.p.; these doses are used in the next experiments) and amphetamine administration in LAB mice (OF test). **Experiment 4.2.** Comparison of TDZD-8, LiCl, and amphetamine effects on the loss or gain in the mean locomotor activity within 60 min after drug administration in LAB, NAB, and HAB mice (OF test).

**Experiment 5.** Comparison of the amphetamine, methylphenidate, LiCl, and TDZD-8 GSK3β inhibitory activity in vitro.

**Experiment 6.** Examination of systemic amphetamine administration effects on GSK3β phosphorylation levels observed at 60 min after drug i.p. injection in the mPFC and in the striatum of LAB, NAB, and HAB mice (Western blot).

**Experiment 7.** Comparison of changes in locomotor activity within 60 min after amphetamine and TDZD-8 bilateral microinjections into the mPFC and into the striatum in LAB mice (OF test).

**Experiment 8.1.** Evaluation of the locomotor activity dynamics within 60 min after MK-801 (0.3 mg/kg, i.p.; this dose is used in the next experiments) treatment in LAB and HAB mice (OF test). **Experiment 8.2.** Examination of MK-801 effects on the amphetamine-induced dopamine release in the mPFC of LAB and HAB mice (microdialysis).

**Experiment 9.1.** Comparison of administration time-dependent (pre-, co- and post-treatment) effects of MK-801 on the amphetamine-mediated locomotion mitigation in LAB mice (OF test). **Experiment 9.2.** Evaluation of the MK-801 and amphetamine interaction in regulation of GSK3β phosphorylation in the mPFC of LAB mice (Western blot). **Experiment 9.3.** Examination of MK-801 pre-treatment effects on the LiCl-evoked locomotion mitigation in LAB mice (OF test). **Experiment 9.4.** Examination of MK-801 co-treatment effects on TDZD-8-evoked locomotion mitigation in LAB mice (OF test).

#### Data and Statistical Analysis

Considering the basal difference in the absolute values of locomotor activities between LAB and HAB mice (**Figures 1A**, **2F,G**), the OF data were normalized to better visualize relative changes in the traveled distance after drug administration. The applied algorithm (Hinkelmann et al., 2010) compares the running values ($x_i$) with the measurement from the last 5 min of the pretreatment period ($x_4$). Relative changes were calculated in accordance with the equation: $x_i(\%) = 200 \cdot x_i^2 / (x_i^2 + x_4^2)$. Gains and losses in the distance traveled were obtained by subtraction of normalized values from 100%. This normalization ensures that relative changes in locomotion are displayed in the ranges of 0% to +100% (gain) and 0% to −100% (loss), thus avoiding any bias towards increased locomotion at the expense of decreased locomotion. Microdialysis data were expressed as a percentage of absolute dopamine and norepinephrine basal values (basal values were the means of three consecutive samples) or as absolute concentrations in the microdialysates (for basal levels only). All data are expressed as mean ± S.E.M. The sample sizes were chosen on the basis of our previous experience with the procedures used, and they are adequate to detect meaningful differences between conditions. Statistical analyses were performed with Statistica, version 5.0 (StatSoft Inc., Tulsa, OK, USA). Data distributions were examined with the Kolmogorov-Smirnov test and met the assumptions of the tests with regard to normality, skew and homogeneity of variance. Mice of each line were randomly assigned to the treatments for between-groups comparison in microdialysis experiments. Counterbalanced assignment of treatment order for within-subject design was used in behavioral experiments. Experimenters were blind to either the subline or the treatment assignments.
Statistical tests comprised the two-tailed t-test, one-way ANOVA, and two-way ANOVA (the factorial design included the time and line or treatment factors for the analysis of dynamic locomotor activity and microdialysis data, and the structure and treatment factors for the analysis of molecular data; in all other cases the design is specified in the respective part of the Results), followed by Newman-Keuls, Dunnett's or Bonferroni's post hoc tests, if appropriate. All differences were considered significant at p < 0.05.
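The data transformations described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names are hypothetical, the signed gain/loss is taken here as the normalized value minus 100 (so that gains are positive and losses negative), and the S.E.M. is computed from the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_activity(x_i, x_4):
    """x_i(%) = 200 * x_i^2 / (x_i^2 + x_4^2); equals 100 when x_i == x_4."""
    denom = x_i ** 2 + x_4 ** 2
    if denom == 0:
        raise ValueError("both distances are zero; relative change is undefined")
    return 200.0 * x_i ** 2 / denom


def gain_or_loss(x_i, x_4):
    """Signed relative change in percent, bounded to [-100, +100].

    Sign convention (positive = gain) is an assumption of this sketch.
    """
    return normalized_activity(x_i, x_4) - 100.0


def percent_of_baseline(samples, baseline_samples):
    """Express microdialysate samples as a percentage of the mean basal value."""
    baseline = mean(baseline_samples)
    if baseline <= 0:
        raise ValueError("mean basal value must be positive")
    return [100.0 * s / baseline for s in samples]


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical distances (cm) relative to a pretreatment value x_4 = 100
doubled = gain_or_loss(200.0, 100.0)
halved = gain_or_loss(50.0, 100.0)
```

In this sketch, doubling the distance relative to the pretreatment value yields +60% and halving it yields −60%, which illustrates the symmetry between gains and losses that the normalization is meant to provide.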

### Results

#### The Calming Effect of Amphetamine is not Mirrored by Changes in Monoamine Release

To confirm the hyperactive phenotype of LAB mice (Yen et al., 2013), we exposed LAB, NAB and HAB mice to an OF for 20 min (Experiment 1.1). LAB mice showed elevated mean basal activity during the 20 min exposure to the OF (791.0 ± 33.12 cm) compared to NAB (484.0 ± 13.66 cm) and HAB mice (438.7 ± 15.21 cm), which were indistinguishable from each other (one-way ANOVA with Bonferroni's post hoc test; F(2) = 72.72, p < 0.0001; **Figure 1A**).

activity (A). LAB mice are strictly different from NAB and HAB mice in their (\*\*\*p < 0.001). (B). Dose-dependent effect of amphetamine on locomotor activity in OF test. Here and in the next graph, data represent relative changes (gain or loss) in the mean total distance traveled within 60 min after drug administration in comparison to basal activities (last 5 min of pre-treatment 20 min period). Asterisks show the result of the Dunnett's post hoc test comparing the effect of different doses with saline effect. (C). Dose-dependent effect of methylphenidate on locomotor activity in OF test. The two-tailed Student's test in the cases of HAB and NAB mice revealed a difference (depicted with hash marks, ###p < 0.001) between changes in locomotor activity after saline and drug treatment. Asterisks show the result of the Dunnett's post hoc test comparing the effect of different doses with saline effect in LAB mice. \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001; n.e. not examined; numbers on the graph panels represent group size.

LAB mice differed from NAB and HAB mice in their profile of changes in the mean total distance traveled after amphetamine and methylphenidate administration (Experiment 1.2). Amphetamine exerted a calming activity in LAB mice and increased locomotion in NAB and HAB mice in a range of doses (0.5--2.0 mg/kg, i.p.). Two-way ANOVA (line, dose) exploring amphetamine effects showed significance for all factors (line: F(2,130) = 61.97, p < 0.0001; dose: F(3,130) = 12.03, p < 0.0001; line x dose: F(6,130) = 19.79, p < 0.0001). The subsequent one-way ANOVAs performed separately per line point to a significant amphetamine dose effect in each line (LAB: F(3) = 45.71, p < 0.001; NAB: F(3) = 8.87, p = 0.001; HAB: F(3) = 15.46, p < 0.001; **Figure 1B**), although in different directions. Methylphenidate, injected at three doses (3.0, 10.0, 30.0 mg/kg), showed stimulatory activity in all lines. The one-way ANOVA for LAB mice points to a significant dose effect (F(3) = 7.20, p = 0.0013). The two-tailed Student's test in the cases of HAB and NAB mice revealed a difference between changes in locomotor activity after saline and drug treatment (HAB mice, t(16) = 18.05, p < 0.007; NAB mice, t(20) = 13.72, p < 0.0001; **Figure 1C**). Stereotypy scores were always zero. In no case did we detect even initial signs of amphetamine-specific (discontinuous sniffing) stereotypic behavior which might have explained the decrease in locomotor activity found in LAB mice.

On the basis of these results and considering the monoamine releasing potency of amphetamine and methylphenidate reported in the literature, we selected 1 mg/kg for amphetamine and 10 mg/kg for methylphenidate for the subsequent neurochemical studies. In these studies, we asked whether the differences in behavioral effects of amphetamine vs. methylphenidate in LAB mice (i.e., reduction vs. increase in locomotor activity) were reflected by similar differences in dopamine and norepinephrine levels in the mPFC and/or the striatum. This was done in comparison to HAB mice, which were most responsive to any of the drugs in terms of increased locomotion. Microdialysis was performed in the mPFC and in the striatum in independent groups of mice.

In the first cohort of animals (Experiment 2.1) microdialysis probes targeted the mPFC including the cingulate cortex (Cg1), the prelimbic cortex (PrL) and the infralimbic cortex (IL; **Figure 2A**). Mean basal catecholamine levels were evaluated in the entire pool of basal samples in drug-naive animals (day 1) (HAB, n = 5; LAB, n = 5). Comparison of the absolute extracellular catecholamine levels measured in the mPFC failed to reveal any line differences for either basal dopamine (t(21) = 0.61, p = 0.556; **Figure 2A**) or norepinephrine (t(21) = 1.22, p = 0.236; **Figure 2A**) content.

As revealed by two-way ANOVAs (line, time), both amphetamine (1 mg/kg, i.p.) (time: F(8,66) = 3.04, p = 0.007; **Figure 2B**) and methylphenidate (10 mg/kg, i.p.) (time: F(8,80) = 2.51, p = 0.019; **Figure 2C**) caused a pronounced increase in the relative dopamine release in the mPFC irrespective of the mouse line (time: Fs > 2.59, p < 0.019; line: Fs < 0.10, p > 0.752; time x line: Fs < 0.39, p > 0.916). Results were essentially the same for norepinephrine (time: Fs > 9.08, p < 0.001; time x line: Fs < 0.64, p > 0.744; **Figures 2B,C**), except for a significantly higher release in HAB mice following methylphenidate treatment (line: F(1,77) = 5.43, p = 0.023). In each case, catecholamine levels peaked within the first 20 min after treatment, followed by a return towards basal levels within 2 h.

FIGURE 2 | Amphetamine and methylphenidate action on dopamine and norepinephrine release (A1, D1). The schematic diagrams show placement of microdialysis probes in subsequent coronal sections of the mPFC and striatum. The computer-based atlas by Paxinos and Franklin (2001) was used to mark probe locations; numbers refer to distances from the bregma, mm. (A2,3,D2). Basal values of dopamine and norepinephrine in the mPFC and dopamine in the striatum of LAB and HAB mice. \*p < 0.01; "n/N" represent the sample size/number of animals under evaluation. Amphetamine (1 mg/kg, i.p.) and methylphenidate (10 mg/kg, i.p.) evoke comparable dopamine and norepinephrine release in the mPFC (B1,2,C1,2) and striatum (E1,2) of both LAB (circles) and HAB (triangles) mice. Amphetamine (1 mg/kg, i.p.) decreases hyperactivity in LAB mice (circles) in contrast to the stimulatory effect in HAB mice (triangles) (F1,2). Methylphenidate (10 mg/kg, i.p.) stimulates locomotion in both lines (G1,2). The dashed lines define the moment of drug administration.

In the second cohort of mice (Experiment 2.2) microdialysis probes were implanted into the dorso-lateral part of the striatum (**Figure 2D**). Basal levels of dopamine were lower in LAB (n = 8) mice in comparison to HAB (n = 5) mice (t(37) = 2.92, p = 0.006; **Figure 2D**). As shown by two-way ANOVAs, both amphetamine (1 mg/kg, i.p.) (time: F(8,110) = 30.38, p < 0.001; **Figure 2E**) and methylphenidate (10 mg/kg, i.p.) (time: F(8,80) = 43.21, p < 0.0001; **Figure 2E**) caused a significant increase in dopamine release, which was the same in HAB and LAB mice (line: Fs < 0.263, p > 0.668; time x line: Fs < 1.11, p > 0.365). Norepinephrine levels were below the detection limit. Again, dopamine levels peaked within 20 min after injection and returned to baseline within 2 h.

In a separate cohort of LAB and HAB mice we monitored changes in the distance traveled within 60--120 min after amphetamine (1 mg/kg, i.p.) and methylphenidate (10 mg/kg, i.p.) administrations (Experiment 2.3). It is of note that the transient increase in locomotor activity upon amphetamine (two-way ANOVA, treatment: F(1,120) = 9.17, p = 0.0066; **Figure 2F**) and methylphenidate (two-way ANOVA, treatment: F(1,33) = 47.11, p < 0.0001; **Figure 2G**) treatment in HAB and methylphenidate treatment in LAB mice (two-way ANOVA: treatment: F(1,45) = 40.06, p = 0.0008; **Figure 2G**), with return towards basal levels within 1 h, closely resembled the release patterns of dopamine and norepinephrine. The calming effects of amphetamine in LAB mice (two-way ANOVA: treatment: F(1,66) = 58.75, p < 0.0001; **Figure 2F**), in contrast, outlasted by far the neurochemical changes.

Collectively, the microdialysis experiments did not reveal any differences in dopamine or norepinephrine release upon treatment with amphetamine between LAB and HAB mice, which would explain the opposite behavioral effects. Moreover, amphetamine and methylphenidate showed comparable effects on dopamine and norepinephrine release in LAB mice despite their different effects on locomotor activity.

#### Dopamine D2 Receptor Function is not Impaired in LAB Mice

There is evidence for a critical involvement of dopamine D2 receptor in amphetamine behavioral effects (Seeman and Madras, 1998; Beaulieu et al., 2004, 2005). Therefore, we investigated whether dopamine D2 receptor signaling was altered in LAB mice under basal conditions. To this end, we examined behavioral (locomotor activity and catalepsy) and neurochemical changes induced by systemic administration of haloperidol. Haloperidol (1 mg/kg, i.p.) equally decreased locomotor activity in the OF test in both LAB and HAB mice (Experiment 3.1) (two-way ANOVA, line: F(1,27) = 1.75, p = 0.197; treatment: F(1,27) = 63.48, p < 0.0001; line x treatment: F(1,27) = 2.77, p = 0.109; **Figure 3A**); for details concerning the dynamics of locomotor activity changes see Yen et al. (2013). Haloperidol also caused a significant increase in catalepsy in LAB mice (Experiment 3.2). The descent latency from an involuntary posture increased 60 min after haloperidol administration, whereas amphetamine treatment (1 mg/kg, i.p.) had no effects compared to vehicle (one-way ANOVA: F(2) = 10.84, p = 0.0004; **Figure 3B**). This proves the specificity of amphetamine-induced decrease in locomotor activity in these animals.

We also examined the efficacy of dopamine D2 receptors to affect dopamine release on the local level (presynaptic autoreceptor-mediated release regulation) (Experiment 3.3). As revealed by two-way ANOVA, haloperidol elicited an increase in dopamine (time: F(11,93) = 2.32, p = 0.017; **Figure 3C**) and HVA (time: F(11,80) = 28.49, p < 0.001; **Figure 3D**) levels in the mPFC, with no differences between the two lines (line: Fs < 2.41, p > 0.125, time x line: Fs < 1.46, p > 0.337).

Taken together, these results demonstrate intact functions of dopamine D2 receptors, both pre- and postsynaptic, in LAB mice. Moreover, the calming effect of amphetamine does not result from any cataleptic-like effect.

#### GSK3β Inhibitors Attenuate Hyperlocomotion in LAB Mice

We showed previously that LiCl decreased the hyperactivity in LAB mice (Yen et al., 2013). The pharmacological activity of lithium is mediated by both direct and indirect inhibition of GSK3β (Beaulieu et al., 2009). Therefore, we compared the effects of amphetamine (1 mg/kg, i.p.), LiCl (100 mg/kg, i.p.), and the selective GSK3β inhibitor TDZD-8 (20 mg/kg, i.p.) on locomotor activity in LAB, NAB, and HAB mice (Experiments 4.1 and 4.2). In LAB mice, TDZD-8 did not affect locomotion in the OF test at a dose of 10 mg/kg, i.p. (data not shown). However, a dose of 20 mg/kg, i.p., sufficed to decrease hyperactivity. Notably, TDZD-8 elicited changes in locomotor activity which were qualitatively and quantitatively similar to the effects of amphetamine and LiCl: a rapid and lasting decline in activity (two-way ANOVA: treatment: F(2,275) = 0.48, p = 0.625; time: F(15,275) = 82.81, p < 0.0001; time x treatment: F(15,275) = 2.47, p < 0.0001; **Figure 4A**). The increase in the selectivity of the GSK3β inhibitors (LiCl vs. TDZD-8) went along with a loss of line specificity in their calming action: whereas LiCl decreased locomotor activity in NAB, but not in HAB mice, TDZD-8 was effective in all three lines. Two-way ANOVA (line x treatment) showed significance for each factor and their interaction (treatment: F(1,61) = 21.685, p < 0.0001; line: F(1,61) = 17.65, p < 0.0001; treatment x line: F(1,61) = 7.44, p < 0.0001) (**Figure 4B**).

Although the acute effects of amphetamine and TDZD-8 were comparable in LAB mice, we observed a critical difference in delayed toxic impact of the two drugs: 2--3 days (but not 24 h) after treatment with TDZD-8 the animals showed a mortality rate of 30%. This may be related to a strong peripheral metabolic effect of the selective GSK3β inhibitor. In contrast, we failed to observe any fatal consequences for amphetamine treatment despite the huge number of treated mice (more than 100 over 3 years).

Similar profiles of hyperactivity mitigation after GSK3β inhibitor (TDZD-8, LiCl) and amphetamine administration in LAB mice strongly suggest that these compounds share a common molecular mechanism of action.

#### Amphetamine Selectively Increases GSK3β Phosphorylation in the mPFC of LAB Mice

The similarities in the behavioral consequences of amphetamine and GSK3β inhibitors prompted us to investigate whether amphetamine directly interferes with GSK3β. To address this question, we examined the GSK3β inhibitory activity for amphetamine and methylphenidate in vitro (Experiment 5). Amphetamine and methylphenidate did not decrease the activity of purified GSK3β in vitro. In contrast, in this preparation, lithium effectively inhibited GSK3β, as well as did TDZD-8 (**Figure 4C**).

We next looked for changes in GSK3β phosphorylation in vivo, assuming an indirect interaction (Experiment 6). To this end, we treated HAB, NAB and LAB mice with amphetamine (1 mg/kg, i.p.) and measured the levels of phospho-Ser9-GSK3β (pGSK3β) in the mPFC (**Figure 4D**) and in the striatum (**Figure 4E**) 60 min later. Between-line (LAB, NAB, and HAB mice) comparison (two-way ANOVA, line, structure) revealed no difference in GSK3β phosphorylation in either the mPFC or the striatum of vehicle-treated animals (line: F(2,59) = 3.69, p = 0.307; structure: F(1,59) = 4.21, p = 0.102; interaction: F(1,59) = 2.21, p = 0.489; **Figure 4F**), nor in total GSK3β levels (not shown). This speaks against a significant role of GSK3β in basal differences in locomotor activity. Following amphetamine treatment of LAB mice, however, GSK3β phosphorylation was significantly increased in both the mPFC and striatum. Three-way ANOVA (line x structure x treatment) showed significance of each factor and their interactions (Fs > 6.62, ps < 0.0011). Subsequent two-way ANOVAs performed for each line separately confirmed the effect of amphetamine in LAB mice (treatment: F(1,33) = 47.58, p < 0.0001; structure: F(1,33) = 0.57, p = 0.54; treatment x structure: F(1,33) = 0.57, p = 0.54; **Figures 4G,H**). Amphetamine also increased the phosphorylation of GSK3β in NAB mice. This, however, was only the case for the striatum, but not for the mPFC (treatment: F(1,26) = 21.44, p = 0.0001; structure: F(1,26) = 24.22, p < 0.0001; treatment x structure: F(1,26) = 24.22, p < 0.0001; **Figures 4I,J**). In HAB mice, we did not observe any changes in GSK3β phosphorylation, irrespective of the brain structure (treatment: F(1,36) = 4.00, p = 0.217; structure: F(1,36) = 2.21, p = 0.357; treatment x structure: F(1,36) = 2.21, p = 0.357; **Figures 4K,L**).

FIGURE 4 | Amphetamine treatment results in GSK3β inhibition (A). Amphetamine (1 mg/kg, i.p., circles), LiCl (100 mg/kg, i.p., squares), and TDZD-8 (20 mg/kg, i.p., diamonds) mitigate the locomotor activity in LAB mice with the same dynamics. The dashed line defines the moment of drug injection; n.s. non significant. (B). Within-line comparison of amphetamine (1 mg/kg, i.p.), LiCl (100 mg/kg, i.p.), TDZD-8 (20 mg/kg, i.p.), and saline effects on the locomotor activity in LAB, NAB, and HAB mice illustrates the similar ability of all drugs to decrease the traveled distance in LAB mice. Relative gains or losses in the mean distance traveled within 60 min after treatment were obtained in comparison to basal activities (last 5 min of pre-treatment 20 min period). Analysis confirms the selectivity of amphetamine action in LAB mice and points to a lack of line specificity of LiCl and, in particular, TDZD-8 in inhibiting locomotion. \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001, within line comparison, Bonferroni's post hoc test. Numbers on the bars represent the group size. The same order of treatments was applied for each line. (C). Lithium chloride and TDZD-8 show a direct inhibition of recombinant GSK3β in vitro, effectively reducing (p < 0.001) its activity in a dose-dependent manner. Neither amphetamine nor methylphenidate inhibits recombinant GSK3β in vitro. (D,E). Schematic representation of punched areas in the mPFC and the striatum. The computer-based atlas by Paxinos and Franklin (2001) was used to mark probe locations; numbers refer to distances from the bregma, mm. (F). Western blot kinase analysis for the mPFC and the striatum of drug naïve mice shows no difference between LAB (n = 8/11), NAB (n = 11/12) and HAB mice (n = 11/12) in pGSK3β levels. (G,I,K). Changes in the pGSK3β levels in the mPFC and the striatum of LAB, NAB, and HAB mice 60 min after amphetamine (1 mg/kg, i.p.) and saline treatment and coinciding with OF exposure. \*\*p < 0.01, \*\*\*p < 0.001 vs. saline (H,J,L). Representative Western blots for analysis of GSK3β, pGSK3β protein level. Bands include samples both from amphetamine and saline-treated animals. Numbers on the graph panels represent group size.


In summary, amphetamine treatment caused line-dependent and structure-specific changes in GSK3β phosphorylation, whereby LAB mice displayed a selective increase in GSK3β phosphorylation in the mPFC.

mm. Amphetamine (1.8 ng/side, B) and TDZD-8 (1.1 ng/side, C) transiently

#### Selective Inhibition of GSK3β and Amphetamine Activity in the mPFC, but not in the Striatum, Mitigates Hyperlocomotion in LAB Mice

The Western blot data suggest a scenario according to which amphetamine mediates hypolocomotion via increased phosphorylation (i.e., inhibition) of GSK3β at the PFC rather than the striatal level. We verified this assumption by local application of amphetamine or TDZD-8 in the mPFC or striatum of LAB mice (Experiment 7) (**Figures 5A,D**). Bilateral administration of amphetamine (1.8 ng/0.5 µl/side) in the mPFC indeed decreased locomotor activity compared to vehicle immediately after drug infusion (two-way ANOVA: treatment: F(1,165) = 2.00, p = 0.177; time: F(11,165) = 8.66, p < 0.0001; treatment x time: F(11,165) = 2.27, p = 0.013; **Figure 5B**). The same rapid effect was seen after TDZD-8 (1.1 ng/0.5 µl/side; treatment: F(1,132) = 0.51, p = 0.488; time: F(11,132) = 9.42, p < 0.0001; treatment x time: F(11,132) = 3.38, p = 0.0004; **Figure 5C**). Bilateral administration of amphetamine (1.8 ng/0.5 µl/side) in the striatum, in contrast, rather stimulated locomotor activity (treatment: F(1,66) = 0.71, p = 0.431; time: F(11,66) = 6.68, p < 0.0001; treatment x time: F(11,66) = 1.90, p = 0.054; **Figure 5E**), and TDZD-8 (1.1 ng/0.5 µl/side) had no effect at all (treatment: F(1,132) = 0.04, p = 0.843; time: F(11,132) = 9.72, p < 0.0001; treatment x time: F(11,132) = 0.54, p = 0.870; **Figure 5F**).

These observations support the idea that the mPFC as a part of the activity controlling system (Carlsson and Carlsson, 1989) is involved in the paradoxical calming effect of amphetamine in LAB mice.

#### The Action of Amphetamine in LAB Mice is Sensitive to NMDA Receptor Signaling in a GSK3β-Dependent Manner

Since dopamine and norepinephrine are unlikely to be causally involved in the calming effect of amphetamine at the level of the mPFC (**Figure 2**), we studied possible interactions between amphetamine and glutamate signaling. Previous studies have suggested such a possibility (Anderzhanova et al., 2001). To this end, we used a combined treatment of amphetamine and NMDA receptor blocker MK-801.

In Experiment 8.1, MK-801 (0.3 mg/kg) injected alone increased locomotor activity in both HAB and LAB mice. Two-way ANOVA (time, treatment) of changes in absolute levels of the distance traveled showed significance in HAB mice (treatment: F(1,180) = 31.13, p = 0.0008; time: F(15,180) = 21.22, p < 0.0001; treatment x time: F(15,180) = 15.31, p < 0.0001) and in LAB mice (treatment: F(1,165) = 35.02, p = 0.0001; time: F(15,165) = 10.71, p < 0.0001; treatment x time: F(15,165) = 23.02, p < 0.0001; **Figure 6A** inset). Since the effect of the drug can be obscured by the difference in basal activity, we also analyzed relative changes in locomotion under MK-801 treatment. The analysis of normalized data confirmed the elevation in locomotor activity by MK-801 both in HAB mice (two-way ANOVA, treatment: F(1,180) = 236.0, p < 0.0001; time: F(15,180) = 14.20, p < 0.0001; treatment x time: F(15,180) = 23.97, p = 0.0004) and LAB mice (treatment: F(1,165) = 3.83, p = 0.076; time: F(15,165) = 3.58, p < 0.001; treatment x time: F(15,165) = 9.37, p < 0.0001). Relative to the basal levels, MK-801 induced a less pronounced elevation in locomotor activity in LAB mice than in HAB mice (two-way ANOVA: line: F(1,165) = 14.10, p = 0.003; time: F(15,165) = 17.87, p < 0.0001; line x time: F(15,165) = 3.03, p < 0.0001; **Figure 6A**). This difference cannot be ascribed to divergence in habituation to the OF. When saline is injected, HAB, but not LAB, mice show a persistent decrease in locomotor activity (two-way ANOVA, line: F(1,185) = 33.97, p < 0.0001; time: F(15,165) = 24.30, p < 0.001; line x time: F(15,165) = 7.38, p < 0.0001). In Experiment 8.2, microdialysis showed that pretreatment (−20 min) with MK-801 (0.3 mg/kg, i.p.) did not facilitate amphetamine-induced dopamine release in the mPFC of LAB mice, whereas it stimulated release in HAB mice (two-way ANOVA, line: F(1,24) = 3.27, p = 0.048; time: F(8,24) = 1.31, p = 0.28; line x time: F(8,27) = 1.16, p = 0.32; **Figure 6B**, right panel). (The apparent decrease in dopamine release in LAB mice after MK-801 pretreatment compared to amphetamine given alone failed to reach the level of statistical significance.)

In a distinct cohort of LAB mice (Experiment 9.1), MK-801 (0.3 mg/kg) pretreatment (−20 min) restored the hyperlocomotion in amphetamine-treated (1 mg/kg, i.p.) LAB mice (two-way ANOVA: treatment: F(1,165) = 9.09, p = 0.0032; time: F(15,165) = 12.22, p < 0.0001; treatment x time: F(15,165) = 8.25, p < 0.0001; **Figure 7A**). MK-801 administered at the same time as amphetamine (+0 min) abolished the amphetamine calming effect (two-way ANOVA: treatment: F(1,265) = 18.63, p = 0.0004; time: F(15,265) = 30.35, p < 0.0001; treatment x time: F(15,265) = 6.33, p < 0.0001; **Figure 7B**). Post-treatment (+20 min) with MK-801 (0.3 mg/kg, i.p.) in amphetamine- (1 mg/kg, i.p.) treated LAB mice, however, failed to interfere with its calming effect (two-way ANOVA: treatment: F(1,180) = 4.10, p = 0.166; time: F(1,15) = 42.74, p < 0.0001; treatment x time: F(1,180) = 1.87, p = 0.722; **Figure 7C**).

FIGURE 6 | MK-801 shows different activities in LAB and HAB mice (A). Relative increase in locomotor activity induced by the NMDA receptor antagonist MK-801 is more pronounced in HAB (triangles) than in LAB (circles) mice. The dashed lines mark the moments of saline or MK-801 (0.3 mg/kg, i.p.) administration. The inset graph represents changes in the absolute values of the distance traveled. \*\*p < 0.01, \*\*\*p < 0.001. (B). As revealed by in vivo microdialysis, pre-treatment (−20 min) with MK-801 facilitates dopamine release in the mPFC of HAB (triangles), but not of LAB (circles), mice. The dashed lines mark the moments of amphetamine (1 mg/kg, i.p.) and MK-801 (0.3 mg/kg, i.p.) administrations. \*p < 0.05 vs. LAB mice.

FIGURE 7 | Amphetamine action in LAB mice interferes with NMDA receptor signaling (A). Pre-treatment (−20 min) with MK-801 abolishes amphetamine effect on the locomotor activity in LAB mice. The dashed lines mark the moments either of saline and amphetamine (1 mg/kg, i.p.) or MK-801 (0.3 mg/kg, i.p.) and amphetamine (1 mg/kg, i.p.) administration. \*\*p < 0.01. (B). Co-treatment with MK-801 counteracts the amphetamine calming effect in LAB mice. The dashed lines mark the moments of either saline + amphetamine (1 mg/kg, i.p.) or MK-801 (0.3 mg/kg, i.p.) + amphetamine (1 mg/kg, i.p.) administration. \*\*\*p < 0.001. (C). MK-801 post-treatment (+20 min) did not interfere with amphetamine action. The dashed lines mark the moments of amphetamine (1 mg/kg, i.p.) and MK-801 (0.3 mg/kg, i.p.) injections. (D). Decrease in the phospho-Ser9-GSK3β levels in the mPFC of LAB (n = 5/5) mice 60 min after MK-801 (0.3 mg/kg, i.p.) and amphetamine (1 mg/kg, i.p.) co-treatment and coinciding with OF exposure. \*p < 0.05 vs. amphetamine + saline. (E). Representative Western blots for the analysis of GSK3β, pGSK3β protein levels in LAB mice. Bands include samples both from amphetamine + saline- and amphetamine + MK-801-treated animals. (F). MK-801 pre-treatment (−20 min) effectively collapses the calming effect of LiCl. The dashed lines mark the moments of MK-801 (0.3 mg/kg, i.p.) and LiCl (100 mg/kg, i.p.) injections. \*\*\*p < 0.0001. (G). MK-801 does not interfere with the activity of the pure GSK3β inhibitor TDZD-8 in its ability to modulate locomotor activity in LAB mice. The dashed lines mark the moments of either MK-801 (0.3 mg/kg), saline + TDZD-8 (20 mg/kg, i.p.) or MK-801 (0.3 mg/kg, i.p.) + TDZD-8 (20 mg/kg, i.p.) administration. Since there was no difference between saline and 0.5% DMSO treatment, results for both vehicle-treated groups were pooled together. \*\*\*p < 0.001, n.s. non significant. (H). Hypothetical scenario: amphetamine effects on GSK3β comprise NMDA receptor-mediated kinase slowdown (due to glutamate release) rather than any direct effect. This pathway, not yet identified in detail, is uniquely activated in the mPFC of LAB mice and leads to GSK3β inhibition. Hypofunction of NMDA receptors might be a permissive cause of this pathway activation.

Amphetamine (1 mg/kg) and MK-801 (0.3 mg/kg, i.p.) co-treatment resulted in a decrease in amphetamine-driven changes in phosphorylation of GSK3β in the mPFC 60 min after injection (Experiment 9.2) (t(8) = 3.12, R<sup>2</sup> = 0.564, p = 0.012; **Figures 7D,E**).

MK-801 (0.3 mg/kg, i.p.) pre-treatment prevented the calming effect of the non-selective GSK3β inhibitor LiCl (100 mg/kg, i.p.) in LAB mice (Experiment 9.3) (two-way ANOVA: treatment: F(1,150) = 8.10, p = 0.057; time: F(1,15) = 12.21, p < 0.0001; treatment x time: F(1,150) = 16.30, p < 0.0001; **Figure 7F**). In contrast, MK-801 (0.3 mg/kg, i.p.) co-administered with the selective GSK3β inhibitor TDZD-8 (20 mg/kg, i.p.) (Experiment 9.4) failed to prevent its calming effect (two-way ANOVA: treatment: F(1,270) = 0.264, p = 0.614; time: F(15,270) = 79.00, p < 0.0001; treatment x time: F(15,270) = 7.005, p < 0.0001; **Figure 7G**). It is known that, in addition to directly inhibiting GSK3β activity, lithium interacts with factors upstream of GSK3β phosphorylation. Therefore, MK-801 likely acts on pathway(s) upstream of GSK3β activity regulation.

Since both the dynamics and magnitude of hyperlocomotion mitigation in LAB mice were similar after systemic administration of amphetamine and TDZD-8 (see **Figure 4A**), the differences between the MK-801-amphetamine and MK-801-TDZD-8 interactions cannot be explained by an insufficient effect of amphetamine. Taken together, the behavioral and molecular data suggest that amphetamine and NMDA receptor signaling interact upstream of GSK3β activity in mediating the calming effect of amphetamine.

# Discussion

We examined the neurochemical and molecular signature of the amphetamine calming effect in LAB mice, which are characterized by locomotor hyperactivity and cognitive impairment resembling an ADHD-like endophenotype (Yen et al., 2013). Our findings suggest that changes in dopamine and norepinephrine release in the mPFC and the striatum in LAB mice are unlikely to be involved in the calming action of amphetamine. Instead, we provide evidence that amphetamine actions involve inhibition of GSK3β at the level of the mPFC and interaction with NMDA receptor signaling.

We employed a line comparison strategy to identify the signature of the calming amphetamine effect in LAB mice. A set of behavioral and microdialysis experiments renders it highly unlikely that changes in monoamine release play a major role in the line-specific hyperactivity and the calming effect of amphetamine. First, LAB mice showed lower, rather than higher, basal dopamine levels in the striatum (**Figure 2D**) compared to HAB mice. This not only corroborates findings in spontaneously hypertensive hyperactive rats (Russell et al., 1998; Russell, 2002), but also speaks against the hypothesis that high basal dopamine levels are causally involved in ADHD-like hyperactivity (Waldman et al., 1998; Gainetdinov et al., 1999).

Second, both amphetamine and methylphenidate cause a similar increase in the dopamine and/or norepinephrine levels in LAB and HAB mice despite the line-specific difference in behavior (increased locomotion in HAB vs. decreased locomotion in LAB mice after amphetamine treatment compared to increased locomotion in both lines after methylphenidate treatment). The similar dynamics of neurochemical and behavioral changes after methylphenidate are consonant with an involvement of increased dopamine and/or norepinephrine signaling in the increased locomotion observed in both LAB and HAB mice. The divergent neurochemical (transient increase) and behavioral (sustained decrease) changes after amphetamine treatment in LAB mice, in contrast, argue against a causal relationship between amphetamine-induced monoamine release and the paradoxical calming effect (**Figures 1**, **2**). There are a few reports of methylphenidate ineffectiveness (30–40%) in the treatment of hyperactivity in children (Winsberg et al., 1980; Pelham et al., 1999; Gerwe et al., 2009). At the same time, amphetamine is effective in the majority of cases (90%) (Pliszka et al., 2000). Therefore, a differential response to amphetamine and methylphenidate in the clinic may support better diagnosis, either to differentiate co-morbid disorders or to distinguish ADHD presentations.

Third, in contrast to HAB mice, MK-801 pre-treatment failed to facilitate amphetamine-induced dopamine release in the mPFC of LAB mice, but at the same time prevented changes in their locomotor activity (**Figures 6B**, **7A,B**). This provides additional evidence of dopamine-independent mitigation of hyperactivity in LAB mice and points to possible changes in the NMDA receptor-dependent activity of the mPFC.

Fourth, the neuroleptic haloperidol, a dopamine D2 receptor antagonist, decreased locomotion in LAB, NAB, and HAB mice to the same extent, induced catalepsy in LAB mice (in contrast to amphetamine), and elicited comparable changes in the dopamine and HVA levels in the mPFC of LAB and HAB mice (**Figure 3**), suggesting unaltered D2 receptor signaling in LAB mice. Finally, the lack of line differences in phosphorylation of GSK3β between saline-treated LAB, NAB, and HAB mice (**Figure 4F**) points towards intact regulation of the β-arrestin2/Akt/GSK3β complex at basal conditions (Beaulieu et al., 2005), and undisturbed dopamine D2 receptor-mediated signaling in LAB mice.

Mitigation of hyperactivity by amphetamine in LAB mice coincided with an increase in GSK3β phosphorylation both in the mPFC and in the striatum (**Figures 4G,H**). This effect was line-specific only at the level of the mPFC, in which neither NAB nor HAB mice showed similar changes. This suggests that the inhibition of GSK3β activity in the mPFC contributes to the calming effects of amphetamine. In mice with normal locomotor activity, an increase in GSK3β phosphorylation in the striatum and the mPFC may be observed as early as 15 min after systemic amphetamine administration, whereas GSK3β phosphorylation is decreased in the striatum 60–90 min after amphetamine administration (Svenningsson et al., 2003; Beaulieu et al., 2005). However, an Akt-independent increase in GSK3β phosphorylation in the mPFC was shown after recurrent amphetamine administration in a model of amphetamine-induced psychomotor sensitization. This phenomenon may involve activation of a short-cut GSK3β–β-arrestin2 feedback (Mines and Jope, 2012).

The short-term mitigation of hyperactivity resulting from local microinjections of amphetamine and TDZD-8 in the mPFC, but not in the striatum, of LAB mice (**Figure 5**) supports the role of the mPFC in the observed pharmacological phenomenon. To prevent the drugs from spreading across structures, we applied low doses, which probably resulted in a rapid and short-lasting drug effect. The transient effect of microinjections compared to the results of systemic treatment can also be explained by the difference in the pharmacokinetics of drugs after local and i.p. administration.

Currently, we can only speculate about how the selective decrease in GSK3β activity in the mPFC is translated into behavior. Since the majority of mPFC neurons (70–75%) are glutamatergic, it is tempting to assume that the prominent molecular changes are mediated by alterations in the projecting glutamatergic neurons of the mPFC. An alternative scenario takes into account the differences in pharmacodynamic aspects of amphetamine and methylphenidate action (Calipari et al., 2015). In contrast to methylphenidate, amphetamine targets the DAT and other monoamine transporters and pumps the neurotransmitters out of the terminal, but does not block their uptake. Amphetamine modulates the phasic release of dopamine and inhibits vesicular monoamine transporter type 2 and monoamine oxidase. Its metabolites may also contribute to the profile of amphetamine actions (Sulzer, 2011). In addition, amphetamine changes the performance of amino acid transporters (Del Arco et al., 1999), which results in an increase in the extracellular glutamate levels (Anderzhanova et al., 2001). As has been recently reported, LAB and HAB mice differ markedly in the expression of proteins involved in the regulation of glutamatergic and GABAergic neurotransmission at the level of the cingulate cortices (Filiou et al., 2011; Iris et al., 2014). Moreover, a lower plasma level of glutamate was found in LAB mice (Zhang et al., 2011). Together, these observations support the idea of a possible imbalance of excitatory and inhibitory neurotransmission in LAB mice in general. Interestingly, a disturbance in glutamatergic neurotransmission was proposed as a mechanism mediating hyperactivity in DAT-KO mice (Gainetdinov et al., 2001).

Given that amphetamine may cause glutamate release, our observations provide a mechanistic explanation for the specific amphetamine action in LAB mice. The relatively small increase in locomotor activity in LAB mice after systemic MK-801 administration in comparison to HAB mice (**Figure 6A**) and the lack of its facilitating effect on amphetamine-evoked dopamine release in the mPFC (**Figure 6B**) point to innate changes in NMDA receptor-mediated activity in these animals (Duncan et al., 2002). The interpretation of our behavioral data on MK-801 activity may be limited by the difference in basal locomotor activity between LAB and HAB mice. In fact, the normalization algorithm we applied to compare the MK-801 effect between lines may potentially lead to overestimation of the drug effect. Thus, the original data show that LAB mice develop higher absolute locomotor activity after MK-801 administration than HAB mice. Nonetheless, a summation of the basal activity and the MK-801-induced effect may be achieved in LAB mice due to the same function (constitutional or antagonist-induced decrease in NMDA receptor activity). Our consideration that the basal hyperlocomotion and the diminished relative effect of MK-801 in LAB mice have the same nature is supported by our neurochemical and molecular data. A possible hypofunctionality of NMDA receptors in GABA-ergic cortical neurons (Corlett et al., 2011) may underlie psychotic traits of the endophenotype representing an ADHD-mania-schizophrenia continuum (Yen et al., 2013). Our findings of differentially timed MK-801 and amphetamine treatment (**Figures 7A–C**) and lack of TDZD-8 and amphetamine interaction (**Figure 7G**) suggest that amphetamine directs its action at GSK3β in the mPFC via a pathway that depends on upstream NMDA receptor-mediated quasi-metabotropic signaling (**Figure 7H**). This NMDA receptor-mediated GSK3β activity regulation may be independent of pathways forcing Akt phosphorylation at the Ser308. Preliminary data show that the Thr473 phosphorylation site is rather engaged, since we have observed a line x structure-dependent correlation between changes in the levels of phospho-Thr473-Akt and phospho-Ser9-GSK3β under amphetamine treatment.

In conclusion, neither the hyperactivity in LAB mice nor the calming effect of amphetamine can be ascribed to changes in dopamine and norepinephrine neurotransmission in the striatum and the mPFC. Instead, amphetamine-triggered phosphorylation of GSK3β in the mPFC, but not the striatum, seems to participate in amphetamine-induced mitigation of hyperactivity in LAB mice. This calming action of amphetamine involves a functional interaction with NMDA receptors upstream of GSK3β. From a translational perspective, our data suggest GSK3β as a target for pharmacotherapy of disorders from the ADHD-mania-schizophrenia continuum.

# Author Contribution

Y-CY, AZ, NCG, EA acquired data; NCG, TR, EA, CTW designed the work; EA, RL, CTW conceived the work and played an important role in interpreting the results; EA drafted the manuscript; EA and CTW contributed equally to the study.

## Acknowledgments

The authors thank Ms. Anja Mederer and Mr. Markus Nußbaumer for invaluable technical assistance.




**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. This work was supported by the Max Planck Society.

Copyright © 2015 Yen, Gassen, Zellner, Rein, Landgraf, Wotjak and Anderzhanova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

#### *Edited by:*

*Gregory B. Bissonette, University of Maryland, USA*

#### *Reviewed by:*

*Gerry Leisman, O.R.T. Braude College of Engineering, Israel*

*Rossella Canese, Istituto Superiore di Sanità, Italy*

#### *\*Correspondence:*

 *Ludger Tebartz van Elst, Section for Experimental Neuropsychiatry, Department for Psychiatry and Psychotherapy, University Medical Center Freiburg, Hauptstr. 5, Freiburg 79104, Germany tebartzvanelst@uniklinik-freiburg.de*

*† Dominique Endres and Evgeniy Perlov have contributed equally to this work.*

*Received: 02 June 2015 Accepted: 24 August 2015 Published: 28 September 2015*

#### *Citation:*

*Endres D, Perlov E, Maier S, Feige B, Nickel K, Goll P, Bubl E, Lange T, Glauche V, Graf E, Ebert D, Sobanski E, Philipsen A and Tebartz van Elst L (2015) Normal neurochemistry in the prefrontal and cerebellar brain of adults with attention-deficit hyperactivity disorder. Front. Behav. Neurosci. 9:242. doi: 10.3389/fnbeh.2015.00242*

*Dominique Endres<sup>1†</sup>, Evgeniy Perlov<sup>1†</sup>, Simon Maier<sup>1</sup>, Bernd Feige<sup>1</sup>, Kathrin Nickel<sup>1</sup>, Peter Goll<sup>1</sup>, Emanuel Bubl<sup>1</sup>, Thomas Lange<sup>2,3</sup>, Volkmar Glauche<sup>4</sup>, Erika Graf<sup>5</sup>, Dieter Ebert<sup>1</sup>, Esther Sobanski<sup>6</sup>, Alexandra Philipsen<sup>1</sup> and Ludger Tebartz van Elst<sup>1</sup>\**

*<sup>1</sup>Section for Experimental Neuropsychiatry, Department for Psychiatry and Psychotherapy, University Medical Center Freiburg, Freiburg, Germany, <sup>2</sup>Department of Radiology, Medical Physics, University Medical Center Freiburg, Freiburg, Germany, <sup>3</sup>Freiburg Institute for Advanced Studies, Albert-Ludwigs-University, Freiburg, Germany, <sup>4</sup>Department of Neurology, University Medical Center Freiburg, Freiburg, Germany, <sup>5</sup>Clinical Trials Unit, University Medical Center Freiburg, Freiburg, Germany, <sup>6</sup>Clinic for Psychiatry and Psychotherapy, Central Institute for Mental Health Mannheim, Mannheim, Germany*

Attention-deficit hyperactivity disorder (ADHD) is a common neurodevelopmental disorder. In an attempt to extend earlier neurochemical findings, we organized a magnetic resonance spectroscopy (MRS) study as part of a large, government-funded, prospective, randomized, multicenter clinical trial comparing the effectiveness of specific psychotherapy with counseling and stimulant treatment with placebo treatment (Comparison of Methylphenidate and Psychotherapy Study). We report the baseline neurochemical data for the anterior cingulate cortex (ACC) and the cerebellum in a case–control setting. For the trial, 1,480 adult patients were contacted for participation, 518 were assessed for eligibility, 433 were randomized, and 187 were potentially eligible for neuroimaging. The control group included 119 healthy volunteers. Single-voxel proton MRS was performed. In the patient group, 113 ACC and 104 cerebellar spectra fulfilled all quality criteria for inclusion in statistical calculations, as did 82 ACC and 78 cerebellar spectra in the control group. We did not find any significant neurometabolic differences between the ADHD and control group in the ACC (Wilks' lambda test: *p* = 0.97) or in the cerebellum (*p* = 0.62). Thus, we were unable to replicate earlier findings in this methodologically sophisticated study. We discuss our findings in the context of a comprehensive review of other MRS studies on ADHD and a somewhat skeptical neuropsychiatric research perspective. As in other neuropsychiatric disorders, the unclear nosological status of ADHD might be an explanation for false-negative findings.

Keywords: attention-deficit hyperactivity disorder, MRS, glutamate, anterior cingulate cortex, cerebellum, nosology

# Introduction

Attention-deficit hyperactivity disorder (ADHD) is a common and often debilitating disorder that has received increasing public and scientific attention. In adulthood, the prevalence rates are estimated at 2–4% (Biederman, 2005; Philipsen et al., 2008). According to the criteria outlined in the *Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition* (DSM-5), there are three presentations of ADHD: the inattentive subtype (iADHD), the hyperactive-impulsive subtype (hADHD), and the combined subtype (cADHD).<sup>1</sup> The dopaminergic system seems to play a central role in the pathophysiology of ADHD (Philipsen et al., 2008). Treatment options for adult ADHD include pharmacological interventions (Castells et al., 2011) and psychotherapy (Philipsen, 2012). However, because evidence of the efficacy of these methods in adults is sparse, the treatment options require further investigation (Volkow and Swanson, 2013).

#### The COMPAS Study

In order to investigate treatment options for adult ADHD, the Comparison of Methylphenidate and Psychotherapy Study (hereafter the COMPAS study), a prospective, double-blind, placebo-controlled, multicenter clinical trial, was funded by the German Federal Ministry of Education and Research (BMBF: ADHD-NET: 01GV0605, 01GV0606) between 2006 and 2013. Using a factorial four-arm design, the COMPAS study compared the efficacy of cognitive behavioral group psychotherapy to that of clinical management in combination with methylphenidate (MPH) or placebo (Philipsen, 2008). As an integral part of the COMPAS study in the present magnetic resonance spectroscopy (MRS) study, we aimed to assess the neurochemical neuronal health of enrolled patients and to obtain evidence of putative dopaminergic dysfunction in measuring glutamate signals without exposure to radiation.

#### Magnetic Resonance Spectroscopy

Magnetic resonance spectroscopy is a unique, non-invasive method of measuring different metabolites in the human brain. Single-voxel spectroscopy (SVS) is the most established method and allows an absolute quantification of the following neurometabolites: glutamate (Glu) and glutamine (Gln) and the combined Glu and Gln signal (Glx); phosphocholine (PCh) and glycerophosphorylcholine (GPC) and the combined total choline signal (t-Cho); and *N*-acetylaspartate (NAA), creatine (Cre), and myo-Inositol (mI). Glu is the major excitatory neurotransmitter in the human brain. Gln is the precursor and storage form of Glu in astrocytes. The activity of the glutamate system is closely interwoven with dopaminergic neurotransmission. NAA is regarded as a marker of overall neuronal and axonal integrity. t-Cho indicates cell-membrane turnover. Cre is a marker of brain energy metabolism. Lastly, mI is a glial marker as well as a part of the phosphatidylinositol second messenger system (Ross and Bluml, 2001). Since MRS is capable of quantifying all of these neurometabolites under specific conditions, it could provide a comprehensive assessment of the neurochemical health of the brain (Perlov et al., 2009).

#### Previous MRS Findings Regarding ADHD

We performed a comprehensive review of the MRS literature on ADHD on the basis of a PubMed search using the search terms "ADHD" and "spectroscopy." We identified 416 hits (on May 30, 2015) that were individually screened for content. Excluding methodological papers, reviews, and case reports, we identified 32 MRS studies on ADHD. All of these studies are presented in **Table 1**, and we discuss this literature in detail in the discussion section of the paper. During the finalization of our study protocol (2006), there was first evidence – albeit limited – of altered fronto-striatal NAA signals (Hesslinger et al., 2001; Jin et al., 2001) and increased glutamate signals (MacMaster et al., 2003; Courvoisie et al., 2004), which were reduced under medication with stimulants (Carrey et al., 2002, 2003).

#### Rationale of Our Study

Based on the evidence available during the finalization of the study protocol, we hypothesized that we would find altered fronto-striatal Glx and NAA signals in adult patients with ADHD as indirect evidence of dopaminergic dysfunction. Following the initial measurements of the striatum, which had very poor data quality, we abstained from our initial intent to measure the striatum and instead focused on the anterior cingulate cortex (ACC) and cerebellum. We adopted this approach because we already had data – which at that time was unpublished – pointing to glutamatergic abnormalities in these volumes of interest (VOIs) with decreased Glu/Cre ratios in the right pregenual ACC and increased Glu/Cre ratios in the left cerebellar hemisphere (Perlov et al., 2007, 2009, 2010).

### Materials and Methods

As an integral part of the multicenter COMPAS study, the imaging study received approval from the leading ethics committee (Faculty of Medicine, Freiburg University, 217/06) and authorization from the relevant German authorities (EudraCT No.: 2006-000222-31). Prior to beginning the study, the trial was registered with Current Controlled Trials.<sup>2</sup> The methods, including the study design, endpoints, patient enrollment, and characteristics of the clinical study sample, have been described in detail previously (Philipsen et al., 2010, 2014). The study protocol is published in German on the Internet (Philipsen, 2008). All participants gave written consent for their participation in the imaging project.

#### Patient Assessment

The patient assessment took place between January 2007 and August 2010. The diagnostic procedure has been described in detail in two previous papers (Philipsen et al., 2010, 2014). All patients were stimulant-free for at least 6 months prior to scanning. **Table 2** provides an overview of the screening process and the details of the MRS substudy's selection procedure. Inclusion and exclusion criteria for the ADHD group are listed in Supplemental Table 1.

<sup>1</sup>www.dsm5.org

<sup>2</sup>www.isrctn.com/ISRCTN54096201

#### TABLE 1 | Summary of previous MRS studies on ADHD.



*Cre, creatine; t-Cho, phosphorylcholine* + *glycerylphosphorylcholine; Glu, glutamate; Glx, glutamate* + *glutamine; NAA, N-acetylaspartate; mI, myo-Inositol; le, left; ri, right;* ↑*, increase in ADHD;* ↓*, decrease in ADHD;* ↔*, no metabolite differences between groups; MPH, methylphenidate; AM, atomoxetine; iADHD, inattentive ADHD subtype; cADHD, combined ADHD subtype; hADHD, hyperactive ADHD subtype; PRESS, point-resolved spectroscopy; STEAM, stimulated echo acquisition method; T, Tesla; SVS, single-voxel spectroscopy; CSI, chemical shift imaging; 31-P-MRS, phosphorus magnetic resonance spectroscopy; 1H-MRS, proton magnetic resonance spectroscopy; DLPFC, dorsolateral prefrontal cortex; ACC, anterior cingulate cortex; PCC, posterior cingulate cortex; MCC, medial cingulate cortex; free-PME, freely mobile membrane phospholipid precursors; free-PDE, freely mobile membrane phospholipid breakdown products.*

*Studies measuring the anterior cingulate cortex or cerebellar regions are in bold type.*

*<sup>a</sup>MPH or amphetamine.*

*<sup>b</sup>MPH or dextro-amphetamine.*

#### TABLE 2 | ADHD and healthy control collective and reasons for exclusion.


ADHD group: 113 high-quality ACC and 104 high-quality cerebellar spectra for statistical analysis

Control group: 82 high-quality ACC and 78 high-quality cerebellar spectra for statistical analysis

*Post hoc* information about exclusion criteria: 9 9 6 6

*ACC, anterior cingulate cortex. <sup>a</sup>Failure to show up on the measurement dates.*

The MRS sample was restricted to two study centers, one at Freiburg and one at Mannheim, in order to avoid scanner variance by performing measurements with only one MRI scanner at the Freiburg center. For this study, 113 ACC spectra and 104 cerebellar spectra fulfilled all the subsequent quality criteria for inclusion in statistical calculations.

#### Assessment of Control Subjects

The healthy control group was recruited via public announcements. We took great care to exclude relevant neuropsychiatric conditions and to provide psychometric assessments of our control subjects. The assessment instruments and exclusion criteria are illustrated in Supplemental Table 2. We were able to keep 82 ACC and 78 cerebellum data sets that fulfilled all quality criteria.

#### Matching Procedure

We attempted to select controls such that the distributions of demographic variables in controls would resemble those in patients. However, because many data sets had to be excluded after the measurements for data quality reasons, the final control sample was not fully matched. We considered but ultimately decided against a secondary matching procedure, in which we could have selected a smaller control sample from the pool of available control data that would have been matched with respect to all demographic data. Instead, we decided to include all the control data that fulfilled the methodological quality criteria, and we accounted for statistically significant differences in demographic variables by including the respective factors as covariates in the linear model to compare patients with controls.

#### Data Acquisition and Analysis

Magnetic resonance spectroscopy measurement and analysis were performed following a method established in previous studies (Tebartz van Elst et al., 2014a,b). All measurements were obtained at the University of Freiburg on a 3-Tesla whole-body scanner (Siemens Magnetom Trio, a TIM system; Erlangen, Germany) using a 12-channel head-coil for signal reception. First, a T1-weighted 3D data set was recorded using a magnetization-prepared rapid-acquisition gradient echo (MPRAGE) sequence with the following parameters: field of view (FOV) = 256 mm × 256 mm, repetition time (TR) = 2200 ms, echo time (TE) = 4.11 ms, flip angle = 12°, and voxel size = 1 mm × 1 mm × 1 mm. For spectroscopic measurements, voxels were placed in the pregenual ACC (16 mm × 25 mm × 20 mm) and in the center of the left cerebellar hemisphere (20 mm × 20 mm × 20 mm). A point-resolved spectroscopy (PRESS) sequence with a TR of 3000 ms and a TE of 30 ms was used. For the absolute quantification of metabolites, we also acquired a non-water-suppressed reference spectrum.
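As a quick check of the acquisition geometry, the two spectroscopy voxels can be compared by volume. The sketch below is only an illustration using the dimensions stated above; the dictionary and function names are hypothetical and not part of the study's pipeline.

```python
# Voxel dimensions in millimetres, taken from the acquisition description above.
VOXELS_MM = {
    "pregenual ACC": (16, 25, 20),
    "left cerebellar hemisphere": (20, 20, 20),
}


def voxel_volume_ml(dims_mm):
    """Return the volume of a cuboid voxel in millilitres (1 mL = 1000 mm^3)."""
    x, y, z = dims_mm
    return x * y * z / 1000.0


# Both volumes of interest come out at the same nominal size.
volumes = {name: voxel_volume_ml(dims) for name, dims in VOXELS_MM.items()}
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} mL")
```

Both voxels enclose 8 mL, so differences in signal-to-noise ratio between the two regions are not attributable to nominal voxel size.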

Cramér-Rao lower bounds; SPM8, Statistical Parametric Mapping – Version 8; GM, gray matter; WM, white matter; CSF, cerebrospinal fluid.

For spectroscopic analysis, the well-established LCModel (linear combination of model spectra) algorithm was used. The absolute quantification of metabolites (Cre, t-Cho, Glx, NAA, and mI) was estimated using an internal water signal reference (Provencher, 1993, 2001; Helms, 2008; Tebartz van Elst et al., 2014a,b). For further analyses, only spectra with Cramér-Rao lower bounds (CRLBs) for the main metabolites <20% were included (Provencher, 1993, 2001).<sup>3</sup> To estimate the water content in the VOI, the MPRAGE was segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), using the unified-segmentation approach according to Ashburner and Friston (2005) and as implemented in Statistical Parametric Mapping, Version 8 (SPM8). For each spectroscopy voxel, the partial volumes of GM, WM, and CSF were computed from this segmentation. The metabolic concentrations of each VOI were corrected for the contents of GM, WM, and CSF. **Figure 1** summarizes all the details.
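The CRLB inclusion rule described above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: it assumes that the "main metabolites" are the five quantified ones (Cre, t-Cho, Glx, NAA, mI), and the spectrum identifiers and CRLB values are invented for the example.

```python
# Assumption: the "main metabolites" are the five quantified in this study.
MAIN_METABOLITES = ("Cre", "t-Cho", "Glx", "NAA", "mI")
CRLB_THRESHOLD_PERCENT = 20.0


def passes_crlb(crlb_percent, threshold=CRLB_THRESHOLD_PERCENT):
    """True if every main metabolite has a CRLB strictly below the threshold."""
    return all(crlb_percent[m] < threshold for m in MAIN_METABOLITES)


# Hypothetical CRLB values (in %) for two spectra; not data from the study.
spectra = {
    "spectrum_A": {"Cre": 3, "t-Cho": 4, "Glx": 8, "NAA": 2, "mI": 6},
    "spectrum_B": {"Cre": 5, "t-Cho": 6, "Glx": 24, "NAA": 3, "mI": 9},
}

# Keep only spectra in which all main metabolites meet the quality criterion.
included = [sid for sid, crlb in spectra.items() if passes_crlb(crlb)]
print(included)
```

In this toy example the second spectrum is dropped because its Glx estimate exceeds the 20% bound, even though the other metabolites are well fitted.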

### Statistical Analysis

Group comparisons for continuous variables (age, IQ, nicotine consumption, and psychometric scores) were performed with two-sided independent-sample *t*-tests. Group comparisons for gender were calculated using Pearson's two-sided chi-squared test. For primary analysis, the neurometabolite concentrations of the patient and control groups were compared using a multivariate analysis of covariance and employing a general linear model (MANCOVA). The concentrations of neurometabolites were chosen as dependent variables. Of the possible influencing factors considered – age (Kaiser et al., 2005), gender, IQ (Jung et al., 1999), and nicotine consumption (Domino, 2008) – all except gender differed significantly between the ADHD and control groups (*p* < 0.05). Therefore, these values were included as covariates. To test for an overall group effect across all five neurometabolites, a multivariate Wilks' lambda test was performed. Group differences per single neurometabolite concentration were tested and estimated with the confidence interval from the linear MANCOVA model, adjusting for the imbalanced influencing factors. Next, the ADHD subtypes were compared with the control groups using the same MANCOVA approach and employing a multivariate Wilks' lambda test. Correlation analyses were performed using the Spearman correlation coefficient. For overall and single-group differences in neurometabolite concentrations, the level of significance was corrected for multiple tests using the Bonferroni approach (*p* < 0.025, because we measured two regions; 97.5% confidence intervals). For all subgroup and correlation analyses, we did not correct for multiple comparisons (*p* < 0.05 as the criterion of significance).
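The Bonferroni adjustment described above reduces to simple arithmetic, sketched here under the stated assumption of a nominal significance level of 0.05 and two measured regions. The function name is hypothetical; the two p-values are the overall Wilks' lambda results reported in the abstract.

```python
def bonferroni(alpha, n_tests):
    """Return the per-test significance level and the matching confidence level."""
    per_test_alpha = alpha / n_tests
    confidence_level = 1.0 - per_test_alpha
    return per_test_alpha, confidence_level


# Two regions (ACC and cerebellum) were measured, so the nominal 0.05 is halved.
per_test_alpha, confidence_level = bonferroni(alpha=0.05, n_tests=2)
print(f"per-test alpha = {per_test_alpha:.3f}")
print(f"confidence level = {confidence_level:.1%}")

# Overall Wilks' lambda p-values as reported in the abstract.
overall_p = {"ACC": 0.97, "cerebellum": 0.62}
significant = {region: p < per_test_alpha for region, p in overall_p.items()}
print(significant)
```

This reproduces the corrected threshold of *p* < 0.025 and the 97.5% confidence intervals used for the overall and single-metabolite comparisons.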

<sup>3</sup>www.s-provencher.com/pages/lcmodel.shtml

# Results

#### Demographic and Psychometric Data

**Table 3** summarizes the demographic and psychometric data for the ADHD and control groups following the exclusion of all compromised data sets. In both the ACC and the cerebellum samples, the patient and control groups significantly differed in age, IQ, and nicotine consumption, but not in gender. As expected, the psychometric scores for ADHD symptoms and depressiveness were significantly higher in the patient group.

#### MRS Results

**Table 4** summarizes the spectroscopic results. Scatterplots of all individual metabolite concentrations, with means and 95% confidence intervals, are presented in **Figure 2**. Subgroup analyses are presented in **Table 5** and correlation analyses are presented in Supplemental Table 3.

#### Anterior Cingulate Cortex

A Wilks' lambda test found no significant overall between-group differences for the five neurometabolite concentrations in the ACC (*p* = 0.97). Furthermore, none of the metabolite concentrations were found to differ significantly between patients and controls. The inattentive and combined ADHD subtypes did not differ significantly from the control group. A significant negative correlation between t-Cho and subclinical depression (measured via the Beck Depression Inventory score) was found (*p* = 0.0006).

#### Cerebellum

No overall metabolite differences between groups were found in the cerebellar VOI either (Wilks' lambda: *p* = 0.62). Again, no differences in any of the five single metabolites were detected. Moreover, the subgroup analyses did not reveal any significant differences. The correlation analysis showed a weak negative correlation between the mI signal and nicotine consumption (*p* = 0.04).

# Discussion

The main finding of this large study is the absence of any significant differences between patients and controls. We could not replicate earlier findings and could not confirm our working hypothesis regarding altered NAA and glutamate signals. In fact, we did not find evidence of any neurochemical abnormality.

#### False-Negative Finding?

The issue of a potentially false-negative finding can be addressed by looking at the confidence intervals. **Table 4** illustrates that all lower and upper bounds of the 97.5% confidence intervals for the estimated mean differences for all metabolites are no more than 0.6 standard deviations below or above 0. We can thus exclude even moderate differences between patients and controls corresponding to effect sizes of 0.6 and greater. This observation supports the notion that we have produced true negative results.
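The confidence-interval argument above can be made concrete with a short sketch. It expresses the bounds of a confidence interval for a mean difference in standard-deviation units and checks whether the whole interval lies within ±0.6 SD of zero. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not values from Table 4.

```python
def standardized_ci(ci_lower, ci_upper, sd):
    """Express confidence-interval bounds for a mean difference in SD units."""
    return ci_lower / sd, ci_upper / sd


def excludes_effect(ci_lower, ci_upper, sd, effect_size=0.6):
    """True if the whole CI lies within +/- effect_size standard deviations of 0."""
    lower, upper = standardized_ci(ci_lower, ci_upper, sd)
    return max(abs(lower), abs(upper)) <= effect_size


# Hypothetical metabolite: CI of [-0.40, 0.35] with SD 0.80 spans -0.50 to 0.44 SD.
within_bounds = excludes_effect(ci_lower=-0.40, ci_upper=0.35, sd=0.80)

# Hypothetical counter-example: a lower bound of -0.60 with SD 0.80 reaches -0.75 SD.
outside_bounds = excludes_effect(ci_lower=-0.60, ci_upper=0.20, sd=0.80)

print(within_bounds, outside_bounds)
```

When the check holds for every metabolite, group differences larger than the chosen effect size are incompatible with the data at the given confidence level, which is the reasoning used to argue for a true negative result.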

#### Comparison to Previous Studies

How does this compare to previous research? We performed a comprehensive review of the MRS literature on ADHD (see **Table 1**). The respective results illustrate that our negative finding is in line with the majority of previous reports. While several earlier studies – including our own (Perlov et al., 2007, 2009, 2010) – did find significant differences in Glx signals and other neurometabolites, no clear pattern of signal change emerged across different studies.

What might be the reasons for these discrepant findings? **Table 1** and Supplemental Table 4 illustrate that the sample sizes of several previous studies were rather small. Furthermore, differences in findings might be due partially to differences in scanning techniques, acquisition protocols, analytical algorithms, and sample-selection criteria. In previous studies, patient samples differed in terms of age, subtype, medication, and comorbidity.


*SD, standard deviation; M, male; F, female; CAARS, Conners Adult ADHD Rating Scales – self report: long version; iADHD, inattentive subtype; hADHD, hyperactive-impulsive subtype; cADHD, combined subtype.*

*a p-value: to test for differences between groups.*

*b Measured by the Multiple-Choice Vocabulary Intelligence Test (Lehrl et al., 1995).*

*c The hyperactive subtype was added to the combined subtype for all subtype analyses because of the small group size.*


TABLE 4 | Spectroscopic findings in the pregenual ACC and the cerebellum (IU).

*SD, standard deviation; CI, confidence interval; Cre, creatine; t-Cho, phosphorylcholine* + *glycerylphosphorylcholine; Glx, glutamate* + *glutamine; NAA, N-acetylaspartate; mI, myo-Inositol. a MANCOVA, multivariate analysis of covariance using the covariates age, IQ, and nicotine consumption.*

Concerning age, studies on children and adults have to be distinguished because children have developing (i.e., dynamically changing) brains, whereas adult brains' fundamental neurochemical and structural properties are not considered to change in a developmental manner. In the present study, we investigated only adult patients; therefore, developmental considerations should be negligible. Furthermore, the number of medicated patients differed in previous studies. In addition to being linked to short-term neurometabolic changes in some studies (Carrey et al., 2002; MacMaster et al., 2003; Wiguna et al., 2012; Husarova et al., 2014), MPH exposure may also cause long-term changes (Nakao et al., 2011; Frodl and Skokauskas, 2012). In our study, patients had gone at least 6 months without ADHD-specific medication. Therefore, it is unlikely that the normalizing effects of ADHD medication could explain our negative findings. The influence of comorbid diseases must also be taken into account because most psychiatric comorbidities might be related to changes in neurometabolism. In this study, relevant psychiatric comorbidities, such as present depressive disorder or other first-axis disorders, served as exclusion criteria. In summary, there is little evidence to suggest that age, medication, or the pathophysiology of comorbid neuropsychiatric disorders have contributed to our findings.

#### Sample Selection and MRS

This study was an integral part of the abovementioned COMPAS study. The sample selection followed strict standards (Philipsen et al., 2010, 2014). Only primary ADHD cases were included, and evidence of organic cerebral diseases led to exclusion. Therefore, our results cannot be generalized to patients with secondary forms of ADHD. We decided against secondary matching procedures and included all participants with good data quality in our statistical analyses. In doing so, we generated a large, carefully characterized, and fully transparent data set that minimized the risk of false-positive findings due, for example, to inadequate selection of controls. We methodically used the well-established and well-evaluated single-voxel proton spectroscopy to measure the absolute neurometabolite concentrations with an investigator-independent method (Tebartz van Elst et al., 2014a,b).

#### Neurochemical Perspective

Our data compel us to conclude that the brains of adult ADHD patients are essentially normal from a neurochemical perspective. Of course, we cannot generalize this finding to regions of the brain other than those that we measured. Furthermore, we cannot conclude that this proposition is true for children with ADHD. It might be the case that the reported abnormalities in children were true and that the respective neurochemical abnormalities linked to ADHD have normalized during maturation. However, this interpretation is not supported by our study because most of the clinical symptoms of ADHD were still present in our adult patient sample. Alternatively, in this scenario, one could conclude that the reported neurochemical abnormalities are not necessarily linked to the presence of the clinical ADHD symptoms. However, it is important to recognize that neurochemical processes measured by MRS are only a part of brain functioning. They are based on anatomical structures, which are organized in fronto-basal loops. The structural dysfunction of the fronto-striato-thalamo-frontal circuits plays a key role in the pathogenesis of ADHD (Perlov et al., 2009). These circuits have a common anatomical organization, beginning in the prefrontal cortex, innervating the striatum and the pallidum/substantia nigra, and ending in projections back in the frontal brain (Tekin and Cummings, 2002). For example, the ACC subcortical circuit plays an important role in motivation. The model of circuits illustrates that different neurochemical or anatomical lesions may lead to the same symptoms and that vice versa the same symptoms can be induced by different lesions (Tebartz van Elst and Perlov, 2013).

#### Neuropsychiatric Research Perspective

The alternative interpretation involves a greater degree of skepticism about the scientific process as a whole. As in many other areas of neuropsychiatric research, initially promising findings cannot be replicated in large and methodologically sound studies. We reported relevant neurochemical findings regarding adult ADHD in both VOIs measured here. As illustrated in **Table 1**, the sample sizes of our recent studies compared well to those of other studies, and the methodology of these studies was also sound. However, we were not able to replicate our earlier findings – or those of others – in this large study. What can we conclude from this observation? First of all, it somewhat questions the notion that true abnormalities reported in children and adolescents might have normalized during the process of growing up, because in the area of adult research, we ourselves had reported significant findings in studies with a sample size of about 30 patients and controls that we are now unable to replicate (Perlov et al., 2007, 2010).

Altogether, the essentially negative findings of this large study support a more prudent approach to interpreting data from small studies. Recently, a series of papers addressed this problem in the context of biomedical research (Al-Shahi Salman et al., 2014; Chalmers et al., 2014; Chan et al., 2014; Glasziou et al., 2014; Ioannidis et al., 2014; Macleod et al., 2014). Particularly in imaging research, numerous studies have small and sometimes ill-defined patient samples. The issue of replication is often neglected. Positive findings from different studies are often used as evidence supporting the findings of the study of interest. However, the implicit negative results of many other studies are neglected. For example, we ourselves summarized all the previous MRS studies on ADHD (**Table 1**) and focused on their positive findings. However, every study that does not repeat the findings of previous studies must be regarded as a nonreplication. The fact that most papers focus on positive findings blurs this view. Meta-analyses are one way to pool respective data. However, as in this case, they tend to generate negative findings, particularly if a cross-meta-analytical perspective is taken. Our first meta-analysis of MRS findings in ADHD showed an increase in only the t-Cho signal in the striatum and right frontal lobe of children and in the ACC of adult patients (Perlov et al., 2009). A recent meta-analysis described only an increased NAA signal in the medial prefrontal cortex of children and showed no abnormalities in the adult ADHD group (Aoki et al., 2013). From a cross-meta-analytical perspective, we have to conclude that no specific patterns of MRS abnormalities emerged upon observation of the results of both papers.

#### The Nosology Problem

Another possible reason for a false-negative finding has to be considered at this point. ADHD is generally understood as a disorder of heterogeneous etiologies, which means that it is quite possible that as a result of generating large patient samples, different etiological ADHD subgroups will be examined within one study group. For practical reasons, it might well be the case that in generating large study samples, the likelihood of diversifying the study group increases from an etiological point of view. For example, when organizing large multicenter study groups, it is more likely that different diagnosticians will be involved in the study process even if the study protocol is, like ours, very prudent. For this reason, large studies might be more vulnerable to diverse underlying etiologies than studies with smaller samples generated by one or a few diagnosticians.

TABLE 5 | Subgroup analyses of spectroscopic findings in the anterior cingulate cortex and cerebellum of the inattentive and combined subtypes and controls (IU).

*SD, standard deviation; CI, confidence interval; iADHD, inattentive ADHD subtype; cADHD, combined ADHD subtype; Cre, creatine; t-Cho, phosphorylcholine* + *glycerylphosphorylcholine; Glx, glutamate* + *glutamine; NAA, N-acetylaspartate; mI, myo-Inositol.*

*a MANCOVA, multivariate analysis of covariance using the covariates age, IQ, and nicotine consumption.*

*b The hyperactive subtype was added to the combined subtype for all subtype analyses because of the small group size.*

The issue of ADHD subforms and subtypes must also be considered. All possible secondary forms of ADHD were excluded from our study. Primary or secondary forms of ADHD and different ADHD subtypes might represent different pathophysiologies (this is comparable to autism spectrum disorder as a basic disorder) (Tebartz van Elst et al., 2013). We think that progress in neuropsychiatric research in general, and neurobiological ADHD research in particular, will be closely linked to the recognition of this nosology problem (Tebartz van Elst et al., 2006). If it is true that a purely clinically defined group of ADHD patients represents different etiologies and cerebral pathophysiologies, then the inclusion of different pathophysiologies within one study sample will necessarily lead to diverse and contradictory results in different samples (Tebartz van Elst et al., 2014a,b). The larger and more pathophysiologically heterogeneous a single study sample is, the more likely it is that true signal differences in subgroups of the sample will statistically counterbalance each other and result in normal average signals in calculations of the means of the overall group. Further studies and conceptual work will have to tackle this problem.
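The counterbalancing argument can be made concrete with a toy calculation. The sketch below uses invented signal values (not study data) and hypothetical variable names. It assumes two equally sized patient subgroups whose signals deviate from the control mean by the same amount in opposite directions.

```python
from statistics import mean, pstdev

# Invented metabolite signals for illustration only.
control = [10.0, 9.5, 10.5, 10.0]
subgroup_high = [11.0, 10.5, 11.5, 11.0]  # hypothetical subgroup with a raised signal
subgroup_low = [9.0, 8.5, 9.5, 9.0]       # hypothetical subgroup with a lowered signal

# A purely clinical definition pools both subgroups into one patient group.
patients = subgroup_high + subgroup_low

print("control mean:      ", mean(control))
print("high subgroup mean:", mean(subgroup_high))
print("low subgroup mean: ", mean(subgroup_low))
print("pooled patient mean:", mean(patients))
print("control SD:", pstdev(control), "pooled patient SD:", pstdev(patients))
```

In this constructed case each subgroup differs from the controls by one unit, yet the pooled patient mean equals the control mean. The only remaining trace of the subgroup structure is the larger spread in the pooled patient group.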

### Conclusion

To date, this is the largest MRS study examining cerebral neurochemistry in ADHD. We were able to demonstrate an essentially normal neurochemical profile of the ACC and the cerebellum in adult patients without current comorbid psychiatric disorders. We were unable to replicate earlier positive findings. Such previous positive findings might have been linked to small sample size, psychiatric comorbidity, or medication effects. However, the nosology problem of psychiatry (i.e., disorder categories comprise patient subgroups with different pathophysiologies) also has to be considered when interpreting the negative findings of large neuropsychiatric study samples.

# Funding

This study was funded by the German Federal Ministry of Science and Education (BMBF) (ADHD-NET: 01GV0605, 01GV0606).

# Acknowledgments

We thank all the members of the COMPAS study group at the study sites in Würzburg (Würzburg University Hospital, Department of Psychiatry, Psychosomatics, and Psychotherapy; Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy), Berlin (Charité – University Medicine, Campus Benjamin Franklin, Department of Psychiatry and Psychotherapy), Mannheim (Central Institute for Mental Health, Clinic for Psychiatry and Psychotherapy), Homburg (Saarland University Hospital and Saarland University Faculty of Medicine, Institute for Forensic Psychology and Psychiatry), Essen (LVR-Hospital Essen, Department for Psychiatry and Psychotherapy; University of Duisburg- Essen, Faculty of Medicine), Mainz (University Medicine Mainz, Clinic for Child and Adolescent Psychiatry and Psychotherapy), and Lörrach (St. Elisabethen Krankenhaus, Department of Child and Adolescent Psychiatry and Psychotherapy). Moreover, we thank the members of the Clinical Trials Unit at University Medical Center Freiburg (Director: R. Bredenkamp), as well as the members of the Independent Data Monitoring Committee (Prof. Dr. H. Remschmidt, Prof. Dr. G. Wassmer, PD Dr. N. Wodarz). Independent supervision was carried out at the Institute for Psychology at Freiburg University (Dr. U. Frank) in cooperation with colleagues in private practice (Dr. F. Mayer-Bruns and K. Schehr). The health economic evaluation was planned and conducted by Prof. Dr. M. Schlander, Institute for Innovation and Valuation in Health Care, INNOVAL HC, Wiesbaden, Germany.

# Supplementary Material

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fnbeh.2015.00242

# References


with proton magnetic resonance spectroscopy. *Acad. Radiol.* 14, 1029–1035. doi:10.1016/j.acra.2007.05.017


**Conflict of Interest Statement:** Peter Goll has received travel grants from GSK, Boston Scientific, and Otsuka Pharma. Esther Sobanski has received speakers' honoraria from Medice, Eli Lilly, and Novartis; she is a member of the advisory boards of Medice, Shire, and Eli Lilly; and has performed phase III studies and IITs with Medice, Novartis, Janssen Cilag, and Eli Lilly. Alexandra Philipsen served on advisory boards, gave lectures, performed phase III studies, and received travel grants within the last three years from Eli Lilly, Janssen-Cilag, Medice Arzneimittel Pütter GmbH, Novartis, and Shire. Alexandra Philipsen is also the author of several books and articles on psychotherapy published by Elsevier, Hogrefe, Schattauer, Kohlhammer, and Karger. Ludger Tebartz van Elst served on advisory boards, gave lectures, and received travel grants within the last 3 years from Eli Lilly, Janssen-Cilag, Novartis, Shire, UCB, GSK, Servier, Janssen, and Cyberonics. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

*Copyright © 2015 Endres, Perlov, Maier, Feige, Nickel, Goll, Bubl, Lange, Glauche, Graf, Ebert, Sobanski, Philipsen and Tebartz van Elst. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Association of oxytocin level and less severe forms of childhood maltreatment history among healthy Japanese adults involved with child care

#### Rie Mizuki and Takeo Fujiwara\*

*Department of Social Medicine, National Research Institute for Child Health and Development, Tokyo, Japan*

#### Edited by:

*Gregory B. Bissonette, University of Maryland, USA*

#### Reviewed by:

*Seth Davin Norrholm, Emory University School of Medicine, USA Gabriela Jose Martins, Champalimaud Neuroscience Programme, Portugal*

#### \*Correspondence:

*Takeo Fujiwara, Department of Social Medicine, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya-ku, Tokyo 157-8535, Japan fujiwara-tk@ncchd.go.jp*

> Received: *17 February 2015* Accepted: *12 May 2015* Published: *23 June 2015*

#### Citation:

*Mizuki R and Fujiwara T (2015) Association of oxytocin level and less severe forms of childhood maltreatment history among healthy Japanese adults involved with child care. Front. Behav. Neurosci. 9:138. doi: 10.3389/fnbeh.2015.00138*

Background: Oxytocin (OT) is known to play a role in stress regulation. The association between childhood maltreatment history and neuropeptide OT concentration is inconsistent due to the varying degrees of severity of childhood maltreatment, among other contributing factors. Less severe forms of childhood maltreatment history might enhance OT concentrations as a response to coping with social stress within the family. The purpose of this study is to investigate the association between less severe forms of childhood maltreatment history and OT concentrations among healthy adults.

Method: Eighty adults (49 women and 31 men) with 18- to 48-month-old children were recruited using a snowball sample in Tokyo, Japan. Urine samples were collected for OT measurement. Less severe (low and moderate) childhood maltreatment history, including physical abuse, physical neglect, emotional abuse, emotional neglect, and sexual abuse, was assessed using the self-report questionnaire, the Childhood Trauma Questionnaire.

Results: Less severe physical abuse was significantly associated with higher OT concentration after adjusting for age (*p* = 0.014). Also, less severe forms of physical abuse were independently significantly associated with higher OT concentration after controlling for other types of childhood maltreatment (*p* = 0.027). A positive dose-response association between the number of less severe childhood maltreatment types and OT concentration was observed (*p* = 0.031).

Conclusion: A history of less severe forms of childhood physical abuse was associated with higher OT concentration in healthy adults. Poly-victimization of several types of less severe childhood maltreatment was also associated with higher OT concentrations. Less severe forms of childhood maltreatment might enhance OT concentrations in order to cope with social stress.

Keywords: oxytocin, childhood abuse history, child abuse, child maltreatment, social stress

# Introduction

Oxytocin (OT) is a neuropeptide that plays an important role not only in social bonding but also in the regulation of stress and anxiety (Carter et al., 1992; Kormos and Gaszner, 2013; Peters et al., 2014). Stress reaction is usually formed at an early age of development in humans and animals. Previous studies have reported that rats that were licked and groomed by a parent reacted better to stress and showed an epigenetic change in the hypothalamic-pituitary-adrenal (HPA) axis during infancy (Liu et al., 1997; Meaney, 2001). Further, rats that were licked and groomed by parents have also shown increased OT levels (Francis et al., 1999, 2000; Champagne et al., 2001). Similar results were also reported in humans: the salivary OT concentrations of infants whose parents practiced responsive care, represented by high affect parent-infant synchrony, were higher than those of infants raised by parents who showed low affect synchrony (Feldman et al., 2010).

In the same context, childhood maltreatment, which can be considered as "the most visible and obvious indicator of dysfunctional parenting" (Holden, 2010), is associated with lower OT concentration in victimized children. Heim et al. (2009) reported that adult women with severe childhood maltreatment showed significantly lower OT concentrations in cerebrospinal fluid compared with women without a history of childhood maltreatment. Also, children who had been raised in neglectful institutional care in the first few years of life due to loss of parents had lower urinary OT concentrations (Wismer Fries et al., 2005). Furthermore, physically healthy adult men who experienced adverse life experiences from early childhood up to 13 years of age also showed lower plasma OT concentrations (Opacka-Juffry and Mohiyeddini, 2012). These findings can be interpreted as evidence that lower OT concentration might be associated with the child withdrawing from the stressor, that is, the caregiver who severely maltreats the offspring, because lower OT concentration can lead to a decrease in social behavior. However, strategies to deal with such stress for less severe forms of child maltreatment might be different, because of the role OT plays in the "tend-and-befriend" behavior of dealing with stress (Taylor et al., 2000). For example, Seltzer et al. (2014) reported that urinary OT concentrations after social stress exposure were higher among maltreated girls than among controls, suggesting that OT concentrations in maltreated girls exposed to social stress were enhanced to deal with this stress through more frequent social behaviors. Thus, we hypothesized that children who experienced less severe forms of maltreatment exhibit "tend-and-befriend" behaviors to deal with stress, including trying to get along with their caregiver or seeking help from others, with enhanced OT concentration serving as one potential underlying mechanism.

Thus, the purpose of this study is to test the hypothesis that low and moderate childhood maltreatment history increases OT levels among healthy adults.

# Materials and Methods

#### Participants

The study was approved by the Ethics Committee of the National Institute for Public Health, and all participants signed informed consent forms prior to enrollment in the study. Details about participant recruitment and eligibility criteria have already been reported elsewhere (Fujiwara et al., 2012). In short, 81 participants (49 women and 31 men who were spouses of female participants) were recruited for the study in Tokyo, Japan using a convenience snowball sample. Snowball sampling is regarded as a suitable method to use in studies that, like this one, focus on a sensitive issue (Biernacki and Waldorf, 1981). The eligibility criteria restricted the sample to women and men with 18–48-month-old children. All women were married, were not breastfeeding at the time of the study, and were the child's main caregiver.

#### Procedure and Oxytocin Analysis

Procedures and oxytocin analysis methodology have previously been described in detail elsewhere (Fujiwara et al., 2012). Research coordinators visited participants' homes for approximately 1 h between 11 a.m. and 2 p.m. and sent questionnaires to participants in advance of the visit, which were collected during the home visit. We collected a 1-mL urine sample in a tube to which a 40-µl aliquot of sodium citrate buffer (0.03 M sodium citrate, 25 mM EDTA, and 0.35 mM 1,10-phenanthroline) was added, and immediately stored the samples in a cooler box at 4°C for a maximum of 2 h and then at −20°C in the laboratory.

OT concentrations in the urine samples were measured by a competitive radioimmunoassay, as described elsewhere (Sudo et al., 1978). In brief, we created rabbit antiserum specific for human OT by immunizing a rabbit four times with recombinant human OT (ASKA Pharmaceutical Co., Ltd., Tokyo, Japan) combined with water-soluble carbodiimide (Nakarai Tesque, Tokyo, Japan). Then, we decomplemented the urine sample at 56°C for 30 min, and the supernatant was extracted after centrifugation (3000 rpm, 10 min, 4°C). We designated the decomplemented sample and the same amount of <sup>125</sup>I-labeled OT (Perkin Elmer Life Sciences, Inc., Boston, MA) for use in an assay tube (Shionogi, Tokyo, Japan). Then, we added rabbit anti-OT serum to each assay tube, followed by incubation for 2 days at 4°C. Next, we added goat anti-rabbit IgG serum (ASKA Pharmaceutical Co., Ltd., Tokyo, Japan) to each assay tube, followed by incubation for 1 day at 4°C. After centrifugation, we measured the radioactivity of the pellet with a gamma counter (Auto Well Gamma System ARC-1000M, Aloka, Tokyo, Japan). The minimal detection limit of this assay was 3 µU/ml (1 µU of OT is equivalent to 1.776 pg) according to the standard curve. We performed all assays in duplicate, and the assay's intra- and interassay coefficients of variability were <14.2%. We standardized the concentration of OT in the urine according to the urinary creatinine concentration. We measured urinary creatinine using the alkaline picrate colorimetric method (modified Jaffe).
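The unit handling described above can be sketched as follows. This is an illustrative sketch only: the function and constant names are hypothetical, the input values are invented, and two steps are assumptions not stated in the text, namely that duplicate readings are averaged and that values below the detection limit are returned as `None`. The conversion factor and detection limit are the ones given in the paragraph above, and the output unit (µU/ml per g/L creatinine) follows the unit used in the Results.

```python
PG_PER_MICRO_UNIT = 1.776        # 1 µU of OT = 1.776 pg, as stated in the text
DETECTION_LIMIT_UU_PER_ML = 3.0  # minimal detection limit stated in the text


def micro_units_to_pg(ot_uu_per_ml):
    """Convert an OT concentration from µU/ml to pg/ml."""
    return ot_uu_per_ml * PG_PER_MICRO_UNIT


def creatinine_adjusted_ot(duplicates_uu_per_ml, creatinine_g_per_l):
    """Return OT in µU/ml per g/L creatinine, or None below the detection limit.

    Assumptions for this sketch: duplicate readings are averaged, and a mean
    below the detection limit is reported as None.
    """
    ot = sum(duplicates_uu_per_ml) / len(duplicates_uu_per_ml)
    if ot < DETECTION_LIMIT_UU_PER_ML:
        return None
    return ot / creatinine_g_per_l


# Invented readings for illustration only.
print(creatinine_adjusted_ot([100.0, 110.0], 1.0))
print(micro_units_to_pg(DETECTION_LIMIT_UU_PER_ML))
```

Dividing by creatinine corrects for how dilute a given urine sample is, so that participants with different hydration levels can be compared.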

#### Childhood Maltreatment History

The Childhood Trauma Questionnaire (CTQ) is a 25-item self-report questionnaire that assesses five types of childhood maltreatment: physical neglect, emotional neglect, physical abuse, emotional abuse, and sexual abuse (Bernstein et al., 2003). The CTQ defined the severity of each maltreatment type as none, low, moderate, and severe by using corresponding cut-off scores. Low and moderate forms of childhood maltreatment used in the CTQ were defined as "less severe forms of maltreatment" in this study and were scored in a range between 8 and 12 for physical neglect, 10 and 17 for emotional neglect, 8 and 12 for physical abuse, 9 and 15 for emotional abuse, and 6 and 12 for sexual abuse, following the cut-off scores of the CTQ (Bernstein et al., 2003). Scores below these ranges were defined as no maltreatment, and scores above these ranges were defined as severe maltreatment (Bernstein et al., 2003).

One female participant reported a severe form of emotional neglect and was excluded from the analyses. Other severe forms of maltreatment were not reported. The sample was dichotomized into either having low and moderate childhood maltreatment history or no childhood maltreatment history.
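The classification rule described in this subsection can be written down directly. The sketch below encodes the score ranges quoted above; the dictionary keys, function names, and the example participant are hypothetical, and the code is an illustration rather than the authors' scoring script.

```python
# Score ranges (inclusive) that the text defines as "less severe" (low or moderate).
LESS_SEVERE_RANGE = {
    "physical_neglect": (8, 12),
    "emotional_neglect": (10, 17),
    "physical_abuse": (8, 12),
    "emotional_abuse": (9, 15),
    "sexual_abuse": (6, 12),
}


def classify(subscale, score):
    """Map a CTQ subscale score to 'none', 'less severe', or 'severe'."""
    low, high = LESS_SEVERE_RANGE[subscale]
    if score < low:
        return "none"
    if score > high:
        return "severe"
    return "less severe"


def count_less_severe_types(scores):
    """Number of maltreatment types in the less severe range for one participant."""
    return sum(classify(name, value) == "less severe" for name, value in scores.items())


# Hypothetical participant, for illustration only.
participant = {
    "physical_neglect": 9,
    "emotional_neglect": 8,
    "physical_abuse": 8,
    "emotional_abuse": 5,
    "sexual_abuse": 5,
}
print({name: classify(name, value) for name, value in participant.items()})
print(count_less_severe_types(participant))
```

A participant with any subscale classified as severe would be excluded, as described above. The count of less severe types is the quantity used later for the dose-response analysis.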

#### Covariates

Participants' age, sex, and mental health status were potential covariates. Mental health status was measured with the Depression Anxiety Stress Scales (DASS) (Lovibond and Lovibond, 1995a). The DASS is a 42-item self-report questionnaire consisting of three subscales for depression, anxiety, and stress. Participants answered the questionnaire using a 4-point Likert scale, and responses were summed to derive a total score for each subscale ranging from 0 to 42.

#### Statistical Analysis

First, correlation analyses among five different types of maltreatment were conducted. Second, t-tests were conducted to observe the impact of sex and age on OT. Third, bivariate regression analyses were performed to examine the association of OT with low and moderate childhood maltreatment history, in which Model 1 was adjusted for age, and Model 2 was adjusted for age and five different types of maltreatment (physical neglect, emotional neglect, physical abuse, emotional abuse, and sexual abuse). Fourth, regression analysis was conducted to assess the dose-response association between the number of types of maltreatment and OT concentration.
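As a rough illustration of the third and fourth steps, the sketch below fits an unadjusted simple linear regression by ordinary least squares. All data are synthetic and the helper `simple_ols` is hypothetical. The age-adjusted and mutually adjusted models (Models 1 and 2) require multiple regression and are not reproduced here, and no p-values are computed.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic values for illustration only (not study data).
exposed = [0, 0, 0, 0, 1, 1]  # 1 = history of one less severe maltreatment type
ot = [100.0, 110.0, 90.0, 100.0, 140.0, 150.0]

# With a 0/1 predictor, the slope is the difference in group means.
coef, intercept = simple_ols(exposed, ot)
print(coef, intercept)

# Dose-response: regress OT on the number of less severe maltreatment types.
n_types = [0, 0, 1, 1, 2, 3]
ot_by_count = [100.0, 100.0, 105.0, 115.0, 120.0, 140.0]
trend_slope, _ = simple_ols(n_types, ot_by_count)
print(trend_slope)
```

With a dichotomized exposure, the regression coefficient can be read as the mean difference in OT concentration between participants with and without that maltreatment history. A positive slope on the count of types corresponds to a positive dose-response trend.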

### Results

Sample characteristics are shown in **Table 1**. The mean age of participants was 36.2 years old with a standard deviation (SD) of 3.4. Over half of participants had only one child, more than 90% reported their health as good or better, and over 80% received a level of education consistent with some college or more. The mean scores of depression, anxiety, and stress scales measured by DASS were comparable to past studies with non-clinical samples (Lovibond and Lovibond, 1995b; Muto et al., 2011).

#### TABLE 1 | Sample characteristics (n = 80).

*CTQ, Childhood Trauma Questionnaire.*

In this study, 37.5% of participants experienced low childhood maltreatment and 20% experienced moderate childhood maltreatment, where both low and moderate were defined as less severe forms of childhood maltreatment. Of these participants, 30.0% reported a history of less severe forms of physical neglect, 40.0% reported less severe forms of emotional neglect, 6.3% reported less severe forms of physical abuse, 8.8% reported less severe forms of emotional abuse, and 5.0% reported less severe forms of sexual abuse. In terms of the number of less severe forms of childhood maltreatment types experienced, 16.3% reported two types of maltreatment, and 7.5% reported three or more types of maltreatment.

Pearson's correlation analysis between each type of less severe form of childhood maltreatment indicated that physical neglect was significantly positively correlated with emotional neglect (r = 0.29, p < 0.05) (**Table 2**). Further, emotional neglect was significantly positively correlated with emotional abuse (r = 0.32, p < 0.05), and physical abuse was significantly positively correlated with emotional abuse (r = 0.26, p < 0.05). Other types of childhood maltreatment were not significantly correlated. Despite statistical significance, the values of these correlation coefficients between each type of less severe form of childhood maltreatment indicated that the correlations were weak.

TABLE 2 | Correlation matrix for less severe forms of maltreatment types.


*Significant coefficients at p* < *0.05 are in bold.*

#### TABLE 3 | Oxytocin concentration by sex and age group.


The association between OT and demographics is presented in **Table 3**. Men had a slightly higher OT concentration (mean = 112.8; SD = 42.1) compared with women (mean = 107.5; SD = 38.2), although the difference was not statistically significant. The mean OT concentration of the younger group (less than 36 years old) was lower (mean = 99.6; SD = 36.6) than that of the older group (mean = 116.5; SD = 39.6) at a marginal trend level (p = 0.057). The association between mental health status and OT was found not to be significant in this sample. Thus, only age was included as a covariate in subsequent analyses.

The association between less severe forms of childhood maltreatment and urinary OT concentration is shown in **Table 4**. Results of the bivariate regression analyses indicated that the presence of less severe forms of physical abuse history was significantly positively associated with OT concentration (coefficient = 42.97; p = 0.016). The presence of any other type of maltreatment history did not result in statistical significance. Also, when age was adjusted in Model 1, less severe forms of physical abuse were still significantly positively associated with OT concentration (coefficient = 43.35; p = 0.014). Due to correlations among five maltreatment types, we simultaneously included those types in Model 2, as well as age, in order to observe the independent impact of each maltreatment type on OT concentration. Here, only less severe forms of physical abuse significantly impacted OT concentration, after ruling out the variance accounted for by other maltreatment types (p = 0.027).

The association between accumulated exposure to different types of maltreatment (i.e., poly-victimization) and OT concentration was analyzed (data not shown). Participants who experienced two, or three or more, types of less severe forms of maltreatment had 12.56 and 41.91 µU/ml per creatinine g/L higher OT concentration, respectively, compared to the no-maltreatment group, and the difference for three or more types was statistically significant (p = 0.013). Moreover, p for trend was significant (p = 0.031), which suggested a dose-response association between the number of less severe forms of childhood maltreatment types and OT concentration (**Figure 1**).

The same analyses were stratified by sex. A stronger association was found among men; in the adjusted model, men who experienced three or more childhood maltreatment types showed 51.8 µU/ml per creatinine g/L higher OT concentration compared to men with no childhood maltreatment history (p = 0.023). The association between number of childhood maltreatment types and OT was marginal among women.

#### Discussion

The current study showed that a history of less severe forms of childhood physical abuse was significantly associated with an elevated urinary OT concentration among healthy adults. A positive dose-response association was also revealed between the number of less severe forms of childhood maltreatment types and OT concentration. The current findings apply to a population sampled within Tokyo, Japan.

As OT concentration can be determined not only by childhood maltreatment, but also by other stressful events (Emeny et al., 2015), HPA reactivity (Cox et al., 2015), or inflammation (Carnio et al., 2006), the findings should be interpreted with caution. That is, we found a positive association between childhood maltreatment and OT levels, which may not be directly associated with or mediated by these unmeasured factors. Nonetheless, the finding is novel as it focuses on less severe forms of child maltreatment. It is inconsistent with past findings, which indicate inverse associations between OT concentrations and severe childhood maltreatment (Heim et al., 2009; Opacka-Juffry and Mohiyeddini, 2012), as well as an absence of association (Wismer Fries et al., 2005). However, our results are supported by previous findings showing that social stress increased OT among people with a history of less severe forms of childhood maltreatment (Seltzer et al., 2014). Although the reasons for these inconsistent results are not clear, the current study sample is a healthy adult population in Japan with a history of less severe forms of child maltreatment. This differs from previous studies (Wismer Fries et al., 2005; Heim et al., 2009), in which all participants had a history of severe forms of childhood maltreatment, such as a history of institutional care or interactions with child protective services. It has been suggested that a stimulus which provokes fear in humans can be categorized as either a "threat" or a "challenge" (Blascovich and Mendes, 2010). When childhood maltreatment is very severe, children may regard abusers as a "threat" which elicits a fight-or-flight response (Cannon, 1932), rather than viewing abusers as a "challenge" and attempting to resolve the situation by engaging in communication with the abuser (i.e., approach-oriented behavior).
Thus, it could be speculated that exposure to severe maltreatment might reinforce the fight-or-flight response which could contribute to the downregulation of the OT system, as described in previous studies (Wismer Fries et al., 2005; Heim et al., 2009), while less severe forms of maltreatment could foster up-regulation of the system.

TABLE 4 | Regression coefficients of oxytocin by less severe forms of childhood maltreatment types.

*Significance at p < 0.05 are in bold. † adjusted for age; †† physical neglect, emotional neglect, physical abuse, emotional abuse, sexual abuse, and age are adjusted.*

The positive association between a history of less severe forms of childhood physical abuse and OT concentration could be explained by the "tend-and-befriend" response (Taylor et al., 2000). For children, running away from their abusive caregiver carries a significant risk to survival and is often an unrealistic option due to a child's limited resources. Thus, the OT system might be activated, which could promote approach-oriented responses and help to maintain social engagement with their abusive caregivers (Blascovich and Mendes, 2010). The reason why the OT system remains at a higher level until adulthood is unknown; however, less severe forms of childhood maltreatment might be associated with attachment style or the marital relationship (Bailey et al., 2007), which is associated with OT concentration (Samuel et al., 2015).

Emotional and sexual abuse, on the other hand, can affect children differently. The literature suggests that the impacts observed vary with the type of maltreatment. In contrast to physical abuse, which is associated with aggression from the caregiver, emotional abuse is associated with the caregiver's internalized problems, such as low self-esteem (Briere and Runtz, 1990; Mullen et al., 1996), anxiety, depression, and somatization (Spertus et al., 2003). Such internalized problems may not be recognized as a threat or challenge for children to avoid, and so children may simply accept and adapt to the abuse. In such cases, it is likely that the tend-and-befriend response (Taylor et al., 2000) may not be activated and OT may not be released to the same level as with physical abuse. Similarly, children who were sexually abused might be too young to understand that their engagement in sexual activities is a form of exploitation with potential harms (i.e., threat). As the CTQ does not include the age of victimization, it is not possible to assess the age of sexual abuse victimization. Unlike physical abuse, children may not recognize emotional or sexual victimization as a stressor, and hence the OT system may not be duly activated by emotional or sexual abuse.

Child neglect, defined by a failure to adequately care for a child's physical and emotional needs, signifies an absence of or a diminished amount of parental engagement in the child's life (Strathearn, 2011). Unlike physical abuse that involves violence and provokes a strong sense of fear, acts of neglect may not provoke fear in children in the same way. Without fear, the stress response system cannot be activated and actions, such as running away from abusive caregivers or engaging with them in order to procure sufficient and sensitive care, cannot occur. If the stress responses of neglected children are not activated, a positive feedback loop of OT cannot be established.

The current data showed a gradient effect of poly-victimization for less severe forms of childhood maltreatment. To the best of our knowledge, this is the first study to report a dose-response association between a history of less severe forms of childhood maltreatment and OT concentration in adulthood. This result suggests that the impact of various, less severe forms of maltreatment accumulates, and that OT concentration in adulthood is elevated as a result. In particular, people who experienced three or more types of maltreatment at low or moderate levels had an OT concentration in adulthood that was 41.91 µU/ml per creatinine g/L higher (equivalent to 1 SD) than that of people with no history of maltreatment. It could be interpreted that a history of less severe forms of maltreatment might be related to higher OT levels regardless of maltreatment type, due to the accumulation effect of poly-victimization.

This study has several limitations. First, the sample size is small, and the findings warrant replication in larger samples, which would also enable stratification by sex. Second, though urinary OT concentration was used as a proxy for central OT because of its validity, other peripheral or central samples (e.g., cerebrospinal fluid) and repeated measurements are needed. Further, urinary OT concentration was measured only once. As we could not collect urine samples across several days or at different times of the day, we were unable to investigate whether fluctuation in OT concentration was due to the current family environment, such as the marital relationship. Third, the validity and reliability of retrospective self-reporting on childhood trauma is debatable due to possible underreporting and recall bias, which may lead to significant measurement errors. Since OT concentration may influence memory retrieval and self-perception, people with higher OT concentrations might have reported maltreatment history more frequently than those with lower OT concentrations (Bartz et al., 2010; Cardoso et al., 2012). Further prospective studies that measure childhood maltreatment history at baseline and OT concentration at follow-up are necessary. Fourth, given the study's cross-sectional design, the current results do not indicate any causal relationship, i.e., those who showed higher OT concentrations might be more likely to recall and report a history of less severe forms of child maltreatment. Fifth, oxytocin receptor gene variations were not assessed in the current study. In addition to the level of OT secreted peripherally or centrally, further examination of the OT receptor gene and its expression is crucial to better our understanding of how the OT system functions (Nomura et al., 2003). Sixth, all participants in the sample were married, and marital status might be associated with both a history of child maltreatment and OT concentration (McCauley et al., 1997); this may limit the generalizability of the findings to unmarried adults.

This study also has several tentative implications. It could be speculated that the severity of childhood maltreatment history has an important impact on OT concentrations in adulthood. Experiences of a certain degree of social stress with parents in childhood could facilitate sensitive interactions with others and social engagement. Although the mechanism is unknown, the current study suggests the importance of measuring the severity of childhood maltreatment when interpreting OT concentration.

In conclusion, less severe forms of childhood physical abuse history were associated with higher OT concentrations among healthy adults in Japan. Poly-victimization among participants with a history of less severe forms of childhood maltreatment was also associated with a higher OT concentration. Further study is needed to elucidate the mediating factors, such as stress coping skills, for the positive association between less severe forms of childhood maltreatment and OT concentration.

#### Acknowledgments

We thank Ms. Maiko Osawa and Hanako Fujiwara, who visited participants' homes and collected samples from participants. We are also grateful to all participants and their families for their involvement in our study. This research was supported by a Research Development Grant for Child Health and Development from the National Center for Child Health and Development (21 shi-10 and 24-12). We also thank Ms. Emma Barber for her editorial assistance.

### References

Cannon, W. B. (1932). The Wisdom of the Body. New York, NY: Norton.

in response to restraint stress in the hypothalamic paraventricular nucleus of oxytocin gene-deficient male mice. J. Neuroendocrinol. 15, 1054–1061. doi: 10.1046/j.1365-2826.2003.01095.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Mizuki and Fujiwara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
