# **DECISION MAKING UNDER UNCERTAINTY**

**Topic Editors Kerstin Preuschoff, Peter N. C. Mohr and Ming Hsu**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2015 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-466-7 **DOI** 10.3389/978-2-88919-466-7

## **DECISION MAKING UNDER UNCERTAINTY**

Topic Editors:

**Kerstin Preuschoff,** University of Geneva, Switzerland **Peter N. C. Mohr,** Freie Universität Berlin, Germany **Ming Hsu,** University of California, Berkeley, USA

Most decisions in life are based on incomplete information and have uncertain consequences. To successfully cope with real-life situations, the nervous system has to estimate, represent and eventually resolve uncertainty at various levels. A common tradeoff in such decisions is that between the magnitude of the expected rewards and the uncertainty of obtaining them. For instance, a decision maker may choose to forgo the high expected rewards of investing in the stock market and settle instead for the lower expected reward and much lower uncertainty of a savings account. Little is known about how different forms of uncertainty, such as risk or ambiguity, are processed and learned about and how they are integrated with expected rewards and individual preferences throughout the decision making process.

With this Research Topic we aim to provide a deeper and more detailed understanding of the processes behind decision making under uncertainty.

# Table of Contents


Celia Gaertig, Anna Moser, Sonia Alguacil and María Ruz

*Toward an Affective Neuroscience Account of Financial Risk Taking*

Charlene C. Wu, Matthew D. Sacchet and Brian Knutson

## Decision making under uncertainty

## *Kerstin Preuschoff 1,2\*, Peter N. C. Mohr 3,4,5\* and Ming Hsu6,7\**

*<sup>1</sup> Laboratory of Computational Neuroscience, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland*


#### *Edited by:*

*Scott A. Huettel, Duke University, USA*

**Keywords: decision making, decision neuroscience, neuroeconomics, uncertainty, risk, contextual influences, situational influences, individual differences**

In our everyday life we often have to make decisions with uncertain consequences, for instance in the context of investment decisions. To successfully cope with these situations, the nervous system has to be able to estimate, represent, and eventually resolve uncertainty at various levels. That is, not only are there different forms of uncertainty with different consequences for behavior and learning but research indicates that the processing of uncertainty highly depends on situation and context. The present research topic includes both review and original research articles that seek to shed light on the neural processes underlying decision making under uncertainty with a particular focus on situational and contextual influences.

First, Bland and Schaefer (2012) review the diverse (and often overlapping) definitions of uncertainty. They identify three main forms (expected uncertainty, which includes risk; unexpected uncertainty; and volatility) and review theoretical and empirical evidence that supports this dissociation. Several original research articles then aim to either directly compare different forms of uncertainty or to identify further dissociations within these forms. Payzan-LeNestour and Bossaerts (2012) systematically vary unexpected and estimation uncertainty to study what drives exploration (as opposed to exploitation). They report that humans both seek out new reward opportunities ("curiosity motive") and avoid the unknown ("cautiousness motive"), resulting in exploration and exploitation, respectively. O'Reilly (2013) addresses the same forms of uncertainty in the context of learning, with a particular focus on how an organism should adapt its rate of learning in changing environments. Hansen et al. (2012), on the other hand, show that decisions made under perceptual vs. categorical uncertainty are differentially affected by prior knowledge, such that prior knowledge increases visual cortical activity when uncertainty is driven by the sensory stimulus itself rather than at the cognitive level.

The next set of papers explores situational and contextual aspects of expected uncertainty. First, Studer et al. (2012) demonstrate that neural responses in a distributed network of choice under risk increase when subjects actively choose a risky gamble as opposed to being passively exposed to risk when a computer chooses that gamble. Kim et al. (2012) study what information decision makers attend to when either choosing between two lotteries or betting on a single lottery. Using eye-tracking data, they observe task-dependent attentional shifts from probabilities to amounts, which may influence the (neural) computation of value. Consequently, individuals often choose options with higher probabilities but place higher bids on options with higher amounts. Schönberg et al. (2012) used the Balloon Analog Risk Taking task to study the neural network underlying naturalistic risk-taking. They find that brain activity in a network previously related to risk increases as individuals continue to inflate a balloon, thus increasing their risk, while activity in a value-related brain region decreases at the same time. Levin et al. (2012) then review the literature on how risk processing differs between the gain and loss domain. They argue that different neural systems indicate different neural and psychological processes for risk-taking in gains and losses. Finally, Heilbronner and Hayden (2013) round off this set of papers by providing an account of risk-seeking behavior. While risk-seeking is usually observed in only a minority of human study participants, it is the dominant form of risk preference observed in monkey studies. Heilbronner and Hayden review the literature on this phenomenon and argue that monkeys are not risk-seeking *per se* but are driven toward risk-seeking by experimental design and training, and that under similar conditions rats and humans would behave the same way.

Finally, a third set of papers represents an increasingly fertile area of research by connecting risk-taking to the social contexts and affective processes underlying behavior. Tang et al. (2011) report that socially anxious individuals demonstrate decreased risk aversion and that the degree of social anxiety correlates with activity in anterior insula. Jung et al. (2013) compare the number of risky choices participants made for themselves or for others. They find that at low probabilities subjects take fewer risks when deciding for themselves than when deciding for others, whereas at high probabilities the effect is reversed. This difference in preferences toward risk is underpinned by partially distinct neural networks that are recruited when choosing for oneself or for others. Using a model-based approach, Zhu et al. (2012) connect social risk and learning, and demonstrate that age-related differences in social learning can be succinctly captured by a set of models widely used in economics. Gaertig et al. (2012) use an ultimatum game to show that positive social information about the proposer increases acceptance rates by the responder. This effect was further enhanced by the presence of uncertainty. Finally, Wu et al. (2012) provide an affective neuroscience account of decision making under risk, thereby connecting the quantitative approach of economic and financial theories with the psychological approach, which focuses on emotion and cognition.

In sum, the papers presented in this research topic demonstrate several points: First, to fully understand decision making under uncertainty one has to first dissociate different forms of uncertainty. Each form impacts behavior and learning in a different way (**Figure 1**). Second, choices under each form of uncertainty can themselves be impacted by situational and contextual factors. Third, social context is an important source of uncertainty that is often driven or influenced by affective processes. We can further contend that risk remains the most popular and most powerful form of uncertainty for studying choice under uncertainty. The quantitative framework provided by choice under risk allows the careful study of the impact of situational and contextual factors on preferences and choice. However, as most situations in real life are infused with unexpected uncertainty and volatility rather than expected uncertainty (risk), future research will have to show how the factors identified in this issue influence other forms of uncertainty, to what degree common mechanisms exist, and how they can account for the various influences identified so far.

#### **REFERENCES**


*Received: 22 October 2013; accepted: 30 October 2013; published online: 20 November 2013.*

*Citation: Preuschoff K, Mohr PNC and Hsu M (2013) Decision making under uncertainty. Front. Neurosci. 7:218. doi: 10.3389/fnins.2013.00218*

*This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2013 Preuschoff, Mohr and Hsu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Different varieties of uncertainty in human decision-making

## *Amy R. Bland<sup>1</sup>\* and Alexandre Schaefer<sup>2</sup>*

<sup>1</sup> Neuroscience and Psychiatry Unit, University of Manchester, Manchester, UK

<sup>2</sup> Psychology Department, University of Durham, Durham, UK

#### *Edited by:*

Peter N. C. Mohr, Freie Universität Berlin, Germany

#### *Reviewed by:*

Bruno B. Averbeck, National Institute of Mental Health, USA; Philippe N. Tobler, University of Zurich, Switzerland

#### *\*Correspondence:*

Amy R. Bland, Neuroscience and Psychiatry Unit, University of Manchester, G9.07 Stopford Building, Oxford Road, Manchester, UK. e-mail: amy.bland@manchester.ac.uk

The study of uncertainty in decision-making is receiving greater attention in the fields of cognitive and computational neuroscience. Several lines of evidence are beginning to elucidate different variants of uncertainty. In particular, risk, ambiguity, and expected and unexpected forms of uncertainty are well articulated in the literature. In this article we review both empirical and theoretical evidence arguing for the potential distinction between three forms of uncertainty: expected uncertainty, unexpected uncertainty, and volatility. Particular attention will be devoted to exploring the distinction between unexpected uncertainty and volatility, which has been less appreciated in the literature. This includes evidence mainly from neuroimaging, neuromodulation, and electrophysiological studies. We further address the possible differentiation of cognitive control mechanisms used to deal with these forms of uncertainty. Finally, we explore whether the dual modes of control theory provides a theoretical framework for understanding the distinction between unexpected uncertainty and volatility.

**Keywords: uncertainty, unexpected uncertainty, volatility, decision-making**

## **INTRODUCTION**

Uncertainty is a common feature of many everyday decisions. Uncertainty typically arises in a situation that has limited or incalculable information about the predicted outcomes of behavior (Huettel et al., 2005). Successfully detecting, processing and resolving uncertainty is important to successful adaptive behavior. Recent years have seen a growing body of research dedicated to exploring the brain mechanisms which underlie our choices during conditions of uncertainty. However, it is becoming clear that "*uncertainty*" is not comprised of a single dimension. More recent evidence is beginning to differentiate neural correlates involved in estimating, representing, and resolving different forms of uncertainty. For example, studies have demonstrated separable neural correlates of *reward expectancy* and *variance* (Preuschoff et al., 2006; Tobler et al., 2007), *reward probability* and *magnitude* (Knutson et al., 2005), and *ambiguity* and *risk* (Hsu et al., 2005; Huettel et al., 2006). A major contribution of this work has been a better understanding of how uncertainty can be induced by different variables in the decision-making environment. An important form of uncertainty which has received less attention is uncertainty induced by unexpected changes in learned Stimulus-Response-Outcome (S-R-O) contingencies, often referred to as "unexpected uncertainty" or "volatility." As we will discuss below, however, unexpected uncertainty and volatility do not necessarily refer to the same phenomenon. Therefore, we review theoretical and empirical arguments supporting a potential distinction between three different forms of uncertainty: *expected uncertainty, unexpected uncertainty*, and *volatility*.

## **DISTINCT VARIETIES OF UNCERTAINTY**

Successful decision-making relies on one's ability to form a stable representation of the underlying S-R-O rules learned from previous experience of gains and losses (e.g., Sutton and Barto, 1998; Ridderinkhof et al., 2004; Seymour et al., 2007). As such, agents can learn that a specific association between a stimulus (S) and a response (R) is linked with a positive or negative outcome (O). For instance, we may choose to enter (R) a particular restaurant (S) if we have previously found that it serves our preferred dish (O). Therefore, through learning these associations between a *Stimulus* (restaurant), a *Response* (enter), and its positive or negative *Outcome* (preferred dish), we can guide future decision-making in order to choose the *Response* which will most likely lead to a rewarding *Outcome*. When faced with this kind of decision, an agent has a prediction or expectation of the probability of an outcome. This is derived from the recent history of outcomes of that choice (Sutton and Barto, 1998). Therefore an agent must have the ability to learn these S-R-O relationships and the likelihood with which they occur in order to make optimal choices. If we take the example above, our behavioral choice may be caused by previous experiences in which we learned that our preferred dish is available 8 out of 10 visits to that restaurant.

One of the most frequent methods used to manipulate uncertainty involves the systematic variation of the probability of these learned S-R-O contingencies. Using the example above, if we begin to learn that our preferred dish is available only 6 out of 10 visits, this increases uncertainty about the potential outcome (i.e., preferred dish) if we choose to enter this particular restaurant. In other words, when an agent is faced with a two-option choice, uncertainty is often said to be maximal when the probability of obtaining a reward linked to either of the two options is *p* = 0.5 but absent at the two extremes (probability = 1 and probability = 0; e.g., Fiorillo et al., 2003). Many studies have therefore used variations of a 75/25% S-R-O probability to create certain environments and 50/50% probability to create uncertain environments (Volz et al., 2003; Paulus et al., 2004; Huettel et al., 2005; Krain et al., 2006; Cohen et al., 2007b; Polezzi et al., 2008). It is also possible to explore varying degrees of uncertainty (e.g., Volz et al., 2003; Huettel et al., 2005). Typically in these studies, participants are shown cues which are probabilistic predictors of a given outcome (e.g., a red triangle that predicts the occurrence of a reward on 80% of trials). Uncertainty in these paradigms is induced by lowering the predictability of the learned stimulus-response (S-R) association being rewarded (O). For instance, varying degrees of uncertainty may include 100, 90, 80, 70, 60, or 50%, whereby 50% is the most uncertain and 100% the least uncertain. Furthermore, if the predictability goes below 50% then uncertainty will decrease again, i.e., 40, 30, 20, 10, 0%.
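The inverted-U relationship between reward probability and uncertainty described above can be illustrated with a short sketch. It uses two common summary measures, Shannon entropy and outcome variance, which are assumptions chosen for illustration (the studies cited may quantify uncertainty differently); the function names are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a rewarded/unrewarded outcome with reward probability p."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome is fully predictable at the two extremes
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def outcome_variance(p: float) -> float:
    """Variance of a binary outcome (1 = reward, 0 = no reward) with reward probability p."""
    return p * (1 - p)

# Predictability levels from 0% to 100% in steps of 10%
levels = [i / 10 for i in range(11)]
for p in levels:
    print(f"p = {p:.1f}  entropy = {binary_entropy(p):.3f}  variance = {outcome_variance(p):.3f}")
```

Both measures peak at *p* = 0.5 and fall symmetrically toward zero on either side, which is why an 80% contingency and a 20% contingency carry the same amount of this form of uncertainty.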

A wealth of literature has begun to elucidate how the brain estimates, represents, and resolves this form of uncertainty which is induced by varying levels of probability. Neuroimaging evidence indicates that the DLPFC (Paulus et al., 2002; Huettel et al., 2005), posterior parietal cortex (Volz et al., 2003; Huettel et al., 2005), anterior cingulate cortex (ACC; Elliott and Dolan, 1998; Critchley et al., 2001; Stern et al., 2010), orbito-frontal cortex (OFC; Goel and Dolan, 2000; Critchley et al., 2001; Hsu et al., 2005; Tobler et al., 2007), and amygdala (Hsu et al., 2005) are involved in processing uncertainty. Electrophysiological evidence points to modulation of the P3, a positive-going potential peaking around 300 ms after stimulus onset, with greater positivities associated with greater uncertainty (Duncan-Johnson and Donchin, 1977; Donchin and Coles, 1988; Polich, 1990).

Importantly, however, uncertainty can also be induced by unexpected changes in S-R-O contingencies, above and beyond the current S-R-O probability levels. For instance, using the example above, we may choose to enter a particular restaurant if we have previously found that our preferred dish is available 8 out of 10 visits to that restaurant. However, uncertainty could be induced if this S-R-O contingency suddenly changes because the usual chef was fired and replaced by another chef with different menu preferences, which would take the "preferred dish probability" to 0.2 that week. In this case, the choice of available dishes in that restaurant right after the replacement of the chef will be uncertain because it can no longer be predicted by past experience. Therefore uncertainty can be induced not only by lowering the probability of S-R-O contingencies, but also by fundamental changes in these contingencies that force a modification of our previous beliefs.

More recent approaches to uncertainty have begun to establish that the two forms of uncertainty illustrated above refer to two distinct processes. In particular, uncertainty can arise from (a) the stochasticity inherent in the decision-making environment (e.g., the stable probability of reward where an agent can learn that a stimulus predicts rewards on 80% of trials is less uncertain than a situation where this probability is set at 50%), and (b) unexpected and fundamental changes in the S-R-O contingencies of the environment that invalidate prediction based on previous experience (Yu and Dayan, 2005; Courville et al., 2006; Behrens et al., 2007; Doya, 2008; Rushworth and Behrens, 2008; Krugel et al., 2009; Nassar et al., 2010; Payzan-LeNestour and Bossaerts, 2011). The former is usually referred to as *expected uncertainty* (Yu and Dayan, 2005) or *Feedback Validity* (e.g., Bland and Schaefer, 2011), and the latter is often referred to as *unexpected uncertainty* (Yu and Dayan, 2005).

Recent developments suggest that *volatility* also has to be considered (Behrens et al., 2007; Bland and Schaefer, 2011). Volatility can be defined as a variation in the frequency of changes in existing S-R-O contingencies across time. In our example above, a stable situation (low volatility) is attained when our preferred dish is served in our chosen restaurant 8 days out of 10 during an entire year. However, a volatile situation can arise if the manager of the restaurant decides to dynamically change the menu several times during the year. In such a case, the "preferred dish probability" will frequently change (e.g., 0.9 in the first week, 0.2 in the second week, 0.7 in the third week, etc.). The dynamic changes in S-R-O contingencies will then constrain agents to continually update their representation of the environment in order to obtain accurate prediction levels. Therefore volatility and unexpected uncertainty can be distinguished by the frequency of contingency changes. Unexpected uncertainty is characterized by rare unpredicted changes in underlying S-R-O rules, whereas high volatility is characterized by frequent occurrences of fundamental changes in S-R-O rules. In addition, it is important to note that a high frequency of changes may potentially cause agents to learn that changes occur rapidly. Therefore, volatility can be *expected* by decision-making agents.

In summary, three distinct forms of uncertainty can be identified: (1) Expected uncertainty: S-R-O rules learned from past events are weak predictors of the outcomes of future actions, and this unreliability is known and stable. (2) Unexpected uncertainty: a rare fundamental change in the environment which invalidates existing S-R-O rules that are no longer able to accurately predict the outcomes of our actions. (3) Volatility: frequent changes in the environment which require a constant updating of S-R-O rules<sup>1</sup>.
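The three forms can be made concrete as reward-probability schedules. The following is a minimal sketch: the probabilities are taken from the restaurant example in the text, while the function names, trial counts, and block length are hypothetical choices made only for illustration.

```python
import random

def stable_schedule(n_trials, p=0.8):
    """Expected uncertainty only: a known, constant reward probability."""
    return [p] * n_trials

def single_reversal_schedule(n_trials, change_at, p_before=0.8, p_after=0.2):
    """Unexpected uncertainty: one rare, uncued change in the contingency."""
    return [p_before if t < change_at else p_after for t in range(n_trials)]

def volatile_schedule(n_trials, block_length, levels=(0.9, 0.2, 0.7)):
    """Volatility: the contingency changes frequently, once per block."""
    return [levels[(t // block_length) % len(levels)] for t in range(n_trials)]

def count_changes(schedule):
    """Number of trials on which the underlying probability differs from the previous trial."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)

def sample_outcomes(schedule, seed=0):
    """Draw binary outcomes (1 = reward) from a probability schedule."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in schedule]

n = 120
schedules = {
    "expected uncertainty": stable_schedule(n),
    "unexpected uncertainty": single_reversal_schedule(n, change_at=60),
    "volatility": volatile_schedule(n, block_length=10),
}
for name, schedule in schedules.items():
    print(name, "contingency changes:", count_changes(schedule))
```

In this sketch the only thing that separates unexpected uncertainty from volatility is the number of contingency changes per session, which mirrors the frequency-based distinction drawn above.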

#### **THE MODEL OF YU AND DAYAN (2005)**

Yu and Dayan (2005) have proposed a distinction between *expected* and *unexpected* forms of uncertainty. Yu and Dayan employed a task involving a set of arrows pointing to the left or right hand side of a screen. The directions of the colored arrows are randomized independently of each other on every trial, but one of them, the *cue*, specified by its color, predicts the location of the subsequent target (a light bulb) with a significant probability whilst the rest of the arrows are irrelevant distracters. The color of the cue arrow (i.e., the "relevant" color which generally predicts the location of the light bulb) persists over many trials, defining a relatively stable context. However, the relevant cue color can suddenly change without informing the subject. According to Yu and Dayan's (2005) influential theory, expected uncertainty arises from known unreliability of predictive relationships within a familiar environment (e.g., learning that the relevant color predicts the location of the light bulb on 80% of trials) whereas unexpected uncertainty is induced by fundamental changes in the environment that produce sensory observations strongly violating expectations (e.g., the previously relevant color no longer predicts the location of the target). The former has been equated with environmental stochasticity in an otherwise stable S-R-O relationship (Nassar et al., 2010). This stochasticity is analogous to uncertainty induced by manipulating the predictive value of decision cues. So an agent can learn that a cue predicts a reward on 80% of trials and so on 20% of trials the outcome is not a valid predictor of the S-R-O relationship. This creates a level of expected uncertainty in a familiar environment which can be thought of as the expected amount of error. Indeed, an agent learns to expect that there will be a certain amount of uncertainty when making their decision through sampling the environment. In other words, expected uncertainty remains the same as long as the 20–80% contingencies are maintained, but unexpected uncertainty increases temporarily during an uncued reversal from 80 to 20%.

<sup>1</sup>Although the main focus of this article is on human decision-making, these three forms of uncertainty can also potentially be encountered in animals. For instance, an animal may learn that pressing (response) a blue lever (stimulus) is paired with food (outcome) for 7 out of 10 lever presses. It is reasonable to expect that through learning, the animal might form a representation of the expected amount of error (30%) in this S-R-O contingency (expected uncertainty). If the blue lever then predicts food delivery for only 2 out of 10 lever presses, there will be a fundamental change in the contingencies that were previously guiding behavior (unexpected uncertainty), and the animal must adapt to this new situation (probably through the exploration of other levers present in the environment). However, if the association between the lever press and the food reward is constantly changing then the environment becomes volatile and, in order to adapt, an optimal solution would be for the animal to form a representation of the fact that these S-R-O contingencies are likely to change frequently.

Unexpected uncertainty arising from fundamental changes in learned predictive relationships should signal the need to revise an agent's belief about the best course of action. Unexpected uncertainty must therefore require a mechanism for suppressing potentially outdated expectations and encouraging faster adaptation to new S-R-O contingencies (Dayan and Yu, 2002). Indeed, learning rate parameters tend to increase during periods of unexpected uncertainty (Yu and Dayan, 2005) and volatility (Behrens et al., 2007; Nassar et al., 2010). In this way, fundamental changes in S-R-O contingencies increase uncertainty and speed up subsequent learning, by making historical outcomes irrelevant and letting new outcomes influence beliefs strongly (Courville et al., 2006; Nassar et al., 2010). Furthermore, surprise induced by changes in S-R-O contingencies can enhance the speed of learning, whereas random variation under constant probabilities (as with a sequence of coin flips) will not be surprising (Courville et al., 2006).
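The role of the learning rate can be illustrated with a simple error-driven (delta-rule) learner. This is a minimal sketch and not the model of any of the studies cited; the learning-rate values, the function name, and the deterministic toy outcome sequence (80% rewarded, then an uncued reversal to 20%) are assumptions chosen for illustration.

```python
def delta_rule(outcomes, learning_rate, initial_estimate=0.5):
    """Track reward probability: estimate += learning_rate * (outcome - estimate)."""
    estimate = initial_estimate
    trajectory = []
    for outcome in outcomes:
        prediction_error = outcome - estimate
        estimate += learning_rate * prediction_error
        trajectory.append(estimate)
    return trajectory

# Toy sequence: 50 trials with 80% rewarded, then an uncued reversal to 20% rewarded
before = ([1, 1, 1, 1, 0] * 2) * 5
after = ([0, 0, 0, 0, 1] * 2) * 5
outcomes = before + after

slow = delta_rule(outcomes, learning_rate=0.05)
fast = delta_rule(outcomes, learning_rate=0.30)

# Compare the two learners ten trials after the reversal (trial index 59)
print("slow learner estimate:", round(slow[59], 3))
print("fast learner estimate:", round(fast[59], 3))
```

Ten trials after the reversal, the learner with the higher learning rate has moved further toward the new 20% contingency. The cost, in this sketch, is that the same learner fluctuates more while the contingency is stable, which is one reason a learning rate that rises only when a change is suspected would be advantageous.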

Taken together, these findings highlight the importance of considering different forms of uncertainty and how they interact to produce adaptive behavior. Importantly, an agent must possess the neural and cognitive mechanisms to detect if an S-R-O contingency has changed by representing the probabilistic chance that an error is caused by inherent stochasticity. This parameter is crucial for determining a contingency change. For instance, during everyday decision-making, there is often only a probabilistic chance (rather than a certainty) of success; the lack of reward on a particular occasion may therefore not necessarily signal the need to switch to an alternative course of action (Kennerley et al., 2006). Thus, when a participant responds according to the learned S-R-O rule and receives negative feedback, they must possess the ability to infer whether the erroneous response is due to the inherent stochasticity of the task or whether the S-R-O rule has fundamentally changed. A changing world consequently requires a mechanism which will allow the successful detection of, and adaptation to, both forms of uncertainty. We will discuss in the remainder of this article some mechanisms potentially involved in this adaptation. Most research to date has conceptualized uncertainty as variations in expected uncertainty (as well as slightly different forms of uncertainty, such as ambiguity and risk). Although unexpected uncertainty has received less attention, there is now a growing body of research that has tackled this phenomenon. In addition, the potential distinction between unexpected uncertainty and volatility has not received much attention, and both concepts tend to be somewhat confounded in the literature. We will therefore review existing evidence on unexpected uncertainty, and we will also review the possibility that specific cognitive strategies might be employed for volatility which are not necessarily employed for unexpected uncertainty.
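The inference problem described above, deciding whether an unrewarded trial reflects inherent stochasticity or a changed rule, can be sketched with Bayes' rule. This is a deliberately simplified illustration: the two-hypothesis setup, the 5% prior probability of a change, and the function name are assumptions, and the 80%/20% contingencies are borrowed from the examples in the text.

```python
def posterior_change(n_errors, p_reward_old=0.8, p_reward_new=0.2, prior_change=0.05):
    """Posterior probability that the rule has changed after n consecutive unrewarded trials.

    Assumes only two hypotheses: the old rule still holds (reward probability
    p_reward_old) or it was replaced before the run of errors began (p_reward_new).
    """
    likelihood_changed = (1 - p_reward_new) ** n_errors
    likelihood_stable = (1 - p_reward_old) ** n_errors
    evidence_changed = prior_change * likelihood_changed
    evidence_stable = (1 - prior_change) * likelihood_stable
    return evidence_changed / (evidence_changed + evidence_stable)

for n in range(5):
    print(n, "consecutive errors -> P(change) =", round(posterior_change(n), 3))
```

Under these assumptions a single unrewarded trial leaves a change unlikely, because such errors are expected on 20% of trials anyway, whereas a short run of errors shifts belief strongly toward a contingency change.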

## **COMPUTATIONAL MODELING OF UNEXPECTED UNCERTAINTY AND VOLATILITY**

Modeling human behavior using computational approaches has provided some important insights into the potential mechanisms involved in decision-making under uncertainty. Such behavior can be modeled by Bayesian algorithms (Behrens et al., 2007; Nassar et al., 2010; Mathys et al., 2011). Indeed, Bayesian statistical theory formalizes the notion that optimal inference and learning depend critically on representing and processing the various sorts of uncertainty associated with a behavioral context (Yu and Dayan, 2005). In the specific case of volatility, it has been suggested that humans adapt to a volatile decision-making environment following Bayesian rules (Behrens et al., 2007; Nassar et al., 2010). In particular, Behrens et al. (2007) used an ideal Bayesian model to show that human participants can optimally assess volatility and adjust decision-making accordingly to produce the most advantageous future outcomes. In Behrens et al.'s (2007) study, subjects carried out a one-armed bandit task in which they had to choose between blue and green stimuli. Subjects underwent trials where the probability of a blue outcome was 75% (a certain/stable environment) and trials where reward probabilities switched between 80% blue and 80% green every 30 or 40 trials (an uncertain/volatile environment). This study illustrated how human participants repeatedly combine prior and subsequent information as data accumulate, even when faced with a rapidly changing environment, by continually tracking the statistics of the environment to assess the salience of every new piece of information. Behrens et al.'s fMRI data suggest that BOLD activity in the ACC might reflect a Bayesian estimate of the environment's volatility during a monitoring stage, i.e., when outcomes are being evaluated in order to regulate current beliefs about the underlying S-R contingencies of the environment. This model also suggests that the ACC might encode how much influence feedback should have on subsequent decisions, with more recent outcomes being more salient in volatile contexts (Rushworth and Behrens, 2008).
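To make the structure of such a task concrete, the following is a minimal sketch of a reward schedule with a stable phase followed by a volatile phase. The 75% and 80%/20% probabilities and the 30- or 40-trial blocks follow the description above; the phase lengths, function names, and random seeds are illustrative assumptions and are not taken from Behrens et al. (2007).

```python
import random

def make_schedule(n_stable=120, n_volatile=170, block_lengths=(30, 40), seed=0):
    """Per-trial probability that blue is rewarded: stable phase, then volatile phase.

    Phase lengths are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    total = n_stable + n_volatile
    probs = [0.75] * n_stable      # stable phase: blue rewarded on 75% of trials
    p_blue = 0.8                   # volatile phase starts at 80% blue
    while len(probs) < total:
        block = rng.choice(block_lengths)          # 30 or 40 trials per block
        probs.extend([p_blue] * min(block, total - len(probs)))
        p_blue = 0.2 if p_blue == 0.8 else 0.8     # contingency reversal
    return probs

def sample_outcomes(probs, seed=1):
    """Draw one binary outcome per trial (1 = blue rewarded, 0 = green rewarded)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probs]
```

In such a schedule, a single unrewarded blue choice in the stable phase is most likely noise, whereas a run of unrewarded blue choices in the volatile phase more likely signals a reversal.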

Under a Bayesian framework, unexpected observations increase uncertainty whereby a sustained level of such uncertainty results in a high estimate for volatility, which in turn leads to a high learning rate. Indeed, Behrens et al. (2007) showed that the learning rate for human participants was adjusted depending on the estimate of volatility. In situations where the S-R-O rules are changing, new information has more influence. This is because looking too far back in the history of rewarded outcomes is of little use if there has been a recent fundamental change in S-R-O contingencies. This can make prediction more difficult and thus new outcomes have a large impact on future expectations either because they are surprising (inducing a large prediction error) or because of uncertainty about current expectations (inducing a large learning rate; Rushworth and Behrens, 2008). Indeed, learning is enhanced when outcomes occur that are not fully predicted, then slows down as outcomes become increasingly predicted and ends when outcomes are fully predicted (Hollerman and Schultz, 1998).
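The interplay between prediction error and learning rate can be illustrated with a simple delta-rule learner. This is a standard textbook update rule used here as an assumption for illustration; it is not the Bayesian model of Behrens et al. (2007), and the function name, starting value, and learning rates are hypothetical.

```python
def delta_rule(outcomes, alpha, v0=0.5):
    """Track an expected outcome with a fixed learning rate alpha (0..1)."""
    v = v0
    estimates = []
    for outcome in outcomes:
        delta = outcome - v        # prediction error: observed minus expected
        v = v + alpha * delta      # update is scaled by the learning rate
        estimates.append(v)
    return estimates

# A noise-free reversal: 20 rewarded trials followed by 5 unrewarded trials.
outcomes = [1] * 20 + [0] * 5
fast = delta_rule(outcomes, alpha=0.5)   # weights recent outcomes heavily
slow = delta_rule(outcomes, alpha=0.1)   # integrates over a longer history
```

After the reversal, the high-learning-rate estimate drops toward the new contingency within a few trials, whereas the low-learning-rate estimate still largely reflects the outdated history. The cost of a high learning rate, not shown here, is greater sensitivity to noise when the contingency is in fact stable.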

Other studies have emphasized the idea that learning rates are flexibly adapted to best suit environmental statistics. In fast-changing or volatile situations, subjects learn quickly from new outcomes; thus, a faster learning rate is required (Courville et al., 2006). Indeed, Nassar et al. (2010) accurately modeled subjects' behavior with a Bayesian model that adjusts the influence of newly experienced outcomes according to ongoing estimates of uncertainty and the probability of a fundamental change in the process by which outcomes are generated. Thus, outcomes that are unexpected because of a fundamental change in the environment carry more influence than outcomes that are unexpected because of persistent environmental stochasticity.
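The idea that the learning rate should depend on the probability of a fundamental change can be sketched as follows. This is a simplified illustration loosely in the spirit of change-point models such as that of Nassar et al. (2010), not the authors' implementation: the Gaussian noise model, the uniform distribution of outcomes after a change, the hazard rate, and all parameter values and names are assumptions made for this example.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def adaptive_learner(outcomes, hazard=0.1, noise_sd=10.0, lo=0.0, hi=300.0):
    """Delta rule whose learning rate rises with the estimated change probability."""
    belief = (lo + hi) / 2.0   # initial estimate of the outcome mean
    run_length = 1.0           # effective number of samples since the last change
    beliefs, rates = [], []
    for x in outcomes:
        # Predictive spread: outcome noise plus uncertainty about the mean.
        pred_sd = math.sqrt(noise_sd ** 2 + noise_sd ** 2 / run_length)
        like_change = 1.0 / (hi - lo)               # assumed uniform after a change
        like_same = normal_pdf(x, belief, pred_sd)  # outcome explained by noise
        omega = hazard * like_change / (
            hazard * like_change + (1.0 - hazard) * like_same
        )                                           # change-point probability
        alpha = (1.0 + omega * run_length) / (run_length + 1.0)
        belief += alpha * (x - belief)
        run_length = (run_length + 1.0) * (1.0 - omega) + omega
        beliefs.append(belief)
        rates.append(alpha)
    return beliefs, rates

# Stable mean of 100 for 20 trials, then an abrupt shift to 200.
beliefs, rates = adaptive_learner([100.0] * 20 + [200.0] * 5)
```

In this sketch the learning rate decays toward a running average while outcomes remain consistent with the current belief and jumps toward 1 when an outcome is far more likely under a change than under noise, so that a surprising outcome attributed to a change carries more weight than one attributed to stochasticity.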

Together, evidence from computational models suggests that agents can act in a Bayesian fashion in order to track S-R-O contingencies and update these accordingly. In doing so, agents can represent the level of expected uncertainty and use this to detect unexpected changes in the decision-making environment. Importantly, however, a distinction between unexpected uncertainty and volatility has not been explicitly addressed in this literature. Indeed, there appear to be differences in how these two forms of uncertainty are computed. For instance, during unexpected uncertainty the agent must detect and adapt to the specific change in contingency. However, in volatile contexts the agent must also represent the frequency with which S-R-O contingencies are changing. This is what Behrens et al. (2007) refer to as tracking volatility as a higher-order statistic of the environment.

## **NEUROMODULATORS ASSOCIATED WITH UNCERTAINTY**

Acetylcholine (ACh) and Noradrenaline (NA) may be critical neurotransmitters involved in signaling expected and unexpected sources of uncertainty (Phillips et al., 2000; Bouret and Sara, 2005; Yu and Dayan, 2005; Preuschoff et al., 2011; Avery et al., 2012). In particular, ACh is said to signal expected uncertainty due to known unreliability in the behavioral context, whereas NA is said to signal unexpected uncertainty arising from fundamental changes in the S-R-O contingencies. Evidence that ACh is crucial in expected uncertainty comes from data showing that ACh varies inversely with the level of estimated cue validity (Witte et al., 1997; Phillips et al., 2000; Sarter and Parikh, 2005; Yu and Dayan, 2005). This cue validity represents the probability of the cue being correct, e.g., the cue is a valid predictor of the S-R-O rule on 80% of trials. This is typically constant over a whole experimental session and thus measures the stochasticity of the task. This suggests that ACh reports a form of expected uncertainty which can be learned through past experience of S-R-O relationships. Studies suggest that ACh increases in a sustained fashion for expected unreliability of the environment when attention needs to be maintained (Dalley et al., 2001). This implies that in order to grasp the predictive relationships of an environment, an agent must utilize a temporally sustained mechanism for estimating uncertainty.

It has been suggested that NA may signal unexpected uncertainty (Bouret and Sara, 2005; Yu and Dayan, 2005; Preuschoff et al., 2011; Avery et al., 2012). There is some empirical evidence supporting this notion. For instance, the prefrontal NA system, unlike the ACh system, is engaged by *novel* S-R-O contingencies, which is compatible with a role in mechanisms of plasticity and new learning (Dalley et al., 2001). Next, available evidence suggests that NA originates in the locus coeruleus (LC), where LC neurons fire phasically (as opposed to tonically) and robustly to unpredicted changes in stimulus properties or reversal of S-R-O contingencies (Aston-Jones et al., 1997; Yu and Dayan, 2003; Bouret and Sara, 2004). More recent evidence has shown that NA signals unexpected uncertainty as measured by pupil dilation (Preuschoff et al., 2011). Indeed, Preuschoff et al. (2011) have shown that unexpected uncertainty is closely linked with pupil size and is dissociated from expected uncertainty. Pupil size is thought to correlate remarkably well with NA in both animal and human studies (Rajkowski et al., 1993; Gilzenrat et al., 2010). Taken together, these observations suggest that the LC-NA system facilitates attentional and cognitive shifts in behavioral adaptation in changing environments (see Sara, 2009). NA levels could therefore signal when expectations about our world need to be revised (Cohen et al., 2007a).

Although phasic bursts of NA activity are likely to signal unexpected uncertainty, volatility characterized by a high frequency of fundamental S-R-O changes may be signaled by tonically high levels of NA (Yu, 2007). Indeed, McClure et al. (2006) propose that increased long-term response conflict (induced by frequent changes in S-R-O contingencies) biases the LC toward a tonic NA firing mode to increase exploratory behavior. These authors suggest that increased tonic firing reflects increased environmental uncertainty. This tonic mode of LC functioning may therefore reflect volatility in the environment triggered by frequent changes in the underlying rules guiding behavior.

Taken together, psychopharmacological evidence suggests that unexpected uncertainty is linked with phasic bursts of NA which signal changes in S-R-O contingencies. Expected uncertainty, however, is associated with a more tonic mode of ACh signaling that temporally sustains past S-R-O contingencies and hence the expected level of stochasticity. Further, volatility could be signaled by tonic levels of NA as opposed to phasic bursts (McClure et al., 2006; Yu, 2011). Therefore, an important distinction could be made between unexpected uncertainty and volatility in terms of the temporal characteristics of their neuromodulation.

## **EXPLOITATION VERSUS EXPLORATION DILEMMA**

Some authors suggest that the distinction between expected and unexpected forms of uncertainty may be an important element in behavioral adaptation, i.e., in choosing whether to *explore* or *exploit* the decision-making environment (Cohen et al., 2007a). The *exploitation* versus *exploration* dilemma suggests a trade-off between persisting in our current behavior (exploit) or selecting alternative options (explore) in reinforcement learning. For example, if we experience a poor quality meal at our preferred restaurant then we could choose to persist in our current behavior and continue to visit the restaurant on the assumption that the restaurant is still the best option given its good past record (exploitation). Alternatively, we may decide to explore other restaurants in search of a better dining experience (exploration). Indeed, the *exploitation* versus *exploration* trade-off is a fundamental challenge for the adaptive control of behavior (Cohen et al., 2007a).

Particularly relevant is that uncertainty may precede the decision to explore an alternative option or exploit the current situation (Daw et al., 2006; Cohen et al., 2007a; Frank et al., 2009). For example, the detection of unexpected uncertainty can be an important signal of the need to promote exploration and has a central role in the acquisition of adaptive behavior in environments that change (Daw et al., 2006; Cohen et al., 2007a). For instance, in a familiar, reliable environment with a stable level of expected uncertainty, there is no need for exploration (e.g., the restaurant chef works 8 out of every 10 days so we are likely to gain our preferred dish on 80% of visits, thus we are just *exploiting* knowledge learned from previous experiences). If we experience a bad meal which is a consequence of a brief absence of the chef then we may continue to visit this restaurant (exploit). In contrast, during unexpected changes in the environment that lead to a durable invalidity of our previous representations, one needs to take exploratory actions (Doya, 2008). For instance, if we experience a poor meal because the previous chef was fired and replaced by a less experienced chef, this unexpected uncertainty about future visits to the restaurant might promote our exploration of other restaurants. Therefore, uncertainty-driven exploration is a potentially important facet of decision-making and adaptive behavior (Cavanagh et al., 2011).
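One simple way to express uncertainty-driven exploration is to add a bonus proportional to the uncertainty of each option's estimated value. The sketch below is an illustrative assumption and is not the model used in any of the studies cited above: it represents each option's reward probability with a Beta posterior (uniform prior) and scores options by posterior mean plus a weighted posterior standard deviation. The option names, counts, and bonus weight are hypothetical.

```python
import math

def beta_mean_sd(successes, failures):
    """Posterior mean and standard deviation under a uniform Beta(1, 1) prior."""
    a, b = successes + 1.0, failures + 1.0
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)

def choose(counts, bonus):
    """Pick the option with the highest mean + bonus * uncertainty."""
    scores = {}
    for option, (successes, failures) in counts.items():
        mean, sd = beta_mean_sd(successes, failures)
        scores[option] = mean + bonus * sd
    return max(scores, key=scores.get)

# Usual restaurant: 8 good meals out of 10. New restaurant: 1 good meal out of 2.
counts = {"usual": (8, 2), "new": (1, 1)}
exploit_choice = choose(counts, bonus=0.0)   # value only
explore_choice = choose(counts, bonus=3.0)   # value plus uncertainty bonus
```

With no bonus the agent exploits the well-sampled option with the higher estimated value; with a sufficiently large bonus the poorly sampled option wins because its value is more uncertain.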

Research has begun to show that trial-to-trial variations in response-locked frontal theta are related to unexpected uncertainty and are larger in individuals who use uncertainty to guide exploration (Cavanagh et al., 2011). In addition, empirical studies have begun to reveal mechanisms that animals may use to adapt to changes in the environment by regulating the balance between exploitation and exploration. These studies appear to be converging on the view that neuromodulatory systems, in particular ACh and NA, interacting with DA-mediated reinforcement learning mechanisms, may play a critical role in exploration induced by unexpected uncertainty (Cohen et al., 2007a). Indeed, recent studies find that shifts between task engagement (exploitation) and disengagement (exploration) affect the pupil response, which is thought to index NA neurotransmission (Preuschoff et al., 2011). This is consistent with Yu and Dayan's (2005) theory of unexpected uncertainty and the adaptive gain theory of LC-NA (noradrenaline) mediated explore/exploit behavior (Aston-Jones and Cohen, 2005).

Together, this evidence suggests a close relationship between uncertainty and the adaptive control of behavior. Indeed, it appears likely that uncertainty, and particularly unexpected uncertainty, signals a contextual change which promotes exploratory adaptive behavior. Conversely, by tracking past representations of S-R-O rules and measuring the stochasticity of the environment, one can represent a form of expected uncertainty which promotes exploitative behavior. The interaction of expected and unexpected forms of uncertainty is likely to drive behavior in an optimal manner. Therefore, successfully adapting to uncertainty may depend upon the levels of expected uncertainty and the frequency of changes in S-R-O contingencies.

To our knowledge, the distinction between volatility and unexpected uncertainty has not been explicitly articulated from the perspective of exploitation/exploration behaviors. However, it is reasonable to think that volatility should be characterized by a state in which the need for sustained exploration is anticipated. Indeed, if volatility leads to the formation of a representation that an underlying S-R-O rule can frequently change, then this should enable decision-making agents to be prepared to engage in exploration in this type of context. A possible prediction is that, because the need for exploration is anticipated, exploratory behaviors following an S-R-O rule change would be engaged more rapidly in volatile contexts than in situations where S-R-O changes are rare. Further research will be needed to examine this question.

## **COGNITIVE CONTROL**

As we have outlined, an emerging body of literature is beginning to demonstrate how different forms of uncertainty are processed. One aspect that has yet to be adequately addressed is the potential involvement of cognitive control processes in the resolution of uncertainty (Mushtaq et al., 2011). Indeed, the ability to rapidly and flexibly adjust behavior to changing environmental demands is a defining characteristic of *cognitive control* (Braver et al., 2003). Therefore, successful adaptation to unexpected uncertainty may require the dynamic and flexible engagement of cognitive control functions. Interestingly, different cognitive control strategies may be utilized to deal with different forms of uncertainty (i.e., expected uncertainty, unexpected uncertainty, and volatility). In particular, conflict monitoring mechanisms and working memory (WM) are two canonical instances of cognitive control processes that appear to be likely candidates for successful adaptation to various forms of uncertainty (Bland and Schaefer, 2011; Mushtaq et al., 2011).

### **CONFLICT MONITORING AND WORKING MEMORY**

The conflict hypothesis (Botvinick et al., 2001; van Veen and Carter, 2002; Kerns et al., 2004) provides a theoretical framework that can be used to understand some of the interactions between uncertainty and cognitive control. According to the conflict hypothesis, adjustments in cognitive control are likely to occur during a high degree of response conflict (Botvinick et al., 2001). According to this hypothesis, response conflict occurs whenever two or more incompatible response tendencies are simultaneously active. For example, response conflict is high when a response must be withheld in contexts in which there is a pre-potent tendency to make an overt response (Nieuwenhuis et al., 2003). Therefore, a change in learned S-R-O contingencies might require inhibiting habitual behavior (e.g., learned from the previous S-R-O rule) following a negative outcome, and overriding it with new behavior adapted to the new rule. This type of behavioral adaptation is likely to rely on *conflict processing*, that is, the ability to efficiently arbitrate between two conflicting behavioral responses (usually a habitual response that needs to be overridden by a new response). Conflict processing is thought to be a key mode of cognitive control (Botvinick et al., 2001; Yeung and Cohen, 2006), and it is more often observed in tasks with a habitual context interrupted by rare high-conflict trials (Botvinick et al., 1999). Indeed, changes in learned S-R-O contingencies and hence unexpected uncertainty are likely to produce conflict and so unexpected uncertainty may be important in signaling the need for increased cognitive control in order to successfully adapt behavior (Mushtaq et al., 2011).

In addition to conflict monitoring mechanisms, WM may also play an important role in successfully adapting to varying forms of uncertainty. WM is defined as a system providing temporary storage, manipulation and processing of information (Baddeley, 1992) that is kept on-line or available for immediate access by other cognitive processes (Awh and Jonides, 2001). WM has a key role in active maintenance and updating of information in order to allow task-relevant information to be utilized in a manner that directly biases on-going processing. This makes WM a likely candidate in decision-making, whereby adaptive choices in an uncertain environment rely on tracking S-R-O contingencies and on the ability to monitor and update for any changes in S-R-O associations. WM is particularly important in many tasks that require the active maintenance and updating of information in order to facilitate goal-directed behavior (Owen et al., 2005). Therefore, the concepts of WM and cognitive control may be closely linked with decision-making in situations where S-R-O changes might occur, such as unexpected uncertainty or volatility. We will next review the link between cognitive control and different varieties of uncertainty from three perspectives: neuroimaging studies (fMRI and ERP), models suggesting the existence of distinct modes of cognitive control (Koechlin et al., 2003; Braver et al., 2007), and neuromodulation studies.

### **NEURAL CORRELATES OF COGNITIVE CONTROL IN UNCERTAIN ENVIRONMENTS**

Neuroimaging evidence has demonstrated greater ACC activation in studies examining conflict and conflict monitoring (Carter et al., 1998; Botvinick et al., 2001). The error-related negativity (ERN), a negative deflection in the ERP waveform at the time of an erroneous response (e.g., Gehring et al., 1990) which also originates in the ACC (Dehaene et al., 1994), is thought to be an electrophysiological marker of a conflict monitoring mechanism (Carter et al., 1998; Botvinick et al., 1999, 2001; Yeung and Cohen, 2006). In addition, the anterior N2, an ERP thought to be generated in the ACC, has also been shown to reflect the monitoring of response conflict (Nieuwenhuis et al., 2003; Yeung et al., 2004). Importantly, the N2 has been associated with volatility in a habitual environment (Bland and Schaefer, 2011). Bland and Schaefer (2011) presented participants with either a blue or red triangle which was associated with two possible responses. Participants had to learn the correct S-R-O rule (red triangle – response 1 = reward; blue triangle – response 2 = reward). In this task two contextual determinants of decision uncertainty were independently manipulated: *Volatility* (i.e., the frequency of changes in the S-R-O rules) and *Feedback validity* (i.e., the extent to which an S-R-O rule accurately predicts outcomes, synonymous with expected uncertainty). Bland and Schaefer (2011) demonstrated that frequent S-R-O rule changes in an otherwise predictable environment (where *Feedback validity* is high) were associated with a frontally based N2 component. This perhaps reflects the implementation of cognitive control through a mechanism suited to detecting conflict in learned S-R-O contingencies.

In relation to the conflict hypothesis, it has been suggested that the detection of conflict by the ACC leads to the delivery of trigger signals to systems specialized in implementing control (e.g., the prefrontal cortex, PFC). Support for this idea comes from evidence suggesting that conflict-related activity in ACC predicts a subsequent increase in PFC activity and corresponding adjustments in performance (Kerns et al., 2004). Specifically, the ACC is thought to play an essential role in the adjustment of executive control mechanisms governed by the PFC (Botvinick et al., 2001; Kerns et al., 2004; Brown and Braver, 2005; Egner and Hirsch, 2005; di Pellegrino et al., 2007; Mansouri et al., 2009). Given that unexpected uncertainty and volatility are characterized by environmental changes requiring the suppression or adjustment of existing S-R-O representations, these forms of uncertainty could then be seen as states that trigger conflict and therefore the cascade of processes leading to the implementation of cognitive control processes. In other words, these forms of uncertainty can be perceived as a summary of the contextual antecedents of the implementation of cognitive control processes (Mushtaq et al., 2011).

Another theoretical interpretation proposes a link between unexpected uncertainty and specific mechanisms of cognitive control (Nieuwenhuis, 2011). An interesting review by Nieuwenhuis (2011) addresses the relationship between the LC system and the P3 ERP. By bringing together Yu and Dayan's (2005) theory and the prominent theory of the P3 proposed by Donchin (1981), Nieuwenhuis (2011) explores how unexpected uncertainty requires agents to update their representation of the environment. Indeed, a surprising and unexpected outcome must call for revision of an agent's mental model of the decision-making environment. This is indexed by the P3 amplitude, which is thought to be generated by the LC and NA signaling. An increased phasic release of NA may have direct enhancing effects on task-specific control representations in PFC, contributing to the compensatory increase in control following a transient decrease in performance and/or reward (Aston-Jones and Cohen, 2005). Global changes in the external environment thus serve as an alarm system for contextual switches. Indeed, empirical studies are beginning to show that variants of the P3 and late positive complex (LPC) are associated with changing S-R-O contingencies (Bland and Schaefer, 2011). Bland and Schaefer (2011) also demonstrated that frequent S-R-O rule changes in a challenging environment (where *Feedback validity* is low) were associated with a frontally based LPC component. This perhaps reflects a mechanism for integrating past outcomes in order to update a mental model of the current S-R-O contingency and the frequency with which it occurs. An S-R-O rule change is likely to signal the need for a revision of one's mental model, which is likely to be reflected in the enhanced amplitude of the P3/LPC complex. This is in contrast to a rule change in an otherwise fairly habitual context (high *Feedback validity*), where volatility is indexed by an N2 component and likely reflects conflict monitoring (Bland and Schaefer, 2011). Therefore, there is some evidence to point to different forms of cognitive control depending on the interaction of expected uncertainty and volatility. However, unexpected uncertainty and volatility are yet to be explicitly dissociated in neuroimaging and electrophysiological studies.

#### **SEPARABLE MODES OF COGNITIVE CONTROL**

Recent studies are beginning to explore differential modes of cognitive control which may have important overlaps with the computational and neurobiological evidence outlined above. The dual modes of control (DMC) theory (Braver et al., 2007, 2009) suggests that cognitive flexibility can be achieved by modulating the manner in which a particular control mechanism is deployed in response to changing task demands or internal goal states. Specifically, this theory proposes a distinction between *proactive* and *reactive* modes of cognitive control (Braver et al., 2007). Proactive control is the early selection of goal-relevant information, which is actively maintained in a sustained/anticipatory manner before the occurrence of cognitively demanding events to optimally bias attention, perception, and action systems in a goal-driven manner. In contrast, the reactive mode is a late correction mechanism whereby cognitive control is recruited only as needed, such as after a high-interference event is detected. Thus, proactive control relies on the anticipation and prevention of interference before it occurs, whereas reactive control relies on the *post hoc* detection and resolution of interference after its onset (Braver et al., 2009).

A clear prediction of this hypothesis is that proactive and reactive control can be distinguished in terms of lateral PFC activity. For instance, proactive control should be associated with sustained and/or anticipatory activation of PFC, which reflects the active maintenance of task goals. In contrast, reactive control should be reflected in transient activation of lateral PFC, along with a wider network of additional brain regions including the ACC (Braver et al., 2007, 2009). In addition, the DMC theory has been related to distinct ERP components. Particularly, it has been claimed that the P3 and late positivities are linked to proactive control and N2 to reactive control (van Wouwe et al., 2011). Interestingly, the P3 has been linked to WM and sustained maintenance of information in WM (Duncan-Johnson and Donchin, 1982) and the N2 has been linked with conflict monitoring and error detection (van Veen and Carter, 2002).

Importantly, proactive and reactive modes of control may be useful in successfully adapting to different forms of uncertainty. The DMC theory suggests that the temporal dynamics of neural activity can range from a transient to a predominantly tonic mode. For instance, expected uncertainty may involve a more proactive mode of control in order to implement sustained attentional resources to facilitate internal representations of S-R-O contingencies (however, it might also be argued that automatic processes might be sufficient in a situation with learned and stable levels of expected uncertainty).

Separable modes of control have also been proposed by Koechlin and colleagues using a hierarchical framework. Koechlin et al. suggest two forms of control: *contextual* and *episodic control* (Koechlin et al., 2003; Koechlin and Summerfield, 2007). *Contextual control* refers to the use of a current cue (context) for selecting task-appropriate behavior, whereas *episodic control* refers to the use of past cues that determine, for an extended period of time, the way that current stimuli and contextual cues are interpreted (Egner, 2009). The modes of control are arranged hierarchically whereby episodic control affects contextual control, but not vice versa. According to Kouneiher et al. (2009), transient posterior-lateral PFC regions subserve contextual control whilst sustained mid-lateral PFC regions are associated with episodic control. Importantly, these two modes of control may also play a role in adapting to different forms of uncertainty. For instance, episodic control refers to temporally extended information over a behavioral episode. This requires a sustained mechanism to integrate past representations and form a mental model of the environment. This mode of control may therefore be particularly important to integrating past S-R-O occurrences and representing expected forms of uncertainty. Conversely, contextual control as indicated by transient neural activity in the PFC may be useful in detecting contextual shifts such as a change in underlying S-R-O contingencies.

Together the theories outlined above suggest that there are separable modes of cognitive control. Here, we suggest that these may be particularly relevant to estimating and resolving different forms of uncertainty. As suggested above, expected forms of uncertainty may be estimated by sustained episodic control (Kouneiher et al., 2009) or proactive control (Braver et al., 2009) whilst unexpected forms of uncertainty may be detected by transient contextual control (Kouneiher et al., 2009) or reactive control (Braver et al., 2009).

Importantly, however, a reactive mode of control may not necessarily be the optimal mode in volatile environments in which S-R-O changes occur with high frequency. Indeed, an agent may learn that the environment is frequently changing and thus these unexpected changes may become anticipated. Therefore, a proactive mode of control may be ideal in this type of environment for two reasons. First, it would allow a sustained activation of a representation of the frequency of changes in the environment and hence the potential need for constant exploratory behaviors. Second, a proactive mode of control would allow the maintenance and integration of temporally extended information about past S-R-O contingencies in order to dynamically update current mental models. A parallel could be drawn with the theory of Koechlin and colleagues (Koechlin et al., 2003; Koechlin and Summerfield, 2007; Kouneiher et al., 2009), from which it could be speculated that episodic control could also be useful in order to integrate temporally extended information needed to successfully adapt to volatile situations.

A common theme across these theories is that the separable modes of control can be distinguished by sustained and transient neural activity. This may be particularly important for estimating different forms of uncertainty. Indeed, neurotransmitters thought to underlie expected and unexpected forms of uncertainty have been distinguished by tonic and phasic activity. For instance, ACh increases in a sustained fashion for expected unreliability of the environment (Dalley et al., 2001) and is involved in a prolonged state of readiness to respond to rarely and unpredictably occurring signals (Sarter et al., 2001). It could therefore be speculated that unexpected uncertainty would be associated with transient forms of neural activity related to cognitive control, whereas volatility would be associated with more sustained patterns of neural activity in cognitive control brain networks.

#### **ADAPTIVE GAIN THEORY OF LC-NA FUNCTIONING**

The adaptive gain theory of LC-NA functioning suggests that there are at least two distinguishable modes of LC function which drive behavior. In a phasic mode, bursts of LC activity are observed in association with the outcome of decision processes and are closely coupled with behavioral responses that are generally highly accurate. In a tonic mode, however, LC baseline activity is elevated but phasic bursts of activity are absent (Aston-Jones and Cohen, 2005). Interestingly, it has been proposed that the OFC and ACC could drive this LC phasic activity directly, which in turn promotes exploratory or exploitative behavior (Aston-Jones and Cohen, 2005). This may have important implications for the mode in which cognitive control is implemented. For instance, unexpected uncertainty arises from strong violations of predictions that are expected to be correct (Yu and Dayan, 2005). Phasic NA signals have been associated with novelty and changes in S-R-O contingencies (Aston-Jones et al., 1997; Aston-Jones and Cohen, 2005; Yu and Dayan, 2005; Avery et al., 2012). This would fit well with a reactive mode of control, which arises as a consequence of high-conflict events (Braver et al., 2007) that could be caused by strong violations of predictions. Furthermore, this is also linked with the view that unexpected uncertainty is induced by a mismatch between prediction and observation and is signaled phasically with rapid habituation (Yu and Dayan, 2005). Indeed, strong projections from the OFC and ACC to the LC may drive this phasic response, where signals from OFC and ACC augment the LC phasic release of NA, thus improving performance on subsequent trials (Aston-Jones et al., 2002; Aston-Jones and Cohen, 2005). According to the adaptive gain theory, this effect could further contribute to the compensatory increase in control following a transient decrease in performance and/or reward. Indeed, empirical evidence suggests that NA is specifically involved in performance monitoring (Riba et al., 2005). Furthermore, there is substantial evidence for the modulatory influence of NA on cognitive functions that depend on the frontal cortex, particularly selective attention and working-memory tasks (Sara, 2009).

The adaptive gain theory further suggests that signals from ACC to LC (indicating an adverse outcome), possibly complemented by signals from OFC to LC (indicating absence of an expected reward), may augment the LC phasic mode (Aston-Jones and Cohen, 2005). This, in turn, would improve performance on subsequent trials by enhancing the LC phasic release of NA, thus having direct enhancing effects on task-specific control representations in PFC (Aston-Jones and Cohen, 2005). Thus, the process by which conflict detection in the ACC triggers compensatory adjustments in cognitive control may be mediated by LC-NA functioning. This would be consistent with Yu and Dayan's (2005) theoretical framework of NA functioning as a signal for unexpected uncertainty.

Indeed, unexpected uncertainty can be seen as a state signaling the potential need to suppress previous S-R-O rules in order to override these with more adaptive S-R-O contingencies. This requires flexible adaptation of behavior in environments that are changeable. Thus, signaling of NA in response to unexpected uncertainty may be crucially involved in ACC-PFC implementation of cognitive control. Indeed, functional neuroimaging studies investigating uncertainty have uncovered a neural network that has a remarkable overlap with brain networks usually associated with cognitive control tasks. In particular, a network involving lateral PFC areas, parietal cortex and the ACC seems to be consistently activated for decision-making tasks in which volatility and expected forms of uncertainty are manipulated and also in a wide range of classical cognitive control tasks (for a review of the neural correlates of uncertainty and cognitive control see Mushtaq et al., 2011). Therefore, cognitive control, and particularly a reactive mode as indexed by early negativities in the EEG and ACC fluctuations in the BOLD response as well as phasic bursts of NA, may be particularly important for estimating, detecting, and resolving unexpected uncertainty. Alternatively, a proactive control mode characterized by sustained neural activity in the PFC and the P3/LPC complex may be important for successful integration of past outcomes in order to measure the stochasticity of the environment and deal with expected uncertainty. However, it is also likely that stable levels of stochasticity could be learned through automatic processes without the involvement of cognitive control processes. In addition, it is possible that proactive control might also be particularly useful in volatile contexts, where the temporally sustained maintenance and updating of past outcome information in WM might be useful to adapt to a context of frequent S-R-O changes.

In summary, it seems that reactive control could be used following a highly unexpected S-R-O change. However, a proactive mode can be very efficient at dealing with volatility. Therefore unexpected uncertainty and volatility should be differentiated: unexpected uncertainty arises from a single or infrequent unpredicted fundamental change in S-R-O contingency, whereas volatility can be seen as a series of frequent fundamental changes in S-R-O frequencies, and this frequency of changes can itself become predictable. For our example above, a volatile situation is reached when our usual restaurant tends to hire a new chef very often during the year. If customers know this tendency, they will be able to use proactive strategies in order to detect whether a change in the quality of the food is due to a transient change in a more stable pattern (e.g., the usual chef is absent 1 day every week) or whether it reflects a more fundamental change (i.e., the previous chef was fired and replaced by a new one). Therefore how the brain estimates the relative frequency of changes in the environment is crucial. Behrens et al. (2007) suggest that this is reflected by ACC activity. Indeed, the ACC may be able to estimate the rate at which reward contingencies are changing and signal to the PFC to implement a reactive or more proactive mode of control. This likely reflects a highly sophisticated control mechanism which adjusts for suitable changes in the environment as possibly reflected by neuromodulation of ACh and NA mediated by the ACC-PFC.

## **SYNTHESIS AND CONCLUSIONS**

We have reviewed existing empirical and theoretical evidence in order to form a case for considering three distinct forms of uncertainty: expected uncertainty, unexpected uncertainty, and volatility. Whilst expected uncertainty has received much attention in the literature, the latter two forms of uncertainty are less well explored. Nevertheless a growing body of literature is beginning to unravel how the brain deals with unexpected changes in the environment. This is an exciting line of research which is beginning to prove fruitful (Yu and Dayan, 2005; Behrens et al., 2007; Doya, 2008; Krugel et al., 2009; Nassar et al., 2010; Bland and Schaefer, 2011; Nieuwenhuis, 2011; Preuschoff et al., 2011).

However, an explicit distinction between unexpected uncertainty and volatility has yet to be addressed. We have suggested that computational modeling studies provide evidence of how we can deal with unexpected changes in S-R-O contingencies and adjust the learning rate accordingly. However, volatility appears to promote a further computation by representing a "volatility" parameter as a higher-order statistic of the environment (Behrens et al., 2007). Next, the temporal activity of neuromodulators involved in signaling uncertainty may differentiate unexpected uncertainty and volatility. In particular, unexpected uncertainty appears to be signaled by phasic bursts of NA activity, whereas prolonged unexpected uncertainty, i.e., volatility, may recruit a more tonic mode. Finally, these two forms of uncertainty may be differentiated in terms of the involvement of distinct cognitive control modes. It is possible that unexpected changes may be dealt with by a reactive mode of control recruiting conflict detection mechanisms to overcome competing responses in S-R-O contingencies. Alternatively, successful adaptation to volatility may be associated with a proactive and sustained mode of control through the continual maintenance and updating of S-R-O contingencies in WM.

In addition, a number of questions remain open. For instance, it is unclear at this stage whether volatility and unexpected uncertainty are associated with distinct brain networks. The evidence reviewed above about the potential involvement of distinct cognitive processes in these two forms of uncertainty suggests that they could be dissociated in terms of their neural correlates. Further research will be necessary to address this question. A more fundamental question regards the nature of the distinction between volatility and unexpected uncertainty. The main difference between them is the *frequency* of S-R-O changes in a given period of time. This frequency can be manipulated in a gradual, continuous way. However, it can be speculated that systems involved in processing uncertainty should be able to detect a threshold beyond which the processes implemented to deal with the environment will change (e.g., switching from a reactive toward a proactive mode of control). Further research will be needed to test this idea. Finally, although the theoretical avenues considered in this article suggest that volatility and unexpected uncertainty might lead to different modes of cognitive control, and to different neuromodulatory patterns, most of these ideas have yet to be empirically tested.

In summary, this article has reviewed empirical and theoretical evidence for the distinction between three forms of uncertainty, and in particular, it highlighted a distinction between a rare unexpected change (unexpected uncertainty) and a frequently changing environment (volatility). Future research should therefore form a clear distinction between unexpected uncertainty and volatility in order to further explore how we successfully estimate, represent, and resolve these different forms of uncertainty.

## **ACKNOWLEDGMENTS**

Alexandre Schaefer is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 March 2012; accepted: 21 May 2012; published online: 08 June 2012.*

*Citation: Bland AR and Schaefer A (2012) Different varieties of uncertainty in human decision-making. Front. Neurosci. 6:85. doi: 10.3389/fnins.2012.00085*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Bland and Schaefer. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Making predictions in a changing world—inference, uncertainty, and learning

## *Jill X. O'Reilly\**

*Nuffield Department of Clinical Neurosciences, FMRIB Centre, John Radcliffe Hospital, Oxford University, Oxford, UK*

#### *Edited by:*

*Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland*

#### *Reviewed by:*

*Joseph W. Kable, University of Pennsylvania, USA*

*Christian C. Luhmann, Stony Brook University, USA*

#### *\*Correspondence:*

*Jill X. O'Reilly, Nuffield Department of Clinical Neurosciences, FMRIB Centre, John Radcliffe Hospital, University of Oxford, Headington, Oxford OX3 9DU, UK e-mail: joreilly@fmrib.ox.ac.uk*

To function effectively, brains need to make predictions about their environment based on past experience, i.e., they need to learn about their environment. The algorithms by which learning occurs are of interest to neuroscientists, both in their own right (because they exist in the brain) and as a tool to model participants' incomplete knowledge of task parameters and hence, to better understand their behavior. This review focusses on a particular challenge for learning algorithms—how to match the rate at which they learn to the rate of change in the environment, so that they use as much observed data as possible whilst disregarding irrelevant, old observations. To do this algorithms must evaluate whether the environment is changing. We discuss the concepts of likelihood, priors and transition functions, and how these relate to change detection. We review expected and estimation uncertainty, and how these relate to change detection and learning rate. Finally, we consider the neural correlates of uncertainty and learning. We argue that the neural correlates of uncertainty bear a resemblance to neural systems that are active when agents actively explore their environments, suggesting that the mechanisms by which the rate of learning is set may be subject to top down control (in circumstances when agents actively seek new information) as well as bottom up control (by observations that imply change in the environment).

**Keywords: change detection, uncertainty, exploratory behavior, modeling, bayes theorem, learning**

To function efficiently in their environment, agents (humans and animals) need to make predictions. We can think of predictions being based on an internal model of the environment, stored in the brain, which represents information that has been observed, and predicts what will happen in future. The process by which such a model is constructed and updated may be called a learning algorithm. Learning algorithms are of interest to neuroscientists, partly because such algorithms actually exist in the brain (and we would like to understand them) and partly because constructing learning algorithms that model participants' incomplete knowledge of task contingencies can help us to understand their behavior in experimental paradigms.

Whilst all knowledge of the environment is arguably acquired through learning, learning is particularly important in environments that change over time. In this review we are concerned with a particular computational problem that arises in complex changing environments: how should learning algorithms adapt their learning rate to match the rate of change of the environment? We will consider two key concepts in inferring the rate of change: the likelihood function, by which the likelihood that current and past observations were drawn from the same distribution is evaluated, and the prior probability of change, which constrains how much evidence will be required for the learning algorithm to infer that a change has in fact occurred. We will relate these two constructs to the concepts of expected and estimation uncertainty, and consider the interplay between uncertainty and learning. Finally we will consider neural correlates of uncertainty and learning, and ask whether these are the same when learning is driven bottom up by surprising observations, and top down as part of the process of actively exploring the environment.

## **WHY IS CHANGE A CHALLENGE FOR LEARNING ALGORITHMS?**

A learning algorithm is an algorithm that makes use of past experience to construct a representation of the learned-about subject (we will call the learned-about subject "the environment" in this article). The purpose of learning is to predict future observations of the environment and hence respond to them efficiently (Friston and Kiebel, 2009; Friston, 2010). Therefore, to function effectively it is essential that the representation developed by the learning algorithm accurately reflects the *current* state of the environment and/or is predictive of *future* environmental states.

Throughout this review, when I mention a changing environment, I mean an environment that changes to an unknown state. Environments can change in both predictable and unpredictable ways. A *predictably changing* environment would be a changing environment whose state can nevertheless be predicted precisely as a function of time—for example, the phases of the moon. An *unpredictably changing* environment could be defined as an environment that undergoes changes that move it to an unknown state. For example, the location of the TV remote control in a family living room often behaves like this. In terms of this discussion of learning algorithms, we are only really interested in the second type of change—in the first case (an environment which changes, but predictably) there is nothing new to learn.

## **THE KEY CHALLENGE: HOW FAR BACK SHOULD YOU LOOK?**

Given that the changing environment is not totally random over time (in which case learning would be useless), a learning algorithm can make use of a history of data extending beyond the most recently experienced observations, to inform its internal representation of the environment. The more past data that can be validly used to create a representation of the environment, the more accurate the representation is likely to be. However, "validly" is the key word because in a changing environment, the challenge is to decide exactly which data should be used to create an up-to-date representation, and which data are no longer relevant (Doya, 2002; Behrens et al., 2007).

To illustrate the point: in a stationary environment (an environment which does not change over time), all data from the past, no matter how old, could be used to inform an internal representation of the current state of the environment. Therefore, for example, in a stationary environment, the mean of *all* observations would give the most accurate estimate possible of the mean of the underlying distribution (the environment) from which future observations will be drawn.
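As a small illustration of the stationary case (a sketch, not taken from the article), the mean of all observations can be maintained incrementally: each new observation moves the estimate by a fraction 1/i of the prediction error, so the effective learning rate shrinks as data accumulate. The function name `incremental_mean` and the example data are hypothetical.

```python
def incremental_mean(observations):
    """Running mean of all observations so far, updated one at a time.

    Each update uses a learning rate of 1/i, which is what weighting all
    past observations equally amounts to.
    """
    estimate = 0.0
    for i, x in enumerate(observations, start=1):
        learning_rate = 1.0 / i
        estimate += learning_rate * (x - estimate)
    return estimate


data = [2.0, 4.0, 6.0, 8.0]
batch_mean = sum(data) / len(data)      # mean computed from all data at once
online_mean = incremental_mean(data)    # same quantity, built up trial by trial
```

A learning rate that keeps shrinking is appropriate only while the environment stays stationary, which is the limitation the following paragraphs address.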

In contrast, in a changing (non-stationary) environment, it is not true that the distribution of all past observations reflects the underlying distribution in force at any particular time point *i*. On the contrary, in a changing environment there is a need for an additional layer of processing to work out how observations from different times in the past predict future states of the environment. For example, if the environment has undergone an abrupt change, the best solution may be to identify the change point and use all data since that point, disregarding data from prior to the change point. There is a trade-off between using as much data as possible (to increase the accuracy of the representation) and leaving out old data, which may be irrelevant or misleading.

#### **A SIMPLE WAY TO DISCOUNT OLDER DATA: DECAY KERNELS**

Firstly, to illustrate the problems associated with adjusting to the rate of change of the environment, we will consider a simple but non-adaptive strategy for discounting old data: namely to discount or down-weight older observations. For example, an estimate of the mean of the underlying distribution at time point *i* could be based on a running average of the last *n* observations (*i* − *n*: *i*), or a kernel-based average where observations (*i* − *n*: *i*) are averaged using a weighting function which down-weights older observations (see **Figure 1**, left hand panels).
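The two fixed strategies just described can be sketched as follows. This is a minimal illustration rather than the code used for Figure 1; the function names are hypothetical, and the exponential weighting exp(-j/n) is the form given in the Figure 1 caption.

```python
import math


def running_mean(observations, n):
    """Mean of the most recent n observations (n >= 1)."""
    window = observations[-n:]
    return sum(window) / len(window)


def kernel_mean(observations, n):
    """Weighted mean in which the observation j steps back gets weight exp(-j/n)."""
    newest_first = observations[::-1]           # j = 0 is the latest observation
    weights = [math.exp(-j / n) for j in range(len(newest_first))]
    weighted_sum = sum(w * x for w, x in zip(weights, newest_first))
    return weighted_sum / sum(weights)
```

In both functions `n` sets how far back the estimate looks, and it stays fixed however the environment behaves.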

This simple, fixed kernel approach is easy to implement in data analysis, and one can imagine how it could be implemented simply in a neural network: Incoming observations each activate a set of neural nodes which represent them (for example, in a spatial map, nodes with spatial receptive fields in which stimuli appear would be activated by these stimuli); activation in the nodes decays gradually over time so more recently activated nodes contribute more to the total activity within the system, as in a "leaky accumulator" model (Usher and McClelland, 2001). This can be achieved using a single-layer neural network (Bogacz et al., 2006).

However, algorithms such as the kernel-based approach just described, which discount old data at a fixed rate rather than adjusting their parameters dynamically to account for periods of faster and slower change, perform poorly in environments in which the relevance of old data does not decay as a simple function of time (**Figure 1**). If the environment has periods of more- and less-rapid change, the ideal solution is to adjust the range of data that are used to inform the model over time, in accordance with how far into the past data are still relevant.

As an extreme example, consider an environment that has periods of stationarity interspersed with sudden changes (as in **Figure 1**). An algorithm that discounts older observations based solely on their age, like the simple fixed kernels described above, applies the same down-weighting to a past observation *i* − *n* regardless of whether a change has occurred since that observation, or not. If in fact a change has occurred since *i* − *n*, then the best solution would be to treat observations from before the change differently from those made since the change. On the other hand, during periods of stability, the best solution would be to use as many old observations as possible, not to arbitrarily disregard observations on the basis of age.
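The trade-off can be made concrete with a toy comparison of a short and a long exponential kernel. This is a sketch under simplifying assumptions: the data are made up, the "noise" in the stable period is a deterministic alternation around the true mean, and the helper `kernel_mean` is a hypothetical name.

```python
import math


def kernel_mean(observations, n):
    """Weighted mean with weight exp(-j/n) on the observation j steps back."""
    newest_first = observations[::-1]
    weights = [math.exp(-j / n) for j in range(len(newest_first))]
    return sum(w * x for w, x in zip(weights, newest_first)) / sum(weights)


# Stable period: the true mean is 0 and observations alternate around it.
stable = [1.0, -1.0] * 25
# Abrupt change: the mean jumps from 0 to 10 (noise omitted for clarity);
# the estimate is inspected five observations after the jump.
after_jump = [0.0] * 50 + [10.0] * 5

for n in (2, 20):
    noise_error = abs(kernel_mean(stable, n) - 0.0)
    lag_error = abs(kernel_mean(after_jump, n) - 10.0)
    print(f"n={n:2d}  error in stable period={noise_error:.2f}  "
          f"error after jump={lag_error:.2f}")
```

The short kernel is pulled around by individual observations during stability, while the long kernel is slow to abandon the pre-change mean; no single fixed `n` does well on both.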

To implement a solution in which the range of data adjusts to changes in the rate of change of the environment over time, a learning algorithm would need some mechanisms by which to evaluate the rate of change of the environment. How can this be achieved?

## **ESTIMATING THE PROBABILITY OF CHANGE**

Consider a clear case in which not all past data are equally relevant: an environment which undergoes abrupt changes, interspersed with periods of stationarity (periods without change) as in **Figure 1**. How can a learning algorithm effectively disregard observations from before an abrupt change, whilst using as much data as possible during stable periods? To do this, the learning algorithm needs to be able to infer the rate of change of the environment from the data it observes (Courville et al., 2006; Behrens et al., 2007; Wilson et al., 2010; Wilson and Niv, 2011).

In order to determine the rate of change of the environment, a learning algorithm needs to balance two considerations. Firstly, how unlikely was it that current observations were drawn from the same distribution (the same state of the environment) as previous observations? Secondly, how likely are change points themselves?—If I thought change points occurred on average about every 10,000 trials, I would need more evidence to infer a change than if I thought change points occurred on average every 10 trials (Wilson et al., 2010). We will now consider how these two considerations can be formalized.

## **INFERRING CHANGE I: THE LIKELIHOOD FUNCTION**

Let's start with the first of our two considerations: How unlikely was it that a given observation was drawn from the same distribution as previous observations? Consider a very simple learning task in which on each trial $i$, a target appears at some location across space, $x_i$. The location is drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, such that $x_i \sim N(\mu, \sigma^2)$.

Now let's say we observe a data point $x_i$, and we want to know from what distribution this data point was drawn. In particular, we want to know whether this data point $x_i$ was drawn from the same distribution as previous data points, or whether a change in the environment has occurred, such that the current parameters $\mu_i, \sigma_i^2$ are not equal to previous parameters from some putative pre-change point, $\mu_{i-n}, \sigma_{i-n}^2$.

**FIGURE 1 | Algorithms with a fixed temporal discount do not fit well to environments with a variable rate of change.** The right-hand panels illustrate an environment in which observations are drawn from a Gaussian distribution; each row shows a different learning algorithm's estimate of the distribution mean $\mu$. The mean $\mu$, which has periods of stability interspersed with sudden change, is shown in black. Actual observations $x$ are shown in gray. Estimates of $\mu$ are shown in blue. The top three rows are kernel-based learning algorithms with different time constants. The left-hand panels illustrate the three weighting functions (kernels) which were used to determine the weighting of observations in the panels next to them. The weighting $w(j)$ assigned to observation $i - j$ when calculating the mean $\mu(i)$ on observation $i$ is defined by the exponential function $w(j) = \exp(-j/n)$. The rate of decay is determined by the constant $n$, with higher values of $n$ meaning a longer period of the past is used. The top row shows a kernel using only very recent observations. This tracks the mean $\mu$ well, but jumps around a lot with individual observations. Note the blue line tracks the gray (data) line more closely than it tracks the actual mean $\mu$ (black line). The 2nd and 3rd rows show kernels using longer periods of the past. This gives a much smoother estimate, but is slow to adjust to changes in $\mu$. The bottom row shows the output of a Bayesian learning algorithm that includes an additional level of processing in order to detect change points. Note how, unlike the kernel-based algorithms, its estimate is stable during periods of stability *and* changes rapidly in response to change in the underlying distribution.

Statisticians would talk about this problem in terms of *probability* and *likelihood*. We can calculate the *probability* that a certain observation (value of $x_i$) would occur, given some generative distribution $x_i \sim N(\mu, \sigma^2)$, where the values of the parameters $\mu, \sigma^2$ are specified (for example, the probability of observing a value of $x_i > 3$ given that $\mu = 0$ and $\sigma^2 = 1$ is obtained from the cumulative distribution function of the standard Normal distribution, as $p \approx 0.001$). Conversely, we can think about the *likelihood* that the underlying distribution has certain parameters (the likelihood that $\mu, \sigma^2$ take certain values), given that we have observed a certain value of $x_i$. The likelihood of some values of $\mu, \sigma^2$ given an observation $x_i$ can be written as $p(\mu, \sigma^2 \mid x_i)$; conversely the probability of some observation $x_i$ given certain parameters of the environment $\mu, \sigma^2$ can be written $p(x_i \mid \mu, \sigma^2)$. The two quantities are closely related (here $p(\mu, \sigma^2 \mid x_i)$ denotes the likelihood, viewed as a function of the parameters for a fixed observation, rather than a normalized probability distribution over the parameters):

$$p(\mu, \sigma^2 \mid x_i) = p(x_i \mid \mu, \sigma^2) \tag{1}$$

This relationship gives us a clear way to evaluate whether a change point has occurred: given some hypothesis about the parameters of the environment $\mu, \sigma^2$ that were in force prior to a putative change point, we can calculate the probability that an observation or set of observations made after the putative change point would have been observed given the pre-putative-change parameters of the environment, and hence calculate the likelihood that the pre-change parameters are in fact still in force (or conversely, the likelihood that a change point has occurred).
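To make the distinction between probability and likelihood concrete, the following sketch computes both for the Gaussian example. The function names and the candidate post-change mean of 3 are assumptions made for illustration only.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def upper_tail(x, mu, sigma2):
    """Probability of observing a value greater than x under N(mu, sigma2)."""
    z = (x - mu) / math.sqrt(sigma2)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Probability of an observation above 3 when mu = 0 and sigma^2 = 1.
p_tail = upper_tail(3.0, 0.0, 1.0)

# Likelihood of two candidate means given the single observation x_i = 3:
# the "old" mean 0 and a hypothetical post-change mean 3.
x_i = 3.0
lik_old = normal_pdf(x_i, 0.0, 1.0)
lik_new = normal_pdf(x_i, 3.0, 1.0)
likelihood_ratio = lik_new / lik_old   # how much better the new mean explains x_i
```

The same density function serves both purposes: with the parameters held fixed it describes observations, and with the observation held fixed it scores candidate parameters.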

It is worth noting that the likelihood function $p(\mu, \sigma^2 \mid x_i)$, or more generally $p(\text{parameters} \mid \text{observations})$, can only be obtained in this way if the shape of the distribution from which observations are drawn is specified: we cannot estimate the parameters of a distribution if we do not know how that distribution is parameterized. The validity of pre-specifying the form of the generative distribution has been debated extensively throughout the twentieth century (McGrayne, 2011) and we will not rehash that debate here. We will simply note that whilst a wrong choice of distribution could lead to incorrect inferences, in practice it is often possible to make an informed guess about the distribution from which data are drawn, partly by applying prior experience with similar systems, and partly because certain types of observations follow certain distributions; for example, binary events can often be modeled using a binomial distribution.

#### **INFERRING CHANGE II: PRIOR PROBABILITY OF CHANGE AND THE TRANSITION FUNCTION**

Now let's address the second consideration for algorithms that adapt to the rate of change of the environment: the question of how likely change points themselves are, and the probability a-priori of particular transitions in the parameters of the environment.

We have already noted that, intuitively, an observer who believes change is improbable a-priori (for example, if the observer thinks that a change occurs only every 10,000 observations) should demand a higher level of evidence in order to conclude that a change has occurred, compared to an observer who believes change is frequent in his environment (e.g., if the observer thinks the environment changes about once every 10 trials). Furthermore, different environments can change in different ways over time—for example, in some environments the parameters might change smoothly, whilst other environments might change abruptly.

A function that models how the state of the environment evolves over time is called the *transition function* (Courville et al., 2006). A transition function defines how the state of the environment on trial $i$ depends on its state on previous trials. In the Gaussian example, the transition function specifies how the true parameters of the environment on trial $i$, that is $\mu_i, \sigma_i^2$, depend on the true parameters of the environment on previous trials, $\mu_{1:i-1}, \sigma^2_{1:i-1}$.

Different transition functions represent different models of how the environment changes over time. For example, we could specify that the parameters of the environment vary smoothly over time, such that $\mu_i = \mu_{i-1} + \delta\mu$ where $\delta\mu$ is small compared to $\mu$. Alternatively, we could allow the parameters of the environment to jump to totally new values after a change point, for example by specifying:

$$\{\mu_i, \sigma_i^2\} = \begin{cases} \{\mu_{i-1}, \sigma_{i-1}^2\} & \text{if } J = 0\\ \text{random} & \text{if } J = 1 \end{cases} \tag{2}$$

*...* where $J$ is a binary variable indicating whether a change occurs on a given observation ($J = 1$) or not ($J = 0$), e.g., $J$ follows a binomial distribution with a single trial and probability 0.1, giving a probability of 0.1 of a change on any given observation.
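A jump-type transition function of this kind is straightforward to simulate. The sketch below makes several assumptions that are not in the text: only the mean jumps (the variance stays fixed), new means are drawn uniformly from an arbitrary range of -10 to 10, and the function name `simulate_jump_environment` is hypothetical.

```python
import random


def simulate_jump_environment(n_trials, q, sigma, seed=0):
    """Simulate a Gaussian environment whose mean occasionally jumps.

    On each trial a change occurs with probability q; after a change the mean
    is redrawn uniformly from [-10, 10] (an arbitrary illustrative range).
    Returns the list of true means and the list of observations.
    """
    rng = random.Random(seed)
    mu = rng.uniform(-10.0, 10.0)
    means, observations = [], []
    for _ in range(n_trials):
        if rng.random() < q:           # change point: jump to a new state
            mu = rng.uniform(-10.0, 10.0)
        means.append(mu)
        observations.append(rng.gauss(mu, sigma))
    return means, observations


means, observations = simulate_jump_environment(n_trials=200, q=0.1, sigma=1.0)
n_changes = sum(1 for a, b in zip(means, means[1:]) if a != b)
```

Setting `q` close to 0 gives a nearly stationary environment, while larger values give progressively more frequent change points.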

Both the form of the transition function (e.g., smooth change vs. jumps) and its parameters (e.g., the probability of a jump or the rate of smooth transition) are used to evaluate whether a change in the environment has occurred—models with transition functions specifying faster rates of change or higher probabilities of jumps in the parameters of the environment should infer change more readily than models that have low a-priori expectations of change.
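The influence of the a-priori change probability can be illustrated with a single observation. In this sketch the post-change observation is treated as uniformly distributed over a fixed range, which is a crude assumption made only to keep the example short; the function names and numerical values are likewise hypothetical.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def prob_change(x, mu_old, sigma2, q, lo=-10.0, hi=10.0):
    """Posterior probability that a change occurred, given one observation x.

    Without a change, x is drawn from N(mu_old, sigma2). After a change, the
    observation is treated as uniform on [lo, hi], a deliberately crude
    stand-in for "somewhere new". q is the prior probability of a change.
    """
    evidence_no_change = (1.0 - q) * normal_pdf(x, mu_old, sigma2)
    evidence_change = q * (1.0 / (hi - lo))
    return evidence_change / (evidence_change + evidence_no_change)


x = 3.0  # an observation three standard deviations from the old mean
p_frequent = prob_change(x, mu_old=0.0, sigma2=1.0, q=1 / 10)      # change every ~10 trials
p_rare = prob_change(x, mu_old=0.0, sigma2=1.0, q=1 / 10_000)      # change every ~10,000 trials
```

Under these assumptions the same observation makes a change more likely than not for the observer who expects frequent change, but leaves the observer who expects rare change almost unmoved.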

#### **BAYES' THEOREM AND CHANGE DETECTION**

We have seen that for a learning algorithm to adapt to the rate of change in the environment, it needs to evaluate both the likelihood of different states or parameters of the environment given the data, and the probability of change points themselves. These two elements are captured elegantly in Bayes' rule, which in this case can be written:

$$p(\theta_i \mid x_{1:i}) \propto p(x_i \mid \theta_i)\, p(\theta_i) \tag{3}$$

*...* where $\theta_i$ represents the parameters of the environment on the current trial $i$ (that is, $\mu_i, \sigma_i^2$ in our Gaussian example), and $x_{1:i}$ are the observations on all trials up to and including the present one.

On the right hand side, $p(x_i \mid \theta_i)$ is equal to $p(\theta_i \mid x_i)$, the likelihood function, due to Equation 1 above; $p(\theta_i)$, the prior probability of the parameters $\theta_i$, can be thought of as $p(\theta_i \mid x_{1:i-1})$ and is obtained from the estimate of the parameters of the environment on trial $i - 1$ via the transition function. For example, if we model a transition function as in Equation 2, so that the parameters of the environment mostly stay the same from one trial to the next but can jump to totally new values with some probability $q$, then

$$p(\theta_i) = (1 - q)\, p(\theta_i \mid x_{1:i-1}) + q\, U(\theta) \tag{4}$$

*...* where $p(\theta_i \mid x_{1:i-1})$ is the probability that the parameters $\theta_i$ took some values given all previous observations $x_{1:i-1}$, and $U(\theta)$ is a uniform probability distribution over all possible new values of $\theta$, if there had been a change point.

Bayes' rule expresses a general concept about how an observer's beliefs should be updated in light of new observations (for example, whether observations indicate a change in the underlying environment); it expresses the idea that the degree to which the observer should change his beliefs depends on both the likelihood that previously established parameters are still in force, and the transition function or change-point probability. Hence Bayes' rule captures the two considerations we have argued are important for algorithms that respond adaptably to the rate of change of the environment.
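Equations 3 and 4 can be turned into a small grid-based learner. The sketch below is one possible reading rather than the model used for Figure 1: it assumes the variance is known, tracks only the mean on a discrete grid, and interprets $p(\theta_i \mid x_{1:i-1})$ in Equation 4 as the posterior carried over from the previous trial. All names and numerical settings are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def bayes_changepoint_means(observations, q=0.05, sigma2=1.0):
    """Track the posterior mean of mu on a grid, allowing for jumps.

    Prior on each trial (Equation 4): (1 - q) * previous posterior + q * uniform.
    Posterior (Equation 3): proportional to likelihood * prior.
    sigma2 is assumed known to keep the example one-dimensional.
    """
    grid = [-10.0 + 0.1 * k for k in range(201)]       # candidate values of mu
    uniform = [1.0 / len(grid)] * len(grid)
    posterior = uniform[:]                              # no knowledge at the start
    estimates = []
    for x in observations:
        prior = [(1.0 - q) * p + q * u for p, u in zip(posterior, uniform)]
        unnorm = [normal_pdf(x, mu, sigma2) * p for mu, p in zip(grid, prior)]
        total = sum(unnorm)
        posterior = [u / total for u in unnorm]
        estimates.append(sum(mu * p for mu, p in zip(grid, posterior)))
    return estimates


# Twenty observations at 0, then the environment jumps to 5.
estimates = bayes_changepoint_means([0.0] * 20 + [5.0] * 5)
before_jump = estimates[19]
after_one_new_observation = estimates[20]
```

With these settings the estimate sits near 0 throughout the stable period and moves most of the way to 5 after a single post-change observation, because that observation is far more probable under the "jump" part of the prior than under the old parameters.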

Because these considerations relate so closely to Bayes' theorem, it could be argued that any change-detection model that considers the likelihood that old parameters are still in force, and the prior probability of different parameter values (for example based on a transition function) is Bayesian in nature.

### **UNCERTAINTY AND LEARNING**

In this review we are interested in how learning algorithms adapt to change. A key concept in relation to learning and change is uncertainty. There is a natural relationship between uncertainty and learning in that it is generally true that the purpose of learning is to reduce uncertainty, and conversely, the level of uncertainty about the environment determines how much can be learned (Pearce and Hall, 1980; Dayan and Long, 1998; Dayan et al., 2000). We will now see that two types of uncertainty, *expected uncertainty* and *estimation uncertainty*, which can be loosely related to the concepts of likelihood and transition function just discussed, play different roles in learning and may have distinct neural representations.

#### **TYPES OF UNCERTAINTY**

Uncertainty can be divided into two constructs—*risk* or expected uncertainty, and *ambiguity* or estimation uncertainty (Knight, 1921; Dayan and Long, 1998; Courville et al., 2006; Preuschoff and Bossaerts, 2007; Payzan-Lenestour and Bossaerts, 2011).

*Risk* or *expected uncertainty* refers to the uncertainty which arises from the stochasticity inherent in the environment. For example, even if an observer knew with certainty that observations were drawn from some Gaussian distribution $x \sim N(\mu, \sigma^2)$, with known parameter values (known values of $\mu$, $\sigma^2$), he would still not be able to predict with certainty the value of the next observation $x_{i+1}$, because observations are drawn stochastically from a (known) distribution with some variance, $\sigma^2$. Thus, $\sigma^2$ determines the level of expected uncertainty in this environment.

In contrast, uncertainty that arises from the observer's incomplete knowledge of the environment (in our Gaussian example, uncertainty about the values of $\mu, \sigma^2$ themselves) is called *estimation uncertainty* or ambiguity (Knight, 1921). Estimation uncertainty is the type of uncertainty that may be reduced by obtaining information, e.g., by increasing the number of observations of the environment. Estimation uncertainty generally increases when the environment is thought to have changed to a new state (since relatively few observations of the new state are available).

Expected uncertainty and estimation uncertainty relate to the two factors we previously discussed in relation to change detection: the likelihood that the same state of the environment is in force now as previously, and the a-priori probability that the state of the environment is not what the observer had previously thought (determined in part by the transition function).

Expected uncertainty affects inferences about the likelihood that the same state of the environment is in force now as previously, because given some observation *x<sub>i</sub>*, the strength of evidence for a change in the environment depends not only on how far *x<sub>i</sub>* falls from the expected value E(*x*) but also on the estimated variance of the distribution from which *x* is drawn. For example, in our Gaussian learning model, for some putative μ, the probability of an observation *x<sub>i</sub>*, and hence the likelihood that the model parameters μ, σ<sup>2</sup> take a given value, depends both on the distance of the observation from the putative model mean, *x<sub>i</sub>* − μ, and on the level of expected uncertainty within the environment, σ<sup>2</sup>: if expected uncertainty (σ<sup>2</sup>) is low, then a given value of (*x<sub>i</sub>* − μ) represents stronger evidence against μ, σ<sup>2</sup> still being in force than if expected uncertainty (σ<sup>2</sup>) were high. This concept is illustrated in **Figure 2**.
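As a small numerical sketch of this point, the snippet below evaluates the Gaussian density at a point near the mean and a point far from it, under a low-variance and a high-variance distribution with the same mean. The variances and the points `a` and `b` are illustrative choices, not values from the article.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


mu = 0.0
low_var, high_var = 1.0, 9.0  # low vs. high expected uncertainty (assumed values)
a, b = 0.5, 4.0               # a lies near the mean, b lies far from it

for label, x in (("a", a), ("b", b)):
    p_low = gaussian_pdf(x, mu, low_var)
    p_high = gaussian_pdf(x, mu, high_var)
    print(f"x = {label}: p(x | low variance) = {p_low:.5f}, "
          f"p(x | high variance) = {p_high:.5f}")
```

Near the mean the low-variance distribution assigns the higher density, while far from the mean the ordering reverses, so the same distant observation is much more surprising when expected uncertainty is low.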

Estimation uncertainty, in contrast, relates more closely to the idea of assessing the a-priori probability of change in the environment. Firstly, the strength of belief in any particular *past state* of the environment affects estimation uncertainty: intuitively, if the observer is not sure about the state of the environment, he may be more willing to adjust his beliefs. Secondly, beliefs about the rate or frequency of change in the environment (i.e., about the transition function) affect estimation uncertainty, because if the observer believes the rate of change of the environment to be high, then the extrapolation of past beliefs to predictions about the future state of the environment is more uncertain. These concepts are illustrated in **Figures 3**, **4**.

**FIGURE 2 | Relationship between the concepts of Expected Uncertainty and Likelihood.** Plot of values of some observed variable *x* against their probability, given two Gaussian distributions with the same mean. The red distribution has a lower variance, and hence lower expected uncertainty, than the blue distribution. Points *a* and *b* represent possible observed values of *x*. For the red and blue distributions, the distance from the mean (*a* − μ) is the same, but at *a*, the red distribution has higher likelihood (because point *a* has a higher probability under the red distribution than the blue distribution) whilst at point *b*, the blue distribution has a higher likelihood. Consider an algorithm assessing evidence that the environment has changed. If a datapoint *x* = *b* is observed, whether the algorithm infers that there has been a change will depend on the variance or expected uncertainty of the putative pre-change distribution. If the algorithm "thinks" that the red distribution is in force, an observation *x* = *b* is relatively strong evidence for a change in the environment (as *b* is unlikely under the red distribution) but if the algorithm "thinks" the blue distribution is in force, the evidence for change is much weaker, since point *b* is not so unlikely under the blue distribution as it is under the red distribution.

In order to illustrate how the effects of expected and estimation uncertainty on change point detection translate into an influence on learning rate, we can consider a model which observes a series of data points from a Gaussian distribution and uses these sequentially to infer the parameters of that distribution, whilst taking into account the possibility that those parameters have jumped to new values, as in Equation 2. Details of this model are given in the Appendix and its "behaviour" is illustrated in **Figure 5**.

In **Figure 2** we saw that when expected uncertainty is high, the deviation of an observed value or set of values from the distribution mean needs to be higher, to offer the same weight of evidence for a change in the underlying model parameters, compared to when expected uncertainty is low. In the case of our Gaussian target locations example, this would mean that when σ<sup>2</sup> is believed to be high, a given deviation of a sample from the mean (*x* − μ) is weaker evidence for change, compared to when the estimate of σ<sup>2</sup> is low. In terms of a learning algorithm, this is illustrated in panels **(A)** and **(B)** of **Figure 6**. Panel **(A)** shows a case where the true mean of the generative distribution changes when σ<sup>2</sup> is thought to be high (so expected uncertainty is high). Panel **(B)** shows a change of similar magnitude in the generative mean, when σ<sup>2</sup> is thought to be low. The model adapts much more quickly to the change in the distribution mean in the case with lower expected uncertainty.

**FIGURE 3 | Illustration of estimation uncertainty.** These plots show the output of a numerical Bayesian estimation of the parameters of a Gaussian distribution. If *x* ∼ *N*(μ, σ<sup>2</sup>), and some values of *x* are observed, the likelihood of different values for μ, σ<sup>2</sup> can be calculated jointly using Bayes' rule. The colored plots (**Left panel**) show the joint likelihood for different pairs of values μ, σ<sup>2</sup>, where each point on the colored image is a possible pair of values μ, σ<sup>2</sup>, and the color represents the likelihood of that pair of values. The line plots (**Right panel**) show the distribution across *x* implied by different values of μ, σ<sup>2</sup>. The dashed black line is the true distribution from which data were drawn. The blue line is the maximum a-posteriori distribution: a Gaussian distribution with values of μ, σ<sup>2</sup> taken from the peak of the joint distribution over μ, σ<sup>2</sup> shown on the left. The red line represents a weighted sum (W.S.) of the Gaussian distributions represented by all possible values of μ, σ<sup>2</sup>, weighted by their joint likelihood as shown in the figure to the left. The top row represents an estimate of the environment based on fewer data points than the bottom row. With relatively few data points, there is a lot of uncertainty about the values of μ, σ<sup>2</sup>, i.e., estimation uncertainty, illustrated by the broader distribution of likelihood over different possible values of μ, σ<sup>2</sup> (**Left panel**) in the top than in the bottom row. Whilst the maximum a-posteriori distribution is a good fit to the "true" distribution from which data were drawn in both cases, if we look at the weighted sum of all distributions, there is a lot more uncertainty for the top row case, based on fewer data points. Hence if the observer uses a weighted sum of all possible values of μ, σ<sup>2</sup> of the environment to calculate a probability distribution over *x*, the variance of that distribution depends on the level of estimation uncertainty.

In contrast, we have argued that the level of estimation uncertainty or ambiguity is more closely related to the second consideration, the probability of change itself. Consider the process by which probability densities over the model parameters are updated in our Bayesian learning model. A-priori (before a certain data point *x<sub>i</sub>* is observed), if the probability of change is believed to be high, estimation uncertainty over the parameters μ and σ<sup>2</sup> is also high; this is the effect illustrated in **Figure 6**. Conversely, a-posteriori (after a data point or data points are observed), estimation uncertainty is increased if evidence for a change-point is observed (i.e., a data point or set of data points which are relatively unlikely given the putative current state of the environment) (Dayan and Long, 1998; Courville et al., 2006). We can see this in **Figure 7**. As the model starts to suspect that the parameters of the environment have changed, the spread of probability density across parameter space (i.e., estimation uncertainty) increases. As more data are observed from the new distribution, the estimate of the new parameters of the environment improves, and estimation uncertainty decreases. Hence estimation uncertainty is related both to the a-priori expectation of change and to the a-posteriori probability that a change may have occurred.
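The interplay between the prior probability of change and the evidence carried by a single observation can be sketched as a two-component mixture, in the spirit of **Figure 4**. This is a deliberately reduced illustration: it assumes the current state is summarised by point estimates of μ and σ<sup>2</sup> (rather than a full posterior) and that a change makes *x* uniform on a hypothetical range [`x_min`, `x_max`]. The function name `p_change` and all numerical values are assumptions made for this example.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def p_change(x, mu, sigma2, q, x_min, x_max):
    """Posterior probability of a change, given observation x.

    q is the prior probability of change; after a change, x is assumed
    to be uniform on [x_min, x_max].
    """
    no_change = (1.0 - q) * gaussian_pdf(x, mu, sigma2)
    change = q / (x_max - x_min) if x_min <= x <= x_max else 0.0
    return change / (change + no_change)


x, mu = 4.0, 0.0
for q in (0.05, 0.25):
    for sigma2 in (1.0, 9.0):
        p = p_change(x, mu, sigma2, q, -10.0, 10.0)
        print(f"q = {q:.2f}, sigma^2 = {sigma2:.0f}: P(change | x) = {p:.3f}")
```

For the same observation, the inferred probability of change rises both when the prior probability of change *q* is higher and when expected uncertainty σ<sup>2</sup> is lower.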

**FIGURE 4 | Two considerations for evaluating whether a change has occurred.** Plots show the probability of observing some value of *x*, given that *x* ∼ *N*(μ, σ<sup>2</sup>) and the values of μ, σ<sup>2</sup> can jump to new, unpredicted values as defined in Equation 2. When an observation of the environment is made, an algorithm that aims to determine whether a change has occurred should consider both the likelihood of the previous model of the environment given the new data, and the prior probability of change as determined in part by the transition function. **Top panel:** the probability of an observation taking a value *x* is shown in terms of two distributions. A Gaussian shown in blue represents the probability density across *x* if the most likely state of the environment (the most likely values of μ, σ<sup>2</sup>), given past data, were still in force. The uniform distribution in red represents the probability density across *x* arising from all the possible new states of the environment, if a change occurred. The possible new states are represented by a uniform function (red line in the figure) because, if we consider the probability of each value of *x* under an infinite number of possible states at once (i.e., the value of *x* given each of infinitely many other possible values of μ and σ<sup>2</sup>), the outcome is a uniform distribution over *x*. A change should be inferred if an observation occurs in the gray shaded regions, where the probability of *x* under the uniform (representing change) is higher than the probability under the prior Gaussian distribution. Hence the red data point in **Figure 4** should cause the system to infer a change has occurred, whereas the blue data point should not. **Bottom panel:** as above, the probability distribution over *x* is a combination of a Gaussian and a Uniform distribution (representing the most likely parameters of the environment if there has been no change, and the possible new states of the environment if there has been a change, respectively). In this panel, the Gaussian and Uniform components are summed to give a single line representing the distribution over *x*. The different colored lines represent different prior probabilities of change, and hence different relative weightings of the Gaussian and uniform components. Increasing the prior probability of change results in a wider distribution of probability density across all possible values of *x*.

The role of estimation uncertainty in determining how much can be learned can be related to concepts in both Bayesian theory (Behrens et al., 2007) and classic associative learning theory (Pearce and Hall, 1980): in the terminology of classical conditioning, estimation uncertainty can be equated with *associability* (Dayan and Long, 1998; Dayan et al., 2000), a term in formal learning theory which defines how much can be learned about a given stimulus, where the amount that can be learned is inversely related to how much is already known about the stimulus (Pearce and Hall, 1980). Low estimation uncertainty means low associability, which means minimal learning. Similarly, estimation uncertainty relates to the *learning rate*, α in the Rescorla–Wagner model of reinforcement learning (Rescorla and Wagner, 1972; Behrens et al., 2007), because higher estimation uncertainty is associated with faster learning.
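The link between learning rate and speed of adaptation can be made concrete with a minimal delta-rule sketch. It assumes the simplest single-cue form of the Rescorla–Wagner update, *V* ← *V* + α(*r* − *V*), with a fixed α; the outcome sequence and the two α values are illustrative only.

```python
def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Track a value estimate with a fixed learning rate alpha (single-cue delta rule)."""
    v = v0
    trace = []
    for r in outcomes:
        v += alpha * (r - v)  # move the estimate towards the outcome by a fraction alpha
        trace.append(v)
    return trace


# The outcome switches from 0 to 1 halfway through (an unsignalled change)
outcomes = [0.0] * 10 + [1.0] * 10
slow = rescorla_wagner(outcomes, alpha=0.1)
fast = rescorla_wagner(outcomes, alpha=0.5)

print(f"estimate 10 trials after the change: "
      f"alpha = 0.1 -> {slow[-1]:.3f}, alpha = 0.5 -> {fast[-1]:.3f}")
```

With the higher learning rate the estimate has almost fully caught up with the new outcome after ten trials, whereas with the lower rate it still lags well behind.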

**FIGURE 6 | Learning is faster when expected uncertainty is low.** Panels **(A)** and **(B)** show two sets of trials which include changes of similar magnitude in the mean of the generative distribution (distribution from which data were in fact drawn). In panel **(A)**, the estimate of σ<sub>*i*</sub> is high (high expected uncertainty) but in panel **(B)**, the estimate of σ<sub>*i*</sub> is lower; this is indicated by the distribution of probability density from left to right in the colored parameter-space maps, and also the width of the shaded area μ ± σ on the lower plot. The red boxes indicate the set of trials shown in the parameter space maps; the red arrow shows which parameter space map corresponds to the first trial after the change point. Note that the distribution of probability in parameter space changes more slowly when expected uncertainty is high (panel **A**), indicating that learning is slower in this case.

**FIGURE 7 | Change in the environment increases estimation uncertainty.** Here we see a set of trials during which a change point occurs (change point indicated by red arrow). Before the change point, the model has low estimation uncertainty (probability density is very concentrated in a small part of parameter space, as seen from the first three parameter space maps). When the change point is detected, estimation uncertainty increases as the model initially has only one data point on which to base its estimate of the new parameters of the distribution. Over the next few trials, estimation uncertainty decreases (probability density becomes concentrated in a smaller part of parameter space again).

#### **TOP DOWN CONTROL OF ESTIMATION UNCERTAINTY?**

In a stable environment, estimation uncertainty (uncertainty about the parameters of the environment) generally decreases over time, as more and more observations are made that are consistent with a particular state of the environment. Indeed it has been argued that the main goal of a self-organizing system like the brain is to reduce surprise by improving the match between its internal representations of the environment and the environment itself (Friston and Kiebel, 2009; Friston, 2010), i.e., to reduce estimation uncertainty as well as estimation error.

Whilst additional observations of the environment tend to decrease estimation uncertainty, estimation uncertainty is driven up by observations that suggest a change may have occurred in the environment: surprising stimuli are associated with increases in the learning rate (Courville et al., 2006). We might think of this as bottom-up or data-driven control of the level of estimation uncertainty in the model, or equivalently the learning rate, or the prior expectation of change.

However, it is also possible to imagine situations in which it might be advantageous to control estimation uncertainty (or the learning rate) top down instead of bottom up—i.e., to *actively* increase the learning rate in order to "make space" for new information about the environment. One such situation would be when an observer is actively exploring his environment and hence presumably wishes to adapt his internal model of the environment to take into account the new information obtained by exploring. Indeed, change of context (moving an animal from one location to another) is associated with increased learning rate in experimental animals (Lovibond et al., 1984; Hall and Channell, 1985; McLaren et al., 1994).

## **NEURAL REPRESENTATIONS OF ESTIMATION UNCERTAINTY AND LEARNING RATE**

A common set of neural phenomena is associated with the rate of learning, processing of stimuli that could indicate a change in the environment, and active exploration of the environment; these phenomena could be conceptualized computationally in terms of control of the level of estimation uncertainty in the brain's models of the environment.

Neuroanatomically, an area of particular interest in relation to estimation uncertainty is the anterior cingulate cortex (ACC). Activity in the ACC has been shown to correlate with learning rate such that, when the environment changes frequently and observers learn quickly about change (i.e., conditions of high estimation uncertainty), the ACC is more active (Behrens et al., 2007). The ACC is also activated when people receive feedback about their actions or beliefs that causes them to modify their behavior on future trials (and by implication, to modify their internal model of the environment) (Debener et al., 2005; Cohen and Ranganath, 2007; Matsumoto et al., 2007); this activity, which has been observed using fMRI and electrophysiological recordings, is probably the source of the error- or feedback-related negativity (ERN; Debener et al., 2005).

Interestingly, ACC activity may be more closely related to the *forgetting* of old beliefs about the environment (and hence the increasing of estimation uncertainty), than to new learning. In a particularly relevant study, Karlsson et al. (2012) showed that in rats performing a two-alternative probabilistic learning task, patterns of activity in the ACC underwent a major change when the probabilities associated with each of the two options reversed. Importantly, rats' behavior around a probability reversal (when the values associated with each lever switched) had three distinct phases: before the reversal, rats showed a clear preference for the high value lever, but when the probabilities reversed there was a period in which the rats showed no preference for either lever (they probed each lever several times as if working out the new values associated with each lever) before settling down into a new pattern of behavior that favored the new high value lever. The ACC effect was associated with the point at which rats abandoned their old beliefs about the environment in favor of exploration and the acquisition of new information (and hence should have had raised levels of estimation uncertainty), rather than with the time at which a new model of the environment started to govern behavior.

Further experiments have reported ACC activity when participants make the decision to explore their environment rather than to exploit known sources of reward (Quilodran et al., 2008), or to forage for new reward options rather than choosing between those options immediately available to them (Kolling et al., 2012)—again, these are cases in which estimation uncertainty in the brain's internal models could be actively raised, to facilitate the acceptance of new information in the new environment (Dayan, 2012).

Neurochemically, Dayan and colleagues have proposed that the neuromodulator noradrenaline (also called norepinephrine) signals estimation uncertainty. Evidence from pupillometry studies suggests that noradrenaline levels [which are correlated with pupil dilation (Aston-Jones and Cohen, 2005)] are high when estimation uncertainty is high in a gambling task (Preuschoff et al., 2011). Increases in pupil dilation have been demonstrated in both kinds of circumstance that should drive estimation uncertainty: bottom-up [when data are observed that suggest a change point has occurred (Nassar et al., 2012)], and top-down [during exploratory behavior (Nieuwenhuis et al., 2005)].

Pupil diameter is increased in conditions when observers think the rate of change in the environment is high, and is phasically increased when observers detect a change in the environment (Nassar et al., 2012). Hence tonic noradrenaline levels could be said to represent the prior probability of change in the environment, whilst phasic noradrenaline may represent a-posteriori evidence (based on sensory input) that a change is occurring or has occurred at a given time point (Bouret and Sara, 2005; Dayan and Yu, 2006; Sara, 2009).

Interestingly, whilst events which are surprising in relation to a behaviorally-relevant model of the environment are associated with an increase in noradrenaline release [29,30] and pupil diameter [31], it has also been shown that irrelevant surprising events which cause an increase in pupil diameter also cause an increase in learning rate (Nassar et al., 2012), suggesting a rather generalized mechanism by which the malleability of neural circuits may be affected by surprise, in accordance with behavioral evidence that surprising events affect the learning rate (Courville et al., 2006).

The mechanism by which noradrenaline represents or controls estimation uncertainty is not known, although two appealing theoretical models are that noradrenaline acts on neural models of the environment by adjusting the gain function of neurons (Aston-Jones and Cohen, 2005), or by acting as a "reset" signal that replaces old models of the environment with uninformative distributions, to make space for new learning (Bouret and Sara, 2005; Sara, 2009).

The involvement of the ACC and noradrenaline in the control/representation of estimation uncertainty may be linked, because the ACC has strong projections to the nucleus that produces noradrenaline, the locus coeruleus (Sara and Herve-Minvielle, 1995; Jodo et al., 1998).

Whilst there is currently little consensus on the representation of learning rate and uncertainty in the brain, the data reviewed here do begin to suggest a mechanism by which estimation uncertainty and learning rate are controlled neurally, which is involved both when uncertainty/learning is driven bottom-up (by observations that suggest the environment is changing) and when they are driven top-down (such as when agents actively quit a familiar environment and explore a novel one).

## **REFERENCES**


electroencephalogram and functional magnetic resonance imaging identifies the dynamics of performance monitoring. *J. Neurosci.* 25, 11730–11737. doi: 10.1523/JNEUROSCI.3286-05.2005


*Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy*. New Haven, CT: Yale University Press.


conditioning: variations in the effectiveness of reinforcement and nonreinforcement," in *Classical Conditioning II: Current Research and Theory,* eds A. H. Black and W. F. Prokasy (New York, NY: Appleton-Century Crofts), 64–99. doi: 10.1037/a0030892


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 August 2012; accepted: 24 May 2013; published online: 14 June 2013.*

*Citation: O'Reilly JX (2013) Making predictions in a changing world: inference, uncertainty, and learning. Front. Neurosci. 7:105. doi: 10.3389/fnins.2013.00105*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2013 O'Reilly. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

#### **LEARNING MODEL FOR FIGURES 4–7**

Let data *x* be drawn from a Gaussian distribution with unknown mean μ and variance σ<sup>2</sup>. The values of μ and σ<sup>2</sup> occasionally jump to new values; the probability of such a jump occurring between any pair of observations is fixed at some value *q*. For simplicity in this example we assume *q* is known, but it is also possible to infer *q* from the data (Nassar et al., 2010; Wilson et al., 2010).

Then the structure of the environment can be described as follows:

$$x_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \tag{5}$$

$$\mu_i, \sigma_i^2 = \begin{cases} \mu_{i-1}, \sigma_{i-1}^2 & \text{if } J = 0\\ \mathcal{U}^2(\mu_{\min}, \mu_{\max}, \sigma^2_{\min}, \sigma^2_{\max}) & \text{if } J = 1 \end{cases} \tag{6}$$

where *J* is a binary variable indicating whether a jump occurs, such that *J* follows a Bernoulli distribution with probability *q*.

$$J \sim \mathcal{B}(q) \tag{7}$$

Then the values for μ<sub>*i*</sub> and σ<sub>*i*</sub><sup>2</sup> can be inferred from the data using Bayes' rule as follows:

$$p(\mu_i, \sigma_i^2 | x_{1:i}) = p(x_i | \mu_i, \sigma_i^2)\, p(\mu_i, \sigma_i^2 | x_{1:i-1}) \tag{8}$$

where the likelihood is

$$p(\mu_i, \sigma_i^2 | x_i) = p(x_i | \mu_i, \sigma_i^2) \sim \mathcal{N}(\mu_i, \sigma_i^2) \tag{9}$$

*...* and the prior is derived from the posterior on the previous trial, incorporating a uniform "leak" over parameter space to represent the possibility that the values of the parameters have changed since the previous observation:

$$\begin{split} p(\mu_i, \sigma_i^2 | x_{1:i-1}) &= (1-q)\, p(\mu_{i-1}, \sigma_{i-1}^2 | x_{1:i-1}) \\ &+ q\, \mathcal{U}^2(\mu_{\min}, \mu_{\max}, \sigma^2_{\min}, \sigma^2_{\max}) \end{split} \tag{10}$$

On trial 1, the prior over parameter space is uniform.
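A minimal numerical sketch of Equations 5–10 is given below, assuming a regular grid over (μ, σ<sup>2</sup>) and a posterior that is renormalised to sum to one after each trial. It is not the code used to produce the figures: the grid bounds and resolution, the value of *q*, the function names, and the hand-written data sequence (a fixed pattern standing in for noisy draws, so that the example is deterministic) are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Likelihood of x under N(mu, sigma2) (Equation 9)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def make_grid(mu_min, mu_max, s2_min, s2_max, n_mu=41, n_s2=20):
    """All (mu, sigma^2) pairs on a regular grid over parameter space."""
    mus = [mu_min + k * (mu_max - mu_min) / (n_mu - 1) for k in range(n_mu)]
    s2s = [s2_min + k * (s2_max - s2_min) / (n_s2 - 1) for k in range(n_s2)]
    return [(m, s) for m in mus for s in s2s]


def update(posterior, grid, x, q):
    """One trial: leaky prior (Equation 10), then Bayes' rule (Equation 8)."""
    uniform = 1.0 / len(grid)
    # Prior = previous posterior, mixed with a uniform "leak" of weight q
    prior = [(1.0 - q) * p + q * uniform for p in posterior]
    unnorm = [pr * gaussian_pdf(x, m, s2) for pr, (m, s2) in zip(prior, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mean_and_sd_of_mu(posterior, grid):
    """Posterior mean of mu, and its posterior SD (a proxy for estimation uncertainty)."""
    mean = sum(p * m for p, (m, _) in zip(posterior, grid))
    var = sum(p * (m - mean) ** 2 for p, (m, _) in zip(posterior, grid))
    return mean, math.sqrt(var)


grid = make_grid(-10.0, 10.0, 0.25, 10.0)
posterior = [1.0 / len(grid)] * len(grid)  # trial 1: uniform prior

# Hand-written stand-in for noisy draws: mean -3 for 30 trials, then mean +3
noise = [-1.0, 0.5, 0.0, 1.0, -0.5]
data = [-3.0 + noise[i % 5] for i in range(30)] + [3.0 + noise[i % 5] for i in range(30)]

history = []
for x in data:
    posterior = update(posterior, grid, x, q=0.05)
    history.append(mean_and_sd_of_mu(posterior, grid))

for label, t in (("last trial before change", 29),
                 ("first trial after change", 30),
                 ("final trial", 59)):
    mean, sd = history[t]
    print(f"{label}: E[mu] = {mean:.2f}, SD[mu] = {sd:.2f}")
```

In this sketch the posterior standard deviation of μ serves as a simple scalar summary of estimation uncertainty: it should be small just before the change, jump on the first post-change trial, and shrink again as data from the new distribution accumulate, in line with **Figure 7**.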

## Do not bet on the unknown versus try to find out more: estimation uncertainty and "unexpected uncertainty" both modulate exploration

## **Élise Payzan-LeNestour<sup>1,2</sup>\* and Peter Bossaerts<sup>2</sup>**

<sup>1</sup> Australian School of Business, University of New South Wales, Sydney, NSW, Australia

<sup>2</sup> California Institute of Technology, Pasadena, CA, USA

#### **Edited by:**

Ming Hsu, University of California Berkeley, USA

**Reviewed by:**

Michael J. Frank, Brown University, USA

Sander Nieuwenhuis, Leiden University, Netherlands

#### **\*Correspondence:**

Élise Payzan-LeNestour, School of Banking and Finance, Australian School of Business, University of New South Wales, Room 359A, Sydney, NSW 2052, Australia. e-mail: elise@elisepayzan.com

Little is known about how humans solve the exploitation/exploration trade-off. In particular, the evidence for uncertainty-driven exploration is mixed. The current study proposes a novel hypothesis of exploration that helps reconcile prior findings that may seem contradictory at first. According to this hypothesis, uncertainty-driven exploration involves a dilemma between two motives: (i) to speed up learning about the unknown, which may beget novel reward opportunities; (ii) to avoid the unknown because it is potentially dangerous. We provide evidence for our hypothesis using both behavioral and simulated data, and briefly point to recent evidence that the brain differentiates between these two motives.

**Keywords: estimation uncertainty, unexpected uncertainty, Bayesian learning, exploration bonuses, restless bandit problem**

### **1. INTRODUCTION**

Learning to choose between multiple unknown prospects, in the hope of eventually exploiting the most rewarding ones, is a difficult yet fundamental problem. It involves a trade-off between two competing courses of action: to exploit known options that are believed to yield the best outcomes versus to explore unknown alternatives that may be even more rewarding.

Little is known about how humans solve this trade-off. In particular, the determinants of exploratory decisions remain underspecified. In the model-free reinforcement learning framework, exploration is undirected, i.e., it boils down to introducing *annealing* in the choice rule, whereby the agent either periodically chooses at random, or increases stochasticity of choice when options have similar estimated values (Sutton and Barto, 1998). A more efficient strategy may consist of directing exploration to those options whose expected value the agent is most uncertain about (e.g., Gittins and Jones, 1974; Kakade and Dayan, 2002; Huettel et al., 2006; Cohen et al., 2007). Whether individuals implement such uncertainty-driven exploration remains an open question.

The existing evidence for uncertainty-driven exploration is mixed. Recently, Frank et al. (2009) found that participants in a reward learning task were "*ambiguity seekers*," i.e., they strategically explored the least well known options, with large individual differences that varied as a function of prefrontal cortex genetic function. In a follow-up imaging study, Badre et al. (2012) revealed the rostrolateral prefrontal cortex (RLPFC) to signal estimation uncertainty only in the participants identified as ambiguity seekers. Furthermore, Cavanagh et al. (2011) showed with EEG that these uncertainty signals are represented prior to the decision, which further suggests they drive ambiguity seeking choice. However, these results may appear at odds with the ample evidence, from Allais (1953) to Payzan-LeNestour and Bossaerts (2011), that individuals direct exploration to the *least* uncertain options, thereby shying away from coping with the unknown ("*ambiguity aversion*"). A neurobiological foundation for ambiguity aversion has recently been laid (see, e.g., Hsu et al., 2005; Huettel et al., 2006; Levy et al., 2010).

The current study attempts to reconcile these findings. As noted by Cavanagh et al. (2011) and Badre et al. (2012), the phenomenon of ambiguity aversion could be parasitic on *sticky choice*, the behavioral pattern consisting in repeating the same choice regardless of reward statistics. The idea is that if the agent preferentially chooses the options he repeatedly chose in the past, he may behave this way either because he is ambiguity averse (those repeatedly sampled options are the least uncertain), or merely because he tends to stick to prior choices. A related concern is that unless modeled explicitly, sticky choice makes it hard to identify any positive influence of estimation uncertainty on exploration. However, sticky choice appeared to be a second-order phenomenon in Payzan-LeNestour and Bossaerts's (2011) task. Besides, the evidence for ambiguity aversion documented in Payzan-LeNestour and Bossaerts (2011) still prevailed after accounting for sticky choice in the behavioral models used in that study, which rules out the possibility that such ambiguity averse behavior is merely "sticky choice in disguise."<sup>1</sup>

<sup>1</sup> Specifically, the data reported in Payzan-LeNestour and Bossaerts (2011) were fitted by a model allowing for both modulation of exploration by ambiguity (ambiguity-seeking or ambiguity-averse; see Results Section for details on the functional forms) and stickiness in choice (i.e., choice probability is biased towards the latest chosen option, with the biasing factor being a free parameter). The value of the weight on the ambiguity component turned out to be negative for the majority (60 out of 62) of the subjects, which implies ambiguity aversion.

The current study proposes a novel hypothesis about exploration that helps reconcile the findings of Payzan-LeNestour and Bossaerts (2011) and Frank et al. (2009)/Cavanagh et al. (2011)/Badre et al. (2012; henceforth, FCB). According to this hypothesis, uncertainty-driven exploration involves a dilemma between two motives: (i) to speed up learning about the unknown, which may beget novel reward opportunities; (ii) to avoid the unknown because it is potentially dangerous. The first motive is connected with the notion of *curiosity* (van Dijk and Zeelenberg, 2007) whereas the second is connected with cautiousness. Below we will briefly point to recent evidence that the brain differentiates between these two motives. We argue that in the task used in FCB, both motives prevailed, though behavior was only influenced by the first motive, which dominated the second one. The second motive was somewhat muted because the potential monetary losses in that task were relatively small, especially compared to those in the task used in Payzan-LeNestour and Bossaerts (2011), where the payoffs were highly skewed. The two motives were – arguably – equally important in that task. This claim may seem strange at first: that ambiguity aversion prevailed would rather suggest that the second motive dominated, i.e., that the cautionary signal not to bet on things unknown countervailed the directive to sharpen the learning about the unknown. But the current study shows that our subjects were in fact both ambiguity averse and novelty seekers.

We flesh out new explanations of subject behavior in Payzan-LeNestour and Bossaerts's (2011) task, a *restless* (Whittle, 1988) multi-armed bandit in which reinforcement contingencies jumped at unsignaled times. In this kind of changing environment, the directive to speed up learning is primarily relayed through *unexpected uncertainty* (Yu and Dayan, 2005) signals: when jump likelihood is high (i.e., unexpected uncertainty is great), the motivation to explore in order to discover novel reward opportunities ought to be maximal. We fitted to subject behavior in the task a new model that allows trial-by-trial estimates of both estimation uncertainty and unexpected uncertainty. This model assumes that the agent, in addition to directing exploration to the options for which estimation uncertainty is minimal, also directs exploration to the options for which unexpected uncertainty is maximal. This model markedly improved the fit of the previously developed ambiguity averse model, which Payzan-LeNestour and Bossaerts (2011) found to be the best fit to behavior in the task. This finding shows that in our experiment, unexpected uncertainty modulated the "curiosity motive" (i), while estimation uncertainty modulated the "cautiousness motive" (ii).

We also show with simulated data that the behavior consisting of mixing ambiguity aversion with novelty seeking is natural when viewed from the standpoint of evolutionary fitness. We conducted a number of simulations of behavior in the foregoing restless bandit task, in order to compare the economic performance of a variety of models that allowed alternate kinds of uncertainty-driven exploration (specifically, ambiguity seeking, ambiguity aversion, novelty seeking, and a mixture of the latter two). Our simulated data reveal that ambiguity aversion improves economic performance in the task compared to ambiguity seeking. This result questions the standard claim that ambiguity aversion [i.e., motive (ii) in the above dilemma] is irrational. We further found that the behavior that mixes ambiguity aversion with novelty seeking fared best in the task. This suggests that both stated motives (i) and (ii) can be vindicated on the grounds of evolutionary fitness.

## **2. MATERIALS AND METHODS**

### **2.1. EXPERIMENTAL TASK**

The current study builds on the restless bandit task originally described in Payzan-LeNestour and Bossaerts (2011) as well as Payzan-LeNestour (2012), where full task details are provided<sup>2</sup> . In what follows we focus on the task features relevant for the current study.

The task is a six-armed bandit. Three arms are blue and three are red. Color is visible. At each trial, every arm generates one of three possible outcomes: 1, −1, or 0 CHF<sup>3</sup> for the blue arms; 2, −2, or 0 CHF for the red arms. At each trial, the agent selects one arm and immediately receives the outcome returned by the chosen arm. He is not told the outcomes returned by the other arms.

Our bandit is restless: while absolute expected value is constant for each arm, the sign of expected value occasionally flips, thus arms switch from having positive to negative expectation and back. The flips in the outcome probabilities occur without notice. Specifically, changes are instantiated with two independent Bernoulli processes, one for the blue arms and one for the red. For each process and at each trial, either "jump" or "no jump" occurs. When jump occurs for one of the two colors, then at the three arms of this color, the probabilities of two outcomes flip. Jump frequency is higher for the red arms than for the blue ones (1/4 versus 1/16), whereby unexpected uncertainty is higher for the red arms on average.
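The jump process described above can be illustrated with a minimal Python sketch. The jump probabilities (1/4 and 1/16) and the outcome magnitudes come from the text; the outcome probabilities of each arm, the class name `RestlessBandit`, and the choice of which two outcome probabilities flip upon a jump (here, those of the gain and of the loss) are assumptions made for illustration only.

```python
import random

# Jump probabilities per trial, as stated in the text.
JUMP_PROB = {"red": 1 / 4, "blue": 1 / 16}
# Outcome magnitudes in CHF, as stated in the text.
MAGNITUDE = {"red": 2, "blue": 1}


class RestlessBandit:
    """Six arms (three per color); jumps are drawn independently per color."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # ASSUMPTION: illustrative probabilities for (gain, loss, zero);
        # they are not the values used in the original experiment.
        self.arms = [
            {"color": color, "p": list(p)}
            for color in ("blue", "red")
            for p in ((0.6, 0.2, 0.2), (0.2, 0.6, 0.2), (0.5, 0.3, 0.2))
        ]

    def expected_value(self, i):
        arm = self.arms[i]
        gain, loss, _ = arm["p"]
        return MAGNITUDE[arm["color"]] * (gain - loss)

    def step(self, choice):
        """Apply possible jumps, then return the outcome of the chosen arm."""
        for color, jump_prob in JUMP_PROB.items():
            if self.rng.random() < jump_prob:
                for arm in self.arms:
                    if arm["color"] == color:
                        # ASSUMPTION: the two outcomes whose probabilities
                        # flip are the gain and the loss, so the expected
                        # value changes sign but keeps its absolute value.
                        arm["p"][0], arm["p"][1] = arm["p"][1], arm["p"][0]
        arm = self.arms[choice]
        m = MAGNITUDE[arm["color"]]
        return self.rng.choices([m, -m, 0], weights=arm["p"])[0]
```

Under these assumptions, a jump leaves the absolute expected value of every arm unchanged, and only the chosen arm's outcome is revealed on each trial.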

The subject knows that outcome probabilities will change without warning during the experiment (he also knows red arms are more unstable but is not told the jump probabilities), which leads him to track unexpected uncertainty throughout the task, as we show elsewhere (Payzan-LeNestour et al., in preparation). The same study reveals that subjects track estimation uncertainty as well. One distinctive characteristic of our design is that the levels of both estimation uncertainty and unexpected uncertainty vary substantially during the task. Unexpected uncertainty levels vary from high, upon jumps, to low, during the stable phases. Also, because learning has to be reset after each jump, estimation uncertainty remains significant throughout the task. This manipulation renders the trial-by-trial estimation of both uncertainty components meaningful. Importantly, participants in our task did estimate these components, in contrast to prior studies in which unexpected uncertainty appeared to be artifactually maximal throughout the task (e.g., Daw et al., 2006; Jepma and Nieuwenhuis, 2011)<sup>4</sup>.

<sup>2</sup> Payzan-LeNestour (2012) is available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1628657.

<sup>3</sup> Swiss Francs, the currency used in the original experiment.

<sup>4</sup> In these studies, the analysis suggests that participants presumed changes in the reward contingencies would occur at each trial during the task, perhaps because the task instructions were vague about the nature of the changes in the reward contingencies, and in the absence of knowledge, the "worst-case scenario" (maximal instability) is imagined.

### **2.2. COMPUTATIONAL MODELS**

The current study augments the Bayesian model described in Payzan-LeNestour and Bossaerts (2011). Here we briefly point to the essentials of that model. The model learns the outcome probabilities of the six arms through a natural sampling scheme (analogous to the one proposed in Hirayama et al., 2004, 2006 and Quinn and Karny, 2007), which exponentially discounts ("forgets") the past outcomes returned by a given arm after discovering the arm has jumped. A key feature of the model is that the discount factor is adjusted on the spot on each trial *T*. It equals the likelihood that no jump occurred at trial *T*, i.e., it quantifies the "confidence in stability" at trial *T*. Since jumps are color-specific in the task, the model uses two discount factors, one for the red arms, λ*red*(*T*), and one for the blue, λ*blue*(*T*). λ*red*(*T*) (resp. λ*blue*(*T*)) is thus proportional to the strength of evidence that red arms (resp. blue arms) did not change at trial *T*.
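To make the idea of exponential forgetting concrete, the following Python sketch discounts the evidence accumulated for one arm by a factor λ before adding the latest observation. It is only a schematic illustration, not the exact scheme of Payzan-LeNestour and Bossaerts (2011): the pseudo-count representation, the symmetric `prior`, and the function names are assumptions, and the computation of λ itself (the likelihood that no jump occurred) is not reproduced here, so it is passed in as an argument.

```python
def forgetting_update(counts, outcome_index, lam, prior=1.0):
    """Discount old evidence by lam, then add the new observation (if any).

    counts        -- pseudo-counts for the three outcomes of one arm
    outcome_index -- index of the outcome just observed (0, 1 or 2), or
                     None if the arm was not sampled on this trial
    lam           -- discount factor in [0, 1]; 1 keeps all past evidence,
                     0 forgets it entirely
    prior         -- ASSUMPTION: symmetric pseudo-count that the counts
                     decay toward when evidence is forgotten
    """
    updated = [prior + lam * (c - prior) for c in counts]
    if outcome_index is not None:
        updated[outcome_index] += 1.0
    return updated


def estimated_probabilities(counts):
    """Normalize pseudo-counts into outcome probability estimates."""
    total = sum(counts)
    return [c / total for c in counts]
```

With λ close to 1 the estimates average over a long history, whereas a low λ (strong evidence for a jump) effectively restarts learning from the prior.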

Exponential discounting of the past has the appealing property of being related to *leaky-integration processes*, which have been commonly used to model neuronal dynamics in a changing environment (e.g., Sugrue et al., 2004). So this kind of "forgetting Bayesian" model is both a good descriptive model of behavior (as shown in Payzan-LeNestour and Bossaerts, 2011) and a good model of neuronal dynamics (as argued in Yu and Cohen, 2009)<sup>5</sup>.

For each arm *i* and at each trial *T*, the model computes *Q*(*i,T*), the expected value (i.e., the sum of the three possible outcomes weighted by their estimated probabilities of occurrence). The model thus assumes participants were risk neutral and did not distort the outcome probabilities, which is at odds with a number of theories (e.g., *Prospect Theory*). The motivation for this modeling choice is both parsimony and agnosticism about whether/how individuals actually distort probabilities (which reflects disagreement in the literature<sup>6</sup> ).

Action selection in the task is modeled with the *softmax rule*. According to this rule, option *i* is chosen with probability *PiT* which is proportional to the exponential of the value of arm *i*:

$$P_{iT} \propto \exp\left(\beta Q_{iT}\right).$$

β (the *inverse temperature*) is a free parameter controlling the degree to which the subject makes exploitative choices versus exploratory ones.
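As an illustration, the softmax rule can be written in a few lines of Python. The function name and the list-based interface are assumptions for this sketch; subtracting the maximum before exponentiating is a standard numerical safeguard that leaves the resulting probabilities unchanged.

```python
import math


def softmax_probabilities(q_values, beta):
    """Return choice probabilities proportional to exp(beta * Q)."""
    scaled = [beta * q for q in q_values]
    # Shifting by the maximum avoids overflow without changing the result.
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

With β = 0 all options are chosen with equal probability (pure exploration), while a large β concentrates the probability on the option with the highest value (pure exploitation).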

Payzan-LeNestour and Bossaerts (2011) report that their behavioral data were best fit with the assumption that subjects tracked the level of estimation uncertainty of the options, in order to strategically explore options with minimal estimation uncertainty on a given trial. Such ambiguity averse behavior is accomplished by subtracting from the *Q*-value entering the softmax rule an exploration "malus" proportional to the level of estimation uncertainty:

$$Q_{iT} \leftarrow Q_{iT} - eu_{iT},$$

where *eu<sub>iT</sub>* is the level of estimation uncertainty about option *i* at trial *T*, quantified in terms of the width (variance or entropy) of the posterior probability distribution tracked by the Bayesian learner (cf. Yoshida and Ishii, 2006; Behrens et al., 2007 and Payzan-LeNestour and Bossaerts, 2011). The width of the distribution reflects the subject's uncertainty regarding option value. Early in learning, the width is larger (and uncertainty higher) than later in learning.

The alternate "ambiguity seeking" model assumes that subjects guided exploration toward the options for which estimation uncertainty was maximal, whereby they explored the least well known options. This behavior is instantiated by adding to the *Q*-value an exploration bonus proportional to the level of estimation uncertainty:

$$Q_{iT} \leftarrow Q_{iT} + eu_{iT}.$$

The two previous models modulate exploration as a function of estimation uncertainty. We also developed a model featuring a novel kind of uncertainty-driven exploration, to formalize the idea – previously suggested by Cohen et al. (2007) – that exploration ought to be modulated by unexpected uncertainty. Specifically, when reinforcement contingencies change abruptly over time, survival depends on constant adaptation to such changes. This adaptation requires that the agent increases exploration when he deems the environment to be novel (i.e., when unexpected uncertainty is high), in accordance with our stated motive (i) above. We refer to this behavior as "novelty seeking" (to be distinguished from ambiguity seeking as previously defined). In the context of our multi-armed bandit task, the novelty seeking model directs exploration to the arms that have most probably changed. What follows describes how this behavior is accomplished. Without loss of generality, suppose the arm that is tried out at trial *T* is a red one. The model adds to the value of the two red options not currently sampled an exploration bonus proportional to the level of unexpected uncertainty:

$$Q(i, T) \leftarrow Q(i, T) + \left(1 - \lambda_{red}(T)\right),$$

where 1 − λ*red*(*T*) is the level of unexpected uncertainty about the red options at trial T, quantified in terms of the likelihood that red options did change at trial *T*. To further increase novelty seeking after a jump has been detected, the model also penalizes the value of the arm that is currently tried out, in proportion to the level of unexpected uncertainty at the current trial: *Q*(*i*,*T*)←*Q*(*i*,*T*) − (1 − λ*red*(*T*)).

According to the hypothesis stated in the Introduction, both motives (i) and (ii) influence exploratory decisions. To reflect this, the "hybrid model" combines ambiguity aversion and novelty seeking by modifying the *Q*-value of the two red options not currently sampled as follows:

$$Q(i, T) \leftarrow Q(i, T) - eu_{iT} + \left(1 - \lambda_{red}(T)\right),$$

while the value of the arm that is currently tried out is modified as follows: *Q*(*i*,*T*) ← *Q*(*i*,*T*) − *eu<sub>iT</sub>* − (1 − λ*red*(*T*)). This hybrid model is the readout of the aforementioned dilemma in the context of the current task: unexpected uncertainty modulates motive (i) while estimation uncertainty modulates motive (ii).

<sup>5</sup> Alternate Bayesian schemes could do as well. For instance, eraspou proposes a "Hierarchical Bayesian" model that is equally good at learning outcome probabilities in the current task, compared to the forgetting Bayesian approach. The probability estimates of the two models are strongly correlated. The forgetting Bayesian model is more tractable and particularly suitable for our purpose in the current analysis.

<sup>6</sup> E.g., Trommershäuser et al. (2008) report that subjects in a movement task represented probabilities in a way that was close to perfect (no distortion whatsoever). By contrast, Hertwig et al. (2003) document underweighting of the probability of occurrence of rare events, which is at odds with Prospect Theory, which predicts overweighting.

Note that the foregoing models put equal weight on the *Q*-value and uncertainty components. The motivation for this particular modeling choice is parsimony; the relative weights can be changed without changing the essence of the schemes. Specifically, to ensure that our results are robust, for each of the four models above, we tested several alternate models that have a different relative weighting on the *Q*-value component vis-a-vis the uncertainty component(s). These alternative models led to similar results.
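The four decision rules can be summarized in a short Python sketch that applies the uncertainty adjustments to a vector of *Q*-values before they enter the softmax rule. The function name, argument layout, and model labels are hypothetical. The sketch uses equal weights, as in the text. For the hybrid model, the text specifies the adjustment only for arms of the same color as the arm currently tried out; applying the estimation uncertainty malus to the arms of the other color as well is an assumption made here.

```python
def adjusted_values(q, eu, lam_by_color, colors, chosen, model):
    """Return the Q-values after the uncertainty adjustment of one model.

    q, eu        -- per-arm expected values and estimation uncertainties
    lam_by_color -- dict mapping a color to its discount factor lambda(T)
    colors       -- color of each arm
    chosen       -- index of the arm currently tried out
    model        -- "base", "averse", "seeking", "novelty" or "hybrid"
    """
    out = list(q)
    for i in range(len(q)):
        if model in ("averse", "hybrid"):
            # Ambiguity aversion: malus equal to the estimation uncertainty.
            # ASSUMPTION: in the hybrid model this applies to all arms.
            out[i] -= eu[i]
        elif model == "seeking":
            # Ambiguity seeking: bonus equal to the estimation uncertainty.
            out[i] += eu[i]
        if model in ("novelty", "hybrid") and colors[i] == colors[chosen]:
            # Unexpected uncertainty for this color: 1 - lambda(T).
            uu = 1.0 - lam_by_color[colors[i]]
            # Bonus for same-color arms not sampled, malus for the sampled arm.
            out[i] += -uu if i == chosen else uu
    return out
```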

### **2.3. EVALUATING MODEL FIT TO BEHAVIORAL DATA**

We fitted the two new models introduced by the current study (the novelty seeker and hybrid models) to the choice data of Payzan-LeNestour and Bossaerts (2011), using maximum likelihood estimation. Only one parameter (the inverse temperature β) needed to be estimated. We allowed this estimated parameter to vary across participants. We compared the log-likelihood of each model to that of the ambiguity averse model (the best fit in Payzan-LeNestour and Bossaerts, 2011), which we use as the benchmark here.
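The following Python sketch shows one way such a one-parameter fit could be carried out for a single participant. It assumes that the model's trial-by-trial (uncertainty-adjusted) values have already been computed and do not depend on β. The function names, the search interval, and the use of a golden-section search are choices made for this illustration; the source does not state which optimizer was used.

```python
import math


def neg_log_likelihood(beta, values_per_trial, choices):
    """Negative log-likelihood of the observed choices under softmax."""
    nll = 0.0
    for values, choice in zip(values_per_trial, choices):
        scaled = [beta * v for v in values]
        shift = max(scaled)
        log_norm = shift + math.log(sum(math.exp(s - shift) for s in scaled))
        nll -= scaled[choice] - log_norm
    return nll


def fit_beta(values_per_trial, choices, lo=0.0, hi=20.0, tol=1e-6):
    """Golden-section search for the beta minimizing the negative log-likelihood.

    ASSUMPTION: the optimum lies in [lo, hi]. The softmax log-likelihood is
    concave in beta, so a one-dimensional bracketing search is sufficient.
    """
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        f1 = neg_log_likelihood(x1, values_per_trial, choices)
        f2 = neg_log_likelihood(x2, values_per_trial, choices)
        if f1 < f2:
            b = x2
        else:
            a = x1
    return (a + b) / 2
```

Two models fitted in this way can then be compared participant by participant through the difference between their minimized negative log-likelihoods.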

### **2.4. EVALUATING MODEL FITNESS IN SIMULATED DATA**

We compared the average fitness of the ambiguity averse, ambiguity seeking, novelty seeker, and hybrid models, in a set of 500 simulations of the task, each comprising 500 trials (the length of our experimental sessions). Here the gauge of fitness is the economic performance, i.e., the money accumulated in the 500 trials of the task, averaged across the 500 simulations. For each model, we ran the set of 500 simulations for different values of β, which allowed us to assess the fitness as a function of β.
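The bookkeeping of this comparison can be sketched as follows. The harness is generic: `run_episode` is a hypothetical user-supplied function that simulates one 500-trial session of a given model for a given β and returns the money accumulated. The function name `fitness_by_beta` and the reuse of the same seed across values of β are likewise assumptions of this sketch.

```python
import math
import random


def fitness_by_beta(run_episode, betas, n_sims=500, seed=0):
    """Mean and standard error of the accumulated payoff for each beta.

    run_episode -- hypothetical callable (beta, rng) -> total payoff of one
                   simulated session
    """
    results = {}
    for beta in betas:
        # ASSUMPTION: the same seed is reused for every beta so that the
        # comparison across beta values shares the same random draws.
        rng = random.Random(seed)
        totals = [run_episode(beta, rng) for _ in range(n_sims)]
        mean = sum(totals) / n_sims
        variance = sum((t - mean) ** 2 for t in totals) / (n_sims - 1)
        results[beta] = (mean, math.sqrt(variance / n_sims))
    return results
```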

## **3. RESULTS**

### **3.1. BEHAVIORAL**

The novelty seeker model fitted choices better than the benchmark (ambiguity averse model) in the vast majority (95%) of the participants. A paired *t*-test based on the difference between the negative log-likelihoods of the benchmark and novelty seeker models leads to the conclusion that the novelty seeker model fitted subject behavior better than the benchmark (*p* < 0.001; *N* = 62). For 82% of the participants, the hybrid model fitted subject behavior better than the novelty seeker model. The former significantly outperformed the latter according to a paired *t*-test (*p* < 0.001). **Figure 1** reports the negative log-likelihood of the hybrid model, relative to that of the benchmark.

### **3.2. SIMULATIONS**

**Figure 2** shows that in our simulations, the ambiguity averse model performed uniformly better than not only the ambiguity seeking model but also the model that excludes any kind of modulation of exploration by uncertainty ("base model"<sup>7</sup>). The novelty seeker model outperformed the ambiguity averse model, and the hybrid model performed best overall. The standard error of the economic performance is of the same order of magnitude across all models.

## **4. DISCUSSION**

Both the behavioral and simulated data reported here support the hypothesis stated in the Introduction. Specifically, the evidence suggests that individuals seek to uncover novel reward opportunities ["curiosity motive" (i)] while they also tend to shy away from the unknown ["cautiousness motive" (ii)], and that this behavior is adaptive, at least in the context of the present task.

Note the ways in which the task used in the current study is atypical in comparison to previous tasks that were used to study exploration (Daw et al., 2006, FCB). In our task, the dynamic contingencies induced unexpected uncertainty about the value of unexplored options. Unexpected uncertainty and estimation uncertainty did vary significantly throughout the task and participants could estimate them on each trial. This allowed the identification of an unexpected uncertainty bonus together with an estimation uncertainty "malus" in subject exploration. By contrast, in an environment that is unexpected uncertainty free, i.e., when the reinforcement contingencies are stationary (like in the task used in FCB), estimation uncertainty modulates both motives (i) and (ii), and behavior is the readout of the dominating motive [arguably (i) in FCB]. Perhaps cautiousness was muted in FCB because participants knew they would not lose much money by exploring. Additionally, as suggested in Cavanagh et al. (2011), the motivation to learn should be maximal when the agent knows he can potentially suppress ignorance, which is in principle the case when things are stable. In contrast, when things change all the time, motive (i) is probably dampened since the "returns on learning" are low.

<sup>7</sup> While the superiority of the ambiguity-averse model over the ambiguity-seeker model appears to be robust to the use of different weighting on the Q-value relative to the uncertainty component in the decision rule, the superiority of the ambiguity-averse model over the base model is not. Specifically, in our simulations, the ambiguity-averse model that puts a minimal weight on the Q-value (i.e., that tends to focus on the uncertainty component exclusively) did not outperform the base model.

Strikingly, the dilemma we describe here has been overlooked in prior work in decision neuroscience and machine learning, on the grounds that exploration should be exclusively driven by the directive to find out more (e.g., Gittins and Jones, 1974; Kakade and Dayan, 2002). Yet, the motive to not bet on the unknown, which is perceived as potentially dangerous, may be equally – if not more – important for survival. Our simulated data point to this possibility: the ambiguity averse model fared better than the ambiguity seeker model in our task. Also, the finding that the ambiguity averse model (let alone the novelty seeker and hybrid models) performed better than the base model, which excludes any kind of modulation of exploration by uncertainty, should call into question the generally accepted view in classical decision theory (Savage, 1954) that uncertainty-driven exploration is irrational. For standard valuation theory, any sensitivity to uncertainty is irrational in that it violates one of the most fundamental principles of rational decision making, namely *the sure thing principle*<sup>8</sup>. Our results contradict this view. We find that in the context of natural sampling, being sensitive to uncertainty appears to be beneficial. This may be the reason why humans display such sensitivity, even if this generates choice inconsistencies in other contexts (e.g., the *Ellsberg Paradox*; Ellsberg, 1961). Humans can afford to be "irrational" as long as this shows up only in ecologically irrelevant contexts (like the gambles underlying the Ellsberg Paradox?), and as long as it is adaptive in ecologically relevant contexts (like our natural sampling task).

That ambiguity aversion may play a positive role, in avoiding danger, has been suggested (albeit implicitly) in Hsu et al. (2005), where amygdala was found to encode ambiguity, presumably through "fear signals." Also, the current evidence that unexpected uncertainty induces novelty seeking in the action selection rule, together with prior evidence that unexpected uncertainty plays a key role in value updating (e.g., Behrens et al., 2007 and Payzan-LeNestour and Bossaerts, 2011), suggests that unexpected uncertainty plays a dual role, as a modulator of learning as well as of action selection. This implies new challenges and opportunities for neurobiological studies. One can envisage unexpected uncertainty to influence learning through the neuromodulator norepinephrine, while it biases choice through changes in serotonin levels. The former would be consistent with Hasselmo (1999), Yu and Dayan (2005), Rutishauser et al. (2006); the latter would be related to Doya (2008).

<sup>8</sup> According to the sure thing principle, if the agent would take a certain action if he knew that an event *E* obtained, and also if he knew that the negation of *E* obtained, then he should take that action even if he knows nothing about *E*.

#### **REFERENCES**

Hirayama, J., Yoshimoto, J., and Ishii, S. (2006). Balancing plasticity and stability of on-line learning based on hierarchical Bayesian adaptation of forgetting factors. *Neurocomputing* 69, 1954–1961.

Hsu, M., Bhatt, M., Adolphs, R., Tranel, D., and Camerer, C. F. (2005). Neural systems responding to degrees of uncertainty in human decision-making. *Science* 310, 1680–1683.


*Comput. Biol.* 7, e1001048. doi:10.1371/journal.pcbi.1001048


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 June 2012; accepted: 19 September 2012; published online: 16 October 2012.*

*Citation: Payzan-LeNestour É and Bossaerts P (2012) Do not bet on the unknown versus try to find out more: estimation uncertainty and "unexpected uncertainty" both modulate exploration. Front. Neurosci. 6:150. doi: 10.3389/fnins.2012.00150*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Payzan-LeNestour and Bossaerts. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Effects of prior knowledge on decisions made under perceptual vs. categorical uncertainty

## **Kathleen A. Hansen<sup>1</sup>\*, Sarah F. Hillenbrand<sup>2</sup> and Leslie G. Ungerleider <sup>1</sup>**

<sup>1</sup> Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA <sup>2</sup> Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA

#### **Edited by:**

Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland

#### **Reviewed by:**

Floris P. De Lange, Radboud University Nijmegen, Netherlands Erie Dell Boorman, University of Oxford, UK

#### **\*Correspondence:**

Kathleen A. Hansen, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Building 10, Room 4C104, Bethesda, MD 20892, USA. e-mail: hansenka@mail.nih.gov

Humans use prior knowledge to bias decisions made under uncertainty. In this fMRI study we predicted that different brain dynamics play a role when prior knowledge is added to decisions made under perceptual vs. categorical uncertainty. Subjects decided whether shapes belonged to Category S – smoother – or Category B – bumpier – under both uncertainty conditions, with or without prior knowledge cues. When present, the prior knowledge cue, 80/20 or 50/50, indicated that 80 and 20% (or 50 and 50%) were the chances that responding "S" and "B" (or vice versa) would be correct. During perceptual uncertainty, shapes were degraded with noise. During categorical uncertainty, shapes were ambiguous. Adding the 80/20 cue increased activation during perceptual uncertainty in bilateral lateral occipital (LO) cortex and left middle frontal gyrus (MidFG), and decreased activity in bilateral LO cortex during categorical uncertainty. Right MidFG and other frontoparietal regions were active in all conditions. The results demonstrate that left MidFG shows activation changes, suggestive of an influence on visual cortex, that depend on the factor that makes the decisions difficult. When sensory evidence is difficult to perceive, prior knowledge increases visual cortical activity. When the sensory evidence is easy to perceive but difficult to interpret, prior knowledge decreases visual cortical activity.

**Keywords: prior probability, expectation, frontoparietal, dorsolateral prefrontal cortex**

### **INTRODUCTION**

Studies of perceptual decisions made under uncertainty use various methods to define and control uncertainty. One common approach is to ask subjects to make decisions about targets degraded with noise. We call this type of uncertainty *perceptual uncertainty* because difficulty perceiving the sensory evidence in the noise is the limiting factor on accuracy. Another approach is to ask subjects to make decisions about targets that are members of overlapping categories, such that some targets are ambiguous and could belong to either category. We call this type of uncertainty *categorical uncertainty* because the sensory evidence, though easy to perceive, is difficult to interpret.

Historically, researchers testing sensory and systems neuroscience hypotheses typically choose to use perceptual uncertainty, while researchers testing neuroeconomic and cognitive neuroscience hypotheses are more likely to use conditions analogous to categorical uncertainty. In both uncertainty conditions, when prior knowledge indicates that one alternative is likelier than another, subjects bias their decisions in favor of the indicated alternative (Green and Swets, 1966). However, the neural mechanism(s) underlying this behavioral effect are not well understood.

In this study, we show that modulatory effects obtained in the laboratory using perceptual uncertainty may not generalize to conditions of categorical uncertainty, and vice versa. These results may be valuable to researchers seeking to interpret data and design translational studies bridging different subfields of neuropsychology. For example, in ecological contexts, the ability to apply prior knowledge during conditions of perceptual uncertainty may be highly adaptive. If an organism knows that there are tigers in the region, it makes sense for that organism to "see" a barely perceptible shape in the shadows as a likely tiger. In contrast, there are other contexts – for example, financial – in which categorical, not perceptual, uncertainty is the bottleneck making decisions difficult. When a person makes decisions about how to interpret a number representing a price, the decision process generally does not depend on the legibility of the digits. More commonly, the digits are clearly perceived, but may be difficult to categorize as too high or too low.

In previous fMRI studies (Hansen et al., 2011, 2012), we manipulated prior knowledge during decisions about visual stimuli in categorical uncertainty only. Instead of asking subjects to make decisions about abstract items such as numbers, we asked them to categorize shapes that differed along the single, quantitative dimension of curvature. These studies showed that prior knowledge altered fMRI activity levels in prefrontal and parietal cortex, but did not reveal enhanced activity from prior knowledge in visual cortex. The absence of an effect in visual cortex was surprising, because it seemed to be at odds with published observations documenting that cues providing subjects with expectations about visual stimuli enhanced activity in stimulus-selective visual cortex (Eger et al., 2007; Summerfield and Koechlin, 2008; Esterman and Yantis, 2010).

We wondered whether this lack of an effect in visual cortex might be due to the fact that decisions in our studies were made under categorical uncertainty only. Our reasoning here was that given sensory stimuli that were ambiguous but not degraded with noise, substantial internal modulation of the sensory evidence would amount to misperception. In contrast, during perceptual uncertainty, when sensory stimuli are noisy, it could be adaptive for prior knowledge to enhance the representation of the evidence itself. Therefore, we hypothesized that prior knowledge would increase activity in sensory processing regions in decisions made under perceptual uncertainty but not categorical uncertainty.

To test this hypothesis, we asked subjects to categorize curved shapes under perceptual uncertainty, with and without prior knowledge, and compared the resulting behavioral and fMRI data with the previously published categorical uncertainty data. Differences in activation between the perceptual and the categorical uncertainty conditions were identified in left middle frontal gyrus (MidFG) and in bilateral lateral occipital (LO) cortex. In all three regions, activation levels were greater during perceptual than categorical uncertainty. Breaking down the within-regions of interest (ROI) data into prior knowledge and naïve subject groups revealed that the activation differences observed in the pooled data were driven by the prior knowledge group. In the perceptual uncertainty condition, activations in left MidFG and bilateral LO were higher for prior knowledge subjects than naïve subjects, while in the categorical uncertainty condition, activations in left MidFG and bilateral LO were lower for prior knowledge subjects than naïve subjects. The sign of the activations was positive in all conditions in the occipital regions. In left MidFG, the activation was positive during perceptual uncertainty with prior knowledge and negative in the other conditions. Right MidFG and other regions previously implicated in executive control and decisions were positively activated in all conditions, but the activation levels did not differ across uncertainty type. Thus, positive MidFG activation was right-biased in three of the four experimental conditions, and activation was seen in both right and left MidFG only during perceptual uncertainty with prior knowledge.

These findings indicate that left MidFG shows activation changes, suggestive of an influence on visual cortex, that depend on the factor that makes the decisions difficult. Given prior knowledge when the limiting factor is perceptibility, right prefrontal activity is accompanied by positive activity in left prefrontal cortex and enhanced positive activity in sensory processing regions. In contrast, when the sensory evidence is easy to perceive but difficult to interpret, prior knowledge results in right-biased prefrontal activity accompanied by decreased positive activity in sensory processing regions.

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

In this study, we report fMRI and behavioral data from 66 subjects (34 male) of mean age 25 years (range 20–41). All subjects provided informed consent before the experiment. All procedures were approved by the National Institute of Mental Health Institutional Review Board. All subjects were right-handed and had normal or corrected-to-normal vision. Of the subjects, 22 made decisions under perceptual uncertainty with prior knowledge; 22 made decisions under categorical uncertainty with prior knowledge; and 22 made decisions under both uncertainty conditions with no prior knowledge. For the first subject group, we acquired data from 26 subjects but excluded data from four because *d*′ from the scanning data was more than 2 SDs below the mean of the other subjects' *d*′ (two subjects) or a decision criterion shift was observed in the non-predicted direction (two subjects). The last two subject groups were described in a previous publication in which we used a different approach to analyze the datasets (Hansen et al., 2011); the perceptual uncertainty data are presented here for the first time.

## **STIMULI AND TASK**

Subjects used two fingers of the right hand to press buttons to report decisions about visual targets. In the categorical uncertainty condition, targets were form-modulated. The form-modulated targets were distorted circles with sinusoidal modulation ranging linearly from 4 to 22% of the mean radius, with a step size of 0.5% (**Figure 1**). No noise obscured the form-modulated targets. Distributions of Category S and B form-modulated targets were Gaussian and overlapping (Healy and Kubovy, 1981; Maddox, 2002). The overlapping distributions (**Figure 1**) made the intermediate form-modulated targets ambiguous, so that the targets alone would not contain sufficient information for subjects to classify them with perfect accuracy. In the perceptual uncertainty condition, targets were signal-to-noise-modulated (SNR-modulated). The SNR-modulated targets were distorted circles (Wilkinson et al., 1998) with sinusoidal modulation of either 4% (Category S for smooth) or 22% (Category B for bumpy) of the mean radius, obscured with noise (**Figure 1**). The noise pattern used in each image was unique. We created the noise patterns by combining an original target's power spectrum with random phases and converting this information back to image space via inverse Fourier transform. Each target was overlaid with its own noise pattern, using one of nine different weight ratios that ranged from 15% target + 85% noise to 40% target + 60% noise. The weights were derived from pilot studies done outside the scanner in order to equate behavioral performance (as measured by *d*′ and by the magnitude of the criterion shift between the 80/20 and 50/50 prior knowledge conditions) during perceptual uncertainty relative to categorical uncertainty. In all cases, targets were presented one at a time with random sizes, orientations, and locations to prevent subjects from relying on retinotopic location or spatial attention in order to perform well.
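For readers unfamiliar with these behavioral measures, the sketch below computes *d*′ and the decision criterion from a hit rate and a false alarm rate using the standard equal-variance Gaussian formulas of signal detection theory (Green and Swets, 1966). The source does not spell out how these quantities were computed, so the formulas, the function name, and the definition of a "hit" (e.g., responding "B" to a Category B target) are assumptions of this illustration.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion
```

Under this convention, a criterion shift between the 80/20 and 50/50 conditions would appear as a difference in the second returned value, while sensitivity is summarized by the first.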

Before entering the scanner, all subjects underwent behavioral training that provided instant feedback after each trial. For the subjects making decisions with prior knowledge, the explicit prior knowledge cue "80/20" in some runs and "50/50" in other runs preceded each SNR-modulated or form-modulated shape. The indicated target category – that is, the category indicated by 80 in the 80/20 runs – was either S or B for each subject. The 80/20 training runs were comprised of 80% indicated and 20% contraindicated targets, and the 50/50 training runs were comprised of 50% indicated and 50% contraindicated targets. Thus, during training, the explicit prior knowledge cues reflected the implicit prior probability distributions of the targets. The training distributions were created by manipulating the prior probability of occurrence of the physical targets themselves, rather than changing the category boundary. The prior knowledge subjects were informed explicitly that the target distributions were either 80% indicated and 20% contraindicated or 50% of each, and their understanding of this concept and the task was confirmed by their answers to questions during pre-training instruction.

For the naïve subjects, a sham cue "OO/OO" preceded each SNR-modulated or form-modulated shape. The subjects were told that they could think of the letter O's as open eyes reminding them to keep looking at the screen. Except for the cue, the training runs for the naïve subjects were identical in all respects, including the target image sets, to the 50/50 training runs for the prior knowledge subjects. Thus, the training runs for the naïve subjects were comprised of 50% S and 50% B targets, although the subjects were not informed explicitly of this fact. The subjects' understanding of the task was confirmed by their answers to questions during pre-training instruction.

During scanning, no subject received feedback. In one-third of the scanning trials, a blank screen took the place of the target and subjects were instructed to make no response; including these blank trials permitted us to obtain estimates of activity during decision vs. blank trials. The only difference in the runs for the prior knowledge subjects vs. the naïve subjects was that the cues – as in the training runs – were 80/20 or 50/50 for the prior knowledge subjects and OO/OO for the naïve subjects. Importantly, for all subjects, all scanning runs were comprised of 50% S and 50% B targets, and the target images themselves were identical in every respect for all subjects. This control ensured that differences between prior knowledge conditions could be attributed only to the cue and not to stimulation differences.

The order of trial types (Category S target, Category B target, or blank) for the scanning runs was determined by assigning each run a different ternary m-sequence. m-Sequences are efficient in terms of signal per time, especially for relatively short scan durations, and are exactly counterbalanced over time, minimizing any uncontrolled adaptation or expectation effects (Sutter, 2001; Buračas and Boynton, 2002). m-Sequences were generated using code written by Buračas and Boynton (2002). Each run-length m-sequence had length 3<sup>4</sup> − 1 = 80 trials, consisting of 27 Category S stimulus trials, 27 Category B stimulus trials, and 26 blank trials. Each trial lasted 2.5 s. A blank grayscale screen was shown for 10 s at the beginning of each run to allow the magnetic field to reach equilibrium and for 12.5 s at the end of each run to allow for the delay in the hemodynamic response. The data presented here represent six runs at 80/20 and six runs at 50/50 from each prior knowledge subject, under either perceptual or categorical uncertainty, and six runs under perceptual and six runs under categorical uncertainty from each naïve subject.
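The 27/27/26 split follows directly from the structure of a ternary m-sequence, as the following minimal sketch shows. It is not the Buračas and Boynton code: the shift-register recurrence (characteristic polynomial x⁴ + x + 2, which is primitive over GF(3)), the seed, and the mapping of digits to trial types are all assumptions made for illustration.

```python
from collections import Counter

TRIAL_TYPES = {0: "blank", 1: "S", 2: "B"}  # labelling chosen for illustration


def ternary_m_sequence(seed=(0, 0, 0, 1)):
    """One period (3**4 - 1 = 80 symbols) of a ternary m-sequence.

    Uses the recurrence s[n+4] = (2*s[n+1] + s[n]) mod 3, whose
    characteristic polynomial x^4 + x + 2 is primitive over GF(3).
    """
    if not any(seed):
        raise ValueError("seed must contain at least one nonzero digit")
    state = list(seed)
    sequence = []
    for _ in range(3 ** 4 - 1):
        sequence.append(state[0])
        next_digit = (2 * state[1] + state[0]) % 3
        state = state[1:] + [next_digit]
    return sequence


trials = [TRIAL_TYPES[d] for d in ternary_m_sequence()]
counts = Counter(trials)  # 27 "S", 27 "B", 26 "blank"
```

The register visits every nonzero four-digit state exactly once per period, so each nonzero digit appears 27 times and zero appears 26 times. With this recurrence, a different nonzero seed yields a cyclic shift of the same sequence.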

#### **IMAGING DATA ACQUISITION AND PREPROCESSING**

All MRI data were collected on a GE 3-T scanner with a GE whole-head eight-channel coil. For fMRI we used an EPI (echoplanar imaging) sequence with TR (repetition time) = 2.5 s per shot (=2.5 s per acquired brain volume), TE (echo time) = 30 ms, field of view 22 cm × 22 cm, resolution 64 × 64 voxels per slice (in-plane voxel size 3.4 mm × 3.4 mm), and slice thickness 3.0 mm. Each fMRI brain volume consisted of 38 axial slices. For anatomical images we used an MP-RAGE (magnetization prepared rapid acquisition gradient echo) sequence with field of view 24 cm × 24 cm, 128 locations per slab, and slice thickness 1.2 mm. Unless otherwise noted, preprocessing and subsequent analysis of the MRI data were performed with the AFNI software package (Cox, 1996; Cox and Hyde, 1997). The first four brain volumes of every fMRI run were removed; brain volumes were shifted to account for slice acquisition time and motion-corrected. Each subject's T1-weighted anatomical dataset was warped via 12-parameter affine transform to the TT-N27 brain template.
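The timing figures given for the runs and the acquisition fit together, as a short calculation shows. The volume counts below are derived, not reported: they assume that volumes were acquired continuously from the start of the lead-in screen to the end of the lead-out period, one per TR.

```python
TR_S = 2.5             # one brain volume per TR
TRIALS_PER_RUN = 80    # length of each m-sequence
LEAD_IN_S = 10.0       # blank screen at the start of the run
LEAD_OUT_S = 12.5      # blank screen at the end of the run
DISCARDED_VOLUMES = 4  # first four volumes removed in preprocessing

run_duration_s = LEAD_IN_S + TRIALS_PER_RUN * TR_S + LEAD_OUT_S  # 222.5 s
volumes_acquired = int(run_duration_s / TR_S)                    # 89 (assumed)
volumes_analyzed = volumes_acquired - DISCARDED_VOLUMES          # 85 (assumed)
in_plane_voxel_mm = 220.0 / 64                                   # 3.4375 mm
```

Under that assumption, the four discarded volumes span exactly the 10 s lead-in, and the 22 cm field of view divided over 64 voxels gives the stated in-plane size of about 3.4 mm.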

#### **ROI IDENTIFICATION**

To identify ROIs as a test of our main hypothesis – that prior knowledge would increase activity in sensory processing regions in decisions made under perceptual uncertainty but not categorical uncertainty – we used a general linear model (GLM) in which the regressor of interest was a sequence of 0's and 1's convolved with a model hemodynamic function. The 0's and 1's represented blank and decision trials respectively. The outputs of each GLM were voxelwise beta weights representing decision trial activity for a single subject in one condition, where the six possible conditions were perceptual uncertainty with prior knowledge at 80/20, perceptual uncertainty with prior knowledge at 50/50, perceptual uncertainty without prior knowledge, categorical uncertainty with prior knowledge at 80/20, categorical uncertainty with prior knowledge at 50/50, and categorical uncertainty without prior knowledge. Using a two-tailed *t*-test on data from the 80/20 and naïve conditions, pooled across all subjects, we calculated the group voxelwise significance of the absolute value of the difference between the beta weights from the perceptual uncertainty vs. categorical uncertainty conditions. ROIs were located by limiting surviving clusters in the group results to regions with *p*-values < 0.05, corrected for multiple comparisons across voxels. Cluster coordinates were determined by affine registration to the TT-N27 brain template.
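The regressor of interest can be sketched as follows. The text specifies only "a model hemodynamic function", so the single-gamma shape used here is an assumption, as are the function names and the once-per-TR sampling. Fitting the GLM itself is omitted.

```python
import math


def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma response peaking at (shape - 1) * scale seconds.

    Illustrative stand-in for the unspecified model hemodynamic function.
    """
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def decision_regressor(trial_types, tr_s=2.5, hrf_length_s=25.0):
    """Convolve a 0/1 trial sequence (0 = blank, 1 = decision) with the HRF,
    sampled once per TR, and truncate the result to the run length."""
    boxcar = [0.0 if t == "blank" else 1.0 for t in trial_types]
    kernel = [gamma_hrf(i * tr_s) for i in range(int(hrf_length_s / tr_s) + 1)]
    regressor = []
    for n in range(len(boxcar)):
        regressor.append(
            sum(kernel[k] * boxcar[n - k] for k in range(len(kernel)) if n - k >= 0)
        )
    return regressor
```

With these parameters, a single decision trial produces a predicted response that peaks two TRs (5 s) after trial onset. The beta weight for this regressor then estimates decision-related activity relative to blank trials.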

In a subsequent test we located regions with positive decision-related fMRI activity in the conjunction of four conditions: 80/20 perceptual uncertainty, naïve perceptual uncertainty, 80/20 categorical uncertainty, and naïve categorical uncertainty. Using a two-tailed *t*-test pooled across all subjects, we calculated the group voxelwise significance of the mean activation level for each condition. ROIs were located by limiting surviving clusters to regions with *p*-values < 0.05, corrected for multiple comparisons across voxels and experimental conditions. The conjunction here was the strict conjunction of conditions. As in Nichols et al. (2005), we used a test for a logical AND by requiring that all the comparisons in the conjunction were individually significant: to obtain the corrected *p* < 0.05 across conditions, we required positive activations of *p* < 0.0125 in every one of the four conditions. Cluster coordinates were determined by affine registration to the TT-N27 brain template.
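The logical-AND rule amounts to a per-condition threshold of 0.05 / 4 = 0.0125 applied to every condition at once. A minimal sketch follows. The input layout (one list of voxelwise *p*- and *t*-values per condition) and the function name are hypothetical, and the correction for multiple comparisons across voxels is not modeled.

```python
def conjunction_mask(p_values_by_condition, t_values_by_condition, alpha=0.05):
    """Logical-AND conjunction: a voxel survives only if it is positively
    activated at p < alpha / n_conditions in every condition."""
    n_conditions = len(p_values_by_condition)
    per_condition_alpha = alpha / n_conditions  # 0.0125 for four conditions
    n_voxels = len(p_values_by_condition[0])
    return [
        all(
            p_values_by_condition[c][v] < per_condition_alpha
            and t_values_by_condition[c][v] > 0
            for c in range(n_conditions)
        )
        for v in range(n_voxels)
    ]
```

A voxel that misses the threshold in even one condition, or that is significant with a negative sign, is excluded. This is what makes the conjunction strict.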

We also located regions where activity, as defined by linear covariation with the degree of uncertainty, differed across uncertainty conditions. The object here was to test for the possibility that although no regions exhibited greater average activity during categorical than perceptual uncertainty, some regions' activity covaried with categorical but not perceptual uncertainty and vice versa. We performed ROI searches using this approach on the naïve and 80/20 data independently. In each prior condition, we performed a whole-brain search and a search constrained to perirhinal cortex and anterior temporal lobe, which are known to be responsive for learned visual categories. With the exception of the regressors used in the GLM analyses of individual subject data, these analyses were essentially equivalent to that defining the ROIs for our main hypothesis (above). In the covariation analysis, the regressor of interest was a sequence of numbers ranging between 0 and 1, convolved with a model hemodynamic function. Before convolution, target trials were represented by a number between 0 and 1 equivalent to the distance from the target distribution's nearest extreme to its midpoint. Thus, midpoint targets received a value of 1 (representing complete uncertainty), endpoint targets received a value of 0 (representing no uncertainty), and intermediate targets received values scaling proportionately. Blank trials received values of 0.
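The mapping from stimulus level to uncertainty value can be written as a small function. The example below uses the form-modulated range (4 to 22% of the mean radius, midpoint 13%). The stimulus axis used for the SNR-modulated targets is not specified here, so the function takes the range as a parameter. The function name is hypothetical.

```python
def uncertainty_value(stimulus_level, low, high):
    """Map a stimulus level onto [0, 1]: 0 at either extreme of the
    target range, 1 at its midpoint, linear in between."""
    midpoint = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return 1.0 - abs(stimulus_level - midpoint) / half_range


# Form-modulated examples: 13% modulation -> 1.0, 8.5% -> 0.5, 4% or 22% -> 0.0
examples = [uncertainty_value(m, 4.0, 22.0) for m in (13.0, 8.5, 4.0, 22.0)]
```

Blank trials would simply be assigned 0 before the sequence is convolved with the hemodynamic function.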

## **RESULTS**

### **BEHAVIOR**

The behavioral data acquired during fMRI data acquisition (**Figure 2**) indicated that training with the prior knowledge cues induced a decision bias during the fMRI experiment. In this paper, the term *prior knowledge subjects* refers to subjects trained in conditions that both implicitly and explicitly indicated that one of two target categories was likelier to be presented than the other. We refer to prior knowledge subjects trained that Category S (or B) was the likely category as Group S (or B) prior knowledge subjects. The term *naïve subjects* refers to subjects trained in conditions that did not implicitly or explicitly indicate either category as more likely than the other. For details about the training, see Section "Materials and Methods" and **Figure 1**. During the fMRI experiment, Group S (or B) prior knowledge subjects working under both perceptual and categorical uncertainty responded "S" (or "B") for a given shape more often than did the naïve subjects making decisions about the same shapes (**Figure 2**). That is – unsurprisingly – prior knowledge about the stimuli biased subjects' decisions in favor of the expected stimulus type. Under both uncertainty conditions, given the 50/50 prior knowledge cue, the prior knowledge subjects retained a persistent, though diminished, bias of the same sign as their bias in the 80/20 condition; for an in-depth examination of this phenomenon in the categorical uncertainty condition, see Hansen et al. (2011).

To obtain a first indication of the mechanisms underlying the decision bias induced by prior knowledge, we examined response times (RTs) in all subjects (**Figure 3**). In the prior knowledge subjects, RTs were shorter at 80/20 than 50/50 in both uncertainty conditions, demonstrating that prior knowledge about the stimuli conferred a speed advantage regardless of uncertainty type. In the prior knowledge subjects, RTs were also shorter for subjects performing under perceptual than categorical uncertainty. This observation suggests that the mechanism by which prior knowledge is integrated into decisions differs when the decisions are made under perceptual vs. categorical uncertainty. Importantly, RTs in the naïve subjects did not differ during perceptual vs. categorical uncertainty, implying that our effort to match difficulty across uncertainty types by adjusting the noise weights in the perceptual uncertainty condition was successful.

#### **IMAGING DATA**

The current study was designed to reveal differences in how the brain integrates prior knowledge into decisions during perceptual vs. categorical uncertainty. To investigate this topic, we first identified ROIs in which activation levels were different in decision trials made under perceptual vs. categorical uncertainty, pooled across all subjects (Materials and Methods). The ROI locations – left MidFG, left LO and posterior fusiform (LO/pF) cortex, and right LO – are shown in **Figure 4**, and their coordinates and volumes are listed in **Table 1**. These results show that left MidFG, left LO/pF, and right LO responded differentially to the perceptual uncertainty and the categorical uncertainty conditions.

The analysis that located the ROIs pooled the data across all subjects. To indicate whether the differences were driven by the prior knowledge subjects, the naïve subjects, or both groups, we plotted activations for each condition separately in a within-ROI bar chart (**Figure 5**). The chart shows that the differences were driven by the prior knowledge data. In all three ROIs, the perceptual uncertainty condition evoked the same activity level as the categorical uncertainty condition in the naïve subjects. At 80/20, the prior knowledge subjects showed greater activation than the naïve subjects in visual association (LO/pF) and prefrontal (MidFG) cortices during perceptual uncertainty. In contrast, during categorical uncertainty, at 80/20 the prior knowledge subjects showed less activation than the naïve subjects in bilateral LO (in the MidFG ROI, a trend in the same direction did not reach significance). Thus, the results supported our main hypothesis: prior knowledge increased activity in sensory processing regions in decisions made under perceptual uncertainty but not categorical uncertainty. The results also indicated a prefrontal mechanism for this effect, namely, positive activity levels in left MidFG, which occurred only during decisions made in the combination of perceptual uncertainty and prior knowledge.

**FIGURE 3 | Response times.** The 80/20 cue gave a speed advantage to the prior knowledge subjects relative to the 50/50 cue in the same subjects. Across the prior knowledge subjects, response times were longer for the subjects performing under categorical than perceptual uncertainty. This difference was not seen in the naïve subjects. Stars indicate p < 0.0001, calculated via a two-tailed t-test across conditions.

The procedure for locating ROIs was based on a contrast: the absolute value of the difference between activity levels during perceptual vs. categorical uncertainty. We also wished to document the regions in which decisions in each of four conditions – perceptual uncertainty with 80/20 prior knowledge, perceptual uncertainty without prior knowledge, categorical uncertainty with 80/20 prior knowledge, and categorical uncertainty without prior knowledge – elicited positive levels of activation. **Table 2** lists the coordinates and volumes of brain regions active in the strict conjunction (defined by logical AND) of the four conditions (*p* < 0.05, corrected for multiple comparisons between voxels and experimental conditions): right MidFG; left putamen (Put); two clusters in the left anterior insula (AntIns1 and AntIns2); a left hemisphere thalamic cluster (Thal) whose coordinates included those of the medial dorsal, ventral posterior medial, ventral posterior lateral, and ventral lateral nuclei; a left hemisphere cluster including postcentral gyrus, inferior parietal lobule (IPL), and intraparietal sulcus (PcG/IPL/IPS); right IPL; and large bilateral clusters covering much of ventrotemporal cortex plus some cerebellum (VT/cereb).

#### **Table 1 | Brain regions selective for uncertainty condition.**

This table provides the coordinates and volumes of voxel clusters responding with activation differences (p < 0.05, corrected) between decision trials under perceptual vs. categorical uncertainty, pooled across subjects. Negative (positive) values of x indicate the left (right) hemisphere.

**FIGURE 4 | Brain regions selective for uncertainty condition.** We pooled the 80/20 and naïve fMRI datasets across all subjects and searched for regions with a significant (p < 0.05, corrected) difference, of either sign, between the perceptual and categorical uncertainty conditions. Surviving clusters are shown here, overlaid on an average of the anatomical images from all subjects.

**FIGURE 5 | Within-ROI results.** The bar charts show mean brain activity across all decision trials, relative to blanks, in each condition. Stars indicate differences of p < 0.05 as calculated with a two-tailed t-test. In the naïve subjects, decisions under perceptual uncertainty evoked the same level of within-ROI activity as decisions under categorical uncertainty. Adding the prior knowledge cue to decisions under perceptual uncertainty increased activation in visual association and prefrontal cortices relative to no prior knowledge cue. Adding the prior knowledge cue to decisions under categorical uncertainty decreased activation in visual association cortex relative to no prior knowledge cue. In left MidFG, only decisions made under perceptual uncertainty with the 80/20 prior knowledge cue resulted in positive brain activations; the mean activation with the 50/50 cue was not significantly above zero.

One particularly interesting observation emerging from this table is of a right hemisphere MidFG (a.k.a. dorsolateral prefrontal, DLPFC) cluster, active in all four conditions. The right MidFG cluster also overlaps with a region in which activity modulations across prior knowledge conditions were previously shown to correlate with the main effect of prior knowledge on decision behavior, i.e., a shift in the decision criterion (Hansen et al., 2012). The right MidFG cluster observed in the current study is also located at coordinates that are essentially the mirror image of the left hemisphere MidFG coordinates. Recall that the left hemisphere MidFG ROI was activated by decisions during both prior knowledge and perceptual uncertainty, but deactivated in all other conditions. Thus, across all conditions the overall pattern of activity in DLPFC was generally right-biased, becoming bilateral only when prior knowledge was combined with perceptual uncertainty.

#### **Table 2 | Brain regions active in all conditions.**

This table provides the coordinates and volumes of voxel clusters responding with positive activations during decision trials in all four conditions (p < 0.05, corrected), where the conditions were perceptual uncertainty with prior knowledge, perceptual uncertainty without prior knowledge, categorical uncertainty with prior knowledge, and categorical uncertainty without prior knowledge. Negative (positive) values of x indicate the left (right) hemisphere.

We also located regions where activity, as defined by linear covariation with the degree of uncertainty, differed across uncertainty conditions. The object here was to test for the possibility that although no regions exhibited greater average activity during categorical than perceptual uncertainty, some regions' activity covaried with categorical but not perceptual uncertainty and vice versa. We performed ROI searches using this approach on the naïve and 80/20 data independently. In each prior condition, we performed a whole-brain search and a search constrained to perirhinal cortex and anterior temporal lobe, which are known to be responsive for learned visual categories.

Whole-brain searches in naïve subjects revealed ROIs in left and right LO (**Table 3**). These overlapped with the LO ROIs identified in the main test. Activity levels in both ROIs covaried with the degree of uncertainty more during perceptual than categorical uncertainty. In prior knowledge subjects at 80/20, smaller overlapping ROIs showed the same difference sign (perceptual over categorical). An additional ROI was identified in lingual gyrus in the 80/20 data only, and in this ROI activity levels covaried more during categorical than perceptual uncertainty. However, some care should be exercised in interpreting this result, since the location of the ROI appears to be consistent with a part of early visual cortex (V1 or V2) representing far-peripheral visual space that would not have been stimulated by our targets. While we find it difficult to provide a simple explanation, we note that similar regions often appear in lists of activated regions in cognitive neuroscience papers (though the correspondence to far-peripheral V1/V2 is rarely mentioned). Possibly, some spatial attentional effect may be involved.

#### **Table 3 | Brain regions selective for uncertainty condition when activity was defined as covariation with degree of uncertainty.**

This table provides the coordinates and volumes of voxel clusters responding with covariation differences (p < 0.05, corrected) between perceptual vs. categorical uncertainty. Negative (positive) values of x indicate the left (right) hemisphere.

#### **Table 4 | Brain regions selective for uncertainty condition when activity was defined as covariation with degree of uncertainty and the search space was constrained to anterior temporal lobe and perirhinal cortex.**

This table provides the coordinates and volumes of voxel clusters responding with covariation differences (p < 0.05, corrected) between perceptual vs. categorical uncertainty. Negative (positive) values of x indicate the left (right) hemisphere.

A similar search, constrained to perirhinal cortex and anterior temporal lobe, identified one ROI (**Table 4**) in which activity levels covaried more during categorical than perceptual uncertainty in naïve subjects. No ROIs in this anatomical search space were identified with greater covariation for perceptual than categorical uncertainty in naïve subjects, and no ROIs in this anatomical search space were identified at all in the 80/20 data.

One potential concern with the above observations is that the contrast used to define the key ROIs was based on a subject pool of which two-thirds were prior knowledge subjects. The potential pitfall here is a scenario in which the naïve subjects might have had ROIs in other locations, in which existing differences between the two uncertainty conditions failed to reach significance in the pooled subject dataset. To check against this possibility, we performed a separate analysis in an attempt to locate ROIs for the perceptual vs. categorical uncertainty conditions in the 22 naïve subjects only. No clusters were found that survived our statistical threshold.

## **DISCUSSION**

In this study, we asked subjects undergoing fMRI scanning to make decisions about visual targets under conditions of perceptual and categorical uncertainty, with and without prior knowledge of the response that was likely to be correct. Subjects trained to use a prior knowledge cue showed larger positive activations in bilateral LO and left pF cortex during decisions made under perceptual uncertainty than did naïve subjects. Under categorical uncertainty, the prior knowledge subjects experienced smaller decision-related positive activations than did naïve subjects. In the left MidFG, the condition associated with the highest activation levels in the occipital ROIs – namely, prior knowledge during perceptual uncertainty – was the only one eliciting positive decision-related activity. During perceptual uncertainty when no prior knowledge was available and during categorical uncertainty, regardless of prior knowledge, decisions negatively activated left MidFG.

These observations enhance our understanding of the integration of prior knowledge into decision-making in several respects. First, the results demonstrate that top-down prior knowledge effects in the brain during perceptual decisions depend on the reason the decisions are difficult. Namely, the sign of the priors-related modulation in visual cortex was positive when the sensory evidence was difficult to perceive and negative when the evidence was easy to perceive but difficult to interpret. The sign difference observed across these two conditions implies that the effects of prior knowledge on perceptual decisions are not uniform across decision types, but rather depend on the attributes of the stimuli about which the decisions are made.

Besides simply establishing a dependence on stimulus attributes, the observations point to underlying neural mechanisms in each uncertainty condition. During perceptual uncertainty, decisions in the context of prior knowledge positively activated left MidFG (**Figure 4**; **Table 1**). The prefrontal decision-related activation in the prior knowledge and perceptual uncertainty condition was actually bilateral, since this condition as well as the others positively activated right MidFG (**Table 2**). Our observations may be related to the observation by Rahnev et al. (2011) of larger activity in the lateral prefrontal cortex, not far away from the current site of activity, when participants had prior knowledge about a perceptual decision. The bilateral MidFG activation was also associated with increased positive activation in bilateral LO and left pF cortex, visual regions selective for shapes and objects (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and Kanwisher, 2000). This increase in activation confirmed the prediction that motivated the current study: when stimuli were noisy, such that enhancing the representation of the sensory evidence could be adaptive, prior knowledge increased activation in the relevant parts of visual cortex. The anatomical locations of the occipital ROIs are consistent with previously documented loci for shape selectivity (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and Kanwisher, 2000), so these were precisely the regions in which signal modulation had the most potential to affect performance in our shape decision task. Thus, the perceptual uncertainty observations are analogous to a previous demonstration that prior knowledge favoring faces (or houses) enhanced fMRI activity in FFA (or PPA; Esterman and Yantis, 2010). In parallel, during categorical uncertainty, subjects with prior knowledge experienced significantly lower activation levels in the visual ROIs than did naïve subjects. 
The prediction motivating this study was that prior knowledge would increase visual cortical activity during perceptual but not categorical uncertainty; we did not explicitly predict that prior knowledge would actually decrease visual cortical activity during categorical uncertainty. However, the decrease is intuitive; it suggests that the prior knowledge subjects were giving less weight to the sensory evidence of curvature than were the naïve subjects. Such a strategy would be reasonable, as the RT data imply that adding prior knowledge to the categorical task imposed an additional cognitive load relative to the naïve condition. Giving less weight to visual appearance, relative to the naïve condition, may have partly compensated for an increased cognitive load.

Our observations may be relevant to those from previous studies that identified dissociations between abstract rule- or category-selective activity in prefrontal cortex and stimulus-selective activity in more posterior brain regions. For example, Jiang et al. (2007) asked subjects undergoing fMRI to make decisions about morphed cars and showed that changing perceptual vs. categorical qualities of the stimuli modulated activity in LO and right prefrontal cortex respectively. Similarly, Montojo and Courtney (2008) used a mental arithmetic task with fMRI and showed that rule updating preferentially activates prefrontal cortex while number updating preferentially activates parietal cortex.

During perceptual uncertainty without prior knowledge, and during categorical uncertainty regardless of prior knowledge, decisions negatively activated left MidFG (**Figure 4**; **Table 1**). The term negative activation, also known as deactivation, means that the fMRI signal level was lower during trials when a target was present and a decision was made than during blank trials when no target was present and no decision was made. Negative activations are seen in brain regions whose function is not relevant to the experimental condition being tested. For example, when task-relevant stimuli are visual, stimulus presentation often results in negative activation of auditory cortex (Haxby et al., 1994; Amedi et al., 2005). Concurrent negative and positive activations can also occur in left hemisphere and right hemisphere counterparts of the same cortical area. For example, stimulation of the right median nerve, which elicits positive activation in left primary somatosensory cortex, also elicits negative activation in right primary somatosensory cortex (Hlushchuk and Hari, 2006; Kastrup et al., 2008). One interpretation of such observations is that negative activations reflect suppression of functional activity that is not required for the task at hand. According to this line of reasoning, our results imply that left MidFG plays a role in integrating prior knowledge during perceptual uncertainty, but is not required during decisions in general. This conclusion is consistent with our previous results that implicated only right MidFG involvement in prior knowledge during categorical uncertainty (Hansen et al., 2011, 2012).

Our results also show that the modulation of sensory activity cannot be attributed to a general arousal effect, but rather is targeted to the part of visual cortex where a modulation could have the most impact on task performance. This can be seen by examining the location of the occipital ROIs: bilateral LO and left pF, regions already known to be selective for shapes and objects like our shape targets (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and Kanwisher, 2000). For comparison, we did not see any effects in earlier visual areas, such as V1, V2, or V3, which are selective for the spatial location but not for the shape of visual stimuli. Since our stimuli were jittered in size, orientation, and spatial position, modulatory effects in the earlier visual areas would not be predicted to affect performance. Changes in arousal or attention have been shown to modulate signals in these earlier visual areas (Tootell et al., 1998; Watanabe et al., 1998; Somers et al., 1999; Huk and Heeger, 2000). Since no such modulation was observed in the earlier areas, we conclude that the modulation that we did observe in LO and pF was not due to overall arousal or attentional state.

During both uncertainty conditions, the decision response curves (**Figure 2**) and the within-ROI fMRI activity levels (**Figure 5**) seen in prior knowledge subjects at 50/50 tended to fall between activity levels seen in the same subjects at 80/20 and activity levels in the naïve subjects. A previous publication (Hansen et al., 2011) focuses on this interesting *persistent bias* pattern in the categorical uncertainty behavioral and fMRI data, showing for the first time that practice making decisions under categorical uncertainty in the context of non-equal prior probabilities biases decisions made later when prior probabilities are equal. In simple terms, once you learn a bias it is hard to let it go. The observation of the same tendencies in the perceptual uncertainty data indicates that bias persistency is not unique to categorical uncertainty, but may generalize across decision-making paradigms.

Our manipulation of categorical uncertainty involved ambiguous shapes. It might be asked whether we performed a true test of categorical uncertainty, which would require keeping the shape information constant, but varying the validity of the association between the shape and the correct response. In fact, this description fits our manipulation well. The simplest way to see this is to consider a single categorical shape with curvature in the intermediate (ambiguous) range – for example, the shape with average (13%) curvature. A subject's experience with this shape is equivalent to the true test of categorical uncertainty. At 50/50 this particular shape is associated with complete uncertainty, while at 80/20 there is less uncertainty for this shape. A similar relationship between the prior condition and the uncertainty level holds for every shape in the intermediate range. Shapes on the extreme ends of the distribution are not ambiguous and therefore are associated with no uncertainty, but this attribute is common to both the categorical and the perceptual uncertainty conditions.

The increased visual cortical activity seen with prior knowledge during perceptual (but not categorical) uncertainty is reminiscent of the increased visual cortical activity seen with top-down, goal-directed, endogenous attention. Conceivably, similar to the effects of directing attention to noisy stimuli (Lu and Dosher, 1998), an adaptive modulation could enhance stimulus attributes indicated by the prior knowledge and/or decrease contraindicated stimulus attributes. Future experiments could explore this issue by systematically investigating the effects of attention on classifying targets during perceptual vs. categorical uncertainty.

#### **REFERENCES**


brain activity predicts individual differences in prior knowledge use during decisions. *J. Cogn. Neurosci.* 24, 1462–1475.


## **ACKNOWLEDGMENT**

This work was supported by the NIMH Intramural Research Program.

attention mechanisms. *Vision Res.* 38, 1183–1198.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 June 2012; accepted: 22 October 2012; published online: 16 November 2012.*

*Citation: Hansen KA, Hillenbrand SF and Ungerleider LG (2012) Effects of prior knowledge on decisions made under perceptual vs. categorical uncertainty. Front. Neurosci. 6:163. doi: 10.3389/fnins.2012.00163*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Hansen, Hillenbrand and Ungerleider. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## What are the odds? The neural correlates of active choice during gambling

## *Bettina Studer 1,2\*, Annemieke M. Apergis-Schoute 1,2, Trevor W. Robbins 1,2 and Luke Clark 1,2*

<sup>1</sup> Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK

<sup>2</sup> Department of Experimental Psychology, University of Cambridge, Cambridge, UK

#### *Edited by:*

Ming Hsu, University of California Berkeley, USA

#### *Reviewed by:*

O'Dhaniel A. Mullette-Gillman, National University of Singapore, Singapore; Lusha Zhu, Virginia Tech Carilion Research Institute, USA; Adam Craig, University of South Florida, USA

#### *\*Correspondence:*

Bettina Studer, Department of Psychology, University of Basel, Missionsstrasse 62A, 4055 Basel, Switzerland. e-mail: bettina.studer@unibas.ch

Gambling is a widespread recreational activity and requires pitting the values of potential wins and losses against their probability of occurrence. Neuropsychological research showed that betting behavior on laboratory gambling tasks is highly sensitive to focal lesions to the ventromedial prefrontal cortex (vmPFC) and insula. In the current study, we assessed the neural basis of betting choices in healthy participants, using functional magnetic resonance imaging of the Roulette Betting Task. In half of the trials, participants actively chose their bets; in the other half, the computer dictated the bet size. Our results highlight the impact of volitional choice upon gambling-related brain activity: Neural activity in a distributed network – including key structures of the reward circuitry (midbrain, striatum) – was higher during active compared to computer-dictated bet selection. In line with neuropsychological data, the anterior insula and vmPFC were more activated during self-directed bet selection, and responses in these areas were differentially modulated by the odds of winning in the two choice conditions. In addition, responses in the vmPFC and ventral striatum were modulated by the bet size. Convergent with electrophysiological research in macaques, our results further implicate the inferior parietal cortex (IPC) in the processing of the likelihood of potential outcomes: Neural responses in the IPC bilaterally reflected the probability of winning during bet selection. Moreover, the IPC was particularly sensitive to the odds of winning in the active-choice condition, when the processing of this information was required to guide bet selection. Our results indicate an important role of the IPC in human decision-making under risk and help to integrate neuropsychological data of risk-taking following vmPFC and insula damage with models of choice derived from human neuroimaging and monkey electrophysiology.

**Keywords: betting, choice, fMRI, inferior parietal cortex, ventromedial prefrontal cortex, reward**

## **INTRODUCTION**

Gambling is a common recreational activity in which a bet, typically a sum of money, is placed on an uncertain prospect. Gambling can be seen as a form of decision-making under risk and requires pitting the subjective values of potential wins and losses against their probability of occurrence. Abnormal betting on laboratory gambling tasks has been observed in a number of psychiatric disorders that are characterized by impairments in everyday decision-making, such as addictions (Lawrence et al., 2009), bipolar disorder (Murphy et al., 2001; Roiser et al., 2009), and schizophrenia (Hutton et al., 2002). Neuropsychological research using the Cambridge Gamble Task (CGT) has further shown that laboratory betting behavior is highly sensitive to focal brain injury. Patients with lesions to the ventromedial prefrontal cortex (vmPFC) show increased overall betting (Mavaddat et al., 2000; Manes et al., 2002; Clark et al., 2003, 2008), while a group of patients with insula damage were impaired in adjusting their bets to the chances of winning (Clark et al., 2008). These results indicate that the anterior insula and the vmPFC are critically involved in betting decisions. In healthy participants, previous neuroimaging studies revealed that the vmPFC and anterior insula, among other structures, are activated during valuation of risky response options (e.g., Chib et al., 2009) and during anticipation of uncertain outcomes (for reviews, see Ernst and Paulus, 2005; Krain et al., 2006; Knutson and Greer, 2008; Liu et al., 2011). While the results of these studies on valuation are compatible with the aforementioned neuropsychological work, the neural responses to bet selection as the most direct analog of gambling-related choice in healthy humans have rarely been studied. In the current study, we administered the Roulette Betting Task (Studer and Clark, 2011), in which participants are asked to place bets on risky gambles with varying chances of winning, to healthy volunteers and assessed the neural responses during bet selection by use of functional magnetic resonance imaging (fMRI).

Our first aim was to investigate differences in neural responses during active and passive selection of bets. Research on real-life gambling has highlighted a key influence of active choice upon risk-taking behavior. Even in games of pure chance, gamblers prefer situations that allow direct choice or manual control, and place higher bets under such conditions, a phenomenon termed the "illusion of control" (Langer, 1975; Ladouceur and Mayrand, 1987; Davis et al., 2000). We have recently shown that the requirement for active choice boosts selection-related psychophysiological arousal during laboratory gambling (Studer and Clark, 2011). Furthermore, prior fMRI studies revealed that neural responses to the presentation of wins and losses in the striatum are enhanced under conditions of instrumental choice (Coricelli et al., 2005; Rao et al., 2008; Camille et al., 2011). In contrast, the influence of the requirement for active choice upon neural activity at the time of selection remains largely unstudied. In the current study, we compared neural responses during active (i.e., volitional, self-directed) versus computer-dictated selection of the bet amount. We hypothesized that neural activity during the selection phase in the brain reward circuitry, specifically in the striatum, would be higher in the active-choice condition.

Our second goal was to assess how the chances of winning are represented in the brain during the selection of bets. We reasoned that areas guiding risk-sensitive choice would be more responsive to the chances of winning during active compared to passive bet selection. Previous fMRI research assessing neural activity during outcome anticipation consistently found that neural responses in the anterior insula and vmPFC are modulated by the likelihood of potential outcomes (Critchley et al., 2001; Knutson et al., 2005; Preuschoff et al., 2006, 2008; Yacubian et al., 2006; Tobler et al., 2007; Rolls and Grabenhorst, 2008). The neural representation of the chances of winning during the selection phase, i.e., during the decision process *per se,* is less clear. A small number of previous fMRI studies indicate that, in addition to the anterior insula and vmPFC, the inferior parietal cortex (IPC) reflects the probability of potential outcomes during the choice window (Huettel et al., 2005; Van Leijenhorst et al., 2006; Smith et al., 2009). In close parallel, electrophysiological research in non-human primates reported that firing rates of neurons in the posterior parietal cortex co-vary with the reward likelihood during response selection (Shadlen et al., 1996; Platt and Glimcher, 1999; Shadlen and Newsome, 2001; McCoy and Platt, 2005; Kable and Glimcher, 2009). Thus, we hypothesized that neural activity in the IPC, anterior insula, and vmPFC would reflect the likelihood of winning during bet selection, particularly in the active-choice condition.

Our design also allowed the investigation of brain responses modulated by bet size. Previous fMRI studies found that the striatum and medial OFC are sensitive to the magnitude (and expected value) of potential rewards during outcome anticipation (Knutson et al., 2001, 2005; Yacubian et al., 2006; Tobler et al., 2007; Tom et al., 2007). Based on these results, we hypothesized that the striatum and the vmPFC would be sensitive to the bet size during the selection phase.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Right-handed male healthy volunteers (*n* = 41) took part in this study (mean age = 24 years, SD = 4) and attended a single MRI session following a screening appointment. Volunteers were prescreened to exclude MRI contraindications, regular use of drugs, regular gambling, and prior history of neurological or psychiatric illness. The study was approved by the national research ethics committee and was conducted in accordance with the Declaration of Helsinki. All participants gave written informed consent, and were reimbursed £40 for participation plus a variable bonus depending on their final score in the task, which participants were told would range between £0 and £10 (in reality, all participants received bonuses between £5 and £8). In the MRI session, participants received the task instructions and 10 practice trials before entering the scanner. Light head restraints were used to limit participants' head movement during MRI data acquisition. Two participants were excluded from analysis: one due to technical problems with the MRI scanner, the other due to problems with the normalization of MRI data.

## **TASK**

Participants were administered the Roulette Betting Task (Studer and Clark, 2011), a computerized task that assesses risk-sensitive decision-making. The task was programmed in Visual Basic 2008 (Microsoft Corp., Redmond, WA, USA). Participants viewed the computer monitor through a mirror fitted on top of the head coil and used an MRI-compatible button box to make their choices. Participants completed three runs of the task; each run consisted of 25 trials and lasted about 10 min. Each trial consisted of three phases: selection, anticipation, and feedback (see **Figure 1**). At the beginning of each trial, a fixation cross was displayed for a variable inter-trial interval, drawn from an exponential distribution ranging from 4 to 10 s. Subsequently, a wheel with 10 red and blue segments was presented, along with three bets. Participants were instructed that if the wheel stopped on a blue segment, they would win, and if the wheel stopped on a red segment, they would lose. The ratio of blue (winning) and red (losing) segments varied across trials, reflecting the chances of winning (60, 70, or 80%). The presentation of the wheel initiated the selection phase: participants were asked to choose one of the three presented bet boxes by pressing a corresponding key on the button box. Two trial types were contrasted: "active-choice" trials, in which the participants were required to select the size of bet (10, 50, or 90 points), and "no-choice" trials, in which all three bet boxes contained identical amounts. Once a response had been made, the corresponding bet box stayed highlighted until the end of the selection period (fixed duration = 3.5 s). The wheel then spun (anticipation period), with a variable duration drawn from an exponential distribution ranging from 4 to 8 s. The wheel stopped on one of the 10 segments, initiating the feedback period. If the wheel stopped on a blue segment, the chosen amount of points was won, and the outcome message "YOU WON [XX] POINTS" was presented. If the wheel stopped on red, the selected amount of points was lost, and the message "YOU LOST [XX] POINTS" appeared. The accumulated point score was presented to participants at the end of each run.
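The trial structure described above can be sketched in a few lines of Python. This is an illustration only, not the original Visual Basic task code: the function names, the mean of each exponential jitter, the equal stopping probability for each of the 10 segments, and the random assignment of bets are all assumptions, since the text specifies only the ranges (4 to 10 s and 4 to 8 s), the fixed 3.5 s selection period, and the possible odds and bet sizes.

```python
import random

SELECTION_S = 3.5  # fixed selection period stated in the text


def jitter(low, high, mean_excess, rng):
    """Draw low + Exponential(mean_excess), redrawing until the value is <= high.

    mean_excess is an assumed parameter; the text gives only the range.
    """
    while True:
        value = low + rng.expovariate(1.0 / mean_excess)
        if value <= high:
            return value


def simulate_trial(p_win_pct, bet, rng):
    """Simulate one trial, assuming each of the 10 segments is equally likely."""
    n_blue = p_win_pct // 10          # e.g., 60% -> 6 of the 10 segments are blue
    stop_segment = rng.randrange(10)  # wheel stops on one of the 10 segments
    won = stop_segment < n_blue
    return {
        "iti_s": jitter(4.0, 10.0, 2.0, rng),
        "selection_s": SELECTION_S,
        "anticipation_s": jitter(4.0, 8.0, 1.5, rng),
        "points": bet if won else -bet,
    }


rng = random.Random(0)
# One run of 25 trials with randomly assigned odds and bets (hypothetical schedule).
run = [
    simulate_trial(rng.choice([60, 70, 80]), rng.choice([10, 50, 90]), rng)
    for _ in range(25)
]
score = sum(trial["points"] for trial in run)
```

The accumulated `score` corresponds to the point total shown to participants at the end of a run.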

## **DATA ACQUISITION AND PREPROCESSING**

Gradient echo T2∗-weighted echo-planar images (EPIs) were acquired on a Siemens Tim Trio 3 Tesla magnet using a 32 slice axial oblique sequence, with a repetition time of 2 s (TE 30 ms, flip angle 78°, voxel size 3.0 mm × 3.0 mm × 3.0 mm, matrix size 64 × 64, field of view 192 mm × 192 mm, bandwidth 2442 Hz). In order to reduce signal dropout in the orbitofrontal cortex, the plane of acquisition was individually tailored for each participant by aligning it with the base of the brain (approximately 0° to −10° to the anterior commissure – posterior commissure line).


At the start of each of the three sessions, six dummy volumes were discarded to allow for equilibrium effects. Each run lasted a maximum of 360 repetitions (12 min), but was terminated early on block completion. In addition, a high-resolution T1-weighted structural image was collected for each participant.


Processing and analysis of fMRI data was performed using SPM5 (Statistical Parametric Mapping, Wellcome Department of Cognitive Neurology, London, UK). Data preprocessing consisted of within-subject spatial realignment, spatial normalization, and spatial smoothing using an isometric Gaussian kernel with a full width at half-maximum of 10 mm. Volumes were normalized to the International Consortium for Brain Mapping (ICBM) templates that approximate to Talairach and Tournoux (1988) space, using a matrix obtained from normalizing each subject's segmented structural scan onto the ICBM gray and white matter templates.

## **DATA ANALYSIS**

For analysis of behavioral responses, the following two measurements were assessed for each trial: (a) response time, (b) selected bet amount (in active-choice trials only). Statistical analysis of behavioral data was conducted in SPSS (Version 15.0; SPSS Inc., Chicago, IL, USA). All statistical tests are reported two-tailed, and alpha was set at 0.05.

We assessed event-related BOLD responses modeled to the selection and outcome phases of each trial, using a canonical hemodynamic response function implemented within a general linear model (GLM). Four event types were distinguished: *active-choice trials* and *no-choice trials* were modeled at selection onset using epoch functions with individual response times as the durations, and *wins* and *losses* were modeled at outcome with a duration of 2 s. The probability of winning and the bet size were added as parametric modulators onto the active-choice and no-choice selection regressors. Thus, a total of four parametric modulators were added to the GLM. The use of these decision variables as parametric modulators allows for the identification of brain areas in which the magnitude of BOLD responses correlates with the probability of winning and the bet size on a trial-by-trial basis. The design matrix hence comprised 8 columns [3 (selection: active choice) + 3 (selection: no choice) + 2 (feedback)], plus the 6 movement parameters from spatial realignment as covariates of no interest.

Twelve subjects uniformly selected the highest bet option in all active-choice trials. The lack of any variation in the bet size in active-choice trials made the calculation of the parametric modulator impossible for these subjects; hence they were excluded from further analysis. The remaining 27 participants included in the final analysis selected a bet other than their most frequently chosen one in 7–58% of active-choice trials (Mean = 34%, SD = 13%).

Next, we calculated the following first-level single-subject contrasts for the selection phase:


In the specified GLM, any shared variance between the two parametric modulators (probability of winning and bet size) is assigned to the probability modulator (entered first) through auto-orthogonalization implemented in SPM. We chose this ordering of modulators in this primary GLM as it gives maximal explanatory power to the probability modulator (see Hare et al., 2008; Symmonds et al., 2010). As the chances of winning and the size of chosen bets were correlated in the active-choice trials in most subjects, we conducted a follow-up analysis in order to test whether the regions associated with the likelihood of winning were *uniquely* sensitive to the probability independent of bet size. Thus, a second GLM was calculated, in which the order of modulators was reversed (bet size entered first). The activations identified in the contrasts (2) and (3) in the primary GLM were then compared with the results obtained in the same contrasts in this second GLM.
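The consequence of modulator order can be illustrated with a small numerical sketch. The snippet below is not SPM code; it assumes a simple Gram-Schmidt-style serial orthogonalization of mean-centered modulators, and the trial values and the function name `serially_orthogonalize` are hypothetical.

```python
import numpy as np


def serially_orthogonalize(modulators):
    """Mean-center each modulator, then orthogonalize it against all earlier ones.

    `modulators` is a list of 1-D arrays (one value per trial), in entry order.
    Illustrative only: this mimics the idea of serial orthogonalization and is
    not the actual SPM implementation. A modulator with no variance (e.g., a
    subject who always picks the same bet) cannot be used here.
    """
    out = []
    for m in modulators:
        v = np.asarray(m, dtype=float)
        v = v - v.mean()
        for prev in out:
            # Remove the component of v that is explained by an earlier modulator.
            v = v - (v @ prev) / (prev @ prev) * prev
        out.append(v)
    return out


# Hypothetical active-choice trials: bets tend to rise with the chances of winning.
prob = np.array([0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.6, 0.8])
bet = np.array([10, 50, 50, 90, 90, 90, 10, 50])

# Primary ordering: probability first, bet size second.
p_first, b_second = serially_orthogonalize([prob, bet])
# Reversed ordering: bet size first, probability second.
b_first, p_second = serially_orthogonalize([bet, prob])
```

In this sketch the modulator entered first is left intact apart from mean-centering, whereas the one entered second retains only the variance it does not share with the first. An effect that survives when probability is entered second therefore cannot be attributed to bet size.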

The individual contrast images were taken to a second-level group analysis. One-sample *t*-tests were calculated on the single-subject contrast images. We first computed region-of-interest (ROI) analyses based on *a priori* hypotheses about the involvement of four brain regions in risky selection as discussed in the Introduction: (a) vmPFC (gyrus rectus, orbital parts of mid frontal gyrus, and orbital parts of superior frontal gyrus), (b) bilateral insula, (c) bilateral striatum (caudate, putamen), (d) bilateral IPC (inferior parietal lobe, supramarginal gyrus, angular gyrus). Pick-Atlas (Maldjian et al., 2003, 2004) was used to create a single combined mask of the four ROIs defined anatomically using the Anatomical Automatic Labeling (AAL) Atlas (Tzourio-Mazoyer et al., 2002). Statistics within this ROI mask were thresholded at *P* < 0.05 with false discovery rate (FDR) correction applied and an extent threshold of 10 voxels. AAL was used for voxel localization. Rfxplot software (Gläscher, 2009) was used to extract and display percent signal change or parameter estimates for peak voxels. To test for other foci outside the ROI mask that may be sensitive to the choice parameters, we also conducted exploratory whole-brain analyses at a less stringent level with statistical inferences performed at a level of *P* < 0.001 uncorrected and a minimal cluster size of 10 voxels (see also Van Leijenhorst et al., 2006; Elliott et al., 2008; Sharot et al., 2009; Plassmann et al., 2010).
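For readers unfamiliar with FDR correction, the following minimal sketch shows how a voxel-wise threshold could be derived. It assumes the Benjamini-Hochberg step-up procedure, which may differ in detail from the routine used in SPM5; the *p*-values and the function name `fdr_threshold` are made up for illustration, and the 10-voxel extent threshold is not modeled.

```python
import numpy as np


def fdr_threshold(p_values, q=0.05):
    """Return the largest p-value that survives Benjamini-Hochberg at level q.

    Returns None if no test survives. Illustrative sketch only.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    # Step-up criterion: the k-th smallest p-value is compared with q * k / n.
    critical = q * np.arange(1, n + 1) / n
    passed = np.nonzero(p <= critical)[0]
    if passed.size == 0:
        return None
    return p[passed[-1]]


# Hypothetical p-values for six voxels inside an ROI mask.
voxel_p = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
threshold = fdr_threshold(voxel_p, q=0.05)
significant = [p for p in voxel_p if threshold is not None and p <= threshold]
```

With these made-up values, the three smallest *p*-values survive, while 0.041 does not, even though it is below the nominal 0.05 level.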

Two supplemental analyses were conducted to provide quality checking of our task against established effects and to facilitate comparison with previous work. First, although the goal of this study was to investigate neural correlates of decision-making (i.e., during selection), we compared outcome-related BOLD responses to wins and losses in a whole-brain analysis in order to validate our data in relation to the prior literature. The results of this analysis can be found in **Table A4** in Appendix. Second, a number of prior neuroimaging studies have assessed the neural representation of the expected value of choice options (e.g., Tobler et al., 2009; Symmonds et al., 2010). In order to allow the comparison of our data with this prior literature, we calculated an additional GLM: BOLD responses were modeled to the selection and outcome events as in the primary GLM, but with expected value [(probability of winning minus probability of losing) multiplied by bet amount] entered as a single parametric modulator to the selection regressors. We then identified areas that were sensitive to the expected value during active and passive bet selection, using the ROI and whole-brain approaches. The results of these analyses can be found in **Tables A5** and **A6** in Appendix.
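As a worked illustration of the expected value modulator defined above, the short sketch below tabulates it for every combination of odds and bet size used in the task. The function name and the use of integer percentages (to avoid floating-point rounding) are choices made for this example only.

```python
def expected_value(p_win_pct, bet):
    """Expected value in points: (P(win) - P(lose)) * bet, with P in percent."""
    p_lose_pct = 100 - p_win_pct
    return (p_win_pct - p_lose_pct) * bet / 100


# Expected value for each cell of the 3 (odds) x 3 (bet size) design.
ev_table = {
    (p, bet): expected_value(p, bet)
    for p in (60, 70, 80)
    for bet in (10, 50, 90)
}

for (p, bet), ev in sorted(ev_table.items()):
    print(f"{p}% chance of winning, bet {bet:>2} points: EV = {ev:.0f} points")
```

Because the chances of winning always exceed 50% in this task, the expected value is positive in every cell and increases with both the odds and the bet size.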

Analysis of the behavioral data revealed considerable individual differences in betting behavior. Most importantly, participants varied considerably in the degree to which they adjusted their bets to the chances of winning in active-choice trials ("risk adjustment"). This tendency can numerically be expressed for each participant by calculating the change in average bet size in 60 and 70%-trials compared to in 80%-trials (Studer and Clark, 2011). We assessed whether this heterogeneity in choice behavior was related to individual differences in neural sensitivity during bet selection, and particularly neural responsiveness to the chances of winning, by entering risk adjustment as a co-variable in the following three group-level *t*-tests: (1) active-choice versus no-choice trials, (2) parametric modulation by the chances of winning in both choice conditions, (3) probability × choice interaction. Whole-brain analysis (*P* < 0.001 uncorrected, *k* = 10) was then conducted to identify areas where activity correlated with risk adjustment across participants.

## **RESULTS**

## **BEHAVIOR**

Analysis of behavioral data replicated our previous results on the same task administered outside the MRI scanner (see Studer and Clark, 2011 for details). Specifically, we first examined whether participants varied their bets in the active-choice condition. A one-way ANOVA showed a significant main effect of the likelihood of winning [*F*(2, 78) = 47.31, *P* < 0.001, η<sub>p</sub><sup>2</sup> = 0.55], with bets rising with increasing likelihood (average chosen bet: trials with 60% probability of winning: 52 ± 3 points, 70%-trials: 72 ± 3 points, 80%-trials: 88 ± 1 points).

Response times were sensitive to the chances of winning and the requirement for active choice: a 3 (probability of winning) × 2 (choice) repeated-measures ANOVA on the decision latencies revealed a significant probability × choice interaction [*F*(2, 78) = 18.96, *P* < 0.001, η<sub>p</sub><sup>2</sup> = 0.33], as well as significant main effects of probability [*F*(2, 78) = 50.82, *P* < 0.001, η<sub>p</sub><sup>2</sup> = 0.57] and choice [*F*(1, 39) = 5.21, *P* < 0.05, η<sub>p</sub><sup>2</sup> = 0.12]. As expected, participants were faster to select their bet on no-choice trials compared to active-choice trials and deliberated less when the probability of winning increased, particularly in active-choice trials (active choice: 60%-trials: 1691 ± 67 ms, 70%-trials: 1517 ± 64 ms, 80%-trials: 1224 ± 43 ms; no choice: 60%-trials: 1433 ± 73 ms, 70%-trials: 1304 ± 69 ms, 80%-trials: 1264 ± 57 ms).

### **NEURAL CORRELATES OF ACTIVE CHOICE**

First, we compared neural activations during the selection phase in active-choice trials to brain responses during the selection phase in no-choice trials. The requirement for active choice was associated with higher responses in the caudate bilaterally (right: peak at 12, 4, 8; *t* = 6.39; left: peak at −8, 10, 2; *t* = 5.64), anterior insula bilaterally (right: peak at 32, 20, 4; *t* = 5.25; left: peak at −34, 26, 2; *t* = 4.32), IPC bilaterally (right: peak at 34, −48, 40; *t* = 4.57; left: peak at −24, −54, 52; *t* = 4.15), and in the right OFC (peak at 36, 50, −2; *t* = 3.31), compared to computer-dictated selection (see **Figure 2**). There were no foci within the ROI mask that displayed higher activity during passive selection.

An exploratory whole-brain analysis additionally showed an increased signal during active compared to passive selection in a number of areas outside the ROIs, including in the anterior cingulate cortex (BA32), midbrain, and superior parietal cortex (see **Table A1** in Appendix). Within the IPC, both increased and decreased activations during active choice of bets compared to passive selection were observed in adjacent subregions. Note, however, that the ROI analysis only confirmed increased activation in the IPC during active choice of bets.

#### **NEURAL CORRELATES OF PROBABILITY OF WINNING**

Our second aim was to identify brain areas that are sensitive to the likelihood of winning during the selection phase. We reasoned that regions that subserve decision-making would predominantly be sensitive to the probability of winning in active-choice trials, when this information was used to guide choice.

The ROI analysis revealed such a probability × choice interaction in the mOFC (peak at −6, 26, −12; *t* = 5.86), angular gyrus bilaterally (right: peak at 60, −54, 26; *t* = 5.20; left: peak at −56, −66, 26; *t* = 5.76), supramarginal gyrus bilaterally (right: peak at 66, −22, 22; *t* = 4.08; left: peak at −50, −24, 16; *t* = 3.26), the anterior insula bilaterally (right: peak at 32, 20, −20; *t* = 4.23; left: peak at −28, 16, −8; *t* = 4.94), and in the right caudate (peak at 10, 10, −2; *t* = 4.21; see **Figure 3**). Whole-brain analysis (see **Table A2** in Appendix) revealed additional responses (outside the ROI mask) in the medial superior frontal gyrus and the midcingulate cortex.

In a follow-up analysis, we tested whether these activations remained significant after the variance shared with the bet size modulator was removed from the estimation of the probability modulator. A second GLM with the order of parametric modulators reversed confirmed a significant probability × choice interaction in the mOFC (peak at −4, 26, −12; *t* = 4.33), angular gyrus bilaterally (right: peak at 62, −38, 34; *t* = 3.68; left: peak at −50, −60, 24; *t* = 4.09), right supramarginal gyrus (peak at 68, −44, 24; *t* = 3.16), and the anterior insula bilaterally (right: peak at 30, 18, −20; *t* = 3.52; left: peak at −38, −2, −14; *t* = 4.82) in the ROI analysis. The results of the corresponding whole-brain analysis are described in **Table A2** in the Appendix.

**FIGURE 2 | Active versus passive selection of bets.** ROI analysis revealed stronger activations during active choice of bets compared to computer-dictated bet selection in the anterior insula bilaterally (peaks at −34, 26, 2; 32, 20, 4), the caudate bilaterally (peaks at −8, 10, 2; 12, 4, 8), and the inferior parietal lobe (IPL) bilaterally (peaks at −24, −54, 52; 34, −48, 40). Results are displayed at P < 0.05, FDR-corrected. Bar graphs show percent signal change at peak voxels [**(A,B)**: anterior insula, **(C,D)**: caudate, **(E,F)**: IPL] during bet selection for the two choice conditions. Error bars represent SEM.

**FIGURE 3 | Neural correlates of chances of winning during bet selection.** ROI analysis revealed that neural responses in the supramarginal gyrus bilaterally (peaks at −50, −24, 16; 66, −22, 22), angular gyrus bilaterally (peaks at −56, −66, 26; 60, −54, 26), anterior insula bilaterally (peaks at −28, 16, −8; 32, 20, −20), and ventromedial prefrontal cortex (peak at −6, 26, −12) are differentially modulated by the chances of winning during active versus passive bet selection. Results are displayed at P < 0.05, FDR-corrected. Bar graphs show parameter estimates for the probability of winning modulator in the two choice conditions at peak voxels [**(A,G)**: supramarginal gyrus, **(B,F)**: angular gyrus, **(C,E)**: anterior insula, **(D)**: ventromedial prefrontal cortex]. Error bars represent SEM.

We also tested for brain areas that were modulated by the probability of winning independently of the choice condition. In the ROI analysis, no regions were significantly modulated by the probability of winning across both choice conditions. Whole-brain analysis found that neural responses in the left dlPFC, right posterior insula, and visual cortex, as well as in the left angular gyrus and left supramarginal gyrus, were correlated positively with the probability of winning in both active-choice and no-choice trials (see **Table A2** in Appendix). A follow-up analysis (whole-brain) showed that BOLD responses in the left angular gyrus and visual cortex remained significantly modulated by the probability of winning in the second GLM, after removing variance shared with the bet size modulator (see **Table A2** in Appendix). There were no areas identified in which activity was negatively correlated with the probability of winning (i.e., greater activity with lower likelihoods of winning) in either analysis.

## **INDIVIDUAL DIFFERENCES IN GAMBLING-RELATED BRAIN ACTIVITY**

Exploratory analyses tested whether heterogeneity in behavioral performance was related to individual differences in neural activity during betting choices. Specifically, we examined whether behavioral sensitivity to the chances of winning (risk adjustment) was related to neural sensitivity to the chances of winning across participants. Whole-brain analysis showed that risk adjustment was positively correlated with neural sensitivity to the chances of winning in active-choice and no-choice trials in the left supramarginal gyrus (peak at −54, −28, 30, *t* = 4.05), the left cuneus (peak at −22, −60, 26, *t* = 4.08), and the right precuneus (peak at 8, −56, 36, *t* = 4.41). Thus, participants who adjusted their bets more to the chances of winning showed stronger responses to higher likelihood of winning in these brain areas. The reverse contrast did not reveal any significant activations, i.e., areas where neural responsivity to the chances of winning was negatively correlated with risk adjustment. No significant relationships between risk adjustment and neural responsivity in the chances of winning × choice contrast were found. We additionally tested whether risk adjustment was correlated with neural responsivity in the overall active-choice versus no-choice contrast. No significant activations were found in this analysis.

#### **NEURAL CORRELATES OF MAGNITUDE OF BETS**

Our design also allowed us to identify areas that were sensitive to the magnitude of the bet placed. Specifically, due to the order of the parametric modulators in our design matrix, we could test for areas where the BOLD signal was modulated by the bet size, and was not already explained by the probability modulator. As in the probability analysis, we first identified brain areas that were more responsive to the magnitude of bets during selection in active-choice compared to no-choice trials. In the ROI analysis, no regions were identified that showed such a bet size × choice interaction. An exploratory whole-brain analysis, however, found this pattern in the supramarginal gyrus bilaterally and the right visual cortex (see **Table A3** in Appendix; **Figure 4**).

We also tested for neural activations that were modulated by the size of bets independently of the choice condition. ROI analysis did not reveal any areas that were significantly modulated by the bet size. However, in the whole-brain analysis, we observed that vmPFC (BA10) and two clusters in left and right caudate were positively correlated with bet size across both choice conditions (see **Figure 4** and **Table A3** in Appendix). The peak of the vmPFC cluster was just on the border of our ROI, with about half of the cluster located superior to the ROI. The caudate clusters were located in close proximity to, but fully outside, the striatal ROI. No areas in which responses were negatively correlated with the size of bets were found.

## **DISCUSSION**

The present study investigated the neural basis of betting choices in healthy subjects using fMRI. We analyzed BOLD responses during the selection phase of the Roulette Betting Task and manipulated choice requirements and the odds of winning. Our first aim was to compare brain responses during volitional (i.e., active, instrumental) versus computer-dictated (passive) bet selection. Active choice of bets was accompanied by increased activity in the striatum, midbrain, medial orbitofrontal cortex, anterior insula, anterior cingulate cortex, visual, and (pre-)motor areas, compared to computer-dictated selection of bets. Our second aim was to assess how the likelihood of winning is neurally represented during active and passive bet selection. ROI analysis showed that the anterior insula bilaterally, IPC bilaterally, right caudate, and vmPFC were particularly sensitive to the chances of winning in active-choice trials, that is to say, when this information was used to guide selection. Whole-brain analysis found that the left IPC and right insula correlated with the probability of winning across both active and passive conditions. Individual differences in risk adjustment were positively correlated with neural sensitivity to the chances of winning in the left IPC, across participants.

#### **NEURAL SUBSTRATES OF ACTIVE CHOICE**

Our results highlight the impact of volitional choice upon brain activity during laboratory gambling. Key structures of the brain reward system – specifically the striatum, midbrain, and vmPFC – were more strongly activated during active choice of bets compared to computer-dictated bet selection. We previously showed that psychophysiological arousal is enhanced during active compared to passive bet selection on the same task (Studer and Clark, 2011). In naturalistic gambling, players are more likely to bet and to accept higher risks under conditions of active choice (e.g., selecting lottery numbers) compared to no-choice conditions ("lucky dip"), even in games of pure chance where these manipulations do not affect the likelihood of winning (Henslin, 1967; Langer, 1975; Ladouceur and Mayrand, 1987; Davis et al., 2000). In the brain, instrumental action has previously been found to modulate feedback-related neural activity in the midbrain and striatum (e.g., O'Doherty et al., 2004; Tricomi et al., 2004; Zink et al., 2004; Walton et al., 2007) and active choice of risky gambles has been observed to enhance striatal responses to the presentation of outcomes (Coricelli et al., 2005; Rao et al., 2008; Camille et al., 2011). Our results extend this work by showing that neural activity in the midbrain and striatum is also boosted by active choice at the point of selection, that is to say, during the actual decision period.

The anterior cingulate cortex (ACC) was also more activated during active compared with computer-dictated bet selection. A considerable body of research in non-human primates has revealed that the ACC plays a critical role in active, volitional action selection and instrumental responding (see Walton et al., 2007; Rushworth, 2008; Rushworth and Behrens, 2008 for reviews). Furthermore, previous neuroimaging studies in humans reported that the ACC is activated during volitional action selection in learning environments (e.g., Walton et al., 2004; Behrens et al., 2007). For instance, Walton et al. (2004) assessed neural responses during performance of a higher-order switching task, in which participants received a switch cue and were either instructed which new response rule to follow, or could choose freely. The authors found stronger activations in the ACC during active, self-generated rule selection compared to instructed selection. Our results extend these previous findings by showing that the human ACC is also implicated in the volitional choice of (explicitly presented) risky gambles.

## **NEURAL REPRESENTATION OF THE PROBABILITY OF WINNING DURING BET SELECTION**

The second aim was to identify brain areas that are sensitive to the chances of winning during bet selection, and to test for qualitative and quantitative differences in odds sensitivity under active and passive choice conditions. Neural responses in the IPC (angular and supramarginal gyrus) reflected the chances of winning during the selection phase, and more so in the active-choice condition. Neuroimaging studies on decision-making under risk frequently report activations in the IPC (see Krain et al., 2006; Platt and Huettel, 2008 for reviews), but many studies have failed to consider the functional significance of these activations, often making reverse inferences concerning hypothetical attentional demands. Thus, the role of the IPC in human decision-making has remained poorly specified. A few authors have speculated that the IPC might process the probabilities of outcomes during decision-making under risk (see Ernst et al., 2004; Labudda et al., 2008), in line with the well-established role of this region in numerical cognition (for recent reviews, see Ansari, 2008; Sandrini and Rusconi, 2009; Arsalidou and Taylor, 2011). Our results provide correlative evidence for this hypothesis, by showing that neural activity during bet selection in the IPC was modulated by the probability of winning: responses were greater on trials with more favorable odds. Our findings also converge with electrophysiological evidence in non-human primates, which shows that neurons in the posterior parietal cortex represent the probability of rewards during free and forced choice of options with uncertain outcomes (Platt and Glimcher, 1999; McCoy and Platt, 2005; Kable and Glimcher, 2009; Louie and Glimcher, 2010), and reflect choice certainty during perceptual decision-making (Shadlen et al., 1996; Shadlen and Newsome, 2001; Kiani and Shadlen, 2009).
Moreover, we found that the IPC was particularly sensitive to the chances of winning in the active-choice condition, i.e., in situations where this information is used to guide risky choice. In close parallel to our results, Mohr et al. (2010) recently argued that the IPC is involved in risk processing during the decision window, but not during outcome anticipation, based on a meta-analysis of prior fMRI studies on decision-making under explicit risk. Finally, we observed that neural sensitivity to the chances of winning within the left supramarginal gyrus was stronger for individuals who adjusted their bets more to the likelihood of winning, i.e., showed a stronger behavioral sensitivity to the chances of winning. Together, these results indicate that the IPC subserves decision-making under explicit risk, and imply that current models of human choice based primarily on fronto-striatal circuitry (e.g., Brand et al., 2006; Frank and Claus, 2006) may be inadequate.

It is noteworthy that the IPC has recently also been implicated in other types of decision-making that do not include uncertain outcomes. Specifically, recent electrophysiological and neuroimaging studies reported that neural responses in the IPC reflect the amount of evidence accumulated for a decision and decision confidence in cost–benefit and perceptual decision-making (e.g., Kiani and Shadlen, 2009; Basten et al., 2010; Kayser et al., 2010). Our results are broadly consistent with these data, as one might speculate that decision confidence increased with the chances of winning on our task.

The IPC has also been implicated in the planning and execution of eye movements (for reviews, see, e.g., Andersen et al., 1992; Pierrot-Deseilligny et al., 1995, 2004; Grosbras et al., 2005). Could it be that the identified parietal activations reflect eye movements made to gather information about the chances of winning rather than the processing of this information *per se*? While we have not explicitly controlled for potential eye movements in the data analysis, we think this is unlikely. If the inferior parietal activation reflected eye movements, one would expect stronger responses in trials with lower chances of winning, in which the wheels contained a more balanced number of winning and losing segments. However, we observed the opposite pattern: activations in the IPC were positively correlated with the chances of winning. In other words, responses in the IPC were strongest in the 80%-trials, which contained only two losing segments. There is ample evidence that in the range of 1–4 visual objects, numerosity is assessed in an automatic and fast visual process known as "subitizing" (see Feigenson et al., 2004 for a review). Thus, we posit that the chances of winning in the 80%-trials can easily be assessed at first glance.

Neural responses during bet selection in the anterior insula were also characterized by an interaction between the probability of winning and the choice condition. Similar to the present results, Clark et al. (2008) found abnormal betting behavior in patients with damage to the insular cortex on the CGT: individuals with insula lesions failed to adjust their bets to the chances of winning. Our finding that neural responses in the anterior insula reflect the chances of winning × choice condition interaction is also consistent with a study by Rao et al. (2008), who observed differential activations in the anterior insula during voluntary versus involuntary risk-taking on the Balloon Analog Risk Task. The direction of the relationship between neural responses in the anterior insula and the probability of winning differed between active-choice and no-choice trials. During passive bet selection, neural responses in the anterior insula were *negatively* correlated with the chances of winning, while there was a *positive* correlation between the probability of winning and insula activity during active choice of bets (see **Figure 3**). Prior neuroimaging studies by Preuschoff et al. (2006, 2008) showed that the anterior insula is sensitive to reward variance during the anticipation of outcomes. In our task, participants tended to select higher bets, and thus took higher risks, when there was a greater probability of winning. Thus, it could be speculated that the anterior insula is sensitive to (subjective) risk during bet selection (see also Bossaerts, 2010). We further observed different activation patterns in the left and right anterior insula. The left anterior insula was primarily modulated by the chances of winning during passive bet selection, while the right anterior insula activation reflected the chances of winning during both volitional and computer-dictated bet selection. In line with these findings, the meta-analysis by Mohr et al. (2010) suggested the right anterior insula to be involved in risk processing during the choice window, whereas the left anterior insula processes outcome uncertainty during anticipation.

Finally, a probability of winning × choice condition interaction was also found in the vmPFC and the right caudate. These two regions were additionally sensitive to the size of bets, independent of the choice condition, although it should be noted that the cluster peaks fell outside of our *a priori* ROI. Prior neuroimaging work has implicated the vmPFC in the subjective valuation of choice options (e.g., Chib et al., 2009; Peters and Büchel, 2009, 2010; Hare et al., 2010; Sescousse et al., 2010). Neuropsychological studies showed that injury to the vmPFC is associated with enhanced risk-taking in everyday life (Eslinger and Damasio, 1985; Shallice and Burgess, 1991; Satish et al., 1999) and poor performance on laboratory gambling tasks (e.g., Bechara et al., 1999; Bechara et al., 2000; Fellows and Farah, 2005, 2007; Weller et al., 2007). Specifically, we previously found that patients with vmPFC lesions selected higher bets than healthy participants and brain-damaged controls on the CGT (Clark et al., 2003, 2008). Another study found impaired probability judgment on the CGT in patients with vmPFC damage (Rogers et al., 1999). Similarly, neural responses in the ventral striatum have previously been found to reflect the expected value (i.e., the combination of reward magnitude and occurrence probability) of anticipated uncertain outcomes (e.g., Knutson et al., 2005; Preuschoff et al., 2006; Yacubian et al., 2006; Tobler et al., 2007). Here we found that the vmPFC and ventral striatum reflected both the probability and the magnitude of potential wins during risky selection, suggesting that these areas might hold a coordinated representation of these two decision parameters. Indeed, an additional analysis of our data (see **Table A5** in Appendix) showed that vmPFC and ventral striatum were sensitive to the expected value of active and passive gambling choices.

## **CONCLUSION**

Our results highlight the impact of active choice upon the neural correlates of gambling: a distributed network of brain regions was more activated during volitional compared to computer-dictated bet selection, including key areas of the brain reward system, namely the midbrain, striatum, and vmPFC. In line with previous neuropsychological data, we found that the vmPFC and anterior insula are involved in betting choices. Our data also provide correlative evidence for a role of the IPC in human decision-making under risk linked to the processing of outcome probabilities. Neural responses during the selection phase in the IPC reflected the probability of winning, especially so in the active-choice condition. In other words, the IPC was particularly implicated in situations where the processing of probability information was required to guide bet selection. Our data converge with recent findings of electrophysiological research in non-human primates and suggest that current models of human decision-making under risk focused on fronto-striatal circuitry should be extended to include interactions with the IPC.

## **ACKNOWLEDGMENTS**

This research was supported by a James McDonnell Foundation network grant (grant number: 22002015501 – RG51821). The study was completed within the Behavioral and Clinical Neuroscience Institute, supported by a consortium award from the MRC and Wellcome Trust. We would like to thank the radiographers and staff at the Wolfson Brain Imaging Centre, Cambridge, UK for their assistance in the acquisition of MRI data, and Dr. Jon Roiser and Dr. Anna Barnes for valuable advice on MRI data analysis.

## **REFERENCES**

and Robbins, T. W. (2008). Differential effects of insular and ventromedial prefrontal cortex lesions on risky decision-making. *Brain* 131, 1311–1322.


absolute value of financial rewards in humans. *Eur. J. Neurosci.* 27, 2213–2218.


and Robbins, T. (2002). Decisionmaking processes following damage to the prefrontal cortex. *Brain* 125, 624–639.


human subcortical structures. *Neuron* 51, 381–390.


and behavioral responses to visual


single-subject brain. *Neuroimage* 15, 273–289.


Berns, G. S. (2004). Human striatal responses to monetary reward depend on saliency. *Neuron* 42, 509–517.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 January 2012; accepted: 20 March 2012; published online: 18 April 2012.*

*Citation: Studer B, Apergis-Schoute AM, Robbins TW and Clark L (2012) What are the odds? The neural correlates of active choice during gambling. Front. Neurosci. 6:46. doi: 10.3389/fnins.2012.00046*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Studer, Apergis-Schoute, Robbins and Clark. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## **APPENDIX**

**Table A1 | Neural correlates of active compared to passive selection of bets identified in the exploratory whole-brain analysis (***P* < **0.001,** *k* = **10).**


## **Table A2 | Neural correlates of probability of winning during bet selection (whole-brain analysis,** *P* < **0.001,** *k* = **10).**




\*Activations that were also found in the same contrasts calculated in GLM2, in which the order of parametric modulators was reversed [(1) bet size, (2) probability of winning]. Thus, these areas represented the probability of winning independently of the bet size.

## **Table A3 | Neural correlates of bet size during selection (whole-brain analysis,** *P* < **0.001,** *k* = **10).**


## **Table A4 | Neural correlates during feedback (whole-brain analysis, FWE-corrected,** *P* < **0.05,** *k* = **10).**


#### **Table A5 | Neural correlates of expected value during selection (GLM3, ROI analysis,** *P* < **0.001, FDR-corrected,** *k* = **10).**


## **Table A6 | Neural correlates of expected value during selection (GLM3, whole-brain analysis,** *P* < **0.001,** *k* = **10).**


## Preference reversals in decision making under risk are accompanied by changes in attention to different attributes

## **Betty E. Kim, Darryl Seligman and Joseph W. Kable\***

Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA

#### **Edited by:**

Ming Hsu, University of California Berkeley, USA

#### **Reviewed by:**

Gregory R. Samanez-Larkin, Vanderbilt University, USA; Meghana Bhatt, City of Hope, USA; Robin Chark, National University of Singapore, Singapore; Charlene C. Wu, Stanford University, USA

#### **\*Correspondence:**

Joseph W. Kable, Department of Psychology, University of Pennsylvania, 3720 Walnut Street, Philadelphia, PA 19104, USA. e-mail: kable@psych.upenn.edu

Recent work has shown that visual fixations reflect and influence trial-to-trial variability in people's preferences between goods. Here we extend this principle to attribute weights during decision making under risk. We measured eye movements while people chose between two risky gambles or bid on a single gamble. Consistent with previous work, we found that people exhibited systematic preference reversals between choices and bids. For two gambles matched in expected value, people systematically chose the higher probability option but provided a higher bid for the option that offered the greater amount to win. This effect was accompanied by a shift in fixations of the two attributes, with people fixating on probabilities more during choices and on amounts more during bids. Our results suggest that the construction of value during decision making under risk depends on task context partly because the task differentially directs attention at probabilities vs. amounts. Since recent work demonstrates that neural correlates of value vary with visual fixations, our results also suggest testable hypotheses regarding how task context modulates the neural computation of value to generate preference reversals.

**Keywords: anchoring, context effects, contingent weighting, eye-tracking, neuroeconomics, risk aversion, visual attention**

## **INTRODUCTION**

A challenge for theories of decision making under risk is to account for known systematic inconsistencies in people's decisions. An example is the "preference reversal phenomenon," which involves systematic inconsistencies between preferences and prices (Lichtenstein and Slovic, 1971, 1973; Grether and Plott, 1979). Preference reversals were initially demonstrated by Lichtenstein and Slovic (1971). When given a choice between two gambles of similar expected value (EV), one with a high probability of winning a smaller amount of money (termed the P-bet) and another with a low probability of winning a larger amount (termed the \$-bet), most people choose the higher probability P-bet. However, when providing selling prices for the same exact gambles, most people assign a higher price to the larger amount \$-bet. These two decisions appear to be mutually inconsistent. The P-bet cannot be simultaneously better than *and* worse than the \$-bet, and one would expect people to demand a higher price for their preferred gamble. Preference reversals violate the principle of *procedure invariance*, whereby preferences should not change depending on how they are measured (Tversky et al., 1990; Stalmeier et al., 1997).

Despite its apparent irrationality, the preference reversal phenomenon is remarkably robust. For specifically designed alternatives, the frequency of reversals can be greater than 50% (Lichtenstein and Slovic, 1973; Grether and Plott, 1979; Tversky et al., 1990). The basic inconsistency has been replicated numerous times by psychologists and experimental economists, including under different designs using non-gamble stimuli and various incentive mechanisms (Mowen and Gentry, 1980; Tversky et al., 1990; Mellers et al., 1992a,b). Further, preference reversals persist in the face of large incentives (Lichtenstein and Slovic, 1973; Grether and Plott, 1979), including when the experimenter exploits the inconsistency to take money from the subject (Berg et al., 1985; Chu and Chu, 1990).

Various explanations have been proposed for preference reversals, which attribute the reversal to changes at different stages of the decision process. Different theories attribute preference reversals to changes in how attributes are weighted (Tversky et al., 1988), changes in how weighted attributes are combined to form an evaluation (e.g., additive vs. multiplicative combination; Mellers et al., 1992b), or changes in how a formed evaluation is expressed, or translated into a response, in different tasks (Goldstein and Einhorn, 1987). Though conceptually distinct, changes at these different stages are also not mutually exclusive.

A prominent explanation for preference reversals is the *contingent weighting* hypothesis of Tversky et al. (1988). They argue that attribute weights are closer to lexicographic (i.e., closer to all-or-none) in choice compared to other tasks, which leads to the most important attribute being weighted even more heavily in choice, a phenomenon called the *prominence effect* (Slovic, 1975; Tversky et al., 1988). Since most people are risk-averse (Holt and Laury, 2002), weighting probability more than amount, this would lead to the probability dimension being weighted even more in choice than in other decision tasks. (Note, though, that there is some debate about whether the prominence effect occurs for gambles; see Tversky et al., 1988, p. 382.) By contrast, Tversky et al. (1988) argue that the payoff dimension is weighted more during bids because of the *compatibility effect*, whereby attributes that are compatible with the output are given more weight (in this case, payoff is compatible with bids, since both are in dollars; Slovic, 1975; Tversky et al., 1988). Formally, Tversky et al. (1988) model the change in responses across the two tasks as a change in the weight α*<sub>i</sub>* (where *i* = choice, bid) of the following utility function for a gamble to win amount *a* with probability *p*:

$$U\left(p, a\right) = \log p + \alpha_i \log a$$

Note that this is simply the logarithmic transform of an expected utility (EU) model in which the degree of risk aversion varies between choices and bids.
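To make the link to expected utility concrete, the following minimal sketch checks numerically that the utility function above equals the logarithm of *p* · *a*<sup>α</sup>, and shows how a change in the weight α alone can flip the ordering of two gambles with equal expected value. The two gambles are the example P-bet and \$-bet quoted in the stimulus description (84% chance of \$20, 24% chance of \$70); the two α values and the function names are hypothetical, chosen only to illustrate the direction of the effect.

```python
import math

def log_utility(p, a, alpha):
    """U(p, a) = log p + alpha * log a, the form given in the text."""
    return math.log(p) + alpha * math.log(a)

def expected_utility(p, a, alpha):
    """p * a**alpha, whose logarithm is log_utility."""
    return p * a ** alpha

# Example gambles with equal expected value (both 16.80 dollars).
p_bet = (0.84, 20.0)
dollar_bet = (0.24, 70.0)

# Hypothetical weights: amount is weighted less in choice than in bidding.
alpha_choice, alpha_bid = 0.8, 1.2

for label, alpha in [("choice", alpha_choice), ("bid", alpha_bid)]:
    u_p = log_utility(*p_bet, alpha)
    u_d = log_utility(*dollar_bet, alpha)
    preferred = "P-bet" if u_p > u_d else "$-bet"
    print(f"{label}: alpha={alpha}, U(P)={u_p:.3f}, U($)={u_d:.3f} -> {preferred}")
```

With these illustrative weights the P-bet has the higher utility under the smaller α and the \$-bet under the larger α, which is the pattern a preference reversal requires.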

Here, using visual fixations as an index of information processing and visual attention, we sought to determine what information people attend to during a preference reversal paradigm. Specifically, we aimed to test whether visual fixations reflect changes in the weighting of different attributes, with people looking at probability information more during choices and amount information more during bids. Since preference reversals could be due to changes at different stages of the decision process, this finding would also provide additional support for contingent weighting being at least part of the explanation.

This experiment also builds on recent research linking visual fixations and preferences. Rangel and colleagues have shown that visual fixations both reflect and influence preferences between goods (Armel et al., 2008; Krajbich et al., 2009, 2010). Visual fixations also modulate the neural correlates of preferences, with activity in ventromedial prefrontal cortex and ventral striatum reflecting the value of the fixated item compared to the value of item not fixated (Lim et al., 2011). Here we test whether the link between fixations and preferences generalizes to decision making under risk, and whether fixations are further linked to attribute weights. Given the link between fixations and neural correlates of preferences, this evidence should also inform theorizing regarding the specific neural signals that might be modulated by task context to give rise to preference reversals.

Our investigation follows previous process tracing studies by Johnson et al. (1988) and Schkade and Johnson (1988). Using Mouselab, they found that individuals spent proportionally more time looking at probability information during choices than during bidding. However, Mouselab may not always provide the most natural decision environment (Lohse and Johnson, 1996). In Mouselab, subjects acquire information by positioning a mouse cursor over different windows, and the pattern of mouse movements is recorded. This can increase the amount of effort needed to acquire information, which can then alter the information processing behavior of subjects (Lohse and Johnson, 1996). Eye-tracking does not have this problem. Since eye-tracking does not impose additional requirements on subjects to obtain or maintain information, it might in some cases provide a more sensitive or more accurate measure of information processing. For this reason, as well as to build on recent work linking visual fixations and preferences, we thought it was important to further investigate preference reversals using eye-tracking techniques.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twenty-six paid volunteers from the University of Pennsylvania community participated in this study. Data from two participants were discarded because their responses suggested confusion regarding the bidding task. One participant's bids were not positively correlated with EV, and the other participant bid higher than the amount to win in several gambles. The mean age of our final sample (*N* = 24) was 23.6 years (age range: 19–29 years), and 52% were female. All participants gave written informed consent in accordance with the procedures of the Institutional Review Board at the University of Pennsylvania.

## **TASKS AND STIMULI**

On each trial, subjects either made a choice between two gambles (choice trials) or provided their evaluation of a single gamble (bid trials, see **Figure 1**). On choice trials, subjects chose between two different gambles with varying probabilities (12–95%) of winning different amounts of money (\$10–\$98). On bid trials, subjects entered their subjective evaluation of a gamble in dollar amounts. At the end of each session, one trial was randomly selected, and participants were paid according to their decision on that trial. If subjects won money, they received that money in addition to the show-up fee of \$10.

We used E-Prime to present all behavioral stimuli (Psychology Software Tools, Pittsburgh, PA, USA). Subjects entered their responses using a keyboard. Subjects were presented with a total of 100 bid trials and 100 choice trials in eight alternating blocks of 25 trials each. In case placement of the probabilities and amounts biased decision making, half the subjects saw the amounts as the top number and the other half saw the probabilities as the top number. All subjects saw the same set of gambles in the same order. During a choice trial, subjects were presented with a screen with the word "Choose" for one second. They then saw a screen with two gambles side-by-side and had unlimited time to choose between the two gambles. Subjects pressed "1" to choose the gamble on the left side of the screen and pressed "0" to choose the gamble on the right. During a bid trial, subjects were presented with a screen with the word "Bid" for 1 s. They then saw a screen with a single gamble and had unlimited time to enter their dollar bid. Subjects used the number keys to enter their bid and submitted their response by pressing the "return" key. Once bids were entered, subjects were unable to change their responses. Participants were instructed to bid the "smallest amount of money (they) would be willing to exchange for the opportunity to play the gamble."

Subjects went through a training period in the beginning to ensure understanding of the task. Subjects had two practice trials for each of the trial types. On bid practice trials, subjects were taken through a series of questions after they entered their bid. These questions were used during training to ensure that subjects understood the bidding task and could provide well-calibrated bids. First, subjects were asked if they would forego playing out the gamble to take a counteroffer that was \$1 higher than their bid. If they answered "no," they were told they bid too low and were asked to bid again. If subjects answered "yes," they were then asked if they would play out the gamble and forego taking a counteroffer \$1 less than their bid. If they answered "no," they were told they bid too high and were asked to bid again. Subjects repeated this process until they answered yes to both questions. These questions were only asked on practice trials, and were not included on experimental trials.

In choice trials, one gamble had a high probability of winning a small amount of money (termed the P-bet, e.g., 84% chance of \$20), and the other had a low probability of winning a larger amount (termed the \$-bet, e.g., 24% chance of \$70). Fifty pairs of P-bets (ranging from 70 to 95% chance of winning \$10–\$34) and \$-bets (ranging from 12 to 37% chance of winning \$35–\$98) were selected so that the P-bet and \$-bet were approximately equal in EV, with differences ranging from \$0.00 to \$0.09 and a median difference of \$0.02. Probability ranges were chosen based on previous studies (e.g., Lichtenstein and Slovic, 1971) and ensured the ranges for P-bets and \$-bets did not overlap. Amounts were chosen to provide a reasonable range of EV, given that subjects would be paid according to the outcome on a single trial. No probability or dollar amount was used more than twice in the stimulus set. This stimulus set was pre-tested in pilot behavioral subjects (*n* = 12) who demonstrated a robust preference reversal effect, and has now been used in several studies in our laboratory. To encourage participants to attend to each choice and avoid following a simple heuristic (such as always choosing the higher probability gamble), 10 of the 50 pairs were mismatched so that either the P-bet or \$-bet had a much higher EV. The EV across all gamble pairs varied from \$8.10 to \$29.23, with a median of \$18.13. Each pair was presented twice during choice trials, with the left-right placement of the gambles switching between presentations. The same gambles used in the choice task were shown once individually in the bidding task. Thus for each subject we have 100 choice and 100 bid trials where the stimulus on the left of the screen is identical, and what differs is the presence of another gamble or the bid prompt on the right side of the screen.

Both tasks were administered in an incentive-compatible manner. At the end of the experiment, participants rolled dice to randomly determine one bid or choice trial to be played out for real money. If a choice trial was selected, participants were given the opportunity to play the gamble that they chose, using a 100-sided die to determine the outcome. For example, if the chosen gamble was a 75% chance of winning \$21, a roll of 75 or below on the die would pay \$21 and a roll of 76 or above would pay \$0. If a bid trial was selected, participants were paid using the Becker–DeGroot–Marschak (BDM) method, a widely used incentive-compatible procedure (Becker et al., 1964). The subject's bid on the selected gamble was compared to a randomly generated counteroffer (between \$0 and the amount to win), created by dividing the roll of a 100-sided die by 100 and multiplying the resulting fraction by the amount to win. If the subject's bid was higher than the counteroffer, the subject played the gamble. If the subject's bid was lower than the counteroffer, the subject received the counteroffer amount. This method incentivizes participants to bid their true valuation of the gamble, the amount at which they would be indifferent between receiving their bid and playing the gamble. The amount of money subjects won varied from \$0 to \$37.41 with a median of \$21.
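The payout rule for a selected bid trial can be sketched as follows. This is a minimal illustration of the procedure described above, not the authors' code. Two points are assumptions: the text does not say what happens when the bid exactly equals the counteroffer (here the subject receives the counteroffer), and the gamble is assumed to be resolved with a 100-sided die in the same way as on choice trials. The function and parameter names are hypothetical.

```python
def bdm_payout(bid, amount_to_win, p_win_percent, counteroffer_roll, gamble_roll):
    """Payout for one bid trial under the Becker-DeGroot-Marschak rule.

    counteroffer_roll and gamble_roll are rolls of a 100-sided die (1-100).
    """
    # Counteroffer: roll / 100, multiplied by the amount to win.
    counteroffer = counteroffer_roll / 100 * amount_to_win
    if bid > counteroffer:
        # The subject keeps and plays the gamble: a roll at or below the
        # winning percentage pays the amount, anything above pays nothing.
        return float(amount_to_win) if gamble_roll <= p_win_percent else 0.0
    # Otherwise (including the assumed tie case) the subject takes the counteroffer.
    return counteroffer

# Gamble: 75% chance of $21, bid of $15.
print(bdm_payout(15, 21, 75, counteroffer_roll=50, gamble_roll=40))  # plays and wins
print(bdm_payout(15, 21, 75, counteroffer_roll=50, gamble_roll=76))  # plays and loses
print(bdm_payout(15, 21, 75, counteroffer_roll=90, gamble_roll=40))  # takes counteroffer
```

Because the bid only determines whether the gamble or the counteroffer is received, and never the amount paid, bidding above or below one's true valuation can only lead to a less preferred outcome.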

## **EYE-TRACKING**

We used an Eyelink II head-mounted eye-tracker (SR Research Ltd., Mississauga, ON, Canada) to monitor participants' eye movements during the task. A camera imaged the participant's right eye at 250 Hz. Subjects sat approximately 18″ from the screen and were calibrated using a 9-point calibration. To manage eye drift and head movement, the subject fixated on a black dot at the center of the screen after each trial, and a drift correction measured how much each subject's measured gaze differed from the center of the screen. The experimenter monitored drift corrections throughout the whole experimental session and re-calibrated when the subject's gaze drifted from the center. Eye movements were recorded during each trial from the onset of the stimuli until the subject's response.

#### **BEHAVIORAL ANALYSIS**

We used Matlab (Mathworks, Natick, MA, USA) and SPSS (SPSS Inc., Chicago, IL, USA) to analyze our behavioral and eye-tracking data. For each pair of gambles, we categorized responses in the choice task according to whether the subjects chose the P-bet both times ("chose P"), chose each bet once ("chose="), or chose the \$-bet both times ("chose \$"). Participants were consistent about 79% of the time, choosing the same gamble across both choices. In the bid task, we categorized responses according to whether the subject bid higher on the P-bet ("bid P"), bid equal amounts for both bets ("bid="), or bid higher on the \$-bet ("bid \$"). Within the 40 gamble pairs matched in EV, we calculated two measures of the preference reversal effect. One measure included all instances of increasing preference for the \$-bet ("weak P-to-\$ reversals"), that is, when subjects chose the P-bet both times then bid equal amounts, when they chose each bet once then bid higher on the \$-bet, or when they chose the P-bet both times then bid higher on the \$-bet. The other measure included only this last category, instances where the subject chose the P-bet twice and then bid higher on the \$-bet ("strict P-to-\$ reversals"). We also calculated two similar measures for reversals in the unpredicted direction, from the \$-bet in choice to the P-bet in bids.
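The categorization above can be expressed compactly by ranking preference for the \$-bet on an ordinal scale in each task. The following Python sketch is one way to do this; the function name `classify_pair` and its input format are hypothetical and are not taken from the original analysis scripts.

```python
def classify_pair(n_p_choices, bid_p, bid_dollar):
    """Classify one gamble pair (illustrative sketch).

    n_p_choices: number of the two choice presentations on which the P-bet
                 was chosen (0, 1, or 2).
    bid_p, bid_dollar: bids on the P-bet and on the $-bet.
    Returns (choice category, bid category, weak P-to-$ reversal, strict P-to-$ reversal).
    """
    choice = {2: "chose P", 1: "chose=", 0: "chose $"}[n_p_choices]
    if bid_p > bid_dollar:
        bid = "bid P"
    elif bid_p < bid_dollar:
        bid = "bid $"
    else:
        bid = "bid="
    # Ordinal preference for the $-bet: 0 = P preferred, 1 = indifferent, 2 = $ preferred.
    choice_rank = 2 - n_p_choices
    bid_rank = {"bid P": 0, "bid=": 1, "bid $": 2}[bid]
    # Weak reversal: any increase in preference for the $-bet from choice to bid.
    weak = bid_rank > choice_rank
    # Strict reversal: P-bet chosen both times, then a higher bid on the $-bet.
    strict = choice == "chose P" and bid == "bid $"
    return choice, bid, weak, strict
```

Reversals in the unpredicted direction follow from the same ranks with the inequality flipped (`bid_rank < choice_rank`).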

In addition, we estimated a model in both tasks that assumed subjects' decisions were a function of the EU of the gambles:

$$\mathrm{EU}(p, a) = p \times a^{\alpha_i}$$

Here α<sub>*i*</sub> (where *i* = choice, bid) is a measure of risk aversion. An α<sub>*i*</sub> equal to one leads to risk-neutral decisions, an α<sub>*i*</sub> less than one to risk-averse decisions, and an α<sub>*i*</sub> greater than one to risk-seeking decisions. As mentioned in the introduction, one simple model of contingent weighting is merely the logarithmic transform of this equation (Tversky et al., 1988). From that perspective, an α<sub>*i*</sub> equal to one means equal weighting, an α<sub>*i*</sub> less than one means probability is weighted more strongly, and an α<sub>*i*</sub> greater than one means amount to win is weighted more strongly.
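Writing out the logarithmic transform makes the weighting interpretation explicit. This one-line derivation follows directly from the expected utility expression, for a probability *p* > 0 and an amount *a* > 0:

$$\log \mathrm{EU}(p, a) = \log p + \alpha_i \log a$$

In this form the exponent acts as the weight on log amount relative to a fixed weight of one on log probability, which is why values below one correspond to probability being weighted more strongly and values above one to amount being weighted more strongly.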

For choices, we fit a logistic regression that assumed choice probabilities (cp) were a function of the difference in expected utility between the two gambles:

$$\mathrm{cp}(\mathrm{EU}_1, \mathrm{EU}_2) = \frac{1}{1 + e^{\beta(\mathrm{EU}_1 - \mathrm{EU}_2)}}$$

We fit this equation for each subject to his/her observed choices using an iterative optimization in MATLAB (fminsearch and fminunc) to find the maximum likelihood estimates of αchoice and β. The αchoice estimates of two subjects fell outside the range that our model could reliably estimate (0.17 < αchoice < 5.05), so we excluded both α estimates for these subjects from further analysis. For bids, we fit a model that assumed the subject's bid was equal to the expected utility of the gamble, using non-linear least squares in MATLAB. We obtained almost identical results to those reported below if we fit αchoice and αbid using the logarithmic transform of expected utility (i.e., the contingent weighting equation in the introduction).
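The maximum likelihood fit for choices can be illustrated with a short Python sketch. It is not the MATLAB code used in the study: a coarse grid search stands in for fminsearch and fminunc, and the trial format and all function names are hypothetical. The text does not state which gamble cp refers to, so the sketch assumes cp is the probability of choosing gamble 2, which makes β > 0 correspond to choosing the higher-EU gamble more often; under the opposite reading the sign of β flips.

```python
import math


def eu(p, a, alpha):
    """Expected utility p * a**alpha."""
    return p * a ** alpha


def choice_prob(eu1, eu2, beta):
    """Logistic rule as written in the text: 1 / (1 + exp(beta * (EU1 - EU2)))."""
    z = beta * (eu1 - eu2)
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def neg_log_likelihood(alpha, beta, trials):
    """trials: list of (p1, a1, p2, a2, chose_2), with chose_2 equal to 0 or 1.

    Assumption: cp is read as the probability of choosing gamble 2.
    """
    nll = 0.0
    for p1, a1, p2, a2, chose_2 in trials:
        cp = choice_prob(eu(p1, a1, alpha), eu(p2, a2, alpha), beta)
        cp = min(max(cp, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(cp) if chose_2 else math.log(1.0 - cp)
    return nll


def grid_fit(trials, alphas, betas):
    """Return the (alpha, beta) grid point with the lowest negative log-likelihood."""
    return min(
        ((a, b) for a in alphas for b in betas),
        key=lambda ab: neg_log_likelihood(ab[0], ab[1], trials),
    )
```

For example, a subject who always picks an 84% chance of \$20 over a 24% chance of \$70 is better described by an exponent of 0.5 than by one of 1.5, so `grid_fit` on such trials returns the smaller exponent. A gradient-free or quasi-Newton optimizer would replace the grid in a real analysis.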

Response time was measured from the onset of the stimuli until the participant submitted a response.

Placement of the amounts and probabilities did not have any significant effects on choice and bidding behavior (i.e., strict or weak P-to-\$ reversals, αchoice or αbid; all *p*s > 0.10).

#### **EYE TRACKING ANALYSIS**

We used DataViewer (SR Research Ltd., Mississauga, ON, Canada) for all pre-processing of the eye-tracking data and Matlab (Mathworks, Natick, MA, USA) for all eye-tracking analysis. The Eyelink II software automatically parses eye movement data into fixations, blinks, and saccades based on standard saccade thresholds (velocity threshold = 30°/s, acceleration threshold = 8000°/s<sup>2</sup>). Only fixations initiated after the onset of the gambles were included in our analyses. Additionally, the Eyelink on-line parser denoted a blink when the pupil was very small, or when the eye-camera image of the pupil was missing or severely distorted by eyelid occlusion.

We defined regions of interest (ROI) corresponding to each amount and probability within each trial. The size of the screen was 800 by 1200 pixels, and each ROI was approximately 280 by 320 pixels. There were four ROIs in choice trials, and two ROIs during bid trials. For a controlled comparison between choice and bid trials, we focused our analyses on only the two ROIs for the left gamble in choice trials, since these were visually identical to and contained the same amount of physical space as the two ROIs in bid trials. For fixations and looking durations (but not first fixations), we observed the same pattern of results if we collapsed across all four ROIs in choice trials.
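Assigning fixations to ROIs and deriving the three eye-tracking measures can be sketched as follows. The ROI coordinates below are hypothetical, since the text gives only the approximate ROI size and not the positions, and treating the first fixation as the first one that lands inside any ROI is likewise an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical ROI rectangles (x_min, y_min, x_max, y_max) in pixels.
# Only the approximate 280 x 320 size comes from the text.
ROIS = {
    "left_probability": (100, 60, 380, 380),
    "left_amount": (100, 420, 380, 740),
}


def roi_of(x, y, rois=ROIS):
    """Return the name of the ROI containing the point (x, y), or None."""
    for name, (x0, y0, x1, y1) in rois.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def summarize_fixations(fixations, rois=ROIS):
    """Summarize one trial.

    fixations: list of (x, y, duration_ms) tuples in temporal order.
    Returns (fixation counts per ROI, looking duration per ROI, first fixated ROI).
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    first = None
    for x, y, dur in fixations:
        name = roi_of(x, y, rois)
        if name is None:
            continue  # fixation outside every ROI
        counts[name] += 1
        durations[name] += dur
        if first is None:
            first = name
    return dict(counts), dict(durations), first
```

Restricting the dictionary to the two left-gamble ROIs mirrors the controlled comparison described above; adding two right-gamble entries would give the four-ROI analysis for choice trials.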

We included three dependent variables in our eye-tracking analyses: number of fixations, looking duration, and the first fixation of each trial. For each of our dependent variables, we ran an ANOVA with gamble type (P-bets vs. \$-bets), attribute (probability vs. amount), and trial type (choice vs. bids) as within-subject factors and attribute placement (probability on top vs. amount on top) as a between-subject factor. We refer to this ANOVA below as our between-task analysis. To test subsequent comparisons within a trial type, we ran separate ANOVAs for choice trials and bid trials with gamble type (P-bets vs. \$-bets) and attribute (probability vs. amount) as within-subject factors and attribute placement (probability on top vs. amount on top) as a between-subject factor. We refer to these ANOVAs below as within-task analyses. These analyses were all done using raw fixation numbers and looking times, but we observed the same pattern of results if we examined ratios of these variables (e.g., the ratio of fixations on probability versus amount). Fixations and looking durations for gamble types and attribute were highly correlated (all *r*s > 0.92, *p*s < 0.001).

For fixations and looking durations (but not first fixations), placement of the amounts and probabilities did not interact with the eye-tracking effects reported below. There was, however, an interaction between attribute and attribute placement for all three dependent measures. Subjects had more total fixations [mean = 5.69 ± 0.46 fixations vs. mean = 4.77 ± 0.48 fixations; *F*(1, 22) = 33.37, *p* < 0.001], longer looking durations [mean = 1,718 ± 208 ms vs. mean = 1,337 ± 192 ms; *F*(1, 22) = 22.15, *p* < 0.001], and more first fixations [mean = 77 ± 3% vs. mean = 23 ± 3%; *F*(1, 22) = 87.54, *p* < 0.001] for the attribute that was presented on top.

Finally, to test for any effects of individual differences, we looked at the correlation between each of our eye-tracking dependent variables (proportion of total fixations and looking duration by trial type and gamble type; proportion of total fixations, looking duration, and first fixations by trial type and attribute) and each of our behavioral variables (number of strict and weak P-to-\$ reversals, αchoice and αbid). This analysis excluded the two subjects whose choice alphas exceeded the boundaries that we could reliably estimate (these two subjects were also outliers in terms of the number of reversals, with neither making any weak P-to-\$ reversals while the minimum among the remaining subjects was 22 weak reversals).

## **RESULTS**

#### **BEHAVIORAL RESULTS**

Overall, subjects spent more time on bid trials than on choice trials. There was a significant increase in response times from choice trials to bid trials, *F*(1, 23) = 74.95, *p* < 0.001. The average response time was 4,257 ± 549 ms during choice trials and 6,894 ± 485 ms during bid trials. [Note that, presumably secondary to this reaction time effect, there were also more total fixations, *F*(1, 22) = 43.57, *p* < 0.001, and longer looking durations, *F*(1, 22) = 40.45, *p* < 0.001, during bid trials than during choice trials.] Within bid trials, subjects took longer to bid on \$-bets than on P-bets, *F*(1, 23) = 31.43, *p* < 0.001. The average response time for bids on P-bets was 6,381 ± 97 ms and the average response time for bids on \$-bets was 7,394 ± 101 ms.

Subjects also demonstrated a robust preference reversal effect. During choice trials, subjects chose the P-bet significantly more often than the \$-bet, *F*(1, 23) = 34.02, *p* < 0.001. On average, subjects chose the P-bet both times for 66 ± 13% of the pairs, chose equally for 21 ± 4% of the pairs and chose the \$-bet both times for 13 ± 3% of the pairs (see **Figure 2A**). In contrast, subjects bid significantly higher on the \$-bet than on the P-bet, *F*(1, 23) = 18.22, *p* < 0.001. Subjects bid higher on the \$-bet for 61 ± 13% of the pairs, bid the same on both gambles for 10 ± 2% of the pairs, and bid higher on the P-bet for 28 ± 6% of the pairs (see **Figure 2B**). Subjects preferred the P-bet significantly more often when choosing than when bidding, *F*(1, 23) = 40.54, *p* < 0.001, and preferred the \$-bet significantly less often when choosing than when bidding, *F*(1, 23) = 49.21, *p* < 0.001.

**FIGURE 2 | (A)** Percentage of gamble pairs where subjects chose the P-bet option twice (P), the \$-bet twice (\$), or both equally (=). On average, subjects chose the P-bet significantly more than the \$-bet. **(B)** Percentage of gamble pairs where subjects bid higher for the P-bet option (P), \$-bet option (\$), or bid the same amount for both gambles (=). On average, subjects bid higher on \$-bets than on the P-bets. **(C)** Average alpha values for choice trials and bid trials. Alphas were significantly higher for bidding than for choice. **(D)** The average expected utility function for bids and choices given the inferred alphas. Subjects were risk-averse during choices and slightly risk-seeking during bids.

Across all gamble pairs, subjects exhibited increased preference for the \$-bet in bids more often than the reverse effect, *F*(1, 23) = 104.37, *p* < 0.001. Subjects made weak P-to-\$ reversals for 67 ± 5% of gamble pairs and weak \$-to-P reversals for 10 ± 3% of gamble pairs. Subjects also exhibited significantly more strict P-to-\$ reversals, choosing the P-bet both times and bidding higher on the \$-bet, than strict \$-to-P reversals, choosing the \$-bet both times and bidding higher on the P-bet, *F*(1, 23) = 53.30, *p* < 0.001. Subjects made strict P-to-\$ reversals for 37 ± 4% of gamble pairs and strict \$-to-P reversals for less than 1 ± 1% of gamble pairs. For pairs where the subject chose the P-bet both times, they bid an average of \$10.37 ± 2.35 higher on the \$-bet.

Preference reversals were also evident by changes in risk aversion, or attribute weighting, in the two tasks. Subjects were risk-averse, weighting probability more, during choice trials (αchoice = 0.77, SE = ±0.05). In contrast, subjects were close to risk-neutral, weighting probability and amount almost equally during bid trials (αbid = 1.03, SE = ± 0.01; see **Figures 2C,D**). αchoice's were significantly smaller than αbid's, *t*(21) = −4.37, *p* < 0.001.

#### **EYE-TRACKING RESULTS**

For eye-tracking analyses, our main dependent variables were number of fixations and looking durations. Both of these variables showed strong effects of task context. In each task, subjects looked more at the preferred gamble type (P-bet in choices, \$-bet in bids) and the more heavily weighted attribute (probability in choices, amount to win in bids).

Subjects looked at the preferred gamble type more, fixating on P-bets more often during choice trials and \$-bets more often during bid trials (**Figure 3**). This was evidenced by a significant interaction between trial type and gamble type for both the number of fixations, *F*(1, 22) = 44.25, *p* < 0.001, and for the duration of fixations, *F*(1, 22) = 23.53, *p* < 0.001, in our between-task analysis. Looking within each task, subjects made significantly more fixations on P-bets (mean = 8.73 ± 0.55) than on \$-bets (mean = 7.55 ± 0.61) during choice trials, *F*(1, 22) = 27.48, *p* < 0.001. Subjects also spent significantly more time looking at P-bets (mean = 2,229 ± 191 ms) than at \$-bets (mean = 2,050 ± 231 ms) during choice trials, *F*(1, 22) = 7.77, *p* = 0.01. In contrast, during bid trials, subjects made more fixations on \$-bets (mean = 13.60 ± 0.98) than on P-bets (mean = 12.05 ± 0.93; *F*(1, 22) = 22.75, *p* < 0.001) and spent more time looking at \$-bets (mean = 4,213 ± 445 ms) than at P-bets (mean = 3,727 ± 405 ms; *F*(1, 22) = 16.68, *p* < 0.001).

Fixations of the two attributes, probability and amount, also differed between choice and bid trials. Subjects were more likely to look at probabilities during choice and more likely to look at amounts during bidding (**Figure 4**). This was evidenced by a significant attribute by trial type interaction for both number of fixations, *F*(1, 22) = 14.13, *p* < 0.01, and looking durations, *F*(1, 22) = 4.29, *p* < 0.05, in our between-task analysis. Looking within each task, subjects made significantly more fixations on probability (mean = 4.3 ± 0.32 fixations) than on amount (mean = 3.9 ± 0.30 fixations) during choice trials, *F*(1, 22) = 5.57, *p* < 0.05. Similarly, subjects spent marginally more time looking at probability (mean = 1,126 ± 122 ms) than at amount (mean = 1,012 ± 100 ms) during choice trials, *F*(1, 22) = 4.19, *p* = 0.05 [this effect was more reliable when considering both gambles, instead of just the left gamble: duration on probability = 2,121 ± 233 ms, duration on amount = 1,865 ± 191 ms, *F*(1, 22) = 5.99, *p* < 0.05]. In contrast, during bid trials, subjects made significantly more fixations on amount (mean = 6.74 ± 0.53 fixations) than on probability [mean = 6.06 ± 0.47 fixations; *F*(1, 22) = 8.12, *p* < 0.01], and spent marginally more time looking at amount (mean = 2,137 ± 220 ms) than at probability [mean = 1,832 ± 237 ms; *F*(1, 22) = 2.97, *p* < 0.10].

There was further interaction between these effects of gamble type and attribute. Specifically, the interaction between trial type and attribute was greater for \$-bets than for P-bets. This was evidenced by a significant three-way interaction between trial type, gamble type, and attribute for both fixations, *F*(1, 22) = 5.43, *p* < 0.05, and for looking duration, *F*(1, 22) = 11.32, *p* < 0.01, in our between-task analysis.

We also examined which attribute was fixated on first in choice and bid trials. First fixations were more likely to be on probability than on amount across both kinds of trials (mean first fixation on probability = 59 ± 13%; *F*(1, 22) = 9.70, *p* < 0.01 in our between-task analysis). Looking within each task, probability was more likely to be fixated on first in both choice trials, *F*(1, 22) = 11.06, *p* < 0.01, and in bid trials, *F*(1, 22) = 4.62, *p* < 0.05. This was qualified by a significant interaction between attribute and trial type [*F*(1, 22) = 5.88, *p* < 0.05], with probability more likely to be fixated on first in choice trials (62 ± 5% in choice trials vs. 55 ± 8% in bid trials). This interaction, however, was not reliable when we included both choice options (all four ROIs) in the analysis, rather than restricting our analysis to only the left choice option [*F*(1, 22) = 1.27, *p* > 0.10]. The two-way interaction between attribute and trial type was further qualified by a three-way interaction between attribute, trial type, and attribute order, *F*(1, 22) = 52.32, *p* < 0.001, in our between-task analysis. This interaction arose because during bid trials, subjects primarily fixated on the top attribute first (mean = 88 ± 3% of first fixations), regardless of whether it was probability or amount. Subjects fixated on the top attribute first to a lesser degree during choice trials (67 ± 5% of first fixations). Thus it appears attribute placement had a stronger effect on first fixations than attribute identity.

Finally, we tested for any effects of individual differences by examining the correlations between the eye-tracking measures and behavioral measures. Only two of these correlations were statistically significant. Individuals who fixated on the P-bet more during choice (evaluated using either fixations or looking duration) were more risk-averse, *r*s = −0.66 and −0.61, *p*s < 0.01, respectively. Note these correlations remained significant even when using a Bonferroni correction for the number of correlations examined.

#### **DISCUSSION**

Here we replicated the preference reversal phenomenon in decision making under risk, in which people facing two gambles of equal EV choose the one with the higher probability of winning, but assign a higher price to the one with the larger potential payoff. We have additionally shown that preference reversals are accompanied by changes in visual fixations. Participants had more fixations on the preferred gamble in each task (P-bets in choices, \$-bets in bids). They also had more fixations on the more heavily weighted attribute in each task (probability in choices, amounts in bids). These results show that visual fixations reflect preferences in decision making under risk, as they do in decisions about goods (Krajbich et al., 2009, 2010), and that fixations further reflect attribute weights in a multi-attribute choice paradigm. These results support a contingent weighting explanation of preference reversals, and also suggest testable hypotheses about the neural mechanisms of preference reversals.

Behaviorally, we replicated the classic preference reversal finding. Our participants predominantly chose the high-probability bet from a pair of gambles matched in EV, and predominantly assigned higher prices to the (alternative) bet that offered the larger amount to win. For 37% of gamble pairs, our participants made strict P-to-\$ reversals, choosing the P-bet twice and bidding higher on the \$-bet. Consistent with this, participants were overall risk-averse during choices, and very slightly risk-seeking during bids.

One novel aspect of our paradigm compared to previous work is the highly repeated nature of the trials. Participants made 100 choices and 100 bids over the course of the experiment. Our results demonstrate that preference reversals are not eliminated when subjects are tested with many repeated trials. Our design does not allow us to test whether they are diminished by repeated trials, though the effects we observed in this experiment are of similar size to those reported in the literature. Most neuroscientific methods require many repeated trials and within-subject comparisons. While many context effects are eliminated under these conditions, our results show that preference reversals are not, and therefore may be a good paradigm for neuroscientific studies of context effects.

Despite only having to assess the value of one gamble, participants took longer to make bids than to make choices. Although it is possible that the difference in response times might be due to differences in response entry, it is unlikely that pressing one or two more buttons accounts for an increase of more than 2 s. Spending more time deciding on a bid than choosing between two options is consistent with previous findings (Johnson et al., 1988; Schkade and Johnson, 1988). It suggests that the decision process for assigning prices is potentially more complex than that required for binary choices. This is consistent with models that assume that binary choice is the more basic process (Johnson and Busemeyer, 2005), but not with models that assume pricing is more basic (Luce et al., 1993). Pricing and matching tasks have rarely been studied in decision neuroscience (though see Plassmann et al., 2007) so an interesting question for future research is the degree to which choice and bidding rely on shared vs. distinct neural processes.

Recent work has found that fixations reflect trial-to-trial variability in preferences (Krajbich et al., 2009, 2010). Our findings extend this principle to decision making under risk. During choices, participants made more fixations on the preferred gamble type in that task, P-bets, and spent a greater amount of time looking at P-bets. During bids, participants made more fixations on the preferred gamble type in that task, \$-bets, and spent a greater amount of time looking at \$-bets. We acknowledge that the bidding results are confounded by a longer reaction time for \$-bets than for P-bets, making this finding more difficult to interpret. There is not such a confound in the choice results, however, which clearly replicate the link between fixations and preferences observed in other choice domains.

Our key finding, though, was that preference reversals were associated with changes in visual fixations to the two gamble attributes in the two tasks. During bidding, participants made more fixations on amounts and spent a greater amount of time looking at amounts. During choices, participants made a greater number of fixations on probabilities and spent a greater amount of time looking at probabilities.

The directionality of these results is broadly consistent with the contingent weighting hypothesis. According to this hypothesis, preference reversals result from an increased weight on probability in value computations during choice, and a corresponding increased weight on amount during bids. We found that people fixate probabilities more during choice and amounts more during bids.

These differences in fixations might only be an index of the differential weighting of attributes, or alternatively might also be a cause of this differential weighting. This latter possibility raises several ideas for future research that would involve exogenously controlling fixations. If fixations influence attribute weighting, then preference reversals might be reduced, or even eliminated, when participants are forced to look equally at probabilities and amounts. In addition, forcing more fixations to the weaker attribute of an option might make people less likely to choose that option, a potential exception to previous work showing that fixating on an option makes people more likely to choose it (Armel et al., 2008).

However, a simple model in which preference reversals are due solely to changes in attribute weights, and fixations provide an unbiased index of these weights, has trouble completely accounting for our data. As shown in **Figure 2D**, participants' decisions reflect nearly equal weighting of probability and amount during bids (i.e., participants are close to risk-neutral), and a greater weighting of probability during choices (i.e., participants are risk-averse). In contrast, as shown in **Figure 4**, participants fixate probabilities more during choices and amounts more during bids.

One possible resolution is that people are intrinsically risk-averse, weighting probabilities more, and only changes from that intrinsic baseline are reflected in changes from equal fixation of the two attributes. Another possibility is that fixations are monotonically, but not linearly, related to attribute weights. While participants are close to risk-neutral during bids, they are still significantly risk-seeking, and they also fixate amounts more than probabilities. A final possibility, of course, is that fixations and looking times reflect more than attribute weights alone. For example, first fixations showed a strong effect of the spatial position of attributes, and other influences could have shifted fixations similarly in both choices and bids.

Our findings are similar to those reported previously by Johnson et al. (1988) and Lohse and Johnson (1996). Using Mouselab, those authors found that subjects attended to amounts more, and probabilities less, during bids than during choices (for example, 56 vs. 51% of the time in Experiment 1 of Schkade and Johnson, 1988). This same overall pattern was arguably more dramatic in our fixation data. This points to a potential difference in sensitivity between the two techniques, which might arise from how people process information differently in the two environments. In the Mouselab environment, only one piece of information is available at any one time. Johnson et al. noted that in their experiments some subjects used a strategy of first looking at all of the information sequentially, and then holding it in mind while they made their decision. Under free viewing, subjects do not adopt this strategy at all. Of the total fixations in **Figure 3**, 3.59 ± 0.28 fixations during choice trials are made when returning to an item after fixating on it once and then looking elsewhere, while 5.40 ± 0.42 represent return fixations during bidding.

Our data on individual differences provide additional support for the notion that fixations reflect preferences during choices. Individuals who fixated more on the P-bet during choice trials were more risk-averse. However, we did not find any other significant correlations between individual differences in eye movements and behavioral measures. A possible reason for these null findings is that we have a small sample size for evaluating individual differences. Additionally, most participants show a robust preference reversal effect, so there is limited variability in the number of preference reversals. Future research could further explore how individual differences in fixations relate to individual differences in preference reversals, perhaps using a larger sample or a paradigm in which there is greater variance in the behavioral effect.

Future research could also investigate how different presentation formats affect eye fixations and, in turn, preference reversals. For example, Johnson et al. (1988) have shown that different presentation formats can move around preference reversals and that these changes are associated with changes in information processing. Specifically, when probabilities are more complex (e.g., 399/456) the number of preference reversals increases. In addition, subjects spent a greater proportion of time viewing probability information when probabilities were displayed as complicated fractions, and subjects who spent more time on probability also demonstrated more reversals. We do not know of any similar studies looking at the relationship between visual fixations and decisions under risk when presentation format varies, though this would be an interesting follow-up to our study.

Another interesting question for future research concerns the neural mechanism of preference reversals. Several studies have now demonstrated that BOLD activity in ventromedial prefrontal cortex and ventral striatum is correlated with the subjective value of the options under consideration during decision making (Kable and Glimcher, 2009). A recent study showed that value-related activity in these regions is further modulated by visual fixations, tracking the value of the fixated item compared to the item not fixated (Lim et al., 2011). Paired with our findings, this suggests the intriguing hypothesis that BOLD activity in ventromedial prefrontal cortex and ventral striatum differentially reflects probabilities and amounts during choices and bids. That is, in a preference reversal paradigm, BOLD activity in these regions might be more strongly affected by probabilities during choice and more strongly affected by amounts during bids. Such a finding would also suggest that neural correlates of probability and magnitude (Knutson et al., 2005) could depend on the task context.

In conclusion, we found that preference reversals in decision making under risk were accompanied by differential attention to probabilities vs. amounts. The directionality of this effect was consistent with a contingent weighting explanation (Tversky et al., 1988), with people looking at probabilities more during choice and amounts more during bids. Given recent work demonstrating neural correlates of value (Kable and Glimcher, 2009), which are modulated by visual attention (Lim et al., 2011), this work suggests testable hypotheses regarding how task-dependent strategies might alter the weighting of attributes in the neural computation of value to cause preference reversals.

#### **ACKNOWLEDGMENTS**

This research was supported by NIH grant DA029149 to Joseph W. Kable. We thank Karin Cox, Joe McGuire, and Nicole Senecal for comments on a previous draft, and Sharon Thompson-Schill for assistance with eye-tracking.

#### **REFERENCES**

the preference reversal phenomenon. *Am. Econ. Rev.* 69, 623–638.


computational model of preference reversal phenomena. *Psychol. Rev.* 112, 841–861.


System Sciences 1996,"in *Proceedings of the Twenty-Ninth Hawaii International Conference on 4.* 86–97.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 March 2012; accepted: 26 June 2012; published online: 19 July 2012. Citation: Kim BE, Seligman D and Kable JW (2012) Preference reversals in decision making under risk are accompanied by changes in attention to different attributes. Front. Neurosci. 6:109. doi: 10.3389/fnins.2012.00109*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Kim, Seligman and Kable. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Decreasing ventromedial prefrontal cortex activity during sequential risk-taking: an fMRI investigation of the balloon analog risk task

## *Tom Schonberg<sup>1</sup>\*, Craig R. Fox<sup>2,3</sup>, Jeanette A. Mumford<sup>4</sup>, Eliza Congdon<sup>5,6</sup>, Christopher Trepel<sup>2,7</sup> and Russell A. Poldrack<sup>1,4,8</sup>*


<sup>8</sup> Department of Neurobiology, University of Texas at Austin, Austin, TX, USA

#### *Edited by:*

Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland

#### *Reviewed by:*

Bernd Weber, Rheinische-Friedrich-Wilhelms Universität, Germany

Thorsten Kahnt, University of Zurich, Switzerland

#### *\*Correspondence:*

Tom Schonberg, Imaging Research Center, University of Texas at Austin, 3925-B West Braker Lane, Austin, TX 78759, USA.

e-mail: tom@mail.utexas.edu

Functional imaging studies examining the neural correlates of risk have mainly relied on paradigms involving exposure to simple chance gambles and an economic definition of risk as variance in the probability distribution over possible outcomes. However, there is little evidence that choices made during gambling tasks predict naturalistic risk-taking behaviors such as drug use, extreme sports, or even equity investing. To better understand the neural basis of naturalistic risk-taking, we scanned participants using fMRI while they completed the Balloon Analog Risk Task, an experimental measure that includes an active decision/choice component and that has been found to correlate with a number of naturalistic risk-taking behaviors. In the task, as in many naturalistic settings, escalating risk-taking occurs under uncertainty and might be experienced either as the accumulation of greater potential rewards, or as exposure to increasing possible losses (and decreasing expected value). We found that areas previously linked to risk and risk-taking (bilateral anterior insula, anterior cingulate cortex, and right dorsolateral prefrontal cortex) were activated as participants continued to inflate balloons. Interestingly, we found that ventromedial prefrontal cortex (vmPFC) activity decreased as participants further expanded balloons. In light of previous findings implicating the vmPFC in value calculation, this result suggests that escalating risk-taking in the task might be perceived as exposure to increasing possible losses (and decreasing expected value) rather than the increasing potential total reward relative to the starting point of the trial. A better understanding of how neural activity changes with risk-taking behavior in the task offers insight into the potential neural mechanisms driving naturalistic risk-taking.

**Keywords: risk, risk-taking, BART, ventromedial prefrontal cortex, decision-making, fMRI**

#### **INTRODUCTION**

To date, functional imaging studies examining neural correlates of risk-taking have generally assumed an economic conception of risk defined as the variance of the probability distribution over possible outcomes (Markowitz, 1952). Thus, many functional imaging studies have relied on paradigms that were adapted for use with fMRI and involve exposure to simple chance gambles. These studies have asserted that regions such as the dopaminergic midbrain, the striatum, and anterior insula code risk (Paulus et al., 2003; Kuhnen and Knutson, 2005; Preuschoff et al., 2006) and that the insula codes risk prediction errors (Preuschoff et al., 2008).

While imaging studies using chance gambles have been interesting and informative, they provide an incomplete account of naturalistic risk-taking behavior. First, there is only modest evidence that choices among chance gambles in the laboratory can predict naturalistic risk-taking behaviors, such as drug abuse, physically risky sports, or even aggressive financial investment (Figner and Weber, 2011; Fox and Tannenbaum, 2011; Schonberg et al., 2011). Although a few studies have documented some successes (Barsky et al., 1997; Pennings and Smidts, 2000; Brown et al., 2006; Jaeger et al., 2010), others have failed to do so (e.g., Brockhaus, 1980) or have found that a simple self-report question about general risk propensity predicts naturalistic risk-taking more consistently (Dohmen et al., 2011). Naturally, such self-reports do not lend themselves to imaging studies, but can serve as covariates to fMRI-compatible tasks. Second, some fMRI-adapted laboratory tasks (e.g., Preuschoff et al., 2006; Tobler et al., 2007) have not included an active decision component, whereas others that do (e.g., Christopoulos et al., 2009; Tobler et al., 2009) may fail to evoke the dynamic, anticipatory emotions accompanying

<sup>1</sup> Imaging Research Center, University of Texas at Austin, Austin, TX, USA

naturalistic risky decisions (Loewenstein et al., 2001), such as escalating tension and exhilaration.

In contrast to chance gamble paradigms, the Balloon Analog Risk Task (BART; Lejuez et al., 2002) captures the escalating tension that is often inherent to naturalistic risk-taking, and has also been found to predict several naturalistic risk-taking behaviors. In the BART, participants sequentially pump puffs of air into a balloon depicted on a computer screen (**Figure 1**). On each trial a participant earns a fixed amount of money for each successful pump (i.e., one that expands, but does not break, the balloon) but loses the accumulated amount if the balloon explodes before the participant stops pumping the balloon and cashes out. Subjects are unaware of the explosion probability of the balloon and thus the decision to pump or cash out is made under uncertainty. The average number of pumps across all trials has been shown to correlate with self-reports of risk-taking behaviors such as stealing, unprotected sex, smoking, and substance abuse in adults and adolescents (Lejuez et al., 2003a,b, 2004, 2007; Bornovalova et al., 2005).

The goal of the current study was to identify the neural systems associated with risk-taking in the BART. In the task, as in natural environments, taking a risk (making an additional pump) can result in increased potential gains but also increases the likelihood of potential losses. This raises the question of whether participants cognitively represent the task in terms of the potential total reward relative to the starting point of a given trial (so that the potential gain rises with continued pumping) or in terms of possible losses and gains relative to a reference point that shifts after each successful pump (so that loss exposure increases and expected value decreases with continued pumping). Interestingly,

**FIGURE 1 | (A)** An example of an explosion trial: participants press one of two buttons to inflate puffs of air into a balloon presented on a computer screen. Every successful pump adds \$0.25 to their temporary bank for that trial. If the balloon explodes before the participant cashes out then nothing is won on that trial. However, an explosion does not affect the cumulative total winnings earned on prior trials. **(B)** An example of a cash-out trial where the participant decided to stop pumping the balloon and earn the amount accumulated up to that point.

when Wallsten et al. (2005) compared the predictive power of computational learning models to account for participants' behavior in the BART, they found that two models best fit the data. The results marginally favored the model suggesting that people focus on accumulating rewards relative to the starting point of a trial over a model in which participants evaluated gains and losses relative to an updating reference point. However, several studies found that lay perceptions of risk tend to increase with greater exposure to possible harm or loss (e.g., March and Shapira, 1987), and behaviors such as drug use, stealing, and base jumping are often labeled "risky" because they can result in loss or harm to oneself or others (e.g., Furby and Beyth-Marom, 1992). In the current study we used fMRI data to investigate the cognitive representation of risk-taking in the BART, which can potentially inform how people frame risk-taking in naturalistic settings. A prior fMRI study of the BART (Rao et al., 2008) did not address this issue directly and focused on comparisons between active and passive risk-taking. That study also modeled risk in the task differently and did not have subjects play for real money.

Previous studies using static choice tasks involving chance gambles have found that activity in the ventromedial prefrontal cortex (vmPFC) correlates with decision values for a wide range of different rewards (Rangel and Hare, 2010) and is consistent with value integration (Rushworth et al., 2011). Based on these findings, we suggest that if participants represent the value of each pump as an accumulated reward relative to the starting point of the trial, we would expect an increasing activation in vmPFC with increasing pumps. If, on the other hand, participants update their reference point after each pump, we would expect decreasing vmPFC activity as the number of pumps increases. A better understanding of how neural activity changes with risk-taking behavior in these systems during the BART may shed new light on potential neural mechanisms driving naturalistic risk-taking, including instances of impaired decision-making such as addiction.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Sixteen healthy, right-handed volunteers (six males; mean age 23.6 ± 2.9 years) were recruited via advertisements on the UCLA campus. All subjects were free of neurological or psychiatric history and gave informed consent according to a University of California, Los Angeles Institutional Review Board protocol. Subjects were informed that they would be compensated on the basis of task performance.

#### **TASK**

In the BART (**Figure 1**), subjects inflate simulated balloons, and accrue monetary rewards for each successive "pump" during a particular trial. A trial is defined as a balloon that can be pumped a certain number of times and the trial can conclude in two different ways. First, the participant may "cash-out" at any point during the trial and secure the cumulative winnings up to that point for that balloon in their cumulative total "bank." Second, a balloon may explode; in this case, participants would lose the money accumulated on that trial alone (but not the total accumulated during previous cash-out trials). In our fMRI-adapted

version of the BART, each trial began with a balloon displaying a value of \$0.25 and the value of the balloon increased by \$0.25 for each successive pump. An explosion did not affect the cumulative total earnings from previous cash-out trials, which was displayed at the bottom of the screen at the end of each trial. During each trial, participants were presented with one of three types of "reward" balloons, each having a different explosion probability and signified by a different color: red, green, or blue. The maximum number of pumps allowed during each trial was determined by drawing a random number from a uniform distribution with maximum values of 8, 12, and 16, respectively. Thus, the explosion probability of each additional pump within a trial increased exponentially during the trial, at different rates for different color balloons. Participants were informed that balloon colors may signify differing explosion distributions, but were not provided any specific information about the explosion parameters. As a control task, participants intermittently inflated a gray "control" balloon (maximum 12 pumps) that did not explode and had no associated monetary value. The participants were instructed to inflate the control balloon until it disappeared from the screen (pumps ranged from 1 to 12, average 6.4 inflations) and the next trial began. Unlike with reward balloons, participants had no control over how many times they could inflate the control balloon before the trial ended. The order in which trials were presented was randomized among these four balloons.
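How explosion risk escalates within a trial can be illustrated with a short sketch. It assumes, purely for illustration, that the explosion point of a balloon is a discrete uniform draw from 1 to the balloon's maximum (8, 12, or 16), so that the conditional probability of exploding on pump k, given that the first k − 1 pumps succeeded, is 1/(N − k + 1). The function name `explosion_hazard` is hypothetical and the exact sampling scheme used in the experiment may differ.

```python
from fractions import Fraction


def explosion_hazard(max_pumps: int) -> list[Fraction]:
    """Conditional explosion probability for pumps 1..max_pumps.

    Assumption (illustrative only): the explosion point is drawn
    uniformly from {1, ..., max_pumps}.
    """
    return [Fraction(1, max_pumps - k + 1) for k in range(1, max_pumps + 1)]


# Show the first few hazards and the final (certain) explosion per balloon type.
for n in (8, 12, 16):
    hazard = explosion_hazard(n)
    print(n, [round(float(h), 3) for h in hazard[:4]], "...", float(hazard[-1]))
```

Under this assumption the hazard rises with every pump and rises fastest for the balloon with the smallest maximum, which is the qualitative pattern described above.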

#### **PROCEDURE**

Participants were given instructions and a short demonstration of the task before entering the scanner. They were instructed to use two buttons on a button box: the right pointer finger to inflate the balloon, and the right middle finger to cash out. Inter-stimulus (pump) intervals varied between 1 and 3 s and inter-trial (balloon) intervals varied between 1 and 12 s with a mean of 4 s; these intervals were chosen in order to maximize de-convolution of the hemodynamic response of each individual event. The task was self-paced, and therefore the number of trials varied for each participant. Three scanning runs each lasted 10 min unless the participant ran out of balloons (each participant was allowed a maximum of 12 of each of the different balloons, including the control balloon), which also terminated the run. Stimulus presentation and recording of responses were conducted using MATLAB 6 and Psychtoolbox<sup>1</sup>, on a PowerBook G4 running Mac OS 9. Visual stimuli were presented using MRI-compatible goggles (Resonance Technologies, Van Nuys, CA, USA).

#### **BEHAVIORAL ANALYSIS**

For each participant, for each of the three sessions, and for each of the three balloon types we calculated the total and average number of pumps. In addition, we calculated the total and average number of pumps only for trials when the participant cashed out before the balloon exploded (we refer to the latter measure as "adjusted pumps," which has been found to have higher predictive validity for self-reported risk-taking; Lejuez et al., 2002). We

<sup>1</sup>www.psychtoolbox.org

also calculated the total and average number of cash-out trials, the average sum won on each trial, and the average reaction time (RT) for all pumps, cash-outs, and of the first and last pump from each trial. We performed a repeated measures ANOVA to compare these variables across the three sessions and three balloons. Statistical analyses of behavioral data were conducted using PASW Statistics Version 18.0.
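The distinction between average pumps and "adjusted pumps" can be sketched as follows. The trial records and the function name `pump_summary` are hypothetical and serve only to illustrate the two measures; this is not the authors' analysis code.

```python
from statistics import mean

# Hypothetical trial records: (balloon_color, n_pumps, exploded)
trials = [
    ("red", 3, False),
    ("red", 5, True),
    ("green", 4, False),
    ("green", 6, False),
    ("blue", 7, True),
    ("blue", 5, False),
]


def pump_summary(trials, color):
    """Return (average pumps, adjusted pumps) for one balloon type.

    Adjusted pumps use cash-out trials only; None is returned when a
    balloon type has no cash-out trials.
    """
    all_pumps = [n for c, n, _ in trials if c == color]
    cashed = [n for c, n, exploded in trials if c == color and not exploded]
    average = mean(all_pumps)
    adjusted = mean(cashed) if cashed else None
    return average, adjusted


for color in ("red", "green", "blue"):
    print(color, pump_summary(trials, color))
```

Returning `None` when a balloon type has no cash-out trials mirrors the situation in which the adjusted measure is undefined for a participant.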

#### **MRI DATA ACQUISITION**

Imaging was conducted using a 3T Siemens AG (Erlangen, Germany) Allegra MRI scanner at the Ahmanson-Lovelace Brain Mapping Center at UCLA. Participants first received a short localizer scan, followed by a T2-weighted matched-bandwidth high-resolution structural scan, which matched the prescription of the functional runs. In each functional run, up to 300 functional T2∗-weighted blood-oxygen level-dependent (BOLD) echo-planar (EPI) images were acquired [34 contiguous 4 mm oblique axial slices; repetition time (TR) of 2 s, echo time (TE) of 30 ms; matrix, 64 × 64; flip angle 90°]. A full structural magnetization-prepared rapid-acquisition gradient echo (MPRAGE) scan was conducted for each participant following the functional runs (TR, 2.3; TE 2.1; FOV 256; matrix, 192 × 192; sagittal plane; slice thickness, 1 mm; 160 slices). The data are available from the OpenfMRI repository<sup>2</sup>.

#### **IMAGE PREPROCESSING AND REGISTRATION**

Data analysis and preprocessing were conducted using FSL 4.1.6 software tools<sup>3</sup>. The first two volumes were discarded to allow for T1 equilibrium effects. The remaining images were then realigned using MCFLIRT to compensate for small head movements. Translational movement parameters did not exceed 2 mm in any direction. The data were highpass-filtered in the temporal domain using a Gaussian-weighted least-squares straight line fitting, with sigma = 50.0 s. Brain extraction was done using BET. Affine spatial normalization was done using FLIRT and motion correction. Data were spatially smoothed using a 5-mm full-width-half-maximum Gaussian kernel. A three-step registration procedure was used by first registering BOLD EPI images to the matched-bandwidth high-resolution structural scan, then to the MPRAGE image, and finally into standard Montreal Neurological Institute (MNI) space. Statistical analyses of functional data were performed in native space, with the statistical maps normalized to standard space prior to higher-level analyses.

#### **fMRI ANALYSIS**

Analysis of functional data was done using a multi-stage general linear model approach with FEAT, in which event modeling was performed separately for each run using a canonical double-gamma hemodynamic response function. The three runs for each participant were then averaged together in a higher-level fixed-effects model. The group-level analysis was performed using the FMRIB Local Analysis of Mixed Effects (FLAME 1) module in FSL (Beckmann et al., 2003). Outliers were automatically de-weighted in the multi-subject statistics using mixture modeling as implemented in FSL (Woolrich, 2008). Group analysis *Z* statistic images were prepared to show clusters determined by a height threshold of *Z* > 2.3 and an extent threshold of *p* < 0.05, corrected using the theory of Gaussian random fields (Poline et al., 1997), and all data shown in the figures adhere to these thresholds. For visualization purposes, statistical maps of all analyses were projected onto a study-specific average brain of the participants.
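For orientation, the height threshold of *Z* > 2.3 can be translated into an uncorrected one-sided voxelwise probability under a standard normal distribution. The sketch below covers only this conversion; the cluster-extent correction based on Gaussian random field theory is a separate step and is not reproduced here. The function name `one_sided_p` is hypothetical.

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal variable at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(one_sided_p(2.3), 4))  # 0.0107
```

In other words, the cluster-forming threshold admits voxels at roughly *p* < 0.011 uncorrected, and the extent threshold then controls the error rate at the cluster level.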

#### **fMRI MODEL**

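In the general linear model we defined several regressors for each of the three types of events occurring in the task: pumps, cash-outs, and explosions. For the pumps we included three regressors:

1. PumpsAverage: average activity across pumps of the reward balloons.
2. PumpsParametric: parametric modulation by pump number, demeaned within each trial.
3. PumpsRT: modulation by the RT of each pump.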

For the first two regressors, we used the average RT for all pumps across all participants. The third regressor (PumpsRT) was orthogonalized with respect to the average activity regressor (PumpsAverage). The RT regressor was included to account for brain activity related to RT effects (see Grinband et al., 2008, 2010) across pumps. These three regressors were also included for the control balloons (ControlAverage; ControlParametric; and ControlRT), to account for the motor and visual activity occurring when pumping balloons with no potential monetary reward or explosions. For the cash-out events we included three similar regressors (CashAverage; CashParametric; and CashRT). However, because there could be only one cash-out (or explosion) event for each trial (as opposed to multiple pumps within each trial), the demeaning of the pump number on which the cash-out (explosion) occurred was done across trials, rather than within trials. For the explosion events we included two regressors: ExplodeAverage and ExplodeParametric, as there was no measured RT associated with explosions. Temporal derivatives were included as covariates of no interest to improve statistical sensitivity. Null events, consisting of the jittered inter-trial intervals when the screen was blank, were not explicitly modeled and therefore constituted an implicit baseline.
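The construction of the pump regressors can be sketched at the level of event weights, before any convolution with the hemodynamic response function. This is a simplified illustration under stated assumptions: the average regressor is given an equal weight of 1 per pump, demeaning is applied within a single trial, and orthogonalization is done on the raw weights rather than on the convolved regressors as a full FEAT model would do. The function name `pump_modulators` and the example values are hypothetical.

```python
def pump_modulators(pump_numbers, rts):
    """Build illustrative event weights for the pumps of one trial.

    pump_numbers: 1, 2, ..., n for the pumps in a single trial
    rts: reaction time of each pump in seconds
    """
    n = len(pump_numbers)
    # Average regressor: equal weight for every pump (assumption).
    average = [1.0] * n
    # Parametric regressor: pump number demeaned within the trial.
    mean_pump = sum(pump_numbers) / n
    parametric = [k - mean_pump for k in pump_numbers]
    # RT regressor: remove the component explained by the average regressor.
    beta = sum(r * a for r, a in zip(rts, average)) / sum(a * a for a in average)
    rt_orth = [r - beta * a for r, a in zip(rts, average)]
    return average, parametric, rt_orth


avg, par, rt = pump_modulators([1, 2, 3, 4], [0.5, 0.7, 0.6, 1.0])
print(par)  # [-1.5, -0.5, 0.5, 1.5]
```

Because the average regressor is constant in this sketch, orthogonalizing the RT weights against it is equivalent to demeaning them, so the RT regressor captures only deviations from the mean RT.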

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

The average number of pumps differed significantly between the different colored balloons (**Table 1**) suggesting that participants learned to differentiate between the balloons' explosion thresholds, despite the fact that they were not explicitly informed that these balloons differed in their underlying explosion probabilities. The average number of pumps on cash-out trials was lower than the average tolerance of the balloons [3.53, 3.99, and 4.82 for the average balloon tolerances of 4 (8 max), 6 (12 max), and 8 (16 max) pump balloons, respectively], suggesting that participants were, on average, risk-averse. In particular, a risk-neutral participant would maximize expected payout if she pumped to the level of the average tolerance for every balloon. We ran a

<sup>2</sup>http://openfmri.org/dataset/ds000001

<sup>3</sup>www.fmrib.ox.ac.uk/fsl


**Table 1 | Statistical analyses of behavioral variables from the task (SD in parentheses).**

In the ANOVA used to calculate the main effects of RUN and BALLOON, we used the number of balloons per run. For simplicity, the table presents the averages separately for the three balloons, collapsed across runs. Standard deviation (SD) is presented in parentheses. \*For these variables data from two participants were not included in the analysis, as these participants had no cash-out trials for one or more of the balloons in one or more of the runs.

Cash-out RT\* 0.95 (0.81) 0.88 (0.35) 0.90 (0.40) F2,26 = 13.468, p < 0.001 F2,26 = 0.369, p = 0.69

repeated measures ANOVA with factors BALLOON and RUN to test the interaction between these factors but the interaction was never significant. For almost all of the behavioral variables there was a significant main effect of BALLOON, but no effect of RUN (**Table 1**; **Figure 2**). That said, participants apparently adjusted their behavior as the task progressed, as seen in the significant RUN effect for the average cash-out RT (i.e., the RT decreased across runs) and a smaller but significant effect of the RT of pumps, but only on cash-out trials. No effect of BALLOON was noted for any of the RT variables.
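The risk-neutral benchmark mentioned in the behavioral results can be checked with a small worked example. It assumes, for illustration only, that the explosion point is uniform on 1 to N (N = 8, 12, or 16) and that each successful pump adds \$0.25, so that a participant who plans to stop after k pumps cashes out 0.25·k with probability (N − k)/N. The function names are hypothetical.

```python
def expected_payout(k: int, max_pumps: int, per_pump: float = 0.25) -> float:
    """Expected cash-out when planning to stop after k pumps.

    Assumption (illustrative only): explosion point uniform on 1..max_pumps.
    """
    p_survive = (max_pumps - k) / max_pumps
    return per_pump * k * p_survive


def best_target(max_pumps: int) -> int:
    """Number of planned pumps that maximizes the expected payout."""
    return max(range(max_pumps + 1), key=lambda k: expected_payout(k, max_pumps))


for n in (8, 12, 16):
    k = best_target(n)
    print(n, k, expected_payout(k, n))
```

Under these assumptions the optimum falls at N/2 pumps (4, 6, and 8), which matches the average tolerances quoted above and lies above the observed cash-out averages of 3.53, 3.99, and 4.82.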

## **NEUROIMAGING RESULTS**

Different task-related events (pumps, cash-outs, explosions) activated distinct regions of the reward-based decision-making network. We now review the results for each event separately (see **Table 2** for a complete listing of coordinates).

## *Pumps*

Active risk-taking in the BART is captured by the sequential pumping of the rewarded balloons. Therefore, we focused on the parametric modulation of the rewarded-balloon pumps, subtracting the parametric modulation of the control balloon pumps (thus removing visual and motor effects unrelated to risk and reward). Our behavioral results show that participants modulated their choice behavior coincident with the balloons' different explosion probabilities. We separately modeled participants' pump responses across the three rewarded balloons. However, we found no significant differences between the activity elicited during pumping of the different balloons, possibly due to power limitations arising from the limited number of trials for each balloon type. Therefore, we collapsed the rewarded balloons into a single regressor. We demeaned the number of pumps within each trial to capture the escalating explosion probability ("tension") and potential gain and/or loss associated with each of the three unique balloon types.

*Parametric effects.* For the positive contrast of parametric modulation by pump number (PumpsParametric > ControlParametric) we found significant activations in the bilateral anterior insula, dorsal anterior cingulate cortex (ACC), and right dorsolateral prefrontal cortex (DLPFC; **Figure 3A**, red). Each of these regions has been associated with risk (traditionally

defined as variance in the probability distribution over possible outcomes) in previous studies (Preuschoff et al., 2006, 2008). More importantly, when we tested the negative of this contrast (i.e., ControlParametric > PumpsParametric) we found highly focused vmPFC activation (**Figure 3A**, blue) as well as bilateral medial temporal lobe (MTL) activation. The same effect was observed in vmPFC (as well as posterior cingulate) in the Baseline > PumpsParametric contrast, suggesting that this effect is not driven by the response to the control balloons.

*Average activity.* We observed widespread and significant positive effects for average activity during pumps (**Figure 3B**), subtracting average activity during control pumps (PumpsAverage > ControlAverage), in bilateral insula, dorsal ACC, caudate, lateral orbito-frontal cortex (OFC), frontal poles, and the visual and parietal cortices. Moreover, there was widespread activation with the negative of this contrast (i.e., ControlAverage > PumpsAverage) in the default mode network

#### **Table 2 | Peaks of significant clusters of activation.**



X, Y, and Z MNI coordinates in millimeters indicate the location of peak voxel activation. R, Right; L, Left; B, Bilateral.

(Smith et al., 2009), which includes frontal, parietal, and temporal cortices.

*Reaction time.* PumpsRT > ControlRT revealed bilateral occipital pole activations. There were no activations for the negative of this contrast (i.e., ControlRT > PumpsRT).

#### *Cash-outs*

*Parametric effects.* For the parametrically modulated cash-out regressor there were clusters of activation in ACC as well as in areas that have not been emphasized in the reward/risk related literature (including planum temporale, precuneus, and visual areas). No regions showed a negative correlation with the parametrically modulated cash-out regressor.

*Average activity.* Cash-out events led to significant activations across many dopamine-innervated regions including cingulate cortex, bilateral insula, and striatal regions (**Figure 4A**). This event has been interpreted as a "win" in a previous BART study (Rao et al., 2008). However, it might also be interpreted as the alleviation of the tension that would have been caused by continued exposure to risk (i.e., "relief"). Cash-outs have a completely predicted outcome, as participants already know exactly how much money will be transferred to their bank when they decide to cash out. No regions showed a negative correlation with average activity during cash-outs.

**FIGURE 3 | fMRI activations during pumping. (A)** Parametric modulation of increasing number of pumps of the rewarded balloons (after subtracting the parametric modulation of the control balloon). Red scale presents PumpsParametric > ControlParametric and blue scale presents ControlParametric > PumpsParametric. **(B)** Average activity during pumps (after subtracting the average activity of pumping the control balloon). Red scale presents PumpsAverage > ControlAverage and blue scale presents ControlAverage > PumpsAverage.

*Reaction time.* Cash-out activity modulated by cash-out RT (**Figure 4B**) was seen in visual areas, parahippocampal areas and also in regions previously related to risk including bilateral anterior insula, middle frontal gyrus (MFG), and dorsal ACC. No regions showed a negative correlation with average activity modulated by actual cash-out RT.

## *Explosions*

*Parametric effects.* For parametrically modulated activity during explosions, we observed activations in the anterior and posterior cingulate cortex, and right inferior frontal gyrus (**Figure 5A**). No regions showed a negative correlation with parametrically modulated explosion activity.

*Average activity.* During explosions, activity was seen in bilateral insula, ACC, parietal, and superior frontal gyrus (**Figure 5B**). However, unlike a previous BART fMRI study (Rao et al., 2008), we observed no positive or negative activity in the ventral striatum (i.e., no indication of a negative prediction error signal). The activation for the negative of this contrast was focused within vmPFC.

## **DISCUSSION**

To investigate the neural basis of naturalistic risk-taking, we scanned participants using fMRI while they completed the BART, an experimental measure that includes an active decision/choice component and that has been found to correlate with naturalistic risk-taking behaviors (Lejuez et al., 2002, 2003a,b). In this task, as in many naturalistic settings, escalating risk-taking might be perceived as the accumulation of greater potential rewards or as exposure to increasing possible losses and therefore decreasing marginal expected value. We found that vmPFC activity decreased as the number of pumps increased. In light of previous findings implicating vmPFC in value calculation (e.g., Rushworth et al., 2011), we believe that this result may suggest that escalating risk-taking in the task may be perceived as exposure to increasing possible losses (and decreasing marginal expected value) rather than as an increasing potential aggregate reward relative to the starting point of the trial (see below for alternative interpretations of this result). In addition we found that activations in bilateral anterior insula, ACC, and right DLPFC correlated positively with increasing number of pumps. Activations in all of these regions have been previously found to correlate with risk and/or risk-taking, though they have also been associated more generally with task difficulty and error monitoring.

In the original BART, and in the version used in the current study, each successful pump increases the potential trial reward by a fixed amount. At the same time, each successful pump increases the amount that a participant could potentially lose on the next pump, as well as the likelihood that the next pump will result in an explosion. Wallsten et al. (2005) compared several computational learning models to account for participants' behavior in the BART. In particular, they examined two potential cognitive representations of the decision to continue pumping (or not). They suggested that, on each pump, participants might consider: (a) the *total value* of the potential gain they will receive if the balloon does not explode, relative to the trial starting point, or (b) the sequentially updated *marginal value* that each additional pump will add (if the balloon does not explode) or subtract (if it does explode), relative to the current accumulated gain. Their results did not lead to a definitive conclusion but the authors found evidence supporting the first representation. However, our results support the second representation and favor the suggestion that participants dynamically update the value of each additional pump until the subjective value of the next pump is negative. In this value calculation, the potential amount of gain over pumps is considered constant across pumps (but decreases in probability) while the possible amount of loss is perceived to increase with every pump (and increases in probability). Although this is only one possible interpretation of the results (see other possibilities below) it accords with the common lay and clinical view of risk as increasing with greater exposure to loss or harm (March and Shapira, 1987; Furby and Beyth-Marom, 1992).
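The marginal-value representation described above can be made concrete with a short sketch. It assumes, for illustration only, an explosion point that is uniform on 1 to N and a fixed gain of \$0.25 per successful pump, so that on pump k the gain at stake is constant while the amount that can be lost, 0.25·(k − 1), and the conditional explosion probability, 1/(N − k + 1), both grow. The function names are hypothetical and this is not the model fitted by Wallsten et al. (2005).

```python
def marginal_value(k: int, max_pumps: int, per_pump: float = 0.25) -> float:
    """Expected change in trial earnings from making pump k.

    Assumption (illustrative only): explosion point uniform on 1..max_pumps,
    evaluated after k - 1 successful pumps.
    """
    remaining = max_pumps - k + 1      # explosion points still possible
    p_explode = 1 / remaining          # rises with every pump
    gain = per_pump                    # constant across pumps
    loss = per_pump * (k - 1)          # accumulated amount at stake
    return (1 - p_explode) * gain - p_explode * loss


def first_negative_pump(max_pumps: int):
    """First pump whose marginal expected value is negative, if any."""
    for k in range(1, max_pumps + 1):
        if marginal_value(k, max_pumps) < 0:
            return k
    return None


for n in (8, 12, 16):
    print(n, first_negative_pump(n))
```

Under these assumptions the marginal expected value declines with every pump and turns negative just past the midpoint of each balloon's range, which is the pattern of decreasing value that the second representation implies.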

Previous findings suggest that the vmPFC encodes different types of decision values (Plassmann et al., 2008; Chib et al., 2009; Glascher et al., 2010; Hare et al., 2010) and acts as a value integrator (Rushworth et al., 2011). Thus, our finding of decreasing vmPFC activation, coinciding with participants' decision to further inflate the balloon, suggests that they may be

updating their reference point when assessing the possible consequences of each additional pump. Moreover, the current study is the first to provide evidence consistent with such a value representation in a sequential risk-taking task. We note that activity in the vmPFC has been shown to parametrically increase (decrease) with potential gains (losses) when participants were deciding whether or not to accept mixed gambles that offer a 50-50 chance of gaining (or else losing) various amounts of money (Tom et al., 2007). This result is consistent with the notion that participants focus their attention on potential losses from each additional pump rather than on the sequential marginally added value. Unfortunately, we cannot distinguish between these two interpretations because the expected value of an additional pump and the potential loss are perfectly correlated in the BART.

A previous imaging study of the BART (Rao et al., 2008) did not report any evidence of a value signal encoded in the vmPFC. This might be either due to the lack of reporting any negatives of the main contrasts and corresponding activations and/or due to the fact that the study used different value and explosion functions and that the participants did not play for real money. A recent investigation into the link between alcohol dependence and risk-taking behavior in the BART (Bogg et al., 2011) also did not report vmPFC activations for any contrast, but this may be due to the use of a very different version of the BART that separated the outcome of each pump from the next decision. It should be noted that both of these studies parameterized risk as the objective explosion probability of each balloon. We chose to use the demeaned number of pumps for each balloon (rather than the objective explosion probability known only to an ideal observer) since our behavioral results suggested that subjects did not have an accurate estimation of the actual explosion probabilities for each balloon (see **Figure 2**; **Table 1**). The choice to demean each pump within a trial compared to that trial's average encapsulates the different explosion probabilities of the different balloons (since the average pumps per balloon were significantly different) while testing for the increasing tension with each increasing pump. Unfortunately, the number of trials per balloon type and the sample size of this study did not allow us to perform a proper fit of a learning model to estimate the subjective explosion probability of each subject on a trial-by-trial basis. The current sample size also did not allow examination of individual differences (on the required sample size for individual differences related to risk-taking in the task see Yarkoni, 2009).

The regions that exhibited activations with increased risk-taking in the present version of the BART (bilateral insula, ACC, and right DLPFC) were the same as those identified with a different version of the BART (Rao et al., 2008). First, the insula has been previously shown to encode economic risk (as defined by variance in the probability distribution over possible outcomes; Preuschoff et al., 2006, 2008) and likewise in the BART, each additional pump leads to increased variance in the probability distribution over possible outcomes. Activity in the insula has also been previously shown during active risk-taking tasks and specifically to be more active when choosing to avoid risk (Paulus et al., 2003; Kuhnen and Knutson, 2005). Second, increasing ACC activation has been previously observed with increasing decision

conflict, error likelihood (Alexander and Brown, 2011), and action selection (see recent review by Rushworth et al., 2011). The ACC (and anterior insula) are the most commonly activated regions in neuroimaging studies (Nelson et al., 2010; Yarkoni et al., 2011). This may be due to the fact that task difficulty generally correlates with prolonged RTs, which might have led to increased fMRI activations. Recently, Grinband et al. (2010) demonstrated this by showing that RT effects correlated with activity in dorsal ACC beyond the conflict in a Stroop task. It is important to note that we observed ACC and insula activations that persisted when controlling for RT. This could be an indication that the difficulty of the decision increased during each subsequent pump of the balloon. To our knowledge, our study is the first in the risk-taking domain to account for RT effects. Third, an additional manipulation used by the authors in a previous BART study (Rao et al., 2008) tested active versus passive risk-taking in the task and found that right DLPFC was active when participants were taking active compared to passive risk. Fecteau et al. (2007) were able to reduce risk-taking in the BART by enhancing DLPFC activity using transcranial direct current stimulation (tDCS). In a task very similar to the BART (the Devil's task), Gianotti et al. (2009) found a negative correlation between tonic activity in the DLPFC and risk-taking. Studies using other risk-taking tasks have shown that temporarily disrupting DLPFC activity, using repetitive transcranial magnetic stimulation (rTMS), led to increased risk-taking (Knoch et al., 2006). DLPFC activity has also been demonstrated while exerting self-control in a task where participants needed to choose healthy over unhealthy food items (Hare et al., 2009). All of these studies support the conclusion that DLPFC activity is required in order to exert cognitive control and rein in continued risk-taking. We interpret our result showing increasing DLPFC activation with increased pumping as reflecting the increased engagement of self-control, which drives subjects to stop pumping as the balloons increase in size and are more likely to explode.

There is an intriguing similarity between our results and those of Campbell-Meiklejohn et al. (2008). Using a loss-chase paradigm, in which participants decide to either accept a small loss or else continue gambling and thereby increase or expunge that loss, the authors found that loss-chasing correlated with an increase in vmPFC activity. Concurrently, when participants stopped chasing losses the authors saw an increase in activity in ACC, anterior insula, and frontal regions. Thus, loss-chasing might be seen as an anti-BART paradigm in the sense that when participants are chasing losses they appear to be focused on the increasing potential loss.

There are two main caveats to the present study. First, because we followed the design of the original BART as closely as possible, participants in our task were required to learn the explosion probabilities of the different balloon types from experience while making pumping decisions. Our behavioral results show that participants did not change their choice behavior significantly over the three task sessions, suggesting that they rapidly learned the properties of the task. As noted above, a computational learning model has been proposed for a similar version of the BART (Wallsten et al., 2005) that parameterizes subjective probabilities of explosion for each pump. The sample size in the current study did not allow the use of this model, and thus future studies with much larger sample sizes will be needed to test whether such a model applies to the fMRI-adapted design that we employed here. Second, our interpretation of how participants appear to have framed the task relies on a reverse inference (see review by Poldrack, 2006): we surmise from involvement of the vmPFC that the participants assessed the decreasing marginal expected value of each successive pump and/or focused on increasing loss exposure rather than on total potential gains relative to the starting point of the trial. We feel this inference may be justified because analysis of the NeuroSynth database<sup>4</sup> (Yarkoni et al., 2011) shows that the closest non-empty coordinate to our peak activation in vmPFC ([4, 24, −16], which is included in the activation cluster) has a very high posterior probability of terms associated with choice [P("choice" present in paper | activation) = 0.88] and losses (posterior probability of "losses" = 0.84). This region is also often associated with the default mode network (Smith et al., 2009), and an alternative interpretation of the results might be that with increasing pumps participants are more and more engaged in the task, and thus the vmPFC activity could simply reflect activity in the default mode network. However, the association of the same voxel with the term "resting state" is weaker (posterior probability = 0.76). These meta-analytic results suggest that our reverse inference may be reasonable, though these inferences must remain tentative until tested using an alternative design of the task that will allow a more direct test of this interpretation.

<sup>4</sup>www.neurosynth.org
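The reverse inference above rests on Bayes' rule: the posterior probability that a term appears in a paper, given activation at a voxel, depends on how often that voxel is active in papers with and without the term, and on the prior assigned to the term. The Python sketch below is illustrative only. The function name, the likelihood values, and the default prior of 0.5 are assumptions chosen for the example; they are not taken from the NeuroSynth database or from the present data.

```python
def posterior_term_given_activation(p_act_given_term: float,
                                    p_act_given_no_term: float,
                                    prior_term: float = 0.5) -> float:
    """Bayes' rule: P(term present in paper | activation at the voxel)."""
    joint_term = p_act_given_term * prior_term
    joint_no_term = p_act_given_no_term * (1.0 - prior_term)
    return joint_term / (joint_term + joint_no_term)


# Hypothetical likelihoods: the voxel is active in 30% of papers that use the
# term and in 10% of papers that do not.
even_prior = posterior_term_given_activation(0.30, 0.10)
rare_term = posterior_term_given_activation(0.30, 0.10, prior_term=0.10)
print(f"posterior with prior 0.5: {even_prior:.2f}")
print(f"posterior with prior 0.1: {rare_term:.2f}")
```

With these made-up inputs the posterior falls from 0.75 to 0.25 when the prior drops from 0.5 to 0.1, which is one reason posterior probabilities for different terms are best compared under a common prior.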

In summary, we show using the unique design of the BART that while activity parametrically increased in anterior insula, dorsal ACC, and DLPFC with the additional risk associated with each pump, activity in vmPFC parametrically decreased with each successive pump of the balloon. Although this is only one possible interpretation, it suggests that even under the dynamic conditions of the task, participants encoded the decreasing subjective value of each pump and/or focused on the increasing potential losses until they decided to stop pumping. Identifying these two opposing brain systems during BART performance, the one increasing and the other decreasing, suggests that increased naturalistic risk-taking, as previously shown to be measured using the task, might be attributed to an abnormality in one (or both) of these brain systems.
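The opposing trends summarized above, rising risk and falling value with each pump, can be illustrated with a toy calculation. The Python sketch below assumes, purely for illustration, a balloon whose explosion point is uniformly distributed over `n_max` pumps (so that the explosion probability on pump k is 1/(n_max − k + 1)) and a fixed reward per successful pump. These assumptions, the function name `pump_stats`, and the parameter values are hypothetical and are not the parameters of the task used in this study.

```python
from fractions import Fraction


def pump_stats(n_max: int, reward: Fraction = Fraction(1)):
    """Return (pump, p_explode, expected_value, variance) for pumps 1..n_max-1.

    Each pump is treated as a two-outcome gamble: gain `reward` with
    probability 1 - p, or lose everything banked so far with probability p.
    The last possible pump is omitted because it explodes with certainty.
    """
    rows = []
    for k in range(1, n_max):
        p = Fraction(1, n_max - k + 1)       # hypothetical explosion probability
        banked = (k - 1) * reward            # earnings at stake before pump k
        ev = (1 - p) * reward - p * banked   # expected change in earnings
        var = p * (1 - p) * (reward + banked) ** 2  # variance of that change
        rows.append((k, p, ev, var))
    return rows


for k, p, ev, var in pump_stats(n_max=12):
    print(f"pump {k:2d}: p_explode={float(p):.3f}  "
          f"EV={float(ev):+.2f}  var={float(var):.2f}")
```

Under these assumptions the expected value of the next pump falls monotonically (eventually becoming negative) while the outcome variance rises monotonically, mirroring the two opposing parametric trends.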

## **ACKNOWLEDGMENTS**

This work was supported by NSF DMI-0433693 (R. Poldrack and C. Fox, principal investigators, PIs). We would like to thank Elena Stover for assistance with data collection and for helpful comments on an earlier version of this manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 March 2012; accepted: 12 May 2012; published online: 04 June 2012.*

*Citation: Schonberg T, Fox CR, Mumford JA, Congdon E, Trepel C and Poldrack RA (2012) Decreasing ventromedial prefrontal cortex activity during sequential risk-taking: an fMRI investigation of the balloon analog risk task. Front. Neurosci. 6:80. doi: 10.3389/fnins.2012.00080*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Schonberg, Fox, Mumford, Congdon, Trepel and Poldrack. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## A neuropsychological approach to understanding risk-taking for potential gains and losses

## *Irwin P. Levin1\*, Gui Xue2, Joshua A. Weller3, Martin Reimann2, Marco Lauriola4 and Antoine Bechara2,5,6\**

<sup>1</sup> Department of Psychology, University of Iowa, Iowa City, IA, USA

<sup>5</sup> Department of Psychiatry, McGill University, Montreal, QC, Canada

<sup>6</sup> Clinical Research Division, Douglas Mental Health University Institute, Montreal, QC, Canada

#### *Edited by:*

Kerstin Preuschoff, University of Zurich, Switzerland

#### *Reviewed by:*

Bruno B. Averbeck, National Institute of Mental Health, USA; Philippe N. Tobler, University of Zurich, Switzerland; Anna Van Duijvenvoorde, University of Amsterdam, Netherlands

#### *\*Correspondence:*

Irwin P. Levin, Department of Psychology, University of Iowa, Iowa City, IA 52242, USA. e-mail: irwin-levin@uiowa.edu; Antoine Bechara, Department of Psychology, University of Southern California, Los Angeles, CA 90089, USA. e-mail: bechara@usc.edu

Affective neuroscience has helped guide research and theory development in judgment and decision-making by revealing the role of emotional processes in choice behavior, especially when risk is involved. Evidence is emerging that qualitatively and quantitatively different processes may be involved in risky decision-making for gains and losses. We start by reviewing behavioral work by Kahneman and Tversky (1979) and others, which shows that risk-taking differs for potential gains and potential losses. We then turn to the literature in decision neuroscience to support the gain versus loss distinction. Relying in part on data from a new task that separates risky decision-making for gains and losses, we test a neural model that assigns unique mechanisms for risky decision-making involving potential losses. Included are studies using patients with lesions to brain areas specified as important in the model and studies with healthy individuals whose brains are scanned to reveal activation in these and other areas during risky decision-making. In some cases, there is evidence that gains and losses are processed in different regions of the brain, while in other cases the same region appears to process risk in a different manner for gains and losses. At a more general level, we provide strong support for the notion that decisions involving risk-taking for gains and decisions involving risk-taking for losses represent different psychological processes. At a deeper level, we present mounting evidence that different neural structures play different roles in guiding risky choices in these different domains. Some structures are differentially activated by risky gains and risky losses while others respond uniquely in one domain or the other. Taken together, these studies support a clear functional dissociation between risk-taking for gains and risk-taking for losses, and further dissociation at the neural level.

**Keywords: decision neuroscience, risky decision-making, gain/loss domain differences**

## **INTRODUCTION**

The combination of methods from the behavioral decision-making literature, such as risky decision-making tasks derived from the classic work of Kahneman and Tversky (1979), and methods of neuroscience, such as functional magnetic resonance imaging (fMRI) and lesion studies, has led to breakthroughs in both fields. Examples include how impairments in specific brain functions translate into disadvantageous decision-making inside and outside of the laboratory (Bechara et al., 1994, 1996, 1997, 1999) and how common decision-making biases and heuristics can be understood at the neural level (Sanfey et al., 2003; Hsu et al., 2005; Kuhnen and Knutson, 2005; De Martino et al., 2006; Huettel et al., 2006; Tom et al., 2007). New areas of study have emerged with titles such as neuroeconomics and decision neuroscience.

A major contribution of this work has been a better understanding of how emotion, in combination with cognition, guides our decisions, particularly in the realm of risky decision-making where conflicts often arise in balancing the lure of reward and the fear of loss. Evidence is accumulating that emotional reactivity differs in response to risky gains and risky losses. Logical questions are whether risk-taking for gains and risk-taking for losses can best be understood as separate psychological processes, and ultimately, whether they rely on different brain structures. In this paper, we integrate findings from our own work and that of others to come to conclusions that have some generality but also allow for differences between studies based on methodology.

In order to frame this investigation, we start with a model put forth to support the findings from two studies we conducted with patients with lesions to areas of the brain known to be critical to risky decision-making, namely the ventromedial prefrontal cortex (VMPFC), the amygdala, and the insula (Bechara et al., 1999; Clark et al., 2008). As summarized in **Figure 1** (from Weller et al., 2007), we propose that risky decision-making is influenced by the opposing forces of lure of gain and fear of risk<sup>1</sup>. We operationalize the "lure" of rewards as either the potential for a relatively large gain in the gain domain (in comparison to the small sure gain from a riskless choice) or the potential for avoiding a loss altogether in the loss domain, and the "fear" of risk as arising from risking a relatively large loss in the loss domain (in comparison to the small sure loss from a riskless choice) or not winning anything in the gain domain. These two forces act in opposite directions in exciting or inhibiting risk-taking. We suggest that the VMPFC subregions, the amygdala, and the insula each contribute in different ways to the processing and utilization of these two critical pieces of emotional information. The mere presence of uncertainty induces a primary "fear" response elicited by the amygdala, which has been associated specifically with fear processing and avoidance behavior (LeDoux, 2000; Trepel et al., 2005; Phelps, 2006). This fear response activates the VMPFC, whose function it is to mediate decision-making and allow for more careful deliberative processes by linking together working memory and emotional systems (Damasio, 1994).

While the amygdala has been studied extensively and shown to be a key substrate for triggering emotional responses, especially in connection with fear (LeDoux, 2000), the fact remains that the triggering of emotional responses involves multiple neural regions, and not just the amygdala. Thus, structures such as the insula, which are independent of the amygdala, are also likely to impact decision-making under uncertainty (Kuhnen and Knutson, 2005; Clark et al., 2008; Weller et al., 2009). In particular, we propose that the insula and the amygdala provide complementary systems for dealing with potential losses, which we attribute to the evolutionary significance of dealing with potential losses. Our ancestors learned to avoid situations that risked the loss of things essential for survival and it is reasonable to assume that our brains have been primed for avoiding losses.

This account parallels the proposed dual systems approach of System 1 (experiential) and System 2 (deliberative) for decision-making (Kahneman, 2003). The neural underpinnings of these mechanisms have also been addressed in the "somatic marker" framework. According to the "somatic marker hypothesis" (Bechara and Damasio, 2005; Reimann and Bechara, 2010), after the amygdala triggers an automatic emotional response (or primary induction), the VMPFC subsequently prompts a more careful deliberative analysis that triggers secondary emotional responses (secondary induction) that help guide advantageous decision-making. Findings in support of the somatic marker hypothesis were key to new behavioral theories in which emotions play a pivotal role in decision-making (Mellers et al., 1999; Loewenstein et al., 2001; Slovic et al., 2002).

In the following sections of this paper, we review the evidence for our model based on studies involving the VMPFC, amygdala, and insula, but we also include studies involving other areas that have implications for addressing the basic question of whether there is evidence at the neural level of a distinction between risky decision-making in the gain and loss domains. We will provide evidence that separate psychological processes are involved in risk-taking for gains and losses in terms of both behavioral and neurological reactions that discriminate between risk-taking to achieve a gain and risk-taking to avoid a loss. We then address the more complex issue of whether distinct neural structures support these different reactions. In the case of fMRI studies, we will see that results depend on when during the decision-making process the recordings are made. We start, however, with some more straightforward and well-known behavioral phenomena that motivate the search for neurological dissociations between risk-taking for gains and losses.

It is typical to consider risk-taking as a unified behavioral concept when we talk about a person in terms such as "She is a risk-taker" or "He likes to play it safe." However, it has been shown that risk-taking within the same individual varies across content domains such as monetary, health, and social risks (Weber et al., 2002). Within each of these domains, we may talk about an action as being "risky" because of the uncertainty of its outcome without differentiating between the potential for achieving benefits versus the potential for avoiding aversive consequences. Kahneman and Tversky (1979) demonstrated a fundamental principle that sparked decades of later research: individuals were more likely to take a risk to avoid a loss than to achieve a gain of the same magnitude<sup>2</sup>. Later work by the same authors revealed a fourfold pattern of risk-aversion for gains and risk-seeking for losses of high probability but risk-seeking for gains and risk-aversion for losses of low probability (Tversky and Kahneman, 1992). This was explained in terms of underweighting the likelihood of high probability events but overweighting the likelihood of low probability events. The tasks described in this paper will primarily be of the former type. This paper describes a relatively new component of this research: neuroscientific studies that provide additional sources of data that separate risk-taking to achieve a gain and risk-taking to avoid a loss.

<sup>1</sup>Some definitions of risk include loss as a component. However, in order to incorporate risk-taking for gains and losses, we use a more general definition of risky choice as involving choice options of differing outcome variability. In the typical task described here, the choice is between a "sure thing" or "riskless" option with fixed outcome and a "risky" option with variable possible outcomes.

<sup>2</sup>Following Kahneman and Tversky (see Kahneman, 2003), most framing studies have employed between-subjects designs. However, it is important to note that reliable risky choice framing effects have been reported in within-subject designs where procedural precautions have been taken to avoid recognition of repeated problems by using multiple problems and presenting gain and loss versions of the same problem in separate sessions spaced widely apart (Levin et al., 2002). Emphasis in this paper will be on tasks involving actual gains and losses where separate gain and loss trials can be administered to the same decision makers without concern for consistency demands.
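The over- and underweighting of probabilities described above is commonly formalized with the one-parameter weighting function of Tversky and Kahneman (1992), w(p) = p^γ / (p^γ + (1 − p)^γ)^(1/γ). The Python sketch below evaluates it using their median estimate for gains (γ = 0.61); the function name and the example probabilities are illustrative choices and are not part of the studies reviewed here.

```python
def weight(p: float, gamma: float = 0.61) -> float:
    """Tversky-Kahneman (1992) probability weighting function w(p)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


# Illustrative probabilities: a rare, an even, and a near-certain event.
for p in (0.05, 0.50, 0.95):
    print(f"p = {p:.2f} -> w(p) = {weight(p):.3f}")
```

Small probabilities receive a decision weight above their stated value and large probabilities a weight below it, which is the asymmetry invoked to explain the fourfold pattern.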

In presenting the most recent research in our laboratory, we focus on the "cups task" (Levin et al., 2007), which we developed specifically to separate risky decision-making for actual gains and losses, both in terms of overall riskiness and sensitivity to expected value (EV) differences between choice options. The cups task includes a gain domain and a loss domain. Gain trials involve some probability of an addition to the decision-maker's account while loss trials involve a possible reduction. Decision makers choose between one array of cups in which the outcome is constant (the riskless choice) and one array of cups in which the outcomes vary (the risky choice). Outcomes are displayed immediately after choices are made. By varying the number of cups and the amount to be won or lost, we create gain and loss trials with contingencies that either do or do not favor a risky choice (see **Figure 2**). For example, a one-out-of-three chance of winning five coins is better in the long run than a sure gain of one coin but a one-out-of-three chance of losing five coins is worse in the long run than a sure loss of one coin. A key component of data analysis for the cups task is the extent to which an individual makes choices based on the consideration of relative EV between choice options, for both gain- and loss-related decisions. EV sensitivity represents an index of advantageous decision-making because consistently choosing the option with a more favorable EV will yield more positive outcomes in the long run. As will be described later, a somewhat simpler version of the task was adapted for use in scanner research. Across many data sets, we demonstrated that Kahneman and Tversky's (1979) original finding of more risk-taking to avoid a loss than to achieve a gain of the same magnitude is reproduced in the cups task. Beyond the initial demonstration of greater risk-taking for losses than for gains, our recent research with the cups task showed age-related differences in risk-taking as a function of decision domain (risk-taking to achieve a gain versus to avoid a loss). Risk-taking in the domain of gains decreased monotonically from early childhood to older adulthood whereas overall risk-taking to avoid losses was remarkably constant across age groups (Weller et al., 2011). Within both domains, EV sensitivity increased from early childhood through adulthood with a slight decline for older adults.
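The expected-value comparison in the cups-task example above can be written out explicitly. The Python sketch below is illustrative: the function names are hypothetical, it assumes that every cup other than the winning (or losing) one holds nothing, and only the one-in-three, five-coin versus one-coin example is taken from the text.

```python
from fractions import Fraction


def risky_ev(n_cups: int, amount: int) -> Fraction:
    """EV of a risky array in which one of `n_cups` cups holds `amount`.

    Assumption for this sketch: all other cups hold 0.
    """
    return Fraction(amount, n_cups)


def ev_favors_risk(n_cups: int, risky_amount: int, sure_amount: int) -> bool:
    """True when the risky option has the higher expected value."""
    return risky_ev(n_cups, risky_amount) > sure_amount


# Gain trial: 1-in-3 chance of winning 5 coins versus a sure gain of 1 coin.
print(risky_ev(3, 5), ev_favors_risk(3, 5, 1))
# Loss trial: 1-in-3 chance of losing 5 coins versus a sure loss of 1 coin.
print(risky_ev(3, -5), ev_favors_risk(3, -5, -1))
```

The same odds and stakes therefore favor the risky option on the gain trial (5/3 versus 1) and the riskless option on the loss trial (−5/3 versus −1), which is what allows EV sensitivity to be assessed separately in each domain.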

## **EVIDENCE FROM DECISION NEUROSCIENCE**

We turn to neuroscience for an exploration of brain functions that may help explain these gain/loss behavioral differences. Our approach in this paper is to provide a body of evidence that is consistent with the proposition that risky decision-making is separable in the gain and loss domains rather than providing a single "critical" test.

Historically, the most fundamental functional division of the brain was thought to be the one that distinguished between approach and avoidance behaviors. However, many years of animal research failed to identify anatomically separate neural substrates; neural systems underlying pain and pleasure seem to overlap considerably (e.g., Craig, 2009). Later human behavioral studies found equivocal support for a separation of neural systems whereby the left hemisphere is predominantly concerned with approach behaviors and the lure of reward, whereas the right hemisphere is critical for avoidance behaviors and the fear of uncertainty (Davidson et al., 1990). More recently, neuropsychological research on the approach–avoidance conflict evolved into studies of risky decision-making where the shift was to a more microscopic analysis of neural systems.

[Figure caption fragment: In the experiments these were counterbalanced over trials.]

Neuroimaging data have been used to gain new insights concerning risky decision-making. In particular, fMRI studies use changes in blood flow that accompany neural activity in different parts of the brain to associate these areas with particular behaviors. For instance, in a recent meta-analysis of fMRI studies of risky decision-making using young, healthy adults, Mohr et al. (2010) found evidence common to all studies that risk processing is associated with activation of specific emotional systems in the brain such as the anterior insula, especially when potential losses are involved. The dorsolateral prefrontal cortex and parietal cortex are also activated when making decisions involving risk. Using fMRI in conjunction with a paradigm in which individuals decided whether to accept or reject gambles offering a 50/50 chance of gaining or losing varying amounts of money, Tom et al. (2007) found that activity in the ventral striatum and the VMPFC increased as potential gains increased but decreased as potential losses increased. Also, activity in the anterior insula was found to be more strongly associated with the anticipation of losses than with the anticipation of gains (Knutson et al., 2007). Earlier research showed greater arousal following losses than following gains (Bechara et al., 1999). Such results motivated us to classify study results based on whether activation was measured before, during, or after a decision was made.

In order to get a more complete picture, we conducted a focused literature search using the keywords "fMRI," "gains," "losses," "risk," and "uncertainty." **Table 1** summarizes the results of a number of fMRI studies in terms of which areas of the brain were studied and at what point in time, and whether the study provided support for distinct mechanisms involved in risky decision-making for gains and losses. While the results are "mixed," a pattern emerges when the studies are separated based on whether brain activation was measured before, during, or after a risky choice was made. Most noteworthy, while different regions were the focus of different studies, in 14 studies in which activation was assessed prior to a choice (i.e., anticipation), support for separate mechanisms was found in eight studies, four studies did not support separate structures, and two studies did not make claims about separate structures because they focused on a specific region only. For example, studies by Kuhnen and Knutson (2005) and Knutson et al. (2008b) each found that the nucleus accumbens was activated in anticipation of a risky gain, whereas the insula was activated in anticipation of a risky loss. We think these results are particularly compelling because they suggest that different parts of the brain drive risky decision-making in anticipation of uncertain gains versus uncertain losses. Whereas activation during or after a risky choice can influence subsequent risky choices, activation prior to a choice is unique in its potential to influence the current choice.

Besides the dissociation at the pre-decision stage, recent evidence suggests that experienced gains and losses might also activate different regions, which then affect subsequent decision-making. In a recent study using the cups task, we found that at the feedback stage, experienced reward was associated with strong activation in the VMPFC and the ventral striatum, and the stronger reward-related responses in the VMPFC were positively associated with risk-taking (Xue et al., 2009). In a follow-up study, we explicitly examined how neural and behavioral responses to gains and losses were associated with subsequent decisions. We developed a modified version of the cups task in which a single array of cups was presented on each trial: one coin would be lost if any but one randomly selected cup was drawn, but multiple coins would be won if that remaining cup was drawn (Xue et al., 2011). The decision-maker indicated whether to take or not take the gamble. In one analysis, we focused on how an experienced gain versus an experienced loss could modulate subsequent risky decision-making, both behaviorally and neurally. We found that subjects took more risk after losing a gamble than after winning a gamble. At the neural level, we again found that at the feedback stage, a win was associated with stronger activation than a loss in the anterior cingulate cortex, the posterior cingulate cortex, the ventral striatum, and the insula. More importantly, decisions after a loss were associated with stronger activation in the frontoparietal network, which was positively correlated with individuals' increased tendency to take more risk. These results thus suggest that experienced gains and losses not only involve different brain regions, but also trigger differential neural responses and behaviors in subsequent decisions.

Despite this suggested anatomical separation, the fact remains that the same structure, for example, the insula, has sometimes been implicated in the processing of both painful and pleasurable stimuli (e.g., Craig, 2009). Indeed, when compared to a baseline of activation following trials on which the decision-maker decided not to take the gamble, both experienced gains and losses elicited strong insular activation, which then modulated subsequent decision-making (Xue et al., 2010). This calls for caution when making absolute determinations about the anatomical separation of these pleasure (gain) and pain (loss) systems. In particular, a proper baseline should be included in this analysis since the same regions might show opposite modulation by gains and losses (Tom et al., 2007). Thus, the stronger activation for gains or losses in some regions might not necessarily reflect distinct neural structures for gains and losses. Another reason for the difficulty in establishing absolute anatomical separation is that, at least within the amygdala, cellular physiological evidence indicates that separate neurons respond to positively and negatively valenced stimuli, yet these neurons are anatomically highly intermixed (e.g., Paton et al., 2006). This explains why the neural systems for risky gains versus losses can be functionally separate, while clear-cut separation at the global anatomical level is more difficult to find, given the proximity and overlap of these two systems.

#### **Table 1 | Functional magnetic resonance imaging studies of risk-taking for gains and losses.**

This table is sorted by time of measurement (before, during, or after decision-making) and by result (supportive of separate structures or not). In each category, the table is sorted first in chronological order, then in alphabetical order.

Next, we turn to lesion studies, which are smaller in number in terms of addressing this issue but which should align with the "anticipatory" fMRI studies because, of course, pre-existing brain damage would likewise serve to influence revealed choices. While neuroimaging studies address whether a particular brain region is involved in a particular function, lesion studies test whether that brain region is necessary for that function, and thus form more direct tests of the model in **Figure 1** and our earlier reference to anatomically separate neural substrates. The logic here is that if a particular function is impaired in individuals with a localized lesion, then the affected neural region must play a crucial role in executing that function. Lesion studies seem to reveal little dissociation between the domains of gains and losses within the prefrontal cortex region, but such dissociations are more likely to be revealed when one considers two other neural systems, the insula and amygdala, which feed information into the prefrontal cortex. Indeed, within the prefrontal cortex, patients with damage to the VMPFC show deficits for both risky gains and risky losses (Weller et al., 2007). Compared to healthy controls, VMPFC patients showed increased levels of risk-taking and decreased sensitivity to EV differences in both gain and loss domains. In contrast, amygdala patients showed impaired decision-making and exaggerated levels of risk-taking to achieve gains. However, in the loss domain amygdala damage was not associated with significantly increased risk-taking or decreased EV sensitivity. Given the abundance of literature suggesting that the amygdala is involved with avoidance of punishment, this finding suggests that other structures may act in concert with the amygdala to produce a signal that engages the VMPFC. When patients with insula damage were compared to controls, a different pattern emerged (Weller et al., 2009). Consistent with research suggesting that the insula is important for risk processing (Preuschoff et al., 2008), insula lesion patients, like VMPFC and amygdala patients, showed decreased sensitivity to EV differences between choice options for both risky gains and risky losses. However, these individuals showed lower levels of risk-taking compared to healthy controls, especially on gain trials. Thus the insula, with connections to the amygdala, ventral striatum, and the VMPFC, may serve the purpose of providing a "gate" to determine the effectiveness of excitatory and inhibitory motivational circuits, signaling approach or danger. Consequently, insula damage may result in a blunted response toward risk, and would lead to insensitivity to changes in environmental contingencies signaling the approach or avoidance of a risk, regardless of domain.

Because the amygdala and insula have long been implicated in the processing of negative emotions, evoked from stimuli that are particularly aversive and perhaps even a threat to survival (e.g., LeDoux, 2000; Paulus and Stein, 2006; Phelps, 2006), we argue that these emotional reactions may be processed by multiple neural structures and are thus more difficult to disrupt as a result of a focal lesion to the amygdala or the insula alone<sup>3</sup>. Specifically, a person with a damaged amygdala but an intact insula can still make reasoned decisions in the domain of losses even when they cannot in the domain of gains. While a separation in processing gains and losses is achieved at the level of the amygdala versus insular cortex, the two neural systems may come closely together (and become more difficult to dissociate) by the time information reaches the prefrontal cortex, which responds similarly to risky gains and risky losses. Nevertheless, when considering the evidence from both insula and amygdala lesions, support for separate processes for risky decision-making in the gain and loss domains seems to emerge. Consistent with our model, the insula, in addition to its general role in processing risk, serves to especially aid in recruitment of the VMPFC to guide risky decisions in the more emotion-laden loss domain.

## **SUMMARY AND CONCLUSION**

Taken individually, each of the neuroimaging and lesion studies reviewed here has its limitations. Lesion studies are limited to the small sample of available participants who meet the criteria of damage to a targeted area. Furthermore, some of those included may have collateral damage to other adjacent areas. fMRI studies also typically have small sample sizes due to financial and time constraints. Furthermore, the complexity and length of tasks that can be conducted in a scanner are limited. Also, because different studies focus on different areas (see **Table 1**), comparison and integration of findings can be difficult. Finally, for present purposes, the tasks used in the different studies differed in their ability to separate the gain and loss domains.

Nevertheless, we believe that we can provide a meaningful summary of the findings reviewed here. Behavioral studies suggest differences in decision-making for risky gains and risky losses. A study comparing different age groups suggests different developmental trajectories for risk-taking in the gain and loss domains. Neuroimaging studies are sometimes inconclusive in mapping brain systems to differential reactions to risky gains and losses. For example, while there is evidence that a system such as the VMPFC or the striatum is involved in both risky gains and losses, different parts of the system may be differentially sensitive to gains and losses (Xue et al., 2009). In such cases, the more general hypothesis of separate processes underlying risk-taking for gains and losses is still supported. With regard to the stricter hypothesis of separate structures, a breakdown of fMRI studies in **Table 1** shows the strongest evidence for this hypothesis when recordings capture pre-decisional or anticipatory processes. We believe that the lesion studies provide the most direct evidence implicating separate structures.

<sup>3</sup>It should be noted that redundancy has also been found in learning and memory systems, which allow learning to occur in multiple parallel memory systems; see Pinker and Ullman (2002) for an example of how multiple memory systems support the generation of verb past-tense.

Although a more detailed meta-analysis is clearly warranted, **Table 1** shows that a wide variety of structures are involved in risky decision-making beyond those depicted in **Figure 1**. Nevertheless, we feel that the relatively simple depiction of the model represents a good start in capturing the different neurological underpinnings of risk-taking for gains and losses. The complementary roles of the VMPFC, amygdala, and insula depicted in the model are consistent with both the general hypothesis that separate processes underlie risk-taking for gains and losses, and the stricter hypothesis of separate neural structures coming together in different ways to guide risky decision-making in the gain and loss domains. In conclusion, we find that evidence of different neural responses underlying risk-taking for gains and losses favors the hypothesis that decision makers react differently to risky gains and losses, both in terms of overt risk-taking and neural activation.

#### **REFERENCES**

reward cues on financial risk taking. *Neuroreport* 19, 509–513.


function of uncertain prospects. *Neuroimage* 30, 668–677.


scale: measuring risk perceptions and risk behaviors. *J. Behav. Decis. Mak.* 15, 263–290.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 July 2011; accepted: 19 January 2012; published online: 07 February 2012.*

*Citation: Levin IP, Xue G, Weller JA, Reimann M, Lauriola M and Bechara A (2012) A neuropsychological approach to understanding risk-taking for potential gains and losses. Front. Neurosci. 6:15. doi: 10.3389/fnins.2012.00015*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Levin, Xue, Weller, Reimann, Lauriola and Bechara. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Contextual factors explain risk-seeking preferences in rhesus monkeys

## **Sarah R. Heilbronner<sup>1</sup>\* and Benjamin Y. Hayden<sup>2</sup>**

<sup>1</sup> Department of Pharmacology and Physiology, University of Rochester Medical Center, Rochester, NY, USA

<sup>2</sup> Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, USA

#### **Edited by:**

Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland

#### **Reviewed by:**

Naoshige Uchida, Harvard University, USA

Veit Stuphorn, Johns Hopkins University, USA

#### **\*Correspondence:**

Sarah R. Heilbronner, Department of Pharmacology and Physiology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 711, Rochester, NY 14642, USA. e-mail: sarah.heilbronner@gmail.com

In contrast to humans and most other animals, rhesus macaques strongly prefer risky rewards to safe ones with similar expected value. Why macaques prefer risk while other animals typically avoid it remains puzzling and challenges the idea that monkeys provide a model for human economic behavior. Here we argue that monkeys' risk-seeking preferences are neither mysterious nor unique. Risk-seeking in macaques is possibly induced by specific elements of the tasks that have been used to measure their risk preferences. The most important of these elements are (1) very small stakes, (2) serially repeated gambles with short delays between trials, and (3) task parameters that are learned through experience, not described verbally. We hypothesize that, together, these features will readily induce risk-seeking in monkeys, humans, and rats. Thus, elements of task design that are often ignored when comparing studies of risk attitudes can easily overwhelm basal risk preferences. More broadly, these results highlight the fundamental importance of understanding the psychological basis of economic decisions in interpreting preference data and corresponding neural measures.

**Keywords: risk, gambling, neuroeconomics, uncertainty, macaque**

## **INTRODUCTION**

In 1996, Kacelnik and Bateson published a comprehensive review of the literature on animal risk preferences (Kacelnik and Bateson, 1996). They reported that, across 59 different studies, the majority of animals exhibited risk-averse preferences during gambles for food rewards. This pattern is generally concordant with the observation that humans are risk-averse for gains across a broad variety of contexts (Bernoulli, 1954/1738; Kahneman and Tversky, 1979). Here, risk is operationalized as the uncertainty in the possible outcomes of a decision, and can be mathematically specified as the coefficient of variation (Weber et al., 2004). This usage is distinct from the everyday usage of the term, which is often synonymous with threat and necessarily involves the possibility of loss. The correspondence between human and animal data suggests that risk attitudes are evolutionarily ancient and are robustly stable across conditions (Chen et al., 2006). These results also tacitly endorse the validity of animal models for studies of risk preferences, and provide a foundation for neuroeconomic studies of risky choice (Platt and Glimcher, 1999; Fiorillo et al., 2003; McCoy and Platt, 2005).
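To make the operational definition of risk concrete, the following minimal sketch computes the expected value and the coefficient of variation (standard deviation divided by mean) for a safe option and two gambles with matched expected value. The reward volumes and the function names are illustrative assumptions, not values taken from any of the studies cited here.

```python
import math


def expected_value(outcomes, probs):
    """Probability-weighted mean of the possible outcomes."""
    return sum(x * p for x, p in zip(outcomes, probs))


def coefficient_of_variation(outcomes, probs):
    """Standard deviation of the outcomes divided by their mean."""
    mean = expected_value(outcomes, probs)
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return math.sqrt(variance) / mean


# Hypothetical reward volumes (ml); all three options share the same mean.
safe = ([0.20], [1.0])
risky_narrow = ([0.15, 0.25], [0.5, 0.5])
risky_wide = ([0.05, 0.35], [0.5, 0.5])

for name, option in [("safe", safe), ("narrow", risky_narrow), ("wide", risky_wide)]:
    print(name, expected_value(*option), coefficient_of_variation(*option))
```

Under this measure the safe option carries no risk, and risk grows as the two outcomes of the gamble diverge, even though the expected value is unchanged.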

In 2005, McCoy and Platt published the first study of the single-neuron correlates of risky choice (McCoy and Platt, 2005). In contrast to the large body of animal studies reviewed by Kacelnik and Bateson, they found reliable risk-seeking behavior in two rhesus monkeys (*Macaca mulatta*). Given a choice between a medium-sized squirt of cherry juice and a risky option that offered a 50% chance of a large amount of juice and a 50% chance of a small amount, the monkeys reliably preferred the risky option, even though the expected values of the two options were matched. As the sizes of the large and small rewards diverged, and the risk level of the risky option thus increased, monkeys became even more risk-seeking. These monkeys even continued to prefer the risky option in a control experiment where the probability of winning was only 1/3 and the mathematical expected value of the gamble was lower than that of the safe option (McCoy and Platt, 2005). These risk-seeking preferences were not due to lack of training: the monkeys consistently chose to gamble even after months of experience with the task.

Since this early study, it has become clear that strong risk-seeking preferences are not unusual in macaques. The original monkeys continued to exhibit risk-seeking behavior for years, and six others from the same lab were consistently risk-seeking, totaling eight animals and hundreds of thousands of trials, with no cases of risk-aversion observed (Hayden and Platt, 2007; Hayden et al., 2008a,b, 2010; Long et al., 2009; Watson et al., 2009; Heilbronner et al., 2011). At least two other neurophysiology labs have also found reliable risk-seeking behavior in rhesus macaques (O'Neill and Schultz, 2010; So and Stuphorn, 2010). To our knowledge, no published study has reported stable risk-aversion in rhesus monkeys. With a variety of laboratories reporting the same result, one might conclude that risk-seekingness is some generalizable preference of rhesus macaques – perhaps distinguishing them from humans and other animals.

Here we argue the opposite: rhesus monkeys are not unique among animals, nor are they even inherently risk-seeking. Instead, we argue that, for practical reasons, the task design elements used by scientists who have studied risk attitudes in monkeys are those most likely to encourage risk-seeking. The most important of these elements are (1) decisions have very small stakes, a squirt or two of juice, (2) decisions are repeated hundreds or thousands of times with short delays (a few seconds) between trials, and (3) the reward structure of the task is learned through experience, rather than explained through language. These elements were preserved across studies in several laboratories in large part because they are optimal for neurophysiological recording, and they show up in rhesus macaque studies because monkeys are generally trained to gamble for the purpose of neuronal recording studies. Although the many studies listed above varied considerably from the original McCoy and Platt experiment, for example, by use of priming stimuli (Watson et al., 2009) and changes in cue presentation (Hayden et al., 2010), they still had the core features in common.

## **CONVEX UTILITY CURVES ARE INSUFFICIENT TO EXPLAIN RISK-SEEKING IN MACAQUES**

Before we address the factors that promote risk-seeking, it is helpful to discuss the most common explanation for risk-aversion or risk-seeking: that risk-sensitive decision-makers have non-linear utility, and animals seek to maximize expected utility. Since the eighteenth century, it has been argued that decision-makers weight veridical reward values by a personal utility function, and these utilities, discounted by probability, are combined to form an expected utility (Bernoulli, 1954/1738). Thus, a concave utility curve, in which the marginal utility of each additional reward unit diminishes, is often assumed to explain human risk-aversion (**Figure 1**). These arguments are restricted to the domain of gains; for losses, hypothesized convex utility curves may explain risk-seekingness (Kahneman and Tversky, 1979). Indeed, in his commentary on the original macaque study (McCoy and Platt, 2005), Lee (2005) pointed out that the rhesus macaques' behavior was consistent with a convex utility curve in the gains domain. However, while utility curves can *describe* risk preferences, it remains unclear whether they provide an accurate process model that *explains* risk preferences.
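The link between the curvature of the utility function and risk attitude can be sketched numerically. The example below assumes a power utility function, u(x) = x^a, purely for illustration; the exponent values and reward amounts are hypothetical and are not estimates from any study discussed here.

```python
def power_utility(x, exponent):
    """Illustrative utility function: concave if exponent < 1, convex if > 1."""
    return x ** exponent


def expected_utility(outcomes, probs, exponent):
    """Probability-weighted sum of the utilities of the outcomes."""
    return sum(p * power_utility(x, exponent) for x, p in zip(outcomes, probs))


def prefers_gamble(safe, outcomes, probs, exponent):
    """True if the gamble has strictly higher expected utility than the safe option."""
    return expected_utility(outcomes, probs, exponent) > power_utility(safe, exponent)


# Hypothetical rewards: a safe amount of 2 versus a 50/50 gamble over 1 or 3.
safe_amount = 2.0
gamble = ([1.0, 3.0], [0.5, 0.5])

print(prefers_gamble(safe_amount, *gamble, exponent=0.5))  # concave utility
print(prefers_gamble(safe_amount, *gamble, exponent=2.0))  # convex utility
```

With matched expected values, an expected-utility maximizer with a concave curve rejects the gamble and one with a convex curve accepts it, which is the descriptive account the text attributes to Lee (2005).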

We performed an experiment to investigate this question. We gave macaques a choice between two options, (1) a standard risky option with a 50/50 chance of a large and small juice reward and (2) an alternating option that provided either the large or small reward; its value alternated between these two reward sizes each time it was chosen (Hayden et al., 2008a). If non-linear utility functions drive risk-seeking preferences, then monkeys should be indifferent to risky and alternating options because the utilities must be the same. Instead we found that monkeys strongly and stably preferred risky options to alternating ones (**Figure 2**), indicating that the uncertainty itself biases the monkeys toward the risky option. In contrast, the alternating option was only weakly preferred to the safe option, suggesting that non-linearities in the utility function account for only a small part of the monkeys' risk attitudes.
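The logic of this prediction can be checked with a short sketch: whatever the shape of the utility function, the average utility obtained per choice from the alternating option equals the expected utility of the 50/50 risky option. The reward sizes and the three utility functions below are illustrative assumptions.

```python
import math


def mean_utility_alternating(small, large, utility, n_choices):
    """Average utility per choice when the reward alternates large, small, large, ..."""
    rewards = [large if i % 2 == 0 else small for i in range(n_choices)]
    return sum(utility(r) for r in rewards) / n_choices


def expected_utility_risky(small, large, utility):
    """Expected utility of a 50/50 gamble between the same two rewards."""
    return 0.5 * utility(small) + 0.5 * utility(large)


# Hypothetical reward sizes and utility shapes (concave, linear, convex).
small, large = 1.0, 3.0
utilities = {"concave": math.sqrt, "linear": lambda x: x, "convex": lambda x: x ** 2}

for name, u in utilities.items():
    alternating = mean_utility_alternating(small, large, u, n_choices=1000)
    risky = expected_utility_risky(small, large, u)
    print(name, alternating, risky)
```

Because the two quantities match for every curve, a preference for the risky option over the alternating one cannot be produced by utility curvature alone, which is the inference drawn in the text.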

Another utility-based account often used to explain risk-seeking, the energy budget rule, comes from foraging theory (Caraco et al., 1980; Caraco, 1981; Stephens and Krebs, 1986). The energy budget rule postulates that foraging animals should be risk-seeking if a large outcome means survival but the (ironically named) safe option means death. That is, if the minimum number of calories necessary to survive lies somewhere between the value of the safe option and the value of winning from the risky option, an animal should choose the risky option. It is theoretically possible that this preference, imbued by natural selection, is so strong that it affects monkeys in the laboratory, even though they are never even close to mortal danger. Although this is an appealing explanation for risk-seeking, it is highly unlikely to apply to gambling macaques. First, if the minimum number of calories necessary to survive lies somewhere between the value of losing from the risky option and the value of the safe option, the animal should always be risk-averse, not risk-seeking. Second, macaques are not limited to a single decision, but instead face hundreds or thousands of gambling choices every day. As the number of trials performed in a day increases, even when it is only into the double digits (and even more so beyond that), the optimal risk-sensitive foraging strategy rapidly comes to approximate risk neutrality. Consistent with this, satiety level (whether within a session or between sessions) does not affect risk preferences (Hayden, McCoy, and Platt, unpublished data). Thus, the criteria for the energy budget rule are quite strict, and, not surprisingly, there is inconsistent empirical evidence of risk-seeking behavior based on energy budget (Kacelnik and Bateson, 1997).
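For the single-decision case, the energy budget rule can be written as a comparison of survival probabilities. The sketch below is a simplified illustration that assumes one choice and a hard survival threshold; the function names and calorie values are hypothetical, and it does not model the many-trial situation described above.

```python
def survival_probability(outcomes, probs, requirement):
    """Probability that a single draw meets or exceeds the energy requirement."""
    return sum(p for x, p in zip(outcomes, probs) if x >= requirement)


def best_option(safe, small, large, requirement, p_large=0.5):
    """Pick the option with the higher survival probability for one decision."""
    p_safe = survival_probability([safe], [1.0], requirement)
    p_risky = survival_probability([small, large], [1 - p_large, p_large], requirement)
    if p_risky > p_safe:
        return "risky"
    if p_safe > p_risky:
        return "safe"
    return "indifferent"


# Hypothetical energy values: safe = 2, risky = 1 or 3 with equal probability.
print(best_option(safe=2, small=1, large=3, requirement=2.5))  # threshold above safe
print(best_option(safe=2, small=1, large=3, requirement=1.5))  # threshold below safe
```

The rule favors risk only when the requirement falls between the safe payoff and the large payoff; when it falls between the small payoff and the safe payoff, the same logic favors the safe option.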

### **PSYCHOLOGICAL FACTORS THAT AFFECT RISK ATTITUDES**

Given the failure of traditional utility-based explanations, we next consider the possibility that specific contexts used in studying risk attitudes in monkeys influenced their preferences. Because these tasks were all originally developed for use in single unit physiology, studies of macaque gambling behavior are subject to unique constraints among the corpus of human and animal risk studies. They involve small stakes (to increase the number of trials performed in a session), a large number of trials (averaging neuronal responses over trials reduces the noise that comes from variability in neuronal firing patterns) presented very quickly (because neuronal isolation is unstable, physiologists collect data as quickly as possible), and task structures learned through experience (because monkeys have no language). We consider each of these factors here.

## **SMALL STAKES**

In humans, risk-aversion is weaker when the reward stakes are small (Holt and Laury, 2002; Fehr-Duda et al., 2010). Nobel Prize-winning economist Markowitz, for example, intuited that typically risk-averse humans would prefer a lottery offering a 10% chance of \$1 over a guaranteed 10 cents (Markowitz, 1952). Empirical tests of this bias, known as the "peanuts effect," have demonstrated markedly reduced risk-aversion for small stakes (Hershey and Shoemaker, 1980; Green et al., 1999; Weber and Chapman, 2005). Some studies even suggest that small stakes may be sufficient to promote risk-seeking, although this is not fully demonstrated (Weber and Chapman, 2005). Consistent with this idea, some have argued that casinos effectively increase risk-seeking behavior among gamblers by dividing gambles into small (peanut-sized) amounts (Simmons and Novemsky, 2008). The reasons for the peanuts effect remain unclear, although it may reflect changing attitudes to disappointment for small amounts. Another possibility is that when reward sizes are small, gains loom larger than losses (Harinck et al., 2007). Indeed, there is some evidence that monkeys' behavioral adjustments are more strongly motivated by the possibility of gains (or large rewards) than fear of losses (or small rewards) in their small-stakes gambling paradigms (Hayden and Platt, 2007; Hayden et al., 2008a), despite the well-known phenomenon of loss aversion for normal-sized amounts (Tversky and Kahneman, 1981).

Regardless of the psychological cause, it is clear that risk-aversion weakens greatly when stakes are very low. Primate gambling studies invariably use very small stakes – typically 0.1–0.3 ml of fluid per trial. The average daily intake for laboratory monkeys is generally around 2000 times greater than this amount. For the purpose of comparison, let us consider the human monetary equivalent. If an average American earns approximately \$40,000 per year, or about \$120 per day, the equivalent trial would offer 1/2000 of that amount, or 6 cents, lower than Markowitz' "peanuts" amount. Although money and juice may not be directly comparable, it is safe to conclude that each individual trial offers such a small amount of juice that it probably puts monkeys into the domain of peanuts effects, and likely biases the monkeys away from risk-aversion and potentially toward risk-seeking.
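The arithmetic behind this comparison is easy to reproduce. The sketch below uses the figures given in the text and, as an added assumption for comparison, a 365-day year; the variable names are illustrative.

```python
# Figures taken from the text.
annual_income = 40_000        # dollars per year
daily_income_text = 120       # dollars per day, as rounded in the text
stake_fraction = 1 / 2000     # one trial relative to daily intake
markowitz_sure_amount = 0.10  # the guaranteed 10 cents in Markowitz's example

# Per-trial monetary equivalent using the text's daily figure.
per_trial_text = daily_income_text * stake_fraction

# Same calculation assuming income is spread over 365 calendar days.
per_trial_calendar = annual_income / 365 * stake_fraction

print(f"per trial (text figure):  ${per_trial_text:.3f}")
print(f"per trial (365-day year): ${per_trial_calendar:.3f}")
print(per_trial_text < markowitz_sure_amount)
```

Either way of computing the daily figure leaves the per-trial stake below the 10-cent amount in Markowitz's example.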

Reward sizes in other animal studies run the gamut from small to large, and are of course difficult to equate to juice and money. For example, Abreu and Kacelnik (1999) gave starlings access to an average of 0.085 g of food crumbs on each trial, 0.4% of their average daily intake. This is four to eight times greater than the equivalent average monkey reward. In contrast, Kagel et al. (1986) gave rats approximately 0.2 ml of water per trial, which represents 4% of daily intake, an order of magnitude greater. Future studies using similar energy budget conditions and trial structures should vary reward amounts and values directly, allowing for direct assessment of how much variability in risk preferences can be attributed to reward size. Nevertheless, Craft et al. (2011) demonstrated that rats are more risk-seeking when reward quality is low, matching the pattern from the human literature.

## **REPEATED GAMBLES**

In a classic paper, Samuelson (1963) described a cafeteria meeting in which he offered his lunchtime companion a 50/50 gamble with two possible outcomes: winning \$100 or losing \$50. The possibly fictional colleague said he would reject the offer but would take it if it were repeated 100 times. Samuelson proceeded to mathematically prove the irrationality of this pair of preferences. Despite the attendant irrationality, the lure of serialization, which allows one to amortize one's losses, is psychologically strong. Even after its irrationality has been explained to them, many people persist in this set of preferences (Lopes, 1981). More generally, many researchers have found that serialization makes many types of gambles more attractive (Weaver, 1963; Lopes, 1981; Keren and Wagenaar, 1987; Wedell and Bockenholt, 1990; Hayden and Platt, 2009b). Preferences in repeated gambles often move toward risk neutrality (from risk-aversion for unique gambles), or even toward risk-seeking. In other cases, repeated gambles may elicit preferences that more closely match expected values (Keren and Wagenaar, 1987).
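The psychological pull of serialization is easier to appreciate with the numbers in hand. The sketch below computes, for the gamble in Samuelson's anecdote, the expected total and the exact binomial probability of ending with a net loss after a given number of independent plays. The function names are illustrative, and the calculation shows only why the repeated gamble looks attractive; it does not reproduce Samuelson's proof.

```python
from math import comb


def expected_total(n_plays, win=100, loss=50, p_win=0.5):
    """Expected net winnings after n independent plays of the gamble."""
    return n_plays * (p_win * win - (1 - p_win) * loss)


def prob_net_loss(n_plays, win=100, loss=50, p_win=0.5):
    """Exact probability that total winnings are negative after n plays."""
    total = 0.0
    for wins in range(n_plays + 1):
        net = wins * win - (n_plays - wins) * loss
        if net < 0:
            total += comb(n_plays, wins) * p_win**wins * (1 - p_win) ** (n_plays - wins)
    return total


for n in (1, 10, 100):
    print(n, expected_total(n), prob_net_loss(n))
```

A single play loses money half the time, whereas across 100 plays a net loss becomes very unlikely, which may be why the serialized offer feels so different from the one-shot offer.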

Interestingly, the frequency at which gambles are presented may impact preferences in a similar way, perhaps because frequency is a good proxy for the number of iterations. We tested this by measuring risk preferences in macaques in different blocks in which the delay between trials was controlled systematically (Hayden and Platt, 2007). When the inter-trial interval is lengthened from a few seconds to dozens of seconds, monkeys become significantly less risk-seeking (**Figure 3**). In our study, when the delay reached 90 s, the longest value tested, monkeys were risk-neutral. We speculate that delay affects the relative attention paid to the possibility of winning and losing, and that this change in attention affects preferences. Specifically, we developed a model in which monkeys estimated the expected time of the next large reward (and thus, in essence, differentially attended to winning) and then chose the option that would maximize the discounted value of the sequence leading to the large reward. (A control experiment confirmed that monkeys did not simply forget task details between trials). These results argue strongly for the importance of serial presentation of gambles on risk preferences, and more generally confirm the powerful importance of seemingly irrelevant details, like inter-trial interval, on gambling behaviors.

Another element of serial gambles that influences preferences is a strong bias toward adjusting behavior in response to recent outcomes (Barron and Erev, 2003; Hayden et al., 2008b). Gambling humans are susceptible to recency biases, including the hot-hand fallacy and the gambler's fallacy. The hot-hand fallacy is the irrational belief that wins typically follow wins (Gilovich et al., 1985). Monkeys appear to be susceptible to the hot-hand fallacy, a pattern known in primate gambling studies as the win-stay lose-shift bias (Barraclough et al., 2004; Lau and Glimcher, 2005; Hayden et al., 2008b). That is, following a winning outcome, monkeys are more likely to choose the risky option again than if they have just experienced a loss (**Figure 4**). (The gambler's fallacy, which has not been observed in monkeys, would have them predict that wins are "due" and thus more likely after a loss.) In addition, monkeys change their strategy based on surprisingness, meaning positive or negative deviation in outcome from expectation. Even though trials are independent, an unexpected outcome biases monkeys toward choosing the inferior option on the subsequent trial (Hayden et al., 2011). These changes in preference based on recent outcomes are large and robust; they can only be eliminated with extensive training (Lee et al., 2005). While this effect does not, by itself, explain risk-seeking, it demonstrates the importance of recent context, and not just present offer values, in governing preferences (see above and Haruvy et al., 2001; Hayden et al., 2008a).
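A trial-to-trial dependence of this kind is typically quantified by conditioning the current choice on the previous outcome. The following sketch shows one simple way to do so; the data format, labels, and example sequence are hypothetical and are not drawn from the cited studies.

```python
def risky_choice_rates(choices, outcomes):
    """Rate of choosing 'risky' on the next trial after a risky win or a risky loss.

    choices:  list of 'risky' or 'safe', one entry per trial
    outcomes: list of 'win', 'loss', or None (None for safe choices)
    """
    counts = {"win": [0, 0], "loss": [0, 0]}  # [risky on next trial, total]
    for prev_choice, prev_outcome, next_choice in zip(choices, outcomes, choices[1:]):
        if prev_choice == "risky" and prev_outcome in counts:
            counts[prev_outcome][1] += 1
            counts[prev_outcome][0] += next_choice == "risky"
    return {k: (hits / total if total else None) for k, (hits, total) in counts.items()}


# Hypothetical session of eight trials.
choices = ["risky", "risky", "risky", "safe", "risky", "risky", "safe", "risky"]
outcomes = ["win", "loss", "loss", None, "win", "win", None, "loss"]

print(risky_choice_rates(choices, outcomes))
```

A win-stay lose-shift pattern would appear as a higher rate of risky choices after wins than after losses.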

As with reward amounts, ITIs and numbers of trials differ across animal studies. Even so, on average, the number of trials seems to be lower than in the monkey gambling studies, and the ITIs seem to be considerably longer. In the same starling study mentioned above (Abreu and Kacelnik, 1999), subjects did 36 trials per session, two sessions per day. The ITI was variable, but averaged 57.5 s. In the rat study (Kagel et al., 1986), the ITI was 146 s and subjects completed 17–41 trials per day. These numbers contrast drastically with monkeys' hundreds to thousands of trials per day with ITIs of a few seconds. Clearly these are very different experimental environments, and may account for cross-species discrepancies.

## **FEEDBACK-BASED LEARNING PROMOTES RISK-SEEKING**

In most studies measuring risk attitudes in humans, the subject learns about probabilities and rewards through written (or spoken) description rather than through experience. It is assumed that results from these studies have predictive validity to contexts in which gamble parameters are learned through experience. However, studies using repeated gambles in which contingencies must be learned from experience elicit strikingly different preference patterns in humans. Ido Erev and colleagues have shown that when humans rely on feedback instead of descriptions to learn about outcomes, they can become risk-neutral or even risk-seeking in the gains domain (and risk-averse for losses; Barron and Erev, 2003). The reasons for this discrepancy are currently unresolved (and reviewed in detail in Hertwig and Erev, 2009). One possibility is that subjects use overly small sample sizes in estimating probabilities; another is that they overweight low probabilities, as in prospect theory. Additional possible explanations include the recency bias observed in estimates based on memory, and biased mental sampling. Regardless of the ultimate cause, the fact that decisions from experience produce systematic biases is well-established (Hertwig et al., 2004; Hertwig and Erev, 2009).
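The small-sample explanation can be illustrated with a one-line probability calculation: if an outcome occurs with probability p on each independent draw, the chance of never seeing it in n draws is (1 - p)^n. The sketch below evaluates this for a few hypothetical sample sizes; the probability of 0.1 is an arbitrary illustrative value.

```python
def prob_rare_outcome_unseen(p_rare, n_samples):
    """Probability that an outcome with probability p_rare never occurs in n draws."""
    return (1 - p_rare) ** n_samples


# Hypothetical rare outcome with a 10% chance per draw.
for n in (5, 10, 50):
    print(n, prob_rare_outcome_unseen(0.1, n))
```

With only a handful of samples, a decision-maker relying on experience will often have no record of the rare outcome at all, so an estimate built from those samples can differ sharply from the described probability.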

Another factor that applies to experienced gambles, but not described ones, is information-seeking. In volatile environments, decision-makers will mix two simple strategies: exploitation (selecting the option thought to provide a greater expected value) and exploration (selecting the more uncertain – and thus more informative – option; Daw et al., 2006; Pearson et al., 2009). Indeed, in a dynamic foraging environment, monkeys and humans both have strong propensities to explore, routinely sacrificing rewards for the possibility to try new, more uncertain options (Daw et al., 2006; Pearson et al., 2009). In completely stable environments like those used in laboratory gambling studies in rhesus monkeys, exploratory sampling is theoretically unnecessary (and in fact costly); nevertheless, some baseline level of exploration may be innate, or perhaps the decision-maker assumes some inherent variability despite evidence to the contrary. Confirming the strong drive toward curiosity, monkeys will pay a premium to have uncertainty resolved earlier in the trial (Bromberg-Martin and Hikosaka, 2009). This drive for information may bias monkeys toward a risk-seeking strategy because choosing the risky option provides greater information about the range of possible outcomes in the environment than the safe one.

## **CONCLUSIONS: VALIDITY AND APPLICABILITY TO OTHER CONTEXTS**

We have identified three major features that may promote risk-seeking in rhesus monkeys: small stakes, repeated gambles, and learning from feedback. Preferences for risk are strongly dependent on these task parameters, and all published studies of risk attitudes in rhesus macaques have in common the three elements mentioned above. Future studies should directly manipulate these elements in isolation, thus testing the hypothesis that they are responsible for promoting macaque risk-seeking. This is not an exhaustive list of the psychological influences on risk-seeking; other factors that influence risk attitudes in monkeys include social milieu (Watson et al., 2009), background context (So and Stuphorn, 2010), and mood (Long et al., 2009).

The common features of task design in these studies suggest that risk-seeking preferences observed in macaques may not be an innate trait of their species. Indeed, recent studies optimized for physiology in rats have found risk-seeking preferences as well (Roitman and Roitman, 2010). In a similar vein, we have shown that when humans gamble in a paradigm designed to mimic, as closely as possible, that experienced by monkeys, risk-aversion disappears, and humans approach risk-seeking – and also show the same types of trial-to-trial hot-hand-like effects as monkeys do (Hayden and Platt, 2009a). In this study, we asked our human study participants to sit alone in an anechoic chamber with a juice tube placed in their mouth delivering squirts of juice in response to individual decisions in a task in which all rules were learned through experience. (Indeed, it was the same task used with monkeys, with the exception that humans used a keyboard rather than eye movements to signal their decisions). We speculate that if the study participants had been exposed to the same task for weeks and weeks, as the monkeys are, they may have become risk-seeking. These results are consistent with the observation that when humans and animals are placed in similar conditions, they exhibit similar gambling preferences (Weber et al., 2004). Thus, although there may be a main effect of species on risk attitudes (Heilbronner et al., 2008), it is likely to be overwhelmed by contextual factors when experimental conditions are not carefully standardized.

How do we reconcile these results with the pronounced risk-aversion observed in other animal species? Certainly, most animals are risk-averse (Kacelnik and Bateson, 1996), although there are now many known exceptions (for a review, see Heilbronner et al., 2009). As we have noted here, paradigms developed for neurophysiological recordings differ substantially even from most other animal choice studies. For example, in those other studies ITIs are typically tens or hundreds of seconds rather than a few seconds, and animals may complete dozens of trials per day compared to hundreds or thousands. Handling times and reward values for seeds, pellets, and sucrose solution (Kacelnik and Bateson, 1996) may also differ drastically from those associated with drinking juice from a tube. Because most animal studies do find risk-aversion toward gains across a wide variety of methods, we should think of the conditions used in macaque gambling studies as somewhat extreme.

Broadly speaking, it is clear that attitudes toward risk are influenced by a large number of psychological factors, and that careful manipulation of these factors can push attitudes toward risk-aversion, risk-seeking, or neutrality. So far, the attributes of task design used to study risk preferences in rhesus monkeys bias them toward risk-seeking. These results highlight the importance of carefully considering the influence of task parameters when comparing across species, and if possible of using the same design elements. Of course, rhesus monkeys lack the ability to use language, and their conceptual representation of large numbers and explicit probabilities remains unclear. Given monkeys' lack of language, it is quite possible that there may be no fair primate analog of standard written risk tasks in humans, just as there is no primate analog of jokes, irony, word-naming, or any other product of language. Thus, it may be impossible to use animals to model certain aspects of risky decisions in humans. However, it is also clear that humans and monkeys have similar behavior in response to gambles in which parameters are learned, suggesting that monkeys may be a good model for specific types of decision-making under uncertainty. Clearly, an important future goal will be dissociating the differences between preference patterns for different types of uncertainty (Volz and Gigerenzer, 2012).

Understanding the patterns of preferences for risk among rhesus macaques is critical to neuroeconomists – scientists who use measures of brain activity to infer the computational mechanisms of incentive-based (i.e., economic) decisions. It is hoped that rhesus macaques will provide a viable model for human economic preferences. Although a cursory examination of human vs. macaque preferences would suggest that they are quite different (in fact, opposite), here we have argued that similar psychological factors influence both species. Thus, macaque models of decision-making may accurately reflect the many cognitive biases influencing human risk preferences. These results therefore highlight the fundamental importance of identifying and accounting for the psychological processes behind decisions.

#### **ACKNOWLEDGMENTS**

We thank Alli McCoy and Michael Platt for performing the initial studies, Michael Platt for years of generous support, John Pearson for helpful talks about these ideas, and Caleb Strait and Jay Kralik for comments on the manuscript. This research was supported by a fellowship from the Sloan Foundation and by NIH R00 DA027718 (Benjamin Y. Hayden).

## **REFERENCES**

*Behav. Ecol. Sociobiol. (Print)* 8, 213–217.


theories of decision-making. *Trends Cogn. Sci. (Regul. Ed.)* 1, 304–309.


L. (2009). Neurons in posterior cingulate cortex signal exploratory decisions in a dynamic multioption choice task. *Curr. Biol.* 19, 1532–1537.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 September 2012; paper pending published: 17 November 2012; accepted: 10 January 2013; published online: 01 February 2013.*

*Citation: Heilbronner SR and Hayden BY (2013) Contextual factors explain risk-seeking preferences in rhesus monkeys. Front. Neurosci. 7:7. doi: 10.3389/fnins.2013.00007*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2013 Heilbronner and Hayden. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Social anxiety modulates risk sensitivity through activity in the anterior insula

## *Grace S. Tang<sup>1</sup>\*, Wouter van den Bos<sup>1</sup>, Eduardo B. Andrade<sup>2</sup> and Samuel M. McClure<sup>1</sup>*

<sup>1</sup> Department of Psychology, Stanford University, Stanford, CA, USA

<sup>2</sup> Haas School of Business, University of California, Berkeley, CA, USA

#### *Edited by:*

Kerstin Preuschoff, University of Zurich, Switzerland

#### *Reviewed by:*

Martin P. Paulus, University of California San Diego, USA

Luke Clark, University of Cambridge, UK

#### *\*Correspondence:*

Grace S. Tang, Department of Psychology, Stanford University, 450 Serra Mall, Building 420, Stanford, CA 94305, USA. e-mail: gstang@stanford.edu

Decision neuroscience offers the potential for decomposing differences in behavior across individuals into components of valuation intimately tied to brain function. One application of this approach lies in novel conceptualizations of behavioral attributes that are aberrant in psychiatric disorders. We investigated the relationship between social anxiety and behavior in a novel socially determined risk task. Behaviorally, higher scores on a social phobia inventory (SPIN) among healthy participants were associated with an increase in risky responses. Furthermore, activity in a region of the dorsal anterior insula (dAI) scaled in proportion to SPIN score in risky versus non-risky choices. This region of the insula was functionally connected to areas in the intraparietal sulcus and anterior cingulate cortex that were related to decision-making across all participants. Overall, social anxiety was associated with decreased risk aversion in our task, consistent with previous results investigating risk taking in many everyday behaviors. Moreover, this difference was linked to the anterior insula, a region commonly implicated in risk attitudes and socio-emotional processes.

**Keywords: social anxiety, risk, SPIN, anterior insula, intraparietal sulcus, anterior cingulate cortex**

## **INTRODUCTION**

Among anxiety disorders, social anxiety has the highest lifetime prevalence in the U.S. population (Kessler et al., 1994). It is defined by fear and avoidance of a range of social situations, and negative physiological reactions during these encounters (Connor et al., 2000). Socially anxious individuals are typically stereotyped as withdrawn in social contexts, leading to less trusting and more risk-averse behaviors overall (Erwin et al., 2003; Kashdan and McKnight, 2010). There is some evidence of an overall difference in risk preferences with social anxiety; specifically, high social anxiety predicts greater risk aversion in the Balloon Analog Risk Task (BART; Maner et al., 2007). However, recent findings and theoretical arguments suggest that social anxiety may promote risk seeking in some circumstances. It has been shown that when expecting a positive outcome, social phobics have higher risk preferences than less socially anxious controls (Kashdan et al., 2006). Subsets of social phobics have also been found to engage in risky behaviors such as alcohol and drug use as emotion regulation strategies to protect against anxiety responses in social scenarios (Kashdan and McKnight, 2010). Similar compensatory behaviors may arise more generally, with increased aggression (or anger) expressed to protect against the consequences of anticipated withdrawal in social interactions (e.g., bargaining). Indeed, increased expression of anger has been found in individuals with high social anxiety (Erwin et al., 2003). Anger, in turn, has been associated in other work with increased risk-seeking choices (Lerner and Keltner, 2001). Overall, prior work indicates that risk taking may either increase or decrease as a function of social anxiety, particularly in anticipation of social interactions.

Given the complexity of the phenomenon, we aimed to study how social anxiety correlates with risk attitudes in a simplified social context. To do so, we employed an adapted version of a two-player two-stage response game (**Figure 1**, cf. Charness and Rabin, 2002; Charness and Rabin, 2005; Kosfeld et al., 2005; Krueger et al., 2007). The task was structured such that the target player decided between a risky and a safe option, in which the likelihood of greater reward depended on the anticipated beneficence of other players. We aimed to have the probability of different outcomes depend only on social factors while otherwise minimizing the impact of other players' attitudes toward monetary gains. We expected that this feature of the task would maximize behavioral differences that depend on social anxiety, allowing for a direct assessment of the relationship between social anxiety and social risk taking in a simplified task. To enhance the emotional effects, we furthermore added subliminal social primes in the form of backward-masked fearful and happy faces. As a secondary aim, we examined the effect of fearful versus happy social primes.

As indicated above, two hypotheses about how risk aversion may differ with social anxiety are suggested by the literature. We hoped to differentiate between these using measures of behavior and brain activity. With regard to brain activity, decision neuroscience has linked numerous brain areas to specific aspects of valuation in studies of risk and ambiguity. If social anxiety is associated with overall differences in risk aversion, then neuroimaging may reveal differences in neural structures associated with assessment and integration of the incentives in a risky choice. Numerous brain areas have been associated with these processes, most prominently areas in the prefrontal and posterior parietal cortex involved in cognitive and executive functions (Hsu et al., 2005; Huettel et al., 2005, 2006; Brand et al., 2007; Rangel et al., 2008). Conversely, if differences in risk aversion depend on emotional responses triggered by social context (cf. Kashdan et al., 2006; Kashdan and McKnight, 2010), then a different pattern of brain responses may emerge. Regions in the anterior insula have been associated with the emotional responses triggered by the anxiety associated with potential losses (Kuhnen and Knutson, 2005; Mohr et al., 2010). Intriguingly, social anxiety has been linked to hyperactivity in the anterior insula in other work (Etkin and Wager, 2007). Building on this, we aimed to use neuroimaging to differentiate between the conflicting hypotheses about how risk attitudes are associated with social anxiety and to explore the patterns of neural activation that may give rise to these behavioral differences.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Twenty healthy males (ages 19–46 years, *M* = 25.0, SD = 6.8, one declined to specify) were recruited from the community surrounding Stanford University to participate in the study. To prevent knowledge of the purpose of the study, no mention of social anxiety was made during recruitment. We restricted recruitment to males to reduce variability in emotional responses to the emotional face primes used in the study (cf. Whalen et al., 1998). Two participants who chose exclusively safe or exclusively risky options were excluded from analysis, leaving a total of 18 participants in the final data set (ages 19–46 years, *M* = 25.2, SD = 7.0). The study was approved by Stanford University's Institutional Review Board, and all participants gave informed consent. Participants received \$30 for participation in the 90-min experiment. Additionally, they were paid the outcome of one trial chosen at random from all choices made during the experiment.

Upon arrival, participants signed a consent form and completed a magnetic resonance screening form. They were presented with task instructions and two practice trials of the decision-making task on a laptop computer prior to entering the scanner.

### **SOCIAL RISK TASK**

The scanner session consisted of four task blocks of 16 trials each. The overall structure of the task is depicted in **Figure 1A**. Each trial contained a backward-masked face prime followed by a risky choice. The inter-trial interval was random, ranging from 5 to 9 s, during which a white fixation cross was shown. The fixation cross turned green 1 s before the onset of the face prime to signal that a trial was about to begin. A random inter-stimulus interval of 3–6 s separated the face prime and the decision period.

Participants were instructed that they would be performing two tasks. The first task was a foil used to ensure that participants paid attention to the presentation of face stimuli that were otherwise described as irrelevant. For this first task, participants were simply told to attend to the faces displayed in order to perform a recognition task at the end of the experiment. Face primes were fearful or happy expressions of eight individuals (eight fearful and eight happy primes per block; Ekman and Friesen, 1976). Backward-masked emotional faces were presented with a display time of 33 ms, mimicking the subliminal presentation of the same stimuli by others (Whalen et al., 1998). Each face presentation was immediately masked by a neutral face image (of the same individual as the emotional face) for 200 ms. All images were in grayscale. We used this procedure to better elicit socially relevant emotions without consciously prompting participants to alter their decision-making strategy.

The second task was a two-player (Player A and B) decision task schematized in **Figure 1B**. Participants were instructed that they were to complete multiple one-shot iterations of a task with other anonymous players, whose responses were collected beforehand. They were further instructed that the other players in this task were unrelated to the faces shown for the memory task. Each round of the task proceeded as follows: for each choice, the amount the other player (Player B) received was fixed. To determine how much the participant (Player A) would receive, a choice was made between two options. One option guaranteed that both players received the same amount of money ("safe" option; payment of *V*<sub>0</sub> to both players). The other option allowed Player B to determine whether the participant received more (amount *V*<sub>max</sub> > *V*<sub>0</sub>) or less (*V*<sub>min</sub> < *V*<sub>0</sub>) money than the second player ("risky" option). For example, consider the choice depicted in **Figure 1**. For the safe option, the participant was guaranteed a payoff of \$4. For the risky option, the participant may earn \$1 or \$10 depending on the action of Player B. Participants were told that since the other players changed on every trial, and only one random trial would be selected for payment, they should treat each trial as independent, and as if it were the only trial presented to them. The side of the screen on which the safe and risky options appeared was randomized across trials.

Player B choices were collected before the experiment by polling an independent random sample from the Stanford community. Participants (Players A) were informed that Player B responses were collected beforehand from real respondents, but were not instructed about how to assess the likelihood of Player B selecting either the higher or lower payment. They were also not provided feedback about the other players' choices during the session. We did this to allow socially relevant emotions from the face presentation to better carry over to risky decision-making.

The range of values for the safe option (*V*<sub>0</sub>) was \$3 to \$8, the smaller value for the risky option (*V*<sub>min</sub>) was \$1 to \$7, and the larger values (*V*<sub>max</sub>) ranged from \$5 to \$14. The task included a total of 64 trials. In every trial the safe value was intermediate between the large and small risky values (*V*<sub>min</sub> < *V*<sub>0</sub> < *V*<sub>max</sub>). A ratio term was computed for every trial with the following formula:

$$\text{Ratio} = \frac{\text{Potential Gain}}{\text{Potential Loss}} = \frac{V_{\text{max}} - V_0}{V_0 - V_{\text{min}}} \tag{1}$$

The ratio term measures the potential gain relative to the potential loss of allowing the opponent to choose the outcome of the trial, and was the best predictor of choice outcomes in our experiment. Eight sets of eight trials were created, such that each set contained the same distribution of ratios, ranging from 1 to 3. The same sets of values were used for all participants. Two of these sets were randomly assigned to each block, one set for the fearful trials, and the other for the happy trials.
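As a worked illustration of Eq. 1, the following minimal Python sketch computes the ratio for the example trial described in the task description (safe payoff of \$4, risky payoffs of \$1 or \$10). The function name `gain_loss_ratio` is hypothetical and is not part of the original study materials.

```python
def gain_loss_ratio(v_min: float, v_0: float, v_max: float) -> float:
    """Ratio of potential gain to potential loss for one trial (Eq. 1)."""
    # The task always placed the safe value strictly between the risky values.
    if not (v_min < v_0 < v_max):
        raise ValueError("expected v_min < v_0 < v_max")
    return (v_max - v_0) / (v_0 - v_min)


# Example trial from the text: safe = $4, risky = $1 or $10.
example_ratio = gain_loss_ratio(v_min=1, v_0=4, v_max=10)
print(example_ratio)  # (10 - 4) / (4 - 1) = 2.0
```

A ratio of 2 means that choosing the risky option puts twice as much at stake on the upside as on the downside, relative to the safe payoff.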

#### **fMRI ACQUISITION AND ANALYSIS**

Functional images were acquired with a 3-T General Electric Discovery scanner (Waukesha, WI, USA). T2\*-sensitive gradient echo spiral in/out pulse sequences (Glover and Lai, 1998; Glover and Law, 2001) were used for functional imaging (33 oblique axial slices parallel to the AC–PC line, slice thickness = 4 mm, no gap, TR = 2000 ms, TE = 30 ms, TE2 = 30.5 ms, flip angle = 77°, FOV = 20 cm, 64 × 64, ascending sequential). Spiral in/out methods have been shown to reduce signal loss in regions compromised by susceptibility-induced field gradients generated near air-tissue interfaces such as ventral PFC and striatum (Glover and Law, 2001; Preston et al., 2004). High-resolution T2-weighted fast spin-echo structural images (BRAVO) were acquired for anatomical reference (TR = 8.2 ms, TE = 3.2 ms, flip angle = 12°, slice thickness = 1.0 mm, FOV = 24 cm, 256 × 256).

The imaging data were preprocessed and analyzed with SPM8 (Wellcome Department of Imaging Neuroscience, University of London). Preprocessing comprised slice-timing correction, realignment to the first image for motion correction, coregistration, normalization to a Montreal Neurological Institute (MNI) template image, and spatial smoothing with an 8-mm full-width half-maximum Gaussian kernel.

Our main analyses were performed using whole-brain general linear model (GLM) analyses. Events of interest are described for individual analyses in Section "Results." In all analyses we included a set of regressors to account for potentially confounding effects. Specifically, to account for variability in response times, we modeled the decision period using a boxcar with duration from the onset of the decision to the time of choice submission. We also included regressors for head movement during the experiment (estimated from realignment). Regressors of interest were convolved with a canonical hemodynamic response function.

AlphaSim (Ward, 2000) was used to calculate the appropriate cluster size for a corrected significance threshold of *p* < 0.05 (1000 Monte Carlo simulations). A minimum cluster size of 45 was required with a voxel-wise threshold of *p* < 0.005 given the smoothness of our preprocessed data.

#### **POST-SCAN QUESTIONNAIRES**

After completing the scanner session, participants completed the 17-item Social Phobia Inventory (SPIN; Connor et al., 2000) and the 20-item trait version of the Spielberger State-Trait Anxiety Inventory (STAI-trait; Spielberger, 1983). These questionnaires were administered at the end of the session to reduce subject knowledge of our hypotheses. We included the STAI-trait as a control measure for overall (non-social) anxiety. STAI-trait scores did not correlate with any of the behavioral or neural indices discussed below. For succinctness, we therefore omit further discussion of this variable.

## **RESULTS**

#### **BEHAVIORAL RESULTS**

On average, participants chose the risky option 57.8% of the time, but there were large individual differences in the proportion of risky choices made (SD = 17.6%, range = 25–84.4%). Mean reaction time was 5.0 s (mean reaction times across subjects: SD = 1.6 s, range = 2.7–7.4 s). There was no significant difference in reaction time for risky versus safe choices (*p* > 0.9). Furthermore, there was no significant relationship between average reaction time per subject and proportion of risky choices made (*r* = −0.0136, *p* > 0.9).

There was no significant effect of the emotion of the face prime on choice (fearful versus happy; *p* > 0.8 on β coefficient in logistic regression, see below). Because the specific emotion of the face prime had no significant effect on our observed results, for all the analyses that follow we averaged over fear and happy face primes, as has been done by others (Casey et al., 2011), to produce an aggregate measure of the effect of social emotions.

On the SPIN, participants scored an average of 19.0 out of a maximum of 68 (SD = 10.6, range = 1–46). Scores of 20 and above are considered clinically relevant. Our population therefore spanned a range from no significant social anxiety to (in one case) severe social anxiety. In subsequent analyses we analyzed SPIN scores as a continuous variable. However, for illustration purposes we split participants into low and high SPIN (e.g., **Figure 2**) around the median SPIN value of 18. Serendipitously, this median split approximately corresponds to clinically relevant and irrelevant SPIN scores.

To investigate the effects of social anxiety and ratio (see Eq. 1) on risky decisions, a linear mixed model was used with choice as the dependent variable, ratio, emotion prime, and SPIN as fixed effects and subject as a random effects variable. Variables were mean centered. As expected, the effect of ratio was highly significant: the higher the ratio, the more likely the risky choice was selected (*p* < 0.001). A main effect of social anxiety was also found: the probability that a risky option was chosen correlated positively with SPIN scores (*p* < 0.04). There was a significant ratio × SPIN interaction as well (*p* < 0.002). Higher SPIN scores were associated with higher differential sensitivity to ratio (**Figure 2**).
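The mean centering and the ratio × SPIN interaction term described above can be sketched as follows. This is only an illustration of how such predictors are typically constructed, not the authors' analysis code: the helper `mean_center` is hypothetical, the values are made up rather than taken from the study, and fitting the mixed model itself (with subject as a random effect) is left to a statistics package.

```python
from statistics import mean


def mean_center(values):
    """Subtract the sample mean so the predictor has mean zero."""
    m = mean(values)
    return [v - m for v in values]


# Made-up values for four trials (NOT data from the study).
ratio = [1.0, 2.0, 3.0, 2.0]     # gain/loss ratio of each trial (Eq. 1)
spin = [10.0, 10.0, 30.0, 30.0]  # SPIN score of the participant on each trial

ratio_c = mean_center(ratio)
spin_c = mean_center(spin)

# The interaction predictor is the product of the centered main effects.
interaction = [r * s for r, s in zip(ratio_c, spin_c)]
print(interaction)
```

Centering before forming the product keeps the main-effect coefficients interpretable as effects at the average value of the other predictor.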

#### **NEUROIMAGING RESULTS**

#### *Correlations between behavior and neural activation*

We performed two analyses to relate brain activity to choices. We began by constructing a GLM with five regressors of interest, one regressor each for fearful and happy emotion primes, and one for the decision period, with ratio and choice (1 for risky, −1 for safe) included as parametric modulators of decision-related activity. Using a fearful–happy contrast, no significant effects of emotion of face prime were found at our significance criterion anywhere in the brain. The ratio and choice regressors were separately used to identify candidate brain areas that govern evaluation of risk in our task. No main effects of choice were found at our significance threshold. However, brain areas that correlated with ratio include a number of areas that have been associated with risk assessment and decision-making in other work (Rushworth et al., 2004; Huettel et al., 2005, 2006; Rangel et al., 2008; Hare et al., 2009). We found significant effects in the supplementary motor area (SMA), anterior cingulate cortex (ACC), bilateral intraparietal sulcus (IPS), bilateral inferior frontal gyrus (IFG), and bilaterally in the ventral anterior insula (vAI; **Figure 3**; **Table 1**).

Because potential gain/loss ratio was found to be a strong predictor of choice in our behavioral analyses, we conducted an ROI analysis to further explore how the areas associated with ratio related to choice. ROIs were created using 6 mm radius spherical masks around the peak voxels from the following areas (coordinates are reported in MNI space): SMA (−2, 38, 52), ACC (2, 36, 34), bilateral IPS (46, −56, 50 and −44, −56, 50), bilateral vAI (40, 16, −6 and −44, 16, −6), bilateral IFG (31, 59, 14 and −38, 56, 8).
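A spherical ROI of this kind amounts to a distance test in MNI millimeter space. The sketch below illustrates the idea for the ACC peak reported above; the function `in_sphere` is hypothetical, and treating points exactly on the 6 mm boundary as inside the mask is an assumption, since the source does not specify this.

```python
import math


def in_sphere(point, center, radius_mm=6.0):
    """True if `point` lies within `radius_mm` of `center` (both in MNI mm)."""
    return math.dist(point, center) <= radius_mm


acc_peak = (2, 36, 34)  # ACC peak coordinate reported in the text

print(in_sphere((2, 36, 40), acc_peak))  # 6 mm from the peak -> True
print(in_sphere((8, 36, 40), acc_peak))  # about 8.5 mm from the peak -> False
```

In practice the test would be applied to the millimeter coordinates of every voxel in the normalized image to build the binary mask.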

For each of these ROIs, the correlation between mean activity as a function of choice and the percentage of safe choices made was calculated across participants. Specifically, we hypothesized that if these brain areas are associated with decision-making, then differences in activity as a function of choice should predict individual propensities for selecting risky/safe alternatives. Results from this analysis are shown in **Table 2**. Using a threshold α of 0.0063, based on Bonferroni correction for multiple comparisons (*p* = 0.05/8 ROIs, SMA, ACC, and bilateral IFG, IPS, and vAI), significant negative correlations were found between activity in the left and right IPS and the proportion of safe choices made. Marginally significant results were found in each of the other ROIs except for the left vAI, ACC, and IFG. Based on these findings, we conclude that a number of areas are associated with evaluation of risky options in our task. However, the IPS (bilaterally) appears to play a particularly important role in governing individual differences in behavior (cf. Mohr et al., 2010).
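The Bonferroni threshold quoted above follows directly from dividing the nominal α by the number of ROIs. A minimal sketch, in which the list of ROI labels and the helper `is_significant` are purely illustrative:

```python
alpha = 0.05
rois = [
    "SMA", "ACC",
    "left IFG", "right IFG",
    "left IPS", "right IPS",
    "left vAI", "right vAI",
]

# Bonferroni correction: divide alpha by the number of comparisons.
threshold = alpha / len(rois)
print(threshold)  # 0.00625, reported in the text rounded to 0.0063


def is_significant(p_value: float) -> bool:
    """Compare an uncorrected p-value against the corrected threshold."""
    return p_value < threshold
```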

## *Neural responses associated with effects of social anxiety on decision-making*

To examine the effects of social anxiety, analyses were repeated with individual SPIN scores as a covariate. This allowed us to determine the correlation between social anxiety and BOLD signal change. A negative correlation was observed between SPIN score and BOLD activity for the choice (risky–safe) regressor in the left dorsal anterior insula (dAI; **Figure 4A**; **Table 3**). This region of the insula was distinct from that found in the analysis above, occupying a more dorsal/medial position (peak voxel at −28, 22, 2).

Interestingly, choice-dependent differences in dAI activity were of opposite sign for those participants with above and below clinically relevant SPIN scores. To illustrate this, **Figure 4B** shows mean dAI activity for each subject in a 6-mm sphere surrounding the peak dAI voxel identified in **Figure 4A**. Those participants with low social anxiety showed greater dAI activity in risky versus safe choices, consistent with previous reports (e.g., Kuhnen and Knutson, 2005; Mohr et al., 2010). The opposite finding held in participants with high SPIN scores. Specifically, choices for safe options were associated with greater dAI activity than choices made for risky options. *A priori*, we would have expected emotion regulation in high SPIN individuals to give reduced dAI activity, not that it would change the sign of the effect. Nonetheless, our critical hypothesis was confirmed: those participants with high SPIN scores show less activity in brain areas associated with emotions that lead to choice of safe outcomes.

## *Interaction between dAI region associated with social anxiety and brain areas responsible for evaluation of risk*

We have identified the dAI as mediating the effects of social anxiety on behavior in our task. We have also identified a number of other areas as governing the effect of risk on choice. In this final analysis we determine how the dAI interacts with the regions associated with decision-making using functional connectivity analyses.

#### **Table 1 | Correlation between activity during the choice period and the ratio of potential gains to losses.**

MNI coordinates; p < 0.05 corrected (p < 0.005, k > 45).

#### **Table 2 | Correlation between** *p* **(choose safe option) and activation with safe–risky choice contrast.**

Bold: p < 0.05; \*p < 0.05 after Bonferroni correction.

… individuals had higher activation in the dAI for risky versus safe choices, while the opposite was true in high social phobic individuals. See **Table 3**.

A psychophysiological interaction (PPI) analysis was conducted to find areas that show stronger functional connectivity with the dAI during the decision periods of the task for risky versus safe choices. Using the peak voxel within the dAI as a seed region, raw time courses were extracted, *z*-normalized, corrected for linear drift, and used as regressors in a separate GLM analysis. In order to examine the task contrast of interest (risky versus safe choices), the time course values for six TRs after each onset of the decision period were multiplied by 1 for risky choices and −1 for safe. Six TRs (12 s) was the maximum time period possible for this design (to prevent interference from subsequent trials), and the full 12 s was included in our model so that an adequate signal could be obtained for PPI analysis (cf. Cohen et al., 2005; Park et al., 2010; van den Bos et al., 2011). Six motion regressors were included as regressors of no interest. Significant correlations were found in the ACC and the IPS for this model (**Figure 5A**; **Table 4**). These areas overlap with the regions that were found to be significantly correlated with ratio (the conjunction of results from the two analyses is shown in **Figure 5B**). This analysis was also carried out with the vAI as seed region. No significant connectivity was found with any other regions of the brain. Based on these findings, it seems that social anxiety influences risk preferences through interactions between the dAI and valuation processes in the ACC and IPS.
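The construction of the PPI regressor described in this section can be sketched roughly as follows. This is an illustrative reading of the procedure, not the authors' code. It makes several assumptions: the six-TR window starts at the onset TR itself, values outside decision windows are set to zero, and linear detrending is omitted for brevity. The function names and the toy time course are hypothetical.

```python
from statistics import mean, pstdev


def zscore(series):
    """z-normalize a time course (assumes the series is not constant)."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]


def ppi_regressor(seed, onsets, choices, window=6):
    """Build a PPI regressor from a seed time course.

    seed:    seed-region signal, one value per TR
    onsets:  TR index at which each decision period begins
    choices: +1 for a risky choice, -1 for a safe choice, per trial
    window:  number of TRs weighted after each onset (six in the text)
    """
    z = zscore(seed)
    ppi = [0.0] * len(z)  # assumption: zero outside decision windows
    for onset, sign in zip(onsets, choices):
        for t in range(onset, min(onset + window, len(z))):
            ppi[t] = sign * z[t]
    return ppi


# Toy example: 20 TRs, one risky trial at TR 0 and one safe trial at TR 10.
seed = [float(i % 5) for i in range(20)]  # made-up signal, not real data
ppi = ppi_regressor(seed, onsets=[0, 10], choices=[+1, -1])
```

The resulting regressor carries the seed signal with its sign flipped on safe trials, so a positive coefficient in a target region indicates stronger coupling with the seed during risky than during safe choices.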

## **DISCUSSION**

Contrary to the common portrayal of social phobics as risk-averse and distrusting, the current study showed that the preference for risk on a social task correlated positively with social anxiety scores. As mentioned in the introduction, people with social anxiety have been found to employ risky behaviors as an emotion regulation strategy, especially when they expect positive outcomes to arise from these risks (Kashdan et al., 2006; Kashdan and McKnight, 2010). It is possible that in the modified version of the response game, in which the second player's choices do not affect their own payoffs, participants might have expected that the other player would grant the larger sum. This expectation of a positive outcome may underlie the increased propensity of those with higher social anxiety scores to utilize risk as a compensatory strategy.

This rationale may also explain the discrepancies between the findings in this paper and those obtained using the BART as a measure of risk sensitivity. Maner et al. (2007) showed that social phobics were more risk-averse than controls using BART. One hypothesized cause for risky behavior in socially anxious individuals is that it compensates for anticipated anxiety in social situations. BART is strictly a single player game and, therefore, the need for compensatory emotion regulation does not arise. However, our task involves a second player and mimics a two-person interaction. These attributes may trigger socially anxious individuals to anticipate anxiety and compensate by increasing risk seeking. More generally, this social feature may highlight domains in which social anxiety is associated with greater or lesser risky behavior in daily life.

This is the first study using the task we employed. It is therefore relevant to note that the brain areas we identified as important for governing choices correspond well with areas associated with risky decision-making in other work. The IPS correlates with individual differences in risk attitudes in simple gambles (Huettel et al., 2006). Similarly, the ACC and SMA are involved in a number of decision-making tasks, including those that involve socially determined uncertainty (e.g., Sanfey et al., 2003; Baumgartner et al., 2008; van den Bos et al., 2009). Likewise, ventral regions of the AI correlate with perceptions of risk and drive choice outcomes accordingly (Sanfey et al., 2003; Kuhnen and Knutson, 2005; Bossaerts, 2010).

#### **Table 3 | Correlation between activation based on choice (risky–safe) and SPIN score.**

MNI coordinates; p < 0.05 corrected (p < 0.005, k > 45).

The anterior insula is increasingly being appreciated for its importance in numerous cognitive functions (Kurth et al., 2010; Deen et al., 2011). Ventral parts of the anterior insula have commonly been associated with perception of risk, as noted above (Bossaerts, 2010). The region of dAI that we find to be related to social anxiety lies at the intersection of regions associated with socio-emotional processes and cognitive processing in a recent large meta-analysis (Kurth et al., 2010). Our experiment was motivated by the hypothesis that assessment of socially determined risk would differentially trigger compensatory risk-seeking behavior as a function of social phobia. Relating this region of the dAI to social anxiety in our task therefore makes conceptual sense. Moreover, a recent study related hyperactivity of the same region of dAI to clinical presentation of social anxiety (Etkin and Wager, 2007).

Our analyses showed that the dAI is functionally connected to the IPS and ACC in a manner consistent with a role in biasing choice. At least in social contexts, even as simply approximated by our task, we conclude that the anterior insula is a region tied to clinically relevant behavior (risk seeking). This provides a new conceptual framework, rooted in cognitive neuroscience, for understanding aspects of the behavioral differences that manifest clinically as social anxiety.

Due to the small sample size of this study, we recruited only male participants to reduce sample variance. Furthermore, male participants were recruited because social phobia symptoms and risk taking behavior have been found to vary over the menstrual cycle (Chavanne and Gallup, 1998; Voelker, 1998; Bröder and Hohmann, 2003). Future studies including women are necessary in order to generalize the findings to both genders.

Advances in understanding the computational and brain bases of behavior have enabled a recent spurt of neurobiological accounts of various psychiatric disorders. For example, Maia and Frank (2011) link aspects of Parkinson's disease, Tourette's syndrome, ADHD, addiction and schizophrenia to specific functional deficits in cortico-basal ganglia circuitry. Some mood disorders have also been addressed. Differences in anterior insula activity associated with borderline personality disorder predict behavioral outcomes in a two-person trust game (King-Casas et al., 2008). Likewise, depression has been associated with specific computational deficits tied to serotonin function (Dayan and Huys, 2008). In non-clinical populations, behavioral preferences and associated brain activity also appear to depend on individual differences in personality traits; for example, in decisions involving risk, insula activity was found to correlate with harm avoidance and neuroticism, while activity in the temporal parietal junction, anterior insula, and ACC correlated with social value orientation (Paulus et al., 2003; van den Bos et al., 2009). Our results contribute to this growing literature in the domain of social anxiety.

## **ACKNOWLEDGMENTS**

This work was supported by a NeuroVentures pilot grant through Stanford University. We thank Bob Dougherty and Atsushi Takahashi for invaluable help at practically every stage of data collection and analysis. Chan Jean Lee made important contributions during the development of the task. Finally, we thank Kent Blake for help with data collection.

## **REFERENCES**

rating scale. *Br. J. Psychiatry* 176, 379–386.

risk and ambiguity. *Neuron* 49, 765–775.

E., and Schmidt, N. B. (2007). Dispositional anxiety and risk-avoidant decision-making. *Pers. Individ. Dif.* 42, 665–675.

motivates repayment? Neural correlates of reciprocity in the Trust Game. *Soc. Cogn. Affect. Neurosci.* 4, 294–304.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2011; accepted: 09 December 2011; published online: 03 January 2012.*

*Citation: Tang GS, van den Bos W, Andrade EB and McClure SM (2012) Social anxiety modulates risk sensitivity through activity in the anterior insula. Front. Neurosci. 5:142. doi: 10.3389/fnins.2011.00142*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Tang, van den Bos, Andrade and McClure. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Dissociable neural processes underlying risky decisions for self versus other

#### **Daehyun Jung<sup>1</sup>, Sunhae Sul<sup>2</sup> and Hackjin Kim<sup>3</sup>\***

<sup>1</sup> Laboratory of Social and Decision Neuroscience, Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea

<sup>2</sup> Laboratory of Social and Decision Neuroscience, Wisdom Science Center, Korea University, Seoul, South Korea

<sup>3</sup> Laboratory of Social and Decision Neuroscience, Department of Psychology, Korea University, Seoul, South Korea

#### **Edited by:**

Ming Hsu, University of California, USA

#### **Reviewed by:**

Mathieu D'Acremont, California Institute of Technology, USA

Songfa Zhong, National University of Singapore, Singapore

#### **\*Correspondence:**

Hackjin Kim, Department of Psychology, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 136-701, South Korea. e-mail: hackjinkim@korea.ac.kr

Previous neuroimaging studies on decision making have mainly focused on decisions on behalf of oneself. Considering that people often make decisions on behalf of others, it is intriguing that there is little neurobiological evidence on how decisions for others differ from those for oneself. The present study directly compared risky decisions for self with those for another person using functional magnetic resonance imaging (fMRI). Participants were asked to perform a gambling task on behalf of themselves (decision-for-self condition) or another person (decision-for-other condition) while in the scanner. Their task was to choose between a low-risk option (i.e., win or lose 10 points) and a high-risk option (i.e., win or lose 90 points) with variable levels of winning probability. Compared with choices regarding others, those regarding oneself were more risk-averse at lower winning probabilities and more risk-seeking at higher winning probabilities, perhaps due to stronger affective processes during risky decisions for oneself compared with those for others. The brain-activation pattern changed according to the target, such that reward-related regions were more active in the decision-for-self condition than in the decision-for-other condition, whereas brain regions related to the theory of mind (ToM) showed greater activation in the decision-for-other condition than in the decision-for-self condition. Parametric modulation analysis using individual decision models revealed that activation of the amygdala and the dorsomedial prefrontal cortex (DMPFC) was associated with value computations for oneself and for another, respectively, during risky financial decisions. The results of the present study suggest that decisions for oneself and for others may recruit fundamentally distinct neural processes, which can be mainly characterized as dominant affective/impulsive and cognitive/regulatory processes, respectively.

**Keywords: fMRI, self–other decision, amygdala, dorsomedial prefrontal cortex, risky decision, prosocial behavior, social neuroscience**

## **INTRODUCTION**

In daily life, we make decisions on behalf of others as often as we make decisions on behalf of ourselves: we sometimes order a lunch for a friend, choose presents for family, make decisions for a company, or buy products or stocks for customers. These other-regarding decisions, albeit not immediately targeted toward ourselves, can be critical to the establishment and maintenance of our social lives. Like decisions for oneself, decisions for others ranging from mundane to profound also involve some level of risk. Thus, it is important to understand the mental processing that drives risky decisions for others as well as those for oneself. Despite the significance of this issue, few neuroimaging studies have directly compared decisions for oneself with those for others, and only a small body of literature on the subject exists in the field of social psychology. Thus, the goal of the present study is to understand whether and how decisions (i.e., a risky decision in a gambling task) for oneself and for others differ from each other at the neural level through the use of functional magnetic resonance imaging (fMRI).

An emerging body of literature on self–other decision making has documented risky decisions in various domains, such as surrogate decisions in medicine (Hare et al., 1992; Fagerlin et al., 2001; Lipkus et al., 2001), public policy (Roszkowski and Snelbecker, 1990; Reynolds et al., 2009), career choice (Kray and Gonzalez, 1999), romantic relationships (Beisswanger et al., 2003; Wray and Stone, 2005), and financial decisions in gambling tasks (Hsee and Weber, 1997; Loewenstein et al., 2001; Stone et al., 2002; Fernandez-Duque and Wifall, 2007). Although some progress has been made, the findings of these studies have been rather inconsistent. For instance, some studies have reported that people behaved/thought in a more risk-seeking manner when they decided for another person than for themselves (Hsee and Weber, 1997; Beisswanger et al., 2003), whereas others found that people became more risk-averse in similar situations (Fernandez-Duque and Wifall, 2007).

In order to reconcile the conflicting findings listed above, recent studies have considered the potential mediating factors of these observations (Fernandez-Duque and Wifall, 2007; Stone and Allgaier, 2008). For example, Fernandez-Duque and Wifall (2007) examined actor-observer asymmetry in risky decisions and proposed that the self–other discrepancy could be mediated by differential access to experiential and rational decision making systems. They suggested that when people decide for themselves, the experiential system – which involves intuitive and emotionally based processes – might weigh more heavily on the decision making process than the rational system – which engages effortful, logical, and analytical processes (Denes-Raj and Epstein, 1994). This could be the case because actors who make decisions for themselves are more likely to be influenced by their own affective reactions to the consequent rewards and punishments. Similarly, Hsee and Weber (1997) showed that the level of description of the other person accessible to participants mattered in the self–other decision making discrepancy. In their study, participants predicted that others would be more risk-seeking than they would be in terms of financial decisions when the other person for whom the decision was made was described in anonymous and abstract terms. However, the self–other difference diminished when the other person for whom the decision was made was described vividly and in concrete terms. The authors suggested that a vivid description of the other person made the decisions for self and for other commensurate by eliciting strong affective reactions in subjects. To explain the findings, they proposed a "risk-as-feelings hypothesis," which maintains that people rely on affective evaluations when making decisions for themselves in risky situations (Hsee and Weber, 1997; Loewenstein et al., 2001).

The idea that risky decisions for oneself are mainly affected by emotional reactions has been supported by a large body of neuroscience literature. Most relevant is the finding that the amygdala, a key structure for emotional processing during decision making (Morrison and Salzman, 2010), plays a critical role in risky decision making (Bechara et al., 1999; Hsu et al., 2005; De Martino et al., 2006, 2010; Brand et al., 2007; Ghods-Sharifi et al., 2009; Smith et al., 2009). For instance, De Martino et al. (2006, 2010) studied the neural correlates of the framing effect, whereby people become more risk-averse in a gain frame (i.e., when gains are made salient) than in a loss frame (i.e., when losses are made salient). This effect is a representative example of emotionally driven decision making in risky situations and was strongly associated with activity in the amygdala (De Martino et al., 2006); further, the effect was significantly diminished in patients with amygdalar damage (De Martino et al., 2010).

The amygdala forms extensive anatomical and functional connections with the dorsomedial prefrontal cortex [DMPFC, which includes Brodmann areas (BAs) 9, 32, 33, and part of the medial prefrontal cortex (MPFC); Etkin et al., 2011] that show developmental progress (Hung et al., 2011). The amygdala's affective reactions seem to be regulated *via* these connections (Banks et al., 2007; Kim et al., 2011). Although relatively little is known about the role of these amygdala–DMPFC connections in risky decisions (Cohen et al., 2005), the DMPFC itself is also known as a key structure for decision making in risky situations. For example, Wu et al. (2011) found that activation of the MPFC, including both dorsal and ventral regions, quantitatively reflected the subjective value of monetary outcomes combined with probability information about lottery tasks. Another study showed that the DMPFC was specifically responsive to risk-related information (Xue et al., 2009). Similarly, many previous studies using the Iowa Gambling Task (IGT) have shown that risky decision making is associated with increased activation of the DMPFC (Bolla et al., 2003; Fukui et al., 2005; Tanabe et al., 2007). Further, the DMPFC plays an important role in emotional regulation during affective decision making (Banks et al., 2007), effort-based decision making (Rudebeck et al., 2006; Floresco and Ghods-Sharifi, 2007; Croxson et al., 2009), and perspective taking during other-regarding processes (St. Jacques et al., 2010). In sum, while the amygdala is responsible for affective reactions in risky decisions, the DMPFC seems to control cognitive processes, such as weighing the probabilities and reward values of different options and regulating emotion.

As reviewed above, evidence from the social psychology literature implies the existence of distinctive neural circuitry subserving decisions on behalf of others as opposed to those made for oneself, and unveiling this difference would greatly advance the current theoretical account of prosocial decisions. In line with this idea, a recent study showed that activity in the ventromedial prefrontal cortex (VMPFC) was modulated by activity in the inferior parietal lobule (IPL) – a brain region close to the temporoparietal junction (TPJ) that is involved in mentalization (Saxe and Powell, 2006) – when people made product purchase decisions for others, whereas no such modulation effect of TPJ was found when people made the same decisions for themselves (Janowski et al., 2012).

The present study aimed to examine the difference between decisions made for oneself and those made for another in a risky situation by using a gambling task paradigm with systematically variable winning probability. On the basis of previous findings, we predicted that affective processes would carry more weight in decisions made for oneself than in those made for others. Thus, we hypothesized that considering a risky choice on behalf of another may employ the brain regions involved in cognitive/rational processes (e.g., the prefrontal cortex) more than those associated with affective/experiential processes (e.g., the amygdala), whereas the opposite may be true when the risky decision is made for oneself.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twenty-three undergraduate students in South Korea [12 women; mean age (SD) = 23.32 (2.59)] participated voluntarily and were compensated an average of KRW 30,000 (∼USD 25) for about 1 h of participation. Any potential health risks were carefully screened *via* a self-report questionnaire, and informed consent was obtained from all participants. All participants were right-handed and reported having no chronic mental illness. Three participants were excluded from analysis because they fell asleep inside the scanner. The experimental procedures were approved by the Institutional Review Board of Korea University.

#### **TASK AND PROCEDURES**

Participants performed a gambling task inside the MRI scanner. We adopted the "modified risk task" developed by Knoch et al. (2006), in which participants were asked to choose between two options: one with lower risk (i.e., win or lose 10 points) and another with higher risk (i.e., win or lose 90 points). The winning probability of each option ranged from 17 to 83% (the probabilities used were 17, 33, 50, 67, and 83%). In each trial, participants were presented with six boxes distinguished by pink and blue colors, and they were asked to choose either pink or blue. The colors of the boxes indicated the numbers of points that the participants could win or lose: 10 for pink and 90 for blue (**Figure 1A**). Participants were told that a yellow coin had an equal chance of appearing in any of the boxes, that they would gain points if the coin was contained in one of the boxes with the chosen color, and that they would lose the same number of points if the coin appeared in an opposite-colored box. For example, if they chose pink and there was a coin in one of the pink boxes, they won 10 points, but if there was no coin in the chosen color, they lost 10 points. Likewise, if they chose blue, they won or lost 90 points if there was or was not a coin among the blue boxes, respectively. Thus, pink was the low-risk option and blue was the high-risk option. The ratio of pink and blue boxes determined the winning probability of each option; for example, five pink boxes and one blue box meant that the winning probabilities were 83% for pink and 17% for blue. In each trial, participants had to make a decision on behalf of either themselves (decision-for-self condition) or another person (decision-for-other condition), according to a cue presented prior to the task. The structure of a single trial is shown in **Figure 1B**. First, the cue indicating the decision condition (decision-for-self or decision-for-other) was presented for 1–3 s, followed by the risk task. After the six boxes were presented, the participants chose between pink and blue by pressing the left or right button of an MR-compatible mouse. They were asked to respond carefully but as quickly as possible. Each participant performed 120 total trials (60 in each of the decision-for-self and decision-for-other conditions), which were divided into three sessions. Each session consisted of 40 trials, with the same number of trials for each condition. The order in which the different types of trials were presented was determined pseudorandomly. Earned points were accumulated separately for each condition.

Participants were told that the points accumulated in all trials would be converted into real money. We informed participants that they would be endowed with a base payment of KRW 30,000; we also informed them that KRW 25,000–35,000 was an approximate range of final compensation. Participants were kept blind to the exact ratio between points and money, because we did not want them to focus on calculating the exact amounts of money earned by themselves or others. Subjects were also told that their task performance in the decision-for-other condition would determine the extra earnings of another person who was randomly selected among the participants of the same experiment. Participants understood that the transfer would be completely anonymous so that neither the participants themselves nor their counterparts would know each other's identities. They also knew that their decisions for others would not affect their own profits, because the point totals for self and other were calculated separately, and the tasks were performed individually. In both conditions, participants started with 100 points; 4 s after the participants made a decision, the result of the decision (i.e., win or lose) was presented on the feedback screen (**Figure 1B**).
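The relationship between the box configuration, the winning probabilities, and the stakes can be made concrete with a short sketch. This is only an illustration of the task arithmetic described above; the function names are hypothetical, and the expected-value calculation is an added convenience for the reader, not a description of how participants were instructed to choose.

```python
from fractions import Fraction

N_BOXES = 6        # six boxes per trial, as described in the task
LOW_POINTS = 10    # pink boxes: win or lose 10 points
HIGH_POINTS = 90   # blue boxes: win or lose 90 points


def win_probability(n_boxes_of_color, n_boxes=N_BOXES):
    """The coin is equally likely to be in any box, so p = boxes of color / all boxes."""
    return Fraction(n_boxes_of_color, n_boxes)


def expected_value(p_win, points):
    """Win `points` with probability p_win, otherwise lose the same amount."""
    return p_win * points - (1 - p_win) * points


def trial_summary(n_blue):
    """Summarize one box configuration (hypothetical helper, for illustration only)."""
    n_pink = N_BOXES - n_blue
    p_high = win_probability(n_blue)   # blue = high-risk option
    p_low = win_probability(n_pink)    # pink = low-risk option
    return {
        "p_high": p_high,
        "p_low": p_low,
        "ev_high": expected_value(p_high, HIGH_POINTS),
        "ev_low": expected_value(p_low, LOW_POINTS),
    }


if __name__ == "__main__":
    for n_blue in range(1, N_BOXES):
        print(n_blue, trial_summary(n_blue))
```

With one blue box and five pink boxes, for example, the high-risk option wins with probability 1/6 (about 17%) and the low-risk option with probability 5/6 (about 83%), matching the example in the text.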

All instructions were given outside the scanner, and each participant performed 10 practice trials before entering the scanner to learn the task rules. After the completion of the task, the points earned during the decision-for-self condition were converted into money and added to the subject's base payment (KRW 30,000), and the points earned during the decision-for-other condition were actually transferred to the other participant for whom the participant had made the decisions. Participants' final earnings varied between KRW 25,000 and 35,000, depending on their own performance and that of the other randomly matched participant.

#### **NEUROIMAGING PROCEDURES**

#### **fMRI data acquisition**

We acquired data using an ISOL Forte 3T system with a standard birdcage coil in the Brain Science Research Center at the Korea Advanced Institute of Science and Technology. T2<sup>∗</sup>-weighted functional images were obtained using gradient-echo echo-planar pulse sequences (TR = 2000 ms; TE = 30 ms; FA = 80°; FOV = 240 mm; 64 × 64 matrix; 24 slices; voxel size = 3.75 mm × 3.75 mm × 4.0 mm). The stimuli were presented through an MR-compatible LCD monitor mounted on a head coil (refresh rate: 60 Hz; display resolution: 640 × 480 pixels; viewing angle: 30°). Each functional run lasted 480–600 s, including the first five TRs, which were discarded later due to unstable magnetization.

#### **fMRI data analyses**

Neuroimaging data were preprocessed and analyzed using Statistical Parametric Mapping 8 (SPM8; the Wellcome Trust Centre for Neuroimaging, University College London, UK). After timing correction for interleaved slice acquisition, correction for head motion was performed through realignment of all functional images to those of the first scan, and then a mean brain image was created for each participant. The realigned images were normalized to the Montreal Neurological Institute (MNI) echoplanar imaging (EPI) template, resampled at a voxel resolution of 2 mm × 2 mm × 2 mm, and spatially smoothed with an 8 mm Gaussian filter. Anatomical localization was performed in SPM8 by reference to a T1 template image, and then the preprocessed functional images were analyzed using the general linear model (GLM; Friston et al., 1995). Regressors for the events of choice and outcome delivery were convolved with a canonical hemodynamic response function. Motion vectors obtained from the realignment process were included as regressors in the GLM in order to reduce noise. The resulting statistical parametric maps were first thresholded stringently (significance level: FWE < 0.05, corrected for multiple comparisons; cluster size threshold >5 voxels), but no activation cluster survived this threshold.

To verify *a priori* hypotheses regarding several regions of interest (ROI), we used the small-volume correction (SVC) method for multiple comparisons (*p* < 0.05) in SPM8. We expected that the ventral striatum (VS; left: *x* = −10, *y* = 6, *z* = −14; right: *x* = 8, *y* = 6, *z* = −10), the ventral tegmental area (VTA; *x* = −4, *y* = −14, *z* = −20), the anterior cingulate cortex (ACC; *x* = 4, *y* = 24, *z* = 40), and the insula (*x* = −34, *y* = 20, *z* = −4) would be involved in reward anticipation and feedback during risky choice in the self–other contrast (Ernst et al., 2004) and that the TPJ (left: *x* = −48, *y* = −57, *z* = 25; right: *x* = 53, *y* = −54, *z* = 17), the posterior cingulate cortex (PCC; *x* = 2, *y* = −60, *z* = 27), and the MPFC (*x* = 1, *y* = 63, *z* = 2) – which are known to be related to theory of mind (ToM) functions (Saxe and Powell, 2006) – would be involved in the same function in the other–self contrast. In addition, we expected the amygdala (*x* = 22, *y* = −4, *z* = −18; Smith et al., 2009) and the DMPFC (*x* = −8, *y* = 36, *z* = 30; Wu et al., 2011) to encode the value of the risky option. The search volumes for SVC were restricted to spheres with radii of 15 mm and center coordinates obtained from corresponding studies. Additionally, we defined the ROIs in both hemispheres by mirroring the coordinates obtained in previous studies. To reduce the risk of false negatives and to provide a complete overview of the clusters at which activation occurred, we also applied a less-stringent significance level (*p* < 0.001, uncorrected; cluster size threshold ≥5 voxels); a table with a list of activation clusters is included in the Supplementary Material.
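The geometry of the search volumes can be illustrated with a minimal sketch. It only checks whether a peak coordinate lies inside a 15 mm sphere around an *a priori* center and mirrors a center across the midline; it does not reproduce the FWE correction that SPM8 performs within the volume. The function names are hypothetical, and mirroring by negating *x* is an assumption about how the bilateral ROIs were defined.

```python
import math

SVC_RADIUS_MM = 15.0  # sphere radius stated in the text


def mirror(coord):
    """Mirror an MNI coordinate across the midline by negating x (assumed convention)."""
    x, y, z = coord
    return (-x, y, z)


def distance_mm(a, b):
    """Euclidean distance between two coordinates given in millimeters."""
    return math.dist(a, b)


def in_search_volume(peak, center, radius=SVC_RADIUS_MM):
    """True if the peak falls inside the spherical search volume."""
    return distance_mm(peak, center) <= radius


if __name__ == "__main__":
    amygdala_center = (22, -4, -18)   # a priori center from Smith et al. (2009)
    amygdala_peak = (24, 0, -22)      # right amygdala peak reported in the Results
    print(distance_mm(amygdala_peak, amygdala_center))
    print(in_search_volume(amygdala_peak, amygdala_center))
    print(mirror(amygdala_center))
```

For the right amygdala peak reported in the Results (*x* = 24, *y* = 0, *z* = −22), the distance to the *a priori* center is 6 mm, well inside the 15 mm search sphere.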

Brodmann areas and brain regions were identified in Talairach space (Talairach and Tournoux, 1988) after converting the MNI coordinates to Talairach ones using non-linear transformation (Lancaster et al., 2007).

#### **Contrasting decision-for-self versus decision-for-other**

In order to explore which brain regions were more highly activated by the decision-for-self task than by the decision-for-other task or vice versa, we estimated whole-brain contrast maps from the periods when participants watched the six boxes and received reward information during both tasks. The single-subject whole-brain GLMs included the following regressors: (1) decision events at the time of task onset, when participants viewed the stimuli for 10 types of trials (i.e., five levels of probability in both the decision-for-self and decision-for-other conditions) and made decisions, (2) button-pressing events, (3) feedback events (at feedback onset, when participants watched two types of outcomes: those for self and other), and (4) motion parameters. The self–other and other–self contrasts were defined for all probability conditions.

#### **Parametric modulation analysis based on individual decision models**

We conducted parametric modulation analysis to determine which brain regions had activation levels that correlated with the decision value that each participant placed on the risky choice. Each participant made risky choices for self and other with varying probabilities of a favorable outcome; we calculated the decision value using optimal sigmoid functions fitted to the participant's probability (0–1) of choosing the high-risk option over the low-risk option as a function of the probability of winning. The parameters of the estimated models were calculated by using the least-squares method for each participant (see the equation below).

$$f\left(x_i\right) = \frac{1}{e^{a(b - x_i)} + 1}$$

In the equation above, the variable *x<sub>i</sub>* is the winning probability of the high-risk option on trial *i*, and *f*(*x<sub>i</sub>*) is the probability of a risky choice on that trial. The parameter *a* indicates the slope of the sigmoid function that reflects how drastically the probability of risky choice changes according to the level of winning probability, and *b* denotes an offset criterion, that is, the winning probability of the high-risk option at which the participant is expected to choose the risky option with 50% probability. We calculated the parameters separately for the decision-for-self and decision-for-other conditions. We generated separate single-subject GLMs for parametric modulation analysis, which included the following regressors: (1) decision events when a new configuration of colored boxes is displayed on the computer screen, along with individually estimated decision values [i.e., *f*(*x<sub>i</sub>*)] as parameters for the decision-for-self and decision-for-other conditions; (2) button-pressing events; (3) feedback events at the time of feedback onset when participants watched two types of outcomes (i.e., those for self and other); and (4) motion parameters.
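The least-squares fit of the sigmoid can be sketched as follows. The text does not state which optimizer was used, so the coarse grid search below is a stand-in chosen for transparency, not the authors' procedure; the parameter ranges, the function names, and the synthetic choice proportions are all assumptions made for illustration.

```python
import math


def risky_choice_probability(x, a, b):
    """Sigmoid decision model: f(x) = 1 / (exp(a * (b - x)) + 1)."""
    return 1.0 / (math.exp(a * (b - x)) + 1.0)


def sum_squared_error(a, b, xs, ys):
    """Least-squares objective over the observed choice proportions."""
    return sum((risky_choice_probability(x, a, b) - y) ** 2 for x, y in zip(xs, ys))


def fit_sigmoid(xs, ys):
    """Grid search over slope a (0.5 to 40) and offset b (0 to 1).

    Returns (a, b, error). The ranges and step sizes are illustrative assumptions.
    """
    best = None
    for a_step in range(1, 81):          # a = 0.5, 1.0, ..., 40.0
        a = a_step * 0.5
        for b_step in range(0, 101):     # b = 0.00, 0.01, ..., 1.00
            b = b_step * 0.01
            err = sum_squared_error(a, b, xs, ys)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2], best[0]


if __name__ == "__main__":
    # Winning probabilities of the high-risk option used in the task.
    xs = [1 / 6, 2 / 6, 3 / 6, 4 / 6, 5 / 6]
    # Synthetic proportions generated from a = 10, b = 0.5 (not real data).
    ys = [risky_choice_probability(x, 10.0, 0.5) for x in xs]
    a_hat, b_hat, err = fit_sigmoid(xs, ys)
    print(a_hat, b_hat, err)
```

Once *a* and *b* have been estimated for a participant and condition, evaluating `risky_choice_probability` at each trial's winning probability yields the trial-wise decision value that serves as the parametric modulator.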

#### **Psychophysiological interaction analysis**

We conducted psychophysiological interaction (PPI) analysis (Friston et al., 1997) to examine the functional connectivity between the brain regions identified from the contrast analyses. Specifically, we searched for brain regions whose activity showed differential patterns of correlations with that of a source region as a function of experimental condition (i.e., the decision-for-self and decision-for-other conditions). We used the right TPJ (rTPJ) as the source region, because it is the representative area that reflects both perspective taking (Castelli et al., 2000; Saxe and Wexler, 2005; Decety and Lamm, 2007) and other-regarding behavior (Morishima et al., 2012), and because we focused on examining how the brain regions activated during decision-for-other communicated with other areas. We extracted time-series data from the peak voxel of a cluster found in rTPJ for each participant and then generated PPI regressors, that is, the time-course of activity in the seed region modulated by two levels of the psychological variable (i.e., decision-for-other versus decision-for-self). We then estimated single-subject whole-brain GLMs with the following regressors: (1) time-course of activity in the seed region (rTPJ activity), (2) psychological contrast (other–self contrast weight), (3) interaction term (rTPJ activity × other–self contrast weight), and (4) motion parameters.
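The three task-related PPI regressors listed above can be sketched in simplified form. This is a schematic under stated assumptions: the seed time course is mean-centered and multiplied directly by a +1/−1 contrast weight, whereas SPM forms the interaction on a deconvolved (neural-level) signal and reconvolves it with the hemodynamic response function. The function names and the per-scan condition labels are hypothetical.

```python
def mean_center(values):
    """Subtract the mean so the seed regressor has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ppi_regressors(seed_timecourse, condition_labels):
    """Build (seed, psychological, interaction) regressors.

    condition_labels holds one label per scan: "other", "self", or anything
    else (e.g., "rest"), which receives a contrast weight of 0.
    """
    weights = {"other": 1.0, "self": -1.0}   # other-self contrast weights
    psych = [weights.get(label, 0.0) for label in condition_labels]
    seed = mean_center(seed_timecourse)
    interaction = [s * p for s, p in zip(seed, psych)]
    return seed, psych, interaction


if __name__ == "__main__":
    # Toy values standing in for an extracted rTPJ time course (not real data).
    seed, psych, interaction = ppi_regressors(
        [1.0, 2.0, 3.0, 4.0], ["self", "self", "other", "other"]
    )
    print(seed, psych, interaction)
```

A region whose activity loads positively on the interaction regressor, after accounting for the seed and psychological regressors, is more strongly coupled with the seed during decisions for another person than during decisions for oneself.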

## **RESULTS**

#### **BEHAVIORAL RESULTS**

We first calculated the ratio of risky choices to the total number of trials in each condition for each participant. Then, we conducted a 2 (conditions: decision-for-self and decision-for-other) × 5 (winning probabilities of the high-risk option: 17, 33, 50, 67, and 83%) repeated-measures ANOVA on the probability of choosing the high-risk option (i.e., blue) over the low-risk option (i.e., pink). Because Mauchly's test indicated that the assumption of sphericity had been violated (χ<sup>2</sup> = 34.070, *p* < 0.05), we used a multivariate test, which revealed a significant two-way interaction effect, *F*(4, 16) = 3.150, *p* < 0.05. As shown in **Figure 2**, the difference between the frequencies of high-risk decisions for self and for other varied according to the probability of winning. Participants were more likely to make risk-seeking decisions in the decision-for-self condition than in the decision-for-other condition when the winning probability of the high-risk option was higher, while the reverse was true when it was lower.
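The dependent measure entering the ANOVA, the proportion of high-risk choices per condition and probability level, can be computed as in the following sketch. The trial representation (condition, winning probability in percent, whether the high-risk option was chosen) and the function name are assumptions made for illustration; the toy trials are not real data.

```python
from collections import defaultdict


def risky_choice_ratios(trials):
    """Proportion of high-risk choices in each (condition, probability) cell.

    Each trial is a tuple: (condition, p_high_percent, chose_high_risk).
    """
    counts = defaultdict(lambda: [0, 0])   # cell -> [n_risky, n_total]
    for condition, p_high, chose_high in trials:
        cell = counts[(condition, p_high)]
        cell[0] += int(chose_high)
        cell[1] += 1
    return {key: n_risky / n_total for key, (n_risky, n_total) in counts.items()}


if __name__ == "__main__":
    toy_trials = [
        ("self", 83, True),
        ("self", 83, True),
        ("self", 83, False),
        ("other", 83, True),
        ("other", 83, False),
    ]
    print(risky_choice_ratios(toy_trials))
```

Computing this table once per participant gives the 2 × 5 cell values on which the repeated-measures analysis operates.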

To investigate this interaction further, we conducted *post hoc* pairwise *t*-tests on the differences in the risky choice ratio between the decision-for-self and decision-for-other conditions at each level of winning probability of the high-risk option. We found a significant difference between the conditions at 83%, *t*(22) = 2.319, *p* < 0.05, and a marginally significant difference at 17%, *t*(22) = −2.01, *p* = 0.059 (**Figure 2**), although none of the tests survived Bonferroni correction.

#### **NEUROIMAGING RESULTS**

#### **Decision-for-self versus decision-for-other during the decision event**

To compare the brain regions associated with decisions for oneself with those associated with decisions for another, the self–other and other–self contrasts at the time of decision (i.e., task onset time) were estimated. The self–other contrast revealed greater activation in the decision-for-self condition than in the decision-for-other condition in various regions, including the bilateral VS (**Figures 3A,C**; left: *x* = −12, *y* = −2, *z* = −14; right: *x* = 18, *y* = 12, *z* = −16), the VTA (*x* = 6, *y* = −24, *z* = −18), the ACC (*x* = 8, *y* = 36, *z* = 34), and the right insula (*x* = 34, *y* = 24, *z* = −12; all findings thresholded at *p* < 0.05, SVC FWE-corrected unless otherwise stated). The other–self contrast showed that the bilateral TPJ (left: *x* = −50, *y* = −62, *z* = 16; right: *x* = 58, *y* = −66, *z* = 24) and the PCC (*x* = −6, *y* = −58, *z* = 30) were more active in the decision-for-other condition than in the decision-for-self condition (**Figures 3B,D**).

#### **Neural responses to monetary outcomes for self versus other**

The self–other contrast at the time of the monetary outcome events revealed preferential activation of the right insula (*x* = 32, *y* = 18, *z* = −16) in the decision-for-self condition. The other–self contrast revealed the opposite pattern in the bilateral TPJ (left: *x* = −48, *y* = −62, *z* = 24; right: *x* = 48, *y* = −58, *z* = 22) and the PCC (*x* = 2, *y* = −64, *z* = 38), which are similar to the areas of activation observed at the time of decision events (Figure S1 in Supplementary Material).

**FIGURE 3 | Main contrast maps between self and other conditions.** Areas showing greater activity during decision events **(A)** in the decision-for-self condition than in the decision-for-other condition and **(B)** in the decision-for-other condition than in the decision-for-self condition. The statistical threshold for the images was set at p < 0.005 (uncorrected). The bar graphs in the lower panel show the beta coefficients (averaged across all probabilities) of **(C)** the left VS (x = −12, y = −2, z = −14; Z = 4.19, p < 0.05, SVC FWE-corrected) for the self–other contrast and **(D)** the left TPJ (x = −50, y = −62, z = 16; Z = 4.34, p < 0.05, SVC FWE-corrected) for the other–self contrast.

#### **Parametric modulation analysis using individual decision models**

The present study mainly aimed to examine the distinctive neural structures involved in the computation of values of choices and the prediction of risky choices on behalf of both oneself and others. With this in mind, we conducted parametric modulation analysis using the participants' individual decision models. Model parameters were generated by fitting sigmoid functions to the probabilities of choosing the high-risk option over the low-risk option. Eighteen participants were subjected to the analysis, excluding two participants whose behavioral data fit poorly to sigmoid functions because of their atypical decisions (e.g., risky choices regardless of probability). The individual decision models were estimated for the decision-for-self and decision-for-other conditions separately.

The analyses revealed that the chance of making a risky choice for self was positively correlated with activation of the right anterior amygdala (*x* = 16, *y* = 6, *z* = −16), whereas the chance of making a risky choice for other was positively correlated with activation of the left DMPFC (*x* = −14, *y* = 34, *z* = 32). No brain region showed activation that was negatively correlated with the chance of making a risky decision for self or for other.

To investigate which brain regions drive the differences between the models for self and other, we calculated the contrast between the value computation models for self and other *via* parametric modulation analysis. The self–other contrast showed that activation in the right amygdala (*x* = 24, *y* = 0, *z* = −22) was closer to that predicted by the value computation model for self than that for other, while activation in the left DMPFC (*x* = −14, *y* = 32, *z* = 32) showed a stronger association with the decision model for other than for self (**Figure 4**).

#### **Parametric modulation analysis using expected value and outcome**

We conducted additional parametric modulation analysis to examine prediction error (PE)-related neural activity at the time of the feedback events. The PE parameters were calculated by subtracting the expected values (EVs) from the monetary outcomes (−10, 10, −90, or 90 points) separately for self and other conditions. The EV of the low-risk option (i.e., choosing the pink box) was calculated by adding the respective EVs for gain (i.e., winning probability of the low-risk option × points gained for winning) and loss (i.e., winning probability of high-risk option × points lost for losing); the EV of the high-risk option was calculated in an analogous manner. This analysis revealed that in the decision-for-self condition, activity in the ACC (*x* = 8, *y* = 32, *z* = 26) was correlated negatively with PE, whereas activation was not significantly correlated with PE in any brain area in the decision-for-other condition (Figure S2 in Supplementary Material).
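The prediction-error parameter described above reduces to a few lines of arithmetic. The sketch below assumes, consistent with the description, that losses enter the expected value with a negative sign and that the probability of losing on an option is one minus its winning probability; the function names are hypothetical.

```python
from fractions import Fraction


def expected_value(p_win, points):
    """EV of an option: gain `points` with p_win, lose `points` otherwise."""
    return p_win * points + (1 - p_win) * (-points)


def prediction_error(outcome, p_win, points):
    """PE = obtained outcome minus the expected value of the chosen option."""
    return outcome - expected_value(p_win, points)


if __name__ == "__main__":
    p_win = Fraction(5, 6)                      # high-risk option at about 83%
    print(expected_value(p_win, 90))            # EV of the chosen option
    print(prediction_error(90, p_win, 90))      # PE after a win
    print(prediction_error(-90, p_win, 90))     # PE after a loss
```

For a high-risk choice at a winning probability of 5/6, the expected value is 60 points, so a win yields a PE of 30 and a loss yields a PE of −150; computing these values trial by trial, separately for the self and other conditions, gives the parametric modulator for the feedback events.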

#### **Psychophysiological interaction analysis**

We assessed the functional connectivity between brain regions during the decision-for-self and decision-for-other conditions using PPI analysis. We identified the brain regions in which correlations between their activity levels and those of rTPJ were modulated by psychological condition (decision-for-self versus decision-for-other). The results revealed that rTPJ showed stronger positive connectivity with the left DMPFC (*x* = −4, *y* = 34, *z* = 34) in the decision-for-other condition than the decision-for-self condition (*p* < 0.001, uncorrected; **Figures 5A,B**). The coordinates of the DMPFC reported here are immediately adjacent to those reported from the decision model for the other–self contrast (Figure S3 in Supplementary Material).

**FIGURE 4 | Main contrast maps between self and other decision models.** Significant correlations with the value parameters of risky choice estimated by fitting sigmoid functions to actual decisions were found in **(A)** the right amygdala (x = 24, y = 0, z = −22; Z = 3.85, p < 0.05, SVC FWE-corrected) for the self versus other contrast and **(B)** the left DMPFC (x = −14, y = 32, z = 32; Z = 3.94, p < 0.05, SVC FWE-corrected) for the other versus self contrast. The statistical threshold for the images was set at p < 0.005 (uncorrected). The bar graphs in the lower panel show the beta coefficients of the **(C)** amygdala for the self versus other contrast and **(D)** the DMPFC for the other versus self contrast.

**FIGURE 5 | Psychophysiological Interaction (PPI) analysis. (A)** Stronger functional connectivity with rTPJ was found in the left DMPFC during decisions for another than for oneself (x = −4, y = 34, z = 34; Z = 3.22, p < 0.001, uncorrected). The statistical threshold for the images was set at p < 0.005 (uncorrected). **(B)** The scatter plot representing a single subject's data. It shows a stronger positive correlation between rTPJ and DMPFC during the decision-for-other condition than the decision-for-self condition.
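The logic of the PPI analysis described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it forms the interaction term directly from a mean-centered seed time course and a condition code, whereas full implementations typically construct the interaction at the estimated neural level before convolution. The function name and the +1/−1 coding are assumptions.

```python
def ppi_regressors(seed, condition):
    """Build the three regressors of a simplified PPI model.

    seed:      seed-region (e.g., rTPJ) time course, one value per scan
    condition: +1 for decision-for-other scans, -1 for decision-for-self scans,
               0 elsewhere (hypothetical coding)
    Returns (physiological, psychological, interaction) as lists.
    """
    mean_seed = sum(seed) / len(seed)
    physiological = [s - mean_seed for s in seed]  # mean-centered seed signal
    psychological = list(condition)
    # The interaction term carries the condition-dependent change in coupling.
    interaction = [p * c for p, c in zip(physiological, psychological)]
    return physiological, psychological, interaction


physio, psych, inter = ppi_regressors([1.0, 3.0, 2.0, 4.0], [1, 1, -1, -1])
```

A target region whose activity loads on the interaction regressor, over and above the other two, is one whose coupling with the seed differs between conditions.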

Considering the widespread problem of non-independence error in neuroimaging research (Kriegeskorte et al., 2009), we were concerned about whether the present PPI findings in the DMPFC were independent of seed-point selection. We did not observe elevated DMPFC activity in the other–self contrast, even at a low statistical-significance threshold (*p* < 0.05, uncorrected), and careful examinations of individual subjects' PPI GLM models revealed no evidence of a significant correlation between the TPJ–time-course regressor and the psychological variable regressor. Therefore, it seems more plausible that the variability in TPJ activity not accounted for by the regressor for the other–self contrast contributed significantly to the PPI-related activity in the DMPFC observed in the present study. This argument is further supported by the relationship between the TPJ and DMPFC activities, as exemplified by a scatter plot of a representative individual in **Figure 5B**. In addition, we performed cross-validation analysis using a leave-one-subject-out method (Esterman et al., 2010), in which single subjects are iteratively left out of the first-stage group analysis that localizes the TPJ. This analysis confirmed the original results, although the size of the cluster in the DMPFC became slightly smaller (*x* = −4, *y* = 34, *z* = 34; *p* < 0.001, uncorrected; Figure S4 in Supplementary Material). In sum, although potential bias due to non-independent use of the data cannot be completely excluded, we believe the possibility that it occurred is minimal.
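The leave-one-subject-out idea can be sketched as below. The averaging of per-subject peak coordinates is only a hypothetical stand-in for the first-stage group analysis, and the subject labels and coordinates are invented for illustration; the point is that each subject's seed is defined without that subject's own data.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, remaining) pairs, one per subject."""
    for i, held_out in enumerate(subject_ids):
        remaining = subject_ids[:i] + subject_ids[i + 1:]
        yield held_out, remaining


def independent_seed_estimates(peak_by_subject):
    """Define each subject's seed location from all OTHER subjects.

    peak_by_subject maps a subject id to an (x, y, z) tuple. Averaging peaks
    is a hypothetical placeholder for a group-level localization.
    """
    seeds = {}
    for held_out, remaining in leave_one_subject_out(list(peak_by_subject)):
        n = len(remaining)
        seeds[held_out] = tuple(
            sum(peak_by_subject[s][axis] for s in remaining) / n
            for axis in range(3)
        )
    return seeds


# Invented coordinates, for illustration only.
peaks = {"s1": (50, -50, 20), "s2": (54, -54, 24), "s3": (52, -52, 22)}
seeds = independent_seed_estimates(peaks)
```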

#### **ADDITIONAL BEHAVIORAL EXPERIMENT**

In order to explain the behavioral results, which were less distinguishable than the neural data in terms of self–other differences, we conducted an additional behavioral experiment in which we examined whether individual differences in prosociality explain the reduction in self–other behavioral discrepancies. When we interviewed the participants about how they felt during the task, some said that their decisions made for another person felt the same as those made for themselves, whereas others said that they could clearly distinguish between the two conditions in terms of feelings. Thus, we hypothesized that individual differences in prosociality (i.e., the ability or disposition to regard another person's benefit as being as important as one's own) would affect the degree of self–other discrepancy in risky decision making.

Nineteen participants performed the same risk tasks as we used in the main experiment. The selection of high-risk options during the task increased linearly as a function of the probability of winning; this replicated the findings of the main experiment. The statistical-analysis procedures were the same as those used for the behavioral data in the main experiment. The interaction between conditions and winning probabilities was significant, *F*(4, 15) = 3.099, *p* < 0.05. To investigate the modulatory role of individual variability, we measured each individual's prosocial tendency with the Triple Dominance Measure (TDM) task (see Supplementary Material for details), which was adopted from a previous study (Haruno and Frith, 2010). After removing two participants who made inconsistent choices, which prevented clear categorization, we categorized the participants into three groups: prosocials (*n* = 6), individualists (*n* = 10), and competitors (*n* = 1). We then combined individualists and competitors into the proself group, following two previous related studies (Van Lange and Liebrand, 1989; Sattler and Kerr, 1991). **Figure 6** shows a greater self–other distinction in the probability of risky choice for the proselfs (**Figure 6A**) compared with that for the prosocials (**Figure 6B**), although no significant three-way interaction was observed among group (proselfs versus prosocials), condition, and winning probability using a multivariate mixed ANOVA, *F*(4, 12) = 1.252, *p* = 0.341. However, the three-way interaction was significant when a mixed ANOVA was applied after Greenhouse–Geisser correction, *F*(2.216, 33.245) = 3.724, *p* < 0.05.

Additionally, we performed a correlation analysis between self–other indices (generated by squaring the difference in probability of making a risky choice between the self and other conditions) and individuals' TDM scores (obtained by counting the number of prosocial choices across eight sets of decision trials). We found a negative relationship between prosocial tendency and self–other indices (Pearson correlation coefficient *r* = −0.523, *p* < 0.05), indicating that more prosocial participants showed smaller differences in risky choice between the self and other conditions.
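A minimal sketch of this index and its correlation with TDM scores is given below. It assumes that the squared differences are summed across winning-probability levels, which the text does not state explicitly; the function names and the example values are hypothetical.

```python
import math


def self_other_index(p_risky_self, p_risky_other):
    """Sum of squared differences in risky-choice probability (self vs. other).

    Assumption: one probability per winning-probability level, summed over levels.
    """
    return sum((s - o) ** 2 for s, o in zip(p_risky_self, p_risky_other))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical participant: risky-choice probabilities at three probability levels.
index = self_other_index([0.0, 0.5, 1.0], [0.25, 0.5, 0.75])  # 0.125
```

Across participants, `pearson_r` applied to the TDM scores and the indices would give the kind of coefficient reported above; a negative value indicates that more prosocial participants have smaller indices.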

## **DISCUSSION**

The present study investigated the differences between the neural correlates of risky decision making on behalf of oneself and that on behalf of others *via* fMRI. The behavioral results showed that participants were more sensitive to risk-related information (i.e., probability of winning) in the decision-for-self condition than the decision-for-other condition. When participants decided for themselves, they became more risk-averse when the winning probability of the high-risk option was lower and more risk-seeking when it was higher. This tendency became weaker when people decided for others; this might suggest diminished affective responses to the risky situation in this condition. We sought to test this possibility by contrasting the neural correlates of risky decision making for oneself with those for others. The brain-activation pattern changed according to the target of the decision, such that reward-related regions were more active in the decision-for-self condition than in the decision-for-other condition, whereas regions related to the ToM showed the opposite association. Parametric modulation analysis describing each individual's decision model revealed that the amygdala and DMPFC were involved in computing decision values targeting self and other, respectively. These findings indicate that fundamentally distinct neural processes subserve value computations when making risky choices for oneself versus for other people.

### **SELF–OTHER DIFFERENCES IN RISKY DECISION MAKING**

Participants were more likely to vary their choices according to winning probability in the decision-for-self condition than in the decision-for-other condition. More specifically, participants made risk-seeking and risk-averse choices when the winning probability of the risky option was high and low, respectively. This may indicate greater involvement of emotional processes in biasing risky choices for self versus other. The fact that this pattern was not observed in the decision-for-other condition may indicate weak emotional intrusion or effective cognitive regulation while making choices for other. Moreover, as is the case for other types of other-regarding behavior, risky decisions for others may also require the ability to understand others' minds. Indeed, we found in the additional behavioral experiment that the self–other difference in risk-seeking choices was affected by subjects' levels of prosociality. That is, prosocial participants made decisions for themselves and for others in the same manner, whereas proself participants made the two types of decisions in distinguishable ways. This result hints that the ToM function may contribute to risky decision making for others, given that prosocial orientation is tightly related to perspective taking and mentalization (Underwood and Moore, 1982). In addition, the amount of effort expended deciding for another person could be another determinant of individual differences in decision making regarding self versus other. In our additional experiment, proself participants were less sensitive to probability information, which is critical for successful decisions, when making choices for others than for themselves. This suggests that people who are indifferent to others' benefit put less effort into decisions for others than those for themselves. In other words, making a decision for another person would be a painstaking task for someone who acts in the best interests of others (i.e., a prosocial individual), because he/she would feel the need to reduce the fundamental self–other difference.

Overall, decision making on behalf of others seems to be a demanding process that entails expending more effort and cognitive resources than making decisions for oneself: it requires different psychological and physiological mechanisms and is more difficult. When people make decisions on behalf of others, cognitive processes are weighted more heavily than affective processes are (Fernandez-Duque and Wifall, 2007), and subjects tend to make such decisions in norm-based ways, such that they consider what they think is "right" rather than what they "feel like doing" (Stone and Allgaier, 2008). Subjects making decisions on behalf of others also seem to value their reputations (i.e., the impressions they convey to the people for whom they make the decisions; Jonas et al., 2005). At the same time, other-regarding behavior might require self-regulatory processes to deal with the conflict between selfish and prosocial motivations: subjects feel a need to regulate their emotional reactions and inhibit their selfish impulses to minimize cost, but if they do nothing, they neglect the other person's interests (DeWall et al., 2008). Thus, it would be reasonable to think that the self–other discrepancy in risky decision making observed in the present study reflects the different types of psychological processes (i.e., affective versus cognitive processes) associated with making decisions on behalf of oneself versus others, respectively. Further, the self–other difference in the amount of cognitive resources required during the risky decision task might have resulted in behavioral differences.

### **BRAIN REGIONS ASSOCIATED WITH DECISIONS FOR SELF AND FOR OTHER**

One of the main findings of the present study is that people seem to use different modes of decision making when they decide for themselves and for others; this is particularly emphasized by the neuroimaging results. In the decision-for-self condition, the VS, caudate, VTA, insula, and ACC were more active than in the decision-for-other condition. Given the large number of previous studies that reported strong associations between these regions and both reward processing (Breiter et al., 2001; Knutson et al., 2001; Baxter and Murray, 2002; Ernst et al., 2004; Yacubian et al., 2006; Carter et al., 2009; Ghods-Sharifi et al., 2009; Smith et al., 2009) and risk processing (Kuhnen and Knutson, 2005; Preuschoff et al., 2006), people might be more sensitive to reward and perceived risk when they make decisions for their own profit than for that of others. On the other hand, the TPJ, PCC, and MPFC showed greater activation in the decision-for-other condition than in the decision-for-self condition. Given that these regions are regarded as parts of the ToM network, which is central to understanding others' intentions through mentalization and perspective taking (Fletcher et al., 1995; Gallagher et al., 2000; Walter et al., 2004; Saxe and Wexler, 2005; Amodio and Frith, 2006; Frith and Frith, 2006; Saxe and Powell, 2006), it seems that people might activate their ToM systems in order to take another's perspective and thus perform the risky choice task for another's benefit. Supporting this idea, a recent study (Janowski et al., 2012) found that VMPFC activity during decision making for others – but not for oneself – was modulated by TPJ, one of the important brain regions involved in mentalization. These differences between decision making for oneself and for others may lead to self–other distinctions in the value computation and decision processes, which are discussed below.

### **NEURAL CORRELATES OF VALUE COMPUTATION IN RISKY DECISIONS FOR THE SELF VERSUS FOR OTHERS**

The most noteworthy finding of the present study is revealed by the contrast between the decision models for self-targeted versus for other-targeted decisions. The parametric modulation analysis of each individual's decision models elucidated the distinct neural correlates of value computation for self and for other in risky decision making, revealing negative coupling between activations of the amygdala and the DMPFC whose magnitudes depended on the target of the decision. Activations in the amygdala and DMPFC were associated with value computation in the modified risk task, replicating the results of previous studies (Ghods-Sharifi et al., 2009; Smith et al., 2009; Morrison and Salzman, 2010). Importantly, the direct contrast between self-regarding and other-regarding decision making in terms of the computed value of the risky option revealed that the amygdala was more strongly associated with the value computation for self than that for other; this result is in line with the "risk-as-feelings hypothesis," which proposes that affective responses play a relatively greater role in risky decision making for oneself than for others (Loewenstein et al., 2001). On the other hand, the DMPFC was more engaged in the value computations regarding decisions for other than those for self; this result supported our prediction that cognitive processes might outweigh affective processes in risky decision making for others.

As we reasoned above, decision making for another without regard to one's own benefit could require effort and additional cognitive resources. In this respect, recent evidence on the role of the ACC – which is immediately adjacent to the DMPFC – in effort-based decision making might provide an interesting explanation for our findings. For example, severing the connection between the amygdala and the ACC impaired rats' decision making abilities, such that they no longer chose a high-reward option that required more effort than the corresponding low-reward option (Floresco and Ghods-Sharifi, 2007). Studies in both animals and humans have shown that the ACC is sensitive to the amount of effort exerted during decision making and shows increased activation during increased effort to earn larger rewards (Rudebeck et al., 2006; Croxson et al., 2009). Thus, it seems plausible that the stronger association between DMPFC activation and the value computation in the decision-for-other condition than in the decision-for-self condition may reflect the fact that people tend to expend greater amounts of effort during risky decision making for others than for themselves. In addition, the regulatory function of the DMPFC over amygdalar activity may play a role in creating the self–other distinction between the neural correlates of value computation. The DMPFC forms strong connections with the amygdala (Roy et al., 2009; Salzman and Fusi, 2010; Etkin et al., 2011; Hung et al., 2011; Robinson et al., 2011) and regulates its emotional reactions (Banks et al., 2007). Indeed, decision making for others requires self-regulatory processes in order to deal with the conflict between selfishness and prosociality (DeWall et al., 2008). The role of the DMPFC as part of the ToM network (Fletcher et al., 1995; Gallagher et al., 2000; Walter et al., 2004; Amodio and Frith, 2006; Frith and Frith, 2006) provides another possible explanation for the self–other distinction in the neural correlates of value computation, especially in relation to perspective taking, mentalizing, and inferring others' intentions (St. Jacques et al., 2010). Consistent with this idea, a recent study showed that DMPFC activity during the judgment of others' opinions (Waytz et al., 2012) or during the observation of others' distress (Masten et al., 2011) predicted subsequent prosocial behavior. Thus, consideration of risky options for others may require inference of their mental states, which then in turn recruits the ToM network, including the DMPFC.

Consistent with the previous ToM literature, activity in the rTPJ – a major area for mentalization (Castelli et al., 2000; Saxe and Wexler, 2005; Decety and Lamm, 2007) – was greater during the decision-for-other than the decision-for-self condition in the present study. This region also showed heightened functional connectivity with DMPFC in the other–self contrast; the activity of DMPFC increased as a function of the computed values of risky option during choices for other more than during choices for self. The findings make it tempting to speculate that rTPJ may send a signal to DMPFC and contribute to its control of amygdalar activity when considering choices for others, enabling us to choose options with diminished emotional biases in risky decision making.

In summary, the results of the parametric modulation analysis support our prediction that risky decision making on behalf of another person may involve additional cognitive processes, including effort-based decision making, self-regulation, and ToM functions. Alternatively, the cognitive/rational system might outweigh the affective/experiential system in risky decision making on behalf of others, given the evidence that links the DMPFC to cognitive processes and the amygdala to affective processes.

This study also raises important questions that need to be addressed in future projects. First, we could not determine the relationship between individual differences in brain activity and behavioral responses. We computed a self–other difference index score for each participant and examined the relationship between neural activity and behavioral results. In contrast with our predictions, however, we failed to find statistically significant correlations between them. Although the exact reason for this failure is currently unknown, it may be that few participants showed sufficiently large self–other difference indices. The resulting lack of individual variability may have obscured the relationship between participants' decisions and neural responses. Given the role of the prosocial trait in this task, as revealed in the second behavioral study, it would be interesting for a future fMRI study to select participants with a wide range of prosociality. Similarly, it would be interesting to investigate the neural underpinnings of prosocial orientation during self–other decision making, considering our finding from the additional behavioral experiment that increased prosocial orientation reduced self–other differences. We envision future studies to address this important issue.

Second, in this experimental design, we kept the magnitude of gain/loss for each option constant to minimize noise due to variable reward magnitude; we varied only the reward's attainability (*via* manipulation of winning probability), which modulated the attractiveness of the risky option. Therefore, the difference between the EVs of the high-risk and low-risk options changed with the winning probability, whereas the risk of each option (defined as outcome variance; Markowitz, 1952) remained constant across different levels of winning probability (see Table S2 in Supplementary Material). This feature of our experimental design may leave room for alternative interpretation of the behavioral results. More specifically, the greater sensitivity to winning probability in the decision-for-self condition may simply reflect choices based on the EV of the risky option. Likewise, we cannot completely rule out the possibility that people may have chosen the high-risk option for others less than for themselves out of spite, that is, with the intention of lowering the benefits of others. In this sense, particular caution may be necessary in interpreting the observed correlations between neural activity and the model parameters, and future study should allow for more-systematic manipulation of the EV and risk values for each option.

Third, we did not measure the various psychological factors that could have affected the self–other difference. For instance, it is possible that the subjective social distance between a participant and another person for whom the participant made the decision could have affected his/her decisions, although we explicitly told the participants that they were making decisions for an anonymous person. It would be interesting to test the effect of social distance by comparing decisions made for individuals with whom the subject is close with decisions made for strangers. In addition, we could not confirm which of the psychological processes discussed above is the most prominent driver of the self–other difference. Future studies using different types of tasks or including additional behavioral and physiological measurements, such as eye movements, skin conductance response, or glucose consumption levels, could further elucidate the mechanisms underlying the self–other discrepancy in risky decision making.

The present study included direct comparisons between risky decisions for self and other in a single experiment and provided the first evidence of differences in neural processes between risky financial decisions on behalf of oneself and those on behalf of other. Reward systems were activated when people decided for themselves, whereas the ToM network became more active when subjects made decisions for another person. Most importantly, activity in the neural loci of value computation differed between risky decisions for oneself and for others: the amygdala and DMPFC were associated with decisions on behalf of oneself and others, respectively. Our findings suggest that affective processes have greater weight than cognitive processes in risky decision making for self. On the other hand, decision making for others seems to be a more difficult and effortful process that engages cognitive systems and emotional regulation, in which ToM functions might also participate. We expect future research to follow up on the present findings with the aim of providing a more-complete understanding of the neural mechanisms underlying prosocial and other-regarding behaviors.

## **ACKNOWLEDGMENTS**

This study was supported by the Cognitive Neuroscience Program of the Korean Ministry of Science and Technology (M10644020003-06N4402-00310) and the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2011-327-H00038).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Decision\_Neuroscience/10.3389/fnins.2013.00015/abstract



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 March 2012; accepted: 28 January 2013; published online: 20 March 2013.*

*Citation: Jung D, Sul S and Kim H (2013) Dissociable neural processes underlying risky decisions for self versus other. Front. Neurosci. 7:15. doi: 10.3389/fnins.2013.00015*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2013 Jung, Sul and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Neuroeconomic measures of social decision-making across the lifespan

#### **Lusha Zhu<sup>1</sup>, Daniel Walsh<sup>2,3</sup> and Ming Hsu<sup>2,3</sup>\***

<sup>1</sup> Virginia Tech Carilion Research Institute, Virginia Polytechnic Institute and State University, Roanoke, VA, USA

<sup>2</sup> Neuroeconomics Laboratory, Haas School of Business, University of California Berkeley, Berkeley, CA, USA

<sup>3</sup> Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA

#### **Edited by:**

Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland

#### **Reviewed by:**

Geoffrey Schoenbaum, University of Maryland School of Medicine, USA

Bernd Weber, Rheinische-Friedrich-Wilhelms Universität, Germany

#### **\*Correspondence:**

Ming Hsu, Haas School of Business, University of California, 2220 Piedmont Avenue, Berkeley, CA, USA. e-mail: mhsu@haas.berkeley.edu

Social and decision-making deficits are often the first symptoms of a striking number of neurodegenerative disorders associated with aging. These include not only disorders that directly impact dopamine and basal ganglia, such as Parkinson's disorder, but also degeneration in which multiple neural pathways are affected over the course of normal aging. The impact of such deficits can be dramatic, as in cases of financial fraud, which disproportionately affect the elderly. Unlike memory and motor impairments, however, which are readily recognized as symptoms of more serious underlying neurological conditions, social and decision-making deficits often do not elicit comparable concern in the elderly. Furthermore, few behavioral measures exist to quantify these deficits, due in part to our limited knowledge of the core cognitive components or their neurobiological substrates. Here we probe age-related differences in decision-making using a game theory paradigm previously shown to dissociate contributions of basal ganglia and prefrontal regions to behavior. Combined with computational modeling, we provide evidence that age-related changes in elderly participants are driven primarily by an over-reliance on trial-and-error reinforcement learning that does not take into account the strategic context, which may underlie cognitive deficits that contribute to social vulnerability in elderly individuals.

**Keywords: aging, game theory, reinforcement learning, strategic learning, neuroeconomics, decision neuroscience**

## **INTRODUCTION**

A widow responds to a telemarketing investment firm's offer of financial security. The firm convinces her to convert all her assets to risky, liquid investments managed by the firm. Over the course of a year, the firm provides near constant attention to the widow, who, by the end of the year, has lost \$800,000 (Starnes, 1996). Such crimes are unfortunately common. Although there is widespread recognition of fraud targeting the elderly among both financial and legal scholars, and efforts to introduce legislation to combat this problem (e.g., Smith, 2000), we know very little about the specific sources of such vulnerability at the neurobiological level. Unlike memory and motor impairments, which are readily recognized as symptoms of more serious underlying neurological conditions, decision-making deficits often do not elicit comparable concern in the elderly (Denburg et al., 2007). There are also few neuropsychological tools or biomarkers available to measure decision-making deficits, particularly those that contain a social component such as susceptibility to fraud.

Here we sought to probe age-related effects of an important class of social behavior captured by economic games, and build upon recent advances in understanding of the neural substrates of value-based decision-making. Intuitively, efficient value-based decision-making requires organisms to make decisions to obtain rewards and avoid punishments that are present in the environment (Fehr and Camerer, 2007; Rangel et al., 2008; Maia and Frank, 2011). In the social domain, however, organisms also need to anticipate and respond to actions of others competing or cooperating for the same rewards.

Neurobiologically, there is much evidence that the capacity to make appropriate value-based decisions depends critically upon integrity of the nigrostriatal dopaminergic (DA) system and frontostriatal circuits, which are well known to degenerate over the course of aging (Bäckman and Farde, 2005). Furthermore, there is growing consensus that the computational underpinnings of these systems can be parsimoniously characterized by reinforcement learning (RL) theories of behavior (Sutton and Barto, 1981; Schultz et al., 1997). This synthesis of theory and data has led to speculations that abnormalities observed in healthy older adults are at least partially caused by age-related decreases in neuronal number in these circuits, as well as a decreased number of synapses in those neurons (Li et al., 2001; Li and Sikström, 2002; Samanez-Larkin et al., 2007).

Despite this rapid progress, however, there has been limited application of this formal framework to understand age-related changes in value-based decision-making in the social domain. Here, in addition to needing to learn about available rewards and punishments in the environment, agents also need to anticipate and respond to cooperative or competitive actions of others (Camerer, 2003; Lee, 2008). This requires the ability to behave strategically, which has been the subject of intense study in theoretical biology and game theory (Fudenberg and Levine, 1998; Hofbauer and Sigmund, 1998). Game theory provides a mathematically precise description of the social environment, thus allowing for quantitative modeling of behavior that can build upon previous findings on reward learning (Fehr and Camerer, 2007; Lee, 2008).

An important insight of this literature is that standard RL models provide an incomplete account of strategic learning. Individuals blindly exhibiting RL behavior in social and strategic settings are essentially ignoring the fact that their behavior can be exploited by others (Camerer, 2003; Hampton et al., 2008). In contrast, another well-studied class of learning models, commonly referred to as belief-based learning, requires players to form and update first-order beliefs regarding the likelihood of future actions of opponents through experience, and provides a tractable model of social learning in relatively simple environments. Neurobiologically, there is converging evidence that social decision-making depends upon a broader network of regions that project to the striatum, in particular the medial prefrontal cortex (mPFC), which is widely thought to be intimately involved in "theory of mind" critical for social cognition and strategic reasoning (Amodio and Frith, 2006; Jackson et al., 2006; Saxe, 2006).
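The contrast between the two classes of learning rules can be made concrete with a minimal sketch. It is an illustration under stated assumptions rather than the model used in this study: the RL rule is a simple delta rule with a hypothetical learning rate `alpha`, and the belief rule counts observed opponent actions in the style of fictitious play. All function and parameter names are hypothetical.

```python
def rl_update(values, action, payoff, alpha):
    """Model-free RL: only the chosen action's value moves toward the payoff received."""
    updated = list(values)
    updated[action] += alpha * (payoff - updated[action])
    return updated


def belief_update(counts, opponent_action):
    """Belief learning: count the opponent's observed action and renormalize.

    Returns (new_counts, beliefs), where beliefs[b] is the estimated
    probability that the opponent plays action b next.
    """
    new_counts = list(counts)
    new_counts[opponent_action] += 1
    total = sum(new_counts)
    return new_counts, [c / total for c in new_counts]


def expected_payoffs(beliefs, payoff_matrix):
    """Expected payoff of every own action given beliefs about the opponent.

    payoff_matrix[a][b] is the own payoff for own action a against opponent action b.
    """
    return [sum(p * row[b] for b, p in enumerate(beliefs)) for row in payoff_matrix]
```

The key difference is visible in what gets updated: `rl_update` changes one value using only the player's own payoff, whereas the belief learner revises its expectations for every available action, including those it did not play, because it reasons about what the opponent is likely to do.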

Results from our previous research have shown that a strategic learning paradigm engages key components of the frontostriatal circuits and makes it possible to dissociate their respective contributions to behavior (Zhu et al., 2012). Specifically, using model-based fMRI, activity in the ventral striatum was found to underlie standard model-free RL through trial and error. In contrast, activity in the mPFC was found to underlie more cognitively sophisticated belief learning that involves forming and responding to first-order beliefs about the actions of other individuals. Based on these results, we hypothesized that the ability to make advantageous social decisions would decline with age as a result of a decline in higher-order cognitive functions that we believe to be crucial for complex social decision-making. Furthermore, we hypothesized that the differences in behavior across age cohorts can be captured by key parameters in the computational model.

### **MATERIALS AND METHODS**

#### **SUBJECTS**

We compare results from 30 young subjects (16 female, mean age 23.3 ± 4.6 years) from the University of Illinois at Urbana-Champaign, and 29 elderly subjects (14 female, mean age 64.1 ± 5.4 years) recruited from: (1) local flyers and bulletins in the Berkeley community, (2) online forums such as Craigslist, and (3) the Berkeley Retirement Center (**Table 1**). All elderly subjects were tested on the mini-mental status exam and self-reported being healthy with no significant neurological issues.

#### **EXPERIMENTAL PARADIGM**

We used the "Patent Race" game, first studied experimentally by Rapoport and Amaldoss (2000), and most recently used in our previous neuroimaging study. This game is simple in motivation but rich in the strategic nuances and the patterns of behavior that it can generate (Zhu et al., 2012). In the game, two opposing players are randomly matched from a large pool of players at the beginning of each round and compete for a prize by choosing how much to invest (in integer amounts) from their respective endowments. The player who invests more wins the prize, while the other loses. In the event of a tie, neither player wins the prize. Players keep the part of their endowment that is not invested.

#### **Table 1 | Demographic information of participants.**


In the particular payoff structure that we use, the prize size is 10, and players are of two types: *Strong* and *Weak*. The Strong player has five units of endowment, and can invest between 0 and 5 units in integer amounts, whereas the Weak player has four units to invest, and can invest between 0 and 4 units (**Figure 1**). Furthermore, to reduce cognitive burden associated with playing this relatively complex game, we used a new interface first introduced in Zhu et al. (2012). This interface replaced the standard matrix form representation of the game that contains 60 elements with one that directly reflects the logic of the game.
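The payoff rule described above can be written out as a short sketch. The prize size and endowments come from the text; the function and variable names (`payoff`, `payoff_table`, `ENDOWMENT`) are illustrative assumptions and are not taken from the original study materials.

```python
PRIZE = 10                                # prize size, from the text
ENDOWMENT = {"strong": 5, "weak": 4}      # endowments, from the text


def payoff(role, own, other):
    """Payoff to a player in `role` who invests `own` against an opponent investing `other`."""
    endowment = ENDOWMENT[role]
    if not 0 <= own <= endowment:
        raise ValueError(f"{role} player cannot invest {own}")
    kept = endowment - own                 # the uninvested endowment is kept
    won = PRIZE if own > other else 0      # in a tie, neither player wins the prize
    return kept + won


def payoff_table(role):
    """Rows: own investment; columns: opponent's investment."""
    opponent = "weak" if role == "strong" else "strong"
    return [
        [payoff(role, own, other) for other in range(ENDOWMENT[opponent] + 1)]
        for own in range(ENDOWMENT[role] + 1)
    ]


if __name__ == "__main__":
    for row in payoff_table("strong"):
        print(row)
```

Each role's table has 6 × 5 = 30 cells, so the two tables together hold the 60 elements of the matrix-form representation mentioned above.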

#### **PROCEDURE**

Upon arrival at the laboratory, subjects were given instructions and a quiz to ensure that they understood the experiment. Participants played two stages of 80 rounds each, one in the strong role and one in the weak role (counterbalanced). Opponents' choices were drawn from a pool of 16 young adults who participated in an earlier session at the University of Illinois at Urbana-Champaign. We ran subjects separately to allow us to pair players against a common distribution of opponents. Previous comparisons of "live" and "non-live" sessions show that the behavior of young adults does not differ significantly across treatments (Zhu et al., 2012). All subjects were fully informed of the purposes of the research and were free to withdraw without penalty. Elderly participants further completed psychometric tests for IQ (Shipley Institute of Living Scale) and executive functioning (Wisconsin Card Sorting Task, WCST).

#### **COMPUTATIONAL MODELING**

To quantitatively compute the mapping from the stimulus inputs to the behavioral observations, we used the "experience-weighted attraction" (EWA) model first introduced by Camerer and Ho (1999). This model embeds both RL and belief learning, two of the most widely used approaches to studying learning in competitive games.

These two learning rules differ with regard to the information that subjects use to update action values. Intuitively, at the end of each round, the subject receives two pieces of information – the received rewards in the form of payoffs, and how much the opponent invested. For example, consider two rounds where the subject chose 5, but where the opponent chose 0 in one round and 4 in the other. In both cases, the subject's received payoff is 10. However, in the former the subject could have earned more by investing less, with the optimal investment being 1. In the latter case, however, the subject cannot improve by investing any other amount.

Under RL, players are assumed to ignore the actions of the opponent, and as a result treat both cases as equivalent. On the other hand, belief learning assumes that players, either directly or indirectly depending on the particular interpretation, include this information in updating of action values (Cheung and Friedman, 1997; Camerer, 2003). The hybrid EWA model provides a parametric account of the weighting between the two learning rules, as well as capturing how the past experiences depreciate over time, both of which we will study in our data.
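The two-round example above can be checked directly. The following is a minimal sketch for the Strong role only, using the payoff rule stated earlier (prize of 10, endowment of 5); the helper names `strong_payoff`, `foregone_payoffs`, and `best_reply` are hypothetical.

```python
PRIZE, STRONG_ENDOWMENT = 10, 5


def strong_payoff(own, other):
    """Payoff to the Strong player investing `own` against a Weak opponent investing `other`."""
    return (STRONG_ENDOWMENT - own) + (PRIZE if own > other else 0)


def foregone_payoffs(other):
    """What each possible investment would have earned against the observed opponent choice."""
    return {own: strong_payoff(own, other) for own in range(STRONG_ENDOWMENT + 1)}


def best_reply(other):
    """Investment with the highest payoff against `other`, and that payoff."""
    table = foregone_payoffs(other)
    best = max(table, key=table.get)
    return best, table[best]


if __name__ == "__main__":
    for opponent_choice in (0, 4):
        print(opponent_choice, strong_payoff(5, opponent_choice), best_reply(opponent_choice))
```

A pure reinforcement learner sees only the received payoff, which is the same in both rounds. A belief learner also uses the foregone payoffs, which differ between the two rounds.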

Formally, on each round, player $i$ assigns a value $V_i^k(t)$ to each strategy $S_i^k$ in the strategy set $S_i = \{S_i^1, S_i^2, \ldots, S_i^L\}$ (i.e., investment amount). They also come into the game with certain prior beliefs $N(0)$, which reflect either the result of logical deduction or previous life experiences. Denoting $S_i(t)$ as the investment amount of player $i$ at period $t$, and $S_{-i}(t)$ as the investment amount of the opponent at period $t$, the evolution of $V_i^k(t)$ and $N(t)$ is governed by three parameters and updates according to the following:

$$V_i^k(t) = \begin{cases} \frac{\phi_i N(t-1) V_i^k(t-1) + \delta_i \pi_i \left( S_i^k, S_{-i}(t) \right)}{N(t)}, & \text{if } S_i^k \neq S_i(t) \\ \frac{\phi_i N(t-1) V_i^k(t-1) + \pi_i \left( S_i^k, S_{-i}(t) \right)}{N(t)}, & \text{if } S_i^k = S_i(t) \end{cases}$$

$$N(t) = \rho_i N(t-1) + 1$$

As discussed in Camerer and Ho (1999), the three parameters capture qualitatively distinct aspects of the learning process. First, two of the parameters describe distinct notions of "experience": pre-game experience (or prior beliefs) and in-game experience. Updating of the former (pre-game prior beliefs) is controlled by the parameter $\rho_i$, such that a large value of $\rho_i$ leads prior beliefs to wear off quickly. On the other hand, in-game adaptation – that is, responsiveness to actual experience during the game – is captured via the parameter $\phi_i$, where smaller values imply greater weight placed on recent game experience. Finally, the weight between reinforcement and belief learning is captured by the parameter $\delta_i$: the model reduces to pure RL when $\delta_i = 0$, and to pure belief learning when $\delta_i = 1$.
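The update rule above can be sketched for a single round as follows. This is an illustrative sketch rather than the authors' estimation code: the function name `ewa_update`, its argument names, and the Strong-role payoff helper are assumptions introduced here.

```python
PRIZE, STRONG_ENDOWMENT = 10, 5


def strong_payoff(own, other):
    """Payoff rule from the text for the Strong role (hypothetical helper)."""
    return (STRONG_ENDOWMENT - own) + (PRIZE if own > other else 0)


def ewa_update(values, n_prev, chosen, opp, payoff_fn, phi, delta, rho):
    """One round of the EWA update.

    values  : list of V^k(t-1), indexed by investment amount k
    n_prev  : experience weight N(t-1)
    chosen  : the player's own investment S_i(t)
    opp     : the opponent's investment S_{-i}(t)
    Returns (list of V^k(t), N(t)).
    """
    n_new = rho * n_prev + 1.0
    new_values = []
    for k, v in enumerate(values):
        # The chosen strategy gets its payoff at full weight; unchosen
        # strategies get their foregone payoff weighted by delta.
        weight = 1.0 if k == chosen else delta
        new_values.append((phi * n_prev * v + weight * payoff_fn(k, opp)) / n_new)
    return new_values, n_new


if __name__ == "__main__":
    start = [0.0] * (STRONG_ENDOWMENT + 1)
    for delta in (0.0, 1.0):
        print(delta, ewa_update(start, 1.0, 5, 0, strong_payoff, phi=1.0, delta=delta, rho=0.0))
```

With δ = 0 only the chosen strategy's value moves toward its received payoff, whereas with δ = 1 every strategy is updated with what it would have earned against the opponent's observed choice.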

To convert latent values $V_i^k(t)$ to choice probabilities, we assume that the probability of player $i$ playing $S_i^k$ follows a softmax distribution, $P_i^k(t+1) = \exp\left(\lambda V_i^k(t)\right) / \sum_{l=1}^{L} \exp\left(\lambda V_i^l(t)\right)$, where λ measures subjects' sensitivity to differences in latent values (Camerer and Ho, 1999; Hsu et al., 2005). Using initial values $N(0)$ and $V_i^k(0)$ calculated from first period data (Roth and Erev, 1995; Ho et al., 2008), we performed maximum likelihood estimation at the individual level for both young and elderly cohorts using a grid search over a large range of values for all free parameters. That is, we maximized for each subject the log-likelihood function $\sum_t \log P_i^{S_i(t)}(t)$. Standard errors were estimated through a jackknife procedure (Camerer and Ho, 1999; Zhu et al., 2012).
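The softmax choice rule, the log-likelihood, and a grid search over the free parameters can be sketched as below. This is a minimal illustration for the Strong role under stated assumptions: initial values are set to zero with N(0) = 1 rather than being calculated from first-period data as in the study, and the grid passed to `grid_search` is hypothetical. None of the names come from the original analysis code.

```python
import math
from itertools import product

PRIZE, STRONG_ENDOWMENT = 10, 5


def strong_payoff(own, other):
    return (STRONG_ENDOWMENT - own) + (PRIZE if own > other else 0)


def softmax(values, lam):
    """Choice probabilities from latent values; subtracting the max avoids overflow."""
    top = max(values)
    exps = [math.exp(lam * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(choices, opp_choices, phi, delta, rho, lam, n0=1.0):
    """Sum over rounds of the log probability of the observed choice."""
    values = [0.0] * (STRONG_ENDOWMENT + 1)   # assumption: flat initial values
    n_prev = n0                               # assumption: N(0) = 1
    total = 0.0
    for own, opp in zip(choices, opp_choices):
        total += math.log(softmax(values, lam)[own])
        n_new = rho * n_prev + 1.0
        values = [
            (phi * n_prev * v + (1.0 if k == own else delta) * strong_payoff(k, opp)) / n_new
            for k, v in enumerate(values)
        ]
        n_prev = n_new
    return total


def grid_search(choices, opp_choices, grid):
    """Return (best log-likelihood, best parameters) over the Cartesian grid."""
    best = None
    for phi, delta, rho, lam in product(grid["phi"], grid["delta"], grid["rho"], grid["lam"]):
        ll = log_likelihood(choices, opp_choices, phi, delta, rho, lam)
        if best is None or ll > best[0]:
            best = (ll, {"phi": phi, "delta": delta, "rho": rho, "lam": lam})
    return best
```

A grid search is slow compared with gradient-based optimizers, but it does not depend on starting values and makes no smoothness assumptions about the likelihood surface.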

### **RESULTS**

Our primary hypothesis is that elderly adults will exhibit slower adaptation in strategic learning than young adults. That is, elderly adults will be less responsive to the actions of opponents in terms of choice behavior. Furthermore, using our computational paradigm, we aim to distinguish between the contributions of three non-mutually exclusive computational accounts of any observed age-related changes. First, we test whether older adults employ less belief-based learning, and rely more upon simpler RL. This would suggest that behavioral differences are caused by not taking complete account of the information available in the decision context. Second, we test the hypothesis that older adults may be less sensitive to recent in-game experiences. This will be reflected in the estimated values for parameter ϕ, and can intuitively capture the notion that older adults are more "sluggish" in their adjustment process (Kovalchik et al., 2005). Third, we test whether older adults exhibit stronger pre-game prior beliefs, captured by the parameter ρ, which would suggest that they are more "stubborn" in the sense that their pre-game prior beliefs decay more slowly. These hypotheses and a discussion of the different parameters are summarized in **Table 3**.



#### **MODEL-FREE MEASURES**

We begin with simple model-free comparisons of choice behavior across age groups. **Table 2** presents the empirical frequencies of choices for each age group separated by player role. To provide a benchmark for this comparison, we also include Nash equilibrium choice probability predictions. The unique Nash equilibrium prediction is that strong players should invest five 60% of the time and one and three 20% of the time each, and that weak players should invest zero 60% of the time and two and four 20% of the time each. As shown in **Table 2**, young subjects on average were reasonably close to the Nash equilibrium prediction, with the exception of overinvesting 4 and underinvesting 3 as strong players, whereas the distribution of choices made by elderly strong subjects was further from the Nash equilibrium prediction, with choices more evenly distributed over investing 2, 3, 4, and 5. As weak players, both elderly and young subjects overinvested 0 and underinvested 4. However, elderly subjects also overinvested 1, which is an iteratively dominated strategy for the weak role. A test of the proportion of deviation showed that the elderly cohort deviated from the Nash equilibrium significantly more than did the young cohort (*p* < 0.01).
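The mixed-strategy equilibrium quoted above can be verified from the payoff rule alone: against the opponent's equilibrium mixture, every investment in a player's own support must earn the same expected payoff, and no other investment may earn more. The sketch below performs that check with exact fractions; the names `NASH` and `expected_payoffs` are illustrative.

```python
from fractions import Fraction

PRIZE = 10
ENDOWMENT = {"strong": 5, "weak": 4}

# Equilibrium mixtures as stated in the text.
NASH = {
    "strong": {1: Fraction(1, 5), 3: Fraction(1, 5), 5: Fraction(3, 5)},
    "weak": {0: Fraction(3, 5), 2: Fraction(1, 5), 4: Fraction(1, 5)},
}


def payoff(role, own, other):
    return (ENDOWMENT[role] - own) + (PRIZE if own > other else 0)


def expected_payoffs(role):
    """Expected payoff of each investment against the opponent's equilibrium mixture."""
    opponent = "weak" if role == "strong" else "strong"
    return {
        own: sum(p * payoff(role, own, other) for other, p in NASH[opponent].items())
        for own in range(ENDOWMENT[role] + 1)
    }


if __name__ == "__main__":
    for role in ("strong", "weak"):
        print(role, {k: str(v) for k, v in expected_payoffs(role).items()})
```

Under these mixtures the Strong player earns an expected 10 from each of 1, 3, and 5 and less from every other investment, while the Weak player earns an expected 4 from each of 0, 2, and 4 and only 3 from investing 1 or 3.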

To examine the "stickiness" of choices between successive rounds, we computed the instances where participants switched investment levels versus those where they did not. This gives us an index of the proportion of rounds in which participants switched strategies, versus those rounds in which they stayed (**Figure 2**). We found that young subjects on average repeated their investment in 44% of the choices over the course of the experiment, which is remarkably similar to the Nash equilibrium prediction. In contrast, we found that elderly subjects repeated previous investments at a much higher rate (60%), significantly more often than young adults (*p* < 0.05).
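The stay/switch index can be sketched as follows, together with one way to obtain a Nash benchmark. The benchmark assumes that equilibrium play is an independent draw from the same mixture in every round, which is an assumption made here for illustration; under it, the probability of repeating a choice is the sum of squared mixing probabilities. The function names are hypothetical.

```python
from fractions import Fraction


def stay_rate(choices):
    """Proportion of successive round pairs in which the investment was repeated."""
    pairs = list(zip(choices, choices[1:]))
    if not pairs:
        raise ValueError("need at least two rounds")
    return sum(prev == cur for prev, cur in pairs) / len(pairs)


def iid_repeat_probability(mixture):
    """Chance that two independent draws from `mixture` coincide."""
    return sum(p * p for p in mixture.values())


# Equilibrium mixture for the Strong role, as stated in the text.
STRONG_NASH = {1: Fraction(1, 5), 3: Fraction(1, 5), 5: Fraction(3, 5)}

if __name__ == "__main__":
    print(float(iid_repeat_probability(STRONG_NASH)))
    print(stay_rate([5, 5, 3, 3, 3, 1]))   # made-up sequence, for illustration only
```

Because the Weak role's equilibrium mixture has the same probabilities (60%, 20%, 20%), the same benchmark applies to both roles under this assumption.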

#### **MODEL-BASED MEASURES**

To provide a mechanistic account of the differences as measured using the model-free measures, we next fitted choice behavior of our participants using the EWA model. First, we compared the mean goodness of fit of the model across both cohorts. A significant difference may suggest that comparisons of estimated parameters are biased due to different explanatory powers of the model. We found, however, that the mean log-likelihood values did not differ significantly between the young and old cohorts (**Table 3**, *p* > 0.2). This suggests that our computational model is able to capture trial-by-trial variations in behavior at a similar rate for both cohorts.

Next, we compared the mean values of the individual-level parameter estimates (**Table 3**). We found that the mean estimate for parameter δ was significantly lower for the elderly than for the young (0.48 for young, 0.28 for elderly, *p* < 0.05), indicating that the elderly on average rely less on belief-based learning and more on reinforcement learning. This is in line with neuroimaging findings that there may be significant loss of gray matter volume in the mPFC, a region implicated in belief-based learning in our previous study (Zhu et al., 2012).

In contrast, we found that mean estimates for both types of discount rates did not differ significantly between young and elderly cohorts. Both young and elderly cohorts were estimated to have a similar value of ϕ (0.95 for the young; 0.89 for the elderly, *p* > 0.1), suggesting that both groups responded smoothly to past in-game experience. Similarly, both groups also discounted pre-game beliefs, captured by parameter ρ, at similar levels (*p* > 0.1, **Table 3**). Neither finding can be explained by differences in the learning environment, as both cohorts faced the same pool of opponents.

Motivated by findings in the aging literature that aging increases variability of behavioral responses (Samanez-Larkin et al., 2010), we next investigated individual differences in the learning parameters using our model. We therefore compared the empirical cumulative distributions of the model fit and parameter estimates across the cohorts. We found that the model fits of the two cohorts, as measured by the log-likelihood value, are distributed similarly, such that there is no indication of increased variance or clustering in the elderly cohort (**Figure 3A**). Similarly, we found that the two discounting parameters are also similarly distributed across age cohorts. That is, there was no indication of increased variance in either of the discounting parameter estimates (**Figures 3C,D**).



<sup>1</sup>Parentheses indicate SEM.

<sup>2</sup>Both t-test and Kolmogorov–Smirnov test p-values are given, with those corrected for multiple comparisons (three tests) given in parentheses.

In contrast, the mediation parameter δ showed that behavior in approximately half of the elderly cohort was driven entirely by RL (**Figure 3B**). In light of the non-normal distribution of the δ parameter estimates, we also included significance tests using the Kolmogorov–Smirnov (K–S) test, a non-parametric, distribution-free test that is robust to violations of normality (**Table 3**). We found that all results using the K–S test are consistent with those using the *t*-test, and show highly significant differences in the δ parameters between age cohorts (*p* < 0.001). Interestingly, the remainder of the elderly cohort appears to be distributed similarly to the young adults, as can be seen in the upper half of the distribution (**Figure 3B**). There was, however, no indication that the group of pure reinforcement learners differed on other dimensions. Surprisingly, we found no differences in either demographic measures or other model estimates between these two groups. The only measure that approached significance was the value of ρ. The low δ group had a slightly higher mean ρ (0.95) than the high δ group (0.70), a difference that is significant at the *p* < 0.1 level. This lack of differentiation, however, may well be due to a lack of power, given the relatively modest sample size and the restricted age range of the elderly cohort.
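To illustrate why the K–S test suits a clustered parameter such as δ, the sketch below computes the two-sample K–S statistic, the largest vertical gap between two empirical cumulative distribution functions. It makes no normality assumption because it uses only the ordering of the observations. The numbers in the usage lines are made up for illustration and are not the study's estimates; the function names are hypothetical, and no p-value is computed here.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of observations in `sorted_sample` that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over observed x."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(abs(ecdf(a_sorted, x) - ecdf(b_sorted, x)) for x in points)


if __name__ == "__main__":
    # Hypothetical values only: one group clustered at zero, one spread out.
    clustered = [0.0, 0.0, 0.0, 0.5]
    spread = [0.4, 0.5, 0.6, 0.7]
    print(ks_statistic(clustered, spread))
```

A mass of estimates at exactly zero in one group produces a large jump in that group's empirical distribution, which the statistic picks up even when group means are not far apart.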

### **DISCUSSION**

We spend much of our lives devoted to the accumulation of financial and social prosperity, often with much success. To take just one measure, the median net worth of a 65-year-old American in 2007 was more than double that of a 40-year-old (Bucks et al., 2009). For many, however, such wealth comes at a vulnerable time when the cognitive and neurological apparatus that made this possible is beginning to break down (Plassman et al., 2008). This vulnerability can be attributed in part to a decline in the ability to make decisions that take into account the appropriate cost-benefit tradeoffs. Often these decisions take on a social dimension, where the elderly appear particularly vulnerable. For example, it is well known that the elderly are disproportionate targets of fraud across the world, and constitute a conservatively estimated 30% of all fraud victims in the United States (Templeton and Kirkman, 2007; Bucks et al., 2009).

An understanding of the neurocognitive substrates of these vulnerabilities therefore depends upon the availability of neuropsychological tools that can be used to probe and characterize such decision-making deficits, particularly those that contain a social component such as susceptibility to fraud. This work makes two contributions toward this goal. First, we show that our novel social learning paradigm was able to probe behavioral differences between cohorts, as well as individual differences within cohorts. We found that, in contrast to young adults, learning in about half of the elderly adults is driven primarily by RL. Future studies can explore the degree to which such changes are present in other social and non-social settings that require higher-order cognition, such as cooperative interactions (King-Casas et al., 2005; Chang et al., 2011), or those involving explicit task structure (Ribas-Fernandes et al., 2011; Simon and Daw, 2011). Second, using a well-established computational model of strategic learning, we were able to dissociate between two possible sources of the observed differences. Previous accounts have largely focused on qualitative descriptions of "sluggishness" in the adjustment of behavior. However, such behavior can be a result of either (1) an inability to integrate new information into one's reward expectations by discounting previous experiences, captured by the two different discounting parameters, or (2) an attenuation of the ability to extract or integrate information beyond the received rewards and punishments. We show that the behavior is primarily driven by the latter, and in particular by an attenuation of the ability to observe or integrate the actions of others or counterfactual outcomes. In contrast, elderly individuals did not show significant differences in the two types of discounting of past experiences.

More broadly, our results potentially shed light on contradictory findings in previous psychological and economic studies of age-related effects in relation to social behavior. In particular, a number of studies have suggested a decline in theory-of-mind ability with normal aging (McKinnon and Moscovitch, 2007; Slessor et al., 2007). Yet others found that older adults actually performed better than younger adults, even in the face of possible decline in many forms of cognitive processing (Happé et al., 1998; Grossmann et al., 2010). Using a behavioral economic approach similar to ours, Kovalchik et al. (2005) compared the strategic reasoning ability of young and healthy elderly subjects using the so-called "*p*-beauty contest." This task has been widely used in previous behavioral studies using traditional undergraduate subjects as well as non-standard populations such as business executives and portfolio managers (Nagel, 1995; Duffy and Nagel, 1997). Surprisingly, they found no significant difference between the healthy elderly (mean age 82) and young undergraduate participants.

Combined with our results, however, these results suggest that the diminished reliance on mentalization and/or counterfactual information to *dynamically update* behavior may reflect core changes in the cognitive processing of social information that occur over the aging process. This is as opposed to strategic reasoning, which refers to the *static* inferential process of guessing what others will do without any prior contractual agreement, and which may well be preserved during aging. This hypothesis is consistent with previous findings of behavioral deficits in elderly patients in the "Iowa Gambling Task" (IGT; Bechara et al., 2000), as well as with what is known about degeneration of the dopaminergic circuits and mPFC that support value-based decision-making. In particular, longitudinal studies have found that the frontal lobes suffered the most drastic loss of volume as assessed through MRI (Resnick et al., 2003).

In contrast, we speculate that static reasoning capacities in the elderly may be partially preserved by reallocation of processing resources from other brain regions. Such compensatory processes at the neural level have been found across a variety of cognitive functions, including episodic retrieval and visual perceptual attention, and occur even in the face of global declines in neural integrity (Davis et al., 2008). For example, there is abundant evidence that older adults compensate for declines in bottom-up sensory processing by over-recruitment of top-down processes mediated by PFC (Davis et al., 2008; Dennis and Cabeza, 2008).

In the case of social cognitive functioning, there is substantial evidence that, during development, the so-called "mentalizing system" – consisting of the anterior mPFC, the posterior superior temporal sulcus at the temporoparietal junction (pSTS/TPJ), and the anterior temporal lobe (ATL) – undergoes substantial changes in its functional response to social information such as mental states (Paus, 2005; Blakemore, 2008; Burnett et al., 2009). Adolescents have been shown, for example, to exhibit greater activity within the mPFC than do adults in social cognition tasks (Burnett et al., 2009). In contrast, we know much less about how neural responses in social cognition and behavior change over adulthood (Castelli et al., 2010; Beadle et al., 2012; Moran et al., 2012), and whether these changes might serve compensatory functions like those documented for other cognitive functions. A more complete account of these age-related changes, however, is only possible with a proper characterization of the computational and structural integrity of the underlying neural systems and their interactions.

### **ACKNOWLEDGMENTS**

This research was supported by the Risk Management Institute and a CEDA Pilot grant from the University of California, Berkeley (Ming Hsu).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2012; accepted: 19 August 2012; published online: 21 September 2012.*

*Citation: Zhu L, Walsh D and Hsu M (2012) Neuroeconomic measures of social decision-making across the lifespan. Front. Neurosci. 6:128. doi: 10.3389/fnins.2012.00128*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Zhu, Walsh and Hsu. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Social information and economic decision-making in the ultimatum game

## **Celia Gaertig<sup>1,2</sup>, Anna Moser<sup>1,3</sup>, Sonia Alguacil<sup>1</sup> and María Ruz<sup>1</sup>\***

<sup>1</sup> Department of Experimental Psychology, University of Granada, Granada, Spain

<sup>2</sup> Laboratory for Biological and Personality Psychology, Department of Psychology, University of Freiburg, Freiburg, Germany

<sup>3</sup> Department of Cognitive, Perceptual and Brain Sciences, University College London, London, UK

#### **Edited by:**

Ming Hsu, University of California Berkeley, USA

#### **Reviewed by:**

George Christopoulos, Nanyang Technological University, Singapore

Claudia Civai, University of Minnesota, USA

#### **\*Correspondence:**

María Ruz, Department of Experimental Psychology, University of Granada, Campus Universitario Cartuja s/n, 18071 Granada, Spain. e-mail: mruz@ugr.es

The present study tested how social information about the proposer biases responders' choices of accepting or rejecting real monetary offers in a classic ultimatum game (UG) and whether this impact is heightened by the uncertainty of the context. Participants in our study played a one-shot UG in which their responses had direct consequences on how much money they earned. We used trait-valenced words to provide information about the proposers' personal characteristics. The results show higher acceptance rates for offers preceded by positive words than for those preceded by negative words. In addition, the impact of this information was higher in the uncertain than in the certain context. This suggests that when deciding whether or not to take money from someone, people take into account what they know about the person they are interacting with. Such non-rational bias is stronger in an uncertain context.

**Keywords: ultimatum game, decision-making, social information, uncertainty**

## **INTRODUCTION**

Within the emerging field of judgment and decision-making, it is broadly accepted that humans are not purely rational decision-makers (see Camerer, 2003). A recent line of research within this field tries to capture the nature of decision-making in social contexts. This is particularly interesting as many of our everyday choices involve or affect other people.

Regarding the aspects that influence such decisions, different studies stress the importance of emotions as a biasing factor. It has been shown that displayed positive and negative facial expressions (e.g., Scharlemann et al., 2001; Ruz and Tudela, 2011) as well as induced emotions unrelated to the task (Harlé and Sanfey, 2007) influence decision-making in inter-personal interactions. Further aspects that have been found to influence decision-making include the physical attractiveness and the gender of people with whom we interact (e.g., Solnick and Schweitzer, 1999; Solnick, 2001; Eckel and Grossman, 2008).

As another clear biasing factor, social information has been shown to have an effect on economic choices in social contexts with high degrees of uncertainty, such as those in the Trust Game (Delgado et al., 2005). In an iterated Trust Game, trading partners described as morally praiseworthy were trusted more often than those with neutral or untrustworthy moral character, even when the descriptions had no predictive value regarding the actual behavior of the partners. As the reciprocity of the unknown partner in this game has direct consequences on the monetary outcome of the truster, it seems useful to rely on any relevant information available to guide trust choices. This matches previous data in non-social contexts showing that uncertainty increases the value of information. For example, people might be more disposed to being influenced when they lack complete knowledge of the situation (e.g., Behrens et al., 2007; Rushworth and Behrens, 2008) and might try to make use of any additional piece of information they can gather. As a practical example, when making decisions on the stock market, investors facing unstable prices are more receptive to new tips than during stable periods (Schachter et al., 1985).

There are other social situations in which the degree of uncertainty is smaller than in the Trust Game, such as when making choices of accepting or rejecting offers in the Ultimatum Game (UG; Güth et al., 1982). In this game two people interact to divide a sum of money between them. One of them, the *proposer*, receives a certain amount of money. He has to split it into two parts, one for him and one for his counterpart, the *responder*. The responder then can either accept or reject the proposal. If he accepts it, both receive their part; if he rejects it, neither the proposer nor the responder gets any payoff. For the responder the degree of uncertainty of this situation is low, seeing that he reacts to a decision the proposer has already made in the form of a monetary offer.

From the economic point of view, the self-interested, income-maximizing *homo economicus* should accept every kind of offer, no matter how small it is (Nash, 1950). However, such predicted behavior is not confirmed in experimental settings, where small offers (20% of the initial amount or less) are rejected about half of the time (Camerer, 2003). Irrational rejection of unfair offers in the UG may be explained by several factors, such as inequity aversion (Fehr and Schmidt, 1999) and emotions accompanying the perception of unfairness. Responders often feel wounded pride and anger when facing unfair offers and tend to punish their selfish game partner, favoring emotional satisfaction over monetary gains (Pillutla and Murnighan, 1996). Physiological (van't Wout et al., 2006) and neuroimaging studies (Sanfey et al., 2003) support the important role of emotions in the UG, showing, for example, that emotionally relevant brain regions, such as the right anterior insula, are activated when participants are faced with unfair offers (Sanfey et al., 2003).
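The rules of the UG and the income-maximizing benchmark can be stated compactly. The sketch below is illustrative only: the function names and the example amounts are assumptions, not taken from the study.

```python
def ultimatum_payoffs(total, offer, accept):
    """Return (proposer payoff, responder payoff) for one round of the UG."""
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and the total amount")
    if accept:
        return total - offer, offer    # the proposed split is carried out
    return 0, 0                        # a rejection leaves both players with nothing


def income_maximizing_response(offer):
    """A purely self-interested responder accepts any positive offer.

    At an offer of exactly zero the responder is indifferent; this sketch
    treats that case as a rejection, which is an arbitrary choice.
    """
    return offer > 0


if __name__ == "__main__":
    total, offer = 10, 2               # hypothetical amounts: an 'unfair' 20% offer
    accept = income_maximizing_response(offer)
    print(accept, ultimatum_payoffs(total, offer, accept))
```

Rejecting the 20% offer in this example costs the responder 2 units and the proposer 8, which is why rejection is usually interpreted as costly punishment rather than income maximization.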

As the responder in the UG finds himself in a situation where the decision of the proposer has already taken place, there is no obvious reason why his choices should change depending on the information he has about the person he interacts with. However, even in such a certain context social information about the proposer seems to influence decision-making.

Using a modified version of the UG, Ruz et al. (2011) showed that personal descriptions of game partners biased decisions to the same set of offers. Offers preceded by negative words were rejected with higher probability than those preceded by positive words. In addition, rejection responses were faster after negative words, whereas acceptances were faster following positive words, which suggests that the social information primed action tendencies. Thus, even though the words in no way predicted how fair the following offer was going to be, social information regarding the partner affected the decisions of participants in this game. Furthermore, these authors introduced a manipulation of the uncertainty of the social situation and found that the social bias had a much larger effect when the context of the game was uncertain. Some characteristics of this study, however, limit the scope of the results. First, Ruz et al. (2011) used a modification of the UG instead of the original task setting. In their version, the difference between the two parts of a split was either one (fair offers) or four (unfair offers) and the responder's part of the split could be either higher or lower than their partner's amount. Thus, offers could be either convenient or inconvenient for the participant. Furthermore, to enable measurement of response times, they required participants to make their decision within a time limit of 1500 ms. When participants did not respond on time, they saw a message stating that the higher amount of the split would be added to their partner's earnings. This leaves open the question of whether similar results would be obtained in a version closer to the classic UG. Additionally, they did not pay real money to participants, and thus it could be claimed that the social information biased responses because participants did not have anything at stake.

Two recent studies address some of these problems. Campanhã et al. (2011) used the classic UG and demonstrated that friendship with the proposer modulated the choices made by the responder in the game. More specifically, responders rejected unfair offers less frequently when the proposer was believed to be a friend rather than an unknown person. However, as several rounds were played with the same partner, it is not certain that the responders' choices reflected responses to a single offer instead of bargaining behavior. It must also be noted that no real money was offered to participants. Furthermore, interactions with a friend can always be affected by the long-term relationship we hold with this person, which may have affected the results found by Campanhã et al. (2011).

Another study (Marchetti et al., 2011) showed that the type of information about the proposer provided to the responder has an influence on decision-making in the UG. Most interestingly, they found an interaction between the psychological description of the partner and the fairness of the offer: a negative (selfish) description of the proposer led to a decreased acceptance rate of fair offers, while a positive (generous) description led to an increased acceptance rate of unfair offers. As this study employed one-shot interactions with unknown partners, concerns regarding long-term interactions or even friendship do not arise. However, as in the game by Ruz et al. (2011) and in the study of Campanhã et al. (2011), participants in this study did not receive money in accordance with their decisions.

Thus, it still has to be tested whether social information regarding the partner in a classic UG biases people's decisions to offers of real money, which was the goal of the present study. We conducted a classic computerized, one-shot UG and used trait-valenced words to describe the moral characteristics of otherwise unknown partners. As previous results indicate that the level of uncertainty modulates the scope of the biasing information in a modified UG (Ruz et al., 2011), we included a manipulation of uncertainty to explore whether the level of uncertainty also affects responses in the classic UG.

Participants of our experiment played the role of the responders and received either fair or unfair offers from several different proposers represented by the computer. Following the findings of the classic UG it was hypothesized that more fair than unfair offers would be accepted. As a description of the partners' characteristics in each trial, each offer was preceded by a word with positive or negative valence highly linked to morality and trustworthiness (see **Table 1**). It was hypothesized that acceptance rates would be biased by this social information, with higher acceptance rates for offers preceded by positive than for those preceded by negative words. Additionally, participants had either full or incomplete information about the outcome of their choices, which modulated the uncertainty of the game. In the certain condition, participants were informed of which part of the split corresponded to them, while this information was not given in the uncertain condition. It was hypothesized that the influence of the valence of the word would be higher in the uncertain context. Experiment 1 confirmed these hypotheses. Experiment 2 showed that personal information only influenced choices when it was *attributed* to the partners in the game.

#### **EXPERIMENT 1**

Experiment 1 manipulated the level of uncertainty (uncertain vs. certain), the type of offer proposed by the partner in the trial (fair vs. unfair) and the valence of the word preceding the offer (positive vs. negative).

**Table 1 | List of words selected as stimuli in the study and acceptance rates of offers depending on the word that preceded the offer.**

Standard deviations are given in parentheses.

## **MATERIALS AND METHODS**

#### **Participants**

Thirty-six native Spanish-speaking, right-handed students from the University of Granada participated in the study (23 female, 18–27 years, average 21.5). All participants signed a consent form approved by the Department of Experimental Psychology of the University of Granada. In exchange for their participation in the study, participants were paid. The payment amount depended on their earnings during the game task and ranged from about 3–6 Euros.

#### **Stimuli**

Sixteen trait-valenced words were selected from the Spanish translation of the Affective Norms for English Words database (ANEW; Redondo et al., 2007) as stimuli in the study. The words selected provided moral and trustworthiness information and had either a positive (on average 7.5, *SE* = 0.5) or a negative valence (on average 1.9, *SE* = 0.3). Positive and negative words were equated regarding number of letters and frequency of use (average number of letters: 6; average frequency: 21.72; all *p*s > 0.390). The English translation of the words used is listed in **Table 1**.

#### **Procedure**

First, an introduction explaining the rules of the UG was given to the participants. They were told that they were going to play the UG in the role of the responder and were going to receive offers that other participants had made in previous experiments of the lab. To enhance the plausibility of this cover story, participants completed a short questionnaire in which they themselves generated offers for 16 anonymous partners. For each partner they were asked to decide how to divide 10 Euros into two parts, one for them and the other one for their partner.

Before starting the game, participants were informed that they were playing with actual money, in order to strengthen their motivation to make real decisions. They were informed that each point earned in the game would be exchanged for 1.5 Euro cents. To avoid possible influences of previous reciprocation, they were told that on each trial they were going to play with a different partner, who was never the same across the game.

The initial amount of the proposer was always 10 Euros, and the split proposed to the responder was presented in the middle of the screen. To every participant, the same set of splits including two kinds of fair offers (5/5, 4/6) and three different kinds of unfair offers (1/9, 2/8, 3/7) was presented. These types of offers match the range of offers that humans normally propose in the role of the proposers in the UG.

In total, participants received 128 offers and they had to accept or reject each of them by pressing the number 1 or 2 (counterbalanced across participants) of the computer keyboard. If they accepted the offer, their part of the split was added to their earnings and their partner for the trial received the other part. If they rejected the offer, no transaction was carried out.
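The payoff rule just described can be summarized in a short sketch. It covers only the certain block, where the responder's part of the split is known, and uses the exchange rate of 1.5 Euro cents per point stated above. The function names and the three example decisions are hypothetical and serve only as illustration.

```python
# Minimal sketch of the responder's payoff rule (certain block only).
# Function names and the example decisions are hypothetical.
CENTS_PER_POINT = 1.5  # from the text: one point was exchanged for 1.5 Euro cents


def responder_points(split, accepted):
    """Points earned by the responder on a single trial.

    `split` is (responder_part, proposer_part), e.g. (3, 7).
    A rejected offer means no transaction is carried out.
    """
    responder_part, _proposer_part = split
    return responder_part if accepted else 0


def points_to_euros(points):
    """Convert accumulated game points to Euros."""
    return points * CENTS_PER_POINT / 100


# Hypothetical example: accept 5/5 and 4/6, reject 1/9.
decisions = [((5, 5), True), ((4, 6), True), ((1, 9), False)]
total_points = sum(responder_points(s, a) for s, a in decisions)
total_euros = points_to_euros(total_points)
```

Under this rule, rejecting any offer above zero lowers the responder's earnings, which is why rejections of unfair offers are usually read as costly punishment.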

Additionally, each offer was preceded by a word. Participants were told that the words represented personal descriptions of their partner and that these characteristics had been obtained through questionnaires completed by the same participants that made the offer proposals in previous experiments. The same words were presented in a different random order to each participant. Half of the words had a positive valence and the other half had a negative one (see **Table 1**). In reality, the valence of the words was not related in any way to the type of offer, as each word was followed equally often by fair and unfair offers.
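One way to build a trial list in which each word is followed equally often by fair and unfair offers is sketched below. The text specifies only the 16 words, the five offer types, the two blocks and the total of 128 trials. The number of repetitions per cell (two), the random choice of the specific offer within a fairness category, and the placeholder word labels are assumptions made for this illustration, not the authors' documented procedure.

```python
# Sketch of a balanced word/offer pairing, under stated assumptions.
# Only the offer types, the 16 words, the two blocks and the 128 trials
# come from the text; everything else here is hypothetical.
import random
from collections import Counter

FAIR = [(5, 5), (4, 6)]
UNFAIR = [(1, 9), (2, 8), (3, 7)]


def build_trials(words, reps_per_cell=2, seed=0):
    """Return a trial list where every word precedes fair and unfair
    offers equally often within each block (assumed: 2 per cell)."""
    rng = random.Random(seed)
    trials = []
    for block in ("certain", "uncertain"):
        block_trials = []
        for word in words:
            for fairness, pool in (("fair", FAIR), ("unfair", UNFAIR)):
                for _ in range(reps_per_cell):
                    block_trials.append({
                        "block": block,
                        "word": word,
                        "fairness": fairness,
                        "offer": rng.choice(pool),  # assumed random within category
                    })
        rng.shuffle(block_trials)  # random order within each block
        trials.extend(block_trials)
    return trials


words = [f"word_{i:02d}" for i in range(16)]  # placeholders for the 16 trait words
trials = build_trials(words)
pairings = Counter((t["word"], t["fairness"]) for t in trials)
```

With 16 words, 2 fairness levels, 2 blocks and 2 repetitions, the list contains 128 trials, and `pairings` shows the same count for every word and fairness level, so word valence carries no information about the upcoming offer.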

To test whether the uncertainty of the context influenced decision-making in the classic version of the UG, the whole task consisted of two blocks (order counterbalanced across participants) differing in the information provided to participants. In the *certain* block, as in the original game, participants knew which part of the split would be added to their earnings if they accepted the offer. To this end, for each split the two numbers were presented in different colors (green and red) and participants were told which of the colors corresponded to them (counterbalanced across participants). In accordance with the range of offers normally proposed by humans in the game, the participants' part of the split was always the smaller or equal (5/5 offer) number. In the *uncertain* block, in contrast, participants did not know which part of the split would be added to their earnings, as both numbers were presented in black.

The experiment was conducted using a PC running E-Prime software (Schneider et al., 2002). Each trial (see **Figure 1**) started with a fixation cross presented for 1500 ms (+ ; 0.4˚) in the center of the screen. Following this, the word (average 2.5˚) was displayed for 200 ms, and then, the fixation point was presented for another 700 ms. Subsequently, the offer (1.5˚), consisting of two numbers separated by a slash symbol, appeared in the center of the screen until the participant made the response. Following the decision of the participant, the next trial began. The whole experiment consisted of 128 trials and had an approximate duration of 10 min.

#### **RESULTS**

The acceptance rate, measured as the percentage of accepted offers, was analyzed by a 2 (uncertainty: uncertain vs. certain) × 2 (offer: fair vs. unfair) × 2 (valence of the word: positive vs. negative) multifactorial ANOVA.

On average participants accepted 62.1% of all offers across the experiment (**Figure 2**). There was a main effect of the fairness of the offer (*F*<sub>1,35</sub> = 86.25, *p* < 0.001), as fair offers were accepted more often (*M* = 85%, *SE* = 17%) than unfair ones (*M* = 39%, *SE* = 25%). Decisions were also influenced by the valence of the word (*F*<sub>1,35</sub> = 33.64, *p* < 0.001). Offers were accepted more often when they were preceded by positive words (*M* = 74%, *SE* = 16%) than by negative words (*M* = 50%, *SE* = 23%). **Table 1** shows the acceptance rates of offers separately for each trait-valenced word.

In addition, a significant interaction between uncertainty and the fairness of the offer was found (*F*<sub>1,35</sub> = 12.38, *p* < 0.01). In the certain condition, the effect of the offer was higher (51%) than in the uncertain condition (39%). Furthermore, unfair offers were accepted more often in the uncertain (44%) than in the certain context (35%; *F*<sub>1,35</sub> = 9.65, *p* < 0.01), whereas there was no significant difference between the acceptance rates of fair offers in the two contexts (83 vs. 86%, *F*<sub>1,35</sub> = 3.13, *p* = 0.086). Finally, as predicted there was a significant interaction between uncertainty and valence of the word (*F*<sub>1,35</sub> = 8.01, *p* < 0.01). The effect of the valence of the word was higher in the uncertain (28%) than in the certain condition (20%).
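The "effect of the offer" in each context is the difference between the acceptance rates of fair and unfair offers, and the interaction compares these two differences. The sketch below recomputes them from the cell means reported above. It assumes that the pair "83 vs. 86%" is ordered uncertain then certain, which is the ordering under which the reported effects of 51% and 39% are reproduced. Variable names are hypothetical.

```python
# Recompute the fairness effect per context from the reported cell means
# of Experiment 1 (percent accepted). Assumption: in "83 vs. 86%" the
# first value belongs to the uncertain and the second to the certain context.
acceptance = {
    ("certain", "fair"): 86,
    ("certain", "unfair"): 35,
    ("uncertain", "fair"): 83,
    ("uncertain", "unfair"): 44,
}


def fairness_effect(context):
    """Fair minus unfair acceptance rate within one context."""
    return acceptance[(context, "fair")] - acceptance[(context, "unfair")]


effects = {c: fairness_effect(c) for c in ("certain", "uncertain")}

# The interaction contrast is the difference between the two effects.
interaction_contrast = effects["certain"] - effects["uncertain"]
```

The shrinkage of the fairness effect under uncertainty comes mostly from the unfair cells (35% vs. 44%), in line with the simple-effects tests reported in the text.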

In the analysis described above all offers are included (5/5, 4/6, 3/7, 2/8, and 1/9) for both certain and uncertain contexts. The experiment included the fair offer 5/5 to mimic the range of offers normally proposed in the classic UG. Nevertheless, it is clear that when facing this offer participants knew that they would get five points, both in the certain and uncertain blocks. Therefore, we conducted a second analysis excluding the 5/5 offers to ensure that our main results were not affected by this. This analysis replicated all previous results.

### **DISCUSSION**

The results of Experiment 1 confirm our initial hypotheses. First of all and in agreement with typical findings in the UG (Camerer, 2003), the fairness of the offer influenced participants' choices in a major way as fair offers were accepted more often than unfair offers.

With regard to the main question of interest of this study, our results show that trait-valenced words influence decision-making in the classic UG in which the decisions of participants influenced how much money they earned. Offers preceded by positive trait-words were accepted more often than those preceded by negative trait-words. Crucially, we show that this bias exists in a classic UG in which long-term effects were not present, as participants were interacting only once with a partner not personally known to them. Furthermore, participants in our study knew that their aggregate decisions across trials determined the amount of money that they were going to earn. Therefore, they were motivated to take their decisions seriously. Nevertheless, their choices of accepting or rejecting an offer were influenced by the information they had about their respective partners on each trial. Thus, they did not act purely rationally, but took into account what they knew about the other person when deciding whether to accept the money. The social information biased choices even though, objectively, trait-valenced words and offers were not associated, insofar as positive and negative words were paired with the same set of offers. Given this lack of association between words and offers, a learning effect can be excluded.

Two aspects might have influenced participants' reactions in our study using the classic UG. First, it is possible that the subjective perception of fairness for all kinds of offers was biased by the social information provided. Offers made by a negatively described person might have been perceived as less fair than those made by a positively described person. Research on the UG suggests that responders in the UG reject unfair offers due to the perceived unfairness and the negative emotions arising from this perception (Pillutla and Murnighan, 1996; Sanfey et al., 2003). When a negative description was provided, responders' attention may have focused on the negative aspects of the offer (e.g., the proposer assigns more to himself than to me) rather than on the possible gains, as they were already expecting an unfair offer. In addition to, and consistent with, a biased subjective perception, negative emotional reactions may have been stronger in this condition, which would have led to lower acceptance rates. In the future, it could be useful to include emotional measures such as skin conductance, to directly evaluate the role that emotional reactions may play in the current paradigm.

A complementary explanation places the bias in a later stage of decision-making. From this perspective, offers would be perceived in the same manner, and the biasing effect would take place afterward, during the decision whether or not to punish the partner. Participants punished partners tied to a negative description more than those associated with positive characteristics. This explanation is consistent with real life experience, as people normally behave more nobly toward friendly persons, even when they are unlikely to meet the same person again. We will take a closer look at the question of how to decide between these two approaches in the general discussion.

In addition to the offer fairness and the valence of the words, we manipulated the certainty of the context (uncertain vs. certain) as a third independent variable. In the uncertain context responders lacked the information about which part of the offer was assigned to them. Therefore, they could not judge whether the offer was advantageous for them or not, which is of particular relevance in the face of unequal splits.

We found that offer fairness interacted with uncertainty, as unfair offers were accepted more frequently in the uncertain than in the certain context, while the acceptance rate of fair offers did not differ between the two conditions. The higher acceptance rate of unfair offers in the uncertain condition was not predicted. However, the limited possibility of objectively judging offers as convenient or inconvenient and thus, a possible lower arousal of negative emotions in the face of unfair offers, might explain this effect (see also Pillutla and Murnighan, 1996). It is also possible that in the uncertain context responders simply feared rejecting a convenient split and therefore accepted more offers consisting of an unequal split.

Finally and as predicted, uncertainty modulated the weight of the social information. The influence of the words was not restricted to uncertain situations, but it was higher in this context. As responders were not able to judge the convenience of the offer in the uncertain context, they might have weighted the information they received more highly and used it to generate expectations about the offer.

The results clearly show that the valence of the information about the proposer influences decisions made by participants in the classic UG. However, an alternative and less appealing explanation of our results could be that the presentation of positive and negative words primed participants with a valence-consistent mood in an automatic manner. To rule out that this is the case in our findings, we conducted Experiment 2.

A possible control experiment to study the automatic effect that valenced words may have on acceptance rates would be to use non-trait-words as primes, matched in valence and arousal ratings to the words used in Experiment 1. However, such a control would entail a change both in the words and the *instructions* (as words can no longer be attributed to the personal characteristics of the partners), which would make the interpretation of the results difficult. An alternative option, which is the one we chose for our control study, is to use exactly the same items but to change the instructions regarding the social meaning of the words in the game.

## **EXPERIMENT 2**

We conducted a second experiment to rule out the possibility that the impact of the words in Experiment 1 could be explained by an "automatic" priming effect driven by the mere presentation of words with high positive and negative connotations.

In Experiment 2 we told participants that the computer presented the words at random before each offer, and thus that they had nothing to do with the person who initially proposed the offer. Except for this minor change in the whole set of instructions, the experimental design was exactly the same as in Experiment 1. It was hypothesized that if the mere presence of the words, regardless of their association to the partners in the game, generates priming, results from Experiment 2 should be quite similar to those of Experiment 1. In contrast, if the key manipulation is the association of the words with the personality characteristics of the people we interact with, decision-making should not be influenced by words in the current experiment.

## **MATERIALS AND METHODS**

#### **Participants**

Thirty-six native Spanish-speaking students from the University of Granada participated in the study (27 female, 18–38 years, average 23.3). All participants signed a consent form approved by the Department of Experimental Psychology of the University of Granada. In exchange for their participation in the study, participants were paid. The payment amount depended on their earnings during the game task and ranged from about 3–6 Euros.

#### **Stimuli and procedure**

Stimuli and methods were the same as in Experiment 1 with the exception that participants received a different instruction regarding the meaning of the words preceding each offer. They were told that the words were randomly presented by the computer program and that they were in no way linked either to the offers or to their partners in the game. As in Experiment 1, we manipulated the variables uncertainty (uncertain vs. certain), offer (fair vs. unfair), and valence of the word (positive vs. negative).

#### **RESULTS**

On average participants accepted 65.8% of all offers across the experiment (**Figure 3**). There was a significant main effect of offer fairness (*F*<sub>1,35</sub> = 67.84, *p* < 0.001), as fair offers were accepted more often (*M* = 89%, *SE* = 18%) than unfair ones (*M* = 42%, *SE* = 28%). The uncertainty of the context also influenced participants' choices. Offers in the uncertain context were accepted more often (*M* = 69%, *SE* = 16%) than in the certain context (*M* = 63%, *SE* = 18%), *F*<sub>1,35</sub> = 8.82, *p* < 0.05. Furthermore, there was a significant interaction between these two variables (*F*<sub>1,35</sub> = 11.13, *p* < 0.01). The effect of the offer in the certain condition was higher (55%) than in the uncertain condition (40%). Again, whereas the acceptance rate of fair offers did not differ significantly between the two contexts (89 vs. 90%, *F*<sub>1,35</sub> = 0.14, *p* = 0.713), unfair offers were accepted more often in the uncertain (49%) than in the certain context (35%; *F*<sub>1,35</sub> = 16.38, *p* < 0.001). There was neither a significant main effect of the valence of the word, nor any other interaction (all *p*s > 0.28).

#### **DISCUSSION**

As was the case in Experiment 1, in Experiment 2 participants' choices were influenced by the fairness of the offer. In contrast to Experiment 1, however, participants' choices were not affected by the words and there was no interaction between block uncertainty and word-valence. Hence, the mere presentation of valenced words does not prime action tendencies that lead participants to modify their acceptance decisions. This result strongly suggests that the key element for such biasing to occur is the link of the words to social characteristics of the partners in the game.

It might be argued that, because the instructions clearly told participants that the words were not related to the subsequent offers, they did not pay attention to this information, which would explain the lack of effect of the words on acceptance decisions. Withdrawing attentional resources is indeed a normal consequence of deeming something as irrelevant to the task at hand (e.g., Driver, 2001), and thus it is likely that this took place in our experiment. Future studies should test the level of processing accrued by irrelevant words in this procedure by means of both explicit and more implicit memory tests (e.g., Ruz and Fuentes, 2009) as well as with brain imaging techniques (e.g., Ruz et al., 2005).

## **GENERAL DISCUSSION**

The aim of the present study was to determine the influence of social information about people with whom we interact in a classic UG and to test whether such impact was modulated by the uncertainty of the context. We showed that positive and negative trait-words influenced acceptance rates to the same set of offers in a one-shot UG with unknown partners in which participants earned money, and that this effect was higher in the uncertain context. As Experiment 2 showed, the key aspect for the influence of social information was the link between trait-valenced words and the characteristics of the proposer.

Our results extend the findings of the study of Ruz et al. (2011), which used a modified version of the UG instead of the original game. Responders in our current experiment received a single offer from different anonymous partners and thus no long-term strategies can explain their behavior, as might be the case when playing several rounds of the UG with the same partner or when interacting with a friend (Campanhã et al., 2011). As participants accumulated money with each accepted offer, their final payment depended directly on their choices in the game. By showing that social information and offer fairness nevertheless influenced acceptance rates of such monetary offers, the present study complements recent reports using the UG without money payment (Campanhã et al., 2011; Marchetti et al., 2011; Ruz et al., 2011). Furthermore, and going beyond previous studies, the manipulation of uncertainty allowed us to test how this variable modulates the impact of the social information.

It could be argued, however, that the payment associated with each offer was too small, as one point earned in the game was exchanged for 1.5 cents. Given the small amount participants were able to earn with each accepted offer, they still might not have been motivated to take the decisions seriously, and this could have led them to take social information into consideration. Perhaps, if the outcome had been higher they might have weighted their own and the others' outcome in each trial more. However, previous studies using the UG suggest that this explanation is unlikely, as it is commonly found that raising the stakes to a large amount has only a weak impact on rejection rates (Camerer, 2003). Although it is always possible to claim that bigger amounts of money could obliterate a positive result, our results show that, with payments within the ranges commonly used in the UG, people take into account their impressions of others when accepting or rejecting their monetary offers.

Other limits regarding the experimental setting nevertheless persist. To assure an adequate control, the experiment was conducted in the laboratory and monetary offers were presented through the computer. In addition, every participant interacted only once with each partner to avoid effects of reciprocity. One of the drawbacks of this artificiality is the caution it imposes regarding the generality of the effects to less artificial, daily life situations. Note, however, that our design replicates basic phenomena found in many previous studies, such as the rejection of unfair offers even when they are beneficial to the participants in economic terms.

On the other hand, the experimental setting of the current study provides clear benefits for the design of a research study to explore the neural mechanisms underlying the biasing effect of social information. Previous studies employing the ERP methodology (Boksem and De Cremer, 2010; Campanhã et al., 2011) suggest that fair and unfair offers are perceived differently, and that this effect takes place at a relatively early stage of processing. Campanhã et al.'s (2011) ERP results further indicate that the medial frontal negativity responds to social distance, as its polarity is reversed when the offer is made by a close friend rather than an unknown proposer. In the future, it would be interesting to further explore the biological basis of inter-personal decisions and to analyze how the inclusion of social information modulates these mechanisms. To date, it remains unclear whether the effects are driven by a rather automatic kind of processing, which leads to a different perception of the offer after the presentation of valenced information, or by a controlled process at a later stage of decision-making. Further studies using electrophysiological methods could provide a closer look at the cognitive processes occurring when perceiving the offer, dependent on the valence of the social information, and, thus, could help to explore at which level of information processing social information affects decision-making.

Additional research is also needed to explore the relation between the influence of social information on responders' choices and a possible expectancy or framing effect. Sanfey (2009) showed that expectations of fairness have a strong influence on responders' decisions in the UG. The importance of framing effects is also discussed in Marchetti et al. (2011) to explain that positive and negative descriptions of the proposer bias the acceptance of offers in the UG. This study used psychological attributes very closely linked to fairness expectations (generous vs. selfish). Our results complement their findings, showing that different positive and negative descriptions of the partner in the UG influence decision-making. This influence exists although objectively the design of the task does not associate the trait-valenced words with the offers, as positive and negative items are presented equally often with fair and unfair offers. Future studies should test expectation generation more directly to help fully understand the relation between social information and a framing effect.

Other future lines of research could include the combination of an emotional induction and the presentation of social information. While induced negative emotions are correlated with higher rejection rates of unfair offers (Harlé and Sanfey, 2007), in the present study the same effect was shown regarding negative social information. It would be interesting to link both approaches, testing whether the induction of positive or negative moods biases the impact of social information.

Overall, our results show once again that human behavior is motivated by more than pure income maximization. The opinion we hold regarding the moral characteristics of people with whom we interact modulates our tendencies to accept or reject the money that they offer us. The uncertainty of the context furthermore has an effect on how we make use of this information. The present study provides further insights into the complex decision-making processes during inter-personal interactions and gives way to new questions for future research using economic games.

#### **ACKNOWLEDGMENTS**

Financial support to this research came from the Spanish Ministry of Science and Innovation through a "Ramón y Cajal" research fellowship (RYC-2008-03008) and grants PSI2010-16421 and SEJ2007.63247 to María Ruz.

#### **REFERENCES**

Behrens, T. E., Woolrich, M. W., Walton, M. E., and Rushworth, M. F. (2007). Learning the value of information in an uncertain world. *Nat. Neurosci.* 10, 1214–1221.

Boksem, M. A. S., and De Cremer, D. (2010). Fairness concerns predict medial frontal negativity amplitude in ultimatum bargaining. *Soc. Neurosci.* 5, 118–128.

Camerer, C. F. (2003). *Behavioral Game Theory: Experiments in Strategic Interaction*. Princeton: Princeton University Press.


Marchetti, A., Castelli, I., Harlé, K. M., and Sanfey, A. G. (2011). Expectations and outcome: the role of proposer features in the ultimatum game. *J. Econ. Psychol.* 32, 446–449.

Nash, J. F. (1950). The bargaining problem. *Econometrica* 18, 155–162.


attractiveness and gender on ultimatum game decisions. *Organ. Behav. Hum. Decis. Process.* 79, 199–215.

van't Wout, M., Kahn, R. S., Sanfey, A. G., and Aleman, A. (2006). Affective state and decision-making in the ultimatum game. *Exp. Brain Res.* 169, 564–568.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 March 2012; accepted: 20 June 2012; published online: 06 July 2012. Citation: Gaertig C, Moser A, Alguacil S and Ruz M (2012) Social information and economic decision-making in the ultimatum game. Front. Neurosci. 6:103. doi: 10.3389/fnins.2012.00103*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Gaertig, Moser, Alguacil and Ruz. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Toward an affective neuroscience account of financial risk taking

## **Charlene C. Wu<sup>1</sup>\*, Matthew D. Sacchet<sup>1,2</sup> and Brian Knutson<sup>1,2</sup>**

<sup>1</sup> Department of Psychology, Stanford University, Stanford, CA, USA

<sup>2</sup> Neurosciences Program, Stanford University School of Medicine, Stanford, CA, USA

#### **Edited by:**

Kerstin Preuschoff, École Polytechnique Fédérale de Lausanne, Switzerland

#### **Reviewed by:**

Bruno B. Averbeck, National Institute of Mental Health, USA

Shunsuke Kobayashi, Fukushima Medical University, Japan

#### **\*Correspondence:**

Charlene C. Wu, Department of Psychology, Stanford University, 450 Serra Mall, Jordan Hall, Building 420, Stanford, CA 94305, USA. e-mail: charlene.wu@stanford.edu

To explain human financial risk taking, economic and finance theories typically refer to the mathematical properties of financial options, whereas psychological theories have emphasized the influence of emotion and cognition on choice. From a neuroscience perspective, choice emanates from a dynamic multicomponential process. Recent technological advances in neuroimaging have made it possible for researchers to separately visualize perceptual input, intermediate processing, and motor output. An affective neuroscience account of financial risk taking thus might illuminate affective mediators that bridge the gap between statistical input and choice output. To test this hypothesis, we conducted a quantitative meta-analysis (via activation likelihood estimate or ALE) of functional magnetic resonance imaging experiments that focused on neural responses to financial options with varying statistical moments (i.e., mean, variance, skewness). Results suggested that different statistical moments elicit both common and distinct patterns of neural activity. Across studies, high versus low mean had the highest probability of increasing ventral striatal activity, but high versus low variance had the highest probability of increasing anterior insula activity. Further, high versus low skewness had the highest probability of increasing ventral striatal activity. Since ventral striatal activity has been associated with positive aroused affect (e.g., excitement), whereas anterior insular activity has been associated with negative aroused affect (e.g., anxiety) or general arousal, these findings are consistent with the notion that statistical input influences choice output by eliciting anticipatory affect. The findings also imply that neural activity can be used to predict financial risk taking – both when it conforms to and violates traditional models of choice.

**Keywords: neuroeconomics, neurofinance, FMRI, accumbens, striatum, insula, activation likelihood estimation, meta-analysis**

## **INTRODUCTION**

Imagine a world where people act as computers, consistently taking in, analyzing, and responding to all of their sensory impressions. These "rational" actors should not show volatile and inconsistent changes in preferences, and so their future choices should be predictable based on their past behavior. Such a world may be hard to imagine, because it is not the world we live in. Instead, people often show sudden, pronounced, and inconsistent changes in choice. For instance, although most people will never win the lottery or lose a limb, the same individuals will often pay a high premium both for a tiny chance to hit the jackpot as well as to compensate for the unlikely possibility of dismemberment. To explain financial risk taking, decision theorists have either appealed to the objective statistical properties of financial options or to the subjective emotional experience of individuals. Do these distinct accounts conflict with or complement each other, and can they be reconciled?

#### **ECONOMIC AND FINANCE MODELS OF RISK TAKING**

Traditional economic models assume that people seek to maximize value. Blaise Pascal and Pierre de Fermat historically concluded that the expected value of uncertain gambles could be calculated by multiplying the magnitude of the gamble outcomes by their probability. Thus, they mathematically defined "expected value" as the mean (or the first statistical moment) of repeated outcomes. In economics, *expected value* (and its close cousin *expected utility*) provides a foundational guide to choice by offering a common metric that individuals can use to compare different and diverse financial options (von Neumann and Morgenstern, 1944). One implication of preferences for expected value is that people should not only prefer gambles with the best outcomes, but also those with more chances to obtain a good outcome. Beyond expected value, financial theorists have additionally and separately considered the role of risk, which can be mathematically defined as variance (or the second statistical moment) of repeated outcomes (Markowitz, 1952). Resulting *mean-variance* financial models further assume that while people are attracted to expected value, they are instead repelled by risk. One implication of preferences against risk is that people should prefer gambles with relatively steady outcomes over those with more variable outcomes.

Behavioral research, however, suggests that neither expected value nor mean-variance models fully account for individuals' financial risk taking (Edwards, 1954). As a result, some theorists have suggested that anomalies in choice (e.g., the lack of diversity in investors' portfolios) might result from preferences for large yet improbable outcomes, which has been mathematically defined as skewness (or the third statistical moment; Mitton and Vorkink, 2007). One implication of preferences for skewness (according to some theories) is that people might prefer "long shot" gambles (e.g., those with high magnitude but low probability outcomes) over others. Despite some behavioral evidence that skewness can influence preferences (Kraus and Litzenberger, 1976; Coombs and Lehner, 1981), either by enhancing (Menezes et al., 1980) or interacting with risk (Alderfer and Bierman, 1970; Chiu, 2005), only a few models of financial risk taking consider skewness. For instance, cumulative prospect theory (Tversky and Kahneman, 1992) and rank-dependent utility models (Quiggin, 1982) have attempted to account for skewness by overweighting large but unlikely positive and negative outcomes. In doing so, however, these models sacrifice their ability to explain tolerance for variance (Levy and Levy, 2004). Although most economic theories do not account for the influence of skewed outcomes, skewed outcomes may nonetheless influence choice, at both the individual and the market levels (Patton, 2004). Thus, while traditional economic and finance theories consider the influence of mean and variance on risky choice, most remain agnostic about the influence of higher order statistical moments such as skewness. By implication, a theory that accounts for individuals' preferences for skewness in addition to mean and variance might generate more accurate predictions about risky financial choice.
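To make the three statistical moments concrete, the following minimal Python sketch computes the mean, variance, and standardized skewness of a discrete gamble. The function name `gamble_moments` and the example gambles are hypothetical illustrations, not stimuli from any surveyed study. Skewness is taken here as the third central moment divided by the cubed standard deviation, which is one common convention and an assumption of this sketch.

```python
import math

def gamble_moments(outcomes, probabilities):
    """Return (mean, variance, skewness) of a discrete gamble.

    Skewness is standardized: third central moment / variance ** 1.5.
    """
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have equal length")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # First moment: probability-weighted average outcome (expected value).
    mean = sum(p * x for x, p in zip(outcomes, probabilities))
    # Second central moment: probability-weighted squared deviation (risk).
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probabilities))
    if variance == 0:
        return mean, 0.0, 0.0  # a sure outcome has no spread and no skew
    # Third central moment, standardized so gambles of different scale compare.
    third = sum(p * (x - mean) ** 3 for x, p in zip(outcomes, probabilities))
    return mean, variance, third / variance ** 1.5

# Hypothetical gambles matched on mean (0) and variance (9), differing in skew.
long_shot = gamble_moments([9.0, -1.0], [0.1, 0.9])   # positively skewed
symmetric = gamble_moments([3.0, -3.0], [0.5, 0.5])   # zero skew
rare_loss = gamble_moments([-9.0, 1.0], [0.1, 0.9])   # negatively skewed
```

Because the three example gambles share the same mean and variance, a mean-variance model would treat them as equivalent; only the third moment separates the "long shot" from its mirror image.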

#### **EMOTION AND RISKY CHOICE**

If people base risky financial choices solely on statistics (e.g., mean, variance, skewness), then all individuals should show similar choices, generating predictable market movements. Psychological theorists, however, have argued that financial choices likely result from multicomponential processes that generate heterogeneous choices. If multicomponential processes drive financial risk taking, then those processes may unfold over time and be influenced by factors other than statistical moments.

Early economic theorists suspected that emotions influence choice. Smith (1759) argued that behavior was determined by a struggle between the "passions" and an "impartial spectator." The passions included emotions such as fear and anger, as well as motivational feeling states arising from self- or other-regarding interests. Smith argued that although behavior may be influenced by passions, individuals can overcome their impulses by observing their actions from the perspective of an "impartial" outsider. Due to a subsequent emphasis on rational decision-making (partly encouraged by von Neumann and Morgenstern's work on expected value), interest in the influence of the passions diminished.

More recently, although traditional economic theorists have endorsed the rationally grounded "Efficient Markets Hypothesis" (Samuelson, 1965; Fama, 1970), unpredicted and rapid rises and crashes of the market valuation of technology and housing sectors have raised new questions about investor rationality. Critics of the Efficient Markets Hypothesis have contended that investors consistently exhibit irrational tendencies including overconfidence (Barber and Odean, 2001; Gervais and Odean, 2001), loss aversion (Kahneman and Tversky, 1979; Shefrin and Statman, 1985; Odean, 1998), herding (Huberman and Regev, 2001), psychological accounting (Tversky and Kahneman, 1981), miscalculation of probabilities (Lichtenstein et al., 1981), and regret (Bell, 1982; Clarke et al., 1994). These "irrational" biases have been attributed to psychological factors with emotional overtones – including fear, greed, and other affective reactions to price fluctuations and shocks to wealth. In an attempt to explain individual and market anomalies, an expanding field of research has begun to examine links between emotion and "irrational" decision-making (Loewenstein, 2000).

Beyond the notion that emotion acts peripherally to undermine choice, some theorists have proposed that affect can play an even more central role by providing a "common currency" that allows individuals to compare and choose between different options (Peters et al., 2006). Despite the difficulty of measuring affect, scientists have had some success by examining associations between different affective reactions and choice (Mellers, 2000). Although most of this research has focused on "consequential" affect, which arises in response to choice outcomes, some have additionally argued for the importance of "anticipatory" affect, which occurs *prior* to choice (Loewenstein et al., 2001).

To assess affect, behavioral researchers have primarily relied upon self-reported experience. For instance, investigators can compare a single individual's affective reactions to different stimuli (e.g., gambles) in two dimensions (e.g., valence on a continuum from bad to good, and arousal on a continuum from not aroused to aroused). Valence and arousal ratings can then be mean-deviated across stimuli within an individual and mathematically rotated through affect space (by 45˚) to derive indices of positive and negative arousal (Knutson et al., 2005). Using these and related methods, investigators have shown that anticipation of uncertain monetary gains elicits positive arousal, whereas anticipation of uncertain monetary losses elicits negative arousal – even before outcomes are revealed, and when measured either online during anticipation or retrospectively (Samanez-Larkin et al., 2007; Nielsen et al., 2008).
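The mean-deviation and 45-degree rotation described above can be sketched as follows. This is a minimal illustration, assuming that positive arousal corresponds to the axis where both valence and arousal are high, and negative arousal to the axis where arousal is high but valence is low; the function name `rotate_affect` and the example ratings are hypothetical.

```python
import math

def rotate_affect(valence, arousal):
    """Derive positive and negative arousal from one person's ratings.

    valence, arousal: equal-length lists of ratings across stimuli.
    Ratings are mean-deviated within the individual, then projected onto
    axes rotated 45 degrees from the valence and arousal axes.
    """
    if len(valence) != len(arousal) or not valence:
        raise ValueError("need equal-length, non-empty rating lists")
    v_mean = sum(valence) / len(valence)
    a_mean = sum(arousal) / len(arousal)
    root2 = math.sqrt(2.0)
    positive, negative = [], []
    for v, a in zip(valence, arousal):
        v_dev, a_dev = v - v_mean, a - a_mean
        # Assumed sign convention: high arousal + good valence = positive arousal.
        positive.append((a_dev + v_dev) / root2)
        # Assumed sign convention: high arousal + bad valence = negative arousal.
        negative.append((a_dev - v_dev) / root2)
    return positive, negative

# Hypothetical ratings of two gambles by one participant.
pos, neg = rotate_affect(valence=[2.0, 6.0], arousal=[3.0, 5.0])
```

Because the rotation is orthogonal, it preserves distances between stimuli in affect space; it only changes which axes the ratings are read from.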

Since anticipation of uncertain gains or losses elicits self-reported affect, this might subsequently influence risky choice. Unfortunately, anticipatory affect is difficult to assess because most affective self-reports are retrospective (and thus prone to memory and other biases) and online probes of affect may change the very nature of the choice being made (e.g., introducing reflection, distraction, delays, and other biases into the decision process). Ideally, investigators could also collect online physiological probes of anticipatory affect in order to validate and augment self-report measures. Fortunately, advances in neuroimaging at the end of the twentieth century may provide these probes.

#### **NEURAL TARGETS**

In initial attempts to link physiological measures of affect to financial risk taking, researchers collected peripheral physiological measures (including skin conductance, blood volume pulse, heart rate, muscular tone, respiration, and body temperature) from financial traders at work. The investigators observed increased physiological reactions during periods of market volatility, and reported greater increased physiological reactions to market volatility in less experienced traders (Lo and Repin, 2002). Subsequent findings suggested that the strength of physiological reactions correlated with poor trading performance (Lo et al., 2005). Contrary to the notion that emotions play no role in financial risk taking, these findings suggested that market events correlated with both self-reported and physiological reactions, even in experienced professional traders. The correlational nature of these findings, however, could not establish whether financial events caused the arousal, or whether arousal might reciprocally influence financial choice.

Advances in the temporal and spatial resolution of neuroimaging techniques (such as functional magnetic resonance imaging or FMRI) have enabled researchers to visualize changes in brain activity as individuals anticipate and make financial choices. Critically, these advances allow investigators to examine changes in neural activity in anticipation of choice. Thus, investigators can temporally capture neural responses to statistical properties of financial options before outcomes are revealed. Enhanced temporal resolution also raises the possibility of using anticipatory neural activity to predict choice. Advances in spatial resolution also matter, since FMRI allows investigators to probe activity in deep subcortical as well as cortical circuits. Based on evolutionary reasoning, while more recently evolved cortical circuits may play critical roles in the representation of language and numeric symbols, more ancient subcortical circuits that share greater homology across mammalian species may play a more prominent role in emotional and motivational functions that can promote immediate survival (MacLean, 1990). Specifically, decades of brain stimulation in animals suggest that animals will work to the exclusion of all other rewards to stimulate subcortical regions that lie along the ascending mesolimbic dopamine pathway, extending from the ventral tegmental area of the midbrain through the lateral hypothalamus to ventral striatal regions (including the nucleus accumbens or NAcc) and medial and orbital prefrontal cortices (MPFC; Olds and Fobes, 1981). In contrast, animals will work equally hard to avoid stimulating other subcortical pathways that extend from the periaqueductal gray of the midbrain up through the stria terminalis and the medial hypothalamus to the lateral amygdala, and possibly the anterior insula (Panksepp, 1998). Based on its subcortical spatial resolution, FMRI could allow investigators to test for the involvement of affect not only in choices linked to immediate survival, but also more abstract choices related to financial risk taking.

To link activity in these deep brain circuits to affective experience and ultimately choice, we have outlined an anticipatory affect model (Knutson and Greer, 2008). The model posits that uncertainty elicits increased aroused affect, while potential gains versus losses elicit positive versus negative affect. Since most future events are subjectively uncertain, potential gains should elicit positive arousal (e.g., feelings like excitement) as well as correlated neural activity in the NAcc, but potential losses should elicit negative arousal (e.g., feelings like anxiety) as well as correlated neural activity in the anterior insula. The anticipatory affect model has additional implications for motivated behavior, since the evolved function of positive arousal is to promote approach, whereas the function of negative arousal is to promote avoidance (**Figure 1**).

Most risky financial propositions (e.g., gambles, stocks) require concurrent assessment of uncertain gains and uncertain losses. According to the anticipatory affect model, if positive arousal increases, uncertain gains should appear more prominent, which should lead people to approach the risk (all else being equal). On the other hand, if negative arousal increases, uncertain losses should appear more prominent, which should lead people to avoid the risk. Consistent with this account, in an initial FMRI study that used neural activity to predict financial risk taking, anticipatory NAcc activity predicted increased financial risk taking, whereas anticipatory anterior insula activity predicted decreased financial risk taking (Kuhnen and Knutson, 2005).

Although initially inspired by a combination of animal brain stimulation (Panksepp, 1998) and human neuroimaging findings, the anticipatory affect model shares features with earlier "somatic marker" and "risk as feelings" models, both of which posit that anticipation of uncertain outcomes can generate emotional arousal (Bechara et al., 1996; Loewenstein et al., 2001). Critically, however, the anticipatory affect model does not require mediation through bodily sensations (i.e., requiring only brain activity, unlike somatic marker accounts), and specifically distinguishes anticipatory positive arousal from negative arousal (which have opposite effects on subsequent approach versus avoidance behavior, unlike the risk as feelings model). Finally, the anticipatory affect model links positive and negative arousal to activity in distinguishable neural circuits, implying that neuroimaging data could be used to directionally predict risky choice (e.g., Kuhnen and Knutson, 2005).

**FIGURE 1 | An anticipatory affect model (adapted from Knutson and Greer, 2008).** An incentive cue for uncertain future outcome first elicits brain activation in at least two brain regions (NAcc and anterior insula) associated with anticipatory affect (positive arousal and negative arousal, respectively). The balance of activation in related circuits then promotes either approach toward or avoidance of risk.

Different statistical moments of financial options might influence either the same or different neural circuits. The anticipatory affect model implies that distinct statistical moments should exert different but overlapping influences on affect and associated neural activity. First, financial options with high means involve large potential gains, and so should elicit positive arousal and correlated NAcc activity. Second, financial options with high variance involve both large potential losses and gains, which should elicit negative arousal and correlated anterior insula activity as well as positive arousal and correlated NAcc activity. Third, financial options with high (overall) skewness involve even larger potential losses and gains, which should elicit even more negative arousal and correlated anterior insula activity, as well as positive arousal, and correlated NAcc activity. However, positive skewness and negative skewness might have divergent impacts, since options with high positive skewness involve large potential gains, which should elicit positive arousal and correlated NAcc activity, while options with high negative skewness involve large potential losses, which should elicit negative arousal and correlated anterior insula activity. By implication, since anticipatory affective circuits are especially sensitive to the best or worst potential outcomes, they may de-emphasize probability and other considerations that require simulation or integration of many potential outcomes over time (and which may rely more on prefrontal circuits such as the MPFC).

Consistent with the anticipatory affect model, previous self-reported affect findings suggest that anticipating the outcomes of higher mean gambles elicits greater positive arousal (Knutson et al., 2005). Anticipating the outcomes of higher variance gambles (with equal mean) elicits both greater negative arousal and positive arousal. Additionally, anticipating the outcomes of positively skewed gambles (with equal mean and variance) elicits greater positive arousal, whereas anticipating the outcomes of negatively skewed gambles (with equal mean and variance) elicits greater negative arousal (**Figure 2**; Wu et al., 2011). But beyond self-reported affect, do patterns of neural activity also align with the anticipatory affect model? Since a number of recent studies have investigated the impact of financial statistical moments on FMRI activity, we now survey their collected findings.

#### **PRESENT AIMS**

Although earlier reviews have considered how financial risk influences neural activity (Knutson and Bossaerts, 2007; Mohr et al., 2010a), none have integrated both economic and psychological accounts by explicitly linking different statistical moments of financial options to neural responses. The purpose of this meta-analysis was to examine whether different statistical moments of financial options (i.e., mean, variance, and skewness) recruit distinct or overlapping neural circuits implicated in anticipatory affect, and to explore implications of these findings for subsequent choice. To address these aims, we conducted a quantitative meta-analysis of FMRI studies of statistical moments on financial risk using the activation likelihood estimation (ALE) method (Eickhoff et al., 2011, 2012; Turkeltaub et al., 2012). Based on the anticipatory affect model, we predicted that distinct statistical moments would elicit overlapping patterns of activation, such that moments involving large gains (high mean, high variance, positive skewness) should increase activity in the ventral striatum (including the NAcc), and moments involving large losses (high variance, negative skewness) should increase activity in the anterior insula.

## **MATERIALS AND METHODS**

#### **STUDY SELECTION**

We reviewed FMRI studies of financial risk taking. Studies were identified for meta-analysis via a search of the PubMed database using key phrases "mean" OR "reward" OR "expected value" OR "variance" OR "risk" OR "uncertainty" OR "skewness" AND "finance" OR "monetary" AND "human" AND "FMRI." This search (performed on July 10, 2012) identified 248 studies. We specifically searched for FMRI studies that used monetary incentives to manipulate one or more of the first three statistical moments of interest (i.e., mean, variance, skewness). We further identified recent review papers about risky choice that explicitly addressed neural correlates of financial risk taking (Knutson and Bossaerts, 2007; Mohr et al., 2010a). All studies found through the database search or cited by the review papers underwent a selection process. Inclusion criteria were: (1) assessment of healthy young adults (i.e., between 18 and 60 years); (2) acquisition of whole brain FMRI data; (3) availability of peak activation coordinates from group activation tables; (4) information about the probability of uncertain outcomes was provided to participants (as opposed to ambiguity); (5) at least one of the three statistical moments of interest (i.e., mean, variance, or skewness) was objectively manipulated, independent of subjective interpretations (i.e., risk tolerance/aversion measures).

These inclusion criteria were chosen to ensure that results would generalize to the population of healthy young adult humans. Several studies suggest that aging may alter brain structure and function (Cabeza et al., 2005). Furthermore, older adults often show qualitatively different activation patterns than young adults (Park et al., 2004). Therefore, this meta-analysis focused on studies that investigated risk processing in younger healthy adults (Criterion 1). Because some studies focus on specific brain regions, they may not report whole brain results. Partial findings, however, impede the detection of unexpected activations in unscanned or unreported brain regions, so studies were excluded if they acquired or reported only partial brain data (Criterion 2). As the ALE approach requires activation foci, only studies that reported peak activation coordinates of group statistical maps were included (Criterion 3). Because risk is often conceptually distinguished from ambiguity – a form of uncertainty in which probabilities are unknown – only studies in which probabilities were known or estimated by subjects were included (Criterion 4). Since the focus of this meta-analysis was to examine neural responses to statistical moments of uncertain financial options, only studies that systematically varied mean, variance, and/or skewness (as opposed to linear probability) of monetary incentives were included (Criterion 5). Studies evaluated for variance and skewness were only included if lower order moments (e.g., mean, mean and variance) were held constant. If studies manipulated multiple moments simultaneously, the lowest appropriate manipulated moment was included in the meta-analysis (e.g., studies that manipulated variance without controlling for mean were included only for mean).
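The rule for assigning a contrast to a moment map can be expressed as a short decision procedure. The sketch below is one reading of that rule, not the coding scheme actually used: the function name `assign_map` and its set-based inputs are hypothetical, and a real coding decision would also depend on details of each study's design.

```python
MOMENT_ORDER = ("mean", "variance", "skewness")  # lowest to highest order

def assign_map(manipulated, controlled):
    """Return the moment map a contrast contributes to, or None.

    manipulated: set of moments that vary across the contrasted conditions.
    controlled:  set of moments held constant across those conditions.
    A contrast counts toward the lowest-order manipulated moment whose
    lower-order moments are all held constant.
    """
    for index, moment in enumerate(MOMENT_ORDER):
        if moment not in manipulated:
            continue
        lower_order = MOMENT_ORDER[:index]
        if all(m in controlled for m in lower_order):
            return moment
    return None

# Variance varies together with mean: counted only toward the mean map.
confounded = assign_map({"mean", "variance"}, set())
# Variance varies while mean is matched: counted toward the variance map.
matched = assign_map({"variance"}, {"mean"})
# Skewness varies but variance is not matched: not eligible for any map.
unmatched = assign_map({"skewness"}, {"mean"})
```

Iterating from the lowest moment upward is what enforces the "lowest appropriate manipulated moment" preference in this sketch.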

Activation maps were constructed for three distinct contrasts. For the mean map, we included contrasts of neural activity during processing of monetary incentives with high versus low mean. For the variance map, we included contrasts of neural activity during processing of monetary incentives with high versus low variance (but which controlled for mean). For the skewness map, we included contrasts of neural activity during processing of monetary incentives with high (either positive or negative) versus low skewness (but which controlled for variance and mean).

Activation foci coordinates for contrasts in the 28 studies that met inclusion criteria were submitted to ALE meta-analyses (**Table 1**). Of these, 21 contrasts were included in the mean map, 10 in the variance map, and 4 in the skewness map. Three studies that separately modeled mean and variance were included in both maps, and 2 studies that separately modeled variance and skewness were included in both maps. Yacubian et al. (2006) replicated their results in a second sample, thus their replication findings were separately included in the mean map. Symmonds et al. (2011) separately modeled positive skewness and negative skewness in different whole brain analyses, so these results were separately included in the skewness map.

### **Table 1 | Studies included in the ALE meta-analysis, with associated contrasts.**


<sup>1</sup>Whole brain coordinates acquired via personal communication.

<sup>2</sup>Modeled positive skewness and negative skewness trials separately.

<sup>3</sup>Included a separate replication sample.

#### **ACTIVATION LIKELIHOOD ESTIMATE RATIONALE**

In contrast to behavioral meta-analyses that aim to estimate the effect size of a finding, FMRI meta-analyses aim to identify brain regions or circuits implicated in certain mental processes (Turkeltaub et al., 2002). Due to this difference in research goals, meta-analytic techniques have been adapted to fit the format of FMRI findings. Specifically, whereas the key results of behavioral studies are test statistics (*p*, *t*, or *z* scores) and effect sizes, test statistics in FMRI studies usually only have meaning when paired together with the information about the location of the effect, often revealed by the location of voxels with the highest test statistics. One frequently used meta-analytic technique that utilizes this spatial information is ALE analysis (Eickhoff et al., 2011, 2012; Turkeltaub et al., 2012). ALE analysis is a quantitative meta-analytic technique that compares activation likelihoods calculated from a group of observed activation foci with a null distribution of randomly generated activation foci. The ALE meta-analytic method provides advantages over traditional label-based meta-analytic methods because it relies upon activation foci coordinates, which show greater reliability across FMRI studies than do anatomical labels.

Meta-analyses were conducted using the ALE algorithm implemented with GingerALE software available from www.brainmap.org (Laird et al., 2005). Foci originally reported in Montreal Neurological Institute coordinates were converted to Talairach coordinates using the icbm2tal transformation prior to analysis (Lancaster et al., 2007). In the ALE analyses, each contrast's activation foci are modeled as the peaks of Gaussian functions, the spatial extent of which is dependent on the number of subjects included in the corresponding analysis. The resulting distributions of values (called "activation likelihood estimates") represent the probability of activation occurring in a given voxel (i.e., the ALE values). For the whole brain ALE values, significance was assessed against 5000 sets of randomly distributed foci with a nonparametric statistical permutation test. Statistically thresholded maps were then computed using a false discovery rate procedure that corrected for multiple comparisons across the whole brain [FDR (*q*) = 0.01, minimum cluster size = 100 mm<sup>3</sup>].
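The core of the ALE computation can be illustrated with a simplified Python sketch. It is not the GingerALE implementation: the function names, the 8 mm<sup>3</sup> voxel volume, and the example foci are hypothetical, and the Gaussian width is passed in directly rather than derived from sample size. The sketch also assumes that each contrast's modeled activation at a voxel is the maximum over its foci and that contrasts combine as a probabilistic union; the permutation test and FDR thresholding are omitted.

```python
import math

def gaussian_prob(voxel, focus, fwhm_mm, voxel_volume_mm3=8.0):
    """Probability that a focus truly lies within the given voxel,
    approximated as a 3D Gaussian density times the voxel volume."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    dist_sq = sum((a - b) ** 2 for a, b in zip(voxel, focus))
    density = math.exp(-dist_sq / (2.0 * sigma ** 2))
    density /= (2.0 * math.pi) ** 1.5 * sigma ** 3
    return min(1.0, density * voxel_volume_mm3)

def modeled_activation(voxel, foci, fwhm_mm):
    """Per-contrast map value: the largest probability over its foci
    (assumed here, so that nearby foci from one contrast do not add up)."""
    return max(gaussian_prob(voxel, focus, fwhm_mm) for focus in foci)

def ale_value(voxel, contrasts):
    """ALE at a voxel: probability that at least one contrast activates it."""
    prob_none = 1.0
    for contrast in contrasts:
        ma = modeled_activation(voxel, contrast["foci"], contrast["fwhm_mm"])
        prob_none *= 1.0 - ma
    return 1.0 - prob_none

# Hypothetical contrasts: two report a focus near the same coordinate.
contrasts = [
    {"foci": [(10.0, 8.0, -4.0)], "fwhm_mm": 10.0},
    {"foci": [(12.0, 8.0, -4.0), (-30.0, 20.0, 4.0)], "fwhm_mm": 9.0},
]
near = ale_value((10.0, 8.0, -4.0), contrasts)   # close to converging foci
far = ale_value((50.0, -60.0, 30.0), contrasts)  # far from every focus
```

In a full analysis, values like `near` would be compared against the distribution obtained from randomly placed foci, which is what turns the ALE map into a statistical test of spatial convergence.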

## **RESULTS**

The contrast of neural responses to high versus low mean (studies = 21, foci = 210, subjects = 407) had the highest probability of activating foci in the bilateral NAcc of the ventral striatum. Highly significant foci were also observed in the anterior cingulate cortex, followed by the bilateral anterior insula. Other significant foci were observed in the left red nucleus, thalamus, and putamen (**Table 2**; **Figure 3**).

The contrast of neural responses to high versus low variance (studies = 10, foci = 82, subjects = 164) had the highest probability of activating foci in the left subgenual cingulate cortex and left anterior insula. Significant foci were also observed in the left superior temporal sulcus, left medial prefrontal cortex, right ventral striatum, and right anterior insula.

The contrast of neural responses to high versus low general skewness (studies = 4, foci = 23, subjects = 92) had the highest probability of activating foci in the left NAcc of the ventral striatum.

**Table 2 | ALE of neural foci implicated in processing high versus low mean, variance, and skewness.**


(In Talairach space, x = right-left; y = anterior-posterior, and z = superior-inferior coordinates; predicted peak foci in **bold**).

## **DISCUSSION**

This meta-analysis aimed to determine whether distinct statistical moments of risky financial options (i.e., mean, variance, skewness) elicit different patterns of neural activity. Rather than recruiting either the same or completely distinct circuits, statistical moments activated overlapping circuits implicated in anticipatory affect. Specifically, statistical moments that promised large gains (i.e., high mean, high variance, high skewness) maximally activated the ventral striatum (particularly in the NAcc), whereas moments that threatened large losses (i.e., high variance) maximally activated the anterior insula. The deep subcortical localization of these circuits is noteworthy (as opposed to neocortical structures implicated in symbolic representation and working memory), as it implies that affective rather than cognitive processes play a critical role in financial risk assessment.

Most of the findings were consistent with the anticipatory affect model. Specifically, high versus low mean maximally activated ventral striatum (including the NAcc), high versus low variance maximally activated the anterior insula (and secondarily the ventral striatum), and high versus low skew maximally activated the ventral striatum. However, high versus low mean also activated the anterior insula to a lesser extent. This may be due to the fact that while most studies of higher order moments (e.g., variance and skewness) controlled for lower order moments (e.g., mean), studies of lower order moments typically did not control for higher order moments. Because increasing lower order moments (e.g., mean) often also increases higher order moments (e.g., variance), studies of lower order moments may inadvertently elicit activity related to higher order moments. Reduced control of higher order moments in studies of lower order moments might also account for the larger overall number of activation foci observed in the high versus low mean contrast.

Findings for skewness partially conformed to the anticipatory affect model. While general skewness activated ventral striatum (including the NAcc), as predicted, common activation of the anterior insula was not as apparent. The omission is unexpected given that all three surveyed studies of skewness have reported that skewed gambles tend to activate the anterior insula (Burke and Tobler, 2011; Symmonds et al., 2011; Wu et al., 2011). The small number of relevant studies and variability of activation foci in the anterior insula may have precluded a common finding. The anticipatory affect model also specifically predicts that positively skewed gambles will more powerfully activate the ventral striatum, as was found in one study (Wu et al., 2011). However, this prediction could not be evaluated in the context of the meta-analysis because not all studies provided contrasts for positive versus negative skewness, though this represents an important direction for future research. Finally, some studies modeled statistical moments during the uncertain anticipatory period before gambles were evaluated, whereas others modeled the entire gambling episode from anticipation to outcome. Since the anticipatory affect model is most relevant to the uncertain anticipatory period, it might best predict neural activity that occurs then.

#### **INTEGRATING ANTICIPATORY AFFECT AND FINANCIAL RISK TAKING**

The meta-analytic findings support neither monolithic nor modular views of neural responses to the statistical moments of financial options. Specifically, ascending from mean (lower order) to skewness (higher order moments) neither repeatedly activates all the same regions, nor does it recruit wholly distinct regions at each step. Thus, ordering the findings by objective statistical properties of the options does not yield a coherent framework for predicting associated neural activity (**Table 3**).

Alignment by affective impact, however, reconfigures the statistical moments in a coherent way that generates more consistent predictions about associated neural activity (**Table 4**). Specifically, financial options that involve uncertain large gains are likely to elicit positive arousal (e.g., high mean, positive skewness) and recruit NAcc activity, but financial options that involve uncertain large losses are likely to elicit negative arousal (e.g., high variance, negative skewness) and recruit anterior insula activity. Reordering these statistical moments by affective impact thus scaffolds a more parsimonious and coherent framework for predicting choice. Thus, statistical moments representing objective financial risk may be translated into subjective feelings of risk indexed by neural circuits associated with affect, which together promote choice. Of course, statistical moments may also influence choice through other neural routes as well. For instance, statistical information might recruit circuits involved in symbolic representation and working memory for numerical computation (e.g., dorsolateral and parietal cortices), or might activate circuits implicated in following habits or rules (e.g., the dorsal striatum and premotor cortex). The current analysis, however, suggests that an affective neuroscience account may provide an initial viable framework both for describing and predicting financial risk taking.

#### **IMPLICATIONS FOR FINANCIAL CHOICE**

While traditional economic (e.g., expected value) and finance (e.g., mean-variance) models can account for a range of choices, other choices elude these models' explanatory reach. For instance, individual choices may be influenced by higher order statistical moments (e.g., skewness, kurtosis) as well as by incidental factors that are not relevant to the choice at hand (e.g., news, weather, nutrition, sleep). By both encompassing and transcending the explanatory reach of traditional models, an affective neuroscience approach may eventually offer a more comprehensive account of financial risk taking.

#### **Table 3 | Predicted maximum activity organized by statistical moments (lower to higher order).**

#### **Table 4 | Predicted maximum activity organized by affective impact.**

Although this meta-analysis focused on the influence of financial options on neural activity, the anticipatory affect account also has implications for choice. Indeed, neuroimaging evidence suggests that while ventral striatal activity (and NAcc activity in particular) predicts risk-seeking stock choices, anterior insula activity instead predicts risk-avoidant bond choices in investment tasks (Kuhnen and Knutson, 2005). Extended to higher order statistical moments, individual differences in NAcc activation as well as positive arousal predict subsequent preferences for positively skewed gambles (Wu et al., 2011). These findings suggest that even given the same statistical gambles, individual differences in affective and neural responses may provide finer-grained predictions that describe not only group behavior, but also individual choice. These findings also imply the novel prediction that even after holding mean and variance constant, ventral striatal (NAcc) activity should predict approach toward positively skewed gambles, while anterior insula activity should predict avoidance of negatively skewed gambles, a prediction worthy of further investigation. Thus, different types of financial risk (e.g., variance versus skewness, positive versus negative skewness) may differentially recruit circuits involved in financial risk taking.
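To make the idea of holding mean and variance constant while varying skewness concrete, the following minimal Python sketch computes the first three moments of three gambles. The payoffs and probabilities are hypothetical values chosen for illustration only; they are not stimuli from any of the cited studies, and the `moments` helper is an assumption of this sketch rather than a method described in the article.

```python
from math import isclose


def moments(outcomes, probs):
    """Return (mean, variance, standardized skewness) of a discrete gamble."""
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    third = sum(p * (x - mean) ** 3 for x, p in zip(outcomes, probs))
    skew = third / var ** 1.5  # third central moment divided by sigma cubed
    return mean, var, skew


# Hypothetical gambles as (outcomes, probabilities); illustrative values only.
gambles = {
    "positive_skew": ([4.0, -1.0], [0.2, 0.8]),  # small chance of a large gain
    "symmetric": ([2.0, -2.0], [0.5, 0.5]),      # equal chance of gain and loss
    "negative_skew": ([-4.0, 1.0], [0.2, 0.8]),  # small chance of a large loss
}

results = {name: moments(x, p) for name, (x, p) in gambles.items()}
for name, (m, v, s) in results.items():
    print(f"{name:>14}: mean={m:.2f} variance={v:.2f} skewness={s:.2f}")
```

All three gambles share approximately the same mean and variance, so a mean-variance model would treat them as equivalent. They differ only in the sign of skewness, which is the dimension along which the anticipatory affect account predicts diverging NAcc and anterior insula responses.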

An affective neuroscience account also yields novel predictions about the influence of incidental stimuli on financial risk taking. Specifically, stimuli that increase positive arousal should encourage financial risk taking, whereas stimuli that increase negative arousal might discourage financial risk taking, even when those stimuli are irrelevant to the task at hand. Indeed, in a neuroimaging study of heterosexual males, exposure to positive (i.e., erotic) images, versus neutral images of office supplies or aversive images of snakes and spiders, tended to increase choices of higher risk (i.e., higher variance) gambles, and this effect was partially mediated by NAcc activation (Knutson et al., 2008). In a follow-up behavioral study that included males and females, prior presentation of positive images increased financial risk taking, whereas prior presentation of negative images decreased financial risk taking (Kuhnen and Knutson, 2011).

While these influences may hold in tightly controlled and carefully incentivized laboratory demonstrations, do they generalize to "real world" choices? Researchers have speculated that investors continue to show biases in choice despite financial advice or knowledge to the contrary. Some of these biases, such as the lack of diversity in investment portfolios, may result from preferences for skewness (Mitton and Vorkink, 2007). Additionally, because individuals are willing to pay more for positively skewed investments but receive more for accepting negatively skewed investments (Ang et al., 2006), skewness preferences may not only describe individual investment choices, but may even scale to market valuation at the aggregate level (Arditti and Levy, 1975).

In summary, to explain human financial risk taking, economists have traditionally referred to objective statistical properties of financial options, while psychologists have emphasized subjective emotional and cognitive processes in the decision maker. An affective neuroscience account bridges these perspectives by proposing that the brain translates statistical input into affective experience, which then can influence choice. Importantly, this affective neuroscience account generates novel yet testable predictions of how various statistical moments might influence choice, and further specifies which neural components should translate statistical input into choice output. The existing findings summarizing neural responses to the first three statistical moments of financial options (i.e., mean, variance, and skewness) lend support to an affective neuroscience approach. Future work using brain activity to predict choice may generate predictions that transcend traditional economic and psychological theories. Ultimately, a better understanding of the neural mechanisms that influence financial risk taking may improve not only individuals' financial choices, but also societal welfare by better informing public policy.

#### **ACKNOWLEDGMENTS**

This research was supported by NSF Grant 0748915 to Brian Knutson and an NSF Graduate Research Fellowship to Charlene C. Wu. We thank Christopher Burke and Philippe Tobler for providing their data for reanalysis, as well as Grace Tang, Katja Spreckelmeyer, and two reviewers for comments on previous drafts.

#### **REFERENCES**

Abler, B., Herrnberger, B., Gron, G., and Spitzer, M. (2009). From uncertainty to reward: BOLD characteristics differentiate signaling pathways. *BMC Neurosci.* 10, 154. doi: 10.1186/1471-2202-10-154

Alderfer, C., and Bierman, H. (1970). Choices with risk: beyond the mean and variance. *J. Bus.* 43, 341–353.

Ang, A., Chen, J., and Xing, Y. (2006). Downside risk. *Rev. Financ. Stud.* 19, 1191.

Arditti, F., and Levy, H. (1975). Portfolio efficiency analysis in three moments: the multiperiod case. *J. Financ.* 30, 797–809.


(2012). Activation likelihood estimation revisited. *Neuroimage* 59, 2349–2361.


choice. *Philos. Trans. R. Soc. B Biol. Sci.* 363, 3771–3786.


Markowitz, H. (1952). Portfolio selection. *J. Financ.* 7, 77–91.


of reward magnitude, probability, and risk during a wheel of fortune decision-making task. *Neuroimage* 44, 600–609.


gain- and loss-related value predictions and errors of prediction in the human brain. *J. Neurosci.* 26, 9530–9537.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 July 2012; paper pending published: 23 August 2012; accepted: 15 October 2012; published online: 02 November 2012.*

*Citation: Wu CC, Sacchet MD and Knutson B (2012) Toward an affective neuroscience account of financial risk taking. Front. Neurosci. 6:159. doi: 10.3389/fnins.2012.00159*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Wu, Sacchet and Knutson. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*