## DECISION-MAKING EXPERIMENTS UNDER PHILOSOPHICAL ANALYSIS: HUMAN CHOICE AS A CHALLENGE FOR NEUROSCIENCE

EDITED BY: Gabriel José Corrêa Mograbi and Carlos Eduardo Batista de Sousa PUBLISHED IN: Frontiers in Neuroscience


ISSN 1664-8714 ISBN 978-2-88919-668-5 DOI 10.3389/978-2-88919-668-5


## **DECISION-MAKING EXPERIMENTS UNDER PHILOSOPHICAL ANALYSIS: HUMAN CHOICE AS A CHALLENGE FOR NEUROSCIENCE**

## Topic Editors:

**Gabriel José Corrêa Mograbi,** Universidade Federal de Mato Grosso (UFMT), Brazil **Carlos Eduardo Batista de Sousa,** Universidade Estadual do Norte Fluminense (UENF), Brazil

## Cover image: Image by Gabriel José Corrêa Mograbi

Introductory page figure: Multi-level kernel density analysis results for (A) externally guided decision-making under uncertainty, (B) externally guided decision-making in a social situation, and (C) internally guided decision-making. Results from the different statistical thresholds are shown with different colors: cyan, pink, and yellow, a height threshold of familywise error rate (FWE) corrected at p < 0.05; orange, a stringent threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.001; blue, violet, and red, a medium threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.01. No clusters were identified at the stringent threshold in externally guided decision-making under uncertainty or in a social situation. DMPFC, dorsomedial prefrontal cortex; DLPFC, dorsolateral prefrontal cortex; IPL, inferior parietal lobule; IFG, inferior frontal gyrus; pACC, perigenual anterior cingulate cortex; PCC, posterior cingulate cortex; MPFC, medial prefrontal cortex; STG, superior temporal gyrus.

Image taken from: Nakao T, Ohira H and Northoff G (2012) Distinction between externally vs. internally guided decision-making: operational differences, meta-analytical comparisons and their theoretical implications. Front. Neurosci. 6:31. doi: 10.3389/fnins.2012.00031

This introduction is meant only as a brief foreword to the special topic, now turned into an e-book. The Editorial "Decision-Making Experiments under a Philosophical Analysis: Human Choice as a Challenge for Neuroscience", together with my opinion article "Neurophilosophical considerations on decision making: Pushing-up the frontiers without disregarding their foundations", plays the real role of considering the articles and the whole purpose of this e-book in more detail.

What I must highlight in this foreword is that our intention with this project was to dig into the very foundations of the current paradigms in decision neuroscience and to debate philosophically their foundations and repercussions. Normal Science (a term coined by the philosopher Thomas Kuhn) works under a research consensus among a scientific community: a shared paradigm, consolidated methods, widespread convictions. Pragmatically, winning formulas must be kept, although not at any cost. What differentiates a gifted and revolutionary scientist from a more bureaucratic colleague is the capacity to question their own paradigm. That is the best strategy to keep the paradigm itself from gradually coming under challenge.

In my view, some achievements, in this sense, were brought about in our project. The e-book will be inspiring and informative for both neuroscientists that are concerned with the very foundations of their works and for philosophers that are not blind to empirical evidence. Kant once said: "Thoughts without content are empty, intuitions without concepts are blind". Paraphrasing Kant we could say: Philosophy without science is empty, science without philosophy is blind.

**Citation:** Mograbi, G. J. C., Batista de Sousa, C. E., eds. (2015). Decision-Making Experiments under Philosophical Analysis: Human Choice as a Challenge for Neuroscience. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-668-5


## Editorial: Decision-making experiments under a philosophical analysis: human choice as a challenge for neuroscience

Gabriel J. C. Mograbi\*

Research Group on Mind and Brain, Department of Philosophy, Federal University of Mato Grosso, Cuiabá, Brazil

Keywords: decision-making, decision neuroscience, neurophilosophy, neuroethics, neuroeconomics, free will, neural correlates of decision-making

Decision-making is a complex subject in neuroscience. In recent years, considerable advances have been achieved in different fields, ranging from modulatory neurotransmitters to functional imaging, from neuroeconomics to neuroethics. Our research topic offers a critical view of the state of the art of decision neuroscience by means of foundational and methodological approaches to practical and empirical science. Accordingly, we invited contributions that analyze neuroscientific experiments in depth, in both technical and philosophical ways, aiming at a broader understanding of the relevance, scope and limitations of decision-making experiments. Moreover, we encouraged epistemological reflections on the neural mechanisms necessary for decision-making. This topic comprises the following papers:

Sip et al. (2012) address the decision to deceive and its related social pressure. Participants in the fMRI scanner were questioned by an opponent about their knowledge of a display's content and were rewarded for successful deception and penalized for unsuccessful attempts. The results show, as expected, that the decision to deceive is influenced by the risk of being detected and by the social confrontation that detection entails. They also reveal that participants were slower when taking an honest course of action than when taking advantage of their privileged knowledge. Important results concerning the functional brain areas involved in the tasks are also presented.

An elegant Bayesian decision model is presented in Deneve (2012) that both infers the probability of two different choices and simultaneously estimates the reliability of the sensory information on which this choice is based. In trials in which the level of difficulty is higher, early sensory inputs have a stronger impact on the decision. Accordingly, the threshold collapses such that response time is shorter, though with lower accuracy. Easy trials, in turn, show the opposite: an increased sensory weight and a higher threshold over time, eliciting slower, but more accurate, decisions. Because the model advanced by the author uses adaptive sensory weights, it can not only extract a single estimate from the sensory input, but also evaluate the uncertainty associated with it.
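The effect of a collapsing threshold on response time can be illustrated with a generic accumulate-to-bound sketch. This is not the Bayesian model of Deneve (2012): it does not estimate sensory reliability, and the function names, bound shapes, and numbers below are hypothetical choices made only for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def accumulate_to_bound(
    evidence: Iterable[float],
    bound: Callable[[int], float],
) -> Optional[Tuple[int, int]]:
    """Sum signed evidence samples until the running total reaches the bound.

    Returns (choice, step) with choice +1 or -1, or None if no bound is hit.
    """
    total = 0.0
    for step, sample in enumerate(evidence, start=1):
        total += sample
        if abs(total) >= bound(step):
            return (1 if total > 0 else -1, step)
    return None


def fixed_bound(step: int) -> float:
    # Hypothetical constant decision threshold.
    return 3.0


def collapsing_bound(step: int) -> float:
    # Hypothetical linear collapse over time, with a floor of 0.5.
    return max(0.5, 3.0 - 0.5 * step)


# The same (noise-free, illustrative) evidence stream under both bounds.
samples = [0.5] * 10
fixed_result = accumulate_to_bound(samples, fixed_bound)
collapsing_result = accumulate_to_bound(samples, collapsing_bound)
```

On the same evidence stream, the collapsing bound is reached in fewer steps than the fixed bound. With noisy evidence, the earlier commitment would rest on less accumulated information, which is the sense in which shorter response times trade against accuracy.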

Osman (2012) empirically compares Choice-based decision-making and Prediction-based learning, showing that the former leads to more accurate cue-outcome knowledge. The author interprets the results as suggesting that the additional demand on cognitive resources for the processing of rewards could explain its adverse effect on the decisional process. A series of philosophical considerations is also put forward to question how far evidence from neuropsychology generalizes to psychology and vice versa. In this context, the relationship between intra-level and inter-level experiments is considered.

Edited and reviewed by: Paul E. M. Phillips, University of Washington, USA

\*Correspondence: Gabriel J. C. Mograbi, gabriel.mograbi@gmail.com

## Specialty section:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 23 June 2015 Accepted: 30 July 2015 Published: 18 August 2015

#### Citation:

Mograbi GJC (2015) Editorial: Decision-making experiments under a philosophical analysis: human choice as a challenge for neuroscience. Front. Neurosci. 9:288. doi: 10.3389/fnins.2015.00288

Nakao et al. (2012) compare and disentangle two types of empirical protocols used to study decisional processes: experiments that assign participants tasks in which a unique but uncertain answer is presupposed, and experiments in which no unique externally cued answer can be considered correct. The former is categorized as externally oriented decision-making and the latter as internally oriented decision-making. The article also uses Multi-level Kernel Density Analysis (MKDA) to contrast internally and externally guided decisions in terms of the areas recruited, and finally compares the commonalities and differences between the two types of decisions.

Heinzelmann et al. (2012) discuss the practical and moral question of inappropriate behavior, considering its foundations in both the normative and the descriptive philosophical domains. The moral implications of empirical findings in neuroscience, economics and psychology are discussed in the light of this philosophical background, aiming at an understanding of the possible mechanisms of morally inappropriate actions and the decisional process that leads to them. More importantly, the paper addresses the morally important and controversial question of interventions to promote behavior improvement.

Taking as a standpoint Stephens and Anderson's (2001) already classic article, Bourgeois-Gironde (2012) aims at considering the viability of methodological transfers from behavioral ecology to experimental economics, including human choice inasmuch as it is concerned with intertemporal preferences. The author suggests that economic theories have noticeable similarities to ecological models in their assumptions and implications.

Lucci (2013) proposes an investigation of the subjective component of time in intertemporal choice (IC). The author asserts that deviations from exponential reward discounting, as a function of time, could have as a primary factor the deviation of subjective time from the calendar metric system time. Time perception, she claims, could modulate discounting. Consequently, time perception would be a fundamental component of intertemporal choice.
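The link between subjective time and non-exponential discounting can be made concrete with a short worked derivation. The logarithmic form of subjective time used here is an illustrative assumption, not a formula taken from Lucci (2013); the symbols $A$, $k$, $\alpha$, $\beta$ and $\tau$ are introduced only for this sketch.

$$
\begin{aligned}
V(\tau) &= A\,e^{-k\tau} && \text{(exponential discounting in subjective time } \tau\text{)}\\
\tau(t) &= \alpha \ln(1 + \beta t) && \text{(assumed logarithmic perception of calendar time } t\text{)}\\
V(t) &= A\,e^{-k\alpha \ln(1+\beta t)} = \frac{A}{(1+\beta t)^{k\alpha}} && \text{(hyperbola-like discounting in calendar time)}
\end{aligned}
$$

Under this assumption, an agent who discounts exponentially in subjective time appears to deviate from exponential discounting when delays are measured in calendar time; for $k\alpha = 1$ the expression reduces to the simple hyperbolic form $A/(1+\beta t)$.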

In Smaldino and Richerson (2012) the authors argue that current paradigms in neuroscience are focused on decisions made among a previously established set of options, although the very generation of options has barely been studied and remains, to a great extent, an untapped issue. The authors consider various specific factors that could influence the generation of options, which can be categorized into two broadly defined domains: psycho-biological and socio-cultural.

Volz and Gigerenzer (2012) argue that normative strategies used to decide under risk cannot be generalized to all types of decision-making processes. They stress that, in most experimental designs, the strategies for dealing with risk are assumed as implicit presuppositions even when they are not applicable. They show that criteria for generating optimal solutions in decisional processes under risk may not be the best whenever uncertainty is the difficulty the agents have to cope with.

Shadlen and Roskies (2012) defend the possibility of reconciling responsibility with neurobiological mechanism by philosophically reviewing the presuppositions and implications of recent empirical studies in neurobiology. Instead of the more traditional account of compatibilism based on an appeal to randomness or noise as a source of freedom, they recognize that randomness could possibly establish the background against which policies have to be adopted.

Finally, Mograbi (2013) summarizes and critically analyses the merits, achievements, scope and limitations of each article in this present edition and also considers future directions in some of those cases. It can be taken as an extension of this editorial and constitutes a more detailed introduction to the whole edition.

## Acknowledgments

On a final note, I want to thank all referees for their contributions, the editorial team, especially Graemme Moffat, the Chief Editors of Frontiers in Decision Neuroscience, Hauke Heekeren and Scott Huettel, and my co-editor Carlos de Sousa. More importantly, I want to especially thank each author for their contribution to our research topic. GM was supported by a postdoctoral fellowship from the Coordination for the Improvement of Higher Education Personnel (Coordenação de Aperfeiçoamento de Pessoal de Ensino Superior - CAPES – Brazil).

## References

Mograbi, G. J. C. (2013). Neurophilosophical considerations on decision making: pushing-up the frontiers without disregarding their foundations. Front. Neurosci. 7:261. doi: 10.3389/fnins.2013.00261

Sip, K. E., Skewes, J. C., Marchant, J. L., McGregor, W. B., Roepstorff, A., and Frith, C. D. (2012). What if I get busted? Deception, choice, and decision-making in social interaction. Front. Neurosci. 6:58. doi: 10.3389/fnins.2012.00058


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Mograbi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## What if I get busted? Deception, choice, and decision-making in social interaction

## *Kamila E. Sip<sup>1,2,3</sup>\*, Joshua C. Skewes<sup>2</sup>, Jennifer L. Marchant<sup>4,5</sup>, William B. McGregor<sup>3</sup>, Andreas Roepstorff<sup>2,6</sup> and Christopher D. Frith<sup>2,4</sup>*

<sup>1</sup> Department of Psychology, Rutgers University, Newark, New Jersey, USA

<sup>2</sup> Center for Functionally Integrative Neuroscience, Aarhus University Hospital, Aarhus, Denmark

<sup>3</sup> Department of Aesthetics and Communication – Linguistics, University of Aarhus, Aarhus, Denmark

<sup>4</sup> Wellcome Trust Centre for Neuroimaging, University College London, London, UK

<sup>5</sup> Institute of Cognitive Neuroscience, University College London, London, UK

<sup>6</sup> Section for Anthropology and Ethnography, Department of Culture and Society, University of Aarhus, Aarhus, Denmark

#### *Edited by:*

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

#### *Reviewed by:*

Nobuhito Abe, Harvard University, USA; Julian Keenan, Montclair State University, USA; Daniel C. Mograbi, King's College London, UK

#### *\*Correspondence:*

Kamila E. Sip, Social and Affective Neuroscience Lab, Department of Psychology, Rutgers University, Smith Hall, Room 301, 101 Warren Street, Newark, NJ 07102, USA. e-mail: ksip@psychology.rutgers.edu

Deception is an essentially social act, yet little is known about how social consequences affect the decision to deceive. In this study, participants played a computerized game of deception without constraints on whether or when to attempt to deceive their opponent. Participants were questioned by an opponent outside the scanner about their knowledge of the content of a display. Importantly, questions were posed so that, in some conditions, it was possible to be deceptive, while in other conditions it was not. To simulate a realistic interaction, participants could be confronted about their claims by the opponent. This design, therefore, creates a context in which a deceptive participant runs the risk of being punished if their deception is detected. Our results show that participants were slower to give honest than to give deceptive responses when they knew more about the display and could use this knowledge for their own benefit. The condition in which confrontation was not possible was associated with increased activity in subgenual anterior cingulate cortex. The processing of a question which allows a deceptive response was associated with activation in right caudate and inferior frontal gyrus. Our findings suggest the decision to deceive is affected by the potential risk of social confrontation rather than the claim itself.

#### **Keywords: deception, confrontation, social interaction, decision-making**

## **INTRODUCTION**

Deception has been of interest to psychologists, forensic experts, and laymen (Woodruff and Premack, 1979; Whiten and Byrne, 1988; Saarni and Lewis, 1993; Bradley et al., 1996; Walters, 2000). It has triggered trans-disciplinary scientific investigations within anthropology; philosophy; cognitive, social, and forensic psychology; and recently, cognitive neuroscience. Among the reasons for studying deception, determining the motivation for deceptive behavior and enhancing recognition of deceptive strategies appear to be of core interest. For deception to be successful, it needs to have some foundation in truth, such that people tend not to deceive with a cluster of deceptive messages, but instead incorporate deception while telling the truth (see e.g., Ekman, 1992; DePaulo et al., 1996; DePaulo and Kashy, 1998). Therefore, deception may be interwoven into a partially honest message, to secure the trust of interlocutors.

Complex social interaction typically requires the ability to make rapid decisions that take account of possible outcomes. This involves a broad set of cognitive processes, including the ability (i) to determine the possible courses of action and to identify how they could be coordinated with the interlocutor, (ii) to weigh these available courses of action against one another, and (iii) to choose which action to perform next in the interaction.

Deception is an example of a complex social interaction and thus involves the same set of cognitive processes (Sip et al., 2008) but has the goal to instill a false belief in the mind of the interlocutor so as to manipulate how the interaction unfolds. To deceive, therefore, consciously and/or subconsciously we must be able (i) to determine whether deception is one of the set of possible actions in the interaction, (ii) to weigh the advantage to be gained by deceiving against the risks and consequences of being detected, and (iii) to choose to perform the deceptive action. As argued by Sip et al. (2008), these key cognitive components of social decision-making, and not the telling of a falsehood as such, provide the main explanatory content for the neural activity associated with the production of deception. Here, we aim to explore decision-making in deception in terms of the costs and values of our day-by-day contexts, while providing a free choice within the limitations of decision-making in laboratory settings.

In deceptive encounters, the change in circumstances is connected not only to the decision *per se*, but also to the impact resulting from an attempt to modulate the perspectives and beliefs of others. Therefore, like all choices – especially in social interactions – deception is influenced by probable gains and losses. Usually, we choose to deceive because we believe that if our deception is successful, we shall be better off than if we had told the truth.

There are many variables to consider in making such a choice. Will our deception be detected? What are the consequences of detection? Will we gain something if we are falsely accused of telling a falsehood (see Sip et al., 2010)? Deception is not just a simple matter of truth and falsehood. The gains from deception can be large, but the actual calculation of relative gains and losses involves solving a complicated decision-making tree, which can, at best, only be approximated. In real-life, the cost of being caught red-handed can be enormous, in terms of loss of reputation, trust, power, or money. Consequently, the danger of being confronted with one's deceptive claims may share similarities with experiencing negative social consequences, such as rejection (Masten et al., 2009; Onoda et al., 2009).

There has been a significant lack of imaging literature that treats deception as a social phenomenon. Only recently have neuroimaging investigations started treating deception within a framework of social decision-making (see e.g., Abe et al., 2007; Barrios et al., 2008; Baumgartner et al., 2009; Greene and Paxton, 2009; Bhatt et al., 2010; Carrion et al., 2010; Sip et al., 2010). Abe and colleagues addressed the issue of instructed lies by introducing a clever twist in their instructions to participants (Abe et al., 2007). Using a temporary absence of experimenter 1, experimenter 2 secretly instructed participants to deceive experimenter 1 by providing responses opposite to those suggested by experimenter 1. Interestingly, in this study, participants faced an externally introduced change to the set of rules, and therefore it might be problematic to account for that change as a result of both peripheral attentional load and deception activation that could have contributed to the final results. Bhatt et al. (2010) investigated the role of social image in strategic deception to manipulate others' beliefs about each other for gains in a bargaining game. Another study tested how participants would behave when faced with a possibility of being deceptive to gain monetary rewards (dishonest gain; Greene and Paxton, 2009).

Many earlier studies (see e.g., Ganis et al., 2003; Spence et al., 2004; Langleben et al., 2005) have tested the production of deception by instructing participants when to tell a falsehood. In this way, the truth or falsity of participants' claims has been treated as an *independent* variable in most experimental paradigms, such that in most experiments, whether a claim is true or false has been under the control of the experimenter and not the participant. This approach excludes social decision-making from the experimental equation (see Sip et al., 2008 and also Greely and Illes, 2007). Therefore, the purpose of the present study is to take an alternative approach that focuses more on the social decision-making processes involved in deception, rather than on deception as a "yes" or "no" response equated with an honest or deceptive response respectively. We were primarily interested in investigating how participants produced deception given a free choice to make deceptive claims when detection was a possible social consequence. Therefore, rather than treating deception as an *independent variable* coded in a balanced factorial design, we instead controlled the social context for deception by systematically varying both the possibility to deceive and the possibility of being detected. Then, within this context, we left participants free to decide when and if they should attempt to make deceptive claims. We thus treated the responses associated with the decision to deceive as a *modulatory variable*.

A novel design was implemented in an attempt to accommodate free choice and potential confrontation. In a paradigm modified from a behavioral study of Keysar et al. (2000), participants were questioned by an interlocutor about their knowledge of the content of a display, and the interlocutor could sometimes challenge their responses. Rather than being instructed to deceive the interlocutor, questions were posed to participants so that deception was meaningful in some conditions and not in others, and so that any acts of deception could be detected in some conditions and not in others. Within this design, participants were left to choose for themselves when to deceive, and with that choice followed the possible consequence of being caught out in a lie. This allowed us to treat deception as an outcome of a social decision-making process, and, in our data analysis, to regress the decision to deceive with neural and behavioral measures. Given that deception is a social decision-making process, and that the anterior cingulate cortex (ACC) is involved in decision-making (see e.g., Botvinick, 2007; Dolan, 2007; Rushworth and Behrens, 2008; Croxson et al., 2009), we expected ACC to be active in conditions where it was necessary to balance a monetary reward for successfully deceiving the interlocutor against the risk of detection (e.g., Abe et al., 2006; Baumgartner et al., 2009).

Participants played both against (what they believed were) a human and a computer. This double partnership was motivated by previous social studies that showed that participants care whether their opponent is a human and attribute different behavior accordingly (see e.g., Gallagher et al., 2002). This aspect has not yet been tested in deception paradigms.

It bears clarifying that the primary aim of our study was not to observe how behavior and neural activity of individuals were affected by the *performance* of deception *per se*. Rather, the primary aim of our study was to investigate how individuals' decision to deceive modulates their behavior and neural activity given the social and informational context in which that decision is made. Our focus was therefore not on the production of deception as an act in and of itself, but rather on the social decision-making processes associated with the production of deception. This is why the participants' decision to deceive was treated as a free modulatory parameter in this study, and not as part of the study's factorial design. In this way, our study breaks with standard practice in the design of deception experiments for the purpose of addressing an important unresolved issue.

## **MATERIALS AND METHODS**

#### **SUBJECTS**

Sixteen healthy, right-handed participants with no reported neurological or psychiatric disorders responded to an ad to volunteer in the experiment. Data from two participants were excluded. One told a falsehood at all times regardless of the context, while there were excessive movement artifacts in the fMRI data for the other. The remaining 14 participants (7 males) were aged between 20 and 45 years (mean = 26; SD = 6.9). Participants gave written informed consent to take part in the study, conducted according to the principles expressed in the Declaration of Helsinki, which was approved by the Joint Ethics Committee of the National Hospital for Neurology and Neuroscience (UCL NHS Trust) and Institute of Neurology (UCL).

## **STIMULI**

Participants were presented with a two-dimensional representation of a three-dimensional box. The box was divided into 16 compartments (4 × 4 grid) or shelves (**Figure 1**). On each trial, each compartment could be empty or contain one of seven different objects. Each compartment was always represented as open to the front, but could be either open or closed to the back. From the front view, it was obvious if a particular object could also be seen from the back.

## **PROCEDURE**

While in the scanner, participants were shown the front view of the stimulus, and were told an interlocutor was simultaneously being shown the back view. On each trial (see **Figure 2**), the interlocutor asked participants if they could see a target object on any of the shelves. The target object was randomized across trials. There was no restriction on whether the response should be true or false. Participants heard the questions via headphones and responded yes or no by button press.

The opponent could ask three types of question (A, B, and C). For *Question type A,* the target object was visible from the front and the back views, so that it was obvious to the participant that the interlocutor could easily detect deception (symmetrical knowledge; truth\_eliciting question). For *Question type B,* the target object was only visible from the front view, so that it was obvious to the participant that it should be more difficult for the interlocutor to detect deception (asymmetrical knowledge, deception by omission; falsehood\_eliciting question). For *Question type C,* the target object was not present in the box, so that it was more difficult for the interlocutor to detect deception, but this was not immediately obvious to the participant because it required visual search (asymmetrical knowledge, deception by commission; falsehood\_eliciting question).
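The three question types follow mechanically from what each player can see, which a minimal sketch can make explicit. The class and function names are hypothetical, and the handling of a target that appears on more than one shelf is an assumption not stated in the text; the object names are the examples used in the figure caption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Compartment:
    obj: Optional[str]  # object on this shelf, or None if the shelf is empty
    open_back: bool     # True if the interlocutor can see in from the back


def question_type(box: List[Compartment], target: str) -> str:
    """Classify a question about `target` as type 'A', 'B', or 'C'."""
    holders = [c for c in box if c.obj == target]
    if not holders:
        return "C"  # target absent: deception by commission is possible
    if any(c.open_back for c in holders):
        # Assumption: one copy visible from the back is enough for type A.
        return "A"  # symmetrical knowledge: truth-eliciting question
    return "B"      # visible from the front only: deception by omission


# A 4 x 4 box (16 compartments) with two occupied shelves, for illustration.
box = [Compartment(None, True) for _ in range(14)]
box.append(Compartment("roller-skate", open_back=True))
box.append(Compartment("doll", open_back=False))

types = {t: question_type(box, t) for t in ("roller-skate", "doll", "giraffe")}
```

Here `types` maps the roller-skate to type A, the doll to type B, and the absent giraffe to type C, matching the example questions given for the task.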

The experiment consisted of two sessions with different types of interlocutor (human or computer). Each session consisted of six blocks. In two blocks participants were informed that a computer program posed the questions and a computer-generated voice was used. In another two blocks participants were informed that the questions were posed online by the experimenter (K. Sip), whose voice they had heard, and with whom they had interacted prior to the functional scans. In the two remaining blocks, participants were instructed to always state whether an object was present (answer truthfully with no motivation to deceive). These blocks were only used to check whether participants understood the task, and they were not used in the fMRI analysis. Unknown to the participants, the experimenter's voice was pre-recorded and the questions were posed in a predetermined order.

participants were asked several different types of questions regarding the contents of the box, e.g., Question type A "Do you see a roller-skate?" (Truth\_eliciting question), Question type B "Do you see a doll?" (Falsehood\_eliciting question), Question type C "Do you see a giraffe?" (Falsehood\_eliciting question).

In each of these situations, the interlocutor could confront participants about their responses in one block but not in the other. Although participants always knew which block they were in, they did not know which responses would be confronted. They were informed prior to the start of the confrontation block that the interlocutor was allowed to confront only some of their responses, usually up to four responses per block.

Each experimental trial could be rewarded or punished with a small amount (50 pence per event). Participants were informed that they would be rewarded for successful deception and penalized for unsuccessful attempts across all conditions. There was no monetary consequence for telling the truth when the object was visible for both players. The system of rewards was introduced to further motivate participants to try to avoid detection. Importantly, no monetary feedback was given to the participants during the functional scans at any point. Therefore, participants were not able to track their rewards on a trial-to-trial basis, which allowed them to give priority to the decision about whether to be honest or not. This was important to ensure that participants were attentive in all conditions and refrained from giving only one type of response, e.g., always replying "yes" when confrontation was not possible. The total rewards were calculated at the end of the experiment.

The same reward pattern was used for unchecked trials in the confrontation blocks. However, in the few predetermined checked trials (four per block), participants were penalized if they were caught telling a falsehood, and were compensated for being wrongly accused of telling a falsehood when they made a truthful response.

Question trials were randomized within the blocks. Block and session order were counterbalanced using a 2 × 2 Latin Square. After the experiment was completed, the participants were debriefed, which revealed that all believed they had interacted with a human during the human sessions, and that all had actively tried to deceive her.

## **ANALYSIS AND DESIGN**

A three-way factorial design was used with question type (3) × confrontation (2) × interlocutor (2) as factors, with response type included as a covariate and response time as a dependent variable. In data analysis, participants' decision to answer truthfully or to try to deceive the interlocutor was added as a modulator [as a covariate for the response times and a parametric modulation for the blood oxygenation level-dependent (BOLD) signal]. This allowed us to determine the influence of participants' active social decision-making on their behavior and neural activity when performing deception.

The approach of including participants' decision to deceive as a modulatory variable deviates from the usual approach of treating variables of interest as controlled experimental factors to be analyzed with analysis of variance. However, our choice is justified, both in principle and empirically, from the perspective of our experimental design. The truth or falsity of participants' responses was not experimentally controlled, but intentionally left under participant control, so that the choice to deceive was not an independent variable in our study. In principle, therefore, the choice to deceive is not a valid target for inclusion as a separate factor in our analysis. Moreover, because participants were free to decide when they should make deceptive claims, they attempted to deceive more often in some conditions than in others. Empirically, therefore, participants' decision to deceive is not sufficiently balanced across conditions, so that treating this variable as a factor would violate one of the core assumptions of analysis of variance. It should also be recalled in this context that our reason for designing the study in this way was that we were not interested in deception in itself as an isolated speech act, but in the social decision-making processes involved in deception. Participants' free decision to deceive was thus conceived in our experimental design as a modulatory variable, and is analyzed as such.
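The structure of the design described above can be made concrete with a short sketch. It enumerates the 3 × 2 × 2 cells and counts truthful and deceptive responses per cell, which is one way to see why a freely chosen response type ends up unbalanced across conditions. The label strings, function names, and trial tuple layout are illustrative assumptions, not part of the original analysis pipeline.

```python
from itertools import product

# Factor levels as described in the text; the label strings are illustrative.
QUESTION_TYPES = ("A", "B", "C")
CONFRONTATION = ("confrontation", "no_confrontation")
INTERLOCUTOR = ("human", "computer")


def design_cells():
    """Return every cell of the 3 x 2 x 2 factorial design."""
    return list(product(QUESTION_TYPES, CONFRONTATION, INTERLOCUTOR))


def cell_counts(trials):
    """Count truthful and deceptive responses in each design cell.

    `trials` is an iterable of (question, confrontation, interlocutor,
    deceptive) tuples, where `deceptive` is a bool chosen by the participant.
    """
    counts = {cell: {"truthful": 0, "deceptive": 0} for cell in design_cells()}
    for question, confrontation, interlocutor, deceptive in trials:
        key = "deceptive" if deceptive else "truthful"
        counts[(question, confrontation, interlocutor)][key] += 1
    return counts
```

Because the participant, not the experimenter, sets the `deceptive` flag, nothing forces the two counts in a cell to be equal, which is the imbalance the text gives as the empirical reason for not treating response type as a factor.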

#### **fMRI SCANNING PARAMETERS**

A 1.5T Siemens Sonata MRI scanner (Siemens, Erlangen, Germany) was used to acquire T1-weighted anatomical images and T2∗-weighted echo-planar functional images with blood oxygenation level-dependent (BOLD) contrast (35 axial slices, 2 mm slice thickness with 1 mm gap, 3 mm × 3 mm in-plane resolution, slice TE = 50 ms, volume TR = 3.15 s, 64 × 64 matrix, 192 mm × 192 mm FOV, 90° flip angle). Two functional EPI sessions of, on average, 345 whole-brain volumes each (range 300–364, depending on participants' response speed) were acquired, and the first four volumes were discarded to allow for T1 equilibrium effects.

Image processing was carried out using SPM5 (Statistical Parametric Mapping software, Wellcome Trust Centre for Neuroimaging, UCL)<sup>1</sup> implemented in MATLAB (The Mathworks Inc., Massachusetts)<sup>2</sup>. EPI images were realigned and unwarped to correct for movements, slice time corrected, spatially normalized to standard space using the Montreal Neurological Institute EPI template (voxel size of 2 mm × 2 mm × 2 mm) and spatially smoothed with an 8 mm full-width half maximum Gaussian kernel.
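The acquisition and smoothing parameters above imply a few derived quantities that are easy to check by hand. The sketch below works them out from the reported values only. It assumes that slices are spread evenly across the TR and that the reported mean of 345 volumes describes a typical session; both are simplifications made for illustration.

```python
import math

# Values reported in the text.
FOV_MM = 192.0
MATRIX = 64
N_SLICES = 35
SLICE_THICKNESS_MM = 2.0
SLICE_GAP_MM = 1.0
TR_S = 3.15
MEAN_VOLUMES = 345
FWHM_MM = 8.0

# In-plane voxel size: field of view divided by matrix size.
in_plane_mm = FOV_MM / MATRIX

# Extent covered along the slice axis: 35 slices plus the 34 gaps between them.
coverage_mm = N_SLICES * SLICE_THICKNESS_MM + (N_SLICES - 1) * SLICE_GAP_MM

# Time per slice, assuming slices are acquired evenly across the TR.
slice_time_s = TR_S / N_SLICES

# Duration of a session with the mean number of volumes.
session_min = MEAN_VOLUMES * TR_S / 60.0

# Standard deviation of the Gaussian smoothing kernel: FWHM / (2 * sqrt(2 ln 2)).
sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))

print(f"in-plane voxel size: {in_plane_mm:.1f} mm")
print(f"slice-axis coverage: {coverage_mm:.0f} mm")
print(f"time per slice: {slice_time_s * 1000:.0f} ms")
print(f"mean session duration: {session_min:.1f} min")
print(f"smoothing sigma: {sigma_mm:.2f} mm")
```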

#### **IMAGING DATA ANALYSIS**

All events were modeled using the standard hemodynamic response function of SPM5. The design matrix comprised a column for each experimental condition, with separate events defined by their onset time and duration (based on participants' response times). In keeping with our statistical approach of treating the participants' decision to deceive as a modulatory variable, participants' truthful and deceptive responses in each condition were added as separate parametric modulations of each column of the design matrix. The fit to the data was estimated for each participant using a general linear model (Friston et al., 1995) with a 128 s high-pass filter, global scaling, and modeling of serial autocorrelations.

<sup>1</sup>www.fil.ion.ucl.ac.uk/spm

<sup>2</sup>www.mathworks.com

Individual T-contrasts related to the different conditions within our factorial design were created from the parameter estimates (beta weights). T-contrasts were computed within subjects for the main effect of confrontation and the main effect of partner, for the effects of question types A, B, and C, and for the relevant interactions. These were then used in separate second level random effects analyses in order to facilitate inferences about group effects (Friston et al., 1995).

Unless specified otherwise, whole brain results are reported for clusters with at least 10 voxels and a threshold of *p* < 0.005 uncorrected for multiple comparisons, the most commonly reported threshold for social neuroimaging studies (Wager et al., 2007). This threshold allows for an appropriate balance between Type I and Type II errors especially in complicated designs involving sociocognitive decision-making (see e.g., Lieberman and Cunningham, 2009). Additionally, we indicate several areas which survive a more stringent FWE correction for multiple comparisons.

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

A 2 (partner) × 2 (possibility of being confronted) × 3 (type of question) repeated measures ANOVA revealed significant main effects of confrontation [*F*(1,13) = 16.23, *p* = 0.001] and question type [*F*(2,26) = 61.72, *p* < 0.001] on producing false responses. The main effect of partner was not significant [*F*(1,13) = 1.49, *p* = 0.24]. The test revealed a significant interaction between confrontation and question type on the percentage of false claims [*F*(2,26) = 3.65, *p* = 0.04]. There were fewer false responses in the confrontation condition, but this was only the case for the falsehood\_eliciting question types (see **Figure 3**). There was no significant interaction between partner and question type [*F*(2,26) = 1.56, *p* = 0.23] or between partner and confrontation [*F*(1,13) = 0.11, *p* = 0.75] on producing false responses. The three-way interaction was not significant [*F*(2,26) = 0.024, *p* = 0.97].
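The direction of the confrontation × question type interaction can be read off the group means given in the caption of Figure 3. The sketch below simply subtracts the confrontation mean from the non-confrontation mean for each question type. It is descriptive arithmetic on the reported means, not a re-analysis, and the dictionary layout and function name are illustrative.

```python
# Mean percentage of false responses, copied from the Figure 3 caption.
FALSE_RESPONSES = {
    "confrontation": {"A": 8.3, "B": 58.95, "C": 56.04},
    "no_confrontation": {"A": 5.6, "B": 76.45, "C": 72.74},
}


def confrontation_effect(question):
    """Percentage-point drop in false responses when confrontation is possible."""
    without = FALSE_RESPONSES["no_confrontation"][question]
    with_confrontation = FALSE_RESPONSES["confrontation"][question]
    return round(without - with_confrontation, 2)


for question in ("A", "B", "C"):
    print(f"Question type {question}: {confrontation_effect(question):+.2f} points")
```

The two falsehood-eliciting question types (B and C) show drops of roughly 17 percentage points under confrontation, whereas the truth-eliciting type A shows a small change in the opposite direction, which matches the pattern described in the text.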

When the decision to deceive was added as a covariate, a 2 (type of interlocutor) × 2 (possibility of being confronted by the interlocutor) × 3 (type of question asked) repeated measures ANCOVA on response time revealed a significant main effect of question type [*F*(2,12) = 13.26, *p* = 0.001], and a significant interaction between the question type factor and the response type covariate [*F*(2,12) = 4.98, *p* = 0.03]. A marginally significant interaction between confrontation and question type [*F*(2,12) = 3.84, *p* = 0.05] was also revealed.

**Figure 4** shows that (i) when participants and interlocutors had the same knowledge about the presence of an object in the box, participants were faster to give a true response, regardless of the possibility of confrontation; (ii) when there was obviously asymmetric knowledge between participants and the interlocutor, participants were slower to give a true response, but only when there was no possibility of being confronted; and (iii) when participants knew more about the stimulus but greater attention was required to take advantage of this knowledge, they were slower to give a true than a false response, regardless of the possibility of being confronted. These effects were not significant, however, if the covariate coding participants' decision to respond truthfully or falsely on each trial was removed from the analysis.

**FIGURE 3 | Mean percentage of false claims across conditions.** For illustration purposes, this graph shows the mean percentage of false claims across question type and confrontation. In the confrontation condition participants gave 58.95% (SE = 5.63) false responses to Question Type B (the target object was only visible from the front view), 56.04% (SE = 7.15) false responses to Question Type C (the target object was not present in the box), and 8.3% (SE = 2.76) false responses to Question Type A (the target object was visible from the front and the back views). In the non-confrontation condition they gave 76.45% (SE = 4.49) false responses to Question Type B, 72.74% (SE = 7.62) false responses to Question Type C, and 5.6% (SE = 2.61) false responses to Question Type A.

**FIGURE 4 | Mean response times (RT) to answer the opponent's question.** Separate means are given for false and true responses, and for responses given both when the opponent could and could not confront the response. Error bars represent one SEM.

#### **NEUROIMAGING RESULTS**

When the decision to deceive was added as a parametric modulator, the main effect of confrontation showed increased activity in subgenual anterior cingulate cortex (subACC) when participants' responses could not be confronted (**Figure 5**; see **Table 1**).

There was also a significant main effect of question type. For question type B, we observed increased activation in right caudate and inferior frontal gyrus (IFG; **Figure 6**). For question type A, we observed increased activity in right putamen, superior temporal gyrus (auditory cortex), and occipital cortex.

## **DISCUSSION**

The current investigation allowed participants the choice to deceive by creating a context in which deception was sometimes possible, but carried the risk of punishment if it was detected. Our paradigm captures the idea that when people attempt to deceive others, they face a demanding task, based on balancing the tensions between choice and potential outcomes. The paradigm allowed us to treat deception as the outcome of social decision-making, and in our data analysis, to regress the neural and behavioral measures taken on the choices participants made.


Our results suggest that social feedback can only be seen to mediate responses to the question being asked if we take seriously the variance introduced by the free choice the participants are given.

Although this is not the first study to explore deception in social interaction (see Baumgartner et al., 2009; Sip et al., 2010), it is one of the first to provide a context in which participants run the risk of being socially confronted if their deception is detected. Participants were allowed to decide whether or not to deceive the partner on any given trial. We found activation in subgenual ACC when the partner could not check the truthfulness of the participants' response. Activation in right caudate and IFG was observed when participants were deciding how to respond to a question that allowed deception. Surprisingly, there were neither behavioral nor neural effects of partner (human vs. computer). This is surprising because one would expect that (1) participants would consider a computer of less importance and thus exhibit a very different pattern of behavior from that shown toward a human; and (2) participants would try to attribute intentions and causality of actions to people, but not to computers (see e.g., Gallagher et al., 2002). We speculate that the lack of a partner effect results from the paradigm placing the main focus on confrontation. Even though participants played with a computer, the machine still exposed their deception to the people observing the task outside the scanner.

The activations in right caudate and IFG strongly suggest that when participants are in a position to make a false claim, they presumably have to decide whether or not to do so given the ratio between the effort invested in the action and its potential rewards. The right IFG has typically been associated with response inhibition tasks in which participants need to inhibit their natural response (e.g., Aron et al., 2004). Interestingly, this area has also been implicated in risk aversion, and is suggested to play a role in inhibiting the acceptance of a risky option (Christopoulos et al., 2009). Additionally, the area BA47 (see **Table 1**) has also been implicated in comprehending spoken language (Petrides and Pandya, 2002), which suggests that participants in the current study had to focus on what they were asked about before giving a response. The activation of caudate – well-known for processing effort to engage in an action/choice selection (Croxson et al., 2009; Kurniawan et al., 2010) – and dorsal putamen – reported in prediction error, memory, and affective learning (Delgado, 2007) – suggests that the choice of making either a false or true claim may elicit the feeling of reward, reward anticipation, or the feeling of control when making a choice (Leotti et al., 2010). While giving a response, participants also needed to account for previous choices and to learn indirectly from the interaction what their best strategy for exercising deception would be. Interestingly, activation of dorsal putamen and caudate nucleus may indicate that memory and learning facilitated the choice participants were faced with in our task.

The coordinates are given according to the MNI space, together with T-scores, Z-scores, and significance thresholds p < 0.005 uncorrected for multiple comparisons with a cluster extent threshold of 10 voxels, corrected at the cluster level. We indicate with an asterisk (∗) the areas which survive the more stringent threshold of FWE correction of p < 0.05 at the voxel level.

**FIGURE 6 | Main effect of falsehood-eliciting question (Question Type B) on response type.** The peak activations are in **(A)** the right caudate (14 12 10) and **(B)** right inferior frontal gyrus (IFG; 42 20 −12), p < 0.005, uncorrected. The color-bar corresponds to T-values.

Anterior cingulate cortex has been implicated in social–affective processes involved in decision-making (Dolan, 2007; Rushworth and Behrens, 2008; Croxson et al., 2009). ACC is believed to store associations between past behaviors and rewards (for reviews see Paus, 2001; Rudebeck et al., 2008) and to process choices in dynamic and open-ended contexts (Walton et al., 2007). It subserves response and cognitive conflict monitoring (Botvinick, 2007), calculates cost–benefit evaluations (Croxson et al., 2009), reward expectations (Delgado et al., 2005; Etkin et al., 2006) as well as action selection (for review see e.g., Rushworth et al., 2004; Rushworth et al., 2007). The dorsal and rostral portions of ACC have been associated with choice, conflict monitoring (Rushworth et al., 2004) and representations of beliefs and expectations (Petrovic et al., 2005). The more ventral part of ACC has been reported in processing the value of possible choices in relation to expected reward (Bush et al., 2000). Because of anatomical and functional connections with orbitofrontal cortex (OFC; for review see e.g., Paus, 2001) and ventral striatum (Balleine et al., 2007; Delgado, 2007), ACC functions are strongly modulated by social and emotional context (Rushworth et al., 2007; Rushworth and Behrens, 2008). Multiple ACC functions are therefore likely to be implicated in the decision to deceive (e.g., Ganis et al., 2003; Abe et al., 2006; Baumgartner et al., 2009).

Our finding that ACC is active in a task involving deception is not surprising. Surprisingly though, in other studies an increased activation in ACC has been reported in very different portions of this large area. Several groups reported the activation of dorsal ACC (BA 24/32; Ganis et al., 2003; Kozel et al., 2005; Langleben et al., 2005) in association with the production of deception. However, the tasks used in these experiments were quite different from the task used in the present study (for discussion see Greely and Illes, 2007; Sip et al., 2008; Christ et al., 2009), and the activations were located more dorsally. For example, Ganis et al. (2003) found activation in the dorsal ACC (BA32, 4 6 39; among other areas) by contrasting activity associated with the production of "spontaneous lies" that do not necessarily fit into a coherent story with the production of well-rehearsed falsehoods accommodated in a prepared story. Kozel et al. (2005) observed right ACC activation (ACC, 3 18 60) in a mock-crime experiment in which the subjects were asked to deny possession of a "stolen" object. This activation was associated with monitoring a deceptive response by inhibiting truth-telling. In another study, Abe et al. (2006) observed increased activation of right ACC (BA 24/32) when participants engaged in deception about past events. Only recently was ACC (BA 24) activation reported in an ecologically valid study (Baumgartner et al., 2009), where it was associated with breaking a previously expressed promise in a trust game.

Our observation that the subgenual ACC is active when the decision to deceive does not have immediate social consequences is, however, interesting. Subgenual ACC has previously been implicated in studies of social rejection (8 22 −4 and 10 20 −8 in Masten et al., 2009) and social pain (10 32 −10 in Onoda et al., 2009). Our imaging findings, supported by our behavioral results, therefore suggest that ACC subserves social monitoring when the decision to deceive does not depend upon possible confrontation. In the confrontation condition, the decision to deceive or not will be based largely on utilities, for example the value of deception and the likelihood of being detected. In the non-confrontation condition these considerations are irrelevant. Rather, the decision not to deceive, even when deception cannot be detected, would be based on moral considerations. To our knowledge, this role of subgenual ACC has not been implicated in other deception studies. Our results confirm our hypothesis (also expressed in Sip et al., 2008) that social feedback – and consequently a potential social rejection – affects production of deception. We speculate that subACC, caudate, and IFG play an important role in mediating a decision to deceive based on the context, rather than in producing false statements.
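The idea that, under possible confrontation, the decision rests on the value of deception and the likelihood of detection can be illustrated with a simple expected-value calculation. This is a hypothetical sketch, not a model fitted or proposed in the study: the 50 pence stake is taken from the procedure, while the detection probabilities and the function names are assumptions chosen for illustration.

```python
def expected_value(p_detect, gain=50.0, loss=50.0):
    """Expected payoff (in pence) of one attempted deception.

    p_detect is a hypothetical probability that the deception is detected;
    gain and loss default to the 50 pence per event stated in the procedure.
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must lie between 0 and 1")
    return (1.0 - p_detect) * gain - p_detect * loss


def break_even_probability(gain=50.0, loss=50.0):
    """Detection probability at which deceiving has zero expected payoff."""
    return gain / (gain + loss)


for p in (0.0, 0.25, 0.5, 0.75):
    print(f"p_detect = {p:.2f}: expected value = {expected_value(p):+.1f} pence")
```

With symmetric stakes, deceiving has a positive expected payoff in this toy calculation whenever detection is less likely than not. In the non-confrontation condition the detection probability is zero, so a purely utility-based account would predict deception on every falsehood-eliciting trial, which is not what participants did.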

#### **SOCIAL AND MORAL CONSIDERATION IN EXERCISING DECEPTION**

For many of us, social rejection may also be based on moral values (Greene et al., 2001; Raine and Yang, 2006) and expectations. Thus deception is interestingly related to moral emotions, such as guilt and shame. However, a moral belief that we should not deceive others may be dismissed in contexts in which deception is allowed or even expected, as in most game scenarios and controlled experimental settings (Sip et al., 2010). This means that although there is an important relationship between deception and morality, when deception is sanctioned by the context, it is possible for people to perform genuine deception without experiencing any of the moral emotions one might expect to experience otherwise. Nevertheless, other social consequences of being detected must still be weighed accordingly when one is faced with the choice to deceive, even when moral concerns are made irrelevant to the decision.

We did not observe activation in an emotional network (e.g., insula or amygdala) as in another ecological study of deception (Baumgartner et al., 2009). The reason for this difference may be a difference in focus. Our participants did not declare (promise) to their interlocutor whether they would be honest or deceptive on specific trials. Therefore, the component of explicit social commitment is not involved in our study, such that we should not expect a similar emotional reaction as observed in Baumgartner's study (Baumgartner et al., 2009). This might be because the choice of whether to perform a morally sanctioned act of deception in a game and the more morally loaded choice of whether to break a promise, involve different social phenomena – rejection (van Beest and Williams, 2006) and guilt respectively. Nevertheless, it is challenging to evoke and accurately assess guilt associated with deception in real-life interrogations (Bashore and Rapp, 1993; Pollina et al., 2004), let alone in experimental settings.

Additionally, given that most neuroimaging studies of deception use a researcher as a recipient of deception (and this is known to the subjects), one may argue that this could weaken participants' attempts at deception. In our experiment, however, participants do not act against the experimenter, but rather act within the normative context of the experiment, which implies that the same behavior would not be processed differently toward a stranger. In other words, if participants believe they play with another human in the context of this experiment, this entails an oppositional behavior. Therefore, moral emotions are canceled out by the fact that immoral behavior is sanctioned by the context. Additionally, based on the post-scan debriefing, we are confident that participants tried their best to deceive the experimenter, where in many cases this was a matter of gaining an upper hand over somebody more experienced in the topic.

#### **THE ROLE OF INSTRUCTIONS**

In experimental settings, instructions given to the participants not only determine their behavior, but also frame how they think about others' actions, mental states, and expectations. In complicated studies of social decision-making, there is a discrepancy between what the instructions say, what the participants agree to do, and what they actually do while lying still in the MR chamber. This is specifically relevant to experimental tasks based on explicit forced-choice instructions, in which the execution of deception is often presumed to be intelligible independently of the choice and intention to instill a false belief in another person (Sip et al., 2008). These social cognitive processes, functioning in the context of the instructions, constrain the concrete task of executing deception, thus posing conceptual problems for interpreting results produced by any experimental design that does not incorporate them. Ideally, then, task instructions (1) must not define too specifically for the participants when to be deceptive or truthful, and (2) they should not overly limit the quantity and the quality of the choices made by the participants.

In human behavioral and psychological experiments more generally, the interaction between the experimenter and the participant involves sharing a specific script that is aimed to facilitate the execution of an experimental task (Roepstorff and Frith, 2004). In other words, the experimenter communicates the nature of the paradigm to the participant, who acts according to the instructions, or more precisely, to her own understanding of what they entail. In the ideal situation, it is then up to the subject to make the choice of whether or not to comply. However, if the instructions tell the participants to "lie" about events in one condition and to be honest about other events in another (Sip et al., 2008), then the executive role of the participant in choosing to act is essentially left out. Thus, an interesting aspect of deception, namely the social cognitive processes involved in the decision to deceive, is excluded unless participants are able to achieve a certain degree of freedom in response selection, which is not controlled by the experimenter.

Interestingly, in the current study, even though experimental instructions implicitly suggested telling a falsehood, participants did not tell a falsehood 100% of the time when deception was possible (**Figure 3**). This suggests that even when there was no direct danger of being caught in a lie in the non-confrontation condition, participants still mimicked a real-life situation, in which the ratio of true and false claims is not predetermined across contexts. Another interesting result was that there were several trials in which participants decided to tell a falsehood in response to questions in which the object was visible to both parties (**Figures 3** and **4**). Peculiar as it sounds, this suggests that, mistakes aside, participants did exercise their free choice, even in a situation that was not beneficial to them. Additionally, **Figure 4** shows an interesting pattern of reaction times relative to the question type and response type. One possibility is that the slower RTs are associated with less plausible responses that perhaps require more thought. For example, the somewhat irrational responses of telling a falsehood in response to question type A, and telling the truth when deception cannot be detected in question type B, are similarly slowed.

## **LIMITATIONS**

Because of our effort to account for a natural deceptive interaction in laboratory settings, this study faces certain limitations: (a) free choice in deceptive decision-making gives rise to a range of behavior that is difficult to predict prior to the experiment, (b) unbalanced numbers of events are then included in the imaging analysis, and (c) interpersonal differences cause inter- and intra-subject variability in the recorded data. Additionally, because of the small sample size, our study might be underpowered to detect activations associated with moral emotions. Therefore, one may speculate about alternative explanations for the lack of activation in moral and emotional networks; for example, it is plausible that the presence of moral emotions was merely diminished instead of canceled out. Further ecological studies are called for to allow better understanding of neural and behavioral processes that facilitate deceptive behavior.

Overall, our findings suggest that production of deception depends upon an effort-based affective–motivational network rather than merely higher-level cognitive processes as has been suggested thus far. Given that potential social consequences affect decisions to deceive, we argue that real-life deception may be interpreted as a decision with costs, benefits and losses. The gain from the deception must be evaluated as greater than the cost of the deception. Similarly, the gain made possible by the deception must be balanced against the cost of being found out. As in all such decisions, the costs are monitored according to what the other person knows and does not know, in relation to what the deceptive agents know. We suggest that the fields of neuroeconomics and deception intersect (see e.g., Baumgartner et al., 2009) and could offer an interesting contribution to further understanding of deception.

## **ACKNOWLEDGMENTS**

We are grateful to Ian Apperly for his help with the development of the paradigm.

## **REFERENCES**

anterior cingulate function. *Cogn. Affect. Behav. Neurosci.* 7, 356–366.

relationships. *J. Pers. Soc. Psychol.* 74, 63–79.

J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. *Science* 293, 2105–2108.

production and detection of deception in an interactive game. *Neuropsychologia* 48, 3619–3626.

primates. *Behav. Brain Sci.* 11, 233–244.

Woodruff, G., and Premack, D. (1979). Intentional communication in the chimpanzee: the development of deception. *Cognition* 7, 333–362.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 January 2012; accepted: 01 April 2012; published online: 18 April 2012.*

*Citation: Sip KE, Skewes JC, Marchant JL, McGregor WB, Roepstorff A and Frith CD (2012) What if I get busted? Deception, choice, and decision-making in social interaction. Front. Neurosci. 6:58. doi: 10.3389/fnins.2012.00058*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Sip, Skewes, Marchant, McGregor, Roepstorff and Frith. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Making decisions with unknown sensory reliability

## *Sophie Deneve<sup>1,2</sup>\**

<sup>1</sup> Département d'Etudes Cognitives, Group for Neural Theory, Ecole Normale Supérieure, Paris, France

<sup>2</sup> Groupe de Neuroscience Théorique, Collège de France, Paris, France

#### *Edited by:*

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

#### *Reviewed by:*

Floris P. De Lange, Radboud University Nijmegen, Netherlands

Philippe N. Tobler, University of Zurich, Switzerland

#### *\*Correspondence:*

Sophie Deneve, Département d'Etudes Cognitives, Group for Neural Theory, Ecole Normale Supérieure, 29, rue d'Ulm, 75005 Paris, France. e-mail: sophie.deneve@ens.fr

To make fast and accurate behavioral choices, we need to integrate noisy sensory input, take prior knowledge into account, and adjust our decision criteria. It was shown previously that in two-alternative-forced-choice tasks, optimal decision making can be formalized in the framework of a sequential probability ratio test and is then equivalent to a diffusion model. However, this analogy hides a "chicken and egg" problem: to know how quickly we should integrate the sensory input and set the optimal decision threshold, the reliability of the sensory observations must be known in advance. Most of the time, we cannot know this reliability without first observing the decision outcome. We consider here a Bayesian decision model that infers the probability of two different choices and, at the same time, estimates the reliability of the sensory information on which this choice is based. We show that this can be achieved within a single trial, based on the noisy responses of sensory spiking neurons. The resulting model is a non-linear diffusion to bound where the weight of the sensory inputs and the decision threshold are both dynamically changing over time. In difficult decision trials, early sensory inputs have a stronger impact on the decision, and the threshold collapses such that choices are made faster but with low accuracy. The reverse is true in easy trials: the sensory weight and the threshold increase over time, leading to slower decisions but at much higher accuracy. In contrast to standard diffusion models, adaptive sensory weights construct an accurate representation for the probability of each choice. This information can then be combined appropriately with other unreliable cues, such as priors. We show that this model can account for recent findings in a motion discrimination task, and can be implemented in a neural architecture using fast Hebbian learning.

**Keywords: Bayesian, decision making, uncertainty, adaptation, expectation-maximization, prior, evidence, decision threshold**

## **INTRODUCTION**

Survival requires fast and accurate decisions in an uncertain and continuously changing world. Unfortunately, our sensory input is noisy, ambiguous, and unfolding across time. The outcome of actions, such as reward or punishment, is also uncertain. As a result, perceptual and motor decisions cannot be pre-defined and instantaneous. Instead, sensory evidence needs to be accumulated over time and integrated with prior knowledge and reward predictions. Decision making investigations address solutions adopted by living organisms to solve two distinct but related problems: faced with different choices, which one would yield the most desirable outcome ("what to decide")? In addition, since delaying decisions allows more time for collecting information and increasing choice accuracy, when should this decision be made ("when to decide")? Optimal decision strategies solve this time/accuracy trade-off in order to maximize the rewards collected per unit of time, i.e., the reward rate.

One of the most fundamental questions in the study of decision making is whether or not the strategies used by humans and animals are optimal. Indeed, recent experimental and theoretical results suggest that humans use Bayes optimal strategies in a wide variety of tasks (Doya, 2002; Knill and Pouget, 2004; Sugrue et al., 2004; Daw et al., 2006; Wolpert, 2007). In simple experimental regimes, such as a two-alternative-forced-choice (2AFC) task, the optimal decision strategy can be described quantitatively as an integration to threshold (Gold and Shadlen, 2002; Ratcliff and McKoon, 2008). In this framework, decision making is divided into two successive stages: First, the inference stage accumulates sensory evidence over time by computing the probabilities that each choice is correct given past sensory observations ("what to decide?"). Subsequently, a decision is made to commit to one of the choices when these probabilities have satisfied a given criterion ("when to decide?"). This response criterion is critical because it shapes the time/accuracy trade-off and controls the total reward collected by the subject.

In certain contexts, Bayesian decision making is equivalent to relatively simple decision mechanisms such as the diffusion model. However, in general, Bayesian methods lead to non-linear, non-stationary models of integration and decision making (Behrens et al., 2007; Deneve, 2008a,b; Mongillo and Deneve, 2008). In order to solve a decision problem, a Bayesian integrator must constantly adapt its decision making strategy to the statistical structure of the task and the reward. Though simple to formulate, these probabilistic decision problems can have solutions that are quite difficult to analyze mathematically, and are computationally intractable. Simplifying assumptions are required.

On the other hand, a major advantage of the Bayesian approach is its adaptability and generalizability to situations where simpler decision models would be suboptimal or not work (Doya, 2002; Yu and Dayan, 2005; Behrens et al., 2007; Walton et al., 2007; Whiteley and Sahani, 2008). Here we start from an extremely simple task (a 2AFC) where Bayesian decision making may be equivalent to the diffusion model, but only if the probability distributions of sensory inputs (i.e., the sensory likelihoods) are known in advance. We then show that when these distributions are not known *a priori* (which is likely to be the case in realistic decision tasks), enough information can be extracted from the sensory input (in the form of sensory neuron spike trains) to estimate the precision of the sensory input on-line and adapt the decision strategy accordingly.

This has strong consequences for the decision mechanisms. In particular, it predicts that in hard decision tasks, the sensory input is weighted more strongly during early stimulus presentation. The influence of sensory input decays later, implying that a choice is made based on prior knowledge and the earliest sensory observations, not on the latest sensory inputs preceding the decision, as one might initially think. On the contrary, in easy trials, sensory weights increase, and the latest sensory inputs are most predictive of the subject's decision. This framework also predicts that the decision threshold (i.e., the amount of integrated sensory evidence deemed necessary to commit to a choice) is not fixed but evolves as a function of time and the sensory input: for hard tasks, this threshold collapses, forcing a decision within a limited time frame; for easy tasks, this threshold increases, i.e., decisions are made with higher accuracy at the cost of slightly longer reaction times.

We present simulations of a decision task implementation that has been very influential in the study of decision making in human and non-human primates. We compare the Bayesian decision maker with a diffusion model, and show that while both models predict similar trends for the mean reaction time and accuracy, the Bayesian model also predicts some strong deviations from the diffusion model predictions consistent with observations of behaving monkeys trained at this task (Shadlen et al., 1996; Gold and Shadlen, 2003; Mazurek et al., 2003; Palmer et al., 2005).

## **MATERIALS AND METHODS**

#### **SEQUENTIAL PROBABILITY TEST**

Consider a 2AFC between two possible responses, "A" or "B." This decision needs to be made based on an on-going, noisy stream of sensory data. We can express all the sensory information received up to time *t* as an unfolding sequence of sensory inputs, *s*<sub>0→*t*</sub> = {*s*<sub>0</sub>, *s*<sub>*dt*</sub>, ..., *s*<sub>*t*−*dt*</sub>}, where *s<sub>t</sub>* is the sensory input received between time *t* and *t* + *dt*. Let us suppose that correct choices are rewarded, while incorrect choices are not. How could subjects adjust their decision strategies in order to maximize their total expected reward? This problem can be separated into an inference stage and a decision stage.

#### *Inference*

The inference stage corresponds to a temporal integration of sensory evidence in order to compute the probability that each of the choices is correct. Using the sequential probability ratio test (Ratcliff and McKoon, 2008), the log odds for choices A and B is computed recurrently as:

$$\begin{aligned} L\_t &= \log \left( \frac{P\left(A \mid s\_{0 \to t}\right)}{P\left(B \mid s\_{0 \to t}\right)} \right) = \log \left( \frac{P\left(A \mid s\_{0 \to t-dt}\right)P\left(s\_t \mid A\right)}{P\left(B \mid s\_{0 \to t-dt}\right)P\left(s\_t \mid B\right)} \right) \\ &= L\_{t - dt} + l\left(s\_t\right) \end{aligned}$$

By taking the limit for small temporal steps *dt*, we get

$$\frac{\partial L}{\partial t} = l \left( s\_t \right) \tag{1}$$

where *l*(*s<sub>t</sub>*) = log(*P*(*s<sub>t</sub>* | *A*)/*P*(*s<sub>t</sub>* | *B*)) is the log likelihood ratio for the sensory input received at time *t*, and the starting point of integration corresponds to the prior log odds of the choices, *L<sub>o</sub>* = log(*P*(*A*)/*P*(*B*)). For example, *L<sub>o</sub>* = log(2) when A is *a priori* twice as likely as B (Gold et al., 2008).

Of course, this requires that the likelihoods *P*(*s<sub>t</sub>* | *A*) and *P*(*s<sub>t</sub>* | *B*) are known. These likelihoods capture the selectivity and variability of sensory responses. Their relative values describe the reliability of the sensory input at time *t*. Therefore, if the sensory input likelihood is much larger for choice A than for choice B, then this input will strongly support choice A as opposed to choice B.

To illustrate this, let us consider a decision based on the noisy spike train of a single motion-sensitive, direction-selective neuron. In this simple decision task, the two alternative choices are between the stimulus moving in the preferred direction of this neuron (choice A) and the opposite, anti-preferred direction (choice B). The sensory input *st* corresponds to the spike train of the neuron, i.e., a temporal binary stream of 1 or 0 (depending on whether a spike is emitted or not at time *t*). We suppose that the baseline firing rate *q* is increased to *q* + *dq* in the preferred direction, and decreased to *q* − *dq* in the anti-preferred direction. Therefore, *q* + *dq* and *q* − *dq* describe the likelihood of a sensory spike given choice A and choice B, respectively.

The initial log odds ratio at the start of the trial is set to *L<sub>o</sub>* = 0, indicating that the two stimulus directions occur with the same prior probability. The likelihood ratio is given by

$$l\_t = s\_t \log\left(\frac{q+dq}{q-dq}\right) + (1-s\_t)\log\left(\frac{1-\left(q+dq\right)dt}{1-\left(q-dq\right)dt}\right)$$

In the limit of small *dt*, and if the change in firing rate induced by the stimulus is small compared to the baseline firing rate, i.e., if *dq* ≪ *q*, the inference equation can be simplified to

$$\frac{\partial L\_t}{\partial t} = w \left( s\_t - q \right)$$

where *st* − *q* corresponds to the sensory evidence at time *t* and the sensory weight is set by the input signal-over-noise ratio (SNR) *w* = 2*dq*/*q*. The log odds *Lt* represent the current confidence in choice A relative to choice B. It increases on average if the input firing rate is above baseline, and decreases on average if the input firing rate is below baseline. However, this accumulation is noisy due to the Poisson variability of the sensory spike train. Three example trials are plotted in **Figure 1A**.
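As a rough numerical check of this simplification, the sketch below compares the exact per-bin log likelihood ratio with the linearized form *w*(*s<sub>t</sub>* − *q*) integrated over one time bin. The function names and the bin width `dt` are illustrative assumptions, not part of the original study; the firing rates are those quoted later for the single-neuron simulations.

```python
import math

def exact_increment(spike, q, dq, dt):
    """Exact log likelihood ratio for one bin of width dt (spike is 0 or 1)."""
    if spike:
        return math.log((q + dq) / (q - dq))
    return math.log((1.0 - (q + dq) * dt) / (1.0 - (q - dq) * dt))

def approx_increment(spike, q, dq, dt):
    """Linearized increment w * (s_t - q) over one bin, with w = 2 dq / q."""
    w = 2.0 * dq / q
    return w * (spike - q * dt)

if __name__ == "__main__":
    q, dq = 200.0, 20.0   # Hz, single-neuron values from the text
    dt = 1e-4             # s, assumed bin width (small enough that q * dt << 1)
    for spike in (0, 1):
        print(spike,
              exact_increment(spike, q, dq, dt),
              approx_increment(spike, q, dq, dt))
```

With *dq*/*q* = 0.1 the two increments should agree to within a few percent; the gap is expected to widen as *dq* approaches *q* or as the bin width grows.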

**FIGURE 1 | "Bayesian" diffusion model. (A)** Log odds ratios *L<sub>t</sub>* as a function of time in the trial (*t* = 0: start of sensory stimulation) on three different trials. Dashed lines correspond to the decision thresholds. Solid red line: a correct trial where choice "A" was made (i.e., the upper threshold was reached first), and choice "A" was indeed the correct choice. Solid blue line: another correct trial where choice "B" was made (the lower decision threshold was reached first) and "B" was indeed the correct choice. Dotted blue line: an error trial where choice "A" was made while choice "B" would have been the correct choice. **(B)** Optimal decision thresholds as a function of the strength of the modulation of input firing rate by motion direction (*dq*).

#### *Decision criteria*

We can distinguish between two variants of the 2AFC task leading to two different decision strategies (Mazurek et al., 2003). In "reaction time" tasks, subjects observe the sensory input and are required to respond as soon as they feel ready to do so. In "fixed delay" tasks, subjects observe the sensory input presented for a fixed duration. They indicate their choice only after a "go" signal, and thus cannot control the decision time.

In "fixed delay" tasks, the optimal decision strategy simply consists of measuring the sign of the log odds at the end of stimulus presentation. If the log odds is positive, choice A is more probable than choice B, and vice versa. Going for the most probable choice will maximize the probability of getting rewarded on each trial.

For "reaction time" tasks, the optimal strategy is a little more complicated. The log odds ratio indicates the on-line probability of making a correct choice if one chooses A ahead of B. If we decide on option A when the log odds ratio crosses a positive threshold *D* and decide on option B when it crosses a negative threshold −*D* (see **Figure 1A**), then the probability of making the correct choice will be given by *P<sub>D</sub>* = exp(*D*)/(1 + exp(*D*)).
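The relation between threshold and accuracy can be illustrated with a small Monte Carlo sketch of the toy single-neuron task. It integrates the linearized log odds until a bound at ±*D* is crossed and compares the fraction of correct choices with exp(*D*)/(1 + exp(*D*)). The function names, the Bernoulli spike generator, the bin width, and the number of trials are assumptions made for illustration; because the log odds move in discrete jumps, the agreement is only approximate.

```python
import math
import random

def run_trial(q, dq, direction, D, dt, rng, max_time=10.0):
    """Integrate L until it crosses +D (choice A) or -D (choice B).

    direction is +1 when A is the true stimulus and -1 when it is B.
    Returns (choice, reaction_time); choice is None if no bound is reached.
    """
    w = 2.0 * dq / q                      # sensory weight
    p_spike = (q + direction * dq) * dt   # spike probability per bin
    L, t = 0.0, 0.0                       # equal priors: L_o = 0
    while t < max_time:
        spike = 1 if rng.random() < p_spike else 0
        L += w * (spike - q * dt)
        t += dt
        if L >= D:
            return "A", t
        if L <= -D:
            return "B", t
    return None, t

def estimate_accuracy(q, dq, D, n_trials=400, dt=1e-4, seed=0):
    """Fraction of trials with stimulus A in which choice A is made."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        choice, _rt = run_trial(q, dq, +1, D, dt, rng)
        correct += choice == "A"
    return correct / n_trials

if __name__ == "__main__":
    D = 1.0
    print("simulated:", estimate_accuracy(200.0, 20.0, D))
    print("predicted:", math.exp(D) / (1.0 + math.exp(D)))
```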

However, the decision threshold also controls the duration of the trial, since it takes longer to reach a higher threshold. The time/accuracy trade-off can be optimized by setting D to a value that maximizes the total amount of reward collected per unit of time – the reward rate. The optimal decision threshold depends on the details of the experimental protocol. If, for example, a reward is provided only for correct choices, and each trial is followed by a fixed inter-trial interval, the total reward rate is given by

$$RR\left(D\right) = \frac{P\_D}{\text{RT}\_D + T\_{iti}}\tag{2}$$

where RT<sub>*D*</sub> is the mean reaction time, that is, the time it takes on average for *L<sub>t</sub>* to reach either *D* or −*D* (Ratcliff and McKoon, 2008). To estimate RT<sub>*D*</sub>, we approximate the Poisson noise in the cumulated spike counts by white Gaussian noise with variance equal to the mean. The mean first passage time (i.e., reaction time) is then RT<sub>*D*</sub> ≈ (*D*/*l*) tanh(*D*), where *l* = *w dq* = 2*dq*<sup>2</sup>/*q* is the average log likelihood ratio of the sensory input, or, equivalently, the average slope of *L<sub>t</sub>*. In analogy with diffusion models, *l* corresponds to the "drift rate."

The optimal threshold is a function of the sensory likelihoods and a solution to *dRR*/*dD* = 0.

The optimal threshold increases with the sensory reliability, as defined by the SNR *w* = *2dq/q*. If the input is very reliable, accurate decisions can be made very quickly. Thus, the optimal threshold is high. If, on the other hand, reliabilities and sensory weights are low, reaching high choice accuracy would be very costly in terms of reaction time. In this case, the optimal threshold is low. Below a certain drift rate, waiting to make a decision is not worth the additional gain in accuracy, and the optimal threshold is zero: decisions should be made immediately, without waiting for the sensory input, resulting in a random choice with accuracy *PD* = 0.5 and reaction time RT*<sup>D</sup>* = 0. The optimal boundary as a function of the sensory "contrast" *dq* is plotted on **Figure 1B**.
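The dependence of the optimal threshold on the drift rate can be explored numerically. The sketch below evaluates the reward rate of Eq. 2 on a grid of thresholds and returns the maximizer. It uses the reaction time approximation as quoted in the text, (*D*/*l*) tanh(*D*); the grid search, its resolution, and the function names are illustrative assumptions rather than the procedure used in the original study.

```python
import math

def accuracy(D):
    """Probability of a correct choice for a symmetric threshold D."""
    return math.exp(D) / (1.0 + math.exp(D))

def mean_reaction_time(D, drift):
    """Approximation quoted in the text: RT_D ~ (D / l) tanh(D)."""
    return D / drift * math.tanh(D)

def reward_rate(D, drift, t_iti=1.0):
    """Eq. 2: expected reward per unit time (reward for correct choices only)."""
    return accuracy(D) / (mean_reaction_time(D, drift) + t_iti)

def optimal_threshold(drift, t_iti=1.0, d_max=10.0, n_grid=2001):
    """Grid search for the non-negative threshold that maximizes Eq. 2."""
    grid = [d_max * i / (n_grid - 1) for i in range(n_grid)]
    return max(grid, key=lambda D: reward_rate(D, drift, t_iti))

if __name__ == "__main__":
    q = 200.0                                # Hz, baseline rate from the text
    for dq in (2.0, 20.0, 60.0):             # Hz, example "contrasts"
        drift = 2.0 * dq ** 2 / q            # l = 2 dq^2 / q
        print(dq, optimal_threshold(drift))
```

Under these assumptions the maximizer shrinks toward zero as the drift rate falls and grows with more reliable input, in line with the qualitative trend described above.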

This Bayesian approach is different from descriptive models of decision making such as the race model or the diffusion model (Laming, 1968; Link, 1992; Ratcliff and McKoon, 2008). These models were not initially derived from principles of optimality, but from the requirement of capturing human behavior with the simplest possible models. Interestingly, however, these decision mechanisms are equivalent to Bayesian decision making in specific contexts. For example, the parameters of a diffusion model can be adjusted to be equivalent to Bayesian optimal decision in 2AFC tasks when the sensory likelihoods are Gaussians (Ratcliff and McKoon, 2008). The diffusion model first integrates a noisy signal (analogous to the "inference stage" in the Bayesian framework), and takes a decision when the integrator reaches one of two possible bounds (analogous to the optimal criterion *D*). Variants of diffusion models have been shown to successfully reproduce human and animal behavior in 2AFC tasks (Ferguson, 1967; Newsome et al., 1989; Yuille and Bülthoff, 1996; Mazurek et al., 2003; Ratcliff et al., 2003; Palmer et al., 2005; Ratcliff and McKoon, 2008).

While they share similar mechanisms with diffusion models, Bayesian decision models have the advantage of being more constrained by the experimental protocol and the sensory noise. In a diffusion model, the drift rate, the threshold, and the starting point of the integration are all free parameters that can be adjusted to fit experimental data. In a Bayesian model, these are constrained respectively by the likelihoods of the sensory input, the reward schedule, and the prior probabilities of the choices. These parameters are either fixed by the experimental protocol (such as the prior) or can be estimated separately (such as the sensory reliabilities).

Unfortunately, an important disadvantage of the Bayesian framework is that the likelihood ratio of the sensory input *l*(*st*) needs to be known at the start of the trial. In other words, subjects need to know exactly what sensory signals and noise to expect for each of the choices. Without this knowledge, the optimal boundary, the sensory weight and consequently, the drift rate cannot be set. Most past models of decision making did not consider the possibility that the sensory likelihoods could be adjusted on-line as a function of the sensory input. Instead, the threshold and drift rate were assumed to be independent of sensory observations. Thus, the parallel between diffusion models and Bayesian decision making remained essentially qualitative. However, we show below that sensory reliabilities can in fact be estimated within the timescale of the decision itself. Therefore, the parameters of the decision process can be adjusted on-line to better approximate Bayesian decision making within the duration of a single trial.

#### **ESTIMATING SENSORY LIKELIHOODS ON-LINE**

Sensory likelihoods are determined not only by the sensory noise, but also by the nature of the decision task. For example, categorization tasks result in very different likelihoods compared to discrimination tasks. Most of the decisions we make every day occur in a unique context that will never be repeated. As a result, sensory likelihoods generally cannot be derived purely from past experience. For example, consider the choice between investing in one of two different stock options. If stock option "A" suddenly rises and stock option "B" falls, this could be due to a higher yield of option "A," or just random fluctuations in the stock market. We will never know what to make of this observation without accumulating enough experience on the reliability of stock prices. However, in order to maximize the outcome, we should evaluate the reliability of market fluctuations at the same time that we accumulate evidence, thus making our investment as early as possible. Is this realistic?

There is an equivalent problem in 2AFC tasks. Usually, these protocols inter-mix trials with various levels of difficulty in order to measure psychophysical curves. For example, subjects could be asked to decide between two directions of motion, while varying the level of noise in the motion display (Shadlen and Newsome, 1998), or to do a categorization task, while varying the distance between the test stimulus and the category boundary (Ratcliff et al., 2003). In these protocols the "quality" of the sensory input (i.e., the sensory likelihood ratio) is not known at the start of a trial. In our toy model, varying task difficulties would correspond to changes in the sensory "contrast" *dq*, which affects the sensory weights and optimal boundary for decision making (**Figure 1B**).

For example, let us suppose that the task difficulty in our toy example is varied by controlling the amount of noise in the visual motion stimulus. This can be done by using motion displays composed of moving dots while varying the proportion of dots moving coherently in a single direction, with the rest of the dots moving in random directions (Britten et al., 1992). The proportion of dots moving coherently corresponds to the "motion coherence." These kinds of stimuli have been used intensively to investigate the neural basis of decision making in humans and non-human primates. They induce responses in direction-selective sensory neurons (e.g., in the medio-temporal area MT) that can roughly be described by an increase or decrease of the background firing rate by an amount proportional to motion coherence (Newsome et al., 1989; Britten et al., 1992). Schematically, the firing rate of the sensory neuron is *q* + *c dq* for choice A, and *q* − *c dq* for choice B, where *c* is a function of motion coherence (see **Figure 2A**). The sensory weights and the bounds should be updated accordingly. But how can this happen when trials with high and low coherences are randomly intermixed?

There are two possible approaches to addressing this issue. One is to set a "compromise" between the different levels of coherence by using a fixed sensory weight and a fixed threshold. Alternatively, one could attempt to estimate the coherence on-line, adjusting the sensory weight and the bound during the trial.

Motion coherence influences the firing rate of sensory neurons and can therefore be estimated from the sensory input at the same time as the direction of motion. Using Bayes' theorem, we can compute the joint probability of both contrast and choice, *P*(*A*, *c* | *s*<sub>0→*t*</sub>) and *P*(*B*, *c* | *s*<sub>0→*t*</sub>), based on augmented sensory likelihoods. Let us call *x* the unknown direction of motion, with *x* = 1 for direction A and *x* = 0 for direction B. To compute the joint probability of all choices and coherences, *P<sub>t</sub>*(*x*, *c*) = *P*(*x*, *c* | *s*<sub>0→*t*</sub>), we use the sensory likelihood (1/*dt*) *P*(*s<sub>t</sub>* = 1 | *x*, *c*) = *q* + (2*x* − 1)*c dq*, resulting in the following recurrent equation:

$$\begin{aligned} P\_t\left(\mathbf{x}, c\right) &= \frac{1}{Z} P\_{t-dt}\left(\mathbf{x}, c\right) \left(q + cdq\right)^{\mathbf{x}s\_t} \left(q - cdq\right)^{(1-\mathbf{x})s\_t} \\ &\times \left(1 - (q + cdq)dt\right)^{\mathbf{x}(1-s\_t)} \left(1 - (q - cdq)dt\right)^{(1-\mathbf{x})(1-s\_t)}, \end{aligned}$$

where *Z* is a normalization term. An estimate of contrast can be obtained by computing its expected value *ĉ<sub>t</sub>* = Σ<sub>*c*</sub> *c* (*P*(*A*, *c* | *s*<sub>0→*t*</sub>) + *P*(*B*, *c* | *s*<sub>0→*t*</sub>)), while the probability for choice A is given by marginalizing over all possible coherence values, *P*(*A* | *s*<sub>0→*t*</sub>) = Σ<sub>*c*</sub> *P*(*A*, *c* | *s*<sub>0→*t*</sub>).

The temporal evolution of the estimated coherence and the choice probability are plotted for two motion coherence values in **Figure 2B**. Observe that the coherence estimate evolves on a time scale similar to that of the choice probability. Consequently, sensory weights and decision thresholds based on motion coherence could, in theory, be adjusted during the time scale of a single decision trial.
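The recurrent update of the joint posterior can be sketched on a discrete grid of coherence values. The code below applies the recursion bin by bin and returns the marginal probability of choice A together with the expected coherence. The grid, the flat prior over (*x*, *c*), the Bernoulli spike generator, and the function names are assumptions made for illustration only.

```python
import random

def joint_posterior(spikes, q, dq, c_grid, dt):
    """Recursive Bayesian update of P(x, c | spikes) on a coherence grid.

    x = 1 stands for direction A and x = 0 for direction B.
    Returns (P(A | spikes), expected coherence).
    """
    hypotheses = [(x, c) for x in (0, 1) for c in c_grid]
    # Flat prior over direction and coherence (an assumption of this sketch).
    post = {h: 1.0 / len(hypotheses) for h in hypotheses}
    for s in spikes:
        for (x, c) in hypotheses:
            p_spike = (q + (2 * x - 1) * c * dq) * dt
            post[(x, c)] *= p_spike if s else (1.0 - p_spike)
        z = sum(post.values())            # normalization term Z
        for h in hypotheses:
            post[h] /= z
    p_a = sum(p for (x, _c), p in post.items() if x == 1)
    c_hat = sum(c * p for (_x, c), p in post.items())
    return p_a, c_hat

def bernoulli_spikes(rate, duration, dt, seed=0):
    """Binary spike train with spike probability rate * dt per bin."""
    rng = random.Random(seed)
    n_bins = int(round(duration / dt))
    return [1 if rng.random() < rate * dt else 0 for _ in range(n_bins)]

if __name__ == "__main__":
    q, dq, dt = 200.0, 20.0, 1e-3
    c_grid = [0.25 * k for k in range(17)]        # coherence values 0 to 4
    spikes = bernoulli_spikes(q + 2.0 * dq, 1.0, dt, seed=3)
    print(joint_posterior(spikes, q, dq, c_grid, dt))
```

The cost of each update grows with the number of (choice, coherence) combinations tracked, which is the computational burden discussed next.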

However, implementing the full Bayesian integration algorithm requires the accumulation of evidence for all possible combinations of coherence and choice. This is considerably more computationally intensive than a diffusion model, and it is unclear how this could be implemented in a neural architecture. Instead, we considerably simplify the computation using approximate Bayesian optimal decision making, by *separately* estimating the reliability of the sensory input and the choice probability. By integrating the sensory input, we extract an on-line estimate of coherence *ĉ*(*t*).

This estimate is used to adjust both the sensory weights and the boundary on-line during the decision trial. This method is suboptimal, but still reaches higher levels of performance than fixed boundaries and fixed sensory weights while requiring only one additional sensory integrator.

To do this approximate inference, we use an on-line version of the "Expectation Maximization" algorithm (Mongillo and Deneve, 2008). At each time step, we update the log odds *Lt* using the current estimate of coherence:

$$\frac{dL}{dt} = \hat{c}\_t w \left(s\_t - q\right) \tag{3}$$

These log odds provide us with an on-line estimate of motion direction (or choice probability), *x̂<sub>t</sub>* = *e*<sup>*L<sub>t</sub>*</sup>/(1 + *e*<sup>*L<sub>t</sub>*</sup>). We then estimate the coherence by performing a stochastic gradient descent on the log likelihood (see mathematical derivations):

$$\frac{1}{\eta} \frac{d\hat{\boldsymbol{c}}}{dt} = -\hat{\boldsymbol{c}} + \frac{1}{l} \left(2\hat{\boldsymbol{x}} - 1\right) \boldsymbol{w} \left(\boldsymbol{s}\_t - \boldsymbol{q}\right),\tag{4}$$

where *l* = 2*dq*<sup>2</sup>/*q* is the "default" drift rate for coherence *c* = 1.

The learning time constant η is a free parameter that controls the amount of past observation used to estimate the coherence. A short time constant provides rapidly adapted but highly variable coherence estimates, while a long time constant provides less variable, slower estimates. In practice, we adjusted η in order to best approximate the mean dynamics of the coherence estimate during exact inference (**Figure 2B**). An even better approximation can be obtained by using a learning rate that decays as an inverse of time (i.e., by implementing a running average). However, we found that this has only a very minor impact on the reward rate or dynamics of the weights and threshold. Therefore, we used a simpler and more biologically plausible stochastic gradient descent rule to update the coherence estimate on-line.
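A minimal sketch of the on-line scheme, assuming a simple Euler discretization of Eqs. 3 and 4 with one update per time bin, is given below. The function names, the bin width, and the regular spike train used in the usage example are illustrative assumptions; the value 1/η = 112 ms is the one listed under the simulation parameters.

```python
import math

def sigmoid(L):
    """Numerically stable logistic function."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    z = math.exp(L)
    return z / (1.0 + z)

def online_em(spikes, q, dq, eta, dt, c0=1.0, L0=0.0):
    """Euler integration of Eqs. 3 and 4 for a binary spike sequence.

    Returns a list of (L, c_hat) pairs, one per time bin.
    """
    w = 2.0 * dq / q          # sensory weight at coherence 1
    l = 2.0 * dq ** 2 / q     # "default" drift rate at coherence 1
    L, c_hat = L0, c0
    trace = []
    for s in spikes:
        evidence = s - q * dt                 # (s_t - q) integrated over one bin
        L += c_hat * w * evidence             # Eq. 3: coherence-weighted evidence
        x_hat = sigmoid(L)                    # current choice probability
        # Eq. 4: leaky update of the coherence estimate
        c_hat += eta * (-c_hat * dt + (2.0 * x_hat - 1.0) * w * evidence / l)
        trace.append((L, c_hat))
    return trace

if __name__ == "__main__":
    q, dq, dt = 200.0, 20.0, 1e-3
    eta = 1.0 / 0.112                          # 1/eta = 112 ms
    # Regular 250 Hz train: equivalent to coherence 2.5 in direction A.
    spikes = [1 if k % 4 == 0 else 0 for k in range(2000)]
    L_end, c_end = online_em(spikes, q, dq, eta, dt)[-1]
    print(L_end, c_end)
```

With a regular input train the estimate should settle near the coherence implied by the firing rate, with a ripple set by η. With Poisson-like input the same rule yields the noisier trajectories described above, and the estimate is not constrained to stay positive in this sketch.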

In order to estimate the optimal threshold, we define the function *D*opt (*c*) as the maximal value between zero and the numerical solution of

$$\frac{\partial RR\left(c,D\right)}{\partial D} = 0,$$

with the reward rate defined as

$$RR\ (c, D) = \frac{P\_D}{\frac{D}{c^2 l} \tanh\left(D\right) + \ \ T\_{iti}}$$

Here we used the fact that the mean drift rate for coherence *c* is *c*<sup>2</sup>*l*. The time-varying optimal threshold is set on-line to *D*<sub>opt</sub>(*ĉ*).

#### **EXTENSION TO A POPULATION OF INPUT NEURONS**

To test the predictions of the model in a biologically relevant setting, we focused on a noisy motion integration task that has been extensively used for studying the neural basis of decision making. The task is the same as that in our toy example, except that the decision is based on the activities of a population of neurons rather than a single spike train.

Subjects in these experiments were required to watch a stimulus consisting of randomly moving dots and choose between two opposite directions of motion (direction A or direction B). The level of noise in the motion stimulus is controlled by the "coherence," that is, the proportion of dots moving coherently in direction A or direction B. Motion coherence varied randomly from trial to trial, so the subject did not know the coherence at the start of the trial. The subjects indicated their choice by an eye movement in the direction of perceived motion, and were rewarded for correct choices. In a "reaction time" version of this task, the subject responded as soon as ready. In a "fixed delay" version of this task, the stimulus is presented for a fixed duration and the subjects respond when prompted by a "go signal."

A series of experimental studies with macaque monkeys trained at this task showed that at least two brain areas are involved. In particular, the role of the "sensory input" is played (at least in part) by the medio-temporal area MT. Neural responses from area MT are integrated in the lateral intraparietal area LIP, a sensorimotor brain area involved in the generation of eye movements. Thus LIP is a potential candidate for a Bayesian integrator. However, we focus here on the behavioral prediction of a Bayesian decision model based on the sensory input from area MT.

The firing rates of MT cells are modulated by the direction of motion and by motion coherence. MT neurons have a background response to purely noisy visual displays (with zero coherence) and a "preferred" direction of motion, i.e., their firing rate will be higher in response to motion in this direction and lower in the opposite direction. To a first approximation, if *q<sub>i</sub>* is the baseline firing rate of an MT cell, its firing rate is *q<sub>i</sub>* + *c dq<sub>i</sub>* in the preferred direction, and *q<sub>i</sub>* − *c dq<sub>i</sub>* in the anti-preferred direction, where *c* parameterizes motion coherence. To simplify notation, we suppose that the MT population is balanced between the two directions of motion, i.e., Σ<sub>*i*</sub> *dq<sub>i</sub>* = 0.

As before, the log odds are computed as a weighted sum of the spikes from the population of MT cells, gain modulated by an on-line coherence estimate:

$$\frac{dL}{dt} = \hat{c}\_t \sum\_i w\_i s\_t^i \tag{5}$$

The initial value for the log odds corresponds to the prior odds: *L<sub>o</sub>* = log(*P*(*A*)/*P*(*B*)).

The on-line coherence estimate is obtained by a weighted average of motion coherence extracted from each spike train (see mathematical derivations). This gives a single leaky integration equation:

$$\frac{1}{\eta} \frac{d\hat{c}}{dt} = -\hat{c} + \frac{1}{l} \left(2\hat{x} - 1\right) \sum\_{i} w\_i s\_t^i,$$

where *l* = 4 Σ<sub>*i*</sub> *dq<sub>i</sub>*<sup>2</sup>/*q<sub>i</sub>* is the drift rate for coherence 1. Without loss of generality, we can assume that the "default" coherence is 1, i.e., integration starts at *ĉ*(0) = 1.

Finally, the optimal threshold is set as before at *D*<sub>opt</sub>(*ĉ*).

We compare the predictions from the Bayesian decision model with a diffusion model with fixed sensory weights and a fixed threshold. This diffusion model is similar to a model previously used to account for behavioral and neurophysiological data (Mazurek et al., 2003). The "integrated input" in the diffusion model is:

$$\frac{dL}{dt} = \sum\_{i} w\_{i} s\_{t}^{i} \tag{6}$$

The boundary is set at a fixed level *D̄*, and the starting point of integration (for each setting of the prior) is set at a fixed value *L̄<sub>o</sub>*. For easier comparison with the Bayesian decision model, *D̄* was adjusted in order to achieve the same mean reaction time. For each prior, *L̄<sub>o</sub>* was adjusted in order to reproduce the mean response biases in the Bayesian model.

#### **SIMULATION PARAMETERS**

For the single neuron model, we used *q* = 200 Hz and *dq* = 20 Hz. For the population model, we employed a population of 100 MT neurons, with baseline firing rate *q* = 10 Hz and modulation by motion stimulus *dq* = 1 Hz (50 neurons) or *dq* = −1 Hz (50 neurons). The time constant for coherence estimation was set to 1/η = 112 ms. Motion coherence was varied between 0 and 4. The inter-trial interval *T*iti was set to 1 s.

#### **MATHEMATICAL DERIVATION OF THE COHERENCE ESTIMATE**

We describe here the stochastic gradient descent method for estimating the coherence *c*(*t*) on-line. We do so in the case of a single spike train. The generalization to a population of input neurons is straightforward.

Standard "batch" expectation maximization would consist in choosing a fixed temporal window *T*, and then repeating the following procedure until convergence: First, compute the expected motion direction *x̂*(*T*) given the current coherence estimate, then update the coherence estimate by the value of *c* that optimizes the log likelihoods (summed for all inputs *s*<sub>0→*T*</sub> in the temporal window) given the current direction estimate. This is an off-line method and thus biologically implausible. Instead, we perform online expectation maximization using stochastic gradient descent. At each time step we update the coherence estimate using only the current training example (input *s<sub>t</sub>*) instead of the whole sequence *s*<sub>0→*T*</sub>. Using regularization parameter η, coherence is updated iteratively by the value of contrast that maximizes the sensory likelihood. In discrete time, this corresponds to:

$$
\hat{c}\_{t+dt} = (1 - \eta dt)\,\hat{c}\_t + \frac{\eta dt}{\langle s \rangle} c\_t^s,
$$

where ⟨*s*⟩ is the frequency of observation *s<sub>t</sub>* [*q dt* if *s<sub>t</sub>* = 1, (1 − *q dt*) if *s<sub>t</sub>* = 0] and *c<sup>s</sup><sub>t</sub>* is the value of coherence that maximizes the current likelihood:

$$\begin{aligned} P(s\_t \mid \hat{\mathbf{x}}, c) &= (q + cdq)^{\hat{\mathbf{x}}s\_t} (q - cdq)^{(1 - \hat{\mathbf{x}})s\_t} \left(1 - (q + cdq)dt\right)^{\hat{\mathbf{x}}(1 - s\_t)} \\ &\times \left(1 - (q - cdq)dt\right)^{(1 - \hat{\mathbf{x}})(1 - s\_t)} \end{aligned} \tag{7}$$

Taking the limit *dt* →0 and neglecting terms of higher order in *dt* leads to the following differential equation:

$$\frac{1}{\eta} \frac{d\hat{c}}{dt} = -\hat{c} + \left(2\hat{x} - 1\right) \frac{\left(s\_t - q\right)}{dq}$$

From this, it is straightforward to derive Eq. 4, since *w*/*l* = 1/*dq*.

#### **FREE PARAMETERS IN THE DIFFUSION AND BAYESIAN MODELS**

Here, our goal is to show that the Bayesian model reproduces qualitative trends in the data that are not captured by a diffusion model. However, it is crucial to identify the free parameters (and thus, the complexity) of both models if they are to be fitted quantitatively to behavioral data. Since the true sensory likelihoods *q*, *dq*, and the modulation of firing rate by each level of coherence *c* are not observables in behavioral tasks, they would have to be fitted to the data for each model. In addition, our version of the diffusion model has the following additional free parameters: the starting point of integration for each prior *L̄<sub>o</sub>* and the decision threshold *D̄*. The simplified Bayesian model has the following free parameters: the initial coherence estimate *ĉ*(0) and the coherence estimate update rate η. Other parameters (dynamics of thresholds and sensory weight, starting point of sensory integration) are imposed by parameters of the task (e.g., *T*<sub>iti</sub>, priors for choices *P*(*A*), *P*(*B*)) and approximate Bayesian inference equations. Our simplistic diffusion model thus has at least as many free parameters as the simplified Bayesian model.

More complex diffusion models can provide better fits to experimental data and capture some of these qualitative trends, but this comes at the cost of additional free parameters, such as variability in the starting point of integration and in drift rates (Ratcliff and McKoon, 2008), urgency signals (Hanks et al., 2011), or time-varying costs for sensory integration (Drugowitsch et al., 2012).

### **RESULTS**

#### **DYNAMICS OF SENSORY WEIGHTS AND DECISION THRESHOLD**

The sensory weight (i.e., the weight given to each new spike for updating the log odds) is proportional to the coherence estimate. Thus the sensory weight is a dynamic function of time and of the integrated sensory signal (see **Figure 3A**). At the start of the trial, the coherence estimate is equal to its initial value, *ĉ*(0) = 1. As time increases, the coherence estimate converges to its true value. When the true motion coherence is higher than 1, the sensory weight increases over the duration of the trial. As a result, sensory inputs have a larger impact on the log odds at the end of the trial than at the beginning of the trial. If, on the other hand, the true coherence is lower than 1, the sensory weight decreases over the duration of the trial. Thus, an input spike has a larger impact on the log odds at the beginning of the trial than at the end.
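The proportionality between sensory weight and coherence can be checked from the spike terms of the likelihood in Eq. 7: one spike contributes a log-likelihood ratio of log[(*q* + *c dq*)/(*q* − *c dq*)], which is close to 2*c dq*/*q* when *c dq* is small compared to *q*. The sketch below evaluates both expressions. The function names and parameter values are assumptions for illustration only.

```python
import math


def spike_log_likelihood_ratio(c, q, dq):
    """Log-likelihood ratio (direction A vs. B) contributed by one spike.

    Uses the spike terms of Eq. 7: rate q + c*dq under one direction and
    q - c*dq under the other. Requires c * dq < q.
    """
    return math.log((q + c * dq) / (q - c * dq))


def linear_weight(c, q, dq):
    """First-order approximation in c*dq/q: a weight proportional to c."""
    return 2.0 * c * dq / q


if __name__ == "__main__":
    q, dq = 20.0, 2.0  # hypothetical rates (spikes/s), illustration only
    for c in (0.0, 0.5, 1.0, 2.0):
        exact = spike_log_likelihood_ratio(c, q, dq)
        approx = linear_weight(c, q, dq)
        print(f"c={c:.1f}  exact={exact:.4f}  linear={approx:.4f}")
```

Replacing the unknown `c` by its running estimate is what makes the weight change during a trial.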

The decision threshold also needs to be updated on-line, since it depends on motion coherence. **Figure 3B** represents the average temporal evolution of the log odds and optimal threshold, for two levels of (true) motion coherence. Notice that the threshold follows the same trend as the sensory weight: it collapses for hard tasks, but stays constant or increases moderately for easy tasks. The effect of the collapsing bound at low coherence is to force a decision within a limited time frame if the trial is too difficult. In this case, the cost of waiting longer to make a decision outweighs the benefit of improved accuracy. Collapsing bounds have indeed been proposed as an upgrade for diffusion-based decision models with varying levels of sensory input strength. In particular, approximate Bayesian decision making predicts that a decision is not made at a fixed level of accuracy. Rather, the decision is made with a more permissive threshold (i.e., at a lower confidence level) when the trial is more difficult.

#### **BEHAVIORAL PREDICTIONS**

Simulated behavioral results are presented separately for "Reaction time" and "Fixed delay" tasks. To investigate the effect of priors, we either presented the two directions of motion with equal probability (*L<sub>o</sub>* = 0), or presented direction A more often than direction B (*L<sub>o</sub>* = 0.6), or vice versa (*L<sub>o</sub>* = −0.6).
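For orientation, if *L<sub>o</sub>* is read as the natural-log prior odds log[*P*(*A*)/*P*(*B*)] (an assumption made here for illustration), the values above can be converted to prior probabilities. The helper name below is hypothetical.

```python
import math


def prior_probability_from_log_odds(log_odds):
    """Convert log[P(A)/P(B)] into P(A), assuming natural logarithms."""
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    for l_o in (0.0, 0.6, -0.6):
        p_a = prior_probability_from_log_odds(l_o)
        print(f"L_o = {l_o:+.1f}  ->  P(A) = {p_a:.3f}")
```

Under this reading, *L<sub>o</sub>* = 0.6 corresponds to direction A being presented on roughly 65% of trials.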

#### *Reaction time task*

Psychophysical curves and reaction times as a function of motion coherence and priors are plotted in **Figure 4**. While the psychophysical curves are qualitatively similar for the diffusion model and the Bayesian model (**Figures 4A,B**), the mean reaction times (**Figures 4C,D**) and reaction time distributions (**Figures 4E,F**) are notably different. In particular, RTs are shorter at low coherence and longer at high coherence than expected from a diffusion model (**Figures 4C,D**). This is mainly because, for low coherence trials, the on-line estimate of coherence tends to decrease the decision threshold, thus shortening the reaction time. The reverse is true at high coherence. As a consequence, the animal spends less time on difficult trials (they are not worth the wait) and more time on easy trials (a little extra time results in a large increase in accuracy) than would be predicted by a diffusion model.

**FIGURE 3 | Simplified Bayesian model. (A)** Sensory weights in the simplified Bayesian model (average of 20000 trials) as a function of time after stimulus presentation. Black line: c = 2. Light gray line: c = 0 (i.e., sensory input is pure noise). Dark gray line: c = 1. **(B)** Average log odds *L<sub>t</sub>* (plain lines) and decision threshold (dashed lines) in the simplified Bayesian model. These temporal profiles were obtained by averaging over 20000 trials. The decision variables and thresholds on individual trials (as well as the sensory weights) vary randomly due to sensory noise (e.g., see **Figure 1A**). Black lines: c = 2. Dark gray lines: c = 1. Light gray lines: c = 0.2.

While the reaction time distributions for a diffusion model are very asymmetrical, with a fast rise and a long tail, the reaction time distributions predicted by the Bayesian model are quasi-symmetrical. The decision threshold is initially high, resulting in an absence of very short reaction times. The collapsing bound also prevents very long reaction times, which explains why the reaction time distributions of the Bayesian model do not have long tails. This occurs at all motion coherence levels even if, on average, the threshold does not collapse at high coherence: long trials correspond to "bad trials" where the quality of the sensory input was low (since the decision threshold was not crossed early). In these trials, the estimated motion coherence is also low (even if the true motion coherence is high). The bound collapses, resulting in a shortening of the duration of these "bad trials," which would have formed the tail of the RT distribution in a diffusion model.
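The way a collapsing bound removes the long tail of the RT distribution can be illustrated with a generic noisy accumulator. This is a minimal sketch and not the Bayesian model described here: in the model the bound depends on the coherence estimate, whereas the sketch assumes a bound that shrinks linearly with time. All names and parameter values are hypothetical.

```python
import math
import random


def fixed_bound(t):
    """Constant decision bound (arbitrary units)."""
    return 1.0


def collapsing_bound(t, collapse_time=2.0):
    """Bound shrinking linearly from 1 to 0 at collapse_time (assumed shape)."""
    return max(0.0, 1.0 - t / collapse_time)


def first_crossing_time(bound_fn, drift, seed, dt=0.001, noise_sd=1.0, t_max=5.0):
    """First time |x(t)| reaches bound_fn(t) for a noisy accumulator x.

    Returns t_max if the bound is never reached. The same seed yields the
    same noise sequence, so two bounds can be compared on identical paths.
    """
    rng = random.Random(seed)
    x = 0.0
    n_steps = int(round(t_max / dt))
    for n in range(1, n_steps + 1):
        t = n * dt
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if abs(x) >= bound_fn(t):
            return t
    return t_max


if __name__ == "__main__":
    seeds = range(200)
    drift = 0.5  # hypothetical weak signal
    for name, bound in (("fixed", fixed_bound), ("collapsing", collapsing_bound)):
        rts = [first_crossing_time(bound, drift, s) for s in seeds]
        print(f"{name:10s} mean RT = {sum(rts) / len(rts):.3f} s, "
              f"longest RT = {max(rts):.3f} s")
```

Because the collapsing bound never exceeds the fixed one, every simulated trial ends no later under the collapsing bound, and no trial can outlast `collapse_time`.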

For the same reason, the Bayesian model predicts longer reaction times for error trials than for correct trials (**Figure 4C**). In contrast, a diffusion model would predict the same reaction time for correct and error trials (**Figure 4D**). This is another consequence of the correlation between the length of the trial and the estimated coherence. In trials where the quality of the sensory input is low (due to sensory noise) the threshold collapses and is crossed at a lower value of accuracy. These "bad trials" have both longer reaction times and lower accuracy.

The benefit of using a Bayesian decision model is particularly strong when it comes to incorporating prior knowledge with the sensory evidence. By estimating motion coherence, the Bayesian integrator can appropriately adjust the contribution of the sensory evidence compared to its prior (see results from the fixed delay tasks). The diffusion model, on the other hand, over-estimates the quality of the sensory input at low coherence and under-estimates it at high coherence. Consequently, the overall effect of the prior (as implemented by a bias in the starting point of integration) is too weak at low coherence and too high at high coherence.

By adjusting the sensory weights and decision thresholds on-line as a function of the coherence estimate, the Bayesian decision model constantly re-evaluates the influence of the prior during the entire duration of the trial. The effect of the prior is thus much more than setting the starting point for sensory integration. In particular, this can paradoxically make the prior appear as additional "sensory evidence," as illustrated in **Figure 5**. While the diffusion model (**Figures 5A,B**) starts integration at a level set by the prior, but later behaves as a simple integrator, the influence of the prior in the Bayesian model (**Figures 5C,D**) is amplified during the trial. This strongly resembles a change in the drift rate, as if the priors were in fact an additional "pseudo" motion signal.

#### *Fixed delay tasks*

During fixed delay tasks, subjects see the stimulus for a fixed duration and are required to respond only after presentation of a "go" signal. Thus, in this case, there is no time/accuracy trade-off and no need for a dynamic decision threshold. Instead, the decision is determined by the sign of the log odds ratio at the end of stimulus presentation.

In a diffusion model, all sensory inputs are taken equally into account, regardless of whether they occur at the beginning or at the end of stimulus presentation. By contrast, the Bayesian decision model re-weights the sensory evidence as a function of the estimated motion coherence, and thus sensory inputs do not all contribute equally to the final decision. This is illustrated in **Figure 6A**, where we plotted the average sensory input ( *twis<sup>i</sup> t*) at different times during stimulus presentation, conditioned on the fact that the final choice was A. Here we consider only trials with zero coherence, i.e., *c* = 0. In this case the decision is purely driven by random fluctuations in the sensory input. The curves are the result of averaging over 20000 trials. The stimulus was presented for 2000 ms and the decision was made at *t* = 2000 ms. Only trials resulting in choice A (*L*<sub>2000</sub> > 0) were selected for averaging. For a diffusion model (red), the curve is flat and slightly above zero. This is because positive inputs tend to increase the probability that the final log odds will be positive, and that the final choice will be "A." In a diffusion model, the order of arrival of these inputs does not matter, resulting in a flat curve. In contrast, the Bayesian decision model (blue line) gives more weight to inputs presented early in the trial. This is because the initial coherence estimate [*ĉ*(0) = 1] is actually larger than the real motion coherence (*c* = 0 in this case). As a result, the first inputs are taken into account more than later inputs. As a consequence, the decision-triggered average of the input decays over time.
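The choice-triggered averaging procedure described above can be sketched as follows. This is an illustrative reconstruction, not the simulation code used for the figure: inputs are random ±1 samples, and the decaying weights stand in for a coherence estimate that drifts from 1 toward 0. Both choices, and all names, are assumptions.

```python
import random


def choice_triggered_average(trials, weights):
    """Mean input at each time step over trials whose weighted sum is > 0.

    trials  : list of trials, each a list of signed inputs (one per step)
    weights : list of sensory weights, one per time step
    Returns a list with the mean input per time step, or None if no trial
    ended with a positive decision variable.
    """
    selected = [
        trial for trial in trials
        if sum(w * s for w, s in zip(weights, trial)) > 0
    ]
    if not selected:
        return None
    n_steps = len(weights)
    return [sum(trial[t] for trial in selected) / len(selected)
            for t in range(n_steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_steps, n_trials = 20, 5000
    # Zero-coherence trials: the input is pure noise.
    trials = [[rng.choice((-1, 1)) for _ in range(n_steps)]
              for _ in range(n_trials)]
    equal = [1.0] * n_steps                        # diffusion-like weighting
    decaying = [0.8 ** t for t in range(n_steps)]  # assumed decaying weights
    for name, weights in (("equal", equal), ("decaying", decaying)):
        cta = choice_triggered_average(trials, weights)
        print(f"{name:9s} first bin = {cta[0]:+.3f}, last bin = {cta[-1]:+.3f}")
```

With equal weights every time bin contributes the same amount to the selection, whereas with decaying weights the selection is dominated by the early bins.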

A non-intuitive consequence of estimating motion coherence on-line is to decrease the apparent temporal window of integration for the Bayesian decision model. For low coherence trials, the initial input will influence the final decision much more than it should. Later in the trial, the influence of the input decays, but can never completely overcome the initial bias produced by early sensory noise. Consequently, integration is initially fast and later slows down considerably, to a point where the decision accuracy does not appear to benefit much from longer stimulus presentation (**Figure 6B**). This does not happen in a diffusion model, where each sensory input is equally weighted at all times. For long presentations of low coherence stimuli, the diffusion model paradoxically performs better than the Bayesian model. This is a consequence of approximate inference: coherence is estimated separately from motion direction, thus ignoring correlations between the two estimates.

Finally, the diffusion model and the Bayesian model behave very differently in the presence of priors. This is illustrated in **Figure 6C**. In zero coherence trials, the influence of the prior is very strong for short stimulus presentations, but decays for longer stimulus presentations, even when the stimulus is pure noise. This decay is not a desirable feature: the sensory input is completely uninformative, so the influence of prior information about the choice should stay strong regardless of the length of stimulus presentation (the ideal strategy would be to always respond according to the sign of the prior and completely ignore sensory information). Unfortunately, this decay cannot be completely prevented if one does not know initially that the coherence is zero. By dynamically reweighting sensory evidence, the Bayesian decision model can prevent this "washing away" of prior information by noise. Once enough sensory information has been collected to bring the coherence estimate to zero, it stops integrating the sensory noise and relies only on the prior. The diffusion model, on the other hand, keeps accumulating noise and quickly forgets the prior information.
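The "washing away" of the prior in a diffusion model can be made concrete with a standard Gaussian approximation. If a driftless accumulator starts at the prior *L<sub>o</sub>* and accumulates noise of variance σ² per unit time, the final value after a presentation of duration *T* is normally distributed with mean *L<sub>o</sub>* and variance σ²*T*, so the probability of choice A is Φ(*L<sub>o</sub>*/(σ√*T*)). The sketch below assumes this approximation. The noise scale and the function name are hypothetical, and the Bayesian model is not simulated here.

```python
import math


def prob_choice_a_pure_noise(prior_log_odds, duration, noise_sd=1.0):
    """P(final decision variable > 0) for a driftless Gaussian random walk.

    The walk starts at prior_log_odds and accumulates noise with standard
    deviation noise_sd * sqrt(duration); the choice is the sign at the end.
    """
    if duration <= 0:
        if prior_log_odds > 0:
            return 1.0
        return 0.0 if prior_log_odds < 0 else 0.5
    z = prior_log_odds / (noise_sd * math.sqrt(duration))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for duration in (0.01, 0.1, 1.0, 10.0, 100.0):
        p = prob_choice_a_pure_noise(0.6, duration)
        print(f"T = {duration:6.2f}  P(choice A) = {p:.3f}")
```

The probability starts near 1 for very short presentations and drifts toward chance (0.5) as more noise is accumulated.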

### **DISCUSSION**

#### **EXPERIMENTAL PREDICTIONS**

The Bayesian model predicts significant deviations from the predictions of a diffusion model when the precision of the sensory input (or the task difficulty) is varied randomly from trial to trial. Some of these predictions qualitatively fit previous results.

Thus, we predict that the reaction times are slower for error trials than correct trials, as shown in **Figure 4C**. This was indeed reported experimentally (Mazurek et al., 2003).

The model also predicts quasi-symmetrical reaction time distributions, as shown in **Figures 4E,F**. Such quasi-symmetrical RT distributions were observed in macaque monkeys performing this motion discrimination task (Ditterich, 2006). This is one of the most striking deviations of this behavior from the predictions of a diffusion model. An "urgency signal" increasing the probability of a choice with time during the trial has been proposed to account for these data (Ditterich, 2006). The effect of the urgency signal is similar to the effect of a collapsing bound.

The modulation by the prior resembles a pseudo "motion" signal, as shown in **Figures 5C,D**. Indeed, this was also reported experimentally (Palmer et al., 2005; Hanks et al., 2011). Once again, these data were attributed to a collapsing bound or an urgency signal forcing faster decisions in low coherence trials (Palmer et al., 2005; Hanks et al., 2011).

Finally, we predict that the influence of the sensory signal is stronger early in the trial than later in the trial, as shown in **Figure 6A**. Indeed, this effect is also observed in monkeys performing the motion discrimination task in zero coherence trials (Kiani et al., 2008). The decrease in sensory weights in low coherence trials limits the effective integration time window, causing saturation of performance with longer stimulus presentation, as shown in **Figure 6B**. Indeed, this was reported as well in the fixed duration task (Kiani et al., 2008). The authors accounted for these data by assuming that the animal reaches an internal decision bound after which it stops integration until the "go signal" is provided. We predict on the contrary that there is no "internal bound." The monkey stops integrating in low coherence trials as soon as it realizes that the sensory input is entirely unreliable. This should not occur in high coherence trials.

Only trials where choice A was made after a 2 s stimulus presentation (i.e., *L*<sub>2000</sub> > 0) were used for this choice-triggered average. Blue line: simplified Bayesian model. Red line: diffusion model. The diffusion model weights all sensory inputs equally while the Bayesian model relies on inputs only early in the trial. **(B)** Percent of correct choices as a function of the duration of stimulus presentation. Plain blue line: Bayesian model at low coherence (c = 0.1). Dotted blue line: Bayesian model at higher coherence (c = 0.5).

(c = 0), with a prior favoring choice A (*L<sub>o</sub>* = 0.6), as a function of the duration of stimulus presentation. Since the input is pure noise, the optimal strategy (if coherence was known) would be to always respond "A" (i.e., probability of choice A should be 1). The Bayesian model saturates to a suboptimal but still high probability of choice A. In the diffusion model, the influence of the prior decays over time.

Finally, a strong prediction of the adaptive Bayesian model is that the effect of the prior will not "wash away" for longer presentation times when the motion coherence is zero (**Figure 6C**), in contrast to the decay in prior influence normally observed when coherence is higher. To our knowledge, this prediction has not been tested experimentally.

#### **COMPARISON WITH OTHER BEHAVIORAL MODELS**

Our model is not the first variant of a diffusion model that accounts for the observed animal behavior in the motion discrimination task. Other models of decision making have focused on proposing a biologically plausible neural basis for decision mechanisms (Gold and Shadlen, 2002; Kiani et al., 2008; Wang, 2008; Churchland et al., 2011). They did not consider that the drift rate of a diffusion process or the bound could be adjusted on-line as a function of the sensory input. However, they share similar mechanisms with Bayesian decision models, such as a decision threshold that collapses over time or, equivalently, an urgency signal that increases over time (Ditterich, 2006). The "integration to bound" model (Kiani et al., 2008) assumes that sensory integration takes place as in a diffusion model, but only until the integrated evidence reaches an internal bound. No further integration is performed after that. This could indeed account for the stronger weight of sensory evidence at the beginning of the trial and the saturation of performance for longer stimulus durations.

One of the strongest motivations in building a Bayesian model is to have the capability to not only extract a single estimate from the sensory input (e.g., direction of motion) but also to extract the uncertainty associated with this estimate. This is extremely useful since this information can then be combined optimally with other noisy sensory cues (Ernst and Banks, 2002) or used to compute probabilistically optimal policies (Dayan and Daw, 2008). Unfortunately, this is also costly. Uncertainties are harder to estimate since they generally require much more data than a simple estimate. Fortunately, biological spike trains are Poisson distributed to a first approximation. In a Poisson process, uncertainty is directly reflected in the gain of the neural responses (Zemel et al., 1998). Uncertainty can then be relatively easy to extract, which is what we exploited here.

Note that an even easier solution is available when the modulation of firing rate (*dq*) and the baseline firing rate (*q*) are equally gain-modulated by certainty. For our toy model, this would correspond to coherence multiplying both *dq* and *q* by *c*. In this case, the sensory weights are constant (independent of *c*) and the diffusion model is exactly equivalent to a Bayesian decision model. This solution has been proposed previously in the context of population coding, in a variant of the motion discrimination task involving a continuous direction estimate (Beck et al., 2008). Unfortunately, the firing rate modulation reported in MT during motion discrimination tasks does not support this assumption. The baseline firing rate appears to be largely independent of motion coherence (Britten et al., 1992).
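To see why the weights become constant in this case, consider the log-likelihood ratio contributed by a single spike (the spike terms of Eq. 7) when both rates are scaled by *c*. This is a short worked step under the assumption, made here for illustration, that the two directions produce rates *c*(*q* + *dq*) and *c*(*q* − *dq*); the symbol *w*(*c*) is introduced only for this example:

$$w(c) = \log\frac{c\,q + c\,dq}{c\,q - c\,dq} = \log\frac{q + dq}{q - dq}$$

The coherence cancels, so each spike carries the same evidence at every coherence level. By contrast, when only the modulation scales with coherence, the same ratio is log[(*q* + *c dq*)/(*q* − *c dq*)], which grows with *c*.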

Other solutions have also been proposed involving the use of elapsed time rather than an explicit representation of the choice probability (Kiani and Shadlen, 2009; Hanks et al., 2011). Indeed, each level of integrated evidence and time during the trial can be mapped to a particular level of accuracy for the sensory signal. The predictions for the effect of priors are similar to ours and have been shown to fit experimental data (Hanks et al., 2011). Note, however, that a policy based on elapsed time is only useful if coherence is constant during the whole duration of the trial and if the "beginning" of sensory stimulation is clearly marked. This strategy also assumes that "elapsed time" is directly available to the decision maker. While this use of elapsed time could represent a strategy learnt by highly trained subjects, it is not clear whether it could be applied to "single shot" decision making or in the presence of on-going sensory data whose reliability may vary, as in our stock market example. Moreover, these models cannot deal with sensory signals starting and ending unpredictably. For example, a coherent motion signal could suddenly appear in an initially random motion display. Our framework constantly adapts the sensory integration and decision strategy to the on-going sensory signal. It can thus detect and properly respond to such events. Models based on elapsed time could not do so, since the start of sensory integration (time 0) cannot be inferred before the motion stimulus is actually detected.

The trials for which the Bayesian model makes predictions that are most notably different from previous models are the easy trials, where coherence is high. In this case, our model predicts an increase in sensory weights and a constant or slightly increasing (not collapsing) decision threshold. This means in particular that "motion pulses" would have more impact if given at the end of the trial. This contrasts with zero coherence trials, where they have more impact at the beginning of the trial than at the end (Kiani et al., 2008). This suggests a simple way of testing our theory experimentally.

A recent approach used dynamic programming to model optimal decision strategies under varying motion coherence (Drugowitsch et al., 2012). This model maximized the reward rate by estimating (for each time in the trial) the value of three possible actions: collecting more evidence, making choice A or making choice B. This method is similar to the full Bayesian integration algorithm, except that it replaces the joint probability distribution over motion direction and coherence with a probability distribution over cumulated sensory evidence and time in the trial. It can indeed reproduce the behavioral results with high accuracy, in particular the RT distributions. However, in order to do so one must assume an explicit cost to cumulating more sensory evidence (rather than taking the decision immediately). This cost varies with the time in the trial (i.e., a full temporal profile for the cost of cumulating evidence as a function of time is fitted to the data). This additional degree of freedom can capture many deviations from what would be Bayesian optimal. Note that the measured cost was initially stable at the beginning of the trial then increased rapidly in both monkeys and humans (Drugowitsch et al., 2012). Rather than assuming a time-varying cost, we propose instead that these deviations are a result of approximate inference. Instead of computing the probability distribution over all sensory likelihoods, which would in general be intractable, the brain uses two coupled integrators separately estimating the sensory precision and motion direction on-line. Whether our model fits behavior quantitatively (and not only qualitatively) will need to be further investigated.

#### **POSSIBLE NEURAL IMPLEMENTATION**

An example of biologically plausible mechanisms for decision making involves recurrent network models with two competing populations of neurons receiving evidence for each direction of motion (Wang, 2002, 2008; Wong and Wang, 2006). Parameters can be adjusted to ensure a slow time constant of integration during the sensory integration phase (line attractor), similar to a diffusion process. The network eventually reaches a basin of attraction, converging to one of two possible stable states, which implement the threshold crossing and decision (Wong et al., 2007). This is, however, not an instantaneous process. As the network reaches the basin of attraction, it gradually loses its sensitivity to the input, resulting in a decaying influence of the sensory input on the final decision. If, in addition, both populations receive an ongoing background signal, an urgency signal or "collapsing bound" could also be implemented.

Recurrent dynamics could indeed implement the decreasing sensory weights and collapsing bound required in low coherence trials. However, they cannot implement the increasing sensory weights predicted in easy trials. We notice, however, that the on-line coherence estimate (and thus the synaptic gain) can be understood as fast Hebbian plasticity with a strong regularization term (the decay η). More precisely, it is equivalent to the "BCM" rule (Bienenstock et al., 1982) measuring covariance between pre- and post-synaptic activity. Here, we interpret the pre-synaptic input as *s<sub>t</sub>* (with mean *q*) and the post-synaptic activity as the probability of choice *x̂<sub>t</sub>* (with mean 0.5). For example, fast Hebbian plasticity between MT cells and LIP cells could implement such a mechanism in the motion integration task. Therefore, local synaptic plasticity rules could provide an on-line estimate of sensory precision, thereby gain-modulating the incoming sensory information by its reliability at each level of the cortical processing hierarchy, while recurrent network dynamics could implement the collapsing bound. Note that if the decay η were replaced by a much smaller learning rate and gain-modulated by reward prediction error, this rule would correspond to a reinforcement learning rule previously proposed to account for the improved performance of monkeys learning coarse versus fine motion discrimination tasks (Law and Gold, 2009). This suggests that on-line changes in sensory weights during a single decision trial could rely on neural mechanisms similar to those implementing perceptual learning at a much slower time scale.

#### **LIMITATIONS OF THE APPROACH**

In order to avoid accumulating information for all combinations of coherence and motion directions while proposing a biologically plausible implementation, we separated the estimate of coherence and the estimate of motion direction, thus implementing approximate (not exact) inference. The cost of this approximation is the introduction of biases, e.g., a differential weighting of sensory information at different moments of the trial. In the reaction time task, the improvement obtained using an approximate Bayesian framework is also moderate compared to an optimized diffusion model (corresponding to an increase of about 5% in the reward rate).

We also chose to concentrate on the inference stage (i.e., extracting and using sensory likelihoods to infer the probability of sensory interpretations) rather than the decision stage (i.e., the threshold). Our greedy method, which sets the threshold to the value that would be optimal if motion coherence were always (i.e., in all trials) equal to its current estimate *ĉ<sub>t</sub>*, is naive and probably strongly suboptimal. We suspect, however, that any efficient policy based on an on-line estimation of sensory likelihoods will result in qualitatively similar predictions, i.e., dynamic sensory weights and thresholds.

Finally, RT distributions in humans performing the same motion discrimination task are more asymmetrical than monkey RT distributions, and are better fitted by a diffusion model. Moreover, the effect of priors in humans is well fitted by a change in the starting point of integration (Ratcliff and McKoon, 2008). Note that in contrast to monkeys, there were no zero coherence trials in these human experiments, which may have decreased the interest of using a collapsing bound (the collapsing bound essentially prevents "guess" trials based on pure noise from taking too long). In a more recent experiment including zero coherence trials, evidence for an urgency signal was also found in human subjects (Drugowitsch et al., 2012), although its exact influence on RT distributions was not shown. Moreover, it is unclear whether humans used the same criteria for reward rate as monkeys performing for juice reward.

### **REFERENCES**

decision in developing oculomotor commands. *J. Neurosci.* 23, 632–651.


markov models. *Neural Comput.* 20, 1706–1716.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 January 2012; accepted: 04 May 2012; published online: 05 June 2012.*

*Citation: Deneve S (2012) Making decisions with unknown sensory reliability. Front. Neurosci. 6:75. doi: 10.3389/fnins.2012.00075*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Deneve. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## The role of reward in dynamic decision making

## *Magda Osman\**

Biological and Experimental Psychology Centre, School of Biological and Chemical Sciences, Queen Mary College, University of London, London, UK

#### *Edited by:*

Carlos Eduardo De Sousa, Northern Rio de Janeiro State University, Brazil

#### *Reviewed by:*

Jochen Ditterich, University of California, USA Michael X. Cohen, University of Amsterdam, Netherlands

#### *\*Correspondence:*

Magda Osman, Biological and Experimental Psychology Centre, School of Biological and Chemical Sciences, Queen Mary College, University of London, Mile End Road, London E1 4NS, UK. e-mail: m.osman@qmul.ac.uk

The present study investigates two aspects of decision making that have yet to be explored within a dynamic environment: (1) comparing the accuracy of cue-outcome knowledge under conditions in which knowledge acquisition is either through Prediction or Choice, and (2) examining the effects of reward on both Prediction and Choice. In the present study participants either learnt about the cue-outcome relations in the environment by choosing cue values in order to maintain an outcome to criterion (Choice-based decision making), or learnt to predict the outcome from seeing changes to the cue values (Prediction-based decision making). During training participants received outcome feedback and one of four types of reward manipulations: Positive Reward, Negative Reward, Both Positive + Negative Reward, No Reward. After training both groups of learners were tested on prediction and choice-based tasks. In the main, the findings revealed that cue-outcome knowledge was more accurate when knowledge acquisition was Choice-based rather than Prediction-based. During learning Negative Reward adversely affected Choice-based decision making while Positive Reward adversely affected Prediction-based decision making. During the test phase only performance on tests of choice was adversely affected by having received Positive Reward or Negative Reward during training. This article proposes that the adverse effects of reward may reflect the additional demands placed on processing rewards, which compete for cognitive resources required to perform the main goal of the task. This in turn implies that, rather than facilitating decision making, the presentation of rewards can interfere with Choice-based and Prediction-based decisions.

**Keywords: dynamic, decision making, prediction, choice, reward**

## **INTRODUCTION**

The main objective of the present study is to build on the paradigms developed in the decision sciences in order to explore insights from work in the neurosciences on the role of reward. Based on the presentation of different types of reward outcomes, the present study examines the accuracy of cue-outcome knowledge when learning about a dynamic environment either through Choice-based decisions or Prediction-based decisions. A broader aim of this article is to elucidate the philosophical issues raised from work investigating decision making exclusively using behavioral techniques as compared to work using neuropsychological techniques.

Imagine a scenario in which we have recently installed a new energy monitoring system as a way of trying to reduce our fuel bill. In order to achieve this goal we need to learn about the relationship between cues (the devices in our home) and outcomes (energy use), while also taking into account our basic living requirements. We might decide that the best way to go about learning the cue-outcome relationships is by first choosing to make regular interventions on cues (varying which devices to use, varying the length of time of using the devices, and the time of use of various devices) and then examining their effects on the outcome (billing of fuel consumption). This is an example of Choice-based decision making in which cue-outcome relations are acquired via cue-intervention. Alternatively, by first monitoring the changes in cues (i.e., what devices are being used, and when) and then observing the changes in the outcome (energy use as indicated on the monitor) we might decide to predict the changes in the outcome from the changes in cue values. This is an example of Prediction-based decision making in which cue-outcome relations are acquired via estimates of the expected outcome value. Thus, both Choice-based decision making and Prediction-based decision making are methods of acquiring cue-outcome knowledge.

In order to achieve the intended goal, which is ultimately to reduce our fuel bill, we would need to implement cue-outcome knowledge (acquired by either method – prediction/choice) in order to decide how we might change our future behavior to reduce our energy consumption. By implementing cue-outcome knowledge, over time we would be able to track the relative success of our decisions (positive reward, i.e., discovering that there was a decrease in the fuel bill) and the relative failure of our decisions (negative reward, i.e., discovering that there was an increase in the fuel bill). This form of updating, often referred to as reinforcement learning/reward learning, is a way of associating rewards with the outcomes of decisions, which in turn influences how cue-outcome knowledge is implemented and modified.

What the above example illustrates is that, when we try to learn which variables cause changes in a dynamic environment, we need to learn about cue-outcome relations, and we can do this through Choice-based decision making or Prediction-based decision making. Choice-based decision making involves refining the decisions that will help utilize the value functions associated with an outcome in order to reduce the discrepancy between a target (goal) and the outcome (Wörrgötter and Porr, 2005). Alternatively, we can learn which variables generate changes in a dynamic environment via Prediction-based decision making. This involves a process that refines the decisions that will determine the expected value function associated with an outcome (Wörrgötter and Porr, 2005). Either form of decision making will enable an incremental build-up of cue-outcome knowledge through a series of decisions (predictions or choices). This means that future actions reflect the process of adapting and updating the cumulative changes experienced in the environment (Osman et al., 2008; Osman, 2008a,b, 2010a).

While neuropsychological research has made considerable advances in understanding the ways in which rewards are processed under different conditions (i.e., when the rewards occur and how often), very little work has focused on comparing the effects of different types of rewards on Prediction-based and Choice-based decision making, particularly in task environments that involve dynamic decision making (hereafter DDM) of the kind described in the example. Similarly, only recently has there been any work in the judgment and decision making domain that directly compares the accuracy of cue-outcome knowledge gained via Prediction-based and Choice-based decision making in a dynamic environment (Osman and Speekenbrink, in press).

Osman and Speekenbrink (in press) showed that generally cue-outcome knowledge acquired either through Prediction-based or Choice-based decision making was sufficiently flexible to enable successful transfer to tests of choice and prediction. Moreover, these findings are generally consistent with reinforcement learning models that would claim that prediction errors are the source of cue-outcome learning, which can be generated either through Choice or Prediction. The key issue, and the focus of the present study, is to bring together the work from the decision sciences and the neuropsychological domain in order to investigate an unexplored question: What are the effects of different types of rewards on cue-outcome learning (i.e., Prediction-based, Choice-based decision making) in a DDM environment?

Broadly, both Prediction-based decisions and Choice-based decisions should lead to an estimate of what will happen to the outcome following a change in a cue variable; in other words, a prediction is generated. Moreover, Reinforcement learning/Reward based learning models (Montague et al., 1996; Schultz et al., 1997) also claim that cue-outcome knowledge is acquired via error-based learning, that is, an error (prediction error) is generated by a comparison between an action (cue-intervention) and the actual outcome that occurs (reward; i.e., Choice-based decision). Alternatively, an error can occur based on a comparison between an expected outcome from a choice and the actual outcome (i.e., Prediction-based decision). Thus, prediction errors are the source of learning, or of fine-tuning cue-outcome knowledge, because the magnitude of the deviation between prediction/cue-intervention and the actual outcome indicates the accuracy of cue-outcome knowledge. The models predict that changes in the rate of learning reflect changes in the reward outcomes (i.e., success or failure of a decision reflected in the outcome itself).

Reinforcement learning models have enjoyed much success in the neuropsychological domain, in which there is mounting evidence that the processing of rewards corresponds to phasic activity of mid-brain dopamine neurons (Schultz et al., 1997; Schultz, 2006; Rutledge et al., 2009). The pattern of activation of these neurons differs according to the different types of reward outcomes that occur. That is, dopaminergic neurons show short phasic activation in the presence of unexpected rewarding outcomes (e.g., presentation of food, presentation of money), and in the course of learning the phasic response shifts to indicators (i.e., cues) of rewarding outcomes (e.g., lights, tones, smiley faces, money). Similarly, in the presence of unexpected negative outcomes (e.g., loss of reward) there is a corresponding decrease in activation (Hollerman and Schultz, 1998). In addition, event-related brain potential (ERP) studies have reported that performance feedback generates ERP waveforms that are typically observed as a negative-going component peaking between 250 and 300 ms after feedback is presented (Holroyd and Coles, 2002; Hajcak et al., 2007; Peterson et al., 2011). The amplitude of the feedback negativity is determined by the impact of phasic dopamine signals (Holroyd and Coles, 2002). It reflects the interaction between feedback valence and expectedness, so that unexpected negative feedback produces greater feedback negativity than unexpected positive feedback, which is typically associated with smaller negativity signals (Hajcak et al., 2007).

In addition, neuropsychological research on decision making has examined different properties of rewards (e.g., reward probabilities, reward structures; e.g., Daw et al., 2006; Behrens et al., 2007; Boorman et al., 2009; Jocham et al., 2009). Brain imaging data (O'Doherty, 2004; Sailer et al., 2007) have shown that there is greater brain activation in the orbital frontal cortex (OFC), caudate nucleus, and frontal polar areas when participants experience positive rewards (gains) rather than negative rewards (losses). This suggests that reward outcomes themselves are processed differently. Cortical activation can also reflect differences in reward probabilities, as well as changes in the reward probabilities over time (Cohen, 2006; Schultz, 2006; Sailer et al., 2007; Schultz et al., 2008). Moreover, during cue-outcome learning, activation increases in the OFC and putamen when experiencing losses, and activation decreases following gains; this is consistent with evidence from EEG studies (e.g., Cohen et al., 1996) and fMRI studies (e.g., Cohen et al., 2008).

Two recent neuropsychological studies contrasting Prediction-based learning (making judgments of expected rewards from actions, alternatively Prediction-based decision making) with action-based learning (choosing a cue that will bring about a reward, alternatively Choice-based decision making) suggest that there may in fact be underlying neurological differences between these two forms of learning (Hajcak et al., 2007; Peterson et al., 2011). The task in Hajcak et al.'s (2007) ERP study involved selecting from four doors the one which was likely to have a prize behind it (i.e., choice). Prior to each choice participants were told the objective probability of reward [i.e., the prize is behind 1 (*P* = 0.25), 2 (*P* = 0.50), or 3 (*P* = 0.75) doors]. The key manipulation involved participants guessing (i.e., predicting) "yes" or "no" that they would win just before their choice (Experiment 1), or just after their choice (Experiment 2). Hajcak et al. (2007) found that, consistent with reinforcement models, there was no difference between the two conditions based on behavioral measures of prediction and choice. There was, however, an effect on the correspondence between feedback negativity amplitude and subjective estimates of success. Feedback negativity tracked predictions of outcomes after people made their choices, but not before. It was speculated that the process of actively making a selection involved estimating the success of each choice, and then selecting the option with the highest subjective reward outcome. Thus, this evaluative method strengthened and stabilized predictions, whereas before a choice was made the prediction was based on few evaluations of the expected outcomes, which weakened the strength of the predictions.

Using a different design, Peterson et al.'s (2011) study also separated prediction from action using an incremental learning task. Participants were either free to select a cue (one of four pictures) that yielded the highest expected pay off (choice trials), or were instructed to select a particular cue (instructed trials). Generally, the findings from the neurophysiological data suggested that prediction error magnitudes were lower for choice trials compared to instructed trials, but that only in choice trials did the error magnitude become substantially lower over the course of learning. Peterson et al. (2011) claimed that expectations are in closer alignment with feedback when feedback itself results from actions that are under volitional control, and this is based on the speculation that in Choice-based trials people can actively choose the option with the highest payoff, whereas in instructed trials people do not have volitional control.

The implication of Peterson et al. (2011) and Hajcak et al.'s (2007) findings is that active choice (i.e., Choice-based decision making) is an important factor in reward learning, and may involve different neural activity as compared to non-choice-based decisions (e.g., prediction, classical conditioning), but that there is no corresponding difference in behavioral measures of choice and prediction. The main reason for focusing on the Hajcak et al. (2007) and Peterson et al. (2011) studies is that both make strong claims about reward learning in choice-based and prediction-based decision making. Moreover, in both studies the claim is made that reward differentially affects neurological behavior associated with prediction and choice, but that there are no corresponding behavioral differences (i.e., performance on tests of prediction and choice is no different). The problem is that without directly testing prediction and choice in the same task environment, and thereby first establishing the presence or absence of behavioral differences, there is no secure ground for claiming that there are neurological differences but not behavioral differences. It is not clear why there would be differences at the neurological level and not at the behavioral level, which poses a number of questions concerning the kinds of inferences that can be drawn from neurological data to behavioral data, and vice versa.

*What can we infer about the relationship between brain and behavior given that the changes detected at the neurophysiological level do not correspond with any observable changes in behavior at the psychological level?* These findings raise important issues with respect to making inferences about the neurological mechanisms that support different forms of decision making. First, although in Hajcak et al.'s (2007) study predictions were made either before or after choices, both decisions were made on each trial. A cleaner design would have been to block trials in which people either predicted the success of a choice, or actually made a choice. In this way a comparison of prediction only and choice only trials would be free from potential order effects, which were not examined in the study. Peterson et al. (2011) did in fact separate the trials in which choices and non-choices were made, but since participants were not explicitly required to make a subjective judgment about expected reward, the critical comparison was not between prediction and choice, but between choice and no-choice. Peterson et al. (2011) argued that their method of estimating prediction error magnitude from their reinforcement learning model was a more sensitive method than simply relying on verbal reports. Taken together, these methodological factors may explain the reported differences in neural activity and the absence of a difference at a behavioral level. However, both EEG studies of choice and prediction are consistent with behavioral findings from Osman and Speekenbrink's (in press) study showing that the accuracy of cue-outcome knowledge is similar regardless of whether it was gained through prediction or choice. Crucially, though, in Osman and Speekenbrink's study there was no presentation of rewards during learning, only outcome feedback. Thus, the issue remains: to what extent can we extrapolate from neuropsychological findings to behavioral findings given that the differences are only present neurologically?

These issues will be revisited in the Section "General Discussion," but for now the key point is that evidence suggesting that choice and prediction may in fact be supported by different neurological processes has been demonstrated in simple forced choice tasks. The methodological concerns raised here may limit the extent to which the findings can be generalized to more complex decision making contexts. Therefore, given that behavioral studies comparing prediction and choice-based decision making do not include reward manipulations along the lines of Peterson et al. (2011) and Hajcak et al.'s (2007), and given that both these studies are problematic, the aim of the present study is to: (1) address the methodological issues raised here, (2) explore the generalizability of their findings to a DDM task by incorporating reward manipulations, and (3) explore the generalizability of their findings to a task which is commonly described as cognitively demanding (Brehmer, 1992).

Previous studies using DDM tasks directly comparing the effects of learning via prediction and learning via Choice-based decisions have shown that accuracy of cue-outcome knowledge is unaffected by mode of learning (Osman and Speekenbrink, in press). However, in the DDM tasks used previously, only outcome feedback was presented. This is different from the typical reward outcomes used in choice tasks in the neuropsychological domain. These tasks tend to incorporate salient reward outcomes (e.g., tones, lights, smiley faces) which have been shown to impact on performance. Therefore, the DDM task used in the present study incorporated reward outcomes during learning. Participants received outcome feedback, and were also presented with information as to the relative success of their decisions over time (indicated by a thumbs up sign and a smiley face – positive feedback), and the relative failure of decisions over time (indicated by a thumbs down sign and a sad face – negative feedback). In addition, the present study incorporated experimental procedures from Peterson et al.'s (2011) and Hajcak et al.'s (2007) studies to make the DDM task comparable to their studies. In the prediction-based learning condition participants were presented with pre-selected cues (akin to Peterson et al.'s, 2011 study) and were given the opportunity of guessing what the outcome value would be on each trial (akin to Hajcak et al.'s, 2007 study).

By incorporating these methodological features into the present study, the aim is to align Peterson et al. (2011) and Hajcak et al.'s (2007) tasks to a paradigm examining decision making processes which is commonly referred to as cognitively demanding (Osman, 2010a), and is often described as externally valid (Funke, 2001). In so doing, the present study examines Hajcak et al.'s (2007) and Peterson et al.'s (2011) claim that Choice-based decisions rather than Prediction-based decisions facilitate closer correspondence between subjective expectations and feedback. They propose that, compared with Prediction-based decisions, Choice-based decisions reflect a process of volitional control over an action. The action itself is informed by an evaluative process in which each choice option is weighted and the one with the highest subjective reward is selected. This in turn would suggest an advantage for those making Choice-based decisions rather than Prediction-based decisions. However, this generates a discernible difference in neurophysiological behavior, but not in behavioral measures of performance. A null effect is also predicted from a reinforcement learning perspective. If experiencing the effects of one's predictions or choices cumulatively in a dynamic environment leads to the same prediction error, then regardless of the mode of learning, cue-outcome knowledge should be equally accurate in Prediction-based and Choice-based learning conditions.

## **EXPERIMENT 1**

The experiment is designed to address the following empirical question: *Are there behavioral differences between Choice-based and Prediction-based dynamic decision making under reward based learning?* To answer this, the present study employed a DDM paradigm that incorporated a reward based structure similar to the simple choice tasks used in the neuropsychological domain discussed above. In one version of the DDM task, from trial to trial participants were required to learn the probabilistic cue-outcome associations by using the cue values to predict the outcome value (Prediction-based learners). The other version involved the same cue-outcome task structure, but in this case participants were required to control the outcome value by manipulating the cue values to reach and maintain a specific outcome value (Choice-based learners). To match the two versions as closely as possible, the learning histories experienced by both types of learners were identical, but the critical difference between the two was that Choice-based learners set the cue values (choice under volition), whereas the cue values were preset for Prediction-based learners (non-volitional cue manipulation). This was achieved by using a yoked design. In this way, Prediction-based learners were matched to Choice-based learners' learning trials, and so the cue-outcome values that were experienced were identical to those chosen by Choice-based learners. To examine the effects of the different modes of learning on the accuracy of cue-outcome knowledge, all participants were presented with two tests of control, and two tests of prediction.

## **METHODS**

*Participants*

Ninety-six graduate and undergraduate students from the University of London volunteered to participate in the experiment for reimbursement of £5. The assignment of participants to the four conditions was semi-randomized. There were a total of eight groups (Choice-based learning Positive Reward, Choice-based learning Negative Reward, Choice-based learning Both Positive + Negative Reward, Choice-based learning No Reward, and Prediction-based learning Positive Reward, Prediction-based learning Negative Reward, Prediction-based learning Both Positive + Negative Reward, Prediction-based learning No Reward), with 12 participants in each. Pairs of participants (Choice-based learners and yoked Prediction-based learners) were randomly allocated to one of the four types of reward based conditions (Positive Reward, Negative Reward, Both Positive + Negative Reward, No Reward). Participants were tested individually.

## **DESIGN**

The experiment used a 2 × 4 design. It included two between-subject manipulations, namely learning mode (Prediction-based vs. Choice-based) and type of reward (Positive Reward, Negative Reward, Both Positive + Negative Reward, No Reward). Success of learning performance was measured using two types of tests (Control Tests 1, 2; Predictive Tests 1, 2).

The task environment consisted of the following: Positive cue = *x*<sub>1</sub>, Effect of positive cue = *b*<sub>1</sub> = 0.65, Negative cue = *x*<sub>2</sub>, Effect of negative cue = *b*<sub>2</sub> = −0.65, Random perturbation = *e*(*t*) (normally distributed, with a mean of 0), Outcome value = *y*(*t*), Previous outcome value = *y*(*t* − 1). In total there were three cues and one outcome: one of the cues increased the outcome, one decreased it, and the third (the null cue) had no effect on the outcome. More formally, the task environment can be described by the following equation:

$$y(t) = y(t-1) + 0.65\,x_1(t) - 0.65\,x_2(t) + e(t)$$

in which *y*(*t*) is the outcome on trial *t*, *x*<sub>1</sub> is the positive cue, *x*<sub>2</sub> is the negative cue, and *e* is a random noise component, normally distributed with a zero mean and SD of 8<sup>1</sup>. The null cue *x*<sub>3</sub> is not included in the equation as it had no effect on the outcome.
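The task equation can be illustrated with a short simulation. The following is a minimal sketch only, not the software used in the experiment: the function and constant names are hypothetical, and the noise SD of 8 is taken from the description above.

```python
import random

B_POS = 0.65     # effect of the positive cue x1 (from the task description)
B_NEG = -0.65    # effect of the negative cue x2 (from the task description)
NOISE_SD = 8.0   # SD of the random perturbation e(t) (from the task description)


def expected_outcome(prev_outcome, x1, x2):
    """Noise-free outcome: y(t-1) + 0.65*x1 - 0.65*x2."""
    return prev_outcome + B_POS * x1 + B_NEG * x2


def next_outcome(prev_outcome, x1, x2, x3=0.0, rng=random):
    """Return y(t) for one trial.

    x3 is the null cue; it is accepted for completeness but has no effect.
    rng may be the random module or a seeded random.Random instance.
    """
    noise = rng.gauss(0.0, NOISE_SD)
    return expected_outcome(prev_outcome, x1, x2) + noise
```

For example, with a previous outcome of 178, the positive cue at 0 and the negative cue at 100, `expected_outcome(178, 0, 100)` is 178 − 65 = 113; the observed outcome scatters around that value with an SD of 8.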

The DDM task included a total of 112 trials, divided into two phases. The structure of the entire experiment was as follows: Learning phase (40 trials), Test phase – two tests of Controlling the Outcome (20 trials each) interleaved with two tests of Predicting Cue and Outcome values (16 trials each). The order of presentation of the tests was as follows: Control Test 1, Prediction Test 1, Control Test 2, Prediction Test 2.

<sup>1</sup>The assignment of noise to the system was first piloted in order to generate high variance (16 SD) and low variance (4 SD). Osman and Speekenbrink (in press) includes two studies which varied the random perturbation component. In Experiment 1, 16 SD was found to be difficult as reflected in choice performance and predictive performance, while 4 SD was considerably easier. In Experiment 2, 8 SD was moderately difficult, and on this basis was chosen in order to investigate the effects of reward on Choice-based and Prediction-based learning in the present study.

## **BEHAVIORAL TASK**

The visual layout of the screen, cover story, and the main instructions were identical for Prediction-based and Choice-based learning groups. Participants were presented with a story about a newly developed incubator designed especially for babies with an irregular state of health (a global measure based on heart rate, temperature, blood pressure)<sup>2</sup>. Using this type of context ensured that participants were highly motivated to learn the task. Choice-based learners were informed that as a trainee maternity nurse they would be trying to regulate the health of a newborn girl called "Molly." They would be regulating the levels of three parameters (air pressure, oxygen, and humidity) with the aim of maintaining a specific safe healthy state. The system was operated by varying the cue values which would affect the baby's state of health. Prediction-based learners were assigned the same role, but instead they were told that they would see the nurse regulating the incubator parameters and that their role would be to predict the subsequent change in a global measure of health. The screen included three cues which were labeled (air pressure, oxygen, and humidity), and the outcome (healthy state) which was presented in two ways, as a value in the middle right of the screen, and also on a small progress screen in which a short trial history (five trials long) of outcome values was presented. Both Prediction-based and Choice-based learning groups were shown the current state of health, the new value of the state of health after manipulation, and the target value of the healthy state. Prediction-based learners were also shown the result they predicted in the form of a dashed line on the progress screen. The task was self-paced. **Figure 1** shows an example of the environment participants were required to interact with.

## *Rewards*

Reward-based stimuli were presented during the learning phase only. The rewards did not correspond to money or points, but rather they were simple characters that indicated an increase (smiley face and a thumbs up sign) or decrease (sad face and a thumbs down sign) in performance. Participants in the no reward condition (No Reward) received no reward, only outcome feedback. Outcome feedback was provided in the form of a value that changed on a progress screen indicating graphically the difference between the target value and the achieved outcome value (for Choice-based learners), or the predicted outcome value and the achieved value (for the Prediction-based learners). In addition, the outcome value and target value were also listed on the side of the progress screen.

Participants in the positive reward condition (Positive Reward) observed a picture of a smiley face and a thumbs up on trials in which the discrepancy between their achieved outcome value and the target value was smaller than on the previous trial (for Choice-based learners), or the discrepancy between expected and actual outcome was smaller than on the previous trial (for the Prediction-based learners). Participants in the negative reward condition (Negative Reward) observed a picture of a sad face and a thumbs down on trials in which the discrepancy between the achieved outcome and target outcome was greater than on the previous trial (for Choice-based learners); the same logic was applied to the discrepancy between expected and actual outcome for the Prediction-based learners. Participants in the Positive + Negative reward condition (Both-Rewards) received positive and negative rewards on trials adhering to the conditions specified above. Rewards were only presented during the learning phase. During the Test phase, for control tasks all participants received outcome feedback, and for tests of prediction no feedback was presented.
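The reward rule described above can be sketched as a small function. This is an illustrative sketch rather than the authors' implementation: the function name and condition labels are hypothetical, and the handling of ties (no reward when the discrepancy is unchanged) is an assumption, since the text does not specify it.

```python
def discrepancy(value, reference):
    """Absolute distance between two outcome values.

    Choice-based learners: achieved outcome vs. target value.
    Prediction-based learners: predicted outcome vs. actual outcome.
    """
    return abs(value - reference)


def reward_signal(prev_discrepancy, curr_discrepancy, condition):
    """Return "positive", "negative", or None for one learning trial.

    condition is one of "positive", "negative", "both", "none"
    (hypothetical labels for the four reward conditions).
    Assumption: an unchanged discrepancy yields no reward.
    """
    improved = curr_discrepancy < prev_discrepancy
    worsened = curr_discrepancy > prev_discrepancy
    if improved and condition in ("positive", "both"):
        return "positive"   # smiley face and thumbs up
    if worsened and condition in ("negative", "both"):
        return "negative"   # sad face and thumbs down
    return None             # outcome feedback only
```

Under this sketch a learner in the Positive Reward condition sees nothing on trials where performance worsens, and a learner in the Negative Reward condition sees nothing on trials where it improves; only the Both-Rewards condition signals both directions.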

## *Learning phase*

*Choice-based learners.* During each trial participants had to interact with the system by changing the value of the cues using a slider corresponding to each. Each slider had a scale that ranged from 0 to 100. On the start trial, the cue values were set to "0," the outcome value was 178, the target value throughout was 62, and a safe range (±10 of the target value) was given. When participants made their decision they clicked a button labeled "Submit" which deactivated the cues and revealed on the progress screen the effects of their decisions on the outcome. The effects on the outcome value were cumulative from one trial to the next, and so while the cue values were returned to "0" on the next trial, the outcome value was retained from the previous trial. After completing the learning phase, participants then proceeded to the test phase.

<sup>2</sup>It was made clear to participants at the start of the experiment that they were taking part in a simulation, and that there was no real baby in an incubator.

*Prediction-based learners.* The procedure was identical to Choice-based learners, with the following exceptions. Once presented with the cue values, they predicted the outcome value by adjusting a slider that was placed alongside the outcome progress screen; this would move a line on the progress screen to indicate the outcome value. Once they made their decision, they clicked a button labeled "Submit," which deactivated the outcome value slider and revealed the actual outcome value as well as their predicted outcome value. The button "Continue" was then pressed to proceed to the next trial. The start of the next trial triggered the outcome value slider to become activated and the presentation of new cue values. The predicted value of the previous trial was omitted from the progress screen, but the trial history of the last five actual outcome values remained.

## *Test phase*

*Control tests.* After the learning phase, all participants were examined on their ability to control the system to a criterion (outcome value = 62, and safe range ±10 of the target value). Test 1 involved the same procedure that the Choice-based learners had followed during the learning phase, but consisted of only 20 trials. For the Prediction-based learners this was the first occasion on which they could manipulate the cues. To examine the ability to control the system to a different goal, all participants were then presented with Test 2, in which they followed the same procedure as Test 1, with the following exceptions. In Test 2 participants were informed that they needed to be even more careful in reaching and maintaining the outcome value (outcome value = 74), and that staying within the safe range (±5 of the target value) was of particular importance. The starting value of Test 1 was 178, and was set to 156 in Test 2. In Test 2 Choice-based learners and Prediction-based learners had no experience of the new criterion value, and so they would have to base their decisions on acquired knowledge of the system in order to control the new outcome value.

Predictive tests were designed to examine explicit cue-outcome knowledge. Each test included 16 trials which were divided in the following way. Participants were required to predict the value of a cue (Positive, Negative, Null) based on the given value of the outcome and the other cues (e.g., predicting the Positive cue value, based on the values of the Negative, Null, and Outcome values), or they were required to predict the outcome value given the values of the three cues. Participants were not told that the test involved a mixture of eight old trials and eight new trials. Old trials were divided accordingly: 2 × Positive cue value, 2 × Negative cue value, 2 × Null cue value, 2 × Outcome value. These trials were randomly selected from the initial learning phase (for Choice-based learners these were trials that they had generated themselves, for Prediction-based learners these were the same yoked learning trials in which they predicted the outcome value). The eight new trials were divided accordingly: 2 × Positive cue value, 2 × Negative cue value, 2 × Null cue value, 2 × Outcome value. Neither group had prior experience of them. All participants were presented with the same set of new trials; these were predetermined prior to the experiment. The presentation of the 16 trials in each set of Predicting Cue and Outcome values Tests was randomized. For each trial the predicted value was recorded along with the response time.

## *Dependent measures*

Predictive performance was measured by an error score *S*<sub>p</sub>(*t*) calculated as the absolute difference between predicted and expected outcome values:

$$S_{p}(t) = \left| P(t) - y(t-1) - 0.65\,x_1(t) + 0.65\,x_2(t) \right|,$$

in which *P*(*t*) is a participant's prediction on trial *t*. We chose to compare predictions to expected rather than actual outcomes as the latter are subject to random noise.
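As a worked illustration of the predictive error score, the sketch below computes the absolute deviation of a prediction from the noise-free expected outcome. The function name is hypothetical; the coefficients are those of the task equation.

```python
def predictive_error(prediction, prev_outcome, x1, x2):
    """S_p(t) = |P(t) - y(t-1) - 0.65*x1(t) + 0.65*x2(t)|.

    The prediction is compared with the expected (noise-free) outcome,
    not with the outcome actually observed on that trial.
    """
    expected = prev_outcome + 0.65 * x1 - 0.65 * x2
    return abs(prediction - expected)


def mean_error(scores):
    """Average error score over a block of trials."""
    return sum(scores) / len(scores)
```

For instance, with a previous outcome of 178, *x*<sub>1</sub> = 0 and *x*<sub>2</sub> = 100, the expected outcome is 113, so a prediction of 120 yields an error score of 7 regardless of the noisy value that was actually displayed.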

Choice performance was measured as the absolute difference between the expected achieved and best possible outcome:

$$S_{c}(t) = \left| G(t) - y(t-1) - 0.65\,x_1(t) + 0.65\,x_2(t) \right|,$$

in which *G*(*t*) is the goal on trial *t*: either the target outcome if achievable on that trial, or the closest achievable outcome. To illustrate, choice performance was based on how much participants' cue manipulations deviated from the optimal cue settings (the same principle applies to predictive performance except the deviation was from expected outcome values on each trial). In the choice tasks used here, for a given (previous) outcome value and goal, the optimal cue settings define a line in a two-dimensional plane. For example, if the deviation between the previous outcome and goal is 50, then the optimal cue settings are all values for the positive cue *x*<sub>1</sub> and negative cue *x*<sub>2</sub> such that 50 = 0.65 *x*<sub>1</sub> − 0.65 *x*<sub>2</sub>, for instance (to the nearest slider unit) *x*<sub>1</sub> = 77 and *x*<sub>2</sub> = 0, or *x*<sub>1</sub> = 78 and *x*<sub>2</sub> = 1, or *x*<sub>1</sub> = 87 and *x*<sub>2</sub> = 10, etc. Thus, choice performance was computed as the (shortest) distance between a participant's actual settings for these two cues and the line defining the optimal cue settings.
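The choice score can be sketched in the same way. This is a hedged illustration, not the authors' analysis code: the function names are hypothetical, and the definition of the "closest achievable outcome" assumes that the reachable range on a trial is set by the 0 to 100 slider range (i.e., the expected outcome can move by at most ±65 from the previous outcome). The perpendicular distance is computed to the unbounded line, ignoring slider limits.

```python
import math

B = 0.65          # absolute cue effect from the task equation
CUE_MAX = 100.0   # upper end of the slider range (lower end is 0)


def goal(target, prev_outcome):
    """G(t): the target if reachable in one trial, else the closest reachable value.

    Assumption: reachability is determined by the slider range alone.
    """
    lowest = prev_outcome - B * CUE_MAX    # x1 = 0,   x2 = 100
    highest = prev_outcome + B * CUE_MAX   # x1 = 100, x2 = 0
    return min(max(target, lowest), highest)


def choice_error(target, prev_outcome, x1, x2):
    """S_c(t) = |G(t) - y(t-1) - 0.65*x1(t) + 0.65*x2(t)|, in outcome units."""
    g = goal(target, prev_outcome)
    return abs(g - prev_outcome - B * x1 + B * x2)


def distance_to_optimal_line(target, prev_outcome, x1, x2):
    """Perpendicular distance, in cue units, from (x1, x2) to the line
    0.65*x1 - 0.65*x2 = G(t) - y(t-1)."""
    return choice_error(target, prev_outcome, x1, x2) / (B * math.sqrt(2.0))
```

Under these assumptions the score in outcome units and the geometric distance in cue space differ only by the constant factor 0.65√2, so they rank participants' settings identically. With a previous outcome of 178 and a target of 62, the target is not reachable in one trial, the goal becomes 113, and the settings *x*<sub>1</sub> = 0, *x*<sub>2</sub> = 100 score (approximately) zero.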

## **RESULTS**

The participants' patterns of learning were first examined separately for Choice-based learners and Prediction-based learners. Comparisons between conditions could not be conducted at this stage as the optimality scores were incomparable (one based on the difference between achieved and best possible outcome value, and the other between predicted and expected outcome value). The Test Phase was the first occasion in which both conditions were directly compared for the participants' ability to reach and maintain the outcome to a specific criterion (Tests of Controlling the Outcome), and their ability to predict cue values from the state of the outcome, or predict the outcome from the pattern of cue values (Test of Predicting Cue and Outcome values).

## *Learning phase: choice-based learning*

The learning phase was divided into two blocks of 20 trials each (Learning first half; Learning second half), and Control optimality scores were averaged across each block, for each participant. The following analyses were based on the mean error scores by block, presented in **Figure 2**. To examine the success of learning, a 2 × 4 repeated measures ANOVA was conducted using Block (Learning first half; Learning second half) and Reward (No Reward, Both-Rewards, Positive Reward, Negative Reward). Overall, with more exposure to the task, Choice-based learners showed general improvements in their ability to control the outcome to criterion, as revealed by a main effect of Block [*F*(1,44) = 44.019; *P* < 0.0005, η = 0.527]. There was a significant main effect of Reward [*F*(2,44) = 3.443; *P* < 0.05, η = 0.202]. Bonferroni *post hoc* tests revealed that Negative Reward led to poorer control performance as compared to Both-Rewards (19.147, *P* < 0.05) and No Reward (19.389, *P* < 0.05).
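The block averaging that feeds these analyses can be sketched as below. The helper name and the example scores are hypothetical; the sketch only shows how 40 trial-level error scores for one participant reduce to two block means.

```python
from statistics import mean


def block_means(scores, block_size=20):
    """Average trial-level error scores within consecutive blocks of equal size."""
    if len(scores) % block_size != 0:
        raise ValueError("number of trials must be a multiple of the block size")
    return [
        mean(scores[start:start + block_size])
        for start in range(0, len(scores), block_size)
    ]


# Hypothetical participant whose error drops from the first to the second half.
example_scores = [10.0] * 20 + [4.0] * 20
first_half, second_half = block_means(example_scores)
```

A lower mean in the second block than in the first corresponds to the improvement with practice captured by the main effect of Block.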

## *Learning phase: prediction-based learning*

In order to examine predictive accuracy during learning, Predictive optimality scores were subjected to a 2 × 4 repeated measures ANOVA with Block (Learning first half; Learning second half) and Reward (No Reward, Both-Rewards, Positive Reward, Negative Reward). The analysis revealed a main effect of Block [*F*(1,44) = 26.278; *P* < 0.001, η = 0.374], confirming the pattern of behavior presented in **Figure 2**, which indicates that predictive accuracy improved with more practice. There was also a Block × Reward interaction [*F*(3,44) = 3.064; *P* < 0.05, η = 0.173], although the Bonferroni *post hoc* tests failed to reach significance. There was also a significant main effect of Reward [*F*(3,44) = 3.010; *P* < 0.05, η = 0.170]. Bonferroni *post hoc* tests revealed that receiving Positive Reward led to poorer predictive accuracy as compared to Both-Rewards (12.237, *P* < 0.03).

## *Test phase: control*

Control optimality scores were averaged across participants in each group for each of the two Tests of Controlling the Outcome and are presented in **Figure 3**. A Condition (Choice-based learners, Prediction-based learners) × Reward (No Reward, Both-Rewards, Positive Reward, Negative Reward) × Test (Control Test 1, Control Test 2) ANOVA was conducted. In general, all participants improved in their control performance in Test 2 as compared to Test 1, suggesting the presence of practice effects, as revealed by a main effect of Test [*F*(1,88) = 14.020; *P* < 0.0001, η = 0.137]. A main effect of Condition suggested that Choice-based learners were more accurate in their control performance than Prediction-based learners [*F*(1,88) = 8.293; *P* < 0.005, η = 0.086]<sup>3</sup>. There was also a main effect of Reward [*F*(3,88) = 9.506; *P* < 0.0005, η = 0.245]. To examine this further, control optimality scores were collapsed across Test and Condition and Bonferroni tests were carried out on Reward. The tests revealed that those receiving No Reward during learning showed more accurate control performance as compared with Positive Reward (16.007, *P* < 0.01) and Negative Reward (22.756, *P* < 0.001). Also, receiving Negative Reward led to poorer control performance as compared to receiving Both-Rewards (18.87, *P* < 0.001). No other comparisons were significant. It appears that in tests of control, those receiving no reward during training tended to show the most accurate control performance.

<sup>3</sup>Bonferroni correction was applied.

## *Test phase: prediction*

Tests of Predicting Cue values and Outcome values provided the opportunity to examine the extent to which the cue-outcome knowledge gained by Choice-based learners was sufficiently flexible to support levels of accuracy equivalent to those of Prediction-based learners. Prediction optimality scores for Test 1 and Test 2 are presented in **Figure 4**. The scores were collapsed across the Tests, since an ANOVA with Test (Predictive Test 1, Predictive Test 2) × Condition (Choice-based learners, Prediction-based learners) × Reward (No Reward, Both-Rewards, Positive Reward, Negative Reward) failed to show any differences in patterns of predictive accuracy between tests. Cue (Positive, Negative, Outcome) × Familiarity (Old trials, New trials) × Condition (Choice-based learners, Prediction-based learners) × Reward (No Reward, Both-Rewards, Positive Reward, Negative Reward) were used as factors in an ANOVA. A main effect of Familiarity [*F*(1,176) = 21.464; *P* < 0.0005, η = 0.196] was significant. In general, all participants were more accurate in their predictions for trials they had experienced previously during learning as compared to unfamiliar trials. There was a Familiarity × Cue interaction [*F*(2,176) = 3.902; *P* < 0.05, η = 0.042]. Paired *t*-tests revealed that, compared with new trials, there was greater predictive accuracy for old trials when predicting the value of the positive cue [*t*(95) = 3.708, *P* < 0.0004] and the negative cue [*t*(95) = 5.433, *P* < 0.00004]. There was no difference in predictive accuracy between old and new trials when predicting the outcome. No other effects or interactions were significant.

## **GENERAL DISCUSSION**

The main objective of this study was to investigate the following question: *Are there behavioral differences between Choice-based and Prediction-based dynamic decision making under reward based learning?* In general, the evidence from the present study corroborates the pattern of neuropsychological evidence from ERP studies (Hajcak et al., 2007; Peterson et al., 2011), but not the behavioral evidence from these studies. The present study shows that active involvement generates more accurate cue-outcome knowledge than non-volitional learning of cue-outcome relations. Though reward based learning led to differences in performance between Choice-based and Prediction-based learning, the effects of reward were unexpected. Compared to participants who were not presented with reward, on the whole the presentation of reward tended to impair learning and transfer of cue-outcome knowledge. Therefore, the findings demonstrate that the behavioral differences between Prediction-based and Choice-based decision making in a DDM task were the result of the presentation of reward.

More specifically, the findings from this study show that during learning Negative Reward severely impaired Choice-based performance, while Positive Reward severely degraded predictive accuracy. Moreover, Positive Reward and Negative Reward generally impaired performance in Learning and Test when compared with participants receiving No Reward or Both-Rewards. In addition, Choice-based learners showed an overall advantage in later tests of control. This suggests that volitional control over cue manipulations during learning facilitated later ability to control an outcome to different criteria. Moreover, Choice-based learning also facilitated successful transfer of cue-outcome knowledge to Predictive tests. The present discussion focuses on two main issues: (1) the detrimental effects of reward on decision making, and (2) the broad philosophical issues that are raised by neuropsychological research on choice and prediction.

## **WHY DID REWARD BASED FEEDBACK IMPAIR DDM?**

Kluger and DeNisi's (1996, 1998) review of the effects of feedback on skill based learning (low level motor and perceptual learning as well as high level problem solving and decision making) suggests that unless the task is simple, feedback will lead to no additional benefits in most cases, and in extreme cases will impair learning (e.g., Hammond and Summers, 1972; Salmoni et al., 1984). They claimed that the effectiveness of feedback depends on the type of goal that the learner is pursuing. More recently, Harvey (2011) has proposed a cognitive resources account as a way of explaining the differential effects of feedback on performance as a function of task difficulty. He proposes that tasks such as DDM are examples in which the knowledge needed to achieve success is not easily identified from the outset, and so the process of information search makes high demands on executive functions. As a result, the provision of feedback (e.g., cognitive feedback, reward outcomes) is problematic in these tasks because it is a source of additional information that needs to be processed in order to be usefully incorporated into the performance of the main task. The more demanding the task is, the more likely it is that feedback will interfere, because processing feedback competes with performing the main task.

In fact, many have argued that DDM tasks are examples of complex problem solving tasks (Funke, 2010; Osman, 2010a), and they have been used as methods of indexing IQ (Joslyn and Hunt, 1998; Gonzalez, 2005; Funke, 2010). Therefore, there are good grounds for assuming that the kind of decision making processing that goes on in DDM tasks is cognitively expensive. This is because decision making involves tracking cue-outcome relations in a dynamic environment. At any one time a decision maker is still uncertain as to the generative causes of changes in an observed outcome in a DDM task. The reason is that the observed changes to the outcome can result from endogenous influences (i.e., cue manipulations in the DDM task), from exogenous influences on those outcomes (i.e., functions of the system itself/noise), or from a combination of both.

Feedback (cognitive feedback, reward outcomes) may impair decision making processes such as those involved in DDM tasks because additional processing resources are needed to evaluate feedback in order to use it to adapt and update decision making behavior (Harvey, 2011). For simple forced choice tasks (e.g., Hajcak et al., 2007; Peterson et al., 2011), the learner possesses the relevant knowledge for making a decision from the outset, and learning simply reflects the efficiency in implementing that knowledge. Therefore, providing feedback in forced choice tasks does not compete with the processing demands of performing the main task. By extension, when contrasting the simple forced choice tasks used by Hajcak et al. (2007) and Peterson et al. (2011) with the DDM task in the present study, reward based learning may have adversely affected performance because the DDM task is more cognitively demanding than forced choice tasks.

To explore this, separate analyses were conducted comparing the optimality scores of the Choice-based learning No Reward condition and the Prediction-based learning No Reward condition in the Control tests, and the findings revealed that there was no difference in performance between conditions [*F*(1,22) = 0.07; *P* = 0.785, η = 0.003; see text footnote 3]. Furthermore, this result replicates the findings from Osman and Speekenbrink's (in press) study (Experiment 2). When the same analysis was conducted collapsing across the three remaining reward based conditions, more accurate performance was found for Choice-based learners receiving feedback as compared to Prediction-based learners receiving feedback [*F*(1,70) = 9.47; *P* < 0.005, η = 0.119]. Though caution should be exercised in drawing any firm conclusions from this result, it is certainly supportive of the proposal that, in the case of DDM tasks, reward interferes with DDM and, more specifically, with active decision making in which cue-interventions are made. Moreover, the interference may result from the fact that DDM tasks are cognitively demanding, and so processing rewards competes for the same limited resources available to perform the main task. This may also explain why the presentation of rewards does not appear to impair performance in forced choice tasks.

Clearly this has implications for reinforcement learning models (Schultz et al., 1997; Schultz, 2006) at two levels. First, given that, fundamentally, Choice-based and Prediction-based decisions should lead to equivalent cue-outcome knowledge, why is it that a difference in performance at test was found? Second, reinforcement learning models would predict differential effects on performance based on different types of reward, but why is it that rewards differentially affected the performance of the Prediction-based and Choice-based conditions during learning? In response to these issues, it might be worth considering the informational content of the outcome feedback for Choice-based and Prediction-based learners. On each trial during learning, outcome feedback could be used to indicate the deviation of the expected outcome value from the achieved outcome value (comparison 1 – prediction error) and the deviation of the achieved outcome value from the target value (comparison 2). This was the case in the present study and in Osman and Speekenbrink (in press). Osman and Speekenbrink's (in press) findings suggest that both Prediction-based and Choice-based learners were using comparison 1 and comparison 2 interchangeably during learning, because this enabled both groups to perform control and prediction tasks equally well at test. In the present study, the introduction of reward may have prevented Choice-based and Prediction-based learners from attending to both comparison 1 and comparison 2. Instead, the presence of reward made comparison 1 salient for Prediction-based learners, and comparison 2 salient for Choice-based learners. This may have resulted in the advantage found for Choice-based learners in later tests of control. The equivalent cue-outcome knowledge found in Prediction-based and Choice-based learners in tests of prediction suggests that either comparison 1 or comparison 2 generates sufficient cue-outcome knowledge to perform the test.

This would be consistent with the speculation that volitional control over setting the cue values during learning encouraged Choice-based learners to evaluate each cue-outcome relationship, whereas the evaluation process was not as exhaustive during Prediction-based learning (Hajcak et al., 2007; Peterson et al., 2011). The differential effects of reward on Prediction-based decisions and Choice-based decisions may reflect a difference in the magnitude of the effects of gains and losses for different types of decisions (Schultz et al., 1997; Sailer et al., 2007). However, this is still speculative and given that to date, no previous study has examined the effects of feedback on Choice-based and Prediction-based decisions in a DDM task, further work is needed to explore the possible influences of reward on decision making.
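The two comparisons discussed in this section can be made concrete with a small sketch. The function name and the use of signed differences are assumptions for illustration; the account above does not specify a sign convention.

```python
def outcome_comparisons(expected, achieved, target):
    """Return the two deviations that outcome feedback makes available on a trial.

    comparison 1 (prediction error): achieved outcome relative to the expected outcome
    comparison 2 (goal deviation):   achieved outcome relative to the target outcome
    Signed differences are an assumption made for this illustration.
    """
    comparison_1 = achieved - expected
    comparison_2 = achieved - target
    return comparison_1, comparison_2


# Hypothetical trial: the learner expected 60, the system produced 55, the target was 62.
prediction_error, goal_deviation = outcome_comparisons(expected=60, achieved=55, target=62)
```

On the account proposed here, reward would make the first quantity salient for Prediction-based learners and the second salient for Choice-based learners.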

## **PHILOSOPHICAL ISSUES RAISED BY NEUROPSYCHOLOGICAL RESEARCH ON CHOICE-BASED AND PREDICTION-BASED DECISION MAKING**

A question asked at the start of this article, based on the implications of Peterson et al.'s (2011) and Hajcak et al.'s (2007) findings, was: *What can we infer about the relationship between brain and behavior given that the changes detected at the neurophysiological level do not correspond with any observable changes in behavior at the psychological level?* The same question will now be tackled with respect to philosophical issues concerning the inferences that these studies and the present study can make about the neurological mechanisms that support different forms of decision making.

The virtue of neuroscience is that it allows us to gain access to processes that were once inaccessible to psychologists. The rationale usually follows along the lines of: If brain region X is active, then cognitive process Y will be active. For this rationale to work, there also has to be an assumption that the causal arrow goes in the direction of brain to behavior. Detractors of this position can make the argument that there is a lack of functional specificity of regions in the brain, which undermines any strong inferences that can be made from neuroimaging data to behavioral measures (Poldrack, 2006). As a case in point, while Peterson et al.'s (2011) and Hajcak et al.'s (2007) studies are not neuroimaging studies, their critical findings nevertheless concern differences that are present neurophysiologically but not behaviorally. So what can be inferred from such findings? Given that the logic of many neuropsychology studies involves detecting a change in the pattern of activation in certain brain regions and then inferring cognitive processes from observable changes in behavioral measures, it is perhaps even more problematic to make inferences about the association between brain regions and cognitive processes when the differences lie only in neurophysiological data.

Also, if, as for many psychologists and neuroscientists, materialism (in whichever flavor is adopted) is the favored position, because behavior is taken to be reducible to regions in the brain, then one is interested in discovering the etiology of human behavior by examining the processes in the brain. The rationale here follows along the lines of: If my study manipulates cognitive process Y, then given what I know from work conducted in the neurosciences, brain region X should be activated. So long as neurophysiological and behavioral data converge, there are no problems in developing an explanatory account of a cognitive process based on the patterns of data at both levels. The problem that is posed here is deciding what the appropriate level of explanation is for prediction-based and choice-based decision making, given that behavioral data imply one type of account and neurophysiological data suggest an alternative account. As a case in point, the findings from Peterson et al.'s (2011) and Hajcak et al.'s (2007) studies pose this problem. The experimental manipulations in both studies were designed to pit two cognitive processes (i.e., choice and prediction) against each other. While the behavioral data from both studies imply a single mechanism that supports Choice-based and Prediction-based decisions through the generation of prediction errors, the neurophysiological data suggest that there might be different underlying mechanisms that correspond to the cognitive processes.

Whereas the issues discussed above concern problems in interpreting neurophysiological and behavioral data, a more general issue is that there may well be limitations in extrapolating from simple tasks to more complex tasks designed to simulate real world situations (Osman, 2010b). The issue comes down to scalability. The argument concerning the practice of transforming higher-level cognitive behaviors observed in the real world to detectable lower-level neurobiological phenomena takes many forms (Bickle, 2006, 2007; Craver, 2007; Sullivan, 2009), though for simplicity this discussion will focus on two: internal and external validity. *External validity* refers to the correspondence between results implying a causal relationship between variables in a laboratory and variables of the same kind existing outside of it (Guala, 2003). Elegant simple choice tasks used in neuropsychological research may not be sufficient tools for studying complex behaviors if they cannot adequately explain or predict complex behavior in the real world. *Internal validity* refers to the success of an experimental result that establishes a causal relationship between variables found to operate in the context of a laboratory. If there is not a general convergence of reductive practices in neuropsychological experiments in establishing causal relationships between high level behaviors and cellular/molecular processes, then mental functions are ultimately not reducible to cellular/molecular processes.

To a large extent, pragmatic factors (i.e., the investigative aims of the researcher) determine which type of validity is prioritized when developing an experiment (Sullivan, 2009). But pragmatism does not necessarily lead to any unity in the way in which phenomena (e.g., Prediction-based vs. Choice-based decision making) are examined in a cognitive psychology laboratory or an EEG laboratory. However, philosophers such as Craver (2007) would argue that the same mechanism (decision making) is being examined at different levels in neuroscientific and cognitive science circles. There is (a) a specialized level at which the nature (e.g., neural activity) of the components of the mechanism is examined (intralevel), and (b) a more expansive level at which interventions are made in order to examine the function of the components of the mechanism (interlevel). Unity is achieved when researchers refer to and try to integrate findings from both intralevel and interlevel experiments. By the same token, the behavioral differences found presently between Prediction-based and Choice-based decision making, and the differences in neural activity between the two reported by Hajcak et al. (2007) and Peterson et al. (2011), could be viewed as examples of findings from studies at the intralevel and interlevel. However, the convergence of general findings at the different levels still creates a problem, because there are still differences in methodology between the present study and the aforementioned EEG studies, and so this still compromises the possibility of drawing the broad conclusion that the differences between prediction and choice are essentially based on volitional control.

## **CONCLUSION**

The present study used a DDM task to investigate the accuracy of cue-outcome knowledge when learning in a dynamic environment was Prediction-based or Choice-based. In addition, the influence of reward on both was examined. To this end, the evidence suggests that Choice-based decision making leads to more accurate cue-outcome knowledge than Prediction-based learning. However, the inclusion of reward adversely affected decision making during learning and at test. The type of DDM task included in the present study is cognitively more demanding than the typical choice tasks used in neuropsychological studies examining reward learning. The present article argues that the processing of rewards places an additional burden on cognitive resources that are already stretched when performing DDM tasks. The competition for resources leads to general decrements in decision making performance as compared to when no rewards are present. Though the general findings from this study are compatible with recent evidence from the neuropsychological domain, large differences in methodology prevent any strong conclusions being drawn with respect to supporting the claim that differences between prediction and choice are based on the level of volitional control. A number of philosophical arguments are considered with respect to generalizing evidence from neuropsychology to psychology and vice versa, in particular the inferential fallacies that are made, and the pragmatic constraints on the way studies are conducted.

## **ACKNOWLEDGMENTS**

The support of the ESRC Research Centre for Economic Learning and Human Evolution is gratefully acknowledged. Preparation for this research project was supported by the Economic and Social Research Council, and the Engineering and Physical Sciences Research Council, EPSRC grant – EP/F069421/1 (Magda Osman). I would also like to thank Maarten Speekenbrink for preparing the experimental program and for conducting the data analysis.

## **REFERENCES**

decisions in humans. *Nature* 441, 876–879.

M. (2008). The striatum and learning to control a complex system? *Neuropsychologia* 46, 2355–2363.

to reductionist and non-reductionist models of the unity of neuroscience. *Synthese* 167, 511–539.

Wörgötter, F., and Porr, B. (2005). Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. *Neural Comput*. 17, 245–319.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 October 2011; paper pending published: 25 November 2011; accepted:* *26 February 2012; published online: 20 March 2012.*

*Citation: Osman M (2012) The role of reward in dynamic decision making. Front. Neurosci. 6:35. doi: 10.3389/fnins.2012.00035*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Osman. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Distinction between externally vs. internally guided decision-making: operational differences, meta-analytical comparisons and their theoretical implications

## *Takashi Nakao<sup>1,2</sup>\*, Hideki Ohira<sup>3</sup> and Georg Northoff<sup>1</sup>*

<sup>1</sup> Mind, Brain Imaging and Neuroethics, Institute of Mental Health Research, Royal Ottawa Health Care Group, University of Ottawa, Ottawa, ON, Canada

<sup>2</sup> Japan Society for the Promotion of Science, Tokyo, Japan

<sup>3</sup> Graduate School of Environmental Studies, Nagoya University, Nagoya, Japan

#### *Edited by:*

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

#### *Reviewed by:*

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

Willem Huijbers, Harvard Medical School, USA

Bernard J. Baars, The Neurosciences Institute, USA

#### *\*Correspondence:*

Takashi Nakao, Mind, Brain Imaging and Neuroethics, Institute of Mental Health Research, Royal Ottawa Health Care Group, University of Ottawa, 1145 Carling Avenue, Room 6440, Ottawa, ON K1Z 7K4, Canada. e-mail: takana818@gmail.com

Most experimental studies of decision-making have specifically examined situations in which a single less-predictable correct answer exists (externally guided decision-making under uncertainty). Along with such externally guided decision-making, there are instances of decision-making in which no correct answer based on external circumstances is available for the subject (internally guided decision-making). Such decisions are usually made in the context of moral decision-making as well as in preference judgment, where the answer depends on the subject's own, i.e., internal, preferences rather than on external, i.e., circumstantial, criteria. The neuronal and psychological mechanisms that allow guidance of decisions based on more internally oriented criteria in the absence of external ones remain unclear. This study was undertaken to compare decision-making of these two kinds empirically and theoretically. First, we reviewed studies of decision-making to clarify experimental–operational differences between externally guided and internally guided decision-making. Second, using multi-level kernel density analysis, a whole-brain-based quantitative meta-analysis of neuroimaging studies was performed. Our meta-analysis revealed that the neural network used predominantly for internally guided decision-making differs from that for externally guided decision-making under uncertainty. This result suggests that studying only externally guided decision-making under uncertainty is insufficient to account for decision-making processes in the brain. Finally, based on the review and results of the meta-analysis, we discuss the differences and relations between decision-making of these two types in terms of their operational, neuronal, and theoretical characteristics.

**Keywords: preference, moral judgment, default-mode network, conflict, medial prefrontal cortex, social situation, resting state, fMRI**

## **INTRODUCTION**

How the human brain predisposes us to make certain choices while not making others is an important question that is often explored in current neuroscience (Bechara et al., 2000; O'Doherty, 2004, 2007; Sanfey et al., 2006; Volz et al., 2006; Wallis, 2007; Platt and Huettel, 2008; Rangel et al., 2008; Rilling et al., 2008b; Rolls and Grabenhorst, 2008; Sanfey and Chang, 2008; Vorhold, 2008; Balleine and O'Doherty, 2010; Ohira et al., 2010). Most experimental studies of decision-making have addressed situations in which one particular more or less-predictable answer is available. Although such studies, particularly those addressing low-predictability, include uncertainty related to an answer (Platt and Huettel, 2008; Rushworth and Behrens, 2008), they nevertheless presuppose a particular correct answer based on the external circumstances. One might consequently want to speak of externally guided decision-making in such a case.

In addition to such externally guided decision-making, instances of decision-making do exist for which there is no correct answer available for a subject based on external circumstances (Goldberg and Podell, 1999, 2000; Lieberman and Eisenberger, 2005; Volz et al., 2006; Nakao et al., 2009b). Such decisions are usually made in the context of moral decision-making (e.g., Moll et al., 2006; Greene and Paxton, 2009) as well as in the context of preference judgment (Paulus and Frank, 2003; Johnson et al., 2005; Nakao et al., 2009a, 2010a,c), where the answer depends on the subject's own, i.e., internal, preferences rather than on external, i.e., circumstantial, criteria. One might consequently want to speak of internally guided decision-making as distinguished from externally guided decision-making. Although subjects can draw on their representation of circumstantial criteria in externally guided decision-making, how and on what they can base their decision in internally guided decision-making remains unclear. More specifically, the neuronal and psychological mechanisms that guide decisions based on more internally oriented criteria in the absence of external ones remain unclear.

This study compares externally and internally guided decision-making in both respects: empirically and theoretically. First, we review the decision-making literature to clarify conceptual and operational differences between externally and internally guided decision-making. Regarding externally guided decision-making, we review reports of studies that have investigated the effect of a situation in which an objectively correct answer is difficult to predict (i.e., uncertain situation) because of insufficient information to make a judgment (e.g., probabilistic outcome). We also review the literature related to neuroeconomic studies using tasks in which the outcome is varied (or believed to be varied) by the other people's decisions. For internally guided decision-making, we review reports of studies of decision-making for which no correct answer exists, meaning that none of the stimuli or presented options is regarded as the only objectively correct answer.

Second, we compare externally and internally guided decision-making with regard to their recruitment of brain regions. To that end, we conducted a meta-analysis of previous neuroimaging studies using the multi-level kernel density analysis (MKDA) approach (Wager et al., 2007, 2009). Finally, based on the review of related articles and the results of the meta-analysis, we discuss the differences and commonalities between decision-making of these two kinds. We also discuss possible directions for future investigation, especially that of internally guided decision-making.

## **REVIEW OF STUDIES OF DECISION-MAKING**

## **EXTERNALLY GUIDED DECISION-MAKING UNDER UNCERTAINTY**

## *Operational characteristics of externally guided decision-making under uncertainty*

Most experimental studies of decision-making have examined situations in which only one less-predictable correct answer exists. With low-predictability, a low probability of reward or punishment can be associated with a stimulus, action, and/or outcome. In such cases, decision-making can be characterized by "uncertainty." Platt and Huettel (2008) define the concept of uncertainty as the psychological state in which a decision maker lacks knowledge about what outcome will follow from either choice in decision-making. Experimentally, uncertainty has been operationalized as low-predictability using a probabilistic outcome (Volz et al., 2003, 2004, 2005; Delgado et al., 2005b; Knutson et al., 2005; Huettel, 2006; Tobler et al., 2007; Chandrasekhar et al., 2008; Preuschoff et al., 2008; Abler et al., 2009) or by a perceptual difficulty to judge (Heekeren et al., 2004; Grinband et al., 2006; Callan et al., 2009). Despite the low-predictability, these experimental situations presuppose that one of the possible answers is correct. In these situations, participants must adjust their decision to comply with the externally defined sole correct answer.

For example, Volz et al. (2003) manipulated low-predictability by the probabilistic outcome. They examined brain activity during participants' prediction of which of the two concurrently presented visual stimuli would win. Each of the pairings of figures was associated systematically with a particular probability of winning from 60 to 100% (e.g., B wins against C with a mean probability of 60%). In their experiment, participants were never given explicit information about these probabilities.

As another manipulation of low-predictability, Hsu et al. (2005) varied the predictability of the probabilities of different outcomes. They compared the neural substrates of decision-making under risk (low-predictability outcomes with predictable probabilities) and under ambiguity (low-predictability outcomes with unpredictable probabilities), two conditions in which the consequences of possible outcomes have low-predictability.

In addition to probabilistic outcomes, perceptual difficulty of judgment is also used to manipulate uncertainty (Heekeren et al., 2004; Grinband et al., 2006; Callan et al., 2009; Banko et al., 2011). For example, Heekeren et al. (2004) used face and house stimuli to which several levels of noise were added to manipulate the amount of sensory evidence in the stimuli. Participants were asked to decide whether a presented image was a face or a house. Although an objectively correct answer existed, the added noise made it difficult to predict which judgment (house or face) was correct for the stimulus.

Results of these neuroimaging reports using probabilistic outcome and perceptual difficulty have typically shown increased activity within the dorsal part of the medial prefrontal cortex (DMPFC; Volz et al., 2003, 2004; Hsu et al., 2005; Knutson et al., 2005; Grinband et al., 2006; Krain et al., 2006; Callan et al., 2009; Mohr et al., 2010a), lateral prefrontal cortex (LPFC; Volz et al., 2003, 2004; Heekeren et al., 2004; Hsu et al., 2005; Krain et al., 2006; Abler et al., 2009; Callan et al., 2009), orbitofrontal cortex (Hsu et al., 2005; Tobler et al., 2007; Abler et al., 2009), insula (Volz et al., 2003, 2004; Heekeren et al., 2004; Knutson et al., 2005; Grinband et al., 2006; Krain et al., 2006; Callan et al., 2009; Mohr et al., 2010a), and thalamus (Volz et al., 2003; Heekeren et al., 2004; Grinband et al., 2006; Krain et al., 2006; Callan et al., 2009; Mohr et al., 2010a).

## *Theoretical accounting for externally guided decision-making under uncertainty*

The process of externally guided decision-making has generally been interpreted in the context of a reinforcement learning (RL) model. In that model, the expected value (i.e., the magnitude of outcome times the probability of outcome) biases the decision; the expected value is modified based on the prediction error (i.e., discrepancies between expected and actual rewards; e.g., O'Doherty et al., 2004; Tanaka et al., 2004; Kim et al., 2006; Yoshida and Ishii, 2006; Behrens et al., 2007; Cohen, 2007; Boorman et al., 2009; Glascher et al., 2009; Wunderlich et al., 2009).
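The two quantities named above can be illustrated with a minimal sketch. It assumes a simple delta-rule update and a softmax choice rule; the learning rate, the temperature, and the reward sequence are hypothetical values chosen for illustration and are not taken from any of the cited studies.

```python
import math

def expected_value(magnitude, probability):
    """Expected value as defined in the text: outcome magnitude times outcome probability."""
    return magnitude * probability

def choice_probabilities(values, temperature=1.0):
    """Softmax over option values (an assumed choice rule; the text says only that values bias choice)."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(value, reward, learning_rate=0.1):
    """Delta-rule update: move the value estimate by a fraction of the prediction error."""
    prediction_error = reward - value  # discrepancy between actual and expected reward
    return value + learning_rate * prediction_error, prediction_error

# Hypothetical run: option 0 is chosen four times and yields these rewards.
values = [0.0, 0.0]
for reward in [1, 1, 0, 1]:
    values[0], delta = update_value(values[0], reward)
print(values, choice_probabilities(values))
```

In this sketch the learned value of option 0 rises with rewarded trials, so the softmax assigns it a higher choice probability than the never-rewarded option 1.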

The neural substrates corresponding to this model and related concepts have been well identified. The expected value is typically processed within the orbitofrontal cortex, amygdala, ventral striatum, and insula. Prediction error is related to the ventral striatum and the dorsal anterior cingulate cortex (dACC; Tanaka et al., 2004; Daw et al., 2006; Kim et al., 2006; Cohen, 2007; O'Doherty, 2007; Tom et al., 2007; Rolls et al., 2008; Glascher et al., 2009; Wunderlich et al., 2009, 2011).

Hampton et al. (2006) reported results suggesting an important limitation of the RL model. They sought to ascertain whether the use of stored knowledge of the task structure guides choice or whether learned values guide choice without assuming a higher-order structure, as in the standard RL model. A computational model of standard RL and another model that exploits knowledge of the task structure for a probabilistic reversal learning task (i.e., when one action is "good" the other is "bad" and *vice versa*, as well as the rule that after a time the contingencies will reverse; structure-based model) were then constructed and fitted to both the behavioral and fMRI data.

The results revealed that neural activity in the ventral region of MPFC (VMPFC), the orbitofrontal cortex, and the posterior dorsal amygdala was more consistent with the expected reward signal from a structure-based model than with that from an RL model.

Their results imply that the standard RL model is not always appropriate for the analysis of decision-making in the human brain. The limitation of the standard RL model was also pointed out by other studies (Daw et al., 2006; Hampton et al., 2008; Pearson et al., 2011).

Taken together, externally guided decision-making under uncertainty has been investigated mainly using tasks with a probabilistic outcome or stimuli that are perceptually difficult to judge. Although the RL model has generally been used to interpret externally guided decision-making, it has also been pointed out that the model cannot fully explain the brain functions for externally guided decision-making under uncertainty.

## **EXTERNALLY GUIDED DECISION-MAKING IN A SOCIAL SITUATION**

## *Operational characteristics of externally guided decision-making in a social situation*

In addition to the probabilistic outcome and perceptual difficulty, an outcome that is varied (or believed to be varied) by other people's decisions has been used in externally guided decision-making (e.g., trust game and prisoner's dilemma game; Rilling et al., 2002, 2004, 2008a; Delgado et al., 2005b; Elliott et al., 2006; Sanfey, 2007; Frith and Singer, 2008; McCabe and Castel, 2008; Assaf et al., 2009; Wischniewski et al., 2009; Yoshida et al., 2010). Despite low-predictability on a social basis, these experimental situations include the presumption that one of the possible answers is correct, and participants are required to adjust their choices to comply with an externally defined single correct answer. For that reason, one might want to categorize tasks of these kinds, called neuroeconomic tasks, as involving externally guided decision-making.

The study by Gallagher et al. (2002) is a good example of externally guided decision-making in a social situation. They studied brain activation in humans who played the game rock–scissors–paper against a human or a computer. The play of the "human" and the "computer" did not actually differ: both were random sequences.

In their experiment, greater activity was visible in the pregenual ACC (pACC) and MPFC when participants believed they were playing against a human as opposed to a computer. Similar observations have been obtained using neuroeconomic tasks of other kinds (prisoner's dilemma game, Rilling et al., 2004; guessing task, Elliott et al., 2006; domino game, Assaf et al., 2009; and a beauty contest game, Coricelli and Nagel, 2009).

## *Theoretical accounts for externally guided decision-making in a social situation*

The control conditions of these experiments were non-social low-predictability decision-making (e.g., random sequences of outcomes), meaning that the conditions differed not in uncertainty itself but in the stance of the participants (i.e., playing against a person, or against a computer). For that reason, the brain activity observed when participants believe they are playing against another person, compared to the control task, has been inferred to reflect the process of thinking about the mental state of that person (mentalizing; Frith and Frith, 1999; Frith and Singer, 2008).

Hampton et al. (2008) presented evidence that mentalizing has the function of guiding decision-making during game performance. They scanned human participants using fMRI while they played a repetitive inspection game in which employees decide whether to work or shirk at each trial and an employer decides whether or not to inspect the work area. In addition to a simple RL model, the following two computational models were used to analyze the behavioral and fMRI data: a fictitious model, which exploits prediction of the opponent's next actions considering the history of prior actions by the opponent; and an influence model, which exploits not only tracking of the opponent's actions but which also incorporates knowledge of how one's own actions influence the opponent's strategy.

As a result, the influence model provided a better fit to participants' behavior than did either the fictitious model or the RL model. Regarding brain activity, results show that the expected reward signal from the influence model provides a better account of the neural data in MPFC than does that from a simple RL model. These results suggest that mentalizing engaged in MPFC affects reward prediction, and that it might be used to guide choice during game performance.
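The idea behind the fictitious model described above (predicting the opponent's next action from the history of the opponent's prior actions) can be sketched as follows. This is only an illustration under stated assumptions: the update rule, the learning rate, and the employee's payoff table are hypothetical and are not the parameters or payoffs used by Hampton et al. (2008). The influence model is not sketched here.

```python
def update_belief(p_inspect, inspected, learning_rate=0.2):
    """Move the estimated probability that the employer inspects toward the observed action (1 or 0)."""
    return p_inspect + learning_rate * (inspected - p_inspect)

def expected_payoffs(p_inspect, payoff):
    """Expected payoff of each own action, given the belief about the opponent.

    payoff[action] is a pair: (payoff if not inspected, payoff if inspected).
    """
    return {
        action: (1 - p_inspect) * not_inspected + p_inspect * when_inspected
        for action, (not_inspected, when_inspected) in payoff.items()
    }

# Hypothetical payoffs for the employee: shirking pays only when not inspected.
EMPLOYEE_PAYOFF = {"work": (0.5, 0.5), "shirk": (1.0, 0.0)}

belief = 0.5                      # initial belief that the employer inspects
for inspected in [1, 1, 0, 1]:    # hypothetical history of employer actions
    belief = update_belief(belief, inspected)
payoffs = expected_payoffs(belief, EMPLOYEE_PAYOFF)
best_action = max(payoffs, key=payoffs.get)
print(round(belief, 3), best_action)
```

With this hypothetical history the belief that the employer inspects rises above 0.5, so the sketched employee prefers to work.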

Collectively, these neuroeconomic studies have examined the effects of social interaction in externally guided decision-making. Even if the outcome is varied by other people's decisions, a correct answer is determined externally, and participants are required to predict which option produces a better outcome in each trial. Unlike in externally guided decision-making under uncertainty, however, the results from these neuroeconomic studies do not reflect uncertainty itself, but instead reflect the effects of social interaction. These reports described that signals in MPFC related to mentalizing have the function of biasing decision-making in a social situation toward an externally determined correct option.

## **INTERNALLY GUIDED DECISION-MAKING**

#### *Operational characteristics of internally guided decision-making*

Uncertainty and social situations still presuppose some externally determined single correct answer, although that answer is chosen with low-predictability. What about the complete absence of a single correct answer based on external circumstances, even when the choices involve no low-predictability? In such cases, one cannot rely on an externally determined objectively correct answer to choose and to regulate one's own behavior, and the answer and its correctness depend on one's own, i.e., internal, preferences rather than on circumstantial, i.e., external, criteria (Goldberg and Podell, 1999, 2000; Lieberman and Eisenberger, 2005; Volz et al., 2006; Nakao et al., 2009b).

Such situations are apparent in the context of moral decision-making (Moll et al., 2001, 2002, 2006; Zysset et al., 2002, 2003; Heekeren et al., 2003, 2005; Greene et al., 2004; Paulus and Frank, 2006; Schaich Borg et al., 2006; Greene and Paxton, 2009; Cikara et al., 2010; Hare et al., 2010; Sommer et al., 2010; Caspers et al., 2011; Kahane et al., 2011; Schleim et al., 2011). For instance, when requiring participants to decide about giving money to either themselves or to a charitable organization, the study by Moll et al. (2006) does not presume that either of the two options is correct. Here, the outcome indicating that the participant receives money (the good outcome in the case of externally guided decision-making) is not necessarily a correct answer because, taking a more moral stance, the donation to the charitable organization might be regarded as the correct answer. From the viewpoint of the subject's own financial interests, in contrast, receiving the money (rather than giving it to charity) would be regarded as the correct answer. The decision (whether participants choose their behavior based on self-interest or on morality) therefore depends on the criteria employed by the participant. Results demonstrate that costly decisions (choosing costly donation or costly opposition) were associated more closely with activation of the MPFC than pure reward decisions were.

A similar finding was also reported by Greene and Paxton (2009). They examined neural activity involved in participants' decisions of whether to tell the truth or lie when reporting their success at predicting the outcome of coin flips. In this task, if participants report their success at the prediction, then they win the amount of money shown. In contrast, if they report their failure at the prediction, they lose the amount of money shown. In this task, lying to get the money is not a good choice from a moral viewpoint. Nevertheless, reporting the successful prediction is a good choice for obtaining money even if it is based on lying. Consequently, neither of the choices was the correct answer. The authors found DMPFC, LPFC, and right parietal lobe activity when dishonest people chose to tell the truth instead of lying for profit.

In addition to such moral decision-making, preference judgments are included in internally guided decision-making. In the preference judgment task, participants are required to make a decision based on personal criteria; the judgment is not based on external criteria. Preference judgments of many kinds have been used in previous studies: preference judgment for food (Arana et al., 2003; Paulus and Frank, 2003; Hare et al., 2009; Piech et al., 2009; Linder et al., 2010), products (Knutson et al., 2007, 2008), brands (Santos et al., 2011), faces (Kim et al., 2007; Chen et al., 2010), holiday options (Chaudhry et al., 2009), paintings (Jarcho et al., 2011), political beliefs (Zamboni et al., 2009), occupations (Nakao et al., 2009a, 2010c), task types (Forstmann et al., 2006), agencies of choice (Forstmann et al., 2008), shapes (Jacobsen et al., 2006), and colors (Goldberg and Podell, 1999, 2000; Johnson et al., 2005).

For instance, Paulus and Frank (2003) investigated brain activity during preference judgment for soft drinks. They presented two pictures of soft drinks in each trial. In the preference judgment task, participants were asked to judge which drink they would like better. In the control task (visual discrimination task), the stimuli were the same picture set as in the preference judgment task, but participants were asked to identify which soft drink was in a bottle, a can, or a carton: the control task has an objectively correct answer and involves no uncertainty.

Analogously, Nakao et al. (2009a) used an occupational choice task (e.g., Which occupation do you think you could do better? – dancer or chemist) without an objectively correct answer and a word-length task (e.g., Which word is longer? – dentist or comedian) that has one certain correct answer. In the occupational choice task, participants were clearly instructed that there is neither an objectively correct answer nor a contingent outcome with each decision.

These preference judgment tasks typically show increased activity within the pACC, VMPFC, and posterior cingulate cortex (PCC) compared with the control task, which is the externally guided decision-making with a certain correct answer.

In sum, internally guided decision-making has been investigated in moral judgment and preference judgment studies. Compared with judgment tasks that have a clear objectively correct answer, increased activity in several neural substrates has been observed during internally guided decision-making. Although MPFC activity seems to be observed consistently in internally guided decision-making (Nakao et al., 2009b, 2010b), no previous report has used a quantitative approach to examine which brain regions are activated consistently among internally guided decision-making studies. Furthermore, no report has described a study that has investigated the differences and similarities of neural substrates between the two kinds of decision-making representing real-life decision-making (i.e., internally guided decision-making and externally guided decision-making under uncertainty). For that purpose, we conducted the exploratory meta-analysis described hereinafter.

## **METHOD**

## **STUDY SELECTION**

Research papers were found primarily by searching the PubMed database (http://www.ncbi.nlm.nih.gov/pubmed/) using the keywords ("fMRI" or "functional magnetic resonance imaging" or "PET" or "positron emission tomography") and ("decision-making") and ("uncertain" or "uncertainty" or "probability" or "probabilistic" or "difficult" or "difficulty" or "neuroeconomic" or "economic" or "social" or "game" or "moral" or "morality" or "ethic" or "ethical" or "preference" or "prefer" or "belief" or "free" or "evaluation"). As additional references, we added several reports from the reference lists of the relevant articles to ensure the inclusion of all relevant studies fitting our criteria. The reference lists of several review articles were also inspected (Frith and Frith, 1999; Bechara et al., 2000; Rolls, 2000, 2004; Greene and Haidt, 2002; Casebeer, 2003; Greene, 2003; Fellows, 2004; Glimcher and Rustichini, 2004; O'Doherty, 2004, 2007; Sanfey and Cohen, 2004; Moll et al., 2005; Roberts, 2006; Sanfey et al., 2006; Volz et al., 2006; Coricelli et al., 2007; Sanfey, 2007; Wallis, 2007; Frith and Singer, 2008; Heekeren et al., 2008; Lee, 2008; Platt and Huettel, 2008; Rangel et al., 2008; Rilling et al., 2008b; Rolls and Grabenhorst, 2008; Rushworth and Behrens, 2008; Sanfey and Chang, 2008; Vorhold, 2008; Knabb et al., 2009; Volz and von Cramon, 2009; Wischniewski et al., 2009; Balleine and O'Doherty, 2010; Mohr et al., 2010a,b; Nakao et al., 2010b; Rangel and Hare, 2010; Liu et al., 2011).

In the relevant literature, we included reports of studies of decision-making of the following kinds (see also **Table 1**). (1) Brain activity coordinates from healthy participants were included. Those of neurological or psychiatric patients and those using medications were not included. (2) Only reports describing all the significant activation foci as 3D coordinates (*x*, *y*, *z*) in the space of the MNI template or the atlas of Talairach and Tournoux were included; those of studies based on region of interest (ROI) analysis were excluded. (3) Data related to brain activity revealed by task comparison or image-subtraction methods, parametric designs, or brain-behavioral correlation were included. Data related to changes in functional or effective connectivity were excluded. (4) Only activation data were included in the relevant analysis; deactivation data were not considered. (5) A study was regarded as decision-making-related if it necessitated that a participant make a decision. We therefore excluded all studies in which participants were not required to make a decision.

#### **Table 1 | Summary of inclusion and exclusion criteria for meta-analysis**

In the review part of this paper, we cited possible related articles. For our meta-analysis, however, we selected the articles more strictly for comparison between externally and internally guided decision-making. In numerous externally guided decision-making studies, psychological/computational models (e.g., RL model) and related concepts (e.g., expected value and prediction error) have been used broadly (e.g., Hampton et al., 2006; Cohen, 2007). These models and concepts presuppose the presence of outcomes, whereas internally guided decision-making does not presuppose the presence of an outcome (i.e., an objectively correct answer): the models and concepts used in externally guided decision-making are therefore not applicable to internally guided decision-making. This difference makes it difficult to use the results obtained using models and concepts of these kinds for comparisons between externally guided decision-making and internally guided decision-making. For that reason, in the analyses presented herein, we did not include reports of studies of externally guided decision-making based on these models and concepts. We chose externally guided decision-making studies that focused on the effect of a situation with uncertainty or with social interaction (e.g., low-predictability vs. high-predictability for externally guided decision-making under uncertainty; low-predictability in a social situation vs. low-predictability in a non-social situation for externally guided decision-making in a social situation).

Similarly, as representative of internally guided decision-making, we chose studies that specifically addressed the effect of a situation without an externally determined correct answer (no objectively correct answer vs. a single objectively correct answer).

See the following and **Table 1** for details related to inclusion and exclusion criteria.

#### *Externally guided decision-making*

As externally guided decision-making studies, we included reports of studies using a task in which one choice was associated with a better outcome (e.g., reward) than others, indicating that the choice is correct. We also included studies using a task in which no feedback was presented, but for which the task has one objectively correct answer and participants had to try to respond correctly (e.g., Heekeren et al., 2004; Hsu et al., 2005; Callan et al., 2009; Banko et al., 2011). For comparison with internally guided decision-making, we excluded reports of studies that analyzed the fMRI data using a computational model that is not applicable to internally guided decision-making (e.g., an RL model incorporating the effect of the low-predictability situation [task structure]; Hampton et al., 2006; see the review part for details). We also excluded neural activations that are specific to the feedback epoch and prediction error (e.g., Wittmann et al., 2008), which cannot be compared with internally guided decision-making.

*Externally guided decision-making under uncertainty.* For externally guided decision-making under uncertainty, we included studies that investigated the effect of a situation in which it is difficult to predict a correct answer because of insufficient information for judgment (e.g., low probability of reward > high probability of reward). Contrasts that investigated the effect of risk (e.g., Cohen, 2007; Xue et al., 2009; Van Leijenhorst et al., 2010) or expected value (e.g., Rolls et al., 2008; Symmonds et al., 2010; Wu et al., 2011b) were excluded in cases where these were manipulated not only by the probability of outcome but also by the amount of outcome. We excluded them because our main interest here is not the effect of the amount of the outcome but the effect of a low-predictability (i.e., uncertain) situation (for results of a meta-analysis of reward/outcome-related brain regions, see Liu et al., 2011; for results of a meta-analysis of risk-related brain regions, see Mohr et al., 2010a).

*Externally guided decision-making in a social situation.* For externally guided decision-making in a social situation, we included studies that investigated brain regions sensitive to outcomes varied by other people's decisions [e.g., low-predictability (social) > low-predictability (non-social)]. Contrasts that investigated the effects of different decisions were excluded (e.g., share vs. keep decisions in a trust game as described by Delgado et al., 2005a).

#### *Internally guided decision-making*

For studies of internally guided decision-making, we included studies using tasks in which no stimulus or option was regarded as correct. Studies contrasting decision-making for problems with no correct answer against decision-making for problems with one correct answer were included. We excluded contrasts that compared different kinds of internally guided decision-making (e.g., Schaich Borg et al., 2006; Hare et al., 2009; Sommer et al., 2010). We also did not include contrasts that compared different decisions within internally guided decision-making (e.g., Sanfey et al., 2003; Greene et al., 2004). We excluded a study using a task that clearly requires participants to make judgments based on social criteria instead of the participants' own criteria (Prehn et al., 2008).

#### **ANALYSIS TO EVALUATE THE BALANCE BETWEEN SELECTED STUDIES**

To evaluate stimulus-specific effects in the comparison between externally and internally guided decision-making, the stimulus types (verbal/non-verbal or visual/auditory) were recorded for each of the selected studies (see **Table A1** in Appendix). Chi-square tests or Fisher's exact tests were conducted to examine whether the proportion of studies using each stimulus type differs between externally and internally guided decision-making. Because MKDA results are also affected by the sample size and the quality of the statistical analysis of the original studies, studies in these categories were also compared with respect to their sample size and the false discovery rate correction they adopted.

Furthermore, to assess the influence of difficulty of the experimental tasks on the meta-analysis, the response time differences between the compared conditions were calculated (e.g., uncertain – control conditions, preference judgment – control condition; see **Table A1** in Appendix). In cases where parametric design (e.g., decreasing predictability, 50% > 69% > 100%) was used, we took the average of all the differences between close conditions (e.g., average between 50–69 and 69–100%). In several studies, the exact differences of reaction times were not available, although the results of statistical analyses were available. To take account of these cases, we conducted a Chi-square test using data showing whether the reaction times of the experimental condition (uncertain or internally guided) were significantly longer than those of the control condition or not.
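The averaging rule for parametric designs can be made concrete with a small sketch. The reaction times below are hypothetical values in milliseconds, ordered from the least predictable to the most predictable condition; they are not data from any of the selected studies.

```python
def mean_adjacent_difference(reaction_times):
    """Average of the differences between neighboring conditions.

    reaction_times must be ordered from the least to the most predictable condition.
    """
    differences = [a - b for a, b in zip(reaction_times, reaction_times[1:])]
    return sum(differences) / len(differences)

# Hypothetical mean reaction times (ms) for 50%, 69%, and 100% predictability.
rts = [900.0, 850.0, 780.0]
print(mean_adjacent_difference(rts))  # differences are 50 and 70 ms
```

Because the adjacent differences telescope, this average equals the difference between the first and last conditions divided by the number of steps between them.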

#### **MULTI-LEVEL KERNEL DENSITY ANALYSIS**

We conducted MKDA (Wager et al., 2007, 2009), a coordinate-based meta-analysis method, for peak coordinates in a particular statistical contrast map (SCM) of the selected decision-making studies. In this method, the probability of activation of a given voxel in the brain across the studies is estimated. The null hypothesis is a random distribution of peak coordinates within each comparison in the standard brain. The well-established MKDA approach (Wager et al., 2007, 2009) has been used in several studies (Etkin and Wager, 2007; Kober et al., 2008; Wang et al., 2010; Fan et al., 2011; Qin and Northoff, 2011). The MKDA method was selected because of its several important advantages over the meta-analysis approaches used previously (ALE, KDA). First, the previous methods analyzed the peak coordinates from a set of studies without considering the nesting of peaks within contrasts. Such procedures can produce results that are biased by a single study reporting numerous peak coordinates. In the MKDA approach, multiple peaks are nested within a contrast, and multiple contrasts are nested within a study. This method enables true assessment of consistency across studies. A second advantage is that MKDA allows the weighting of contrasts by study sample size and by the quality of analyses based on random or fixed-effects designs used in the original study. With these weights, studies with more participants or with random-effects designs are assigned greater weight and thus exert more influence on the meta-analytic results. Finally, the results from MKDA provide a straightforward interpretation as a weighted proportion of activated contrasts within a kernel around (typically 10 mm of) each voxel (Kober et al., 2008).

For the present meta-analysis, relevant variables were sample size, analysis type (fixed or random effects), and coordinates of peak activation in the selected contrasts of previous studies. The coordinates in Talairach space were translated into MNI space. The coordinates from each contrast were used to build one SCM, and the coordinates from each SCM were convolved with a spherical kernel of 10 mm radius. The voxels within 10 mm around a coordinate were thresholded at a maximum value of 1. The SCMs were then weighted by the sample size and the analysis type (fixed or random effects). The weight for each contrast was the square root of the sample size, multiplied by an adjustment weight for the analysis type (1 for results from a random-effects analysis; 0.75 for results from a fixed-effects analysis). We did not consider the *Z*-scores of each study because they are not provided by all studies we selected. In addition, their inclusion has been shown to affect the replicability of activation across studies, thereby rendering interpretation more difficult (Kober et al., 2008; Wager et al., 2009). A statistical threshold was established through 5000 iterations of a Monte Carlo procedure. The results were reported as an MKDA statistic map at a height threshold of familywise error rate (FWE) corrected at *p* < 0.05, a stringent threshold of FWE corrected for spatial extent at *p* < 0.05 with primary thresholds of uncorrected *p* < 0.001, and a medium threshold of FWE corrected for spatial extent at *p* < 0.05 with primary thresholds of uncorrected *p* < 0.01.
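The weighting scheme and the height threshold described above can be illustrated with a toy sketch. It is not the MKDA toolbox: the contrasts, peak coordinates, voxel grid, and bounding box are hypothetical, and the null distribution here scatters peaks uniformly within a box rather than within a brain mask, which is a simplifying assumption.

```python
import math
import random

def contrast_weight(n_participants, analysis):
    """Square root of the sample size times 1.0 (random effects) or 0.75 (fixed effects)."""
    adjustment = {"random": 1.0, "fixed": 0.75}[analysis]
    return math.sqrt(n_participants) * adjustment

def indicator(voxel, peaks, radius_mm=10.0):
    """1 if the voxel lies within radius_mm of any peak of the contrast, else 0."""
    return int(any(math.dist(voxel, peak) <= radius_mm for peak in peaks))

def mkda_statistic(voxel, contrasts):
    """Weighted proportion of contrasts with a peak inside the kernel around the voxel."""
    weights = [contrast_weight(c["n"], c["analysis"]) for c in contrasts]
    hits = [indicator(voxel, c["peaks"]) for c in contrasts]
    return sum(w * h for w, h in zip(weights, hits)) / sum(weights)

def fwe_height_threshold(contrasts, grid, box, n_iter=5000, alpha=0.05, seed=0):
    """Monte Carlo threshold: (1 - alpha) quantile of the maximum statistic under random peaks."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_iter):
        randomized = [
            {**c, "peaks": [tuple(rng.uniform(lo, hi) for lo, hi in box) for _ in c["peaks"]]}
            for c in contrasts
        ]
        maxima.append(max(mkda_statistic(v, randomized) for v in grid))
    maxima.sort()
    index = min(len(maxima) - 1, math.ceil((1 - alpha) * n_iter) - 1)
    return maxima[index]

# Hypothetical contrasts: sample size, analysis type, and peak coordinates (mm).
contrasts = [
    {"n": 16, "analysis": "random", "peaks": [(0, 50, 20)]},
    {"n": 9, "analysis": "fixed", "peaks": [(4, 52, 18), (-40, 20, 0)]},
    {"n": 25, "analysis": "random", "peaks": [(40, -60, 30)]},
]
print(mkda_statistic((2, 50, 20), contrasts))
```

For the voxel at (2, 50, 20), the first two hypothetical contrasts have a peak within 10 mm, so the statistic is (4 + 2.25) / (4 + 2.25 + 5). A threshold for a small grid could then be obtained with `fwe_height_threshold(contrasts, grid, box, n_iter=...)`, where `grid` is a list of voxel coordinates and `box` a list of three (min, max) pairs.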

To compare the differences and similarities between externally and internally guided decision-making, we conducted the meta-analysis in two steps. First, we conducted the meta-analysis for decision-making of each kind [i.e., externally guided decision-making (uncertainty), externally guided decision-making (social), and internally guided decision-making]: separate MKDA statistic maps were constructed for decision-making of each kind. Two of these maps were overlaid on the same standard brain to indicate the distinctive regions involved in these instances of decision-making. Inclusive masks were applied to determine the overlap between two of these activation maps (i.e., externally guided decision-making (uncertainty) and internally guided decision-making, or externally guided decision-making (social) and internally guided decision-making). The overlap analyses were conducted using MRIcroN (Rorden, 2007).

Second, we compared the activation of externally guided decision-making (uncertainty) and internally guided decision-making by subtraction analysis in MKDA: separate maps constructed for decision-making of each of the two types were subtracted to yield difference maps. The same procedure was employed in the course of the Monte Carlo randomization to establish a threshold for significant differences. We did not construct difference maps between externally guided decision-making (social) and internally guided decision-making, or between externally guided decision-making (social) and externally guided decision-making (uncertainty) because only six studies were included for externally guided decision-making (social).

#### **RESULTS**

### **BALANCE BETWEEN THE SELECTED STUDIES FOR EACH TYPE OF DECISION-MAKING**

Of the studies considered, 18 studies (24 contrasts, 205 coordinates, 293 participants in total) were regarded as relevant for externally guided decision-making (uncertainty), 6 studies (8 contrasts, 49 coordinates, 86 participants) were included for externally guided decision-making (social), and 18 studies were selected for internally guided decision-making (22 contrasts, 143 coordinates, 303 participants; see **Table A1** in Appendix). A Chi-square test showed a significant difference in the number of studies among these three categories [χ²(2) = 6.86, *p* = 0.03]. *Post hoc* Bonferroni tests (*p* < 0.05) revealed no significant difference between externally guided decision-making (uncertainty) and internally guided decision-making. The studies of externally guided decision-making (social) were fewer than those of externally guided decision-making (uncertainty) and internally guided decision-making. Because of the low number of studies of externally guided decision-making (social), we did not use the dataset for externally guided decision-making (social) to construct difference maps [i.e., externally guided decision-making (uncertainty) vs. externally guided decision-making (social), and externally guided decision-making (social) vs. internally guided decision-making] in the following MKDA analysis.
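The reported Chi-square statistic can be checked with a short Python sketch. It assumes that the test was a goodness-of-fit test of the three study counts (18, 6, and 18) against equal expected counts; the text does not state this explicitly, but the assumption reproduces the reported values. The function names are hypothetical.

```python
import math

def chi_square_equal_expected(observed):
    """Goodness-of-fit statistic against equal expected counts (assumed design)."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)

def chi_square_p_value_df2(statistic):
    """Upper-tail p-value of a Chi-square distribution with exactly 2 degrees
    of freedom, for which the survival function is exp(-x / 2)."""
    return math.exp(-statistic / 2.0)

study_counts = [18, 6, 18]  # uncertainty, social, internally guided
statistic = chi_square_equal_expected(study_counts)
p_value = chi_square_p_value_df2(statistic)
print(round(statistic, 2), round(p_value, 2))  # prints: 6.86 0.03
```

With an expected count of 14 studies per category, the statistic is (16 + 64 + 16) / 14 = 6.86, in agreement with the value reported above.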

Regarding externally guided decision-making (uncertainty) and internally guided decision-making, Fisher's exact test revealed no significant difference related to the stimulus modality (visual or auditory; *p* = 1.00). Moreover, no significant difference was found related to the quality of statistics [corrected or uncorrected; χ²(1) = 1.78, *p* = 0.18] or the sample size [*t*(34) = 0.20, *p* = 0.84]. No significant difference in sample size was observed even when we included externally guided decision-making (social) [*F*(2,39) = 0.24, *p* = 0.79]. A significant difference was found in the proportion of verbal and non-verbal stimuli (Fisher's exact test, *p* < 0.01). Verbal stimuli tended to be used more in internally guided decision-making; non-verbal stimuli were used more in externally guided decision-making under uncertainty (see **Table A1** in Appendix).

Furthermore, to assess the influence of the difficulty of the experimental tasks on the meta-analysis, the reaction-time differences between the compared conditions (e.g., uncertain – control, or internally guided – control) were calculated. No significant difference in the reaction-time differences was observed between externally guided decision-making (uncertainty) and internally guided decision-making [*t*(24) = 1.18, *p* = 0.25]. No significant difference was observed even when we included externally guided decision-making in social situations [*F*(2,25) = 1.91, *p* = 0.17]. Consistently, no significant difference was found related to the statistical difference of reaction times (significantly longer in the experimental condition or not) between externally guided decision-making (uncertainty) and internally guided decision-making [χ²(1) = 0.27, *p* = 0.60].

To assess whether the experimental conditions (uncertain, social, or internally guided) induced a longer time to make a decision than the control condition, we compared the reaction-time differences with 0 (no difference of reaction time between the conditions) within each type of decision-making. No significant difference was observed in any type of decision-making [externally guided (uncertainty), Welch's *t*(8) = 0.50, *p* = 0.63; externally guided (social), Welch's *t*(1) = 1.47, *p* = 0.38; internally guided, Welch's *t*(16) = 1.31, *p* = 0.21]. Consistent with these results, Chi-square tests for the statistical difference of reaction times (significantly longer in the experimental condition or not) revealed no significant differences in externally guided decision-making under uncertainty [χ²(1) = 1, *p* = 0.32] or in internally guided decision-making [χ²(1) = 0.07, *p* = 0.80]. Because of the small sample size, we were unable to use Chi-square tests for externally guided decision-making (social).

#### **MKDA RESULTS**

## *Externally guided decision-making (uncertainty) vs. internally guided decision-making*

Meta-analysis results indicated different neural representation patterns for externally guided decision-making (uncertainty; **Figure 1A**) and internally guided decision-making (**Figure 1C**; see also **Table 2**). **Figure 2A** presents the statistical overlap based on inclusive masking. For externally guided decision-making, regions with significant proportions of activation were found in the DMPFC, dorsal LPFC (DLPFC), insula, thalamus, and IPL. For internally guided decision-making, clusters were found in the MPFC, pACC, PCC, and superior temporal gyrus (STG). Only the DMPFC (BA 8) overlapped between decision-making of the two kinds. Although we refer to the overlapped region as DMPFC hereinafter, it is noteworthy that the same region (BA 8) has also been described as a part of the supplementary motor area (SMA; Caria et al., 2011) and pre-SMA (Rubia et al., 2001; Chen et al., 2010) in several previous studies.

**Figure 3** presents results from the two difference maps based on their respective contrasts [i.e., externally guided decision-making (uncertainty) > internally guided decision-making, and the reverse]. Although the extent of several clusters was limited, the direct comparison showed regions broadly similar to those portrayed in **Figure 1**. Internally guided decision-making showed larger clusters mainly in medial cortical regions, whereas externally guided decision-making showed stronger clusters in lateral regions (see also **Table 3**).

## *Externally guided decision-making (social) vs. internally guided decision-making*

**Figure 1B** presents results of externally guided decision-making in a social situation. To examine the effect of the social component included in internally guided decision-making, we mounted the MKDA results of externally guided decision-making (social) and internally guided decision-making on the same stereotaxic standard brain and indicated the statistical overlaps (**Figure 2B**). The DMPFC (BA 8, 9) overlapped between social and internally guided decision-making. In contrast, no overlap was observed in the other regions found in internally guided decision-making.

**FIGURE 1 | Multi-level kernel density analysis results for (A) externally guided decision-making under uncertainty, (B) externally guided decision-making in a social situation, and (C) internally guided decision-making.** Results from the different statistical thresholds are shown with different colors: cyan, pink, and yellow, a height threshold of familywise error rate (FWE) corrected at p < 0.05; orange, a stringent threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.001; blue, violet, and red, a medium threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.01. No clusters were identified at the stringent threshold in externally guided decision-making under uncertainty or in a social situation. DMPFC, dorsomedial prefrontal cortex; DLPFC, dorsolateral prefrontal cortex; IPL, inferior parietal lobule; IFG, inferior frontal gyrus; pACC, perigenual anterior cingulate cortex; PCC, posterior cingulate cortex; MPFC, medial prefrontal cortex; STG, superior temporal gyrus.


### **Table 2 | MKDA results for decision-making studies of each type.**

Regions marked \*\* were significant at FWE voxel-level corrected p < 0.05 with extent size > 10 voxels. Regions marked \* were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.001. Regions marked † were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.01. Regions marked with \* and with † were reported if these were additional regions. BA denotes Brodmann area; Maxstat. denotes the maximum of the Z field.

**FIGURE 2 | Overlap (A) between externally guided decision-making under uncertainty and internally guided decision-making and (B) between externally guided decision-making in a social situation and internally guided decision-making.** DMPFC, dorsomedial prefrontal cortex.

## **DISCUSSION**

## **OPERATIONAL DIFFERENCES BETWEEN EXTERNALLY AND INTERNALLY GUIDED DECISION-MAKING**

As we described earlier in the review part, experimental-operational differences exist between externally and internally guided decision-making. Studies of externally guided decision-making have used tasks with a single correct answer that is difficult to predict. In these situations, participants must adjust their decision to comply with the externally defined single correct answer. Uncertainty (i.e., low predictability) has been manipulated with a probabilistic outcome or with stimuli that are perceptually difficult to judge. Studies of externally guided decision-making in a social situation have used an outcome that varied (or was believed to vary) according to other people's decisions.

In contrast with such externally guided decision-making, in internally guided decision-making, no correct answer based on external circumstances is available for the subject. Studies of such decision-making have used moral judgment and preference judgment tasks, for which the answer depends on the subject's own, i.e., internal, preferences rather than on external, i.e., circumstantial, criteria (see **Figure 4** for a summary of the difference between externally and internally guided decision-making).

## **NEURAL DIFFERENCES BETWEEN EXTERNALLY AND INTERNALLY GUIDED DECISION-MAKING**

Our meta-analysis indicated that different neural networks were recruited for externally guided decision-making (uncertainty) and internally guided decision-making. The DMPFC–DLPFC–insula–thalamus–IPL network was activated consistently in externally guided decision-making under uncertainty (see **Figures 1A** and **3A**). This result is consistent with the results of a previous meta-analysis of risky decision-making (Mohr et al., 2010a), which supports the view that the method used here works properly and produces reliable results.

In internally guided decision-making, the MPFC–pACC–PCC–STG network was activated consistently (see **Figure 1C**). Even when we compared externally guided decision-making under uncertainty and internally guided decision-making directly, the same networks remained for each category of decision-making (see **Figure 3B**).

**FIGURE 3 | Difference maps for (A) externally guided decision-making under uncertainty** *>* **internally guided decision-making, and for (B) internally guided decision-making** *>* **externally guided decision-making under uncertainty.** Results from the different statistical thresholds are shown with different colors: cyan, pink, and yellow, a height threshold of familywise error rate (FWE) corrected at p < 0.05; orange, a stringent threshold of FWE corrected for spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.01. No cluster was observed at the stringent threshold in externally guided decision-making under uncertainty > internally guided decision-making. DMPFC, dorsomedial prefrontal cortex; DLPFC, dorsolateral prefrontal cortex; IPL, inferior parietal lobule; pACC, perigenual anterior cingulate cortex; PCC, posterior cingulate cortex; MPFC, medial prefrontal cortex; STG, superior temporal gyrus.

Regions marked \*\* were significant at FWE voxel-level corrected p < 0.05 with extent size > 10 voxels.

Regions marked \* were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.001.

Regions marked † were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.01.

Regions marked with \* and with † were reported if these were additional regions.

BA denotes Brodmann area; Maxstat. denotes the maximum of the Z field.

The only region common to these two was the DMPFC (**Figure 2A**), and the overlap was broader in the comparison of externally guided decision-making in a social situation with internally guided decision-making (**Figure 2B**). The VMPFC was, however, limited to internally guided decision-making, even in that comparison. This evidence suggests that the activation of the VMPFC–pACC–PCC–STG network was caused neither by uncertainty related to an externally determined correct answer nor by social interaction.

Our results reveal, for the first time, the neural substrates associated specifically with internally guided decision-making, as distinguished from the neural substrates associated specifically with externally guided decision-making under uncertainty. Externally guided decision-making under uncertainty is probably insufficient to account for our decision-making in everyday life.

**FIGURE 4 | Schematic summary of differences and relations between externally and internally guided decision-making in terms of operational, neuronal, and theoretical characteristics.** Operational characteristics: clear differences are apparent between these two types of decision-making related to the availability of an externally determined correct answer. Neuronal characteristics: externally guided decision-making under uncertainty is mainly supported by the task-positive network (DLPFC–insula–thalamus–IPL network). In contrast, internally guided decision-making is supported mainly by the task-negative, default-mode network (DMN). The DMPFC is commonly activated in decision-making of these kinds and has functional relations with task-positive and task-negative networks. No clear boundary separates decision-making processes of different kinds: each decision-making task can be located on a continuum. The extent to which the DLPFC–insula–thalamus–IPL or the VMPFC–pACC–PCC–STG network becomes involved would differ depending on the decision-making situation. Theoretical characteristics: conflict-based regulation is expected to have an important role for internally guided decision-making instead of outcome-based regulation in the case of externally guided decision-making. The networks for internally guided decision-making are probably modulated according to the amount of conflict evaluated within dACC.

## *Balance between the selected studies for externally guided decision-making under uncertainty and internally guided decision-making*

Before further discussion of the meta-analysis results, the difference in stimulus type (verbal or non-verbal) between externally guided decision-making under uncertainty and internally guided decision-making should be addressed. Verbal stimuli tended to be used more in internally guided decision-making; non-verbal stimuli were used more in externally guided decision-making under uncertainty (see **Table A1** in Appendix).

Based on the following four reasons, however, we conclude that the regions observed in our meta-analysis results were not attributable to the difference of stimulus type. First, in every study included in the present meta-analysis, the control conditions used stimuli of the same type as the experimental conditions. For that reason, the coordinates from these studies were not specific to the stimulus type itself, but were specific to uncertainty or to the absence of an objective correct answer. Second, previous meta-analytical studies of neural substrates for working memory (Owen et al., 2005) and associative learning (Chein and Schneider, 2005) demonstrated broadly similar activation patterns for verbal and non-verbal stimuli, including the regions observed in externally guided decision-making under uncertainty. Third, regarding internally guided decision-making, the studies included in our meta-analysis that used non-verbal stimuli (Paulus and Frank, 2003; Johnson et al., 2005; Jacobsen et al., 2006; Chen et al., 2010; Hare et al., 2010) yielded results indicating neural substrates similar to those in our meta-analysis results. Fourth, although Kobayashi et al. (2007) observed, in their mentalizing task, brain regions similar to those for internally guided decision-making, activity within these regions was not increased for verbal compared with non-verbal stimuli.

We found no other significant difference between externally and internally guided decision-making with respect to the stimulus modality (visual or auditory), the sample size, the quality of the statistical analysis (corrected or uncorrected), or the differences of reaction times between the experimental condition (uncertain, social, or internally guided) and the control condition. Moreover, the reaction times in the experimental condition were not significantly longer than those in the control condition in either externally guided decision-making under uncertainty or internally guided decision-making. Based on these results, we conclude that the observed brain regions were not attributable to these factors.

#### *Internally guided decision-making and intrinsic brain activity*

In our meta-analysis results, the DMPFC–DLPFC–insula–thalamus–IPL network was activated consistently in externally guided decision-making under uncertainty. In contrast, the VMPFC–pACC–PCC–STG network was activated in internally guided decision-making. This difference resembles the distinction between two complementary networks: the task-positive network and the task-negative network, the latter also called the default-mode network (DMN; Fox et al., 2005; Broyd et al., 2009; Hampson et al., 2010; Kim et al., 2010; Northoff et al., 2010; Wu et al., 2011a). The task-positive network is known to be activated consistently during goal-directed/externally oriented cognitive tasks, and it is known to include DLPFC, insula, IPL, thalamus, (pre-)SMA, dACC, and the cerebellum (Cabeza and Nyberg, 2000; Fox et al., 2005; Owen et al., 2005; Kim et al., 2010; for detailed hypothetical explanations of the functions of observed regions in externally guided decision-making, see Mohr et al., 2010a).

In contrast, the DMN consists mainly of cortical midline structures (Gusnard and Raichle, 2001; Raichle and Gusnard, 2005) and comprises MPFC, pACC, PCC, and superior temporal/inferior parietal cortex (Fox et al., 2005; Kim et al., 2010; Qin and Northoff, 2011). The DMN is more active at rest than during externally oriented cognitive tasks (Raichle et al., 2001; Buckner et al., 2008b). The regions within the DMN are known to show a high degree of functional connectivity during rest (Raichle et al., 2001; Beckmann et al., 2005; Raichle and Snyder, 2007; Buckner et al., 2008a). Interestingly, the DMN and task-positive network are temporally anticorrelated such that task-induced activation within the task-positive network is associated with attenuation of the DMN (Fox et al., 2005, 2009). These physiological phenomena are thought to reflect stimulus-independent thought (e.g., mind-wandering; Mason et al., 2007; Christoff et al., 2009), which has been studied since the 1960s from a naturalistic viewpoint (Singer and Antrobus, 1962, 1963; Antrobus et al., 1966, 1970; Wollman and Antrobus, 1986).

The DMN is also activated by tasks that require processing internally generated information, including self-reference (Kelley et al., 2002; Northoff et al., 2006), episodic memory retrieval (Buckner et al., 2008b), envisioning the future (Szpunar et al., 2007), mental imagery (Hassabis et al., 2007; Daselaar et al., 2010), and mentalizing (Gusnard et al., 2001; Amodio and Frith, 2006). Because of the long list of psychological contents related to the DMN, it is difficult to attribute any specific psychological function to task-negative regions. In neuroscience, the DMN is often summarized more physiologically as the reflection of intrinsic brain activity (for detailed reviews about the task-positive network and the DMN, see Broyd et al., 2009; Northoff et al., 2010).

Intrinsic brain activity during a resting state is known to affect stimulus-induced activity (Northoff et al., 2010). For instance, Northoff et al. (2007) used magnetic resonance spectroscopy (MRS) to measure the resting-state level of γ-aminobutyric acid (GABA) in the pACC, which is part of the DMN, in addition to measuring the blood oxygen level dependent (BOLD) response during an emotional judgment task using fMRI. The resting-state level of GABA in the pACC correlated with the degree of decreased BOLD response in the same region induced by the emotional judgment task. This study demonstrated that the resting-state concentration of GABA in the pACC can indeed affect stimulus-induced activity changes in the same region.

Based on the rest–stimulus interaction and the overlap between the network for internally guided decision-making and the DMN, internally guided decision-making seems to be based largely on intrinsic brain activity.

Taken together, when linked with current notions about the DMN, our meta-analysis results suggest that the decision in internally guided decision-making is based largely on intrinsic brain activity within the DMN (see **Figure 4** for a schematic summary). This implication from physiological evidence fits well with the psychological nature of internally guided decision-making: the decision depends on the participant's own criteria rather than on circumstantial criteria. Internally guided decision-making might be modulated directly by intrinsic brain activity, which can be assessed through resting-state brain activity.

## **THEORETICAL DIFFERENCES BETWEEN EXTERNALLY AND INTERNALLY GUIDED DECISION-MAKING**

## *Outcome-based regulation and conflict-based regulation*

Is internally guided decision-making modulated solely by intrinsic brain activity within the DMN? As described earlier in the review part of this report, outcomes and feedback are known to be used to regulate the externally guided decision-making process (e.g., RL model) so as to avoid erroneous decisions. This outcome-based regulation process is not applicable to internally guided decision-making, which does not presuppose the presence of outcomes and feedback (i.e., an objectively correct answer). Is there any regulatory process in internally guided decision-making, as there is in externally guided decision-making?

A possible regulatory process for internally guided decision-making is conflict-based regulation, in place of the outcome-based regulation that operates in externally guided decision-making (see **Figure 4**). Conflict is defined psychologically and computationally as the simultaneous activation of incompatible representations (Botvinick et al., 2001). The abilities to monitor and regulate conflict have been investigated extensively in cognitive psychology and neuroscience. The emphasis has been predominantly on the conflict between error and correct response tendencies, using tasks that strongly activate the error response (e.g., Flanker task, Ullsperger and von Cramon, 2001; Takezawa and Miyatani, 2005; Stroop task, Stroop, 1935; MacDonald et al., 2000a; and Simon task, Masaki et al., 2007). Several neuroimaging studies have documented that greater dACC activation is observed when participants are confronted with situations that demand detection of conflict (MacDonald et al., 2000b; Milham et al., 2003; Kerns et al., 2004; Egner and Hirsch, 2005), whereas the cognitive regulation of conflict (e.g., attentional modulation) is apparently related to the LPFC, which acts to reduce conflict (Botvinick et al., 2001, 2004; Kerns et al., 2004).

In addition to the conflict between error and correct response, the dACC evaluates conflict that occurs during internally guided decision-making (Greene et al., 2004; Forstmann et al., 2008; Knutson et al., 2008; Nakao et al., 2009a, 2010a,c; Sommer et al., 2010; Caspers et al., 2011; Kahane et al., 2011). In these studies, the conflict was manipulated based on the number of choices (Forstmann et al., 2008), types of scenarios (Kahane et al., 2011), ratings for each stimulus (Nakao et al., 2009a, 2010c), the chosen frequency of each stimulus (Nakao et al., 2010a), or reaction times (Greene et al., 2004; Knutson et al., 2008; Sommer et al., 2010; Caspers et al., 2011). Irrespective of the mode of conflict manipulation, higher dACC activity was observed in the large-conflict condition than in the small-conflict condition during internally guided decision-making in these studies. This evidence suggests that the dACC evaluates the conflict between possible decision branches in internally guided decision-making.

The regulation process used to reduce conflict in internally guided decision-making is probably different from that of externally guided decision-making (Lieberman and Eisenberger, 2005; Nakao et al., 2009b, 2010a,c; Chen et al., 2010). Instead of the LPFC, as in the case of externally guided decision-making, the MPFC and PCC, as parts of the DMN, are associated with the reduction of conflict. Using psychophysiological interaction (PPI) analyses of fMRI data, Chen et al. (2010) showed that the dACC covaried significantly more strongly with the DMPFC and PCC during a face preference judgment task with no objective correct answer than during the control task, a gender judgment task with one correct answer. Similarly, Nakao et al. (2010c) reported that the dACC has functional connectivity with the VMPFC only during an occupational choice task, which involves internally guided decision-making, and not during a word-length judgment task. These results suggest that the MPFC and PCC, as parts of the DMN, are modulated in response to the amount of conflict evaluated within the dACC to reduce conflict during internally guided decision-making (Nakao et al., 2009b, 2010a,c).

One might argue that, because the dACC is not observed in our meta-analysis results for internally guided decision-making, the dACC does not function in internally guided decision-making. As described above, however, the evaluation of conflict within the dACC works in situations both with and without an objective correct answer. Additionally, the function of the dACC is not limited to the evaluation of conflict. It includes detection of error (Garavan et al., 2003; de Bruijn et al., 2009) and evaluation of action value (Rushworth et al., 2007; Walton et al., 2007): the dACC can be activated during externally guided decision-making for these functions. For these reasons, dACC activation was not shown in the meta-analysis results for internally guided decision-making, which were based on the previous studies' contrasts of internally guided decision-making vs. a control task with one objective correct answer and no uncertainty (see review part and **Table A1** in Appendix for details of the contrasts). We did not include the contrast of large-conflict vs. small-conflict conditions in internally guided decision-making, or results from PPI analyses, in our meta-analysis because these did not fit our main aim. However, results from previous studies of conflict evaluation during internally guided decision-making indicate that conflict is evaluated within the dACC during internally guided decision-making. The evaluated conflict affects the regulation process, which differs from that of externally guided decision-making.

Taken together, instead of the outcome-based regulation of externally guided decision-making, conflict-based regulation might have an important role in internally guided decision-making. Internally guided decision-making is probably based not only on intrinsic brain activity within the DMN but also on the dACC as part of the task-positive network.

## *Modulation from attentional network in internally guided decision-making*

Internally guided decision-making, which is supported mainly by the DMN, might also be modulated in an anticorrelated way by the network for attentional control. Corbetta and Shulman (2002) and Corbetta et al. (2008) proposed that networks of two types are involved in attending to environmental stimuli: a dorsal frontoparietal network and a ventral frontoparietal network. The dorsal frontoparietal network includes the dorsal parietal cortex (particularly the intraparietal sulcus and superior parietal lobule) and the dorsal frontal cortex (precentral sulcus and frontal eye field; see Figure 2 of Corbetta et al., 2008). The ventral frontoparietal network includes the temporoparietal junction and ventral frontal cortex (i.e., middle frontal gyrus, inferior frontal gyrus, frontal operculum, and anterior insula). When attention is focused on an object, the dorsal frontoparietal network is activated, but the ventral frontoparietal network is deactivated. When an unexpected but important event occurs, both attentional networks are activated to reorient attention.

Both of these networks consist mainly of lateral cortical regions (i.e., task-positive network), and do not include the cortical midline structures within the DMN, which are mainly observed in internally guided decision-making. However, the activity within the dorsal frontoparietal network is negatively correlated with the DMN activity (Fox et al., 2005; Golland et al., 2007; Corbetta et al., 2008). When the dorsal frontoparietal network is activated, the DMN is deactivated, and *vice versa*. Such functional connectivity was not observed between the ventral frontoparietal network and the DMN (Corbetta et al., 2008). Although no study has investigated the role of top-down attentional control in internally guided decision-making, it is possible that the attentional network affects internally guided decision-making in an anticorrelated way. For instance, when the dorsal frontoparietal network is activated and the ventral frontoparietal network is deactivated (i.e., when attention is focused on external stimuli), the processes for internally guided decision-making are expected to be attenuated.

#### **COMMONALITIES BETWEEN EXTERNALLY AND INTERNALLY GUIDED DECISION-MAKING**

## *Overlap between externally and internally guided decision-making*

Our meta-analysis results showed that the DMPFC is activated in externally guided decision-making under uncertainty, externally guided decision-making in a social situation, and internally guided decision-making. Psychologically, this result suggests that the DMPFC is not modulated solely by the uncertainty of outcome, social situation, or non-availability of outcome, and that it has common functions in decision-making of these kinds. Physiologically, our results suggest that the DMPFC is co-activated with the DLPFC–insula–thalamus–IPL and/or VMPFC–pACC–PCC–STG networks, and that it has functional relations with these networks.

One might argue that the overlap within the DMPFC does not reflect activation of this area in both externally and internally guided decision-making, but instead reflects the extension of activation from the SMA (BA 6) in externally guided decision-making and from the VMPFC in internally guided decision-making. That is, the DMPFC observed in externally guided decision-making might have been caused by activation within the SMA that was expanded to the DMPFC (BA 8) by the spherical kernel of 10 mm radius used in MKDA. Similarly, the DMPFC observed in internally guided decision-making might have been caused by activity in the VMPFC that was expanded to the DMPFC by the spherical kernel. However, as **Figure 2** shows, the area observed in internally guided decision-making extended to the posterior part of the overlap. Furthermore, the overlapped area includes the central part of the DMPFC cluster observed in externally guided decision-making (see **Figures 1A** and **2A**). Based on these observations, it is implausible that the DMPFC result was merely the overlap between the edges of the spherical kernels. It is reasonable to infer that the overlapped area was activated consistently in both externally and internally guided decision-making.

Another possible confounding factor reflected in the overlap is task difficulty. It is possible that the experimental tasks in both externally guided (i.e., uncertain condition) and internally guided circumstances were more difficult than the control tasks, and that this difference in difficulty was reflected in the DMPFC activation in both externally and internally guided decision-making. To assess the effect of the difference in task difficulty between experimental and control conditions, we examined the reaction-time difference between these conditions. The results showed no significant difference in either externally guided or internally guided decision-making. The overlap within the DMPFC is therefore unlikely to reflect a difference in task difficulty between experimental and control tasks.

Although the specific function of the DMPFC remains unclear, one possible role suggested by our result is that it integrates signals from task-positive regions and/or task-negative regions to bias either choice of behavior (see **Figure 4**), which was also proposed in previous articles (Volz et al., 2006; Nakao et al., 2009b). Depending on whether an objective correct answer is available or not, the DLPFC–insula–thalamus–IPL network or the VMPFC–pACC–PCC–STG network is strongly activated. However, irrespective of which network is strongly activated, the DMPFC would receive the signals from the activated network(s), then integrate these signals and relay them to the motor control regions for output. In fact, the DMPFC has a strong connection with motor areas (Averbeck and Seo, 2008).

Ochsner et al. (2004) and Ochsner and Gross (2005) reported that the DMPFC was associated with different forms of cognitive control over emotional response. This finding suggests that the DMPFC is a nodal point between cognition and emotion. The DMPFC might be suited to integrate relevant cognitive and emotional processes in externally and internally guided decision-making. For that reason, it is involved in decision-making of both types.

One might be surprised that only the DMPFC overlapped between these two types of decision-making tasks. One possible reason for the small fraction of overlap is that the data used in the meta-analysis were already contrasted in previous studies. Both in externally guided decision-making under uncertainty and in internally guided decision-making, previous studies used a control task that required participants to make a judgment in a situation with an objective correct answer and without uncertainty. The brain regions that have functions in the control task were therefore not reflected in the results for externally guided decision-making under uncertainty or internally guided decision-making, which might explain the small fraction of overlap. For example, the visual or auditory cortex for stimulus input, the motor area for response, and the dACC for regulation processes can be activated during the control task. The striatum, amygdala, and orbitofrontal cortex for reward expectation can also be activated in a control task with reward feedback (e.g., pure monetary rewards task in Moll et al., 2006, and a gambling task using learned rules in Bhanji et al., 2010). We should note that we cannot conclude that the regions which were not observed in the meta-analysis have no function in these decision-making processes.

Another possible reason for the limited overlap area is the nature of MKDA. The MKDA (like other methods of meta-analysis) shows only the consistently activated regions in each category, which is indeed the aim of a meta-analysis. Consequently, for example, even when one of the studies of internally guided decision-making reported insula activity, such as that of Johnson et al. (2005), it was not reflected in the result from MKDA for internally guided decision-making. Therefore, although the insula was observed in the results of MKDA for externally guided decision-making, that region was not observed as a common region between externally and internally guided decision-making. Again, we should note carefully that the regions that were not observed using MKDA are not equal to the regions which have no function in decision-making. What we can know from the meta-analysis is that the observed regions were observed consistently in previous studies. This point is explained further in the following section.

#### *Relation between externally and internally guided decision-making*

In this report, to examine internally guided decision-making specifically as distinguished from externally guided decision-making, we categorized decision-making into externally and internally guided decision-making conceptually and methodologically. Consequently, we showed a difference of neural networks between these two. These two neural networks are, however, thought to be not completely independent of each other. They are merely the two extremes of a single continuum (see **Figure 4**). Each decision-making task can be located on the continuum, and the extent to which the DLPFC–insula–thalamus–IPL or the VMPFC–pACC–PCC–STG networks become involved is expected to differ depending on the decision-making situation.

In fact, several studies included in externally guided decision-making have shown activation within the network for internally guided decision-making (e.g., VMPFC, Elliott et al., 1999; Callan et al., 2009; PCC, Coricelli and Nagel, 2009; STG, Elliott et al., 1999; Elliott et al., 2006; Coricelli and Nagel, 2009), and vice versa (DLPFC, Johnson et al., 2005; Greene and Paxton, 2009; Schleim et al., 2011; insula, Johnson et al., 2005; IPL, Chen et al., 2010). In addition, Pearson et al. (2011) reviewed mainly monkey single-neuron recording studies and implicated the PCC as the part of the DMN which has a role in externally guided decision-making. Clearly distinct neural substrates were observed in our meta-analysis because the results of a meta-analysis show only the consistently activated regions in each category. This feature functioned well to reveal regions associated with the two extreme categories. However, non-activated regions from MKDA analysis are not equal to non-participating regions in each category of decision-making.

When participants refer to criteria that are probably used predominantly in internally guided decision-making, the VMPFC–pACC–PCC–STG network was activated even in externally guided decision-making. For instance, Hampton et al. (2008) reported increased VMPFC and STG activities during externally guided decision-making in a social situation when they used a computational model incorporating referencing process of one's own actions to analyze fMRI data (see the review section for additional details). Furthermore, Goel and Dolan (2003) used a deductive reasoning task (e.g., "No harmful substances are natural; All poisons are natural; ∴ No poisons are harmful"... true, false, or unsure) with one objective correct answer. They observed increased VMPFC activity when participants reached a decision based on their internal beliefs about the world (e.g., false response for "No poisons are harmful" based on the belief that "Poisons are harmful") instead of logical reasoning (e.g., true response for "No poisons are harmful"). Even in the case of externally guided decision-making, the network that functions predominantly for internally guided decision-making is activated to some degree depending on the task type and the participant's strategy.

Taken together, although one might wish to distinguish decision-making as two completely different phenomena – externally guided or internally guided – such a distinction between networks of the two types becomes relevant based on those earlier studies. How these two networks interact and how they are integrated during real-life decision-making remains to be resolved. However, our meta-analysis results at least suggest that two complementary networks are involved in decision-making and that the DMPFC serves some role in the integrative process.

#### **FUTURE DIRECTIONS**

Our meta-analysis revealed that the neural network used predominantly for internally guided decision-making differs from that for externally guided decision-making under uncertainty. This result suggests that studying only externally guided decision-making under uncertainty is insufficient to account for decision-making processes that take place in a human brain. It is necessary to examine internally guided decision-making more specifically to elucidate the psychological and neural mechanisms of human decision-making comprehensively. Furthermore, it would be beneficial to investigate how the two neural substrates for internally and externally guided decision-making mutually interact in day-to-day decision-making situations.

Based on the discussion presented above, we propose two possible directions to investigate internally guided decision-making: rest–stimulus interaction and conflict-based regulation.

#### *Rest–stimulus interaction*

The network for internally guided decision-making overlapped with the DMN. This fact implies that internally guided decision-making is strongly affected by resting-state brain activities. Investigating how the resting state affects the decision-making process (i.e., rest–stimulus interaction in decision-making) is a key direction toward understanding internally guided decision-making. The number of studies investigating rest–stimulus interactions is growing (Greicius and Menon, 2004; Boly et al., 2007; Northoff et al., 2007, 2010; Wiebking et al., 2010, 2011; Duncan et al., 2011). Using the methods in those earlier studies, further detailed neuronal characteristics of internally guided decision-making could be revealed.

For example, resting-state EEG recorded for several minutes before conducting experimental tasks can be used to investigate the effect of intrinsic brain activity on internally guided decision-making. As decision-making tasks, the color-similarity judgment and color preference judgment tasks used in Johnson et al. (2005) are expected to be useful for this purpose (similar tasks were also used by Goldberg and Podell, 1999, 2000). In both tasks, three colored squares are presented in each trial. The colored square presented in the upper center is the target color, and the squares presented in the lower left and right are choices. In the color-similarity judgment task, participants are asked to judge which choice is more similar to the target color ("Which is more similar?"). In the color preference judgment task, participants are asked to judge which color pair (target–choice pair) they prefer ("Which do you prefer?").

If intrinsic brain activity modulates internally guided decision-making, then the following is expected. In participants who show higher resting-state activity (i.e., higher power spectral density during the resting state), the color preference judgments should be less biased by properties of the external stimulus (e.g., color similarity; similar and dissimilar pairs would be selected almost equally often as the preferred pairs in those participants). In other words, participants who show higher resting-state activity are expected to rely less on the properties of the external stimulus for their preference judgment and more on their internal criteria. In the color-similarity judgment, such a relation would not be observed even in cases where the judgment is difficult because of the similar color choices: the color-similarity judgment is a task of making a judgment based on the external stimulus properties. It is expected to be less affected by the intrinsic brain activity.
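The prediction above could be tested as a correlation across participants. The sketch below is only one possible operationalization, under stated assumptions: the bias index (distance of the similar-pair choice rate from chance), the function names `similarity_bias` and `pearson_r`, and all data values are hypothetical and invented for illustration, not results from any study. Under the hypothesis, resting-state power and the bias index would be negatively correlated for preference judgments.

```python
import math


def similarity_bias(chose_similar):
    """Distance of the similar-pair choice rate from chance (0.5).

    chose_similar: list of 0/1 flags, 1 when the preferred pair was the similar one.
    A value near 0 means similar and dissimilar pairs were chosen about equally often.
    """
    rate = sum(chose_similar) / len(chose_similar)
    return abs(rate - 0.5)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up data for four hypothetical participants (arbitrary power units).
resting_power = [1.0, 2.0, 3.0, 4.0]
preference_choices = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # similar pair preferred on 9 of 10 trials
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # 8 of 10
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 7 of 10
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # 6 of 10
]

biases = [similarity_bias(trials) for trials in preference_choices]
r = pearson_r(resting_power, biases)
print("bias per participant:", biases)
print("correlation with resting power:", r)
```

With these invented values the correlation is strongly negative, which is the direction the hypothesis predicts for preference judgments. For the color-similarity task the same analysis would be expected to show no such relation.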

#### *Conflict-based regulation*

Regarding internally guided decision-making, outcomes and feedback are not available to adjust decision-making processes as in externally guided decision-making. For that reason, outcome-based learning and regulation are not applicable to internally guided decision-making. Instead, previous studies have suggested that the amount of conflict is evaluated within the dACC during internally guided decision-making (Greene et al., 2004; Forstmann et al., 2008; Knutson et al., 2008; Nakao et al., 2009a, 2010a,c; Sommer et al., 2010; Caspers et al., 2011; Kahane et al., 2011), and the signal from the dACC is expected to regulate activation within the DMN during internally guided decision-making (Chen et al., 2010; Nakao et al., 2010c). Details of conflict-based regulation processes in internally guided decision-making, however, are less readily apparent. For instance, what kinds of learning and regulation processes operate to reduce conflict during internally guided decision-making remains unclear.

Several options are related to manipulation of conflict during internally guided decision-making: stimulus-based manipulation by the number of choices (Forstmann et al., 2008) or type of scenario (Kahane et al., 2011), and individualized manipulation based on reaction time (Greene et al., 2004; Knutson et al., 2008; Sommer et al., 2010; Caspers et al., 2011), ratings (Nakao et al., 2009a, 2010c; Jarcho et al., 2011), or chosen frequency of each stimulus (Nakao et al., 2010a). Although each manipulation has strong and weak points, all are applicable to internally and externally guided decision-making. These methods are useful to investigate the differences of conflict-based regulation process between decision-making of the two kinds.

To measure brain activities relating to conflict-based regulation process, not only fMRI but also event-related brain potentials (ERPs) are useful. The amplitudes of correct and conflict-related negativity (CRN; Simon-Thomas and Knight, 2005; Masaki et al., 2007; Nakao et al., 2010a) and N2 components (Yeung et al., 2004; Bartholow et al., 2005) are known to reflect the amount of conflict.

Nakao et al. (2010a) reported that the amount of conflict during internally guided decision-making (occupational choice in their case) is also reflected in the amplitude of the CRN.

#### **LIMITATIONS**

The meta-analysis results showed clearly that the activation of DMPFC and IFG occurred consistently in externally guided decision-making in social situations, and DMPFC was shared with internally guided decision-making. However, because of limitations imposed by insufficient studies of externally guided decision-making in a social situation, we were unable to compare that directly with internally guided decision-making. Replication of the current results when a more extensive and balanced selection of studies becomes available might therefore be warranted.

In the present study, externally guided decision-making under uncertainty has subcategories of two types (see review part and **Table A1** in Appendix): we include the studies manipulating uncertainty by the probabilistic outcome and by the perceptual difficulty. One might argue that perceptual difficulty is different from the probabilistic outcome and that these two types should be separated. We included studies using perceptual difficulty for the following reasons. First, previous studies (Grinband et al., 2006; Callan et al., 2009; Banko et al., 2011) used the concept of uncertainty to describe the psychological state manipulated by perceptual difficulty. Second, our conceptual and operational definitions of uncertainty did not have a positive reason to exclude studies using perceptual difficulty. Third, as we described in the review part, the studies of the two subcategories of externally guided decision-making under uncertainty reported similar neural substrates. Indeed, when we conduct meta-analysis using the studies of probabilistic outcome (see **Figure 5A**; **Table 4**), similar results to those obtained from the meta-analysis using the studies of both subcategories were observed (see **Figure 1A**; **Table 2**): we were unable to conduct a meta-analysis that includes studies of perceptual difficulty because of the scarcity of such studies (four studies). Furthermore, our results for externally guided decision-making under uncertainty closely resembled those of a previous meta-analysis study (Mohr et al., 2010a). Based on these reasons, we assume that including these two subcategories into externally guided decision-making was less problematic for our purpose of comparing externally and internally guided decision-making. However, these two types of externally guided decision-making can be supported by different neural substrates. This possibility should be addressed when sufficient numbers of studies for meta-analysis become available.

**under uncertainty using a probabilistic outcome, (B) internally guided decision-making using moral judgment.** Results from the different statistical thresholds are shown with different colors: cyan, pink, and yellow, a height threshold of familywise error rate (FWE) corrected at p < 0.05; light blue, a stringent threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.001; blue, violet, and red, a threshold of FWE corrected for the spatial extent at p < 0.05 with primary thresholds of uncorrected p < 0.01. No clusters were identified at the stringent threshold in preference judgment. DMPFC, dorsomedial prefrontal cortex; DLPFC, dorsolateral prefrontal cortex; IPL, inferior parietal lobule; SPL, superior parietal lobule; IFG, inferior frontal gyrus; pACC, perigenual anterior cingulate cortex; PCC, posterior cingulate cortex; MPFC, medial prefrontal cortex; STG, superior temporal gyrus.

Similarly, we included two types of decision-making as internally guided decision-making (i.e., moral and preference decisions), based on our conceptual and operational definitions and similarity of neural substrates between these two types of studies. Although meta-analysis for preference judgment showed no significant regions because of the paucity of studies (seven studies), meta-analysis for moral judgment (see **Figure 5B**; **Table 4**) showed similar neural substrates to those found in the meta-analysis results for decision-making of these two types (see **Figure 1C**; **Table 2**). Based on these results, we assume here that using both moral and preference decision-making as internally guided decision-making is less problematic for our purposes. However, it is possible that these subcategories present several differences of neural substrates because the preference judgment can be less influenced by social pressure than moral decision-making. In addition, different types of preference judgment (i.e., preference for color or for occupation) can be made based on different kinds of psychological criteria, and can be correlated with different neural substrates. It would be interesting to compare the neural substrates of these subcategories in future studies.

Because coordinate-based meta-analytical methods such as MKDA are based on spatial coordinates from neuroimaging data, they have been limited to PET and fMRI studies and exclude EEG/ERP studies. Additionally, we did not include results from functional connectivity analyses or computational model-based analyses in our meta-analysis. Although we tried to refer to studies of these kinds in the review and discussion parts of this article, we note that our meta-analysis results reflect limited aspects of brain activities in externally and internally guided decision-making.

## **CONCLUSION**

We compared different types of decision-making: externally and internally guided decision-making. Based on experimental–operational and neural differences, we can distinguish these two basic types of decision-making from one another. Externally guided decision-making in situations with only one less-predictable correct answer was mainly supported by the DLPFC–insula–thalamus–IPL networks. Internally guided decision-making, in which no correct answer based on external circumstances is available, was supported by the VMPFC–pACC–PCC–STG network. Although the psychological and neural substrates of externally guided decision-making have been well identified, they remain unclear in the case of internally guided decision-making. This study of the substrates is of great interest to the field of decision-making itself in that it sheds some light on a form of decision-making that is prevalent in actual daily life. Beyond the field of decision-making, this line of investigation is also expected to contribute to improvement in our understanding of the function of the brain's resting state and its high activity, especially in the DMN that largely overlaps with observed regions in internally guided decision-making.

#### **Table 4 | MKDA results for each sub-type of decision-making study**

Because of low numbers of studies (four studies), we did not conduct meta-analysis for externally guided decision-making under uncertainty using perceptual difficulty. Internally guided decision-making using preference judgment showed no significant region because of the low number of studies (seven studies).

Regions marked \*\* were significant at FWE voxel-level corrected p < 0.05 with extent size > 10 voxels. Regions marked \* were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.001. Regions marked † were significant at FWE extent corrected p < 0.05 at primary voxel thresholds of uncorrected p < 0.01. Regions marked with \* and with † were reported if these were additional regions.

BA denotes Brodmann Area; Maxstat. denotes maximum of the Z field.

## **ACKNOWLEDGMENTS**

This work was supported by a Grant-in-Aid for JSPS Fellows (20-821) and for JSPS Postdoctoral Fellowships for Research Abroad (630) from the Japan Society for the Promotion of Science. CIHR, EJLB-CIHR, HDRF-ISAN, UMRF to Georg Northoff.

## **REFERENCES**

beyond errors and response conflict. *Psychophysiology* 42, 33–42.

The brain's default network. *Ann. N. Y. Acad. Sci.* 1124, 1–38.


person?' A network analysis of midline cortex during a social preference task. *Neuroimage* 51, 930–939.


networks. *Proc. Natl. Acad. Sci. U.S.A.* 102, 9673–9678.


moral decisions. *Proc. Natl. Acad. Sci. U.S.A.* 106, 12506–12511.


between episodic memory encoding and retrieval: roles of the taskpositive and task-negative networks. *Neuroimage* 49, 1045–1054.


studies. *Neurosci. Biobehav. Rev.* 35, 1219–1236.


moral cognition. *Nat. Rev. Neurosci.* 6, 799–809.


the stage for expression of motivated behavior. *J. Comp. Neurol.* 493, 167–176.


decision under risk. *J. Neurosci.* 31, 8822–8831.


prefrontal cortex. *Neuron* 50, 781–789.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 December 2011; accepted: 18 February 2012; published online: 05 March 2012.*

*Citation: Nakao T, Ohira H and Northoff G (2012) Distinction between externally vs. internally guided decision-making: operational differences, meta-analytical comparisons and their theoretical implications. Front. Neurosci. 6:31. doi: 10.3389/fnins.2012.00031*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Nakao, Ohira and Northoff. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits noncommercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*



**APPENDIX**


**Table A1 | Continued**


condition. Cont > Exp denotes the opposite cases. Here, n.s. signifies not significant.

## **REFERENCES**


differential roles of dorsal and rostral anterior cingulate cortex. *Neuroimage* 35, 979–988.


## Practical implications of empirically studying moral decision-making

#### **Nora Heinzelmann<sup>1</sup>\* † , Giuseppe Ugazio<sup>2</sup>\* † and Philippe N. Tobler <sup>2</sup>**

<sup>1</sup> Faculty of Philosophy, University of Oxford, Oxford, UK

<sup>2</sup> Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland

#### **Edited by:**

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

#### **Reviewed by:**

Ming Hsu, University of California Berkeley, USA Francisco Aboitiz, Pontificia Universidad Catolica de Chile, Chile

#### **\*Correspondence:**

Nora Heinzelmann, Mansfield College, Mansfield Road, Oxford OX1 3TF, UK. e-mail: nora.heinzelmann@philosophy.ox.ac.uk; Giuseppe Ugazio, Laboratory for Social and Neural Systems Research, University of Zurich, Blümlisalpstrasse 10, CH-8006 Zurich, Switzerland. e-mail: giuseppe.ugazio@econ.uzh.ch

†Nora Heinzelmann and Giuseppe Ugazio have contributed equally to this work.

This paper considers the practical question of why people do not behave in the way they ought to behave. This question is a practical one, reaching both into the normative and descriptive domains of morality. That is, it concerns moral norms as well as empirical facts. We argue that two main problems usually keep us from acting and judging in a morally decent way: firstly, we make mistakes in moral reasoning. Secondly, even when we know how to act and judge, we still fail to meet the requirements due to personal weaknesses. This discussion naturally leads us to another question: can we narrow the gap between what people are morally required to do and what they actually do? We discuss findings from neuroscience, economics, and psychology, considering how we might bring our moral behavior better in line with moral theory. Potentially fruitful means include nudging, training, pharmacological enhancement, and brain stimulation. We conclude by raising the question of whether such methods could and should be implemented.

**Keywords: descriptive, morality, normative, reasoning, neuroethics**

## **INTRODUCTION**

A sharp distinction has been made between the descriptive domain of morality, i.e., the way agents behave or make moral judgments, and the normative domain, i.e., the way agents ought to behave or make moral judgments. In the empirical sciences, there has been an on-going debate about which theory describes moral decision-making best. Similarly, normative moral philosophy has been discussing which ethical theory is superior to the others.

However, whenever we watch the news or observe our social environment, both of these issues are of comparably little importance to us. The question that usually concerns us is not: How do people behave? Or: How ought they to behave? But rather: Why do they fail to behave in the way they should?

This last question is not purely an empirical one, as it involves an assumption about how one ought to behave. Nonetheless, it is not an ultimately normative one either, as it relies on empirically observable facts about human behavior. The issue is rather a practical one, reaching both into the descriptive and the normative domains of morality. It naturally leads to another practical question: What can we do about the fact that people often do not behave in a way they are morally required to?

In this essay, we elaborate on these two related practical issues and give an outline of how to resolve them. We argue that two main problems usually keep us from acting and judging in a morally decent way: Firstly, we make mistakes in moral reasoning. Secondly, even when we know how we ought to act and judge, we still fail to meet our obligations due to personal weaknesses.

## **HOW OUGHT WE TO ACT?**

Normative ethics tells us what we ought to do. Three of the most prominent contemporary theories are consequentialism, deontology, and virtue ethics (Crisp, 1998/2011, cf. Tobler et al., 2008). There is no clear, simple, and universally accepted definition for any of them; therefore we shall give a brief account of how these concepts are understood in the present paper. Albeit rough and sketchy, we assume that these characterizations serve our present purpose well enough.

In one of its general forms, consequentialism tells us that the outcomes (consequences) of our actions ought to be as good as possible (cf. Scheffler, 1988). There are numerous consequentialist theories which in turn can be classified in various ways. Philosophers traditionally distinguish act and rule consequentialism. Act consequentialism holds that the outcome of single actions ought to be as good as possible. As consequences of single actions are often difficult to predict, attempts have been made to facilitate the decision process of an agent. In this vein, rule consequentialism focuses on action-guiding rules, claiming that the consequences of the rules be as good as possible. Actions are then evaluated with respect to these rules.

Also, different consequentialist approaches disagree on what the goodness of an outcome consists of. The most popular one, utilitarianism, holds that we ought to do what increases people's happiness or decreases their unhappiness. Hereby, the good of everyone has to be taken into account and everyone's good counts equally. We ought to act in a way that maximizes the good of all and in no other way. Jeremy Bentham, one of the founders of classical utilitarianism, argued for a felicific calculus that allows measuring the outcome of various actions, i.e., the pleasure these actions may produce. Such a method presupposes that all pleasures are comparable and quantifiable and that they are, as consequences of an action, to greater or lesser certainty predictable. After such hedonic approaches to (experienced) utility had been largely abandoned by economics, they have more recently been taken up again by behavioral economics (Kahneman et al., 1997). Moreover, some formal treatments of welfare economics (Harsanyi, 1955) and prosocial preferences (e.g., Fehr and Schmidt, 1999) also have consequentialist roots.
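The presuppositions of a felicific calculus (pleasures that are comparable, quantifiable, and predictable with some certainty) can be made explicit in a toy calculation. The sketch below is an illustrative assumption rather than Bentham's own procedure: the action names, probabilities, and pleasure values are invented, and `expected_good` is a hypothetical helper that simply sums probability-weighted pleasure over all affected persons, with everyone counting equally.

```python
def expected_good(outcomes):
    """Sum probability-weighted pleasure over all affected persons.

    outcomes: list of (probability, pleasure) pairs, one per affected person.
    Each person's pleasure enters the sum with equal weight.
    """
    return sum(probability * pleasure for probability, pleasure in outcomes)


# Invented numbers for two hypothetical actions affecting two persons each.
actions = {
    "keep_promise": [(0.9, 4.0), (0.9, 4.0)],
    "break_promise": [(1.0, 6.0), (0.5, -5.0)],
}

# The act-utilitarian choice is the action with the greatest expected good.
best_action = max(actions, key=lambda name: expected_good(actions[name]))
print(best_action)
```

The calculation only works because every pleasure has been placed on a single numerical scale and given a probability, which is exactly what the method presupposes.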

"Deontology" is a collective term denoting a variety of theories which, from a linguistic point of view, assign a special role to duties, as "deontology" refers to the study or science of duty (deon = duty). Deontology requires us to fulfill our moral duties but such a general claim is also made by consequentialist theories, which hold that it is our moral duty to act in such a way that the outcomes be as good as possible. Therefore, deontology is sometimes identified with non-consequentialism, the claim that the wrongness or rightness of an action is not only determined by the badness or goodness of its consequences. For instance, an action can be assigned intrinsic value because of the agent's willingness that the principle – or maxim – on which the action is performed should become a universal law, a criterion established by Kant (1965/1785). Kant's ethics and the theories derived from them are often seen as prominent candidates of deontology. Another central requirement of Kant's ethics is to never treat a human being as a means to an end. Thus according to Kant and in contrast to consequentialism, it would be morally wrong to kill one person if thereby two other human lives could be saved.

Usually, deontology is schematically conceived of as rivalling both consequentialism and virtue ethics. Virtue ethics usually goes beyond the question of what we morally ought to do. This has historical reasons: The earliest prominent account of virtue ethics was developed by Aristotle (2000), who was concerned with the best way for a human being to live. A central claim of contemporary virtue ethicists is that living virtuously is required in order to flourish. Roughly speaking, a virtue is a disposition to act appropriately for the right reason and thus requires practical wisdom. Flourishing can be described as living fulfilled and happily, which goes beyond mere momentary subjective well-being and refers to an overall outlook and life as a whole.

All of these theories are primarily concerned with the question of how we ought to act rather than how individuals actually do behave. We shall turn to this topic in the following section.

#### **HOW DO WE ACT?**

Empirical research on human moral behavior has focused primarily on two topics: action and judgment. As these two aspects of moral behavior have been studied using rather different approaches, we shall treat each of them separately here. First, we consider the literature studying the effects of norms on people's actions (Bicchieri, 2006; Gibson et al., in press). Second, we shall focus on the literature studying the psychological mechanisms underlying moral judgments (Greene et al., 2001; Moll et al., 2005; Hauser, 2006; Prinz, 2006; Mikhail, 2007).

From a wider perspective, the question arises whether moral judgment translates into moral behavior. This issue is controversial and has received a variety of answers (e.g., Schlaefli et al., 1985). One view (Bebeau et al., 1999) suggests that a moral act requires not only that an agent judges one course of action as moral but also that she identifies a situation as moral (e.g., that consequences of distinct courses of action have differential welfare implications; moral sensitivity), chooses the moral over other courses of action (moral motivation) and persists to implement the goal of the action (moral character). In this view, it would be expected that judgment and action are positively but weakly correlated, which seems to be the case (Blasi, 1980).

#### **MORAL ACTION**

One of the most successful approaches to study moral action has been to observe how people's behavior changes depending on the saliency of a norm. Scholars working in this field developed several models to show how the utility assigned by a person to different outcomes in a given situation is modified by the presence of a norm. Norms motivate compliant behavior mainly in two ways: (a) they modify the expectations an individual has regarding others' behavior (Bicchieri, 2006) and (b) they generate a personal cost for violating the action course prescribed by the norm (Gibson et al., in press).

While Bicchieri's work focused mainly on providing a theoretical description of how and when social norms are most likely to emerge and influence individuals' behavior, other scholars provided empirical evidence demonstrating the influence of norms on behaviors in a social context. For instance, recently Gibson et al. (in press) tested the influence of the moral obligation of being honest (or not lying) on individuals' behavior in an economic context. The authors tested the hypothesis that when being incentivized to lie by being able to make a greater profit through not telling the truth, the willingness of an individual to behave immorally, i.e., to lie, was correlated with the importance she assigned to being honest. More specifically, those individuals attributing high importance to the honesty norm were extremely insensitive to the cost of telling the truth, which suggests that the moral value of respecting a moral duty (of being honest) can outweigh economic costs of respecting it and even prevent utilitarian cost-benefit trade-offs altogether.

#### **MORAL JUDGMENT**

Whereas psychological research on moral judgments has captured them predominantly as a cognitive, controlled process, and focused on moral development in the 20th century (Piaget, 1932; Kohlberg, 1976), it has in recent years mainly developed around two research questions: (a) do moral judgments stem from intuitions or from conscious reasoning and (b) which psychological processes are involved in moral intuitions (Cushman et al., 2010). Roughly, we can distinguish four different approaches to these questions.

From a first perspective, following Hume's (1960) idea that moral judgments result from "gut feelings", some scholars proposed that moral judgments predominantly result from intuitions of an emotional nature (Prinz, 2006, see also Prinz, 2007; Woodward and Allman, 2007).

Second, others agree that moral judgments indeed stem from intuitions but they deny that such intuitions are of emotional nature, arguing instead that moral intuitions are the product of moral specific psychological mechanisms named "universal moral grammar" (Hauser, 2006; Mikhail, 2007; Huebner et al., 2008). According to this view, neither conscious reasoning nor emotions play a causal role in determining moral judgments, suggesting that these two processes actually occur after the moral judgment has been produced by the "moral grammar" mechanism.

From a third point of view, other scholars put forward a dual-process theory of moral judgment (Greene et al., 2004), suggesting that moral judgments result from two psychological mechanisms: emotions and conscious reasoning. It is consequently claimed that different moral judgments are underpinned by different psychological systems (Cushman et al., 2010).

Finally, from a very similar stance, a fourth theory acknowledges that moral judgments rely on multiple psychological mechanisms, and therefore that both emotions and conscious reasoning play a role in moral judgments. However, in contrast with the third view described above, it is argued that different moral judgments are not underpinned by different psychological systems, but rather that all moral judgments will involve cognitive and emotional mechanisms in competition against each other when a moral judgment is produced (Moll et al., 2005, 2008).

## **NEURAL UNDERPINNINGS**

The advent of neuroimaging methods has made it possible to study the intact brain of healthy volunteers while they make moral judgments and decisions. This line of research has identified a variety of brain regions that are active during moral cognition (**Figure 1**; for review, see e.g., Moll et al., 2005, 2008; Raine and Yang, 2006; Forbes and Grafman, 2010). These regions include the prefrontal cortex, particularly ventral, medial, dorsolateral, and frontopolar subregions, posterior cingulate cortex, anterior temporal lobe, superior temporal cortex, temporoparietal junction, striatum, insula, and amygdala. Many of these regions are also implicated in "theory of mind" tasks requiring consideration and inference of others' thoughts and desires (Bzdok et al., 2012) and impaired in patients with antisocial disorders, in agreement with the notion of impaired moral decision-making (**Figure 2**; Raine and Yang, 2006).

One could next ask whether neuroimaging can contribute to informing theories of moral decision-making. Could it help to decide between the different theories outlined in 2.2 (even though some of them may not be mutually exclusive)? Or, more specifically, can neuroimaging inform us about the degree to which emotions are involved in moral judgment? When asking such questions one is often tempted to make reverse inferences from brain activation to mental function. However, given that most brain regions contribute to more than one function, such inferences are at best probabilistic (Poldrack, 2006, 2011). Moreover, they are limited by the response specificity of the brain region under study and by the precision with which mental functions are parsed conceptually and assessed empirically (Poldrack, 2006). Nevertheless, some attempts to answer those questions have been made.
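The probabilistic character of reverse inference can be illustrated with Bayes' rule. The following is a minimal sketch, not a method taken from the studies cited: the function name and all numerical values are hypothetical and chosen only to show how the posterior probability of a mental function, given activation in a region, depends on how selectively that region responds.

```python
def reverse_inference_posterior(p_act_given_fn, p_act_given_not_fn, prior_fn):
    """Posterior probability that a mental function is engaged, given that a
    region is active, computed with Bayes' rule.

    p_act_given_fn     -- probability of activation when the function is engaged
    p_act_given_not_fn -- probability of activation when it is not engaged
    prior_fn           -- prior probability that the function is engaged
    """
    evidence = p_act_given_fn * prior_fn + p_act_given_not_fn * (1.0 - prior_fn)
    if evidence == 0:
        raise ValueError("activation has zero probability under these inputs")
    return p_act_given_fn * prior_fn / evidence


# Hypothetical values: a region that rarely activates without the function ...
selective = reverse_inference_posterior(0.8, 0.1, 0.5)
# ... versus a region that also activates during many other functions.
unselective = reverse_inference_posterior(0.8, 0.7, 0.5)

print(round(selective, 3), round(unselective, 3))
```

With these illustrative inputs, the unselective region leaves the posterior close to the prior, which is the sense in which limited response specificity weakens the inference from activation to mental function.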

For example, an extension to Hume's view mentioned above may be suggested by the involvement of dorsal and lateral frontal regions in moral judgment (e.g., Greene et al., 2001). This would be based on the notion that these regions play a stronger role in more deliberate, goal-directed, and cognitive than automatic and emotional functions (Forbes and Grafman, 2010). Moreover, all of the regions implicated in moral judgment have been implicated also in other mental functions. This seeming lack of evidence for a neural substrate exclusively devoted to moral functions (Young and Dungan, 2012) does not support the universal moral grammar approach; if one assumes that moral functions have evolved from non-moral functions or that the mental functions required for other types of judgments can be used also in the moral domain (Tobler et al., 2008) it is perhaps not surprising that so far no region has been singled out as a uniquely moral center of the brain. In principle though it is still conceivable that finer grained methods, such as single cell recordings, may reveal such a substrate.

Neuroimaging and lesion work also point toward a role for emotion in moral judgment. The ventromedial prefrontal cortex (vmPFC) is involved in emotion processing and also activated when a subject makes moral judgments (reviewed in Young and Koenigs, 2007). Lesions of this region result in blunted affect (hypo-emotionality) as well as increased emotional reactivity to environmental events (Anderson et al., 2006). Activations are increased by pictures with moral emotive content (depicting, e.g., abandoned children, physical assaults) compared to pictures with non-moral emotive content of similar emotional valence and sociality (Moll et al., 2002; Harenski and Hamann, 2006) and by moral compared to semantic judgments (Heekeren et al., 2003, 2005). Patients with lesions of the vmPFC are more likely than controls to endorse harming someone in order to benefit a greater number of other people (Ciaramelli et al., 2007; Koenigs et al., 2007; Thomas et al., 2011). In healthy subjects the strength of skin conductance responses to such moral dilemmas correlates inversely with the propensity to endorse harm for the greater good (Moretto et al., 2010). By contrast, vmPFC patients fail to generate such emotive responses before endorsing harm (Moretto et al., 2010). Thus, at least some moral judgments appear to be caused by emotions.

Although much of the literature has focused on prefrontal cortical regions, moral judgment and decision-making are clearly not a purely prefrontal or, more generally, neocortical matter. Activation in the striatum, for example, is affected by the moral status of a partner with whom one performs economic exchanges (Delgado et al., 2005) and reflects behavioral sensitivity to the "moral expected value" (number of lives saved) of moral actions (Shenhav and Greene, 2010; **Figure 1B**). Based on its general role in action selection (Balleine et al., 2009), one would also expect the dorsal striatum to contribute to the selection of moral actions. The amygdala contributes to the learning of fear and distress experienced by others (Blair, 2007; Olsson et al., 2007); empathy-induced insula activation correlates with subsequent prosocial behavior (Masten et al., 2011). Thus, although these regions may primarily serve different functions they can nevertheless be harnessed for moral judgments and decisions.
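The "moral expected value" referred to above is the expected number of lives saved, that is, group size multiplied by the probability of rescue. A minimal sketch of this arithmetic follows; the function name and the example numbers are hypothetical and are not taken from Shenhav and Greene (2010).

```python
def expected_lives_saved(group_size: int, probability: float) -> float:
    """Expected number of lives saved: group size times rescue probability."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return group_size * probability


# Hypothetical trial: one individual saved with certainty versus
# a group of ten saved with probability 0.5.
certain_single = expected_lives_saved(1, 1.0)
risky_group = expected_lives_saved(10, 0.5)

print(certain_single, risky_group)
```

On this reading, a participant whose acceptability ratings track the product rather than group size or probability alone would count as behaviorally sensitive to the expected number of lives saved.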

**FIGURE 1 | Brain regions implicated in moral judgment and decision-making. (A)** Cortical regions. Note that the posterior cingulate cortex and the angular gyrus (temporoparietal junction) have also been implicated in moral judgments (shown in **Figure 2**). aPFC, anterior prefrontal cortex; aTL, anterior temporal lobe; DLPFC, dorsolateral prefrontal cortex; lOFC, lateral orbitofrontal cortex; STS, superior temporal sulcus; vmPFC, ventromedial prefrontal cortex. Adapted with permission from Moll et al. (2005). **(B,C)** Example for striatal involvement in moral decision-making. The task employed moral dilemmas. In each trial, subjects rated how morally acceptable it was to save a group of individuals from death with a known probability rather than a single individual with certainty. Across trials, group size and probability varied. Group size and probability should be multiplied to compute the expected number of lives saved. **(B)** Regions in ventral striatum previously identified by Knutson et al. (2005) as processing reward value. **(C)** In the regions shown in **(B)**, individual neural sensitivity (contrast estimates of activation increases) correlated with behavioral sensitivity (beta estimates in rating) to the expected number of lives saved. Adapted with permission from Shenhav and Greene (2010). This finding is in line with the notion that moral functions can be underpinned by neural mechanisms that have originally evolved for different functions, such as reward processing (Tobler et al., 2008).

**FIGURE 2 | Comparison of brain regions preferentially activated during moral judgment and decision-making (green), regions impaired in patients with antisocial disorders such as antisocial personality disorder and psychopathy (red) and common regions (yellow).** One possible interpretation is that emotions as underpinned by the common regions prevent breaking of moral rules, the defining deficit of antisocial personality disorders. The angular gyrus lies at the junction of temporal and parietal cortex. Reprinted with permission from Raine and Yang (2006).

## **PEOPLE DO NOT BEHAVE IN A WAY THEY OUGHT TO**

Combining insights from the two previous sections, this part of the paper will establish the claim that human beings often do not behave in a way they ought to. Although it is clear that discrepancies can arise from a variety of issues, including moral sensitivity, judgment, motivation, and character, we will concentrate on two more recently discussed phenomena: cognitive biases and emotional influences.

Both these phenomena are morally problematic in that they reflect the influence of morally irrelevant features on actions and judgments. We shall briefly clarify this point for each of the three ethical theories outlined in the section "How Ought We to Act?" above.

As mentioned before, consequentialism requires that only the ultimate consequences of an action or judgment are relevant to its moral evaluation. Therefore, features such as the emotional state of the agent or the framing of several options to choose from are not to be taken into account. However, as we shall elaborate in the following, there are a variety of instances in which agents are influenced by such cues and therefore do not act and judge in a morally decent way.

From a Kantian point of view, a morally right action or judgment is to be made from duty, that is, out of reverence for the moral law. Accordingly, any other feature of a situation, such as the agent's uneasy feeling toward the morally prescribed action course, is to be ignored. However, empirical evidence will be given below that individuals often fail to meet this normative requirement.

Virtue ethics outlines the character traits which distinguish a virtuous person. Amongst them are the faculty of practical reasoning and specific virtues such as justice or temperance. There is, however, solid evidence that agents frequently fail to display these traits in their behavior and judgments, as this section shall make clear.

In the following, we shall explain in greater detail in what ways individuals are biased or influenced by their emotions. For some cases, we shall, by way of example, explain how the actions and judgments in question are morally dubious from a deontological, consequentialist, or virtue ethical perspective.

### **BIASED BEHAVIOR**

Briefly, a cognitive bias is an unconscious tendency to judge a certain element in a way that depends on one's own preferences, expectations, and experiences. Cognitive biases are similar to perceptual biases such as optical illusions (e.g., the Müller-Lyer illusion, Müller-Lyer, 1889). Instead of influencing our perceptual skills, cognitive biases affect people's cognitive capacities. We shall give some examples for this phenomenon below.

Firstly, a known cognitive bias that strongly affects moral actions is the so-called *bystander effect*, i.e., "the more bystanders to an emergency, the less likely, or the more slowly, any one bystander will intervene to provide aid" (Darley and Latané, 1968, p. 1). Darley and Latané (1968) recreated an emergency situation in the lab in order to test the reactions of participants. The higher the number of bystanders, the lower the percentage of participants who decided to intervene and the longer the time it took them to do so. Presumably, people recognize the badness of the situation, yet feel a "diffusion of responsibility" and so do not act accordingly. However, such behavior is morally questionable. For instance, from a deontological perspective, it is highly plausible to assume that an agent has a strong duty to help a victim in an emergency. Besides, such a duty is often legally prescribed, i.e., non-assistance of a person in danger is widely regarded as a tort. The presence of bystanders and their number does not relieve the agent from his moral duty. Failure to act in accordance with the duty to help is thus a severe moral transgression from a deontological point of view.

Secondly, the next cognitive bias taken into consideration here is known as the *identifiable victim effect* (Schelling, 1968; Redelmeier and Tversky, 1990): one is more likely to help a victim if he is easily identifiable. An example of this behavior is people's widespread inclination to save one little child from drowning in a shallow pond but to refrain from making a small donation that would save 25 children from starving to death in Africa (cf. Hauser, 2006). This pattern of results was consistently found in numerous previous studies observing people's behavior in similar situations (Calabresi and Bobbitt, 1978; Redelmeier and Tversky, 1990; Viscusi, 1992; Whipple and Swords, 1992). Again, this is morally dubious behavior, as we shall argue from a virtue ethicist's viewpoint. Generally, charity and justice (or fairness) are regarded as moral virtues. Assume further, plausibly enough, that the overwhelmingly important point about being charitable is the benefit of the person receiving aid. Then a virtuous agent would help both the drowning child and the starving children. Helping one but not the others seems to amount to a failure to exhibit charity and justice and therefore to non-virtuous behavior. From a consequentialist perspective it could be argued that saving 25 is likely to have better consequences than saving one. Thus, failing to save the larger number would presumably be morally dubious also from a consequentialist perspective.

### **EMOTIONALLY INFLUENCED BEHAVIOR**

Among the elements influencing moral behavior, emotions play an important role. Although, as for cognitive biases, people are usually unaware of the influence that emotions have on their behavior, several studies have shown that brain areas associated with emotion are involved in various decision-making tasks, including the formation of moral judgments.

A seminal study by Greene et al. (2001) has shown that emotions are usually sensitive to the means used for an action, while cognitive processes are sensitive to the consequences resulting from this action.

Other studies investigating the role of emotions in moral judgment showed that moral condemnation of an event (i.e., how wrong you think something is) is strongly influenced by the emotional state of the person evaluating it. Haidt and colleagues (Wheatley and Haidt, 2005; Schnall et al., 2008; Eskine et al., 2011) ran a series of studies which showed that induced disgust can yield harsher condemnations of a set of disgust-related moral violations such as incest.

Recently, we (Ugazio et al., 2012) have provided evidence that when a person judges a moral scenario, different emotional states will influence her choices in opposite ways. People who were induced to feel anger were more likely to judge a moral action in a permissive way compared to people in a neutral emotional state, and people induced to feel disgust were more likely to judge the same actions in a less permissive way.

The influence of emotional states on moral judgments and actions, in particular if the emotions stem from morally irrelevant factors of the situation, is morally problematic according to all the moral theories outlined in Section "How Ought We to Act?". Consider consequentialism first and recall that from this perspective, the only aspects relevant to a moral evaluation are the outcomes of an action, decision, etc. In particular, the emotions of the agent are only relevant to the extent to which they are part of the overall utility affected by the outcomes. Hence, emotions are problematic if they influence an action or judgment such that it does not lead to the best possible outcome.

According to deontology, a morally right action or judgment is to be performed out of duty. Kant (1965/1785) famously denied that an action out of inclination fulfills this criterion. As emotions are regarded as inclinations of this sort, a judgment or action determined by an emotion cannot be morally right.

From the point of view of virtue ethics, the actions and judgments described in this section seem to be morally questionable because they do not seem to stem from virtuous practical reasoning. A virtuous person takes her passions into account in an adequate manner, yet she is presumably not dominated by their influence. Moreover, it might be the case that the actions and judgments described are morally dubious because they go against the virtue of temperance. However, the extent to which practical reasoning and temperance are non-virtuously counteracted will depend on the extent to which the action or judgment in question is influenced by the emotions.

Having given evidence for the claim that individuals often do not behave and judge in a morally sound way, we shall in the following section provide details on what we believe are the most important reasons for these failures.

## **WHY DO WE NOT BEHAVE IN MORALLY DECENT WAYS?**

A first step toward a solution to the problem that people often do not behave in morally decent ways consists in analyzing the reasons and mechanisms of this behavior. Our hypothesis is that we do not behave in a way we ought to either because we have mistaken beliefs about what we ought to do or because we fail to carry out the right action despite our better knowledge.

For the first problem – we make mistakes in moral reasoning – a range of different causes can be given. The most obvious one is a lack of cognitive capacities. For instance, we suddenly find ourselves to be free-riders on a train because we simply forgot to validate our ticket. In this case, it is simply bad memory, lack of planning, distraction, or time pressure that led us to a moral transgression.

Inappropriate moral decision-making may also occur as a consequence of people's ignorance of important information. Such ignorance then prevents them from drawing the correct conclusion about how to act. For example, a consumer who wants to support fair working conditions may make a wrong decision because he is not aware that the company selling the product he chooses has recently been found guilty of sweatshop labor.

In addition, defective moral reasoning may be behind cognitive biases and phenomena such as the identifiable victim effect as described in the previous section. It seems to stem from a lack of reflection on the two scenarios, their comparison and moral evaluation, ultimately leading to the violation of the virtues of justice and fairness.

The second problem – despite knowing how we ought to act, we fail to carry out the right action – can be analyzed in a variety of ways. We will consider only a selection here.

Failure to act in a way that has been acknowledged as being the morally correct one may be due to personal weaknesses. The most prominent one is *akrasia*, sometimes also described as weakness of the will (cf. Kalis et al., 2008). We shall not distinguish between akrasia and weakness of will in this paper. A person is called akratic if she acts against her own standards or aims. Succumbing to some temptation, e.g., eating another portion of ice-cream despite knowing that you are thereby taking away someone else's share, is usually regarded as an akratic action (cf. Austin, 1961).

The concept of akrasia depends heavily on the underlying idea of man. If we share Socrates' view of a completely rational homo economicus, akrasia simply does not exist. Similarly, Aristotle and Aquinas have regarded akrasia as a result of defective practical reasoning whose result is a morally bad action (see also Hare, 1963; Davidson, 1970). However, if we believe that akrasia goes beyond fallacious reasoning, the difficult question arises of what akrasia actually is. Some have claimed that it is a conflict of competing forces, for instance, according to Augustine, between incompatible volitions. Others have described it as an instance of self-deception (Wolf, 1985; Schälike, 2004). In an Aristotelian vein, Beier (unpublished manuscript, see also Beier, 2010) argues that it is a result of underdeveloped virtues, that is, a defect in character building.

As far as we know, the philosophical concepts and theories concerning akrasia and related phenomena have not yet been linked to empirical research on defects of self-control, empathy, and self-involvement. Such an enterprise might, however, provide fruitful insights for both approaches. As the literature on behavioral and neuroscientific research is vast, we shall confine ourselves to a very brief review of evidence concerning *self-control* here.

An action out of self-control is generally defined as the choice of larger-later rewards over smaller-sooner ones (Siegel and Rachlin, 1995). Self-control has also been defined as the regulation of habits. From another perspective, self-control amounts to the control of emotional reactions (Ochsner and Gross, 2005). Both the second and the third approach regard self-control as a control of automatic reactions involving similar neural circuits. Neuroscientific research investigating the brain areas involved suggests that the dorsolateral prefrontal cortex (DLPFC) modulates the value signal encoded in the vmPFC which in turn drives choices and decisions (Hare et al., 2009). The DLPFC promotes task-relevant processing and eliminates irrelevant activities. Future research into DLPFC and its interactions with vmPFC and other brain regions may shed new light on how to analyze self-control and akrasia and how to influence those phenomena.

In sum, the philosophical conception of akrasia may be linked to a lack of self-control in the following way: relying on Beier, akrasia can be regarded as defective character building which essentially involves the development of self-control. This, in turn, will result in agents' falling prey to morally irrelevant aspects of a situation, such as cognitive biases or emotional influences, which affect behavior and judgment. To illustrate, consider an example from the previous section: depending on their emotional states, subjects regarded moral transgressions as more or less severe (Ugazio et al., 2012). That is, they could not separate their feelings from a consideration of a moral scenario, which amounts to a defect of control over the emotions.

Another issue that hinders us from acting in morally decent ways may be certain *character traits*. For instance, fanatic religiosity sometimes turns people into murderers. Such traits are presumably the product of both genetic dispositions and their shaping through education and self-reflection.

A general reason for both morally fallacious reasoning and failure to carry out the action identified as the right one is the evolutionary background of human beings. Morality can be viewed as a product of the phylogenetic history of our species which has evolved in an environment different from the one we live in today. More precisely, it is commonly believed that reciprocity became a part of moral behavior because it enhanced the evolutionary fitness of reciprocating individuals (reciprocal altruism, cf. Trivers, 1971). Similarly, prosocial behavior within a group increased the reproductive abilities of its members in comparison to non- or anti-socially behaving groups (group selection, cf. Sober and Wilson, 1998). Likewise, altruistic behavior toward one's own kin may increase the likelihood of spreading the shared genes (kin selection, cf. Hamilton, 1964).

To give some examples for evolutionary explanations of moral behavior, immediate and strong emotional reactions to a given situation probably evolved because they facilitate a quick reaction which in turn improved survival, for instance the fight-or-flight response to predators. The theory of kin selection can explain why humans evoke emotional reactions such as caring love toward their offspring and may favor them over foreigners: by helping the former and not the latter, their own genes are more likely to be passed on in the future. Likewise, we are now equipped with biases that automatically and unconsciously guide us in a way that helps to spread our genes. For instance, the identifiable victim effect increased the safety of the young in the agent's close environment who shared genes with him with higher probability than did children further away. According to group selection, such biased behavior also improved the evolutionary fitness of one's own group, as helping close-by group members rather than faraway out-group individuals would favor one's own group and eventually the agent himself.

Related to this point, reasons for why we do not behave in morally decent ways can be regarded from a cultural perspective. On this view, morality can be seen as a relatively recent development, crystallized in laws and rules for social conduct. In this vein, the philosopher Nietzsche (1966, p. 228) has argued against moral systems such as Kantian, Christian, and Utilitarian ethics, criticizing that these codes of conduct are "detrimental to the higher men" while benefiting the "lowest." From a similar perspective, morality may be seen as a fear of punishment which evolved originally and is exploited by legal systems. In this view, failures of morality arise whenever people do not experience enough fear of punishment. Presumably, the lack may come from the person or the situation.

Empirical research proves helpful to investigate and explain each of the problems mentioned, providing a basis on which we can search for solutions. We shall turn to this topic in the following section.

## **IMPROVING MORAL BEHAVIOR**

Having provided evidence (see section "People do not behave in a way they ought to") that people often make inconsistent, if not mistaken, moral decisions and act accordingly, and having explored possible explanations for such irrational behavior (in the previous section), in this section we discuss possible means of improving humans' moral decision capacities, particularly via nudging, training, pharmacology, and lastly brain stimulation.

#### **NUDGING**

A nudge has been defined as an "aspect of the choice architecture that alters people's behavior in a predictable way without forbidding any options" (Thaler and Sunstein, 2008, p. 6). Unlike regulation, nudging does not eliminate possible courses of action. For example, a school canteen can increase pupils' intake of vitamins by placing fruit salad or similar desserts in front of the chocolate cakes and sweets. This would be a nudge, whereas banning all alternatives to a healthy dessert would be a regulation. Nudging makes use of inclinations and biases, e.g., the fact that people tend to favor items displayed at eye level or often eat the portion they are served regardless of its size. Marketing strategies have long benefited from these insights, relying on long-lasting research projects into consumer habits and psychology.

Nudging has mainly been investigated as a means to tackle population health issues, such as obesity and addiction to alcohol, nicotine, or other substances (Downs et al., 2009; Just and Payne, 2009; Zimmerman, 2009). However, it can be equally relied on in order to approach moral issues: it provides paternalistic institutions with strategies to succeed in guiding their clients, patients, or charges to the morally right decisions or actions (Thaler and Sunstein, 2003). For instance, given the assumption that organ donation is a morally praiseworthy action, a government can achieve an increase in organ donors by making the donation of organs the default option, from which one has to opt out in order not to be a potential organ donor.

However, nudging in moral contexts raises a number of issues. First, it is questionable whether a morally praiseworthy action loses its praiseworthiness if it would not have been performed without the relevant nudge. This depends on whether an action is to be evaluated only on the basis of its results or also with regard to the states of mind of the agent. Second, as nudging itself seems morally neutral, the question arises how, taken in isolation, it could help us to improve moral decision-making and acting at all. Nudging may well be abused by the nudger for his personal interests. Third, the practice of nudging itself may be questioned on grounds of concern for autonomy and respect.

These and other questions will be discussed in Section "Should We Try to Improve, And Is It Possible?". For now, we shall outline some further means and methods that might be useful for an improvement of moral practice.

#### **TRAINING**

Although already Aristotle suggested that sound judgment needs practice, there is little empirical research on direct training of moral decision-making. In as far as it is feasible to train cognitive and emotional functions and such training transfers to other domains it may also be conceivable to improve moral decisionmaking indirectly by training these functions. Working memory performance increases with training techniques such as an adaptive dual n-back task (e.g., Jaeggi et al., 2008), or an adaptive order-and-location memory task (e.g., Klingberg et al., 2005; Thorell et al., 2009). Working memory training transfers to other domains, including fluid intelligence (e.g., Jaeggi et al., 2008), attention (Thorell et al., 2009), and response inhibition, at least in children with ADHD (Klingberg et al., 2005). However, transfer appears to occur primarily in closely related domains (Li et al., 2008) and only in individuals in which initial training is successful (Jaeggi et al., 2011).

Response inhibition can be trained with go/no-go and flanker tasks (Thorell et al., 2009), whereas executive attention improves after training with a battery of anticipation and stimulus discrimination exercises (Rueda et al., 2005), but these training effects seem to transfer less readily than those of working memory training. Based on the hypothesis that utilitarian components of moral decision-making depend more on cognitive factors than deontological ones (Greene et al., 2001), one may speculate that training cognitive factors would specifically improve utilitarian components of moral decision-making. However, given that transfer appears to be limited to closely related domains, it is questionable whether moral behavior would benefit from such training.

Training of emotional factors can improve aspects of moral decision-making. For example, a Buddhist compassion-enhancing technique increases provision of help to another player in a virtual treasure hunt game (Leiberg et al., 2011). In the same game, the duration of compassion training correlates with helping particularly in situations in which the other player cannot reciprocate help. By contrast, compassion training does not affect giving money to others in a dictator game, where subjects decide how to split an amount of money assigned to them between a stranger and themselves (Leiberg et al., 2011). Taking these findings further, one may wish to investigate whether deontological components of moral decision-making are influenced more by emotion training than utilitarian components.

Self-control can be increased in rats by raising the effort levels required for achieving reinforcement, and in humans through exercises such as monitoring and improving posture, trying to improve mood states, and monitoring eating (reviewed in Strayhorn, 2002). Accordingly, it has been proposed that self-control acts like a muscle that can be trained or fatigued depending on experience (Baumeister et al., 1994). Insofar as self-control reflects a virtue, self-control training may be beneficial from a virtue ethics perspective.

## **EDUCATION**

Moral education has a long tradition and received consideration from all three philosophical theories introduced above (Althof and Berkowitz, 2006). It largely follows on from the (deontologically flavored) views of Piaget and Kohlberg and focuses primarily on the development of moral reasoning. By contrast, the related character education has a stronger grounding in virtue ethics and utilitarianism and aims to promote moral actions leading to good consequences in educated citizens (Althof and Berkowitz, 2006).

Within a Kohlbergian framework, interventions specifically designed to promote moral education are more effective than control interventions or the passage of time (Schlaefli et al., 1985; cf. King and Mayhew, 2002). Moreover, longer-term interventions (up to 12 weeks is optimal) that focus on peer discussion of moral dilemmas, thereby leading to practice in moral problem solving, and interventions that focus on personality development and self-reflection are more effective than shorter-term interventions (≤3 weeks) and interventions that focus on academic content such as criminal justice, law, and social studies (effect sizes: 0.36–0.41 versus 0.09; Schlaefli et al., 1985). Treatment effects are more pronounced in older (≥24 years old) compared to younger (13–23 years old) subjects, although this may be partly due to selection bias (older subjects are more likely to be volunteers) or other methodological issues. Although the effect sizes of interventions are small to moderate, they correspond to 4–5 years of natural growth compared to no intervention (Schlaefli et al., 1985), suggesting that education may be a promising avenue for future research.

## **PHARMACOLOGICAL ENHANCEMENT**

The field of cognitive enhancement by pharmacological means has received attention in recent years (reviewed, e.g., in Jones et al., 2005; Illes and Sahakian, 2011) but the first empirical investigations have focused primarily on improving cognition as such, rather than on moral decision-making. Below, we review a few example studies with a more direct link to moral behavior. Before going further though, it is important to note a few caveats:

(1) It is not necessarily the case that more of a given pharmaceutical agent results in monotonic increases in function. Instead, at least some functions may require an intermediate level of the agent. Increases beyond that level result in decreases in the function. An example for this notion comes from working memory and dopamine (reviewed, e.g., in Cools and D'Esposito, 2011).


Intranasal administration of oxytocin (24 international units) increases trust in the trust game (Kosfeld et al., 2005). More specifically, the average initial amount passed by an investor to a trustee is 17% higher under oxytocin (45% of participants showing maximal trust) than under placebo (21%). Proposers' offers are also enhanced by oxytocin in the ultimatum game (Zak et al., 2007). By contrast, non-social risk taking, trustworthiness of trustees (the amount returned by trustees) and amounts offered in the dictator game remain unaffected by oxytocin, excluding less specific effects on risk perception and prosociality more generally. Thus, oxytocin enhances an emotional aspect of moral behavior.

The administration of a selective serotonin reuptake inhibitor (30 mg Citalopram) increases the propensity with which people judge harming others as forbidden, if the inflicted harm is personal and emotionally salient (Crockett et al., 2010a). Moreover, it reduces the rejection of unfair offers in the ultimatum game (Crockett et al., 2010a; the rejection of unfair offers harms the proposer). Thus, serotonin may facilitate prosocial behavior or moral judgments more generally by enhancing aversion to harming others.

## **tDCS/TMS**

Transcranial direct current stimulation (tDCS) is a technique which allows for modulation of regional neural excitability by applying weak currents. In short, neural activity (i.e., an action potential) is usually elicited when the membrane potential, usually −80 mV at rest, is depolarized to about −50 mV via driving inputs from other neurons. Applying weak currents (usually 1 or 2 mA) over a cortical area can increase or decrease the resting membrane potential, depending on the position and polarity (anodal or cathodal) of the electrode. Thus, tDCS can lead to an increase or decrease of the excitability and spontaneous activity in the neural tissue under the electrode.

Transcranial magnetic stimulation (TMS) is a technique of non-invasive brain stimulation which uses magnetic impulses to generate weak currents in specific brain regions. So far, two types of TMS have been used, single pulse TMS and repetitive TMS (rTMS). The first type of stimulation affects neural excitability similarly to anodal tDCS, resulting in a depolarization of the neurons targeted by the magnetic impulses. Such depolarization then results in the generation of action potentials in the stimulated neurons. By contrast, rTMS lasts much longer than single pulse stimulation. Therefore rTMS can increase or decrease the resting membrane potential of the stimulated brain region, depending on the intensity and frequency of the stimulation and on the coil orientation (Fitzgerald et al., 2006).

Using both these techniques scholars have shown that it is possible to directly manipulate social and non-social behavior in several tasks including temporal discounting (Figner et al., 2010) and norm compliance (Ruff et al., in preparation). The latter study focused directly on moral behavior (i.e., complying with behavior prescribed by a norm). Other studies investigated processes which are related to moral behavior such as contributing to the enforcement of a fairness norm by costly punishing defectors, or mechanisms involved in shaping individuals' impulsivity.

More specifically, Knoch et al. (2008) tested the role of DLPFC in punishing unfair behaviors. Measuring the altruistic punishments (Fehr and Gächter, 2002) that responders inflicted on unfair proposers while playing an ultimatum game (Andreoni et al., 2003), the authors showed that reducing excitability by means of cathodal tDCS in the DLPFC led to a reduction of punishments, compared to participants with intact DLPFC excitability. The authors therefore conclude that DLPFC neural activity has a causal role in the willingness to punish fairness norm violators.

Furthermore, Figner et al. (2010) revealed a role of the LPFC for self-control in intertemporal choice behavior. Intertemporal choices require one to decide between receiving a smaller good (e.g., money or food, but also health benefits) in the near future (usually immediately, but also in days or months) or a larger good in a more distant future. Depending on the options an individual chooses, it is then possible to measure their level of self-control: the more they prefer the distant-in-time option, the higher their self-control level. Disrupting LPFC excitability by means of rTMS resulted in decreased self-control, as people more often chose the immediate smaller good over the alternative option.

Taken together, these studies show that brain stimulation could influence two mechanisms strongly related to moral behavior, i.e., self-control and willingness to punish norm violators, as they are involved in social decisions where one is required to choose between a personal gain or benefiting the society (Elster, 1989; Fehr and Gächter, 2002; Fehr and Fischbacher, 2004; Crockett et al., 2010b).

Furthermore, the link between these two mechanisms and moral behavior is made more salient in a more recent study by Ruff et al. (in preparation). In this study we show that the LPFC is causally necessary to avoid altruistic punishment, inducing people to share fairly between oneself and another person when punishment for unfair behavior is allowed. More specifically, increased LPFC excitability (by means of anodal tDCS) resulted in more successful social interactions compared to decreased LPFC excitability (by means of cathodal tDCS) or natural LPFC excitability (sham stimulation). This study thus suggests that it is possible to improve moral behavior by increasing sensitivity to punishment threat, which is possibly achieved as a side effect of improving self-control.

Finally, in a more recent study, Tassy et al. (2012) examined the effects of disrupting the right PFC by means of rTMS on moral judgments expressed in the context of moral dilemmas where a person is called to judge if it is morally permissible to sacrifice a small number of people (usually one) to save the lives of many more (usually five). The evidence reported by these authors shows that, compared to controls with undisrupted right PFC activity, disruption leads to a higher likelihood of making utilitarian judgments.

## **SHOULD WE TRY TO IMPROVE, AND IS IT POSSIBLE?**

Relying on the evidence outlined so far, this final section discusses the question of whether we should make use of the knowledge gained from empirical research on human behavior and psychology in order to improve moral practice and/or decision-making. Even if we arrive at a positive answer to this question, however, it remains unclear how this project ought to be carried out and whether, in turn, this is possible. We shall discuss the former question first and then turn to the question of implementation.

Whether we should strive for moral improvement depends on (a) whether we believe that it is something worth striving for, (b) whether, assuming that we think it is, we should strive for it, and (c) granted that we should, whether the methods and techniques outlined in this essay provide morally acceptable means for such a project.


link itself may be questionable from a moral point of view. For instance, from a consequentialist perspective, it might be better if everybody acted upon certain rules laid out by some ethical framework, not upon their own moral convictions. A second point that can be made in this context is that, as a matter of fact, people generally strive for moral improvement or at least they claim to do so, i.e., they want to act in a more decent way, they want to become morally better individuals, they want the world to be a morally better place, etc. Three remarks shall be made about this: First, the folk notions of morally good individuals, actions, and states of affairs are vague and require clarification. Second, it is debatable whether people really do claim to strive for moral improvement and in which contexts and, again, what they understand by it. Third, it may be questioned whether their claim is appropriate, i.e., whether they are in fact concerned about moral improvements or only just say so. All these and other questions are worth pursuing in the future.

(c) An extended debate has arisen around this question for every single method we have outlined above (e.g., for the debate on enhancement: Douglas, 2008; Savulescu and Bostrom, 2009; Savulescu et al., 2011). Due to space limitations, we shall therefore only mention a few important arguments here.

First, the mere possibility of moral improvement may count in favor of such a project, once it is acknowledged that moral improvement is desirable and ought to be aimed at. Furthermore, it may be viewed as an extension of methods that are already used for moral improvement at present, e.g., teaching, self-reflection, etc.

Second, and in contrast to the position just sketched, it may be doubted that any of the methods and techniques provides an acceptable way to moral improvement at all. Several reasons may be given for this position. To begin with, one may be skeptical about whether any of the approaches outlined above can really yield actual moral improvement. After all, so far only small, primarily short-term and reversible effects have been achieved. Yet, although it seems plausible that there is a limit to improvement given the constraints of the human mind and body and that moral *perfection* cannot be achieved, it seems doubtful that no improvement is possible at all. The empirical evidence we have reviewed above supports this notion.

Also, it may be argued that the methods for improvement are not reliable because further research is required in order to allow for their responsible application. However, it may be replied from a consequentialist perspective that such risks can be accounted for by calculating the sum of all possible outcomes each multiplied with the probability of its occurrence. For some techniques such as nudging, no morally neutral default option is available: e.g., either a country's citizens are organ donors by default or they are not, but each option invokes moral issues and there is no option outside of the moral realm.
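The consequentialist reply above amounts to an expected-value calculation. As a minimal sketch, with symbols introduced here for illustration only (they do not appear in the original argument): if an intervention $a$ can lead to outcomes $i = 1, \dots, n$, each with value $v_i$ and probability $p_i$ given $a$, its expected value is

$$
\mathbb{E}[V \mid a] = \sum_{i=1}^{n} p_i \, v_i , \qquad \sum_{i=1}^{n} p_i = 1 .
$$

On this reading, risky outcomes enter the sum with their own (possibly negative) values, so the calculation presupposes that both the probabilities and the values can be estimated.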

Third, a debunking argument in favor of applying the techniques and methods described could be established on the ground that all considerations speaking against such a project are merely products of a human *status quo* bias.

Much more could be said on each of the considerations described above. We assume that there is enough evidence to suggest that attempts at moral improvement may be promising.

Let us now turn to the question of implementation: if we assume that we should try to achieve moral improvement, should such a project actually be carried out and if so, how? As the matter here is complex and partly speculative, we shall restrict ourselves to providing a brief sketch of two issues that are relevant to this debate.

First, to even start considering improving moral behavior, one has to tackle the complex philosophical issue of identifying a standard for moral improvement. This might require defining an ultimate universally accepted moral code, or agreeing on a set of general moral rules, be they consequentialist or non-consequentialist. Such a standard would then have to be used to gear interventions used to improve moral behavior. Whether it is in principle possible to identify such a standard, however, is highly controversial. For one thing, moral relativists hold that moral standards are relative to a culture (Wong, 1984) and thus prescribe very different behaviors. Some, for instance, forbid abortion while others allow it. Improving moral behavior may thus be specific to every moral community sharing the same moral standards. More profoundly, one may be skeptical about whether it is in principle possible to achieve agreement on moral questions, given that current debates about moral issues reveal both intercultural and intracultural discrepancies. For instance, from a consequentialist perspective, it may be a moral improvement to increase the number of potential organ donors, but from some religious or deontological perspectives, this would be regarded as immoral.

Moreover, there is the danger of abuse by the agents or institutions in charge of implementing a process of moral improvement. Determining a prudent and trustworthy authority for this task may be extremely difficult if not impossible. Most people seem unwilling to entrust others with the care of their moral development.

Second, from a more practical standpoint, altering moral behavior may not yield the desired improvement effects or may have counterproductive side effects. For instance, promoting trustfulness may result in exploitation of trustful agents, and increasing altruistic behavior may unfairly benefit selfish individuals who could easily take advantage of altruists. In addition, the danger of a moral "lock-in" is lurking: once a process of alleged moral improvement has begun, it may be irreversible, as the moral outlook produced by this process may prevent us from reviving lost values; mistakes may become uncorrectable.

In sum, the question of whether we should try to achieve moral improvement and whether this is possible raises a legion of extremely controversial questions. Note that the present paper itself does not mean to take a normative position on the issue of whether morality should be improved. The above points are merely meant to provide some leads for the debate.

## **CONCLUSION**

The aim of this paper was to investigate why individuals often fail to judge and act in a morally decent way and what one can do about it. Investigations of morally problematic and inconsistent behavior, dominated by, e.g., cognitive biases and emotional influences, have revealed two main clusters of reasons: first, agents reason in fallacious ways, and, second, in judging or acting, they fail to account for their moral convictions. These phenomena allow for several ways of improvement. For instance, nudging may facilitate actions in accordance with moral aims; training and education may improve agents' capacities for moral reasoning; and pharmacological enhancement and transcranial stimulation techniques may yield improvements of both moral reflection and the capacity to act morally. However, the impact and application spectrum of all these methods have not yet been thoroughly studied, as their development is still an ongoing process. An answer to the question of whether they should be implemented not only depends on future research in this field but also requires careful philosophical consideration and societal debate. We believe that these endeavors are highly relevant for a possible improvement of moral practice and therefore for the future of humanity in general.

## **ACKNOWLEDGMENTS**

We are very much indebted to Gina Rini and Simona Aimar for valuable comments on an earlier draft, and we would like to acknowledge helpful discussions with Todd Hare. Correspondence concerning this article should be addressed to Nora Heinzelmann, Mansfield College, Mansfield Road, Oxford OX1 3TF, UK, nora.heinzelmann@philosophy.ox.ac.uk, or Giuseppe Ugazio, Laboratory for Social and Neural Systems Research, University of Zurich, Blümlisalpstrasse 10, CH-8006 Zurich, giuseppe.ugazio@econ.uzh.ch. This work was supported with funding from a Swiss National Science Foundation Professorship (PP00P1\_128574) to PNT.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2012; accepted: 07 June 2012; published online: 06 July 2012. Citation: Heinzelmann N, Ugazio G and Tobler PN (2012) Practical implications of empirically studying moral decision-making. Front. Neurosci. 6:94. doi: 10.3389/fnins.2012.00094*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Heinzelmann, Ugazio and Tobler. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Optimal short-sighted rules

## **Sacha Bourgeois-Gironde\***

Laboratoire d'Economie Moderne, Université Paris 2, Paris, France

#### **Edited by:**

Gabriel J. Mograbi, Federal University of Mato Grosso, Brazil

#### **Reviewed by:**

Marijn Van Wingerden, Heinrich-Heine University Duesseldorf, Germany; Kunjumon I. Vadakkan, University of Manitoba, Canada

#### **\*Correspondence:**

Sacha Bourgeois-Gironde, Laboratoire d'Economie Moderne, Institut Jean-Nicod, Université Paris 2, Pavillon Jardin – Ecole Normale Supérieure, 29, rue d'Ulm, Paris 75005, France. e-mail: sbgironde@gmail.com

The aim of this paper is to assess the relevance of methodological transfers from behavioral ecology to experimental economics with respect to the elicitation of intertemporal preferences. More precisely our discussion will stem from the analysis of Stephens and Anderson's (2001) seminal article. In their study with blue jays they document that foraging behavior typically implements short-sighted choice rules which are beneficial in the long run. Such long-term profitability of short-sighted behavior cannot be evidenced when using a self-control paradigm (one which contrasts in a binary way sooner smaller and later larger payoffs) but becomes apparent when ecological patch-paradigms (replicating economic situations in which the main trade-off consists in staying on a food patch or leaving for another patch) are implemented. We transfer this methodology in view of contrasting foraging strategies and self-control in human intertemporal choices.

**Keywords: behavioral ecology, intertemporal choice, myopia, patch-paradigms, self-control**

## **INTRODUCTION**

The aim of this paper is to assess the relevance of methodological transfers from behavioral ecology to the neuroeconomics of intertemporal choices. More precisely, our discussion stems from the analysis of Stephens and Anderson's (2001) seminal article. In their study with blue jays they report that foraging behavior typically implements short-sighted choice rules which are beneficial in the long run. Such long-term profitability of short-sighted behavior cannot be evidenced when using a self-control paradigm (one which contrasts in a binary way sooner smaller and later larger payoffs) but becomes apparent when ecological patch-paradigms (replicating economic situations in which the main trade-off consists in staying on a food patch or leaving for another patch) are implemented [see **Figure 1**]. Stephens and Anderson show that in certain situations (self-control settings) the immediate consequences of choice strongly influence animal behavior, while in other situations (stylized patch situations) animals adopt strategies apparently consistent with evolutionary models that emphasize the long-term fitness consequences of individual choices.

We schematize the two types of experimental paradigms and then address our target question: to what extent is it theoretically relevant to generalize them to issues recently addressed in the neuroeconomics of intertemporal choices? We defend a dual system underlying intertemporal choices, which is, however, distinct from McClure et al.'s (2004) view of a limbic system and a prefrontal system respectively encoding impatient and patient intertemporal choices. We focus instead on the contextual dependence and relevance of each of the two systems involved in that type of choice, which pleads in favor of the plausibility of optimal short-sighted behavior. This line of argument is briefly related to evolutionary considerations.

## **PATCHES AND SELF-CONTROL PARADIGMS IN INTERTEMPORAL CHOICE ELICITATION**

#### **ANIMAL SELF-CONTROL**

Evolutionary theory predicts preferences for long-term decisions, if the issue is to guarantee the replication of a subset of genes making up an individual organism over a given temporal delay (until decay). Self-control paradigms are supposed to elicit those preferences at the individual level. In these settings animals have to wait for a time ($T$) and then make a binary choice between (1) a small-immediate reward ($t_1 \to G_1 \to p$) and (2) a large-delayed reward ($t_2 \to G_2$; with $t_2 > t_1$ and $G_2 > G_1$), with a post-feeding delay ($p$) for one or both conditions. "Self-control" is defined as the case in which the subject waits for the large-delayed reward.

The long-term rate model predicts that in a self-control situation animals should choose alternative 1 when the ratio of the first gain amount ($G_1$) and the sum of the initial time ($T$), of the short delay ($t_1$) and the post-feeding delay ($p$) is greater than the ratio of the second gain amount ($G_2$) and the sum of the initial time ($T$) and the longer delay ($t_2$), that is to say: $G_1/(T + t_1 + p) > G_2/(T + t_2)$. Long-sighted decisions involve temporal elements that play an important role in determining preferences but, as experimental evidence shows, animals treat these temporal elements in different ways:

• *Delays between choice and food delivery* strongly influence foraging preferences; in fact, animals prefer the shorter delay even if the delayed amount is double (in some settings wherein self-control is particularly hard to maintain, among non-human animals only rhesus macaques seem to succeed; see Evans and Beran, 2007).

• *Post-feeding delay* has virtually no effect on animal preferences, which conflicts with far-sighted models (Stephens et al., 2004).

• *Inter-trial intervals* (ITIs) have little effect on preferences, which again disagrees with far-sighted models (as shown, for instance, in Schultz, 2010).

As we can see, self-control results contradict evolutionary models assuming long-term calculations. An obvious limitation of these models is that, while they can accommodate small discounting effects, they do not account for the long-term effects of systematic, iterated short-sighted decisions. However, the potential optimality of iterative myopic behavior in the long run can be elicited by using the alternative patch-paradigm.
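The long-term rate comparison used in this section can be made concrete with a minimal sketch. The function names and the numerical parameters are hypothetical and chosen for illustration only; the inequality itself is the one stated in the text, with the post-feeding delay applied to the small-immediate option.

```python
def long_term_rate(gain, initial_wait, delay, post_feeding=0.0):
    """Gain per unit of time over one full trial (wait + delay + post-feeding)."""
    return gain / (initial_wait + delay + post_feeding)


def long_term_choice(G1, t1, G2, t2, T, p=0.0):
    """Return 1 (small-immediate) or 2 (large-delayed).

    Follows the inequality in the text: option 1 is chosen when
    G1 / (T + t1 + p) > G2 / (T + t2). Applying p to option 1 only
    mirrors that inequality; it is not a claim about every design.
    """
    rate_small = long_term_rate(G1, T, t1, p)
    rate_large = long_term_rate(G2, T, t2)
    return 1 if rate_small > rate_large else 2


# Hypothetical parameters (seconds and food units), for illustration only.
for T in (10, 100):
    print(T, long_term_choice(G1=2, t1=5, G2=4, t2=50, T=T))
```

With these illustrative values the model selects the small-immediate option when the initial wait is short and the large-delayed option when it is long, which is the sensitivity to the inter-trial interval that the self-control data fail to show.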

#### **PATCHES**

In the patch-paradigm approach we define a "patch residence time" as the foraging duration spent by an animal in a particular area before it moves to another because it observes or anticipates a decrease in local resources (Stephens and Anderson, 2001). This approach relies on the prediction that patch residence times affect the long-term rate of food intake (Stephens and Krebs, 1986) and that foragers should spend more time in patches when travel times from one patch to another are longer. In fact, travel time plays a role similar to the ITI but, contrary to what we observed with the self-control approach, in patch situations its effect is crucial. To the extent that foragers can choose between a small amount of food reachable in a short time and a large amount of food reachable in a longer time located on another patch, patch-paradigms implement a critical travel time cost. The contrast between staying on a patch and leaving that patch is this time expressed by a two-argument function that includes time and gain.
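The prediction that longer travel times should lengthen patch residence can be illustrated with a small numerical sketch. It assumes a diminishing-returns gain function of exponential form, which is a hypothetical choice made here for illustration and is not specified in the text; the function names and parameter values are likewise assumptions.

```python
import math


def gain(residence_time, max_gain=10.0, depletion_scale=5.0):
    """Hypothetical diminishing-returns gain after a given time in the patch."""
    return max_gain * (1.0 - math.exp(-residence_time / depletion_scale))


def long_term_rate(residence_time, travel_time):
    """Food intake per unit time over one travel-plus-foraging cycle."""
    return gain(residence_time) / (travel_time + residence_time)


def best_residence_time(travel_time, step=0.01, horizon=100.0):
    """Grid search for the residence time that maximizes the long-term rate."""
    candidates = [i * step for i in range(1, int(horizon / step) + 1)]
    return max(candidates, key=lambda r: long_term_rate(r, travel_time))


short_trip = best_residence_time(travel_time=2.0)
long_trip = best_residence_time(travel_time=20.0)
print(short_trip < long_trip)  # longer travel favors a longer stay
```

Under this assumed gain function, the rate-maximizing residence time grows with travel time, which is the qualitative pattern the patch-paradigm relies on.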

In spite of evidence to the effect that far-sighted foragers are sensitive to ITI, a question remains unaddressed: why, in patch experiments, do long travel intervals tend to induce animals to spend more time extracting more food, while in self-control experiments ITI appears to have little effect? This question, as well as the apparent evidence that animals always adopt myopic strategies, has been tackled in an experiment where self-control and patch situations are parameterized as economically equivalent. In this experiment animals are trained to make, as before, either (1) a binary choice between a small-immediate and a large-delayed gain (self-control), or (2) a choice between "leave" (small-immediate) and "stay" (large-delayed; patch-paradigm). The two situations are economically equivalent insofar as they both present the same conditions in terms of time and rewards (the same time/gain function as before). Since the two situations are economically equivalent, if it is true that animals always adopt short-term strategies, these strategies should be observable in both self-control and patch situations.

In order to establish the different patterns of choices in the self-control and patch-use contexts, and because they had observed that ITI has an effect only in patch experiments, the authors tested each context at three distinct ITIs. To describe the differences in each combination (self-control/patch and ITI) they measured the effect for both 50 s and 5 s levels of delay-to-small reward. Results of the experiment demonstrated that when the delay-to-small reward (below abbreviated as DTS) was large (50 s), preferences of the blue jays were not affected by the ITI. However, when the delay-to-small was brief (5 s), the outcome was less tractable. In the self-control situation, the jays' preference for large rewards decreased as the ITI increased, while in the patch-use condition the subjects' preference for the large reward increased together with the ITI. As predicted by evolutionary hypotheses about long-term fitness maximization, patch-use situations revealed that jays favor large-delayed outcomes as ITI increased, but recall that in self-control cases the conclusion was precisely the opposite.

To sum up, results show that:

• If DTS = 50 s, then ITI has no effect on preference, but animals prefer the large reward in the patch context.

• If DTS = 5 s, then preference for the large reward increases with ITI in the patch context, while it decreases with ITI in the self-control context. This shows an interaction between ITI and context when DTS is brief.

The hypothesis proposed to explain these different behavioral patterns is that a single short-sighted behavioral rule underlies the approach to the different environments and their economic parameters. Self-control situations involving binary choices trigger a short-term rule that can be expressed simply as: "Choose 2 if $G_2/t_2 > G_1/t_1$." This rule evidently disagrees with long-term maximization and ignores the potential impact of ITI in self-control contexts. However, the very same rule when applied in patch contexts may yield an optimal outcome, given that in these contexts the rule can be expressed as: "Choose 2 if $[(G_2 - G_1)/(t_2 - t_1)] - [G_1/(T + t_1)] > 0$." The difference in long-run optimality of the rule across the two experimental paradigms can be easily explained if we pay attention to the fact that, in the patch context, the difference in short-term rates is equivalent to the difference in long-term rates, because there the short-term rule predicts sensitivity to $T$, the ITI term that constitutes part of the key delay. Based on this result, it is possible to conclude that the short-term rule not only agrees with, but significantly determines, the difference in long-term rates; that is to say, the short-term rule explains long-term maximization in the patch situation, contrary to the self-control situation.
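The equivalence claimed for the patch context can be checked directly. Assuming $t_2 > t_1$, positive delays, and no post-feeding delay ($p = 0$, an assumption made here to keep the comparison simple), the patch rule rearranges into the long-term rate comparison:

$$
\frac{G_2 - G_1}{t_2 - t_1} > \frac{G_1}{T + t_1}
\iff (G_2 - G_1)(T + t_1) > G_1 (t_2 - t_1)
\iff G_2 (T + t_1) > G_1 (T + t_2)
\iff \frac{G_2}{T + t_2} > \frac{G_1}{T + t_1}.
$$

The following sketch applies the three rules to hypothetical parameter values; the function names and numbers are illustrative assumptions, not taken from the experiment.

```python
def short_term_self_control(G1, t1, G2, t2):
    """Binary-choice rule from the text: choose 2 if G2/t2 > G1/t1."""
    return 2 if G2 / t2 > G1 / t1 else 1


def short_term_patch(G1, t1, G2, t2, T):
    """Patch rule from the text: choose 2 (stay) if
    (G2 - G1)/(t2 - t1) - G1/(T + t1) > 0, else 1 (leave)."""
    return 2 if (G2 - G1) / (t2 - t1) - G1 / (T + t1) > 0 else 1


def long_term(G1, t1, G2, t2, T):
    """Long-term rate comparison, assuming no post-feeding delay (p = 0)."""
    return 2 if G2 / (T + t2) > G1 / (T + t1) else 1


# Hypothetical amounts and delays; only the ITI (T) varies.
G1, t1, G2, t2 = 2, 5, 4, 60
for T in (30, 90):
    print(T,
          short_term_self_control(G1, t1, G2, t2),
          short_term_patch(G1, t1, G2, t2, T),
          long_term(G1, t1, G2, t2, T))
```

In this illustration the self-control rule returns the same choice whatever the ITI, whereas the patch rule tracks the long-term comparison as $T$ changes.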

## **OPTIMAL FORAGING STRATEGIES VS. APPARENT LACK OF SELF-CONTROL IN HUMAN INTERTEMPORAL CHOICE**

Discounted utility theory (DUT) is the normative model used to account for intertemporal decisions. This model is intended to capture the rationality of preferences over options located at different points in time, under the joint criteria that those preferences are logically coherent, consistent over time, and yield optimal payoffs. However, DUT has restricted descriptive validity because it fails to capture more or less systematic violations of the temporal consistency of preferences. As neatly put by Kalenscher and Pennartz (2008): "Common difference and immediacy effects and the fact that preference reversals occur after deferring all choice alternatives into the future by the same interval, violate assumptions of consistent choice." Foraging animals' preferences might not essentially depend on the proportion of rewards and delays presented by alternative options but rather on the waiting time prior to the gains. The comparison of results for similar economic parameters over the two experimental paradigms demonstrates their incompatibility with an interpretation of foraging behavior in terms of sacrifice rather than maximization. It is not necessary to discard a short-term gain in order to maximize one's fitness in the long run, and short-term benefits may add up to optimal payoffs.
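The preference reversal mentioned in the quotation can be illustrated with a small numerical sketch. It assumes the standard exponential form for DUT and a simple hyperbolic form as the descriptive alternative; the discount rate, reward amounts, and delays are hypothetical and chosen only to make the contrast visible.

```python
import math

# Hypothetical discount rate (per day); not a value taken from the text.
K = 0.1

def exponential_value(amount, delay, k=K):
    """Exponential discounting, the form assumed by discounted utility theory."""
    return amount * math.exp(-k * delay)

def hyperbolic_value(amount, delay, k=K):
    """Simple hyperbolic discounting, a common descriptive alternative."""
    return amount / (1.0 + k * delay)

def prefers_larger_later(value, shift):
    """True if the larger-later option is valued above the smaller-sooner one
    once both options are deferred by the same interval `shift`."""
    small, small_delay = 10.0, 0.0    # hypothetical smaller-sooner option
    large, large_delay = 15.0, 10.0   # hypothetical larger-later option
    return value(large, large_delay + shift) > value(small, small_delay + shift)

for name, value in (("exponential", exponential_value), ("hyperbolic", hyperbolic_value)):
    now = prefers_larger_later(value, shift=0.0)
    deferred = prefers_larger_later(value, shift=100.0)
    print(f"{name}: larger-later preferred now={now}, after common deferral={deferred}")
```

Under these assumptions the exponential chooser ranks the two options the same way before and after the common deferral, while the hyperbolic chooser prefers the smaller-sooner option when it is immediate and the larger-later option once both are pushed into the future.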

Let us note that these results in behavioral ecology are consistent with the findings of McClure et al. (2004), who observed that neural activity in the limbic system was greater for decisions involving choices between immediate and delayed rewards than for choices between delayed rewards only. Some specific neural mechanism is involved when short-term options are available. Yielding to immediate small rewards may be evolutionarily advantageous because once a small reward is consumed, it is out of sight and no longer a temptation, and the subject can pursue its longer-term goals. If gains are easy to grab, with very low opportunity costs, their immediate consumption may enhance the pursuit of life strategies by smothering tingling appetites. Our foraging ancestors may have developed this sense of taking advantage of small rewards as they presented themselves in their environments. Neural mechanisms dedicated to the valuation of those immediate rewards may thus have developed in order to deal properly with scarce and random resources. In our contemporary economic environments, this neural system may still prove useful. However, this intuitive and evidence-based dual-system approach defended by McClure and his colleagues is far from unanimously accepted.

Kable and Glimcher (2007) have certainly stated one of the most potent objections to the view that intertemporal choices are supported by a dual system such as the one McClure describes. More exactly, they contend that one general valuation system deals with the different characteristics of economic options. According to these authors, it is a complex but single brain system that is involved in intertemporal choice: they show that the ventral striatum, the medial prefrontal cortex, and the posterior cingulate cortex track the subjective value of monetary rewards. Relative valuation, encoded by neural activity in the different areas constituting this whole system, corresponds to the selective manipulation of the economic characteristics of the rewards. Namely, activity in those three main regions increases as the amount of the reward increases and decreases as the delay to that reward increases. Kable and Glimcher thereby reduce intertemporal choice to option valuation according to different features processed single-handedly by one common neural valuation system.

We argue in favor of a middle way between these two opposite neuroeconomic positions. The phenomenon of patience vs. impatience is robust, but the current analyses of how such contrasting choices are encoded by the brain may miss the main point about the nature of these choices. Kable and Glimcher (2007), in our opinion, rightly point to the fact that as far as economic valuation is concerned, one neural system, with internally differentiated modulation of activity, may be enough. The point is that economic valuation is not the only parameter at stake (notwithstanding its relative complexity in terms of magnitude/delay trade-offs). Contextual evaluation in terms of probability of reward and richness of the environment, as part of a broadened ecological approach to what intertemporal choices are like in natural and artificial economic settings, is essential to the nature of intertemporal choices and may motivate the adoption of a dual neural system in order to account for the contrast between apparent patience and impatience. But *pace* McClure et al. (2004), the dual system in question is not best explained in terms of those insufficiently contextualized behavioral denominations (patience/impatience) but rather in terms of optimal short-sighted behavior vs. optimal long-sighted behavior.

Kolling et al. (2012) have recently explored the neural mechanisms of foraging in human subjects. They demonstrate that humans can alternate between "stay" and "leave" strategies in multi-branched patch settings such as the ones we have schematized above. Humans aptly process the costs inherent in foraging choices. The contrast between such choices involves neural structures that partly (but only partly) overlap with the valuation system indicated by Kable and Glimcher (2007) and that cut across the limbic and prefrontal systems respectively associated by McClure et al. (2004) with impatient and patient choices. "Stay or leave" choices in foraging settings involve distinct neural mechanisms in the ventromedial prefrontal cortex (VMPC) and the anterior cingulate cortex (ACC). VMPC activity is dedicated to a general valuation system, as reported by Kable and Glimcher, but the ACC encodes the search cost and the potential richness of alternative patches in the environment, which is sufficiently neurally specific to this type of intertemporal choice setting. It therefore seems relevant to us to assess the optimality of short-sighted and long-term choice behavior in terms of (i) the structure of economic settings (i.e., whether they present foraging potentialities or binary frames requiring self-control) and (ii) the correlation between the economic structure (here in terms of richness and search cost) and the contextual relevance of the behavioral rules used within these structures.

## **CONCLUSION**

Modern economic environments are labile and complex, and the propensity to accept small rewards may be optimal in the face of the opportunity costs of more sophisticated strategies. It is also possible that the incorporation of long-term plans and self-projections in the far future into present decisions is evolutionarily more recent than the tendency to accept immediate gratification. From that evolutionary perspective, the preference for small immediate rewards over larger future ones is not a sign of our irrationality, but may rather reflect the conflict between two evolved rational rules: the incremental pursuit of long-term goals and the maximization of low-cost immediate rewards. Patch paradigms used in behavioral ecology demonstrate precisely the compatibility and optimal coincidence of these potentially jointly evolutionarily selected behavioral rules. The apparent conflict shown by opposed behavioral data over self-control and patch paradigms is resolved if one considers, on the one hand, that aggregate immediate gains may add up to maximizing long-term fitness and, on the other hand, that predefined long-term goals are endogenously modified by the choices actually made.

Monterosso and Ainslie (1999) note that "people and less cognitively sophisticated animals do not differ in the hyperbolic form of their discount curves." Some researchers (e.g., Herrnstein, 1997; Rachlin, 2000) hold the view that hyperbolic time discounting is effectively "hardwired" into our evolutionary apparatus. However, time discounting in humans and other animals may also rely on qualitatively different mechanisms. While humans and animals discount the future at dramatically different rates, both display a common pattern of time discounting commonly referred to as "hyperbolic time discounting." However, they believe that while such findings do not rule out the possibility that humans and animals discount the future similarly, the quantitative discontinuity is indicative of a qualitative discontinuity. It is not that clear that discounting in humans and other animals relies on qualitatively different mechanisms, even though recent neuroeconomic studies (such as McClure et al., 2004) have tended to support the view that human time discounting reflects the operation of two fundamentally different systems, one that heavily values the present and cares little about the future (which we share with other animals), and another that discounts outcomes more consistently across time (which is uniquely human). More extended and systematic comparisons between foraging-patch and self-control paradigms among human subjects could help to revisit this view.

Microeconomic research has seldom considered animals as possible research subjects, but recent evolutionary theories of human and animal decision making suggest how such a transfer of methodologies and theoretical goals could be fruitful (Kalenscher and van Wingerden, 2011). Starting from evolutionary considerations, we can understand how uncovering choice mechanisms in animals and their neural substrates may help us understand human intertemporal choice behavior. Moreover, economic theories and ecological models show remarkable similarities in their assumptions and implications (Stephens and Krebs, 1986). Although the decision rules used by modern humans now operate in a different context, they evolved in a similar context and they may actually be maladaptive today to some extent (Kahneman and Tversky, 1996). But it can also be envisioned that Stephens and Anderson (2001) provide a useful tool for understanding that modern humans' decision strategies are optimally adapted to the sequential foreground/background environment faced by foragers, while at the same time they may fail to produce an optimal outcome in "modern" binary choice environments.

#### **REFERENCES**

decision-making. *Prog. Neurobiol.* 84, 284–315.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 February 2012; accepted: 20 August 2012; published online: 11 September 2012.*

*Citation: Bourgeois-Gironde S (2012) Optimal short-sighted rules. Front. Neurosci. 6:129. doi: 10.3389/fnins.2012.00129*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Bourgeois-Gironde. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Time, self, and intertemporal choice

## **Cintia Retz Lucci 1,2\***

<sup>1</sup> Institut Jean Nicod, UMR 8129 CNRS, Institut d'Étude de la Cognition, Ecole Normale Supérieure – Ecole des Hautes Etudes en Sciences Sociales, Paris, France

<sup>2</sup> Laboratory of Cognitive Neuroscience, INSERM U960, Institut d'Étude de la Cognition, Ecole Normale Supérieure, Paris, France

#### **Edited by:**

Gabriel J. Mograbi, Federal University of Mato Grosso do Sul, Brazil

#### **Reviewed by:**

Hyojung Seo, Yale University School of Medicine, USA; George Ainslie, Coatesville VA Medical Center, USA

#### **\*Correspondence:**

Cintia Retz Lucci, Institut Jean Nicod (ENS), L'École des Hautes Études en Sciences Sociales, 29 Rue d'Ulm, Pavillon Jardin, Paris 75005, France. e-mail: c\_lucci@yahoo.com.br

Neuroscientific studies of intertemporal choice (IC) have focused mainly on the neural representation of self-control mechanisms and valuation. This reflects what has been considered as the core of the IC phenomenon. The claim of this paper is that deviations from exponential reward discounting as a function of time might be fully accounted for by the deviation of subjective time from calendar time. This claim is based on evidence that specificities of time perception can modulate discounting. Consequently, time perception is fundamental to IC and it is crucial to understand the mechanisms underlying time processing in different situations; to investigate when human time perception differs from time as represented by the calendar metric system; and to study how time perception predicts choices. This paper surveys the recent literature on time perception in order to discuss the measuring of IC through time-perception specificities. The notion of self is also discussed within this temporal perspective. If time perception modulates discounting, and time perception is related to self, the relationship between self and time perception becomes a new path to be explored in the IC studies.

**Keywords: human time perception, discounting, self-referential processing**

## **INTRODUCTION**

An extensive literature in economics has explored the sources and consequences of the daily difficulties we experience when making intertemporal choices (IC), that is, decisions in which the moment of choice and the associated consequences are separated in time. The way humans discount values through time continues to motivate investigation into the mathematical representation that best fits real decisions (e.g., Benhabib et al., 2010; Ray and Bossaerts, 2011; Takeuchi, 2011). The observed pattern of delayed value discounting has also been explained in terms of procrastination (e.g., O'Donoghue and Rabin, 2000), self-control problems (e.g., Laibson, 1997), the multiple-selves perspective (e.g., Ainslie, 1992), the visceral factor hypothesis (Loewenstein, 1996), and projection bias when predicting future utilities (Loewenstein et al., 2003).

Neuroscience can potentially increase the precision of the parameters of existing models. It can also propose new and important elements to the explanation of IC. This paper argues that an aspect relevant to the study of IC – human time perception – has not received enough attention. If the specificities of time perception are intrinsic to the patterns displayed in IC behavior, how can our models take account of this?

Currently, neuroscientific studies on IC have focused mainly on the neural correlates of self-control and reward evaluation. This reflects what is considered to be the core of the IC phenomenon in economics; hence, time perception does not seem to be included. Delay is usually assessed by observing the activation of other mechanisms, such as those known to underlie impulsive behavior (e.g., Roesch et al., 2006). Now, if time perception were considered intrinsic to IC, one could expect that experiments would be designed first of all to study the mechanisms that underlie time processing in different situations, and how the operation of these mechanisms predicts choices.

In fact, the analysis of human time perception shows wide variation in time processing, presently overlooked by the standardized metrics of time assumed by IC research. One week from now may be perceived as longer than the same period of 7 days 1 year from now. Therefore, a number followed by the word "days," "months," or "years" might not be sufficient to account for variations in temporal discounting. Different ways of reading experimental results, according to these different metrics, can lead to quite different interpretations of the data. This paper discusses the consequences of these variations.

The paper is organized as follows: Section "The Nature of the IC Phenomenon" shows that IC research has not given time processing mechanisms a central role, and explains why it should; Section "Time Perception in the Brain" surveys evidence showing divergences between time perception in humans and calendar time, and outlines studies that analyze the accuracy of models of IC when psychological features of time perception are taken into account. The Section "Are Time Perception, Self, and Discounting Related?" discusses two potential basic components of discounting: human time perception, but also, the notion of self.

## **THE NATURE OF THE IC PHENOMENON**

In IC situations, people tend to prefer immediate satisfaction over a delayed and bigger reward. Farsighted behavior is more than a normative feature of decision-making theory. People believe they will be able to wait. However, faced with the situation – when the future becomes the present – they behave in a shortsighted way. Hence, IC creates the conditions for the emergence of behavior that is incompatible with the declared long-term interests of the individual. Thus, a first condition for the emergence of this inconsistency is the introduction of an interval of time. Such an interval permits the operation of cognitive biases and the temporal and hedonic distortion of prospective scenarios; it gives rise to internal conflict between future and present interests; and it makes risks and uncertainties related to the future pertinent. Thaler (1981, p. 205) reported empirical data indicating that the difference between today and tomorrow is more important than that between 1 year and 1 year plus 1 day. This idea had also been mentioned by Strotz (1956), almost three decades earlier. Therefore, the possibility that time does not follow a static scale in human perception in IC is not a novelty. Still, studies tackling basic features of time perception have received far less attention in economics – and more recently in neuroscientific studies on economics – than those aiming to directly test IC's functional forms.

In general, time in economics has been represented on a fixed scale, so 1 day strictly means 24 h. According to empirical data, however, "today" doesn't have the same weight as any other day, and this affects the output of decisions. Today is not simply the aggregate of 24 h, but a word with a visceral meaning. This concept embraces physiological needs and a precise schedule, it is involved in recent memories, and it is prone to contextual influences. This fact is not completely ignored by economists. Features of the particular way in which humans perceive time have always been documented in economic studies of IC. One example is the notion of "diminishing sensitivity," according to which our perception of changes in magnitude follows a concave function (Kahneman and Tversky, 1979). Another is the "reference-level effect," proposed by Rabin (1998), in which marginal changes are perceived as having a specific time *t* as parameter, usually the present. Finally, the phenomenon of present bias, or a thoughtless preference for immediate satisfaction, is well accepted in the economic literature (see among recent papers Benhabib et al., 2010; Walther, 2010; Takeuchi, 2011).

Evidence indicates that distortions in prospection might be directly modulated by time. If the introduction of an interval of time triggers a different dynamic in decision making, time should be at the core of the IC phenomenon. If this were a matter of consensus, one main question would be "how is discounting modulated by variations in the perception of time?" The prevailing use of the metrics without further specification (i.e., "6 months," "5 years," "present and future") doesn't allow us to distinguish how different temporal intervals affect decision making. There is a gap between human time perception and the standard metrics. The next section addresses this theme.

## **TIME PERCEPTION IN THE BRAIN**

#### **EVIDENCE: HUMAN TIME PERCEPTION DIFFERS FROM CALENDAR TIMESCALE**

How long does the present last? Simply changing the intervals of the discounting task protocol reveals a phenomenon, the so-called future bias, that challenges the limits of the "present" (e.g., Gerber and Rohde, 2010; Takeuchi, 2011). While the widely observed present bias implies decreasing impatience through time (denoting a preference for the immediately available reward), the future bias represents the contrary, an increasing impatience. This phenomenon occurs during a specific interval and is only detected when the first delay is short (e.g., 22 days in Takeuchi's study, instead of 3 months in Thaler's (1981) protocol). Nevertheless, present bias still occurs, forming an inverse S-curve, concave for the first days and convex thereafter. To illustrate, let us assume that a pleasant event is going to happen very soon (a fancy dinner, a large monetary bonus, a nice concert). As the time of the event gets closer, individuals feel more and more impatient (increasing impatience – future bias). When delivery is imminent, individuals show a strong preference for receiving it immediately. But if the delivery of the reward is postponed, people tend to be less impatient as the delay becomes longer (decreasing impatience – present bias). Therefore, if the first delay is long enough, empirical data will show only the present bias, while a shorter delay can reveal the growing expectation for the delivery of the reward. As claimed by Takeuchi (2011), this first period would be a kind of "extended present," which leads the author to ask when the future really begins. Intuitively, present time can be longer than "now" or "today."

If the present can be "extended," the future can be felt as less remote. At least, this is a possible interpretation of an increasing number of neuroscientific studies that attempt to understand the role of prospective thinking and memory in temporal preference. These studies have shown that thinking about the future in a precise context, in a way that we can associate with stored memories (e.g., my birthday next year), reduces discounting. An empirical test using fMRI (Peters and Büchel, 2010) introduced an "episodic condition" that used real information, obtained from subjects in a pre-scan interview, about specific future events planned for the day of the reward delivery. As expected, the results showed that discounting is modulated by episodic future event cues. A similar idea had already appeared in economics: Read et al. (2005) found lower discounting rates in subjects' responses when the date in the future was specified, i.e., "3rd July" instead of "3 months from now," "1 year from now," and so on.

In fact, an important literature (for a review, see e.g., Schacter and Addis, 2007) claims that episodic future simulation (imagining the future) draws on episodic memory [the capacity to remember experienced past events (Tulving, 2002)] and that the two share neural correlates. Moreover, recent results indicate that information relevant for the future might be preferentially selected in memory consolidation (information that is sent to "long-term storage") during resting or sleep (Dragoi and Tonegawa, 2011; Wilhelm et al., 2011).

Thus, a 20-year period may appear infinitely uncertain, but being 60 years old is easier to conceive. The representation of timescale divided into days, months, and years is methodologically easier, but ignores the real sense of time for human beings and neglects an important feature necessary for understanding subjective value formation. In fact, in standard economic analysis, the measure of time is rarely divided into days, months, and years. The Marshallian partial equilibrium framework introduced functional definitions for different periods of time. The "short" and the "long terms," according to this view, are defined by the variables which are allowed to adjust for the optimal solution, and not by specific time intervals; whereas the adjustment process is predominantly governed by marginal utility (demand) in the short term, it is the cost of production (supply) that determines equilibrium in the long run (Marshall, 1920, book V). In addition, from an evolutionary perspective, the introduction of the current calendar is recent. It is not difficult to imagine that "seasons" have a more tangible meaning than "months" for rural-based societies, even nowadays.

#### **EVIDENCE: BIOLOGICALLY PLAUSIBLE VIEWS OF TIME PERCEPTION**

Elaborating more subtle distinctions, such as that between the near and far future (Ebert and Prelec, 2007), or distinguishing between the notions of psychological and physical time (Kim and Zauberman, 2009; Ray and Bossaerts, 2011), seems to be a promising research strategy for uncovering the way people actually make decisions. Following this approach, there is a search for the principles underlying human time perception. Ray and Bossaerts (2011), for instance, assume that calendar time differs from the internal representation of time in humans. Named "biological time," this internal chronological perception is said to vary randomly from calendar time, though, naturally, the way people discount future values follows biological time. Thus, choices that are consistent in biological time for the individual appear time-inconsistent to an external observer who bases their judgment of time on calendar time. Consequently, discounting rates are better represented by a hyperbolic functional form. Nonetheless, when biological time is accurate with respect to calendar time, discounting takes the exponential form. Other authors, however, have shown that the relation between time perception and calendar time is not random; instead, it follows a precise pattern. Takahashi et al. (2008) tested models including psychophysical effects [stimulus-response relations derived from investigations on the measurement of sensation (Stevens, 1975)]. The models based on the Weber–Fechner law [the relation between stimulus and subjective response is logarithmic (Stevens, 1975)] and Stevens' law [a power law according to which equal stimulus ratios produce equal sensation ratios (Stevens, 1975)] fit the behavioral data better than the hyperbolic and exponential models. Cui (2011) specifies when Weber's law (variability in judgments grows linearly with the magnitude of the stimulus) is valid in time and value perception. Despite variations, this line of research starts from humans' actual perception of time, rather than from calendar time.
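The idea that exponential discounting over a non-linear subjective time can look hyperbolic in calendar time can be sketched in a few lines. This is an illustration under stated assumptions, not necessarily the exact specification tested in the studies cited above: subjective time is assumed to follow a Weber–Fechner-type logarithmic form, tau(D) = a·ln(1 + b·D), and the parameters `a`, `b`, and `k` are hypothetical.

```python
import math

def subjective_time(delay, a=1.0, b=1.0):
    """Assumed logarithmic (Weber-Fechner-type) mapping from calendar delay to felt delay."""
    return a * math.log(1.0 + b * delay)

def discount_in_subjective_time(delay, k=1.0, a=1.0, b=1.0):
    """Exponential discounting applied to subjective rather than calendar time."""
    return math.exp(-k * subjective_time(delay, a, b))

def discount_closed_form(delay, k=1.0, a=1.0, b=1.0):
    """Algebraically equivalent form in calendar time: (1 + b*D) ** (-k*a).
    With k*a = 1 this is the simple hyperbola 1 / (1 + b*D)."""
    return (1.0 + b * delay) ** (-k * a)

# The two expressions agree, so an exponential discounter in subjective time
# looks like a hyperbolic discounter to an observer using calendar time.
for delay_days in (0, 1, 7, 30, 365):
    print(delay_days,
          round(discount_in_subjective_time(delay_days), 6),
          round(discount_closed_form(delay_days), 6))

# Under the same assumption, a week starting now feels longer than a week starting in a year.
week_now = subjective_time(7) - subjective_time(0)
week_in_a_year = subjective_time(372) - subjective_time(365)
print(week_now > week_in_a_year)
```

The design choice here is to keep the discounting rule itself exponential and to place all the non-linearity in the time mapping, which mirrors the claim that choices can be consistent in subjective time while appearing inconsistent in calendar time.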

In neuroscientific studies, time perception has usually been analyzed in combination with attention (Kagerer et al., 2002; Wittmann and Paulus, 2007), emotion (Berlin and Rolls, 2004; Geoffard and Luchini, 2010), and working memory (Lewis and Miall, 2006). The latter relies on a literature that associates (increasing levels of) dopamine with (acceleration of) subjective time. Cheng et al. (2007, p. 149) explain that the ability to discriminate durations in the seconds-to-minutes range "is a form of temporal cognition that requires an optimal level of dopaminergic function in cortico-striatal circuits in order to control time sharing and regulate clock speed."

Yet, time perception has traditionally been studied in the context of impulsiveness. The idea of an internal representation of time appears in a classic paper by Barratt (1983), a major reference in psychophysiological and neurocognitive research on impulsivity, whose author gives his name to the widely used scale for impulsive behavior, the Barratt Impulsiveness Scale (BIS). According to the author, individual differences in (the speed of) one's subjective sense of time are related to impulsiveness (Barratt, 1983). Wittmann and Paulus (2007) claim that impulsive people overestimate the duration of a given period of time, resulting in heavier discounting of delayed rewards. The same idea is found in Takahashi et al. (2008). Both studies rely on theoretical reviews that associate neuropsychiatric and neurological disorders, whose main behavioral feature is impulsivity, with impaired time perception. Similarly, Berlin and Rolls (2004) found that impulsivity was correlated with time perception for all participants (both borderline personality disorder patients and the control group).

So the design of experiments on IC assumes a common time frame taken from the calendar, whereas real time, as experienced by people, may have several modalities. This neglect may lead to misrepresentation of the real processes underlying IC. IC research should incorporate time perception and its dynamics into models of reward valuation mechanisms. The empirical literature surveyed above indicates that even though hyperbolic functional forms fit the data, other functional representations can be argued to fit the data better once components of human time perception are considered. This can be the classic exponential form, often judged unrealistic (as in Ray and Bossaerts, 2011), or the Weber–Fechner discounting model with non-linear temporal cognition due to psychophysical effects (as in Takahashi et al., 2008).

## **ARE TIME PERCEPTION, SELF, AND DISCOUNTING RELATED?**

When we acknowledge the involvement of time-processing specificities proper to the agents within IC, two promising research directions appear: (1) how time perception modulates discounting (a subject developed throughout this paper), as part of the biological basis of IC performance; and (2) how the notion of self relates to discounting, in the specific context of time perception.

How are self and time related? For Wittmann (2009, p. 1955), time is a function of the self. Considering that time is felt in the absence of a specific sensory organ, and taking as a standpoint that a single interval of time can seem long or short depending on subjective well-being, time would be a construction of the self (see Wittmann, 2009, for a discussion of the theoretical and empirical bases of this notion). Let us add to this thesis the assumption that intention is an essential component of the self. In light of this idea, the study by Haggard et al. (2002), in which subjects must estimate the duration of a time interval after intentional and non-intentional acts, offers an empirical illustration of a possible link between self and time perception. Using Libet's paradigm, it is shown that estimation of the time interval between an action (pressing a button) and a consequence (a tone) changes depending on whether the act is voluntary or involuntary [the latter condition is generated by transcranial magnetic stimulation (TMS)]. In both cases, voluntary and involuntary acts are performed by the subjects – their finger presses the button – so the difference between the cases is mainly the presence of an intention (we could also call it a self-generated act, as opposed to an involuntary act caused by TMS). The experiment suggests that intention, as a component of the self, changes the subject's time estimation. It remains unknown to what extent the most frequently used time perception tasks (estimation, production, and reproduction) involve different cognitive processes. While in time estimation tasks (as in the study by Haggard and colleagues) the subject must evaluate the duration of an external cue (a stimulus), in time production tasks the subject must self-generate a specific duration indicated by the experimenter. In the third kind, the subjects are required to reproduce the duration of a stimulus. The extent to which the self is implicated in the experience of time in these three tasks remains to be understood.

How are self and discounting related in the context of time? Lack of sensitivity to one's own future self may be the basis of the preference for present satisfaction. According to Mitchell et al. (2011), whether or not a person has an impaired perception of her future self is reflected in the activity of the ventromedial prefrontal cortex (VMPFC), a region associated with self-referential processing. The disparity between the VMPFC's activity when one thinks about oneself in the present *versus* in the future represents the degree of misperception of the future self, according to these authors. They found that patience levels and activation of the VMPFC were correlated. In addition, as predicted, these results were highly correlated with choices on the IC task.

This "neural signature" of self-referential processing is good news, from a methodological view, for the prospects of testing the hypothesis that self- and time-perception are components of discounting.

## **CONCLUSION**

Two plausible components of IC were identified on the basis of relevant evidence. In consequence, two research paths are suggested: (1) to measure IC through time-perception specificities and (2) to further investigate how discounting can be modulated by the level of the notion of self within the agent. The idea of attributing a central role to time-processing mechanisms seems promising on biological grounds, whereas the second hypothesis, i.e., the notion of self, would need further investigation.

Hyperbolic functions are consistent with empirical data, but models that consider psychophysical effects or a biological perception of time have been shown to fit the data better. The present approach consists, then, in explaining behavior from a temporal perspective, supported by neuroscientific findings about the underlying neural mechanisms of time perception and the notion of self.

## **ACKNOWLEDGMENTS**

The author thanks the Ph.D. fellowship from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). She is also grateful to the anonymous reviewers for their valuable comments and suggestions to improve the manuscript.

## **REFERENCES**

Ainslie, G. (1992). *Picoeconomics*. Cambridge: Cambridge University Press.

and valuation of the near and far future. *Manage. Sci.* 53, 1423–1438. doi:10.1287/mnsc.1060.0671

*Sci. (Regul. Ed.)* 10, 401–406. doi:10.1016/j.tics.2006.07.006

biological clock implies hyperbolic discounting. *Front. Neurosci.* 5:2. doi:10.3389/fnins.2011.00002

*Games Econ. Behav.* 71, 456–478. doi:10.1016/j.geb.2010.05.005

*J. Econ. Psychol.* 31, 114–130. doi:10.1016/j.joep.2009.11.006

impulsivity and time perception. *Trends Cogn. Sci. (Regul. Ed.)* 12, 7–12. doi:10.1016/j.tics.2007.10.004

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 February 2012; accepted: 07 March 2013; published online: 27 May 2013.*

*Citation: Lucci CR (2013) Time, self, and intertemporal choice. Front. Neurosci. 7:40. doi: 10.3389/fnins.2013.00040*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2013 Lucci. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The origins of options

## *Paul E. Smaldino<sup>1</sup>\* and Peter J. Richerson<sup>2</sup>*

<sup>1</sup> Center for Advanced Modeling in the Social, Behavioral, and Health Sciences, Johns Hopkins University, Baltimore, MD, USA

<sup>2</sup> Department of Environmental Science and Policy, University of California, Davis, CA, USA

#### *Edited by:*

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso Brazil, Brazil

#### *Reviewed by:*

Eric J Johnson, Columbia University, USA

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso Brazil, Brazil

Kevin Thomas Hill, Virginia Tech, USA

#### *\*Correspondence:*

Paul E. Smaldino, Center for Advanced Modeling in the Social, Behavioral, and Health Sciences, Johns Hopkins University, 5801 Smith Avenue, Suite 3220, Davis Building, Baltimore, MD 21209, USA. e-mail: paul.smaldino@gmail.com

Most research on decision making has focused on how human or animal decision makers choose between two or more options, posed in advance by the researchers. The mechanisms by which options are generated for most decisions, however, are not well understood. Models of sequential search have examined the trade-off between continued exploration and choosing one's current best option, but still cannot explain the processes by which new options are generated. We argue that understanding the origins of options is a crucial but untapped area for decision making research. We explore a number of factors that influence the generation of options and fall broadly into two categories: psycho-biological and socio-cultural. The former category includes factors such as perceptual biases and associative memory networks. The latter category relies on the incredible human capacity for culture and social learning, which doubtless shape not only our choices but the options available for choice. Our intention is to start a discussion that brings us closer to understanding the origins of options.

**Keywords: decision making, options, choice, goals, neuroeconomics, culture**

## **INTRODUCTION**

Neuroscientists and psychologists studying decision making generally follow a standard practice borrowed from economics, which is to assume a solitary decision maker who is presented with a set of options and asked to choose among them. The quintessential mathematical formulations of choice, decision theory and game theory, deal exclusively with actors with a finite and completely known set of action choices, and this framework has allowed for the development of coherent formal theories of economic, political, and evolutionary organization. This practice has also been fruitful for the experimental sciences: we have learned much about the psychological factors that influence decisions in ways contrary to the rational ideal of *Homo economicus*, and have uncovered neurophysiological mechanisms by which we process and assess those options. If we pull back from the domain of economic decision theory, however, we find that very few choices are made in this way. We are rarely given an explicit set of options from which to choose, or even an obvious goal toward which we can strive to optimize our choices. Rather, we make myriad decisions daily based on competing goals and options. Those options come not from a predetermined and ready-made basket, but are vaulted into the mind from sources that are not well understood. Uncovering those sources and classifying that order is therefore a task of vital importance to the sciences of decision making.

There is an important distinction between the act of choosing among options and the process by which those options are generated (**Figure 1**). The former is well studied in the fields of neuroscience, psychology, and behavioral economics. The latter has barely been studied at all. When an individual makes a choice, she evaluates a number of options in terms of her desired goal (or set of goals), using internal cognitive processes and perceptual information from the environment to select an action (Kahneman and Tversky, 2000; Cisek, 2007). Some researchers have also noted that organisms interact dynamically with the environment, and therefore the set of options is not static but rather shifts with the circumstances, with options competing for dominance based on available internal and external information (Cisek and Kalaska, 2010). This dynamic view of organism and environment is more realistic, but it still leaves a question unanswered. Individuals must generate options for evaluation. Where do these options come from?

From the perspective of naïve epistemology, humans have a near infinite number of options available at any moment. Walking into a restaurant, for example, one usually thinks of the salient choice as being which table to sit at, if such an act is permitted, or, if it is not, of there being no choice at all but to go and see the host (or *maître d'*, depending on the fanciness of the establishment) to await seating. But there are countless other options. You could smack the headwaiter in the face. You could burst into song. Leap up on a table and tap dance. Try to walk through a wall. Take a nap on the floor. Drool. Check your watch. Scratch your leg. Stage a holdup. Turn around and leave. If there are limitless options, how are we ever to make any intelligent decisions?

The solution is that the operational set of options is not limitless. We are interested in the many processes that lead up to choice in the sense that it is usually modeled, the choice among a small set of options directly leading to action. Some of the near infinite number of theoretical options are not present at the point of decision because they have not been invented by the decision maker or communicated by some other individual. Holding up a restaurant is not an option unless you have learned how to use a pistol. Some options may be masked and others activated by many processes. For example, holding up a restaurant is masked for most people by a general commitment to being law abiding. Contrariwise, for some young males with poor job prospects and skills with a firearm, entering any prosperous business may activate an assessment of the prospects for a successful holdup. Many acts are not the result of choice at all. For example, when a behavior becomes habitual, the options are reduced to one; we enter our favorite restaurant for breakfast, sit at our usual table, and order our standard item without consulting the menu. Only a single option is salient even though the readily available menu lists a dozen or more. Throughout this paper, we will use "options" to denote those behaviors that are actually considered by an individual, consciously or unconsciously, rather than the infinite set of all possible actions.

Whether an option is considered has a lot to do with an individual's goals. A person who had been awake for days and wasn't concerned with social appearances might very well sit on the floor for a nap if he found himself in a restaurant (or anywhere else, for that matter). Goals influence choice in fundamental ways. An individual chooses from among actions in order to achieve a goal. Sometimes certain subgoals must be achieved *en route* to the superordinate goal, and actions will be selected to accomplish these (Brooks, 1991). Goals, in turn, may change dynamically in response to internal processes and external stimuli, and therefore understanding how goals interact with choice among a static set of options is a challenge in itself. Goals also play an important role in the generation of options, since goals help to define the cognitive and perceptual salience of potential behaviors (Minsky, 1985). That being said, goals influence the domain in which we search for options, but options are not fully defined by goals. Even if a goal is singular and extremely well-defined, which is rarely the case in natural settings, there are still a number of factors that will influence the available options. Some of these are provided by the environment itself – you cannot act upon what is not there, and what is there will be a source for ideas. Other factors are internal – options are influenced by an individual's memories, motivational states, and personality. As social organisms, however, humans do not make decisions in a social void. Social and cultural factors influence the generation of options – we learn from each other, obey cultural norms, and respond to social influence. Thus a considerable number of processes interact with goals to lead to the options the decision maker comes to entertain.

The problem of options is related to a classic conundrum in cognitive science and artificial intelligence called the "frame" problem (Dennett, 1984; Shanahan, 2009). Given a task at hand, one needs to determine a set of options for evaluation, but this cannot be obtained simply by eliminating all the ineffective options, because the list of such options is effectively infinite, and an individual has limited time and computing power for decision making. Nor can the individual explicitly determine which options are irrelevant, because that still requires the discrete consideration of an infinite list. The frame problem is often formalized as a search for a set of generalized axioms that allow an individual to consider only relevant actions (Shanahan, 2009); however, a computational model that solved the frame problem for an actor of human-level complexity would effectively describe how options are generated.

It is worth noting that subjects in many decision making experiments evaluate choices that are not necessarily *a priori* "correct." In addition to decisions concerning the optimization of an externally dictated reward, researchers have also considered actor-centered choices evaluated on the basis of individual priorities. These two categories of decisions have been respectively referred to as veridical and adaptive decision making (Goldberg and Podell, 1999; Mograbi, 2011). While veridical decisions always have a best response, adaptive decision making experiments can shed light on how options are evaluated based on innate and learned preferences in such diverse domains as food (Arana et al., 2003; Paulus and Frank, 2003), leisure activities (Chaudry et al., 2009), esthetics (Goldberg and Podell, 1999), occupation (Nakao et al., 2009), altruistic behavior (Moll et al., 2006; Rilling et al., 2008), and moral decision making (Cikara et al., 2010; Kahane et al., 2011). Nevertheless, experiments in both veridical and adaptive decision making overwhelmingly tend to supply participants with predetermined options, and therefore still fail to shed light on the origins of options.

So, returning to the restaurant, why don't we punch the waiter in the face? The rational response to this question is: why would we? To most people, this action has nothing to do with any salient goals, and therefore is not considered, even unconsciously. If, however, you are a jealous man, and the waiter has recently stolen your girlfriend, then *voilà*! Punching him becomes an option. That does not mean that you will choose this action – after all, you may be aware that this choice could land you in unwanted trouble – but it is considered where in the previous case it wasn't. Continuing this line of thought, let's now imagine that you have been looking for this man for the express purpose of punching him in the face. Now, even though it wasn't your active goal a moment before you entered the restaurant, the sight of him makes you change gears and rush toward him, fists flailing. This new action plan, of course, entails a whole set of choices to be made, with the availability of specific options restricting the set of possible behaviors in the processing of those choices.

Whatever the situation, an individual's course of action will depend on his evaluation of his available options, but those options are in turn influenced by a variety of factors – environmental, personal, and socio-cultural. These options are not necessarily available simultaneously for comparison. Decision makers may instead evaluate a sequential series of options, considering further solutions only until one is found that is satisfactory (Kahan et al., 1967). The process of considering options one at a time until a choice is made is known as sequential search, and can be characterized by a choice between selecting one's best current option ("exploitation") vs. continuing to search for a better solution ("exploration"). This is a classic problem in decision making, and has been extensively studied in neuroscience, economics, ecology, and computer science, but it is not the problem under consideration here. The complexities involved in the origins of options are fundamentally distinct from those of sequential search, recently framed (Cohen et al., 2007) in the immortal words of the Clash: should I stay or should I go? Once the decision to go has been made, the question becomes: where do I go, and how do I get there?

In this paper, we will consider how scientists might start thinking seriously about the origins of options. First, we will expand on the claim that discovering these origins cannot be achieved through solutions to sequential search problems, a traditional technique in decision making research. Following that, we will start fresh and discuss some of the factors involved in the generation of options, with the hope that a detailed enumeration of these factors will clarify the problem and inspire future work. We will begin by briefly discussing the role of the environment in shaping options. Next, we will explore the individual-level psycho-biological factors most familiar to neuroscientists and cognitive psychologists, which include things like memory and affect. We will then discuss the role of socio-cultural factors in the origins of options in human decision making. While decisions are made by individuals, the intensely social nature of humankind necessitates the consideration of social and cultural forces. Finally, we will consider the implications and limitations of the ideas presented here.

## **SEQUENTIAL SEARCH**

In choosing an example for the case of well-defined options, we used a situation in a restaurant. Why? It was likely chosen because the first draft of this paper was written in a café, and our mental models (Johnson-Laird, 1983) related to restaurants were primed. It is possible that other scenarios were evaluated, but more likely that we stuck with the first thing that came to mind. If "restaurant" was a satisfactory choice, then we likely deemed it "good enough," and proceeded. If we had not been able to find a suitable example in the context of a restaurant, then we may have begun a *sequential search* for a more suitable choice. Most theoretical and experimental work on decision making under conditions where not all options are known to the decision maker has involved sequential search (Kahan et al., 1967; Hunt et al., 1989; Real, 1990; Hutchinson and Meyer, 1994; Daw et al., 2006; Cohen et al., 2007; Rendell et al., 2010), including so-called "naturalistic decision making" (Todd and Gigerenzer, 2001).

A sequential search is a two-stage process. An individual initiates search and finds a possible candidate solution for her problem. If the solution is not adequate, she searches again. In some cases, a decision to discontinue the search is made only when the perfect solution (if known) is found. In other cases, the search is discontinued in favor of the current "best" solution when the estimated cost of continuing the search outweighs the benefit of retaining the current solution. Optimal solutions for sequential search tasks have been discovered for various conditions in economics (Gittins, 1979; McKenna, 1979), artificial intelligence (Russell and Norvig, 2010), and behavioral ecology (Luttbeg, 2002; Stamps et al., 2005; Wiegmann et al., 2010), though the restriction of bounded rationality (Simon, 1990) makes it likely that evolved minds evaluate search decisions with fast and frugal heuristics (Gigerenzer et al., 1999), such as satisficing (i.e., choosing the first option to meet some evaluation threshold; Simon, 1956).
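The satisficing rule described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not a model from the cited literature: the function name `satisfice`, the fixed per-draw `search_cost`, the `max_draws` cap, and the fallback to the best option seen so far are all hypothetical choices made for the example. Note that the source of candidates, `draw_option`, has to be handed to the searcher from outside, which is exactly the part of the process that sequential search models leave unexplained.

```python
import random


def satisfice(draw_option, threshold, search_cost, max_draws=100):
    """Accept the first option whose value meets the threshold.

    draw_option: zero-argument callable returning the value of one candidate
                 (hypothetical stand-in for however options are generated).
    Returns (accepted_value, number_of_draws, net_payoff).
    """
    best = float("-inf")
    for n in range(1, max_draws + 1):
        value = draw_option()
        best = max(best, value)
        if value >= threshold:
            break  # "good enough": stop searching
    # If nothing met the threshold, fall back on the best option seen
    # (an assumption: earlier options can be recalled).
    return best, n, best - search_cost * n


# Example use with uniformly distributed option values (an assumption).
rng = random.Random(0)
value, draws, net = satisfice(lambda: rng.random(), threshold=0.8, search_cost=0.01)
print(value, draws, net)
```

Raising `threshold` or lowering `search_cost` makes longer searches worthwhile, but in every case the searcher only ever evaluates what `draw_option` happens to supply.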

If options are evaluated one at a time (or even in parallel) with sequential search, then haven't we reduced choice to two options: search or stay? This is a fundamental decision, analogous to the neuropsychological distinction between approach and withdrawal behaviors (Kinsbourne, 1993), and has received some well-deserved attention in the neuroscience literature under the computer science-inspired name of exploitation vs. exploration (Daw et al., 2006; Cohen et al., 2007). A problem endemic to all models of sequential search, however, is that the individual is assumed to know *how* to search. A mouse in search of a nest site can choose the best spot he has found so far or continue to search. This is a dichotomous choice, and one that may rely on a mental calculation of risk based on past experience. However, once the decision has been made to continue searching, where does the mouse look? While his options may not be technically infinite, in a complex environment such as those in which wild mice are found, the search space is nonetheless alarmingly vast. Yet somehow, a mouse searches for habitats without curling up in a fetal position and rocking back and forth while squeaking to itself, overwhelmed by an ocean of options. Similarly, a person entering a restaurant is not driven mad by an infinitude of possible behaviors. In fact, the ease with which we make choices is remarkable. Our philosophy departments are not littered with baffled epistemologists, too stunned by innumerable options to move.
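To make the point concrete, consider a minimal sketch of an exploration vs. exploitation rule. The epsilon-greedy scheme used here is a standard textbook rule chosen for brevity; it is not the model used in the studies cited above, and the names `epsilon_greedy` and `reward_fns` are hypothetical. What matters for the present argument is the function signature: the complete list of options must be supplied before the first choice is made.

```python
import random


def epsilon_greedy(reward_fns, steps=1000, epsilon=0.1, seed=0):
    """Choose repeatedly among a FIXED, pre-specified list of options.

    reward_fns: list of callables, each taking a random.Random instance and
                returning a reward (hypothetical stand-ins for options).
    Returns (counts, means): how often each option was chosen and its
    running average reward.
    """
    rng = random.Random(seed)
    n_options = len(reward_fns)
    counts = [0] * n_options
    means = [0.0] * n_options
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.randrange(n_options)  # explore: pick any option
        else:
            choice = max(range(n_options), key=lambda i: means[i])  # exploit
        reward = reward_fns[choice](rng)
        counts[choice] += 1
        # Incremental update of the running average reward.
        means[choice] += (reward - means[choice]) / counts[choice]
    return counts, means


# Two options with different average rewards (values are assumptions).
options = [lambda r: r.gauss(0.2, 0.1), lambda r: r.gauss(0.5, 0.1)]
counts, means = epsilon_greedy(options)
print(counts, means)
```

Even when the rule "explores," it only samples from `reward_fns`. Nothing in the procedure can add a new entry to that list, which is the gap the mouse example illustrates.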

The decision of whether to exploit or explore is a fundamental component of decision making, but it does not capture how the decision maker gathers the options for exploration. While much decision making theory assumes that the structure of the environment presents an individual with clear choices, this is rarely the case. Rather, our brains have evolved to detect salient features of the environment, or dimensions along which to search for those features. Those features and dimensions are then shaped and constrained by individual experiences and social factors, which in turn shape and constrain the perceived environment. The options available to an individual decision maker in natural contexts emerge organically from neural processes influenced by environmental, psycho-biological, and socio-cultural factors, and are not usually available *a priori* to an outside observer. We will now turn to explore in more detail the role these factors play in generating options.

## **ENVIRONMENTAL FACTORS**

The external environment shapes our options by providing structure to our behavior. This is so obvious that it will be given only cursory treatment here. The option to build a snowman only makes sense in a snowy environment; it is rarely considered by indigenous Hawaiians. Environments are also more than just rocks and trees and buildings and weather. Our environments also include other individuals. For example, while economists have noted the importance of market forces in constraining options, this also extends to what Noë and Hammerstein (1994) have called "biological markets," by analogy with the markets that are so important in presenting options in the case of humans. The availability of and demand for interaction partners influence the pools from which we choose our friends, romantic partners, and business relations. One's position in a social network also influences the spread of information to and from that individual, including cultural norms and expectations (Christakis and Fowler, 2009). How specific social factors influence perception and cognition will be discussed in greater detail in a subsequent section, but we must first recognize that the individuals with whom we interact, and how those individuals are themselves socially connected, shape the types of decisions we will be in a position to make as well as the available options for those decisions (López-Pintado and Watts, 2008; Zerubavel and Smith, 2010).

Finally, a decision may be made to alter the environment (physical, social, or both) in order to provide the individual with new options. Gibson (1979) summed this up nicely when he posited that perception of an object is intrinsically related to the behaviors it affords the individual. Affordances are the passive natural analog of the selling points that salespersons use to convince us to buy their product. Options, then, are constrained by the potential behaviors afforded by the environment.

## **PSYCHO-BIOLOGICAL FACTORS**

All aspects of psychology emerge from the interplay of neuronal, hormonal, and other biochemical processes. Psychology, then, *is* biology, but the nature of psychological phenomena demands that we abstract these phenomena in conceptual and linguistic terms (rather than in purely physiological terms) in order to discuss them coherently. In terms of decision making, it is often useful to articulate constraints in psychological rather than physiological terms. Here, we choose to use the designation "psycho-biological" to emphasize the connection between the two levels of abstraction. Whatever the articulation, there are a number of psycho-biological factors that constrain the options available for decision processes. The exploration of each of these in full would require much more space than we have here; what follows is by no means a complete list, but rather a broad survey of the mechanisms and processes that constrain our construction of options.

## **PERCEPTUAL BIASES**

We cannot choose what we cannot perceive. The senses of each thinking organism have evolved to perceive the world in a way that reflects the salient cues that have been important for survival and reproduction throughout the species' evolutionary history (von Uexküll, 1934/1957). An organism's evolved perceptual biases therefore shape its options by dictating the relevant stimuli to which it reacts. Primates, for example, evolved in a niche where forward-facing eyes and good color vision were essential for navigation, foraging, and predator evasion. Swinging through trees and navigating quickly through dense, three-dimensionally complex forests requires good depth perception, and a dietary requirement of ripe fruits necessitates the ability to distinguish the color signals of fruits and leaves that are ready to eat. Grazing mammals such as deer or gazelles, on the other hand, have diets that are less dependent on color cues, and so have less precise color vision. They live in open plains, where they are vulnerable to predation from all sides, and so have eyes on each side of their head, with wide, oblong pupils for an almost completely panoramic visual field (Attenborough, 2002). Even closely related species have differences in the organization of the sensory cortex related to the different needs of their ecological niches, as demonstrated by recent work on rodents (Campi and Krubitzer, 2010; Krubitzer et al., 2011). Humans are famously unable to see ultraviolet light, which renders invisible to us the often-beautiful UV-reflective patterns that guide many bird and insect species to find food, mates, and prey (Kevan et al., 2001).

These evolved biases have important effects on the ways organisms solve problems in a given environment. For example, the Norway rat (*Rattus norvegicus*) is a semi-aquatic animal, and therefore is well-equipped to solve the hidden-platform water maze, a common laboratory test of spatial learning. Mice, which in the wild spend much less time in water, have more difficulty solving the water maze, relying less on spatial cues than on random movement strategies (Frick et al., 2000). The Brazilian short-tailed opossum (*Monodelphis domestica*), a rat-sized arboreal marsupial, is generally unable to solve the hidden-platform task (Kimble and Whishaw, 1994). Each of these animals should be physically able to solve this task, but their evolved perceptual biases influence the strategic options available to them. These biases therefore influence the generation of options for decision making at a fundamental level.

There may also be differences in perceptual biases within species. These obviously include perceptual impairments such as blindness (and color blindness), deafness, etc. In addition, genetics and experience alter the salient options for decision making in many ways, which are explored in the subsequent sections.

#### **PERSONALITY**

Personality refers to individual differences in general behavioral tendencies, sometimes called *behavioral syndromes* when referring to non-humans (Sih et al., 2004). In humans, personalities are relatively stable throughout adulthood, though this stability largely depends on the constancy of the social environment and the individual's role therein (Ardelt, 2000), and long-term changes can still be effected by certain life-changing events (MacLean et al., 2011). In the context of decision making, personalities refer to predictive behavioral regularities within individuals, which are influenced by complex interactions between genotype and developmental experience (Bouchard and Loehlin, 2001). Personality traits are useful descriptors that help us predict individual decision making. For example, riskier behavior for gains is correlated with increased Openness to Experience and decreased Neuroticism (Lauriola and Levin, 2001), and stable ambiguity-seeking tendencies have been shown to predict decision making behavior under both risk and ambiguity (Lauriola et al., 2007). The way in which reward is processed in the brain is also mediated by certain personality traits (Simon et al., 2010).

By defining behavioral and perceptual tendencies (Shrauger and Altrocchi, 1964; Perugini and Prestwich, 2007), personality can influence the options available to a decision maker. Imagine an individual going to a party where she does not know most of the guests. Many of her decisions, and the options thereof, will be dictated by personality-guided goals. If she is shy, she may try to associate only with people she already knows, and may stick to the edges of a room full of unfamiliar people. If she is thirsty, she may wait, or nervously ask the host for a glass. A socially bold person, on the other hand, might go directly to the refrigerator for a drink, and enthusiastically seek out conversations with strangers. Of course, it is possible that the shy person thought of going for the fridge, but rejected the action. However, the bold person assumes she will be liked (Sinclair and Lentz, 2010) and is unlikely to consider slinking along the walls or sneaking out to get a drink at the store around the block, while the shy person does. Importantly, personality traits influence more than just the way options are evaluated; they influence the determination of which options are available for evaluation.

A recent study of creativity by Gino and Ariely (2012) gives a simple example; creativity can be characterized, at least in part, as a measure of the diversity of options an individual can generate. Subjects were given a difficult visual perception task, determining which of two adjacent triangles contained more circles, and could receive cash rewards. However, reward payoffs were determined not by accuracy but by the response itself: guessing the right triangle always paid 10 times more than guessing the left. It was found that measures of creativity (a personality trait) correlated with the tendency to maximize profit rather than guess correctly. Though the authors characterize this behavior as dishonesty, a more parsimonious explanation of their results is that the possibility of "cheating" to maximize profits rather than perform as instructed simply did not occur to less creative individuals. As the authors note, "creativity may lead people to think of more and diverse ways they could benefit from the monetary gains from cheating, thus making cheating itself more tempting" (p. 11).
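The incentive structure of this task can be illustrated with a short worked calculation. The only figure taken from the description above is the 10:1 payoff ratio; the unit payoff of 1 for a left response, the assumption that the right triangle is the correct answer on half of the trials, and the function name `expected_payoff` are hypothetical and serve only to make the arithmetic concrete.

```python
def expected_payoff(p_right_correct, accuracy, left_pay=1.0, right_pay=10.0):
    """Expected per-trial payoff under two response strategies.

    p_right_correct: assumed share of trials on which "right" is correct.
    accuracy: assumed probability that an instructed responder identifies
              the correct side.
    Returns (follow_instructions, always_right).
    """
    # A responder who follows instructions answers "right" when the right
    # side is correct and identified, or when the left side is correct
    # but misjudged.
    p_choose_right = (p_right_correct * accuracy
                      + (1 - p_right_correct) * (1 - accuracy))
    follow_instructions = (p_choose_right * right_pay
                           + (1 - p_choose_right) * left_pay)
    # A responder who always answers "right" earns the high payoff every trial.
    always_right = right_pay
    return follow_instructions, always_right


print(expected_payoff(p_right_correct=0.5, accuracy=0.8))
```

Under these assumptions the payoff-maximizing strategy is trivially simple once it has been thought of, which is consistent with the reading that the relevant difference between participants lies in whether the option occurs to them at all.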

## **AFFECT**

Affect is a broad term used to encompass moods, emotions, attitudes, evaluations, and preferences (Zeelenberg et al., 2008). Here we use the term to contrast with personality traits, which are more stable over the long term; we define affective states as those situationally influenced brain states that alter the processing and prioritization of stimuli and behavioral choices. Though the variable nature of affect is often ignored by decision theorists, affective states are clearly a guiding factor in deciding among choices (Bechara et al., 2000; Zeelenberg et al., 2008). Zajonc (1980) has proposed, for example, that all perceptions contain some affect: we see not just a house but a *nice* house, an *ugly* house, etc. Building on this, Slovic et al. (2007) have proposed that many decisions are made using an *affect heuristic*. In these cases, the broad feelings associated with various options drive our choices more than a rational (profit-maximizing) evaluation of the associated payoffs. A similar idea has also been developed by Cunningham et al. (2007), with the additional proviso that evaluations are iteratively processed as relevant attitudes and associations are realized through spreading activation.

What is still overlooked, however, is that the options for many decisions are also guided by an individual's affective state. Emotions, for example, may determine which goals are most salient, and therefore which options will come to the forefront (Zeelenberg et al., 2008). Damasio's *somatic marker hypothesis* (Damasio, 1994; Bechara and Damasio, 2005) posits that the emotions experienced at the onset of and in response to a situation will bias the response options by activating in working memory those choices made in similar emotional states. Whether a person is angry, tired, hungry, manic, sad, or scared not only influences how she evaluates a set of options, but, given a minimal degree of agency, will influence what decisions are most important, and which options are available for consideration.

#### **MEMORY AND LEARNING**

Complex organisms are able to develop, adapt, and survive not only because they have been evolutionarily selected to do so, but also because the stimuli and experiences are internalized to guide future perceptions and decisions. This, of course, is *learning*, and the persistent effects of learning on cognition fall under the classification of *memory*. Memory obviously influences decision making in terms of the prior knowledge we can use to evaluate our decisions, whether in the Bayesian sense of prior probability distributions, or in terms of the relevant schemas and mental models used to evaluate situations. Memory is also related to affect, in the sense that one's previous affective associations with a situation or option can guide choice (Damasio, 1994; Bechara and Damasio, 2005; Slovic et al., 2007). Memory can be an important factor in one's motivational state, which we have already shown to influence the selection of options.

Since options must arise from the interplay of salient (external or internal) stimuli and preexisting cognitive structure, it is unsurprising that memory should be involved in influencing the origins of options. Perhaps the clearest influence of memory on the emergence of options is in the determination of the current goal or motivational state. As a simple example, consider a rodent exploring a dark arena. Research on Tristram's jird (*Meriones tristrami*), a nocturnal rodent native to the Middle East, has shown that the animal has at least two distinct methods of exploration depending on its experience in the arena (Avni et al., 2006). At first, it "loops" around somewhat aimlessly, probably to gather enough spatial information to establish one or more "home bases." Once a representation of the arena is internalized, the animal switches to "home-base behavior," in which it makes short excursions from a preferred location, returning to the same location each time. Knowledge of the neural processes involved in this kind of spatial learning, at least in the hippocampal formation, is quite advanced (Moser et al., 2008). In this example, the animal must decide where to go (or whether to stay put), but the method for this decision process is determined by a mental schema dictated by the animal's knowledge of the space.

Consider also the well-known influence of expertise in human decision making. A chess grandmaster can easily recall complex (but plausible) board positions and can make well-considered decisions with ease, which contrasts with the difficulty in both memorization and strategy found in chess novices (Simon, 1987). The grandmaster has not only memorized board positions, but has also internalized schemas and strategies, and can thus think many moves in advance, a difficulty for novices. Previous experience certainly influences the evaluation of choice options, but it also allows for the consideration of different options. Therefore, the difference between a master and a novice is not just the speed of search; through experience, the master has options unavailable to the beginner and conversely may not consider options that inexperienced players do. Even in chess, with a finite number of possible moves each turn, the expert may choose not only to make a particular move, but to embark on a planned series of moves, for which the choice of moves and the evaluation of the opponent's moves are phenomenologically quite different than for the novice who chooses one move at a time. For more naturalistic decisions, the influence of experience on the generation of options can be even more severe and nuanced.

Individual learning is an error-prone process. The information transferred in social learning processes is not always received without error either, nor are memories necessarily recalled without inaccuracies. We may misinterpret a communication not only because of imperfect perception, but also due to our own expectations and prior knowledge. Our memories are also imperfect, and we often fill in details of recalled events with conjectures and confabulations. Hirst et al. (2009) have shown this to be the case even when we are certain that our memories are accurate, as with so-called "flashbulb memories." Moreover, conversations involving the recall of an important event can alter future recollections (Coman et al., 2009), introducing more errors. Errors introduce variation in our behavioral repertoires, and work as "mutations" for behavior selection. Acting on the basis of a previous choice, we may modify a behavior haphazardly to create a new option. If the new behavior is reinforced, it may become the dominant option around which further options are generated through haphazard modifications. Indeed, the operation of selective forces on errors may be a driving force in the production of creative thought (Campbell, 1960).
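The process of haphazard modification followed by reinforcement can be illustrated with a minimal sketch in Python. Everything here is a simplifying assumption made for illustration: a behavior is reduced to a single number, the names `vary` and `select_behavior` are hypothetical, and the reward function is invented. The sketch is not a model proposed in the literature cited above; it only shows how errors acting as "mutations" can shift the dominant option when selection retains the reinforced variants.

```python
import random


def vary(behavior, rng, step=1.0):
    """Copy a behavior with a haphazard error (a 'mutation')."""
    return behavior + rng.uniform(-step, step)


def select_behavior(start, reward, n_trials, rng):
    """Retain an erroneous variant only when it is reinforced more strongly."""
    current = start
    for _ in range(n_trials):
        candidate = vary(current, rng)
        if reward(candidate) > reward(current):
            # The reinforced variant becomes the dominant option around
            # which further variants are generated.
            current = candidate
    return current


def toy_reward(behavior):
    """Hypothetical reward: behaviors closer to 5.0 are reinforced more."""
    return -abs(behavior - 5.0)


if __name__ == "__main__":
    rng = random.Random(1)
    print(select_behavior(0.0, toy_reward, 200, rng))
```

Because a variant is kept only when it is rewarded more than the current behavior, the retained option can never be worse than the starting point under this toy reward, although where it ends up depends on the random errors drawn.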

### **OTHER PSYCHO-BIOLOGICAL FACTORS**

At the individual level, there are certainly other important factors that influence options. These include gender and biological sex, age, working memory (Bechara et al., 2000; Hinson et al., 2003), and cognitive biases such as framing and anchoring effects (Kahneman and Tversky, 2000). Evolution has supplied humans with useful decision making heuristics that work well under many conditions of limited information (Gigerenzer et al., 1999) and specific environmental structure (Bullock and Todd, 1999), the neural processes of which have begun to be uncovered (Volz et al., 2006). Additionally, individual differences related to both short- and long-term behavioral tendencies (i.e., affect and personality, respectively) are influenced by hormonal and genetic factors (Lee, 2008; Rilling et al., 2008). The nature of these influences may involve complex interplay between perception, cognition, and physiology (Wimsatt, 1972; Schank, 2001). Many facets of psychology and neurobiology are at work in the generation of choice options.

## **SOCIO-CULTURAL FACTORS**

A decision is made by an individual and so, strictly speaking, all relevant factors shaping and constraining options reduce to those found within the individual, i.e., the psycho-biological factors discussed above<sup>1</sup>. However, social forces enter into the decision making processes of all social animals, and none more so than humankind. Humans are unique in the animal kingdom for the richness of their social ties and cultural phenomena, and for the ability of their cultures to rapidly evolve (Richerson and Boyd, 2005). Many other species engage in complex social behaviors of interest to decision scientists (de Waal and Tyack, 2003). The coordinated flocking behavior of birds in flight, for example, requires each individual to dynamically respond to its neighbors (Couzin, 2008), not to mention the intricate social dynamics found in nonhuman primates (de Waal and Tyack, 2003; Cheney and Seyfarth, 2007). Due to the unique role culture plays in human behavior (Chudek and Henrich, 2011), however, we will restrict this discussion to socio-cultural influences on human behavior, and the generation of options for human decision making.

<sup>1</sup>This excludes collective decision processes, where the relevant behavior is at the level of the group rather than that of each component individual, and which represent an extremely interesting line of research in their own right (e.g., Kerr and Tindale, 2004; Sumpter, 2006; Couzin, 2008).

## **HUMANS ARE SOCIAL ANIMALS**

Human cognition has been shaped by evolution to interpret and react to the behavior and intentions of others, and to collaborate and cooperate in shared goals in ways that differ fundamentally from our nearest primate relatives (Tomasello et al., 2005; Csibra and Gergely, 2009). There are many facets of humans as social animals that influence the options for decisions by interacting with many of the individual-level psycho-biological processes mentioned above, the diversity of which this section offers a mere taste.

## *The drive to be social*

Humans are not content to act in solitude, a fact recognized long ago by Aristotle when he declared that "man is by nature a social animal." We have a seemingly intrinsic drive for company and social acceptance, which will influence the options considered in social or potentially social situations. Loneliness, for example, is a social emotion that influences perception and attention, which in turn influence available options. For example, Cacioppo et al. (2009) found that lonely individuals were less rewarded by pleasant social stimuli (e.g., a rollercoaster or a man and a dog running), and spent more time looking at images of social suffering than non-lonely individuals. Further, the desire for companionship and understanding is so strong that some individuals will even form relationships with anthropomorphized inanimate objects in an effort to stave off loneliness (Epley et al., 2008).

## *Social roles*

Sociologists have long argued that one's position within a society plays a large part in determining the roles that one can adopt and the actions that one can take (e.g., Goffman, 1974). These roles are often domain specific and dependent on the social landscape – a person behaves differently at work with her boss than at home with her friends. A woman may behave very differently in situations with her children, in which her role as "mother" is more salient, than in situations solely among her peers. On the other hand, tendencies developed in one sphere of life can influence behavior in other spheres. Kohn and Schoenbach (1983) found that individuals whose jobs were more "self-directed" were more likely to strive for autonomy in other domains, whereas those with more constrained job opportunities tended to favor conformity over autonomy. Importantly, these values of autonomy or conformity were transmitted both explicitly and implicitly to their children. Emphasizing one value system over another will influence an individual's perceptions of situations as well as his goals within those situations.

Social roles also influence how we respond to various individuals. A generic social identity might drive behavior – we help an elderly woman carrying a heavy object, but not a strong young man. Our minds keep track of social relationships at the personal and interpersonal level that are quite complex, and the relevant schemas, motivations, and memories associated with those relationships influence the options and goals for decision making. Social roles and relationships influence who we trust, who we fear, and who we learn from. Humans' amazing capacity for social learning in particular is a large part of what makes our species unique (Hermann et al., 2007), and who we target for social learning is important. In addition to our parents, we turn to people who are respected and venerated by others (Henrich and Gil-White, 2001) – indeed, this choice constitutes a sort of second-order social learning as we learn from whom to learn. Humans also preferentially reward and learn from individuals that are similar and punish, ostracize, or ignore those who are different (Aronson, 2004). This tendency appears very early – 12-month-olds preferentially copy the food selection choices of unfamiliar adults who speak their language compared with similar targets speaking a foreign language (Shutts et al., 2009).

## *Imitation, joint action, and emotion contagion*

Our options for behaviors are influenced by what the people around us are doing. This refers to more than just environmental constraints like "I can't walk there because Joe's in the way." Sociality is so deeply ingrained in humans that others' behaviors can automatically trigger behavioral options in our brains. The mirror neuron system (Rizzolatti and Craighero, 2004) is the most famous example of this, but numerous brain networks in which action and observation comingle have been identified for sensations, emotions, and motor actions (Frith and Singer, 2008). This link between observational and behavioral pathways facilitates social learning, allows us to coordinate in complex joint tasks (Tomasello et al., 2005), and probably fosters social cohesion and the propagation of cultural norms and regional idiosyncrasies. When two people interact, they often unconsciously mimic each other's postures, mannerisms, and facial expressions (Chartrand and Bargh, 1999). When this mimicry takes place, interactions occur more smoothly and the partners tend to like each other more (Lakin and Chartrand, 2003).

In addition to directly influencing options by activating behaviors, we can influence each other's options by affecting one another's emotional states with our own. This may involve the simple spread of emotion, such as when we become fearful upon viewing another person expressing fear (Morris et al., 1996), or a reactive set of responses, such as exhibiting an expression of appeasement (e.g., embarrassment) in response to another's anger (Keltner and Buswell, 1997).

## *Communication*

We don't get all our ideas from individual trial and error. While observational learning (Bandura, 1986) is an important source of information, we don't socially learn solely by observation. The direct communication of ideas through gesture, symbol, and language represents a huge divide between humans and other species, and gives us immediate access to options generated by other minds. Indeed, seeking the advice or consultation of a friend or colleague can sometimes be an option in its own right. Whether solicited or not, advice is often most useful when it proposes options that were not previously considered, including the framing of a situation in a new light. Supporting this idea, work by Page (2007) has shown that groups are often best able to solve difficult problems when the constituent individuals are from diverse backgrounds, which increases the number and breadth of available options.

## **HUMANS ARE CULTURAL ANIMALS**

While all social animals are likely to be influenced by social learning, social contagion, and communication, these are hypertrophied in our species to create complex and diverse cultures (Tomasello, 1999; Jablonka and Lamb, 2005). The tremendous capacity for social learning coupled with an innate desire to learn the behaviors and customs of those around us leads to differentiations in groups, including customs, norms, and ethnic markers. It has become increasingly apparent that culture can fundamentally affect basic cognitive processes (Shore, 1996; Nisbett et al., 2001; Nisbett and Miyamoto, 2005), and that cognitive universality is largely mythical. Indeed, the fact that most psychological research is conducted on Western undergraduates should give us pause in considering how well we currently understand human cognitive and behavioral tendencies (Henrich et al., 2010). Culture guides social learning and shapes the schemas and associative networks of what is proper and what is possible in various circumstances – in other words, what behaviors are entertained as options. Indeed, cultural experience may even shape the way a given circumstance is perceived. A well-considered neural or psychological theory of decision making cannot ignore culture.

## *Culture influences cognition*

Nisbett and colleagues (Nisbett et al., 2001; Nisbett and Miyamoto, 2005; Na et al., 2010; Varnum et al., 2010) have argued persuasively that many aspects of cognition and perception are fundamentally dependent on cultural influences. Their research emphasizes the differences between two general modes of thinking: the *analytic* style prevalent in the West, and the *holistic* style prevalent in East Asia. Analytic thinking involves the decontextualization of an object from its field, a focus on attributes of an object used to assign it into categories, and a preference for using rules about the categories to explain and predict behavior. In contrast, holistic thinking involves an orientation to the context or field as a whole, and a preference for explaining and predicting events based on relationships. Holistic thinking tends to rely on experience-based knowledge rather than abstract logic, and employs dialectic reasoning – emphasizing change, recognizing contradiction as an inherent property in the universe, and promoting a search for compromise in solutions.

These cultural differences in cognitive styles have been shown to influence both perception and memory. In a study by Masuda and Nisbett (2001), Japanese and American subjects were shown animated underwater scenes with a focal animal (a fish) and asked to describe what they had seen. The Japanese subjects were more likely to mention background information and relationships, whereas the Americans were more likely to concentrate on the focal animal. During a later recognition task, Japanese subjects had more difficulty remembering the focal animal if it was shown against a different background than the one originally seen; Americans did not show this effect. Cultural effects have also been shown in the perception of social events. Westerners are much more likely to explain another individual's behavior in terms of inherent personality traits, while East Asians are more likely to consider explanations that take into account situational, contextual, and societal factors (Nisbett et al., 2001). If an event is perceived in a fundamentally different way, then it is probable that the options for decisions regarding that event will also differ.

#### *Culture explicitly dictates options*

Different cultures may be associated with differences in the physical environment, which alter decision making by providing different behavioral affordances (Miyamoto et al., 2006). In addition, cultural norms can influence options by suggesting or restricting choices, or by determining which behaviors will achieve specific social goals. We do not always cave to social pressures and cultural norms, but these factors still influence options even when we rebel. A secular teenager in an affluent US suburb may rebel by listening to hardcore punk music, while a rebellious teen in a fundamentalist religious community may get a thrill from sneaking a listen to a mainstream pop station.

Cultures may vary in terms of which behaviors are salient or even permitted. For example, cultures vary widely in the degree to which young people can make their own decisions concerning whom they marry (Buunk et al., 2010). A fascinating and somewhat horrific illustration of this type of cultural influence is the phenomenon of "bride abduction" in Central Asia (Werner, 2009). In Kazakhstan, a man wishing to marry a woman may forcibly abduct her, after which the woman is usually obligated to marry her abductor. The man's friends and family are often complicit in the act, including actively assisting in the abduction and persuading or threatening the woman to accept the marriage. The bride is sometimes an accomplice in her own abduction (such as when she wishes to marry someone of whom her parents disapprove), but this is not always the case. Because female modesty plays an important role in a Kazakh family's honor, "whether the abduction is consensual or not, it is the abduction itself that damages the family's honour and the bride's acceptance of the marriage serves to restore that honour" (Werner, 2009, p. 316). Werner further notes "Many of the same people who... believe it is wrong for a man to abduct a woman without her consent also believe that it is wrong for an abducted woman to reject the marriage" (p. 322). The option to forcibly abduct a woman he wishes to marry, let alone to recruit his friends and family to take part in the abduction, is not an option that occurs to most men in parts of the world like the United States, who are unaccustomed to the very concept of bride abduction. Again, this is not a matter of choice evaluation. Werner (2009) tells of a Kazakh man who was dissuaded from his original intent to abduct a bride by the power of persuasive rhetoric. That the origins of options are culturally influenced pertains to the fact that the option even occurred to him in the first place.

## **CONCLUSION**

By focusing on choice behavior in the context of well-structured problems with pre-defined options, decision theorists limit the scope of their future understanding of decision processes. We cannot understand what we do not even try to study. Simon (1973) posited that it was not an overstatement to suggest that no real-world problems were well-structured in the way that experimental paradigms were – and are – generally presented. We propose that, to a large extent, problems become structured by the options that an individual considers.

Understanding how the brain generates options for decision making is a complex issue, and it is not clear that we are at all close to being able to produce a serious neural or cognitive theory. This is an open problem, and concerns neuroscientists, psychologists, economists, and anyone interested in fundamental decision making processes. Generally speaking, all behavior is decision making, and so a complete theory of behavior must account for the generation of options. We have not provided such a theory. We have merely stated the problem, and pointed out a wide array of factors for which a complete theory would need to account. Some insights into the origins of options may potentially be gleaned indirectly from previous decision making studies that look at different types of option sets (e.g., veridical vs. adaptive decision making), but these insights are limited because such studies have not considered the generation of options directly. We hope that the explicit recognition of this problem prompts future work toward a richer understanding of a fundamental component of decision processes. Given the scientific community's accelerating knowledge of the organization and behavior of complex systems, progress toward such an understanding seems very plausible.

In some settings, an individual's choices may be so constrained by social, cultural, and environmental factors (including legal and moral factors) that the set of options is in practice common across a wide range of individuals. In these cases, the available options may be so uniform that the paradigms of traditional decision making experiments seem applicable. This, however, still leaves open the question of the internal mechanisms that generate those admittedly common options. Moreover, we believe these situations are less common than often believed. Although broad behavioral patterns of individuals are statistically quite predictable in the aggregate (Ariely, 2008; Barabási, 2010), the precise, moment-to-moment behavior of individuals in naturalistic settings is inherently unpredictable. As we discussed in our Introduction, even the apparently simple and constrained act of ordering from a restaurant menu is rife with myriad factors that influence the available options for choice.

In contrast to the currently prevailing approach in the decision sciences of experiments with *a priori* options, we note that psychological experiments in which participants are allowed to respond in any way afforded by their environments are far from non-existent. Indeed, this type of experimental design has been common practice in social psychology since the 1960s. Such experiments, however, have thus far remained largely descriptive – e.g., people in larger groups wait longer to intervene in a social emergency (Darley and Latané, 1968); physical proximity, perceived power, and individual differences influence how individuals respond to counterintuitive orders from authority figures (Milgram, 1974); deeply entrenched cultural differences influence both behavioral and physiological responses to social insults (Cohen et al., 1996).

The idea of integrating free response into a more rigorous neuroscience of human decision making is highly intriguing, though of course presents difficulties for experimental design. For example, implanted voltammetric microelectrodes have shed tremendous light on the role of dopamine in the reward-seeking behavior of free-moving rats (Phillips et al., 2003; Roitman et al., 2004), but similar experiments are obviously not feasible for human research. Bridging the gap between naturalistic behavior and rigorous scientific discovery of relevant decision mechanisms remains an important challenge.

One possible direction for future research might be in uncovering the neural bases for individual differences in option search strategies. For example, Schwarz et al. (2002) devised a scale which differentiated subjects' tendencies either to seek more options in a choice task or to prefer a limited set of options as long as one met some threshold of worth, and called those at either end of the scale maximizers and satisficers, respectively. While we don't know if satisficers and maximizers generate options in the same way, we do know they have different strategies for processing options, and that maximizers will evaluate more options when possible. These differences provide a potential starting point for understanding the neural bases for how the brain generates options. Another useful paradigm might be one that could determine whether an individual evaluated a given option (independent of final choice), or even whether two individuals consider the same options in a particular task.
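The difference between the two processing strategies described above can be sketched in a few lines of Python. This is a hedged illustration only: the scale referred to in the text measures tendencies, not algorithms, and the function names, the menu, and its worth scores are all hypothetical. Note also that both functions receive a ready-made list of options, which is exactly the assumption this article questions; the sketch shows how options are processed, not how they are generated.

```python
def satisfice(options, value_of, threshold):
    """Stop at the first option whose worth meets the threshold."""
    evaluated = 0
    for option in options:
        evaluated += 1
        if value_of(option) >= threshold:
            return option, evaluated
    return None, evaluated  # no option was good enough


def maximize(options, value_of):
    """Evaluate every available option and keep the best one."""
    best, best_value, evaluated = None, None, 0
    for option in options:
        evaluated += 1
        value = value_of(option)
        if best_value is None or value > best_value:
            best, best_value = option, value
    return best, evaluated


if __name__ == "__main__":
    # Hypothetical menu with made-up worth scores.
    menu = {"soup": 4, "pasta": 7, "curry": 9, "salad": 5}
    print(satisfice(menu, menu.get, threshold=6))  # ('pasta', 2)
    print(maximize(menu, menu.get))                # ('curry', 4)
```

In this toy case the satisficer stops after two evaluations while the maximizer inspects all four, which mirrors the observation that maximizers evaluate more options when possible.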

We encourage researchers in the cognitive and behavioral sciences to start looking for neural mechanisms and cognitive models for the generation of options. We encourage all scientists interested in decision making to move beyond the assumptions that choices are (a) available *a priori* to the decision point, and (b) identical for all actors. We note that we have only presented a small number of options for future directions, but we are confident that creative decision scientists will generate many more.

## **ACKNOWLEDGMENTS**

This paper was greatly improved by careful readings by Bert Baumgaertner, Joshua Epstein, Kevin Hill, Emily Newton, and Jeffrey Schank. We also thank Mark Goldman for stimulating discussion, and Gabriel Mograbi and two anonymous reviewers for helpful comments.

## **REFERENCES**


creative thought as in other knowledge processes. *Psychol. Rev.* 67, 380–400.


New York: Cambridge University Press.


experiences occasioned by the hallucinogen psilocybin lead to increases in the personality domain of openness. *J. Psychopharmacol. (Oxford)* 25, 1453–1461.


Page, S. E. (2007). *The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies*. Princeton, NJ: Princeton University Press.

Paulus, M. P., and Frank, L. R. (2003). Ventromedial prefrontal cortex activation is critical for preference judgments. *Neuroreport* 14, 1311–1315.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 December 2011; accepted: 27 March 2012; published online: 11 April 2012.*

*Citation: Smaldino PE and Richerson PJ (2012) The origins of options. Front. Neurosci. 6:50. doi: 10.3389/fnins.2012.00050*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Smaldino and Richerson. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits noncommercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Cognitive processes in decisions under risk are not the same as in decisions under uncertainty

## **Kirsten G. Volz <sup>1</sup>\* and Gerd Gigerenzer <sup>2</sup>**

<sup>1</sup> Werner Reichardt Centre for Integrative Neuroscience, Tuebingen, Germany

<sup>2</sup> Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany

#### **Edited by:**

Gabriel José Corrêa Mograbi, Federal University of Mato Grosso, Brazil

#### **Reviewed by:**

Paul Cisek, Université de Montréal, Canada

Christopher MacDonald, Boston University, USA

#### **\*Correspondence:**

Kirsten G. Volz, Werner Reichardt Centre for Integrative Neuroscience, Otfried-Müller-Straße 25, 72076 Tuebingen, Germany. e-mail: kirsten.volz@cin.uni-tuebingen.de

We deal with risk versus uncertainty, a distinction that is of fundamental importance for cognitive neuroscience yet largely neglected. In a world of risk ("small world"), all alternatives, consequences, and probabilities are known. In uncertain ("large") worlds, some of this information is unknown or unknowable. Most cognitive neuroscience studies exclusively examine the neural correlates of decisions under risk (e.g., lotteries), with the tacit implication that understanding these would lead to an understanding of decision making in general. First, we show that normative strategies for decisions under risk do not generalize to uncertain worlds, where simple heuristics are often the more accurate strategies. Second, we argue that the cognitive processes for making decisions in a world of risk are not the same as those for dealing with uncertainty. Because situations with known risks are the exception rather than the rule in human evolution, it is unlikely that our brains are adapted to them. We therefore suggest a paradigm shift toward studying decision processes in uncertain worlds and provide first examples.

**Keywords: as-if versus process models, neuroscience of decision making, risk and uncertainty, small world versus large world problems, heuristics**

## **RISK ≠ UNCERTAINTY**

In 1999, Elkhonon Goldberg and Kenneth Podell distinguished between adaptive and veridical decision making. Noticing the predominance of the latter in the cognitive neuroscientific studies at that time, they concluded that new paradigms were desperately needed:

In a typical experimental paradigm used in cognitive neuroscience, one possible response is correct and others are incorrect. The determination of what is correct and what is "incorrect" is inherent in the experimental situation (external milieu) and does not require any knowledge of the organism making the choice (internal milieu). The typical experimental paradigms used in cognitive neuropsychology are deterministic and veridical. (p. 365)

With some disappointment they concluded: "Paradoxically and almost incomprehensibly, the arsenal of cognitive neuroscience is virtually completely bereft of paradigms capable of examining how adaptive (as opposed to veridical) decisions are made" (p. 366). As a result, they called for innovative experimental procedures to determine the contribution of the prefrontal lobes to adaptive decision making.

Goldberg and Podell (1999) hit on a distinction closely related to one made in economics and decision theory: the distinction between risk and uncertainty (Knight, 1921). Risk, according to Knight, refers to situations of perfect knowledge: the decision maker knows the probabilities of all outcomes for all alternatives. This makes it possible to calculate the only correct, or optimal, response. Uncertainty, in contrast, refers to situations where the probabilities cannot be expressed with any mathematical precision, neither in frequencies nor in propensities. That is, in an uncertain world, the probabilities are unknown or unknowable. As an economist, Knight perceived this distinction to be important, since uncertainty may afford opportunities for profit that do not exist in situations where risks can be calculated (Rakow, 2010).

A related distinction was made by Savage (1954), known as the founder of modern Bayesian decision theory. Savage introduced the term "small worlds" for situations of perfect knowledge where all relevant alternatives, their consequences, and their probabilities are known for certain. According to him, these are the worlds in which Bayesian theory provides the best answer. Examples are lotteries and roulette. Small worlds need to be distinguished from "large worlds," where part of the relevant information is unknown or must be estimated from small samples, or the future is uncertain (Savage, 1954; Binmore, 2009). Examples are decisions about when to plan a picnic, whom to marry, and how to raise your kids. Decision making under uncertainty is what our brain does most of the time, while situations of known risk are relatively rare and found mostly in gambling. Savage made it very clear that applying Bayesian theory to decisions in large (uncertain) worlds would be "utterly ridiculous" (p. 16) because there is no way to know all alternatives, consequences, and probabilities. As a consequence, the brain needs strategies beyond Bayes' rule to succeed in an uncertain social and physical environment.

The distinction between risk and uncertainty has not always been recognized in cognitive neuroscience. In this article, we make a normative and a descriptive argument regarding this distinction:

1. *The best solution in a world of risk is generally not the best one in a world of uncertainty.* We argue that what the brain should do under risk does not necessarily generalize to what it should do under uncertainty.

2. *Cognitive processes in decisions under risk are not the same as in decisions under uncertainty.* We argue that cognitive processes observed under risk do not necessarily generalize to those the brain uses under uncertainty. Specifically, we argue:

*Risk:* Value-based statistical thinking (e.g., Bayesian probability updating plus utilities) is sufficient for making good decisions, provided that the problem is computationally tractable.

*Uncertainty:* Statistical thinking is no longer sufficient; heuristic thinking is required.

Much of cognitive neuroscience does not distinguish between risk and uncertainty. For instance, consider the claim, made in various forms, that the brain is Bayesian (e.g., Friston, 2010). Such a brain will likely provide optimal decisions only in small worlds, which are rare. Or consider the claim that there are two systems of reasoning: System 1, which is fast, heuristic, and prone to error, and System 2, which is slow, in keeping with the laws of probability, and rational (Sloman, 1996; Kahneman, 2011; for a critique, see Gigerenzer and Regier, 1996; Keren and Schul, 2009; Kruglanski and Gigerenzer, 2011). This two-system view does not consider that the laws of probability are sufficient for rationality in a small world only. In uncertain worlds, however, heuristics are indispensable. That is, logic and heuristics are tools for different classes of problems. For instance, the recent financial crises illustrate that statistical tools for estimating risk, Bayesian or otherwise, failed consistently in the real, uncertain world of finance (Taleb, 2010). They are optimal when risks are known, but not in a world of uncertainty. Applying normative theories of risk to uncertain worlds can in fact lead to disasters. With respect to the financial crash of 2008, Stiglitz (2010) noted: "It simply wasn't true that a world with almost perfect information was very similar to one in which there was perfect information" (p. 243). In sum, norms derived from assuming known risks do not simply generalize to norms under uncertainty.

## **RATIONALITY OF RISK** ≠ **RATIONALITY OF UNCERTAINTY**

The point that the calculus of probability can determine the best action under risk but not under uncertainty is not new; it has been made as often by statisticians as it has been forgotten by cognitive scientists. Savage (1954) devoted the first half of his seminal book *Foundations of Statistics* to Bayesian decision theory, and the second half to heuristic decisions, such as minimax (choose the option that minimizes the maximal loss). Arrow (2004) similarly writes that in uncertain, ill-specified worlds, unbounded rationality (i.e., expected utility optimization) "has no meaning at all" (p. 54). What is new are scientific demonstrations that show that applying an optimization model to an uncertain world can lead to decisions that are normatively inferior to simple heuristics (see Gigerenzer et al., 2011). Here is an illustration:

## **MEAN-VARIANCE OPTIMIZATION LEADS TO INFERIOR RESULTS IN THE REAL WORLD**

#### **Consider financial investment**

A normative theory of how to allocate money to *N* assets is Markowitz's Nobel prize-winning mean-variance model. Like all optimizing theories, it assumes a small world with perfect knowledge about the relevant parameters. Is this theory also optimal in the real, uncertain world of financial investment, where parameter values are not known for certain but need to be estimated? De Miguel et al. (2009) compared the mean-variance model with a heuristic called 1/*N*, or *equality heuristic.* The heuristic simply allocates money to *N* assets equally. The result was that 1/*N* consistently performed better in out-of-sample prediction (an elementary form of uncertainty). Cross-validation is a prime example of out-of-sample prediction: the data are divided into two complementary subsets, an in-sample data set, which is used for fitting the parameters of the competing models, and an out-of-sample data set, which is used for testing how well the models predict (see also below). Note that in data fitting, that is, when all data are known, the optimizing model always wins, but not in prediction. None of 12 other optimization models, Bayesian or otherwise, could consistently predict better than the simple heuristic.
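The comparison protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of De Miguel et al. (2009): the return figures are invented toy numbers, `mean_proportional_weights` is a hypothetical stand-in for an estimation-based strategy (it is not the mean-variance model), and all function names are ours.

```python
from statistics import mean

def equal_weights(n_assets):
    """1/N heuristic: allocate the same share to each of the N assets."""
    return [1.0 / n_assets] * n_assets

def mean_proportional_weights(in_sample):
    """Hypothetical estimation-based rule (NOT mean-variance optimization):
    weight each asset by its average in-sample return, floored at zero."""
    avgs = [max(mean(asset), 0.0) for asset in in_sample]
    total = sum(avgs)
    if total == 0:
        return equal_weights(len(in_sample))
    return [a / total for a in avgs]

def split(returns, n_in):
    """Divide each asset's return series into in-sample and out-of-sample parts."""
    in_sample = [asset[:n_in] for asset in returns]
    out_sample = [asset[n_in:] for asset in returns]
    return in_sample, out_sample

def portfolio_returns(weights, returns):
    """Per-period portfolio return for fixed weights.
    `returns` holds one list of period returns per asset."""
    n_periods = len(returns[0])
    return [sum(w * asset[t] for w, asset in zip(weights, returns))
            for t in range(n_periods)]

# Invented toy data: three assets, four periods each.
toy_returns = [
    [0.02, 0.03, -0.01, 0.01],
    [0.05, 0.06, -0.04, -0.03],
    [0.00, 0.01, 0.01, 0.00],
]

in_sample, out_sample = split(toy_returns, n_in=2)
fitted = mean_proportional_weights(in_sample)   # parameters estimated from the past
naive = equal_weights(len(toy_returns))         # nothing to estimate

print("out-of-sample mean return, fitted:", mean(portfolio_returns(fitted, out_sample)))
print("out-of-sample mean return, 1/N:   ", mean(portfolio_returns(naive, out_sample)))
```

The toy series were constructed so that the asset that looks best in-sample does worst afterwards; they demonstrate the mechanics of fitting on one subset and testing on the other, and carry no empirical weight.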

This result contradicts the widespread view that heuristics are always second best to logic and statistical optimization models. This view makes no distinction between risk and uncertainty. Researchers in this tradition have evaluated people's reliance on 1/*N* negatively and attributed it to their cognitive limitations. However, ignoring part of the information is what makes heuristics robust for the unknown future, whereas by trying to integrate all information and estimate the weights, complex strategies such as the mean-variance portfolio suffer from overfitting the past. The mathematically sophisticated reader who wants to understand why and when simple heuristics can be more accurate than complex statistical methods will find an answer in the bias-variance dilemma (Gigerenzer and Brighton, 2009).
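For readers who want the bias-variance dilemma in symbols, the standard decomposition of expected squared prediction error is given here as a sketch. The notation is ours rather than the article's: $f$ is the unknown true function, $\hat f$ is a model fitted to a random training sample, $y = f(x) + \varepsilon$, and $\sigma^2$ is the variance of the irreducible noise $\varepsilon$.

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$

Under this reading, a strategy without free parameters, such as 1/*N*, contributes no variance from estimation error, although it may be biased, whereas a model that estimates many parameters from small samples may have little bias but large variance.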

#### **THE ECOLOGICAL RATIONALITY OF SIMPLE HEURISTICS**

The fact that simple heuristics often outperform "optimization" models in situations of uncertainty has been demonstrated many times over (see Czerlinski et al., 1999; Gigerenzer and Brighton, 2009; Gigerenzer et al., 2011). In order to deal with an uncertain world, the brain relies on an *adaptive toolbox* of heuristics. Accordingly, intelligence is defined as the degree of knowing in which situation to use which heuristic. The scientific study of this normative question is called the study of the *ecological rationality* of a heuristic. For instance, 1/*N* tends to outperform mean-variance optimization in situations where predictive uncertainty is high (stocks are hard to predict), the number of options *N* is large (the optimization models have to estimate more parameters which leads to more error), and the sample size is relatively small. In uncertain worlds with these features, 1/*N* can be expected to be both faster and more accurate than the mean-variance optimization. When would mean-variance outperform 1/*N*? De Miguel et al. (2009) estimated that with 50 assets, one would need some 500 years of stock data before the optimization model is profitable.

Humans rely on the 1/*N* heuristic not only for financial investment. Many parents who have two or more children try to distribute their time and love equally. For three or more children, this heuristic paradoxically predicts interesting inequalities in the long run because the first- and last-born get more time, dependent on the spacing between births. Tests have provided empirical evidence for these predictions (Hertwig et al., 2002). In many situations, fairness and justice are achieved by distributing resources equally.

Our normative argument has fundamental consequences for the neuroscience of decision making: Claims that the rational brain always works by Bayesian calculations are founded on the assumption that what is rational in a world of risk is also rational in an uncertain world – the world our brain has to deal with most of the time. These claims are also incompatible with three well-known restrictions: Bayesian optimization is not feasible if (i) the choice alternatives are not known for sure, (ii) the mind has more than one goal, and (iii) even if all alternatives were known and the mind had only one goal, the calculations can quickly become computationally intractable, that is, no mind can actually perform them in a lifetime (Gigerenzer, 2004). Bayesian inference works in small worlds where there are reliable data for probabilities and only a few alternatives and cues.

## **COGNITIVE PROCESSES IN SITUATIONS OF RISK** ≠ **PROCESSES IN SITUATIONS OF UNCERTAINTY**

In the previous section, we argued that what is optimal in a world of risk is typically not the best in a world of uncertainty. Consequently, an adapted brain relies on different processes according to the situation. When faced with risk, using heuristics is of little value, unless the computations become too difficult. When faced with uncertainty, using logic and statistics is of little value, unless the part of the problem that is known is being calculated.

We would like to emphasize the importance of the distinction between risky and uncertain worlds for the neuroscientific investigation of decision making. So far, its focus has been on small world problems. But just as normative results from studying cognition in small worlds do not automatically generalize to what people should do in uncertain worlds, we cannot be sure that descriptive results generalize either. Influenced by small-world theories of decision making, neuroscientists and neuroeconomists have nevertheless relied heavily on the "gambling paradigm" as a model for exploring the neural correlates. In a typical neuroeconomic paradigm, participants are presented with the choice between two options, Option A and Option B, which differ with respect to objective dimensions such as the magnitude and the probability of reward (as assigned by the experimenter). Reward is largely defined as monetary value, which the participant will receive after the functional session. These problems require entirely different skills and strategies than decisions under uncertainty. For example, although calculating the expected value might suffice for a lottery, it will not be sufficient for deciding whether to be vaccinated against swine flu, which share to buy, or whom to marry.
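To make concrete what "calculating the expected value" involves in the gambling paradigm, here is a minimal Python sketch. The two options and their numbers are invented for illustration, and the function name is ours.

```python
def expected_value(gamble):
    """Expected value of a gamble given as (probability, payoff) pairs.
    Requires that the probabilities are known and sum to 1."""
    total_p = sum(p for p, _ in gamble)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in gamble)

# Hypothetical small-world choice: every outcome and probability is supplied.
option_a = [(0.5, 10.0), (0.5, 0.0)]   # 50% chance of 10, otherwise nothing
option_b = [(1.0, 4.0)]                # a sure 4

choice = "A" if expected_value(option_a) > expected_value(option_b) else "B"
print(choice)
```

The calculation is possible only because the experimenter hands over the complete list of outcomes and probabilities. For the vaccination, investment, or marriage decisions mentioned above, that input list cannot be written down in the first place.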

Results such as the finding that "activity in the ventral striatum during the evaluation of monetary gambles is non-linear in probabilities in the pattern predicted by prospect theory" (Hsu et al., 2009, p. 2231) may capture the neural activation pattern when comparing gambles. Concluding that this activity pattern will also be observed when searching for jobs or mates, however, is not warranted. The pattern predicted by prospect theory in fact disappears and even reverses when the probabilities are not provided by the experimenter but the participant instead has to learn these from experience, a phenomenon known as the description-experience gap (Hertwig and Erev, 2009). Nor do findings from small worlds easily translate into a cognitive process model, that is, testing for the neural correlates of some form of utility model cannot elucidate the cognitive mechanisms in large world problems. In the words of Colombo and Seriès (2012): "that the brain *is* a Bayesian machine does not follow from the fact that Bayesian models are *used* to study the brain and the behavior it generates" (p. 2).

In an uncertain world, there is broad experimental evidence that humans and other animals rely on a toolbox of heuristics. These are based on evolved and learned core capacities and include (for details, see Gigerenzer and Gaissmaier, 2011):


What would a neuroscientific investigation of heuristic decision making look like? One approach is to study the neural correlates of heuristic processes, such as search rules, stopping rules, and decision rules (e.g., Volz et al., 2006, 2010; Khader et al., 2011; Rosburg et al., 2011). In what follows, we provide two illustrations for how to go beyond lotteries and study the neural correlates of the use of cognitive heuristics in an uncertain world.

## **NEURAL CORRELATES OF HEURISTIC DECISIONS IN UNCERTAIN WORLDS**

Note that studying decision making under uncertainty (as opposed to risk) does not require squeezing the complexity of the large world into the laboratory. It simply requires studying tasks where not all alternatives, consequences, and probabilities are known for sure or provided by the experimenter.

#### **RECOGNITION HEURISTIC**

Consider a simple heuristic that humans and other animals use to make inferences about an uncertain world (Goldstein and Gigerenzer, 2002):

*Recognition heuristic:* If one of two objects is recognized and the other is not, then infer that the recognized object has the higher value with respect to the criterion.

<sup>1</sup>The fluency heuristic is a simple heuristic that can be used to exploit recognition memory and is defined in the following way: If two objects are recognized, and one of the objects is more fluently retrieved, then infer that this object has the higher value with respect to the criterion; where retrieval fluency is defined as how long it takes to retrieve a trace from long-term memory (cf. Schooler and Hertwig, 2005).

<sup>2</sup>Fast-and-frugal decision trees are simple rules for categorization; they are fast-and-frugal since they allow a classification decision at each level of the tree (cf. Martignon et al., 2011). For binary predictors, a fast-and-frugal tree has *n* + 1 exits, while a full tree has 2*<sup>n</sup>* exits. An example is the Simple Triage and Rapid Treatment (START) procedure, which is used to categorize patients into those who need immediate medical treatment and those whose treatment can be delayed (Super, 1984). By using START, "a paramedic sequentially checks up to five diagnostic cues to decide which category a person falls into; a decision can be made after each cue is checked" (Luan et al., 2011, p. 316). By using such a simple and transparent decision tree, the decision maker/paramedic does not need to search for and integrate all the relevant cues so as to reach a sound decision.

For instance, consider the question whether Milan or Modena has more inhabitants. If one has heard of Milan but not of Modena, the inference is that Milan is the larger city. Note that the recognition heuristic (RH) requires semi-ignorance to be *applicable*: if one has heard of both objects or of neither, it cannot be applied. Experimental studies indicate that a large proportion of subjects rely on it in uncertain situations, such as when predicting which tennis player will win at Wimbledon or deciding which political candidate to vote for, and that animals rely on it when choosing food (Gigerenzer and Goldstein, 2011). These studies report a substantial correlation between the proportion of judgments that follow the RH and the validity of recognition for the task, suggesting an adaptive use of the heuristic.
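The recognition heuristic and its applicability condition can be written as a short Python function. This is a sketch with names of our own choosing; the set of recognized cities stands for one hypothetical person's knowledge.

```python
def recognition_heuristic(a, b, recognized):
    """Infer which of two objects has the higher criterion value.
    Returns the recognized object if exactly one of the two is recognized,
    and None when the heuristic is not applicable (both or neither recognized)."""
    a_known = a in recognized
    b_known = b in recognized
    if a_known and not b_known:
        return a
    if b_known and not a_known:
        return b
    return None  # semi-ignorance is missing: another strategy is needed

# A hypothetical person who has heard of Milan but not of Modena.
print(recognition_heuristic("Milan", "Modena", recognized={"Milan"}))
```

Note that the function captures only whether the heuristic *can* be applied. Whether it *should* be applied in a given environment is a separate judgment that this sketch does not model.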

There are two competing hypotheses in the literature: that people use the RH in an adaptive or in an automatic way. The adaptive use requires two processes. The first assesses whether or not the alternatives are recognized and hence whether the RH *can* be applied in principle. The second process assesses whether the RH *should* be applied, which is essentially a judgment about the heuristic's ecological rationality, that is, the match between mind and environment. In contrast, the automatic use entails only one process: automatically choosing the recognized alternative, without considering why recognition should be predictive of the criterion. Such an automatic strategy would also be successful for the Milan–Modena question, where recognition is so highly correlated with city size.

In 2006, we tested these hypotheses with the help of functional magnetic resonance imaging (fMRI; Volz et al., 2006). To see whether RH-based decision processes depend on additional judgments of ecological rationality, which should draw on brain areas beyond those known to reflect recognition memory processes, we ran two experiments. In experiment 1, participants were presented with the names of two cities and asked to indicate which city in each pair is larger (recognition plus inference). In experiment 2, participants were presented with the names of two cities and asked to indicate which city they knew in each pair (recognition only). Comparing the activation results of the two experiments, we found that decision processes in both experiments 1 and 2 drew on medial parietal areas, which are assumed to reflect recognition memory processes. In contrast, specifically RH-based decision processes (in experiment 1) drew additionally on the anterior medial prefrontal cortex, which is taken to reflect judgments of the ecological rationality of the RH in terms of assessing one's own sense of recognition. Thus, RH-based decision processes go beyond automatically choosing the recognized alternative and are guided by judgments about the ecological rationality of the RH, as reflected by activation in anterior medial prefrontal cortex.

The study illustrates how fMRI can be used to compare competing hypotheses about the selection of heuristics: here, hypotheses on automatic versus adaptive use.

#### **TAKE-THE-BEST HEURISTIC**

The RH draws on the core capacity of recognition of names, faces, or other stimuli. If both objects are recognized, the RH is not applicable, but the take-the-best heuristic (TTB) is. Like the RH, take-the-best models how people infer which of two objects has a higher value on a criterion based on cue values retrieved from memory (Gigerenzer and Goldstein, 1996). The heuristic is defined by three building blocks:

#### *Take-the-best heuristic:*


Thus, according to this cognitive process model, information search is terminated as soon as a cue discriminates between the alternatives; other cues are not activated. For instance, if a person has heard of both Milan and Modena and recalls that Milan is a state capital (the most valid cue) but Modena is not, that person would stop searching for further cues and infer that Milan has the larger population.
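The lexicographic process just described can be sketched in Python as a search rule, a stopping rule, and a decision rule, following the way take-the-best is commonly stated in the literature (Gigerenzer and Goldstein, 1996). The cue names other than the state-capital cue and all cue values are hypothetical placeholders, and the function returns no choice where the heuristic would fall back on guessing.

```python
def take_the_best(a, b, cues_by_validity, cue_values):
    """Infer which of two recognized objects has the higher criterion value.

    cues_by_validity: cue names ordered from most to least valid.
    cue_values[obj][cue]: 1 (positive), 0 (negative), or missing (unknown).
    Returns (choice, n_cues_inspected); choice is None if no cue discriminates.
    """
    inspected = 0
    for cue in cues_by_validity:              # search rule: go through cues by validity
        inspected += 1
        va = cue_values[a].get(cue)
        vb = cue_values[b].get(cue)
        # stopping rule: stop at the first cue that discriminates, i.e., one
        # object has a positive value and the other does not
        if va == 1 and vb != 1:
            return a, inspected               # decision rule: choose the positive object
        if vb == 1 and va != 1:
            return b, inspected
    return None, inspected                    # no cue discriminates

# Hypothetical memory contents; only the first cue comes from the example in the text.
cues = ["state_capital", "cue_2", "cue_3"]
memory = {
    "Milan":  {"state_capital": 1, "cue_2": 1, "cue_3": 0},
    "Modena": {"state_capital": 0, "cue_2": 1, "cue_3": 0},
}
print(take_the_best("Milan", "Modena", cues, memory))
```

The second return value makes the limited search visible: in the example only one of the three cues is retrieved, whereas a weighting-and-adding model would need all of them.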

Note that take-the-best implies a lexicographic step-by-step process with limited search. This process is quite different from weighting-and-adding all cues, which is assumed in models that postulate the integration of all cues, such as in value-based decision models. Experimental studies have provided strong evidence that many people's memory-based inferences are consistent with the predictions of take-the-best (and inconsistent with those of adding-and-weighting models) in situations where its use is ecologically rational (e.g., Rieskamp and Otto, 2006; Bröder, 2011). Specifically, experts appear to rely on simple search and stopping rules more often than novices (Garcia-Retamero and Dhami, 2009).

Can cognitive neuroscience provide evidence for the hypothesis of limited search, as defined in the stopping rule of take-the-best? Khader et al. (2011) used fMRI to test the assumption that heuristics simplify decision making by activating long-term memory representations of only those attributes that are necessary for the decision, since it is unclear from behavioral studies alone whether using heuristics is indeed associated with limited memory search (with the exception of reaction time studies; see Bröder and Gaissmaier, 2007). Accordingly, the authors monitored the activation of specific long-term memory representations while participants made memory-based decisions using the take-the-best heuristic.

Khader et al. (2011) taught their subjects to make decisions using the TTB heuristic while measuring their hemodynamic response. Particularly, they let their participants first learn by trial and error to associate each of 16 fictional company names with a specific stimulus pattern of four binary cues (objects, houses, locations, faces). Then, participants learned how to make decisions using the TTB heuristic for a fictional job selection scenario (i.e., which of two applicants is more suitable for a job). Thereafter, participants learned by trial and error (i) the importance of the different attributes for predicting which of two companies would be more successful, e.g., the attribute hierarchy objects > houses > locations > faces; and (ii) which stimulus was predictive of higher success, that is, the attribute direction. In each phase, participants learned until they satisfied a criterion.

In the actual decision making task, participants were presented with only the names of two companies and then had to infer, by using the TTB heuristic, which company will be more successful in the next year. To do so, participants had to retrieve all the relevant attribute information from long-term memory. The attributes with which the two companies were described consisted of visual information known to be represented in different parts of the posterior cortex, e.g., in a face-specific and in a house-specific region of interest. That allowed the authors to examine activation within these regions of interest as a function of the number of the to-be-retrieved attributes. Given the cognitive process model of the TTB heuristic, Khader et al. (2011) expected the activation in the regions of interest to be systematically modulated by the relative importance of the information for making a decision. Their specific analyses revealed a controlled retrieval shown by a selective boosting of activation, specifically in those regions that represent the attributes that were relevant for the decision. For example, activation strongly increased in the face-specific region solely in those trials in which faces were relevant for the decision. Furthermore, a prolonged response to an attribute was found only when it was relevant late in the decision process, when the attribute was low in importance. All in all, the data showed a "selective modulation of neural activation that follows the retrieval order according to TTB" (p. 11), which the authors take to support the notion of controlled retrieval processes.

Thus, by using fMRI the authors could provide evidence in favor of the cognitive process model's prediction for the decision phase, i.e., using the TTB heuristic is indeed associated with a controlled activation of decision-relevant attribute representations. As in the case of the RH, the imaging study was used to compare two competing models, exhaustive search as assumed in standard weighting-and-adding models and limited search as defined by take-the-best.

## **STUDY THE NEURAL CORRELATES OF PROCESS MODELS, NOT AS-IF MODELS**

Why are experimental studies with situations of known risk, such as lotteries, so popular? This is especially puzzling given that lotteries, roulette, and other tasks with known risks are quite recent in human history. One answer is that they facilitate application of statistical optimization models, such as expected utility and Bayesian updating, which are also quite recent achievements in human history. However, studies of cognitive processes have provided little evidence that the mind engages in expected utility calculations during decision making; instead, there is reliable evidence for the use of heuristics (see Ford et al., 1989; Payne et al., 1993; Friedman and Sunder, 2011). For instance, Friedman and Sunder (p. 1) concluded in their review of the literature on risky choice from 1950 to 2010:

No such functions [utility or similar Bernoulli functions] have yet been found that are useful for out-of-sample predictions. Nor do we find practical applications of Bernoulli functions in major risk-based industries such as finance, insurance, and gambling.

The important methodological concept is "out-of-sample prediction." Expected utility theory or its variants such as prospect theory can easily fit their parameters to the data after the fact, but the real test is in prediction, not fitting. "Out-of-sample" means that the parameters of a model, whether heuristic or optimizing, are fitted to one part (e.g., half) of the sample and the model is then tested on the other part. This is an elementary form of uncertainty, where not all data are known. Most importantly, neoclassical economists have never claimed that the brain computes expected utilities but explicitly emphasize that optimization models do not describe the cognitive process. Following Friedman's (1953) as-if methodology, economists consider these models only as tools for prediction, making deliberately "wrong" assumptions that are mathematically convenient. Unfortunately, many cognitive neuroscience studies appear to be unaware of this conceptual problem and search for the neural correlates of "as-if" models.

## **CONCLUSION**

We distinguished between two kinds of problems humans face: worlds of *risk* or worlds of *uncertainty*. In a world of risk (small world), all relevant alternatives, their probabilities, and their consequences are known for sure and the future is certain. In contrast, in a world of uncertainty (large world) part of the information is unknown or has to be estimated from small samples, and surprises can happen. The second distinction we introduced is between *what* decisions people make (the outcome) and *how* they make them (the process). Answering the first question leads to *as-if* models; answering both questions leads to *process* models. We argue that the two distinctions are correlated: As-if models tend to match small world studies, whereas process models tend to match large world studies.

We pointed out the strong focus on decision making under risk in neuroscientific studies, which pay little attention to how the brain makes adaptive decisions in an uncertain world. That becomes problematic when the normative and descriptive results are generalized to how the brain deals with an uncertain world. In addition, we provided evidence that the normative solution under risk is not the best one under uncertainty. We also provided evidence that the cognitive processes for decisions in a world of risk are not the same as in a world of uncertainty. The study of behavior in lotteries – and other small world tasks – does not address the question of how humans make decisions when the conditions for rationality postulated by the model of neoclassical economics are not met, a question emphasized by Simon (1989). In large worlds, people cannot optimize but instead "satisfice" by relying on the brain's adaptive toolbox.

In sum, the current focus of cognitive neuroscience studies on situations where all risks are known and optimization is possible imposes limits on the understanding of adaptive brain processes, both normatively and descriptively. The neural correlates of cognitive processes such as heuristic search, stopping rules, and aspiration levels have little chance of being detected and may even be taken for correlates of expected utility and other as-if theories.

## **REFERENCES**


under risk is nonlinear in probabilities. *J. Neurosci.* 29, 2231–2237.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 February 2012; accepted: 22 June 2012; published online: 12 July 2012. Citation: Volz KG and Gigerenzer G (2012) Cognitive processes in decisions under risk are not the same as in decisions under uncertainty. Front. Neurosci. 6:105. doi: 10.3389/fnins.2012.00105*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Volz and Gigerenzer. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The neurobiology of decision-making and responsibility: Reconciling mechanism and mindedness

## *Michael N. Shadlen<sup>1</sup>\* and Adina L. Roskies<sup>2</sup>*

<sup>1</sup> Department of Physiology and Biophysics, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA <sup>2</sup> Department of Philosophy, Dartmouth College, Hanover, NH, USA

*Edited by:*

Carlos Eduardo De Sousa, Northern Rio de Janeiro State University, Brazil

#### *Reviewed by:*

Dario L. Ringach, University of California Los Angeles, USA Carlos Eduardo De Sousa, Northern Rio de Janeiro State University, Brazil

#### *\*Correspondence:*

Michael N. Shadlen, Department of Physiology and Biophysics, Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195-7290, USA. e-mail: shadlen@uw.edu

This essay reviews recent developments in neurobiology which are beginning to expose the mechanisms that underlie some elements of decision-making that bear on attributions of responsibility. These "elements" have been mainly studied in simple perceptual decision tasks, which are performed similarly by humans and non-human primates. Here we consider the role of neural noise, and suggest that thinking about the role of noise can shift the focus of discussions of randomness in decision-making away from its role in enabling alternate possibilities and toward a potential grounding role for responsibility.

**Keywords: free will, responsibility, lateral intraparietal area, motion perception, compatibilism, determinism, noise, policy**

## **INTRODUCTION**

As neuroscience begins to expose the brain mechanisms that give rise to decisions, what does the assortment of facts tell us about such philosophical concepts as responsibility and free will? To many, these concepts seem threatened because of an inability to reconcile a truly free choice with either deterministic brain mechanisms on the one hand or stochastic effects on the other. The former seem to negate the notion of choice by rendering it predictable, at least in principle, or as being under the control of forces external to the agent. The latter reduce choice to caprice, a weak freedom that precludes any meaningful assignment of responsibility. In this essay, we offer an alternative perspective that is informed by the neural mechanisms that underlie decision-making.

Some of these mechanisms point to features that distinguish agents from each other and allow us to understand why one agent might make a better or worse choice than another agent. We suggest that more attention be paid to these aspects of decision-making, and that such attention may help bridge the neurobiology of decision-making (NBDM) and philosophical problems in ethics and metaphysics. Our idea is not that the neurobiology supports one particular philosophical position, but that certain principles of the NBDM are relevant to ethicists of many a philosophical persuasion.

## **NEUROSCIENCE AND THE PHILOSOPHY OF FREEDOM AND RESPONSIBILITY**

**Figure 1** shows in broad brushstrokes the main philosophical positions regarding free will. In the philosophical literature, theorists can be classified according to the relation they see between the truth of determinism and the possibility of freedom.

To some, the more knowledge we have about the workings of the brain, the less it seems possible that we exercise free will when we make choices, and the less it seems that we can be held responsible for our decisions (Crick, 1994; Schall, 2001; Greene and Cohen, 2004; Glimcher, 2005). It is not just the physicalist concept that the mind is the brain, but that as we come to understand more about how the brain gives rise to choices, mechanisms seem to displace freedom. At least some philosophers and many neuroscientists wonder whether moral responsibility is something that we would reject if we knew everything about the machinery of the human brain. They worry that the neuroscience of decision-making will render concepts like free will and responsibility "quaint fictions" – although perhaps essential ones that we rely upon as social agents. As the NBDM exposes the mechanisms that underlie choice behavior, our agency seems to be replaced by a machine that converts circumstances into an outcome without any real choice at all. NBDM is thus perceived by many as supporting "hard determinism"<sup>1</sup>: rendering the cause–effect chains with a modern brush.

> Compatibilists argue that determinism does not strip the agent of choice, responsibility, or freedom (Frankfurt, 1971; Strawson, 1974; Hume, 1975/1748; Dennett, 1984; Bok, 1998; Blackburn, 1999; James, 2005/1884). Indeed, some compatibilists deny that the practice of ethics, and the concept of responsibility which it presupposes, depends upon any reconciliation of human action with fundamental physics for justification (Strawson, 1974; Williams, 1985). Even if one adopts a compatibilist view, however, there is no particular reason to exclude neuroscientific facts from ethical discussion. Although neuroscience is not foundational to ethics, it has the potential to illuminate capacities and limitations of a decision maker. Capacities like impulsivity and rationality are obvious examples.

<sup>1</sup>Hard determinism is the position that (a) freedom is incompatible with determinism, and (b) determinism is true. Many hard determinist arguments are based on the inexorability of causal chains and the resultant lack of ability to do otherwise. For more definitions of technical terms, see **Box 1**.

#### **Box 1 Some definitions.**

Physicalism: The thesis that all that exists is physical or supervenes on the physical.

Reductionism: The thesis that all complex systems can be explained by explaining their component elements.

Emergent properties: New properties that arise in a complex system as a result of low-level interactions.

Eliminativism: The view that the terms we use to describe a domain are either redundant or in error and thus could be eliminated from our discourse.

Neural noise: Variability in neural signal not tied to the signaling function of the neuron.

Compatibilism: The thesis that free will is compatible with determinism.

Incompatibilism: The thesis that free will is incompatible with determinism.

Hard determinism: The view that free will is incompatible with determinism, determinism is true, and we therefore lack freedom.

Libertarian free will: Free will dependent on indeterminism; the main idea is that indeterminism allows agents to break free of the chain of causation.

In this essay, we explain why we think that neuroscience reveals aspects of decision-making with the potential to illuminate our conception of ethical responsibility.

Like nearly all neuroscientists, we accept physicalism. All matters mental are caused by brains. This leaves open the possibility that not every aspect of our thoughts and feelings can be adequately expressed in reductionist terms. We leave open the possibility of emergent phenomena: properties that arise from simpler causes but which are not explained away by them<sup>2</sup>.

Our goal is to demonstrate a correspondence between neural mechanisms and some elements that compatibilists have long suspected must be present. Rather than "explain away" free will, the neurobiology enhances our conception of ourselves as having will, agency, authorship, and real options. In the end, we hope to convince a certain kind of compatibilist that neuroscience matters in ways that he tends to miss because he is so focused on dismissing the entire body of physical knowledge wielded by the hard determinist to argue against freedom and responsibility. And we hope to convince the incompatibilist that neuroscience can be explanatory without rendering responsibility and free will quaint but illusory.

#### **FREE WILL AND RESPONSIBILITY**

Having free will minimally implies that when I choose A (i) I do so with some degree of autonomy, and (ii) in some sense, I could have made another choice<sup>3</sup>. The first condition implies ownership of the choice. My choice cannot be explained *entirely* by forces outside the ones I control as an agent. The second means that there is a real alternative and that I could choose that alternative. Our arguments here will focus on the former condition, although they have some impact on the second as well.

Most people take it that moral responsibility implies freedom: One can only be responsible if one is free. For someone to be held responsible for an action, they must be, in some sense, a cause of that action. Moreover, assignment of moral responsibility is relative to the properties of a decision maker or agent. This invites us to explain a relevant part of the decision as depending fundamentally on properties of the deciding agent. The relevant properties are, loosely, what we refer to as constitution, temperament, values, interests, passions, capacities, and so forth. In our discussion of the neurobiology, we will refer to such properties as policies that govern parameters of the decision-making process, such as the tradeoff between speed and accuracy.

<sup>2</sup>A highly intuitive discussion of emergence can be found in Gazzaniga's (2011) recent book. The concept of emergence is a matter of ongoing debate in philosophy (e.g. see Kim, 2010 for discussion). Our arguments do not depend on a metaphysically demanding notion of emergence, but rather on a weak notion of emergence that prevents radical eliminativism of high-level properties.

<sup>3</sup>There is considerable philosophical dispute about in what sense (ii) must be true, and some have argued that it is not essential, or that it can be true even if there is only one way the world can evolve (Frankfurt, 1971; Dennett, 1984; Vihvelin, 2012).

#### **PREDICTABILITY AND DETERMINISM**

Some scientists might conceptualize the problem a little differently than the organization depicted in **Figure 1**, but the same basic elements are present: causes and effects, randomness and predictability (Crick, 1994; Schall, 2001; Glimcher, 2005). Many neuroscientists, physicists, and mathematical theorists subscribe to the following position: they are (1) physicalists who (2) believe the mental is explained by a physical brain through chains of causation, but (3) they also embrace some elements of randomness. The randomness can be fundamental indeterminism, based on principles of quantum mechanics, or it can be uncertainty that arises from complexity in a deterministic system whose quantum effects are negligible. This randomness implies that an agent's choices are not practically predictable from the history of events or the state of the brain beyond probabilistic expectations.

Libertarians deny that freedom is compatible with determinism, but believe that indeterminism is true and makes freedom possible (Kane, 2002). Many scientists likewise deny that freedom is compatible with determinism, and reject the notion that the universe (or brain) is determined, because of the likelihood of randomness. However, unlike Libertarians, they reject the idea that randomness confers freedom or responsibility. Let us call them "Scientific Hard Incompatibilists" or SHIs. SHIs think the sources of randomness provide the basis for a physical understanding of the unpredictable, and recognize that in the real world even deterministic processes are coupled with a randomness that muddies the deterministic machinery from the perspectives of both the actor and the observer. Prediction is imperfect. Choices can be dissected into determined and random components (necessity and chance). However, and perhaps ironically, SHIs also believe that randomness cannot confer free will and responsibility. There is no "willing" and certainly no responsibility for a choice that is explained only by randomness. In this sense, "Chance is as relentless as necessity" (Blackburn, 1999). Therefore, the SHI concludes that free will and responsibility are illusory.

By focusing only on the question of whether low-level deterministic or indeterministic processes make room for free will and responsibility, we believe the SHI's dissection leaves out something essential. As explained in the next section, the neurobiology invites us to view uncertainty not so much as it bears on predictability but as it bears on the strategy that an agent adopts when making a choice in the face of uncertainty. The neurobiology sheds light on how these strategies are implemented and therefore why one decision maker may make one choice, whereas another individual may choose differently.

## **NEUROBIOLOGY OF DECISION-MAKING**

Here we provide a brief and highly selective review of some findings in neuroscience about the neural bases of decision-making. Neurobiology is beginning to illuminate the mechanisms that explain why one agent makes one choice, whereas another would choose differently. We discuss the role of randomness in explaining such choices. The role that this randomness plays in our argument is not to confer freedom but to necessitate high-level policies regarding decisions. Although these policies themselves do not immediately provide conceptual grounding for responsibility, they provide a potential locus for philosophical arguments linking the nature of the agent to his or her decisions.

#### **SIMPLE DECISIONS**

A decision is a commitment to a proposition or plan, and a decision process encompasses the steps that lead to this commitment, what is often termed deliberation among options. These options may take the form of actions, plans, hypotheses, or propositions. Most decisions are based on a variety of factors: evidence bearing on prior knowledge about the options, prior knowledge concerning the relative merit of the options, expected costs and rewards associated with the matrix of possible decisions and their outcomes, and other costs associated with gathering evidence (e.g., the cost of elapsed time). This formulation is not exhaustive, but it covers many types of decisions, ranging from simple to complex. Because the elements listed in this paragraph play a role in simple decisions as well as complex ones, it is possible to study the NBDM in non-human animals, including our evolutionarily close relatives, monkeys. This research has begun to expose basic principles that are applicable to the more complex decisions we make in our lives, including those for which we can be held morally responsible.

The process of deciding generally has a beginning and an end. For perceptual decisions about the direction of motion, like the one depicted in **Figure 2A,B**, the onset of a random-dot visual stimulus marks the beginning of the decision process. Of course, other aspects of the decision process are already in play before this. They might be lumped together as establishing the rules of engagement from various contextual cues: something in the brain establishes that a decision is to be made in the first place, that the source of information resides in a region of the visual field, that the useful information is encoded by a set of neurons in the visual cortex, and that the mode of response will be an eye movement to a target. The stream of information from the stimulus is processed by specific regions in visual cortex, which supply a stream of evidence to downstream processes. This momentary evidence furnishes a fresh piece of information at each instant that bears on the decision process. These bits of evidence are accumulated until there is enough to render a decision. Mainly for convenience, we term this commitment point the end of the decision process. After that, either an action ensues to communicate or enact the decision, or there is some delay during which such an action is planned (the occasional change of mind is understood as a second decision process; Resulaj et al., 2009).

We know much about the neurobiology underlying this type of simple decision. The stream of momentary evidence comes from neurons in the visual cortex, concentrated in an area of the macaque brain called the middle temporal visual area (MT; also known as V5). These neurons respond better when motion through their receptive fields is in one direction and not in another. They have a background discharge, which is modulated by the random-dot motion stimulus. If the neuron prefers rightward motion, then it tends to produce action potentials at a faster rate when motion is to the right than when it is to the left. When the decision is difficult – that is, when only a small fraction of randomly appearing dots actually move to the right at any moment – the same neuron increases its discharge albeit less vigorously. When the motion is leftward but not strongly coherent, the neuron also increases its discharge, though now to an even lesser degree. For strong motion to the left the neuron would typically discharge at the background rate or possibly slightly below. The mechanisms for extracting this momentary evidence about direction are reasonably well understood (Born and Bradley, 2005). It is also clear from lesion and microstimulation experiments that these MT neurons supply this evidence to the decision process (for reviews, see Parker and Newsome, 1998; Gold and Shadlen, 2007).

**FIGURE 2 | Neural mechanism of a decision about direction of motion. (A)** Choice-reaction time (RT) version of the direction discrimination task. The subject views a patch of dynamic random dots and decides the net direction of motion. The decision is indicated by an eye movement to a peripheral target. The subject controls the viewing duration by terminating each trial with an eye movement whenever ready. The gray patch shows the location of the response field (RF) of an LIP neuron. **(B)** Effect of stimulus difficulty on choice accuracy and decision time. Solid curves are fits of a diffusion model, which accounts simultaneously for choice and decision time. **(C)** Response of LIP neurons during decision formation. Average firing rate from 54 LIP neurons is shown for three levels of difficulty. Responses are grouped by motion strength and direction of choice, as indicated. Left graph, The responses are aligned to onset of random-dot motion and truncated at the median RT. These responses accompany decision formation. Shaded inset shows average responses from direction selective neurons in area MT to motion in the preferred and anti-preferred directions (solid and dashed traces, respectively). After a transient, MT responds at a nearly constant rate. The LIP firing rates approximate the integral of a difference in firing rate between MT neurons with opposite direction preferences. Right graph, The responses are aligned to the eye movement. For Tin choices (solid curves), all trials reach a stereotyped firing rate before saccade initiation. We think this level represents a threshold or bound, which is sensed by other brain regions to terminate the decision. **(D)** Responses grouped by RT. Only Tin choices are shown. Arrow shows the stereotyped firing rate occurs ∼70 ms before saccade initiation. Adapted with permission from Gold and Shadlen (2007); inset from the online database used in Britten et al. (1992), www.neuralsignal.org database nsa2004.1.

The decision on this task benefits from an accumulation of evidence in time. The direction selective sensory neurons described in the previous paragraph do not accumulate evidence (**Figure 2C**, inset). Their responses represent the momentary information in the stimulus. Other neurons, which reside in association cortex, represent the accumulation of this momentary evidence. A key property of neurons in these areas – the vast majority of the cortical mantle in primates – is the capacity to maintain discharge for longish periods in the absence of an immediate sensory stimulus, or an immediate motor effect. The exact parameterization of "longish" is not known, but it is at least in the seconds range. This is in marked contrast to sensory neurons like the ones discussed above, which keep up with a changing environment (tens of milliseconds) or motor neurons, which cause changes in body musculature on a similar timescale. Indeed it is likely that this flexibility in timescale underlies many of the higher cognitive capacities that we cherish.

Some of these neurons in association cortex produce firing rates that reflect the accumulated evidence from the motion stimulus. For example, neurons in the lateral intraparietal area (LIP) in parietal cortex respond to visual stimuli in a restricted portion of space, termed the response field (RF), but they also respond when the RF has been cued as a potential target of an eye movement. These neurons "associate" information from vision with plans to look (Gnadt and Andersen, 1988; Andersen, 1995; Mazzoni et al., 1996; Colby and Goldberg, 1999; Lewis and Van Essen, 2000). During the decision process, these LIP neurons represent the accumulated evidence that one of the choice targets is a better choice (given the task) than the other. While MT neurons are producing spikes at a roughly constant rate, neurons in LIP gradually increase or decrease their rate of discharge as more evidence mounts for or against one of the choices. If the stimulus is turned off and a delay period ensues, MT neurons return to their baseline firing rates, but LIP neurons whose response fields contain the chosen target emit a sustained discharge that indicates the outcome of the decision: effectively, a plan to make an eye movement to that target.

When the decision maker is permitted to answer at will, the LIP neurons also lend insight into the mechanism whereby the decision terminates. As shown in **Figures 2C,D**, the decision ends when the firing rates of certain LIP neurons achieve a critical level. Whether the decision was based on strong or weak evidence and whether the process transpired quickly or not, the LIP responses achieve the same level of discharge at the moment of decision. This is an indication that there is a threshold for terminating the decision process. Since the LIP firing rate represents the accumulation of momentary evidence, the termination "rule" is to commit to a choice when the accumulated evidence reaches a critical level. For example, the rule might be: if the rightward preferring MT neurons have produced ∼6 spikes per neuron more than the leftward preferring neurons, choose right; else if the leftward preferring MT neurons have produced ∼6 spikes per neuron more than the rightward preferring neurons, choose left; else continue to accumulate evidence. This implies that LIP neurons are effectively computing the integral of the difference in firing rates between rightward and leftward preferring MT neurons (Ditterich et al., 2003; Huk and Shadlen, 2005; Hanks et al., 2006; Wong et al., 2007).
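The termination rule described above can be written out as a short procedure. The following is only a sketch under stated assumptions: the function name `decide`, the per-step spike-count inputs, and the default bound of 6 spikes per neuron are illustrative choices taken from the example in the text, not a model of how LIP implements the rule.

```python
def decide(right_counts, left_counts, bound=6.0):
    """Accumulate right-minus-left evidence until it reaches +bound or -bound.

    right_counts, left_counts: hypothetical per-time-step spike counts
    (per neuron) from rightward- and leftward-preferring MT pools.
    Returns (choice, steps_used); choice is None if the evidence ends
    before either bound is reached.
    """
    accumulated = 0.0
    steps = 0
    for r, l in zip(right_counts, left_counts):
        steps += 1
        accumulated += r - l          # integrate the momentary evidence
        if accumulated >= bound:      # enough net evidence for rightward
            return "right", steps
        if accumulated <= -bound:     # enough net evidence for leftward
            return "left", steps
    return None, steps                # no commitment: keep accumulating


choice, steps = decide([3, 2, 4, 3], [1, 1, 2, 1])
print(choice, steps)
```

In this toy run the running difference is 2, 3, 5, 7, so the commitment to "right" occurs on the fourth step, when the total first exceeds the bound.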

This mechanism of accumulation of evidence to a threshold level is called bounded accumulation (or bounded drift–diffusion, or random walk to bound; **Figure 3**). The idea was developed in the 1940s as a statistical process for deciding between alternatives (Wald, 1947), and it played a key role in British wartime code-breaking (Good, 1979). It has found application in areas of sensory psychology (Link, 1992) and cognitive psychology (Ratcliff and Rouder, 2000; Usher and McClelland, 2001; Bogacz et al., 2006). In all of these cases the threshold for terminating the decision process, what we will call the "bound," controls both the speed and the accuracy of the decision process (e.g., **Figure 2B**). This tradeoff is an example of a *policy* that the brain implements to shape its decisions.

rates between pools of direction selective neurons that prefer right and left. At each moment, this difference is a noisy draw from a Gaussian distribution with mean proportional to motion strength. The mechanism extends to account for choices and RT when there are more than two alternatives. Reprinted with permission from Gold and Shadlen (2007).

Typically, when a stream of evidence is available, a decision maker will tend to make fewer errors if she takes more time. In the motion task, it appears that this is achieved by raising the level of the bound for terminating the decision process. This simple adjustment to the mechanism leads to longer decision times and to more reliable evidence at the point of termination (Palmer et al., 2005). In the case of the motion experiment, the policy is establishing the tradeoff between speed and accuracy of the direction judgments. The resultant payoff is something like the rate over which reward is obtained and errors avoided (Gold and Shadlen, 2002; Bogacz et al., 2006).
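The effect of the bound on speed and accuracy can be illustrated with a toy simulation of bounded accumulation. This is a minimal sketch, not a fit to any data: the drift of 0.1 per step, the unit-variance Gaussian noise, the two bound heights, and the function name `simulate_trials` are all assumptions chosen for illustration.

```python
import random


def simulate_trials(drift, bound, n_trials=2000, seed=0):
    """Simulate a random walk to bound; return (accuracy, mean decision time).

    Each step adds a noisy draw of momentary evidence from a Gaussian with
    mean `drift` and SD 1. A trial ends when the total reaches +bound or
    -bound. With positive drift, ending at the upper bound counts as correct.
    """
    rng = random.Random(seed)
    n_correct = 0
    total_steps = 0
    for _ in range(n_trials):
        x = 0.0
        steps = 0
        while abs(x) < bound:
            x += rng.gauss(drift, 1.0)
            steps += 1
        n_correct += x > 0            # True counts as 1
        total_steps += steps
    return n_correct / n_trials, total_steps / n_trials


low_acc, low_time = simulate_trials(drift=0.1, bound=2.0)
high_acc, high_time = simulate_trials(drift=0.1, bound=6.0)
print(f"low bound:  accuracy {low_acc:.2f}, mean steps {low_time:.1f}")
print(f"high bound: accuracy {high_acc:.2f}, mean steps {high_time:.1f}")
```

Raising the bound in this sketch buys accuracy at the cost of time, which is the sense in which the bound setting acts as a policy.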

The neurobiology underlying the setting of the bound (and detecting that the accumulation in LIP has reached the bound) is not currently known; it ought to be an area of intense study. The most promising candidate mechanisms involve the basal ganglia. These structures seem to possess the requisite circuitry to terminate the decision process based on a threshold crossing and to adjust the bound based on cues about how the current "policy" for making decisions is paying off (Bogacz et al., 2006; Lo and Wang, 2006).

Neurobiology supports the view that a decision process balances evidence gathering with other "policy" factors. Other factors that affect simple decisions also assert themselves in the negotiation between evidence and bound. These include valuation of – or relative weight assigned to – (i) potential rewards and punishments associated with success and failure (for reviews, see Sugrue et al., 2005; Padoa-Schioppa, 2011), (ii) prior knowledge in the absence of new evidence about which of the alternatives is likely to be correct (Hanks et al., 2011), (iii) social and emotional factors, and (iv) the passage of time itself. Elapsed time is associated with opportunity costs and alters the value of an expected reward. There may also be a deadline to complete a decision by a certain point in time. Interestingly, the neurons that encode accumulated evidence in the motion task also encode elapsed time (Leon and Shadlen, 2003; Janssen and Shadlen, 2005; Maimon and Assad, 2006) in a way that incorporates the sense of urgency in the decision process (Drugowitsch et al., 2012). We suggest that increased attention to these elements, and their role in decision-making, will provide insight into the active role of the agent in shaping decision processes.

#### **NOISE**

The picture we have painted thus far captures some of the important neurobiological determinants of decisions, but an important aspect has been left out. That is the issue of noise.

The mechanisms outlined so far are causal mechanisms, and as such one might think that these mechanisms will always evolve in the same way under the same circumstances. However, the neurons that represent evidence – whether from vision (Britten et al., 1992) or via associations of cues with their bearing on a proposition (Yang and Shadlen, 2007) – do so in a "noisy" way. These neurons do not convey the same number of action potentials per unit time even when they are exposed to the identical condition over and over (at least as identical as can be tested in the laboratory). There is nothing magical about this noise, although the source of noise remains unknown, as does whether it reflects fundamentally deterministic or indeterministic processes. As far as we understand, the existence of noise does not confer any special properties, like freedom, will, consciousness, etc. However, the noise does have very real effects. For example, errors in perceptual decisions can be traced to the variable discharge of cortical neurons (Parker and Newsome, 1998). There are two ways to think that noise might bear upon our understanding of freedom and responsibility. The first concerns the source of noise, and the second its effects.

#### *What is the source of noise?*

The origins of noise in the neocortex are probably in the complexity of synaptic integration with large numbers of excitatory and inhibitory inputs (Shadlen and Newsome, 1994, 1998; van Vreeswijk and Sompolinsky, 1996).

The representation of information by neurons is affected by noise. Moreover, this noise is an ineliminable aspect of brain function. Even in the parts of the brain that are reasonably well understood, such as the visual cortex, when the exact same stimulus is presented in a highly controlled setting, a neuron might emit 10 spikes on one exposure, 6 on the next, 17 on the next, and so on. In the neocortex, if the mean spike rate is 10 spikes in some epoch (say 1/4 s), then variance is typically about 15. The square root of this number, the SD, is just under 4. Roughly then, we might characterize the count as a random number that tends to be near 10 but falls between 2 and 18 (±2 SDs of 10) with 95% probability. That is a very large range of variability.
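The arithmetic in the preceding paragraph can be checked directly. The sketch below uses only the mean and variance quoted in the text; treating ±2 SDs as a roughly 95% interval assumes the counts are approximately Gaussian, which is an approximation.

```python
import math

mean_count = 10.0   # mean spikes in the epoch, from the text
variance = 15.0     # typical variance for that mean, from the text

sd = math.sqrt(variance)
low = mean_count - 2 * sd
high = mean_count + 2 * sd

print(f"SD = {sd:.2f}")
print(f"approximate 95% range: {low:.1f} to {high:.1f} spikes")
```

The interval comes out slightly inside the rounded 2 to 18 range quoted above.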

Of course, there are many neurons in any patch of cortex. So the brain can reduce this variability by averaging the spikes from many neurons. However, there is a limit to this improvement because the neurons are weakly correlated, and thus share some variability. It has been shown that averaging can improve the signal-to-noise ratio only by a factor of about 3 (Zohary et al., 1994; Shadlen and Newsome, 1998; Mazurek and Shadlen, 2002). This is one of the reasons that neuroscientists can record activity from single neurons and find them so informative about what an entire neural population, and even an animal, senses, decides, and does.
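The ceiling on averaging follows from a standard result: for N neurons with equal variance and a common pairwise correlation r, the variance of their mean is proportional to (1 + (N − 1)r)/N, so the gain in signal-to-noise ratio over a single neuron approaches 1/√r however large N becomes. The sketch below evaluates this formula; the correlation value of 0.1 is an illustrative assumption and is not taken from the text, as is the function name `snr_gain`.

```python
import math


def snr_gain(n_neurons, correlation):
    """Signal-to-noise gain from averaging n equally correlated neurons,
    relative to a single neuron (equal variances assumed)."""
    return math.sqrt(n_neurons / (1.0 + (n_neurons - 1) * correlation))


R = 0.1  # assumed pairwise correlation, for illustration only
for n in (1, 10, 100, 1000):
    print(n, round(snr_gain(n, R), 2))
print("ceiling:", round(1 / math.sqrt(R), 2))
```

With uncorrelated neurons (r = 0) the same formula gives the familiar √N improvement, so the saturation is entirely due to the shared variability.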

Although it is commonly said that neurons compute with spikes, this truism obscures a deeper truth about the currency of information exchange in the cortex. Cortical neurons compute with spike rates. They access information from other neurons even in the temporal gaps between the spikes of any one neuron that contributes information to a computation. In many subcortical structures and in many simpler nervous systems, a neuron emits a spike if only one (or a few) of its inputs are active. The inputs are simple or relatively sparse, and these neurons effectively pass on the action potentials from those inputs. In contrast, neurons in the cortex compute new information by combining quantities potentially representing many different things: position of a stimulus in the left eye's view compared to the right or whether a quantity *x* is greater than another quantity *y* and if so, by how much. The numbers that are to be added, subtracted, and compared are not just all or none. They are intensities: contrast, level of evidence, etc. For the new computation to occur, it would be inefficient for a neuron to wait through the period between spikes arriving from the various inputs that represent, for example, *x* and *y*. Instead, the circuit establishes a representation of *x* and *y* that is present through the interspike interval of any one neuron.

To achieve this, many neurons represent *x* and *y*. That way, in a very narrow time window (e.g., ∼1/100 s) the neuron that is doing the comparison gets a sample of the intensities of *x* and *y*, as represented by many neurons. This calculating neuron gets to know *x* by averaging the spikes and silences across neurons instead of averaging the spikes (and silences) from one neuron across time. That makes for a fast cortex that can compute new things. But it poses a problem. We know that it takes only ∼10–20 excitatory inputs in an epoch of a few ms to make a neuron fire. If we think about the number of inputs that are needed to achieve the computations in question – that is, to permit calculations with numbers ranging from 10 to 100 spikes per second – it turns out we need on the order of 100 neurons representing *x* and another 100 that represent *y*. That would lead to far too many resultant spikes – there would be a surfeit of excitation. Neurons would not be able to maintain a graded range of responses: they would quickly saturate their firing rates.
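A rough count shows where the surfeit of excitation comes from. The sketch below uses only the figures quoted in the paragraph (pools of about 100 neurons, rates of 10 to 100 spikes per second, a window of about 1/100 s, and a firing threshold of roughly 10 to 20 excitatory inputs). The helper name `expected_inputs` is hypothetical, and comparing a 10 ms window against a threshold quoted for "a few ms" is a simplification.

```python
def expected_inputs(n_neurons, rate_hz, window_s):
    """Mean number of input spikes arriving from a pool within one window."""
    return n_neurons * rate_hz * window_s


WINDOW_S = 0.01             # ~1/100 s sampling window, from the text
POOL_SIZE = 100             # neurons representing each quantity, from the text
SPIKES_TO_FIRE = (10, 20)   # excitatory inputs needed to fire, from the text

for rate_hz in (10, 100):
    per_pool = expected_inputs(POOL_SIZE, rate_hz, WINDOW_S)
    both_pools = 2 * per_pool   # inputs representing x plus those representing y
    print(f"{rate_hz} spikes/s: about {both_pools:.0f} inputs per window, "
          f"threshold {SPIKES_TO_FIRE[0]}-{SPIKES_TO_FIRE[1]}")
```

At the upper end of the rate range the excitatory drive exceeds the firing threshold many times over, which is why unopposed excitation would saturate the neuron.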

To counter this, the cortex balances excitation with inhibition (Shadlen and Newsome, 1994). In cortex, when a neuron is driven to discharge at a higher rate, both the rate of its excitatory input *and* inhibitory input increase. In fact, we think there is a delicate balance that allows this to work. It controls the dynamic range of firing. Now the spiking occurs when the neuron has accumulated an excess of excitation compared to inhibition. But since both are occurring, the effect is like a particle in Brownian motion. The neuron's state (e.g., membrane voltage) wanders until it bumps into a positive threshold and produces a spike. The net effect is a preservation of dynamic range among inputs and outputs, but the cost is irregularity. The spikes occur when the random path (called a random walk) of voltage happens to bump into a threshold. That is an irregular process. In fact it explains the high irregularity that one typically observes when recording from cortical neurons. It explains the variance of the spikes counted in an epoch. That irregularity also results in asynchronous spiking. That means another neuron will not be fooled into "thinking" that spike rate has increased because spikes from several neurons arrive all at the same time.

There are a number of interesting implications of this mechanism, but the one we wish to emphasize concerns the relationship between inputs and outputs. There is an important intuition that one ought to have about diffusion and random walks. It is that the state variable that undergoes the walk – what we are thinking of here as membrane voltage – tends to meander from its starting position by a distance given roughly by the square root of the number of steps it has taken multiplied by the size of a unitary step. Suppose that the amount of depolarization required to generate a spike is equivalent to 20 excitatory steps. Then for the random walk, we would expect it to take 400 steps (half excitatory and half inhibitory) for the membrane voltage to meander this far from its starting point, on average. And, approximately half the time the displacement is in the wrong direction, away from spike threshold. This intuition allows us to appreciate why a balance of excitation and inhibition allows a cortical neuron to operate in a regime in which it is bombarded with many inputs from other neurons. The random walk achieves a kind of compression in number of input events to output events.
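The square-root intuition can be checked numerically. The sketch below simulates unbiased walks of 400 unit steps (each step equally likely to be excitatory, +1, or inhibitory, −1) and measures the root-mean-square displacement, which should be close to √400 = 20. It checks only the typical displacement after a fixed number of steps, not the time to first reach threshold; the number of simulated walks and the function name `final_displacement` are arbitrary choices.

```python
import math
import random


def final_displacement(n_steps, rng):
    """Net displacement after n_steps of +1 (excitatory) or -1 (inhibitory)."""
    return sum(rng.choice((1, -1)) for _ in range(n_steps))


rng = random.Random(1)
N_STEPS = 400    # number of input events, from the text
N_WALKS = 2000   # number of simulated walks (arbitrary)

finals = [final_displacement(N_STEPS, rng) for _ in range(N_WALKS)]
rms = math.sqrt(sum(x * x for x in finals) / N_WALKS)
frac_negative = sum(x < 0 for x in finals) / N_WALKS

print(f"RMS displacement after {N_STEPS} steps: {rms:.1f}")
print(f"fraction ending below the start: {frac_negative:.2f}")
```

Roughly half of the walks end on the side away from threshold, in line with the remark that the displacement is in the wrong direction about half the time.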

So why are neurons noisy? It is an inescapable price the neocortex pays for its ability to combine and manipulate information. To perform their computations, cortical neurons receive many excitatory inputs from other neurons, and they must balance this excitation with inhibition (balanced E/I). Balanced E/I leads to the variable discharge that is observed in electrode recordings.

#### *The effects of noise*

The fact that this variability exists is not controversial, although its implications are often debated (Glimcher, 2005; Faisal et al., 2008). One particularly relevant fact that is not disputed is that this variability is a source of errors in simple decisions. Noise limits perceptual sensitivity and motor precision. This fact makes one very suspicious of claims that the variability is just due to causes that the experimenter has not controlled for (or cannot control; e.g., variation in motivational state). That would be a valid concern were it not for the fact that the rest of the brain also does not seem to know that this variability is not part of the signals it uses for subsequent computation and behavior. Were the causes of variability in sensory evidence known to the rest of the brain, that variability would not induce errors. Downstream structures would know that the 17 spikes they received were anomalous and that the real signal had magnitude 10.

This leads to another important point. Consider the time that a spike occurs. In actuality, it was preceded by a particular path that the membrane voltage took before the voltage threshold for the spike was attained. This path reflects detailed information about when the input spikes (excitatory and inhibitory) occurred. But, because of the presence of noise and the random walk of the membrane voltage, there are many paths that could lead to the identical spike time and many more that could have led to a range of spike times that would be indistinguishable from the point of view of downstream neurons. The detailed information about the path that led to a particular spike is lost. Downstream neurons see only the outcome – the spike. They are not privy to the particular trajectory of membrane voltage that led to this spike. Thus, downstream neurons do not "know" the exact cause of the inputs impinging upon them, nor can they reconstruct this from the data available to them. They cannot differentiate signal from noise, and any computational characterization of the processes they support must incorporate ineliminable probabilistic features.

This observation has implications for neural coding. For example, it renders implausible a baroque code of information in spatial and temporal patterns of spikes. That is not to say that which neurons are active, and when, is not the code of information. Perhaps, the fine detail of the spike pattern across the population of neurons – like a constellation of flickering stars – conveys information. However, the details of the spike patterns in time – the time sequence of the flickering – are removed from the neural record. They are represented in the particular trajectories that the receiving neurons' membrane voltages undergo between their spikes and are thus lost in transmission. Other neurons in the brain do not benefit from this information.

This observation also has important philosophical implications. It implies a fundamental epistemic break in the flow of information. From the effect, i.e., the spikes of some set of output neurons, the system cannot reconstruct its causes (the times of all the inputs). This means that the variability in the outputs cannot be predictively accommodated. If in some epoch a neuron emits five spikes instead of four, it is often impossible for the brain to trace this difference to an event in inputs that would lead another neuron to discount this extra spike as anomalous. Although the variability can emerge from deterministic processes (no quantum effects), it should be viewed as fundamental, because there is no way to trace it to its source or to negate it.

Noise is at least in part a result of complexity at the synaptic level, a manifestation of a chaotic mechanism that balances excitation and inhibition (Shadlen and Newsome, 1994, 1998; van Vreeswijk and Sompolinsky, 1996). Indeterministic or chaotic neural activity has been postulated by some philosophers to make possible free will (Kane, 2002). Some people (including one of the authors, Michael N. Shadlen) might be inclined to think that noise in the nervous system shows determinism to be false (Glimcher, 2005). However, without being able to identify the source of noise, we cannot attribute it, with certainty, either to indeterministic brain events, such as effects of quantum indeterminacy, or to complex but deterministic processes. And if determinism is true, the spikes produced at some level of neural organization are completely caused by prior physical events, and their precise timing can in principle be accounted for in its entirety. For example, the firing pattern of neurons that represent momentary evidence is completely caused by the impulses from other neurons. However, as argued above, this precise timing does not convey information, nor can it be exploited for prediction. So we are wise to look at spike rates as a random value with an expectation (or central tendency) and uncertainty.

We have already noted that this variability has an effect on behavior, namely on the accuracy and speed of decisions. So it is a quantity that we ought to care about. Yet, it is useless to try to account for it by tracing it to more elementary causes. The variability might as well have arisen *de novo* at the level we measure it. Thus, despite the fact that the system may be deterministic in the physical sense, it cannot be understood properly in terms of only its prior causes. This is arguably an example of emergence, a principle that applies to many macroscopic properties in biology (Anderson, 1972; Mayr, 2004; Gazzaniga, 2011).

The presence of noise implies that there is some uncertainty involved in every calculation the brain makes. Thus, even if we know conditions in the world, we cannot be sure about what outcome they will cause via the workings of a brain that must make a decision, because it is not clear, even to the brain, exactly what state it is in. Because the brain operates on noisy data with noisy mechanisms, it must enact strategies or policies to control accuracy. For example it must balance the speed of its decisions against a targeted accuracy. Such policies underlie distinctions that separate one decision maker from another, and we will argue that they are relevant to assessments of free will and responsibility. We explicitly deny that the brain or the agent can (always) identify noise as distinct from signal. However, through experience the agent can tell that he does not always track the world correctly, or that his decisions are not the right ones. He thus must learn to modulate his decisions in order to compensate for uncertainty, where that uncertainty is generated (at least in part) by noise. For example, a high error rate might induce the agent to change policy by slowing down. Neither the agent nor the brain need know about the noise, but by changing the bound height, the brain (and agent) would reduce the error rate.
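The effect of bound height on speed and accuracy can be illustrated with a toy bounded-accumulation simulation. This is a minimal sketch under stated assumptions, not the model fitted in the studies cited here: evidence arrives in unit steps, each step favors the correct choice with a hypothetical probability `p_up`, and a decision is made when the accumulated evidence reaches plus or minus `bound`.

```python
import random


def simulate_decisions(bound, p_up=0.55, n_trials=2000, seed=1):
    """Return (error_rate, mean_steps) for a symmetric bounded random walk.

    Assumptions (illustrative only): unit evidence steps, a fixed
    probability p_up that a step favors the correct choice, and
    decision bounds at +bound (correct) and -bound (error).
    """
    rng = random.Random(seed)
    errors = 0
    total_steps = 0
    for _ in range(n_trials):
        evidence = 0
        steps = 0
        while abs(evidence) < bound:
            evidence += 1 if rng.random() < p_up else -1
            steps += 1
        if evidence <= -bound:
            errors += 1  # the walk terminated at the wrong bound
        total_steps += steps
    return errors / n_trials, total_steps / n_trials


# A "hasty" policy (low bound) versus a "careful" policy (high bound).
low_err, low_time = simulate_decisions(bound=5)
high_err, high_time = simulate_decisions(bound=20)
print(f"bound=5:  error rate {low_err:.3f}, mean steps {low_time:.1f}")
print(f"bound=20: error rate {high_err:.3f}, mean steps {high_time:.1f}")
```

In this sketch nothing about the noise itself changes between the two runs. Only the policy parameter differs, and raising it lowers the error rate at the cost of slower decisions.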

## **RESPONSIBILITY, POLICY, AND WHERE THE BUCK STOPS**

On most moral views, capacities, attitudes, and policies are relevant to assessments of ethical responsibility (e.g., Strawson, 1974; Wallace, 1998; Smith, 2003). Capacities set broad outlines for domains of possibility for the engagement of certain functions important for social agency. Some, such as basic abilities to comprehend facts, make valuations, and control impulses may be necessary conditions for responsible agency, whereas consideration of others, such as perceptual acuity, memory, attentional control, and mentalizing abilities may modulate responsibility judgments. Attitudes or policies such as explicit beliefs about moral obligations, risk-aversiveness, and in/out-group attitudes may affect decision-making in ways that we consider subject to moral assessment<sup>4</sup>. The neuroscience of motivation and social behavior is beginning to shed some light on the neuroscience of social attitudes, but at this point only in broad-brush ways that do not yet illuminate mechanism. Policies are high-level heuristics that affect the parameters of decision-making and can be modulated in a context-dependent way. These include the relative weighting of speed versus accuracy, the relative weighting given to different types of information, and the cost assigned to different degrees of expected error. Some of these elements are formalized mathematically in decision theory (Jaynes, 2003). Our focus here will be on policies.

In Section "Neurobiology of Decision-Making," we suggested that neuroscience is beginning to expose the brain mechanisms that establish at least some such policies. The speed–accuracy tradeoff is a paradigmatic example. We focus here on the tradeoff between speed and accuracy because it is something we are beginning to understand (Palmer et al., 2005; Hanks et al., 2009). Neural mechanisms responsible for other decision policies are probably not far behind. In principle, the same kinds of mechanisms that operate on perceptual decisions are probably at play in social decisions (Deaner et al., 2005), economic decisions involving relative value (Glimcher, 2003; Sugrue et al., 2005; Lee, 2006), and decisions about what (and whether) to engage – deciding what to decide about (Shadlen and Kiani, 2007, 2011).

That policies can have a role in the assessment of responsibility is plain. Policies are malleable, context-dependent, and pervasive. Consider the following outcomes due to decisions made by three doctors. Doctor A made a hasty, inaccurate diagnosis of her patient because she valued speed over accuracy. Doctor B, valuing accuracy more than speed, made a correct diagnosis, and saved the patient. Doctor C, also valuing accuracy over speed, failed to act in time to stanch the bleeding of his patient. Decisions cannot be explained in the absence of considerations of policy, and the suitability of policies must be tuned to circumstances. These policies are center stage in our consideration of the qualities of these three doctors' decisions. It is not the policy itself, but the application of the policy in particular circumstances that is important: That is why our moral assessments of Doctors B and C differ, even though they have the same policy. On the other hand, even if Doctor B had not saved the patient due to chance factors she could not control, we would have no grounds for moral sanction. Thus, it is the policy, not just the outcome, that is relevant to moral assessment.

Recent work indicates that policy elements of decision-making are beginning to be explicable in neural terms. Importantly, the elucidation of the neural mechanism that gives rise to policy does not explain the policy away, nor does it make it less relevant to ethical assessment. Policies may be chosen poorly, but they are also revisable. So over time, policies should better track what they must accomplish. Agents can be morally assessed for failing to revise. Indeed setting a policy often requires decisions (as well as learning and other factors). The process might also be subject to noise and uncertainty, but again, the noise neither confers freedom nor lack of responsibility; it invites consideration of policy affecting decisions about policy. These policies are also targets for moral assessment. For example, Doctor A in the story, who made a hasty, inaccurate diagnosis of her patient because she valued speed over accuracy, might ask us to excuse her action on the grounds that her policy, favoring speed, was merely the outcome of noise. The argument concerns policy and thus has bearing on our evaluation, but it is not compelling, because one would counter (in effect) that training in medicine should lead to non-volatile policies, which are resilient to noise, emotional factors, distraction, and sleep deprivation. However, were Doctor A poisoned by a drug (or disease process) that affected the bound-setting mechanism, we might be inclined to accept this fact as mitigating.

<sup>4</sup>It is not clear how to distinguish attitudes and policies. We refer to them as if they are different, but it could just be that we have the beginning of an understanding of the neural basis of policies and how they affect decision-making, but so far no real handle on the neural realization of things we consider attitudes.

The important insight is that the neurobiology is relevant in the sense that it points us toward the consideration of policy in our moral assessments. We may not have direct access to the internal policy in the way we can observe an act, but we can infer settings like bound height from behavioral observations, just as we can infer accuracy. We also have direct access to the agent's communications about these policies. Although indirect and possibly non-veridical, they are expressions of metacognitive states – analogous to confidence – concerning a decision. Thus, when we engage in ethical evaluation of a decision or act, policy is one natural place to focus our inquiry.

To recap, we have argued that ineliminable noise in neural systems requires the agent to make certain kinds of commitments in order to make decisions, and these commitments can be thought of as the establishment of policies. Noise puts a limit on an agent's capacities and control, but invites the agent to compensate for these limitations by high-level decisions or policies that may be (a) consciously accessible; (b) voluntarily malleable; and (c) indicative of character. Any or all these elements may play a role in moral assessment. It remains to be seen how such information about policy might bear upon our view of free will and responsibility. The answer will depend in part on what one's basic views on free will and responsibility are. It will also depend on whether the arguments about noise are taken to illustrate a purely epistemic limitation about what we know about the causes of our behavior, or whether one can muster arguments to the effect that a fundamental epistemic limitation brings with it metaphysical consequences.

One of us (Adina L. Roskies) thinks that policy decisions are a higher-level form of decision that establishes parameters for first-order decisions, and that to the extent that policies are set consciously or deliberately, or are subject to feedback from learning, policy decisions should be considered significant in attributions of responsibility, and the ability of the agent to manipulate them as important in attributions of free will. It is possible that policy decisions should be considered significant in attributions of responsibility even when they are set without conscious deliberation. The other (Michael N. Shadlen) agrees with the above, and in addition holds that the special status of policies is also a consequence of their emergence as entities orphaned from the chain of cause and effect that led to their implementation in neural machinery. This will be explained further in the next section.

Does the fact that we cannot know the precise neural causes of some effect mean that we can conclude that they are in some relevant sense undetermined? If one rejects the notion that the unpredictability of noise entitles one to take the noise as fundamentally equivalent to indeterminacy – because the limitations are only epistemic in character – the noise argument cannot be used to argue for the falsity of determinism and the consequent falsity of positions tied to the truth of determinism. Thus, the focus on policies does little to address the worries of the hard incompatibilist.

However, those with compatibilist leanings might think like this: As a compatibilist, your concerns are not with the truth of determinism or indeterminism, or even with predictability. Instead, you think that capacities and other properties of agents are the criteria upon which to establish responsibility. For example, if you think that responsibility judgments are relativized to the information available to an agent, then noise, whether deterministic or indeterministic, puts limits on perfect information and forces the agent to make policy decisions based on prior experience. This is just an augmentation of the imperfect information or uncertainty that we already take to exist in decision-making. If one accepts that mechanism need not undermine mindedness, then we can examine whether policies are based on conscious decisions/intentions, and whether agents can be held accountable for how policies are set. The plasticity of this system will undoubtedly be an important aspect of responsibility. Notice that this same reasoning can be applied to decisions themselves (or, perhaps only decisions for reasons that the agent is aware of), so it is not clear we get anywhere with traditional philosophical problems, but it does point to an aspect relevant to moral assessment that is often overlooked, and for which we have some insight from neuroscience. The important point the science gives us is that the policies are necessitated not by indeterminism but by noise, an established physical fact that all sides can agree on. This makes the traditional debate about determinism/indeterminism moot, and instead puts emphasis on the importance of capacities and how they ground responsibility. Moreover, if one thinks that the information available to the agent is an important factor to weigh in assessments of responsibility, the recognition of noise puts important limits on even ideal measures of that quantity.

Suppose, on the other hand, you argue that the ineliminability of noise, and the information loss it results in, provides a basis for a belief in indeterminism<sup>5</sup>. You may try to leverage an argument that will be persuasive to the scientist who is tempted by incompatibilism, but one who is worried about scientific reductionism rather than determinism, and thus worried that neuroscience will explain away agency.

<sup>5</sup>One of us (Michael N. Shadlen) believes this is the case: the limitation imposed by noise is not merely epistemic; it represents indeterminacy that is fundamental. This is because the complexity of the brain magnifies exponentially the finite variation in initial state. This finite variation is a property of nature, not a consequence of measurement imprecision. It is the notion of infinite precision that is fictional (invented for the calculus). This variation leads to exponential divergence of state in chaotic (deterministic) processes. Because this variation cannot be traced back in time to its original causes, chaos supports metaphysical (as well as epistemic) indeterminacy.

Here would be a sketch of such an argument: Brain states including those that underlie the establishment and implementation of high-level policies in decision-making possess low-level explanations and causes. However, due to the information loss that noise engenders, there is a fundamental limitation to the kinds of reductionist accounts that will be available. The inability to offer a reductionist explanation is an epistemic limitation, but one could argue that, due to noise, high-level brain states or processes, including the policies that are developed to deal with noise, represent a form of emergence.

The argument for emergence is analogous to one that the evolutionary biologist Ernst Mayr made regarding species. In principle, we can trace the sequence of events leading to the evolution of zebras, for example, from early vertebrates, but we do not recognize this causal accounting as being fully explanatory. The chain of cause and effect in evolution – here the path of evolution from early vertebrates – could have diverged vastly differently from the one we can piece together in retrospect. This vastness of possibility follows from the mechanism of evolution, and in brief, this degree of divergence necessitates that we cannot explain the zebra's status as an entity equivalent to "early vertebrates plus the mechanism of evolution," since there are multiple ways evolution might have gone. Thus we postulate the ontological independence of the zebra in our biological theories. A similar argument can be made with respect to causal processes in the brain that lead to the establishment of policy. Emergence does not contradict the fact that a chain of cause and effect led to a brain state. But it does imply that we cannot explain a behaviorally relevant neural state solely in terms of its causal history; too much of the entire causal history would be needed to account for the final state. And because in human interaction we need to explain behaviors, and explaining behavior is important for assessments of moral responsibility, we have to stop trying to trace back causal chains beyond the noise, and focus on higher-level regularities such as policy decisions. Thus, if you object to freedom and responsibility, not because of the presence of causal chains but because of eliminativist worries that mechanism precludes responsibility, we urge you to consider the following: First, recognize that policies are real and ineliminable aspects of decision-making, necessitated by the limited information available to neural systems. Second, take these high-level policy decisions as a basis for responsibility assessments.

In other words, if noise in the brain arises from a mechanism that is analogous to the emergence phenomenon in evolution, it might imbue brain states with the same type of status that a species has in evolution – an ontologically real entity. If this were to hold for policy, then an incompatibilist might be nudged toward explanations of decisions that recognize irreducible elements in the brain of the decision maker, elements that cannot be explained away on the basis of prior causes. These elements can provide a basis for accountability and responsibility that focuses on the agent, rather than on prior causes.

That said, if you are a hard incompatibilist, and reject freedom and responsibility on the grounds that neither determinism nor indeterminism can support freedom, the foregoing argument for freedom and responsibility may not move you at all, for policies themselves have a causal basis, and the same arguments that block responsibility in the first-order case will also block it in the case of higher order policies.

#### **FINAL REMARKS**

We have attempted to account for the role of noise in decision-making, based on an understanding of the underlying science. Our philosophical conclusions are modest. For example, we do not say much here explicitly about freedom, although we think the points made here will prove relevant to considerations of freedom in the neural context. We have, however, argued that:


This argument might have further implications for understanding and investigating conditions for moral responsibility, such as:


#### **SUMMARY**

Recent advances in neurobiology have exposed brain mechanisms that underlie simple forms of decisions. Up until now, the role played by noise in decision-systems has not been considered in detail. We have argued that the science suggests that noise does not bear on the formulation of the problem of free will in terms of determinism as traditionally thought, but rather that it shifts the focus of the debate to higher-level processes we call "policies". Our argument is compatibilist in spirit. It implies that NBDM does not threaten belief in freedom because it discloses the causes of action. Rather, NBDM sheds light on the mechanisms that might lead an agent to make one choice in circumstances that might lead another, even very similar agent to choose differently. We have not appealed to randomness or noise as a source of freedom, but rather recognize that such randomness establishes the background against which policies have to be adopted, for example, for trading speed against accuracy. We thus offer a glimpse of an aspect of compatibilism that does not address the compatibility of freedom with determinism *per se*, but instead addresses the compatibilism of responsibility with neurobiology and mechanism. By showing *how* choices are made, the neurobiology does not dismiss choice as illusory, but highlights the agent's capacity to choose.

## **REFERENCES**


**ACKNOWLEDGMENTS**

Michael N. Shadlen thanks Larry Abbott, Simon Blackburn, Dan Braun, Helen Brew, Patricia and Anne Churchland, Roozbeh Kiani, Josh Gold, Tim Hanks, Gerald Shadlen, S. Shushruth, Xiao-Jing Wang, and Daniel Wolpert for helpful discussions and comments on an earlier draft of this manuscript, and acknowledges support from HHMI, NEI, NIDA and a Visiting Fellow Commoner Fellowship from Trinity College, University of Cambridge, UK. Adina L. Roskies thanks the Princeton University Center for Human Values and the Big Questions in Free Will Project, funded by the John Templeton Foundation for support. The opinions expressed in this article are our own and do not necessarily reflect the views of the John Templeton Foundation.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 February 2012; accepted: 30 March 2012; published online: 23 April 2012.*

*Citation: Shadlen MN and Roskies AL (2012) The neurobiology of decision-making and responsibility: Reconciling mechanism and mindedness. Front. Neurosci. 6:56. doi: 10.3389/fnins.2012.00056*

*This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.*

*Copyright © 2012 Shadlen and Roskies. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.*

## Neurophilosophical considerations on decision making: Pushing-up the frontiers without disregarding their foundations

## *Gabriel J. C. Mograbi 1,2,3\**

*<sup>1</sup> Department of Philosophy, Federal University of Mato Grosso (UFMT), Cuiabá, Brazil*

*<sup>2</sup> Mind, Brain Imaging and Neuroethics Lab, Royal Ottawa Health Care Group, Institute of Mental Health Research, University of Ottawa, Ottawa, ON, Canada*

*<sup>3</sup> Coordination for the Improvement of Higher Education Personnel (CAPES), Brasília, Brazil*

*\*Correspondence: gabriel.mograbi@gmail.com*

## *Edited by:*

*Carlos E. B. De Sousa, Northern Rio de Janeiro State University (UENF), Brazil*

*Reviewed by:*

*Kamila E. Sip, Rutgers the State University of New Jersey - Newark Campus, USA*

#### **Keywords: decision making, neurophilosophy, moral decisions, testability, ecological relevance, adaptive decision**

## **INTRODUCTION**

This is an opinion article on the special research topic now turned into an e-book called "Decision-Making Experiments under a Philosophical Analysis: Human Choice as a Challenge for Neuroscience." As the first editor of the issue I want to briefly comment on each of the articles highlighting its achievements and prospects for the future.

## **THE ORIGINAL RESEARCH SECTION INCLUDES 3 ARTICLES/CHAPTERS**

To what extent is a decision to deceive someone conditioned by the social pressure of being caught in a lie and suffering its consequences? This socially relevant question is addressed in Sip et al. (2012). Deception is a social conduct with practical interests and implications established by complex interaction between interlocutors or agents. Nevertheless, few empirical studies have so far examined how social pressure is internalized by subjects in their decisions. Sip et al. (2012) explore, in a very creative experimental design, social pressure as a component of the decision to deceive. The study makes use of a computer game in which the subject inside the scanner could, in part of the trials, be confronted by an opponent about his/her knowledge of a display's content. A small monetary reward was used to encourage participants to avoid being detected deceiving: subjects were rewarded for successful deception and penalized for ineffective ventures. The results, in addition to showing, as expected, that the decision to deceive is influenced by the risk of being detected and the social confrontation represented by the detection, also reveal that participants were slower when taking an honest course of action instead of taking advantage of their privileged knowledge. In trials in which confrontation was not possible, increased activity in subgenual anterior cingulate cortex was recorded. Also, understanding of a question that allows a deceptive response was associated with activation in right caudate and inferior frontal gyrus.

Deneve (2012) presents an elegant Bayesian decision model that infers the probability of two different choices while simultaneously estimating the reliability of the sensory information on which the choice is based. In trials with a higher level of difficulty, early sensory inputs have a stronger impact on the decision. Accordingly, the threshold collapses such that response time is shorter, though with lower accuracy. Easy trials, in turn, show the opposite: an increased sensory weight and a higher threshold over time, eliciting slower, but more accurate, decisions. Because the model considers adaptive sensory weights, it can not only extract a single estimate from the sensory input, but also evaluate the uncertainty associated with it. That would be an advantage over standard diffusion models, as it would allow an optimal combination with other noisy sensory cues. The Bayesian model is especially successful when prior knowledge can be combined with sensory evidence. Notwithstanding its success in monkeys, traditional diffusion models fit human data better, because human reaction time (RT) distributions are more asymmetrical than those observed in monkeys. Thus, it remains open to further investigation whether this is because human subjects are less trained than monkeys, or because humans may use other cues to evaluate sensory reliability, leaving no room for adaptive sensory gain because a near-optimal value is achieved from the beginning.

Osman (2012) empirically compares Choice-based decision making and Prediction-based learning, showing that the former leads to more accurate cue-outcome knowledge. The study mainly focuses on the role of reward. During the training period, participants received outcome feedback and were exposed to different types of reward manipulations: Positive Reward, Negative Reward, Both Positive + Negative Reward, No Reward. Negative Reward detrimentally affected Choice-based decision making during learning. In turn, Prediction-based decision making was also negatively affected by Positive Reward. During the test period, only choice was negatively affected by the Positive Reward or Negative Reward manipulations applied in the training period. Based on those results, the author suggests that the additional demand on cognitive resources for the processing of rewards could explain its adverse effect on the decisional process. Also, a series of philosophical considerations is put forward to question how generalizable evidence from neuropsychology is to psychology and vice versa. In this context, the relationship of intra-level and inter-level experiments is considered.

In the **Reviews** section we have a very innovative article by Nakao et al. (2012). This meta-analytical manuscript compares and disentangles two types of empirical protocols used for the study of decisional processes: experiments with a unique but uncertain answer and experiments in which no unique, externally cued answer could be considered correct. The former is categorized as externally guided decision-making and the latter as internally guided decision-making. The article compares externally and internally guided decision-making empirically and theoretically, studying conceptual and operational differences, as well as similarities, between both cases. In the case of externally guided decision-making, two types of experiments are analyzed: tasks with a difficult probabilistic outcome and experiments in which the answer is varied (or believed to be varied). Protocols addressing both neuroeconomic and social subjects are included in this category. In the case of internally guided decision-making, experiments addressing preference judgment and moral decision making are encompassed. The article uses Multilevel Kernel Density Analysis (MKDA) to contrast internally and externally guided decisions in terms of recruitment of areas, and finally to compare commonalities and differences between the two types of decisions. The authors show that externally guided decision-making was mainly correlated with the DLPFC-insula-thalamus-IPL network and internally guided decision-making with the VMPFC-pACC-PCC-STG network. The article also discusses possible future directions for the study of internally guided decision-making. Along with its contributions to the field of decision making, one of the article's virtues is its contribution to the understanding of the brain's resting state and its high activity, especially in the Default Mode Network (DMN), which largely overlaps with the regions observed in internally guided decision-making.

## **IN THE PERSPECTIVES SECTION WE HAVE 3 CHAPTERS**

Heinzelmann et al. (2012) discuss the practical and moral question of inappropriate behavior, considering its foundations in both the normative and the descriptive philosophical domains. The moral implications of empirical findings in neuroscience, economics and psychology are discussed in the light of this philosophical background, aiming at an understanding of the possible mechanisms of morally inappropriate actions and the decisional process that leads to them. More importantly, the paper addresses the morally important and controversial question of interventions to promote behavior improvement. First, it considers the available empirical knowledge on different intervention techniques for promoting better decisional capacities at various levels of invasiveness: nudging, training, education, pharmacological enhancement and tDCS/TMS. Then, it discusses their feasibility and whether or not we can be morally justified in applying those techniques. Both practical and foundational issues are considered to answer this question.

Taking as a standpoint Stephens and Anderson's (2001) already classic article, Bourgeois-Gironde (2012) aims at considering the viability of methodological transfers from behavioral ecology to experimental economics, including human choice inasmuch as it is concerned with intertemporal preferences. The author suggests that economic theories have noticeable similarities to ecological models in their assumptions and implications. More specifically, it is argued that "hyperbolic time discounting" is present in both humans and other animals, although this process may differ greatly among species, not only quantitatively but also qualitatively. Brief evolutionary considerations are offered to support this possibility.

Lucci (2013) proposes an investigation of the subjective component of time in intertemporal choice (IC). The author asserts that deviations from exponential reward discounting, as a function of time, could have as a primary factor the deviation of subjective time from time as measured by the calendar metric. Time perception, she claims, could modulate discounting. Consequently, time perception would be a fundamental component of IC. Reviewing recent literature on time perception, she discusses its relationship with the measurement of IC. Her approach emphasizes the importance of the self in the explanation of behavior from a temporal perspective.
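
As a hedged illustration of how a distortion of subjective time could, by itself, produce a deviation from exponential discounting, consider the following sketch. The symbols $A$ (reward amount), $k$ (discount rate) and $\tau$ (subjective time), as well as the logarithmic mapping with parameters $\alpha$ and $\beta$, are assumptions introduced here for illustration only; they are not taken from Lucci (2013). The exponential and hyperbolic forms of the discounted value of a reward delayed by calendar time $t$ are usually written as

$$
V_{\exp}(t) = A\,e^{-kt}, \qquad V_{\mathrm{hyp}}(t) = \frac{A}{1 + kt}.
$$

If subjective time is assumed to grow logarithmically with calendar time, exponential discounting applied to subjective time gives

$$
\tau(t) = \alpha \ln(1 + \beta t) \quad\Longrightarrow\quad A\,e^{-k\,\tau(t)} = A\,(1 + \beta t)^{-k\alpha}.
$$

Under this assumed mapping, an agent who discounts exponentially in subjective time appears to discount hyperbolically in calendar time; for $k\alpha = 1$ the expression reduces exactly to $A/(1 + \beta t)$.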

## **IN THE HYPOTHESES AND THEORIES AXIS 3 CHAPTERS ARE PRESENTED**

Smaldino and Richerson (2012) approach a very important foundational question, namely, the generation of options. The authors argue that current paradigms in neuroscience focus on decisions made among a previously established set of options, whereas the very generation of options has barely been studied and remains, to a great extent, an untapped issue. The authors consider various specific factors that could influence the generation of options, which can be categorized into two broadly defined domains: psycho-biological and socio-cultural.

Volz and Gigerenzer (2012) differentiate the "small world" of risk from the "large worlds" of uncertainty. The authors argue that normative strategies used in decisions under risk cannot be generalized to all types of decision-making processes, stressing that in most experimental designs the strategies for dealing with risk are assumed as implicit presuppositions even when they are not applicable. They also show that the criteria for generating optimal solutions in decisional processes under risk may not be the best whenever uncertainty is the difficulty the agents have to cope with. Even the neural correlates of decisions under uncertainty would be different from the ones present in decisions under risk. More precisely, value-based statistical thinking would be sufficient for making good decisions in a situation of risk but not in the case of uncertainty. Under uncertainty, heuristic thinking would play a key role in an efficient decisional process.

Shadlen and Roskies (2012) argue for the possibility of reconciling responsibility with neurobiology and mechanism by philosophically reviewing the presuppositions and implications of recent empirical studies in neurobiology. Instead of the more traditional account of compatibilism based on an appeal to randomness or noise as a source of freedom, they recognize that randomness could possibly establish the background against which policies have to be adopted. Although the argument does not favor the compatibility of freedom with determinism *per se*, it contends that the compatibility of responsibility and mechanism is possible. Their arguments function in a hypothetical manner: if agents can be accountable for policies that in some sense determine decisions, they can be held responsible for those decisions, even if they do not have conscious access to the reasons for those decisions.

## **ACKNOWLEDGMENTS**

Gabriel J. C. Mograbi is a postdoctoral fellow at the Mind, Brain and Neuroethics Lab (University of Ottawa), supported by a fellowship from the Coordination for the Improvement of Higher Education Personnel (CAPES, Brazil), while on sabbatical as a tenured Professor at the Department of Philosophy of the Federal University of Mato Grosso (UFMT).

## **REFERENCES**

Bourgeois-Gironde, S. (2012). Optimal short-sighted rules. *Front. Neurosci.* 6:129. doi: 10.3389/fnins.2012.00129


*Received: 12 October 2013; accepted: 12 December 2013; published online: 30 December 2013.*

*Citation: Mograbi GJC (2013) Neurophilosophical considerations on decision making: Pushing-up the frontiers without disregarding their foundations. Front. Neurosci. 7:261. doi: 10.3389/fnins.2013.00261*

*This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2013 Mograbi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
