# **CONSCIOUSNESS AND ACTION CONTROL**

# **Topic Editors Ezequiel Morsella and T. Andrew Poehlman**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-315-8 **DOI** 10.3389/978-2-88919-315-8

## *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **CONSCIOUSNESS AND ACTION CONTROL**

Topic Editors:

**Ezequiel Morsella,** San Francisco State University and University of California, San Francisco, USA

**T. Andrew Poehlman,** Clemson University, USA

One of the first mechanistic accounts of how action arises from perception: An illustration by René Descartes of the reflex arc, which can transpire unconsciously, from input to output. According to Descartes, the involvement of 'the psyche' (i.e., consciousness) in non-automatic actions depends on the pineal gland. Today, researchers continue to search for the basic mechanisms, both cognitive and neural, underlying conscious action control.

Hall, T. S. (1972). Treatise of man (René Descartes). Cambridge, MA: Harvard University Press.

The basic nuts and bolts underlying human behavior remain mysterious from a scientific point of view. Everyday acts — naming an object, suppressing the urge to say something, or grabbing a waiter's attention with a "cappuccino, please" — remain difficult to understand from a mechanistic standpoint. Despite these challenges, research has begun to illuminate, not only the basic processes underlying human action production, but the role of conscious processing in the control of behavior. This Research Topic, "Consciousness and the Control of Action," is devoted to surveying and synthesizing these developments from disparate fields of study.

# Table of Contents


Alfredo Pereira Jr, Rafael Peres dos Santos and Rafael Fernandes Barros

*Semantic Relevance in Memory Experiments*

*The Effects of Alerting Signals in Masked Priming*

Rico Fischer, Franziska Plessow and Andrea Kiesel

*Adaptive Control of Human Action: The Role of Outcome Representations and Reward Signals*

Hans Marien, Henk Aarts and Ruud Custers

*The Wild Ways of Conscious Will: What We do, How We do it, and Why it Has Meaning*

J. Scott Jordan

E. J. Masicampo and Roy F. Baumeister

## The inevitable contrast: Conscious vs. unconscious processes in action control

## *Ezequiel Morsella<sup>1,2</sup>\* and T. Andrew Poehlman<sup>3</sup>*

*<sup>1</sup> Department of Psychology, San Francisco State University, San Francisco, CA, USA*

*<sup>2</sup> Department of Neurology, University of California, San Francisco, San Francisco, CA, USA*

*<sup>3</sup> Marketing Department, Cox School of Business, Southern Methodist University, Dallas, TX, USA*

## *Edited by:*

*Lorenza S. Colzato, Leiden University, Netherlands*

*\*Correspondence: morsella@sfsu.edu*

**Keywords: action, consciousness, unconscious processing, voluntary action, perception-and-action**

The simple actions of everyday life (flicking a light switch, suppressing the urge to say something, or grabbing a waiter's attention with a "check, please") remain difficult to understand from a scientific point of view. Unlike the mechanisms giving rise to machine action, which are designed according to clear-cut, well-principled plans, the mechanisms underlying human action are fashioned by the happenstance and tinkering process of evolution, whose products can be counterintuitive and suboptimal (Simpson, 1949; Lorenz, 1963; Gould, 1977; de Waal, 2002; Marcus, 2008), far unlike the kinds of things we humans design into robots (Arkin, 1998)<sup>1</sup>. When speaking about the *reverse engineering* of biological products, the roboticist thus cautions, "Biological systems bring a large amount of evolutionary baggage unnecessary to support intelligent behavior in their silicon based counterparts" (Arkin, 1998, p. 32), and, speaking of the products of mother nature, the ethologist concludes, "To the biologist who knows the ways in which selection works and who is also aware of its limitations it is no way surprising to find, in its constructions, some details which are unnecessary or even detrimental to survival" (Lorenz, 1963, p. 260).

Faced with this and many other challenges (cf., Rosenbaum, 2005; Herwig et al., 2013), the student of human action is forced to abandon a *normative* view (which describes how things *should* function) of the phenomena at hand and adopt instead a more humble, *descriptive* view (which describes the products of nature as they have evolved to be). From such a descriptive approach, investigators over the past two decades have begun to illuminate, not only the basic processes underlying human action, but the liaison between action and consciousness—the most mysterious aspect of nervous function (Roach, 2005).

In this special issue of *Frontiers in Cognition*, we survey these advances stemming from disparate fields of inquiry, including cognition, neuroscience, and artificial intelligence/robotics. Together, these developments unveil a great deal about the links between perception and action while also illuminating much about all else in between. Of note, these developments also reveal that the study of *action production and control* ("action control," for short) provides a unique portal through which to examine the nature of conscious processing. As explained below, many aspects of consciousness are easier to study from an *action-based approach* than from a *perception-based perspective*, which has been the traditional approach to studying consciousness (e.g., Crick and Koch, 2003; see discussion in Baars, 1997).

Before discussing further the liaison between consciousness and action control, and what the latter informs about the former, it is important to first describe the most nebulous term at hand, "consciousness."

## **THE MIND-BOGGLING AND (UNFORTUNATELY) INESCAPABLE PROBLEM OF CONSCIOUSNESS AND THE BRAIN**

Throughout intellectual history, people have been investigating the phenomenon of consciousness in one way or another, though often while avoiding utterance of the controversial term, "consciousness," which has been considered unscientific for most of its history. During the Behaviorist era (1919–1948), in which discussion of consciousness was strongly discouraged, the rank and file psychophysicist and Gestalt psychologist continued to study the "conscious field" that had been the object of investigation during the earlier Structuralist era pioneered by Wundt and Titchener (1879–1919). Since the fall of Behaviorism, a *de facto* distinction has been made between conscious and unconscious processing in every field of inquiry of psychology and neuroscience, though, again, often without mention of the term "consciousness." In perception research, psychophysical measurement continues to make the distinction of *supra-* vs. *sub*liminal, and to base its conclusions on conscious "self-report." In the study of attention, the term "attentional awareness" is often contrasted with unconscious, "pre-attentive" processing (Treisman and Gelade, 1980). In memory research, there is the classic distinction between "declarative" (explicit) processes and "procedural" (implicit) processes (Squire, 1987; Schacter, 1996). In research on motor control and on language production, the conscious aspects of voluntary action and action monitoring are contrasted with the unconscious aspects of motor programming (Levelt, 1989; Rosenbaum, 2002), including the implicit learning of motor sequences (Taylor and Ivry, 2013). Last, various fields contrast "controlled" processing, which tends to be associated with consciousness, and "automatic" processing, which tends to be associated with unconscious mechanisms (e.g., Lieberman, 2007; but see Panagiotaropoulos et al., 2013).

In summary, the difference between conscious and unconscious processes (regardless of the appellations ascribed to each process) is an inescapable contrast that is encountered after even a cursory examination of mental and nervous phenomena<sup>2</sup>.

<sup>1</sup>Consider that the artificial heart is very different from its natural counterpart and that the difference between human locomotion and artificial locomotion is a stark one: that between legs versus wheels.

Upon accepting that, in the natural world, there are conscious and unconscious processes, one must contemplate the phenomenon of consciousness. Understanding how the nervous system gives rise to basic, low-level consciousness (the subjective experience of pain, breathlessness, or a yellow afterimage) remains one of the greatest puzzles in science (Crick, 1995; Roach, 2005). This most basic form of consciousness is referred to as "sentience" (Pinker, 1997), "subjective experience," "phenomenal state," and "qualia" (Gray, 2004). It has been best defined by Nagel (1974), who proposed that an organism possesses consciousness if there is *something it is like* to be that organism: something it is like, for example, to be human and experience pain, yellow afterimages, or breathlessness.

Some have attempted to *explain away* this mind-boggling puzzle by claiming that consciousness does not exist (even though its existence is perhaps the least deniable fact of our existence, given that consciousness encompasses the totality of all we know) or that it exists but serves no function (that is, it is "epiphenomenal") in the nervous system. Unfortunately, while the former view is difficult to defend, the latter view does not provide an escape from the enigma at hand either. Regardless of whether consciousness serves a function in the nervous system or not, the scientist must still explain its place within nature: Huxley's steam whistle may be epiphenomenal with respect to the locomotive, but the scientist must still understand what it is (high frequencies) and how it arises from physical events (high-pressure steam released through a small aperture). It seems premature to state that a phenomenon does not serve a function when the place of that phenomenon within nature remains unknown. In short, even if a phenomenon is functionless, a complete scientific account of the natural world must include an explication of it. See, in this issue, the article by Pereira et al. for a novel, untraditional approach to consciousness; see also relevant articles by Cruse and Schilling, by Hommel, and by Masicampo and Baumeister.

Progress regarding the puzzle of consciousness has stemmed from descriptive approaches juxtaposing conscious and unconscious processing in terms of their cognitive and neural correlates (Shallice, 1972; Logothetis and Schall, 1989; Crick and Koch, 1995; Kinsbourne, 1996; Wegner and Bargh, 1998; Grossberg, 1999; Di Lollo et al., 2000; Dehaene and Naccache, 2001; Baars, 2002, 2005; Gray, 2004; Libet, 2004; Laureys, 2005; Morsella, 2005; Merker, 2007; Doesburg et al., 2009; Damasio, 2010; Boly et al., 2011; Panagiotaropoulos et al., 2012). [For a review regarding the conclusions of this contrast, see Godwin et al. (2013); for discussion of the limitations of a contrastive approach, see Aru et al. (2012).] To examine this contrast, researchers have focused primarily on perceptual processing (see Panagiotaropoulos et al., 2013), for several important reasons (see reasons in Crick and Koch, 2003). Perception-based research has illuminated how entry into consciousness ("entry," for short) is influenced by processes that are "bottom-up" (e.g., stimulus salience, motion, novelty, incentive and emotional quality, etc.; Gazzaley and D'Esposito, 2007) or attentional (cf., Most et al., 2005). This important research has led to several advances (see review in Koch, 2004), including (a) characterizing the differences in the processing of stimuli that are supraliminal (i.e., consciously perceptible) and subliminal (i.e., consciously imperceptible; Logothetis and Schall, 1989; Dehaene and Naccache, 2001; Koch, 2004; Roser and Gazzaniga, 2004; Doesburg et al., 2009), and (b) uncovering the unconscious processes preceding a conscious percept (Di Lollo et al., 2000; Goodhew et al., 2012; see Fischer et al., 2013).

Such research has also led to the *integration consensus* (Tononi and Edelman, 1988; Baars, 1988, 1998, 2005, 2013; Damasio, 1989; Freeman, 1991; Srinivasan et al., 1999; Zeki and Bartels, 1999; Edelman and Tononi, 2000; Dehaene and Naccache, 2001; Llinás and Ribary, 2001; Varela et al., 2001; Clark, 2002; Ortinski and Meador, 2004; Sergent and Dehaene, 2004; Morsella, 2005; Del Cul et al., 2007; Kriegel, 2007; Merker, 2007; Doesburg et al., 2009; Uhlhaas et al., 2009; Boly et al., 2011; Koch, 2012; Tallon-Baudry, 2012; Tononi, 2012), which proposes that consciousness integrates neural activities and information-processing structures that would otherwise be independent (see reviews in Baars, 2002; see Morsella, 2005, for the limitations of the integration consensus and for a listing of integrations that can occur unconsciously). Findings from action-based research complement the integration consensus: In conditions in which actions are decoupled from consciousness (e.g., in neurological disorders), actions often appear impulsive or inappropriate, as if they are not adequately influenced by the kinds of information by which they should be influenced (Morsella and Bargh, 2011). These actions reveal a lack of adequate integration. Thus, consciousness appears to permit a form of integration that constrains potential action, achieving a form of *multiple-constraint satisfaction* (Merker, 2013). Constraints can be "online," reflecting stimuli in the current environment, or they can be "offline," reflecting covert processes such as memory, cognitive maps, operations on mental representations, and mental simulation (Schacter and Addis, 2007). For example, recent theories propose that the function of explicit, episodic memory, a form of knowledge representation intimately associated with the past, is actually to simulate future, potential actions (Schacter and Addis, 2007).

#### **CONSCIOUSNESS AND ACTION**

Although theorists have long appreciated that consciousness is intimately related to action (James, 1890; Neumann, 1987; Allport, 1989; Hamker, 2003; Morsella, 2005; Baddeley, 2007), until recently there has been a substantial gap in our knowledge regarding how action-related processes influence consciousness. The reason for this gap is not surprising, as action itself is an under-explored topic of research (see reasons for this in Nattkemper and Ziessler, 2004; Rosenbaum, 2005; Agnew et al., 2009; Herwig et al., 2013). Action control is a highly complicated process, one involving various kinds of mechanisms (e.g., *hierarchical* vs. *distributed control* and *forward modeling* vs. *inverse modeling*; Arkin, 1998; Miall, 2003). See, in this issue, the article by Jordan. Only recently have researchers begun to focus on the action-related aspects of consciousness (e.g., Frith et al., 2000; Lau et al., 2004; Libet, 2004; Morsella, 2005; Berti and Pia, 2006; Jeannerod, 2006; Pacherie, 2008; Morsella and Bargh, 2010).

<sup>2</sup>It is important to appreciate that, even in the early Twentieth Century, in the field of psychiatry (which was at that time independent from psychophysics and other forms of academic psychology), the student of the mind realized that in the nervous system there are processes that are consciously mediated and those that are unconsciously mediated, as discussed at length and with great insight by the psychiatrist Bleuler (1924).

The following sections summarize those findings from action-based research that are relevant to this special issue about consciousness and action control (for a review of all action research, see Morsella, 2009)<sup>3</sup>.

### **UNCONSCIOUS PROCESSING IN ACTION CONTROL**

Investigations on consciousness and action control have revealed that many sophisticated aspects of action production can or do occur unconsciously (Bargh and Morsella, 2008; Morsella and Bargh, 2011; see Panagiotaropoulos et al., 2013). Specifically, investigations from diverse areas (see review in Morsella and Bargh, 2011), including motor control (Rosenbaum, 2002), subliminal processing (Hallett, 2007), automatisms (Morsella and Bargh, 2011), dissociations between action and conscious perception (Goodale and Milner, 2004), and the automatic activation of action plans (Morsella and Miozzo, 2002; Ellis, 2009), reveal that the activation, modulation, selection, and, in some cases, expression of action plans can occur unconsciously. For example, research on various neurological conditions has revealed aspects of action control that can occur unconsciously. These neurological conditions include *blindsight* (Weiskrantz, 1992, 1997), *blind smell* (Sobel et al., 1999), *utilization behavior* (Lhermitte, 1983), *visual form agnosia* (e.g., Patient D. F.; Milner and Goodale, 1995), *anarchic hand syndrome* (Marchetti and Della Sala, 1998), *sensory neglect* (Graziano, 2001; Heilman et al., 2003), unintentional *ambient echolalia* (Suzuki et al., 2012), and complex automatisms (e.g., vocalizations and singing) during epileptic seizures (Blanken et al., 1990; Enatsu et al., 2011; Kececi et al., 2013). Insights about consciousness and action control stemmed also from the study of the "split brain" patient (Sperry, 1961), and from conditions in which declarative memory is compromised but action programs can be stored and influence action even when the patient is unaware of the acquisition or maintenance of these programs (e.g., as in the case of H. M.; Milner, 1966). Together, this research provided substantial knowledge about the sophisticated capacities of unconscious processing in action control (see, in this issue, contributions by Cruse and Schilling, by Fischer et al., by Hommel, by Masicampo and Baumeister, by Panagiotaropoulos et al., and by Merker).

This research also reveals which aspects of action control may be unconscious during normal, everyday action, in which conscious and unconscious processes interact in ways that are only now beginning to be understood (see, in this issue, articles by Lynn et al., by Panagiotaropoulos et al., and by Merker). For instance, under normal circumstances, a person is unconscious of the complicated motor programs that, during action production, calculate which muscles should be activated at a given time (James, 1890; Rosenbaum, 2002; Johnson and Haggard, 2005; see Grossberg, 1999, about why motor programs must be unconscious). Specifically, evidence suggests that one is unconscious of the programming of the efference to the muscles as well as of the adjustments that are made "online" as one, say, reaches for a moving object (Fecteau et al., 2001; Rossetti, 2001; Rosenbaum, 2002; Goodale and Milner, 2004; Heath et al., 2008; Liu et al., 2008; see, in this issue, articles by Anderson et al. and by Rosenbaum et al.).

The activation of action plans (a phenomenon to be distinguished from motor control) can occur unintentionally (see Lynn et al., this issue). This has been revealed in experimental paradigms in which the mere presence of incidental action-related stimuli can interfere with one's intended response to a target stimulus. A basic form of this effect has been demonstrated for decades in the classic Stroop task (Stroop, 1935; see reviews in MacLeod and Dunbar, 1988; MacLeod, 1991; MacLeod and MacDonald, 2000), in which the mere presence of a word (e.g., RED) interferes with naming a patch of color (e.g., blue). In this task, participants are instructed to name the color in which a word is written. When the color matches the word (e.g., RED presented in red), or is presented on a neutral stimulus (e.g., a series of x's as in XXXX), there is little or no interference [e.g., decreased response times (RTs)] and decreased perturbations in consciousness (e.g., "urges to make a mistake"; Morsella et al., 2009a). (Urges to err, a *subjective effect*, are obtained simply by asking participants after each trial, "How strong was your urge to make a mistake?" which participants rate on an 8-point scale, in which 1 signifies "almost no urge" and 8 signifies "extremely strong urge.") When the word and color are incongruous (e.g., RED presented in blue), response conflict leads to interference (Cohen et al., 1990), including increased RTs, error rates, and systematic changes in consciousness, such as urges to err (Morsella et al., 2009a).
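The three Stroop conditions described above can be made concrete with a minimal sketch. It assumes a small hypothetical color vocabulary (`COLOR_WORDS`) and hypothetical names (`StroopTrial`, `classify`, `validate_urge_rating`); none of these come from the studies cited, and the sketch only labels trials and checks ratings against the 8-point urge scale. It does not model interference itself.

```python
from dataclasses import dataclass

# Hypothetical color vocabulary for this sketch; the text names only RED and blue.
COLOR_WORDS = {"RED", "BLUE", "GREEN"}


@dataclass(frozen=True)
class StroopTrial:
    word: str       # text shown on screen, e.g. "RED" or "XXXX"
    ink_color: str  # color the text is printed in, e.g. "blue"


def classify(trial: StroopTrial) -> str:
    """Label a trial as 'congruent', 'neutral', or 'incongruent'."""
    word = trial.word.upper()
    if word not in COLOR_WORDS:
        # A non-color string such as XXXX carries no competing color name.
        return "neutral"
    if word == trial.ink_color.upper():
        return "congruent"
    return "incongruent"


def validate_urge_rating(rating: int) -> int:
    """Accept a self-reported urge to err on the 8-point scale
    (1 = almost no urge, 8 = extremely strong urge)."""
    if not 1 <= rating <= 8:
        raise ValueError("urge rating must be between 1 and 8")
    return rating
```

For instance, `classify(StroopTrial("RED", "blue"))` labels the trial as incongruent, the condition in which response conflict and urges to err are reported to increase.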

In the incongruent condition, set-related top-down activation from prefrontal cortex increases the activation of areas in posterior brain regions (e.g., visual association cortex) that are associated with task-relevant dimensions (e.g., color; Egner and Hirsch, 2005; Gazzaley et al., 2005). Thus, to influence behavior, action sets from information in working memory or long-term memory increase or decrease the strength of perceptuosemantic information, along with, most likely, other kinds of information (e.g., motor priming). The finding that top-down activation strengthens one representation (e.g., color-naming) over another (e.g., word-reading) can be characterized as a case of "refreshing," the act of foregrounding one representation over another (Johnson and Johnson, 2009). Following an incongruent trial, ramped-up activation in control regions of the brain (e.g., the dorsolateral prefrontal cortex) leads to improved performance on the subsequent trial (Cohen et al., 1990).

## **PARADIGMS ILLUMINATING THE LIAISON BETWEEN CONSCIOUSNESS AND ACTION CONTROL**

The Stroop task is one of many *response interference paradigms* (see, in this issue, articles by Anguera et al. and by Lynn et al.). In such paradigms, subjects attempt to respond to a *target* (e.g., font color in the Stroop task) while presented with a *distractor* (e.g., Stroop word). Such interference paradigms have revealed much about the role of consciousness in action control. Findings complementing those of the Stroop paradigm have been obtained with the classic Eriksen flanker task (Eriksen and Eriksen, 1974). In one version of the task (Eriksen and Schultz, 1979), participants are trained to press one button with one finger when presented with the letter S or M and to press another button with another finger when presented with the letter P or H. After training, participants are then instructed to respond to the stimulus presented in the center of an array (e.g., SS<u>P</u>SS, SS<u>M</u>SS, targets underscored) and to disregard the "flanking" distractors (i.e., the Ss). Of all the flanker conditions, measures of interference such as RTs, error rates, and self-reported urges to err are lowest in the *Identical* condition, where flankers and targets are identical, as in SSSSS (Eriksen and Schultz, 1979; Morsella et al., 2009b). In this paradigm, it is well-established that interference is greater when distractors are associated with a response that is different from that of the target (*response interference*; e.g., SSPSS) than when distractors look different from targets but are associated with the same response (*perceptual interference*; e.g., SSMSS; van Veen et al., 2001; Morsella et al., 2009b). These findings, revealing that perceptual processes can automatically activate action plans, have been used as evidence for *continuous flow* (Eriksen and Schultz, 1979) and *cascade* (McClelland, 1979; Navarrete and Costa, 2004) models of perception-and-action (see discussion in Morsella, 2009; see, in this issue, Filevich and Haggard's treatment of the effects of unselected actions).

<sup>3</sup>The following is based in part on reviews of the literature presented in Morsella and Bargh (2011); Morsella et al. (2011) and Hubbard et al. (2013).
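The flanker conditions described above follow mechanically from the stimulus-response mapping, which a short sketch can illustrate. The button labels and the names `RESPONSE_FOR` and `flanker_condition` are assumptions made for this example; only the letter groupings (S or M versus P or H) and the three condition names come from the text.

```python
# Mapping from the task description: S or M -> one button, P or H -> the other.
# The button labels are hypothetical placeholders.
RESPONSE_FOR = {"S": "button_1", "M": "button_1", "P": "button_2", "H": "button_2"}


def flanker_condition(array: str) -> str:
    """Classify an array such as 'SSPSS' by how its flankers relate to the
    central target: 'identical', 'perceptual interference', or
    'response interference'."""
    letters = array.upper()
    if len(letters) < 3 or len(letters) % 2 == 0:
        raise ValueError("expected an odd-length array of at least 3 letters")
    mid = len(letters) // 2
    target = letters[mid]
    flankers = set(letters[:mid] + letters[mid + 1:])
    if len(flankers) != 1:
        raise ValueError("expected all flankers to be the same letter")
    flanker = flankers.pop()
    if flanker == target:
        return "identical"
    if RESPONSE_FOR[flanker] == RESPONSE_FOR[target]:
        # Flankers look different but call for the same button press.
        return "perceptual interference"
    # Flankers call for the competing button press.
    return "response interference"
```

Under this mapping, `flanker_condition("SSMSS")` yields perceptual interference and `flanker_condition("SSPSS")` yields response interference, the condition reported to produce the greater interference.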

There are many other experimental paradigms that illuminate the study of consciousness and action control: the anti-saccade task (Hallett, 1978; Curtis and D'Esposito, 2009), the MacLeod and Dunbar object naming task (MacLeod and Dunbar, 1988), spatial compatibility tasks (e.g., the Simon task; Simon et al., 1970), response-effect compatibility paradigms (Kunde, 2001), the Posner attentional cuing task (Posner, 1980), dual-task paradigms (Kahneman, 1973; Logan and Gordon, 2001), binocular rivalry (Alais and Blake, 2005), inattentional blindness (Raymond et al., 1992), covert priming paradigms (Bargh and Chartrand, 2000), the Implicit Association Test (Greenwald et al., 1998), and the go/no-go (Newman et al., 1985) and stop-signal tasks (Lappin and Eriksen, 1966; see, in this issue, articles by Anguera et al. and by Diefenbach et al.).

Evidence from these paradigms suggests that response interference stems from the automatic, "stimulus-triggered" activation of action plans (DeSoto et al., 2001), as if distractors automatically activate the associated action plans. Accordingly, psychophysiological research shows that, in response interference, competition involves simultaneous activation of the brain areas associated with the target- and distractor-related responses (DeSoto et al., 2001; Mattler, 2005). Complementary evidence has been obtained from a more micro level of analysis: The activity of the neurons in the motor cortex that, in the aggregate, yield a population code corresponding to one vs. another action (e.g., moving the arm left or right; Georgopoulos et al., 1983; Bagrat and Georgopoulos, 1999). This research reveals that individual neurons can be found to fire, not only for the target-related action (i.e., the intended actions), but also for distractor-related actions (Cisek and Kalaska, 2005). Interestingly, although neurons actively code distractor-related action plans, this activation does not appear to influence one's conscious awareness about ongoing action: One infers only that one's whole brain and musculature were concerned about executing the intended movement (see, in this issue, article by Filevich and Haggard). Research on automaticity (Puttemans et al., 2005) and on the consciously inaccessible neural mechanisms underlying action intentions (Libet, 2004) similarly reveals several sophisticated action-related processes that are unconscious.

Similarly, research on *mirror neurons* (Rizzolatti et al., 2008) has revealed that, when observing the actions of others, one is activating neural circuits that correspond to action planning, even though one may be motionless and utterly unconscious of these activations. This research also reveals that conscious percepts are intimately related to action control (James, 1890; Gibson, 1979; Llinás, 2002; Fuster, 2003). For example, Proffitt and colleagues (Proffitt et al., 2003; Witt et al., 2005) have shown that hills look steeper if one is carrying a heavy backpack or that objects appear closer when one is holding a tool that makes it easier to retrieve those objects (see also Firestone, 2013; Proffitt, 2013). For evidence regarding the role of functional knowledge in object identification, see Bub et al. (2003).

Additional evidence for unconsciously mediated action-related processing stems from the study of *efference binding* (Haggard et al., 2002a), which links perceptual processing to action/motor processing. This kind of stimulus-response binding allows one to learn to press a button when presented with a cue in a laboratory paradigm. Taylor and McCloskey (1990, 1996) demonstrated that, in a choice RT task, participants could select the correct motor response (one of two button presses) when confronted with subliminal stimuli (cf., Hallett, 2007). Unconscious efference binding also occurs in the case of reflexive responses to environmental stimuli, as in the *pain withdrawal reflex.* It is worth mentioning that, concerning unconscious integrations, the binding of perceptual information, known as *afference binding* (Morsella and Bargh, 2011), can also occur unconsciously, as is evident in intra- and inter-sensory illusions (e.g., the McGurk effect; McGurk and MacDonald, 1976). (The McGurk effect involves interactions between visual and auditory processes: An observer views a speaker mouthing "ga" while presented with the sound "ba." Surprisingly, the observer is unaware of any intersensory interaction, perceiving only "da.")

## **CONSCIOUS ASPECTS OF ACTION CONTROL**

An appreciation of all that can transpire unconsciously during action control leads one to the following question. If so much in action control can be accomplished unconsciously, then what does consciousness contribute to action control? How and why is consciousness associated with some aspects of action control but not others?

When attempting to answer this question, one must consider that some aspects of action control do perturb consciousness strongly and reliably: (a) action-related mental imagery, (b) senses such as the *sense of agency* and *sense of effort*, and (c) action-related urges (e.g., arising under conditions of action conflict). We now discuss these under-explored conscious aspects of action control.

It has been demonstrated that the simultaneous activation of incompatible skeletomotor action plans, as when holding one's breath while underwater (where one is inclined to both inhale *and* not inhale) or suppressing a prepotent response in a response interference paradigm (see, in this issue, articles by Anguera et al., and by Lynn et al.), reliably influences consciousness (see quantitative review of evidence in Morsella et al., 2011). During such *conscious conflicts* (Morsella, 2005), a person experiences notable subjective "tuggings and pullings." Lewin (1935), Freud (1938), and Miller (1959) studied the nature of these intra-psychic conflicts. Often, in such conflicts, the expression of undesired action plans can be suppressed, but the subjectively experienced action-related inclinations cannot be (Bargh and Morsella, 2008). For instance, a person can suppress dropping a painfully hot dish of porcelain, but cannot suppress the subjective urges to drop the expensive dish (Morsella, 2005). In this way, inclinations can be behaviorally suppressed but most often cannot be mentally suppressed (Bargh and Morsella, 2008). These conscious conflicts stand in contrast to (a) conflicts involving smooth muscle (e.g., involving the pupillary reflex; cf., Morsella et al., 2009a), and (b) perceptual conflicts, which tend to be unconscious, as in the case of ventriloquism and McGurk effects (McGurk and MacDonald, 1976). This pattern of results suggests that the skeletal muscle system (an effector given the special appellation, "voluntary muscle") is intimately associated with conscious processing (see explanation in Morsella, 2005).

It should be noted that the interference paradigms mentioned above involve only punctate acts that are executed quickly (color naming and button pressing), placing minimal demands on *working memory* (WM). (See, in this issue, articles by Anguera et al. and by Buchsbaum.) (WM has been defined as a temporary, capacity-limited storage system under attentional control that is used to intentionally hold, and manipulate, information in mind; Baddeley, 1986, 2007.) However, many of the conscious conflicts of everyday life (e.g., holding one's breath or gargling strong mouthwash for 30 sec) are not fleeting, short-lived events, but events that unfold over time and make demands on WM, by requiring one to hold in mind an action goal (e.g., not expelling mouthwash before 30 sec; Hommel and Elsner, 2009). In everyday life, many goal-directed actions are also guided by representations that are not triggered by external stimuli (Miller et al., 1960; Neisser, 1967). (This also occurs in the phenomenon of *prospective memory*; see McDaniel and Einstein, 2007.) Sustaining the activation of such internally-generated representations is an effortful process, requiring that top-down activation strengthen one representation (e.g., the target or action goal) over another (e.g., task-irrelevant goals; Gazzaley et al., 2005). Thus, many everyday acts of action control are actually instances of *WM-based action control*, in which a person effortfully holds an action goal in mind while attempting to overcome goal-irrelevant interference.

Theoretical developments have forwarded the notion that WM is intimately related to both action control and consciousness (LeDoux, 2008). This is evident in the title and contents of a recent treatise, *Working Memory, Thought, and Action* (Baddeley, 2007). Indeed, perhaps no mental operation is as consistently coupled with consciousness as is WM (LeDoux, 2008). When trying to hold in mind action-related information, a person's consciousness is consumed by this goal (James, 1890). For instance, when holding a to-be-dialed telephone number in mind (or when gargling with mouthwash for 30 sec), action-related mental imagery occupies one's consciousness during the delayed action phase. Similarly, before making an important toast (or, more dramatically, making the toast in a foreign and unmastered language), a person has conscious imagery regarding the words to be uttered, much as when an actor rehearses lines for an upcoming scene (see, in this issue, article by Buchsbaum). In this way, before an act, the mind is occupied with perceptual-like representations of what that act is to be, as James (1890) stated: "In perfectly simple voluntary acts there is nothing else in the mind but the kinesthetic idea . . . of what the act is to be" (p. 771). Thus, voluntary action control often occupies both WM and consciousness. Common experience suggests that, during the delay before action production, action-related imagery enters one's consciousness. The imagery is isomorphic in some ways with the overt action goal, especially in the case of "subvocalization" (Morsella and Bargh, 2010), which involves "talking in one's head" (Levelt, 1989). In subvocalizing, auditory imagery is isomorphic in some way with what would be uttered (Levelt, 1989; Baddeley, 2007; Morsella et al., 2009b; Morsella and Bargh, 2010).

In addition to conscious conflicts, urges, and WM-related conscious imagery is the sense of agency, another conscious aspect of action control. The sense of agency is based on the perception of the lawful correspondence between action intentions and action outcomes (Haggard and Clark, 2003; Wegner, 2003; Hommel, 2009). For example, if one has the intention of flexing one's finger or of saying "hello" and then one's finger happens to flex or one hears oneself utter "hello," respectively, then one is likely to sense that one caused the action. This attribution is the outcome of conceptual processing (Synofzik et al., 2008a,b; Jeannerod, 2009) that takes into account information from various contextual factors (Wegner and Wheatley, 1999; Moore et al., 2009), including that of motor efference (Cole, 2007; Engbert et al., 2007; Tsakiris et al., 2007; Sato, 2009), proprioception (Balslev et al., 2007; Knoblich and Repp, 2009), and the perception of the real-world consequences of action intentions (Synofzik et al., 2009). This sense could be considered a form of *metacognition* (Dunlosky and Metcalfe, 2008).

By manipulating contextual factors, scores of experiments have demonstrated that subjects can be fooled into believing that they caused actions that were in fact caused by something else (Wegner, 2002). For example, when a participant's hand controls a computer-drawing device behind a screen such that the participant cannot see his or her hand in motion, the participant can be fooled into thinking (through false feedback on the computer display) that the hand intentionally moved in one direction when it actually moved in a slightly different direction (Fourneret and Jeannerod, 1998). With such techniques, participants in another study were tricked into believing that they could control the movements of stimuli on a computer screen through a phony brain-computer interface (Lynn et al., 2010). When intentions and outcomes mismatch, people are less likely to perceive actions as originating from the self (Wegner, 2002).

Most of these studies examine how agency is influenced by intention-outcome mismatches or illusory intention-outcome matches. There are several "comparator models" explaining how intention-outcome mismatches are detected and influence various levels of agency. Importantly, different theorists link the sense of agency and urges to different phases of the process (cf., Haggard, 2005, 2008; Berti and Pia, 2006; David et al., 2008). Complementing research on the sense of agency are investigations on the *sense of effort* during action control (Sherrington, 1900, 1906; Gandevia, 1982) and the sense of body ownership (e.g., in the rubber hand illusion; Botvinick and Cohen, 1998) and of actions generated toward the body (e.g., tickling-related illusions; Blakemore et al., 2000). Additionally, states described as *flow* (Csikszentmihalyi, 1990) and *effortless attention* (Bruya, 2010) have been associated with forms of action control. Moreover, theorists of the Würzburg School (e.g., Külpe, Ach, and Marbe) have discussed several action-related *conscious attitudes*, including *doubt*, *hesitation*, *certainty*, and *will to enact a certain change in the world*.
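The logic shared by such comparator accounts, in which a larger discrepancy between the intended and the observed outcome lowers self-attribution, can be illustrated with a minimal sketch. It is not taken from any of the models cited above: the function name `agency_rating`, the Gaussian fall-off, and the `tolerance_deg` parameter are hypothetical choices made purely for illustration, loosely echoing paradigms in which the displayed direction of a hand movement deviates from the executed one.

```python
import math


def agency_rating(intended_deg: float, observed_deg: float,
                  tolerance_deg: float = 15.0) -> float:
    """Return a graded self-attribution score in (0, 1].

    Hypothetical toy comparator: the score is 1.0 when the observed
    movement direction matches the intended one and decays smoothly
    as the angular mismatch grows.
    """
    # Smallest angular difference between the two directions (0..180 degrees).
    error = abs((observed_deg - intended_deg + 180.0) % 360.0 - 180.0)
    # Gaussian fall-off; tolerance_deg is an assumed free parameter that
    # sets how much mismatch is tolerated before attribution drops.
    return math.exp(-0.5 * (error / tolerance_deg) ** 2)


if __name__ == "__main__":
    for deviation in (0, 5, 15, 45):
        print(deviation, round(agency_rating(0.0, float(deviation)), 3))
```

In this toy version, small deviations leave the rating close to 1, whereas large deviations drive it toward 0, which mirrors the qualitative finding that mismatching outcomes are less likely to be perceived as self-caused.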

We will now survey some less intuitive properties of action-related conscious processing. First, there is a peculiar property of voluntary action that appears to not be shared by other (e.g., involuntary) forms of action. For reasons unknown, in *intentional binding*, the perceived elapsed time between a voluntary action and its consequence is shorter than the actual time span (Haggard et al., 2002b), as if the two events were temporally attracted to each other. Thus, when striking a bell voluntarily, the experiences of striking the bell and of hearing the gong of the bell are perceived to occur more closely together in time than they actually did.

Another property of action-related consciousness arises in the paradigm of binocular rivalry (see Logothetis article). In this paradigm (see review in Alais and Blake, 2005), participants are first trained to respond in certain ways when presented with visual stimuli (e.g., to button-press when presented with the image of a house). After training, a different stimulus is presented to each eye (e.g., an image of a house to one eye and of a tree to the other). Surprisingly, the participant does not consciously perceive both objects (e.g., a tree overlapping a house), but responds as if perceiving only one object at a time (e.g., a house followed by a tree). During rivalry, the conscious percept is said to be "dominant," and the unconscious percept is said to be "suppressed."

The mind's process of switching dominance between each eye can be manipulated in interesting ways. Maruya et al. (2007) demonstrated that voluntary action can influence which percept enters awareness: The object that moved in synchrony with participants' voluntary movements of a computer mouse was dominant for a longer period of time and suppressed for a shorter period of time. Rivalry stimuli consisted of a radial grating (resembling the pattern on a dart board) and a rotating sphere that was transparent and defined solely by dots. Prior to test, participants learned to move a computer mouse in a continuous left-to-right motion. Participants later performed this motion under conditions of rivalry. Maruya et al. (2007) concluded, "conflict between two incompatible visual stimuli tends to be resolved in favor of a stimulus that is under motor control of the observer viewing that stimulus" (p. 1096), revealing "a strong link between action and perception" (p. 1090). This finding is consistent with that of Wohlschläger (2000), who reported that, while perceiving a perceptually bistable apparent rotation of an object, participants were more likely to perceive the object as rotating in the direction in which they happened to be rotating a knob (Repp and Knoblich, 2007), a case of *perceptual resonance* (Wohlschläger, 2000; Schütz-Bosbach and Prinz, 2007). Consistent with the finding by Maruya et al. (2007), Doesburg et al. (2009) found in a psychophysiological study that it is only during the dominant percept that perceptual processing associated with the percept is coupled with motor-related processes in frontal cortex. (Additional evidence stems from a recent study showing that entry of any kind may require a top-down signal from frontal cortex; Boly et al., 2011; Panagiotaropoulos et al., 2012.)

Perceptual resonance, and the voluntary control of action, can be explained by *ideomotor theory* (Lotze, 1852; Harleß, 1861; James, 1890; Greenwald, 1970; Hommel et al., 2001; Hommel, 2009; Hommel and Elsner, 2009). When popularizing this theory, William James (1890) proposed that the mere thoughts of actions produce impulses that, if not curbed or controlled by thoughts of incompatible actions, result in the performance of the imagined actions (see Marien et al., this issue). From this view, activating the perceptual effects of an action leads to the corresponding action, effortlessly and without awareness of the motor programs involved (Gray, 1995; Kunde, 2004). The representations guiding action production tend to be perceptual-like images of action outcomes (Hommel, 2009), which are based on memories of prior action outcomes (see, in this issue, Marien et al. for role of reward in ideomotor learning). Consistent with ideomotor theory, during conflicts such as those of the Stroop task, it is perceptual-like representations that are activated to guide action (Egner and Hirsch, 2005).

Because action/motor processes are largely unconscious (Grossberg, 1999; Goodale and Milner, 2004; Gray, 2004), the entry into consciousness of content is influenced most by perceptual-based (and not action-based) events and processes (e.g., priming by perceptual representations; Müller, 1843; James, 1890; Gray, 2004; Morsella and Bargh, 2010). [See brain stimulation evidence in Desmurget et al. (2009).] Hence, few conscious contents should arise from what can be construed as "pure" action-related processes (should there be such a thing; cf., Hommel, 2009). Thus, entry from action in Maruya et al. (2007) might be the result of the more "perceptual" aspects of action production, such as perceptual-like *action effect* representations (or "Effektbild"; Harleß, 1861) or corollary discharges from action plans (Gray, 2004). From this standpoint, though perception and action are intimately related and may even share the same representational format, as in "common code" models of perception-and-action (Hommel, 2009), when it comes to phenomenology, consciousness is most influenced by what has traditionally been regarded as the perceptual end of the perception-action cycle (Neisser, 1976; Gray, 1995). Accordingly, research by Wohlschläger (2000) and by ideomotor theorists (e.g., Hommel, 2009) suggests that action-based effects on awareness such as perceptual resonance require, not only perturbation of the sensorium, but dimensional overlap (e.g., shared spatial dimensions) between actions and percepts (cf., Knuf et al., 2001; Schütz-Bosbach and Prinz, 2007).

As noted, some ideomotor models propose that perceptual action effects and action codes share the same representational format, hence the description of some ideomotor accounts as common code theories of perception-and-action (Hommel, 2009). Such common code perspectives resemble mirror neuron approaches (Rizzolatti et al., 2008) and motor theories of speech perception (Liberman and Mattingly, 1985). (For a treatment of action simulation, see, in this issue, Springer et al.) Similarly, speaking about the interconnection between perception and action, Sperry (1952) proposed that the phenomenal percept (e.g., the shape of a banana) is more isomorphic with its related action plans (grabbing or drawing the banana) than with its sensory input (the proximal stimulus on the retina). [For contemporary treatments regarding how action influences the nature of conscious percepts, see Gray (1995), Hochberg (1998), O'Regan and Noë (2001), and Humphreys (2013).]

With great influence, Gibson (1979) too proposed an "ecological theory" of perception in which perception is intimately related to action, but, unlike ideomotor theory and common code approaches, Gibson's approach is strictly non-representational in that all the information necessary for action is held to be provided by, and contained in, the environment. For a treatment regarding the difference between ecological and representational ("cognitive") theories of action, see Hommel et al. (2001). See Scheerer (1984) and Markman (1999) for reviews of the shortcomings of approaches in which the nature of percepts or, more generally, representations, is constituted in part by motor processing, as in "peripheralist," "motor," "embodied," "efferent," and "reafferent" theories of thought (e.g., Münsterberg, 1891; Watson, 1924; Washburn, 1928; Held and Rekosh, 1963; McGuigan, 1966; Festinger et al., 1967; Hebb, 1968; see discussion of embodied approaches in Diefenbach et al., this issue; see relevant article by Jordan, in this issue).

## **CONCLUSION TO THE INTRODUCTION OF THE SPECIAL ISSUE ON CONSCIOUSNESS AND ACTION CONTROL**

Our survey and the following articles reveal that one of the primary reasons to study consciousness by way of action control is that the contrast between conscious and unconscious processes is easy to appreciate from an action-based standpoint. It is important to consider that, though it is far from trivial to demonstrate unconscious perceptual processing (a controversial phenomenon whose study often requires neuroimaging and sophisticated techniques, e.g., perceptual priming), even the most cursory examination of action phenomena reveals that, in the nervous system, there is a distinction between processes that are *consciously mediated* (e.g., voluntary action) and processes that are *unconsciously mediated* (e.g., reflexes, peristalsis, and aspects of motor control). Stumbling upon this contrast between conscious and unconscious processes is not only uncontroversial in the study of action but is inevitable. In addition, it is more experimentally tractable to study the relationship between action and consciousness than that between attention and consciousness (the traditional approach; cf., Baars, 1997), because in the former there is less likelihood of conflating conscious and attentional processes (cf., Hamker, 2003), a recurring problem in consciousness research (Baars, 1997; Maruya et al., 2007). Last, what Sperry noted in 1952 about action is still true: *The outputs of a system reveal more about the inner workings of the system than do the inputs to the system.* As the cardinal "output" of the nervous system (Morsella and Bargh, 2010), action thus provides the investigator with a unique portal to illuminate the most elusive of central processes, consciousness.

#### **ACKNOWLEDGMENTS**

We would like to thank Professor Lorenza Colzato, Professor Bernhard Hommel, and the editorial staff at *Frontiers in Cognition* for giving us the honor of serving as editors of this special issue and for assisting us throughout the entire editorial process. We are also indebted to the contributors of the special volume. They have shared with us and the readership of *Frontiers in Cognition* theoretical and empirical advancements that will be studied for years to come. Ezequiel Morsella acknowledges the support provided by the Center for Human Culture and Behavior at San Francisco State University.

## **REFERENCES**

Distilling the neural correlates of consciousness. *Neurosci. Biobehav. Rev.* 36, 737–746. doi: 10.1016/j.neubiorev.2011.12.003

6, 47–52. doi: 10.1016/S1364-6613(00)01819-2


*Phenomenol. Cogn. Sci.* 6, 309–325. doi: 10.1007/s11097-007-9051-5


perception. *J. Exp. Psychol. Monogr.* 74, 1–36. doi: 10.1037/h0024766


to consciousness: buffer of the perception-and-action interface," in *The Unity of Mind, Brain and World: Current Perspectives on a Science of Consciousness*, eds A. Pereira and D. Lehmann (Cambridge, UK: Cambridge University Press), 43–76.


conscious awareness. *Nat. Neurosci.* 5, 382–385. doi: 10.1038/nn827

Hallett, P. E. (1978). Primary and secondary saccades to goals defined by instructions. *Vis. Res.* 18, 1279–1296. doi: 10.1016/0042-6989(78)90218-3

Hallett, M. (2007). Volitional control of movement: the physiology of free will. *Clin. Neurophysiol.* 117, 1179–1192. doi: 10.1016/j.clinph.2007.03.019

*Handbook of Human Action,* eds E. Morsella, J. A. Bargh, and P. M. Gollwitzer (New York, NY: Oxford University Press), 371–398.


revised. *Cognition* 21, 1–36. doi: 10.1016/0010-0277(85)90021-6


*Trends Cogn. Sci.* 4, 383–391. doi: 10.1016/S1364-6613(00)01530-8


Koch (New York, NY: McGraw-Hill), 196–292.


inattentional blindness and the capture of awareness. *Psychol. Rev.* 112, 217–242. doi: 10.1037/0033-295X.112.1.217


macaque lateral prefrontal cortex. *Front. Psychol.* 4:603. doi: 10.3389/fpsyg.2013.00603.


*Am. Psychol.* 60, 308–317. doi: 10.1037/0003-066X.60.4.308


eds D. T. Gilbert, S. T. Fiske, and G. Lindzey (New York, NY: McGraw-Hill), 446–496.


New York, NY: Oxford University Press.


*Conscious. Cogn.* 8, 225–259. doi: 10.1006/ccog.1999.0390

*Received: 13 August 2013; accepted: 15 August 2013; published online: 10 September 2013.*

*Citation: Morsella E and Poehlman TA (2013) The inevitable contrast: Conscious vs. unconscious processes in action control. Front. Psychol. 4:590. doi: 10.3389/fpsyg.2013.00590*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Morsella and Poehlman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The efference cascade, consciousness, and its self: naturalizing the first person pivot of action control

## *Bjorn Merker\**

*Kristianstad, Sweden*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA; T. Andrew Poehlman, Southern Methodist University, USA*

*\*Correspondence: Dr. Bjorn Merker, Fjälkestadsv. 410-82, Kristianstad SE-29194, Sweden; e-mail: gyr694c@tninet.se*

The 20 billion neurons of the neocortex have a mere hundred thousand motor neurons by which to express cortical contents in overt behavior. Implemented through a staggered cortical "efference cascade" originating in the descending axons of layer five pyramidal cells throughout the neocortical expanse, this steep convergence accomplishes final integration for action of cortical information through a system of interconnected subcortical way stations. Coherent and effective action control requires the inclusion of a continually updated joint "global best estimate" of current sensory, motivational, and motor circumstances in this process. I have previously proposed that this running best estimate is extracted from cortical probabilistic preliminaries by a subcortical neural "reality model" implementing our conscious sensory phenomenology. As such it must exhibit first person perspectival organization, suggested to derive from formatting requirements of the brain's subsystem for gaze control, with the superior colliculus at its base. Gaze movements provide the leading edge of behavior by capturing targets of engagement prior to contact. The rotation-based geometry of directional gaze movements places their implicit origin inside the head, a location recoverable by cortical probabilistic source reconstruction from the rampant primary sensory variance generated by the incessant play of collicularly triggered gaze movements. At the interface between cortex and colliculus lies the dorsal pulvinar. Its unique long-range inhibitory circuitry may precipitate the brain's global best estimate of its momentary circumstances through multiple constraint satisfaction across its afferents from numerous cortical areas and colliculus. As phenomenal content of our sensory awareness, such a global best estimate would exhibit perspectival organization centered on a purely implicit first person origin, inherently incapable of appearing as a phenomenal content of the sensory space it serves.

**Keywords: action control, attention, consciousness, egocentric space, first person, pulvinar, self, superior colliculus**

## **INTRODUCTION**

"Given the presumption that the way we see the world evolved to make the control of action as straightforward as possible, it is likely that our phenomenal perception of the world is closely related to the mechanisms we use to act upon it"

Michael Land (Land, 2012, p. R811).

Whatever a theory of consciousness might contain or propose, it must provide an account of what it is that places us in a *first person* perspectival relation to our phenomenal experience. So central is this relation to the constitution of the conscious state that it virtually defines it (Velmans, 1991; Merker, 1997). This much at least is certain: without such an account, a theory cannot be adequate to the greater part of ordinary waking reality, because in it we routinely experience the events of our lives. The "we" here refers, of course, to the "first person" in question. Neither self-consciousness nor a self-image is implied by this usage; to be subject to phenomenal experience suffices. To the extent that any notion of self is consciously entertained, it shares with other items or contents of consciousness the status of being apprehended from a first person perspective. The latter does not, in other words, presuppose self-images or self-consciousness, but they presuppose it.

To be explored in what follows is the proposition that the first person perspective, and with it consciousness, is best understood in relation to the requirements of action control (Merker, 2005, 2007; Land, 2012), and has its origin in them. It is there that one finds the key to the kinds of content that enter the conscious state (Morsella, 2005) and also the functional grounds for the peculiar tripartite nested format in which the first person perspective of our sensory consciousness is cast (Merker, 2007, 2013). In this endeavor we shall be concerned almost exclusively with sensory consciousness, and visual sensory consciousness in particular. This is not because other domains of conscious contents are without interest, but because nowhere is the first person perspective more concretely defined, more instructively instantiated, or more empirically accessible than in immediate phenomenal sensory experience.

Sensory experience is typically treated on the afferent side of cerebral operations, concerned with how the brain interprets and makes sense of the barrage of irregular spiking activity arriving on its sensory nerves. Action control, on the other hand, is typically treated on the efferent side, presupposing that the world has been deciphered, and one is ready to act upon it. The *apparent* contradiction of making action control the key to sensory experience stems from conflating sensory operations—the ramified activity of the cortical sensory hierarchies—with sensory experience. The latter is conscious, and the phenomenal objects that populate it bear no trace of the massive multi-stage operations the cortex mounts in order to strip them of the multiple dimensions of inherent ambiguity encumbering the brain's primary afference (see Merker, 2012 and references therein). Sensory objects present themselves to our consciousness as finished products of the cortical hierarchies, delivered on completion of their labors (which accordingly may take place unconsciously).

There are, moreover, good grounds for believing that the cortex employs a probabilistic data format for its many internal operations (Hinton and Sejnowski, 1983; Földiák, 1993; Anderson and Van Essen, 1994; Zemel et al., 1998; van Rossum et al., 2002; Pouget et al., 2003), and that our sensory world is a running *global best estimate* based upon those probabilistic cortical preliminaries (Merker, 2012). The cortex, furthermore, has reason to avoid precipitating final estimates within its own operations (van Rossum et al., 2002; Merker, 2012; see Beck et al., 2008 and Ma et al., 2006 for an example). It is perfectly feasible, then, to entertain the possibility that the implementation of our sensory awareness takes place in structures among efferent targets of cortical operations, provided they have the requisite representational capacity and are in receipt of direct projections from a suitable set of cortical areas. What such an arrangement might look like when pursued into the targets of descending cortical pathways will be explored in the sections that follow.
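One common way to make the notion of a best estimate derived from probabilistic preliminaries concrete is precision-weighted averaging of independent Gaussian estimates. The sketch below is offered under that simplifying assumption only. It is not the mechanism proposed in this article or in the works cited, and the function name `combine_estimates` and the example numbers are hypothetical.

```python
from typing import Iterable, Tuple


def combine_estimates(estimates: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse independent Gaussian estimates given as (mean, variance) pairs.

    Assumption (not from the source): each estimate is Gaussian and the
    estimates are independent, so the fused mean is the precision-weighted
    average and the fused variance is the inverse of the summed precisions.
    """
    pairs = list(estimates)
    precisions = [1.0 / var for _, var in pairs]
    total_precision = sum(precisions)
    fused_mean = sum(mean * p for (mean, _), p in zip(pairs, precisions)) / total_precision
    return fused_mean, 1.0 / total_precision


if __name__ == "__main__":
    # Two hypothetical, equally reliable estimates of the same quantity.
    print(combine_estimates([(10.0, 4.0), (14.0, 4.0)]))
```

Under these assumptions, the fused estimate is always at least as precise as the most reliable contributing estimate, and less reliable sources pull the result less strongly.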

## **THE EFFERENCE CASCADE DEFINED**

It is an all too common misconception that cortical control over behavior is exercised principally through direct projections from primary motor cortex to the motor neuronal apparatus of lower brain stem and spinal cord, and that the rest of the cortex influences behavior indirectly, via its typically multisynaptic transcortical connections to primary motor cortex. But no cortical area is dependent on the motor cortex for its efference<sup>1</sup> because every cortical area has direct subcortical projections descending from pyramidal cells populating its lower two cortical layers (Diamond, 1979; Jones, 1984; Thomson and Lamy, 2007).

One contingent of these descending projections issues from cortical layer VI to "near" (often reciprocally connected) subcortical structures such as the thalamus and the claustrum [reviewed in Thomson (2010)]. In the thalamus they exert a merely modulatory influence on their target structures via small boutons synapsing on distal dendrites and engage the thalamic reticular nucleus (likewise modulatory) by collaterals when passing through it (Guillery, 1995; Erişir et al., 1997; Sherman and Guillery, 1998; Prieto and Winer, 1999; Rouiller and Welker, 2000; Li et al., 2003; Wang et al., 2006). In the setting of reciprocal corticothalamic connectivity this large population of layer VI cells presumably is engaged in "tuning" neural activity on its way *up to* the cortex (cf. Ferster and Lindström, 1985; Martin and Somogyi, 1985; Li and Ebner, 2007; da Costa and Martin, 2009), whether that activity originates in thalamic sensory relay nuclei or higher order ones.

It is cortical layer V, however, that contains pyramidal cells engaged in exporting cortical information to distant targets, and therefore can be expected to convey a final summary of cortical operations to the rest of the brain. It supplies numerous diverse and far-flung subcortical targets in basal ganglia, basal forebrain, diencephalon, midbrain, pons, medulla, and spinal cord with typically high-security driving synaptic input via large boutons that synapse on proximal dendrites (Kuypers, 1981; Jones, 1984; Guillery, 1995; Sherman and Guillery, 1998; Rouiller and Welker, 2000; McHaffie et al., 2001; Winer, 2006; Lemon, 2008). *Every cortical area issues such descending projections*. Their precise subcortical targets depend on the cortical area in question. In this laminar sense, then, all of cortex can be said to have a motor function (Diamond, 1979; cf. also Jones, 1984, p. 522; Campbell, 1905; Bolton, 1910; Swanson, 2000).

Not all long descending cortical projections terminate in motor related structures, however. Some innervate brainstem sensory structures such as the trigeminal sensory and dorsal column nuclei (Kuypers, 1981). *Here the term "efference cascade" will therefore be used as a comprehensive and functionally neutral term for the entire diverse system of descending (extra-telencephalic) cortical layer V projections*. It originates in large pyramidal cells concentrated in lower cortical layer V.

These layer V pyramids exceed all other cortical cell types in the comprehensiveness with which they sample activity across cortical layers (Larkum, 2013). Their basal dendrites often extend into cortical layer VI below them (e.g., Dégenètais et al., 2002 Figure 10; Ledergerber and Larkum, 2010, Figure 12), and their robust and typically branching apical dendrites extend as prominent tufts into the supragranular layers including layer I. Special conductance and spike initiation mechanisms operate to connect this tuft compartment with the basal dendrite and axon initial segment compartment via action potential backpropagation (Amitai et al., 1993; Yuste et al., 1994; Larkum et al., 1999, 2004; Larkum, 2013). They thus appear ideally disposed to issue a comprehensive summary to the rest of the brain of the state of the local patch of cortex in which they reside.

It was in this sense that Douglas and Martin summarized their role as follows, "The pyramidal cells of layer 5 that drive subcortical structures involved in action (e.g., basal ganglia, colliculus, ventral spinal cord) decide the output of the cortical circuits" (Douglas and Martin, 2004, p. 443). The axons of these pyramidal cells do not send collaterals to the thalamic reticular nucleus even when passing through it on their way to the dorsal thalamus (Jones, 2002). This, in present terms, is in keeping with their operational role as conduits for the running record of *completed cortical labors* rather than earlier operational stages requiring tuning of activity *arriving* at cortex from subcortical sources.

<sup>1</sup>Unless, of course, a cortical area needs to utilize the highly specialized motoric capacity for which primary motor cortex appears to have evolved, the control of that fraction of behavior that consists of the skilled (learned) and detailed patterning of movements of distal extremities, or effectors such as those involved in vocal learning (Heffner and Masterton, 1975; Lawrence and Hopkins, 1976; Passingham et al., 1978; Kuypers, 1981; Karni et al., 1998; Rathelot and Strick, 2006; Okanoya and Merker, 2007; Brown et al., 2008; Lemon, 2008).

The morphological and physiological specializations of layer V pyramidal cells ensure that the spiking activity of their axons comes to reflect the overlap in time of activity across cortical layers (Jones, 1998; Douglas and Martin, 2004; Larkum, 2013, box 1, Figure 1; Thomson et al., 2002). They appear to be particularly well disposed, in fact, to reflect conjoint activation of cortical feedforward and feedback projections in their activity (Larkum, 2013). This circumstance carries special significance for the present topic, because a number of lines of evidence suggest that such conjoint activation is a condition for cortical information to enter consciousness (Lamme and Spekreijse, 2000; Bullier, 2001; Merker, 2004, p. 566 and Figure 4; Lamme, 2010; Boly et al., 2011).

It is conceivable, therefore, that somewhere beneath the cortex there is a target or set of targets of these cortical layer V pyramidal cell axons in which their "reporting" on the cortical pattern of conjoint activation of feedforward and feedback activity becomes conscious, after passing a threshold in that subcortical terminus. Combined with the reasons alluded to in the previous section for provisionally excluding the cortex itself as a venue for precipitating the sensory estimates that yield phenomenal perceptual objects (a full rationale is presented in Merker, 2012), it seems worth exploring the distinct possibility that the brain's mechanism of consciousness might hide among targets of cortical layer V descending projections.

## **PICKING A PATH THROUGH THE WILDERNESS**

The massive many-to-few convergence by which the efference cascade connects vast expanses of cortex to compact subcortical nuclei is an appropriate design feature for a system that derives concise final estimates from cortical probability distributions for purposes of action control. Given that no more than roughly a hundred thousand motor neurons must execute every behavior influenced by some 20 billion cortical neurons, a steep convergence ratio is a systemic necessity. This fits well with the modest representational requirements of final estimates compared to their capacity-intensive probabilistic preliminaries (Ma et al., 2006; Beck et al., 2008). Here, however, we are not concerned with just any estimate, but with the brain's global best estimate of its current circumstances, proposed to fill our consciousness with the world we experience around us (Merker, 2012, 2013). Do compact subcortical nuclei have the neuron numbers and representational capacity to accommodate such content?

A calculation based on a well-studied aspect of phenomenal sensory content, namely visual acuity as a function of eccentricity, discloses that some 164,000 picture elements ("pixels") suffice to render a monochromatic, monocular, full-field human visual percept at full psychophysical (i.e., phenomenal; see Rock, 1997) resolution (Rojer and Schwartz, 1999; see also Lennie, 1998, p. 900, and Watson, 1987). By rough extrapolation from this measure, a few million neurons employed as representational elements should readily accommodate the full compass of multimodal human sensory awareness (for additional detail, see Merker, 2012, p. 49). This in turn means that a number of the way stations of the efference cascade, such as the superior colliculus in the midbrain and the mediodorsal nucleus as well as the pulvinar complex of the higher order thalamus have the requisite neuron numbers to do so (for cell counts, see Théoret et al., 2001; Abitz et al., 2007; Chalfin et al., 2007).

At least on this score, then, the search for an implementation of a mechanism of sensory consciousness among the subcortical targets of the efference cascade can proceed without embarrassment. In so doing, the *generic structural characteristics of phenomenal sensory consciousness* can be used to canvass the tangled anatomy of the search space for candidate implementing mechanisms (Merker, 2012, 2013). So far this phenomenal resource remains curiously under-exploited in consciousness theory<sup>2</sup>, though it would seem to be a necessary requirement for any matching of candidate neural mechanisms to the operational requirements of the function they are conjectured to implement.
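The order-of-magnitude arithmetic behind the convergence and capacity arguments above can be made explicit. The symbols $N_{\text{cortex}}$, $N_{\text{motor}}$, and $N_{\text{pixel}}$ are introduced here purely for illustration; they restate the rough figures quoted in the text (some 20 billion cortical neurons, roughly a hundred thousand motor neurons, and some 164,000 picture elements) and add no new data:

$$
\frac{N_{\text{cortex}}}{N_{\text{motor}}} \approx \frac{2 \times 10^{10}}{10^{5}} = 2 \times 10^{5},
\qquad
\frac{N_{\text{cortex}}}{N_{\text{pixel}}} \approx \frac{2 \times 10^{10}}{1.64 \times 10^{5}} \approx 1.2 \times 10^{5}.
$$

On these rough figures the overall convergence from cortex to motor neurons is of the order of 200,000 to 1, and the element count of the monocular visual percept lies some five orders of magnitude below the cortical neuron count.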

One of the more conspicuous structural characteristics of sensory experience is the nested arrangement in which it comes to us. The world we inhabit is laid out before us in consciousness as a three-dimensional panorama surrounding a central object, our body, from which we look out upon the world through an empty opening in its upper face region (Mach, 1897; Merker, 2007, 2013). The key claim of the present proposal is that this nested egocentric organization of sensory consciousness is inherently related to and derived from the needs of action control in that it simplifies the conversion of locational differences in phenomenal space to directional displacements in our most ubiquitous category of behavioral output, namely the targeting movements of spatial orienting behavior (Hassler and Hess, 1954; Sokolov, 1963; Johansson et al., 2001; Land, 2012). Subsequent sections will expand on this theme, but for now a minimal sketch of the rudiments of an egocentric orienting system is provided in **Figure 1**.

Gaze or orienting movements account for a greater share of behavioral variance than any other kind of movement. They typically provide the temporally leading edge of all instrumental acts by landing on the targets of those acts *ahead* of the implementing body part (for detail, see Merker, 2012, pp. 46–47). The strategy applies all the way down to the split-second details of manipulative activity (Johansson et al., 2001). Arm and fingers *follow* the agile movements of the gaze as if attached to it by elastic bands. The coupling of arm or hand to the gaze appears to be the brain's default mode of operation (Gorbet and Sergio, 2009; Thaler and Todd, 2009; see also Lünenburger et al., 2001; Reyes-Puerta et al., 2010; Crawford et al., 2011), and so called gain fields (Andersen and Mountcastle, 1983; Chang et al., 2009) can be likened to the "elastic bands" in the analogy just used.

These leading gaze or orienting movements accordingly can be regarded as the brain's principal output. To a first approximation they consist of rotary displacements of the eyes in their orbits and of the head on its cervical pivot. Rotation-based coordinate transformations accordingly are central operations in their coordination and control (Crawford et al., 2011). That control is implemented by highly conserved and complex sensorimotor circuitry of the brainstem (Simpson et al., 1988; Büttner-Ennever et al., 1989; Grantyn et al., 1992; Masino, 1992; Isa and Sasaki, 2002), ultimately anchored to the vestibular system (Cohen, 1988). All higher control of orienting behavior must in one way or another access that control circuitry.

<sup>2</sup>One possible reason for this neglect is described in the final section of Merker (2013). In brief, it may betoken a lingering and entirely tacit influence of naive realism on theorizing such that this world that surrounds us is not recognized as a content of consciousness but is mistaken for the actual physical universe itself. Such misattribution eliminates a major portion of the contents of sensory consciousness from consideration vis-a-vis consciousness theory, whose purview accordingly shrinks to matters of our "inner life," thinking, self-consciousness, qualia, and the like.

**FIGURE 1 | Polar panorama of Cardiff Castle surrounding an observer head, to illustrate the use of an egocentric neural representation of ambient space in the control of rotational displacements of eyes and head during orienting movements.** Only a head movement is depicted. For inclusion of eye movements in such a scheme, see Land (2012). The physical universe is rendered in gray scale, while the contents of the neural reality model (shown filling the physical head only to gain image resolution) are rendered in color and raster. **Colored sector:** The visible portion of the surroundings represented in the neural reality model, anchored to the perceptual egocenter inside the reality model's head representation (rastered, because not within the field of view). **Rastered sector:** The remaining multimodal space representation of the neural reality model, tacitly present for vision in the form of sectors of ambient space that may be brought within the field of view by gaze displacements. In such a scheme perceived angular distance to a potential orienting target matches the required rotational displacement of the physical eyes and head (gaze), symbolically indicated by the line joining the two angular displacements. The execution of such a movement is *experienced* as a movement of one's (i.e., neural model) *head only*, while one's (i.e., neural model) surroundings *remain stationary*, though the *physical* surroundings undergo wholesale displacement relative to the sensory receptors fixed to the moving physical eyes and head in the course of that movement. The tacit representation of the surroundings (rastered) accordingly must undergo a corresponding compensatory displacement in the neural reality model, leaving the rastered sector "locked," as it were, to the physical surroundings despite head movements, presumably in dependence on oculomotor efference copy and vestibular head movement signals (see further, Land, 2012). The content of the colored sector, of course, is always what is before the eyes. For gaze movements from one primary position to another that content always occupies the same fixed sector of the reality model (i.e., without requiring translatory movement), given surround compensatory movement plus saccadic suppression. In the proposed dorsal pulvinar implementation of such a reality model, the compensatory surround movement can draw on afference from both colliculus and posterior parietal cortex, the latter in receipt of disynaptic hippocampal, cerebellar and collicular (Clower et al., 2001), as well as vestibular (Andersen, 1997), information. The Cardiff Castle panorama photo is from Gregg M. Erickson under a Creative Commons Attribution 3.0 Unported license, modified in polar coordinates by Nevit Dilmen under the same license, further modified for inclusion in this figure by Bjorn Merker and released under the same license.

The circumstances just reviewed allow a considerable portion of the efference cascade to be put to one side for present purposes. In his comprehensive survey of the "anatomy of the descending pathways" of 1981, Kuypers identified two major contingents of these pathways (Kuypers, 1981). He called them Group A and Group B. The fiber tracts of Group A follow and contribute to the brain's most basic and earliest formed fiber tract, the medial longitudinal fasciculus (Ross et al., 1992). Through this contingent of medially descending tracts, vestibular, oculomotor/reticular, tectal and other fiber systems effect a set of spatially directional motor adjustments that regulate the body's basic postural orientation to its surroundings in gravitational, inertial, and other spatial sensory system terms (i.e., the functional domain outlined in Roberts, 1973). This medial system is crowned by the control circuitry for eye, head, and (in many species) ear movements that together with trunk movements determine the direction of gaze during orienting movements (Hassler and Hess, 1954; Henkel and Edwards, 1978; Büttner-Ennever et al., 1989; Grantyn et al., 1992; Masino, 1992; Isa and Sasaki, 2002; Horn, 2006).

The fiber tracts of Group B descend in a lateral course through the brainstem, and functionally supplement those of Group A with motor adjustments centered on distal extremities such as those involved in manipulative activity. Group B circuitry accordingly can be thought of as the part of the efference cascade by which the brain guides the body's "engagement" with the configuration of a selected target object or event, while Group A "orients" the body to its global surroundings and targets within it. There is an obvious match between these two contingents of the efference cascade and the "leading" and "following" components of behavior referred to above. It is only the first of these movement domains, those of orienting, that are served by the simplifying geometry of egocentric, rotation-based transformations reflected in the nested format of our sensory consciousness. The search space for a hypothetical implementation of sensory consciousness within the targets of the efference cascade accordingly can be confined to components of Kuypers' Group A "orienting" circuitry.

Even then, Group A features daunting complexity, and further constraints are needed. Functionally, a unitary displacement of the gaze from one target location to another is typically effected by a minimum of two partly independent but linked motor systems, those of eyes and head. The most caudally located premotor site for unitary specification of gaze displacements is the superior colliculus in the roof—tectum—of the midbrain [(Munoz et al., 1991; Freedman et al., 1996; Freedman and Sparks, 1997; Scudder et al., 2002); reviewed in Sparks (2004); see also (Khan et al., 2009)]. Downstream from the superior colliculus the circuitry for control of eyes and head again diverges (Masino, 1992; Scudder et al., 2002; Sparks, 2004; Horn, 2006).

The search for a unitary global best estimate mechanism can be confined, in other words, to targets of cortical layer V projections concerned with orienting behavior located between the cortex and the isthmic caudal border of the midbrain. Within this territory, the numerous targets of descending projections from the principal orienting-related cortical areas, namely the frontal eye fields and gaze-related parietal cortex in primates (Huerta et al., 1986; Stanton et al., 1988a; Saint-Cyr et al., 1990; Lock et al., 2003) are entangled in intricate mutual connective relations within which an ordering principle is nevertheless discernible. As pointed out by Huerta et al. (1986, pp. 434–435), the colliculus belongs among the more prominent targets of both of these cortical areas, and many of their *other* subcortical targets—typically connected with one another—project to the colliculus and are targeted by the colliculus in its turn. The functional significance of this curious parallel or duplicative connectivity will be explored in what follows.

## **AN ORIENTING SUPERHUB IN THE ROOF OF THE MIDBRAIN**

At least half a dozen areas of the macaque cortex have functional specializations related to the control of gaze movements (see Lynch and Tian, 2006 for a detailed treatment). Of these, the principal ones are the frontal eye fields inside the arcuate sulcus of the frontal lobe and the parietal gaze area in the lateral bank of the intraparietal sulcus, henceforth "cortical gaze fields" for short. The telencephalic, diencephalic, and mesencephalic targets of descending projections from the cortical gaze fields are shown in barest outline in **Figure 2**, along with some of the principal connections among those targets. Together these structures form the basic supranuclear apparatus for control of gaze (orienting) behavior between the cortex and the mesopontine isthmus. It is interposed, in other words, between the cortex and the brainstem reticular and cervical spinal motor circuitry for eye and head movement control. In the figure they have been grouped into two "subcortical tiers." One contains cortical gaze field targets in basal ganglia and dorsal thalamus, and the other their targets in ventral thalamus and midbrain.

Tier 1 consists of the gaze field recipient zones in the striatum and a paramedian constellation of orienting-related thalamic nuclei which in addition to the pulvinar complex includes what might be called the "extended intralaminar complex." The latter is a set of thalamic nuclei that share the property of projecting to the basal ganglia (Powell and Cowan, 1956; Jones, 1989, 1998; McFarland and Haber, 2000, 2001). They include the suprageniculate and limitans nuclei at the caudoventral border of the thalamus, the parafascicular, central lateral, and paracentral nuclei of the classical intralaminar nuclei (weakly connected to the gaze fields) and (flanking the paracentral nucleus) "paralaminar" portions of the mediodorsal, ventral anterior, and ventral lateral nuclei.

The striatal destination of many of the projections issuing from dorsal thalamic targets of the cortical gaze fields, along with the direct gaze field projections to the striatum, makes the basal ganglia the center of gravity of Tier 1 projections. This is reinforced by the fact that the chief thalamic targets of the cortical gaze fields lack descending projections of their own. Thus, as far as orienting gaze behavior is concerned, the principal descending exit from Tier 1 (i.e., from dorsal thalamus and striatum) is through the basal ganglia output pathway for gaze-control. It passes via the substantia nigra pars reticulata and lateralis in the ventral midbrain to the superior colliculus in the roof of the midbrain (Beckstead et al., 1979; Hikosaka and Wurtz, 1983, 1989). As the main connecting link between the first and second tiers, the substantia nigra of the midbrain occupies a position of its own in **Figure 2**.

**FIGURE 2 | Schematic depiction of the basic connective relations of the supranuclear apparatus for gaze control discussed in the text.** The figure may conveniently be inspected by proceeding from the two principal cortical "gaze fields," the frontal (FGF) and the parietal (PGF), which are mutually connected. Projections descending from them are shown as curvilinear trajectories, further distributed to components of Tier 1 [dorsal thalamus and basal ganglia (BG)] and Tier 2 (ventral thalamus and midbrain) via connective "buses" (for graphical economy). Connections between components of Tiers 1 and 2 are omitted to avoid clutter, with two exceptions: Tier 1 projections destined for the basal ganglia (BG) are shown, as are the main connections of both tiers with the superior colliculus (SC). Both cortical gaze fields issue direct projections to the colliculus as well as to the brainstem orienting apparatus. The latter has a token representation in Tier 2 in the form of its most rostral member, the rostral interstitial nucleus of the medial longitudinal fasciculus (riMLF). Connections to the rest of that apparatus are shown descending along the medial longitudinal fasciculus (MLF), and include the direct collicular descending projections to the paramedian brainstem and spinal cord. The colliculus returns projections to the cortical gaze fields via synapses in the paralaminar MD (MD) and Pulvinar (PULV), shown as straight lines deflected in the respective dorsal thalamic nuclei. Note, finally, that the chief descending route from Tier 1 to the brainstem orienting apparatus proceeds from the basal ganglia (which also receive direct cortical gaze field projections) via its midbrain outpost, the substantia nigra pars reticulata (SNr), to the superior colliculus. Together with the rest of its connectivity sketched here, this places the colliculus in the position of connective superhub in the supranuclear apparatus for gaze control, a concept further explicated in the text. Solid dots mark the source of a projection. The termination of a projection is shown ending in an open "Y." Filled triangles indicate reciprocal connections. Ext. intralam. cmplx, extended intralaminar complex, which includes the suprageniculate and limitans nuclei; VA-VL, ventral anterior and ventrolateral nuclei; PT, pretectal nuclei; LGV, ventral lateral geniculate nucleus (pregeniculate of primates); MRF, midbrain reticular formation; ZI, zona incerta. The figure was inspired by the passage on pp. 435–436 of Huerta et al. (1986). For further detail, see Goldman and Nauta (1976); Fries (1984); Asanuma et al. (1985); Lynch et al. (1985); Leichnetz and Goldberg (1988); Selemon and Goldman-Rakic (1988); Saint-Cyr et al. (1990); Shook et al. (1991); Lock et al. (2003); May (2006); and Stanton et al. (1988a,b).

Tier 2 has its most rostral outpost in the zona incerta, a ventral thalamic derivative on the undersurface of the dorsal thalamus (see Merker, 2007, pp. 75–76). It further contains the ventral lateral geniculate nucleus (the pregeniculate nucleus of primates; also a ventral thalamic derivative), the anterior and posterior pretectal nuclei, the rostral interstitial nucleus of the medial longitudinal fasciculus, as well as the midbrain reticular formation and what might be called the "perioculomotor nuclei" (the interstitial nucleus of Cajal, nucleus of Darkschewitsch and nucleus of the posterior commissure, not represented in the figure). Finally, it contains as its most elaborate and prominent member the superior colliculus in the roof of the midbrain. Anatomical references are cited in the legend to **Figure 2**.

The various components of Tier 2—unlike a number of those in Tier 1—have descending projections of their own. In the case of the gaze-related output of the substantia nigra—the principal conduit for the entire descending output of Tier 1—this projection terminates in the intermediate layers of the superior colliculus. This makes the superior colliculus the principal, if indirect, premotor output station of Tier 1. In addition to conveying the output of Tier 1, the colliculus receives prominent direct projections from the cortical gaze fields themselves (Huerta et al., 1986; Stanton et al., 1988b; Lock et al., 2003), as well as from a number of their Tier 2 targets. This convergence of gazefield-related connectivity on the superior colliculus is complemented—as pointed out by Huerta and colleagues and illustrated in **Figure 2**—by collicular projections to virtually the entire gamut of their diencephalic and midbrain targets (Huerta et al., 1986, pp. 435–436).

Apparently the superior colliculus occupies a central position in the descending connectivity of the cortical gaze fields, suggestive of "superhub" status in informal graph theoretic terms. Assigning it such a role by no means implies that the superior colliculus constitutes an obligatory link in the descending gaze field control over eye and head movements. Instead it opens the possibility that it may perform a more indirect or higher order function than its midbrain location might suggest. It is but one of many subcortical targets of the cortical gaze fields. Among these, most Tier 2 structures have independent descending brainstem projections, and the cortical gaze fields themselves project beyond the midbrain to brainstem nuclei with functions in the control of eye and head movements (Schiller et al., 1980; Schnyder et al., 1985; Huerta et al., 1986; May and Andersen, 1986; Stanton et al., 1988b; Faugier-Grimaud and Ventre, 1989; Shook et al., 1990, 1991; Munoz and Schall, 2004), though some of these projections are not very strong.

In this setting, a collicular role as connective superhub means that from virtually any component of the supranuclear orienting apparatus sketched in **Figure 2** there typically is a short synaptic route to the superior colliculus and via it to any other component of that apparatus. *The range of collicular connective relations, arrayed in tandem (i.e., in parallel) with the complex orienting circuitry it serves, seems to indicate that the superior colliculus performs a central function which otherwise diverse components of that circuitry have reasons to access and presumably derive benefit from*. What might that function be?

## **THE KEY TO COLLICULAR FUNCTION**

The wide-ranging afferent and efferent connectivity of the superior colliculus indicates that it must perform an integrative function of wide scope. A multitude of sensory as well as nonsensory cortical and brainstem systems converge with laminar specificity on its layered structure in the roof of the midbrain (see **Figure 3**). In the cat more than 40 subcortical nuclei and over 25 cortical areas project to it (Edwards et al., 1979; Edwards, 1980; Harting et al., 1992; see also Grofová et al., 1978; Hikosaka and Wurtz, 1983; Huerta and Harting, 1984; Rieck et al., 1986; Canteras et al., 1994). Collicular output, in turn, distributes divergently: not only do its descending projections target a range of brainstem systems controlling the diverse effectors of orienting movements, including those of the ears in animals that move them (Henkel and Edwards, 1978), but contrasting behavioral output categories are functionally segregated within them (Dean et al., 1988, 1989; Moschovakis et al., 1988a,b; Westby et al., 1990; Redgrave et al., 1993; Mana and Chevalier, 2001; Comoli et al., 2012). Its ascending projections, meanwhile, target the telencephalon (cortex and basal ganglia) via the higher-order and intralaminar thalamic nuclei, as already outlined (Huerta and Harting, 1984; Sparks and Hartwich-Young, 1989; May, 2006).

**FIGURE 3 | Schematic depiction of two principal design features of the anatomical organization of the superior colliculus. Lower left:** The cortex-like segregation, by laminar depth in the colliculus, of collicular afferents from many and diverse cortical and subcortical sources. Here only cortical sources are illustrated. Each source typically projects through the full mediolateral extent of the colliculus, but is here shown only as a narrow sector in which its laminar depth is marked by shading. The drawing is a simplified adaptation of results by Harting and colleagues in the cat (Harting et al., 1992), patterned after their summary Figure 27. **Upper right:** A cartoon of the compartmental organization of the collicular intermediate gray substance, based on histochemical and connectional studies in rat and cat (Harting et al., 1992, 1997; Chevalier and Mana, 2000). The upper surface of the composite drawing is patterned after Figure 6 of Chevalier and Mana (2000), and its cut face is loosely patterned after Figure 26 of Harting et al. (1992). Note that this part of the figure combines patterns from rat and cat, and is not anatomically veridical. It is only intended to convey the honeycomb-like tessellation of the collicular intermediate gray substance, by means of which distinct input-output "channels" are concatenated within a shared sensori-motor topography. See further the studies just cited, as well as Deniau et al., 2007 and Redgrave et al., 1992. SC, superior colliculus; PAG, periaqueductal gray matter; MRF, midbrain reticular formation.

Well over a century of behavioral and physiological studies indicate that this integrative hub somehow serves the multieffector phasic movements that re-orient an animal's receptor surfaces relative to a spatial target of immediate behavioral interest (Adamük, 1870; Hassler and Hess, 1954; Schneider, 1967; Schaefer, 1970; Syka and Radil-Weiss, 1971; Straschill and Rieger, 1973; Goodale and Murison, 1975; Harris, 1980; Merker, 1980; Roucoux et al., 1980; McHaffie and Stein, 1982; Milner et al., 1984; Dean et al., 1989; Freedman et al., 1996; Gandhi and Katnani, 2011). The canonical form of this re-orienting is the swift and seamlessly integrated joint action of eyes, ears (in many animals), head, and postural adjustments that make up what its pioneering students called the orienting reflex (Sokolov, 1963)<sup>3</sup>. Collicular involvement in this central pivot of behavior extends even to its autonomic and cerebral activation aspects (Jefferson, 1958; Dean et al., 1991; Dringenberg et al., 2003).

It would be tempting to call the colliculus the "central pattern generator of the orienting reflex," were it not for the fact that it does not actually specify the particular moment to moment sequence in which eyes, ears, head, trunk or limbs combine to produce a given orienting movement. The interplay among components of orienting gaze shifts is apparently settled downstream of the colliculus (Sparks, 2004). There the elaborate brainstem connectivity bundled along the medial longitudinal fasciculus carries the vestibular, cerebellar, and postural information, including eye position information, integral to the fluid interplay of the several effector organs involved (Büttner-Ennever et al., 1989; for the complexities involved in eye-head coordination alone, see Crawford et al., 1999; Sparks, 1999; Scudder et al., 2002).

Moreover, the behavioral role of the colliculus is not confined to the orienting reflex as classically conceived. Without a colliculus, animals do not exhibit escape reactions to visual threat (Sprague et al., 1961; Denny-Brown, 1962; Sprague and Meikle, 1965; Casagrande and Diamond, 1974; Merker, 1980; Dean et al., 1989; King and Cowey, 1992). Such escape behavior re-orients the animal *away from* the eliciting stimulus, and no orienting *toward* that stimulus need precede the precipitous escape triggered by an effective visual threat (Merker, 1980)<sup>4</sup>. Again, the escape behavior itself is presumably orchestrated downstream of the colliculus, with involvement of the nucleus cuneiformis and periaqueductal gray matter located directly beneath the colliculus (Sprague et al., 1961; Blanchard and Blanchard, 1987; Dean et al., 1988).

Functionally, there is little common ground between orienting target acquisition and escape from visual threat except this: in both situations the brain selects a "spatial target of immediate behavioral priority" toward which the animal's receptor surfaces are re-oriented. In the case of escape behavior, that spatial target is a safe place or escape route and not the eliciting stimulus itself—in fact, the farther from that stimulus the better! A so far elusive generic definition of collicular function may accordingly come within reach by focusing on the *determination of target priority* rather than on either the eliciting stimulus or the nature of the resulting movement (see Schall and Thompson, 1999; Fecteau and Munoz, 2006; Boehnke and Munoz, 2008).

Such a function, it is hereby proposed, may be formulated as follows: *The superior colliculus provides a comprehensive mutual interface for brain systems carrying information relevant to defining the location of high priority targets for immediate re-orienting of receptor surfaces, there to settle their several bids for such a priority location by mutual competition and synergy, resulting in a single momentarily prevailing priority location subject to immediate implementation by deflecting behavioral or attentional orientation to that location.*

The key collicular function, according to this conception, is the selection, on a background of current state and motive variables (Dorris et al., 2007), of a single target location for orienting in the face of concurrent alternative bids. In this capacity the colliculus would serve as the brain's final "priority comparator" or "priority gate" for immediate re-orienting. It would determine which of simultaneous bids for an orienting movement (including that of continuing the current orientation unchanged, Munoz and Guitton, 1989; Peck, 1989) should prevail in gaining momentary control of collicular output circuitry housed in its intermediate layers. The colliculus resolves conflicts, in other words, between the many brain systems whose state bears on an impending orienting movement. According to one theory of the function of phenomenal states (Morsella, 2005), this should give it a role in the constitution of such states. What that role might be is a question the present analysis is laboring to answer.

To clarify further the priority gate function of the collicular orienting superhub: what will be impaired in the absence of the colliculus is not eye or orienting movements as such (as orienting superhub the colliculus is arrayed both in parallel and in series with cortical gaze fields; see **Figure 2** and Schiller et al., 1980) but the process of selection among concurrent bids for target location priority. Depending on task and situational particulars this may take the form of deficient selection and triggering of alerting, orienting and escape reactions (impaired distractibility being a common symptom of collicular lesions across species; Denny-Brown, 1962; Casagrande and Diamond, 1974; Goodale et al., 1975, 1978; Milner et al., 1978; Merker, 1980; Albano et al., 1982; Gaymard et al., 2003) or impaired ability to regulate orienting priorities in a learning situation (Winterkorn and Meikle, 1981).

<sup>3</sup>Recognition of this collicular function was long delayed by the fact that the head of the experimental animal was fixed in a number of key physiological experiments designed to probe collicular function, and by the use of a restricted set of stimulation sites and parameters in early experiments in which the animals were free to move their heads (Robinson and Jarvis, 1974; Stryker and Schiller, 1975; see further Sparks, 2004). With less restrictive experimental conditions, not only does collicular stimulation evoke integrated gaze movements combining movement of eyes and head (Freedman et al., 1996; Sparks, 2004), but the animals' localization ability is drastically improved (Tollin et al., 2005), and the relationship between collicular unit activity (as well as stimulation site) and behavior is altered (see Sparks, 1999 for details).

<sup>4</sup>In the author's studies of escape behavior in hamsters (Merker, 1980), frame-by-frame analysis of the filmed trials showed that no orienting movements toward the sudden, silent overhead visual threat preceded the explosive escape behavior triggered by the stimulus. Instead the animal instantly reoriented to one of two escape routes in the familiar testing arena, and scrambled to safety, a behavior that was abolished by undercutting the superior colliculus, severing its descending projections. In present terms, on the rare and sudden appearance of a large, dark, and silently but swiftly moving visual stimulus in the animal's upper visual field, the location of an escape route, known to the animal from long established familiarity with the testing arena, became its compelling "spatial target of immediate behavioral priority." For effective visual threats in rodents, see Wallace et al. (2013).

Selection of the spatial target for the next orienting movement is not a matter of sensory locations alone, but requires access to situational, motivational, state, and context information determining behavioral priorities. It combines, in other words, bottom-up "salience" with top-down "relevance." As emphasized by Munoz and colleagues, priority is a weighted combination of these two types of information (Fecteau and Munoz, 2006; Boehnke and Munoz, 2008). This provides a rationale for nonsensory collicular afference such as that originating in cortical association areas and hypothalamus, and more generally the conspicuous convergence of exogenous (bottom-up) and endogenous (top-down) information sources in the superior colliculus (cf. Lines and Milner, 1985; Rieck et al., 1986; Cooper et al., 1998; Trappenberg et al., 2001; Felsen and Mainen, 2008; Reyes-Puerta et al., 2009; Cohen and Castro-Alamancos, 2010; Meeter et al., 2010; Maior et al., 2012).
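The weighted combination just described can be illustrated with a minimal sketch. The source states only that priority is a weighted combination of salience and relevance; the linear form, the weight `w_salience`, the function names, and the example values below are all illustrative assumptions, not a model taken from the cited work.

```python
def priority_map(salience, relevance, w_salience=0.5):
    """Mix bottom-up salience and top-down relevance, location by location.

    Assumption: a simple linear weighting; the actual combination rule
    used by the brain is not specified in the text.
    """
    if len(salience) != len(relevance):
        raise ValueError("both maps must cover the same locations")
    if not 0.0 <= w_salience <= 1.0:
        raise ValueError("w_salience must lie in [0, 1]")
    w_relevance = 1.0 - w_salience
    return [w_salience * s + w_relevance * r
            for s, r in zip(salience, relevance)]


def select_target(priority):
    """Winner-take-all: index of the single highest-priority location."""
    return max(range(len(priority)), key=priority.__getitem__)


# Hypothetical values for four candidate locations.
salience = [0.2, 0.9, 0.1, 0.4]   # e.g., a sudden flash at location 1
relevance = [0.1, 0.2, 0.3, 0.9]  # e.g., a task goal at location 3

balanced = select_target(priority_map(salience, relevance, 0.5))
stimulus_driven = select_target(priority_map(salience, relevance, 0.9))
```

With these made-up numbers, an even weighting favors the task-relevant location, whereas a weighting dominated by salience lets the flash capture the next orienting movement.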

No cortical gaze field is as directly connected to as wide a range of sources carrying information bearing on the decision where to turn next as is the midbrain superior colliculus. The cortical gaze fields receive high level information but not primary sensory afference, while the colliculus receives both the latter and the direct output of the cortical gaze fields and numerous additional cortical and brainstem afferents as well. Its broader afference enables its intrinsic circuitry to weigh a wider range of information bearing on the very next orienting movement than any other known neural system [with the possible exception of the zona incerta, with which it is reciprocally connected (Merker, 2007, pp. 75–76)]. This predicts that without a colliculus an animal will be capable of turning and orienting, but not with as comprehensive a moment-to-moment *weighting and comparison/gating* of all relevant sources of information as when in possession of an intact collicular hub.

The intricate intra- and inter-laminar circuitry within the colliculus that carries out the requisite interactions among its many inputs is beyond the scope of this review [(Moschovakis et al., 1988a,b; Doubell et al., 2003); see review in Isa and Hall (2009)]. Suffice it to say that it involves massive inhibitory interactions, both intrinsic to the colliculus (Katsuta and Isa, 2003) and coming from outside in the form of powerful inhibitory projections from several sources, only one of which is the already mentioned nigral projection. They include the zona incerta, anterior and posterior pretectal nuclei, the periparabigeminal area, a "critical zone" of the pedunculopontine region, and indirectly, via collicular interneurons, the parabigeminal nucleus (Ficalora and Mize, 1989; Appell and Behan, 1990; Behan and Appell, 1992; May et al., 1997; Durmer and Rosenquist, 2001; Klop et al., 2006; Lee and Hall, 2006). Through this convergent interface, multiple functionally diverse systems—each occupying a unique laminar depth in the colliculus—have their say, via inter- and intralaminar collicular interactions, in the moment to moment determination of the next priority target location.

The advantage of conducting structured interactions between low-level primary afference and high-level cortical information in a compact, convergent, laminar mechanism is twofold. First, this way the brain escapes the liability of entrusting moment-to-moment decisions to an executive fed only highly derivative information. When high-level cortical areas place their priority bids with an independent priority comparator, the brain as a whole, through its offices, stays open to "last split-second" course corrections, even by low-level sensory information, provided its magnitude is sufficient to override current competitors (cf. Marino et al., 2012). It is worth noting in this connection that cognitively demanding high-level deliberations are often readily postponable in comparison with intrusive sensory change that might spell disaster unless immediately attended to. Though often a fleeting glance is all that is required before ongoing behavior can be safely resumed, these "cautionary glances" nevertheless compete with the demands of ongoing behavioral task execution. Both utilize the same effector equipment for orienting, hence the need for a mechanism to resolve conflicts between them (Morsella, 2005; see Goodale et al., 1975 for an example).

Second, by taking place in a compact neural space by means of short axon intrinsic connectivity, the interactions needed to determine target location priority can occur far faster than anything that might be accomplished by long-range cortico-cortical interactions among multiple systems. The abolition of short latency gaze shifts by lesions of the colliculus or its local inactivation (Schiller et al., 1980, 1987; Hikosaka and Wurtz, 1985, 1986) accordingly may reflect the absence of the rapid decision making competence by which the colliculus normally drives the orienting machinery (Yarbus, 1967; Sparks et al., 2000; Johansson et al., 2001; Schiller et al., 2004), rather than a mere quantitative slowing of the orienting system.

There is thus no need to interpret the oft reported "vacant stare" and "fixed gaze" of colliculectomized tree shrews and monkeys (Denny-Brown, 1962; Anderson and Symmes, 1969; Casagrande and Diamond, 1974; Keating, 1974; Butter, 1979) as a symptom of an inability to move the eyes or to orient. Rather, without the broad-based afference and rapid operation of the collicular decision making machinery, the incessant lively play of the orienting reflex triggered at the collicular interface of endogenous and exogenous signals is compromised, leaving orienting behavior impoverished (see citations on impaired distractibility above).

Among investigators reporting impoverished orienting behavior in monkeys after lesions centered on the superior colliculus, none was more impressed by the lack of spontaneity in postlesion behavior than was Derek Denny-Brown. In his Sherrington memorial lecture of 1962 he reported on the behavior of five macaques with such lesions, stressing a global deficit in spontaneous behavior as a key symptom of their brain damage. The animals showed a "gross reduction in all types of externally directed behavior," spent long periods "staring aimlessly into space," and uttered no sounds (Denny-Brown, 1962, pp. 536–537). These global deficits appear to indicate, he suggested, that the tectum is the "primary driver of the mesencephalic reticulum" (which fits with the evidence for a collicular role in cerebral activation cited above, Jefferson, 1958; Dean et al., 1991; Dringenberg et al., 2003).

There were, however, considerable differences among Denny-Brown's five animals in the nature and severity of their symptoms, extending to the details of their visuomotor behavior. These differences presumably are related to differences in the extent and location of the lesions. The lesions were large and deep, variously encroaching on neighboring structures. In this connection it is worth noting that the behavioral effects of complete and selective lesions of the periaqueductal gray matter are more drastic versions of the kind of global behavioral changes reported by Denny-Brown (see Bailey and Davis, 1942, 1944). It seems plausible, therefore, that these symptoms, including persistent mutism (Gruber-Dujardin, 2010)<sup>5</sup>, relate to damage extending beyond the colliculus into the immediately underlying periaqueductal gray matter or its efferent fibers. In addition, periaqueductal loss of its collicular input (Mantyh, 1982, 1983) may have contributed to the observed deficits.

Perhaps in cognizance of the likelihood that the behavioral symptoms he described involved damage to more than the superior colliculus *sensu stricto*, Denny-Brown ended his lecture on a cryptic note. The periaqueductal gray and above all its "more differentiated peripheral layers," namely midbrain reticular formation and tectum, are vital, he wrote, for unitary functioning of the organism in relation to its surroundings, and constitute what he called *the physiological "ego."* He did not elaborate on this obscure formulation, but this is the first time a linkage between the neural machinery in the roof of the midbrain and "the self" appears in print. Fifteen years later a similar suggestion, focused on the sense of continuity of self over time, is made by the Scheibels with regard to the deeper layers of the superior colliculus and nucleus cuneiformis beneath its caudal border (Scheibel and Scheibel, 1977). They, as well as Denny-Brown, are cited in their turn at late points in an expansive discourse on a collicular locus of "awareness of self" published by the biochemist and gerontologist Bernard Strehler 14 years after the Scheibels (Strehler, 1991).

Of these three, only Strehler attempts a detailed justification of a collicular role in the domain of self and awareness. However, the terminology he applies to this end is so varied and imprecise as to leave the attempt under-constrained from the side of the proposed function. The latter might, by close reading, be narrowed down to "awareness of self-vs-environment" or a system's "cognizance of its own existence" (Strehler, 1991, p. 81). In present terms, these expressions refer to particular contents of consciousness (i.e., cognizance of the distinction between self and environment, or of the fact that one exists, both of which are cognitive contents). They do not, in other words, define factors constitutive of the state that allows contents to be consciously apprehended; rather they presuppose it. If instead we ask whether there might not be some construal of the term self that might in fact refer to a constitutive factor of the conscious state, and how such a factor might be neurally implemented, a possible role for the superior colliculus in the constitution of the conscious state does indeed come within view.

## **THE SELF THAT IS EXCLUDED FROM BUT PRESUPPOSED BY THE CONTENTS OF CONSCIOUSNESS**

The entire content of our sensory experience bears witness in multiple ways to the egocentric geometry of its spatial arrangement. As far as immediate sensory experience goes, all its contents, irrespective of modality, are arrayed around an approximation to a single point, the point "from which" they all are experienced, be they near or far, high or low, left or right, in front or behind (e.g., sounds). In fact, these very terms are defined in relation to that point, and have no meaning apart from it; the same applies to "sidedness" and "handedness" (James, 1890, p. 150, footnote 2). It is this egocentricity of sensory experience, the fact that visual (as other) objects are perceived from a point, that occasions the occlusion of one visual object by another. In the sense of touch the sensation of a light touch to a finger is experienced as located in the finger, but that sensation *in* the finger is not experienced *from* the finger, but from about the same spatial location from which that finger is seen, even if the sensation should occur in pitch darkness. Our spatial senses are integrated, in other words, into a single, panoramic multimodal space anchored to its egocenter common origin (see **Figure 1**)<sup>6</sup>.

That point, that origin, lies at the intersection of all lines of sight, serving as their common pivot (cf. Vetter et al., 1999; Wagner, 2006; Thaler and Todd, 2009). It is located at the proximal-most end of any line of sight or equivalent line of attentional focus (say for somesthesis in the dark). It is the "here" with respect to which any sensory (or other) percept is "there." It is the point, in other words, from which we are looking and, more generally, registering sensory experience or deploying attention. For our visual perception of the world, that point can be determined with millimeter precision by a simple procedure first developed by Hering (1879/1942; Roelofs, 1959). Commonly included in lab exercises in the psychology of perception it empirically pinpoints the intersection of a few lines of sight obtained by fixating specified environmental locations and aligning fiduciary pins with them along each of the lines of sight (Howard and Templeton, 1966).
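The geometry behind this kind of procedure can be sketched as a least-squares intersection of lines. The sketch below is not Hering's or Howard and Templeton's actual protocol; it assumes, for illustration only, that each line of sight is recorded as a pair of positions in the horizontal plane (a fixated target and its aligned pin), and the function name and coordinates are hypothetical.

```python
import math


def estimate_egocenter(sight_lines):
    """Least-squares intersection of 2-D lines.

    Each line is ((x_target, y_target), (x_pin, y_pin)). The returned
    point minimizes the summed squared perpendicular distance to all
    lines. Assumption: measurements lie in one horizontal plane.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, y1), (x2, y2) in sight_lines:
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            raise ValueError("target and pin must be distinct points")
        dx, dy = dx / norm, dy / norm
        # M = I - d d^T keeps only the component perpendicular to the line.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * x1 + m12 * y1
        b2 += m12 * x1 + m22 * y1
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    # Solve the symmetric 2x2 system by Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Made-up measurements: three target/pin pairs whose lines all pass
# through the point (0, -4) in arbitrary units.
lines = [((30.0, 50.0), (15.0, 23.0)),
         ((-30.0, 50.0), (-15.0, 23.0)),
         ((0.0, 60.0), (0.0, 28.0))]
egocenter = estimate_egocenter(lines)
```

With noisy real measurements the lines would not meet exactly, which is why a least-squares estimate rather than a pairwise intersection is used in this sketch.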

Thus determined, the visual egocenter is found to be, first of all, single (not a foregone conclusion given that we have two eyes), and it turns out *not* to be located, as one might suppose, at the midpoint between the centers of rotation of the two eyes. Rather, it lies deeper inside the head, in the midsagittal plane, some 4–5 cm behind the bridge of the nose (see left panel, **Figure 4**). This empirically determined location inside the head from which we look out upon the world along straight and uninterrupted lines of sight is of course surrounded on all sides by biological tissues. Here lies the ultimate conundrum of phenomenal sensory awareness, the Achilles heel of its secret, in fact. How is it possible to have unobstructed lines of sight into the world from a place inside our heads that is surrounded on all sides by opaque tissues?

<sup>5</sup>It appears that the integrative role of the periaqueductal gray in vocal behavior [for which see review by Gruber-Dujardin (2010)] in fact resembles the role here proposed for the colliculus in orienting behavior.

<sup>6</sup>Visual experience is panoramic: no one has ever experienced that mainstay of philosophical discussions of perception, the "red of a tomato," in itself and as such, but always only in a particular location with a visual surround, typically rich in other objects arrayed around it and all of them together around us. It is an egregious error to imagine that the problem of perception can be approached by "starting simple" to build complexity from elementary sensations (the tortuous nature of William James' attempt to do so is a case in point; James, 1890, pp. 145–166).

©Nevit Dilmen found at Wikimedia commons, released under a Creative Commons Attribution-Share Alike 3.0 Unported license). **Right panel:** A monocular view from the visual egocenter, rendered by Ernst Mach through his left eye (Mach, 1897, Figure 1, p. 16). The dark fringe of Mach's eyebrow appears beneath the shading in the upper part of the figure, the edge of his moustache at the bottom, and the silhouette of his nose at the right-hand edge of the drawing. These close-range details framing his view are available to our visual experience, particularly with one eye closed, though not as crisply defined as in this drawing. In a full cyclopean view with both eyes open the scene is framed by an ovoid within which these proximal details typically disappear from view. Apparently Mach was a smoker, as indicated by the cigarette extending forward beneath his nose. Digitally retouched version of Mach's drawing reproduced courtesy of Wikimedia (http://commons.wikimedia.org/wiki/File:Ernst\_Mach\_Innenperspektive.png). Note the apparent impossibility of having an unobstructed view of a scene from the empirically determined point marked on the image on the left, a point which is surrounded on all sides by biological tissues (see further the text).

The short answer is that our experienced head is the head of the neural reality model (see **Figure 1**, rastered head), for which arrangements are possible that are not realizable in the physical head itself. For details, see my previous publications (Merker, 2012, pp. 53, 55 and 2013, pp. 26–27). Here, we are concerned, rather, with what it is that occupies this enigmatic location at the origin of the line of sight.

Typically our line of sight is deployed to a distal object of interest, but let us reverse the direction of our interest by "moving backwards" along a line of sight toward its proximal origin. We will then traverse a succession of environmental locations ever closer to ourselves, to arrive in the vicinity of our eyes. At these close quarters we may espy the shadowy presence of the edge of our orbit in peripheral vision, particularly if, as in **Figure 4**, we follow Ernst Mach's example, and close one eye. Then, as we try to proceed all the way to the origin of the line we have followed, an origin we know to be located inside our head, we are suddenly at a loss for any determinate content of consciousness whatsoever that might inform us about the nature of that which occupies the origin of the line we have followed backwards. Disappointed, but not defeated, we press on, and continue progress along the extension of the line of sight through the troublesome lacuna we landed in, to have our focus arrive in short order at the back of our head.

We are then free to continue our imaginary journey out into the world behind our head. There is, however, no need to do so, because the answer to the question of what occupies the origin of the line of sight is already at hand. For every step away from the troublesome lacuna, even to a distance as short as to the back of our head, the points along the line we are tracing are ever more distant from the place from which we are conducting the exercise. We are, in other words, increasing the distance between our targets and ourselves, in a reverse motion from the one that brought us to the lacuna. What occupies the lacuna, then, can be nothing other than we ourselves. The place from which one is looking or attending is occupied—necessarily, unsurprisingly, and tautologically—by oneself.

This "oneself," the self thus located through the above first person exercise, is not and cannot be a self-image of any kind. It defines the viewpoint from which any and all images are viewed or, equivalently, is the origin of all lines of sight (and "lines of attention," since the exercise was conducted by covert attention). It is the one location that is forever beyond the reach of any directed attention or perception, because it is the point from which attention is directed and relative to which percepts are located in the space whose origin it defines.

This helps explain the utter blank one draws in attempting to take the last step along the line of sight back to its origin. That location is excluded from the contents of consciousness by the same geometric necessity that prevents an eye from viewing itself, though it is the instrument for viewing all else (Schopenhauer, 1844/1958, vol. 2, p. 491; see also Baars, 1988, pp. 327ff for "contextual" aspects of consciousness). This is what David Hume failed to realize when he "searched his mind" for a self and found only perceptions and bundles of perceptions (Hume, 1739/1888). The self he was looking for is the place from which he was looking.

The first person exercise we have just conducted yields a minimal definition of the self as the perceptual egocenter of sensory consciousness and, by extension, of all awareness. It defines a location with respect to which any and all conscious percepts can be uniquely localized in space by direction and distance relative to that point. Some of these percepts are located inside our skin (say, a stirring of joy in our breast or a headache), yet they are still perceived relative to that self-same egocenter. Its location inside the head just behind the eyes (a convenience for the control of orienting movements, as we have seen) is in good agreement with our intuitive sense of "where we (and others) are located" as recently determined empirically by a third-person procedure. Both children and adults assign that location to the vicinity of the eyes (Starmans and Bloom, 2012).
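Localization "by direction and distance relative to that point" amounts to expressing each percept in egocentric spherical coordinates. The following is a minimal geometric sketch only; the axis convention (x to the right, y straight ahead, z up), the function name, and the example positions are assumptions made for illustration.

```python
import math


def egocentric_coordinates(egocenter, percept):
    """Return (azimuth_deg, elevation_deg, distance) of a percept.

    Assumed convention: x points right, y straight ahead, z up.
    Azimuth is 0 straight ahead, positive to the right, and 180
    directly behind; elevation is positive upward.
    """
    dx = percept[0] - egocenter[0]
    dy = percept[1] - egocenter[1]
    dz = percept[2] - egocenter[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        # The origin itself has no direction: it is the "here", not a "there".
        raise ValueError("the egocenter cannot be localized relative to itself")
    azimuth = math.degrees(math.atan2(dx, dy))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation, distance


origin = (0.0, 0.0, 0.0)
ahead = egocentric_coordinates(origin, (0.0, 2.0, 0.0))
behind = egocentric_coordinates(origin, (0.0, -1.0, 0.0))
right = egocentric_coordinates(origin, (1.0, 0.0, 0.0))
```

Note that the only point for which the function has no answer is the origin itself, which mirrors the argument that the egocenter cannot appear as a located content within the space it defines.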

For present purposes, it matters little whether that assignment draws on first person intuitive conclusions along the lines of our exercise above, or on the sense that the lively play of a person's eyes bears more immediate and direct witness to their interests and intentions, and hence to their self, than do other visible behaviors. Perhaps it is a combination of both, because the two are intimately related. When, for purposes of the above exercise, we moved attention along our line of sight we were doing no more than making deliberate use of the routine functional role of our perceptual vantage point (egocenter) in directional movements of attention and gaze. It is only in relation to the perceptual egocenter that the size and direction of the angular displacement required of a given gaze or attentional movement are defined. As the implicit reference of all such movements it is the central functional pivot from which they issue, not as motor instructions for a particular combination of eye, head and trunk movements, but rather as locational pointers to targets in egocentric space to be attained by the very next orienting or attention movement.

But that is reminiscent of the function attributed to the superior colliculus in the previous section. Might this midbrain structure in fact—as first suggested in the vaguest of terms by Denny-Brown—serve as the physiological "ego" or self in the minimal sense just outlined? The exercise which led us to this possibility provides an initial plausibility check on whether it might do so. That exercise was conducted by directing attention alone, without eye or head movements, forwards and backwards from the egocenter lacuna, i.e., by *covert* attention in a full 360 degree egocentric space. The involvement of the superior colliculus in covert spatial attention is well established (Robinson and Kertzman, 1995; Cavanaugh and Wurtz, 2004; Ignashchenkova et al., 2004; Muller et al., 2005; Fecteau and Munoz, 2006; Lovejoy and Krauzlis, 2010; Schneider, 2011). Does it also host a full 360 degree directional compass, without which it could not have allowed us to move covert attention to the back of our head, and without which it cannot serve as central pivot or origin of a fully functional multimodal and egocentrically organized localization system (see **Figure 1**, and below)?

When animals are free to move their head, collicular stimulation at increasingly caudal levels evokes increasingly extensive gaze excursions beyond the oculomotor range by recruiting ever larger head movements into the orienting response (Faulkner and Hyde, 1958; Westheimer and Blair, 1975; Roucoux et al., 1980; King et al., 1991; Grantyn et al., 1992; Freedman et al., 1996; Sparks, 1999; Corneil et al., 2002; Isa and Sasaki, 2002; see also Guitton and Volle, 1987). For natural orienting movements into the space behind the animal, head turns by means of cumulative rotation across increasingly caudal cervical vertebrae (Richmond et al., 1992) are supplemented by trunk movements (Hassler and Hess, 1954). The same recruitment of eyes, head and trunk by collicular stimulation is true of non-mammals (Herrero et al., 1998; Saitoh et al., 2007). Since, as already noted, the details of movement execution are left to brainstem structures downstream of the colliculus, the colliculus itself appears to implement a space of pure locational specification for the entire egocentric surround<sup>7</sup>.

With a full collicular complement of spatial directionality, the path is cleared for the possibility that this midbrain structure in fact occupies the position in the neural machinery of the brain that gives us our position as first person inhabitants of an egocentrically organized space of phenomenal sensory awareness, while it itself lies outside the compass of phenomenal awareness. As already detailed, though that position is the defining feature of such a space, it cannot itself appear as a phenomenal content within it. In fact, all phenomenal contents, as we have seen, are separate from it, because that location defines the ultimate unobservable "here" with respect to which they are located "there." If the superior colliculus in fact implements the directional pivot—an omnidirectional non-phenomenal "here" for all phenomenal "theres"—how and where are those phenomenal contents implemented, and how is the colliculus related to that larger arrangement of which it must, on this interpretation, form an integral part?

## **TETHERING PHENOMENAL SPACE TO ITS NON-PHENOMENAL DIRECTIONAL PIVOT**

In view of all that has gone before, only two possibilities remain: the space within which sensory information achieves conscious status, i.e., phenomenal space, is implemented either within the colliculus itself or among the targets of its ascending projections. Regarding the first alternative, the multimodal laminar colliculus features every modality on which animals rely for their phasic sensory orienting. This includes exotic ones in some species, such as infrared (Hartline et al., 1978), electroceptive (Bastian, 1982), magnetic (Nemec et al., 2001), and echolocation senses (Valentine and Moss, 1997). These modal maps, layered cortex-like through the collicular/tectal depth dimension (see **Figure 3**), share the collicular efferent premotor functional framework in its tangential dimension. In the collicular output layers its multiple modalities converge onto single collicular neurons with cortically dependent multimodal properties (Meredith et al., 1992; Wallace and Stein, 1994). Moreover, collicular neuron numbers would seem to suffice for implementation of a comprehensive multisensory phenomenal space. A total (bilateral) neuron count of almost 2 million for the macaque superior colliculus (Théoret et al., 2001) can be extrapolated to about 5 million for the human. This, according to the rough estimate provided in an earlier section, should suffice for the purpose.

There are good reasons, nevertheless, to discount the colliculus as a serious contender for the honor of hosting our phenomenal sensory consciousness. The phenomenal world we inhabit is not only crowded with intricate pattern detail, but brightly colored and exquisitely articulated in its depth dimension both in terms of global spatial relations and solid object shapes. The neural operations of the superior colliculus, on the other hand, seem concerned primarily with locational matters, to the exclusion of much of this intricate and gaudy finery (but see Rizzolatti et al., 1980). Thus macaque collicular single units dispense with the orientation and directional specificity carried by axons of its visual cortical afference, presumably by convergence of multiple differently tuned cortical afferents onto single collicular units, rendering them broadly tuned or untuned (Finlay et al., 1976).

<sup>7</sup>The primitive position of the eyes in vertebrates is lateral, on the sides of the head, a placement exhibited by most non-mammalian and many mammalian species. Such animals have visual fields that essentially cover their full surround. The colliculus has no "reason" to contract its full field sensorimotor organization in the minority of species whose eyes have migrated to a frontal position. With frontally placed eyes, head movements are required to move the visual field beyond the oculomotor range. By leaving collicular full-field organization intact, head movements can be collicularly triggered now as before by, say, somatosensory or nociceptive stimuli to body parts beyond the reach of vision, or sound sources localized in the rear sector of space (see also footnote 3 and references therein).

Regarding the color *selectivity* which is an absolute requirement for implementing human phenomenal contents, the colliculus appears to lack it. Its direct retinal afference proceeds from broad-band retinal ganglion cells, and the indirect pathway to the colliculus via the lateral geniculate and primary visual cortex appears likewise to be a broadband, magnocellular pathway lacking color selectivity (Schiller et al., 1979). That does not mean that collicular units lack color *sensitivity*, however: they respond vigorously to stimuli defined by isoluminant color patches alone, but they do so without discriminating stimulus wavelength (White et al., 2009). This color-based information appears to arrive at the colliculus from extrastriate sources, again presumably by convergence of color tuned units. This allows the colliculus to respond to colored stimuli without representing their hue. Such an arrangement fits well with its localizing function but badly with a venue hosting multi-colored phenomenal space.

Regarding three-dimensional depth, finally, the situation is less clear. It hinges on the thorny issue of whether or not collicular output is a purely directional ("cyclopean") signal, or includes a vergence and torsional signal for the alignment of the two eyes (van Opstal et al., 1991; Chaturvedi and Van Gisbergen, 1999; Walton and Mays, 2003; Busettini and Mays, 2005; Waitzman et al., 2008; Pérez Zapata et al., 2013). The parietal gaze area transmits disparity information to the superior colliculus (Gnadt and Beyer, 1998), yet collicular disparity sensitive units are broadly tuned (Berman et al., 1975; Dias et al., 1991; Bacon et al., 1998), perhaps again reflecting collicular pooling of cortical specificities. These units have been found in the rostral colliculus, where fixation units are also found, so possibly they play a role in fixation behavior. In view of the negative evidence reported for torsion by van Opstal and colleagues and for vergence by Walton and Mays (cited above), there would not seem to be a strong general case for a collicular "third dimension."

Taken together, these several strands of evidence regarding collicular single unit properties weigh against a collicular locus for full, ordinary phenomenal sensory experience. The process of elimination therefore leaves only targets of ascending collicular projections to consider as possible candidate sites for colliculophenomenal interaction. Recall that the search is for a subcortical target of cortical layer V projections capable of relieving the cortex of the need to precipitate a global best estimate of sensory circumstances within cortical probabilistic operations themselves. That target has now been further specified "from below" as a target of ascending collicular projections, and these are concentrated in the thalamus (see **Figure 2** and the text it illustrates).

Two further requirements must be fulfilled for a structure to serve the cortex as its global best estimate buffer. It must be reciprocally connected with a broad range of cortical areas occupying the higher levels of the several cortical sensory hierarchies, and must contain the intrinsic circuitry needed to conduct swift multiple constraint satisfaction operations over these cortical afferents in the span of the few hundred milliseconds available between gaze shifts (Rayner, 1998; see Merker, 2012, p. 56 for details). The constraint satisfaction operation accordingly must be conducted in parallel fashion (cf. Mezard and Mora, 2009) through interactions beyond strictly local ones in the candidate structure. Generally, however, the thalamus is conspicuously lacking in intrinsic connectivity within or between its subdivisions (e.g., Trojanowski and Jacobson, 1975; Ogren and Hendrickson, 1977). It therefore lacks a crucial anatomical requirement for implementing the needed constraint satisfaction operation. There is, however, one notable exception to this generalization.

The dorsal pulvinar of the higher-order thalamus is a multimodal region connected with high level posterior parietal and temporal areas of both streams of the visual system, with auditory association cortex and multimodal cortical areas, as well as with parahippocampal, prefrontal (including frontal eye fields), orbitofrontal, and insular cortices (Yeterian and Pandya, 1991; Gutierrez et al., 2000; Imura and Rockland, 2006; Kaas and Lyon, 2007; see also Cappe et al., 2009). The caudal reaches of this dorsal pulvinar territory are invested with a unique population of long range inhibitory interneurons (Imura and Rockland, 2006). Their axons branch widely across the many intricately interdigitated slabs or discs by which cortical areas are represented there (e.g., Asanuma et al., 1985; Hardy and Lynch, 1992). Though connective detail is as yet lacking, these axons, being inhibitory, can hardly avoid establishing competitive linkages and bridges across these interdigitated slabs. The reach of these inhibitory interneurons within the dorsal pulvinar is extra-local but *less than global* (see insets in Figures 5, 6, and 8 of Imura and Rockland, 2006). Unlikely, therefore, to operate as a winner-take-all decision mechanism, this inhibitory cross-connectivity may instead constitute a powerful means of swift multiple constraint satisfaction over the interdigitated mosaic of the cortical areas represented there (see also Imura and Rockland, 2007).
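The contrast drawn above between extra-local inhibition and a global winner-take-all mechanism can be made concrete with a deliberately crude sketch. Everything in it is hypothetical: the ring of twelve units, the evidence values, and the one-shot rule by which a unit is silenced by any stronger unit within its inhibitory reach. It does not implement constraint satisfaction as such. It shows only that when inhibitory reach is less than global, several locally strongest units can remain active side by side, whereas global reach leaves a single winner.

```python
def surviving_units(evidence, radius):
    """Indices of units that are strongest within `radius` positions on a ring.

    Assumed toy rule: a unit is silenced by any stronger unit inside its
    inhibitory reach; ties are resolved in favour of the lower index.
    """
    n = len(evidence)
    survivors = []
    for i in range(n):
        beaten = False
        for offset in range(-radius, radius + 1):
            j = (i + offset) % n
            if j == i:
                continue
            stronger = evidence[j] > evidence[i]
            tie_lost = evidence[j] == evidence[i] and j < i
            if stronger or tie_lost:
                beaten = True
                break
        if not beaten:
            survivors.append(i)
    return survivors

# Hypothetical evidence carried by twelve units arranged on a ring.
evidence = [0.2, 0.9, 0.4, 0.1, 0.3, 0.7, 0.5, 0.2, 0.1, 0.8, 0.6, 0.3]

# Extra-local but less than global reach: inhibition spans two neighbours each way.
local_survivors = surviving_units(evidence, radius=2)

# Global reach: every unit inhibits every other unit.
global_survivors = surviving_units(evidence, radius=len(evidence) // 2)

print("limited reach:", local_survivors)
print("global reach: ", global_survivors)
```

With limited reach, three separated units survive together; with global reach, only the single strongest unit does, which is the winner-take-all outcome the text considers unlikely for the dorsal pulvinar.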

This is also the part of the pulvinar that features neurons that combine selectivities of both the dorsal and ventral streams of the visual system in single neurons (Benevento and Port, 1995), that show *more selectivity for stimulus awareness than cortical visual areas assessed with the same method* (Wilke et al., 2009; see also Padmala et al., 2010), that correlate with confidence in sensory judgments (Komura et al., 2013), that reflect intentional rather than routine movements (Acuña et al., 1983), and whose reversible inactivation disrupts selection of action plans (Wilke et al., 2010). The powerful influence of pulvinar activity over the visual responsiveness of even V1 neurons is also worth noting (Purushothaman et al., 2012), as is the longstanding association of the pulvinar with sensory attention and neglect (Petersen et al., 1987; Karnath et al., 2002; Rushmore et al., 2006; Saalmann et al., 2012). Though it does not, of course, prove it, all of this fits well with the conjecture that the dorsal pulvinar implements the brain's global best estimate of sensory circumstances in temporary buffer fashion (further circumstantial evidence bearing on this identification is available in Merker, 2012, pp. 63–69).

Proceeding, then, on the working hypothesis that the dorsal pulvinar in fact performs this best estimate buffer function, it remains to consider how the "first person" might enter its operations. In the preceding section, this inherent aspect of sensory consciousness was found to be implicated in the directional function of covert and overt orienting by defining its implicit (non-phenomenal) spatial origin. This suggested the collicular priority gate, with its omni-directional orienting system, as a candidate implementing structure. It can be related to the dorsal pulvinar via the connectivity depicted in **Figure 2**, by noting that the principal elements and connections of Stewart Shipp's proposed functional anatomy of the brain's attention system lie embedded in that connectivity (see Shipp, 2003, 2004). In his scheme, the *ventral* pulvinar fills the role of principal "salience map" (Shipp, 2004, Figures 2a,g). However, to fill that role it would need intrinsic circuitry by which to crown a "winner" among alternate bids for target priority among its stacked visual topographies, yet in keeping with thalamic patterns generally, this pulvinar subdivision presumably lacks such circuitry.

The functional logic of Shipp's scheme survives this problem, however, because the requisite circuitry is available in the superior colliculus, as we have seen. The colliculus is an integral part of his scheme, and can therefore substitute in it for the ventral pulvinar as principal "salience map" ("priority gate" in present terms). A collicular rather than ventral pulvinar locus also has the advantage that it generalizes priority selection across all spatial modalities (instead of being confined to vision alone), as it must in order to qualify as a general spatial attention system. Moreover, as "orienting super-hub" the colliculus engages principally when alternative bids from a variety of sources, not least cortical, compete for the location of the target of an orienting or attention movement. On the present account such competition is settled within the collicular circuitry itself, and in its deeper layers in final terms. They are therefore the first site in the brain to "know" which location will be the target of the next saccadic gaze shift *actually to be executed*, and thus ideally situated to convey this decision to the forebrain via their ascending projections to the thalamus.

What is conveyed to the forebrain in this way, then, can be nothing other than the predictive "attention pointers" proposed to prepare forebrain sensory maps for impending gaze shifts peri-saccadically (for which see Wurtz, 2008; Cavanagh et al., 2010; also Hulme et al., 2010; Prime et al., 2011). Given that even *top-down* biasing of covert attentional selection in a distractor task requires an intact superior colliculus (Lovejoy and Krauzlis, 2010), the predictive pointer function presumably is the phasic variant of a more general overt and covert directional orienting signal conveyed to the forebrain from the colliculus via its ascending projections. From there it propagates as a local attentional bias shared by all relevant forebrain maps on account of the topographic matching of their connectivities across telencephalic, diencephalic, and mesencephalic levels, exactly as detailed in the Shipp model of the attention system (Shipp, 2004).

The answer is now at hand to the question of how a "collicular self," construed as a non-phenomenal directional pivot for phenomenal sensory space, might relate to the proposed implementation of that space in the dorsal pulvinar. First, the dorsal pulvinar receives direct projections from the superior colliculus, originating—as they should, according to the above—in the deeper collicular lamina (Benevento and Standage, 1983). Second, all the gaze-related areas in cortex and basal ganglia that receive the collicular signal via the extended intralaminar complex and higher-order thalamus are bound to reflect the play of the collicular attention/orienting pointers in their operations.

The incessant play of these pointers will therefore figure as one of the variables in the massive operation of probabilistic source reconstruction in which the cortex is permanently engaged, both to decipher the immediate sensory situation it faces from moment to moment, and for the cumulative (learned) acquisition of the prior competence with which it meets that challenge. This prior competence will therefore inevitably come to reflect the invariant behind the play of the directional attention/orienting pointers, namely the point of origin with respect to which their directional differences are defined. If the primary function of the dorsal pulvinar is indeed mutual constraint satisfaction across its diverse afferents, then the resulting global best estimate of sensory circumstances it produces will come to incorporate this invariant embedded in its cortical afference, complemented by collicular afference from below. It will figure there as exactly what in fact it is, a tacit perspective point *implicit* in the perspectival organization of the phenomenal contents of the global best estimate sensory buffer, without being present as a phenomenal object in it.
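The notion of an invariant "behind" the directional pointers has a simple geometric counterpart, sketched below under stated assumptions. The two-dimensional layout, the target positions, the hidden origin, and the least-squares estimator are all illustrative choices, and nothing here is offered as the brain's actual computation. The sketch shows only that a set of directional pointers jointly determines their common point of origin, even though that point is never itself among the targets.

```python
import math

def direction(origin, target):
    """Unit vector from origin to target (a toy 'orienting pointer')."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm)

def recover_origin(targets, directions):
    """Least-squares point closest to every line through a target along its pointer."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(targets, directions):
        # Projector onto the normal of the line: I - d d^T
        m11 = 1.0 - dx * dx
        m12 = -dx * dy
        m22 = 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Hypothetical egocenter; the estimator never sees it directly.
true_origin = (0.3, -1.2)

# Hypothetical orienting targets and the pointers directed at them.
targets = [(2.0, 1.0), (-1.5, 3.0), (0.0, -4.0), (4.0, -2.0)]
pointers = [direction(true_origin, t) for t in targets]

estimate = recover_origin(targets, pointers)
print("recovered origin:", estimate)
```

The recovered point coincides with the hidden origin up to floating-point error. In this limited sense, the point of origin is implicit in the pointers without being one of the things pointed at.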

This point, then, which is the point from which we look and feel, is our tacit first person perceptual egocenter or self. It is only the innermost of the similarly extracted invariants behind the clusters of correlated variances which our receptor surfaces present to the brain for disambiguation, and which in their momentary global best estimate form we experience as our body and the world which surrounds it (Merker, 2012, p. 54; see also Philipona et al., 2003, 2004). Because it is a product or derivative of the lively play of collicularly triggered orienting and attention movements, its ultimate anatomical base is the orienting super-hub in the roof of the midbrain. The decision making machinery hypothetically incorporated into the schematic egocenter in my previous publications (see Merker, 2012, pp. 59, 68; Merker, 2013, pp. 19–22, and Figures 1.2 and 1.4 in particular) accordingly is the intrinsic collicular circuitry by which the priority target of the very next orienting or attention movement is settled.

In the scheme proposed here, this ultimate collicular pivot of the mechanism of consciousness lies outside the anatomical structure implementing conscious contents. This provides a felicitous fit not only with the phenomenal inaccessibility of the self that anchors the first person perspective in which alone those contents come to us in consciousness, but also with our lack of conscious access to the continual split-second decision-making by which it expresses itself in the incessant movements of our gaze across its targets.

## **CONCLUSION**

To summarize, the movement of our gaze or attention from a point inside the nested structure of body within world that is our phenomenal sensory space supplies the leading edge of practically all our behavior. Moving from target to target, it precedes our instrumental engagement with the world like the acquisition marker of a laser spotter in a combat zone. The point from which the pointer proceeds is thus not only the tacit perceptual egocenter or self, it is also, and without the need to make additional assumptions, the central pivot of action control. This, then, is the burden of the present bid to naturalize the first person perspective in action control by assigning a role, in the functional economy of the brain's efference cascade, to our tacit sense of occupying a place inside our heads from which we survey our world and direct the movements of our body within it.

## **ACKNOWLEDGMENTS**

My thanks go to Ezequiel Morsella, without whose initiative this paper would never have been written. I am also indebted to John Hidley for our wide-ranging discussions of topics related to the contents of this paper.

## **REFERENCES**


of the medial longitudinal fasciculus and paramedian tracts. *Rev. Neurol. (Paris)* 145, 133–139.


for functional output channels. *Prog. Brain Res.* 75, 27–36. doi: 10.1016/S0079-6123(08)60463-X


stimulation enhances neocortical serotonin release and electrocorticographic activation in the urethane-anesthetized rat. *Brain Res.* 964, 31–41. doi: 10.1016/S0006-8993(02)04062-3


retrograde transport. *Neuroscience* 29, 567–581. doi: 10.1016/0306-4522(89)90131-0


	- Goodale, M. A., and Murison, R. C. C. (1975). The effects of lesions of the superior colliculus on locomotor orientation and the orienting reflex in the rat. *Brain Res.* 88, 243–261. doi: 10.1016/0006-8993(75)90388-1
	- Gorbet, D. J., and Sergio, L. E. (2009). The behavioural consequences of dissociating the spatial directions of eye and arm movements. *Brain Res.* 1284, 77–88. doi: 10.1016/j.brainres.2009.05.057
	- Grantyn, A., Berthoz, A., Hardy, O., and Gourdon, A. (1992). "Contribution of reticulospinal neurons to the dynamic control of head movements, presumed neck bursters," in *The Head–Neck Sensory Motor System*, eds A. Berthoz, P. P. Vidal, and W. Graf (Oxford: Oxford University Press), 318–329.
	- Grofová, I., Ottersen, O. P., and Rinvik, E. (1978). Mesencephalic and diencephalic afferents to the superior colliculus and periaqueductal gray substance demonstrated by retrograde axonal transport of horseradish peroxidase in the cat. *Brain Res.* 146, 205–220. doi: 10.1016/0006-8993(78)90969-1
	- Gruber-Dujardin, E. (2010). "Role of the periaqueductal gray in expressing vocalization," in *Handbook of Mammalian Vocalization. An Integrative Neuroscience Approach*, ed S. M. Brudzynski (London: Academic Press), 313–327.
	- Guillery, R. W. (1995). Anatomical evidence concerning the role of the thalamus in corticocortical communication, a brief review. *J. Anat.* 187, 583–592.
	- Guitton, D., and Volle, M. (1987). Gaze control in humans, eye-head coordination during orienting movements within and beyond the oculomotor range. *J. Neurophysiol.* 58, 427–459.
	- Gutierrez, C., Cola, M. G., Seltzer, B., and Cusick, C. (2000). Neurochemical and connectional organization of the dorsal pulvinar complex in monkeys. *J. Comp. Neurol.* 419, 61–86.
	- etal lobule in the macaque. *Cereb. Cortex* 2, 217–230. doi: 10.1093/cercor/2.3.217
	- Harris, R. L. (1980). The superior colliculus and movements of the head and eyes in cats. *J. Physiol.* 300, 367–391.
	- Corticotectal projections in the cat, Anterograde transport studies of twenty-five cortical areas. *J. Comp. Neurol.* 328, 379–414. doi: 10.1002/cne.903240308
	- Harting, J. K., Feig, S., and Van Lieshout, D. P. (1997). Cortical somatosensory and trigeminal inputs to the cat superior colliculus,
	- 313–326.
	- Differential influence of attention on gaze and head movements. *J. Neurophysiol.* 101, 198–206. doi: 10.1152/jn.90815.2008
	- Hartline, P. H., Kass, L., and Loop, M. S. (1978). Merging of modalities in the optic tectum, Infrared and visual integration in rattlesnakes. *Science* 199, 1225–1229. doi: 10.1126/science.628839
	- Hassler, R., and Hess, W. R. (1954). Experimentelle und anatomische Befunde über die Drehbewegungen und ihre nervösen Apparate. *Archiv für Psychiatrie und Nervenkrankheiten* 192, 488–526. doi: 10.1007/BF00342860
	- Heffner, R., and Masterton, B. (1975). Variation in form of the pyramidal tract and its relationship to digital dexterity. *Brain Behav. Evol.* 12, 161–200. doi: 10.1159/000124401
	- Henkel, C. K., and Edwards, S. B. (1978). The superior colliculus control of pinna movements in the cat, possible anatomical connections. *J. Comp. Neurol.* 182, 736–776. doi: 10.1002/cne.901820502
	- Hering, E. (1879/1942). *Spatial Sense and Movements of the Eye*. Trans. C. A. Radde. Baltimore, MD: American Academy of Optometry (Original work published in 1879).
	- Herrero, L., Rodríguez, F., Salas, C., and Torres, B. (1998). Tail and eye movements evoked by electrical microstimulation of the optic tectum in goldfish. *Exp. Brain Res.* 120, 291–305. doi: 10.1007/s002210050403
	- Hikosaka, O., and Wurtz, R. H. (1983). Visual and oculomotor functions of monkey substantia nigra pars reticulata. IV. Relation of substantia nigra to superior colliculus. *J. Neurophysiol.* 49, 1285–1301.
	- Hikosaka, O., and Wurtz, R. H. (1985).
	- Hikosaka, O., and Wurtz, R. H. (1986). Saccadic eye movements following injection of lidocaine into the superior colliculus. *Exp. Brain Res.* 61, 531–539. doi: 10.1007/BF00237578
	- Hikosaka, O., and Wurtz, R. H. (1989). "The basal ganglia," in *The Neurobiology of Saccadic Eye Movements*, eds R. H. Wurtz and M. E. Goldberg (Amsterdam: Elsevier), 257–281.
	- DC), 448–453.
	- Horn, A. K. E. (2006). The reticular formation. *Prog. Brain Res.* 151, 127–155. doi: 10.1016/S0079-6123(05)51005-7
	- Howard, I. P., and Templeton, W. B. (1966). *Human Spatial Orientation*. New York, NY: Wiley.
	- Huerta, M. F., and Harting, J. K. (1984). "The mammalian superior colliculus, studies of its morphology and connections," in *Comparative Neurology of the Optic Tectum*, ed H. Vanegas (New York, NY: Plenum), 687–773.
	- Huerta, M. F., Krubitzer, L. A., and Kaas, J. H. (1986). Frontal eye field as defined by intracortical microstimulation in squirrel monkeys, owl monkeys, and macaque monkeys, I. Subcortical connections. *J. Comp. Neurol.* 253, 415–439. doi: 10.1002/cne.902530402
	- Hulme, O. J., Whiteley, L., and Shipp, S. (2010). Spatially distributed encoding of covert attentional shifts in human thalamus. *J. Neurophysiol.* 104, 3644–3656. doi: 10.1152/jn.00303.2010
	- Hume, D. (1739/1888). "Of personal identity," in *A Treatise of Human Nature*, ed L. A. Selby-Bigge (Oxford: Clarendon Press), 179–187.
	- Ignashchenkova, A., Dicke, P. W., Haarmeier, T., and Thier, P. (2004). Neuron-specific contribution of the superior colliculus to overt and covert shifts of attention. *Nat. Neurosci.* 7, 56–64. doi: 10.1038/nn1169
	- Imura, K., and Rockland, K. S. (2006). Long-range interneurons within the medial pulvinar nucleus of macaque monkeys. *J. Comp. Neurol.* 498, 649–666. doi: 10.1002/cne.21085
	- Imura, K., and Rockland, K. S. (2007). Giant neurons in the macaque pulvinar, a distinct relay subpopulation. *Front. Neuroanat.* 1:2. doi: 10.3389/neuro.05.002.2007
	- Isa, T., and Hall, W. C. (2009). Exploring the superior colliculus *in vitro*. *J. Neurophysiol.* 102, 2581–2593. doi: 10.1152/jn.00498.2009
	- Isa, T., and Sasaki, S. (2002). Brainstem control of head movements during orienting; organization of the premotor circuits. *Prog. Neurobiol.* 66, 205–241. doi: 10.1016/S0301-0082(02)00006-0
	- James, W. (1890). *Principles of Psychology*, Vol. 2. London: Macmillan. Available online at: http://books.google.se/books/about/The\_Principles\_of\_Psychology.html?id=nPFIy6WBgPYCandredir\_esc=y


circuitry. *Brain Behav. Evol.* 40, 98–111. doi: 10.1159/000113906


*Implications for the Biology of Consciousness*. Unpublished manuscript available at: http://cogprints.org/179/1/COGCONSC.TXT


I. Morphological classification of efferent neurons. *J. Neurophysiol.* 60, 232–262.


made in infancy. *Brain Res.* 145, 410–414. doi: 10.1016/0006-8993(78)90878-8


*Philos. Trans. R. Soc. Lond. B Biol. Sci.* 358, 1605–1624. doi: 10.1098/rstb.2002.1213


colliculus in visually guided behavior. *Exp. Neurol.* 11, 115–146. doi: 10.1016/0014-4886(65)90026-9


*Philos. Trans. R. Soc. Lond. B Biol. Sci.* 357, 1781–1791. doi: 10.1098/rstb.2002.1163


*Brain Sci.* 14, 702–719. doi: 10.1017/S0140525X00072150


properties. *Exp. Brain Res.* 81, 626–638. doi: 10.1007/BF02423513


sulcus in rhesus monkeys. *Exp. Brain Res.* 83, 268–284. doi: 10.1007/BF00231152


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 June 2013; paper pending published: 13 June 2013; accepted: 16 July 2013; published online: 09 August 2013.*

*Citation: Merker B (2013) The efference cascade, consciousness, and its self: naturalizing the first person pivot of action control. Front. Psychol. 4:501. doi: 10.3389/fpsyg.2013.00501*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Merker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The role of locomotion in psychological development

#### *David I. Anderson<sup>1</sup>\*, Joseph J. Campos<sup>2</sup>, David C. Witherington<sup>3</sup>, Audun Dahl<sup>2</sup>, Monica Rivera<sup>4</sup>, Minxuan He<sup>2</sup>, Ichiro Uchiyama<sup>5</sup> and Marianne Barbu-Roth<sup>6</sup>*

*<sup>1</sup> Department of Kinesiology, San Francisco State University, San Francisco, CA, USA*


*<sup>6</sup> Laboratoire Psychologie de la Perception, Université Paris Descartes – Centre National de la Recherche Scientifique, Paris, France*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

*T. Andrew Poehlman, Southern Methodist University, USA*

#### *\*Correspondence:*

*David I. Anderson, Department of Kinesiology, San Francisco State University, 1600 Holloway Ave., San Francisco, CA 94132-4161, USA e-mail: danders@sfsu.edu*

The psychological revolution that follows the onset of independent locomotion in the latter half of the infant's first year provides one of the best illustrations of the intimate connection between action and psychological processes. In this paper, we document some of the dramatic changes in perception-action coupling, spatial cognition, memory, and social and emotional development that follow the acquisition of independent locomotion. We highlight the range of converging research operations that have been used to examine the relation between locomotor experience and psychological development, and we describe recent attempts to uncover the processes that underlie this relation. Finally, we address three important questions about the relation that have received scant attention in the research literature. These questions include: (1) What changes in the brain occur when infants acquire experience with locomotion? (2) What role does locomotion play in the maintenance of psychological function? (3) What implications do motor disabilities have for psychological development? Seeking the answers to these questions can provide rich insights into the relation between action and psychological processes and the general processes that underlie human development.

#### **Keywords: action, brain, cognition, crawling, locomotion, infancy, psychological development**

## **INTRODUCTION**

Locomotion is one of the most thoroughly studied behaviors in the animal kingdom. It has captivated the interest of engineers, ethologists, biologists, neurologists, clinicians, psychologists, and even philosophers. Most of the scientific interest in locomotion has centered on how it evolved, how it develops, how it is controlled, and how it can be rehabilitated following injury or disability. However, several theorists, from various epistemological traditions, have pondered whether locomotion makes a broader contribution to human life beyond its obvious role in moving from one place to another. For example, Mahler, a psychoanalyst, has stated that the onset of voluntary locomotion represents the "psychological birth" of the human infant (Mahler et al., 1975). Piaget (1952, 1954) argued that the origins of intelligence were in the intercoordination of sensory information with self-produced movements, including locomotion, and Gibson (1966, 1979) similarly stressed the importance of actions like locomotion for revealing meaningful information in the world.

Given the centrality of locomotion in such a diverse range of theoretical viewpoints, one might assume that the psychological correlates and consequences of the development of self-produced locomotion would be thoroughly understood. This is distinctly not the case. Only recently have the psychological consequences of self-produced locomotion been subjected to systematic empirical study (see Anderson et al., 2013 and Campos et al., 2000 for reviews). Researchers have shown that the onset of independent locomotion is indeed a pivotal event in the life of the human infant, heralding surprisingly broadscale changes in a variety of psychological functions, including perceptual-motor coordination, spatial cognition, memory, and social and emotional processes. Moreover, evidence reveals that locomotion is not merely a maturational antecedent to these psychological changes, but instead plays a causal role in their genesis (e.g., Uchiyama et al., 2008). Researchers have also begun to unravel the processes by which locomotion has its effects on psychological development, providing important insights into the mechanisms that underlie developmental change (e.g., Dahl et al., 2013).

The primary objective of the current paper is to describe a sample of the research linking locomotion to psychological development, highlighting the range of converging research operations (including variations of the classic *enrichment* and *deprivation* paradigms in animal studies) that have been used to isolate locomotion as a central contributor to these changes. A secondary objective is to highlight recent attempts to unravel the processes by which locomotion has its effect on psychological development. A final objective is to pose three questions to guide future research in this still relatively nascent, and often underappreciated, field of study. Before tackling these objectives, we will briefly address why empirical study of the psychological consequences of self-produced locomotion was neglected for so long. Placing the issue in historical context helps to show how the study of the psychological consequences of locomotor experience has challenged some of the core assumptions in developmental psychology. Pursuing the research agenda we outline in this paper can provide valuable insights not only into the processes that underlie developmental change but also into the broader linkage between action and psychological processes.

## **WHY HAVE THE PSYCHOLOGICAL CONSEQUENCES OF SELF-PRODUCED LOCOMOTION BEEN NEGLECTED?**

Although many theoretical traditions have highlighted the centrality of locomotion in human life, strong biases have existed in biology and psychology for much of the nineteenth and twentieth centuries against the notion that motoric activity plays a role in psychological processes or human development. Two factors have been particularly important in perpetuating this bias. First, a series of experiments in the 1930s failed to confirm that advanced motor development during infancy predicted advanced intellectual functioning later in life (Kopp, 1979), leading many psychologists to assume that motor activity was unimportant for psychological functioning. In hindsight, this line of research was ill conceived, posing questions that were too broad to be tested meaningfully and assuming that motor and intellectual development must be connected via a singular individual difference variable, like genetic integrity, that influenced both similarly. In addition, researchers failed to assess the domains of psychological function that were most likely to be affected by motor activity (ignoring the specificity principle, which states that each developmental change results from specific experiences in a specific context), and they also failed to consider that the role played by motor activity in psychological development might be easier to ascertain during developmental transitions when large and rapid changes occur simultaneously in motor and psychological functioning (Bertenthal and Campos, 1990).

The second factor perpetuating a bias against a role for motor activity, and by extension locomotion, in psychological development has been the domination of unidirectional models in psychological science and biological development. The two models that dominated psychological science for much of the twentieth century were the stimulus-response model and the information processing model. Both assumed that behavior was simply the end product of a chain of events that started with the reception of stimulation from the environment and ended with some type of action. Moreover, behaviorists were not concerned with psychological processes. Though cognitive processing intervened in the information processing model, adherents to that model were far more interested in those cognitive processes than in the *less interesting* behavioral output, and they did not consider that action might reciprocally influence cognition and perception. In short, action was not considered relevant to the ontology of cognition; it was merely the output of processes that make use of cognition (Allen and Bickhard, 2013), and whether the information for perception was self-generated or externally generated was irrelevant.

Similarly, in biology, the dominant model during most of the nineteenth and twentieth centuries was a nativist one that stressed the linear unfolding of a genetic blueprint. Genetic activity led to structural maturation, which in turn led to function, activity, and experience (Gottlieb, 2007). Again, adherents to this model did not consider that the relations between these different levels of analysis might be bi-directional. Even the empiricists (psychologists in this case), who trumpeted the importance of experience in human development, viewed development in linear terms, assuming that the environment exerted its effect on an essentially passive organism.

Nativism continues to hold sway amongst contemporary developmentalists (e.g., Spelke and Newport, 1998; Spelke and Kinzler, 2009), further perpetuating the bias against locomotion playing much of a role in psychological development. The preoccupation with documenting the origins of psychological phenomena has led to confusion between what have been labeled *partial accomplishments* (Haith and Benson, 1998; Campos et al., 2000), the precursors to mature skills, and the mature skills themselves. The confusion in turn has minimized the importance of experience, particularly self-generated experience, in orchestrating qualitative reorganizations in behavior during postnatal development and short-circuited the analysis of the processes by which the substrates of skilled behavior, i.e., the partial accomplishments, are elaborated, differentiated, and inter-coordinated into full-blown skills (Campos et al., 2008; Kagan, 2008; Spencer et al., 2009).

## **WHY HAS THE BIAS AGAINST LOCOMOTION BEGUN TO CHANGE?**

The emergence and spread of bidirectional models in biology and psychology during the latter half of the twentieth century have led to greater acceptance of the idea that actions like locomotion might have consequences for psychological development. For example, dynamical systems theory and its close cousin ecological psychology stress the reciprocity between perception, action, and cognition, and view development as the result of a complex, contingent, and multi-determined web of interactions that emerge over time (Gibson, 1988; Thelen and Smith, 1994; Witherington, 2007, 2011). Similarly, Gottlieb's (e.g., 1970, 1991, 2007) notion of probabilistic epigenesis has provided a strong challenge to the unidirectional model of human development by highlighting the diversity of co-actions (reciprocal interactions that can literally change the interacting elements) that occur across the genetic, structural, and functional (environmental) levels of analysis during pre- and post-natal development. Probabilistic epigenesis states that development is a function of time-based, probabilistic relations between these different levels of analysis. Bidirectional models highlight the activity-dependent nature of structural and functional development and give experience an essential role in the developmental process.

Two aspects of probabilistic epigenesis are especially important to the empirical work linking self-produced locomotion to psychological development. The first is the idea that one developmental acquisition, like crawling, can generate experiences that bring about a host of new developmental changes in the same and different domains. These changes in turn create still other developments in a cascading cycle throughout the lifespan. From this perspective, individuals contribute to their own development by creating the experiences that drive developmental change. The second important aspect is the notion that experience does not have a singular effect on development; it can *induce* changes that are completely dependent on those experiences for their emergence, it can *facilitate* changes that would take place without such experiences, only more slowly, and it can *maintain* changes that have already taken place. Development is probabilistic because there is typically more than one ontogenetic pathway—although one of the many pathways (e.g., locomotor experience) may be the ordinary and expectable one. This line of thinking is clearly antithetical to the traditional unidirectional account of development in which developmental change is seen simply as the maturational unfolding of a genetic blueprint.

## **WHAT IS SPECIAL ABOUT LOCOMOTOR EXPERIENCE?**

Throughout the first year of life, infants gain control over an increasingly broad range of motor skills in a predictable sequence. Each new skill presents new opportunities to engage the world and exert a degree of control over it. What makes the acquisition of crawling, typically the first locomotor skill, so impactful is that it dramatically changes the relation between the infant and her environment. No longer at the mercy of others for movement from one place to another, the infant now has an explosion of new goals to choose from and problems to solve. She can explore the environment and operate on it at will (Gibson, 1988). Exploration, in turn, provides new perspectives, reveals new information, and creates many novel experiences that can drive changes in a family of different psychological phenomena.

The breadth of these phenomena stems from the breadth of experiences that accompany locomotion. Moreover, these experiences do not simply represent "more of the same" because the experiences of the crawling infant are fundamentally different from those of the pre-crawling infant. Locomotion orchestrates this diversity of changes by making it almost inevitable that infants will encounter the experiences that contribute to specific psychological changes. The acquisition of independent locomotion is not only significant because of the breadth of psychological phenomena to which it is connected. Its enduring significance stems from the fact that once locomotion has been acquired it is available across the lifespan and so it may well be vital to the maintenance of the very psychological skills it had a role in bringing about. We will return to this point after first considering the role that locomotor experience plays in the ontogeny of two important phenomena: wariness of heights and the search for hidden objects.

## **LOCOMOTOR EXPERIENCE AND THE EMERGENCE OF WARINESS OF HEIGHTS**

Wariness of heights is extraordinarily biologically adaptive, functioning to avoid falls that can maim, kill, and prevent reproduction of a person's genes. Indeed, Bowlby (1973) classified the fear of heights as one of the most salient "natural clues to danger." Similarly, Gibson and Walk (1960) concluded that avoidance of dropoffs is evident in non-human animals and human infants at the first testing opportunity. Scarr and Salapatek (1970) described it as one of the two strongest fears observed in infants. It remains powerful even into adulthood, as is evident in the reactions of visitors to the transparent platform extending over the edge of the Grand Canyon ("The Grand Canyon's skywalk," 2007), the Sears Tower, or a Shanghai skyscraper. It is no wonder that wariness of heights is considered under strong maturational control (Gleitman et al., 2007).

However, wariness of heights presents an enigma; it is not under maturational control, nor is it present at the earliest testing opportunity or when the threat of falling first materializes. Experience with locomotion seems to be a powerful factor in the onset of wariness of heights. Mothers notice two interesting phenomena related to dropoffs. First, there is a period after the onset of crawling when their infants would plunge over the edge of a bed, off the top of a changing table, or even off the top of a staircase if the mother were not extremely vigilant. Second, within 2–4 weeks of crawling onset, infants will avoid dropoffs. These maternal reports are highly consistent (Campos et al., 1978).

Laboratory experiments using a visual cliff confirm maternal reports. The visual cliff is a large table with a Plexiglas surface. Illuminated tiles immediately beneath the Plexiglas surface on the *shallow* side of the cliff give the impression of a solid surface, whereas the tiles four feet below the surface on the *deep* side give the compelling impression of a drop-off. Negative reactions to heights can be assessed by a number of indices of wariness, and each of these has been shown to undergo a developmental shift following the onset of locomotion. These indices include (1) changes from cardiac deceleration to acceleration when the infant is lowered to the deep side of the cliff (Campos et al., 1992); (2) initial crossing to the mother on a beeline when she calls the child over the deep side, followed by eventual avoidance (Campos et al., 1978); (3) a shift from an initial absence of facial patterns indicative of distress when infants are lowered to the deep side of the cliff to significant negative facial responses starting at 11 months of age and possibly before (Hiatt et al., 1979); and (4) a change from nonchalance to stiffening of the body and resistance with the arms when an infant is pushed from behind onto the deep side of the cliff. There is thus no doubt that a developmental shift takes place in wariness of heights. The shift is evident across multiple emotional indices and is observed in both real-world and laboratory contexts.

This developmental shift is where the enigma rests: by what process does the infant become wary of heights and how does that process produce a lifelong, biologically adaptive, wariness?

We can rule out the development of depth perception as the crucial factor. Infant depth perception is very well-developed some 2 or 3 months before wariness of heights is expectable (Timney, 1988). Depth perception is sufficiently well-developed at 6 months to allow clear differentiation of distances on the visual cliff. For instance, in a study by Walters (1980), prelocomotor 6-month-olds, when lowered toward the shallow or the deep side of the cliff, and who otherwise show no wariness of heights, extend their arms and hands in preparation for contact with the visually solid shallow side of the cliff, but show no such extension of arms and hands when lowered to the deep side. They quite happily land on their bellies on the deep side.

Falling experiences can also be ruled out as the crucial factor in the shift. The relation between falls and avoidance of heights or risky slopes is weak or non-existent (Walk, 1966; Campos et al., 1978; Adolph, 1997). Social referencing (Sorce et al., 1985) is not likely to play a role in the developmental shift either because it comes online well after the development of wariness of heights. So, the mother's facial, vocal, and gestural expressions cannot serve as unconditioned stimuli that become the basis for the infant learning to fear heights when paired with depth-at-an-edge (Mumme et al., 1996).

Finally, the developmental shift cannot be an artifact of the visual cliff apparatus. The solid glass surface cannot be said to provide a "safe" medium onto which the newly-locomoting infant can descend simply because touching the surface reveals its solidity. Though solid to touch, the transparent surface eventually becomes a source of avoidance with age and experience in longitudinally-tested infants (Campos et al., 1992). Furthermore, the maternal reports on infant near-falls cited above concur with the findings on the cliff, demonstrating ecological validity of findings using the cliff table. Lastly, there are the observations by Adolph (1997) using "risky slopes," without a glass surface, that showed the same functional relation between locomotor experience and avoidance of dropoffs as does work with the visual cliff. The developmental shift found in visual cliff studies is thus robust, replicable, and ecologically valid.

## **A PROPOSED EXPLANATION OF THE ONTOGENY OF WARINESS OF HEIGHTS**

The explanation of the developmental shift toward wariness of heights must involve experience but not classical conditioning (such as to falls); it must involve the discovery of a factor or factors that provide an "affective sting" (i.e., concern relevance, Frijda, 1986) that the experience of depth alone does not provide; it must explain why the fear of heights is often accompanied by the reports of heights being "dizzying;" it must account for the role of locomotor experience in the shift; and it must explain the presence of wariness of heights in the occasional, though rare, prelocomotor infant. What can that factor or set of factors be?

Bertenthal and Campos (1990) proposed an explanation that meets the above criteria. They maintained that visual proprioception plays a crucial role in the onset and maintenance of wariness of heights. Although not widely known, visual proprioception is as fundamental a perceptual process as form, motion, depth, and orientation. Visual proprioception is the optically induced sense of self-movement produced by patterns of optic flow in the environment (Gibson, 1966, 1979). It is best known to most people by the experience, when one is seated stationary on a train or bus, of one's self moving when it is the train or bus on an adjacent track in the visual periphery that is moving. However, visual proprioception is much more than the source of a trivial illusion. It is crucial for establishing and maintaining postural stability and for navigation in the world. It is the apparent *loss of postural stability* linked to visual proprioception that leads to wariness of heights. According to Bertenthal and Campos, visual proprioception is not fully present in the infant with no locomotor experience, but becomes functional, and eventually well-established, as experience with locomotion increases. In brief, because of developmental changes in visual proprioception with locomotion, heights are initially not "dizzying," but then become so.

Visual proprioception depends on patterns of optic flow that covary with self-movement. When one is looking and moving straight ahead there is a radial (star-like) pattern with optical flow originating from a static point in the center of one's visual field. Simultaneously, there is a lamellar (layered and parallel) pattern of flow in the visual periphery. Although perception of self-movement has traditionally been relegated to information from the vestibular and the somatosensory systems, visual proprioception is so powerful that a standing 13-month-old infant will fall down when exposed to optic flow in a *moving room* (Lee and Aronson, 1974). The moving room is a small, textured enclosure with one end open (**Figure 1**). Pushing or pulling the room gives the child the perception of moving forward or backward (depending on the direction of optic flow) even when he or she is stationary. Peripheral lamellar optic flow, generated by moving only the side walls in the moving room, creates a particularly compelling sense of self motion and leads to greater visual-postural coupling than radial optic flow (Stoffregen, 1985). Visual proprioception is without doubt a powerful source of information for postural stability and instability.

Bertenthal and Campos (1990) linked visual proprioception to wariness of heights via the following set of propositions. First, they predicted that infants with locomotor experience would show visual proprioception in response to peripheral optic flow, whereas infants without locomotor experience would not, or would do so minimally. Second, once this type of visual proprioception comes online, it works in concert with vestibular and somatosensory information to specify stasis or changes in posture or self-movement. Third, when a child approaches a dropoff, there is a sudden loss of visual proprioceptive information in the periphery, but not of vestibular or somatosensory information. At a dropoff, there is little or no optic flow in the periphery of the visual field and head/body movements produce little change in radial or lamellar flow because of the distance from the child to the closest visible surface (the floor). This loss of visual information is the basis for wariness of heights because of the disparity between visual and somatosensory/vestibular information for self-movement and/or a reduction in postural stability (see Brandt et al., 1980).

**FIGURE 1 | The moving room.** Responsiveness to peripheral optic flow is determined by cross-correlating the infant's postural sway in the fore-aft direction, measured by four force transducers under the legs of the infant seat, with the movement of the side walls.

Locomotor experience is important in the functionalization of peripheral lamellar optic flow into visual proprioception for at least two reasons. One, the infant who is able to move voluntarily can notice and detect patterns of optic flow that coincide with forward and backward movements of the body. Prior to voluntary locomotion, there is little or no regularity between direction of optic flow and self-movement because when infants are carried passively, forward movement can be linked to any number of directions of optic flow depending on how the infants are held and where they are looking. In addition, most infants when carried early in life are in a state of "visual idle," looking at nothing in particular. Only when the infant moves voluntarily do the head and eyes consistently point straight ahead (Higgins et al., 1996), allowing consistent exposure to radial optic flow in the central field of view and lamellar optic flow in the periphery. The second reason locomotor experience is important is that when the infant must navigate the world, it is important to segregate information about environmental features (specified in the central field of view) from information about self-movement (specified by peripheral optic flow) so as to steer an appropriate course and maintain postural stability (Gibson, 1979). Because these tasks must be accomplished simultaneously, locomotion leads to a perceptual differentiation wherein central and peripheral optic flow are relegated different perception-action functions. Attending to features of the environment can be accomplished more effectively and efficiently in the central field of view if postural stability is relegated to the periphery.

There is now no doubt that locomotor experience affects visual proprioception. Uchiyama et al. (2008) used two converging research operations: (1) an age-held-constant study of locomotor, prelocomotor, and prelocomotor infants with artificial "walker" experience, and (2) the random assignment of precrawling infants either to a condition in which they could control their own movement in a powered mobility device (PMD) (**Figure 2**) or to a no-movement condition. They documented that infants with any kind of locomotor experience showed not only postural compensation to peripheral optic flow in a moving room, but also negative emotional reactions to peripheral optic flow, consistent with a sense of loss of postural stability. These findings confirmed previous reports of greater responsiveness to peripheral optic flow in infants with locomotor experience compared to same-aged infants without locomotor experience (Higgins et al., 1996). In sum, the proposition of the Bertenthal and Campos hypothesis that locomotor experience brings on or greatly improves visual proprioception has been empirically supported.

## **TESTING THE LINK BETWEEN VISUAL PROPRIOCEPTION AND WARINESS OF HEIGHTS**

Two studies were recently conducted by Dahl et al. (2013) to test the relation between visual proprioception and wariness of heights proposed by Bertenthal and Campos (1990). The first study examined whether newly crawling infants who were highly responsive to peripheral optic flow would be more likely to avoid heights. Wariness of heights was assessed on a visual cliff and postural compensation to peripheral optic flow was assessed by moving the side walls in a moving room. Under the infant's seat in the moving room were force sensors that recorded postural sway in the fore and aft directions. Cross correlating the postural sway data with the displacement of the side walls provided an index of the strength of the coupling between vision and posture.

**FIGURE 2 | The powered-mobility-device (PMD) used to test the relation between self-produced locomotion and psychological development.** Infants can move forward in the PMD by pulling on the brightly colored joystick.
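To make the coupling index concrete, the following minimal sketch computes a lagged cross-correlation between side-wall displacement and fore-aft postural sway. The sampling rate, the lag window, the use of the peak correlation as the index, and every signal value are assumptions made for illustration. The source does not report these details, and the signals below are synthetic, not recorded data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(wall, sway, max_lag):
    """Correlate wall movement with sway that follows it by 0..max_lag samples."""
    result = {}
    for lag in range(max_lag + 1):
        wall_part = wall[: len(wall) - lag]  # wall samples that have a later sway partner
        sway_part = sway[lag:]               # sway samples `lag` steps after the wall
        result[lag] = pearson(wall_part, sway_part)
    return result


def coupling_strength(wall, sway, max_lag):
    """Return (lag, r) at the peak of the cross-correlation function."""
    cc = cross_correlation(wall, sway, max_lag)
    best_lag = max(cc, key=lambda k: cc[k])
    return best_lag, cc[best_lag]


# Synthetic illustration only: a 0.5 Hz wall oscillation sampled at a
# hypothetical 50 Hz, and sway that copies it 10 samples later at 80% amplitude.
fs = 50
n = 500
wall = [math.sin(2 * math.pi * 0.5 * t / fs) for t in range(n)]
delay = 10
sway = [0.0] * delay + [0.8 * v for v in wall[: n - delay]]

best_lag, r = coupling_strength(wall, sway, max_lag=25)

# An infant whose sway does not follow the walls at all yields no coupling.
flat_lag, flat_r = coupling_strength(wall, [0.0] * n, max_lag=25)
```

In this toy case the peak falls at the built-in delay with a correlation near 1, whereas the flat sway trace gives a correlation of 0. A higher peak value corresponds to what the text calls stronger visual-postural coupling.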

As predicted, postural compensation to peripheral optic flow was positively and significantly associated with infant avoidance of the deep side of the visual cliff. That is, the greater the coupling between an infant's postural sway and the wall movement, the more likely the infant was to avoid the drop-off. In contrast, there was no relation between visual-postural coupling in the moving room and avoidance of the shallow (non-dropoff) side of the visual cliff (see **Figure 3**). These findings were replicated in another unpublished study with somewhat younger infants who had similar amounts of locomotor experience, further evidencing the robustness of the relation between infant visual proprioception and wariness of heights.

The second study used the PMD to experimentally manipulate infant experience with self-produced locomotion and responsiveness to peripheral optic flow. The study had three purposes: (1) to investigate whether PMD experience would lead to increased wariness of heights, (2) to corroborate Uchiyama et al.'s (2008) finding that PMD experience leads to increased responsiveness to peripheral optic flow, and (3) to test whether the relation between PMD experience and wariness of heights is mediated by responsiveness to peripheral optic flow, as predicted by the Bertenthal and Campos (1990) hypothesis. Since all infants were precrawlers, they were tested on the visual cliff by measuring their heart rate (HR) while they were lowered onto the deep and shallow sides of the visual cliff. HR differentiation between the deep and shallow sides was used as an index of wariness (Ueno et al., 2012, showed that the crossing paradigm and the lowering paradigm on the visual cliff yield the same conclusions). As in the previous study, visual proprioception was assessed in the moving room.

All three predictions were supported. PMD infants showed greater HR differentiation between the deep and shallow sides of the visual cliff than control infants (see **Figure 4**), they showed greater responsiveness to peripheral optic flow in the moving room than controls (see **Figure 5**), and, finally, the relation between PMD experience and HR differentiation on the visual cliff was mediated by infant responsiveness to peripheral optic flow. In other words, only insofar as PMD infants had higher postural responsiveness to the moving room did they also show higher cardiac signs of wariness of heights.
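The logic of the mediation claim can be sketched with a simple product-of-coefficients decomposition. This is an illustrative assumption: the source does not state which mediation procedure was used, and the variable names and numbers below are invented toy values, not the study's data. Here `group` codes PMD experience (1) versus control (0), `coupling` stands for responsiveness to peripheral optic flow, and `hr_diff` stands for HR differentiation between the deep and shallow sides.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x, m, y):
    """OLS slopes of y on x and m together, from the centered normal equations."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m


def mediation(x, m, y):
    """Split the total effect of x on y into a direct part and a part through m."""
    total = slope(x, y)                        # x -> y, ignoring the mediator
    a = slope(x, m)                            # x -> m
    direct, b = two_predictor_slopes(x, m, y)  # x -> y and m -> y, jointly
    return {"total": total, "a": a, "b": b, "direct": direct, "indirect": a * b}


# Toy values only (hypothetical, chosen so that the mediator carries the whole effect).
group = [0, 0, 0, 0, 1, 1, 1, 1]
coupling = [0.1, 0.2, 0.3, 0.2, 0.5, 0.6, 0.7, 0.6]
hr_diff = [10 * c for c in coupling]

effects = mediation(group, coupling, hr_diff)
```

With these toy values the direct effect is essentially zero and the indirect effect equals the total effect, which is the pattern the text describes: the groups differ in HR differentiation only insofar as they differ in responsiveness to optic flow. A real analysis would also need a significance test for the indirect effect, which this sketch omits.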

The above studies thus show strong support for the hypothesis that wariness of heights typically comes about through locomotor-induced changes in visual proprioception. However, none of the studies actually manipulated infant use of visual proprioceptive information in the presence of a drop-off. The Bertenthal and Campos (1990) hypothesis implies that if crawling infants, ordinarily wary of drop-offs, are provided with additional visual proprioceptive information at the edge of a drop-off they should show less wariness of heights. The provision of visual referents has been shown to improve postural control at the edge of a drop-off in adults (Simeonov and Hsiao, 2001).

In an ongoing study, a corridor was built on top of the visual cliff. The walls of the corridor are either covered by highly patterned fabric (increased texture condition) or are plain white (minimal texture condition). Importantly, the presence of the corridor gives no additional clues that the surface of the visual cliff is solid. Infants are encouraged by their mothers to cross the deep side of the visual cliff through the corridor. If infants rely on peripheral optic flow for postural stability as they locomote, and loss of that information leads to wariness when depth at an edge is encountered, then they should be more likely to cross the deep side of the visual cliff in the increased texture condition than in the minimal texture condition. Preliminary data conform to prediction. Infants with more than 6 weeks of crawling experience are significantly more likely to cross the deep side of the visual cliff in the increased texture condition than in the minimal texture condition. The added texture thus appears to provide optic flow that, at least in part, compensates for the loss of visual information at the edge of the drop-off.

In sum, convincing evidence has been provided for Bertenthal and Campos's novel explanation for the emergence of wariness of heights. Locomotor experience appears to functionalize peripheral optic flow such that infants come to rely on this source of visual proprioceptive information for postural stability during locomotion. Upon encountering a drop-off, infants show signs of wariness either because they lose information they have come to rely upon, they experience a discrepancy between information provided by the visual, vestibular, and somatosensory systems, and/or their postural stability decreases.

The above studies also show that locomotor experience is not the *only* way by which infants can become wary of drop-offs. Indeed, Dahl et al. (2013) reported a positive relation between responsiveness to peripheral optic flow and cardiac signs of wariness in the pre-locomotor control group. The development of wariness of heights, like so many other (if not all) developmental processes is not deterministic, but probabilistic (Campos et al., 2000; Gottlieb, 2007). Transitions typically engendered by locomotor experience, like reliance on peripheral optic flow for visual proprioception, can sometimes be brought about through alternative developmental pathways. One question for future research is what these additional developmental pathways are in the cases of visual proprioception and wariness of heights.

## **SUMMARY**

Converging research operations—including the experimental manipulation of infant experience with self-produced locomotion—have systematically documented that locomotor experience can induce a reorganization in visual proprioception and the onset of wariness of heights. These same converging operations have begun to address issues of process by establishing functionalization of peripheral optic flow as an experiential mediator in the relation between self-produced locomotion and wariness of heights. As such, this line of research serves as a model for beginning to tackle the question of how locomotor experience might bring about its functional consequences for other psychological skills. In the next section, we examine the relation between locomotor experience and improved search for hidden objects. Though the link between the two is strong and the processes that underlie the link are extremely important to understand, it has not yet received the same rigorous experimental treatment as the link between locomotion and visual proprioception and wariness of heights.

## **LOCOMOTOR EXPERIENCE AND MANUAL SEARCH FOR HIDDEN OBJECTS**

Correctly searching for an object hidden in one of two locations proves to be a surprisingly difficult skill for the infant who has already developed proficiency in reaching and grasping. Infants between 8 and 9 months-of-age can successfully retrieve an object hidden within reach at one location, but they often fail when the object is hidden under one of two adjacent locations, even when the locations are perceptually distinct (Piaget, 1954; Bremner, 1978). More curiously, infants at this age will often continue to search for an object in its original hiding location even after they have seen it moved to a new hiding location. This perseverative search is referred to as the *A-not-B error* and the infant's performance becomes progressively poorer as the delay between hiding in the new location and search increases (Diamond, 1990).

The ability to search for and retrieve hidden objects has been the subject of intense scientific scrutiny because it represents a major transition in the infant's understanding of spatial relations. The capacities that underlie successful spatial search are thought to contribute to many important cognitive changes, including concept formation, aspects of language acquisition, representation of absent entities, the development of attachment, and other emotional changes (Haith and Campos, 1977). Importantly, changes in spatial search behavior have been explained entirely in maturational terms; specifically, maturation of the dorsolateral prefrontal cortex has been postulated as the necessary precursor to successful search (Kagan et al., 1978; Diamond, 1990). In contrast, Piaget (1954), among others (e.g., Hebb, 1949), has argued that changes in search behavior stem from motoric experience and active exploration of the world.

## **EVIDENCE LINKING LOCOMOTION TO SKILL IN SPATIAL SEARCH**

A number of researchers, including Piaget (1954), have speculated about a link between skill in spatial search and locomotor experience (Bremner and Bryant, 1977; Campos et al., 1978; Acredolo, 1978, 1985; Bremner, 1985). The first confirmation of the link was provided by Horobin and Acredolo (1986) who showed that infants with more locomotor experience were more likely to search successfully at the B location on a series of progressively challenging hiding tasks. The finding was replicated and extended by Kermoian and Campos (1988), using a similarly challenging series of spatial search tasks that ranged from retrieving an object partially hidden under a single location to the A-not-B task with a seven-second delay between hiding and search. Infants in the study were all 8.5 months-of-age but differed in experience with independent locomotion. The results showed clearly that infants with hands-and-knees crawling experience or experience moving in a wheeled-walker significantly outperformed the prelocomotor infants on the spatial search tasks. Moreover, search performance improved as experience with locomotion increased. For example, 76% of crawling and walker infants with nine or more weeks of locomotor experience successfully searched in the B location on the A-not-B test with a 3 s delay compared to only 13% of infants without locomotor experience.

The obvious conclusion from the Kermoian and Campos (1988) study is that locomotion, regardless of how it is accomplished, makes an important contribution to spatial search. However, a third experiment in the series raised an important caveat to that conclusion. *Belly crawling* infants, who were the same age as those tested in experiments 1 and 2, with between 1 and 9 weeks of crawling experience performed like *prelocomotor* infants on the spatial search tasks. Moreover, no relation was found between the amount of belly crawling experience and spatial search performance.

Why would belly crawling experience *fail* to make the same contribution to skill in spatial search as hands-and-knees crawling and walker experience? Kermoian and Campos (1988) argued that belly crawlers failed to profit from their locomotor experiences because belly crawling is so effortful and inefficient. Belly crawlers were thought to devote so much effort and attention to organizing forward progression that they were unable to deploy attention to the environment in the same way as the hands-and-knees crawlers and infants in walkers. Consequently, the belly crawlers may not have noticed some of the important spatial transformations during crawling, such as occlusion and reappearance of objects that contribute to improved search performance.

The Kermoian and Campos (1988) findings have been replicated and extended using a variety of converging research operations, including cross-sectional and longitudinal research designs as well as a variation of the deprivation design that took advantage of ecologically and culturally mediated delays in the onset of independent mobility in urban Chinese infants (Tao and Dong, 1997, unpublished data). Specifically, infants in Beijing who were delayed in locomotion by 2 to 4 months relative to North American norms initially performed poorly on the A-not-B test, then improved dramatically as a function of locomotor experience regardless of the age at which they acquired independent locomotion.

The relation between locomotor experience and spatial search performance is not confined to typically-developing infants. The relation has also been confirmed in a longitudinal study of seven infants with spina bifida (Campos et al., 2009). Spina bifida is a neural tube defect that is associated with delays in locomotor and psychological development. The test was a two-position hiding task in which a toy was hidden only in one location, with a second hiding location serving as a distractor. Infants were tested monthly after recruitment until 2 months after the delayed onset of independent locomotion, which occurred at 8.5, 11.5, and 13.5 months-of-age in three of the infants and 10.5 months-of-age in the other four. Dramatic improvements on the task were noted following the onset of locomotion. Infants searched successfully for the hidden object on only 14% of trials before they were able to crawl, but improved to 64% correct search following the delayed onset of locomotion.

Bai and Bertenthal (1992) studied the link between locomotor experience and spatial search in the context of a paradigm designed to assess position constancy. Position constancy is an ability to find an object or location following a shift in one's spatial relation to that object or location. Position constancy would be impossible without a basic level of skill in spatial search. Three groups of 33-week-old infants were tested. One group was prelocomotor, one group had 2.7 weeks of belly crawling experience, and one group had 7.2 weeks of hands-and-knees crawling experience. An object was hidden under one of two different colored cups that were placed side by side in front of the infant. Prior to searching for the object, the infant was rotated 180 deg around to the other side of the table on which the cups were placed *or* the table was rotated 180 deg. The data from the first trial showed a particularly strong effect of locomotor experience. Infants with hands-and-knees crawling experience successfully retrieved the object on 72% of trials following rotation to the other side of the table compared to a 25% success rate for the prelocomotors. As in Kermoian and Campos's (1988) spatial search experiment, the belly crawlers in Bai and Bertenthal's study performed like prelocomotors, searching successfully on only 30% of trials. Notably, the groups did not differ on their search performance when the table was rotated, likely because this type of displacement is rarely experienced by any infant, regardless of locomotor experience. (**Figure 6** shows a hypothetical series of spatial search tasks to highlight the difference between the typical search procedure and the one in which the table or the infant is rotated).

## **HOW IS SPATIAL SEARCH FACILITATED BY LOCOMOTOR EXPERIENCE?**

The process by which locomotion contributes to spatial search remains poorly understood despite the range of converging research operations that have been used to document the link between locomotor experience and skill at spatial search. The need to explain the spatial component of manual search for hidden objects (where is the object located) as well as the temporal component (improved tolerance of increasing delays between hiding and search) has added to the challenge of developing viable explanations. Nevertheless, we have speculated previously (Campos et al., 2000) that at least four different factors contribute to improvements in search performance: (1) shifts from egocentric to allocentric coding strategies, (2) new attentional strategies and improved discrimination of task-relevant information, (3) improvements in means-ends behaviors and greater tolerance of delays in goal attainment, and (4) refined understanding of others' intentions.

## *A shift in coding strategies*

Piaget first proposed that changes in spatial search performance reflect shifts from egocentric (body referenced) to allocentric (environment referenced) coding strategies (Piaget, 1954). He reasoned that prelocomotor infants could rely on egocentric coding strategies because they interacted with their environment from a stationary position. Thus, an object on the left would always be found on the left and an object on the right would always be found on the right. However, egocentric coding strategies are unreliable once the infant starts to move from place to place because the mobile infant's relation to the environment changes constantly. In Piaget's scheme, objects are first tied to the sensory impressions they give rise to and then to the actions that are performed on them. Even when infants can first represent objects independently of their own actions, the objects are still bound to specific locations in space. Only after infants develop a truly objective view of the world do they realize that objects can potentially inhabit many different positions in space.

## *New visual attentional strategies*

Locomotor infants are commonly observed to be more attentive and less distractible during spatial search tasks (Campos et al., 2000). The idea that locomotion might facilitate changes in attentional strategies is quite reasonable if one assumes that attention is largely in the service of actions (e.g., Franz, 2012). Richard Walk has been one of the most vocal proponents of this idea, arguing that, "Although motor activity is important, its function seems to be mainly that of properly directing attention; the motor activity itself seems to contribute little" (Walk, 1981, p. 191).

Acredolo and colleagues first proposed visual attention as a mediator between locomotor experience and success on spatial search tasks (Acredolo et al., 1984; Acredolo, 1985; Horobin and Acredolo, 1986). They noticed that infants who kept an eye on the hiding location were more likely to retrieve the object successfully. In addition, visual distractions that encourage the infant to take their eye off the hiding location decrease the likelihood of successful search (Diamond et al., 1994). Keeping an eye on objects may be a particularly helpful way for a locomotor infant to retrieve objects following self-displacement. Keeping an eye on objects may also help infants to discriminate perceptually relevant information about the self and the environment through the process of *education of attention* to meaningful invariants (Gibson, 1979). Improved spatial discrimination of relevant task features has been proposed as one means by which locomotor experience might facilitate performance on the A-not-B task (Smith et al., 1999; Thelen et al., 2001).

## *Improvements in means-ends behaviors and working memory*

Improvements in means-ends behaviors (e.g., Diamond, 1991) and greater tolerance for delays between initiating a behavior and completing it have been proposed to account for the observation that errors on the A-not-B task increase as the delay between hiding and search increases. How is experience with locomotion implicated in this process? The logic is that prone locomotion is a continuous task that is accomplished by concatenating a series of discrete movements of the arms and legs. The infant often struggles with several different means of coordinating all four limbs before discovering the diagonal pattern of couplings between the arms and legs that characterizes proficient (and efficient) four-limbed gait (Freedland and Bertenthal, 1994; Adolph et al., 1998). Learning to locomote proficiently may then transfer to learning other means-ends behaviors, perhaps through a process akin to *learning how to learn* (Harlow, 1949; Adolph, 2005; Seidler, 2010). In addition, locomotor goals require more time to complete than discrete actions like reaching and so the infant must keep the locomotor goal in mind for a longer period of time, taxing working memory.

A recent study linking locomotor experience to greater flexibility in memory retrieval provides indirect evidence that locomotion might facilitate the infant's ability to tolerate longer delays in the A-not-B task. Herbert et al. (2007) tested 9-month-old crawlers and non-crawlers on a deferred imitation task. An experimenter demonstrated an action on a toy and the infants were tested 24 h later to see if they would perform the same action. Crawlers and pre-crawlers imitated the action when they were given the same toy in the same context in which they were tested (laboratory or home); however, crawlers were significantly more likely than pre-crawlers to imitate the action when the toy and the testing context were different. The authors argued that locomotor experience promotes flexibility in memory retrieval because locomotor infants have abundant opportunities to deploy their memories in novel situations. It is not unreasonable to think that locomotion might also contribute to changes in working memory given that it has been linked to long-term memory. Such changes would be the basis for the greater tolerance of delays in hide-and-seek tasks.

## *Improved understanding of others' intentions*

We have already noted that locomotor infants are more attentive and less distractible during search tasks. However, they also appear to search for communicative signals from the experimenter. This search is likely related to their ability to follow the referential gestural communication of an experimenter (e.g., Campos et al., 2009) and increased distal communication with the parent after the onset of locomotion (Campos et al., 2000). The importance of social communication in the A-not-B error has recently been highlighted by an experiment showing that perseverative search errors are considerably reduced when communication between the experimenter and infant is minimized (Topál et al., 2008). The authors argue that infants make the error because they misinterpret the *game* they are playing with the experimenter during the trials when objects are hidden at the A location. The growing literature on the link between action production and action understanding (e.g., Sommerville and Woodward, 2010) is also relevant to the potential mediating role of understanding others' intentions in successful spatial search. This literature suggests that infants' understanding of other people's actions as being goal-directed is a function of their own action experience.

## **SUMMARY**

The evidence supporting a link between locomotor experience and spatial search performance is compelling. A range of converging research operations have shown that infants who can locomote perform better on spatial search tasks than infants who cannot. However, it is important to note here that we have not yet demonstrated a *causal* association between locomotion and spatial search performance as has been done for locomotion and visual proprioception and wariness of heights. The PMD is currently being used to conduct the pivotal studies. In addition, more attention must be devoted to understanding *how* locomotor experience contributes to spatial search performance. While the proposed mechanisms described above seem intuitive and viable, none have been confirmed experimentally.

The need for better understanding of the developmental process prompts us to raise additional questions about the relation between locomotion and psychological development that have received scant attention in the research literature. These include: how does the brain change when infants acquire locomotor experience, what role does locomotion play in the maintenance of psychological function, and what implications do limitations in motor ability have for psychological development? We now turn our attention to these important questions in the hope of showing how they can contribute to a deeper understanding of the processes that link action and psychological function and the processes that underlie developmental change.

## **WHAT CHANGES IN THE BRAIN OCCUR WHEN INFANTS ACQUIRE EXPERIENCE WITH LOCOMOTION?**

The emergence in infancy of each new motor skill brings new means of engaging the world. Given the activity-dependent character of neurological development highlighted by contemporary, bidirectional developmental models, we should expect reorganizations in cortical structure to accompany and be dependent on the acquisition of these skills. Surprisingly little empirical work, however, exists to confirm this speculation. Thus, the question of what changes in the brain are consequences of acquiring independent locomotion remains largely unexplored.

The critical role that activity plays in the development of psychological function extends to the development of neurological structure and function. Empirically, the activity-dependent character of neurological development is now well-established (Katz and Shatz, 1996; Pallas, 2005; Gottlieb et al., 2006; Westermann et al., 2007). Consider the oft-cited example of ocular dominance column formation, in which binocularly innervated tissue in layer 4 of the visual cortex developmentally segregates into alternating, *eye-specific* columns of cortical neurons. Even brief monocular deprivation in early postnatal development—limiting sensory activity to one eye—produces major anatomical changes to the structure of these columns (Hubel and Wiesel, 1963; Katz and Crowley, 2002). Such functional restructuring of the cortex illustrates how its eye-specific layering is plastically responsive to activity-derived competition for cortical neuronal resources (Katz and Shatz, 1996; Mareschal et al., 2007), even in premature infants (Jandó et al., 2012).

At the more macro-level of organismic activity, numerous examples of activity-modified brain structure exist, from demonstrations of cortical reorganization when novel motor skills are learned (e.g., Karni et al., 1998; Kleim et al., 1998; Zatorre et al., 2012) to the classic environmental complexity studies of Rosenzweig and colleagues, which show structural changes in the brains of rats reared in complex environments and given opportunities to actively explore and play with various objects compared to rats that were visually exposed to the complex environment but unable to engage with it. Among the structural changes are increases in synaptic size and density, expanded dendritic arborization, and increases in glial cells, vascular density, and neurogenesis (e.g., Ferchmin et al., 1975; Greenough et al., 1987; Markham and Greenough, 2004; Vazquez-Sanroman et al., 2013).

The importance of micro and macro levels of activity for the development of neurological structure is not just limited to modifications or extensions of existing neural architectures. Even *in utero*, before sensory systems are functionally active and sampling external stimulation, sensory neurons engage in *spontaneous* waves of activity that influence cortical differentiation (O'Leary, 1989; Pallas, 2005; Mareschal et al., 2007). Alongside this spontaneous neural activity is internally generated spontaneous activity issuing from cortical and subcortical structures of the brain. Such activity is considered by many to serve a critical role in the formation and early differentiation of neural networks (O'Leary, 1989; Katz and Shatz, 1996; Westermann et al., 2007). For example, the emergence of initial column structure in layer 4 of the visual cortex depends on spontaneously generated retinal activity (Feller and Scanziani, 2005; Mareschal et al., 2007) and experimental blockage of such activity has adverse consequences for neural development (Pallas, 2005). This also holds true at the macro level for the spontaneous motor activity of embryos and fetuses during prenatal development; experimental restraint of such activity yields morphological abnormalities in skeletal, muscular, and neural development (Einspieler et al., 2012).

In short, functional activity plays a central role in the formation, construction and development of structure in the nervous system. In stark contrast to the unidirectional framing of structure-function relations featured within traditional, maturational treatments of brain development, more and more neurologically-focused empirical work argues that function and structure reciprocally influence one another throughout development. The bidirectionality of the relationship situates functional activity at the very heart of structural development, not as a mere epiphenomenal outgrowth of it. Such bidirectionality in structure-function relations is the hallmark of Gottlieb's (1970, 1991, 2007; Gottlieb et al., 2006) *probabilistic epigenesis* and is a mainstay of more recent efforts to establish relational approaches to neurological development, such as the theoretical framework of *neuroconstructivism* (Johnson and Karmiloff-Smith, 2004; Mareschal et al., 2007; Westermann et al., 2007).

What, then, do we know about the influence that locomotion has on the brain? The limited insights we have into the brain changes that accompany the onset of crawling come from work that was done by Bell and Fox (1996, 1997). They used an age-held-constant design with 8-month-olds who varied in their experience with hands-and-knees crawling activity to investigate the relation between cortical development and crawling experience. In their first study, four groups of infants—a prelocomotor group, a novice crawling group (1–4 weeks of experience), a middle-level crawling experience group (5–8 weeks of experience), and a long-term crawling experience group (9 or more weeks of experience)—were compared using a measure of EEG coherence across frontal, parietal, and occipital brain regions to index synaptic connectivity. EEG coherence measures the degree of association or coupling between different brain regions.
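For readers unfamiliar with the measure, coherence between two EEG channels is commonly defined as the magnitude-squared coherence shown below. This is a general textbook definition offered as an illustration; the symbols are introduced here for convenience, and it is an assumption, not something stated in the studies discussed, that their computation took exactly this form.

$$
C_{xy}(f) = \frac{\left| S_{xy}(f) \right|^{2}}{S_{xx}(f)\, S_{yy}(f)}, \qquad 0 \le C_{xy}(f) \le 1
$$

Here $S_{xy}(f)$ is the cross-spectral density of the signals recorded at sites $x$ and $y$ at frequency $f$, and $S_{xx}(f)$ and $S_{yy}(f)$ are the corresponding auto-spectral (power) densities, each estimated by averaging over epochs. Values near 1 indicate a consistent linear relation between the two sites at that frequency, whereas values near 0 indicate little linear coupling.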

Bell and Fox (1996) discovered a curvilinear relationship between crawling experience and EEG coherence. Specifically, infants with 1–4 weeks of crawling experience demonstrated much greater EEG coherence than their long-term crawling counterparts (9 or more weeks of experience) and their prelocomotor counterparts. In their second study, Bell and Fox (1997) reproduced the same basic curvilinear relationship across the four groups of crawlers, this time with an estimate of within-region EEG power. The relationship held for EEG power in the frontal and parietal regions of the brain, but not the occipital region. Again, it was the infants with 1 to 4 weeks of crawling experience who demonstrated greater EEG power values than all other groups.

Given the greater coherence and power seen in the group with minimal crawling experience, Bell and Fox (1996, 1997) concluded that the brain changes represented an *experience-expectant* rather than an *experience-dependent* process (Greenough et al., 1987; Greenough and Black, 1992). As their labels suggest, experience-expectant processes are thought to emerge in anticipation of experiences that are ubiquitous and common to all members of a species, whereas experience-dependent processes are idiosyncratic or unique to an individual. Bell and Fox argued that the brain overproduced synaptic connections in anticipation of the new sets of experiences likely to derive from the acquisition of crawling, a species-typical motor skill. Synaptic pruning was assumed to follow the initial overproduction of synapses as the infant consolidated crawling and its experiential consequences.

Do the changes in EEG coherence and power seen at the onset of crawling really represent an experience-expectant rather than an experience-dependent process? Unfortunately, we don't have an answer to this question as no attempts have been made to replicate the Bell and Fox experiments. Two factors lead us to believe that the observed changes were dependent on experience, however. First, though the infants in the two studies had limited crawling experience, it must be remembered that they were hands-and-knees crawlers. This is important because infants typically explore many different forms of prone locomotion before converging on the more efficient hands-and-knees pattern, as noted earlier in the paper (Adolph et al., 1998). Consequently, Bell and Fox may have underestimated the amount of experience the infants had with self-generated locomotion. Second, an explosion of research in the neurosciences over the last decade has documented countless examples of experience-dependent plasticity in human development across the lifespan.

When the results from the environmental enrichment studies alluded to earlier are combined with the role that functional activity is known to play in the development of the nervous system, the idea that locomotion induces changes in the brain seems eminently reasonable. Nevertheless, the idea awaits experimental confirmation. Here is another research question that could be addressed using the powered-mobility-device. We hypothesize that prelocomotor infants given training in the PMD would show similar EEG coherence and power values to those seen in the infants with 1–4 weeks of crawling experience in the Bell and Fox (1996, 1997) studies and higher values than seen prior to training. In contrast, we would not expect to see changes in coherence and power in infants who did not receive training.

## **WHAT ROLE DOES LOCOMOTION PLAY IN THE MAINTENANCE OF PSYCHOLOGICAL FUNCTION?**

We noted earlier in the introduction that Gottlieb (1970, 1991, 2007) outlined three roles for experience in development: induction, facilitation, and maintenance. The discussion so far has focused on the first two roles; it is now time to focus on maintenance, the role that has received little, if any, empirical attention in the developmental literature. The concept of maintenance by experience has enormous implications for our understanding of the declines in psychological function associated with the aging process, and it provides a theoretical bridge between the processes that generate psychological structure and function in the early years of life and those that contribute to its deterioration later in life.

Experientially-induced cognitive and neural plasticity during adulthood is a topic of major interest in the neurosciences at the moment because of the dramatic shift in the proportion of the global population that will be over 65 years-of-age within the next 25 years and the concomitant personal, social, and economic costs that stem from age-related declines in cognitive function (Anderson-Hanley et al., 2012; Karbach and Schubert, 2013). It is particularly relevant to the central thesis of this paper that changes in an older person's gait are now recognized as early predictors of dementia, including Alzheimer's disease (Hall et al., 2000; Verghese et al., 2002, 2007). Those individuals at risk for dementia have slower walking speeds, disrupted rhythms, and show greater variability from stride to stride. Equally relevant is the prevailing tendency to view gait dysfunction as the first symptom of the disease rather than a contributor to the disease. In other words, most researchers assume that gait dysfunction (and motor dysfunction more broadly) is simply the earliest manifestation of the neural and vascular changes that will ultimately lead to detectable cognitive impairment, even though many acknowledge that the relation between physical activity and cognitive function is complex and likely reciprocal (Cedervall et al., 2012).

The tendency to downplay or ignore a potential role for mobility impairment in the progression of cognitive impairment is surprising given what is now known about the protective effects of physical activity on cognitive functioning in the elderly. (However, it is reminiscent of the skepticism that has met the idea that locomotion contributes to early psychological development.) Numerous studies have shown a positive effect of exercise and physical fitness on mental health and cognitive performance, using correlational research designs and randomized controlled trials (for reviews see Kramer and Erickson, 2007; Hillman et al., 2008; Baker et al., 2010; Chaddock et al., 2010; Erickson et al., 2012). Moreover, the areas of the brain where the most dramatic exercise-related structural changes occur, the neural, vascular, and molecular substrates that underlie these changes, and the effects that can be attributed to exercise *per se*, vs. learning, have been well-documented (Nithianantharajah and Hannan, 2009; Thomas et al., 2012).

The differential effects of learning vs. exercise on brain development, demonstrated some years ago by Greenough and colleagues (Black et al., 1990), and the brain regions known to be affected by physical activity, are important to consider relative to the potential effects of locomotion on the maintenance of psychological function. Rats who were given a prolonged period of wheel running showed an increase in blood vessel density in the cerebellum whereas those given acrobatic training showed an increase in synaptogenesis. More recent work has shown that while exercise can increase neurogenesis in the mouse hippocampus, environmental enrichment enhances the survival of new neurons and increases the likelihood they will be incorporated into existing neural networks (Kronenberg et al., 2003).

Exercise-related changes in the brain are typically localized to the motor cortex, the cerebellum, and the hippocampus (Thomas et al., 2012). Although the cerebellum has traditionally been assumed to participate exclusively in the control of movement, Diamond (2000) has argued that the connections between the cerebellum and the dorsolateral prefrontal cortex suggest that the cerebellum might also play an important role in cognitive functions. Deterioration in the hippocampus, which plays a central role in learning, memory, and spatial skills like navigation, precedes and leads to memory impairment, Alzheimer's disease, and depression in older adults (Thomas et al., 2012). A recent randomized controlled trial showed that a 12 month exercise program (walking) led to increases in the size of the anterior hippocampus and improved spatial memory in older adults (Erickson et al., 2011).

Having noted the different effects of exercise vs. environmental enrichment on the brain, one wonders whether the changes in hippocampal size noted by Erickson et al. (2011) were a function of the physiological demands of walking or the engagement with the environment that walking permits. A recent study on exergaming (a combination of exercise and video game play) sheds some light on this issue. Anderson-Hanley et al. (2012) randomly assigned older adults to a cybercycling intervention, which involved virtual reality tours through simulated environments and competition with other cyclists, or to a traditional cycling intervention on a stationary bike. Despite equivalent levels of effort and fitness, the cybercyclists showed significantly greater improvements in cognitive function following the intervention than traditional cyclists. Importantly, cybercyclists showed significantly larger increases in brain derived neurotrophic factor (BDNF), an important neurotrophin thought to mediate exercise-induced neurogenesis and synaptogenesis, than traditional cyclists. Thus, exercise with simultaneous cognitive engagement was a much more effective facilitator of cognitive function than exercise alone.

Finally, it is highly relevant to again note the role played by the hippocampus in spatial navigation to fully appreciate the potential impact that locomotion has on the maintenance of psychological function. Interactions with complex environments place highly specific demands on navigation and lead to measurable changes in the hippocampus. For example, London taxi drivers, who are held to some of the most rigorous standards in the world relative to knowing their city, have greater gray matter volume in the mid-posterior hippocampi. Moreover, greater driving experience is associated with greater posterior hippocampal gray matter volume (Maguire et al., 2000, 2006). Many complex navigational processes decline with hippocampal atrophy (Nedelska et al., 2012).

In an interesting parallel with the developmental work linking the onset of crawling to the increased use of allocentric spatial coding strategies (note, much of that work was not covered in the current paper, but see Anderson et al., 2013 for a recent review), researchers have shown that allocentric spatial coding strategies in healthy older adults correlate with gray matter volume in the hippocampus whereas egocentric strategies correlate with volume in the caudate nucleus (Konishi and Bohbot, 2013). A study by Harris et al. (2012) has recently shown that aging specifically impairs the ability to switch from an egocentric to an allocentric navigational strategy during a virtual maze task. This finding is important to the concept of maintenance by experience because the onset of locomotion in infancy is associated with more flexible use of the two strategies during spatial search and coding tasks. It would be interesting to see whether older adults with mobility impairments, or who were more sedentary, would have more difficulty switching to an allocentric strategy than those without an impairment or those who were more physically active.

In summary, the concept of maintenance by experience not only highlights the enduring effects of locomotor experience, but offers an alternative way to conceptualize the relation between gait dysfunction and cognitive decline in the elderly. Rather than view the relation as unidirectional, i.e., neural and vascular changes lead to a deterioration in gait and cognitive function, with the deterioration in gait continuing as executive function becomes increasingly compromised, it may be more appropriate to view the relation as bidirectional. Impaired mobility is very likely to exacerbate cognitive impairment because it limits the interaction with the environment that is known to drive structural and functional changes in the brain. We will elaborate on this idea in the next section.

## **WHAT IMPLICATIONS DO MOTOR DISABILITIES HAVE FOR PSYCHOLOGICAL DEVELOPMENT?**

We have already noted that infants who are delayed in the onset of locomotion for neurological or orthopaedic reasons have also been shown to be delayed in the development of spatial-cognitive skills. These findings have been confirmed in a recent longitudinal study of seven infants with spina bifida who were tested on three spatial-cognitive paradigms prior to and after the onset of independent crawling (Rivera, 2012). The first paradigm assessed visual proprioception in the moving room. The second paradigm assessed the ability to follow the point and gaze gesture of an experimenter and the third paradigm assessed the ability to extract the invariant form of an object that was presented in multiple sizes, orientations, and colors. Consistent with the Campos et al. (2009) findings, the infants showed marked improvements on each of the spatial-cognitive paradigms following the acquisition of crawling, which occurred at an average age of 19.6 months, well after typically-developing infants begin to crawl. In addition, we have also noted already that infants who engage in effortful forms of locomotion, like belly crawling, don't appear to profit, in terms of psychological consequences, from their locomotor experience. We suspect that at least some of the cognitive deficits that have been noted in older children and adults with motor disabilities might be attributable to a lack of locomotor experience or delays in locomotor experience, particularly if those delays straddle sensitive periods in the development of the psychological skills in question.

The idea that motoric limitations might contribute to limitations in perceptual and spatial-cognitive functioning in children with motoric disabilities is not new (e.g., Abercrombie, 1964, 1968; Kershner, 1974). Limited evidence currently exists, however, to support the idea and the current model in developmental pediatrics has a strong bias against motoric factors playing a role in the psychological development of children with disabilities (Anderson et al., 2013). A major problem with accepting a role for motoric factors in the psychological development of children with physical disabilities has been the difficulty associated with separating the role of brain damage from that of mobility impairment in any psychological deficits that are discovered. Brain damage is often the cause of the primary motor impairments seen in children with physical disabilities and that same damage is obviously implicated in any co-occurring spatial-cognitive deficits.

Despite the above-mentioned difficulties, there is clear evidence that limited opportunities to explore the environment can impede the development of spatial-cognitive skills. Notably, in reference to the previous section, navigation is one of the skills that is most severely affected. One of the first studies to examine the effects of limited exploration on the development of navigation skills was conducted by Simms (1987). We have already discussed the more flexible use of egocentric and allocentric spatial coding strategies that accompanies the shift to independent locomotion in typically developing children as well as the difficulties that older adults often have using allocentric strategies. The development of spatial coding does not end, however, once the child has acquired the ability to use allocentric strategies. Rather, it continues to develop as children learn routes to target locations and ultimately learn to integrate routes and landmarks into an overall representation of the environment (Piaget and Inhelder, 1948; Siegel and White, 1975). In Simms's (1987) study, nine young adults with spina bifida and nine able-bodied controls had to learn routes while being driven through a traffic-free road system and a busy village. Compared to able-bodied controls, the young people with spina bifida took significantly longer to learn a route, noticed fewer landmarks, were less able to mark routes on a map, and produced poorer hand drawn maps. Importantly, the participants' level of mobility was linked to spatial skill, with walkers performing better than wheelchair users.

More recent studies have confirmed that children with physical disabilities have difficulties acquiring spatial knowledge related to navigation (e.g., Foreman et al., 1989, 1990; Stanton et al., 2002; Wiedenbauer and Jansen-Osmann, 2006) and have demonstrated that the severity of motor disability and the severity of brain damage make independent contributions to spatial-cognitive impairments (Pavlova et al., 2007). The study by Foreman et al. (1990) is particularly revealing because it shows that active decision making may be one of the key mediators in the link between locomotion and the acquisition of spatial knowledge. In two experiments, 4–6 year-old children were tested for their ability to retrieve objects that were strategically positioned within a large room. The children were first familiarized with the object positions in one of four locomotor conditions: (1) independently walking between positions, (2) walking but being led by an experimenter, (3) passively transported in a wheelchair, or (4) passively transported in a wheelchair while directing the experimenter where to go. The results showed that children who walked independently or directed the experimenter while being pushed in the wheelchair performed most successfully on the task. Thus, control over decision making was the crucial determinant of spatial search performance following navigation through the room and not the means by which locomotion was achieved. This finding is important because it further highlights the distinction between the experiences that are associated with locomotion and the means by which locomotion is achieved. A considerable body of research with typically developing children now shows that *active* locomotion facilitates spatial search performance (Yan et al., 1998).

When the studies linking crawling experience with spatial-cognitive development in infants with spina bifida are combined with the studies showing spatial-navigational deficits in older children with physical disabilities, the evidence in favor of the hypothesis that impaired mobility contributes to impaired psychological development is already quite strong and growing stronger. Nevertheless, considerably more work needs to be done in this area before clinicians will accept the hypothesis without reservation. In the meantime, it is encouraging that some researchers and clinicians are already exploring the psychosocial benefits that might stem from early powered-mobility training in children with mobility impairments (e.g., Lynch et al., 2009; Ragonesi et al., 2010). Continued work in this broad area is imperative given the millions of children with physical disabilities world-wide who could potentially profit from our deeper understanding of the relation between locomotor impairments and psychological deficits.

## **CONCLUDING COMMENTS**

The onset of independent locomotion is a momentous event in human development. It marks a major transition toward independence from caregivers, it creates an explosion of new choices for the infant, and it heralds a remarkably broad set of changes in psychological functioning. Overwhelming evidence suggests that locomotion is not merely a maturational antecedent to these changes. Rather, the changes are a function of the specific experiences that accompany moving oneself through the world. Consistent with the idea that development is probabilistic, infants could potentially be exposed to these experiences in non-locomotor ways and thus acquire the psychological skills through alternative developmental pathways. However, the acquisition of these skills through alternative pathways in the typically-developing infant is likely rare. What makes locomotion significant is that it virtually guarantees that infants will encounter the requisite experiences that drive a host of important psychological changes, many of which were not documented in this paper and many of which remain to be discovered. Even though self-produced locomotion may not be necessary for these changes to take place, locomotion is significant because in the ecology of the typically-developing infant it is the most common means by which these changes happen.

The enduring significance of locomotion stems from the fact that, once acquired, it is typically maintained, though it also becomes more effectively controlled, more efficient, and more adaptable to a range of different morphological and contextual constraints. Locomotion can thus serve as a permanent framework for the maintenance of the psychological skills it helped to engender in the first place. Moreover, the onset of new locomotor skills, like walking or running, will likely have consequences for the development of more sophisticated psychological skills. This hypothesis is already being tested. The maintenance idea has important implications for our understanding of the declines in psychological functioning that occur when locomotion is compromised by aging, injury, disease, or disability, and it deserves to be scrutinized much more carefully. Equally worthy of further scrutiny are the psychological consequences associated with motor disabilities that delay the acquisition of independent locomotion or impair its quality once acquired. Many questions remain unanswered about the specific processes by which locomotion brings about psychological changes as well as the specific changes in neural structure and function that can be tied to locomotion. Questions also remain about the acquisition of other motor skills that may have implications for psychological development. Addressing all of these questions could markedly enhance not only our understanding of the specific role that locomotion plays in psychological processes across the lifespan, but also the broader role that action plays in those same processes. Ultimately, we argue that the acquisition of any skill that dramatically changes the relation between the person and the environment must have consequences for psychological functioning. This idea has significant implications for the way we view and understand human development.

## **ACKNOWLEDGMENTS**

This research was supported by grants from the Japanese Society for the Promotion of Science (KAKENHI-22330191), the John D. and Catherine T. MacArthur Foundation, the National Institutes of Health (HD-07181, HD-07323, HD-25066, HD-36795 and HD-39925), and the National Science Foundation (SBR-9116151, BCS-0002001, and BSC-0958241), by a fellowship from the Center for Advanced Study in the Behavioral Sciences, and by a "Research Infrastructure in Minority Institutions" award from the National Center on Minority Health and Health Disparities, P20 MD00262.

## **REFERENCES**


approach can transcend the nativist–empiricist debate. *Cognit. Dev.* 28, 96–133. doi: 10.1016/j.cogdev.2013.01.002


older adult cognition: a cluster randomized clinical trial. *Am. J. Prev. Med.* 42, 109–119. doi: 10.1016/j.amepre.2011.10.016


23, 162–171. doi: 10.1016/0022-0965(77)90082-0


267–309. doi: 10.1111/j.1749-6632.1990.tb48900.x


basal ganglia. *Front. Psychol.* 3:535. doi: 10.3389/fpsyg.2012.00535


*of Child Psychology: Cognition, Perception and Language.* Vol. 2, eds W. Damon, D. Kuhn, and R. Siegler, (New York, NY: John Wiley), 199–254.


perspectives on infant development," in *Theories of Infant Development*, eds G. Bremner and A. Slater (Oxford: Blackwell), 121–144. doi: 10.1002/9780470752180.ch5


342–348. doi: 10.1016/j.tics.2007.06.009


early mobility on shortcut performance in a simulated maze. *Behav. Brain Res.* 136, 61–66. doi: 10.1016/S0166-4328(02)00097-9


psychology. *Hum. Dev.* 50, 127–153. doi: 10.1159/000100943


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2013; accepted: 25 June 2013; published online: 23 July 2013. Citation: Anderson DI, Campos JJ, Witherington DC, Dahl A, Rivera M, He M, Uchiyama I and Barbu-Roth M (2013) The role of locomotion in psychological development. Front. Psychol. 4:440. doi: 10.3389/fpsyg.2013.00440*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Anderson, Campos, Witherington, Dahl, Rivera, He, Uchiyama and Barbu-Roth. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

REVIEW ARTICLE published: 03 June 2013 doi: 10.3389/fpsyg.2013.00273

## Choosing actions

#### **David A. Rosenbaum<sup>1</sup>\*, Kate M. Chapman<sup>1</sup>, Chase J. Coelho<sup>1</sup>, Lanyun Gong<sup>1</sup> and Breanna E. Studenka<sup>2</sup>**

<sup>1</sup> Department of Psychology, Pennsylvania State University, University Park, PA, USA

<sup>2</sup> Department of Health, Physical Education, and Recreation, Utah State University, Logan, UT, USA

#### **Edited by:**

Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA

#### **Reviewed by:**

Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA; T. Andrew Poehlman, Southern Methodist University, USA

#### **\*Correspondence:**

David A. Rosenbaum, Department of Psychology, 448 Moore Building, Pennsylvania State University, University Park, PA 16802, USA e-mail: dar12@psu.edu

Actions that are chosen have properties that distinguish them from actions that are not. Of the nearly infinite possible actions that can achieve any given task, many of the unchosen actions are irrelevant, incorrect, or inappropriate. Others are relevant, correct, or appropriate but are disfavored for other reasons. Our research focuses on the question of what distinguishes actions that are chosen from actions that are possible but are not. We review studies that use simple preference methods to identify factors that contribute to action choices, especially for object-manipulation tasks. We can determine which factors are especially important through simple behavioral experiments.

**Keywords: action selection, degrees-of-freedom problem, motor control, behavioral psychology, choosing actions**

## **INTRODUCTION**

Actions make psychological activity tangible, for it is through actions that decisions are expressed. To be on the frontier of psychology, therefore, it is desirable not just to know what actions are chosen but also how they are. The actions of interest can be large-scale, as in deciding whether to stay in school or drop out; or they can be small-scale, as in raising one's eyebrow or nodding in a way that conveys less than full agreement. The actions need not be communicative, however. They can be purely functional, as in reaching for a cup of coffee when one is alone. Such functional actions can also be carried out in different ways, quickly and assuredly, for example, or slowly and hesitantly.

Psychologists have paid little attention to the way actions are physically expressed. Instead, they have typically focused on the instrumental outcomes of behavior, the most famous example being B. F. Skinner's research, in which rats pressed on levers or pigeons pecked on keys to get rewards or avoid punishments (e.g., Skinner, 1969). How the rats pressed the levers or how the pigeons pecked the keys was of less interest than which devices were activated when.

The restriction of focus to switch closures, whether achieved with limbs or beaks, is understandable when one's methods of recording behavior are primitive. It is much easier to record which electrical switch is closed in a Skinner box than to quantify the detailed properties of movement trajectories. Still, the manner in which movements are made may be relevant not just for conveying subtleties of communication or for determining whether a task is performed confidently. How movements are made may also be relevant for shedding light on motor control itself.

Consider the simple act of pressing an elevator button. An elevator summoned by a button press is indifferent to the movements made to press the button. Still, the movements made to press the button are a concern for the person pressing the button. This is obvious for someone with a movement disability, but even for neurologically typical individuals, there is a non-trivial problem to be solved in pressing an elevator button. The number of possible joint configurations that let the finger press the button is limitless. In addition, for any given joint configuration achieved at the time of the press, the number of paths leading to that joint configuration is limitless as well. Finally, for every one of those paths to the final configuration, the timing possibilities are boundless, too. So even for a task as trivial as pressing an elevator button, the number of possible actions is infinite. A core question in motor control is how, for situations like this, particular actions are chosen.

## **APPROACHES TO ACTION SELECTION**

The problem of choosing actions in the sense just discussed was first recognized by Bernstein (1967), who referred to the matter as the *degrees-of-freedom* problem. As Bernstein appreciated, the degrees of freedom of the body exceed the degrees of freedom associated with the ostensive description of most tasks to be achieved. An elevator button, for example, has six (positional) degrees of freedom – the three spatial coordinates of its center, and the three orientation coordinates of its plane (pitch, roll, and yaw). The width of the button (governing its tolerance for aiming errors) is relevant as well, as is the force needed to complete the press. Together, these amount to eight degrees of freedom.

The degrees of freedom of the body of a typical person intent on pressing a button are vastly greater. Considering only the skeleton, a person's upper arm has three degrees of freedom (rotation about the *x*, *y*, and *z* axes), the forearm has two degrees of freedom (flexion/extension and twisting), and each finger joint adds its own degrees of freedom. Adding the joints of the spine, hip, knee, and ankle, still more degrees of freedom come along. How the head is oriented enters as well, how the eyes are oriented factors in, and so on. Quickly, the bodily degrees of freedom exceed the eight associated with the button, and this ignores the vicissitudes of the muscles affecting the joints and the nerves driving the muscles, which create an even greater explosion of possibilities.

How can one make progress on the challenge of choosing particular actions when infinitely many achieve a task? In the literature on this topic, three approaches have been taken. Two were pursued by Bernstein (1967). A third emerged after him.

#### **COUPLING**

One approach that Bernstein (1967) pursued was to identify functional dependencies between effectors. Bernstein's idea was that linkages between effectors could limit the degrees of freedom to be controlled.

At an abstract level, this approach can be appreciated by considering **Figure 1**, which shows, in one case, two independent points in a plane and, in the other, two points joined by a line of fixed length. In the first case, there are four degrees of freedom: the *x* and *y* values of point A, and the *x* and *y* values of point B. In the second case, there are three degrees of freedom: the *x* and *y* of one point and the angle of the line, whose length is fixed, from A to B. This simple example, adapted from Saltzman (1979), shows how coupling can reduce the degrees of freedom to be managed.
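The counting argument in this example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `degrees_of_freedom` and `point_b` are hypothetical, and each fixed-length link is assumed to remove exactly one degree of freedom.

```python
import math


def degrees_of_freedom(n_points: int, dims: int = 2, n_fixed_links: int = 0) -> int:
    """Free coordinates of a set of points, assuming each fixed-length
    link between two points removes exactly one degree of freedom."""
    return n_points * dims - n_fixed_links


def point_b(ax: float, ay: float, angle: float, length: float) -> tuple:
    """Position of point B when it is tied to A by a line of fixed length.
    Only three numbers vary (ax, ay, angle); length is a constant."""
    return (ax + length * math.cos(angle), ay + length * math.sin(angle))


independent = degrees_of_freedom(2)                 # two free points in a plane
coupled = degrees_of_freedom(2, n_fixed_links=1)    # same points joined by a rigid line

# Whatever the three free values are, the distance from A to B stays fixed.
bx, by = point_b(1.0, 2.0, angle=0.7, length=3.0)
distance = math.hypot(bx - 1.0, by - 2.0)
```

Here `independent` corresponds to the four degrees of freedom of the uncoupled case and `coupled` to the three of the linked case.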

Does coupling exist in actual motor performance? The answer, resoundingly, is Yes. As noticed by von Holst (1939), when fish oscillate their dorsal fins and then start to oscillate their pectoral fins, the dorsal fin oscillations change. When von Holst asked human subjects to do something similar, raise and lower one outstretched arm at a fixed frequency and then at other frequencies, the oscillations of the control arm changed. Such limb interactions occur reliably and have been studied in detail (e.g., Swinnen et al., 1994).

What do these results imply about the degrees-of-freedom problem? They might be taken to suggest that dependencies between effectors obviate the problem, but there is a difficulty with this suggestion. Linkages are not fixed but rather come and go depending on what needs to be achieved. During speech, for example, the upper lip moves down toward the lower lip more quickly than usual if the lower lip rises more slowly than usual (and vice versa), but this is only true when the sound to be produced requires bilabial closure, as in "p" or "b." It is not the case when the sound to be produced is a fricative, as in "f" or "v" (Abbs, 1986).

The manifestation of coupling also depends on how the task is presented. When the perceptual representation of the task is simplified, actions that are otherwise difficult to perform can be easy (Mechsner et al., 2001). Similarly, if the hands haptically track moving objects, staying in light touch with the objects while the objects move, two circularly moving objects turning at different frequencies can be haptically tracked essentially perfectly no matter what the frequency relation between them. By contrast, generating two circles with those same frequencies is nearly impossible if the circles are drawn through more conventional means, such as drawing them on a blackboard (Rosenbaum et al., 2006b).

#### **MECHANICS**

The second track that Bernstein (1967) pursued to address the degrees-of-freedom problem was to appeal to exploitation of mechanics. His idea was that action control can be simplified by exploiting mechanical interactions between the body and outer world.

Examples of motor performance that reflect exploitation of mechanics abound. A delightful example concerns babies in Jolly Jumpers. Suspended in their little seats, dangling via elastic cords from firm hooks above, babies learn to push on the floor at just the right pace and force to get the most "bang for the buck" (Goldfield et al., 1993).

Once babies and toddlers learn to walk, they continue to exploit mechanics. During mature walking there is a stance phase and a swing phase for each foot. During the stance phase the foot is *on* the ground, whereas during the swing phase the foot is *off* the ground. During the swing phase there is remarkably little muscle activity once the swing is initiated. The swing is completed, however, because the leg is swung forward and then pulled down via gravity. It turns out that people switch from walking to running as locomotion speed increases at just the speed where leg lowering would occur more quickly than is achievable by letting gravity pull the leg down. At this critical speed, the transition is made from walking (a series of controlled falls), to running (a series of controlled leaps) (Alexander, 1984).
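A rough feel for such a gravity-imposed speed limit can be had from a common textbook idealization, which is assumed here for illustration and is not spelled out in the passage above: the stance leg is treated as a rigid inverted pendulum of length *L*, so the body travels on an arc and gravity can supply the needed centripetal acceleration only while *v*²/*L* ≤ *g*. The function name and the 0.9 m leg length are hypothetical choices.

```python
import math


def pendulum_walking_speed_limit(leg_length_m: float, g: float = 9.81) -> float:
    """Upper bound on walking speed in an idealized inverted-pendulum model.

    Assumption (not taken from the text): the body moves on a circular arc
    of radius leg_length_m over the stance foot, so v**2 / L must not
    exceed g. Solving for v gives sqrt(g * L).
    """
    return math.sqrt(g * leg_length_m)


# Hypothetical adult leg length of 0.9 m gives a ceiling of roughly 3 m/s.
limit = pendulum_walking_speed_limit(0.9)
```

The bound is a ceiling in an idealized model, not a prediction of the exact speed at which any particular person changes gait.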

Does exploitation of mechanics solve the degrees-of-freedom problem? Perhaps to some extent in some circumstances. For example, exploitation of mechanics has been shown to be a useful way to avoid copious computation for robot trajectories (Collins et al., 2005). Still, it is unclear how far one can go with this approach, for it fails to explain the richness and diversity of voluntarily shaped performance.

#### **CONSTRAINTS**

If neither the coupling approach nor the mechanics approach fully solves the degrees-of-freedom problem, what approach can do so? Toward answering this question, it is useful to return to the way we introduced the degrees-of-freedom problem earlier in this article. We noted that Bernstein (1967) couched the problem in terms of the degrees of freedom of the body relative to the degrees of freedom associated with the ostensive description of the task to be achieved. The key phrase for us as psychologists is "ostensive description." What we mean is that while a task description has *some* properties, the individual approaching the task adds more properties to the description – enough of them, in fact, to fully describe the problem and thereby, in effect, solve it. For example, if the task is "to press the elevator button," the person about to perform this task might add more constraints, such as ". . . with an effector that can easily be brought to the button." The effector might be the right index finger, but if the individual were holding a squirming baby, some other effector might be used instead.

Saying that constraints limit action choices raises the question of how scientists can identify those constraints. To begin with, note that if constraints limit the range of possible actions, the constraints that do so correspond to the features of actions that are performed. Similarly, actions that could achieve the task but are not performed lack those features. Not all constraints are equally important, however. If an elevator button must be pushed, it is probably more important to press the button with a finger than to carry one's finger to the button with some desired average speed.

Given this pair of points – that constraints are mirrored in the features of selected actions and that some constraints are more important than others – the challenge for psychologists interested in action selection is to discover which constraints are more important than which others. Determining the ranking or weighting of constraints achieves two things. First, it obviates the need to say which constraints are relevant and which are not. That is, instead of adopting such a binary classification, all possible constraints can be, and indeed must be, included. What distinguishes the constraints, then, is their weights. Some constraints have large weights. Others have small weights, including weights that are vanishingly small (i.e., nearly zero or zero itself).

Second, the weights of the constraints define the task as represented by the actor. This point is of inestimable importance for psychology because so much of psychological research is about performance of one task or another – the Stroop task, the Flanker task, and so on. What a task is – how it is represented by someone performing it – is rarely considered, but the issue is core to understanding action selection and psychology more broadly. If a bus driver sees his task as setting people straight about how to enter his bus, then the way his passengers feel about him will be very different than if he sees his task as greeting his passengers as warmly as he can.

A mathematical formalism can help pave the way for where we will go with this. The formalism lets us depict tasks in an abstract "task space" (**Figure 2**) and lets us introduce a hypothesis about minimization of transitions within this space.

An elementary task, *T*, performed at time 1 can be defined as a vector of constraint weights, $w_{1,1}$ for constraint 1, $w_{2,1}$ for constraint 2, and so on, all the way up to $w_{n,1}$ for constraint *n* at time 1:

$$T_1 = \left(w_{1,1},\; w_{2,1},\; \ldots,\; w_{n,1}\right)$$

The weights can be visualized as a point in task space, as seen in **Figure 2**. The axes of the space correspond to the weights (between 0 and 1) for the possible constraints. **Figure 2** shows just two constraint weights, for graphical convenience. In this illustration, ellipses contain the possible weights for achieving a given elementary task. A single point within the ellipse is highlighted to show which weight combination is chosen.

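It is also possible to consider *series* of elementary task solutions, as shown in the next equation, where we extend the first equation to one in which all the weights take on values for a range of times from time 1 up to time *t*:

$$T_{1,\ldots,t} = \begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,t} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,t} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,t} \end{pmatrix}$$
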
Two task series are shown in **Figure 2**. In the case on the left, the first elementary task solution takes into account which point will be chosen for the second elementary task. In the right panel, though the set of possible solutions for the first elementary task is the same as in the left panel, the weighting pair chosen within it is different. The reason is that a different task is required next.

What we are saying is that actions may be selected in a way that minimizes transitions through task space. This idea has been appreciated before (e.g., Jordan and Rosenbaum, 1989) and is particularly well known in connection with speech co-articulation, where the way a sound is produced depends on what sounds will follow (Fowler, 2007).
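The hypothesis about minimizing transitions can be made concrete with a small Python sketch. It is a toy under stated assumptions, not the authors' model: the ellipses, the grid sampling, the Euclidean distance measure, and the names `ellipse_points` and `best_series` are all hypothetical. Each elementary task is a set of admissible weight pairs, and one pair is chosen per task so that the summed distance between successive choices is as small as possible.

```python
import itertools
import math


def ellipse_points(cx, cy, rx, ry, steps=21):
    """Grid of (weight_1, weight_2) pairs inside an axis-aligned ellipse,
    keeping only pairs whose weights lie between 0 and 1."""
    points = []
    for i in range(steps):
        for j in range(steps):
            x = cx - rx + 2 * rx * i / (steps - 1)
            y = cy - ry + 2 * ry * j / (steps - 1)
            inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 + 1e-9
            if inside and 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                points.append((x, y))
    return points


def path_length(path):
    """Summed Euclidean distance between successive points in task space."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def best_series(tasks):
    """Exhaustively choose one point per task so that the total transition
    length through task space is minimal (fine for short series only)."""
    return min(itertools.product(*tasks), key=path_length)


# Same first task in both series; only the upcoming task differs.
task_1 = ellipse_points(0.5, 0.5, 0.2, 0.1)
next_upper_left = ellipse_points(0.2, 0.85, 0.1, 0.1)
next_lower_right = ellipse_points(0.8, 0.15, 0.1, 0.1)

first_a = best_series([task_1, next_upper_left])[0]
first_b = best_series([task_1, next_lower_right])[0]
```

In this toy, `first_a` and `first_b` come from the same set of admissible weights but differ because the following task differs, which is the anticipatory pattern the two task series are meant to illustrate.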

## **OBJECT MANIPULATION**

In our laboratories at Penn State and Utah State, we have been concerned with manual control rather than speech control. Our particular interest within the domain of manual control has been object manipulation. Object manipulation is particularly interesting to us because we take a cognitive approach to action selection. In studies of object manipulation the same object can be used for different purposes. A pen can be used for writing or for poking, a knife can be used for slicing or for jabbing, and so on (Klatzky and Lederman, 1987). This feature of object manipulation makes the associated tasks attractive to us given our cognitive bent. The same participant can be exposed to the same object in the same position and can be instructed or otherwise induced to use the object with different goals. Differences in the way participants grasp or handle the object depending on the future task demands can be ascribed to differences in the participants' mental states.

#### **ORDER OF PLANNING**

Yet another attraction of object manipulation is that one can study planning effects of different orders. One can look for *first-order* planning effects, reflecting the influence of the object being reached for in its present state; or one can look for *second-order* planning effects, reflecting the influences of what is to be done *next* with the object; or one can look for *third-order* planning effects, reflecting the influences of what is to be done *after that*; and so on (Rosenbaum et al., 2012). The highest-order planning effect that can be observed can be taken to reflect the planning span. For discussions of planning spans for speaking and typewriting, see Sternberg et al. (1978) and Logan (1983).

As long as there are second- or higher-order planning effects in object manipulation, those effects can be viewed as manual analogs of speech co-articulation. We can, in fact, coin a phrase to highlight this association. Just as there are co-articulation effects for speech, we can say there might be "co-manipulation" effects for manual control.

One would expect co-manipulation effects if the cognitive substrates of co-articulation extended to manual behavior. Saying this another way, to the extent that manual control is present in many animals whose evolutionary past does not yet equip them with the capacity for speech, the capacity for co-manipulation may set the stage for co-articulation.

## **NATURALISTIC OBSERVATION**

Granted that co-manipulation would be interesting to discover, how could one look for it? A first thought is to observe the microscopic features of manual behavior in the laboratory, taking advantage of technical systems for recording and quantifying properties of limb movements (e.g., Cai and Aggarwal, 1999). We have used such systems in our research (e.g., Studenka et al., 2012). However, the method we have generally favored has been simpler. We have preferred to observe behavior in situations where there are two easily observed ways of grasping any given object, especially when one of those ways can be plausibly linked to what will be done with the object. We like this approach because it can be pursued in the everyday environment, permitting or, better yet, *encouraging*, naturalistic observation.

It was, in fact, a naturalistic observation that paved the way for most of the research to be described in this article. While the first author was eating at a restaurant, he observed a waiter filling glasses with water. Each glass was inverted and the waiter had to turn each glass over to pour water into it. The waiter grasped each glass with his thumb *down*, whereupon he turned the glass over and poured water into the glass, holding the glass with his thumb *up*. Finally, he set the filled glass down, keeping his thumb up, and then proceeded to the next glass, turning his hand to the thumb-down position as he prepared for the next episode of glass filling.

The usual way of reaching for a glass is, of course, to take hold of it with a thumb-up posture. Why, then, did the waiter grasp the glass with his thumb down? Grasping an inverted glass with a thumb-down posture afforded a thumb-up hold when the waiter poured water into the glass and then set it down on the table. If the glass had been picked up with the thumb up, the resulting thumb-down posture would have made the subsequent pouring and placement awkward. At extreme forearm rotation angles (e.g., thumb-down angles) as compared to less extreme forearm rotation angles (e.g., thumb-up angles), rated comfort is lower (Rosenbaum et al., 1990, 1992, 1993), muscular power is lower (Winters and Kleweno, 1993), joint configuration variance is higher (Solnik et al., 2013), and maximum oscillation rates, which are critical for quick error correction, are lower as well (Rosenbaum et al., 1996). For any of these reasons, it made sense for the waiter to grasp each inverted glass as he did.

## **TWO-ALTERNATIVE FORCED CHOICE PROCEDURE**

The waiter's adoption of a thumb-down posture was consistent with the model shown in **Figure 2**. The waiter's decision to grasp the glass thumb-down shows that he was aware of (or had learned) what he would do next with the glass, so his action selection reflected second-order (or possibly higher-order) planning.

The waiter's maneuver was detected in a single naturalistic observation, so it was important to replicate the result in the laboratory. The laboratory method that was used relied on the two-alternative forced choice procedure. The two alternatives per trial were readily categorized actions, either of which was possible for the task at hand but only one of which was typically preferred (or expected to be preferred) over the other.

The logic of the approach was to find out how often one alternative was favored over the other depending on the nature of the choice difference. The approach proved useful, as indicated in the raft of studies that have used it (Rosenbaum et al., 2012). In the present article, we cover some of the major results of this work, including several findings that emerged after preparation of the review article just cited. Specifically, we review (1) findings that have been obtained about choice of grasp *orientation*, both in neurologically typical and neurologically atypical adults and children and in non-human primates; (2) findings concerning grasp *locations* along objects to be moved; and (3) findings concerning selection of actions that involve walking as well as reaching.
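The logic of tallying how often one alternative is favored can be sketched as follows. The trial records are made-up placeholders rather than data from any study, and the function name `preference_proportions` and the labels `"A"`, `"B"`, `"condition_1"`, and `"condition_2"` are hypothetical.

```python
from collections import Counter


def preference_proportions(trials):
    """Given (condition, choice) pairs from a two-alternative task, return
    the proportion of trials on which each alternative was chosen,
    separately for each condition."""
    counts = {}
    for condition, choice in trials:
        counts.setdefault(condition, Counter())[choice] += 1
    return {
        condition: {choice: n / sum(tally.values()) for choice, n in tally.items()}
        for condition, tally in counts.items()
    }


# Invented trials for illustration only.
example_trials = [
    ("condition_1", "A"), ("condition_1", "A"),
    ("condition_1", "A"), ("condition_1", "B"),
    ("condition_2", "B"), ("condition_2", "B"),
]
proportions = preference_proportions(example_trials)
```

Comparing the proportions across conditions is what indicates whether the choice difference built into the conditions shifts the preference.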

#### **GRASP ORIENTATION IN HEALTHY YOUNG ADULTS**

The first laboratory test of the tendency to select initially distinct grasp orientations in the service of later grasp orientations (Rosenbaum et al., 1990) involved presenting university students with a horizontally oriented wooden dowel resting on stands beneath the dowel's ends (**Figure 3**). One end of the dowel was white; the other end was black. A circular disk target was placed on either side of the stand, closer to where the participant stood, and the participant was asked to reach out with the right hand to grasp the dowel and place either the black end or white end into a specified target. The task used a two-alternative forced choice method, though the two alternatives were not explicitly named for the participants. They could either grasp the dowel with an overhand (palm-down) grasp, or they could grasp the dowel with an underhand (palm-up) grasp. The dowel placement could likewise end in either of two ways: with a thumb-up posture or with a thumb-down posture. Ratings from the participants indicated that they found the thumb-down posture uncomfortable. In addition, they found the underhand (palm-up) posture less uncomfortable, and they found the overhand (palm-down) posture and the thumb-up posture least uncomfortable (most comfortable).

**FIGURE 3 | Dowel task transport (Rosenbaum et al., 1990) demonstrating the end-state comfort effect**. In **(A)**, the black and white dowel rests on a cradle with a target on either side of the cradle. In **(B)** the dowel's black end was to be placed in the left or right target. In **(C)**, the dowel's white end was to be placed in the left or right target. The numbers near the black and white ends of the dowel represent the number of participants who grasped the dowel with the thumb directed toward that colored end of the dowel. (Image from Rosenbaum et al., 2006a.)

For the action rather than the rating task, the main result was that participants consistently chose an initial grasp orientation that facilitated a thumb-up posture when the dowel was placed onto the target. When the participants were asked to place the black (left) end of the dowel in the target, they picked up the dowel with an overhand grasp, which allowed them to end in a thumb-up orientation. By contrast, when participants were asked to place the white (right) end of the dowel in the target, they picked up the dowel using an *underhand* grasp, which also allowed them to end in the same thumb-up orientation. Regardless of the end that needed to be placed on the target, therefore, participants altered their initial grasps in a way that ensured a comfortable final grasp orientation. Rosenbaum et al. (1990) called this the *end-state comfort* effect.
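The choice pattern in the dowel task can be mimicked with a small weighted-cost sketch. It is only a toy: the discomfort scores are hypothetical numbers ordered to match the ratings described for this task, the mapping from grasp to final posture follows the description of the task given here, and the function name `choose_grasp` and its weight parameters are assumptions.

```python
# Hypothetical ordinal discomfort scores (higher means less comfortable).
DISCOMFORT = {"overhand": 1, "underhand": 2, "thumb_up": 1, "thumb_down": 3}

# Final posture for each (end to be placed, initial grasp) combination,
# following the task description: overhand suits the black end and
# underhand suits the white end.
FINAL_POSTURE = {
    ("black", "overhand"): "thumb_up",
    ("black", "underhand"): "thumb_down",
    ("white", "underhand"): "thumb_up",
    ("white", "overhand"): "thumb_down",
}


def choose_grasp(end_to_place, weight_initial=1.0, weight_final=1.0):
    """Pick the initial grasp with the lowest weighted discomfort, summing
    the cost of the initial grasp and the cost of the final posture."""
    def cost(grasp):
        final = FINAL_POSTURE[(end_to_place, grasp)]
        return weight_initial * DISCOMFORT[grasp] + weight_final * DISCOMFORT[final]

    return min(("overhand", "underhand"), key=cost)


black_choice = choose_grasp("black")                      # final comfort counts
white_choice = choose_grasp("white")                      # final comfort counts
white_ignoring_end = choose_grasp("white", weight_final=0.0)  # only initial comfort
```

With a nonzero weight on the final posture, the sketch picks the less comfortable underhand grasp for the white end, whereas setting that weight to zero yields the overhand grasp for both ends.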

After this first laboratory demonstration of the end-state comfort effect, many studies confirmed that the tendency to prioritize the grasp orientation at the end of a movement emerges in a wide variety of tasks. It was found that participants showed a preference for end-state comfort when the dowel task was reversed, so a vertical dowel was brought to a horizontal resting position; the final posture was a comfortable palm-down posture (Rosenbaum et al., 1990). When participants were asked to pick up an inverted cup and fill it with water, they chose an initially uncomfortable grasp and ended in a thumb-up grasp (Fischman, 1997). When participants were asked to grasp a handle and turn it to rotate a disk 180° so a tab would line up with a given location around the disk's perimeter (see **Figure 4**), participants adopted initially uncomfortable grasps to ensure a comfortable grasp orientation at the end of the rotation (Rosenbaum et al., 1993).

The end-state comfort effect emerged not only in *single*-hand tasks, as just described, but also in *bimanual* tasks. In a bimanual version of the dowel transport task, participants grasped two horizontal dowels, one with each hand, and moved them to two vertical positions (Weigelt et al., 2006). Participants grasped the dowels in a way that afforded comfortable thumb-up grasps at the ends of the dowel transports. Participants in other experiments behaved similarly (Janssen et al., 2009, 2010).

Subsequent studies showed that precision rather than end-state comfort *per se* may be the decisive factor in second-order grasp planning. Short and Cauraugh (1999) showed that participants were less likely to grasp a dowel in a way that ensured end-state comfort if the target to which the dowel was moved was wide rather than narrow. A similar effect emerged in a unimanual disk rotation task study in which participants were asked to take hold of a handle in order to turn a lazy susan to an ending orientation (Rosenbaum et al., 1996). In one condition, securing the ending position took very little control, thanks to a bolt that stopped the disk's rotation. In that condition, only half the participants showed the end-state comfort effect. By contrast, a much larger proportion of participants showed the end-state comfort effect when they had to control the final orientation through normal aiming.

All the results summarized in the last paragraph indicate that the term "end-state comfort" may be a misnomer. Ending in a comfortable state may be less important than occupying postures affording the most control. For further evidence, see Künzell et al. (in press).

#### **GRASP PLANNING IN NON-HUMAN ANIMALS**

The evidence just reviewed suggests that consideration of grasp orientation is an important constraint guiding action selection in object manipulation, at least in neuro-typical college students. Is this factor also important in other populations?

Consider first performance by non-human primates. Studies of object manipulation in non-human primates have shed light on the evolutionary history of the cognitive capacities underlying manual action selection. Weiss et al. (2007) tested cotton-top tamarin monkeys on a modified version of the tasks described above. As shown in **Figure 5**, these cotton-top tamarins were presented with a food-baited champagne glass oriented upright or inverted. The base of the glass was removed and a long rod extended the glass's stem. Whether the glass was upright or inverted, a flat plate prevented the monkeys from reaching into the glass to remove a marshmallow visible inside it. To retrieve the food, each individually tested monkey had to slide the glass toward himself or herself to remove the marshmallow.

When the cup was upright, the tamarins grasped the stem with a typical thumb-up orientation. More interestingly, when the cup was inverted, the tamarins grasped the stem with an atypical thumb-down orientation. In the latter case (as in the former) the monkeys ended with the glass held thumb-up (see **Figure 5**). Thus, the tamarins, like college students and waiters, prioritized comfort (or presumed comfort) of final grasp orientations over comfort (or presumed comfort) of initial grasp orientations. This outcome suggests that the cognitive substrates for second-order grasp planning may have been in place as long as 45 million years ago, when the evolutionary line leading to tamarins diverged from the evolutionary line leading to humans (**Figure 6**).

Can the lineage for such planning be placed even farther back in time? Chapman et al. (2010) showed that it could. These authors obtained the same grasp-planning effect when the cup task (slightly modified) was used with lemurs. Lemurs are the most evolutionarily distant living primate relatives of humans (**Figures 6** and **7**). The lemur line diverged from the anthropoid line (the line leading to *Homo sapiens*) approximately 65 million years ago, or 20 million years earlier than for tamarins.

A final remark about evolution is that one would expect the planning ability indexed by grasp planning also to exist for old world monkeys and apes; otherwise, there would be a disconcerting "hole" in the picture. Rhesus monkeys (Nelson et al., 2011) and chimpanzees (Frey and Povinelli, 2012) also show sensitivity to future grasp orientation requirements. As far as we can tell, then, the capacity for second-order grasp planning was in place as long as 65 million years ago and has held fast since that time.

#### **GRASP PLANNING IN BABIES, TODDLERS, AND CHILDREN**

What about ontogenetic rather than phylogenetic development? In humans, the species whose ontogenetic development is of most interest to us, first-order grasp planning takes hold within the first year of life. Babies modify their grasps according to the properties of objects they reach for. The relevant literature is briefly reviewed in a textbook about motor control written mainly for psychologists (Rosenbaum, 2010) and at greater length in a recent handbook chapter (Savelsbergh et al., 2013).

In terms of the development of second-order grasp planning, such planning appears in some toddlers at around 18 months of age (Thibaut and Toussaint, 2010). Surprisingly, though, second-order grasp planning as studied in the manner outlined earlier does not reach adult-like competency until 9 or 10 years of age (Hughes, 1996; Smyth and Mason, 1997; Manoel and Moreira, 2005; Thibaut and Toussaint, 2010; Weigelt and Schack, 2010; Jovanovic and Schwarzer, 2011).

Several studies have also investigated child clinical populations. Autistic children and mildly learning-disabled children show less sensitivity to final grasp orientation than do age-matched controls (Hughes, 1996). Less consistent sensitivity to final grasping posture is also seen in children with cerebral palsy (Crajé et al., 2010) and in children with Williams' syndrome (Newman, 2001).

#### **GRASP PLANNING IN ADULT CLINICAL POPULATIONS**

Studies of adult clinical populations have also revealed grasp-planning deficits. In a task requiring participants to grasp a dowel and rotate it to different positions, individuals with visual agnosia did not consistently choose initial grasps that facilitated comfortable grasp orientations at the ends of the rotations (Dijkerman et al., 2009). Neither did adults with cerebral palsy (Crajé et al., 2009) or with apraxia due to unilateral lesions (Hermsdörfer et al., 1999). On a more upbeat note, adults with autism spectrum disorder exhibited some sensitivity to end-state comfort, not only in themselves but also in others, as shown in a study of handing a tool to another person (Gonzalez et al., 2013). The capacity for anticipating the needs of others was less consistent in the autistic individuals than in their typically developed age-matched peers, however.

**FIGURE 5 | A cotton-top tamarin performing the cup extraction task of Weiss et al. (2007)**. In **(A)**, the monkey grasps an upright cup's stem using a canonical thumb-up posture. In **(B)**, the same monkey grasps the inverted cup's stem using a non-canonical thumb-down posture. (Image from Weiss et al., 2007.)

**FIGURE 7 | A ring-tailed lemur grasping an inverted cup's stem using a thumb-down posture**. The lemur then inverted the cup to remove a raisin from it. (Image from Chapman et al., 2010.)

#### **GRASP HEIGHT**

Constraints that come into play in planning for object manipulation are not only revealed by grasp *orientations*; they are also revealed by grasp *locations*. For example, when grasping a glass to place it on a high shelf, a person might grasp the glass low, near the base, to avoid an extreme stretch during the placement. Similarly, when grasping a glass to place it on a low shelf, the same person might grasp the same glass higher to avoid an extreme downward stretch.

These expectations were borne out in a naturalistic observation made by the first author at his home in the midst of returning a toilet plunger to its normal position on the floor. The details of the incident are unimportant. We will spare you! Suffice it to say that the type of manipulandum for which the phenomenon first appeared proved useful in laboratory experiments, where a fresh plunger was used.

Participants were asked to place the plunger onto shelves of different heights (Cohen and Rosenbaum, 2004). As seen in **Figure 8**, the plunger always began at the same location. The participant was asked to take hold of it with the right hand in order to move it to another shelf of variable height. When the plunger was grasped to be placed on a high shelf, the grasp was low. Conversely, when the plunger was grasped to be placed on a low shelf, the grasp was high. In general, as seen in **Figure 9**, there was an inverse linear relation between target height (the independent variable) and grasp height (the dependent variable) within the range of home and target-heights studied.

This observed relation, which Cohen and Rosenbaum (2004) called the *grasp height* effect, can be understood to reflect a desire to avoid extreme joint angles, similar to what was seen for the hand orientation effects described earlier. Also as for the hand orientation effects, it turned out that required precision played an important role. The grasp height effect was attenuated when placement of the plunger on its target location required less precision than when placement of the plunger on its target location required a lot of precision (Rosenbaum et al., 2006a). This outcome suggested that avoidance of extreme joint angles was sought when greater control was needed, as concluded earlier in connection with grasp orientations.

Another finding from the study of Cohen and Rosenbaum (2004) shed light on the nature of the action selection process. Cohen and Rosenbaum found that grasp heights depended not just on upcoming task demands but also on previous actions. After the plunger was brought from its home position to the target, the participant lowered his or her hand and then returned the hand to the plunger to bring the plunger back to the home position. When participants did this, the grasp heights they adopted were very similar to the grasp height just adopted for the home-to-target grasps (Cohen and Rosenbaum, 2004). Thus, participants did not strive for invariant end postures for the target-back-to-home transports. If they had, they would have grasped the plunger higher from high targets than they originally did (overcoming the tendency to grasp low for high targets at the home site), and they would have grasped the plunger lower from low targets than they originally did (overcoming the tendency to grasp high for low targets at the home site). What participants did instead was to grasp close to where they had just grasped the plunger when they brought it to the target from the home position.

A subsequent study showed that it was the location on the plunger shaft rather than the posture that participants recalled for the return moves. Weigelt et al. (2007) showed this by having participants step up onto, or down from, a stool after moving the plunger from the home to the target and before returning to the home position. Instead of adopting a posture like the one adopted when holding the plunger on the target (just before releasing it), which would have meant holding the plunger at a different point along the shaft after stepping up or down, participants grasped the plunger close to where they had grasped it before, even if this required a very different posture. Thus, participants relied on memory of the grasp location to guide their grasps for the return trip. Relying on that strategy may have required fewer cognitive resources than planning a new action from scratch every time. How different a posture would be tolerated for the return trip is still an open question.

## **REACHING AND WALKING**

The studies reviewed above concerned choices of grasps for forthcoming object manipulations. The studies provided evidence for second-order planning at least. The studies showed that grasps are not just adjusted according to the immediate demands of taking hold of an object based on its currently perceived properties (first-order planning), but instead also depend on what will be done with the object afterward. For evidence that grasp planning can go beyond the second-order, see Haggard (1998).

**FIGURE 8 | Two of the conditions studied by Cohen and Rosenbaum (2004) in their demonstration of the grasp height effect**. The plunger occupies the same starting position in both conditions shown here (and in all five target-height conditions tested). Each participant was instructed to keep his or her left hand in his or her left pocket and to begin each trial with the right hand hanging by the participant's side. Left panel: the highest target shelf tested. Right panel: the lowest target shelf tested. The experimenter is the first author of the present article. The participant gave permission to have his face shown.

Grasp features are not the only aspects of behavior that provide evidence for higher-order object-manipulation planning. Consider a study by Studenka et al. (2012). They asked participants to engage in the everyday task of opening a drawer to grasp an object inside. When participants knew that no object had to be grasped (i.e., they would simply open the drawer) they held the grasping arm lower for the drawer opening than when they knew they would lift an object from the drawer after opening it. Not only was the arm higher in the lifting condition than in the non-lifting condition; the joint angles were also more similar to those that would be adopted for the lift. This outcome is similar to the grasp height effect of Cohen and Rosenbaum (2004) in that it reflects assimilation: features of upcoming behavior are reflected in behavior that comes before. Such assimilation reflects the tendency to minimize differences between immediately forthcoming postures and subsequent postures, as shown in **Figure 2**.

## **STANDING FOR OBJECT MANIPULATION**

Whereas Studenka et al. (2012) looked at arm configurations, it is also possible to look at more macroscopic aspects of behavior to draw inferences about action planning in the context of object manipulation. Specifically, it is possible to study how people approach a space where they know they will manipulate an object. People approaching the space must project themselves to a new position, often in a very different part of space than the one they currently occupy. How they do so is a topic of longstanding interest in psychology.

Little research has been done on whole-body planning of object manipulation, but some work has been done on it in our lab at Penn State. van der Wel and Rosenbaum (2007) asked how people walk up to a table to move a plunger to the left or right over a long or short distance. The long distance was 120% of the subject's arm length; the short distance was 20% of the subject's arm length. Participants began each trial standing some distance from the table: one, two, three, or four steps from the table, where "steps" were defined for each participant based on his or her height. Participants could use whichever hand they wished to perform the plunger displacement task, which involved lifting the plunger and then setting it down the long or short distance away toward the left or right.

The main result was that participants preferred to stand on the foot opposite the direction of forthcoming object displacement if the displacement was large. If the displacement was small, participants displayed no foot preference at the time of manual displacement.

Why did participants stand on the opposite foot for large displacements? Doing so made it possible for participants to rock in the direction of the upcoming manual displacement, landing on the foot ipsilateral to the placement. No such rocking was observed when the manual displacements were small.

What was the effect of the participants' initial distance from the table? To the surprise of van der Wel and Rosenbaum (2007), there was a stronger preference to stand on the foot contralateral to the large forthcoming object shift when participants initially stood *far* from the table than when participants initially stood *near* the table (**Figure 10**). The reason for this outcome was not entirely clear. Participants may have thought they could not navigate as well to position their feet as they wished when they began close to the table. With more steps, however, they may have had more of a chance to adjust their foot positions.

The latter hypothesis was confirmed through an analysis of changes in step lengths as a function of starting distance from the table. van der Wel and Rosenbaum (2007) found that the greater the starting distance, the more the step lengths changed as participants approached the table. So as participants approached the table, they altered their steps to afford contralateral foot support at the time of the large manual displacement.

These results, along with the others summarized in this section, suggest that participants could project themselves into the positions they would need (or want) to adopt for the manual transfers they would perform.

## **WALKING FOR OBJECT MANIPULATION**

If people can mentally project themselves *to* future body positions, might they also be able to project themselves moving *through* those positions? Might they, in other words, be able to imagine themselves carrying out object manipulations while moving through the environment – for example, while grabbing items from a supermarket shelf during a trip down the aisle?

That people can coordinate their reaching and walking in cognitively impressive ways was shown by Marteniuk and Bertram (2001), who compared hand trajectories produced by people moving a cup from one position to another either while standing still or while walking. As seen in **Figure 11**, the hand paths were virtually identical in the two cases, at least when the hand paths were depicted in external, spatial coordinates. When the hand paths were depicted in intrinsic, joint-based coordinates, they were strikingly different.

This finding is reminiscent of a classic result reported 20 years earlier. In that study, Morasso (1981) found that hand paths for point-to-point reaching movements were nearly straight in extrinsic spatial coordinates but were often curved and highly complex in intrinsic, joint-based coordinates. Morasso's result suggested that the motor system puts a premium on generating movements defined with respect to external coordinates. The complexity of motions in intrinsic coordinates suggests that the intrinsic control system – the one responsible for moving and stabilizing muscles – is extremely "clever," somewhat like a highly skilled secretary who works behind the scenes to keep his or her boss looking good (Rosenbaum and Dawson, 2004). In the case of Marteniuk and Bertram's (2001) result, the fact that the motor system could generate simple hand paths in extrinsic space even when people were *walking* is a stunning result. Developing a computational model capable of simulating this capability will be a worthwhile aim for future research.

In the walk-and-reach study of Marteniuk and Bertram (2001), the topic of interest was participants' ongoing behavior. An issue that was not addressed in that report was how far in advance people planned their walks and reaches. That is a topic for which most of the research we know of has come from our own laboratory, where again we have found it useful to rely on the two-alternative forced choice procedure.

In one of our experiments (Rosenbaum et al., 2011), we asked participants to pick up a child's beach bucket on a table and carry it to either of two sites beyond the table (**Figure 12**). To pick up the bucket, the participant could either walk along the left or right side of the table. If the participant walked along the left side of the table, he or she was supposed to pick up the bucket with the right hand and carry it to a target site (a stool) beyond the table's left end. If the participant walked along the right side of the table, he or she was supposed to pick up the bucket with the left hand and carry it to a target site (a different stool) beyond the table's right end. In different trials, the left and right target sites (the left and right stools) occupied different distances from the end of the table. Crossed with this variable, the bucket was close to the left edge of the table, in the middle of the table, or close to the right edge of the table.

Given these possible arrangements, it was possible to study the costs of walking versus reaching. In some conditions, participants had no conflict between these two costs. For example, participants had no conflict if the bucket was near the left edge of the table, the left target was close, and the right target was far away (top panel of **Figure 12**). However, if the bucket was near the right edge of the table, the left target was nearby, and the right target was far away (middle panel of **Figure 12**), participants had a conflict. In that case, participants could either walk along the right side of the table, reaching less but walking more, or they could walk along the left side of the table, reaching more but walking less. Finally, in terms of the examples reviewed here (just some of the conditions tested), if the bucket was in the middle of the table and the left target was farther away than the right (bottom panel of **Figure 12**), participants could walk less by walking along the right side of the table, or they could walk more, walking along the left side of the table, reaching just as far in both cases. If they walked more, they would have to use the less favored hand (the left hand for the participants in this study).

**FIGURE 12 | Three arrangements used by Rosenbaum et al. (2011) to study walking and reaching**. In all cases, the participant stood at the site where these photographs were taken. Top panel: bucket near the left edge of the table and the left target stool is nearby. Middle panel: bucket near the right edge of the table and the left target stool is again nearby. Bottom panel: bucket in the middle of the table and the right target stool is nearby. Adapted from Rosenbaum (2012).

**FIGURE 13 | Probability, p(L), of choosing to walk along the left side of the table (Rosenbaum, 2012)**. Left path functional distance was defined as walking distance + 10.3 × right-hand reaching distance. Right path functional distance was defined as walking distance + 12.3 × left-hand reaching distance, all in meters (m). Adapted from Rosenbaum (2012).

So what was more important, walking less or reaching with the hand that was preferred? With the tasks used, which go beyond those reviewed above, Rosenbaum et al. (2011) could estimate the relative costs of walking over some distance versus reaching over some distance, and they could estimate the relative cost of reaching with the left hand or right. The way they estimated the relative costs is reflected in **Figure 13**, which shows the probability, *p*(*L*), that participants walked along the left side of the table plotted as a function of the difference between two derived measures, "left path functional distance" and "right path functional distance." Left path functional distance was defined as the sum of the walking distance (in meters) plus the right-hand reaching distance (also in meters), with the latter term being multiplied by an empirically fit constant. Similarly, right path functional distance was defined as the sum of the walking distance (in meters) plus the left-hand reaching distance (also in meters), with the latter term being multiplied by another empirically fit constant. The empirically fit constant for the right hand was 10.3. The empirically fit constant for the left hand was 12.3. Based on these two values, it was possible to say that right-hand reaching was less costly than left-hand reaching (10.3 compared to 12.3) and that reaching over some distance was much more costly than walking over that same distance, 11.3 times more costly, in fact (the mean of 10.3 and 12.3).
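The functional distance measure can be made concrete with a minimal Python sketch. The two reach weights (10.3 and 12.3) are the empirically fit constants reported above; the function names and the example distances are hypothetical and chosen only for illustration, not taken from the study.

```python
# Empirically fit reach weights reported in the text (Rosenbaum, 2012).
RIGHT_HAND_REACH_WEIGHT = 10.3  # applies to the left path (right-hand reach)
LEFT_HAND_REACH_WEIGHT = 12.3   # applies to the right path (left-hand reach)


def functional_distance(walk_m, reach_m, reach_weight):
    """Walking distance plus weighted reaching distance, both in meters."""
    return walk_m + reach_weight * reach_m


def path_difference(left_walk_m, right_hand_reach_m, right_walk_m, left_hand_reach_m):
    """Return (left FD, right FD, left FD minus right FD)."""
    left_fd = functional_distance(left_walk_m, right_hand_reach_m, RIGHT_HAND_REACH_WEIGHT)
    right_fd = functional_distance(right_walk_m, left_hand_reach_m, LEFT_HAND_REACH_WEIGHT)
    return left_fd, right_fd, left_fd - right_fd


# Hypothetical conflict layout (values are assumptions, not data from the study):
# bucket near the right edge (long right-hand reach), left target nearby (short left walk).
left_fd, right_fd, diff = path_difference(
    left_walk_m=3.0, right_hand_reach_m=0.6,
    right_walk_m=4.0, left_hand_reach_m=0.2,
)
print(f"left = {left_fd:.2f} m, right = {right_fd:.2f} m, difference = {diff:.2f} m")
```

With these assumed distances the left path comes out functionally longer (about 9.18 m versus 6.46 m) even though it involves less walking, because each meter of reaching counts for roughly ten or more meters of walking.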

Two further remarks are worth making about the study just reviewed. First, the study was aimed at showing how different kinds of costs are considered together. *A priori*, it is not obvious how walking costs and reaching costs are co-evaluated in the planning of walking and reaching. The study just summarized shows that it is possible to find a common currency for evaluation of the two kinds of costs. That common currency is (or is analogous to) "functional distance," defined as the weighted combination of walking distance and reaching distance. Presumably, the weights would change if walking were challenged more (e.g., by adding leg loads) or if reaching were challenged more (e.g., by adding wrist loads). Being able to estimate mathematical weights such as these is central to the general approach outlined here because, as stated in the introduction of this paper, we believe that tasks can be represented as vectors of weights for dimensions on which tasks vary (see **Figure 2**).

The second remark is that the study just reviewed was done by having participants actually walk and reach in the environment depicted in **Figure 12**. The study was later repeated by showing a new group of participants (another group of Penn State undergraduates) pictures of the environment in which the real task had been done, photographed from the perspective of someone standing where participants stood at the start of each real-action trial (Rosenbaum, 2012). Examples of those images are shown in **Figure 12**. In each experimental trial, one image was shown on a computer and the participant either pressed a left key, indicating that s/he would walk along the left edge of the table (carrying the bucket with the right hand to the left target), or the right key, indicating that s/he would walk along the right edge of the table (carrying the bucket with the left hand to the right target). There was no time pressure, just as in the "real-action" experiment. Moreover, participants were allowed to hold and heft the bucket (which was empty) before doing the computerized "virtual-action" task.

The result of the virtual-action study was that the choices participants made when they indicated how they *would* do the task were almost identical to the choices made by participants who actually *did* the task. The result lent credence to the impression that Rosenbaum et al. (2011) had when they ran their experiment, that their real-action participants knew which way they would go as soon as they left the start point.

It also happened that in the virtual-action study of Rosenbaum (2012), the choice reaction times were longer the more similar the functional path lengths of the left and right paths. This outcome let Rosenbaum (2012) reject the hypothesis that participants mentally simulated one task alternative and then the other, choosing whichever seemed easier. Such a serial simulation method would have resulted in a different pattern of choice reaction times than the one obtained. The choice reaction times would have grown with the sum of the left and right functional path lengths rather than being inversely related to the difference between the two path lengths, as found.
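The contrast between the two accounts can be illustrated with a small Python sketch. It assumes the simplest possible forms, an index equal to the sum of the path lengths for serial simulation and an index equal to the reciprocal of their difference for difference-based comparison. These forms, the function names, and the example path lengths are assumptions for illustration only; the source states only the qualitative predictions.

```python
def serial_simulation_index(left_fd, right_fd):
    """Serial simulation: choice time grows with the sum of the two path lengths."""
    return left_fd + right_fd


def difference_comparison_index(left_fd, right_fd):
    """Difference-based comparison: choice time is inversely related to the difference."""
    gap = abs(left_fd - right_fd)
    return float("inf") if gap == 0 else 1.0 / gap


# Hypothetical functional path lengths in meters (not data from the study).
similar_short = (5.0, 6.0)      # short paths that are hard to tell apart
dissimilar_long = (10.0, 20.0)  # long paths that are easy to tell apart

for name, pair in [("similar, short", similar_short), ("dissimilar, long", dissimilar_long)]:
    print(name, serial_simulation_index(*pair), difference_comparison_index(*pair))
```

Under these assumptions the two accounts order the conditions oppositely: the sum-based index is larger for the long, dissimilar pair, whereas the difference-based index is larger for the short, similar pair. Only the latter ordering matches the pattern described above.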

Did it make sense that participants did not rely on serial simulations to choose their actions in this reach-and-walk task? Rosenbaum (2012) suggested that it did. By analogy to someone being chased by a tiger, if you had a tiger on your tail, the tiger would have you for lunch if you fully simulated alternative escape paths. If you stood at a choice point, blithely imagining yourself going one way or the other, you would probably land in the tiger's jaws. A more efficient method would be to compare critical differences between the paths, quickly choosing your action based on differences between the alternatives. The time to choose between the paths would grow with their similarity, as was found in the virtual-action task of Rosenbaum (2012) and as is typically found in studies of perceptual discrimination (e.g., Johnson, 1939).

## **CONCLUSION**

The research summarized in this article has been concerned with choosing between actions expressed at the relatively low level of carrying out movements, especially with the hands and legs in the context of object manipulation. As noted in the introduction, there has been relatively little attention paid to the motor system in psychology, which is odd considering that psychology is the science of mental life and behavior, whereas motor control is the science of how one gets from mental life *to* behavior.

The latter definition might not be the one that most motor-control researchers spontaneously provide when asked to define their field, for most motor-control researchers typically come from engineering or neuroscience. That issue aside, it is not always clear that to understand motor control, one must invoke mental states. Some aspects of motor control are explicitly removed from mental states in that they rely on mechanical properties of the neuromuscular and skeletal system, sometimes obviating the need for planning or control, as discussed in Section "Mechanics." Similarly, reflexive (highly automatic) responses might not require extensive mental involvement. Even in the case of simple tasks where reflexes seem sufficient, mental states turn out to have a tuning function, as reviewed in Section "Coupling." Thus, mental states are essential for motor control, just as motor control is essential for the expression of mental states.

Why motor control has received short shrift in psychology is an interesting topic that, among other things, tells psychologists about their values (Rosenbaum, 2005). One hypothesis about why psychologists have not pursued motor-control research as actively as they might have is that they think the methods that are needed are extremely technical, so that, to make any kind of progress, one has to record the electrical activity of muscles, for example, or the detailed kinematic properties of the limbs with expensive equipment. As we have tried to show here, however, simple behavioral methods can be profitably applied to the study of motorically expressed action choices. None of the studies described here (from our lab) required exotic or highly technical equipment. The equipment that was needed to do the studies we have summarized has been limited to tables, stools, beach buckets, toilet plungers, wooden dowels, wooden disks, webcams, and laptop computers.

Even with such primitive materials, however, we have arrived at some useful conclusions. The first of these is that different tasks can be represented in terms of the weights assigned to different performance variables. No matter how obvious this point is, it actually diverges from a prevailing view in engineering-inspired motor-control research – namely, that there is some single optimization variable that governs motor control. Various candidates for this single optimization variable have been suggested over the years, including minimization of mean squared jerk (Hogan, 1984), minimization of mean squared torque change (Uno et al., 1989), and minimization of endpoint variance (Harris and Wolpert, 1998). But movements do not always satisfy these constraints. Indeed, the flexibility of performance – for example, the possibility of making high-jerk bow strokes while playing the violin with staccato style versus making very smooth bow strokes while playing the violin with legato style – reflects the opposite of unstinting loyalty to one fixed optimization constraint. Rather, it reflects the possibility of re-prioritizing constraints according to the task to be achieved. The essence of skill, we believe, is being able to re-prioritize constraints, not being locked into prioritizing constraints in a fixed fashion.

Our second conclusion is that identifying the priorities of constraints for a task need not be viewed as an elusive goal. Instead, it is a reachable goal if one is willing simply to try to find out which means of achieving a task are preferred over others. All the presently reviewed experiments (from our lab) had this goal. What was common to all the experiments was the aim of determining which performance variables participants cared about more than others. To answer this question, we relied on ratings, measures of performance quality, and, especially, two-alternative forced choice preferences.

Our third and final conclusion concerns embodied cognition. This has become a very popular topic lately. The embodiment perspective is one that we find congenial given our interest in motor control, but the discussion of embodiment has glossed over the details of motor performance. Saying that perception implicitly calls up a response is fine as far as it goes, but a "response" is actually an equivalence class of possible movement solutions, as detailed here. Therefore, turning to a familiar example from the embodied-cognition literature (Glenberg and Kaschak, 2002), reading a sentence about opening a drawer may evoke a drawer-opening response, but there isn't a single movement that achieves drawer opening, as discussed earlier in connection with the study of Studenka et al. (2012). Likewise, saying that embodiment may entail simulating actions is only theoretically helpful up to a point. As we have argued here in connection with the study of choosing between walking-and-reaching routes (Rosenbaum, 2012), simulation may not be used, at least judging from the fact that the time to choose between the routes was not predicted by the sum of their lengths. Other studies from our lab, not reviewed here (Walsh and Rosenbaum, 2009; Coelho et al., 2012), have also cast doubt on a naïve account of motor imagery according to which actions are chosen by running mental movies of the actions in order to find out which is better or best; see also Cisek (2012). Were such a method to be used, we probably would not have survived in the jungles from which we evolved. The methods we use to choose actions are honed by eons of selective pressure. The features of actions that are preferred are ones that have been selected for and that the experiments summarized here have been aimed at identifying.

## **REFERENCES**

robots based on passive-dynamic walkers. *Science* 307, 1082–1085.

autism spectrum disorder plan ahead? *Front. Integr. Neurosci.* 7:23. doi:10.3389/fnint.2013.00023

Haggard, P. (1998). Planning of action sequences. *Acta Psychol. (Amst.)* 99, 201–215.


the role of observational learning. *Vision Res.* 51, 945–954.


Jorgensen, M. (1990). "Constraints for action selection: overhand versus underhand grips," in *Attention and Performance XIII: Motor Representation and Control*, ed. M. Jeannerod (Hillsdale, NJ: Lawrence Erlbaum Associates), 321–342.


variance during reaching. *Exp. Brain Res.* 225, 431–442.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 April 2013; accepted: 27 April 2013; published online: 03 June 2013.*

*Citation: Rosenbaum DA, Chapman KM, Coelho CJ, Gong L and Studenka BE (2013) Choosing actions. Front. Psychol. 4:273. doi: 10.3389/fpsyg.2013.00273*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Rosenbaum, Chapman, Coelho, Gong and Studenka. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Action-sentence compatibility: the role of action effects and timing

#### **Christiane Diefenbach<sup>1,2</sup>\*, Martina Rieger<sup>1,3</sup>, Cristina Massen<sup>1,4</sup> and Wolfgang Prinz<sup>1</sup>**

<sup>1</sup> Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany


<sup>4</sup> Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany

#### **Edited by:**

Ezequiel Morsella, San Francisco State University and University of California San Francisco, USA

#### **Reviewed by:**

Ezequiel Morsella, San Francisco State University and University of California San Francisco, USA

T. Andrew Poehlman, Southern Methodist University, USA

#### **\*Correspondence:**

e-mail: christiane.diefenbach@gmx.de

Research on embodied approaches to language comprehension suggests that we understand linguistic descriptions of actions by mentally simulating these actions. Evidence is provided by the action-sentence compatibility effect (ACE) which shows that sensibility judgments for sentences are faster when the direction of the described action matches the response direction. In two experiments, we investigated whether the ACE relies on actions or on intended action effects. Participants gave sensibility judgments of auditorily presented sentences by producing an action effect on a screen at a location near the body or far from the body. These action effects were achieved by pressing a response button that was located in either the same spatial direction as the action effect, or in the opposite direction. We used a go/no-go task in which the direction of the to-be-produced action effect was either cued at the onset of each sentence (Experiment 1) or at different points in time before and after sentence onset (Experiment 2). Overall, results showed a relationship between the direction of the described action and the direction of the action effect. Furthermore, Experiment 2 indicated that depending on the timing between cue presentation and sentence onset, participants responded either faster when the direction of the described action matched the direction of the action effect (positive ACE), or slower (negative ACE). These results provide evidence that the comprehension of action sentences involves the activation of representations of action effects. Concurrently activated representations in sentence comprehension and action planning can lead to both priming and interference, which is discussed in the context of the theory of event coding.

**Keywords: action-sentence compatibility, language comprehension, motor simulation, action simulation, embodiment**

## **INTRODUCTION**

## **EMBODIED LANGUAGE COMPREHENSION**

Imagine that a friend who plays football tells you that she has scored a goal. While listening to her report, you vicariously experience the described events. You "see" the shot in your mind's eye, and if you have your own experiences with playing football, you probably "feel" the movement of kicking the ball. We are all familiar with this kind of vicarious experience of a described situation not only from conversations, but also from reading stories when we feel as if the events occurring in the story happened to ourselves.

This kind of vicarious experience is what the proponents of embodied approaches to language comprehension (Barsalou, 1999; Glenberg and Kaschak, 2002; Zwaan, 2004; Pulvermüller, 2005) call *mental simulation* of the described situation, and it is regarded as essential for capturing the meaning of an utterance. According to the embodied view, words and sentences reactivate memory traces from actual experiences with the denoted objects, events, or actions in the person trying to comprehend the words or sentences. These perceptual and action representations enter into a mental simulation that is constructed during language comprehension. Empirical evidence for those assumptions stems particularly from studies on action-related language. Here it is assumed that the comprehension of action-related language relies on action simulation, that is, on the reactivation of stored motor experiences.

An embodied approach to language comprehension may have important implications for the role of conscious awareness in action processing. This is because an approach like this challenges the classical distinction between explicit, declarative knowledge about action (as is involved in representations of action-related words and sentences) and implicit, procedural knowledge for action (as is involved in motor representations for action control). Challenging this distinction is, to some extent, tantamount to challenging that there is a functional separation between conscious and non-conscious modes of action processing. Common opinion holds that processing of declarative knowledge is (mandatorily) conscious whereas processing of procedural knowledge does not require conscious awareness (Squire, 1992; Balota et al., 2000; Tulving, 2000; Baddeley, 2002). If so, the claim that one is grounded in the other seems to imply that conscious and non-conscious processing modes draw on common representational resources and are, in functional terms, less different than is often claimed and believed.

#### **EVIDENCE FOR MOTOR SIMULATIONS IN LANGUAGE COMPREHENSION**

Neurophysiological and brain imaging studies have indicated that motor system activation is involved in semantic access to the meaning of action words. For instance, comprehension of sentences describing actions performed with the mouth, hand, or leg engages motor circuits that largely overlap with those activated during execution and observation of the described actions (Tettamanti et al., 2005). Further, changes in motor excitability are specific for the effector involved in the described action (Buccino et al., 2005). When the arm's motor area is stimulated, words referring to arm actions are recognized faster than words referring to leg actions, and the opposite pattern occurs when the leg's motor area is stimulated (Pulvermüller et al., 2005). In addition, processing of action verbs at the onset of reaching movements affects the kinematics of the movements 160–180 ms after word onset (Boulenger et al., 2006). At this time, early lexico-semantic processes are known to occur (Sereno et al., 1998). Even when action verbs are only displayed subliminally while participants prepare a reaching movement, they affect motor preparation and subsequent movement kinematics (Boulenger et al., 2008).

On the behavioral level, studies show content-specific interactions between the understanding of a verbally described action and a concurrently performed motor response. Usually these interactions reflect a facilitated execution of the motor response when the response shares some features with the semantic meaning of the action-related words and sentences presented as stimuli. An example of such an interaction is the action-sentence compatibility effect (ACE; Glenberg and Kaschak, 2002), which refers to compatibility between the direction of a described action and the direction of the response. In ACE experiments, participants judge whether sentences describing actions toward or away from the body, such as "Courtney handed you the notebook" or "You handed Courtney the notebook," are sensible or not. Participants perform the judgment by moving the hand from a home button in the center of a response device to either a button closer to their body (near button, movement toward the body) or to a button further away from the body (far button, movement away from the body). Several studies have shown that when the movement direction for the response is compatible with the movement direction expressed in the sentence, e.g., when both are directed away from the body, response times are faster than when movement directions are incompatible (Glenberg and Kaschak, 2002; Borreggine and Kaschak, 2006; Glenberg et al., 2008b; Kaschak and Borreggine, 2008). A similar compatibility effect has been observed for verbally described directions of manual rotations (e.g., opening a water bottle) and rotation directions that were produced by turning a knob (Zwaan and Taylor, 2006). Zwaan et al. (2012) found that the response execution was modulated even when there was no feature overlap between responses and verbally described movements. 
Participants were presented with sentences that implied a forward or backward movement and although response movements involved leaning to the left or right on a balance board, these movements were shifted forward or backward depending on the sentence content.

## **THE NATURE OF THE INVOLVED ACTION REPRESENTATIONS**

Based on those and similar behavioral and neurophysiological studies, embodied theories of language comprehension, such as the theory of perceptual symbol systems (Barsalou, 1999), or the indexical hypothesis (Glenberg and Robertson, 1999, 2000), assume that linguistic contents evoke multimodal representations of their referents. In their view, language reactivates experiences that were encoded and stored by different modality-specific systems. In the case of action words or sentences, the motor system partially reactivates the motor state that produces the denoted action, thereby creating a simulation of that action (Barsalou, 2003). Thus, action representations activated during language comprehension are supposed to refer to specific motor programs.

In spite of the considerable body of evidence for the involvement of motor programs in understanding action-related language, it is conceivable that another kind of action representation is also involved. Based on assumptions of the common coding approach (Prinz, 1990, 1997), representations of described actions could be coded in terms of action goals or action effects. Prinz's approach proposes that perceived events and planned actions are coded in a common representational format. This common coding of perception and action is thought to result from actions being represented in terms of their perceptual consequences or intended action effects (Prinz, 1990; for evidence see, e.g., Elsner and Hommel, 2001; Rieger, 2007). Support for this assumption comes from stimulus-effect compatibility effects. For instance, Hommel (1993) demonstrated that response times are faster when participants respond to a stimulus which has a spatial compatibility to an intended action effect (e.g., both are on the left side), regardless of whether the intended action effect was produced by a spatially compatible or non-compatible action (i.e., by pressing a right or left key).

Some studies have already indicated that interactions between language processing and actions can occur on the level of intended action effects. Markman and Brendl (2005) found that responses to positive words were facilitated when producing an action effect with a positive connotation (approaching the participant's name on the screen) compared to producing an action effect with a negative connotation (withdrawing from the participant's name, i.e., avoidance action). This compatibility effect was independent of whether the action effect resulted from moving the arm toward or away from the body. In this case, priming occurred because representations of emotional words and response representations shared affective codes on the level of action effects (see also Eder and Klauer, 2007, 2009; Eder and Rothermund, 2008; van Dantzig et al., 2008). Lindemann et al. (2006) showed that semantic processing of words was facilitated when the words denoted the goal of an action that was prepared before. Thus, the activated action goal primed the word meaning, which again suggests common codes that represent the intended action effects.

#### **THE PRESENT STUDY**

The experiments investigating compatibility effects between language comprehension and concurrent action, so far, either have not clearly differentiated between the representations of actions and intended action effects or they only used affective word stimuli (like Markman and Brendl, 2005). Therefore, it is unclear to what extent the comprehension of sentences is based on representations of intended action effects or on motor representations.

Our experiments addressed this question by testing whether action representations activated during sentence comprehension interact with representations of intended effects of actions or with representations of the motor component of these actions. We asked participants to indicate whether sentences describing actions toward or away from the body were sensible or not. The sentences expressed transfer of concrete or abstract objects between the participant and another person. Participants were asked to perform the judgment by producing an action effect (lighting a star on a horizontally mounted screen) at a location either near the body or far from the body. These action effects were achieved by moving the hand from a centrally located button to a button located nearer to or further away from the body. In order to dissociate actions from their intended effects, participants performed the task with either a regular spatial relationship between actions and the intended effects (e.g., combining a movement to the near button with an action effect located near the body) or with an inverted spatial relationship between actions and effects (e.g., combining a movement to the near button with an action effect located far from the body).

We looked at sentence-effect compatibility, that is, the compatibility between the direction of the action described in the sentence (object transfer toward or away from the body) and the direction of the intended action effect (a star appearing on the screen at a location near the body or far from the body). If representations of intended action effects play a role in understanding action-related language, responses should be faster in the sentence-effect compatible condition than in the sentence-effect incompatible condition. This pattern of results (an action effect-related ACE) should be observed both in regular and in inverted conditions. If, however, representations of the motor component of actions predominantly contribute to the understanding of action-related language, compatibility should be effective between the sentence direction and the direction of the arm movement to the response button (movement-related ACE). In this case, different patterns should be observed in the regular and inverted condition: in the regular action-effect relation condition in which the directions of actions and action effects are completely correlated, responses should be faster in the sentence-effect compatible condition than in the sentence-effect incompatible condition. This pattern should reverse in the inverted action-effect relation condition. Because sentence-effect compatibility is equivalent to sentence-action incompatibility in the inverted condition, the reversed ACE pattern means that responses are faster with sentence-action compatibility than sentence-action incompatibility.

The question of whether the ACE is related to the arm movement or to the intended action effect might be answered differently for concrete and abstract sentences. Understanding concrete sentences might involve activation of specific motor programs and hence give rise to a movement-related ACE, whereas understanding abstract sentences might involve activation of representations of action effects and therefore lead to an action effect-related ACE.

## **EXPERIMENT 1**

Experiment 1 investigated whether the comprehension of action sentences relies on actions or on intended action effects by dissociating actions from their intended effects in an ACE paradigm. We wanted participants to be aware of the locations of the to-be-produced action effect and to avoid participants adapting to certain movements when judging sentences. Therefore, we varied randomly whether the response required producing an action effect at the near or the far location. In order to keep the task as easy as possible, we adopted the go/no-go method from Borreggine and Kaschak's (2006) ACE experiments, in which participants only responded when they judged sentences to be sensible. Participants were informed about the current location of the to-be-produced action effect by a visual cue at the onset of every sentence, similar to a condition in which Borreggine and Kaschak found the standard ACE.

## **METHOD**

## **Participants**

Nineteen adults were paid 7 Euros to participate in the experiment. The data from three participants were excluded from analyses for reasons explained later in the data analysis section. Thus, the final sample comprised 16 participants (mean age = 24.8 years; 6 males, 10 females). All participants were native German speakers, right-handed, and had normal or corrected-to-normal vision and audition. They were randomly assigned to two groups containing eight participants each. One group performed the task in the regular action-effect relation condition, while the second group performed the task in the inverted action-effect relation condition.

## **Stimuli and apparatus**

The linguistic material comprised 80 triads of sentences that were adopted from Glenberg et al. (2008b) and translated into German. Each triad consisted of three versions of an action sentence: the two critical sentences of each triad described the same transfer action directed either toward the body (e.g., "Jakob reicht dir das Buch" [Jacob hands you the book]), or away from the body (e.g., "Du reichst Jakob das Buch" [You hand Jacob the book]). The third sentence contained the same character names and objects as the transfer sentences, but a different verb that expressed no transfer (e.g., "Du liest mit Jakob das Buch" [You read the book with Jacob]). Half of these neutral sentences began with the German word "Du" [you], like the away sentences, and half began with a character name, like the toward sentences. In addition, half of the triads described the concrete transfer of objects (as in the examples above), and half described abstract transfer, for example, the transfer of information (e.g., "Julia erzählt dir eine Geschichte" [Julia tells you a story]). Half the sentences (40 triads) were sensible and half were nonsensical. Twenty additional sentences were created and served as practice items. All sentences were recorded by a female German speaker and presented over headphones during the experiment.

The response device (see **Figure 1**) consisted of three buttons (diameter: 6.3 cm) that were arranged in a vertical line on a board. The distance between the center of the middle button and the centers of the near and far button was 11.3 cm. The board was located on a table in front of the participant so that the buttons differed in distance from the participant's body. The near button was about 20 cm away from the body. Above the buttons, a 17-inch flat screen monitor was mounted horizontally. On the screen, two response fields were presented on a black background. One of the fields appeared at a near location (subtending a visual angle of 6.6˚) and one at a far location (visual angle of 3.1˚). The distance between the centers of the response fields was identical to the distance between the centers of the outer response buttons (i.e., 22.6 cm). One response field had a blue frame and the other one was framed in yellow; both fields were black inside. To indicate that a sentence was sensible, participants were asked to activate a given response field by pressing the near or far button. The color (blue or yellow) of a cross (1.8˚ of visual angle) served as a cue to indicate the response field that had to be activated. As an effect of the button press, a star flashed in place of the activated response field on the screen. The effect star had the same color as the activated response field and subtended a visual angle of 16.2˚ (near location) or 7.7˚ (far location). To increase attention to the visual response effect, a sound ("twinkles") was presented at the same time as the star appeared. The sound was composed of two successively presented tones that formed a fourth upward (with fundamental frequencies of 625 and 834 Hz).

**FIGURE 1 | Illustration of response fields and cue (A) and of arm movement and its effect by the example of a "yes" response in the yes-is-near condition with regular action-effect relation (B) and with inverted action-effect relation (C)**.

Because the screen was placed above the buttons, the moving hand was covered and, thus, participants received no on-line visual feedback of their movement, but only perceived its effect on the screen. The experiment was controlled by an IBM-compatible computer running Presentation software (Neurobehavioral Systems, Albany, USA), and the response buttons were connected to it via the parallel port.

#### **Procedure and design**

The experiment was run in a dimly illuminated and sound-attenuated room. Each trial was initiated by pressing the middle button with the right hand, and participants were told not to release this button until they were able to make their response. Five-hundred milliseconds after the button press, the blue and the yellow response fields appeared on the screen at the near and far location and 1000 ms after their appearance, the auditory presentation of a sentence started. Participants were instructed to decide if the presented sentence was sensible or not. As a go/no-go task was used, participants were asked to respond only when the sentence was sensible (*yes* response), and to refrain from responding to a nonsense sentence. The yes response was randomly assigned to either the near response field (yes-is-near condition) or to the far response field (yes-is-far condition). When the sentence presentation started, the response cue (a cross) appeared in the center of the screen matching the color of one of the response fields. The color of the cue indicated whether the near or the far response field should be activated if the sentence was sensible. Activating the response fields required moving one's arm from the middle button to the near or far button, that is, toward the body or away from the body. Participants were asked to give the yes response as soon as the sensibility of the sentence could be decided, thereby responding as quickly and accurately as possible. In case a response occurred, a star flashed and the accompanying sound was presented as soon as one of the response buttons was pressed. The star replaced the response field on the screen. The cue remained visible until the response was made or, in the case of a sentence being judged as nonsensical and no response being given, until the trial timed out 5 s after sentence onset. The sequence of trial events is illustrated in **Figure 2**.

There were two different mappings of action effects to buttons (see **Figure 1** for an illustration): in the condition with the regular action-effect relation, actions and action effects were completely correlated, which means that the location of the to-be-produced action effect on the screen corresponded with the location of the button press (i.e., both were near the body or both were far from the body). In contrast, the action and its effect were opposed in the condition with the inverted action-effect relation: an action effect at a certain location on the screen resulted from moving one's arm in the opposite direction (i.e., the star appeared at the near location on the screen when pressing the far button and vice versa).
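The two mappings described above can be written as a small lookup. The following is only an illustrative sketch: the function name `button_for_effect` and the string labels are hypothetical, and the original experiment was run in Presentation, not Python.

```python
# Hypothetical labels for the two locations; each maps to its opposite.
OPPOSITE = {"near": "far", "far": "near"}


def button_for_effect(effect_location: str, relation: str) -> str:
    """Return the button ('near' or 'far') that produces the star at effect_location.

    relation is 'regular' (button and effect at the same location)
    or 'inverted' (button and effect at opposite locations).
    """
    if effect_location not in OPPOSITE:
        raise ValueError(f"unknown effect location: {effect_location!r}")
    if relation == "regular":
        return effect_location
    if relation == "inverted":
        return OPPOSITE[effect_location]
    raise ValueError(f"unknown action-effect relation: {relation!r}")
```

For example, under the inverted relation a cue calling for the near response field requires a movement away from the body to the far button.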

At the beginning of the experiment, participants received two blocks of practice trials. The first block consisted of 32 trials in which participants were familiarized with the response assignment. They were only presented with the German words "Ja" [yes] and "Nein" [no]. In the yes trials they were asked to activate the response field that was indicated by the visual response cue, in the no trials they were asked to refrain from responding. Feedback about the correctness of the response was provided by displaying the German word "Richtig" [right], colored green, or "Falsch" [wrong], colored red, on the screen. In the second practice block, participants received 20 trials with practice sentences. The two response assignments were each presented in one half of the trials. The whole experiment lasted approximately 30 min.

Apart from action-effect relation (regular, inverted), which was manipulated between participants, all of the independent variables were manipulated within participants. Sentence direction (toward, away, and neutral), sentence type (concrete and abstract), sensibility (sensible and nonsensical), and effect direction (yes-is-near, yes-is-far) varied from trial to trial.

To ensure that all sentences appeared equally often in every condition, the 240 stimulus sentences were divided into two material blocks which were assigned to one of the effect directions each. In each trial, a sentence was selected randomly from one of the two material blocks. The assignment of material blocks to conditions of effect direction and action-effect relation was counterbalanced across participants. Sentences were randomized in such a way that each material block was divided into five subblocks (24 sentences each) that contained an equal number of sentences of each category (sensibility, sentence type, sentence direction), but never included sentences that belonged to the same triad. For each participant, the order of sentences in each subblock, as well as the order of the subblocks themselves, was randomized.

#### **Data analysis**

In both Experiments 1 and 2, participants were removed from the analysis and replaced by a new participant (a) when their error rates exceeded 15% or (b) when participants had their hand resting on the middle button and only pressed the response buttons with fingers splayed out in more than 15% of the trials, despite being instructed to move the whole hand from the middle button to the response button. These cases were identified through earlier registration of response button presses than the release of the middle button.

Dependent variables were total response time (TRT)<sup>1</sup> and percentages of errors. TRT was measured from the onset of sentence presentation to the pressing of the near or far response button. Incorrect trials were excluded from the analysis. To reduce the effect of outliers, first, 0.5% of the longest and shortest responses over participants were eliminated, and second, for each participant in each condition, responses that deviated more than 2.5 SD from the condition mean were discarded. This procedure was based on the trimming procedure used by Glenberg et al. (2008b).
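The two-step trimming can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' analysis script: the function name `trim_trts` and the trial format are hypothetical, the 0.5% criterion is assumed to apply to each tail separately, and the sample standard deviation is assumed for the 2.5 SD criterion.

```python
from statistics import mean, stdev


def trim_trts(trials, tail_fraction=0.005, sd_cutoff=2.5):
    """Trim correct-trial TRTs in two steps.

    trials: list of dicts with keys 'participant', 'condition', 'trt' (ms).
    """
    # Step 1: pool all participants and drop the most extreme responses.
    # Assumption: tail_fraction is removed from EACH tail.
    ordered = sorted(trials, key=lambda t: t["trt"])
    k = round(len(ordered) * tail_fraction)
    kept = ordered[k:len(ordered) - k]

    # Step 2: within each participant-by-condition cell, drop responses
    # more than sd_cutoff standard deviations from the cell mean.
    cells = {}
    for t in kept:
        cells.setdefault((t["participant"], t["condition"]), []).append(t)

    result = []
    for cell in cells.values():
        values = [t["trt"] for t in cell]
        if len(values) < 2:
            # An SD cannot be computed for a single trial; keep it.
            result.extend(cell)
            continue
        m, s = mean(values), stdev(values)
        result.extend(t for t in cell if abs(t["trt"] - m) <= sd_cutoff * s)
    return result
```

Whether the first criterion removes 0.5% per tail or 0.5% in total is not stated in the text, so the `tail_fraction` handling should be adjusted if the other reading is intended.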

Only data from the sensible toward and away sentences were analyzed (see Glenberg et al., 2008b). In order to simplify the analysis and to make the data more easily accessible, the variables sentence direction and effect direction were merged into a new variable, sentence-effect compatibility (compatible, incompatible). The sentence-effect compatible condition always contained cases in which effect direction matched the sentence direction, irrespective of the direction of the arm movement required for the response. The sentence-effect incompatible condition included cases in which effect direction and sentence direction were opposed.

Three-way mixed-factor analyses of variance (ANOVAs) were conducted on TRTs and error rates with sentence-effect compatibility (compatible, incompatible) and sentence type (concrete, abstract) as within-subjects factors and with action-effect relation (regular, inverted) as a between-subjects factor. Since compatibility effects are the main interest of this work, only main effects of and interactions with sentence-effect compatibility will be reported.

#### **RESULTS**

#### **Total response time**

The trimming procedure applied to the data from the final sample resulted in the elimination of 4.8% of the TRT data. A significant main effect of sentence-effect compatibility [*F*(1, 14) = 4.67, MSE = 1344.46, *p* = 0.049] and a significant interaction between sentence-effect compatibility, sentence type, and action-effect relation [*F*(1, 14) = 6.73, MSE = 843.77, *p* = 0.02] were found (see **Figure 3** for mean TRTs). Further ANOVAs, performed separately for each sentence type, revealed that responses to concrete sentences were faster across action-effect relations in the sentence-effect incompatible condition (*M* = 2034, SD = 144) than in the sentence-effect compatible condition [*M* = 2056, SD = 167; *F*(1, 14) = 4.59, MSE = 851.22, *p* = 0.05]. This was not modified by an interaction between sentence-effect compatibility and action-effect relation [*F*(1, 14) = 0.92, MSE = 851.22]. In contrast, for abstract sentences a significant interaction between sentence-effect compatibility and action-effect relation was observed [*F*(1, 14) = 4.63, MSE = 1337.0, *p* = 0.049]. When the action-effect relation was inverted, responses were faster in the sentence-effect incompatible condition compared to the sentence-effect compatible condition [*t*(7) = 2.98, *p* = 0.02]. No significant difference was found in the regular action-effect relation condition [*t*(7) = −0.49]. In sum, TRTs for concrete sentences showed a sentence-effect compatibility disadvantage (negative ACE) that was not modulated by action-effect relation, and TRTs for abstract sentences displayed a sentence-effect compatibility disadvantage only in the condition with inverted action-effect relation.

<sup>1</sup>Preliminary data inspection indicated that participants may not have always selected the response before releasing the middle button (i.e., short response times, RTs), but instead made the decision to press the near or the far button only when they had already initiated the movement. In such instances, response preparation occurs partly during movement time (MT, time from releasing the middle button to pressing the response button) and compatibility effects correspondingly shift from RTs to MTs. We therefore decided to analyze total response time (TRT), the sum of RT and MT.

## **Effects of response speed on the ACE**

The observation of a negative ACE was surprising, particularly since we followed the procedure by Borreggine and Kaschak (2006) that yielded a positive ACE. However, Borreggine and Kaschak manipulated the timing of response planning in their ACE experiments. When the response cue that informed participants of the movement direction was presented at the onset of the sentence, the response could be planned while the sentence was processed. In this condition, a positive ACE arose. In contrast, when the cue appeared after the offset of the sentence, which prevented participants from preparing the response during sentence processing, responses were slower and the ACE was eliminated and descriptively showed a tendency to be reversed.

The negative ACE in the current experiment could result from participants not immediately paying attention to the response cue when it appeared on the screen. Since the cue was visible throughout the whole sentence presentation, there may have been a tendency for participants to postpone processing of the cue and response preparation to the end of the sentence. In this way, our experiment may have corresponded to Borreggine and Kaschak's (2006) condition with delayed cue presentation, in which the ACE started to become negative. Thus, different timings between response preparation and sentence comprehension might be responsible for this result.

In order to investigate whether this might be the case, we used participants' mean TRTs of all correct trials containing sensible toward and away sentences to obtain a measure reflecting how fast each participant responded on average. Fast responses may reflect relatively early response preparation, whereas slow responses may reflect relatively late response preparation. We repeated the ANOVA described above with participants' speed as an additional covariate. The analysis revealed a significant interaction between sentence-effect compatibility and speed [*F*(1, 13) = 12.53, MSE = 737.24, *p* = 0.004]. To clarify the nature of this interaction, we correlated participants' speed (i.e., their mean TRTs) with the magnitude of the ACE (difference between TRTs for sentence-effect incompatible trials and TRTs for sentence-effect compatible trials, i.e., positive numbers indicate a compatibility advantage and negative numbers a compatibility disadvantage). A negative correlation was obtained (*r* = −0.7, *p* = 0.003), which reveals that the slower the participants responded, the more they showed a compatibility disadvantage.
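The ACE magnitude and its correlation with response speed can be computed as sketched below. The function names and the per-participant values are hypothetical and serve only to show the sign convention; they are not the data of Experiment 1.

```python
from math import sqrt
from statistics import mean


def ace_magnitude(trt_incompatible, trt_compatible):
    """ACE as incompatible minus compatible TRT (ms).

    Positive values indicate a compatibility advantage,
    negative values a compatibility disadvantage.
    """
    return trt_incompatible - trt_compatible


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up per-participant values (mean TRT in ms, ACE in ms), for illustration only.
speeds = [1900, 2000, 2100, 2200]
aces = [30, 10, -10, -30]
r = pearson_r(speeds, aces)  # negative: slower participants show a more negative ACE
```

Applied to the concrete-sentence means reported for Experiment 1 (incompatible *M* = 2034, compatible *M* = 2056), `ace_magnitude` gives a negative value, that is, a compatibility disadvantage.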

### **Table 1 | Mean error rates (in %) and standard errors of error rates (in parentheses) in Experiment 1.**


#### **Error rates**

In the ANOVA on error rates, no significant effects involving sentence-effect compatibility were found (all *F*s < 1.1). Mean error rates are given in **Table 1**.

## **DISCUSSION**

The present experiment addressed the contributions of actions and intended action effects to the ACE. To this end, actions were dissociated from their effects and differed in whether they were directed toward the body or away from the body. In each trial, participants were instructed about the current direction of the yes response at sentence onset. The results were different for concrete and abstract sentences. TRTs for concrete sentences showed a negative ACE that referred to the action effect, because the data pattern was the same with the regular and inverted action-effect relations. Thus, the mental simulation during the comprehension of concrete sentences seems to involve representations of action effects. For abstract sentences, the ACE occurred only in the condition with inverted action-effect relation, but not in the condition with regular action-effect relation. Therefore, it cannot be determined whether the ACE in the inverted condition relied on action or effect, and no conclusions can be drawn regarding the nature of the representations activated during the comprehension of abstract sentences.

A follow-up analysis was conducted to examine whether the unexpected occurrence of the negative ACE was connected with response timing. The results suggest that slow responses, which probably reflect response preparation after the completion of the sentence, promote the emergence of a negative ACE. This indicates that the relative timing between movement preparation and sentence comprehension might play a role for the reversal of the ACE.

Changes in compatibility effects depending on relative timing have also been observed in other studies. For instance, Richardson et al. (2001, Experiment 1) presented participants with a series of pictured objects that afforded an action on either the left or the right side. Afterward, participants were asked to press a left or right key in order to indicate whether they had seen a certain object or not. Responses were facilitated when the side of the required keypress was opposite to the side of the action afforded by the recalled object. This incompatibility effect seemed to depend on the timing of the responses: when, in a second experiment, response time data were split into an early and a late half, the late group, again, exhibited an incompatibility effect between motor responses and affordances of verbally described objects. The early group, in contrast, displayed a non-significant tendency toward a compatibility effect.

To explain their results, Richardson et al. (2001) and Borreggine and Kaschak (2006) drew on the theory of event coding (TEC; Hommel et al., 2001). TEC suggests that action representations with overlapping features prime each other when they are activated at short time intervals (compatibility benefits), but interfere with each other when they are activated at long intervals (compatibility costs). In the light of TEC, compatibility benefits and costs in the ACE may arise as follows: during online sentence processing, feature codes are activated that represent the action that the sentence content is referring to. Among those feature codes is the directional code (toward or away from the body). In the first phase (activation phase), these codes can be activated more easily for planning another action that shares features with the first action. If the response is prepared during this activation phase, access to the activated directional feature code of the described action is easier. Thus, responding in the same direction is facilitated, resulting in compatibility benefits (the standard ACE). At the end of the sentence, when all relevant information is known, the activated feature codes are probably bound together to form a complete representation of the sentence content (which means running a full simulation of the described action). In this second phase (integration phase, about 250–500 ms after feature activation, Stoet and Hommel, 2002), the feature codes can no longer be activated in isolation. If response planning does not take place until the sentence is completed, the directional feature code is less available for coding the response. Thus, responding in the same direction is impaired, resulting in compatibility costs. This could account for the pattern of results obtained in Experiment 1: a large part of participants probably held off preparing their response until the end of the sentence, which caused the negative ACE.

## **EXPERIMENT 2**

The theory of event coding implies that there are mutual interactions between sentence comprehension and response planning. Therefore, not only should semantic processing of the sentence be able to facilitate or impair response planning, but also vice versa, depending on the temporal order of the two processes. In order to investigate the consequences of the timing between movement preparation and sentence comprehension for the ACE, the stimulus onset asynchrony (SOA) between the onset of sentence presentation and cue presentation was manipulated in Experiment 2. Moreover, the response cue did not remain on the screen but was presented only for a short period of time in order to limit the processing of the response cue to a certain point in time. In addition to addressing the timing issue, Experiment 2 continued to pursue the initial question of whether the ACE relies on actions or on action effects. Thus, again, the spatial relationship between action and action effect was manipulated.

## **METHOD**

## **Participants**

Fifty German native speakers took part in the experiment in return for 7 Euros. The data from 10 participants were discarded for the reasons stated previously, and so analyses were based on the data from 40 participants (mean age = 24.4 years; 15 males, 25 females). All participants were right-handed and had normal or corrected-to-normal vision and audition. They were randomly assigned to the regular or the inverted action-effect relation condition, each comprising 20 participants.

## **Stimuli and apparatus**

The stimuli and apparatus were identical to those used in Experiment 1.

## **Procedure and design**

The procedure and design were the same as in Experiment 1, apart from the following modifications: the cue signaling the direction of the yes response in each trial was visible only for 500 ms. The SOA between sentence onset and the presentation of the response cue was manipulated within subjects and varied blockwise. In one of the five SOA conditions, the response cue appeared on the screen simultaneously with the onset of the sentence presentation (SOA = 0 ms; as in Experiment 1). In the other conditions, the cue was presented 1000 ms before sentence onset (SOA = −1000 ms), 500 ms before sentence onset (SOA = −500 ms), 500 ms after sentence onset (SOA = 500 ms), and at the end of the sentence presentation (SOA = 100% of the sentence length).
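The five SOA conditions can be translated into cue timing relative to sentence onset, as in the following sketch. The names `SOA_CONDITIONS`, `CUE_DURATION_MS`, and `cue_window`, as well as the label `"end"` for the condition with an SOA of 100% of the sentence length, are assumptions introduced for illustration.

```python
CUE_DURATION_MS = 500  # the cue was visible for 500 ms only
SOA_CONDITIONS = (-1000, -500, 0, 500, "end")  # "end" = 100% of the sentence length


def cue_window(soa_condition, sentence_length_ms):
    """Return (cue onset, cue offset) in ms relative to sentence onset.

    Negative onsets mean that the cue appears before the sentence starts.
    """
    if soa_condition == "end":
        onset = sentence_length_ms
    else:
        onset = soa_condition
    return onset, onset + CUE_DURATION_MS
```

For example, `cue_window(-1000, 2000)` places the cue from 1000 ms to 500 ms before sentence onset, whereas `cue_window("end", 2000)` places it in the 500 ms following the end of a 2000 ms sentence.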

At the start of the experiment, participants received 40 trials to practice the response mode. They then performed 20 trials with practice sentences. The experimental design was identical to that of Experiment 1, apart from the additional independent variable SOA (−1000, −500, 0, 500 ms, 100% of sentence length). The order of SOA blocks was counterbalanced between participants. For each participant, an equal number of sentences of each category, in each material block, were pseudorandomly assigned to the SOA conditions in such a way that, across participants, each combination of effect directions, SOAs, and action-effect relations contained each sentence with equal frequency.

## **Data analysis**

In the present experiment, median TRTs instead of mean TRTs were computed for each participant in each condition, because the additional SOA manipulation resulted in too few data points per condition to identify and remove outliers. Further, trials with two particular triads of sentences (one concrete and one abstract) were excluded from analyses. They were erroneously judged as nonsensical by a large proportion of participants, which led to unbalanced frequencies of these sentences in the different conditions. Since there were relatively few data points per condition in this experiment, some conditions did not include any correct response to these sentences at all, while other conditions did. This could have distorted the results due to the different sentence lengths. This resulted in the elimination of 5.0% of the data. Further data analysis was identical to that of Experiment 1, except that the performed ANOVAs included SOA as an additional within-subjects factor.
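Aggregating by the median per participant and condition can be sketched as follows. The data layout (dictionaries with the keys `participant`, `condition`, and `trt`) and the function name `median_trts` are assumptions, not part of the original analysis.

```python
from collections import defaultdict
from statistics import median


def median_trts(trials):
    """Median TRT (ms) for each participant-by-condition cell.

    trials: list of dicts with hypothetical keys 'participant', 'condition', 'trt'.
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["condition"])].append(t["trt"])
    return {cell: median(values) for cell, values in cells.items()}
```

With only a handful of trials per cell, a single very slow response shifts the cell mean considerably but leaves the median almost unchanged, which is why the median is a reasonable substitute when outlier trimming is not feasible.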

## **RESULTS**

## **Total response time**

Mean TRTs are depicted in **Figure 4**; since no effect of sentence type was observed in the ANOVA, data are presented averaged over concrete and abstract sentences. The ANOVA showed a significant interaction between sentence-effect compatibility and SOA [*F*(4, 152) = 4.58, MSE = 23265.86, *p* = 0.002]. *T*-tests performed separately for each SOA condition revealed that for the SOA of 0 ms, responses were faster when sentence direction and effect direction were compatible (*M* = 2148, SD = 220) than when they were incompatible (*M* = 2215, SD = 213), *t*(39) = −2.45, *p* = 0.02. In contrast, for the SOA of 500 ms, responses were slower in trials with compatible directions (*M* = 2178, SD = 217) than in trials with incompatible directions (*M* = 2123, SD = 224), *t*(39) = 2.5, *p* = 0.02. Thus, there was a compatibility advantage when the cue was presented at sentence onset, but when the cue appeared 500 ms after sentence onset, a compatibility disadvantage occurred. In conditions with SOAs of −1000, −500 ms, and 100% of the sentence length, no significant compatibility effects were observed (all |*t*(39)| < 1.8).
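The follow-up comparisons are paired *t*-tests on per-participant TRTs. The sketch below shows the statistic with the sign convention that matches the values reported above (compatible minus incompatible, so a negative *t* indicates a compatibility advantage). The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(compatible, incompatible):
    """Paired-samples t statistic and degrees of freedom.

    compatible, incompatible: per-participant TRTs (ms) in the same participant order.
    """
    diffs = [c - i for c, i in zip(compatible, incompatible)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up values for four participants, for illustration only.
t_value, df = paired_t([2100, 2150, 2200, 2120], [2150, 2210, 2240, 2190])
```

Obtaining a *p* value additionally requires the *t* distribution with `df` degrees of freedom, which is not part of the Python standard library and is therefore omitted here.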

## **Error rates**

Mean error rates are shown in **Table 2**. Again, no effect of sentence type was observed; therefore, data are presented averaged over the two sentence types. There was a significant interaction between sentence-effect compatibility and action-effect relation [*F*(1, 38) = 8.04, MSE = 38.89, *p* = 0.007] and a marginally significant interaction between sentence-effect compatibility and SOA [*F*(4, 152) = 2.67, MSE = 50.92, *p* = 0.06]. However, follow-up analyses indicated that the only significant difference between sentence-effect compatibility conditions occurred for the SOA of −1000 ms in the regular condition [*t*(19) = 2.63, *p* = 0.02], even though the interaction between sentence-effect compatibility, action-effect relation, and SOA did not reach significance [*F*(4, 152) = 0.97]. The error rate was higher in the sentence-effect compatible conditions than in the sentence-effect incompatible conditions. Because significant positive or negative ACEs occurred in different SOA conditions for error rates and for TRTs, a speed-accuracy trade-off can be ruled out.

## **DISCUSSION**

The aim of Experiment 2 was to look in more detail at the temporal dynamics of the interaction between the processes of sentence comprehension and response preparation, in order to investigate which of the conditions might lead to the emergence of a negative ACE. First of all, the results indicate that the timing between sentence comprehension and response preparation does indeed affect whether the ACE is present at all and, if it is present, whether it is positive or negative. When response planning took place 1000 or 500 ms before sentence onset, the ACE was absent in TRTs. When response planning and sentence processing started at the same time, there was a positive ACE in TRTs, whereas the ACE became negative when response planning was delayed for 500 ms. Finally, the ACE disappeared again when response planning took place after the end of the sentence. Because the action-effect relation did not modulate compatibility effects, data indicate that the ACE is related to the intended action effect rather than to the action itself. No significant differences between abstract and concrete sentences were found.

The positive ACE in the condition with the response cue appearing at sentence onset fits well with the TEC-based explanation of the ACE: the cue automatically triggers the activation of the indicated directional feature of the response. We assume that response preparation is completed and feature codes are integrated into the action plan at about 500 ms after the presentation of the cue, because this was the approximate amount of time that passed between the cue presentation at the end of the sentence and the release of the middle button (response initiation). Regarding sentence comprehension, the direction of the described action is clear once subject and verb (the first two words in the sentence) are processed. The end of the verbs lies between 500 ms (for concrete sentences) and 850 ms (for abstract sentences) after sentence onset. Because the uniqueness point, at which the verb is recognized and the direction of the described action becomes clear, lies somewhere before this, there seems to be a temporal overlap of the activation of the directional code during response planning and sentence processing. In the sentence-effect compatible condition, priming occurs between the representation of the sentence content and the response representation, because both activate the directional feature code at the same time before it is bound to the one or the other event. Thereby, the comprehension of the sentence is facilitated, which leads to a positive ACE.

The result of the negative ACE that occurred when the response cue was given 500 ms after sentence onset could be explained similarly to Kaschak and Borreggine's (2008) interpretation of their results: in their experiment, participants were presented with sentences describing transfer toward the body or away from the body, and as a secondary task, compatible or incompatible motor responses had to be executed at different points in time during sentence processing. They found that a positive ACE arose in response times when responses were executed at an early point in the sentences, but disappeared when responses were executed in the middle of the sentences. Similar to the current experiment, responses that were performed 500 ms after the onset of sentences whose length and syntax were comparable to our sentences descriptively displayed sentence-effect compatibility costs. According to the authors, the disappearance of the ACE in the middle of the sentence results from a rather early running of the simulation, which might be possible because the last part of the sentences is quite predictable. Thus, in our experiment the activated feature codes may have become integrated into the representation of the sentence content at an early point within the sentence. This might have impaired the preparation of a compatible response<sup>2</sup>.

### **Table 2 | Mean error rates (in %) and standard errors of error rates (in parentheses) in Experiment 2.**

SOA, stimulus onset asynchrony.

This explanation also fits well with the finding that the ACE disappeared in the condition in which the cue was presented at the end of the sentence: if the directional feature code needed for planning the response was integrated into the simulation of the sentence content at the end of the sentence, one would have expected interference with response preparation and thus a negative ACE. Yet, if the integration, and with it the temporary binding, of the directional feature code occurred earlier in the sentence, the feature code might become available again for response preparation around the end of the sentence, thereby diminishing the interference effect.

In sum, the results concerning the direction of the ACE are in line with TEC. In addition, the ACE was related to the intended action effect rather than to the action itself. This indicates that representations of intended action effects are activated during the processing of action sentences (regardless of whether they are concrete or abstract), which is also consistent with TEC.

## **GENERAL DISCUSSION**

#### **EXPERIMENTAL FINDINGS**

In the present experiments, we were interested in the question of whether the ACE relies on actions or intended action effects. Experiment 1 provided no definite answer to this question but, unexpectedly, showed a negative ACE. Experiment 2 provided evidence that the ACE is related to the intended action effect. Thus, the comprehension of action descriptions involves the activation of action representations referring to the intended effects of these actions. This holds for both concrete and abstract sentences: there were no systematic effects involving sentence type and (particularly in Experiment 2) the ACE was action effect-related for both concrete and abstract sentences.

Experiment 2 additionally addressed the role of timing between sentence comprehension and response preparation in the ACE. We assumed that the negative ACE that occurred in Experiment 1 is caused by preparing the response rather late in the sentence when sufficient information is known to simulate the described action. However, early response preparation was thought to lead to the positive ACE. Consistent with this assumption, the positive ACE emerged when response preparation took place at the beginning of sentence processing, whereas the negative ACE arose when response preparation took place around the middle of the sentence. These findings suggest that the positive ACE is a result of priming between concurrently activated directional feature codes in sentence processing and response planning. In contrast, the negative ACE seems to result from interference between the two processes that probably arises because the directional feature code is already bound to the simulation of the sentence content and, thus, is less accessible for response planning.

#### **LIMITATIONS OF THE FINDINGS**

Our data cannot rule out that, in addition to representations of intended action effects, motor representations are also involved in the emergence of the ACE: in some conditions, the ACE was modulated by action-effect relation. However, the modulation by action-effect relation reflected that the ACE occurred only in the inverted action-effect relation condition. It could be that the compatibility effect was more pronounced in this condition because responding was more difficult, which resulted in participants concentrating more on the direction of the response. Because of this, the directional feature code might have received stronger activation, which in turn led to stronger interactions with the respective feature code activated during sentence comprehension. However, even though we cannot exclude that motor-related processing of the action sentences did occur in our study, our results show that processing according to intended action effects was stronger. Whether motor programs associated with the described actions are always activated during sentence comprehension and, if so, under which conditions representations of action effects or motor representations dominate, remains an open question.

<sup>2</sup>Although Kaschak and Borreggine define the ACE in terms of movement, it is legitimate to compare our results with theirs, because in their study, as well as in our regular condition, actions and their intended effects were completely correlated and hence sentence-effect compatibility is the same as sentence-action compatibility. Because the ACE did not differ significantly between the regular and the inverted condition in our experiment, the ACE in the inverted condition should also be comparable to Kaschak and Borreggine's ACE.

Overall, the ACEs we observed were weaker and less reliable than the effects found in other ACE experiments (e.g., Glenberg and Kaschak, 2002; Borreggine and Kaschak, 2006; Kaschak and Borreggine, 2008). One reason for this could be that we investigated the ACE in German instead of in English and that differences between the languages could have caused differences in the mental simulation during sentence processing. One of the differences between the English and German linguistic material lies in the sentence construction. In the studies listed above, half of the sentences used the double-object construction (e.g., "Courtney handed you the notebook"/"You handed Courtney the notebook"), and half used the dative construction (e.g., "Andy delivered the pizza to you"/"You delivered the pizza to Andy"), whereas our German sentences were only in the double-object form (e.g., "Andrea bringt dir die Pizza"/"Du bringst Andrea die Pizza"). This was due to the fact that the dative form is not very common in the German language, and for most of the verbs used it would actually be grammatically wrong. Following the linguistic focus hypothesis (Taylor and Zwaan, 2008), this dative form, especially, may give rise to a strong ACE: in this construction, the recipient is postponed to the end of the sentence, whereby the direction of transfer is brought back into the attentional focus at this late point of processing. The renewed activation of the directional feature code around the end of the sentence may enable priming of a compatible response even when response preparation occurs rather late. In contrast, in the German sentences the focus is shifted to the transferred object and not to the direction of transfer at the end of the sentences. This may also contribute to the particular temporal dynamics of the ACE observed in Experiment 2.

## **COMPARISON WITH RELATED THEORIES**

The present findings are compatible with the TEC (Hommel et al., 2001) and, on closer inspection, they are also compatible with the indexical hypothesis by Glenberg and Robertson (1999, 2000). The theory of perceptual symbol systems by Barsalou (1999) appears to be incompatible with the finding that representations of action effects are activated during the comprehension of action sentences.

The correspondence of our results with the common coding approach and with TEC can be explained as follows: extending those theories to linguistic stimuli, we make the additional assumption (following embodied approaches to language comprehension) that semantic meaning of linguistic stimuli is represented in the same format as the perceptual and action events these stimuli refer to. Assuming that the meaning of actions is represented in terms of the intended action effects, it can be claimed that the semantic representations of action words and sentences are shaped by the goals or effects of the described actions. Because of these shared representations of action-related language and real actions, compatibility effects arise between linguistic stimuli and intended action effects. This was confirmed by the appearance of the action effect-related ACE in Experiment 2. The negative ACE and the related time-course of the ACE that we observed in Experiment 2 appear to be broadly consistent with the mechanisms of code activation and integration proposed by TEC.

Similar to the positive ACE, compatibility benefits due to activated codes are also reflected in interactions between language processing and actions that were already mentioned in the introduction (e.g., Boulenger et al., 2006; Zwaan and Taylor, 2006): the representation of the word meaning activates feature codes which then prime feature-overlapping responses. Similar to the negative ACE, compatibility costs due to integrated codes have been shown, for example, between perceptual and action-planning processes (Müsseler and Hommel, 1997): perceptual performance (the identification of a briefly presented and masked arrow pointing to the left or right) was impaired when participants concurrently prepared a movement that was spatially compatible with the stimulus.

Furthermore, we follow TEC in assuming that the shared representations referring to intended action effects reside on a higher cognitive level. According to Hommel et al. (2001), this is because the activation of representations of intended action effects (distal representations) is assumed to be the initial step in action planning – the more abstract premotor component – and is a prerequisite for selecting the appropriate motor codes (for evidence for shared high-level representations of perception and action see, e.g., Massen and Prinz, 2007). In line with this, some authors propose that more abstract, higher-level action representations might be involved in understanding action-related language (e.g., Zwaan, 1999; de Vega, 2008; Pulvermüller, 2008). According to Zwaan and colleagues (Zwaan, 1999; Taylor and Zwaan, 2009), representations evoked by verbal descriptions can be embodied to different degrees: depending on the existence of one's own visual or motor experience, the mental representation of the described situation can be rich or poor, detailed or coarse. For example, descriptions of actions that are not part of one's own action repertoire (such as specific sports) cannot be simulated in detail, and hence the motor system is only slightly involved in their comprehension. Such coarse simulations may draw on higher-level action representations; this might even apply to descriptions of familiar actions when their details remain unspecified (de Vega, 2008).

The finding of the action effect-related ACE contradicts the theory of perceptual symbol systems suggested by Barsalou (1999). According to this theory, linguistic descriptions evoke multimodal representations of their referents, that is, they reactivate associated experiences that are simulated solely by modality-specific systems. In the case of action sentences, mainly experiences of motor states should be simulated. Thus, this approach predicts priming of low-level motor programs (i.e., a movement-related ACE), and it cannot therefore account for the occurrence of the action effect-related ACE.

Similar to Barsalou's (1999) theory, the indexical hypothesis by Glenberg and Robertson (1999, 2000) also assumes that the words in a sentence activate low-level modal representations of the objects and actions they refer to. Yet, beyond that, the indexical hypothesis proposes two additional processes of sentence comprehension that make it possible to account for the present results: affordances are derived from these representations, that is, the comprehender gains access to potential motor interactions with the referential objects. As a next step, the affordances are combined or meshed into a coherent, executable, and imaginable set of actions. This meshing process is guided by the meaning of the syntactic construction. For instance, double-object constructions evoke the notion that an agent (Subject) transfers an object or something more abstract (Object2) to a recipient (Object1), that is, they activate a schema for giving (Goldberg, 2003). Thus, syntactic constructions are assumed to activate more generalized action schemas or higher-order action representations (see also Bergen et al., 2004; Feldman and Narayanan, 2004). Correspondingly, the finding of the action effect-related ACE can be explained by the indexical hypothesis in the following way: for one thing, through the meaning of the syntactic construction, processing a transfer sentence activates a certain transfer goal (the action effect). For another thing, affordances are derived depending on current goals and the learning history of the comprehender. If the comprehender has learned that, in the current situation, an action effect in a certain direction can only be achieved by making a movement in the opposite direction, then processing a transfer sentence also activates a movement representation opposite to the direction of transfer. In this way, an action effect-related ACE can occur that relies on both high-level distal representations and low-level motor representations. The indexical hypothesis is therefore consistent with the present results, which point to the importance of high-level representations, as well as with previous results, which point to the importance of motor-level action representations in the comprehension of action-related language (e.g., Glenberg et al., 2008a,b; for a more elaborate account see Glenberg and Gallese, 2012).

## **CONCLUSION**

Altogether, our findings confirm the close coupling of cognition and action and provide further evidence for the embodied approach to language comprehension. The presented results revealed that the comprehension of linguistic descriptions of actions involves the activation of higher-order action representations referring to distal effects of these actions.

Moreover, the results indicate that interactions between (declarative) sentence comprehension and (procedural) response selection are highly sensitive to the temporal relationship between the two kinds of processes.

In conclusion, our results suggest that declarative and procedural modes of action processing are less different from each other than is often thought. While they may differ in terms of concomitant mental experiences, they seem to be fairly equivalent in terms of underlying functional mechanisms. They interact with each other and draw on common representational resources, to the effect that high-level processes such as sentence processing can influence action unconsciously. Thus, within the limits of the present paradigm, we can account for our experimental observations without assigning a particular functional (or even causal) role to conscious awareness. However, we do not mean to generalize this conclusion beyond the limits of our paradigm. Other paradigms may invite other kinds of conclusions.

## **ACKNOWLEDGMENTS**

We thank Gudrun Henze, Anne Herbik, Ramona Kaiser, and Katharina Horstkotte for helping with the data collection and Henrik Grunert for building the experimental apparatus. We also thank Arthur Glenberg for helpful comments on an earlier version of this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 March 2013; accepted: 26 April 2013; published online: 21 May 2013.*

*Citation: Diefenbach C, Rieger M, Massen C and Prinz W (2013) Action-sentence compatibility: the role of action effects and timing. Front. Psychol. 4:272. doi: 10.3389/fpsyg.2013.00272*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Diefenbach, Rieger, Massen and Prinz. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Persistence of internal representations of alternative voluntary actions

## **Elisa Filevich<sup>1,2</sup>\* and Patrick Haggard<sup>1</sup>**

<sup>1</sup> Institute of Cognitive Neuroscience, University College London, London, UK

<sup>2</sup> Max Planck Institute for Human Development, Max Planck Institut für Bildungsforschung, Berlin, Germany

#### **Edited by:**

Ezequiel Morsella, San Francisco State University, USA

#### **Reviewed by:**

Ezequiel Morsella, San Francisco State University, USA

T. Andrew Poehlman, Southern Methodist University, USA

#### **\*Correspondence:**

Elisa Filevich, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. e-mail: elisa.filevich@gmail.com

We have investigated a situation in which externally available response alternatives and their internal representations could be dissociated, by suddenly removing some action alternatives from the response space during the interval between the free selection and the execution of a voluntary action. Choice reaction times in this situation were related to the number of initially available response alternatives, rather than to the number of alternatives effectively available after the change in the external environment. The internal representations of response alternatives appeared to persist after external changes actually made the corresponding action unavailable. This suggests surprising dynamics of voluntary action representations: counterfactual response alternatives persist, and may even be actively maintained, even when they are not available in reality. Our results highlight a representational basis for the counterfactual course of action. Such representations may play a key role in feelings of regret, disappointment, or frustration. These feelings all involve persistent representation of counterfactual response alternatives that may not actually be available in the environment.

**Keywords: free action, response selection, Hick's law, volition, reselection**

## **INTRODUCTION**

Voluntary action involves selection of one action alternative amongst a series of equally available ones. Rapidly changing environments may impose sudden changes to the set of effectively available alternatives. Imagine a field hockey player running toward the goal with a ball, and coming to face the goalkeeper. Whilst she is deciding whether to push the ball to the right or to the left of the goalkeeper, a defense player suddenly comes to block the left side of the goal, leaving the attacker with only one effective alternative (pushing the ball to the right) if she wants to score.

Several components can be identified in these situations of action selection. First, an internal response space must be constructed, containing representations of possible alternative responses (Fletcher et al., 2000). Next, one response (Gold and Shadlen, 2007) must be selected from the response space. Finally, the corresponding action must be prepared and executed (Deecke et al., 1969). Clearly, these processes may be dynamically updated with changes in the environment. For example, the motor plan may need to be adjusted, or completely switched after it has been selected (Wise and Mauritz, 1985; Snyder et al., 2000; Resulaj et al., 2009). A final step in the process is often neglected: the representations of the non-selected (or counterfactual) response alternatives must be dismantled (Logan et al., 1984).

Early views of action selection considered these processes to occur serially (e.g., Keele, 1968). However, it is now widely recognized that these are not independent processes, and that they instead occur in parallel, and influence each other by competitive inhibition (Cisek, 2007; Cisek and Kalaska, 2010). For example, there is converging evidence suggesting that action selection is in play until relatively late in the chain of events, and that it co-occurs with action preparation (Sakai et al., 2000; Cisek and Kalaska, 2005; Klein-Flügge and Bestmann, 2012). Importantly, parallel processing models are not consistent with the notion of a single, definitive process of action selection. Rather, multiple action representations may persist with different levels of activation, through an extended period of preparation, until one dominant action emerges from the competition. On this view, an action may be actively entertained even if it is not the "front-runner" in the response selection process, and ends up being counterfactual. This parallelism of voluntary action representations could allow for adequate flexible behavior in a dynamic environment. Suppressing a response alternative too early may make it harder to reactivate it if circumstances require. A concomitant disadvantage of parallelism is that non-selected alternatives may remain needlessly activated.

Unselected, counterfactual action representations have proved difficult to study for the simple methodological reason that they have no behavioral output. Therefore, most current knowledge comes from animal studies where it is sometimes possible to record directly the neural signals involved in decision processes (Cisek and Kalaska, 2005), or from human imaging studies where the relative reward value of the non-chosen alternative should be tracked (Boorman et al., 2011; Rushworth et al., 2011).

We have developed an indirect, behavioral measure of counterfactual action based on the number of alternatives in the response space. In a choice reaction task, the reaction time depends strongly on the number of potential response alternatives (response set size). Hick (1952) found monotonic, increasing relations between reaction times (RTs) and set size, now widely known as "Hick's law." Hick's law is often explained by the additional time required to compare a stimulus repeatedly with each entry in a stimulus-response look-up table, and thus retrieve the correct action from the response space. Importantly, a crucial distinction must be made between *external* and *internal* response sets, or "in the world" and "in the brain" (cf. Gold and Shadlen, 2007). The former refers to the response alternatives that are effectively available in the external environment. The latter refers to the internal representations of the alternatives within the response space. While the external and internal sets should normally match, they need not do so, and only internal set sizes can influence RTs. This possibility allows us to test whether an unselected and unexecuted action is nevertheless represented within the internal response set. In particular, a higher RT than the external set size would predict could potentially be explained by the presence of an additional, counterfactual action representation within the response space.
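The logic of this measure can be illustrated with a minimal sketch. It assumes the logarithmic form of Hick's law, RT = a + b · log2(n), with n the internal set size; the function name and the intercept and slope values are hypothetical and chosen only for illustration, not estimated from any data.

```python
import math


def hick_rt(set_size, intercept_ms=300.0, slope_ms=100.0):
    """Predicted RT (ms) under Hick's law: RT = a + b * log2(n).

    intercept_ms (a) and slope_ms (b) are illustrative values only.
    """
    if set_size < 1:
        raise ValueError("set size must be at least 1")
    return intercept_ms + slope_ms * math.log2(set_size)


# External set: two alternatives remain available in the environment.
rt_external = hick_rt(2)

# Internal set: three alternatives, if one counterfactual
# representation persists alongside the two available ones.
rt_internal = hick_rt(3)

# A positive excess is what a persisting representation would predict.
excess_ms = rt_internal - rt_external
```

Under this sketch, any RT excess over the prediction for the external set size would be read as a sign that the internal set is larger than the external one.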

Hick's law has classically been applied to instructed actions, where a stimulus explicitly tells participants which action to make on every trial. In voluntary actions, by contrast, there are no explicit instructions about which action to make, and the participant instead freely selects one action from the response space. The underlying neural structures of voluntary actions differ from those for instructed actions (Krieghoff et al., 2011). In particular, under parallel models of action selection, voluntary action, but not instructed action, may lead to feelings of regret (Boorman et al., 2009). Regret can be defined as a negative value in the comparison between the outcome of the chosen alternative and that of the counterfactual alternative (Coricelli et al., 2005). To make this comparison, the relative value of the chosen and unchosen alternatives must be computed. Therefore, some representation of the counterfactual alternatives must remain until after the choice has been made. Thus, regret implies the persistent representation of actions that were included in the response space, but were not in fact selected.

Here, we asked whether internal response sets retain traces of counterfactual action when the external response set changes, by using an experimental analog of the hockey goal shooting example mentioned at the start of this article. The task required three main features. First, it should present participants with a dynamic response space, in which some freely selected alternatives might suddenly become unavailable. Second, the task should allow the initially selected (but subsequently inhibited) response to be identified. For this aspect, we relied on participants' subjective reports made after the trial. Third, and crucially, the task had to provide an implicit behavioral measure demonstrating the covert representation of unexecuted alternative actions, without explicitly reactivating them.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Eighteen naïve participants (11 female, mean age ± SD: 24 ± 5 years) took part in the study. One participant did not update their choice following changes in the number of available alternatives, and in fact selected disappeared locations. Their data were therefore excluded from the analysis. This yielded a total of 17 participants. All participants had normal or corrected-to-normal vision. Procedures were approved by the University College London research ethics committee and were in accordance with the principles of the Declaration of Helsinki.

## **TASK**

We asked participants to voluntarily select a response from an initially available set of responses. Once an intentional decision had been made, but before the response was executed, the external response set was suddenly reduced. On some trials, therefore, the response that the participant had already selected would become unavailable, and they would need to reselect another, alternative response from the updated external response set. Using RTs as a proxy for the internal response set size, we addressed whether the internal response set had been rapidly updated to match the new external response set, or whether the internal response set lagged behind the external changes (see **Figure 1**). Two scenarios were possible. In the first place, the internal representation of the response alternatives could perfectly track the external response set. Alternatively, the internal representation of the response alternatives could contain a persistent representation of the initially selected and now unavailable response.

Because the size of the response set is different, the two cases can be distinguished using Hick's Law, even if they are behaviorally identical. If initially selected response alternatives were effectively removed from the internal response set once they become unavailable, RTs would increase as a function of the final response set size (and not the initial response set size). Conversely, if the neural representation of the initially selected but now unavailable response is maintained in the internal response set, then RTs would increase as a function of the initial response set size.

Stimuli were displayed on a CRT monitor with a refresh rate of 60 Hz. Participants sat 60 cm away from the screen. The experiment consisted of six blocks of 100 trials and lasted for approximately 50 min. Each trial belonged to one of four experimental conditions that will be described thoroughly below. These were *no change* (34% of the total number of trials), *instructed selection* (20%), and *original selection* and *reselection* (together, 46%). The exact proportion was partly determined by the participants' behavior, see below.
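As a quick check of the design arithmetic, the nominal number of trials per condition follows from the block structure and the stated proportions. The sketch below uses only the figures given above; the variable names are hypothetical, and the counts are nominal because the exact split depended on participants' behavior.

```python
BLOCKS = 6
TRIALS_PER_BLOCK = 100
total_trials = BLOCKS * TRIALS_PER_BLOCK

# Nominal proportions as stated in the text, in percent.
proportions_pct = {
    "no change": 34,
    "instructed selection": 20,
    "original selection + reselection": 46,
}

# Integer arithmetic avoids floating-point rounding in the counts.
nominal_counts = {
    name: total_trials * pct // 100 for name, pct in proportions_pct.items()
}
```

With these figures, the nominal counts are 204, 120, and 276 trials, respectively. The 276 trials in the last category agree with the sum of the mean counts reported in the Results for the *selected* and *reselected* conditions (126 + 150).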

At the start of each trial, one to four different numbers were displayed on the screen, arranged around a central fixation cross with 2° eccentricity (see **Figure 2**). Number location and identity were randomized. All stimuli were displayed over a black background. We used numbers as targets because we sought to minimize the working memory load on both target selection and recall, minimizing in turn the problems and potential biases associated with subjective report.

The set of numbers first presented in each trial was the initial response set. Numbers in the initial response set were randomly sampled without repetition from the numbers 1–9 excluding the number 5 (see *instructed* condition below). The numbers in the initial response set were displayed in white for either a short or a long exposure time (ET). Short ETs were periods of 550 ms with a random jitter of a maximum of ±200 ms. Long ETs were periods of 1500 ms with a random jitter of a maximum of ±200 ms. Participants were asked to covertly select one of the numbers in the initial response set during the ET, and to prepare to move a cursor and click on the number using a large trackball mouse (Keytools Ltd., Southampton, UK). They were instructed to make a new free selection on each trial, avoiding stereotyped responses or sequential patterns. The ET was varied to allow more or less time for this initial selection process. Short and long ETs were randomly assigned to experimental trials. We assumed that longer ETs would allow for stronger action preparation and a stronger, and therefore more persistent, encoding of the initial response set.

**Figure 1 |** … cases, the internal representation of the response alternatives must be updated to reflect the changes in the external environment. Two … internal response set may be resilient to change, and the internal response set may lag behind changes in the external environment.

After the ET, the fixation cross changed color, from white to red. This was the "Go" signal, instructing participants to move to the selected target number. Crucially, in the *original selection* and *reselection* conditions, a subset of the numbers in the initial response set disappeared at the time of the Go signal. The remaining numbers changed color and turned green. The number of disappearing targets varied from zero to *n* − 1, where *n* is the initial response set size. Consequently, the *final* response set varied from one item to the full initial response set. The positions and identity of the disappearing numbers were fully randomized.
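The construction of a single free-choice trial, as described so far, might be sketched as follows. This is an illustration only: the function and field names are hypothetical, and the sketch assumes uniform draws for the set size, the ET jitter, and the number of removed items. The text does not specify these distributions, and the sketch does not attempt to reproduce the balancing of initial and final set sizes used in the experiment.

```python
import random

FREE_CHOICE_DIGITS = [1, 2, 3, 4, 6, 7, 8, 9]  # 1-9 without the instructed "5"
ET_MEANS_MS = {"short": 550, "long": 1500}
ET_JITTER_MS = 200


def make_free_choice_trial(rng):
    """Draw one free-choice trial: initial set, exposure time, final set."""
    n_initial = rng.randint(1, 4)  # one to four numbers on screen
    initial_set = rng.sample(FREE_CHOICE_DIGITS, n_initial)  # no repetition

    et_kind = rng.choice(["short", "long"])
    # Assumption: jitter is uniform within +/- 200 ms of the nominal ET.
    exposure_ms = ET_MEANS_MS[et_kind] + rng.uniform(-ET_JITTER_MS, ET_JITTER_MS)

    # Between zero and n - 1 numbers disappear at the Go signal,
    # so at least one alternative always remains.
    n_removed = rng.randint(0, n_initial - 1)
    removed = set(rng.sample(initial_set, n_removed))
    final_set = [d for d in initial_set if d not in removed]

    return {
        "initial_set": initial_set,
        "final_set": final_set,
        "et_kind": et_kind,
        "exposure_ms": exposure_ms,
    }


trial = make_free_choice_trial(random.Random(1))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching the global random state.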

After the Go signal, participants moved the trackball to bring the cursor to the selected number, and clicked the mouse button. They were instructed to make this action as quickly as possible. If the originally chosen number had disappeared from the final response set, participants were asked to reselect a different number from the smaller final response set of available alternatives. Otherwise, they were to execute the originally selected response.

After clicking on the target number, participants reported which number they had *originally* chosen in all conditions, and regardless of which number they had clicked on. In this way, trials in which reselection had occurred could be identified on the basis of subjective report. We assumed that reselection had occurred if the reported original choice did not match the clicked number, and if the original choice had disappeared. Otherwise, trials were classified as "simple selection" (see **Figure 3B**). At debriefing, no participant reported difficulties in the report of their original choice.
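The classification rule just described can be written compactly. The sketch below is a direct transcription of that rule; the function and argument names are hypothetical.

```python
def classify_trial(reported_original, clicked, final_set):
    """Label a free-choice trial from the post-trial subjective report.

    A trial counts as "reselection" only if the reported original choice
    differs from the clicked number AND the original choice had disappeared
    from the final response set. Every other trial is "simple selection".
    """
    original_disappeared = reported_original not in final_set
    if reported_original != clicked and original_disappeared:
        return "reselection"
    return "simple selection"


# Original choice (3) disappeared; participant clicked 7 instead.
label = classify_trial(reported_original=3, clicked=7, final_set=[7, 8])
```

Note that a mismatch between report and click is not sufficient on its own: if the reported number was still available, the trial falls into the "simple selection" category.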

Our task crucially required that participants did indeed select from the initial response set, rather than simply wait for the appearance of the final response set, as only then could we evaluate the persistence of suddenly unavailable response alternatives. We used two strategies to ensure that participants attended to the initial response set, and selected an action from it. First, we included *instructed* trials. If the number "5" was found in the initial response set, participants were instructed to always select this number, and execute the corresponding response on seeing the Go signal. No numbers were removed from the initial response set in *instructed* trials. Second, to prevent participants from simply waiting for the final response set, we included a *no change* condition, in which the Go signal appeared but no numbers were removed from the initial response set (see **Figure 3A**).

To encourage action preparation following the initial response set, and therefore inhibition of the prepared response, we rewarded participants for quick responses (measured as time to click on the target relative to the onset of the Go signal, i.e., the sum of reaction time and movement time). We informed participants that they would get 0.5p extra for every trial that was quicker than their own average in the preceding block. Therefore, the experimental design discouraged the potential strategy of ignoring the initial response set completely and waiting for the final response set instead. Participants earned on average £2.23 (±SD £0.03).
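The reward rule can be sketched as a per-block computation. The function and variable names below are hypothetical, and the example times are invented for illustration; only the 0.5p bonus and the comparison against the preceding block's average come from the text.

```python
from statistics import mean

BONUS_PENCE_PER_TRIAL = 0.5


def block_bonus_pence(previous_block_times_ms, current_block_times_ms):
    """Bonus earned in one block.

    Each trial in the current block that is quicker than the participant's
    own mean time-to-click in the preceding block earns 0.5p.
    """
    benchmark_ms = mean(previous_block_times_ms)
    n_rewarded = sum(1 for t in current_block_times_ms if t < benchmark_ms)
    return n_rewarded * BONUS_PENCE_PER_TRIAL


# Invented example: the previous block averaged 900 ms,
# and two of four current trials beat that benchmark.
bonus = block_bonus_pence([800, 1000], [700, 900, 950, 850])
```

For the first experimental block, the benchmark would come from the practice session rather than from a preceding experimental block.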

To discourage participants from adopting a predetermined choice strategy, both the identity and the spatial location of the targets varied randomly from trial to trial. Randomly sampled numbers were displayed on the vertices of a square with an angular tilt of either 0° or 90°. The position of the targets was fully randomized.

**Figure 3 |** … in the initial response set, participants were instructed to click on it (instructed condition). In the no change condition, the number choice was intentional. **(B)** Conditions with non-matching initial and final response set sizes. In the original selection and reselection conditions, some numbers … choice. Instead, a trial was sorted as reselection if participants reported having chosen a number that had become unavailable. n<sup>i</sup> and n<sup>f</sup> indicate the initial and final response set sizes, respectively. They were not displayed in the experiment.

Importantly, the initial and final response set sizes were not correlated. This allowed us to test for the independent contributions of these parameters to RT.

Before starting the experiment, participants had a short practice session of 40 trials. The mean movement time during this practice session was recorded to calculate the number of rewarded trials in the first experimental block. The data from the practice session were otherwise not further analyzed.

#### **DATA ANALYSIS**

Reaction times were calculated as the time at which the mouse speed first increased above zero after the go signal. Because of the screen refresh rate (60 Hz), RTs were obtained with a relatively low temporal precision, of one sample every ∼16.7 ms. Trials with RTs under 100 ms were rejected, as potentially anticipatory. In the same way, trials with RTs longer than 1000 ms were rejected. Movement times were calculated as the time taken to click within 20 pixels of the number target, relative to the Go signal. Therefore, movement times included RTs.
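A minimal sketch of this RT definition and the rejection criteria is given below. It assumes that cursor speed is available as one sample per screen frame and that the index of the Go-signal frame is known; the function names and the data layout are hypothetical.

```python
FRAME_MS = 1000 / 60  # one sample per frame at 60 Hz, about 16.7 ms


def reaction_time_ms(speeds, go_index, frame_ms=FRAME_MS):
    """Time from the Go signal to the first sample with non-zero cursor speed.

    Returns None if the cursor never moves after the Go signal.
    """
    for i in range(go_index, len(speeds)):
        if speeds[i] > 0:
            return (i - go_index) * frame_ms
    return None


def is_valid_rt(rt_ms, lower_ms=100, upper_ms=1000):
    """Reject anticipatory (< 100 ms) and overly long (> 1000 ms) RTs."""
    return rt_ms is not None and lower_ms <= rt_ms <= upper_ms


# Invented example: Go signal at sample 2, first movement at sample 14,
# i.e., 12 frames (about 200 ms) after the Go signal.
speeds = [0.0] * 14 + [0.5, 1.2, 2.0]
rt = reaction_time_ms(speeds, go_index=2)
```

Because RTs are quantized to frame boundaries, two RTs that differ by less than one frame cannot be distinguished under this definition.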

To calculate the relationship between RT and response set size, linear regressions were obtained for each participant's data, and the slopes analyzed. RTs were fitted with a linear function, rather than the logarithmic relation normally used for Hick's law, for several reasons. First, RTs increase with increasing response set sizes, but a strict logarithmic relationship has not been tested (Lau et al., 2004; Van Eimeren et al., 2006; Kühn et al., 2009; Zhang et al., 2012). Second, the response set sizes considered here (1–4) would fall on the rising arm of the logarithmic function, so could be approximated linearly. Finally, our aim was to establish whether RTs were affected by either initial or final response set sizes, regardless of the precise form of the relationship.
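The per-participant slope estimate can be illustrated with an ordinary least-squares fit. The sketch below uses invented mean RTs for a single hypothetical participant and a hand-written slope function; it is not the analysis code used in the study. The same function can be applied to the binary logarithm of the set size to obtain the linearized form of Hick's law.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


set_sizes = [1, 2, 3, 4]
mean_rts_ms = [300.0, 340.0, 380.0, 420.0]  # invented values

# Linear fit: RT cost in ms per additional response alternative.
slope_linear = ols_slope(set_sizes, mean_rts_ms)

# Linearized Hick's law: RT cost in ms per bit, i.e., per doubling of set size.
log_sizes = [math.log2(n) for n in set_sizes]
slope_log2 = ols_slope(log_sizes, mean_rts_ms)
```

For these invented data the linear slope is 40 ms per alternative. One slope per participant and design cell would then enter the group-level ANOVA.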

## **RESULTS**

Participants made few omission errors in *instructed* trials. There was a mean omission rate of 0.94 ± 0.3%. After rejection of omission trials, an average (±SD) of 114 ± 2 trials were included in the *instructed* condition, 186 ± 5 trials in the *no change* condition, 126 ± 17 trials in the *selected* condition and 150 ± 18 trials in the *reselected* condition. The original selection and reselection conditions presented the highest variability in the number of trials across participants because the exact number of trials that fell in each condition depended on each participant's behavior. Based on the total number of trials and the combination of initial and final response set sizes, the mean expected number of reselection trials was 139, comparable to the figure obtained.

Trials with RTs shorter than 100 ms were rejected, as potentially anticipatory. Overall, 26 ± 24% of trials were rejected, across all participants and conditions. The high mean number of rejected trials was mainly driven by two participants who had a strong tendency (>60% of trials) to anticipate their movements to the Go signal. The results reported here remained valid when we excluded the data from these participants from the analysis.

Differences between the proportions of rejected trials were examined. A two-way 4 × 2 repeated measures ANOVA with the factors of condition and ET revealed significant differences between conditions (*F*<sub>3,48</sub> = 48.16, *p* < 0.001). The highest proportion of rejected trials due to anticipation was in the *instructed* condition, where participants knew that the instructed target ("5") would not disappear. There were no significant differences between the proportion of rejected trials in the critical *selected* and *reselected* conditions (*F*<sub>1,16</sub> = 0.98, *p* = 0.338). Average numbers of trials are shown in **Table 1**.

#### **CONDITIONS WITH NO CHANGES IN SET SIZE**

In trials in which no numbers disappeared from the initial response set, initial and final response sets were equivalent, so the only factors of interest were condition (*instructed*/*no change*) and ET (short/long). The RT averaged across all participants for each response set size is shown in **Figure 4**.

To explore the effect of ET and voluntary selection, we obtained the mean RTs collapsed across all response set sizes for each condition (**Figure 4**). A 2 × 2 ANOVA with the factors of condition (*no change*/*instructed*) and ET revealed a main effect of ET (*F*<sub>1,16</sub> = 22.26, *p* < 0.001), suggesting that participants prepared their motor response during the ET.

There was also a significant main effect of condition (*F*<sub>1,16</sub> = 27.02, *p* < 0.001), and a significant interaction effect (*F*<sub>1,16</sub> = 11.01, *p* = 0.004). Follow-up *t*-tests revealed a significant difference between the *instructed* and *no change* conditions for short ETs (*t*<sub>16</sub> = −2.14, *p* = 0.048), and a strongly significant difference for long ETs (*t*<sub>16</sub> = −8.73, *p* < 0.001). The longer RTs for the *no change* condition compared to the *instructed* condition reveal an RT cost for voluntary action selection in the former.

To test the effects of set size on RTs, we fitted linear regressions to each participant's data in each cell of the design, and performed a two-way repeated measures ANOVA with factors of condition (*no change*/*instructed*) and ET (short/long) on the estimated slope parameters (see **Figure 4**).

Results revealed a main effect of condition (*F*<sub>1,16</sub> = 14.28, *p* = 0.002) but no significant main effect of ET (*F*<sub>1,16</sub> = 0.09, *p* = 0.773) or interaction effect (*F*<sub>1,16</sub> = 0.14, *p* = 0.709). The significant main effect of condition was expected, and consistent with Hick's law. Whereas voluntary response selection amongst larger sets should have an RT cost in the *no change* condition, the response set size should have no effect on instructed RTs, since there is no selection process other than visually searching for the target.

**Table 1 | Final mean (±SD) number of trials per condition after rejecting incorrect and anticipatory trials.**


#### **CONDITIONS WITH DIFFERENT INITIAL AND FINAL RESPONSE SETS**

In the *selected* and *reselected* conditions, one or more numbers were removed from the response set. Consequently, the initial and final response set sizes differed. RTs as a function of either the initial or the final response set size are shown in **Figure 5**.

Our design carefully ensured that the initial and final response set sizes were not correlated. For example, trials with an initial response set of four could have any of the possible final response set sizes of 1, 2, 3, or 4. Similarly, trials with an initial response set of 3 could have any of the possible final response set sizes of 1, 2, or 3. Therefore, the relationship between mean RT and the size of the initial response set could potentially differ from the relationship between the mean RT and the size of the final response set. We could use this design feature to investigate whether the internal representation was updated to match the final set size. Updating the internal representation predicts a stronger relation between RT and final set size than between RT and initial set size, while failure to update predicts the opposite pattern.

To examine the effects of reselection on RTs, we first analyzed the mean RTs, irrespective of set size, by incorporating the factor of response set into a 2 × 2 × 2 ANOVA with the factors of ET (short/long), condition (*selected*/*reselected*), and set (initial/final). There was a main effect of ET (*F*<sub>1,16</sub> = 39.24, *p* < 0.001), suggesting that long ETs allowed for stronger motor preparation than shorter ETs, and validating the ET manipulation. We also found a significant main effect of condition (*F*<sub>1,16</sub> = 14.06, *p* = 0.002). We interpret this as an RT cost of the inhibition of the original action plans and the process of number reselection.

We also found a main effect of response set (*F*<sub>1,16</sub> = 6.33, *p* = 0.023), with larger response sets generally associated with longer RTs. There was a significant response set × ET interaction (*F*<sub>1,16</sub> = 9.54, *p* = 0.007). There were no other significant effects.

More importantly, we analyzed the slopes of the individual linear fits for the RTs in a repeated measures 2 × 2 × 2 ANOVA. Differences between the slopes allowed us to infer whether an updating of internal representation did or did not occur when disappearance of an item from the initial response set triggered reselection. The mean slope estimates for the *selected* and *reselected* conditions are shown in **Figure 6**.

Results from the three-way ANOVA revealed a significant main effect of ET (*F*<sub>1,16</sub> = 6.87, *p* = 0.019), presumably also revealing the results of increased motor preparation.

Importantly, we also found a main effect of response set (*F*<sub>1,16</sub> = 5.12, *p* = 0.038), and a significant response set × ET interaction (*F*<sub>1,16</sub> = 6.551, *p* = 0.021). This indicates that the initial response set size had a stronger impact on RTs than the final response set size. No other effects were significant.

To investigate the response set × ET interaction, the slope estimates were collapsed across conditions. Follow-up *t*-tests revealed no differences between initial and final response set sizes in the short ET conditions (*t*<sub>16</sub> = 0.28, *p* = 0.779), but clearly significant differences between the initial and final response set sizes in the long ET conditions (*t*<sub>16</sub> = 4.04, *p* < 0.001). When participants had enough time to represent the initial response space and prepare actions (in long ET conditions), the number of initially available response alternatives seems to have a measurable effect on RTs, even if the selected alternative later became unavailable. This suggests that the internal representation of the response space, once it is built, is not fully updated if the number of response alternatives is reduced. That is, the internal representation of the response space displays persistence.

#### **RTs AS A FUNCTION OF THE BINARY LOGARITHM OF THE RESPONSE SET SIZE**

We obtained the above results by estimating linear fits of the RTs as a function of the different response set sizes (see Materials and Methods). As a control, we also explored whether the same results would be valid if the RTs were described as a function of the binary logarithm of response set size, as established by Hick's law (Hick, 1952). Because a maximum of four response set sizes is not enough to produce reliable estimates of the parameters of a logarithmic function, we considered the linearized response set size. In other words, we conducted the same analyses, but considering RTs as a function of the binary logarithm of the response set size, rather than as a function of the response set size itself. This analysis yielded results similar to those reported above.

A 2 × 2 × 2 ANOVA on the slopes of the RTs as a function of the binary logarithm of the response set size revealed a main effect of response set (*F*<sub>1,16</sub> = 9.17, *p* = 0.008), a marginally significant effect of ET (*F*<sub>1,16</sub> = 4.41, *p* = 0.052), and a marginally significant effect of condition (*F*<sub>1,16</sub> = 4.41, *p* = 0.051). There was a trend for a significant response set × condition interaction (*F*<sub>1,16</sub> = 3.99, *p* = 0.06). No other effects were significant.

Finally, in the analysis reported above, we calculated RTs as the first point in time at which the speed of the cursor was non-zero. To ensure that the obtained results were not an artifact of the way in which the RTs were defined, we performed the same analysis on the slopes of the linear fits in two alternative ways. First, we calculated RTs as the time at which the cursor had covered 25% of the total distance in each trial. Second, we performed the same analysis on movement times, calculated as the time to click on the final target. In both cases, the three-way repeated-measures ANOVA yielded a significant effect of response set (*F*<sub>1,16</sub> = 12.8, *p* = 0.003 and *F*<sub>1,16</sub> = 13.32, *p* = 0.002, respectively).
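The two cursor-based RT definitions can be sketched as follows. This is a minimal illustration that assumes sampled cursor positions with timestamps; the function names, the choice of reporting the first sample at which displacement is observed, and the reading of "total distance" as cumulative path length are all assumptions, not details given in the text.

```python
import numpy as np

def rt_movement_onset(t, x, y):
    """Time of the first sample at which the cursor has moved (non-zero speed)."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    step = np.hypot(np.diff(x), np.diff(y))   # distance covered between samples
    moving = np.flatnonzero(step > 0)
    return t[moving[0] + 1] if moving.size else np.nan

def rt_fraction_of_path(t, x, y, fraction=0.25):
    """Time at which the cursor has covered `fraction` of the total path length."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    step = np.hypot(np.diff(x), np.diff(y))
    travelled = np.concatenate(([0.0], np.cumsum(step)))
    total = travelled[-1]
    if total == 0:
        return np.nan                          # the cursor never moved
    return t[np.argmax(travelled >= fraction * total)]

# Hypothetical trajectory sampled every 10 ms (not study data).
t = [0, 10, 20, 30, 40, 50]
x = [0, 0, 1, 2, 6, 10]
y = [0, 0, 0, 0, 0, 0]
onset_rt = rt_movement_onset(t, x, y)
quarter_rt = rt_fraction_of_path(t, x, y)
```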

In sum, the main effect of response set size remained after addressing the relationship between RTs and response set sizes in a way that followed more strictly the formulation of Hick's law. The effect was not highly sensitive to the way in which the RTs were calculated.

## **DISCUSSION**

In this study we aimed at answering the question of whether selected response alternatives that are no longer available in the environment nevertheless remain represented in the brain. The internal response sets driving RTs corresponded more closely to the initial than to the final external response sets. This suggests that the internal response sets are in fact resilient to external change, and "lag behind" sudden changes in the external environment.

## **CONDITIONS WITH EQUIVALENT INITIAL AND FINAL RESPONSE SET SIZES: INSTRUCTED AND NO CHANGE**

We first compared the *no change* and *instructed* conditions, where the initial and final set sizes were identical. Whereas the *no change* condition required intentional response selection, the *instructed* condition required only visual search to identify the instructed target. The *no change* condition was informative of the relationship between the RTs and the response set size. RTs in the *no change* condition showed a positive linear relation with response set size. Conversely, RTs in *instructed* trials did not depend on the response set size (i.e., the estimated slopes of the linear trends did not differ significantly from zero). This may seem surprising, as monotonic increases in instructed RTs as a function of response set size have been well documented (Hick, 1952). In this experiment, however, the ET temporally separated the processes of visual search and action initiation. This may explain the null effect of response set size on instructed RTs. Importantly, this validates the ET manipulation, aimed at allowing for selection and motor preparation, and suggests that the results cannot be easily explained by visual search processes.

We analyzed the effects of ET (short *vs*. long) and condition (*instructed vs*. *no change*). Shorter ETs were associated with longer mean RTs and with steeper dependencies of RTs on response set size. This suggests that longer ETs allowed for movement preparation, reducing the mean RT and decreasing the impact of increasing the number of response alternatives.

## **CONDITIONS WITH UNEQUAL INITIAL AND FINAL RESPONSE SET SIZES: "ORIGINAL SELECTION" AND "RESELECTION"**

In the selection and reselection conditions, some target numbers disappeared from the initial response set. Because the initial and final response set sizes were not correlated, we incorporated them as independent factors in statistical analyses.

In both *selected* and *reselected* conditions, trials with longer ETs showed shorter RTs. This effect mirrors what was found in the *no change* and *instructed* conditions, and once again suggests that response selection and motor preparation took place during the ET. Further, as expected, longer RTs were found in reselection trials due to the cost of response inhibition and reselection.

Crucially, an analysis of the slopes of the linear fits revealed stronger dependencies of the RTs on initial response set sizes than on final response set sizes. This suggests that the initial response set size had a stronger influence on the RTs than the final response set size. This effect was strongest for long ET conditions. Longer ETs may allow for stronger and more stable encoding of the initial response set size, leading to more persistence of the internal representation of the initial response set.

A comparison of the *selected* and *reselected* conditions revealed that reselection processes led to longer RTs, in all response set sizes. This is consistent with an RT cost of abandoning the initially selected response alternative and selecting a new one.

Interestingly, however, the persistence of the response space was not directly related to the disappearance of the selected alternative itself. The comparison between the *selected* and *reselected* conditions did not reveal differences in the RT slopes. This suggests that the persistence of the initial set is not uniquely driven by the disappearance of the selected alternative. Instead, these results suggest that it is the non-specific encoding of the entire response set that makes it persistent in the face of external change.

This effect recalls Schacter and Addis's (2007) hypothesis, from the very different field of episodic memory, that the brain flexibly recalls all past events in order to constructively simulate and prepare for future events. Over a shorter time scale such as that of our task, recollection of all possible past experiences of action selection could speculatively take place in the context of working memory.

Intriguingly, a marginally significant effect of condition on the RT slopes was found when the binary logarithm of the response set size was considered instead of the absolute response set size. This analysis was motivated by exploring a strict implementation of Hick's law, which establishes that instructed go RTs vary linearly with the binary logarithm of the response set. However, there is no solid empirical evidence for such a strict implementation of Hick's law, so the potential effects of intentional selection remain speculative.

Additional controls showed that the significant effect of response set size was not an artifact of the way in which RTs were measured. Two additional controls considered complete movement times, or measured RTs as the time at which the distance traveled by the cursor was 25% of the final distance. In both cases, a statistically significant effect of response set size was found.

## **PERSISTENT REPRESENTATIONS OF VOLUNTARILY SELECTED RESPONSE ALTERNATIVES**

We have previously argued that the distinction between externally instructed and internally driven action systems (Krieghoff et al., 2011) is also applicable to action inhibition (Filevich et al., 2012). In other words, we suggest that neural systems for externally instructed inhibition (as in the case of stop signal tasks) do not fully overlap with those for internally driven action inhibition. Moreover, the two systems may differ quantitatively as well as qualitatively: intentional decisions for both action and inhibition may have a weaker neural signal, or lower levels of evidence, than their instructed counterparts (Fleming et al., 2009; Filevich and Haggard, 2012).

The present experiment revealed a process of inhibition and subsequent reselection for intentional actions. How do these processes fit with the intentional/instructed distinction mentioned above? First, the initial selection of a number to which to move was intentional, in the sense of internally generated rather than externally triggered. The inhibition of the action was achieved by removal of the selected item. This seems to have some features in common with intentional inhibition, such as the absence of an overt stop signal, but some features in common with external inhibition, since there is an environmental change that triggers inhibition. Our analyses focused on the persistence *vs*. flexible updating of an internal representation of the response space for intentional action selection.

Our result seems relevant to the previously introduced concepts of parallelism and strength of evidence in action selection, in two ways. First, persistence of action representations is naturally linked to parallel action preparation. If an intentionally selected action is not removed when the response set is updated, it will potentially remain a candidate for selection, and may competitively interfere with subsequent action selection processes. Second, since the to-be-inhibited item apparently remained in the internal representation of the response space, we might conclude that the processes of intentional inhibition are relatively weak.

We could not directly compare flexibility of the internal representation of response spaces for internally driven and externally instructed selection, because we did not remove any numbers from the initial response set in *instructed* trials. Indeed, removing options in an *instructed* condition would be meaningless. If the removed item were different from the instructed item, then no inhibition would be expected, and if the removed item was the instructed item, the task effectively becomes a NoGo task rather than an instructed action task. Rather, we used our *instructed* condition as a baseline for modeling the relationship between RTs and set sizes. Therefore, we cannot directly compare the persistence of action alternatives between our intentional selection and an externally triggered alternative.

Neurophysiological data suggest that in cases of active maintenance of multiple response alternatives (in this case, selected and non-selected), all representations are scaled down in proportion to the total number of active representations. For example, in a saccade-to-target experiment, Purcell et al. (2012) found that firing rates in monkeys' visual and motor areas decreased monotonically with increasingly larger response sets (two, four, or eight total items).

The results from Purcell et al. (2012) provide a plausible neural explanation for Hick's law. Larger response set sizes will set a lower baseline firing rate from which perceptual evidence needs to be accumulated until it reaches a decision threshold (Gold and Shadlen, 2001; Smith and Ratcliff, 2004). In turn, this may translate into longer accumulation times, manifested as longer RTs.
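The reasoning can be made concrete with a deliberately simplified, noise-free accumulator. Everything in this sketch is an illustrative assumption rather than a model fitted to any data: the linear rise to threshold, the parameter values, and in particular the rule that the baseline is divided by the number of alternatives.

```python
def time_to_threshold(baseline, threshold=1.0, drift=0.01):
    """Time for a noise-free linear accumulator to rise from baseline to threshold."""
    if drift <= 0 or baseline >= threshold:
        raise ValueError("need drift > 0 and baseline < threshold")
    return (threshold - baseline) / drift

def baseline_for_set_size(n_alternatives, single_item_baseline=0.8):
    """Hypothetical scaling: baseline activity shared across active alternatives."""
    return single_item_baseline / n_alternatives

# Accumulation time (arbitrary units) for the set sizes used by Purcell et al. (2012).
times = {n: time_to_threshold(baseline_for_set_size(n)) for n in (2, 4, 8)}
```

Under these assumptions the accumulation time grows with set size simply because the starting point moves further from the threshold.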

Accumulator models, traditionally restricted to perceptual decision-making, have recently been extended to voluntary choices in human behavior (Zhang et al., 2012). These "voluntary" accumulator models, analogous to perceptual models, may relate to the present findings. Different neuronal assemblies may gather "voluntary" information for each target. Speculatively, spiking activity in neuronal assemblies that correspond to the alternatives that are no longer available may not be fully inhibited immediately after target disappearance. In line with the results reported by Purcell et al. (2012), initial firing rates of each neural assembly may be lower for larger initial response set sizes in this experiment, leading to longer intentional RTs.

Together, these results suggest an interesting corollary of the relative weakness of internally driven decisions mentioned earlier (Fleming et al., 2009; Filevich and Haggard, 2012). Weak internal decisions may set lower baseline firing rates for the representations of each of the potential response alternatives. In turn, these weak representations may not efficiently inhibit the representations of the unavailable response alternatives.

## **CONCLUSION**

Our findings suggest that persistence of unselected options in the internal response space could be one reason why people often feel they "could have done otherwise." This feeling may be sufficient to generate a feeling of freedom, even when the current external environment in fact limits what an agent can do.

Several important mental health disorders can be linked to counterfactual representation, including frustration (expected outcomes of chosen actions are not obtained), regret (desire of having selected the alternative response), and rumination (persistence of these feelings across time). These processes and states all require a continued representation of action alternatives that are not actually available. Previous investigations of these concepts have been limited by the difficulty of relating them to control of actual behaviors. Our concept of persistent internal representation of items in the response space may offer a window into understanding these processes.

#### **ACKNOWLEDGMENTS**

This work was supported by the Wellcome Trust, an Overseas Research Students award from the British Council (Elisa Filevich), a European Science Foundation-European Collaborative Research Project/Economic and Social Research Council grant (RES-062-23-2183), and an ESRC Professorial Fellowship (ES/J023140/1).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 March 2013; accepted: 02 April 2013; published online: 06 May 2013.*

*Citation: Filevich E and Haggard P (2013) Persistence of internal representations of alternative voluntary actions. Front. Psychol. 4:202. doi: 10.3389/fpsyg.2013.00202*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Filevich and Haggard. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Reconciling the influence of task-set switching and motor inhibition processes on stop signal after-effects

#### *Joaquin A. Anguera<sup>1</sup>\*, Kyle Lyman<sup>1,2</sup>, Theodore P. Zanto<sup>1</sup>, Jacob Bollinger<sup>1</sup> and Adam Gazzaley<sup>1</sup>*

*<sup>1</sup> Departments of Neurology, Physiology and Psychiatry, Center for Integrative Neurosciences, University of California San Francisco, San Francisco, CA, USA <sup>2</sup> Feinberg School of Medicine, Medical Scientist Training Program, Northwestern University, Chicago, IL, USA*

#### *Edited by:*

*T. Andrew Poehlman, Southern Methodist University, USA*

#### *Reviewed by:*

*Daniel J. Upton, Monash University, Australia Nicole C. Swann, University of California, San Francisco, USA*

#### *\*Correspondence:*

*Joaquin A. Anguera, Departments of Neurology, Physiology and Psychiatry, Center for Integrative Neurosciences, Mission Bay – Sandler Neurosciences Center, University of California San Francisco, MC 0444, 675 Nelson Rising Lane, Room 502, San Francisco, CA 94158, USA e-mail: joaquin.anguera@ucsf.edu*

Executive response functions can be affected by preceding events, even if they are no longer associated with the current task at hand. For example, studies utilizing the stop signal task have reported slower response times to "GO" stimuli when the preceding trial involved the presentation of a "STOP" signal. However, the neural mechanisms that underlie this behavioral after-effect are unclear. To address this, behavioral and electroencephalography (EEG) measures were examined in 18 young adults (18–30 years) on "GO" trials following a previously "Successful Inhibition" trial (pSI), a previously "Failed Inhibition" trial (pFI), and a previous "GO" trial (pGO). Like previous research, slower response times were observed during both pSI and pFI trials (i.e., "GO" trials that were preceded by a successful and unsuccessful inhibition trial, respectively) compared to pGO trials (i.e., "GO" trials that were preceded by another "GO" trial). Interestingly, response time slowing was greater during pSI trials compared to pFI trials, suggesting executive control is influenced by both task set switching and persisting motor inhibition processes. Follow-up behavioral analyses indicated that these effects resulted from between-trial control adjustments rather than repetition priming effects. Analyses of inter-electrode coherence (IEC) and inter-trial coherence (ITC) indicated that both pSI and pFI trials showed greater phase synchrony during the inter-trial interval compared to pGO trials. Unlike the IEC findings, differential ITC was present within the beta and alpha frequency bands in line with the observed behavior (pSI *>* pFI *>* pGO), suggestive of more consistent phase synchrony involving motor inhibition processes during the ITI at a regional level. 
These findings suggest that between-trial control adjustments involved with task-set switching and motor inhibition processes influence subsequent performance, providing new insights into the dynamic nature of executive control.

**Keywords: motor inhibition, EEG, stop signal, after-effects, ERSP**

## **INTRODUCTION**

The act of attempting to inhibit an executed response is one of the best characterized examples of cognitive control. In recent years, response inhibition has been extensively studied through the use of the stop signal paradigm (Logan and Cowan, 1984; Verbruggen and Logan, 2009), with the inhibition process modeled as a horse race between "GO" and "STOP" processes (Logan and Cowan, 1984). This model suggests that the probability of a successful inhibition (SI) depends on the outcome of a race between two independently operating processes ("GO" and "STOP"). While this model describes performance on a given trial, it does not consider how these "GO" and "STOP" processes affect performance on the next trial. Several stop signal studies have shown that response time (RT) to a "GO" signal on trial *n* is slower when the immediately preceding trial (*n* − 1) was a "STOP" trial vs. a "GO" trial (Rieger and Gauggel, 1999; Verbruggen et al., 2005b; Li et al., 2008; Verbruggen and Logan, 2008). Interestingly, "STOP" trials have only two outcomes, SI or failed inhibition (FI) of the motor response, and RTs during "GO" trials are slowed regardless of whether they follow SI or FI trials. To assess the neural mechanisms underlying these stop signal after-effects, the present study sought to characterize specific neural processes engaged on a "GO" trial that followed a previous "Successful Inhibition" trial (pSI), a previous "Failed Inhibition" trial (pFI), or a previous "GO" trial (pGO).

Behavioral after-effects are not specific to the stop signal task, as post-error slowing (Rabbitt and Rodgers, 1977) and negative priming (Neill et al., 1990, 1992; Tipper, 2001) studies have regularly reported a similar increase in RTs on subsequent trials. These types of effects have been explained by several different hypotheses of behavior involving task switching (Mayr and Kliegl, 2000; Schneider and Logan, 2005; Kray, 2006), although one is especially relevant to the present investigation: negative priming manifested through the persistence of motor inhibition processes (Kramer et al., 1992; Rieger and Gauggel, 1999). This type of behavior is considered to be indicative of *between-trial control adjustments* (Rieger and Gauggel, 1999), a perspective that is comparable to Allport et al.'s (1994) "task-set inertia" hypothesis, which suggests that task features (stimulus-based, and not motor-related) on trial "*n* − 1" can interfere with processing on trial "*n*" when the task requirements change. Thus, responding to a "GO" signal on trial "*n*" requires changing from a "STOP" associated state if the preceding trial contained a "STOP" signal.

However, the between-trial control interpretation has been challenged by evidence suggesting that after-effects following SI performance are actually a reflection of a *repetition-priming effect* (Verbruggen et al., 2008). These researchers examined the directionality of the "GO" signal on a pSI trial vs. the direction on trial *n*, and reported that *only* for trials where the directionality of the "GO" signal repeated were these post-SI "GO" trials slower than repeating "GO" trials (pGO; Verbruggen et al., 2008). Alternatively, when the direction was different, no difference was observed between these trial types, which these authors interpreted as evidence for repetition-priming effects. This finding was consistent regardless of stimulus, category, or even during a selective stop signal task (Verbruggen et al., 2008), with subsequent work demonstrating short-term RT adjustments after unsuccessful stopping and long-term after-effects persisting even 20 trials after an SI (Verbruggen and Logan, 2008). However, a more complete understanding of these two positions (repetition-priming effect vs. between-trial control adjustments) may be achieved by corroborating these behavioral effects with the underlying neural processes.

EEG studies of the stop signal task have regularly characterized inhibition-related neural activity using event-related potentials (ERPs; Pliszka et al., 2000; Kok et al., 2004; Ramautar et al., 2004; Schmajuk et al., 2006). These ERPs have characterized the neural activity immediately following a "STOP" event, which does not facilitate the present goal of explaining the effect seen on subsequent "GO" trials. Upton and colleagues (Upton et al., 2010) recently examined N2 and P300 effects on these subsequent "GO" trials, reporting conditional differences that reflect memory retrieval processes with respect to negative priming. However, the act of inhibiting an executed response involves a host of neural regions whose activity is not always best examined post-stimulus, especially considering that these after-effects are influenced by processes occurring during the preceding inter-trial interval (ITI). Indeed, there is a rich literature describing the involvement of different regions such as the right inferior frontal gyrus (rIFG), the medial frontal cortex, and primary motor cortex during stop signal inhibition (Braver et al., 2001; Aron et al., 2007; Jahfari et al., 2010). With respect to stop-signal after effects, activity at any of these regions may be contributing to the reported behavioral effect. Thus, an analysis that facilitates the examination of neural activity at each of these regions prior to the subsequent "GO" stimulus onset may provide a deeper understanding of these after-effects.

One such approach involves the use of frequency-based analyses such as coherence (Roach and Mathalon, 2008), as this approach has been shown to be a powerful way of interrogating markers of cognitive control in a spontaneous EEG spectrum (Makeig, 1993; Neuper and Klimesch, 2006). There are several theories postulating that goal-directed behaviors are supported by local synchronization of neural oscillations within specific cortical areas, with this activity integrating spatially distant brain regions into a unified functional network (Tononi and Edelman, 1998; Varela et al., 2001). The examination of single-trial EEG dynamics across theta (4–7 Hz), alpha (8–12 Hz), and beta (15–30 Hz) frequency bands using inter-trial coherence [ITC; a measure of consistency across trials (cf. Makeig et al., 2002)] has been useful in further characterizing activity associated with voluntary response inhibition (Yamanaka and Yamamoto, 2010; Müller and Anokhin, 2012). Similarly, inter-electrode coherence (IEC; a similar measure of consistency between electrodes across trials) has also been used to characterize motor inhibition-related activity from a large-scale network perspective across different frequencies (Shibata et al., 1998; Serrien et al., 2005; Gladwin et al., 2006; Moore et al., 2008; Tallet et al., 2009; Brier et al., 2010; Yamanaka and Yamamoto, 2010; Liang et al., 2012). Specific to interrogating stop signal after-effects, the use of ITC and IEC to examine the temporal and spatial synchronization is theoretically ideal for interrogating motor inhibition processes before (and after) these subsequent "GO" stimuli at different electrodes/regions.

Either IEC or ITC associated with prefrontal, pre-motor, or primary motor areas may reflect the observed stop-signal after effects. However, it is unclear *when* their potential influence would be most apparent, or *how long* this effect would persist: just prior to the subsequent "GO" stimulus, persisting through stimulus onset, or lasting all the way through the subsequent "GO" response itself. Here we hypothesized that both temporal and spatial phase synchrony would increase as greater cognitive control is called for (i.e., following a "STOP" trial), with a conditional change in each type of coherence being the greatest for pSI trials, followed by pFI trials and then pGO trials during the ITI at the electrodes nearest to the aforementioned regions associated with motoric inhibition. We anticipated that this approach would inform these previous behavioral (and more recent ERP) findings by highlighting how well-characterized measures of inhibitory activity are influencing these after-effects that have been attributed to task-switching processes. Thus, the neural signatures underlying these after-effects may provide a deeper understanding of how these potential explanations contribute to the observed behavior in a temporal and regional specific manner.

## **METHODS**

## **PARTICIPANTS**

Twenty-one healthy young individuals (mean age: 23.5 years; range 18–30 years; 10 males) were recruited from the San Francisco community. These individuals signed a UCSF approved consent form in order to participate in the study and were paid \$15/h for their time. All participants were screened to ensure that they were healthy, had normal or corrected-to-normal vision, and were right handed. EEG data for 3 participants were corrupted during data acquisition, leaving 18 (9 male) participants.

## **EXPERIMENTAL PROCEDURES**

The stop signal paradigm consisted of "GO" and "STOP" trials, with each "GO" trial having a left- or right-pointing arrow (the "GO" stimulus) displayed on a computer screen for 1000 ms. On a "STOP" trial (25% of the 100 trials), the participant attempted to stop their response when a stop signal (a vertical arrow) appeared shortly following a "GO" stimulus. On these "STOP" trials, the time interval of 250 ms between "GO" signal and "STOP" signal onsets (i.e., the stop signal delay) changed systematically according to each participant's performance. It became 50 ms longer after each successful stopping performance, making it harder to inhibit, and 50 ms shorter after each unsuccessful inhibition, making it easier to inhibit. This staircase algorithm ensured that the task was equally challenging for each individual, providing approximately 50% successful and 50% unsuccessful inhibition trials. The stop signal delay was calculated for each "STOP" trial. The stop signal reaction time (SSRT) was computed for individual subjects by subtracting the mean stop signal delay from the mean "GO" trial RT. Each "STOP" stimulus was displayed for 1000 ms − (current stop signal delay); thus the "GO" stimulus presentation time was equal to the time remaining from the aforementioned "STOP" difference from 1000 ms **(Figure 1)**. The ITI was randomly jittered among 1.6, 1.7, 1.8, and 1.9 s to optimize statistical efficiency.
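The staircase rule and the SSRT calculation described above can be sketched as follows. This is a minimal illustration, not the task code used in the study: the function names and the clamping bounds are assumptions, and reading 250 ms as the starting delay is an interpretation of the text.

```python
def update_ssd(ssd_ms, stopped, step_ms=50, min_ms=0, max_ms=1000):
    """One staircase step for the stop signal delay (SSD).

    The delay grows by 50 ms after a successful stop (harder next time)
    and shrinks by 50 ms after a failed stop (easier next time).
    The min/max bounds are an assumption, not stated in the text.
    """
    ssd_ms += step_ms if stopped else -step_ms
    return max(min_ms, min(max_ms, ssd_ms))

def ssrt_mean_method(go_rts_ms, ssds_ms):
    """SSRT as mean "GO" RT minus mean stop signal delay."""
    return sum(go_rts_ms) / len(go_rts_ms) - sum(ssds_ms) / len(ssds_ms)

# Hypothetical run: starting delay of 250 ms and invented stop outcomes.
ssd = 250
ssd_history = []
for stopped in [True, True, False, True, False]:
    ssd_history.append(ssd)        # delay actually used on this STOP trial
    ssd = update_ssd(ssd, stopped)

example_ssrt = ssrt_mean_method([420.0, 460.0, 440.0], ssd_history)
```

Because the delay moves in opposite directions after successes and failures, it settles near the value at which a participant stops on about half of the "STOP" trials.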

Participants were instructed to respond as fast as possible with a left or right key press (using index and middle fingers of the right hand) while maintaining a high level of accuracy. Responding quickly to the "GO" stimulus was emphasized by explaining to the participants that they were not to delay their response in anticipation of the stop signal, as it would not always be possible to withhold their response after detection of the stop signal. This was reinforced by showing participants their mean RT to the "GO" trials following each block of 100 trials, along with the message, "The fastest average RT for your age group is currently 422 ms, so try to reach or beat it!" This time of 422 ms was the fastest RT for a pilot group of 5 participants (data not presented). Participants practiced 80 trials of the stop signal task, then performed 6 blocks of 100 trials for the study. Participants also performed 100 trials of just the "GO" task (no "STOP" signals presented) to assess baseline RT behavior (RT baseline task). This task was performed separately from the other stop signal task blocks (always prior to any stop signal blocks), and also had a jittered ITI to match all methodological parameters used in the task excluding the presence of "STOP" trials.

**FIGURE 1 | Task schematic for each trial type. (A)** pGO = a "GO" trial following a "GO" trial, **(B)** pFI = a "GO" trial following a failed inhibition (FI) trial, **(C)** pSI = a "GO" trial following a successful inhibition (SI) trial. "GO" stimuli were presented for [1000 ms − the stop signal delay] calculated for each "STOP" signal event. The inter-trial interval (ITI) was between 1.6 and 1.9 s in length.

## **BEHAVIORAL ANALYSIS APPROACH**

The outcome of a single trial fell into one of three categories: go trials (GO) on which no stop signal appears, FI trials in which a stop signal appears but a response is still made, and SI trials in which a stop signal appears and no response is made. To evaluate the stop signal after-effect, all "GO" trials were divided into three different bins based on whether they were preceded by a GO trial (pGO), an FI trial (pFI), or an SI trial (pSI) (**Figure 1**). Trials for pGO, pSI, and pFI were also stratified by ITI duration to evaluate whether this jittered time interval affected subsequent RTs.

As previously described, Verbruggen et al. (2008) reported RT differences associated with the directional congruency of the subsequent "GO" trial arrow direction between pSI/pFI and pGO trials. The logic employed by these researchers was that if between-trial control adjustments are being made after successful inhibition trials, then one should observe longer pSI vs. pGO RTs regardless of whether the "GO" stimulus from trial *n* − 1 is repeated. Alternatively, if these after-effects following successful response inhibition are driven by repetition priming, then pSI RTs should be longer than pGO RTs *only* for trials where the direction of the "GO" stimulus repeats (see also Mayr et al., 2003), which would also argue against the Rieger and Gauggel (1999) persistence of inhibition interpretation. Thus, we also further stratified the pGO, pSI, and pFI trials by whether or not the directionality of the "GO" stimuli (i.e., pointing left or right) during the pGO/pSI/pFI trials was congruent with the previous "GO" stimuli. For example, if the GO stimulus in trial "*n* − 1" was a left pointing arrow and the GO stimulus in trial "*n*" was a right pointing arrow, then the stimulus in trial "*n*" was considered incongruent.
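The binning and congruency stratification described in this section can be sketched as below. The trial representation (a list of dictionaries with `type` and `direction` keys) and the function name are hypothetical, chosen only to illustrate the labeling logic.

```python
def label_go_trials(trials):
    """Label each "GO" trial by the outcome and arrow direction of the preceding trial.

    `trials` is a chronological list of dicts with
      'type'      : 'GO', 'SI' (successful inhibition) or 'FI' (failed inhibition)
      'direction' : 'L' or 'R', the direction of the "GO" arrow on that trial
    Only trials of type 'GO' that have a predecessor are labelled.
    """
    labelled = []
    for index in range(1, len(trials)):
        prev, cur = trials[index - 1], trials[index]
        if cur["type"] != "GO":
            continue
        labelled.append({
            "index": index,
            "bin": "p" + prev["type"],                      # pGO, pSI or pFI
            "congruent": cur["direction"] == prev["direction"],
        })
    return labelled

# Hypothetical trial sequence (not study data).
sequence = [
    {"type": "GO", "direction": "L"},
    {"type": "SI", "direction": "L"},
    {"type": "GO", "direction": "R"},
    {"type": "FI", "direction": "R"},
    {"type": "GO", "direction": "R"},
    {"type": "GO", "direction": "L"},
]
labels = label_go_trials(sequence)
```

Mean RTs can then be compared across the three bins, separately for congruent and incongruent trials, to contrast the repetition-priming and between-trial control accounts.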

## **EEG RECORDING AND DATA PREPROCESSING**

Participants were seated in an armchair in a dark room with the screen ∼85 cm from the participants' eyes. Neural data were recorded with a BioSemi ActiveTwo 64-channel EEG acquisition system in conjunction with BioSemi ActiView software (Cortech-Solutions). Signals were amplified and digitized at 1,024 Hz with a 16-bit resolution. All electrode offsets were <25 kΩ. Anti-aliasing filters were used and data were band-pass filtered between 0.01 and 100 Hz during data acquisition. Preprocessing was conducted using Analyzer software (Brain Vision, LLC). Eye-movement artifacts were removed through an independent components analysis (ICA). The raw EEG data were referenced to an average reference off-line and time-locked to stimulus onset for each trial type ("GO" stimulus for pGO, pSI, pFI). Trials were further cleaned of excessive peak-to-peak deflections, amplifier clipping, or other artifacts using a voltage threshold of 75 mV. Epochs (−3000 to +1000 ms, to encompass the previous trial and subsequent "GO" trial) for each trial type were time locked to the "GO" stimuli (see **Figure 1**).

## **CHANNEL/FREQUENCY SELECTION**

In an attempt to narrow the focus of our subsequent analyses, we chose to focus on specific frequency bands at the C3, FCz, and F6 electrodes, as previous work has identified motor-related inhibitory activity at each of these electrodes (or their underlying regions) within certain frequencies. For example, the C3 electrode has been regularly used to examine inhibition-related processes originating near the motor cortex within the alpha frequency band (Serrien et al., 2005; Moore et al., 2006; Yamanaka and Yamamoto, 2010; Serrien and Sovijarvi-Spape, 2013). Theta-related activity near the FCz electrode has also been regularly examined given its proximity to premotor regions and associations with motor inhibition (Trujillo and Allen, 2007; Cavanagh et al., 2009; Brier et al., 2010; Yamanaka and Yamamoto, 2010; Liang et al., 2012; Müller and Anokhin, 2012). Finally, beta-related activity near the F6 electrode has been frequently interrogated with respect to right-lateralized stopping-related responses near this region with the stop-signal task (Serrien et al., 2005; Schmajuk et al., 2006; Liang et al., 2012; Swann et al., 2012) as well as with increased phase locking associated with "switch" trials (Gladwin et al., 2006; Serrien, 2009; Tallet et al., 2009). While the present analysis was driven by a priori hypotheses focusing exclusively on the frequencies associated with certain regions/electrodes in terms of motoric inhibition, we report the findings of the same analyses for all electrode/frequency combinations in an effort to provide full disclosure, given that other studies have also associated certain frequencies at different regions with inhibition-related processes.

## **IEC AND ITC ANALYSES**

We examined IEC and ITC to test the phase consistency between (IEC) and within (ITC) electrodes for each condition (pGO, pSI, pFI) for each frequency band. These trials were convolved using EEGLAB's complex Morlet wavelet decomposition (Delorme and Makeig, 2004) to resolve frequencies from 4 to 65 Hz to calculate phase for each trial. Phase locking values (PLVs) for both IEC and ITC were computed by measuring the inter-trial variability of the phase difference at each time–frequency point (Lachaux et al., 1999). This procedure yields a PLV measure bound from 0 to 1 such that 0 represents random phase differences across trials while 1 indicates a consistent phase difference. For IEC, this involved calculating PLVs between our "seed" electrode/frequency (i.e., F6 in the beta band, C3 in the alpha band, FCz in the theta band) and all other electrodes. After calculating coherence from each of our three primary electrodes of interest to all other electrodes, we then created a global index of IEC for each frequency band by calculating the mean PLV to all electrodes for each condition (cf. Trujillo et al., 2005). For ITC, this involved calculating PLVs across trials at these seed electrodes. Within-subject differences in trial numbers were accounted for using a standardized bootstrap method (1000 permutations).
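The phase-locking computation can be made concrete with a minimal Python sketch at a single time–frequency point. It assumes the per-trial phases (in radians) have already been extracted by the wavelet step, reads ITC as the PLV of one electrode's phase across trials, and reads the global IEC index as the unweighted mean of seed-to-electrode PLVs. The function names are illustrative, and the bootstrap correction for trial numbers is not reproduced.

```python
import cmath
from typing import Dict, List, Sequence


def plv(phase_values: Sequence[float]) -> float:
    """Length of the mean unit phase vector across trials, bounded by 0 and 1."""
    mean_vector = sum(cmath.exp(1j * p) for p in phase_values) / len(phase_values)
    return abs(mean_vector)


def itc(phases: Sequence[float]) -> float:
    """Inter-trial coherence: phase consistency across trials at one electrode."""
    return plv(phases)


def iec(seed: Sequence[float], other: Sequence[float]) -> float:
    """Inter-electrode coherence: consistency of the seed-minus-other phase difference."""
    return plv([a - b for a, b in zip(seed, other)])


def global_iec(seed_name: str, phases: Dict[str, List[float]]) -> float:
    """Mean PLV between the seed electrode and every other electrode."""
    seed = phases[seed_name]
    values = [iec(seed, p) for name, p in phases.items() if name != seed_name]
    return sum(values) / len(values)
```

A constant phase difference across trials gives a PLV of 1 even when the individual phases vary from trial to trial, which is why IEC and ITC can dissociate.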

## **STATISTICAL ANALYSIS APPROACH**

We examined IEC and ITC for each condition (pGO, pSI, pFI) at each electrode within each frequency band at three distinct time periods. First, we examined the patterns of coherence prior to the "GO" trial stimulus onset during the prestimulus interval (−1000 to 0 ms in 100 ms intervals) using a condition × time window ANOVA at each electrode and frequency. Next, we examined the coherence patterns immediately following the moment of stimulus presentation (visual interrogation revealed peak activity to be centered between 0 and 200 ms). Finally, we examined coherence centered around the "GO" response using each individual's mean RT as the median and their own standard deviation as the window of interest. Follow-up contrasts were performed to further characterize any interactions observed, with a Greenhouse-Geisser correction utilized when assumptions of sphericity were not met. Planned contrasts for each frequency-associated electrode between each trial type were used to uncover any potential relationship(s) exhibiting a similar pattern to the behavioral findings. Furthermore, while our analyses were focused within these three different time periods, our motivation for this study was inherently driven by those results associated with the ITI. Thus, we report on observed activity following stimulus presentation but did not have any a priori hypotheses regarding patterns of activity at these time points.
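The first and third time periods can be sketched in Python as follows. The prestimulus bins follow directly from the text; the response-centered window is one possible reading (mean RT ± one sample standard deviation), which is an assumption, as are the function names.

```python
import statistics
from typing import List, Tuple


def prestimulus_windows(start_ms: int = -1000, stop_ms: int = 0,
                        step_ms: int = 100) -> List[Tuple[int, int]]:
    """Consecutive, non-overlapping windows covering the prestimulus interval."""
    return [(t, t + step_ms) for t in range(start_ms, stop_ms, step_ms)]


def response_window(rts_ms: List[float]) -> Tuple[float, float]:
    """Window centered on a participant's mean RT, extending one SD to each side."""
    center = statistics.mean(rts_ms)
    half_width = statistics.stdev(rts_ms)  # assumption: sample (n - 1) SD
    return (center - half_width, center + half_width)
```

The ten prestimulus windows correspond to the ten levels of the time window factor in the condition × time window ANOVA.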

## **RESULTS**

### **BEHAVIORAL RESULTS**

Performance data describing the stop signal task are presented in **Table 1**. The effect of ITI and condition on RTs was tested using a Two-Way ANOVA of ITI (1.6 s, 1.7 s, 1.8 s, 1.9 s) × condition (pGO, pSI, pFI), revealing main effects of ITI [*F*(3, 51) = 6.3, *p* < 0.01] and condition [*F*(2, 34) = 19.3, *p* < 0.001], but no condition × ITI interaction [*F*(6, 102) = 1.46, *p* = 0.20; see Supplementary Figure 1]. A within-subjects contrast of ITI for linear effects was significant [*F*(1, 17) = 16.31, *p* = 0.001], indicating that RTs decreased as the ITI decreased in length from 1.9 to 1.6 s across all conditions. Follow-up *t*-tests examining the main effect of condition revealed that the RTs for the pGO condition (418 ms ± 19) were significantly faster than both the pFI (459 ms ± 16, *t* = 4.22, *p* < 0.01) and the pSI (477 ms ± 19, *t* = 4.96, *p* < 0.001) conditions, with the pSI trials being slower than the pFI trials (*t* = 2.2, *p* < 0.05; see **Figure 2**). Thus, there was a significant influence on RT based upon the identity of the previous trial type, with longer ITIs corresponding with longer RT in general.
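A within-subjects linear contrast of this kind can be sketched in Python as follows. This is an illustration under assumptions, not the statistical software used here: it takes each participant's mean RT at the four ITI levels (already averaged over condition), applies the standard linear weights for four equally spaced levels, and uses the fact that the contrast *F* equals the squared one-sample *t* of the contrast scores.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def linear_contrast_f(cell_means: List[Sequence[float]],
                      weights: Sequence[float] = (-3, -1, 1, 3)) -> Tuple[float, Tuple[int, int]]:
    """Return (F, (df1, df2)) for a within-subjects linear trend.

    cell_means holds one row per participant and one column per ITI level,
    ordered from the shortest to the longest ITI.
    """
    # One contrast score per participant: the weighted sum of that row.
    scores = [sum(w * x for w, x in zip(weights, row)) for row in cell_means]
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    t = statistics.mean(scores) / standard_error
    return t * t, (1, n - 1)
```

With 18 rows the degrees of freedom are (1, 17), matching the form of the contrast reported above; a positive mean contrast score indicates RTs that increase with ITI.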

To test whether these effects were driven by between-trial control adjustments vs. repetition-priming effects (Verbruggen et al., 2008), a separate condition (pGO, pSI, pFI) × directional congruency of the "GO" arrows on trial *n* − 1 and *n* (same vs. different direction) ANOVA revealed that incongruent directionality of the "GO" stimuli vs. the preceding trial led to faster RT in a differential fashion for each condition [*F*(2, 34) = 4.14, *p* = 0.024]. Follow-up analyses revealed a significant difference between pSI and pGO trials regardless of whether they were directionally congruent (*t* = 6.00, *p* < 0.001) or incongruent (*t* = 4.44, *p* < 0.001), with the same pattern observed for pFI and pGO trials (for each comparison *t* > 2.60, *p* < 0.018) as well as pSI and pFI trials (for each comparison *t* > 3.03, *p* < 0.007). Unlike Verbruggen et al. (2008), whose repetition-priming interpretation was based upon no difference being present between pSI and pGO trials on incongruent trials, the directional differences observed here suggest the involvement of between-trial control adjustments.

### **NEURAL ANALYSES**

The following neural analyses focused on IEC and ITC activity within specific frequency bands at the C3 (alpha), FCz (theta), and F6 (beta) electrodes in accord with previous work describing this type of activity at these electrodes (or their underlying regions) within certain frequency bands. In all subsequent analyses (except those stating otherwise), we observed the same pattern of significance when comparing pGO and pSI as when comparing pGO and pFI (see Supplementary Tables 1, 2 for an overview of all subsequent analyses and findings, and Supplementary Figures 2–7 for all other ITC frequency/electrode combinations not driven by a priori hypotheses). Thus, in describing these results, we combined the description of these analyses (even though the analyses themselves were performed separately), as indicated by the pSI/pFI term. For all of the analyses examining the prestimulus period, the factor of time window (100 ms intervals from −1000 to 0 ms) was included in each respective ANOVA; however, there were no interactions involving this factor in any analyses.

#### *IEC during the inter-trial interval*

A Two-Way ANOVA involving time window (10) and condition (3) for theta activity at the FCz electrode revealed a main effect of condition [*F*(2, 34) = 15.39, *p* < 0.0001]. Comparing the pSI/pFI and pGO conditions, there was an effect of condition with pSI/pFI showing greater IEC than pGO [*F*(1, 17) > 21.20, *p* < 0.0001 for each comparison], but no effect of condition between pSI and pFI trial types [*F*(1, 17) = 2.43, *p* = 0.13; see **Figure 3**; for result of other frequency bands at this electrode, see Supplementary Table 1].

Using the same approach for alpha activity at the C3 electrode, a main effect of condition was present [*F*(2, 34) = 16.27, *p* < 0.001]. Comparing the pSI/pFI and pGO conditions, there was an effect of condition, with pSI/pFI showing greater IEC than pGO [*F*(1, 17) > 10.80, *p* < 0.007 for each comparison], with an effect of condition between pSI and pFI trial types [pFI > pSI; *F*(1, 17) = 6.63, *p* = 0.02; see **Figure 3**; for result of other frequency bands at this electrode, see Supplementary Table 1].

Using the same approach for beta activity at the F6 electrode, a main effect of condition was present [*F*(2, 34) = 11.73, *p* < 0.001]. Comparing the pSI/pFI and pGO conditions, there was an effect of condition, with pSI/pFI showing greater IEC than pGO [*F*(1, 17) > 11.80, *p* < 0.005 for each comparison], but no effect of condition between pSI and pFI trial types [*F*(1, 17) = 2.64, *p* = 0.12; see **Figure 3**; for result of other frequency bands at this electrode, see Supplementary Table 1]. Given that the directional differences observed within the behavioral data suggest the involvement of between-trial control adjustments, these IEC findings would support this interpretation, as both "STOP" trial types demonstrated a difference from pGO trials in terms of greater global coherence. The exact same pattern of effects was also observed when restricted to only the electrodes of interest (e.g., FCz, C3, F6), confirming a conditional change in global coherence during the "STOP" vs. "GO" trial types.

#### *IEC after "Go" stimulus onset and centered around the "Go" response*

For each time period, a similar pattern emerged: there was greater pSI/pFI than pGO IEC [*F*(1, 17) ≥ 7.97, *p* ≤ 0.012 for each comparison and time period], but no difference present between pSI and pFI trial types [*F*(1, 17) ≥ 2.43, *p* ≤ 0.14 for each comparison and time window; see Supplementary Table 1]. Thus, as with the ITI IEC findings, both "STOP" trial types demonstrated a difference from pGO trials that was congruent with the behavioral effects observed with these same trial types. As with the ITI findings, the exact same pattern of effects was also observed when the analysis was restricted to only the electrodes of interest.

#### *ITC during the inter-trial interval*

Using the same Two-Way ANOVA analysis approach described above for IEC, theta activity at the FCz electrode again revealed a main effect of condition [*F*(2, 34) = 246.00, *p* < 0.001]. Comparing the pSI/pFI and pGO conditions, there was an effect of condition, with pSI/pFI showing greater ITC than pGO [*F*(1, 17) > 393.00, *p* < 0.001 for each comparison]. Comparing pSI and pFI, there was an effect of condition [pFI > pSI; *F*(1, 17) = 5.15, *p* = 0.03; see **Figure 4**; for result of other frequency bands at this electrode, see Supplementary Table 2].

Using the same approach for alpha activity at the C3 electrode, there was an effect of condition [*F*(2, 34) = 250.00, *p* < 0.001], with follow-up analyses comparing pSI/pFI and pGO also revealing an effect of condition [*F*(1, 17) > 419.00, *p* < 0.001 for each comparison]. Between pSI and pFI trial types, there was a trend toward significance [pSI > pFI; *F*(1, 17) = 3.36, *p* = 0.08; see **Figure 5**; for result of other frequency bands at this electrode, see Supplementary Table 2].

**FIGURE 4 | Theta inter-trial coherence (ITC) at electrode FCz. (A)** Bar graph displaying mean ITC averaged over the −1000 to 0 ms interval, with 0 ms as GO stimulus onset. **(B)** Line plot illustrating ITC from −1000 to +1000 ms, with the dark gray highlighting the ITI, the light gray bar after 0 ms highlighting maximal coherence following stimulus onset, and the dashed lines indicating ITC centered around the "Go" response.

**FIGURE 5 | Alpha inter-trial coherence (ITC) at electrode C3. (A)** Bar graph displaying mean ITC averaged over the −1000 to 0 ms interval, with 0 ms as GO stimulus onset. **(B)** Line plot illustrating ITC from −1000 to +1000 ms, with the dark gray highlighting the ITI, the light gray bar after 0 ms highlighting maximal coherence following stimulus onset, and the dashed lines indicating ITC centered around the "Go" response.

Using the same approach for beta activity at the F6 electrode, there was an effect of condition [*F*(2, 34) = 234.00, *p* < 0.001], with follow-up analyses comparing pSI/pFI and pGO yielding an effect of condition in each case [*F*(1, 17) > 366.00, *p* < 0.001 for each comparison]. Comparing pSI and pFI trial types revealed an effect of condition, with greater pSI activity [*F*(1, 17) = 9.80, *p* < 0.01; see **Figure 6**; for result of other frequency bands at this electrode, see Supplementary Table 2]. Thus, we confirmed our hypotheses regarding the influence of regionally-specific alpha and beta ITC during the ITI as a function of different trial types that mirrored the observed pSI > pFI > pGO behavioral effect.

#### *ITC after "Go" stimulus onset*

As above, analyses were performed comparing conditions within a particular frequency band at each electrode. For each comparison, the same pattern was observed: there was greater pSI than pGO ITC [*F*(1, 17) > 150.00, *p* < 0.001], greater pFI than pGO coherence [*F*(1, 17) > 94.00, *p* < 0.001], but no difference between pSI and pFI [*F*(1, 17) ≤ 2.72, *p* > 0.12; see Supplementary Table 2]. Thus, as with the ITI IEC findings, both "STOP" trial types demonstrated a difference from pGO trials that was congruent with the behavioral effects observed with these same trial types.

#### *ITC centered around the "Go" response*

Examining theta ITC at electrode FCz centered around the moment of response to the subsequent "GO" stimuli, we observed an effect of condition [*F*(2, 34) = 20.60, *p* < 0.001]. Follow-up analyses indicated that ITC was greater for pSI/pFI than pGO trials [*F*(1, 17) > 33.10, *p* < 0.001 for each comparison]. However, there were no differences when comparing pSI and pFI [*F*(1, 17) = 1.42, *p* > 0.20; for result of other frequency bands at this electrode, see Supplementary Table 2].

**FIGURE 6 | Beta inter-trial coherence (ITC) at electrode F6. (A)** Bar graph displaying mean ITC averaged over the −1000 to 0 ms interval, with 0 ms as GO stimulus onset. **(B)** Line plot illustrating ITC from −1000 to +1000 ms, with the dark gray highlighting the ITI, the light gray bar after 0 ms highlighting maximal coherence following stimulus onset, and the dashed lines indicating ITC centered around the "Go" response.

Examining alpha ITC at C3, an effect of condition was again observed [*F*(2, 34) = 34.21, *p* < 0.001]. Comparing pSI/pFI and pGO indicated greater pSI/pFI coherence [*F*(1, 17) > 16.72, *p* < 0.001 for each comparison], with greater alpha ITC for pSI vs. pFI trial types [*F*(1, 17) > 6.97, *p* < 0.05; for result of other frequency bands at this electrode, see Supplementary Table 2].

Finally, examining beta ITC at electrode F6 revealed an effect of condition [*F*(2, 34) = 17.30, *p* < 0.001]. Greater pSI/pFI than pGO ITC was evidenced [*F*(1, 17) > 15.70, *p* < 0.001 for each comparison], with greater beta ITC during pSI vs. pFI trials [*F*(1, 17) = 4.82, *p* < 0.05; for result of other frequency bands at this electrode, see Supplementary Table 2]. Thus, examination of inhibition-related ITC centered around the moment of response showed the same pattern of effects as seen during the ITI for the F6 electrode, but no clear similarities for the other electrodes or periods tested, nor (most importantly) with the observed behavioral effects.

## **DISCUSSION**

Both pSI and pFI trials were slower than pGO trials, replicating previous stop signal after-effect studies (Rieger and Gauggel, 1999; Verbruggen et al., 2005a; Li et al., 2008; Verbruggen and Logan, 2008). However, we also observed (i) pSI trials being slower than pFI trials, (ii) a general effect of ITI on RTs, and (iii) behavioral evidence supporting a between-trial control adjustment interpretation over a repetition-priming explanation. Our neural analyses revealed increased IEC and ITC for "STOP" vs. "GO" trial types, indicative of a difference in cognitive processing for these inhibitory-laden trial types. Critically, the observed pSI > pFI > pGO pattern of behavior was matched only by the ITC analysis within the beta and alpha frequency bands during the ITI at the a priori specified electrodes. Here we describe how these behavioral and neural findings are indicative of between-trial control adjustments involved with both task-set switching and motor inhibition processes during these stop signal after-effects.

## **BEHAVIORAL INTERPRETATIONS**

The longer RTs following "STOP" (pSI, pFI) vs. "GO" (pGO) stimuli support the idea of motor inhibition processes persisting from trial *n* − 1 to trial *n*, as this ordering (i.e., pSI > pFI > pGO) would fit the theoretically perceived amount of inhibition-related processes engaged in each condition. Verbruggen et al.'s (2008) argument that these types of findings reflect repetition priming effects rather than between-trial control adjustments was based upon the idea that if the after-effects of successful response inhibition are due primarily to repetition priming, then pSI RTs should be longer than pGO RTs *only* for stimulus-repetition trials vs. non-repeating stimulus trials (see also Upton et al., 2010). However, unlike Verbruggen et al. (2008), we observed a significant effect for both stimulus-repeating and non-repeating trials, indicative of between-trial control adjustments. Given that similar findings have demonstrated slowing for both correct and incorrect trials following the presentation of infrequent stimuli (Notebaert et al., 2009), the type of adjustment found here agrees with the idea of a shift in strategy following the "STOP" stimuli in line with between-trial control adjustments.

The infrequent nature of stop signals here (appearing on 25% of trials) implies that the likelihood of two stop signals occurring sequentially was only 6.25%, a percentage that participants could have inferred (this was not directly probed for here), although this seems unlikely. As such, the presence of a stop signal in trial *n* − 1 would theoretically elicit a strategic shift toward making a "GO" response in trial *n* more so than a shift toward making another "STOP" response<sup>1</sup>. Thus, this congruency analysis suggests a conditional strategic effect may be in play when presented with different trial types. It should be noted that unlike the previously mentioned stop signal after-effect studies, longer RTs for pSI vs. pFI trials were also observed. This discrepancy may stem from the ITI jittering approach used here, as the other studies each used a fixed ITI length (Rieger and Gauggel, 1999). Given that changes in ITI have been shown to affect RTs when switching between conditional trial types (Altmann, 2004a,b; Monsell and Mizon, 2006), the variable ITI appears to have not only influenced the difference between "STOP" and "GO" trials, but also revealed the subtle difference between pSI and pFI trial types.

The behavioral analyses are in agreement with the idea that participants may have been anticipating a switch from a "STOP" trial (on trial *n* − 1) to a "GO" trial (trial *n*), leading to these after-effects. Task switching, which involves the active reconfiguration of mental resources when task requirements change (Logan and Delheimer, 2001; Logan and Gordon, 2001; Monsell, 2003; Yeung et al., 2006; Vandierendonck et al., 2010), is known to produce slower RTs in the form of switch costs (Monsell, 2003). This interpretation, which is also in line with the "task-set inertia" hypothesis (Allport et al., 1994), is consistent with the theory that these stop-signal after-effects reflect participants strategically anticipating and subsequently reconfiguring their task goals following both pSI and pFI trials (unlike pGO trials, where a "GO" stimulus was repeated). Evidence for this interpretation is borne out in the neural data, described below.

## **NEURAL FINDINGS REFLECTING MOTOR INHIBITION PROCESSES**

The two neural measures used here, IEC and ITC, each showed similar patterns to the behavioral findings: a conditional increase in phase synchrony for both pSI and pFI trial types vs. pGO trials, such that greater coherence (that is, *less* variable, or *more* consistent, engagement) associated with motor inhibition processes was observed following a "STOP" trial. These findings suggest that the focused engagement of motor inhibition processes persists during the ITI, and that having to reset the synchronization of neural oscillations within specific cortical areas from an "inhibitory" state to an "action" state (that is, changing from "STOP" to "GO")<sup>2</sup> underlies the observed behavioral slowing in a manner that is congruent with the "task-set inertia" hypothesis. This interpretation agrees with work describing that the networks involved in mediating stop signal inhibition were also identified during task switching (Kenner et al., 2010), and with other studies that reported increased coherence when switching between task sets in the beta (Gladwin et al., 2006; Serrien, 2009; Tallet et al., 2009) and alpha (Serrien et al., 2004; Serrien and Sovijarvi-Spape, 2013) frequency bands. Similarly, fMRI studies have described the engagement of lateral prefrontal regions when overcoming residual cognitive inhibition (Dreher and Berman, 2002; Dreher et al., 2002), with this activity being related to the re-engagement of a previous task set within the same paradigms (Dreher and Berman, 2002). Indeed, recent IEC findings by Müller and Anokhin (2012) have also suggested that increased task demands during response inhibition require stronger phase synchronization, with this phase locking indicative of an anticipatory switching process (Gladwin et al., 2006). Thus, the observed pattern of global IEC suggests that regions associated with motor inhibition processes are communicating with a number of other areas as a network when switching from a "STOP" to a "GO" state, with greater synchronization between these regions contributing to the observed behavioral slowing following "STOP" trials.

However, while the IEC metric did not follow the observed pattern of behavior (pSI > pFI > pGO) that also distinguished between the "STOP" trial types, this effect was present for the ITC analyses. We hypothesized that ITC activity would be best observed during the ITI within certain frequency bands nearest to stop-signal inhibition specific regions, with this activity reflecting greater local (as opposed to global) synchronization associated with motor inhibition processing. Under this premise, ITC differences between pSI and pFI trial types were found within the beta frequency band near the rIFG. Using the task-set inertia hypothesis as a framework, a pSI trial could be considered a "complete" switch as the "STOP" task was successfully performed on the previous trial, whereas a pFI trial would then be an "incomplete" switch trial. This interpretation agrees not only with the premise that increased cognitive demands, like task switching, call for greater coherence but also with other task switching work that has evidenced increased beta-band phase locking preceding switch trials (Gladwin et al., 2006; Serrien, 2009; Tallet et al., 2009). Given that rIFG activity has also shown modulation with stop signal success on the previous trial in fMRI studies (Li et al., 2008), these findings are suggestive of the prior engagement of motor inhibition processes influencing switching between task sets, which contributes to RT slowing.

These interpretations are supported by the related alpha ITC (and IEC) findings near the motor cortex during the ITI. The synchronization of alpha power at motor regions has been associated with inhibitory control (Hummel et al., 2002; Klimesch et al., 2007) and task switching (De Jong et al., 2006). Most related to the present study, the pattern(s) of coherence observed here agree with previous studies utilizing alpha coherence measures to support the theory that task switching engages inhibitory processes to swap between task sets (Serrien et al., 2004; Serrien, 2009; Serrien and Sovijarvi-Spape, 2013). Swann et al. (2009) have previously identified the motor cortex as a downstream target of prefrontal regions with respect to both alpha (and beta) when engaging motor inhibition processes. In agreement with this interpretation, we observed a trend toward greater response-centered alpha ITC phase-locking for pSI vs. pFI trials, as well as faster RT during pFI vs. pSI trials, suggesting that pSI trials had inhibitory control processes engaged to a greater extent than pFI trials.

<sup>1</sup>While this interpretation would seemingly predict that the pSI RTs should be faster than the pGO RTs, the congruency delay described above, as well as the RT cost associated with making such a switch, precludes this from being the case.

<sup>2</sup>The reconfiguration view of task switching (Vandierendonck et al., 2010) is similar in premise to the task-set inertia hypothesis; however, given the design of the present study, we could not directly test this theory because stimulus-response mapping did not change at any time during the task.

Consequently, we also observed greater theta ITC for the pFI vs. pSI trials at the FCz electrode during the ITI. Greater theta ITC nearest midline frontal regions has previously been associated with voluntary response inhibition processes (Brier et al., 2010; Yamanaka and Yamamoto, 2010; Müller and Anokhin, 2012), with theta- (and beta-) driven coherence amongst the rIFG, preSMA, and primary motor cortex suggested to be critical for inhibitory control during the stop signal task (Liang et al., 2012). However, it should also be noted that theta-band power and oscillatory activity have been associated with conflict monitoring (Hanslmayr et al., 2008; Cavanagh et al., 2009; Nigbur et al., 2012), suggesting that the observed conditional differences may also reflect a combination of multiple cognitive processes. This interpretation would agree with the present findings given that pFI trials would have greater conflict than pSI trials given the presence of an error on the preceding trial.

It is interesting, yet unclear, why these differential patterns of ITC between pSI and pFI trials were no longer present immediately following "GO" stimulus presentation, and inconsistent when examined around the moment of response. The influence of the recently encountered "STOP" trial type is seen to persist beyond the ITI, with the consistent finding of pSI = pFI > pGO for both the IEC and ITC within each frequency band immediately after stimulus presentation indicative of a common feature between these trial types (e.g., task set switching). However, it is likely that other cognitive factors like error monitoring (Carp and Compton, 2009; Nigbur et al., 2012) may be in play nearest the moment of response, potentially accounting for the inconsistencies between the neural effect and observed differences in behavior for each condition.

## **RECONCILIATION OF MOTOR INHIBITION AND TASK SWITCHING CONTRIBUTIONS**

We propose the present findings are the product of two principal sources of slowing in the stop signal task: motor inhibition processes and a strategic decision implemented when switching between task sets. It is tempting to speculate that the conditional global IEC effects may better reflect the involvement of task switching processes, while the ITC results highlight the underlying motor inhibition processes engaged during each condition given the similarities to the observed behavior. However, confirming this interpretation would require further investigation as the present experimental design could not determine whether these outcomes are truly mutually exclusive. Nevertheless, the idea of conditional changes in phase synchrony near motor, pre-motor, and prefrontal regions agrees with elegant TMS work characterizing a functional interaction between pre-SMA, primary motor cortex, and the right IFG specific to action reprogramming trials (Neubert et al., 2010). Indeed, the pre-SMA has been shown to facilitate the correct action on switch trials (Mars et al., 2009) and has a critical relationship with the right IFG during action inhibition (Duann et al., 2009; Obeso et al., 2013). Thus, we cautiously speculate that the increased IEC/ITC observed during the ITI between pSI/pFI and pGO trial types is facilitating set switching by modulating the phase synchrony of activity between right IFG, pre-SMA, and primary motor cortex.

These strategic decisions appear to be influenced by the jittered ITI, which suggests why studies using a fixed ITI may have observed a different pattern of behavioral results. Unlike Verbruggen et al. (2008), whose Experiment 1a findings suggested that the observed slowing could be either task switching or repetition priming (which led to subsequent experiments validating their repetition priming interpretation), our behavioral findings were consistent with the task set switching perspective and subsequently guided our neural analyses. However, an open question that remains involves the differences in the neural correlates associated with repetition-priming and between-trial control adjustments, as the neural findings themselves cannot directly discount the possibility of repetition-priming without assessing directional congruency. The present experimental design resulted in a relatively small number of each directional trial type (the mean number of pSI & pFI congruent & incongruent trials was ∼25 ± 5 each), providing only a modest signal-to-noise ratio for subsequent neural analyses of these repetition-priming effects. However, given that the behavioral results did not statistically support interrogating these directional effects, this analysis was not warranted here. Nevertheless, this proposed task-set switching interpretation is supported by the region-specific ITC that follows the observed behavioral finding (pSI *>* pFI *>* pGO).

Although these findings are limited in terms of their spatial resolution, they were driven by planned analyses focusing on regionally-specific activity patterns within certain frequency bands based on previous motoric inhibition work. Outside of these planned analyses, some of the other electrode/frequency combinations also showed inter-trial coherence that differentiated between the pSI and pFI trial types. However, these adjunct findings and their subsequent interpretations are not clearly supported in the motor inhibition literature; given that these findings were not hypothesis driven and the number of analyses performed, it is unlikely that their significance would survive any type of multiple comparisons correction. Rather, their inclusion is in the interest of full disclosure for researchers interested in the results at these regions using this type of analyses given the number of EEG stop-signal studies in recent years.

While the present study focused on coherence centered near prefrontal and motor cortex electrodes within beta, alpha, and theta frequency bands, these effects likely also involve a network of regions beyond the range of surface EEG spatial resolution used here. For example, areas within the basal ganglia have also been associated with the selection and inhibition of competing motor programs (Mink, 1996; Wichmann and Delong, 1996; Kropotov and Etlinger, 1999; Mink, 2006), including during task switching (Kenner et al., 2010). Future work that integrates these other regions and frequencies, especially in populations known to have deficiencies in task switching (e.g., older adults: Kray and Lindenberger, 2000; Mayr and Kliegl, 2000; Kray et al., 2002; Adrover-Roig and Barcelo, 2010; Jimura and Braver, 2010), is warranted to extend the present findings, providing a more thorough characterization of these stop signal after-effects.

## **ACKNOWLEDGMENTS**

We thank N. Barbhaiya & B. Yang for their help with data collection, and B. Voytek & C. Walsh for insights involving the refinement of this manuscript. This work was supported by NIH R01-AG030395 (Adam Gazzaley). Joaquin A. Anguera is supported by a UCSF Institutional Research and Career Development Award (IRACDA).

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Cognition/10.3389/fpsyg.2013.00649/abstract

## **REFERENCES**

dynamics during action monitoring. *J. Neurosci.* 29, 98–105. doi: 10.1523/JNEUROSCI.4137-08.2009

(2010). Responding with restraint: what are the neurocognitive mechanisms. *J. Cogn. Neurosci.* 22, 1479–1492. doi: 10.1162/jocn.2009.21307


*Psychophysiol.* 31, 197–217. doi: 10.1016/S0167-8760(98)00051-8


6926–6931. doi: 10.1523/JNEUROSCI.1396-09.2009


Electrophysiological activity underlying inhibitory control processes in normal adults. *Neuropsychologia* 44, 384–395. doi: 10.1016/j.neuropsychologia.2005.06.005


751–758. doi: 10.1016/S0959-4388(96)80024-9


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 April 2013; accepted: 30 August 2013; published online: 24 September 2013.*

*Citation: Anguera JA, Lyman K, Zanto TP, Bollinger J and Gazzaley A (2013) Reconciling the influence of task-set switching and motor inhibition processes on stop signal after-effects. Front. Psychol. 4:649. doi: 10.3389/fpsyg.2013.00649*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Anguera, Lyman, Zanto, Bollinger and Gazzaley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The role of consciousness in the phonological loop: hidden in plain sight

## *Bradley R. Buchsbaum\**

*Department of Psychology, Rotman Research Institute, Baycrest Hospital, University of Toronto, Toronto, ON, Canada*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

*T. Andrew Poehlman, Southern Methodist University, USA*

#### *\*Correspondence:*

*Bradley R. Buchsbaum, Department of Psychology, Rotman Research Institute, Baycrest Hospital, University of Toronto, 3560 Bathurst St., Toronto, ON M6A 2E1, Canada e-mail: bbuchsbaum@research.baycrest.org*

We know from everyday experience that when we need to keep a small amount of verbal information "in mind" for a short period, an effective cognitive strategy is to silently rehearse the words. This basic cognitive strategy has been elegantly codified in Baddeley and colleagues' model of verbal working memory, the phonological loop. Here we explore how the intuitive appeal of the phonological loop is grounded in the phenomenological experience of subvocal rehearsal as consisting of an interaction between an "inner voice" and an "inner ear." We focus particularly on how our intuitions about the phenomenological experience of "inner speech" might constrain or otherwise inform the functional architecture of information processing models of verbal working memory such as the phonological loop; and how, indeed, ideas about consciousness may offer alternative explanations for the dual nature of inner speech in verbal working memory.

**Keywords: working memory, phonological loop, inner ear, inner voice, consciousness**

## **THE ROLE OF CONSCIOUSNESS IN THE PHONOLOGICAL LOOP: HIDDEN IN PLAIN SIGHT**

Working memory is a cognitive system for the maintenance, manipulation, and monitoring of information that is not currently available in the sensory environment. There is extensive empirical evidence showing that working memory is capacity limited: that one can only retain 3 or 4 independent items or objects "in" working memory at a time (Cowan, 2001; Marois and Ivanoff, 2005). But what does it mean for an item—an internal mental representation—to be "in" working memory? A functional or operational definition might say that for something to be in working memory, it must be readily accessible and can be *reported* or otherwise *described* by a subject under study. According to this definition, a way to find out what a person currently holds in working memory is simply to ask them. If we define working memory in this way, that is, as the current contents of memory that are available for subjective report, then we may say that working memory consists only of consciously accessible information.

A key historical precursor to working memory, the Jamesian concept of primary memory, was identified more or less directly with the contents of consciousness. Many modern theorists also see a close connection between working memory and consciousness. For example, Cowan (1993) has proposed that while many mental representations may be in an "activated state" at any given time, only those representations that are within the capacity-limited "focus of attention," a concept closely related to conscious awareness, are accessible within working memory. Baars (Baars and Franklin, 2003) has argued that consciousness is associated with a limited capacity "global workspace," akin to working memory, whose focal contents are broadcast to widely distributed specialized networks in the brain.

In the classic working memory model of Baddeley and colleagues (Baddeley and Hitch, 1974; Baddeley, 1992, 2003; Repovs and Baddeley, 2006), however, consciousness is not an explicit motivating force for the logic and structure of the theory. Nevertheless, certain aspects of the model are often informally identified with some characteristics of conscious experience. This is especially clear in the case of the verbal component of working memory, the "phonological loop," where the resemblance between the model and subjective phenomena seems to be more than merely metaphorical. Our present goal is to show that even a seemingly consciousness-averse information-processing model such as the phonological loop owes something to an introspective analysis of conscious experience. We focus particularly on how our intuitions about the phenomenological experience of "inner speech" might constrain or otherwise inform the functional architecture of information processing models of verbal working memory such as the phonological loop; and how, indeed, the analysis of consciousness may suggest alternative interpretations of the fundamental nature of inner speech in verbal working memory.

## **THE MULTI-COMPONENT WORKING MEMORY MODEL**

The goal of the working memory model (Baddeley, 1992) is to provide a basic functional description of how internal mental representations are maintained online during complex cognitive processing. It consists of two so-called "slave systems," the visuospatial scratchpad and the phonological loop, which are dedicated to the storage of visual and verbal information, respectively. The visuospatial scratchpad and the phonological loop are conceived of as buffers, that is, as containers of highly processed information and are not directly involved in the perceptual analysis of sensory stimuli. Both of these storage subsystems are controlled and monitored by a superordinate cognitive control mechanism called the "central executive." While the visuospatial scratchpad is described as a single storage component (but see Logie; Logie and Pearson, 1997), the phonological loop consists of two sub-components, a storage component called the phonological store and a maintenance component known as the articulatory rehearsal process. The phonological store can hold speech-based information for a brief period of time (approximately 2 s per item) before it is lost to decay. The role of the articulatory rehearsal process is to counteract this decay by periodically "refreshing" the contents of the phonological store by way of subvocal speech.
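The interplay between decay and rehearsal can be made concrete with a toy simulation. The sketch below is an illustration under strong simplifying assumptions, not a specification of the model: traces are lost all at once after `decay_s` seconds without a refresh (using the approximate 2 s figure above), rehearsal is strictly serial, and the function name, parameters, and example digit list are hypothetical.

```python
def surviving_items(items, articulation_s, delay_s, decay_s=2.0, rehearse=True):
    """Toy model of a decaying phonological store with serial rehearsal.

    Every item enters the store at time 0. A trace is lost once more than
    `decay_s` seconds pass without a refresh. Rehearsal cycles through the
    items in order and needs `articulation_s` seconds per item.
    Returns the items still in the store after `delay_s` seconds.
    """
    last_refresh = {item: 0.0 for item in items}
    if rehearse:
        queue = list(items)
        now = 0.0
        while queue:
            item = queue.pop(0)
            if now - last_refresh[item] > decay_s:
                continue  # trace decayed before its turn came; item is lost
            if now + articulation_s > delay_s:
                break  # the delay ends before this rehearsal would finish
            now += articulation_s
            last_refresh[item] = now  # subvocal articulation refreshes the trace
            queue.append(item)  # rehearse this item again on the next cycle
    return [item for item in items if delay_s - last_refresh[item] <= decay_s]


digits = ["7", "3", "9", "1"]  # hypothetical to-be-remembered list
fast = surviving_items(digits, articulation_s=0.4, delay_s=5.0)
slow = surviving_items(digits, articulation_s=0.8, delay_s=5.0)
blocked = surviving_items(digits, articulation_s=0.4, delay_s=5.0, rehearse=False)
```

In this sketch, slowing articulation lengthens the rehearsal cycle until some traces decay before they can be refreshed, and switching rehearsal off empties the store once the delay exceeds the decay window.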

## **INNER SPEECH AS A MNEMONIC STRATEGY**

Because of the importance of language and communication in human cognition, memory for verbal information has been the topic of a great deal of research in the cognitive sciences over the last 50 years. A somewhat trivial (and by now nearly anachronistic) but oft-cited example of the need in everyday life for verbal working memory, is to keep the digits of a phone number "in mind" after reading them from a phonebook or hearing them from a telephone operator. There is a period of time in between receiving the number and dialing it where the ordered sequence of digits must be maintained in working memory; and during this interval most people will "repeat the numbers to themselves," either overtly or covertly, as a way of keeping the digits conscious and accessible. But what does this behavior, this routine cognitive strategy, tell us about the kind and nature of the internal codes that are used in verbal working memory?

One might ask of course whether subvocal rehearsal is actually beneficial to memory performance. This question has been answered by testing subjects' memory for lists of verbal items while preventing rehearsal by requiring them to concurrently articulate an irrelevant word (e.g., "hiya") during a delay period interposed between stimulus perception and recall. Many studies have shown that blocking rehearsal through "articulatory suppression" has a strong negative effect on recall performance, suggesting that the cognitive strategy of rehearsal is indeed useful (e.g., Baddeley et al., 1984). A second obvious question is whether for rehearsal to be an effective strategy, the to-be-remembered verbal items must be spoken aloud; if so, it would suggest that rehearsal serves merely as a kind of trick to "re-present" the items to the auditory perceptual system through an external sensory feedback loop. In fact, however, studies have shown that verbal rehearsal is beneficial to memory even when it is subvocal and thus produces no external auditory feedback (e.g., Murray, 1968). Here we note that this finding also comports with phenomenological experience: when we "silently talk to ourselves"—when we subvocally rehearse—we seem to *hear* a dim but unmistakable voice; we are *listening* to this voice, and we typically identify this voice as our own. The empirical demonstration that subvocal rehearsal is beneficial to short-term verbal recall, combined with the subjective experience that internal speech involves both an *inner voice* and an *inner ear*, offers intuitive support for the basic architecture of the phonological loop model of verbal working memory, which posits the existence of two such communicating components.

## **SENSORY AND MOTOR CODES IN THE PHONOLOGICAL LOOP**

A fundamental aspect of the phonological loop model is that it involves the repeated conversion between two codes: one that is a (quasi-sensory) phonological code and one that is a (quasi-motor) articulatory code (Wilson, 2001). Both of these codes represent verbal content and the transfer from one format to the other does not involve a net gain or loss of information in the system. Although we have noted that the dual coding premise appeals to our subjective experience of the inner voice and inner ear during covert speech, from an information processing standpoint it seems rather like a pointless game of representational ping-pong. Indeed, Baddeley and Hitch (1974) had initially attempted to explain the main empirical findings of verbal working memory research more parsimoniously in terms of a single articulatory component, without the need for an auditory/perceptual store. This was based on the strong evidence for the critical role of speech production processes in verbal span tasks. For instance, individual differences data showed that the faster a person is able to articulate a set of words, the greater his or her verbal memory span (Landauer, 1962). In addition, sets of words that take longer to articulate result in poorer memory performance than sets of shorter duration words (the word-length effect; Baddeley et al., 1975); and, as mentioned previously, blocking subvocal rehearsal through articulatory suppression impairs verbal short-term memory.

Several lines of evidence, however, ultimately compelled the addition of the phonological store component, and with it the dual coding view of verbal working memory was established (Salame and Baddeley, 1982). First, neuropsychological investigations showed the existence of patients with dramatically reduced auditory-verbal short-term memory in the presence of preserved speech production and auditory comprehension abilities (Shallice and Warrington, 1977; Shallice and Vallar, 1990). Second, articulatory suppression eradicates the phonological similarity effect when verbal stimulus presentation is visual, but not when it is auditory. This finding suggested that the phonological similarity effect was based on an auditory-perceptual code rather than an articulatory one. Third, the ability to make rhyme judgments on a pair of visually presented words is unaffected by articulatory suppression (Baddeley and Lewis, 1981). Fourth, the presentation of irrelevant speech during immediate verbal memory has a deleterious effect on serial recall (Jones and Morris, 1992; Beaman and Jones, 1998), suggesting the existence of a representational code more closely tied to the auditory-sensory system than to the articulatory-motor system.

To account for these data, Baddeley and colleagues split the articulatory loop into an articulatory control process and a phonological store, which act in concert to retain verbal information in working memory (Salame and Baddeley, 1982). In the new model, neither component is on its own capable of supporting maintenance of verbal information in working memory; each has, as it were, an Achilles heel. The articulatory rehearsal process has no storage capacity of its own, but can *refresh* the contents of the phonological store, which are otherwise subject to rapid time-based decay. The phonological store has a memory capacity of its own, but no internal means of reactivating its decaying contents. Thus, as neither component is self-sufficient, damage to either one of these components should result in severe degradation in the performance of the system. Indeed, the interdependence of two such components is supported by neuropsychological data showing that patients with severe dysarthria, and thus a damaged articulatory control process, have greatly reduced verbal working memory (Baddeley and Wilson, 1985); and, as already mentioned, patients with temporo-parietal lesions have been described with intact speech production and comprehension abilities, but impaired auditory-verbal short-term memory spans.

## **THE INNER EAR, THE INNER VOICE AND THE PHONOLOGICAL LOOP**

We have briefly reviewed the historical development of the phonological loop and some of the empirical evidence that led to the fractionation of the verbal component of working memory into an articulatory and phonological component. It is interesting to note that the evolution of the phonological loop converged on an architecture that is more compatible with phenomenal experience than its purely articulatory precursor. It may be instructive to consider whether this congruence between introspective evidence and the structure of an information-processing model is more than a coincidence, or whether it may have a deeper significance.

A seemingly arbitrary aspect of the phonological loop is the claim that the articulatory control process has no internal storage capacity. This might translate, in phenomenological terms, to: "the inner voice cannot hear itself speak," or: "the inner voice is deaf." If we, for the sake of argument, endow the articulatory control process with storage capacity and the ability to reactivate its own contents (i.e., as in the original articulatory loop model), then from an information processing standpoint the component becomes self-sufficient and self-referential: *it is a voice that can hear itself speak*.

Putting aside behavioral considerations for or against such an architecture, it seems to run counter to the introspective evidence telling us that inner speech is a private version of outer speech. Thus, the auditory-perceptual quality of the auditory imagery of the inner ear is *like* hearing external speech, just as when we imagine a patch of green light it is (phenomenologically) *like* seeing a patch of green light (Place, 1956; Smart, 1959; Shepard and Chipman, 1970). Moreover, during inner speech, verbal information constitutes the content of the auditory imagery of the inner ear, and as such is consciously reportable. We cannot say the same for the inner voice: although one can describe a feeling of agency during inner speech (Morsella et al., 2011), this feeling does not carry any linguistic content, and there are no other articulatory-motoric sensations that can be described as representing a verbal message. Thus, an introspective analysis of the phenomenology of inner speech is in favor of the existence of two separable conscious components, and it is not difficult to identify a resemblance between these two phenomena and the functional components of the phonological loop.

## **THE NEED FOR AN OBSERVER OF MOTOR PROGRAMS**

One might say that the inner voice is only identifiable as a marker of agency, conveying the feeling that: "it is *you* that is speaking;" whereas the inner ear carries the conscious content of the message: "this is *what* you are saying." Indeed, the conscious experience of inner actions, including speech production, lacks reportable content apart from indicators of agency such as urges, plans, and intentions (Morsella et al., 2011). To enable self-awareness of the content of motor speech programs, such action representations must first be, as it were, *rendered into sensory-perceptual space*. Thus, we may say that the content of motor programs is not introspectable: it cannot be reflected upon without first being in some sense realized and observed. This may simply be a necessary property of a self-conscious organism: it cannot anticipate the content of its own actions before these actions have been either explicitly executed or internally simulated (Libet et al., 1982). Another way to understand the impenetrability of the content of motor programs is to assume that neural computations and conscious representations are necessarily independent of one another. In other words, a computational process cannot observe itself: viz. *the inner voice cannot hear itself speak*. We may further note that whereas the computational goal of the action system is to encode motor programs that determine an organism's future interactions with the environment, the primary role of the perceptual system is to decode and represent the content of the sensory world. In this sense, then, the auditory-perceptual system is well suited to perform its regular role as the *observer* in the cortico-cortical crosstalk that is the neural substrate of inner speech (Buchsbaum et al., 2001; Buchsbaum and D'Esposito, 2008).

## **THE INNER EAR, THE INNER VOICE, AND THE BRAIN**

We have noted a resemblance between the subjective experience of inner speech and the two-component structure of the phonological loop. This resemblance may also be seen to extend into the brain, where even in the 19th century Carl Wernicke referred to the generative process of speech production as consisting of the simultaneous co-activation of "auditory word images," housed in the superior temporal gyrus, and "motor word images" stored in the inferior frontal gyrus; and they were assumed to be connected by a large fiber bundle spanning across the frontal and temporal lobes called the arcuate fasciculus (Eggert and Wernicke, 1874/1977).

Modern functional neuroimaging studies of inner speech in the context of simple working memory tasks, where subjects must keep in mind a small set of words or pseudowords over a delay period, have essentially verified Wernicke's hypothesis. Many studies have shown that during subvocal rehearsal robust activation is observed in both frontal "motor" regions (Broca's area, premotor cortex) and posterior "sensory" regions (planum temporale, superior temporal sulcus) that are often implicated in speech perception and production processes (Wise et al., 2001; Hickok et al., 2003; Buchsbaum et al., 2001, 2005, 2011). Indeed, the continuous co-activation of inferior frontal and superior temporal brain sites during inner speech has recently been shown to persist for as long as 45 s in a task requiring extended inner speech (Fegen et al., submitted), long after transient executive and cognitive control processes that are activated during stimulus encoding have ceased and the subject has entered an automatic "maintenance mode." Thus, Wernicke's notion of a simultaneous reverberation between auditory and motor word images, an idea that has an affinity with the phenomenological experience of inner speech, finds support from functional neuroimaging studies of subvocal rehearsal.

## **IMPLICATIONS FOR UNDERSTANDING INNER SPEECH AND VERBAL WORKING MEMORY**

In light of the above discussion, then, one might argue that the "Achilles Heel" of the articulatory rehearsal process is *not*, as is claimed in the phonological loop model, that it lacks *storage capacity,* but rather that it lacks a direct means of delivering information to conscious awareness. Articulatory programs must be routed through the sensory perceptual system to gain access to conscious awareness. Earlier we referred to this aspect of the model as an unnecessary game of representational ping-pong. It is traditionally explained by assuming that the articulatory rehearsal process lacks storage capacity and therefore must continuously access and update representations in the phonological store. However, there is no special reason to assume that the articulatory system lacks storage capacity—in fact, there is reason to think otherwise (e.g., Monsell, 1984; Levelt, 1993). Rather, we propose that the two-component architecture of the phonological loop may be better understood as emerging from the requirement that articulatory programs must first be *witnessed* by a sensory system before they can gain access to consciousness and working memory. If we take this view, then the concept of a single locus for the temporary storage of phonological information is no longer necessary to explain the inner voice/inner ear duality of verbal working memory. Rather, we may dispense with the notion of temporary storage altogether (e.g., Craik and Kirsner, 1974; Ruchkin et al., 2003; Postle, 2006; Buchsbaum and D'Esposito, 2008), and instead propose that this duality is a fundamental consequence of the conscious impenetrability of articulatory motor programs and the corresponding need for an external representational system into which motor output can be projected. In fact, coordinated activity between anterior "motor" systems and posterior "sensory" systems appears to be a general feature of declarative memory systems across multiple sensory modalities and domains (Danker and Anderson, 2010; Buchsbaum et al., 2012); and thus the literal conversation of inner speech may only be a special case of a neurophysiological principle that dictates that conscious thoughts emerge from the coordinated interplay between anterior and posterior brain systems.

## **REFERENCES**

*J. Mem. Lang.* 24, 490–502. doi: 10.1016/0749-596X(85)90041-5


preceding unrestricted 'spontaneous' vs. pre-planned voluntary acts. *Electroencephalogr. Clin. Neurophysiol.* 54, 322–335.


working memory: explorations in experimental cognitive psychology. *Neuroscience* 139, 5–21. doi: 10.1016/j.neuroscience.2005.12.061


*Neuropsychological Impairments of Short-Term Memory*, eds G. Vallar and T. Shallice (Cambridge: Cambridge University Press), 11–53.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 July 2013; accepted: 15 July 2013; published online: 08 August 2013.*

*Citation: Buchsbaum BR (2013) The role of consciousness in the phonological loop: hidden in plain sight. Front. Psychol. 4:496. doi: 10.3389/fpsyg.2013.00496*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Buchsbaum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The influence of high-level beliefs on self-regulatory engagement: evidence from thermal pain stimulation

#### *Margaret T. Lynn<sup>1</sup>\*, Pieter Van Dessel<sup>2</sup> and Marcel Brass<sup>1,3</sup>*

*<sup>1</sup> Department of Experimental Psychology, Ghent University, Gent, Belgium*

*<sup>2</sup> Department of Experimental Clinical and Health Psychology, Ghent University, Gent, Belgium*

*<sup>3</sup> Behavioral Science Institute, Radboud University, Nijmegen, Netherlands*

#### *Edited by:*

*T. Andrew Poehlman, Southern Methodist University, USA*

#### *Reviewed by:*

*T. Andrew Poehlman, Southern Methodist University, USA*

*Joaquin A. Anguera, University of California San Francisco, USA*

#### *\*Correspondence:*

*Margaret T. Lynn, Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium e-mail: maggie.lynn@ugent.be*

Determinist beliefs have been shown to impact basic motor preparation, prosocial behavior, performance monitoring, and voluntary inhibition, presumably by diminishing the recruitment of cognitive resources for self-regulation. We sought to support and extend previous findings by applying a belief manipulation to a novel inhibition paradigm that requires participants to either execute or suppress a prepotent withdrawal reaction from a strong aversive stimulus (thermal pain). Action and inhibition responses could be determined by either external signals or voluntary choices. Our results suggest that the reduction of free will beliefs corresponds with a reduction in effort investment that influences voluntary action selection and inhibition, most directly indicated by increased time required to initiate a withdrawal response internally (but not externally). It is likely that disbelief in free will encourages participants to be more passive, to exhibit a reduction in intentional engagement, and to be disinclined to adapt their behavior to contextual needs.

**Keywords: free will, beliefs, inhibition, volition, effort, self-control, pain**

## **INTRODUCTION**

The question of whether free will truly exists is an age-old philosophical question, tackled by thinkers ranging from Democritus to Russell. Yet most contemporary scientists have avoided the metaphysical and existential hurdles of free will, and instead investigate its impact on human action: how this phenomenon arises in the mind, and to what extent deterministic beliefs have an effect on our behavior (e.g., Wegner, 2003; Vohs and Schooler, 2008; Baumeister et al., 2009; Rigoni et al., 2011, 2012, 2013). The sensation of control over one's actions is an undeniably ubiquitous feature of human experience. People tend to believe they are responsible for a given action if the causal principles of *consistency, priority*, and *exclusivity* are satisfied; that is, if their intentions are consistent with and experienced at a suitable interval prior to the relevant action, and there is no other reasonable explanation for the action arising (Wegner, 2003). Perception of personal control is further considered to be intrinsic, biologically necessary, and protective against environmental stressors (Leotti et al., 2010).

Social psychological research has recently investigated the degradation of behavioral and social effects thought to follow from a belief in determinism. For instance, Vohs and Schooler (2008) found that inducing disbelief in free will, via reading of a determinist essay or series of statements, elicited an increase in cheating on the part of participants. In comparison with control subjects, anti-free will participants in this case paid themselves a statistically improbable amount of money for performance on a problem-solving task, and more frequently permitted themselves to view answers when given the opportunity to cheat. Under similar conditions, Baumeister et al. (2009) found that participants with weakened free will beliefs showed increased aggression and decreased helping behavior. Likewise, an increase in mindless conformity and a decrease in counterfactual thinking, assumed to be adaptive for learning and social adaptation, have been reported to accompany deterministic beliefs (Baumeister et al., 2011; Alquist et al., 2013). Interestingly, when these studies included a condition promoting free will, results were consistent with the control group, suggesting that a belief in free will is a common default state.

More recent research in the domain of Cognitive Psychology has revealed an impact of deterministic beliefs even on basic levels of motor control. Rigoni et al. (2011) used a manipulation identical to that of Vohs and Schooler (2008, Experiment 1) to alter participants' belief in free will. They observed that participants who were induced to disbelieve in free will showed reduced amplitudes of the readiness potential, an electrophysiological marker of unconscious motor preparation (Rigoni et al., 2011). In a subsequent study (Rigoni et al., 2013) it was found that performance monitoring, as indicated by post-error slowing, was also diminished in participants induced to disbelieve in free will. This may indicate a reduction in the recruitment of self-regulatory processes, and less inclination to adjust one's behavior according to circumstantial needs, on the part of anti-free will participants.

Finally, this belief manipulation has been applied to an important facet of self-control, namely *intentional inhibition*, or the ability to voluntarily suppress a prepotent action plan (Brass and Haggard, 2007). The study in question (Rigoni et al., 2012) employed a task developed by Kühn et al. (2009) that overcame a limitation of the well-supported literature on externally-generated stopping (see Aron, 2007, for a review) by enabling voluntary choice behavior to be experimentally investigated within an inhibition paradigm. In this task, participants were occasionally asked to freely decide whether to stop a prepared action (button pressing to halt the progress of a marble rolling down a ramp). Both intentional inhibition and perceived self-control were shown to be adversely affected by an anti-free will manipulation (Rigoni et al., 2012). These findings were interpreted such that weakened free will beliefs lead to a reduction in intentional effort, which then causes participants to select the less demanding response option (in this case to execute the pre-planned response).

The goal of the present study was to support and extend prior research on the influence of free will beliefs upon intentional inhibition, by investigating whether inducing determinist beliefs might in turn influence one's intentional engagement in self-regulatory behavior. However, while previous studies have investigated intentional inhibition in rather artificial experimental situations in which participants have hardly any prior motivation to act or inhibit, we sought to address voluntary inhibition in a more ecologically valid setting in which behavioral urges are present. To this end, our secondary goal was to develop and pilot a novel experimental paradigm for disentangling intentional from instructed inhibition.

Pain was selected as the behaviorally relevant stimulus for our purposes. Management of the pain avoidance response can be seen as a compelling component of the affective response system; the organism is strongly motivated to avoid the pain sensation (Campbell and Misanin, 1969; Elliot, 2006). We can therefore consider management of this urge as a window into how we suppress our most basic drives, and a classical instance of self-control. The pain avoidance response can of course be highly automatized, for instance when one reflexively jerks their hand away from a hot stove. However, at times other goals call for self-control to be exerted for the suppression of this avoidant urge, such as when the heat comes not from the stove, but from a plate of food. In this case, one might choose to suppress the highly prepotent reaction momentarily in favor of satisfying the opposing basic urge of hunger (cf. Morsella, 2005).

Our paradigm required participants to occasionally inhibit a prepotent withdrawal reaction from a heat source applied to their inner wrists. In half the trials, participants were able to choose whether to inhibit the withdrawal response or to immediately terminate the trial. The advantage of this manipulation is that it requires strong (and consistent; the urge to withdraw does not fade) self-control to withstand the thermal pain. In that sense, it is in stark contrast to standard laboratory tasks involving self-regulation and agency. The design also ensures that acting and inhibiting were equally distributed in the non-choice, or directed, trials, thereby discouraging any response bias and ensuring a comparable number of trials in each design cell. To manipulate free will beliefs, we used a Velten procedure (Velten, 1968) similar to that used in previous experiments (Vohs and Schooler, 2008, Experiment 2; Baumeister et al., 2009), in which participants are required to read and reflect upon a series of statements (see Supplementary Material for a complete list). Immediately prior to each trial, participants were presented with a statement and asked to retain the statement in memory until the end of the block. Statements were either neutral or meant to induce anti-free will beliefs (between-subjects). These statements were shown during the inter-trial interval in order to reduce potential pain preparation and decision-making strategies. We hypothesized that inducing disbelief in free will would lead participants to exhibit a reduction in intentional engagement, to lack adaptive strategies, and to be disinclined to adapt their behavior to contextual needs.

## **METHODS**

## **PARTICIPANTS**

Fifty-four Dutch-speaking undergraduate students enrolled in the study; all gave written consent prior to participation. They received either course credit or a payment of 10 euros for their participation. All participants had normal or corrected-to-normal vision and reported no neurological deficits. The study was conducted in accordance with the Declaration of Helsinki, and the approval of Ghent University's Ethical Committee was obtained in advance. After determining participants' individual pain thresholds, those who did not report sufficient pain (i.e., their threshold surpassed 50°C, beyond the safety limitations of the stimulating equipment) were removed from the study. A total of 48 participants (12 male, tested individually) completed the entire experiment.

## **PROCEDURE**

## *Threshold determination*

Pain was induced via a thermode connected to a Medoc PATHWAY device (MEDOC, Haifa, Israel), an apparatus designed to induce thermal pain using cold or hot stimulation. The threshold at which participants felt sufficient pain was determined by exposing each participant to 26 trials in which the thermal sensation gradually increased over 5 s from 32°C to a randomized destination temperature between 45 and 50°C (in increments of 0.25°C), a slope comparable to the experimental trials. After each trial, the thermode returned instantly to baseline temperature, and participants were asked to rate their perceived pain on a scale from zero to eight, with zero being no pain and eight being the worst possible pain. The destination temperature employed in the main experiment was computed for each participant as the highest temperature at which they rated their pain as a six. This method was revealed during piloting to yield more accurate tolerance threshold measurements than merely requiring participants to indicate the maximum heat they could withstand when exposed to a steadily increasing temperature. Importantly, participants were free to press a button at any point during the threshold determination in order to terminate the trial.
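The selection rule described above (the highest temperature rated as a six) can be illustrated with a minimal sketch. The function name, the data layout, and the calibration values are hypothetical, and returning `None` when no trial received the target rating is an assumption; the text does not say how such cases were handled.

```python
from typing import Optional, Sequence, Tuple


def destination_temperature(
    trials: Sequence[Tuple[float, int]], target_rating: int = 6
) -> Optional[float]:
    """Return the highest temperature that received exactly `target_rating`.

    `trials` holds (destination temperature in deg C, pain rating 0-8) pairs.
    Returns None if no trial was given the target rating (an assumption).
    """
    matching = [temp for temp, rating in trials if rating == target_rating]
    return max(matching) if matching else None


# Hypothetical calibration data for one participant.
calibration = [
    (45.0, 2), (46.25, 4), (47.5, 6), (48.0, 6), (48.75, 7), (49.5, 8),
]
print(destination_temperature(calibration))
```

For the hypothetical data above, two trials were rated six, and the sketch picks the hotter of the two.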

## *Task and stimuli*

Participants received painful heat stimulation during each trial, applied via a thermode to alternating inner wrists. The images of three geometric shapes (triangle, square, circle) were used as cues to indicate the trial type. Depending on the cue, participants were requested to either press the button as quickly as possible ("directed action," 25% of trials), inhibit this response and endure the pain ("directed inhibition," 25% of trials), or make a voluntary decision to either button press immediately or persist until the end of the trial ("choice," 50% of trials). In the latter case, participants were requested to make their choices approximately equal over the course of the experiment, but not to use any particular strategies or to decide in advance of the presentation of the cue. In a practice block, absent pain stimulation, participants were trained on the cues. A pilot study had revealed that participants are typically around 200 ms slower to respond on choice action trials than on directed action trials, reflecting the additional time needed for the choice decision. Accordingly, to make stimulation as identical as possible across action conditions, 200 ms of thermal stimulation was added to directed action trials, following the button press.

Each trial was preceded by a statement ("neutral" or "anti-free will," see below) with a duration of 12 s. After a delay of 1 s, a fixation cross was presented and the temperature of the thermode began to gradually increase from a baseline of 32°C to the participant's individually determined threshold. After 5 s, one of the three task cues appeared in place of the fixation cross. The temperature remained at threshold for the next 2 s, or until the participant pressed the button to terminate both the pain stimulation and the trial. Afterwards, prompts for ratings of the perceived pain and "urge to terminate the trial by pressing the button" (both on a scale of 0–8) remained on screen until participants responded. Participants were then cued to alternate the arm placed atop the thermode. The arm not being stimulated was used to button press (thereby providing a response time for action trials) and was placed atop the opposing wrist, in order to lend weight and make it more difficult for participants to inadvertently withdraw from pain rather than button pressing. A schematic overview of a possible trial in the anti-free will condition is presented in **Figure 1**.

The assignment of geometric shapes to trial types, and the order of the first-stimulated wrist were counterbalanced across subjects. Each participant had to perform 120 trials in total, being divided into six blocks of 20 trials presented in randomized sequence. In each block, participants were given 10 trials in which they were cued to make a decision, five trials in which they were cued to push and five trials in which they were cued to inhibit their withdrawal response. Importantly, participants were free to press a button to immediately terminate the thermal sensation at any point during the experiment.
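As an illustration of the block structure described above, the following sketch builds a session of six blocks of 20 trials each. All names are hypothetical, and shuffling the trial order independently within each block is an assumption about what "randomized sequence" means here.

```python
import random
from collections import Counter

# Trials per block, as described in the text.
BLOCK_COMPOSITION = {"choice": 10, "directed_action": 5, "directed_inhibition": 5}
N_BLOCKS = 6


def build_session(seed=None):
    """Return a list of N_BLOCKS blocks, each a shuffled list of trial types."""
    rng = random.Random(seed)
    session = []
    for _ in range(N_BLOCKS):
        block = [
            trial_type
            for trial_type, count in BLOCK_COMPOSITION.items()
            for _rep in range(count)
        ]
        rng.shuffle(block)  # assumed: order randomized within each block
        session.append(block)
    return session


session = build_session(seed=1)
totals = Counter(trial for block in session for trial in block)
print(len(session), sum(totals.values()), dict(totals))
```

Across the session this yields 60 choice trials, 30 directed action trials, and 30 directed inhibition trials, matching the 50/25/25% split reported for the task.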

## *Manipulation of free will beliefs*

Participants were randomly assigned to either the control condition or the anti-free will condition (24 in each condition). All participants were required to read discrete statements presented on-screen during the inter-trial interval. They were instructed to retain this information until the end of the block, at which point a probe question concerning statement recognition was presented on the screen (see Supplementary Material). The probe questions were inserted to verify that participants had attended to the statements as directed, and to support a cover story that the study's goal was to test the influence of pain on memory. After feedback on the accuracy of their answer was given, a novel set of statements was presented, and subjects were instructed to remember these subsequent statements instead. The statements were either neutral or designed to tap into free will beliefs, with 60 unique statements in each group. Over the course of the experiment, control participants were exposed to each of the 60 neutral statements twice, while participants in the anti-free will condition were shown each of the 60 statements related to free will beliefs twice. Furthermore, in the anti-free will condition, the three trial types (directed action, directed inhibition, choice) were divided equally over each of the three statement categories.

A total of 90 statements were collected from a variety of questionnaires and articles involving free will beliefs (e.g., Carey, 2005; Vohs and Schooler, 2008; Paulhus and Carey, 2011), or were produced based on these inventories. These 90 statements were selected with the aim of being related to certain aspects of free will beliefs; 30 statements were related to the idea that people do not have a free will (e.g., "scientists tell us that people have no free will"), 30 statements concerned beliefs in scientific determinism (e.g., "the environment someone is raised in determines their success as an adult") and 30 statements were related to beliefs in fatalistic determinism (e.g., "you can't change your destiny, no matter how hard you try"). Another 90 neutral statements were selected, stating facts and ideas that were unrelated to beliefs in free will (e.g., "an ostrich's eye is bigger than its brain").

The combined 180 statements were then rated online (http://www.thesistools.com) by 38 participants, none of whom participated in the main experiment. Participants rated how difficult they would find the statement to recall, and the degree to which the statement was in line with either a disbelief in free will, a belief in scientific determinism, or a belief in fatalistic determinism. These questions were based on the factors laid out by Paulhus and Carey (2011) and were expressed in layman's terms for ease of understanding.

A total of 120 statements were selected based on the ratings drawn from this pre-test. The 20 statements that had received the highest ratings in each belief category were chosen, for a total of 60 experimental statements. Sixty neutral statements were matched for difficulty with these statements. Crucially, the experimental statements and the control statements did not differ with regard to their difficulty to recall (experimental: *M* = 1.59; neutral: *M* = 1.60), *t*(7) = 0.86, *p* = 0.82.

## *Questionnaires*

Two days prior to their participation in this study, participants completed an array of questionnaires concerning memory, anxiety, and free will beliefs. Questions about memory and anxiety were inserted to support the aforementioned cover story. Questions regarding free will beliefs consisted of the entire battery of the Free Will and Determinism questionnaire (FAD-Plus, Paulhus and Carey, 2011). Following the experimental session, participants were requested to complete the FAD-Plus questionnaire a second time to determine whether or not the experimental statements had an effect on the relevant belief system.

## **RESULTS**

## **MANIPULATION CHECK**

To test the effectiveness of the belief manipulation, a mixed design ANOVA was conducted on participants' total FAD-scores before and after the experiment using Time (Pre-test vs. Post-test) as a within-subject factor and Belief condition (Anti-free will vs. Control) as a between-subjects factor. Total FAD-scores were calculated for each participant such that higher values indicate less belief in free will, by reverse scoring the Free Will subscale and combining it with the other three subscales (Scientific Determinism, Fatalistic Determinism, and Unpredictability). The analysis revealed a significant interaction between Time and Belief Condition, *F*(1, 46) = 4.19, *p* < 0.05 (**Figure 2**), such that participants in the experimental condition scored significantly higher after the experiment than before (Post-test: *M* = 80.0, *SD* = 8.9; Pre-test: *M* = 76.3, *SD* = 8.5), *t*(23) = 3.23, *p* < 0.01, indicating a weakening of beliefs in free will. No such effect was observed for participants in the control condition (Post-test: *M* = 76.9, *SD* = 8.9; Pre-test: *M* = 76.6, *SD* = 9.4), *t*(23) = 0.29, *p* = 0.78.
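The scoring rule for the total FAD-score can be sketched as follows. This is only an illustration under stated assumptions: items are taken to be answered on a 1–5 scale, "combining" is taken to mean summing item responses, and the function name, subscale keys, and example responses are hypothetical.

```python
def fad_total(responses, scale_min=1, scale_max=5):
    """Sum all item responses after reverse scoring the Free Will subscale.

    `responses` maps a subscale name to a list of item responses.
    Higher totals then indicate less belief in free will.
    The 1-5 response range and the summing rule are assumptions.
    """
    total = 0
    for subscale, items in responses.items():
        if subscale == "free_will":
            # Reverse score: on a 1-5 scale, 5 becomes 1, 4 becomes 2, etc.
            items = [scale_min + scale_max - x for x in items]
        total += sum(items)
    return total


# Hypothetical, shortened response set for one participant.
example = {
    "free_will": [4, 5, 3],
    "scientific_determinism": [2, 3],
    "fatalistic_determinism": [1, 2],
    "unpredictability": [3, 4],
}
print(fad_total(example))
```

Reverse scoring only the Free Will items puts all four subscales on a common direction, so that a rise in the total from pre-test to post-test can be read as a weakening of free will beliefs.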

## **DATA PREPARATION**

Despite efforts toward optimizing the pain threshold procedure, the grand mean pain rating across participants was rather low (*M* = 4.6; *SD* = 1.11). Crucially, in the debriefing questionnaire, more than half (*N* = 26) of all participants stated that they had not needed to exert any effort to withhold the pain-withdrawal response during the experiment. As pain is a key factor in this experiment, we decided to restrict our analyses to participants that reported a sufficient level of pain throughout the whole of the experiment. We therefore excluded all participants with mean pain ratings lower than the median of the subjective pain scale, namely 4.5. All further analyses were performed on this subset of 25 "high pain" participants (8 male): 12 participants in the anti-free will condition and 13 participants in the control condition. Results for the excluded "low pain" participants may be found in Supplementary Material.

## **BEHAVIORAL ANALYSES**

Between-group means and standard deviations are reported in **Table 1**.

## *Reaction times*

On trials in which participants were cued to button press, participants performed the correct response in nearly all trials (*M* = 99%, *SD* = 2%). We expected anti-free will participants to be significantly slower than controls, particularly on choice trials. A mixed design ANOVA on RTs, with Instruction (Choice vs. Directed) as a within-subjects factor and Belief condition (Anti-free will vs. Control) as a between-subjects factor, revealed a main effect of Instruction, *F*(1, 23) = 79.310, *p* < 0.01, such that participants were slower to respond on choice trials (Choice: *M* = 807 ms, *SD* = 158 ms; Directed: *M* = 567 ms, *SD* = 108 ms), consistent with piloting and reflecting the time needed for a response decision. A main effect of Belief condition revealed a non-significant trend, *F*(1, 23) = 2.958, *p* = 0.099, indicating that anti-free will participants tended to be slower to respond than controls (though this interpretation should be approached with caution due to the marginal significance level). Further, the interaction between Instruction and Belief condition trended toward significance, *F*(1, 23) = 2.928, *p* = 0.10. Planned comparisons revealed an RT difference between anti-free will participants and controls on choice action trials, *t*(23) = −2.07, *p* < 0.05, Cohen's *d* = 0.84 (**Figure 3**), such that anti-free will participants were significantly slower to respond when given a choice than were controls. No such effect was found on directed action trials, *t*(23) = −0.69, *p* = 0.497, *d* = 0.27.

**FIGURE 2 | Mean total scores on the FAD-Plus questionnaire as a function of Belief condition (Control vs. Anti-free will) and Time (Pre-test vs. Post-test).** Higher scores indicate increased disbelief in free will.


values depicted are means and standard errors.

## *Correlation of FAD difference scores with choice reaction times*

To examine the relationship between participants' RTs and free will beliefs more thoroughly, we performed an additional correlation analysis. The aim of this analysis was to test to what extent the slowed responding on choice action trials was related to the effectiveness of the belief manipulation. To this end, we first computed each participant's change in anti-free will beliefs, across experimental conditions (control participants were included to ensure sufficient variability), by subtracting participants' pre-experimental scores on the anti-free will subscale of the FAD from their post-experimental scores. Second, we computed a difference score of participants' mean RTs on choice and directed action trials to create an index of each participant's decision time for pushing the button. There was a significant positive correlation between the two difference scores, *r*(23) = 0.40, *p* < 0.05 (**Figure 4**), reflecting that those subjects who showed a stronger reduction in free will beliefs were also slower to make the decision to press the button.
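A minimal sketch of this two-step analysis is given below. It follows the Figure 4 caption in defining the belief change as post-test minus pre-test and the decision index as choice RT minus directed RT. The function names and all data values are hypothetical and do not reproduce the study's data.

```python
from math import sqrt


def difference_scores(minuend, subtrahend):
    """Element-wise difference, e.g. post-test minus pre-test per participant."""
    return [a - b for a, b in zip(minuend, subtrahend)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five participants.
pre_scores = [20, 22, 19, 25, 21]
post_scores = [24, 22, 23, 26, 27]
choice_rt_ms = [850, 700, 820, 760, 900]
directed_rt_ms = [560, 540, 570, 580, 590]

belief_change = difference_scores(post_scores, pre_scores)
decision_index = difference_scores(choice_rt_ms, directed_rt_ms)
print(belief_change, decision_index, pearson_r(belief_change, decision_index))
```

With difference scores defined this way, a positive correlation means that participants whose anti-free will scores rose more also took longer to decide to press the button.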

## *Proportion of inhibition on choice trials*

On trials in which participants were cued to choose between acting and inhibiting, participants opted to inhibit in 41.47% of all trials (*SD* = 9.76%). The proportion of inhibition on choice trials was analyzed in an independent-samples *t*-test, revealing no significant difference between anti-free will participants and controls, *t*(23) = −0.462, *p* = 0.648. This lack of a difference between experimental groups, which is in contrast to the findings of Rigoni et al. (2012), may be due to the experimental design, which, unlike previous studies, discourages response biases by using an equal proportion of directed action and inhibition trials.

## **RATINGS**

#### *Pain ratings*

We began by computing pain ratings across all participants for the first and second halves of the experiment to ensure that participants did not adapt to the pain stimulation over the course of the experiment. No differences in pain ratings were observed between the trials of the first and the second half of the experiment (First half: *M* = 5.4, *SD* = 0.8; Second half: *M* = 5.5, *SD* = 0.8), *t*(24) = −0.58, *p* = 0.57.

**FIGURE 4 | Correlation of difference scores (post-test minus pre-test) on the anti-free will subscale of the FAD-Plus with the decision response time index (mean response times on choice minus directed trials).**

Participants reported a grand mean pain rating of 5.5 (*SD* = 0.74). Pain ratings were analyzed in a mixed design ANOVA using Belief condition as a between-subjects factor, and Response (Action vs. Inhibition) and Instruction (Directed vs. Choice) as within-subject factors. The main effect of Belief condition was not significant, *F*(1, 23) = 0.13, *p* = 0.73, reflecting that subjective pain across trials was equivalent for the two groups. However, there was a significant main effect of Response (Action: *M* = 5.3, *SD* = 0.2; Inhibition: *M* = 5.7, *SD* = 0.1), *F*(1, 23) = 12.60, *p* < 0.01, indicating higher perceived pain on inhibition compared with action trials, presumably due to the lengthier pain stimulation. Moreover, there was an interaction effect of Response × Instruction, *F*(1, 23) = 7.94, *p* = 0.01, reflecting that inhibition trials were rated as less painful when they were voluntarily chosen rather than instructed (Choice: *M* = 5.5, *SD* = 0.8; Directed: *M* = 5.8, *SD* = 0.6), *t*(24) = 3.38, *p* < 0.01, while there was no such difference between chosen and directed action trials (Choice: *M* = 5.4, *SD* = 1.0; Directed: *M* = 5.2, *SD* = 0.9), *t*(24) = −1.54, *p* = 0.14. Importantly, the lack of a difference between the mean pain ratings of anti-free will and control participants suggests that our findings are not solely due to differences in the overall subjective experience of pain.

## *Urge ratings*

Participants reported a grand mean urge rating of 4.5 (*SD* = 1.4). Urge ratings were analyzed with a mixed design ANOVA akin to that of the pain ratings. The analysis revealed a significant main effect of Response, reflecting greater urges on action trials (Action: *M* = 4.8, *SD* = 0.3; Inhibition: *M* = 4.2, *SD* = 0.3), *F*(1, 23) = 4.98, *p* < 0.05. There was also a significant interaction effect of Response × Instruction, *F*(1, 23) = 6.49, *p* < 0.05. Consistent with the pain ratings, participants reported a reduced urge on choice compared with directed inhibition trials (Choice: *M* = 4.0, *SD* = 1.6; Directed: *M* = 4.5, *SD* = 1.7), *t*(24) = 2.67, *p* < 0.05, while there was no such difference between choice and directed action trials (Choice: *M* = 5.0, *SD* = 1.4; Directed: *M* = 4.6, *SD* = 1.6), *t*(24) = −1.70, *p* = 0.10. The main effect of Belief condition was not significant, *F*(1, 23) = 0.10, *p* = 0.76. Crucially however, there was a significant interaction effect of Belief condition × Instruction, *F*(1, 23) = 6.22, *p* < 0.05. *Post-hoc t*-tests revealed that participants in the anti-free will condition tended to report a stronger urge to press on directed trials than on choice trials, *t*(11) = 2.044, *p* = 0.066, whereas this was not the case for control subjects, *t*(12) = −1.465, *p* = 0.17 (**Figure 5**). This may be indicative of a greater urge to act when externally instructed on the part of anti-free will participants. Similar results were obtained by Alquist et al. (2013), who found that anti-free will participants conformed more to external pressure.

#### **ADAPTIVE STRATEGIES ON CHOICE TRIALS**

Based on the hypothesis that anti-free will participants might lack adaptive strategies, we conducted an exploratory analysis in which we investigated whether preceding trial pain or trial type had an influence on response selection during choice trials. We assumed that high pain trials might create a strong incentive to "quit" when subsequently given a choice, thereby activating a strategy that is protective of the organism. Similarly, participants might attempt to create subjectively easier response sequences when granted the opportunity. These strategies would presumably only be present for control participants, as anti-free will participants tend to be less inclined to adjust their behavior to the present situation (Rigoni et al., 2013).

#### *Pain on preceding trial*

To investigate the influence of pain on subsequent choice behavior, we computed each participant's mean pain rating for the trials preceding choice inhibition and choice action trials. A mixed design ANOVA with factors of Belief condition (Anti-free will vs. Control) and Response (Choice Action vs. Choice Inhibition) was then conducted on mean pain rating for n-1 trials. The analysis revealed no main effects or interactions, *F*s < 0.838, *p*s > 0.36, indicating that pain ratings on the preceding trial did not differ between choice inhibition and choice action trials, for either experimental group. This would suggest that participants do not use recent pain as a factor in deciding whether to act or inhibit when given the choice.

#### *Response styles*

To investigate response styles, we computed mean proportions of inhibition during choice trials following each of the four trial types. A mixed design ANOVA with factors of Belief condition (Anti-free will vs. Control), n-1 Instruction (Choice vs. Directed), and n-1 Response (Action vs. Inhibition) was then conducted on mean proportion of inhibition in choice trials. This gave an index of how often participants chose to inhibit rather than act following a particular trial type (**Figure 6**). The analysis revealed a main effect of n-1 Instruction (Choice: *M* = 45.0% inhibition on subsequent choice trial; Directed: *M* = 38.7% inhibition on subsequent choice trial), *F*(1, 23) = 6.366, *p* < 0.05, such that participants tended to choose to inhibit more often following a choice trial. There was also a significant interaction between n-1 Instruction and n-1 Response, *F*(1, 23) = 11.460, *p* < 0.01, such that participants chose to inhibit more often following a choice action trial (*M* = 52.2%) than any other trial type (Choice Inhibit n-1 = 37.9%; Directed Action n-1 = 35.5%; Directed Inhibition n-1 = 41.7%), *t*s > 2.64, *p*s < 0.05. Furthermore, there was a non-significant trend toward an interaction between n-1 Response and Belief condition, *F*(1, 23) = 3.523, *p* = 0.07. Anti-free will participants tended to inhibit more often following an action trial (*M* = 48.0%) than an inhibition trial (*M* = 38.6%), *t*(11) = −2.164, *p* = 0.05, *d* = 0.63, whereas this was not the case for controls (Action n-1: *M* = 40.0%; Inhibition n-1: *M* = 40.8%), *t*(12) = 0.251, *p* = 0.806, *d* = 0.03. This may indicate a more explicit tendency to alternate in an attempt to satisfy the 50% choice instruction. Finally, *post-hoc t*-tests confirmed that the primary difference in proportion of inhibition between experimental groups lay in directed action n-1 trials. Control subjects chose to inhibit significantly less often than anti-free will participants following a directed action trial (Control: *M* = 29.7%; Anti-free will: *M* = 41.7%), *t*(23) = −2.490, *p* < 0.05, *d* = 0.99. This may be indicative of an additional adaptive strategy on the part of control participants, as response repetitions are subjectively less effortful than response switches.
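The sequential index used in this analysis, the proportion of inhibition on choice trials conditional on the preceding trial type, can be sketched as follows. The function name, the tuple layout, and the example sequence are hypothetical, and the sketch treats a single block; whether the first trial of a block was paired with the last trial of the previous block is not stated in the text.

```python
from collections import defaultdict


def inhibition_by_previous(trials):
    """Proportion of inhibition on choice trials, split by the n-1 trial type.

    `trials` is a list of (instruction, response) tuples in presentation
    order, with instruction in {"choice", "directed"} and response in
    {"action", "inhibition"}. Returns {n-1 trial type: proportion inhibited}.
    """
    counts = defaultdict(lambda: [0, 0])  # [times inhibited, choice trials]
    for previous, current in zip(trials, trials[1:]):
        instruction, response = current
        if instruction != "choice":
            continue  # only choice trials contribute to the index
        counts[previous][1] += 1
        if response == "inhibition":
            counts[previous][0] += 1
    return {prev: inhibited / total for prev, (inhibited, total) in counts.items()}


# Hypothetical trial sequence from one block.
block = [
    ("directed", "action"), ("choice", "inhibition"), ("choice", "action"),
    ("choice", "inhibition"), ("directed", "inhibition"), ("choice", "action"),
    ("directed", "action"), ("choice", "action"),
]
print(inhibition_by_previous(block))
```

Averaging these per-participant proportions within each n-1 trial type would give cell means of the kind entered into the ANOVA.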

### **DISCUSSION**

In the present study, we employed a novel experimental approach using thermal pain stimulation in order to demonstrate the moderating nature of high-level beliefs on self-regulation. In particular, we sought to probe whether reducing participants' belief in free will could lead to a form of intentional disengagement that influences selection and inhibition of action within a "hot" motivational system (Metcalfe and Mischel, 1999).

In line with our predictions, participants who were induced to disbelieve in free will were significantly slower to initiate a response on trials in which they chose to act in order to terminate the pain stimulation. This directly corresponds to the hypothesis that anti-free will participants would exhibit less intentional engagement. Interestingly, this effect is only evident when a pain avoidance response has to be executed internally rather than externally, suggesting not a global passivity, but rather a specific impairment in intentional self-regulation. This dissociation is in accordance with previous evidence that intentional and stimulus-driven actions rely on distinct functional (Herwig et al., 2007) and neural (Müller et al., 2007) mechanisms. The amount of slowing on choice action trials was furthermore correlated with the degree of the effectiveness of the belief manipulation, suggesting a direct link between the weakening of free will beliefs and the voluntary management of a behavioral response to an aversive stimulus. This mirrors the finding by Rigoni et al. (2011) in which decreases in the readiness potential were correlated with a change in anti-free will scores.

Moreover, anti-free will participants reported greater urges to terminate the trial when their behavior was guided by the cue compared to when they were able to freely choose, suggesting a disengagement from the task when externally instructed. Importantly, and in contrast with previous studies (e.g., Kühn et al., 2009; Rigoni et al., 2012), the aforementioned differences are not confounded by differential response biases, as the proportion of inhibition in choice trials was equivalent between control and anti-free will participants.

Our analysis of potentially adaptive strategies revealed surprising results. Participants do not appear to use recent pain as a criterion in deciding whether to act or inhibit when given the choice. However, we do find differences between the experimental groups in terms of their response styles. Interpretations are merely speculative at this point, but it seems plausible that this effect could be related to minimizing cognitive effort (e.g., Kool et al., 2010). For instance, one could suppose that control participants select a subjectively easier strategy when exhibiting a bias to repeat an action response (e.g., Mayr and Bell, 2006). On the other hand, one could interpret the anti-free will participants as selecting the less effortful strategy, by avoiding two (subjectively more painful) inhibition trials in a row (e.g., law of least effort, Hull, 1943). In the future, this could be disentangled by presenting blocks composed solely of choice trials in order to determine, via longer choice trial sequences, which is the favored strategy: response repetitions or avoidance of effortful combinations.

Taken together, the present study supports and extends previous research on intentional inhibition (Brass and Haggard, 2007; Kühn et al., 2009; Filevich et al., 2012; Rigoni et al., 2012). In particular, it is the first to investigate voluntary inhibition of behavior in an ecologically valid experimental setting that involves hot motivational systems rather than entirely arbitrary choices. Participants reported less pain and a reduced urge to terminate the trial on choice inhibition trials compared with directed inhibition trials, while choice and directed press trials were more comparable. Thus, the pain paradigm we introduce offers an effective way to dissociate between voluntary and instructed inhibition on a behavioral level, which opens the door to new ways of investigating inhibition in which behaviorally-relevant options are available to the participant.

That being said, as this study served as a first pilot of a novel paradigm, our investigation must be seen as exploratory in nature, and our conclusions considered accordingly. The exclusion of participants who did not experience sufficient pain levels is an unfortunate limitation of the present line of research (Supplementary Material includes a summary of the excluded participants' results, for a comprehensive overview of our findings). Future studies should endeavor to ensure that a sufficient pain tolerance threshold is obtained for each participant, or that unsuitable participants are excluded in advance of testing. This may require rigorous pre-testing of criteria such as whether participants are able to reliably report their tolerance thresholds, and whether or not they adapt too quickly to pain over the course of the experiment.

On a larger scale, the observed effects also exemplify a growing body of research that reveals the influence of higher-order beliefs and metacognitions on behavioral control. As discussed earlier, determinist beliefs have been shown to have an effect on prosocial behavior (Vohs and Schooler, 2008; Baumeister et al., 2009, 2011), basic motor and cognitive processes (Rigoni et al., 2011, 2013), intentional inhibition (Rigoni et al., 2012), and now on self-regulation of a "hot" incentive response system (Morsella, 2005). Yet free will beliefs are not the only higher-order cognitions capable of influencing a variety of processes underlying behavioral control.

For instance, one factor that has been proposed to have a strong influence on self-control is "ego depletion," or the phenomenon in which exertion of self-control exhausts a common regulatory resource, leading to hindered performance on subsequent tasks (Muraven et al., 1998; Vohs et al., 2008; Baumeister et al., 2009; Hagger et al., 2010). However, recent research has revealed that participants' relevant belief systems are likely to be more crucial than actual depletion when it comes to self-regulatory capacity. For instance, Job et al. (2010) demonstrated that only participants who thought of willpower as a limited resource demonstrated the typical pattern of ego depletion, while the effect was completely absent in participants who lacked this conviction. Similarly, Clarkson et al. (2010) found that regardless of how depleted participants actually were, if they perceived themselves as less depleted, they failed to demonstrate ego depletion effects during subsequent task performance (see also Vohs et al., 2012).

These observations indicate that beliefs regarding regulatory resources are distinct from the resources themselves, and can impact task performance independently. The present study complements this line of research. There is little incentive for engagement in self-control under the assumption that behavior is fully determined, and in this way free will beliefs are able to influence the decision to exert regulatory effort. Accordingly, assumptions about the existence of free will can be considered as operating in parallel with beliefs about regulatory capacities. The former speaks to one's motivation to engage in self-regulation, while the latter informs one's available resources for self-control. Moreover, while the aforementioned ego depletion studies have examined task-relevant beliefs as stable traits, here we demonstrate the relevance of lay beliefs more directly by manipulating them experimentally. Our findings therefore indicate that the impact of higher-order beliefs on self-regulatory engagement is not limited to stable, trait-like effects, but that even subtle state-like fluctuations in the strength of beliefs can affect the amount of effort that people invest in self-control. A fundamental belief in control over one's actions may therefore prove to be an integral prerequisite for self-regulatory investments. Future studies should more directly investigate the mechanisms by which higher-order beliefs impact the recruitment of self-control.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Cognition/10.3389/fpsyg.2013.00614/abstract

## **REFERENCES**

*Pers. Soc. Psychol. Bull.* 35, 260–268.

self-regulatory behavior. *J. Pers. Soc. Psychol.* 98, 29–46. doi: 10.1037/a0017539

sensorimotor integration in intention-based and stimulus-based actions. *Q. J. Exp. Psychol.* 60, 1540–1554. doi: 10.1080/17470210601119134

the "veto-area" exerts control. *Hum. Brain Mapp.* 30, 2834–2843. doi: 10.1002/hbm.20711


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 April 2013; accepted: 21 August 2013; published online: 23 September 2013.*

*Citation: Lynn MT, Van Dessel P and Brass M (2013) The influence of high-level beliefs on self-regulatory engagement: evidence from thermal pain stimulation. Front. Psychol. 4:614. doi: 10.3389/fpsyg.2013.00614*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Lynn, Van Dessel and Brass. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Desynchronization and rebound of beta oscillations during conscious and unconscious local neuronal processing in the macaque lateral prefrontal cortex

#### *Theofanis I. Panagiotaropoulos <sup>1</sup> \*, Vishal Kapoor <sup>1</sup> and Nikos K. Logothetis <sup>1,2</sup>*

*<sup>1</sup> Department of Physiology of Cognitive Processes, Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany <sup>2</sup> Division of Imaging Science and Biomedical Engineering, University of Manchester, Manchester, UK*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA Bernhard Hommel, Leiden University, Netherlands*

#### *\*Correspondence:*

*Theofanis I. Panagiotaropoulos, Department of Physiology of Cognitive Processes, Max-Planck-Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany e-mail: theofanis.panagiotaropoulos@tuebingen.mpg.de*

Accumulating evidence indicates that control mechanisms are not tightly bound to conscious perception since both conscious and unconscious information can trigger control processes, probably through the activation of higher-order association areas like the prefrontal cortex. Studying the modulation of control-related prefrontal signals at a microscopic, neuronal level during conscious and unconscious neuronal processing, and under control-free conditions, could provide an elementary understanding of these interactions. Here we performed extracellular electrophysiological recordings in the macaque lateral prefrontal cortex (LPFC) during monocular physical alternation (PA) and binocular flash suppression (BFS) and studied the local scale relationship between beta (15–30 Hz) oscillations, a rhythmic signal believed to reflect the current sensory, motor, or cognitive state (status-quo), and conscious or unconscious neuronal processing. First, we show that beta oscillations are observed in the LPFC during resting state. Both PA and BFS had a strong impact on the power of this spontaneous rhythm with the modulation pattern of beta power being identical across these two conditions. Specifically, both perceptual dominance and suppression of local neuronal populations in BFS were accompanied by a transient beta desynchronization followed by beta activity rebound, a pattern also observed when perception occurred without any underlying visual competition in PA. These results indicate that under control-free conditions, at least one rhythmic signal known to reflect control processes in the LPFC (i.e., beta oscillations) is not obstructed by local neuronal, and accordingly perceptual, suppression, thus being independent from temporally co-existing conscious and unconscious local neuronal representations. Future studies could reveal the additive effects of motor or cognitive control demands on prefrontal beta oscillations during conscious and unconscious processing.

**Keywords: beta oscillations, control, prefrontal cortex, consciousness**

## **INTRODUCTION**

According to a traditionally held view suggesting that control functions are bound to consciousness (Norman and Shallice, 1986), it is reasonable to assume that conscious perception of sensory cues is a prerequisite for their integration into a control function. However, more recently, there is accumulating evidence that control of action is functionally distinct from consciousness since it can be affected by subliminal, unconscious information processing of masked stimuli. Specifically, control functions like response inhibition (van Gaal et al., 2008, 2010), task-set preparation, conflict detection, motivation, and error detection can be initiated by unconscious stimuli (for a thorough review see van Gaal and Lamme, 2012; van Gaal et al., 2012). Although in general the impact of these subliminal stimuli on control is rather small compared to conscious signals, the observed effects suggest that control processes are not strictly conscious but can be detected across a wide spectrum of conscious and unconscious processing. These observations suggest that control and consciousness are, to a considerable degree, separable functions (Hommel, 2007, 2013; van Gaal et al., 2012) and therefore a similar dissociation should be expected for their respective neuronal correlates.

In this context, it was recently examined whether physiological signals related to control are observed not only when a visual stimulus is consciously perceived but also during its visual masking, a manipulation that renders the stimulus invisible. Indeed, electroencephalography (EEG) signals associated with inhibitory control, like the N2 event-related potential (ERP) component, were detected for both masked and unmasked stop stimuli, suggesting that the neural mechanism of inhibitory control can be dissociated from consciousness (van Gaal et al., 2010). The source of the N2 ERP component has a frontal origin (van Gaal et al., 2008), which is in accordance with the activation of inferior frontal gyrus during unconscious inhibitory control and other control-related tasks affected by unconscious information as determined by functional magnetic resonance imaging (fMRI) or intracranial EEG (Berns et al., 1997; Stephan et al., 2002; Lau and Passingham, 2007; van Gaal et al., 2010).

Another electrophysiological signal strongly associated with control functions is oscillatory synchronization in the beta frequency range (∼15–40 Hz). In particular, beta oscillations in the somatosensory, motor, and frontal cortices reflect different aspects of sensory, motor, and cognitive processing and control. Specifically, processing of visual cues as well as different phases of a motor sequence have been shown to exert a strong impact on the power of beta oscillations in the frontal, premotor, motor, and sensory cortex (for a review see Kilavik et al., 2013). The most striking effect is an initial beta desynchronization (i.e., decrease in beta power) following stimulus onset or voluntary motor behavior that is followed by a beta activity rebound during unchanged stimulus input or steady contractions and holding periods (Sanes and Donoghue, 1993; Pfurtscheller et al., 1996; Donoghue et al., 1998; Baker et al., 1999; Gilbertson et al., 2005; Jurkiewicz et al., 2006; O'Leary and Hatsopoulos, 2006; Baker, 2007; Siegel et al., 2009; Engel and Fries, 2010; Puig and Miller, 2012; Kilavik et al., 2013). Although the functional significance of these stereotypical modulations remains largely elusive, the dominance of beta band activity during such "no-change," resting state-like periods led recently to the suggestion that beta oscillations could reflect an active process that supports the maintenance of the current sensory, motor, or cognitive set (Gilbertson et al., 2005; Pogosyan et al., 2009; Swann et al., 2009; Engel and Fries, 2010). Interestingly, this hypothesis is supported by clinical observations showing that the power of beta oscillations is abnormally high in cortical and subcortical structures of patients suffering from Parkinson's disease (PD; Marsden et al., 2001; Brown, 2007; Chen et al., 2007; Hammond et al., 2007). The accompanying disruption of motor function and control observed in PD suggests that pathologically enhanced beta oscillations could mediate reduced flexibility and a pathological maintenance of the current sensory and motor state. These results combined with findings directly involving prefrontal beta activity in cognitive control (Buschman and Miller, 2007, 2009; Buschman et al., 2012) indicate that beta oscillations could be related to both basic and higher-order control processes across sensory, cognitive, and motor domains (Engel and Fries, 2010).

Despite the wealth of information on the role of beta oscillations in control, it is currently unknown how beta is affected by conscious or unconscious processing, particularly in cortical areas like the prefrontal cortex, which is heavily involved in control. To resolve this issue, we examined the temporal dynamics of beta oscillatory power in the lateral prefrontal cortex (LPFC) during conscious and unconscious stimulus processing using binocular flash suppression (BFS), a paradigm of rivalrous visual stimulation that dissociates conscious perception from purely sensory stimulation, and compared it with the respective dynamics during monocular physical alternation (PA) of the same visual patterns. In a previous study, we demonstrated that local spiking activity in the LPFC correlates with conscious and unconscious processing (Panagiotaropoulos et al., 2012). That is, neuronal discharges increase when a preferred stimulus is consciously perceived and decrease when the preferred stimulus is perceptually suppressed. Here, we examined in detail the modulation of beta oscillations in these prefrontal sites where locally recorded spiking activity reflects conscious or unconscious processing.

Our results show that the power modulation of beta oscillations under control-free conditions follows the same temporal dynamics during monocular, purely sensory stimulus transitions (i.e., without any underlying stimulus competition) and perceptual transitions involving rivalry that result in the suppression of a competing stimulus. Therefore, the temporal dynamics of prefrontal beta oscillatory power following perceptual transitions appear not to be influenced by the presence of a competing but perceptually suppressed stimulus. Most interestingly, in prefrontal sites where spiking activity followed the perceptual dominance or suppression of a preferred stimulus, beta power was modulated in a non-specific manner regardless of dominance or suppression.

These findings indicate that the stimulus-induced modulation of beta oscillatory power in the LPFC under control-free conditions could reflect a general purpose process, not bound to neuronal—and therefore perceptual—dominance or suppression, but rather indicating transitions in visual perception. We suggest that prefrontal beta oscillations could reflect an elementary process that represents the maintenance or change in the current visual sensory state, independent of stimulus awareness.

## **MATERIALS AND METHODS**

## **ELECTROPHYSIOLOGICAL DATA COLLECTION AND STIMULUS PRESENTATION**

The cranial headpost, scleral eye coil, and recording chambers were implanted in two monkeys under general anesthesia using aseptic and sterile conditions. The recording chambers (18 mm in diameter) were centered stereotaxically above the LPFC (covering mainly the ventrolateral inferior convexity of the LPFC) based on high-resolution MR anatomical images collected in a vertical 4.7 T scanner with a 40-cm-diameter bore (Biospec 47/40c; Bruker Medical, Ettlingen, Germany).

We used custom-made tetrodes made from Nichrome wire and electroplated with gold, with impedances below 1 MΩ. Local field potential (LFP) signals were recorded by analog band-pass filtering of the raw voltage signal (high-pass at 1 Hz and low-pass at 475 Hz) and digitized at 2 kHz (12 bits). Multi-unit spiking activity (MUA) was defined as the events detected in the high-pass analog filtered signal (0.6–6 kHz) that exceeded a predefined threshold (typically, 25 µV) on any tetrode channel. The 0.6–6 kHz recorded signal was sampled and digitized at 32 kHz (12 bits). The recorded signals were stored using the Cheetah data acquisition system (Neuralynx, Tucson, AZ, USA). Eye movements were monitored online and stored for offline analysis using the QNX-based acquisition system (QNX Software Systems Ltd.) and Neuralynx. Visual stimuli were displayed using a dedicated graphics workstation (TDZ 2000; Intergraph Systems, Huntsville, AL, USA) with a resolution of 1280 × 1024 and a 60 Hz refresh rate, running an OpenGL-based stimulation program. All procedures were approved by the local authorities (Regierungspräsidium Tübingen, Tübingen, Germany) and were in full compliance with the guidelines of the European Community (EUVD 86/609/EEC) for the care and use of laboratory animals.
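As an illustration of the MUA definition above, the following minimal Python sketch marks an event wherever the high-pass filtered signal on any tetrode channel rises above a fixed threshold. The function name `detect_mua_events`, the `(n_channels, n_samples)` array layout, and the use of positive-going crossings are assumptions made for illustration only; the acquisition system used in the study may have applied different detection rules.

```python
import numpy as np


def detect_mua_events(highpass_uv, threshold_uv=25.0):
    """Return sample indices of threshold crossings on any channel.

    highpass_uv : array of shape (n_channels, n_samples), in microvolts,
        assumed to be already filtered to the 0.6-6 kHz band.
    threshold_uv : detection threshold in microvolts (25 uV is the
        typical value quoted in the text).
    """
    highpass_uv = np.atleast_2d(np.asarray(highpass_uv, dtype=float))
    # True wherever at least one tetrode channel exceeds the threshold.
    any_above = (highpass_uv > threshold_uv).any(axis=0)
    # An event is the first sample of each supra-threshold run.
    onsets = np.flatnonzero(any_above[1:] & ~any_above[:-1]) + 1
    if any_above[0]:
        onsets = np.concatenate(([0], onsets))
    return onsets
```

Dividing the returned sample indices by the 32 kHz sampling rate would give event times in seconds.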

### **BEHAVIORAL TASK AND LFP ANALYSIS**

We performed extracellular electrophysiological recordings in the LPFC of two macaque monkeys during (a) monocular PA and (b) BFS, a well-controlled version of rivalrous visual stimulation that allowed us to induce robust perceptual dominance and suppression for a duration of 1000 ms. Although the task used in this study had no behavioral conditions in which control was explicitly examined, it nevertheless allowed us to observe the local cortical interactions between distinct neurophysiological signals related to control and consciousness, during conditions that elicited subjective perceptual dominance and suppression. Specifically, in a previous study we identified LPFC sites where the summed neuronal discharges and gamma oscillations followed the perceptual dominance or suppression of a preferred stimulus (Panagiotaropoulos et al., 2012). Here we reexamined the temporal modulation of LFPs from the same recording sites to determine the influence of conscious perception on oscillatory activity, with a special focus on the beta frequency range (15–30 Hz), the frequency band that is involved in the maintenance or disruption of sensory or motor status quo (Engel and Fries, 2010) and cognitive control (Buschman et al., 2012).

Before the beginning of each recorded data set, a battery of visual stimuli was presented, and, based on the MUA response, a preferred stimulus that could drive local neuronal activity better was contrasted to a non-preferred stimulus that induced less robust responses. Visual stimuli were foveally presented with a typical size of 2–3°. In both BFS and PA trials, a fixation spot (size, 0.2°; fixation window, ±1°) was presented for 300 ms (*t* = 0–300 ms), followed by the same visual pattern to one eye (*t* = 301–1300 ms). In BFS trials (**Figures 1C,D** "BFS"), 1 s after stimulus onset, a disparate visual pattern was suddenly flashed to the corresponding part of the contralateral eye. The flashed stimulus remained on for 1000 ms (*t* = 1301–2300 ms), robustly suppressing the perception of the contralaterally presented visual pattern, which was still physically present. In the PA trials (**Figures 1A,B** "PA"), the same visual patterns were physically alternating between the two eyes, resulting in a visual percept identical to that in the BFS condition but this time without any underlying visual competition. At the end of each trial and after a brief, stimulus-free fixation period (100–300 ms), a drop of juice was used as a reward for maintaining fixation. The efficiency of BFS to induce perceptual suppression was tested in a different monkey that was trained to report PA and BFS by pulling levers for the two different stimuli used in our recordings (Panagiotaropoulos et al., 2012). PA and BFS conditions were pseudorandomized and allowed us to record from perceptually dominant and suppressed populations by changing the order of presentation of the two disparate stimuli (**Figure 1**). Binocular stimulation was achieved through the use of a stereoscope.

**FIGURE 1 |** […] preferred visual stimulus. In **(B)** the order of visual stimulation is reversed. These PA conditions allowed us to study neurophysiological responses during purely sensory stimulation without any underlying competition. In **(C)** the non-preferred stimulus is suppressed by the presentation of a preferred visual pattern while in **(D)** the preferred pattern is suppressed due to a flash […] suppression of a local population. Therefore, BFS allowed us to study conscious and unconscious processing of a visual stimulus. Stimulus preference was determined by comparing the local population discharge response to the two stimuli used in **(A)** and **(B)** between *t* = 1301–2300 ms (see also Panagiotaropoulos et al., 2012).

The baseline preference of MUA activity was determined in the control, PA trials, where perception of a preferred or a non-preferred pattern occurred without any underlying stimulus competition (**Figures 1A,B**). In BFS, a monocularly presented preferred or non-preferred stimulus was perceptually suppressed by the presentation ("flash") of a disparate visual pattern in the contralateral eye for at least 1000 milliseconds (Wolfe, 1984; Panagiotaropoulos et al., 2012). By changing the order of visual stimulus presentation in half of the trials, it was possible to discern between the perceptual suppression of a preferred and a non-preferred visual stimulus (**Figures 1C,D**). A contrastive analysis that compared neuronal activity during BFS (where visual rivalry occurred) with the respective activity during PA (thus without any underlying competition) was used to distill the consciousness-related neuronal correlates (Panagiotaropoulos et al., 2012).

**FIGURE 2 | (A)** Power spectrum of resting-state activity in 45 recorded sites sorted according to the power magnitude at 22 Hz. All sites exhibit a prominent peak (black arrow) in the beta frequency range (approximately between 15 and 30 Hz). **(B)** Mean power spectrum ± s.e.m. during resting-state activity across the 45 recorded sites presented in **(A)**. Note a bump (black arrow) in the mean power spectrum in the beta range. The peak at 50 Hz is due to power line noise.

In this study we analyzed LFP signals from sites where we recorded spontaneous, resting-state activity as well as from local prefrontal sites that exhibited significant stimulus preference (Panagiotaropoulos et al., 2012). We binned the long spontaneous activity recordings (lasting approximately 10–30 min) in windows of 1000 ms duration. The power spectral density (PSD) of the raw LFP signals for long, spontaneous activity recordings (**Figure 2**) was estimated using the multitaper method (Thomson, 1982) for narrow frequency bins of 1 Hz and for each 1000 ms window. This method uses linear or non-linear combinations of modified periodograms to estimate the PSD. These periodograms are computed using a sequence of orthogonal tapers (windows in the frequency domain) specified from the discrete prolate spheroidal sequences. For each dataset we averaged the spectra across all time windows. Time-frequency analysis during PA and BFS (**Figure 5**) was performed by computing a spectrogram of the power spectral density in each trial using overlapping (94%) 256 ms windows and then averaging across all trials for the same condition. In **Figure 6** a Hilbert transform of the beta band-limited signal in each trial was used to extract the band-limited LFP envelope between 15 and 30 Hz. The mean envelope was averaged across trials and across conditions for each dataset. Digital filters were constructed via the Parks–McClellan optimal equiripple FIR filter design to obtain the beta (15–30 Hz) band-limited LFP signal. The LFP data presented here are from the same sites where local spiking activity was previously found to exhibit significant selectivity during PA (Panagiotaropoulos et al., 2012).
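The resting-state spectral estimate described in this section (1000 ms windows, multitaper PSD, average across windows) can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the analysis code used in the study: the function names, the time-bandwidth product `nw=2.5`, the use of four tapers, and the simple unweighted average of eigenspectra (rather than an adaptive, non-linear combination) are all assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_psd(x, fs, nw=2.5, n_tapers=4):
    """One-sided multitaper PSD of a 1-D signal (unweighted taper average)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Discrete prolate spheroidal (Slepian) tapers, each with unit energy.
    tapers = dpss(n, nw, Kmax=n_tapers, norm=2)
    # One modified periodogram ("eigenspectrum") per taper.
    eigenspectra = np.abs(np.fft.rfft(tapers * (x - x.mean()), axis=1)) ** 2
    psd = eigenspectra.mean(axis=0) / fs
    # Fold negative frequencies into the one-sided estimate.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


def mean_window_psd(lfp, fs=2000.0, win_s=1.0, **kwargs):
    """Cut a long recording into non-overlapping windows and average the PSDs."""
    lfp = np.asarray(lfp, dtype=float)
    n_win = int(round(win_s * fs))
    n_segments = lfp.size // n_win
    windows = lfp[: n_segments * n_win].reshape(n_segments, n_win)
    spectra = [multitaper_psd(w, fs, **kwargs)[1] for w in windows]
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    return freqs, np.mean(spectra, axis=0)
```

With 1000 ms windows the frequency axis has 1 Hz spacing, matching the bin width quoted in the text; the effective spectral smoothing is wider and depends on the assumed `nw`.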

## **RESULTS**

Initially, we established that beta oscillations reflect a dominant oscillatory rhythm in the LPFC during resting state. We recorded long (approximately 10–30 min) periods of spontaneous, resting state activity during which the awake macaques could keep their eyes open or closed. As depicted in **Figure 2**, the mean power spectrum of spontaneous oscillatory activity in all (*n* = 45) LPFC recorded sites is characterized by a prominent peak in the beta frequency range, between 15 and 30 Hz. Since such peaks or bumps in the LFP power spectrum are indicative of dominant, frequency-specific, intrinsic rhythmic activity, these results show that beta oscillations represent a dominant resting-state rhythm in the LPFC.

We analyzed how the power of this spontaneously occurring prefrontal rhythm is modulated during purely sensory visual stimulation in PA, in recorded sites where spiking activity showed a significant preference for one of the two stimuli used in each dataset. In our previous study (Panagiotaropoulos et al., 2012) we found that despite significant spiking selectivity the power of low frequency oscillations averaged over 1 s of visual stimulation in the same local sites was not selective, showing no stimulus preference. However, when we reexamined our LFP data we observed that high amplitude low frequency oscillations detected in the broadband LFP signal were consistently modulated across trials, exhibiting signs of desynchronization (i.e., reduction in power) and rebound activity during the presence of visual stimulation (example trials from a typical LPFC recording site are depicted in **Figure 3**). We performed a Hilbert transform on the recorded LFP signal for each trial and extracted the band-limited oscillations in the beta frequency range (15–30 Hz). For all conditions we observed periods of abrupt desynchronization following both initial visual stimulation (*t* = 301–1300 ms) and a change in the visual input (*t* = 1301–2300 ms) that were replaced by a rebound of oscillatory activity (**Figure 4**). We captured a qualitative representation of beta modulation across conditions by computing the time-frequency spectrogram for each trial and then averaging across trials for each recording site and finally across sites for each condition. The averaged spectrograms show that beta oscillations were dynamically modulated during visual stimulation regardless of the co-existing stimulus preference exhibited by the averaged spiking activity (**Figure 5**).
Specifically, in PA trials where visual stimulation started with the presentation of a non-preferred (by the local spiking activity) pattern that was followed by a preferred one (**Figure 5A**), beta oscillations were desynchronized immediately after the initiation of fixation and then a rebound of synchronous activity was observed until the first, non-preferred, stimulus was presented (*t* = 0–300 ms). The presentation of the non-preferred stimulus resulted in a new decrease in beta power until ∼400 ms following the onset of visual stimulation, when a rebound in the power of beta oscillatory activity appeared (*t* = 301–1300 ms). Following a monocular stimulus alternation (i.e., removal of the first stimulus and stimulation of the contralateral eye with a disparate pattern), beta oscillations were modulated again (*t* = 1301–2300 ms). Specifically, the presentation of the preferred (as determined by spiking activity) stimulus in the contralateral eye resulted in a new round of desynchronization followed by beta rebound activity after ∼400 ms. As expected, due to the absence of any obvious selectivity in beta power, the same pattern of beta power modulation was also observed in the PA condition when a non-preferred (by the spiking activity) pattern followed the monocular presentation of a preferred pattern (**Figure 5B**). The initial desynchronization following the first stimulus presentation and monocular switch was followed by a beta power rebound. This result demonstrates that at a local prefrontal level, in sites where spiking activity exhibits stimulus preference, beta oscillations are dynamically modulated regardless of stimulus preference when perception occurs without any underlying visual competition.
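The trial-averaged time-frequency maps referred to above can be approximated with `scipy.signal.spectrogram`. The 256 ms window and 94% overlap follow the Methods; the Hann window, the default 2 kHz sampling rate argument, and the function name `trial_averaged_spectrogram` are assumptions made for this sketch, since the text does not specify the window type or implementation.

```python
import numpy as np
from scipy.signal import spectrogram


def trial_averaged_spectrogram(trials, fs=2000.0, win_ms=256.0, overlap=0.94):
    """Average per-trial spectrograms for one condition.

    trials : array of shape (n_trials, n_samples) holding the LFP of each
        trial, aligned to the same event.
    Returns (freqs, times, mean_power) with mean_power of shape
    (n_freqs, n_times).
    """
    trials = np.atleast_2d(np.asarray(trials, dtype=float))
    nperseg = int(round(win_ms * 1e-3 * fs))   # 512 samples at 2 kHz
    noverlap = int(overlap * nperseg)          # 481 samples for 94% overlap
    freqs, times, power = spectrogram(
        trials,
        fs=fs,
        window="hann",        # assumed; the window type is not stated
        nperseg=nperseg,
        noverlap=noverlap,
        scaling="density",
        axis=-1,
    )
    # power has shape (n_trials, n_freqs, n_times); average over trials.
    return freqs, times, power.mean(axis=0)
```

Averaging the returned maps across recording sites, condition by condition, would mirror the two-stage averaging described in the text.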

However, the PA condition provides no information about the modulation of beta oscillations when local spiking activity reflects conscious perception or perceptual suppression. Therefore, we determined the influence of conscious perception or perceptual suppression in beta power modulation during BFS trials that involved visual competition. As depicted in the averaged time-frequency plot in **Figure 5C**, when a preferred stimulus suppressed the initially presented non-preferred visual pattern (*t* = 1301–2300 ms) the power of beta oscillations showed the same modulation pattern (initial desynchronization followed by a beta rebound) as when a preferred stimulus was perceived without competition in PA (**Figure 5A**). Most interestingly, the same desynchronization followed by beta activity rebound was also observed when the local population signaling the preferred stimulus was suppressed by the presence of a non-preferred visual pattern (**Figure 5D**). This result indicates that beta oscillations are visually modulated regardless of the simultaneously recorded local spiking activity that may be dominant or suppressed. Finally, in both PA and BFS trials, the inter-trial period, during which eye movements were free and the animals were allowed to fixate anywhere or have their eyes closed, resulted in the reestablishment of beta oscillations and high beta power, similar to the activity detected during long, resting-state activity recordings.

**FIGURE 4 | Band-limited LFP signal (15–30 Hz) of the raw LFP signals presented in Figure 3.** Beta oscillations are suppressed for all conditions during visual stimulation without any obvious relationship to stimulus preference for both PA (**A** and **B**) and BFS (**C** and **D**). Beta oscillations are particularly prominent during the inter-trial period.

We quantified the effects qualitatively described in the time-frequency plots by plotting the mean envelope of the beta band (15–30 Hz)-filtered signal in PA and BFS. In **Figure 6A**, visual stimulation without perceptual competition (PA) initially results in beta power reduction followed by a rebound of oscillatory activity regardless of neuronal stimulus preference. Exactly the same pattern can be observed in **Figure 6B** for BFS. In this condition that employs visual rivalry between a preferred and a non-preferred stimulus during *t* = 1301–2300 ms, beta oscillations recorded when the spiking activity of local neuronal populations is suppressed exhibit the same desynchronization and rebound effect that is observed when the same population is dominant. During the inter-trial period the power of beta oscillations is significantly higher compared to the period of visual stimulation.
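The mean beta envelope used for this quantification combines an equiripple (Parks–McClellan) band-pass filter with a Hilbert transform, as stated in the Methods. The sketch below shows one way to compute it with SciPy. The filter length (`numtaps=301`), the 10 Hz transition bands, the zero-phase application via `filtfilt`, and the function name `beta_envelope` are assumptions for illustration; the paper does not report these details.

```python
import numpy as np
from scipy.signal import remez, filtfilt, hilbert


def beta_envelope(trials, fs=2000.0, band=(15.0, 30.0),
                  transition=10.0, numtaps=301):
    """Mean beta-band amplitude envelope across trials.

    trials : array of shape (n_trials, n_samples); each trial must be
        longer than 3 * numtaps samples for filtfilt's edge padding.
    """
    trials = np.atleast_2d(np.asarray(trials, dtype=float))
    lo, hi = band
    # Stop / pass / stop band edges in Hz for the equiripple design.
    edges = [0.0, lo - transition, lo, hi, hi + transition, fs / 2.0]
    taps = remez(numtaps, edges, [0.0, 1.0, 0.0], fs=fs)
    # Forward-backward filtering avoids a phase delay (assumed choice).
    filtered = filtfilt(taps, [1.0], trials, axis=-1)
    # Magnitude of the analytic signal gives the instantaneous amplitude.
    envelope = np.abs(hilbert(filtered, axis=-1))
    return envelope.mean(axis=0)
```

Calling `beta_envelope` separately on the trials of each condition yields curves comparable in kind to those plotted per condition, up to the assumed filter settings.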

These results indicate that visual competition (during BFS) has no effect on the modulation pattern of beta oscillations in the LPFC observed during purely sensory stimulation (during PA). Most importantly, based on the absence of any indication of stimulus selectivity in the power of beta oscillations in sites where spiking activity is selective during visual rivalry, we can infer that at least two neurophysiological signals related to consciousness (local spiking activity) and control (beta oscillations) follow discrete modulation patterns at a local prefrontal level. Even when a preferred stimulus becomes suppressed during rivalrous stimulation and the local neuronal populations are not responsive, beta oscillations recorded from the same non-responsive area undergo the same desynchronization and rebound of activity as when the local population becomes perceptually dominant. These results establish a baseline condition for the modulation of beta oscillations during conscious and unconscious processing that could be exploited by future studies in which both conscious perception and control demands are modulated during a task. We show that a control-related signal (i.e., beta oscillations) is non-specifically modulated by visual stimulation and, most importantly, this modulation is not influenced by the dominance or suppression of spiking activity during rivalrous visual stimulation. Therefore, beta oscillatory power in the LPFC could reflect a general purpose mechanism that is not related to conscious perception *per se* but rather indicates transitions and stability in visual perception.

**FIGURE 6 | Mean envelope (15–30 Hz) across trials and recorded sites for PA (A) and BFS (B).** In PA there is no difference in the modulation of beta power between a switch from a preferred to a non-preferred (red curve) and a switch from a non-preferred to a preferred (blue curve) visual stimulus. Stimulus-induced desynchronization (black arrows) followed by a beta rebound is observed in both cases. The same pattern is observed during BFS **(B)**. Note that in BFS from *t* = 1301–2300 ms there are no differences in beta power when the recorded neuronal population as well as the preferred pattern is dominant (blue) or suppressed (red).

## **DISCUSSION**

## **CONTROL AND CONSCIOUSNESS IN THE PFC**

Executive or cognitive control functions define a large set of higher-order mental operations that organize, initiate, monitor, and act on goal-directed behavior in a flexible manner. Historically, the dependence of these executive operations on perceptual awareness generated a great deal of philosophical debate since resolving the details of this intricate relationship could provide significant insights into the functional role of consciousness and constrain theoretical concepts of free will (Mayr, 2004; Hommel, 2007). More recently, experimental investigations revealed that—contrary to common belief—both elementary and higher order, cognitive, control processes have access to subliminal, unconscious information (Eimer and Schlaghecken, 1998; Eimer, 1999; Lau and Passingham, 2007; van Gaal et al., 2008, 2010, 2012).

It is possible to eavesdrop on some aspects of the relationship between consciousness and control by studying the local interactions of the respective neuronal correlates in the neocortex. The current body of evidence suggests that part of the neuronal correlates of both conscious perception (Lumer et al., 1998; Sterzer and Kleinschmidt, 2007; Gaillard et al., 2009; Dehaene and Changeux, 2011; Libedinsky and Livingstone, 2011; Panagiotaropoulos et al., 2012) and cognitive control (Luria, 1969; Goldman-Rakic et al., 1992; Miller, 1999, 2000; White and Wise, 1999; Miller and Cohen, 2001; Wallis et al., 2001; Tanji and Hoshi, 2008; Swann et al., 2009; Buschman et al., 2012) are co-localized in the prefrontal cortex (PFC). However, although these two parallel streams of research led to significant insights into the neuronal correlates of conscious perception and executive functions, the progress was, until recently, to a large extent independent and as a consequence little is known about the interactions of these two neuronal representations in the PFC, at least in the fine spatiotemporal scale offered by extracellular electrophysiological recordings. For example, an elementary but not yet addressed issue is to what extent control-related neurophysiological signals in the PFC, like beta oscillations, are influenced by the perceptual dominance or suppression of a preferred stimulus during rivalrous stimulation, under control-free conditions. Such information could reveal the baseline impact of conscious processing and perceptual suppression on the state of intrinsic signals related to control, before control is learned or applied.

## **BETA OSCILLATIONS DURING CONSCIOUS AND UNCONSCIOUS PROCESSING IN THE LPFC**

In this study we determined the extent to which the visual, sensory-induced modulation of beta (15–30 Hz) oscillations depends on conscious neuronal processing at a local prefrontal cortical level. Our task did not involve any motor or cognitive control demands, and therefore our results are not informative about the role of beta oscillations in cognitive or motor control during conscious or unconscious processing. However, we were able to discern the effect on beta oscillations of conscious and unconscious processing resulting from visual competition.

The results presented in this study reveal that intrinsically generated beta oscillations in the LPFC are non-specifically modulated by visual sensory input in local sites where spiking activity exhibits preference for stimulus features. The pattern of purely sensory-induced beta power modulation is characterized by an initial stimulus-induced desynchronization followed by a beta rebound, as shown in the PA condition. This desynchronization-rebound pattern has been reported in previous electrophysiological studies as a result of visual input in the prefrontal cortex (Siegel et al., 2009; Puig and Miller, 2012). However, PA or purely sensory input is not adequate to dissociate the effect of conscious visual perception from sensory stimulation. This was achieved during BFS, which allowed us to elicit visual competition between two stimuli and study the modulation of beta oscillations in local prefrontal sites during periods in which a preferred stimulus was perceptually dominant (thus consciously perceived) or suppressed (i.e., without access to awareness). Our results show that local processing of consciously perceived or perceptually suppressed information, as determined by the dominance or suppression of spiking activity in the BFS condition, is not a limiting factor for the modulation of beta oscillations by visual input. In particular, beta oscillatory activity recorded from sites where spiking activity becomes suppressed exhibits the same desynchronization-rebound pattern recorded from the same sites when spiking activity is dominant.

The absence of any stimulus preference in the power of beta (15–30 Hz) oscillations during monocular PA, in sites where local spiking activity is selective for one of the two stimuli used, is not surprising. It is known that even high-frequency (gamma) LFPs, which are more likely than beta oscillations to have a tuning similar to that of spikes, do not exhibit the same robust tuning as spiking activity in the visual cortex (Frien et al., 2000; Henrie and Shapley, 2005; Liu and Newsome, 2006; Berens et al., 2008; Panagiotaropoulos et al., 2012). Poor feature selectivity has been ascribed to different factors, one of them being that gamma activity is generated by neuronal ensembles larger than the local neuronal populations contributing to multi-unit activity recorded from the same electrode. Particularly for the beta LFP band, the striking absence of any stimulus selectivity has been suggested to reflect the dominant influence of diffuse neuromodulatory input (Belitski et al., 2008; Magri et al., 2012). It is therefore likely that the non-specific modulation of beta oscillations during PA reflects a common source of input in the LPFC. Most importantly, our findings could suggest that this input is not affected by visual competition, since the magnitude of nonspecific modulation is similar during both PA and BFS. We can therefore conclude that under baseline, control-free conditions, the modulation of beta oscillations is independent of conscious or unconscious stimulus processing in the LPFC.

## **IMPLICATIONS FOR CONTROL FUNCTIONS AND CONSCIOUSNESS**

Although we did not use a control task in this study, our findings are of importance for future studies that will explicitly manipulate both consciousness and control functions. We suggest that our results point to a functional independence between the sensory modulation of oscillatory signals that are employed by control processes (beta oscillations) and conscious processing in the prefrontal cortex under baseline, control-free conditions. Furthermore, the pattern of modulation observed as a result of sensory input makes it likely that beta oscillations reflect an intrinsic mechanism of elementary control. Apart from higher-order processes, control functions can apparently encompass more basic functions that satisfy the criterion of disturbance compensation (Hommel, 2007). Our results suggest that visual sensory input represents a disturbance to the cortical network interactions responsible for generating the intrinsic prefrontal beta rhythm. This sensory disturbance results in the initial desynchronization of beta oscillations, as reflected in the beta power reduction. During that period the network interactions responsible for beta become destabilized, but the underlying network soon achieves control over this disturbance, as reflected in the rebound of beta activity ∼400 ms following a change in visual input. The similarity of this effect for PA and BFS, and for perceptual dominance and suppression, points to an independence of this elementary mechanism from the coexisting neuronal networks underlying conscious perception.

Our findings are also in line with previous studies that detected physiological signals reflecting control processes during both conscious and unconscious information processing, especially in the prefrontal cortex which appears to have a crucial role in control functions (Berns et al., 1997; Stephan et al., 2002; Lau and Passingham, 2007; van Gaal et al., 2008, 2010). The extracellular electrophysiological recordings in the LPFC used in our study offered the additional advantage of high spatial resolution compared to fMRI or EEG recordings. The limited spatial resolution of these methods prevents the detection of local sites involved in the conscious processing of a particular visual stimulus. However, this can be achieved using local extracellular electrophysiological recordings (Logothetis and Schall, 1989; Leopold and Logothetis, 1996; Sheinberg and Logothetis, 1997; Kreiman et al., 2002; Gail et al., 2004; Keliris et al., 2010; Panagiotaropoulos et al., 2012). For the first time, we were able to record control-related signals (i.e., beta oscillations) from prefrontal sites where spiking activity reflected perceptual dominance or suppression during control-free conditions and our findings may support the conclusions of physiological studies suggesting that control and consciousness are probably independent, but also overlapping, functions. Future studies that combine intracortical recordings of electrophysiological signals during conscious perception or perceptual suppression and control within the same task could further elucidate the relationship between these two higher-order cognitive functions.

## **REFERENCES**


## **AUTHOR CONTRIBUTIONS**

Conceived and designed the experiments: Theofanis I. Panagiotaropoulos. Performed the experiments: Theofanis I. Panagiotaropoulos, Vishal Kapoor. Analyzed the data: Theofanis I. Panagiotaropoulos. Contributed reagents/materials/analysis tools: Theofanis I. Panagiotaropoulos, Nikos K. Logothetis. Wrote the paper: Theofanis I. Panagiotaropoulos.

## **ACKNOWLEDGEMENTS**

This work was supported by the Max Planck Society. We thank Joachim Werner and Axel Oeltermann for excellent technical help.


*Neuroimage* 32, 1281–1289. doi: 10.1016/j.neuroimage.2006.06.005


cortical areas. *J. Neurophysiol.* 96, 1492–1506. doi: 10.1152/jn.00106.2006


Conscious and subconscious sensorimotor synchronization–prefrontal cortex and the influence of awareness. *Neuroimage* 15, 345–352. doi: 10.1006/nimg.2001.0929


*Neuroscientist* 18, 287–301. doi: 10.1177/1073858411404079


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 June 2013; accepted: 19 August 2013; published online: 11 September 2013.*

*Citation: Panagiotaropoulos TI, Kapoor V and Logothetis NK (2013) Desynchronization and rebound of beta oscillations during conscious and unconscious local neuronal processing in the macaque lateral prefrontal cortex. Front. Psychol. 4:603. doi: 10.3389/fpsyg.2013.00603*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Panagiotaropoulos, Kapoor and Logothetis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Dancing in the dark: no role for consciousness in action control

## *Bernhard Hommel\**

*Cognitive Psychology Unit, Institute for Psychological Research, Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands \*Correspondence: hommel@fsw.leidenuniv.nl*

*Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

## **CONSCIOUSNESS AND ACTION CONTROL**

It was Sigmund Freud who put the relative contributions of conscious and unconscious processes to the control of human action on the psychological agenda. Freud (1949) suggested that action control emerges from the interplay between unconscious, automatic, and reward-oriented action tendencies generated by the Id and rational, socially mediated considerations provided by the Ego. While Id processes were assumed to be inaccessible to consciousness in principle, some, but not all, Ego operations were claimed to be conscious. This basic logic still provides the blueprint for our current theorizing about action control. Indeed, numerous "dual-route" models in almost all psychological and cognitive-neuroscientific research areas assume that human action emerges from the interplay between consciously inaccessible automatic action tendencies and consciously accessible top-down processes that enforce intentional action goals and social acceptability [for an overview, see Evans and Stanovich (2013)]. Interestingly, many authors associate conscious accessibility with cognitive control (Hommel, 2007). For instance, in the action-control model of Norman and Shallice (1986), automatic, stimulus-driven actions are contrasted with actions that are under "deliberate conscious control," as if unconscious deliberate control were inconceivable. In the same spirit, Libet (1985) has suggested that consciousness might have a "veto" that prevents unwanted actions from being executed. In the following, I will argue that there is no evidence that consciousness plays a causal or decisive role in action control, so that there is no reason to believe that consciousness is necessary or useful for the control of human actions.

## **EXECUTIVE IGNORANCE**

If agents would control their actions through the having of conscious experiences, they should be able to report how and based on what information they are exerting such control. However, agents know surprisingly little about their actions (Wegner, 2002). As James (1890, p. 499) put it: "we are only conversant with the outward results of our volition, and not with the hidden inner machinery of nerves and muscles which are what it primarily sets it at work"—a kind of executive ignorance (Turvey, 1977). Numerous examples show that actions can be parameterized and redirected by stimuli that the agent is unable to consciously perceive because of subliminal presentation or lesions in higher perceptual areas [e.g., Prablanc and Pélisson, 1990; for overviews, see Glover (2004); Milner and Goodale, 1995]. Agents can be easily fooled into experiencing artificial effectors as being a part of their own body (Botvinick and Cohen, 1998) and perceiving actions of other people as being carried out by themselves, or vice versa (e.g., Nielsen, 1963). But even higher-level executive-control operations can be triggered by stimuli the conscious perception of which is prevented by masking procedures [for overviews, see van Gaal et al. (2012); Kunde et al., 2012]. Among other things, this holds for the implementation of a task set (Reuss et al., 2011) and the stopping of the planned action (van Gaal et al., 2010), suggesting that executive control does not rely on consciousness. Indeed, there is widespread agreement that generating a conscious experience takes time, at least 300-500 milliseconds (e.g., Libet, 2004; Dehaene et al., 2006), which would be way too slow for many everyday actions and most reactions in cognitive-psychological tasks. Hence, not only is our conscious knowledge about action control severely limited, it is also difficult to see at which point in time the application of this knowledge would be useful.

One way to save a role for consciousness in action control would be to relate it to the translation of intentions into more specific action plans. Consider, for instance, the seminal study of Libet et al. (1982), who observed that the physiological indicators of action preparation preceded the agent's conscious urge to act by hundreds of milliseconds. Even though this might be taken as evidence against a causal role of conscious experience in the online generation of actions (Wegner, 2002), it does not rule out such a role in translating instructions into a general action plan at the beginning of the study. Indeed, several authors since Exner (1879) have considered that implementing such a plan delegates control to internal and external stimuli, which might very well operate outside of consciousness (Bargh, 1989; Hommel, 2000). And yet, this would leave a severely limited role for consciousness that is restricted to off-line control. Moreover, it has been claimed that integrating information from and across different informational maps and systems requires conscious experience (Baars, 1988), and one might argue that preparatory off-line action planning involves this kind of cross-domain integration. However, recent demonstrations that neither the integration of multimodal event features (Zmigrod and Hommel, 2011) nor the integration of objects with their background (Mudrik et al., 2011) depends on consciousness do not support this possibility.

## **LACK OF SPECIFICITY**

While many action-control operations were shown to be independent of conscious experience, some have been claimed to require consciousness. After reviewing the available evidence, van Gaal et al. (2012) conclude that the conscious representation of task-related stimuli increases the duration (in the range of milliseconds or seconds), flexibility, and strategic use of the represented information for action control and other cognitive operations. In another recent review, Kunde et al. (2012) conclude that all sorts of cognitive-control operations can be automatically driven by endogenous information as long as it is provided by individual, clearly discriminable events and associated with the operation in a one-to-one manner. Conscious representation, in turn, is required if the cues are implicit, distributed in space and time (like frequency information), and context-dependent. A key finding emphasized in both reviews is the absence of conflict-induced cognitive adaptations, like post-error slowing and increased attention to relevant information after conflict trials or frequent conflicts, if the conflict is not consciously perceived.

It is interesting to note that these examples are not only surprisingly few (if one suspects consciousness to control action as a rule) but they are also rather nonspecific and nonrepresentative for voluntary-action control. They are not representative because sessions with tens to hundreds of trials with many repetitions of just a few stimuli and responses are necessary to create these (often rather small) trial-to-trial effects, conditions that in real life would motivate the employment of automatic routines rather than online action planning (Norman and Shallice, 1986). And they are nonspecific because any kind of consciously represented information, not just action-related information, is more likely to be held active and made available for a longer time. Most importantly, none of the consciousness-related abilities considered so far (information maintenance, availability, and conflict-induced adaptations) seems so crucial that its loss would seriously compromise voluntary action control.

## **NO CAUSALITY**

Cognitive operations or processing results may correlate with the presence or absence of conscious representation for many reasons. Probably the most obvious one has to do with the fact that human brains are noisy, so that the quality of representing a particular piece of information can vary over time and trials. Signal-detection theory states that reporting a particular state of affairs requires that evidence passes a particular threshold and that it can be distinguished from noise, which implies that low-energy, complex, and/or difficult-to-discriminate information is unlikely to be reportable. But it also implies that this information is unlikely to be usable for purposes other than conscious report, irrespective of any causal dependency of that other purpose on conscious experience. Accordingly, the mere correlation between the accuracy of conscious report and the usability of information for action control does not tell us anything about the causal relationship between consciousness and action control.
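The signal-detection argument can be illustrated with a minimal numerical sketch. It assumes an equal-variance Gaussian model in which evidence is distributed as N(d', 1), and it introduces two hypothetical, causally unconnected read-outs of that evidence: one for conscious report and one for use in action control. The criterion values and function names are illustrative assumptions, not parameters taken from any of the studies cited here.

```python
from statistics import NormalDist


def p_exceeds(d_prime: float, criterion: float) -> float:
    """Probability that evidence drawn from N(d_prime, 1) exceeds a criterion."""
    return 1.0 - NormalDist(mu=d_prime, sigma=1.0).cdf(criterion)


# Hypothetical thresholds (assumptions for illustration only).
REPORT_CRITERION = 1.5  # evidence needed for a conscious report
ACTION_CRITERION = 1.0  # evidence needed for the signal to affect action control


def readouts(d_prime: float) -> tuple[float, float]:
    """Return (P(report), P(usable for action)) for a given signal quality.

    The two values are computed independently: neither read-out takes the
    other as input, so any correlation between them stems from d_prime alone.
    """
    return p_exceeds(d_prime, REPORT_CRITERION), p_exceeds(d_prime, ACTION_CRITERION)


if __name__ == "__main__":
    for d in (0.5, 1.5, 3.0):
        report, action = readouts(d)
        print(f"d'={d:.1f}  P(report)={report:.2f}  P(usable for action)={action:.2f}")
```

In this sketch both probabilities rise and fall together as signal quality changes, although neither read-out depends on the other. A correlation of this kind is therefore what one would expect even if conscious report played no causal role in action control.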

What is needed to make a causal case is the demonstration that preventing conscious representation without reducing signal quality or affecting thresholds impairs action control, or some other sort of proof that signal quality and threshold setting cannot account for the correlation. To my knowledge, no such proof has been provided so far. Worse, there is not even evidence that the few consciousness-correlated functions are really under voluntary control. Observations like post-error slowing or increased attention to relevant information after incongruent trials are often called "strategic" because they seem to optimize some aspect of behavior: slowing down after having done something wrong and paying attention after having experienced decision conflict sounds very reasonable and gives the impression of being the outcome of a strategic (i.e., goal- and context-dependent) decision. However, not only are the overall performance benefits of such "strategies" often negligible (speed is traded for accuracy with post-error slowing and facilitation benefits are traded for interference costs through post-conflict potential effects), but agents also seem to have little choice in applying them. As shown by Jiménez and Méndez (2013), the impact of previous incongruency experiences is entirely independent of (i.e., unaffected by) the actual expectations of the agent, suggesting that sequential effects are due to an automatic learning process [for an application of this logic to congruence-probability effects, see Hommel (1994)]. The degree of associative learning and the reliability of the emerging associations must depend on the quality and discriminability of the signals being associated, which would explain why sequential effects are less pronounced and less reliable under conditions that are likely to reduce signal-to-noise ratios and discriminability, that is, under Kunde et al.'s (2012) "implicit" stimulus conditions. Hence, the available evidence can be parsimoniously accounted for by well-understood low-level associative processes. These processes apparently run off automatically and, even though they might often support action control (which might well be the reason why evolution has equipped us with them), they can hardly count as "strategic" except in a metaphorical sense (much like Darwinian evolution would be considered a survival "strategy"). Most importantly, there is not any positive evidence that they can be "consciously controlled" and the little evidence we have actually suggests the opposite.

## **WHAT ELSE IS CONSCIOUSNESS GOOD FOR?**

Taken altogether, there is not yet any demonstration of a causal role of consciousness in human action control, which, given the enormous interest in this issue, must be considered surprising. And it raises the question of what else conscious experience might be good for. Even though this brief opinion paper is not the place to tackle that issue exhaustively, a few speculations might be in order. The most obvious difference between conscious and unconscious representations is that we are commonly able to communicate about the former but not the latter. Indeed, most researchers accept communicability of the represented information as either the consciousness-defining characteristic or at least a useful experimental operationalization. It is hard to see what communicability might contribute to the online control of actions, but it provides obvious benefits for social purposes: we can inform other people about our action plans, instruct others to carry out particular actions, evaluate these actions and provide feedback, and discuss the pros and cons of alternative action plans. All of that effectively increases social predictability and thus reduces uncertainty, an aversive state that is driving much of our behavior (Berlyne, 1960). Communicability might also help us to describe and try to understand our own behavior in ways that allow relating and comparing it to that of others, thus providing the opportunity for self-reflection and social impression management. Most of these hypothetical functions are post-actional, so that they are not compromised by the long time that conscious representation needs to build up or by the lack of impact of most if not all conscious representations on action control proper. And they are not unlikely to work back on action control in a broader, socially embedded sense: how we interpret and sell our actions to the public will affect its reactions and feedback, which again might often provide selective reward and social constraints for our future actions. Hence, the true impact of consciousness on the control of our actions may be more indirect and more socially mediated than common sense has it.

## **ACKNOWLEDGMENTS**

The preparation of this work was supported by the European Commission (EU Cognitive Systems project ROBOHOW.COG; FP7-ICT-2011).

## **REFERENCES**


Jiménez, L., and Méndez, A. (2013). It is not what you expect: dissociating conflict adaptation from expectancies in a Stroop task. *J. Exp. Psychol. Hum. Percept. Perform.* 39, 271–284.


*Psychol. Sci.* 22, 764–770. doi: 10.1177/0956797611408736


*Received: 09 June 2013; accepted: 09 June 2013; published online: 26 June 2013.*

*Citation: Hommel B (2013) Dancing in the dark: no role for consciousness in action control. Front. Psychol. 4:380. doi: 10.3389/fpsyg.2013.00380*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Hommel. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## How and to what end may consciousness contribute to action? Attributing properties of consciousness to an embodied, minimally cognitive artificial neural network

## *Holk Cruse and Malte Schilling\**

*Center of Excellence 'Cognitive Interaction Technology', University of Bielefeld, Bielefeld, Germany*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University; University of California, USA*

## *Reviewed by:*

*Harold Bekkering, University of Nijmegen, Netherlands J. Scott Jordan, Illinois State University, USA Johan Kwisthout, Radboud Universiteit Nijmegen, Netherlands*

#### *\*Correspondence:*

*Malte Schilling, Center of Excellence 'Cognitive Interaction Technology', University of Bielefeld, D-33594 Bielefeld, Germany e-mail: malteschilling@googlemail.com*

An artificial neural network called reaCog is described which is based on a decentralized, reactive and embodied architecture developed to control non-trivial hexapod walking in an unpredictable environment (Walknet) while using insect-like navigation (Navinet). In reaCog, these basic networks are extended in such a way that the complete system acquires the capability of inventing new behaviors and – via internal simulation – of planning ahead. This cognitive expansion enables the reactive system to be enriched with additional procedures. Here, we focus on the question of to what extent properties of phenomena that are characterized on a different level of description, for example consciousness, can be found in this minimally cognitive system. Adopting a monist view, we argue that the phenomenal aspect of mental phenomena can be neglected when discussing the function of such a system. Under this condition, reaCog can be regarded as equipped with properties such as bottom-up and top-down attention, intentions, volition, and some aspects of Access Consciousness. These properties have not been explicitly implemented but emerge from the cooperation between the elements of the network. The aspects of Access Consciousness found in reaCog concern the above-mentioned ability to plan ahead and to invent and guide (new) actions. Furthermore, global accessibility of memory elements, another aspect characterizing Access Consciousness, is realized by this network. reaCog allows for both reactive/automatic control and (access-) conscious control of behavior. We discuss examples of interactions between the reactive domain and the conscious domain. Metacognition or Reflexive Consciousness is not a property of reaCog. Possible expansions are discussed to allow for further properties of Access Consciousness, verbal report on internal states, and for Metacognition. In summary, we argue that even simple networks allow for properties of consciousness if the phenomenal aspect is left aside.

**Keywords: recurrent neural network, consciousness, minimal cognitive system, motor control, robotic architecture, embodiment, access consciousness, internal body model**

## **INTRODUCTION**

The nature of the mental, in particular of consciousness, and its relation to the physical world is a fundamental concern in philosophy of mind. Studies addressing this question have led to a variety of views concerning this matter. Vision (2011) reviews a huge number of variations and sub-variations of these views forming a "crowded and messy field" (Vision, 2011, p. 29). Although, as seen by somebody not being an expert in philosophy of mind, most of these views appear to show a large amount of plausibility, the various positions defended by their proponents appear to be characterized by fundamental disagreements, and a commonly agreed solution seems not to be in reach.

Therefore, as a complement to these top-down approaches, in what follows we would like to begin with a quite different approach, a bottom-up approach. The goal of this approach is to develop a neural architecture that shows a number of abilities found in autonomous agents, i.e., the goal is to formulate quantitative hypotheses concerning the structure and functioning of autonomous and perhaps cognitive systems that can be tested on a robot. In this article, such a system will be presented and used as a scaffold for discussions concerning the higher-level properties usually associated with mental aspects. In particular, we can ask to what extent properties may be observed that have not explicitly been implemented and therefore may loosely be termed emergent properties. Specifically, in this context we consider properties that may be related to high-level properties such as attention, intention, volition, or consciousness.

Our goal is not to construct an artificial system that is equipped with, for example, consciousness. Instead, we want to use this system as a tool to test to what extent descriptions of mental phenomena used in psychology or philosophy of mind may be applied to such an artificial system. All these definitions necessarily rely on verbal formulations and are therefore open to different interpretations. In contrast, a definition based on a mathematical formulation or given in the form of a quantitative simulation does not suffer from such ambiguities. Based on such an explicit definition, the properties of the phenomenon can be studied in detail and judgments are possible as to whether the specific definition chosen appears to be sufficient or whether critical aspects of the phenomenon of interest are missing. In the latter case, the definition may be improved accordingly. To start with such an approach, we refer to definitions of attention from Desimone and Duncan (1995), of intention from Pacherie (2006) and Goschke (2013), and of volition from Goschke (2013). Concerning consciousness, as discussed by Cleeremans (2005), this phenomenon may only be approachable if the task is split into different aspects that are treated separately. Following Block (1995, 2001), to this end Cleeremans (2005) distinguishes between Access Consciousness, Metacognition, and Phenomenal Consciousness.

To proceed in this way, in Section "reaCog, An Embodied, Minimal Version of a Cognitive System" we will briefly and, as far as required for a basic understanding, explain the essential properties of a system called reaCog that is supposed to be equipped with cognitive abilities while being strongly based on a reactive architecture (Schilling and Cruse, 2008, submitted).

Applying a bottom-up approach we focus on a reactive system that is able to deal with a specific domain of behavior, namely walking with six legs in an unpredictable environment including climbing over very large gaps. The reactive part of the system has been termed "Walknet" and is biologically inspired by detailed work on the walking of the stick insect (Dürr et al., 2004; Bläsing, 2006; Schilling et al., submitted a). The stepping patterns ("gaits") observed (in the robot as in the insects) are not explicitly implemented but result from the cooperation of local rules and the coupling through the environment. Furthermore, the system has been expanded by a network allowing for insectlike navigation ("Navinet," Cruse and Wehner, 2011; Hoinville et al., 2012), where the agent is able to select visiting one of a number of food sources learned, and to decide between traveling to the food source or back home. Of particular interest is here that Navinet (like a desert ant) attends known visual landmarks only in the appropriate context, i.e., depending on the food source it is actually traveling to. Furthermore, the reactive network Navinet does not require an explicit "cognitive map" to describe experimental results, for which earlier authors have assumed such a map to be necessary.

The complete network is based on a decentralized architecture consisting of procedural, or reactive, elements which, in turn, consist of artificial neurons. The reactive network, showing a heterarchical structure, allows for selection of different behaviors, which includes protection against, in the actual context, non-relevant sensory input.

As a next "evolutionary" step, the network is equipped with a flexible internal body model allowing for internal simulation of behaviors. This extended system is called reaCog, consisting of the reactive "Walknet" which has been expanded to include cognitive properties. Together with the introduction of this "cognitive expansion," reaCog comprises the ability to plan ahead and to invent new behaviors in order to solve problems for which no solution is currently available. As such, this cognitive expansion cannot function by itself, but only, like a parasite, operates on top of the reactive structures (Norman and Shallice, 1986). The final decision to store a new behavioral procedure is not purely stochastic, because the proposals made by the cognitive expansion are tested for feasibility via the internal simulation as well as by performing the behavior in reality. Thus, invention of new behaviors may be viewed as being based on a Darwinian procedure (see Section "General Characteristics of reaCog and a Possible Expansion").

Following the definition of McFarland and Bösser (1993), a cognitive system is characterized by the capability of planning ahead. In this sense, reaCog can be termed a cognitive system that allows planning ahead via internal simulation. As the cognitive system is crucially dependent on its reactive foundations (hence the name reaCog), the development of rich cognitive abilities requires a correspondingly rich behavioral repertoire.

After having introduced reaCog in Section "reaCog, An Embodied, Minimal Version of a Cognitive System," we will, in Section "Properties of reaCog Being Characterized by Applying Other Levels of Description," discuss to what extent this network, forming a simple structure, could serve as a scaffold providing a quantitative foundation for more abstract concepts formulated on levels of description as being applied in psychology or philosophy of mind. Specifically, we will address the phenomenon of consciousness which, according to some authors, may be an inherent property for at least some cognitive systems. Therefore, although we do not want to state that consciousness should be attributed to our system in any sense, we want to discuss in Section "Properties of reaCog Being Characterized by Applying Other Levels of Description" how properties characterized on different levels of description can be observed in our model. In Section "Phenomenality" we discuss how phenomenal aspects might be attributed to physical systems and conclude by arguing that the phenomenal aspect is not crucial for understanding the function. We are not trying to solve the "hard" problem (Chalmers, 1996), but will argue that it suffices to concentrate on the functional aspect.

In Section "Attention, Volition, Intention" we will briefly address the question of whether terms such as attention, intention, or volition might be attributed to our network. In Sections "Access Consciousness" and "Metacognition" we will specifically address whether and how our model maps to some of the different aspects reviewed by Cleeremans (2005), namely Access Consciousness and Metacognition. We will argue that the network studied does show some aspects of Access Consciousness, but not of Metacognition, and will finish with conclusions in Section "Discussion and Conclusion."

## **ReaCog, AN EMBODIED, MINIMAL VERSION OF A COGNITIVE SYSTEM**

The network reaCog represents an expansion of a neural network based controller called Walknet which has been derived as a hypothesis to describe a large number of behavioral studies performed with stick insects (Dürr et al., 2004; Schilling et al., submitted a).

The controller has to deal with a body containing 22 degrees of freedom (DoF), 3 DoF for each of the six legs and 4 DoF allow for movements along the body axis. As body position in space is defined by only 6 DoFs (three for position in space, three for orientation) there are 16 DoFs free to be decided upon. The controller consists of a decentralized architecture, first of all six more or less independent controllers, one for each leg. The controllers of neighboring legs are coupled via a small number of channels transmitting information concerning the actual state of that leg (e.g., swing, stance) or its position, i.e., values of joint angles (**Figure 1**). The architecture of the leg controller is depicted in **Figure 2**, lower part, black boxes. Only two leg controllers are shown. The single leg controller consists of several procedures that are realized by artificial neurons forming a local, in general,

**FIGURE 1 | Arrangement of the leg controllers and the coordination influences (1–6) between legs.** Legs are marked by L for left legs and R for right legs and numbered from 1 to 3 for front, middle and hind legs, respectively.

recurrent neural network (RNN). These procedural elements, or modules, might receive direct sensory input and provide output signals that can be used for driving motor elements. But other modules may also provide input to a module. All these networks may be considered to form elements of the procedural memory. The two most important procedural elements in our example are the Swing-net, responsible for controlling a swing movement, and the Stance-net controlling a stance movement. In addition, each leg possesses a so-called Target\_fw-net for forward walking and Target\_bw-net for backward walking, both influencing Swing-net.

To allow the system to select autonomously between different behaviors as for instance standing and walking, or forward and backward walking, reaCog is expanded by introduction of a RNN consisting of so-called motivation units (**Figure 2**, marked in red). The function of a motivation unit as applied here is to control to what extent the corresponding procedural element contributes to the behavior. To this end, these units influence the strength of the output of its procedure network (in a multiplicative way). As illustrated in **Figure 2**, motivation units can also be used to influence other motivation units via excitatory or inhibitory connections. For example, units which belong to the procedural nets controlling the six legs (only two legs are depicted in **Figure 2**) show mutual positive connections to a unit termed "walk" in **Figure 2**. This unit serves the function of arousing all units possibly required when the behavior walk is activated.
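The multiplicative influence of a motivation unit on its procedure can be illustrated with a minimal sketch. The output vectors, the unit names, and the helper `gate` are hypothetical and chosen only for illustration; they are not taken from the reaCog implementation.

```python
def gate(procedure_output, motivation):
    """Scale the output of a procedure by the activation of its motivation unit."""
    return [motivation * value for value in procedure_output]

# Hypothetical joint commands (three joints of one leg) proposed by two procedures.
swing_output = [0.4, -0.2, 0.1]
stance_output = [-0.3, 0.1, 0.0]

# Assumed motivation state: the leg is currently in swing.
motivation = {"swing": 1.0, "stance": 0.0}

# Both procedures run in parallel; only the motivated one contributes to the command.
leg_command = [
    s + t
    for s, t in zip(gate(swing_output, motivation["swing"]),
                    gate(stance_output, motivation["stance"]))
]
print(leg_command)
```

With intermediate motivation values the same expression yields a graded mixture of both procedures, which is one way to read "to what extent the corresponding procedural element contributes to the behavior."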

**FIGURE 2 | A section of Walknet showing two leg controllers.** Each consists of a Stance-net and a Swing-net, the latter being connected with a Target-net (Targetfw). The motor output acts on the legs (box muscles/body/environment). Sensory feedback is used by the motor procedures as well as to switch between the states (red units connected by mutual inhibition). r1 represents coordination rule 1 (see **Figure 1**). Furthermore, the network is equipped with further procedures (Target\_bw-net), a body model (blue) and a motivation unit network (red). The body is represented by the boxes "leg."

In addition, we introduce units "forward" and "backward" to activate procedures required for forward or backward walking (**Figure 2**, fw, bw), respectively, by selecting specific Target-nets. Both units "fw" and "bw" are mutually coupled with the motivation unit "walk." Only indicated in this figure is that the unit "walk" may be coupled via mutual inhibition to other units that stand for different behaviors like, for example, standing still (unit "stand"). The corresponding procedures are, however, not depicted. It is also not shown that these "higher-level" motivation units may receive direct or indirect input from sensory units that influence the activation of a motivation unit. In **Figure 2** this is only shown for the "lower-level" motivation units of Swing-net and Stance-net (for example, a ground contact sensor of a leg being stimulated may activate the motivation unit of the stance procedure of this leg). Also, the complete motivation unit network used for controlling navigation is not shown (see Hoinville et al., 2012).

As illustrated in **Figure 2**, this at first glance hierarchical structure of the motivation unit network is in general not forming a simple, tree-like arborization. As indicated by the bi-directional connections, motivation units form a RNN coupled by positive (arrowheads) and negative (T-shaped connections) influences (for details concerning the weights used see Schilling et al., submitted b). This structure may therefore be better described as "heterarchical." Some of these motivation units are coupled by local winner-take-all connections. This is true for the Swing-net and Stance-net of each leg, as well as for the motivation units for forward and backward walking. Thereby, a selection of one of the available Target-nets is possible. Excitatory connections between motivation units allow for building coalitions. As can be derived from **Figure 2**, there are different overlapping ensembles possible. For example, all "leg" units and the unit "walk" are activated during backward walking and during forward walking, but only one of the two units termed "fw" (forward) and "bw" (backward) and only some of the targeting modules are active in either case. In this way, through the combination of excitatory and inhibitory connections this architecture can produce various stable attractor states or "internal states." Such a state protects the system from responding to inappropriate sensory input. For instance, as a lower-level example, depending on whether a leg is in swing state or in stance state, a given sensory input can be treated differently. Correspondingly, internal states can be distinguished on higher-levels, as for example walking, standing still, or feeding (for further details see Schilling et al., submitted a,b).
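How excitatory and inhibitory couplings can produce distinct stable states may be sketched with a three-unit toy network. The weights, the clipped-linear activation function, and the initial activations are assumptions made for this illustration; the weights actually used are given in the work cited above.

```python
UNITS = ["walk", "fw", "bw"]

# Assumed weights: each unit excites itself, "walk" and the direction units
# excite each other, and "fw" and "bw" inhibit each other (local winner-take-all).
W = [
    [1.0, 0.5, 0.5],    # input to "walk"
    [0.5, 1.0, -1.0],   # input to "fw"
    [0.5, -1.0, 1.0],   # input to "bw"
]

def relax(activation, steps=20):
    """Iterate the recurrent network with activations clipped to [0, 1]."""
    a = list(activation)
    for _ in range(steps):
        a = [min(1.0, max(0.0, sum(w * x for w, x in zip(row, a)))) for row in W]
    return dict(zip(UNITS, a))

# A slight initial bias towards "fw" or "bw" decides which coalition wins.
forward = relax([0.5, 0.6, 0.4])
backward = relax([0.5, 0.4, 0.6])
print(forward)
print(backward)
```

In this sketch the unit "walk" belongs to both attractors, whereas "fw" and "bw" exclude each other, which corresponds to the overlapping ensembles described above.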

## **BODY MODEL**

A further important element of reaCog concerns the representation of a body model. This body model is realized by a specific RNN (Schilling, 2011) and has by itself a modular structure (Schilling and Cruse, 2007; Schilling et al., 2012). It consists of six networks each representing one leg. These modules are connected on a higher-level forming a seventh network representing the whole body. The latter network represents the central body and the legs only in an abstracted form. In **Figure 2** the elements of the body model are marked in blue. Thus, the body model is represented by a modular structure which, as it is constructed as a RNN, at the same time comprises a holistic system [**Figure 3**, for details concerning the body model see (Schilling, 2011; Schilling and Cruse, 2012)].

In normal walking, i.e., still in the reactive mode, the body model is used in forward and backward walking as well as in negotiating curves and provides joint control signals to the corresponding Stance-net. As the model mirrors the 22 DoF of the insect body the task is underdetermined. Therefore, calculation of the joint control signals is still a hard problem and a unique solution is not directly computable (Schilling and Cruse, 2012). As a solution, we apply the idea of the passive motion paradigm to this problem (von Kleist, 1810; Mussa Ivaldi et al., 1988; Loeb, 2001). Like a simulated marionette puppet (**Figure 3**), the internally simulated body is pulled by its head in the direction of desired body movement (**Figure 3B**, delta\_0), provided, for example, by a vector based on sensory input from the antennae or, if available, by visual or acoustic input (**Figure 2**, sensory

**FIGURE 3 | (A)** Illustrates how the body model is attached to the body of robot Hector. **(B)** shows the abstracted body model. Vectors delta\_0 and delta\_back can pull the model in forward or backward direction, respectively. On the right, an example is shown how a leg network is connected to the abstracted/central body model.

input). As a consequence, the stance legs of the puppet move in an appropriate way. The changes of the simulated joint angles can be used as motor commands to control the actual joints. To control backward walking, the body model is pulled by the vector delta\_back (**Figure 3B**) at the bottom. If such a body model is given that represents the kinematical constraints of the real body, we obtain in this way an easy solution of the inverse kinematic problem, i.e., a solution for the question how the joints of legs standing on the ground have to be moved in concert to propel the body.
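The idea of pulling a redundant model and reading off the resulting joint movements can be sketched for a planar chain with three joints. Note that this sketch swaps in a simple Jacobian-transpose relaxation, which is not the recurrent body model network described here; link lengths, compliance, start posture, and target are all assumed values.

```python
import math

def forward_kinematics(angles, lengths):
    """Tip position of a planar chain with relative joint angles."""
    x = y = phi = 0.0
    for theta, length in zip(angles, lengths):
        phi += theta
        x += length * math.cos(phi)
        y += length * math.sin(phi)
    return x, y

def pull_step(angles, lengths, target, compliance=0.05):
    """Pull the tip towards the target; every joint yields passively."""
    x, y = forward_kinematics(angles, lengths)
    fx, fy = target[0] - x, target[1] - y      # virtual pulling force on the tip
    new_angles = []
    jx = jy = phi = 0.0                        # position of the current joint
    for theta, length in zip(angles, lengths):
        rx, ry = x - jx, y - jy                # lever arm from joint to tip
        torque = rx * fy - ry * fx             # planar J^T F for a revolute joint
        new_angles.append(theta + compliance * torque)
        phi += theta
        jx += length * math.cos(phi)
        jy += length * math.sin(phi)
    return new_angles

lengths = [1.0, 1.0, 1.0]      # assumed link lengths
angles = [0.3, 0.3, 0.3]       # assumed start posture
target = (1.5, 1.5)            # assumed pulling target, within reach

for _ in range(2000):
    angles = pull_step(angles, lengths, target)

tip = forward_kinematics(angles, lengths)
error = math.hypot(target[0] - tip[0], target[1] - tip[1])
print(angles, error)
```

Three joints for a two-dimensional target leave one DoF undetermined, yet no explicit choice among the possible solutions is needed: the posture simply relaxes under the pull, and the angle changes could serve as joint commands.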

The body model also receives sensory data. Due to its holistic structure the body model integrates redundant sensory information and is able to correct possible errors in the sensor data (Schilling and Cruse, 2012). As will be sketched below, due to its ability of pattern completion, this model can also be used as a forward model. Therefore, the model allows for prediction, too, a property that can be exploited when dealing with the ability to plan ahead.

## **PLANNING AHEAD**

The network, as described, consists of a "hard-wired" structure, i.e., the weights connecting the artificial neurons are fixed. Nevertheless, the system is able to flexibly adapt to properties of the environment, as for example deal with various disturbances and climb over large gaps (Bläsing, 2006). However, situations may occur in which the controller runs into a deadlock. Think for example of the situation in which, during forward walking, by chance all legs but the right hind leg are positioned in the frontal part of their corresponding range of movement, whilst the right hind leg is positioned very far to the rear. When this leg starts a swing movement, the body may fall backward as the center of gravity is not anymore supported by the legs on the ground. Such a "problem" might be signaled by specific sensory input, a "problem detector." In our case, this could, for example, be a system reacting to a specific load distribution of the legs. To find a way out of this deadlock, a random selection of a behavioral module not belonging to the actual context could provide help. A possible solution in this case might be a backward step of the right middle leg. Such a backward step of the middle leg would make it possible to support the body, then allowing the hind leg to start a swing. However, in our controller, backward steps are only permitted in the context of backward walking. How might it still be possible for the system to find such a solution?

**Figure 4** illustrates a simple expansion allowing the system to search for such a solution. As we will argue later, we name this expansion "attention controller." A third layer (**Figure 4**,

green units), essentially consisting of a recurrent winner-take-all network (WTA-net) <sup>1</sup> is arranged in such a way that each motivation unit has a partner unit in the WTA-net. Motivation units already activated in the actual context inhibit their WTA partner unit (T-shaped connections in **Figure 4**). Thus, a random activation of the WTA-net will, after relaxation, find a unit not belonging to the currently activated modules. The WTA unit winning the competition can then be used to activate its partner motivation unit and thereby trigger a new behavior that can be tested for being able to solve the problem. In this way, the network has the capability of following a trial-and-error strategy.
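The selection step can be sketched as follows, using the recurrent winner-take-all dynamics in which each unit excites itself and inhibits all others. The unit names, the weight values, and the simplification that partner inhibition clamps the affected WTA units to zero before the competition starts are assumptions of this sketch.

```python
import random

UNITS = ["walk", "fw", "bw", "stand", "feed"]   # hypothetical motivation units

def wta_select(context_active, rng, iterations=1000):
    """Return the index of the winning WTA unit, or None if there is no single winner."""
    n = len(context_active)
    w = 1.0 / n            # assumed lateral inhibition weight
    s = 1.0 + w / 2.0      # assumed self-excitation weight
    # Random activation; partners of already active motivation units are
    # inhibited and therefore start (and stay) at zero.
    a = [0.0 if active else rng.uniform(0.1, 1.0) for active in context_active]
    for _ in range(iterations):
        total = sum(a)
        a = [min(1.0, max(0.0, s * x - w * (total - x))) for x in a]
    winners = [i for i, x in enumerate(a) if x > 0.0]
    return winners[0] if len(winners) == 1 else None

# Context "forward walking": the units "walk" and "fw" are active.
context = [True, True, False, False, False]
winner = wta_select(context, random.Random(0))
print(UNITS[winner])
```

Whichever of the inactive units receives the largest random activation wins, so repeated calls with different noise propose different out-of-context behaviors.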

As has been proposed (Schilling and Cruse, 2008) a further expansion of the system may permit to use the body model instead of the real body to test the new behavior via "internal trial-and-error" whilst the motor output to the real body is switched off. To this end, switches have to be introduced allowing the motor output signals to circumvent the real body and being passed directly to the body model (**Figure 4**, switch SW). Only if the internal simulation has shown that the new trial provides a solution to the problem, the behavior will actually be executed. McFarland and Bösser (1993) define a cognitive system in the strict sense as a system that is able to plan ahead, i.e., to perform internal simulations to predict the possible outcome of a behavior. Therefore, the latter expansion would, according to McFarland and Bösser, make the system a cognitive one (for details see Cruse and Schilling, 2010; Schilling and Cruse, submitted).
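The internal trial-and-error loop may be sketched as below. The function `solves_deadlock` is a toy stand-in for the body model simulation, and the behavior names are hypothetical (R2 denotes the right middle leg); only the control flow is meant to be illustrative.

```python
import random

def solves_deadlock(behavior):
    """Toy stand-in for the internal simulation on the body model."""
    return behavior == "backward_step_R2"

def plan_ahead(candidates, simulate, execute, rng):
    """Test proposals internally and execute only the first one that succeeds."""
    proposals = list(candidates)
    rng.shuffle(proposals)               # stands in for the noisy WTA selection
    simulated = []
    for behavior in proposals:
        simulated.append(behavior)       # motor output routed to the body model
        if simulate(behavior):
            execute(behavior)            # motor output routed to the real body
            return behavior, simulated
    return None, simulated

executed = []
chosen, tried = plan_ahead(
    ["stand", "backward_step_R2", "feed"],
    solves_deadlock,
    executed.append,
    random.Random(1),
)
print(chosen, tried, executed)
```

Proposals that fail in simulation never reach the real body, which is the property that distinguishes this loop from plain trial-and-error.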

## **GENERAL CHARACTERISTICS OF ReaCog AND A POSSIBLE EXPANSION**

To sum up, the neural controller Walknet, as described earlier (e.g., Dürr et al., 2004; Schilling et al., submitted a), represents a typical case of an embodied controller (first-order embodiment, cf. Metzinger, 2006, forthcoming): the network is able to control the movement of a hexapod walker in unpredictably varying environments without relying on other information than available using the given mechanosensors. This is possible because the body and properties of the environment are crucial elements of the computational system – the system is embodied not only in the sense that there is a physical body (e.g., that there are internal states being physically represented), but also in the sense that the properties of the body (e.g., its geometry) are required for computational purposes. Exploiting the loop through the world (including the own body) allows for a dramatic simplification of the computation. These properties can also be attributed to the expanded version, reaCog. In this system, being expanded by an internal body model, control of DoFs does not result from explicit specification by the neuronal controller, but results from a combination/cooperation of the neuronal controller, the internal body model and the coupling via the environment. Furthermore, the body model is used for planning ahead. Such a network, according to Metzinger (2006, forthcoming), represents a system being characterized by second-order embodiment.

The procedures forming the decentralized controller are basically arranged in parallel, i.e., obtain sensory input and provide motor output, but there are also procedures that receive input from other procedures and, as a consequence, procedures that provide output to other procedures.

The artificial neural network reaCog shows automatic behavior and action selection on the reactive level, where several of these procedures can be performed in parallel, but also shows control of behavior on the cognitive level, as the decisions based on imagined action (probehandeln) are not determined strictly by the sensorily given situation. This is the case because the noise active in the attention controller introduces a stochastic effect. The final decision is, however, not purely stochastic, because the proposals made by the attention controller are tested for feasibility via the internal simulation. Before being stored in long term memory, the proposal is further tested by performing the behavior in reality. In this way, this decision may be viewed as being based on a Darwinian procedure, starting with an, in part, stochastic "mutation," followed by an, in our case twofold, selection testing the proposal for "fitness."

Furthermore, inspired by Steels (2007) and Steels and Belpaeme (2005), the network may be expanded by a fourth layer (not depicted in **Figure 4**) that contains specific procedures, namely networks that represent verbal expressions. These "word-nets" may likewise be used to utter or to comprehend the word stored. The underlying idea is to connect each word-net with a unit of the motivation network of which it carries the meaning (e.g., the word-net "walk" should be connected with the motivation unit walk), thereby grounding the symbolic expression (Cruse, 2010). Although the latter two levels (WTA-net and word-nets) are still quite speculative as they have not yet been tested, together with the two lower layers they illustrate the principal idea of this architecture (**Figure 5**). Horizontally arranged modules (procedures, motivation units, WTA neurons, and procedures for words) are ordered in the horizontal layers in such a way that the corresponding elements in the different layers appear in a vertical order, leading to modules arranged in a columnar fashion (**Figure 5**, dashed rectangles). Addressing this columnar structure does not mean that each lower-level procedure or each motivation unit has to have a partner in the upper layers, but only means that such connections are in principle possible. Similarly, not every unit or procedure in the upper layers necessarily has a partner procedure in the lowest layer.

<sup>1</sup>In a recurrent winner-take-all network, each unit receives positive feedback from itself and negative feedback from all other units of the network. When any random activation is given to these units, after some iterations one unit will show a positive activation and all other units will show an activation of zero.

## **PROPERTIES OF ReaCog BEING CHARACTERIZED BY APPLYING OTHER LEVELS OF DESCRIPTION**

Having available a quantitatively defined network that is able to control specific behaviors of an agent, we will now ask to what extent reaCog is able to realize properties that were not explicitly implemented. For example, as has been noted earlier (Schilling et al., 2008, submitted a), when applying descriptions used in the behavioral domain, a term like tripod gait is sensibly used to describe the walking behavior of a hexapod, although no explicit tripod gait controller, for instance, is implemented in reaCog (different to many other hexapod controllers). Instead, at the neuronal/computational level only local rules are used to couple neighboring legs, which allows for different walking patterns to appear, depending on the control parameter "velocity."

In the following, we will particularly concentrate on concepts usually applied in domains other than computer science and behavioral biology, such as psychology and philosophy of mind. Adopting other levels of description may not only help to better understand the properties of our system on a more abstract level, but may also help to find more operational definitions for concepts used in the other disciplines. Underlying such an approach is the assumption that most, if not all, of these phenomena arise as emergent properties (Vision, 2011) and that they can only be observed and characterized when higher levels of description are applied.

Whereas some authors speculate that phenomena as for instance consciousness can only be attributed to human beings or possibly monkeys, other authors claim that consciousness may come in various degrees and may, to a smaller degree, already occur in lower-level animals (Dennett, 1991). This view is supported by the observation that already small-scale networks might allow for interesting cognitive properties (Herzog et al., 2007; Menzel et al., 2007). Due to its evolutionary plausibility we tend to the latter assumption, and therefore raise the question to what extent any aspects of consciousness could be attributed to the network discussed here although, when designing the network, we did not aim to "implement consciousness" at all. To the extent such attributions would be possible, questions concerning the possible function of consciousness, for example as to how consciousness might contribute to action, i.e., to the control of behavior, might be addressable.

We would like to stress that, in pursuing this question, we are not trying to state what consciousness is, i.e., we do not want to "explain consciousness." We also do not assume that the categories introduced by the different authors referred to represent the ultimate solution to approach the problem. Instead, we would like to connect aspects of this complex issue, as have been addressed by different authors, with our simulation approach. This further means that the collection of properties characterizing a system as being conscious will not be discussed in a rigorous way with respect to being necessary or sufficient for a system being a conscious one. Rather, we will only compare the categories discussed by different authors with our approach. A rigorous definition might only be sensible at a later stage (see also Holland and Goodman, 2003).

To this end, we will begin by following a categorization proposed by Block (1995, 2001) and being placed in a broader framework by Cleeremans (2005). Cleeremans reviewed an impressive number of philosophical statements concerning consciousness. In spite of considerable disagreement between authors in detail (see also Vision, 2011), Cleeremans reported an interesting overlap with respect to the essential properties characterizing possible computational correlates of consciousness. According to this review, phenomena concerning consciousness may be grouped along three domains, termed Phenomenal Consciousness, Access Consciousness, and Metacognition (or Reflexive Consciousness). Concerning the phenomenal aspect of consciousness, some philosophers consider this aspect as a separate domain, being independent of Metacognition and Access Consciousness, whereas other philosophers consider phenomenality as a property not being separable, but being directly connected with Metacognition and Access Consciousness. Again, other philosophers are only prepared to attribute consciousness to systems showing Reflexive Consciousness (e.g., Rosenthal, 2002). As mentioned, we will not enter this discussion. For our purpose it is not critical which of the different taxonomies is better suited to characterize the phenomenon of consciousness. We selected one, Block's taxonomy, as a scaffold to compare the different phenomena described in the literature with properties of our network.

## **PHENOMENALITY**

What is meant by phenomenal consciousness or the phenomenal aspect of consciousness, sometimes also termed internal perspective or subjective experience? The characteristic of subjective experience may be particularly obvious in the case of pain. We might, as a thought experiment, monitor all neuronal activities of a (human) subject that result when his/her skin is stimulated by a needle. One might, in principle even examine one's own action potentials, if oneself is the subject of this experiment. In such an experiment, everybody including the subject him/herself could have a look at the data, but the experience when regarding all these neuronal activities monitored is completely different from the pain one is experiencing at this moment. The content of this subjective experience is only accessible to the person herself or himself. Nobody other than myself can judge how I feel the pain. Thus, self-observation tells us that there are systems, namely humans, that can experience an internal perspective. On the other hand, intuition tells us that there are other systems, like a stone or a simple machine (including some clever present-day robots) that may not have such an internal perspective.

In many cases, consider for example an animal like an insect, we cannot decide whether it belongs to systems that act like a reflex machine, or a clockwork, not being able to experience an internal perspective, or whether it belongs to the second type, and consequently is able to have subjective experience.

Within the human brain, too, there are subsystems that are in one of the two states and that may even be able to switch between them. In (dreamless) sleep or under anesthesia neuronal systems are still active but subjective experience is "switched off." But also in the normal awake state, we are not aware of the contents of all the different neuronal activities taking place in our brain. Rather, at a given moment we consciously attend, and therefore subjectively experience, only one aspect and may later switch conscious attention to another one. Therefore, we have to assume that subjective experiences arise only if specific, yet unknown, types of neuronal activities are given.

Up to now we have only indirect evidence concerning the conditions required for subjective experience to arise. In an early experiment, Libet et al. (1964) performing direct electrical stimulation of the cortex found that a stimulus required a minimum of 500 ms to lead to a reportable experience. More generally, according to Bloch's law (Bloch, 1885), the subjectively experienced strength of a stimulus depends on the mathematical product of stimulus duration and stimulus intensity. This means, in other words, that the temporal integral over stimulus intensity has to reach a given threshold to become subjectively experienced.
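Written as a formula, with symbols that are introduced here only for illustration ($I(t)$ for stimulus intensity, $T$ for stimulus duration, and $\theta$ for the threshold), the condition reads:

$$
\int_{0}^{T} I(t)\,\mathrm{d}t \;\geq\; \theta ,
\qquad\text{which for constant intensity } I \text{ reduces to}\qquad
I \cdot T \;\geq\; \theta .
$$

As a worked example under this reading, halving the intensity of a constant stimulus doubles the minimum duration required, since $T_{\min} = \theta / I$.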

In more recent experiments, the activation of different procedures that compete for becoming subjectively experienced has been studied. For example, Ansorge et al. (1998) performed masking experiments, where participants first learned to press a button when a circle was presented on a screen, but not when a square was shown. After learning was finished, in the critical experiment the circle was given for a short period (about 30 ms) which was then followed by a longer presentation of the square. The participants reported having seen only the square. Nonetheless, they pressed the button. This result can be interpreted in such a way that the procedure "stimulus circle – motor response" can be executed without being accompanied by subjective experience of the circle. The second procedure, "stimulus square – no motor response," apparently influences the first procedure by inhibiting the process leading to subjective experience. This is interpreted in the following way: each procedure shows temporal dynamics similar to those of a low-pass filter<sup>2</sup>. The motor command of a procedure can already be elicited after a smaller threshold has been reached, whereas a larger threshold is required to reach the state of subjective experience. Only in the latter state can the procedure inhibit other, competing procedures from reaching the state of subjective experience. In other words, procedures appear to be connected via a WTA network, where the inhibitory connections are only active when the procedural network has reached the (higher) threshold characterizing the state of subjective experience. Therefore, in the masking experiment the second procedure is not inhibited by the first one, which allows the square to become subjectively experienced.
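This interpretation can be made concrete with a small simulation in which each procedure is a first-order low-pass filter with two thresholds. The time constant, both threshold values, and the stimulus durations other than the 30 ms circle are assumptions chosen for illustration, not values fitted to the experiment.

```python
TAU_MS = 100.0                 # assumed time constant of each procedure
DT_MS = 1.0                    # simulation step
MOTOR_THRESHOLD = 0.2          # assumed lower threshold: motor command is elicited
EXPERIENCE_THRESHOLD = 0.8     # assumed higher threshold: subjective experience

def run_masking(circle_ms, square_ms):
    """Present a circle, then a square; return (motor_fired, experienced_stimulus)."""
    acts = {"circle": 0.0, "square": 0.0}
    motor_fired = False
    experienced = None
    for t in range(circle_ms + square_ms):
        drive = {
            "circle": 1.0 if t < circle_ms else 0.0,
            "square": 1.0 if t >= circle_ms else 0.0,
        }
        for name in acts:
            # First-order low-pass filter dynamics.
            acts[name] += (DT_MS / TAU_MS) * (drive[name] - acts[name])
        # Only the circle procedure is linked to the button press.
        if acts["circle"] >= MOTOR_THRESHOLD:
            motor_fired = True
        # The first procedure to reach the higher threshold blocks the other one.
        if experienced is None:
            for name in acts:
                if acts[name] >= EXPERIENCE_THRESHOLD:
                    experienced = name
                    break
    return motor_fired, experienced

print(run_masking(30, 300))    # masked circle
print(run_masking(300, 300))   # control with a long circle presentation
```

In the masked condition the circle procedure crosses only the lower threshold, so the button press is triggered while the square alone reaches the state associated with subjective experience. In the control condition the circle reaches the higher threshold first and blocks the square.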

These results lead us to the following view. There are specific neuronal states that require time to be developed. The basic function of the neural system, namely triggering the output signal (e.g., a motor command) can be performed without phenomenal experience, but at least some procedures may in addition be able to reach the latter state. After the neural network has reached this state, additional functions may arise, one, as mentioned, being to inhibit other procedures to reach this state. Other functions might be to allow the winning procedure to access more neuronal sources, and perhaps to allow faster storing of new information (e.g., for one-shot learning).

It would of course be extremely interesting to understand in detail the conditions that are necessary and sufficient for a neuronal network to reach the state that is accompanied by subjective experience. At this time, only speculation is possible concerning the character of such neuronal activities, although impressive progress has been made in recent years (see the reviews by Schier, 2009; Dehaene and Changeux, 2011). Continuing these research projects by combining neurophysiological with behavioral studies may lead to a better understanding of the physiological properties and functions of this state. But even if this were the case at some future time, we would not understand *why* this state is accompanied by the phenomenal aspect.

The results mentioned above support a non-dualist, or monist, view, which means that there are no separate mental and physical domains with causal influences from one domain to the other, as postulated by substance dualism. Rather, both "domains" appear to be different aspects of the same underlying phenomenon. We are just dealing with different levels of description<sup>3</sup>.

Adopting a monist view allows us to concentrate on the functional aspects when comparing systems endowed with the phenomenal aspect, i.e., human beings, with animals or artificial systems. According to this view, phenomenality is considered a property being directly connected with specific functions of the network. This means that mental phenomena that are characterized by phenomenal content as are, for example, attention, intention, volition, emotion, and consciousness, can be treated by concentrating on the aspect of information processing (Neisser, 1967). In particular with respect to Phenomenal Consciousness, Access Consciousness, and Metacognition, this view has convincingly been supported by Kouider et al. (2010) as well as, in a recent review, by Cohen and Dennett (2011). Therefore, we will compare properties of reaCog with current definitions found in the literature concerning those phenomena. In doing so, we have however to be aware of the possibility that important functional properties may not yet be taken into account by these definitions.

Following the monist view, the question as to how it is possible that a physical system is accompanied by subjective experiences, termed the "hard problem" by Chalmers (1996), can remain open and we may yet be able to understand the functional aspects of consciousness. A further consequence would be that even an artificial system would have some kind of subjective experience, if only the appropriate (yet unknown) neural dynamics were implemented (for ethical problems connected with this matter see Metzinger, 2009). On the other hand, it might be possible that systems exist where the functional aspects currently attributed to consciousness are given although these systems are not accompanied with phenomenality, because the networks show the functions of phenomena as listed in the following sections but do not show the neural dynamics required for phenomenality. In the following, we will first address briefly attention, volition and intention, and then deal with consciousness.

<sup>2</sup>A low-pass filter is characterized by an increase of output activation that, when excited by a constant stimulus, asymptotically approaches a given output value. Such low-pass filter dynamics are, for example, given by RNNs with attractor properties. In this case, the so-called harmony value (Rumelhart and McClelland, 1986) of the network can be used to characterize its state.

<sup>3</sup>There are various views adopting a monist approach differing in detail (epiphenomenalism, emergentism, property dualism and their many derivatives, see Vision, 2011). We will not take part in this discussion here.

## **ATTENTION, VOLITION, INTENTION**

Can we find properties corresponding to attention in reaCog? Attention concerns how perception is selected by bottom-up, i.e., sensory driven influences, or by top-down influences (Desimone and Duncan, 1995). The latter may depend on familiarity with the stimulus, or on internal (e.g., emotional) states. Concerning reaCog, there are, indeed, several cases to be observed.

The motivation unit network is especially designed to allow for competitions on different levels, in this way forming different clusters, or coalitions, of units. For example, the competition on a leg level selects between swing and stance movements. Stimulation by the ground contact sensor, for instance, changes the internal state from swing to stance. Activating the unit Stance means that sensory input relevant for stance, but not inputs relevant for swing can be perceived. Therefore, this case corresponds to bottom-up attention control.
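The leg-level competition can be sketched in a few lines. This is only an illustration under assumed names, not the Walknet code: the sensor labels, the `RELEVANT` table, and the rule that a rear-limit signal ends stance are hypothetical choices made for the example.

```python
# Hypothetical mapping from the active motivation unit to the sensory
# channels that are allowed through while that unit is active.
RELEVANT = {
    "swing": {"ground_contact", "obstacle_touch"},
    "stance": {"rear_limit_reached", "load"},
}


def update_leg(active_unit, sensors):
    """Return (new_active_unit, perceived_inputs) for one control step.

    Inputs not relevant to the active unit are filtered out before they
    can influence the state, which is the bottom-up gating described above.
    """
    perceived = {k: v for k, v in sensors.items() if k in RELEVANT[active_unit]}
    if active_unit == "swing" and perceived.get("ground_contact"):
        active_unit = "stance"   # ground contact switches swing to stance
    elif active_unit == "stance" and perceived.get("rear_limit_reached"):
        active_unit = "swing"    # assumed rule for ending the stance phase
    return active_unit, perceived
```

For instance, `update_leg("swing", {"ground_contact": True, "load": 0.7})` switches the leg to stance while the load signal, irrelevant during swing, is never perceived.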

On a more global level, behaviors different from walking or, within the context of walking, the direction (forward or backward) can be selected. Activation of these motivation units not only allows for selection of behavioral elements, but also provides a broader context according to which specific sensory inputs may be selected or not. In this sense, the motivation unit network can be considered to be a system allowing for top-down attention control. In the case of Navinet, for example, visual signals are only considered when they belong to the currently activated context defined by looking for a specific food source. The context might be changed when the food source is found to be empty.

Introduction of the cognitive expansion enables reaCog to invent new behaviors and to test them via internal simulation before executing them. In this layer, the WTA units of the cognitive expansion are arranged in accord with the motivation units in the lower layer. As this expansion of the reactive network allows the complete system, using psychological terms to describe its function, to "focus" or "concentrate" or "attend" on a specific behavior, we may also call this expansion an "attention controller<sup>4</sup> ."

This system represents a special type of top-down attention being used to select new procedures, normally not used in the current context. The decision to execute a new behavior as controlled by the attention controller will be called a cognitive decision in the following. This focusing mechanism may correspond to what sometimes has been termed "spot light" (Baars and Franklin, 2007). Thus, three types of attentional influences can be observed in reaCog. If the procedures controlled by the motivation units are equipped with the still unknown neural dynamics required for phenomenality, their content could reach the state of subjective experience.

Volition is a summary term denoting mechanisms allowing for voluntary actions. The latter are "actions that are not fully determined by the immediate stimulus situation but depend on mental representations of intended goals and anticipated effects" (Goschke, 2013). In other words, the behavior of the agent cannot be predicted by an external observer. Cognitive decisions made by reaCog are indeed based on anticipated effects, using internal simulation, and they follow a goal, as they aim to solve the problem at hand. These decisions contain a stochastic element, but are not arbitrary, because the proposed behavior is tested via internal simulation for feasibility before being executed and because the architecture of the WTA-net being connected to the body already represents a heuristic based on some kind of topological map (solutions near the morphological site of the problem are supported). Therefore, volition may be attributed to an agent controlled by reaCog, whereby, as above, the phenomenal aspect depends on the unknown conditions concerning the required neural dynamics.

Similarly, an agent controlled by reaCog might be attributed the capability of showing intentions. An action is controlled by intention if it is goal-directed. Pacherie (2006), referring to Bratman (1987), distinguishes between different types of intentions based on their temporal characteristics: future-directed intentions and present-directed intentions. Pacherie (2006) adds a third type, called motor-intentions. The latter two are characterized as guiding either "higher-level" or "lower-level" functions, respectively. According to Pacherie (2006), present-directed intentions, in contrast to motor-intentions, are considered to be under "conscious" or "rational" control. In our framework, we interpret this in such a way that motor-intentions act on the reactive level, whereas present-directed intentions require cognitive decisions. Future-directed intentions concerning long-term planning are not considered here. In any case, the basic underlying control structure is given by a feedback controller and/or by a feedforward controller containing explicit or implicit representations of the goal. However, the actual behavior may require a network for the control of many more parameters, including temporal aspects, as is the case in reaCog. According to Goschke (2013), intentions are "causal preconditions explaining why a particular stimulus triggers a particular action (rather than a different action)" (Goschke, 2013, p. 415). In other words, "intentions can be said to shape the "attractor landscape" of an agent's behavioral state space" (Kugler et al., 1990, ref. from Goschke, 2013, p. 415). Indeed, the motivation unit network is able to form such attractor states, for example, when in Navinet the agent has decided to visit a specific food source or the nest. Depending on the actual goal, the relevant behavior will be executed while specific sensory stimuli are attended or not. Therefore, the agent may be said to be endowed with intentions.

## **ACCESS CONSCIOUSNESS**

As mentioned earlier, our approach is *not* to start with theoretical concepts of consciousness (or attention) and then construct a network that is endowed with properties of consciousness. In contrast, our goal is to construct a network that, based on a reactive network, is able to control non-trivial reactive behavior, and shows cognitive abilities, i.e., is able to invent new solutions for a problem and to plan ahead. Only after having such a system available, we ask whether it may also be attributed with properties related with consciousness. Specifically, as we abstract from the phenomenal aspect, we will refer to Access Consciousness and Metacognition.

To begin with, we will focus on the question whether, in reaCog, we would find properties of Access Consciousness. This question would be of interest even if those authors were correct who argue that consciousness in the strict sense can only arise in systems showing the faculty of Metacognition (e.g., Lau and Rosenthal, 2011, for a recent review defending this view).

<sup>4</sup>Using a WTA network this way has been termed biased competition (see e.g., Bundesen et al., 2011).

The essential properties of Access Consciousness (e.g., Cleeremans, 2005) refer to the ability of a system to plan and guide actions, to report verbally on the content of the corresponding representations and to reason. In contrast, non-conscious representations cannot be used this way. As discussed in Section "Planning Ahead," planning ahead and guiding actions are indeed central properties of reaCog. An agent equipped with reaCog is able to, first, test a new idea by internal simulation ("probehandeln"), which will, when the test has been successful, then be used to guide the newly invented behavior. Concerning the third issue of Cleeremans's list, verbal report, we only briefly sketched here how reaCog may be equipped with the property to deal with (verbal) symbols allowing the agent to report on internal states and comprehend heard verbal expressions (Cruse, 2010). Steels (2007), Steels and Belpaeme (2005), and Narayanan (1997) have however studied in much detail how these properties may be incorporated in a network being based on reactive structures. Thus, at least in principle, reaCog could realize this property, too. Only the last issue from this list describing properties of Access Consciousness, symbolic reasoning, is clearly not addressed by reaCog.

### *Related work*

To illustrate in more detail to what extent reaCog shows properties of Access Consciousness, we compare reaCog with other related approaches. Dehaene and Changeux (2011) review the relevant models of networks that are supposed to simulate consciousness, including their own approach "global neural workspace" (GNW) (see also Seth, 2007 for a systematic summary). Of all models discussed by Dehaene and Changeux, GNW shows the largest overlap with reaCog. Therefore, in the following we will focus on a comparison with this approach.

Following the ideas of Baars and colleagues (e.g., Baars, 1988; Baars and Franklin, 2007), who, starting with an abstract conceptual approach, have developed the "global workspace" theory, Dehaene and Changeux continued these ideas developing a neural implementation of the GNW. Coarsely, this model consists of two parts, a number of specialized, automatic processes, considered non-conscious, and a second, upper-level part, to which properties of consciousness are attributed. The function of this "router" is to connect sensory and motor representations by variably connecting different automatic processes. Thereby, this "router" is responsible for amplifying and maintaining specific neural representations, making them consciously accessible. Due to the long distance connections the content of these representations can be globally "broadcasted" to many other processes in the brain.

Let us begin to address the basic differences between the GNW model and reaCog. The first one concerns the architectural details, in particular the granularity of the models. GNW operates with a large number of spiking neurons (two orders of magnitude more neurons than reaCog) simulating in detail membrane properties, ion channels and receptor potentials, like AMPA or NMDA receptors. In reaCog only very simple, piecewise linear, weighted summation units are used.

The GNW model consists of several layers connected via bottom-up and top-down channels. Elements of the uppermost layer are connected via mutual inhibition which leads to a competition between these elements (like in a WTA-net). A weak and/or short stimulus given to the lowest, input, layer elicits a short, decaying excitation of the upper layers. A strong and/or long stimulus may activate the top-down connections in such a way that long reverberating activity will occur showing long range synchronous oscillations. The former case is compared with nonconscious activity, the latter with conscious activity (In humans the latter is paralleled by specific oscillations in the gamma band and also marked by positive waves in event-related potentials, Dehaene and Changeux, 2011). As the elements of the uppermost layer, the "router," compete with each other, only one of these elements can be active (and reach the conscious state) at a given moment of time, whereas weaker stimulation may activate several lower-level elements in parallel, maintaining them in the non-conscious state.

In reaCog, we have only one layer of procedures the activity of which could realize different internal states. These states correspond to different contexts that can control the automatic, non-conscious behaviors. If a problem occurs, the attention controller selects and activates specific lower-level procedures by activating the corresponding motivation units which may form coalitions. Like in the upper layer of GNW, there is a competition based on lateral inhibition represented by the "attention controller," i.e., essentially the WTA-net. Thus, both models allow for serial (all or none) processing at this level. The event-related potentials, in humans paralleled with the occurrence of subjective experience, could by both approaches be explained by the strong activation of the inhibitory signals used for competition in the uppermost layer of the GNW model and the WTA-net in reaCog. Both models further agree with the requirement (Cleeremans, 2005) that in order to reach Consciousness a high strength of activation is needed to win the WTA competition. Furthermore, some time is required as in both models several iterations are necessary until a unique decision has been made. Therefore, access to these attended elements, represented by the upper layer in the GNW model or the WTA units of the attention controller in reaCog, is slower than the reactive or "automatic" activation of a module remaining in unattended state. An essential difference between both approaches is that, in reaCog, this WTA-net does not contribute to the phenomenal state directly, but only selects those procedures that may become conscious. In reaCog, phenomenal experience, if given at all, is accompanied with the corresponding activation of the procedures.
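Why a WTA competition needs several iterations, and why a strongly activated unit wins faster, can be seen in a small sketch. It uses a generic MAXNET-style update with piecewise linear units; this is a stand-in chosen for brevity, not the WTA-net of reaCog, and the function name `wta` and the value of `epsilon` are assumptions.

```python
def wta(initial, epsilon=0.2, max_iters=1000):
    """Run a MAXNET-style winner-take-all competition.

    Each unit is inhibited by the summed activation of all other units and
    clipped at zero (a piecewise linear unit). For the largest unit to
    survive, epsilon should stay below 1 / (number_of_units - 1).

    Returns (winner_index or None, iterations_used, final_activations).
    """
    acts = [max(0.0, a) for a in initial]
    for iteration in range(max_iters + 1):
        active = [i for i, a in enumerate(acts) if a > 0.0]
        if len(active) <= 1:
            winner = active[0] if active else None
            return winner, iteration, acts
        total = sum(acts)
        # Lateral inhibition: subtract a fraction of the other units' activity.
        acts = [max(0.0, a - epsilon * (total - a)) for a in acts]
    return None, max_iters, acts  # no unique winner (e.g., an exact tie)


if __name__ == "__main__":
    print(wta([0.5, 0.6, 0.55]))  # close competition, more iterations
    print(wta([0.1, 0.9, 0.1]))   # one clearly dominant unit, fewer iterations
```

In this sketch a close competition takes more update steps than one with a clearly dominant unit, which is the qualitative point made above about activation strength and decision time.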

Besides the difference with respect to granularity, the second crucial difference concerns the tasks to be dealt with. The task of the reactive part of reaCog is to control a complex body with 22 DoF – most of which concern redundant DoFs – able to walk over irregular surfaces including very large gaps (up to twice the size of a normal step length) as well as dealing with complex navigation tasks including path integration and landmark navigation. This is different from the GNW approach. A recent implementation of the GNW model, merging elements that have earlier been studied separately, is given by Zylberberg et al. (2011). The GNW model is equipped with the above-mentioned complex internal neuronal structure forming a realistic simulation of mammalian brain properties. As input, simulated visual or auditory signals are applied, whereas motor outputs are represented by simple go-nogo signals. As studied by Zylberberg et al. (2011), the GNW model concentrates on dual-task interference, i.e., the inability of human subjects to deal with two tasks, T1 and T2, at the same time. In one type of experiment, studying the psychological refractory period (PRP), first a stimulus S1 is given that triggers a task T1. Then a second stimulus, S2, triggering another task, T2, is provided. If S2 is presented before T1 has been finished, the execution of T2 is delayed until the first task is finished. In another type, the attentional blink experiment, a stimulus does not reach conscious awareness if it follows another stimulus too closely. In some masking experiments, the first stimulus is responded to although the person did not become aware of the appearance of this stimulus (e.g., Ansorge et al., 1998). The model of Zylberberg et al. (2011) is able to agree in quantitative detail with many experimental results.
Generally, these effects can be interpreted as basic properties of a WTA network with hysteresis properties, the effects depending on the time delay, the strength, and the duration of the stimuli. Therefore qualitatively they could also be found in a network like reaCog. However, no comparable quantitative simulation is possible due to the different granularity. Likewise, no statements can be drawn from reaCog simulations which are comparable with the interesting insights (Zylberberg et al., 2011) concerning the possible properties of oscillatory states.
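The PRP delay can be made concrete with a deliberately simple serial-bottleneck sketch. This is a textbook-style idealization, not the spiking model of Zylberberg et al. (2011) and not reaCog; the function name and the example durations are assumptions.

```python
def prp_response_time(soa_ms, t1_duration_ms, t2_duration_ms):
    """Response time to S2, measured from S2 onset, under a serial bottleneck.

    S1 appears at time 0 and occupies the single winner slot for
    t1_duration_ms. S2 appears soa_ms later; T2 can only start once the
    slot is free, so short stimulus onset asynchronies delay T2.
    """
    t1_end = t1_duration_ms
    t2_start = max(soa_ms, t1_end)  # wait if T1 is still running
    return t2_start + t2_duration_ms - soa_ms


if __name__ == "__main__":
    for soa in (100, 200, 300, 500):
        print(soa, prp_response_time(soa, 300, 200))
```

With an assumed T1 duration of 300 ms and T2 duration of 200 ms, the response time to S2 shrinks as the onset asynchrony grows and levels off at 200 ms once S2 arrives after T1 has finished.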

As reaCog is not equipped with spiking neurons, no long-distance phase synchrony can be observed. These events are sometimes assumed to form the neural correlates of consciousness. As an alternative, they may, however, be mere "technical" requirements necessary for binding spatially distributed neural elements, a function that in reaCog is represented by selection of the appropriate motivation units.

### *Global accessibility*

A notion tightly related to the above mentioned term "GNW," (e.g., Dehaene and Changeux, 2011), the term "unified neural workspace" (Dehaene and Naccache, 2001), and "global workspace" (Baars and Franklin, 2007), all postulated to characterize a prerequisite of conscious representations, concerns the latter as being "globally accessible" or "globally available" (Cleeremans, 2005). This means that many (but probably not all) of the representations stored in memory can become conscious representations, i.e., become available to be used for the solution of an actual problem (reaCog) or to be selected for a task (GNW). In contrast, nonconsciously used representations can only be used within their respective context.

To what extent can this aspect be represented by reaCog? If the agent is performing an automatic behavior, in this case walking on a not too strongly cluttered surface, the behavior can be driven by direct (and therefore fast) application of local modules belonging to the procedural memory. This is possible as long as no problem occurs. In such situations, the WTA-net of the attention controller is not activated, which means that these behaviors are performed, but not "cognitively attended." Therefore, the procedures are activated but are not elements of Access Consciousness, as they are not used for planning, for example. However, when a problem occurs, most elements of the procedural memory can, in principle, be accessed by the attention system (Norman and Shallice, 1986). In reaCog this refers to those procedural elements that receive an influence from the WTA units (**Figure 4**, dashed arrows). Recall that, due to the properties of the WTA-net, only one such element can be activated at a given moment in time. All these modules may therefore be described as being "globally accessible" and possible elements of Access Consciousness.

### *Relation between conscious and automatic procedures*

There is another interesting relation between properties of reaCog and findings in psychology that has, to our knowledge, not yet been addressed by the GNW approach. On a qualitative level it has long been known that we can learn new behaviors by treating them consciously, but with practice we become able to perform these behaviors more and more without conscious awareness being necessary (sometimes dubbed "downloading into the amphibian brain"). A similar process can be observed in reaCog: as long as learning a new solution has not yet reached a level where no significant errors occur, the problem detectors are still active and the corresponding behavior remains attended. If learning was successful, attention is no longer necessary and the new solution has become part of the procedural memory, i.e., of the reactive system<sup>5</sup>.

On the other hand, there are experimental results showing that, in human beings, conscious access to an element after learning has been finished may lead to problems. Beilock et al. (2002) have shown that well-trained athletes perform better when they are distracted from the task than when they concentrate on performing a well-trained behavior. In principle, this property could be found in reaCog, too. If a WTA unit of the attention controller is activated by any higher-level brain structures (not addressed in **Figure 4**), this influence may activate learning and therefore change, and possibly deteriorate, the properties of the neuronal module. If no such attention influence is active, the behavior may be performed in a perfect way.

### *Localizing access consciousness*

Finally, another difference between the simulation studies of, on the one hand, Dehaene and colleagues and Baars and colleagues and, on the other hand, reaCog, should be addressed. Whereas in the former approaches activities accompanied by consciousness are assigned to specific areas of the human brain, we stay neutral with respect to analogies between the structures of reaCog and the morphology of the human brain due to our extreme reduction to function. Instead, we could ask whether it would be possible to localize the properties of Access Consciousness anywhere within reaCog. Interestingly, there is no specific part to which the property of Access Consciousness might be attributed. Rather, the complete system consisting of procedural memory, the attention controller, and its ability to switch the motor output from controlling the body to controlling the body model, can be considered to correspond to the structure required for Access Consciousness or the "neural workspace." Its dynamics, as defined by Dehaene and Naccache (2001), is, in the model, essentially determined by the dynamics of the WTA-net. In our model the neural workspace does not form a separate "theater" where the content of the memory elements is re-represented. Instead, already existing modules of the procedural memory, being coupled via the loop through the model of the body and of the environment, together form the global workspace (which compares to the notion of "second-order embodiment," cf. Metzinger, forthcoming). reaCog is neither hierarchically structured nor is it strictly parallel, as the attention controller only selects the relevant processes. Therefore, reaCog should not be interpreted as a first-order model as defined by Lau and Rosenthal (2011), because the upper layer, the attention controller, is necessarily required.

<sup>5</sup>We have not yet implemented the learning procedure in reaCog.

### *Attention and consciousness*

Koch and Tsuchiya (2007) argue that there is attention without consciousness and consciousness without concurrent attention, which leads these authors to the conclusion that both phenomena result from different mechanisms. This statement, of course, depends on how attention and consciousness are defined. If we accept the hypothesis that phenomenal experience is based on specific neuronal dynamics, and the proposal made by reaCog that a stimulus is attended if specific motivation units are activated, then in reaCog both phenomena are, although functionally related, indeed subject to different mechanisms. Attention refers to the selection of the procedure, which may reach a conscious state if attended for a long enough time.

## **METACOGNITION**

The second essential domain of consciousness according to Block (1995, 2001) and Cleeremans (2005), Metacognition, or Reflexive Consciousness (sometimes called Metarepresentation), is characterized by Lau and Rosenthal (2011) as "cognition that is about another cognitive process as opposed to about objects in the world<sup>6</sup>."

Thus, when focusing on phenomenality, Metacognition can be described as referring to our ability not only to experience, but also to experience that we are experiencing. Correspondingly, when focusing on the execution of behavior, Metacognition refers to the ability of the metacognitive agent to select procedures to control behavior and, by doing so, to represent himself or herself ("I make the decision"). In other words, Metacognition requires the ability to observe one's own internal states from "above," or "from a bird's eye perspective." Metzinger (forthcoming) classifies this ability as third-order embodiment, where one's own body is "explicitly represented as existing" and the "body as a whole" can turn "into an object of self-directed attention." Cognitive systems like reaCog can mentally manipulate only objects of the world, including parts of their own body. These objects are manipulated relative to themselves, i.e., in an egocentric world. In contrast, a metacognitive system can, in addition, manipulate a representation of itself relative to the other objects. In other words, metacognitive systems can consider themselves as an object of the world, an ability which may be described as allowing for an allocentric view. reaCog is not equipped with this ability, i.e., reaCog is not equipped with Metacognition.

On a more detailed level, a metacognitive system is characterized by being able to exploit information concerning the quality of the procedure, for instance when selecting a procedure to control the behavior. A person may, for example, access his or her internal states and estimate how sure he or she is about a specific memory content, in order to use this knowledge for decision making. Exploiting stored confidence values is, as such, also possible for a system like reaCog, for example, when the activation of a motivation unit depends on a confidence or quality value. This is indeed the case for the network Navinet mentioned above, which is able to control ant-like navigation allowing for decisions on memory retrieval which depend on the salience of the stored stimulus (Cruse and Wehner, 2011; Hoinville et al., 2012). However, reaCog, extended by Navinet, is not able to represent itself as an element that is mentally manipulable as are other objects of the world, for example its legs. Cleeremans et al. (2007) describe an artificial neural network consisting of two networks. One, a first-order network, learns a specific input-output task, whereas the other, a second-order network, learns to estimate the quality of the performance of the first network. The authors claim that this system shows a limited form of Metarepresentation, because it represents not only knowledge *in* the system, but also knowledge *for* the system. Although this is a very interesting result, we hesitate to attribute Metacognition to such a system as it lacks, like reaCog, a representation of itself.

## **DISCUSSION AND CONCLUSION**

Thus, as a short summary, some of the properties attributed to Access Consciousness can be found in our network, at least in a basic form. Clearly missing is the ability of linguistic reasoning, whereas the introduction of verbal communication is only sketched. reaCog may therefore be considered a system that could provide a scaffold for a later system able to cover some basic aspects of consciousness concerning both Access Consciousness and, as addressed above, Metacognition, as long as we put aside the subjective aspect.

The question as to whether it is allowed after all to apply the term consciousness, but also terms such as attention, volition, and intention (and, not addressed here, emotion), to a simple, insect-based artificial system could be answered in two ways. Either these terms are defined as being strictly coupled to a system that is known to be endowed with an internal perspective. Then, according to current knowledge, these terms are only applicable to human beings, because only in this case do we have direct evidence for phenomenality to exist. If we, however, leave this condition open, we have to focus on the functional aspect, and search for corresponding properties also in systems other than human beings, including artificial systems. This approach is possible because we believe that the phenomenal aspect is always coupled to specific, yet unknown, properties of the neuronal system which, at the same time, has functional effects and shows subjective experience. In other words, adopting a monist view, we assume that we can circumvent the "hard" problem, i.e., the question concerning the subjective aspect of mental phenomena, without losing information concerning the possible function. Of course, we are not in a position to claim which of these structures, if any, are accompanied by phenomenality. If, however, the functions of a system, for example an artificial one, did indeed correspond well enough to those of the neuronal structures that are accompanied by phenomenality, the artificial system might have this property, too.

<sup>6</sup>Here, the term cognition is used in a more general way compared to the strict definition proposed by McFarland and Bösser (1993) and used in this article.

Following these arguments, we have presented a network that is based on a decentralized architecture consisting of procedural, or reactive, elements. The reactive network consisting of two subnets, Walknet and Navinet, characterized by a heterarchical structure, allows for selection of different behaviors, which includes protection against, in the current behavioral context, non-relevant sensory input, thus representing a kind of implicit attention control. As a next "evolutionary" step the network is equipped with a flexible internal body model allowing for internal simulation of behaviors. Together with the introduction of an attention controller, the complete network, termed reaCog, comprises the ability to plan ahead and to invent new behaviors in order to solve problems for which no solution is actually available. This capability allows the system to test possible adaptations of behavior by internal simulation before carrying them out in reality. In this way the system may circumvent hazardous situations. As such, this attention system cannot function by itself, but only, like a parasite, operates on top of the reactive structures. Following the definition of McFarland and Bösser (1993), the network, being based on reactive procedures and being capable of planning ahead, can be termed a *cog*nitive system, giving rise to its name reaCog.

The architecture applied here integrates often-discussed properties postulated to exist in neuronal systems, such as modularity, heterarchy, redundancy, cross-modal influences (e.g., path integration and landmark navigation in Navinet), bottom-up and top-down attention control, i.e., selection of relevant input data establishing priorities, as well as the application of internal models for prediction. The heterarchical structure used in reaCog comprises a simple realization of "neural reuse" as proposed in Anderson's (2010) massive redeployment hypothesis. Because some central structures, such as the motivation unit network and the body model, are realized as RNNs, the complete network forms a holistic system.

This architecture provides an example showing that functional concatenation of modules required for the control of complex behavior does not necessarily require explicit coding, but may emerge from local rules and the coupling through the environment. The latter is illustrated by implementing the network, as a first step, in a dynamic simulation of a 2 DoF wheeled robot (Navinet) and a 22 DoF hexapod robot. In a second step, its capabilities will be tested on the physical robot Hector (Schneider et al., 2011).

In this article, we particularly focus on the questions of to what extent aspects of consciousness may be attributed to this system and in which way consciousness may allow for the control of action. Following Block (1995) and Cleeremans (2005), there are two functional aspects of consciousness, Access Consciousness and Metacognition, when we, as argued above, leave Phenomenal Consciousness aside.

One function of Access Consciousness, as discussed here, is to allow the agent to become independent of the hard-wired reactive structure by which memory elements can only be selected within a given context. This is, for instance, required if a behavioral problem occurs, i.e., a situation not treatable by the existing system. In the state of Access Consciousness, the agent is able to plan ahead, and thereby to test new ideas, i.e., new combinations of elements of the procedural memory. These new ideas, when successfully tested by internal simulation, are used to guide the newly invented behavior.
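The planning step described above can be illustrated with a deliberately simplified sketch. It is not the authors' neural network implementation: the function name, the procedure labels, and the `simulate` callback are hypothetical, and a shuffled candidate list merely stands in for a noisy winner-take-all selection. The sketch only shows the control flow, in which procedures outside the current context are tried in internal simulation and only a successfully simulated one is passed on for real execution.

```python
import random
from typing import Callable, Optional, Sequence


def plan_by_internal_simulation(
    procedures: Sequence[str],
    context_procedures: Sequence[str],
    simulate: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a procedure from outside the current context that solves the
    problem in simulation, or None if no such procedure is found.

    All names are hypothetical; `simulate` stands in for testing a
    procedure on an internal body model without moving the real body.
    """
    rng = rng or random.Random(0)
    # Only procedures that the current context would NOT select are new ideas.
    candidates = [p for p in procedures if p not in context_procedures]
    # Random order is a stand-in for noisy winner-take-all selection.
    rng.shuffle(candidates)
    for candidate in candidates:
        if simulate(candidate):  # tested internally, not carried out
            return candidate
    return None


# Hypothetical example: only one out-of-context procedure resolves the problem.
all_procedures = ["swing_front_left", "stance_hind_right", "shift_hind_leg_back"]
in_context = ["swing_front_left"]
solution = plan_by_internal_simulation(
    all_procedures, in_context, simulate=lambda p: p == "shift_hind_leg_back"
)
print(solution)
```

Returning `None` when every simulation fails corresponds to the case in which the problem remains unsolved and behavior stays interrupted.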

The advantages for an agent endowed with properties of Access Consciousness come with drawbacks: (i) Controlling behaviors through a conscious state is slower than controlling them by reactive structures. (ii) Application of consciousness allows for inventing new behaviors, but, when activated during an ongoing reactively controlled action, might worsen the performance. Both properties can also be found in psychological experiments with human participants.

The architecture used here, which allows behavior to be controlled and shows properties of Access Consciousness, may also be suited to set the stage for the later introduction of neural structures that can function as neural representations of concepts, both averbal and verbal. However, here we concentrated on a specific domain, solving motor problems. Such problems cover an area that is less restricted than it might seem at first glance, as many problems, including abstract mathematical problems, can arguably be understood as being based on the ability to solve motor tasks (e.g., Lakoff and Núñez, 2000; Glenberg and Gallese, 2012). In addition to being concerned with motor control, reaCog might be confronted with situations that might be seen as belonging to perception and where attention may not be driven by the WTA system of the "attention controller." For instance, an unexpected stimulus may, in a bottom-up fashion, direct attention to a memory element that represents this kind of stimulus. Similarly, top-down attention is possible. The latter would, however, require further structures to represent the above-mentioned averbal or verbal concepts not yet introduced in reaCog.

Another aspect, not covered by the simple structure of reaCog, concerns incubation (Helie and Sun, 2010). Incubation might help when a problem is given for which no solution can currently be found. A sensible way out of such a deadlock might be to quit the current goal and introduce another one. Because, in the simple version of reaCog discussed here, internal simulation is only possible whilst the actual behavior is interrupted, switching the goal means that the problem as such would remain unsolved. Incubation describes the observation that humans, in contrast to reaCog, can apparently search for solutions even if other behaviors are active. Thus, a further challenge is to introduce structures that allow searching for solutions of open problems whilst the agent is performing other behaviors.

Apart from such specific shortcomings that arise when trying to compare a simple system like reaCog with fully conscious systems such as humans, a more general counterargument might be to consider Block's conceptualizations, which we use here as a scaffold for helping to understand consciousness, as basically misguided. Following this view, properties of reaCog might still be considered interesting, but of minor relevance for the discussion of what is meant by consciousness. One specific case is represented by authors who, as reviewed by Lau and Rosenthal (2011), restrict consciousness to Metacognition only, and are not prepared to attribute properties of consciousness to what is termed Access Consciousness by other authors. This view represents a challenge to expand reaCog so that it shows properties of Metacognition.

Metacognition, or reflexive Cognition, addresses the ability to deal with one's own mental states. A related aspect has been described by the term Theory of Mind, which characterizes the ability to attribute mental states (e.g., emotional states) to other agents as well (Premack and Woodruff, 1978). This has often been described as the ability to "step into the shoes of the other." In classical experiments, this capability is tested in the so-called Sally–Anne task. Two subjects are shown that a candy lying on the table is hidden under a black cover. Then one subject, Sally, has to leave the room whilst the candy is now hidden under the white cover, as observed by Anne. After Sally has come back, Anne is asked under which cover Sally will probably search for the candy. If Anne points to the black cover, she is assumed to have a Theory of Mind, but not if she points to the white cover, where the candy really is. Being endowed with the faculty of applying a Theory of Mind would allow the agent to better model the world when it contains not only mere physical objects but also other agents capable of operating with plans and intentions that are not directly observable. Thus, the ability to attribute a Theory of Mind, or mental states, to others allows the agent to better predict the behavior of the other. Two main alternative explanations are discussed as to how Theory of Mind is realized. The so-called theory–theory (Carruthers, 1996) assumes that there are (innate) procedures that allow for predicting the behavior of others. In contrast, simulation theory (Goldman, 2005) assumes that the agent has an internal model of him- or herself that can be used to represent the other, too. Via internal simulation (or "probehandeln"), this model can simulate the behavior of the other agent, based on the properties of the simulating agent. However, the two theories are not necessarily mutually exclusive. If we assume that reaCog is expanded by a network that allows it to use its own body model to represent another agent (see Cruse and Schilling, 2011, for a sketch of how such a network may be constructed), this model could be used for the simulation. If such a simulation has led to a new, successful interpretation of the behavior of the other, the result could be stored as a procedure, as described for reaCog when having learnt new solutions. In this way, the simulation result could be stored as part of the reactive memory, complementing the already existing innate procedures. The structure allowing for internal simulation may thus provide a tool for enriching the procedures usable to predict the behavior of others. In any case, the faculty to apply a Theory of Mind is clearly beyond the ability of reaCog, which allows for an egocentric view only.

## **ACKNOWLEDGMENTS**

We gratefully acknowledge support by the EC Project EMICAB (FP7-270182) and the Center of Excellence "Cognitive Interaction Technology" (EXC 277) (Malte Schilling) as well as by the Wissenschaftskolleg zu Berlin (fellowship to Holk Cruse). Further, we would like to thank Martin Carrier and Werner Schneider, both Bielefeld, for very helpful comments on an earlier version of the manuscript.

## **REFERENCES**

Bloch, A. M. (1885). Expérience sur la vision. *Paris: Soc. Biol. Mem.* 37, 493–495.

*Research,* Vol. 150, ed. S. Laureys (Elsevier), 81–98.


P. Bourgine, M. Dorigo, and R. Doursat (Cambridge, MA: MIT Press), 185–192.



a theory of language acquisition, comprehension, and production. *Cortex* 48, 905–922. doi:10.1016/j.cortex.2011.04.010


*(Regul. Ed.)* 15, 365–373. doi:10.1016/j.tics.2011.05.009


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2013; accepted: 17 May 2013; published online: 18 June 2013.*

*Citation: Cruse H and Schilling M (2013) How and to what end may consciousness contribute to action? Attributing properties of consciousness to an embodied, minimally cognitive artificial neural network. Front. Psychol. 4:324. doi: 10.3389/fpsyg.2013.00324*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Cruse and Schilling. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The calcium wave model of the perception-action cycle: evidence from semantic relevance in memory experiments

## **Alfredo Pereira Jr \*, Rafael Peres dos Santos and Rafael Fernandes Barros**

Department of Education, Biosciences Institute, São Paulo State University, Botucatu, São Paulo, Brazil

#### **Edited by:**

Ezequiel Morsella, San Francisco State University; University of California San Francisco, USA

#### **Reviewed by:**

Ezequiel Morsella, San Francisco State University; University of California San Francisco, USA Donelson Edwin Dulany, University of Illinois, USA

#### **\*Correspondence:**

Alfredo Pereira Jr, Department of Education, Biosciences Institute, São Paulo State University, Campus Rubião Jr, Botucatu, 18618-970 São Paulo, Brazil. e-mail: apjmaop@ig.com.br

We present a general model of brain function (the calcium wave model), distinguishing three processing modes in the perception-action cycle. The model provides an interpretation of the data from experiments on semantic memory conducted by the authors.

**Keywords: learning, memory, consciousness, perception, action, repetition, relevance**

## **INTRODUCTION**

Brain information processing and the control of action can occur in three modes: automatic, unconscious, and conscious (**Figure 1**). Automatic and unconscious processes are often conflated, but several recent results about complex and flexible unconscious processing – not reviewed here – have helped to disentangle them. The identification of mechanisms underlying the conscious mode has been a major challenge. What makes some mental processes conscious? The *calcium wave model* (Pereira and Furlan, 2010; Pereira, 2012) relates conscious action control with the presence of large ionic waves in astroglial networks of the brain, feeding back on the neuronal networks that prompt them.

In the model, dynamical information patterns are available in the environment of the conscious agent. They are received and processed, and the products can be used to guide action in the same environment. Linguistic, spoken, and/or written actions require a complex coordination of muscles that would not be possible without conscious processing (Morsella, 2005). The conscious mode requires, according to the model, the formation of an endogenous, positive or negative feedback that corresponds to current views of conjoint "bottom-up" and "top-down" activations, as in Adaptive Resonance Theory (Carpenter et al., 1992). The model relates such a "resonance" to reciprocal neuronal and astroglial network activations mediated by tripartite synapses in different intensities, corresponding to degrees of consciousness (Carrara-Augustenborg and Pereira, 2012).

## **EVIDENCE FOR A ROLE OF ASTROGLIAL CALCIUM WAVES IN CONSCIOUS PROCESSING**

There is currently a good understanding of how astrocytes locally modulate neuronal function (De Pittà et al., 2011; Takata et al., 2011), reinforcing or depressing activity of post-synaptic neurons according to (still unknown) relevance filters. Pereira and Furlan (2010) have proposed a model of brain mental functions that relates large-scale calcium ion waves in the astroglial network with the "top-down" signal that modulates neuronal networks. Furthermore, these waves would correspond to the broadcasting of feelings about the content of the information processed by neurons. It is claimed that conscious processing occurs only with the generation of such feelings; otherwise (i.e., without feelings), the cognitive processing is unconscious. The relation of these waves with conscious processing is well documented by Thrane et al. (2012), who showed that commonly used general anesthetics selectively suppress astrocyte calcium waves.

The generation of large-scale calcium waves begins with neuronal synchronization (Pereira and Furlan, 2009), producing a "carousel effect," i.e., the neuronal induction of astroglial calcium movements (Pereira and Furlan, 2010; see also Ingber, 2012) simultaneously at many locations in the brain. The resulting calcium wave is both an integration of spatially distributed neuronal information and an affective reaction to the received information. This wave spreads in cortical tissue (Kuga et al., 2011; Navarrete et al., 2012) possibly by means of a "domino effect" (Pereira and Furlan, 2010), and feeds back on neurons, reinforcing or depressing their activity (as definitely proved by Han et al., 2013), probably according to the valence of the feeling (i.e., if the information content is experienced as being good, there is a positive feedback and neuronal activity is reinforced; and if it is experienced as being bad, there is a negative feedback and the activity is depressed).

Here we use this model to interpret empirical results on memory formation, the kind of results that have appeared in textbooks of cognitive psychology but have never been interpreted in light of a calcium wave model.

## **EXPERIMENTS**

Learning can be reinforced by means of two factors: *repetition* of stimulation and *semantic relevance* of the stimulus. In cognitive neurobiology, these strategies correspond respectively to *temporal summation* of stimuli and *spatial summation* induced by the matching of bottom-up (sensory) and top-down (attentional/motivational) signals.

We executed a series of cognitive experiments addressing the possible roles of stimulus repetition and semantic relevance in the formation of short-term declarative (conscious) memory (see Marques et al., 2010; Barros et al., 2011). A population of 157 undergraduate students was presented with linguistic stimuli of two kinds: unrepeated sentences containing information relevant (e.g., about fellowships and sports) or not to their lives, and repeated sentences with irrelevant information only (e.g., about events in distant small towns). The relevance or irrelevance of the sentences for the target population was previously checked by means of piloting.

After a brief, sequential presentation of the sentences using a screen projector, the students were asked to answer a written questionnaire containing one question about each sentence. The results indicate a within-subjects effect: unrepeated relevant information was more efficient for semantic memory formation than repeated irrelevant information (**Figure 2**). Control unrepeated sentences conveying irrelevant information were poorly remembered.

## **DISCUSSION**

The above results can be understood in terms of the calcium wave model of conscious action control (there are, of course, other models that would be consistent with the results). According to the model, our results can be understood as an effect of astroglial modulation of neuronal activity: the triggering of a strong endogenous positive feedback by relevant information contents, but not by non-relevant ones (these would elicit a weaker positive feedback, or even a negative one).

In sum, comparing the effect of presentations of relevant-and-unrepeated against irrelevant-and-repeated sentences within-subjects, our model predicts a difference in degree and valence of calcium wave activation. On the one hand, single presentations of relevant sentences would elicit *a stronger, positively valued astrocyte calcium wave* that reinforces neuronal activity, leading to an increase of calcium ion entry in the post-synaptic neuron. These ions possibly bind to calmodulin and related kinase proteins, activating signaling pathways that support memory formation. On the other hand, repeated presentations of boring information would lead to *a weaker, negatively valued wave* that does not produce such a reinforcement, and in some cases possibly leads to an inhibition of the corresponding neuronal receptors by means of glial transmitters. The latter possibility would explain why some irrelevant sentences presented five times were less remembered than other irrelevant sentences presented three times (for details of experiments and statistical analysis of the results, see Marques et al., 2010 and Barros et al., 2011).

We hope that this non-mainstream model of consciousness, along with our presentation of the kind of data that could be used to support such a model, will spur new ways of theorizing about the challenging topic of consciousness and action control.

## **ACKNOWLEDGMENTS**

FAPESP (Brazilian funding agency) and two anonymous reviewers for their important contributions.

## **REFERENCES**

*Processing*, eds G. A. Carpenter, and S. Grossberg (Cambridge: The MIT Press), 365–384.


of neocortical processing. *Cognit. Comput.* 4, 38–50.


neuron-astrocyte interactions and perceptual conscious processing. *J. Biol. Phys.* 35, 465–481.


anesthesia selectively disrupts astrocyte calcium signaling in the awake mouse cortex. *Proc. Natl. Acad. Sci. U.S.A.* 109, 18974–18979.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 March 2013; accepted: 15 April 2013; published online: 06 May 2013.*

*Citation: Pereira A Jr, Santos RPd and Barros RF (2013) The calcium wave model of the perception-action cycle: evidence from semantic relevance in memory experiments. Front. Psychol. 4:252. doi: 10.3389/fpsyg.2013.00252*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Pereira Jr, Santos and Barros. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The effects of alerting signals in masked priming

#### *Rico Fischer <sup>1</sup> \*, Franziska Plessow <sup>1</sup> and Andrea Kiesel <sup>2</sup>*

*<sup>1</sup> Department of Psychology, Technische Universität Dresden, Dresden, Germany*

*<sup>2</sup> Department of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Sachiko Kinoshita, Macquarie University, Australia T. Andrew Poehlman, Southern Methodist University, USA*

#### *\*Correspondence:*

*Rico Fischer, Department of Psychology, Technische Universität Dresden, Zellescher Weg 17, 01069 Dresden, Germany e-mail: rico.fischer@tu-dresden.de*

Alerting signals often serve to reduce temporal uncertainty by predicting the time of stimulus onset. The resulting response time benefits have often been explained by facilitated translation of stimulus codes into response codes on the basis of established stimulus-response (S-R) links. In paradigms of masked S-R priming alerting signals also modulate response activation processes triggered by subliminally presented prime stimuli. In the present study we tested whether facilitation of visuo-motor translation processes due to alerting signals critically depends on established S-R links. Alerting signals resulted in significantly enhanced masked priming effects for masked prime stimuli that included and that did not include established S-R links (i.e., target vs. novel primes). Yet, the alerting-priming interaction was more pronounced for target than for novel primes. These results suggest that effects of alerting signals on masked priming are especially evident when S-R links between prime and target exist. At the same time, an alerting-priming interaction also for novel primes suggests that alerting signals also facilitate stimulus-response translation processes when masked prime stimuli provide action-trigger conditions in terms of programmed S-R links.

**Keywords: temporal predictability, alerting signal, accessory, masked priming, action-trigger, target primes, novel primes**

## **THE EFFECTS OF ALERTING SIGNALS IN MASKED PRIMING**

Task-irrelevant acoustic signals that precede an imperative visual target stimulus by a few hundred milliseconds (e.g., 200–1000 ms) have been demonstrated to improve performance, typically reflected in speeded responses (Niemi and Näätänen, 1981). The presence of such an alerting signal can be utilized as a readiness signal predicting the temporal onset of the forthcoming stimulus and thus reducing temporal uncertainty by attentional focusing. At the same time, alerting signals also elicit a brief surge of arousal that non-specifically primes low-level motor pathways (Sanders, 1983).

Accordingly, much research has demonstrated that beneficial effects of alerting signals occur on various levels of information processing, including perceptual encoding and sensory information accumulation (Bausenhart et al., 2010; Seibold et al., 2011), early response selection processes (Hackley and Valle-Inclán, 1999) and/or motor execution processes (Miller et al., 1999; Kiesel and Miller, 2007; Thomaschke and Dreisbach, 2013).

On a more general level, the functional role of alerting signals may be to support the cognitive system in adapting behavior to an expected event by increasing unspecific alertness and motor readiness and by inducing a bias toward stronger reliance on reflex-like habitual behavior (Fischer et al., 2013). This assumption is captured in the recently proposed facilitated response activation account of alerting signals, suggesting that alerting signals facilitate automatic translation of stimulus codes into response codes (Fischer and Plessow, in revision; Fischer et al., 2010, 2012). In particular, it is argued that alerting signals lead to a more efficient transmission of perceptual information of the expected stimulus into corresponding motor codes. In line with assumptions of increased information transmission efficiency, recent findings show that the presence of alerting signals reduces neural activity in the primary visual cortex (Fischer et al., 2013). More specifically, alerting signal effects of facilitated behavioral responses correlated with a reduction in the neural activity in the primary visual cortex. Thus, expectation of a sensory input reduces the neural effort needed to process this visual stimulus (Alink et al., 2010). Therefore, information transmission from lower to higher cortices is achieved with less neural activation (Rao and Ballard, 1999), which is in line with an assumed beneficial alerting-signal-based visuo-motor translation.

This facilitated visuo-motor translation by alerting signals might be based on direct (i.e., learned) S-R links that are established by responding to a stimulus and thus actively associating a particular stimulus or stimulus feature with the corresponding motor response (Neumann and Klotz, 1994; Klapp and Haas, 2005). Currently there is some evidence that alerting signals impact on these types of S-R links (see below), whereas clear impact of alerting signals on visuo-motor translation without direct S-R links is to date lacking. Evidence for facilitated response activation on the basis of direct S-R links can be found in often reported alerting-congruence interactions when alerting signals are incorporated in conflict paradigms (e.g., Simon, Eriksen flanker).<sup>1</sup> In such paradigms conflict occurs when relevant and irrelevant information activate different response alternatives (incongruent trials) compared to the activation of the same response alternative (congruent trials). Consequently, response conflicts, for example, reflect competition between simultaneously activated response codes. In this context, the presence of alerting signals is assumed to facilitate automatic stimulus-response translation processes for relevant and for irrelevant stimulus attributes, resulting in increased interference effects between simultaneously active response codes (e.g., Fischer et al., 2010; Böckler et al., 2011). In a recent electrophysiological study, for example, Böckler et al. (2011) found that an alerting signal increased the amplitude of the lateralized readiness potential (LRP) for the incorrect response in incongruent trials, which has been taken as direct evidence that alerting signals facilitate visuo-motor response activation.

<sup>1</sup>At present it is debated whether alerting-congruence interactions can also be found in Stroop paradigms, in which relevant and irrelevant information are included into a single object representation (Fischer and Plessow, in revision; Weinbach and Henik, 2012).

Importantly, in a previous behavioral study we demonstrated that facilitation of visuo-motor translation due to alerting signals was only observed when direct stimulus-response links existed. In a word-variant of the Eriksen flanker task (Shaffer and LaBerge, 1979; Fischer and Schubert, 2008) increased interference due to alerting signals was found only for flanker items that were included in the response set and thus contained direct stimulus-response associations. Distracter words that were not part of the response set revealed semantic conflict that was, however, not affected by alerting signals (Fischer et al., 2012).

The beneficial effects of alerting signals on visuo-motor translation processes can also be found for response activation processes triggered by subliminally presented (masked) stimuli (Fischer et al., 2007). For example, in a masked priming paradigm (Vorberg et al., 2003), participants were asked to respond to left or right pointing arrows. Unbeknownst to the participants, target arrows were preceded by masked prime arrows that also pointed toward the left or right side and thus formed congruent or incongruent prime-target relations when pointing in the same direction as, or the opposite direction to, the target arrow, respectively. Alerting signals were presented in various random (Experiment 1) or blocked (Experiment 2) foreperiod intervals prior to the prime-target pair. Importantly, alerting signals facilitated visuo-motor response activation processes triggered by the visual stimuli. As a consequence, enlarged masked priming effects were observed especially when alerting signals preceded the target arrow by at least 250 ms, compared to conditions with shorter foreperiods or conditions without alerting signals.

Importantly, prime arrows were able to subconsciously activate stimulus-response links. Alerting signals served to increase this prime-triggered response activation. More specifically, alerting signals facilitated transmission of information along the established stimulus-response links. Because recent data suggested that in conflict tasks increased effects due to alerting signals depend on existing stimulus-response links (Fischer et al., 2012), in the present study we aimed to extend these findings by further testing and specifying the stimulus-response link dependency.

In a masked number comparison task, in which participants categorize target digits, for example, as smaller or larger than five (Dehaene et al., 1998; Naccache and Dehaene, 2001; Kunde et al., 2003; Reynvoet et al., 2005; Kiesel et al., 2006a, 2007b; Van den Bussche et al., 2009; Fischer et al., 2011), two sets of prime stimuli can be included. First, primes that also appear as target stimuli are referred to as "target primes" (e.g., the digits 1, 4, 6, and 9). Stimulus-response links are established whenever a target number is responded to with a specified response key (e.g., digits larger than five are assigned to the right response). These response activation processes on the basis of stimulus-response links are triggered when the same target stimuli serve as masked primes in other trials (Neumann and Klotz, 1994; Damian, 2001). Second, prime stimuli that never serve as target stimuli and are therefore never responded to are called "novel primes". Importantly for the aim of the present study, these novel primes do not contain established direct stimulus-response links. In fact, some researchers assume that novel primes elicit semantic processing (Naccache and Dehaene, 2001; Reynvoet et al., 2005; see Van den Bussche et al., 2009 for an overview). The differential reliance on established direct stimulus-response links for target and novel primes may account for observed differences in processing triggered by these prime types. For example, masked priming by novel primes has been shown to be smaller in size (Naccache and Dehaene, 2001), to depend on task conditions (Kiesel et al., 2006a; Pohl et al., 2010; Fischer et al., 2011), and has been reported to display a different time course (Kinoshita and Hunt, 2008; Finkbeiner and Friedman, 2011).

In the present study we aim to extend and further test the assumption that the presence of alerting signals affects visuo-motor translation particularly on the basis of established S-R links and not on the basis of semantic processing (Fischer et al., 2012). For this, we implemented a different paradigm than in Fischer et al. (2012), i.e., a masked priming paradigm including target and novel prime stimuli that are known to differ with respect to the involvement of established direct stimulus-response links. If alerting signals exclusively facilitate visuo-motor response activation on the basis of established direct S-R links, alerting signals should increase masked priming effects for target primes but not for novel primes.

## **EXPERIMENT 1**

The aim of Experiment 1 was to test whether alerting signals affect response activation processes triggered by target primes that include S-R links (see also Fischer et al., 2007) and response activation processes triggered by novel primes. For this, we included an alerting signal (present vs. absent) in a masked number priming task (Naccache and Dehaene, 2001), in which the numbers 1, 4, 6, and 9 served as targets and as target primes, whereas the enclosed numbers 2, 3, 7, and 8 functioned as novel primes.

## **METHOD**

#### *Participants*

Thirty-two students of the Technische Universität Dresden (24 female, 21–35 years; mean age ± SD, 25.0 ± 2.8 years) participated in the study for partial course fulfillment or €5 payment. All participants had normal or corrected-to-normal vision and were naive about the hypothesis of the experiment.

## *Apparatus and stimuli*

Stimulus presentation and collection of responses were performed by an IBM-compatible computer with a 17-inch VGA display. Participants responded by pressing the "X" and "," keys of a standard QWERTZ keyboard with the left and right index finger, respectively. Stimulus presentation and data recording were realized using Presentation software (Version 0.71, Neurobehavioral Systems). Stimulus presentation was synchronized with the vertical retraces of a 70-Hz monitor, resulting in a refresh interval of approximately 14 ms. Two sets of stimuli were used, presented in white on a black background. The numbers 1, 4, 6, and 9 served as prime and as target stimuli (target primes), whereas the numbers 2, 3, 7, and 8 were never presented as targets and thus served as prime stimuli only (novel primes). Out of a set of fourteen masks, each consisting of a random string of seven upper- and lower-case letters chosen from the whole alphabet (e.g., TsPLqaF), one was randomly selected to serve as pre-mask. From the same set another mask was randomly selected to serve as post-mask. With a viewing distance of about 60 cm, the visual angle extended to 0.38° × 0.76° for prime and target stimuli and to 3.34° × 0.76° for masks. A tone of 700 Hz frequency served as alerting signal and was presented binaurally via headphones.

## *Procedure*

Participants were asked to perform a size judgment task (smaller or larger than 5) on numbers between 1 and 9, excluding 5, responding with the left index finger to numbers smaller than five and with the right index finger to numbers larger than five. A masked prime stimulus preceded the target number. Prime and target formed a congruent relation when both numbers fell on the same side of five. In the incongruent condition, prime and target fell on opposite sides of five. In order to prevent prime visibility, the prime stimulus was embedded between two masks, each consisting of a random letter string, e.g., WLulMBa (see Dehaene et al., 1998).

Trials without an alerting signal started with the presentation of a fixation cross for 1100 ms, which was followed by a pre-mask for 71 ms. Subsequently, a prime stimulus was shown for 43 ms and was immediately masked by a post-mask for 57 ms. Finally, a target number was presented for 200 ms. If a response exceeded 1800 ms (beginning at target onset) or if the wrong response was given, the feedback "too slow" or "error" was presented for 300 ms. A correct response was followed by the fixation cross for another 300 ms. Following feedback, the fixation sign was presented in a random response-stimulus interval (RSI) that varied in steps of 100 ms in the range between 1100 and 2000 ms. In half of the trials an alerting signal was presented 250 ms prior to the pre-mask. Instructions emphasized speed and accuracy of responding equally.
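The stimulus durations reported for this procedure are close to whole multiples of the refresh interval of the 70-Hz display. The following Python sketch checks this mapping. It assumes the nominal refresh rate of exactly 70 Hz; the frame counts are inferred here and are not stated in the text, and the function names are hypothetical.

```python
def frames_for(duration_ms: float, refresh_hz: float) -> int:
    """Return the whole number of refresh cycles closest to duration_ms."""
    frame_ms = 1000.0 / refresh_hz
    return round(duration_ms / frame_ms)


def duration_for(frames: int, refresh_hz: float) -> float:
    """Return the duration in ms of a given number of refresh cycles."""
    return frames * 1000.0 / refresh_hz


REFRESH_HZ = 70.0  # nominal refresh rate reported for Experiment 1

# Durations as reported in the procedure (ms).
reported_ms = {"pre-mask": 71, "prime": 43, "post-mask": 57, "target": 200}

for name, ms in reported_ms.items():
    n = frames_for(ms, REFRESH_HZ)
    print(f"{name}: reported {ms} ms, {n} frames = {duration_for(n, REFRESH_HZ):.1f} ms")
```

Under these assumptions the pre-mask, prime, and post-mask correspond to 5, 3, and 4 refresh cycles, respectively.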

The experiment consisted of 768 trials presented in 12 blocks separated by brief pauses. Each block comprised 64 trials corresponding to a combination of Novel or Target prime (4 + 4) × Target (4) × Alerting signal (2). The experiment was preceded by 16 practice trials.
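The block structure described above can be reproduced by crossing the design factors. The sketch below is only an illustration of the factorial combination (8 primes × 4 targets × 2 alerting conditions); the dictionary layout and function names are hypothetical, and the randomization of trial order within a block is not shown.

```python
from itertools import product

TARGETS = (1, 4, 6, 9)
TARGET_PRIMES = (1, 4, 6, 9)   # primes that also appear as targets
NOVEL_PRIMES = (2, 3, 7, 8)    # primes that never appear as targets
ALERTING = (True, False)       # alerting signal present vs. absent
N_BLOCKS = 12


def congruence(prime: int, target: int) -> str:
    """Congruent when prime and target fall on the same side of five."""
    same_side = (prime < 5) == (target < 5)
    return "congruent" if same_side else "incongruent"


def build_block() -> list:
    """Return one block containing every prime x target x alerting combination."""
    trials = []
    for prime, target, alert in product(TARGET_PRIMES + NOVEL_PRIMES, TARGETS, ALERTING):
        trials.append({
            "prime": prime,
            "prime_type": "target" if prime in TARGET_PRIMES else "novel",
            "target": target,
            "alerting_signal": alert,
            "congruence": congruence(prime, target),
        })
    return trials


block = build_block()
print(len(block), len(block) * N_BLOCKS)  # 64 trials per block, 768 in total
```

In this crossing, half of the trials in each block are congruent, and 8 of the 64 trials are exact prime-target repetitions (a target prime identical to the target).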

After the priming experiment participants were fully informed about the presence of the prime stimuli. We conducted a signal detection experiment in which participants were asked to discriminate whether a prime was smaller or larger than five. Participants were instructed to respond at leisure and to prioritize accuracy over speed. To avoid the possibility of unconscious priming influencing the free response choice (Schlaghecken and Eimer, 2004; Kiesel et al., 2006a,b), we included an interval of 1000 ms after target onset, in which in case of an executed response the feedback "too fast" was provided (adopted from Vorberg et al., 2003).

## **RESULTS**

## *Prime visibility*

To assess prime visibility, we computed the signal detection measure *d*′, whereby primes smaller than 5 were treated as signal. Overall discrimination for primes was *d*′ = 1.70 and deviated from zero, *t*(31) = 13.20, *p* < 0.001. Discrimination performance was better for novel than for target primes, *t*(31) = 8.73, *p* < 0.001; it amounted to *d*′ = 2.26 for novel primes and *d*′ = 1.26 for target primes. Due to the high prime visibility, we further investigated whether target and novel priming effects were related to prime visibility. To this end, we conducted a regression analysis as proposed by Draine and Greenwald (1998; see also Greenwald et al., 1995). We calculated a priming index for each participant and prime type: priming index = 100 × (RT incongruent − RT congruent)/RT congruent. Individual target and novel priming indices were regressed onto the individual *d*′ values for target and novel primes, respectively. No correlation was found between *d*′ and the corresponding target priming effects, *r* = −0.153, *p* = 0.403, or novel priming effects, *r* = 0.217, *p* = 0.234. Similarly, none of the correlations was significant (all *p*s > 0.147) when considering target and novel prime indices separately for alerting signal present vs. alerting signal absent. These findings show that, despite the high visibility values, the size of target and novel priming effects did not seem to depend on prime visibility.
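The two quantities entering the regression analysis can be sketched as follows. This is a minimal illustration rather than the analysis code of the study: the function names are hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption introduced here to keep the z-transform finite; the text does not state whether or how extreme rates were corrected.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index: z(hit rate) - z(false-alarm rate).

    Primes smaller than 5 count as 'signal', so a hit is a 'smaller'
    response to a prime < 5 and a false alarm is a 'smaller' response
    to a prime > 5. The +0.5 correction is an assumption of this sketch.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def priming_index(rt_incongruent: float, rt_congruent: float) -> float:
    """Priming effect as a percentage of the congruent RT."""
    return 100.0 * (rt_incongruent - rt_congruent) / rt_congruent


# Illustration with the overall mean RTs of Experiment 1 (456 vs. 434 ms);
# in the study the index was computed per participant and prime type.
print(round(priming_index(456, 434), 2))  # about 5.07
```

Expressing the priming effect relative to the congruent RT makes the index comparable across participants who differ in overall response speed.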

## *Priming task*

For the RT analyses, all error trials and trials following an error were discarded (8.7%). Furthermore, all trials meeting the outlier criterion (RTs < 150 ms or > 1200 ms) were also excluded from analyses (0.1%). Prior to the error analysis, only trials following an error were eliminated. Repeated measures ANOVAs were conducted on mean RTs and percent error containing the factors Alerting signal (present, absent), Congruence (C vs. IC) and Prime-type (target vs. novel primes). Results are presented in **Figure 1**.
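The trial selection for the RT analysis can be sketched as a single pass over the trials in presentation order. The data layout (a list of dictionaries with `rt` and `correct` keys) and the function name are hypothetical, and the sketch assumes that the "trial following an error" rule is applied across the whole sequence; whether it was reset at block boundaries is not stated in the text.

```python
def select_rt_trials(trials: list, rt_min: float = 150, rt_max: float = 1200) -> list:
    """Keep correct trials that do not follow an error and are not RT outliers.

    trials: dicts in presentation order with keys 'rt' (ms) and 'correct' (bool).
    """
    kept = []
    previous_correct = True  # the first trial has no predecessor
    for trial in trials:
        is_outlier = trial["rt"] < rt_min or trial["rt"] > rt_max
        if trial["correct"] and previous_correct and not is_outlier:
            kept.append(trial)
        previous_correct = trial["correct"]
    return kept


# Hypothetical example sequence (not data from the study).
example = [
    {"rt": 400, "correct": True},
    {"rt": 500, "correct": False},   # error trial: dropped
    {"rt": 450, "correct": True},    # follows an error: dropped
    {"rt": 100, "correct": True},    # faster than 150 ms: dropped
    {"rt": 1300, "correct": True},   # slower than 1200 ms: dropped
    {"rt": 430, "correct": True},
]
print([t["rt"] for t in select_rt_trials(example)])  # [400, 430]
```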

## *RT*

Responses were faster when an alerting signal was present (441 ms) than when it was absent (450 ms), *F*(1, 31) = 32.09, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.51. Prime stimuli shortened RTs in congruent prime-target relations (434 ms) compared to incongruent prime-target relations (456 ms), *F*(1, 31) = 115.76, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.79. This priming effect was not differentially affected by the factor prime-type, *F*(1, 31) = 1.54, *p* = 0.224, η<sub>p</sub><sup>2</sup> = 0.05. However, the priming effect was increased by the presence of an alerting signal, *F*(1, 31) = 23.93, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.44. This increase was stronger for target primes than for novel primes, as indicated by the significant three-way interaction between Alerting signal, Congruence, and Prime-type on RTs, *F*(1, 31) = 5.23, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.14.
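Because each factor in this design has only two levels, the three-way interaction can equivalently be expressed as one contrast value per participant: the alerting-related increase of the priming effect for target primes minus the same increase for novel primes. The interaction *F*(1, *n* − 1) then equals the squared one-sample *t* statistic of that contrast. The Python sketch below illustrates this equivalence; the data layout and function names are hypothetical, and it is not the analysis code of the study.

```python
from math import sqrt
from statistics import mean, stdev


def priming_effect(cell: dict, prime_type: str, alerting: bool) -> float:
    """Incongruent minus congruent mean RT for one prime type and alerting condition."""
    return cell[(prime_type, alerting, "IC")] - cell[(prime_type, alerting, "C")]


def three_way_contrast(cell: dict) -> float:
    """Alerting-related increase in priming for target minus novel primes."""
    target_gain = priming_effect(cell, "target", True) - priming_effect(cell, "target", False)
    novel_gain = priming_effect(cell, "novel", True) - priming_effect(cell, "novel", False)
    return target_gain - novel_gain


def interaction_f(participants: list) -> tuple:
    """Return (F, (df1, df2)) for the three-way interaction in a 2 x 2 x 2 within design."""
    contrasts = [three_way_contrast(cell) for cell in participants]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t * t, (1, n - 1)


# Illustration with the mean priming effects reported for Experiment 1
# (target: 16 -> 31 ms, novel: 18 -> 23 ms). The congruent baseline of
# 430 ms is an arbitrary placeholder, not a value from the study.
base = 430.0
means = {}
for (ptype, alert), effect in {("target", False): 16, ("target", True): 31,
                               ("novel", False): 18, ("novel", True): 23}.items():
    means[(ptype, alert, "C")] = base
    means[(ptype, alert, "IC")] = base + effect
print(three_way_contrast(means))  # 10.0 ms
```

The contrast of 10 ms corresponds to the difference between the 15-ms increase for target primes and the 5-ms increase for novel primes.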

We conducted RT distribution analyses (De Jong et al., 1994; Kinoshita and Hunt, 2008; Fischer et al., 2010) to test whether alerting signals impact on different time segments of the RT distribution for target and novel primes, respectively. For this, we computed the percentile values based on the whole RT distribution. That is, we assessed the upper border for each percentile, so that the 50th percentile corresponds to the median. The distribution analysis showed that the specific alerting signal impact on priming effects for target and novel primes did not differ across different RT bins, as the three-way interaction between Alerting signal, Congruence, and Prime-type was not further modulated by the factor Percentile (10, 20, 30, 40, 50, 60, 70, 80, and 90), *F* < 1. Priming effects generally decreased as a function of increasing RTs, *F*(8, 248) = 11.32, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.27 [*F*(1, 31) = 13.52, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.30, linear contrast], a decrease which, however, was the same for target and novel primes, *F* < 1. The impact of the alerting signal on the overall masked priming effect was also independent of the time course, *F* < 1 (see **Figure 2**).
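The percentile-based distribution analysis can be sketched as follows: for each condition, the nine percentile borders (10th to 90th) of the RT distribution are determined, and the priming effect is computed at each percentile. The function names are hypothetical, and the interpolation method (`inclusive`) is an assumption of this sketch; the text does not specify how percentile borders were interpolated.

```python
from statistics import quantiles

PERCENTILES = (10, 20, 30, 40, 50, 60, 70, 80, 90)


def decile_borders(rts: list) -> list:
    """Return the nine cut points (10th to 90th percentile) of an RT sample."""
    return quantiles(rts, n=10, method="inclusive")


def priming_by_percentile(rt_congruent: list, rt_incongruent: list) -> dict:
    """Priming effect (incongruent minus congruent) at each percentile."""
    congruent = decile_borders(rt_congruent)
    incongruent = decile_borders(rt_incongruent)
    return {p: ic - c for p, ic, c in zip(PERCENTILES, incongruent, congruent)}


# Hypothetical RTs (ms) for one participant and condition pair.
congruent_rts = list(range(400, 501, 10))
incongruent_rts = [rt + 20 for rt in congruent_rts]
effects = priming_by_percentile(congruent_rts, incongruent_rts)
print(effects[50])  # priming effect at the median
```

Computing these effects separately for each combination of Alerting signal and Prime-type yields the values that enter an ANOVA with the additional factor Percentile.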

Separate ANOVAs for each prime-type confirmed an alerting signal based increase of the priming effect for target primes, *F*(1, 31) = 20.48, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.40. In particular, a priming effect of 16 ms, *t*(31) = 6.09, *p* < 0.001, in conditions without an alerting signal increased to a priming effect of 31 ms, *t*(31) = 9.00, *p* < 0.001, when an alerting signal was present. For novel primes, however, the priming effect in conditions without an alerting signal [18 ms, *t*(31) = 7.46, *p* < 0.001] also increased significantly when an alerting signal was present [23 ms, *t*(31) = 9.61, *p* < 0.001], *F*(1, 31) = 4.74, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.13.

When effects of repetition priming were controlled for (i.e., elimination of exact prime-target stimulus repetitions), target primes revealed a response priming effect that was significantly increased by the presence of an alerting signal, *F*(1, 31) = 12.66, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.29. Although this increase of response priming was numerically still larger than the analogous alerting signal based increase for priming by novel primes, this difference was only marginally significant, *F*(1, 31) = 3.46, *p* = 0.072, η<sub>p</sub><sup>2</sup> = 0.10.

#### *Errors*

A total of 4.5% errors were observed in Experiment 1. The alerting signal did not affect overall error rates, *F* < 1, ruling out the possibility of a speed-accuracy trade-off. Error rates were modulated by prime congruence, *F*(1, 31) = 21.90, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.41. More errors were committed in incongruent (5.9%) than in congruent (3.1%) prime-target relations. This priming effect was more pronounced for target (3.3%) than for novel primes (2.3%), *F*(1, 31) = 6.39, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.17, and in conditions with (3.6%) compared to conditions without (2.0%) an alerting signal, *F*(1, 31) = 6.12, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.17. A significant three-way interaction, however, was not observed, *F*(1, 31) = 1.14, *p* = 0.294, η<sub>p</sub><sup>2</sup> = 0.04.

#### **DISCUSSION**

In Experiment 1 the presence of an alerting signal resulted in increased masked priming effects. This holds especially for masked priming elicited by target prime stimuli which also serve as to-be-categorized target stimuli thus containing overtly established S-R links. Importantly, masked priming effects induced by novel primes that were never overtly responded to, were also affected by the presence of alerting signals. This finding in particular demonstrates that alerting signals can affect response activation processes triggered by stimuli that do not include direct S-R links (for a further discussion, see the General Discussion section).

At the same time, the alerting signal based increase of masked priming for target primes was larger in size than the increase of masked priming found for novel primes. Restricting the analysis exclusively to trials of stimulus-response priming (i.e., excluding identical prime-target pairs), the stronger influence of alerting signals on priming by target primes compared to novel primes was still detectable but fell short of significance. Importantly, the influence of the alerting signal on the masked priming effect was the same across the RT distribution for target and novel primes. As in Kinoshita and Hunt (2008), priming effects for target and novel primes declined with increasing RT bins. In contrast to Kinoshita and Hunt (2008), however, both functions for target and novel primes declined in the same way.

## **EXPERIMENT 2**

Experiment 2 served to replicate findings from Experiment 1 and therefore, to provide further evidence for an alerting signal based increase of masked priming effects for target as well as for novel primes. Two changes were included. First, because of rather high prime detection rates in Experiment 1, the prime stimulus duration was shortened. Second, we increased the variation in RSI to reduce an overall temporal predictability of trial onset.

## **METHODS**

## *Participants*

Twenty-six new students of the Technische Universität Dresden (17 female, 18–33 years; mean age ± SD, 21.9 ± 3.5 years) participated in the study for partial course fulfillment or €5 payment. All participants had normal or corrected-to-normal vision and were naive about the hypothesis of the experiment.

## *Apparatus, stimuli and procedure*

The experimental setup of Experiment 2 differed from that of Experiment 1 as follows: Stimuli were presented on a 17-inch VGA display, synchronized with the vertical retraces of a 75-Hz monitor. This resulted in a refresh interval of approximately 13.3 ms. The pre-mask was presented for 67 ms and the subsequent prime stimulus was shown for two refresh cycles of the display (27 ms). The prime was followed by a brief blank (13 ms) and a post-mask shown for 53 ms. In addition, the variation in the range of RSIs was extended. Experiment 2 included ten RSIs increasing from 300 to 2100 ms in steps of 300 ms. The RSI was selected randomly in each trial.

## **RESULTS**

## *Prime visibility*

Overall discrimination for primes was *d*′ = 0.64 and deviated from zero, *t*(25) = 5.74, *p* < 0.001. Discrimination performance was again better for novel (*d*′ = 0.78) than for target primes (*d*′ = 0.51), *t*(25) = 2.49, *p* < 0.05. The regression analyses, however, revealed no correlation between *d*′ and the corresponding target priming effects, *r* = 0.172, *p* = 0.400, or the corresponding novel priming effects, *r* = −0.109, *p* = 0.596. As in Experiment 1, none of the correlations was significant (all *p*s > 0.107) when considering target and novel prime indices separately for alerting signal present vs. alerting signal absent.

## *Priming task*

As in Experiment 1, all error trials and trials following an error were discarded (7.6%) and all trials meeting the outlier criterion (RTs < 150 ms or > 1200 ms) were also excluded from analyses (< 0.1%). Prior to the error analysis, only trials following an error were eliminated. Repeated measures ANOVAs were conducted on mean RTs and percentage error containing the factors Alerting signal (present, absent), Congruence (C vs. IC) and Prime-type (target vs. novel primes). Results are presented in **Figure 3**.

## *RT*

The presence (433 ms) compared to the absence (456 ms) of an alerting signal considerably reduced RTs, *F*(1, 25) = 103.43, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.81. The factor Congruence also affected responses, *F*(1, 25) = 16.33, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.40, with faster responses in congruent (440 ms) than in incongruent (449 ms) prime-target relations. This priming effect was increased by an alerting signal, *F*(1, 25) = 21.84, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.47. As in Experiment 1, this alerting signal based increase of the priming effect was larger for target compared to novel primes, as indicated by the significant three-way interaction of all factors, *F*(1, 25) = 5.70, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.19 (see also **Figure 3**). Finally, the priming effect in general seemed larger for target than for novel primes, a difference which, however, failed to reach significance, *F*(1, 25) = 3.01, *p* = 0.095, η<sub>p</sub><sup>2</sup> = 0.11.

**percent error (PE) in Experiment 2 as a function of prime-target congruence, prime type, and alerting signal (AS).**

Similar to Experiment 1, the RT distribution analysis showed that there was no interaction between the factors Alerting signal, Congruence, Prime-type, and Percentile, *F*(8, 200) = 1.00, *p* = 0.392, η<sub>p</sub><sup>2</sup> = 0.04. Yet, irrespective of prime-type, the impact of the alerting signal on priming seemed less pronounced for the slowest RTs of the RT distribution, *F*(8, 200) = 3.38, *p* = 0.026, η<sub>p</sub><sup>2</sup> = 0.12 [*F*(1, 25) = 6.54, *p* = 0.017, η<sub>p</sub><sup>2</sup> = 0.21, linear contrast]. Finally, although masked priming effects for novel primes were rather stable across the RT distribution, masked priming effects for target primes declined at larger percentiles, resulting in an interaction between Congruence, Prime-type, and Percentile (see **Figure 4**), *F*(8, 200) = 4.27, *p* = 0.021, η<sub>p</sub><sup>2</sup> = 0.15 [*F*(1, 25) = 5.20, *p* = 0.031, η<sub>p</sub><sup>2</sup> = 0.17, linear contrast].

Separate ANOVAs for target and novel primes confirmed an increase of the priming effect by an alerting signal for target primes, *F*(1, 25) = 24.06, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.49, as well as for novel primes, *F*(1, 25) = 4.82, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.16. For target primes, the presence of an alerting signal increased the priming effect from a non-significant 2 ms, *t*(25) = 0.86, *p* = 0.397, to a significant 21 ms, *t*(25) = 4.58, *p* < 0.001. Similar results were obtained for novel primes. Here, a non-significant effect of 4 ms, *t*(25) = 1.71, *p* = 0.100, without alerting signal was increased to 12 ms, *t*(25) = 3.85, *p* < 0.01, when an alerting signal was present.

As in Experiment 1, the elimination of prime-target stimulus repetitions for target primes resulted in a priming effect that increased when an alerting signal was presented, *F*(1, 25) = 19.53, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.44. The alerting signal based increase in priming was still larger for target primes than for novel primes, *F*(1, 25) = 4.23, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.15.

## *Errors*

Participants committed a total of 4.0% errors. As in Experiment 1, the alerting signal did not affect overall error rates, *F* < 1, ruling out the possibility of a speed-accuracy trade-off. However, more errors were produced in incongruent than in congruent prime-target relations, *F*(1, 31) = 8.02, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.24. Furthermore, this priming effect was larger for target (2.3%) than for novel primes (0.5%), *F*(1, 31) = 10.25, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.29, and also larger when an alerting signal was present (2.2%) than when it was absent (0.6%), *F*(1, 31) = 5.96, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.19. Again, a significant three-way interaction was not observed, *F* < 1.

#### **DISCUSSION**

Experiment 2 closely replicated the findings from Experiment 1, providing virtually the same results and therefore making a strong case for alerting signals affecting not only masked priming by target primes but also masked priming by novel primes. In addition, the alerting signal based increase of the masked priming effect for target primes also exceeded the increase for novel primes when the analysis was restricted to stimulus-response priming (excluding identical prime-target pairs). As in Experiment 1, the influence of the alerting signal on the masked priming effect was the same across the RT distribution for target and novel primes. Although the impact of alerting signals on priming effects was stronger for faster RTs, this finding did not depend on prime-type. Again, this suggests that the alerting signal impact on priming effects for target and novel primes relates to the same RT bins. At the same time, in Experiment 2 and in contrast to Experiment 1, we found a stronger decline of priming effects for target than for novel primes.

## **GENERAL DISCUSSION**

In two experiments it was tested whether alerting signals affect response activation processes in a masked priming paradigm with two different types of prime stimuli that differed with respect to the involvement of learned direct S-R links. In the implemented number categorization task target primes consisted of numbers that also served as target stimuli. By overtly responding to these stimuli, S-R links are established on the basis of which response activation processes are triggered when these stimuli serve as masked primes. Novel primes consisted of a set of numbers that were never presented as target stimuli. Participants did not overtly execute a smaller or larger than five response to these stimuli so that no overt S-R links are formed.

According to previous studies, in which it was assumed that alerting signals particularly facilitate visuo-motor translation processes on the basis of established S-R links (Fischer et al., 2010, 2012), it was argued that the presence of alerting signals (compared to the absence of alerting signals) increases the priming effect especially for target primes for which S-R links existed. In support of this assumption, in two experiments an enhanced masked priming effect for target primes was consistently demonstrated under alerting signal stimulation. An open question was whether alerting signals also affect masked priming for novel primes that did not include direct S-R links. Results of both experiments showed that response activation processes triggered by novel primes were also affected by the presence of alerting signals, resulting in increased masked priming effects for novel primes. Importantly, even when restricting masked priming by target primes to pure stimulus-response priming, the effects of alerting signals on the size of the masked priming effects were more pronounced for target primes than for novel primes. At the same time, even though detectability of prime stimuli (*d*′) was not zero, neither did the masked priming effect seem to depend on prime visibility nor was prime visibility different for target and novel primes.

Together these results have important implications. First, alerting signals seem to especially facilitate response activation processes that are triggered by visual stimuli when established S-R links exist (Fischer et al., 2012), as is the case for target primes. In addition, smaller but reliable effects of alerting signals on the size of masked priming effects for novel primes suggest that the effects of alerting signals do not depend exclusively on overtly established S-R links. Furthermore, alerting signals affect priming by target and novel primes similarly across different RT bins.

How do these findings fit with previous studies demonstrating that alerting signals facilitate response activation processes when S-R links exist, but do not facilitate semantic processing in conditions without S-R links (Fischer et al., 2012)?

One possible explanation is based on the action-trigger account (Kunde et al., 2003), which does not posit semantic processing for novel primes. Instead, the extent to which novel primes trigger response activation processes that result in priming effects depends on whether these prime stimuli belong to the action-trigger set. According to this account, stimuli trigger responses when they match existing action release conditions, so-called action triggers, that automatically activate the related action (cf. Kiesel et al., 2007a). In particular, following the instruction, participants form memory representations of environmental events that are thought to activate specific motor responses (i.e., action triggers). Online processing, however, is characterized by a comparison process that determines whether a given stimulus matches the established action triggers. If so, the related response alternative is automatically activated.

For example, in the applied number priming task of the present study, the digits 1 and 4 might serve as action triggers for the left response (smaller than five) and the digits 6 and 9 might serve as action triggers for the right response (larger than five). The overt categorization of target stimuli according to the task rule results in an inclusion of unseen prime stimuli into the set of action triggers (cf. Kiesel et al., 2007a, 2009). Moreover, and in line with common assumptions of a mental left-to-right spatial representation of numbers (i.e., mental number line, Galton, 1880; Göbel et al., 2001; Fias and Fischer, 2005), action triggers established for numbers 1, 4, 6, and 9 may also extend to mentally enclosed numbers of novel primes, i.e., 2, 3, 7, and 8, thus explaining priming effects revealed by novel primes without an assumed semantic processing (Kunde et al., 2003). In order to test the assertion of the action-trigger account, Kunde and colleagues varied the set of target and novel primes. For example, using numbers adjacent to five (i.e., 3, 4, 6, and 7) as target stimuli resulted in priming effects when the same stimuli served as target primes. At the same time, however, neighboring but not enclosed novel primes (i.e., 1, 2, 8, and 9) did not yield a priming effect (Kunde et al., 2003, Experiment 2).

Returning to our own study, alerting signals seem to facilitate performance whenever stimuli are able to trigger automatic response activation processes. That is, novel primes that are included in the action-trigger set automatically trigger response activation processes that can be modulated by the presence of alerting signals. This alerting signal based modulation of response activation occurs at the same RT bins for target primes and for novel primes. Therefore, it is conceivable that in the present study, and in contrast to Fischer et al. (2012), participants were able to form very specific action triggers (S-R links) because the expected stimuli were clearly defined. That is, similarly to Kunde et al. (2003), numbers representing novel primes were included in the action-trigger set and were able to automatically trigger response activation processes.

Therefore, the present findings of alerting signals modulating masked priming effects by novel primes also suggest that processing of novel primes is not (exclusively) based on semantic processing (but see Van den Bussche et al., 2009). Although we cannot exclude that additional components of (e.g., semantic) processing may kick in for novel primes especially at larger RT bins (Kinoshita and Hunt, 2008), alerting signals affected target and novel prime processing irrespective of RT bins (Experiment 1) and across the same RT bins (Experiment 2). Furthermore, the fact that we did not find unequivocal evidence for differential time courses for target and novel priming effects clearly calls for further research in this line. Instead, we think that novel primes that are included in the action-trigger set form so-called programmed or instructed S-R links, which are formed for expected stimuli as soon as participants read and implement the task instruction (Woodworth, 1938; Hommel, 2000). In line with the action-trigger account, alerting signals not only affect response activation processes of overtly learned and responded to S-R links, but also affect response activation processes for those stimuli that do not contain direct learned S-R links but which are part of the action trigger condition set.

On a more broadly applied and more speculative note, given that alerting signals are often implemented as trigger signals to facilitate the activation of motor responses in dangerous situations (e.g., facilitating the initiation of an emergency stop when driving a car), extending the impact of alerting signals from highly practiced visuo-motor links also to less practiced but instructed visuo-motor links seems encouraging news. More specifically, it may be useful to also apply alerting signals as trigger signals to facilitate the activation of instructed but less practiced, often only theoretical motor programs (e.g., counter-steering or full braking).

## **ACKNOWLEDGMENTS**

We thank Stefanie Richter for assistance in data collection. Correspondence concerning this article should be addressed to Rico Fischer, Department of Psychology, Technische Universität Dresden, D-01062 Dresden, Germany, e-mail: fischer@psychologie.tu-dresden.de. This research was supported by a grant of the German Research Foundation to Rico Fischer (DFG, FI 1624/2-1).

## **REFERENCES**



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2013; accepted: 27 June 2013; published online: 17 July 2013. Citation: Fischer R, Plessow F and Kiesel A (2013) The effects of alerting signals in masked priming. Front. Psychol. 4:448. doi: 10.3389/fpsyg.2013.00448*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Fischer, Plessow and Kiesel. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Adaptive control of human action: the role of outcome representations and reward signals

#### *Hans Marien<sup>1</sup>\*, Henk Aarts<sup>1</sup> and Ruud Custers<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, Utrecht University, Utrecht, Netherlands*

*<sup>2</sup> Cognitive, Perceptual and Brain Sciences, University College London, London, UK*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, USA*

*T. Andrew Poehlman, Southern Methodist University, USA*

#### *\*Correspondence:*

*Hans Marien, Department of Psychology, Utrecht University, Heidelberglaan 1, 3584CS Utrecht, PO Box 80140, 3508 TC Utrecht, Netherlands*

*e-mail: h.marien@uu.nl*

The present paper aims to advance the understanding of the control of human behavior by integrating two lines of literature that so far have led separate lives. First, one line of literature is concerned with the ideomotor principle of human behavior, according to which actions are represented in terms of their outcomes. The second line of literature mainly considers the role of reward signals in adaptive control. Here, we offer a combined perspective on how outcome representations and reward signals work together to modulate adaptive control processes. We propose that reward signals signify the value of outcome representations and facilitate the recruitment of control resources in situations where behavior needs to be maintained or adapted to attain the represented outcome. We discuss recent research demonstrating how adaptive control of goal-directed behavior may emerge when outcome representations are co-activated with positive reward signals.

**Keywords: goal-directed action, motivation, adaptive control, outcome representation, reward signal**

## **INTRODUCTION**

Human goal-directed behavior is supported by a set of mental tools that tune action to dynamic environments. The question of how this adaptive control process works has received a lot of attention in the literature (Morsella et al., 2009). Although different conceptualizations exist, such as executive processes (Smith and Jonides, 1999), working memory operations (Baddeley, 2007), and cognitive control (Miller and Cohen, 2001), they share three basic components of control: active maintenance of goal-relevant information; inhibition of irrelevant information; and shifting of information (Miyake and Shah, 1999).

Most research on the control of human behavior considers the person as the agent of control (Locke and Latham, 1990; Bandura, 2001). People are assumed to control their behavior by setting goals, keeping them active in mind, and adapting their behavior when needed. More recent research adopts a mechanistic account by suggesting that adaptive control processes are self-emergent once a goal is activated (Braver and Cohen, 2000; Postle, 2006; Hazy et al., 2007). In line with this mechanistic account we take the activation of a goal as the starting point of our analysis, and address the question of how the self-emergent process may be modeled to understand how goals instantiate adaptive control.

Basically, two features are central to the control of goal-directed behavior. The first feature pertains to the notion that actions are represented in terms of outcomes. The second feature comprises the rewarding property or value of these represented outcomes. Research on ideomotor theory of action investigates the first feature by examining and explaining how action-effect knowledge is acquired and how outcome representations are implemented in action selection (Hommel, 2013). Research on the second feature investigates how rewarding or positive affective signals, such as positive mood (Aspinwall, 1998; van Wouwe et al., 2011), monetary gains (Muller et al., 2007; Heitz et al., 2008), or positively valenced outcome information (Custers and Aarts, 2005; Gable and Harmon-Jones, 2008) influence perception and cognition in action control.

In essence, both features work in tandem to control behavior adaptively. Whereas outcome representations serve as reference points for perception and action (Powers, 1973; Carver and Scheier, 1982), accompanying positive reward signals assign value or utility to outcomes (Shizgal, 1999) and facilitate the recruitment of executive control processes (Locke and Braver, 2010). However, a theoretical and empirical analysis of the combined role of these features has largely been neglected in the literature. Here, we aim to integrate research on the ideomotor principle and research on the role of reward signals in action control.

## **THE ROLE OF OUTCOME REPRESENTATIONS IN THE CONTROL OF BEHAVIOR**

Human goal-directed behavior is thought to result from the brain's capacity to predict and represent actions in terms of their outcomes (Suddendorf and Corballis, 2007). Activating an outcome representation prepares action in an offline fashion (i.e., planned ahead). However, engaging in goal-directed behavior requires knowledge about action-effect relationships. Action-effect learning has been extensively studied and provides an explanation for the emergence of outcome representations (Shin et al., 2010). Basically, a link between action and effect is formed when a consequence of a motor movement is observed and further strengthened if this effect occurs consistently. Because the link between action and effect is assumed to be bidirectional, this strengthened link can be used to produce a specific outcome. This is the ideomotor principle: activating an outcome representation readily selects the action (Hommel et al., 2001).

According to this principle, multiple outcome representations can be associated with multiple actions (Hommel, 1996; Kunde et al., 2002). This way, goal-directed behavior is structured around equifinality and multifinality sets. Multiple actions can thus serve one outcome or a single action can produce multiple outcomes, rendering goal-directed behavior adaptive (Kruglanski et al., 2002).
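The bidirectional action-effect link described above can be illustrated with a toy associative model. The following is only a minimal sketch under our own assumptions: the class name `ActionEffectMemory`, the learning rate, the incremental update rule, and the example actions and effects are all hypothetical and are not taken from the cited studies.

```python
from collections import defaultdict


class ActionEffectMemory:
    """Toy associative store linking actions and observed effects (hypothetical)."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        # strength[(action, effect)] is one bidirectional link weight in [0, 1].
        self.strength = defaultdict(float)

    def observe(self, action, effect):
        """Strengthen the link each time the effect follows the action."""
        key = (action, effect)
        self.strength[key] += self.learning_rate * (1.0 - self.strength[key])

    def select_action(self, desired_effect):
        """Backward direction: an activated outcome retrieves its strongest action."""
        candidates = {a: w for (a, e), w in self.strength.items() if e == desired_effect}
        return max(candidates, key=candidates.get) if candidates else None

    def predicted_effects(self, action):
        """Forward direction: effects expected to follow an action."""
        return {e: w for (a, e), w in self.strength.items() if a == action}


memory = ActionEffectMemory()
for _ in range(5):
    memory.observe("press_switch", "light_on")   # consistent pairing
memory.observe("clap_hands", "light_on")         # second action, same outcome (equifinality)
memory.observe("press_switch", "click_sound")    # same action, second outcome (multifinality)

chosen = memory.select_action("light_on")
```

In this sketch the consistently paired action wins the selection, while the same store also shows one outcome served by two actions and one action producing two outcomes.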

Initially the ideomotor principle explains action selection on a sensorimotor level. However, human behavior is more complex and involves goals that are further removed from direct motor activation. It can be suggested, though, that goal-directed behavior emerges from simple movement goals to complex social goals that are accessed in different contexts by the same mechanisms underlying action-effect learning (Maturana and Varela, 1987). We first learn to orchestrate our motor movements before we can effectively hit a light switch and illuminate a dark room. Eventually certain learned patterns of motor movements become associated with new observable outcomes in terms of sensory/perceptual and semantic/cognitive codes (Pulvermüller, 2005; Kray et al., 2006; Lindemann et al., 2006; Aarts and Veling, 2009). Indeed, it has been demonstrated that sensory-motor goal representations (acquired in goal-directed motor tasks) generalize to abstract features of outcomes, such that outcome representations can become socially meaningful (Beckers et al., 2002).

People rely on these outcome representations during action selection and execution. In cybernetic models of action control, outcome representations serve as reference points (Adams, 1971). When an action produces an outcome that does not match the preactivated outcome representation, an action-related error signal is produced (Carter et al., 1998). Control is then necessary and should subsequently result in switching to a new course of action and inhibiting the old one. Active maintenance of the outcome representation thus often operates in concert with other adaptive control processes to attain the outcome.
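The comparator logic of such cybernetic models can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not an implementation of any cited model: the function `pursue_outcome`, the dictionary standing in for the environment, and the example actions are hypothetical.

```python
def pursue_outcome(reference, environment, actions):
    """Try actions until the observed outcome matches the reference.

    reference   -- the actively maintained outcome representation
    environment -- hypothetical mapping from an action to the outcome it produces
    actions     -- candidate actions in order of preference
    """
    inhibited = []                       # actions abandoned after an error signal
    for action in actions:
        observed = environment.get(action)
        error = observed != reference    # comparator: a mismatch is the error signal
        if not error:
            return action, inhibited
        inhibited.append(action)         # inhibit the old action, switch to the next
    return None, inhibited


# Hypothetical situation: the wall switch is broken, the lamp cord works.
environment = {"press_wall_switch": "dark", "pull_lamp_cord": "light_on"}
action, inhibited = pursue_outcome(
    "light_on", environment, ["press_wall_switch", "pull_lamp_cord"]
)
```

The reference stays fixed across iterations, which corresponds to active maintenance, while the mismatch drives inhibition of the failed action and a switch to the next one.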

## **THE ROLE OF REWARD SIGNALS IN CONTROL**

Ideomotor theorizing provides a parsimonious framework to understand how action-effect knowledge is acquired and how outcome representations are involved in the selection of action. However, it does not include specific predictions about when and how outcome representations gain control over behavior. There is a vast literature that does examine the emergence of adaptive control from an affective-motivational perspective.

First of all, there is research on the role of positive mood or emotion in cognitive control (Ashby et al., 1999; Fredrickson, 2004). This literature suggests that positive affect can broaden cognition (e.g., making people more creative) or funnel cognition (e.g., by focusing on local stimuli). Secondly, there is literature showing effects of prospective monetary gains on control processing such that effortful behavior can be boosted or strategically implemented (Bijleveld et al., 2012). Finally, the positive valence of outcome representations (acquired through evaluative conditioning procedures) can enhance effortful control in tasks generating the outcome (Custers and Aarts, 2010). These different lines of research suggest that positive affect, monetary gains and positive outcome representations serve as a general reward signal that acts as a common currency for modulating adaptive control (Shizgal and Conover, 1996), which either results in increased flexibility or more focused processing (Aston-Jones and Cohen, 2005). It remains unclear how the affective-motivational perspective deals with the question of when flexible or focused processing dominates. However, it is assumed that adaptive control processes originate from subcortical output releases of dopamine in the prefrontal cortex (PFC), which is associated with the processing of general reward signals (Aarts et al., 2011; Chiew and Braver, 2011).

From this affective-motivational perspective, reward signals have been found to play a crucial role in each of the three basic components of adaptive control. Reward signals have been shown to (1) cause active maintenance of task-relevant information and outcomes (Zedelius et al., 2011); (2) facilitate the inhibition of task-irrelevant information (Veling and Aarts, 2010); and (3) reduce switch costs in task-switching paradigms (Dreisbach and Goschke, 2004). These findings indicate the close relationship between adaptive control of human action and the processing of reward signals.

Reward-driven modulation of executive control is highly adaptive, because it justifies the allocation of limited cognitive resources (Pessoa, 2009). Resource allocation is guided by a principle of conservation such that effort will be expended only if it can be compensated by a significant benefit in the end (Brehm and Self, 1989; Gendolla et al., 2011). Reward signals thus ensure the recruitment of adaptive control processes when behavioral demands are imposed by environmental changes. Indeed, there are several studies that show how task demands and task incentives interact in producing effort intensity (Bijleveld et al., 2009; Silvestrini and Gendolla, 2013). In this research the conditions of demand are often explicitly communicated and it is shown that individuals invest effort only when the goal is attainable (i.e., moderately high demands) and valuable rewards are at stake. Thus, people seem to make trade-offs by weighing explicit information of reward value and demands. This raises the question of whether demand information needs to be explicit or whether such trade-offs also occur in contexts where differences in demands are less clear.

In a recent line of research we addressed this question using a modality shift paradigm (Marien et al., in preparation-a). Participants were instructed to respond to visual or auditory targets as fast as possible. Immediately before presentation of these targets we either presented a preparatory stimulus in the same modality as the target (ipsimodal trials, e.g., visual-visual), or a preparatory stimulus in a different modality (crossmodal trials, e.g., visual-auditory). The latter type of trial requires more resources (i.e., is more demanding) to respond to than the former type, because participants have to switch from their prepared visual modality to the auditory modality. This typically results in a delayed response time caused by a modality switch cost, especially when this switch cannot be anticipated (Turatto et al., 2002). On half of the trials participants were presented with a 5-eurocent coin which they could earn; on other trials this reward signal of the coin was absent. Importantly, the preparatory stimuli were not predictive of whether a switch would occur or not. As expected, participants responded significantly faster when a reward was at stake during crossmodal trials, but there was no speeded responding during rewarded ipsimodal trials. Furthermore, the absence of the latter effect could not be explained by physical limits of speed of responding. Reward signals thus specifically reduce switch costs in an instrumental way, even in contexts that are ambiguous about task demands.
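The switch-cost logic of this paradigm can be made concrete with a short calculation. The sketch below assumes that the switch cost is scored as the mean crossmodal response time minus the mean ipsimodal response time, computed separately for rewarded and unrewarded trials. All response times are invented for illustration and are not data from the study; the field and function names are hypothetical.

```python
from statistics import mean

# Hypothetical response times (ms); invented for illustration, not study data.
TRIALS = [
    {"trial_type": "ipsimodal", "rewarded": False, "rt_ms": 410},
    {"trial_type": "ipsimodal", "rewarded": False, "rt_ms": 420},
    {"trial_type": "ipsimodal", "rewarded": True, "rt_ms": 412},
    {"trial_type": "ipsimodal", "rewarded": True, "rt_ms": 418},
    {"trial_type": "crossmodal", "rewarded": False, "rt_ms": 470},
    {"trial_type": "crossmodal", "rewarded": False, "rt_ms": 480},
    {"trial_type": "crossmodal", "rewarded": True, "rt_ms": 440},
    {"trial_type": "crossmodal", "rewarded": True, "rt_ms": 450},
]


def mean_rt(trials, trial_type, rewarded):
    """Mean response time for one cell of the trial type x reward design."""
    return mean(
        t["rt_ms"]
        for t in trials
        if t["trial_type"] == trial_type and t["rewarded"] == rewarded
    )


def switch_cost(trials, rewarded):
    """Modality switch cost: crossmodal minus ipsimodal mean response time."""
    return mean_rt(trials, "crossmodal", rewarded) - mean_rt(trials, "ipsimodal", rewarded)


cost_unrewarded = switch_cost(TRIALS, rewarded=False)
cost_rewarded = switch_cost(TRIALS, rewarded=True)
reward_benefit = cost_unrewarded - cost_rewarded   # reduction in switch cost
```

With these made-up numbers, reward leaves ipsimodal response times unchanged and speeds crossmodal responses, so the benefit shows up only as a smaller switch cost, which mirrors the pattern described in the text.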

However, in most research on reward signals and cognitive control participants are instructed to perform a given action to obtain a specific outcome. Accordingly, research on the impact of reward signals on adaptive control is thus mainly limited to instructed task goals and does not consider how reward signals interact with outcome representations in controlling behavior (Dickinson and Balleine, 1994). We propose that analyzing the interplay between outcome representations and positive reward signals offers a more comprehensive examination of adaptive control of human action. In the next section, we discuss some recent research that examines this interplay in more detail.

## **THE COMBINED ROLE OF OUTCOME REPRESENTATIONS AND REWARD SIGNALS**

The combined role of outcome representations and reward signals has been examined to explore the building blocks of adaptive control in goal pursuit (Custers and Aarts, 2005, 2010). For instance, the activation of the outcome representation of physical exertion facilitated effortful control in action when this outcome representation was immediately followed by reward signals (i.e., positive words) in an evaluative conditioning procedure (Aarts et al., 2008). Participants resisted the pressure to release and persisted in squeezing a handgrip. Furthermore, this study provided evidence for the distinct roles of outcome representations and reward signals. The mere activation of the outcome representation facilitated initiation of the action, but did not increase control unless positive reward signals were attached to it. Several other studies have also demonstrated the function of reward signals in mobilizing action control (e.g., Capa et al., 2011; Köpetz et al., 2011; Veltkamp et al., 2011).

Building on this line of research, we investigated whether the pairing of positive reward signals with outcome representations translates into adaptive control in terms of making people more flexible in goal-directed behavior (Marien et al., 2012). In a modification of a set-switch paradigm (Dreisbach and Goschke, 2004), participants had to turn on a light by pressing either a left or a right key. On each trial, the correct response was indicated by a dot of a particular color appearing either left or right. A dot of a different color was presented in the opposite location, but had to be ignored. Before each trial, a cue appeared consistently reminding people of the outcome (turn on light). These cues were immediately followed by positive or neutral stimuli. After some trials, participants had to ignore the color they had to attend to earlier and react to a new color. Participants in the positive reward signal condition had significantly lower switch costs than those in the neutral condition. These findings suggest that being able to swiftly switch the course of action to obtain an outcome is dependent on whether the outcome representation of the action was co-activated with reward signals.

Whereas most studies on the combined role of outcome representations and reward signals in facilitating control consider the outcomes as given, from research on ideomotor theory one would expect that these outcome representations are normally acquired in daily life as a result of learning that the outcome follows from an action (Elsner and Hommel, 2001). Thus, according to our present analysis positive reward signals should only increase control when an action is represented in terms of its outcome. Specifically, only when the presentation of a specific stimulus follows an action rather than preceding it, will an accompanying positive reward signal cause people to engage in controlled behavior to obtain the outcome.

In a recent test of this idea (Marien et al., in preparation-b), participants had to execute an action (pressing a key) that was either preceded or followed by a stimulus on the computer screen (e.g., the word "scissors"). The stimulus was accompanied by a neutral or positive reward signal by presenting a spoken word through headphones (e.g., the word "with" or "nice"). Thus, the stimulus represented an outcome of an action or not, and this outcome representation was co-activated with a reward signal or not. After some pairings, participants were presented with the stimulus on the screen and had to press another key repeatedly to move the stimulus closer to themselves in an easy way (one single key) or a more demanding way (multiple keys). Faster repetitive action in this task implies more control. Results showed that participants were faster in moving the stimulus to themselves only when it represented an outcome of their action and was co-activated with a positive reward signal. This effect was more pronounced when moving the stimulus to themselves was demanding. These findings suggest that adaptive control of goal-directed behavior is more likely to occur when positive reward signals accompany the process of representing action in terms of outcomes. Moreover, resources to control behavior seem to be allocated to obtain the outcome according to a principle of conservation (Silvestrini and Gendolla, 2013).

#### **CONCLUSION, IMPLICATIONS, AND PROSPECTS**

We proposed that an integration of ideomotor accounts with affective-motivational accounts of action can shed new light on the control mechanisms underlying human goal-directed behavior. Although ideomotor theorizing offers a framework to understand how action-effect knowledge is acquired and how outcome representations select action, it is less explicit in predicting when and how control of behavior results from the activation of outcome representations. To understand the emergence of adaptive control, reward signals should be taken into account. Although there is some research investigating the impact of reward signals on action-effect learning, the analysis is mainly focused on how it affects the binding strength and performance of the associated action (Muhle-Karbe and Krebs, 2012).

We also suggest that motivational accounts of adaptive control should incorporate more insights of ideomotor theory. Adaptive control processes are closely linked with reward processing, but the role of outcome representations is under-investigated in this literature. It is important for reward signals to connect with outcome representations in order for them to have a profound effect on adaptive control. The present analysis suggests that positive signals of different sources denote the value of an outcome and facilitate control of behavior. This implies that the influence of reward signals on recruiting executive control resources might not follow a direct path, but is mediated by the assigned value of the outcome representation. Future research could address (1) how personal value of an outcome representation results from reward signals, and (2) whether personal value mediates the instigation of control.

One way to approach this matter is by analyzing the neurocircuits prioritizing and controlling goals. Specifically, recent work in cognitive neuroscience proposes the involvement of specific neurotransmitter systems that cause people to exploit (being rigid to reach a goal) or to explore (prioritizing other goals) their environment (Aston-Jones and Cohen, 2005). Noradrenergic pathways in the brain are suggested to be associated with exploitation while dopaminergic pathways are supposed to be engaged in exploration.

This neurocircuit analysis of adaptive control can benefit from the present analysis. Adaptive control in terms of flexible or rigid/persistent processes may be dependent on the level of behavioral representation to which reward signals are attached. Goal-directed behavior is hierarchically structured (Botvinick, 2008), and hence the control of behavior may be directed at the level of action (means) representations or outcome (goal) representations depending on context and individual differences (Vallacher and Wegner, 1989). For example, goal-directed control of turning on a light may be identified and guided by the representation of "pressing the button" or "turning on the light." So when representations of means are paired with reward signals, action control is more likely to occur on the means level. Paradoxically, this could lead to more rigidity in control. We found that participants were less prone to switch to another action when the representation of the means was cued and paired with reward signals (Marien et al., 2012). In other words, when an outcome representation can be regarded as a subgoal of another outcome representation higher in the hierarchy (i.e., "pressing the key" in order to "turn on the light"), treating it with reward signals will increase local exploitive focus instead of broad explorative processing (Gable and Harmon-Jones, 2008). Taking the level of behavior representation into account may lead to specific predictions about when reward signals produce a flexible or rigid mode of control.

Research on adaptive control of human action can advance by looking at outcome representations in combination with reward signals. It can especially help us to understand how the human mind functions optimally in the ever-changing environment that we inhabit.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2013; paper pending published: 21 July 2013; accepted: 19 August 2013; published online: 09 September 2013.*

*Citation: Marien H, Aarts H and Custers R (2013) Adaptive control of human action: the role of outcome representations and reward signals. Front. Psychol. 4:602. doi: 10.3389/fpsyg.2013.00602*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Marien, Aarts and Custers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The wild ways of conscious will: what we do, how we do it, and why it has meaning

## *J. Scott Jordan\**

*Director, Institute for Prospective Cognition, Department of Psychology, Institute for Prospective Cognition, Illinois State University, Normal, IL, USA*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

*T. Andrew Poehlman, Cox School of Business, Southern Methodist University, USA*

#### *\*Correspondence:*

*J. Scott Jordan, Dynamic Cognition Lab, Department of Psychology, Institute for Prospective Cognition, Illinois State University, Campus Box 4620, Normal, IL 61790-4620, USA*

*e-mail: jsjorda@ilstu.edu*

It is becoming increasingly mainstream to claim that conscious will is an illusion. This assertion is based on a host of findings that indicate conscious will does not share an efficient-cause relationship with actions. As an alternative, the present paper will propose that conscious will is not about causing actions, but rather, about constraining action systems toward producing outcomes. In addition, it will be proposed that we generate and sustain multiple outcomes simultaneously because the multi-scale dynamics by which we do so are, themselves, self-sustaining. Finally, it will be proposed that self-sustaining dynamics entail meaning (i.e., conscious content) because they naturally and necessarily constitute embodiments of context.

**Keywords: entrainment, teacher/student interaction, mimicry, imitation, synchrony, mirroring, mirror neuron system**

While the present paper addresses the relationship between consciousness and action control, its ultimate goal is to propose that terms such as "action" and "consciousness" are scientifically inadequate and, in the end, may have to be replaced in a scientific account of what we do, how we do it, and why it has meaning. This is because, as I will argue, the current conceptual framework used in cognitive science (e.g., perception, cognition, action, attention, intention, and consciousness) is not capable of addressing the complex array of causal regularities that have been discovered in cognitive science over the past 30 years.

In addition, the current conceptual framework has yet to give rise to a scientific conception of how we do what we do that renders the phenomenon of "consciousness" a *necessary* aspect of the causal story. That is, consciousness is described as either identical with the physical (i.e., identity theory), emergent from the physical (i.e., emergentism), as an informational property of causal relations (i.e., functionalism), or as an aspect of reality other than the physical (i.e., double-aspect theory and property dualism). In all of these positions, consciousness is not a logically necessary aspect of the causal story. That is, the scientific, causal description of how we do what we do is able to disregard consciousness as a causal factor.

While the notion that consciousness might not be logically necessary is certainly popular, one might also take it to indicate the need for an approach to "how we do what we do" that renders consciousness causal (i.e., non-epiphenomenal). In what follows, I present Wild Systems Theory as an approach to causality and consciousness that renders the latter logically necessary. To be sure, by the time this has been explicated, the term "consciousness" will mean something different than what is referred to via constructs such as Access Consciousness, Metacognition, and Phenomenal Consciousness (Block, 1995, 2001; Cleeremans, 2005).

## **WHAT WE DO**

One of the reasons consciousness is not seen as logically necessary in scientific accounts of "what we do" is because we do not conceptualize it as an activity. In contemporary cognitive science, "what we do," is conceptualized via terms such as perceive, act, think, attend, intend, infer, cognize, represent, remember, simulate, and behave. Notice that all of these terms are verbs. When the concept "consciousness" is thrown into the mix, it enters as a noun. In short, consciousness is not conceptualized as something we do.

In the early days of experimental psychology, this was not the case. In fact, consciousness was seen as an *act* of intending, ". . . all experience involves directedness toward an object. . . Every mental phenomenon includes something as object within itself" (Ash, 1995, p. 28), and there was much theory regarding "conscious acts," not in the sense that consciousness caused certain actions, but rather, in the sense that certain conscious states were, themselves, actions (i.e., mental acts).

#### **CONSCIOUS THOUGHT AS AN EFFICIENT CAUSE OF ACTION**

As experimental psychology moved away from consciousness and turned toward behavior in the early 1900s, explanations of "what we do" came to be couched in terms of efficient cause relationships between "stimuli" and "behavior." And as cognitive psychology later challenged behaviorism's unwillingness to appeal to internal process (Tolman, 1951; Chomsky, 1959), it nonetheless adopted behaviorism's commitment to discovering efficient cause relationships. And now, instead of efficient cause residing between stimuli and responses, or vice versa, it has come to permeate the entire servo-mechanistic architecture that ultimately connects perceptual inputs to internal representations (i.e., cognitive structures) to behavioral outputs<sup>1</sup>.

Within such a servo-mechanistic framework, the relationship between consciousness and action control tends to be described such that conscious thoughts are modeled as causing actions. Another way to say this is that thoughts share an efficient cause relationship with actions. Despite the apparent obviousness of this claim, findings have come to the fore over the past few decades that severely challenge this idea. Wegner (2002) organizes these findings around Michotte's (1963) work on the perception of causality. Specifically, inspired by Hume (1739/1888) and his assertion that our sense of conscious agency constitutes yet another example of "perceived" causality based on contingent correlation (vs. metaphysical causality), Michotte discovered that our sense of causality was dependent upon the principles of priority (i.e., event A must precede event B for A to be experienced as the cause of B), consistency (i.e., the more event A is consistent with event B, the more event A is experienced as being the cause of event B), and exclusivity (i.e., the fewer event As there are, the more an event A is experienced as the cause of event B).

In Wegner's (2002) work, these principles translate into the idea that we perceive ourselves (i.e., our thoughts) to be the cause of our own actions to the extent that our thoughts precede our actions (i.e., priority), our actions are consistent with our preceding thoughts (i.e., consistency), and our thoughts are the only available cause of our actions (i.e., exclusivity). Wegner then reports multiple examples of how violations of these principles lead to illusions of conscious will (i.e., feeling as though we caused actions we did not cause, or feeling as though we did not cause actions that we, in fact, did).

As regards priority, Wegner points to Kornhuber and Deecke's (1965) classic work on the Bereitschaftspotential (readiness potential), a negativity in the supplementary motor cortex that begins roughly 1 s prior to the initiation of a voluntary finger flexion. In Libet's (1985) classic work, he found that while the Bereitschaftspotential begins roughly 1 s *before* a movement, one becomes consciously aware of having planned a movement roughly 200 ms before the movement. This discrepancy is often interpreted as implying that the brain knows what one is planning to do before one is even aware of it. Wegner argues this constitutes a violation of the priority principle. That is, if thoughts cause actions, the "thought" of planning the tap should precede the onset of preparatory brain dynamics.

As regards consistency, Wegner (2002) cites Langer and Roth's (1975) finding that people are more likely to feel as though they controlled a chance event (e.g., they willed a particular number to result from the roll of a die) if they have previous experience successfully predicting such events. Wegner claims this to be evidence for the illusory nature of conscious will because people felt themselves to be in control of an event they were not in control of, simply because the final event was consistent with their preceding thoughts.

Finally, as regards exclusivity, Wegner reports research of his own (Wegner et al., 2003) in which participants completed a simple yes/no reaction time task while a confederate sat behind them. The confederate reached around the participant's torso and held her index fingers just above the fingers the participant was using to indicate yes/no responses. The confederate never made contact with the participant's fingers. And although the participants were accurate on 87% of the trials, they attributed 37% of the influence for the answers to the confederate. In short, the simple availability of the confederate as a potential cause of the response led the participants to experience a reduction in their own efficacy.

Wegner (2002) uses the above-mentioned experiments, as well as many, many others, to support the claim that conscious will is an illusion. That is, since these data so clearly reveal that our sense of agency (i.e., the feeling that our thoughts are the cause of our actions) is vulnerable to Michotte's (1963) principles of perceived causality, it must be the case that we are incorrect, and our sense of agency is actually an illusion. The true causes of our actions are unconscious, automatic associations between perception and action, what Bargh and Chartrand (1999) refer to as the "perception-action" link.

To be sure, there are contemporary cognitive scientists who disagree with the idea that conscious will is an illusion (see Baumeister et al., 2010). The point of addressing this issue so thoroughly at present is to propose that perhaps the reason conscious will is seen as being illusory is because certain researchers have committed themselves to an efficient cause approach to psychological functionality (i.e., how do we do what we do) that has historically led to the implicit assumption that thoughts cause actions. That is, it may be the case that thoughts did not evolve to cause actions, and the notion that conscious will is an illusion is a misconception one derives from a commitment to an illusory, efficient-cause architecture regarding the relationship between consciousness and action.

### **THOUGHT, ACTION, AND EVENT CONTROL**

Every year, millions of people all over the world watch professional soccer matches. During such matches, referees make judgments about the intentional states of players whenever the ball makes contact with a player's hand. The judgment has to do with whether or not the player intended (i.e., pre-specified) that the hand should hit the ball. While the anecdote might seem out of place, it nicely illustrates what is at stake in the conversation regarding the nature of conscious will. For if the referee decides the player acted intentionally, what is it that the player intended? Did the player pre-specify a particular movement of the hand or a particular outcome (i.e., hit the ball)?

William James (1890) believed that voluntary action had more to do with outcomes than limb movements:

I trust that I have now made clear what that 'idea of a movement' is which must precede it in order that it be voluntary. It is not the thought of the innervation which the movement requires. It is the anticipation of the movement's *sensible effects*, *resident, or remote, and sometimes very remote indeed*. (Volume 2, p. 521)

<sup>1</sup>To be sure, there have been those who have critiqued experimental psychology's reliance on efficient cause explanations all throughout its history. These critiques have come primarily from researchers espousing a more dynamic approach to psychological functionality including the Gestalt psychologists (Ash, 1995), the New Realists such as Holt and Gibson (Charles, 2011), and a host of contemporary researchers making use of dynamical systems theory such as van Gelder (1998), Van Orden and Holden (2002), and Coey et al. (2012). These will be discussed in the section below entitled, "How we do it."

James' assertion that voluntary action involves the pre-specification (i.e., anticipation) of a movement's "sensible effects" is consistent with the notion that what is pre-specified during intentional action is the outcome, not limb movement. James then describes different levels of sensible effects: resident, remote, and very remote. "Resident" refers to the proximal, somatosensory, kinesthetic effects of movement. "Remote" refers to the distal effects of movement (e.g., seeing and feeling oneself make contact with a soccer ball). "Very remote indeed" refers to effects beyond one's current context that one can pre-specify and work toward (e.g., going to the store to buy a bottle of milk, saving money to buy a new stereo, or becoming a college graduate).

Common to all three of these levels of sensible effects is the fact that (1) they can be pre-specified and therefore constitute intentionality, and (2) the pre-specification is of "effects" that will, at some point in time (i.e., proximal, distal, and abstract), result from movement. In short, inspired by James, it is my contention that "what we do" is best described as the pre-specification and control of effects at multiple time scales, simultaneously; what I refer to as multi-scale effect control (MSEC). For example, as one dances a Tango with another, one simultaneously controls limb movements (i.e., proximal effects), one's distance from the partner (i.e., distal effects), and the larger-scale pattern of successfully completing an entire, pre-specified dance (i.e., abstract effects). All three levels are pre-specified and controlled continuously and simultaneously.

At first glance, the notion that we pre-specify and control effects at multiple time-scales simultaneously seems at odds with the feeling that conscious will tends to involve one pre-specification at a time (e.g., pick up the pen, answer the question, walk to the store). In what follows, I review recent findings that reveal the brain continuously feeds memories of the past into the present as anticipation about the future, at multiple time scales simultaneously. In short, the anticipation of effects, resident, remote, and very remote indeed, constitutes a design principle of the brain.

## **MULTI-SCALE EFFECT CONTROL AND THE BRAIN**

Over the past three decades, neuroscientists have discovered recursive connections between the cortex and the cerebellum that continuously render cortical activity anticipatory. Neurons in motor cortex, for example, project to neurons in the spinal cord as well as to neurons in the cerebellum. These same cerebellar neurons receive inputs from the sensory neurons located in the limbs that are made to move by the associated motor neurons (Kawato et al., 1987). These cerebellar neurons project back to cortex. Thus, as one learns a particular limb movement (e.g., an infant learning to grasp a ball), and successful movements are repeated, successful *command-feedback regularities* become stored in these cortical-cerebellar networks such that when the infant later initiates such a movement, the cerebellar neurons are able to prime the motor cortical neurons before sensory feedback arrives from the moving limb. This is because the cortical-cerebellar networks have a time-cycle of 10–20 ms, while actual sensory feedback has a time-cycle of 120 ms. This faster-than-feedback time-cycle allows us to generate very fast, controlled body movements.
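
The advantage of a faster-than-feedback loop can be illustrated with a toy simulation. The sketch below is not a model from Kawato et al. (1987); it assumes a hypothetical one-dimensional "limb" driven by a simple proportional controller, a 10 ms time step, and an arbitrary gain of 0.2. The only thing varied is how stale the position signal is: 20 ms (the upper end of the cortical-cerebellar cycle described above) versus 120 ms (the sensory feedback cycle).

```python
def simulate(delay_steps, n_steps=200, gain=0.2, target=1.0):
    """Move a hypothetical limb toward `target` using a delayed position signal.

    Each step stands for 10 ms (an assumption for illustration), so
    delay_steps=2 is a 20 ms signal and delay_steps=12 is a 120 ms signal.
    """
    positions = [0.0]  # the limb starts at rest at position 0
    for t in range(n_steps):
        # The controller only "knows" where the limb was delay_steps ago.
        sensed = positions[max(0, t - delay_steps)]
        # Proportional correction based on the (possibly stale) signal.
        positions.append(positions[-1] + gain * (target - sensed))
    return positions


fast_loop = simulate(delay_steps=2)    # 20 ms: anticipatory priming
slow_loop = simulate(delay_steps=12)   # 120 ms: sensory feedback alone

print("peak position, 20 ms signal: ", round(max(fast_loop), 3))
print("peak position, 120 ms signal:", round(max(slow_loop), 3))
```

Under these assumptions, the controller using the 20 ms signal settles on the target with only a small overshoot, while the same controller using the 120 ms signal drives the limb far past the target before any correction can arrive.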

Kawato et al. (1987) refer to this cerebellar priming of cortex as *anticipatory motor error*, Clark (2001) and Grush (2004) refer to it as *virtual feedback*, and Paulin (1993) refers to it as *dynamic state estimation*. Quite often, these cerebro-cerebellar networks are referred to as *forward models* (Miall, 2003; Wolpert et al., 2003; Ito, 2005, 2008; Shadmehr and Krakauer, 2008; Golfinopoulos et al., 2009; Koziol and Lutz, 2013), and/or cerebellar control models (Koziol et al., 2011). Common to all these nomenclatures is the assertion that cerebellar-cortical networks are anticipatory and that the anticipation they entail derives from previous experience.

In addition to sharing recursive innervation with the motor cortex, the cerebellum also shares such connectivity with the reticular, autonomic, and limbic systems, as well as the prefrontal cortex, multimodal regions of the posterior parietal lobes, and the temporal lobes (Schmahmann, 2001). These recursive cerebro-cerebellar connections entail a two-step feedforward projection from cortex to the pons to the cerebellum, and a two-step feedback projection from cerebellum to thalamus to cortex (Schmahmann, 2001). Koziol et al. (2011) assert that the entire cortex is innervated by the cerebellum, save for the inferior temporal cortex, while Buckner et al. (2011) hypothesize the entire cortex is represented in the cerebellum, save very early vision and audition centers. Regardless of these small differences, it is clear the vast majority of the cortex shares recursive innervation with the cerebellum. Given these cortical projections to cerebellum are functionally segregated, it seems the brain entails a host of cerebellar control models (Koziol et al., 2011).

The discovery of memory-primed, prospective cerebro-cerebellar networks holds major implications for consciousness and action-control specifically, and cognitive science more generally. To begin, the existence of such networks constitutes evidence for James' (1890) assertion that what makes an action voluntary is the pre-specification of its "sensible effects." While this idea conjures up images of an individual expending large amounts of conscious effort to imagine (i.e., pre-specify) what an action's sensible effects should be like, the notion of prospective cerebro-cerebellar networks illustrates how such pre-specifications are continuously fed to the cortex via its connections with the cerebellum. Activity in the cortex is continuously rendered prospective (i.e., anticipatory) as past experiences stored in cerebro-cerebellar networks are fed forward in the present as anticipations about what should happen next.

The discovery of prospective cerebro-cerebellar networks also provides support for James' (1890) assertion that "sensory effects" can be pre-specified at many different scales: resident (proximal), remote (distal), and very remote indeed (abstract). This implies that as we think, perceive, and act, cortical areas involved in such activities are continuously primed by past thoughts, past perceptions, and past actions stored in cerebro-cerebellar circuits. In short, cognition, perception, and action are all prospective, and what is "pre-specified" in each is the potential, eventual occurrence of an effect at an abstract, distal, or proximal time scale. Ito (1993) recognized this aspect of brain design decades ago and used it to argue that the neurodynamics underlying thought were of the same kind as those underlying movement control. The notion of a neurodynamic homology underlying thought and action is shared by many researchers (Schmahmann, 2001; Koziol et al., 2011; Ito, 2012; Koziol and Lutz, 2013), and it led Kinsbourne and Jordan (2009) to claim that anticipation constitutes a design principle of the brain.

Another point to make about such neurodynamic homology is the fact that all of these different event control systems function simultaneously. This means that future outcomes are being specified continuously at the proximal, distal, and abstract scale. For example, as one walks down a flight of stairs while talking to a friend and consciously anticipates what the friend will say next, one is simultaneously unaware of the fact that future outcomes are being generated for the feet; that is, until one steps out onto the floor and begins to fall forward because the floor is not there. During this moment of surprise, one is aware of the discrepancy between the pressure one was supposed to feel when the foot landed on the floor, and the unanticipated lack of pressure experienced because the floor was not there. It is in this moment of conscious error detection that one realizes she was unconsciously anticipating a certain amount of pressure on the bottom of the foot at a particular moment in the foot's trajectory. This unconsciously anticipated pressure on the foot is a pre-specified proximal outcome. It was generated as the cerebellum continuously and unconsciously primed the cortex with past patterns associated with negotiating stairs. In addition, the brain simultaneously generated conscious predictions about the conversation. In short, cerebro-cerebellar loops result in the cortex being continuously primed for events at multiple time-scales (i.e., proximal, distal, and abstract), simultaneously.
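
The stairs example can be sketched as a comparison between anticipated and sensed outcomes, in which only a large mismatch is flagged. The function name, the pressure values, and the tolerance below are hypothetical and chosen purely for illustration; nothing here is meant as a claim about how the cerebellum actually computes such a mismatch.

```python
def surprising_steps(anticipated, sensed, tolerance=0.2):
    """Return the indices of steps where sensed pressure departs from the
    anticipated pressure by more than `tolerance` (arbitrary units)."""
    return [
        i
        for i, (expected, actual) in enumerate(zip(anticipated, sensed))
        if abs(expected - actual) > tolerance
    ]


# Four footfalls: the first three land roughly as anticipated,
# the fourth meets no floor at all (zero pressure).
anticipated_pressure = [1.0, 1.0, 1.0, 1.0]
sensed_pressure = [1.0, 0.95, 1.05, 0.0]

print(surprising_steps(anticipated_pressure, sensed_pressure))
```

In this sketch the small step-to-step variations pass unnoticed, and only the final footfall, where the anticipated pressure never arrives, is singled out.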

#### **PERCEPTION AND ACTION AS MULTI-SCALE EFFECT CONTROL**

While on the one hand it seems appropriate to conceptualize motor control (i.e., proximal effect control) as being mediated via cerebro-cerebellar control loops, it is more challenging to conceptualize perception as being controlled via such loops. This is because traditional approaches to how we do what we do implicitly, if not explicitly, conceptualize perception as an attention-attenuated input that, in the end, is used to guide action. In what follows, I review research in the area of spatial perception in the hope of demonstrating how one might conceptualize perception as distal effect control.

Research on spatial perception clearly indicates we perceive the location of distal stimuli prospectively, in relation to the "sensible effects" we are pre-specifying for the distal stimulus. For example, it has been known for some time that the perceived vanishing point of a moving stimulus is localized beyond the actual vanishing point in the direction of stimulus motion (Hubbard, 1995, 2005). In addition, the magnitude of the spatial displacement (SD) varies with the laws of physics, in that the faster the stimulus moves, the larger the SD.

While SD is often accounted for in terms of representational momentum (the idea that evolution has endowed the brain with the ability to represent dynamic as well as static properties), Jordan (2009) argues SD has more to do with planning dynamics than representational dynamics. In Kerzel et al. (2001) SD was eliminated, and in Jordan et al. (2002), SD actually became negative (i.e., participants perceived the stimulus to vanish behind its actual vanishing point) if participants were asked to fixate on a centrally located fixation cross as the stimulus moved across the screen, or moved around the fixation cross, respectively. That is, once participants were not allowed to track the movements of the stimulus with their eyes, which requires planning, forward SD vanished.

Further experiments reveal that the "planning" that gives rise to SD has to do with the movements of the distal stimulus (i.e., remote sensory effects according to James, 1890), not the movements of the body (i.e., proximal effects). For example, Jordan et al. (2002) asked participants to fixate on a centrally located fixation cross as a stimulus moved on a circular trajectory around the fixation cross. Half of the participants were asked to press a button as soon as the stimulus began to move (i.e., the cue condition). The other half was asked to press the button in order to make the stimulus vanish (i.e., the intention condition). This manipulation resulted in two groups of participants who were generating the same proximal effects (i.e., fixate on a fixation cross and press a button) in order to obtain different distal effects (i.e., respond to the stimulus' onset or make it vanish). In the cue condition the pre-specified distal effect referred to the initial position of the stimulus, while in the intention condition, it referred to final position.

When participants pressed the button the stimulus vanished. Participants then indicated the perceived vanishing point. Analyses revealed that those responding to the onset of the stimulus (i.e., the cue condition) saw it vanish behind the actual vanishing point, in the direction of the initial position, while those who pressed the button to make the stimulus vanish (i.e., the intention condition) saw it vanish precisely where it had vanished. This difference in perceived vanishing points is consistent with the assertion that the influence of planning on spatial perception derives from the distal effect (i.e., stimulus movements) the participant is planning, not the body movements (i.e., proximal effects) generated in order to produce the distal effect. Both groups were specifying and controlling the same proximal effects (i.e., hold the eyes in a certain position and move the finger in a certain way), but they were doing so for different distal reasons. For those in the cue condition, the specified distal effect (i.e., press the button as soon as the stimulus appears) referred to the initial position of the distal stimulus, and the perceived vanishing points were attracted backward toward this initial position. For those in the intention condition, the specified distal effect (i.e., press the button in order to make the stimulus vanish) referred to the final position of the distal stimulus, and given the vanishing point was known by the participants because they pre-specified it and produced it, there was no SD.

Collectively, the data of Kerzel et al. (2001) and Jordan et al. (2002) indicate that an important portion of the SD experienced in studies involving oculomotor tracking (Hubbard, 2005) derives from the planning required to keep the eyes aligned with the movements of the stimulus. This is consistent with James' (1890) assertion that we are able to pre-specify remote (i.e., distal) effects. Another important aspect of these distal pre-specifications is that they have to be generated continuously if one is to successfully track the moving stimulus. To be sure, no one experiences himself or herself as consciously willing the eyes to move. Rather, what one experiences is the pre-specification and sustainment of "track the stimulus." How it is that one actually accomplishes the tracking is simply outside of one's conscious awareness. This is interesting, for it implies that the anticipated stimulus position (i.e., the specified distal effect) is not being generated consciously. That is, while one is consciously pre-specifying "track the stimulus," the anticipated stimulus positions are being generated outside conscious awareness.

In order to better understand how it is one can learn to unconsciously pre-specify distal effects, Jordan and Hunsinger (2008) conducted an experiment in which one participant controlled stimulus movements back and forth across a computer screen via right and left button presses on a keyboard while another participant, who could neither see nor hear the controller, observed the movements of the stimulus on a separate monitor. At some point during the trial the stimulus unexpectedly vanished and the observer moved a crosshair to the location on the monitor where she saw the dot vanish. After forty trials (i.e., Phase 1), the two participants switched roles and the participant who previously controlled the stimulus now indicated perceived vanishing points (i.e., Phase 2). Analysis revealed that participants with previous control experience produced significantly larger SD than participants having no such experience.

In another experiment, Jordan and Hunsinger (2008) investigated the aspects of Phase 1 control experience that led to later changes in perception. Specifically, they replicated Experiment 1 except for the fact that during Phase 1, an observational learner sat next to the controller. In Phase 2, the observational learner switched places with the Phase 1 naïve observer (i.e., the Phase 1 naïve observer controlled the stimulus while the Phase 1 observational learner observed stimulus movements and indicated perceived vanishing points). In addition, during Phase 1, half of the observational learners were allowed to see the movements of the stimulus on the computer screen as well as hear and see the button presses the controller made while controlling the stimulus movements. In contrast to these "full access" participants, the other half of the observational learners were denied access to the actions of the controller (i.e., a board prevented them from seeing the key presses while headphones prevented them from hearing the key presses). Analyses revealed that the Phase 2 SD from the "full access" observational learners was larger than that of the "no action access" observational learners. In addition, the SD values of the two groups basically replicated the SD pattern of Experiment 1, with "no action access" participants producing SD similar to that of naïve observers, and "full access" participants producing SD similar to that of observers having previous control experience. In short, observational learners who were given access to the proximal and distal effects generated by the controller (i.e., key presses and stimulus movements, respectively) later perceived the stimulus movements in the same way they were perceived by those who had actually controlled the stimulus previously.

Jordan and Hunsinger (2008) accounted for these findings by asserting that during Phase 2 observation, the moving stimulus activated the distal effect planning (i.e., planning of stimulus movements) the participant had learned to generate while controlling the stimulus during Phase 1. That is, having learned to control the distal event in Phase 1 (i.e., the movements of the stimulus), perception of the stimulus in Phase 2 activated Phase 1 control memories such that the participant experienced the stimulus in terms of the control dynamics learned during Phase 1.

Recent findings in cognitive neuroscience shed light on how "remembered control dynamics" can be activated during perception. First, areas of the cortex involved in *planning* distal events (i.e., pre-motor cortex) are also involved in *detecting* distal events (Rizzolatti et al., 1996; Hommel et al., 2001). This reveals that we perceive distal events in terms of the plans we would generate to produce the distal event ourselves. Second, as mentioned previously in the present paper, the vast majority of the cortex shares recursive innervation with the cerebellum (Miall, 2003; Koziol et al., 2011). Collectively, these findings indicate that (1) seeing is planning, and (2) planning activated by distal events is continuously, prospectively primed by recursive cerebro-cerebellar memories. Thus, while participants controlled the stimulus during Phase 1, patterns in the planning states they generated for the stimulus (e.g., make it accelerate, make it coast across the screen, make it decelerate, make it stop and change direction) altered the cerebro-cerebellar dynamics generating these plans such that during later observation, the movements of the stimulus gave rise to pre-motor planning dynamics that were continuously, prospectively primed by remembered planning patterns embedded in cerebro-cerebellar dynamics. To be sure, during later observation participants did not have to consciously generate anticipated stimulus locations. Rather, they consciously activated "watch the stimulus," and given that distal-event detection (i.e., perception) and distal-event planning (i.e., distal effect planning) share neural overlap, they "perceived" the stimulus movements in terms of distal-event plans that were continuously primed by cerebro-cerebellar memories. Thus, while "watch the stimulus" was consciously pre-specified, conscious anticipation of stimulus positions was unnecessary, for they were provided by cerebro-cerebellar memories.

While the notions that (1) distal-event planning and detection share neural overlap, and (2) planning is continuously primed via cerebro-cerebellar memories, collectively provide an account of why observers with previous control experience give rise to larger SD than naïve observers, they do not explain the differences in SD between the "full access" and "no action access" observational learners. As an account, Jordan and Hunsinger (2008) propose that full access observational learners developed planning memories like those of controllers because while they watched the controller control the stimulus during Phase 1, they had continuous access to both the movements of the stimulus (i.e., the distal effect) and the key presses the controller made in order to control the movements (i.e., the proximal effect). This assertion is supported by data indicating that in addition to pre-motor cortical areas being involved in both planning and detecting distal events (i.e., what is often referred to as *mirroring*), there are parietal cortical areas (i.e., PF) that are involved in both the planning and detection of proximal events (i.e., body movements). Iacoboni (2005) proposes that the pre-motor mirroring systems and the parietal mirroring systems, along with STS located in the temporal lobe, collectively constitute a mirroring system that affords us the ability to imitate and understand the actions of others. Iacoboni further proposes that while the planning which occurs in pre-motor cortex refers to distal events (e.g., grasp a raisin), the planning in parietal cortex has more to do with proximal effects (i.e., the anticipated somatosensory feedback of moving an effector). This assertion is based on findings that indicate that as one simply observes meaningful and meaningless actions, frontal mirroring activation is more prominent in the former, while parietal activation is more prominent in the latter (Grezes et al., 1999). This is because the parietal system is involved in the analysis of body movement. This "frontal-parietal" division of labor is further supported by the finding that both systems are active during the observation of meaningful and meaningless actions if one has the goal of imitating the observed action.

Jordan and Hunsinger (2008) utilized Iacoboni's frontal-parietal-STS mirroring system theory of imitation as an account of why observational learners with access to the actions later produced large SD similar to that of those having previous control experience. Specifically, they assert that while observing the controller control the movements of the stimulus via button presses, the movements of the stimulus activated the frontal mirroring system (i.e., distal-effect system; Jordan, 2003) while the sight and sound of the finger movements activated the parietal mirroring system. These frontal-parietal activations would have activated their associated cerebellar recursions. Given these mirroring systems are involved in both pre-specification (i.e., planning) and detection (i.e., perception), their continuous activation via observation could have driven the observer through the same planning states the controller was undergoing. It is as if the proximal-distal pattern generated by the controller hijacked the multi-scale planning states of the observer and, in a sense, drove the observer's proximal-distal cerebro-cerebellar systems as if the observer were generating the planning states endogenously. It seems possible that the repeated, exogenous activation of these systems led to changes in the observer's cerebro-cerebellar systems such that during later observation, the movements of the stimulus drove the observer's planning states as if the observer had actually had previous control experience. Observational learners who did not have access to the controller's actions would not have been able to experience the proximal-distal patterns generated by the controller (i.e., they only had access to the distal pattern of the movements of the stimulus). Hence, they did not have the opportunity to learn a proximal-distal model, and during later observation, simply experienced the stimulus much like a naïve observer.

Collectively, the work of Kerzel et al. (2001), Jordan et al. (2002), and Jordan and Hunsinger (2008) leads to two very important implications regarding multi-scale event control: (1) multiple effects at different scales (i.e., proximal and distal) are pre-specified continuously and simultaneously, and (2) the pre-specified effects are generated unconsciously via remembered "planning" dynamics embodied in cerebro-cerebellar loops. In addition, given that cortical areas involved in detecting events are also involved in pre-specifying distal events, it seems difficult to sustain the traditional practice of conceptualizing perception as an attention-attenuated input that is used to guide action. The finding that mirroring systems are involved in both the detection and pre-specification of distal effects indicates these systems simultaneously contain both the pre-specified distal effect (i.e., the goal) and the current state of the distal event (i.e., feedback). Thus, it seems evolution has left us with a rather elegant solution to controlling distal effects. Instead of developing one system for *doing* and another for *seeing,* as is assumed in traditional approaches to psychological functionality, evolution has endowed us with a host of systems that are able to pre-specify and detect distal events simultaneously.

## **COGNITION AS MULTI-SCALE EVENT CONTROL**

In addition to proximal and distal effects, however, participants in the above-mentioned studies were simultaneously pre-specifying and generating effects that could be labeled as cognitive. For example, all of the participants were pre-specifying and sustaining the abstract effect of complying with the experimenter's instructions. This is an abstract effect, what James (1890) referred to as "very remote indeed," because it is a pre-specification of how the participant will configure herself in the current context.

There are infinite degrees of freedom in terms of the proximal-distal pattern of effects one can pre-specify and sustain in a given context. The participant could have hopped on one leg across the room, or curled up into a corner and read a book. Both of these proximal-distal configurations were afforded by the laboratory context. Participants were able to inhibit all the other proximal-distal options the context afforded and, instead, produce the "make the stimulus move across the screen by pressing buttons" option requested by the experimenter, because they were able to pre-specify it (i.e., they constrained their proximal and distal effect systems toward controlling the movements of the stimulus) and sustain it (i.e., they prevented their proximal and distal effects systems from producing a different configuration). As any caregiver knows, getting a child to organize himself in a particular way in a particular context (e.g., clean up his room) is very difficult. Fair et al. (2007) report that the cingulo-opercular system believed to underlie set maintenance (i.e., the ability to focus on a specific task, that is, to maintain a specific abstract effect, for an extended period of time) segregates itself developmentally from the frontoparietal system believed to underlie adaptive online task control (i.e., proximal and distal effect control). Thus, by the time a college student participates in an experiment, she has already developed neural systems that afford abstract effect control.

To be sure, what a student is "doing" in an experiment is even more abstract (i.e., remote) than complying with instructions. For as students comply with instructions, they are actually doing so in order to sustain an even more abstract (i.e., more remote) effect; namely, receiving extra credit or monetary payment for participating in an experiment. And what is more, they are pre-specifying and sustaining the "extra credit" abstract effect in order to work toward achieving the even more abstract effect (i.e., very remote indeed) of receiving a particular grade in a course.

The point being made here is that when a person participates in an experiment, they are doing significantly more than theories of action, perception, and/or cognition often give them credit for doing. Specifically, while participants in the previously mentioned studies were pre-specifying and producing the distal effect regarding the distal stimulus (i.e., make the stimulus move back and forth across the screen), they were *simultaneously* pre-specifying and generating multiple proximal effects (e.g., move your eyes in a manner that allows you to track the stimulus, and move your fingers in ways that result in the buttons being pressed). To be sure, they were actually pre-specifying and generating many, many other proximal effects all at the same time, such as hold the body in a particular position, keep the hand configured in a way that affords fast button presses, and keep the head positioned toward the computer monitor.

One could try to make the case that participants were "perceiving," "acting," and "thinking." But in reality, as the above-stated examples indicate, they were doing so much more (i.e., pre-specifying and sustaining a constellation of multi-scale effects), and they were doing all of it at the same time.

## **HOW WE DO IT**

Given that persons pre-specify and sustain multiple effects at multiple time-scales simultaneously, it is not clear to what extent the concepts "action," "perception," and "cognition" are terribly useful in a scientific context. Traditional assumptions that frame perception as input, action as output, and cognition as intermediary processing fail to acknowledge the cerebro-cerebellar homology that underlies the various levels of effect control. As a result, such assumptions overlook the fact that all levels of effect control entail pre-specification (i.e., planning) and detection (i.e., perception).

### **PLANNING AND CONTROL IN MSEC**

To be sure, "planning" looks different in this framework in that it (i.e., planning) takes place continuously at multiple levels of effect control as the cerebellum prospectively and continuously primes the cortex. "Control" also looks different within the framework of MSEC because it (i.e., control) does not mean "cause" in the efficient cause sense that one level of effect control (e.g., conscious thought) "causes" changes in another (i.e., action) in the same way one billiard ball "causes" another to move (Jordan and Ghin, 2007). Rather, within the framework of MSEC, different levels of effect control *constrain* each other. That is, the more proximal scales of effect control (e.g., moving one's hand a particular way, or positioning one's body in a particular configuration) find themselves prospectively and continuously *constrained* toward the generation of pre-specified distal effects (e.g., press the buttons or sit in front of the computer, respectively). And these distal-effect systems (Jordan, 2003; Clark, 2007) find themselves prospectively and continuously constrained by more remote, abstract effect systems (e.g., comply with the experimenter's instructions or obtain extra credit points for a course) as well as proximal effect control systems.

*Constraint* in this sense means that the cortical areas involved in different levels of effect control influence each other continuously via neural recursion. For example, Fair et al. (2007) report that during the developmental segregation of the cinguloopercular system (i.e., the system believed to underlie set maintenance) and the frontoparietal system (i.e., the system believed to underlie online task control), short-range neural connections between closely adjacent brain regions within each system "grow down" (i.e., decrease) with age, while long-range functional connections between the systems "grow up" (i.e., increase). In addition, they speculate,

These developmental dynamics may represent a learning mechanism whereby precursors to adult task sets are originally derived from more available signals generated by regions of the more rapidly adaptive control network (i.e., frontoparietal). In this sense, the performance of tasks with novel components would rely more heavily on rapidly adaptive control generated by the frontoparietal network. With greater age, and therefore greater experience, stored task sets may be retrieved and stably maintained throughout the task epoch by the cinguloopercular network. (p. 13511)

This developmental increase in long-range neural connectivity allows different levels of effect control to constrain, not cause, each other because at any given moment, the activity of a given neural area is modulated continuously by both long-range and short-range projections. Thus, the activity configuration in a given cortical area at any given time constitutes an emergent, dynamic compromise among all the forces impinging upon the neurons that constitute that cortical area. In short, the extreme level of recursion in brain organization makes it difficult, if not impossible, to make coherent "efficient-cause" assertions regarding brain dynamics in general, let alone the type of influence one level of event control shares with another, specifically.

MSEC's proposal to conceptualize brain dynamics in terms of constraint as opposed to efficient cause is consistent with Rosen's (1991) assertion that the dynamics of biological systems in general are simply closed to efficient cause. It is also consistent with Van Orden and Holden's (2002) assertion that there does not exist a causally isolated level of brain dynamics capable of mediating efficient cause relationships between isolated content vehicles. Rather, brain dynamics are inherently "interaction-dominant" in that activity in all neurons, as well as neural areas, is continuously modulated by the activity taking place in a plethora of other neurons and neural populations. To continue the recursion, recursive brain dynamics are continuously modulated by body and world dynamics, just as body and world dynamics are continuously, recursively, modulated by brain dynamics.

## **MULTI-SCALE RECURSION, ACTION CONTROL, AND CONSCIOUSNESS**

In the midst of all this multi-scale recursion (i.e., constraint), it becomes increasingly difficult to sustain "efficient cause" approaches to "how we do it." Neither psychological functions, neural networks, nor neurons are causally isolated. As a result, assertions regarding whether or not there exist efficient cause relationships between thought and action might simply be outdated. And experiments that reveal persons to be capable of feeling as though they caused events they did not, or as though they did not cause events they actually did, might be misinterpreted.

Instead of such data revealing a delusion of control (Wegner, 2002), they might reveal intervals of uncertainty that emerge spontaneously as one controls multiple effects simultaneously. For example, Knöblich and Kircher (2004) asked participants to control the trajectory of a dot presented on a computer monitor. They did so by moving a stylus on a writing pad capable of detecting and codifying stylus movements. Participants could not see their hand movements. The specific task was twofold: first, they were to make the dot move in such a way that its movement through the 12:00 position was synchronized with the presentation of a temporally predictable tone. Second, they were to lift the stylus from the pad if, at any moment, they detected a difference between stylus and dot movement.

To assess the participants' sensitivity to perturbations of visuomotor coordinations, the experimenters manipulated the relationship between the movement dynamics recorded on the digital pad and the visual effects displayed on the monitor such that on the fourth cycle of a given trial, the velocity of the dot was increased relative to the movements recorded on the pad. As a result, in order to execute a visual circle, participants had to basically draw an ellipse. Results indicated that participants did not become aware they were drawing ellipses versus circles until the velocity of the stimulus was increased by 50% of its initial value.
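The geometry behind "draw an ellipse to see a circle" can be made concrete with a minimal sketch. It assumes, purely for illustration, that the display multiplies horizontal stylus displacement by a gain while leaving vertical displacement unchanged; this mapping, the unit-circle target, and the function names are hypothetical and are not taken from Knöblich and Kircher's (2004) procedure.

```python
import math


def stylus_point_for_dot(theta: float, gain: float) -> tuple[float, float]:
    """Stylus position that puts the dot at angle `theta` on a unit circle.

    Hypothetical mapping (an assumption, not the study's procedure): the
    display multiplies horizontal stylus displacement by `gain` and leaves
    vertical displacement unchanged.
    """
    dot_x, dot_y = math.cos(theta), math.sin(theta)
    return dot_x / gain, dot_y


def stylus_path(gain: float, n_points: int = 360) -> list[tuple[float, float]]:
    """Sample the stylus trajectory for one full cycle of the dot."""
    return [
        stylus_point_for_dot(2 * math.pi * k / n_points, gain)
        for k in range(n_points)
    ]


def semi_axes(path: list[tuple[float, float]]) -> tuple[float, float]:
    """Return the largest horizontal and vertical excursions of a path."""
    return max(abs(x) for x, _ in path), max(abs(y) for _, y in path)


if __name__ == "__main__":
    # gain = 1.0 is the unperturbed case; gain = 1.5 mimics a 50% increase.
    for gain in (1.0, 1.5):
        horizontal, vertical = semi_axes(stylus_path(gain))
        print(f"gain={gain}: stylus semi-axes = ({horizontal:.3f}, {vertical:.3f})")
```

Under this assumed mapping, a gain of 1 leaves the hand path circular, whereas any gain above 1 compresses the hand path along the amplified axis, so the hand traces an ellipse while the dot still traces a circle.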

From the traditional perspective, one might claim that during the period in which participants were not aware of the discrepancy between the ellipses they were generating with their hands and the circles they were generating on the monitor, they were suffering a delusion of control. That is, one might assert participants were experiencing an illusion of conscious will because they were intending to produce circles with their hands but were actually producing ellipses. However, it might also be the case that the participants' proximal and distal control systems were functioning properly (i.e., the proximal systems were controlling hand movements, that is, pre-specified and achieved kinesthetic feedback, while the trajectory of the hand movements was continuously constrained by distal effect systems). In this sense, one might propose that proximal and distal effect systems are coupled in such a way that the function of the latter is not to "cause" the former, but rather, to "constrain" the former toward a specific distal outcome: draw a circle.

Given that *constraint* takes time as the neurodynamics supporting one level of event control influence the neurodynamics of another, one should not be surprised to find temporal windows (i.e., psychophysical intervals of uncertainty) during which a distal-event system is "unaware" of the faster time-scale dynamics of the proximal-event systems the former is constraining. Dennett (1991) said much the same thing in his critique of Libet's (1985) paradigm. Specifically, he claimed that the temporal order, or sequence, in which the nervous system distributes information is not dictated by the order in which the information is transduced by the sense organs. Rather, it is dictated by the temporal constraints imposed by the on-going control of the body in space-time. Dennett refers to these constraints as "temporal control windows" and contends that the nature of these windows is a function of the relevant sensory-motor coordination.

When we are engaged in some act of manual dexterity, "fingertip time" should be the standard; when we are conducting an orchestra, "ear time" might capture the registration. (p. 162)

According to MSEC, different event control systems will have different "temporal control windows," and results such as those obtained by Knöblich and Kircher (2004) emerge out of the multi-scale temporal control windows demanded by a certain task.

This idea of yoked, yet distinct systems that function simultaneously and mutually constrain one another is part and parcel of the distinctions vision researchers frequently make when referring to Milner et al.'s (2006) distinction between vision for perception and vision for action. The major difference between MSEC and Milner and Goodale's model is that, in the latter, "vision for action" and "vision for perception" basically reduce to "visual *input* for moving" and "visual *input* for seeing," respectively. This notion that perception (i.e., seeing) constitutes *input* still permeates both contemporary philosophical and psychological discussions regarding the ventral-dorsal distinction (Clark, 2007; Milner et al., 2013). In MSEC, perception is not *seeing*; it is not *input*. Rather, it is the pre-specification and detection of distal events. In short, it is distal-effect control. And what is more, all levels of effect control entail pre-specification (i.e., planning) and detection (i.e., perception). So to refer to one brain area as a "doing" area and another as a "seeing" area prevents one from recognizing that all such areas are "doing" something (i.e., controlling effects) via the same cerebro-cerebellar homology, just at different, yet yoked, time scales.

## **WHY IT HAS MEANING**

To be sure, conceptualizing "how we do it" in terms of multi-scale systems sharing recursive interactions is not new. This idea is espoused by many theorists in both the dynamical systems camp (Clark, 1997, 2000; van Gelder, 1998; Juarrero, 1999; O'Regan and Noë, 2001; Myin and O'Regan, 2002; Van Orden and Holden, 2002) and the computationalist camp (Powers, 1973; Kawato et al., 1987; Meyer and Kieras, 1997a,b). What distinguishes MSEC from these other approaches is the manner in which it conceptualizes the nature of the multi-scale dynamics. Specifically, MSEC is actually a sub-component of a larger theoretical framework known as Wild Systems Theory (WST). According to WST, living systems are comprised of multi-scale systems of self-sustaining work. "Self-sustaining work," in this context, refers to patterns of energy transformation that produce products that feed back into and sustain the work that produced the product in the first place. [For a thorough description of WST and its take on self-sustaining work please see Jordan and Ghin, 2006, 2007; Jordan, 2008; Jordan and Heidenreich, 2010; Jordan and Vinson, 2012]. According to Jordan and Vinson (2012):

At the chemical level, self-sustaining work has been referred to as autocatalysis (Kauffman, 1995), the idea being that a self-sustaining chemical system is one in which reactions produce either their own catalysts or catalysts for some other reaction in the system. At the biological level, self-sustaining work has been referred to as autopoiesis (Maturana and Varela, 1980), again, the idea being that a single cell constitutes a multi-scale system of work in which lower-scale chemical processes give rise to the larger biological whole of the cell which, in turn, provides a context in which the lower-scale work sustains itself and the whole it gives rise to (Jordan and Ghin, 2006). Hebb (1949) referred to the self-sustaining nature of neural networks as the "cell assembly," the idea being that neurons that fire together wire together. Jordan and Heidenreich (2010) recently cast this idea in terms of self-sustaining work by examining data that indicate the generation of action potentials increases nuclear transcription processes in neurons which, in turn, fosters synapse formation. At the behavioural level, Skinner (1976) referred to the self-sustaining nature of behaviour as operant conditioning, the idea being that behaviours sustain themselves in one's behavioural repertoire as a function of the consequences they generate. Streeck and Jordan (2009) recently described communication as a dynamical self-sustaining system in which multi-scale events such as postural alignment, gesture, gaze, and speech produce outcomes that sustain an ongoing interaction. And finally, Odum (1988) and Vandervert (1995) used the notion of self-sustaining work to refer to ecologies in general. (p. 235)

WST's assertion that organisms are constituted of multi-scale self-sustaining work reveals the dynamic homologies that transcend both the phyla and the nesting of multi-scale energy-transformation systems that constitute a single organism. From plants, to neurons, to behavior, to persons, to human societies, increasingly complex systems of work (i.e., energy transformation) have evolved precisely because the very work of which they are constituted is self-sustaining. That is, the work produces catalysts for either the work itself, or some other level of work in the multi-scale system.

In addition to revealing the multi-scale homologies that constitute an organism, WST's notion of multi-scale self-sustaining work affords a conceptual reframe of the context in which organisms sustain themselves. In traditional accounts, nature is conceptualized as being physical, and phenomenal properties such as meaning, value, and consciousness are conceptualized as identical with the physical (i.e., identity theory), as emergent from the physical (i.e., emergentism), as an informational property of causal relations (i.e., functionalism), or as an aspect of reality other than the physical (i.e., double-aspect theory and property dualism). Again, as was stated at the outset of the present paper, in all of these positions, phenomenal properties do not constitute a logically necessary aspect of the causal story. As a result, phenomenal properties do not enter into a scientific, causal description of what we do and how we do it. In short, consciousness is an epiphenomenon.

Within WST, however, "nature" is conceptualized as a self-organizing energy-transformation hierarchy (Odum, 1988; Vandervert, 1995) within which "the fuel source dictates the consumer" (Jordan and Ghin, 2006). What this means is that any system that sustains itself on a given fuel source (e.g., plants on sunlight, herbivores on plants, or carnivores on herbivores) must be constituted in such a way that it is capable of addressing all the constraints involved in capturing that fuel source. Given this necessary connection between a consumer, its fuel source, and the context in which the two exist, it seems appropriate to claim that an organism constitutes a multi-scale, self-sustaining embodiment of the constraints entailed in taking in, transforming, and dissipating its fuel source. Said another way, organisms are self-sustaining embodiments of the contexts in which they phylogenetically and ontogenetically emerged.

Conceptualizing organisms as embodiments of context is an important move for WST because it provides a means of conceptualizing organisms as inherently meaningful. Specifically, if an organism constitutes an embodiment of context, then it is naturally and necessarily "about" that context. That is, its internal dynamics are phylogenetically and ontogenetically emergent from the energy-transformation hierarchy in which it has sustained itself. As a result,

. . . there is no epistemic gap between an organism and its environment. Organisms do not need to be "informed" by environments in order to be about environments because they are necessarily "about" the contexts they embody. Rather, what self-sustaining systems need do is sustain relationships with the contexts in which they are embedded in ways that lead them to sustainment. According to WST, meaning is constitutive of embodied context (i.e., bodies). As a result, living systems are necessarily meaningful (Jordan, 2000a), not because a body is alive or dead, because it is physical, or because it is biological. Living is meaning because it is sustained, embodied context. (Jordan and Vinson, 2012, p. 9)

## **EMBODIED CONTEXT, ACTION-CONTROL, AND CONSCIOUSNESS**

Given the notion of "embodied context," WST asserts that the phenomenon we refer to as consciousness is actually a phylogenetically scaled-up recursion on the embodied aboutness inherent in all organisms. The distality of the aboutness (i.e., the level of conscious awareness) an organism entails varies with the distality of the contexts in which the organism can pre-specify outcomes and work to sustain those outcomes: resident, remote, and sometimes very remote indeed. As an example of species differences in the scale of event control, while my dog and I can jointly sustain the outcome of playing tug-of-war in the here-and-now, my dog is not able to organize himself in the here-and-now in order to play tug-of-war again at the same time tomorrow. Dogs are not able to pre-specify the very remote effect, "tomorrow," and therefore, cannot sustain a relationship with it. From this perspective, I am able to pre-specify and sustain relationships with contexts that are vastly more "remote" than those of my dog.

On the one hand, it may seem that the obvious account of why different species sustain effects at different time-scales is a neural one; organisms capable of sustaining increasingly abstract effects (e.g., "tomorrow," "next June," or "forever") can do so because they have more sophisticated brains. On the other hand, WST proposes it is more than just brains. Rather, consistent with Oyama (2000), Jordan (2008) asserts that the sustainment of abstract contexts necessitates the emergence and sustainment of external contexts such as language and technology specifically, and culture, in general (what Oyama refers to as developmental contexts). It is within this entire multi-scale, contextually emergent, self-sustaining system of work that nested sub-systems (i.e., individual humans) are able to generate and sustain abstract contexts. Again, consistent with Oyama, from this perspective, infants inherit much more than genes. In short, they inherit a culture.

Within such a multi-scale, self-sustaining transformation hierarchy, the issue of action-control and consciousness is about so much more than the issue of whether or not conscious thoughts cause actions. To be sure, consciousness (i.e., embodied aboutness, or embodied context) does exist, but not in the way it is thought to exist within traditional frameworks that conceptualize consciousness as being opposed to unconsciousness. Again, within WST, "aboutness" is a constituent property of all self-sustaining embodiments of context. All self-sustaining systems *are* aboutness (Jordan, 2000a). (See Jordan and Vinson, 2012, for a thorough analysis of how these ideas relate to non-living systems.) Thus, according to WST, the issue of consciousness and action control, as it is traditionally framed, is recast in terms of the effects that are most prescient during any given moment of multi-scale effect control. For example, while conversing with a friend and walking down a flight of stairs, sensed foot pressure is not prescient (i.e., it is not within one's currently reportable conscious states) until there is an error (i.e., pre-specified and attained foot pressure do not match). As one starts to fall because the predicted floor is not there, sensed foot pressure becomes prescient; it comes to dominate immediate, reportable consciousness as one struggles to avoid falling down the stairs.

From this perspective, consciousness does not reside at a particular level of event control. Instead, it is fluid and makes its way transiently to different levels of effect control as different effects become contextually prescient (Jordan, 2003). This implies that consciousness has more to do with managing relationships across different levels of effect control. In a test of this idea, Kumar and Srinivasan (2013) asked participants to use a joystick to aim at targets on a computer screen. Target trajectory entailed one of three levels of random perturbation. After the participant pulled the trigger on the joystick, a stimulus appeared at the targeted location, and the participants indicated (1) how much time passed between the trigger pull and the appearance of the stimulus, and (2) how confident they were that they themselves were the author of the action. Results revealed that if participants missed the target (i.e., the more distal effect was not achieved), estimates of the action-stimulus interval were significantly correlated with the actual action-stimulus interval as well as the degree of noise in the target movements. Specifically, as the amount of noise in the target movements decreased, time estimates also decreased. This temporal attraction of the timing of a post-action stimulus toward the moment of the action is referred to as intentional binding (Jordan, 2000b; Haggard et al., 2002), and it is assumed to constitute an implicit measure of one's sense of agency. If, however, the participant hit the target, the pattern changed. Specifically, estimates of the action-stimulus interval were significantly correlated with the action-stimulus interval (i.e., intentional binding occurred), but they were not correlated with the degree of noise in the stimulus. 
This indicates that once the distal effect is achieved, one's consciousness is more about the achieved distal effect than the constraints that had to be addressed by the proximal control systems as they worked to achieve the distal effect.

The idea that consciousness ebbs and flows across different levels of effect control has much in common with Vallacher et al.'s (1989) action-identity theory, which assumes that there are many different ways to cognitively identify (i.e., represent) a given action, but only one identification tends to be prepotent for the actor at any given moment:

. . . although talking, for example, could be identified as sharing information, expressing an opinion, influencing someone, passing time, or choosing words, the actor is likely to have in mind only one of these identifications. (p. 199)

The notion of consciousness working as a manager across levels of effect control is also consistent with Baars' global workspace hypothesis (1988), which asserts that the purpose of consciousness is to make the contents regarding a specific conscious experience massively available to a host of unconscious brain processes so that these latter brain processes can be brought to bear on the immediate situation. From this perspective, consciousness ebbs and flows across different contents as different problems emerge for the system in real time. Morsella (2005) proposes a similar view in which the purpose of phenomenal (i.e., conscious) states is the resolution of conflicts between action plans as different action systems compete for expression through the skeletal muscular system, what he refers to as PRISM (i.e., parallel responses into skeletal muscle).

Common to PRISM, Global Workspace Theory, and WST is the idea that potential conflicts among competing actions (i.e., effect control systems) need to be sorted out by the system. From the traditional perspective, this might be taken to mean that a certain conscious state intervenes and causes a particular action to be expressed. From the perspective of WST, it means that at any given moment, the pattern of multi-scale effects one works to control emerges spontaneously and continuously out of both exogenous influences that activate pre-specifications of past effect-control episodes via cerebro-cerebellar systems, and the endogenous constellation of constraint that builds up over the life course across different levels of effect control. Imamizu and Kawato (2009) review a host of empirical findings that are consistent with the idea that moment-to-moment changes in effect-control dynamics, what they refer to as the switching of internal models, are brought about by the continuous, exogenous and endogenous modulation of internal models (i.e., cerebellar control models).

On the one hand, GWT and PRISM seem to have the advantage of Occam's razor. They provide a clear, causal story of how changes in a physical system like the brain are associated with conscious states. On the other, WST overcomes the potential epiphenomenalism inherent in the physicalism of both GWT and PRISM, because WST provides an account of what consciousness is and why it is necessary. However, according to WST, consciousness is not necessary because it helps physical brains sort out potential actions. Rather, it is necessary because it is what organisms are.

## **CONCLUSIONS**

The purpose of the present paper was to present an approach to the issue of *consciousness* and *action control* that, in the end, challenges the utility of concepts such as *consciousness* and *action control* in a science of what we do and how we do it. Traditional models assert we do things such as act, perceive, think, attend, and remember. While these concepts have great utility in daily life, from which they emerged, it is my contention they are not complex enough to address the host of hypercomplex regularities cognitive science has discovered over the past 30 years. Brains specifically, and living systems in general, have turned out to be closed to efficient cause (Rosen, 1991) and interaction-dominant (Van Orden and Holden, 2002). Action oriented areas of the brain have turned out to be simultaneously perceptual (Miall, 2003), and moment-to-moment experience finds itself having a prospective, anticipatory edge as memories of the past continuously prime those areas of the cortex we once thought served the purpose of informing us about the present. What we do and how we do it turns out to be continuous, multi-scale, and wild. By wild I do not mean out of control. To the contrary, I mean massively in control. Not like a closed system such as a robotic arm placing hyper accurate welds on an assembly line, or a computer code parsing chunks into appropriate sectors. Rather, like an open system such as a bird in flight, whose wing dynamics absorb and resist the multi-scale wind patterns it encounters in real time, not because it has to control its flight, but because controlling flight is what it is.

## **REFERENCES**

and the inner zombie. *Br. J. Philos. Sci.* 58, 563–594. doi: 10.1093/bjps/axm030

*Sci.* 24, 849–878. doi: 10.1017/S0140525X01000103


*XIX: Common Mechanisms in Perception and Action* eds W. Prinz and B. Hommel (Oxford, England: Oxford University), 158–176.


*35th Annual Meeting of the Cognitive Science Society,* (Berlin).


way to naturalise phenomenology? *J. Conscious. Stud.* 9, 27–45.


*Psychol.* 56, 199–208. doi: 10.1037/0022-3514.56.2.199


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 June 2013; paper pending published: 16 July 2013; accepted: 11 August 2013; published online: 03 September 2013.*

*Citation: Jordan JS (2013) The wild ways of conscious will: what we do, how we do it, and why it has meaning. Front. Psychol. 4:574. doi: 10.3389/fpsyg.2013.00574*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Jordan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Action simulation: time course and representational mechanisms

## *Anne Springer<sup>1,2</sup>\*, Jim Parkinson<sup>3,4</sup> and Wolfgang Prinz<sup>1</sup>*

*<sup>1</sup> Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany*

*<sup>2</sup> Department of Sport and Exercise Psychology, University of Potsdam, Potsdam, Germany*

*<sup>3</sup> Institute of Cognitive Neuroscience, University College London, London, UK*

*<sup>4</sup> Sackler Centre for Consciousness Science, University of Sussex, Falmer, UK*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

#### *Reviewed by:*

*Cosimo Urgesi, University of Udine, Italy*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

#### *\*Correspondence:*

*Anne Springer, Department of Sport and Exercise Psychology, University of Potsdam, Am Neuen Palais 10, D-14469 Potsdam, Germany e-mail: anne.springer@uni-potsdam.de*

The notion of action simulation refers to the ability to re-enact foreign actions (i.e., actions observed in other individuals). Simulating others' actions implies a *mirroring* of their activities, based on one's own sensorimotor competencies. Here, we discuss theoretical and experimental approaches to action simulation and the study of its representational underpinnings. One focus of our discussion is on the timing of internal simulation and its relation to the timing of external action, and on a paradigm that requires participants to predict the future course of actions that are temporarily occluded from view. We address transitions between perceptual mechanisms (referring to action representation before and after occlusion) and simulation mechanisms (referring to action representation during occlusion). Findings suggest that action simulation runs in real time, acting on newly created action representations rather than relying on continuous visual extrapolations. A further focus of our discussion pertains to the functional characteristics of the mechanisms involved in predicting other people's actions. We propose that two processes are engaged, dynamic updating and static matching, which may draw on both semantic and motor information. In a concluding section, we discuss these findings in the context of broader theoretical issues related to action and event representation, arguing that a detailed functional analysis of action simulation in cognitive, neural, and computational terms may help to further advance our understanding of action cognition and motor control.

**Keywords: action simulation, internal forward models, occlusion, point-light action, static matching, predictive coding**

## **REPRESENTATION OF OCCLUDED ACTION**

In recent years, the concept of *simulation* has flourished within various fields of psychological research, ranging from agency (Ruby and Decety, 2001), action perception (Rizzolatti et al., 1996; Blakemore and Decety, 2001; Mukamel et al., 2010), imitation (Brass et al., 2001; Buccino et al., 2004), mind reading (Gordon, 1996, 2001; Goldman, 2005, 2006), and empathy (Chartrand and Bargh, 1999; Gallese et al., 2004) to the study of clinical issues like schizophrenia (Enticott et al., 2008). Across these domains and the majority of studies, the term simulation concordantly expresses the idea that humans possess nonconceptual and direct ways of understanding others' thoughts, feelings, intentions and actions by *mirroring* or *re-enacting* their mental states and physical activities (Blakemore and Decety, 2001; Rizzolatti et al., 2006).

The present paper focuses on action simulation. It aims to discuss theoretical and experimental approaches to action simulation and its cognitive underpinnings, broadly understood as internal representations that parallel external action events, like observing a friend making a cup of coffee or a couple dancing together. From a functional point of view, internal simulations of physical actions may improve our appraisal of actions that we plan to perform in collaboration with others and that require us to act in response to and in anticipation of the actions of others (e.g., Sebanz and Knoblich, 2009; Doerrfeld et al., 2012; Manera et al., 2013). A recent study illustrative of this demonstrated that judgments of the weight of an object varied according to whether the participants planned to lift the object by themselves or whether they planned to lift it together with a co-actor who was either healthy or injured (Doerrfeld et al., 2012).

Here, we use the term simulation to refer to the mental operations involved in internally representing actions during so-called visual action occlusions. In everyday life, when we watch other people moving around us, it often happens that they disappear from sight for a moment. However, in these situations, behaviours and physical actions observed immediately prior to occlusion do not just stop. Based on what we have seen before, we are usually capable of internally substituting (simulating) the invisible parts of an action and rendering quite precise predictions about its future course (i.e., what the agent will do, and when his/her action will take place).

Neurophysiological evidence in monkeys demonstrates that neurons continue to fire for some time in response to specific actions even after these actions disappear behind an occluding object (Umilta et al., 2001; Jellema and Perrett, 2003). In human experiments using action occlusion paradigms, participants typically watch familiar actions that are briefly occluded from view and then continued. A typical task is then for participants to indicate if the re-appearing action part (after occlusion) is an accurate continuation of the perceived action (seen prior to occlusion), or if it has changed in spatial angle (e.g., Graf et al., 2007; Springer et al., 2011) or has jumped in time (e.g., Stadler et al., 2012b). High accuracy on this task requires a precise internal representation (simulation) of the occluded action part, and factors that influence performance can then be measured. For instance, Stadler et al. (2012b) examined the accuracy of action simulation when the observed actors moved according to human-like vs. artificial motion profiles whose kinematics had been changed according to a non-human, constant velocity. Under conditions with temporary occlusions, observers were clearly better able to predict human-like actions than non-human actions (Stadler et al., 2012b), highlighting the sensitivity of the human action simulation system to human-like kinematics (Parkinson et al., 2012; Saygin and Stadler, 2012; see Gowen and Poliakoff, 2012, for a review).

Since temporary occlusions require switching back and forth between externally guided perception and internal representations (simulations), action occlusion paradigms also allow the study of these two phases separately and in terms of their interrelations. In an illustrative study by Prinz and Rapinett (2008), the participants observed a human hand transporting an object. After a moment, the hand with the object was briefly occluded from view. Participants were required to make a judgment about the time that they thought the transporting hand with the object would reappear from behind the occluding object (**Figure 5**). Results indicated a substantial positive time error (i.e., lag effect), meaning that the continuation of the action after occlusion was judged to be "just in time" when the point of reappearance was slightly shifted ahead (by 20–100 ms). This finding provided a first insight into the timing and nature of internal action representations (simulations) during occlusions, suggesting that the mental operations called up during occlusions involve the generation of novel action representations rather than just pure extrapolation of perceived movement trajectories (Prinz and Rapinett, 2008).

The authors of this study argued that, unlike action perception, which naturally draws on external resources derived from actual stimulation, action simulation draws on internal resources derived from stored knowledge (cf. Prinz and Rapinett, 2008). In the following section, we focus on the temporal characteristics of action simulation. Based on the experimental evidence from different occlusion tasks, we propose that action simulation engages internal online processes operating in real time, which act on newly created action representations rather than relying on continuous visual extrapolations of observed movement trajectories.

In Section Representational Mechanisms, we discuss another strand of experimental studies that have explored the procedural and representational characteristics of the processes engaged for solving action occlusion tasks. Two major findings emerge. Firstly, predicting occluded actions may engage two distinct processes: dynamic simulation and static matching. Neither process by itself speaks to the representational format in which it occurs (e.g., simulation/matching in the motor and/or visual domain). Secondly, two different kinds of representational domains may be involved: sensorimotor processes (those involved in an observer's own physical activity) and semantic processes (those involved in understanding action-related verbal contents). In a concluding section, we propose that the concept of internal action simulation can be related to a predictive coding account of motor control (e.g., Kilner et al., 2007), in correspondence with the broader notion that humans can use their motor system to simulate and predict others' actions (Grèzes and Decety, 2001; Jeannerod, 2001). In line with this notion, action simulation may also be linked to embodied views of language, holding that processing verbal and conceptual action-related information is strongly linked to (and may even rely on) information processing in sensory and motor domains (e.g., Barsalou, 2003, 2008; Zwaan, 2004; Pulvermüller, 2005, 2008; Glenberg, 2008; Mahon and Caramazza, 2008, 2009).

## **TIME COURSE**

## **SIMULATION IN REAL TIME: THE OCCLUDER PARADIGM**

The first research into real-time action simulation used what we refer to as the *occluder paradigm*, first developed by Graf et al. (2007). This paradigm has been used, with some novel variations, in subsequent research on action simulation (Prinz and Rapinett, 2008; Parkinson et al., 2011; Sparenberg et al., 2012). The occluder paradigm is based on the hypothesis that when we observe a human moving who is then occluded from view (perhaps he or she disappears behind a fence), an internal action simulation runs in real time, predicting the ongoing motion. Then the person reappears in view at a point in the motion that either matches or does not match that internal real-time simulation. In this way, the spatiotemporal accuracy of the action simulation and the ability of people to use action simulation to aid their perception and prediction of human movement can be tested.

The original occluder paradigm (Graf et al., 2007) used representations of human motion known as *point-light actors* (PLAs; Johansson, 1973), which convey motion via a group of moving dots that track the motion of the major joints of the human body. These stimuli have been widely used to examine human movement processing (Johansson, 1973, 1976; Cutting et al., 1988). Graf et al.'s (2007) original studies used PLA stimuli of noncyclical human motions, such as performing a basketball shot or hitting a tennis ball. The PLA was presented to the participant for a short period of 2–4 s before being occluded from view for a fixed amount of time (occluder time; 100, 400 or 700 ms, see **Figure 1**). Following this, a static test posture of the action was presented that was either rotated around the actor's vertical axis, as if he had suddenly spun to the left or the right, or in the correct orientation, as if he had continued smoothly on in his motion. The participants' task was to judge whether this rotation had occurred or not (yes/no response). Crucially, and independent of the spatial orientation, the test posture was either a congruent or incongruent continuation of the motion. That is, if the test posture was congruent, it was taken from the point in the motion that the actor would have reached if he had continued for the exact duration of the occluder. In this condition, the test posture should match the state of the internal real-time simulation. In the incongruent conditions, the test posture was from a point of the motion that was too early or too late with respect to the exact occlusion period. In this case, the test posture would not match the current state of the real-time action simulation. The point in the motion from which the test posture is taken, measured from occlusion onset, is referred to as the *pose time* (which, again, can take three values, 100, 400 or 700 ms, see **Figure 1**).
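The timing logic of this design can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' stimulus code: the function name `classify_trial` and the full crossing of occluder and pose times are hypothetical, and `pose_ms` is taken here to mean how far into the motion, measured from occlusion onset, the test posture lies. Only the three values (100, 400 and 700 ms) come from the description above.

```python
from itertools import product

TIMES_MS = (100, 400, 700)  # values reported for the paradigm


def classify_trial(occluder_ms: int, pose_ms: int) -> str:
    """Label a trial by comparing the test posture's position with the occluder duration.

    Hypothetical helper: pose_ms is the distance into the motion (from occlusion
    onset) at which the static test posture was sampled.
    """
    if pose_ms == occluder_ms:
        return "congruent"
    if pose_ms < occluder_ms:
        return "incongruent (too early)"
    return "incongruent (too late)"


# Assumed full crossing of occluder time and pose time (3 x 3 = 9 cells).
design = {
    (occ, pose): classify_trial(occ, pose)
    for occ, pose in product(TIMES_MS, TIMES_MS)
}

for (occ, pose), label in design.items():
    print(f"occluder {occ} ms, pose {pose} ms -> {label}")
```

Under these assumptions, only the three cells on the diagonal (equal occluder and pose time) should match the state of a simulation that runs in real time.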

The hypothesis was that if the test posture matched the current state of the action simulation, then the orientation judgment would be easier and more successful. This was indeed the case: error rates for congruent continuations were less than those for incongruent continuations across different occluded durations, a finding that was demonstrated across a number of different types of human motions. This was the first evidence that the real-time action simulation existed and that it could be tested by utilizing its beneficial effects in visual judgment tasks. Graf et al. (2007) conducted a further experiment using PLAs that were inverted on their vertical axis; it is well-known that it is difficult to perceive human motion under these conditions. The results showed that there was no judgment-benefit for the congruent test postures compared to the incongruent ones. This suggested that the effects seen in the earlier studies were specific to human motion itself and not some more generalized visual predictive system. That is, the real-time simulation is specifically concerned with predicting coherent examples of human actions.

#### **BENEFITS OF REAL-TIME SIMULATION**

Having described the occluder paradigm for testing action simulation, we will now review recent research that has used and further developed it to measure various aspects of action simulation, such as its precise time course, its susceptibility to online change, and its role in the direct visual perception of human motion.

## *Detection thresholds*

First, we consider the advantageous role that real-time action simulation has in the visual processing of human motion. That is, can we show that the action simulation can provide a here-and-now, or predictive, benefit to visual perception? Imagine watching a football match on television. The TV is an old analogue set and is fed by a radio antenna on the roof. It's windy, the antenna is blown around, and the TV picture is occasionally replaced with "snow": the random cascade of black and white dots that represent a lack of signal. A player is about to score when he disappears briefly under this visual snow and you are trying to keep track of him until he reappears on the screen in the haze of bad picture quality.

We argue that this is when action simulation might occur: you generate a real-time model of a player's movements whilst he is briefly occluded from view. This example also illustrates what may be an adaptive benefit of the real-time action simulation: if the internal model tracks the exact time-course of the player's movements, the model can then be used to provide a perceptual prediction—also in real-time—of how the player should look at any moment during occlusion. If the figure is not fully occluded but simply visually degraded, the perceptual prediction may provide real-time support to the visual system, aiding the detection and processing of the faintly seen player. Importantly, if the faintly reappearing player was not then moving in a way consistent with the real-time predictive simulation—the video had skipped forward or backwards, for example—the simulation would provide no such predictive benefit.

This was the rationale behind a recent set of experiments (Parkinson et al., 2011) in which PLAs were presented on top of a constantly changing random pattern of black and white pixels in a 50/50 ratio, resembling TV "noise." The points of the actor were squares of pixels that could also be rendered as randomized patterns of black and white dots, with a variable ratio of white dots to black dots. If the ratio of white dots was 100%, the actor was clearly visible. However, as the white ratio was reduced the actor was less visible against the background noise, until a 50% ratio rendered him totally invisible (see **Figure 2A**).

In each trial, the participants were presented with background noise that continuously changed throughout the trials. Participants clearly saw the initial part of the action, which we refer to as the *prime motion*, as it was the section of motion that was assumed to prime the generation of the subsequent action simulation. Then the actor was occluded from view for a short period (400 ms) after which he reappeared, in motion, for 400 ms. This was the *test motion* that was presented with variable visibility against the background (see **Figure 2B**). The participants' task was to indicate whether they saw the reappearing actor or not, with test motion visibility adaptively altered to reach set detection rate targets. Thus, detection thresholds for the test motion could be measured in terms of the ratio of white pixels depicting the test motion actor. This was an indicator of how easy participants found it to detect the reappearing actor under difficult visual conditions.
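The idea of adaptively adjusting visibility can be illustrated with a simple staircase. The sketch below is not the procedure used by Parkinson et al. (2011), whose adaptive rule and target detection rates are not specified here. It assumes a basic 1-up/1-down rule on the white-pixel percentage, a hypothetical step size of 2 percentage points, and a made-up deterministic observer.

```python
def run_staircase(detects, start_pct=100, step_pct=2, n_trials=60,
                  floor_pct=50, ceil_pct=100):
    """Assumed 1-up/1-down staircase on the white-pixel percentage.

    detects: callable taking a percentage and returning True if the
             reappearing actor was reported as seen.
    Returns the percentage presented on each trial.
    """
    pct = start_pct
    history = []
    for _ in range(n_trials):
        history.append(pct)
        if detects(pct):
            pct = max(floor_pct, pct - step_pct)  # seen: make the actor fainter
        else:
            pct = min(ceil_pct, pct + step_pct)   # missed: make the actor clearer
    return history


def observer(pct):
    """Hypothetical observer who reports the actor at 62% white pixels or more."""
    return pct >= 62


history = run_staircase(observer)
# Estimate the threshold as the mean of the last 20 presented levels.
threshold_estimate = sum(history[-20:]) / 20
print(threshold_estimate)
```

With a rule of this kind, a lower converged percentage means the actor was detected at a lower visibility, which is the sense in which thresholds index ease of detection.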

**FIGURE 2 |** [...] **presented with variable visibility against the background. (A)** Shows how varying white pixel ratio in the actor's joints increases visibility against the noise background, **(B)** shows a basic trial sequence, and **(C)** depicts a schematic showing how different sections of the action were shown as the [...] section. Figure adapted from Parkinson et al. (2011, p. 1466). Copyright © The Experimental Psychology Society. Adapted with permission of Taylor and Francis Ltd., www.tandfonline.com on behalf of The Experimental Psychology Society, with permission from the authors.

It is important to note that this detection task is a simple, immediate "here-and-now" judgment, as opposed to the more postdictive one used in the original Graf et al. (2007) paradigm. That is, participants were not asked to make any judgment about the quality of the reappearing actor (e.g., whether he had turned or not), but quite simply whether he had reappeared at all. Thus, this paradigm made it possible to measure whether action simulation aided the basic visual prediction and subsequent detection of human motion (i.e., detection thresholds). An action simulation would be generated during the occlusion, which was a continuation of the prime motion seen prior to occlusion. In order to test the simulation's effect on test motion detection, the spatiotemporal relationship between the prime motion and the test motion was manipulated. This meant that the action simulation would be congruent with the test motion, or the test motion would be "too early" or "too late" to match the action simulation (incongruent conditions). To avoid confounds of test motion on detection thresholds, the same section of test motion was used in all congruent and incongruent conditions. Thus, the spatiotemporal manipulations were achieved by presenting different sections of prime motion, which would subsequently drive different action simulations in relation to the single test motion (see **Figure 2C**).

Detection thresholds were measured for a variety of actions in three conditions: action simulation-congruent test motions, incongruent early test motions, or incongruent late test motions. Congruent thresholds were consistently lower than those for either of the incongruent conditions. Hence, if the currently generated action simulation was temporally congruent with the test motion, the latter was more easily detected. These experiments suggest that the action simulation can have a direct, immediate, here-and-now benefit for the perception of human movement, but only when the external stimulus of movement temporally matches the internal action simulation. This suggests some form of ongoing, real-time, top-down effect of action simulation on ongoing visual processes, and supports the notion that internal forward models for biological motion and action can be used to directly supplement the perception of those actions (Wilson and Knoblich, 2005; Prinz, 2006).

## *Inserted motion*

Another consideration in understanding action simulation is how stable the internal real-time model of motion is. In other words, is it possible to briefly bias, or indeed replace, the current ongoing action simulation by very briefly introducing human motion information that does not match the current state of the ongoing simulation process? Parkinson et al. (2012; Experiment 2) investigated this question by adding "inserted motion" in the occluder. Specifically, PLAs were presented that were briefly occluded for 500 ms and then reappeared in motion for 500 ms (i.e., test motions). These test motions were either temporally congruent with the ongoing action simulation at that point or were temporally incongruent, that is, offset 267 ms too early or too late in the motion sequences. The participants were asked to make an explicit 2-alternative forced-choice judgment as to whether the test motion was a correct continuation or not, that is, the judgment measured how well the test motion matched the action simulation that was being generated at that point in time.

Around the temporal halfway point of the occlusion period, 67 ms (4 frames) of low contrast PLA was presented, such that participants got the impression of a brief "flash" of motion within the occlusion period (see **Figure 3**). Crucially, the inserted motion was either temporally congruent with the action simulation or too early or too late by 267 ms. In the congruent instance, the inserted motion matched the action simulation, almost as if the ongoing action was seen very briefly through a slit in the occlusion period. In the two incongruent instances, the inserted motion would not match the ongoing action simulation. The hypothesis was that, despite the brevity of the inserted motion, it would nevertheless act to bias or replace the action simulation.

The experiment, thus, represented a 3 (test motion offset) × 3 (inserted motion offset) design, with the measurement being the percentage of trials in which the participants judged the test motion to be a "correct continuation." The results are shown in **Figure 4**. When the inserted motion matched the action simulation (0 ms offset, central cluster), it did not interfere with the action simulation process and the results were as expected: participants were more likely to judge the congruent test motion as correct, as opposed to either of the incongruent test motions, showing that they could utilize the action simulation to correctly judge the veracity of the test motion. However, the pattern of results was distinctly different when the inserted motion was incongruent with the action simulation: when the inserted motion was offset in one temporal direction, judgments of what was a correct test motion also shifted in that temporal direction.
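The timing values of this design can be checked with a little arithmetic. The sketch below assumes a 60 Hz display, which is an inference from the reported "67 ms (4 frames)" and is not stated in the text. The sign convention (negative means too early) and the names `ms_to_frames` and `cells` are likewise hypothetical.

```python
FRAME_RATE_HZ = 60  # assumption: inferred from "67 ms (4 frames)"


def ms_to_frames(ms: float, rate_hz: int = FRAME_RATE_HZ) -> int:
    """Convert a duration in milliseconds to the nearest whole number of frames."""
    return round(ms * rate_hz / 1000)


# Assumed sign convention: negative = too early, 0 = congruent, positive = too late.
OFFSETS_MS = (-267, 0, 267)

# Cross test-motion offsets with inserted-motion offsets (3 x 3 = 9 cells).
cells = [(test, inserted) for test in OFFSETS_MS for inserted in OFFSETS_MS]

print(ms_to_frames(67), ms_to_frames(267), ms_to_frames(500))
print(len(cells))
```

On this assumption, the inserted motion spans 4 frames, the 267 ms offsets correspond to 16 frames, and the 500 ms occlusion and test motion each span 30 frames.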

This suggests that the action simulation process can be updated "mid-flow" by new incoming motion information, even when that new motion is perceived only very briefly. This is perhaps unsurprising because an action simulation process that remains immune to change may not be a very useful mechanism for predicting the ongoing movements of others. Two possibilities arise as to how the inserted motion effects this updating of the action simulation: firstly, the *biasing hypothesis* suggests that briefly presented motion that does not temporally match the current state of the action simulation acts to fluidly update it, temporally "pushing" the action simulation in that direction. The second hypothesis suggests that *re-simulation* occurs: perceiving even a short duration of inserted motion cancels the currently generated action simulation and generates a new one based on the new motion information. Naturally, if the inserted motion matches the old simulation, the new one will roughly match the old one, leading to the expected results, as seen in the 0 ms inserted offset condition described above. However, if the re-simulation is based upon temporally shifted motion, the newly generated simulation will perceptually support "incorrect" continuation test motions.

Both of these hypotheses are possible and the current evidence does not conclusively support one or the other (Parkinson et al., 2012). It might seem, on the face of it, that the re-simulation hypothesis is less intuitively sensible, since it would entail that an entirely new action simulation can be accurately generated from only 67 ms of a PLA motion. In fact, this is entirely feasible, as we will see later (Section The Lag Effect: Towards a Higher Temporal Resolution).

## **THE LAG EFFECT: TOWARDS A HIGHER TEMPORAL RESOLUTION**

We have described recent research, which, in the first instance, shows the existence of real-time action simulation of human motion (Graf et al., 2007; Parkinson et al., 2011). Secondly, we have demonstrated an ecologically valid real-world benefit of action simulation, which can act in a top-down fashion to aid human motion detection (Parkinson et al., 2011). Finally, we went on to illustrate the fluid, updatable nature of the simulation (Parkinson et al., 2012). Now, we turn to the investigation of the spatiotemporal accuracy of the action simulation: the more fine-grained nature of how well the simulation can track human motion.

## *Spatial occlusion and the teapot experiment*

Work by Prinz and Rapinett (2008) attempted to investigate the time-course and accuracy of action simulation by asking how it relates, at first, to simple visual linear extrapolation. They used a novel version of the occluder paradigm, which used actual video footage of people making simple goal-directed motions. The actors of these videos sat facing the viewer but hidden behind a permanently present cardboard occluder (see **Figure 5**). The motions they performed were simple, manual transport movements, such as reaching with their right hand for a teapot on their right (screen-left), picking it up, and moving it screen-right behind the occluder, to reappear screen-right of the occluder where a mug or cup awaited the teapot. Since the occluder was onscreen throughout, the moment and position of the start of the occlusion were highly predictable, as was the spatial position of reappearance considering the linear left-right trajectory of the transport motion. Therefore, the experiment was ideally suited to measure the spatiotemporal accuracy of the action simulation by examining participants' judgments of the time the teapot reappeared from the right edge of the occluder.

**FIGURE 5 | Illustration of the experimental setting as seen through the eyes of the participant.** On each trial the actor (sitting behind an occluding object) transported a teapot from a home position to a target position. Figure adapted from Prinz and Rapinett (2008) (p. 226). Copyright by IOS Press. Adapted with permission.

The teapot could reappear either at the correct time, as if the motion had continued as normal behind the occluder, or too late or too early in steps of 40 ms. Participants judged whether the teapot appeared too late, just in time, or too early. When the "just in time" judgments were analyzed, it was found that there was a positive time error in the judgments. That is, the reappearances participants thought were correct were, in fact, too late. This suggests that the action simulation is not entirely accurate in tracking ongoing occluded motion; there is some temporal lag present. **Figure 6A** schematically illustrates these results on the assumption that the transport motion (solid black line) has a constant velocity as the teapot is moved from screen left to right. The black dotted line represents the occluded portion of the motion. This would also represent the trajectory of an accurate action simulation: one without lag. The gray line represents the perceived trajectory of the teapot after occlusion, with a positive time lag, which equates with the time lag in the "just in time" judgments.
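One simple way to summarize "just in time" judgments as a single time error is to take the centroid of those responses over the reappearance offsets. The sketch below is only an illustration: the source does not describe how the judgments were analyzed, the function name `time_error_ms` is hypothetical, and the response counts are made up for the example. Only the 40 ms step size comes from the text.

```python
def time_error_ms(just_in_time_counts: dict) -> float:
    """Centroid of "just in time" responses over reappearance offsets (in ms).

    Keys are reappearance offsets (assumed convention: positive = later than the
    true reappearance); values are the number of "just in time" responses.
    """
    total = sum(just_in_time_counts.values())
    if total == 0:
        raise ValueError("no 'just in time' responses to summarize")
    weighted = sum(offset * n for offset, n in just_in_time_counts.items())
    return weighted / total


# Made-up counts for illustration only; offsets are spaced in 40 ms steps.
counts = {-80: 2, -40: 6, 0: 12, 40: 14, 80: 6}

error = time_error_ms(counts)
print(error)  # a positive value means late reappearances tended to be accepted
```

A positive centroid corresponds to the positive time error described above: reappearances judged as correct were, on average, later than the true reappearance.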

Retaining the assumption that an action simulation has a linear velocity profile like the action it represents, there are two possible sources for the judgment error. Firstly, the generated action simulation may be slower than the actual action. This is called the *slope error*, and is represented in **Figure 6B** as the solid gray line within the occluder. The second source of error comes from instances where the action simulation matches the real action in terms of speed, but there is a time-cost involved in generating an action simulation, which means it lags behind the action by a set amount from the start. This is represented as the dotted line in the occluder in **Figure 6B**, and is known as the *intercept error*. The two errors are not mutually exclusive.

In order to ascertain which of the two errors contributes to the lag in continuation judgments described above, Prinz and Rapinett (2008) conducted a second experiment in which two different sizes of occluder and two different speeds of motion were used. Altering the occluder size changes both the distance over which the action is occluded and the time taken until reappearance (**Figure 6C**). Altering the speed of the motion alters the amount of time it takes the motion to cross the same occluder distance (**Figure 6D**). In both cases the slope and intercept error hypotheses make markedly different predictions. The simulations affected by intercept errors are shown in dotted gray lines, and those affected by the slope error are in solid gray. Under the intercept error hypothesis, the time cost is incurred at the start of the simulation, so the lag should be constant irrespective of the occluder size, and judgment lags will remain constant across occluder conditions. However, the slope error hypothesis implies that the longer the action is occluded for, the more the lag increases. Hence, a larger occluder should produce more error than a smaller occluder (**Figure 6C**). Similarly, increasing the speed of the action and, thus, decreasing occlusion time should, according to the slope hypothesis, decrease lag error, whilst again the intercept error suggests that lag will be the same irrespective of action speed (**Figure 6D**).
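The contrasting predictions of the two hypotheses can be made concrete with a small calculation. The sketch below assumes constant-velocity motion, as in the schematic account above. The occluder widths, speeds, start-up cost and slowdown factor are hypothetical values chosen for illustration and are not taken from the study.

```python
def intercept_lag(width_m: float, speed_m_s: float, cost_s: float = 0.05) -> float:
    """Intercept error: a fixed start-up cost, independent of width and speed."""
    return cost_s


def slope_lag(width_m: float, speed_m_s: float, slowdown: float = 0.9) -> float:
    """Slope error: the simulation runs at slowdown * speed (slowdown < 1),
    so the lag at reappearance grows with the occlusion time."""
    true_time = width_m / speed_m_s
    simulated_time = width_m / (slowdown * speed_m_s)
    return simulated_time - true_time


# Hypothetical 2 x 2 design: occluder widths (m) crossed with action speeds (m/s).
conditions = [(w, v) for w in (0.2, 0.4) for v in (0.5, 1.0)]

for w, v in conditions:
    print(
        f"width={w} m, speed={v} m/s: "
        f"intercept lag={intercept_lag(w, v):.3f} s, "
        f"slope lag={slope_lag(w, v):.3f} s"
    )
```

In this toy model the slope lag equals the occlusion time multiplied by (1/slowdown - 1), so it rises with occluder width and falls with speed, whereas the intercept lag stays fixed across all four conditions.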

However, when an experiment was run combining two transport speeds with two occluder widths, the results matched the predictions of neither the slope nor the intercept error hypothesis: lag error was smaller for slower action speeds and smaller for longer occluders. This means, firstly, that the constant cost (intercept) hypothesis must be rejected, because it predicted that error would be constant. Interestingly, however, it also suggests that the source of the lag error cannot be a slowing of a linear extrapolation (slope error), because the results are the opposite of what that hypothesis predicts.

Since the available evidence did not support action simulation as a simple, linear extrapolation of the occluded motion, Prinz and Rapinett (2008) went on to reconsider the nature of the simulation: First, they included more details about the spatiotemporal properties of goal-directed movements, namely that a goal-directed transport movement tends to have a period of acceleration at the start and a deceleration toward the goal at the end. Second, they suggested that rather than being a simple extrapolation or continuation of the movement, the action simulation is actually an internally generated re-start of the motion. As the visual input of the goal-directed action is removed at the occluder edge, the action simulation may generate a model of a similar goal-directed action with the same target (end-point) but with a new start point, namely the occluder edge. This means that the action simulation entails a period of acceleration from its own start, then moves and decelerates toward the exact same spatiotemporal target of the original action. **Figure 7A** shows the velocity profile of the action as it accelerates from the start and decelerates at the target (black solid line) with the occluded portion dotted. The re-generated action simulation is shown in gray, with a similar accelerating-decelerating profile. The thick line on the right side of the occluder highlights the magnitude of the lag error. **Figure 7B** shows how this re-generated simulation hypothesis can account for the previously puzzling results: faster actions produce more lag error than slower actions and larger occluders produce less error than smaller occluders.

In a final experiment, Prinz and Rapinett (2008) looked at the effects of implied goal duration and produced a remarkably effective demonstration that action simulation involves the internal modeling of goal-oriented human action and not merely visual prediction of kinematics: they used the same video-based paradigm involving a left–right teapot transport with two different occluder widths. In addition, they varied the visual identity of the target item between a small cup and a large mug. The large mug would take longer to fill than the small cup, so the length of time taken to achieve the action-goal should be longer for the mug than the cup. While the videos of the reappearing movement always stopped at the same point, just before the contents of the teapot were about to be poured, a greater positive lag error was observed in response to the mug compared to the cup targets, meaning that the greater amount of time implied for filling the mug had increased the target time for the generated action simulation (see **Figure 7C**).

This work by Prinz and Rapinett (2008) simply, but effectively, demonstrates a number of details regarding both the generation and the spatiotemporal details of action simulation. Firstly, the simulation is not merely a linear extrapolation or continuation of the perceptual information; indeed, it seems not to be a continuation at all. Instead, it may actually be that an entirely new model of the goal-directed action that has been occluded is generated, but starting from the point of occlusion, and this re-generation utilizes goal-directed kinematic information inherent in action systems. In this sense, the re-generation may, in fact, be more closely tied to motor systems than perceptual systems, in that it uses goal-directed motor information to supply the perceptual information, a notion put forward by Prinz (2006) and Wilson and Knoblich (2005).

Sparenberg et al. (2012) took a more detailed look at the lag error in action simulation measured by Prinz and Rapinett (2008). They used PLA stimuli and a 300 ms occluder period, after which they showed a static test posture, which could be offset earlier or later than the true posture of the actor immediately following occlusion. Participants were asked if the test posture was too late or too early to be the correct continuation of the motion. Results showed that test postures that were too early in the sequence were judged to be a correct continuation. That is, over a fixed period of occluder time, the action simulation lags behind the true motion. When Sparenberg et al. (2012) measured this lag over two different occluder durations, they found that the lag did not change but remained constant at 25 ms lag error. This contradicted the findings of Prinz and Rapinett (2008) that the error reduced with longer occlusion durations (and also movement speed, not manipulated by Sparenberg et al., 2012), and varied between 18 and 141 ms. Sparenberg et al. (2012) concluded that the stable lag error was a result of a constant time-cost when switching from perception of motion to internal action simulation.

It should be noted that, on closer inspection, it is very difficult to directly compare the two paradigms. For instance, in the stimuli used by Prinz and Rapinett (2008), the occluder was permanently visible, meaning that the point of occlusion was spatially and temporally predictable, and the point of reappearance was at least spatially predictable. In comparison, in Sparenberg et al. (2012) the "occluder square" would suddenly appear and then disappear on the screen, meaning that the spatiotemporal point of occlusion was less predictable, and the position of the reappearing test posture was also not predicted by any properties of the occluder. Secondly, the nature and complexity of the motions that were to be simulated differed greatly between paradigms: the Prinz and Rapinett (2008) transport movement is much more linear in nature than the full body motions used by Sparenberg et al. (2012). This action also only uses a single limb and the individual arm movement has a clearly defined (or implied) end-point or goal.

The combination of predictable occluder onset and offset, plus the simpler, linear quality of the motion in the Prinz and Rapinett (2008) paradigm may contribute to a greater ability to simulate a goal-directed end-point for those actions and thus produce an action simulation with the spatiotemporal properties shown in **Figure 7**. In this situation, lag error will vary according to occluder duration and action speed, as previously discussed (**Figure 7B**). On the other hand, it may not be possible to generate a simulation based on a goal-directed end-point for more complex full body motions, as used in Sparenberg et al. (2012). In this case, the action simulation may predict the ongoing complex motion in a more linear fashion, meaning constant lag costs irrespective of occluder duration. Further experiments are needed to tackle this issue.

Still, the findings from both paradigms are informative regarding the nature of action simulation and its underlying dynamic processes, suggesting that, whilst the precise temporal nature of the simulation may vary with the type of action being simulated, the existence of temporal lag is common.

## *Motion information required for action simulation generation*

As described earlier, Prinz and Rapinett (2008) suggested that action simulation is not merely a perceptual–continuation mechanism but is instead a generative internal modeling that uses information about the perceived motion—and perhaps also motoric knowledge regarding that action—to produce a new goal-directed simulation of the action. If the simulation is generated using motor as well as perceptual information, a pertinent question to ask is: exactly how much visual motion information is required to generate the action simulation?

To test this issue, Parkinson et al. (2012; Experiment 1) used a PLA version of the paradigm, in which the initial prime motion of the PLA was occluded for 500 ms followed by 500 ms of the reappearing actor in motion. The crucial manipulation was the duration of the pre-occluder motion: with each frame of the PLA animation lasting 10 ms, the prime duration was varied to be either 20, 50, 100, 500, or 1000 ms (i.e., the shortest condition presented only 2 frames of PLA motion before the occluder). Different sections of test motion were presented using the method of constant stimuli (MOCS), and participants were asked to judge if the test motion was too early or too late to be a correct continuation. This allowed the accuracy of the action simulation in terms of its temporal lag to be computed in a similar way to that used by Prinz and Rapinett (2008).
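One common way to turn such too-early/too-late judgments into a single lag estimate is to locate the test offset at which "too late" responses cross 50%. The sketch below assumes simple linear interpolation between tested offsets; the source does not state which estimation procedure was used, and the function name, sign convention, and example proportions are all hypothetical.

```python
def point_of_subjective_equality(offsets_ms, p_too_late):
    """Offset (ms) at which 'too late' responses cross 50%, by linear interpolation.

    offsets_ms: test offsets in ascending order (negative = test motion earlier
    than the true continuation). p_too_late: proportion of 'too late' responses
    per offset. Returns None if the responses never cross 50%.
    """
    points = list(zip(offsets_ms, p_too_late))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Made-up illustrative proportions, not data from any of the studies discussed.
offsets = [-200, -100, 0, 100, 200]
too_late = [0.1, 0.3, 0.7, 0.9, 1.0]
pse = point_of_subjective_equality(offsets, too_late)
print(pse)
```

With this sign convention, a negative value means that test motions taken from slightly too early in the sequence are the ones judged to be correct continuations, which is the sense in which a lag error is negative.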

The mean lag errors for each of the prime motion durations are shown in **Figure 8**. The lag errors are all negative, meaning that participants tended to judge early offset test motions as being correct continuations. This implies that the action simulation is running slightly slower than the action it is being generated to predict. This is a result we have already encountered in Section The Lag Effect: Towards A Higher Temporal Resolution where the Prinz and Rapinett (2008) and Sparenberg et al. (2012) studies are detailed. What is remarkable in the Parkinson et al. (2012) experiment is that the temporal accuracy of the action simulation was not affected by the amount of human motion provided before the occluder: even when presented with as little as 20 or 50 ms of PLA motion (2 or 5 frames), the generated action simulation was just as temporally accurate as it was when participants saw 1 s of motion before the occluder.

Of course, during the course of an experiment the actions will become familiar, so generating a simulation from a brief glimpse of a PLA may not be a finding that will generalize to other situations, but it is still an interesting finding. This relates to Section Benefits of Real-Time Simulation, in which we described how inserting a very brief amount of point light motion within the occluder can bias subsequent judgments of reappearing motion (Parkinson et al., 2011). We suggested two mechanisms for this: 1) that the inserted motion biases the currently generated action simulation, or 2) that the inserted motion is used as the basis for a re-simulation and the generation of an entirely new action simulation based on the new motion percept. At first the re-simulation notion seems less appealing: is 67 ms of human motion enough to generate a whole new predictive model?

However, the results of the experiment by Parkinson et al. (2012) point to just this and it had been shown previously that very brief exposures to biological motion can provide enough information for adequate processing (Thirkettle et al., 2009). The results of Parkinson et al. (2012) suggest that the action simulation is remarkable in that it can generate real-time motion predictions from very short exposure to familiar human movements, and this perhaps accounts for the results when brief conflicting motion information is inserted in the occluder: the action simulation is re-started using the new motion section. The notion of re-simulation is also appealing in light of the hypothesis brought forward by Prinz and Rapinett (2008), namely that action simulation involves the generation of an internal forward model that combines current motion information, motor knowledge, and information about the implied end-point of the motion.

#### **AN INTERIM SUMMARY**

Action simulation is a process that internally models human movements in real-time. The process of action simulation can be demonstrated and investigated using the occluder paradigm (Graf et al., 2007; Prinz and Rapinett, 2008), in which a human actor disappears from view and then reappears at a position/motion-continuation which can be either correct (as if they had continued moving behind the occluder) or from too late or too early in the sequence (as if the "video" of the motion skipped forward or back). When participants are tested on some orthogonal aspect of the reappearing actor, for example when asked "Has her form been rotated?", they perform better when the test position of the actor is correct with respect to the length of the occlusion. This demonstrates that the action simulation is real-time in nature, modeling the position of the occluded actor at that temporal point (cf. Section Simulation in Real Time: The Occluder Paradigm).

Action simulation has been demonstrated to directly aid the visual perception of a visually degraded human motion, but only when that motion spatiotemporally matches the real-time state of the action simulation (Sections Benefits of Real-Time Simulation and Detection thresholds). We have detailed how visual exposure to even very short durations of human motion can provide sufficient information to generate an action simulation (Sections The Lag Effect: Towards A Higher Temporal Resolution and Motion information required for action simulation generation). We have also described the way in which the ongoing time-course of the action simulation can be manipulated by displaying very short sections of the motion during occlusion, which could again be either temporally congruent with—or earlier or later than—the real-time state of the action simulation at that point. These tended to bias judgments of which reappearing motion was a "correct continuation" in the temporal direction of the inserted motion (Section Benefits of Real-Time Simulation, "Inserted motion"). This illustrates that the action simulation can be updated in real-time.

Finally, whilst it is clear that the action simulation is real-time, in that it unfolds over time as the real action does, the simulation slightly lags the action (Section The Lag Effect: Towards A Higher Temporal Resolution, "Spatial Occlusion and the Teapot Experiment"). Research into the source of this lag error has suggested that the simulation itself is not simply a linear extrapolation of the visual motion of the action before occlusion. Instead, the process of action simulation involves an internal generation of a model of the movement that includes the velocity and acceleration profiles of a newly initiated goal-directed action. This model uses the spatiotemporal point of occlusion as the starting point and the implied goal of the action as its end point. Taken together, we see that action simulation is a process which generates a real-time model of an action that takes into account the goals of the action, probably using one's own implicit motor knowledge, and that the action simulation can be dynamically updated and provide direct perceptual benefits when a human motion is difficult to see.

## **REPRESENTATIONAL MECHANISMS**

While the studies discussed above indicated that perceptual processes can strongly impact how we perceive and predict others' actions, a number of significant unexplored questions regarding the role of motor processes remain. In the following, we discuss this issue based on studies examining how an observer's own physical activity may affect his or her ability to accurately simulate and predict others' actions.

## **SENSORIMOTOR PROCESSES**

A wealth of experimental research has demonstrated strong mutual influences between action perception and execution (for a review see Schütz-Bosbach and Prinz, 2007). While motion detection is impaired when the motions go in the same direction as concurrently performed actions (Hamilton et al., 2004; Zwickel et al., 2007), movement execution depends on similarity-based relationships between go-signals and movements to be performed (Brass et al., 2001; Craighero et al., 2002) and exhibits greater variability when a different movement is concurrently observed (Kilner et al., 2003).

Recent experiments have studied how the representational resources involved in action simulation may be related to the resources involved in action execution and asked: does action execution affect the performance in occluded action tasks considered to reflect internal action simulation? In one of these studies, participants observed arm movements of a PLA while performing a corresponding arm movement themselves (Springer et al., 2011). The executed and observed movements were synchronized; furthermore, they were either fully congruent (i.e., involving the same anatomical body side and the same movement pattern; *full overlap*) or they were fully incongruent on both dimensions (*no overlap*), or they differed in either the anatomical body side used or in the movement pattern involved (*partial overlap*). For example, in a no overlap trial, the participant reached out his right arm to the right side, while the PLA lifted his left arm upwards over his head.

In each trial, the observed action was briefly occluded and then continued by the presentation of a static test pose (Graf et al., 2007). Participants indicated whether the test pose depicted a spatially coherent continuation of the previous arm movement. Two factors were manipulated: occluder time (the duration of occlusion) and pose time (the time at which the posture shown after occlusion was actually taken from the occluded movement), and each of them could take three values (100, 400 and 700 ms; as already explained in Section Simulation in Real Time: The Occluder Paradigm; **Figure 1**).

If real-time simulation takes place, response accuracy should be best when the occluder time (OT) and the pose time (PT) match, because then the internal representation (updated in real-time) should match the actual test pose (Graf et al., 2007). In addition, performance should yield a *monotonic distance function*, which emerges from the three levels of absolute time distances between the OT and PT (i.e., 0, 300 and 600 ms). If the two times match perfectly (i.e., no time distance), the test posture is presented just in time. Running a real-time simulation of the occluded action means an internal representation is run and updated, which can be used as a reference for evaluating the upcoming test pose. If real-time simulation occurs, that internal reference would, in the 0 ms distance condition, precisely match the test pose, whereas that match should be weaker in the conditions with a temporal distance of either 300 or 600 ms. This is reflected by a monotonic distance function, that is, a monotonic decrease of response accuracy with increasing temporal distance (e.g., Graf et al., 2007; Springer and Prinz, 2010). This description of the logic of the occluder paradigm by Graf et al. (2007) is a more technical recapitulation of the description already given earlier on in Section Simulation in Real Time: The Occluder Paradigm.
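The arithmetic behind the distance function can be sketched as follows. The three OT and PT levels come from the text; the function names are hypothetical, and the accuracy values are placeholders invented purely to exercise the code, not results from any study. The sketch collapses the 3 × 3 design onto the absolute OT-PT distance and checks whether mean accuracy falls with each step in distance.

```python
from collections import defaultdict

LEVELS_MS = (100, 400, 700)  # occluder time (OT) and pose time (PT) levels from the text


def distance_function(accuracy):
    """Average accuracy per absolute OT-PT distance.

    accuracy: dict mapping (ot_ms, pt_ms) -> proportion correct for that cell.
    Returns a dict {distance_ms: mean accuracy}, ordered by increasing distance.
    """
    by_distance = defaultdict(list)
    for (ot, pt), acc in accuracy.items():
        by_distance[abs(ot - pt)].append(acc)
    return {d: sum(v) / len(v) for d, v in sorted(by_distance.items())}


def is_monotonic_decreasing(values):
    """True if each value is strictly smaller than the one before it."""
    return all(b < a for a, b in zip(values, values[1:]))


# Placeholder accuracies (made up for illustration, not taken from any study):
# accuracy falls by 0.05 per 300 ms of OT-PT mismatch.
example = {(ot, pt): 0.80 - 0.05 * abs(ot - pt) / 300
           for ot in LEVELS_MS for pt in LEVELS_MS}
curve = distance_function(example)
print(curve)
print(is_monotonic_decreasing(list(curve.values())))
```

Note that the nine cells are not spread evenly over the three distances: three cells have a distance of 0 ms, four have 300 ms, and two have 600 ms.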

If internal simulation involves motor resources, the distance function should vary depending on the conditions of motor execution. This was, in fact, what the results indicated. A monotonic distance effect (indicating real-time simulation) emerged when the observer's own movements were similar (but not identical) to the PLA's movements (i.e., partial overlap). In contrast, there was no monotonic distance effect for full overlap and no overlap (i.e., when both movements involved the same body sides and movement patterns and different body sides and movement patterns, respectively). This finding suggests that the degree of a representational overlap between performed and observed actions (e.g., Hommel et al., 2001) influenced the action simulation, as indicated by a monotonic distance effect.

However, spatial congruence may matter (Craighero et al., 2002; Kilner et al., 2009). That is, in one of the conditions of partial overlap, executed and observed movements involved the same movement pattern and occurred at the same side of the screen. This condition clearly showed a monotonic distance effect (i.e., real-time simulation). Hence, spatial congruence may have acted to increase the likelihood with which the participants engaged in internal action simulation when solving the task.

To test this alternative, an additional experiment was run in which participants were instructed that they would see the back view of the PLA, while all other parameters remained constant. This was possible because the PL stimuli being used were ambiguous with regard to front vs. back view. While under front view conditions, spatial and anatomical body side congruence diverge, the back view manipulation implies that spatial and anatomical congruence correspond, meaning that if the PLA and the executed action involve the same body side (e.g., left arm), they occur on the same side of the screen (left side). Hence, if spatial congruence matters, a monotonic distance function should occur in this condition.

However, the back view instructions revealed the same pattern as was found under front view instructions (Springer et al., 2011; Experiment 2). Specifically, the *mirror-inverted* constellation (implying spatial congruence between executed and observed movements) did not show a monotonic distance function. Therefore, the findings clearly contradicted a spatial congruence account. This study suggests that action simulation engages motor resources. The strength of the motor influences may depend on the amount of structural overlap between observed and executed actions (as defined by the anatomical side of the body and the movement pattern involved).

Further evidence for this view comes from a study by Tausche et al. (2010) examining effector-specific influences on the prediction of partly occluded full-body actions of a PLA (cf. Graf et al., 2007). While the movements observed were performed with either the arms or the legs, the participants themselves responded with a (different) movement involving either their arms or legs. The results indicated that a correspondence between the effectors observed and the effectors used induced a motor interference effect. Specifically, a monotonic distance effect, indicating real-time simulation, emerged for the incompatible trials (involving different effectors, i.e., arms and legs), whereas no such function occurred for the compatible trials (involving the same effectors, i.e., arms or legs).

Overall, these findings suggest that the accuracy with which an acting observer predicts others' actions may be influenced by anatomical mappings between performed and observed actions (Wapner and Cirillo, 1968; Sambrook, 1998; Gillmeister et al., 2008; Liepelt et al., 2010). This influence may arise at the level of effector-specific formats (Tausche et al., 2010; Springer et al., 2011; cf. Springer et al., 2013). This view accords with the notion that action observation activates the motor system in a corresponding somatotopic manner (Decety and Grèzes, 1999, 2006; Buccino et al., 2001; Grèzes and Decety, 2001).

## **DYNAMIC AND STATIC PROCESSES**

As the above-described studies demonstrated, physical activity does not appear to prevent participants from solving the action occlusion task (Tausche et al., 2010; Springer et al., 2011). Hence, additional and/or alternative processes may contribute to solving this task. Can action simulation recruit additional processes that are less motor-based when motor representations are constrained by execution?

It has, in fact, been suggested that predicting actions over visual occlusions may be based on (at least) two different mental operations: dynamic updating and static matching (Springer and Prinz, 2010). Dynamic updating corresponds to an internal real-time simulation that should be indicated by a monotonic distance effect (i.e., performance should be best for time distances of 0 ms and should monotonically decrease for time distances of 300 and 600 ms; as explained previously; cf. Section Simulation in Real Time: The Occluder Paradigm). In the following, we use the term *real-time simulation* (specifying the timing of an assumed internal simulation process) synonymously with *dynamic simulation* and *dynamic updating*.

In addition to dynamic updating, performing an action occlusion task may involve a matching process, implying that the test pose after occlusion is matched against a statically maintained representation derived from the last action pose seen or perceived prior to occlusion (Springer and Prinz, 2010). If static matching takes place, performance in the action occlusion paradigm should decrease with increasing pose times (i.e., 100, 400, or 700 ms), irrespective of the actual duration of the occlusion period, because an increase in the pose time implies a decrease in the similarity between the last visible action pose (shown before occlusion) and the test pose (shown after occlusion) by definition. Hence, while static matching in its pure form predicts a main effect of pose time (but no interaction of occluder time and pose time and, therefore, no monotonic distance function), real-time simulation, in its pure form, predicts a strong interaction (emerging as a monotonic distance function), but no main effect of the pose time factor.
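The contrasting predictions of the two pure forms can be checked with a toy calculation. The sketch below assumes, purely for illustration, that accuracy falls linearly with pose time under static matching and linearly with the absolute OT-PT distance under dynamic updating; the baseline, the cost per millisecond, and all function names are hypothetical and not taken from the studies discussed.

```python
LEVELS_MS = (100, 400, 700)  # occluder time (OT) and pose time (PT) levels
BASE, COST_PER_MS = 0.90, 0.0003  # hypothetical baseline accuracy and linear cost


def static_matching(ot, pt):
    """Accuracy depends only on how far the test pose is from the last seen pose."""
    return BASE - COST_PER_MS * pt


def dynamic_updating(ot, pt):
    """Accuracy depends only on the mismatch between occlusion duration and pose time."""
    return BASE - COST_PER_MS * abs(ot - pt)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pose_time_decline(model):
    """Drop in mean accuracy from the shortest to the longest pose time."""
    shortest = mean(model(ot, LEVELS_MS[0]) for ot in LEVELS_MS)
    longest = mean(model(ot, LEVELS_MS[-1]) for ot in LEVELS_MS)
    return shortest - longest


def distance_decline(model):
    """Drop in mean accuracy from matching cells (OT == PT) to the largest mismatch."""
    cells = [(ot, pt) for ot in LEVELS_MS for pt in LEVELS_MS]
    matched = mean(model(ot, pt) for ot, pt in cells if ot == pt)
    largest = max(abs(ot - pt) for ot, pt in cells)
    mismatched = mean(model(ot, pt) for ot, pt in cells if abs(ot - pt) == largest)
    return matched - mismatched


for model in (static_matching, dynamic_updating):
    print(model.__name__,
          round(pose_time_decline(model), 3),
          round(distance_decline(model), 3))
```

Under these assumptions the static model shows a decline with pose time but a flat distance function, and the dynamic model shows the reverse. One detail of the 3 × 3 design is worth keeping in mind: under the pure dynamic model the middle pose time has a slightly higher mean accuracy than the two outer ones, because its average OT-PT distance is smaller, but there is no decline from the shortest to the longest pose time.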

A study by Springer et al. (2013) used body part priming to address this issue. The participants played a motion-controlled video game for 5 min with either their arms or legs, yielding conditions of compatible and incompatible effector priming relative to subsequently performed arm movements of a PLA. The visual actions shown were briefly occluded after some time (action duration of 1254–1782 ms), followed by a static test pose. Participants judged whether or not the test pose showed a spatially coherent continuation of the previous action (as explained previously; cf. Graf et al., 2007). While compatible effector priming (e.g., arms) revealed evidence of dynamic updating (i.e., a monotonic distance effect, but no pose time effect), incompatible effector priming (e.g., legs) indicated static matching (i.e., a pose time effect, but no monotonic distance function). That is, in the compatible effector priming condition, response accuracy was best when the duration of occlusion matched the actual test pose shown after the occlusion, indicating an internal representation of the observed action was updated in real-time, thus matching the actual test pose. In addition, response accuracy decreased monotonically with increasing time difference between the duration of occlusion and the actual test pose (i.e., monotonic distance effect), corresponding to an increase of the time difference between an internal real-time model and the actual action outcome shown in the test pose. Hence, the findings of the compatible condition supported real-time simulation (Graf et al., 2007).

On the other hand, in the incompatible effector priming condition, evidence of real-time simulation was lacking (i.e., the duration of occlusion did not interact with the actual action progress shown in the test pose; see **Figure 1**). In this condition, however, response accuracy decreased with an increase in the pose time factor, implying a decrease in the similarity between the last visible action pose seen prior to occlusion and the test pose seen after occlusion, irrespective of the actual duration of the occlusion period. Thus, after being primed with incompatible effectors, participants were more accurate in the action occlusion task when the test pose shown was more similar to the most recently perceived action pose seen prior to occlusion (pose time effect). This effect cannot be explained by internal updating of the last perceived action image. It supports static matching. Instead of matching the test poses against real-time updated representations, participants in this condition may have alternatively matched the test poses against statically maintained representations derived from the most recently perceived action pose, which were maintained and then used as a static reference for the match with the upcoming test pose (Springer et al., 2013; cf. Springer and Prinz, 2010).

These results suggest that recognizing and predicting others' actions engages two distinct processes: dynamic updating (simulation) and static matching. The degree to which each process is involved may depend on contextual factors, such as the compatibility of the body parts involved in one's own and others' actions. Converging evidence comes from studies with a quite different focus of interest, for example, studies using semantic priming as a means of experimental context manipulation and addressing verbal descriptions of meaningful actions, rather than the kinematics involved in those actions.

## **SEMANTIC PROCESSES**

The experiments we are going to consider now investigated the relationships between the processes involved in predicting occluded actions and those involved in semantic processing of verbal contents (Springer and Prinz, 2010; Springer et al., 2012; cf. Prinz et al., 2013). A great deal of previous research has indicated that motor processes are involved during the understanding of language that describes action (e.g., Pulvermüller, 2005, 2008; Andres et al., 2008; Fischer and Zwaan, 2008). For instance, while words denoting "far" and "near" printed on objects to be grasped yielded comparable effects on movement kinematics to the actual greater or shorter distances between hand position and the object (Gentilucci et al., 2000), processing verbal descriptions of actions activated compatible motor responses (e.g., Glenberg and Kaschak, 2002; Glenberg et al., 2008) and supported the conduct of reaching movements when the verb was processed prior to movement onset (Boulenger et al., 2006).

To what extent would verbal primes modulate the internal simulation of actions under conditions of temporary occlusion? In one study, the occluded action task was always preceded by a lexical decision task (Springer and Prinz, 2010; Experiment 2). Specifically, the participants judged whether a single word (onset 1250 ms) was a valid German verb (which was the case in 75% of trials, whereas pseudo-verbs appeared in the remaining 25%). While all 102 verbs shown (all of them in the infinitive form) described achievable full-body actions, one half expressed high motor activity (like *springen*, "to jump") while the other half expressed low motor activity (like *stehen*, "to stand"). This (relative) distinction of high vs. low motor activity resulted from an independent rating of the words by 20 volunteers.

On each trial, the lexical decision task was immediately followed by an occluded action task (as described previously) displaying a familiar PLA involving the whole body (e.g., lifting something from the floor, putting on a boot, or getting up from a chair). Instructions for the two tasks were given to make them appear to be completely unrelated to each other. However, as the results clearly showed, verbal content affected performance in this task. While lexical decisions involving high-activity verbs revealed a pronounced monotonic distance function (taken as a signature of internal real-time simulation), no such effect emerged for trials involving lexical decisions about low-activity verbs. We took these results as first evidence for the idea that the processes involved in an occluded action task may be tuned by the dynamic qualities of action verbs. To test this assumption, we ran another experiment in which the same verbs were used, but they were further differentiated according to the speed being expressed by "fast," "moderate," and "slow" action verbs based on an additional word rating (e.g., "to catch," "to grasp," "to stretch," respectively; Springer and Prinz, 2010; Experiment 3). While words expressing fast and moderate actions produced a monotonic distance effect (indicating real-time simulation), slow action words clearly did not. That is, when the action occlusion task followed lexical decisions about verbs denoting fast and moderately fast actions, the monotonic distance function turned out to be more pronounced and steeper compared to trials in which the task was preceded by lexical decisions involving slow activity verbs.

These experiments suggest that language-based representations can affect the processes used for predicting actions observed in another individual. However, because the prime verbs always required lexical decisions, participants may have noticed that some of the verbs matched the visual actions, while others did not. Therefore, when responding to the test poses after occlusion (i.e., deciding whether or not it depicted a coherent continuation of the action), participants may have been more likely to give "yes" responses after a "match" than a "mismatch" (e.g., Forster and Davis, 1984).

To control for such strategy-based effects, we ran an additional experiment in which the prime verbs were masked and did not require any response at all (Springer et al., 2012). Specifically, ten verbs that had been rated as very fast (e.g., fangen—"to catch") and ten verbs rated as very slow (e.g., lehnen—"to lean") were briefly presented (onset 33 ms) embedded within a forward and a backward mask consisting of meaningless letter strings. Hence, people were not consciously aware of the verbal primes and were unlikely to engage in any deliberate response strategies (e.g., mapping the semantic content to the observed actions; see Forster, 1998; Van den Bussche et al., 2009). Still, masked priming revealed a similar result: While a pose time main effect was always present, indicating static matching was involved, a pronounced monotonic distance effect (taken to reflect dynamic updating, i.e., real-time simulation) emerged for verbs expressing dynamic actions, while it was lacking for verbs expressing static actions and meaningless letter strings (Springer et al., 2012).

While masked words are not visible, they have still been shown to access semantic processing levels (Kiefer and Spitzer, 2000; Schütz et al., 2007; Van den Bussche and Reynvoet, 2007). Also, when we used a non-semantic, purely visual priming of action dynamics (by presenting dots rotating with slow, moderate, or fast speed), a monotonic distance effect was lacking. Overall, the observations from both conscious and unconscious priming experiments seem to suggest that the semantic content implied in verbal processing has an impact on procedural operations involved in a subsequent occluded action task.

To better understand the nature of these effects, the details of the putative internal action representation during occlusions and its underlying mental operations described above must be considered. Specifically, predicting occluded actions seems to imply two processes: dynamic updating and static matching. Hence, the observation that the slope of the monotonic distance function (indicating dynamic updating) is more pronounced after processing high-activity, as compared to low-activity action verbs, suggests (at least) two different functional interpretations (cf. Prinz et al., 2013). One is to consider a direct impact of verbal semantics on simulation dynamics, in the sense that the degree of activity expressed in the verbs affects the speed of simulation (faster after processing high-activity verbs as compared to low-activity verbs). The other alternative is that the distance function actually reflects a blend of performance resulting from two ways of solving the task: dynamic updating and static matching. While dynamic updating relies on real-time updating of the internal reference against which the test pose is matched, static matching relies on an internal reference that is static and may be derived from the last posture seen before occlusion (Springer and Prinz, 2010; Prinz et al., 2013).

Based on a direct differentiation of the two processes, as described previously (Springer et al., 2013), we argue that the results by Springer et al. (2012) can be best understood as a blend of outcomes of static and dynamic processes. That is, semantic verb content may modulate the relative contributions of two processes, static matching and dynamic updating. While high-activity verb contents invite stronger contributions of dynamic processing than low-activity contents, low-activity contents may promote stronger contributions of static processing.
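The blend interpretation can be illustrated with a simple weighted mixture. The sketch below is an assumption-laden toy, not a model fitted in any of the studies: it assumes linear accuracy costs for both processes, and the weight `w_dynamic`, the baseline, and the cost per millisecond are hypothetical. It shows that, under these assumptions, the slope of the distance function scales with the share of dynamic updating, so a steeper function need not mean a faster simulation.

```python
def blended_accuracy(ot, pt, w_dynamic, base=0.90, cost_per_ms=0.0003):
    """Weighted mix of dynamic updating and static matching (all values hypothetical).

    w_dynamic in [0, 1] is the share of dynamic updating; the remainder is
    static matching against the last pose seen before occlusion.
    """
    dynamic = base - cost_per_ms * abs(ot - pt)
    static = base - cost_per_ms * pt
    return w_dynamic * dynamic + (1.0 - w_dynamic) * static


def distance_slope(w_dynamic, levels=(100, 400, 700)):
    """Change in mean accuracy per ms of OT-PT distance, from distance 0 to the maximum."""
    cells = [(ot, pt) for ot in levels for pt in levels]
    max_d = max(abs(ot - pt) for ot, pt in cells)

    def mean_at(d):
        vals = [blended_accuracy(ot, pt, w_dynamic)
                for ot, pt in cells if abs(ot - pt) == d]
        return sum(vals) / len(vals)

    return (mean_at(max_d) - mean_at(0)) / max_d


for w in (0.2, 0.8):
    print(w, distance_slope(w))
```

In this toy, a larger dynamic share (as hypothesized after high-activity verbs) yields a more negative slope, while a smaller share (as hypothesized after low-activity verbs) flattens the distance function and leaves the pose time effect of static matching more visible.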

In sum, the results from both explicit and implicit semantic priming experiments suggest that semantic verbal contents may impact on the mental operations involved when observers engage in recognizing actions that are transiently covered from sight (Springer and Prinz, 2010; Springer et al., 2012). Further studies converge with this view, although addressing semantic interference rather than priming effects (e.g., Liepelt et al., 2012; Diefenbach et al., 2013). For example, Liepelt et al. (2012) found evidence of interference between language and action, demonstrating that word perception influences hand actions and hand actions influence language production. Overall, one may conclude that internal action simulation and semantic processing can access common underlying representations, a view that corresponds to recent accounts of embodied cognition (e.g., Barsalou, 2003, 2008; Zwaan, 2004; Pulvermüller, 2005, 2008; Glenberg, 2008; Mahon and Caramazza, 2008, 2009).

## **A FRAMEWORK FOR ACTION REPRESENTATION**

This paper focuses on experimental research investigating action simulation through systematic manipulation of the factors that influence how we perceive and predict actions observed in other people. While the studies discussed here differ according to a number of methodological aspects, including postdictive and predictive types of measurements, as well as the features studied, including the time course, sub-processes, and representational grounds of action simulation, all of them involve variations of an action occlusion paradigm (Graf et al., 2007; Prinz and Rapinett, 2008). This paradigm requires observers to evaluate the course of actions that are briefly and transiently covered from sight. When visual input is lacking, observers need to strongly rely on internally guided action representations. Thus, the paradigm allows for systematic testing of the cognitive underpinnings of action simulation and its internal processes and resources.

Several new insights emerged from the findings of these action occlusion paradigms. First, action simulation enables observers to render quite precise real-time predictions of others' actions (Graf et al., 2007; Parkinson et al., 2011; Sparenberg et al., 2012). For instance, observers were highly accurate in differentiating between time-coherent and time-incoherent continuations of temporarily occluded human full-body actions (Sparenberg et al., 2012) and spatially occluded human hand actions (Prinz and Rapinett, 2008). Hence, action simulation may involve an internal predictive process that runs in real-time with observed actions. This process may act on newly created action representations rather than relying on continuous visual extrapolations of observed movement trajectories (Prinz and Rapinett, 2008).

Second, action simulation seems to be highly susceptible to subtle visual manipulations, indicating that it draws on perceptual representations of diverse aspects of human motion and kinematic features, which may enable observers to develop highly accurate predictions about actions observed even after quite short phases of visual observation (Parkinson et al., 2012).

Third, action simulation can be influenced by an observer's own physical activity. Thus, the representational resources involved in internal action simulation may be related to the resources involved in motor execution. The strength of the motor influences varied according to the degree of correspondence between observed and performed actions, for instance, regarding the effectors involved (Tausche et al., 2010; Springer et al., 2011).

Fourth, predicting actions through periods of occlusion may involve two distinct processes: dynamic updating and static matching. While dynamic updating corresponds to real-time simulation, static matching implies that recently perceived action images are maintained as an internal reference against which newly incoming action information can be matched. The relative proportion to which the two processes are used may depend on contextual factors such as a correspondence of body parts involved in performed and perceived action (Springer et al., 2013).

Fifth, internal action simulation was affected by linguistic processing of action-related words. While prime verbs describing dynamic actions corresponding to the observed actions (i.e., implying movement of the limbs) revealed evidence of dynamic updating, this was not the case for those describing static actions (implying no movement of the limbs) (Springer and Prinz, 2010). This occurs even if people are not consciously aware of these action verbs and, thus, not prone to deliberate response strategies, suggesting that action simulation may involve semantic representational resources (Springer et al., 2012).

In the following sections, we place this experimental evidence in the wider context of major theoretical issues in the broad domain of action and event representation.

## **REAL-TIME SIMULATION AND PREDICTIVE CODING**

Several studies in which observers had to predict temporarily occluded actions have shown that prediction accuracy was best when the actions reappeared in a time-consistent manner after occlusions. In addition, prediction accuracy systematically decreased as the time gap between the duration of occlusion and the temporal advance of the action stage shown after occlusion increased (Graf et al., 2007; Springer and Prinz, 2010). These findings correspond to the notion that action simulation involves internal models that run in real-time with observed action (Verfaillie and Daems, 2002; Flanagan and Johansson, 2003; Rotman et al., 2006). Furthermore, internal real-time simulation was affected by the observers' own physical activity (Tausche et al., 2010; Springer et al., 2011).

Possible explanations for these results come from a predictive coding account of motor control (e.g., Kilner et al., 2007, 2009) and from the broader Theory of Event Coding (TEC; Prinz, 1990, 1997, 2006; Hommel et al., 2001). Efficient visuo-motor control requires estimating one's own body state prior to movement execution, which is based on internal forward models. These internal forward models allow individuals to anticipate the sensory consequences of their own movements in real-time based on motor commands (i.e., efference copies; Wolpert and Flanagan, 2001). They may also operate when observers engage in predicting actions observed in others (Grush, 2004; Blakemore and Frith, 2005; Prinz, 2006; Thornton and Knoblich, 2006; Kilner et al., 2007). Internal sensorimotor simulations may contribute to perceptual processing by generating top-down expectations and predictions of the unfolding action, allowing observers to precisely anticipate others' actions (see Wilson and Knoblich, 2005, for a review).

According to TEC, codes of perceived events and planned actions share a common representational domain. Perceptual codes and action codes may, thus, influence each other on the basis of this representational overlap. For instance, during different motor cognitive tasks (i.e., action observation or motor imagery), the cortical representations of a target muscle and a functionally related muscle were enhanced within a single task and across different tasks, suggesting a topographical and functional overlap of motor cortical representations (Marconi et al., 2007; cf. Dinstein et al., 2007). This overlap may provide a basis for anticipating others' actions by mapping those actions onto one's own sensorimotor experiences (Jeannerod, 2001; Gallese, 2005).

The participants in the experiments reported here (Section Representational Mechanisms) may have applied the same motor representations that were activated during execution to predicting a corresponding action observed. If so, the requirement to internally simulate an observed action may be reduced when observed and concurrently performed actions fully correspond, because under this condition execution by itself may already provide a continuously updated internal reference by which the occlusion task can be solved. Hence, this condition yielded better task performance than conditions in which observed and performed actions were not (or only partially) similar to each other (Springer et al., 2011). Given a complete lack of correspondence, execution may strongly interfere with internal simulation (Prinz, 1997; Wilson and Knoblich, 2005) such that internal simulations need to be shielded from information available from executing a movement that is entirely different from the observed one. Hence, this condition did not reveal evidence of internal simulation but showed increased errors, suggesting interference from execution to simulation (cf. Tausche et al., 2010). In fact, running real-time simulations of observed actions may be efficient for solving the task only when executed and observed movements are similar (but not when they are identical or fully incongruent on each possible dimension) (Springer et al., 2011). Here, evidence of real-time simulation was obtained, suggesting that the cost/benefit ratio for running internal sensorimotor simulations was more balanced, whether this is due to congruence in terms of the anatomical body sides used (Wilson and Knoblich, 2005) or the exact movement patterns involved in observed and performed actions (Kilner et al., 2003).

In line with this view, the temporal predictions generated by one's own motor system for efficient motor control may also be applied when predicting other people's actions (Blakemore and Frith, 2005; Kilner et al., 2007). Observers are able to quite precisely predict not only the sensory consequences of their own actions, but also those of others' actions (e.g., Sato, 2008). Furthermore, based on the observation of the communicative gestures of an agent in dyadic interaction, they are able to render quite precise predictions about when the action of the second agent will take place (Manera et al., 2013).

Neuroscientific studies have clearly shown the involvement of motor brain regions in action observation (e.g., Gallese et al., 1996; for a review see Iacoboni and Dapretto, 2006). This corresponds to the notion that an observer uses his or her motor system to simulate and predict others' actions (i.e., internal modeling on the basis of the observer's own sensorimotor experiences; e.g., Jeannerod, 2001; see Schubotz, 2007, for a review). When observers predicted transiently occluded full-body actions, different parts of the action observation network, including the dorsal premotor cortex, were involved (Stadler et al., 2011). Furthermore, grasp observation yielded increased activation of this network, including the dorsal premotor cortex and posterior parietal brain regions, which may reflect a motor simulation process for object-directed hand actions observed (Ramsey et al., 2012). Moreover, observing the start and middle phases of an action sequence yielded higher motor facilitation than observing the final postures of these actions (Urgesi et al., 2010), suggesting that parts of the human motor system are preferentially activated by predictive sensorimotor simulations of actions observed in other people (Blakemore and Frith, 2005; Kilner et al., 2007).

## **DYNAMIC UPDATING AND STATIC MATCHING**

Several experiments have indicated that two distinct processes may be involved when observers engage in predicting the future course of other people's actions: dynamic updating (corresponding to real-time simulation) and static matching (Section Representational Mechanisms). The relative contributions of dynamic and static processes may depend on contextual factors. For example, while priming the same effectors as perceived in another person revealed evidence of dynamic updating, priming incompatible effectors clearly did not (Springer et al., 2013). After incompatible effector priming, however, observers were better able to predict an occluded action when the action stage shown after occlusion was more similar to the most recently perceived action pose (seen prior to occlusion). This effect cannot be explained by internal real-time updating. It supports static matching. Instead of being matched against real-time updated internal models, test poses may, alternatively, be matched against statically maintained representations derived from the most accessible action pose, which are maintained and then used as a static reference for the match with the upcoming action.

Adopting a common coding perspective (TEC; Prinz, 1990, 1997; Hommel et al., 2001; Prinz and Hommel, 2002), participants may have mapped the (sensorimotor) representations used for acting to solve the action occlusion task. If action representations that were recently accessed could be mapped onto the actions perceived due to common representational grounds (i.e., due to effector compatibility), dynamic updating may be strengthened because recently activated internal real-time models (used for controlling one's own actions) can be mapped onto the perceived actions. Hence, using a compatible (but not incompatible) effector may aid action prediction (Reed and McGoldrick, 2007) and may foster internal real-time simulation (Springer et al., 2011).

On the other hand, if recently accessed action representations are not (or are less efficiently) applicable to internal forward models of perceived actions (due to effector incompatibility), real-time simulation may be constrained (Prinz, 1990, 1997; Hommel et al., 2001). Hence, incompatible effector priming fosters static matching as an alternative process for solving the action occlusion task, that is, matching internally stored action images without the involvement of (possibly conflicting) internal real-time models (Springer et al., 2013).

Corresponding to this view, observers were generally more accurate at predicting occluded actions after compatible than incompatible body part priming (Springer et al., 2013). This finding may suggest that real-time simulations yielded, overall, more precise predictions than static matching. This view corresponds to the notion that internal sensorimotor activations (simulations) are used when predicting others' actions (Blakemore and Frith, 2005; Wilson and Knoblich, 2005; Kilner et al., 2009) and that action observation activates premotor brain regions in a somatotopic way (i.e., reflecting the body parts being observed; Decety and Grèzes, 1999, 2006; Buccino et al., 2001; Sakreida et al., 2005).

## **ACTION SEMANTICS**

Several experiments indicated that the precision by which observers were able to predict the future course of an action was affected by verbal primes (Section Semantic Processes). One intriguing explanation for this is to assume that language-based descriptions of actions may modulate the relative involvement of two processes: dynamic updating (i.e., real-time simulations) and static matching (as explained previously).

A large body of evidence shows that processing verbal information is closely linked to information processing in sensory and motor domains, indicating that activation of semantic knowledge coincides with activation of corresponding sensory and/or motor representations (Barsalou, 2003, 2008; Barsalou et al., 2003; Glenberg, 2008; Kiefer et al., 2008; Pulvermüller, 2005, 2008; Mahon and Caramazza, 2008, 2009). Likewise, many studies have indicated that motor control may be closely linked to semantic processing, such that the kinematics of ongoing movements are affected by semantic processing (Gentilucci et al., 2000; Glover et al., 2004; Boulenger et al., 2006, 2008).

Related to the studies reported here (Section Semantic Processes), one may assume that verbs describing dynamic action and implying movement of the limbs (corresponding to the observed actions) act to strengthen the involvement of dynamic updating over static matching due to common representational grounds between meaning and movement (Barsalou, 2003, 2008; Pulvermüller, 2005; Glenberg, 2008). As a result, dynamic updating was indicated when participants accessed verbs expressing a dynamic action prior to an action occlusion task. Correspondingly, static action verbs, which did not imply movement of the limbs, did not indicate dynamic updating. Static (and meaningless) primes may have favored the contribution of static matching, thus, preventing an indication of dynamic updating from occurring (Springer and Prinz, 2010; Springer et al., 2012).

This pattern was even observed when people were not aware of the primes and were, thus, unlikely to have engaged in deliberate task strategies (Springer et al., 2012). When the verbal descriptions involved a coding of action dynamics that corresponded to the visual actions, dynamic real-time simulation was indicated. Hence, linguistic representations may trigger anticipatory internal simulations, thus affecting the processes involved in an action prediction task (Springer and Prinz, 2010; Springer et al., 2012).

Overall, the observation of a semantic modulation of action simulation converges with recent evidence supporting the notion of close links between semantic processing and internal action simulation (Liepelt et al., 2012; Diefenbach et al., 2013). This view is consistent with embodied accounts, which hold that understanding action language coincides with (or even requires) internal sensorimotor simulations (or reactivation) of the described action. In these theories, sensorimotor simulation is understood as the activation of the same representations (and neural structures) that are derived from bodily experience, but in the absence of overt performance (e.g., Glenberg and Kaschak, 2002; Barsalou et al., 2003; Zwaan, 2004; Pulvermüller, 2005; Zwaan and Taylor, 2006; Barsalou, 2008; see Rumiati et al., 2010, for a review).

Recent evidence has clearly demonstrated cross-talk effects between action language and execution (e.g., Nazir et al., 2008). Processing action verbs modulated the kinematics of movements relative to nouns without motor associations (Boulenger et al., 2006). Parts of the motor system were activated when words and sentences implying the corresponding actions (e.g., the same effector) were perceived (Buccino et al., 2001; Aziz-Zadeh et al., 2006). Pulvermüller et al. (2005) found somatotopic activity in the motor cortex when participants were listening to face- and leg-related action words, corresponding to the view that motor regions of the brain are involved in action word retrieval (Pulvermüller, 2005). Furthermore, reading hand-related action verbs conjugated in the future tense enhanced the excitability of hand muscles relative to reading the same verbs conjugated in the past tense, indicating that an activation of predictive sensorimotor simulations is not restricted to direct action observation but may also be induced by action-related features derived from linguistic stimuli (Candidi et al., 2010).

## **LIMITATIONS, OPEN QUESTIONS, FUTURE DIRECTIONS**

One conclusion from several studies discussed in this paper is that one mechanism by which a given action perception context can modulate the precision of internal predictions about the future course of other people's actions is by altering the relative contributions of dynamic and static processes. While dynamic updating corresponds to an internal predictive simulation process, static matching implies that most recently accessed action representations are maintained and then retrospectively used for evaluating newly incoming information (e.g., Springer and Prinz, 2010; Springer et al., 2013). However, although this model seems to fit several of the studies we have discussed here, there are some limitations and open issues to consider.

Firstly, neither of the two processes by itself speaks to the representational modality to which the operations pertain (e.g., updating/matching in the visual and/or motor domain). Possibly, predicting occluded actions may not rely on only one representational domain but may involve alternating or simultaneous processes in different domains (e.g., visually driven static matching and motor-driven dynamic simulation). Likewise, the order in which the two processes may run (e.g., trial-by-trial or in parallel) is, at this point, an open question that needs to be addressed in future work.

Secondly, the accuracy of predicting (simulating) actions may be moderated by individual characteristics such as age or sensorimotor expertise. While many studies have shown that higher motor expertise goes along with stronger motor simulation during observation of actions from the respective domain of expertise (Calvo-Merino et al., 2005; Cross et al., 2006; Aglioti et al., 2008; Urgesi et al., 2012), only a few studies have illuminated how the aging process might interact with sensorimotor expertise during action prediction (Diersch et al., 2012, 2013). Diersch et al. (2012) found that figure skating expertise can improve both young and older adults' action prediction abilities when those actions are within the observer's domain of physical expertise. Thus, sensorimotor expertise, even when acquired many decades ago, may still strongly impact our ability to precisely predict others' actions.

Thirdly, the interpersonal relationship between an observer and the agent observed may matter. This may concern close relationships (e.g., children, parents, or romantic partners) and novel social partners (e.g., strangers). Taking self-generated actions as an extreme illustration of actions to which observers have privileged access, it has been shown that observers are most accurate in predicting those actions that they are able to perform themselves (e.g., Knoblich et al., 2002). Apart from allowing one to regulate one's own behavior, such privileged self-recognition enables recognition of the effects of one's own actions as being self-generated (Jeannerod, 1999, 2003; Frith et al., 2000).

Although the scope of the current paper was quite narrow, in that it focused on action simulation as investigated experimentally with behavioral action occlusion paradigms, considering other strands of action simulation research, like modelling studies (e.g., Fleischer et al., 2012) or studies focusing on the processing of robot vs. humanoid form and motion (e.g., Saygin and Stadler, 2012; see Gowen and Poliakoff, 2012, for a review), may complement the work discussed here.

On a neuroscientific level, investigating the involvement of common and/or distinct brain networks in relation to the different processes engaged in action prediction seems to be highly promising (e.g., Schiffer and Schubotz, 2011; Ramsey et al., 2012; cf. Szpunar et al., 2013). Yet, only a few human fMRI studies have examined action simulation by use of action occlusion paradigms (Stadler et al., 2011; Diersch et al., 2013). In line with our notion that predicting others' actions recruits dynamic and static processes, Stadler et al. (2011) found that different portions of the premotor cortex play different roles in each of these aspects. While the right pre-supplementary motor area (pre-SMA) was recruited for maintaining an internal reference of transiently occluded actions, dynamic updating of internal action representations yielded increased activation in the pre-SMA and the dorsal premotor cortex (PMd) (Stadler et al., 2011; see also Stadler et al., 2012a).

In sum, the studies we have discussed in this paper collectively suggest that action simulation can be conceived of as a highly susceptible, dynamic process that runs in real-time with actions observed, involving sensorimotor and semantic representations. Moreover, when predicting the future course of other people's actions, dynamic simulations may co-exist with similarity-based evaluations of statically maintained action representations (static matching). The relative involvement of both processes, dynamic simulation and static matching, may be tuned by contextual factors, like understanding action-related verbal contents, or actually performing actions corresponding to those observed in other people. This view corresponds to the general assumption that an observer can use his or her own motor system to internally simulate and predict others' actions (Grèzes and Decety, 2001; Jeannerod, 2001) and is compatible with a more specific predictive coding account of motor control (e.g., Kilner et al., 2007).

## **REFERENCES**

200 ms of processing. *J. Cogn. Neurosci.* 18, 1606–1615. doi: 10.1162/jocn.2006.18.10.1607


(2004). Neural circuits underlying imitation learning of hand actions: an event-related fMRI study. *Neuron* 42, 323–334. doi: 10.1016/S0896-6273(04)00181-3


*Q. J. Exp. Psychol.* 61, 825–850. doi: 10.1080/17470210701623605


*Brain Sci.* 24, 849–937. doi: 10.1017/S0140525X01000103


*Curr. Biol.* 13, 522–525. doi: 10.1016/S0960-9822(03)00165-9


self-other correspondences. *Soc. Neurosci.* 2, 134–149. doi: 10.1080/17470910701376811


20, 2511–2521. doi: 10.1093/cercor/bhp292


*Child Development* 39, 887–894. doi: 10.2307/1126991


*Exp. Psychol.* 60, 1063–1071. doi: 10.1080/17470210701288722

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 April 2013; accepted: 10 June 2013; published online: 04 July 2013.*

*Citation: Springer A, Parkinson J and Prinz W (2013) Action simulation: time course and representational mechanisms. Front. Psychol. 4:387. doi: 10.3389/fpsyg.2013.00387*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Springer, Parkinson and Prinz. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

#### *E. J. Masicampo<sup>1</sup>\* and Roy F. Baumeister<sup>2</sup>*

*<sup>1</sup> Department of Psychology, Wake Forest University, Winston-Salem, NC, USA <sup>2</sup> Department of Psychology, Florida State University, Tallahassee, FL, USA*

#### *Edited by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA*

#### *Reviewed by:*

*Ezequiel Morsella, San Francisco State University and University of California, San Francisco, USA T. Andrew Poehlman, Southern Methodist University, USA*

#### *\*Correspondence:*

*E. J. Masicampo, Department of Psychology, Wake Forest University, Greene Hall 415, PO Box 7778, Reynolda Hall, Winston-Salem, NC 27109, USA e-mail: masicaej@wfu.edu*

Humans enjoy a private, mental life that is richer and more vivid than that of any other animal. Yet as central as the conscious experience is to human life, numerous disciplines have long struggled to explain it. The present paper reviews the latest theories and evidence from psychology that addresses what conscious thought is and how it affects human behavior. We suggest that conscious thought adapts human behavior to life in complex society and culture. First, we review research challenging the common notion that conscious thought directly guides and controls action. Second, we present an alternative view—that conscious thought processes actions and events that are typically removed from the here and now, and that it indirectly shapes action to favor culturally adaptive responses. Third, we summarize recent empirical work on conscious thought, which generally supports this alternative view. We see conscious thought as the place where the unconscious mind assembles ideas so as to reach new conclusions about how best to behave, or what outcomes to pursue or avoid. Rather than directly controlling action, conscious thought provides the input from these kinds of mental simulations to the executive. Conscious thought offers insights about the past and future, socially shared information, and cultural rules. Without it, the complex forms of social and cultural coordination that define human life would not be possible.

**Keywords: consciousness, conscious thought, action control, cultural evolution, unconscious**

Humans enjoy a private, mental life that is richer and more vivid than that of any other animal (e.g., Damasio, 1999; Edelman, 2004; Suddendorf, 2006). Yet as central as the conscious experience is to human life, numerous disciplines have long struggled to explain it (e.g., Blackmore, 2005). The present paper reviews the latest theories and evidence from psychology that addresses what conscious thought is and how it affects human behavior.

Our focus is on the type of conscious experience that is unique to humans. A common practice in discussions of consciousness is to distinguish between two levels (e.g., Damasio, 1999; Edelman, 2004). The first level, phenomenal awareness, is shared by humans with most other animals. It comprises the experience of sensations, feelings, or qualia. The second level, conscious thought, is largely unique to humans and includes self-awareness, inner reflections, and deliberations. This paper reviews the latest research on the link between this second level of consciousness and human action. Earlier and more extensive treatment of these issues is available in Baumeister and Masicampo (2010) and Baumeister et al. (2011).

We suggest that conscious thought adapts human behavior to life in complex society and culture. First, we review research challenging the common notion that conscious thought directly controls action. Second, we present an alternative view—that conscious thought processes actions and events that are typically removed from the here and now, and that it indirectly shapes action to favor culturally adaptive responses. Third, we summarize recent empirical work on conscious thought, which generally supports this alternative view.

## **QUESTIONING CONSCIOUS THOUGHT AS THE CONTROLLER OF ACTION**

A commonly held view assumes that conscious thought is in charge of behavior (e.g., Wegner, 2002). However, several decades of psychology research have challenged this notion. The findings have shown that conscious thought has limited access to the mind's inner workings, while revealing that the unconscious is capable of initiating and guiding behavior without help from conscious thought.

### **THE LIMITATIONS OF CONSCIOUS THOUGHT**

If conscious thought were in charge of behavior, then people could presumably report and explain their actions accurately. To the contrary, Nisbett and Wilson (1977) showed repeatedly that people who were asked to explain their actions would overlook factors that had demonstrably large influences on their behavior. People even denied those influences when asked about them directly. Thus, when people introspect about their behaviors, they seem incapable of retrieving accurate accounts of what they did and why.

Gazzaniga (2000) has suggested that people explain their behaviors by fabricating stories. In his research, brain-damaged patients who could not explain their behaviors accurately were nevertheless quick to provide plausible, though obviously false, explanations for their actions. More recent research has demonstrated a similar phenomenon in normally functioning adults (Johansson et al., 2005). This work employed sleight of hand to dupe participants into explaining decisions they did not make. Most participants failed to notice that these were not their decisions. Furthermore, they had no problem providing quick and elaborate explanations for why they made them, even though the explanations could not possibly have been true. The general pattern thus seems to be that people are often unaware of the true causes of their own behaviors. If the conscious self cannot recognize its own actions, it is unlikely that it controls them.

A further limitation of conscious thought is that it is too slow to initiate behavior. Libet (1985) observed people as they decided to initiate simple motor movements. His data revealed that conscious choices were too delayed to be the true source of behavior. Unconscious processes, on the other hand, were much earlier indicators of action (in milliseconds; for a recent conceptual replication of Libet's work, see Soon et al., 2008). Even for more complex decisions, such as how to vote in an upcoming election, conscious decisions appear days after the unconscious has made up its mind (Galdi et al., 2008). According to these findings, the conscious self receives its information too late in the chain of events to be the initiator of behavior.

## **THE DYNAMIC UNCONSCIOUS**

Other work has revealed that unconscious processes are capable of initiating and guiding action, including for complex behaviors once thought to require conscious control. Bargh and his colleagues (Bargh and Chartrand, 1999; Bargh and Morsella, 2008) have argued that most human behaviors are initiated automatically and unconsciously in response to environmental cues. Many social motives and goals have been shown to operate in this way. The mere exposure to words related to achievement can trigger a range of motivated behaviors aimed at attaining mastery over later tasks (Bargh et al., 2001). Crucially, participants are typically unaware of these environmental influences on their behavior. Thus, the initiation and subsequent regulation of behavior occurs despite the person having no conscious awareness of the process, including for complex sets of actions.

Thus, the emerging view in recent decades has been that conscious thought is not the all-powerful controller of behavior that many perceive it to be (Pockett, 2004; Dijksterhuis et al., 2005). The conscious self is often mistaken about what it does and why. Furthermore, the unconscious seems capable of guiding much of what people do. If conscious thought affects human action, it is not in the manner of controlling moment-to-moment actions. Its influence on behavior must lie elsewhere.

## **CONSCIOUS THOUGHT SERVES SOCIAL AND CULTURAL FUNCTIONS**

The above empirical work has prompted a revised understanding of how conscious thought relates to action (e.g., Wegner, 2002; Pockett, 2004). Some have speculated that there is no role for conscious thought in determining behavior (e.g., Dijksterhuis et al., 2005). In our more positive view, nature would not have equipped humankind with such a complex capacity as conscious thought if it did not serve an adaptive function. We propose that conscious thought may have powerful indirect effects on behavior even if it does not directly control it. Furthermore, given the uniquely human nature of conscious thought, we suggest it likely serves uniquely human needs, particularly social and cultural ones.

## **SOCIAL COORDINATION AND COMMUNICATION**

We propose that conscious thought enables coordination with the social and cultural environment (Baumeister and Masicampo, 2010). Our thinking follows other perspectives that emphasize social pressures as the driver of uniquely human mental capacities. These have argued that primate intelligence evolved for the purpose of adapting to social life (Byrne and Whiten, 1988; Dunbar, 1998), with humans further evolving the motivation to understand others' mental states and to communicate their own mental states with others (Tomasello, 1999; Tomasello et al., 2005). We propose that this pressure to communicate with others transformed thinking from an individual capacity to a social one.

James (1890) famously asserted that "thinking is for doing." We suggest that much of conscious thinking might instead (or also) be for talking. Consistent with that view, conscious thought and speech seem to emerge in complementary ways both in phylogeny and in development (see Baumeister and Masicampo, 2010). The link between conscious thought and speech is also observable among adults, in whom the full processing of language requires conscious thought (Greenwald and Liu, 1985), and conscious thinking suffers if inner speech is suppressed (Emerson and Miyake, 2003).

We suggest that conscious thought and communication afford numerous advantages. People who share their thoughts within a group can correct one another's mistakes, and so talking enables drawing on others' wisdom. People who communicate can also reach agreements with one another, taking into account others' intentions, knowledge, and resources. Thus, talking also allows for coordination and collaborative planning.

## **WHY COMMUNICATION REQUIRES CONSCIOUS THOUGHT**

The proposition that conscious thoughts are largely for communicating does not by itself explain how conscious thoughts influence action or why they need be conscious in the first place. One answer to these questions was provided by Morsella (2005) to explain phenomenal awareness, and his answer applies to conscious thought as well. He argued that consciousness allows for communication across disparate parts of the mind, so that inner conflicts can be resolved. For organisms with few motivations, responses to sensory input can be supplied with relatively little information processing. For humans and most animals, however, motivations co-occur and conflict. In these situations, different parts of the mind offer diverging prescriptions for behavior. One part urges the body to flee while another calls on it to stay put. A major function of consciousness is to broadcast incoming sensory input to the disparate parts of the mind so that multiple needs may be negotiated and an optimal course of action taken. Phenomenal consciousness allows conflicts originating from the physical environment to be resolved. In humans, we propose that conscious thought enables conflicts originating from society and culture to be resolved as well.

A second answer to the questions about the utility of consciousness is that conscious thought makes possible certain kinds of information processing that the unconscious cannot perform. Specifically, we see conscious thought as the place where the unconscious creates meaningful sequences of events or ideas. Language is one important example. The unconscious can process only single words, but conscious thought can combine words into meaningful sentences (Baars, 2002). Furthermore, the amount of information that can be communicated in sentences is infinitely more than the amount that can be captured in single words. It is only through the integrative serial processing afforded by conscious thought that the mind can combine simple concepts to produce novel conclusions. Indeed, we argue that a key function of conscious thought is to enable the unconscious to derive new insights from the information it already has.

Many types of thinking are made possible by conscious thought, and each provides a means for the unconscious to reach new conclusions without acquiring additional outside information. These include logical reasoning, quantification, and causal understanding (e.g., DeWall et al., 2008). As with language, each of these thought processes involves combining simple ideas in accordance with shared rules. Furthermore, we propose that each produces novel conclusions that can be communicated to others or incorporated into one's own decisions and behaviors.

These categories of sequential thought may seem non-social, but we argue that each is a cultural process. Each type of thought employs rules communicated within culture. And each allows individuals to operate successfully within the culture, whether it is used to cooperate with others, justify one's actions (Haidt, 2007), or argue (Mercier and Sperber, 2011).

## **TRANSLATING CONSCIOUS THOUGHT INTO ACTION**

Conscious thought influences action via mental simulation (Baumeister and Masicampo, 2010). Much of conscious thinking involves simulating non-present events (Kane et al., 2007; Killingsworth and Gilbert, 2010), as when people relive past experiences, anticipate desired futures, consume fiction, or daydream. Thus, conscious thought focuses frequently on non-present information rather than on current actions. Furthermore, mental simulations incorporate both of the features of conscious thought discussed above. They comprise meaningful sequences of events, at times incorporating the types of thinking already mentioned (e.g., logical reasoning, quantification, causality). And they allow for inner crosstalk and conflict resolution (e.g., Morsella, 2005). A person can imagine the outcome of engaging in a certain behavior, and the various parts of the mind can access the simulation, objecting as problems arise. By mentally simulating positive and negative behaviors and outcomes, individuals can learn to perform or avoid them (e.g., Grouios, 1992).

We suggest the power of conscious thought is not in the direct control of action, as common views assume. Rather, its power lies in processing information from society and culture. It takes in information, and it combines it into meaningful mental simulations constructed according to cultural rules. These simulations can be used to determine optimal outcomes or to rehearse optimal ways of behaving. Thus, conscious thought allows individuals to translate information from culture into socially adaptive responses.

## **THE EXPERIMENTAL EVIDENCE FOR EFFECTS OF CONSCIOUS THOUGHT ON BEHAVIOR**

We recently reviewed the literature for evidence of conscious causation of behavior (Baumeister et al., 2011). Our review surveyed experiments in which conscious thoughts were manipulated by random assignment and effects on outward behavior were measured. By the logic of experimental design, such findings indicate that the conscious thought caused behavior. We identified many such phenomena, which had the following three themes.

## **INTEGRATION OF BEHAVIOR ACROSS TIME**

There are numerous influences of past and future reflections on behavior. People who reflect on and analyze the past can benefit. Some reflect on prior traumas to gain useful insights about them, thereby facilitating healthy recoveries (Pennebaker and Chung, 2007). Others analyze past actions to explore how they might have behaved differently, inviting lessons for achieving more desired outcomes later (Epstude and Roese, 2008). Alternatively, people who imagine or mentally relive the past can prolong prior mindsets rather than move beyond them (Lyubomirsky et al., 2006). Imagining the past preserves and even amplifies prior emotions and motivations, thereby affecting later behavior. For example, ruminating about a prior, anger-provoking event can amplify anger (Ray et al., 2008) and incite aggression (Bushman et al., 2005).

Thoughts of the future are also influential and have self-regulatory benefits (e.g., Schacter and Addis, 2007). Conscious thoughts facilitate goal attainment by allowing people to set plans for their goals (Gollwitzer, 1999) and energizing people toward desired, future outcomes (Taylor et al., 1998; Oettingen et al., 2001, 2009). Thoughts of the future can also sway behavior by exposing people to the potential consequences of their actions. For example, anticipation of regret can sway decisions (Tetlock and Boettger, 1994; Zeelenberg et al., 1996; Zeelenberg and Beattie, 1997).

## **CONSIDERATION OF SOCIAL AND CULTURAL FACTORS**

Conscious thought also enables people to connect with others. It enables perspective taking, which enhances social coordination and negotiation outcomes (Galinsky et al., 2008a,b). It also allows people to communicate effectively with others (Roßnagel, 2000), which promotes cooperation in groups (e.g., Dawes et al., 1977; Jorgenson and Papciak, 1981).

Conscious thought likewise allows people to modify their behaviors to adhere to group expectations, norms, and laws, usually to the benefit of both the individual and the group. When people think about and explain their actions, group decisions (Scholten et al., 2007) and joint negotiation outcomes improve (De Dreu et al., 2000), and interaction partners become more cooperative, less hostile, and more trusting (De Dreu et al., 2006). Even absent any specific interaction partners, conscious thought generally promotes doing what is morally right (Caruso and Gino, 2011; Amit and Greene, 2012).

## **SELECTION FROM AMONG MULTIPLE ALTERNATIVE OPTIONS**

We propose that conscious thought is particularly useful for allowing people to consider multiple possible actions or outcomes. This is evident in counterfactual thinking. People often cannot help but reflect on how they might have behaved differently in the past. Such thinking can inspire new, improved strategies for later behavior (Epstude and Roese, 2008).

Consideration of alternative actions is also apparent in self-regulation and decision making. Hofmann et al. (2009) noted that explicit preferences and automatic impulses are often in conflict, and that explicit preferences are likely to guide behavior when people are free to reflect. In contrast, when conscious reflection is hindered, people are more impulsive (Ward and Mann, 2000) and more likely to yield to external influences (Westling et al., 2006). Conscious thought thus promotes adopting non-automatic forms of responding.

Pursuit of alternative responses is evident as well in sports. In almost every popular sport, researchers have found that the mental rehearsal of motor skills is nearly as beneficial for performance as physical practice (Druckman and Swets, 1988; Driskell et al., 1994). Thus, conscious mental practice improves skilled performance.

Each of the above patterns suggests that conscious thought does indeed help cause behavior. In each case, the influence of conscious thought on action is mostly indirect. Conscious thought is generally not found to guide moment-to-moment actions. However, reflections on the past enable people to improve later behaviors, considerations of social or cultural information sway decisions in favor of more cooperative responses, and mental simulations of plans and skills can be used to reshape habits. These findings support the notion that conscious thought is slightly removed from present actions, but that it nevertheless provides influential input into behavior.

## **CONCLUSION**

The past several decades of research in psychology have revealed some important limitations of conscious thought. Specifically, the findings suggest that conscious thought is not the direct controller of behavior that many assume it to be. We have argued nonetheless that it plays a crucial role in shaping human behavior. Our approach assumes that uniquely human capacities evolved to solve uniquely human challenges (e.g., Baumeister, 2005). Other animals interact with the physical environment (i.e., action control) without needing the capacity for conscious thought (Roberts, 2002). Humans, however, face the unique challenge of striving in society and culture (Baumeister, 2005). We think that it is precisely for that purpose that conscious thought developed.

In our review of the empirical research on conscious thought, we found numerous kinds of evidence in support of this view (Baumeister et al., 2011). The findings suggest that conscious thought affects behavior indirectly, by integrating information across time and from culture, so that multiple alternative behaviors—particularly socially adaptive ones—can be considered and an optimal action selected.

We conclude that most or all of human behavior is likely a product of conscious and unconscious processes working together. The private daydreams, fantasies, and counterfactual thoughts that pervade everyday life are far from being a feckless epiphenomenon. We see these processes as the place where the unconscious mind assembles ideas so as to reach new conclusions about how best to behave, or what outcomes to pursue or avoid. Rather than directly controlling action, conscious thought provides the input from these kinds of mental simulations to the executive. Conscious thought offers insights about the past and future, socially shared information, and cultural rules. Without it, the complex forms of social and cultural coordination that define human life would not be possible.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 June 2013; paper pending published: 25 June 2013; accepted: 09 July 2013; published online: 26 July 2013. Citation: Masicampo EJ and Baumeister RF (2013) Conscious thought does not guide moment-to-moment actions—it serves social and cultural functions. Front. Psychol. 4:478. doi: 10.3389/fpsyg.2013.00478*

*This article was submitted to Frontiers in Cognition, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Masicampo and Baumeister. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*