# AWARENESS SHAPING OR SHAPED BY PREDICTION AND POSTDICTION

EDITED BY: Yuki Yamada, Takahiro Kawabe and Makoto Miyazaki

PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-532-9 DOI 10.3389/978-2-88919-532-9

## About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **AWARENESS SHAPING OR SHAPED BY PREDICTION AND POSTDICTION**

Topic Editors:

**Yuki Yamada,** Kyushu University, Japan

**Takahiro Kawabe,** Nippon Telegraph and Telephone Corporation, Japan

**Makoto Miyazaki,** Shizuoka University, Japan

Image taken from https://www.fotolia.com

We intuitively believe that we are aware of the external world as it is. Unfortunately, this is not entirely true. In fact, the capacity of our sensory system is too small to veridically perceive the world. To overcome this problem, the sensory system has to spatiotemporally integrate neural signals in order to interpret the external world. However, the spatiotemporal integration involves severe neural latencies. How does the sensory system keep up with the ever-changing external world? As later discussed, 'prediction' and 'postdiction' are essential keywords here.

For example, the sensory system uses temporally preceding events to predict subsequent events (e.g., Nijhawan, 1994; Kerzel, 2003; Hubbard, 2005) even when the preceding event is subliminally presented (Schmidt, 2000). Moreover, internal prediction modulates the perception of action outcomes (Bays et al., 2005; Cardoso-Leite et al., 2010) and sense of agency (Wenke et al., 2010). Prediction is also an indispensable factor for movement planning and control (Kawato, 1999).

On the other hand, the sensory system also makes use of subsequent events to postdictively interpret a preceding event (e.g., Eagleman & Sejnowski, 2000; Enns, 2002; Khuu et al., 2010; Kawabe, 2011, 2012; Miyazaki et al., 2010; Ono & Kitazawa, 2011), and this holds even in infancy (Newman et al., 2008). Moreover, it has also been proposed that sense of agency stems not only from predictive processing but also from postdictive inference (Ebert & Wegner, 2011). The existence of postdictive processing is also supported by several neuroscience studies (Kamitani & Shimojo, 1999; Lau et al., 2007).

How prediction and postdiction shape awareness of the external world is an intriguing question. Prediction is involved with the encoding of incoming signals, whereas postdiction is related to a re-interpretation of already encoded signals. Given this perspective, prediction and postdiction may exist along a processing stream for a single external event. However, it is unclear whether, and if so how, prediction and postdiction interact with each other to shape awareness of the external world.

Awareness of the external world may also shape prediction and/or postdiction. It is plausible that awareness of the external world drives the prediction and postdiction of future and past appearances of the world. However, the literature provides little information about the role of awareness of the external world in prediction and postdiction.

This background propelled us to propose this research topic with the aim of offering a space for systematic discussion concerning the relationship between awareness, prediction and postdiction among researchers in broad research areas, such as psychology, psychophysics, neuroscience, cognitive science, philosophy, and so forth. We encouraged papers that address one or more of the following questions:


**Citation:** Yamada, Y., Kawabe, T., Miyazaki, M., eds. (2015). Awareness Shaping or Shaped by Prediction and Postdiction. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-532-9

# Table of Contents



Sung-en Chien, Fuminori Ono and Katsumi Watanabe

*Adaptation to implied tilt: extensive spatial extrapolation of orientation gradients*

Neil W. Roach and Ben S. Webb


Takuya Honda, Nobuhiro Hagura, Toshinori Yoshioka and Hiroshi Imamizu

*Neurobiological mechanisms behind the spatiotemporal illusions of awareness used for advocating prediction or postdiction*

Talis Bachmann

*Do the flash-lag effect and representational momentum involve similar extrapolations?*

Timothy L. Hubbard

*Postdiction: its implications on visual awareness, hindsight, and sense of agency*

Shinsuke Shimojo

## Awareness shaping or shaped by prediction and postdiction: Editorial

#### *Yuki Yamada<sup>1</sup>\*, Takahiro Kawabe<sup>2</sup> and Makoto Miyazaki<sup>3</sup>*

*<sup>1</sup> Faculty of Arts and Science, Kyushu University, Fukuoka, Japan*

*<sup>2</sup> Human Information Science Laboratory, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan*

*<sup>3</sup> Research Institute for Time Studies, Yamaguchi University, Yamaguchi, Japan*

*\*Correspondence: yamadayuk@gmail.com*

#### *Edited by:*

*Morten Overgaard, Aarhus University, Denmark*

#### *Reviewed by:*

*Kristian Sandberg, Aarhus University Hospital, Denmark*

**Keywords: consciousness, vision, audition, touch, motor control, action, motion perception**

Our conscious experience of the external world and/or our body states is quite rich. For example, we see the red color of a ripe apple, hear the sound of a stream, and feel the smoothness of silk by touch. In addition to the external world, we consciously experience the movement and states of our body. We intuitively believe that we are aware of all the events that occur in the external world, and that we control our body movements at will. From a scientific point of view, however, this is not true. Because of capacity limitations in neural processing, the brain can handle only a limited amount of information at once, and hence we experience just a fraction of available sensory inputs (e.g., change blindness: Rensink et al., 1997). The selected information does not necessarily shape our conscious experience as-is. To generate coherent perceptual representations of the external world/our body, the spatiotemporal integration and organization of the selected information is necessary.

However, neural processing in the brain inevitably takes a certain amount of physical time. Thus, this neural processing time should cause delays in our conscious experience from the actual transition of the external world/our body states. However, in general, we do not experience such temporal lags. One possibility is that the brain compensates for the lag and keeps up with the transition. How does the brain accomplish this seemingly difficult task?

Here we focus on the two strategies that the brain seems to adopt: "prediction," which is the expectancy of an event that will arise in the future, and "postdiction," which is a process that retrospectively interprets an event based on information available after the event (e.g., backward referral in Libet et al., 1979). How these two processes contribute to the generation of conscious experience has been an important question to date. Moreover, it is an intriguing question as to how these processes, prediction and postdiction, interact with each other in shaping conscious experience.

The present research topic aims at contributing to the understanding of the neural and psychological mechanisms underlying the generation of conscious experience. To this end, we collected the latest research focusing on the role of the temporal aspects of neural processing, such as prediction and postdiction, in shaping conscious experience. Additionally, we called for the latest studies investigating the relation between conscious experience and spatial perception/sensorimotor factors. Below we present a brief overview of the research included in this research topic.

First, the present research topic contains studies about the interaction between prediction and postdiction. Lenkic and Enns (2013) investigated the importance of both predictive and postdictive mechanisms in determining a target's shape visibility in an apparent motion sequence, and demonstrated that the postdictive influence was stronger than the predictive one. Hidaka and Nagai (2013) showed that a visual target in apparent motion was mislocalized by the offset signals of the target, and suggested that motion and position information are integrated in a postdictive manner. Vaughn and Eagleman (2013) showed that the Hering illusion was induced by radial optic flow in both predictive and postdictive ("peri-dictive") manners, and discussed how the spatial warping counteracts processing lags. These studies psychologically suggest that conscious experience is generated by the temporal integration of sensory inputs. In addition, Goldreich and Tong (2013) provided a computational model that incorporates prediction and postdiction, which can broadly explain the cutaneous rabbit illusion and its related phenomena. The interaction between prediction and postdiction is not confined to the processing of a single modality, but rather extends to multiple modalities; e.g., Chien et al. (2013) showed that the perceived offset position of a moving object was modulated by temporally preceding/trailing sounds.

Integrating sensory signals across space as well as time is also an important component in generating our conscious experience. Roach and Webb (2013) showed that a tilt aftereffect induced by an implied orientation structure occurred even when the fringe of an occluded area was surrounded by a random orientation texture, suggesting integration of orientation gradients within extensive visual space.

This research topic includes reports that investigate the sensorimotor aspects of conscious experience. Synofzik et al. (2013) hypothesized that the sense of agency is established based on a complex interactive mechanism consisting of predictive and postdictive cues at sensorimotor, cognitive and affective levels. Sonoda et al. (2013) discussed the emergent nature of the sense of agency in terms of the observational heterarchical model. Ichikawa and Masakura (2013) showed that the flash-lag effect in the luminance dimension was modulated, depending on the sense of agency of manual control of the target's luminance change. It is intriguing to interpret this finding in the light of Synofzik et al.'s and Sonoda et al.'s models. Additionally, Higuchi (2013) reviewed behavioral studies regarding the anticipatory (i.e., predictive) nature of human locomotion. This review showed that visual information plays a critical role in modifying locomotor actions in an anticipatory manner in response to altered environmental properties. Honda et al. (2013) demonstrated that object-mass overestimation based on visual feedback delay (Di Luca et al., 2011) is determined by prediction errors in feedback timing rather than actual delays in visual feedback, suggesting that predictive mechanisms are involved in shaping awareness of object-masses.

Other theoretical considerations were also made. Bachmann (2012) provided a framework based on his perceptual retouch theory (e.g., Bachmann, 1984) in which interactions within and between stimulus-specific and non-specific processes in binding systems form conscious perception. In a review of Hubbard (2013), representational momentum was compared with the flash-lag effect in detail in terms of an extrapolation mechanism. Shimojo (2014) provided an extensive review on postdiction, encompassing sensorimotor, memory, and cognitive phenomena. The review has implications for underlying psychological and neural mechanisms and for explanations of real-world examples of postdiction.

As outlined above, a total of 14 articles written by 37 expert researchers across broad research areas discussed this topic from a variety of perspectives. We believe that these articles give researchers profound insights into how prediction and postdiction involve awareness of the external world and body states, and that the frameworks and findings provided here will serve to open up new avenues for future research.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 January 2015; accepted: 02 February 2015; published online: 18 February 2015.*

*Citation: Yamada Y, Kawabe T and Miyazaki M (2015) Awareness shaping or shaped by prediction and postdiction: Editorial. Front. Psychol. 6:166. doi: 10.3389/fpsyg.2015.00166*

*This article was submitted to Consciousness Research, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Yamada, Kawabe and Miyazaki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Apparent motion can impair and enhance target visibility: the role of shape in predicting and postdicting object continuity

## **Peter J. Lenkic and James T. Enns\***

Department of Psychology, University of British Columbia, Vancouver, BC, Canada

#### **Edited by:**

Yuki Yamada, Yamaguchi University, Japan

#### **Reviewed by:**

Giorgio Marchetti, Mind, Consciousness, and Language, Italy

David Souto, University of Geneva, Switzerland

#### **\*Correspondence:**

James T. Enns, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, Canada V6T 1Z4. e-mail: jenns@psych.ubc.ca

Some previous studies have reported that the visibility of a target in the path of an apparent motion sequence is impaired; other studies have reported that it is facilitated. Here we test whether the relation of shape similarity between the inducing and target stimuli has an influence on visibility. Reasoning from a theoretical framework in which there are both predictive and postdictive influences on shape perception, we report experiments involving three-frame apparent motion sequences. In these experiments, we systematically varied the congruence between target shapes and contextual shapes (preceding and following). Experiment 1 established the baseline visibility of the target, when it was presented in isolation and when it was preceded or followed by a single contextual shape. This set the stage for Experiment 2, where the shape congruence between the target and both contextual shapes was varied orthogonally. The results showed a remarkable degree of synergy between predictive and postdictive influences, allowing a backward-masked shape that was almost invisible when presented in isolation to be discriminated with a *d*′ of 2 when either of the contextual shapes is congruent. In Experiment 3 participants performed a shape-feature detection task with the same stimuli, with the results indicating that the predictive and postdictive effects were now absent. This finding confirms that shape congruence effects on visibility are specific to shape perception and are due neither to general alerting effects for objects in the path of a motion signal nor to low-level perceptual filling-in.

**Keywords: visual masking, apparent motion, shape perception, prediction, postdiction**

## **INTRODUCTION**

When two stimuli are presented in close spatio-temporal proximity we experience a single object in motion. Although such *apparent motion* is experienced without effort by the viewer, it is only achieved after a number of complex problems have been solved. These include problems of image correspondence (Ramachandran and Anstis, 1986), the relative spatial position of elements (Nijhawan, 1994; Eagleman and Sejnowski, 2000; Krekelberg and Lappe, 2000), and visual masking of one stimulus by the other (Breitmeyer and Ogmen, 2000, 2006; Enns and Di Lollo, 2000). One might reasonably predict from these challenges that a stimulus in motion would be seen less accurately than a static stimulus of similar duration and size. In the present paper, we demonstrate that visibility can sometimes be impaired and at other times enhanced by the relations between stimuli making up the perceptual object in an apparent motion sequence.

## **EVIDENCE FOR PREDICTION AND POSTDICTION IN PERCEPTION**

The role of prediction is emphasized in recent theories of spatiotemporal processing (Nijhawan, 1994; Enns and Lleras, 2008; Mathewson et al., 2010; Roach et al., 2011). As one example of a study of motion predictability on target visibility, Schwiedrzik et al. (2007) presented a target within various phases of the up-and-down motion path of a secondary stimulus and reported that target visibility was especially reduced when the target coincided with the middle portion of the motion path. In contrast, visibility was increased for targets at the end-points of the path, and when there was only a single preceding motion stimulus or a single following motion stimulus. Schwiedrzik et al. (2007) referred to this impairment as "motion masking," in keeping with the earlier use of this term by Yantis and Nakama (1998). Similar results have also been reported by Hidaka et al. (2011, 2012), Khuu et al. (2010), and Souto and Johnston (2012).

In another study, Roach et al. (2011) presented pairs of inducer stimuli to the left and right of central fixation, oscillating up-and-down over several cycles. A target Gabor patch was presented in the path of one of these inducers, and its timing adjusted so that it appeared either at the end of the motion sequence or the beginning. The target was also presented either in or out of spatial phase with the inducer. The participant's task was to report whether the target appeared to the left or right of fixation. The results indicated that target visibility was lowest when the inducing stimuli moved away from the target location and it was highest when it was predictable from both the temporal and spatial phase of the inducer. Thus, contrary to Schwiedrzik et al. (2007), motion predictability was a *benefit* to target visibility in this task, not an impairment.

Prediction, or forward-going expectation, is only part of what occurs in a motion sequence. Postdiction, or a revisionist history of what has just occurred, also influences the visibility of a target in motion (Di Lollo et al., 2000; Eagleman and Sejnowski, 2000; Lleras and Moore, 2003; see also Kolers and Pomerantz, 1971; Kolers and von Grunau, 1976). The theoretical mechanism for these influences is often referred to as *object updating*, because the visual system seems to give a revisionist interpretation specifically to perceptual objects, not to the image as a whole (see review by Enns et al., 2009). That is, there is a powerful bias to interpret changes to a scene as the consequence of a single object in motion, rather than as the sudden appearance of unexpected new objects, or as the consequence of a moving background in the context of a stationary single object. This bias offers heuristic benefits to a visual system faced with chaotic input, but at the same time it incurs a cost in certain conditions. The cost is that target features seen at point A in time may be overwritten and rendered less visible, or even invisible, by the target features presented at point B. This is the main idea behind what has come to be called *object substitution* masking (e.g., Di Lollo et al., 2000; Lleras and Moore, 2003; Moore and Lleras, 2005; Enns, 2008).

## **THE ROLE OF SHAPE**

At what level of representation are the predictive and postdictive mechanisms at work when interpreting an object in motion? Extant theories of how motion relates to target visibility have been described as falling into three camps (Souto and Johnston, 2012). In one camp are researchers who give their participants a detection task (i.e., reporting whether a stimulus is present or absent along a motion path), thereby emphasizing image-level processes. For example, Hidaka et al. (2011) showed that motion path predictability led to a decrement in target detection, and they concluded that motion masking is the result of an early visual interaction between a physical stimulus (the target) and an illusory percept (the interpolated motion path between stimulus inducers). Souto and Johnston (2012) expanded on this idea, reporting that motion masking depended on the targets and inducers sharing the same isoluminant colors. In a second camp, researchers have demonstrated that object-level competition between inducers and target also plays a role in motion masking (Yantis and Nakama, 1998; Liu et al., 2004). These authors demonstrate that more than detection-level processes are involved by giving their participants shape-discrimination tasks. In a third camp, Schwiedrzik et al. (2007) and Roach et al. (2011) go a step further, by arguing that when masking is attenuated by motion path consistency, it demonstrates the role of predictive processes at play, over and above an object-level competition between stimuli.

Although Schwiedrzik et al. (2007) and Roach et al. (2011) show that predictable targets can attenuate masking (i.e., reduce the visibility impairment caused by motion), they do not examine the role of shape consistency between stimuli and inducers, focusing only on spatio-temporal consistency. To be fair, Schwiedrzik et al. (2007) discuss the possibility that the shape dissimilarity between the stimuli in motion and the target may have played a role in the impairments that they and Yantis and Nakama (1998) reported. This way of thinking also raises the possibility that the predictive benefits of Roach et al. (2011) may have occurred because of the greater similarity between inducing and target shapes in their study.

Here we focus on the role of *shape continuity* in the visibility of a target in an apparent motion sequence. Specifically, we compare the influences that arise from forward-acting (predictive) processes with those that derive from backward-acting (postdictive) processes (see also Hogendoorn et al., 2008). If we find that both processes are at work, we can then ask questions about their relative magnitude and whether they combine in an additive way (indicating independent processes) or interactively (pointing to synergistic processes).

It may also be important to distinguish between previous studies in which the target stimulus was unrelated to the motion inducing stimulus (e.g., Yantis and Nakama, 1998; Khuu et al., 2010), offering greater opportunity for masking, versus those in which the target stimulus was a component of the motion inducing stimulus (e.g., Hidaka et al., 2011). As such, we begin with a study in which the target to be perceived is itself part of the motion sequence.

To address these questions, we designed a target discrimination task in which the effects of a preceding shape and a following shape could be evaluated, first independently (Experiment 1), and then jointly (Experiment 2). We did this by varying the motion congruence between the central target shape and the contextual shapes (preceding, following). To anticipate the results, we report strong predictive and postdictive influences on target visibility, along with a great deal of synergy between these influences.

In a final experiment (Experiment 3) we replicated the essential stimulus conditions of Experiment 2, but asked participants to perform a shape-feature *detection* task (presence versus absence) rather than a shape-*discrimination* task. This serves as an important control for the idea that predictive and postdictive processes specific to *shape* perception are influencing target visibility, as opposed to more primitive alerting processes or image-level processes that boost the gain of all signals in the path. If the processes we are studying are shape specific, we anticipate that continuity in apparent motion will not have the same effect on a target detection task. And again, to anticipate the results, that is what we find.

## **EXPERIMENT 1: BASELINE VISIBILITY**

To set the stage for a study of target visibility in the context of a three-frame motion sequence, we first compared the visibility of a target shape in isolation, with the visibility of a target either preceded or followed by a single shape. The spatial layout and temporal sequence is illustrated in **Figure 1**. We also varied the orientation of the preceding and following shapes, so that they were congruent or incongruent with the target. Three additional factors were varied to increase the generality of the findings and to minimize the possibility of strategic factors influencing the results. First, to ensure that target visibility would be measured at more than one level, we varied whether or not a pattern mask was presented immediately after the target and in the same spatial position (Breitmeyer and Ogmen, 2006). Second, we varied the spatial proximity between neighboring shapes at two levels, as this is often a critical factor in target visibility (Breitmeyer and Ogmen, 2006). Finally, the shapes were presented randomly to the right or left of fixation, and motion sequences were also either to the left or the right, so that observers were unable to predict where the shapes would appear and in what context (Enns and Di Lollo, 1997, 2000).

Participants were asked to report the location of a notch in each target shape, which could be either on the right or left side. Note that this task is immune from any decision-based biases arising from the orientation of the preceding or following shapes, or from the relation between these shapes and the target (congruent versus incongruent), since the only shape with a notch was the target, and the notch was equally often on the right or the left of this shape, independent of all other factors.

## **METHOD**

#### **Participants**

Fifteen university students participated in a 1-h session for extra course credit or a \$10 payment. All participants had normal or corrected-to-normal vision and were treated according to APA ethical guidelines as administered by the University of British Columbia.

#### **Stimuli and apparatus**

Rectangular gray shapes (gray level = 62%) were presented on an LCD monitor with a refresh rate of 60 Hz. The shapes subtended 2.5˚ × 1˚ of visual angle, were slanted either 45˚ or 135˚ from vertical (i.e., they had a positive or negative slant, see **Figure 1A**), and were presented on a white background. The pattern masks consisted of six rectangular shapes, as illustrated in **Figure 1A**, each oriented to differ slightly from the cardinal directions of vertical, horizontal, and oblique. This pattern subtended 2.5˚ × 2.5˚ of visual angle. The target shape had a semicircular notch on one side. A fixation cross was centered horizontally on the screen, but positioned 5.5˚ below the vertical center, so that the shapes were presented above fixation.

The contextual shape that preceded or followed the target shape on some trials was identical to the target in size and luminance, but it did not have a notch, and it was spatially separated by a center-to-center distance of either 2.5˚ (near proximity condition) or 6.5˚ (far). The target was always presented 10.5˚ from the fixation point, but randomly to the left or right, with a positive or negative slant and with a notch randomly removed from its left or right side. The orientation of the preceding and following shapes was either congruent or incongruent with a linear motion trajectory.

The temporal sequence of events is illustrated in **Figure 1B**, with the target shape and preceding or following shape (when either was present) appearing 100 ms apart (stimulus onset asynchrony). The target had a duration of 33 ms, as did the mask, when present, and the target and mask were separated by an interval of 33 ms.
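Because the display refreshed at 60 Hz, the stated durations presumably correspond to whole numbers of refresh frames. The following is a minimal sketch of that conversion, not the authors' presentation code; the function names and the dictionary keys are hypothetical, and rounding to the nearest frame is an assumption.

```python
# Sketch: express the stated stimulus timings as whole refresh frames.
# Only the 60 Hz refresh rate and the millisecond values come from the text;
# the names below are hypothetical.
REFRESH_HZ = 60


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Round a duration in milliseconds to the nearest whole number of frames."""
    return round(duration_ms * refresh_hz / 1000)


def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Return the exact duration in milliseconds of a whole number of frames."""
    return n_frames * 1000 / refresh_hz


timings_ms = {
    "target": 33,                # target duration
    "target_mask_interval": 33,  # interval between target and mask
    "mask": 33,                  # mask duration
    "soa": 100,                  # onset asynchrony between successive shapes
}

frames = {name: ms_to_frames(ms) for name, ms in timings_ms.items()}

for name, n in frames.items():
    print(f"{name}: {n} frames = {frames_to_ms(n):.1f} ms")
```

Under these assumptions each 33 ms event spans two frames and the 100 ms onset asynchrony spans six frames.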

## **Procedure**

Participants were seated with their eyes 57 cm from the display screen. They were instructed to maintain gaze on the cross in the bottom of the screen, using their peripheral vision to view the shapes. They were introduced to the task with 10 practice trials with much longer display durations and received feedback on each trial (the words "correct" or "incorrect" appeared at fixation), and the experimenter monitored this feedback during the practice trials and provided further verbal instruction when necessary to ensure they understood the task.
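At a viewing distance of 57 cm, 1˚ of visual angle spans roughly 1 cm on the screen, which makes the stimulus sizes easy to convert. The sketch below uses the standard relation size = 2 · distance · tan(angle / 2); the function name is illustrative and the conversion is not described in the original report.

```python
import math


def visual_angle_to_cm(angle_deg: float, distance_cm: float = 57.0) -> float:
    """On-screen extent (cm) of a stimulus subtending angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    # Angles reported in the Stimuli and apparatus section.
    for label, deg in [("shape length", 2.5), ("shape width", 1.0),
                       ("near separation", 2.5), ("far separation", 6.5)]:
        print(f"{label}: {deg} deg = {visual_angle_to_cm(deg):.2f} cm")
```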

Each trial began with a variable onset interval (1400–2200 ms, in 200 ms steps) that began after the participant's previous response. Participants registered their responses with one of two keys ("w" or "o") and visual feedback consisting of a green or red colored text message at fixation indicated whether their response was "correct" or "incorrect," respectively. Trials were presented in a random order, with equal representation of the three conditions (alone, preceding, and following) × 2 notch locations × 2 target orientations × 2 mask conditions. Among the preceding and following conditions, trials were further divided among congruent and incongruent shape relations and close and far proximity conditions. Participants completed a total of 768 trials, divided into eight blocks of 96 trials, with self-paced breaks between blocks.
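The trial counts can be checked with a short sketch of the design. It assumes that the 768 trials were spread evenly over the 3 × 2 × 2 × 2 cells (32 trials per cell), that congruence and proximity were split evenly within the preceding and following cells, and that target side was drawn at random; all names and the seeding are illustrative rather than the authors' actual procedure.

```python
import itertools
import random

CONTEXTS = ("alone", "preceding", "following")
NOTCHES = ("left", "right")
SLANTS = ("positive", "negative")
MASKS = (False, True)
CONGRUENCE = ("congruent", "incongruent")
PROXIMITY = ("near", "far")
TOTAL_TRIALS = 768
N_BLOCKS = 8


def build_blocks(seed: int = 0) -> list:
    """Return N_BLOCKS lists of trial dicts in random order (assumed even split)."""
    rng = random.Random(seed)
    base_cells = list(itertools.product(CONTEXTS, NOTCHES, SLANTS, MASKS))
    reps = TOTAL_TRIALS // len(base_cells)  # 768 / 24 = 32 trials per cell
    sub_cells = list(itertools.product(CONGRUENCE, PROXIMITY))
    trials = []
    for context, notch, slant, mask in base_cells:
        for i in range(reps):
            # Congruence and proximity are only defined when a contextual shape is shown.
            congruence, proximity = (None, None) if context == "alone" else sub_cells[i % len(sub_cells)]
            trials.append({
                "context": context, "notch": notch, "slant": slant, "mask": mask,
                "congruence": congruence, "proximity": proximity,
                "side": rng.choice(("left", "right")),  # target left or right of fixation
            })
    rng.shuffle(trials)
    block_size = TOTAL_TRIALS // N_BLOCKS  # 96 trials per block
    return [trials[i:i + block_size] for i in range(0, TOTAL_TRIALS, block_size)]


if __name__ == "__main__":
    blocks = build_blocks()
    print(len(blocks), "blocks of", len(blocks[0]), "trials")
```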

## **Data analyses**

In order to convert responses into hit and false-alarm rates that are amenable to a signal detection analysis, the proportion of left responses to left-notched targets was counted as hits and the proportion of right responses to left-notched targets was counted as false alarms, for each participant. These rates were then used to calculate *d*′, a measure of sensitivity unaffected by response bias. Because proportions of 0 or 1 cause *d*′ to take on an infinite value, hit or false-alarm rates with these values were replaced with values of 0.01 and 0.99, respectively (MacMillan and Creelman, 1991), which placed a ceiling on *d*′ of 4.46.
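The sensitivity calculation can be sketched as follows, assuming the standard equal-variance Gaussian model in which *d*′ is the difference between the z-transformed hit and false-alarm rates. The function names are illustrative; only the replacement of rates of 0 and 1 by 0.01 and 0.99 comes from the text.

```python
from statistics import NormalDist


def adjust_rate(p: float) -> float:
    """Replace rates of exactly 0 or 1 with 0.01 or 0.99, as described in the text."""
    if p <= 0.0:
        return 0.01
    if p >= 1.0:
        return 0.99
    return p


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity under the assumed equal-variance model: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(adjust_rate(hit_rate)) - z(adjust_rate(false_alarm_rate))


if __name__ == "__main__":
    print(round(d_prime(0.9, 0.1), 2))  # a moderately sensitive observer
    print(round(d_prime(0.5, 0.5), 2))  # chance performance gives zero sensitivity
```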

#### **RESULTS**

**Figure 2** shows target visibility in Experiment 1. Masking was clearly effective in reducing overall visibility, as the mean *d*′ was 3 with no mask and less than 1 with the mask. Shape congruency also played a large role in target visibility: congruent shape sequences resulted in larger *d*′ values than incongruent sequences at both levels of masking. The temporal order of the contextual shape also played a large role, with a preceding shape having less of an influence on target visibility than a following shape. Most important, the influence of shape congruence on visibility was greater for following shapes than preceding shapes, with an incongruent-following shape reducing visibility in the no-mask condition (*d*′ = 1.01) to near the baseline level in the masking condition (*d*′ = 0.79), and in the mask condition reducing visibility to a *d*′ of near zero (*d*′ = 0.21). Contextual shapes that were near in proximity to the target generally led to lower levels of visibility (*d*′ = 1.64) than contextual shapes that were farther away (*d*′ = 1.85). These observations were supported by the following statistical analyses.

A repeated measures ANOVA examined the factors of temporal order (2) × congruency (2) × masking (2) × proximity (2). All main effects were significant: temporal order [*F*(1,14) = 19.17, *p* = 0.00063], congruence [*F*(1,14) = 105.26], mask [*F*(1,14) = 369.07], and proximity [*F*(1,14) = 6.52, *p* = 0.023], as were the two-way interactions of temporal order × congruence [*F*(1,14) = 65.40], temporal order × proximity [*F*(1,14) = 9.50, *p* = 0.0081], temporal order × mask [*F*(1,14) = 17.28, *p* = 0.00097], mask × congruence [*F*(1,14) = 59.57], and mask × proximity [*F*(1,14) = 5.03, *p* = 0.042]. The only significant three-way interactions were temporal order × congruence × mask [*F*(1,14) = 18.09, *p* = 0.00080] and congruence × mask × proximity [*F*(1,14) = 5.37, *p* = 0.036]. All other effects were not significant (*p*s > 0.094).

Simple effect tests on the critical temporal order × congruence interaction indicated that, although the congruency effect was much greater in the following than preceding condition, congruent shapes were nonetheless more visible than incongruent shapes in both conditions: [*F*(1,14) = 234.70] and [*F*(1,14) = 15.08, *p* = 0.0017], respectively.

Additional comparisons tested whether target visibility in the preceding and following shape conditions was improved or impaired relative to the target presented alone. The asterisks in **Figure 2** indicate which of these comparisons were significant, based on a Bonferroni-adjusted family wise alpha of *p* < 0.05. With no mask, only the two incongruent conditions resulted in significant reductions in visibility, preceding [*F*(1,14) = 16.53, *p* = 0.0012] and following [*F*(1,14) = 190.86, *p* < 0.0001]. When the mask was present only the following incongruent condition showed a significant visibility reduction [*F*(1,14) = 21.15, *p* = 0.0004].

**FIGURE 2 | Visibility of the target in Experiment 1, as indexed by d′.** Error bars represent ±1 SEM. The asterisks indicate those conditions in which target visibility was significantly reduced relative to the target alone condition.

### **DISCUSSION**

These results establish an important baseline for us to explore how prediction and postdiction combine in their influence when a target is seen in the context of a larger motion sequence. In summary, the results show that shape congruence in a motion sequence plays a critical role in influencing the visibility of a target shape, such that when the shapes are congruent, visibility is similar to when the same target is presented briefly in isolation. However, when the shapes are incongruent there is a serious reduction in visibility, with this reduction being much greater for an incongruent shape that follows the target (postdiction based on the incongruent shape impairs visibility) than for an incongruent shape that precedes it (prediction based on an incongruent shape has little consequence).

These results are broadly consistent with previous reports of motion masking (Yantis and Nakama, 1998; Schwiedrzik et al., 2007; Hogendoorn et al., 2008), in that placing a target in a motion sequence can be detrimental to its visibility under some conditions (e.g., when following shapes are incongruent). These results are also consistent with previous reports that backward masking of shape is generally more detrimental to visibility than forward masking (Breitmeyer and Ogmen, 2006). Finally, they are consistent with object updating theory (Enns et al., 2009), which proposes that human vision is biased to process a spatio-temporal sequence of stimuli as the same object translating in space-time. To the extent that this bias is supported by a spatio-temporally consistent motion display (here the congruent condition), the visibility of a target shape in an apparent motion sequence is not impaired.

## **EXPERIMENT 2: VISIBILITY IN AN APPARENT MOTION SEQUENCE**

In this experiment we measured the visibility of a target shape in a three-frame apparent motion sequence, while varying whether the preceding and following shapes were congruent or incongruent with the overall motion trajectory. By comparing these data with those in Experiment 1, we were able to gauge the extent to which congruency in the two contextual shapes made additive or synergistic contributions to target visibility.

## **METHOD**

The methods were identical in this experiment to the previous one, with the exception that the participants were 15 different university students and all of the displays now had both preceding and following contextual shapes in addition to the target. These shapes could be independently congruent or incongruent with overall motion trajectory, as illustrated in **Figure 3**. The target was always congruent with the overall motion trajectory. Participants again completed a total of 768 trials, divided into eight blocks of 96 trials, with self-paced breaks between blocks.

## **RESULTS**

**Figure 4** shows the target visibility in Experiment 2. As in the previous experiment, backward masking was effective in reducing overall visibility of the target. Shape congruence also provided a significant benefit to target visibility. One important new finding was observed in the backward masking condition (right panel of **Figure 4**). Here the target shapes in the three-frame motion sequence were now even *more visible* than when the same target shape was presented in isolation.

A second important finding was that the effects of preceding and following shapes were synergistic. Specifically, congruent contextual shapes preceding or following target shapes were both beneficial to target visibility, but the consequences of sandwiching the target shape between two incongruent shapes was catastrophic to its visibility. Even without a backward pattern mask to reduce visibility (left panel in **Figure 4**), two incongruent context shapes reduced target visibility to levels similar to that of a solitary target followed by a pattern mask. In the masking condition (right panel), two incongruent context shapes again reduced visibility to that same low level.

**FIGURE 4 |** [...] or increased relative to the target alone condition.

A third finding was that the detrimental effects of backward pattern masking on target visibility were largely overcome by placing the target into a three-frame sequence of apparent motion. In contrast to the baseline influence of backward masking, which was about 3 *d*′ units when a target was presented in isolation (compare target alone visibility for no masking versus masking in **Figure 4**), backward masking was less than a 1 *d*′ unit effect when either the preceding or following shape was congruent in a motion sequence (compare target visibility for congruent shapes in the no masking versus masking conditions in **Figure 4**).

Finally, as in Experiment 1, contextual shapes that were near in proximity to the target generally led to lower levels of visibility (*d*′ = 1.91) than contextual shapes that were farther away (*d*′ = 2.06). These observations were supported by the following statistical analyses.

A four-way repeated measures ANOVA was conducted with the following factors: 2 preceding shape congruence × 2 following shape congruence × 2 mask × 2 proximity conditions. Target visibility was higher when the preceding shape was congruent than when it was incongruent [*F*(1,14) = 32.28, *p* = 0.000057], and it was higher when the following shape was congruent than when it was incongruent [*F*(1,14) = 40.19, *p* = 0.00018]. Backward masking reduced target visibility [*F*(1,14) = 92.73], and close proximity was marginally significant in reducing target visibility [*F*(1,14) = 3.96, *p* = 0.066]. The two-way interaction of preceding shape congruence × following shape congruence was significant [*F*(1,14) = 27.97, *p* = 0.00011], as was the four-way interaction of all factors combined [*F*(1,14) = 5.87, *p* = 0.030]. Bonferroni tests (family wise alpha = 0.05) of the interaction indicated that target visibility in all four congruency conditions was lower than the single target baseline when there was no mask. However, when there was a backward pattern mask, target visibility in three of the four congruency conditions was now significantly *greater* than the single target baseline. Only when the target was placed between two incongruent shapes was target visibility not improved over that of a single target.

A comparison of the effects of backward pattern masking on the single target condition (Experiment 1) with the motion sequence conditions (Experiment 2) indicated that backward masking was more detrimental to single target visibility than it was to each of the four motion conditions formed by combining preceding congruence with following congruence, in the order shown in **Figure 4** [*t*(28) = 10.96, *t*(28) = 11.34, *t*(28) = 9.40, and *t*(28) = 11.91].

#### **DISCUSSION**

These results indicate that an apparent motion sequence has both detrimental and beneficial effects on the visibility of a target shape embedded in the sequence. In comparison to a target shape presented briefly in isolation, placing the target in the center of a three-frame motion sequence reduces its visibility somewhat (less than 1 *d*′ unit). However, this reduction is greater when the following contextual shape is incongruent with the motion trajectory implied by all three shapes, and it is even greater when both contextual shapes are incongruent with this trajectory. This latter finding is consistent with Yantis and Nakama's (1998) previous reports of motion masking, in which they found significant reductions in letter visibility within the motion path of two circle stimuli, which were highly dissimilar in shape since the circles contained only curved edges whereas the letters consisted solely of straight lines.

The truly novel result of this study is the benefit that occurs for target visibility in the context of backward pattern masking. Here the results show that in comparison to a target shape presented briefly in isolation and then masked, placing the target in the center of a three-frame motion sequence increases its visibility quite significantly (more than 1 *d*′ unit). This finding runs counter to some previous reports of motion masking (Yantis and Nakama, 1998; Schwiedrzik et al., 2007; Hogendoorn et al., 2008). However, this finding is consistent with theories based on the constructs of prediction and postdiction in motion processing, including the RECOD model (Breitmeyer and Ogmen, 2000) and object updating theory (Enns et al., 2009). Consistent with these theories, when a target shape is embedded within a motion path that allows for prediction and postdiction based on shape, a target shape can become more visible than it would otherwise be.

What are we to make of the finding that motion contributed to an enhancement of target visibility in the masking condition, but not in the no-masking condition? One possibility is that this reflected a ceiling effect. If so, then participants were already discriminating shapes at a near optimal level in the no-mask solitary target conditions, with no room for improvement. As such, the enhancement in visibility deriving from a shape-consistent motion trajectory was not measurable until overall visibility had been degraded with a backward pattern mask.

Another possibility is that the visibility benefit (relative to a single target) only occurs under backward masking conditions because the shape-based predictions allow for the recovery of features in the target that have become suppressed by the backward mask. On this account, reentrant processes of object substitution make it difficult to access the original target features that have been substituted by the mask features (Di Lollo et al., 2000). The benefit of the congruent motion sequence is that this substitution process no longer occurs within the context of predictive motion. Indeed, one of the ways these mechanisms could play an active role in such a visibility benefit is through what Otto et al. (2006) refer to as "grouping-based feature inheritance." That is, because the target is perceived to be the same object as the inducers, merely at a different spatial-temporal location, the target feature (i.e., the notch) that would otherwise be backward-masked may actually be seen by participants as belonging to the following shape, which is not masked. Such feature migrations or inheritance effects have been documented in many previous studies of masking (Wilson and Johnson, 1985; Enns, 2002; Otto et al., 2006).

## **EXPERIMENT 3: SHAPE CONGRUENCY DOES NOT INFLUENCE TARGET DETECTION**

This experiment tested whether the influences of apparent motion on target shape visibility were specific to shape perception, or whether they applied to the mere detection of a stimulus. One reason for posing this question is the mixed previous results in the motion masking literature. For example, although Kolers (1963) failed to find evidence of motion masking using a detection task, others reported motion masking effects using detection, identification, and discrimination tasks (Yantis and Nakama, 1998; Schwiedrzik et al., 2007; Hogendoorn et al., 2008; Hidaka et al., 2011). Moreover, Gellatly and colleagues (Gellatly et al., 2006; Pilling and Gellatly, 2009) and Hogendoorn et al. (2008) have both reported significant interactions of task and masking, with masking being much more effective on shape discrimination than on shape detection. These findings strongly hint that it is not so much the detection of a shape's presence that is influenced by the motion trajectory as the determination of the target's detailed shape characteristics.

## **METHOD**

The methods were identical to Experiment 2, with the exception that the participants were 15 different university students, the target shapes now had a notch on a random one-half of the trials, and only one proximity condition was tested (the far condition). The participant's task was to report whether the target shape had a notch (target present) or not (target absent). Participants again completed a total of 768 trials, divided into six blocks of 128 trials. The data were analyzed by counting correct reports of a notch as hits and counting reports of a notch on target absent trials as false alarms. Values of *d*′ were then calculated as in the previous experiments.

#### **RESULTS**

**Figure 5** shows the target visibility in Experiment 3. As in previous experiments, the backward pattern mask was effective in reducing the overall visibility of the target. Yet, unlike the discrimination task (Experiments 1 and 2), the congruency of the preceding and following shapes had no measurable influence on the detection task. Another noticeable difference between experiments was the reduction in *d*′ in the no-mask condition. A comparison of **Figures 4** and **5** shows that target visibility as measured by the detection task is considerably reduced overall from that of the shape-discrimination task. These observations were supported by the following statistical analyses.

A three-way repeated measures ANOVA was conducted with the following factors: 2 preceding shape congruence × 2 following shape congruence × 2 mask conditions. The backward pattern mask reduced target visibility [*F*(1,14) = 149.49]. The only other significant effect was the interaction of mask × preceding shape congruence [*F*(1,14) = 6.47, *p* = 0.023; all other *p*s > 0.085]. Simple main effect follow-ups revealed that there was an effect of preceding shape congruence when the mask was present [*F*(1,14) = 5.79, *p* = 0.030] but not when the mask was absent [*F*(1,14) = 1.41, *p* = 0.25]. This suggests that a congruent preceding shape can help in detecting a target that is followed by a backward mask, but the congruence of the preceding shape makes no difference in detecting an unmasked target.

A comparison of these results with Experiment 2 was conducted with a mixed ANOVA involving the between-groups factor of two tasks (discrimination, detection) and the within-subjects factors of two preceding shape congruence and two following shape congruence. Target visibility differed marginally according to task [*F*(1,28) = 4.19, *p* = 0.050], with the detection task showing lower target sensitivity than the discrimination task. Also, target visibility differed significantly according to preceding and following shape congruence [*F*(1,28) = 30.03, and *F*(1,28) = 40.00]; however, this effect was moderated by two-way interactions of task × preceding congruence [*F*(1,28) = 24.02], task × following congruence [*F*(1,28) = 29.02], and preceding × following congruence [*F*(1,28) = 11.06, *p* = 0.0025]. Finally, the three-way interaction of task × preceding × following congruence was significant [*F*(1,28) = 20.00, *p* = 0.00012]. This three-way interaction follows from the finding that the two-way preceding × following shape congruence interaction was significant for the discrimination task in Experiment 2, but not significant for the detection task of Experiment 3.

We also conducted analyses examining the effect of masking across the different tasks. For this, we used a 2 × 2 ANOVA with mask as a within-groups factor and task (discrimination, detection) as a between-groups factor. This ANOVA showed a significant effect of mask [*F*(1,28) = 197.42], and a significant interaction of task × mask. Follow-up simple main effect analyses revealed that the masking effect was significant for both the discrimination and detection tasks, but larger in the latter than the former [*d*′ difference = 0.60, *F*(1,28) = 39.93; *d*′ difference = 1.28, *F*(1,28) = 181.77].

## **DISCUSSION**

These results indicate that the shape congruence effect on motion masking in Experiments 1 and 2 is specific to the task of discriminating target shapes. It does not apply to merely detecting the presence or absence of a target feature in the motion sequence. While this is generally consistent with the report from Gellatly et al. (2006) that detection tasks are influenced less by backward masking than discrimination tasks, it also extends this finding to the consequences of contextual shapes in a motion sequence. That is, Experiments 2 and 3, taken together, show that contextual shape congruency has a strong influence on target visibility when the task is to discriminate between two possible shapes, but that it has no influence when the task is merely to detect the presence of the shape's distinctive feature.

### **GENERAL DISCUSSION**

In this study we examined how the perception of a target's shape is influenced by its relation to the shapes that precede and follow it in an apparent motion sequence. In a first experiment, we established the baseline visibility of a target shape, both when it was presented in isolation and when it was preceded or followed by a single shape. The results showed a reduction in visibility when either the preceding or following shapes were incongruent, though this visibility impairment was greater when the incongruent shape was following rather than preceding. This finding is consistent with many previous reports that it is more effective to mask a target shape with a neighboring shape that follows rather than precedes the target (Enns and Di Lollo, 2000; Breitmeyer and Ogmen, 2006).

In a second experiment we studied the combined effects of preceding and following shapes. The novel result here was a considerable *benefit* for target visibility from a congruent three-frame motion sequence. The results indicated that in comparison to an isolated target shape, presented briefly, and backward masked, a target in the center of a three-frame motion sequence was increased in its visibility by more than 1 *d*′ unit. This finding runs counter to some previous reports of motion masking (Yantis and Nakama, 1998; Schwiedrzik et al., 2007; Hogendoorn et al., 2008; Khuu et al., 2010; Hidaka et al., 2011), but is consistent with theories that appeal to the constructs of prediction and postdiction (Breitmeyer and Ogmen, 2006; Enns et al., 2009). Moreover, the present finding offers a resolution to the mixed results of previous research, which did not systematically study the role of *shape congruence* in motion masking phenomena. In contrast to those mixed results, the present findings suggest that *motion masking* (a visibility impairment) is most likely to occur when target and contextual shapes are different, and motion enhancement (a visibility benefit) is most likely to occur when target and contextual shapes are the same. This is because the contextual shapes influence target visibility through expectations (both predictive and postdictive) that are based on the available evidence about *shape* (Breitmeyer and Ogmen, 2000; Enns et al., 2009).

In the third experiment, when the participant's task was to merely detect the presence or absence of the target feature, without having to indicate its precise location, all shape congruency effects disappeared. This finding helps to confirm that the visibility effects measured in Experiments 1 and 2 were specific to binding shape features to precise locations in space, and did not reflect more general mechanisms of arousal or alerting (Bachmann, 1984) or of low-level perceptual filling-in (Hidaka et al., 2011; Souto and Johnston, 2012). Taken together, these results show that contextual shape congruency has a strong influence on target visibility when the task is to discriminate between two possible shapes, but it has no influence when the task is merely to detect a target feature. This confirms that the prediction and postdiction processes evoked by the contextual shapes in these motion sequences were concerned with *shape*.

The results of this study also provide (1) a comparison of the relative magnitude of predictive and postdictive effects on shape perception and (2) an analysis of whether these effects were additive or interactive. With regard to the first question, the results from both Experiments 1 and 2 indicate that postdiction has a stronger influence than prediction. This is seen in the greater impairments associated with an incongruent-following shape than an incongruent preceding shape, both when there was only one of these shapes (**Figure 2**) and when both contextual shapes were considered in combination (**Figure 4**). From the perspective of object updating theory (Enns et al., 2009), this asymmetry is a consequence of the way vision handles the task of keeping track of an object in motion. That is, the default interpretation that a sudden scene change is indicative of an object in motion biases the system to look for confirmatory evidence that the same shape features are now present in a new location. At the same time, unless attention has previously been focused on the specific features of the object, rather than simply its rough location, it will take some time to establish the appropriate links between the various features of the object and their locations in space. If during that time, the features have changed, the system may only have access to the target features currently on view. This leads to *object substitution* masking, which in the present study is expressed as target visibility that is especially reduced when the following shape is not a match for its shape features. As such, this is a consequence of our time-limited nervous system, which is destined, by virtue of its slow processing speed, to be living "slightly in the past" (Eagleman and Sejnowski, 2000).

With regard to the second question, the data in Experiment 2 clearly point to an interactive (synergistic) pattern of influence for prediction and postdiction. That is, the combined *impairment* of having both preceding and following shapes be incongruent with the target was greater than could be predicted when only one of these shapes was incongruent on its own.

Importantly, this interaction was not a by-product of ceiling or floor effects on the accuracy measure, since the interaction occurred at two quite different levels of baseline visibility (compare the no mask and mask conditions in **Figure 4**). Such synergy is indicative of a single dynamic system, rather than of separate or dissociable mechanisms that combine their influences in a linear fashion.
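The additive-versus-synergistic distinction can be made concrete with a simple interaction contrast on the four cell means. The sketch below is illustrative only: the function name is an assumption, and the *d*′ values are hypothetical numbers chosen for the example, not data from this study.

```python
def interaction_contrast(cc: float, ci: float, ic: float, ii: float) -> float:
    """Excess impairment beyond an additive prediction, in d-prime units.

    Arguments are mean d-prime values for the four preceding/following cells:
    cc = both congruent, ci = incongruent following only,
    ic = incongruent preceding only, ii = both incongruent.
    A value near zero indicates additivity; a positive value indicates synergy.
    """
    cost_following = cc - ci   # impairment from an incongruent following shape alone
    cost_preceding = cc - ic   # impairment from an incongruent preceding shape alone
    cost_both = cc - ii        # impairment when both contextual shapes are incongruent
    return cost_both - (cost_preceding + cost_following)


if __name__ == "__main__":
    # HYPOTHETICAL cell means, for illustration only.
    synergistic = interaction_contrast(cc=2.5, ci=2.0, ic=2.3, ii=1.0)
    additive = interaction_contrast(cc=2.5, ci=2.0, ic=2.3, ii=1.8)
    print(f"synergistic pattern: {synergistic:.2f}")
    print(f"additive pattern: {additive:.2f}")
```

A positive contrast at both the no-mask and mask baselines would correspond to the pattern described above, in which the joint impairment exceeds the sum of the separate impairments.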

Synergistic predictive and postdictive behavioral effects are also consistent with the neural feedback or recurrent neural activity that inspired theories of object updating (Breitmeyer and Ogmen, 2000; Moore et al., 2007; Enns et al., 2009). These theories are premised on conscious visual perception being the end product of a system containing neural projections that not only ascend the anatomical hierarchy, that is, from regions of lower to higher levels of representational complexity, but are also horizontal (between regions with different specializations) and backward, or reentrant, to lower-level regions (Bullier et al., 1988; Felleman and Van Essen, 1991; Zeki, 1993). The conscious perception of a stimulus in these accounts is the result of the system reaching a stable state of resonance between the feedforward and reentrant signals. Recent evidence in support of this view comes from electrophysiological data from monkey (Fahrenfort et al., 2007) and from transcranial magnetic stimulation in humans (Ro et al., 2003; Hirose et al., 2005, 2007). For instance, Hirose et al. (2005, 2007) applied brief high-intensity magnetic pulses to the brain region MT/MT+ in human participants and reported that it disrupted masking and led to increased visibility of a target that would otherwise have been invisible. Notably for the present study, reentrant neural activity projecting from the MT cortex is also involved in motion perception, and thus may be the neural mechanism by which perception of a target in motion is influenced by signals generated by the contextual shapes surrounding a target shape (Liu et al., 2004; Muckli et al., 2005; Sterzer et al., 2006).

## **REFERENCES**


human visual cortex. *J. Cogn. Neurosci.* 19, 1488–1497.


upon a slow averaging process. *Vision Res.* 40, 201–215.


awareness in human occipital cortex. *Curr. Biol.* 13, 1038–1041.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 October 2012; accepted: 15 January 2013; published online: 01 February 2013.*

*Citation: Lenkic PJ and Enns JT (2013) Apparent motion can impair and enhance target visibility: the role of shape in predicting and postdicting object continuity. Front. Psychology 4:35. doi: 10.3389/fpsyg.2013.00035*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Lenkic and Enns. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Illusory motion and mislocalization of temporally offset target in apparent motion display

## **Souta Hidaka<sup>1,2</sup>\* and Masayoshi Nagai<sup>3</sup>**

<sup>1</sup> Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Miyagi, Japan

<sup>2</sup> Department of Psychology, Rikkyo University, Saitama, Japan

<sup>3</sup> National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan

#### **Edited by:**

Yuki Yamada, Yamaguchi University, Japan

#### **Reviewed by:**

Patrick Cavanagh, Université Paris Descartes, France Taiki Fukiage, The University of Tokyo, Japan

#### **\*Correspondence:**

Souta Hidaka, Department of Psychology, Rikkyo University, 1-2-26, Kitano, Niiza-shi, Saitama, Japan. e-mail: hidaka@rikkyo.ac.jp

When a visual target briefly appears in a display containing visual motion information, the perceived position of the target is mislocalized forward along its direction of motion. This phenomenon is assumed to be caused by the interaction between the transient onset signal of the target and motion information. However, while transient onset and offset signals are important for the establishment of our perceptual awareness, it has not been examined whether transient offset signals could also be effective for target mislocalization. Here, we demonstrate that shifts in perceived position occurred for a visual target containing a temporally transient offset signal in an apparent motion (AM) display. First, with horizontal AM, we found that illusory motion was perceived when a static target transiently and repeatedly blinked at a fixed position. The perceived direction of the illusory motion was in counter-phase with that of the AM stimuli. Further, we confirmed that illusory motion was frequently perceived when (1) the eccentricity of the target was larger, (2) offset duration was longer, and (3) smoother AM was perceived. Illusory motion was not perceived unless AM stimuli were presented after the offset signal, while illusory motion still occurred when the AM stimuli disappeared before the offset signal. In addition, we found that mislocalization of the target's perceived position actually occurred in a direction opposite to AM. These findings suggest that a transient offset signal could trigger perceptual mislocalization of static visual stimuli by interacting with motion information in a postdictive manner.

**Keywords: temporal offset, illusory motion perception, perceived mislocalization, apparent motion, postdiction**

## **INTRODUCTION**

When we focus on an object in a scene, we do not receive information solely about that object. Rather, our perceptual systems are strongly affected by the surrounding information and context around the objects we see. Not only form information (shape, texture, etc.) but also motion information influences the establishment of our perception/awareness. For example, motion information induces mislocalization of the perceived positions of objects: when a visual target briefly appears in a display containing visual motion information, the target's perceived position is mislocalized in the forward direction of motion in both continuous motion (Whitney and Cavanagh, 2000) and apparent motion (AM; Shim and Cavanagh, 2004) displays (Flash-drag effect, FDE). In cases when observers judge the relative positions of a visual target and moving stimuli, the target is perceived as being behind the moving stimuli when they are actually aligned (Flash-lag effect, FLE; MacKay, 1958; Nijhawan, 1994; Eagleman and Sejnowski, 2000). While some explanatory hypotheses have been suggested regarding these phenomena (see Whitney, 2002 for review), a recent model indicates that motion information consisting of both spatial and temporal information plays a key role (Eagleman and Sejnowski, 2007).

In addition to motion information, the temporally transient onset signal of the target could also be important for triggering mislocalization. In fact, many studies have demonstrated that transient signals contribute substantially to our perceptual awareness (e.g., Kanai and Kamitani, 2003; Kawabe et al., 2007). It is also notable that the position of a visual stimulus is relatively uncertain when the stimulus is presented briefly (approximately 20 ms in many cases). Based on these characteristics, the mislocalization induced by motion information has mainly been demonstrated for targets containing a transient onset signal. However, both transient onset and offset signals are assumed to be involved in the establishment of our perceptual awareness. For example, Macknik and Livingstone (1998) investigated the relationship between forward/backward masking and neural responses. They found that in a forward masking situation in which the onset of a mask temporally preceded that of the target, the mask suppressed neural responses related to the target onset signal. In contrast, masks presented after the target suppressed neural responses to the target offset signal in the backward masking situation. Perceptual awareness of the target stimuli was inhibited equally in both situations. On the basis of those findings, we could hypothesize that transient offset signals would also interact with motion information and induce mislocalization of the perceived target position.

The aim of this study was to examine whether a target containing a temporally transient offset signal could be perceptually mislocalized by motion information. In this study, AM was introduced as motion information, because the quality and direction of a motion signal are easy to manipulate by simply modifying the spatiotemporal properties of the inducers of AM (Wertheimer, 1912; Korte, 1915; Kolers, 1972). In previous literature, targets with transient onset signals were presented as brief onset-offset signals. In contrast, the present study presented transient offset signals as brief offset-onset signals (**Figure 1A**). In addition to physical differences in the presentation order of the transient signals (onset-first or offset-first), the phenomenological aspects of these signals should also differ. Transient onset signals could be a cue for the sudden appearance of an object within a scene. Thus, previous studies that have adopted transient onset signals have mainly investigated the effect of motion information on the initial positional encoding process of an object. In contrast, a transient offset signal would indicate the abrupt disappearance and reappearance of an object. Therefore, by focusing on the transient offset signal, the current study could shed light on whether and how motion information could affect the re-encoding process of an object's positional information. In this case, an object's positional information might be compared before and after the transient offset of the object. Additionally, the target stimuli seemed to contain relatively certain positional information, because the duration of target presentation was longer than that of targets with temporally transient onset signals.

In a phenomenal observation, we found that shifts in perceived position occurred strongly for visual targets with temporally transient offset signals: illusory motion was perceived for a static target blinking at a fixed position with horizontal AM (Movie S1 in Supplementary Material). The perceived direction of illusory motion was in counter-phase with that of the AM stimuli. Our experiments further confirmed that illusory motion perception frequently occurred when (1) the eccentricity of the target was larger (Experiment 1), (2) offset duration was longer (Experiment 2), and (3) smoother AM was perceived (Experiment 3). Further, illusory motion was not perceived unless AM stimuli were presented after the offset signal, whereas it was perceived when the AM stimuli disappeared before the offset signal (Experiment 4). We further found that mislocalization of the perceived position of the target actually occurred in a direction opposite to AM (Experiments 5 and 6). These findings suggest that a transient offset signal could trigger the perceptual mislocalization of static visual stimuli by interacting with motion information in a postdictive manner.

## **EXPERIMENT 1**

In Experiment 1, we investigated the spatial aspect of illusory motion perception for temporally offset target stimuli presented in conjunction with AM stimuli (Movie S1 in Supplementary Material). We manipulated the vertical distances between a fixation point and the target and AM stimuli (i.e., eccentricity) and compared how frequently illusory motion perception occurred for target stimuli blinking at a fixed position.

## **MATERIAL AND METHODS**

#### **Participants and apparatus**

Written consent was obtained from each participant before the experiments were initiated. All experiments were approved by the local ethics committee of Tohoku University. One of the authors (Souta Hidaka) and three volunteers participated in the first experiment. The volunteers were naive to the purpose of this experiment. All had normal or corrected-to-normal vision. The visual stimuli were presented on a linearized CRT display (Sony Trinitron GDM-FW900, 24″) with a resolution of 1280 × 960 pixels and a refresh rate of 75 Hz. An Apple Power Mac G4 and MATLAB (MathWorks) with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) were used to control the experiment. The participants placed their heads on a chin rest and reported their responses using the "1" (indicating static) or "3" (indicating moving) keys on a numeric keyboard.

#### **Stimuli**

We presented white squares (59.98 cd/m<sup>2</sup>, 0.8˚ × 0.8˚) as target stimuli and inducers of AM against a gray background (29.98 cd/m<sup>2</sup>; **Figure 1B**). The inducers and target were aligned horizontally, and the distance between the inducers was 8˚. The target was presented between the inducers, so that the distance between the target and inducers was 4˚. Two black rings (0.1 cd/m<sup>2</sup>) were presented as a fixation at the center of the display. The fixation and target were aligned vertically. The vertical distance (eccentricity) between the fixation and the target/inducers was either 2˚, 4˚, 6˚, 8˚, or 10˚. The duration of the inducers was 80 ms, and the inter-stimulus interval (ISI) was 266 ms.

#### **Procedure**

After the presentation of the fixation circles for 500 ms, the inducers were presented as shifting from either left to right or vice versa. In each trial, four AM sequences were presented in which AM stimuli were perceived as moving back and forth horizontally. In the Without-offset condition, the target was presented continuously during AM. In contrast, the target disappeared for 93 ms after the offset of the inducers and then reappeared after 80 ms of the target's disappearance in-between each AM sequence in the Offset condition (**Figure 1C**). The target was statically presented at the same fixed position in both conditions. The participants' task was to report whether or not they perceived the target as moving. The experiment consisted of 200 trials: Target offset (2) × Eccentricity (5) × Repetition (20). These conditions were randomly introduced in each trial and were counterbalanced across participants. The initial position of the inducers (left or right) was also randomized and counterbalanced among conditions and trials.
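The factorial design described above (Target offset × Eccentricity × Repetition) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code; the function name `build_trials`, the condition labels, and the seed are hypothetical and serve only to show how 200 randomized trials arise from the design.

```python
import itertools
import random

def build_trials(offset_conditions, eccentricities, repetitions, seed=None):
    """Return a shuffled list of (offset_condition, eccentricity) trials."""
    rng = random.Random(seed)
    # Every combination of the two factors forms one cell of the design.
    cells = list(itertools.product(offset_conditions, eccentricities))
    # Repeat each cell the same number of times, then randomize the order.
    trials = [cell for cell in cells for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

# Hypothetical labels for the two target conditions; eccentricities in degrees.
trials = build_trials(["without_offset", "offset"], [2, 4, 6, 8, 10], 20, seed=0)
print(len(trials))  # 2 x 5 x 20 = 200
```

Shuffling a fully crossed list keeps the number of trials per cell fixed at 20 while leaving the presentation order unpredictable.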

## **RESULTS AND DISCUSSION**

We plotted the proportion of trials in which the target was judged as moving during the presentation of AM sequences (**Figure 1D**). A two-way repeated-measures analysis of variance (ANOVA) was conducted with Target offset (2) × Eccentricity (5). This analysis revealed a significant interaction between the factors [*F*(4, 12) = 35.55, *p* < 0.001]. The simple main effects of Target offset revealed that the proportion in the Offset condition was higher than that in the Without-offset condition under 4˚, 6˚, 8˚, and 10˚ of eccentricity [*F*s(1, 15) > 17.17, *p*s < 0.001]. Regarding the simple main effect of Eccentricity in the Offset condition [*F*(4, 24) = 75.24, *p* < 0.001], a *post hoc* test (Tukey's HSD) found that the proportion increased in correspondence with higher eccentricity (*p*s < 0.05).

The results suggested that the transient offset signal induced illusory motion perception of the target in an AM display. We also found that the proportion of illusory motion perception became greater with increased eccentricity. This would indicate that the target's offset signal interacts with AM information more efficiently under larger eccentricities. This idea echoes the fact that whereas the visibility and spatial uncertainty of stimuli decreases with increased retinal eccentricity, sensitivity to motion remains constant (Koenderink et al., 1985).

## **EXPERIMENT 1B**

In Experiment 1, the eccentricities were manipulated for both the target and AM stimuli. In order to test whether the illusory motion perception could occur for the temporally offset target, even when the target was presented outside of the AM trajectory, we only manipulated the eccentricities of the target while those of the AM stimuli were fixed (Movie S2 in Supplementary Material).

## **METHODS**

One of the authors (Souta Hidaka) and three volunteers participated in this experiment. We manipulated the vertical distance of the target as 0˚, 2˚, 4˚, 6˚, or 8˚ from AM stimuli, which were presented at a fixed position (2˚ of eccentricity from the fixation point) (**Figure 2A**). Except for this, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."

## **RESULTS AND DISCUSSION**

Regarding the proportion of illusory motion perception (**Figure 2B**), a two-way repeated-measures ANOVA with Target offset (2) × Vertical distances (5) revealed a significant interaction between the factors [*F*(4, 12) = 15.29, *p* < 0.001]. The simple main effects of Target offset revealed that the proportion of motion perception in the Offset condition was higher than that in the Without-offset condition under 2˚, 4˚, 6˚, and 8˚ of vertical eccentricity [*F*s(1, 15) > 9.72, *p*s < 0.001]. Regarding the simple main effect of Vertical distance in the Offset condition [*F*(4, 24) = 30.87, *p* < 0.001], *post hoc* tests (Tukey's HSD) found that the proportions of motion perception under 4˚, 6˚, and 8˚ of vertical eccentricity were higher than those under 0˚ and 2˚ of eccentricity (*p*s < 0.05). These results indicate that illusory motion perception occurred for targets with temporally offset signals, even when the targets were located outside the trajectory of AM.

## **EXPERIMENT 2**

The purpose of Experiment 2 was to examine a temporal aspect of illusory motion perception of the temporally offset target. We investigated what offset duration was sufficient for the perception of illusory motion.

#### **METHODS**

One of the authors (Souta Hidaka) and three volunteers participated in this experiment. The volunteers were naive to the purpose of this experiment. All participants had normal or corrected-to-normal vision. In this experiment, the ISI of the inducers was 267 ms. The offset duration was either 0 (Without-offset), 27, 53, 80, 107, or 133 ms. The eccentricities of the target and AM stimuli were fixed at 8˚. The main session consisted of 120 trials: Offset duration (6) × Repetition (20). The order of the conditions was randomized and counterbalanced across trials and participants. Except for these differences, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."
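The offset durations listed above are consistent with whole numbers of video frames on the 75 Hz display described in Experiment 1. The following Python sketch illustrates that arithmetic. The frame counts are an assumption made here for illustration (the text reports only millisecond values), and `frames_to_ms` is a hypothetical helper.

```python
REFRESH_HZ = 75  # refresh rate reported for the CRT display

def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of a whole number of video frames."""
    return n_frames * 1000.0 / refresh_hz

# Assumed frame counts (not stated in the text): 0, 2, 4, 6, 8, and 10 frames.
offset_frames = [0, 2, 4, 6, 8, 10]
offset_ms = [round(frames_to_ms(n)) for n in offset_frames]
print(offset_ms)  # [0, 27, 53, 80, 107, 133]

# Under the same assumption, a 20-frame ISI rounds to the reported 267 ms.
print(round(frames_to_ms(20)))  # 267
```

At 75 Hz one frame lasts about 13.3 ms, so durations between these values cannot be displayed exactly.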

#### **RESULTS AND DISCUSSION**

Regarding the proportion of illusory motion perception (**Figure 3**), a one-way repeated-measures ANOVA revealed a significant main effect of Offset duration [*F*(5, 15) = 16.69, *p* < 0.001]. The *post hoc* test (*p* < 0.05) found that the proportion of motion perception at 27 ms offset was higher than that with 0 ms offset. Further, the proportions at 53, 80, 107, and 133 ms offset duration were higher than those in the other conditions.

The results showed that the target's temporally offset signal induced illusory motion perception more frequently with longer offset durations and that 53 ms of offset duration was sufficient to trigger illusory motion perception reliably.

## **EXPERIMENT 3**

The results of Section "Experiments 1 and 2" clearly showed that illusory motion perception occurred for targets with temporally offset signals in an AM display. Given that illusory motion perception occurred due to the interaction between motion information from AM stimuli and the target's temporally offset signal, we predicted that illusory motion perception would be directly related to AM perception. The perceived quality (smoothness or goodness) of AM could be experimentally altered by changes in ISI under a fixed distance between the target and inducers (Korte, 1915; Kolers, 1972). Thus, we examined the effects of the perceived motion quality of AM on illusory motion perception by manipulating the ISI of the inducers.

#### **METHODS**

One of the authors (Souta Hidaka) and three volunteers participated in this experiment. The volunteers were naive to the purpose of this experiment. All the participants had normal or corrected-to-normal vision. The ISI of the inducers was either 134, 186, 294, 506, 934, or 1786 ms. The eccentricities of the target and AM stimuli were fixed at 8˚. Only the Offset condition was presented. First, the participants completed a motion-judgment session wherein they were asked to judge whether or not the target was perceived as moving. This session consisted of 120 trials: ISI (6) × Repetition (20). In the subsequent motion-quality-judgment session, we asked the participants to judge perceived motion quality (smoothness, goodness, etc.) of AM stimuli by using a five-point scale [from 1 (bad) to 5 (good)]. This session consisted of 60 trials: ISI (6) × Repetition (10). The conditions were randomly assigned and counterbalanced among the trials and participants. Except for these differences, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."

#### **RESULTS AND DISCUSSION**

With regard to the obtained proportion of motion perception (**Figure 4A**), a one-way repeated-measures ANOVA found a significant main effect of ISI [*F*(5, 15) = 17.87, *p* < 0.001]. The *post hoc* tests (*p* < 0.05) revealed that the proportions with 294 and 506 ms ISI were higher than those for the other ISI values. We also calculated the correlation coefficient (Spearman's *r*) between the proportion of motion perception and perceived motion quality (**Figure 4B**). The estimated correlation was *r* = 0.83, which was statistically significant (*p* < 0.05, one-tailed).
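For readers unfamiliar with the rank correlation used here, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks. The two input lists are placeholder values invented for illustration and are not the data of this experiment; all function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder values for six ISI conditions (NOT the study's data).
proportion_moving = [0.30, 0.55, 0.90, 0.85, 0.40, 0.10]
quality_rating = [2.5, 3.0, 4.5, 4.0, 2.0, 1.5]
rho = spearman(proportion_moving, quality_rating)
print(round(rho, 2))
```

Because only ranks enter the computation, the coefficient is insensitive to the different scales of the two measures (proportions versus five-point ratings).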

The results showed that illusory motion perception selectively occurred with particular ISI values. Moreover, this tendency was highly related to the perceived motion quality of AM. Thus, we could consider illusory motion perception of the temporally offset target to be directly related to motion information.

## **EXPERIMENT 4**

The results of Section "Experiments 1 and 3" suggested that the interaction between the target's temporally offset signal and AM information could be an important factor in illusory motion perception. Since the previous experiments repeatedly presented AM sequences and temporally offset targets, it was uncertain whether the presentation of AM information before or after the offset signal, or both, primarily contributed to the perception of illusory motion. To test this, we introduced the absence of inducers before or after the presentation of the offset signal. We could predict that if AM information presented before the offset signal plays a key role, then illusory motion perception would not occur unless the inducer was presented before target offset. On the other hand, if AM information presented after target offset is critical, then illusory motion would not be perceived unless the inducer was presented after the offset.

## **METHODS**

One of the authors (Souta Hidaka) and three volunteers participated in this experiment. The volunteers were naive to the purpose of this experiment. All the participants had normal or corrected-to-normal vision. In each trial, four AM sequences were presented. In the Without-absence condition, the inducers were continuously presented in all the sequences. However, in the Absence-before-offset condition, the inducer was not presented before the target's temporal offset during the last AM sequence. On the contrary, in the Absence-after-offset condition, the inducer did not appear after target's temporal offset during the last AM sequence (**Figure 5A**). The eccentricities of the target and AM stimuli were fixed at 8˚. The participants completed 60 trials: Condition (3) × Repetition (20). The conditions were randomly assigned and counterbalanced across trials and participants. Except for these differences, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."

## **RESULTS AND DISCUSSION**

A one-way repeated-measures ANOVA found a significant main effect of Condition [*F*(2, 8) = 8.43, *p* < 0.05; **Figure 5B**]. The *post hoc* tests (*p* < 0.05) revealed that the proportion of perceived motion in the Absence-after-offset condition was lower than that in the other conditions.

The results showed that the proportion of illusory motion perception was reduced when the inducer was not presented after the target's temporal offset. This would indicate that AM information presented after the target's offset mainly contributes to illusory motion perception in a postdictive manner. A reliable amount of illusory motion perception occurred in the Absence-before-offset condition, although AM information was not explicitly presented during the last sequence (the inducers were presented twice at the same position). There are two possible reasons for this: the repeated presentation of AM sequences might introduce AM information implicitly and predictively, and AM perception might also occur between the target (at the center of the display) and the inducer when it is presented after the offset.

## **EXPERIMENT 5**

In the previous experiments, we demonstrated that a target with a temporally offset signal was perceived as moving within an AM display, even though the target was actually presented at a fixed position. The underlying mechanism of this effect could be that AM information induced perceived shifts of the target's position (e.g., Whitney and Cavanagh, 2000; Shim and Cavanagh, 2004). To confirm this possibility, in Experiment 5, we measured the magnitude of mislocalization of the temporally offset target in an AM display.

## **METHODS**

One of the authors (Souta Hidaka) and seven volunteers participated in this experiment. The volunteers were naive to the purpose of this experiment. All the participants had normal or corrected-to-normal vision. We presented a blue probe square (17.37 cd/m<sup>2</sup>, 0.8˚ × 0.8˚) at 6.7˚ above the fixation point. The horizontal position of the probe was randomly selected within ±0.8˚ around the center of the display in each trial. The target was presented 8˚ below the fixation point. The inducers were presented at 4˚ above and below the target, while their horizontal positions were aligned with the target (**Figure 6A**). We presented three AM sequences in the vertical direction. Then, for the subsequent, final (4th) sequence, the final position of the inducer was moved to a location 4˚ either to the left (Left condition) or right (Right condition) of the target. The vertical positions of the inducers were aligned with that of the target. A condition in which the positions of the inducers were not changed was also introduced (Without-change condition). Only the Offset condition was presented, so that the target always transiently disappeared between presentations of the inducers. The participants were asked to adjust the horizontal position of the probe to a location consistent with the perceived final location of the target while focusing on the fixation point. The participants completed 60 trials: Condition (3) × Repetition (20). The order of conditions was randomized and counterbalanced across trials and participants. Except for these differences, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."

## **RESULTS AND DISCUSSION**

We normalized each participant's data by subtracting the adjustments made in the Without-change condition from those in the Left and Right conditions (**Figure 6B**). Then, we conducted a two-tailed, paired *t* test, which revealed a significant difference between the Left and Right conditions [*t*(7) = 2.98, *p* < 0.05]: the adjustments shifted to the right in the Left condition and to the left in the Right condition.
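The normalization and paired comparison described above can be sketched in Python as follows. The adjustment values are placeholders invented for illustration (eight hypothetical participants, in degrees, positive meaning rightward) and are not the data of this experiment; `normalize` and `paired_t` are hypothetical helper names.

```python
import math

def normalize(condition, baseline):
    """Subtract each participant's baseline adjustment from a condition."""
    return [c - b for c, b in zip(condition, baseline)]

def paired_t(a, b):
    """Return (t, df) for a paired t test on equal-length samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the per-participant differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

# Placeholder mean adjustments per participant (NOT the study's data).
without_change = [0.01, 0.00, 0.02, -0.01, 0.03, -0.02, 0.01, 0.00]
left = [0.12, 0.05, 0.20, -0.02, 0.15, 0.08, 0.10, 0.03]
right = [-0.10, 0.02, -0.15, 0.01, -0.05, -0.12, 0.00, -0.04]

left_norm = normalize(left, without_change)
right_norm = normalize(right, without_change)
t_value, df = paired_t(left_norm, right_norm)
print(df, round(t_value, 2))
```

Because the same baseline is subtracted from both conditions, it cancels in each participant's Left minus Right difference; the normalization therefore changes the plotted values but not the paired *t* statistic.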

The results showed that the shifts in perceived position actually occurred for the temporally offset targets. In addition, although the inducers' positions in the last display were randomly assigned across conditions and trials, the perceived shifts were consistently against the direction of AM. Consistent with the results of the previous experiments, these results indicate that perceptual mislocalization of the target occurred postdictively and that the direction of the perceptual shift was opposite to the direction of AM.

## **EXPERIMENT 6**

In the previous experiments, the AM sequences and target's offset signals were repeatedly presented in a few cycles. In contrast, studies have demonstrated that perceived mislocalization for the target with a temporally onset signal could occur even with a single presentation of the onset signal and AM sequence (e.g., Eagleman and Sejnowski, 2007). Thus, in Experiment 6, we tested whether the temporally offset target could be perceptually mislocalized when the target offset signal and AM sequence were presented only once. As in Section "Experiment 5," we introduced a situation where the AM direction was unpredictable and determined only after the target offset was presented.

#### **METHODS**

One of the authors (Souta Hidaka) and three volunteers participated in this experiment. We presented the target and one of the inducers at the center of the display, 8˚ and 6˚ below the fixation points, respectively (cf. Eagleman and Sejnowski, 2007) (**Figure 7A**). These stimuli were horizontally aligned. In order to quantify the amount of perceived mislocalization for the target, we adopted a nulling procedure. In each trial, after 400 ms of the target presentation, the inducer was presented for 80 ms. Next, the target was temporally offset for 80 ms. The target subsequently reappeared, and its horizontal position was displaced either 0.03˚, 0.06˚, 0.12˚, or 0.24˚ in the left or right direction. An inducer was then presented. While the inducer was presented at the same position as the first inducer in the No-motion condition, the inducer's position was shifted 6˚ toward the left or right in the Motion condition. Participants were asked to judge the perceived direction of the target's displacement (left or right). Participants completed 160 trials: Motion (2) × Target displacements (8) × Repetitions (10). The order of the conditions was randomized and counterbalanced among trials and participants. Further, the amount of the target's displacements and the direction of the target's displacements and AM were randomly introduced in each trial and counterbalanced among the conditions. Except for these differences, the apparatus, stimulus parameters, and procedures were identical to those in Section "Experiment 1."

#### **RESULTS AND DISCUSSION**

For each participant, we plotted the proportion at which the perceived direction of the target was consistent with the physical displacements in each motion condition (**Figure 7B**). Positive values of the target's displacements indicate that the displacements were consistent with the AM direction in the Motion condition. We then estimated the point of subjective equality (PSE) by fitting a cumulative Gaussian distribution function to each participant's data by using a maximum likelihood method. A two-tailed paired *t* test revealed a significant difference between the Motion and No-motion conditions [*t*(3) = 9.70, *p* < 0.005]. Since the PSE shifted in the direction consistent with the AM, the target displacements tended to be perceived to be opposite the AM direction in the Motion condition. These results indicate that a reliable amount of perceptual displacements could postdictively occur for the temporally offset target, even when the target offset signal and AM sequence were presented only once.
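The PSE estimation described above can be illustrated with a minimal Python sketch of a maximum likelihood fit of a cumulative Gaussian. Several points are assumptions made here rather than details given in the text: the response is coded as the number of "displaced in the positive direction" judgments at each signed displacement, the fit uses a simple grid search instead of a numerical optimizer, and the response counts are placeholders, not the data of this experiment.

```python
import math

def cumulative_gaussian(x, mu, sigma):
    """Probability of a 'positive direction' response at displacement x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def log_likelihood(levels, n_positive, n_trials, mu, sigma):
    """Binomial log-likelihood of the response counts given mu and sigma."""
    total = 0.0
    for x, k, n in zip(levels, n_positive, n_trials):
        # Clip probabilities so that log() is never evaluated at 0.
        p = min(max(cumulative_gaussian(x, mu, sigma), 1e-6), 1.0 - 1e-6)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_pse(levels, n_positive, n_trials, mu_grid, sigma_grid):
    """Grid-search ML estimate; returns (PSE, sigma) with the highest likelihood."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            ll = log_likelihood(levels, n_positive, n_trials, mu, sigma)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]

# Signed target displacements in degrees (eight levels, 10 repetitions each).
levels = [-0.24, -0.12, -0.06, -0.03, 0.03, 0.06, 0.12, 0.24]
n_trials = [10] * len(levels)
# Placeholder response counts for one hypothetical participant (NOT real data).
n_positive = [0, 1, 2, 3, 5, 7, 9, 10]

mu_grid = [i * 0.005 for i in range(-60, 61)]    # candidate PSEs, -0.30 to 0.30
sigma_grid = [i * 0.005 for i in range(2, 101)]  # candidate spreads, 0.01 to 0.50
pse, spread = fit_pse(levels, n_positive, n_trials, mu_grid, sigma_grid)
print(pse, spread)
```

The PSE is the fitted mean `mu`, that is, the displacement at which both response alternatives are equally likely. Fitting the Motion and No-motion conditions separately would yield the pair of PSEs per participant that enters the paired *t* test.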

## **GENERAL DISCUSSION**

It has been reported that the perceived position of a target with a transient onset signal is mislocalized in the forward direction with respect to nearby motion information. The aim of the present study was to investigate whether a transient offset signal would also induce mislocalization of the perceived position of the target. Phenomenological observation revealed that illusory motion was perceived for the target blinking at a fixed position, counter to the direction of horizontal AM stimuli (Movie S1 in Supplementary Material). Illusory motion was frequently perceived when (1) the eccentricity of the target was larger (Experiment 1), (2) offset duration was longer (Experiment 2), and (3) smoother AM was perceived (Experiment 3). Further, illusory motion perception did not occur when AM stimuli did not appear after the target's offset signal (Experiment 4). We further found that mislocalization of the target's perceived position actually occurred in a direction opposite to AM (Experiments 5 and 6). These findings suggest that a transient offset signal could trigger the perceptual mislocalization of static visual stimuli by interacting with motion information in a postdictive manner.

Eye movements induced by AM stimuli might contribute to the perception of illusory motion and target mislocalization. However, we found that illusory motion perception was modulated by changes in inducers' ISI which was strongly related to the perceived quality of AM, although eye movements could occur irrespective of changes in ISI (Experiment 3). In addition, perceived mislocalization was consistently observed in the situation where the final location of AM stimuli was changed randomly in the last display (Experiment 5). Further, we observationally confirmed that illusory motion could also occur in the vertical direction (the direction in which eye movements are less effective; the first three AM sequences in Experiment 5). These findings would thus indicate that eye movements were not a decisive factor in the current study.

The involvement of attentional shifts might also be considered. In fact, it has been reported that shifts in attentional location induced perceived mislocalization of briefly presented targets in the direction opposite to the attentional shifts (attentional repulsion effect: Suzuki and Cavanagh, 1997). However, the phenomenological aspects of that finding could differ from those of our current ones. In the study by Suzuki and Cavanagh (1997), attentional cues were always presented before the target's onset, and AM information presented just after the target's onset did not modulate the occurrence of the effect. On the other hand, we demonstrated that AM stimuli presented after the target's offset signal dominantly contributed to mislocalization. Another study also reported that attentional shifts induced after the onset of a target triggered perceived mislocalization of the target (Ono and Watanabe, 2011). In that case, however, the observed mislocalization was always in the direction of the attentional shifts (attentional attraction effect). While some studies have reported attentional modulation of the FDE (Shim and Cavanagh, 2005; Tse et al., 2011), attention might have only modulatory effects that help the observer to selectively track one of two competitive sources of motion information. Based on these facts, we could assume that the involvement of attentional shift/modulation would not fully explain our current findings.

Thus, we could consider that mislocalization of targets containing a transient offset signal occurs due to the interaction between the transient offset signal and AM information. Since illusory motion perception and perceived mislocalization for the target could occur both along (Experiments 1–5) and outside the AM trajectory (Experiments 1B and 6), the target's offset signal could explicitly and implicitly interact with AM information. Some possible underlying mechanisms could be considered. For example, one may assume involvement of the "shadow motion" phenomenon (also called "pure phi" or "omega" motion; Saucer, 1953; Zeeman and Roelofs, 1953; Tyler, 1973; Sigman and Rock, 1974; Gellatly and Blurton, 1995; Ekroll et al., 2008). Typically, in this phenomenon, when two white squares on a black background (horizontally apart from each other) are alternately turned on and off, depending on particular temporal properties, AM for the white squares ("stimulus motion") is not perceived. Rather, the blank (offset) points of the squares are perceived as a black "shadow" that appears to move counter to the onset of the white squares. This phenomenon might suggest that in our experimental situation, shadow motion was perceived for the AM stimuli, and the temporally offset target was perceptually grouped together with the AM stimuli. Consequently, illusory motion was perceived for the target counter to the direction of the horizontal AM stimuli. A notable point is that the temporal characteristics of our AM stimuli would be optimal for stimulus motion. Indeed, the results of Section "Experiment 3" showed that the perceived quality of stimulus motion induced by the AM stimuli became higher at the particular ISIs when the participants were asked to directly judge the motion. Illusory motion perception occurred most frequently at these ISIs. In addition, Ekroll et al. (2008) showed that optimal temporal properties were contradictory between stimulus and shadow motion perception. In other words, while stimulus motion prefers that AM stimuli contain transient onset signals, shadow motion prefers AM stimuli with transient offset signals. Thus, these motion perceptions should occur exclusively. Actually, as shown in our demonstration movie, we may not perceive shadow motion but may mainly perceive stimulus motion with our stimuli (see Movie S1 in Supplementary Material). We also created a demonstration in which shadow motion, instead of stimulus motion, may be dominantly perceived (Movie S3 in Supplementary Material). Whereas illusory motion may be vividly perceived in the former case, it may not appear or may occur only unreliably in the latter case. Moreover, an additional experiment (Experiment A1) found that perceived mislocalization for the target did not occur when we modified the spatiotemporal characteristics of AM stimuli in Section "Experiment 6" so that shadow motion could be perceived (**Figure 8**). These findings would suggest that illusory motion perception and perceived mislocalization for the target could be well observed with stimulus motion of AM stimuli. However, we should also note that our current manipulation might not be appropriate for shadow motion perception. Thus, investigations should be performed in future studies by using optimal spatiotemporal characteristics for shadow motion perception.

Furthermore, it could be notable that the existence of shadow motion mechanisms would indicate that the target's temporally transient offset signal could potentially serve as a motion cue by itself. This would suggest another possible underlying mechanism. For instance, the involvement of the onset repulsion effect (ORE) may be considered. In this phenomenon, the onset position of a moving target tends to shift backwards along the motion trajectory (Thornton, 2002). In fact, the data obtained in the Absence-before-offset condition of Section "Experiment 4" seemed to suggest that AM was perceived between the reappearing target and the subsequent AM stimuli so that the target's position is misperceived in a backward direction, as in the ORE. Thus far, ORE has been mainly reported in a situation where the temporally onset target is presented along an AM trajectory. However, we also reported that illusory motion perception and perceived mislocalization occurred even when the target was presented outside of the AM trajectory (Experiments 1B and 6: see also Movie S2 in Supplementary Material). It may be interesting to consider that ORE could occur for the target with a temporally offset signal outside the AM trajectory. Involvement of the mechanism related to FDE may also be assumed. FDE is a phenomenon whereby a visual target with a transient onset signal is mislocalized in the forward direction of a nearby motion signal (Whitney and Cavanagh, 2000; Shim and Cavanagh, 2004). There seems to be a basic phenomenological distinction: whereas forward displacements are observed in FDE, backward mislocalization consistently appeared in this study. However, it may be likely that AM information could induce forward perceived mislocalization of the "shadow" element of the target. This may then result in the backward mislocalization of the "stimulus" element of the target.

As perceptual mislocalization occurred in the backward motion direction and in a postdictive manner, the findings reported here may also be related to FLE. FLE is reported to occur such that a target with a transient onset signal is perceived at a backward position relative to the moving stimulus, although they are physically aligned (MacKay, 1958; Nijhawan, 1994; Eagleman and Sejnowski, 2000). The mechanism of FLE is considered to be that the target's transient onset signal resets the spatiotemporal integration process of the nearby motion signal. Motion information presented after the target's onset signal would then be reintegrated within a particular temporal window. This would result in motion bias: the position of the moving object is perceived as displaced toward the motion direction relative to the target (Eagleman and Sejnowski, 2000, 2007). Accordingly, in the current study, the transient offset signal could also reset the process of spatiotemporal integration of motion information. In the subsequent stage, the motion signal would be reintegrated within a particular timeframe and the resulting motion bias would occur, such that the target was perceptually localized relatively behind the motion signal. In addition, since the transient offset signal might phenomenologically indicate the abrupt disappearance and reappearance of an object, we may assume that the transient offset signal would also induce the reset and re-encoding of the target's positional information: an object's positional information might be compared before and after the transient offset. A perceptual displacement signal of the target induced by motion biasing might be attributed to the comparison process between the previous and subsequent target positions. Consequently, the target's position after the offset may be consistently perceived as a backward position relative to the position before the offset in a postdictive manner.

Since these ideas are speculative at this stage, further investigations as to the underlying mechanisms of offset-induced mislocalization, including its postdictive aspects, should be performed in the near future. However, the current findings clearly demonstrate two novel phenomenological aspects of perceptual mislocalization of temporally offset targets. The first is that, contrary to the target containing a transient onset signal, the illusory motion and perceptual mislocalization for the target with temporally offset signal consistently occurs opposite the direction of AM information. The second aspect is that illusory motion perception and perceived mislocalization are observed for the target's absolute position, whereas previous literature has mainly reported that the temporally onset target's position is perceptually misaligned relative to nearby reference stimuli (Whitney and Cavanagh, 2000; Shim and Cavanagh, 2004) or nearby motion signals (Eagleman and Sejnowski, 2007; Shi and de'Sperati, 2008). In fact, the mislocalization occurred strongly for temporally offset targets as they were perceived as moving back and forth, even though the targets contained relatively certain positional information (presented for approximately 180 ms) compared with targets containing transient onset signals (which were typically presented for approximately 20 ms) (Experiments 1–4). Moreover, the perceptual displacements in a backward direction consistently occurred even when the participants judged the target's position itself (Experiments 5 and 6). Therefore, temporally offset signal would have phenomenological aspects or functions different from those of temporally onset signal in our perceptual systems.

## **ACKNOWLEDGMENTS**

We thank Wataru Teramoto and Jiro Gyoba for their suggestions. We are also grateful to the two reviewers for their insightful comments and suggestions for early versions of the manuscript. We also thank Yuki Yamada for his assistance. This research was supported by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists (No. 19004400) and Rikkyo University Special Fund for Research.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Consciousness\_Research/10.3389/fpsyg.2013.00196/abstract

**Movie S1 | The basic phenomenon of illusory motion perception for a temporally offset target in an apparent motion (AM) display.** Please see a white square blinking at a lower position while fixating on the black rings. Whereas the square will be perceived as static without AM at the first sequence, the square will then appear to move to the left or right when AM is presented. The direction of illusory motion of the target will be in counter-phase with that of AM.

**Movie S2 | Demonstration of the situation where the target is not presented along an apparent motion trajectory.** This movie demonstrates the case where the vertical distance is well separated between the AM stimuli and target; the relative distance between them corresponds to 6 of the vertical distance. We confirmed that illusory motion perception reliably occurred even in this situation.

**Movie S3 | Demonstration of the situation where the shadow motion can be perceived.** We may notice that illusory motion perception does not appear or occurs unreliably, in contrast to when the stimulus motion can be perceived (Movie S1).

#### **REFERENCES**

of bistable quartet motion. *Vision Res.* 44, 2393–2401.


the perceived position of remote stationary objects. *Nat. Neurosci.* 3, 954–959.

Zeeman, W. P., and Roelofs, C. O. (1953). Some aspects of apparent motion. *Acta Psychol.* 9, 158–181.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 December 2012; accepted: 01 April 2013; published online: 19 April 2013.*

*Citation: Hidaka S and Nagai M (2013) Illusory motion and mislocalization of temporally offset target in apparent motion display. Front. Psychol. 4:196. doi: 10.3389/fpsyg.2013.00196*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Hidaka and Nagai. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Spatial warping by oriented line detectors can counteract neural delays

## *Don A. Vaughn<sup>1</sup> and David M. Eagleman<sup>1,2</sup>\**

*<sup>1</sup> Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA <sup>2</sup> Department of Psychiatry, Baylor College of Medicine, Houston, TX, USA*

#### *Edited by:*

*Makoto Miyazaki, Yamaguchi University, Japan*

#### *Reviewed by:*

*Katsumi Watanabe, The University of Tokyo, Japan Mark Changizi, 2ai Labs, USA*

#### *\*Correspondence:*

*David M. Eagleman, Departments of Neuroscience and Psychiatry, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA e-mail: david@eaglemanlab.net*

The slow speed of neural transmission necessitates that cortical visual information from dynamic scenes will lag reality. The "perceiving the present" (PTP) hypothesis suggests that the visual system can mitigate the effect of such delays by spatially warping scenes to look as they will in ∼100 ms from now (Changizi, 2001). We here show that the Hering illusion, in which straight lines appear bowed, can be induced by a background of optic flow, consistent with the PTP hypothesis. However, importantly, the bowing direction is the same whether the flow is inward or outward. This suggests that if the warping is meant to counteract latencies, it is accomplished by a simple strategy that is insensitive to motion direction, and that works only under typical (forward-moving) circumstances. We also find that the illusion strengthens with longer pulses of optic flow, demonstrating motion integration over ∼80 ms. The illusion is identical whether optic flow precedes or follows the flashing of bars, exposing the spatial warping to be equally postdictive and predictive, i.e., peri-dictive. Additionally, the illusion is diminished by cues which suggest the bars are independent of the background movement. Collectively, our findings are consistent with a role for networks of visual orientation-tuned neurons (e.g., simple cells in primary visual cortex) in spatial warping. We conclude that under the common condition of forward ego-motion, spatial warping counteracts the disadvantage of neural latencies. It is not possible to prove that this is the purpose of spatial warping, but our findings at minimum place constraints on the PTP hypothesis, demonstrating that any spatial warping for the purpose of counteracting neural delays is not a precise, on-the-fly computation, but instead a heuristic achieved by a simple mechanism that succeeds under normal circumstances.

**Keywords: neural delays, neural latency, orientation tuning, prediction, postdiction, hering illusion, spatial cognition, time and motion studies**

## **INTRODUCTION**

It has traditionally been proposed that geometric illusions result from angle overestimation (Hering, 1861; Wundt, 1862; Holt-Hansen, 1961; Prinzmetal and Beck, 2001), presumably as a result of lateral inhibition in visual cortex (Blakemore et al., 1970) or a bias in extrapolating 3D angle information from 2D projections (Nundy et al., 2000; Howe and Purves, 2005). However, a recent framework by Changizi and colleagues suggests that several geometric illusions are caused instead by temporal delays with which the visual system must cope (Nijhawan, 1997; Changizi, 2001; Changizi and Widders, 2002). In this framework, the visual system extrapolates current information to "perceive the present" (PTP): instead of providing a conscious image of how the world was ∼100 ms in the past (when signals first struck the retina), the visual system estimates how the world is likely to look in the next moment.

Despite its theoretical importance, the temporal hypothesis is supported by little direct data: it has not been unequivocally pitted against traditional frameworks, it is not known whether it would operate in a rule-based or direct manner, and there are no clues to its possible neural bases.

To test the temporal hypothesis, we capitalized on the Hering illusion (**Figure 1A**). The PTP hypothesis proposes that the background of radial lines simulates optic flow, causing the visual system to assume forward ego-motion and to extrapolate the appearance of the parallel bars to the next moment. Because objects closest to the horizontal plane move fastest during forward motion, this generates the illusory percept that the two parallel bars bend outward. Imagine driving on a suspension bridge toward two of its pillars: from a distance the pillars appear as parallel lines. As you approach, the pillars move farther apart at eye level, but their distant tops still appear close together.

## **METHODS**

## **APPARATUS**

Stimuli were displayed on a 19-inch Dell monitor at a resolution of 1280 × 1024 pixels and a refresh rate of 120 Hz. Eight participants observed stimuli in a dark room, at ∼0.59 m from the display.

## **PARTICIPANTS**

Thirteen subjects (5 women) participated in Experiment 1, eight (4 women) in Experiment 2, and nine (4 women) in Experiment 3. All participants were naive regarding the purpose of the experiments, had normal or corrected-to-normal vision, and signed an informed consent statement approved by the Baylor College of Medicine Institutional Review Board.

## **STIMULI**

On each trial, participants fixated on a red cross in the center of the screen and were presented with a background of radial lines, dots in expanding or contracting motion, or motionless dots. In all four cases, the background persisted until the participant registered an answer. The radial lines were equally spaced, subtended 17° of visual angle, and had a luminance of 11 cd/m². The 600 dots had an average luminance of 20 cd/m², and each subtended between 0.05° and 0.16°. The dots were displayed to imply the observer was moving forward or backward at 0.12 m/s. To achieve this, dots were randomly initialized throughout an imaginary 3D space in front of the observer. Each frame, dot positions in the imaginary depth plane were updated, and each dot was then rendered to the screen consistent with its new location, giving the dots a radial inward or outward trajectory. Consequently, dots had a larger radial displacement each frame at the outer edges of the screen than at the focus of expansion; velocity was not constant across a dot's lifetime. Two bars, each 2° of visual angle from the vertical meridian, repeatedly flashed over the dot pattern for 80 ms with an interstimulus interval of 1 s until the participant registered an answer. The bars were generated as segments of a circle, which for each trial was randomly assigned a curvature between ±2 m<sup>−1</sup> (0 is a straight vertical line). Bar length of 10.6° was held constant across all curvature values. Participants ran each condition 3 times. On each trial, the initial curvature of the two bars was randomized to one of 33 values (symmetric around 0). With the left and right arrow keys, participants adjusted the curvature until the two bars appeared subjectively straight (nullification technique).
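The dot-field procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' stimulus code: the box dimensions, the near and far depth limits, the recycling rule, and the use of the viewing distance as the projection distance are all assumptions, and every function name is hypothetical. Only the 120 Hz refresh rate, the 0.12 m/s implied speed, and the 0.59 m viewing distance come from the text.

```python
import math
import random

FRAME_RATE_HZ = 120.0   # refresh rate reported in the text
EGO_SPEED_M_S = 0.12    # implied observer speed reported in the text
VIEW_DIST_M = 0.59      # viewing distance from the text; used as projection distance (assumption)


def init_dots(n, half_extent_m, z_near_m, z_far_m, rng):
    """Scatter n dots uniformly in a box in front of the observer (box size is assumed)."""
    return [
        [rng.uniform(-half_extent_m, half_extent_m),
         rng.uniform(-half_extent_m, half_extent_m),
         rng.uniform(z_near_m, z_far_m)]
        for _ in range(n)
    ]


def advance(dots, ego_speed_m_s, z_near_m, z_far_m, dt_s=1.0 / FRAME_RATE_HZ):
    """Update depth only: forward ego-motion (speed > 0) brings every dot closer."""
    span = z_far_m - z_near_m
    for dot in dots:
        dot[2] -= ego_speed_m_s * dt_s
        if dot[2] < z_near_m:      # passed the near limit: recycle to the far end (assumed rule)
            dot[2] += span
        elif dot[2] > z_far_m:     # receded past the far limit during backward motion
            dot[2] -= span


def project(dot, view_dist_m=VIEW_DIST_M):
    """Perspective projection: screen offset (metres) from the focus of expansion."""
    x, y, z = dot
    return (view_dist_m * x / z, view_dist_m * y / z)


def eccentricity(dot):
    """Radial screen distance of a dot from the focus of expansion."""
    sx, sy = project(dot)
    return math.hypot(sx, sy)


def radial_step(dot, ego_speed_m_s, dt_s=1.0 / FRAME_RATE_HZ):
    """Change in eccentricity over one frame for a single dot (no recycling)."""
    moved = [dot[0], dot[1], dot[2] - ego_speed_m_s * dt_s]
    return eccentricity(moved) - eccentricity(dot)
```

Under these assumptions, `radial_step` is positive for forward motion and negative for backward motion, and it grows both with a dot's lateral offset and as the dot approaches. This reproduces the two properties noted in the text: larger per-frame displacement toward the screen edges, and a velocity that is not constant over a dot's lifetime.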

Experiment 2 (prediction and postdiction) presented 5 durations of the background optic flow (40, 80, 160, 320, 640 ms). In prediction trials, the optic flow ended with the offset of the 80 ms bars; in postdiction trials, the background motion appeared with their onset. The interstimulus interval consisted of a 1 s blank screen, a 4 s 1/f static noise grating (uniquely generated on each trial), and another 1 s blank screen; these measures were included to prevent any motion aftereffect between presentations. Participants watched as many presentations as desired to adjust the curvature of the bars to nullify the illusion. Each condition was presented 3 times.

Using the contracting and expanding portions of the first experiment, the third experiment varied bar duration and optic flow speed. For each trial, bar duration was randomly selected to be 40, 80, 160, 320, or 640 ms, or continuously present until the participant registered an answer. Using the same method as Experiment 1, implied ego-speed was 0.12 m/s or 0.32 m/s. Participants ran 2 trials for each combination of speed, duration, and optic flow direction.

## **RESULTS**

Participants viewed two bars flashed above a background of radially expanding or contracting dots (optic flow; see Methods). In randomly interleaved trials, radial lines or a control background of motionless dots were used. The bars were flashed for 80 ms with an interstimulus interval of 1 s.

**Figure 1B** shows the average curvature required to nullify the illusion (i.e., to make the bars appear straight). The radial line, expanding, and contracting backgrounds give rise to the Hering illusion [**Figure 1B**, *p* < 0.001, *t*-test; *t*(12) = 10.14, 13.53, 8.19, respectively] while the motionless background does not [*p* = 0.73, *ns*, *t*-test; *t*(12) = 0.35]. Strikingly, the magnitude and direction of the illusion are nearly identical in both the expanding and contracting cases: whether the dots moved toward or away from the center, the bars appear to bow outward [paired *t*-test, *ns*, *p* = 0.93, *t*(12) = 0.10; see demonstration at eaglemanlab.net/hering]. Note that the radial line condition induced the largest effect size; we suggest this would be consistent with optic flow at higher velocities becoming indistinguishable from radial lines.

At first glance, the bowing of the bars during contracting motion would seem to refute the PTP framework: an active temporal extrapolation of the scene should make the bars bend in the other direction. However, backward motion is ecologically rare, and backward extrapolation would provide little information as approaching objects would not be in the visual field (Changizi and Widders, 2002). It therefore appears plausible that a mechanism which evolved to temporally extrapolate based on optic flow might be directionally insensitive, always equating flow with forward ego-motion. Such a bias would similarly explain why observers generally perceive ambiguously forward or backward motion as forward motion (Lewis and McBeath, 2004). Thus, if the Hering illusion is caused by spatial warping to account for neural delays, we can refine our hypothesis about its mechanism and conclude that the warping operates heuristically, succeeding only in the common situation of forward motion and producing a disadvantageous percept in backward motion.

We next investigated whether the putative temporal mechanisms are strictly predictive (as the PTP hypothesis posits) or might also be postdictive (Eagleman and Sejnowski, 2000). To address this, we had participants view a 1 s expanding optic flow pattern offset-aligned with 80 ms bars (predictive case) or onset-aligned (postdictive case; **Figure 2**). If optic flow induces spatial warping by extrapolation, any optic flow *after* the presence of the bars should have no effect on the illusion magnitude. We found, in contrast, that information collected in a ∼80 ms window on either side of the bars contributes equally to the spatial warping [**Figure 2**; Two-Way ANOVA, motion duration *p* < 0.001, *F*(4, 74) = 73.56; pre/postdiction *ns*, *p* = 1.00, *F*(1, 74) = 0.00]. In other words, the effect is not merely postdictive or predictive, but symmetrically peri-dictive: there is a symmetrical temporal window of motion integration around the flashing of the bars.

Having established that implied motion evokes this illusion, we next investigated the effect of modulating the two main temporal parameters: background dot speed and the duration of the bars' presence. Participants viewed the expanding and contracting conditions of the experiment at two different background speeds with five different bar durations. The magnitude of the illusion was significantly reduced by increased bar duration [**Figure 3**; *p* < 0.001, *F*(5, 208) = 14.52] and by increased background speed [*p* < 0.001, *F*(1, 208) = 9.09, Three-Way ANOVA].

**FIGURE 2 | Peri-dictive warping of the bars.** The magnitude of the illusion is identical whether background motion precedes the presentation of the bars (prediction) or follows it (postdiction). Results reveal a window of motion integration between 80 and 160 ms. In both conditions the bars flashed for 80 ms; the optic flow pattern was followed by a blank screen for 1 s, a noise grating for 4 s, and another blank screen for 1 s to eliminate motion aftereffect from one trial to the next. *n* = 8, error bars SEM.

These results do not seem consistent with the angle overestimation hypothesis (AOH; Prinzmetal and Beck, 2001), as the AOH might have predicted that a longer bar duration would give a clearer signal of the intersection angle, making the effect larger. However, we find the opposite: longer bar durations decrease the effect magnitude. Moreover, the background dots increasingly look like lines as their speed increases, which would again make the intersection angle clearer, predicting a larger effect at faster speeds if the AOH were true; we find instead a decreased effect with increased dot speed. We note, however, that the results could be consistent with the AOH if the visual system instead treats increased dot speed and decreased bar duration as low-contrast signals, given that contrast does affect the Hering illusion's magnitude (Astor-Stetson and Purnell, 1990).

Instead, we suggest that a continued presence of the bars evinces that the bars are not moving relative to the observer even while the dot pattern is moving, allowing the visual system to reduce the coupling between the bars and the background, and therefore to warp them less. Such a variable coupling can further explain why increased dot speed decreases the illusion magnitude: at faster speeds, the bars should change location even more if they were part of the background. Thus, an increased passage of optic flow for a fixed duration serves as mounting evidence that the bars are separate from the background.

Although a different geometric illusion against a background of expanding dots had previously been demonstrated (Changizi et al., 2008), the importance of the present findings lies in the equivalence of the illusion in both forward and backward motion, both predictively and postdictively, and as a function of the degree to which the bars are expected to change. First, these findings indicate that the spatial warping is a heuristic rather than an on-line computation. Second, if we had merely shown the illusion with expanding motion, our findings could have potentially been explained by perceptual displacement of the lines by the background motion (Ramachandran and Cavanagh, 1987; Festa-Martino and Welch, 2001; Eagleman and Sejnowski, 2007); the illusion with contracting dots rules out motion capture as a possible explanation for this phenomenon (**Figure 1**).

**FIGURE 3 | The magnitude of the Hering illusion decreases with increasing bar duration and dot speed, both of which give evidence that the bars should not be expected to move with the background.** Accordingly, the warping of the bars diminishes. *n* = 9, error bars SEM.

Third, our demonstration that the Hering illusion is symmetrically induced by expanding or contracting optic flow either preceding or following the presentation of the bars unmasks clues about underlying neural mechanisms. Specifically, parsimony might suggest a single neural mechanism with two properties: (1) it is equally sensitive to static lines and antiparallel motion and (2) it has an 80 ms symmetrical temporal integration window. Neurons in area MT do not meet the criteria: they are typically responsive to movement in a particular direction, and either do not respond or sometimes show suppressive effects to the opposite direction (Snowden et al., 1991; Bradley et al., 1995). Similarly, many neurons in area MSTd are responsive to either expanding or contracting optic flow patterns, but not both (Saito et al., 1986; Tanaka et al., 1989). Further, as a population, MSTd neurons are not responsive to radial lines. It therefore appears unlikely that the neural mechanisms of the illusion involve higher-level, motion-sensitive areas like MT and MSTd. Instead, a stronger model would implicate orientation-selective neurons in primary visual cortex, V1. These simple cells are sensitive to lines (Hubel and Wiesel, 1959) as well as motion streaks from dots moving at sufficient speed in either direction parallel to the preferred orientation (Geisler, 1999), and they have a temporal integration window consistent with our results. Future experiments in primates could elucidate if high-level warping of a visual scene to account for neural delays is rooted in the directionally-insensitive response of V1 neurons.

In summary, our findings indicate that the spatial warping caused by motion streaks reduces to the PTP model under the typical circumstances of forward ego-motion. This does not prove that the PTP hypothesis is the reason for the warping, but it is consistent with the possibility. Our current findings place constraints on the PTP hypothesis, demonstrating that any spatial warping for the purpose of counteracting neural delays is not a "smart," active neural process, but instead a heuristic subserved by a simple mechanism that succeeds only under forward-moving circumstances.

## **AUTHOR CONTRIBUTIONS**

Don A. Vaughn and David M. Eagleman jointly designed and conducted the experiments and jointly wrote the manuscript.

## **ACKNOWLEDGMENTS**

This research was supported by a grant from the US National Institutes of Health (R01NS053960 to David M. Eagleman).

## **REFERENCES**

*Science* 287, 2036–2038. doi: 10.1126/science.287.5460.2036

macaque monkey. *J. Neurophysiol.* 62, 642–656.

Wundt, W. (1862). *Beiträge zur Theorie der Sinneswahrnehmung.* Leipzig: Wintersche Verlag.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 08 October 2013; published online: 01 November 2013.*

*Citation: Vaughn DA and Eagleman DM (2013) Spatial warping by oriented line detectors can counteract neural delays. Front. Psychol. 4:794. doi: 10.3389/fpsyg.2013.00794*

*This article was submitted to Consciousness Research, a section of the journal Frontiers in Psychology. Copyright © 2013 Vaughn and Eagleman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Prediction, postdiction, and perceptual length contraction: a Bayesian low-speed prior captures the cutaneous rabbit and related illusions

## **Daniel Goldreich\* and Jonathan Tong**

Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, ON, Canada

#### **Edited by:**

Yuki Yamada, Yamaguchi University, Japan

#### **Reviewed by:**

Iris M. D. Vilares, Northwestern University and Rehabilitation Institute of Chicago, USA Robert Van Beers, VU University Amsterdam, Netherlands

#### **\*Correspondence:**

Daniel Goldreich, Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada. e-mail: goldrd@mcmaster.ca

Illusions provide a window into the brain's perceptual strategies. In certain illusions, an ostensibly task-irrelevant variable influences perception. For example, in touch as in audition and vision, the perceived distance between successive punctate stimuli reflects not only the actual distance but curiously the inter-stimulus time. Stimuli presented at different positions in rapid succession are drawn perceptually toward one another. This effect manifests in several illusions, among them the startling cutaneous rabbit, in which taps delivered to as few as two skin positions appear to hop progressively from one position to the next, landing in the process on intervening areas that were never stimulated. Here we provide an accessible step-by-step exposition of a Bayesian perceptual model that replicates the rabbit and related illusions. The Bayesian observer optimally joins uncertain estimates of spatial location with the expectation that stimuli tend to move slowly. We speculate that this expectation – a Bayesian prior – represents the statistics of naturally occurring stimuli, learned by humans through sensory experience. In its simplest form, the model contains a single free parameter, tau: a time constant for space perception. We show that the Bayesian observer incorporates both pre- and post-dictive inference. Directed spatial attention affects the prediction-postdiction balance, shifting the model's percept toward the attended location, as observed experimentally in humans. Applying the model to the perception of multi-tap sequences, we show that the low-speed prior fits perception better than an alternative, low-acceleration prior. We discuss the applicability of our model to related tactile, visual, and auditory illusions. To facilitate future model-driven experimental studies, we present a convenient freeware computer program that implements the Bayesian observer; we invite investigators to use this program to create their own testable predictions.

**Keywords: probabilistic inference, sensory saltation, motion illusions, tactile spatial attention, optimal percepts, Kalman smoothing, somatosensory spatiotemporal perception, sensory uncertainty**

## **INTRODUCTION**

Illusions provide investigators a window into the brain's unconscious perceptual strategies. In a particularly interesting category of illusions, an ostensibly task-irrelevant stimulus feature strongly influences the perception of a target feature. Here we consider one group of such illusions, characterized by the curious influence of time on the tactile perception of space (**Figure 1**).

When humans are asked to judge the distance between two brief taps delivered in rapid succession to the skin, they consistently underestimate the true distance. Indeed, the perceived distance between taps shortens systematically as the time between taps is reduced. This *perceptual length contraction* occurs even when the participant is explicitly instructed to attend only to the distance between stimuli, and to ignore the time. The phenomenon is particularly pronounced on the forearm and other body areas that have poor spatial acuity. Several striking illusions result from this puzzling compressive effect of time on space perception (**Figures 1A–C**). For instance, a stimulus sequence consisting of two taps delivered at one position followed by two taps at another, with a short inter-stimulus interval (ISI) separating the second and third taps, is perceived as four taps hopping progressively along the arm: the second and third taps are perceptually displaced from their true positions, as if attracted toward one another (**Figure 1C**). This phenomenon is known as sensory saltation, or more famously, the cutaneous rabbit illusion (Geldard and Sherrick, 1972; Geldard, 1982). Analogous phenomena occur in vision (Geldard, 1976; Lockhead et al., 1980; Khuu et al., 2011) and audition (Bremer et al., 1977; Shore et al., 1998; Getzmann, 2009).

Why does time influence space perception in this manner? Much research supports the view that perception works out a probabilistic best guess. An optimal probabilistic (i.e., Bayesian) observer interprets the current sensory input, not in isolation, but rather within the context of the structure and statistics of the natural world (Knill and Pouget, 2004; Vilares and Kording, 2011). By exploiting its knowledge of the world, the observer achieves a more accurate perceptual inference. Following the Bayesian model of Goldreich (2007), we hypothesize that perception interprets successive taps to the skin as arising from a moving object that touches down intermittently, and that perception expects slowly moving objects to occur more often than rapidly moving ones. We speculate that the expectation for slow movement results from a lifetime of experience with tactile stimuli that are primarily stationary (e.g., the pressure of clothing against the skin) or – somewhat less frequently – slowly moving (e.g., grooming, movement of clothing during walking, etc.). Thus, in the observer's experience, stimuli separated by large distances at short ISI are uncommon. Faced with such a stimulus sequence, and somewhat uncertain as to the true locations of the taps, the brain concludes that the sensory measurements were caused by a stimulus sequence that was more probable *a priori*: one that moved at a slower speed (i.e., shorter distance) on the skin. Under this view, the influence of time over space perception, far from reflecting a design flaw in our perceptual machinery, is a consequence of optimal probabilistic inference under conditions of sensory uncertainty.

of the Bayesian model.

subsequent figures, we illustrate stimulus sequences that progress distally on the arm; the illusions occur also for stimuli in the opposite direction. **(A)** Top: at short ISI (t), the perceived length (l\*) between two taps to the forearm is less than the actual length (l). Bottom: perceived length grows linearly with actual length, but with a slope less than 1. Filled circles: human perceptual data from Marks et al. (1982) for electrocutaneous stimuli delivered at t = 0.24 s. Solid line: fit of the Bayesian model. Dashed line: l = l\*. **(B)** Top: a pair of taps delivered to the right forearm at short ISI (t<sub>2</sub>) is perceived to have the

Here, we present and elaborate on the Bayesian observer model introduced by Goldreich (2007). We show that our model is compatible with the view that the rabbit illusion – and perceptual length contraction generally – involves concomitant pre- and postdiction. By prediction, we mean an inference process in which earlier sensory events influence the perception of later ones. By postdiction, we mean an inference process in which later sensory events influence the perception of earlier ones (Eagleman and Sejnowski, 2000). We show interestingly that pre- and postdiction emerge naturally from our model, even though the model does not explicitly represent these processes. We show further that directed spatial attention shifts the Bayesian observer's percept by modulating the prediction-postdiction balance. Finally, we apply our Bayesian model to the perception of spatiotemporal stimulus patterns that are more complex than those depicted in **Figure 1**.

## **THE FUNDAMENTALS OF THE BAYESIAN OBSERVER**

two skin sites are perceived as hopping sequentially along the arm, because the short ISI (t) between taps 2 and 3 results in contraction of the perceived distance between them (l\* < l). Bottom: the perceived length from taps 2–3 asymptotically approaches the actual length (l = 10 cm, dashed line) as ISI is increased. Filled circles: human perceptual data from Kilgard and Merzenich (1995). Curve: fit

Stochastic variability in stimulus-evoked neural activity presents one of many challenges to perception. An identical repeated stimulus – such as a tap to a particular location on the skin – will evoke a different neural response on each trial (Sripati et al., 2006). Consequently, a given response could have been caused by a stimulus at any one of many locations. The spatial uncertainty caused by stochastic variability is lessened, but not eliminated, when a stimulus activates a larger number of neurons. On the forearm, where receptor density is relatively low, humans can localize a stimulus to within about ±1 cm of its true location; on the fingertip, where receptor density is much higher, localization improves to about ±1 mm (Weinstein, 1968).

To model stochastic neural variability, we assume that a single tap to the skin evokes an internal position *measurement* that is randomly sampled from a Gaussian distribution centered at the true tap position, with a standard deviation, σ*<sup>s</sup>* , that depends on the receptor density (the subscript *s* signifies "spatial")<sup>1</sup> . On repeated trials with an identical tap position, the measurement will vary stochastically, but on average will equal the true position. In the absence of any other perceptual influence, the measurement is the location the observer perceives. Consequently, on average the perception of an isolated single tap to the skin is veridical. However, unlike an isolated single tap, a rapid spatiotemporal tap sequence is not veridically perceived (**Figure 1**). To understand why, we explore a probabilistic model – a Bayesian observer that makes a perceptual best guess.

We begin by considering sequences of two taps, which result in two uncertain spatial measurements (*x*1*m*, *x*2*m*) and a detected time, *t*, between them<sup>2</sup> . The Bayesian observer (**Figure 2**) attempts

<sup>2</sup>We assume here that the observer veridically perceives the time between taps, such that temporal uncertainty is zero. Goldreich (2007) showed that temporal

to infer the actual tap positions (*x*1, *x*2) that produced the measurements (*x*1*m*, *x*2*m*). We refer to each possible (*x*1, *x*2) pair as a *candidate trajectory*, and to the measured positions (*x*1*m*, *x*2*m*) as the *measured trajectory*. The Bayesian observer considers both the *likelihood* and the *prior probability* of every candidate trajectory. A trajectory's likelihood is the probability that the trajectory would give rise to the measured trajectory. The plot of trajectory likelihoods – the likelihood function – is a cloud of uncertainty centered on the measured trajectory (**Figure 2A**, top). We analogize the likelihood function to a (typically unconscious) *sensation* – a precursor to the conscious percept.

A trajectory's prior probability is the frequency with which the observer expects the trajectory to occur; this may be the prevalence of the trajectory in nature, which the observer has learned from experience. The plot of prior probabilities – the prior density – represents the observer's *expectation* regarding trajectory occurrence. Crucially, our Bayesian observer believes that slow trajectories are more common than fast ones. We model this low-speed prior

uncertainty exerts a negligible effect on the percept when stimuli occur on a skin region with poor spatial acuity, such as the forearm. Accordingly, here we confine ourselves to modeling stimuli on the forearm, which is also the skin region most often tested in experimental studies of the cutaneous tau and rabbit illusions.

**FIGURE 2 | Bayesian model**. **(A)** The observer's likelihood function, prior probability density, and posterior probability density in response to taps sensed (i.e., measured by the observer) at positions (x <sup>1</sup><sup>m</sup> , x <sup>2</sup><sup>m</sup> ) = (3, 7 cm) (open red circles in all plots). Each pixel in the intensity plots represents a candidate trajectory: a possible tap 1 position and tap 2 position pair (x <sup>1</sup>, x <sup>2</sup>). Lighter color indicates higher probability (each plot is individually auto-scaled to take advantage of the full brightness range). The measured trajectory length is l<sup>m</sup> = x <sup>2</sup><sup>m</sup> − x <sup>1</sup><sup>m</sup> = 4 cm. Top: the observer's likelihood function plots the probability of the measured trajectory given each candidate trajectory. The observer understands that a single tap at any location produces a measurement drawn from a Gaussian distribution centered at that location, with standard deviation σ<sup>s</sup> ; thus, the likelihood function is a two-dimensional Gaussian density centered on the measured trajectory. Middle: the observer expects slow movement to occur more commonly; we model this expectation as a Gaussian distribution over trajectory speed, with mean zero and standard deviation, σ<sup>v</sup> . Consequently, the observer expects closely spaced taps, and its prior is maximal along the x <sup>1</sup> = x <sup>2</sup> diagonal. Bottom: the posterior probability of each trajectory is proportional to the product of its

likelihood and prior. The mode of the posterior (filled red circle) is the percept. **(B)** Space-time plots equivalently illustrate the inference process. Top: open red circles show measured tap positions (vertical-axis) and times of occurrence (horizontal-axis). Error bars (±1σ<sup>s</sup> ) represent the spatial imprecision of the measurements. The slope of the line connecting the taps is the measured trajectory speed: l<sup>m</sup> /t = 4 cm/0.15 s = 27 cm/s. Middle: the observer's low-speed expectation is represented by the line of slope zero and diagonal lines of slopes ±1σ<sup>v</sup> = ±10 cm/s. The distance traversed at speed σ<sup>v</sup> in time t is tσ<sup>v</sup> = 1.5 cm. The ascending diagonal line is shallower than the measured velocity: 10 cm/s < 27 cm/s. Equivalently, tσ<sup>v</sup> = 1.5 cm < l<sup>m</sup> = 4 cm. Thus, the measured trajectory violates the observer's low-speed expectation. Bottom: the perceived trajectory (filled red circles and red line) is a compromise between the measured trajectory (open circles, reproduced from top panel) and expectation (middle panel). Each tap has migrated perceptually by 1 cm toward the other, resulting in perceptual length contraction: l\* = 2cm < l<sup>m</sup> = 4 cm. The perceived trajectory speed is l\*/t = 2 cm/0.15 s = 13 cm/s. In both panels, σ<sup>s</sup> = 1 cm, σ<sup>v</sup> = 10 cm/s, t = 0.15 s, x <sup>1</sup><sup>m</sup> = 3 cm, x <sup>2</sup><sup>m</sup> = 7 cm.

<sup>1</sup>Neuroscientists may find it useful to conceive of the measurement as the location of the peak of evoked activity in the underlying receptor population (or its cortical equivalent), or more precisely as the maximal likelihood estimate of stimulus location, based on the neural response.

as a Gaussian density over trajectory speed, with mean zero and standard deviation σ*<sup>v</sup>* (the subscript *v* signifies "velocity"). Thus, trajectories in which the two taps are spaced closer together (i.e., lower-speed trajectories) have greater prior probability than those in which the taps are spaced farther apart (**Figure 2A**, middle).

Using Bayes' rule, the observer multiplies each trajectory's likelihood by its prior probability to obtain its posterior (final) probability. In essence, the observer combines *sensation* with *expectation* to achieve *perception*. The mode of the posterior distribution – the most probable trajectory – is the observer's percept (**Figure 2A**, bottom). Because of the low-speed prior, the percept underestimates the distance between rapidly presented stimuli. In the example illustrated, whereas the measured tap positions were (3, 7 cm), the percept was (4, 6 cm). The perceived distance between taps (*l* <sup>∗</sup> = 2 cm) was thus half the measured distance (*l<sup>m</sup>* = 4 cm) (**Figures 2A,B**).
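The inference illustrated in **Figure 2** can be sketched numerically. The following is a minimal illustration, not the authors' program: it evaluates the likelihood and the low-speed prior on a grid of candidate trajectories and takes the posterior mode as the percept. The parameter values are those of the Figure 2 example; the grid range and resolution are assumptions made here for illustration.

```python
import numpy as np

# Values taken from the Figure 2 example in the text.
sigma_s = 1.0         # spatial uncertainty (cm)
sigma_v = 10.0        # width of the low-speed prior (cm/s)
t = 0.15              # time between taps (s)
x1m, x2m = 3.0, 7.0   # measured tap positions (cm)

# Candidate trajectories: every (x1, x2) pair on a 0.01 cm grid
# (the grid range and step are assumptions of this sketch).
grid = np.linspace(0.0, 10.0, 1001)
x1, x2 = np.meshgrid(grid, grid, indexing="ij")

# Log-likelihood: independent Gaussian measurement noise on each tap.
log_like = -((x1 - x1m) ** 2 + (x2 - x2m) ** 2) / (2 * sigma_s ** 2)
# Log-prior: zero-mean Gaussian over trajectory speed (x2 - x1) / t.
log_prior = -(((x2 - x1) / t) ** 2) / (2 * sigma_v ** 2)
# Bayes' rule: the posterior is proportional to likelihood times prior.
log_post = log_like + log_prior

# The percept is the mode of the posterior.
i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
x1_star, x2_star = grid[i], grid[j]
perceived_length = x2_star - x1_star
print(x1_star, x2_star, perceived_length)
```

With these settings the mode lies near (3.94, 6.06 cm), a perceived length of roughly 2.1 cm, which the text rounds to (4, 6 cm) and 2 cm.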

How, exactly, does the time between taps influence perceptual length contraction? This question is answered in **Figure 3**. Because speed is distance divided by time, the prior probability falls off more sharply with distance when the time between taps is short. While always maximal along the *x*<sup>1</sup> = *x*<sup>2</sup> diagonal, the prior widens as ISI increases (**Figure 3A**, left to right). As a consequence, perceptual length contraction is most pronounced at shorter ISIs; as ISI increases, the perceived distance between taps asymptotically approaches the measured distance (**Figure 3B**).

We have explained the influence of time on the Bayesian observer's perception of space, but what of the influence of space itself on space perception? In **Figure 4**, we find reassuringly that *l* ∗ varies linearly with *lm*, although length contraction ensures that the slope of the relationship is less than one.

#### **THE PERCEPTUAL LENGTH CONTRACTION FORMULA**

In the Section "The Bayesian model" in Appendix, we show that the Bayesian observer's posterior density is a two-dimensional Gaussian distribution. The mode of the posterior reveals a relationship between *l*<sup>∗</sup> and *l<sub>m</sub>*:

$$l^* = \frac{l_m}{1 + 2\left(\frac{\sigma_s}{\sigma_v t}\right)^2} \tag{1}$$

**FIGURE 3 | Time affects space perception**. **(A)** The columns display the observer's likelihood function, prior probability density, and posterior probability density on four trials in which the measured trajectory (open red circle in all plots) was x<sub>1m</sub> = 3 cm, x<sub>2m</sub> = 7 cm, and the time, t, between taps was (left to right) 0.05, 0.15, 0.25, and 0.35 s. Because the observer has a low-speed expectation, it most strongly expects the taps to fall close together when the time between them is short; thus, the narrowest prior distribution is found in the left column, and the prior distribution widens as t increases. The perceived trajectory (mode of the posterior, filled red circle) is pulled closer to the x<sub>1</sub> = x<sub>2</sub> diagonal when the prior is sharper. Therefore, the observer experiences more pronounced length contraction as t decreases. Conversely, as t increases, length contraction diminishes, and the perceived trajectory asymptotically approaches the measured trajectory (note diminishing distance between filled and open circles in the posterior plots as t increases). For all columns, σ<sub>s</sub> = 1 cm, σ<sub>v</sub> = 10 cm/s. **(B)** The perceived first and second tap positions (filled red circles), corresponding to the mode of each of the posterior plots above, are graphed along with the measured tap positions (dashed lines). The perceived distance between taps asymptotically approaches the measured distance as t increases (compare to **Figure 1C**, lower). **(C)** The amount of perceptual length contraction depends not only on t and σ<sub>v</sub> but also on σ<sub>s</sub>. Here we simulate a trial at t = 0.1 s for an observer whose spatial acuity is worse (σ<sub>s</sub> = 2 cm) than the observer in **(A)**. Although its posterior density is broader, this observer has the same percept (mode of the posterior) as the observer in **(A)** with t = 0.05 s (leftmost column in **(A)**). Note that the ratio of σ<sub>s</sub> to σ<sub>v</sub>t is identical (= 2) in the two cases. It is this ratio that determines the amount of perceptual length contraction.

Equation 1 is the perceptual length contraction formula, first reported by Goldreich (2007). Notice that, as we have seen, this formula predicts that *l* ∗ asymptotically approaches *l<sup>m</sup>* in the limit that *t* approaches infinity (**Figures 3A,B**), that the degree of length contraction is determined by the ratio of σ*<sup>s</sup>* to σ*vt* (**Figure 3C**), and that, at fixed *t*, *l* ∗ relates linearly to, but underestimates, *l<sup>m</sup>* (**Figure 4**).
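The properties of the length contraction formula listed above can be checked directly. The sketch below is an illustration only; the function name `perceived_length` and its default arguments are choices made here, with σ<sub>s</sub> = 1 cm and σ<sub>v</sub> = 10 cm/s taken from the Figure 3 settings.

```python
def perceived_length(l_m, t, sigma_s=1.0, sigma_v=10.0):
    """Eq. 1: perceived length (cm) for measured length l_m (cm) and ISI t (s)."""
    return l_m / (1.0 + 2.0 * (sigma_s / (sigma_v * t)) ** 2)


# The four inter-stimulus times of Figure 3A, with l_m = 4 cm:
# the perceived length grows toward l_m as t increases.
for t in (0.05, 0.15, 0.25, 0.35):
    print(f"t = {t:.2f} s -> l* = {perceived_length(4.0, t):.2f} cm")

# Figure 3C: two observers with the same ratio sigma_s / (sigma_v * t)
# experience the same amount of length contraction.
a = perceived_length(4.0, 0.05, sigma_s=1.0)
b = perceived_length(4.0, 0.10, sigma_s=2.0)
print(abs(a - b) < 1e-12)

# At fixed t, the percept is linear in l_m with a slope below 1.
slope = perceived_length(1.0, 0.15)
print(slope)
```

Because the formula depends on σ<sub>s</sub>, σ<sub>v</sub>, and t only through the ratio σ<sub>s</sub>/(σ<sub>v</sub>t), any combination that leaves the ratio unchanged leaves the percept unchanged.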

Because σ*<sup>s</sup>* and σ*<sup>v</sup>* occur only as a ratio in the length contraction formula, it is convenient to rewrite the formula as:

$$l^* = \frac{l_m}{1 + 2\left(\frac{\tau}{t}\right)^2} \tag{2}$$

where tau (τ), defined as σ<sub>s</sub>/σ<sub>v</sub>, has units of time, and is the model's single free parameter<sup>3</sup>. From Eq. 2 we see that tau is a time constant for space perception. The smaller the value of tau, the more rapidly the perceived length increases toward the measured length as inter-stimulus time increases: *l*<sup>∗</sup> = (1/3) *l<sub>m</sub>* when *t* = τ, and *l*<sup>∗</sup> = (2/3) *l<sub>m</sub>* when *t* = 2τ (**Figure 5A**). Thus, the larger the value of τ, the more susceptible the observer is to perceptual length contraction: for a given *t* and *l<sub>m</sub>*, an observer with a larger τ will perceive a shorter trajectory (**Figures 5A,B**).

To develop an intuition for these effects of tau, consider that the parameter can be rewritten:

$$\tau = \frac{\sigma_s}{\sigma_v} = \frac{1/\sigma_v}{1/\sigma_s} = \frac{\text{strength of low-speed expectation}}{\text{spatial acuity}} \tag{3}$$

Thus, tau reflects the strength of the observer's low-speed expectation relative to the observer's spatial acuity. Tau is large in an observer with poor spatial acuity (large σ*s*) and a strong expectation for slow movement (small σ*v*). This observer places trust in the low-speed expectation; the observer's perception is considerably length contracted. Tau is small in an observer with excellent spatial acuity (small σ*s*) and little expectation regarding movement speed (large σ*v*). This observer places trust in the measurement; the observer's perception is only modestly length contracted.

The perceptual length contraction formula closely fits human data from a variety of experiments (**Figure 1**; see also Goldreich, 2007 for additional data fits). The fit is particularly satisfying given that the formula has just a single free parameter. The best-fit τ values for the data displayed in **Figures 1A–C** were 0.21, 0.11, and 0.08 s. The larger τ for the **Figure 1A** fit may reflect the use of electrocutaneous stimuli by Marks et al. (1982), the source of the data plotted in **Figure 1A**. Electrical pulses tend to be more difficult to localize (larger σ<sub>s</sub>) than mechanical taps (Higashiyama and Hayashi, 1993), which were used to generate the data in **Figure 1B** (Lechelt and Borchert, 1977) and **Figure 1C** (Kilgard and Merzenich, 1995). Measures of point localization suggest that σ<sub>s</sub> is on the order of 1 cm in response to light mechanical stimuli on the forearm (Weinstein, 1968; Martikainen and Pertovaara, 2002; Cody et al., 2008); thus, taking τ = 0.1 s as a nominal value for mechanosensory perception on the forearm, we infer that σ<sub>v</sub> is on the order of 10 cm/s.

<sup>3</sup>We note for reference that Goldreich (2007) defined the model's free parameter as λ = σ<sub>v</sub>/σ<sub>s</sub>; thus, the lambda parameter in that paper is simply the reciprocal of the tau parameter.

length contraction. Dashed diagonal line: l\* = l<sub>m</sub>.

## **BAYESIAN PERCEPTION IS OPTIMAL BECAUSE IT IS BENEFICIALLY BIASED**

Before developing our model further, we pause to consider an important conceptual question: we have described the Bayesian observer as achieving an optimal perceptual inference, but we have also shown that the observer consistently underestimates the measured distance between taps. How can an observer be both biased and optimal? This important question applies to any Bayesian observer with a non-uniform prior distribution.

The short answer to the question is that bias is optimal when it accurately reflects the stimulus statistics. In a world in which slow trajectories are more common than fast ones (and, therefore, among trajectories with any given inter-stimulus time, *t*, short lengths are more common than long ones), an observer is justified in perceiving trajectories as shorter than measured. Paradoxically, then, the Bayesian observer is optimal precisely because it is biased.

To understand this thoroughly, we must appreciate the consequences of both measurement and stimulus variability. In **Figures 2–5** we artificially specified (*x*1*m*, *x*2*m*). In a laboratory experiment, however, the investigator can control only the stimulus, not the measurements. As explained, we conceive of each measured tap location as sampled from a Gaussian distribution of standard deviation σ*<sup>s</sup>* , centered on the actual tap location. Thus, if the skin is stimulated repeatedly with the identical trajectory, the measurement and consequently the percept will vary stochastically from trial to trial (**Figure 6**).

By incorporating measurement variability, the simulation shown in **Figure 6** is a more realistic representation of a laboratory experiment than are the simulations shown in the earlier Figures. Crucially for our understanding of the paradox of bias and optimality, however, **Figure 6** would be an unrealistic portrayal of the Bayesian observer's experience in the real-world. In the real-world, not only the measurements but also the trajectories themselves are drawn from a distribution. In **Figure 7**, we more closely simulate what we envision to be real-world tactile experience. The figure plots the lengths of one million trajectories sampled from a zero-mean velocity distribution (for clarity of illustration, all with *t* = 0.15 s), from each of which spatial measurements were sampled and processed into a percept.

A comparison of the statistics of the measured length, *l<sup>m</sup>* (**Figure 7A**) with those of the perceived length, *l* ∗ (**Figure 7B**) reveals that, although the observer's perception is biased, it is more accurate than the measurement. In fact, the observer's perception is optimal precisely because it is biased. To understand why, consider that the majority of these real-world trajectories have very short lengths (*l* close to zero). Because short trajectories are more common, any measured length, *lm*, most often originates from a trajectory of shorter true length, *l*. The Bayesian observer's percept is biased by the prior to take this crucial knowledge into account; consequently, over the course of many trials, the percept more closely reflects the true stimulus than the measurement does. This is indicated by the smaller vertical scatter of the percept (**Figure 7B**, left) than of the measurement (**Figure 7A**, left) around the diagonal line.

Further inspection of the scatterplot in **Figure 7A** reveals that, for any true trajectory length, *l*, the measurement, *l<sub>m</sub>*, occurs with equal frequency above and below the diagonal line. Thus, the histogram of *l<sub>m</sub>* samples is centered on *l* (**Figure 7A**, center). For this reason, the measured length is termed an "unbiased estimator" of the true length. Despite this lofty denomination, however, it is clear from the same scatterplot that for any magnitude *l<sub>m</sub>* other than 0, the distribution of true lengths has a smaller average magnitude (when *l<sub>m</sub>* > 0, *l* tends to lie to the left of the diagonal line; when *l<sub>m</sub>* < 0, *l* tends to lie to the right of the diagonal line). Thus, *l<sub>m</sub>* is an inaccurate estimator in the sense that the stimuli that result in a particular *l<sub>m</sub>* are on average offset from that *l<sub>m</sub>* (**Figure 7A**, right). If an observer were to report *l<sub>m</sub>* as the estimate of trajectory length, the observer would be found to systematically report trajectories as being longer than they actually are.

**Figure 7B** shows that the statistics of the perceived length, *l*<sup>∗</sup>, are opposite in character to those of the measured length. For any true trajectory length, *l*, the perceived length, *l*<sup>∗</sup>, systematically underestimates the magnitude of *l* (**Figure 7B**, left and center). Thus, the perceived length is termed a "biased estimator." This bias is beneficial, however: because of it, at any *l*<sup>∗</sup>, the distribution of true lengths is centered on a mean of *l*<sup>∗</sup> (the values of *l* are symmetrically distributed around the diagonal line in the scatterplot). Thus, *l*<sup>∗</sup> is an accurate estimator in the sense that the stimuli that result in a particular *l*<sup>∗</sup> indeed on average have length equal to that *l*<sup>∗</sup> (**Figure 7B**, right). The observer's report of *l*<sup>∗</sup> can be trusted as accurately reflecting, on average, the true trajectory length. Importantly, the variance of *l* given *l*<sup>∗</sup> (**Figure 7B**, right) is smaller than the variance of *l<sub>m</sub>* given *l* (**Figure 7A**, center). This again reveals that the percept is more accurate than the measurement.
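The argument that a biased percept can outperform an unbiased measurement can be explored with a small Monte Carlo simulation in the spirit of **Figure 7**. This is a sketch under stated assumptions, not a reproduction of the authors' simulation: it uses 200,000 trials rather than one million, a fixed random seed, and a regression slope as a convenient summary of whether the true lengths that produce a given estimate are, on average, equal to that estimate.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, chosen for repeatability
sigma_s, sigma_v, t = 1.0, 10.0, 0.15    # values used in the text's examples
n = 200_000                              # fewer trials than the text's one million

# True trajectory lengths: velocity drawn from the zero-mean prior, times t.
l_true = rng.normal(0.0, sigma_v, n) * t
# Measured lengths: each tap position carries independent Gaussian noise.
l_meas = l_true + rng.normal(0.0, sigma_s, n) - rng.normal(0.0, sigma_s, n)
# Perceived lengths from the length contraction formula (Eq. 1).
l_perc = l_meas / (1.0 + 2.0 * (sigma_s / (sigma_v * t)) ** 2)

# Mean squared error of each estimate relative to the true length.
mse_meas = np.mean((l_meas - l_true) ** 2)
mse_perc = np.mean((l_perc - l_true) ** 2)

# Slope of true length regressed on each estimate. A slope of 1 means the
# true lengths behind a given estimate average out to that estimate.
slope_meas = np.polyfit(l_meas, l_true, 1)[0]
slope_perc = np.polyfit(l_perc, l_true, 1)[0]

print(f"MSE measurement: {mse_meas:.2f}, MSE percept: {mse_perc:.2f}")
print(f"slope measurement: {slope_meas:.2f}, slope percept: {slope_perc:.2f}")
```

Under these assumptions the percept has the smaller mean squared error, and the regression slope for the percept is close to 1 while the slope for the measurement is well below 1, which matches the pattern described for the right-hand panels of **Figures 7A,B**.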

## **SELECTIVE SPATIAL ATTENTION SHIFTS THE PERCEIVED TRAJECTORY**

Up to this point, we have assumed that the observer's spatial uncertainty, σ<sub>s</sub>, is uniform within the tested area (σ<sub>s</sub> will, of course, differ between body areas, such as forearm and finger). However, spatial attention is associated with cortical receptive field recruitment and sharpening within the attended area (Anton-Erxleben and Carrasco, 2013). Thus, if an observer were to focus attention preferentially on one location, we might expect σ<sub>s</sub> to decrease there while plausibly increasing at unattended locations. Indeed, on the arm, the spatial error of localization decreases by as much as 30% when attention is directed to the stimulated skin region (Moore et al., 1999; O'Boyle et al., 2001).

If spatial acuity is modulated by selective attention, how might length contraction percepts be affected? In a cutaneous rabbit experiment, Kilgard and Merzenich (1995) found that when participants were not asked to focus their attention to any particular area of the arm, the midpoints of the perceived and actual trajectories tended to coincide (**Figure 8A**, left). In contrast, when participants were instructed to direct their attention either distally or proximally, the midpoint of the perceived trajectory shifted toward the attended location (**Figure 8A**, center, right). This occurred because the tap within the attended skin area migrated less perceptually than did the tap within the unattended area, an effect confirmed by Flach and Haggard (2006).

The Bayesian observer replicates this attention effect: when σ*<sup>s</sup>* decreases in one skin area relative to the other, the perceived trajectory midpoint shifts toward the attended location (**Figures 8B,C**). The relatively precise measurement of the "attended tap" impedes its perceptual migration, while the relatively imprecise measurement of the "unattended tap" facilitates its perceptual migration. In this situation, length contraction is accomplished primarily by the perceptual displacement of the unattended tap.

In the Section "Generalization to inhomogeneous spatial uncertainty" in Appendix, we derive a generalization of the length contraction formula that incorporates separate σ<sub>s1</sub> and σ<sub>s2</sub> values representing spatial uncertainty around the two tap locations.

In the general equation, the single spatial uncertainty, σ<sub>s</sub>, of Eq. 1 is replaced by the root-mean-square uncertainty at the two locations, σ<sub>s(rms)</sub>:

$$l^* = \frac{l_m}{1 + 2\left(\frac{\sigma_{s(\text{rms})}}{\sigma_v t}\right)^2} = \frac{l_m}{1 + \frac{\sigma_{s1}^2 + \sigma_{s2}^2}{\left(\sigma_v t\right)^2}} \tag{4}$$

We show further that the shift, Δ<sub>midpt</sub>, in the perceived trajectory midpoint away from the measured trajectory midpoint is:

$$
\Delta_{\text{midpt}} = \frac{l_m}{2} \left( \frac{\sigma_{s1}^2 - \sigma_{s2}^2}{(\sigma_v t)^2 + \sigma_{s1}^2 + \sigma_{s2}^2} \right) \tag{5}
$$
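To make the attention effect concrete, the generalized length contraction formula and the midpoint shift can be combined to recover both perceived tap positions. The sketch below is illustrative: the function name `attended_percept` is a choice made here, the stimulus values (3 and 7 cm, 0.06 s, σ<sub>v</sub> = 10 cm/s) follow the Figure 8 caption, and the attended value σ<sub>s</sub> = 0.7 cm is a hypothetical setting, not one reported in the text.

```python
def attended_percept(x1m, x2m, t, sigma_s1, sigma_s2, sigma_v=10.0):
    """Perceived tap positions and midpoint shift for unequal spatial uncertainty."""
    l_m = x2m - x1m
    v = (sigma_v * t) ** 2
    total = v + sigma_s1 ** 2 + sigma_s2 ** 2
    # Generalized length contraction formula (Eq. 4), rearranged.
    l_star = l_m * v / total
    # Shift of the perceived midpoint away from the measured midpoint (Eq. 5).
    shift = 0.5 * l_m * (sigma_s1 ** 2 - sigma_s2 ** 2) / total
    mid = 0.5 * (x1m + x2m) + shift
    return mid - 0.5 * l_star, mid + 0.5 * l_star, shift


# No attentional instruction: equal uncertainty, so the midpoint does not move.
print(attended_percept(3.0, 7.0, 0.06, 1.0, 1.0))

# Attention directed to the first tap's location (hypothetical sigma values):
# the midpoint shifts toward the first tap, which itself migrates less.
x1_star, x2_star, shift = attended_percept(3.0, 7.0, 0.06, 0.7, 1.0)
print(x1_star, x2_star, shift)
```

In the second case the shift is negative, meaning the perceived trajectory moves toward the attended (first) tap, and most of the length contraction is carried by the displacement of the unattended tap.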

## **THE PREDICTIVE-POSTDICTIVE FORMULATION**

The rabbit illusion is often described as providing compelling evidence for perceptual postdiction, a process whereby the perception of an earlier event is modified by the occurrence of a later one. Postdiction is indeed an attractive explanation for the perceptual migration of tap 2 toward the location of tap 3 in the rabbit illusion (**Figure 1C**). As shown by Kilgard and Merzenich (1995), tap 3 also migrates perceptually toward the location of tap 2 (**Figure 1C**). Therefore, prediction apparently is also at play: the perception of a later event (tap 3) depends upon an earlier one (tap 2).

In light of these considerations, it may seem surprising that our Bayesian observer replicates length contraction illusions without explicitly representing either pre- or postdictive inference. How is this possible? The answer is that pre- and postdiction are implicitly embedded in the model via the action of the low-speed prior. The low-speed prior transforms the observer's likelihood function into a posterior density by pulling the observer's perception of each tap position toward the measured position of the other (**Figure 2**).

We can reveal the pre- and postdiction hidden in the Bayesian observer by decomposing the model's two-dimensional (*x*1, *x*2) calculations (**Figure 9A**) into a series of one-dimensional inferences regarding each tap's position individually (**Figure 9B**). Using its low-speed expectation, the observer can from the first tap's likelihood function predict a probability distribution over the position of the subsequent, second, tap, and from the second tap's likelihood function postdict a probability distribution over the

**FIGURE 8 | Modeling the effects of spatial attention**. **(A)** Depiction of a cutaneous rabbit illusion experiment reported by Kilgard and Merzenich (1995). Participants either received no specific instruction or were instructed to direct their attention (yellow highlight) toward the proximal or distal forearm. The investigators found that in the directed attention conditions, the perceived positions of tap 2 (green) and tap 3 (blue) were shifted toward the attended location (forearm sketches). **(B)** In the Bayesian observer, a reduction in σ<sup>s</sup> at the attended relative to the unattended location reproduces the perceptual shift reported by Kilgard and Merzenich (1995). Left panel: the Bayesian observer's likelihood function, prior and posterior density when σ<sup>s</sup> does not vary with location, simulating the no-instruction condition in **(A)**. In

this case, the perceived and measured trajectory midpoints coincide. Center two panels: effect of σsp < σsd, where the subscripts p and d refer to the proximal and distal arm areas. The greater the reduction of σsp relative to σsd, the more the perceived trajectory migrates proximally toward the tap 2 measurement. Right two panels: effect of σsd < σsp. The greater the reduction of σsd relative to σsp, the more the perceived trajectory migrates distally toward the tap 3 measurement. For all plots in **(B)**, the measurements (x <sup>2</sup><sup>m</sup> , x <sup>3</sup><sup>m</sup> ) were (3, 7 cm), the time between taps 2 and 3 was 0.06 s, and σ<sup>v</sup> was 10 cm/s. **(C)** The perceived (mode of posterior) tap 2 and 3 positions (green and blue circles) for each of the five conditions in **(B)** directly above, compared to the measured tap positions (dashed lines).

position of the previous, first, tap (arrows in **Figure 9B**). We call these two distributions the *predicted prior* and *postdicted prior* densities<sup>4</sup> .

Next, the observer simply multiplies each tap's likelihood function by that tap's prior to obtain the posterior density over the tap's position. We show in the Sections "One-dimensional reductions" and "The prediction-postdiction formulation" in Appendix that the posteriors so obtained are identical to those that would result from extracting one-dimensional distributions from the joint (*x*<sub>1</sub>, *x*<sub>2</sub>) posterior: if the joint posterior (**Figure 9A**, bottom) were marginalized (i.e., integrated) vertically, it would yield the posterior over *x*<sub>1</sub> shown in **Figure 9B**, bottom left; if integrated horizontally, it would yield the posterior over *x*<sub>2</sub> shown in **Figure 9B**, bottom right.

In the Section "The prediction-postdiction formulation" in Appendix, we show that the predicted and postdicted priors are Gaussian densities, and that their means and variances are:

$$\begin{aligned} \mu_{\text{pre}} &= x_{1m} & \mu_{\text{post}} &= x_{2m} \\ \sigma_{\text{pre}}^2 &= \sigma_{s1}^2 + (\sigma_v t)^2 & \sigma_{\text{post}}^2 &= \sigma_{s2}^2 + (\sigma_v t)^2 \end{aligned} \tag{6}$$
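The one-dimensional formulation can be sketched as follows, using the standard result that the product of two Gaussian densities is itself Gaussian, with a precision-weighted mean. This is an illustration rather than the authors' implementation; the helper name `product_of_gaussians` is a choice made here, and the parameter values are those of the Figure 9 example.

```python
def product_of_gaussians(mu_a, var_a, mu_b, var_b):
    """Mean and variance of the normalized product of two 1-D Gaussians."""
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


x1m, x2m, t = 3.0, 7.0, 0.15      # measured positions (cm) and ISI (s)
sigma_s1 = sigma_s2 = 1.0         # spatial uncertainty at each tap (cm)
sigma_v = 10.0                    # low-speed prior width (cm/s)

# Predicted prior over tap 2 and postdicted prior over tap 1 (Eq. 6).
mu_pre, var_pre = x1m, sigma_s1 ** 2 + (sigma_v * t) ** 2
mu_post, var_post = x2m, sigma_s2 ** 2 + (sigma_v * t) ** 2

# Posterior for each tap: its own likelihood times the prior from the other tap.
x1_star, _ = product_of_gaussians(x1m, sigma_s1 ** 2, mu_post, var_post)
x2_star, _ = product_of_gaussians(x2m, sigma_s2 ** 2, mu_pre, var_pre)

# Perceived length from the length contraction formula, for comparison.
l_eq1 = (x2m - x1m) / (1.0 + 2.0 * (sigma_s1 / (sigma_v * t)) ** 2)
print(x1_star, x2_star, x2_star - x1_star, l_eq1)
```

The two one-dimensional posteriors give a perceived length that agrees with the length contraction formula, consistent with the statement that the one-dimensional inferences are equivalent to the joint two-dimensional calculation.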

<sup>4</sup>Note that "prior" in the Bayesian context does not imply "before" the stimulus occurs, but rather "independent of the measurement." The predicted prior over tap 2's position is constructed using all knowledge available to the observer except the tap 2 measurement, *x*2*m*. Similarly, the postdicted prior over tap 1's position is constructed using all knowledge available to the observer except the tap 1 measurement, *x*1*m*.

**FIGURE 9 | Prediction-postdiction formulation**. **(A)** The observer's two-dimensional joint (x <sup>1</sup>, x <sup>2</sup>) likelihood function, prior and posterior densities. The measured trajectory was x <sup>1</sup><sup>m</sup> = 3 cm, x <sup>2</sup><sup>m</sup> = 7 cm, with t = 0.15 s. The observer settings were σ<sup>s</sup> = 1 cm, σ<sup>v</sup> = 10 cm/s. **(B)** The inference process in **(A)** reformulated as a series of one-dimensional inferences regarding x <sup>1</sup> and x <sup>2</sup> individually. Top left: the tap 1 likelihood function (red), p(x <sup>1</sup><sup>m</sup> | x <sup>1</sup>), is centered on x <sup>1</sup><sup>m</sup>. Because of its low-speed expectation, the observer predicts (red arrow) that the most probable position for a future tap 2 will also be 3 cm. Middle right: the observer's predicted prior over tap 2 (light red) represents its belief concerning the position of tap 2, projected 150 ms forward in time from the occurrence of tap 1. Top right: the observer's tap 2 likelihood function (blue), p(x <sup>2</sup><sup>m</sup> | x <sup>2</sup>), is centered on x <sup>2</sup><sup>m</sup>. Because of its low-speed expectation, the observer postdicts (blue arrow) that the most probable position for the preceding tap 1 was also 7 cm. Middle left: the observer's postdicted prior over tap 1 (light blue) represents its belief concerning the position of tap 1, projected 150 ms backward in time from the occurrence of tap 2. Left column: using Bayes' theorem, the observer multiplies the tap 1 likelihood function (red) by the tap 1 postdicted prior (light blue) to obtain the tap 1 posterior (purple). Right column: similarly, the observer multiplies the tap 2 likelihood function (blue) by the tap 2 predicted prior (light red) to obtain the tap 2 posterior (purple). **(C)** Individual tap likelihoods, priors, and posteriors graphed with the same color scheme as in **(B)**, for three trajectories of progressively increasing ISI. At t = 0.05 s, pre- and postdiction both result in relatively sharp priors that exert a strong influence over the percept (mode of the posterior). As t is increased, the pre- and postdicted priors become lower and broader: pre- and postdiction become increasingly uncertain with the passage of time. The priors thus exert diminishing influence, and the percept approaches the measurement (compare to **Figure 3A**). For all panels in **(C)**, σ<sup>s</sup> = 1 cm, σ<sup>v</sup> = 10 cm/s. **(D)** Effect of directed spatial attention, as in **Figure 8**. Top: a reduction in σs<sup>1</sup> sharpens the tap 1 likelihood function, increasing the strength of prediction (note sharp predicted prior over tap 2), while an increase in σs<sup>2</sup> broadens the tap 2 likelihood function, decreasing the strength of postdiction (note broad postdicted prior over tap 1). Middle: when σs<sup>1</sup> = σs<sup>2</sup>, pre- and postdiction have equal strength. Bottom: reduction in σs<sup>2</sup> relative to σs<sup>1</sup> results in effects opposite those seen in the top panel. For all panels in **(D)**, t = 0.06 s, σ<sup>v</sup> = 10 cm/s.

Equations 6 show that the prior density over each tap's position is centered on the measurement of the other tap, reflecting the observer's low-speed expectation (the most probable speed being zero). The variance of each prior density reflects the observer's uncertainty regarding the other tap's measurement (σ*s*<sup>1</sup> or σ*s*<sup>2</sup>) and the observer's prior uncertainty regarding trajectory speed (σ*v*), which translates into an increasing uncertainty regarding the distance traversed as the elapsed time, *t*, increases (σ*vt*). Thus, perceptual length contraction diminishes with increasing *t* (**Figure 9C**), as shown previously (**Figures 3** and **5A**).
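The one-dimensional inference described by Eqs 6 can be sketched numerically. The following is a minimal illustration, not the authors' program: it assumes Gaussian likelihoods and priors, and the function names (`product_of_gaussians`, `two_tap_percept`) are hypothetical. It uses the **Figure 9** settings (3 and 7 cm, t = 0.15 s, σs = 1 cm, σv = 10 cm/s) and compares the perceived length with the closed form l/(1 + 2(τ/t)²), τ = σs/σv, which follows algebraically from the Gaussian products when σs1 = σs2.

```python
def product_of_gaussians(mu_a, var_a, mu_b, var_b):
    """Mean and variance of the normalized product of two Gaussian densities."""
    precision = 1.0 / var_a + 1.0 / var_b
    mean = (mu_a / var_a + mu_b / var_b) / precision
    return mean, 1.0 / precision


def two_tap_percept(x1m, x2m, t, sigma_s1, sigma_s2, sigma_v):
    """Perceived (posterior-mode) positions of taps 1 and 2 (hypothetical helper)."""
    drift_var = (sigma_v * t) ** 2
    var_pre = sigma_s1 ** 2 + drift_var   # predicted prior over tap 2, centered on x1m
    var_post = sigma_s2 ** 2 + drift_var  # postdicted prior over tap 1, centered on x2m
    # Posterior = likelihood x prior, separately for each tap.
    x1_star, _ = product_of_gaussians(x1m, sigma_s1 ** 2, x2m, var_post)
    x2_star, _ = product_of_gaussians(x2m, sigma_s2 ** 2, x1m, var_pre)
    return x1_star, x2_star


# Figure 9 settings: measurements in cm, time in s, sigma_v in cm/s.
x1_star, x2_star = two_tap_percept(3.0, 7.0, 0.15, 1.0, 1.0, 10.0)
perceived_length = x2_star - x1_star

tau = 1.0 / 10.0  # sigma_s / sigma_v, in seconds
closed_form = (7.0 - 3.0) / (1.0 + 2.0 * (tau / 0.15) ** 2)

print(f"x1* = {x1_star:.3f} cm, x2* = {x2_star:.3f} cm")
print(f"perceived length = {perceived_length:.3f} cm (closed form {closed_form:.3f} cm)")
```

With these settings each tap is drawn about 0.94 cm toward the other, so the 4-cm measured trajectory is perceived as roughly 2.12 cm long.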

**Figure 9D** shows that the predictive-postdictive formulation accurately reproduces the effects of directed spatial attention, previously explored in **Figure 8**. When attention is directed around the location of the first tap (σ*s*<sup>1</sup> < σ*s*<sup>2</sup>), the predicted prior is sharper than the postdicted prior ($\sigma_{\text{pre}}^2 < \sigma_{\text{post}}^2$). Consequently, prediction exerts a dominant influence, perceptually displacing the second tap asymmetrically toward the first (**Figure 9D**, top). When attention is directed around the location of the second tap (σ*s*<sup>2</sup> < σ*s*<sup>1</sup>), the postdicted prior is sharper ($\sigma_{\text{post}}^2 < \sigma_{\text{pre}}^2$). In this case, postdiction dominates, perceptually displacing the first tap asymmetrically toward the second (**Figure 9D**, bottom).
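The asymmetry can be made explicit with a short worked derivation. Multiplying each Gaussian likelihood (mean $x_{im}$, variance $\sigma_{si}^2$) by the corresponding prior of Eqs 6 gives the posterior modes below. These expressions are derived here from Eqs 6 under the Gaussian assumptions; they are not copied from Eqs 4 and 5, and $x_1^*$, $x_2^*$ denote the perceived positions.

$$\begin{aligned} x_1^* &= x_{1m} + \frac{\sigma_{s1}^2}{\sigma_{s1}^2 + \sigma_{s2}^2 + (\sigma_{v}t)^2}\,(x_{2m} - x_{1m}) \\ x_2^* &= x_{2m} - \frac{\sigma_{s2}^2}{\sigma_{s1}^2 + \sigma_{s2}^2 + (\sigma_{v}t)^2}\,(x_{2m} - x_{1m}) \\ \frac{x_1^* + x_2^*}{2} &= \frac{x_{1m} + x_{2m}}{2} + \frac{\sigma_{s1}^2 - \sigma_{s2}^2}{2\left[\sigma_{s1}^2 + \sigma_{s2}^2 + (\sigma_{v}t)^2\right]}\,(x_{2m} - x_{1m}) \end{aligned}$$

When $\sigma_{s1} < \sigma_{s2}$, tap 1 moves less than tap 2 and the perceived midpoint shifts toward the tap 1 measurement; the reverse holds when $\sigma_{s2} < \sigma_{s1}$.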

#### **THE PERCEPTION OF MULTI-TAP SEQUENCES**

Up to this point, we have modeled the perception of two-tap trajectories<sup>5</sup>. How might a Bayesian observer handle multi-tap sequences, delivered conceivably to any number of skin sites? An observer could apply a low-speed prior independently to the movement between each tap and the next one. Alternatively, an observer might apply a low-speed prior to the first tap pair of the sequence, but thereafter incorporate an expectation that the velocity of each pair be similar to that of the preceding pair: a low-acceleration prior (see "Multi-tap perception" in Appendix).

Here, we test each of these Bayesian observers with multi-tap sequences that produce illusions in humans. We consider two well-known illusions. The first is the tau effect, so-named by Helson (1930) and subsequently described in elegant detail by Helson and King (1931). The second is a multi-tap rabbit, characterized in a delightful paper by Geldard (1982). In **Figures 10** and **11**, we show that the observer with a low-speed prior produces good fits to the human perceptual data; in **Figure 12**, we show that the observer with a low-acceleration prior does not.

In the tau effect experiment, taps at three skin positions define two spatial and two temporal intervals (**Figure 10**). Helson and King (1931) reported that, when *t*<sup>2</sup> = *t*<sup>1</sup> and *l*<sup>2</sup> = *l*<sup>1</sup>, the participants perceived the two lengths as equal: *l*<sup>2</sup>\* = *l*<sup>1</sup>\*. As *t*<sup>2</sup> was progressively reduced, however, tap 3 had to be located progressively farther down the arm (i.e., *l*<sup>2</sup> had to be progressively increased) in order to make *l*<sup>2</sup>\* equal *l*<sup>1</sup>\* (**Figures 10B,C**). The best fit of our low-speed-prior observer to the average of the human data occurred at τ = 0.10 s. The Bayesian observer closely replicated the space-time curve characterizing human perception (**Figure 10C**).
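The tau effect computation can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' fitting procedure: it assumes Gaussian likelihoods, an independent low-speed prior on each consecutive tap pair, and τ = 0.10 s realized as σs = 1 cm and σv = 10 cm/s. The helper names (`perceive`, `length_difference`, `pse`) and the particular *t*<sup>2</sup> values are hypothetical. Because all densities are Gaussian, the posterior mode is the solution of a small linear system.

```python
import numpy as np


def perceive(measured, intervals, sigma_s=1.0, sigma_v=10.0):
    """Posterior mode for a tap sequence with a low-speed prior on each tap pair."""
    z = np.asarray(measured, dtype=float)
    n = z.size
    A = np.eye(n) / sigma_s ** 2            # precision from each tap's likelihood
    for i, t in enumerate(intervals):
        k = 1.0 / (sigma_v * t) ** 2        # precision of the prior on x[i+1] - x[i]
        A[i, i] += k
        A[i + 1, i + 1] += k
        A[i, i + 1] -= k
        A[i + 1, i] -= k
    return np.linalg.solve(A, z / sigma_s ** 2)


def length_difference(x3, t2, t1=0.5):
    """Perceived l2* - l1* for taps measured at 0 cm, 3 cm, and x3."""
    p = perceive([0.0, 3.0, x3], [t1, t2])
    return (p[2] - p[1]) - (p[1] - p[0])


def pse(t2):
    """x3 at which l2* = l1*. The difference is linear in x3, so two points suffice."""
    a, b = 3.0, 12.0
    f_a, f_b = length_difference(a, t2), length_difference(b, t2)
    return a - f_a * (b - a) / (f_b - f_a)


for t2 in (0.5, 0.4, 0.3, 0.2):            # illustrative t2 settings (s)
    print(f"t2 = {t2:.1f} s -> x3 at subjective equality = {pse(t2):.2f} cm")
```

When *t*<sup>2</sup> = *t*<sup>1</sup> = 0.5 s the sketch returns 6 cm, and the required tap 3 position grows as *t*<sup>2</sup> shrinks, which is the qualitative signature of the tau effect.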

**FIGURE 10 | The tau effect**. **(A)** Three taps to the arm, at positions x <sup>1</sup> = 0 cm, x <sup>2</sup> = 3 cm, and x <sup>3</sup> (variable), define two spatial intervals, l <sup>1</sup> = 3 cm and l <sup>2</sup> (variable), and two temporal intervals, t <sup>1</sup> = 0.5 s and t <sup>2</sup> (variable). Because t <sup>2</sup> < t <sup>1</sup>, at some l <sup>2</sup> > l <sup>1</sup> the two intervals will be perceived to be of equal length (l <sup>2</sup>\* = l <sup>1</sup>\*). **(B)** At each of five t <sup>2</sup> settings (identified at right of plots), Helson and King (1931) progressively increased l <sup>2</sup> by shifting x <sup>3</sup> along the arm in 0.5-cm increments. On each trial, the participant reported whether the second spatial interval was perceived to be shorter than, equal to, or longer than the first interval. To accurately estimate each participant's point of subjective equality (PSE), we transformed these data into a two-alternative forced-choice format by distributing the participant's "equal" responses evenly to the "shorter" and "longer" response categories. We then fit each participant's transformed data (proportion "l <sup>2</sup> is longer" responses) at each t <sup>2</sup> setting with a Weibull psychometric function (blue curves). Each psychometric function provides a PSE (vertical line): the x <sup>3</sup> at which the psychometric function intersected 0.5 (horizontal line), indicating that l <sup>2</sup>\* = l <sup>1</sup>\*. The PSE shifted progressively to the left as t <sup>2</sup> was increased (note: when x <sup>3</sup> = 6 cm, l <sup>2</sup> actually does equal l <sup>1</sup>). The transformed data shown are from one participant ("Observer C") in Helson and King (1931). **(C)** Trajectories for which l <sup>2</sup>\* = l <sup>1</sup>\*. 
Blue points: mean x <sup>3</sup> that resulted in l <sup>2</sup>\* = l <sup>1</sup>\* among the six participants tested by Helson and King (1931), at each of the five t <sup>2</sup> settings. Blue lines: ±1 SD. Red points: best-fit performance of the Bayesian low-speed observer (τ = 0.10 s).

In the 15-tap rabbit experiment, five taps are delivered consecutively at each of three positions along the arm (**Figure 11**). Geldard (1982) found that when the time between consecutive taps was 0.05 s, participants perceived the first 10 taps in the sequence as hopping at an approximately uniform rate up the arm, each tap displaced by a constant spatial increment from the preceding one (**Figures 11A,B**, center). At an ISI of 0.3 s, perception was reportedly veridical (**Figure 11B**, left). At an ISI of 0.02 s, the perceived sequence began partway up the arm and traced a non-linear, somewhat sigmoidal path (**Figure 11B**, right).

The low-speed-prior observer's perception with τ = 0.10 s agrees qualitatively with the perception of human participants (**Figure 11C**). To understand why, first note that, at an ISI of 0.05 s (**Figure 11C**, center) or 0.02 s (**Figure 11C**, right), the rapid jumps in the stimulus sequence are in clear violation of the observer's low-speed expectation (see diagonal dotted lines with slope σ*v*). Consequently, perceptual length contraction occurs for those tap pairs: the perceived distance between taps 5 and 6, and between taps 10 and 11, is considerably smaller than the actual distance.

<sup>5</sup>Although we have encountered a four-tap rabbit experiment (**Figures 1C** and **8**), our approach was to consider the first and fourth taps as mere reference points, so we modeled the perception of taps 2 and 3 only. Indeed, the first and fourth taps in that sequence do not interact perceptually with the second and third, from which they are separated by large ISIs.

**FIGURE 11 | The 15-tap rabbit illusion**. **(A)** Geldard (1982) delivered five taps at each of three locations along the arm. When ISI between successive taps was 0.05 s, participants reported perceiving a linear spatial progression of taps 1 through 10 (forearm sketch). **(B)** The same spatial sequence shown in **(A)**, at three different ISIs, resulted in distinct percepts (Geldard, 1982). Left: at 0.3 s ISI, perception was veridical. Center: at 0.05 s ISI, perception was as shown in **(A)**. Right: at 0.02 s ISI, the taps were perceived to begin at a position between 2 and 3 cm along the arm, and to advance in a non-linear spatial progression. Open circles: true tap positions; blue points: human perceptual report. **(C)** The Bayesian low-speed observer's perception with a standard setting of τ = 0.10 s (e.g., σ<sup>s</sup> = 1 cm, σ<sup>v</sup> = 10 cm/s) shows much similarity to participants' subjective reports. Open circles: true tap positions; red points: Bayesian observer's perception (mode of the posterior). Dashed slanted lines have slope 10 cm/s (i.e., 1σ<sup>v</sup>). Note that the two rapid jumps in the true trajectory (from tap 5 to tap 6, and from tap 10 to tap 11) occur at a speed much greater than σ<sup>v</sup> when the ISI is 0.05 s (center) or 0.02 s (right); thus, perceptual length contraction occurs in these cases. In contrast, at an ISI of 0.3 s (left), the trajectory does not strongly violate the observer's low-speed expectation; thus, perception is nearly veridical. **(D)** The Bayesian low-speed observer's perception can be made even closer to human reports if the value of σ<sup>s</sup> varies along the arm. The observer's percept at each ISI is shown for σ<sup>s</sup> = 1, 2, and 0.5 cm around the proximal, middle, and distal arm regions, respectively. Line segments at right have length equal to 1σ<sup>s</sup> at each location. The value of σ<sup>v</sup> was fixed at 10 cm/s.

Now, what causes the progressive perceptual displacement of the many taps that are, in reality, at the same position? Interestingly, each jump in the actual stimulus sequence results in a chain reaction that propagates, with diminishing strength, to more distant taps. The rapid jump from tap 5 to tap 6 induces perceptual length contraction that pulls tap 5 considerably upward in the plot (and tap 6 downward). This places perceived distance between taps 4 and 5, which given the short ISI is sufficient to violate the observer's low-speed expectation as applied to that tap pair. Consequently, taps 4 and 5 are perceptually attracted, resulting in some upward perceptual displacement of tap 4, placing perceptual distance between it and tap 3, and so on.

How would perception of the 15-tap sequence change if the observer were to direct its spatial attention unequally along the arm? To explore this question, in **Figure 11D** we have plotted the low-speed-prior observer's perception under conditions of "standard" attention to the proximal arm (σ*<sup>s</sup>* = 1 cm), directed attention to the distal arm (σ*<sup>s</sup>* = 0.5 cm), and relative inattention (σ*<sup>s</sup>* = 2 cm) to the area in-between. Comparison of **Figures 11C,D** indicates that adjustment to spatial attention affects perception in ways that depend upon ISI. For the particular values of σ*<sup>s</sup>* used in this example, perception of the 0.3 s ISI sequence remains nearly veridical (**Figure 11D**, left), whereas perception of the 0.05 s ISI sequence to some extent (center), and of the 0.02 s ISI sequence to a greater extent (right), are shifted upwards in the plots. The result is that the observer's perception even more closely resembles that of the human participants reported by Geldard (1982).
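The chain reaction and the effect of unequal σs can be explored with a short simulation. The sketch below is illustrative only: the tap positions (0, 5, and 10 cm) are hypothetical, since the stimulus coordinates are not given in the text, and the helper name `perceive` is ours. It assumes Gaussian likelihoods and an independent low-speed prior on every consecutive tap pair, with σv = 10 cm/s and the σs values quoted for **Figure 11D**.

```python
import numpy as np


def perceive(measured, isi, sigma_s, sigma_v=10.0):
    """Posterior mode under a low-speed prior on every consecutive tap pair.

    sigma_s may be a scalar or one value per tap (cm); isi is in seconds.
    """
    z = np.asarray(measured, dtype=float)
    w = 1.0 / (np.ones_like(z) * sigma_s) ** 2   # likelihood precision of each tap
    k = 1.0 / (sigma_v * isi) ** 2               # precision of each low-speed link
    A = np.diag(w)
    for i in range(z.size - 1):
        A[i, i] += k
        A[i + 1, i + 1] += k
        A[i, i + 1] -= k
        A[i + 1, i] -= k
    return np.linalg.solve(A, w * z)


# Hypothetical stimulus: five taps at each of three sites (cm).
taps = np.repeat([0.0, 5.0, 10.0], 5)
uniform = 1.0                                # sigma_s = 1 cm everywhere
attended = np.repeat([1.0, 2.0, 0.5], 5)     # proximal, middle, distal sigma_s (cm)

for isi in (0.3, 0.05, 0.02):
    print(f"ISI {isi} s, uniform sigma_s: {np.round(perceive(taps, isi, uniform), 2)}")
    print(f"ISI {isi} s, varying sigma_s: {np.round(perceive(taps, isi, attended), 2)}")
```

At short ISIs the perceived positions of taps delivered to the same site spread out monotonically, and the perceived gap between taps 5 and 6 shrinks as the ISI decreases.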

Unlike the low-speed-prior observer, the low-acceleration-prior observer distinctly fails to match human perception (**Figure 12**). In the tau effect scenario, a discordant feature of the low-acceleration-prior observer is that, when *t*<sup>2</sup> = *t*<sup>1</sup> and *l*<sup>2</sup> = *l*<sup>1</sup>, the observer fails to perceive the lengths as equal, instead perceiving *l*<sup>2</sup>\* > *l*<sup>1</sup>\*. This perceptual asymmetry occurs because only the first segment of the trajectory is subject to a low-speed prior. Thus, when *t*<sup>2</sup> = *t*<sup>1</sup>, *l*<sup>2</sup> must be made shorter than *l*<sup>1</sup> in order to be perceived as equal. Consequently, in our simulation of Helson and King (1931) using the low-acceleration-prior observer, *x*<sup>3</sup> fails to converge to 6 cm as the tap 3 time approaches 1 s (**Figure 12A**, purple points). The performance of the low-speed-prior observer, in contrast, does converge as expected (red points).

In the 15-tap rabbit experiment, at 0.05 s ISI and more markedly at 0.02 s ISI, the low-acceleration-prior observer perceives the trajectory to start below the actual tap 1 location and to end above the actual tap 15 location: the perceived trajectory is longer than the actual trajectory (**Figure 12B**, purple points). This is incompatible with human perceptual report, and opposite to the perception of the low-speed-prior observer (red points). The perceptual undershoot and overshoot occur because the rapid jumps in the actual stimulus sequence extend perceptually in both directions at nearly constant velocity, in keeping with the observer's low-acceleration expectation.

## **DISCUSSION**

## **PERCEPTUAL LENGTH CONTRACTION AS BAYESIAN INFERENCE**

Length contraction illusions have long fascinated and puzzled investigators. The tactile tau effect was first reported almost 100 years ago (Gelb, 1914). It was later named and investigated in detail in the early 1930s (Helson, 1930; Helson and King, 1931). The best-known length contraction illusion, the cutaneous rabbit, was discovered serendipitously some 40 years later, when Geldard and colleagues, intending to study the tau effect, mistakenly produced a stimulus pattern similar to the rapid sequences shown in **Figure 11B** (Geldard and Sherrick, 1972; Geldard, 1982). The resulting perception of taps hopping up the arm led a surprised observer to exclaim "who let the rabbit loose?" (Geldard, 1982). Over the years, investigators have proposed creative explanations – geometrical, mathematical, and neural – for these and related illusions (Jones and Huang, 1982; Brigner, 1988; Wiemer et al., 2000; Grush, 2005; Flach and Haggard, 2006).

The Bayesian observer model expounded here provides a concise and coherent explanation for the tau effect, the cutaneous rabbit, and related spatiotemporal illusions. Elapsed time influences the perception of traversed space because the observer expects objects to move slowly. In its simplest form, the model contains a single free parameter, tau: a time constant for space perception (Eqs 2 and 3). While much research remains to be done, we are encouraged by the close fit of the model to human perceptual data. Because a single model replicates the tau effect (**Figure 10**), the rabbit (**Figures 1C** and **11**), and other spatiotemporal illusions (**Figures 1A,B**; see also Goldreich, 2007), we suggest that these illusions are manifestations of a single perceptual assumption: a low-speed prior. Our confidence in this suggestion is strengthened by the finding that a single value of the tau parameter (∼0.1 s) provides good fits to perception on the forearm as measured in experiments using different paradigms and carried out by multiple laboratories.

A central feature of Bayesian perceptual models is that they consider multiple hypotheses – in our case, candidate trajectories. The idea that the brain perceives by evaluating candidates is consistent with the "multiple drafts" theory of Dennett and Kinsbourne (1992). These authors propose that, confronted with stimuli such as those depicted in **Figure 11**, the brain favors a distributed sequence of taps as the most "parsimonious" interpretation. This suggestion is compatible with our model if one equates parsimony with posterior probability. However, Dennett and Kinsbourne (1992) do not explain on what grounds an observer judges a particular interpretation to be the most parsimonious, nor do they explain why the percept changes as a function of ISI.

Bayesian perceptual models make precise, quantitative predictions regarding the relationships among perceptual variables (e.g., Eq. 1). These relationships spring from Bayes' theorem: the product of a hypothesis' likelihood and prior probability is proportional to its posterior probability. We liken the prior distribution to the observer's expectation derived from experience, and the likelihood function to the sensation evoked by the stimulus (**Figure 2**). In our view, then, the Bayesian perceptual framework beautifully formalizes Helmholtz's suggestion that "previous experiences act in conjunction with present sensations to produce a perceptual image" (Helmholtz, 1925).

Bayesian observers interpret sensory data in light of an internal model – a conception of the structure and statistics of the world. Bayesian perception is optimal when the observer's internal model accurately represents the world – that is, when the observer's prior distribution matches the stimulus distribution, and the observer's likelihood function accurately reflects the process by which stimuli map to measurements (**Figure 7**). Unfortunately, the natural statistics of tactile stimuli have not been sufficiently characterized to constrain a prior distribution, nor is our knowledge of tactile sensorineural responses sufficient to specify the precise shape of a likelihood function. Accordingly, we fit a Gaussian prior and Gaussian likelihood to the human behavioral data. Subtle discrepancies between the human data and the model's performance could result from our Gaussian assumptions. Future research is needed to determine the precise shapes of the priors and likelihoods used by individual participants. In any event, we speculate that a low-speed prior reflects the natural statistics of tactile stimuli, learned by humans through experience. If so, illusions such as the cutaneous rabbit may reveal the operation of an optimal observer who brings an expectation forged by real-world experience (the low-speed prior) into an artificial setting (the laboratory).

## **THE WIDE APPLICABILITY OF THE LOW-SPEED-PRIOR OBSERVER**

Our Bayesian observer model may explain a variety of perceptual phenomena beyond the tactile illusions we have considered. One such phenomenon is the out-of-body rabbit illusion. In a clever experiment, Miyazaki et al. (2010) showed that humans perceived taps as hopping progressively along an aluminum bar resting across the index fingers of the hands, when in actuality the taps were delivered only to the points on the bar directly above each finger. To apply the model to this scenario, it is necessary only to know the observer's likelihood function evoked by a tap to the bar: *p*(measurement | tap location along bar). An interesting twist here is that both hands might detect any single tap to the bar. This does not preclude the construction of a likelihood function; it simply requires consideration of the sensory input to both hands. For instance, a more intense vibration felt with the right hand would result in a likelihood function whose peak lies to the right of the bar's center. Once the single tap likelihood functions are determined empirically, it would be straightforward to fit the model to the behavioral data with a low-speed prior. Of interest would be to compare the value of σ*<sup>v</sup>* so obtained to the value (∼10 cm/s) that fits the perception of trajectories delivered directly to the skin.

Our model provides insight into crossmodal interactions in length contraction illusions (Kawabe et al., 2008; Asai and Kanayama, 2012). In a 2-location, 3-tap rabbit paradigm, Asai and Kanayama (2012) demonstrated that the cutaneous rabbit was more consistently perceived when a visual flash occurred concurrently with, and at the typical illusory location of, the second tap. The model readily accommodates this cue-combination scenario. As shown in **Figure 6**, stochastic variability in the measurement causes trial-to-trial variability in the perceived location of either tap. Provided the Bayesian observer assumes that the concurrent visual and tactile measurements resulted independently from the same event, the observer's likelihood function over that event's location will be the product of the visual and tactile likelihoods. The visual measurement will therefore sharpen and shift the combined likelihood function toward the flash location, increasing the frequency with which the observer perceives the tactile stimulus to fall at that location. To test the model, one would first measure participants' spatial uncertainty (σ*s*) in response to taps and flashes delivered in isolation. The model could then be used to make testable predictions regarding the perceptual influence of the flash.
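The cue-combination step can be illustrated with a product of two Gaussian likelihoods. The sketch below is a minimal example under the stated independence assumption; the function name `combine_cues` and all numerical values (a tactile measurement at 3 cm with σ = 1 cm, a visual measurement at 5 cm with σ = 0.5 cm) are hypothetical and are not taken from Asai and Kanayama (2012).

```python
import math


def combine_cues(mu_tactile, sigma_tactile, mu_visual, sigma_visual):
    """Mean and SD of the product of independent tactile and visual Gaussian likelihoods."""
    prec_t = 1.0 / sigma_tactile ** 2
    prec_v = 1.0 / sigma_visual ** 2
    var = 1.0 / (prec_t + prec_v)                      # combined variance is always smaller
    mu = var * (prec_t * mu_tactile + prec_v * mu_visual)  # precision-weighted mean
    return mu, math.sqrt(var)


# Hypothetical measurements (cm): the sharper visual cue dominates.
mu, sigma = combine_cues(3.0, 1.0, 5.0, 0.5)
print(f"combined likelihood: mean = {mu:.2f} cm, SD = {sigma:.3f} cm")
```

In this example the combined likelihood is centered at 4.6 cm, closer to the flash than to the tap, and is narrower than either single-cue likelihood. The combined likelihood would then replace the tactile likelihood for that tap in the two-tap inference.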

Finally, our model may account for saltation illusions in both vision (Geldard, 1976; Lockhead et al., 1980; Khuu et al., 2011) and audition (Bremer et al., 1977; Shore et al., 1998; Getzmann, 2009). Provided the brain expects visual and auditory stimuli to move slowly, the model predicts pronounced length contraction when stimulus sequences traverse areas of poor spatial acuity (high σ*s*). In vision, this prediction has already been confirmed: the visual rabbit illusion occurs in response to peripheral but not central stimuli (Geldard, 1976). Furthermore, a low-speed prior has been implicated in visual motion perception (Weiss et al., 2002; Stocker and Simoncelli, 2006). Future experimental studies will assess the quantitative fit of our model to visual and auditory saltation illusions.

Despite its apparently wide applicability, we do not suggest that a low-speed prior alone can account for a majority of motion illusions. Interestingly, several visual motion phenomena (Nijhawan, 2002; Hubbard, 2005) involve endpoint overestimation similar to that caused by the low-acceleration prior that did not match the tactile data considered here (**Figure 12B**). Research is needed to clarify the conditions under which perception incorporates a low-acceleration prior.

#### **THE PERCEPT AS A COMBINED PRE- AND POST-DICTIVE INFERENCE**

Our Bayesian observer's percept can be viewed as resulting from concomitant pre- and post-dictive inference. For instance, in two-tap trajectories, the first tap predicts the location of the second, while the second postdicts the location of the first (**Figure 9**). We suspect that Bayesian pre- and postdiction will be found to act together in many perceptual scenarios, whether or not these scenarios incorporate a low-speed prior. Indeed, it has already been reported that the two processes collaborate in the flash-lag effect (Rao et al., 2001; Soga et al., 2009), an illusion in which a brief visual flash placed alongside a moving object is perceived to lag behind the object.

By hypothesizing a link between spatial attention and σ*<sup>s</sup>* , as suggested by point localization experiments (Moore et al., 1999; O'Boyle et al., 2001), we have shown how attention can shape the relative influence of pre- and postdiction on the percept (**Figure 9D**). When attention is directed around the location of the first tap (σ*s*<sup>1</sup> < σ*s*2), prediction dominates, and the second tap is perceived as asymmetrically displaced toward the first. When attention is directed around the location of the second tap (σ*s*<sup>2</sup> < σ*s*1), postdiction dominates, and the first tap is perceived as asymmetrically displaced toward the second. Under conditions of imbalanced spatial attention, the trajectory midpoint is therefore perceived as shifted toward the attended location, as specified by Eq. (5). As the spatial attention balance is adjusted from one extreme to another, the model smoothly transitions between a percept influenced predominantly by prediction to one influenced predominantly by postdiction.

Researchers have often referred to the rabbit illusion as a postdictive phenomenon, without mentioning the involvement of prediction (Bays et al., 2006; Blankenburg et al., 2006; van Wassenhove, 2009; Miyazaki et al., 2010; Asai and Kanayama, 2012). Indeed, initial work on the rabbit described only the perceptual displacement of the earlier tap(s) toward the later one(s) (Geldard and Sherrick, 1972), consistent with an exclusively postdictive process. However, it is clear from modern studies of the rabbit that both earlier and later taps undergo perceptual displacement – whether by equal distances or not (Kilgard and Merzenich, 1995; Flach and Haggard, 2006; Trojan et al., 2010). This supports our conclusion that the illusion involves concomitant predictive and postdictive inference.

Why did initial rabbit illusion investigations describe only the displacement of earlier taps toward later ones? In his three-tap "reduced rabbit" paradigm, Geldard (1982) stimulated with a "locator" (tap 1) followed at large ISI by an "attractee" (tap 2) at the same position, which he reported as perceptually displaced toward the subsequent "attractant" (tap 3) delivered at a different location. The participants' report that tap 2 was perceptually displaced toward tap 3, but not vice versa, may have owed to the absence of a second locator tap placed at the position of tap 3. Without a locator tap for spatial comparison, participants may have been unaware that tap 3 was perceptually displaced. This hypothesis was considered and discarded by Geldard (1982) upon preliminary investigation, but Kilgard and Merzenich (1995), using a 4-tap paradigm that included a second locator tap, did find symmetric perceptual displacement of taps 2 and 3 (**Figure 1C**).

Alternatively, as demonstrated by Kilgard and Merzenich (1995) and modeled here, asymmetric rabbit percepts could reflect an imbalance in spatial attention (**Figures 8** and **9D**; Eq. 5). An interesting possibility is that – particularly during multi-tap sequences – participants have time to redistribute their spatial attention on the fly. When investigators randomize the direction of movement (up or down along the arm), the participants cannot know where to expect the first tap, so they presumably distribute their spatial attention equally. After the first tap has occurred, however, experienced participants will know where the trajectory is heading, and might direct their attention fully toward the upcoming final location. This would cause a decrease in σ*<sup>s</sup>* at the final location, consequently shifting the percept toward that point (e.g., **Figure 11D**).

#### **SPECULATIONS REGARDING NEURAL IMPLEMENTATION**

We have described two computational approaches by which our Bayesian observer could obtain its percept: either multidimensional inference (e.g., the two-dimensional inference shown in **Figure 9A**) or equivalent one-dimensional prediction-postdiction (**Figure 9B**). Which, if either, approach might the brain implement? The two approaches yield the same percept, but they scale very differently in difficulty as the number of taps increases. In the case of a sequence of *n* taps, the joint likelihood function, prior, and posterior would each require *n* dimensions. The neural representation of such multi-dimensional distributions would appear to pose considerable challenges. More plausibly, the brain could undertake one-dimensional predictive-postdictive inference recursively.

It is tempting to reinterpret the graphs in **Figure 9** as plots of activity (e.g., spike rates) of a series of cortical neurons that represent the corresponding skin positions (*x*-axes). Under this interpretation, the predicted prior is a mound of cortical neural activity evoked by tap 1 that decays and broadens over time (**Figure 9C**). When the second tap initiates a second mound of cortical activity (the tap 2 likelihood function), the two mounds interact (e.g., through summation), resulting in a tap 2 percept that is shifted toward the tap 1 location. For trajectories with greater ISI, the tap 1 mound would have more time to decay, and would thus exert less influence over the tap 2 percept. This idea is similar to a model proposed by Flach and Haggard (2006). The idea is attractively simple; nevertheless, it seems able to account satisfactorily only for prediction, not postdiction. A more complex network model was proposed by Wiemer et al. (2000), but that model produces perceptual length dilation at large ISIs, a result contradicted by behavioral data.

Computationally, the perception of multi-tap sequences can be achieved with recursive predictive-postdictive Bayesian inference. The Kalman filter is an algorithm for recursive predictive inference (Haykin, 2001), for which plausible neural implementation schemes have been proposed (Deneve et al., 2007; Beck et al., 2011). Kalman smoothing combines the Kalman filter with recursive postdictive inference (Haykin, 2001). The percepts obtained by our Bayesian observer are identical to those that would result from an appropriately configured Kalman smoother (see "Multi-tap perception" in Appendix). Smoothing has already been implicated in the flash-lag effect (Rao et al., 2001) and proposed to contribute to a variety of motion illusions, including the rabbit (Grush, 2005), though to our knowledge a specific neural implementation for the Kalman smoother has not yet been proposed.
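One way such a smoother might be configured is sketched below. This is an assumption on our part, not the configuration given in the Appendix: the state is the tap position, modeled as a random walk whose step variance over an interval *t* is (σ*v*·*t*)², each measurement has variance σ*s*², and the prior over the first tap's position is flat. The forward pass plays the role of prediction, and the backward (Rauch-Tung-Striebel) pass plays the role of postdiction. The function name `kalman_smooth` is hypothetical.

```python
import numpy as np


def kalman_smooth(measured, intervals, sigma_s=1.0, sigma_v=10.0):
    """Smoothed position estimates for a tap sequence (scalar random-walk model)."""
    z = np.asarray(measured, dtype=float)
    n = z.size
    r = sigma_s ** 2                       # measurement variance
    m = np.empty(n)                        # filtered means
    P = np.empty(n)                        # filtered variances
    P_pred = np.full(n, np.inf)            # predicted variances (flat prior on tap 1)
    m[0], P[0] = z[0], r

    # Forward pass (prediction): propagate the belief to the next tap, then update.
    for k in range(1, n):
        P_pred[k] = P[k - 1] + (sigma_v * intervals[k - 1]) ** 2
        gain = P_pred[k] / (P_pred[k] + r)
        m[k] = m[k - 1] + gain * (z[k] - m[k - 1])
        P[k] = (1.0 - gain) * P_pred[k]

    # Backward pass (postdiction): revise earlier estimates using later taps.
    s = m.copy()
    for k in range(n - 2, -1, -1):
        g = P[k] / P_pred[k + 1]
        s[k] = m[k] + g * (s[k + 1] - m[k])
    return s


# Two-tap trajectory of Figure 9: 3 and 7 cm, t = 0.15 s.
print(np.round(kalman_smooth([3.0, 7.0], [0.15]), 3))
```

For the two-tap trajectory, the predicted variance after the first step equals σ<sup>2</sup><sub>pre</sub> of Eq. 6, and the smoothed estimates coincide with the modes obtained by multiplying each likelihood by its predicted or postdicted prior.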

#### **TESTABLE PREDICTIONS**

Our Bayesian observer model makes many testable predictions; we encourage other investigators to pursue these experimentally.

The model predicts that perceptual length contraction will be more pronounced on body areas with worse spatial acuity or – on a given body area – in response to stimuli that are harder to localize (e.g., weaker taps to the skin). Because σ<sub>*s*</sub> can be independently manipulated and measured using single taps, the length contraction formula (Eq. 1) can be used to make specific testable predictions regarding the effect of body area or stimulus strength on the perception of two-tap trajectories.

Under conditions of imbalanced spatial attention, the model predicts that perceptual length contraction will occur in accordance with Eq. 4 and that the midpoint of the perceived two-tap trajectory will vary in accordance with Eq. 5. These predictions could be tested experimentally by independently measuring an observer's σ<sub>*s*1</sub> and σ<sub>*s*2</sub> under different degrees of directed spatial attention, then measuring the trajectory percepts under the same conditions.

As explained above, the model can be used to make testable predictions regarding a variety of perceptual length contraction phenomena beyond those that we have modeled in this paper. These include the out-of-body rabbit, crossmodal influences on the rabbit percept, and the visual and auditory rabbit illusions.

We encourage readers to generate their own predictions by using our freely downloadable computer program, Leaping Lagomorphs (http://psych.mcmaster.ca/goldreichlab/LL/Leaping\_Lagomorphs.html). This convenient program implements the Bayesian observer, with either balanced or imbalanced spatial attention, and outputs its perception in response to any stimulus sequence that the user cares to enter.

#### **ACKNOWLEDGMENTS**

This research was supported by an Individual Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors thank Andy Bhattacharjee, Luxi Li, Ryan Peters, Mike Wong, and Deda Gillespie for their insightful comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 March 2013; accepted: 11 April 2013; published online: 10 May 2013.*

*Citation: Goldreich D and Tong J (2013) Prediction, postdiction, and perceptual length contraction: a Bayesian low-speed prior captures the cutaneous rabbit and related illusions. Front. Psychol. 4:221. doi: 10.3389/fpsyg.2013.00221*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Goldreich and Tong. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

Here, we further develop mathematically, and offer new conceptual insights into, the *basic Bayesian observer* model put forth by Goldreich (2007). In the following seven sections, we: 1) specify the observer's generative model, and derive the posterior probability density over tap trajectories and the perceptual length contraction formula; 2) generalize the derivation to include inhomogeneous spatial acuity caused by selective spatial attention; 3) consider useful one-dimensional reductions of the two-dimensional posterior density; 4) reformulate the observer's percept as a combined predictive-postdictive inference; 5) model the perception of multi-tap sequences; 6) consider extensions of the model that incorporate additional sources of uncertainty; and 7) describe how we fit the model to human perceptual data.

#### **THE BAYESIAN MODEL**

We consider here an observer whose goal is to perceive the locations of two taps delivered to the skin in rapid succession. We assume that the observer has an internal generative model – a conception of the statistics of moving tactile stimuli – and that it interprets the stimulus sequence optimally within the context of its generative model. Briefly, the observer considers two taps that occur in rapid succession to result from a single moving object, and it considers that tactile objects tend to move slowly. Specifically, according to the generative model: (1) An object briefly touches the skin at a location, *x*<sub>1</sub>, drawn from a uniform density. (2) The object moves away from *x*<sub>1</sub> with velocity *v*, drawn from a Gaussian density with mean zero and standard deviation σ<sub>*v*</sub>; at some elapsed time *t* (independent of *x*<sub>1</sub>), the object again briefly touches the skin, at location *x*<sub>2</sub>. (3) Noisy sensorineural activity evoked by each tap results in measured values for the tap positions, *x*<sub>1*m*</sub> and *x*<sub>2*m*</sub>, drawn from Gaussian densities centered on the actual tap positions, *x*<sub>1</sub> and *x*<sub>2</sub>, with standard deviation σ<sub>*s*</sub>.

#### **Bayes' formula**

The observer's goal is to infer the positions of the taps (*x*<sub>1</sub>, *x*<sub>2</sub>), which we refer to as the movement trajectory. We assume in this basic model that the observer perceives the time between taps, *t*, veridically. Thus, the observer knows *x*<sub>1*m*</sub>, *x*<sub>2*m*</sub>, and *t*, and wishes to infer *x*<sub>1</sub> and *x*<sub>2</sub>. According to Bayes' formula, the posterior over trajectories is proportional to the product of likelihood and prior:

$$\mathbf{p}\left(\mathbf{x}\_{1},\mathbf{x}\_{2}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right) \propto \mathbf{p}\left(\mathbf{x}\_{1m},\mathbf{x}\_{2m}|\mathbf{x}\_{1},\mathbf{x}\_{2},t\right)\mathbf{p}\left(\mathbf{x}\_{1},\mathbf{x}\_{2}|t\right) \tag{A1}$$

We now work out the observer's prior and likelihood.

#### **Prior probability density**

The observer's prior probability density over trajectories is:

$$p\left(x\_1, x\_2 | t\right) = p\left(x\_2 | x\_1, t\right) p\left(x\_1 | t\right) \tag{A2}$$

Because *t* and *x*<sub>1</sub> are independent, *p*(*x*<sub>1</sub>|*t*) = *p*(*x*<sub>1</sub>), and this is a constant (*x*<sub>1</sub> being drawn from a uniform distribution). Therefore, we can write more concisely:

$$p\left(\mathbf{x}\_{1},\mathbf{x}\_{2}|t\right)\propto p\left(\mathbf{x}\_{2}|\mathbf{x}\_{1},t\right)\tag{\text{A3}}$$

We note that, given *x*<sub>1</sub> and *t*, *x*<sub>2</sub> is a function of the velocity, *v*:

$$x\_2 = x\_1 + vt \tag{A4}$$

Thus, the probability that *v* resides in the infinitesimal region (*v* ± *dv*/2) is equal to the probability that *x*<sub>2</sub> resides in the corresponding infinitesimal region (*x*<sub>2</sub> ± *dx*<sub>2</sub>/2):

$$p\left(x\_2 | x\_1, t\right) dx\_2 = p\left(v\right) dv \tag{A5}$$

It follows that:

$$p\left(x\_2 | x\_1, t\right) = p\left(v\right) \left|\frac{dv}{dx\_2}\right| = \frac{p(v)}{t} \tag{A6}$$

Now recall that the observer has a low-velocity prior expectation:

$$p(v) = \frac{1}{\sqrt{2\pi}\sigma\_v} \exp\left(-\frac{v^2}{2\sigma\_v^2}\right) = \frac{1}{\sqrt{2\pi}\sigma\_v} \exp\left(-\frac{\left(\left(x\_2 - x\_1\right)/t\right)^2}{2\sigma\_v^2}\right) \tag{A7}$$

Referring to Eqs A3, A6, and A7, we therefore have:

$$p\left(x\_1, x\_2 | t\right) \propto p\left(x\_2 | x\_1, t\right) = \frac{1}{\sqrt{2\pi}\sigma\_v t}\exp\left(-\frac{\left(x\_2 - x\_1\right)^2}{2\left(\sigma\_v t\right)^2}\right) \tag{A8}$$

The observer's prior probability density over trajectories is proportional to a Gaussian distribution over the distance between taps, with mean zero and standard deviation σ<sub>*v*</sub>*t*. Reflecting the low-speed prior, when the elapsed time, *t*, is large, a wide range of displacements is permissible; when *t* is shorter, the observer expects the two taps to more closely coincide spatially.

For future reference, we note that *x*<sub>2</sub>, like *x*<sub>1</sub>, is independent of *t*. We see this by integrating Eq. A8 with respect to *x*<sub>1</sub>:

$$p\left(x\_2 | t\right) = \int\_{x\_1} p\left(x\_1, x\_2 | t\right) dx\_1 \propto \int\_{x\_1} \frac{1}{\sqrt{2\pi}\sigma\_v t} \exp\left(-\frac{\left(x\_2 - x\_1\right)^2}{2\left(\sigma\_v t\right)^2}\right) dx\_1 = 1 \tag{A9}$$

Thus, *x*<sub>2</sub> is independent of *t*, and, like *p*(*x*<sub>1</sub>), *p*(*x*<sub>2</sub>) is a constant. Eq. A8 shows that *x*<sub>2</sub> is conditionally dependent on *t*, given *x*<sub>1</sub>.

#### **Likelihood function**

The tap positions measured by the observer, *x*<sub>1*m*</sub> and *x*<sub>2*m*</sub>, are drawn independently from Gaussian densities centered on the actual tap positions, with standard deviation σ<sub>*s*</sub>. Therefore, the observer's likelihood function is:

$$p\left(x\_{1m}, x\_{2m} | x\_1, x\_2, t\right) = p\left(x\_{1m} | x\_1\right) p\left(x\_{2m} | x\_2\right) \tag{A10}$$

where

$$p\left(x\_{1m} | x\_1\right) = \frac{1}{\sqrt{2\pi}\sigma\_s} \exp\left(-\frac{\left(x\_{1m} - x\_1\right)^2}{2\sigma\_s^2}\right) \quad p\left(x\_{2m} | x\_2\right) = \frac{1}{\sqrt{2\pi}\sigma\_s} \exp\left(-\frac{\left(x\_{2m} - x\_2\right)^2}{2\sigma\_s^2}\right) \tag{A11}$$

#### **Posterior probability density**

The observer uses Bayes' formula (Eq. A1) to calculate the posterior density over trajectories. It is useful to express the posterior density in several ways. First, referring to Eqs A3 and A10, we see that Bayes' formula can be rewritten:

$$\mathbf{p}\left(\mathbf{x}\_{1},\mathbf{x}\_{2}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right) \propto \mathbf{p}\left(\mathbf{x}\_{1m}|\mathbf{x}\_{1}\right)\mathbf{p}\left(\mathbf{x}\_{2m}|\mathbf{x}\_{2}\right)\mathbf{p}\left(\mathbf{x}\_{2}|\mathbf{x}\_{1},t\right) \tag{A12}$$

Next, from Eqs A8 and A11, we have

$$p\left(x\_1, x\_2 | x\_{1m}, x\_{2m}, t\right) \propto \exp\left(-\left(\frac{\left(x\_{1m} - x\_1\right)^2 + \left(x\_{2m} - x\_2\right)^2}{2\sigma\_s^2} + \frac{\left(x\_2 - x\_1\right)^2}{2\left(\sigma\_v t\right)^2}\right)\right) \tag{A13}$$

Finally, following some rearrangement, Eq. A13 can be written as a two-dimensional (2D) Gaussian distribution

$$p\left(\mathbf{x}\_{1},\mathbf{x}\_{2}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right)\propto\exp\left(-\frac{1}{2(1-\rho^{2})}\left(\frac{\left(\mathbf{x}\_{1}-\mathbf{x}\_{1^{\*}}\right)^{2}+\left(\mathbf{x}\_{2}-\mathbf{x}\_{2^{\*}}\right)^{2}-2\rho\left(\mathbf{x}\_{1}-\mathbf{x}\_{1^{\*}}\right)\left(\mathbf{x}\_{2}-\mathbf{x}\_{2^{\*}}\right)}{\sigma^{2}}\right)\right)\tag{A14}$$

where the posterior mode (*x*<sub>1</sub><sup>∗</sup>, *x*<sub>2</sub><sup>∗</sup>) is given by

$$\begin{split} x\_1^\* &= x\_{1m} \left( \frac{(\sigma\_v t)^2 + \sigma\_s^2}{(\sigma\_v t)^2 + 2\sigma\_s^2} \right) + x\_{2m} \left( \frac{\sigma\_s^2}{(\sigma\_v t)^2 + 2\sigma\_s^2} \right) \\ x\_2^\* &= x\_{1m} \left( \frac{\sigma\_s^2}{(\sigma\_v t)^2 + 2\sigma\_s^2} \right) + x\_{2m} \left( \frac{(\sigma\_v t)^2 + \sigma\_s^2}{(\sigma\_v t)^2 + 2\sigma\_s^2} \right) \end{split}$$

and the variance (σ<sup>2</sup>) and correlation coefficient (ρ) are given by:

$$\sigma^2 = \sigma\_s^2 \frac{\sigma\_s^2 + (\sigma\_\nu t)^2}{2\sigma\_s^2 + (\sigma\_\nu t)^2} \qquad \rho = \frac{\sigma\_s^2}{\sigma\_s^2 + (\sigma\_\nu t)^2}$$

We assume that the observer reads out the posterior mode as the percept. Note that the perceived positions, *x*<sub>1</sub><sup>∗</sup> and *x*<sub>2</sub><sup>∗</sup>, are weighted averages of the measurements, *x*<sub>1*m*</sub> and *x*<sub>2*m*</sub>. The perceived positions are drawn toward one another as the time between taps shortens, converging toward the measurement midpoint, (*x*<sub>1*m*</sub> + *x*<sub>2*m*</sub>)/2, in the limit that *t* approaches zero. As *t* approaches infinity, by contrast, *x*<sub>1</sub><sup>∗</sup> and *x*<sub>2</sub><sup>∗</sup> approach the measured values, *x*<sub>1*m*</sub> and *x*<sub>2*m*</sub>.
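These two limits follow directly from the weights on the measurements in the posterior mode. As a brief worked check, shown for the first tap (the second tap is symmetric), the weight on each measurement tends to one half as *t* approaches zero, whereas the weight on the first measurement tends to one as *t* approaches infinity:

$$\lim\_{t \to 0} x\_1^\* = x\_{1m} \frac{\sigma\_s^2}{2\sigma\_s^2} + x\_{2m} \frac{\sigma\_s^2}{2\sigma\_s^2} = \frac{x\_{1m} + x\_{2m}}{2}, \qquad \lim\_{t \to \infty} x\_1^\* = x\_{1m}$$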

Subtracting *x*<sub>1</sub><sup>∗</sup> from *x*<sub>2</sub><sup>∗</sup>, we find that the perceived distance between taps, *l*<sup>∗</sup> = *x*<sub>2</sub><sup>∗</sup> − *x*<sub>1</sub><sup>∗</sup>, relates to the measured distance, *l*<sub>*m*</sub> = *x*<sub>2*m*</sub> − *x*<sub>1*m*</sub>, according to the formula:

$$l^\* = x\_2^\* - x\_1^\* = \frac{x\_{2m} - x\_{1m}}{1 + 2\left(\frac{\sigma\_s}{\sigma\_v t}\right)^2} = \frac{l\_m}{1 + 2\left(\frac{\tau}{t}\right)^2} \tag{A15}$$

where we have defined the parameter τ as the ratio of the observer's spatial uncertainty to the width of the low-speed prior: τ = σ<sub>*s*</sub>/σ<sub>*v*</sub>.

Although the measured tap positions will vary stochastically from trial to trial, on average they will equal the actual tap positions. Thus, on average the perceived distance is related to the true distance, *l*, as:

$$l^\* = \frac{l}{1 + 2\left(\frac{\tau}{t}\right)^2} \tag{A16}$$

This is the perceptual length contraction formula, previously derived – using a different approach and expressed in a slightly different form – by Goldreich (2007).
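The posterior mode and the length contraction formula can be evaluated directly. The sketch below is a minimal Python illustration; the function names and the parameter values (σ<sub>*s*</sub> = 1 cm, σ<sub>*v*</sub> = 10 cm/s, hence τ = 0.1 s) are hypothetical choices made for the example, not values fitted by the authors.

```python
def perceived_positions(x1m, x2m, t, sigma_s, sigma_v):
    """Posterior mode (x1*, x2*) for two taps with equal spatial uncertainty."""
    v2 = (sigma_v * t) ** 2      # variance of the low-speed prior over displacement
    s2 = sigma_s ** 2            # measurement variance at each tap
    denom = v2 + 2.0 * s2
    x1 = x1m * (v2 + s2) / denom + x2m * s2 / denom
    x2 = x1m * s2 / denom + x2m * (v2 + s2) / denom
    return x1, x2


def contracted_length(length, t, tau):
    """Perceived length given a true (or measured) length, per Eqs A15 and A16."""
    return length / (1.0 + 2.0 * (tau / t) ** 2)


# Hypothetical example: taps measured 10 cm apart, delivered 0.1 s apart.
x1_star, x2_star = perceived_positions(0.0, 10.0, 0.1, 1.0, 10.0)
print(x1_star, x2_star, x2_star - x1_star)
print(contracted_length(10.0, 0.1, 0.1))
```

With these values *t* equals τ, so the denominator of the contraction formula is 3 and the perceived length is one third of the measured length; both functions return the same length.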

#### **GENERALIZATION TO INHOMOGENEOUS SPATIAL UNCERTAINTY**

So far we have assumed equal spatial uncertainty, σ<sub>*s*</sub>, at each point on the skin. Here, we consider the more general situation in which each tap may be associated with a different spatial uncertainty, σ<sub>*s*1</sub> and σ<sub>*s*2</sub>, as might occur if the participant were to focus spatial attention on one skin region. In this case, the likelihood functions, Eq. A11, become:

$$p\left(x\_{1m} | x\_1\right) = \frac{1}{\sqrt{2\pi}\sigma\_{s1}}\exp\left(-\frac{\left(x\_{1m} - x\_1\right)^2}{2\sigma\_{s1}^2}\right) \quad p\left(x\_{2m} | x\_2\right) = \frac{1}{\sqrt{2\pi}\sigma\_{s2}}\exp\left(-\frac{\left(x\_{2m} - x\_2\right)^2}{2\sigma\_{s2}^2}\right) \tag{A17}$$

Consequently, the posterior density over tap positions (Eq. A13) becomes

$$p\left(x\_1, x\_2 | x\_{1m}, x\_{2m}, t\right) \propto \exp\left(-\left(\frac{\left(x\_{1m} - x\_1\right)^2}{2\sigma\_{s1}^2} + \frac{\left(x\_{2m} - x\_2\right)^2}{2\sigma\_{s2}^2} + \frac{\left(x\_2 - x\_1\right)^2}{2\left(\sigma\_v t\right)^2}\right)\right) \tag{A18}$$

Following rearrangement, Eq. A18 can be re-written as a 2D Gaussian distribution,

$$p\left(x\_1, x\_2 | x\_{1m}, x\_{2m}, t\right) \propto \exp\left(-\frac{1}{2\left(1 - \rho^2\right)}\left(\frac{\left(x\_1 - x\_1^\*\right)^2}{\sigma\_1^2} + \frac{\left(x\_2 - x\_2^\*\right)^2}{\sigma\_2^2} - \frac{2\rho\left(x\_1 - x\_1^\*\right)\left(x\_2 - x\_2^\*\right)}{\sigma\_1 \sigma\_2}\right)\right) \tag{A19}$$

where the posterior mode (*x*<sub>1</sub><sup>∗</sup>, *x*<sub>2</sub><sup>∗</sup>) is given by

$$\begin{aligned} x\_1^\* &= x\_{1m} \left( \frac{(\sigma\_v t)^2 + \sigma\_{s2}^2}{(\sigma\_v t)^2 + \sigma\_{s1}^2 + \sigma\_{s2}^2} \right) + x\_{2m} \left( \frac{\sigma\_{s1}^2}{(\sigma\_v t)^2 + \sigma\_{s1}^2 + \sigma\_{s2}^2} \right) \\ x\_2^\* &= x\_{1m} \left( \frac{\sigma\_{s2}^2}{(\sigma\_v t)^2 + \sigma\_{s1}^2 + \sigma\_{s2}^2} \right) + x\_{2m} \left( \frac{(\sigma\_v t)^2 + \sigma\_{s1}^2}{(\sigma\_v t)^2 + \sigma\_{s1}^2 + \sigma\_{s2}^2} \right) \end{aligned}$$

and the variances (σ<sub>1</sub><sup>2</sup>, σ<sub>2</sub><sup>2</sup>) and correlation coefficient (ρ) are given by:

$$\sigma\_1^2 = \sigma\_{s1}^2 \frac{\sigma\_{s2}^2 + (\sigma\_v t)^2}{\sigma\_{s1}^2 + \sigma\_{s2}^2 + (\sigma\_v t)^2} \qquad \sigma\_2^2 = \sigma\_{s2}^2 \frac{\sigma\_{s1}^2 + (\sigma\_v t)^2}{\sigma\_{s1}^2 + \sigma\_{s2}^2 + (\sigma\_v t)^2} \qquad \rho = \frac{\sigma\_{s1} \sigma\_{s2}}{\sqrt{\left(\sigma\_{s1}^2 + (\sigma\_v t)^2\right) \left(\sigma\_{s2}^2 + (\sigma\_v t)^2\right)}}$$

It follows that

$$l^\* = x\_2^\* - x\_1^\* = \frac{l\_m}{1 + \frac{\sigma\_{s1}^2 + \sigma\_{s2}^2}{\left(\sigma\_v t\right)^2}} = \frac{l\_m}{1 + 2\left(\frac{\sigma\_{s(\text{rms})}}{\sigma\_v t}\right)^2} \tag{A20}$$

Thus, the uniform spatial uncertainty, σ<sub>*s*</sub>, of Eq. A15 is replaced by the root-mean-square of the uncertainty at the two locations:

$$
\sigma\_{s(\text{rms})} = \sqrt{\frac{\sigma\_{s1}^2 + \sigma\_{s2}^2}{2}}.
$$

Interestingly, when σ<sub>*s*1</sub> ≠ σ<sub>*s*2</sub>, the midpoint of the perceived trajectory no longer coincides with the midpoint of the measured trajectory. From the expressions for *x*<sub>1</sub><sup>∗</sup> and *x*<sub>2</sub><sup>∗</sup> that accompany Eq. A19, it is easily shown that the shift, Δ<sub>midpt</sub>, in the perceived trajectory midpoint away from the measured trajectory midpoint is:

$$
\Delta\_{\text{midpt}} = \frac{x\_1^\* + x\_2^\*}{2} - \frac{x\_{1m} + x\_{2m}}{2} = \frac{l\_m}{2} \left( \frac{\sigma\_{s1}^2 - \sigma\_{s2}^2}{(\sigma\_v t)^2 + \sigma\_{s1}^2 + \sigma\_{s2}^2} \right) \tag{A21}
$$
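The inhomogeneous case can be checked numerically in the same way. The following minimal Python sketch computes the posterior mode for unequal spatial uncertainties and compares the resulting length and midpoint shift with Eqs A20 and A21. The function names and parameter values (σ<sub>*s*1</sub> = 2 cm, σ<sub>*s*2</sub> = 1 cm, σ<sub>*v*</sub> = 10 cm/s) are hypothetical, chosen only for illustration.

```python
def perceived_positions_inhomogeneous(x1m, x2m, t, sigma_s1, sigma_s2, sigma_v):
    """Posterior mode (x1*, x2*) when the two taps have different uncertainty."""
    v = (sigma_v * t) ** 2
    s1 = sigma_s1 ** 2
    s2 = sigma_s2 ** 2
    d = v + s1 + s2
    x1 = x1m * (v + s2) / d + x2m * s1 / d
    x2 = x1m * s2 / d + x2m * (v + s1) / d
    return x1, x2


def midpoint_shift(l_m, t, sigma_s1, sigma_s2, sigma_v):
    """Shift of the perceived midpoint away from the measured midpoint (Eq. A21)."""
    v = (sigma_v * t) ** 2
    s1 = sigma_s1 ** 2
    s2 = sigma_s2 ** 2
    return 0.5 * l_m * (s1 - s2) / (v + s1 + s2)


# Hypothetical example: tap 1 is localized less precisely than tap 2.
x1_star, x2_star = perceived_positions_inhomogeneous(0.0, 10.0, 0.1, 2.0, 1.0, 10.0)
shift_from_mode = 0.5 * (x1_star + x2_star) - 0.5 * (0.0 + 10.0)
print(x1_star, x2_star)
print(shift_from_mode, midpoint_shift(10.0, 0.1, 2.0, 1.0, 10.0))
```

In this example the first tap is the less precisely localized one (σ<sub>*s*1</sub> > σ<sub>*s*2</sub>), so the shift is positive: the perceived midpoint moves toward the second, better-localized tap.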

#### **ONE-DIMENSIONAL REDUCTIONS**

The two-dimensional joint (*x*<sub>1</sub>, *x*<sub>2</sub>) posterior density (Eq. A19) fully represents the observer's belief distribution over stimulus trajectories, and it captures dependencies between the variables. Nevertheless, it can be useful to express the observer's belief about a single parameter of interest, although this entails a loss of information about dependencies. One such parameter of interest is the length, *l*, between taps. Other parameters of interest are the tap positions, *x*<sub>1</sub> and *x*<sub>2</sub>, considered individually. Here we derive the observer's one-dimensional posterior densities over each of these parameters.

#### **Posterior density over trajectory length**

The posterior over trajectory length, *l* = *x*<sub>2</sub> − *x*<sub>1</sub>, can be found by integrating across the joint posterior:

$$p\left(l | x\_{1m}, x\_{2m}, t\right) = \int\_{x\_1} p\left(x\_1, x\_2 = l + x\_1 | x\_{1m}, x\_{2m}, t\right) dx\_1 \tag{A22}$$

The posterior over *l* can also be found by noting that, from Eq. A8, the observer's prior over *l* is:

$$p\left(l|t\right) = \frac{1}{\sqrt{2\pi}\sigma\_\nu t} \exp\left(-\frac{l^2}{2\left(\sigma\_\nu t\right)^2}\right) \tag{A23}$$

Further, from Eq. A17, we see that the observer's displacement measurement, *l*<sub>*m*</sub> = *x*<sub>2*m*</sub> − *x*<sub>1*m*</sub>, is normally distributed with mean *l* and variance σ<sub>*s*1</sub><sup>2</sup> + σ<sub>*s*2</sub><sup>2</sup>:

$$p\left(l\_m|l\right) = \frac{1}{\sqrt{2\pi \left(\sigma\_{s1}^2 + \sigma\_{s2}^2\right)}} \exp\left(-\frac{\left(l\_m - l\right)^2}{2\left(\sigma\_{s1}^2 + \sigma\_{s2}^2\right)}\right) \tag{A24}$$

Thus, by Bayes' rule, the posterior over *l* is proportional to the product of these two Gaussian densities:

$$p\left(l | l\_m, t\right) \propto p\left(l\_m | l, t\right) p\left(l | t\right) \tag{A25}$$

The result is a Gaussian posterior density with mean and variance given by:

$$\mu\_{l\text{ posterior}} = \frac{l\_m}{1 + \frac{\sigma\_{s1}^2 + \sigma\_{s2}^2}{\left(\sigma\_v t\right)^2}}, \quad \sigma\_{l\text{ posterior}}^2 = \frac{1}{\frac{1}{\sigma\_{s1}^2 + \sigma\_{s2}^2} + \frac{1}{\left(\sigma\_v t\right)^2}} \tag{A26}$$

The mean of the posterior over *l* is again the length contraction formula, Eq. A20. The variance of the posterior over *l* is smaller than the variance of *l*<sub>*m*</sub>, given *l*. For this reason, the observer's length percept is more accurate than the length measurement (see **Figure 7**).
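The product of the two Gaussian densities can be computed with the standard precision-weighted rule. The short Python sketch below is one way to do so; the function name `length_posterior` and the example values are hypothetical and serve only to illustrate Eq. A26.

```python
def length_posterior(l_m, t, sigma_s1, sigma_s2, sigma_v):
    """Mean and variance of the Gaussian posterior over trajectory length (Eq. A26)."""
    prior_var = (sigma_v * t) ** 2             # variance of the zero-mean prior, Eq. A23
    like_var = sigma_s1 ** 2 + sigma_s2 ** 2   # variance of l_m given l, Eq. A24
    # Precisions (inverse variances) add when two Gaussian densities are multiplied.
    post_var = 1.0 / (1.0 / like_var + 1.0 / prior_var)
    # The prior mean is zero, so only the measurement contributes to the mean.
    post_mean = post_var * l_m / like_var
    return post_mean, post_var


# Hypothetical example: measured length 10 cm, t = 0.1 s,
# sigma_s1 = 2 cm, sigma_s2 = 1 cm, sigma_v = 10 cm/s.
mean, var = length_posterior(10.0, 0.1, 2.0, 1.0, 10.0)
print(mean, var)
```

Here the measurement variance is 5 cm<sup>2</sup> and the prior variance is 1 cm<sup>2</sup>, so the posterior variance (5/6 cm<sup>2</sup>) is smaller than either, and the posterior mean (10/6 cm) matches the contracted length given by Eq. A20.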

#### **Marginal posterior densities over *x*<sub>1</sub> and *x*<sub>2</sub>**

To express the observer's belief about each tap's position individually, we can integrate the joint posterior along *x*<sub>2</sub> to find the marginal posterior over *x*<sub>1</sub>, and integrate the joint posterior along *x*<sub>1</sub> to find the marginal posterior over *x*<sub>2</sub>:

$$\begin{aligned} p\left(x\_1 | x\_{1m}, x\_{2m}, t\right) &= \int\_{x\_2} p\left(x\_1, x\_2 | x\_{1m}, x\_{2m}, t\right) dx\_2 \\ p\left(x\_2 | x\_{1m}, x\_{2m}, t\right) &= \int\_{x\_1} p\left(x\_1, x\_2 | x\_{1m}, x\_{2m}, t\right) dx\_1 \end{aligned} \tag{A27}$$

Because the joint posterior density is a 2D Gaussian (Eq. A19), the marginalization integrals (Eq. A27) have simple solutions:

$$\begin{split} p\left(\mathbf{x}\_{1}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{1}} \exp\left(-\frac{\left(\mathbf{x}\_{1}-\mathbf{x}\_{1^{\*}}\right)^{2}}{2\sigma\_{1}^{2}}\right) \\ p\left(\mathbf{x}\_{2}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{2}} \exp\left(-\frac{\left(\mathbf{x}\_{2}-\mathbf{x}\_{2^{\*}}\right)^{2}}{2\sigma\_{2}^{2}}\right) \end{split} \tag{A28}$$

#### **THE PREDICTION-POSTDICTION FORMULATION**

Here, we show that the observer's marginal posterior over *x*<sub>2</sub> can be equivalently derived from predictive inference: upon observing tap 1, the observer predicts (infers forward in time) a prior over tap 2; the observer then combines this *predicted prior* with the tap 2 likelihood to obtain the posterior over *x*<sub>2</sub>. Conversely, the marginal posterior over *x*<sub>1</sub> can be derived from postdictive inference: upon observing tap 2, the observer postdicts (infers backward in time) a prior over tap 1; the observer then combines this *postdicted prior* with the tap 1 likelihood to obtain the posterior over *x*<sub>1</sub>.

#### **Predicting tap 2 upon observing tap 1**

Replacing the integrand in lower Eq. A27 with the expression from Eq. A1, we have:

$$p\left(x\_2 | x\_{1m}, x\_{2m}, t\right) \propto \int\_{x\_1} p\left(x\_{1m}, x\_{2m} | x\_1, x\_2, t\right) p\left(x\_1, x\_2 | t\right) dx\_1 \tag{A29}$$

Further expanding the integrand, we have:

$$p\left(x\_2 | x\_{1m}, x\_{2m}, t\right) \propto \int\_{x\_1} p\left(x\_{1m} | x\_1\right) p\left(x\_{2m} | x\_2\right) p\left(x\_2 | x\_1, t\right) p\left(x\_1\right) dx\_1 \tag{A30}$$

Because *p*(*x*<sub>2*m*</sub>|*x*<sub>2</sub>) does not depend on *x*<sub>1</sub>, we move it outside the integral. Thus, we have:

$$p\left(x\_2 | x\_{1m}, x\_{2m}, t\right) \propto p\left(x\_{2m} | x\_2\right) \int\_{x\_1} p\left(x\_{1m} | x\_1\right) p\left(x\_2 | x\_1, t\right) p\left(x\_1\right) dx\_1 \tag{A31}$$

Now we note that, according to Bayes' formula:

$$p\left(x\_{1m} | x\_1\right) p\left(x\_1\right) \propto p\left(x\_1 | x\_{1m}\right) \tag{A32}$$

Substituting Eq. A32 into Eq. A31 yields:

$$p\left(x\_2 | x\_{1m}, x\_{2m}, t\right) \propto p\left(x\_{2m} | x\_2\right) \int\_{x\_1} p\left(x\_2 | x\_1, t\right) p\left(x\_1 | x\_{1m}\right) dx\_1 \tag{A33}$$

Equation A33 is Bayes' formula for the tap 2 position, *x*<sub>2</sub>. It states that the marginal posterior density over *x*<sub>2</sub> is proportional to the product of the tap 2 likelihood, *p*(*x*<sub>2*m*</sub>|*x*<sub>2</sub>), and the tap 2 *predicted prior density*,

$$p\left(x\_2 | x\_{1m}, t\right) = \int\_{x\_1} p\left(x\_2 | x\_1, t\right) p\left(x\_1 | x\_{1m}\right) dx\_1 \tag{A34}$$

The predicted prior projects belief forwards in time. It reflects the observer's beliefs about tap 2, given the tap 1 measurement and the elapsed time. Based on *x*<sub>1*m*</sub>, the observer can generate a posterior over tap 1, *p*(*x*<sub>1</sub>|*x*<sub>1*m*</sub>). The predicted prior over a particular tap 2 position is then calculated by integrating across every possible tap 1 the product of this tap 1 posterior with the probability that the particular tap 2 will follow.


#### **Postdicting tap 1 upon observing tap 2**

Replacing the integrand in upper Eq. A27 with the expression from Eq. A1, we have:

$$p\left(x\_1 | x\_{1m}, x\_{2m}, t\right) \propto \int\_{x\_2} p\left(x\_{1m}, x\_{2m} | x\_1, x\_2, t\right) p\left(x\_1, x\_2 | t\right) dx\_2 \tag{A35}$$

Further expanding the integrand, we have:

$$p\left(x\_1 | x\_{1m}, x\_{2m}, t\right) \propto \int\_{x\_2} p\left(x\_{1m} | x\_1\right) p\left(x\_{2m} | x\_2\right) p\left(x\_1 | x\_2, t\right) p\left(x\_2\right) dx\_2 \tag{A36}$$

Because *p*(*x*<sub>1*m*</sub>|*x*<sub>1</sub>) does not depend on *x*<sub>2</sub>, we move it outside the integral. Thus, we have:

$$p\left(x\_1 | x\_{1m}, x\_{2m}, t\right) \propto p\left(x\_{1m} | x\_1\right) \int\_{x\_2} p\left(x\_{2m} | x\_2\right) p\left(x\_1 | x\_2, t\right) p\left(x\_2\right) dx\_2 \tag{A37}$$

Now we note that, according to Bayes' formula:

$$\mathbf{p}\left(\mathbf{x}\_{2m}|\mathbf{x}\_{2}\right)\mathbf{p}\left(\mathbf{x}\_{2}\right)\propto\mathbf{p}\left(\mathbf{x}\_{2}|\mathbf{x}\_{2m}\right)\tag{A38}$$

Substituting Eq. A38 into Eq. A37 yields:

$$\mathbf{p}\left(\mathbf{x}\_{1}|\mathbf{x}\_{1m},\mathbf{x}\_{2m},t\right) \propto \mathbf{p}\left(\mathbf{x}\_{1m}|\mathbf{x}\_{1}\right) \int\_{\mathbf{x}\_{2}} \mathbf{p}\left(\mathbf{x}\_{1}|\mathbf{x}\_{2},t\right) \mathbf{p}\left(\mathbf{x}\_{2}|\mathbf{x}\_{2m}\right) d\mathbf{x}\_{2} \tag{A39}$$

Equation A39 is Bayes' formula for the tap 1 position, *x*<sub>1</sub>. It states that the marginal posterior density over *x*<sub>1</sub> is proportional to the product of the tap 1 likelihood, *p*(*x*<sub>1*m*</sub>|*x*<sub>1</sub>), and the tap 1 *postdicted prior density*,

$$p\left(x\_1 | x\_{2m}, t\right) = \int\_{x\_2} p\left(x\_1 | x\_2, t\right) p\left(x\_2 | x\_{2m}\right) dx\_2 \tag{A40}$$

The postdicted prior projects belief backwards in time. It reflects the observer's beliefs about tap 1, given the tap 2 measurement and the elapsed time. Based on *x*<sub>2*m*</sub>, the observer can generate a posterior over tap 2, *p*(*x*<sub>2</sub>|*x*<sub>2*m*</sub>). The postdicted prior over a particular tap 1 position is then calculated by integrating across every possible tap 2 the product of this tap 2 posterior with the probability that the particular tap 1 preceded.

#### **Formulas for the predicted and postdicted prior densities**

We now solve the predicted and postdicted prior integrals (Eqs A34 and A40). To find the predicted prior, we substitute from Eqs A8 and A17 left, into Eq. A34:

$$\begin{split} \rho \left( \mathbf{x} \middle| \mathbf{x}\_{1:m}, t \right) &= \int\_{\mathbf{x}} \frac{1}{\sqrt{2\pi}\sigma\_{r}t} \exp\left( -\frac{\left(\mathbf{x}\_{2} - \mathbf{x}\_{1}\right)^{2}}{2\left(\sigma\_{r}t\right)^{2}} \right) \frac{1}{\sqrt{2\pi}\sigma\_{s1}} \exp\left( -\frac{\left(\mathbf{x}\_{1:m} - \mathbf{x}\_{1}\right)^{2}}{2\sigma\_{s1}^{2}} \right) d\mathbf{x}\_{1} \\ &= \frac{1}{2\pi\sigma\_{r}t\sigma\_{s1}} \int\_{\mathbf{x}\_{1}} \exp\left[ -\left( \frac{\left(\mathbf{x}\_{2} - \mathbf{x}\_{1}\right)^{2}}{2\left(\sigma\_{r}t\right)^{2}} + \frac{\left(\mathbf{x}\_{1:m} - \mathbf{x}\_{1}\right)^{2}}{2\sigma\_{s1}^{2}} \right) \right] d\mathbf{x}\_{1} \end{split} \tag{A41}$$

We note that, upon much rearrangement:

$$\frac{\left(\mathbf{x}\_{2}-\mathbf{x}\_{1}\right)^{2}}{2\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}}+\frac{\left(\mathbf{x}\_{1m}-\mathbf{x}\_{1}\right)^{2}}{2\sigma\_{s1}^{2}}=\frac{\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}+\sigma\_{s1}^{2}}{2\sigma\_{s1}^{2}\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}}\left(\mathbf{x}\_{1}-\frac{\mathbf{x}\_{2}\sigma\_{s1}^{2}+\mathbf{x}\_{1m}\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}}{\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}+\sigma\_{s1}^{2}}\right)^{2}+\frac{1}{2}\left(\frac{\left(\mathbf{x}\_{2}-\mathbf{x}\_{1m}\right)^{2}}{\left(\mathbf{o}\_{\mathrm{v}}t\right)^{2}+\sigma\_{s1}^{2}}\right)\tag{A42}$$

Thus, Eq. A41 becomes,

$$p\left(\mathbf{x}\_{l}|\mathbf{x}\_{lm},t\right) = \frac{1}{2\pi\sigma\_{r}t\sigma\_{s1}}\exp\left(-\frac{\left(\mathbf{x}\_{l}-\mathbf{x}\_{lm}\right)^{2}}{2\left(\left(\sigma\_{r}t\right)^{2}+\sigma\_{s1}^{2}\right)}\right)\int\_{\mathbb{R}}\exp\left(-\frac{\left(\mathbf{x}\_{l}-\frac{\mathbf{x}\_{l}\sigma\_{s1}^{2}+\mathbf{x}\_{lm}(\sigma\_{r}t)^{2}}{\left(\sigma\_{r}t\right)^{2}+\sigma\_{s1}^{2}}\right)^{2}}{\frac{2\sigma\_{s1}^{2}\left(\sigma\_{r}t\right)^{2}}{\left(\sigma\_{r}t\right)^{2}+\sigma\_{s1}^{2}}}\right)d\mathbf{x}\_{l}\tag{A43}$$

The integrand is a Gaussian function with standard deviation

$$\frac{\sigma\_{s1}\sigma\_{v}t}{\sqrt{\left(\sigma\_{v}t\right)^{2}+\sigma\_{s1}^{2}}}\frac{\cdot}{\cdot}$$

Because the integral of an un-normalized Gaussian function of standard deviation <sup>σ</sup> is <sup>√</sup> 2π σ, Eq. A43 simplifies to:

$$p\left(\infty \middle| \chi\_{1m}, t\right) = \frac{1}{2\pi\sigma\_r t \sigma\_{s1}} \exp\left(-\frac{\left(\chi\_2 - \chi\_{1m}\right)^2}{2\left(\left(\sigma\_r t\right)^2 + \sigma\_{s1}^2\right)}\right) \frac{\sqrt{2\pi}\sigma\_{s1}\sigma\_r t}{\sqrt{\left(\sigma\_r t\right)^2 + \sigma\_{s1}^2}}\tag{A44}$$

Therefore, the predicted prior density over *x*<sub>2</sub> is

$$p\left(\mathbf{x}\_{2}|\mathbf{x}\_{1m},t\right) = \frac{1}{\sqrt{2\pi\left(\left(\sigma\_{v}t\right)^{2} + \sigma\_{s1}^{2}\right)}}\exp\left(-\frac{\left(\mathbf{x}\_{2} - \mathbf{x}\_{1m}\right)^{2}}{2\left(\left(\sigma\_{v}t\right)^{2} + \sigma\_{s1}^{2}\right)}\right) \tag{A45}$$

That is, the predicted prior is a Gaussian with mean and variance

$$
\mu\_{\rm pre} = \mathbf{x}\_{1m} \qquad \sigma\_{\rm pre}^2 = (\sigma\_{v}t)^2 + \sigma\_{s1}^2 \tag{A46}
$$
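The closed form in Eq. A45 can be checked numerically against the integral in Eq. A41. The following is a minimal sketch, not part of the original analysis: the function names, the parameter values, and the choice of the trapezoidal rule are illustrative assumptions.

```python
import math


def gaussian(x, mean, sd):
    """Normalized Gaussian density with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2.0 * sd ** 2)) / (math.sqrt(2.0 * math.pi) * sd)


def predicted_prior_numeric(x2, x1m, t, sigma_v, sigma_s1, n_steps=20001, half_width=10.0):
    """Evaluate the integral over x1 in Eq. A41 with the trapezoidal rule."""
    spread = half_width * max(sigma_s1, sigma_v * t)
    lo = min(x1m, x2) - spread
    hi = max(x1m, x2) + spread
    dx = (hi - lo) / (n_steps - 1)
    total = 0.0
    for k in range(n_steps):
        x1 = lo + k * dx
        weight = 0.5 if k in (0, n_steps - 1) else 1.0
        # p(x2 | x1, t) times the Gaussian in x1 centered on the measurement x1m
        total += weight * gaussian(x2, x1, sigma_v * t) * gaussian(x1, x1m, sigma_s1)
    return total * dx


def predicted_prior_closed_form(x2, x1m, t, sigma_v, sigma_s1):
    """Eq. A45: Gaussian with mean x1m and variance (sigma_v * t)^2 + sigma_s1^2."""
    return gaussian(x2, x1m, math.sqrt((sigma_v * t) ** 2 + sigma_s1 ** 2))


# Illustrative (assumed) parameter values, in arbitrary length and time units.
SIGMA_V, SIGMA_S1, T, X1M = 10.0, 1.0, 0.1, 0.0
for x2 in (-2.0, 0.0, 1.5):
    numeric = predicted_prior_numeric(x2, X1M, T, SIGMA_V, SIGMA_S1)
    closed = predicted_prior_closed_form(x2, X1M, T, SIGMA_V, SIGMA_S1)
    print(f"x2={x2:+.1f}  numeric={numeric:.6f}  closed form={closed:.6f}")
```

Swapping the roles of the two taps (measurement *x*<sub>2*m*</sub>, uncertainty σ<sub>*s*2</sub>) gives the corresponding check for the postdicted prior.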

A similar derivation reveals that the postdicted prior density over *x*<sub>1</sub> is

$$p\left(\mathbf{x}\_{1}|\mathbf{x}\_{2m},t\right) = \frac{1}{\sqrt{2\pi\left(\left(\sigma\_{v}t\right)^{2} + \sigma\_{s2}^{2}\right)}} \exp\left(-\frac{\left(\mathbf{x}\_{1} - \mathbf{x}\_{2m}\right)^{2}}{2\left(\left(\sigma\_{v}t\right)^{2} + \sigma\_{s2}^{2}\right)}\right) \tag{A47}$$

That is, the postdicted prior is a Gaussian with mean and variance

$$
\mu\_{\rm post} = \mathbf{x}\_{2m} \qquad \sigma\_{\rm post}^2 = (\sigma\_{v} t)^2 + \sigma\_{s2}^2 \tag{A48}
$$

#### **MULTI-TAP PERCEPTION**

So far, we have considered trajectories composed of just two taps. An interesting question arises in modeling the perception of multi-tap stimuli: is the observer's generative model (a) a direct extension of the one we have considered here, such that a zero-mean low-speed prior applies independently to each pair of consecutive taps, or (b) does the observer expect velocity to be consistent across the multi-tap trajectory, such that the prior applied to each tap pair might be a Gaussian centered on the velocity of the preceding pair (a zero-mean low-acceleration prior)?

Considering trajectories with an arbitrary number of taps, *n*, and permitting inhomogeneous spatial acuity, possibilities (a) and (b) result in the following generalizations of Eq. A18:

(a)

$$p\left(\left\{\mathbf{x}\_{i}\right\}\middle|\left\{\mathbf{x}\_{im}\right\},\left\{t\_{i}\right\}\right)\propto\exp\left(-\left(\sum\_{i=1}^{n}\frac{\left(\mathbf{x}\_{im}-\mathbf{x}\_{i}\right)^{2}}{2\sigma\_{si}^{2}}+\sum\_{i=1}^{n-1}\frac{\left(\mathbf{x}\_{i+1}-\mathbf{x}\_{i}\right)^{2}}{2\left(\sigma\_{v}t\_{i}\right)^{2}}\right)\right)\tag{A49}$$

(b)

$$p\left(\left\{\mathbf{x}\_{i}\right\}\middle|\left\{\mathbf{x}\_{im}\right\},\left\{t\_{i}\right\}\right)\propto\exp\left(-\left(\sum\_{i=1}^{n}\frac{\left(\mathbf{x}\_{im}-\mathbf{x}\_{i}\right)^{2}}{2\sigma\_{si}^{2}}+\frac{\left(\mathbf{x}\_{2}-\mathbf{x}\_{1}\right)^{2}}{2\left(\sigma\_{v}t\_{1}\right)^{2}}+\sum\_{i=2}^{n-1}\frac{\left(\frac{\left\|\mathbf{x}\_{i+1}-\mathbf{x}\_{i}\right\|}{t\_{i}}-\frac{\left\|\mathbf{x}\_{i}-\mathbf{x}\_{i-1}\right\|}{t\_{i-1}}\right)^{2}}{2\sigma\_{v}^{2}}\right)\right)\tag{A50}$$

Here {*x*<sub>*i*</sub>} refers to the set of tap positions, *x*<sub>1</sub>, *x*<sub>2</sub>, . . . *x*<sub>*n*</sub>; {*x*<sub>*im*</sub>} to the corresponding set of measurements; {*t*<sub>*i*</sub>} to the set of times elapsed between each tap *i* and tap *i* + 1; and σ<sub>*si*</sub> to the spatial uncertainty associated with tap *i*.

The observer's percept {*x*<sub>*i*</sub><sup>∗</sup>} in case (a) or (b) can be found by taking partial derivatives of Eq. A49 or Eq. A50 with respect to each of the {*x*<sub>*i*</sub>}, setting these to zero, and solving the simultaneous equations. We used this method to find the percepts depicted in **Figures 10** and **11** [case (a)] and **Figure 12** [case (b)].
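For case (a), setting the partial derivatives of the exponent in Eq. A49 to zero yields a tridiagonal linear system in the {*x*<sub>*i*</sub>}. The sketch below solves that system with the Thomas algorithm; it is an illustration under stated assumptions, not the authors' code. The function name `map_percept_case_a`, its argument layout, and the example values are hypothetical.

```python
def map_percept_case_a(x_m, t, sigma_s, sigma_v):
    """Mode of Eq. A49.

    x_m     : measured tap positions, length n
    t       : times between consecutive taps, length n - 1
    sigma_s : spatial uncertainty for each tap, length n
    sigma_v : standard deviation of the low-speed prior
    """
    n = len(x_m)
    # Weight of each link between tap i and tap i + 1: 1 / (sigma_v * t_i)^2
    w = [1.0 / (sigma_v * ti) ** 2 for ti in t]
    diag = [1.0 / sigma_s[i] ** 2 for i in range(n)]
    rhs = [x_m[i] / sigma_s[i] ** 2 for i in range(n)]
    for i in range(n - 1):
        diag[i] += w[i]
        diag[i + 1] += w[i]
    # Thomas algorithm; the off-diagonal entry linking taps i and i + 1 is -w[i].
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -w[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        lower = -w[i - 1]
        denom = diag[i] - lower * c[i - 1]
        c[i] = -w[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Illustrative (assumed) values: three equally spaced taps with equal acuity.
percept = map_percept_case_a(x_m=[0.0, 1.0, 2.0], t=[0.1, 0.1],
                             sigma_s=[1.0, 1.0, 1.0], sigma_v=10.0)
print(percept)
```

In this example the outer taps are drawn toward the middle tap, so the perceived trajectory is shorter than the measured one. Case (b) is not covered by this sketch.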

Alternatively, the identical percept can be found through Kalman smoothing (Haykin, 2001), a recursive extension of the predictive-postdictive formulation described above. The Kalman smoother consists of an iterative forward (predictive) pass through the stimulus sequence, followed by a backward (postdictive) pass. For model (a), the algorithm for the forward pass (the Kalman *filter*) is:

$$\begin{aligned} K\_i &= \frac{\sigma\_{i-1\mid i-1}^2 + (\sigma\_v t)^2}{\sigma\_{i-1\mid i-1}^2 + (\sigma\_v t)^2 + \sigma\_s^2} \\ \hat{\mathbf{x}}\_{i\mid i} &= \hat{\mathbf{x}}\_{i-1\mid i-1} + K\_i \left( \mathbf{x}\_{im} - \hat{\mathbf{x}}\_{i-1\mid i-1} \right) \\ \sigma\_{i\mid i}^2 &= (1 - K\_i) \left( \sigma\_{i-1\mid i-1}^2 + (\sigma\_v t)^2 \right) \end{aligned} \tag{A51}$$

Here, *K*<sub>*i*</sub> is the *Kalman gain* at time *i*; the notation *x̂*<sub>*i*|*j*</sub> refers to the estimated position of tap *i* based on all taps up to and including tap *j*; and σ<sup>2</sup><sub>*i*|*j*</sub> is the variance of that estimate. The filter is initialized at the first tap, with *x̂*<sub>1|1</sub> = *x*<sub>1*m*</sub>, σ<sup>2</sup><sub>1|1</sub> = σ<sup>2</sup><sub>*s*</sub>, and runs forward until tap *n* is reached. The Rauch-Tung-Striebel algorithm for the subsequent backward pass is:

$$\begin{aligned} C\_{i} &= \frac{\sigma\_{i|i}^{2}}{\sigma\_{i|i}^{2} + (\sigma\_{v}t)^{2}}\\ \hat{\mathbf{x}}\_{i|n} &= \hat{\mathbf{x}}\_{i|i} + C\_{i} \left(\hat{\mathbf{x}}\_{i+1|n} - \hat{\mathbf{x}}\_{i|i}\right) \\ \sigma\_{i|n}^{2} &= \sigma\_{i|i}^{2} + C\_{i}^{2} \left(\sigma\_{i+1|n}^{2} - \sigma\_{i|i}^{2} - (\sigma\_{v}t)^{2}\right) \end{aligned} \tag{A52}$$

We verified that Eqs A51 and A52 yielded the same percepts plotted in **Figures 10** and **11**.
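The forward and backward recursions of Eqs A51 and A52 can be written compactly as follows. This is a minimal sketch assuming, as in those equations, a single inter-tap time *t* and a single spatial uncertainty σ<sub>*s*</sub> for all taps; the function name `kalman_smooth` and the example values are illustrative, not taken from the original work.

```python
def kalman_smooth(x_m, t, sigma_s, sigma_v):
    """Kalman filter (Eq. A51) followed by the Rauch-Tung-Striebel pass (Eq. A52).

    Returns the smoothed position estimates and their variances for every tap.
    """
    n = len(x_m)
    q = (sigma_v * t) ** 2  # variance added by the low-speed prior between taps
    r = sigma_s ** 2        # measurement variance

    # Forward (predictive) pass, initialized at the first tap.
    x_f = [x_m[0]]
    v_f = [r]
    for i in range(1, n):
        predicted_var = v_f[i - 1] + q
        gain = predicted_var / (predicted_var + r)
        x_f.append(x_f[i - 1] + gain * (x_m[i] - x_f[i - 1]))
        v_f.append((1.0 - gain) * predicted_var)

    # Backward (postdictive) pass, starting from the last tap.
    x_s = list(x_f)
    v_s = list(v_f)
    for i in range(n - 2, -1, -1):
        c = v_f[i] / (v_f[i] + q)
        x_s[i] = x_f[i] + c * (x_s[i + 1] - x_f[i])
        v_s[i] = v_f[i] + c ** 2 * (v_s[i + 1] - v_f[i] - q)
    return x_s, v_s


# Illustrative (assumed) values: three equally spaced, equally timed taps.
positions, variances = kalman_smooth([0.0, 1.0, 2.0], t=0.1, sigma_s=1.0, sigma_v=10.0)
print(positions, variances)
```

The filtered estimate of the final tap is already its smoothed estimate, so the backward pass only revises the earlier taps.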

#### **EXTENSIONS**

Although skin is a two-dimensional surface, we have so far considered only a single position axis, *x*, along which stimuli occur. In essence, we have assumed that the orthogonal, *y* coordinate, of the taps is a known constant. We have also assumed that the time, *t*, is known. Each of these restrictions can be removed.

#### **Two-dimensional movement**

A more realistic generative model would allow stimuli to move in any direction along a two-dimensional skin surface. To accomplish this, we can adopt an (*x*,*y*) Cartesian coordinate system in which the orthogonal components of the velocity vector are independently specified by low-speed priors:

$$\begin{split} p\left(\mathbf{v}\_{\mathbf{x}}\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{\mathbf{v}}} \exp\left(-\frac{\mathbf{v}\_{\mathbf{x}}^{2}}{2\sigma\_{\mathbf{v}}^{2}}\right) = \frac{1}{\sqrt{2\pi}\sigma\_{\mathbf{v}}} \exp\left(-\frac{\left(\left(\mathbf{x}\_{2} - \mathbf{x}\_{1}\right)/t\right)^{2}}{2\sigma\_{\mathbf{v}}^{2}}\right) \\ p\left(\mathbf{v}\_{\mathbf{y}}\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{\mathbf{v}}} \exp\left(-\frac{\mathbf{v}\_{\mathbf{y}}^{2}}{2\sigma\_{\mathbf{v}}^{2}}\right) = \frac{1}{\sqrt{2\pi}\sigma\_{\mathbf{v}}} \exp\left(-\frac{\left(\left(\mathbf{y}\_{2} - \mathbf{y}\_{1}\right)/t\right)^{2}}{2\sigma\_{\mathbf{v}}^{2}}\right) \end{split} \tag{A53}$$

The tap 1 and 2 likelihood functions generalize to:

$$\begin{split} p\left(\mathbf{x}\_{1m},\mathbf{y}\_{1m}|\mathbf{x}\_{1},\mathbf{y}\_{1}\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{s1}} \exp\left(-\frac{(\mathbf{x}\_{1m}-\mathbf{x}\_{1})^{2}+\left(\mathbf{y}\_{1m}-\mathbf{y}\_{1}\right)^{2}}{2\sigma\_{s1}^{2}}\right) \\ p\left(\mathbf{x}\_{2m},\mathbf{y}\_{2m}|\mathbf{x}\_{2},\mathbf{y}\_{2}\right) &= \frac{1}{\sqrt{2\pi}\sigma\_{s2}} \exp\left(-\frac{(\mathbf{x}\_{2m}-\mathbf{x}\_{2})^{2}+\left(\mathbf{y}\_{2m}-\mathbf{y}\_{2}\right)^{2}}{2\sigma\_{s2}^{2}}\right) \end{split} \tag{A54}$$

The posterior over trajectories then takes the form:

$$\begin{aligned} &p\left(\mathbf{x}\_{1},\mathbf{y}\_{1},\mathbf{x}\_{2},\mathbf{y}\_{2}|\mathbf{x}\_{1m},\mathbf{y}\_{1m},\mathbf{x}\_{2m},\mathbf{y}\_{2m},t\right) \\ &\propto \exp\left(-\left(\frac{(\mathbf{x}\_{1m}-\mathbf{x}\_{1})^{2}+\left(\mathbf{y}\_{1m}-\mathbf{y}\_{1}\right)^{2}}{2\sigma\_{s1}^{2}}+\frac{(\mathbf{x}\_{2m}-\mathbf{x}\_{2})^{2}+\left(\mathbf{y}\_{2m}-\mathbf{y}\_{2}\right)^{2}}{2\sigma\_{s2}^{2}}+\frac{\left(\mathbf{x}\_{2}-\mathbf{x}\_{1}\right)^{2}+\left(\mathbf{y}\_{2}-\mathbf{y}\_{1}\right)^{2}}{2\left(\sigma\_{v}t\right)^{2}}\right)\right) \end{aligned} \tag{A55}$$

It is straightforward to show that the length contraction formula resulting from Eq. A55 is identical to Eq. A20. Indeed, if we define the *x*-axis as the axis along which the tap measurements lie, then marginalization of Eq. A55 over *y*<sub>1</sub> and *y*<sub>2</sub> recovers the posterior density Eq. A18.

## **Temporal uncertainty**

Our model has assumed that the time between stimuli, *t*, is perceived veridically. This assumption can be removed. Goldreich (2007) showed that the Bayesian observer with temporal uncertainty tends to overestimate *t* in addition to underestimating *l*. Thus, the Bayesian observer can model time dilation as well as length contraction illusions.

## **FITTING TO HUMAN PERCEPTUAL DATA**

We found the value of tau that minimized the mean-squared error (MSE) between human and model performance. This was done separately for the perceptual data from Marks et al. (1982), Lechelt and Borchert (1977), and Kilgard and Merzenich (1995), shown in **Figures 1A–C**, and for the data from Helson and King (1931), shown in **Figure 10**.

The data of Helson and King (1931) required some processing prior to the fitting procedure. We fit the data reported in Tables 2–6 of Helson and King (1931). In those experiments, on each trial the participant reported whether the second spatial interval was perceived to be shorter than, equal to, or longer than the first interval (which was fixed at 3 cm). To fit these data, we first transformed them into an equivalent two-alternative forced-choice format by distributing each participant's "equal" responses evenly to the "shorter" and "longer" response categories. We then fit each participant's transformed data (proportion "*l*<sub>2</sub> is longer" responses) at each *t*<sub>2</sub> setting with a Weibull psychometric function:

$$\Psi\_{a,b,\gamma,\delta}(l\_2) = (1-\delta)\left[\gamma + (1-\gamma)\left(1 - 2^{-\left(\frac{l\_2-3\,\mathrm{cm}}{a}\right)^b}\right)\right] + \frac{\delta}{2}$$

Here δ is a lapse rate, γ is the probability that the concentrating participant would answer "*l*<sub>2</sub> is longer" when in fact *l*<sub>2</sub> = *l*<sub>1</sub> (i.e., 3 cm), *a* is a position parameter, and *b* is a slope parameter. We found the maximum likelihood parameter settings, and from them read off the point of subjective equality (PSE: *l*<sub>2</sub> that the participant judged longer than *l*<sub>1</sub> with 50% probability). We fit the Bayesian observer's tau to minimize the MSE between its performance and the average PSE of the six human participants across the five *t*<sub>2</sub> values tested by Helson and King (1931). Before doing these fits, we discarded the data from one of the six participants on one of the five *t*<sub>2</sub> points: "Observer B" of Helson and King (1931) did not have a valid PSE at *t*<sub>2</sub> = 0.25 s because that participant's transformed "*l*<sub>2</sub> is longer" response proportion was greater than 50% at all *l*<sub>2</sub> values.
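To make the role of each parameter concrete, the sketch below evaluates the Weibull function and solves it for the PSE. It is an illustration only: the function names and example parameter values are assumptions, the maximum likelihood fit itself is not shown, and the function is evaluated only for *l*<sub>2</sub> ≥ 3 cm with *a* > 0. Setting Ψ = 0.5 gives a PSE of 3 cm + *a*(1 + log<sub>2</sub>(1 − γ))<sup>1/*b*</sup>, which does not depend on δ and exists only when γ < 0.5.

```python
import math

L1_CM = 3.0  # fixed first interval, in cm


def weibull_psi(l2, a, b, gamma, delta):
    """Probability of an "l2 is longer" response; assumes l2 >= 3 cm and a > 0."""
    u = ((l2 - L1_CM) / a) ** b
    core = gamma + (1.0 - gamma) * (1.0 - 2.0 ** (-u))
    return (1.0 - delta) * core + delta / 2.0


def point_of_subjective_equality(a, b, gamma):
    """l2 at which weibull_psi equals 0.5; requires 0 <= gamma < 0.5."""
    if not 0.0 <= gamma < 0.5:
        raise ValueError("a PSE at or above 3 cm exists only for 0 <= gamma < 0.5")
    u = 1.0 + math.log2(1.0 - gamma)
    return L1_CM + a * u ** (1.0 / b)


# Hypothetical parameter values, chosen only to exercise the functions.
a, b, gamma, delta = 1.2, 2.0, 0.1, 0.02
pse = point_of_subjective_equality(a, b, gamma)
print(f"PSE = {pse:.3f} cm, psi(PSE) = {weibull_psi(pse, a, b, gamma, delta):.3f}")
```

The requirement γ < 0.5 is consistent with the exclusion described above: a participant whose "longer" proportion exceeds 50% at every *l*<sub>2</sub> has no valid PSE.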

## A transient auditory signal shifts the perceived offset position of a moving visual object

## **Sung-en Chien<sup>1</sup>\*, Fuminori Ono<sup>1,2</sup> and Katsumi Watanabe<sup>1</sup>**

<sup>1</sup> Research Center of Advanced Science and Technology (Cognitive Science), The University of Tokyo, Meguro-ku, Tokyo, Japan <sup>2</sup> Yamaguchi University, Yamaguchi-shi, Yamaguchi, Japan

#### **Edited by:**

Takahiro Kawabe, Nippon Telegraph and Telephone Corporation, Japan

#### **Reviewed by:**

Stephen R. Arnott, Baycrest Centre for Geriatric Care, Canada

Timothy Hubbard, Texas Christian University, USA

#### **\*Correspondence:**

Sung-en Chien, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan. e-mail: chiensungen@gmail.com

Information received from different sensory modalities profoundly influences human perception. For example, changes in the auditory flutter rate induce changes in the apparent flicker rate of a flashing light (Shipley, 1964). In the present study, we investigated whether auditory information would affect the perceived offset position of a moving object. In Experiment 1, a visual object moved toward the center of the computer screen and disappeared abruptly. A transient auditory signal was presented at different times relative to the moment when the object disappeared. The results showed that if the auditory signal was presented before the abrupt offset of the moving object, the perceived final position was shifted backward, implying that the perceived visual offset position was affected by the transient auditory information. In Experiment 2, we presented the transient auditory signal to either the left or the right ear. The results showed that the perceived visual offset shifted backward more strongly when the auditory signal was presented to the same side from which the moving object originated. In Experiment 3, we found that the perceived timing of the visual offset was not affected by the spatial relation between the auditory signal and the visual offset. The present results are interpreted as indicating that an auditory signal may influence the offset position of a moving object through both spatial and temporal processes.

**Keywords: motion offset, audiovisual interaction, representational momentum, visual motion representation, auditory transients**

## **INTRODUCTION**

Tracking the trajectory and localizing the position of a moving visual object are essential abilities for carrying out many tasks in everyday life. Studies have demonstrated that the perceived or remembered position of a moving object is consistently biased in the forward direction of motion. This forward bias is referred to as representational momentum (RM), which can be observed in both implied and continuous motion. Studies of RM have also demonstrated that the final perceived position of a moving object is mislocalized in the forward direction of motion (Freyd and Finke, 1984; Hubbard and Bharucha, 1988). RM could result from the mental representation of the object's motion persisting for a brief period after abrupt offset (Teramoto et al., 2010).

The perceptual system receives information through different, interacting sensory modalities. The inputs from different sensory modalities interact in various ways. In this study, we were interested in whether the perceived position of a visual motion offset would be influenced by a transient auditory signal.

Several previous studies have investigated how visual motion perception is modulated by a transient auditory signal. In the flash-lag effect, a moving object appears to be ahead of a physically aligned flash (e.g., Nijhawan, 1994; Watanabe and Yokoi, 2006, 2007, 2008; Maus and Nijhawan, 2009). This phenomenon seems to result from the visual representation of moving objects being shifted forward in space to counteract the effect of neural delays on the perceived position. Vroomen and de Gelder (2004) showed that the magnitude of the flash-lag effect is reduced when a transient auditory signal is presented before or simultaneously with the flash. In addition, Heron et al. (2004) demonstrated that the location of a horizontally moving object that changes its direction against a vertical virtual surface is perceptually displaced forward with respect to the direction of previous motion when a sound is presented after the actual bounce event, and the perceived bounce position is shifted in the direction opposite to previous motion when a sound is presented before the actual bounce. Fendrich and Corballis (2001) asked participants to report the position of a rotating flash when an audible click was heard. The flash was seen earlier when it was preceded by an audible click and later when followed by the click.

These studies indicate the possibility that, when judging the offset position of a moving visual object, our perceptual system may not rely exclusively on visual information, but may also utilize information from other modalities. However, this explanation is not completely consistent with the modality precision hypothesis, which suggests that the modality with the highest precision with regard to the required task tends to be dominant in multimodal interactions (Shipley, 1964; Welch and Warren, 1980, 1986; Welch et al., 1986; Spence and Squire, 2003). The modality precision hypothesis would suggest that when judging the offset position of a moving visual object, the perceived visual offset would be processed exclusively by the visual system rather than also utilizing information from other modalities (e.g., audition). Therefore, we hypothesized that in a situation allowing a transient auditory stimulus to be associated with a visual motion offset, the auditory stimulus will influence the perceived final position of the moving object.

Recently, Teramoto et al. (2010) found that the magnitude of RM is influenced by a continuous sound accompanying a moving visual object. They showed that RM is enhanced when the sound terminates after the offset of the visual object, but reduced when the sound terminates before visual offset. However, their results also indicated that transient auditory signals presented at the onset and around the offset of the visual motion had no effect on the perceived offset position of the visual object. On the basis of these observations, they suggest that a sustained sound during visual motion is necessary for audiovisual integration to have an effect. However, based on studies indicating that visual motion perception can be modulated by a transient auditory signal (Fendrich and Corballis, 2001; Heron et al., 2004; Vroomen and de Gelder, 2004), it is still possible that the visual offset position could be influenced when a transient sound is presented temporally proximal to the offset of the visual stimulus without an auditory signal having been presented at the motion onset. Additionally, in the study by Teramoto et al. (2010), the authors measured RM with a probe-judgment task. However, a mouse-pointing task is typically used with continuous-motion targets (Hubbard, 2005). In light of this information, we decided to measure the perceived visual offset position by using a mouse-pointing task in the present study.

Multisensory interactions are also affected by the characteristics of the stimuli in different modalities. For example, a single visual flash can be perceived as multiple flashes if accompanied by multiple auditory stimuli (sound-induced illusory flash). Discontinuous stimuli in one modality seem to alter the perception of continuous stimuli in another modality. This indicates that multisensory interaction is at least partly affected by stimulus characteristics: continuous versus discontinuous (Shams et al., 2002). Additionally, Courtney et al. (2007) reported that one flash presented near a visual fixation induces an illusory flash in the periphery. Courtney et al. suggest that the effect of stimulus discontinuity/continuity may also be valid for unisensory stimuli.

The multisensory effect of a transient stimulus is not confined to perceptual alternation between competing incompatible interpretations when the perceptual system is confronted with ambiguous stimuli. The multisensory effect can also be observed when there are no competing incompatible interpretations. Attentional repulsion is described as the perceived displacement of a vernier stimulus in a direction that is opposite to a brief peripheral visual cue. Arnott and Goodale (2006) demonstrated that the repulsion effect could be induced by presenting lateralized sounds as peripheral cues, showing that auditory spatial information can displace the perceived positions of static visual stimuli. This finding indicates the possibility that the location of a sound may affect retinotopic coding. Recently, Teramoto et al. (2012) presented results of a study of visual apparent motion in conjunction with a sound delivered alternately from two loudspeakers aligned horizontally or vertically. Participants reported that the direction of visual apparent motion was consistent with the direction of sound alternation or that the auditory stimulus influenced the path of apparent motion. The researchers suggest that auditory spatial information could also modulate the perception of a visual moving object, especially in the peripheral visual field.

Audiovisual interaction is enhanced when visual signals and auditory signals are presented in close proximity spatially. For example, observers are more likely to report that visual stimuli and auditory stimuli are presented simultaneously when they originate from the same spatial position than when they originate from different positions (Zampini et al., 2005). When observers are asked to determine the direction of auditory apparent motion while trying to ignore unrelated visual motion, they perform worse when the auditory motion is in the opposite direction to the visual apparent motion. This audiovisual dynamic capture effect is larger when the auditory and visual stimuli are presented from close spatial locations (Soto-Faraco et al., 2002; Meyer et al., 2005; Spence, 2007).

On the basis of these findings, we hypothesized that it is possible for auditory information to affect perceived visual motion offset in the peripheral visual field, and that this effect will be enhanced when visual stimuli and auditory stimuli are presented to the same hemifield. Because studies have indicated that an auditory transient can alter apparent motion perception (e.g., Heron et al., 2004), we examined whether a transient auditory signal would affect the perceived offset position of a visual moving object and, if so, whether spatial contingency between the auditory signal and the visual object would enhance the auditory modulation. To achieve this goal, we presented a transient sound around the time of visual motion offset and asked participants to report the perceived offset position of the visual stimulus (Experiment 1). In addition, we tested whether the auditory spatial information would influence the effect of the auditory stimulus on the perceived visual offset position (Experiment 2). After affirmative results were obtained in both experiments, we examined whether the auditory effects were caused by distortion in the perceived timing of the offset of the visual moving object (Experiment 3).

## **EXPERIMENT 1**

In Experiment 1, we examined the possibility that the timing of a transient auditory signal would affect the perceived offset position of a visual moving object. Such an effect would demonstrate that a continuous auditory stimulus during visual motion is not necessary to alter the perceived visual offset position. We conducted Experiments 1A and 1B. The visual target appeared in the left visual field and moved rightward (Experiment 1A) or appeared in the right visual field and moved leftward (Experiment 1B), and then disappeared around the center of the display. A transient auditory signal was presented around the time of the visual motion offset. We treated the two motion direction conditions as a between-subjects variable to reduce the task load for each participant.

## **METHOD**

#### **Participants**

There were 16 paid volunteers each in Experiment 1A (10 males, 6 females) and Experiment 1B (11 males, 5 females). Their ages ranged from 20 to 34 years (mean = 25.1) in Experiment 1A and from 19 to 28 years (mean = 21.7) in Experiment 1B. All were right-handed by self-report. All participants had normal or corrected-to-normal vision and audition and were naïve as to the purpose of this study.

## **Apparatus and stimuli**

Participants observed the visual stimuli on a 23″ CRT monitor at a viewing distance of 60 cm. The monitor's refresh rate was 100 Hz. The visual and auditory stimuli were presented using the MATLAB operating environment and the Psychtoolbox extensions (Brainard, 1997; Pelli, 1997). The background was divided horizontally into two parts (**Figure 1**). The upper part was gray (40˚ × 10.5˚, 7.85 cd/m<sup>2</sup>) and the lower part was black (40˚ × 19.5˚, 0.03 cd/m<sup>2</sup>). A white fixation cross (1˚ × 1˚, 61.27 cd/m<sup>2</sup>) was presented at the center of the lower background.

The visual stimulus was a black disk (1˚ in diameter) that appeared at the bottom of the gray background, 15˚ to the left (Experiment 1A) or right (Experiment 1B) of the midpoint. The disk moved from left to right (Experiment 1A) or from right to left (Experiment 1B) at a constant speed of 15˚/s. The disk disappeared when its center was at the midpoint or randomly jittered from the midpoint by ±0.3˚. The auditory stimulus was a transient 1000-Hz pure tone without onset or offset intensity ramps, presented via headphones to both ears for 10 ms. Note that previous research has shown that a 10-ms sound can affect audio-visual interaction (e.g., Fujisaki et al., 2004; Ono and Kitazawa, 2011). The sound pressure level was approximately 60–65 dB. The sound was presented 120, 80, or 40 ms before the visual motion offset, simultaneously with the visual offset (0 ms), or 40, 80, or 120 ms after the visual offset. As a control condition, we included trials in which the sound was absent.
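The stimulus parameters above fix how far the disk still had to travel when each sound was played, which is a useful scale for reading the displacement results. The sketch below converts each sound timing into monitor frames and into the disk's position relative to its offset point. It is simple arithmetic on the reported values (100 Hz refresh rate, 15˚/s speed), not an analysis reported by the authors, and the function names are illustrative.

```python
REFRESH_HZ = 100.0       # monitor refresh rate reported above
SPEED_DEG_PER_S = 15.0   # disk speed reported above
SOUND_TIMES_MS = [-120, -80, -40, 0, 40, 80, 120]  # negative = before visual offset


def frames_from_offset(sound_time_ms, refresh_hz=REFRESH_HZ):
    """Number of monitor frames between the sound and the visual offset."""
    frame_ms = 1000.0 / refresh_hz
    return round(sound_time_ms / frame_ms)


def disk_position_at_sound(sound_time_ms, speed_deg_per_s=SPEED_DEG_PER_S):
    """Disk position along its path relative to the offset point, in degrees.

    Negative values mean the disk had not yet reached the offset point.
    Returns None when the sound follows the offset, because the disk is gone.
    """
    if sound_time_ms > 0:
        return None
    return speed_deg_per_s * sound_time_ms / 1000.0


for sound_time in SOUND_TIMES_MS:
    print(sound_time, frames_from_offset(sound_time), disk_position_at_sound(sound_time))
```

At a refresh rate of 100 Hz each frame lasts 10 ms, so every sound timing used here corresponds to a whole number of frames.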

## **Procedure**

Participants started each trial by pressing the space key. The black disk appeared and stayed stationary at the initial position for 500 ms. Participants were asked to observe the disk while keeping their eyes on the fixation cross. After the initial stationary period, the black disk moved at a constant speed of 15˚/s for 1000 ms and then disappeared around the midpoint of the display. A mouse cursor appeared 1˚ above the fixation cross 200 ms after the disappearance of the visual target. The participants were instructed to move the mouse cursor and click the mouse button at the target's visual offset position.

Participants performed 10 practice trials to familiarize themselves with the position judgment task. Then, they performed 10 trials in each combination of conditions for a total of 240 trials (8 sound conditions × 3 visual offset positions × 10 trials). Trials of all conditions were randomly ordered.
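The factorial design described above can be sketched as a randomized trial list. The code is an illustration, not the authors' MATLAB implementation: it assumes the three visual offset positions are −0.3˚, 0˚, and +0.3˚ relative to the midpoint, and the names `build_trials`, `SOUND_CONDITIONS`, and `OFFSET_POSITIONS_DEG` are hypothetical.

```python
import itertools
import random

# Eight sound conditions: silent control plus seven timings (ms relative to visual offset).
SOUND_CONDITIONS = ["silent", -120, -80, -40, 0, 40, 80, 120]
# Assumed reading of the three offset positions (deg relative to the midpoint).
OFFSET_POSITIONS_DEG = [-0.3, 0.0, 0.3]
TRIALS_PER_CELL = 10


def build_trials(seed=None):
    """Return every sound x offset combination repeated 10 times, in random order."""
    rng = random.Random(seed)
    trials = [
        {"sound": sound, "offset_deg": offset}
        for sound, offset in itertools.product(SOUND_CONDITIONS, OFFSET_POSITIONS_DEG)
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=1)
print(len(trials), trials[:3])
```

Passing a fixed `seed` makes the order reproducible for a given participant; omitting it gives a fresh random order on each call.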

## **Statistical analysis**

The data were submitted to a two-way mixed-design analysis of variance (ANOVA) followed by *post hoc* comparisons with the *Bonferroni* correction with the alpha level set at 0.05.

## **RESULTS AND DISCUSSION**

We calculated the average deviation of the perceived visual offset position from the physical visual offset point for each sound condition. **Figure 2** shows the combined results of Experiments 1A and 1B. The horizontal axis represents the different sound conditions. The vertical axis represents the perceived deviation from the actual physical visual offset position. A negative value in the deviation from visual offset (*Y* -axis) means that the perceived visual offset position was behind the actual visual offset position.

We performed a two-way mixed-design ANOVA, in which the visual field of start position was the between-subjects factor and the timing of the auditory signal was treated as the within-subject factor. The main effect of the visual field of start position was not significant [*F*(1,30) = 0.499, *p* = 0.485]. The main effect of the timing of the auditory signal was significant [*F*(7,210) = 36.261, *p* < 0.001]. There was no significant interaction between the visual field of start position and the timing of the auditory signal [*F*(7,210) = 0.48, *p* = 0.849]. Overall, these results suggest that the earlier the auditory signal was presented, the farther backward the perceived visual offset position was shifted.

**Figure 2.** A negative value in the deviation from offset (*Y*-axis) means that the perceived visual offset position was behind the actual visual offset position. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each presentation. Data points marked with \* indicate that the perceived positions differ from 0.



**Table 1.** Values marked with \* indicate that the perceived displacements differ significantly from zero.

Then, we compared the cell means of the perceived visual offset position against zero to test whether there was a significant displacement from the actual position in each condition (**Table 1**). The adjusted alpha level was 0.006 (0.05/8) when comparing the cell means against zero. In Experiment 1A, only the −120, −80, and −40 ms conditions significantly differed from zero [*t*(15) = 5.69, *t*(15) = 5.88, and *t*(15) = 5.89, respectively; all *p* < 0.006]. In Experiment 1B, the −120 and −80 ms conditions differed significantly from zero [*t*(15) = 6.57 and *t*(15) = 5.27, respectively; *p* < 0.006]. Thus, we confirmed that when the auditory signal was presented before the physical offset of the visual stimulus, the visual offset position tended to be perceived as behind the actual physical visual offset position. Conversely, no significant displacement was found in the 0, 40, 80, and 120 ms conditions, implying that the auditory signal did not produce an effect when presented after or at the moment of the visual motion offset.

We also compared the cell means of each condition in which an auditory signal was presented to the cell mean of the silent condition. The adjusted alpha level was 0.007 (0.05/7). In Experiment 1A, the perceived visual offset positions in the −120, −80, and −40 ms conditions differed from that in the silent condition [*t*(15) = 4.46, *t*(15) = 4.23, and *t*(15) = 3.34, respectively; all *p* < 0.007]. In Experiment 1B, the perceived visual offset positions in the −120, −80, and −40 ms conditions differed from that in the silent condition [*t*(15) = 5.24, *t*(15) = 5.01, and *t*(15) = 3.30, respectively; all *p* < 0.007]. We observed that the silent condition did not differ from the conditions in which the auditory signal was presented after physical visual offset in either Experiment 1A [*t*(15) < 1.11, *p* > 0.05] or 1B [*t*(15) < 1.05, *p* > 0.05].
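The adjusted alpha levels and the form of the *t* statistics reported above can be reproduced as follows. This is a minimal sketch: the deviation values are hypothetical placeholders, not data from the study, and converting *t* to a *p* value is left to a statistics library. Comparisons against the silent condition can be computed the same way by passing each participant's difference from the silent condition.

```python
import math
import statistics


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level under the Bonferroni correction."""
    return alpha / n_comparisons


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for testing the mean of values against mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error, n - 1


# 8 comparisons against zero and 7 comparisons against the silent condition.
print(round(bonferroni_alpha(0.05, 8), 3), round(bonferroni_alpha(0.05, 7), 3))

# HYPOTHETICAL per-participant deviations (deg) for one sound condition.
hypothetical_deviations = [-0.9, -0.4, -0.7, -1.1, -0.3, -0.8, -0.6, -0.5]
t_value, dof = one_sample_t(hypothetical_deviations)
print(f"t({dof}) = {t_value:.2f}")
```

A backward displacement gives a negative *t* with this sign convention.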

The lack of RM in the present experiments is notable, but similar findings have been reported in several previous studies in which observers were instructed to maintain fixation. Previous research has also indicated that fixation decreases RM for targets with smooth and continuous motion (Kerzel, 2000). It is possible that we did not observe RM in Experiment 1 because we used visual stimuli with smooth and continuous motion. However, RM has also been observed for targets with implied motion and for frozen-action photographs that do not elicit eye movements (Kerzel, 2003; Hubbard, 2005, 2006). Although we emphasized to participants the importance of maintaining focus on the fixation cross, we did not record eye movements. To examine whether eye movements might have played a major role in the present experiment, we performed a supplementary experiment using the same stimuli as in Experiment 1A, in which participants (*N* = 5) were free to move their eyes. The results showed the same pattern as Experiment 1A [*F*(7,28) = 8.028, *p* < 0.001]. We observed a tendency toward greater backward displacement when the sound was presented earlier. Therefore, the lack of RM in the present study cannot be explained completely by the instruction to maintain fixation. The lack of RM might be due partially to the shorter delay between the target offset and the appearance of the mouse cursor. Kerzel et al. (2001) showed that RM was larger with a longer delay between the target and probe. The delay was 200 ms in our study, whereas it was 500 ms in Teramoto et al.'s study.

In the present study, it is more likely that the perceived timing of visual motion offset was attracted toward the timing of the presentation of the transient sound when the sound was presented before the physical visual motion offset, which resulted in the decreased magnitude of RM and consequently induced backward displacement. When the transient sound was presented after the physical visual motion offset, the perceived visual offset position of the visual target did not differ from the condition in which the sound was absent. In addition, our results also imply that this effect might not be confined to a visual stimulus presented at the periphery that moves to the foveal region. However, these issues require further empirical examination.

We also analyzed the average response times for completing the mouse-pointing task in each trial. Response times were not affected by different auditory stimulus timings [visual field of start position, *F*(1,30) = 0.967; timing of the auditory signal, *F*(7,210) = 0.873; interaction, *F*(7,210) = 0.250; all *p* > 0.05].

## **EXPERIMENT 2**

In Experiment 2, we investigated whether the spatial contingency between auditory signals and visual events would modulate the auditory influence on the perceived offset position of the visual motion. We presented a lateralized transient auditory signal to either the left or the right ear with the same visual stimuli used in Experiment 1. The visual target appeared in the left visual field and moved rightward in Experiment 2A. In Experiment 2B, the visual target appeared in the right visual field and moved leftward. The visual field of the start position was treated as a between-subjects variable to reduce the task load for each participant.

## **METHOD**

## **Participants**

There were 15 paid volunteers each in Experiment 2A (10 males, 5 females) and Experiment 2B (9 males, 6 females). Their ages ranged from 19 to 31 years (mean = 21.79) in Experiment 2A and from 19 to 25 years (mean = 21.72) in Experiment 2B. All participants were right-handed by self-report, except for one left-handed participant in Experiment 2B. All of the participants had normal or corrected-to-normal visual acuity and were naïve as to the purpose of this study.

## **Stimuli and procedure**

The apparatus and visual stimuli of Experiment 2 were the same as those of Experiment 1 except for the following points. In Experiment 2, the auditory signals were presented to the left ear in half of the trials and to the right ear in the other half. The sound was presented 40, 80, or 120 ms before or after the offset of the visual target or at the same time as the visual offset. Since Experiment 1 showed no differences between the silent condition and any condition in which the auditory signal was presented after the physical visual offset, we did not include a silent condition in Experiment 2. After 10 training trials, the participants performed 10 trials of each experimental condition for a total of 420 trials (2 sound positions × 3 visual offset positions × 7 timings × 10 trials). Trials of all conditions were randomly ordered, so that in each trial the auditory signal might be presented to the same or the opposite side as the visual target origination.
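The factorial structure described above can be sketched in a few lines of Python. This is only an illustration of how the 420 trials arise and how they might be shuffled; it is not the authors' experiment code. The ear positions, timings, and repetition count come from the text, whereas the labels for the three visual offset positions and the function name `build_trials` are placeholders introduced here.

```python
import itertools
import random

# From the text: two ear positions, seven sound timings (ms relative to the
# visual offset), ten repetitions per condition.
SOUND_SIDES = ("left", "right")
SOUND_TIMINGS_MS = (-120, -80, -40, 0, 40, 80, 120)
REPETITIONS = 10
# The three visual offset positions are not given in this section, so these
# labels are hypothetical placeholders.
OFFSET_POSITIONS = ("pos_1", "pos_2", "pos_3")


def build_trials(seed=None):
    """Return a randomly ordered list of (side, offset_position, timing_ms)."""
    conditions = list(
        itertools.product(SOUND_SIDES, OFFSET_POSITIONS, SOUND_TIMINGS_MS)
    )
    trials = conditions * REPETITIONS  # every condition repeated ten times
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 2 * 3 * 7 * 10 = 420
```

Passing a seed makes the order reproducible for a given participant; whether the original software seeded its randomization is not stated.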

## **Statistical analysis**

The data were submitted to a three-way mixed-design ANOVA followed by *post hoc* comparisons with *Bonferroni* correction (*p* < 0.05).

#### **RESULTS AND DISCUSSION**

The top and bottom panels of **Figure 3** show the results of Experiments 2A and 2B, respectively. We conducted a three-way mixed-design ANOVA, in which the visual field of the start position was the between-subjects factor and the sound contingency and the timing of the auditory signal were within-subject factors. The sound contingency indicates whether the auditory signal was presented to the same or the opposite side as the originating position of the visual target. We found significant main effects of auditory timing [*F*(6,168) = 77.48, *p* < 0.001] and sound contingency [*F*(1,28) = 43.526, *p* < 0.001]. The main effect of the start position approached significance [*F*(1,28) = 4.127, *p* = 0.052]. There were no significant interactions [visual field × sound contingency, *F*(1,28) = 0.049; visual field × auditory timing, *F*(6,168) = 0.52; sound contingency × auditory timing, *F*(6,168) = 1.540; visual field × sound contingency × auditory timing, *F*(6,168) = 1.010; all *p* > 0.05]. The results imply that the timing of the auditory signal affected the perceived visual offset position, replicating the findings of Experiment 1. Furthermore, when the sound was presented in the same hemifield as the visual target's start position, the effect was enhanced (i.e., more backward displacement). In addition, the results of Experiment 2B seemed to shift positively along the *Y*-axis, suggesting the possibility that RM was generally more pronounced in Experiment 2B.
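The degrees of freedom reported above follow directly from the design (30 participants in two groups, seven timings, two contingency levels). The short Python sketch below reproduces that bookkeeping; the function names are introduced here for illustration, and the calculation assumes uncorrected degrees of freedom (no sphericity adjustment), which the text does not state explicitly.

```python
def within_effect_df(n_participants, n_groups, within_levels):
    """Degrees of freedom for a within-subject effect and its error term.

    n_participants: total number of participants across all groups
    n_groups: number of between-subjects groups (1 for a purely
        repeated-measures design)
    within_levels: number of levels of the within-subject factor
    """
    df_effect = within_levels - 1
    df_error = df_effect * (n_participants - n_groups)
    return df_effect, df_error


def between_effect_df(n_participants, n_groups):
    """Degrees of freedom for the between-subjects effect and its error term."""
    return n_groups - 1, n_participants - n_groups


print(within_effect_df(30, 2, 7))  # auditory timing: (6, 168)
print(within_effect_df(30, 2, 2))  # sound contingency: (1, 28)
print(between_effect_df(30, 2))    # visual field of start position: (1, 28)
```

The same rule gives (6, 84) for a single group of 15 participants and seven timings, which matches the values reported for Experiment 3.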

## **Experiment 2A**

We compared the cell means of the perceived visual offset position in Experiment 2A against zero to test whether there was a significant displacement from the actual visual offset position in each condition (**Table 2**). The adjusted level of the *p*-value required for significance with *Bonferroni* correction is 0.007 (0.05/7) when comparing the cell means to zero. When the auditory signal was presented to the same side from which the visual target appeared, the perceived visual offset positions significantly differed from zero in the −120, −80, and −40 ms conditions [*t*(14) = 5.02, *t*(14) = 4.25, and *t*(14) = 4.09, respectively; all *p* < 0.007]. When the auditory signal was presented in the hemifield opposite from which the target appeared, the perceived visual offset positions significantly differed from zero in the −120, −80, and −40 ms conditions [*t*(14) = 4.72, *t*(14) = 4.04, and *t*(14) = 4.02, respectively; all *p* < 0.007].

… was behind the actual visual offset position. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each presentation. Data points with an \* mark indicate that the perceived positions differ from 0.


Values with \* mark indicate that perceived displacements significantly differ from zero.

We also compared the cell means between different sound positions in the −120, −80, and −40 ms conditions in which significant displacements were observed. The adjusted level of the *p*-value required for significance with *Bonferroni* correction was 0.017 (0.05/3). When the auditory signal was presented to the same side as the visual target origination, the backward displacement was larger [*t*(14) = 2.44 and *t*(14) = 2.51 for the −120 and −80 ms conditions, respectively; both *p* < 0.017].
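The adjusted thresholds quoted in this section are the family-wise level divided by the number of comparisons. A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the comparison uses the unrounded threshold, whereas the text reports values rounded to three decimals.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return family_alpha / n_comparisons


def is_significant(p_value, family_alpha, n_comparisons):
    """True if p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_comparisons)


print(round(bonferroni_alpha(0.05, 7), 3))  # 0.007, seven timings vs. zero
print(round(bonferroni_alpha(0.05, 3), 3))  # 0.017, three pairwise comparisons
```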

#### **Experiment 2B**

The results of Experiment 2B were different from Experiment 2A when comparing the cell means to zero (**Table 2**). The adjusted level of the *p*-value required for significance with *Bonferroni* correction is 0.007 (0.05/7) when comparing the cell means to zero. In Experiment 2B, significant forward displacements (i.e., RM) were observed in the conditions with 40, 80, and 120 ms delays and with the sound presented to the side opposite the visual target origination [*t*(14) = 3.45, *t*(14) = 3.27, and *t*(14) = 3.51 for the 40, 80, and 120 ms conditions, respectively; all *p* < 0.007]. However, there was no significant difference among these three cell means (all *p* values > 0.017, the *p*-value required for significance with *Bonferroni* correction is 0.05/3 = 0.017). RM was observed when the sound was presented to the opposite side, implying that when the auditory signal was presented to the opposite side (i.e., to the side toward which the visual target moved), it attracted the offset position of the visual target, which resulted in larger forward displacements.

Conversely, significant backward displacements were observed only in the −120 and −80 ms conditions when the sound occurred on the same side as the visual target [*t*(14) = 3.25 and *t*(14) = 3.21 for the −120 and −80 ms conditions, respectively; both *p* < 0.007], but the difference between these conditions was not significant [*t*(14) = 1.91, *p* = 0.15]. A significant backward shift was observed only when the sound was presented on the same side as the visual target. It seems that RM (i.e., forward displacement) was more evident in Experiment 2B. However, similar to the results of Experiment 2A, the displacement observed with an auditory signal presented before the physical offset of the visual object reflected the net effect of RM and a process, induced by the transient auditory signal, that decreased RM or even induced backward displacement of the perceived visual offset position in some conditions; this effect was stronger when the auditory signal was presented to the side from which the visual target appeared. Hubbard (2005) indicated that displacement of the perceived visual offset position is influenced by multiple factors such as RM, representational gravity, and characteristics of the context. The results of the present study raise the possibility that a transient auditory signal closely associated with the visual offset also displaces the perceived visual offset position from the physical visual offset position.

A consistent effect of motion direction on RM has not been reported for horizontal motion (Hubbard, 2005). Several previous researchers have suggested that forward displacement of horizontally moving targets is larger in the left visual field (Halpern and Kelly, 1993; White et al., 1993). However, this conflicts with our results, in which larger RM was observed in the right visual field. Since we did not compare rightward and leftward motion within each visual field and with sounds presented from different directions, the present study could not rule out the possibility that motion direction might have influenced the enhancement of RM.

Other research has shown that the attentional mechanisms in the left hemisphere tend to distribute attentional resources within the right visual field, while the attentional mechanisms in the right hemisphere distribute attentional resources across both left and right visual fields. Therefore, there might be a slight bias of spatial attention favoring the left visual field (Mesulam, 1999). It is possible that processing speed or acuity is slightly different between left and right visual fields. However, this issue needs to be tested in future investigations.

Averaged response times for completing the mouse-pointing task in each trial were not affected by different auditory onset times in Experiment 2 [visual field of start position, *F*(1,28) = 0.057; timing of the auditory signal, *F*(6,168) = 1.646; sound contingency, *F*(1,28) = 0.403; visual field × sound contingency, *F*(1,28) = 1.080; visual field × auditory timing, *F*(6,168) = 1.741; sound contingency × auditory timing, *F*(6,168) = 1.522; visual field × sound contingency × auditory timing, *F*(6,168) = 1.344; all *p* > 0.05].

## **EXPERIMENT 3**

The results of Experiment 1 implied that the perceived offset position of a visual moving object shifts backward when a transient auditory signal is presented before the physical visual offset. In addition, we observed larger backward displacement when the auditory signal was presented earlier. We interpreted this finding to mean that the perceived timing of visual motion offset is attracted toward the *timing* of the presentation of transient sound, which results in a decreased magnitude of RM and induces backward displacement. Experiment 2 showed that the perceived visual offset position exhibits a larger shift induced by the spatial information relative to the visual target when the transient auditory signal is presented to the same side as the visual field from which the moving object originates. It is possible to argue that a sound presented to the same side as the visual object might be heard earlier (perhaps because attention might be biased toward the side where the visual object appeared) and consequently shift the perceived visual offset position backward more strongly (i.e., the effect is temporal). Alternatively, the spatial information of the sound relative to the visual target might shift the perceived visual offset position toward the side of the auditory signal without influencing the timing judgment (i.e., a spatial attraction of the visual offset by the auditory signal). Experiment 3 was conducted to examine if the relative timing between visual and auditory events differs when the sound is presented to the same or opposite side as the visual object. Although RM was observed only in Experiment 2B, the results of Experiments 2A and 2B were similar. For this reason, we used the same visual and auditory stimuli as in Experiment 2A, but we asked participants to perform a temporal-order judgment task.

## **METHOD**

#### **Participants**

Fifteen paid volunteers participated in the experiment (9 males, 6 females). Their ages ranged from 20 to 25 years (mean = 22.4) and all were right-handed. All the participants had normal or corrected-to-normal visual acuity and were naïve as to the purpose of this study.

#### **Stimuli and procedures**

The apparatus and stimuli were identical to those used in Experiment 2A. Participants were asked to focus on the fixation cross and observe the moving object. The transient auditory signal was presented 120, 80, or 40 ms before the visual offset; synchronous with the visual offset; or 40, 80, or 120 ms after the visual offset. The participants were asked to judge whether the auditory signal was presented before or after the offset of the moving disk. After 10 training trials, 10 experimental trials in each condition were presented for a total of 420 trials (2 sound positions × 3 visual offset positions × 7 sound timings × 10 trials). Trials of all conditions were randomly ordered.
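The dependent measure for this task is the proportion of "target disappeared first" responses at each sound timing. The Python sketch below shows one way such proportions could be tallied from trial-level responses. The data layout, the function name `proportion_target_first`, and the toy responses are assumptions made for illustration; they are not taken from the study.

```python
from collections import defaultdict


def proportion_target_first(responses):
    """Proportion of "target disappeared first" responses per sound timing.

    responses: iterable of (timing_ms, target_first) pairs, where timing_ms is
    the sound timing relative to the visual offset and target_first is True
    when the participant judged that the target disappeared before the sound.
    """
    counts = defaultdict(lambda: [0, 0])  # timing -> [target-first count, total]
    for timing_ms, target_first in responses:
        counts[timing_ms][0] += int(target_first)
        counts[timing_ms][1] += 1
    return {t: first / total for t, (first, total) in sorted(counts.items())}


# Toy responses, invented purely to exercise the function.
toy = [(-120, False), (-120, False), (0, True), (0, False), (120, True), (120, True)]
print(proportion_target_first(toy))  # {-120: 0.0, 0: 0.5, 120: 1.0}
```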

### **Statistical analysis**

The data were submitted to a two-way repeated-measures ANOVA.

## **RESULTS AND DISCUSSION**

**Figure 4** shows the results of Experiment 3. A two-way repeated-measures ANOVA revealed that the main effect of sound timing was significant [*F*(6,84) = 18.214, *p* < 0.001], while the main effect of sound contingency was not significant [*F*(1,14) = 0.90, *p* = 0.358]. No interaction was observed between sound timing and sound contingency [*F*(6,84) = 0.735, *p* = 0.623]. Thus, the proportion of "target disappeared first" responses increased with the delay of the auditory signal, and more importantly, the proportion of these responses did not differ between the same-side and opposite-side conditions. This suggests that the spatial information of the auditory signal did not affect the judgment of relative timing between auditory events and visual events. Therefore, the enhanced displacement induced in Experiment 2 by a sound presented in the same visual field as the visual target resulted from the spatial information of the sound relative to the visual target, which produced a larger spatial attraction of the visual offset. The effect of a sound's spatial information did not interact with the effect of the sound's temporal information.

**FIGURE 4 | Results of Experiment 3.** The horizontal and vertical axes represent the different sound presentation conditions and the proportion of "target disappeared first" responses, respectively. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each presentation.
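The figure captions refer to within-participants SEMs. One common way to obtain them is the normalization described by Cousineau (2005): remove each participant's overall mean, add back the grand mean, and then compute the SEM per condition. The Python sketch below illustrates that procedure under the assumption of a balanced design with no further correction; whether the authors computed their error bars exactly this way is not stated, and the function name and toy data are introduced here.

```python
from math import sqrt
from statistics import mean, stdev


def within_participant_sems(data):
    """Per-condition SEMs after removing between-participant variability.

    data[p][c] is the score of participant p in condition c (balanced design).
    """
    grand_mean = mean(mean(row) for row in data)
    # Subtract each participant's own mean and add the grand mean back.
    normalized = [[x - mean(row) + grand_mean for x in row] for row in data]
    n = len(data)
    # zip(*normalized) iterates over conditions (columns).
    return [stdev(column) / sqrt(n) for column in zip(*normalized)]


# Toy data: two participants who differ only by a constant offset, so all
# within-participant variability vanishes after normalization.
print(within_participant_sems([[1, 2, 3], [3, 4, 5]]))  # [0.0, 0.0, 0.0]
```

Because differences in overall level between participants are removed, these error bars reflect only the variability relevant to comparisons across conditions.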

Averaged response times for completing the temporal-order judgment task in each trial were not affected by the auditory timing or sound contingency in Experiment 3 [sound contingency, *F*(1,14) = 0.852; auditory timing, *F*(6,84) = 1.229; interaction, *F*(6,84) = 1.205; all *p* > 0.05].

## **GENERAL DISCUSSION**

The present study reports several novel findings. First, a transient auditory signal presented before the visual offset of a moving object shifted the perceived visual offset position backward as if it truncated the visual trajectory (Experiment 1). Second, when the auditory signal was lateralized, the sound's spatial information (on the same or opposite side as the visual target) influenced the perceived visual offset position; the visual offset position tended to be attracted toward the side of the sound presentation (Experiment 2). Third, the spatial information of the lateralized sound did not influence the judgment of visual offset timing, implying that the effect of the lateralized sound in Experiment 2 was mainly in the spatial domain (Experiment 3). Fourth, the effect of the lateralized sound was different for visual targets starting from the left or right visual field. For a visual target appearing in the left visual field and moving rightward, RM was not observed, and only a sound presented before physical visual offset shifted the perceived visual offset position backward. However, a lateralized sound from the same direction as the visual target shifted the perceived visual offset position toward the side of the presentation of the sound more strongly than the backward shift observed with lateralized sound from the opposite visual field. For a visual target appearing in the right visual field and moving leftward, RM was observed when the auditory signal was presented from the opposite direction after physical visual offset. When the auditory signal was presented before the physical visual offset, RM was not observed, while the backward displacement of the perceived visual offset position was enhanced by sound from the same direction as the visual target (Experiment 2). We interpret these results to mean that the auditory signal may influence the visual offset position of the moving object through both spatial and temporal processes. Temporal information of the auditory signal influenced the perceived offset timing of the visual object as if it truncated the visual trajectory. However, when the auditory signal occurred in the same hemifield as the visual target, enhanced backward displacement was observed relative to when the auditory signal occurred in the hemifield opposite to the visual target.

The results of Teramoto et al. (2010) suggest that the close association between the auditory and visual signals accomplished by onset synchrony is necessary for the presented sound to have an effect on the perceived position of a visual offset. Their results also suggest that a transient auditory signal presented around the moment of visual motion offset has no influence on perceived visual offset position when another sound is presented at the onset of the motion. The findings of the present study seem inconsistent with those of Teramoto et al. (2010). One possible source of discrepancy between their findings and ours is that Teramoto et al. (2010) presented the auditory signals at both the onset and offset of the visual motion, whereas we presented an auditory signal only at or near the offset of the visual motion. The auditory signal at the onset of the motion might start a duration estimation process that may counteract the auditory influence on the visual offset. To address this question, we ran a supplementary experiment (*N* = 5), presenting a sound at both the onset and offset of the visual motion. However, the same results as in Experiment 1 were again observed [*F*(7,28) = 7.016, *p* < 0.01]. A tendency for larger backward displacement was observed when the sound was presented before the visual target offset. The pattern showed that perceived visual offset positions were not influenced by sound presented after the visual motion offset. Therefore, it seems that the absence of an offset-sound effect in Teramoto et al.'s (2010) study is not attributable to the sound presented at the onset.

Another source of discrepancy could be differences in how responses were collected. We asked participants to report the visual offset position directly by clicking a mouse, and we observed backward displacement in all experiments, but RM only in Experiment 2B (around 0.2˚), whereas Teramoto et al. (2010) measured visual offset by probe judgments and observed robust RM (around 0.3˚–0.6˚). Previous research has shown that RM is larger when participants report the offset position by pointing with a mouse (Kerzel et al., 2001). This enhancement might result from the separate processes or representations subserving motor actions and cognitive judgments (Goodale and Milner, 1992). While Goodale and Milner's model suggests that hand movement is not "deceived" by visual illusion, other researchers have indicated that the mental extrapolation that calculates a visual object's position by analyzing its speed and trajectory occurs in the motor system to a larger degree than in the visual system (Yamagishi et al., 2001; Kerzel, 2003). Therefore, more localization errors occur with motor-oriented measurement methods. A response that depends more upon perception-for-action might lead to larger localization errors for both forward and backward displacement. In the present study, backward displacement was induced by transient sounds, and a response depending more upon perception-for-action might allow for a stronger effect of the auditory signal on the perceived offset position of the visual stimulus than a response depending more upon perception-for-identification.

Previous research has shown that a transient visual stimulus presented at the moment of visual motion offset affects the perceived offset position of a visual target. Müsseler et al. (2002) presented a visual flash simultaneously with the offset of a moving visual target and asked participants to judge the target's position when the flash appeared. They observed no RM; rather, the perceived visual offset position was displaced backward compared to the actual visual offset position, similar to our observations. Although the stimulus parameters and procedure were different, their findings point to the possibility that intramodal interaction (the effect of visual transients on visual localization) might be extended into audiovisual interaction. That is, both visual and auditory transient signals presented before the visual motion offset could induce backward displacement of perceived visual offset position. This will be an interesting avenue for future investigations.

In addition, when a briefly presented stationary visual stimulus was aligned with the final portion of the moving target's trajectory, memory for the location of the stationary object was displaced in the direction of motion of the moving target (Hubbard, 2008). It was suggested that RM of the moving target influences the representation of the stationary object's location, and that this influence results in the stationary object being displaced in the direction of motion of the moving target. It also implies the possibility of a general mechanism coding both location and motion information. Therefore, information about the stationary object and the moving object may influence the perceived position of each other.

The auditory system is generally superior to the visual system in terms of temporal perception, and the visual system is generally superior to the auditory system in terms of spatial perception. Therefore, vision can provide more accurate spatial information, while audition can provide more accurate temporal information. The modality precision hypothesis suggests that the modality with the highest precision with regard to the required task tends to be dominant in multimodal interactions (Shipley, 1964; Welch and Warren, 1980, 1986; Spence and Squire, 2003). In the present study, we found that the perceived visual offset position was shifted backward when the auditory signal was presented before the visual offset. This implies that the perceived timing of the visual motion offset was attracted to the presentation timing of the auditory signal, consequently inducing the backward displacement. This is consistent with auditory superiority for temporal perception (e.g., the temporal ventriloquism effect; Vroomen and de Gelder, 2004). On the other hand, our results also suggest that the effect of lateralized sound was spatial rather than temporal, a finding that cannot be explained by the modality precision hypothesis. There seem to exist significant spatial effects from audition to vision, particularly when blurred visual stimuli which are poorly localized are presented (Alais and Burr, 2004). Teramoto et al. (2012) have demonstrated that spatial aspects of sound can modulate visual motion perception, suggesting that visual and auditory modalities influence each other in motion processing. Thus, taken together, our results indicate that auditory information influences visual perception (at least for the perceived position of a visual offset) via both temporal and spatial processes.
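The idea that the more precise modality dominates is often formalized as reliability-weighted (inverse-variance) cue combination, the kind of account associated with Alais and Burr (2004). The Python sketch below shows that standard formulation only as an illustration of the reasoning; it is not a model fitted or proposed in the present study, and the function name and all numbers are invented.

```python
def precision_weighted_estimate(estimates, sigmas):
    """Combine unimodal estimates, weighting each by its precision (1 / sigma^2).

    estimates: unimodal estimates of the same quantity (e.g., a position)
    sigmas: standard deviations (noise levels) of those estimates
    Returns the combined estimate and the normalized weights.
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total = sum(precisions)
    combined = sum(p * x for p, x in zip(precisions, estimates)) / total
    weights = [p / total for p in precisions]
    return combined, weights


# Hypothetical values: a precise visual estimate (sigma = 1) and a noisier
# auditory estimate (sigma = 3) of the same location.
combined, weights = precision_weighted_estimate([0.0, 10.0], [1.0, 3.0])
print(round(combined, 3), [round(w, 3) for w in weights])  # 1.0 [0.9, 0.1]
```

With these numbers the combined estimate stays close to the visual one, but the weights reverse when the visual estimate is made noisier than the auditory one, which is the situation described for blurred, poorly localized visual stimuli.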

Maus and Nijhawan (2009) proposed a dual-process model to explain differences between how the visual system processes the positions of abruptly vanishing objects and gradually disappearing objects. The first process calculates the position of a moving object in the near future by analyzing its speed and trajectory. When the moving object disappears abruptly, the second process modulates the forward displacement. This modulation mechanism relies on accurate spatial information provided by the transient of the abrupt offset of the moving object. A stronger transient leads to more accurate localization of the moving object because it aids position representation by employing the retinal off-transient to win the competition for perceptual awareness. The present findings could be interpreted to mean that the modulation mechanism relies not only on visual information provided by the retinal off-transient, but also on information provided by a transient auditory signal that is temporally and spatially close to the visual motion offset. If the transient auditory signal is firmly associated with the visual motion offset, the neural system also uses temporal and spatial information provided by the auditory signal to modulate possible overshoots. The present study suggests the possibility that the visual system integrates auditory information presented before and after the offset of visual motion.

However, the results of Teramoto et al.'s study were not consistent with Maus and Nijhawan's account or with the present study. Teramoto et al. suggest that a sustained sound during visual motion is necessary for audiovisual integration to enhance or reduce RM. Conversely, the present study observed an effect of a transient auditory signal on the perceived visual offset. However, discrepancies in experimental paradigm, parameters, and stimuli prevent direct comparisons between Teramoto et al.'s study and either the present study or Maus and Nijhawan's account. Perhaps a sustained sound influences audiovisual integration differently from a transient sound. This will also be an interesting topic for future investigations.

Nevertheless, signals from different sensory modalities are not combined indiscriminately. We observed the backward displacement mainly when an auditory signal was presented 120 or 80 ms before the actual visual offset. However, we observed that the spatial information of an auditory signal modulated RM only when sound was presented 80 ms after physical visual offset (Experiment 2B). This might imply that the temporal window during which the visual system integrates auditory information is approximately 100 ms before and after visual motion offset. This is consistent with the temporal window of sound-induced illusory flash (Shams et al., 2002) and multisensory integration in superior colliculus neurons in the mammalian brain (Meredith et al., 1987).

In conclusion, a transient auditory signal presented before or after the offset of physical motion of a visual stimulus can modulate the perceived visual offset position. The magnitude of the backward or forward shift depends on the spatial relation between the auditory and the visual stimulus. In order to elucidate the underlying mechanism of these results, future experiments should be conducted to investigate how closely visual and auditory information must correspond and whether the auditory effect on visual offset occurs when the visual object moves toward the peripheral field. In the present experiments, the visual field, motion direction, and sound position were confounded, and therefore we cannot rule out the possibility that the observed effects were induced by a combination of these factors. Further investigations are warranted to address this issue.

## **ACKNOWLEDGMENTS**

This research was supported by the Japan Science and Technology Agency (CREST) and the Grants-in-Aid for Scientific Research (23240034) from the Ministry of Education, Culture, Sports, Science and Technology.

## **REFERENCES**

Loftus, G. R., and Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. *Psychon. Bull. Rev.* 1, 476–490.

Maus, G. W., and Nijhawan, R. (2009). Going, going, gone: localizing abrupt offsets of moving objects. *J. Exp. Psychol. Learn. Mem. Cogn.* 35, 611–626.

Meredith, M. A., Nemitz, J. W., and Stein, B. (1987). Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. *J. Neurosci.* 7, 3215–3229.

Mesulam, M. M. (1999). Spatial attention and neglect: parietal, frontal and cingulate contributions to the mental representation and attentional targeting of salient extrapersonal events. *Philos. Trans. R. Soc. Lond. B Biol. Sci.* 354, 1325–1346.


intervals followed by repetitive stimulation. *PLoS ONE* 6:e28722. doi:10.1371/journal.pone.0028722

Pelli, D. G. (1997). The videotoolbox software for visual psychophysics: transforming numbers into movies. *Spat. Vis.* 10, 437–442.

Shams, L., Kamitani, Y., and Shimojo, S. (2002). Visual illusion induced by sound. *Cogn. Brain Res.* 14, 147–152.

Shipley, T. (1964). Auditory flutter-driving of visual flicker. *Science* 145, 1328–1330.

Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., and Kingstone, A. (2002). The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. *Cogn. Brain Res.* 14, 139–146.

Spence, C. (2007). Audiovisual multisensory integration. *Acoust. Sci. Technol.* 28, 61–70.

Spence, C., and Squire, S. B. (2003). Multisensory integration: maintaining the perception of synchrony. *Curr. Biol.* 13, R519–R521.

Teramoto, W., Hidaka, S., Gyoba, J., and Suzuki, Y. (2010). Auditory temporal cues can modulate visual representational momentum. *Atten. Percept. Psychophys.* 72, 2215–2226.

Teramoto, W., Hidaka, S., Sugita, Y., Sakamoto, S., Gyoba, J., and Iwaya, Y. (2012). Sounds can alter the perceived direction of a moving visual object. *J. Vis.* 12, 1–12.

Vroomen, J., and de Gelder, B. (2004). Temporal ventriloquism: sound modulates the flash-lag effect. *J. Exp. Psychol. Hum. Percept. Perform.* 30, 513–518.

Watanabe, K., and Yokoi, K. (2006). Object-based anisotropies in the flash-lag effect. *Psychol. Sci.* 17, 728–735.

Watanabe, K., and Yokoi, K. (2007). Object-based anisotropic mislocalization by retinotopic motion signals. *Vision Res.* 47, 1662–1667.

Watanabe, K., and Yokoi, K. (2008). Dynamic distortion of visual position representation around moving objects. *J. Vis.* 8, 1–11.

Welch, R. B., DuttonHurt, L. D., and Warren, D. H. (1986). Contributions of audition and vision to temporal rate perception. *Percept. Psychophys.* 39, 294–300.

Welch, R. B., and Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. *Psychol. Bull.* 88, 638–667.

Welch, R. B., and Warren, D. H. (1986). "Intersensory interactions," in *Handbook of Perception and Human Performance*, eds K. R. Boff, L. Kaufman, and J. P. Thomas (New York: Wiley), 25.1–25.36.

White, H., Minor, S. W., Merrell, J., and Smith, T. (1993). Representational-momentum effects in the cerebral hemispheres. *Brain Cogn.* 22, 161–170.

Yamagishi, N., Anderson, S. J., and Ashida, H. (2001). Evidence for dissociation between the perceptual and visuomotor systems in humans. *Proc. Biol. Sci.* 268, 973–977.

Zampini, M., Guest, S., Shore, D. I., and Spence, C. (2005). Audiovisual simultaneity judgments. *Percept. Psychophys.* 67, 531–544.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2012; accepted: 01 February 2013; published online: 21 February 2013.*

*Citation: Chien S-e, Ono F and Watanabe K (2013) A transient auditory signal shifts the perceived offset position of a moving visual object. Front. Psychology 4:70. doi: 10.3389/fpsyg.2013.00070*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Chien, Ono and Watanabe. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Adaptation to implied tilt: extensive spatial extrapolation of orientation gradients

## *Neil W. Roach\* and Ben S. Webb*

*Visual Neuroscience Group, School of Psychology, The University of Nottingham, Nottingham, UK*

#### *Edited by:*

*Yuki Yamada, Yamaguchi University, Japan*

#### *Reviewed by:*

*Isamu Motoyoshi, NTT Communication Science Labs, Japan*

*Nicolaas Prins, University of Mississippi, USA*

#### *\*Correspondence:*

*Neil W. Roach, Visual Neuroscience Group, School of Psychology, The University of Nottingham, Nottingham, NG7 2RD, UK e-mail: neil.roach@nottingham.ac.uk*

To extract the global structure of an image, the visual system must integrate local orientation estimates across space. Progress is being made toward understanding this integration process, but very little is known about whether the presence of structure exerts a reciprocal influence on local orientation coding. We have previously shown that adaptation to patterns containing circular or radial structure induces tilt-aftereffects (TAEs), even in locations where the adapting pattern was occluded. These spatially "remote" TAEs have novel tuning properties and behave in a manner consistent with adaptation to the local orientation implied by the circular structure (but not physically present) at a given test location. Here, by manipulating the spatial distribution of local elements in noisy circular textures, we demonstrate that remote TAEs are driven by the extrapolation of orientation structure over remarkably large regions of visual space (more than 20°). We further show that these effects are not specific to adapting stimuli with polar orientation structure, but require a gradient of orientation change across space. Our results suggest that mechanisms of visual adaptation exploit orientation gradients to predict the local pattern content of unfilled regions of space.

**Keywords: adaptation, psychological, tilt aftereffect, texture analysis, orientation, cortical plasticity**

## **INTRODUCTION**

Analysis of orientation structure is fundamental to many aspects of visual perception, including the ability to parse the retinal image into distinct regions and identify the form of different objects. To achieve these goals, the visual system must first encode local orientation signals at different points in the visual field before integrating this information across space. Representation of local orientation is typically associated with primary visual cortex (V1), which is characterized by an orderly mapping of receptive field location and orientation preference across the cortical surface (Hubel and Wiesel, 1968; Blasdel and Salama, 1986; Wandell et al., 2007). Interaction between neighboring neurons provides a potential means to begin extracting orientation structure beyond the spatial constraints of an individual receptive field. For example, long-range excitatory horizontal connections in V1 linking regions of similar orientation preference (Gilbert and Wiesel, 1979, 1983, 1989) have been proposed as a mechanism for integrating along contours (Kapadia et al., 1995, 2000; Li et al., 2006). However, it is likely that more complex and spatially extensive orientation structure analysis relies upon the progressive convergence of V1 outputs in extra-striate visual areas.

Progress is being made toward understanding the types of structure represented at intermediate levels of the processing hierarchy. While the majority of neurons in V2 display spatially homogenous orientation tuning comparable to that seen in V1, sub-populations have been identified that exhibit distinct preferences for orientation discontinuities (Nishimoto et al., 2006; Anzai et al., 2007; Schmid et al., 2009) and texture boundaries (El-Shamayleh and Movshon, 2011). Selectivity to higher order shape properties such as contour curvature (Pasupathy and Connor, 1999, 2001) and polar form (Gallant et al., 1996) has been reported in V4, where neurons have larger receptive fields and begin to show sensitivity to the relative (rather than absolute) positioning of orientations within their receptive fields. These neurophysiological findings are complemented by a growing body of psychophysical studies examining the spatial integration of orientation signals in tasks such as texture segregation (Nothdurft, 1985; Landy and Bergen, 1991), contour detection (Field et al., 1993; Hess et al., 2003), symmetry detection (Dakin and Herbert, 1998; Wilson and Wilkinson, 2002), structure detection (Wilson et al., 1997; Wilson and Wilkinson, 1998; Dakin, 1999; Webb et al., 2008), shape discrimination (Wilson and Wilkinson, 1998; Wilkinson et al., 1998) and contrast detection (Meese and Summers, 2007; Meese et al., 2007; Meese, 2010). Together, this work is providing insight into the mechanisms underpinning human sensitivity to orientation structure of varying complexity and spatial scale.

While the majority of studies in this area tends to focus on the feed-forward pooling of local orientation signals to extract global structure, an important but relatively under-explored question is whether the presence of structure exerts a reciprocal influence on local orientation coding. We know that the responses of single neurons in V1 are strongly modulated by stimulation in the space surrounding the receptive field (e.g., Blakemore and Tobin, 1972; Kapadia et al., 1995, 2000; Webb et al., 2005), but evidence for any selectivity to global orientation structure across large regions of space is currently lacking (Smith et al., 2002). Functional imaging studies have demonstrated that BOLD responses in V1 are systematically suppressed when images contain coherent shapes or objects compared to when they have random orientation structure (Murray et al., 2002; Rainer et al., 2002; Murray, 2004). These findings are consistent with hierarchical predictive coding models, which posit that feedback from higher-level areas acts to remove or "explain away" the predictable components of signals, thereby reducing redundancy in the neural representation (Mumford, 1991; Rao and Ballard, 1999; Spratling, 2010). Predictive coding has been shown to provide an elegant account of a variety of response properties in the retina (Srinivasan et al., 1982), lateral geniculate nucleus (Dong and Atick, 1995; Dan et al., 1996), and V1 (Rao and Ballard, 1999; Spratling, 2010). However, the precise nature of the interaction between local orientation coding and higher-order structure processing remains unclear. For example, in some instances BOLD responses in V1 appear to increase with the presence of orientation structure rather than decrease (Altmann et al., 2003; Kourtzi et al., 2003). It has also been argued that local orientation variability may be the prime determinant of the observed changes in V1 response, rather than the degree of coherent structure *per se* (Dumoulin, 2006).

In a previous psychophysical study, we investigated the impact of global orientation structure on the adaptation of local orientation coding mechanisms (Roach et al., 2008). Following passive exposure to a large, circular grating centered on fixation, observers discriminated the orientation of a small near-vertical Gabor test stimulus presented at different locations along an isoeccentric ring. An annular region of the circular stimulus was occluded during adaptation, ensuring no spatial overlap occurred between the adapting and test patterns. Robust tilt aftereffects (TAEs) were observed in this occluded region, the direction and magnitude of which were consistent with adaptation to the orientation implied by the circular structure (but not physically present) at each test location. Earlier experiments investigating adaptation to partly-occluded visual patterns have been criticized on the basis that aftereffects reported in occluded regions of space might be explainable in terms of a spatial spreading of local orientation adaptation effects from adjacent areas (see Sekuler et al., 1970; Weisstein, 1970). The properties of our spatially "remote" TAEs, however, strongly suggest that they cannot be explained in this manner. Unlike traditional TAEs obtained following local adaptation (e.g., Ware and Mitchell, 1974), we found that remote TAEs were immune to manipulations of the relative spatial frequency of adapting and test patterns across several octaves. This produced an interesting double dissociation: whereas traditional TAEs obtained with matched adapt/test frequencies were on average ∼2.5 times larger than the equivalent remote TAEs, this pattern was reversed when a three-octave difference in spatial frequency was introduced. Spatially remote TAEs were also found to be selective to particular types of global orientation structure. Remote TAEs of comparable magnitude were obtained when observers adapted to radial rather than circular patterns. However, little or no effect was found using simple iso-oriented grating adaptors with equivalent dimensions (Roach et al., 2008), making it unlikely that the remote TAE is driven by local grouping processes (e.g., Sugita, 1999).

These remote TAEs are interesting for several reasons. Although a number of studies have suggested that high-level aftereffects may be inherited from adaptation occurring at early stages of visual processing (Xu et al., 2008, 2012; Dickinson et al., 2010), to our knowledge remote TAEs provide the only evidence suggesting that adaptation of a low-level stimulus property may be induced via feedback from processing at subsequent stages of analysis. In addition, because remote TAEs can be induced by adaptation to some types of orientation structure but not others, investigation of the factors driving this selectivity provides a novel opportunity to gain insight into the integration process itself. In the present study we extend our examination of these effects using texture patterns that enable us to flexibly manipulate the orientation structure present during adaptation.

## **GENERAL METHODS**

#### **OBSERVERS**

Six observers participated in the study: the two authors and four individuals who had previous experience of psychophysical observing but were naive to the specific purposes of the study. All had normal or corrected-to-normal visual acuity.

#### **STIMULI**

Stimuli were generated in Matlab and displayed via a Cambridge Research Systems ViSaGe system on a photometrically calibrated 22-inch Mitsubishi Diamond Pro 2045U CRT monitor. The display resolution was 1024 × 768 pixels, and at the viewing distance of 33.5 cm each pixel subtended a visual angle of 4 arcmin. The frame-rate was 100 Hz and the mean luminance of the display was 39 cd/m².

As depicted in **Figure 1A**, each test stimulus was a Gabor-like patch comprising a 2 c/° sine wave multiplied by a 1.33° diameter isotropic Hanning window. Test stimuli were presented 9.66° to the right and 2.59° above fixation (a polar angle of π/12 and eccentricity of 10°). The peak Michelson contrast of the test stimuli was 0.25.
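As an illustration only, a test patch of this kind can be sketched in Python. The sketch assumes 15 pixels per degree (which follows from the stated 4 arcmin per pixel), a raised-cosine (Hanning) envelope defined on radial distance, and cosine carrier phase. The function name `gabor_patch`, the phase, and the sign convention for tilt are hypothetical and are not taken from the original Matlab code.

```python
import math

PIX_PER_DEG = 15  # 4 arcmin per pixel -> 60 / 4 = 15 pixels per degree


def gabor_patch(tilt_deg, sf_cpd=2.0, diameter_deg=1.33, contrast=0.25):
    """Return a square list of rows holding contrast values around mean luminance.

    tilt_deg is the tilt of the grating bars away from vertical
    (positive = counter-clockwise); this sign convention is an assumption.
    """
    size = round(diameter_deg * PIX_PER_DEG)   # patch width in pixels
    radius = diameter_deg / 2.0                # window radius in degrees
    centre = (size - 1) / 2.0                  # patch centre in pixel units
    tilt = math.radians(tilt_deg)
    patch = []
    for row in range(size):
        y = (centre - row) / PIX_PER_DEG       # degrees, up is positive
        line = []
        for col in range(size):
            x = (col - centre) / PIX_PER_DEG   # degrees, right is positive
            r = math.hypot(x, y)
            # Isotropic Hanning window: near 1 at the centre, 0 at the edge.
            if r < radius:
                window = 0.5 * (1.0 + math.cos(math.pi * r / radius))
            else:
                window = 0.0
            # Luminance varies along the axis perpendicular to the bars.
            u = x * math.cos(tilt) + y * math.sin(tilt)
            carrier = math.cos(2.0 * math.pi * sf_cpd * u)
            line.append(contrast * window * carrier)
        patch.append(line)
    return patch
```

Calling `gabor_patch(1.5)` would give a patch tilted 1.5° counter-clockwise of vertical under the assumed convention; converting contrast values to display luminance is left out of the sketch.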

Adapting stimuli were sequences of dense texture patterns, each formed by combining 5000 local oriented elements. Each oriented texture element was constructed in an identical manner to the test stimuli, but had a spatial frequency of 1 c/°. Different spatial frequencies were chosen for adapting and test stimuli to minimize the contribution of any local adaptation effects (see Roach et al., 2008). Texture elements were assigned a random phase and were centered at a random position within a 48 × 48° square region. Each adapting texture was normalized to ensure the mean luminance remained equal to the background and each had an RMS contrast of 9%. To minimize spatial overlap between the adapting and test stimuli, the contrast of the textures within a 3° radius circular region centered on the test location was set to zero. Beyond this region, contrast was restored gradually via a quarter-cycle cosine ramp over 1.6°. In each experiment, the orientation of each local element was determined by its location within the texture field and the desired orientation structure.
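The zero-contrast region and its ramp can be sketched as a weighting function applied to texture contrast. This is a minimal sketch, not the original implementation: the exact form of the quarter-cycle ramp is not given in the text, so a quarter sine cycle rising from 0 to 1 is assumed, and the name `contrast_weight` is hypothetical.

```python
import math

TEST_X, TEST_Y = 9.66, 2.59   # test location in degrees (right of, above fixation)
HOLE_RADIUS = 3.0             # radius of the zero-contrast region in degrees
RAMP_WIDTH = 1.6              # width of the ramp in degrees


def contrast_weight(x, y):
    """Multiplicative contrast weight (0 to 1) at position (x, y) in degrees."""
    d = math.hypot(x - TEST_X, y - TEST_Y)   # distance from the test location
    if d <= HOLE_RADIUS:
        return 0.0                            # inside the occluded region
    if d >= HOLE_RADIUS + RAMP_WIDTH:
        return 1.0                            # full texture contrast
    t = (d - HOLE_RADIUS) / RAMP_WIDTH        # 0 at the inner edge, 1 at the outer edge
    # Assumed ramp shape: one quarter of a sinusoidal cycle.
    return math.sin(0.5 * math.pi * t)
```

Under this assumption, texture contrast is fully restored 4.6° from the center of the test location.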

#### **PROCEDURE**

Participants were positioned in a chin rest and viewed the stimulus display binocularly in a darkened room. During a testing block, fixation was maintained on a small dot positioned in the center of the screen. Each block began with an initial 30 s period of adaptation, during which the adapting texture was regenerated every 100 ms to avoid the build-up of a retinal afterimage.

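**Figure 1.** **(A)** Test stimulus, presented above and to the right of the fixation dot. **(B)** Noisy concentric adapting stimulus comprising signal and noise elements randomly distributed throughout the texture ("intermixed" condition). **(C)** Concentric adapting stimulus with all noise elements surrounding the test location ("proximal noise" condition). **(D)** Concentric adapting stimulus with all signal elements surrounding the test location ("distal noise" condition).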

Following a 500 ms blank inter-stimulus interval, a test stimulus was presented for 100 ms and the participant indicated whether it appeared to be tilted clockwise or counter-clockwise with respect to vertical. A further 3 s of top-up adaptation to the dynamic adapting texture preceded each subsequent trial. The orientation of the test stimulus was manipulated according to a method of constant stimuli, with 10 presentations of 7 linearly spaced orientations randomly ordered within a testing block. Participants completed 2–4 blocks per adaptation condition and breaks were taken between blocks to allow recovery from adaptation and avoid contamination across conditions. Psychometric functions were constructed for each condition and fitted with a logistic function using a maximum likelihood criterion, allowing estimation of the point of subjective equality (PSE): the physical orientation producing equal proportions of clockwise and counter-clockwise responses. The standard error associated with each PSE estimate was obtained via bootstrapping (Efron and Tibshirani, 1993). TAEs were inferred from the change in PSE relative to a baseline condition with no adaptation, with a positive change indicating a repulsive shift in the perceived orientation of the test stimulus away from the local orientation implied by the adapting structure.
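The PSE estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the logistic is parameterized as p = 1 / (1 + exp(-(b0 + b1·x))), the likelihood is maximized with Newton's method, and the bootstrap resamples binary responses at each orientation level (the text does not specify which bootstrap variant was used). The response counts are invented purely for demonstration, and all names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))          # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(levels, n_cw, n_total, iterations=50):
    """Maximum-likelihood logistic fit; returns (pse, slope) with pse = -b0 / b1."""
    b0, b1 = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(levels, n_cw, n_total):
            p = _sigmoid(b0 + b1 * x)
            w = n * p * (1.0 - p)
            g0 += k - n * p               # gradient of the log-likelihood
            g1 += (k - n * p) * x
            h00 += w                      # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:                   # stop if the Newton step is ill-conditioned
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return -b0 / b1, b1


def bootstrap_pse_se(levels, n_cw, n_total, n_boot=200, seed=0):
    """Standard error of the PSE from resampled response counts."""
    rng = random.Random(seed)
    pses = []
    for _ in range(n_boot):
        resampled = [sum(rng.random() < k / n for _ in range(n))
                     for k, n in zip(n_cw, n_total)]
        pse, _slope = fit_logistic(levels, resampled, n_total)
        pses.append(pse)
    mean = sum(pses) / len(pses)
    return math.sqrt(sum((p - mean) ** 2 for p in pses) / (len(pses) - 1))


# Invented example data: 7 orientations (deg), 30 trials each (3 blocks of 10).
LEVELS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
N_TOTAL = [30] * 7
N_CW = [1, 2, 5, 11, 19, 26, 29]          # hypothetical "clockwise" response counts

if __name__ == "__main__":
    pse, slope = fit_logistic(LEVELS, N_CW, N_TOTAL)
    se = bootstrap_pse_se(LEVELS, N_CW, N_TOTAL)
    print(f"PSE = {pse:.2f} deg, SE = {se:.2f} deg")
```

A TAE would then be the difference between the PSE fitted to an adaptation condition and the PSE fitted to the unadapted baseline.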

## **EXPERIMENT 1**

#### **METHODS**

In our previous study, we induced remote TAEs at spatial locations coinciding with an occluded (i.e., zero-contrast) annular region of a circular adapting stimulus (Roach et al., 2008). To investigate the spatial extent over which this effect holds, we could have manipulated the size of this occluded region. However, this approach would have also altered the overall area and contrast energy of the adaptor, a confound that we wished to avoid. Here we took a different approach using adapting textures composed of a variable proportion of signal and noise texture elements. Signal elements were assigned orientations consistent with a circular structure centered on fixation, whereas noise elements were assigned random orientations independent of their position. At the fixed test stimulus location, the tangential orientation implied by the circular structure was 15° counter-clockwise of vertical. This arrangement was chosen as it has been previously shown to produce the largest remote TAE (Roach et al., 2008).
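The orientation rule for signal and noise elements can be sketched in a few lines. The function names and the convention of expressing orientation in degrees counter-clockwise of vertical, wrapped to [-90, 90), are assumptions made for illustration.

```python
import math
import random


def circular_orientation(x, y):
    """Tilt (deg counter-clockwise of vertical) of the tangent to a circle
    centred on fixation and passing through (x, y)."""
    polar = math.degrees(math.atan2(y, x))   # polar angle of the element
    # The tangent is perpendicular to the radius, so its tilt away from
    # vertical equals the polar angle; wrap the result to [-90, 90).
    return (polar + 90.0) % 180.0 - 90.0


def element_orientation(x, y, is_signal, rng=random):
    """Signal elements follow the circular structure; noise elements are random."""
    if is_signal:
        return circular_orientation(x, y)
    return rng.uniform(-90.0, 90.0)
```

At the test location used here (9.66° right of and 2.59° above fixation), `circular_orientation` returns a value within a hundredth of a degree of 15°, in line with the implied tilt stated above.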

Three variants of the noisy circular adaptor were used to investigate the spatial specificity of the remote TAE. In the "intermixed" condition, signal and noise elements were distributed throughout the texture pattern, as depicted in **Figure 1B**. In the "proximal noise" condition, all of the noise elements were restricted to an annular region surrounding the test location (**Figure 1C**). Conversely, in the "distal noise" condition, all of the signal elements surrounded the test location (**Figure 1D**). In each of the two segregated conditions, the outer radius of the annulus was manipulated. This enabled independent control of the spatial distance of signal texture elements relative to the test site and the overall coherence of the circular orientation structure (defined as the proportion of texture elements that were signal elements). The overall area and RMS contrast of the adapting textures remained constant in all conditions. Note that in some conditions the outer radius of the annulus extended beyond the limits of the square texture region. However, this did not affect coherence calculations, which were based on the relative frequencies of visible texture elements.

#### **RESULTS**

**Figure 2** shows individuals' PSEs plotted as a function of structure coherence for each of the different adapting configurations. When all of the local elements of the adapting texture had an orientation consistent with circular structure (i.e., 100% structure coherence) repulsive TAEs of approximately 2° were found for each observer. These effects are comparable in size to those previously reported with polar grating stimuli (Roach et al., 2008). Reducing the structure coherence of the adapting texture by introducing randomly oriented elements produced a concomitant reduction of the size of the TAE. Interestingly, this effect was largely robust to manipulations of the spatial configuration of signal and noise elements. Restricting the placement of noise elements to an annular region surrounding the test site ("proximal noise" condition, blue symbols) had no greater impact than randomly positioning them throughout the texture pattern ("intermixed" condition, black symbols). Instead, the two sets of data are virtually indistinguishable across the tested range of coherence values. This is remarkable because to achieve 50% structure coherence in the "proximal noise" condition, the required outer radius of the noise annulus was 22°, meaning that every texture element signaling the circular structure was at least this distance away from the center of the test site. Clearly, remote TAEs involve mechanisms operating across large regions of visual space.

Results for the complementary condition in which only signal elements were presented in the space surrounding the test location are indicated by the red symbols ("distal noise" condition). Again, TAE magnitude increased systematically as a function of coherence. However, shifts in PSE in this condition are larger than those observed with random positioning of signal and noise elements, suggesting there may be some contribution of local adaptation driven by regions of the adapting stimulus adjacent to the test region. To avoid this in subsequent experiments, we exclusively used adapting textures containing a proximal region of random orientation noise.

## **EXPERIMENT 2**

#### **METHODS**

Spatially remote TAEs can be induced by adaptation to circular or radial gratings, but not to simple Cartesian gratings (Roach et al., 2008). To better understand the reason for this discrepancy, we next investigated the contribution of two characteristics of polar gratings that do not apply to simple Cartesian gratings: the presence of orientation gradients and reflectional symmetry.

Adapting textures were constructed in which the orientation of each element was a linear function of either its horizontal or vertical position (see **Figure 3A**). The rate of change of orientation across space was manipulated between 0 (no change in orientation) and 10 degrees of rotation per degree of visual angle. In separate conditions, reflectional symmetry was introduced to adapting textures about the meridian of the axis along which the orientation gradient was applied. For example, as shown in **Figure 3B**, textures with a change in orientation as a function of horizontal (x) position were made to be symmetrical about the horizontal midline. Note that in **Figure 3A** some conditions naturally contain reflectional symmetry about the axis orthogonal to the orientation gradient. In all cases, the orientation implied by the gradient at the test site remained constant at 15° counter-clockwise from vertical. To minimize the potential influence of local adaptation brought about by poor fixation stability, randomly oriented noise elements were presented in the region of space surrounding the test site (outer radius of annulus = 9.53°; structure coherence = 90%).

**Figure 2.** PSEs for individual observers as a function of structure coherence. Results for the "intermixed", "proximal noise" and "distal noise" conditions are shown by black, blue and red symbols, respectively. Unadapted (baseline) performance is shown by the unfilled black symbol. Error bars indicate ± 1 standard error.

**Figure 3.** **(A)** Adapting textures defined by a linear change in orientation as a function of either horizontal (x) or vertical (y) position. In each stimulus, the underlying orientation gradient is anchored about the test location (the center of the noisy region). **(B)** Adapting textures with reflectional symmetry. Textures with linear orientation gradients along the x and y axes contain symmetry about the horizontal and vertical meridians, respectively.

#### **RESULTS**

Adapting patterns containing a linear change in orientation across space were effective at inducing remote TAEs. As shown in **Figure 4**, the magnitude of observed effects displayed a tuned dependency on the gradient of the orientation change. For each observer, the largest shifts in PSE occurred for adapting textures in which orientation changed by 5° for each degree of visual angle. Remote TAEs in this gradient condition were comparable in size to those produced with high coherence circular structure in Experiment 1 (∼2°). This similarity is noteworthy, because the orientation gradient of our circular textures was very similar to this peak value at the test location (5.73° change in orientation per unit space along arc at 10° eccentricity). Little or no consistent effect was observed across participants in the absence of a change in orientation across space, or where the orientation gradient approached 10 degrees of orientation change per degree of visual angle.
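The 5.73° figure follows from simple geometry. Around a circle of radius $r$ centered on fixation, the tangent orientation completes one full 360° rotation over a path length of $2\pi r$. Writing $\theta$ for local orientation and $s$ for distance along the arc (symbols introduced here for illustration only), the rate of orientation change at the 10° test eccentricity is:

$$\frac{d\theta}{ds} = \frac{360^\circ}{2\pi r} = \frac{360^\circ}{2\pi \times 10^\circ} \approx 5.73^\circ \text{ of orientation per degree of visual angle}$$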

Comparison of the filled and unfilled symbols in **Figure 4** suggests that remote TAEs are not sensitive to the degree of reflectional symmetry in the adapting stimulus. Although two participants displayed slightly larger remote TAEs when symmetry was applied around the horizontal meridian, in general no systematic pattern was observed across individuals.

## **EXPERIMENT 3**

#### **METHODS**

The importance of having a smooth change in orientation across space was investigated by quantizing the orientation gradient into discrete spatial regions along the gradient axis. A fixed orientation gradient (5° of rotation per degree of visual angle in the horizontal dimension) was used and the width of each spatial band was varied between 0.2 and 36° of visual angle (see **Figure 5**). Within a spatial band, the orientation of all local elements was set to the mean value of the underlying linear gradient. Spatial bands were positioned such that the center of the test region always coincided with the center of a band and was assigned an orientation of 15° counter-clockwise. Note that in the most extreme quantization condition tested (36°), the size of a band coincides with the spatial period of the orientation modulation. In this situation, averaging within a band completely removes the orientation gradient, resulting in an iso-oriented texture.
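The quantization scheme can be illustrated with a short sketch. It assumes that orientation increases counter-clockwise with horizontal position (the direction is not stated in the text), that orientation is expressed modulo 180°, and that the mean of the linear gradient within a band equals its value at the band center. The function names are hypothetical.

```python
TEST_X = 9.66      # horizontal position of the test site (deg)
TEST_ORI = 15.0    # orientation implied at the test site (deg counter-clockwise of vertical)
GRADIENT = 5.0     # deg of rotation per deg of visual angle


def smooth_orientation(x):
    """Orientation (mod 180 deg) of the underlying linear gradient at position x."""
    return (TEST_ORI + GRADIENT * (x - TEST_X)) % 180.0


def quantized_orientation(x, band_width):
    """Orientation assigned to an element at x when the gradient is quantized
    into bands, with one band centred on the test site."""
    band = round((x - TEST_X) / band_width)     # 0 for the band containing the test site
    band_centre = TEST_X + band * band_width
    # For a linear gradient, the mean over a band equals the value at its centre.
    return smooth_orientation(band_centre)
```

With a 36° band width, each band is offset from its neighbor by 5 × 36 = 180°, which is the same orientation, so every element receives the test-site orientation. With 12° and 8° bands, neighboring bands differ by 60° and 40° respectively.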

#### **RESULTS**

The dependency of remote TAEs on the smoothness of the adapted orientation gradient is shown in **Figure 6**. Participants' results were insensitive to the introduction of small discontinuities in the orientation gradient, but the effect was abolished in the coarsest quantization conditions. Neither observer displayed a remote TAE when the width of each iso-oriented band exceeded ∼12° of visual angle. This spatial band width corresponds to one third of the period of the underlying orientation gradient, meaning that each cycle of orientation change is signaled by three discrete orientations separated by 60° (see **Figure 5C**). The largest spatial band width at which consistent remote TAEs were observed was 8°. In this condition, the 40° change in orientation between each band is sufficient to produce a salient boundary percept (see **Figure 5B**), suggesting that perceptual segregation of the adapting texture into distinct regions is not the critical factor at play.

## **DISCUSSION**

The visual system is often characterized as a hierarchy, in which successive stages analyse progressively more complex attributes of the visual scene. Adaptation is thought to occur at multiple stages, giving rise to a rich variety of perceptual aftereffects, ranging from distortions of basic stimulus properties such as local orientation (Gibson and Radner, 1937), to more complex higher-level structures (e.g., Suzuki, 2001; Peirce and Taylor, 2006; Anderson et al., 2007; Gheorghiu and Kingdom, 2007). However, surprisingly little is known about the interplay between changes occurring at multiple stages of analysis—a critical component of understanding adaptation in the visual system as a whole. In the present study we investigated the effect of adapting to images containing spatially-extensive orientation structure on the perceived orientation of small test stimuli. Replicating earlier findings (Roach et al., 2008), we were able to induce repulsive TAEs in regions of the visual field that did not receive input during adaptation. These remote TAEs cannot be explained by a spreading of local orientation adaptation effects across space, such as might result from fixational instability or some form of low-pass filtering by the visual system (i.e., optical or neural blur). Any mechanism of this sort would be highly dependent on the orientation content of the adapting stimulus within the region of space immediately surrounding the tested location. Counter to this prediction, the results of Experiment 1 indicate that placing a large annular field of random orientations around the test site does not prevent the induction of remote TAEs. Rather than being driven by local image content, these biases in perceived orientation are best explained in terms of extrapolation of the adapted orientation structure. Put simply, observers appear to adapt to the local orientation that is "implied" by the orientation structure of a nearby texture. 
Our results show that this extrapolation of adaptation effects to unfilled regions of the visual field is spatially extensive, spanning at least 22° of visual angle.

Previously we found that while adaptation to circular and radial patterns results in robust effects, adaptation to iso-oriented patterns does not. Several researchers have proposed that specialized mechanisms exist in the visual system for processing polar form (e.g., Wilson et al., 1997; Wilson and Wilkinson, 1998; Kurki, 2004; Dumoulin and Hess, 2007; Motoyoshi and Kingdom, 2010). However, the results of Experiment 2 show that remote TAEs are not specific to polar form *per se*, but do require some form of systematic change in orientation across space. The size of the effect is a tuned function of the adapting orientation gradient, peaking when orientation changes by approximately 5° for every degree of visual angle. This pattern of selectivity is an interesting result, as it runs counter to the large body of research showing that the visual system is especially sensitive to linking common orientations across space (for reviews see Hess et al., 2003; Loffler, 2008). In contrast to the tuning of the remote TAE, the ability of human observers to detect (Field et al., 1993; Geisler et al., 2001) and interpolate (Fulvio et al., 2008) contours typically decreases as a function of curvature. It is also established that extensive spatial summation occurs for iso-oriented textures both at and above threshold (Meese and Summers, 2007; Meese et al., 2007; Meese, 2010), so it is intriguing that adaptation to this form of orientation structure does not produce comparable remote aftereffects.

Our results raise the possibility that the visual system may have specialized mechanisms for processing orientation gradients contained in visual textures. Previous studies on orientation gradients have focused primarily on their role in texture segmentation. Perceptual segmentation tends to occur when the change in orientation between two regions (i.e., the orientation gradient across a border) is large relative to changes in orientation occurring within each region (e.g., Landy and Bergen, 1991; Nothdurft, 1992; Wolfson and Landy, 1995). It is possible that the mechanisms supporting texture segmentation may overlap with those underlying remote TAEs. However, the relationship between these phenomena is not clear. Several aspects of our results indicate that the generation of a remote TAE does not depend upon the adapting texture being perceived as a coherent surface. For example, the "proximal noise" and "distal noise" manipulations in Experiment 1 produce clear segmentation of the region surrounding the test site from the remainder of the adapting pattern (see **Figure 1**). Yet these stimuli produced comparable effects to "intermixed" adaptors, which had a more uniform appearance. Also in Experiment 3, remote TAEs were observed when quantization of the orientation gradient resulted in the adapting texture being perceptually segregated into vertical bands (see **Figure 5B**). It is unlikely therefore, that these effects are a simple by-product of the texture segmentation process.

We have previously hypothesized that remote TAEs could arise via a reciprocal interaction between local orientation coding mechanisms in V1 and extrastriate visual areas tasked with extracting global orientation structure (Roach et al., 2008). This suggestion was motivated by psychophysical studies showing that extraction of global form structure involves the pooling of local orientation signals across space and spatial scale (Dakin and Bex, 2001; Achtman, 2003) and anatomical and physiological studies showing a close alignment of feedforward and feedback connections between V1 and extrastriate areas (e.g., Angelucci et al., 2002). We reasoned that if feedback to V1 acts to inhibit *all* local orientation detectors over which a second-stage unit receives input, any resulting effects ought to show a loss of selectivity commensurate with the nature of the feed-forward pooling. According to this idea, adapting to a globally structured stimulus could produce selective suppression of V1 neurons with orientation preferences matching the orientation structure, but where receptive field position and/or spatial frequency tuning dictate that they are relatively unresponsive to the adapting stimulus. This in turn would be sufficient to drive TAEs in regions of space where the adapting pattern was occluded and that are tolerant to changes in spatial frequency (see Roach et al., 2008 for further details). The notion of feedback suppressing activity in V1 that is consistent with orientation structure represented in higher visual areas is broadly suggestive of some form of predictive coding (Mumford, 1991; Rao and Ballard, 1999; Spratling, 2010). However, interpretation of remote TAEs within this framework is not straightforward. One complicating factor is that our paradigm measures changes in orientation perception occurring within a region of space in which the adapting stimulus is occluded. In predictive coding models, feedback functions to remove or reduce activity in lower areas that matches the predictions of higher areas. In the sub-population of V1 neurons representing the occluded region of space, however, there may be little or no activity to "explain." There is some evidence suggesting that feedback continues to contribute to V1 activity in the absence of any feed-forward stimulation, but these effects are not typically accounted for by predictive coding models (see Muckli and Petro, 2013 for a recent review). A second issue is that rather than a modulation of ongoing activity during the presentation of a structured stimulus, explanation of remote TAEs requires a lasting and selective change in neural responsivity. Notionally, visual adaptation can be conceived as predictive coding operating in time. However, formal model implementations of this process are currently lacking.

Why does the visual system adapt to local orientations that are implied by the structure of a texture, but not actually present in the image? One situation in which this could be functionally advantageous is when regions of a scene are temporarily occluded from view. A consequence of having coherent spatial structure in an image is that the composition of occluded areas can be predicted from the surrounding spatial context. Adaptation mechanisms could exploit this predictability to mimic the processing that would have occurred with full viewing of the scene, thereby preparing the visual system for when the occlusion is removed. When a region of the visual field is deprived of input for a long period of time (e.g., in patients with a scotoma), perceptual filling-in of texture and other image properties often occurs (Gerrits and Timmerman, 1969; Ramachandran and Gregory, 1991). Remote TAEs may reflect the operation of a shorter-term neural filling-in process, one that changes the adaptive state of local orientation detectors without generating a conscious percept.

## **ACKNOWLEDGMENTS**

This work was supported by the Wellcome Trust [WT097387], [WT085222]. We thank David McGovern for assistance with data collection.

## **REFERENCES**

*Proc. Biol. Sci.* 265, 659–664. doi: 10.1098/rspb.1998.0344


in the lateral geniculate nucleus. *Netw. Comput. Neural Syst.* 6, 159–178. doi: 10.1088/0954-898X/6/2/003


cortex. *Nature* 280, 120–125. doi: 10.1038/280120a0


*Proc. Biol. Sci.* 274, 2891–2900. doi: 10.1098/rspb.2007.0957


monkey. *Neuroimage* 16, 607–616. doi: 10.1006/nimg.2002.1086


(2005). Early and late mechanisms of surround suppression in striate cortex of macaque. *J. Neurosci.* 25, 11666–11675. doi: 10.1523/JNEUROSCI.3414-05.2005


structure in glass patterns: implications for form vision. *Vision Res.* 38, 2933–2947. doi: 10.1016/S0042-6989(98)00109-6


across the cortical hierarchy: low-level curve adaptation affects high-level facial-expression judgments. *J. Neurosci.* 28, 3374–3383. doi: 10.1523/JNEUROSCI.0182-08.2008

Xu, H., Liu, P., Dayan, P., and Qian, N. (2012). Multi-level visual adaptation: dissociating curvature and facial-expression aftereffects produced by the same adapting stimuli. *Vision Res.* 72, 42–53. doi: 10.1016/j.visres.2012.09.003

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 March 2013; accepted: 25 June 2013; published online: 19 July 2013.*

*Citation: Roach NW and Webb BS (2013) Adaptation to implied tilt: extensive spatial extrapolation of orientation gradients. Front. Psychol. 4:438. doi: 10.3389/fpsyg.2013.00438*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Roach and Webb. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The experience of agency: an interplay between prediction and postdiction

## *Matthis Synofzik<sup>1,2</sup>\*, Gottfried Vosgerau<sup>3</sup> and Martin Voss<sup>4</sup>*

*<sup>1</sup> Department of Neurodegenerative Diseases, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany*

*<sup>2</sup> German Research Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany*

*<sup>3</sup> Institut für Philosophie, Heinrich-Heine-Universität, Düsseldorf, Germany*

*<sup>4</sup> Department of Psychiatry and Psychotherapy, Charité University Hospital and St. Hedwig Hospital, Berlin, Germany*

#### *Edited by:*

*Takahiro Kawabe, Nippon Telegraph and Telephone Corporation, Japan*

#### *Reviewed by:*

*Florian Waszak, Université Paris Descartes, France; Takaaki Kaneko, Kyoto University, Japan*

#### *\*Correspondence:*

*Matthis Synofzik, Department of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Hoppe-Seyler-Str. 3, Tübingen 72076, Germany. e-mail: matthis.synofzik@uni-tuebingen.de*

The experience of agency, i.e., the registration that I am the initiator of my actions, is a basic and constant underpinning of our interaction with the world. Whereas several accounts have underlined predictive processes as the central mechanism (e.g., the comparator model by C. Frith), others have emphasized postdictive inferences (e.g., the *post-hoc* inference account by D. Wegner). Based on increasing evidence that both predictive and postdictive processes contribute to the experience of agency, we here present a unifying but at the same time parsimonious approach that reconciles these accounts: predictive and postdictive processes are both integrated by the brain according to the principles of optimal cue integration. According to this framework, predictive and postdictive processes each serve as authorship cues that are continuously integrated and weighted depending on their availability and reliability in a given situation. Both sensorimotor and cognitive signals can serve as predictive cues (e.g., internal predictions based on an efference copy of the motor command or cognitive anticipations based on priming). Similarly, other sensorimotor and cognitive cues can each serve as *post-hoc* cues (e.g., visual feedback of the action or the affective valence of the action outcome). Integration and weighting of these cues might not only differ between contexts and individuals, but also between different subject and disease groups. For example, schizophrenia patients with delusions of influence seem to rely less on (probably imprecise) predictive motor signals of the action and more on *post-hoc* action cues such as visual feedback and, possibly, the affective valence of the action outcome. Thus, the framework of optimal cue integration offers a promising approach that directly stimulates a wide range of experimentally testable hypotheses on agency processing in different subject groups.

**Keywords: agency, schizophrenia, delusions of influence, control, internal model, efference copy, comparator model, optimal cue integration**

## **INTRODUCTION**

The experience of agency, i.e., the registration that I am the initiator of my actions, is a basic and constant underpinning of our interaction with the world: whenever we grasp, type, or walk, we register the resulting sensory consequences as caused by ourselves. In the last two decades, several different accounts have been proposed to explain the neurocognitive underpinnings of this experience. While some accounts put a stronger emphasis on processes *preceding* the execution of one's respective action for installing an experience of agency, others more strongly emphasize processes *succeeding* one's action. According to this emphasis (which is, of course, not to be seen as an absolute dichotomy, but rather as two poles on a continuous spectrum), these accounts can be grouped in *predictive* and *postdictive* accounts.

Here we discuss the shortcomings of either type of account (if seen in isolation) and propose a framework of the experience of agency that will combine both accounts and stimulate manifold experimentally testable hypotheses. This will be illustrated by the example of impaired agency processing in schizophrenia patients suffering from delusions of control. The framework presented here elaborates on and specifies several recent studies that have likewise investigated and proposed mechanisms of an "integration model of agency" (Wegner and Sparrow, 2004; Bayne and Pacherie, 2007; Fletcher and Frith, 2009; Moore et al., 2009a,b; Moore and Fletcher, 2012). However, in contrast to these earlier studies, this framework brings in a new perspective by starting off from an analysis of predictive vs. postdictive accounts, by focussing not only on delusions of control but rather on the experience of agency in general [in contrast to e.g., Fletcher and Frith (2009)] and by integrating also very recent results on both predictive processes (e.g., Desantis et al., 2012; Hughes et al., 2013) and *post-hoc* processes. Moreover, it proposes a novel scheme of how and at which level different agency cues might be integrated (**Figure 1**). Finally, we describe the affective valence of an action outcome as a relatively novel self-agency cue, which has not been considered in the original predictive and postdictive accounts and which might explain why delusions of control in schizophrenia patients rarely refer to trivial, non-emotional actions, but rather to very specific actions with high affective and moral value.

## **POSTDICTIVE vs. PREDICTIVE ACCOUNTS OF AGENCY**

An example of an influential account of postdictive agency processing is Daniel Wegner's famous account (Wegner, 2002, 2003)<sup>1</sup>. Here, the experience of agency is mainly seen as the product of a fallible *post-hoc* inference *during* and *after* the action has occurred, rather than as the result of an infallible direct access to one's cognitive and motor preparation processes *preceding* one's action. According to this notion, the experience of agency for a particular event comes in degrees: it is strongest (1) when one's action is the exclusive potential cause of the event (exclusivity), (2) when one has prior thoughts or plans about the action (priority), and (3) when the action that occurred matches the action that was planned (consistency). Based on these three criteria, an inference of self-agency is constructed *after* the event has taken place, namely by *postdictive inference*. In this account, low-level motor mechanisms directly related to the motor command and the execution of the action play only a minor role for this inference. Rather, cognitive priors and anticipations, background thoughts, and intention-outcome matching processes (unrelated to very specific and fine-grained characteristics of the actual motor command and the actually executed action) assume a critical role for inferring self-agency. Thus, many inferential accounts, from both Wegner and other authors, also integrate some predictive mechanisms, as they also regard movement priors as important cues for experiencing agency [see e.g., Linser and Goschke (2007)]. However, the experience of agency is nevertheless still essentially seen as the inferential product of a fallible *post-hoc* inference which integrates, inter alia, also cognitive and motor priors. It is not seen as the result of an infallible direct access to one's motor preparation processes *preceding* one's action.

<sup>1</sup>For the following summary of these accounts, we were inspired by the nice overview and comparison given at http://en.wikipedia.org/wiki/Inferring\_self-agency (Accessed 08/11/2012).

On the other end of the spectrum, accounts elaborating on computational models of sensorimotor integration (Sperry, 1950; von Holst and Mittelstaedt, 1950; von Holst, 1954; Wolpert et al., 1995) hypothesize that the experience of agency for a given action essentially arises from internal motor representations associated with generating the movement that *precede* the action. For example, according to the renowned comparator model (Frith et al., 2000; Blakemore et al., 2002), an internal prediction about the sensory consequences of one's actions is generated on the basis of an efference copy of the motor command. These predicted sensory consequences can be compared with the actual sensory state after that action has been initiated. If the actual sensory state matches the predicted one, it is registered as self-caused. In case of a mismatch, it is registered as externally caused. Although, strictly speaking, this account is also not a purely predictive account of agency—as agency registration here requires the sensory feedback of one's action (and thus also a "postdictive" component) for the comparison process—, the predictive mechanism here plays the critical role. The sensory feedback is only required for comparison purposes and does not *per se* carry the critical information for installing an experience of agency. Thus, in contrast to the inferential accounts of agency, the main emphasis here is not on postdictive inferences but on predictive sensorimotor processes.
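The comparison step of the comparator model can be made concrete with a minimal sketch. It is only an illustration of the logic described above, not an implementation taken from Frith et al. (2000) or Blakemore et al. (2002): the scalar sensory states, the linear `forward_model`, and the `tolerance` threshold are all assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class ComparatorResult:
    error: float        # absolute mismatch between prediction and feedback
    self_caused: bool   # True if the mismatch is within tolerance


def forward_model(motor_command: float, gain: float = 1.0) -> float:
    """Hypothetical forward model: predicts the sensory consequence of a
    motor command from its efference copy (here simply a linear gain)."""
    return gain * motor_command


def comparator(motor_command: float, actual_feedback: float,
               tolerance: float = 0.5) -> ComparatorResult:
    """Compare the predicted with the actual sensory state.

    `tolerance` is an illustrative parameter, not a value from the literature.
    """
    predicted = forward_model(motor_command)
    error = abs(actual_feedback - predicted)
    return ComparatorResult(error=error, self_caused=error <= tolerance)


# Feedback close to the prediction is registered as self-caused ...
match = comparator(motor_command=10.0, actual_feedback=10.2)
# ... whereas a large discrepancy is registered as externally caused.
mismatch = comparator(motor_command=10.0, actual_feedback=13.0)
print(match.self_caused, mismatch.self_caused)
```

Note that the sensory feedback enters only as the second argument of the comparison; the decision is driven by how well it matches the prediction, which is the sense in which the predictive mechanism plays the critical role in this account.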

## **PREDICTIVE AND POSTDICTIVE ACCOUNTS EACH HAVE MAJOR LIMITATIONS**

Within the sense of agency, two levels have to be distinguished: the *feeling of agency*, which consists of a non-conceptual, automatic registration of whether I am the agent or not, and the *judgment of agency*, which is the formation of a belief about who the initiator of the movement was [Synofzik et al., 2008a,b; for a partly different distinction between two levels within the sense of agency see Bayne and Pacherie (2007)]. The automatic registration on the level of feeling can lead to the perception of a particular action or sensory event as self-caused. Subsequently and based on this feeling, a judgment might be established (depending on the demands of the context), which takes into account not only the feeling itself but also context information, background beliefs, general social norms, etc.

Both the predictive and the postdictive accounts have difficulties because they do not respect this distinction. For example, the predictive account based on internal predictions about the sensory consequences of one's movements might explain the basic, non-conceptual feeling of agency; but it cannot explain the actual conceptual attribution of an action to one's own or somebody else's agency, i.e., the judgement of agency (Synofzik et al., 2008b). This attribution does not depend only on sensorimotor processes, but requires integration of context cues, background beliefs, and *post-hoc* inferences (Synofzik et al., 2008b). In turn, Wegner's postdictive account and many studies supporting this account seem to focus mainly on conscious conceptual *judgements of agency*. These judgements might indeed essentially build on *post-hoc* inferences based on complex cognitive cues such as prior expectations about the task, background beliefs, social interaction, and context estimations. Nevertheless, this postdictive account cannot give an explanation of the feeling-level of agency.

Moreover, Wegner's postdictive account of agency is confronted with several further challenges and biological or explanatory disadvantages:


But also the Frith'ian predictive account of agency faces several further challenges and biological or explanatory disadvantages (Synofzik et al., 2008b; Vosgerau and Synofzik, 2012):

1. The output of the comparator model is not only insufficient to explain judgements of agency. In some instances, it also cannot fully explain the direct non-conceptual perception of one's actions. A recent study by Wilke and colleagues shows that the perception of one's actions is, in addition to the comparison between internal predictions and sensory feedback, also modulated by external cues presented *post-hoc* (here: the affective valence of action outcomes) (Wilke et al., 2012).

<sup>2</sup>The self-external distinction which also occurs in simple animals and during many continuous sensorimotor operations in humans should, of course, not be equated with the experience of agency, but is only a necessary (yet not sufficient) condition for this experience. This distinction might build the basis and trigger an experience of agency, but is, in itself, only a very basic, mostly non-conscious registration of a low-level registration system (Vosgerau and Newen, 2007; Synofzik et al., 2008a).


## **OPTIMAL CUE INTEGRATION: COMBINING PREDICTIVE AND POSTDICTIVE AGENCY CUES**

If evaluated in isolation, both the predictive and the postdictive account face severe challenges and limitations. And, indeed, there is increasing evidence that the experience of agency does not result from *either* predictive *or* postdictive processes, but that *both* types of processes contribute to the experience of agency, and that they do so in a closely interacting way. For example, Kühn and colleagues suggested that agency judgements incorporate early information processing components (based on the finding that agency judgements were predictable already by the P3a component of tone event-related potentials), and are not purely reconstructive, *post-hoc* evaluations generated only at the time of judgement (Kuhn et al., 2011). In turn, as mentioned above, the perception of one's actions is not fully determined by predictive motor processes, but also modulated by external cues presented *post-hoc*, such as the affective valence of the action outcome (Wilke et al., 2012).

But how might the brain integrate predictive and *post-hoc* cues to form a valid and reliable experience of agency for a given sensory event in a particular situation? A proposal of *optimal cue integration* has recently emerged: the brain constantly integrates several different authorship cues and weights each cue according to its relative reliability in a given situation (Synofzik et al., 2009, 2010; Synofzik and Voss, 2010). The reliability of a cue would be low if its variance is high; in turn, its reliability would be high if it is present in a very salient way and/or highly precise. This notion follows the framework of optimal cue integration established in the field of object perception: according to this framework, no single information signal is powerful enough to convey an adequate representation of a certain perceptual entity under all everyday conditions. Instead, depending on the availability and reliability of a certain information cue, different combination and integration strategies should be used to frame the weighting of sensory and motor signals. Usually, predictive efferent signals such as internal predictions serve as the most reliable and robust agency cues, as they usually provide the fastest and least noisy information about one's own actions (Wolpert and Flanagan, 2001). However, in some situations and subjects, other cues might outweigh or even replace these efferent signals to install a basic registration of agency. For example, if predictive cues like internal predictions are weak or imprecise, *post-hoc* cues like the action feedback or the action outcome should receive a higher weight for determining one's experience of agency. In other words: the variance within one agency cue should be directly related to the reliance on another. 
Thus, optimal cue integration might not only allow robust perception of objects and the world (Ernst and Banks, 2002; Ernst and Bulthoff, 2004) and efficient sensorimotor learning (Kording and Wolpert, 2004), it could also provide the basis for subjects' robust, and at the same time flexible, agency experience in variable contexts (Synofzik et al., 2009; Synofzik and Voss, 2010; Moore and Fletcher, 2012).
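The weighting rule described above can be sketched numerically. The block below uses the standard inverse-variance (maximum-likelihood) combination known from the cue integration literature on object perception, under the assumption that each cue delivers an independent, unbiased, Gaussian-distributed estimate. The function name `integrate_cues` and the example numbers are hypothetical and serve illustration only; the source does not commit to a specific formula for agency cues.

```python
from typing import Sequence


def integrate_cues(estimates: Sequence[float], variances: Sequence[float]):
    """Reliability-weighted (inverse-variance) combination of cues.

    Returns (weights, combined_estimate, combined_variance).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    reliabilities = [1.0 / v for v in variances]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]   # weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    return weights, combined, 1.0 / total


# Cue 0: internal prediction, cue 1: post-hoc feedback (arbitrary units).
# A precise prediction dominates the combined estimate ...
w_precise, est_precise, var_precise = integrate_cues([0.0, 10.0], [1.0, 4.0])
# ... whereas an imprecise prediction shifts the weight to the feedback cue.
w_noisy, est_noisy, var_noisy = integrate_cues([0.0, 10.0], [4.0, 1.0])
print(w_precise, est_precise)
print(w_noisy, est_noisy)
```

Under these assumptions the combined variance is never larger than that of the most reliable single cue, which is one way to read the claim that integration yields a robust and at the same time flexible agency experience.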

Predictive cues entering the cue integration process are in a sensorimotor format and can consist of e.g., an efference copy, internal predictions based on an efference copy of the motor command (Frith et al., 2000) or sensorimotor predictions based on automatic associations [e.g., through subliminal priming (Wegner, 2003; Wegner et al., 2004; Aarts et al., 2005)]. We refer to these different predictive components as "sensorimotor priors" (see **Figure 1**). Some sensorimotor priors can also be influenced by cognitive cues like background beliefs or knowledge about the world [e.g., motor processing or sensorimotor predictions can be influenced by autosuggestion or through supraliminal priming (Wegner et al., 2004; Aarts et al., 2005) or through prior causal beliefs induced by contextual information (Desantis et al., 2011)] (see **Figure 1**). The postdictive component can also contain sensorimotor cues, e.g., the visual feedback of the action (Synofzik et al., 2010) or feedback in other sensory modalities (including proprioception). Both predictive and postdictive components can contribute to the feeling of agency, which operates on a non-conceptual sensorimotor level (see **Figure 1**).

On the conceptual cognitive level, a judgement of agency is formed. This is largely based on the feeling of agency, but also takes into account cognitive cues like background beliefs and information about the environment [e.g., the *post-hoc* observation that I am the only person in the room (cf. de Vignemont and Fourneret, 2004)]. At both levels, the level of feeling and the level of judgement of agency, the cue integration process can be modulated by affective components [e.g., affective valence of the action outcome (Wilke et al., 2012); see **Figure 1**]. The context and the environment have a direct influence on the weighting of postdictive sensorimotor cues (e.g., lighting conditions on the reliability of vision), and a more indirect influence on the formation of the judgement of agency via cognitive representations of the environment (see **Figure 1**).

If understood in this way, optimal cue integration provides a unified framework to explain many findings from recent studies of agency, such as priming studies. For example, in the abovementioned study by Moore et al. (2009a), which combines intentional binding and priming, passive movements can be seen as an instance where internal predictions are not available for the system. The optimal cue integration approach would now predict that external cues (e.g., primes) should receive a higher weight for determining the experience of agency. This is exactly what the authors observed: primes modulated perceived intervals for both active and passive movements, but the modulation was greatest for passive movements (Moore et al., 2009a; Synofzik et al., 2009).

This finding, however, has to be interpreted with caution because, in contrast to a long-standing assumption, intentional binding (present in the active condition) does not necessarily reflect a signature of agency. As we have argued earlier (Synofzik et al., 2009), the fact that perceived time intervals between movement and effect were decreased by priming also in the case of involuntary movements opens up the possibility that the binding between movement and effect might not be specific to agency and intentionality, but may also represent, at least in part, a more unspecific effect linked to temporal binding between two events (in this case between the two congruent sounds, i.e., between prime and effect). Indeed, recent studies suggest that intentional binding is neither linked specifically to motor predictive processes (Desantis et al., 2012; Hughes et al., 2013) nor to agency (Buehner and Humphreys, 2009; Buehner, 2012; Dogge et al., 2012), but rather to causality in general. However, even if the phenomenon of binding of movements to their effects was not due to motor predictive processes, it could still contribute to the experience of agency, for instance, by accentuating subjects' perception of the temporal contiguity between movements and their effects (Desantis et al., 2012). Since this accentuation would probably be higher for active than for passive movements, it might also serve as a stronger agency cue in active than in passive movements. Correspondingly, the optimal cue integration approach would predict that subjects' experience of agency would be more open to modulation by external primes in the passive condition than in the active condition. This interpretation would still be compatible with the findings by Moore et al. (2009a).

If internal predictions do not allow the effect of an action to be predicted (e.g., because of a low contingency between action and effect), the optimal cue integration approach would predict that other cues (e.g., primes) should be given more weight for the registration of agency. These additional cues, however, should not receive particular weight if internal predictions serve as a sufficiently reliable predictor for an upcoming event.

This hypothesis was investigated by Gentsch et al. (2012). Subjects had to press a key, which was followed by a certain visual outcome on a computer screen (arrows pointing up or down) with high (75%) or low (50%) contingency, and which was preceded by a congruent or incongruent prime. In case of high contingency, subjects could reliably predict the visual outcome (arrow pointing up or down), and they should not need to rely on the prime. In case of low contingency, however, they could not do so; here they should rely also on the prime. This is exactly what the authors observed: in the low contingency condition, but not in the high contingency condition, priming had an effect on the judgement of the causal strength between action and effect. However, this effect was not found on the level of the cortical N1 response to actively generated feedback, which the authors take as a measure for the feeling of agency. Here priming influenced the response independent of the contingency between action and effect. However, the cortical N1 response might not be a measure of the feeling of agency [as suggested by the authors (Gentsch et al., 2012)], but only of one of the cues—in this case a sensorimotor prediction based on priming as opposed to the motor prediction based on implicit learning of contingencies. On this interpretation, the sensorimotor prediction would be weighted high if no motor predictions are present (low-contingency) and low if motor predictions are present (high-contingency).

## **INTEGRATION OF PREDICTIVE AND** *post-hoc* **CUES IN SCHIZOPHRENIA PATIENTS**

Schizophrenia patients suffering from delusions of influence can be seen as a "pathophysiology model" for agency processing, i.e., they provide a window to the processes underlying one's self-attribution of actions. In particular, they illustrate how predictive and *post-hoc* cues of agency are both integrated according to the principles of cue integration (Fletcher and Frith, 2009; Synofzik et al., 2010).

Schizophrenia patients with delusions of influence feel that their actions are no longer controlled by themselves. Sometimes they not only experience their actions as not self-caused, leading only to a vague and strange experience, but also attribute them to some specific other agents (e.g., to a friend, neighbor, or the devil) (Frith, 1992). How can this experience be explained by the optimal cue integration approach? Although several studies that argue for a close link between delusions of influence and a deficit in internal motor predictions have to be interpreted with caution<sup>3</sup>, two recent studies using very different paradigms, namely a visual distortion paradigm and an intentional binding paradigm, provide complementary evidence that schizophrenia patients might indeed show imprecise internal predictions about the sensory consequences of their own actions (Synofzik et al., 2010; Voss et al., 2010). Both studies also showed that this deficit correlated with the severity of the psychopathology: the higher the imprecision in predicting the sensory consequences of one's own actions, the higher the score for delusions of influence (Synofzik et al., 2010). Similar results using an intentional binding paradigm were found for patients in a putative psychotic prodromal stage, suggesting a disturbance of agency already early in the course of the disease (Hauser et al., 2011a). Following the optimal cue integration approach, imprecise predictions should prompt the perceptual system to rely more strongly on *post-hoc* cues in order to obtain a more reliable account of one's own actions. And indeed, the study by Synofzik and colleagues found that schizophrenia patients relied more on *post-hoc* information about their actions (in the study: vision) (Synofzik et al., 2010). Similarly, another study investigating schizophrenia patients, as well as a group of patients with a putative psychotic prodrome, showed that both patient groups, compared to healthy individuals, relied more strongly on external additional sensorimotor cues to agency in an ambiguous situation, where the reproduction of a drum-pad sequence had to be judged with respect to self-agency (Hauser et al., 2011b).

<sup>3</sup>A deficit of motor predictive mechanisms in schizophrenia is often inferred from studies that observe abnormal sensory attenuation and intentional binding in these patients. However, it has been argued that the contrasts used by these studies appear to differ in a number of processes other than motor prediction, such as temporal prediction and temporal control (Hughes et al., 2013). Also many other studies commonly taken as support for the notion of prediction deficits in schizophrenia patients with delusions of control can, in fact, not directly explain delusions of control (Synofzik et al., 2008a,b, 2010).

The approach of optimal cue integration might thus provide a common basis for the various misattributions of agency in schizophrenia patients, including their episodic nature (Synofzik and Voss, 2010; Synofzik et al., 2010). In schizophrenic patients with delusions of influence, internal predictions about the sensory consequences of one's own actions could be frequently imprecise and non-reliable. Patients should therefore be prompted in certain situations to rely more on (seemingly more reliable) alternative cues about self-action. These might either be *post-hoc* (e.g., vision, auditory input, affective valence of the action outcome, or postdictive thoughts), or predictive (e.g., prior sensorimotor expectations based on specific background beliefs or prior emotional appraisal of the situation). The stronger weighting of these alternative cues could help patients to avoid misattribution of agency for self-produced sensory events in the case of imprecise internal action-related predictions. However, as a consequence of giving up the usually most robust and reliable internal action information source, i.e., internal predictions, the sense of agency in psychotic patients is at constant risk of being misled by *ad-hoc* events, invading beliefs, and confusing emotions and evaluations. In other words: schizophrenia patients would be at constant risk of becoming "a slave to every environmental influence" (Frith, 1994, p. 151)—and to every affective and moral *ad-hoc* evaluation. Different agency judgement errors may result: patients might over-attribute external events to their own agency whenever these more strongly weighted alternative agency cues are not veridical and misleading, as is the case in delusions of reference (also referred to as "megalomania"). 
Conversely, if alternative cues are temporarily not attended or unavailable, patients might fail to attribute self-produced sensory events to their own agency and instead assume external causal forces (as is the case in delusions of influence). A context-dependent weighted integration of imprecise internal predictions and alternative agency cues may therefore reflect the basis of agency attribution errors in both directions: over-attribution, as in delusions of reference/megalomania, and under-attribution, as in delusions of influence (Synofzik and Voss, 2010; Synofzik et al., 2010).
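The consequence of such a reweighting can be illustrated with a toy calculation. The sketch assumes a situation loosely modelled on a visual distortion setting, in which visual feedback about a movement is rotated relative to the actual movement, and it assumes inverse-variance weighting of an unbiased internal prediction and the distorted feedback. The function names, the rotation of 10 degrees, and all standard deviations are hypothetical values chosen for illustration; they are not data or analyses from the cited studies.

```python
def feedback_weight(prediction_sd: float, feedback_sd: float) -> float:
    """Weight given to post-hoc feedback under inverse-variance weighting."""
    pred_var, fb_var = prediction_sd ** 2, feedback_sd ** 2
    return pred_var / (pred_var + fb_var)


def perceived_direction(actual_deg: float, rotation_deg: float,
                        prediction_sd: float, feedback_sd: float) -> float:
    """Integrated estimate of one's own movement direction when visual
    feedback is rotated by `rotation_deg` relative to the actual movement."""
    w_fb = feedback_weight(prediction_sd, feedback_sd)
    predicted = actual_deg                # internal prediction, assumed unbiased
    seen = actual_deg + rotation_deg      # distorted (non-veridical) feedback
    return (1.0 - w_fb) * predicted + w_fb * seen


# Same distorted feedback, different precision of the internal prediction.
precise = perceived_direction(0.0, 10.0, prediction_sd=1.0, feedback_sd=2.0)
imprecise = perceived_direction(0.0, 10.0, prediction_sd=4.0, feedback_sd=2.0)
print(precise, imprecise)
```

In this toy setting the estimate of the observer with imprecise predictions is pulled much further toward the non-veridical feedback. This mirrors the argument above: relying on alternative cues stabilizes agency registration when those cues are veridical, but makes it vulnerable whenever they are misleading.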

Agency attribution in patients with delusions of influence usually has a very specific semantic content, differing from individual to individual (e.g., a delusional attribution of an action to a particular neighbor, relative, or religious entity), and fails only episodically and only in certain contexts. The cue integration approach might also explain these features: (1) an imprecision in efferent action-related information leads generally to a fluctuating, unreliable basis on which the sense of agency is built, prompting schizophrenia patients to rely more on other alternative cues, which might be misleading in some situations. (2) An altered weighting of affective cues and the well-established disturbances in formal thinking <sup>4</sup> in schizophrenia will then lead to an unbalanced and disturbed integration of different agency cues with a lack of coherency and consistency. (3) This leads to the formation of a delusional belief, resulting from an individual's weighting of cognitive and affective cues in a particular situation and the individual's personal background beliefs and history.

This would also explain why delusions of control mostly do not refer to trivial, non-emotional actions in daily life (e.g., brushing teeth or typing on a computer), but mainly to very specific, singular actions with high affective and/or moral value. Mostly, they refer to actions that are morally and socially not acceptable or at least negatively connoted, e.g., causing an accident, hurting someone, or behaving inappropriately in the presence of one's peers. Here the affective and moral valence gains major influence on both the sensorimotor and the cognitive level (which might lead to modulated predictions and perception as well as to specific negative beliefs), such that the action is consequently not attributed to one's own agency.

## **CONCLUSIONS**

The registration of being the initiator of one's own actions seems to arise from a dynamic interplay between predictive cues and postdictive cues. These can be in a sensorimotor format (e.g., internal predictions about the sensory consequences of one's actions or visual feedback) or in a cognitive format (e.g., background beliefs or information about the environment). The cues are not mutually exclusive, but used in combination according to their respective reliability to establish the most robust agency representation in a given situation. The cues and the weighting itself can be modulated by factors of the environment as well as by affective factors (e.g., emotional appraisal or reward anticipation).

<sup>4</sup>Features of formal thought deficits in schizophrenia patients which are probably particularly relevant for the formation of delusional beliefs include deficits in probabilistic reasoning and a premature "jumping to conclusions." Based on these deficits, patients might not give an adequate probabilistic weight to each agency cue and reach conclusions on the basis of significantly less evidence than healthy subjects and express more confidence in their decisions (Fletcher and Frith, 2009). This might explain the clinical observation that "patients all too easily develop false beliefs, which they then hold with great confidence and immunity to any counter evidence" (Fletcher and Frith, 2009, p. 50).

So far, only limited and preliminary experimental evidence is available to support this novel framework of agency awareness (Moore et al., 2009a; Synofzik et al., 2010; Hauser et al., 2011b; Gentsch et al., 2012; Moore and Fletcher, 2012). Yet this framework stimulates a wide range of questions and hypotheses on agency processing in different subject groups that will be experimentally testable:


## **REFERENCES**


driven by the mere presence of an action and not by motor prediction. *PLoS ONE* 7:e29557. doi: 10.1371/journal.pone.0029557


## **ACKNOWLEDGMENTS**

This work was supported by joint grants from the Volkswagen Stiftung (VW II/85 158 awarded to Matthis Synofzik; VW II/85 068 and VW II/85 155 awarded to Gottfried Vosgerau; VW II/85 067 awarded to Martin Voss). Publication of this article was supported by the Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Tuebingen University.


the sense of agency with external cues. *Conscious. Cogn.* 18, 1056–1064.


agency cues? *Conscious. Cogn.* 18, 1065–1068.


Wolpert, D. M., Ghahramani, Z., and Jordan, M. I. (1995). An internal model for sensorimotor integration. *Science* 269, 1880–1882.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 28 February 2013; published online: 15 March 2013.*

*Citation: Synofzik M, Vosgerau G and Voss M (2013) The experience of agency: an interplay between prediction and postdiction. Front. Psychol. 4:127. doi: 10.3389/fpsyg.2013.00127*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Synofzik, Vosgerau and Voss. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Awareness as observational heterarchy

#### *Kohei Sonoda<sup>1</sup>\*, Kentaro Kodama<sup>2</sup> and Yukio-Pegio Gunji<sup>3,4</sup>*

*<sup>2</sup> Department of Informatics, School of Multidisciplinary Science, The Graduate University for Advanced Studies, Chiyoda-ku, Japan*

*<sup>3</sup> Department of Earth and Planetary Science, Faculty of Science, Kobe University, Kobe, Japan*

*<sup>4</sup> The Unconventional Computing Centre, University of the West of England, Bristol, UK*

#### *Edited by:*

*Yuki Yamada, Yamaguchi University, Japan*

#### *Reviewed by:*

*Koichiro Matsuno, Nagaoka University of Technology, Japan*

*Stanley N. Salthe, City University of New York, USA*

#### *\*Correspondence:*

*Kohei Sonoda, Department of Education, Faculty of Education, Shiga University, Hiratsu 2-5-1, Otsu 5200862, Japan e-mail: koheisonoda@gmail.com*

Libet et al. (1983) revealed that brain activity precedes conscious intention. For convenience in this study, we divide brain activity into two parts: a conscious field (CF) and an unconscious field (UF). Most studies have assumed a comparator mechanism or an illusion of CF and discuss the difference between prediction and postdiction. We propose that the problems to be discussed here are a twisted sense of agency between CF and UF, and other definitions of prediction and postdiction in a mediation process for the twist. This study specifically examines the definitions through an observational heterarchy model based on internal measurement. The nature of agency must be *emergence* that involves observational heterarchy. Consequently, awareness involves processes having duality in the sense that it is always open to the world (postdiction) and that it also maintains self robustly (prediction).

**Keywords: prediction, postdiction, internal measurement, heterarchy, awareness, emergence, wholeness**

## **INTRODUCTION**

Libet et al. (1983) reported mounting brain activity related to a resultant action for approximately three hundred milliseconds before subjects reported their first awareness of a conscious intention to act. In other words, conscious decisions to act were clearly preceded by an unconscious buildup of electrical charge within the brain. This buildup came to be called readiness potential (RP). Such a division between the conscious field (CF) and the unconscious field (UF) can be found in postscripts of intention in experiments<sup>1</sup>. Stimulating particular brain regions led to reactions of particular body parts without a subject's own intention (Delgado, 1969; Penfield, 1975). Conversely, one can attribute actions executed by others (not one's own actions) to one's own intention (Wegner et al., 2004). These results are explained as follows. After an efferent copy of an initial motor command is generated and simulated, it is compared with afferent information from sensory feedback as a result of actual movement. In the case of congruence between efferent and afferent information, it is said that one experiences a sense of agency for the movement (e.g., Gallagher, 2000; Synofzik et al., 2008a,b). Here, these studies are based on the idea of a hierarchy comprising a higher monitoring part and a lower part executing actual movement. Results reported for an apparent mental causal path (Wegner and Wheatley, 1999; Wegner, 2003) show that conscious will is subject to unconscious will and the comparator model (e.g., Wolpert et al., 1995; Frith et al., 2000a). However, the system presumed in those studies is hierarchical, not heterarchical (McCulloch, 1945).
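The comparison step described above (efferent copy versus afferent feedback, with congruence yielding a sense of agency) can be sketched as follows. This is only a toy reading of the comparator idea: the function name `comparator`, the use of a maximum absolute mismatch, and the `tolerance` threshold are all assumptions made for illustration and are not taken from the cited models.

```python
def comparator(efference_copy, sensory_feedback, tolerance=0.1):
    """Return True when predicted and actual feedback are congruent.

    Assumptions (hypothetical, for illustration only):
    - both signals are equal-length sequences of numbers;
    - "congruence" means the largest absolute mismatch stays
      within an arbitrary tolerance.
    """
    if not efference_copy or len(efference_copy) != len(sensory_feedback):
        raise ValueError("signals must be non-empty and of equal length")
    mismatch = max(abs(p - a) for p, a in zip(efference_copy, sensory_feedback))
    return mismatch <= tolerance


# Feedback close to the prediction: the movement is registered as one's own.
own_movement = comparator([0.5, 1.0], [0.52, 0.97])

# Feedback far from the prediction: no sense of agency in this toy model.
external_event = comparator([0.5, 1.0], [0.5, 1.4])
```

Note that the sketch has a fixed monitoring part that judges a lower executing part, which is the hierarchical arrangement the text goes on to question.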

Most previous studies have specifically investigated mechanisms that compare conscious intention with movement results. When the comparing mechanism is expressed as a thought–action pair, the pair is usually assumed to belong to CF in those earlier studies. We raise the question of whether the thought–action pair is dual in the brain, meaning that the pair exists in both CF and UF (**Figure 1**). Considering that the area playing the role of conscious will is just a part of the brain, and that it is separate from the areas generating actual motor commands, we can accept some kind of independence between CF and UF, and assume dual pairs of thought–action (dual operating systems). Gunji (2013) showed that the area comparing an efferent copy with movement results is not merely a monitoring area but rather CF, and the RP area (UF) plays a role of execution of specific movements preceding CF. Compared with CF, UF is absolutely *others* in the brain. Gunji (2013) argued that the origin of voluntariness comes from such a twisted feeling of operation. One has a sense of being operated by *others* in the brain. Nevertheless, that person finds out that the *other* is himself. When we have a feeling of operation, we also face a difficulty of self-reference that "I operate on me." "I" operating (subjective *self*) and "Me" operated (objective *self*) are strictly different in status. Thus, "I operate on me" is fragile. Consequently, "*others* in the brain (UF) operate on me" is a stronger keynote than "I (CF) operate on me." Moreover, it should be found that the *other* is just "I" (myself). We argue that the twist is an origin of the sense of agency (SoA). Then, the *other* in the brain can become not only "myself" but also "someone unknown" or "you" just in front of me.

What is important here is the twisted viewpoint of accepting a mixture of "I" (CF) and "the *other*" (UF) while assuming

*<sup>1</sup> Department of Education, Faculty of Education, Shiga University, Otsu, Japan*

<sup>1</sup>In this study, we use CF and UF in the abstract sense.

some mutual independence<sup>2</sup>. As described above, the twist can become the origin of voluntariness, but simultaneously engender the crisis of a system such as self in autism or integration disorder syndrome (e.g., Frith, 1989; Frith et al., 2000b). Therefore, we can describe a schematic model of conflict = mediation between CF and UF or "I" and "the *other*" (**Figure 1**). Then we would suggest that prediction and postdiction could be identified in a process of mediation. We presume that aspects of prediction and postdiction do not appear in previous studies (e.g., Blakemore et al., 2002; Bays et al., 2006; Synofzik et al., 2013). Those studies examined only problems in comparing motor intention with movement result in CF. We do not specifically examine such a simple problem on the comparison mechanism in CF. Beyond it we would rather specifically examine the conflict between two

<sup>2</sup>Many discussions based on the comparator model (Frith et al., 2000b) invariably presume an author list, {*I, Michael, Cathy, ...* }. Furthermore, we choose only one of the authors from the fundamental list depending on our situations. We sometimes make a mistaken choice. For example, we choose "*Michael*" instead of "*I*." This paradigm shows that agency is only a mechanism with error or illusion. However, *self* and *others* must be completely different categories. If that were not true, then the subject/object problem would disappear. Furthermore, the problem is in the subject itself. The distinction between subject and object corresponds to that between CF and UF. Awareness involves the problem in itself. Our argument is that this problem can be formalized using set theory. The author list can correspond to a set of natural numbers {*1, 2, 3*, *...*}. Then we can place "arbitrary *n*" in the set where "*n*" is obtained after we survey the infinite set, although we actually cannot. "*n*" actually points to nothing in the set. If we choose "*1*," "*1*" is a negation of the others (*2, 3*, *...*) in the set. However, "*n*" is a denial of the set itself. Thus, "*n*" has a completely different status from the other numbers. Consequently, the placement of "*n*" into the set is a category mistake, but we do it easily in mathematics. Herein the "*n*" is "*I*" itself. The sentence "*the other is I*" represents a category mistake. However, agency is beyond the logical mistake. Therefore, we can state that the agency of "*I*" is emergence. In other words, "*n*" represents latency and "*1*," possibility: "*n*" is changeable after being chosen since it is a meaningless sign and "*1*" is not changeable. We discuss the details of this argument below in the text.

operating systems: CF and UF. Consequently, we aim to redefine prediction and postdiction from the conflict in this study. In the conflict, the difference between prediction and postdiction is a gap separating the "I" of CF and the *others* in the brain of UF. The gap is just the origin of voluntariness. Prediction stands for the aspect of equalizing "I" and *others* in the brain by erasing the gap. However, postdiction means the aspect of materializing the gap as "someone" by being open to the world. In the next section, we discuss these aspects in detail through an observational heterarchical model, with a dynamic hierarchy including a latent mixture of levels.

## **AGENCY AND EMERGENCE**

What is the nature of agency? Our argument in this paper is that it must be that of emergence. The *other* (UF) operates on *me* (CF). Furthermore, I find that *the other* is *I*. This characteristic is the very emergence of agency. However, most current discussions depend on the comparator model (Frith et al., 2000b), in which agency derives from a mechanism, and on the judgment problem of whether it occurs before an event or after: roughly speaking, we are machines with agency and only judge events' timing, which sometimes reveals errors. The model cannot explain a vicarious agency with no efferent copy in which a person feels that one is doing something despite actually doing nothing himself (Wegner et al., 2004). A feeling of doing is only illusion if it is not accurate (Wegner, 2002). Herein, we can identify a dichotomy between the two: mechanism or illusion (**Table 1**). The problem is not abnormality or illusion of agency but normal agency that we feel in daily life. The daily life agency, a feeling that "I" operate on me, is not fundamental (mechanism). Then we obtain from the nature of emergence that *the other is I*<sup>3</sup>. Consequently, the salient difficulty is not a lack of experimental evidence but the concept of *emergence*.

As described in this paper, we attempt to describe the nature of agency using the notion of "observational heterarchy" (Gunji and Kamiura, 2003, 2004). In this section, we introduce notions of emergence. In the subsequent section, we also discuss this point in light of the notions of "hierarchy" (Salthe, 2012) and "heterarchy" (Stark, 1999). In the third section, we introduce the notion of observational heterarchy.

Notions of emergence have been discussed for a long time. There are many definitions of emergence (O'Connor and Wong, 2012). We can identify some kinds of hierarchical structures assumed under the various notions (Barabási and Albert, 1999; Odum and Barrett, 2004; Postle, 2006). We briefly define emergent phenomenon as macroscopic patterns running through underlying microscopic interactions. For example, when we

#### **Table 1 | Dichotomies on the notion of awareness.**


<sup>3</sup>The original category of "*I*" is extended to "the other is *I*." In other words, {*I*} is extended to {*the other, I*}.

observe a population that has a novel ability that the others of the same species do not have, we call that observation one of an emergent phenomenon. For explanations of such an emergent phenomenon, there are many discussions in philosophy (e.g., Kim, 1999; Bedau, 2008; Bitbol, 2012). However, these philosophical discussions are beyond the scope of this paper. Therefore, we only show a model of observational heterarchy as one of the models of emergent phenomena in this paper<sup>4</sup>. We also argue that the notions of hierarchy and heterarchy cannot be models of emergence and thus comparator model = mechanism (Frith et al., 2000b) or apparent mental causation = illusion (Wegner, 2002) cannot explain the nature of agency (**Table 1**).

## **HIERARCHY AND HETERARCHY**

First, we define the notions of hierarchy and heterarchy in this paper. Hierarchy is identifiable as some kind of order structure of a company (**Figure 2A**). Cladograms of taxonomy are also familiar<sup>5</sup>. Therefore hierarchy is definable as a partially ordered set (POS)<sup>6</sup>. Heterarchy is a dynamical hierarchy including a mixture of levels (**Figure 2B**). Although heterarchy is apparently consistent, as in some discussions (McCulloch, 1945; Stark, 1999; Norman et al., 2010), it is inconsistent in the strict sense of the word (Salthe, 2012). Consequently, it cannot have its

<sup>4</sup>Our approach may be close to Bitbol's interventionist-constitutive view (Bitbol, 2012). Body is not fundamental but an observable (i.e., cognitive boundary) for an observer even if it is his own body. Mind is the same case. We are concerned with an observational process (an internal observer) for an observed relation of some kind of two levels (e.g., body and mind), and what the problem is if we admit that a substance can be such an observer. In the notion of internal measurement (Matsuno, 1989; Gunji, 1993, 2006), we express the relation as a mathematical duality and weaken it in various ways for use in science. This dynamical duality can be an expression for the latency of downward causation.

formal expression attributable to its logical flaw for the mixture as described in the discussion presented below.

The wholeness<sup>7</sup> that the notion of hierarchy invariably depends on is "transcendental wholeness" (Gunji, 2006)<sup>8</sup>. Transcendental wholeness is a privileged concept that differs from other concepts because of the point that it is not permitted to have an extent-perspective<sup>9,10</sup>. This wholeness seals the discussion of interaction between parts and a whole. Even if we discuss a hierarchical world (system), we cannot address a variation of the world (emergence). The wholeness of set theory is this transcendental one. This notion avoids Russell's paradox<sup>11</sup> and removes inter-level interaction<sup>12</sup>. We can also identify the removal of the mixture of levels from the notions of hierarchy (Salthe, 1985, 2012).

<sup>10</sup>Try to consider the extent of wholeness = the world. The definition that extent is a collection of objects to which a concept is applicable forces us outside the concept. However, we cannot observe from outside wholeness = the world. Here we can identify the impossibility of defining the extent of wholeness = the world. "Possible world" presents us with the same case.

<sup>11</sup>Russell's paradox is the following (Whitehead and Russell, 1925). First define a class 1 set as a set that does not include itself (itself is not one of its elements). Next define a class 2 set as a set that includes itself. For example, English is class 2 because it includes "English." Japanese is class 1 because it does not include "Japanese" (this word is in English). From this distinction, we can classify every set as class 1 or class 2. Here we make *M* by collecting all class 1 sets. *M* does not include itself if *M* is class 1. However, the definition of *M* means that *M*, a collection of all class 1 sets, includes itself because *M* is class 1. Therefore, it ends up in a contradiction. Next, when we assume that *M* is class 2, *M* includes *M*. However, from the definition of *M*, *M* does not include *M*. This also presents a contradiction. Finally, *M* cannot be class 1 or class 2. This explanation shows the characteristics of Russell's paradox.

<sup>12</sup>From the definition of a concept, a pair of intent and extent defines a set dually. We can express the intent of a set as *y* = {*x* | *A*(*x*)} if we define *A*(*x*) as a nature of *x*. The extent of the set *y* is *x* ∈ *y* if *y* is a set. Equivalence of intent and extent is ∃*y*∀*x*(*x* ∈ *y* ⇔ *A*(*x*)). Russell's paradox can be derived easily from this notation. The definition of class 1 expresses *A*(*x*) as *x* ∉ *x*. From the equivalence of intent and extent, we can obtain *x* ∈ *y* ⇔ *x* ∉ *x*. Therein, *x* is arbitrary and *y* is special. Therefore, we can replace *x* with *y* and obtain *y* ∈ *y* ⇔ *y* ∉ *y*. This is Russell's paradox. The paradox derives from the mixture of elements and sets. Current set theories forbid the mixture by adding the restriction that element *x* is one element of an arbitrary set *a*. Set theories define the restriction as the separation schema ∀*a*∃*y*∀*x*(*x* ∈ *y* ⇔ *x* ∈ *a* ∧ *A*(*x*)).
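For readers who prefer the steps laid out one per line, the derivation in this footnote can be restated as below. The symbols *A*, *a*, *x*, and *y* are those of the footnote; the side labels and the final remark are added here as a reading aid and are not part of the original argument.

```latex
\begin{align*}
&\exists y\,\forall x\,\bigl(x \in y \Leftrightarrow A(x)\bigr)
  && \text{(equivalence of intent and extent)}\\
&A(x) :\equiv x \notin x
  && \text{(definition of class 1)}\\
&\forall x\,\bigl(x \in y \Leftrightarrow x \notin x\bigr)
  && \text{(substituting the definition)}\\
&y \in y \Leftrightarrow y \notin y
  && \text{(taking } x := y\text{; contradiction)}\\
&\forall a\,\exists y\,\forall x\,\bigl(x \in y \Leftrightarrow x \in a \wedge A(x)\bigr)
  && \text{(separation schema)}
\end{align*}
```

Under the separation schema, the same substitution gives only $y \in y \Leftrightarrow (y \in a \wedge y \notin y)$, which is satisfiable and merely forces $y \notin a$, so the contradiction no longer arises.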

<sup>5</sup>Some current discussions distinguish compositional hierarchy and subsumption hierarchy (Salthe, 2012).

<sup>6</sup>POS (Davey and Priestley, 2002) is defined as follows. If an element and an order are expressed as a letter and ≤, respectively, POS satisfies (1) *a* ≤ *a*, (2) *a* ≤ *b* and *b* ≤ *a* imply *a* = *b*, (3) *a* ≤ *b* and *b* ≤ *c* imply *a* ≤ *c*. The growth mode of hierarchy (Salthe, 2012) can be expressed as application of order-homomorphism between the POSs. If this is not the case, the structure should be an observational heterarchy or contradictory (heterarchy).
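The three POS conditions in this footnote can be checked mechanically on a finite set. The sketch below is illustrative only: the helper name `is_partial_order` and both example relations are hypothetical, and a brute-force check like this is practical only for small sets.

```python
from itertools import product


def is_partial_order(elements, leq):
    """Check conditions (1)-(3) of the footnote on a finite set.

    `leq(a, b)` is any two-argument predicate standing for a <= b.
    """
    elements = list(elements)
    # (1) reflexivity: a <= a
    reflexive = all(leq(a, a) for a in elements)
    # (2) antisymmetry: a <= b and b <= a imply a = b
    antisymmetric = all(
        a == b or not (leq(a, b) and leq(b, a))
        for a, b in product(elements, repeat=2)
    )
    # (3) transitivity: a <= b and b <= c imply a <= c
    transitive = all(
        leq(a, c) or not (leq(a, b) and leq(b, c))
        for a, b, c in product(elements, repeat=3)
    )
    return reflexive and antisymmetric and transitive


# Divisibility on {1, 2, 3, 6} is a partial order (a hierarchy in this sense).
divides = lambda a, b: b % a == 0
hierarchy_ok = is_partial_order([1, 2, 3, 6], divides)

# Two distinct levels that each sit "above" the other violate antisymmetry.
mixed = {("cell", "cell"), ("dna", "dna"), ("cell", "dna"), ("dna", "cell")}
mixture_ok = is_partial_order(["cell", "dna"], lambda a, b: (a, b) in mixed)
```

The second relation is a deliberately crude stand-in for a mixture of levels: it fails condition (2), so it cannot be expressed as a POS.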

<sup>7</sup>Herein, the expression "wholeness" does not mean the whole of the observer's focal level. It means the whole of all levels in a hierarchical system. By contrast, in a usual sense, "outside" and "inside" are defined, respectively, as the upper levels for the focal level and the lower ones. Furthermore, an observer in the focal level cannot know about the outsides and insides (Salthe, 2012).

<sup>8</sup>Gunji (2006) points out three aspects of generation (origin, norm, and variation) by weaving the concepts of Deleuze and Guattari (1991), *plan d'immanence* and *les personnages conceptuels*, and his own considerations into his original theory of life, *weak wholeness, the mediating term, and internal observer*. Herein, we refer to the concepts of "wholeness" discussed in Gunji (2006).

<sup>9</sup>A pair of intent and extent can define the Classification concept. Intent is an attribute of a concept. Extent is a collection of objects or specified models to which the concept is applicable. If we observe "cheese" as a concept, then its intent is "a food derived from milk" and its extent is a collection of "Mozzarella, Parmigiano-Reggiano, Ricotta cheese, *...* " The pair of intent and extent can also define a concept of a set. The intent of a set of even numbers is "2*N* where *N* is a natural number." The extent of the set is "2, 4, 6, *...*"
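The intent/extent pair in this footnote maps naturally onto a predicate and the collection it picks out. The following is a small sketch under that reading; the function name `extent` and the choice of the numbers 1 to 10 as the surrounding collection are assumptions made for illustration.

```python
def extent(intent, universe):
    """Collect the objects in `universe` to which the concept applies.

    `intent` is a predicate (the attribute of the concept);
    the returned set plays the role of the extent.
    """
    return {x for x in universe if intent(x)}


# Intent of "even number": x = 2N for some natural number N.
is_even = lambda x: x > 0 and x % 2 == 0

# The extent can only be computed relative to some given collection,
# here the (arbitrarily chosen) numbers 1..10.
even_extent = extent(is_even, range(1, 11))
```

That the extent is always taken relative to a given collection mirrors the restriction to an arbitrary set *a* in the separation schema of footnote 12.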

Consequently, the transcendental wholeness corresponds to the concept of hierarchy.

Heterarchy is "contradictory wholeness" (Gunji, 2006). The second wholeness implies a whole consisting of parts while defining the whole as a contraposition to the parts. Furthermore, we obtain a contradiction of the concept. This wholeness appears in Russell's paradox (Whitehead and Russell, 1925). In other words, the second wholeness permits a mixture of levels: the mixture leads to Russell's paradox. Consequently, this wholeness corresponds to the notions of heterarchy that permit the mixture (McCulloch, 1945; Stark, 1999; Norman et al., 2010).

What is the difference between a transcendental wholeness and a contradictory one? It is the restriction of extent-perspective: a mixture of levels. Contradictory wholeness is an unrestrictive version of transcendental wholeness. The difference appears when we examine the "whole" of a description (a system or hierarchy). In Russell's paradox, the difference appears when we survey the whole of all sets<sup>13</sup>. The difference is latent until we survey the whole of the description of sets. Roughly speaking, it had been latent until Russell found it. Here, we emphasize that the difference between the two notions of wholeness is not limited to mathematics. Following the discussions presented above, we relate hierarchy and heterarchy, respectively, to the comparator model (Frith et al., 2000b) = mechanism, and the apparent mental causation (Wegner and Wheatley, 1999) = illusion. Determinism and vitalism are the same case (**Table 1**). In understanding history, we also divide human history into two parts, stable periods and periods of change, and understand it through alternation of the first wholeness and the second wholeness. In such a dichotomy between the two, we can identify the fragile relation between affirmation of the world (stable period) and negation of the world (change period) to ascertain the wholeness of history comprehensively. Here we can expect the key that connects the two notions of wholeness as two different phases that comprise the nature of the world and of our understanding of the world.

## **OBSERVATIONAL HETERARCHY**

What should we do about the problem that emergent phenomena are beyond description (the first and second wholeness)? We cannot describe the phenomena. However, we, herein, strive to reveal the nature of emergence. The key to the problem must be reconsideration of the concept of wholeness that description is based on. Description invariably accompanies the notion, but remains outside of it. However, system theories construct models without consideration of this characteristic of description. The models are based on a transcendental viewpoint by which an emergent element (component or one level) derives from inside of the description (**Figure 3A**). We do not designate this picture as one showing emergence. Therefore, we reconsider the nature of description with internal measurement (Matsuno, 1989; Gunji, 1993; Gunji et al., 1997) in which an emergent element originates from outside of the description (**Figure 3B**). We express the characteristic by an agent's apparent reference of its description,

which seems to lead to a self-referential paradox (without this reference, the transcendental perspective reappears). Moreover, we construct invalidation of the paradox by a frame problem. Nevertheless, the model remains a mere description. Therefore, our construction is a model that implies the nature of emergence. Specifically we use weak duality of intent- and extent-perspectives of a description. In this section, we introduce the notion of observational heterarchy (Gunji and Kamiura, 2003, 2004) as the third wholeness: "weak wholeness" (Gunji, 2006)<sup>14</sup>. This third notion of wholeness connects the other two. In the discussion presented above, the first and second wholeness appear in Russell's paradox: a mixture of levels. Thus, we reconsider this mixture.

Although the notion of heterarchy sounds contradictory, it aims at the nature of emergence: a mixture of levels. Why do we specifically examine the mixture? We do so because it is not limited to the problem of an abstract concept. We can find, in biology, some evidence that we can call not developments but evolutions. Important evidence for it is adaptive mutation (Shapiro, 1997, 2002). Splitting enzymes for sugar are controlled by an operon on

<sup>13</sup>In Russell's paradox, we first assumed that we could check all sets and divide them into two types of sets according to whether a set can include itself (class 2) or not (class 1). This assumption engenders a contradiction in Russell's argument. We mean this assumption as looking over a description of sets.

<sup>14</sup>In Cantor's diagonal argument (Moore, 1991), Cantor used the argument to extend the notion of a cardinal number. The argument only shows a contradiction of a statement for which the size of an infinite set *S* and that of the power set of *S* (the set of all subsets of *S*) are the same. However, this negative argument led to the new limit of infinity (countable infinity) in mathematics. We herein identify a positive creation from the negative argument in the working of mathematicians. Gunji (2006) reconsidered a meaning of the diagonal argument to clarify the concept of internal observer who bears a positive meaning of negation (the notion of weak wholeness). In the argument, two kinds of "wholeness" of all infinite bit strings are identified. "Intent-wholeness" is defined by some kind of counting operation ({*1, 2, 3*,*...*}) and "extent-wholeness" by its use ({*...*, *n*, *...*}) for a comparison with an inverted diagonal bit string. Roughly speaking, intent-wholeness is a set of all infinite strings before the comparison of each of all strings in the argument. Extent-wholeness is one after the comparison. Therefore, we define "weak wholeness" as a notion that is intermediate between intent-wholeness and extent-wholeness. For example, Cantor creates the new limit of infinity (countable infinity) for an intermediate of them from the diagonal argument. Consequently, observational heterarchy with weak wholeness has intent- and extent-perspectives explicitly (Gunji and Kamiura, 2003, 2004).
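The "inverted diagonal bit string" used in this footnote can be made concrete on a finite list, as a sketch. The function name `inverted_diagonal` and the four example strings are hypothetical; the real argument concerns infinite strings, so this only illustrates the construction step, not the full proof.

```python
def inverted_diagonal(strings):
    """Build a bit string that differs from the i-th string at position i.

    Finite illustration: needs n strings, each of length at least n.
    """
    n = len(strings)
    if any(len(s) < n for s in strings):
        raise ValueError("each string must have at least len(strings) bits")
    # Flip the i-th bit of the i-th string and collect the flipped bits.
    return "".join("1" if strings[i][i] == "0" else "0" for i in range(n))


listed = ["0000", "1111", "0101", "1010"]
new_string = inverted_diagonal(listed)

# By construction the new string disagrees with every listed string
# in at least one position, so it cannot already be in the list.
is_new = new_string not in listed
```

In the terms of the footnote, the list before the comparison plays the role of the intent-perspective and the list confronted with `new_string` plays the role of the extent-perspective; this mapping is an interpretive gloss, not a claim made in the cited works.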

DNA. If it is switched on, an enzyme is expressed; if it is switched off, it is not. In the experiment of adaptive mutation, *Escherichia coli* bacteria are cultured in culture media with sugar. The DNA of the bacteria is converted not to express the splitting enzyme corresponding to the sugar. The bacteria have difficulty surviving because of the absence of the enzyme, which gives rise to a malfunction of the DNA–protein system. The mutation rate becomes high, and mutation hits the broken gene corresponding to the splitting enzyme for the sugar. Consequently, the bacteria can acquire the ability to use the sugar as an energy source. DNA is definable as a higher level than cell interactions corresponding to the wasting state because proteins (enzymes) control the interactions and DNA also controls the proteins in the bacteria. For adaptive mutation, the cell interactions affect DNA's behaviors directly, whereas DNA usually controls them through the enzyme. Here we can identify an apparent mixture or interaction of different levels (DNA and cells) in the bacteria. Furthermore, it can be expressed as two processes that do not involve hierarchy or heterarchy. When a malfunction of the DNA–protein system occurs in a focal level, the cell level, the DNA mutation rate becomes high in the upper level: the DNA level (**Figure 2C**). Consequently, the mutation hits the broken gene in the upper level and the splitting enzyme becomes activated at the focal level (**Figure 2D**). This image motivates us to consider the notion of observational heterarchy as a robust model for a mixture of levels.

Here we quote the summary of observational heterarchy presented in Gunji and Kamiura (2003) below.

*(1) Heterarchy*<sup>15</sup> *consists of two levels and inter-level operations. (2) Simultaneous interaction among levels is defined as simultaneous choice that is expressed as a surjective map from a set of one level to a set of inter-level operations. (3) Simultaneous choice implies the collapse of the logical framework; then heterarchy is regarded as a system inheriting logical collapse. (4) Because of the logical collapse, heterarchy gives rise to re-organization of the structure. (5) Heterarchy is not a real entity but it results from the interaction between an object and an observer. Two levels are fundamentally an intent-perspective and extent-perspective*<sup>16</sup>.

Observational heterarchy is not only an abstract notion but also a computational model. The model is the time-state-scale reentrant system (TSSRS) (Gunji et al., 2008; Sasai and Gunji, 2008) consisting of two perspectives: one is a logical self-reference paradox derived from an external observer (**Figure 2C**); the other is a frame-problem derived from an internal observer (**Figure 2D**). The logical self-reference paradox is a mixture of levels, the whole-system level (time-scale) and the subsystem level (state-scale). In a dynamical system, the behavior of a system is expressed as a time development of its state. However, the state is obtainable only from the system's boundary condition, which only the upper level, a theorist, can provide. The operation of developing the state of the system (time development: time-scale) and that of providing the state (boundary condition: state-scale) are independent. TSSRS makes the two operations re-entrant and invalidates the self-reference paradox (**Figure 2C**). The invalidation provides re-framing of the system by changing boundary conditions, which means invalidation of the frame-problem (**Figure 2D**).

In observational heterarchy, mediation of the self-reference paradox (a mixture of levels) provides re-framing of hierarchical structures: a compression effect (**Figure 2C**) and an extension effect (**Figure 2D**) (we will define these notions in the next section). Here, it is noteworthy that we can identify a re-framing in the nature of agency (Wegner, 2002; Wegner et al., 2004). Consequently, there must be a mediating process of an apparent mixture of levels, observational heterarchy, in the nature of agency. For the discussion, we define some hierarchical structures in light of agency. **Figure 4A** presents three structures in which upper components correspond to upper levels: CF interprets UF, our thoughts include my thought, and maps are applicable to elements<sup>17</sup>. **Figures 4B,C** show re-framing phenomena in the observational heterarchy: "*I*" operates on me (**Figure 4B**), and "*you*" or "*someone*" operates on me (**Figure 4C**). In the next section, we explain the application of observational heterarchy to a mental causal path (Wegner and Wheatley, 1999; Wegner, 2003) and resolve it.

## **OBSERVATIONAL MENTAL CAUSAL PROCESS**

A mental causal path can be formalized as follows. In a case of body movement, "thought" is an intention to move and "action"

**FIGURE 4 | (A)** Assumed hierarchies on which observational heterarchy is based in this paper: abstract brain activity (left), All thought category in mental processes (middle), and Sets category (right). **(B)** Observational heterarchy with a compression effect in thought category: "I operate on me" (usual agency). **(C)** Observational heterarchy with an extension effect in the thought category: "You operate on me" or "Someone operates on me."

<sup>17</sup>From Libet et al. (1983), RP precedes conscious intention. In our description, UF precedes CF. We can say that UF converted actions and that CF interprets the actions. This sketch can also be used to identify aspects of the comparator model (Frith et al., 2000b). Interpretation indicates an order relation (Salthe, 2012). Therefore, CF is higher than UF. However, we mean some kind of order relation in which a mixture of levels is latent.

<sup>15</sup>In this quote, "*heterarchy*" means observational heterarchy.

<sup>16</sup>In the usual case, Intent and Extent are defined when a concept is given. However, we generalize this here and consider the Intent and Extent that give a concept. We refer to the definition by Gunji and Kamiura (2003) below. Definition (Generalized Intent and Extent): Given a concept, Intent is defined as a collection of attributes of the concept, and Extent is defined as a collection of objects to which the concept is applied. Conversely, given two collections of attributes and objects, if each object has all attributes and each attribute contributes to all objects, the pair of collections is called a pair of Intent and Extent. Then we say that Intent and Extent constitute a concept. The operations by which an attribute in Intent is applied to an object in Extent are called inter-level operations. A triplet, *<*Intent, Extent, inter-level operation*>* constitutes a concept.

is an objective movement (**Figure 5A**) (a movement usually consists of a pair of a body part and the content of the movement, but here the body part represents the pair). Executing a movement means mapping an intention to a body part (right arm), that is, 0 (intention) → 0 (right arm) (**Figure 5A**). Simultaneous choice can be found in the mixture between thought and the mental causal path (mapping of thought–action). When one executes a movement, one unavoidably moves a particular body part and, simultaneously, does not move the other parts. That means raising the right hand while not raising the left hand. We do not simply raise the right hand; we also keep the left hand still to balance our posture. In other words, we cannot separate mapping an intention to a body part from mapping no intention to other parts. However, because several mappings exist, we must consider all combinations between each element of the thought set and each element of the action set (details are in the next section). In other words, we choose one path (mapping) from the path set concurrently with choosing an intention (element) from the thought set. In sum, we conduct some kind of logically impossible operation by simultaneously choosing an element at a lower level and a mapping at a higher level (in usual computations, the element is substituted into a prepared mapping after it has been selected). This operation corresponds with simultaneous choice in observational heterarchy.

Following the summary of observational heterarchy (1)–(5) presented above, we summarize the application specifically (**Figures 5A–C**).

1. Define the sets of values for the thought and the action as St = {0 (intention), 1 (not intention)} and Sa = {0 (right arm), 1 (left arm)}, respectively. We designate all possible operations from the thought to the action as Path-0, -1, -2, and -3 in the set of the mental causal path. The operations are defined as follows.

   - Path-0: 0 → 0; 1 → 0
   - Path-1: 0 → 1; 1 → 1
   - Path-2: 0 → 0; 1 → 1
   - Path-3: 0 → 1; 1 → 0

   Then, we obtain the path set as Sp = {Path-0, Path-1, Path-2, Path-3}.
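The enumeration in step 1 can be checked mechanically. The following is a minimal sketch, assuming the sets St and Sa are encoded as small Python dictionaries; the names `S_T`, `S_A`, `PATHS`, and `is_bijective` are illustrative and do not come from the original paper. It confirms that the four named paths are exactly the total maps from St to Sa and reports which of them are bijective.

```python
from itertools import product

# Value sets from step 1 (labels follow the text).
S_T = {0: "intention", 1: "not intention"}  # thought set St
S_A = {0: "right arm", 1: "left arm"}       # action set Sa

# Every total map St -> Sa picks one image for thought 0 and one for thought 1.
all_maps = [
    dict(zip(sorted(S_T), images))
    for images in product(sorted(S_A), repeat=len(S_T))
]

# The four paths, numbered as in the text.
PATHS = {
    "Path-0": {0: 0, 1: 0},
    "Path-1": {0: 1, 1: 1},
    "Path-2": {0: 0, 1: 1},
    "Path-3": {0: 1, 1: 0},
}

def is_bijective(path):
    """A map St -> Sa is bijective when its images cover Sa exactly once."""
    return sorted(path.values()) == sorted(S_A)

bijective_paths = [name for name, path in PATHS.items() if is_bijective(path)]

print(len(all_maps))       # number of total maps from St to Sa
print(bijective_paths)     # the paths that are one-to-one and onto
```

Under this encoding, Path-0 and Path-1 are the two constant maps, while Path-2 and Path-3 are the two bijections.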


<sup>19</sup>Gunji and Kamiura (2003, 2004) emphasized only the extension effect in reorganization derived from a mixture in observational heterarchy. This paper, however, indicates also a compression effect as a result of careful consideration of the mixed situation. We reconsider the difference between these two aspects as the difference between postdiction and prediction.

<sup>18</sup>Simultaneous choice is defined as a mixture of different levels that is expressed as a mapping. A mapping is defined as an operation from a domain to a codomain. Domain and codomain can be regarded as having different status, such as a start point and a target. A mixture of different levels means a mixture of these two. The set defining a map is required to serve as both domain and codomain. Furthermore, the map is not unidirectional but bidirectional, so it also has a dual nature. Considering these requirements, an isomorphic (bijective) mapping is a necessary and sufficient condition. Because mappings for all elements must be defined according to the requirement that a domain is also a codomain, a surjective map is needed. If the map is not injective, it becomes a one-to-many mapping in the opposite direction; thus an injective map is needed. Because the map must be both surjective and injective, a bijective map is required.

from out of the set (**Figure 5A**). As a result, the thought set changes from {0, 1} to {0, 1, 2 (someone), 3}. Conversely, in a case of compression, a path set of two elements can be reconstructed by removing Path-2 and Path-3 from the original set of four elements. Consequently, this set of two corresponds with the thought set of two (**Figure 5B**). This solution is temporary. Therefore, reconstruction of elements and maps should occur after resolving it.

5. As described above, observational heterarchy is not an actual entity but something observed by internal measurement. Therefore, sets of thought and action have intent-perspective and extent-perspective, similarly to internal (cause) and external (effect) descriptions in behavior. Consequently, the mental causal path can be resolved by application of the observational heterarchy model.
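The two ways of restoring a one-to-one correspondence between the thought set and the path set, described in the steps above, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the particular pairing of thoughts with paths (simple positional pairing) is an assumption, since the text specifies only that the two sets end up with the same number of elements.

```python
THOUGHTS = [0, 1]                                  # thought set St
PATHS = ["Path-0", "Path-1", "Path-2", "Path-3"]   # path set Sp

def one_to_one(thoughts, paths):
    """Return a bijection thought -> path, or None when the sizes differ.

    The positional pairing is an illustrative assumption.
    """
    if len(thoughts) != len(paths):
        return None
    return dict(zip(thoughts, paths))

def extend_thoughts(thoughts, paths):
    """Extension: add new thought values until St is as large as Sp."""
    extended = list(thoughts)
    next_value = max(extended) + 1
    while len(extended) < len(paths):
        extended.append(next_value)   # e.g. 2 stands for "someone" in the text
        next_value += 1
    return extended

def compress_paths(paths, removed=("Path-2", "Path-3")):
    """Compression: drop Path-2 and Path-3, as described in the text."""
    return [p for p in paths if p not in removed]

print(one_to_one(THOUGHTS, PATHS))                          # sizes differ
print(one_to_one(extend_thoughts(THOUGHTS, PATHS), PATHS))  # extension
print(one_to_one(THOUGHTS, compress_paths(PATHS)))          # compression
```

Before re-framing, no bijection exists because St has two elements and Sp has four; extension enlarges St, whereas compression shrinks Sp.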

We present a development of the mental causal path as an observational heterarchy (**Figure 5C**). Usually, we can act as we intend to. A pair of thought–action as a mental causal path is realized here. In other words, the intent-perspective is consistent with the extent-perspective as a behavior. However, this assertion of consistency is merely an approximation. Simultaneous choices between intra-level and inter-level are latent in such a normal condition. They appear under abnormal conditions in experiments.

The apparent mixture of different hierarchical levels can be shown in the problem of self-referential mixture between the thought set and the path set. As described at the beginning, however, the system never collapses despite some kind of self-referential condition. The mixture results in a collapse in logic, but not in living systems. The body (system) never engenders collapse but engenders one-to-one correspondence by restoring consistency. This feature is called robustness. Engendering one-to-one correspondence can thus be regarded as reconstruction of a set (frame). Such a reconstruction can be formally interpreted in two aspects: compression and extension of a set. As described below, we suggest that these two aspects of mediation of one-to-one correspondence correspond, respectively, with prediction and postdiction. Postdiction can be understood as the aspect of an extension effect, as in the rubber-hand illusion (e.g., Botvinick and Cohen, 1998; Tsakiris and Haggard, 2005), out-of-body experience (e.g., Blanke and Mohr, 2005; Lenggenhager et al., 2007) or embodiment of instruments (e.g., Iriki et al., 1996; Maravita and Iriki, 2004; Sonoda et al., 2012). Prediction, however, can be understood as the aspect of a compression effect that compresses various interpretations related to the cause set, as in attenuation of sensation. Details will be described later. In sum, postdiction and prediction are not problems of the comparison mechanisms, but are instead derived from the perceptual difference in mediation of conflict between CF and UF as to agency.

## **POSTDICTION AND PREDICTION**

When we devote attention to experimental data on postdiction, we can find the extension process, in which specific experimental conditions cause unexpected feelings in observers. For instance, alien hand (e.g., Banks et al., 1989; Wegner, 2002; Biran and Chatterjee, 2004) and table turning (Wegner, 2002) involve feelings of being moved by someone unknown. They can arise only for actors with thought extension. These examples show extension of the thought set (SoA). The following are examples of extension of the action set [Sense of Ownership (SoO)]. The I-spy study (Wegner and Wheatley, 1999) and the vicarious agency experiment (Wegner et al., 2004) show an illusion of agency in which a subject feels SoA despite not actually performing the action. These phenomena are regarded as illusions in the attribution of intention. They can therefore be regarded as extension of the action set, because a subject's attribution of their own intention to an action by others means that they choose elements from outside the action set. Thus, it can be regarded as extension of SoO to some extent. This corresponds with the case in which a new element appears, as presented in **Figure 5A**. These aspects correspond with the extension of SoO reported by Botvinick and Cohen (1998) and by Lenggenhager et al. (2007). Regarding visual awareness, Eagleman and Sejnowski (2000) reported the perception of a ring trajectory despite its actual absence. As described above, this is also regarded as an extension effect.

In a case of prediction, the compression process can be identified. We can observe it in the experiment reported by Bays et al. (2006), in which attenuation of self-generated tactile sensation was observed. In brief, this observation indicates that the sensation of touch becomes weaker when one touches one's own hand than when one is touched by others. Bays et al. (2006) constructed an apparatus with a torque motor to realize two conditions: a self-generated tactile condition (contact trial) and non-self-generated tactile conditions (no-contact and delay trials). In the apparatus, when the right finger presses the button, the torque motor begins to rotate, resulting in the left finger being pressed (pulse). They differentiated the self-generated tactile condition from the non-self-generated ones by manipulating, in milliseconds, the interval between the button press and the motor rotation. Without delay, the condition is self-generated even though the motor intermediates (contact trial). With delay, it becomes a non-self-generated condition (delay trial). It becomes a no-contact condition if the button is moved out of alignment. In that case, a sensor detects the finger movement and actuates the motor, which presses the left finger, so that the same finger movement can cause a pulse (no-contact trial). In the no-delay condition, when a subject's finger contacts the button (contact trial), the tactile condition is self-generated. When a subject's finger does not contact the button (no-contact trial), however, the condition is non-self-generated in the sense of postdiction. Note that attenuation of sensation is observed in the self-generated tactile condition. However, identical results were shown not only in the contact trial but also in the no-contact trial (Experiment 1 in **Table 2**). Therefore, it was concluded that attenuation of sensation was not postdictive but predictive.

Note the assumption that the difference between contact and no contact is discriminated after the button press event. The fact that attenuation occurs despite the discrimination then indicates that this perception is not postdictive but predictive. The problem here is the assumption of discrimination after the event. Although the discrimination indicates, by contact, whether the sensation is self-generated or not, the discrimination and the pulse are perceptually in synchrony. How to address this synchronicity is a problem. In other words, we can find the problem of how we

**Table 2 | Summary of results of Bays et al. (2006).**


can interpret the causality of button press and pulse in the experimental setting (upper left side in **Figure 6A**). What we should devote attention to here is the fact that, in the compression process of interpretation, the contact trial was regarded as the same trial as the no-contact trial. Moreover, we should confirm the result that attenuation of sensation (interpretation as self-generated tactile sensation) was not observed in the no-contact trial without a contact trial in the other experiment of Bays et al. (2006) (Experiment 2 in **Table 2**).

Following Gunji and Kamiura (2004), we try to describe this situation by dividing it into an internal description (Intent) based on subjective report and an external one (Extent) based on the order of objective events<sup>20</sup>. In this description, we use a lattice structure (Davey and Priestley, 2002), and an order relation is a mental path (time) such as event A–event B when event A occurs before event B<sup>21</sup>. In the contact trial, both Intent and Extent became the contact–attenuation order, the mental path (time) shown in **Figure 6A** (upper right side). In the no-contact trial, however, Extent became a partially ordered set in which, because no contact and pulse are felt simultaneously, they have no order relation, as shown in **Figure 6A**

(upper left side). At this moment, the order intention–attenuation might be readily apparent. No contact and pulse were arranged between them. In other words, because there were obvious order relations such as intention–no-contact and intention–pulse, these relations should be described as the lattice structure depicted in **Figure 6A** (upper left side). Considering the mapping to Intent, several interpretations (groupings) exist<sup>22</sup>. Because contact and intention were trained by repetition, no contact and intention would be grouped in Extent (lower left side in **Figure 6A**). The Intent of no-contact–attenuation would then be formed (lower right side in **Figure 6A**). In a no-contact trial, the appearance of a contradiction between no contact and pulse as to the order relation can trigger the compression of interpretation. We can consider the compression as derived from repetition in the contact trial because the finger movements of the no-contact trial are the same as those of the contact trial (Experiment 1 in **Table 2**). Note that, in the preceding contact trial, the grouping between contact and intention is not readily apparent but trained. In fact, even in the contact trial, attenuation can never be

<sup>20</sup>McTaggart (1989), a philosopher, proposed a model for subjective and/or cognitive time. He evaluated two kinds of models, called the A series and the B series. On one hand, the B series consists of linearly ordered events and is designated by "before" and "after." On the other hand, the A series consists of past, present and future, which cannot co-exist and are mutually exclusive. This original pair of the A and B series is utilized as a causal set (Bombelli et al., 1987) and its semantics in the field of quantum mechanics (Markopoulou, 2000), independent of philosophy. Gunji et al. (2009) studied a relation or interaction of the two series. In that study, both series are defined as lattices (Davey and Priestley, 2002).

<sup>21</sup>We briefly give the definitions of a causal set, a partially ordered set (POS), and a lattice. A causal set consists of separable events. Each event can be connected to another event via a directed edge without loops. If two events are connected by two edges that have different directions, they are equivalent to each other. Thus, these particular directed networks can be expressed as a POS (Davey and Priestley, 2002). If an event and a directed edge are expressed as a letter and ≤, respectively, a POS satisfies (1) *a* ≤ *a*, (2) *a* ≤ *b* and *b* ≤ *a* imply *a* = *b*, (3) *a* ≤ *b* and *b* ≤ *c* imply *a* ≤ *c*. For a lattice, we add some terminology. Any elements *a* and *b* in a POS, *P*, form an anti-chain if neither *a* ≤ *b* nor *b* ≤ *a* holds. For any subset *Q* ⊆ *P*, the join of *Q*, denoted by ∨*Q*, is defined such that *q* ≤ ∨*Q* for any *q* ∈ *Q*, and if *q* ≤ *s* for all *q* ∈ *Q*, then ∨*Q* ≤ *s*. In particular, if *Q* is a two-element set such as {*a*, *b*}, ∨{*a*, *b*} is represented by *a* ∨ *b*. Similarly, the meet of *Q*, denoted by ∧*Q*, is defined such that *q* ≥ ∧*Q* for any *q* ∈ *Q*, and if *q* ≥ *s* for all *q* ∈ *Q*, then ∧*Q* ≥ *s*. In particular, if *Q* is a two-element set such as {*a*, *b*}, ∧{*a*, *b*} is represented by *a* ∧ *b*. Given a partially ordered set *P*, if *x* ∧ *y*, *x* ∨ *y* ∈ *P* for any *x*, *y* ∈ *P*, then *P* is called a lattice. For example, a four-element lattice {*a*, *b*, *c*, *d*} such as that in **Figure 6A** (left side) has the order relations {*a* ≤ *b*, *a* ≤ *c*, *b* ≤ *d*, *c* ≤ *d*; *b* and *c* form an anti-chain}.
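The four-element example at the end of this footnote can be verified directly from the definitions. The following is a minimal sketch, assuming the order is stored as a set of pairs (x, y) meaning x ≤ y; the helper names (`order_from_covers`, `join`, `meet`, and so on) are illustrative and not taken from the cited works.

```python
from itertools import product

ELEMENTS = ["a", "b", "c", "d"]
# Pairs (x, y) meaning x <= y, as listed in the footnote.
COVERS = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def order_from_covers(elements, covers):
    """Return the reflexive-transitive closure of the given pairs."""
    leq = {(x, x) for x in elements} | set(covers)
    # Warshall-style closure: the intermediate element k varies slowest.
    for k, i, j in product(elements, repeat=3):
        if (i, k) in leq and (k, j) in leq:
            leq.add((i, j))
    return leq

LEQ = order_from_covers(ELEMENTS, COVERS)

def is_partial_order(elements, leq):
    """Check conditions (1) reflexivity, (2) antisymmetry, (3) transitivity."""
    reflexive = all((x, x) in leq for x in elements)
    antisymmetric = all(x == y for (x, y) in leq if (y, x) in leq)
    transitive = all(
        (x, z) in leq for (x, y) in leq for (u, z) in leq if y == u
    )
    return reflexive and antisymmetric and transitive

def is_antichain(x, y):
    """Two elements form an anti-chain when neither x <= y nor y <= x holds."""
    return (x, y) not in LEQ and (y, x) not in LEQ

def join(x, y):
    """Least upper bound of x and y, or None if it does not exist."""
    uppers = [s for s in ELEMENTS if (x, s) in LEQ and (y, s) in LEQ]
    least = [u for u in uppers if all((u, s) in LEQ for s in uppers)]
    return least[0] if least else None

def meet(x, y):
    """Greatest lower bound of x and y, or None if it does not exist."""
    lowers = [s for s in ELEMENTS if (s, x) in LEQ and (s, y) in LEQ]
    greatest = [g for g in lowers if all((s, g) in LEQ for s in lowers)]
    return greatest[0] if greatest else None

def is_lattice():
    """A POS is a lattice when every pair has both a join and a meet."""
    return all(
        join(x, y) is not None and meet(x, y) is not None
        for x, y in product(ELEMENTS, repeat=2)
    )

print(is_partial_order(ELEMENTS, LEQ), is_lattice())
print(join("b", "c"), meet("b", "c"), is_antichain("b", "c"))
```

In this example the anti-chain pair *b*, *c* has join *d* and meet *a*, which is why the structure is a lattice even though it is not linearly ordered.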

<sup>22</sup>Considering lattice homomorphisms from Extent to Intent, there are four: (1) {all elements} → {attenuation}; (2) {all elements} → {no-contact}; (3) {no contact, attenuation} → {attenuation}, {intention, pulse} → {no-contact}; (4) {no contact, intention} → {no-contact}, {attenuation, pulse} → {attenuation}.
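The count of four homomorphisms can be reproduced by brute force. The sketch below assumes, following the description of the no-contact trial in the main text, that Extent is the four-element lattice with intention at the bottom, attenuation at the top, and no contact and pulse as an anti-chain between them, and that Intent is the two-element chain no-contact ≤ attenuation. The encoding and all function names are illustrative assumptions.

```python
from itertools import product

def order_from_covers(elements, covers):
    """Reflexive-transitive closure of pairs (x, y) meaning x <= y."""
    leq = {(x, x) for x in elements} | set(covers)
    for k, i, j in product(elements, repeat=3):
        if (i, k) in leq and (k, j) in leq:
            leq.add((i, j))
    return leq

def join(elements, leq, x, y):
    """Least upper bound (assumes the order is a lattice)."""
    uppers = [s for s in elements if (x, s) in leq and (y, s) in leq]
    return next(u for u in uppers if all((u, s) in leq for s in uppers))

def meet(elements, leq, x, y):
    """Greatest lower bound (assumes the order is a lattice)."""
    lowers = [s for s in elements if (s, x) in leq and (s, y) in leq]
    return next(g for g in lowers if all((s, g) in leq for s in lowers))

# Extent in the no-contact trial (assumed encoding of Figure 6A, upper left).
EXTENT = ["intention", "no contact", "pulse", "attenuation"]
EXTENT_LEQ = order_from_covers(EXTENT, {
    ("intention", "no contact"), ("intention", "pulse"),
    ("no contact", "attenuation"), ("pulse", "attenuation"),
})

# Intent: the two-element chain no-contact <= attenuation.
INTENT = ["no-contact", "attenuation"]
INTENT_LEQ = order_from_covers(INTENT, {("no-contact", "attenuation")})

def is_homomorphism(f):
    """A lattice homomorphism preserves every binary join and meet."""
    for x, y in product(EXTENT, repeat=2):
        if f[join(EXTENT, EXTENT_LEQ, x, y)] != join(INTENT, INTENT_LEQ, f[x], f[y]):
            return False
        if f[meet(EXTENT, EXTENT_LEQ, x, y)] != meet(INTENT, INTENT_LEQ, f[x], f[y]):
            return False
    return True

candidates = (dict(zip(EXTENT, images)) for images in product(INTENT, repeat=len(EXTENT)))
homomorphisms = [f for f in candidates if is_homomorphism(f)]

for f in homomorphisms:
    print(f)
```

Of the sixteen possible maps, only the two constant maps and the two maps that send exactly one of the anti-chain elements to attenuation survive the check, in agreement with groupings (1) to (4).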

observed with delay. Likewise, in the no-contact trial, attenuation is not observed without a contact trial (Experiment 2 in **Table 2**).
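The qualitative pattern described in the text can be summarized as a small decision rule. This sketch encodes only the statements made in the surrounding paragraphs (not the numerical results of Bays et al., 2006, and not the contents of Table 2); the class `Trial`, its field names, and the function `attenuation_expected` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trial:
    contact: bool          # the finger actually contacts the button
    delayed: bool          # the pulse is delayed relative to the button press
    contact_trained: bool  # contact trials occur in the same experiment

def attenuation_expected(trial: Trial) -> bool:
    """Qualitative rule read off the text, not a model of the data.

    - With delay, attenuation is not observed, even in a contact trial.
    - Without delay, attenuation is observed in contact trials and in
      no-contact trials that accompany contact trials (Experiment 1).
    - No-contact trials without contact trials show no attenuation
      (Experiment 2).
    """
    if trial.delayed:
        return False
    return trial.contact or trial.contact_trained

examples = {
    "contact": Trial(contact=True, delayed=False, contact_trained=True),
    "no-contact (Experiment 1)": Trial(contact=False, delayed=False, contact_trained=True),
    "delay": Trial(contact=True, delayed=True, contact_trained=True),
    "no-contact (Experiment 2)": Trial(contact=False, delayed=False, contact_trained=False),
}
for name, trial in examples.items():
    print(name, attenuation_expected(trial))
```

The role of `contact_trained` reflects the interpretation given above, namely that the grouping of intention with contact is learned through repetition of contact trials.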

Are postdiction and prediction mutually independent effects? No. They must be concurrent effects that differ in their own ways. Specifically, they play different roles in the reorganization of sensation and perception. We explain this with a choice blindness experiment (Johansson et al., 2005). If the experimenter secretly exchanged a picture that a subject had chosen for another one, the subjects made up a reason for why they chose it even though they actually had not chosen it. Again we try to describe this situation by dividing it into Intent and Extent. In Intent, before the change to the other picture, the order was reason A – choice A (upper right side in **Figure 6B**). After the change, it became reason B – choice B (lower right side in **Figure 6B**). In Extent, after the change, the presentation of the fake picture and the fake result of the choice were certain (presentation B – choice B), whereas reason B and intention were uncertain (left side in **Figure 6B**). By compression, which plays the role of prediction, reason B – choice B was reflected in Intent (right pointing arrow). However, this was impossible without producing reason B by extension, which plays the role of postdiction (left pointing arrow). Consequently, it is concluded that postdiction and prediction emerge as the difference between two aspects, extension and compression, in the organization of causality, even in the experimentally manipulated contradictory situation. Therefore, we are always internal observers. When perceiving a world that we cannot supervise, our perceptions are necessarily accompanied by both postdiction and prediction.

## **APPLICATIONS OF OUR FRAMEWORKS TO EXPERIMENTAL PARADIGMS**

Our frameworks of awareness will be testable within some experimental paradigms, based on a gap or mixture of different sensational/perceptional information (intent-/extent-perspective). Specifically, they predict re-framing of the thought/action set in a mental causal path (**Table 3**)<sup>23</sup>. SoA and SoO correspond to the thought set and the action set, respectively. Their dynamical duality relation (re-framing of the sets) can be derived naturally from our frameworks. Although there are some discussions about the relation between SoA and SoO (e.g., Gallagher, 2000; Tsakiris et al., 2006), they cannot predict such re-framings comprehensively. Additionally, our frameworks can derive out-of-body experience (OBE) (Blanke and Mohr, 2005) and sleep paralysis (Santomauro and French, 2009) jointly, while the other frameworks do not even mention their relation. In our frameworks, OBE and sleep paralysis correspond to extreme versions of the extension effect and the compression effect, respectively, for re-framing of an action set (SoO)<sup>24</sup>.

#### **Table 3 | Experimental paradigms derived from our framework.**


For the re-framing of thought sets, one feels oneself operated by someone (hypnotism), group will, something like a ghost (Ouija board), or nothing (automatism) (Wegner, 2002)<sup>25</sup>. For the re-framing of action sets, i.e., the cognitive body frame, disownership (de Vignemont, 2011) and embodiment (Botvinick and Cohen, 1998) are well known, and our frameworks also predict sleep paralysis and OBE (**Table 3**). Although our prediction yields eight experimental paradigms, six of the eight have already been established, and herein we show only the remaining two, sleep paralysis and OBE. Our frameworks correspond to the experimental paradigms below: extension effect, OBE; compression effect, sleep paralysis.

#### **OUT-OF-BODY EXPERIENCE (TEST OF EXTENSION EFFECT)**

A particular subjective sensation called "out-of-body experience (OBE)" has been reported (e.g., Blanke and Mohr, 2005; Ehrsson, 2007; Lenggenhager et al., 2007). The procedures in those studies were based on a mixture of visual and haptic information through a head-mounted display (HD) that showed the subject's own back being touched by a stick in real time. In this section, we introduce a new preliminary construction (Gunji et al., 2013) that causes a feeling of OBE and differs from that of the previous studies. It uses the system of substituted reality (SR) (Suzuki et al., 2012). The experimental design is based on a mixture of subjective and objective views. This design matches our frameworks.

The SR system consists of multiple video cameras, a recorder, and an HD. In this design of OBE, a subject sitting in a room wears a helmet-type HD equipped with a subject-eye camera. He first sees an experimenter in front of him with the naked eye, and after putting on the HD he sees the subjectively viewed scene via the HD. After that, the scene recorded by the objective-eye cameras set in front of him is projected in the HD. The subjective view and the objective view are mutually exclusive, although they are two sides of the same coin: "now." They cannot be united by a single event in this situation. However, if he experiences continuous change between the objective and subjective cameras, he can feel that he himself exists in his own subjective view. In a preliminary experiment, a subject can feel OBE in this situation. That is not just an experience in which a subject can see himself. He can feel that he creates the objective view as if it were his lucid dream. Therefore, in this feeling, exclusive subjective and objective scenes are united as a

<sup>23</sup>Although we described a compression effect for interpretations (maps) in the discussion presented above, we can also presume the effect for thought/action sets (elements) according to a mediation process, e.g. in a case of an extreme compression for maps.

<sup>24</sup>In both OBE and sleep paralysis, one is fully aware. Thus, agency seems to be unchanged. Ownership associated with the cognitive body would be changed to some kind of compressed (unmovable) body in sleep paralysis and an extended (out-of) body in OBE. These extreme offsets of SoO may cause an abnormal body feeling.

<sup>25</sup>Although this classification is expedient, we think that it is suggestive. In the classification, the Ouija board is expressed as a compression effect since one may feel a ghost's agency while its presence is not strong. Automatism is the same case. Incidentally, we can also regard extreme concentration in sports as an example of automatism, instead of the questionable studies of automatism (Wegner, 2002).

single event, different from the feeling experienced in the previous studies.

The mixture of subjective and objective scenes leads to the integration of the two scenes, and a subject obtains an objective view in which he can see himself. Then, an expanding "*self*" who has an objective view, out of the body, appears. Note that self is a relation between the world and me. This paradigm of OBE demands a switch from the concept that self is reliable to the new one that self is flexible. This is the self who can expand itself in our frameworks. The paradigm of materializing the gap between subjective and objective views as one's lucid dream may also give an understanding of depersonalization disorder (Lambert et al., 2002).

## **SLEEP PARALYSIS (TEST OF COMPRESSION EFFECT)**

Sleep paralysis is a consciously experienced paralysis that occurs either when going to sleep or when waking up. During an episode, one is fully conscious and able to open one's eyes, but aware that it is not possible to move limbs, head or trunk (Dahlitz and Parkes, 1993; Santomauro and French, 2009). Sleep paralysis can be considered an intrusion of rapid eye movement (REM) sleep characteristics into wakefulness. That is, the muscles of the body are deeply relaxed and cannot be moved with ease, and the dreamlike element with hallucinations may result from the "dreaming" brain activity that is typical of this sleep period (Dement and Kleitman, 1957). Put simply, there is a gap between the conscious activity in the brain and the deeply relaxed body: the gap may cause the unmovable body with consciousness. Note that one can move one's relaxed body in normal sleep but cannot in sleep paralysis. Consequently, here is the self who compresses oneself into the unmovable body, a compressed self. This self contrasts with that of OBE.

There is currently no known way to induce sleep-onset REM periods, which have been found to be associated with sleep paralysis (Santomauro and French, 2009). Note, however, that the SR system can cause a feeling of a lucid dream to some extent by continuous change between subjective and objective views. Although there is no evidence, the SR system could cause a sleep-paralysis-like experience. It would need careful design. One such design may be a continuous change between a real-time scene (he sees his moving body) and a recorded one (he sees his body not moving), which causes some degree of a gap between the intention to move and the resultant movement, as in sleep paralysis. A subject could mediate the gap by not moving his body, with a feeling of being in his lucid dream. If we could develop these methods, they might have

### **REFERENCES**


Attenuation of self-generated tactile sensation is predictive, not postdictive. *PLoS Biol.* 4:e28. doi: 10.1371/journal.pbio.0040028


an effect of retaining behaviors to some extent. Consequently, these methods could be applied in rehabilitation to retrain behavior in disorders such as hyperactivity disorder. These applications may be contrasted with a "mirror box" that can cause movement of unmovable phantom limbs (Ramachandran and Rogers-Ramachandran, 1996).

## **CONCLUSION**

A self-referential problem of mixture between different levels (element and map) can be mediated in two processes: compression and extension of a system. However, we should not regard them simply as different effects at the same level. Because element and map originally have different status, we do not consider observational heterarchy simply as a model that can account for both postdiction and prediction. Mediation by compression is compression of the map, which means that one maintains the attitude that "the world is just what I predicted" even if inconsistency exists in the map (interpretation). Mediation by extension, however, is extension of an element, which means the deconstruction of the frame of self in response to inconsistency. Compression is a transcendental viewpoint that enforces institutionalization from the outside, whereas extension is an internal measurement that intends to make some adjustment from the inside. These two aspects are different levels in the mediation process. In this sense, prediction and postdiction are not mechanisms for the event (a normal feeling of doing is not fundamental because we can feel it even without an efferent copy), but rather represent a difference in the aspect of mediation. Consider the situation in which conscious will and unconscious will come together and an inconsistency appears. Prediction is the aspect in which conscious will maintains the process persistently. Then "I" equals "the *other* in my brain." Conversely, postdiction is the aspect in which conscious will is threatened and forced by unconscious will to adjust. Then the gap separating "I" and "the *other* in my brain" is materialized as "someone."

Consequently, awareness can be found in the conflict between conscious will (CF) and unconscious will (UF) that engenders the origin of voluntariness. It should be identified as a process having duality, in the sense that it opens the world (postdiction) and closes it (prediction).

## **ACKNOWLEDGMENTS**

This work was supported by JSPS KAKENHI Grant Number 255880.

185, 233–255. doi: 10.1007/s11229-010-9723-5


184–199. doi: 10.1016/j.brainresrev.2005.05.008


*D* 238, 2016–2023. doi: 10.1016/j.physd.2009.07.011


voluntary act. *Brain* 106, 623–642. doi: 10.1093/brain/106.3.623


form. *BioSystems* 92, 182–188. doi: 10.1016/j.biosystems.2008.02.004


7, 65–69. doi: 10.1016/S1364-6613(03)00002-0


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 June 2013; accepted: 11 September 2013; published online: 01 October 2013.*

*Citation: Sonoda K, Kodama K and Gunji Y-P (2013) Awareness as observational heterarchy. Front. Psychol. 4:686. doi: 10.3389/fpsyg.2013.00686*

*This article was submitted to Consciousness Research, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Sonoda, Kodama and Gunji. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Effects of consciousness and consistency in manual control of visual stimulus on reduction of the flash-lag effect for luminance change

## **Makoto Ichikawa<sup>1</sup>\* and Yuko Masakura<sup>2</sup>**

<sup>1</sup> Department of Psychology, Chiba University, Chiba, Japan

<sup>2</sup> School of Computer Science, Tokyo University of Technology, Hachioji, Tokyo, Japan

#### **Edited by:**

Yuki Yamada, Yamaguchi University, Japan

#### **Reviewed by:**

Andre Mascioli Cravo, Federal University of ABC, Brazil

Philip M. Grove, The University of Queensland, Australia

#### **\*Correspondence:**

Makoto Ichikawa, Department of Psychology, Chiba University, 1-33 Yayoi-cho, Inage, Chiba 263-8522, Japan.

e-mail: ichikawa@l.chiba-u.ac.jp

Four experiments investigated how observers' consciousness of controlling a stimulus change affects the visual perception associated with the illusory flash-lag effect. In a previous study (Ichikawa and Masakura, 2006), we found that the flash-lag effect in motion is reduced if observers are conscious that they are controlling stimulus movements with a computer mouse, even if the stimulus moves automatically, independently of the observer's mouse control. In another study (Ichikawa and Masakura, 2010a), we found that a consistent directional relationship between the observer's mouse control and stimulus movement, which is learned in everyday computer use, is important for the reduction of the flash-lag effect in active observation. In the present study, we examined whether the reduction of the flash-lag effect in active observation requires the observers' consciousness of controlling the stimulus change, and consistency across trials in the coupling between mouse movement direction and stimulus change. We used the flash-lag effect in luminance change because there is no intrinsic relationship between an observer's mouse control and luminance change in everyday computer use. We compared the illusory flash-lag effect for an automatic change of luminance with that for a luminance change controlled by the observers' active manipulation of a computer mouse. Because the flash occurred randomly in time, observers could not anticipate when it would be presented. The results suggest that not only the observer's consciousness of controlling the stimulus, but also consistency in the coupling of mouse movement direction with stimulus change, is required for the reduction of the flash-lag effect in active observation. The basis of the reduction of the flash-lag effect in active observation is discussed.

**Keywords: active observation, subjective set, controlling of stimulus change, proprioceptive information, training**

## **INTRODUCTION**

When a flash is presented physically aligned with a continuously moving stimulus, the flash is perceived in a lagged position relative to the moving stimulus. This is called the flash-lag effect (Nijhawan, 1994). This illusory lag has been found not only for positional transition, but also for transitions in other visual attributes, such as changes in luminance, shape, and randomness (Sheth et al., 2000). For instance, in the luminance flash-lag observation, a stationary disk appeared on one side of the fixation point at the start of each trial and gradually increased (or decreased) its luminance. A second disk was briefly presented for one frame on the opposite side of the fixation point. Even if the luminance of the two disks was the same, the first disk looked brighter (or dimmer) than the second one. This illusion has been explained by extrapolation that compensates for the delay in visual processing (Nijhawan, 1994), postdictive processing of the moving stimulus (Eagleman and Sejnowski, 2000), differences in processing time between the flash and the moving stimulus (Murakami, 2001), a delay in the shift of attention captured by the flash (Baldo and Klein, 1995), and so on.

Our previous study demonstrated that a viewer's active observation of the moving stimulus reduces the flash-lag effect (e.g., Ichikawa and Masakura, 2006). That is, the flash-lag effects in movement and luminance change were reduced if the observer actively controlled the continuous movement or luminance change of the visual stimulus with a computer mouse. The aim of this study is to identify the necessary conditions for the reduction of the flash-lag effect in active observation.

Lopez-Moliner and Linares (2006) reported a reduction of the flash-lag effect when an observer's key press controlled the presentation of the flash, so that the observer could predict when the flash would appear. Other studies found that removing attention from either the flash (Murakami, 2001; Baldo et al., 2002) or the moving stimulus (Shioiri et al., 2010) increases the flash-lag effect. These findings suggest that active observation, which is associated with attention directed to either the moving stimulus or the flash, may facilitate visual processing and hence reduce the flash-lag effect. However, in our previous study (Ichikawa and Masakura, 2006), the flash-lag effect was reduced when an observer actively controlled the continuous movement of the visual stimulus, even though the flash occurred randomly in time and hence could not be anticipated. This result indicates that active observation may reduce the flash-lag effect even if the observer has difficulty attending to the stimuli and the flash.

Our previous study also found that, even if the stimulus moved automatically, the flash-lag effect was reduced when the observers had a consciousness (subjective mental set) that they were controlling the stimulus movements (Ichikawa and Masakura, 2006). That is, when the moving stimulus was controlled by the mouse until it reached the midpoint of its movement and then moved automatically, observers did not notice that the stimulus movement had become automatic. In this condition, the flash-lag effect was reduced as in active observation, although the stimulus movement was automatic when the flash was presented. From this result, one may assume that the observers' mental set that they are actively controlling the stimulus movement may reduce the flash-lag effect.

However, subsequent findings cast doubt on the assumption that the subjective set of control over the stimulus plays the main role in the reduction of the flash-lag effect in active observation. That is, even when observers were conscious that they controlled the stimulus movement, this did not reduce the flash-lag effect if they used an unfamiliar device to control the visual stimulus, such as a trackball (Ichikawa and Masakura, 2010a) or a robotic arm (Scocchia et al., 2009). In addition, we found a reduction of the flash-lag effect when upward (and downward) movement of the moving stimulus was coupled with forward (and backward) movement of the observer's hand, as in most computer operating systems (e.g., Microsoft Windows, Apple's Mac OS, and Linux). However, this was not the case when the directional relationship between the stimulus movement and hand movement was reversed (Ichikawa and Masakura, 2010a). We also found that, even for the reversed directional relationship, the flash-lag effect was significantly reduced when observers were trained on the reversed relationship. These results indicate that learning the everyday relationship between hand movement and stimulus transition may reduce the flash-lag effect by facilitating visual processing through motor-sensory interaction.

The results of those previous studies do not exclude the possibility that the subjective consciousness of controlling the stimulus reduces the flash-lag effect when no factor disturbs visual processing. In those studies, factors related to unfamiliarity in the experimental setup (e.g., in the directional relationship between the hand movement and stimulus movement, and in the experimental devices) might have disturbed visual processing. This disturbance, rather than an absence of any effect of the observer's subjective consciousness of controlling the stimulus, might have caused the failure to reduce the flash-lag effect.

In the present study, we examined whether the consciousness of controlling the stimulus reduces the flash-lag effect when there are no factors that may disturb visual processing. That is, we used the flash-lag effect in luminance change (Sheth et al., 2000), for which there is no obvious intrinsic or learned directional relationship between hand movement and luminance change of a stimulus in any computer operating system, and for which we found a reduction of the flash-lag effect in active observation (Ichikawa and Masakura, 2006). In addition, we examined how both the directional relationship between the hand movement and stimulus change, and learning of this relationship, affect the facilitation of visual perception and consequently the extent of the flash-lag effect in active observation. We conducted four experiments to examine whether and how the reduction of the flash-lag effect in luminance change depends upon the consistency of the directional relationship between hand movement and stimulus change while the observers were conscious of controlling the luminance change of the stimulus. We will discuss the results of these experiments and the roles in visual processing of the consciousness of controlling the stimulus and of consistency in the directional relationship between the hand movement and stimulus change.

## **EXPERIMENT 1**

It is possible that reversing the directional relationship between hand movement and luminance change used in the previous study (Ichikawa and Masakura, 2006) would diminish the degree to which the flash-lag effect is reduced by active observation. If so, this would implicate some implicit learning of a specific directional relationship between hand movement and luminance change through our routine use of the computer mouse. Therefore, Experiment 1 examined whether an observer's control of a computer mouse reduces the flash-lag effect when the directional relationship between hand movement and luminance change is reversed relative to the relationship examined in that previous study.

### **METHOD**

#### **Observers**

Five observers participated in the first experiment. They were graduate or undergraduate students; their ages ranged from 21 to 28 years. Although three of them had taken part in an experiment examining the effects of active observation on the flash-lag effect for motion, and had shown a significant reduction of the flash-lag effect, they were naïve as to the purpose of this study. All of them had normal or corrected-to-normal visual acuity and were right-handed, and all had used a personal computer with a computer mouse for at least 4 years.

#### **Stimuli and apparatus**

We used the same apparatus and setting as in our previous studies (Ichikawa and Masakura, 2006, 2010a). A personal computer (Apple Macintosh G4 with Mac OS 9) presented the stimuli, programmed with Vision Shell, on a 21-inch display (Eizo T962, 75 Hz). The viewing distance was about 50 cm. The observer sat on a chair in front of a desk, with the head fixed on a chin rest, and grasped the computer mouse (Apple Pro Mouse M5769) with the right hand on the desk (**Figure 1A**). A computer keyboard (Sanwa Supply SKB-M1090H) was placed by the observer's left hand. The mouse and keyboard were connected to the computer by USB cables.

The center of the display was at the eye level of the observer. The luminance change stimulus was a stationary square (57.3 arcmin × 56.9 arcmin) whose luminance changed from 31.1 to 81.4 cd/m<sup>2</sup> (or from 81.4 to 31.1 cd/m<sup>2</sup>). It was presented 1.0° below or above the red fixation point (19.0 arcmin × 19.1 arcmin) that was located at the center of the display (**Figure 1B**). To control the luminance of the luminance change stimulus, we performed gamma correction and chose a range of the color look-up table that enabled a monotonic luminance change in the stimulus. We used a 50% random-dot display as the background in order to reduce the afterimage of the flash. The size of a dot in the background was 2.4 arcmin × 2.4 arcmin, and the luminances of the white and black dots were 85.1 and 1.0 cd/m<sup>2</sup>, respectively. A red horizontal line (334.3 arcmin × 2.4 arcmin) was presented at the bottom or top of the display (about 15.2° above or below the fixation point) to indicate the start position for the mouse.

During the luminance change, a flash stimulus (57.3 arcmin × 56.9 arcmin) was presented for 13.3 ms, 1.0° above or below the fixation point, with random timing. There were nine conditions for the luminance of the flash stimulus (ranging from 46.9 to 65.9 cd/m<sup>2</sup> in steps of about 2.4 cd/m<sup>2</sup>).
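As a quick consistency check on the flash levels, nine equally spaced luminances between 46.9 and 65.9 cd/m<sup>2</sup> imply a step of 2.375 cd/m<sup>2</sup>, which agrees with the stated step of about 2.4 cd/m<sup>2</sup>. The sketch below assumes exactly equal spacing, which the text does not state explicitly; the function name is illustrative only.

```python
def flash_levels(lowest=46.9, highest=65.9, n_levels=9):
    """Return n_levels equally spaced flash luminances in cd/m^2.

    Equal spacing is an assumption; the text only gives the range
    and an approximate step size.
    """
    step = (highest - lowest) / (n_levels - 1)
    return [round(lowest + i * step, 3) for i in range(n_levels)]


levels = flash_levels()
step = levels[1] - levels[0]  # 2.375 cd/m^2 under the equal-spacing assumption
```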

#### **Procedure**

Procedures were very similar to those that we used in our previous study to investigate the effects of active observation on the flash-lag effect in luminance change (Ichikawa and Masakura, 2006). The observer's task was to judge whether the flash was brighter than the luminance change stimulus at the moment of the flash presentation. There were two observation conditions in which the luminance change stimulus was controlled in different ways. In the first condition (the Manual condition), the luminance of the luminance change stimulus changed with the position of the computer mouse, which the observers manually moved forward (away from the body) or backward (toward the body) on a desk. That is, forward and backward mouse movements were respectively coupled with the decrement and increment of the luminance of the luminance change stimulus. This directional relationship between the hand movement and luminance change was consistent throughout the session. This relationship was opposite to that used in our previous study, in which we found a reduction of the flash-lag effect in luminance change (Ichikawa and Masakura, 2006). About 27.0 cm of hand movement in the depth dimension on the desk corresponded to the luminance change from 31.1 to 81.4 cd/m<sup>2</sup> of the luminance change stimulus.
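The coupling between hand position and luminance described above can be sketched as a simple mapping. The sketch assumes a linear mapping over the 27.0 cm of travel and clamps positions outside that range; neither detail is stated in the text, and the function and constant names are hypothetical rather than taken from the experimental software.

```python
L_MIN = 31.1      # darkest luminance of the stimulus, cd/m^2
L_MAX = 81.4      # brightest luminance of the stimulus, cd/m^2
TRAVEL_CM = 27.0  # approximate hand travel covering the full luminance range


def luminance_from_mouse(forward_cm, forward_increases=False):
    """Map forward hand displacement (cm from the near end) to luminance.

    forward_increases=False gives the Experiment 1 mapping, where moving
    the hand forward (away from the body) decreases luminance.
    Linearity and clamping are assumptions made for illustration.
    """
    fraction = min(max(forward_cm / TRAVEL_CM, 0.0), 1.0)
    span = L_MAX - L_MIN
    if forward_increases:
        return L_MIN + fraction * span
    return L_MAX - fraction * span
```

Under this sketch, the mapping of the previous study corresponds to `forward_increases=True`, so the two studies differ only in the sign of the coupling.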

Observers were instructed to fixate on the red fixation point and to move the mouse for about 2 s from the darkest (or brightest) to the brightest (or darkest) appearance to create a continuous change in luminance at a constant rate. If the luminance change took less than 1,600 ms or longer than 3,200 ms, the experimenter told the observer that the hand movement was out of the acceptable range and that he or she should move the hand faster or slower. That trial was presented again at the end of the block. In order to learn both the acceptable hand movement rate and that the hand movement changes the luminance of the stimulus, observers completed a practice session of at least 40 trials before the experimental trials, which continued until the observer's hand movement was within the acceptable range (from 1,600 to 3,200 ms) for at least 10 consecutive trials. In the practice session, observers moved the mouse while viewing a display that showed the luminance change stimulus with the red fixation point and index line, but no flash stimulus.
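The acceptance window and the practice criterion can be expressed compactly. The following is a minimal sketch of one reading of the rule, namely that practice ends once at least 40 trials have been run and the 10 most recent trials all fall within 1,600 to 3,200 ms; the function names are hypothetical.

```python
MIN_DURATION_MS = 1600
MAX_DURATION_MS = 3200


def is_acceptable(duration_ms):
    """True if the luminance change took between 1,600 and 3,200 ms."""
    return MIN_DURATION_MS <= duration_ms <= MAX_DURATION_MS


def practice_complete(durations_ms, min_trials=40, run_length=10):
    """Check the practice criterion on the durations recorded so far.

    Assumed reading: at least `min_trials` trials in total, and the last
    `run_length` trials are all within the acceptable range.
    """
    if len(durations_ms) < min_trials:
        return False
    return all(is_acceptable(d) for d in durations_ms[-run_length:])
```

In the experimental sessions, a trial failing `is_acceptable` would be appended to the end of the block for repetition, as described above.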

In the second condition (the Automatic condition), the luminance change stimulus changed its luminance at a constant velocity (change rate) that was determined by the average velocity in the first condition for each individual. Therefore, sessions for this condition were conducted just after all of the sessions for the first condition. Each trial began with the presentation of a fixation point and luminance change stimulus, as in the first condition. The luminance change stimulus began to change its luminance after a randomly determined interval (1,000–2,000 ms) following the observer's press of the space key.

There were five blocks for each of the Manual and Automatic conditions. In each block, 36 stimulus conditions [luminance lag between the stimuli (9) × direction of the luminance change (2) × vertical position of the luminance change stimulus (2)] were presented once in random order (the total number of trials was 360 per observer). At the beginning of each trial, the red fixation point and the red horizontal line were presented. For the Manual condition, the observers placed the computer mouse at the start point on the desk in accordance with the position of the horizontal red line. When the observers pressed the space key to start the trial, the luminance change stimulus was presented below or above the fixation point. In each condition, the observer's mouse control (Manual condition) or key press (Automatic condition) started the luminance change of the stimulus. At a randomly determined interval (0–400 ms) after the luminance change stimulus passed its luminance midpoint, a flash was presented for 13.3 ms with one of the nine possible luminance levels. After the luminance change stimulus reached the end point of the luminance change, the observers pressed one of two keys to report whether the flash was brighter or darker than the luminance change stimulus.
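The factorial structure of a block can be illustrated with a short sketch that enumerates the 9 × 2 × 2 = 36 stimulus conditions and shuffles them within each block. The factor labels, the seeding, and the function names are assumptions made for illustration; only the factor counts, the five blocks per condition, and the total of 360 trials come from the text.

```python
import itertools
import random

# Factor labels are illustrative; only the number of levels is from the text.
EXPERIMENT_1_FACTORS = {
    "luminance_lag_level": range(9),
    "change_direction": ("increment", "decrement"),
    "stimulus_position": ("above", "below"),
}


def make_block(factors, rng):
    """Return every factor combination once, in random order."""
    names = list(factors)
    trials = [dict(zip(names, combo))
              for combo in itertools.product(*factors.values())]
    rng.shuffle(trials)
    return trials


def make_session(factors, n_blocks=5, seed=0):
    """Return n_blocks independently shuffled blocks for one condition."""
    rng = random.Random(seed)
    return [make_block(factors, rng) for _ in range(n_blocks)]


session = make_session(EXPERIMENT_1_FACTORS)
trials_per_condition = sum(len(block) for block in session)
total_trials = 2 * trials_per_condition  # Manual plus Automatic conditions
```

With this design, each luminance lag is judged 20 times per observation condition (5 blocks × 2 directions × 2 positions), which is the number of responses behind each point of a psychometric function.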

In all of the experiments in this study, after all of the experimental sessions, the observers reported in which condition the luminance judgment was easiest, and guessed in which condition their judgment was most valid. In addition, they reported whether they felt that they controlled the luminance change of the stimulus during the sessions for each condition.

### **RESULTS AND DISCUSSION**

In the Manual condition, the mean time that each observer took to move the mouse ranged from 2,313 to 2,421 ms (*M* = 2,354 ms). The mean velocity (change rate) of the luminance change stimulus for each observer ranged from 20.8 to 21.8 cd/m<sup>2</sup>/s (*M* = 21.4 cd/m<sup>2</sup>/s). The SD of the velocity within an individual observer ranged from 1.5 to 1.8 cd/m<sup>2</sup>/s (*M* = 1.6 cd/m<sup>2</sup>/s). These small SDs indicate that the observers complied with the instruction to move the mouse at a constant and stable velocity. All of the observers reported that they controlled the luminance change in the Manual condition, whereas they never felt that they controlled the luminance change in the Automatic condition.
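The reported change rates follow from the luminance range and the movement times. As a rough check, a full sweep from 31.1 to 81.4 cd/m<sup>2</sup> completed in the mean time of 2,354 ms corresponds to about 21.4 cd/m<sup>2</sup>/s. The sketch assumes that every trial covers the full luminance range; the function name is illustrative.

```python
def change_rate(duration_ms, start=31.1, end=81.4):
    """Mean luminance change rate in cd/m^2 per second for one sweep.

    Assumes the sweep covers the full range between `start` and `end`.
    """
    return abs(end - start) / (duration_ms / 1000.0)


mean_rate = change_rate(2354)  # about 21.4 cd/m^2/s
```

Note that the mean of per-trial rates need not equal the rate computed from the mean duration, so small discrepancies from the reported values are expected.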

**Figure 2A** shows the results for a single observer (MT) as an example. The vertical axis indicates the frequency of trials on which the observer reported that the luminance change stimulus exceeded the luminance level of the flash. The horizontal axis shows the luminance lag between the luminance change stimulus and the flash. The zero point on this axis represents trials on which the luminance change stimulus and flash had the same luminance level. On these trials, a veridical observer would judge the stimulus brighter than the flash 50% of the time. However, in the Manual condition, MT judged that the luminance change stimulus was brighter than the flash on about 80% of the trials, and in the Automatic condition this rose to 95% of the trials.

A Probit analysis (Finney, 1971) determined the 50% threshold for the response that the luminance change stimulus exceeded the luminance level of the flash. The flash-lag effect was derived by dividing this threshold by the velocity of the luminance change for each observer. **Figure 2B** shows the mean thresholds in each of the two conditions, averaged over the five observers. This figure reveals a clear reduction of the flash-lag effect for the Manual condition relative to the Automatic condition. A paired *t*-test, used to compare the observed means for these two conditions, indicated that the difference between the Manual and Automatic conditions was significant [*t*(4) = 3.061, *p* < 0.05]. This result indicates that coupling the forward and backward hand movements with the decrement and increment of luminance, respectively, leads to a reduction of the flash-lag effect similar to that reported in Experiment 2 of the previous study, in which forward and backward hand movements were respectively coupled with the increment and decrement of luminance (Ichikawa and Masakura, 2006).
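The threshold estimation and its conversion to a temporal lag can be sketched as follows. This is a simplified, unweighted probit regression (a straight line fitted to probit-transformed response proportions), not the iterative weighted procedure of Finney (1971), and the data in the usage lines are hypothetical. The expression of the flash-lag effect in milliseconds and the sign convention of the luminance lag are also assumptions.

```python
from statistics import NormalDist


def probit_threshold(lags, n_brighter, n_trials):
    """Estimate the luminance lag at which p("stimulus brighter") = 0.5.

    lags:       luminance lag for each level (cd/m^2)
    n_brighter: number of "stimulus brighter than flash" responses per level
    n_trials:   number of trials per level
    """
    normal = NormalDist()
    # Slightly shrink proportions toward 0.5 so that 0 and 1 stay finite.
    z = [normal.inv_cdf((k + 0.5) / (n_trials + 1.0)) for k in n_brighter]
    mean_x = sum(lags) / len(lags)
    mean_z = sum(z) / len(z)
    sxx = sum((x - mean_x) ** 2 for x in lags)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(lags, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope  # lag where the fitted probit crosses z = 0


def flash_lag_ms(threshold_cd_m2, rate_cd_m2_per_s):
    """Convert a luminance threshold into a temporal lag in milliseconds."""
    return 1000.0 * threshold_cd_m2 / rate_cd_m2_per_s


# Hypothetical counts out of 20 trials per level, for illustration only.
example_lags = [i * 2.375 + 2.0 for i in range(-4, 5)]
example_counts = [0, 1, 3, 6, 10, 14, 17, 19, 20]
threshold = probit_threshold(example_lags, example_counts, 20)
lag_ms = flash_lag_ms(threshold, 21.4)
```

Dividing by each observer's own change rate puts observers who moved the mouse at slightly different speeds on a common temporal scale before averaging.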

We conducted a 2 × 2 mixed-design ANOVA to compare the flash-lag effect in luminance change in this study with that found in our previous study (Ichikawa and Masakura, 2006; with *N* = 6 observers), using experiment (present or previous) as a between-subjects factor and observing condition (Manual or Automatic) as a within-subjects factor. Only the main effect of the observing condition was significant [*F*(1, 9) = 8.633, *p* < 0.05]. Thus, we replicated the results of our previous study on the flash-lag effect for luminance change with the opposite directional combination of hand movement and luminance change. The present results, together with those of the previous study, suggest that consistency in the relationship between the hand movement and luminance change leads to a reduction of the flash-lag effect for luminance change, and that the flash-lag effect is reduced in active observation regardless of whether the forward and backward hand movements are coupled with the increment or the decrement of the luminance.

The result that the naïve observers' perception in the Manual condition was more valid than that in the Automatic condition is not congruent with the observers' own impressions. That is, all of them guessed that their performance was more valid in the Automatic condition because, in that condition, they could concentrate on the visual stimuli, whereas in the Manual condition they had to pay attention to the hand movement in order to move the mouse at a constant velocity. This incongruence between the measured flash-lag effect and the observers' introspection suggests that observers were not aware of the reduction of the flash-lag effect in active observation.

## **EXPERIMENT 2**

The second experiment was designed to pursue the possibility that consistency in the directional relationship between the stimulus luminance change and the observer's hand movement is the source of the reduction in the flash-lag effect. In a previous study (Ichikawa and Masakura, 2010a), we found that the active control of stimulus movement reduced the flash-lag effect in motion only when the directional relationship between the hand movement and stimulus movement corresponded to the directional relationship in popular computer operating systems. The luminance change task, however, offers the advantage of having no such intrinsic or default (routine) relationship between stimulus change and hand movements. Therefore, in the present task observers should not have acquired a learned preference for a certain directional relationship between the stimulus changes and hand movements.

In both Experiment 1 of the present study and Experiment 2 of the previous study (Ichikawa and Masakura, 2006), observers felt that they controlled the luminance change of the stimulus with the computer mouse. In addition, in our previous studies (Ichikawa and Masakura, 2006, 2010a), observers reported that they always felt that they controlled the stimulus movement in every condition in which they moved the stimulus with the computer mouse, even if the directional relationship between hand movement and stimulus motion was inconsistent. Therefore, we expected that observers would be conscious of controlling the luminance change of the stimulus regardless of the directional relationship between hand movements and stimulus change. Because there are no acquired biases for the directional relationship between action and luminance change, one might anticipate that the observer's consciousness would reduce the flash-lag effect regardless of the directional relationship or its consistency over trials in a session. If so, that consciousness would be a sufficient condition for the reduction of the flash-lag effect regardless of the consistency of the directional relationship between hand movements and luminance change of the stimuli. In the second experiment, we examined this notion.

### **METHOD**

#### **Observers**

Nine new observers took part in the second experiment (four females and five males). Five of them had taken part in an experiment concerning the effects of active observation on the flash-lag effect in motion, and had shown a significant reduction of the flash-lag effect in active observation. All of them were naïve as to the purpose of this study. In addition, one of the two authors (Makoto Ichikawa), who had taken part in the experiments in the previous studies (Ichikawa and Masakura, 2006, 2010a), participated in the experiment. The ages of the observers ranged from 21 to 44 years. All had normal or corrected-to-normal visual acuity and were right-handed, and all had used a personal computer with a computer mouse for at least 4 years.

#### **Stimuli and apparatus**

The setup of the equipment and the stimulus configuration were the same as in Experiment 1. As in Experiment 1, there were nine conditions for the luminance of the flash stimulus (ranging from 46.9 to 65.9 cd/m<sup>2</sup> in steps of about 2.4 cd/m<sup>2</sup>).

#### **Procedure**

Three observation conditions controlled the luminance change stimulus in different ways. In the first condition (Forward-Increment condition), the luminance of the luminance change stimulus was yoked to the position of the computer mouse, which the observers manually moved forward or backward on a desk. The forward and backward movements of the mouse were respectively coupled with the increment and decrement of the luminance of the luminance change stimulus, as in our previous study (Ichikawa and Masakura, 2006). In the second condition (Forward-Decrement condition), the directional relationship between luminance change and mouse movement was the same as that used in Experiment 1 (i.e., the reverse of the Forward-Increment condition). In these two conditions, about 27.0 cm of mouse movement in the depth dimension on the desk corresponded to the luminance change from 31.1 to 81.4 cd/m<sup>2</sup> of the luminance change stimulus. In this experiment, the two different mappings of mouse direction onto stimulus change (Forward-Increment or Forward-Decrement) were presented on different trials within the same block, thereby violating the consistency of the mapping within each block. In all other respects, the procedure was the same as that used in the Manual condition in Experiment 1.

In the third condition (Automatic condition), the luminance change stimulus changed its luminance at a constant velocity (change rate) that was determined by the average velocity in the first and second conditions for each individual. At the beginning of each trial, the fixation point and luminance change stimulus were presented, as in the first and second conditions. After a random interval ranging from 1,000 to 2,000 ms following the observer's press of the space key, the luminance change stimulus started to change its luminance at a constant rate. The blocks for this condition immediately followed the blocks for the first and second conditions.

There were five blocks in which both the Forward-Increment and Forward-Decrement conditions were presented. In each block of trials for these conditions, 72 stimulus conditions [directional relationship between the hand movement and luminance change (2) × luminance lag between the stimuli (9) × direction of the luminance change (2) × vertical position of the luminance change stimulus (2)] were presented in random order. There were also five blocks for the Automatic condition. In each block for the Automatic condition, 36 stimulus conditions [luminance lag between the stimuli (9) × direction of the luminance change (2) × vertical position of the luminance change stimulus (2)] were presented in random order (the total number of trials was 540 for each observer). Between the blocks, observers had short rests.

At the beginning of each trial, the red fixation point and the red horizontal line were presented. In the Forward-Increment and Forward-Decrement conditions, the observers placed the computer mouse at the start point on the desk in accordance with the position of the horizontal red line. When the observers pressed the space key to start the trial, the stimulus was presented below or above the fixation point. In each condition, the observer's mouse control (Forward-Increment or Forward-Decrement condition) or key press (Automatic condition) started the luminance change of the stimulus.

### **RESULTS AND DISCUSSION**

For all trials in the Forward-Increment and Forward-Decrement conditions, the means of the time for each observer to move the mouse ranged from 2,310 to 2,394 ms (*M* = 2,361 ms) for the Forward-Increment condition, and from 2,281 to 2,415 ms (*M* = 2,356 ms) for the Forward-Decrement condition. The mean change rate of the luminance was recorded in each trial; the mean of the change rates for each observer ranged from 21.0 to 21.8 cd/m<sup>2</sup> /s (*M* = 21.3 cd/m<sup>2</sup> /s) for the Forward-Increment condition, and from 20.9 to 22.1 cd/m<sup>2</sup> /s (*M* = 21.4 cd/m<sup>2</sup> /s) for the Forward-Decrement condition. All of the observers reported that they felt that they controlled the luminance change in the Forward-Increment and Forward-Decrement conditions although they did not in the Automatic condition.

As in Experiment 1, the flash-lag effect was derived from the luminance lag using a Probit analysis, which determined the 50% threshold for the response that the luminance change stimulus exceeded the luminance level of the flash. **Figure 3** shows the means of the 50% thresholds in each condition for the 10 observers. This figure shows no clear difference among the three conditions. A one-way repeated-measures ANOVA compared the means of these three conditions using the data from the 10 observers (**Figure 3**). The main effect of condition was not significant [*F*(2, 18) = 0.527, *p* > 0.05].
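The degrees of freedom reported above follow from the design: with 3 conditions and 10 observers, a one-way repeated-measures ANOVA has (3 − 1) = 2 and (3 − 1) × (10 − 1) = 18 degrees of freedom. The following minimal sketch computes the *F* ratio for such a design; the small data set in the usage lines is hypothetical and is not taken from the experiment, and the function name is illustrative.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the score of subject s in condition c.
    Returns (F, df_condition, df_error).
    """
    n_subjects = len(data)
    n_conditions = len(data[0])
    grand = sum(sum(row) for row in data) / (n_subjects * n_conditions)
    cond_means = [sum(row[c] for row in data) / n_subjects
                  for c in range(n_conditions)]
    subj_means = [sum(row) / n_conditions for row in data]

    ss_cond = n_subjects * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = n_conditions * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error term: condition-by-subject residual variation.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Hypothetical scores for 3 subjects in 3 conditions, for illustration only.
f_ratio, df_cond, df_error = rm_anova_oneway([[1, 3, 5], [2, 3, 7], [3, 6, 6]])
```

Removing the between-subject sum of squares from the error term is what distinguishes this analysis from a between-subjects one-way ANOVA on the same scores.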

All of the observers, including MI (one of the authors), reported that they felt that they controlled the luminance change of the stimulus in both the Forward-Increment and Forward-Decrement conditions. In addition, they reported no subjective differences between these two conditions in the difficulty of controlling the stimulus luminance or of judging the relative luminance of the flash and the luminance change stimulus. This suggests that the observers were conscious that they controlled the luminance levels through their movement of the mouse, as in Experiment 1 and in our previous study (Ichikawa and Masakura, 2006). However, that consciousness was not accompanied by a significant reduction in the flash-lag effect for these two manual conditions relative to the Automatic condition, even though neither mapping was inconsistent with a learned directional relationship between hand movement and stimulus change. This result suggests that the observers' consciousness that they control the stimulus luminance is not a sufficient condition for the reduction of the flash-lag effect in luminance change.

As in Experiment 1, the observers guessed that their performance was more valid in the Automatic condition because they could concentrate more on the luminance judgment in that condition than in the other two conditions. However, their guess was not congruent with the obtained flash-lag effect.

## **EXPERIMENT 3**

Experiment 2 produced no significant reduction of the flash-lag effect based upon observers' manual control of the stimulus luminance, although such a reduction was evident in Experiment 1 as well as in our previous study (Ichikawa and Masakura, 2006). In Experiment 2, the Forward-Increment and Forward-Decrement conditions were presented in random order within the same block. Therefore, it is possible that the inconsistency of the directional relationship between the hand movement and luminance change impaired the effect of active observation on the flash-lag effect.

In Experiment 3, we examined whether observers' manual control of a computer mouse can reduce the flash-lag effect when the directional relationship between the hand movement and luminance change is consistent within each block. In Experiment 3, the seven observers who took part in the second experiment conducted the Manual and Automatic conditions, where the Manual condition involved only the Forward-Decrement mapping of hand movement and stimulus luminance changes, as in Experiment 1.

### **METHOD**

#### **Observers**

The seven naive observers among the 10 who took part in Experiment 2 participated in Experiment 3, 4–8 weeks after the second experiment.

#### **Stimuli and apparatus**

The set up of the equipment and the stimulus configuration were the same as in Experiment 1.

#### **Procedure**

The procedures were the same as in Experiment 1 except that the observers who served in Experiment 2 (and who had therefore experienced the opposite, inconsistently presented directional relationship between the hand movement and luminance change) participated in this experiment. There were five blocks for each of the Manual (Forward-Decrement) and Automatic conditions. In each block, 36 stimulus conditions [luminance lag between the stimuli (9) × direction of the luminance change (2) × vertical position of the luminance change stimulus (2)] were presented in random order.
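The factorial structure of one block can be sketched as follows. This is only an illustration of how 9 × 2 × 2 = 36 conditions might be enumerated and shuffled; the lag indices and the labels `increment`/`decrement` and `upper`/`lower` are placeholders assumed here, not values taken from the experiment.

```python
import itertools
import random

def make_block(n_lags=9, seed=None):
    """Return one randomized block of 36 trials (lag x direction x position)."""
    lag_steps = range(n_lags)                  # placeholder lag indices, not the actual lag values
    directions = ("increment", "decrement")    # direction of the luminance change
    positions = ("upper", "lower")             # hypothetical labels for the two vertical positions
    trials = list(itertools.product(lag_steps, directions, positions))
    random.Random(seed).shuffle(trials)        # random presentation order within the block
    return trials

block = make_block(seed=1)
```

Each condition appears exactly once per block, so five blocks per observing condition would give five repetitions of every cell under this scheme.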

#### **RESULTS AND DISCUSSION**

For all trials in the Forward-Decrement condition, the observers' mean times for moving the mouse ranged from 2,342 to 2,408 ms (*M* = 2,367 ms). The mean change rate of the luminance was recorded in each trial; the mean of the change rates for each observer ranged from 20.9 to 21.5 cd/m<sup>2</sup>/s (*M* = 21.3 cd/m<sup>2</sup>/s). The SD of the change rate within an individual observer ranged from 1.3 to 1.6 cd/m<sup>2</sup>/s (*M* = 1.5 cd/m<sup>2</sup>/s) for the Manual condition. No consistent difference in the change rate was observed in the Manual condition between Experiments 1 and 3.

As in Experiments 1 and 2, the flash-lag effect was derived from the luminance lag based upon the 50% threshold for the response that the luminance change stimulus exceeded flash luminance level. **Figure 4** shows the means of the 50% thresholds in each of the two conditions averaged over data from seven observers. A paired *t*-test comparing the means of these two conditions reveals no statistically significant difference between them [*t*(6) = 0.282, *p* > 0.10].
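As a rough illustration of what a 50% threshold means, the sketch below finds the luminance lag at which the proportion of "change stimulus exceeded the flash" responses crosses 0.5. Linear interpolation between neighboring lags is an assumption made here for simplicity; the fitting procedure actually used is not specified in this passage, and the function name and example numbers are hypothetical.

```python
def threshold_50(lags, proportions):
    """Interpolate the lag at which the response proportion crosses 0.5.

    lags        -- lag values in ascending order
    proportions -- proportion of 'exceeded' responses at each lag
    """
    points = list(zip(lags, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        # A crossing lies between two lags whose proportions straddle 0.5.
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("response proportions never cross 0.5")

# Hypothetical data: proportions rise from 0.25 to 0.75 between lags 0 and 1.
example = threshold_50([0, 1], [0.25, 0.75])
```

A threshold shifted away from zero lag would correspond to a flash-lag effect; a threshold near zero would indicate that the effect is reduced.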

As in Experiments 1 and 2, all of the observers reported that they felt that they controlled the luminance change in the Manual (that is, Forward-Decrement) condition. Such a finding indicates that, even if the observers were conscious that they controlled the luminance change of the stimulus as in Experiment 1 (and as in our previous study; Ichikawa and Masakura, 2006), and even if a consistent directional relationship between the hand movement and luminance change was maintained during the experiment, neither manual control of the stimulus luminance nor the consciousness of controlling the stimulus could reduce the flash-lag effect in the Manual condition. This result suggests long-lasting effects of the prior (Experiment 2) experience of an inconsistent relationship between the hand movement and luminance change on the flash-lag effect, which stretch over several weeks.

## **EXPERIMENT 4**

In our previous study (Ichikawa and Masakura, 2010a), we found that the flash-lag effect in motion was significantly reduced after a training session of 360 trials with an unfamiliar directional relationship between the hand movements and stimulus motions on a computer display. In Experiment 4, we examined whether training with a specific directional relationship between the hand movement and luminance change (Forward-Decrement condition) can reduce the flash-lag effect in a luminance change if an observer has had prior experience with inconsistencies in the directional relationship between the hand movement and luminance changes. After the training, observers would be able to control the luminance of the visual stimulus more easily, with less attention to the hand movement. This ease might reduce the cognitive load in active observation, and consequently reduce the flash-lag effect. In order to examine whether this is the case, in Experiment 4 we used observers who had taken part in both Experiments 2 and 3 and who showed no reduction of the flash-lag effect in the Forward-Decrement condition in Experiment 3.

## **METHOD**

### **Observers**

Six of the seven observers who took part in Experiment 3 participated in Experiment 4, from 27 to 32 weeks (about 6 months) after the third experiment.

### **Stimuli and apparatus**

Equipment and stimulus configuration were the same as in Experiment 1.

## **Procedure**

Experiment 4 consisted of two sessions: a training session and a post-training session. In the training session, procedures were similar to those of our previous study involving the flash-lag effect in motion (Ichikawa and Masakura, 2010a). That is, over 10 blocks of 36 trials each, observers were instructed to move the mouse in about 2,400 ms from the start line to the goal line. During the training session, the acceptable time for this hand movement in the Manual condition ranged from 2,000 to 2,800 ms; this range was narrower than that of Experiments 1 and 2. If the movement took longer than 2,800 ms or less than 2,000 ms, a low beeping sound notified the observer that the velocity was out of the acceptable range. In addition, if the time for the movement was within the range between 2,373 and 2,427 ms, a high beeping sound notified the observer that the movement was in the center of the acceptable range. Following the 360 training trials (10 blocks), observers had a post-training session in which they had 10 additional trial blocks. This number of training trials was sufficient to reduce the flash-lag effect significantly for a moving stimulus in our previous study (Ichikawa and Masakura, 2010a). In the post-training session, the Manual condition (Forward-Decrement condition) was presented for five blocks, followed by the Automatic condition for five blocks (procedures in both conditions were the same as those of Experiments 1 and 3).
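The auditory feedback rule used during training can be summarized in a few lines. The sketch below is illustrative only: the function name and return labels are hypothetical, and treating the stated boundaries (2,000, 2,800, 2,373, and 2,427 ms) as inclusive is an assumption.

```python
def movement_feedback(duration_ms):
    """Classify one training trial by the duration of the mouse movement."""
    # Outside the acceptable range: low beep.
    if duration_ms < 2000 or duration_ms > 2800:
        return "low_beep"
    # Center of the acceptable range (boundaries assumed inclusive): high beep.
    if 2373 <= duration_ms <= 2427:
        return "high_beep"
    # Acceptable but not central: no sound.
    return None

feedback = [movement_feedback(d) for d in (1950, 2400, 2600, 2900)]
```

Under this rule, only movements within about 27 ms of the 2,400 ms target receive the high beep, which gives observers a fairly tight cue about the target velocity.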

## **RESULTS AND DISCUSSION**

For all of the post-training sessions in the Manual condition, the mean amount of time that each of the six observers took in moving the mouse from the start point to the goal ranged from 2,281 to 2,500 ms (*M* = 2,394 ms). The mean change rate of the luminance ranged from 20.1 to 22.1 cd/m<sup>2</sup>/s (*M* = 21.0 cd/m<sup>2</sup>/s). The SD of the change rate within an individual observer ranged from 1.3 to 1.7 cd/m<sup>2</sup>/s (*M* = 1.4 cd/m<sup>2</sup>/s) for the Manual condition. For the six observers, no consistent differences were observed between these values and those in the Manual conditions of Experiments 2 and 3. All of the observers reported that they felt that they controlled the luminance change in the Manual condition, but not in the Automatic condition, both before and after the training session. They guessed that their performance in the luminance judgment was more valid in the Automatic condition. They reported that there was no remarkable difference in the ease of controlling the stimulus luminance between the sessions before and after the training session, although the number of trials in which the velocity of the luminance change was outside of the acceptable range decreased from 7.1% (SD = 3.61%) to 5.4% (SD = 1.48%) on average.

The flash-lag effect was derived in the same way as in the other experiments. **Figure 5** shows the means of the 50% thresholds from the six observers in each condition. A paired *t*-test on mean data from the six observers found no significant difference between the Manual and Automatic conditions [*t*(5) = 1.508, *p* > 0.10]. In order to compare the flash-lag effect before and after the training session, we also conducted a three-by-two analysis of variance comparing the flash-lag effects for the Forward-Decrement mapping in this experiment with those in Experiments 2 and 3 for the six observers who took part in all three experiments. The two within-subject factors were experiment (Experiment 2, 3, or 4) and observing condition (Manual or Automatic). We found no significant main effects [experiment factor, *F*(2, 10) = 2.795, *p* > 0.10; observing condition factor, *F*(1, 5) = 0.279, *p* > 0.10] or interaction [*F*(2, 10) = 0.779, *p* > 0.10]. These results indicate that there was no consistent variation in the flash-lag effect among these experiments.
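For readers unfamiliar with the paired *t*-test reported above, the following minimal sketch computes the statistic from per-observer thresholds in two conditions. The numbers are invented purely for illustration and are not the data of this study; with six observers the test has 5 degrees of freedom, as in the reported *t*(5).

```python
import math
import statistics

def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)           # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))       # mean difference over its standard error
    return t, n - 1

# Hypothetical thresholds for six observers (not the study's data).
manual = [3, 5, 4, 6, 5, 7]
automatic = [2, 3, 4, 4, 4, 4]
t_value, df = paired_t(manual, automatic)
```

The resulting *t* would then be compared against the *t* distribution with `df` degrees of freedom to obtain a *p* value.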

These results suggest that, even if the observers are conscious of controlling the stimulus during the experimental sessions, the effect of experiencing an inconsistent relationship between the mouse movement and luminance change (in Experiment 2) is long lasting (at least 6 months). This inconsistency would impair the original visual facilitation process that leads to reduction of the flash-lag effect in luminance change regardless of the directional relationship between the hand movement and luminance change.

## **GENERAL DISCUSSION**

Results of the present Experiment 1, as well as those from a previous study (Ichikawa and Masakura, 2006), showed that the flash-lag effect could be reduced when observers actively engage in observation of a relevant stimulus, even without learning the directional relationship between the active hand movement and stimulus change. That is, regardless of whether the forward and backward hand movements were respectively coupled with luminance increment and decrement, the flash-lag effect was reduced if the directional relationship was consistent over the trials within an experimental session. During the trials in the experiment, observers felt that they controlled the stimulus change. Together with the previous study (Ichikawa and Masakura, 2006), these results indicate that the observer's subjective consciousness of controlling stimulus change plays an important role in the reduction of the flash-lag effect in active observation when the directional relationship between the hand movement and stimulus change is consistent.

As mentioned in the Introduction, several studies have demonstrated that prediction of the flash can reduce the flash-lag effect in active observation (e.g., Baldo et al., 2002; Lopez-Moliner and Linares, 2006). In the experiments in the present study, however, the timing of the flash was random, so the observers could not predict it; the observers' prediction of the flash therefore cannot explain the reduction of the flash-lag effect in Experiment 1.

The results of the present four experiments showed that the reduction of the flash-lag effect was restricted to the case in which the directional relationship between hand movement and stimulus change was consistent within each experiment. In those experiments, observers always felt that they controlled the luminance change of the stimulus in the active condition. The results of these experiments suggest that the reduction of the flash-lag effect in active observation requires not only the consciousness of controlling the stimulus change, but also consistency in the directional relationship between the hand movement and stimulus change. This notion is compatible with the results of our previous studies: although the flash-lag effect was reduced in active observation for the initial relationship between hand movement and stimulus movement in direction (Ichikawa and Masakura, 2010a) and ratio in distance (Ichikawa and Masakura, 2010b), it was not reduced in the following sessions in which that relationship was changed to a novel one. These results suggest the importance of consistency in the relationship between the hand movement and stimulus change across experimental sessions for the reduction of the flash-lag effect in active observation.

We consider that the proprioceptive information involved in active observation would be the factor that enables our visual system to reduce the flash-lag effect, in addition to the consciousness of active control of stimulus change and prediction of the timing of the stimulus presentation. Several studies have shown that active hand movement can facilitate the visual processing of stimuli that are coupled with the observer's own movements. For instance, active hand movement, which caused the rotation of a radial grating stimulus below the hand, enhanced the duration of the motion aftereffect for the grating stimulus if the direction of the hand movement was consistent with the direction of the visual motion (Matsumiya and Shioiri, 2008). Proprioceptive information related to the movement of the viewing point facilitates detection of motion signals during the viewing of motion illusion figures (Spillmann et al., 2003). Moreover, tactile motion with the hand would activate the human MT+ (Hagen et al., 2002; Blake et al., 2004). In short, these studies suggest that proprioceptive information about active movement of the hand or body that is related to the visual motion can facilitate the processing of that visual motion. In addition, we found that active observation reduced the reaction time both for the shape change of the moving stimulus and for the flash (Ichikawa and Masakura, 2006). This result indicates that active observation facilitates processing not only for the moving stimulus, but also for the area that includes both the moving stimulus and the flash. This facilitation of visual processing by proprioceptive information in active observation may explain the more accurate perception, because the visual processing is improved by that facilitation. 
The results of the present study suggest that the proprioceptive information which is related to the change in stimulus may make the visual processing more accurate not only for stimulus motion, but also for luminance change in the stimulus.

A similar reduction of the flash-lag effect was found for the case in which the observer attended to the moving stimulus (Shioiri et al., 2010). Because of the long-lasting effect of exposure to inconsistency in the directional relationship between hand movement and stimulus change, we think that the reduction of the flash-lag effect that we found in this study is caused by the proprioceptive information, rather than by attention to the moving stimulus that is actively controlled by the observer. That is, once an observer is exposed to inconsistency in the directional relationship between hand movement and stimulus change, the visual system fails to reduce the flash-lag effect in active observation in the following experimental sessions, even several months later, and even after hundreds of training trials with a specific directional relationship between the hand movement and stimulus change. However, as shown in Experiment 1 in the present study and in our previous study (Ichikawa and Masakura, 2006), observers needed no training to acquire the reduction of the flash-lag effect if the directional relationship was consistent. These results indicate that the basis of the reduction of the flash-lag effect in active observation is established without any previous learning if the directional relationship between the proprioceptive information of the hand movement and the visual information of the stimulus change is consistent, and that inconsistency in that directional relationship impairs the basis of the reduction of the flash-lag effect in active observation over the long term. Future studies should examine how consistency in the relationship between the proprioceptive information of hand movement and the visual information of stimulus change affects the flash-lag effect, and what factors may facilitate visual processing due to active observation with a specific relationship between the hand movement and stimulus change.

## **ACKNOWLEDGMENTS**

A preliminary report on this research was presented at the annual meeting of the Vision Sciences Society, at Naples, Florida, USA in May 2011, and International Multisensory Research Forum at Fukuoka, Japan in October 2011. This research was partially supported by the Japan Society for Promotion of Science Grant 21530760 to Makoto Ichikawa.

## **REFERENCES**


(MT/V5) complex. *Eur. J. Neurosci.* 16, 957–964.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 December 2012; accepted: 25 February 2013; published online: 14 March 2013.*

*Citation: Ichikawa M and Masakura Y (2013) Effects of consciousness and consistency in manual control of visual stimulus on reduction of the flash-lag effect for luminance change. Front. Psychol. 4:120. doi: 10.3389/fpsyg.2013.00120*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Ichikawa and Masakura. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Visuomotor control of human adaptive locomotion: understanding the anticipatory nature

## **Takahiro Higuchi \***

Department of Health Promotion Science, Tokyo Metropolitan University, Tokyo, Japan

#### **Edited by:**

Makoto Miyazaki, Yamaguchi University, Japan

#### **Reviewed by:**

Michael Eric Cinelli, Wilfrid Laurier University, Canada Hiroyuki Mishima, Waseda University, Japan

#### **\*Correspondence:**

Takahiro Higuchi, Department of Health Promotion Science, Graduate School of Human Health Science, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji, Tokyo 192-0397, Japan. e-mail: higuchit@tmu.ac.jp

To maintain balance during locomotion, the central nervous system (CNS) accommodates changes in the constraints of the spatial environment (e.g., existence of an obstacle or changes in the surface properties). Locomotion while modifying the basic movement patterns in response to such constraints is referred to as adaptive locomotion. The most powerful means of ensuring balance during adaptive locomotion is to visually perceive the environmental properties at a distance and modify the movement patterns in an anticipatory manner to avoid perturbation altogether. For this reason, visuomotor control of adaptive locomotion is characterized, at least in part, by its anticipatory nature. The purpose of the present article is to review the relevant studies which revealed the anticipatory nature of the visuomotor control of adaptive locomotion. The anticipatory locomotor adjustments for stationary and changeable environments, as well as the spatio-temporal patterns of gaze behavior that support the anticipatory locomotor adjustments, are described. Such description will clearly show that anticipatory locomotor adjustments are initiated when an object of interest (e.g., a goal or obstacle) still exists in far space. This review also shows that, as a prerequisite of anticipatory locomotor adjustments, environmental properties are accurately perceived from a distance in relation to an individual's action capabilities.

**Keywords: walking, obstacle avoidance, adaptation, gaze behavior, older adults, optic flow**

## **INTRODUCTION**

Locomotion, such as walking, running, cycling, or using an automobile or a wheelchair, is the behavior of moving one's body toward a desired place. During locomotion, the critical role of the central nervous system (CNS) is not only to propel the body in the intended direction but also to maintain balance (i.e., not to fall). Balance of upright stance is ensured provided the vertical projection of the center of mass (COM) falls within the base of support (BOS) (Patla, 2003). A challenging aspect of maintaining balance during locomotion is that, whereas balance during quiet stance is maintained by controlling the position of COM within BOS, both COM and BOS are in motion during locomotion, with BOS changing its size; during the single support phase, the size of BOS is as small as the size of one anatomical foot. Furthermore, COM during the single support phase is outside BOS; every time an individual steps with a single leg, a gravity-produced rolling movement of COM to the side, referred to as the lateral sway, occurs (Winter, 2004).
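The static balance condition described above (the vertical projection of COM falling within BOS) can be illustrated with a simple geometric check. The sketch below assumes that BOS can be approximated by a convex polygon on the ground plane; the function name and the rectangle dimensions are hypothetical values chosen only for the example.

```python
def com_within_bos(com_xy, bos_polygon):
    """True if the ground projection of COM lies inside a convex BOS polygon.

    bos_polygon -- list of (x, y) vertices in consistent (clockwise or
                   counterclockwise) order, in meters
    """
    x, y = com_xy
    crosses = []
    n = len(bos_polygon)
    for i in range(n):
        x0, y0 = bos_polygon[i]
        x1, y1 = bos_polygon[(i + 1) % n]
        # The sign of the cross product tells on which side of the edge the point lies.
        crosses.append((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

# Hypothetical footprints: a 0.30 m wide double-support area and a 0.10 m wide single foot.
double_support = [(0.0, 0.0), (0.30, 0.0), (0.30, 0.25), (0.0, 0.25)]
single_support = [(0.0, 0.0), (0.10, 0.0), (0.10, 0.25), (0.0, 0.25)]
com = (0.15, 0.10)
stable_double = com_within_bos(com, double_support)
stable_single = com_within_bos(com, single_support)
```

With these example values the same COM position is inside BOS during double support but outside it during single support, which mirrors the point made in the text about why walking is harder to stabilize than quiet stance.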

Another challenging aspect of maintaining balance during locomotion is that the CNS is required to accommodate changes in the constraints of the spatial environment. When confronting an obstacle, for example, individuals need to control the displacement of COM to either step over the obstacle, change direction, or even stop walking. Navigating through a narrow opening requires modification of locomotor patterns if the size of the opening is too small relative to the body. Locomotion while modifying the basic movement patterns in response to environmental constraints is referred to as adaptive locomotion.

To maintain balance with these challenging aspects, the CNS takes both a reactive strategy to deal with unexpected perturbation and a pre-planned strategy to avoid potential perturbation *a priori*. A pre-planned strategy is further divided into predictive and anticipatory strategies (Massion, 1992; Huxham et al., 2001; Patla, 2003; da Silva et al., 2011). A predictive strategy refers to the maintenance of inter-segmental stability within the body or between the body and surface based on the estimation of expected perturbation generated by ongoing movements. The predictive strategy is therefore used to regulate locomotion on a local level (i.e., a step-by-step basis). In contrast, an anticipatory strategy refers to the maintenance of balance on a more global level (i.e., sustained over several steps). Locomotor patterns are modified on the basis of visual information about environmental properties at a distance to avoid a future perturbation altogether.

While vision plays an important role in all of the reactive, predictive, and anticipatory strategies, the anticipatory strategy is driven exclusively by vision. This is because vision provides the spatio-temporal information regarding a remote place very precisely. Understanding the anticipatory nature of adaptive locomotion is, therefore, particularly helpful for understanding how vision is used to adaptively control our locomotion.

The purpose of the present article is to review relevant studies to reveal the anticipatory nature of the visuomotor control of adaptive locomotion. This review will yield tentative conclusions: (a) adaptive locomotion is controlled in part through anticipatory locomotor adjustments, which can be sustained over several steps; (b) while anticipatory (i.e., pre-planned) locomotor adjustments are the most powerful way to avoid perturbation, visually guided, on-line adjustments also come into play particularly in the final phase under a changeable environment; (c) a common characteristic of eye movements during adaptive locomotion is that the majority of fixations are directed either toward a desired future path or toward an object of interest; and (d) accurate visual perception of environmental properties relative to action capabilities from a remote place underlies the adaptive locomotor adjustments.

## **ANTICIPATORY LOCOMOTOR ADJUSTMENTS**

## **LOCOMOTOR ADJUSTMENTS INITIATED AT LEAST A FEW STEPS PRIOR TO REACHING AN OBSTACLE**

When walking and encountering a specific area that would not afford stable balance, such as an icy spot on the ground, an individual would need to select an alternative foot placement to avoid stepping on that area. The dominant strategy to modulate a foot placement is to lengthen the stride to step farther from the normal landing spot (Patla et al., 1999; Moraes et al., 2004). This is understandable because it does not impede an individual's forward progression. Importantly, the stride was gradually lengthened a few steps before they reached a spot to be avoided (Moraes et al., 2004) (**Figure 1**). This suggests that the adjustment of foot placement starts a few steps before reaching the area to be avoided.

When participants were asked to step over two obstacles located 1 m apart, their foot placement to take off prior to the first obstacle was closer to the obstacle than when they were stepping over a single obstacle (Krell and Patla, 2002). This is also an understandable method in order to obtain a better take-off position prior to a second obstacle and suggests that the modification of limb movement for avoiding the second obstacle was already initiated before stepping over the first one.

A similar conclusion was obtained from a study about stepping over an obstacle (Patla, 1998). The study demonstrated that even when an obstacle of height was not visible for the duration of one step prior to the participant stepping over it, the limb movements were quite similar to the condition when the obstacle was visible throughout the stepping motion (Patla, 1998). This suggests that a limb movement to step over an obstacle is already planned at least one step prior to stepping over it.

These findings clearly show the anticipatory nature of adaptive locomotor adjustments; to ensure balance at the time of avoiding an obstacle, modifications of locomotor patterns are initiated at least a few steps prior to reaching it. Importantly, a decrease in movement speed has been observed prior to executing the critical locomotor adjustments such as body rotation when passing through an opening (Higuchi et al., 2006a; Cowie et al., 2010). Provided that a decrease in movement speed assists in accurately executing the critical changes in locomotor pattern (i.e., speed-accuracy trade-off), an anticipatory locomotor adjustment would be initiated much earlier than a few steps prior to reaching an object of interest.

It is likely that prior experience and knowledge about environmental constraints affect the anticipatory strategy (Huxham et al., 2001; Patla, 2003). For example, when a slip was suddenly and unexpectedly generated following foot contact on a set of steel freewheeling rollers (i.e., first-time experience of a slip in that situation), participants reactively coped with the perturbation of balance. However, after just a single experience of this unexpected slip, the participants adapted to the potential slip and modified their locomotor patterns in an anticipatory manner whenever stepping on the rollers (Marigold and Patla, 2002). This anticipatory strategy to step safely on the rollers was referred to as "a surfing strategy" by the authors, which included the attenuation of muscle response magnitude, reduced braking impulse, landing more flat-footedly, and elevating the COM. Similarly, when participants were asked to step over a fragile obstacle, they modified their limb elevation when crossing over it so that a larger spatial margin was created between the obstacle and the toe (Patla et al., 1996). This suggests that knowledge about the environmental constraints affects the anticipatory locomotor adjustments (Wagman and Malek, 2009).

## **ANTICIPATORY LOCOMOTOR ADJUSTMENTS IN A CHANGEABLE ENVIRONMENT**

Advances in the understanding of adaptive locomotor control have been made by an increase in the number of investigations using changeable environmental properties (Cutting et al., 1995; Montagne et al., 2003; Fajen and Warren, 2004; Gerin-Lajoie et al., 2005; Andersen and Enriquez, 2006; Cinelli and Patla, 2008; Cinelli et al., 2008, 2009). Many of these studies showed that the anticipatory nature of adaptive step adjustments is maintained in a changeable environment. However, the strategy to adapt in a changeable environment seems to be slightly different from that in a stationary environment; although an anticipatory locomotor adjustment under a changeable environment would be initiated as early as that under a stationary environment, the critical locomotor adjustments to avoid an obstacle or reach a goal are achieved in a final phase in combination with visually guided, on-line locomotor adjustments.

Gerin-Lajoie et al. (2005) investigated how their participants circumvented an obstacle (a full-sized department store mannequin) that was initially located on a participant's right (about 8 m from the participant), crossed the participant's path at a 45° angle, and interrupted a straight walking path toward the goal (about 5 m from the participant). Since it is more natural to pass behind a moving obstacle (Cutting et al., 1995), the participants' walking path was deviated to the right to pass behind the mannequin. Their initial path deviation to the right occurred about 4.5 m (approximately six steps) from the mannequin. This clearly showed that the changes in the walking path to circumvent the obstacle were planned *a priori* and initiated as soon as the participants started walking. However, the most pronounced step adjustments to deviate to the right occurred about 1.5 m from the obstacle. This suggests the importance of the final locomotor adjustments just prior to obstacle avoidance.

Cinelli et al. (2009) investigated how participants steered toward the middle of a door opening that was located 8 m from them and moved to the side as soon as they initiated walking. The main finding was that, interestingly, irrespective of whether the door opening moved to the left or right, the participants initially walked in such a way as to aim at the middle of a "doorframe," with which the door was suspended, rather than the middle of the door opening. Cinelli et al. interpreted this finding as indicating that, when locomoting in a changeable environment, participants simplified the task by placing themselves in an area that has the greatest potential for avoiding collision (i.e., aligning themselves with the middle of the doorframe enabled them to move in either direction quickly). However, once they were in the middle of the pathway (about 2 s prior to passing through the opening), they began to aim at the middle of the door opening while looking at it. This suggests that the final locomotor adjustments were driven mainly by visually guided, on-line control, rather than by an anticipatory, pre-planned control.

When the environmental properties were continuously changing, the initial strategy to adapt was approaching while slowing down the movement (Montagne et al., 2003; Cinelli et al., 2009). When passing through moving doors that oscillated at a frequency of 1 Hz, participants were able to successfully pass through an opening by refining the regulation of their approach speed. In the final part of each walking trial, fixations were directed exclusively toward the middle of the opening. It is at this point that fine motor control is important. The coincidence of heading toward the middle of the opening and looking at that point suggests that, again, the final locomotor adjustments were likely to be driven by visually guided, on-line control.

## **MAINTAINING A SPATIAL MARGIN BETWEEN AN OBSTACLE AND THE SELF**

To step over an obstacle, both correct foot placement prior to "takeoff" and correct limb elevation over the obstacle are required. Kinematic studies have demonstrated that for obstacles of different locations and heights, individuals can produce relatively consistent foot placement in front of the obstacle (i.e., the frontal spatial margin) and a relatively consistent toe clearance (i.e., vertical spatial margin) while stepping over it (Patla et al., 1996; Krell and Patla, 2002). This implies that maintaining a spatial margin between an obstacle and the self is one of the critical control parameters determining how locomotor patterns are modified.

In agreement with this idea, we recently reported that when passing through an opening, the CNS is likely to determine the amplitude of body rotation to ensure that the minimal spatial margin (6–8 cm) is created at one side of the body at the time of crossing (Higuchi et al., 2012). In this study, we asked participants to walk through narrow openings of three widths relative to their body width (ratio value = 0.9, 1.0, and 1.1) while holding one of three horizontal bars (one shorter than the body width and the others 1.5 and 2.5 times the body width). The experimental manipulation of holding the long bar was helpful in addressing this issue because the longer the bar was (i.e., the wider the spatial requirements for passage were), the smaller the amplitude of body rotation sufficient to produce the same spatial margin for the respective ratio value of an opening was (see Higuchi et al., 2012 for detail). The results showed that the amplitude of rotation angles became smaller for the respective ratio value as the bar increased in length. This clearly supported the idea that producing a constant spatial margin is a control parameter for determining the amplitude of body rotations.
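The geometric reasoning behind the bar manipulation can be illustrated with a minimal sketch. It assumes, purely for illustration, that the body plus bar behaves as a thin segment of width W passing centrally through the opening, so that a rotation of θ about the vertical axis reduces its frontal extent to W cos θ. The function name, the 45-cm body width, and the 7-cm margin are hypothetical values chosen for the example, not data or analysis code from Higuchi et al. (2012).

```python
import math

def rotation_for_margin(width_cm, ratio, margin_cm):
    """Smallest rotation (deg) that leaves margin_cm on each side of a thin
    segment of width_cm passing centrally through an opening of
    ratio * width_cm (illustrative model, not the authors' analysis)."""
    opening_cm = ratio * width_cm
    # Frontal extent still available once the margin is reserved on both sides.
    allowed_cm = opening_cm - 2.0 * margin_cm
    if allowed_cm >= width_cm:
        return 0.0  # the segment fits without rotating
    if allowed_cm <= 0.0:
        raise ValueError("opening too narrow for the requested margin")
    return math.degrees(math.acos(allowed_cm / width_cm))

body_cm = 45.0  # hypothetical body width
# Effective widths of 1.0, 1.5, and 2.5 times the body width, ratio value 1.0.
angles = {k: rotation_for_margin(k * body_cm, 1.0, 7.0) for k in (1.0, 1.5, 2.5)}
```

Under these assumptions, the rotation needed to keep the same margin at a given ratio value shrinks as the effective width grows, which is the pattern the bar manipulation was designed to expose.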

The magnitude of the spatial margin itself is dependent on locomotor and environmental constraints. Compared to when walking through a horizontal opening, the spatial margin for walking through a "vertical" opening (e.g., ducking to avoid a low-hanging branch) was significantly smaller (Franchak et al., 2012). Franchak et al. attributed the difference in the magnitude of the spatial margin to a difference in locomotor constraints between lateral sway of the body during walking and vertical bounce; lateral sway shifts the body outside of BOS during the single support phase (Shumway-Cook and Woollacott, 2001; Fujikake et al., 2011), whereas vertical bounce only makes the body shorter (Murray et al., 1964). Likewise, when circumventing a moving obstacle, a much larger spatial margin was necessary (approximately 2 m in front and 0.5 m on each side), suggesting that environmental constraints (i.e., either a stationary or a moving environment) also affect the ideal magnitude of the spatial margin (Gerin-Lajoie et al., 2005).

## **GAZE BEHAVIOR DURING ADAPTIVE LOCOMOTION**

## **INDIVIDUALS LOOKING AT FAR SPACE DURING ADAPTIVE LOCOMOTION**

As discussed in the previous section, adaptive locomotor adjustments in response to environmental constraints, such as the existence of an obstacle, are initiated when the obstacle is still far away. To assist such anticipatory adjustments, visual information about far space is necessary. Analyses of spatio-temporal patterns of gaze behavior during adaptive locomotion under a variety of environments, as well as under a variety of forms of locomotion, have shown that, except in a situation where very precise stepping on a footfall target is necessary (Hollands et al., 1995; Hollands and Marple-Horvat, 2001; Chapman and Hollands, 2006a,b; Young et al., 2012), fixations are directed toward far space.

The basic rules are that we are looking at far space and that "we are moving as we are looking" (Bernardin et al., 2012). More specifically, a common characteristic of eye movements during adaptive locomotion is that the majority of fixations are directed either toward a desired future path or toward an object of interest (Land, 1999; Hayhoe and Ballard, 2005). This characteristic has been observed under a variety of situations, including walking down a straight hallway to turn (Turano et al., 2001, 2002), walking through an opening (Cinelli et al., 2008, 2009; Higuchi et al., 2009a), stepping over an obstacle (Patla and Vickers, 1997), stair ascent and descent (Zietz and Hollands, 2009), stepping on multiple footfall targets (Patla and Vickers, 2003; Yamada et al., 2012), steering during walking (Imai et al., 2001; Hollands et al., 2002; Lamontagne and Fung, 2009), driving a car (Land and Lee, 1994; Land and Horwood, 1995), and even walking in the dark (Grasso et al., 1998) or along mentally simulated complex trajectories (Bernardin et al., 2012).

Although common characteristics of eye movements are maintained, actual locations of fixation are different depending on whether an object of interest is on the floor. When an object of interest is on the floor, fixations tend to be directed toward the floor, particularly along a desired future path. For instance, when walking and approaching a single static obstacle located on the ground, fixations were located either at a fixed distance ahead of the individual on the floor (i.e., the direction of travel) or at the obstacle; however, fixations were never directed toward the obstacle when participants were stepping over it (Patla and Vickers, 1997). When stepping on multiple footfall targets (Patla and Vickers, 2003; Yamada et al., 2012) or going down stairs (Zietz and Hollands, 2009), individuals fixated approximately two or three targets ahead. These findings suggest that even when fixations are maintained on the floor, the rule of looking at far space is maintained. When there is no object of interest on the floor, on the other hand, fixations are rarely directed toward the

A somewhat exceptional case in which the rule of looking at far space is not necessarily maintained is the case of stepping very precisely on a footfall target (Hollands et al., 1995; Hollands and Marple-Horvat, 2001; Chapman and Hollands, 2006a,b; Young et al., 2012). In such cases, individuals look at the footfall target on which they intend to step until they step on the intended target. This suggests that on-line visual information is necessary to step very precisely on a footfall target. Importantly, however, fixation patterns in this case are still the same as those in other cases in that individuals are likely to rely on the maintained fixations directed toward goal-oriented locations; that is, individuals are aiming at where they are looking (Bernardin et al., 2012).

## **THE USE OF OPTIC FLOW**

Maintaining fixation at a distant point on (or very close to) a desired future path helps individuals to align themselves with the goal (Hollands et al., 2002; Wilkie and Wann, 2003; Marple-Horvat et al., 2005) because such fixations simplify control of the heading direction through reliance on optic flow (Warren et al., 2001; Andersen and Enriquez, 2006). Optic flow is the retinal motion pattern generated by body movement (Gibson, 1958; Warren et al., 2001). When an individual fixates on a point, its location on the retina remains stationary while motion radiates from the point with the maximum velocity to the side. Gibson (1958) called this stationary point the focus of expansion (FoE) of the optic flow field. When traveling in a straight line, the current direction of motion is specified by the FoE, so in principle the heading direction can be accurately controlled by ensuring that the FoE always lies in the desired path (Wilkie and Wann, 2003).

**Figure 2A** shows average percentages of fixations directed toward each of the four locations [left door, aperture, floor (path), or right door] while approaching and crossing a narrow opening (Higuchi et al., 2009a). As already explained in the previous section (Cinelli et al., 2009), fixations were directed exclusively toward the middle of the opening in the final part of each walking trial (for the last 10% of the normalized walking time). This finding is very important and suggests that even for stationary obstacles, visually guided, on-line control with the use of optic flow will come into play at the final phase of avoiding a collision.

## **ALTERNATIVE EXPLANATIONS FOR THE FUNCTION OF GAZE BEHAVIOR**

Alternative explanations for the functions of observed fixation patterns are also possible. First, by directing their fixations toward a desired future path or an object of interest, individuals could have been using their peripheral visual field to search for potential collision or perturbation. When passing through an opening, for example, maintaining a fixation toward the opening may have served as a "visual pivot" (Ripoll et al., 1995; Williams et al., 1999) so that both sides of the doors with which collision could occur were captured in their peripheral vision, leading to the safest navigation through an opening (Cinelli et al., 2009).

Second, considering that common characteristics of eye movements during adaptive locomotion are maintained even when walking in the dark (Grasso et al., 1998), stepping on an invisible footfall target (Hollands and Marple-Horvat, 1996), or moving along mentally simulated complex trajectories (Bernardin et al., 2012), the brain may use the corollary motor command to the eye as a feed-forward signal to guide the expected direction. The efferent information about motor commands and the proprioception given by eye muscles when modifying their direction provide an important non-visual source of information. The fixation location during locomotion may therefore be necessarily aligned with a desired future path. Studies regarding eye-hand coordination during manual aiming tasks support this explanation (Abrams et al., 1990; Wilmut et al., 2006).

Notably, at least for passing through a narrow opening, spatiotemporal patterns of fixation are dramatically different when the form of locomotion is quite novel for participants (Higuchi et al., 2009a). **Figure 2B** shows that when participants were naïve to wheelchair use and they tried to pass through an opening while sitting in a wheelchair, fixations were directed more frequently toward the door edges throughout their locomotion. At the same time, the duration of each fixation became significantly shorter. By foveating the door edges, the participants were better able to attend to the doors' positions, while short fixation durations allowed the participants to process each door's location more frequently. The differences in spatio-temporal patterns of fixation while walking or using a wheelchair seem to be similar to those between elite and non-elite athletes (Kato and Fukuda, 2002; Martell and Vickers, 2004; Nagano et al., 2004; Panchuk and Vickers, 2011), in that non-elite participants showed shorter fixation and more frequent saccades at critical moments.


It appears that without a great deal of locomotor experience with a wheelchair, participants were unable to adapt to locomotor constraints imposed during wheelchair use and/or to effectively use optic flow to guide wheelchair locomotion. Attributing the specific patterns of fixation under the wheelchair condition to unfamiliarity with wheelchair use is indirectly supported by findings demonstrating that a great deal of practice is necessary to effectively use optical variables in motor control (Michaels and de Vries, 1998; Jacobs et al., 2001; Fajen and Devaney, 2006). This is referred to as perceptual attunement. The existence of perceptual attunement has been demonstrated with perceptual-motor tasks, such as judging optic angles or the expansion rate of an approaching ball (Smith et al., 2001). It seems likely that a similar learning process is necessary to effectively use optical variables during adaptive locomotion.

## **MALADAPTIVE GAZE BEHAVIOR IN OLDER ADULTS WHO ARE AT HIGH RISK OF FALLING**

We recently developed a new assessment for the fall risk of older individuals, the multi-target stepping (MTS) test, to measure stepping accuracy in a simplified manner (Yamada et al., 2011). In the MTS test, participants were asked to walk while stepping on multiple footfall targets and avoiding non-targets. In one of the studies to validate the MTS test (Yamada et al., 2012), we compared gaze behaviors while performing the MTS test among three groups: older individuals who are at high risk (HR) of falling (HR older), those who are at low risk (LR) of falling (LR older), and young individuals. The results showed that whereas the younger participants fixated approximately three targets ahead, the older participants directed their fixation closer toward the imminent footfall target. Such a tendency was significantly higher for the HR older participants than the LR older participants (**Figure 3**). Furthermore, the closer toward the imminent footfall target their fixations were, the more frequent were the errors of failure in stepping on the target and of avoiding non-targets.

From these findings, it is suggested that HR older individuals may have tried to precisely step on multiple footfall targets by heavily relying on on-line, visual information about an imminent footfall target; as a result, they were unable to modify their locomotor pattern in an anticipatory manner, resulting in more frequent stepping errors. HR older individuals generally exhibit increased gait variability (Verghese et al., 2009; Brach et al., 2010) and a decline in visuomotor control of foot movement (Chapman and Hollands, 2006b, 2007). For these reasons, relying deeply on visually guided, on-line control of foot movement until they step on an imminent footfall target may have been unavoidable.

## **PERCEPTION OF ENVIRONMENT IN RELATION TO ACTION CAPABILITIES**

## **OPTICAL VARIABLES AS A FUNDAMENTAL BASIS FOR GUIDANCE OF LOCOMOTION**

The fundamental basis for the guidance of locomotion is the patterned distribution of light available at a moving point of observation. As discussed, this patterned distribution experienced at the retina has been commonly referred to as optic flow (Gibson, 1958, 1979) or optical variables (Warren, 1998; Fajen, 2001). A considerable amount of research has been conducted to identify both the properties of optic flow that might support the guidance of locomotion and the strategies that humans and animals use to exploit these properties to achieve locomotor tasks.

For example, the time until contact with an object toward which one is moving at constant velocity happens to equal the inverse of the rate of dilation of the closed optical contour of the object (Lee, 1976). It is therefore possible to tell when an object will be contacted by determining the rate at which its image expands (Rosenbaum, 2010). One can use this well-known tau-dot strategy for regulating deceleration during braking (Lee and Thomson, 1982; Lamontagne et al., 2007). Likewise, humans behave like bees in that both steer down the middle of a passageway by equating the speed of optic flow (Duchon and Warren, 2002).
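The time-to-contact relation can be checked numerically with a short sketch. It reads "rate of dilation" as the relative rate (the rate of change of the optical angle divided by the angle itself), which is an interpretive assumption, and it uses hypothetical values: an object 0.9 m wide, 8 m away, approached at a constant 1.3 m/s. The function names are illustrative only.

```python
import math

def optical_angle(size_m, distance_m):
    """Visual angle (rad) subtended by an object of a given physical size."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))

def tau_estimate(size_m, distance_m, speed_mps, dt=1e-3):
    """Estimate time to contact as angle / rate of dilation, using a finite
    difference over dt seconds of constant-speed approach."""
    theta_now = optical_angle(size_m, distance_m)
    theta_next = optical_angle(size_m, distance_m - speed_mps * dt)
    dilation_rate = (theta_next - theta_now) / dt
    return theta_now / dilation_rate

tau = tau_estimate(size_m=0.9, distance_m=8.0, speed_mps=1.3)
true_ttc = 8.0 / 1.3  # distance / speed, which the observer need not know
```

In this example the optical estimate stays within about 1% of the true time to contact, even though it uses neither the distance nor the approach speed directly; the small discrepancy comes from the small-angle approximation that underlies the relation.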

One can also align the direction of locomotion with the goal by turning by an amount that corresponds to the visual angle between the FoE and the goal (Harris and Carre, 2001; Warren et al., 2001; Lamontagne et al., 2010; Li and Cheng, 2011). Steering toward a goal requires that observers null the heading angle before reaching the target. That is, steering may be thought of as coordinating the closure of two gaps: the heading angle and the distance between the observer and the target. This strategy has been referred to as the tau-equalization theory (Fajen, 2001). Central vision is likely to be important for using optic flow to guide walking (Turano et al., 2005).

## **THE NECESSITY OF BODY-SCALED (OR ACTION-SCALED) INFORMATION FOR ADAPTIVE LOCOMOTION**

As explained, individuals generally rotate their body when an opening is narrower than 1.1–1.3 times their shoulder width (Warren and Whang, 1987; Higuchi et al., 2006a, 2012; Franchak et al., 2012). This rotation is generally initiated two steps prior to entering the opening (Higuchi et al., unpublished data) and its amplitude is determined so that it produces a minimum spatial margin under safe situations (Higuchi et al., 2012). The prerequisite of such behavior is that individuals can perceive "the width of an opening relative to the body width," or more generally, "the environmental properties relative to one's action capabilities" very accurately when the opening is still far from them. The perception of the fit between a person's action capabilities and relevant environmental properties has generally been referred to as perception of affordances (Gibson, 1979).

The information about the environmental properties relative to one's action capabilities is often referred to as body-scaled (or action-scaled) information (or more simply, the critical ratio value). Not only the scaling of body rotation angles but also other locomotor modifications when navigating through openings, such as changes in speed (Higuchi et al., 2006a; Cinelli et al., 2008; Cowie et al., 2010; Fajen and Matthis, 2011) or the magnitude of deviation of the body midline from the center of the apertures (Higuchi et al., 2006a; Nicholls et al., 2010; Fujikake et al., 2011), were also well proportioned to this critical ratio value. These findings have led researchers to a general understanding that the perception of the ratio value is important for controlling gait and posture when navigating through apertures (Warren and Whang, 1987; Wagman and Taylor, 2005; Higuchi et al., 2006a; Fajen and Matthis, 2011). The validity of this understanding has been strengthened by another line of studies demonstrating that body-scaled (or action-scaled) information is also used to estimate a maximum reach (or jump-reach) height (Ramenzoni et al., 2008, 2010; Wagman and Morgan, 2010), a maximum height of a surface that can be sat on (Mark, 1987; Mark et al., 1990), a maximum inclined surface that affords standing (Regia-Corte and Wagman, 2008), or a gap that can be stepped over (Burton, 1992; Jiang and Mark, 1994; Snapp-Childs and Bingham, 2009).

**FIGURE 3 | Group differences in how far ahead the participants fixated while performing the MTS task**. Compared to the younger participants, who generally fixated three steps ahead, older participants showed the tendency to fixate on/around an imminent footfall target. Such a tendency was stronger for those who were categorized as high risk (HR) older participants than for those who were categorized as low risk (LR) older participants. This figure was produced on the basis of the report in Yamada et al. (2012), and reproduced with permission from Higuchi et al. (2013).

#### **RECALIBRATION IN RESPONSE TO ALTERED ACTION CAPABILITIES**

Action capabilities are not always constant in daily locomotor activities; they are altered when walking while holding a shopping bag or a large box. Since a wider space than usual is required for locomotion under these situations, the dimensions of an external object need to be accurately represented by the CNS as if the object were a biological extension of the body. In other words, body-scaled (or action-scaled) information needs to be recalibrated in response to the extension (Higuchi et al., 2006b).

Previous studies showed an individual's superior ability to adapt to artificial extensions of body dimensions while walking (Mark, 1987; Mark et al., 1990; Hirose, 2002; Higuchi et al., 2006a) or while judging whether an aperture is passable with the extensions (Wagman and Taylor, 2005; Wagman and Malek, 2007). Higuchi et al. (2006a) demonstrated that when rotation of the shoulders was free at the time of door crossing, participants were very successful in crossing a doorway while holding a 63-cm horizontal bar; virtually the same locomotor patterns as those during normal walking were observed.

However, an individual's superior ability to quickly adapt to artificial extensions seems to occur only for well-learned behavior. In one of our studies (Higuchi et al., 2004), we demonstrated that young, non-handicapped participants who had never used a wheelchair underestimated the space required for a wheelchair, risking a potential collision. They determined that apertures would be passable when apertures were greater than 0.94 times the width of the wheelchair. Their underestimation was not completely eliminated even after 8 days of practicing moving through openings of various widths with a wheelchair. These findings suggest that it would take a long time to adapt to altered action capabilities while using a wheelchair. Since the biomechanical features of locomotion dramatically change from walking to wheelchair use (e.g., upper-limb propulsion, restricted mobility, and dramatic changes in the position of the COM and BOS), extensive practice may be required to accurately determine whether safe passage is possible. In fact, the estimation of the space required for wheelchair use was accurate in experienced users with tetraplegia (Higuchi et al., 2009b) and healthy participants trained for 6 months (Flascher and Shaw, 1995).

Moreover, an individual's superior ability to quickly adapt to artificial extensions under a specific context, which is obtained through extensive practice, is not necessarily transferred under a novel context. Players of American football, who have had extensive practice in running through narrow spaces while wearing large shoulder pads, exhibited greater efficiency in running through narrow apertures than control athletes (Higuchi et al., 2011). Specifically, they exhibited smaller magnitudes and later onset times of body rotations than the control athletes. Importantly, however, such differences occurred only when they were running through openings and not while they were walking through openings. The results highlight that their excellent ability to quickly adapt to artificial extensions while wearing the shoulder pads is context specific (i.e., speed dependent).

Age-related changes in adaptability to altered action capabilities have also been reported (Hackney and Cinelli, 2013). Hackney and Cinelli initially demonstrated an age-related difference in behavior when walking through apertures; older adults were likely to adopt a more cautious strategy when passing through (i.e., creating a wider spatial margin). They then showed that affordance perception for aperture passability was less accurate for older participants only when the perception was made while they were in motion. The authors concluded that, for older adults, affordance perception is affected by self-motion, which could carry over to their locomotion.

## **CONCLUDING REMARKS**

Understanding the anticipatory nature of adaptive locomotion is helpful in understanding how vision is used to adaptively control our locomotion. This is because vision alone provides very precise information about remote locations. This paper reviewed a number of studies that have shown the anticipatory nature of adaptive locomotion. To ensure balance at the time of avoiding an obstacle, modifications in locomotor patterns are initiated at least a few steps prior to reaching the obstacle. It seems likely that maintaining a consistent but minimum spatial margin between an obstacle and the self is one of the dominant control parameters in determining how locomotor patterns are modified. In particular, to avoid moving obstacles, it is necessary not only to execute anticipatory locomotor adjustments when obstacles are still far away but also to make visually guided, on-line locomotor adjustments in the final phase. Eye movement during adaptive locomotion is well suited to assisting anticipatory locomotor adjustments. The basic rules are that we are looking at far space and that "we are moving as we are looking" (Bernardin et al., 2012). The CNS is able to perceive environmental properties relative to action capabilities. The CNS is also flexible enough to recalibrate this perception in response to altered action capabilities, although the recalibration seems to occur very quickly only for well-learned behavior. Finally, an age-related inability to rely on an anticipatory strategy to control adaptive locomotion can result in increased fall risk. Future studies need to examine whether balance problems during locomotion in some types of patients, such as stroke patients or patients with Parkinson's disease, can also be explained in part by their inability to rely on an anticipatory strategy to control adaptive locomotion.

## **REFERENCES**


the field. *Percept. Mot. Skills* 99, 968–974.

- Panchuk, D., and Vickers, J. N. (2011). Effect of narrowing the base of support on the gait, gaze and quiet eye of elite ballet dancers and controls. *Cogn. Process.* 12, 267–276.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 March 2013; accepted: 29 April 2013; published online: 16 May 2013.*

*Citation: Higuchi T (2013) Visuomotor control of human adaptive locomotion: understanding the anticipatory nature. Front. Psychol. 4:277. doi: 10.3389/fpsyg.2013.00277*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Higuchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Imposed visual feedback delay of an action changes mass perception based on the sensory prediction error

## *Takuya Honda<sup>1,2</sup>, Nobuhiro Hagura<sup>1,3</sup>, Toshinori Yoshioka<sup>1</sup> and Hiroshi Imamizu<sup>1,4</sup>\**

*<sup>1</sup> Cognitive Mechanisms Laboratories and Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan*

*<sup>2</sup> Research Fellow of the Japan Society for the Promotion of Science, Tokyo, Japan*

*<sup>3</sup> Institute of Cognitive Neuroscience, University College London, London, UK*

*<sup>4</sup> Center for Information and Neural Networks, National Institute of Information and Communications Technology and Osaka University, Osaka, Japan*

#### *Edited by:*

*Makoto Miyazaki, Yamaguchi University, Japan*

#### *Reviewed by:*

*R. Chris Miall, University of Birmingham, UK*

*David Franklin, University of Cambridge, UK*

#### *\*Correspondence:*

*Hiroshi Imamizu, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan e-mail: imamizu@gmail.com*

While performing an action, the timing of when the sensory feedback is given can be used to establish the causal link between the action and its consequence. It has been shown that delaying the visual feedback while carrying an object makes people feel the mass of the object to be greater, suggesting that the feedback timing can also impact the perceived quality of an external object. In this study, we investigated the origin of the feedback timing information that influences the mass perception of the external object. Participants made a straight reaching movement while holding a manipulandum. The movement of the manipulandum was presented as a cursor movement on a monitor. In Experiment 1, various delays were imposed between the actual trajectory and the cursor movement. The participants' perceived mass of the manipulandum significantly increased as the delay increased to 400 ms, but this gain did not reach significance when the delay was 800 ms. This suggests the existence of a temporal tuning mechanism for incorporating the visual feedback into the perception of mass. In Experiment 2, we examined whether the increased mass perception during the visual delay was due to the prediction error of the visual consequence of an action or to the actual delay of the feedback itself. After the participants adapted to the feedback delay, the perceived mass of the object became lighter than before, indicating that updating the temporal prediction model for the visual consequence diminishes the overestimation of the object's mass. We propose that the misattribution of the visual delay into mass perception is induced by the sensorimotor prediction error, possibly when the amount of delay (error) is within the range that can reasonably include the consequence of an action.

#### **Keywords: sensorimotor prediction, feedback delay, mass perception, delay adaptation, prediction error**

## **INTRODUCTION**

While performing an action, information on the timing of the sensory feedback has a crucial role in detecting the causal link between the action and its consequence (Kitazawa et al., 1995; Blakemore et al., 1999; Farrer et al., 2008; Tanaka et al., 2011; Honda et al., 2012a,b). For example, when the visual feedback is delayed, a self-generated visual motion is perceived as generated by someone else (Blakemore et al., 1999; Farrer et al., 2008). Furthermore, the motor learning process is also disrupted in such situations, possibly due to the failure of accurately linking the feedback information with one's own action (Kitazawa et al., 1995; Tanaka et al., 2011; Honda et al., 2012a,b). It has been suggested that the central nervous system (CNS) uses a forward model to predict the sensory consequence of an action (e.g., the position of the hand at a certain time point) by using a copy of the motor command (Miall et al., 1993; Wolpert et al., 1995; Miall et al., 2007). In such a scenario, the amount of prediction error, which is the difference between the predicted and the actual sensory feedback, contributes to detecting whether the sensory input is actually generated by the person.

The feedback timing information is not only used for linking the action and the consequence but can also contribute to the perception of the external environment. For example, the perception of a somatosensory event induced by self-touch is modified when a delay is imposed between the action and the touch (Blakemore et al., 1999). Likewise, it has been shown that delaying the visual feedback of an action while carrying an object makes people feel that the mass of that object is greater (Di Luca et al., 2011). Such evidence suggests that delay in the sensory feedback of an action may violate the authorship of the sensory consequence and, at the same time, change the quality of perception of that sensory event. In this study, we focus on the influence of feedback timing on the perceptual quality of the object's mass. Similar to the violation of authorship induced by the prediction error, in this case, the difference in the visually predicted hand position and the actual visual feedback (prediction error) may also contribute to such an overestimation of the object's mass. However, it is not yet clear whether the prediction error or the actual delay itself plays the major role in causing this phenomenon.

To test these two possibilities, we set up a reaching experiment where participants made a straight reaching movement while holding a manipulandum. The movement of the manipulandum was presented as a cursor movement on a monitor, which allowed us to impose various delays between the actual hand movement and the visual cursor movement. In Experiment 1, we examined the relationship between the amount of delay and the illusory increase of mass. Since the authorship of the sensory consequence is violated with a longer imposed delay between the action and the consequence (Farrer et al., 2008), we predicted that the mass of the manipulandum would be perceived as heavier for a shorter delay but not for a longer delay.

In Experiment 2, we investigated the effect of delay adaptation on the perceived mass. If the prediction error were the cause of the increase in perceived mass, reducing the prediction error by adapting to the delay would alleviate the overestimation of the mass. On the other hand, if the actual delay were the cause, the mass would be overestimated irrespective of the adaptation.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

A total of 24 neurologically normal right-handed (Oldfield, 1971) volunteers (22 males and 2 females; age range, 20–44 years) participated. The study was approved by the Ethics Committee of the Advanced Telecommunications Research Institute. Written informed consent was obtained from all participants prior to performing the experiments.

#### **APPARATUS**

Participants sat on an adjustable chair while grasping the handle of a twin visuomotor and haptic interface system (TVINS) (**Figure 1**). The participant's forearm was secured to a support beam on the horizontal plane. TVINS consists of two parallel-linked, direct-drive floating manipulanda using air magnets. Thus, the experiment can be conducted either by using only one manipulandum or by using both at the same time. Each manipulandum was powered by two DC direct-drive motors controlled at 2000 Hz. TVINS yielded a virtual mass (*M*) according to the equation of motion: *M* = *F*/*a*. Here *F* is a resistance force generated by TVINS in proportion to the handle acceleration (*a*). We confirmed that the accuracy in online measurement of the acceleration was ±0.04 m/s² even at the peak acceleration. We also confirmed by measuring the resistance force with a spring scale that TVINS could generate a target force with a precision of 0.1 kgf. The position of the manipulandum was measured using optical joint position sensors (4,800,000 pulse/rev). The position of the hand (handle of the manipulandum) was projected on a horizontal screen placed above the mechanical plane and below shoulder level. The projector refresh rate was 75 Hz. The screen prevented the participants from directly seeing their arm.
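As a rough illustration of the virtual-mass rule *M* = *F*/*a*, the sketch below computes the resistive force that would correspond to a given virtual mass and handle acceleration. The function names, the sign convention (force opposing the acceleration), and the example values are assumptions made for illustration; they do not describe the actual TVINS controller.

```python
G = 9.80665  # standard gravity in m/s^2, used only to convert newtons to kgf


def resistive_force(virtual_mass_kg: float, accel_m_s2: float) -> float:
    """Force in newtons that opposes the acceleration so that |F| / |a| equals the virtual mass."""
    return -virtual_mass_kg * accel_m_s2


def newtons_to_kgf(force_n: float) -> float:
    """Convert a force from newtons to kilogram-force."""
    return force_n / G


# Hypothetical example: a 3-kg virtual mass at a handle acceleration of 2 m/s^2.
force = resistive_force(3.0, 2.0)
print(force, newtons_to_kgf(abs(force)))
```

In these units the reported force precision of 0.1 kgf corresponds to roughly 1 N.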

It should be noted that there was a slight time delay before the actual handle movement was reflected as cursor movement on the screen, due to the limitations of the computer's data processing speed. When the delay between the handle and the cursor movements was measured 10 times with a high-speed camera (600 Hz), it was found to be 42.5 ms (SD 2.4 ms) when the handle was near the body (around the "starting position" in the experiments) and 41.8 ms (SD 2.4 ms) when it was at a distance from the body (around the "target position"). No significant difference between the positions [*t*(18) = 0.62, *p* = 0.543] was observed. Since this delay is comparable across positions, and our interest was in the difference between the conditions, we believe that this delay itself will not affect our results. In the following, we describe the delay relative to this "baseline delay," but note that an additional 42-ms delay always existed in all of the conditions.
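One common way to impose a fixed visual delay is to buffer timestamped hand positions and display the sample recorded one delay period earlier. The sketch below is a minimal illustration of that idea. The class name, the buffering scheme, and the choice to show nothing until a sufficiently old sample exists are assumptions, not a description of the software used in the study, and the baseline delay of the display pipeline would come on top of the imposed value.

```python
from collections import deque


class DelayLine:
    """Return the hand position recorded `delay_ms` before the current time."""

    def __init__(self, delay_ms: float):
        self.delay_ms = delay_ms
        self.samples = deque()  # (timestamp_ms, position) in chronological order

    def push(self, t_ms: float, pos: tuple) -> None:
        """Store a new hand-position sample."""
        self.samples.append((t_ms, pos))

    def cursor(self, now_ms: float):
        """Most recent sample at or before now - delay, or None if none is old enough.

        Assumes `now_ms` never decreases between calls, so older samples can be dropped.
        """
        target = now_ms - self.delay_ms
        # Drop samples that are superseded by a newer sample that is also old enough.
        while len(self.samples) >= 2 and self.samples[1][0] <= target:
            self.samples.popleft()
        if self.samples and self.samples[0][0] <= target:
            return self.samples[0][1]
        return None


# Hypothetical usage: hand sampled every 10 ms, cursor delayed by 200 ms.
line = DelayLine(200.0)
for t in range(0, 501, 10):
    line.push(float(t), (0.0, float(t)))  # made-up straight movement
delayed_position = line.cursor(500.0)
print(delayed_position)
```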

#### **EXPERIMENT 1**

We tested how the difference in imposed visual delay between the actual movement and the cursor feedback information influences the mass perception.

#### *Task procedure*

Fourteen volunteers participated. By making a reaching movement while grasping the manipulandum with their right hand, participants moved the cursor toward the target presented 10 cm from the starting point on the screen. After the reaching movement, the handle automatically moved back to the starting position. Participants judged the perceived mass of the manipulandum after each trial.

Three target locations were prepared. The middle target was straight ahead from the starting point, and the other two were rotated 20° clockwise or counterclockwise around the starting point from the middle target's path. The peak velocity of the reaching movement was required to be within the range of 300–450 mm/s. A warning message appeared on the screen if the movement velocity of the handle was faster ("Fast") or slower ("Slow") than the set velocity range. The mass of the manipulandum was varied from trial to trial by adding a resistive force against the movement of the hand (see above). Nine mass values were prepared: 1, 2, 3, 4, 5, 6, 7, 8, and 9 kg. Furthermore, a variable delay was imposed between the cursor movement and the actual movement of the hand in each trial, where this delay was chosen from five values: 0, 100, 200, 400, or 800 ms. The experiment investigated every combination of mass and delay (9 masses × 5 delays: 45 combinations). Each combination was repeated 15 times in random order. Consequently, each participant carried out a total of 675 trials. The experiment was divided into five 135-trial blocks, and the participants were allowed to take a break between the blocks.

After each reaching movement, participants judged whether the mass of the manipulandum was greater or smaller than the average of all of the mass values presented in the previous trials. This is a version of the "method of single stimuli" (Morgan et al., 2000), which requires participants to use their internal criterion for the judgment. The accuracy of this method is comparable to, or even better than (Nachmias, 2006), that of the method that always presents a standard stimulus as a comparison stimulus (Wearden and Ferrara, 1995; Hagura et al., 2012). Moreover, this method enables us to increase the number of trials for a given period of time, which is crucial for reconstructing the psychometric function. The judgment ("lighter" or "heavier") was made by pressing one of two buttons with the left hand. Before the experiment, participants practiced and experienced each mass 30 times in order to familiarize themselves with the distribution of the input mass.
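The factorial design described above (9 masses × 5 delays × 15 repetitions, split into five blocks) can be sketched as a trial schedule. This is only an illustration: the function name is hypothetical, target location is ignored, and shuffling across the whole session rather than within each block is an assumption, since the text does not specify how the random order was generated.

```python
import random

MASSES_KG = [1, 2, 3, 4, 5, 6, 7, 8, 9]
DELAYS_MS = [0, 100, 200, 400, 800]
REPETITIONS = 15
N_BLOCKS = 5


def make_schedule(seed=None):
    """Return N_BLOCKS lists of (mass_kg, delay_ms) trials in random order."""
    rng = random.Random(seed)
    trials = [(m, d) for m in MASSES_KG for d in DELAYS_MS for _ in range(REPETITIONS)]
    rng.shuffle(trials)  # assumption: one shuffle over all 675 trials
    block_len = len(trials) // N_BLOCKS
    return [trials[i * block_len:(i + 1) * block_len] for i in range(N_BLOCKS)]


blocks = make_schedule(seed=0)
print(len(blocks), [len(b) for b in blocks])  # 5 [135, 135, 135, 135, 135]
```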

#### *Data analysis*

Participants' judgments of the masses were analyzed separately for each of the five imposed delays. Logistic regression was used to relate the proportion of "heavier" judgments to the stimulus mass value for each participant. The form of the function was

$$y = \frac{1}{1 + e^{-\left(\frac{x-\alpha}{\theta}\right)}},$$

where *y* is the proportion of "heavier" judgments at mass *x*, α is the mass value corresponding to the point of subjective equality (PSE, the 50% response level on the psychometric function), and θ provides an estimate of the mass discrimination sensitivity. To estimate the parameters, the logistic function was fitted to the judgment data of individual subjects by using a generalized linear model as implemented in the MATLAB *glmfit* function (MathWorks, Natick, MA, USA). One-Way analysis of variance (ANOVA) with repeated measures was used to test the effect of the delay value on both the PSE and the discrimination sensitivity. Ryan's multiple comparison tests were used to compare the 0-ms delay condition with the other delay conditions. The threshold for statistical significance was set at *p* < 0.05 throughout this study.
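To make the fitting step concrete, the sketch below estimates the PSE (α) and the slope parameter (θ) of a logistic psychometric function by maximum likelihood, using a small Newton-Raphson solver for binomial logistic regression. It is a stand-in for a generalized linear model fit, not the MATLAB routine itself. The function name and the response counts are made up for illustration, and the function is written in the increasing form 1 / (1 + exp(−(x − α)/θ)), which is an assumption about the sign convention so that "heavier" responses grow with mass.

```python
import math


def fit_psychometric(masses, n_heavier, n_trials, max_iter=100, tol=1e-10):
    """Fit P(heavier) = 1 / (1 + exp(-(x - alpha) / theta)) by maximum likelihood.

    Internally fits the linear predictor b0 + b1 * x, then converts to
    alpha = -b0 / b1 (the PSE) and theta = 1 / b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(masses, n_heavier, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)   # binomial variance weight
            g0 += k - n * p         # gradient of the log-likelihood
            g1 += (k - n * p) * x
            h00 += w                # entries of X' W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step = (X' W X)^-1 * gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return -b0 / b1, 1.0 / b1


# Made-up data for one delay condition: "heavier" responses out of 15 trials per mass.
masses = [1, 2, 3, 4, 5, 6, 7, 8, 9]
heavier = [0, 0, 1, 4, 8, 11, 14, 15, 15]
trials = [15] * 9
pse, theta = fit_psychometric(masses, heavier, trials)
print(f"PSE = {pse:.2f} kg, theta = {theta:.2f} kg")
```

Running the fit once per delay condition and subtracting the 0-ms PSE from each of the others would give per-participant PSE shifts of the kind analyzed in the ANOVA.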

#### **EXPERIMENT 2**

In Experiment 1, we found that the perceived mass of the manipulandum significantly increases with the amount of imposed visual feedback delay when the delay is in the short range, but not when it is in the longer range (see Results). Next, we examined whether decreasing the prediction error for the feedback delay would change this delay-induced overestimation of the object's mass.

#### *Task procedure*

Ten volunteers participated. There were two conditions: the *delay condition* and the *no-delay condition*. In the *delay condition*, participants were continuously exposed to the visual feedback delay when reaching to a target with the manipulandum, whereas no delay was imposed in the *no-delay condition* (*simple reach* trials). Between these reaching trials, the participants' ability to perceptually recognize the delay (*delay awareness* trials) and their perception of the manipulandum mass (*mass comparison* trials) were measured.

The two conditions were performed by each participant on two separate days. The order of the conditions was randomly assigned to each participant: five of them performed *no-delay condition* first, while the others performed *delay condition* first. Each condition consisted of 404 trials, which were divided into four 101-trial blocks. Each block consisted of 87 *simple reach* trials, 5 *delay awareness* trials, and 9 *mass comparison* trials. Participants took short breaks between blocks. Note that the *delay awareness* and *mass comparison* trials were identical between conditions. Therefore, any conditional difference observed in these trials would be due to the pre-exposure to the feedback delay occurring in the *simple reach* trials. The details of these different trial types are explained below.

In *simple reach* trials, participants made a right-hand reaching movement by moving the manipulandum toward the target that appeared 15 cm from the starting position. The visual feedback was delayed by 200 ms in the *delay condition*, whereas no delay was imposed in the *no-delay condition*. The aim of the *simple reach* trials was to allow participants to adapt to the 200-ms delay in the *delay condition* (and to the lack of delay in the *no-delay condition*). To maintain participants' concentration, in one of every 10 to 11 trials, the target jumped to the 20° clockwise-rotated position immediately after the onset of the reach. The *simple reach* trials were distributed pseudo-randomly in a block, such that more than one *simple reach* trial was conducted before each *delay awareness* or *mass comparison* trial.
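The block structure described above (87 *simple reach*, 5 *delay awareness*, and 9 *mass comparison* trials, with more than one *simple reach* trial before each probe trial) can be sketched as follows. The function name, the trial labels, and the way spare *simple reach* trials are scattered across the gaps are assumptions for illustration; the text does not specify the pseudo-randomization algorithm, and the occasional target jumps are not modeled.

```python
import random

N_SIMPLE, N_AWARE, N_MASS = 87, 5, 9
MIN_SIMPLE_BEFORE_PROBE = 2  # "more than one" simple reach before each probe trial


def make_block(seed=None):
    """Return one 101-trial block as a list of trial-type labels."""
    rng = random.Random(seed)
    probes = ["delay_awareness"] * N_AWARE + ["mass_comparison"] * N_MASS
    rng.shuffle(probes)
    # Reserve the minimum run of simple reaches before every probe, then
    # scatter the remaining ones over those runs plus a final run after the last probe.
    runs = [MIN_SIMPLE_BEFORE_PROBE] * len(probes) + [0]
    spare = N_SIMPLE - MIN_SIMPLE_BEFORE_PROBE * len(probes)
    for _ in range(spare):
        runs[rng.randrange(len(runs))] += 1
    block = []
    for run, probe in zip(runs, probes):
        block += ["simple_reach"] * run + [probe]
    block += ["simple_reach"] * runs[-1]
    return block


block = make_block(seed=1)
print(len(block), block.count("simple_reach"))  # 101 87
```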

In *delay awareness* trials (sequence **A** in **Figure 2**), after making the right-hand reaching movement, participants were required to answer whether they felt any delay between their hand and the cursor movement. This trial was used to assess the change in perceptual sensitivity to the delay. Since we found in our pilot study that the delay of 200 ms was easily detectable, the cursor delay in *delay awareness* trials was set to 150 ms to avoid any ceiling effect.

Finally, in *mass comparison* trials (see sequence **B** in **Figure 2**), after making a right-hand reaching movement, participants were asked to make the same straight reaching movement with their left hand. Then, they were asked to judge whether the right hand was heavier or lighter than the left hand. The cursor delay was set to 200 ms for the right-hand movement, and there was no cursor presented for the left-hand movement. The mass value of all of the right-hand reaches was set to 3 kg (this was also the case for *simple reach* and *delay awareness*), while the mass was set to 1, 3, or 5 kg for left-hand reaches. This trial was used to evaluate the perception of mass under the delay of visual feedback, in the same manner used in Experiment 1. Since our aim was to extend our findings in Experiment 1, which was performed with the right hand, the left hand was used only to present the reference mass for the right hand.

Before *delay awareness* and *mass comparison* trials, participants were instructed about which type of trial they were going to perform (see "announcement" in **Figure 2**).

**FIGURE 2 | Two types of trials in Experiment 2.** The horizontal flow is the sequence of each trial; **(A)** *Delay awareness* trial in which subjects were asked if they felt any delay in cursor movements, and **(B)** *mass comparison* trials in which subjects were asked to judge whether the right- or left-hand movement was heavier. The instruction was on the screen from the end of the last trial until the onset of the next trial (target appearance). The yellow letters and arrows are used to explain each display, but they are not shown on the screen during the actual experiment.

#### *Data analysis*

For the *delay awareness* and the *mass comparison* trials, the probability of judging the trial as "delayed" and that of judging the mass of the right hand "heavier" were calculated. These values were compared between the *delay* and *no-delay conditions*.
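The Results report a paired *t*-test for the between-condition comparison of these probabilities. As a minimal sketch of that comparison, the code below computes the paired *t* statistic and its degrees of freedom from per-participant proportions. The function name and the example proportions are invented for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up proportions of "delayed" responses for nine participants.
no_delay = [0.90, 0.85, 0.75, 0.95, 0.80, 0.70, 0.90, 0.85, 0.65]
delay = [0.40, 0.55, 0.35, 0.60, 0.30, 0.45, 0.50, 0.25, 0.40]
t_value, df = paired_t(no_delay, delay)
print(f"t({df}) = {t_value:.3f}")
```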

## **RESULTS**

#### **EXPERIMENT 1**

The psychometric functions in **Figures 3A,B** describe the participants' judgment of mass as a function of the actually delivered mass. **Figure 3A** shows the psychometric functions constructed for the different imposed delays (0, 100, 200, 400, or 800 ms) for a representative participant, while **Figure 3B** shows those for the data averaged across all participants. One-Way ANOVA with repeated measures performed on the PSEs of the different delay values showed a significant effect of delay on mass perception [*p* = 0.024, *F*(4, 52) = 3.082]. **Figure 3C** shows the shift in PSE for each delay from the case of the 0-ms delay. The *post hoc* comparison with the 0-ms delay condition showed that the PSE significantly shifted toward the heavier side when the 200-ms delay (*p* = 0.030 after correction with Ryan's nominal significance level) or the 400-ms delay (*p* = 0.038 after the correction) was imposed, but not when the delay was 100 ms (*p* = 0.175 after the correction) or 800 ms (*p* = 0.175 after the correction). Moreover, One-Way ANOVA with repeated measures performed on the discrimination sensitivity of the different delay values showed no significant effect of delay [*p* = 0.130, *F*(4, 52) = 1.866; mean sensitivity (±SD) was 0.97 ± 0.09 for a 0-ms delay, 1.11 ± 0.10 for a 100-ms delay, 1.02 ± 0.10 for a 200-ms delay, 0.94 ± 0.08 for a 400-ms delay, and 1.01 ± 0.08 for an 800-ms delay]. This indicates that sensitivity to the mass did not differ according to the delays. The results show that the visual feedback delay significantly modifies the mass perception of the manipulandum, but that this effect failed to reach significance for the longest delay (800 ms).

**FIGURE 3 | Results of Experiment 1. (A)** Results are shown for a typical participant. The mass value at which each curve crosses the 0.5 line is the PSE for each delay value. The red arrow indicates the shift of PSE for an 800-ms delay from that for a 0-ms delay (see panel **C**). **(B)** Psychometric functions are fitted to data averaged across participants. Average judgment rate across participants was calculated for each mass value, and sigmoid functions were fitted to the averaged rates. **(C)** For each delay, the shift of PSE from that for a 0-ms delay is shown. Shifts were calculated for each cursor delay and averaged across participants. Error bars indicate the standard error of the mean across participants. \**p* < 0.05 according to Ryan's multiple (four) comparison tests for difference in PSE between the 0-ms delay and the other delay conditions.

#### **EXPERIMENT 2**

One participant was excluded from the analysis because of extremely slow reaction times: the latency from target appearance to movement onset exceeded 1000 ms on average, possibly due to a lack of concentration on the task.

For the *delay awareness* trials, the rate of delay awareness was significantly higher in the *no-delay condition* than in the *delay condition* according to the paired *t*-test [*p* < 0.001; *t*(8) = 6.468; **Figure 4A**]. Namely, participants tended to perceive the imposed 150-ms delay more accurately in the *no-delay condition* than in the *delay condition*. This indicates that repeated exposure to the delay in the *simple reach* trials made the participants less sensitive to the delay. The lower sensitivity to the delay after being exposed to the delay was already observed in the first block of trials, and it continued throughout the experiment (**Figure 4B**). When we analyzed the data with a Two-Way ANOVA, using the effect of block number along with the effect of condition (*delay* or *no-delay*), only a main effect of the condition [*p* = 0.0001, *F*(1, 8) = 47.059] was observed, without any main effect of the block [*p* = 0.102, *F*(3, 24) = 2.313] or interaction between the two factors [*p* = 0.592, *F*(3, 24) = 0.649]. These results show that exposure to the delay seems to have an immediate impact on the delay sensitivity, and the effect was consistent throughout Experiment 2.

Following this tendency, participants perceived the mass of the manipulandum as lighter in the *mass comparison* trials of the *delay condition* compared to that of the *no-delay condition* (**Figure 4C**). Two-Way ANOVA with repeated measures showed significant main effects of the condition [*p* = 0.002, *F*(1, 8) = 19.139] and the mass value [*p* < 0.001, *F*(2, 16) = 71.627], without any significant effect of interaction [*p* = 0.558, *F*(2, 16) = 0.605]. This indicates that the adaptation to the delay induced the insensitivity to the delay, and this was accompanied by the perception of smaller mass compared to when there was no adaptation. In other words, the perceived delay may play a critical role in judging the mass of an object while making a movement.

## **DISCUSSION**

We examined how imposing a delay between an action and its visual feedback influences mass perception. In Experiment 1, participants felt that the manipulandum was heavier as the feedback delay increased to 400 ms, but this effect was less clear when the delay was 800 ms (**Figure 3C**). This indicates that mass perception modified by feedback delay is not solely related to the amount of delay. The results of Experiment 2 show that the mass overestimation was alleviated when the participants adapted to the delay, compared to when there was no adaptation (**Figure 4C**). This suggests the sensory feedback prediction error may play an important role in inducing the overestimation of mass.

Delaying the visual feedback during manual actions causes a discrepancy between visual and proprioceptive positional estimates of the hand, or between expected and actual hand positions. This kind of discrepancy tends to be attributed to the mass perception, making the participants feel that the hand-held object is heavier than expected (Di Luca et al., 2011). Within the range of the delay investigated in the previous study (0–200 ms), the perceived mass linearly increased as the delay increased. However, this was not the case for much longer delays, which were specifically tested in our experiment (**Figure 3C**); when the delay was 800 ms, the effect of overestimating the mass became variable. This shows that longer feedback delays are processed differently from shorter delays. A previous study showed that delaying the timing of a sensory consequence of an action makes people feel that the time between an action and its sensory consequence is shorter than it actually is (Haggard et al., 2002). This binding effect was regarded as an implicit measure of whether the sensory input is actually processed as one's own action (authorship of the sensory event) (Haggard et al., 2002; Haggard and Clark, 2003). Several studies have shown that the binding effect is modulated by temporal contiguity: When the feedback delay is large, the binding effect becomes weak (Haggard et al., 2002; Heron et al., 2009). Many studies have demonstrated that such binding does not occur if the delay is more than 200–300 ms (Blakemore et al., 1999; Haggard et al., 2002; Heron et al., 2009). In considering this evidence, the reason why the longer feedback delay (800 ms) was not reflected as an increase in mass may be due to the disruption of the association between an action and its sensory consequence: The longer delay may have violated the authorship of the sensory feedback information rather than being processed as the consequence of the action. 
Violation of action-authorship modifying the quality of the sensory perception may reflect findings in the literature showing that the participant's perceived intensity (amount of force) (Shergill et al., 2003) or the quality (ticklishness) of a tactile input depends on the applied timing of the tactile stimulus in relation to the participant's own action (Blakemore et al., 1999).

It should be noted that the violation of the authorship of the feedback information in the present study can occur without depending on the amount of delay; since the average movement time of reaching movement was 882 ± 88 ms, participants may not have related the 800-ms-delay feedback to their own movement simply because the movement had nearly terminated. Our current experimental design cannot separate the effects of these two factors, and so further study is needed to separate these possibilities.

In Experiment 2, when the participants were repeatedly exposed to the delay (*delay condition*), they became less sensitive to the delay compared to when not exposed to the delay (*no-delay condition*) (**Figures 4A,B**). Reduced sensitivity to the temporal delay shows that the participants were perceptually adapted to the feedback delay in the *delay condition*, as has been shown both in the perceptual domain (Haggard et al., 2002; Haggard and Clark, 2003) and in the motor control domain (Honda et al., 2012a,b). Accompanying this adaptation effect, the illusory increase in mass was significantly alleviated in the *delay condition* in comparison to the *no-delay condition* (**Figure 4C**; red plots are significantly below blue ones). This result clearly shows that the mass overestimation accompanying feedback delay is not caused by the actual delay itself, since the actual delay is constant in the two conditions (**Figure 4C**). Furthermore, this suggests that the factors changing in accordance with the perceptual temporal adaptation might be tightly linked to the alleviation of mass overestimation.

Two different types of adaptation may underlie the temporal adaptation observed between the action and the sensory input in this study. One is the adaptation between different sensory inputs, such as between vision and proprioception (Kambara et al., 2013). Feedback delay will lead to a discrepancy between the two, which may require calibration. The other possibility is the involvement of a motor command, providing prediction about the timing of the sensory reafference. In this case, adaptation may have occurred between the predicted and the actual timing of the incoming sensory input (prediction error). Either mechanism could have worked in our experiment. However, Stetson et al. (2006) showed that the strength of calibration of perceived timing between pressing a button and a visual flash is much weaker when the button press is replaced by a passive button touch. Other studies on delay perception have also suggested that prediction in the sensorimotor system is critical for a change in temporal perception (Haggard and Clark, 2003; Stetson et al., 2006). Therefore, we believe that the increase in mass perception dominantly involves motor-based prediction error. In any case, further study is necessary to clarify this point.

In conclusion, we propose that the misattribution of a visual delay to the increased mass perception is induced by the sensorimotor prediction error, and it seems to preferentially occur when the delay is within the range that can be attributed to the consequence of the action.

## **ACKNOWLEDGMENTS**

The authors would like to thank Drs. Kenji Ogawa, Chang Cai, Satoshi Hirose, and Mitsuo Kawato for their helpful comments, and Ms. Yuka Oshima for recruitment of the participants. This research was supported by Grants-in-Aid for JSPS Fellows #22-9430 (Takuya Honda). Nobuhiro Hagura was supported by Marie Curie International Incoming Fellowships. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 June 2013; accepted: 28 September 2013; published online: 23 October 2013.*

*Citation: Honda T, Hagura N, Yoshioka T and Imamizu H (2013) Imposed visual feedback delay of an action changes mass perception based on the sensory prediction error. Front. Psychol. 4:760. doi: 10.3389/fpsyg.2013.00760*

*This article was submitted to Consciousness Research, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Honda, Hagura, Yoshioka and Imamizu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neurobiological mechanisms behind the spatiotemporal illusions of awareness used for advocating prediction or postdiction

## **Talis Bachmann\***

Laboratory of Cognitive Neuroscience, Institute of Public Law, University of Tartu (Tallinn branch), Tallinn, Estonia

#### **Edited by:**

Yuki Yamada, Yamaguchi University, Japan

#### **Reviewed by:**

Ned Block, New York University, USA; Katsumi Watanabe, The University of Tokyo, Japan

#### **\*Correspondence:**

Talis Bachmann, Laboratory of Cognitive Neuroscience, Institute of Public Law, University of Tartu (Tallinn branch), Kaarli puiestee 3, Tallinn 10119, Estonia. e-mail: talis.bachmann@ut.ee

The fact that it takes time for the brain to process information from the changing environment underlies many experimental phenomena of awareness of spatiotemporal events, including a number of astonishing illusions. These phenomena have been explained from the predictive and postdictive theoretical perspectives. Here I describe the most extensively studied phenomena in order to see how well the two perspectives can explain them. Next, the neurobiological perceptual retouch mechanism of producing stimulation awareness is characterized and its work in causing the listed illusions is described. A perspective on how brain mechanisms of conscious perception produce the phenomena supportive of the postdictive view is presented in this article. At the same time, some of the phenomena cannot be explained by the traditional postdictive account, but can be interpreted from the perceptual retouch theory perspective.

**Keywords: consciousness, awareness, illusion, prediction, postdiction, timing, brain**

## **INTRODUCTION**

In the changing environment our brains inevitably provide us with slightly outdated percepts because of the time it takes to process new information (Eagleman, 2008; Nijhawan, 2008; Nijhawan and Khurana, 2010; Yamada et al., 2012). Obviously, this state of affairs is adaptively disadvantageous. Evolution must have provided us with some means to compensate or correct the often non-veridical perception vis-à-vis the actual appearance of the changing scene in order to enable subjects to act efficiently and interpret the world around us veridically. The two most popular solutions for explaining how sensory-perceptual and sensorimotor systems may overcome, reduce, or re-interpret this processing-delay-dependent perceptual non-veridicality are prediction (Nijhawan, 1994, 2008; Kerzel, 2003; Cardoso-Leite et al., 2010; all these associated with an increasingly popular Bayesian account of predictive encoding, e.g., Kersten et al., 2004; Bar, 2007; Hohwy et al., 2008) and postdiction (Eagleman and Sejnowski, 2000; Choi and Scholl, 2006; Eagleman, 2008; Buehner and Humphreys, 2010; Kawabe, 2011, 2012). (Approaches combining these accounts can also be acknowledged, e.g., Soga et al., 2009.)

The empirical evidence where the limits of the perceptual system in coping with challenges of the environmental stimulation come to the fore is surprisingly rich, consisting of many well-established experimental awareness phenomena. In a recent review (Bachmann et al., 2011) the following examples are listed where spatiotemporal information is represented either non-veridically, surprisingly poorly or as if seeing more than is there: anorthoscopic perception, anthropomorphic perception effect of causality, attentional blink, Aubert–Fleischl effect, autokinetic effect, biological motion (Johansson effect), Cai and Schlag effect, change blindness, Cohene and Bechtoldt effect, color-phi phenomenon, cutaneous rabbit phenomenon, Czermak effect, feature attribution, feature inheritance, filled-duration illusion, flash-lag effect (FLE), (continuous) flash-suppression effect, flicker fusion, Fröhlich effect, Galli effect, induced-motion effect, Lawrence effect, line motion illusion (Hikosaka effect), masking effects, motion capture, motion induced blindness (MIB), Motoyoshi effect, multiple flash effects, path-guided motion, perceptual asynchrony effect, perceptual latency priming, phenomenal causality (Michotte) effect, proactive contrast facilitation, Pulfrich effect, repetition blindness, representational momentum, repulsion effects, sequential blanking, size transformation effects, sound-induced illusory flash phenomenon, standing wave illusion of invisibility, Stoper and Mansfield effect, stroboscopic motion, tandem effect, temporal context effect of brightness, temporal order reversal effect, Ternus–Pikler effect, tunnel effect, ventriloquist effect, voluntary-action effect on perception timing, wagon-wheel illusion, Zöllner effect. Many of these phenomena are used for providing evidence for predictive or postdictive accounts of explicit perception. 
In this paper I will focus on some of these phenomena, indicate whether the predictive or the postdictive account is consistent with them and describe how the action of the perceptual retouch theory based awareness mechanism explains these phenomena. The choice of the seven phenomena for the purposes of the present article is not haphazard. First, from the long list presented above only a couple of the phenomena have been frequently used in the context of experimentation and theorizing trying to test or juxtapose both the predictive and postdictive accounts of spatiotemporal processing. Thus, only for a relatively limited set of the phenomena is there a sufficiently voluminous published record of discussion relevant to our topic. Second, space would not permit a systematic analysis of all the phenomena in the context of prediction, postdiction, and the perceptual retouch theory. Third, in order to be able to compare the validity of the alternative theoretical accounts in explaining the phenomena, both of these accounts should have statements and working principles specific enough with regard to the spatiotemporal characteristics of the phenomena. This is in order to make the comparative evaluation possible. This also restricted our choice.

## **EXPERIMENTAL AWARENESS PHENOMENA AND THE TWO ACCOUNTS**

The most studied and discussed phenomena we use in this paper can be listed as follows.


Eagleman, 2008) has explained the FLE like this: encoding of the features of the changing object/event → waiting for the slowest feature to have been encoded → re-interpretation of the encoded signals post-dicted back in time to the moment of flash to compensate the inevitable delay in feature processing. However, because in Bachmann and Põder (2001) FLE was found also when the target flashed in the stream and the reference flash presented out of the stream were simultaneous and identical, but different from the stream items, it is difficult to understand how the postdiction could lead to the FLE illusion.


by Wu et al. (2009) suggest that before reaching consciousness, the non-conscious representation has to be processed for about 100 ms. Second, these results show that there has to be a nonspecific mechanism which brings the non-conscious specific representation to consciousness – any explanation for these results requires a process that is activated by the flash but at the same time acts on the representation of the target (and is, therefore, not specific for the target). I will return to this theme in the next part of this article. As for the predictive account there is nothing supportive in the Wu et al. (2009) data. Predicting the past before future is suspect. The postdictive account is better (i.e., reinstates what was there earlier, albeit in the pre-conscious format), but requires additional assumptions for explaining why the flashed stimulus that caused the reappearance of the target in awareness was not perceived at the same time with the reappeared target. According to Eagleman (2008), for the visual brain to correctly align the timing of events in the world, it may have to wait about 100 ms for the slowest information to arrive – thereby allowing the visual system to discount different delays imposed by the early stages. In the Wu et al. (2009) experiment the flash comes when also the target stimulus *is* present and *has been* present. Thus postdicting the flash-plus-target event back should have anyway represented both the flash-stimulus and the target stimulus together.

7. *Anorthoscopic perception*, where the full shape of a moving stimulus is perceived even though only part of its contours is visible through a slit at any moment in time, is another phenomenon relevant in our context (e.g., Zöllner, 1862; McCloskey and Watkins, 1978; Aydin et al., 2008). Because this happens also with new stimuli, the shape of which is unknown beforehand to the perceivers, the prediction account cannot explain this effect. If the system does not know the regularities of change on which to found its predictive transformation, this kind of transformation is not possible. However, the postdictive account, which assumes a time-consuming spatiotemporal integration of the unpredictable shape signals and motion signals after they have been processed, can explain the shape formation post factum. Due to space limitations I will skip some other relevant phenomena that are specified well enough to allow comparison with our theories, such as the *line motion illusion* (Hikosaka et al., 1993) or the *Tandem Effect* (Müsseler and Neumann, 1992). Suffice it to say that the explanations for them are basically similar to what will be given in the next section of this article.

We saw that the predictive and the postdictive theory both had their successes in explaining the listed phenomena. At the same time these phenomena are cases where the subjective, conscious-awareness-level representation is inconsistent with the objective, physical characteristics of the presented stimulation. How might the known properties of the brain mechanisms necessary for contentful conscious perception be causally relevant in leading to these illusory phenomena? Because these phenomena are typically the empirical basis for the theoretical arguments in favor of either the prediction or the postdiction account, it is useful to see whether the workings of the awareness mechanism provide explanations for the phenomena and thus provide a mechanistic basis for either one of the theoretical accounts.

## **THE MECHANISMS FOR PERCEPTUAL AWARENESS VIS-À-VIS THE PHENOMENA**

In this article I stick to the neurobiological mechanisms responsible for producing consciousness-level perceptual awareness as suggested in the perceptual retouch theory (Bachmann, 1984, 1994, 2007). Consciousness-level visual perception of environmental objects involves two types of binding operations, both of which require some time to be carried out. First, there is the content-specific binding of features to integrated objects, which is accomplished by the selectively tuned cortical stimulus-specific (SP-) modules in V1, V2, V3, V4, V5, and various temporal lobe areas (Koch, 2004; Rose, 2006; Gazzaniga, 2009). The processing by the SP-system can be carried out pre-consciously, without a concomitant awareness (explicit perception) of the presented, encoded and featurally bound stimuli (Naccache and Dehaene, 2001; Ruz et al., 2003; Kotchubey, 2005; Dehaene and Changeux, 2011; van Gaal et al., 2011). Secondly, awareness of any of these objects requires the binding of the neural representation formed by SP-operations with the more global and non-specific neural activity supported by the thalamo-cortical processes of neuromodulation (Bachmann, 1994; Purpura and Schiff, 1997; Ribary, 2005; Bogen, 2007; Alkire et al., 2008; Urbano et al., 2012) that I label as NSP (for "non-specific"). The NSP-processes do not communicate specific contents of the environmental stimuli, but they are necessary in order to bring the specific contents represented by SP-processes into consciousness-level representation. So, paradoxically, *non*-specific is specific for providing the phenomenal capacity for the specific contents. Interaction between cortical SP-modules and the subcortical (e.g., non-specific thalamic) nuclei constitutes the key mechanism for modulation of the SP-carried perceptual contents by the NSP. The boost of NSP-activity is caused by the presented stimulation and especially by the appearance of new inputs. (The ignition of the NSP system is one of the subparts of the orienting reflex circuitry, its early working part.) Importantly, the receptive fields of the neurons constituting NSP are larger than the receptive fields of the neurons in the cortical SP whose function is to process specific incoming signals from the presented stimuli. Therefore, presentation of a certain specific stimulus with its specific content K can ignite an NSP-process which is capable of modulating the activity of some other neurons X with different specific content (even before the signals for X have been presented). The presynaptic inputs from both SP-channels (from receptors via the lateral geniculate body up to the cortex) and NSP-channels (from the thalamo-cortical modulation system) converge on the cortical SP, and both types of inputs regulate the excitatory postsynaptic potentials of the SP neurons. When this presynaptic input, combining somatic and dendritic presynaptic effects from direct SP-channels and indirect NSP-channels, is strong enough (e.g., as applied onto pyramidal neurons with their characteristic long apical dendrites), the specific neurons begin firing or increase their firing rate. When only SP-channels are active for representing actual stimulus objects but are dissociated from NSP influence, no consciousness of the perceptual contents of these objects can be experienced (Bachmann, 1994; Koch, 2004; Ribary, 2005; Bogen, 2007). The SP works faster than the NSP, which means that the pre-conscious perceptual representation is formed ahead in time with regard to the time when NSP-modulated contents become consciously available. (The time difference between an effective pre-conscious SP-encoding of objects with their bound features and an effective process of NSP-modulation necessary for awareness to emerge amounts to about 50–150 ms.) **Figure 1** summarizes the general framework of the SP + NSP processing system.
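
The timing relation in this framework can be made concrete with a toy calculation. The sketch below is only an illustration, not part of the formal statement of the perceptual retouch theory: the function names, the 60 ms SP latency, and the 100 ms NSP lag are assumptions (the lag is simply picked from the 50–150 ms range given above), and NSP modulation is assumed to remain effective once it has been ignited. It shows why a stimulus preceded by other input at a nearby location would have a shorter latency to awareness than an isolated stimulus that must ignite the NSP itself.

```python
# Toy model of latency to awareness in the SP + NSP framework.
# All numbers are illustrative assumptions; only the 50-150 ms range
# for the SP-to-NSP lag comes from the text.
SP_LATENCY_MS = 60.0   # hypothetical duration of pre-conscious SP encoding
NSP_LAG_MS = 100.0     # assumed extra time needed for NSP modulation


def awareness_time(stimulus_onset_ms: float, nsp_ignition_onset_ms: float) -> float:
    """Moment at which a stimulus reaches awareness in this toy model.

    Awareness needs both a finished SP representation and an effective
    NSP modulation, so it occurs at the later of the two moments.
    """
    sp_ready = stimulus_onset_ms + SP_LATENCY_MS
    nsp_ready = nsp_ignition_onset_ms + SP_LATENCY_MS + NSP_LAG_MS
    return max(sp_ready, nsp_ready)


def latency_to_awareness(stimulus_onset_ms: float, nsp_ignition_onset_ms: float) -> float:
    """Delay between stimulus onset and its arrival in awareness."""
    return awareness_time(stimulus_onset_ms, nsp_ignition_onset_ms) - stimulus_onset_ms


# An isolated stimulus at t = 0 ms has to ignite the NSP itself.
isolated = latency_to_awareness(0.0, 0.0)      # 160.0 ms

# A stimulus at t = 200 ms whose location was already stimulated at t = 0 ms
# finds the NSP modulation in place and is limited only by SP encoding.
preceded = latency_to_awareness(200.0, 0.0)    # 60.0 ms

print(isolated, preceded)
```

With these assumed values the preceded stimulus gains 100 ms, which is the kind of latency-to-awareness advantage the framework attributes to in-stream stimuli.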

Within this framework, the phenomena reported in the current article can be explained as follows.

1. Flash-lag effect where a moving and a static (flashed) stimulus are compared for their relative position and the flash appears to lag behind (e.g., Nijhawan, 1994). The retouch theory explanation (Bachmann et al., 2003, 2012; Bachmann, 2010) is this: because the action of NSP takes more time than SP-encoding and because no awareness of the SP-represented contents emerges before NSP-modulation has had its effect, the percept in awareness emphasizes features that are or become present in SP somewhat later. For the features of the static flash this means that its initial position as stored in sensory memory will be "retouched" for consciousness, but for the features of the moving stimulus this means "retouching" an advanced spatial position for consciousness. (Additionally, the lingering sensory trace of the moving stimulus is erased for SP by a Reichardt type of movement detector; Reichardt, 1961.) This creates the illusion of a spatial lag. This explanation is valid also for the flash-initiated conditions, the conditions where the post-flash movement directions are unpredictable, and the conditions where the pre-flash stimulation includes contradictory motion direction signals that could nullify or complicate prediction (Khurana and Nijhawan, 1995; Whitney and Murakami, 1998; Bachmann et al., 2012). In some sense the retouch theory explanation can be considered a variety of the latency difference account. For example, Whitney and Murakami (1998, p. 657) state that "The simplest explanation is that the neural delays for the flash and the moving bar are different . . . approximately 45 ms . . . represents the difference between the latencies for moving and flashed stimuli. Specifically, the delay for the moving bar is shorter. . ., perhaps because responses of motion detectors at one location facilitate the response of other detectors along the expected path of motion." There are, however, some important differences between the simple latency difference account and the retouch theory explanation. What is essential is not that the processing of *motion* signals may be faster, but that any signals with precedence have a shorter delay to awareness: the NSP (a system necessary for awareness of the already pre-consciously represented stimuli) has been activated in advance, so the signals later in the stream gain time in reaching conscious awareness. The latency difference is thus a difference in latency to awareness. Furthermore, as we will see subsequently, the retouch mechanism explains the FLE also in conditions where no motion is involved and static stimuli are presented. The retouch mechanism supports the postdictive account, but it does not need the somewhat mystical "referral back in time" (Eagleman and Sejnowski, 2000).


from within the retouch theory context. The general postdictive account seems valid here.


fast. However, the flashed object appears in consciousness later because the corresponding SP-representation of the flash has to be built up ab ovo from the lower levels up to the higher pattern levels and this takes some time. Therefore, the NSP that brings contents to awareness finds the SP-contents of the target ready on the "waiting list"; however, this NSP-activity has to wait until the SP-contents of the flashed object become ready (i.e., bound to the object representation to be bound into consciousness). The predictive account is not useful here because predicting the past before the future is suspect. The general postdictive account needs some way to explain why the flashed stimulus that caused the reappearance of the target in awareness was not perceived at the same time as the reappeared target. Postdicting the flash-plus-target event back should in any case have represented both the flash stimulus and the target stimulus.

7. The last phenomenon we consider is anorthoscopic perception, where the shape of a moving stimulus is perceived although only part of its contours is visible through a slit at any one moment (e.g., Zöllner, 1862; McCloskey and Watkins, 1978). As this happens also with new stimuli unknown to observers, the prediction account cannot explain this effect. The postdictive account, which assumes a time-consuming spatiotemporal integration of the unpredictable shape signals and motion signals after they have been processed, can explain the shape formation post factum. The perceptual retouch account in its present form cannot explain the effect unless the NSP effects can be very slow and the SP-modules are taken to include high-level visual-cognitive representations enabling more complex dynamic transformations.

**Table 1** summarizes my evaluations of whether the predictive account, postdictive general account, and the retouch mechanism based mechanistic explanation are consistent with the seven spatiotemporal phenomena of awareness used here for our analysis.

It is easy to see that the predictive and the postdictive account can each explain more than half of the phenomena under consideration. However, the distribution of the consistency ratings is different. Except for the perception of causality in collision, which can be explained by both accounts without reservations, the other phenomena are more puzzling for either the prediction or the postdiction theory or both. Certain special varieties of motion-involving FLEs and static FLEs cannot be accounted for by these theories. Moreover, while the phenomena involving a kind of inertia effect (Fröhlich effect and representational momentum) are well accounted for by the predictive account, they cannot be easily explained by the postdictive account. On the other hand, the predictive theory is in trouble trying to explain reappearance in awareness after MIB and anorthoscopic perception, both of which can be either fully or partly explained by postdiction. In the majority of cases the retouch mechanism also explains the phenomena, and where it does, it does so without reservations (phenomena 1, 2, 3, 4, 6 from **Table 1**). For the "overshoot" effect in the representational momentum phenomenon and for the "creative" formation of the full shape from its dynamic fragments in the anorthoscopic effect, the retouch theory does not have any specialized modules that would help produce these effects (items 5 and 7 in the Table).

From **Table 1** and the above analysis we see that no theory is able to explain the effects singlehandedly. Each one has its advantages and disadvantages. For some phenomena, the accounts are not exclusive in their explanations and can be mutually consistent. For example, the perceptual retouch mechanism can be considered as the neurobiological mechanism by which the phenomena are produced, which in turn may become subject to the interpretational higher-order cognitive mechanisms working according to the abstract principles of postdiction (1, 2, 3, 6 in the Table). Regarding some phenomena, the contributions of the mechanisms suggested in the three theories may be additive, such as when motion extrapolation in certain varieties of the FLE or causality-from-collision experimental setups are used as examples (items 1–3 in the Table). Importantly, future experiments should try to disentangle these relative contributions through clever experimental designs that allow control over the variables specific to each of the theories.

The general picture as it emerges from this analysis reveals some main differences between the theoretical accounts. The predictive account may be relatively restricted to the lower-level effects involving motion and simple feature change analysis. The postdictive account fares better with effects where relatively high-level visual-cognitive processes play their part. The perceptual retouch theory completes the picture by providing the neurobiological foundations for the effects where conscious perception represents the dynamic environment non-veridically because the NSP component of the retouch mechanism is slow. As the NSP component is necessary for upgrading the already processed information for conscious awareness, the slowness-dependent illusions are inevitable in direct perception. In regulating behavior and cementing general knowledge of the dynamic world around the subject, the higher-level cognitive mechanisms implied in the postdiction account may be of help.

## **Table 1 | Evaluation of the consistency of the three theoretical explanations for the seven spatiotemporal perceptual awareness phenomena.**


+: Account/theory and the phenomenon are consistent.

−: Account/theory and the phenomenon are not consistent.

+/−: Consistency satisfied depending on which variety of the phenomenon is used.

Our comparative analysis suggests that a uniform explanation of all of the observed effects seems impossible right now. There is a complex interacting set of low-level and high-level mechanisms, and the visual system also has the capacity to execute sufficiently sophisticated computations and encodings unconsciously. Given the variability and complexity of the spatiotemporal stimulation a subject may encounter, and the lack of an unequivocally interpretable and invariant set of cues to be processed, a single, relatively simple mechanism may not be sufficient to account for all possible perceptual effects. That said, it is surprising that the perceptual retouch mechanism can explain the majority of the phenomena without reservations.

## **CONCLUDING REMARKS**

In this paper I presented a mechanistic explanation for the typical visual awareness phenomena that have been used for testing and advancing predictive and/or postdictive accounts of conscious perception. It seemed natural to look for the mechanisms precisely where neurobiological data have shown which brain processes are necessary for the emergence of a contentful perceptual experience (Bachmann, 1994; Koch, 2004; Ribary, 2005; Bogen, 2007). This small endeavor showed that both the predictive account and the traditional postdictive account can explain more than half of the "litmus-test" phenomena typically used in visual awareness studies in the present theoretical context. Surprisingly or not, the mechanistic explanation based on the perceptual retouch theory produced an even slightly higher summary rating for consistency (see **Table 1**). This explanation also supports several of the principles of the postdictive account, but without the need to invoke the somewhat mystical concept of referral back in time. The delay to conscious awareness of featured perceptual information simply depends on whether or not the target stimuli were preceded by other input signals from spatially close or overlapping locations. If there was precedence, the NSP-processes are prepared to have their effect ahead in time and subsequent stimuli reach awareness relatively faster.

I also hope that the perspective suggested here, based on the perceptual retouch theory of conscious perception, might be useful for specifying the so-called postdictive account more precisely in terms of the underlying neural mechanisms. Ultimately, it may turn out that postdiction in its radical sense is not needed at all. On the other hand, the predictive account cannot be sufficient either, not least because there are too many experimental effects of conscious vision that the traditional approaches cannot account for.

## **ACKNOWLEDGMENTS**

The author is supported by Estonian Science Agency, project SF0180027s12 (TSHPH0027), "Attention and Consciousness."

## **REFERENCES**

between conscious and unconscious perceptual streams. *Curr. Biol.* 19, 2003–2007.

binding in the perception of collision events. *Psychol. Sci.* 21, 44–48.

binocular rivalry: an epistemological review. *Cognition* 108, 687–701.

*and Action*. Cambridge: Cambridge University Press.

briefly flashed ones. *Nat. Neurosci.* 3, 489–495.

Zöllner, F. (1862). Über eine neue art anorthoskopischer zerrbilder. *Ann. Phys.* 117, 477–484.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 November 2012; accepted: 16 December 2012; published online: 04 January 2013.*

*Citation: Bachmann T (2013) Neurobiological mechanisms behind the spatiotemporal illusions of awareness used for advocating prediction or postdiction. Front. Psychology 3:593. doi: 10.3389/fpsyg.2012.00593*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Bachmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Do the flash-lag effect and representational momentum involve similar extrapolations?

## *Timothy L. Hubbard\**

*Department of Psychology, Texas Christian University, Fort Worth, TX, USA*

#### *Edited by:*

*Yuki Yamada, Yamaguchi University, Japan*

#### *Reviewed by:*

*Gerrit W. Maus, University of California at Berkeley, USA Jochen Musseler, RWTH Aachen University, Germany*

#### *\*Correspondence:*

*Timothy L. Hubbard, Department of Psychology, Texas Christian University, 2800 S. University, Fort Worth, TX 76129, USA. e-mail: timothyleehubbard@gmail.com*

In the flash-lag effect (FLE) and in representational momentum (RM), the represented position of a moving target is displaced in the direction of motion. Effects of numerous variables on the FLE and on RM are briefly considered. In many cases, variables appear to have the same effect on the FLE and on RM, and this is consistent with a hypothesis that displacements in the FLE and in RM result from overlapping or similar mechanisms. In other cases, variables initially appear to have different effects on the FLE and on RM, but accounts reconciling those apparent differences with a hypothesis of overlapping or similar mechanisms are suggested. Given that RM is simpler and accounts for a wider range of findings (i.e., RM involves a single stimulus rather than the relationship between two stimuli, RM accounts for displacement in absolute position of a single stimulus and for differences in relative position of two stimuli), it is suggested that (at least some cases of) the FLE might be a special case of RM in which the position of the target is assessed relative to the position of another stimulus (i.e., the flashed object) rather than relative to the actual position of the target.

**Keywords: flash-lag effect, representational momentum, displacement, spatial mislocalization, spatial cognition**

If observers view a moving target and a flashed (i.e., briefly presented) stationary object is presented in alignment with that moving target, the flashed object appears to lag behind the moving target. This is referred to as the *flash-lag effect* (FLE; Nijhawan, 1994; for review, Maus et al., 2010; Hubbard, in press-a). If observers view a moving target, the remembered final position of that target is shifted in the direction of target motion. This is referred to as *representational momentum* (RM; Freyd and Finke, 1984; for review, Hubbard, 2005, 2010). In the FLE (e.g., Shi and de'Sperati, 2008) and in RM (e.g., Hubbard, 1990), the represented position of a moving target is displaced forward, and the FLE (Nijhawan, 1994, 2008) and RM (Finke et al., 1986; Hubbard, 2005) have each been suggested to reflect compensation for delays in neural processing times and adaptation for real-time interaction with environmental stimuli. Surprisingly, there has been little comparison of the FLE and RM. Apparent similarities and differences of the FLE and RM are considered here, and it is suggested that displacement of the moving target in the FLE or in RM involves overlapping or similar mechanisms, or more radically, that the FLE is a special case of RM in which the represented position of the moving target is assessed relative to another object rather than relative to the actual target position.

## **APPARENT SIMILARITIES OF THE FLE AND RM**

There are numerous apparent similarities of the FLE and RM. The existence of such similarities is consistent with a hypothesis that the FLE and RM result from overlapping or similar mechanisms.

### **PERCEPTUAL SIMILARITIES**

Perceptual similarities involve effects of (1) velocity, (2) visual field, (3) a reference point, (4) multiple modalities, and (5) crossmodal information.

## *Velocity*

The FLE (Nijhawan, 1994; Brenner and Smeets, 2000; López-Moliner and Linares, 2006; Cantor and Schor, 2007; Wojtach et al., 2008) and RM (Hubbard and Bharucha, 1988; Hubbard, 1990) increase with increases in velocity of the moving target in the picture plane. The FLE (Lee et al., 2008) and RM (Hubbard, 1996) increase with increases in velocity of the moving target in depth. The FLE (Whitney et al., 2000) and RM (Finke et al., 1986) increase or decrease with acceleration or deceleration, respectively, of the moving target in the picture plane. The velocity effect is one of the most replicated effects in the FLE or RM literatures.

## *Visual field*

Whether the moving target is in the left or right visual field does not consistently influence the FLE or RM, but if an effect of visual field occurs, the FLE (Kanai et al., 2004) and RM (Halpern and Kelly, 1993; White et al., 1993) are larger if the moving target is in the left visual field. Maus and Nijhawan (2009) presented variations of a horizontally moving FLE stimulus and reported a slightly greater effect of velocity on displacement of moving targets in the upper visual field, and Hubbard (2001) reported RM was larger for vertically moving targets in the lower than in the upper visual field.

## *Reference point*

RM is larger if a target moves toward rather than away from a landmark, and Hubbard and Ruppel (1999) suggested RM combined with a landmark attraction effect: If RM and landmark attraction operate in the same direction (motion toward a landmark), they sum and displacement is larger, whereas if RM and landmark attraction operate in opposite directions (motion away from a landmark), they partially cancel and displacement is smaller. Similarly, the FLE is larger if a target moves toward rather than away from the fixated region (Mateeff and Hohnsbein, 1988; Shi and Nijhawan, 2008), and Brenner et al. (2006) suggested the FLE combined with a bias toward the fixated region: If the FLE and bias toward fixation operate in the same direction (motion toward fixation), they sum and target displacement is larger, whereas if the FLE and bias toward fixation operate in opposite directions (motion away from fixation), they partially cancel and target displacement is smaller.
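
The additive logic proposed by Hubbard and Ruppel (1999) and by Brenner et al. (2006) can be illustrated with simple signed arithmetic. This is a minimal sketch under stated assumptions: the function name and the magnitudes (arbitrary units) are hypothetical, and strict additivity is an idealization of "sum" and "partially cancel" rather than a model fitted to any data.

```python
def net_displacement(forward_shift: float, attraction: float, toward_reference: bool) -> float:
    """Signed displacement along the direction of motion (positive = forward).

    forward_shift: forward displacement from RM or the FLE (hypothetical value).
    attraction: size of the pull toward the landmark or fixated region
        (hypothetical value).
    toward_reference: True if the target moves toward the reference point.
    """
    # The pull points toward the reference, so it adds to the forward shift
    # for motion toward the reference and subtracts for motion away from it.
    sign = 1.0 if toward_reference else -1.0
    return forward_shift + sign * attraction


# Illustrative magnitudes only: forward shift of 4.0 and attraction of 1.5.
toward = net_displacement(4.0, 1.5, toward_reference=True)    # 5.5
away = net_displacement(4.0, 1.5, toward_reference=False)     # 2.5

print(toward, away)
```

Because the assumed attraction is smaller than the forward shift, displacement for motion away from the reference is reduced but remains forward, which corresponds to partial rather than complete cancellation.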

## *Multiple modalities*

Most research on the FLE and RM presented visual stimuli. However, auditory stimuli can produce a FLE (Alais and Burr, 2003; Arrighi et al., 2005) and RM (Johnston and Jones, 2006), and haptic stimuli can produce a FLE (Nijhawan and Kirschfeld, 2003) and RM (Brouwer et al., 2005). It is possible that separate modality-specific mechanisms for the FLE and RM exist, but it is more parsimonious to posit that a single mechanism or a small number of higher-level mechanisms produces displacement of the moving target in the anticipated direction across multiple modalities (e.g., in higher-level processes or by top-down modulation of lower-level processes, Hubbard, 2005, 2006).

## *Crossmodal information*

Visual information can influence the auditory FLE (Alais and Burr, 2003) and auditory RM (Hubbard and Courtney, 2010), and auditory information can influence the visual FLE (Vroomen and de Gelder, 2004) and visual RM (Teramoto et al., 2010). Kinesthetic information can influence the visual FLE (Cai et al., 2000; Schlag et al., 2000). Such influences of crossmodal information on the FLE and on RM are not consistent with solely lower-level explanations of the FLE or RM.

## **COGNITIVE SIMILARITIES**

Cognitive similarities involve effects of (1) attention and cueing, (2) conceptual knowledge of target identity, (3) control and movement planning, (4) attribution regarding the source of motion, (5) frame of reference, and (6) neural mechanisms.

## *Attention and cueing*

The FLE (Sarich et al., 2007) and RM (Hayes and Freyd, 2002) increase if attention is divided between the moving target and a concurrent irrelevant stimulus. If the position of the flashed object or moving target is cued, valid cues result in a smaller FLE (Brenner and Smeets, 2000; Namba and Baldo, 2004; Shioiri et al., 2010; but see Khurana et al., 2000) and smaller RM (Hubbard et al., 2009) than do invalid cues. RM is influenced by verbal cues given prior to target presentation (Hubbard, 1994), but whether the FLE is influenced by verbal cues has not been reported. The FLE (Maiche et al., 2007) and RM (Hubbard and Ruppel, 1999) increase if the target moves toward another object, and this might reflect spatial distribution of attention.

## *Conceptual knowledge of target identity*

The FLE (Noguchi and Kakigi, 2008; Nagai et al., 2010) and RM (Reed and Vinson, 1996; Vinson and Reed, 2002) can be increased or decreased by conceptual knowledge regarding the identity of the moving target. Such influences suggest the FLE and RM do not result solely from lower-level processes, but rather result from higher-level processes or top-down modulation of lower-level processes. Also, the FLE (Moore and Enns, 2004) and RM (Kelly and Freyd, 1987) are diminished if the moving target does not maintain a consistent identity.

## *Control and movement planning*

The FLE is decreased if participants control presentation of the flashed object (López-Moliner and Linares, 2006), and decreased or increased if participants control target motion with a computer mouse (Ichikawa and Masakura, 2006, 2010) or robotic arm (Scocchia et al., 2009), respectively. RM is decreased if participants control the moving target (Jordan and Knoblich, 2004; Stork and Müsseler, 2004). If participants judge the position of a moving target after acquiring experience controlling target motion, RM is larger (Jordan and Hunsinger, 2008); this was attributed to effects of action planning, and such effects might similarly account for the larger FLE when participants controlled the target in Scocchia et al. (2009). The FLE (Nijhawan, 1994, 2008) and RM (Finke et al., 1986; Hubbard, 2005) aid in planning body movements and in interactions with environmental stimuli.

## *Attribution regarding the source of motion*

The FLE is influenced by whether participants believe they control target motion (Ichikawa and Masakura, 2006, 2010), and RM is influenced by whether participants attribute target motion to contact from an external stimulus or to an internal source (Hubbard, in press-b; Hubbard et al., 2001). Thus, the FLE and RM are decreased if motion is attributed to a source other than the target (e.g., the participant in the FLE, contact from another stimulus in RM), and this might result from higher-level processes or top-down modulation of lower-level processes.

## *Frame of reference*

Studies of the FLE usually involve judgment of relative position (but see Munger and Owens, 2004; Shi and de'Sperati, 2008; Becker et al., 2009), whereas studies of RM usually involve judgment of absolute position; however, regardless of whether relative or absolute position is judged, represented target position in the FLE and in RM is displaced in the direction of target motion. The FLE (Maiche et al., 2007) and RM are influenced by whether another stimulus provides a landmark (Hubbard and Ruppel, 1999) or surrounding context (Hubbard, 1993), and localization of the flashed object in the FLE (van Beers et al., 2001) and moving target in RM (Hubbard, 1990, 1997) are influenced by the direction of implied gravitational attraction<sup>1</sup>.

## *Neural mechanisms*

The FLE (Maus et al., 2013) and RM (Senior et al., 2002) are disrupted by transcranial magnetic stimulation of area MT. Kimura et al. (2011) suggested visual mismatch negativity might be related to the FLE and to RM. RM activates prefrontal cortex and anterior cingulate cortex (Rao et al., 2004); surprisingly, imaging information on the FLE has not been reported (although see Nijhawan, 2008, for discussion of potentially relevant neural mechanisms). The retina appears to compute a "crude extrapolation of the object's trajectory" (Gollisch and Meister, 2010, p. 155), and this is consistent with the FLE and with RM.

## **APPARENT DIFFERENCES OF THE FLE AND RM**

There are fewer apparent differences than apparent similarities regarding the FLE and RM. In many cases, differences that initially appear inconsistent with a hypothesis of overlapping or similar mechanisms in the FLE and RM can be reconciled with that hypothesis.

## **PERCEPTUAL DIFFERENCES**

Perceptual differences involve effects of (1) oculomotor behavior, (2) environment-centered direction, (3) object-centered direction, and (4) location within the target trajectory.

## *Oculomotor behavior*

The FLE (Nijhawan, 2001) and RM (Kerzel, 2000) for a continuously moving target are decreased and increased, respectively, if participants use smooth pursuit eye movements to track that target. However, such a difference does not rule out overlapping or similar higher-level mechanisms for the FLE and RM any more than differences in oculomotor behavior with continuous motion, implied motion, or frozen-action photographs rule out overlapping or similar higher-level mechanisms for RM (for discussion, Hubbard, 2005, 2006, 2010). Oculomotor behavior modulates target displacement for only some types of visual stimuli<sup>2</sup>, and so cannot be the sole cause of the FLE and RM with visual stimuli. Consistent with this, the FLE and RM occur with auditory, haptic, and crossmodal stimuli, and the FLE and RM are influenced by higher-level processes (Hubbard, 2005, in press-a).

## *Environment-centered direction*

The FLE is not influenced by whether targets descend or ascend (Ichikawa and Masakura, 2006, 2010), but RM is larger when targets descend than when targets ascend (Hubbard, 1990, 1997). Such findings would be consistent with a hypothesis of overlapping or similar mechanisms if the absolute positions of the moving target and the flashed object in the FLE were displaced forward equal distances (preserving their relative positions, cf. Hubbard, 2008), and these displacements were larger for descending than for ascending motion; however, measures of relative position typically used to study the FLE are not sensitive to displacement in absolute position.

## *Object-centered direction*

The FLE (Nagai et al., 2010) and RM (Nagai and Yagi, 2001) are smaller and larger, respectively, if a target moves forward (its typical direction of motion) rather than backward (opposite its typical direction of motion). Such findings would be consistent with a hypothesis of overlapping or similar mechanisms if the (1) flashed object and moving target were displaced in the direction of motion, (2) displacement of the flashed object and of the moving target were smaller for backward than for forward motion, and (3) decrease in displacement with backward motion was larger for the flashed object than for the moving target. The difference between the moving target and the flashed object would appear larger for backward than for forward motion, and the FLE (difference in relative positions) would look larger even though absolute displacement was smaller.
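The three conditions above can be checked with simple arithmetic. The following is a minimal sketch that uses purely hypothetical displacement values (arbitrary units, not data from any study cited here); the names `target_displacement`, `flash_displacement`, and `flash_lag` are illustrative assumptions.

```python
# Hypothetical forward displacements in arbitrary units (NOT empirical data).
# "forward" = typical direction of motion, "backward" = opposite direction.
target_displacement = {"forward": 10.0, "backward": 8.0}  # moving target (what RM measures)
flash_displacement = {"forward": 8.0, "backward": 4.0}    # flashed object


def flash_lag(direction: str) -> float:
    """Relative offset between moving target and flashed object (what the FLE measures)."""
    return target_displacement[direction] - flash_displacement[direction]


# Condition (2): both displacements are smaller for backward than for forward motion.
assert target_displacement["backward"] < target_displacement["forward"]
assert flash_displacement["backward"] < flash_displacement["forward"]

# Condition (3): the decrease is larger for the flashed object than for the target.
target_drop = target_displacement["forward"] - target_displacement["backward"]
flash_drop = flash_displacement["forward"] - flash_displacement["backward"]
assert flash_drop > target_drop

for direction in ("forward", "backward"):
    print(direction, "absolute:", target_displacement[direction],
          "relative:", flash_lag(direction))
```

With these assumed values, the absolute displacement of the target is larger for forward motion, whereas the relative offset is larger for backward motion, which is the pattern the reconciliation requires.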

## *Location within the target trajectory*

A FLE usually occurs if the flashed object is presented at the beginning or midpoint, but not at the end, of the target trajectory (Hubbard, in press-a); however, RM is measured at the end of the target trajectory (Hubbard, 2005)<sup>3</sup>. One possibility consistent with a hypothesis of overlapping or similar mechanisms is that at the end of the trajectory, simultaneous decay of displacement of the moving target and displacement of the flashed object preserves their relative positions, resulting in no FLE, whereas at the beginning or midpoint of the trajectory, decay of displacement of the flashed object, coupled with continuing RM for the still-moving target, results in a FLE. Alternatively, a flashed object near the end of the target trajectory might eliminate RM (Müsseler et al., 2002) or exhibit displacement similar to that of the target (Hubbard, 2008), and thus preserve the relative positions of the flashed object and moving target, resulting in no FLE.

<sup>1</sup>An influence of direction of implied gravitational attraction on represented position is referred to as *representational gravity* (RG; Hubbard, 1995, 1997). van Beers et al. (2001) reported flashed objects located above or below the trajectory of a horizontally moving target were displaced away from the trajectory, and this displacement was larger if flashed objects were below the trajectory. Although RG was not considered by van Beers et al., the data they reported are consistent with a combination of RG and a bias away from the trajectory: If RG and bias away from the trajectory operate in the same direction (flashed object below the trajectory), they sum and displacement is larger, whereas if RG and bias away from the trajectory operate in opposite directions (flashed object above the trajectory), they partially cancel and displacement is smaller. Similarly, Hubbard (1990, 1997) reported that horizontally moving targets were displaced downward and forward (RG and RM operate in orthogonal directions) and that forward displacement was larger for descending targets (RG and RM operate in the same direction and sum) than for ascending targets (RG and RM operate in opposite directions and partially cancel).

<sup>2</sup>The FLE (e.g., Rizk et al., 2009) and RM (e.g., Munger et al., 1999) also occur with implied motion (referred to as *station to station* motion in the FLE literature), and RM also occurs for frozen-action photographs (e.g., Futterweit and Beilin, 1994); neither implied motion nor frozen-action photographs evoke smooth pursuit eye movements. Oculomotor behavior involves the hardware implementation level or perhaps algorithmic level, whereas RM or the FLE might be caused by a higher-level mechanism involving the computational level (for discussion, Hubbard, 2005, 2006). Moreover, even if oculomotor behavior is correlated with extrapolation, such a linkage is not evidence that oculomotor behavior is causal in generation of extrapolation.

<sup>3</sup>A related forward displacement of target position at the beginning of the target trajectory is referred to as the *Fröhlich effect* (for review, Kerzel, 2010), but the relationship of the Fröhlich effect with the FLE or with RM is beyond the scope of this paper. Even so, it should be noted that just as the FLE might reflect RM in which the perceived final position of the moving target is assessed relative to an external stimulus (i.e., the flashed object) rather than relative to the actual final position of the target, a FLE in a flash-initiated display might reflect a Fröhlich effect in which the perceived initial position of the moving target is assessed relative to an external stimulus (i.e., the flashed object) rather than relative to the actual initial position of the target (for discussion, Hubbard, in press-a).

## **COGNITIVE DIFFERENCES**

Cognitive differences involve effects of (1) level of processing, (2) predictability, and (3) expertise.

## *Level of processing*

In the FLE, the moving target and flashed object are perceptually available during judgment, and the FLE is usually considered a lower-level perceptual phenomenon. In RM, the moving target vanishes before judgment, and RM is usually considered a higher-level cognitive phenomenon. However, higher-level cognitive variables influence the FLE (Noguchi and Kakigi, 2008; Nagai et al., 2010), and lower-level perceptual variables influence RM (Kerzel et al., 2001; Kerzel, 2002a); thus, the FLE (Hubbard, in press-a) and RM (Hubbard, 2005) each involve lower-level perceptual processes and higher-level cognitive processes. In the FLE and in RM, represented target position at the sampled time (when the flashed object appeared or moving target vanished, respectively) is displaced forward, and this might involve overlapping or similar mechanisms at a lower or higher level.

## *Predictability*

Forward displacement of a moving target in the FLE (Munger and Owens, 2004; Shi and de'Sperati, 2008) and RM (Finke and Freyd, 1985; Hubbard, 1990) suggests the FLE (Nijhawan, 1994, 2008) and RM (Hubbard, 1995, 2005) reflect predictions regarding subsequent target position<sup>4</sup>. Similarities in effects of attention and cueing in the FLE and in RM noted earlier suggest effects of predictability should be similar in the FLE and in RM. The FLE increases if predictability of the flashed object decreases (Baldo and Namba, 2002; Vreven and Verghese, 2005). However, RM decreases or increases if predictability (certainty) regarding target position is decreased by blocking target direction (Kerzel, 2002b) or increasing target blurriness (Fu et al., 2001), respectively. Manipulation of predictability in the FLE usually involves the flashed object, whereas manipulation of predictability in RM involves the moving target; it is not clear how predictability of a flashed object would influence localization of a moving target.

## *Expertise*

RM is increased for stimuli in a domain of expertise (Blättler et al., 2010, 2011). The FLE in a given domain might be compensated for by experts in that domain (Catteeuw et al., 2009), and this suggests the FLE is decreased for stimuli in a domain of expertise. Compensation for the FLE could involve smaller displacement of the moving target or larger displacement of the flashed object. Only in the former case would effects of expertise differ for FLE and RM; the latter case is consistent with effects of RM on a nearby stationary object (Hubbard, 2008) and effects of expertise on RM for a moving target.

## **CONCLUSIONS**

The FLE and RM involve forward displacement of the represented position of a moving target. A large group of variables have similar influences on the FLE and on RM, and this suggests the FLE and RM might arise from overlapping or similar mechanisms. A smaller group of variables appear to have dissimilar influences on the FLE and on RM, and potential ways to reconcile those differences with a hypothesis of overlapping or similar mechanisms were suggested. Interestingly, if perceived target position is assessed relative to the position of another stimulus, then displacement is usually referred to as a "flash-lag effect," whereas if perceived target position is assessed relative to the actual target position, then displacement is usually referred to as "representational momentum." This suggests the FLE is a special case of RM in which displacement of the moving target is assessed relative to the position of a flashed object rather than relative to the actual target position. A hypothesis that the FLE arises from overlapping or similar mechanisms as RM, or is a special case of RM, provides important constraints for theories of the FLE and of RM. Such a hypothesis also has heuristic value, as other variables that influence the FLE or RM (e.g., contrast, presence of feedback) could be predicted to have similar effects on RM and the FLE, respectively.

In defense of a mental extrapolation theory of the FLE, Nijhawan et al. (2004, p. 278) stated "a newer interpretation of a given phenomenon can be accepted over and above an existing one only if the newer interpretation is conceptually simpler (requires fewer assumptions) and/or is capable of explaining a wider class of empirical findings." By these criteria, an explanation of (at least some examples of) the FLE as a special case of RM should be preferred: RM is simpler than the FLE (e.g., RM involves one stimulus rather than the relationship between two stimuli) and accounts for a wider range of findings (e.g., involving a single stimulus as well as involving two stimuli). Indeed, the term "representational momentum" has a longer history in referring to extrapolation of a moving target, and so it might be appropriate and more parsimonious to consider this term and mechanism when referring to automatic extrapolation of target position, regardless of whether that extrapolation is measured relative to the position of another object or relative to the actual target position. Such an approach is consistent with models of spatial representation that address the FLE and RM (e.g., Müsseler et al., 2002; Jancke and Erlhagen, 2010) and suggests the possibility of overlapping or similar mechanisms of extrapolation.

<sup>4</sup>It should be noted that there are at least two different senses of "predict" in the displacement literature. Munger and Minchew (2002) use "predict" to refer to an explicit and deliberate judgment regarding the subsequent potential position of the target and use "remember" to refer to judgments of the final position (see also Finke and Freyd, 1985), whereas Nijhawan (2008) uses "predict" to refer to an implicit and automatic process that extrapolates the representation. In the current paper, "predict" and "predictability" are used in the sense of an implicit and automatic extrapolation.



Watanabe, K. (2010). "Conceptual influence on the flash-lag effect and representational momentum," in *Space and Time in Perception and Action,* eds R. Nijhawan and B. Khurana (New York, NY: Cambridge University Press), 366–378.




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 February 2013; accepted: 05 May 2013; published online: 23 May 2013.*

*Citation: Hubbard TL (2013) Do the flash-lag effect and representational momentum involve similar extrapolations?. Front. Psychol. 4:290. doi: 10.3389/fpsyg.2013.00290*

*This article was submitted to Frontiers in Consciousness Research, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Hubbard. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Postdiction: its implications on visual awareness, hindsight, and sense of agency

## *Shinsuke Shimojo\**

*Shimojo Psychophysics Laboratory, Division of Biology and Biological Engineering/Computation and Neural Systems, California Institute of Technology, Pasadena, CA, USA*

### *Edited by:*

*Yuki Yamada, Yamaguchi University, Japan*

#### *Reviewed by:*

*Takahiro Kawabe, Nippon Telegraph and Telephone Corporation, Japan*

*Talis Bachmann, University of Tartu, Estonia*

#### *\*Correspondence:*

*Shinsuke Shimojo, Division of Biology and Biological Engineering/Computation and Neural Systems, California Institute of Technology, MC 139-74, 1200 E. California Blvd., Pasadena, CA 91125, USA e-mail: sshimojo@caltech.edu*

A few postdictive perceptual phenomena are known, in which a stimulus presented later seems causally to affect the percept of another stimulus presented earlier. While backward masking provides a classical example, the flash lag effect stimulates theorists with a variety of intriguing findings. The TMS-triggered scotoma, together with its "backward filling-in," offers a unique neuroscientific case. Findings suggest that various visual attributes are reorganized in a postdictive fashion to be consistent with each other, or to be consistent in a causality framework. In terms of the underlying mechanisms, four prototypical models have been considered: the "catch up," the "reentry," the "different pathway" and the "memory revision" models. By extending the list of postdictive phenomena to memory, sensory-motor and higher-level cognition, one may note that such a postdictive reconstruction may be a general principle of neural computation, ranging from milliseconds to months in time scale, and from local neuronal interactions to long-range connectivity, in the complex brain. The operational definition of the "postdictive phenomenon" is applicable to such a wide range of sensory/cognitive effects across a wide range of time scales, even though the underlying neural mechanisms may vary across them. This has significant implications in interpreting "free will" and "sense of agency" in functional, psychophysical and neuroscientific terms.

#### **Keywords: postdiction, flash lag, TMS, causality perception, hindsight, free will, sense of agency**

## **INTRODUCTION**

This paper will review postdictive phenomena in perception and cognition, mainly from the author's own work with his collaborators but from some classical studies as well, to discuss the implications of these works. The first part of the paper will introduce a number of classical examples of "backward perceptual phenomena" (section Backward Perceptual Phenomena), as well as the flash-lag effect and its variations as more modern examples (section Flash-lag Effect, its Variations, and Object Updating). These phenomena will clearly suggest that there is a limited time range (on the order of 100–200 ms) within which the processing of a stimulus presented later can affect the percept of another stimulus presented earlier. Starting from here, we will extend our review and discussion in several different directions. One unique contribution of ours is the TMS-triggered scotoma and the backward filling-in, which provide us with some insights into how cortical signals are dynamically reorganized (section TMS-Induced Scotoma, and Backward Filling-in). These may provide an empirical basis upon which to explore schematic prototypes of possible mechanisms (section Underlying Neural Mechanisms?). We will further extend our list of postdictive phenomena to (a) the memory and sensory consequences of voluntary movements (section Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements), which leads to a further discussion of neural and computational mechanisms (section Neural and Computational Considerations), and (b) "hindsight bias" and cognitive reconstruction for consistency, at even longer time scales (section Hindsight Bias, and Cognitive Consistency). Whereas the underlying neural mechanisms in these cases may be different from those of the more sensory phenomena, the operational definition, the functional significance, and the computational structure at an abstract level of the "postdictive phenomenon" may still hold.

In the last few sections, we will further extend our discussion to Benjamin Libet's well-known claims, and the "free will" as endangered (section Libet's Claims, and the "free will" Endangered?). We will consider "sense of agency" as a postdictive attribution and an authentic illusion, as a solution to this contention (section "Sense of agency" as Postdictive Attribution and an Authentic Illusion).

This paper is not meant to be an inclusive overview of backward phenomena in general (for a systematic overview in the context of prediction vs. postdiction to cope adaptively with neural delay, see Bachmann, 2013). Rather, it aims to focus on the variety of phenomena at a wide range of time scales, to discuss possible underlying mechanisms as well as philosophical/real-world implications.

## **BACKWARD PERCEPTUAL PHENOMENA**

There are a few classical perceptual phenomena in which a stimulus presented later seems causally to affect the percept of another stimulus presented earlier. (To avoid ambiguity, "seems to" above means "seems to scientists," and "percept" means the "percept to the observer.") We would like to define "postdiction" or "postdictive perceptual phenomena" as such, throughout this paper. For example, a masking stimulus that is presented later can suppress the visibility of a target that is presented earlier in physical time (backward masking; see **Figure 1**).

Kolers and von Grunau (1975, 1976) examined the "color phi" situation. The stimuli are similar to those for the classical apparent motion ("phi"; **Figure 2I**), except that the two stimuli (snapshots) are colored differently (e.g., green and red). Their observer tended not to see a smooth change of colors, but instead saw an abrupt change of color at one point in the trajectory (**Figure 2II**). However, Kolers and von Grunau (1976) also reported that a shape version (with two distinctively different shapes in the two frames) works better (**Figure 2III**). In this case, a quick yet smooth morphing of contours/shape can be observed, which is clearly different from the color case. Moreover, this observation seems to hold even in an abrupt, one-shot presentation, as opposed to repeated presentations of the same sequence.

The one-shot observation case is more stringent and intriguing, particularly when no clue or knowledge is given as to where and what will be presented in the second frame. In fact, even the most classical case of apparent motion should be considered postdictive under such a condition because, quite logically, the smooth trajectory of motion can be constructed only after the information about the second stimulus is given. Indeed, we have demonstrated that even in a condition in which the apparent motion can be leftward or rightward randomly across trials, the perception of apparent motion is no less obvious and/or smooth than in the repeated case. Moreover, by adding a probe dot around the spatio-temporal trajectory of the apparent motion, we demonstrated that re-ordering of the temporal sequence of events occurs along with the spatio-temporal trajectory of motion (Nadasdy and Shimojo, 2010).

Examples are not limited to vision. In the cutaneous modality, the most well-known form perhaps would be the "cutaneous rabbit" effect (Geldard and Sherrick, 1972; see **Figure 3**). For this demonstration, the stimulus sequence is composed as follows: three taps are presented sequentially on an arm at equal temporal intervals but at different locations (e.g., the first and second taps at the same location, with the third jumping to a different location, as shown in **Figure 3**). In effect, the second tap is mislocalized in the direction of the third.

[Figure caption fragment: "… target from being visible. The backward case, in particular, poses a paradox in the framework of a single-line, or feedforward ("Cartesian"), model of time."]

These and other backward perceptual phenomena are mostly established at phenomenological and experimental levels. They obviously pose a hard problem for any interpretation based on the "one-directional, single arrow" analogy of time, along which only an earlier event can causally affect a subsequent event. One may call this the "Newtonian" model (or "Cartesian theater" after Dennett and Kinsbourne (1992); see the same for a theoretical review of the postdictive phenomena). In neural processing terms, the model may be characterized as strictly feedforward. When one considers mental time, however, this would be an unnecessarily strong, and inappropriate, analogy to physical time. As will be suggested later (section "Sense of agency" as Postdictive Attribution and an Authentic Illusion), the perceptual sequence of two perceptual events (as the content of percept, in the Mind Time) should be strictly separated from the physical sequence of the corresponding neural events (in the Brain Time: see **Figure 6**). In other words, a strict isomorphism is not guaranteed to hold in the microscopic temporal domain (just as no direct isomorphism holds between spatial perception and the spatial relationships of neural activity in the brain). We will revisit this point in detail later (section "Sense of agency" as Postdictive Attribution and an Authentic Illusion).

[Figure caption fragment: "… impression of shape morphing, and seemingly works even under a one-shot presentation without prior knowledge or a cue."]

There is yet another line of perceptual phenomena which are closely related to the backward phenomena, and indeed yielded the concept of "postdiction" via debates concerning the underlying mechanisms—that is, the flash-lag effect and variations of it, as discussed next.

## **FLASH-LAG EFFECT, ITS VARIATIONS, AND OBJECT UPDATING**

Consider a smoothly moving object together with another, flashed object. Even when the flashed object is physically aligned with the moving object, the moving object tends to be mislocalized ahead in the direction of motion (**Figure 4I**). This is called the "flash lag effect" (Nijhawan, 1994). The initial interpretation was that the brain predicts along the motion trajectory, to compensate for its own neural processing delay by perceiving the moving object ahead (but only for the moving stimulus, not for the flashed stimulus, which is harder to predict). This was consistent with other circumstantial evidence that the brain compensates for its own delay (e.g., Changizi et al., 2008). However, a variety of other hypotheses/theories have been proposed to account for the effect, and none has been conclusive thus far (for a review, see Nijhawan, 2002).

What is critical in the current context is the following counterintuitive fact: the "flash terminated" case, where the moving and the flashed object disappear at the same time (**Figure 4II**), does not yield the effect (that is, the flashed object is not mislocalized). The "flash initiated" case, on the other hand, where the two objects appear at the same time, with one continuing to move while the other disappears immediately, as "flashed" (**Figure 4III**), yields the effect (Nijhawan, 2008). Obviously, this is counterintuitive under any view of the effect based upon the predictability of the position of the moving target from its prior trajectory. To account for such a retrospective modulation of conscious visual perception, Eagleman and Sejnowski (2000) proposed a "postdiction" mechanism in which the percept attributed to the time of the flash is a function of events that occur in a time window of a maximum of 80 milliseconds after the flash. Also note, with regard to the main theme of this paper, that they consider the postdictive process as a mechanism to yield visual awareness, or a conscious percept (beyond the mere operational definition of the "postdictive phenomena"; section Backward Perceptual Phenomena).
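The asymmetry between the flash-terminated and flash-initiated cases can be illustrated with a toy calculation. The sketch below is not Eagleman and Sejnowski's model: it simply assumes, for illustration, that the position attributed to the moment of the flash is the plain average of the moving object's positions over the window after the flash. The function name `perceived_offset`, the uniform averaging, and the speed value are all assumptions.

```python
def perceived_offset(speed, present_before, present_after, window_ms=80, step_ms=1):
    """Toy postdictive estimate of the moving object's position at flash time.

    The flash occurs at t = 0, when the moving object is at position 0.
    speed is in degrees per millisecond. Positions sampled in the window
    [0, window_ms] after the flash are averaged (an illustrative assumption).
    present_before is accepted but deliberately ignored: in this toy account
    the pre-flash trajectory does not enter the estimate.
    """
    samples = [0.0]  # position at the moment of the flash
    if present_after:
        # The object keeps moving, so later positions fall inside the window.
        samples += [speed * t for t in range(step_ms, window_ms + 1, step_ms)]
    return sum(samples) / len(samples)


speed = 0.01  # hypothetical value: 10 deg/s expressed in deg/ms

conditions = {
    "continuous": (True, True),
    "flash initiated": (False, True),
    "flash terminated": (True, False),
}
for name, (before, after) in conditions.items():
    print(name, round(perceived_offset(speed, before, after), 3))
```

Under these assumptions the continuous and flash-initiated conditions give the same forward offset, and the flash-terminated condition gives none, because only post-flash samples contribute to the estimate.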

**Figure 4IV** illustrates the "generalized flash lag" effect (Sheth et al., 2000). The sustained object does not change its position, but instead changes smoothly in one visual attribute such as color (or luminance, size, or spatial randomness, for instance), and another object is briefly flashed with a color that matches the changing object's color at that moment. In the example illustrated in the figure, the flashed yellow is perceived as simultaneous with the orange color of the color-changing object. The effect is structurally similar to the classical flash lag in the space/position domain; that is, a color of the changing object subsequent to the moment of presentation is perceived as simultaneous with the flash. It is also critical to note the asymmetric pattern of results, similar to that in the classical flash lag effect: the generalized flash lag tends to occur when the second half of a stimulus movie is presented (starting from the flash of the target; the "flash initiated" case), but not when the movie is terminated there (the "flash terminated" case).

It may be fair to say that some "non-postdictive" accounts have been proposed for the flash lag effect, and specifically for the flash terminated case. For example, one may rely on the alleged extra neural delay (from the stimulus onset to the onset of conscious perception) of the suddenly flashed object relative to the moving object (e.g., Whitney and Murakami, 1998). This account may be generalized to any sort of smooth stream of an object representation with an abrupt onset of another object, and thus possibly to the generalized flash lag effect. However, the mere fact of a different neural delay may be somewhat dubious (see Moutoussis and Zeki, 1997; Nishida and Johnston, 2002). Moreover, the situation seems to be a bit more complicated, and other factors, such as whether stimulation comes in a stream or as a flash, play a role (Bachmann, 2010, 2013; Bachmann et al., 2012).

Either way, the postdictive account of the flash lag effect, especially of the flash terminated case, is worth mentioning here for several reasons. First, it may be considered the original case in which the term "postdiction" was specifically employed to describe the retrospective modulation of visual awareness. Second, in line with our strictly operational definition of the postdictive phenomena (section Backward Perceptual Phenomena), a physically subsequent event (of the moving object) affects the perceptual (spatio-temporal) relationship between it and another flashed object. Therefore, the neural delay accounts should be considered "non-postdictive" mechanisms which are still proposed to account for the postdictive (flash-lag) effect (operationally defined). Third, this is a rich perceptual phenomenon with a wide range of variations where a physical spatio-temporal sequence of visual stimuli leads to a percept of a different sequence, thus providing ample opportunities to investigate the underlying mechanisms of postdictive phenomena.

One may still wonder what relationship the phenomena described so far have to the idea of "object updating" by James Enns et al. The object updating framework may be considered the closest to the idea of postdiction and related phenomena which are outlined here. A closer comparison may reveal similarity as well as the current implications beyond those of the object updating.

Object updating refers to the process whereby recently sampled information is integrated with an existing representation of a scene, resulting in an updated version (e.g., Lleras and Moore, 2003; Lleras and Enns, 2004; Moore and Enns, 2004). They argue that this theoretical framework provides a more comprehensible account for a variety of effects, such as the object substitution masking (especially the *Negative Compatibility Effect*, or the *NCE*), and the flash lag effect, etc.

The negative compatibility effect (NCE) is the surprising finding that visual targets that follow a brief prime stimulus and a mask can be *identified more rapidly when they are opposite* rather than identical to the prime. This was originally taken to reflect a competition between inhibitory unconscious processes and excitatory conscious processes (Klapp and Hinkley, 2002). However, Lleras and Moore (2003) offered an alternative account based on the object updating. If the perceptual processing interacts between the prime and the mask features, these seemingly neutral masks may, in fact, act as strong positive primes for the features that are not shared between prime and mask, they argue.

Likewise, object updating may provide an alternative account, especially for the classical flash lag effect and some special variations of it (Nijhawan, 2008), where a smoothly moving object appears to be ahead in its trajectory relative to another, simultaneously flashed object. The effect occurs when the moving object continues following the flash, but is eliminated if the object's motion path ends with the flash, as described above (the "flash terminated" case). In the object updating framework, this may be interpreted as proving the necessity of updating the object representation after the flash. It seems to be consistent with the postdictive account of the effect, but with a somewhat different emphasis.

Whereas object updating emphasizes the distinction between a representation of a new object vs. that of the same object with feature changes, the postdictive construction view emphasizes that the content of the conscious percept (e.g., the spatial alignment judgment of the two objects in this case) is a postdictive construct at an implicit level. The critical phenomenological observation here is that the updated representation is "experienced" as a percept, but "referred back" in time to the original moment of focus. This will be clearer especially in the case of the postdictive phenomena at a longer time scale (section Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements), but it is isomorphically true in nearly all the cases dealt with in the current paper.

The object updating theory seems to be relatively limited to a short time range within several hundred ms or so, and to only a handful of visual effects, as mentioned above. The critical question raised in the section Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements and the subsequent sections will be whether the postdiction framework, while highly consistent with object updating, will offer a more inclusive (or at least continuous) list of phenomena spanning a much wider range of time scales, from tens of ms (at the level of sensation) to months (at the level of long-term memory and cognition).

## **TMS-INDUCED SCOTOMA, AND BACKWARD FILLING-IN**

Transcranial Magnetic Stimulation (TMS) is an intriguing technique which is used to stimulate or to suppress visual cortical activity without stimulating the retina with light. It is intriguing specifically in the current context because, when using it, one may investigate how the direct manipulation (activation/suppression) of visual neuronal activity can interact, and be integrated, with the retinal signals.

We demonstrated that an artificial and temporary scotoma can be created by a combination of a visual stimulus and a single-pulse TMS (Kamitani and Shimojo, 1999). In each trial, there was a fixation point on a gray background, and a large-field grid stimulus was presented briefly (40–80 ms). After a variable delay, a single-pulse TMS was applied to the occipital scalp (**Figure 5I**). When the delay of the magnetic stimulation was within 67–200 ms, the observer typically reported a scotoma, i.e., a gray homogeneous patch in the visual hemifield contralateral to the TMS (**Figure 5II**). The phenomenology was qualitatively common and reliable across participants. We could even ask them to draw a gray-filled elliptic patch by adjusting its size via a mouse. **Figure 5II** shows an example of an actual data set obtained that way. The results of five trials within a participant with a fixed delay were superimposed, in order to show the across-trial reliability of the effect.

In a subsequent experiment, we maintained the stimulus sequence but changed the color of the background: there was initially a red (green) background for 5 s, then black-and-white stripes for 80 ms, and finally a green (red) background for 5 s (**Figure 5III**). (A two-dimensional grid was used in the first experiment, whereas stripes were used in this experiment. As a result, the scotoma was compressed along the orientation of the stripes, which is not essential in the current context.) With this design, we tried to address the following question: why did we perceive the scotoma as filled with gray in the first experiment? Was it because all of the color-selective neurons were equally suppressed by the TMS (hypothesis 1: the "broken color TV" hypothesis), or merely because the corresponding retinal region was occupied by gray as a part of the preceding background (hypothesis 2: the "forward filling-in" hypothesis)? The participant's task in this particular experiment was to report the filled color inside the scotoma by pointing and clicking on a continuous color scale showing a smooth transition from a pure gray to the most saturated red (green).

The results contradicted both of the hypotheses above, as shown in **Figure 5III**. The colors in the scotoma in the upper and the lower rows of the figure are the actually selected colors, averaged across the participants. Thus, when the subsequent background was green (and the preceding one red; the upper row), a green-filled scotoma resulted. When the subsequent background was red (the lower row), the scotoma was red-filled. Thus, a sort of "backward" filling-in seemed to occur.

**Figure 5IV** schematically summarizes the result. When a local region of the topographical map of the visual field in the early visual cortices was suppressed by the TMS, the corresponding region in the grid/stripe pattern was perceived as a scotoma. The scotoma, however, was filled backward from the subsequent background color (indicated by the black arrow); thus, the stimuli presented only sequentially on the retina (i.e., the grid pattern and the subsequent background color) were perceived simultaneously in the particular spatial configuration (i.e., the elliptic scotoma in the large BW-patterned field). The filling-in is "backward" in this limited sense.

According to our informal observations, qualitatively identical results can be observed when the colored backgrounds are replaced with textured ones (although colors were the easiest to identify and thus to report). Therefore, the backward filling-in is a general phenomenon, not specific to color. When a part of the topographical representation is lost (by the TMS with a delay shorter than 200 ms), the visual cortex automatically utilizes the latest input in the particular region (the scotoma) at the moment and fills it in. This is consistent with our findings of the TMS-triggered replay of a visual stimulus (Wu and Shimojo, 2002, 2004; Halelamien et al., 2007; Vasudevan et al., 2009), indicating that the content of a conscious percept is determined by the interplay of the retinal input and the internal state of the visual cortex at the moment.

Since this is a very special case with TMS, not with regular retinal inputs, it may not be appropriate to include it in the list of the "postdictive visual phenomena." Indeed, one may account for the backward filling-in effect by relying strictly on the instantaneous effect of the TMS on the visual cortex, as opposed to the neural conductance delay from the retina to the primary visual cortex, which is at least in the vicinity of 80–100 ms. But even so, this may still be considered a special case of the "catch up," described as the first prototypical neural model of the postdiction mechanism in the next section (Underlying Neural Mechanisms?). Moreover, at the phenomenological level, the background color (or pattern) in the scotoma area is perceived as "simultaneous" with the surrounding target pattern, which is qualitatively different from the temporal sequence of the visual stimuli. This is consistent with the operational definition of the postdictive phenomena. The TMS and retinal inputs are interactively reconciled to yield a stable spatial percept (for instance, the shape of the scotoma is filled in and thus squeezed along the direction of background stripes; Kamitani and Shimojo, 1999), and this is reminiscent of the case of "smooth pursuit mislocalization," which will be described in section Pursuit Mislocalization, and Effects of the Spatial Context.

The set of findings with TMS allows us a glimpse into the dynamic process of integration to yield a postdictive effect at the early cortical levels within a 100–200 ms time window. Although in the previous examples of visual postdiction phenomena there was no direct stimulation/suppression of the visual cortical activity, a qualitatively similar process may operate during the dynamic reorganization of inputs. Overall, these findings indicate that dynamic, and at least partly postdictive processes are involved in the neural mechanisms yielding visual awareness, or a conscious percept.

Before moving on to extend our list of postdictive phenomena to more macroscopic time scales, we would like to consider what prototypical neural/psychological mechanisms are conceivable as candidate underlying mechanisms (next section).

## **UNDERLYING NEURAL MECHANISMS?**

We have reviewed backward phenomena, using our own definition, at sensory/perceptual levels. It may be time to consider what alternatives we have in terms of possible neural mechanisms. Albeit schematic, we can list some, as illustrated in **Figure 6**. External (environmental, or physical stimulation) Time, Brain (neural/physiological) Time, and Mind Time are represented separately in these diagrams. The oblique arrows denote neural conductance delays (the more oblique from the vertical direction, the slower).

A remark may be necessary here with regard to the distinction between the Brain Time and the Mind Time. "Mind Time" will be used as a short name for "mental representation of the temporal events." Most scientists naively assume that the Brain Time defines the Mind Time, and thus equate them, a view with which the author cannot agree. A perceptual sequence of events, as a content of a percept, should be logically dissociated from the physical sequence of the neural correlate events which caused it. When an event A is perceived prior to another event B ("A→B"), such a stream of percept ("A→B") should also have neural correlates. The neural correlates, however, do not have to take the form of two dissociable neural events corresponding to A and B respectively, nor do they have to occur in this physical sequential order [the neural event(A) → neural event(B)], although such one-to-one mapping between the perceptual events and the neural events may be found at the peripheral or the lower-level visual representations. This point may not seem to be necessary in this section, but its significance will be clearer when we argue against the "first-order isomorphism" in the temporal domain and Benjamin Libet's view later (sections Libet's Claims, and the "free will" Endangered? and "Sense of agency" as Postdictive Attribution and an Authentic Illusion).

The first intuitive option is the "catch up" model (**Figure 6I**). It has been accepted that the same retinal input may arrive at the primary visual cortex with various timings, and the same may apply to the lower and upper levels of the visual hierarchy in general. Thus, a fast signal of a physically subsequent stimulus B may catch up with a slow signal of a stimulus A and causally affect the percept of A (e.g., its visibility, as in the case of backward masking; Breitmeyer and Williams, 1990), as indicated by the red circle in the figure. The slow and the fast signals have been associated with either X and Y channels, or sustained (P) and transient (M) channels (e.g., Breitmeyer, 1993) in terms of the neural implementation. This may appear confusing to some readers because this model is based solely on feedforward pathways, yet is claimed to be a potential account for postdiction. Note once again that throughout this paper, the definition of the postdictive phenomena is strictly operational (section Backward Perceptual Phenomena), and the proposed mechanism can be either feedforward like this one, or reentrant (as in the next model), which can be considered postdictive at the implementation level, or even more explicitly postdictive, as in Benjamin Libet's model (as will be described in section Libet's Claims, and the "free will" Endangered?).
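The timing logic of the "catch up" idea can be written down as simple arithmetic. The following is only a minimal sketch under stated assumptions: the `Signal` class, the `catches_up` helper, and all latency values are hypothetical and chosen for illustration, not taken from any of the cited studies. Arrival in Brain Time is assumed to be the physical onset plus a fixed conduction delay.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One stimulus and the (assumed) latency of the pathway carrying it."""
    name: str
    onset_ms: float     # physical onset of the stimulus (External Time)
    latency_ms: float   # assumed conduction delay to the cortical site

    @property
    def arrival_ms(self) -> float:
        # Arrival in Brain Time = physical onset + conduction delay.
        return self.onset_ms + self.latency_ms


def catches_up(first: Signal, second: Signal, window_ms: float = 0.0) -> bool:
    """True if `second` arrives no later than `window_ms` after `first`."""
    return second.arrival_ms <= first.arrival_ms + window_ms


# Hypothetical numbers, for illustration only.
a = Signal("A (slow, sustained channel)", onset_ms=0.0, latency_ms=100.0)
b = Signal("B (fast, transient channel)", onset_ms=30.0, latency_ms=60.0)

print(a.arrival_ms, b.arrival_ms, catches_up(a, b))
```

With these illustrative values, B is physically later than A yet reaches the cortical site earlier, which is the situation in which a purely feedforward scheme could still produce an operationally "backward" effect.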

**Figure 6II** denotes an alternative idea ("reentry"), which assumes vigorous feedback from a higher level to a lower level of the visual information processing hierarchy. It is such feedback pathways that enable various sorts of contextual effects, including some postdiction (as indicated by the thick blue and green downward arrows) and even conscious awareness (Lamme, 2001; Fahrenfort et al., 2007). This may allow more room to account for paradoxical causal perception, as will be described later (section Spatial Memory Updating with Perception). Both the "catch up" and the "reentry" models have been entertained especially for backward masking and the flash lag.

The third option ("different pathways") heavily relies on the known dichotomy of two visual pathways (ventral vs. dorsal, "what" vs. "where" or "cognition" vs. "action"; Goodale and Milner, 1992). This scheme is meant to explain dissociative, or selective, deficiency in patients, as well as differences between explicit and implicit measures (such as reflexive reaction times vs. elaborated, conscious perception; e.g., Vorberg et al., 2003). However, it can also be applied to account for some of the seemingly paradoxical, postdictive phenomena, as will be described later (section Neural and Computational Considerations). As a real-world example, competitive 100 m sprinters often report that their legs start moving even before their conscious awareness of the starter's pistol sound. This can be interpreted in terms of multisensory prior entry, i.e., a difference in neural delay between different sensory-motor pathways, such as auditory→motor vs. motor→kinesthetic. If so, this actually reflects a rare failure in the ordinary postdictive reconstruction process of causality, as in "the pistol sound triggered my leg reaction," thus allowing us a glimpse into what is normally occurring at the implicit level, before the postdictive process operates (we will return to a similar real-world example in section Libet's Claims, and the "free will" Endangered? and **Figure 10II**). Given that this model incorporates global pathway/connectivity aspects of the brain, it may have more flexibility to account for paradoxical causality like this.

As the fourth option, we can add the "memory revision" model (Dennett and Kinsbourne, 1992), in which a tentatively established memory representation may be revised later. The object updating idea (described in section Flash-lag Effect, its Variations, and Object Updating) may be considered a specific example of it. This model may be more appropriate for the phenomena with a longer time scale, as will be described in the next and the subsequent sections.

These concepts exemplify the prototypical ideas of mechanisms underlying various sorts of postdictive phenomena. They are not necessarily mutually exclusive, especially because some tap into existing neural mechanisms while others emphasize more hypothetical, theoretical structures. More recent models may be considered as hybrids. For example, Bachmann's (2013) "perceptual retouch" model seems to have incorporated both the "reentry" and the "different pathways" ideas. Likewise, the fourth model, i.e., the revision of memory, may be involved more or less in all the other models (although it depends on the definition of "memory") because those inevitably refer to some neural representation of sensory input, which may be called memory (albeit very iconic, or short-term). The distinction between perception and memory may be important when one discusses neural implementation, but it will become less important when we extend this review to a longer time scale, because of the similarity and the continuity in function and abstract structure (sections Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements and Hindsight Bias, and Cognitive Consistency).

What is also noteworthy here is that some of these models (especially the first and the third) are rather conservative, in that the temporal sequence of the relevant neural events can directly determine, and thus be "read out" as, the perceived order (and in some cases causality) of the perceived events. Thus once again, "non-postdictive" models (such as the "catch up" and the "different pathways") as well as "postdictive" neural implementations (such as the "reentry") can potentially offer alternative accounts for the "postdictive" phenomena in their operational definition.

For the rest of this paper, we will every now and then refer back to these diagrams. When we discuss the relevance of Benjamin Libet's claims, especially the "backward referral" claim, we will point out some potential problems related to "the first-order isomorphism" between the Brain and Mind Times, which is implicitly assumed, particularly at microscopic time scales, in these models (with the possible exception of the memory revision model). A more intriguing possibility, based on a strict distinction between perceived timing as a content of perception vs. the physical timing of its neural correlates, will be introduced.

Thus far, we have discussed vigorous postdictive reorganization on the time scale of hundreds of milliseconds, whereas now we will include memory updating and perceptual reorganization on a time scale of one to several seconds (section Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements), as well as higher cognitive functions including hindsight in visual exploration/detection, and the postdictive reconstruction of causal attribution in long-term memory, where the relevant time scale will range from minutes to days (section Hindsight Bias, and Cognitive Consistency).

The extension of our list of postdictive phenomena to longer time scales and to memory will have two implications. First, it will point to the possibility that postdiction may be a very general principle from sensation to cognition to memory, with time delays from tens of milliseconds to months (section Neural and Computational Considerations). Second, it will make it more feasible to consider visual awareness as extra-short (iconic) visual memory, which is phenomenologically and structurally continuous with short-term memory. For an intuitive example, consider a "percept" of flickering light. It is directly "perceived" as such, but some form of memory is logically necessary "to perceive" it.

## **EXTENDING THE "POSTDICTION" CONCEPT TO THE MEMORY AND THE SENSORY CONSEQUENCES OF VOLUNTARY MOVEMENTS**

Perceptual events are constantly consolidated into memory, but the transition process is not precisely akin to simply creating a Xerox copy. Instead of faithfully duplicating the perceptual structure at the time, it rather reorganizes the event sequence in accordance with various principles, such as information compression, better Gestalt, consistency with regard to the relevant context, and a causal framework. Wu et al. (2009), for example, demonstrated that a flash that actually caused reappearance of the target stimulus in awareness (after having been "subliminated" by motion induced blindness, Bonneh et al., 2002) was itself consciously perceived as appearing later than the reappearing target. Thus in this case (as in many other cases dealt with in the current paper), the perceived temporal sequence of the two events is detached from, and inconsistent with, the physical causality. Note that the "catch up" model (in section Underlying Neural Mechanisms?) may suffice to explain the illusory temporal order, but a conscious percept may require more, including causal attribution, at least in some cases. Wu et al. (2009) prefer the reentry model as an explanation, but another account may be feasible, based on a neural delay difference and a distinction between specific and nonspecific processes (Bachmann and Aru, 2009).

This type of backward cognitive reorganization has been reported repeatedly in cognitive and social psychology. For example, F. C. Bartlett in his classical study (1932) used American Indian folktales, which may appear illogical or unrealistic to average Westerners, as materials in a recall experiment. Stories recalled by British participants (students) revealed some distinctive eliminations, re-ordering, and biases that made the stories more consistent and logical. In their seminal series of studies, E. F. Loftus and her group (1979) demonstrated that witnesses' memories of an accident can be biased by the wording of questions/instructions, and by the context and episodic memory of recall itself. Memory was reorganized mainly for consistency, information compression, and ease of retrieval in these cases. In some cases this can be interpreted as a simple confusion of temporal sequence, but in most cases a causal interpretation, or even a revision of the content of memory, is involved. Similar causal misattribution/memory modification has been observed when one is asked for the "intention" of action as the cause of a movement (as will be mentioned again in section "Sense of agency" as Postdictive Attribution and an Authentic Illusion).

Memory reorganization of this type is commonly known, but the current question is as follows: could the same type of postdictive reconstruction of memory occur at a much lower sensory-motor level, and on a much shorter time scale? If such a case exists, then it would bridge the gap between the backward perceptual phenomena (reviewed in the previous sections) and memory, raising the intriguing theoretical possibility that postdictive construction may be a general neural-cognitive principle that governs lower sensory to higher cognitive processes, from microscopic to macroscopic time scales.

## **SPATIAL MEMORY UPDATING WITH PERCEPTION**

In **Figure 7I** (revised from Sheth and Shimojo, 2000), a target dot undergoes smooth translational motion at a constant speed from left to right on a CRT display. When it disappears, a tone with either a high or a low pitch, chosen randomly, is played. Depending on the tone pitch, the participant was asked to report either the initial or the final position of the target, respectively, by moving the cursor and clicking the button as soon as possible. The stimuli and the task were as simple as that, except for one critical aspect: a random dot texture, which moved either downwards or upwards randomly, was added to the display.

**FIGURE 7 |** […] oblique direction (called the "Duncker illusion," or induced motion). Upon the stimulus offset, the participant localized either the target's initial or final position (depending on a tone cue given at that moment) by pointing and clicking with a mouse cursor. **(II)** Response (the initial position estimation) expected in the case of a full postdictive reconstruction. Since the participant was more certain about the exact location of the final position of the target, and also since the oblique trajectory of movement due to the illusion was so compelling, we […] length of the rectangle indicates the standard error of the initial and the final localizations, respectively. As can be seen from the figure, the final position deviated relatively little (right), but the initial position was biased opposite to the illusory bias of the motion trajectory, as expected (left). The differences in localization error between the initial and the final positions were highly significant, in terms of both accuracy (*p* < 10<sup>−7</sup>) and directional bias (*p* < 10<sup>−30</sup>; *N* = 7). (Figures are modified from Sheth and Shimojo, 2000, Figures 1 and 2.)

Due to the well-known "Duncker illusion," a target that physically moved horizontally appeared to move obliquely upwards (the red arrow in **Figure 7I**; against the background dots moving downwards) or obliquely downwards (against the background upwards). Would the memorized initial position be affected by this illusory bias of the trajectory? More specifically, would the bias of the initial location be in a direction that was more consistent with (1) the final position which should be the latest and thus more accurate visual signal, and (2) the perceived (illusory) direction of the trajectory (as illustrated in **Figure 7II**)?—These were the critical questions that we raised with this paradigm.

**Figure 7III** shows the results (Sheth and Shimojo, 2000). As expected, the errors in the final positions were relatively small (right), but the initial positions (left) were biased significantly in the direction consistent with the perceived trajectory and the final position. Because the participants had learned quickly, via the practice and the initial trials, that they would be asked for the initial position with a 50% probability, it can be assumed that a trivial cognitive strategy for them was simply to memorize the initial position as accurately as they could at the outset of each trial. In the results, the bias was substantially smaller than what would be expected from a complete compensation to be consistent with the perceived trajectory, but it was significantly above zero.
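To make the notion of "complete compensation" concrete, the geometry can be sketched as follows. This is a hedged illustration, not the analysis used in the original study: it assumes that a full postdictive reconstruction would trace the perceived (oblique) path straight back from the final position while keeping the path length unchanged. The function names, the perceived angle, and all numerical values are hypothetical.

```python
import math


def full_reconstruction_initial(final_xy, path_length, perceived_angle_deg):
    """Initial position implied by tracing the perceived path back from the end.

    Assumes a straight perceived path of the same length as the physical one.
    """
    theta = math.radians(perceived_angle_deg)
    x1, y1 = final_xy
    return (x1 - path_length * math.cos(theta),
            y1 - path_length * math.sin(theta))


def compensation_fraction(reported_y, true_y, predicted_y):
    """0 = veridical initial position, 1 = full postdictive reconstruction.

    Only defined when the predicted position differs from the true one.
    """
    return (reported_y - true_y) / (predicted_y - true_y)


# Hypothetical values (deg of visual angle), not data from the study.
true_initial = (0.0, 0.0)
final = (10.0, 0.0)  # the physical motion is horizontal
predicted = full_reconstruction_initial(
    final, path_length=10.0, perceived_angle_deg=20.0)
frac = compensation_fraction(
    reported_y=-1.0, true_y=true_initial[1], predicted_y=predicted[1])

print(predicted, frac)
```

For an illusory upward trajectory, the predicted initial position lies below the true one, i.e., opposite to the illusory bias. A fraction between 0 and 1 corresponds to the partial but non-zero bias described above.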

Several control experiments revealed further that: (a) making known the nature of the illusion, or (b) making the trajectory of target motion much more irregular and complicated (to minimize a straightforward, conscious and logical inverse calculation), did not significantly reduce the bias. Moreover, (c) reducing the latency of the response (i.e., allowing the subject to respond immediately when they saw the beginning of the target motion) reduced the bias substantially, but not completely.

This type of spatial memory updating has at least two significant implications. First, as emphasized previously, it indicates a constant updating process of memory when faced with real-time sensory inputs. Second, it may indicate the "revising" of causal perception, albeit implicitly. That is, the initial location, the trajectory, and the final location are reorganized in a more consistent causal framework of perception in this case. Thus, it may share implications with several other studies concerning causality perception. For example, in addition to Wu et al. (2009), described in section Extending the "postdiction" Concept to the Memory and the Sensory Consequences of Voluntary Movements, Choi and Scholl (2006) demonstrated that visual events can determine whether a collision is perceived in an ambiguous situation even when those events occur after the moment of "impact" of the putative collision has already passed. Thus, the findings overall indicate a vigorous automatic tendency to update short-term memory to be consistent with on-line perceptual inputs, even at this simplest and lowest sensory level. This immediately raises a related question: does this type of postdictive reconstruction occur only in positional information, or may it occur in other visual attributes, such as shape or color? The logical expectation, especially from the "generalized flash lag" observation (section TMS-induced Scotoma, and Backward Filling-in), would be the latter because there is nothing intrinsically special about position in this case (i.e., dynamic reconstruction). Albeit inconclusive, we do have some evidence consistent with this expectation, as described in the next section.

## **PURSUIT MISLOCALIZATION, AND EFFECTS OF THE SPATIAL CONTEXT**

Pursuit eye movement on a smoothly moving object leads to a mislocalization of a target that is briefly presented nearby during the pursuit (Mitrani and Dimitrov, 1982). To be more specific, the mislocalization is in the direction of the pursuit movement (**Figure 8IA**). What if there is an obstacle (a continuously present static object) in the trajectory of the mislocalization (**Figure 8IB**)? This would be inconsistent if the brain has a spatial representation in which it has to carry the location of the flashed target along the translational trajectory. How would the brain resolve such an inconsistent situation? This was the motivation of the experiments (Noguchi et al., 2007). Directly extending the implications of the previous study (with the Duncker illusion, in the previous section), one may hypothesize that the visual system pursues a more consistent interpretation of spatio-temporal events, modifying the natural tendency of the pursuit-caused spatial bias. **Figure 8IC** illustrates one variable, which is the position of an obstacle relative to that of a flashed target, and **Figure 8IIA** shows the results, where the positional errors are plotted against the relative positions. Essentially, the mislocalization was "stopped" by the obstacle, but only if the obstacle was within the trajectory of the mislocalization. Likewise, when the obstacle was indeed in the trajectory of the mislocalization but only partially covered the length of the flashed target (**Figure 8III**), the perceived mislocalization was consistent with it in terms of the shape and the position of the mislocalized target in the spatial representation (**Figure 8III**).

More intriguing was the case in which the obstacle had a different color (e.g., red) from that of the flashed target (green). As shown in **Figure 8IV**, a color mixture resulted. Note that while the color perceived was a mixture, the mixed hue itself was never presented to the retina, which should be considered very convincing evidence for integration of signals within a temporal window. Note also that the differently-colored obstacle needed to be located in the direction of the mislocalization (**Figure 8IVA**), not elsewhere (B). This effectively eliminates the possibility of any local aftereffect.
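Since the color results were plotted in the CIE xy color space (Figure 8IV), it may help to recall where a physical additive mixture of two lights falls in that space. The sketch below is a textbook colorimetric calculation, not a model of the perceptual effect: it merely assumes, for the sake of illustration, that the percept behaves like an additive mixture. The chromaticities and luminances are hypothetical, not the values used in the study.

```python
def xyY_to_XYZ(x, y, Y):
    """Convert CIE xyY to tristimulus values XYZ (requires y > 0)."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


def additive_mixture_xy(colors):
    """Chromaticity (x, y) of an additive mixture of (x, y, Y) lights."""
    X = Y = Z = 0.0
    for x, y, lum in colors:
        dX, dY, dZ = xyY_to_XYZ(x, y, lum)
        X, Y, Z = X + dX, Y + dY, Z + dZ
    total = X + Y + Z
    return X / total, Y / total


# Hypothetical chromaticities and luminances, for illustration only.
red = (0.64, 0.33, 1.0)
green = (0.30, 0.60, 1.0)
mix_x, mix_y = additive_mixture_xy([red, green])

print(round(mix_x, 3), round(mix_y, 3))
```

The mixture chromaticity lies on the straight line between the two component chromaticities, at a point that neither component occupies, which is the colorimetric sense in which a "mixed hue" is distinct from either retinal input.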

A related observation was made in the flash lag circumstance, where a red target was flashed exactly on top of a green object, for instance. This would normally yield a yellow percept due to color mixture, but when the green object underwent a smooth motion (either rotational or translational), the red flash was mislocalized and at the same time seen as qualitatively very close to the original saturated red (Nijhawan, 1997). Therefore, in this case, color decomposition instead of color mixture (of the retinal inputs) occurred. What is common between these two cases, the smooth pursuit mislocalization and the flash lag, is that the color perceived was seemingly consistent with the perceptual localization, as opposed to the retinal one.

In the study of pursuit-driven mislocalization, we also manipulated the timing of the obstacle with regard to that of the target. The results (**Figure 8V**) suggested that the reorganization of the shape and the color was maximal when the obstacle was presented in the "post" period (i.e., having the same onset as the flash, but lasting longer after the flash offset), relative to the "pre," "during" or "pre and during" periods. This suggests that the "carrying over" mechanism for the localization of the flashed target operates beyond the duration of the flashed target itself, and that the presence of the obstacle interferes with it in the critical time zone. Nonetheless, the consistent features (i.e., positions, shapes and colors) are perceptually "backward-referred" to the moment of the flashed target. They are "backward-referred" because it is phenomenologically not the case that the original positions/colors/shapes are perceived first, and then re-perceived as modified. Rather, all of those "reconstructed" features are perceptually given as a one-shot, immediate percept from its onset of appearance. The similarity to the flash lag (especially the "flash initiated" case) should be obvious.

**FIGURE 8 | Pursuit mislocalization, and effects of spatial context (adapted from Noguchi et al., 2007). (I)** The main experiment. **(A)** The basic effect of the pursuit mislocalization. The black arrow indicates the direction of both the target's (black) movement and smooth pursuit eye movement, while the white arrow indicates the mislocalization effect. **(B)** The main experimental configuration where a static "obstacle" was present in the trajectory of mislocalization throughout the trial. **(C)** Four different locations of the obstacle, as the main variable. **(II)** Results. Position errors (deg.) are plotted for the spatial conditions of the obstacle. The solid and dotted rectangles indicate the location of the wall in each condition. As can be seen, the position errors were the largest in the low condition (with no significant reduction), then smaller in the far, the middle, and the near conditions in this order. This was exactly what should be expected from the topographical "spatial representation" idea. **(III)** Manipulations to partially occlude the trajectory zone **(A)**, resulting phenomenological results **(B)**, and more quantitative results plotted as length **(C)** and position **(D)** of the perceived target. **(IV)** Color mixture. The two stimulus configurations/sequences employed **(A,B)**, and the results in the CIE xy color space **(C,D)** are shown. As can be seen in **(C)**, the colors were mixed into a subjective yellow. As can be seen in **(D)**, the color mixture effect was much larger in the "right wall" condition (**A**, where the obstacle was located right in the middle of the trajectory) than in the "left wall" condition (**B**, where the obstacle was behind it). **(V)** Effect of timing. We compared four different timing conditions: (a) Pre, (b) During (the flash target presentation), (c) Post (during + after), and (d) Pre + Dur. In the partial occlusion ("a hole") variation **(A)**, the effect of blocking the mislocalization was largest in the Post condition **(B)**. In the color mixture variation **(C)**, the mixture effect was maximum also in the Post condition **(D,E)**. (Reproduced from Noguchi et al., 2007, Figures 1, 2, 3, 6 and 7, with permission from ARVO.)

Thus, postdictive reconstruction occurs not only in position, but also in various visual attributes including shape and color (and even temporal order). Together with the generalized flash lag effect (section Flash-lag Effect, its Variations, and Object Updating) and the memory updating results with the Duncker illusion (section Spatial Memory Updating with Perception), this indicates that, in terms of postdictive processing, position is not special. Rather, all the concurrent visual feature information is dynamically and iteratively processed to reach a consistent scene interpretation at the given moment.

## **NEURAL AND COMPUTATIONAL CONSIDERATIONS**

Here, we would like to reconsider the possible mechanisms (section Underlying Neural Mechanisms?), but this time with more explicit references to specific neural mechanisms, with additional computational thoughts especially in the extended time range.

As mentioned in section Underlying Neural Mechanisms?, varying speeds of neural signal conductance even within the same pathway (say, from the retina to the V1 via the LGN) are well established (e.g., "Parvo" vs. "Magno" pathways; Livingstone and Hubel, 1988), providing the basis of the "catch up" idea (**Figure 6I**).

More recent advances in neuroscience implicate reentrant signaling as the predominant form of communication between brain areas, and this idea may help us to understand the neural correlates of visual awareness in situations such as backward masking (Di Lollo et al., 2000). To be more specific, these authors identified two masking processes, both of which are based on reentrant signaling. One is an early process that is affected by physical factors such as adapting to luminance, and the other is a later process, namely masking by object substitution. Iterative reentrant processing, when formalized in a computational model, provides a more comprehensible account of all forms of visual masking than do the long-held feedforward views based on inhibitory contour interactions. Along this line, V. Lamme and his colleagues revealed that the EEG derivatives that are typically associated with reentrant processing were absent in the masked, as opposed to non-masked, condition (Fahrenfort et al., 2007; although there is a notable objection, e.g., Põder et al., 2013). A study employing TMS with the metacontrast paradigm suggests that a prior visual stimulus can influence subsequent perception at the early stages of visual encoding via feedback projections (Ro et al., 2003). In the context of "blindsight," there is substantial evidence in favor of the theory that unconscious visuo-motor transformations, as in blindsight, are executed in an entirely feedforward processing cycle, while visual awareness is critically dependent on feedback connections to the primary visual cortex (Lamme, 2001).

These findings make reentrant signaling a good candidate mechanism for the postdictive phenomena described thus far in this paper (**Figure 6II**), for a variety of reasons. First, reentrant feedback is intuitively appealing in the sense that an earlier (in both the temporal sense and the visual information processing hierarchy) visual representation is "revised" by the feedback from a higher level. Second, the distinction between implicit and explicit processes may map nicely onto the feedforward/feedback distinction (as shown in the case of blindsight above). Last but not least, such reentrant signaling may in principle occur from very short ranges (such as different layers of the visual cortex, or neighboring visual areas such as V1 and V2) to very long ranges (such as occipito-frontal and occipito-temporal connections). This last point is especially significant in the current paper, which aims to find a common thread in various postdictive phenomena, across very different temporal and neural scales.

Finally, the idea of two major, dissociable visual streams has been presented. Whereas Mishkin et al. (1983) characterized the ventral and the dorsal pathways as "what" vs. "where," Goodale and Milner later modified this characterization to "cognition" vs. "action" on the basis of new patient data. This provided the basis for the "different pathways" idea (**Figure 6III**).

From a more computational viewpoint, at least some of the postdictive phenomena may be understood in the Bayesian framework, where the conditional probability indicates the signal-to-noise ratio in the visual input while the prior probability may be encoded in the prior internal state of the relevant brain region. Indeed, a similar attempt to account for the rabbit and some other postdictive effects in the Bayesian framework has been made elegantly (Goldreich and Tong, 2013). It is also consistent with the general implications from the TMS studies (reviewed in section TMS-induced Scotoma, and Backward Filling-in) in which a conscious percept reflects both the retinal input (as a likelihood) and the internal neural state (as a prior). More specifically, some additional (potentially arbitrary) assumptions may be necessary to be consistent with the findings. The occurrence of the scotoma itself is due to a local disruption of the topographic representation of the retinal input (i.e., a local blockage of the likelihood). There is evidence that TMS locally suppresses the retinotopic mapping of the visual field on the surface of the visual cortex (Kamitani and Shimojo, 1999), so this assumption is reasonable. Then, the backward filling-in may simply reflect the brain's tendency to rely heavily on the prior (whichever information is internally available at the critical moment) when the likelihood is locally unavailable or very noisy. The Bayesian framework may provide an overarching way to formalize the postdictive phenomena more explicitly across a wide range of time scales (from sensation, to perception, to cognition).
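The reliance on the prior when the likelihood is noisy or blocked can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the prior (internal state) and the likelihood (retinal input) are one-dimensional Gaussians; the function name `combine` and all numerical values are hypothetical and are not taken from any of the studies cited above.

```python
import math

def combine(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and a Gaussian likelihood.

    Precision (1 / variance) weights each source: a noisier observation
    contributes less, and an infinitely noisy one contributes nothing.
    """
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var  # becomes 0.0 when obs_var is infinite
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * obs) / post_precision
    return post_mean, 1.0 / post_precision

# Hypothetical values: the prior expects a feature value of 0, the input says 10.
PRIOR_MEAN, PRIOR_VAR, OBS = 0.0, 1.0, 10.0

reliable = combine(PRIOR_MEAN, PRIOR_VAR, OBS, obs_var=0.01)     # clear input
noisy = combine(PRIOR_MEAN, PRIOR_VAR, OBS, obs_var=100.0)       # degraded input
blocked = combine(PRIOR_MEAN, PRIOR_VAR, OBS, obs_var=math.inf)  # no usable input

for label, (mean, var) in [("reliable", reliable), ("noisy", noisy), ("blocked", blocked)]:
    print(f"{label:>8}: posterior mean = {mean:.3f}, variance = {var:.3f}")
```

With a reliable input the estimate follows the observation; as the input becomes noisier it moves toward the prior, and when the likelihood is entirely unavailable the estimate equals the prior. This is only a toy analogue of the filling-in account, not a model of the TMS data.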

The idea concerning "compensation of a neural delay by extrapolation" in the flash lag (Nijhawan, 1994) may also be considered in this framework, where expectation or prediction (or a "set" in a higher cognitive term) is implemented in the internal state (as suggested in Berkes et al. (2011) and de Lange et al. (2013), for example).

As for the big picture, more complex brains have more reentrant connections, thus enabling Bayesian-like complex decisions, postdictive reconstructions, and possibly "awareness."

## **HINDSIGHT BIAS, AND COGNITIVE CONSISTENCY**

As mentioned above, there is a rich source of evidence of cognitive reorganization for consistency, information compression, and ease of recall. In the social science literature, a similar effect is known as "hindsight bias." Hindsight bias is the tendency to retrospectively think of outcomes as being more foreseeable than they actually were. It is a robust judgment bias and is difficult to correct (or "debias"). It has been demonstrated for historical events as well. For example, people retrospectively overestimated how well they could have predicted the restoration of US-China relations around the time of Nixon's surprise visit to China, as compared with their actual predictions during the visit. This is thus a cognitive postdiction phenomenon on a large time scale, where people tend to implicitly "revise" their memory of a past prediction under the influence of the outcome. (The study on athletes' "sixth sense," which will be described in the next sub-section, also has the same format.)

Hindsight bias may explain the cognitive gap between the accused and their accusers in a medical lawsuit, or after a larger-scale disaster such as a nuclear plant accident, because the accusers always judge the accused based on a retrospective, and thus postdictive, estimate of how predictable the disastrous outcome was, made only after it occurred. The author and his colleagues became interested in situations in which a person who is informed about a problem tends to overestimate how well an uninformed person can perform a perceptual task. In the experiments, we used a visual paradigm in which performers decided whether blurred photos contained humans, while the image was progressively made sharper (**Figure 9I**; Wu et al., 2012). Evaluators, who either saw the photos unblurred (visual priming) or were verbally primed, and thus knew the correct answer (human present/absent), estimated the proportion of performers who would judge that a human was present at a given degree of defocus. The evaluators exhibited visual hindsight bias, i.e., an overestimation of judgment performance by the uninformed participants (data not shown; Wu et al., 2012), but only with visual priming, not with verbal priming. This can again be considered a form of cognitive postdictive bias because the known answer (presence or absence of a human) substantially affects the estimate of how difficult the task was *before* the correct answer was known (although in this case the estimate concerned others' performance, not the informed person's own). The data qualitatively and structurally matched earlier data on judgments of historical events surprisingly closely. Using eye tracking, we further showed that a higher correlation between the gaze patterns of performers and evaluators (shared attention; as indicated in the heat map in **Figure 9IIa**) is associated with lower hindsight bias in the stimuli with humans (**Figure 9IIb**). This association was validated by a causal method for debiasing: showing the gaze patterns of the performers to the evaluators as they viewed the stimuli progressively reduced the extent of hindsight bias, as indicated in two different measures of performance change (**Figure 9III**).

The study suggests that task difficulty/performance is often reconstructed retrospectively. The exact neural mechanism underlying such long-term cognitive hindsight bias is presumably different from that which underlies perceptual backward phenomena on the microscopic time scale. Nonetheless, the similarity in the results between these visual tasks and the historical tasks indicates, at a functional level, that they may reflect a general intrinsic tendency of the brain to learn from experiences, but exclusively in the cognitive format of "cause and effect," such that the experiences can be used for adaptive predictions in the future.

## **AN ATHLETE'S "SIXTH SENSE"?**

To investigate this type of postdiction further, i.e., the reconstruction of events into a cause-effect format, in a more controlled way, and to see how such an automatic tendency overcomes the natural tendency to be consistent with one's own past decisions, we decided to examine athletes' "sixth sense" as to how well they predict they will do in the next game/match. Athletes in various sports, including top professionals and amateurs, often claim that they can tell whether they will be a hero or not in the next game/match. But is that a real prediction based on some implicit self-assessment of one's physiological and mental conditions, or simply a postdictive construction (which can occur when the question is raised only after the game or match)? We asked over 100 college and high school athletes in a variety of sports [volleyball, soccer, basketball, and Kendo (Japanese fencing)] to fill out a questionnaire in the morning before an actual game or match later in the day (Kadota et al., 2009). The question of interest, "How do you think you will perform today?" (Prediction), was embedded among other ordinary questions about their mental and physical conditions, their teamwork, etc. After the game or match, we repeated a similar set of questions, including another question of interest, "How did you think you would perform this morning?" (Postdictively-reconstructed prediction).

Virtually the same question was repeated within the same day and within subjects; thus it should have been easy for the participants to notice their own inconsistency. Nonetheless, more than half of the athletes who participated changed their prediction in the postdictively reconstructed case. Moreover, those who lost tended to make their changed predictions more negative, whereas those who won tended to make them more positive. This interaction was highly significant (*p* < 0.005). On the other hand, neither the predictions before the game, nor answers to other questions (such as mental and physical conditions), nor physiological measures (such as body temperature, heart rate, and blood pressure) accurately predicted the performance, according to a path analysis performed later. The overall pattern of the results went against the natural tendency to be consistent when answering the postdictive question with the memory of the predictive answer, strongly indicating a robust tendency, at an automatic and implicit level, for postdictive reconstruction to be consistent with the actual outcome. Such automatic and implicit characteristics thus hold generally, from the sensory to the more cognitive levels.

## **REAL-WORLD IMPLICATIONS**

The studies described above have obvious social-scientific implications because hindsight bias can be a cause of various sorts of conflict in employer-employee relationships, sports, medical lawsuits, and even international affairs. It may even cast doubt on some "scientific" studies in other fields. For example, millions of dollars of federal science budget have been spent in China, Japan and various European countries to explore the possibility of "predicting" massive earthquakes from certain "signs." An intuitive part of the motivation behind this came from anecdotal reports of observations such as abnormal animal behavior or natural phenomena (such as unusual shapes of clouds or an extra bright sunset, etc.) as a possible precursor to the disaster. The fundamental problem with these reports, needless to say, is that those episodes were collected only after the earthquake, with no exception, making them highly susceptible to postdictive biases. Formally, the conditional probability of such a large earthquake occurring, given such an "unnatural" sign reported in a *post-hoc* fashion, should be compared with a conditional probability calculated via daily (prior) observations; that is, given a predesignated unnatural sign on one morning, what was the chance that a major earthquake would occur later that day (or within a predefined short time period)? The latter type of data would be very difficult to collect in practice (because it would require an enormous amount of time and resources), and may never exist.

just had to do this task, whereas the evaluators had to evaluate "proportion of performers said human (present)" with a visual, or a verbal priming. **(II)** A comparison of eye movement patterns and task performance. (a) The top row shows an example of "low correlation" between the performer's and the evaluator's gaze patterns as heat maps, with the stimulus and original (clear) photo image. The bottom row shows an example of "high correlation" stimulus. (b) Median hindsight biases plotted for each condition (stimuli with/without

Represents statistical significance between xxx (*p* < xx). **(III)** How much debiasing effects were obtained are shown either with (left) or without (right) gaze pattern information of the performers. Two different quantitative measures (-RMSE and linear bias) of bias produced similar results. Black bars denote stimuli with humans, whereas gray bars denote stimuli without. "∗" Represents statistical significance from zero (*p* < 0.05) (reproduced from Wu et al., 2012, Figures 1, 3 and 4, with permission from Assoc. Psychol Sci.)
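The comparison of conditional probabilities raised for the earthquake reports can be made concrete with a small sketch. All counts below are invented solely for illustration (no such prospective data set is described in this review), and the function name `p_quake_given_sign` is hypothetical. Each day is coded as a pair (sign observed, major earthquake followed).

```python
from collections import Counter

def p_quake_given_sign(days):
    """Estimate P(quake | sign) from (sign_seen, quake_followed) pairs."""
    counts = Counter(days)
    sign_days = counts[(True, True)] + counts[(True, False)]
    if sign_days == 0:
        return None  # no sign days recorded, so the estimate is undefined
    return counts[(True, True)] / sign_days

# Hypothetical prospective log: every day is recorded, whatever happens later.
prospective = (
    [(True, True)] * 1          # sign seen, quake followed
    + [(True, False)] * 999     # sign seen, nothing happened
    + [(False, True)] * 1       # no sign, quake occurred anyway
    + [(False, False)] * 8999   # ordinary days
)

# Post-hoc collection: episodes are gathered only after a quake has occurred.
post_hoc = [day for day in prospective if day[1]]

p_prospective = p_quake_given_sign(prospective)
p_post_hoc = p_quake_given_sign(post_hoc)
base_rate = sum(1 for _, quake in prospective if quake) / len(prospective)

print(f"P(quake | sign), prospective log: {p_prospective:.4f}")
print(f"P(quake | sign), post-hoc reports: {p_post_hoc:.4f}")
print(f"Base rate of quakes per day:       {base_rate:.4f}")
```

Because the post-hoc sample contains only days on which an earthquake occurred, every reported sign appears to have been followed by one, whatever the true predictive value of the sign. Only the prospective log, with signs designated in advance, allows the conditional probability to be compared with the base rate.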

What did we learn thus far in this review? First, there are various cases in the perceptual domain in which a conscious percept is based on some integration process in a limited time window (of approximately 100–200 ms), within which a stimulus presented later can seemingly affect causally how an earlier stimulus is perceived. Second, conscious perception can thus be equated to a sort of "ultra-short-term" (iconic) memory, except that, contrary to the classical concept of a passive, faithfully duplicated but fading copy of the original input, this process should be considered a very dynamic reconstruction from a sequence of sensory inputs. Third, there are several prototypical mechanisms conceivable, such as the "catch up," the "reentry," the "different pathway," and the "memory revision" models, each of which has reasonable behavioral/neural evidence behind it. Fourth, a structurally similar postdictive reconstruction seems to occur as well on a much larger temporal scale, in the domain of retrospective causality attribution and the postdictive reconstruction of a prediction, which may characterize complex brains.

Regarding the last remark, we have used the term "reconstruction" repeatedly, but it is not meant to imply the repeated experience of the conscious percept itself. Instead, the reconstruction process may be postulated in the following way. In the first implicit stage, there may be a faithful representation of a physical event sequence at earlier implicit levels of information processing. It is only at the later levels, downstream in the information processing or along a different pathway, that a conscious percept is constructed (for the first time) such that it is more consistent with a context including the subsequent stimuli and a causal framework of cognition.

This last point should be taken seriously, as it implies both the presence of an implicit, automatic predictive process, as well as a reconstructive, postdictive process for conscious perception.

## **LIBET'S CLAIMS, AND THE "FREE WILL" ENDANGERED?**

Benjamin Libet made several important observations and claims which are highly relevant to the central thesis of the current paper, i.e., postdiction (Libet, 2004).

The first of these involves his simple observations with a train of electric pulses used to stimulate the somatosensory cortex of human patients. He observed that a sensation generated by a weak electric pulse (just above the threshold) can be suppressed "backwards" by a train of pulses applied with a 200–500 ms delay. If the initial stimulus is repeated within an interval of several seconds, however, the cutaneous sensation is instead facilitated by the same subsequent train of pulses with the same 200–500 ms delay. The relevance of these observations is obvious because they can be considered another example of postdiction. They are most closely related to the TMS example above (section TMS-induced Scotoma, and Backward Filling-in) because in both cases, a direct neural intervention causally affects the percept of a stimulus presented earlier (although the former case is in vision, while this is in the cutaneous modality).

Second, in the same setup with direct current stimulation, he claimed that some implicit neural process precedes conscious perception, yet the onset of the conscious percept is perceptually "referred backwards" to the stimulus onset. He also pointed out that the first peak of the evoked potential recorded from the somatosensory cortex is a good candidate for the time marker, to which the backward referral occurs (**Figure 10I**).

Along this line, Nishida and Johnston (2002) recently reexamined Moutoussis and Zeki's (1997) observation of the asynchrony of color and motion percepts, arguing that the perceived timing of a sensory event should be strictly distinguished from the objective, physical timing of its neural correlates. To be more specific, they argue that even if the critical neural process for one visual attribute (say, color) is faster than that for another attribute (motion), it does not necessarily follow that the former (color) appears earlier than the latter (motion) in the perceived sequential order. This is because the perceived sequence is the content of the percept in the Mind Time (in **Figure 6**), whereas the neural event sequence is in the Brain Time. This critical distinction logically allows room for Libet-type backward referral, and resolves its seemingly paradoxical (or even "anti-scientific" to some) appearance. By the same token, it effectively eliminates a "homunculus," a mysterious Brain-Mind enigma who is sitting at the "brain center" to judge whether event A (color) or B (motion) occurs first. The same may apply to other postdictive phenomena, especially in the sensory-perceptual domain within 100 or 200 ms.

**FIGURE 10 | Benjamin Libet's findings on postdictive process, and backward referral. (I)** Time marker for the backward referral. The first peak of the evoked electric response from the primary somatosensory cortex is quick, temporally locked to the stimulus onset, and present even when the stimulus is below the sensory threshold. Thus, it could be a good candidate for the time marker to which the backward referral of the sensation caused by the sustained cortical activity occurs. (Modified from Libet, 2004.) **(II)** Libet's functional account of the backward referral in the real world. The figure illustrates time sequences of external and mental events in 500 ms or so. When a driver hit the brake because (s)he saw a small boy running into the road ahead of his/her car, his/her conscious report of the event sequence would be exactly in this order (as illustrated in red in the top row). However, what actually happened with regard to the implicit and explicit levels of his/her mind would be different. It was rather likely that his/her implicit sensory-motor pathway had triggered the brake immediately (within 150 ms or so; as indicated by the gray dashed arrow), even before (s)he was consciously aware of the presence and the content of the sudden object, i.e., the boy (as indicated at the top right). According to Libet, this scenario is well supported by a variety of laboratory evidence indicating the presence of implicit and fast sensory-motor pathways. Thus, the backward referral process puts the sequences of events into concise, cognitive frameworks such as causality and "intention of action." (Modified from Libet, 2004.)

Libet's third claim concerns action. His findings on "preparatory potentials" suggest that there is specific neural activity that precedes and causally determines the execution of an action, on the order of several hundred milliseconds. He also developed his own unique psychophysical paradigm in which a participant evaluated the timing of the onset of a conscious intention toward an action preceding its execution. Overall, he argued that the neural activity precedes and causes both the intention and the execution of action.

Why do we need such a complex process as backward referral? Moreover, how could we integrate his three claims into a general framework? Libet offers a functional account, using a real-world example. **Figure 10II** illustrates time sequences of external and mental events occurring within approximately 500 ms. When a driver hits the brakes because (s)he sees a small boy running into the road ahead of his/her car, his/her conscious report of the event sequence would be precisely in this order (as labeled in red). However, what actually happened with regard to the implicit and explicit levels of his/her mind would be different. It is rather likely that his/her implicit sensory-motor pathway had triggered the brake immediately, even before (s)he was consciously aware of the sudden appearance of the boy (as indicated at the bottom as External time line). This is akin to the other real-world example that we used earlier (section Underlying Neural Mechanisms?) of the 100 m sprinters who occasionally report their starting movements even before their conscious awareness of the sound.

According to Libet, this scenario is well supported by a variety of evidence indicating the presence of implicit and fast sensory-motor pathways. Thus, the backward referral process puts the sequences of events into concise, cognitive frameworks such as causality and "intention of action."

Libet's claims injected some controversy into the philosophy of mind and neuro-philosophy because they may (or may not) endanger what is termed "free will," which is to some the critical basis of legal responsibility in a democratic social system. Apparently, Libet himself struggled with this problem and put substantial effort into rescuing "freedom" from the implications of his own findings, relying on a concept of "vetoing" one's own intention (Libet, 2004), but this did not seem to be very successful. A different insight for resolving this difficulty actually comes from integrating his claims above, i.e., the implicit neural correlates preceding a conscious percept, and the backward referral of its perceived timing. Note especially that the backward referral may be considered an implicit, automatic (stimulus-driven) process of causal attribution. (Thus, causal attribution may not always be a higher-level, cognitive process.)

Finally, a remark on terminology may be necessary here to avoid confusion. Throughout the current article, the term "postdictive phenomenon" is used strictly in the operational sense, as repeatedly mentioned above. However, the term "postdiction" sometimes refers to the "reentry" model or the Libet-type backward referral as the underlying mechanisms. One may want to make a clear distinction between these two usages.

The next section will be devoted to expounding the details of this idea of the (generalized) backward referral. Although it may seem to substantially exceed the specific scope of this paper, the author feels that this is necessary for a full understanding of the broad impacts of the findings discussed here.

## **"SENSE OF AGENCY" AS POSTDICTIVE ATTRIBUTION AND AN AUTHENTIC ILLUSION**

Based on the review and the discussion thus far, we have at least three lines of reasoning for believing in the compatibility of neuroscientific determinism and the spontaneity/volition of human action. We will now examine them one by one.

1. *The feeling of free choice may live in the postdictive process, not in the predictive process.*

The overwhelming majority of studies on perception, choice decision making and action have focused on the neural mechanisms that precede and causally determine an action. However, there is a good possibility that psychological/neural processes after the decision may significantly contribute to determining whether a completed decision is felt as forced or as more spontaneous/voluntary. Cognitive dissonance (Festinger, 1957), causal attribution (Heider, 1958), and choice justification (Staw, 1976) are some of the keywords in the social psychology literature which are potentially related. To state this simply, the sense of agency (or a feeling of free choice in a given situation) may well play out as a postdictive construct. This may be structurally similar to the case of conscious perception, as in that case as well, a percept can be confirmed as "conscious" only when it is consolidated and reviewed (typically in response to a question on the event). The challenging task for neuroscientists, to account for the neural mechanism underlying the feeling of agency and "freedom" (and likely visual awareness as well), may not be accomplished until they shift their attention from the predictive process to the postdictive process.

2. *The feeling of free choice is a matter of content in perception/cognition. It should be distinguished strictly from the deterministic nature of the neural correlates.*

For example, a content of "red" color perception is possible even though the neurons or the neuronal activity underlying that perception are in no physical sense colored red. When a part of the somatosensory cortex is activated, the pain is not felt there, but rather at the "referred" body part. Likewise, the feeling of free choice as a content of perception/cognition is conceivable as a result of a strictly deterministic neuro-physiological sequence (in the Brain Time). This is analogous to the failure of one-to-one mapping in the temporal domain between the perceived sequence of two events and the underlying and corresponding neural events (section Underlying Neural Mechanisms? and Libet's Claims, and the "free will" Endangered?).

As we noted, the perceived timing of an event should be considered separately from the physical timing of its neural correlates, particularly on the microscopic time scale (Nishida and Johnston, 2002). Thus, what is termed the "first-order isomorphism" may not hold between the perceived sequence and the physiological sequences of their neural correlates.

Köhler's psychophysical isomorphism assumed that an organized structure of a percept (such as relative sizes) has a direct counterpart in a common structure (relative sizes) of the dynamic neural field in the brain (Köhler, 1940). He used the figural aftereffect as an example in the space domain. His claim has been criticized as being "too literally isomorphic," and is thus sometimes called the "first-order isomorphism." At present, neuroscientists do not believe that the neural correlate of "a figure A being perceived as larger than another figure B" should be "the neural circuit encoding A being spatially more extended than that encoding B." Indeed, there are notable exceptions in which a larger stimulus naturally activates a larger cortical area (Murray et al., 2006; Schwarzkopf et al., 2011), but this is limited to the early visual cortices, where a strict retinotopic mapping is maintained.

Because the skepticism on the first-order isomorphism is already a commonsense notion in the field, it is rather puzzling that the majority of scientists and philosophers still believe in such a first-order (direct) isomorphism in the time domain, between the temporal sequence of neural correlates and the time/sequence perception as the contents, especially on the microscopic scale.

Similarly, a cognitive content (a feeling of agency, spontaneity or volition) can be considered separately from its neural correlates. To be more specific, a neural process may causally determine whether a given action is felt as voluntary or not (as the cognitive content), whereas that neural process itself remains entirely deterministic. This inevitably argues for the involvement of postdictive and possibly semantic functions carried by the neural mechanisms subserving the higher-order perceptual experiences, with transformations of reality and illusions being typical for this symbolic level. Note also that being stochastic is categorically different from being voluntary; hence, the author would not endorse attempts to rely on the stochastic/indeterministic properties of neural dynamics to save free will and consciousness from determinism.

3. *A feeling of free choice is very much like a perceptual illusion, in that it will not be eliminated by objective knowledge*.

Not all types of non-veridical perception are considered perceptual illusions in the "authentic" sense of psychophysics. For example, various sensory and cognitive hallucinations in schizophrenia should not be considered perceptual illusions. Beyond the fact that a percept is non-veridical with regard to the pertinent physical properties, a perceptual illusion should traditionally satisfy the following criteria.


Just as with perceptual illusions, the feeling of "agency" or "free choice" is unlikely to be "exorcised" by scientific knowledge of the underlying neural mechanisms (although actually no empirical data are available). This is similar to color perception in that the subjective color experiences (which some want to call "qualia") would not disappear (as everyone's intuition tells) when color perception is fully *explained out* in neurophysiological terms, starting from photoreceptors, retinal ganglion cells, the LGN, through to the primary visual cortex, etc. And this is true even though color perception is also in a sense an "authentic illusion," because colors do not exist in the world; they are rather created by interactions between the physical stimuli and the brain. Likewise, the feeling of agency/free choice can be regarded as one type of robust, healthy and authentic illusion; for most such illusions, few people are concerned about their compatibility with scientific determinism.

One may consider this view just as a variation of the "free will as a cognitive illusion" view proposed by Daniel Wegner and his colleagues (Wegner and Wheatley, 1999; Wegner, 2002). According to their view, people can experience conscious will quite independent of any actual causal connection between their thoughts and actions. The impression that a thought has caused an action rests on a causal inference.

Thus, at a very crude level, the postdictive construction view shares a lot with Wegner's view of free will as a sort of cognitive illusion. Yet there are several noteworthy differences. Wegner's view has the obvious implication that free will is "an illusion, therefore wrong" with regard to the "true" physical causation. For instance, they draw an analogy between free will and magic, in that there are a real and a "disguised" causal relationship. The experience of conscious will in their view is merely an illusion produced by the perception of an apparent causal sequence. Apparent mental causation is generated by an interpretive process that is fundamentally separate from the mechanistic process of real causation.

While we agree that free will (together with the sense of agency) is a mental construction, we take a somewhat different view. Free will reflects a normal function of a very general processing principle in the brain, i.e., postdictive construction employing re-entry, backward referral and possibly other mechanisms, which then leads to a normal experience of the "sense of agency." In the very same sense in which most geometric illusions qualify, it should rather be considered an authentic, or valid, illusion based on mostly automatic, implicit processes.

Another deviation of our view from Wegner's "illusion" view concerns the three criteria they proposed for the interpretive process that gives rise to the experience of free will. Those are (1) priority, (2) consistency, and (3) exclusivity. Among them, we would like to impose a substantial constraint on the first criterion, i.e., priority. As is obvious from the detailed examination of various postdictive phenomena in the current article, ranging from purely sensory to highly cognitive levels, priority may only be a distinctive feature of *the output* (i.e., the percept) of the processing, *not the physical condition* for it. As a matter of fact, all three properties above, including consistency and exclusivity, may be, at least in some cases, results of postdictive reconstruction.

Whereas the two views are consistent in various aspects, this single contrast (priority vs. postdiction) may highlight the stark distinction.

## **SUMMARY AND GENERAL DISCUSSION**

This paper reviewed known "postdictive" perceptual phenomena, in which a stimulus presented later seems to causally affect the percept of another stimulus presented earlier. Starting from some classical examples such as backward masking and apparent motion, the list included the cutaneous rabbit effect and the flash lag effect. Some newer studies, such as those on the TMS-triggered scotoma and the pursuit mislocalization, suggest that various visual attributes are reorganized in a postdictive fashion to be consistent with each other, or to be consistent within a causality framework.

We then extended our discussion in several directions. First, in terms of underlying mechanisms, four prototypical models have been considered: the "catch up," the "reentry," the "different pathway," and the "memory revision" models. Although they are meant to account for phenomena that are "postdictive" only in the operational sense above, the mechanisms themselves do not have to be postdictive in any sense (perhaps with the exception of the "reentry" model, and the "backward referral" idea by Benjamin Libet).

Second, by extending the list of postdictive phenomena to memory, sensory-motor and higher-level cognition (e.g., "hindsight"), one may note that such postdictive reconstruction may be a general principle of neural computation in the complex brain, ranging in time scale from milliseconds to months, and from local neuronal interactions to long-range connectivity. The operational definition of the "postdictive phenomenon" is applicable to such sensory/cognitive effects across a wide range of time scales, even though the underlying neural implementations may vary across the variety of phenomena.

This notion of generic postdiction has a good affinity with the Bayesian framework, as well as with the notion that perceptual awareness is in fact a very brief (possibly iconic) memory. As is obvious in the case of flicker perception mentioned previously (section Underlying Neural Mechanisms?), it is hard to draw a line between conscious perception and memory. And this is where a postdictive process operates on the preceding implicit process to yield a conscious visual percept.

Finally, structurally the same mechanism may apply to body movements and their attribution to "free will." The "sense of agency," which is the basis of "free will," may be considered a sort of "authentic illusion" that is unlikely to evaporate merely because a reductionistic neural account of it is given.

Closer examination of the postdictive phenomena may provide an entirely new and insightful framework for understanding perception, cognition, memory and action. Moreover, it may add a new angle to the discussion of implicit vs. explicit mental processes, determinism vs. free will, etc.

## **ACKNOWLEDGMENTS**

The author has been supported by NIH, NSF, HFSP, JST.ERATO, JST.CREST, and Tamagawa-Caltech gCOE (MEXT). He also thanks many colleagues including Kaoru Amano, Jaeseung Jeong, Roman Weber, Yong Jun Lin, Robert Tilford, the editors of this special issue, and the two reviewers for their valuable comments.

## **REFERENCES**


Interplay between Conscious and Unconscious Perceptual Streams." *Curr. Biol.* 19, 2003–2007. doi: 10.1016/j.cub.2009.10.017


Wegner, D. M. (2002). *The Illusion of Conscious Will*. Cambridge, MA: MIT Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 June 2013; accepted: 20 February 2014; published online: 31 March 2014. Citation: Shimojo S (2014) Postdiction: its implications on visual awareness, hindsight, and sense of agency. Front. Psychol. 5:196. doi: 10.3389/fpsyg.2014.00196*

*This article was submitted to Consciousness Research, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Shimojo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
