# **THE ROLE OF BODY AND ENVIRONMENT IN COGNITION**

**Topic Editors Dermot Lynott, Louise Connell and Judith Holler**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-262-5 **DOI** 10.3389/978-2-88919-262-5

## *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **THE ROLE OF BODY AND ENVIRONMENT IN COGNITION**

Topic Editors:

**Dermot Lynott,** Lancaster University, United Kingdom

**Louise Connell,** Lancaster University, United Kingdom

**Judith Holler,** Max Planck Institute for Psycholinguistics, Netherlands

"Sunshine through my body" (https://www.flickr.com/photos/tfjensen/8089729228) by Thomas Frost Jensen, used under Creative Commons Attribution 4.0 (http://creativecommons.org/licenses/by/4.0) / Cropped from original

Recent evidence has shown many ways in which our bodies and the environment influence cognition.

In this Research Topic we aim to develop our understanding of cognition by considering the diverse and dynamic relationship between the language we use, our bodily perceptions, and our actions and interactions in the broader environment. There are already many empirical effects illustrating the continuity of mind-body-environment: manipulating body posture influences diverse areas such as mood, hormonal responses, and perception of risk; directing attention to a particular sensory modality can affect language processing, signal detection, and memory performance; placing implicit cues in the environment can impact upon social behaviours, moral judgements, and economic decision making.

This Research Topic includes papers that explore the question of how our bodies and the environment influence cognition, such as how we mentally represent the world around us, understand language, reason about abstract concepts, make judgements and decisions, and interact with objects and other people.

Contributions focus on empirical, theoretical, methodological or modelling issues as well as opinion pieces or contrasting perspectives.

Topic areas include perception and action, social cognition, emotion, language processing, modality-specific representations, spatial representations, gesture, atypical embodiment, perceptual simulation, cognitive modelling and perspectives on the future of embodiment.

# Table of Contents




Mariska E. Kret, Jeroen J. Stekelenburg, Karin Roelofs and Beatrice de Gelder

*The Role of the Environment in Eliciting Phantom-Like Sensations in Non-Amputees*

Elizabeth Lewis, Donna M. Lloyd and Martin J. Farrell

*Training of Manual Actions Improves Language Understanding of Semantically Related Action Sentences*

Matteo Locatelli, Roberto Gatti and Marco Tettamanti

*Neurological Evidence Linguistic Processes Precede Perceptual Simulation in Conceptual Processing*

Max Louwerse and Sterling Hutchinson

*Using Actions to Enhance Memory: Effects of Enactment, Gestures, and Exercise on Human Memory*

Christopher R. Madan and Anthony Singhal

Barbara Tomasino and Raffaella Ida Rumiati


## The role of body and environment in cognition

#### *Dermot Lynott<sup>1</sup>\*, Louise Connell<sup>2</sup> and Judith Holler<sup>3,4</sup>*

*<sup>1</sup> Embodied Cognition Lab, Decision and Cognitive Sciences Research Centre, Manchester Business School, University of Manchester, Manchester, UK*

*<sup>2</sup> Embodied Cognition Lab, School of Psychological Sciences, University of Manchester, Manchester, UK*

*<sup>3</sup> Language and Cognition Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands*

*<sup>4</sup> School of Psychological Sciences, University of Manchester, Manchester, UK*

*\*correspondence: d.lynott@lancaster.ac.uk*

#### *Edited by:*

*Eddy J. Davelaar, Birkbeck College, UK*

In this Research Topic, we aimed to develop our understanding of cognition by considering the diverse and dynamic relationship between the language we use, our bodily perceptions, and our actions and interactions in the broader environment. We received twenty-six articles that take very different approaches to exploring the question of how our bodies and the environment influence cognition.

Several papers examine how perceptual concepts are developed and accessed. Gainotti (2012) reviews evidence from cognitive neuropsychology and proposes that different types of concepts differentially rely on sensorimotor experience, with somatosensory and movement information playing a major role in artifact representations and visual and other perceptual information playing a major role in the representation of living things. Krause et al. (2013) find an interference effect between fingers and numbers in a numerosity comparison task and suggest that it emerges from an embodied representation of number based on a shared metric for symbolic and tactile numerosities. Since perceptual stimulation sometimes interferes with and sometimes facilitates other conceptual processing, Connell and Lynott (2012) review recent findings and propose that these differences arise due to the attentional demands on modality-specific processing. Two groups use event-related potentials to examine how perceptual information is accessed in conceptual tasks. Hald et al. (2013) find evidence for modality-specific grounded representations when processing negated sentences, and demonstrate differential modulation of the N400 according to whether or not a true vs. false sentence involves modality switching. Louwerse and Hutchinson (2012) show that different tasks rely on linguistic vs. perceptual information to different extents, with activation in linguistic cortical regions preceding activation in perceptual cortical regions when both types of processing were associated with the task.

As well as perceptual information, motor information relating to action concepts was also a central topic. In a review of behavioral and neuroimaging work on semantics across different domains (e.g., concrete/abstract words, numbers, and arithmetic), Hauk and Tschentscher (2013) argue that the specific function of sensorimotor areas in processing meaning remains unclear, and suggest that only by employing a combination of methods can causal underpinnings be deduced. However, in their review, Tomasino and Rumiati (2013) contend that the strategy a participant employs in a task is more important than the nature of the stimulus in determining whether motor simulations will be activated and support the view that the motor system is implicated in—but not necessary to—semantic processing. Locatelli et al. (2012) provide evidence for the role of motor experience in motor semantics by demonstrating that action experience in the form of manual dexterity training facilitated subsequent performance in judging sentence-picture pairs that were related to the previously-learned actions. Motor semantics also depend on the time at which an action is described as taking place. Anderson et al. (2013) found that changing the grammatical aspect of action verbs (e.g., *walking* vs. *walked*) caused people to represent events at different levels of detail according to whether event descriptions were set in the recent or distant past.

Perception and action, of course, interact. In a novel use of a Wii balance board, Haazebroek et al. (2013) asked people to imagine they were on either a snowboard or skis and found that this imagined difference mediated a Simon effect, which they subsequently simulated in the HiTEC connectionist model. They suggest that a tight coupling exists between perception/action and higher-level cognition. Action execution is also affected by what one knows about a target object: Asai et al. (2012) showed that the knowledge of whether a ball weighed 1 kg (vs. 130 g) caused participants to raise their arms above the horizontal in response to an image of a hand holding the ball. They propose that this "heaviness contagion" emerges automatically due to mandatory simulation of others' sensations. Fukui and Inui (2013) demonstrated that whether or not participants could see their own hand when pantomiming a grasp action affected variability and aperture of the executed grasp, and argue that the dorsal stream, as well as the ventral stream, is involved in pantomimed action.

The body and environment interact extensively in spatial cognition. Crollen and Collignon (2012) review how visually deprived individuals develop representations of spatial frames of reference and propose that sighted people learn to recode spatial information to an external reference frame (i.e., independent of limb/body position) as opposed to the internal reference frame (i.e., dependent on limb/body position) preferred by those without vision. Johannsen and de Ruiter (2013) observed that people's reference frame selection during scene processing is affected by the realism of the scene, with people more likely to choose an egocentric frame of reference when the background is more realistic. They suggest that greater realism results in easier perceptual simulation and therefore a greater preference for egocentric processing. Two separate articles focused on examining how abstract spatial terms may be grounded in concrete spatial experience. Tower-Richardi et al. (2012) demonstrated a correspondence between abstract absolute frames of reference (e.g., *north*, *east*) and relative body-centered frames of reference (*up, left*): people performed longer hand movements toward relative targets when primed with incongruent absolute terms (e.g., *north* priming *left*). Dijkstra et al. (2012), on the other hand, showed that even metaphorical space is affected by bodily perceptions. In a study using Wii balance boards, they found that when participants unconsciously leaned to the left or right, they attributed more political statements to congruent left-leaning or right-leaning political parties.

Several articles point to the interplay between body and emotion. Havas and Matheson (2013) provide a theoretical perspective on the importance of bodily feedback in the representation of emotions and understanding of emotional language, and argue that bodily states can facilitate the simulation of emotional content during language processing. Kret et al. (2013) demonstrate that emotion recognition depends not only on others' faces, but also on others' bodies. Participants were sensitive to the congruency of emotions expressed by paired bodies and faces, but emotional responses to these stimuli were also mediated by individual differences in anxiety. Furthermore, where previous work has demonstrated that emotional valence judgments (e.g., *right is good*, *left is bad*) are body-specific, Kominsky and Casasanto (2013) showed that such evaluations can also depend on the abilities of other people's bodies when we reason from their perspective.

As well as taking other people's bodies into account, people are also highly sensitive to where other people are looking. Knoeferle and Kreysa (2012) found that listeners rapidly respond to shifts in a speaker's gaze, which affect not only their allocation of visual attention, but also their processing of syntactic structures and assignment of thematic roles, even when such information is not central to the task. Additionally, Pfeiffer et al. (2012) used a novel interactive eye-tracking paradigm to show that both congruency and latency of an interaction partner's gaze behavior influence one's experience of agency, and that shared attention takes longer to establish than joint attention.

While the majority of articles focus on typical embodiment, two contributions focus on examples of atypical embodiment. Eigsti (2013) provides a review of embodiment in autism spectrum disorders (ASD), and suggests that deficits in coordinating motor and conceptual information may result in under-embodiment in individuals with ASD. Lewis et al. (2013) investigated phantom limb experience in non-amputees using a variation on the rubber hand illusion. They found that participants experienced a sense of presence of a "missing" finger, and even described specific sensations (e.g., tingling), suggesting that phantom limb experiences may be an example of over-embodiment where peripersonal perception is folded into body representations.

Finally, a number of contributions consider future directions for the field of embodied cognition. Madan and Singhal (2012) ask whether, if the body affects cognition, exercising the body could enhance cognition. They draw on diverse literature including work on gesture, memory, and physical exercise, and suggest that a much more integrative approach is needed to determine how movement and exercise may boost cognitive performance. Willems and Francken (2012) contend that, while there is good general support for theories of embodied cognition, too often underspecified theories can generate opposing predictions for the same phenomenon. As such, embodied theories should be capable of providing more specific hypotheses to elucidate exactly when and how the body and environment affect cognition. Wilson and Golonka (2013), however, suggest that body and environment are constantly affecting cognition. They consider whether mental representations are at all necessary to cognitive function in their support of the replacement hypothesis of cognition, which puts the focus firmly on the interaction between an organism and the rich and varied information provided by the environment.

In highlighting the diversity of perspectives and approaches current in embodied cognition research, these articles paint a picture of a field that has matured significantly in recent years. We hope this Research Topic opens up new avenues and challenges for future work on the interplay between cognition, body, and environment.

### **REFERENCES**


task goals mediate the interplay between perception and action. *Front. Psychol.* 4:247. doi: 10.3389/fpsyg.2013.00247


for symbolic and tactile numerosities. *Front. Psychol.* 4:7. doi: 10.3389/fpsyg.2013.00007


*Front. Psychol.* 3:547. doi: 10.3389/fpsyg.2012.00547


interaction. *Front. Psychol.* 3:537. doi: 10.3389/fpsyg.2012.00537


you think it is. *Front. Psychol.* 4:58. doi: 10.3389/fpsyg.2013.00058

*Received: 24 June 2013; accepted: 03 July 2013; published online: 22 July 2013. Citation: Lynott D, Connell L and Holler J (2013) The role of body and environment in cognition. Front. Psychol. 4:465. doi: 10.3389/fpsyg.2013.00465*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Lynott, Connell and Holler. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Grammatical aspect and temporal distance in motion descriptions

#### *Sarah E. Anderson<sup>1</sup>\*, Teenie Matlock<sup>2</sup>\* and Michael Spivey<sup>2</sup>*

*<sup>1</sup> Department of Psychology, University of Cincinnati, Cincinnati, OH, USA*

*<sup>2</sup> Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, USA*

#### *Edited by:*

*Dermot Lynott, University of Manchester, UK*

#### *Reviewed by:*

*Carol Madden, Stem Cell and Brain Research Institute, France; Zhenguang Cai, University of Plymouth, UK*

#### *\*Correspondence:*

*Sarah E. Anderson, Department of Psychology, University of Cincinnati, Suite 4130 Edwards 1, Cincinnati, OH 45221-0376, USA, e-mail: sec57@cornell.edu; Teenie Matlock, Department of Cognitive and Information Sciences, School of Social Sciences, Humanities, and Arts, University of California, Merced, 5200 North Lake Road, Merced, CA 95343, USA, e-mail: tmatlock@ucmerced.edu*

Grammatical aspect is known to shape event understanding. However, little is known about how it interacts with other important temporal information, such as recent and distant past. The current work uses computer-mouse tracking (Spivey et al., 2005) to explore the interaction of aspect and temporal context. Participants in our experiment listened to past motion event descriptions that varied according to aspect (simple past, past progressive) and temporal distance (recent past, distant past) while viewing scenes with paths and implied destinations. Participants used a computer mouse to place characters into the scene to match event descriptions. Our results indicated that aspect and temporal context interact in interesting ways. When aspect placed emphasis on the ongoing details of the event and the temporal context was recent (thus, making fine details available in memory), this match between conditions elicited smoother and faster computer mouse movements than when conditions mismatched. Likewise, when aspect placed emphasis on the less-detailed end state of the event and temporal context was in the distant past (thus making fine details less available), this match between conditions also elicited smoother and faster computer mouse movements.

#### **Keywords: embodied cognition, mouse-tracking, grammatical aspect, motion verbs**

Everyday conversation is replete with reports of when and how events have occurred in the past. Take the sentences, "Last week David walked to the school," and "Last week David was walking to the school." Both sentences describe past events, but the former, marked with the simple past (verb+ed), focuses on completion of the event, and the latter, marked with past progressive (was verb+ing), on its ongoing nature. How does grammatical information influence our understanding of events, especially the reporting of past events? How does it interact with information about when an event has occurred, specifically, recent past vs. distant past? Here, we use a mouse-tracking task to explore how grammatical aspect and tense interact in perceptual simulations and influence the comprehension of event descriptions.

Over the past several decades, linguistics research has significantly advanced our understanding of aspect and of how it works in various languages. One valuable observation is that languages often make a distinction between imperfective and perfective aspect. Simply stated, imperfective emphasizes the ongoing nature of an event, and perfective, the completion of an event (see Comrie, 1976). In some cases, this difference is realized through grammatical information, and in others, lexical information (see Croft, 2012, for discussion). In English, imperfective aspect is realized by using the past progressive verb form, as in *was walking*, and perfective aspect, by using the simple past verb form, as in *walked* (see Brinton, 1988). Another valuable observation from linguistics is that imperfective aspect gives the speaker and listener an internal view of event descriptions, at least more than perfective aspect does (see Langacker, 1987). A statement such as "Yesterday David was chopping wood" gives access to details about the event as it unfolds in time, including for instance, lifting the ax, slamming it into the wood, standing back to cut another piece of wood, and lifting the ax again. In contrast, a simple past statement such as "Yesterday David chopped wood" focuses on the endpoint of the event or gives a diffuse sense of the entire event. Despite a wealth of information on aspect, including useful insights on crosslinguistic patterns and historical work, relatively little is known about how it is processed in everyday language, including how it influences the interpretation of when and how events occurred in the past.

In recent years, language theorists have begun to explore the role of aspect in processing everyday language. In a series of offline studies, Matlock (2011) found that varying aspectual information in event descriptions leads to consistent differences in how action is conceptualized. In one experiment, participants completed a sentence that began with a past progressive adverbial clause, "When John was walking to school," or a simple past adverbial clause, "When John walked to school." The results showed that participants mentioned more actions when completing sentences with past progressive adverbial clauses. In another experiment, participants read simple transitive sentences that implied a state change in objects, specifically, simple past "John painted houses last summer" or past progressive "John was painting houses last summer," and then estimated how many houses had been painted. Their estimates were reliably higher with the progressive form (e.g., "was painting"). In related work by Matlock (2010) participants read the sentences "Bob planted pine trees along his driveway last week" or "Bob was planting pine trees along his driveway last week," and then estimated the length of the driveway. Their estimates were reliably larger with past progressive. Together these results suggest that the past progressive leads to inferences about more action in a given time period than does the simple past. [For similar work on how aspect can influence attitudes about political candidates and political issues, see Fausey and Matlock (2010)].

Earlier research on aspect in narrative comprehension showed compatible results. Madden and Zwaan (2003) conducted several experiments that incorporated event descriptions with pictures to investigate how aspectual cues shape the construction of situation models (see Zwaan and Radvansky, 1998, for discussion of situation models). They were especially interested in perfective and imperfective aspect (corresponding to English simple past and past progressive). In one experiment, participants viewed pictures of events that appeared to have just been completed or in progress, for instance, a car that had just gone through an intersection, or a car going through an intersection. Next they indicated whether sentences such as "The car sped through the intersection" (simple past) or "The car was speeding through the intersection" (past progressive) matched the scene depicted in the picture. Participants were found to be notably faster to read simple past sentences after having viewed depictions of completed events, but about equally fast to read past progressive sentences after having viewed depictions of intermediate events. These results were consistent with another experiment in which participants read sentences and then made a speeded decision about whether pictures matched. Madden and Zwaan (2003) offered various explanations for why there was no difference with the progressive form, including the possibility that readers perceptually simulated the actions at different stages of completion. For instance, people may have thought about the car being in different locations with "The car was speeding through the intersection" (e.g., entering intersection, in the middle of intersection, exiting intersection). Together, their results provided groundbreaking insights on how aspectual cues constrain the construction of situation models. Different aspectual cues were shown to yield real time processing differences with event descriptions (for similar findings, see also Morrow, 1985; Magliano and Schleich, 2000; Ferretti et al., 2007; Madden and Therriault, 2009; Bergen and Wheeler, 2010).

Additional work on the role of aspect in the time course of processing event descriptions has employed eye tracking. A recent study by Huette et al. (2012) used the blank visual world approach to explore how aspect would influence eye movements during the course of comprehending event descriptions without visual input (see Spivey and Geng, 2001, for information on blank visual world approach). In their study, participants listened to short descriptions of actions that included simple past or past progressive verb forms while they simply looked ahead at a blank screen. The results showed fewer eye movements and longer fixations on the blank screen with past progressive descriptions than with simple past descriptions, suggesting that participants conceptualized more action with the past progressive [consistent with Matlock (2011)].

The studies mentioned above resonate with contemporary theories of perception and action, and more specifically, with perceptual simulations. In this general view, it is assumed that cognitive abilities are grounded in sensorimotor experiences and that high-level processes are intimately linked to low-level processes (see Thelen and Smith, 1994; Barsalou, 1999; Zwaan, 2004; Gallese and Lakoff, 2005; Calvo-Merino et al., 2006; Gibbs, 2006). In the realm of language comprehension, nouns, verbs, adjectives, and other lexical information partially reactivate the actual perceptual or motor correlates of those constituents. In particular, comprehending an action description partially reactivates neural correlates associated with performing that action (see Pulvermuller, 2001; Hauk and Pulvermuller, 2004; Hauk et al., 2004), which in turn influences subsequent and current behavioral responses that rely on those same neural substrates (Glenberg and Kaschak, 2002; Boulenger et al., 2006; Nazir et al., 2007).

Given that grammatical aspect influences the way events are conceptualized, it certainly has the potential to influence the way goal-directed motion events are realized in real time. The past progressive form (*was* verb+ing) in English gives an internal perspective that highlights the moment-to-moment unfolding of an event, and the simple past (verb+ed), an external perspective that focuses on the end state of an event or provides a "snapshot" of the whole event (see Comrie, 1976; Langacker, 1987; Madden and Zwaan, 2003). In the current work we investigate how these two forms influence the understanding of past events in which a mover traverses a path toward a goal. We use computer-mouse tracking (Spivey et al., 2005) to explore motor output in response to variations in aspect and temporal distance in motion descriptions. Earlier work with this approach discovered that mouse movements that accompanied past progressive motion descriptions resulted in longer durations than did mouse movements that accompanied simple past motion descriptions (Anderson et al., 2008). Here, we extend our approach to investigate aspect (past progressive vs. simple past) in the context of temporal distance (recent vs. distant past). Of interest is how these aspectual cues will play out with recent and distant past contexts. Social psychology research on construal-level theory has shown that temporally distant events are construed as relatively more abstract than temporally close events (Trope and Liberman, 2003, 2010; Liberman and Trope, 2008). We anticipate that events in the distant past may be simulated with a different perceptual character (less detailed, more punctate, with emphasis on the end state) than events in the recent past (more detailed, with more emphasis on the interstitial components of the event as it unfolds).

## **METHOD**

## **PARTICIPANTS**

Sixty-four undergraduates at the University of Cincinnati participated for class credit in Introduction to Psychology courses. All were right-handed native speakers of American English.

## **MATERIALS AND PROCEDURES**

## *Verbal stimuli*

Stimuli included 16 sentences about a person moving along a path. Each sentence had four variants realized by combining timeframe distance and aspect. Each represented an experimental condition, as shown in **Table 1**: recent past simple past; recent past progressive; distant simple past; distant past progressive.


**Table 1 | Examples of target verbal stimuli that accompanied visual scenes.**

All sentences were read by a native speaker of American English and recorded using Mac-based sound software. Each of the 16 experimental items was spliced to produce each of the four experimental conditions, ensuring that the prosody of the targets was otherwise identical. An additional 15 s of silence was added to the end of each target sentence, allowing us to time lock participants' mouse-movements to the raw time stamp of the sound files. The experimental items were counterbalanced across four presentation lists. Each list contained four instances of each condition, so a participant heard all the target sentence frames, but only one version of each.
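The counterbalancing scheme described above can be illustrated with a minimal sketch. It assumes a simple Latin-square rotation of the four conditions across the 16 items; the source states only that items were counterbalanced across four lists, so the rotation rule, the function name `build_lists`, and the condition labels are hypothetical choices made for illustration.

```python
from collections import Counter

# Condition labels are illustrative; they mirror the 2 x 2 design
# (temporal distance x aspect) described in the text.
CONDITIONS = [
    "recent past / simple past",
    "recent past / past progressive",
    "distant past / simple past",
    "distant past / past progressive",
]


def build_lists(n_items=16, conditions=CONDITIONS):
    """Assign each item to one condition per list by rotating conditions.

    This rotation is an assumed scheme, not the authors' documented one.
    Returns one dict per presentation list, mapping item index to condition.
    """
    n_lists = len(conditions)
    lists = []
    for list_idx in range(n_lists):
        assignment = {
            item: conditions[(item + list_idx) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
for number, assignment in enumerate(lists, start=1):
    # Each list should contain every item once and each condition four times.
    print(number, dict(Counter(assignment.values())))
```

Under this rotation, every list contains all 16 sentence frames with four instances of each condition, and each frame appears in a different condition on each of the four lists.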

## *Visual stimuli*

Corresponding visual scenes were created for each target sentence pair. Each target visual scene consisted of a diagonal path starting halfway up and on the far left side of the screen. The path slanted to the right, terminating at the middle top of the screen. A character was located to the right of the beginning of the path and under the destination, and separated from the scene by a black box that framed the destination and path, as shown in **Figure 1**. Items in the scene were created by hand or taken from clipart and edited in Adobe Photoshop. The only moveable item was the character, which subtended an average of 1.53° of visual angle in width by 2.05° in height. The destinations were an average of 11.22° in width by 4.09° in height, and the path itself occupied a square of 8.42° in width by 6.11° in height. The character was located 14.25° from the destination. The stimuli were presented using Macromedia Director MX, and mouse movements were recorded at an average sampling rate of 40 Hz. The display resolution was set to 1024 × 768.

Sixteen filler items were created to keep participants from developing strategies specific to the experimental sentences. Similar to the target sentences, all filler sentences began with a timeframe description. These filler sentences also included past progressive or simple past aspectual information, and conveyed movement (e.g., "Last month, Janet swam in the pool") but not along the path. These were accompanied by 16 filler scenes, which had a short path beginning on the right side of the screen and slanting to the top, center of the screen.

#### **PROCEDURE**

Participants were first asked to make themselves comfortable in front of the computer and allowed to adjust the mouse and mouse-pad to a location that suited them. Participants then read the instructions, which asked them to place the character into the scene to make the scene match the sentence they heard. After indicating that they understood the task, participants were next presented with two practice trials, followed by the experimental task. At the onset of each trial, participants were presented with the entire visual scene. The sound file began after a 500 ms preview. Also, a "Done" button was present in the bottom left corner of the screen from the beginning of the trial. When participants were finished placing the character in the scene, they clicked on "Done" to move to the next trial. A blank screen with a button in the center labeled "Click here to go on" separated the trials. The entire experiment lasted about 10 min.

## **RESULTS**

Mouse movements were recorded during the grab-click, transferal, and drop-click of the character in all experimental trials. Data from three participants who immediately clicked the "done" button for every trial (and thus, produced no mouse trajectories) were removed from analyses. There were no significant differences in movement onset latencies, suggesting that sentences from the different conditions were approximately equally understandable and acceptable. Above and beyond such simple reaction time measures, computer-mouse tracking is robust for measuring various indices of response and motor dynamics (Spivey et al., 2005). We investigated four of these indices here.

### **DROP LOCATIONS**

First, we investigated the final placement of the character in each scene, that is, where it was drop-clicked. We examined the x- and y-coordinates of the drop locations separately. In the x-coordinates, there was no significant interaction between aspect and temporal distance, nor was there a main effect of temporal distance. However, there was a main effect of aspect, with the x-coordinates of the drop locations in response to simple past sentences being farther to the right (or closer to the destination, i.e., location of completed action) than those in response to past progressive sentences, *F*(1, 60) = 12.47, *p* < 0.01. In the y-coordinates, there was no interaction of aspect and temporal distance. There was, however, a main effect of aspect, with the y-coordinates of the drop locations in response to listening to past progressive sentences being lower on the screen (closer to location of ongoing action) than those of the simple past sentences, *F*(1, 60) = 10.26, *p* < 0.01. There was also a main effect of temporal distance, with the y-coordinates of the drop locations in response to listening to recent past descriptions being lower on the screen (again, closer to the location of ongoing action) than those of the distant past descriptions, *F*(1, 60) = 4.31, *p* = 0.04, as shown in **Figure 2**.

These data are consistent with our earlier explorations of aspect using mouse-tracking. Specifically, aspect differentially influenced the final placement of the character, with an additive influence of temporal distance. When participants listened to past progressive sentences, they placed characters farther from the destination, or, closer to the location of ongoing action. When they listened to simple past sentences, they placed the character closer to the destination, namely, the location of completed action.
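As a rough illustration of where an *F*(1, 60) main effect comes from in a 2 × 2 within-participant design, the sketch below uses the fact that a two-level within-participant main effect is equivalent to a paired *t*-test on each participant's marginal means, with *F* = *t*². The data are synthetic and the function name `main_effect_f` is hypothetical. The 61 participants are an assumption chosen to match the reported degrees of freedom (64 recruited minus the three excluded); the authors' actual analysis software and procedure are not described in the text.

```python
import math
import random
import statistics


def main_effect_f(level_a, level_b):
    """F and (df1, df2) for a two-level within-participant factor.

    level_a / level_b hold one value per participant: that participant's
    mean at each level of the factor, averaged over the other factor.
    """
    diffs = [a - b for a, b in zip(level_a, level_b)]
    n = len(diffs)
    # Paired t statistic on the per-participant differences.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)


# Synthetic drop-location x-coordinates (pixels); NOT the study's data.
random.seed(0)
n_participants = 61  # assumption: 64 recruited minus 3 excluded
simple_past_x = [random.gauss(20.0, 15.0) for _ in range(n_participants)]
progressive_x = [x - random.gauss(8.0, 12.0) for x in simple_past_x]

f_value, df = main_effect_f(simple_past_x, progressive_x)
print(f"F{df} = {f_value:.2f}")
```

The same function would apply to the temporal distance factor by averaging each participant's data over aspect instead.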

### *Spatial differences*

**Figure 3** shows the average time-normalized trajectories in each of the four conditions. Since the [0, 0] x,y starting position is near the bottom center of the screen, leftward movements naturally take on negative *x*-values, and upward movements naturally take on positive *y*-values. Panel **(A)** shows the average time-normalized trajectory produced in response to recent past, simple past sentences; panel **(B)** shows recent past, past progressive; panel **(C)** shows distant past, simple past; and panel **(D)** shows distant past, past progressive. Visual inspection shows differences among these four conditions, especially in the case of panel **(D)**, the past progressive, distant past targets. The averaged trajectory in panel **(D)** stretches leftward to an x-pixel value beyond −130, whereas the other conditions only reach to about −105.

To begin to statistically assess online aspectual differences, we looked at spatial differences between the average trajectories elicited in response to each of our conditions. To determine whether these averaged trajectories significantly diverged from each other, we time-normalized the trajectories and conducted a series of *t*-tests at each of the 101 time-steps. These analyses were conducted separately on the x- and the y-coordinates at each of the 101 time-steps to compare spatial differences across participants and across conditions. To avoid the increased probability of a Type I error associated with multiple *t*-tests, and in keeping with bootstrap simulations of such multiple *t*-tests on mouse trajectories (see Dale et al., 2007), an observed divergence was not considered significant unless differences between the coordinates elicited *p*-values less than 0.05 for at least eight consecutive time-steps.
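The two steps described above, resampling each trajectory to 101 time-steps and requiring at least eight consecutive significant time-steps, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the linear interpolation, and the example *p*-values are assumptions, and the per-step *t*-tests are taken as already computed.

```python
def time_normalize(samples, n_steps=101):
    """Resample (t, x, y) samples to n_steps equally spaced time-steps.

    Assumes at least two samples, sorted by time; uses linear interpolation.
    """
    t_start, t_end = samples[0][0], samples[-1][0]
    out = []
    j = 0
    for k in range(n_steps):
        t = t_start + (t_end - t_start) * k / (n_steps - 1)
        # Advance to the pair of raw samples that brackets time t.
        while j < len(samples) - 2 and samples[j + 1][0] < t:
            j += 1
        (ta, xa, ya), (tb, xb, yb) = samples[j], samples[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        out.append((xa + w * (xb - xa), ya + w * (yb - ya)))
    return out


def significant_runs(p_values, alpha=0.05, min_run=8):
    """Return inclusive (start, end) index pairs of runs with p < alpha
    that last at least min_run consecutive time-steps."""
    runs, start = [], None
    # A sentinel value of 1.0 closes any run that reaches the final step.
    for i, p in enumerate(list(p_values) + [1.0]):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Made-up trajectory and p-values, for illustration only.
trajectory = time_normalize([(0.0, 0.0, 0.0), (1.0, 10.0, 20.0)])
p_values = [0.5] * 44 + [0.01] * 57
print(len(trajectory), significant_runs(p_values))
```

Indices here are zero-based, so a run reported as (44, 100) corresponds to time-steps 45 through 101 when counted from one.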

In the x-coordinates, there was no interaction or main effect of temporal distance. However, there was a main effect of aspect in the x-coordinates between time-steps 45 and 101. The past progressive average trajectory diverged away from the simple past average trajectory and toward the path in the visual display, suggesting that the average past progressive trajectory was closer to the path on the screen, which is the location of the ongoing action. In the y-coordinates, there was no significant interaction, but there was a main effect of aspect between time-steps 54 and 101. Again, we observe the average past progressive trajectory adhered more closely to the path than did simple past average trajectories. There was also a main effect of temporal distance from time-steps 69–101, with recent past descriptions adhering more closely to the path than distant past description trajectories. Numerous studies have demonstrated that the continuous movement of a computer-mouse (or continuous movement of a hand) provides a moment-by-moment index of where visual attention is being deployed in the display (e.g., Song and Nakayama, 2006, 2009; Spivey et al., 2010). Therefore, it appears that past progressive sentences may have drawn attention to the location of the ongoing action, namely to the path, and that the simple past sentences may have drawn attention to the location of the completed action. Additively, recent temporal information may also have encouraged greater attention to the path itself.

#### **MOVEMENT DURATIONS**

Finally, we examined movement durations. Movement durations measured the time that it took participants to move the character from its departure position (grab-click) to its destination position (drop-click). This measurement was not merely a reaction time because it did not include the movement onset latency. Before we examined the movement durations, individual trials that exceeded 5.5 s (more than 2 standard deviations from the overall mean) were removed (less than 9% of the data). Notably, the average length of the movement trajectories was approximately equal across all conditions. As shown in **Figure 3**, and discussed above, the past progressive condition tended to produce trajectories that extended to an endpoint about 25 pixels further along the x-axis, whereas the simple past condition tended to produce trajectories that extended to an endpoint about 25 pixels further along the y-axis. Therefore, with these trajectories extending about the same overall length, comparing their durations is a fair test of the speed and fluidity with which the action took place. Analysis of Variance of movement durations revealed no main effects, but did reveal a significant interaction of temporal information and aspect, *F*(1, 60) = 4.63, *p* < 0.05, as shown in **Figure 4**. When the time frame was distant (i.e., "Last year"), movement durations in response to simple past sentences took less time (*M* = 2240.48 ms, *SD* = 652.49 ms) than those in response to past progressive sentences (*M* = 2365.62 ms, *SD* = 735.35 ms). However, when the time frame was recent (i.e., "Yesterday"), the pattern reversed. In that case, movement durations in response to simple past sentences took longer (*M* = 2365.86 ms, *SD* = 869.65 ms) than those produced in response to past progressive sentences (*M* = 2226.34 ms, *SD* = 804.11 ms).

This interaction could have been driven by a variety of factors. Compatibility of aspect and temporal distance is one possible explanation. The pairing of simple past and distant past could have resulted in relatively quick, smooth movements. The simple past is associated with a snapshot interpretation or prominent end state (and not the ongoing nature) of an event, which is consistent with the distant past, i.e., too "far" to conceptualize in any detailed fashion. Similarly, the past progressive highlights the ongoing nature of an event, which is consistent with recent past, i.e., ongoing nature is highlighted because it has just happened. And pairings that were less compatible, i.e., simple past with recent past or past progressive with distant past, could have resulted in longer movement durations.
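The crossover pattern in the reported means can be made concrete with a short calculation. The sketch below takes the four condition means from the text, treated as milliseconds, and computes the aspect effect at each temporal distance, the interaction contrast, and the two marginal differences. It also includes a hypothetical `trim_slow_trials` helper illustrating an upper cutoff of two standard deviations above the mean. The helper's name and its one-sided form are assumptions, as the text states only that trials exceeding 5.5 s were removed.

```python
import statistics

# Condition means for movement duration as reported in the text (ms).
means = {
    ("distant", "simple past"): 2240.48,
    ("distant", "past progressive"): 2365.62,
    ("recent", "simple past"): 2365.86,
    ("recent", "past progressive"): 2226.34,
}

# Aspect effect (past progressive minus simple past) at each distance.
aspect_effect = {
    distance: means[(distance, "past progressive")] - means[(distance, "simple past")]
    for distance in ("distant", "recent")
}

# Interaction contrast: how much the aspect effect changes with distance.
interaction = aspect_effect["distant"] - aspect_effect["recent"]

# Marginal differences, which correspond to the two main effects.
aspect_main = (aspect_effect["distant"] + aspect_effect["recent"]) / 2
distance_main = (
    means[("distant", "simple past")] + means[("distant", "past progressive")]
    - means[("recent", "simple past")] - means[("recent", "past progressive")]
) / 2


def trim_slow_trials(durations, n_sd=2.0):
    """Drop trials slower than mean + n_sd * SD (hypothetical helper)."""
    cutoff = statistics.mean(durations) + n_sd * statistics.stdev(durations)
    return [d for d in durations if d <= cutoff]


print({k: round(v, 2) for k, v in aspect_effect.items()})
print(round(interaction, 2), round(aspect_main, 2), round(distance_main, 2))
```

The aspect effect is about +125 ms in the distant past and about −140 ms in the recent past, so the two marginal differences are each under 10 ms while the interaction contrast is roughly 265 ms. This matches the description of a crossover interaction with no main effects.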

It is important to stress again that these movement durations are not simply reaction time measures. Rather, they reflect time spent *moving* the character, not total time spent responding to the stimulus. Therefore, it could be that while the hand-and-mouse were in the process of executing the placement of the character, these temporal characteristics of the perceptual simulation were spreading out into that motor movement itself. Thus, while the past progressive placed emphasis on the ongoing intermediate stages of the event (as though it were still happening), the context placed the event in the distant past, resulting in a mismatch that manifested itself as slower movement of the character. Similarly, when the simple past condition induced an emphasis on the static completed state of the event (as though it was in the distant past), but the context placed the event in the recent past, this mismatch again resulted in longer movement durations. Commensurate with earlier investigations of aspect, past progressive processing appears to correspond to diffuse, intermediate stages of an event, and simple past processing, with the end state (Madden and Zwann, 2003).

## **DISCUSSION**

The results reported here provide new insights into how information about grammatical aspect and temporal distance interact to shape perceptual simulations in the understanding of event descriptions. First, in analyzing drop locations, we found that aspectual information (simple past vs. past progressive) influenced where participants placed the character in the scene, with an additive influence of temporal context (distant vs. recent past). When participants heard recent past progressive descriptions, such as "Yesterday David was walking to the university," they placed the character closer to the location of ongoing action (on the path, where the character did the walking) and farther from the destination, than they did when they listened to distant simple past event descriptions, such as "Last year David walked to the university." Second, the spatial differences analysis showed a consistent pattern: past progressive sentences and, additively, recent past temporal information, appeared to draw attention to the location associated with ongoing action, while simple past sentences, to the location associated with completed action. Finally, our movement duration data revealed a full interaction of aspect and temporal context.

The results with final placement data and with spatial differences data show the expected findings, and provide compelling support for the effects of aspect and temporal context, yet the interaction in the movement duration data is not what one might have initially expected. Based on findings with the other measures, the straightforward prediction for the movement duration data would have been for a main effect of aspect (where the past progressive would induce longer and slower movement durations that practically "act out" the emphasis on the ongoingness of the event), and a main effect of temporal distance (where recent past would also induce longer and slower movement durations resulting from the simulated recency and availability of the event and its temporal details). However, instead of finding these two main effects, we observed a surprisingly well-balanced crossover interaction of the two factors. Given the support for perceptual simulations in the other measures, and with previous versions of these sentences, the lack of these two main effects is puzzling, and may be due to the greater complexity of the sentences resulting from adding temporal context. If we had indeed found such a pair of main effects, some concern might have arisen about the comprehension of the stimuli in the distant past context and the simple past condition, e.g., "Last year David walked to the university." Note that in English, the distant simple past has an inherent ambiguity: a distant simple past event can be construed as iterative (as if to mean, "All last year David regularly walked to the university."), or as a one-time event (as if to mean, "Last year for his first and only time, David walked to the university."). Based on the results of this single experiment alone, it is not possible to determine how participants interpreted some of our distant simple past verbal stimuli. Some of the distant simple past items may have been interpreted as iterative. Future research with experiments that include a range of time frames and a variety of verb types will be informative, and help obtain a better picture of how processing unfolds in time.

When the interaction between aspect and temporal distance in the movement duration data is examined on its own, the result suggests a resonance account where linguistic devices that share semantic properties tend to induce smooth, fast, and unhindered processing (not unlike phenomena observed in the action-sentence compatibility effect; Glenberg and Kaschak, 2002). For example, perfective aspect (simple past in English) and a distant past context both tend to mentally package the event as an atomic unit whose emphasis is on the completed end-state, so they are compatible with one another. Thus, when distant past and simple past are paired, the completion of the simple past event description resonates with the distant past description, and, hence, the movement trajectory is fast, smooth, and brief. By contrast, imperfective aspect (past progressive in English) and a recent past context both mentally represent the event as a drawn out process whose intervening temporal details are available and emphasized, so they are compatible with one another. Therefore, when the event is in the recent past, the ongoingness of the past progressive resonates with that temporal description, so again response movements are fast, smooth, and brief. However, when the pairings do *not* resonate with each other, as in either simple past with recent past or past progressive with distant past, the two do not match in the level and type of detail invoked, and consequently, the movement trajectories are not as smooth or fast. In future work, it will be useful and informative to consider how natural or familiar these pairings are, in particular, how frequent they are across a range of contexts. Some forms may occur more often and possibly be more natural to process than others. Statements such as "Last year David was walking to the university" certainly occur in everyday English, but they may be less common than statements such as "Yesterday David was walking to the university." It is possible that the naturalness of these pairings influenced our results.

These data add to our understanding of how grammatical aspect influences language comprehension, especially with various types of temporal information. Our results expand previous research on the role of aspect in event descriptions, including investigations with mouse-tracking (Anderson et al., 2008, 2010), narrative comprehension (Magliano and Schleich, 2000; Madden and Zwann, 2003), surveys (Fausey and Matlock, 2010; Matlock, 2011), language production in natural discourse (Matlock et al., 2012), and offline spatial judgment tasks (Matlock et al., 2007). The consistent pattern that emerges from these varied methodologies is that grammatical aspect systematically influences perceptual simulations that drive language comprehension, for instance, enhancing or diminishing certain properties of events.

These results also contribute to research on the linguistic connection between time and space. In particular, they complement previous research on space as a metaphor for time. People often describe time in terms of physical space (Clark, 1973; Traugott, 1978; Alverson, 1994). Importantly, this relationship tends to be asymmetrical: people use space to talk about time far more often than they use time to talk about space (Lakoff and Johnson, 1980, 1999). Even when people are asked to make non-linguistic judgments about time, they recruit spatial metaphors (Cassasanto and Boroditsky, 2007), suggesting that understanding time in terms of space is not simply a matter of linguistic convention. More importantly, in understanding events, people naturally think about and communicate about "where" things happen in time relative to the time of reporting, for instance, near past or distant past (Trope and Liberman, 2003, 2010). So, events in the recent past are processed with rich detail, and events of the more distant past are processed with less detail (Liberman and Trope, 2008).

Many questions remain about the processing of grammatical aspect, and certainly there are alternative explanations. For example, past progressive descriptions may somehow be more effortful to comprehend than simple past sentences. People might think about actions in a more engaged, moment-by-moment way with past progressive descriptions than they do with simple past descriptions. Previous research is also unclear on this point. Madden and Zwann (2003), for instance, found that participants took more time to process progressive sentences, possibly because they were more difficult to comprehend. Differences in processing various forms of aspect may also arise because of verb semantics. Telicity, person, voice, and other semantic dimensions of verb meaning need to be given careful attention in the study of aspect (see Matlock, 2011; Croft, 2012). This could help clarify issues that we were unable to address, including iterative interpretation with sentences such as "David walked to the university last year." The focus here was on literal translational motion verbs (i.e., verbs that convey contiguous movement from one point in space to another).

Our findings have implications for research on event understanding. They show how subtle differences in aspect alone can systematically influence how motion events are conceptualized. They also provide new insights on how aspect influences thought about events in the near and distant past. They contribute to a growing body of research on how events are conceptualized differently depending on "where" they are relative to the time of reporting (e.g., Trope and Liberman, 2003, 2010; Liberman and Trope, 2008). The work helps expand a new, exciting line of research on how grammatical information can influence construal of events (see, for instance, Kaup et al., 2010, for a study on how German speakers process sentences with adjectives and adjectival passives). Last, our results provide evidence to support cognitive linguists' claims about how grammar has meaning rooted in our embodied experience (Langacker, 1987; Lakoff and Johnson, 1999; Talmy, 2000).

This research resonates with embodied cognition work on perceptual simulation and language understanding (Barsalou, 1999). It is consistent with the methodological advances of Balota and Abrams (1995) by providing new evidence from the temporal dynamics of a response after it has been initiated, and by demonstrating that the motor system is not a robot-like automaton triggered by completed cognitive processes. Rather, motor processes co-exist with cognitive processes during perceptual/cognitive tasks (e.g., Balota and Abrams, 1995; Gold and Shadlen, 2000; Spivey et al., 2005). This work also aligns with our understanding of how mental models and visual information are coordinated in motor output. Similar to the way understanding spatial events is created and observed through tracking eye movements (Spivey and Geng, 2001; Richardson and Matlock, 2007), this work shows that event understanding varies as a function of changes in aspect and temporal distance. Our results add to the emerging pattern of data that suggest that differences underlying perceptual simulations, resulting in these differences in the dynamics of a motor response, may account for observed processing differences in comprehending sentences that use different aspectual forms. This means that perceptual simulations behave in predictable ways, even when it comes to grammatical aspect.

## **ACKNOWLEDGMENTS**

Many thanks to Meghan Salomon for recording auditory stimuli, and to Editor Dermot Lynott as well as two anonymous reviewers for offering insightful comments that helped us improve the manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2012; accepted: 23 May 2013; published online: 01 July 2013.*

*Citation: Anderson SE, Matlock T and Spivey M (2013) Grammatical aspect and temporal distance in motion descriptions. Front. Psychol. 4:337. doi: 10.3389/fpsyg.2013.00337*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Anderson, Matlock and Spivey. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The body knows what it should do: automatic motor compensation for illusory heaviness contagion

## **Tomohisa Asai \*, Eriko Sugimori and Yoshihiko Tanno**

Department of Cognitive and Behavioral Science, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Christian Huyck, Middlesex University, UK; Matthew R. Longo, Birkbeck, University of London, UK

#### **\*Correspondence:**

Tomohisa Asai, Department of Cognitive and Behavioral Science, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan. e-mail: as@beck.c.u-tokyo.ac.jp

We can share various feelings with others just through observation, as if it were an automatic resonance. This connective function between the self and others could promote the facilitation of our social communication; however, it is still unclear as to how it works in terms of self-other representation. In this study, we showed participants a picture of a model holding a ball, which was weighted with sand. We instructed participants to move one of their arms to a horizontal position and hold it immobile. Those participants who knew the actual weight of the ball (1 kg) tended to raise this arm above the horizontal, in response to their expectation of the need to resist the weight of the ball. This compensatory reaction to the illusion of heaviness suggests that our bodily resonance could be mandatory and predictive. We discuss this new behavioral phenomenon in terms of motor simulation or the mirror-neuron system.

**Keywords: body resonance, motor simulation, simulation hypothesis, mirror-neuron system, motor compensation**

## **INTRODUCTION**

When we are watching movies or home videos, we can enjoy the experiences of a character as if we are undergoing these experiences ourselves. A clear example of this might be a situation wherein a character is in pain, or, additionally, some people may strain themselves when watching weight lifting. Simulation theory might explain such automatic responses, that is, observing another person may automatically generate anticipation of the same experience in oneself (e.g., Jeannerod and Pacherie, 2004; Thioux and Keysers, 2010). Action and perception might be fundamentally coupled (James, 1890; Watanabe, 2008); therefore, observers may have the capacity to simulate a variety of different information that is available from others: tactile sensation (Keysers et al., 2004), pain (Singer et al., 2004), emotional state (Platek et al., 2005; Palagi et al., 2009; de Greck et al., 2012), and motor performance (Calvo-Merino et al., 2005; Lahav et al., 2007; Aglioti et al., 2008). These social cognitive functions that allow us to understand what others are experiencing are often broadly referred to as empathy (Decety and Ickes, 2009), and might be underpinned by neural mechanisms, such as the mirror-neuron system (MNS; Iacoboni, 2009).

Among these, the domain of perception and action, which does not involve emotional reactions, is referred to as bodily resonance, motor contagion, motor simulation, automatic imitation, or direct matching (Iacoboni et al., 1999; Blakemore and Frith, 2005; Brass and Heyes, 2005; Schutz-Bosbach and Prinz, 2007; Aglioti et al., 2008; Liepelt et al., 2010). Some studies have suggested that this simulation of others' sensation could be "mental re-enaction," which implies that we simulate according to our own previous experiences (Heyes et al., 2005; Prinz, 2006), because an observer lacking the specific representation of a given feeling may hardly be capable of directly simulating someone experiencing this feeling (that is, the correspondence problem; Brass and Heyes, 2005; Singer, 2006). This may be especially true of skilled and complicated actions, such as dancing or piano playing (Calvo-Merino et al., 2005; Lahav et al., 2007), for which specific training is required (Heyes et al., 2005). It seems as though we have the capacity to simulate the action of others if that action is also included in our own repertoire of actions. However, those who have never experienced weight lifting can also simulate the sensations experienced by others undertaking those activities. Therefore, another possibility may be that the simulation is through "predictive encoding or computational interpretation" (Hurley, 2008), and might not be limited to sensations that have already been experienced (Danziger et al., 2009). The interpretation of the actions of others, which are visually identical, but have different contexts, may affect the reactions of observers (Iacoboni et al., 2005), suggesting that we can simulate the actions of others predictively (Blakemore and Frith, 2005) and even estimate background intentions or goals (Liepelt et al., 2008, 2010) as long as those actions are simple (Flanagan and Johansson, 2003; Fogassi et al., 2005). This idea does not contradict mental re-enaction theory, because previous experiences could help this predictive computation, especially with regard to skilled actions. A previous study suggested that pro-basketball players, but not big fans of basketball, could predict the future success or failure of the shots of others (Aglioti et al., 2008). Nevertheless, our first hypothesis is that our motor simulation might be realized by predictive encoding of the sensation of others, according to the interpretation of the situation.

These phenomena, wherein we can simulate the sensations of others automatically using our own body, might sound passive and mandatory; therefore, some studies refer to these kinds of illusions as "contagion," in which we feel non-existent pain by observing others, just as we see or hear something non-existent in perceptual illusions (Singer et al., 2004; Watanabe, 2008; Palagi et al., 2009). However, whether this simulation might really be driven mandatorily remains unclear, although some previous studies have suggested that some types of empathy including motor simulation could be driven automatically (Bien et al., 2009; de Greck et al., 2012). In other words, it is a question of whether we can ignore the information available from others and inhibit our simulation. This is also essential in terms of the neural mechanism. It is now well established that a neuronal system, named the MNS, exists in both monkeys and humans. During action observation, the neural structures involved in the execution of the observed actions are recruited in the brain of the observer through the MNS, as if that person is the agent of the action (Rizzolatti and Craighero, 2004). If the motor simulation is based on the MNS, which does not distinguish between external (others) and internal (self) action representation, this process should be mandatory. However, a question that often emerges is why, if this is so, we do not imitate others all the time (Brass and Heyes, 2005; Pineda, 2008). Therefore, the MNS probably possesses an inhibitive component, which keeps us from having resonant reactions for everything we see (Brass and Heyes, 2005), because having an automatic process such as this is not always appropriate for effective social behavior (Lee and Tsai, 2010). Therefore, a second hypothesis is that the observation of others would mandatorily affect our own mental state, but that we would simultaneously compensate automatically for this transmitted sensation.

The present study suggests that our motor simulation would be predictive and mandatory, and we attempt to demonstrate this by administering the new illusory phenomenon: heaviness contagion. We showed participants a picture of another person's hand holding what appeared to be a lightweight ball. In reality, the ball was weighted with sand (1 kg). Participants were instructed to hold their arms in a horizontal position and to keep them immobile. We focused on the arm movements of the participants when they observed another person's hand holding a ball. In Experiment 1 (A, B), only the group who knew that the ball was heavy raised their arms above the horizontal in response to their expectation of the need to resist the illusory heaviness, suggesting that the heaviness contagion is predictive and mandatory. In Experiment 2 (A, B), we showed that heaviness contagion is driven by observing others (not objects), and in conditions in which the self (participants) and others are in the same situation (i.e., a similarity effect), suggesting that the heaviness contagion might be a possible expression of motor simulation as well as empathy.

## **GENERAL METHOD**

#### **PARTICIPANTS**

All the participants were right-handed university students (handedness index >8 on the H.N. handedness inventory; Hatta and Kawakami, 1995), and none of them took part in more than one experiment. They were recruited randomly from an introductory psychology class, and written informed consent was obtained from all participants before the experiments were conducted. All participants reported normal or corrected-to-normal vision, hearing, and somatosensation and no neurological abnormalities.

#### **APPARATUS**

The experiments took place in a silent, dim room. In order to display the visual stimuli and conduct the experiment, we used MATLAB (MathWorks, Natick, MA, USA). The visual stimuli were presented on a virtual screen through a head-mounted display (Experiment 1A), on a white board through a projector (Experiment 1B), or on a PC display (Experiment 2AB). The hand positions of the participants were recorded during the task by using a wireless mid-space mouse (Experiment 1A), a 3D motion-capture device (Experiment 1B), or a high-speed video camera (Experiment 2AB).

#### **STIMULI**

The visual stimuli consisted of life-sized pictures of a model's hand holding a ball, as shown in **Figure 1**. Some previous studies suggest that personal information (e.g., sex, hand size, moles, skin color, etc.) can affect the degree of empathy that participants feel for others (see General Discussion for detail); therefore, in order to exclude such information, the model wore a blue rubber glove. The weighted ball shown in the visual stimuli (Weighted Ball, Regent Far East, Inc., Ashiya, Japan) weighed 1 kg and was 40 cm round. It appeared to be a normal, lightweight rubber ball; however, it was actually filled with sand to add weight. In some conditions, we also used pictures of a hand without the ball, or showed pictures of the ball placed on an object (a wooden block). The weight stimuli were identically weighted balls. Some participants held the ball in their left hand, which was resting on the table, while others held an identical-looking but lightweight (130 g) ball, from which the sand had been removed.

#### **PROCEDURE**

All participants sat in front of the display or screen. Before the experiment began, they received brief training to ensure familiarity with the instruments and experimental requirements. In the experiment itself, we instructed each participant to hold their right hand in a horizontal position throughout the trial, which lasted 30−90 s depending on the experiment. We intentionally varied the duration of visual stimulus presentation between experiments in order to suggest that the effect was independent of duration and time course. The arm was first held out straight, to ensure what was felt to be a horizontal position. When the arm was properly positioned, the visual stimulus appeared. We instructed the participants to remain immobile when the stimulus appeared. We then recorded the height of the hand, if it was raised, throughout the remainder of the trial. After the trial, the participants were asked to lower the hand and relax.

#### **DATA ANALYSIS**

In order to measure the hand position, we translated the raw pixel data into Euclidean distance (i.e., mm), and the starting position was set at zero, so that a positive value of hand height meant that the participant's hand was raised from its starting position. These values are useful when observing the time course of the hand movement of the participants. Furthermore, we calculated movement velocity (average hand position displacement per second: mm/s) during the task, for comparison among conditions or groups in each experiment. A positive value of movement velocity meant that the position of the hand was being progressively raised during that period.
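The two measures described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names, the `mm_per_pixel` calibration factor, and the assumption that image rows grow downward are all hypothetical, and velocity is taken here as net displacement divided by elapsed time, which equals the mean of the per-second displacements.

```python
def hand_height_mm(pixel_rows, mm_per_pixel):
    """Convert vertical pixel coordinates into hand height (mm).

    The first sample is the starting position and is set to zero.
    Assumption: image rows grow downward, so a smaller row index means
    a higher hand and therefore a positive height.
    """
    start = pixel_rows[0]
    return [(start - row) * mm_per_pixel for row in pixel_rows]


def movement_velocity(heights_mm, sample_rate_hz=1.0):
    """Average hand displacement per second (mm/s) over one period.

    A positive value means the hand was progressively raised.
    """
    if len(heights_mm) < 2:
        raise ValueError("need at least two samples")
    duration_s = (len(heights_mm) - 1) / sample_rate_hz
    return (heights_mm[-1] - heights_mm[0]) / duration_s
```

Applying `movement_velocity` separately to each stimulus period of a trial would give one value per condition, which is the quantity compared across conditions or groups in the analyses.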

#### **ETHICS STATEMENT**

The protocol of the present study was approved by the local ethics committee (The Ethical Committee on Human Experimentation of the Graduate School of Arts and Sciences, The University of Tokyo).

## **EXPERIMENT 1A**

In this experiment, we suggested that automatic predictive compensation would occur in response to a simulated feeling. We hypothesized that participants would raise their hand when observing a person who feels heaviness in the hand because they should predict a compensatory need to adjust to the illusory weight: "heaviness contagion."

## **MATERIAL AND METHODS**

#### **Participants**

Forty university students (26 males and 14 females, mean age = 19.0 years, range = 18−21 years) were randomly divided into four groups: the BB (Ball seen, Ball held), BN (Ball seen, No ball held), NB (No ball seen, Ball held), and NN (No ball seen, No ball held) groups. In this experiment, only the weighted ball (1 kg) was used as a prop. For example, those in the BB group saw a model's hand holding the weighted ball, and held an identically weighted ball in their left hands, whereas those in the NN group saw the model's hand holding nothing, and held no ball themselves. In the BB and NB groups, participants held the ball and were therefore aware of its weight. In the NN and BN groups, participants had no information regarding the weight of the ball.

#### **Apparatus**

The head-mounted display device (GVD-510-3D, Shenzhen Oriscape Electronic Co., Ltd., Shenzhen, Guangdong, China) was attached to a chin rest, and the participants looked through a device that displayed the image of a 28˚ visual angle virtual screen. The apparatus was arranged so that it appeared as though the virtual screen was located just beyond the reach of the participants (approximately 60 cm). An eye pad prevented them from seeing their hands, and hand positions were measured every second (1 Hz), using a wireless mid-space mouse (BOMU-W24A/BL, Buffalo, Inc., Nagoya, Japan). This device weighed 135 g and was equipped with a gyroscopic sensor that allowed it to be used in the air.

#### **Procedure**

Each participant sat in front of the chin rest, on which they each placed their chin. Their right arm was held out straight, using the mouse device to ensure a horizontal position. When the arm was properly positioned, the participant clicked the mouse button once. Following a random interval of 1-2 s, to allow for micro-motions caused by clicking the mouse, the visual stimulus appeared on the virtual screen for 45 s. The task requirement was to remain immobile when the stimulus appeared. Each participant completed a single trial.

#### **Questionnaire**

After the experiment, participants completed a retrospective questionnaire designed to measure the extent to which they felt as though the hand of the model was their own hand, and therefore actually felt the weight of the ball as presented on the screen. It was explained that the purpose of the questionnaire was simply to provide information regarding impressions of the task, and participants were encouraged to answer freely. It was expected that this instruction would avoid the possibility of an influence of experimenter effects or demand characteristics on responses. The questionnaire consisted of five items, each of which asked for an accuracy rating of a particular statement using a five-point scale. The statements were as follows: (1) It felt as though your hand was weary and numb. (2) It seemed as if the hand on the screen was your own hand. (3) It felt like your hand was moving lower. (4) It seemed as if the ball was in your own hand. (5) Your hand felt the weight of the ball. Participants in the NN group had neither seen nor held the ball, so they rated only three statements: Q1, 2, and 3. Questions 2, 4, and 5 addressed "resonance with the model's hand," whereas questions 1 and 3 addressed "a feeling of weight."

#### **Results and discussion**

The time courses of the hand position of the participants indicated that only those in the BB group tended to raise their right hand gradually, whereas those in the other groups kept their hand almost immobile (**Figure 2**). A two-way ANOVA (two visual stimuli × two weight stimuli) was conducted to examine the movement velocity of the four groups (**Figure 3**). These analyses demonstrated a significant interaction [*F*(1, 39) = 4.88, *p* < 0.05], significant simple main effect of visual stimuli under the ball-held condition [*F*(1, 36) = 6.16, *p* < 0.05], and significant simple main effect of weight stimuli under the ball-seen condition [*F*(1, 36) = 11.67, *p* < 0.01]. It is suggested that only those participants who saw a model holding the weighted ball and held an identical weighted ball in their left hands raised their right hand.
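For readers unfamiliar with the analysis, the *F* ratios of a two-way between-participants ANOVA can be sketched in Python as follows. This is an illustrative sketch only, assuming a balanced design (the same number of participants in every cell); the function name and the data layout are hypothetical, it does not reproduce the authors' analysis software, and it does not compute the simple main effects reported above.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced two-factor between-participants design.

    `cells` maps (level_a, level_b) -> list of observations, e.g. the
    movement velocity (mm/s) of each participant in that group.
    Returns {"A" | "B" | "AxB": (F, df_effect, df_error)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(obs) != n for obs in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean(x for obs in cells.values() for x in obs)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    # With equal cell sizes, a marginal mean is the mean of its cell means.
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum(
        (x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs
    )

    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a / ms_err, df_a, df_err),
        "B": (ss_b / df_b / ms_err, df_b, df_err),
        "AxB": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }
```

In a design like this one, factor A would be the visual stimulus (ball seen or not) and factor B the weight stimulus (ball held or not), with one velocity value per participant in each of the four cells.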

The results of the questionnaires were then analyzed (**Figure 4**). The NN group did not answer questions 4 and 5; therefore, for statistical analysis we applied a two-way ANOVA to all five questions for just three groups (five questions × three groups), omitting the NN group. These results were then analyzed further using Ryan's multi-comparison method (i.e., R-E-G-W's *F* test). These calculations revealed significant main effects for groups: *F*(2, 27) = 5.99, *p* < 0.01. Main effects for the questions were also significant: *F*(4, 108) = 33.73, *p* < 0.01; however, the interaction was not significant: *F*(8, 108) = 0.84, *p* > 0.50. Comparisons among the three groups revealed significant differences between the BB and BN groups, and between the BB and NB groups (*p* < 0.01). With regard to the main effect of the questions, Q3 was most often agreed with, followed by Q1; fewer participants agreed with the other three statements (i.e., Q3 > Q1 > Q2 = Q4 = Q5, *p* < 0.05). These findings suggest that the BB group agreed most strongly with the statements related to the feeling of resonance and then heaviness, although in general, the participants did not agree with the statements related to resonance (Q2, 4, 5) compared to those related to the feeling of heaviness (Q1, 3).

The results of hand movement and the questionnaire showed that the participants in the BB group subjectively felt the weight of the ball most heavily. They could have felt a need to adjust to the perceived weight, since they were given instructions to keep their hand horizontal throughout the trial. In the absence of actual weight, we might have expected their hands to move higher as they attempted to compensate for this illusory weight. The finding that participants in the BB group raised their hands over the course of the trial supports the hypothesis that they were compensating for the subjective sense that they were holding a weighted ball. On the contrary, participants in the BN group, who did not know that the ball in the picture was heavy, did not raise their hand. Though we might assume that this is because the BN group predicted that the ball must be as light as it appeared to be, we conducted an additional experiment to address the limitation revealed by this problem.

## **EXPERIMENT 1B**

In this follow-up experiment, minor changes were made in order to examine the dynamic process of heaviness contagion (i.e., a within-participants procedure) as well as entire arm movements (shoulder, elbow, wrist, and fingertip) for a longer period of time (90 s). Furthermore, the no ball group in the previous experiment was replaced with the light-ball group in the present experiment to control for prediction of the weight of a ball in a picture.

## **MATERIAL AND METHODS**

#### **Participants**

Eight participants (five males and three females, mean age = 27.8 years, range = 22−44 years) were randomly divided into two groups. Both groups saw a model's hand holding a ball and they also held a visually identical ball in their left hands. We used two balls as the weight stimuli with visually indiscernible differences: one was filled with sand (as in Experiment 1A: 1 kg); the other was not filled with sand (130 g). The first group held a weighted ball (heavy-ball group), whereas the second group held a non-weighted ball (light-ball group). The former group anticipated that the ball in the pictures was heavy, but the latter group anticipated that the ball was light.

#### **Apparatus**

We refurbished the apparatus, because the previous apparatus appeared to be unique. We used a virtual screen to exclude external noise (i.e., participants could only see the visual stimuli over a black background) in the previous experiment, expecting the participants to feel a sense of immersion. Furthermore, although the mid-space mouse device, which was used to measure hand movement, was not particularly light in weight (135 g), it might nevertheless produce results. In this experiment, the projector device (WT615J, NEC, Tokyo, Japan) presented the visual stimuli on the white board, located 1 m in front of the participants. We measured hand positions using a 3D motion-capture device. Participants attached four infrared reflection markers to the following body parts: shoulder (Position 1), elbow (Position 2), wrist (Position 3), and tip of the middle finger (Position 4). The 3D position of each marker was recorded using a video-based 3D acquisition system, which, in turn, used two high-speed CCD cameras (Himawari CL33; Library, Tokyo, Japan). The sampling rate was 100 Hz; we finally down-sampled to 1 Hz using averaging.
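Down-sampling by averaging can be sketched as below. This is a minimal Python illustration, not the authors' code; it assumes non-overlapping blocks (one block per second of recording) and silently drops any incomplete block at the end, and the function name is hypothetical.

```python
def downsample_by_averaging(samples, source_rate_hz, target_rate_hz=1):
    """Average consecutive, non-overlapping blocks of samples.

    With source_rate_hz=100 and target_rate_hz=1, every 100 samples
    collapse into one value (the mean position during that second).
    Assumption: trailing samples that do not fill a block are dropped.
    """
    if source_rate_hz % target_rate_hz != 0:
        raise ValueError("source rate must be a multiple of the target rate")
    block = source_rate_hz // target_rate_hz
    n_blocks = len(samples) // block
    return [
        sum(samples[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]
```

Averaging within each second, rather than keeping every hundredth sample, also smooths out frame-to-frame measurement jitter in the marker positions.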

#### **Procedure**

The visual stimuli were presented in front of each participant as they were seated, and they corresponded spatially to each participant's right arm. In this experiment, the pictures of the hand holding a ball changed mid-course into pictures with no ball. As in Experiment 1A, we instructed all the participants to hold their right hand in a horizontal position throughout the trial, which lasted 90 s. Our preliminary experiment suggested that 90 s was the approximate limit for which the hand could be held in an approximately horizontal position. Participants were also instructed to look at the visual stimuli, not their hand, as we could not use an occluder, since it could visually block the hand from the video cameras. The right arm was held out straight with fingers stretched in order to ensure a horizontal position during the course of a visual countdown of 3 s. The visual stimulus was presented from the time of zero and the recording of the hand position began. After 60 s, the image of a hand holding a ball was changed to one of a hand with no ball (see **Figure 5**); that is, a within-participants procedure was used in this experiment, whereas a between-participants procedure was employed in Experiment 1A. The order of the visual stimuli was fixed (that is, "with ball" first, and then "without ball") in the current experiment because it is possible that the participants would experience muscle fatigue during the latter half of the session (participants who are presented with the "without ball" image first and then the "with ball" image might not raise their hands because of muscle fatigue), which would result in differences between the counterbalanced groups that are not due to experimental manipulation. Each participant completed a single trial in which the following body parts were recorded: shoulder, elbow, wrist, and tip of the middle finger.

#### **Results and discussion**

The time courses of the hand positions of the participants indicated that the heavy-ball group tended to raise their right hand over their shoulders gradually while observing a model's hand holding a ball; however, after 60 s, when the image was changed to a picture of a hand without a ball, the hand started to lower. This indicates that the hand raising was based on their shoulder as a fulcrum point, because they might feel heaviness on the back of the hand as if it were the model's hand. Conversely, participants in the light-ball group lowered their hands gradually (**Figure 5**; **Figure A1** in Appendix).

We conducted a two-way ANOVA (two groups × two visual stimuli) to examine the movement velocity of the hand (i.e., fingertips; **Figure 6**). These analyses demonstrated a significant main effect of group [*F*(1, 6) = 6.00, *p* < 0.05], and a significant main effect of visual stimuli [*F*(1, 6) = 18.49, *p* < 0.01], but nonsignificant interaction [*F*(1, 6) = 0.67, *p* > 0.50]. It is clear that the trend to raise the right hand was observed during the presentation of the image of a model's hand holding a ball, when participants simultaneously held a visually identical heavy ball in their left hand, suggesting replication of Experiment 1A in a within-participants manner. Conversely, after 60 s, participants in both groups lowered their hands gradually, maybe because of expected muscle fatigue. The present experiment aimed to observe arm movement up to the limit of fatigue; however, there may be confounding between muscle fatigue and hand-lowering, though the rising hands started lowering after just 60 s from the beginning of the experiment (see **Figure 5**). We addressed this limitation in the following experiments.

Experiment 1B reconfirmed the "heaviness contagion" overall; observation of the model's hand holding a heavy ball was associated with raising of the hand. This could be driven predictively (merely the prediction of heaviness raises the hand of a participant) and mandatorily (that is, participants had to compensate for the illusory heaviness because they did not ignore it). However, a further question must be addressed: which mechanism would cause this phenomenon? The most probable mechanism is direct matching, where we directly map the observed sensation of other agents onto our own sensorimotor representation (Iacoboni et al., 1999). Recent studies have suggested that the direct matching system, which includes motor simulation, bodily resonance, and automatic imitation, might have a biological bias (Press et al., 2005; Tsai and Brass, 2007; Watanabe, 2008; Liepelt and Brass, 2010; Liepelt et al., 2010), indicating that we do not simulate non-human agents. Experiment 2A, with some changes in experimental procedure, was conducted to address this issue. In the current experiment, we presented "with ball" first, followed by "without ball," and the durations of the two visual stimuli were different (60 s for "with ball" and 30 s for "without ball") in order to confirm that the raising of the hand would continue for a longer time (as long as "with ball" was presented), compared to Experiment 1A (45 s). In the next experiment, we presented "with ball" in the middle of the session with the same duration as the other visual stimuli conditions.

## **EXPERIMENT 2A**

Our next aim was to show that heaviness contagion could be driven by observing a person, not by observing an object, because we should simulate a co-specific counterpart in terms of MNS. Furthermore, we made some minor changes. A model's hand without a ball was shown first, followed by the presentation of a model's hand holding a ball in order to control for hand-lowering caused by muscle fatigue. Participants also repeated trials in this experiment to indicate resistance to habituation.

## **MATERIAL AND METHODS**

#### **Participants**

A total of 17 participants (6 males and 11 females, mean age = 19.5 years, range = 19−21 years) took part in this experiment; however, one female dropped out because she could not keep her hand in a horizontal position during the trials.

#### **Apparatus**

The apparatus was changed slightly. In this experiment, we showed the visual stimuli on a 19-inch LCD display (LCD-AD19H, IO-DATA, Tokyo, Japan), located 60 cm in front of the participants. The visual stimuli were presented in front of each participant where they were seated, and corresponded spatially to each participant's right arm. We measured hand positions using a high-speed camera (EX-FC150, CASIO, Tokyo, Japan), which was located 1 m just behind the right arm when the arm was raised horizontally. The sampling rate was 120 Hz; we finally down-sampled to 1 Hz using averaging. An occluder prevented the participants from seeing their right arm.

#### **Procedure**

As in Experiment 1, we instructed each participant to hold the right hand in a horizontal position with their fingers stretched throughout the trial, which lasted 60 s. For the first 20 s, the image of a hand without the ball was presented. After 20 s, the image was changed to one of a hand holding a ball. Furthermore, after 40 s, the image of a hand holding a ball was changed to one of a ball on a wooden block. The first and second images were the same as those used in previous experiments, whereas the third was newly prepared, so that the size of the wooden block was approximately the same as a model's hand. All participants held a weighted ball (1 kg) in the left hand during each trial to ensure that they were aware of the weight of the ball in the picture. In this experiment, each participant repeated three trials, with a fourth trial being the baseline trial, throughout all of which the image of a hand without a ball was presented (60 s). We calculated the average of the data obtained from the first three trials, and then calculated the difference between that and the data of the fourth baseline trial with regard to the height of the hand. This was done because our pilot studies suggested that when participants repeated such trials, it might have become increasingly easy to lower their hand as the trials progressed, even if sufficient rest was taken before each trial (as with the results of Experiment 1B), possibly because of muscle fatigue. We recorded the position of the tip of the middle finger in this experiment.
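The baseline correction described above (mean of the three test trials minus the baseline trial) can be sketched in Python as follows. This is an illustrative sketch under the assumption that all trials are sampled at the same time points (for example, once per second after down-sampling); the function name and data layout are hypothetical.

```python
def baseline_corrected_height(test_trials, baseline_trial):
    """Mean hand height across test trials minus the baseline trial.

    `test_trials` is a list of equal-length height series (mm), one per
    test trial; `baseline_trial` is the series from the baseline trial,
    in which only the hand without a ball was shown.
    Returns one corrected height per time point.
    """
    n_samples = len(baseline_trial)
    if any(len(trial) != n_samples for trial in test_trials):
        raise ValueError("all trials must have the same number of samples")
    mean_test = [sum(values) / len(test_trials) for values in zip(*test_trials)]
    return [m - b for m, b in zip(mean_test, baseline_trial)]
```

Subtracting the baseline series removes any drift that occurs regardless of the image shown, such as gradual lowering of the hand through fatigue, so that what remains reflects the effect of the stimulus.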

#### **Results and discussion**

The time courses of the hand positions of the participants indicated that they could keep the hand almost immobile for the first 20 s (a model's hand with no ball), then tended to raise the hand gradually for the next 20 s (a model's hand with a ball), and then could again keep the hand almost immobile for the last 20 s (a ball on a wooden block; **Figure 7**). A one-way ANOVA (three visual stimuli conditions) was conducted to examine the movement velocity of the hand (**Figure 8**). These analyses demonstrated a significant main effect of condition [*F*(2, 30) = 4.42, *p* < 0.05], and *post hoc* Ryan's multi-comparison revealed significant differences between the first and second stimuli, and between the second and third stimuli (*p* < 0.05). These results suggest that the participants tended to raise the hand only while observing a weighted ball on a model's hand, and not while observing a ball on a wooden block.

As hypothesized, the heaviness contagion was induced by observing a person, indicating that direct matching might be the underlying mechanism (Iacoboni et al., 1999) and that MNS is the underlying neural mechanism (Rizzolatti and Craighero, 2004). A hand-shaped object was not used because previous studies have shown that its reality (i.e., its similarity to a real person's hand) might affect the simulation process of the observers (see General Discussion for detail). Although the present experiment suggested that an object shaped unlike a hand would not drive a feeling of heaviness in the observers, further research should address this issue (e.g., by using a wooden hand, a robotic hand, a xenogeneic hand, etc.). Although the current experiment suggests that the heaviness contagion as well as other motor simulation have a biological basis (Liepelt and Brass, 2010; Liepelt et al., 2010), previous studies, especially those in social psychology, have suggested that different people affect our simulation mechanisms differently (Calvo-Merino et al., 2006; Hein and Singer, 2008; Xu et al., 2009). The following final experiment examined the type of person, amongst a variety of people, who drives the heaviness contagion of observers.

## **EXPERIMENT 2B**

Finally, this experiment showed that a person who is similar to an observer could drive a feeling of heaviness in that observer; as in "like will to like." We manipulated the visual appearance between a model's hand and each participant's hand. It was hypothesized that only those participants whose hand was similar to a model's hand would be subject to heaviness contagion.

## **MATERIAL AND METHODS**

#### **Participants**

A total of 24 participants (four males and 20 females, mean age = 19.5 years, range = 18–24 years) were randomly divided into two groups, both of which saw a model's hand holding a ball, and also held a visually identical ball in their left hands. Participants in the first group wore a blue glove on their right hand, which was the same as the one that was worn on the model's hand (this was called the blue-glove group), whereas those in the second group wore a yellow glove (this was designated the yellow-glove group). Both gloves weighed 50 g.

#### **Apparatus**

The experimental device and environment were identical to those in Experiment 2A.

#### **Procedure**

As in previous experiments, we instructed each participant to hold the right hand, on which a glove was worn, in a horizontal position with their fingers stretched throughout the trial, which lasted 30 s. For the first 15 s, the image of a hand without the ball was presented. After 15 s, the image was changed to one of a hand holding a ball. All participants held a weighted ball (1 kg) in their left hand during each trial, so that they were aware of the weight of the ball in the picture. Each participant repeated three trials, with the fourth trial being the baseline trial, throughout all of which the image of a hand without a ball was presented (30 s), as in Experiment 2A. We recorded the position of the tip of the middle finger.

#### **Results and discussion**

The time courses of the hand positions of the participants indicated that those in both groups were capable of keeping the hand almost immobile for the first 15 s (a model's hand with no ball); however, during the last 15 s (a model's hand holding a ball), participants in the blue-glove group, who were wearing the same glove as worn on a model's hand, tended to raise their hands, whereas those in the yellow-glove group kept the hand still and almost immobile (**Figure 9**).

We conducted a two-way ANOVA (two visual stimuli × two groups) to examine the movement velocity of the hand (**Figure 10**). These analyses demonstrated a significant interaction [*F*(1, 22) = 5.53, *p* < 0.05], but no significant main effect of group [*F*(1, 22) = 1.29, *p* > 0.20] or visual stimuli [*F*(1, 22) = 2.30, *p* > 0.10]. The simple main effect of the group under the last visual stimuli (a model's hand holding a ball) condition and the simple main effect of visual stimuli under the blue-glove condition were significant (*p* < 0.05). These results suggested that only participants who wore the same glove as that worn by the model tended to raise their hand while observing a model's hand holding a weighted ball.

This suggested that we have specific targets for motor simulation, that is, a person who is "like me," as suggested in some previous studies (see General Discussion for detail). In the present experiment, participants who wore a glove that was different from that worn by the model did not feel illusory heaviness on their hand, whereas in the previous experiments, although the participants did not wear a glove, they felt an illusory weight. This may seem contradictory in the sense that the hands of both sets of participants were visually different from the model's hand. One reason for this may be that in the previous experiments, the participants perceived a model's hand as a neutral hand wearing a glove (the hand was merely one of others), whereas in the present experiment, a model wearing a glove that is different from that worn by the participants may appear as a person explicitly defined as different from the participants themselves (the hand was one of others that differ from mine), thereby indicating in-group vs. out-group identification bias (see General Discussion). We shall now explain our findings using the mechanism behind motor simulation and how this may be construed as an expression of empathy.

## **GENERAL DISCUSSION**

The results of the present study suggest that we may ourselves feel the heaviness felt by others, by observation alone ("heaviness contagion"). This new phenomenon might be driven predictively (i.e., in the present study, the participants predicted the feeling of heaviness experienced by another and raised their own hands), mandatorily (since they did not ignore it, participants in the present study needed to compensate for illusory heaviness; Experiment 1AB), and as a potential expression of empathy (the participants may have only responded to human counterparts, especially a person who was like them; Experiment 2AB). We shall discuss each factor with regard to extending motor simulation theory and the potential neural mechanism below.

#### **SIMULATION OF OTHERS' SENSATIONS IS PREDICTIVE**

In our daily life, we can share many kinds of feelings with others, which may promote our social interaction as a social animal (see Iacoboni, 2009; Thioux and Keysers, 2010). Some previous studies have suggested that this ability has been learned through our previous experiences, which are underpinned by neural-based learning, such as experience-based Hebbian learning, or an internal model that forms links between the sensory processing of actions and motor plans (Iacoboni, 2009). Therefore, we appear to be able to simulate the action of others only when that action is also part of our own repertoires, especially with regard to skilled actions (Calvo-Merino et al., 2005; Lahav et al., 2007; Aglioti et al., 2008). Furthermore, we may also simulate the action or mental states of others, through prediction or generalization based on a learned model, if this action or mental state is not one that is particularly complicated, even if this is something not previously experienced.

Patients with the rare syndrome of congenital insensitivity to pain showed normal fMRI responses to observed pain in the anterior mid-cingulate cortex and anterior insula, the so-called shared circuits for pain experienced by both the self and others (Danziger et al., 2009), indicating that although they could not feel pain subjectively, they could predict the sensation of it, despite no previous experience of pain.

In general, how we feel depends on our predictions. This is true even if the target is not included in our repertoire, as long as it is simple. In the size-weight illusion, smaller objects feel heavier than larger objects of the same weight, suggesting that we might predict weight from size, even for unfamiliar objects (Ross, 1966; Flanagan and Beltzner, 2000). In addition, we might see, hear, feel, taste, move, and perform as we predict (e.g., Barber and Calverley, 1964; Santarcangelo et al., 2005; Durgin et al., 2007; Plassmann et al., 2008; Castle et al., 2012). The present study suggested that this is also true in simulating others' sensations; we might be resonant with others as we predicted (Iacoboni et al., 2005), indicating that motor simulation, which might be realized by action-perception coupling (James, 1890), is one of our basic processes, as with other perceptual functions. However, it only targets people (human agents), not objects (non-human agents). The reason why this function could be driven through prediction is explained in the following discussion in terms of the target that we resonate with.

#### **SIMULATION OF OTHERS' SENSATIONS IS MANDATORY**

As a social animal, are we innately motivated to share feelings with others? Some previous studies have differentiated the brain activity associated with automatic and intentional empathy or imitation, by comparing merely seeing the facial expressions of others (evaluating skin color) with actively sharing the feelings they express (de Greck et al., 2012), or by comparing finger movements made in response to a spatial cue with movements that imitate that cue (Bien et al., 2009). Although these studies have suggested that we have an automatic and implicit function for simulation, "automatic" does not always mean "mandatory," because we may still have a veto. It is possible that we could role-play the behaviors of others implicitly and automatically to promote our social communications. Some other studies reported that observing an action made by a human interferes with executed actions (Kilner et al., 2003, 2007). Although these studies have suggested that we do not ignore the observed actions of others, the possibility of demand characteristics remains: study participants may speculate on the intention of the experimenters and behave as expected. This possibility should be carefully controlled for, especially in this topic, because empathy or motor simulation could be linked with the estimation of the intention of others (i.e., mind-reading; Singer, 2006). Study participants may be resonant not with the stimuli, but with the experimenter ("experimenter effects"). The results of the present study suggest a compensatory reaction to sensation transmitted from others. This might mean that the participants did not ignore the sensation, regardless of the expectation of the experimenters, since they were doubly blind to the purpose (our expectation was neither that the hand could be kept immobile, nor that the hand might be lowered in response to heaviness felt).

This mandatory process might be underpinned by its potential neural mechanism. Because the MNS does not distinguish between external (others) and internal (self) action representation, it allows the individual to gain an experiential knowledge of the observed action in the absence of any motor output, as if that person is the agent of the action (Rizzolatti and Craighero, 2004). This indicates that we also need the process of distinguishing between representation of the action of the self and of others, such as the "who system" or the sense of agency or body ownership (Jeannerod and Pacherie, 2004; Schutz-Bosbach and Prinz, 2007) in order to inhibit such a mandatory contagion in situations such as those used in the present experiments. These functions might share the same circuit in our brain (Miall, 2003). This distinguishing mechanism could contribute to the compensatory reaction to feelings of heaviness. We can see that the participants totally disagreed, at least subjectively, with the assertion that a model's hand on the screen appeared to be like their own hand (see **Figure 4**). They did not prevent the contagion from others, but simultaneously knew that it was not their own hand, which might lead to the need to adjust to the perceived illusory weight. This is not conclusive at the moment; however, it is essential to discuss self-other representation comprehensively in further research: simultaneously connecting and distinguishing between the functions of the self and others.

#### **WHO IS THE TARGET OF OUR SIMULATION?**

Just as we do not constantly simulate, we also do not simulate everybody. Previous studies have suggested that the amplitude of empathic brain responses is modulated by the similarities between the self and others, such as gender, race, or previous experience, through observation (Calvo-Merino et al., 2006; Hein and Singer, 2008; Xu et al., 2009). A computational model-based approach explains that this is not only because of this sense of familiarity but also because individuals can predict the mental state or action representation of others, based on their own knowledge or learned model (Wolpert et al., 2003; Schutz-Bosbach and Prinz, 2007). Mirroring others might help to understand what another person is doing or feeling, or to predict what that individual is most probably going to do next (Blakemore and Frith, 2005; Iacoboni et al., 2005). Thus, this prediction is modulated by top-down processing, similar to animacy perception (Liepelt and Brass, 2010; Liepelt et al., 2010), the impossibility of the action (Longo et al., 2008), or spatial compatibility (Bertenthal et al., 2006). The similarities between observers and targets, even if it is a simple visual appearance as examined in the present study, might enhance an observer's predictability of others for a simulation. The similarity effect may affect simulation responses through the tendency of an observer to identify more closely with others who appear to be similar to themselves, with regard to features such as personality, visual appearance, cultural likeness, sentience, or circumstance (Gruen and Mendelsohn, 1986; Brown et al., 2006), that is, in-group empathy (Rae Westbury and Neumann, 2008).

This may also be true of the difference between humans and other animals, or objects. It has been well documented that the MNS might be activated when observing conspecific counterparts (Gallese and Goldman, 1998), and, in line with this, some studies have suggested that the amplitude of empathic responses is also modulated by the phylogenetic similarity between the observers and their targets (Hills, 1995; Rae Westbury and Neumann, 2008). In addition, motor simulation has a biological bias (Press et al., 2005; Tsai and Brass, 2007; Watanabe, 2008; Liepelt and Brass, 2010; Liepelt et al., 2010), indicating that we do not simulate nonhuman agents. Nevertheless, other previous studies show that it is possible to be resonant with those who are different from us, such as people with different cultural backgrounds, animals, cartoon characters, and artificial objects, even early in life (Abell et al., 2000; Buccino et al., 2004; Hamlin et al., 2007; Perry et al., 2010). We can feel pain on the virtual or artificial hand (Ehrsson et al., 2007; Hägni et al., 2008), whereas observing an action made by a robot might not interfere with executed actions (Kilner et al., 2003). However, action-speed contagion might be driven by point-light biological motions (Watanabe, 2008) or the motor priming effect, which is an expression of motor simulation that is possibly modulated by beliefs about animacy or even virtualness of the hand (Longo and Bertenthal, 2009; Liepelt and Brass, 2010). Although it is also possible that biological tuning of motor simulation is highly action-selective (Liepelt et al., 2010), it might be presently difficult to form clear criteria for differentiating between the agents that we can be resonant with and the ones that we cannot. 
Nevertheless, since illusory body ownership of an artificial object might depend on its corporeality (Tsakiris et al., 2010), as the present study also suggested, we might again assume the importance of a similarity between observers and targets, which could make us feel closer to others (even animals or objects), and therefore to which we could apply our own knowledge. However, there is still a large gap between the lower level of self-other representation such as sensorimotor direct matching or motor simulation and the higher level of it such as top-down biological bias or in-/out-group empathy. Therefore, future research should tackle this problem in terms of social cognition (Farmer et al., 2012).

#### **LIMITATION OF THE CURRENT STUDY**

The present study suggested a new behavioral phenomenon of motor simulation in order to develop a background theory. The behavioral evidence of motor simulation, however, is not always compatible with neuroscientific or subjective report studies. Observing others' actions evokes cortical activation (Iacoboni et al., 1999) but it does not evoke the execution of the movement; an exception is people with pathological conditions (for a review, see Bertenthal et al., 2006). We can observe this through the facilitation in reaction time when observers perform the same (e.g., Liepelt and Brass, 2010; Liepelt et al., 2010) or even an unrelated action (Brass et al., 2000; Watanabe, 2008). Furthermore, our brain is activated in response to observed tactile stimuli to others (Keysers et al., 2004); however, except for specific people with mirror-touch synesthesia (Blakemore et al., 2005) who could have enhanced subjective empathy traits, we do not generally feel this tactility in reality (Banissy and Ward, 2007). As discussed, this may be because of the inhibition process that we possess to block automatic contagion. Therefore, to increase the behavioral response of study participants, our experimental methodology used a unique procedure: participants held a ball during trials, rather than only feeling its heaviness before trials. This might introduce an artifact (that is, a potential effect of holding a ball), although this was carefully controlled for in our experiments (see Experiment 1A). Further studies should refine what information would be needed from others, as well as how and when it is needed, in order to elicit heaviness contagion.

## **REFERENCES**


nonconspecifics: an FMRI study. *J. Cogn. Neurosci.* 16, 114–126.


## **ACKNOWLEDGMENTS**

This work was supported by Grant-in-Aid for JSPS Fellows (22–415). We would like to thank Dr. Kohske Takahashi and Dr. Katsumi Watanabe for their help with the experimental settings and for their comments on the early results of the study.

organization to intention understanding. *Science* 308, 662–667.


computational framework for motor control and social interaction. *Philos. Trans. R. Soc. Lond. B Biol. Sci.* 358, 593–602.

Xu, X., Zuo, X., Wang, X., and Han, S. (2009). Do you feel my pain? Racial group membership modulates empathic neural responses. *J. Neurosci.* 29, 8525–8529.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 May 2012; accepted: 24 June 2012; published online: 13 July 2012. Citation: Asai T, Sugimori E and Tanno Y (2012) The body knows what it should do: automatic motor compensation for illusory heaviness contagion. Front. Psychology 3:244. doi: 10.3389/fpsyg.2012.00244 This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Asai, Sugimori and Tanno. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

## When does perception facilitate or interfere with conceptual processing? The effect of attentional modulation

#### *Louise Connell <sup>1</sup> \* and Dermot Lynott <sup>2</sup> \**

*<sup>1</sup> Embodied Cognition Lab, School of Psychological Sciences, University of Manchester, Manchester, UK*

*<sup>2</sup> Embodied Cognition Lab, Decision and Cognitive Sciences Research Centre, Manchester Business School, University of Manchester, Manchester, UK*

*\*Correspondence: louise.connell@manchester.ac.uk; dermot.lynott@manchester.ac.uk*

#### *Edited by:*

*Judith Holler, Max Planck Institute Psycholinguistics, Netherlands*

#### *Reviewed by:*

*Guido Gainotti, Policlinico Gemelli, Italy*

Conceptual processing relies on the perceptual system, and, as such, perception affects conception. Many studies have demonstrated perceptual-conceptual interference, where perceptual stimulation in a particular modality leads to slower and/or less accurate conceptual processing of information from the same modality (e.g., Kaschak et al., 2005, 2006; Vermeulen et al., 2008). However, many other studies have demonstrated perceptual-conceptual facilitation, where perceptual stimulation leads to faster and/or more accurate conceptual processing in the same modality (Kaschak et al., 2006; van Dantzig et al., 2008; Connell et al., 2012; Connell and Lynott, in preparation).

At first glance, this apparent discrepancy seems like a serious problem for accounts of simulation-based concepts. Such theories hold that offline representations (that is, representations of objects and events that are not in the current environment; Wilson, 2002) are functionally comprised of partial replays (i.e., simulations) of the neural activation captured during perceptual, motor, affective, and other experience (Barsalou, 1999, 2008; Glenberg and Gallese, 2012; Connell and Lynott, submitted). If conceptual representations therefore require modality-specific perceptual simulation, then why do they not consistently interact with perception? Why does perceptual stimulation sometimes impair and sometimes facilitate conceptual processing?

One account proposed that the difference lies in whether perceptual stimulation is concurrent with the conceptual task or precedes it, and whether or not the perceptual stimulus can be easily integrated into the simulation required by the conceptual task (Kaschak et al., 2005). According to this account, interference occurs when a concurrent perceptual stimulus cannot be integrated into the simulation required by the conceptual task. For example, Kaschak and colleagues argued that an upward-scrolling visual display could not be easily integrated with the sentence *The cat climbed the tree*, and hence interfered with its simulation. Facilitation occurs when a perceptual stimulus can be easily integrated into a simulation, regardless of whether the perceptual and conceptual components of the trial are presented concurrently or sequentially. For example, an image of a car would facilitate understanding a sentence like *The car approached you*.

However, this account cannot easily explain later findings. For example, concurrent tactile stimulation, in the form of vibrations to the palms and fingers, facilitates people's ability to judge the size of manipulable objects (Connell et al., 2012). Vibrotactile stimulation seems at least as distant from object representations of wallets and keys as upward-scrolling lines are from a cat climbing a tree. Yet, even though both perceptual stimuli appear "nonintegratible," the former produced facilitation and the latter interference.

## **ROLE OF ATTENTION**

We propose that these apparently discrepant effects can be resolved if one considers the attentional demands each task places on modality-specific processing. The perceptual and attentional systems are intertwined, and, since the conceptual and perceptual systems share modality-specific neural substrates, it should come as no surprise that they also share associated attentional mechanisms (e.g., Pecher et al., 2003; Connell and Lynott, 2010). Interference emerges when the perceptual stimulus occupies attention and leaves few resources free for simulation purposes. For example, a moving stimulus changes over time, and, as such, continuously captures attention in order to monitor its motion. Because a perceptual stimulus automatically directs exogenous attention toward that modality (e.g., Spence et al., 2001), processing a changing percept will wrest attention away from simulating in that modality and lead to interference effects. Conscious perceptual imagery, such as manipulation or memory rehearsal of perceptual information, will also occupy modality-specific attentional resources, and hence interfere with simulation in that modality.

In contrast, facilitation emerges when the perceptual stimulus directs attention toward a particular perceptual modality but leaves adequate resources free for simulation purposes. Selectively attending to a particular perceptual modality, even in the absence of a target, increases activation in the corresponding sensory cortex at the expense of other modalities (Foxe et al., 2005; Mozolic et al., 2008; Langner et al., 2011). That is, attention alone can preactivate modality-specific perceptual systems so that subsequent target processing in that modality is facilitated. All else being equal, perceptual processing is hence faster in an attended than an unattended modality (Spence et al., 2000, 2001; Töllner et al., 2009).

In principle, both interference and facilitation can happen in concurrent and sequential presentation paradigms. For example, if a perceptual stimulus is presented concurrently with a conceptual task, it would interfere if it changes over time and continuously occupies attentional resources in that modality and would facilitate if it does not change and instead leaves that modality in an attentionally primed state. Similarly, if a perceptual stimulus has completed its presentation before a conceptual task, it would interfere if it is still occupying attentional resources and would facilitate if it no longer occupies attentional resources. Moreover, the sensory cortices are not homogeneous, but rather contain some degree of feature specialization. In the visual modality, for instance, upward and downward motion are processed in different cell assemblies in the visual cortex (Mather et al., 1998), and attending to a particular direction of motion can increase activation in that direction-specific detector (Kamitani and Tong, 2006). As such, attentional effects can operate at either a whole-modality or a feature-specific level.
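The whole-modality version of this account can be summarized as a simple decision rule. The following is a minimal sketch only: the class name, field names, and function name are hypothetical, and the "none" outcome for mismatched modalities is a simplifying assumption rather than a claim made in the text.

```python
from dataclasses import dataclass


@dataclass
class PerceptualStimulus:
    modality: str                    # e.g. "auditory", "visual", "tactile"
    still_occupies_attention: bool   # True if it must be monitored or rehearsed
                                     # while the conceptual task is performed


def predicted_effect(stimulus: PerceptualStimulus, concept_modality: str) -> str:
    """Qualitative effect on conceptual processing under the attentional account."""
    if stimulus.modality != concept_modality:
        # Simplifying assumption: only same-modality effects are modeled here.
        return "none"
    if stimulus.still_occupies_attention:
        # Attention is tied up, leaving too few resources for simulation.
        return "interference"
    # Attention was directed to the modality but is now free for simulation.
    return "facilitation"


# Illustrative cases: a moving sound that must be monitored, and a constant vibration.
moving_sound = PerceptualStimulus("auditory", still_occupies_attention=True)
steady_vibration = PerceptualStimulus("tactile", still_occupies_attention=False)

print(predicted_effect(moving_sound, "auditory"))
print(predicted_effect(steady_vibration, "tactile"))
```

Note that the rule does not take presentation timing (concurrent or sequential) as an input; what matters in this sketch is only whether the stimulus still occupies attention when the conceptual task is performed.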

## **OVERVIEW OF EFFECTS**

It is important, when disentangling facilitation and interference effects, to compare like with like. For that reason, we focus here on studies that (1) combine perceptual stimulation with a linguistic conceptual task and (2) measure responses to the linguistic conceptual task.

## **INTERFERENCE**

A number of studies have shown interference effects because perceptual stimulation occupies attention in that modality, leaving insufficient resources for simulation.

In a concurrent paradigm, Kaschak et al. (2006: Experiments 1, 3) presented an auditory motion stimulus (i.e., an auditory illusion where the source of the sound appears to change location: upwards, downwards, towards or away) while participants read sentences onscreen that described auditory motion in a particular direction (e.g., *The jet pack roared into the sky*). People were slower to judge the sentences as sensible when they described the same direction of motion as the perceptual stimulus. Here, the motion in the perceptual stimulus meant that it continuously grabbed auditory attention as it changed over time. Auditory attention was therefore occupied in monitoring motion in a particular direction, and so there were insufficient attentional resources free when the sentence called for auditory simulation of motion in the same direction. Hence, the perceptual stimulus interfered with conceptual processing. The same account applies to Kaschak et al.'s (2005) studies of visual motion.

In a sequential presentation paradigm, Vermeulen et al. (2008) asked participants to first memorise auditory or visual stimuli (e.g., a series of visual shapes), then respond to a modality-specific property verification question (e.g., *lemon can be yellow*), and finally judge if another perceptual stimulus had been presented at the start of the trial. They found that property verification was slower when people held a perceptual memory load in the same modality. Here, although the perceptual and conceptual stimuli were presented in sequence, perceptual and conceptual processing effectively occurred concurrently because the memory load required imagistic rehearsal (i.e., conscious and effortful simulation) of the perceptual stimulus. In other words, the memory load task occupied modality-specific attentional resources, and so interfered with conceptual processing in that modality.

## **FACILITATION**

Several other studies have shown facilitation effects because the perceptual stimulus directed attention to a particular perceptual modality without occupying resources.

In a concurrent paradigm, Kaschak et al. (2006: Experiments 2, 3) asked participants to listen to sentences over headphones that described auditory motion in a particular direction (e.g., *The jet pack roared into the sky*) while, in the background of the spoken sentence, an auditory motion stimulus was played. People were faster to judge that sentences were sensible when they described motion in the same direction as the auditory stimulus. Here, participants actually experienced two auditory stimuli: a perceptual stimulus of auditory motion and a speech stream delivering information for the linguistic conceptual task. Since the task goal of sensibility judgement required participants to listen closely to the sentence, their auditory attention was occupied by the speech stream and not by monitoring perceptual motion. As such, the perceptual stimulus directed attention toward motion in a particular direction, and hence facilitated simulation of auditory motion in that direction. These findings contrast with the interference effects found for auditory motion in the same paper when the sentences were presented in visual (text) form. When the perceptual motion stimulus is the only thing presented in that modality, attention will be occupied in monitoring its change over time, and simulation of same-direction motion in that modality will suffer from insufficient resources. But when the perceptual motion stimulus is presented in the same modality as a goal-relevant stimulus (i.e., something that requires a response, such as a sentence that must be judged as sensible or not), then the latter stimulus will have attentional priority. The perceptual motion will be perceived but not monitored (meaning it directs attention but does not continue to occupy it), and so simulation of same-direction motion in that modality will be easier (see also Zwaan and Taylor, 2006: Experiments 3, 5).

In a different concurrent paradigm, we stimulated people's hands or feet with tactile vibrations while asking them to compare the size of manipulable objects (e.g., *Which is bigger? wallet* or *key*: Connell et al., 2012). People were faster to name the relevant object when their hands were stimulated compared to their feet. Because the vibrotactile stimulation was constant and unchanging, it did not require monitoring and simply directed attention to the tactile modality in a somatotopic manner (i.e., the hand or foot area of the somatosensory cortex). Tactile stimulation to the hands therefore facilitated conceptual processing of objects whose simulations contained hand-related tactile information, while having no effect on objects whose representations did not include this information (e.g., *yacht*). The same effects emerged for proprioceptive stimulation. In other words, perceptual stimulation directed attention to modality-specific, body-specific systems and made simulation of such information easier.

Perceptual attention does not have to be directed by an exogenous stimulus but can also be endogenously directed as part of the implicit demands of a task. In recent work, we hypothesized that reading is, in effect, a concurrent paradigm of perceptual and conceptual processing. Both lexical decision and naming tasks involve recognition of visual word forms, and, as such, implicitly direct attention to the visual modality. Hence, we found that strongly visual words (i.e., referring to concepts with a strong visual component) elicit faster and more accurate lexical decision and naming responses than weakly visual words, even when other variables such as length and frequency have been controlled (Connell and Lynott, in preparation). Furthermore, saying the same words aloud in a naming task, where the goal is correct pronunciation, also directs attention to the auditory modality. As a result, strongly auditory words are named more quickly and accurately than weakly auditory words. Indeed, such modality-specific attentional priming effects may be one of the main underlying reasons for concreteness effects in reading tasks (Connell and Lynott, 2012).

Finally, in a sequential paradigm, van Dantzig et al. (2008) asked participants to respond to a perceptual stimulus (visual light, auditory white noise, tactile vibration) and then to a property verification task (e.g., visual *broccoli is green*). People were faster to verify a property in the same modality as the preceding perceptual stimulus. Here, the perceptual stimulus directed attention toward its modality but did not require any further resources once the response was made, which meant that subsequent conceptual processing in that modality was facilitated (see also Vermeulen et al., 2009).

#### **WHAT ABOUT ACTION?**

Similar combinations of facilitation and interference effects have been observed in studies of action and motor simulation, but we do not address them here because these studies tend to differ from those of perceptual simulation in one key respect. Perceptual simulation studies like those discussed above measure their dependent variable on a response act that is *unrelated* to the experimental manipulation (e.g., pushing a button, speaking aloud). In contrast, motor simulation studies typically involve measuring motor responses to action-related words and sentences (e.g., Glenberg and Kaschak, 2002; Boulenger et al., 2006; Zwaan and Taylor, 2006; Kaschak and Borreggine, 2008), and, as such, measure their dependent variable on a response act that is a *function of the experimental manipulation*. The net result of combining the manipulated and response modalities is to render it difficult to separate the effects of simulation on action from the effects of action on simulation, and to make the allocation of attentional resources susceptible to subtle differences in timing. By illustration, effects vary between interference and facilitation depending on the point in time that participants are made aware of the required action (Kaschak and Borreggine, 2008), the tense of verbs employed in the linguistic conceptual task (de Vega et al., 2004; Bergen and Wheeler, 2010), and the possibility of having to interrupt an action mid-execution (Boulenger et al., 2006). For these reasons, the picture of facilitation and interference effects in most motor simulation studies is more complex and variable than that in perceptual stimulation studies.

## **REFERENCES**


two things at once: temporal constraints on actions in language comprehension. *Mem. Cognit.* 32, 1033–1043.


Zwaan, R. A., and Taylor, L. (2006). Seeing, acting, understanding: motor resonance in language comprehension. *J. Exp. Psychol. Gen.* 135, 1–11.

*Received: 08 October 2012; accepted: 16 October 2012; published online: 02 November 2012.*

*Citation: Connell L and Lynott D (2012) When does perception facilitate or interfere with conceptual processing? The effect of attentional modulation. Front. Psychology 3:474. doi: 10.3389/fpsyg.2012.00474*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Connell and Lynott. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Embodied space in early blind individuals

#### *Virginie Crollen <sup>1,2</sup> and Olivier Collignon <sup>3</sup> \**

*<sup>1</sup> Centre de Neuroscience Système et Cognition, Institut de Recherche en Sciences Psychologiques, Université Catholique de Louvain, Louvain, Belgium*

*<sup>2</sup> Centre de Recherche en Neuropsychologie et Cognition, Université de Montréal, Montréal, QC, Canada*

*<sup>3</sup> Centre for Mind/Brain Science, University of Trento, Trento, Italy*

*\*Correspondence: olivier.collignon@unitn.it*

#### *Edited by:*

*Louise Connell, University of Manchester, UK*

#### *Reviewed by:*

*Stephanie A. Gagnon, Massachusetts General Hospital and Harvard Medical School, USA*

The study of visually deprived individuals offers a unique opportunity to investigate the role that vision plays in shaping how we process our surrounding space. The visual system typically provides the most accurate and reliable spatial information about our surroundings and therefore is usually considered as the primary sense when spatial processing is at play. One of the best examples of such visual dominance in space perception comes from experiments showing that when a sound is accompanied by a visual stimulus at a different location, people tend to perceive this sound incorrectly at the same position as the visual stimulus (Pick et al., 1969). This "ventriloquist" effect occurs because the brain affords more weight to visual information in localizing the audiovisual event, thus inducing a "visual capture" of acoustic space (Alais and Burr, 2004).
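The idea that the brain "affords more weight" to the more reliable sense is often formalized as reliability-weighted averaging, in which each cue is weighted by the inverse of its variance. The text above does not give a formula, so the following is only a minimal sketch under that assumption; the function name and all numerical values are hypothetical.

```python
def combine_location_estimates(visual_pos, visual_sd, auditory_pos, auditory_sd):
    """Combine two location estimates, weighting each by its reliability (1 / variance)."""
    visual_reliability = 1.0 / visual_sd ** 2
    auditory_reliability = 1.0 / auditory_sd ** 2
    visual_weight = visual_reliability / (visual_reliability + auditory_reliability)
    combined = visual_weight * visual_pos + (1.0 - visual_weight) * auditory_pos
    return combined, visual_weight


# Hypothetical trial: a flash at 0 degrees and a sound at 10 degrees,
# with vision assumed to be three times more precise (smaller SD) than audition.
perceived, visual_weight = combine_location_estimates(
    visual_pos=0.0, visual_sd=1.0, auditory_pos=10.0, auditory_sd=3.0
)
print(f"visual weight = {visual_weight:.2f}, perceived location = {perceived:.2f} deg")
```

With these assumed values the visual weight is about 0.9, so the combined estimate lies about 1 degree from the flash and far from the true sound position, which is the pattern described as visual capture. Under the same rule, degrading the visual estimate (a larger `visual_sd`) would shift the weight toward audition.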

It was first thought that visual deprivation might be detrimental to the development of spatial abilities in the remaining modalities since vision may be required to calibrate the other sensory systems (Axelrod, 1959; Rock and Halper, 1969; Warren and Cleaves, 1971). Interestingly, this does not appear to be the case since several studies have shown that blind people are usually as good as, and often better than, normally sighted controls (SC) in the processing of non-visual spatial inputs (Lessard et al., 1998; see Collignon et al., 2009a for review). The recurrent hypothesis to explain such findings is that vision loss is partly offset by an increased use of the remaining senses (Wong et al., 2011), which triggers enhancement in their efficiency concomitantly with compensatory brain reorganization processes (Gougoux et al., 2005; Collignon et al., 2011). While this may be part of the story, another possibility that we want to address in the present paper is that, aside from quantitative differences between sighted and blind people in their perceptual skills, visual deprivation may result in qualitatively different ways of processing non-visual information (Eimer, 2004). While sighted people may automatically process spatial information in an external spatial frame of reference,<sup>1</sup> early blind (EB) participants may preferentially use an internal coordinate system.<sup>2</sup>

Bradshaw et al. (1986) were the first to suggest a qualitative difference in the way EB individuals process touch. In this study, a rod was placed within a shorter pipe. EB and SC participants were asked to slide the rod within the pipe until the rod extremities were judged equidistant from the ends of the pipe. Results demonstrated that SC placed the midline of the pipe slightly to the left of the true midpoint (leftward bias or pseudo-neglect; see Jewell and McCourt, 2000 for a review) with hands placed in parallel or crossed over the body midline. EB participants, in contrast, showed a leftward bias with hands in parallel (see also Sampaio et al., 1995) but a rightward bias with the arms crossed. Although this effect was discussed only briefly, the authors nonetheless interpreted it as reflecting a more internal representation of space in EB. According to the view that the right hemisphere plays a dominant role in attentional control, the leftward bias shown by sighted individuals may be due to the fact that the right hemisphere biases attention to the left visual space, so that rods appear longer in the contralateral left hemifield. The reversed pseudo-neglect effect presented by the EB in the crossed posture may in contrast indicate that the right hemisphere of these participants actually affords more attention to the contralateral tactile space, therefore leading to an overestimation of the side of space where the left hand is placed.

More recently, Röder et al. (2004) brought new and more compelling evidence in support of this idea. In their temporal order judgment (TOJ) task, participants were asked which of the two hands received a tactile stimulus first. As expected, SC were less accurate with crossed than with uncrossed hands (Yamamoto and Kitazawa, 2001; Shore et al., 2002). This is accounted for by the fact that tactile stimuli are not only represented in an anatomical reference frame but are automatically remapped into external spatial coordinates, inducing a conflict between somatotopic and external spatial codes when the hands are crossed over the body midline (Pavani et al., 2000; Kitazawa, 2002; Shore et al., 2002; Azañón and Soto-Faraco, 2008; Azañón et al., 2010a). By contrast, crossing the hands did not lead to a general decrement in EB tactile discrimination performance (Röder et al., 2004), suggesting that the automatic external remapping of touch is not innate but rather depends on early visual experience (see also Bremner et al., 2008). This idea was later supported by an electroencephalographic study. While the detection of deviant tactile stimuli on the hand induced event-related potentials that differed between the crossed and uncrossed conditions in SC, changing the posture of the hand had no influence on EB brain activity (Röder et al., 2008).

<sup>1</sup> In an external reference frame, locations are represented within a framework external to the individual's body and therefore independent of the position of his limbs.

<sup>2</sup> In an internal reference frame, locations are represented with respect to the position of the observer's body and are therefore dependent on the position of his limbs.

The lower incidence of using an external reference frame in EB individuals has also been observed in tasks investigating the multisensory control of action (Simon effect: Röder et al., 2007), the processing of numbers (SNARC effect: Crollen et al., 2011), and spatial navigation (Vecchi et al., 2004; Noordzij et al., 2006). When required to press a left or right response key depending on the bandwidth of a sound presented from a left or right loudspeaker, EB reacted like late blind (LB) and SC participants in an uncrossed hand posture: they performed better when the spatial location of the sound was compatible with the spatial location of the response key (i.e., Simon effect). In contrast, when participants performed the task with crossed hands, EB performed more rapidly than their sighted peers and, interestingly, presented a reversal of the Simon effect while LB and SC still showed a classic Simon effect (Röder et al., 2007). The presentation of a sensory stimulus to SC and LB therefore primes the response key compatible with the location of the stimulus in external space, regardless of which anatomical hand is used to press it. In EB, in contrast, the sensory stimulus primes the anatomical hand congruent with the location of the stimulus, regardless of where in space that hand is placed. SC, LB, and EB also presented a similar behavioral pattern when performing a numerical comparison task in an uncrossed hands posture. They responded faster when a left response was required for numbers smaller than five and when a right response was required for numbers larger than five (i.e., SNARC effect). As in the Simon task, however, crossing the hands resulted in a reversal of the SNARC effect in EB participants only (Crollen et al., 2011). The fact that LB and SC participants were similarly affected by crossing the hands indicates that once an external frame of reference is acquired it will continue to be used even though visual information may no longer be available (Röder et al., 2004, 2007; Crollen et al., 2011). Finally, differences between blind and sighted subjects have also been highlighted in spatial navigation tasks.
While tasks requiring the use of an egocentric reference frame (i.e., route-knowledge) are performed equally well by SC and EB, tasks requiring the use of an allocentric reference frame (i.e., survey knowledge) are performed less well by the EB than by the SC (Vecchi et al., 2004; Noordzij et al., 2006).
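
The contrast between the two reference frames in the Simon task can be made concrete with a small sketch. It is only an illustration of the logic described above, not code from any of the cited studies; the function names `hand_on_key` and `is_compatible` and the string labels are hypothetical. A trial counts as compatible under the external frame when the stimulus side matches the key location, and under the anatomical frame when it matches the hand that presses the key.

```python
def hand_on_key(key_side: str, posture: str) -> str:
    """Return the anatomical hand resting on the key located at `key_side`."""
    if posture == "uncrossed":
        return key_side
    if posture == "crossed":
        return "left" if key_side == "right" else "right"
    raise ValueError(f"unknown posture: {posture}")


def is_compatible(stimulus_side: str, key_side: str, posture: str, frame: str) -> bool:
    """Classify a trial as compatible under the given (hypothetical) frame label."""
    if frame == "external":
        # Stimulus location is compared with the key's location in space.
        return stimulus_side == key_side
    if frame == "anatomical":
        # Stimulus location is compared with the hand that presses the key.
        return stimulus_side == hand_on_key(key_side, posture)
    raise ValueError(f"unknown frame: {frame}")


# Same trial (left sound, left key) classified under both frames and postures.
for posture in ("uncrossed", "crossed"):
    for frame in ("external", "anatomical"):
        print(posture, frame, is_compatible("left", "left", posture, frame))
```

With uncrossed hands the two frames classify every trial identically, so all groups show the same effect. With crossed hands the classifications are opposite, which is one way to read the reversed Simon effect reported for EB participants.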

At this stage, one may wonder why sighted individuals automatically remap touch into external coordinates, since doing so can lead to confusion and slow down their reaction times (RT) when discriminating tactile information. This automatic remapping from somatotopic to external space is actually very effective in providing a common framework to coordinate and integrate spatial information obtained through touch with spatial information obtained through other sensory modalities, such as vision or audition, which are coded by default in external spatial coordinates. This is particularly critical since the hands move constantly in peri-personal space as different postures are adopted. The default use of an anatomically anchored reference system in EB may therefore actually prevent the effective integration of different sensory modalities in a multisensory integration task.

In a recent study, EB, LB, and SC groups were required to lateralize auditory, tactile, and audio-tactile stimuli either with the hands uncrossed or crossed over the body midline (Collignon et al., 2009b). While performance in the tactile condition replicated the pattern of results found in previous studies (greater detrimental effect of the crossed posture in LB and SC relative to EB), the results of the auditory and audio-tactile conditions showed a greater detrimental effect of the crossed posture in EB. As mentioned earlier, when EB lateralize tactile stimuli in a crossed posture they do not remap the proprioceptive information onto an external spatial frame of reference and therefore do not present the conflict between body-centered and external coordinates that is present in SC or even in LB (Röder et al., 2004). EB therefore process spatial tactile information faster than their sighted peers. In contrast, the absence of automatic external remapping of touch in EB actually prevents these participants from efficiently matching the external sound location and the anatomical coordinate of the responding (auditory condition) or stimulated (audio-tactile condition) hand. The conflict created by crossing the hands is therefore more disruptive in EB than in SC or LB in the auditory and audio-tactile conditions (see also Röder et al., 2007, Experiment 2). In other words, the absence of automatic activation of an external reference frame for perception and action in EB may impair multisensory integration and action control when there is a conflict between anatomical and external reference frames, for instance, when a sound has to be integrated with a touch in a hand-crossed posture (Collignon et al., 2009b).

In sum, developmental vision appears to trigger the development of the automatic recoding of sensory perception and motor control in an external space. Our opinion is that some of the advantages/deficits observed in EB (e.g., faster/slower RTs to non-visual events) might be explained, at least in part, by such qualitative changes in the way they process non-visual spatial information. For example, one of the most recurrent findings in the blind literature is the observation of faster RT to non-visual spatial targets in EB when compared to SC (e.g., Kujala et al., 1997; Hötting et al., 2004; Collignon et al., 2006, 2009b; Collignon and De Volder, 2009). Since the automatic external remapping process appears to occur between 100 and 360 ms (Azañón and Soto-Faraco, 2008; Heed and Röder, 2010; Overvliet et al., 2011), blind participants who do not automatically remap tactile/spatial information in external space may not only be more resistant to the conflict created by a crossed hand posture but may also process spatial information some hundreds of milliseconds faster than sighted individuals. Indeed, in the TOJ (Röder et al., 2004), Simon (Röder et al., 2007), and SNARC (Crollen et al., 2011) experiments described above, EB participants consistently showed faster RT to non-visual targets in the uncrossed posture. Indirect support for the idea that EB may somehow "skip" the external remapping computational step also comes from our observation that transcranial magnetic stimulation over the right intra-parietal sulcus (where the external remapping seems to occur: Makin et al., 2007; Azañón et al., 2010b) disrupted the spatial processing of sounds in SC but not in EB (Collignon et al., 2009c).

Interestingly, using either an internal or an external frame of reference appears to facilitate performance on different tasks. While the default use of an internal reference frame leads to better performance in tactile lateralization tasks in EB, the use of an external frame of reference is better suited to spatial navigation (Noordzij et al., 2006), multisensory integration, and the control of action toward external auditory sources in peri-personal space (Röder et al., 2007; Collignon et al., 2009b). It is however important to note that the spontaneous tendency to organize the environment through internal coordinates in EB does not mean that they are incapable of constructing an external coordinate system (see Eardley and van Velzen, 2011); rather, this form of encoding is less automatic than the anatomical one in this population.

## **References**


multisensory processing in peripersonal space. *Neuropsychologia* 47, 3236–3243.


*Received: 08 June 2012; accepted: 16 July 2012; published online: 01 August 2012.*

*Citation: Crollen V and Collignon O (2012) Embodied space in early blind individuals. Front. Psychology 3:272. doi: 10.3389/fpsyg.2012.00272*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Crollen and Collignon. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## How body balance influences political party evaluations: a Wii balance board study

## **Katinka Dijkstra\*, Anita Eerland, Josjan Zijlmans and Lysanne S. Post**

Institute of Psychology, Erasmus University, Rotterdam, Netherlands

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Rick Thomas, University of Oklahoma, USA Michiel van Elk, University of Amsterdam, Netherlands

#### **\*Correspondence:**

Katinka Dijkstra, Institute of Psychology, Erasmus University, Burgemeester Oudlaan 50, 3062 PA Rotterdam, Netherlands. e-mail: k.dijkstra@fsw.eur.nl

Embodied cognition research has shown how actions or body positions may affect cognitive processes, such as autobiographical memory retrieval or judgments. The present study examined the role of participants' body balance (to the left or the right) in their attributions to political parties. Participants thought they stood upright on a Wii™ Balance Board, while they were actually slightly tilted to the left or the right. Participants then ascribed fairly general political statements to one of 10 political parties that are represented in the Dutch House of Representatives. Results showed a significant interaction of congruent leaning direction with left- or right-wing party attribution. When the same analyses were performed with the political parties being divided into affiliations to the right, center, and left based on participants' personal opinions rather than a ruling classification, no effects were found. The study provides evidence that conceptual metaphors are activated by manipulating body balance implicitly. Moreover, people's judgments may be colored by seemingly trivial circumstances such as standing slightly out of balance.

**Keywords: embodied cognition, conceptual metaphors**

## **INTRODUCTION**

What happens to our thoughts if we reach up or down, lean to the left or the right? For many common experiences, such as reaching up for a glass high in a cupboard, or leaning sideways to grab a leaflet, no major thoughts will come to mind during these routine and goal-specific physical actions. There are circumstances, however, in which body position or body movements may facilitate access to stored information or activate implicit or explicit notions about more abstract concepts of emotions, power, or politics.

Several studies provide support for this assumption by demonstrating facilitation of autobiographical memory retrieval following specific body movements or body position. One study (Dijkstra et al., 2007) showed how assuming a congruent body position during autobiographical memory retrieval with the body position of the original experience (i.e., lying down in a reclined position while retrieving a memory about visiting the dentist) resulted in faster retrieval times compared to an incongruent body position (i.e., standing in a jumping-jack position, while retrieving a memory about a visit to the dentist).

Another study looked at the nature and efficiency of autobiographical memory retrieval after participants moved marbles upward or downward while thinking about personal events from the past (Casasanto and Dijkstra, 2010). When participants retrieved memories in response to positive and negative prompts while moving marbles upward and downward, they did this faster under congruent valence-direction conditions (positive memories with upward movements and negative memories with downward movements) than under incongruent conditions. They must have implicitly activated the "positive is up and negative is down" metaphor, since no facilitation of congruent movement-valence combinations would have been found otherwise.

The results of these studies suggest an important linkage of bodily action, memory, and emotion. Physical actions can have an impact on how and what we think and facilitate access to earlier experiences. This supports the idea that movements may activate more abstract concepts that have a basis in our own experiences, which is also consistent with theories on metaphorical mental representations.

According to the Conceptual Metaphor Theory (Lakoff and Johnson, 1980), abstract concepts are understood in terms of concrete concepts and experiences (e.g., Lakoff, 1987). People tend to use concrete concepts, such as "up" and "down" in space, to talk about more abstract concepts, in this case "power." As each metaphor only defines specific aspects of an abstract concept, abstract concepts are defined in terms of more than one concrete concept. For example, "love" may be represented in terms of both "warmth" and "closeness." Conversely, "warmth" may also be considered in the attribution of the abstract concept "anger" and "up" may be a representation of both power and (positive) emotion. Moreover, metaphorical concepts arise from physical and cultural experiences. We cheer with our arms up when our favorite team wins a championship and point our thumbs down when a referee makes a bad decision. These are instances of shared experiences that eventually shape the representation of the concept. These concepts and representations emerge not only from our physical experiences but also from dominant cultural conventions (Lakoff and Johnson, 1980). The positive connotation of the thumbs-up gesture, for example, is not universal across cultures: in South American culture it denotes a sexual insult.

A common type of metaphorical concept is the orientational metaphor that represents concepts in a linear manner (Lakoff and Johnson, 1980). Examples are the up or down metaphor (up is power, positive, more), and the left or right metaphor (right is more, future, conservative). A difference between the up-down metaphor and the left-right metaphor is that the former has an experiential basis when it reflects power (a child always looks up to an adult) or valence (cheering is up), whereas the left-right orientation is based on conventions such as the mental number line and the organization of political parties on a left-right dimension. The mental number line theory posits that smaller magnitudes are associated with a location on the left and larger magnitudes on the right of this line (Restle, 1970). Research has empirically supported this notion (Dehaene et al., 1993; Eerland et al., 2011).

The left-right dimension in politics originates from the spatial organization of the French Legislative Assembly, with conservatives seated on the right and liberals on the left (Goodsell, 1988). If abstract concepts are activated as a result of concept-matching concrete experiences, it follows that left/right manipulations of bodily actions can activate the left/right metaphor associated with politics and consequently influence one's thinking about politics. Several instances of such activations have been demonstrated in studies from different countries with a differential representation of political parties.

Oppenheimer and Trail (2010) addressed the question of whether physical stimuli activating spatial concepts could influence political judgments. Using a right/left metaphor to distinguish between conservatives on the right and liberals on the left in the US could possibly help individuals' understanding of the political landscape. Support for this notion of political spatial representation comes from web pages, which contain left-Democrat and right-Republican associations more frequently than the other way round. Three experiments were conducted to demonstrate the activation of a political concept based on a spatial manipulation.

In experiment 1, participants were asked to squeeze a clothespin-shaped hand-grip to a closing position for 5 s with either the right or left hand. After that, they were asked to what extent they agreed with Democrats and Republicans on political issues on an eight-point scale. Handedness and political affiliation were recorded as well. The results indicated a significant interaction between grip-hand and political agreement, with participants squeezing with their left hand agreeing more with Democrats than participants who squeezed with their right hand. The second experiment manipulated spatial orientation differently by having participants sit in a chair that tilted to the right or the left after a wheel was removed. Participants answered the same questions as in the first experiment. Again an interaction was shown. Participants who leaned to the left were more likely to agree with Democrats. A third experiment was conducted online to recruit more Republicans. Here, participants made responses with a mouse to a visual target on the left or right screen. Again, only an effect for the left manipulation and stronger agreement with Democrats was found. Although consistent results were demonstrated for endorsement of Democrats after left-manipulations, no effects were found for the right-manipulations, which limits the generalizability of the results. Moreover, spatial manipulations varied in subtlety, which makes it difficult to draw strong conclusions regarding the efficacy of the manipulations tested. Also, endorsement of political attitudes was assessed with only two questions on a rating scale. The validity of the dependent measures may therefore be questionable.

Another study examined the effect of left-right manipulations on the activation of political concepts in the Netherlands, a country that is represented by 10 political parties in the first and second Chamber and is governed by a coalition of parties (van Elk et al., 2010). In the Dutch political system, political parties can be considered left-wing, right-wing, and central. A consequence of this system is that voters for one party may not always see the party issues being followed through because of coalition agreements that necessitated compromises on certain issues. The large number of parties and a level of uncertainty regarding adherence to certain issues may affect the activation of abstract political concepts based on spatial (left-right) manipulations.

The study examined whether the speed of processing acronyms referring to names of political parties reflected a co-activation of spatial associations (van Elk et al., 2010). Additionally, the role of one's own political preference was taken into account. It was expected that spatial associations should hold irrespective of one's own standpoint. In four experiments, participants made categorization responses to acronyms that represented names of political parties or names of public broadcasting companies by pressing a button with either the right or the left hand following a cue that indicated with which hand they should respond (< for left and > for right). Response facilitation was expected when participants responded with the hand that was congruent (left or right) with the perceived orientation of the political party (left-right).

The results for the experiments indicated a significant party (left-right) by action (left-right) interaction as predicted. In experiment 1, right hand responses were faster for right-wing parties but no differences for left-wing parties occurred. In experiments 2 and 3, the opposite pattern was found, whereas the results for experiment 4 mimicked those of experiment 1. Although the results demonstrated that processing acronyms of political parties is associated with implicit activation of spatial associations, no consistency with regard to the left or right manipulation was found. Moreover, participants with a preference for the political right showed a stronger effect size for left-wing parties, possibly because this (minority) group perceived the distance of left-wing parties to be larger as a result of their own political orientation.

We can conclude that both studies (Oppenheimer and Trail, 2010; van Elk et al., 2010) demonstrate an activation of political concepts based on spatial manipulations. These results are in line with the co-occurrence of references to political parties and spatial orientations in the public debate, news broadcasts, social interactions, and online guidance tools for political orientation (Landauer et al., 1998). A remaining question and a possible weakness of these studies is that none of the experiments demonstrated an effect of both the left and right manipulation on the activation of political concepts. The relatively low number of participants with a right-wing political orientation may have played a role, but does not explain the differential outcomes in the van Elk et al. (2010) studies.

In contrast to the Oppenheimer and Trail study, the current study employed an implicit left-right balance manipulation. Participants were not aware of the fact that their balance was manipulated. Rather, they were under the impression that they were standing straight up. Balance manipulation was used in combination with stimulus materials that were general and would not automatically activate a political party. Political statements were generated that did not obviously belong to a left-wing or right-wing party. Statements with a certain level of ambiguity were expected to be more vulnerable to the effects of the body balance manipulation. In addition, more specific questions regarding political knowledge and orientation were asked. The prediction was that participants who were manipulated to lean to the left or the right would be more likely to ascribe political statements to left-wing or right-wing political parties, respectively.

## **MATERIALS AND METHODS**

#### **PILOT STUDY**

A pilot study was conducted on nine volunteers to test political statements with regard to the ascribed political orientation of the statements (left/right) and political affiliation (name of the party). These political statements were taken from political party programs and reworded in more general terms to make them less obvious as coming from a specific political party. The statements were considered left-wing by about half of the participants and right-wing by the other half. Statements that were considered left-wing or right-wing by (almost) all participants were eliminated or reworded. None of the 32 statements were attributed to the same political party by all participants. All raters considered the task moderately difficult (mean of 3 on a five-point rating scale) and all of the statements were considered more or less equally difficult to rate (mean of 3 on a five-point scale). Together, the ratings confirmed that the statements were general and difficult enough to create uncertainty with regard to which party each should be attributed to.

## **CURRENT STUDY**

## **Participants**

A total of 32 participants took part in the experiment (mean age = 20.29, SD = 2.16, range = 18–26), 94% women, 85% right-handed. Data from four participants had to be removed because of procedural difficulties (not a native speaker of Dutch, too much movement during the experiment), or lack of familiarity with the stimuli materials. The remaining participants had a rather low interest in politics (an average of 2.5 cm, SD = 2.1, on a 10 cm VAS scale). Their knowledge of politics was also rather low (an average of 2.4 cm, SD = 1.8, on a 10 cm VAS scale). Only 68% of the participants had voted in the past, but this relatively low number may be partly due to the number of 18-year-olds (21%) who may have been too young to vote during the elections last year. Overall, participants can be considered moderately involved in politics. The experiment complied with the regulatory standards of the psychology department's ethics committee. Participants consented to their involvement in the experiment prior to their participation and were informed about the procedure of the study.

## **Stimuli materials**

Thirty-two political statements from the pilot study were used for the current study. An overview of these statements with the instructions for participants is listed in Appendix A. Fillers were 32 statements regarding well known television programs from public broadcasting companies. These statements referred to the title, content, or presenter of these programs. A randomized order of political statements and fillers was created with the restriction that no more than two political statements or fillers would follow one another.
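
One way to build such a restricted random order is sketched below. The article does not describe how the order was actually generated, so this is only an assumption for illustration: the function name `constrained_order`, its parameters, and the labels `"political"` and `"filler"` are hypothetical. The sketch draws item types one at a time, never allows a third consecutive item of the same type, and restarts if it reaches a dead end.

```python
import random


def constrained_order(n_political=32, n_filler=32, max_run=2, seed=None):
    """Return a random sequence of item types with runs no longer than max_run."""
    rng = random.Random(seed)
    total = n_political + n_filler
    while True:  # restart whenever the construction hits a dead end
        remaining = {"political": n_political, "filler": n_filler}
        order = []
        while len(order) < total:
            # A type is allowed if items of it remain and it would not
            # extend the current run beyond max_run.
            candidates = [
                kind for kind, left in remaining.items()
                if left > 0 and order[-max_run:] != [kind] * max_run
            ]
            if not candidates:
                break  # dead end: only a forbidden type is left
            weights = [remaining[kind] for kind in candidates]
            kind = rng.choices(candidates, weights=weights)[0]
            order.append(kind)
            remaining[kind] -= 1
        else:
            return order


sequence = constrained_order(seed=1)
print(sequence[:8])
```

The sequence only fixes the type at each position; the individual statements within each type would still be shuffled separately. This procedure does not sample uniformly from all valid orders, which is usually acceptable for stimulus lists but is worth knowing.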

## **Apparatus**

A Nintendo Wii balance board was used to manipulate body position. Wii balance boards have recently been used in experimental studies (Clark et al., 2010; Eerland et al., 2011) and have demonstrated good psychometric properties. Center of Pressure (COP) can be quantified reliably with a balance board, which has shown very good test-retest reliability on several different test protocols. In other words, the Wii balance board can be considered a valid tool for assessing balance (Clark et al., 2010).

## **Procedure**

Upon entering the lab, participants removed their shoes and had their height recorded. Based on their height, the position of the computer screen was adjusted so that no changes in posture would occur as a result of reaching or bending to see the text on the screen. Next, they stepped on a Wii balance board for calibration and manipulation of body position. First, the COP was calibrated, then their posture was manipulated such that participants thought that their COP was in the middle of the balance board even though it was manipulated to the right or the left. This manipulation was very subtle (about a 2% change in weight proportion on left and right sensors of the board) and never noticed by participants. To ensure maintenance of the manipulated body position, their COP was displayed throughout the experiment on a computer screen as a square within a surrounding circle. If they strayed from this circle, a warning signal occurred to prompt them to retake their neutral COP position. In reality, the signal occurred to keep participants in the left or right body position.
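
The logic of the covert manipulation can be sketched as follows. This is not the software used in the study, and several details are assumptions made for illustration: the four-sensor layout, the names `SensorReading`, `right_proportion`, and `displayed_offset`, and the reading of "about a 2% change" as a target right-side weight proportion of 0.52 or 0.48. The idea is that the on-screen marker appears centered only when the participant's weight is actually shifted slightly to one side.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    """Weight on each of four corner sensors (hypothetical layout, any unit)."""
    top_left: float
    top_right: float
    bottom_left: float
    bottom_right: float


def right_proportion(reading: SensorReading) -> float:
    """Proportion of total weight carried by the two right-hand sensors."""
    total = (reading.top_left + reading.top_right
             + reading.bottom_left + reading.bottom_right)
    if total <= 0:
        raise ValueError("no weight on the board")
    return (reading.top_right + reading.bottom_right) / total


def displayed_offset(reading: SensorReading, lean: str, shift: float = 0.02) -> float:
    """Horizontal marker offset shown on screen; 0.0 looks 'centered'.

    The offset is measured from a covertly shifted target instead of
    from the true midpoint (assumed shift of 2 percentage points).
    """
    if lean not in ("left", "right"):
        raise ValueError(f"unknown lean direction: {lean}")
    target = 0.5 + shift if lean == "right" else 0.5 - shift
    return right_proportion(reading) - target


balanced = SensorReading(25.0, 25.0, 25.0, 25.0)
print(displayed_offset(balanced, "right"))  # negative: marker appears left of center
```

Under these assumptions, a truly balanced participant sees the marker slightly off center and corrects toward the intended side, which keeps the lean in place without it being announced.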

After calibration, participants responded to statements that appeared on a screen above a fixation circle; these were either statements from political parties or statements about television programs. When political statements were presented, participants were asked to name the Dutch political party they attributed the statement to. They could choose from 10 political parties that were in the Dutch House of Representatives at that time. Based on an existing division of the parties in a left/right horizontal axis and a progressive/conservative vertical dimension grid (Kieskompas Tweede Kamerverkiezingen, 2010, see **Figure 2**), five left-wing (progressive) parties, four right-wing (conservative) parties, and one center party could be identified (see **Figure 3** for the grid). Statements with descriptions of television shows were included to mask the true purpose of the experiment. For this filler task, participants had to name the broadcasting association (from a total of nine) that produced that particular program. A sheet with the acronyms of the names of these broadcasting companies listed vertically and in alphabetical order was taped onto the computer desk to ensure that participants knew the names. This focus on the broadcasting companies was meant to take away attention from the body position/political party manipulation. Participants were instructed to come up with a party/broadcasting company even if they were uncertain or did not know the answer. Participants were led to believe that the computer registered the answers, even though in truth the experimenter wrote down the answers given by the participants and moved the statements forward with a Wiimote. This way, participants would not be tempted to turn toward the experimenter to provide the answers, which could influence their body position.

Halfway through the experiment, after responding to 32 political/television statements, participants played a balance game, supposedly to get a break from the task, but in reality to change body position (from left to right or vice versa). Body position was counterbalanced across participants (left position first or right position first), and so was the order of statements (half of the participants received the statements in reverse order). Afterward, participants answered questions regarding their political knowledge and political interest, and placed the parties and their own political affiliation on the progressive/conservative and left/right axes of the grid.

## **RESULTS**

The main prediction was that participants would attribute more political statements to right-wing political parties when leaning to the right than when leaning to the left. For participants leaning to the left, the opposite effect was predicted: higher attribution of political statements to left-wing parties when leaning to the left than when leaning to the right. Given that only one political party qualified as a center party, the analysis was limited to attributions to left or right-wing parties<sup>1</sup>. **Table 1** presents the proportions and standard errors of the relevant variables.

#### **Table 1 | Mean proportions, standard errors (SE), and lower- and upper-bound attributions to political parties.**

A two (leaning direction: left vs. right) by two (political attribution: left vs. right) repeated measures ANOVA on the proportions of answers demonstrated a main effect of attribution, *F*(1,27) = 4.80, *p* < 0.05, η<sup>2</sup> = 0.151, and a leaning direction by attribution interaction, *F*(1,27) = 6.29, *p* < 0.05, η<sup>2</sup> = 0.189. **Figure 1** displays the results. Subsequent simple effects analyses demonstrated that when leaning to the right participants ascribed more political statements to right-wing political parties than when leaning to the left, *F*(1,27) = 7.94, *p* < 0.01, η<sup>2</sup> = 0.17. When participants were leaning to the left they tended to ascribe more political statements to left-wing political parties than when leaning to the right, though this effect showed only a trend toward significance, *F*(1,27) = 3.06, *p* = 0.07, η<sup>2</sup> = 0.12.
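
For readers who want to relate the reported effect sizes to the test statistics, the sketch below applies the standard conversion from an *F* value and its degrees of freedom to partial eta squared. It assumes that the η<sup>2</sup> values reported for the two omnibus tests are partial eta squared, which the article does not state explicitly; the function name `partial_eta_squared` is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Omnibus tests from the repeated measures ANOVA, both with F(1,27).
print(round(partial_eta_squared(4.80, 1, 27), 3))  # main effect of attribution: 0.151
print(round(partial_eta_squared(6.29, 1, 27), 3))  # leaning by attribution interaction: 0.189
```

Under that assumption the two omnibus values match the reported ones. The simple effects analyses may rest on different error terms, so the same conversion is not applied to them here.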

The same analysis was performed with the political parties divided into right and left based on participants' personal opinions on what the parties' positions should be on the grid, rather than on their actual political positions on the grid. No effects were found (*p* > 0.05). A similar analysis was conducted with the addition of participants' political affiliation as a between-subjects factor, based on how participants positioned themselves on the grid. No effects were found either (*p* > 0.05).

Participants demonstrated poor knowledge on how parties are organized along left-right and conservative-progressive dimensions. On average, participants were able to place less than half the parties correctly on the grid in the left-right dimension (*M* = 4.57, SD = 1.67, range = 1–7). A sample of 33 comparable participants that attributed political parties only on a left-right dimension showed a much higher correct attribution of political parties (*M* = 5.97, SE = 0.18) than the current sample (*M* = 4.57, SE = 0.33).

<sup>1</sup>An analysis including the center party yielded the same results as the analysis without the center party. In both leaning conditions, the same number of statements were attributed to the center party. Posture had no effect on attribution to center parties.

## **DISCUSSION**

The results support the prediction that conceptual metaphors are activated when body balance is manipulated. People's judgments appear indeed to be affected by seemingly inconsequential circumstances such as leaning slightly to the left or right on a Wii balance board. In contrast to earlier studies, this manipulation was implicit and unknown to the participants involved. In some of the experiments in the Oppenheimer and Trail (2010) study and the van Elk et al. (2010) study, participants could have been aware of the left-right manipulation. The implicit manipulation and the generality and difficulty of the political statements seem to have been effective.

The marginal effect of the manipulation to the left reminds us of the earlier studies that also did not demonstrate effects of the manipulation to both sides in one experiment. This outcome cannot be attributed to the design of these studies. Some of the experiments had a between-subjects design and some had a within-subjects design, just like the current study. Moreover, counterbalancing prevents possible effects from fatigue. Possibly, with more statements being attributed to right-wing than left-wing parties, fewer statements were available for attribution to left-wing parties.

Surprisingly, participants' own political affiliation or personal organization of political parties on the grid did not matter. Scoring these parties on horizontal and vertical dimensions may have been difficult because of the joint left-to-right and progressive-conservative axes. The grid may not have been the right task to use for this purpose. So far, it has mostly been applied as an online task to determine party preference as an aid to prospective voters. Positioning of parties on the grid as outcome in the party preference process depends on which issues are prioritized by the person taking the test. This adds to the complexity of the task. Moreover, participants in the current sample may not have been motivated to fill in the grid given their low-to-moderate scores on political knowledge and interest.

The study contributes to the current discussion in the field of embodied cognition and Conceptual Metaphor Theory. Abstract concepts seem grounded in concrete experiences, even if these experiences are based on conventions (left-right = left-wing and right-wing). Subtle manipulations in combination with general stimulus materials may be an effective way to demonstrate such activations. The role of the body in cognitive processes has again been demonstrated, and for a somewhat different task than before: attribution of political parties based on statements rather than reaction times in a go/no-go task (van Elk et al., 2010), or endorsement of political parties (Oppenheimer and Trail, 2010). The outcomes contribute to a growing body of evidence suggesting that cognition is grounded in action, even with subtle actions.

Future studies could assess whether the activation of abstract concepts is more effective when the whole body is involved, as was demonstrated in this study, or when parts of the body are involved (van Elk et al., 2010). If the body is considered essential for cognitive reactions to occur, as is proposed by the strong view of embodiment, manipulations with the whole body may be more effective than with parts of the body. On the other hand, if the body is merely a tool in this respect, manipulations with part(s) of the body may be equally effective.

Another issue is the bi-directionality of these phenomena (Rueschemeyer et al., 2009; Miles et al., 2010a). Miles et al. (2010b) showed that the direction of apparent motion, in the form of dots appearing to move toward or away from the center of a display, affected participants' mental time travel. Perceived backward motion was associated with thoughts about the past whereas perceived forward motion was associated with future-oriented thoughts. An earlier study by Miles et al. (2010a) had shown a converse relationship, that of moving forward or backward as a result of mental time travel to the future or the past. It is plausible then that after demonstrating an effect of bodily actions on the cognitive process of attribution, an opposite effect could be demonstrated as well. It is not unlikely that participants who are primed with a left-wing or right-wing political orientation may unconsciously start leaning to the left or the right. If so, bi-directional processes would be at play.

What should be done during elections? Making sure that voters sit and are unable to lean to the right or the left? Such precautions are probably not necessary. Most effects that have been documented in embodied cognition research are short-lived and context-dependent, appear only after very specific and scope-limited manipulations, and seem to affect reaction times, ratings, and attributions to parties, not the selection of a party to vote for. Otherwise, we could never be in a warm room without worrying that we may feel affective or hostile toward another person in the room or feel miserable whenever we are shorter or at the bottom of a slope because we lack power.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 August 2012; accepted: 13 November 2012; published online: 07 December 2012.*

*Citation: Dijkstra K, Eerland A, Zijlmans J and Post LS (2012) How body balance influences political party evaluations: a Wii balance board study. Front. Psychology 3:536. doi: 10.3389/fpsyg.2012.00536 This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Dijkstra, Eerland, Zijlmans and Post. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX A**

Read the statement and tell me to what political party the statement should be ascribed:

- A dedicated group of police officers should be assigned to only focus on dealing with sexual offenses.
- Fired employees should be able to claim supplementary training.
- Future generations should not be left with the debt of the crisis.
- Teachers in primary and secondary education should attend refresher courses on a regular basis.
- There should be a law to determine benefits after job termination.
- There should be local hotlines for nuisance in and around residential dwellings.
- There should be one national general inspection and audit for entrepreneurs.
- Organizers of sport events should pay for the deployment of police officers themselves.
- The number of available rental homes should increase gradually.
- As soon as someone turns 65, this person has the right to retire.
- When signing with a health care insurance company existing handicaps should be taken into account.
- Gifted children should be given the opportunity to receive an alternative education program.
- The Dutch government should follow the European policy regarding regulations around fishery.
- More money should become available for caregivers.
- Everyone below the age of 23 has a mandatory job or school education.
- To resolve issues related to traffic jams, the government should invest in better public transport.
- Human trafficking should be handled by specialized judges.
- The European Union is very important for welfare in the Netherlands.
- There should be clear guidelines regarding the provision of hard drugs.
- There should be mandatory parenting support for families dealing with problems.
- Civilians should pay administrative costs when they complain about the police.
- Norms should be set up to determine the distance from every house to the nearest public transport stop.
- The number of public broadcasting stations should be reduced in order to reduce government spending.
- Reclassification of local governments is necessary to increase administrative power.
- The government should be allowed to take extra security measures in order to fight crime.
- Rules regarding road safety should be tightened.
- Parents should not receive child support if their child skips school on a regular basis.
- European cooperation is needed to counteract terrorism.
- The Netherlands should no longer invest in organic farming.
- People immigrating to the Netherlands should be helped more extensively concerning integration.
- The chronically ill and their families should receive more support.
- The national armies of countries in the European Union should cooperate more closely.

## **APPENDIX B**


## A review of embodiment in autism spectrum disorders

## **Inge-Marie Eigsti \***

Department of Psychology, University of Connecticut, Storrs, CT, USA

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

David R. Simmons, University of Glasgow, UK

Susan W. Cook, University of Iowa, USA

#### **\*Correspondence:**

Inge-Marie Eigsti, Department of Psychology, University of Connecticut, 406 Babbidge Road, U-1020, Storrs, CT 06269, USA. e-mail: inge-marie.eigsti@uconn.edu

In classical approaches to cognition, sensory, motor, and emotional experiences are stripped of domain-specific perceptual and sensorimotor information, and represented in a relatively abstract form. In contrast, the embodied cognition framework suggests that our representations retain the initial imprint of the manner in which information was acquired. In this paper, we argue that individuals with autism spectrum disorders (ASD) display impairments in the temporal coordination of motor and conceptual information (as shown in gesture research) and striking deficits in the interpersonal mimicry of motor behaviors (as shown in yawning research) – findings we believe are consistent with an embodied account of ASD that includes, but goes beyond, social experiences and is driven in part by significant but subtle motor deficits. In this paper, we review the research examining an embodied cognition account of ASD, and discuss its implications.

**Keywords: autism, ASD, embodiment, gesture, mimicry**

## **WHAT DOES IT MEAN FOR HUMAN COGNITION TO BE EMBODIED?**

Classic models of information processing in the cognitive sciences allow sensory, motor, and emotional experience to be represented as stripped of their perceptual and experiential basis. In such models, largely inspired by the metaphor of "mind as computer," information taken in by the different sense modalities is preserved in memory in the form of abstract symbols, functionally separated from the original neural systems (those involved in motor action, vision, olfaction, and audition, for example) that encoded them in the first place.

In contrast, the theoretical framework of *embodied cognition* encompasses the notion that bodily experiences play an integral role in human cognition, and that our experiences are stored in a manner that maps onto the original neural systems (motor, visual, olfactory, and auditory) that encoded them in the first place. In this formulation, the ability to represent objects and events is subserved by sensorimotor systems that govern interactions with objects and events (Barsalou, 1999). When objects and events are recalled from memory to serve action goals, the sensorimotor systems involved in their initial representation are reactivated.

There are a number of research findings that are consistent with this hypothesis. For example, individuals hearing a description of a skyscraper tend to make vertical eye movements; those describing horizontally oriented structures tend to make side-to-side movements (Spivey et al., 2000). Similarly, individuals making judgments about events involving forward motion ("pushing closed a drawer") respond more quickly if their response involves a similar motion (forward push) than an incompatible motion (e.g., responding to "opening the drawer" with a forward movement) (Glenberg and Kaschak, 2002). This suggests that envisioning the response has activated the motor schema, thereby facilitating (or interfering with) the subsequent response.

Embodied effects can be "offline," as described above, or online, concurrent, effects (as described in Niedenthal et al., 2005). For example, in one classic study, individuals were asked to hold a pencil in their mouths while watching cartoons (Strack et al., 1988). In one condition, the pencil was oriented laterally, projecting away from the face; in a second condition, the pencil was oriented parallel to the mouth. These distinct facial positions were chosen because they activate, respectively, the musculature involved in frowning or smiling. The dependent measure in the study was participant judgments of cartoon humor; consistent with an embodied model of cognition, the "smile-activating" group rated the cartoons as significantly funnier. When subjects activated their smile musculature as they watched a cartoon, they found the cartoon to be more humorous; the muscle activation influenced their representation of the cartoon. Effects were entirely implicit; although the relevant musculature was active, participants were not instructed to smile *per se*. There are a number of similar findings: when participants shake versus nod their heads (under the guise of judging the quality of headphones) while listening to a persuasive message, the listeners in the nodding condition are more likely to agree with the message in subsequent evaluation (Wells and Petty, 1980). Again, attitudes toward a stimulus are influenced by a physical (bodily) posture enacted when the individual encountered the stimulus; note that there is no *a priori* reason why smiling while seeing something should make one find that stimulus more humorous or pleasant, unless one is encoding the smile posture as part of the stimulus representation. The opposite is also true: when individuals' motor movements are inhibited, there is *interference* in the experience of emotion and processing of emotional information (Niedenthal et al., 2005).

A construct that has played a role in the development of embodied approaches is the notion of "affordance perception" – that is, the qualities of objects that suggest to the perceiver *how* those objects are to be used (Gibson, 1977). For example, *how* does one know how to sit on a chair? Gibson (1977) proposed that qualities of the chair (its affordances) are independently available as percepts in the environment. We, as perceivers, make use of this information as we engage with the world. Research on children with Developmental Coordination Disorder suggests that these children are less aware of their own reaching capabilities (Johnson and Wade, 2009). Based on these findings, the authors hypothesize that impairments in skilled movements, generally, lead to differences in the ability to generate and detect information about affordances.

Casasanto and colleagues have formulated an *individual differences approach* to embodied cognition. Specifically, their body-specificity hypothesis proposes that people with "different kinds of bodies" think differently (Casasanto, 2011). For example, they show that right versus left-handed individuals represent abstract concepts differently, as revealed by their spontaneous gestures (Casasanto, 2009; Casasanto and Jasmin, 2010) and by differential patterns of brain activity (Casasanto, 2011). Certainly, this research would suggest that individuals with different motor control abilities are likely to exhibit differential cognitive representations; in this paper, we examine this possibility for the case of autism spectrum disorders (ASD).

There has been, to date, relatively little research directly exploring representation and cognition in individuals with ASD *from an embodied perspective*<sup>1</sup>. ASD refers to a developmental disorder characterized by atypicalities in three domains: social reciprocity; language and communication; and repetitive behavior and stereotyped interests. The three primary diagnoses that comprise the autism spectrum (Autistic disorder; Asperger's syndrome; Pervasive Developmental Disorder, Not Otherwise Specified, PDD-NOS) share a similar pattern of deficits, though they differ in severity and long-term prognosis. Individuals with PDD-NOS typically exhibit better long-term outcomes, for example, than those with autistic disorder. The autism "spectrum" refers to variability across these diagnoses, as well as the large variability in IQ that can range from severe intellectual disability to the gifted range.

This research is valuable on at least two accounts (consistent with a framework proposed in the field of developmental psychopathology; Cicchetti and Rogosch, 1996). *First*, it holds the potential for revealing strengths and weaknesses in the embodied framework, by testing a population that presents with a wide range of abilities in the relevant domains. We propose that deficits in motor control and synchrony (at the cognitive and neural levels, reviewed below) have downstream effects on representation. As such, ASD may provide a useful test case for examining the framework of embodied cognition. *Second*, studies of ASD from an embodied perspective are likely to illuminate meaningful strengths and weaknesses, and potentially core deficits, in ASD. Klin et al. (2003) proposed that differences in embodied social cognition may cause a broad range of symptoms and differences in ASD. In this account, because individuals with ASD experience social stimuli as less salient (e.g., Dawson et al., 1998), seeking out and acting on physical rather than interpersonal stimuli, their experience of the world is "socially disembodied." Theories of embodied cognition predict that experience with social interaction is, essentially, physically encoded; individuals who have less early experience with social contexts have reduced physical responses related to those contexts; their representations of the contexts, and subsequent physical responses in social contexts, are thus less automatic and efficient. This provocative hypothesis has received little empirical attention to date.

In this review, we provide a brief overview of embodied cognition; we next describe ASD, and then review the small set of studies that directly test embodiment effects in ASD. Based on the literature to date, we suggest that the role of embodiment in ASD may go beyond merely *social* contexts. Specifically, we propose that because motor processes are subtly impaired in ASD (reviewed in detail below), because of noisier synchronization of neural assemblies (Milne, 2011), or because of reduced cortical connectivity (Belmonte, 2004; Just et al., 2012), sensorimotor information may be *poorly integrated across modalities*. This leads in turn to impairments in the encoding of sensorimotor representations of the world; these noisier representations are more difficult to access and reproduce. We suggest that, because social representations are temporally evanescent, and complex, they are *more susceptible to noise*. In this account, individuals with ASD "embody" *all* their experiences differently, not just social ones; this difference impacts social along with other processes.

The notion of embodiment – that our sensorimotor input gives us access to the actions, emotions, and sensations of other people (Gallese, 2006) – seems intuitively relevant to ASD, because individuals with ASD struggle to understand others, and to be as automatically and implicitly engaged with other people. We now review some potential "underpinnings" of embodiment that may play an important role in the symptomatology of ASD.

#### **MOTOR SKILLS IN ASD**

If an individual cannot plan or implement a motor movement effectively, this will decrease the efficiency with which this person can build links between motor movements and other information (ideas, emotions, cognition, etc.). In other words, this individual will experience a different influence of embodiment. We turn now to a discussion of motor skills in ASD. Several decades ago, Rogers and Pennington (1991) proposed that motor skill deficits may be quite important in the symptomatology of ASD. This suggestion was seconded by a review of multiple studies (Smith and Bryson, 1994). More recently, Mostofsky has proposed that motor impairments are central in the phenotype of ASD (e.g., Mostofsky et al., 2006).

A variety of studies have shown motor coordination problems in ASD, including problems with: fine motor control (Szatmari et al., 1990); grip planning (Hughes, 1996); anticipatory movement preparation (Rinehart et al., 2001); gait and posture (Ghazziudin et al., 1992; Fournier et al., 2010), including shortened steps, "toe walking," and generally poor coordination of limb movements (Vilensky et al., 1981); balance and coordination (Mari et al., 2003); imitation and pantomime (DeMyer et al., 1972; Stone et al., 1997); and reaching and grasping movements (Glazebrook et al., 2009).

Differences in motor skills may be present as early as infancy and toddlerhood (Teitelbaum et al., 1998; Brian et al., 2008; Dowell et al., 2009). For example, Teitelbaum et al. (1998) showed that early disturbances of movement (at 4–6 months) were apparent in home videos of infants later diagnosed with autism. Motor skills measured in children with ASD at age 5 years and again at age 7 years 11 months showed less improvement than in children with ADHD or non-specific developmental delays (Van Waelvelde et al., 2010). A study that followed a large (*n* = 95) group of toddlers longitudinally found that motor skills in infancy were a strong predictor of later social and communicative outcomes in ASD (Sutera et al., 2007) – *stronger than the severity of autism symptomatology*. Motor abilities are important predictors of outcomes in ASD.

<sup>1</sup>For example, a PubMed search for joint terms "embodied cognition" and "autism" yielded exactly 10 published papers, out of 22,138 for the terms independently (21,705 for "autism" and 433 for "embodied cognition").

Studies have consistently found that while planned actions may be ultimately "accurate" (e.g., individuals are able to reach for a target; see Dewey, 1993) in high-functioning autism, individuals with ASD are likely to have movements that exhibit significantly more temporal and spatial variability (Mostofsky et al., 2006, 2007, 2009; Glazebrook et al., 2009). Impairment often involves subtle anticipatory adjustments (Schmitz et al., 2003); for example, children with autism fail to anticipate the motor consequences of an action's final goal (Cattaneo et al., 2007). Motor impairments in ASD can be subtle (Dowell et al., 2009) or absent; one elegant study of ball-throwing that required postural adjustment found absolutely no differences between ASD and control groups (Gidley Larson et al., 2008). Thus, the literature has been marked by what seem like highly inconsistent findings. These discrepancies were addressed in a recent review, which used a computational framework (rather than individual measures and tasks) to divide motor control into five components (Gowen and Hamilton, 2013). The authors concluded that findings were, in fact, consistent, showing impairments in ASD in two domains: (1) poor integration of information for efficient motor planning, and (2) deficits in organizing motor knowledge. They suggested that increased sensorimotor noise, and higher level motor planning, were both important contributors to ASD symptomatology. The issue of noisy representations as raised in the Introduction appears to have significant support from other domains.

There are clear parallels between the *production* of motion in ASD, and its *perception*, reflecting the relationship between perception and action more generally (Sperry, 1952). Many studies of motion perception have used a handy methodological tool known as "point-light displays," a technique first described in Johansson (1973). In point-light displays, the participant sees a number of small, bright dots on a dark background; the dots move in a coordinated fashion. Originally, the stimuli were created by fastening actual lights on the arms and legs of a moving person, dimming the room lights, and then recording the resulting action; in this way, the body is invisible and the observer sees only the moving points of light. This is analogous to the real-life experience one might have of seeing only bobbing lights in motion, when a jogger clothed in black runs along a dark road wearing running shoes with reflective dots. Nowadays, stimuli are typically created using computer animation rather than actual lights.

While the many point-light display studies in ASD cannot be reviewed in detail here, one can draw several generalizations. Many studies of biological motion perception in ASD have reported striking differences in both behavioral and neural (especially fMRI) responses to such displays (Atkinson, 2009; Kaiser and Shiffrar, 2009; Nackaerts et al., 2012). These findings are not always replicated (McAleer et al., 2011), with differences reported for only *some* stimuli and tasks (Saygin et al., 2010). One potential resolution to the conflicting findings is that, when behavioral performance is carefully matched, individuals with ASD may activate clearly distinct brain networks as observed in fMRI (McKay et al., 2012). In other words, brain differences should only be interpreted for tasks on which behavioral performance is similar across groups (otherwise confounds of task difficulty/effortfulness render group differences less interpretable). If we assume that individuals with ASD do indeed exhibit impaired responses to point-light displays, this suggests that their perception of physical motion in other individuals is altered, which likely is reflected in their different mimicry abilities, reviewed next.

## **MIMICRY**

Mental simulation refers to the process by which we compare a representation of someone else's thoughts, feelings, or actions, to our own. This simulation could take place through the mediating mechanisms of mimicry and peripheral feedback. The ability to contrast our own with another person's mental states has been described as a potentially critical impairment in ASD. *Mimicry*, likely one of the building blocks of mental simulation, involves the non-volitional, implicit, automatic matching of another person's actions; it is clearly distinguished from imitation, defined as the deliberate, explicit, effortful reproduction of another person's actions (Call and Tomasello, 1995). Many studies of typically developing (TD) individuals have demonstrated that when two people interact, an unconscious mimicry and synchronization of behavior occurs, impacting body posture, facial expressions, vocal prosody, speech patterns, emotions, and gestures (Niedenthal et al., 2005). Furthermore, the coupling of our automatic tendency to mimic and the effects of peripheral feedback on our inner emotional states may explain the phenomenon of emotional contagion (Hatfield et al., 1994). That is, because we unconsciously mimic the emotional movements of others, we unconsciously feel the emotions of others as we interact with them. The mimicry aspect of embodiment is so fundamental to our everyday interactions, that it is difficult to imagine being unaffected by the non-verbal signals and cues of those around us. It is likely responsible, at least in part, for why we enjoy comedic movies in a crowded movie theater more than we might enjoy such movies when watching at home alone. Research is still in the process of fleshing out the full cascade of consequences that might come about as the result of an early failure in mimicry. However, it seems likely that this failure would be catastrophic.
Mimicry increases feelings of closeness and connection between individuals, which, in turn, increases the amount of mimicry individuals will display toward one another. One can imagine this process continuously unfolding in a cascade between caregiver and child throughout the early years of development. Abilities such as imitation, joint attention, and speech develop in the context of this increasingly synchronous bond. Strikingly, imitation (Smith and Bryson, 1994), joint attention (Kasari et al., 1990; Charman et al., 1997; Clifford and Dissanayake, 2008), and language (Eigsti et al., 2011; Mayo et al., 2013) are all signal impairments in ASD.

Research has suggested interactions between mimicry and communication. For example, one account describes conversation as "joint action" (Garrod and Pickering, 2004). A dialog between two interlocutors requires cooperation between those individuals in order for them to understand the meaning of the dialog; observers who do not participate are typically less accurate in their comprehension. The ease with which we engage in dialog, and our comprehension in dialogs, potentially involves *priming* of representations at multiple levels: phonological, lexical, syntactic, and semantic (Levelt and Kelter, 1982; Branigan et al., 2000; Hartsuiker et al., 2004). Priming reduces the effortfulness of conversation; it is likely that priming is similar to the notion of mimicry described above. Because individuals with ASD struggle with many aspects of conversation, including maintaining relevance, turn-taking, task-switching between speaking and listening, higher level organization of narratives, and so on, it is possible that the model of interactive alignment provides a useful framework for understanding pragmatic and discourse impairments in ASD<sup>2</sup>.

Impairments in mimicry potentially underlie other social skills impairments in ASD. Adolescents and adults with high-functioning autism do not appear to exhibit normal automatic facial mimicry (McIntosh, 1996). Despite being able to produce typical facial expressions deliberately and explicitly, a group of individuals with ASD failed to exhibit mimicry of emotional expressions monitored via electromyography (EMG) (which monitors minute muscle contractions) when passively viewing emotional expressions. Similarly, while viewing emotional faces, adults with ASD had behaviorally intact performance, but decreased autonomic arousal, measured via galvanic skin conductance (Hubert et al., 2009). This suggests an altered implicit response to emotional faces. It should be noted that children with autism have been found to exhibit typical levels of autonomic arousal when viewing others in distress (Shenk and Ramachandran, 2003; Ben Shalom et al., 2006, November) even though children may not respond with typical *behaviors* to the distress of others (Sigman et al., 1992; Bacon et al., 1998). These studies suggest that when explicit and conscious behaviors are measured, responses to emotion displays may look intact; however, when we probe "under the hood" for more physiological responses, individuals with ASD look atypical. Clearly, for mimicry to be available as a source of feedback, an individual must attend to another person. Individuals with ASD, who often avoid looking at others, may be forced to rely solely on top-down cognitive strategies, rather than benefiting from bottom-up perceptually driven mimicry to understand other people's emotions, ideas, and thoughts.

## **MIMICRY, YAWNING, AND EMOTIONAL CONTAGION**

In our research, we have examined in detail one form of highly automatic mimicry: contagious yawning. Distinct from the spontaneous yawns that are observed in the human fetus, contagious yawns are prompted by seeing or hearing another person yawn. Sometimes yawning can be elicited by just reading the word; perhaps the readers of this chapter are, even now, stretching their jaws. Interestingly, susceptibility to contagious yawning is apparently associated with self-recognition and theory of mind, two abilities that contribute to complex empathy (Platek et al., 2003).

In our work, we tested contagious yawning in a large sample (*n* = 123) of TD children, ages 1–6 years, by reading a story to them for 12 min and deliberately yawning at four points during the reading (Helt et al., 2010). Results showed that TD children did not exhibit contagious yawning until age *four.* The late onset of contagious yawning implies that emotional contagion (a form of embodiment) becomes more developed and more sensitive over time, resulting in increased affective attunement with others. Furthermore, as part of the same study, we used the same method to elicit contagious yawning in a sample of 30 children with ASD ages 6–15 years (i.e., well beyond the point when TD children exhibit robust contagious yawning). Their responses were compared to a new group of chronological-age-matched (*n* = 28) or mental-age-matched (*n* = 28) TD children. In stark contrast to the TD participants, *none* of the children with autistic disorder and only three out of the 10 children diagnosed with PDD/NOS (milder ASD) yawned contagiously, as compared to 43% of an age-matched TD group. There was thus a relationship between diagnostic severity in ASD and susceptibility to contagious yawning. We hypothesized that the relationship reflected a difficulty in recognizing or acting on the correspondence between oneself and others, or a deficit in mimicking emotional behavior. When a person mimics (even unconsciously), the activation of emotional body schemas also creates the corresponding emotional reaction (i.e., the act of smiling causes us to feel happier, McIntosh, 1996), a phenomenon that may facilitate understanding the thoughts and feelings of others. Individuals with ASD may not experience emotional contagion during the early years of development.

Another study of implicit, spontaneous mimicry asked whether individuals with ASD are less likely to coordinate or synchronize their actions with a significant other. In this research (reviewed in Marsh et al., 2009), child-caregiver dyads were invited to sit in two adjacent rocking chairs of appropriate sizes, while the adult read aloud a book. During the reading, the adult was asked to rock her chair in tempo with a metronome that *only* the adult could hear. Analyses probed the relative synchrony of the child's and adult's rocking movements. Results indicated that the children with ASD were significantly less likely to coordinate their rocking with the caregiver, compared to a group of TD children and their caregivers.

A third study of non-conscious mimicry in ASD examined the kinematics of grasping objects as a tool for examining the impact of one person's behaviors on another person's action (Becchio et al., 2007). In this clever paradigm, children with autism and children with TD matched on age and gender (IQ was not assessed) watched a model reach out and grasp an object (in one condition), or look at an object (in another condition). Furthermore, on some trials, a second distracter object was present on the table. After the model completed the action (looking or grasping), the child was told to grasp the target object; in no case was there a distracter object in the participant's display. Interestingly, in the TD group, participants' reaches to objects where there had been a distracter for the model showed consistent kinematic differences, even though there was no distracter present. Effects were similar when the model simply *looked* at the target. In those cases, the child's reach showed a less efficient path. In contrast, even though they looked as much at the model, the group with ASD showed no such difference. The findings suggested that the participants with autism were less influenced in their own actions by the actor's gaze. This, and related, research depends on the underlying integrity of motor planning and control; that is, if motor control is impaired in ASD, as reviewed above, one would expect differences in all motor tasks; these differences may reflect impairments not in sociocommunicative functioning, but in reaching, grasping, and so on. In fact, in Becchio et al.'s (2007) reaching study, participants with ASD showed intact motor control in some conditions, but not others, allowing us to attribute performance impairments to differences in those conditions.

<sup>2</sup>We thank an anonymous reviewer for suggesting this potential link.

#### **CONVERSATIONAL GESTURES AND EMBODIMENT IN ASD**

We turn now to a discussion of a phenomenon that we propose provides support for the possibility that even subtle differences in motor control in ASD have significant consequences, potentially reflecting a developmental difference in embodiment. Gestures – the spontaneous manual movements that accompany speech – are an important form of non-verbal communication that may facilitate early language learning and knowledge acquisition (McNeill, 1992). One category of gesture that is particularly relevant for the current paper is that of *iconic* gestures, which depict physical properties of referents. Iconic gestures often provide information that *complements* the information in the co-occurring speech. For example, a throwing motion can add information to the statement that "*he threw the coconut*," for example by showing the direction (*over to the left*), or the manner (with *excitement* versus with *anger*) of the throwing action. Such gestures are informative about semantic representations.

Gestures undergo a similar developmental course as that of speech in language acquisition. In TD children, gestures precede first words (Bates et al., 1975) and may often substitute for specific lexical items (Acredolo and Goodwyn, 1988). Longitudinal studies have shown that children enter the first-word stage (at 10 months) producing more gestures than words (Capirci et al., 2005). The majority of objects to which children refer during this period are referred to first in gesture; they emerge in speech approximately 3 months later (Goldin-Meadow et al., 2007). Similarly, gesture-speech combinations (e.g., pointing to a hat while saying "dada") emerge before two-word phrases (e.g., "dada hat," McEacherns and Haynes, 2004; Özçaliskan and Goldin-Meadow, 2005; Goldin-Meadow et al., 2007). Gestures may facilitate early language development by offering an opportunity for symbolic representation *without* the complex motor sequences required by speech.

One influential theory proposes that gestures originate from the interface of speech and visuospatial thinking (Kita and Ozyurek, 2003); they are shaped by language and simultaneously express information that may not be encoded in speech (viz., visuospatial and motoric properties). Perhaps even more relevant to the current account is the "Gesture as Simulated Action" theory (Hostetter and Alibali, 2008). In this account, language (and gesture) involve simulations of perception and action that *activate or reactivate* perception and action states. Gesture specifically involves the simulation of motor and perceptual components of visuospatial imagery. In both theories, gestures encode visuospatial and motoric properties of lexical referents; in this role, gestures serve as the manifestation of action in a virtual environment. In other words, *conversational gestures reflect the operation of embodied cognition*, a notion supported by numerous empirical findings (Hanlon et al., 1990; Hansen et al., 2008; Iverson and Thelen, 1999; Kita and Ozyurek, 2003). Furthermore, hand gestures have a substantial impact on listeners, whose interpretations and subsequent movements are reliably affected by characteristics of speakers' gestures (McNeill et al., 1994; Cook and Tanenhaus, 2009).

Gesture is thought to be specifically impaired in ASD, a fact reflected in the diagnostic criteria, in which gesture, and its integration with speech, is mentioned in numerous symptom criteria (American Psychiatric Association, 2000). The absence of pointing gestures (known as deictics) is considered an early warning sign of ASD; both the comprehension and production of deictics are found to be reduced (Mundy et al., 1986) and delayed (Camaioni et al., 1997) in ASD. Pointing is also often associated with joint attention, a major developmental milestone that involves sharing experiences with others (Mundy and Stella, 2000). Gestural joint attention skills are associated with language skills in children with ASD, such that children with reduced deictic use (for the purpose of drawing someone else's attention to an object or event) are more delayed in early language acquisition (Loveland and Landry, 1986; Mundy et al., 1990; Bono et al., 2004). A reduced gestural repertoire has been observed in ASD (Wetherby and Prutting, 1984; Colgan et al., 2006), such that gestures fulfill fewer communicative functions.

Interestingly, most studies have failed to find group differences in *rates* of gesturing, after controlling for the amount of speech (Attwood et al., 1988; Capps et al., 1998). Rather than differences in gesture rate, or *quantity*, research and clinical description have both reported differences in gesture *quality*, including reduced synchrony with speech (Tantam et al., 1993), and "oddness" of greeting waves (Hobson and Lee, 1998). The unusual quality of gestures produced by individuals with ASD has long been noted in clinical accounts of the disorder. For example, Wing (1981) cites odd gestures in her case descriptions: "he uses large, jerky, inappropriate gestures to accompany speech" (Wing, 1981, p. 126). Similarly, Hans Asperger's original account of the disorder noted the "large," "clumsy," and "inappropriate" gestures of the patients he described (Asperger, 1944, as cited in Wing, 1981). The odd quality of gestures, and their poor integration with speech, were noted by these influential clinicians.

Given these suggestive impressions that gesture quality may differ, our group examined the spontaneous gestures of adolescents with (*n* = 15) and without (*n* = 15) ASD, matched on age (ages 12–17), language level, and non-verbal IQ, as they told a story based on six cartoon prompts (de Marchena and Eigsti, 2010). There were striking group profiles. The adolescents with ASD produced as *many* gestures as their peers; there were large individual differences within each group, with some participants producing as few as two gestures during their story, and some producing as many as 23. However, their gestures were poorly *synchronized* with the semantically related speech. That is, in the ASD group, gestures were likely to either precede or follow the relevant speech by a lag of (on average) 333 ms. Furthermore, we asked raters (typical college students, entirely naïve to diagnosis and study questions) to judge the spontaneous narratives for clarity, how well the student could imagine the action, etc. Results indicated that the degree to which a narrative included poorly coordinated speech and gestures correlated strongly with ratings of communicative quality. It was also associated with ASD symptom severity. This kind of speech-gesture asynchrony appears to violate basic requirements for gestural comprehension (Habets et al., 2010), and it also seems to have devastating consequences for communication.
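The asynchrony measure described above can be illustrated with a minimal sketch. It assumes that each gesture has been paired with the onset of its semantically related speech and that lag is summarized as the mean absolute onset difference, so that gestures that precede and gestures that follow speech both count. The function name, the pairing scheme, and the onset values are hypothetical illustrations, not the coding procedure or data of de Marchena and Eigsti (2010).

```python
from statistics import mean


def mean_absolute_lag_ms(gesture_onsets_ms, speech_onsets_ms):
    """Mean absolute gesture-speech lag (ms) over paired onsets.

    Assumption: element i of each list belongs to the same
    gesture-speech pair. A gesture that leads speech and one that
    trails it by the same amount contribute equally.
    """
    if len(gesture_onsets_ms) != len(speech_onsets_ms):
        raise ValueError("each gesture needs exactly one paired speech onset")
    if not gesture_onsets_ms:
        raise ValueError("at least one gesture-speech pair is required")
    lags = [abs(g - s) for g, s in zip(gesture_onsets_ms, speech_onsets_ms)]
    return mean(lags)


# Hypothetical onsets in ms from the start of a narrative (not study data).
gesture_onsets = [1200, 5400]  # first gesture leads, second trails
speech_onsets = [1500, 5000]
lag = mean_absolute_lag_ms(gesture_onsets, speech_onsets)
print(lag)
```

Using the absolute difference is a design choice: a signed mean would let early and late gestures cancel out and could hide asynchrony of the kind reported for the ASD group.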

Gesture-speech coordination requires the efficient mobilization and ordering of distinct behaviors. There is a growing literature demonstrating impairments in behavioral timing in ASD. For example, electrophysiological studies have shown delayed responses to social stimuli by children with ASD, compared to TD peers (McPartland et al., 2004; Webb et al., 2006). A recent study of mimicry that measured facial muscle activity (via electromyography) showed that children with ASD differed from TD peers only in their *latency* to mimic (Oberman et al., 2009). Although the amount and appropriateness of mimicry was comparable, children with ASD took longer to mimic, suggesting that deficits in interpersonal synchrony were driven primarily by inefficient *timing* of behaviors, rather than by the *execution* of the behaviors themselves. Timing impairments are also present in non-social cognitive processes. For example, an intriguing study asked adults with and without ASD to complete an eyeblink conditioning procedure (Sears et al., 1994). In this task, a beep (tone) reliably precedes the delivery of a puff of air to the eye, eliciting an eyeblink. After training with tone-puff pairs, the individual receives a tone in isolation; when the individual blinks in response to the tone-alone stimulus, this is taken as evidence that the individual has learned the tone-puff association. In the Sears et al. (1994) study, the adults with ASD did show conditioned learning; however, their blink response was produced at a maladaptive interval, such that the eye reopened to its maximal aperture just as the puff of air arrived. Gesture deficits are also consistent with the hypothesis that individuals with ASD exhibit deficits in multi-modal sensory integration, as has been found for visual speech (lipreading) effects on auditory perception (Smith and Bennetto, 2007) and for temporal asynchrony between auditory and visual linguistic cues (Bebko et al., 2006).

Models of embodiment specify that the neural state that obtained when a stimulus was first encountered impacts subsequent processing of that stimulus. If timing impairments mean that the "initial state" is less clearly defined (e.g., that a given stimulus is not tightly coupled to the individual's motor action), this should lead to relatively weaker embodiment effects. The above evidence of poor behavioral timing and synchrony in ASD is certainly consistent with this possibility. Furthermore, there is a relevant literature addressing the ease of performance of behaviors. Those behaviors that are performed frequently are well-learned and have a low threshold; in contrast, novel or very infrequent behaviors have a higher threshold. "Activation" refers to the relative strength of the behavior once the threshold is reached. A critical assumption is that the dynamic coupling of two systems – e.g., two limbs, or limbs and oral structures – requires relatively high levels of activation in order for mutual entrainment to occur. There is strong evidence of reduced information integration in ASD that is the direct consequence of decreased connectivity of local neural assemblies and of overconnectivity within local assemblies (Brock et al., 2002; Belmonte et al., 2004; Just et al., 2004, 2007; Rippon et al., 2007; Wicker et al., 2008). One suggestion from this literature is thus that, if an individual has activation that spreads less smoothly between brain regions (because of reduced connectivity, for example), that individual will be less able to mutually entrain a given system.

One strategy for addressing the question of whether "embodiment" is as strong an influence on individuals with ASD is to examine embodied processes directly in that population. If a task were relatively non-social in nature, this would permit us to test embodiment effects specifically, without the confound of whether an individual with ASD performs differently just because of a lifetime of reduced social interest and engagement. Although no such studies have been conducted to date, this research is in progress in our lab.

## **NEURAL MECHANISMS OF ASD: SPECIFIC DIFFERENCES IN EMBODIMENT**

Some of the research on embodiment differences in ASD has been behavioral, but there are also several important pieces of evidence suggesting a solid neurophysiological foundation for these differences. Perhaps the most influential has been the documentation of reduced functional connectivity between distant brain regions in ASD (Belmonte, 2004; Just et al., 2004; Kana et al., 2006). Decreased connectivity between prefrontal and other cortical regions is specifically implicated in sociocognitive processing deficits in ASD (Wicker et al., 2008), and many researchers describe functional connectivity as tightly linked to these functional brain differences (Klin et al., 2003).

The cerebellum is involved in the timing and integration of behaviors, and has been shown to be anatomically atypical in ASD in multiple studies (Courchesne et al., 1988; Ritvo and Garber, 1988). Eyeblink conditioning, mediated by the cerebellum, requires rapid and precise timing, and is highly impaired in ASD (Sears et al., 1994), as described above. This autism-specific impairment is particularly striking, because 1- to 2-day-old newborns demonstrate eyeblink conditioning during sleep (Fifer et al., 2010); it is an early mastered ability. The cerebellum also controls the timing of behaviors that have both a cognitive and a motor component (Glickstein, 2006), and that require close synchrony (Katz and Steinmetz, 2002), such as speech production (Ackermann et al., 2004). Given all the differences in timing, temporal coordination, and synchrony, that seem to characterize ASD, and the importance of those processes for embodiment, researchers have looked carefully for specific links between cerebellar structure and function, and embodied processes.

#### **MIRROR NEURON SYSTEM AND EMBODIMENT IN ASD**

In addition to general discussions of functional connectivity and the cerebellum, many researchers have been excited by the prospect that the neural "architecture" underlying the mechanism of embodied cognition might be the mirror neuron system (MNS; Niedenthal, 2007; Gallese et al., 2013). The MNS refers to a set of neurons that fire when an action is *executed* and also when that same action is merely *observed*; as such, they appear to encode the observation and execution of action (Oztop and Arbib, 2002; Williams, 2008). Regions involved in MNS processing include the ventral part of the precentral gyrus, the posterior part of the inferior frontal gyrus, the rostral part of the inferior parietal lobe, and regions within the intraparietal sulcus and the superior temporal sulcus (Cattaneo and Rizzolatti, 2009). Mirror neurons have also been identified in supplementary motor area, an area mainly dedicated to movement initiation and sequencing, and medial temporal lobe, principally involved in memory tasks. Premotor areas active during the execution and the observation of an action may also be involved in the intention promoting the action (Gallese, 2006).

Research on the MNS suggests that social interaction draws on the capacity to predict and understand the motor goals and motor intentions of others, through their actions, and that this ability is instantiated in the cortical motor system organization via the MNS. It is possible that this network is established very early in development; studies of infants have suggested that there is "motor simulation" activity in premotor and posterior parietal cortex (Shimada and Hiraki, 2006). Mirror neurons respond most strongly to behaviors that are in our own behavioral repertoire (Buccino et al., 2004), implying that the more idiosyncratic a child's emotional expressions and behaviors are, the less likely they may be to trigger mimicry in those around him.

While the MNS is an excellent candidate for serving as the neural substrate for embodiment, there is considerable controversy about its existence in humans and its specificity for understanding the symptomatology of ASD. While the original MNS data from primates are widely accepted, many researchers disagree about the existence and nature of the MNS in humans. There is disagreement about the specific *location* of the mirror neurons; about whether there is a specific *interconnected system* of these neurons; and about whether there are *systematic differences* between neurons that perform mirroring functions, or whether any neuron can take on this function (e.g., Niedenthal, 2007). One review has suggested that the MNS hypothesis provides a useful model for understanding ASD, because "across the spectrum of autism-related disorders, it appears to be the cognitive functions that are embodied by action that are most affected" (Williams, 2008, p. 84). In sharp contrast, a more recent review of neuroimaging studies (Hamilton, 2013) found that studies using *non-emotional* hand action stimuli typically reveal *no group differences*, and concluded that there is little evidence for global dysfunction of the mirror neuron system. More research is required to better understand the dynamics and anatomical substrates of the MNS in humans.

## **SUMMARY AND IMPLICATIONS**

While there have been no direct tests of embodied processes in ASD, we hypothesize that ASD is characterized by a relative decrease or lack of embodiment. That is, the stimuli that an individual with ASD encounters may be less bound to the sensory and motor conditions that held when that stimulus was first encountered. Because individuals with ASD seem to have motor deficits, involving poor integration of information for motor planning, and deficits in higher level motor planning, it seems possible that motor deficits contribute to a weakened role of embodied processing in functioning in individuals with ASD.

Language acquisition research provides evidence consistent with this possibility. For example, Linda Smith and colleagues have suggested an important role for sensorimotor functions in early word learning (Yu and Smith, 2012). That is, TD infants were most likely to learn words when the object to which a word referred was visually dominant, according to eyetracking data, and when they were physically manipulating those objects. Yu and Smith (2012) suggested that the infant's visual focus on a specific object during naming, along with the infant's handling of the object, served to reduce referential ambiguity. These behaviors were better predictors of the infant's later word knowledge than was parent verbal labeling. Sensory-motor behaviors of infants and parents seemed to create optimal visual moments for learning, playing a stronger role in word learning than verbal naming by parents. These findings suggest that for language acquisition, sensorimotor cues help to constrain the learning process. If such constraints were not operating as efficiently in toddlers with ASD, due to motor control or sensorimotor integration deficits, one would anticipate language delays, as are found in ASD (Eigsti et al., 2007).

Given the pattern of findings described here, we suggest that there are (at least) three possible explanations for embodiment differences in ASD. Note that these explanations are not necessarily mutually exclusive.


In general, it is of course true that development in TD individuals is shaped and sculpted by embodied processes. In particular, embodiment appears to be a particularly direct pathway for understanding the implications or associations of information. If individuals with ASD do not have access to this rich source of information, they must rely on alternative pathways to learn. Of course, in life, many important cues arrive via social interactions; this may explain why prior discussions of embodiment in ASD were limited to social cognition and social embodiment. The current data seem to suggest a broader reach of embodiment impairments in ASD.

These data raise several points to consider for intervention. Speaking generally, children with ASD benefit from explicit teaching of motor action and facilitating top-down mechanisms to bolster the less-active, implicit, bottom-up processes. These are processes that TD children use "for free" – that is, they engage bottom-up embodiment processes without explicit attention or effort. It is possible that explicitly directing children with ASD to adopt particular facial expressions or body postures, may help explicitly "entrain" the body with embodiment.

While data are limited, one study is consistent with this suggestion (Yilmaz et al., 2004). This intervention study involved teaching swimming ("hydrotherapy") to individuals with ASD. In addition to the benefits to physical health, including gains in balance, speed, agility, strength, flexibility, and endurance, this could potentially enhance motor control and motor planning and lead to downstream improvements in implicit mimicry, emotional contagion, and so on. Interventionists already engage in such explicit teaching of top-down approaches to skills in ASD, such as eye contact, that are absolutely critical in good social interaction. The current review also suggests that dyadic approaches that involve synchronization or mimicry training, perhaps via EMG, may also be a powerful approach.

While evidence of embodiment "impairments" in ASD is lacking, the collective impact of relevant studies reviewed here suggests that the time has come for a direct test of embodied processes in ASD.

## **ACKNOWLEDGMENTS**

The author is grateful for funding from a Fulbright Research Scholar Fellowship, which supported this research, and for helpful suggestions from several anonymous reviewers.

#### **REFERENCES**

Becchio, C., Pierno, A., Mari, M., Lusher, D., and Castiello, U. (2007). Motor contagion from gaze: the case of autism. *Brain* 130, 2401–2411.

Belmonte, A. (2004). Temporal averaging of turbulence-induced uncertainties on coherent power measurements. *Opt. Express* 12, 3770–3777.

*sapiens*). *J. Comp. Psychol.* 109, 308–320.


basic motor skill with dyspraxia in autism: implication for abnormalities in distributed connectivity and motor learning. *Neuropsychology* 23, 563–570.


Goldin-Meadow, S., Goodrich, W., Sauer, E., and Iverson, J. (2007). Young children use their hands to tell their mothers what to say. *Dev. Sci.* 10, 778–785.

Gowen, E., and Hamilton, A. (2013). Motor abilities in autism: a review using a computational context. *J. Autism Dev. Disord.* 43, 323–344.


question answering. *Cogn. Psychol.* 14, 8–106.


of language development. *Cognition* 96, B101–B113.


ERP evidence of atypical face processing in young children with autism. *J. Autism Dev. Disord.* 36, 881–890.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 October 2012; accepted: 11 April 2013; published online: 30 April 2013.*

*Citation: Eigsti I-M (2013) A review of embodiment in autism spectrum disorders. Front. Psychol. 4:224. doi: 10.3389/fpsyg.2013.00224*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Eigsti. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## How vision affects kinematic properties of pantomimed prehension movements

## **Takao Fukui \*† and Toshio Inui**

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Caroline Whyatt, Queen's University Belfast, UK

Michiru Makuuchi, National Rehabilitation Center for Persons with Disabilities, Japan

#### **\*Correspondence:**

Takao Fukui, Equipe ImpAct, Centre de Recherches en Neurosciences de Lyon, INSERM U1028 – CNRS UMR5292 – Université Lyon 1, 16 Avenue Doyen Lépine, Bron 69500, France. e-mail: takao.fukui@inserm.fr

#### **†Present address:**

Takao Fukui, Equipe ImpAct, Centre de Recherches en Neurosciences de Lyon, INSERM U1028 – CNRS UMR5292 – Université Lyon 1, Bron, France.

When performing the reach-to-grasp movement, fingers open wider than the size of a target object and then stop opening. The recorded peak grip aperture (PGA) is significantly larger when this action is performed without vision during the movement than with vision, presumably due to an error margin that is retained in order to avoid collision with the object. People can also pretend to perform this action based on an internal target representation (i.e., pantomimed prehension), and previous studies have shown that kinematic differences exist between natural and pantomimed prehension. These differences are regarded as a reflection of variations in information processing in the brain through the dorsal and ventral streams. Pantomimed action is thought to be mediated by the ventral stream. This implies that visual information during the movement, which is essential to the dorsal stream, has little effect on the kinematic properties of pantomimed prehension. We investigated whether an online view of the external world affects pantomimed grasping, and more specifically, whether the dorsal stream is involved in its execution. Participants gazed at a target object and were then subjected to a 3-s visual occlusion, during which time the experimenter removed the object. The participants were then required to pretend to make a reach-to-grasp action toward the location where the object had been presented. Two visual conditions (full vision and no vision) were imposed during the pantomimed action by manipulating shutter goggles. The PGA showed significant differences between the two visual conditions, whereas no significant difference was noted for terminal grip aperture, which was recorded at the movement end. This suggests the involvement of the dorsal stream in pantomimed action and implies that pantomimed prehension is a good probe for revealing the mechanism of interaction between the ventral and dorsal streams, which is also linked to embodied cognition.

**Keywords: reach-to-grasp movement, pantomimed action, vision, dorsal and ventral streams, grip configuration**

## **INTRODUCTION**

People perform adaptive motor behaviors in their daily lives, and these adaptive behaviors are assumed to emerge from continuous interaction among the nervous system, body, and environment (Chiel and Beer, 1997). Embodied cognition argues that the sensorimotor process mediating this adaptive control of bodies in environments is tightly related to the cognitive system (e.g., Clark, 1997; Wilson, 2002). In particular, the hand developed in a remarkably human-specific manner and the actions by this body part added variety to human lives and led ultimately to civilization. The old mot "The hand is the window on to the mind," which is ascribed to Immanuel Kant (cf. Tallis, 2004), also indicates that the hand serves as a substantial interface between the external world and the individual self.

Reaching for and grasping an object is one of the basic functions of the human hand in daily life. Following Jeannerod's (1981, 1984) pioneering studies, this fundamental human skill has been a research focus for the last three decades (e.g., Castiello and Begliomini, 2008; Grafton, 2010; Rosenbaum et al., 2012 for recent reviews). The reach-to-grasp movement consists of two components: a transport component, which is thought to direct the arm to the spatial location of the target, and a manipulation component, which is involved in grasping a three-dimensional object (Jeannerod, 1981, 1984). Jeannerod (1981) was the first to systematically describe the behavioral aspects of the grasping action in which the fingers first open gradually to form the appropriate configuration for the target object to be grasped ("preshaping"). The fingers then continue to open wider than the size of the target object and stop opening at a point about 60–70% into the movement (i.e., the peak grip aperture, PGA), after which they enclose the object, finally touching its surface (e.g., Jeannerod and Marteniuk, 1992). Accomplishing this movement requires appropriate visuomotor transformation, which indicates that visual information is essential for the online control of goal-directed movements. When visual information from the entire visual field is absent during prehension, this invokes a significantly larger PGA (Wing et al., 1986; Jakobson and Goodale, 1991; Bradshaw and Elliott, 2003; Fukui and Inui, 2006). This is due to the greater margin of hand aperture, which allows for error in movement and prevents collision of the fingers with the target object (e.g., Wing et al., 1986). Therefore, PGA has been regarded as an indicator of the influence of online vision on grasping.
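As an illustration of how PGA and its relative timing can be extracted from a recorded movement, the following minimal sketch assumes that grip aperture is the Euclidean distance between thumb and index fingertip markers, and that the samples are equally spaced and already trimmed to run from movement onset to movement end. The function name and the marker coordinates are hypothetical; they are not taken from any of the studies cited here.

```python
import math


def peak_grip_aperture(thumb_path, index_path):
    """Return (PGA, relative time of PGA) for one reach-to-grasp trial.

    Assumptions: both paths hold one (x, y, z) position per sample,
    samples are equally spaced in time, and the first and last samples
    mark movement onset and movement end. Relative time is a fraction
    of movement duration (0.0 = onset, 1.0 = end).
    """
    if len(thumb_path) != len(index_path) or len(thumb_path) < 2:
        raise ValueError("paths must have equal length and at least 2 samples")
    # Grip aperture at each sample: thumb-to-index Euclidean distance.
    apertures = [math.dist(t, i) for t, i in zip(thumb_path, index_path)]
    peak_sample = max(range(len(apertures)), key=apertures.__getitem__)
    return apertures[peak_sample], peak_sample / (len(apertures) - 1)


# Hypothetical marker positions in mm (not data from any study).
thumb = [(0.0, 0.0, 0.0)] * 6
index = [(x, 0.0, 0.0) for x in (20.0, 50.0, 80.0, 100.0, 90.0, 60.0)]
pga, relative_time = peak_grip_aperture(thumb, index)
print(pga, relative_time)
```

In this made-up trial the aperture peaks at 100 mm, 60% of the way through the movement, and the final sample (60 mm) corresponds to the aperture at movement end.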

In addition to these types of goal-directed movements, motor behavior can be performed toward an object, even when that object is no longer present, based on the memory of the object by imagining its properties (i.e., pantomimed action). Despite our assumption that people could pantomime well and replicate the motor performance of (natural/real) goal-directed movements, previous studies have demonstrated that the kinematics of pantomimed prehension differ quite substantially from prehension to an existing object (e.g., Goodale et al., 1994). Specifically, these researchers found that, when compared to normal prehension, pantomimed prehension consistently reached lower peak velocities, tended to last longer, followed more curvilinear trajectories, and undershot the target location. PGA was also smaller when pantomiming than when grasping the existing objects. Unlike normal prehension, pantomimed prehension has no haptic feedback due to lack of a target object; thus people have to configure their terminal grip aperture (TGA) according to a memory representation of the target object. When the participants could see a visual image of the target object via a mirror apparatus while they reached for it, a significant difference in the TGA was noted between with and without haptic feedback conditions (Bingham et al., 2007). These researchers also found that mixing the trials with and without haptic feedback in one experimental session resulted in an appropriate configuration of the TGA in the no feedback condition, indicating the importance of haptic calibration opportunities (see also Schenk, 2012).

In association with cognitive and sensorimotor processes, two relatively parallel streams have been proposed to explain visual information processing in the brain; namely the ventral stream, which projects from the primary visual cortex to the inferotemporal cortex, and the dorsal stream, which projects from the primary visual cortex to the posterior parietal cortex (Ungerleider and Mishkin, 1982). The ventral stream was initially proposed to play a critical role in the identification and recognition of objects ("what" pathway), whereas the dorsal stream was thought responsible for localizing those objects in space ("where" pathway). However, research revealed that "where" did not fully express the functions of dorsal streams. For example, some patients with damage to the posterior parietal cortex (i.e., the dorsal stream) were found unable to orient the hand and form an appropriate grasp, in addition to the inability to reach a proper spatial location (e.g., Rondot et al., 1977; Perenin and Vighetto, 1988; Jakobson et al., 1991). Therefore, Goodale and Milner (1992) focused on the differences in the output systems served by each stream. Specifically, they proposed that the ventral stream plays a major role in constructing a perceptual representation of the visual world and the objects within it, while the dorsal stream mediates the visual control of actions directed at those objects (the "How" pathway; Goodale, 2011).

The observed differences in kinematics between pantomimed and natural motor behaviors suggest that pantomimed actions are controlled differently from natural goal-directed motor behavior. Specifically, pantomimed motor behavior might be guided by the ventral stream, whereas natural goal-directed motor behavior is mediated by the dorsal stream (Westwood et al., 2000; Milner and Goodale, 2006). This argument was strengthened by a neuropsychological study that investigated an optic ataxic patient who suffered from visuomotor difficulties due to severe bilateral damage to the posterior parietal lobes (Milner et al., 2001). The PGA of this patient's pantomimed prehension scaled according to the object size, implying that visual memory for this action was appropriately used and that the intact ventral stream could contribute to this motor behavior.

Although the contribution of the ventral stream to pantomimed prehension was revealed by these previous studies, the nature of the involvement of the dorsal stream in the execution of pantomimed prehension remains unclear. The primary function of the dorsal stream is the online transformation of visual information into action execution (Jeannerod et al., 1995; Desmurget et al., 1999; Pisella et al., 2000; Grea et al., 2002). The question becomes whether visual information of the environment affects the performance of pantomimed prehension, particularly the configuration of the grip aperture, such as the PGA and the TGA. There are two possibilities each for the kinematics of PGA and TGA when manipulating vision during movement.

Concerning the PGA,


Concerning the TGA,


In this study, we investigated whether visual information during movement affects the kinematics of pantomimed prehension; specifically, we determined which of the above-mentioned possibilities would be more plausible, by manipulating vision with liquid crystal shutter goggles during movement execution.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Seven self-reported right-handed students (mean: 24.4 years of age, SD: 2.70; one female) participated in the experiment. All participants reported normal or corrected-to-normal vision and none of them had any motor or sensory abnormalities. They were naive with regard to the purpose of the experiment, and gave their informed consent according to the Declaration of Helsinki.

#### **APPARATUS AND STIMULUS**

Participants were equipped with liquid crystal shutter goggles (Takei Scientific Instruments Co., Ltd., Niigata, Japan) and seated comfortably on a chair in front of a table (120 cm × 75 cm) in a room with natural lighting. As illustrated in **Figure 1**, a target object was presented along the participant's sagittal plane, with a distance between the target object and the starting position of 50 cm. A pressure-sensitive switch was located at the starting position, which was approximately in line with the participant's right shoulder. Three wooden cylinders (4, 6, and 9 cm in diameter and 11 cm in height) were used as target objects. An electromagnetic motion tracking sensor FASTRAK system (Polhemus, Colchester, VT, USA) was used for measuring the position of wrist (the head of ulna), while the aperture between the thumb and middle finger was calculated by a data glove (Virtual Technologies, Inc., Palo Alto, CA, USA). The temporal resolution of the motion tracking sensor was 120 Hz and that of the data glove was 100 Hz. The liquid crystal shutter goggles take about 3 ms to become transparent and about 20 ms to become opaque. A workstation Octane (SGI Japan, Ltd., Tokyo, Japan) controlled the apparatus and recorded the kinematics.

#### **PROCEDURE**

Participants were told to place their right hand at the starting position before each trial and to begin each trial with the tips of the thumb and middle finger touching each other. Goggles were opaque before trials. The experiment consisted of two sessions: prehension to a real object and pantomimed prehension. Participants first performed prehension to a real object ("natural" prehension) with vision, and then they performed pantomimed prehension with or without vision, presented in a random fashion.

**FIGURE 1 |** [...] pressure-sensitive switch; b: position of cubical receiver on the wrist when grasping; c: position of target presentation. The reaching distance recorded by the receiver was approximately 35 cm from the starting position.

A natural grasping task with vision was performed as a baseline condition. We assumed that an experience of real interaction with a target object via natural prehension is a prerequisite for an appropriate pantomimed action; therefore, the pantomime task was preceded by the natural grasping task. In the real grasping task, participants were required to reach for and grasp the presented target object and then lift and move the object 5–10 cm toward their bodies, under full vision (cf. Fukui and Inui, 2006; see also **Figure 1**). We did not test natural prehension without vision because our main interest was determining how the visual information of the environment during movement execution influences the configuration of pantomimed prehension movements. We were concerned that the control modulated by the visual context (i.e., full vision or no vision) in the natural prehension session would influence the control of the following pantomimed prehension.

Participants performed pantomimed prehension movements as follows: first, goggles became transparent following a beep signal and stayed transparent for 1 s. During this period, the participants were required to memorize the target properties (i.e., size, location, etc.). After this period, the goggles became opaque and remained in this condition for 3 s (i.e., delay). During this time, the experimenter removed the target object. Two viewing conditions were designed for the subsequent procedure: (i) Pantomimed action with vision (PV), where the goggles again became transparent after a beep and participants performed a pantomimed action to the memorized target object; (ii) Pantomimed action with no vision (PNV), where the goggles remained opaque and the participants performed the pantomimed action, cued by the beep, according to the memorized target object. The two viewing conditions (PV/PNV) and three target sizes (4, 6, and 9 cm) were presented randomly, with nine trials for each combination (i.e., a total of 54 experimental trials). In addition to the pantomimed movement of grasping, the participants were required to pretend to lift and move the object 5–10 cm toward their bodies.
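The factorial structure of the pantomime session (two viewing conditions, three target sizes, nine repetitions) can be illustrated with a minimal sketch. The text states only that trials were "presented randomly"; the simple shuffle used here, and the names `make_pantomime_schedule` and `seed`, are assumptions introduced for illustration and do not describe the original control software.

```python
import random


def make_pantomime_schedule(sizes_cm=(4, 6, 9), conditions=("PV", "PNV"),
                            repetitions=9, seed=None):
    """Return a shuffled list of (condition, size_cm) trials.

    Assumption: a single unconstrained shuffle of the full trial list.
    """
    rng = random.Random(seed)
    trials = [(cond, size)
              for cond in conditions
              for size in sizes_cm
              for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials


schedule = make_pantomime_schedule(seed=0)
print(len(schedule))  # 2 conditions x 3 sizes x 9 repetitions = 54 trials
```

Every condition and size combination appears exactly nine times regardless of the seed; only the order changes.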

In the real grasping (RV) session, nine trials for each object size (4, 6, and 9 cm) were presented randomly (i.e., a total of 27 experimental trials). A 3-s delay was also inserted in this session, as well as in the pantomimed session, and the target object was not removed during this delay (cf. Milner et al., 2001). Before both pantomime and real grasping sessions, participants were required to perform each action a few times as practice trials (within five trials) according to the instructions.

#### **DATA PROCESSING AND ANALYSIS**

The initiation of the movement was defined as the time at which the participant's hand was released from the pressure-sensitive switch. The termination of the movement was defined as the time point at which the distance between the wrist and the starting point reached its maximal value (i.e., the point at which the direction of the wrist movement changed) (Zaal and Bootsma, 1993; Bootsma et al., 1994; Fukui and Inui, 2006).
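The two definitions above can be expressed as a short sketch. It assumes that the switch state and the wrist position are available as equally long, time-aligned sample lists, which the text does not specify; `movement_window` and its parameter names are hypothetical.

```python
import math


def movement_window(switch_pressed, wrist_xyz, start_xyz):
    """Return (onset_index, termination_index) for one trial.

    switch_pressed: list of bools, True while the hand rests on the switch.
    wrist_xyz:      list of (x, y, z) wrist positions, same length (assumed
                    to be time-aligned with the switch samples).
    start_xyz:      (x, y, z) of the starting position.
    """
    # Onset: first sample at which the switch is no longer pressed.
    onset = next(i for i, pressed in enumerate(switch_pressed) if not pressed)
    # Termination: sample with the maximal wrist-to-start distance at or
    # after onset, i.e., where the wrist reverses toward the body.
    distances = [math.dist(p, start_xyz) for p in wrist_xyz]
    termination = max(range(onset, len(distances)), key=distances.__getitem__)
    return onset, termination


# Toy trial: the hand leaves the switch at sample 2 and reverses after sample 4.
switch = [True, True, False, False, False, False]
wrist = [(0, 0, 0), (0, 0, 0), (10, 0, 0), (30, 0, 0), (35, 0, 0), (28, 0, 0)]
print(movement_window(switch, wrist, (0, 0, 0)))  # (2, 4)
```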

The positional data given by Cartesian coordinates in three dimensions from the receiver were recorded and filtered offline by a second-order dual-pass Butterworth filter with a cut-off frequency of 10 Hz. Further offline analysis included computation of wrist velocity from the filtered position signal. We also calculated two grasp component values; specifically, PGA and TGA (the aperture between thumb and middle finger at the point in time when the changes in grasp configuration were stable). As an index of movement variability (reach distance, PGA, and TGA), standard deviations across trials were computed for each participant.
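As an illustration of this processing chain, the following sketch implements a second-order Butterworth low-pass filter applied forward and backward, followed by a finite-difference estimate of wrist speed. Several details are assumptions, not the authors' implementation: the bilinear-transform design, the initialisation of the filter state with the first sample, the absence of any cut-off correction for the second pass, and the central-difference velocity estimate. All function names are hypothetical.

```python
import math


def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Biquad coefficients of a 2nd-order Butterworth low-pass (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    q = 1.0 / math.sqrt(2.0)  # Butterworth quality factor
    norm = 1.0 / (1.0 + k / q + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - k / q + k * k) * norm)
    return b, a


def _single_pass(x, b, a):
    # Direct form I. The state is initialised with the first sample so that
    # a constant input produces no start-up transient (an assumption).
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


def dual_pass_lowpass(x, cutoff_hz=10.0, fs_hz=120.0):
    """Filter forward, then backward, which cancels the phase lag."""
    b, a = butter2_lowpass_coeffs(cutoff_hz, fs_hz)
    forward = _single_pass(x, b, a)
    backward = _single_pass(forward[::-1], b, a)
    return backward[::-1]


def wrist_speed(xs, ys, zs, fs_hz=120.0):
    """Tangential speed from central differences (one-sided at both ends)."""
    n = len(xs)
    speed = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dt = (hi - lo) / fs_hz
        step = math.dist((xs[hi], ys[hi], zs[hi]), (xs[lo], ys[lo], zs[lo]))
        speed.append(step / dt)
    return speed
```

Each coordinate would be filtered separately with `dual_pass_lowpass` before calling `wrist_speed`; peak wrist velocity is then the maximum of the returned list within the movement window.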

The mean data for each dependent variable were analyzed with an ANOVA, with object size (4, 6, and 9 cm) and the task (PV, PNV, and RV) as within-participant factors (alpha level = 0.05). Huynh–Feldt adjustments to the degrees of freedom were performed when necessary. As described earlier, participants did not perform real prehension without vision because previous experience of both full and no vision conditions during natural prehension was expected to influence online control in the subsequent pantomimed prehension session. That is why we incorporated PV, PNV, and RV into one within-participant factor as a task. Our interest is the comparisons of each dependent variable between PV and PNV conditions and those between PV and RV conditions. As *post hoc* comparisons, we performed paired *t*-tests, using the Bonferroni correction, on the mean values for PV and PNV conditions, for PV and RV conditions, and for each size.
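The Bonferroni correction mentioned above can be sketched as follows. The number of comparisons in each family is not stated in the text, so the p-values in the example are hypothetical and serve only to show the adjustment; `bonferroni` is an illustrative name.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical family of three paired comparisons.
raw = [0.004, 0.030, 0.500]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # [True, False, False]
```

Equivalently, each uncorrected p-value can be compared against alpha divided by the number of comparisons.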

## **RESULTS**

We found lower peak wrist velocity and smaller PGA (except for the 9-cm object) in the pantomimed prehension tasks when compared to natural prehension tasks, as demonstrated by Goodale et al. (1994). In addition to these results, kinematic differences were found in pantomimed prehension between the full vision and no vision conditions. Specifically, we found a larger PGA when pantomiming with no vision than when pantomiming with full vision. At the same time, no significant difference was noted for the TGA values between the full vision and no vision conditions.

#### **REACH DISTANCE AND REACH DISTANCE VARIABILITY**

Reach distance (**Figure 2A**) showed a main effect of task [*F*(2, 12) = 4.550, *p* = 0.034, partial η <sup>2</sup> = 0.431], but no significant main effect of size [*F*(2, 12) = 3.261, *p* = 0.074] and no interaction between the two factors [*F*(4, 24) = 0.825, *p* = 0.522]. Further analysis revealed a significant difference between PV and PNV conditions (*p* < 0.001), but no significant difference between PV and RV conditions (*p* = 0.297). The reach distance was undershot when visual information was not available during pantomimed prehension, whereas the distance was comparable to that of natural prehension when visual information was available.
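For readers who wish to check the reported effect sizes, partial η<sup>2</sup> can be recovered from an *F* value and its degrees of freedom through the standard identity partial η<sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the task effect on reach distance; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of task on reach distance: F(2, 12) = 4.550
print(round(partial_eta_squared(4.550, 2, 12), 3))  # 0.431
```

The same identity reproduces the other effect sizes reported in this section from their *F* values.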

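**FIGURE 2 |** (Caption only partially recovered.) Panels referenced in the text: reach distance **(A)**, reach distance variability **(B)**, movement duration **(C)**, peak wrist velocity **(D)**, time to peak wrist velocity **(E)**, PGA variability **(F)**, time to peak grip aperture **(G)**, and TGA variability **(H)**.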

Reach distance variability (**Figure 2B**) showed a main effect of task [*F*(2, 12) = 7.288, *p* = 0.009, partial η <sup>2</sup> = 0.548], but no significant main effect of size [*F*(2, 12) = 2.890, *p* = 0.095] and no interaction between the two factors [*F*(4, 24) = 0.316, *p* = 0.865]. Further analysis revealed a significant difference between PV and PNV conditions (*p* = 0.012), indicating that the lack of available visual information during pantomimed action increased the reach distance variability.

#### **MOVEMENT DURATION**

A significant interaction between size and task [*F*(2.990, 17.942) = 3.375, *p* = 0.041, partial η <sup>2</sup> = 0.360] was evident, although no main effects were noted [size: *F*(2, 12) = 0.350, *p* = 0.712, task: *F*(2, 12) = 1.318, *p* = 0.304]. We found a simple main effect of size in the RV condition (*p* = 0.007), but further analysis revealed no significant differences among the different sizes (**Figure 2C**).
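The fractional degrees of freedom reported for this interaction reflect the Huynh–Feldt adjustment, which multiplies both the effect and the error degrees of freedom by a sphericity estimate ε. The value of ε is not reported in the text; the sketch below back-calculates it from the reported degrees of freedom, under the assumption of a 3 × 3 within-participant design with seven participants. Function names are hypothetical.

```python
def nominal_interaction_df(levels_a, levels_b, n_participants):
    """Unadjusted dfs for a two-way within-participant interaction."""
    df_effect = (levels_a - 1) * (levels_b - 1)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error


def implied_epsilon(reported_df_effect, reported_df_error, nominal):
    """Sphericity estimate implied by each reported (adjusted) df."""
    return (reported_df_effect / nominal[0],
            reported_df_error / nominal[1])


nominal = nominal_interaction_df(3, 3, 7)        # size x task, 7 participants
eps = implied_epsilon(2.990, 17.942, nominal)    # reported F(2.990, 17.942)
print(nominal)                                   # (4, 24)
print([round(e, 3) for e in eps])
```

Both reported values imply nearly the same ε (about 0.75), which is consistent with a single scaling factor applied to the nominal *F*(4, 24).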

#### **TRANSPORT COMPONENT**

#### **Peak wrist velocity**

Both size [*F*(2, 12) = 13.905, *p* < 0.001, partial η <sup>2</sup> = 0.699] and task [*F*(2, 12) = 13.546, *p* < 0.001, partial η <sup>2</sup> = 0.693] had significant effects on the peak wrist velocity (**Figure 2D**). No significant interaction was noted between size and task [*F*(4, 24) = 1.217, *p* = 0.330]. Further analysis revealed significant differences between the PV and RV conditions (*p* < 0.001), indicating that pantomimed action was slower than natural prehension. We also found a significant difference between the PV and the PNV conditions (*p* < 0.001), suggesting that velocity was further reduced when visual information was not available during pantomimed prehension. We also found significant differences between the 4 and 9-cm objects (*p* = 0.001) and between the 6 and 9-cm objects (*p* < 0.001). This result might be due to the way the 9-cm object was gripped: the fingers were extended almost to their limit to ensure that the object was stably held, leading to a cautious manner of action even in the pantomimed conditions.

#### **Time to peak wrist velocity**

Time to peak velocity (**Figure 2E**) showed a main effect of task [*F*(2, 12) = 5.719, *p* = 0.018, partial η <sup>2</sup> = 0.488], but no significant main effect of size [*F*(2, 12) = 1.174, *p* = 0.342] and no interaction between the two factors [*F*(4, 24) = 0.804, *p* = 0.535]. Further analysis revealed a significant difference between the PV and RV conditions (*p* < 0.001), indicating a later timing to peak wrist velocity in pantomimed prehension than in real grasping, under full vision condition.

#### **MANIPULATION COMPONENT**

#### **Peak grip aperture and variability of PGA**

Both size [*F*(2, 12) = 130.140, *p* < 0.001, partial η <sup>2</sup> = 0.956] and task [*F*(2, 12) = 4.341, *p* = 0.038, partial η <sup>2</sup> = 0.420] significantly affected the PGA. We found a significant interaction between size and task [*F*(4, 24) = 26.797, *p* < 0.001, partial η <sup>2</sup> = 0.817]. We also found significant differences between PV and PNV (*p* = 0.003) and between PV and RV (*p* = 0.008) for the 4-cm object and a significant difference between PV and PNV (*p* = 0.006) for the 6-cm object (see **Figure 3A**). In other words, the PGA was significantly larger when visual information was not available in the pantomimed action. The results imply that visual information appeared to affect the configuration of grip aperture in pantomimed prehension, although no difference was found for the 9-cm object, presumably due to a kind of ceiling effect constrained by the hand structure. The variability of PGA (**Figure 2F**) showed a main effect of task [*F*(2, 12) = 6.088, *p* = 0.015, partial η <sup>2</sup> = 0.504], but no main effect of size [*F*(2, 12) = 0.991, *p* = 0.400] or interaction between size and task [*F*(4, 24) = 0.826, *p* = 0.522]. Further analysis revealed a significant difference between the PV and RV conditions (*p* = 0.017), but the difference between the PV and PNV conditions was not significant (*p* = 0.400). This result suggests that this value would be modulated depending on the existence of the target object during movement.

**FIGURE 3 | Mean values of peak grip aperture (A) and terminal grip aperture (B) in each condition.** Differences between PV and PNV conditions for the 4 and 6-cm objects were found, whereas no difference of terminal grip aperture was noted between the PV and PNV conditions for any object size. PV, PNV, and RV indicate pantomime prehension with vision, pantomime prehension without vision, and real prehension with vision, respectively. Error bars indicate the standard errors of the values between participants.

#### **Time to peak grip aperture**

Time to PGA (**Figure 2G**) showed a main effect of size [*F*(2, 12) = 19.125, *p* < 0.001, partial η <sup>2</sup> = 0.761], and a significant interaction between size and task [*F*(4, 24) = 4.945, *p* = 0.005, partial η <sup>2</sup> = 0.452], but no main effect of task [*F*(2, 12) = 1.008, *p* = 0.394]. Further analysis revealed a significant difference between the 4 and 9-cm objects (*p* < 0.001) in the PV condition, as well as significant differences between the 4 and 9-cm objects (*p* = 0.005) and between the 6 and 9-cm objects (*p* = 0.002) in the PNV condition. No significant differences among target sizes were noted in the RV condition (*p* = 0.074). Thus, in the pantomimed prehension tasks, the time to PGA shifted later as object size increased.

#### **Terminal grip aperture and variability of TGA**

Both size [*F*(2, 12) = 198.662, *p* < 0.001, partial η <sup>2</sup> = 0.971] and task [*F*(2, 12) = 17.319, *p* < 0.001, partial η <sup>2</sup> = 0.743] significantly affected TGA. A significant interaction was noted between size and task [*F*(4, 24) = 48.496, *p* < 0.001, partial η <sup>2</sup> = 0.890]. Further analysis revealed a significant difference between the PV and RV conditions for the 9-cm object (*p* < 0.001). No significant difference was noted between the PV and the PNV conditions for any object size. This suggested that the availability of visual information during pantomimed prehension did not affect the TGA (see **Figure 3B**). The variability of TGA (**Figure 2H**) showed a main effect of task [*F*(2, 12) = 32.616, *p* < 0.001, partial η <sup>2</sup> = 0.845] but no main effect of size [*F*(2, 12) = 1.261, *p* = 0.318] and no interaction between size and task [*F*(1.857, 11.140) = 1.488, *p* = 0.266]. Further analysis revealed a significant difference between the PV and RV conditions (*p* < 0.001), suggesting larger variability in pantomimed movements, under full vision condition.

## **DISCUSSION**

The current study explored: (i) whether kinematic differences exist between pantomimed prehension and natural grasping, as shown previously (e.g., Goodale et al., 1994); and (ii) whether visual information during movement affects the kinematics of pantomimed prehension. We confirmed the kinematic differences between natural and pantomimed prehension movements with full vision, as Goodale et al. (1994) had previously demonstrated. Specifically, when participants performed pantomimed action, they showed lower peak wrist velocity and smaller PGA (except for the 9-cm object) when compared to the natural prehension task, although no significant difference in the reach distance was noted in the current experiment.

At the same time, kinematic differences were found in pantomimed prehension between the full vision and no vision conditions. Specifically, we found an undershot reach distance with larger variability, a lower peak wrist velocity, and a larger PGA when pantomiming with no vision than when pantomiming with full vision, which suggested that a view of the external environment affects the execution of pantomimed prehension. Previous studies of goal-directed movement (e.g., Watt et al., 2000) demonstrated that a situation of visual uncertainty induced an undershot bias, and the current findings for the transport component (i.e., the undershot reach distance and its larger variability with a lower peak velocity when pantomiming with no vision) suggest that this visual uncertainty also influences the pantomimed action in a manner similar to that of a goal-directed action.

The interesting finding in our study was the significant difference observed in the PGA between the full and no vision conditions, while no significant difference was observed in the TGA. As for the larger PGA without vision, we could not ascribe this result to a decay of the memory for performing this action, as Hesse and Franz (2009) pointed out in their natural prehension experiments. That is because, in contrast to natural grasping, even if the target representation is decayed and more vague in the no vision condition than in the full vision condition, physical contact does not need to occur with an object in a pantomimed action, so the PGA does not need to increase in the no vision condition as there is no need for an error margin to avoid a collision with the object, as described in the Introduction. Furthermore, TGA, which is configured according to this representation, would also have shown a difference between these two conditions, but we did not find such a difference in the TGA (or its variability). Rather, the target representation, which was assumed to be reflected in TGA, showed a stable property that was immune to the decay. This result implies that an internal representation of the target object for the pantomiming might not be influenced by the availability of visual information. In fact, the TGA for the 4 and 6-cm objects corresponded to the object size (although the 9-cm object was somewhat overestimated), which implied that the pantomime grip aperture might depict the form of the object (Laimgruber et al., 2005). The interpretation for the smaller PGA obtained in the full vision condition is that environmental visual information and/or view of the hand contributed to a "better" grip aperture configuration even in the pantomimed action (i.e., memory-guided movements; cf. Ietswaart et al., 2001; Heath, 2005).
This "better" configuration does not mean that the pantomimed prehension with vision shows more similar kinematic properties to natural grasping than that without vision; rather, it implies that in the full vision condition, there is no additional opening of grip aperture when pantomimed prehension is performed. In addition to the online information extraction, another interpretation of the current results is that participants, presumably implicitly, would try to "simulate" the grip configuration of the real grasping action in the task of pantomimed prehension. Specifically, in spite of the requirement of performing the pantomimed prehension, participants modulate the grip aperture by taking into account the environmental (visual) context (i.e., full vision or no vision). Note that, as described in the introduction, a larger PGA without vision was observed in the real grasping. Although such modulation in the pantomimed prehension would not be necessary because there is no real object to be grasped, the modulation of the PGA, according to the visual context, would suggest involvement of the body in the cognition of the external world (cf. Witt and Proffitt, 2008).

The pantomimed prehension seen in the current study could be characterized as a motor behavior into which a delay period is inserted between the target object presentation and action phases. Milner and Goodale (2006) proposed that a delay between stimulus presentation and grasping led to a shift from dorsal to ventral control of the movement because the dorsal stream does not retain a visual memory for more than 2 s; therefore, a memory-guided action introduced by a delay period is mediated by the ventral stream (see also Westwood and Goodale, 2003). However, Himmelbach and Karnath (2005) found that the movement error of a pointing task performed by patients with optic ataxia decreased linearly with longer delays and argued that residual dorsal processing still exists in delayed movements and that there is a gradual change between the dorsal and ventral streams. The current results (i.e., the significant difference in the PGA according to the availability of visual information as opposed to no significant difference in the TGA) suggest that the TGA is generated by the perceptual representation of the visual world, which is immune to the availability of visual information and is mediated by the ventral stream, while the PGA difference reflects an online information extraction mediated by the dorsal stream. Therefore, generating the grip configuration of the delayed pantomimed prehension would involve contributions from both the ventral and dorsal streams, in line with the findings from the delayed pointing task by Himmelbach and Karnath (2005) (see also Franz et al., 2009; Hesse and Franz, 2009; Janczyk et al., 2010 for arguments concerning the "one representation" hypothesis).

As for the related neural basis of this study, an fMRI study by Króliczak et al. (2007) found that pantomimed grasping invokes activation primarily in several areas in the right posterior parietal lobe (e.g., the superior parietal lobe and posterior parts of the intra-parietal sulcus) as well as in some other areas (e.g., the area overlapping both the right medial temporal gyrus and the superior temporal sulcus) in the right hemisphere. Recently, Makuuchi et al. (2012) demonstrated that execution of pantomimed prehension requires the interaction between dorsal and ventral streams. Specifically, they found significant intrinsic connections between the anterior intraparietal sulcus (AIP, dorsal) and posterior inferior temporal gyrus (pITG, ventral), consistent with the anatomical connection between these areas (Borra et al., 2008). These fMRI studies indicate that the dorsal stream is involved in execution of pantomimed action; this supports our current finding for the PGA, which would result from the contribution of the dorsal stream. Rizzolatti and Matelli (2003) proposed that the dorsal visual stream could be functionally subdivided into (i) the dorso-dorsal pathway running from V6 to V6a and a medial intra-parietal region (MIP) in the superior parietal lobule (SPL), functioning in the online control of action; and (ii) the ventro-dorsal pathway running from the medial superior temporal (MST) area to the inferior parietal lobule (IPL), functioning in motor control, action understanding, and space perception (see also Pisella et al., 2006). The open question remaining for further investigation is which dorsal pathway (dorso-dorsal or ventro-dorsal) is dominantly involved in the current PGA result. Specifically, the question is what function (a motor aspect and/or a kind of space perception) is reflected in the PGA result observed in the present study (cf. Neggers et al., 2006; Schenk, 2006; Cavina-Pratesi et al., 2011).

Pantomime research traditionally focuses on tool-use pantomime actions (e.g., Goldenberg, 2009) while only a few studies have investigated reach-to-grasp pantomimed action. Recently, Binkofski and Buxbaum (in press) proposed that the dorso-dorsal system was characterized as the "grasp" system for the purposes of reach-to-grasp actions and that the ventro-dorsal stream was characterized as the "use" system for the specific skilled actions associated with familiar objects. In addition to real prehension studies, further research on reach-to-grasp pantomimed action is essential to clarify the mechanism of the "grasp" system. In summary, the results presented here indicate that pantomimed action is mediated by the coordination of the ventral and dorsal streams. These observations suggest that this action might be a good probe for revealing the mechanism of interaction between the ventral and dorsal streams (cf. Rossetti et al., 2000; Cloutman, in press).

## **REFERENCES**


of the macaque anterior intraparietal (AIP) area. *Cereb. Cortex* 18, 1094–1111.


environment. *Trends Neurosci.* 20, 553–557.


visuomotor mechanisms. I. Different aspects of the deficit in reaching for objects. *Brain* 111, 643–674.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 August 2012; accepted: 20 January 2013; published online: 07 February 2013.*

*Citation: Fukui T and Inui T (2013) How vision affects kinematic properties of pantomimed prehension movements. Front. Psychology 4:44. doi: 10.3389/fpsyg.2013.00044*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Fukui and Inui. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The role of body-related and environmental sources of knowledge in the construction of different conceptual categories

## **Guido Gainotti 1,2\***

<sup>1</sup> Center for Neuropsychological Research, Università Cattolica del Sacro Cuore, Policlinico Universitario Agostino Gemelli, Rome, Italy

<sup>2</sup> Department of Clinical and Behavioral Neurology, Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione Santa Lucia, Rome, Italy

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Roel M. Willems, Donders Institute for Brain, Cognition and Behaviour, Netherlands; Saskia Van Dantzig, Philips Research, Netherlands

#### **\*Correspondence:**

Guido Gainotti, Department of Neurosciences, Center for Neuropsychological Research, Catholic University of Rome, Largo A. Gemelli 8, 00168 Rome, Italy. e-mail: gainotti@rm.unicatt.it

Controversies exist regarding: (a) the relationships between perceptual and conceptual activities and (b) the format and neuro-anatomical substrates of concepts. Some authors maintain that concepts are represented in the brain in a propositional, abstract way, which is totally unrelated to the sensory-motor functions of the brain. Other authors argue that concepts are represented in the same format in which they are constructed by the sensory-motor system and can be considered as activity patterns distributed across different perceptual and motor domains. The present paper examines two groups of investigations that support the second view. Particular attention is given to the role of body movements and somatosensory inputs in the representation of artifacts and, respectively, of visual and other perceptual sources of knowledge in the construction of biological categories. The first group of studies aimed to assess the weight of various kinds of information in the representation of different conceptual categories by asking normal subjects to subjectively evaluate the role of various perceptual, motor, and encyclopedic sources of knowledge in the construction of different semantic categories. The second group of studies investigated the neuro-anatomical correlates of various types of categorical disorders. These last investigations showed that the cortical areas damaged in patients with a disorder selectively affecting a given category have a critical role in processing the information that has contributed most to constructing the affected category. Both lines of research suggest that body movements and somatosensory information have a major role in the representation of actions and artifacts mainly known through manipulations and other actions, whereas visual and other perceptual information has a dominant role in the representation of animals and other living things.

**Keywords: models of conceptual knowledge, category-specific semantic disorders, animals vs. fruits and vegetables, sources of knowledge, anterior temporal lobes, left fronto-parietal lesions**

## **INTRODUCTION**

Our knowledge of the world is mediated by two types of activity: (1) perceptual-motor activity, which allows us to obtain information about external objects through our actions or analysis of the perceptual attributes of environmental stimuli; and (2) conceptual activity, which permits the construction of internal representations of complex categories of knowledge. Two points must be stressed regarding this basic distinction. First, objects we know mainly through actions accomplished by our body only partly overlap with those we principally know through auditory and visual modalities. The former usually belong to the category of artifacts that can be touched, manipulated and used for different purposes, whereas the latter often belong to living categories (such as wild animals) we know in a physical or virtual environment located in far extra-personal space. The second point is that until recently there was an important gap between our knowledge of the mechanisms and neuro-anatomical substrates of perceptual and motor activities and our knowledge of those of conceptual activities. Indeed, we have clear and detailed knowledge about the organization and neuro-anatomical structures subserving actions or different perceptual modalities, but we have only controversial and uncertain models about the format and neuro-anatomical substrates of concepts.

Concerning the format of conceptual representations, many cognitive models (e.g., Pylyshyn, 1973; Fodor, 1975 and, more recently, Humphreys and Riddoch, 1988; Riddoch et al., 1988; Caramazza et al., 1990; Patterson and Hodges, 2000; Coccia et al., 2004) have claimed that perceptual and conceptual activities result from the activity of interrelated but completely independent systems.

According to this view, the hierarchical stages of perceptual analysis proceed up to the level of structural description, which includes a complete three-dimensional specification of the sensory characteristics of objects. But after this level, no trace of the previous sensory-motor mechanisms persists because the format of conceptual representations accessed through these structural descriptions is considered abstract, amodal, and propositional. Nevertheless, subsequent to the pioneering work of Kolers and Brison (1984), Allport (1985), and Jackendoff (1987), an increasing number of authors have refuted this model of an abstract, amodal conceptual/semantic system. These authors maintain that conceptual representations keep the stamp of the perceptual mechanisms through which they were formed and are stored in the same format in which they were constructed. Drawing in part on these cognitive models and in part on Hebb's (1949) model of "cell assemblies," Damasio (1989, 1990) proposed the dynamic "higher-order convergence zone" construct. This construct assumes that concept retrieval results from a process of recollection of modality-specific bits of memories, stored near the sensory portals and motor output sites of the system and triggered by firing in "higher-order convergence zones" (Damasio, 1989, 1990). This construct was further developed by Barsalou (Barsalou, 1999; Barsalou et al., 2003), who added the similarity-in-topography (SIT) principle to Damasio's model. According to this principle, the proximity of two conjunctive neurons in a convergence zone increases with the similarity of the features they conjoin. Consequently, conjunctive neurons become topographically organized into local regions that represent properties and categories.
Both Damasio's and Barsalou's models can be considered eminent examples of "embodied cognitive models," whose central tenet is that semantic knowledge is grounded in sensory-motor systems that are automatically engaged during online conceptual processing, re-enacting modality-specific patterns of activity normally evoked during perception and action.

#### **SUMMARY OF THE MAIN POINTS OF THIS SECTION**

The format of conceptual representations is the object of a strong debate between authors who maintain that concepts are represented in a propositional, abstract manner in the brain and authors who argue that concepts are represented in the same format in which they are constructed by the sensory-motor system. Some models are discussed that specify how sensory-motor information could converge in the construction of a conceptual representation.

## **THE DISCOVERY OF CATEGORY-SPECIFIC SEMANTIC DISORDERS AND THEIR IMPACT ON THE DEBATE BETWEEN SUPPORTERS OF THE ABSTRACT AND THE EMBODIED FORMAT OF CONCEPTUAL REPRESENTATIONS**

One important development in the debate between supporters of the "abstract/amodal" and "embodied/sensory-motor" format of conceptual representations was the discovery that disruptions of conceptual knowledge are not necessarily homogeneous across categories but are sometimes "category-specific" (Warrington and McCarthy, 1983, 1987; Warrington and Shallice, 1984). Category-specific semantic disorders usually affect biological ("living") more than artifact ("non-living") categories, but sometimes preferentially impair artifact ("non-living") categories (see Saffran and Schwartz, 1994; Gainotti et al., 1995; Gainotti, 2000, 2005; Capitani et al., 2003 for reviews). In any case, they have been explained differently by supporters of the abstract and the embodied format of conceptual representations.

In particular, Warrington and co-workers (Warrington, 1975, 1981; Warrington and McCarthy, 1983, 1987; Warrington and Shallice, 1984) claimed that their patients' semantic disorders did not respect the boundaries between living/biological and non-living/artifact entities (e.g., the representation of "body parts" tended to be disrupted together with that of artifact categories, whereas the representation of "musical instruments" tended to be disrupted with that of living items). This suggested that "category-specific semantic disorders" might not be due to the disruption of true "biological" and "artifact" categories, but might be the by-products of a more basic dichotomy concerning the differential weighting of visual-perceptual and functional attributes in the representation of biological and, respectively, artifact categories. This interpretation is obviously at variance with the views of authors who claim that the format of conceptual representations is abstract, amodal, and propositional. In fact, these authors (e.g., Caramazza, 1998; Caramazza and Shelton, 1998; Capitani et al., 2003; Caramazza and Mahon, 2003) proposed a very different theoretical interpretation of category-specific semantic disorders. They hypothesized an "innate" categorical organization of conceptual knowledge in which category-specific impairments for animals, plant life and artifacts are due to the disruption of innate brain networks shaped by natural selection for rapid identification of objects that are very important for survival. This interpretation is more consistent with the model of an abstract, propositional semantic system (because all the above-mentioned categories could be represented in the same abstract format) but fails to explain the joint breakdown of artifacts with body parts (see McCarthy, 1995; Gainotti, 2000, 2004; Hart and Kraut, 2007 for reviews) and musical instruments with living items (see Dixon et al., 2000; Gainotti, 2000; Masullo et al., 2012 for reviews).

The present review examines two different groups of investigation that support the "embodied/sensory-motor" model of conceptual representations, devoting particular attention, on one hand, to the role of body movements and somatosensory inputs and, on the other hand, to that of visual and other perceptual sources of knowledge in the construction of different semantic categories.

The first group includes studies that evaluated the weight of various kinds of information in the representation of different conceptual categories by asking normal subjects to subjectively evaluate the role of various perceptual, motor, and encyclopedic sources of knowledge in constructing different living and artifact categories.

The second group includes studies concerned with the neuro-anatomical correlates of various types of category-specific disturbances, because it is important to check for consistency between the cortical areas damaged in patients with a disorder selectively affecting a given category and the specific functions these areas have in processing information that contributes to the construction of the affected category. To make these rather complex issues easier to follow, **Table 1** gives an overview and clustering of the main findings of the present survey.

#### **SUMMARY OF THE MAIN POINTS OF THIS SECTION**

The discovery of category-specific semantic disorders for living things and artifacts has strongly influenced the debate between supporters of the abstract and the sensory-motor format of conceptual representations. Warrington and co-workers maintained that category-specific semantic disorders were not due to the disruption of true "biological" and "artifact" categories but were the by-product of a more basic dichotomy concerning the different weight of visual-perceptual and functional attributes in the representation of biological and, respectively, artifact categories. Supporters of the abstract models counter-argued that category-specific impairments for animals, plant life and artifacts are not due to the loss of specific clusters of sensory-motor information but reflect an innate categorical organization shaped by natural selection to support rapid identification of objects important for survival. The present review examines two groups of investigations that support the sensory-motor model of conceptual representations. It takes into account, on one hand, studies conducted in normal subjects, to evaluate the weight of various kinds of information in the representation of different conceptual categories and, on the other hand, studies that investigated the neuro-anatomical correlates of various types of category-specific disorders.

#### **Table 1 | Overview of the main findings of the present review.**

## **STUDIES THAT ASSESSED THE WEIGHT OF VARIOUS KINDS OF INFORMATION IN THE REPRESENTATION OF DIFFERENT CONCEPTUAL CATEGORIES**

The view that semantic knowledge is not stored in an amodal, abstract format in the brain, but in the same concrete format in which it was constructed by the sensory-motor system, was prompted by a series of seminal papers by Warrington and co-workers (Warrington, 1975, 1981; Warrington and McCarthy, 1983, 1987; Warrington and Shallice, 1984). These papers suggested (a) that different brain lesions can disrupt different categories of knowledge (e.g., living beings vs. artifacts); and (b) that these "category-specific disorders" are not due to the disruption of an innate categorical brain organization but to disorganization of the sensory-motor mechanisms that primarily contributed to the development of different categories (i.e., the "differential weighting hypothesis"). In particular, Warrington and Shallice (1984) described four patients recovering from herpes simplex encephalitis (HSE) who presented a dissociation between a selective impairment for living things and a relative sparing of artifacts. They believed that the dissociation was due to the major role of visual features in the identification of living things and functional features in the identification of artifacts. The lesions in HSE selectively affect the anterior parts of the temporal lobes, where the ventral stream of visual processing terminates (Ungerleider and Mishkin, 1982; Mishkin et al., 1984; Goodale et al., 1991). Therefore, Warrington and Shallice (1984) proposed that these structures have a critical role in the construction of living categories because they subsume the high-level visual data on which distinctions among members of the "living" categories are based. According to this view, the distinction between a lion, a tiger, and a leopard would depend on a visual sensory feature, namely, the plain, striped, or spotted aspect of their skin; in the case of artifacts, however, identification of a category member would depend on functional attributes (i.e., the subtly different functions man designed them for). This interpretation, which is usually called the "sensory-functional theory" (SFT; Caramazza and Shelton, 1998; Tyler et al., 2000; Capitani et al., 2003; Ventura et al., 2005), prompted studies in various areas of research in brain-damaged patients and normal subjects that reported conflicting results (see Capitani et al., 2003; Gainotti et al., 2009 for reviews). For example, the disproportionate impairment of visual (rather than functional) attributes predicted by the SFT in patients with a category-specific semantic impairment for living things was confirmed in some patients (e.g., Sartori and Job, 1988; De Renzi and Lucchelli, 1994; Gainotti and Silveri, 1996; Rosazza et al., 2003) but not in others (see Capitani et al., 2003 for a survey). Moreover, the assumption of differential weighting of sensory and functional information in the representation of knowledge about living things and artifacts has not been systematically confirmed by studies conducted in normal subjects, using various experimental procedures, which will be described later.

These conflicting results are probably due to the inappropriateness of the expression "SFT" to account for the "differential weighting" hypothesis, because both "functional" and "sensory" features include very heterogeneous components. Based on the suggestion of Warrington and McCarthy (1987), Buxbaum et al. (2000), Buxbaum and Saffran (2002), and Boronat et al. (2005) distinguished, within functional knowledge, the function of an object from its manipulation. They suggested that, because "manipulation" is related to a sensory-motor activity, it might be the component most tightly linked to the "differential weighting hypothesis". The above-cited studies also showed that the properties subsumed by the term "sensory" are heterogeneous, because different types of sensory data might have different weights in the construction of different semantic categories. Thus, visual perception could have a leading role in the mental representation of animals and somatosensory data in that of tools. These facts prompted various authors (e.g., Gainotti, 1990, 2006; Saffran and Schwartz, 1994; Gainotti et al., 1995; Chao et al., 1999; Chao and Martin, 2000; Martin et al., 2000; Martin and Chao, 2001; Martin, 2007) to replace the expression "SFT" with the more specific "sensory-motor model of conceptual knowledge" (SMCK), which, in keeping with Barsalou et al.'s (2003) "embodied cognition theory," assumes that various perceptual, motor and encyclopedic sources of knowledge have different weights in the construction of different living and artifact categories.

The assumption that various kinds of sensory information may have different weights in the representation of different categories of knowledge has been confirmed by studies conducted in normal subjects (following the principles of the SFT and the SMCK) to evaluate their mental representations of the sources of knowledge in these categories. Farah and McClelland (1991) and Caramazza and Shelton (1998) were the first authors who tried to assess the weight of various kinds of information in the representation of different conceptual categories in normal subjects. Following the principles of the SFT, they asked participants to underline either visual or functional descriptors in dictionary definitions of living things or artifacts. The results of these studies were conflicting. Farah and McClelland (1991) found a much larger ratio of visual to functional attributes for living things than for artifacts, whereas Caramazza and Shelton (1998) only found a non-significant difference between these two domains of knowledge. This discrepancy emerged because in the former study a property was considered "functional" only if it described "what the item did or what it was for," whereas in the latter all "non-sensorial" (i.e., functional, encyclopedic, etc.) descriptors were contrasted with sensory properties. Analogous inconsistencies emerged in studies conducted by Devlin et al. (1998), Tyler et al. (2000), Garrard et al. (2001), McRae and Cree (2002), Vanoverberghe and Storms (2003), Ventura et al. (2005), and Zannino et al. (2006), when feature generation or feature verification tasks were used to check the assumption of differential weighting of sensory and functional information in the representation of knowledge about living things and artifacts.
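How the choice of coding scheme can change the visual-to-functional ratio may be easier to see in a small sketch. The descriptor labels and counts below are invented for illustration and are not data from either study; the function name `visual_to_functional_ratio` is hypothetical, and the sketch simplifies matters by treating "visual" as the only sensory label.

```python
from collections import Counter

# Hypothetical labels for descriptors underlined in dictionary definitions.
# Invented for illustration only; these are not data from either study.
LIVING = ["visual"] * 4 + ["functional"] * 1 + ["encyclopedic"] * 3
ARTIFACT = ["visual"] * 2 + ["functional"] * 2 + ["encyclopedic"] * 2


def visual_to_functional_ratio(labels, broad_functional):
    """Return the number of visual descriptors divided by 'functional' ones.

    broad_functional=False: only labels marked "functional" (what the item
    does or what it is for) count as functional.
    broad_functional=True: every non-visual label counts as functional.
    """
    counts = Counter(labels)
    visual = counts["visual"]
    if broad_functional:
        functional = sum(n for label, n in counts.items() if label != "visual")
    else:
        functional = counts["functional"]
    if functional == 0:
        raise ValueError("no functional descriptors to compare against")
    return visual / functional


for name, labels in (("living", LIVING), ("artifact", ARTIFACT)):
    narrow = visual_to_functional_ratio(labels, broad_functional=False)
    broad = visual_to_functional_ratio(labels, broad_functional=True)
    print(f"{name}: narrow={narrow:.2f} broad={broad:.2f}")
```

With these invented counts, the narrow definition gives ratios of 4.0 (living) and 1.0 (artifact), whereas the broad definition gives 1.0 and 0.5, so the same set of descriptors yields a much smaller difference between domains once all non-sensory properties are pooled.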

Tranel et al. (1997b), Vigliocco et al. (2004), McRae et al. (2005), Gainotti et al. (2009), Hoffman and Lambon Ralph (under review), and Gainotti et al. (2012) obtained more consistent results using different procedures to test the principles of the "SMCK." Tranel et al. (1997b) asked normal subjects who had been shown slides of entities from different conceptual categories to rate the extent to which a number of factors, including manipulability and various sensory modalities, had been part of their experience with the corresponding objects. Vigliocco et al. (2004) and McRae et al. (2005) gathered data on conceptual feature representations from the conceptual domains of objects and actions, by asking undergraduate students to list the features of the things the stimulus words referred to. They distinguished (in the object field) several categories of living things and artifacts and classified the features in five categories: visual, other perceptual, functional, action-related, and other (including superordinate and encyclopedic). Gainotti et al. (2009, 2012) and Hoffman and Lambon Ralph (under review) used a procedure that was more directly and specifically derived from the "SMCK." This procedure consisted of asking normal subjects to use Likert scales to evaluate the influence of different perceptual (visual, auditory, tactual, olfactory, and gustative) and motor activities, as well as encyclopedic information, in the mental representation of living and artifact categories.
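A rating procedure of this kind yields, for each concept, one score per source of knowledge; averaging over the concepts of a category then gives a profile for that category. The following minimal sketch shows this aggregation step only. The 0 to 6 scale, the concepts, and every score are invented assumptions rather than values from the cited studies, and the helper names `category_profiles` and `ranked_sources` are hypothetical.

```python
from statistics import mean

# Sources of knowledge rated by participants (names follow the text above).
SOURCES = ("visual", "auditory", "tactual", "olfactory",
           "gustative", "motor", "encyclopedic")

# concept -> (category, ratings in SOURCES order). All values are invented.
RATINGS = {
    "lion": ("animals", (6, 4, 0, 1, 0, 0, 5)),
    "dog": ("animals", (6, 5, 3, 2, 0, 1, 3)),
    "apple": ("fruits and vegetables", (6, 0, 3, 4, 5, 3, 2)),
    "carrot": ("fruits and vegetables", (6, 0, 3, 2, 5, 4, 1)),
    "hammer": ("artifacts", (6, 2, 5, 0, 0, 5, 1)),
    "scissors": ("artifacts", (6, 1, 4, 0, 0, 6, 1)),
}


def category_profiles(ratings):
    """Average the ratings of all concepts in each category, source by source."""
    grouped = {}
    for category, scores in ratings.values():
        if len(scores) != len(SOURCES):
            raise ValueError("each concept needs one rating per source")
        grouped.setdefault(category, []).append(scores)
    return {
        category: {source: mean(column)
                   for source, column in zip(SOURCES, zip(*rows))}
        for category, rows in grouped.items()
    }


def ranked_sources(profile):
    """Return source names ordered from highest to lowest mean rating."""
    return sorted(profile, key=profile.get, reverse=True)


profiles = category_profiles(RATINGS)
for category, profile in profiles.items():
    print(category, ranked_sources(profile)[:3])
```

Comparing the ranked profiles across categories is one simple way to ask which sources of knowledge follow vision in importance for each category.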

The same results were consistently found in all of these investigations using both feature-listing tasks and Likert scales to evaluate the weight of different "sources of knowledge."

First, visual information was consistently evaluated with both methodologies as the dominant type of sensory or motor feature by averaging results obtained across all concepts and comparing the scores for each modality (Tranel et al., 1997b; Cree and McRae, 2003; Vigliocco et al., 2004; McRae et al., 2005; Gainotti et al., 2009; Hoffman and Lambon Ralph, under review). The major importance attributed to vision in the mental representation of all kinds of concrete entities is not surprising if we consider that most of our knowledge of the world is obtained through this perceptual modality.

Second, when hierarchical cluster analyses were used in feature-listing studies (e.g., Cree and McRae, 2003; Vigliocco et al., 2004) or in studies based on a separate rating of the various sources of knowledge (e.g., Gainotti et al., 2009, 2012; Hoffman and Lambon Ralph, under review), a tripartite organization of knowledge (with three major clusters corresponding to animals, fruits and vegetables, and artifacts) was found.

Third, the distinction between living things and artifacts, on one hand, and "animals" and "plant life" (within the "living" categories), on the other hand, was confirmed by a more detailed analysis of the next most relevant sources of information after vision. In fact, the next most relevant sources of information consisted of other perceptual data (and encyclopedic information) for the living categories but of body-related features (actions and somatosensory data) for the artifact categories. Furthermore, within the "living" categories the next most relevant sources of information included encyclopedic knowledge and auditory perceptions (i.e., typical sounds) in animals, whereas they consisted of olfactory and gustatory perceptions and actions (e.g., peeling, cutting, and stirring) in fruits and vegetables.

Taken together, these data suggest that the greatest difference between living and artifact categories lies in the interaction between visual data and other perceptual (auditory, olfactory, gustatory, and tactual) attributes in the case of living things, and between visual data, action-related properties, and somatosensory information in the case of artifacts. The greatest difference between living and artifact categories therefore does not lie in the prominent role played by vision in the representation of animals, fruits, and vegetables, and by functional features in the representation of artifacts.
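The hierarchical cluster analyses mentioned above can be sketched as a bottom-up merging of categories with similar rating profiles. The example below is only an illustration under stated assumptions: the profiles are invented numbers (in the order visual, other perceptual, action-related, somatosensory), and Euclidean distance with average linkage is an assumed choice, since the linkage criteria used in the cited studies are not described here. The function names are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical mean ratings per category, in the order
# (visual, other perceptual, action-related, somatosensory). Invented values.
PROFILES = {
    "mammals": (6.0, 4.0, 0.5, 1.0),
    "birds": (6.0, 4.5, 0.5, 0.5),
    "fruits": (6.0, 5.0, 3.0, 2.5),
    "vegetables": (6.0, 4.5, 3.5, 2.5),
    "tools": (6.0, 1.0, 5.5, 5.0),
    "furniture": (6.0, 0.5, 4.0, 4.5),
}


def average_linkage(a, b, profiles):
    """Mean Euclidean distance over all pairs with one member in each cluster."""
    total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters = [frozenset([name]) for name in profiles]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


for cluster in agglomerate(PROFILES, n_clusters=3):
    print(sorted(cluster))
```

Because visual ratings are equal across these invented profiles, the grouping is driven entirely by the remaining sources of knowledge, which mirrors the argument that the categories differ in what accompanies vision rather than in the weight of vision itself.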

#### **SUMMARY OF THE MAIN POINTS OF THIS SECTION**

Warrington and co-workers' "SFT" has not been systematically confirmed by studies conducted in normal subjects using various experimental procedures. Conflicting results may be due to the inappropriateness of the expression "SFT," because both "functional" and "sensory" features include very heterogeneous components. Indeed, the expression has been replaced with the more specific "SMCK," which assumes that various perceptual, motor and encyclopedic sources of knowledge have different weights in the construction of different living and artifact categories. The usefulness of this new model has been confirmed in studies performed to assess the weight of various kinds of information in the representation of different conceptual categories by asking normal subjects to subjectively evaluate the role of various sources of knowledge in the construction of different semantic categories. These studies have consistently shown: (a) that visual information is evaluated as the dominant feature in both living and non-living categories; (b) that the next most relevant sources of information are other perceptual data for the biological categories and body-related features (actions and somatosensory data) for the artifact categories.

## **INVESTIGATIONS CONCERNING THE NEURO-ANATOMICAL CORRELATES OF VARIOUS TYPES OF CATEGORICAL DISORDERS**

From the neuro-anatomical point of view, data obtained by studying evaluations of the weight of various kinds of information in the representation of different conceptual categories suggest that brain structures with a critical role in the representation of living and artifact categories might have a well-defined cortical localization. Thus, the anterior parts of the temporal lobes (where the ventral stream of visual processing converges with auditory, olfactory, and gustatory inputs) should have a critical role in the representation of biological entities. On the other hand, the fronto-parietal, sensorimotor cortices (where the dorsal stream of visual processing converges with body-related and action-oriented structures) should have a major role in the representation of artifacts. Furthermore, subjective evaluations of the weight of various kinds of information in the representation of different conceptual categories suggest there is a different degree of lateralization in the brain's representation of animals, fruits and vegetables, and artifacts. The major sources of knowledge about animals (i.e., visual and auditory inputs) should, indeed, be bilaterally represented, whereas the action-oriented structures, which provide an important source of knowledge about artifacts (and to a lesser extent about fruits and vegetables), should be mainly represented in the left hemisphere, which controls the movements of the right side of the body.

Both of these predictions have been confirmed by a number of anatomo-clinical and neuroimaging studies.

Concerning the critical role played by lesions of the anterior parts of the temporal lobes in semantic disorders for biological entities, several reviews of the anatomical correlates of category-specific semantic disorders (e.g., Saffran and Schwartz, 1994; Gainotti et al., 1995; Damasio et al., 1996; Tranel et al., 1997a; Gainotti, 2000, 2005; Capitani et al., 2003) have shown that brain structures located in the terminal parts of the ventral stream of visual processing (such as the IT cortices) or responsible for integrating highly processed visual data with other sensory modalities (such as the perirhinal and entorhinal cortices) are usually disrupted in patients with category-specific semantic disorders for living things. For example, Gainotti (2000) made a detailed and systematic review of all available anatomo-clinical reports of patients who presented a category-specific semantic disorder for living things and artifacts and found bilateral injury to the antero-mesial and inferior parts of the temporal lobes (temporal pole, IT cortex, parahippocampal, perirhinal, and entorhinal cortices) in almost all patients with a category-specific semantic impairment for living things. Strauss et al. (2000) and Luckhurst and Lloyd-Jones (2001) also reported similar data, because they showed that temporal lobectomy patients were disproportionately more impaired in naming living than non-living things.

Data supporting this model were also reported by Grabowski et al. (2001), Devlin et al. (2002), Tyler et al. (2004), Moss et al. (2005), and Bright et al. (2005) in a series of neuroimaging studies. These authors showed that the human perirhinal cortex and neighboring anterior temporal structures provide the neural infrastructure for living categories.

For example, Devlin et al. (2002) entered data from seven PET studies into a single multifactorial design that crossed category (living vs. man-made) with a range of tasks and found that living things activated medial aspects of the anterior temporal poles bilaterally and tools activated a left posterior middle temporal region. And Bright et al. (2005) reviewed recent neuropsychological and neuroimaging studies which showed that the human perirhinal cortex and contiguous anteromedial temporal structures provide the neural infrastructure for making fine-grained discriminations among objects, suggesting that damage in the perirhinal cortex may underlie the emergence of category-specific semantic deficits for living things.

Regarding artifacts, lesions of a network involving the dorso-lateral part of the left frontal lobe, the left inferior parietal lobe and the left middle temporal gyrus, where different components of action schemata are represented (see Saygin et al., 2004), provoke a prevalent impairment for tools and other man-made artifacts, whose knowledge is mainly based on active manipulation and physical contact with objects. This claim is supported by the results of Gainotti's (2000) above-mentioned systematic review, which showed that an extensive lesion in areas located in the dorso-lateral convexity of the left hemisphere was present in all patients with a semantic impairment selectively affecting artifact categories, and by other more recent reviews (e.g., Capitani et al., 2003; Kellenbach et al., 2003; Gainotti, 2005; Buxbaum and Kalénine, 2010; Campanella et al., 2010).

The systematic restriction of brain lesions to the left hemisphere in patients with a category-specific disorder for artifacts was confirmed in activation studies, conducted by Chao and Martin (2000), Gerlach et al. (2002), Kellenbach et al. (2003), and Boronat et al. (2005), and in experiments of direct electrical cortical stimulation, conducted by Ilmberger et al. (2002). For example, Chao and Martin (2000) found that viewing and naming pictures of tools selectively activated the left ventral premotor cortex. Boronat et al. (2005) obtained similar results when participants viewed pairs of pictures or words denoting manipulable objects and had to determine whether the objects were manipulated the same way (M condition) or served the same function (F condition). Significantly greater and more extensive activations in the left inferior parietal lobe occurred in the M than the F condition. Finally, Ilmberger et al. (2002) used tool and animal items to test the naming capabilities of epilepsy patients with subdural electrodes implanted for localization of the epileptogenic zone and preoperative mapping of cognitive functions. Results showed that during stimulation of the left hemisphere naming disorders were more pronounced for tool items than animal items.

The neuro-anatomical correlates of category-specific disorders for fruits and vegetables show some features typical of animals (i.e., importance of the anterior and mesial parts of the temporal lobes) and other features typical of artifacts (i.e., left lateralization). These findings, which were recently discussed by Capitani et al. (2009), Gainotti (2010, 2011), and Capitani and Laiacona (2011), are in keeping with the results of investigations that subjectively evaluated the weight of various kinds of information in the representation of different conceptual categories. In fact, in fruits and vegetables (as in animals) the most relevant sources of information (after vision) are other perceptual data, whereas in all artifact categories they consist of body-related actions and somatosensory data. This explains the critical role of the anterior and mesial parts of the temporal lobes in the representation of all living categories. On the other hand, in the representation of fruits and vegetables (as in those of artifacts, but not animals), specific actions, such as peeling, cutting, and stirring, play an important part, which may account for the shared left lateralization of both artifacts and fruits and vegetables.

#### **SUMMARY OF THE MAIN POINTS OF THIS SECTION**

Research on the neuro-anatomical correlates of various types of categorical disorders has shown that the cortical areas damaged in patients with a disorder selectively affecting a given category have a critical role in processing the information that primarily contributed to constructing the affected category. Thus, in patients with a category-specific semantic disorder for biological entities, lesions bilaterally affect the anterior parts of the temporal lobes (where the ventral stream of visual processing converges with auditory, olfactory, and gustatory inputs); and in patients with a preferential impairment of the artifact categories, lesions usually affect the left-sided fronto-parietal, sensory-motor cortices (where the dorsal stream of visual processing converges with body-related and action-oriented structures). Taken together, both lines of research suggest that body movements and somatosensory information have a major role in the representation of artifacts (mainly known through their manipulation), whereas visual and other perceptual information has a dominant role in the representation of animals and other living things.

## **CONCLUDING REMARKS**

The scope of the present review was ambitious; indeed, it aimed to clarify the nature and format (abstract or sensory-motor) of our conceptual representations. Both the psychological and the anatomo-clinical data summarized in this survey seem to support the sensory-motor (embodied) theory, because they show: (a) that different perceptual and action-related features contribute to the construction of different conceptual categories; (b) that psychological and anatomical data are consistent, because the cortical areas affected in patients with category-specific semantic disorders and activated during tasks involving the same categories play a critical role in processing information that contributed to the construction of the affected category.

These results indicate: (a) that the distinction between biological and artifact categories is not a primary one for the brain and is not due to an "innate" categorical organization of conceptual knowledge, as maintained by Caramazza and co-workers (Caramazza, 1998; Caramazza and Shelton, 1998; Caramazza and Mahon, 2003); (b) that simple dichotomies, such as the "living"/"non-living" distinction or the "SFT," cannot explain the complexity of factors underlying the brain's representation of different categories. On the contrary, the assumption that body-related and environmental sources of knowledge experienced through diverse sensory modalities play a different role in the construction of different conceptual categories is consistent with the subjective evaluation of normal subjects and the main functions of cortical areas that have a critical role in the representation of these categories. Nevertheless, the complexity of the experiential factors and brain structures underlying the brain's representation of different categories suggests that further investigations are necessary to clarify the advantages and possible limitations of this assumption.

## **REFERENCES**


deficits: what do they reveal about the organization of conceptual knowledge in the brain? *Neurocase* 4, 265–272.


lesion in a patient with a category-specific semantic impairment for living beings. *Cogn. Neuropsychol.* 13, 357–389.


their mental representation. *J. Verb. Learn. Verb. Behav.* 23, 105–113.


Weinberger (New York: The Guilford Press), 65–77.


and category-specific naming. *Brain Cogn.* 43, 403–406.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2012; accepted: 01 October 2012; published online: 29 October 2012.*

*Citation: Gainotti G (2012) The role of body-related and environmental sources of knowledge in the construction of different conceptual categories. Front. Psychology 3:430. doi: 10.3389/fpsyg.2012.00430 This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Gainotti. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## How task goals mediate the interplay between perception and action

## **Pascal Haazebroek <sup>1</sup>\*, Saskia van Dantzig<sup>2</sup> and Bernhard Hommel <sup>1</sup>**

<sup>1</sup> Cognitive Psychology, Institute of Psychology, Leiden University, Leiden, Netherlands

<sup>2</sup> Department of Brain, Body and Behavior, Philips Research, Eindhoven, Netherlands

#### **Edited by:**

Dermot Lynott, The University of Manchester, UK

#### **Reviewed by:**

Daniel Lakens, Eindhoven University of Technology, Netherlands

Motonori Yamaguchi, Vanderbilt University, USA

#### **\*Correspondence:**

Pascal Haazebroek, Cognitive Psychology, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2333AK Leiden, Netherlands. e-mail: phaazebroek@fsw.leidenuniv.nl

Theories of embodied cognition suppose that perception, action, and cognition are tightly intertwined and share common representations and processes. Indeed, numerous empirical studies demonstrate interaction between stimulus perception, response planning, and response execution. In this paper, we present an experiment and a connectionist model that show how the Simon effect, a canonical example of perception–action congruency, can be moderated by the (cognitive representation of the) task instruction. To date, no representational account of this influence exists. In the experiment, a two-dimensional Simon task was used, with critical stimuli being colored arrows pointing in one of four directions (backward, forward, left, or right). Participants stood on a Wii balance board, oriented diagonally toward the screen displaying the stimuli. They were either instructed to imagine standing on a snowboard or on a pair of skis and to respond to the stimulus color by leaning toward either the left or right foot. We expected that participants in the snowboard condition would encode these movements as forward or backward, resulting in a Simon effect on this dimension. This was confirmed by the results. The left–right congruency effect was larger in the ski condition, whereas the forward–backward congruency effect appeared only in the snowboard condition. The results can be readily accounted for by HiTEC, a connectionist model that aims at capturing the interaction between perception and action at the level of representations, and the way this interaction is mediated by cognitive control. Together, the empirical work and the connectionist model contribute to a better understanding of the complex interaction between perception, cognition, and action.

**Keywords: stimulus–response congruency, task set, perception–action interaction, Wii balance board, connectionist modeling, Simon effect, top-down modulation**

## **INTRODUCTION**

Theories of embodied cognition (e.g., Glenberg, 1997; Barsalou, 1999; Wilson, 2002) suggest that cognition, perception, and action are tightly intertwined and share common representations and processes. In the last decade, this view has been studied extensively, and much evidence in its favor has been accumulated. Many studies have demonstrated that cognition interacts with perception and action, suggesting that these systems share the same representations and processes (e.g., Pecher and Zwaan, 2005). In this study we particularly focus on how cognition can modulate the interaction between perception and action by assessing the role of task instruction on automatic processes in stimulus–response translation. This interaction is demonstrated in an empirical study and further explained by simulations using a connectionist model (HiTEC, Haazebroek et al., submitted). We first describe bilateral interactions between perception, cognition, and action and subsequently focus on the influence of task context on the interaction between perception and action.

## **INTERACTIONS BETWEEN PERCEPTION, COGNITION, AND ACTION**

The interaction between perception and cognition can be demonstrated by so-called spatial congruency effects. Several studies have found interactions between the meaning of words and the spatial position of those words on the computer screen. For example, people respond faster to a word such as *helicopter* or *stork* when it is presented at the top of the computer screen than when it is presented at the bottom of the screen (Šetic and Domijan, 2007). Other studies showed that the spatial meaning of a word may attract attention to a particular location on the screen (e.g., Estes et al., 2008; Zanolie et al., 2012). Spatial congruency effects are also found with words referring to abstract concepts that are metaphorically connected to spatial locations, such as power (Schubert, 2005; Zanolie et al., 2012), valence (Meier and Robinson, 2004), divinity (Meier et al., 2007), or magnitude (Fischer et al., 2003; Pecher and Boot, 2011), but see Lakens (2012) for an alternative explanation, based on polarity correspondence. Furthermore, studies have shown that perceiving motion in a particular direction interacts with the processing of sentences or words describing motion in the same direction (e.g., Kaschak et al., 2005; Meteyard et al., 2007, 2008).

Likewise, spatial congruency effects also occur in the interaction between cognition and action. For example, participants are faster to respond to a sentence when the direction of the response matches the direction of the action described in the sentence. This so-called *action compatibility effect* (see Zwaan and Yaxley, 2003; Zwaan et al., 2012) has been found with different kinds of movement, such as moving the hand toward or away from the body (Glenberg and Kaschak, 2002) and rotating the hand (Zwaan and Taylor, 2006). These results are taken as evidence that the representations underlying conceptual processing partially overlap with the representations underlying the preparation and execution of action.

Finally, spatial congruency effects occur in the interaction between perception and action. Much research has been devoted to stimulus–response congruency (SRC) effects, the canonical example being the Simon effect (Simon and Rudell, 1967; Hommel, 2011). In the typical Simon task, stimuli vary on a spatial dimension (e.g., randomly appearing on the left or right) and on a non-spatial dimension (e.g., having different colors). Participants have to respond to the non-spatial stimulus feature by performing a spatially defined response (e.g., pressing a left or right key). Although the location of the stimulus is irrelevant for the response choice, it nevertheless influences the response time and accuracy, suggesting interaction between stimulus perception and response planning. Participants respond faster (and more accurately) when the stimulus location is congruent with the response location than when the stimulus location is incongruent with the response location. The Simon effect has been replicated numerous times and has been used frequently as a methodological tool to investigate perception, action, and cognitive control (for an overview, see Hommel, 2011).

## **INFLUENCE OF COGNITIVE CONTROL ON THE INTERPLAY BETWEEN PERCEPTION AND ACTION**

To account for SRC effects, traditional cognitive theories and computational models of stimulus–response translation typically assume that: (1) responses are represented by spatial codes (e.g., Wallace, 1971), (2) attending to a stimulus automatically produces a spatial stimulus code, and (3) the outcome of a comparison between the spatial stimulus code and the spatial response code produces the compatibility effect. Crucially, this comparison is assumed to occur automatically and to arise from the fact that stimuli and responses are similar (e.g., have dimensional overlap, Kornblum et al., 1990, 1999; but see Proctor and Lu, 1999; Tagliabue et al., 2000 for accounts based on over-learning). Indeed, in typical computational models of SRC effects, such as the Simon effect, stimuli are represented in terms of non-spatial task-relevant codes (e.g., "red shape" and "blue shape") and spatial task-irrelevant codes (e.g., "left shape" and "right shape"), and responses are also represented in terms of spatial codes (e.g., "left key" and "right key"). Stimulus codes and response codes are connected using two routes (e.g., Kornblum et al., 1990; De Jong et al., 1994; Zorzi and Umiltà, 1995). A direct, hard-wired route connects the spatial stimulus codes to the corresponding spatial response codes, which is assumed to reflect the automatic process. The task instruction (e.g., "*when you see a red shape, press the left key*") is implemented as a soft-wired connection from the non-spatial stimulus code (e.g., "red shape") to a spatial response code (e.g., "left key"). This is assumed to reflect the controlled process. Now, when a compatible stimulus is presented (e.g., a red shape presented on the left), both the hard-wired spatial connections and the soft-wired task instruction-based connections contribute to a speedy activation of the correct response code. Conversely, when an incompatible stimulus is presented (e.g., a red shape presented on the right), the direct route activates the incorrect response. The controlled route, however, activates the response determined by the task instruction, which eventually wins the competition. As a result, processing incompatible stimuli results in longer reaction times than processing compatible stimuli. In sum, the stimulus–response congruency effect arises from the interplay between the direct route, reflecting automatic comparison between spatial stimulus and response codes, and the controlled route, reflecting the task instructions.
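The dual-route logic described above can be illustrated with a deliberately simple toy simulation. This is a sketch under stated assumptions, not an implementation of any of the cited models: the function name `simulate_trial`, the weights, the inhibition term, and the response threshold are all hypothetical values chosen only to make the qualitative pattern visible.

```python
def simulate_trial(color, location, mapping,
                   w_control=0.10, w_direct=0.04,
                   inhibition=0.05, threshold=1.0, max_cycles=1000):
    """Toy dual-route race between a 'left' and a 'right' response code.

    All parameter values are illustrative assumptions.
    color    -- task-relevant stimulus feature, e.g. "red"
    location -- task-irrelevant stimulus location, "left" or "right"
    mapping  -- instructed color-to-response mapping (controlled route)
    Returns (selected response, number of cycles needed).
    """
    act = {"left": 0.0, "right": 0.0}
    for cycle in range(1, max_cycles + 1):
        inp = {"left": 0.0, "right": 0.0}
        inp[mapping[color]] += w_control   # controlled (instruction-based) route
        inp[location] += w_direct          # direct (automatic, spatial) route
        new_act = {}
        for resp in act:
            other = "right" if resp == "left" else "left"
            # accumulate input and subtract inhibition from the competitor
            new_act[resp] = max(0.0, act[resp] + inp[resp] - inhibition * act[other])
        act = new_act
        for resp, value in act.items():
            if value >= threshold:
                return resp, cycle
    return None, max_cycles


mapping = {"red": "left", "blue": "right"}
compatible = simulate_trial("red", "left", mapping)     # red shape on the left
incompatible = simulate_trial("red", "right", mapping)  # red shape on the right
```

With these assumed parameters, both trials end in the instructed "left" response, but the incompatible trial needs more cycles because the direct route initially feeds the competing response code.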

However, the various spatial congruency effects mentioned in Section "Interactions Between Perception, Cognition, and Action" also suggest an interaction between cognition and perception and between cognition and action. Hence, it is to be expected that the (cognitive) task set may influence the automatic translation from spatial stimulus codes to spatial response codes. Indeed, various studies have demonstrated that SRC effects are strongly influenced by the task. For instance, Riggio et al. (1986) reported that when participants responded with sticks that were either parallel or crossed, the Simon effect was found to relate to the stick end position, not to the hands holding the sticks. In a study by Guiard (1983), participants had to respond with a steering wheel. Their results suggest that not the position of the hands but the steering direction (as in a car) determines the Simon effect, indicating an even more abstract notion of left or right responses. It is this task- and intention-dependent left-ness or right-ness, rather than the actual physical location of a response, that seems to interact with the spatial location of the stimulus and thereby yields the Simon effect – an argument that can also be made for other stimulus–response effects (Hommel, 2000).

In a study by Hommel (1993), the role of task instruction was assessed empirically. Hommel had participants responding with left and right keypresses to the high vs. low pitch of tones, respectively. As usual in a Simon task, the tones randomly appeared on the left or right side. Importantly, when a key was pressed a light flashed on the opposite side of the keypress, which allowed instructing participants in two different ways: one group of participants was instructed to "*press the left/right key*" in response to the pitch of the tone, whereas another group was instructed to "*flash the right/left light*." Given the wiring of lights to response keys, all participants carried out exactly the same movements in response to the same stimuli, but they did so for different reasons: one group in order to press the keys and the other in order to flash the lights. Whereas the Key group showed a standard Simon effect with faster responses when the tone location and key location corresponded, the Light group showed the opposite effect: faster responses when the tone location and light location corresponded. The fact that the irrelevant stimulus locations had an effect at all suggests that stimulus locations were processed and cognitively coded, and that they interacted with spatial response codes. However, the observation that the impact of this interaction on behavior was determined by the instruction and, thus, by the goal representation this instruction must have established, suggests that the interplay between perception and action is controlled by task goals.

Addressing the role of task goals in SRC, Ansorge and Wühr (2001) formulated the response-discrimination hypothesis, which states that response representations are not automatically formed, but rather top-down controlled. Only spatial features that discriminate between alternative responses are represented and thus give rise to a Simon effect. This resonates with the conclusions of a general review by Proctor and Vu (2006) that the Simon effect does not result from an automatic activation of a corresponding response by means of a hard-wired (e.g., Kornblum et al., 1990) or over-learned (e.g., Umilta and Zorzi, 1997) route; rather, the task defines S–R associations that mediate this responding.

## **HiTEC**

Although it is clear that task context influences SRC, and several hypotheses have been suggested, an overarching framework that connects the different findings and explains computationally how perception, action, and cognition interact *in terms of neurally plausible representations and processes* is still lacking. The development of computational models is mentioned as one of the main challenges for the field of embodied and grounded cognition (Barsalou, 2008, 2010; Borghi and Pecher, 2011; Pezzulo et al., 2011).

To address this challenge, we developed HiTEC, a connectionist computational cognitive model that aims at capturing the interaction between perception and action in terms of neurally plausible representations and processes, and the way this interaction is mediated by cognitive control (Haazebroek et al., 2011, submitted). HiTEC is meant to be a connectionist model that is plausible in terms of neural processing properties and global cortical connectivity. HiTEC enables simulation of human perception and action control, based on the principles and assumptions of the Theory of Event Coding (TEC; Hommel et al., 2001).

The Theory of Event Coding is a general theoretical framework that addresses how perceived events (i.e., stimuli) and produced events (i.e., actions) are cognitively represented and how their representations interact to generate perceptions and action plans. According to TEC, stimuli and actions are represented in a common representational format, using *the same* feature codes. These codes refer to the distal features of objects and events in the environment, such as shape, size, distance, and location, rather than the proximal features that are registered by the senses. For example, a stimulus presented on the left and an action performed on the left both activate the same distal code representing "left." It is theorized (Hommel et al., 2001) that feature codes emerge from regularities in sensorimotor experience and that they can also be activated conceptually (e.g., by means of verbal labels, Hommel and Elsner, 2009). When a stimulus (or action–effect) is registered, it is represented by *sensory codes* that in turn activate associated distal *feature codes*.

The Theory of Event Coding stresses that perception and action are flexible; that is, they are tuned to the current context and are subject to cognitive control (Hommel et al., 2001). Codes are "intentionally weighted"; the strength of their activation depends on the task context (Memelink and Hommel, 2012). Feature dimensions that are relevant for the task at hand are weighted more strongly than irrelevant dimensions. For example, if the task is to grasp an object, feature dimensions that are relevant for grasping (such as shape, size, location, and orientation) will be enhanced, so that object features on these dimensions have more influence on processing than feature dimensions that are irrelevant for grasping (e.g., color or sound; Fagioli et al., 2007).

Importantly for the present study, intentional weighting can also affect the coding of response representations. For the study by Hommel (1993), it can be argued that the task set results in stronger weighting of key vs. light location, depending on the instruction. One could ask, however, whether this implies weighting of feature dimensions. Indeed, on closer examination, both the key and the light location are represented by the *same* spatial feature dimension (i.e., left–right). Therefore one could argue that it is not feature dimensions but rather the respective *sensory dimensions* that are selectively enhanced by top-down task influences. In other words, the task instruction determines whether a participant attends to either the (visual) light locations or the (haptic) key locations. Subsequently, the attended locations get encoded on the single spatial left–right feature dimension. The fact that this same left–right feature dimension is also used to encode the stimulus location forms the basis of the observed SRC effects.

## **AIM OF THE CURRENT STUDY**

In line with the above interpretation of the results by Hommel (1993), Memelink and Hommel (2005) demonstrated that mere task instruction may *not* be sufficient to affect action coding if the manipulation does not change the task *goal*. The question then arises: what constitutes a task goal? Does one need to attend to different objects in the environment to selectively enhance sensory coding? Or does the intentional weighting principle apply to more abstract feature codes as well? In the present study we assess the influence of task instruction on automatic processes in stimulus–to–response translation at the feature level.

Since our overall goal is an overarching framework of the interaction between perception and action and cognitive control, the aim of the present study was twofold. First, we were interested to see whether task instruction can change how participants encode a particular movement at the feature level. And, second, we were interested to see whether the outcomes can be accounted for by means of a HiTEC simulation of the task – which could clarify computationally how task instruction modulates the interplay between perception and action.

In the design of the task there are two important criteria to take into account: (1) the experimental set up needs to employ a *single object* and a *single sensory dimension* which can be encoded in *two different feature dimensions*, based on the task instruction. In this way, we can rule out the role of purely object based attention; (2) the experimental set up needs to use a task in which two different interpretations of the same ambiguous movement are – to a certain extent and in the eyes of the participant – equally intuitive and applicable to the observed (sensory) effects of the physical movements. Otherwise, if participants can easily recode the variations in these dimensions into a single intuitive dimension, they will do so; the influence of task instruction will then disappear (cf., Memelink and Hommel, 2005).

With these criteria in mind we opted for a relatively natural scenario rather than responding by pressing keys (see Wang et al., 2007; Yamaguchi and Proctor, 2011 for similar approaches). In a natural scenario – we hypothesized – participants would be more strongly compelled to adhere to the action coding specified by the task instruction. In the present study, participants stood on a Wii balance board and were instructed to imagine standing on either a snowboard or a pair of skis. They had to respond to stimuli by leaning sideways. In the ski condition, this lateral movement was presented as moving the skis to the "*left*" or "*right*," whereas in the snowboard condition, it was presented as moving the snowboard "*backward*" or "*forward*." In performing the task, participants could draw on their own motor experience if they had any experience with skiing or snowboarding. Participants who had never skied or snowboarded could still form a mental representation of what it means to be skiing or snowboarding, by combining elements from partial or similar experiences (Barsalou, 2008; Taylor and Zwaan, 2009). For example, they could draw on visual experience (e.g., watching snowboarders on TV), and combine this with related motor experience (e.g., surfing or skateboarding).

In the experiment, the Wii balance board was oriented diagonally toward the screen displaying the stimuli (**Figure 1**). The critical stimuli consisted of colored arrows pointing in one of four directions (backward, forward, left, or right). The study used a between-subjects design; participants were either instructed to imagine standing on a pair of skis or on a snowboard, and to respond to the stimulus color by leaning sideways. Given the diagonal orientation of the balance board, the responses simultaneously varied on the left–right dimension and on the forward–backward dimension. We expected that the weighting of the (feature) dimensions would depend on the instruction given to the participant. A skier stands in the same direction as her skis. When she leans to the left or right, this causes the skis to turn into the respective direction. Therefore, participants in the ski condition would encode the lateral leaning movements as "left" and "right." In contrast, a snowboarder stands on a snowboard perpendicular to its direction of movement. When she leans sideways, the snowboard will slide forward or backward. As a result, we expected that participants in the snowboard condition would not only encode the movements as "left" and "right," but also as "forward" or "backward." Therefore, we expected a forward–backward congruency effect to occur in the snowboard condition, but not in the ski condition.

In the next section we describe the methods of the behavioral experiment. We continue by presenting the results, followed by a HiTEC simulation of the study. Finally, we discuss the implications of both our empirical findings and simulation results.

## **MATERIAL AND METHODS**

#### **PARTICIPANTS**

A total of 83 Dutch undergraduate psychology students from Leiden University (65 women, 18 men) took part in the experiment. In return for their participation they received course credits or a monetary reward of EUR 4.50. Mean age of the participants was 19.8 (SD 2.3).

#### **APPARATUS AND STIMULI**

The instructions and stimuli were presented on a television monitor with a diameter of 107 cm and a refresh rate of 60 Hz. E-Prime software was used to present the stimuli. Stimuli were blue or red symbols, consisting of one direction-neutral stimulus and arrows pointing in one of four different directions; left, right, forward, or backward (**Figure 2**). On screen, each stimulus measured approximately 30 cm × 30 cm.

Participants stood on a Wii balance board (51 cm long × 32 cm wide × 5 cm high), which was placed diagonally, at an angle of 45˚ or −45˚, in front of the monitor.

In order to be able to face the monitor, participants who were positioned at the 45˚ angle always had their left foot forward (i.e., closest to the monitor), and participants at the −45˚ angle always had their right foot forward. Thus, the participant's position with respect to the computer screen was determined by the orientation of the balance board.

The distance between the monitor and the center of the balance board was 200 cm (**Figure 1**). The orientation of the balance board was counterbalanced across participants. Half of the participants stood with their left foot forward, the other half stood with their right foot forward. The participant's weight distribution on the left–right axis and front–back axis of the balance board was recorded at a frequency of 100 Hz. This was done by custom-made software that polls the sensor values of the balance board, using a Bluetooth connection. To respond to a stimulus, participants had to lean sideways far enough to exceed a predefined threshold on the left–right axis of the balance board. When this threshold was exceeded, the response time and accuracy of the response were logged.
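The threshold-based response registration described above can be sketched as follows. This is a minimal illustration, not the custom software used in the study: the function name `detect_response`, the sign convention (negative values mean weight shifted to the left), the threshold value, and the assumption that sample *i* is taken *i* sampling intervals after stimulus onset are all hypothetical.

```python
def detect_response(samples, threshold, sample_rate_hz=100):
    """Return (direction, response_time_ms) for the first sample whose
    left-right balance value exceeds the threshold, or None if none does.

    Assumptions for illustration only:
    - samples[i] is the left-right weight distribution measured
      i / sample_rate_hz seconds after stimulus onset
    - negative values mean leaning left, positive values mean leaning right
    """
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            direction = "left" if value < 0 else "right"
            response_time_ms = i * 1000 / sample_rate_hz
            return direction, response_time_ms
    return None


# Example: at 100 Hz, the fourth sample (index 3) crosses the threshold.
result = detect_response([0.0, 0.1, -0.2, -0.6], threshold=0.5)
```

Accuracy would then follow from comparing the detected direction with the direction required by the stimulus color under the participant's stimulus–response mapping.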

#### **PROCEDURE**

The complete experiment lasted approximately 30 min. Upon arrival to the lab, participants were randomly assigned to one of eight counterbalance versions (see **Table 1**), defined by the instruction (snowboard or ski), the orientation of the balance board (45˚ or −45˚), and the stimulus–response mapping (red–left/blue–right or red–right/blue–left). Participants in the snowboard condition received the following instruction: "*Imagine that you're standing on a snowboard, which you can move forward or backward by leaning on your front or back leg*," whereas participants in the ski condition received the alternative instruction: "*Imagine that you're standing on skis, which you can move to the left or right by leaning on your left or right leg*." To enhance the context of the task, an illustration of a skier, or a snowboarder was presented, standing in the same position as the participant on the balance board (see **Figure 3**).
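The eight counterbalance versions follow from fully crossing the three two-level factors named above. The sketch below enumerates them and assigns participants so that the versions are used equally often; the balanced shuffle-and-cycle procedure and the function name `assign_versions` are assumptions for illustration, since the text only states that assignment was random.

```python
import random
from itertools import product

INSTRUCTIONS = ("snowboard", "ski")
ORIENTATIONS_DEG = (45, -45)
MAPPINGS = ("red-left/blue-right", "red-right/blue-left")

# 2 x 2 x 2 = 8 counterbalance versions
VERSIONS = list(product(INSTRUCTIONS, ORIENTATIONS_DEG, MAPPINGS))


def assign_versions(participant_ids, seed=None):
    """Randomly order participants, then cycle through the versions.

    This balanced scheme is a hypothetical choice, not the documented procedure.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: VERSIONS[i % len(VERSIONS)] for i, pid in enumerate(ids)}


assignment = assign_versions(range(16), seed=1)
```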

The instruction was followed by a practice block, which contained 24 trials. Each practice trial started with the presentation of the sentence "*Take the start position*" for 1000 ms. Next, the instruction to lean into a particular direction [e.g., "*Move the skis to the left (left leg)*" or "*Move the snowboard forward (front leg)*"] was presented until the participant responded by leaning into the respective direction. In the snowboard condition, the directions were "*backward*" or "*forward*," whereas in the ski condition the directions were "*left*" or "*right*." To enhance the encoding of the movements in the appropriate dimension, participants were instructed to mention out loud the direction in which they had to lean. Following a correct response, the word "*correct*" was presented for 1000 ms. Following a response that was incorrect or too slow (more than 5000 ms), the word "*error*" or "*too slow*" was presented for 1000 ms.

**Table 1 | Overview of the eight different counterbalance versions of the experiment.**

After completing the practice trials, participants received the instruction for the experimental trials. They were instructed to respond to the stimulus color by leaning into a particular direction. In the snowboard condition, participants had to respond to red or blue stimuli by leaning forward or backward (e.g., "*If the image is red, lean forward*"). In the ski condition, participants had to respond to red or blue stimuli by leaning to the left or right (e.g., "*If the image is red, lean to the left*"). The actual mapping of color to direction was counterbalanced across participants. In addition, participants were urged to respond as quickly and accurately as possible.

The instruction was supported by the illustration of the skier or snowboarder, in which the two skis or the two sides of the snowboard were colored in the corresponding stimulus color (for example, a skier with a red left ski and a blue right ski, see **Figure 3**).

Each trial was either neutral (the neutral shape), left–right congruent (left- or right-pointing arrow, corresponding to the horizontal direction of the response), left–right incongruent (left- or right-pointing arrow, opposite to the horizontal direction of the response), forward–backward congruent (forward- or backward-pointing arrow, corresponding to the forward–backward direction of the response), or forward–backward incongruent (forward- or backward-pointing arrow, opposite to the forward–backward direction of the response).
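The trial classification can be made explicit in a short sketch. Because the board is placed diagonally, a lean toward one foot has both a left–right and a forward–backward component: leaning toward the foot nearest the monitor counts as "forward," consistent with the snowboard instruction about the front and back leg. The function names and the string labels below are hypothetical.

```python
def response_components(lean_foot, front_foot):
    """Return the (left-right, forward-backward) components of a lean.

    lean_foot  -- foot the participant leans toward: "left" or "right"
    front_foot -- foot closest to the monitor: "left" or "right"
    """
    horizontal = lean_foot
    sagittal = "forward" if lean_foot == front_foot else "backward"
    return horizontal, sagittal


def classify_trial(arrow, lean_foot, front_foot):
    """Return (dimension, congruency) for one trial.

    arrow -- "left", "right", "forward", "backward", or "neutral"
    """
    if arrow == "neutral":
        return "neutral", None
    horizontal, sagittal = response_components(lean_foot, front_foot)
    if arrow in ("left", "right"):
        return "left-right", "congruent" if arrow == horizontal else "incongruent"
    if arrow in ("forward", "backward"):
        return "forward-backward", "congruent" if arrow == sagittal else "incongruent"
    raise ValueError(f"unknown arrow direction: {arrow!r}")
```

For example, for a participant with the left foot forward, a correct lean toward the left foot is left–right congruent with a left-pointing arrow and forward–backward congruent with a forward-pointing arrow.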

The experiment was divided into four blocks with 50 trials each. Since there were 10 different stimuli (two colors; red and blue, and five orientations; backward, forward, left, right, and neutral), each stimulus was repeated five times during each block. Stimuli were presented in random order. A trial started when the participant had taken the start position and his/her balance was centered on the Wii balance board. After 500 ms, a black fixation cross was presented for 1000 ms, followed by the experimental stimulus. The stimulus remained on the screen until the participant's response was recorded or until 5000 ms had elapsed. If the response was incorrect or too slow, a feedback screen was presented for 2000 ms, displaying the word "*error*" or "*too slow*." If the response was correct, no feedback was given. After completing a trial, participants had to return their balance to the center of the balance board. Following each block of 50 trials, there was a short break of 10 s, during which the instruction was repeated. The instruction was visually supported by the same illustration of the snowboarder or skier that had been shown in the initial experimental instruction (**Figure 3**).
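The block structure (10 stimuli, each repeated five times per block, four blocks, random order) can be sketched as below. The experiment itself was run in E-Prime; this Python version, including the names `make_block` and `make_experiment`, is only an illustrative reconstruction of the trial-list arithmetic.

```python
import random
from itertools import product

COLORS = ("red", "blue")
SHAPES = ("backward", "forward", "left", "right", "neutral")
REPEATS_PER_BLOCK = 5
N_BLOCKS = 4


def make_block(rng):
    """Return one shuffled block: 2 colors x 5 shapes x 5 repeats = 50 trials."""
    stimuli = list(product(COLORS, SHAPES))
    trials = stimuli * REPEATS_PER_BLOCK
    rng.shuffle(trials)
    return trials


def make_experiment(seed=None):
    """Return a list of N_BLOCKS independently shuffled blocks."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(N_BLOCKS)]


blocks = make_experiment(seed=0)
```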

After completing the experimental trials, participants indicated whether they had any experience with skiing or snowboarding. Experienced snowboarders also indicated whether they preferred to snowboard with their left foot forward or their right foot forward.

## **RESULTS**

The data from eight participants were discarded because they had an overall accuracy level lower than 0.70. For the remaining participants (38 in the Ski condition and 37 in the Snowboard condition) we computed mean reaction times and accuracy for the responses. Incorrect responses (7.8%) were excluded from the reaction time analysis. Furthermore, based on Tukey's criterion, reaction times below 415 ms and above 1590 ms (5.3%) were also discarded. Mean trimmed reaction times and error rates are presented in **Table 2**. The reaction times were analyzed with a 2 × 2 × 2 repeated measures ANOVA, with dimension (backward–forward vs. left–right) and congruency (congruent vs. incongruent) as within-subject variables, and instruction (ski vs. snowboard) as between-subject variable.
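The outlier trimming step can be sketched as follows. The text names Tukey's criterion but does not give the multiplier or the quartile definition, so the conventional 1.5 × interquartile-range fences and the default quartile method of Python's `statistics.quantiles` are assumptions here, as are the function names and the example reaction times.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional choice and is assumed for illustration.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim_reaction_times(values, k=1.5):
    """Keep only the values that lie within the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical reaction times in ms, with one very slow response
rts = [700, 720, 750, 780, 800, 820, 850, 3000]
trimmed = trim_reaction_times(rts)
```

In this made-up sample, only the 3000 ms response falls outside the fences and is dropped.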

The majority of participants (27 in the ski group, 18 in the snowboard group) had no experience with snowboarding or skiing, 14 participants had only ski experience (6 in the ski group, 8 in the snowboard group), 5 participants had only snowboard experience (2 in the ski group, 3 in the snowboard group), and 11 participants had both ski and snowboard experience (3 in the ski group, 8 in the snowboard group). Because of the small number of participants in some of the groups, we ignored this factor in the analysis.

**Table 2 | Mean response times (ms) and standard deviations for the different trials in the two instruction conditions.**

There was a main effect of congruency, with congruent trials being faster than incongruent trials, *F*(1,73) = 108.4, *p* < 0.001, η<sub>*p*</sub><sup>2</sup> = 0.60. In addition, there was a significant interaction between congruency and dimension, *F*(1,73) = 72.5, *p* < 0.001, η<sub>*p*</sub><sup>2</sup> = 0.50. The congruency effect was larger for the left–right dimension than for the backward–forward dimension. This finding is in line with the left–right prevalence effect found in other studies (e.g., Nicoletti and Umiltà, 1984, 1985; Nicoletti et al., 1988). Different accounts are given for this effect (see e.g., Hommel, 1996; Proctor et al., 2003; Rubichi et al., 2005). We will turn to this matter in the discussion section. Most interestingly, there was a significant three-way interaction between congruency, dimension, and task instruction, *F*(1,73) = 7.1, *p* = 0.01, η<sub>*p*</sub><sup>2</sup> = 0.09. On the left–right dimension, the congruency effect was significantly larger in the ski condition than in the snowboard condition, *F*(1,73) = 4.5, *p* = 0.04, η<sub>*p*</sub><sup>2</sup> = 0.06. The opposite result appeared on the forward–backward dimension; there was a significant congruency effect in the snowboard condition, *t*(36) = 2.4, *p* = 0.02, but not in the ski condition, *t*(37) = 1.0, *p* = 0.33. Although responses appeared to be faster in the snowboard condition than in the ski condition, there was no significant main effect of task instruction, *F*(1,73) = 2.7, *p* = 0.11, η<sub>*p*</sub><sup>2</sup> = 0.03, because the between-subject differences were quite large.
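As a reading aid, partial eta squared can be recovered from an *F* ratio and its degrees of freedom with the standard identity η<sub>*p*</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below (the function name is ours, not the authors') applies it to the congruency main effect, the congruency × dimension interaction, and the three-way interaction reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Effects reported in the text, all with df = (1, 73).
congruency = round(partial_eta_squared(108.4, 1, 73), 2)
congruency_by_dimension = round(partial_eta_squared(72.5, 1, 73), 2)
three_way = round(partial_eta_squared(7.1, 1, 73), 2)
```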

Concluding, significant spatial congruency effects were found both in the left–right dimension and in the forward–backward dimension. Although the instructions did not cause a complete switch of the congruency effects, they modulated the relative size of the effects. On the left–right dimension, the effect was significantly larger in the ski condition than in the snowboard condition. On the forward–backward dimension, the effect was larger in the snowboard condition than in the ski condition. These results suggest that participants in the ski condition may have encoded the movements predominantly as "left" and "right," whereas participants in the snowboard condition may have encoded the movements also as "forward" and "backward." Before discussing our results in more detail, we will first present the HiTEC model and explain how this model can account for our findings.

## **HiTEC SIMULATION**

The experiment was simulated using the HiTEC connectionist model (Haazebroek et al., submitted) in order to explain the results presented above. More specifically, we aimed to simulate the way in which the task context modulates the interaction between stimulus perception and response planning. HiTEC is being developed to computationally specify the mechanisms proposed in TEC (Hommel et al., 2001) in terms of neurally plausible representations and connections. The aim is to validate TEC's principles and assumptions by means of simulations of particular empirical studies using specific instances of HiTEC (Haazebroek et al., 2009, 2010). In this section we first describe the basic principles of connectionist modeling and discuss global cortical connectivity. We then proceed to discuss HiTEC's general structure and relate this to TEC's main assumptions. Finally, we discuss the specific simulation set up for the current study, the simulation results, and the model dynamics in order to account for the empirical findings from Section "Results."

#### **CONNECTIONIST MODELING AND CORTICAL CONNECTIVITY**

In order to devise a neurally plausible model, it is important to consider both representations and patterns of connectivity in the brain. Regarding the former, the primate cortex is composed of a vast number of spiking neurons. The local interactions between these neurons are largely random, but at the group level – a neuron population – the global population activity (i.e., mean spike frequency) can be considered deterministic (Wilson and Cowan, 1972). That is, mean activation depends on the various inputs to the neuron population and on its decay (see **Figure 4A** for a visual illustration of neuron populations and their inputs).

As we consider a neuron population the basic unit, we can model these neurodynamics with an interactive activation connectionist network (Rumelhart et al., 1986) of units and connections. The propagation of activation of a unit is described by the following equation:

$$\begin{split} A_i(t+1) &= (1 - d_a) \times A_i(t) + \left(1 - A_i(t)\right) \\ &\quad \times \left(\mathrm{Exc}_i + \mathrm{TD}_i + \mathrm{Noise}_i\right) + \mathrm{Inh}_i \times A_i(t) \end{split} \tag{1}$$

This equation states that the activation of unit *i* is determined by its current value, a decay rate *d*<sub>a</sub> (default value of 0.1 in current simulations), excitatory input Exc<sub>*i*</sub>, top-down input TD<sub>*i*</sub>, lateral inhibitory input Inh<sub>*i*</sub>, and background noise input Noise<sub>*i*</sub> (additive Gaussian random noise with mean 0.025 and SD 0.015). The excitatory input is either external stimulation (0.6 in current simulations) or excitatory input originating from connected feedforward units, which is computed according to the following equation:

$$\mathrm{Exc}_i = \sum_{k} w_k^{+} F\left(A_k(t)\right) \tag{2}$$

This equation states that the excitatory input consists of the weighted sum of the outputs of all connected feedforward units. Here, *w*<sub>*k*</sub><sup>+</sup> are the positive weights of the connections from unit *k* to unit *i*. The output of a unit is a non-linear function of its activation value, using the following function with parameters na (4.0 in current simulations) and qa (0.9 in current simulations):

$$F(A_i) = \frac{A_i^{\mathrm{na}}}{\mathrm{qa}^{\mathrm{na}} + A_i^{\mathrm{na}}} \tag{3}$$
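Equations 2 and 3 can be sketched in a few lines. The parameter values are those quoted in the text; the function names and the example weights and activations are illustrative assumptions, not part of HiTEC itself.

```python
NA = 4.0  # exponent "na" from the text
QA = 0.9  # half-saturation constant "qa" from the text

def output(a, na=NA, qa=QA):
    """Eq. 3: non-linear output of a unit with activation a."""
    return a ** na / (qa ** na + a ** na)

def excitatory_input(weights, activations):
    """Eq. 2: weighted sum of the outputs of the connected feedforward units."""
    return sum(w * output(a) for w, a in zip(weights, activations))

# Hypothetical example: two feedforward units, one active and one silent.
exc = excitatory_input([0.4, 0.4], [0.9, 0.0])
```

Because the output function equals 0.5 when the activation equals qa, a unit must be quite active before it passes on substantial activation.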

Top-down input to a unit originates from units "later" or "higher" in the processing flow and is considered to only enhance activation. This is realized by means of the following computation of top-down input:

$$\mathrm{TD}_i = \sum_{k} w_k^{+} F\left(A_k(t)\right) \times \frac{\max\left(A_i(t) \times (1 - d_a) - \mathrm{VT},\, 0\right)}{1 - \mathrm{VT}} \tag{4}$$

Here, *d*<sub>a</sub> is the same activation decay rate (0.1) as in Eq. 1 and VT (0.5 in current simulations) is a voltage threshold (see also Tononi et al., 1992). When the decay-scaled activation level of unit *i* is higher than this threshold, top-down input from connected units is taken into account and rescaled in proportion to the voltage threshold. Conversely, if the unit's scaled activation level is lower than the voltage threshold, this input is discarded.
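The gating in Eq. 4 can be illustrated with a minimal sketch. Parameter values are taken from the text; the function names and the example weight and activations are assumptions made for illustration.

```python
def output(a, na=4.0, qa=0.9):
    """Eq. 3: non-linear output of a unit with activation a."""
    return a ** na / (qa ** na + a ** na)

def top_down_input(a_i, weights, activations, d_a=0.1, vt=0.5):
    """Eq. 4: top-down drive, gated by the receiving unit's own activation a_i."""
    drive = sum(w * output(a) for w, a in zip(weights, activations))
    gate = max(a_i * (1 - d_a) - vt, 0.0) / (1 - vt)
    return drive * gate

# A weakly active unit receives no top-down enhancement ...
td_weak = top_down_input(0.5, [0.2], [0.9])
# ... whereas a strongly active unit does.
td_strong = top_down_input(1.0, [0.2], [0.9])
```

With these values the gate opens only once the receiving unit's activation exceeds roughly 0.56 (0.5 / 0.9), so top-down input can amplify but not initiate activation.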

Finally, inhibition is computed using paired inhibitory units (see also Deco et al., 2002). Each unit has a paired inhibitory unit that receives excitation from the (excitatory) unit and sends inhibition (through negative weights) to (excitatory) units within the same map (i.e., lateral inhibition). This is computed using the following equation:

$$\mathrm{Inh}_i = \sum_{k} w_k^{-} F\left(A_k(t)\right) \tag{5}$$

Here, *k* denotes the inhibitory paired unit belonging to any unit other than unit *i* in the map and *w*<sub>*k*</sub><sup>−</sup> are the negative connection weights (−0.75 in current simulations). The activation of inhibitory units is updated in a similar fashion as that of the excitatory units, but their input can only be excitatory and originate from the paired excitatory units. Note that, for clarity, we do not depict inhibitory units in any model diagram and that by "code" we always refer to the excitatory unit. In our current simulations the connection weight from an excitatory unit to its paired inhibitory unit is 1.25.
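To make the interplay of Eqs. 1 and 5 concrete, the following sketch runs one small map of two excitatory units with their paired inhibitory units. It is a simplified illustration, not the HiTEC implementation: top-down input and noise are left out so that the run is deterministic, inhibitory units are assumed to use the same decay rate (0.1) as excitatory units, all activations start at zero, updates are synchronous, and the second external input (0.3) is a hypothetical value.

```python
def output(a, na=4.0, qa=0.9):
    """Eq. 3: non-linear output of a unit with activation a."""
    return a ** na / (qa ** na + a ** na)

def step(exc_act, inh_act, external, d_a=0.1, w_inh=-0.75, w_pair=1.25):
    """One synchronous update of a map (Eq. 1 without top-down input and noise)."""
    n = len(exc_act)
    inh_out = [output(b) for b in inh_act]
    new_exc = []
    for i in range(n):
        # Eq. 5: inhibition from the paired inhibitory units of all other units.
        inh_i = sum(w_inh * inh_out[k] for k in range(n) if k != i)
        a = exc_act[i]
        new_exc.append((1 - d_a) * a + (1 - a) * external[i] + inh_i * a)
    # Each inhibitory unit is excited only by its own paired excitatory unit.
    new_inh = [(1 - d_a) * b + (1 - b) * w_pair * output(a)
               for a, b in zip(exc_act, inh_act)]
    return new_exc, new_inh

def run(external, cycles=50):
    """Iterate the map from rest and return the final excitatory activations."""
    exc = [0.0] * len(external)
    inh = [0.0] * len(external)
    for _ in range(cycles):
        exc, inh = step(exc, inh, external)
    return exc

final = run([0.6, 0.3])  # the unit with the stronger input should stay ahead
```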

Weights between units can change over time as a result of learning. In line with Hebbian learning, the weight change depends on the level of activation of both units during learning. Learning of the weights (bounded between 0.0 and 1.0) is governed by the following equation:

$$w_{jk}(t+1) = (1 - d_w) \times w_{jk}(t) + \mathrm{LR} \times A_j(t) \times A_k(t) \times \left(1 - w_{jk}(t)\right) \tag{6}$$

In this equation, *w*<sub>*jk*</sub> is the weight from unit *j* to unit *k*, the weight decay rate *d*<sub>w</sub> (0.0005 in current simulations) ensures that only repeated co-activations result in stable weight learning, LR (0.1 in current simulations) denotes the learning rate (i.e., the magnitude of the change in weights for each learning trial), *A*<sub>*j*</sub>(*t*) is a value based on the activation of feature code unit *j*, and *A*<sub>*k*</sub>(*t*) is a value based on the activation of motor code unit *k*.
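Equation 6 translates directly into a one-line update. In the sketch below the learning rate and weight decay are the values quoted in the text; using the raw activations as *A*<sub>*j*</sub>(*t*) and *A*<sub>*k*</sub>(*t*) is an assumption, since the text only states that these values are "based on" the activations.

```python
def hebbian_update(w, a_j, a_k, lr=0.1, d_w=0.0005):
    """Eq. 6: decay the weight slightly, then strengthen it by co-activation."""
    return (1 - d_w) * w + lr * a_j * a_k * (1 - w)

# Twenty learning trials with both units fully active (hypothetical activations).
w = 0.0
for _ in range(20):
    w = hebbian_update(w, 1.0, 1.0)

# Without co-activation the weight only decays.
w_decayed = hebbian_update(0.5, 0.0, 0.0)
```

The factor (1 − *w*) makes each increment smaller as the weight approaches 1.0, which keeps the weight within its bounds.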

In sum, these modeling equations and parameters allow for a biologically plausible simulation of activation propagation through a network of units. Higher decay rates make units decay faster; lower decay rates keep units very active for a longer period of time. Higher input values for external input and stronger weights between units result in faster activation propagation. Higher voltage thresholds reduce the extent to which unit activation is enhanced by top-down input; conversely, lower voltage thresholds lead to earlier and stronger influence of top-down modulation on unit activation. Stronger weights between excitatory and inhibitory units strengthen the lateral inhibition mechanism. As a result, they reduce the time required to settle the competition between the units within a shared map, after which only one unit remains strongly activated. Lower weights, conversely, lengthen this time to convergence.

With this basic connectionist machinery in place we can turn to (global) cortical patterns of connectivity. The neurons in primate cortex are organized in numerous interconnected cortical maps (see **Figure 4B**). This allows the brain to encode perceived objects in a distributed fashion. That is, different features are processed and represented across different cortical maps (e.g., DeYoe and Van Essen, 1988), coding for different perceptual modalities (e.g., visual, auditory, tactile, proprioceptive), and different dimensions within each modality (e.g., visual color and shape, auditory location and pitch). Each sensory cortical map contains neurons that are responsive to specific sensory features (e.g., a specific color or a specific visual location). Sensory representations are known to have stronger decay than higher level representations; in simulations, this is typically reflected by a higher decay rate (0.2 in current simulations) for sensory code units than for other units (0.1 default decay rate). Cortical maps in the motor cortex contain neurons that code for more or less specific movements (e.g., the muscle contractions that produce the movement of the hand pressing a certain key, or more complex movement such as shifting one's weight to the right). Higher up in the processing stream there are cortical maps containing neurons that are receptive to stimulation from different modalities. In effect, they are considered to integrate information from different senses and modalities. Finally, neurons in the prefrontal cortex are involved in task-generic cognitive control (Duncan and Owen, 2000). These levels of representation form the basis of the HiTEC model.

## **HiTEC MODEL**

Now, taking this general cortical layering, connectivity, and dynamics, the question arises: how are these connectionist network units interconnected in order to yield behavior that is typically associated with processes like stimulus perception, response selection, and response planning? To this end, we present the HiTEC connectionist model, based on TEC's main assumptions. HiTEC's general structure contains sensory maps, feature maps, a task map, and a motor map, as depicted in **Figure 5**. Each map resembles a cortical map and contains codes implemented as connectionist network units as described above.

[**Figure 5** caption fragment: … connections, dashed lines are connections that are learned during action–effect learning. Depicted is the model in the snowboard instruction condition, where the left leg is the front leg, and where a red stimulus …]

Note that **Figure 5** shows only those sensory maps that are relevant for modeling the current experiment: visual color, visual shape, and proprioceptive direction. However, other specific instances of the model may include other sensory maps as well (e.g., auditory maps). Although motor codes could also be organized in multiple maps, in the present version of HiTEC, we consider only one basic motor map with a set of motor codes.

Theory of event coding's notion of feature codes (Hommel et al., 2001) is captured at the feature level by codes that are connected to and thus grounded in both sensory codes and motor codes. Crucially, the same (distal) feature code (e.g., "left") can be connected to multiple sensory codes (e.g., "left proprioceptive direction" and "left visual shape"). Thus, information from different sensory modalities and dimensions is combined in one feature code representation. It is assumed that feature codes arise from regularities in sensorimotor experience, presumably by detecting co-occurrences of sensory features. The distal feature "left," for example, could arise from perceptual experience of numerous objects that were visible and audible on the left. Future encounters of objects audible on the left activate the "left" feature code which – by means of its connections to both "left auditory location" and "left visual location" – will enhance the processing of visual left locations. In other words, hearing something on the left will result in expecting to see something on the left as well, which seems to be quite useful, for example when visual sensory input is degraded. In the present HiTEC model, for current simulation purposes, we assume that the feature codes (and their connections to sensory codes) already exist.

[Figure caption fragment: … that "forward" and "backward" feature codes are abbreviated as "FW" and "BW" and that "L/F" denotes the ambiguous left/forward sensory code and "R/B" the right/backward sensory code.]

Finally, the task level contains generic task control codes that reflect alternative stimulus–response combinations resulting from the task context. Different task codes reflect different response choice options within the task context (i.e., the typical "*if X then do Y* " task rules). Task codes connect to feature codes only, both the feature codes that represent stimuli and the feature codes that represent responses, in close correspondence with the current task context. For the current study the appropriate task codes, feature codes, and their connections are depicted in **Figure 5** (i.e., snowboard condition).

In line with TEC, responses are encoded in terms of their perceivable effects. This assumption is derived from the ideomotor theory (Hommel, 2010, 2013), which presumes that when an action is executed, the motor pattern is automatically associated with the perceptual input representing the effects of the action in the distal environment. For example, a novice snowboarder learns that by shifting her weight laterally, she can control the forward movement of her snowboard. She may learn that her snowboard slides forward when she leans to the left, and that it slides backward when she leans to the right (the precise mapping depends on the snowboarder's position on her board). Thus, when she leans to the left and moves forward as a result, the action is not only perceived and represented as "left," but also as "forward." After learning these action–effect associations, the snowboarder can plan and control her movements by anticipating their perceptual effects; that is, by (re-)activating the motor patterns through intentionally (re-)activating the associated feature codes. Thus, when an expert snowboarder intends to move "forward," she will automatically shift her weight in the appropriate direction.

Note that the basic dynamics of connectionist modeling used in HiTEC resemble those used in typical connectionist network models (PDP models, e.g., Rumelhart et al., 1986). However, here, input from feedforward and feedback connections is combined, resulting in activation flowing back and forth between units on various levels of coding. This sets the type of modeling apart from – for example – various feedforward PDP models of automaticity (e.g., Cohen et al., 1990; Zorzi and Umiltà, 1995). In addition, codes within the same map inhibit each other. Together, this results in a global competition mechanism in which *all* codes participate, from the first processing cycle to the last.

#### **SIMULATING BEHAVIORAL STUDIES**

Using HiTEC, specific behavioral studies can be simulated. In behavioral studies, participants typically perceive a stimulus and select and plan an action response. In general, a stimulus is presented to the HiTEC model by applying excitatory input to its sensory codes. After a number of cycles of internal processing a motor code becomes highly activated. When this motor code activation exceeds the set response threshold, this response is considered to be produced. Codes and their connections reflect both prior experience and task instructions. By measuring the number of cycles necessary to produce a motor response in various conditions, reaction time can be computed and compared to human data. More importantly, however, the internal dynamics of the model can shed light on the computational principles underlying both the simulation and the empirical results.

In behavioral experiments, participants typically receive a verbal instruction of the task. In HiTEC, a verbal task instruction is internalized as connections between feature codes (cf., in humans presumably using verbal labels, Hommel and Elsner, 2009) and generic task codes. Due to the mutual inhibitory links between these task codes, they will compete with each other during the task. Currently, the connections between feature codes and task codes are systematically set by hand in close correspondence with the task instruction.

Connections between feature codes and motor codes are explicitly learned, following the general set up of action–effect learning paradigms (e.g., Elsner and Hommel, 2001): at first, a random motor code is activated, comparable to the spontaneous motor babbling behavior of newborns. This leads to a change in the environment (e.g., the left hand suddenly touches an object) that is registered by sensory codes. Activation propagates from sensory codes toward feature codes. Subsequently, associations are learned between the active feature codes and the active motor code using the Hebbian learning equation described in Section "Connectionist Modeling and Cortical Connectivity." Once associations between motor codes and feature codes exist, they can be used to select and plan actions. Planning an action is realized by activating the feature codes that correspond to its perceptual effects and by propagating their activation toward the associated motor codes. Initially, multiple motor codes may become active as they typically fan out associations to multiple feature codes. However, some motor codes will have more associated features and some of the associations between motor codes and feature codes may be stronger than others, resulting in variations in dynamics. In time, the network will converge toward a state where only one motor code is strongly activated, which leads to the selection of that motor action.
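The action–effect learning procedure described above can be sketched as a simple loop. This is a strongly simplified illustration rather than the HiTEC implementation: the feature code activations evoked by each action effect are fixed hypothetical numbers (in the model they result from the network dynamics), the active motor code is given an activation of 1.0, and the names `EFFECT`, `FEATURE_ACTIVATION`, and `babble` are ours.

```python
import random

LR, D_W = 0.1, 0.0005                  # learning rate and weight decay from the text
EFFECT = {"M1": "L/F", "M2": "R/B"}    # motor code -> ambiguous proprioceptive code
# Hypothetical feature code activations evoked by each sensory code.
FEATURE_ACTIVATION = {
    "L/F": {"left": 0.8, "forward": 0.6},
    "R/B": {"right": 0.8, "backward": 0.6},
}
FEATURES = ("left", "right", "forward", "backward")

def babble(trials=20, seed=0):
    """Activate a random motor code per trial and learn feature-motor weights."""
    rng = random.Random(seed)
    weights = {(f, m): 0.0 for f in FEATURES for m in EFFECT}
    for _ in range(trials):
        active_motor = rng.choice(list(EFFECT))
        evoked = FEATURE_ACTIVATION[EFFECT[active_motor]]
        for f, m in list(weights):
            a_f = evoked.get(f, 0.0)
            a_m = 1.0 if m == active_motor else 0.0
            # Hebbian update (Eq. 6) for the connection between f and m.
            weights[(f, m)] = (1 - D_W) * weights[(f, m)] + LR * a_f * a_m * (1 - weights[(f, m)])
    return weights

learned = babble()
```

Only feature codes that are active together with a motor code become associated with it; a feature code that never co-occurs with a motor code keeps a weight of zero.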

When a stimulus in an experimental trial is presented, the corresponding sensory codes are activated. Activation gradually propagates toward the associated feature codes and toward those task codes that were associated during task preparation. Consequently, activation is propagated to feature codes that correspond to (perceptual effects of) responses and finally toward motor codes (that were associated during action–effect learning).

Note that all codes are involved from stimulus onset and gradually activate each other; as a result competition takes place between feature codes, between task codes, and between motor codes, simultaneously. Once any one of the motor codes is activated strongly enough, it leads to the execution of the respective motor response to the presented stimulus. In our simulations, this marks the end of a trial.

In general, the passing of activation between codes along their connections is iterated until the activation level of any one of the motor codes reaches a set threshold value (0.6 in current simulations). The number of cycles from stimulus onset to response selection then serves as the simulated reaction time.
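The read-out of simulated reaction time can be illustrated with a single unit. The sketch below is a toy case, not the full model: one hypothetical motor code receives a constant excitatory input and is updated with Eq. 1 without inhibition, top-down input, or noise; the cycle at which its activation first reaches the response threshold is returned.

```python
def cycles_to_threshold(exc_input, threshold=0.6, d_a=0.1, max_cycles=1000):
    """Count update cycles until a single unit reaches the response threshold.

    Returns None if the threshold is not reached within max_cycles.
    """
    a = 0.0
    for cycle in range(1, max_cycles + 1):
        a = (1 - d_a) * a + (1 - a) * exc_input  # Eq. 1, excitatory input only
        if a >= threshold:
            return cycle
    return None

fast = cycles_to_threshold(0.3)   # stronger input
slow = cycles_to_threshold(0.2)   # weaker input
never = cycles_to_threshold(0.1)  # input too weak to reach the threshold
```

Stronger input reaches the threshold in fewer cycles, which is how differences in connection strength or competition translate into differences in simulated reaction time.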

#### **MODELING THE CURRENT EMPIRICAL STUDY**

The current study involves colored arrow-shaped stimuli and responses that require a participant to shift his/her balance in a certain direction (left/forward and right/backward). In order to be able to register these sensations, the HiTEC model is equipped with sensory maps for color, shape, and proprioceptive direction. In addition, two movements are included in the motor map. We could have included more sensory maps or motor codes, but these would not be activated by any stimulus in the current study. For clarity, we restricted the model to relevant codes only.

The task context includes instructions for responding to the stimulus color ("*red*" or "*blue*"), by moving either "*left*" vs. "*right*" or "*forward*" vs. "*backward*," depending on the instruction group. We have included feature codes for these terms and have connected these codes to task codes appropriately. For each simulated subject, there are only two task rules to choose from, reflected by the two task codes in the task map. **Figure 5** depicts the codes and connectivity for a simulated subject in the snowboard condition who was instructed to respond to red stimuli by moving forward, and to blue stimuli by moving backward, as can be seen by the connections between feature codes and task codes.

As illustrated in **Figure 5**, sensory codes are connected to feature codes (feedforward weight 0.4, feedback weight 3.0). Stimulus-related feature codes are connected to task codes (feedforward weight 1.5, feedback weight 0.2) and task codes to response-related feature codes (feedforward weight 1.5, feedback weight 0.2), allowing activation to propagate from sensory codes to stimulus-related feature codes to task codes to response-related feature codes. Connections between feature codes and motor codes are explicitly learned. Importantly, in the current simulation, we have taken into account that the cognitive system has more experience with coding for "left" and "right" than is the case for "forward" and "backward." In the model this is realized by setting the weights from sensory codes toward "forward" and "backward" slightly lower (0.3 rather than 0.4).
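The fixed connection weights listed above can be collected in a small configuration structure. The weights are the values given in the text; the class, dictionary keys, and helper function are illustrative names of our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """Feedforward and feedback weight of one type of connection."""
    feedforward: float
    feedback: float

CONNECTIONS = {
    ("sensory", "feature"): Link(feedforward=0.4, feedback=3.0),
    ("stimulus_feature", "task"): Link(feedforward=1.5, feedback=0.2),
    ("task", "response_feature"): Link(feedforward=1.5, feedback=0.2),
}

# Feature codes with which the cognitive system is assumed to have less experience.
LESS_PRACTICED = {"forward", "backward"}

def sensory_to_feature_weight(feature):
    """Feedforward weight from a sensory code to the given feature code."""
    if feature in LESS_PRACTICED:
        return 0.3
    return CONNECTIONS[("sensory", "feature")].feedforward
```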

Note that the sensory codes for proprioceptive direction (i.e., proprioceptive map in **Figure 5**) are not considered "left" vs. "right" or "forward" vs. "backward" by themselves. They represent two ambiguous sensations that can activate feature codes in both feature dimensions. We shall see that task context (i.e., the connections between feature codes and task codes, in close correspondence with the task instruction) determines to what extent this sensation is perceived as "left" vs. "right" or "forward" vs. "backward."

The HiTEC simulation of the current empirical study consists of 40 simulated subjects in the ski condition and 40 simulated subjects in the snowboard condition. For each simulated subject, first the instruction is internalized by setting its task code–feature code connections appropriately; then, during 20 training trials feature code–motor code connections are learned, and finally, 20 repetitions of the 10 experimental trials (i.e., 2 colors × 5 shapes) are performed. This corresponds to the design of the empirical study as discussed in Section "Material and Methods." Each individual simulated subject has its own random noise resulting in subtle individual differences in processing and in variance in behavior (i.e., varying reaction times) as is the case with individual human participants.
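The simulation design described above can be laid out as a simple enumeration. The counts are taken from the text; the shape labels are placeholders (the five shapes are not named in this section), and the function names are illustrative.

```python
from itertools import product

CONDITIONS = ("ski", "snowboard")
SUBJECTS_PER_CONDITION = 40
TRAINING_TRIALS = 20
REPETITIONS = 20
COLORS = ("red", "blue")
SHAPES = tuple(f"shape_{i}" for i in range(5))  # placeholder labels for the 5 shapes

def experimental_trials():
    """All color x shape combinations, repeated REPETITIONS times."""
    block = list(product(COLORS, SHAPES))
    return block * REPETITIONS

def design():
    """Yield (condition, subject index, training trials, experimental trials)."""
    for condition in CONDITIONS:
        for subject in range(SUBJECTS_PER_CONDITION):
            yield condition, subject, TRAINING_TRIALS, experimental_trials()

subjects = list(design())
```

Each simulated subject thus completes 20 training trials followed by 200 experimental trials, for 80 simulated subjects in total.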

## **SIMULATION RESULTS**

**Table 3** shows the average number of cycles from stimulus onset until response selection for both instruction conditions and both congruency levels. As accuracy was 1.0 for all simulated subjects, it was not analyzed further. The three-way interaction between congruency, dimension, and task instruction found in the experiment was replicated in the simulation, as depicted in **Figure 6**. The left–right congruency effect was larger in the ski condition, whereas the forward–backward congruency effect was larger in the snowboard condition. We now explain how these results arose in the simulation by discussing the model dynamics in more detail.

Note that the HiTEC simulation covers only a part of the entire process from stimulus to response production in humans. The actual movements, for example, are included in the empirical reaction times (**Table 2**) but are not part of the simulation reaction times (**Table 3**). This results in larger relative effect sizes in the simulation results as compared to the empirical data.

**Table 3 | Average number of processing cycles from stimulus onset until response selection in the HiTEC model, based on all 80 simulated subjects.**

#### **MODEL DYNAMICS DURING SIMULATION**

Although the stimuli and responses are equal for both instruction groups, the congruency effects differ. These differences between the groups are the result of several dynamics of the model, as we will now explain.

The task instruction is reflected by connections between task codes and feature codes. These connections are bidirectional. As a consequence, activating a feature code will activate each connected task code, which in turn will activate or enhance all connected feature codes, including the feature code that activated the task code in the first place (i.e., recurrent connectivity). This means that the mere fact of being connected to a task code will further enhance the activation of a feature code. For the ski instruction group, this means that the "left" and "right" feature codes receive this enhancement; for the snowboard group, this is the case for the "forward" and "backward" feature codes.

Crucially, this selective enhancement is already at play during the learning trials. When a motor code is activated during a learning trial, and its effects are presented to the model, the mere connections between feature codes and task codes will enhance either the "Left" and "Right" feature codes (in the ski condition) or the "Forward" and "Backward" feature codes (in the snowboard condition) and thereby determine the coding of the ambiguous sensation. When the action–effect produced by "M1" is presented (i.e., activating the "L/F" proprioceptive code) this results in a slightly higher activation for the "Left" feature code in the ski condition and a slightly higher activation for the "Forward" feature code in the snowboard condition, as shown in **Figures 7A,B**. When the action–effect produced by "M2" is presented (i.e., activating the "R/B" proprioceptive code), this works in similar fashion.

During the 20 learning trials, this minimal difference in feature code activation results in pronounced differences in the weights learned (see **Figures 7C,D**) and prepares the model for the experimental trials. Note that in the ski condition, the weights between the "Left"/"Right" feature codes and motor codes are strong and the weights between the "Forward"/"Backward" feature codes and motor codes are rather moderate (**Figure 7C**). This is due to both the connections between the task codes and the "Left"/"Right" feature codes and the stronger connections between sensory codes and the "Left"/"Right" feature codes (as compared to the connections between sensory codes and the "Forward"/"Backward" feature codes). In the snowboard condition, the weights between the "Left"/"Right" feature codes and the motor codes are roughly as strong as the weights between the "Forward"/"Backward" feature codes and the motor codes (**Figure 7D**). This is due to the "Forward"/"Backward" feature codes being connected to the task codes, resulting in top-down enhancement of these feature codes. At the same time, the "Left"/"Right" feature codes receive more excitatory input due to their stronger connections with the sensory codes.

During the subsequent experimental trials, the model is set to respond to stimulus color and automatically takes stimulus direction into account (stimulus–response congruency, SRC). This results from the fact that the model codes for responses and stimuli using common spatial feature codes. In the ski condition, the feature codes "Left" and "Right" are used to encode the responses. When perceiving a horizontal arrow stimulus, however, "Left" and "Right" are also used to encode this stimulus. When a congruent stimulus is presented, the corresponding feature code is already activated to encode this stimulus and therefore speeds up the encoding of the response. When an incongruent stimulus is shown, the wrong feature code is activated, which slows down the activation – by means of lateral inhibition – of the correct response feature. This results in longer reaction times for incongruent than for congruent stimuli.

Now, the overlap between feature codes of stimulus and response obviously depends on the spatial coding of the response. As a result of task instruction and subsequent action–effect learning, this is different for the ski group and snowboard group. We now describe in detail the dynamics of the model during the experimental trials in both ski and snowboard conditions and for each type of stimulus (left–right congruent and incongruent, forward–backward congruent and incongruent) as depicted in the panels of **Figure 8**.

In panel A, a red left arrow stimulus is presented to the model in the ski condition, resulting in an initial increase of activation of "Red" and "Left" feature codes. In line with the ski task set, activation propagates from "Red" to a task code and to the "Left" feature code. This overlap results in a fast increase of activation of the "Left" feature code. In the ski condition the "Left" feature code is strongly connected to "M1," resulting in fast activation propagation toward motor code "M1" and fast action selection. This explains the relatively shorter reaction times for the left–right congruent trials in the ski condition.

In panel B, a red left arrow stimulus is presented to the model in the snowboard condition, resulting in an initial increase of activation of "Red" and "Left" feature codes. In line with the snowboard task set, activation propagates from "Red" to a task code and to the "Forward" feature code; hence the subsequent increase in activation of the "Forward" feature code. In the snowboard condition, "Left," "Right" *and* "Forward" and "Backward" feature codes are strongly connected to the motor codes (as depicted in **Figure 7D**). Thus, both "Left" and "Forward" now propagate activation toward motor code "M1," resulting in fast action selection. This explains the relatively shorter reaction times for the left–right congruent stimulus trials in the snowboard condition.

In panel C, a red right arrow is presented to the model in the ski condition, resulting in an initial increase of activation of "Red" and "Right" feature codes. In line with the ski task set, activation propagates from "Red" to a task code and to the "Left" feature code; hence the subsequent increase in activation of the "Left" feature code. Now, both "Left" and "Right" feature codes are active and highly competing. They are both strongly connected to different motor codes that both receive activation and also compete with each other. This competition takes time and lengthens the trial, which explains the relatively longer reaction times for the left–right incongruent trials in the ski condition.

In panel D, a red right arrow is presented to the model in the snowboard condition, resulting in initial increase of activation of "Red" and "Right" feature codes. In line with the snowboard task set, activation propagates from "Red" to a task code and to the "Forward" feature code, hence the subsequent increase in activation of the "Forward" feature code. Now, the "Forward" feature code is strongly connected to the M1 motor code, the motor code to be selected. The "Right" feature code, however, is (even more) strongly connected to the "M2" motor code. As both "Forward" and "Right" feature codes are highly activated and propagate activation to both motor codes, it takes longer for the system to settle this competition. This explains the relatively longer reaction times for the left–right incongruent stimulus trials in the snowboard condition.

In panel E, a red forward arrow is presented to the model in the ski condition, resulting in an initial increase of activation of "Red" and "Forward" feature codes. In line with the ski task set, activation propagates from "Red" to a task code and to the "Left" feature code; hence the subsequent increase in activation of the "Left" feature code. Now, in the ski condition the "Left" feature code is strongly connected to the "M1" motor code, the motor code to be selected. The "Forward" feature code, however, is very weakly connected to the "M1" motor code. Thus the activation mainly propagates from the "Left" feature code toward the "M1" motor code resulting in a speedy selection of the "M1" motor code, whereas the activation of the "Forward" feature code has minimal influence. This explains the unaffected reaction times for the forward–backward congruent stimulus trials in the ski condition.

In panel F, a red forward arrow is presented to the model in the snowboard condition, resulting in an initial increase of activation of "Red" and "Forward" feature codes. In line with the snowboard task set, activation propagates from "Red" to a task code and to the "Forward" feature code. This overlap results in a fast increase of "Forward" feature code activation. In the snowboard condition the "Forward" feature code is strongly connected to "M1," resulting in fast activation propagation toward "M1" and fast action selection. This explains the relatively shorter reaction times for the forward–backward congruent trials in the snowboard condition.

dotted lines the activation of "Forward" and "Backward" feature codes. Trials start with stimulus presentations, hence the fast increase of feature codes that are connected to the sensory codes activated by stimulus presentation. Trials end when a motor code (in these trials motor code M1) reaches the response threshold of 0.6. See text for further explanations of the dynamics leading to action selection.

In panel G, a red backward arrow is presented to the model in the ski condition, resulting in an initial increase of activation of "Red" and "Backward" feature codes. In line with the ski task set, activation propagates from "Red" to a task code and to the "Left" feature code, hence the subsequent increase in activation of the "Left" feature code. Now, in the ski condition the "Left" feature code is strongly connected to the "M1" motor code, the motor code to be selected. The "Backward" feature code is connected to the "M2" motor code, introducing competition. However, in the ski condition this latter connection is very weak. Thus the activation mainly propagates from the "Left" feature code toward the "M1" motor code resulting in a speedy selection of the "M1" motor code, whereas the activation of the "Backward" feature code has minimal influence. This explains the unaffected reaction times for the forward–backward incongruent stimulus trials in the ski condition.

In panel H, a red backward arrow is presented to the model in the snowboard condition, resulting in an initial increase of activation of "Red" and "Backward" feature codes. In line with the snowboard task set, activation propagates from "Red" to a task code and to the "Forward" feature code. Now, both "Forward" and "Backward" feature codes are active and highly competing. They are both strongly connected to different motor codes that also compete. This competition takes time and lengthens the trial, explaining the relatively longer reaction times for the forward–backward incongruent stimulus trials in the snowboard condition.

In sum, the stronger connections between sensory codes and the "Left"/"Right" feature codes (as compared to the weaker connections between sensory codes and the "Forward"/"Backward" feature codes) together with the differences in mere connectivity between feature codes and task codes – which results from different task instructions – yield a pattern of left–right and forward–backward SRC effects that is comparable to the findings from the empirical study.
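The panel-by-panel account can be made concrete with a toy simulation. The following Python sketch is not the HiTEC implementation: it holds the feature-level drive constant, omits sensory and task codes, and uses hypothetical weights, drive values, update rate, and inhibition strength. Only the response threshold of 0.6 and the ordering of the connection strengths (weak "Forward"/"Backward" connections in the ski condition, strong ones in the snowboard condition) are taken from the text. It covers the ski panels A, C, E, and G and the snowboard panels F and H, and it does not attempt to reproduce relative effect sizes across conditions.

```python
THRESHOLD = 0.6  # response threshold stated in the figure caption

# Hypothetical feature-to-motor weights; only their ordering follows the text.
SKI = {("Left", "M1"): 1.0, ("Right", "M2"): 1.0,
       ("Forward", "M1"): 0.1, ("Backward", "M2"): 0.1}
SNOWBOARD = {("Left", "M1"): 1.0, ("Right", "M2"): 1.0,
             ("Forward", "M1"): 0.8, ("Backward", "M2"): 0.8}


def cycles_to_response(stimulus, task_feature, weights,
                       stim_drive=0.4, task_drive=1.0,
                       rate=0.1, inhibition=0.5, max_cycles=500):
    """Return (selected motor code, cycles needed) for one simulated trial."""
    # The stimulus and the task set each excite one feature code; when both
    # target the same code, the drives add up (feature code overlap).
    drive = {}
    drive[stimulus] = drive.get(stimulus, 0.0) + stim_drive
    drive[task_feature] = drive.get(task_feature, 0.0) + task_drive

    motor = {"M1": 0.0, "M2": 0.0}
    for cycle in range(1, max_cycles + 1):
        updated = {}
        for code, act in motor.items():
            excitation = sum(d * weights.get((feat, code), 0.0)
                             for feat, d in drive.items())
            rival = sum(a for other, a in motor.items() if other != code)
            net = excitation - inhibition * rival
            # Leaky integration toward the net input, clipped to [0, 1].
            updated[code] = min(1.0, max(0.0, act + rate * (net - act)))
        motor = updated  # synchronous update of both motor codes
        winner = max(motor, key=motor.get)
        if motor[winner] >= THRESHOLD:
            return winner, cycle
    return None, max_cycles


# (stimulus direction, feature code enhanced by the task set, weights)
trials = {
    "A": ("Left", "Left", SKI),            # ski, left arrow
    "C": ("Right", "Left", SKI),           # ski, right arrow
    "E": ("Forward", "Left", SKI),         # ski, forward arrow
    "G": ("Backward", "Left", SKI),        # ski, backward arrow
    "F": ("Forward", "Forward", SNOWBOARD),   # snowboard, forward arrow
    "H": ("Backward", "Forward", SNOWBOARD),  # snowboard, backward arrow
}
results = {panel: cycles_to_response(*args) for panel, args in trials.items()}
for panel, (winner, cycles) in results.items():
    print(f"panel {panel}: {winner} selected after {cycles} cycles")
```

With these assumed values, the incongruent left–right trial needs more cycles than the congruent one in the ski condition, whereas a forward or backward arrow barely changes the cycle count there; a forward–backward difference appears once the "Forward"/"Backward" connections are strong.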

## **DISCUSSION**

The Simon effect is known as a particularly robust effect. The empirical study presented here uses a two-dimensional Simon task with two groups of participants who only differ in the instruction (i.e., ski vs. snowboard) they received. And yet, the presence and size of the Simon effect depend strongly on the instruction: the left–right congruency effect is larger in the ski condition than in the snowboard condition, while the forward–backward effect only appears in the snowboard condition. Obviously, then, the task instruction moderates the internal translation process from stimulus to response.

Using the TEC, these results could be explained in terms of feature code overlap and intentional weighting: the task context modulates to what extent a feature dimension (i.e., forward–backward or left–right) is used for response coding. Since these feature codes are used both for stimulus encoding and response planning, this results in either facilitation or interference, yielding a stimulus–response congruency (SRC) effect. Simulations using the HiTEC model show how this result may emerge. Task instruction is implemented as connections between feature codes and task codes, closely following the verbal instructions. This mere connectivity automatically results in specific recurrency that selectively enhances either the "Left" vs. "Right" or the "Forward" vs. "Backward" feature codes when perceiving action–effects. This leads to differences in action–effect weight learning and subsequently in how a response is encoded. These differences in response coding, in turn, influence the degree to which the feature codes representing stimuli and responses overlap, giving rise to different SRC effects across conditions.

The data from the empirical study and the results from the simulation clearly show a stronger congruency effect for the left–right dimension than for the forward–backward dimension (see **Figure 6**, depicted effect sizes are listed in **Tables 2** and **3**). As mentioned in Section "Results," the asymmetry in the empirical data is in line with the left–right prevalence effect found in other studies (e.g., Nicoletti and Umiltà, 1984, 1985; Nicoletti et al., 1988). In the current study, we hypothesize that the use of left and right feet – for both left–right and forward–backward responses – may have yielded this prevalence effect (cf. Hommel, 1996). In more general terms, it could be argued (Rubichi et al., 2005) that the right–left discrimination is over-learned and produces faster processing than discriminations on other dimensions. In the model, the left–right dimension was enhanced by strengthening the connection between the sensory codes and feature codes (0.4 for connections to "Left"/"Right," 0.3 for connections to "Forward"/"Backward"). This resulted in a left–right prevalence effect, similar to the effect found in the empirical data.
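The effect of the two sensory-to-feature weights can be illustrated in isolation. The Python sketch below assumes a simple leaky-integrator update, a constant sensory input of 1.0, and an update rate of 0.1; none of these is taken from HiTEC, and the function name is hypothetical. Only the weights 0.4 and 0.3 come from the text.

```python
def feature_activation(weight, cycles, sensory_input=1.0, rate=0.1):
    """Activation of a feature code after a number of update cycles.

    Assumed update rule (not the HiTEC equations): the activation moves a
    fixed fraction of the way toward its weighted sensory input each cycle.
    """
    activation = 0.0
    for _ in range(cycles):
        activation += rate * (weight * sensory_input - activation)
    return activation


# Weights reported in the text for the two feature dimensions.
left_right = feature_activation(weight=0.4, cycles=10)
forward_backward = feature_activation(weight=0.3, cycles=10)

print(f"Left/Right code:       {left_right:.3f}")
print(f"Forward/Backward code: {forward_backward:.3f}")
print(f"ratio:                 {left_right / forward_backward:.3f}")
```

Because this assumed update is linear in the weight, the "Left"/"Right" code stays ahead of the "Forward"/"Backward" code by the factor 0.4/0.3 at every cycle, so it has more activation to pass on to the motor codes.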

## **RELATED WORK**

Our findings are in line with earlier work on the impact of instructions (Hommel, 1993) and otherwise induced task-relevance of stimulus and response dimensions (Memelink and Hommel, 2005) on the Simon effect. Indeed, the effect of task goals on the interaction between perception and action in this study can be ascribed to the basic principle of intentional weighting (Memelink and Hommel, 2012). It should be noted, however, that although the current study shows strong resemblance to the experiment conducted by Hommel (1993), the studies differ in *how* intentional weighting is assumed to be at play. In Hommel (1993), different aspects of the action–effect (i.e., light vs. key) contributed selectively to the *same feature dimension* (i.e., left–right) depending on the task instruction. Describing that task in terms of "*key pressing*" focused the (spatial) attention on the keys and increased the contribution of key location to the left–right dimension, whereas describing it in terms of "*light switching*" focused attention on the lights and increased the contribution of light location to the left–right dimension. Subsequently, the stimuli were encoded using this same left–right dimension. This resulted in either facilitation or interference yielding the observed SRC effect. This is fully in line with HiTEC logic, and it has been successfully replicated in HiTEC (Haazebroek et al., submitted).

In contrast, in the current study, a *single sensory dimension* (i.e., proprioceptive balance) was assumed to map onto *two distinct feature dimensions* (i.e., left–right and forward–backward). Here, task instruction modulated the relative weighting of these two feature dimensions in the coding of the response. Subsequently, left vs. right directed stimuli were encoded using the left–right feature dimension and forward vs. backward directed stimuli were encoded using the forward–backward feature dimension. The relative weighting of these feature dimensions – modulated by task instruction – determined the relative sizes of the left–right SRC effects and forward–backward SRC effects, as observed in both the empirical data and simulation results. Indeed, the present empirical study and simulation results demonstrate that intentional weighting is not limited to weighting sensory dimensions, as demonstrated by Hommel (1993) and simulated by Haazebroek et al. (submitted), but also extends to weighting abstract feature dimensions.

Yamaguchi and Proctor (2011) also found that the SRC effect depends on the attentional demands of the task. In their study participants controlled a simulated aircraft. A response yielded action–effects on multiple dimensions: movement of the aircraft, movement of the horizon and the physical joystick movement. In this study, SRC effects depended on whether the (visual) emphasis was on the orientation of the aircraft (i.e., aircraft tilt, fixed horizon) or of the horizon (i.e., fixed aircraft, horizon tilt), which resonates well with our findings. Their work on a multidimensional vector model of SRC (Yamaguchi and Proctor, 2012) also addresses the issue of task context in the Simon task. They mathematically model the S–R vector space and treat stimulus features and response features in similar fashion, which is completely in line with our HiTEC model. HiTEC, however, is not aimed at mathematical minimalism, but rather at biological plausibility: connectionist codes with activation dynamics that approximate biological neuron populations, bi-directional connections, and within-layer lateral inhibition.

At first sight, the general architecture of HiTEC, a model of codes and connections, is in line with existing models (e.g., Zorzi and Umiltà, 1995; Kornblum et al., 1999), but there are some crucial differences to be noted. In HiTEC, (1) responses are coded as motor codes which are associated with feature codes as a result of *learning* rather than as a fixed connotation; (2) compatibility effects arise from the fact that the *same* feature codes are used to represent stimuli and responses at the feature level, rather than assuming spatial similarity between stimuli and responses; (3) in line with the response-discrimination hypothesis (Ansorge and Wühr, 2001), the task instruction determines the response coding and thus influences SRC.

Moreover, the model is compatible with the main claims of embodied cognition theories. In fact, HiTEC's concepts are entirely grounded in sensorimotor experience and even the grounding process itself is explicitly modeled. In line with TEC (Hommel et al., 2001), feature codes are assumed to be extracted from regularities in prior sensorimotor experience and can only exist by virtue of their connections to sensory codes. In the current simulation, the model contains feature codes that link to lower level sensory codes. In the same vein, feature codes link to motor codes. In our modeling we explicitly show how these associations are strengthened: through sensorimotor experience. Connections to sensory codes are grounded in regularities in sensory input; connections to motor codes are grounded in regularities in action–effects that follow motor code activation. In our model, task codes are fully generic and recruited when needed. They are meaningless in themselves and function only as relay nodes when processing information from (stimulus) perception to action (effect) planning, and vice versa.

The fact that the translation from perception to action involves feature codes that are necessarily grounded in sensorimotor experience is, in HiTEC modeling, the main reason why stimulus–response congruency occurs: the model cannot perceive stimuli or plan actions without using these grounded feature codes. The feature codes used for perceiving stimuli and those used for planning actions (i.e., by anticipating and representing action–effects) are grounded in the same perceptual world (Prinz, 1992) and are therefore prone to overlap. When perception of a particular stimulus and the planning of a particular response involve the same feature code, this code overlap results in either facilitation or interference (Hommel, 2004). This is the foundation of the observed SRC effect (for a more elaborate discussion and application to a variety of SRC paradigms, see also Haazebroek et al., submitted).

By the same token, processing a task instruction is assumed to activate these feature codes grounded in sensorimotor experience. Implementing a – in principle abstract – task set automatically wires the feature codes into a stimulus–to–response processing pathway. The fact that these feature codes also represent (prior) sensorimotor experience (i.e., by virtue of their connections to sensory codes) allows the task instruction to modulate subsequent sensorimotor processing (i.e., by top–down enhancing feature codes and therefore sensory codes), even on the automatic level of SRC.

HiTEC is also compatible with the idea that concepts are flexible and context-dependent. According to the embodied cognition view, concepts are learned from recurrent sensorimotor experiences. During those experiences, the patterns of activity in sensory-motor brain areas are captured and stored in memory to form elaborated, multimodal knowledge structures, called simulators. Representation is achieved by reactivating a subset of this stored knowledge to construct a specific simulation. The exact content of a particular simulation depends on the individual's experience with the simulated concept, as well as on situational factors such as current goals and task demands (Barsalou, 1982, 1993; van Dantzig et al., 2011). This flexibility is strongly reflected in HiTEC. For example, in the current simulation, the task instruction influenced how an ambiguous movement was encoded and represented in the model. Similarly, the context or task instruction could influence which features or feature dimensions of a stimulus are most relevant, and thereby enhance the processing of these features or dimensions (the intentional weighting principle). Indeed, several recent studies have shown that spatial congruency effects only occur when participants perform a task that emphasizes the relevant conceptual dimension of a stimulus. For example, Schubert (2005) found that spatial congruency between power and vertical position only occurred when participants made power judgments of words such as "king" or "servant," but not when they judged the valence of these stimuli. Similarly, Zanolie and Pecher (under review) found a spatial congruency effect between number size and horizontal position when participants processed the magnitude of numbers, but not when they simply viewed the numbers or judged whether the numbers were even or uneven. Similar results were obtained by Santiago et al. 
(2012), who showed that conceptual congruency effects only appeared when participants attended to the relevant conceptual dimension, either through task instruction or by means of exogenous attentional cueing.

To conclude, perception, cognition, and action interact by using common representations. Many studies and theoretical accounts focus on bilateral interactions between perception–cognition, action–cognition, and perception–action. In this paper, we have shown that the interaction between perception and action is strongly influenced by cognition (i.e., task instruction). Cognition, in turn, is based on prior sensorimotor experience, and is therefore grounded in perception and action. In addition to our empirical findings on a two-dimensional Simon task we set out to provide an overarching framework that connects various findings and explains computationally *how* perception, action, and cognition interact. We hope that the combination of our empirical work and the computational model contributes to a better understanding of the complex interaction between perception, action, and cognition.

## **ACKNOWLEDGMENTS**

The authors wish to thank all students who participated in the Wii Lab bachelor projects.

## **REFERENCES**

translation," in *Interaction Between Dissociable Conscious and Nonconscious Processes*, eds Y. Rossetti and A. Revonsuo (Amsterdam: John Benjamins Publishing Company), 223–244.


effects with Stroop-like stimuli, Simon-like tasks, and their factorial combinations. *J. Exp. Psychol. Hum. Percept. Perform.* 25, 688–714.


processing," in *Parallel Distributed Processing: Explorations in the Microstructure of Cognition*, eds D. E. Rumelhart and J. L. McClelland (Cambridge, MA: MIT Press), 45–76.

Santiago, J., Ouellet, M., Román, A., and Valenzuela, J. (2012). Attentional factors in conceptual congruency. *Cogn. Sci.* 36, 1051–1077.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 October 2012; accepted: 15 April 2013; published online: 07 May 2013.*

*Citation: Haazebroek P, van Dantzig S and Hommel B (2013) How task goals mediate the interplay between perception and action. Front. Psychol. 4:247. doi: 10.3389/fpsyg.2013.00247*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Haazebroek, van Dantzig and Hommel. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Exploring modality switching effects in negated sentences: further evidence for grounded representations

#### **Lea A. Hald<sup>1,2</sup>\*, Ian Hocking<sup>2</sup>, David Vernon<sup>2</sup>, Julie-Ann Marshall<sup>2,3</sup> and Alan Garnham<sup>4</sup>**

<sup>1</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands


<sup>4</sup> School of Psychology, University of Sussex, Falmer, East Sussex, UK

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Diane Pecher, Erasmus University Rotterdam, Netherlands

Max Louwerse, University of Memphis, USA

#### **\*Correspondence:**

Lea A. Hald, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, P.O. Box 9104, 6500 HE Nijmegen, Netherlands. e-mail: l.hald@donders.ru.nl; lea.hald@gmail.com

Theories of embodied cognition (e.g., Perceptual Symbol Systems Theory; Barsalou, 1999, 2009) suggest that modality specific simulations underlie the representation of concepts. Supporting evidence comes from modality switch costs: participants are slower to verify a property in one modality (e.g., auditory, BLENDER-loud) after verifying a property in a different modality (e.g., gustatory, CRANBERRIES-tart) compared to the same modality (e.g., LEAVES-rustling, Pecher et al., 2003). Similarly, modality switching costs lead to a modulation of the N400 effect in event-related potentials (ERPs; Collins et al., 2011; Hald et al., 2011). This effect of modality switching has also been shown to interact with the veracity of the sentence (Hald et al., 2011). The current ERP study further explores the role of modality match/mismatch on the processing of veracity as well as negation (sentences containing "not"). Our results indicate a modulation in the ERP based on modality and veracity, plus an interaction. The evidence supports the idea that modality specific simulations occur during language processing, and furthermore suggests that these simulations alter the processing of negation.

**Keywords: ERP, N400, negation, embodiment, language processing, veracity, modality, modality switch effect**

## **INTRODUCTION**

It has been demonstrated that, during reading, switching from a sentence primarily describing information in one modality to text describing information in another modality leads to an increase in processing cost (the modality switch effect, Pecher et al., 2003). Similar modality switching effects have been found across both conceptual and perceptual processing tasks (e.g., Spence et al., 2001; Marques, 2006; Vermeulen et al., 2007; Van Dantzig et al., 2008). For instance, Pecher et al. (2003) presented participants with short sentences one after another that consisted of a concept followed by a modal property (they used audition, vision, taste, smell, touch, and action). Unknown to the participants, the sentences were actually in pairs that either matched or mismatched in modality. For example, a matched auditory modality would be *Leaves can be rustling* followed by *A blender can be loud* vs. mismatched gustatory-auditory modalities *Cranberries can be tart* followed by *A blender can be loud*. Although participants were unaware that the sentences were paired, reaction times to verify whether the final word was a typical property of the concept (e.g., that loud was a typical property of the concept blender, property verification task) were faster and more accurate when the pairs of sentences matched in modality compared to pairs that mismatched. Recent evidence indicates that the modality switch effect also results in a modulation of event-related potentials (ERPs), specifically a modulation of the N400 effect (e.g., Collins et al., 2011; Hald et al., 2011; described in more detail below). An N400 is a negative deflection in the ERP that begins around 250 ms post stimulus onset and peaks around 400 ms. It is typically larger across the centro-parietal electrode sites. Broadly speaking, an N400 effect has been shown to occur to any meaningful stimuli, such as a word, picture, or sign in sign language, that is either less expected or anomalous based on the particular context or knowledge a person has about the situation (see Kutas and Federmeier, 2011, for a recent review). Typically, the modality switch effect has been explained by the idea that our conceptual system is grounded in modality specific or embodied simulations (e.g., Barsalou, 1999; Glenberg and Robertson, 1999, 2000; Zwaan, 2004; Zwaan and Madden, 2005; but, see also Louwerse and Connell, 2011, for a discussion of the influence of statistical regularities on this effect). That is, the meanings of linguistic stimuli rely on modality specific sensorimotor information or simulations. Within this framework it has been proposed that the switching cost is due to changing from one modality specific brain system to another.

The goal of the current study is to explore the modulation of the modality switch N400 effect. Specifically, we aim to explore whether this effect is sensitive to linguistic and semantic markers. By adding specific linguistic and semantic properties to the typical modality switch paradigm, we hope to better understand the timing and the automaticity of embodied cognition effects during language processing. An understanding of the timing and automaticity of embodied effects on language comprehension is necessary for building a better model of the role of embodied cognition in language processing. To realize this goal, we have added the factors negation and veracity to a typical modality switch paradigm. Additionally, we have implemented a different task for the participants.

Typically, studies looking at the modality switch effect have utilized the property verification task. As discussed above, participants have to verify that a property is "usually true" or "usually false" of a particular concept (e.g., Pecher et al., 2003). In order to explore the role of veracity and negation within this paradigm, we decided to implement the sentence verification task. Sentence verification is similar to property verification. In sentence verification, sentences are presented and subjects respond with a true or false judgment at the end of the sentence. Comparing items that work in both tasks it is clear that some items can be almost identical ("A blender can be loud"), while others can only be used in the sentence verification ("A baby drinks milk"). The advantage of using sentence verification rather than property verification is that the former has a long history of being used to investigate veracity and negation both behaviorally (for a review of the sentence verification task, see Carpenter and Just, 1975) and in ERP experiments (e.g., Fischler et al., 1983).

## **WHY VERACITY AND NEGATION?**

Veracity and negation have been studied outside of the domain of embodied cognition extensively. For veracity, it has been consistently shown that when participants are asked to judge the veracity of a sentence, true sentences are verified faster than false sentences (for example, Trabasso et al., 1971; Clark and Chase, 1972; Wason, 1980). The primary explanation for this is that readers match the relevant conceptual information provided in the sentence to either the external situation (when the task requires comparing the veracity of a sentence to a given picture) or their general world knowledge (when the task involves sentences only). When the conceptual information and external situation/world knowledge are incongruent (a false sentence) there is a slowing of responses (Carpenter and Just, 1975; see also Fischler et al., 1983). Similarly, a corresponding modulation of the N400 effect using ERPs has been seen for false sentences (e.g., Fischler et al., 1983; Hagoort et al., 2004). However, whether this comparison between information in the sentence and general world knowledge relies on an embodied representation of the sentence in order to judge veracity is not clear. Furthermore, to our knowledge no model of embodied cognition has adequately described how this comparison process may happen. This is a point we return to.

Across many experiments it has been found that sentences containing negation are verified or read slower than sentences that do not contain negation (Wason, 1959, 1980; Trabasso et al., 1971; Clark and Chase, 1972; Carpenter and Just, 1975; Singer, 2006). Furthermore, an interaction of negation and veracity has been replicated many times. Essentially, true affirmative sentences (*Six is an even number*) are verified or read faster than false affirmative sentences (*Six is an odd number*), while true negative sentences (*Six is not an odd number*) are verified or read slower than false negative sentences (*Six is not an even number*). "Two-step" theories of negation suggest that the reason that determining the truth value of a negated sentence is particularly difficult is because people have to first suppose an "inner proposition" (*Six is an odd number*) before they can apply the negation term to compute the truth value (e.g., Kintsch, 1974; Carpenter and Just, 1975; Clark and Clark, 1977; see Kaup et al., 2007a for review). A related finding has also been shown using ERPs. Specifically, negative sentences lead to a different pattern in the N400 compared to affirmative sentences (Fischler et al., 1983). Although the typical finding with affirmative sentences is a larger N400 for false, semantically incorrect sentences, for sentences containing negation it is the correct, semantically coherent sentences that lead to a larger N400 amplitude. It is often assumed that this N400 reflects the "inner proposition," prior to the point negation is actually integrated (e.g., Fischler et al., 1983). In sum, the results with both ERPs and reading times suggest that true negated sentences are more difficult to process than false negated sentences.
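The logic of the two-step account can be sketched with the four example sentences. The Python code below is a toy illustration only: the function `verify`, the property table, and the unit costs are hypothetical and are not taken from any of the cited models. It assumes one unit of cost when the inner proposition mismatches world knowledge and one unit for applying the negation.

```python
# World knowledge needed for the example sentences.
PROPERTIES = {
    "even": lambda n: n % 2 == 0,
    "odd": lambda n: n % 2 == 1,
}


def verify(number, prop, negated, mismatch_cost=1, negation_cost=1):
    """Return (sentence truth, inner-proposition truth, hypothetical cost)."""
    # Step 1: evaluate the inner proposition, ignoring any negation.
    inner_true = PROPERTIES[prop](number)
    # Step 2: apply the negation term, if present, to get the truth value.
    sentence_true = (not inner_true) if negated else inner_true
    cost = (0 if inner_true else mismatch_cost) \
        + (negation_cost if negated else 0)
    return sentence_true, inner_true, cost


sentences = {
    "Six is an even number": (6, "even", False),      # true affirmative
    "Six is an odd number": (6, "odd", False),        # false affirmative
    "Six is not an odd number": (6, "odd", True),     # true negative
    "Six is not an even number": (6, "even", True),   # false negative
}
outcomes = {text: verify(*args) for text, args in sentences.items()}
for text, (truth, inner, cost) in outcomes.items():
    print(f"{text}: true={truth}, inner proposition true={inner}, cost={cost}")
```

Under these assumptions the true negative sentence carries the highest cost, because its inner proposition is false and the negation still has to be applied, while the false negative sentence has a true inner proposition. This mirrors the reported interaction of negation and veracity.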

The only exception to the processing difficulties and ERP pattern for negation appears to be when a context is used that supports the use of negation (e.g., Wason, 1965; Wales and Grieve, 1969; Glenberg et al., 1999; Garton and Robertson, 2003; Nieuwland and Kuperberg, 2008; Tian et al., 2010). When there is an appropriate context, negation appears to be processed in a manner similar to affirmative sentences. That is, the pattern of reaction times and ERPs looks no different from what one would expect with an affirmative sentence.

Interestingly, both false sentences and negated sentences have presented complications in terms of how they are represented in an embodied framework. Barsalou (1999) describes negation as being closely related to the concept of truth. Although both negation and falsity are discussed in the context of comparing a sentence to a situation (or picture) as opposed to background knowledge about the topic, essentially Barsalou proposes that both are represented by creating absent mappings within a simulation between the relevant entities. Specifically, when making a simulation of the information in a sentence, either a false sentence or a negated sentence can lead to a simulation that fails. The marking of that failure, noting the absence of a binding between the relevant entities is what underlies the representation. For example, when simulating the sentences "It's false that there is a balloon above the cloud" and "It's true that there is not a balloon above the cloud" noting the absence of a binding between balloon and cloud is necessary in the simulation of both sentences. Based on this explanation, one might expect to find a similar ERP modulation relative to modality switching for both false sentences and sentences containing negation since according to this embodied cognition framework, they are simulated/represented in the same manner. However, the possible mechanisms of embodied veracity and negation processing have not been well explored. It is still an open question whether, and especially how, an embodied representation could support veracity judgment and negation processing.

Finding that modality switching interacts with veracity and/or negation would help us better understand how sentence processing relies on embodied cognition. Furthermore, it is possible that we see differential ERP modulation for modality switching in sentences containing negation compared to false sentences. Finding such an effect would indicate that the Barsalou (1999) account of negation and false sentences is insufficient. For these reasons, we have implemented the sentence verification task to explore modality switching in true and false sentences that contain negation.

Following a brief review of the small amount of research that exists on the embodied nature of veracity and negation, details of the current experiment will be discussed.

## **VERACITY AND MODALITY SWITCHING WITH AFFIRMATIVE SENTENCES**

The study most relevant to the current study is a recent one by Hald et al. (2011). The authors explored veracity and the modality switch effect with *affirmative* sentences. In this study, the experimental materials included both true and false modality matched and mismatched pairs (see **Table 1**).

For example, the ERPs were compared for *soft* (vs. *soft*) and *hard* (vs. *hard*) depending on the modality match/mismatch. Additionally, the ERPs to true vs. false sentences (*soft* vs. *hard*) were compared within match and within mismatch conditions.

As discussed above, in traditional ERP studies a consistently larger amplitude N400 is typically seen for words that complete a sentence in such a way as to make the truth value of the sentence false (for example, at the final word when comparing *a ham is blue* vs. *a ham is pink*; Fischler et al., 1983). However, it is unclear how or whether a match or mismatch in modality may affect the processing related to veracity in such cases. It has been suggested that when a false sentence is read, simulation fails. That is, the meaning of the sentence cannot be successfully mapped onto reality (Barsalou, 1999). Presumably at this point a new simulation is performed, somehow grounded in the failed simulation (see Barsalou, 1999, for more details on this argument). However, whether and how this actually occurs is unclear. One of the purposes of looking at false sentences in the modality switch paradigm was to shed light on the process of understanding a false sentence, and to explore how this may occur according to embodied models of cognition.

The results of Hald et al. (2011) indicated a different pattern of results for true and false sentences. Specifically, for the true sentences, switching modalities elicited a greater negativity across anterior electrodes as early as 160 ms after the onset of the critical word (*soft*). This effect was seen in three time windows: from 160 to 215 ms, from 270 to 370 ms, and again from 500 to 700 ms (see also Collins et al., 2011 for similar ERP results using the property verification task). However, for the false sentences, no significant effect of modality switching was seen. When comparing the effect of veracity (*soft* vs. *hard*) within the mismatch condition (A leopard is spotted – A peach is *soft/hard*), a typical N400 was seen for false sentences compared to true sentences. However, when the modality matched (An iron is hot – A peach is *soft/hard*), no effect of veracity was found. In so far as the N400 amplitude reflects difficulty in processing, this result suggests that the construction of a simulation in one modality aided the matching modality simulation of the target sentence. Possibly this led to the false sentences being no more difficult to comprehend than the true sentences.



Critical words are shown here in bold for clarification.

This study suggests that veracity judgments are grounded in an embodied manner. That is, when a saving can be made in the embodied simulation of the sentence by having the same modality simulated twice in a row, this leads to improved ability to judge the veracity of false sentences. Although this result indicates that embodied cognition is important for the processing of semantics related to judging truth value, it does not address whether embodied cognition plays a role in more linguistically marked aspects of language, namely negation.

## **NEGATION, EMBODIED COGNITION, AND CONTEXT**

Evidence supporting the idea that, at least at a late point in time, negation processing relies on embodied simulations comes from Kaup et al. (2007b) and Kaup and Zwaan (2003). In both studies, they assessed the accessibility of a word that was either negated or not using a recognition task. For example, in Kaup and Zwaan (2003) participants were presented with short discourses that contained a color term that was either mentioned within the scope of a negative context or not, which then led to a situation where the color was either present or not. Participants had to determine whether a color term was in the previous sentence (probe recognition task). For example, for the sentence *Sam was relieved that Laura was not wearing her pink dress*, the probe word *pink* was presented after the sentence at an early and late time delay. In this example, the color term pink was within the scope of negation and the color pink would not actually be present in the situation described. Results indicated that at the early delay probe point (500 ms delay after sentence end) response times were slower when the color term had been negated. In the late time delay (1500 ms) the response time to the color term was influenced by the content of the situation (whether the situation described meant the color would be part of the situation, for example *Sam wished that Laura was not wearing her pink dress*, where the pink dress is part of the situation vs. *Sam was relieved that Laura was not wearing her pink dress*, where the pink dress is not part of the situation). This experiment, as well as others (e.g., Kaup et al., 2006), supports the general idea that a simulation is made that notes the absence of negated information, making that information more difficult to retrieve. However, these studies only speak to the eventual representation of the negation, rather than to the ongoing process of comprehending/representing the negation as the sentence unfolds.
Furthermore, these studies support the idea that something like a simulation is built, but do not address the specifics of how/whether the simulation is grounded in perceptual, action, and emotional information (although, see Kaup et al., 2006, for results suggesting that spatial information is part of this embodied simulation, at least at a delayed point). The current study will specifically address the role of perceptual modalities in the online processing of negation. However, as discussed earlier, when it comes to negation, context matters.

Whether an early effect of negation processing appears is largely related to the context in which negation is used, that is, whether the context supports the use of negation (see Glenberg et al., 1999). The goal of the current study was to better understand what the role of a modality match/mismatch may be on the ongoing processing of negation. The current study was designed to answer the following questions: first, can we find a modality switch effect with sentences containing negation? If we do see such an effect, this suggests that sentences containing negation are grounded in perceptual systems<sup>1</sup>. Secondly, since context has been shown to affect the processing of negation, can modality information similarly change the processing of negation? Given that Hald et al. (2011) found that modality matching aided the processing of false sentences, could modality matching similarly facilitate negation processing?

#### **THE CURRENT STUDY**

The current study is based on the Hald et al. (2011) study described in Section "Veracity and Modality Switching with Affirmative Sentences." However, in addition to exploring the effect of veracity and modality switching, here the target sentences all included negation (see **Table 2** in Materials and Methods for example stimuli).

For the modality switch effect in negated sentences, we compared the ERPs time locked to the identical word in a sentence, depending on whether the previous context sentence matched or mismatched in modality. For example, we compared the ERP to the word *soft* in the true sentence *The marble isn't soft* when it was preceded by a modality matched sentence (*A summer night is balmy*) vs. when it was preceded by a modality-mismatched sentence (*A kingfisher is bright blue*; see Materials and Methods for details about the sentence materials). Finally, to explore the effect of modality and negation on veracity we compared the ERPs to true vs. false sentences within the match condition and then within the mismatch condition (for example, comparing *The marble isn't soft* vs. *The marble isn't hard* when the previous sentence matched in modality).

According to embodied accounts of cognition/language processing (e.g., Barsalou, 1999, 2009; Zwaan and Madden, 2005), as well as the previous results discussed above, we expect to see an effect of modality switching in the true sentences. However, the negation may cause this effect to be delayed. This would be in line with the delayed embodied effects in negated sentences found by Kaup et al. (2007b). For the false sentences, it is unclear whether an effect of modality switching will be seen at all given the previous results (e.g., Hald et al., 2011). Finally, it may be the case that modality matching might actually aid the processing of negation, as has been seen with discourse context (e.g., Nieuwland and Kuperberg, 2008). If that is the case, then we would expect negated false sentences to elicit greater N400 amplitudes than negated true sentences when preceded by a modality matched sentence. For the mismatched condition we would expect the true negated sentences to elicit a larger N400 amplitude than the false negated sentences, since this pattern of results is typically found when negated sentences are presented out of context (in line with Fischler et al., 1983). Overall, by examining the modality switch effect in combination with veracity and negation, a richer understanding of the parameters by which embodied cognition influences language comprehension should be achievable.

<sup>1</sup>We assume that this would also be the case for motor and emotional systems (i.e., a mismatch effect would occur for negated sentences containing action or emotional information). However, as the current study only looks at perceptual modalities, we can only conclude that sentences containing negation are grounded in perceptual systems.

Critical words are shown here in bold for clarification.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Sixteen participants were initially recruited from the Psychology undergraduate cohort attending Canterbury Christ Church University and took part in the study. Of these, three were eliminated during the filtering of target EEG events due to a large amount of data loss (i.e., a loss of more than one third of target events). A further two participants were excluded from the final analysis because their EEG recordings exhibited excessive artifacts resulting in the loss of a large number of trials (i.e., a loss of more than one third of trials), leaving a final sample of eleven participants (seven females, aged 18–32, mean 21.1; four males, aged 18–26, mean 22.5). Participants were awarded course credit for completing the study and all had normal or corrected-to-normal vision, were right-handed, native English speakers, and had not been diagnosed with reading or speaking difficulties.

Ethical approval for the study was granted by Canterbury Christ Church University's Faculty Research Ethics Committee and all participants provided written consent prior to taking part in the study.

#### **STIMULUS MATERIAL AND DESIGN**

Materials comprised 160 pairs of experimental sentences consisting of an initial sentence, referred to as the Modality Context sentence and a second Target sentence. The Modality Context sentences were always true non-negated statements and were evenly divided into those that described either a visual (50%) or haptic property (50%) of an object. The Modality Context sentences were a subset of items that have been previously rated as more salient in one modality than others (see Pecher et al., 2003; Van Dantzig et al., 2008; Lynott and Connell, 2009).

The target sentences were always negated (e.g., Rice isn't black/white) and in half of the trials their modality matched that of the Modality Context sentence, with veracity of the target sentence equally balanced. Hence, modality-match/-mismatch and target sentence veracity were fully crossed creating 40 pairs of modality matched true sentences, 40 pairs of modality matched false sentences, 40 pairs of modality-mismatched true sentences, and 40 pairs of modality-mismatched false sentences.
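
The fully crossed design described above can be sketched as follows. This is only an illustration of the 2 × 2 structure with 40 sentence pairs per cell; the names `build_design` and `PAIRS_PER_CELL` are hypothetical and do not come from the authors' materials.

```python
from collections import Counter
from itertools import product

PAIRS_PER_CELL = 40  # from the text: 40 sentence pairs per condition


def build_design(pairs_per_cell=PAIRS_PER_CELL):
    """Return one record per sentence pair for the Modality x Veracity design."""
    design = []
    for modality, veracity in product(("match", "mismatch"), ("true", "false")):
        for item in range(pairs_per_cell):
            design.append(
                {"modality": modality, "veracity": veracity, "item": item}
            )
    return design


design = build_design()
# Count how many pairs fall into each of the four cells.
cell_counts = Counter((d["modality"], d["veracity"]) for d in design)
print(len(design), dict(cell_counts))
```

Counting the cells confirms that the four conditions together account for all 160 experimental sentence pairs.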

False versions of the negated target sentences were created using words that were independently rated as the opposite of the salient modality feature of the object (see Hald et al., 2011). For example, in the true negated visual target sentence "Rice isn't black" the salient visual feature of "black" was replaced with "white" (see **Table 2** for example stimuli; a full list of negated sentences is available on request).

To ensure that there was an equal number of affirmative and negative sentences, an additional 160 filler sentences were constructed. These comprised 80 affirmative and 80 negative sentences. Half of the filler sentences contained strong modality related properties, using tactile, visual, auditory, and gustatory modality related information. The remaining half was not based on modality specific information but merely contained highly related words conveying false information (e.g., "A ball is refereed"; see Pecher et al., 2003 for similar use of semantically related items). However, it was not possible to match the number of sentence pairs that were context negative-target affirmative with those that were context affirmative-target negative. Such a procedure would have required an additional 80 sentence pairs, which would also have increased the duration of the task and in all likelihood led to a reduction in participant motivation and engagement levels. Thus, given that the participant remained unaware of the fact that sentences were presented in pairs, it seemed more important to control for the absolute number of affirmative and negative sentences and true and false sentences.

The critical words were matched on a number of measures including: (i) word log (lemma) frequency (true-matched modality: 2.37; true-mismatched modality: 2.37; false-matched modality: 2.32; and false-mismatched modality: 2.32, from Baayen et al., 1993); (ii) word length (true-matched modality: 4.5 letters; true-mismatched modality: 4.5 letters; false-matched modality: 4.7 letters; and false-mismatched modality: 4.7 letters); and (iii) word class (all adjectives). In addition, none of the critical words was over 12 letters in length.

The pairs of sentences were presented in a pseudo-randomized order specific to each participant (created using Mix; Van Casteren and Davis, 2006) using a fully within participants design. The use of a within participants design meant that the findings from this study could be easily compared to previous similar designs (e.g., Fischler et al., 1983; Pecher et al., 2003; Hald et al., 2011).

## **PROCEDURE FOR THE ERP STUDY**

After reading an information sheet, participants completed a short questionnaire asking about language background, basic health, and handedness. They then completed a standard consent form and began the experiment. Each participant was tested individually in a quiet room, seated in a comfortable chair approximately 70 cm from the computer monitor, and was asked not to move or blink during the presentation of the sentences. Participants were asked to read each sentence for comprehension and decide whether it was true or false.

The stimuli were presented using the E-Prime 2.0 (Schneider et al., 2002) stimuli presentation platform. Each session began with a practice block of 10 sentences, which were similar in nature to the experimental items. At the end of the practice block, the participant had the opportunity to ask questions relating to the task. The remaining sentences were then split into six blocks, each lasting for approximately 12 min, with a short break between blocks. Each block began with two filler items, which were similar in nature to the experimental items. These filler items were included to minimize the potential loss of data due to artifacts resulting from beginning a task.

Each trial began with a fixation point ("+++") displayed for 1 s in the center of the screen. Participants were told that they could blink their eyes during the fixation display if needed, but to be prepared not to blink during the upcoming sentence. After a variable time delay (randomly varying across trials from 300 to 450 ms), the sentence was presented word by word in white lowercase letters (Courier New, 18-point font) against a black background. The first word and any proper noun were capitalized and the final word of each sentence was followed by a full stop. Words were presented for 200 ms with a stimulus onset asynchrony of 500 ms. Following presentation of the final word in each sentence the screen remained blank for 1000 ms after which three question marks appeared, along with the text, "1:true" and "5:false." Participants needed to respond by pressing either the "1" or the "5" on the number keypad of a standard keyboard to indicate whether they thought the sentence was true or false. The association between number and veracity was counterbalanced so that for all participants, half of the time the number 1 indicated true and half the time the number 5 indicated true. If participants responded incorrectly the feedback message "Wrong Answer" was displayed and if they took more than 3000 ms to respond the feedback message "Too Slow" was displayed. Exactly the same presentation procedure was used for context and target sentences so that participants remained unaware that sentences were presented in pairs. Following the experiment all participants were debriefed.
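
The trial timing described above can be laid out as a simple schedule. The sketch below is illustrative only (the experiment itself was run in E-Prime); the function name `trial_schedule` is hypothetical, and two details are assumptions rather than statements from the text: that the variable delay follows the 1 s fixation display, and that the 1000 ms blank interval starts at the offset of the final word.

```python
FIXATION_MS = 1000       # fixation "+++" shown for 1 s
WORD_MS = 200            # each word visible for 200 ms
SOA_MS = 500             # onset-to-onset interval between successive words
POST_SENTENCE_MS = 1000  # blank screen before the response prompt


def trial_schedule(words, delay_ms):
    """Return ([(word, onset_ms, offset_ms), ...], prompt_onset_ms).

    All times are in ms relative to fixation onset.
    """
    if not 300 <= delay_ms <= 450:
        raise ValueError("delay_ms should lie in the 300-450 ms range")
    # Assumption: the variable delay is added after the fixation display.
    first_onset = FIXATION_MS + delay_ms
    events = []
    for i, word in enumerate(words):
        onset = first_onset + i * SOA_MS
        events.append((word, onset, onset + WORD_MS))
    # Assumption: the blank interval is timed from the final word's offset.
    prompt_onset = events[-1][2] + POST_SENTENCE_MS
    return events, prompt_onset


events, prompt_onset = trial_schedule(["Rice", "isn't", "black."], delay_ms=300)
for word, onset, offset in events:
    print(word, onset, offset)
print("prompt at", prompt_onset)
```

With a 200 ms word duration and a 500 ms stimulus onset asynchrony, each word is followed by a 300 ms blank interval before the next word appears.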

## **EEG RECORDING AND ANALYSIS**

The EEG was recorded using a 64-channel WaveGuard Cap utilizing sintered Ag/AgCl electrodes connected to an ANT amplifier (ANT, Enschede, Netherlands). An average reference was used. The electrodes were placed according to the 10–20 standard nomenclature (Jasper, 1958) over midline (FPz, Fz, FCz, Cz, CPz, Pz, POz, and Oz), lateral (Fp1, Fp2, AF3, AF4, AF7, AF8, F1, F2, F3, F4, F5, F6, F7, and F8), fronto-central (FC1, FC2, FC3, FC4, FC5, and FC6), central (C1, C2, C3, C4, C5, and C6), temporal (FT7, FT8, T7, T8, TP7, and TP8), centro-parietal (CP1, CP2, CP3, CP4, CP5, and CP6), parietal (P1, P2, P3, P4, P5, and P6), and occipital (PO3, PO4, PO5, PO6, PO7, PO8, O1, and O2) positions. The signals were digitized online with a sampling frequency of 512 Hz and bandpass filtered from 0.01 to 100 Hz. Electrode impedance was maintained below 10 kΩ.

Analysis was conducted using ASA (ANT, Enschede, Netherlands) software. EEG data were initially screened for potential artifacts in a critical window ranging from −100 to 1000 ms post stimulus onset. Trials containing artifacts were excluded from further analysis, which resulted in 90.83% of epochs being included.

## **RESULTS**

An overview of nine representative electrodes (out of 64 total electrodes) is shown in **Figures 1** and **2**. **Figure 1** shows the effect of modality for true sentences. **Figure 2** shows the same effect for false sentences.

Based on established effects from the literature, together with a visual inspection of the ERP waveforms, we divided the analysis into the following time windows after critical word onset: 190–300 ms to capture the N1–P2 complex, 325–400 ms to capture a smaller peaked N400 effect, 300–500 ms to encompass the N400 window, and 600–850 ms for late effects. The results for each time window are discussed in turn below. **Figures 1** and **2** illustrate the effect of modality switching for each of these four time windows.
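
To make the windowing concrete, the sketch below shows one common way to summarize an epoch within each of the four time windows, using the 512 Hz sampling rate and the −100 to 1000 ms epoch reported above. The use of mean amplitude as the dependent measure is an assumption (the text does not state the measure explicitly), and the names `WINDOWS_MS`, `sample_times`, and `window_mean` are hypothetical.

```python
SFREQ_HZ = 512         # sampling frequency reported in the text
EPOCH_START_MS = -100  # epoch starts 100 ms before critical word onset

# Time windows (ms after critical word onset) taken from the text.
WINDOWS_MS = {
    "N1-P2": (190, 300),
    "early N400": (325, 400),
    "N400": (300, 500),
    "late": (600, 850),
}


def sample_times(n_samples, sfreq=SFREQ_HZ, start_ms=EPOCH_START_MS):
    """Return the latency in ms of each sample in an epoch."""
    return [start_ms + i * 1000.0 / sfreq for i in range(n_samples)]


def window_mean(epoch, window, sfreq=SFREQ_HZ, start_ms=EPOCH_START_MS):
    """Mean amplitude of the samples whose latency falls inside the window."""
    lo, hi = window
    times = sample_times(len(epoch), sfreq, start_ms)
    values = [v for t, v in zip(times, epoch) if lo <= t <= hi]
    return sum(values) / len(values)


# Synthetic single-channel epoch (hypothetical data): 1 µV between
# 300 and 500 ms, 0 µV elsewhere; 564 samples cover -100 to ~1000 ms.
epoch = [1.0 if 300 <= t <= 500 else 0.0 for t in sample_times(564)]
means = {name: window_mean(epoch, win) for name, win in WINDOWS_MS.items()}
print(means)
```

Note that the 325–400 ms window lies entirely inside the 300–500 ms window, so the two measures are not independent of each other.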

For each time window, a fully within participants three-way analysis of Modality switch (match, mismatch), Veracity (true, false), and Region (anterior, posterior) was conducted. This was followed by planned comparisons of (i) Modality switch for true sentences, (ii) Modality switch for false sentences, (iii) Veracity for matched sentences, and (iv) Veracity for mismatched sentences.
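
For a planned comparison between two levels of a within-participants factor, the F statistic with (1, n − 1) degrees of freedom equals the square of the paired-samples t statistic. The sketch below illustrates that relationship; it is not the authors' analysis pipeline, the error term used in their software is not specified in the text, and the amplitudes are hypothetical values for eleven participants.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(cond_a, cond_b):
    """Return (F, (df1, df2)) for a two-level within-participants contrast.

    F is computed as the squared paired-samples t statistic.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical per-participant mean amplitudes (µV), one value per participant.
mismatch = [0.9, 1.1, 0.4, 0.8, 1.3, 0.7, 0.6, 1.0, 0.9, 0.5, 1.2]
match = [0.2, 0.3, 0.1, 0.4, 0.2, 0.0, 0.3, 0.1, 0.2, -0.1, 0.4]

f_value, df = paired_f(mismatch, match)
print(f"F({df[0]}, {df[1]}) = {f_value:.2f}")
```

With eleven participants the contrast has (1, 10) degrees of freedom, matching the degrees of freedom reported for the ERP analyses.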

#### **FIRST TIME WINDOW: 190–300 MS**

This time window was selected to examine the N1–P2 complex. In the overall 2 × 2 × 2 analysis, a main effect of Modality switch was found [*F*(1, 10) = 5.04, MSE = 0.27, *p* < 0.05], where different modality sentences evoked greater positivity than same modality (0.292 vs. 0.242µV, difference 0.05). A Modality switch by Region interaction [*F*(1, 10) = 5.19, MSE = 7.79, *p* < 0.05] was also found, as well as a Modality switch by Region by Veracity interaction [*F*(1, 10) = 6.35, MSE = 8.79, *p* < 0.05].

We investigated both interactions using a simple main effects analysis. For true sentences alone, a Modality switch effect was found across frontal electrodes [*F*(1, 10) = 29.79, MSE = 2.00, *p* < 0.001], where a greater positivity was seen for modality mismatch than match (0.837 vs. 0.180µV, difference 0.657). Staying with the true sentences, the Modality switch effect was reversed for the posterior electrodes [*F*(1, 10) = 25.33, MSE = 1.48, *p* < 0.01; −0.244 vs. 0.277µV, difference −0.521; see **Figure 3**]. For false sentences, no effect of modality switch was found.

Similarly, for modality matched sentences, no effect of Veracity was found. For modality-mismatched sentences, however, a marginal effect of Veracity was found for the frontal electrodes [*F*(1, 10) = 4.23, MSE = 6.75, *p* = 0.067], where true sentences evoked a greater positivity than false (0.837 vs. 0.382µV, difference −0.455). This effect was reversed in the posterior region (−0.244 vs. 0.192µV, difference 0.436) but failed to reach significance [*F*(1, 10) = 3.52, MSE = 7.42, *p* = 0.090].

#### **SECOND TIME WINDOW: 325–400 MS**

This window was selected to examine early but brief N400-like effects. The overall analysis for this window showed only interactions of Veracity by Region [*F*(1, 10) = 7.91, MSE = 17.61, *p* < 0.05] and Veracity by Region by Modality switch [*F*(1, 10) = 9.31, MSE = 19.20, *p* < 0.05].

These interactions were investigated using simple effects. For true sentences, we found a marginally significant effect of Modality switch [*F*(1, 10) = 4.28, MSE = 20.72, *p* = 0.065] in the frontal electrodes, where mismatch showed a greater positivity than match (0.929 vs. 0.126µV, difference 0.803). This effect was reversed in the posterior electrodes [*F*(1, 10) = 5.30, MSE = 10.30, *p* < 0.05], where greater negativity appeared for the modality-mismatched sentences (−0.590 vs. 0.040µV, difference −0.63; see **Figure 5** below). For false sentences, no effect of Modality switch was found across the frontal electrodes, but was found for the posterior electrodes [*F*(1, 10) = 5.20, MSE = 5.51, *p* < 0.05], where greater positivity was associated with mismatch compared to match (0.465 vs. 0.009µV, difference 0.456). See **Figure 4** for a topographic illustration of this effect.

True-Mismatched condition, the green line shows the True-Matched condition. The limits of each of the four time windows for analysis are indicated (1 = 190–300 ms; 2 = 325–400 ms; 3 = 300–500 ms; 4 = 600–850 ms).

**FIGURE 2 | Event-related potential traces for false sentences for nine selected sites across the scalp, time locked to onset of the critical word (presented at 0 ms).** Negative activation is plotted up. The blue lines show the False-Mismatched condition, the black line shows the False-Matched condition. The limits of each of the four time windows for analysis are indicated (1 = 190–300 ms; 2 = 325–400 ms; 3 = 300–500 ms; 4 = 600–850 ms).

**difference.** Blue hues indicate negative potentials, red hues positive potentials. The two conditions shown are True-Mismatch **(A)** and True-Match **(B)**.

We investigated Veracity for matched sentences and found no effect in either the frontal or posterior regions. However, for mismatched sentences, we found a frontal effect for Veracity [*F*(1, 10) = 24.71, MSE = 6.62, *p* < 0.01], where false sentences elicited a greater negativity than true (−0.162 vs. 0.929µV, difference −1.091), as well as a reversed posterior effect [*F*(1, 10) = 36.00, MSE = 4.26, *p* < 0.001], where greater negativity was associated with true sentences vs. false (0.465 vs. −0.590µV, difference 1.055). This is illustrated in **Figures 6** and **7** below.

#### **THIRD TIME WINDOW: 300–500 MS**

We chose this time window to examine N400-like effects. The overall analysis found a main effect of Modality switch [*F*(1, 10) = 5.59, MSE = 0.37, *p* < 0.05], where overall matched sentences showed greater negativity than mismatched (0.100 vs. 0.161µV, difference 0.061), as well as a Veracity by Region interaction [*F*(1, 10) = 7.78, MSE = 24.12, *p* < 0.05] and a Veracity by Region by Modality switch interaction [*F*(1, 10) = 8.54, MSE = 16.41, *p* < 0.05].

Using simple main effects, we examined Modality switch in the frontal region for true sentences, finding a main effect [*F*(1, 10) = 6.67, MSE = 12.64, *p* < 0.05], where mismatch showed the greater positivity (0.857 vs. 0.074µV, difference 0.783; see **Figure 5**).

This effect was reversed for the posterior region [*F*(1, 10) = 8.63, MSE = 6.35, *p* < 0.05], where greater negativity was associated with mismatched sentences (−0.509 vs. 0.122µV, difference −0.631). For false sentences, there was no effect of Modality switch in either the frontal or posterior regions.

There was no effect of Veracity for modality matched sentences in the frontal or posterior regions. However, for mismatched sentences, we found a marked positivity for true sentences in the frontal region [*F*(1, 10) = 25.24, MSE = 6.77, *p* < 0.05; 0.857µV vs. −0.258, difference −1.115]. For the posterior region, this effect was reversed [*F*(1, 10) = 30.05, MSE = 5.18, *p* < 0.001; 0.555µV vs. −0.509, difference 1.064]. **Figure 6** illustrates this veracity effect at site CPz and **Figure 7** illustrates a topographical plot of the veracity effect.

#### **FOURTH TIME WINDOW: 600–850 MS**

This time window was chosen to examine late positive effects. In the overall analysis, we found interactions of Veracity by Modality switch [*F*(1, 10) = 3.71, MSE = 0.64, *p* < 0.05], and Veracity by Modality switch by Region [*F*(1, 10) = 7.06, MSE = 30.12, *p* < 0.05]. See **Figures 1** and **2** for a plot of representative electrodes within this time window and **Figure 8** for a topographical plot of this effect.

Simple main effects were used to examine the effect of Modality switch in the frontal region for true sentences. We found a marginally significant effect of Modality switch [*F*(1, 10) = 4.59, MSE = 20.98, *p* = 0.058], where matched sentences showed a greater negativity than mismatched (−0.883 vs. −0.046µV, difference 0.837). In the posterior region, the direction of this relationship was reversed but did not reach significance (*p* = 0.08). We repeated these analyses for false sentences and found a significant effect of Modality switch in the frontal region [*F*(1, 10) = 5.27, MSE = 8.50, *p* < 0.05], where a greater negativity was seen for mismatched sentences (−0.752 vs. −0.181µV, difference −0.571). For the posterior region, this effect was reversed but did not reach significance (*p* = 0.072).

**FIGURE 5 | Event-related potential in microvolts across the scalp at 364 ms post onset of the critical word, approximately at the peak of the difference.** Blue hues indicate negative potentials, red hues positive potentials. The two conditions shown are True-Mismatch **(A)** and True-Match **(B)**.

We next examined the effect of Veracity in the frontal region for modality matched sentences but found no reliable differences. Similarly, we found no reliable veracity differences in modality-mismatched sentences for either region.

#### **REACTION TIME DATA**

Participants made a true/false judgment after each sentence was presented. We include an analysis of the reaction time data here for completeness. However, since the reaction time data of Pecher et al. (2003) included three times as many participants per condition as we have used here, we do not necessarily expect to have enough power to detect all differences. Additionally, in order to minimize movement artifacts during the critical word, participants were not required to give a speeded response (see Procedure for the ERP Study for details). All data were trimmed using a non-recursive criterion of 2.5 SD from the mean, which resulted in a loss of 2.5% (40/1600) of trials. Means and standard deviations are presented in **Table 3**. A mixed effects regression model was used to take into account the effects of both participants and items on response times (modeled as random intercept effects, cf., Janssen, 2012). For this analysis, we included all correct responses to target sentences. The fixed effect of Modality showed a trend [*F*(1, 1318) = 2.78, *p* = 0.096] whereby slower response times were seen in the Matched compared to Mismatched condition (739.6 and 716.6 ms; *d* = 0.033). There was no fixed effect of Veracity [*F*(1, 1322) = 2.42, *p* = 0.120] or Modality × Veracity [*F*(1, 1323) = 0.92, *p* = 0.337] interaction. The random intercept effect of Participants was significant (*Z* = 2.09, *p* < 0.05) and there was a similar trend for Items (*Z* = 1.65, *p* = 0.098). These random intercepts are to be expected, are of no direct theoretical relevance and hence are not discussed. Accuracy was consistently high across all conditions: True-Match 83.5% accurate; True-Mismatch 85.25% accurate; False-Match 86.25% accurate; and False-Mismatch 85.75% accurate. A mixed ANOVA revealed no main effects of Modality [*F*(1, 9) = 0.12, *p* = 0.74], Veracity [*F*(1, 9) = 0.46, *p* = 0.52], or interaction between Modality and Veracity [*F*(1, 9) = 0.56, *p* = 0.47].
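
The non-recursive 2.5 SD trimming mentioned above can be sketched as a single-pass filter. This is a minimal illustration, assuming the criterion is applied to one pooled set of response times; the text does not say whether trimming was done overall, per participant, or per condition, and the function name `trim_rts` and the response times are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, criterion_sd=2.5):
    """Drop response times more than criterion_sd SDs from the mean.

    Non-recursive: the mean and SD are computed once on the full data
    and are not recomputed after outliers have been removed.
    """
    m, s = mean(rts), stdev(rts)
    lo, hi = m - criterion_sd * s, m + criterion_sd * s
    return [rt for rt in rts if lo <= rt <= hi]


# Hypothetical response times (ms): twenty typical trials and one very slow one.
rts = [700.0] * 20 + [5000.0]
trimmed = trim_rts(rts)
print(len(rts) - len(trimmed), "trial(s) removed")
```

A recursive procedure would repeat the filter until no further trials are excluded; the single pass shown here corresponds to the non-recursive variant named in the text.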

## **DISCUSSION**

The goal of the current study is to explore the modulation of the modality switch N400 effect. Specifically we hope to understand whether this effect is sensitive to linguistic and semantic markers. To realize this goal, we have added the factors negation and veracity to a typical modality switch paradigm, but in this case while simultaneously recording ERPs. We hoped to shed light on two questions. First, do sentences containing negation show a modality switch effect? Finding such an effect would suggest that sentences containing negation are grounded in perceptual systems. Secondly, previous studies on negation indicate that context can affect the processing of negation. Specifically, when negation is used within a supporting context, processing costs of using negation are minimal. Can matching modality information similarly change the processing of negation? In short, our results indicate that the answer to both of these questions is "yes." Sentences containing negation do show a modality switch effect similar to that seen with affirmative sentences. Additionally, the effect of veracity suggests that matching modality information can affect the processing of negation. Specifically, we see a different N400 pattern for veracity when modality matches, but a standard N400 pattern to veracity when modality mismatches. The details of these effects are discussed below in turn. Finally, in the Section "Conclusion," we speculate what the current results may mean in terms of the role of embodied simulation in language comprehension more generally.

**Table 3 | Average reaction time (ms) and standard deviation for the true/false judgments on target sentences.**

#### **MODALITY SWITCH EFFECT FOR TRUE SENTENCES**

The modality switching results of the negated true sentences parallel previous results found with affirmative sentences (e.g., Collins et al., 2011; Hald et al., 2011). An effect of switching modalities was found in all four of the time windows. Specifically, as early as 190 ms after the onset of the critical word (Time window 1), true-mismatched modality sentences led to a greater negativity across the posterior electrodes compared to true-matched modality sentences. This greater negativity for the true-mismatched modality sentences continued across the posterior electrodes, resulting in significant differences in the time windows 300–500 ms (as well as 325–400 ms) and again from 600 to 850 ms. Additionally, across the frontal-central electrodes, it was the true-matched modality sentences that showed greater negativity rather than the true-mismatched modality sentences. This same overall ERP pattern was seen with the true-mismatch vs. true-match sentences in the comparable affirmative experiment (Hald et al., 2011). Likewise, as in Hald et al. (2011), no significant effect of modality was seen in the reaction times for true sentences.

This modality switch effect has been previously explained in terms of the idea that our conceptual system is grounded in modality specific or embodied simulations (Pecher et al., 2003; Hald et al., 2011; see below for more details); the current finding extends the role of embodied simulations to the immediate processing involved in negated sentences. This is interesting for at least three reasons. First, although some previous behavioral evidence suggests that negation is represented in an embodied fashion, via a simulation that notes the absence of negated information (e.g., Kaup and Zwaan, 2003), none of the previous studies have addressed the role of perceptual modalities on negation processing as the sentence unfolds online. Our results indicate that the role of embodied simulation on negation processing can be immediate and online, rather than a delayed process.

Secondly, the embodied account of negation and false sentences as described by Barsalou (1999) would predict similar ERP modulations for the modality switching for both false and negative sentences. That prediction is not supported here. If negated sentences showed a similar pattern to false sentences, one might expect that the effect of modality switching on negated true sentences would be very different than what was found for modality switching in affirmative true sentences. We do not find that here. Instead it appears that modality switching effects are quite similar regardless of whether the sentences are affirmative or negative.

Lastly, the current study provides an additional demonstration of an N400-like effect being sensitive to modality switching. Possibly the amplitude of the ERP in this case serves as an indicator of the ease or difficulty of retrieving stored conceptual knowledge related to a word. This modulation may depend on both the stored conceptual representation as well as the previous contextual information (see Kutas et al., 2006). For example, when a visual context is followed by the target sentence "*Rice isn't*. . .," participants are likely to form expectations that may be biased by the visual context which leads to a simulation which is biased to new visual information. When the sentence continues with a "visual" word, the word is immediately integrated in the simulation. However, when a tactile word is displayed the modality of the simulation has to be changed which leads to the modality switch effect and the observed negativity in the ERP. Before discussing alternative explanations of the modality switching effect in true sentences, a short discussion of our modality switching results with false sentences is necessary.

#### **MODALITY SWITCH EFFECT FOR FALSE SENTENCES**

In the current study we have seen a small but significant effect of modality switching for false-mismatched modality sentences compared to false-matched modality sentences in the posterior electrodes in the time window 325–400 ms after critical word onset (Time window 2). Interestingly, the effect of modality switching for the false sentences is opposite to that seen here for true sentences. False sentences led to a greater N400-like effect for the match modality condition compared to the mismatch condition. With the true sentences the mismatch led to a greater N400-like effect compared to the match condition. This finding is different than what was previously seen with false affirmative sentences (Hald et al., 2011), where no significant effect of modality switching was found. However, it should be noted that the pattern of results for the false sentences in the Hald et al. (2011) study mirrors that seen here; it is simply that the effect was not sufficiently robust to reach significance.

Why the effect for false sentences is significant in one study but not in the other cannot yet be fully explained and as such is worth exploring further in future research. However, since the significant effect for false sentences is quite small (a 0.46 amplitude difference) as well as occurring only in a short time window (325–400 ms), it seems likely that the modality switch effect may be more difficult to find when using false sentences. Previously, we discussed the possibility that the null effect with false sentences may be due to a simulation of the sentence failing (see Hald et al., 2011). Specifically, we assumed that participants compared the information from the simulation of the false sentence to background knowledge they have, and when the simulation did not match background knowledge, the simulation failed (also, see Barsalou, 1999, for a discussion of simulations failing with false sentences). However, it was felt at the time that this was not an entirely adequate explanation of falsity, since it seems that making the simulation of the false sentence itself would still show some benefit of a modality match. We felt a more reasonable explanation was that when participants tried to simulate "*the cellar is light*" (an affirmative false sentence example from Hald et al., 2011) out of context they were unable to immediately activate the relevant perceptual/action/emotion information due to limited experience with the information in the sentences. Essentially we claimed that such simulations take longer out of context, and the modality switch effect, being a small and subtle effect, is not observed in this case. However, in the current study with false sentences, we did observe such a modality switch effect. This may be due to the negation itself changing the type of perceptual information that is included in the sentences that needs to be simulated. With negated sentences, the individual lexical items that make up the sentences are concepts that we have had extensive experience of being paired together. To illustrate this, take the "*Rice isn't white*" example. Rice typically *is* white. Given this, it may be the case that it is this relationship between the two concepts that allows participants to more quickly simulate the false sentence, which leads to the small, but significant effect of modality switching.
The difference in the direction of the effect with regard to false sentences (false-match sentences leading to greater negativity compared to false-mismatch sentences as opposed to true-mismatch sentences leading to greater negativity compared to true-match sentences) may simply be an indication of the falseness of the sentence, but at this point further research is needed to better understand why false sentences lead to the opposite effect in the ERPs compared to true sentences.

Overall, the results indicate an effect of modality switching on the ERPs regardless of whether the sentences are true or false, but the specific effect differs depending upon veracity (true sentences leading to a larger N400 for mismatch compared to match pairs; false sentences leading to a larger N400 for match compared to mismatch pairs). Essentially it seems that when the reader is in the visual modality, they can easily predict/expect from "Rice isn't. . ." anything that is in the visual modality except "white." "White" is particularly unexpected in this context and therefore produces a larger N400. ERP results for modality switching with affirmative sentences are somewhat similar to ERP modulations that have been found for pictures and combined sentence-picture stimuli (Barrett and Rugg, 1990; Ganis et al., 1996). Essentially, we again found a very similar effect. This suggests that negative sentences, like affirmative sentences that refer to a highly salient physical aspect of an object induce ERP effects that are comparable to those effects that have been obtained with pictures (see Hald et al., 2011, for a more detailed discussion of the parallel between results obtained with pictures and those obtained with sentences). Overall, interpreting these results within an embodied cognition framework would suggest that our participants generated a mental simulation of the properties of the object (*Rice isn't black*), which produced activation that is very similar to actually seeing the object. An intriguing direction for future research would be to determine whether the visual sentences show this effect more robustly than the tactile sentences, as might be predicted by this explanation. Furthermore, by examining other modalities as well as possible actions and emotions we may be able to find specific ERP signatures that are related to the particular modality/action or emotion being simulated. Some suggestion that this may indeed occur comes from Collins et al. 
(2011), where differential ERP effects were seen for modality switching for visual vs. auditory properties.

## **VERACITY RESULTS FOR MODALITY MISMATCH**

With affirmative sentences a larger N400 is typically seen for false sentences compared to true sentences (e.g., Hagoort et al., 2004). However, for negative sentences without a context, the pattern of results typically reverses (e.g., Fischler et al., 1983). True sentences lead to a larger N400 compared to false sentences. This suggests that as far as the N400 is immediately sensitive to integrating words into the higher-order representation, people appear to be at first only considering something like *rice* + *black* when trying to comprehend single sentences containing negation (as in the "two-step" theories of negation discussed in the introduction). In line with this idea, we found that the true-mismatched negated sentences elicited a larger N400 amplitude than the false-mismatched negated sentences in Time windows 2 and 3. No differences were seen across the other two time windows. Similarly, in the reaction times we saw that overall (collapsing across modality match/mismatch) false sentences were responded to faster than true sentences. No interaction was seen with modality, an issue that will be discussed below.

It has been suggested that for negation to be processed immediately, like affirmative sentences, a context of plausible denial is necessary (Wason, 1965). A context of plausible denial is when one negates something that may have been mistakenly believed (e.g., "The nurse was not a woman"). In the current study, for the modality-mismatched sentences there was no context to aid the processing of negation, and therefore it is not surprising we obtained results compatible with negation not being immediately processed. As outlined above, when the critical word "black" comes in, the modality of the simulation needs to change and this causes a delay. By the time of the N400 window, the modality of the simulation may have switched to visual, but given the negation without context, the simulation is essentially based on "rice" and "black" at this stage and therefore a standard N400 for negated sentences is observed.

## **VERACITY RESULTS FOR MODALITY MATCH**

Independent of the explanation of the modality switching effect, our results for matching modality with negated sentences are interesting for another reason. As discussed in the introduction, studies have typically shown that when negated sentences are presented without a discourse context, it is the true sentences that elicit a greater N400 than the false sentences. For our modality matching sentences, this typical finding disappears. Instead we see no difference between the true and false sentences when the modality matches, in any of the four time windows. This is the same pattern as we found with the affirmative sentences (Hald et al., 2011), where no effect of veracity was seen when the modality matched. Why might such a robust effect as veracity disappear when the modalities match? We offer the following tentative hypothesis. In the matched modality case, after a simulation that highlights, for example, visual features (*A giraffe is spotted*), simulating *Rice isn't white/black* benefits from also being in the visual modality. As discussed in the introduction, it is often assumed that in order to determine the truth value of negative sentences, people have to first suppose an "inner proposition," in this case something like "*Rice is white/black*." Our results suggest that if an early representation like this occurs, it is more likely to be an embodied simulation than a proposition *per se*. Therefore the modality match allows for a richer simulation to arise more quickly, potentially making both true and false sentences equally easy to process. We propose that this is likely the reason why no N400 difference is seen for the true negated condition compared to the false negated condition, similar to what was seen for affirmative sentences. In combination with the results on affirmative sentences, we propose that matching modality allows for a quicker and broader simulation of relevant properties of the sentence, including support for less likely properties of the sentence, hence making even a typically false property of an object easier to process (see Hald et al., 2011 for a more detailed explanation of how this works with affirmative sentences).

However, facilitating a simulation in itself does not remove the difficulty of processing negation. If that were the case, the false negated sentences should have elicited a larger amplitude N400 than the true negated sentences. Instead we see no difference between the true and false negated sentences. This is also what we found with affirmative sentences (Hald et al., 2011). It seems likely that modality matching facilitates early processes related to veracity judgment, possibly before negation has been fully taken into account and before the final veracity judgment has been decided. In other words, it seems that the modality match allows for a simulation that is more "accepting" of a wider variety of possible properties of an object, including the less typical ones. However, this is just speculation at this point and clearly needs to be followed up.

The reaction time pattern corresponds to this ERP pattern. The reaction time difference between true and false sentences appears much smaller in the modality matching condition than in the mismatch condition. However, whilst a trend in response times between modalities was observed, it was not robust, and given the lack of any interaction between modality and veracity we can only speculate that with more participants this pattern may more closely match the pattern seen in the ERPs. As noted in the Results section, we were not expecting a significant effect of modality in the reaction times.

Overall, these results suggest that modality matching modulates the veracity-related N400. This may be similar to how discourse context can modulate the effect of veracity (see Hald et al., 2007). How important modality information is for negation processing is still not fully understood, but it may be the case that modality matching information can act as a form of context, like plausible denial, which allows for more immediate processing of negation. Essentially, if readers expect to stay within a modality, that limits what can be said and hence what might be negated.

## **ALTERNATIVE EXPLANATIONS FOR MODALITY MATCH EFFECTS**

Although our results fit well with the idea that readers are creating an embodied simulation grounded in the perceptual systems, whereby a mismatch in perceptual information in the sentences leads to a greater processing load, there are alternative explanations for this finding. One alternative explanation of modality switching is based on the organization of the linguistic semantic system. It could be the case that the linguistic semantic system is organized in such a way that it is sensitive to modality information, but is still symbolic. This would mean that this effect is not due to activation in modality specific regions in the brain, but instead is due to a type of semantic priming, that is, semantic priming based on modalities rather than semantic association. In the original study by Pecher et al. (2003) the authors attempted to rule out this possibility by conducting a control study where they looked for modality switching type costs with sentences that matched/mismatched in semantic associations (based on Nelson et al., 1999 norms). For example, they looked at sentence pairs like "Sheet can be spotless – Air can be clean." compared to "Sheet can be spotless – Meal can be cheap." Here "spotless" and "clean" are highly associated semantically whereas "spotless" and "cheap" are not. However, they found no priming effect or any effect on errors for "clean" compared to "cheap." There was no cost to switching between sentences that matched or mismatched semantically. This is not too surprising since it has long been known that lexical semantic priming effects are typically very short lived and are not sustained past 1–2 intervening words (e.g., Zwitserlood et al., 2000).
Nonetheless, the results of the priming control study further supported the idea that the modality switch costs were due to modality specific information predicted by an embodied model of cognition, rather than priming of symbols organized by modality (see also Van Dantzig et al., 2008; Oosterwijk et al., 2012; for additional results suggesting that the modality switch effect is not due to semantic priming alone).

However, a recent study by Louwerse and Connell (2011) proposes that, instead of relying on an embodied cognition account alone to describe this type of data, symbolic and embodied cognition accounts can be complementary. They used statistical information about word co-occurrences to predict response times in a modality switch paradigm where participants verified whether properties shared or shifted modalities. Overall, they suggest that two factors contribute to the modality switch effect, semantic priming for modality information (the linguistic word co-occurrence information) and secondarily embodied semantic information. Although our study is not designed to tease apart these differences, what is striking about both the current study as well as the results from Hald et al. (2011) is that the two modalities used (visual and tactile) would not be predicted to show any linguistic priming effects according to Louwerse and Connell (2011). Within their model, they found that the linguistic account did not make such fine-grained distinctions between all of the five modalities. Importantly for the current paper, visual and tactile modalities were not distinguished within their linguistic model. Accordingly, this would mean that any modality switch effects found here or in Hald et al. (2011) cannot be due to priming of symbols organized by modality, at least if there is no distinction made between visual and haptic modalities. The model proposed by Louwerse and Connell (2011) does not exhaust the possibilities of statistical effects. It is quite possible that there are statistical effects they have not picked up that could still be influencing our results. This same basic argument, that linguistic word co-occurrence factors alone cannot account for modality switching effects, was also offered by Connell and Lynott (2011) to account for modality switching costs seen with novel concepts (e.g., *jingling onion*). At present we can only speculate about the influence of statistical word co-occurrence on our results. We believe that there may well be an influence of statistical word co-occurrence information in tandem with an embodied approach, leading to the current results. However, the Louwerse and Connell (2011) approach to linguistic context may not capture statistical patterns at an appropriate level of granularity (contextual frame). The authors define linguistic context as the frequency of first-order co-occurrences of modality specific words (p. 384), which may be insensitive to patterns at other levels. A more sensitive model of word co-occurrences may demonstrate that both statistical properties as well as an embodied approach contribute to our findings with visual and haptic modalities. The difficult task will be to determine under what circumstances statistical information and embodied information/processing differ. For example, one may imagine a situation where word co-occurrence is very low, just because we do not talk about that property of an object often. Nonetheless, in these circumstances, a modality switch effect is still seen due to the embodied simulation. This may be exactly what occurs with novel combinations (e.g., *jingling onion*; Connell and Lynott, 2011).
Here there is no word co-occurrence information to rely upon, but a simulation allows us to easily come up with an interpretation of this combination (e.g., An onion that makes a jingling sound when you move it). Maybe the main purpose of the embodied simulation is to support novel combinations, but this is clearly speculative [see Lynott and Connell, 2010 for a review of models of conceptual combinations, including one that utilizes embodied conceptual combination (ECCo)]. There may be many empirical ways of teasing apart embodied and linguistic co-occurrence accounts, and there is already a growing body of evidence suggesting that information like type of stimuli (Louwerse and Jeuniaux, 2010), the particular cognitive task at hand (Louwerse and Jeuniaux, 2010), and the time of processing (Louwerse and Connell, 2011; Louwerse and Hutchinson, 2012) all appear to influence the interaction between statistical information and embodied information.
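
The notion of a first-order co-occurrence count can be made concrete with a minimal Python sketch. It is an illustration only, under stated assumptions: the toy corpus is invented, co-occurrence is counted at the sentence level, and the function name `first_order_cooccurrence` is hypothetical. It is not the procedure or the corpus used by Louwerse and Connell (2011).

```python
from collections import Counter
from itertools import combinations


def first_order_cooccurrence(sentences, vocabulary):
    """Count how often two vocabulary words appear in the same sentence.

    Assumption: "first-order" is approximated here as sentence-level
    co-occurrence; a real model may use a different context window.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        # Keep only the words of interest, in a stable (sorted) order
        present = sorted(tokens & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy corpus, invented purely for illustration
corpus = [
    "the white rice felt soft",
    "the rice was white and fluffy",
    "the onion was white",
    "a soft white sheet",
]
vocab = {"rice", "white", "soft", "onion", "jingling"}

counts = first_order_cooccurrence(corpus, vocab)
print(counts[("rice", "white")])      # familiar pairing
print(counts[("jingling", "onion")])  # novel pairing never seen in the corpus
```

In this toy setting a familiar pairing such as *rice* and *white* receives a positive count, whereas a novel combination such as *jingling onion* receives a count of zero, which is the situation in which a purely co-occurrence-based account has no information to draw on.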

At this stage there is no evidence for a symbolic system with the complexity needed to account for our results, but we cannot rule out the possibility that evidence for such a system will be found in the future. Our motivation in beginning this project was to better understand the time course of embodied representations with negative true and false sentences rather than understanding the precise nature of how these seemingly embodied representations come about. Furthermore, we feel that a purely statistical account of the type proposed by Louwerse and Connell (2011) is unlikely to accommodate our results. Secondly, by using sentences that matched on modality but varied on veracity, a simple associative priming explanation would predict the same (or similar) findings for both the true and false sentences when they mismatch on veracity. This is not what we found here or in the affirmative study (Hald et al., 2011). Nonetheless, several follow-up studies are currently being conducted to more satisfactorily address whether an associative priming explanation can better account for the data than an embodied framework.

## **CONCLUSION**

Overall, our results fit well with the idea that during comprehension we construct embodied simulations that are based on the previous discourse information in order to integrate the incoming information with the current simulation (see Glenberg and Robertson, 1999; Zwaan and Madden, 2005; for detailed accounts of how these simulations arise). Specifically, our results suggest that the construction of a simulation in one modality for the context sentence can aid the simulation of the target sentence if it is in the same modality. This indicates that the simulation process, which is central to embodied language processing, can be predictive (in line with Barsalou, 2009), and that a stronger prediction can be made when there is no modality switch. We find that it is important to illustrate that judging veracity and understanding negation (linguistic and semantic markers) both seem to be influenced by embodied simulations during language comprehension. However, this is only half of the story. Leaving the conclusion at that is not satisfying; there are already many studies supporting the general idea that embodied simulations underlie language comprehension. We believe that adding veracity and negation to the list of factors that seem to be influenced by embodied simulations allows us to add something new to the larger puzzle of how embodied simulation supports language processing. Specifically, we propose several parameters regarding how embodied simulations support language comprehension in relationship to veracity and negation.

First, our very early effects of modality switching (beginning as early as 190 ms) suggest that the timing of embodied representations can be very fast. This is important because it suggests that the perceptual systems are involved in more than just a late deep postlexical aspect of semantic processing. Aside from the timing of the effect, we believe that the modality switch effect related to veracity is due to an automatic, yet context driven simulation that is made by meshing the affordances of (i.e., Gibson, 1979) and world knowledge about the objects and actions included in the sentence (and wider discourse when available). Rather than performing some sort of comparison process between the simulation and the situation at hand (as Barsalou, 1999 proposes), we propose that the veracity judgment comes out of the process of building the simulation. When you have a false sentence, a slow down<sup>2</sup> in the simulation occurs, since the process of meshing the affordances is more difficult due to having less experience with the relevant objects and actions in combination in the real world. In terms of our results, this "slow down" is evidenced by a much smaller modality switch effect. This same slow down occurs when you receive novel compounds (e.g., Connell and Lynott, 2011); that is, they find smaller switching costs with novel compounds. However, one's ability to consciously determine whether to interpret a slowed simulation as due to falseness or simply due to a new concept that we have little experience with depends on the context. In the context of the current experiment (judging sentences to be true or false), you will reach a "false" judgment from that slowed down simulation. On the other hand, when the context is to come up with a valid interpretation of a novel compound (such as *jingling onion* in Connell and Lynott, 2011), you will not interpret a "slowed" simulation as an indication of falseness, but instead as a new concept. As discussed, it is possible that the modality match allows for a simulation that is more "accepting" of a wider variety of possible properties of an object, including less typical ones, but this process is more difficult. We do not have the space here to expand on all of the predictions this would make, but for example this would suggest that if we tested novel compounds with ERPs, we should find a similar modulation of the ERP for novel compounds as we see here for false sentences: namely, a much smaller effect of modality switching. Furthermore, this "slower" simulation may be the locus of the opposite amplitude switch effect seen in the false sentences, but further research is needed to confirm whether this is the case or not. Lastly, in relationship to negated sentences, we believe that understanding negation depends on the same simulation process described above for veracity. However, unlike veracity, the correct interpretation of negation needs a different type of contextual support and it does not always fall out of the particular context/task demands in the same way that it may do for judging veracity vs. understanding novel compounds. Instead there may be a need for a second process of negating information that is already simulated (as proposed by Kaup et al., 2007a) when there is not much contextual support. However, when there is supporting discourse context and/or supporting world knowledge or, in our case, modality matching, the simulation may be able to immediately negate the relevant information while building the simulation (in line with Nieuwland and Kuperberg, 2008). We believe the lack of veracity effects on our negated modality matched sentences may be an indication of the initial steps in this simulation process that could lead to immediate negation during the simulation.

<sup>2</sup>When we use the term "slow-down" or slowed simulation here, what we mean is that reaching a final simulation may be slower; however, it may be more accurate to describe this as more difficult. Additional research is needed to better determine the best way to characterize this type of simulation.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Diane Pecher and her co-authors for making the materials of their study available to us and for helpful discussion about the design of this experiment. We would also like to thank Louise Connell for useful discussion about the design of the study and the interpretation of our results as well as our two reviewers for their valuable comments, especially in relationship to the interpretation of the results.

## **REFERENCES**

of language comprehension: how is negated text information represented?" in *Higher Level Language Processes in the Brain: Inference and Comprehension Processes*, eds F. Schmalhofer and C. A. Perfetti (Mahwah, NJ: Erlbaum), 255–288.

speech production: evidence from picture naming. *Lang. Cogn. Process.* 15, 563–591.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 27 September 2012; accepted: 08 February 2013; published online: 28 February 2013.*

*Citation: Hald LA, Hocking I, Vernon D, Marshall J-A and Garnham A (2013) Exploring modality switching effects in negated sentences: further evidence for grounded representations. Front. Psychol. 4:93. doi:10.3389/fpsyg.2013.00093*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Hald, Hocking, Vernon, Marshall and Garnham. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The body of evidence: what can neuroscience tell us about embodied semantics?

## **Olaf Hauk \* and Nadja Tschentscher**

MRC Cognition and Brain Sciences Unit, Cambridge, UK

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

**Reviewed by:**

Lotte Meteyard, University of Reading, UK Tatjana Nazir, L2C2-CNRS, France

#### **\*Correspondence:**

Olaf Hauk, MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, UK. e-mail: olaf.hauk@mrc-cbu.cam.ac.uk

Semantic knowledge is based on the way we perceive and interact with the world. However, the jury is still out on the question: to what degree are neuronal systems that subserve acquisition of semantic knowledge, such as sensory-motor networks, involved in its representation and processing? We will begin with a critical evaluation of the main behavioral and neuroimaging methods with respect to their capability to define the functional roles of specific brain areas. Any behavioral or neuroscientific measure is a conflation of representations and processes. Hence, a combination of behavioral and neurophysiological interactions as well as time-course information is required to define the functional roles of brain areas. This will guide our review of the empirical literature. Most research in this area has been done on semantics of concrete words, where clear theoretical frameworks for an involvement of sensory-motor systems in semantics exist. Most of this evidence still stems from correlational studies that are ambiguous with respect to the behavioral relevance of effects. Evidence for causal effects of sensory-motor systems on semantic processes is still scarce but evolving. Relatively few neuroscientific studies so far have investigated the embodiment of abstract semantics for words, numbers, and arithmetic facts. Here, some correlational evidence exists, but data on causality are mostly absent. We conclude that neuroimaging data, just as behavioral data, have so far not disentangled the fundamental link between process and representation. Future studies should therefore put more emphasis on the effects of task and context on semantic processing. 
Strong conclusions can only be drawn from a combination of methods that provide time-course information, determine the connectivity among poly- or amodal and sensory-motor areas, link behavioral with neuroimaging measures, and allow causal inferences. We will conclude with suggestions on how this could be accomplished in future research.

**Keywords: embodiment, semantics, neuroimaging, fMRI, EEG/MEG, TMS**

## **INTRODUCTION**

It seems obvious that the way we interact with the world shapes the way we represent concepts and knowledge. However, the degree to which experience shapes concepts and cognitive strategies in the fully developed brain is still poorly understood. In their most general form, theories of embodied cognition assume that human mental functions are shaped by the way the human body interacts with the environment (Varela et al., 1992; Clark, 1997; Barsalou, 2008). Theories differ with respect to the degree of embodiment they consider relevant, and assumptions include that the environment is part of cognition, that the goal of perception is action, and that sensory-motor systems aid cognitive processes (e.g., Wilson, 2002). In the neuroscience of semantics, the debate focuses mainly on the question of the degree to which perceptual and motor systems of the brain contribute to semantic representations and processes (Barsalou et al., 2003; Fischer and Zwaan, 2008; Nazir et al., 2008; Knoeferle et al., 2010; Kiefer and Pulvermüller, 2011; Pulvermüller, 2012). Even in this relatively circumscribed research field, the views on the relevance of embodiment range from "strongly embodied" to "fully disembodied" (Meteyard et al., 2010). It therefore seems important to ask what we mean by embodiment in a specific context, and what type of evidence we accept to determine its relevance. It is unlikely that there is a one-size-fits-all definition of embodiment, and we may find that sensory-motor systems contribute a lot to one aspect of cognition (e.g., semantics or mental imagery), but hardly at all to another (e.g., arithmetic problem solving).

Our main aim for this article was to formulate the major methodological challenges for neuroscientific research on embodiment, and to offer suggestions on how different methods can be used to answer specific questions of embodied semantics. How can we test theories of embodied semantics? Which methods are suitable to test what type of predictions? We are not attempting to develop another theory of semantics, but rather ask what type of questions can be answered with existing methodology. On this basis, we will provide a review of studies dealing with embodied semantics for single words and numerical cognition. These are well-focused research areas, in which a large body of behavioral and neuroscientific evidence has already been acquired. They are therefore well-suited to illustrate the methodological and theoretical challenges of the research area, and they should be expected to be the most likely candidates to converge on a general conclusion.

## **MOTIVATION FOR EMBODIED THEORIES OF SEMANTICS**

Few papers on the neuroscience of embodied semantics contain an explicit definition of "(semantic) representation". It is implicitly assumed that it refers to an implemented code or symbol system that stands for external entities. Marr (1982) provides the definition "A representation is a formal system for making explicit certain entities or types of information, together with a specification of how the system does this" (chap. 1.2). It is not obvious how this relates to measures of brain activation, in particular with respect to the second part of the definition.

In this section, we will briefly summarize the main theoretical motivations for investigations on embodied semantics. The starting point for most embodied approaches is the question: how can a network of symbolic relationships relate to the real world? At some point, the tree of symbolic relationships should be grounded in sensory-motor experience. This is the "symbol grounding problem" of artificial intelligence (Harnad, 1990): one can write sophisticated computer programs that transform and process symbols that stand for semantic representations, but it is not clear how these symbols acquire meaning or intentionality. This problem is illustrated in Searle's (1980) "Chinese room" problem: an English speaker who manually executes an algorithm to translate written English symbols into written Chinese symbols does not necessarily "understand" Chinese herself. Harnad suggests a "hybrid" symbolic/non-symbolic model, in which abstract functions can emerge by means of "bottom-up grounding" of categories from grounded sensory representations. Similarly, Barsalou's theory of perceptual symbol systems suggests that conceptual knowledge is represented as bottom-up and top-down interactions between sensory-motor systems and higher-level association areas (Barsalou, 1999). These higher-order cortices, convergence zones, or convergence regions have been localized to different parts of the brain (Damasio et al., 2004; Binder and Desai, 2011).

Mechanistic models have been proposed to explain how sensory-motor areas of the brain become connected with core language areas based on Hebbian principles of association learning (Hebb et al., 1971; Braitenberg and Pulvermüller, 1992; Pulvermüller, 1999; Wennekers et al., 2006). However, this does not necessarily imply that the fully developed brain cannot represent information independently of its original source. In order to categorize a word or an object as a horse, we may not have to invoke a full picture of a horse, but instead this decision may be made in higher-level association areas alone. Taken to the extreme, embodied theories of semantics might suggest that we need to activate the retina in order to understand the word "rose" – but if we consider this implausible, then we cannot argue for activation in visual cortex on purely theoretical grounds either. In the interest of speed and accuracy, it may even be more optimal to represent some information in local and specialized rather than distributed brain networks. Obviously, solid empirical evidence is needed as to whether or when sensory-motor systems contribute to semantic processes.

Abstract words are a particularly challenging case for theories of embodiment, since they have no obvious referents in sensory-motor experience. Possible approaches to the incorporation of abstract semantics in frameworks of embodiment are, for example, summarized by Glenberg et al. (2008) and Pecher et al. (2011). First, abstract semantics can rely on concrete concepts by means of metaphor or image schemas (Lakoff, 1987; Gibbs and Steen, 1999). For example, the abstract knowledge that a proposition can be true or false, but not both, may be based on the sensory experience that an object can be either inside or outside a container, but not both. Second, some abstract concepts can be based on generalizations from situated simulations (Barsalou, 1999). For example, the concept of "truth" may be based on repeated experience of the consistency between simulated predictions (e.g., in order to verify a statement such as "the cup is on the table") and perception (seeing the cup on the table). Third, Glenberg and Robertson (1999) formulated the "indexical hypothesis", which states that abstract propositions can be acquired on the basis of concrete concepts. For example, abstract information transfer (such as reading something to someone) is grounded in concepts that describe the transfer of objects. It has also been found that not only sensory-motor, but also affective-emotional experience, shapes the processing of abstract words (Kousta et al., 2011).

As has been pointed out before (Kiefer and Pulvermüller, 2011), the embodiment of abstract semantics has so far been addressed by only a few neuroscientific studies. However, questions about the role of sensory-motor systems have also been asked in the context of numerical cognition (Lakoff and Nunez, 2000). For example, the "mental number line" has been suggested as the basis of the "SNARC" effect (Dehaene et al., 1993; Wood et al., 2008), and effects of finger-counting habits on number processing have recently been reported (Tschentscher et al., 2011; Fischer et al., 2012). We therefore included number processing in our review on embodied abstract semantics.

## **THE MENTAL IMAGERY DEBATE**

Some theoretical and methodological issues regarding the involvement of sensory-motor processes in higher cognition have already been raised in the debate about the role of pictorial representations in mental imagery, which started about four decades ago, before the appearance of modern neuroimaging (e.g., Paivio, 1971; Kosslyn, 1975). The most relevant part of this debate for the present review is the question put forward by Anderson (1978)<sup>1</sup>: can our experiments actually distinguish between different types of representations (e.g., visual or propositional, abstract or modality-specific)? Similar questions have been asked by other authors more recently (Thomas, 1999; Pylyshyn, 2002), but in this section we will follow the line of argument presented by Anderson (1978).

<sup>1</sup>We would like to thank Richard Henson for drawing our attention to this debate.

Anderson formally decomposed the information processing sequence between stimulus perception and response execution using operators: an operator *E* for stimulus encoding, which turns a stimulus *S*<sub>i</sub> into an internal representation *I*<sub>j</sub> (i and j do not have to be the same). There can be internal transformations *T* that can transform representations into different formats, such that *T*(*I*<sub>i</sub>) = *I*<sub>j</sub>. Finally, there are decoding operators *D* that associate responses *R* with representations, i.e., *R*<sub>j</sub> = *D*(*I*<sub>i</sub>). The basis of the argument is that "If we restrict ourselves to behavioral data, we cannot directly observe the internal processes *E*, *T*, and *D* nor the internal representations" (p. 263). What we measure is necessarily a conflation of encoding, transformation, and decoding processes. Thus, one model with specific assumptions about the structure of internal representations can be mimicked by another model with different assumptions about internal representations, when appropriate choices for the other operations are made. In other words, what we measure reflects a representation-process pair, and a change in assumptions about representations can be compensated for by changes in assumptions about processes. Anderson admits that there may be further constraints on representations and processes, e.g., based on parsimony, although he calls parsimony "an unfortunately subjective concept" (p. 266). However, he demonstrates – e.g., in a detailed analysis of letter rotation – that in the field of mental imagery, the most important behavioral evidence can be accommodated by both pictorial and propositional accounts.
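The mimicry argument can be made concrete with a toy example. The following minimal sketch is an illustrative construction and is not taken from Anderson (1978). It assumes four hypothetical stimuli (letter orientations in steps of 90 degrees), a "pictorial" model that stores an angle and rotates it arithmetically, and a "propositional" model that stores a symbolic label and applies a rewrite rule. All function and variable names are invented for the illustration.

```python
# Toy illustration of representation-process mimicry (hypothetical example).
from typing import Tuple

STIMULI = [0, 90, 180, 270]  # assumed letter orientations in degrees

# --- Model A: "pictorial" format, the representation is an angle ---
def encode_pictorial(stimulus: int) -> int:
    return stimulus % 360

def transform_pictorial(angle: int) -> int:
    # Process: rotate the stored image by 90 degrees.
    return (angle + 90) % 360

def decode_pictorial(angle: int) -> str:
    return "upright" if angle == 0 else "tilted"

# --- Model B: "propositional" format, the representation is a symbol ---
LABEL_FOR_ANGLE = {0: "up", 90: "right", 180: "down", 270: "left"}
ROTATION_RULE = {"up": "right", "right": "down", "down": "left", "left": "up"}

Proposition = Tuple[str, str]

def encode_propositional(stimulus: int) -> Proposition:
    return ("ORIENTATION", LABEL_FOR_ANGLE[stimulus % 360])

def transform_propositional(prop: Proposition) -> Proposition:
    # Process: apply a symbolic rewrite rule, with no arithmetic on angles.
    return (prop[0], ROTATION_RULE[prop[1]])

def decode_propositional(prop: Proposition) -> str:
    return "upright" if prop[1] == "up" else "tilted"

# --- Observable behavior: response = D(T(E(stimulus))) ---
def respond_pictorial(stimulus: int) -> str:
    return decode_pictorial(transform_pictorial(encode_pictorial(stimulus)))

def respond_propositional(stimulus: int) -> str:
    return decode_propositional(
        transform_propositional(encode_propositional(stimulus))
    )

def behaviorally_equivalent() -> bool:
    # True if both models give the same response to every stimulus.
    return all(
        respond_pictorial(s) == respond_propositional(s) for s in STIMULI
    )
```

In this sketch `behaviorally_equivalent()` evaluates to `True` although the two models never share an internal representation. This is the sense in which a stimulus-response measurement constrains only the representation-process pair, not the representation on its own.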

Anderson offers an intriguing solution to this dilemma: just live with it. The fact that another theory also explains the data does not mean that either theory is useless (e.g., wave or particle theories of light are still useful in different contexts, and a general theory linking the two took a long time to develop). But even if we accept the argument that two competing theories can be useful, we of course still strive to find the most general theory that may comprise both as special cases. In order to do so, the only possibility is to find new sources of evidence.

Neurophysiological methods are a promising possibility, since they may bring us closer to "online" processing in the brain. If any methodology allowed us to measure a representation or an operation directly, we would be one step ahead. However, as Anderson notes, there are serious reasons for challenging neurophysiological data because "they do not provide anything like direct observation of the mental objects" (p. 271). Similar arguments have been put forward by other authors (Page, 2006). In principle, the logic about the conflation of representations and processes also applies to neurophysiological measurements. Certain parsimony and plausibility constraints may be justified – but they *have* to be justified. We therefore need to understand what we measure, and how we can relate this to our models and theories. This will be discussed in the following section.

## **NEUROSCIENTIFIC METHODS FOR THE INVESTIGATION OF SEMANTICS**

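In the previous section, we highlighted the limitations of behavioral data for revealing semantic representations. Here, we will introduce the most common neuroimaging methods that have been employed in this endeavor. Neuroimaging data clearly exceed behavioral data in complexity. The hope is that the information contained in spatio-temporal patterns of brain dynamics allows specific conclusions about perceptual or cognitive processes and representations (e.g., Henson, 2005). However, three major problems complicate the interpretation of neuroimaging data:

1. Activation may occur at the processing stage of interest, or at any later stage that is sensitive to its output.
2. Activation may be triggered by a stimulus without contributing causally to the process of interest.
3. Activation of a brain area, even when it predicts performance, does not uniquely define the function of that area.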

Problem 1 is particularly important for metabolic neuroimaging, where the measured entity (e.g., the hemodynamic BOLD response) has a temporal resolution of several seconds or more (e.g., Buckner, 1998). Activation observed with these methods may occur at the processing stage of interest, or at any later stage that is sensitive to its output. For example, if the semantic decoding of the word "hammer" leads to the activation of mental imagery processes or episodic memories involving a hammer, then the latter may cause activation in motor cortex, not the former. In order to interpret fMRI contrasts, whether in a univariate or multivariate manner, one needs to account for processes up to several seconds after stimulus onset. This can be a challenging task when contrasting stimuli that can easily be categorized by participants, such as words vs. line drawings, words vs. pseudowords, or different semantic word categories.

This problem can be addressed using methods with high temporal resolution in the millisecond range (such as EEG/MEG), which may distinguish "early" processes (e.g., lexico-semantic information retrieval) from "late" ones (such as mental imagery). One can plausibly argue that activation that occurs in latency ranges of earliest lexico-semantic information retrieval (e.g., around 200 ms after stimulus onset) is too early to reflect mental imagery processes (Pulvermüller, 1999; Hauk and Pulvermüller, 2004).

However, problem 2 still remains: activation may be triggered by a stimulus in a spreading-activation, automatic association, or conditioned manner, but it may not contribute causally to the process of interest. As Mahon and Caramazza (2008) have pointed out, there may be at least four possible explanations for early activation in sensory-motor areas, e.g., in response to the word "kick": "(1) the word 'kick' directly activates the motor system, with no intervening access to abstract conceptual content; (2) the word 'kick' directly activates the motor system and in parallel activates abstract conceptual content; (3) the word 'kick' directly activates the motor system and then subsequently activates an abstract conceptual representation; and finally, (4) the word 'kick' activates an abstract conceptual representation and then activates the motor system." As an illustration, these authors point to the example of the Pavlovian dog: the fact that it salivates as soon as it hears the bell does not mean that salivating contributes to the recognition of the sound of the bell. Effects occurring in the same latency range (e.g., sensory-motor activation in a latency range associated with lexico-semantic access) do not necessarily imply that the underlying processes affect each other.

In order to show that activation in a brain region causally affects a process, one has to reverse the logic of neuroimaging: instead of measuring brain activity in response to a stimulus or task, one needs to modulate activity in a brain region and measure the effect on performance. A non-invasive technique that allows short-term stimulation of specific brain areas with reasonable spatial resolution is transcranial magnetic stimulation (TMS; Pascual-Leone et al., 2000). Short pulses of magnetic fields, usually applied via palm-sized coils, induce electrical currents in the underlying brain tissue. These pulses can induce muscle twitches in thumbs or legs, or lead to temporary speech arrest (e.g., Devlin and Watkins, 2006). Stimulation can lead to "temporary lesions," i.e., impair the function of a brain area, which can be reflected in slower response times or higher error rates. Conversely, it is also possible to "prime" a brain area, which can result in improved performance. The physiological mechanisms that lead to either of these outcomes are not yet fully understood. The effect of stimulation of particular brain areas on behavioral performance provides the strongest non-invasive evidence that these brain areas indeed contribute to the process of interest. A potential problem is that induced activation may spread from one brain area to other connected ones. This can potentially be a problem when we want to distinguish effects in primary motor cortex from premotor cortex, or primary sensory areas from higher-level areas. A possible alternative to TMS is transcranial direct current stimulation (tDCS; Paulus, 2011). However, it is not as spatially specific as TMS (at least with current technology).

We would like to point out that studies using motor-evoked potentials (MEPs; e.g., Buccino et al., 2005; Papeo et al., 2009), which show a modulation of excitability of motor areas during language comprehension, do not demonstrate a causal role of motor cortex in language comprehension. Just like EEG/MEG and fMRI, they only show that language processing affects motor cortex, but not vice versa. We will therefore focus on studies that have examined the effect of motor cortex stimulation on language performance.

Another possibility to investigate the effect of "modulation" of brain activity on behavior is to study patients with specific brain impairments, e.g., after stroke. The number of studies that can be run in this way is obviously limited. It can be difficult to establish the spatial specificity of brain lesions, their knock-on effects on connected brain areas, and the effects of neuronal plasticity. Furthermore, severe damage to general language or semantic functions may mask more subtle effects, e.g., for different word categories. This may be the reason why Binder and Desai (2011) recently concluded that "conceptual deficits in patients with sensory-motor impairments, when present, tend to be subtle rather than catastrophic" (p. 531). We included neuropsychological studies in our review where we felt appropriate, but a detailed discussion of the limitations of this methodology is not within the scope of this paper.

The demonstration that a brain area is activated in a particular contrast, and that activation in this brain area predicts performance, still does not uniquely define the function of this area (point 3 above). The same brain area could, in principle, serve different functions depending on which neurons are active or which other brain areas are involved. All available non-invasive neuroimaging methods rely on signals produced by large numbers (thousands to millions) of neurons and synapses. Furthermore, brain areas are highly interconnected. The finding that an area is active during finger movement as well as during language comprehension does not directly imply that exactly the same processes of the former are involved in the latter. Experimentally, priming or repetition suppression paradigms can potentially address the problem whether the same or different neuronal populations are involved in different processes (Henson and Rugg, 2003): if two processes involve different neurons in the same area, they may not prime each other.

We conclude that there is no single brain measure that can be interpreted in terms of "representations" or unique processes in a straightforward manner. For word processing, this means that the activation of sensory-motor areas is consistent with their involvement in semantics, but does not prove it. TMS studies have the potential to provide crucial evidence for the causal involvement of a brain area in cognitive processes, but they may still not uniquely determine the functional role of the targeted brain areas, which requires further experimental manipulations.

## **REVIEW OF THE EMPIRICAL LITERATURE**

In this section, we will test how the methodological constraints formulated above have been addressed in the empirical literature. We will begin with behavioral evidence for the view that sensory-motor knowledge affects language processing, followed by evidence from metabolic neuroimaging (mostly fMRI) studies for activation of sensory-motor systems during semantic processing, and by an analysis of the time-course of these effects based on EEG/MEG studies. Finally, we will ask whether there is evidence for a causal role of sensory-motor systems in semantics, mostly relying on TMS data.

## **CONCRETE WORDS**

#### **Behavior**

Behavioral evidence for an interaction between action and language was provided, for example, by the Action-Sentence-Compatibility-Effect (ACE): participants performed a hand movement faster when the direction of movement was congruent with the direction of movement described in a preceding sentence, compared to when it was incongruent (Glenberg and Kaschak, 2002). Similar interference effects have been observed with visual and auditory motion paradigms (Kaschak et al., 2005, 2006). Evidence for this type of interaction has also been provided for single-word processing. When participants were required to perform a grasping movement triggered by the visual presentation of a word, movement kinematics changed depending on whether the word was hand-action-related or not (Boulenger et al., 2006). Effects of action-word type on response execution have also been documented in other studies (Dalla Volta et al., 2009; Mirabella et al., 2012). Meteyard et al. (2007) demonstrated that listening to verbs that described an upward or downward motion interfered with performance in a motion detection task, i.e., perceptual sensitivity was impaired when verb motion and displayed motion were incongruent. Similarly, the motion direction of dot patterns interfered with the motion direction indicated by visually presented words in a lexical decision task (Meteyard et al., 2008). The fact that these interference effects occurred when motion patterns were presented near the perceptual threshold, rather than supra-threshold, was taken as evidence that "automatic activation of motion-responsive area MT+ . . . gives rise to the interference between perceptual and semantic information processing" (p. R732). These results demonstrate that interference between semantics and perceptual-motor information occurs at the behavioral level. However, this is not direct evidence that the interference is due to an overlap of semantic and perceptual-motor systems. The interference could still occur at a separate level which is sensitive to the congruency of output among different processes.

#### **fMRI and PET**

A number of fMRI and PET studies have shown that sensory-motor areas become active during language comprehension, mostly in the action domain (Hauk et al., 2004; Tettamanti et al., 2005; Aziz-Zadeh et al., 2006; Kemmerer et al., 2008; Boulenger et al., 2009; Raposo et al., 2009; Willems et al., 2010b; Hauk and Pulvermüller, 2011; Pulvermüller et al., 2011), but also for auditory (Kiefer et al., 2008), visual (Pulvermüller and Hauk, 2006; Simmons et al., 2007; Hauk et al., 2008a), and olfactory-gustatory concepts (Gonzalez et al., 2006; Barros-Loscertales et al., 2012), and for a mixture of those (Noppeney and Price, 2002, 2003; Hauk et al., 2008a; Kiefer et al., 2012).

The finding of category-specific differences in sensory-motor areas is usually directly taken as evidence that neuronal representations of semantic knowledge have been shaped by individual experience. Left- and right-handers are an interesting test case: does the way we usually perform an action shape the way we represent it semantically? Two studies on this issue have led to different results. Willems et al. (2010a) reported more activation in left motor cortex when right-handers read uni-manual action-related words (e.g., "throw"), while left-handers showed more activation in right motor areas. Hauk and Pulvermüller (2011) used uni-manual and bi-manual (e.g., "clap") words, and found that uni-manual words activated left motor cortex in both left- and right-handers, while bi-manual words activated motor cortex bilaterally in both groups. The authors of both studies interpreted their results in terms of embodied semantics. However, while the former argued that "implicit mental simulation during language processing is body specific" (abstract), the latter concluded that their results reflect the influence of "left-hemispheric language dominance on the formation of semantic brain circuits on the basis of Hebbian correlation learning" (abstract). These conclusions are not contradictory. Differences between these studies may be explained by the use of different tasks. Willems et al. (2010a) used a lexical decision task, which engages motor areas and may focus participants' attention on action-related aspects of the stimuli. Hauk and Pulvermüller used a silent reading task that did not require an explicit response. This should stimulate further research into the effects of task demands on brain activation during semantic processing.

Some of the previous empirical results have been questioned on empirical grounds. For example, Postle et al. (2008) did not find somatotopic activation to action-words in their fMRI study, and pointed out that evidence for motor cortex activation to action-words in cytoarchitectonically defined motor areas is inconsistent across studies. Nevertheless, taken together these studies provide strong evidence for the differential activation of distributed sensory-motor brain areas for different semantic word categories.

These studies do not, however, address the ambiguity of fMRI data with respect to the processing stage at which effects occur. In principle, all these results may reflect post-semantic processes such as mental imagery. Only a few fMRI studies have addressed this issue directly. Tomasino et al. (2007) measured fMRI signals when participants read short phrases that were either action-related or not. They had to perform an imagery task, in which they were explicitly told to imagine the situation described by the phrase, or they had to perform a letter detection task. A difference in brain activation between action-related and non-action-related sentences occurred only in the imagery task, but not in the letter detection task. The authors conclude that previous motor cortex activation results for action-words may have been caused by mental imagery processes. However, a letter detection task may not only have prevented mental imagery processes from taking place, but may also have been too superficial to engage semantic processes at all. The results are therefore also consistent with the view that letter detection does not evoke semantic processing at a level that involves sensory-motor systems. Willems et al. (2010a) reported that action-words in a lexical decision task produced activation patterns in motor areas that were non-overlapping with activation patterns in a mental imagery task. Although this demonstrates that motor areas may play different roles in imagery and semantics, it also implies that motor cortex activation in non-imagery tasks cannot be fully explained in terms of mental imagery.

Wheatley et al. (2005) used a priming paradigm and could show that activity in left inferior temporal and left ventral premotor cortex, which were differentially activated by words referring to animate and manipulable objects respectively, was sensitive to semantic priming. Because the stimulus-onset-asynchrony between prime and target was only 150 ms, they argued that these areas are involved in the automatic processing of object meaning. Hauk et al. (2008b) studied the effect of semantic category on the word frequency effect. A negative modulation of brain activity by word frequency was found for visually related words in left inferior temporal areas, and for action-related words in left middle/superior temporal cortex. Assuming that a negative correlation with word frequency indicates processing at a lexico-semantic level, this is evidence that differential effects for semantic word categories indeed occur at a semantic rather than imagery stage. However, such a frequency effect was not observed in motor cortex.

In conclusion, a number of fMRI studies have provided evidence for the differential activation of sensory-motor areas during word and sentence comprehension, but the evidence that this reflects semantic rather than imagery processes is indirect and still scarce.

#### **EEG/MEG**

Fast psychophysiological methods such as EEG/MEG are less ambiguous with respect to processing stages. Although the exact time-course of lexico-semantic processes is still an ongoing research issue (Sereno and Rayner, 2000; Grainger and Holcomb, 2009; Hauk et al., 2012), we can assume that brain activity well before the earliest button presses in a lexical or semantic decision task (i.e., before ∼400 ms) is not related to mental imagery. Several studies have reported effects of semantic variables on brain responses already around 200 ms (Pulvermüller et al., 1999; Hauk et al., 2006; Amsel, 2011). Although fewer studies exist compared to the fMRI domain, reviews of the evidence for early semantic effects have already been provided (Pulvermüller, 2005; Hauk et al., 2008b; Kiefer and Pulvermüller, 2011). For example, Pulvermüller et al. (2001) and Hauk and Pulvermüller (2004) reported differences between action-word types in the ERP around 200 ms. Using a method to estimate the possible neuronal generators of these effects, the latter study found a pattern of results consistent with somatotopy. Similarly, differences between words with and without acoustic semantic features occurred in ERPs around 200 ms, and in a parallel fMRI study activation to words with acoustic features overlapped with areas activated during listening to sounds (Kiefer et al., 2008). Around the same latency, Moscoso del Prado Martin et al. (2006) observed ERP differences between color- and form-related words. Source estimation revealed more activation for form-related words in frontal brain areas, while color-related words activated temporal cortex.

All studies reviewed in the previous paragraph used visual word presentation. Visual words are easier to control for physical and psycholinguistic variables than speech stimuli, since the latter are extended over time and acoustic features can be difficult to quantify. However, of particular interest are studies using "mismatch negativity" (MMN) paradigms, which allow studying brain responses to stimuli outside the focus of attention (Pulvermüller and Shtyrov, 2006). Even when participants were distracted by watching a silent movie, the brain responses measured by MEG to auditorily presented words around 200 ms differed depending on the action-word category (Pulvermüller et al., 2005b). Source estimation revealed that leg-related words produced more activity around the vertex (i.e., consistent with leg motor cortex), and hand/mouth-related words activated more lateral areas (consistent with hand/mouth motor cortex). In addition to the "earliness" of these effects, the fact that they occur outside the focus of attention has been taken as evidence that they occur at an automatic semantic level, and are not under strategic control.

There is convergence among these studies that differences between semantic word categories occur around 200 ms after stimulus onset<sup>2</sup>. However, there are clearly fewer studies than in the fMRI domain. What is more, the analysis of EEG and MEG data is less standardized, and more difficult to compare across studies, than for fMRI. The fact that several studies using very different methodology converge on the same conclusion should be taken as support for that conclusion. At the same time, it is difficult to integrate these results in a common coordinate frame.

#### **Transcranial magnetic stimulation**

The strongest conclusions could potentially be drawn from TMS studies, which can test the effect of temporal stimulation of specific brain areas on behavioral performance. Effects of TMS on language comprehension and production are well-established (Devlin and Watkins, 2006), but evidence for a causal involvement of sensory-motor areas in semantic processing in this area of research is surprisingly scarce. As we have pointed out before, studies using MEPs provide correlational rather than causal evidence, and we will here focus on studies that have examined the effect of motor cortex stimulation on language performance.

Pulvermüller et al. (2005a) investigated the effects of TMS pulses delivered at 150 ms after word presentation to hand and leg motor cortex on performance in a lexical decision task. Target words were either hand- or leg-related. The authors found an interaction of stimulation site and word type, i.e., responses to arm-related words were faster after hand motor cortex stimulation, and responses to leg-related words were faster after leg motor cortex stimulation. The fact that response facilitation rather than inhibition was observed was attributed to the relatively low intensity of the TMS pulses. Tomasino et al. (2008) studied effects of TMS on hand-action-verb processing at different stimulation latencies and in different tasks. Sub-threshold TMS pulses were applied to hand motor cortex and vertex, respectively, at different latencies between 150 and 750 ms after word presentation. In different tasks, participants had to indicate by button press whether they had finished reading (silent reading), judge whether the action involved a rotation of the hand (imagery), or judge whether the word occurred frequently in a newspaper (frequency judgment). The main result was a facilitatory effect of hand motor cortex stimulation at all stimulation latencies, but only in the imagery task. The authors therefore argued that motor cortex is only involved in action-verb processing when it involves simulation of the corresponding movement. However, the silent reading task did not require any lexical or semantic processing at all in order to initiate a response (and there are no correct or incorrect responses in this task). The other two tasks are quite unfamiliar, and elicited response times of about 1200 ms, compared to about 600 ms in the Pulvermüller et al. (2005a) study. It is therefore possible that a lexical decision task is more sensitive to effects of motor cortex stimulation.

In conclusion, direct evidence from non-invasive studies for a causal link between motor cortex and language processing is still scarce. In particular, there is no evidence yet that sensory-motor cortex stimulation disrupts semantic processing. Evidence of this kind has only been provided by studies on clinical populations, such as Parkinson's disease (Boulenger et al., 2007; Herrera et al., 2012), stroke patients (Neininger and Pulvermüller, 2001, 2003; Trumpp et al., 2012), and semantic dementia (Pulvermüller et al., 2010). Kemmerer et al. (2010) compared behavioral measures of action comprehension and lesion overlap in a large group of brain-damaged patients. Lesions in several brain areas including precentral gyrus, possibly extending into hand-related motor areas, as well as ventral postcentral gyrus predicted performance in tasks such as word-picture matching or word comprehension for action-verbs. This is probably the strongest neuropsychological evidence so far that these areas contribute to action-verb processing. However, using a similar approach but with a smaller sample size, Arevalo et al. (2012) did not find evidence for somatotopic effects of different action-word categories.

#### **ABSTRACT WORDS**

#### **Behavior**

In the behavioral domain, the ACE mentioned above for concrete sentences has also been observed for abstract sentences (Glenberg et al., 2008): participants performed a hand movement faster when the direction of movement was congruent with the direction of information flow (e.g., reading to somebody or being read to) described in a preceding sentence, compared to when it was incongruent. Similar effects have been observed with metaphors (Santana and de Vega, 2011). To our knowledge, this type of evidence has so far not been provided for abstract words in isolation. Several fMRI studies have investigated the embodiment of abstract sentences. For example, Boulenger et al. (2009) reported somatotopic activation for idiomatic sentences ("he grasps the idea" and "she kicks the habit"). Two other studies failed to find such effects for abstract sentences, which may be due to stimulus selection or experimental design (Aziz-Zadeh et al., 2006; Raposo et al., 2009).

<sup>2</sup>For auditory stimuli, the reference latency is often not stimulus onset, but the point in time at which crucial information becomes available, such as the word recognition point.

## **fMRI**

A number of studies have investigated effects of general word concreteness or abstractness, and for example have found more activation for abstract compared to concrete words in left inferior frontal cortex (e.g., Fiebach and Friederici, 2004; Noppeney and Price, 2004; Sabsevitz et al., 2005). However, only few fMRI studies have investigated differences between different abstract word categories with respect to embodiment. Ruschemeyer et al. (2007) used abstract German words that contained concrete action-words as stems (e.g., "be-greifen," which means "to comprehend" and contains the stem "grasp"), but they did not activate motor cortex. Moseley et al. (2011) hypothesized that the meaning of emotion-words is grounded in emotion-expressing actions, and that "neural circuits controlling facial expressions and bodily actions related to an emotion concept like 'anger' are tightly linked to our neural representation of the word denoting it." In line with this hypothesis, they found stronger motor cortex activation to emotion-words compared to non-action-related words. This was still the case when the analysis was restricted to emotion-words that did not directly refer to actions (such as "frown").

## **EEG/MEG**

As for fMRI, several EEG/MEG studies have investigated general effects of concreteness. Holcomb et al. (1999) and Adorni and Proverbio (2012) found a modulation of the N400 component by concreteness in sentence context. Amsel (2011) reported effects of multiple semantic variables, including imageability, already around 200 ms. However, without information about the neuronal sources these results do not provide direct evidence that early brain responses to abstract words contain signs of embodiment. In an MEG version of their previous fMRI experiment, Boulenger et al. (2011) presented literal and idiomatic sentences and analyzed the time-course of brain responses after the critical word (e.g., "habit" in "she kicked the habit"). Literal and idiomatic sentences differed in their brain responses after about 200 ms. Interestingly, there was also evidence for somatotopic activation for arm- and leg-related idioms in this latency range. To our knowledge, this is the only study so far that has tested theories of embodiment for abstract concepts using EEG/MEG, and no data from single-word studies are available so far.

## **Transcranial magnetic stimulation**

Evidence from TMS studies is also rare. Glenberg et al. (2008) measured TMS-induced MEPs in their behavioral study described above, and found that MEP amplitudes were greater in transfer sentences than no-transfer sentences, and that there was little difference between concrete and abstract sentences. As explained before, MEPs do not allow inferences about a causal role of motor cortex in language processing. Pobric et al. (2008) demonstrated using repetitive TMS that disruption of right posterior superior temporal sulcus impaired processing of novel compared to conventional metaphors, but the effects of sensory-motor stimulation were not studied. Similarly, the involvement of left ventrolateral prefrontal cortex in abstract word processing has been demonstrated in neuropsychological and TMS data (Hoffman et al., 2010), but again this does not demonstrate a link between abstract concepts and sensory-motor brain systems. We are not aware of any direct evidence from neuropsychology or TMS that has demonstrated this link for abstract semantics yet.

## **NUMBERS**

In the previous section, we noted that evidence for embodied abstract word semantics from neuroimaging and neuropsychology is scarce. Importantly, evidence for a causal link between sensory-motor systems and processing of abstract words does not exist yet. We therefore ask here whether this evidence exists in the domain of numerical cognition. Numbers are an interesting case because they have no direct referent in the real world, but can assume different meanings in many different contexts. It has been suggested that the concept of numbers is shaped by sensory-motor experience during development. Finger-counting habits might impact on number semantics based on the way children acquire knowledge about numbers by counting with their fingers (e.g., Butterworth, 1999). This learning process may also involve innate systems for magnitude processing (Dehaene et al., 2003). Supporters of the embodied view on numerical cognition have proposed that systematic sensory-motor associations during number acquisition remain part of our numerical knowledge in the form of conceptual metaphors (Lakoff and Nunez, 2000). Some review articles have already discussed the neuroscientific literature on numerical cognition from an embodied perspective (Andres et al., 2008; Fischer, 2012). We here review the evidence in the context of embodied abstract semantics and in the light of our methodological considerations above.

## **Behavior**

Several studies have reported interference effects between spatial and motor information and performance during number processing. In the spatial domain, it has been demonstrated numerous times that participants make associations between number magnitude and space, e.g., along a "mental number line" (e.g., Izard and Dehaene, 2008). An important source of evidence stems from the "SNARC" (spatial-numerical association of response codes) effect, which means that participants usually respond faster to small numbers using their left hand and faster to larger numbers using their right hand (Dehaene et al., 1993; Wood et al., 2008). Although the exact direction of this association may be flexible (e.g., Shaki and Fischer, 2012), it is evidence for a link between spatial representations and number magnitude processing. However, a recent study suggests that this effect may not reflect properties of the semantic representations of numbers, but rather how we organize them in working memory in specific tasks (van Dijck and Fias, 2011). In the motor domain, evidence has been provided for the impact of numerical tasks on finger-counting related movements (Di Luca et al., 2006), and for priming of number magnitude through canonical finger-counting postures (Di Luca and Pesenti, 2008). Badets and Pesenti (2010) reported a motor-to-semantic interaction, showing that observed closing grip postures slowed down the processing of large numbers. Furthermore, an influence of individual finger-counting habits on the SNARC effect has been found (Fischer, 2008).
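
The SNARC effect described above is often quantified as a regression slope. The following minimal sketch illustrates one common approach, under the assumption that mean reaction times per number and response hand are available: the right-minus-left reaction time difference (dRT) is regressed on number magnitude, and a negative slope indicates the typical small-left/large-right association. The function name `snarc_slope` and all reaction times are hypothetical and made up for illustration; they are not data from any of the studies cited here.

```python
def snarc_slope(numbers, rt_left, rt_right):
    """Least-squares slope of dRT (right-hand RT minus left-hand RT) on number magnitude.

    A negative slope means the right-hand advantage grows with magnitude,
    i.e., the typical SNARC pattern.
    """
    d_rt = [right - left for left, right in zip(rt_left, rt_right)]
    n = len(numbers)
    mean_x = sum(numbers) / n
    mean_y = sum(d_rt) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(numbers, d_rt))
    sxx = sum((x - mean_x) ** 2 for x in numbers)
    return sxy / sxx


# Hypothetical mean RTs (ms) for one participant, made up for illustration.
numbers = [1, 2, 3, 4, 6, 7, 8, 9]
rt_left = [500, 505, 510, 515, 525, 530, 535, 540]
rt_right = [540, 535, 530, 525, 515, 510, 505, 500]

slope = snarc_slope(numbers, rt_left, rt_right)
print(f"SNARC slope: {slope:.1f} ms per unit of magnitude")
```

Note that a reliable slope of this kind only establishes a number-space association in the responses; as discussed above, it does not by itself show whether the association arises in semantic representations, in working memory, or at response selection.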

## **fMRI**

In the fMRI domain, a recent study used multi-voxel pattern analysis (MVPA) to test for brain areas that are sensitive to the congruency between visually presented numerical and spatial intervals (Koten et al., 2011). These areas were intraparietal sulcus, frontal eye fields, and supplementary motor areas. With respect to motor cortex, it has been reported that brain activation evoked by visually presented small numerals in the precentral gyrus was lateralized according to whether participants reported usually using their left or right hand to gesture small numbers, i.e., small numerals mainly activated left-lateral premotor cortical regions in "right-starters," and right-lateral premotor cortical areas in "left-starters" (Tschentscher et al., 2011).

## **EEG/MEG**

Only few studies have addressed questions about the functional locus of the SNARC effect using ERP methodology. Two studies have provided evidence that spatial-numerical associations occur at a response preparation stage rather than during semantic processing, e.g., using lateralized readiness potentials (Keus et al., 2005; Gevers et al., 2006). To our knowledge, there are no ERP/ERF studies on finger-counting related effects in number processing.

## **Transcranial magnetic stimulation**

Several TMS studies have addressed the role of motor circuits and spatial-numerical associations in number processing. Some of them measured the modulations of MEPs during numerical tasks, and consequently do not provide causal evidence (Andres et al., 2007; Sato et al., 2007). A few studies have shown that stimulation of parietal areas, in particular around angular gyrus, reduces the effect of spatial-numerical associations, e.g., in a line-bisection task (Gobel et al., 2001; Cattaneo et al., 2009) or in a SNARC paradigm (Rusconi et al., 2007). The effect in the latter study was attributed to disruption of the link between numbers and visuo-spatial attention rather than to interference with core number representations. One TMS study provided evidence for a causal role of angular gyrus in both number processing and finger movements (Rusconi et al., 2005). However, to our knowledge no study has investigated the direct effect of motor cortex stimulation on number processing performance yet. For a more complete overview of this research area, see Sandrini and Rusconi (2009). Neuropsychological evidence for a functional overlap between the spatial and motor system and number processing is provided by the Gerstmann Syndrome (Gerstmann, 1924; PeBenito, 1987), which arises from damage to the left parietal lobule and leads to difficulties with number processing, orientation in space, control of actions, and representation of own body shapes (for review, see Butterworth, 1999).

## **ARITHMETIC**

## **Behavior**

In analogy to number processing, several behavioral studies have shown that arithmetic fact retrieval shows an "operational momentum," e.g., addition problems are associated with movements to the right and subtraction problems with movements to the left (Pinhas and Fischer, 2008; Knops et al., 2009). There is also behavioral evidence that simple arithmetic operations can involve finger-numerical representations in adults (Badets et al., 2010; Klein et al., 2011). For example, in a response-effect compatibility paradigm, Badets et al. (2010) observed faster responses to simple addition problems when congruent finger-counting gestures were presented. However, a recent study suggested that implicit finger-counting knowledge only impacted on simple arithmetic problem solving when participants were requested to use counting strategies (Imbo et al., 2011). This challenges the relevance of finger-counting knowledge for adults' simple arithmetic, considering that adults mostly use memory retrieval strategies instead of counting when solving simple arithmetic problems (LeFevre et al., 1996, 2006).

## **fMRI**

A number of fMRI and PET studies have shown activation in parietal and motor areas of the brain during arithmetic processing (see Arsalidou and Taylor, 2011 for review). However, to what degree these activations reflect embodied processes has only been addressed by a small number of studies. The operational momentum effect has been demonstrated in an fMRI study using an MVPA approach (Knops et al., 2009). This study showed that brain activation patterns in the frontal eye field were similar for real eye movements to the left and right on the one hand, and addition and subtraction problems on the other. Several fMRI studies have suggested a special role for visual-spatial processes in addition, and for verbal processes in multiplication (Chochon et al., 1999; Zhou et al., 2007; Grabner et al., 2009). In the motor domain, Andres et al. (2012) found common activation for mental calculation and finger representations in an fMRI conjunction analysis in bilateral horizontal intraparietal sulcus (hIPS) and posterior superior parietal lobule (PSPL), but not in motor cortex.

## **EEG/MEG and TMS**

Only a few ERP studies have investigated differences between arithmetic operation types with respect to sensory-motor concepts, and found evidence for stronger involvement of visual-spatial processes in addition and for verbal processes in multiplication in early time windows of arithmetic fact and rule retrieval (Zhou et al., 2006, 2009). To our knowledge, there is no evidence from EEG/MEG studies using source estimation that could shed light on the time-course of activation in sensory-motor systems during arithmetic fact retrieval. Similarly, evidence for a causal involvement of sensory-motor systems from TMS studies is still missing. In line with neuropsychological evidence from the Gerstmann Syndrome, specific impairment of simple arithmetic processes has been observed in patients with parietal cortex damage (Dehaene and Cohen, 1997; Lemer et al., 2003). However, there is no such evidence for impairment of arithmetic skills due to lesions in motor cortices.

In conclusion, there is some correlational evidence for a role of spatial-numerical and motor associations in numerical cognition. This evidence stems mostly from studies on number perception, and to a lesser degree from studies on arithmetic fact retrieval. However, time-course information or evidence for a causal link between sensory-motor systems and numerical processing is currently non-existent or scarce.

## **CONCLUSION**

We have reviewed the theoretical and methodological challenges that are faced by the neuroscientific investigation of embodied semantics. Although there are several theoretical approaches that plausibly accommodate a role of sensory-motor systems in semantic processing (Harnad, 1990; Barsalou, 1999; Pulvermüller, 2012), it remains a challenging empirical question to what degree cortical sensory-motor systems contribute to semantics in the fully developed brain. Among the different interpretations of the concept "embodiment" (Wilson, 2002), we focused on the role of neuronal sensory-motor systems in semantics (Meteyard et al., 2010). Going back to the mental imagery debate (Kosslyn, 1975; Anderson, 1978), we pointed out that any measurement of behavioral or brain responses will necessarily reflect a conflation of representations and processes, i.e., a mix of how information is stored and how it is retrieved in a particular context. We then highlighted the major strengths and weaknesses of the neuroimaging methods fMRI, EEG/MEG, and TMS for the investigation of sensory-motor systems in semantics, which guided our review of the empirical literature on this topic. Our review mainly covered the literature on concrete and abstract word processing, as well as number processing and arithmetic fact retrieval as special instances of abstract semantics.

## **SUMMARY AND INTERPRETATION OF THE EXISTING EVIDENCE**

Our review of the empirical literature revealed that the bulk of evidence for embodied word semantics stems from fMRI studies on concrete words, which have demonstrated that the perception of words leads to activation in cortical sensory-motor systems depending on their referents. Unfortunately, fMRI data are correlational, have very low temporal resolution, and are arguably the least conclusive with respect to the functional interpretation of these effects. There are far fewer studies on this issue in the EEG/MEG than in the fMRI domain. Nevertheless, the existing studies seem to converge on a latency of about 200 ms after stimulus onset for the earliest differences in brain responses between semantic word categories.

The interpretation of these effects as evidence for embodiment crucially depends on the neuronal generators of these signals, namely whether they originate in sensory-motor areas of the brain. The spatial resolution of EEG/MEG measurements, as well as source estimates derived from them, is inherently limited (Molins et al., 2008; Hauk et al., 2011). However, the existing evidence is consistent with the view that the generators of the early effects are distributed according to their sensory-motor associations, e.g., somatotopically in the case of action-words.

Only one TMS study has provided evidence for a causal link between sensory-motor systems and semantics. There is some evidence from neuropsychological studies that damage to sensory-motor areas can affect semantic processing. We conclude that there is strong evidence, although as yet no proof, that cortical sensory-motor systems subserve concrete semantics. However, some crucial evidence on the time-course of sensory-motor activation in word processing, and in particular on the causal effects of sensory-motor activation on language performance, is still scarce and inconsistent.

The evidence for embodied abstract semantics is clearly weaker than for concrete semantics. Evidence for behavioral interactions exists at the sentence level, but not for single words. Several studies have investigated fMRI responses to abstract sentences, such as idioms, but the findings are inconsistent. This may be due to experimental paradigms, stimulus selection and analysis methods, but clearly further research is needed to reconcile these studies. Evidence about the time-course of embodied abstract semantics, or the causal relationship between sensory-motor systems and abstract semantic processing, is almost non-existent. Similarly, research on numerical cognition has provided some evidence that sensory-motor systems may be involved in the retrieval and processing of numbers and arithmetic facts. The few ERP studies on this topic suggest that spatial-numerical associations play a role at the level of response selection rather than semantics. Questions about the time-course and causality of these effects should be addressed in more detail by further research. The techniques employed in research on concrete words may provide some guidance in this area.

How can the existing evidence inform theories of embodied semantics? Several theories posit a role for sensory-motor systems in semantic processing for concrete words (Harnad, 1990; Barsalou, 1999; Pulvermüller, 2012). Mechanistic models have been developed that describe how these theories may be realized by neuronal networks (Wennekers et al., 2006; Garagnani et al., 2008). It is therefore likely that the findings for concrete words reviewed above reflect the "grounding of the tree of semantic relationships" as in Harnad's framework, or the bottom level of "convergence zones" in Barsalou's framework, or "distributed category-specific networks based on Hebbian associations learning" in Pulvermüller's framework.

However, based on the existing evidence, it is difficult to define the functional role of sensory-motor systems in semantic processing more precisely. Harnad and Barsalou describe the idea of convergence zones at different levels of abstraction. This may be illustrated on the basis of an example provided by Wilson (2002): We may start learning about the meaning of numbers by gesturing small numbers with our fingers. At the beginning, we fully flex the corresponding fingers. When we get better at this, we just briefly twitch them. At some point, even this may not be necessary any more, but our motor cortex may still be activated, e.g., to aid our short-term memory. But why stop there? At some later stage, activation may only occur in areas that are several synaptic relays removed from motor cortex, e.g., in parietal or frontal lobes. It may then depend on the particular problem we need to solve, or the information we need to retrieve from the stimulus, which of these neuronal systems contributes to performance. Do we need information from sensory-motor systems in order to decide whether "tree" refers to a tool or not, or can the necessary information be retrieved from higher-level convergence zones (Harnad, 1990; Barsalou et al., 2003) or the semantic hub (Patterson et al., 2007; Pulvermüller et al., 2010)? Does this differ from the task of determining whether trees can be blue? Are sensory-motor systems necessary for every task that involves semantic processing?

In research on number processing, behavioral and ERP evidence suggests that effects of spatial-numerical associations (e.g., the mental number line) do not occur at the level of semantic representations, but rather during later strategic processes such as working memory or response selection (Gevers et al., 2006; van Dijck and Fias, 2011). In conclusion, novel experimental paradigms and analysis methods are required to define the role of sensory-motor systems in semantic processing in more detail.

#### **FUTURE DIRECTIONS: BEHAVIOR AND CONNECTIVITY**

Very few studies so far have demonstrated effects of activation in sensory-motor systems on task performance in word processing. It is still possible that word stimuli automatically activate distributed semantic networks, but whether these affect performance is not clear yet, and it may depend on the particular task. Future studies using correlational measures such as fMRI or EEG/MEG could use activation values in specific brain areas as predictors for behavioral measures such as reaction times. Novel methods of functional and effective connectivity analysis may shed some light on the connectivity between sensory-motor areas and possible convergence zones or hubs (Valdes-Sosa et al., 2011).
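
As a rough illustration of the brain-behavior analysis suggested above, the sketch below correlates single-trial activation in a region of interest with reaction times. It assumes that one activation estimate per trial has already been extracted (e.g., an fMRI beta estimate or an EEG/MEG source amplitude); the helper `pearson_r` and all values are hypothetical and invented for illustration. A real analysis would more likely use a mixed-effects model across participants and items.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical single-trial values, made up for illustration:
# activation in hand motor cortex (arbitrary units) and lexical decision RT (ms).
activation = [0.2, 0.5, 0.1, 0.9, 0.7, 0.4]
reaction_time = [650, 610, 670, 560, 590, 630]

r = pearson_r(activation, reaction_time)
print(f"activation-RT correlation: r = {r:.2f}")
```

A negative correlation in such an analysis would be compatible with, but would not prove, a facilitatory contribution of the region to performance, because the measure remains correlational.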

Assuming that sensory-motor systems play an essential role in semantics, it is still an open question as to how the activity in these distributed areas is coordinated or bound together. Some authors have pointed toward the anterior temporal lobe as the semantic hub (Rogers et al., 2004; Patterson et al., 2007; Pulvermüller et al., 2010), while others have attributed this function to the angular gyrus, or possibly multiple regions (Binder and Desai, 2011). Novel methods for connectivity analysis may clarify this issue. If connection strengths between a brain region and sensory-motor systems were found to be modulated by semantic category (e.g., action- vs. object-word) or by task context (e.g., lexical vs. semantic decision), this would provide strong evidence that this region indeed serves as a hub. Even stronger evidence would be provided if these connection strengths also predicted behavioral performance, e.g., in categorization or identification tasks, or in naming.

EEG/MEG can be particularly useful in this endeavor, not just because they can distinguish "early" from "late" processing stages, but also because they allow different types of connectivity analyses. It is not yet clear how (or in how many different ways) brain areas communicate with each other. Oscillations are a possible candidate (e.g., Fries et al., 2007), and functional connectivity may be reflected in coherence or phase-coupling among brain areas within and across frequency bands (Schoffelen and Gross, 2009). In addition, effective connectivity can be assessed using measures of Granger causality or structural equation modeling (Kiebel et al., 2009; Valdes-Sosa et al., 2011). Effective connectivity measures even allow inferences about sources that are not reflected in the measured signal, such as subcortical generators (David et al., 2011). These developments will provide powerful tools to disentangle the distributed neuronal networks underlying semantic processing.
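
To make the notion of phase-coupling concrete, the following sketch computes a phase-locking value (PLV), one widely used phase-coupling index, between two phase time series. It assumes that instantaneous phases have already been extracted from band-limited source time courses (e.g., via a Hilbert transform); here the phases are synthetic, and the function name `phase_locking_value` and the chosen frequencies are illustrative assumptions rather than settings from any study cited here.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Length of the mean unit vector of the phase differences (0 = none, 1 = perfect locking)."""
    diffs = [a - b for a, b in zip(phases_a, phases_b)]
    mean_vector = sum(cmath.exp(1j * d) for d in diffs) / len(diffs)
    return abs(mean_vector)


# Synthetic phases (radians) for 1 s of data sampled at 1000 Hz.
n_samples = 1000
phases_a = [2 * math.pi * 10 * t / 1000 for t in range(n_samples)]      # 10 Hz oscillation
phases_locked = [p - math.pi / 4 for p in phases_a]                     # same frequency, fixed lag
phases_drifting = [2 * math.pi * 13 * t / 1000 for t in range(n_samples)]  # 13 Hz, lag drifts

plv_locked = phase_locking_value(phases_a, phases_locked)
plv_drifting = phase_locking_value(phases_a, phases_drifting)
print(f"PLV locked: {plv_locked:.3f}, PLV drifting: {plv_drifting:.3f}")
```

A constant phase lag yields a PLV near 1, whereas a phase difference that cycles uniformly over the analysis window yields a PLV near 0. Note that with EEG/MEG source estimates, such indices can be inflated by signal leakage between nearby sources, which is one reason why the limited spatial resolution discussed above matters for connectivity analyses.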

## **FUTURE DIRECTIONS: FLEXIBILITY AND AUTOMATICITY OF SEMANTIC PROCESSING**

Surprisingly few neuroscientific studies have systematically investigated the effects of task modulation on semantic word processing. Those that did mainly aimed to distinguish imagery from semantics, rather than to analyze semantic processing in more detail. There is growing evidence that word recognition is flexible (Balota and Yap, 2006; Norris, 2006), and that semantic word processing is sensitive to task demands (Martens and Kiefer, 2009; van Dam et al., 2012). A detailed investigation of the spatio-temporal brain dynamics under different well-defined task demands is still lacking, and should be the focus of future research on embodied semantics. For example, using the methodological approaches mentioned in the previous section, one could test how well activity in sensory-motor systems predicts behavioral performance in tasks that require different levels of semantic detail, ranging from "abstract or concrete?" to "does it involve handling with the index finger?".

A few recent studies have already investigated the effect of task demands on action-word processing. In an fMRI study, van Dam et al. (2012) investigated brain activation to words that had to be judged either for color or for action attributes. Areas in the left parietal lobes activated more for action-words than for abstract words, but only during action-related judgments. This was interpreted as evidence for flexible and context-dependent semantic processing. From these data it is not yet clear whether task demands affect early retrieval of semantic information, or later stages of processing. This type of experiment, for example systematically varying depth of semantic processing or type of semantic judgment, should also be performed with EEG/MEG methodology.

Furthermore, it will be important to test whether sensory-motor activation in semantics reflects the activation of the same neuronal populations as, for example, in movement execution or object perception, or whether it reflects the activation of different neuronal populations in the same areas. This can potentially be addressed using priming or adaptation paradigms (Henson and Rugg, 2003; Wheatley et al., 2005; Gold et al., 2006). Recent studies have introduced motor priming paradigms to the investigation of embodiment (Glenberg et al., 2010). In a recent combined EEG/MEG study, arm- and leg-related words were presented shortly after participants initiated the experimental trial themselves by button press (Mollo et al., 2011). In different blocks, they either pressed the button by finger or by foot, respectively. The button remained pressed until a letter string appeared. If this string was a real word, participants released the button as quickly as possible. If it was a pseudoword, they kept the button pressed until the end of the trial. In the source space analysis, the authors found an effect of congruency between effector used for the button press (finger or foot), and word type (arm- or leg-related). This congruency effect occurred around 150 ms after the onset of the letter string, and not only in motor cortex, but also in a left posterior superior temporal area. Thus, pre-activation of a specific part of the motor cortex led to word-type-specific modulation of brain activity in a non-motor language area at a very early stage of processing. This suggests that motor areas related to finger or foot movements are essential parts of neuronal cell assemblies for action-related words. Future studies could apply this paradigm with different movement and word types, and under varying task demands.

Words with multiple meanings are a particularly interesting case. Studies on single-word processing usually (often implicitly) assume that a word read in isolation activates its dominant meaning (e.g., that "kick" refers to hitting something with the foot, rather than to the feeling you get from riding a roller-coaster). This can only be studied in sentence context, which was not the focus of this review. It has been suggested that concepts are composed of parts that are context-dependent, and other parts that are context-independent (Barsalou, 1982). The spatio-temporal dynamics of polysemy may provide intriguing evidence for the flexibility of semantic processing.

Theories of embodied concrete semantics can, to some degree, be translated to abstract semantics as long as abstract concepts bear some relationship to concrete entities, by means of abstraction or metaphor (e.g., Lakoff and Nunez, 2000; Glenberg et al., 2008). This is clearly a fruitful field for future research. The question remains whether sensory-motor systems are also involved in "pure" abstract semantics. We are able to acquire concepts without sensory experience, e.g., by means of discourse and context (Bloom, 2001). Aspects of this process can be modeled by means of latent semantic analysis (Landauer and Dumais, 1997; Louwerse and Ventura, 2005). It will therefore be an important question for future empirical studies to what degree abstract semantic processing is driven by higher-level convergence zones, and to what degree lower-level sensory-motor systems are involved.

#### **GENERAL CONCLUSION**

We have demonstrated that even in a relatively circumscribed research area such as concrete and abstract semantics for single words, it is difficult to define the specific function of sensory-motor areas. The empirical evidence is still inconsistent, and its functional interpretation limited. As some authors have pointed out previously (Wilson, 2002; Meteyard et al., 2010), different interpretations of embodiment exist. The right question to ask may not be "embodied or not?" but rather "embodied to what degree?" The possibility that sensory-motor systems may contribute more or less to different types of semantic processes has so far received little attention in the neuroscientific literature, although similar arguments have been presented in the debate about the role of visual representations in mental imagery (Pylyshyn, 2002).

Furthermore, one may ask whether the role of sensory-motor systems in semantics, or more generally in cognition, differs among individuals – are some individuals more embodied than others? There is evidence that experience with particular types of concepts, e.g., in sport, music, and dance, may shape the way we process actions and language (Calvo-Merino et al., 2006; Beilock et al., 2008; Hoenig et al., 2011). The investigation of these questions will be an exciting endeavor for future research. However, as we have shown for the simple case of single-word processing, a number of important experimental and methodological challenges need to be addressed before we can arrive at firm conclusions. While our methods have certainly become more complex over the last few decades, the brain has not become simpler. The major scientific challenge will be to formulate questions that we are able to answer.

#### **ACKNOWLEDGMENTS**

We would like to gratefully acknowledge the support of the Medical Research Council UK (Olaf Hauk, Nadja Tschentscher: MC-060-5PR40) and the Gates Cambridge Scholarship (Nadja Tschentscher).

#### **REFERENCES**



Norris, D. (2006). The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. *Psychol. Rev.* 113, 327–357.

Page, M. P. (2006). What can't functional neuroimaging tell the cognitive psychologist? *Cortex* 42, 428–443.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 September 2012; accepted: 23 January 2013; published online: 13 February 2013.*

*Citation: Hauk O and Tschentscher N (2013) The body of evidence: what can neuroscience tell us about embodied semantics? Front. Psychology 4:50. doi: 10.3389/fpsyg.2013.00050*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Hauk and Tschentscher. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The functional role of the periphery in emotional language comprehension

## **David A. Havas<sup>1</sup>\* and James Matheson<sup>2</sup>**

<sup>1</sup> Department of Psychology, University of Wisconsin-Whitewater, Whitewater, WI, USA

<sup>2</sup> School of Cognitive Science, Hampshire College, Amherst, MA, USA

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Daniel Casasanto, Max Planck Institute for Psycholinguistics, Netherlands

Marta Ponari, University College London, UK

#### **\*Correspondence:**

David A. Havas, Department of Psychology, University of Wisconsin-Whitewater, 800 W. Main Street, Whitewater WI 53190, USA. e-mail: havasd@uww.edu

Language can impact emotion, even when it makes no reference to emotion states. For example, reading sentences with positive meanings ("The water park is refreshing on the hot summer day") induces patterns of facial feedback congruent with the sentence emotionality (smiling), whereas sentences with negative meanings induce a frown. Moreover, blocking facial afference with botox selectively slows comprehension of emotional sentences. Therefore, theories of cognition should account for emotion-language interactions above the level of explicit emotion words, and the role of peripheral feedback in comprehension. For this special issue exploring frontiers in the role of the body and environment in cognition, we propose a theory in which facial feedback provides a context-sensitive constraint on the simulation of actions described in language. Paralleling the role of emotions in real-world behavior, our account proposes that (1) facial expressions accompany sudden shifts in wellbeing as described in language; (2) facial expressions modulate emotional action systems during reading; and (3) emotional action systems prepare the reader for an effective simulation of the ensuing language content. To inform the theory and guide future research, we outline a framework based on internal models for motor control. To support the theory, we assemble evidence from diverse areas of research. Taking a functional view of emotion, we tie the theory to behavioral and neural evidence for a role of facial feedback in cognition. Our theoretical framework provides a detailed account that can guide future research on the role of emotional feedback in language processing, and on interactions of language and emotion. It also highlights the bodily periphery as relevant to theories of embodied cognition.

**Keywords: embodied cognition, language comprehension, simulation, facial feedback, emotion, botox, motor control, constraint satisfaction**

## **INTRODUCTION**

Language can cause powerful and reliable changes in the emotions of readers. A best-selling novel induces similar patterns of emotions across millions of independent readers. Yet, language is ambiguous at every level of analysis (Quine, 1960). How, in the face of this pervasive ambiguity, does language reliably influence our emotions? Proposed constraints in language understanding have ranged from innate, universal knowledge structures (Fodor, 1975, 1983) to probabilistic interaction between levels of linguistic representation (Kintsch, 1988).

For this special issue exploring frontiers in the role of the body and environment in cognition, we propose an alternative framework for describing interactions of language and emotion in which emotion constrains language processing through interactions between central systems for language and emotion processing, and the emotional periphery. In particular, we propose that facial feedback provides a context-sensitive constraint for guiding simulation of actions described in language. By the periphery, we mean aspects of the peripheral nervous system most closely associated with the emotions – the peripheral nerves and musculature of facial expression. The idea of peripheral constraints in high-level cognition is not new, although early peripheral theories of cognition made only limited progress (e.g., McGuigan, 1966).

Initial support for the account comes from embodied theories of cognition (Glenberg, 1997; Barsalou, 1999) that propose overlapping neural systems for processing both emotions and language about emotions (e.g., Niedenthal, 2007). The hypothesis that language about emotions will engage the same neural systems involved in real-world emotional experience is supported by research showing that lexical processing on words that directly name emotions (happy, sad, etc.) can be affected by emotional states (Niedenthal et al., 1997), and that strongly emotional words activate central circuitries of emotion (Citron, 2012). However, because existing theories have focused on language at the lexical level, they can't readily explain effects of emotions in language that doesn't explicitly describe emotions. While some parts of the neural systems for emotion and language may overlap, others may be dissociated, and natural discourse likely includes all possible combinations. Here we focus only on the most difficult case for a theory of language and emotion – the case where genuine emotion is felt at the periphery even though the driving sentence does not contain an emotion word. This approach allows us to account for findings that are not easily explained by existing accounts of emotion and language, and it generates novel predictions about the interaction of emotion and language.

Our account differs from previous embodied theories by focusing on how emotion influences language processing above the lexical level. Rather than proposing a common neural substrate for emotion and language, we suggest that emotion states influence the simulation of actions described in language. We articulate this claim by building on mechanistic theories of motor control and simulation that explicitly provide a role for peripheral feedback in ongoing behavior. Doing so allows us to explain evidence that emotion states impact language that is not explicitly emotional. Previous accounts are unable to explain such evidence because they fail to consider how emotion impacts language above the lexical level, and because they rely on the claim about overlapping neural systems for emotional language and states of emotion.

The account carries three important assumptions about how emotion interacts with written language (although the account may also apply to verbal language understanding). All three assumptions are based on a functional view of emotion (e.g., Frijda, 1986, 2007; Levenson, 1994; Keltner and Gross, 1999; Barrett, 2006), which proposes that emotions produce physical changes in the body for guiding effective actions in the world. First, facial expressions accompany sudden shifts in wellbeing as described in text, much as they accompany sudden shifts in wellbeing in real-world situations. Second, facial expressions modulate emotional action systems during reading, much as they modulate emotional action systems in real-world behavior. And third, emotional action systems prepare the reader for an effective simulation of the ensuing language content, much as they prepare the organism for effective real-world actions. In short, peripheral expressions of emotion constrain language comprehension, just as they constrain effective actions.

To support the theory, we have organized the paper into two halves that each focus on one of its main claims. The first half addresses the claim that the emotional periphery has a functional role in language comprehension. We draw on research regarding the role of bodily feedback in language comprehension, evidence for emotion-language interactions from embodied cognition, and evidence from facial feedback theories of emotion. We give special attention to a recent theory of language, the Action-Based Language theory (ABL; Glenberg and Gallese, 2012), which provides a mechanistic framework for describing peripheral-central interactions in language processing. To elaborate the theory, we consider modifications of the ABL framework that lead to testable predictions for future study. The second half of the paper addresses the claim that emotions constrain language comprehension. We review evidence that emotion constrains action, cognition, and simulation, and we address the neural systems that are likely involved in this function. We begin by reviewing the evidence from embodied theories of language comprehension.

#### **A ROLE OF THE PERIPHERY IN LANGUAGE**

#### **EMBODIED THEORIES OF EMOTIONAL LANGUAGE COMPREHENSION**

Embodied theories of cognition provide a straightforward explanation for the close link between language and emotion. These theories suggest that language processing involves a mental simulation grounded in bodily and neural states of action, perception, and emotion (Glenberg, 1997; Barsalou, 1999; Havas et al., 2007). By simulation, such theories generally mean a representation of the situations, objects, or events described in text that is instantiated in the same neural systems used in original experience. By grounding, it is meant that semantic processing involves modality-specific symbols, rather than abstract, arbitrary, or amodal symbols as proposed by classical theories of language (Barsalou, 1999). Thus, language about action and perception involves the same neural and bodily systems used in action (Glenberg and Kaschak, 2002; Hauk et al., 2004) and perception (Pecher et al., 2004; Kaschak et al., 2005; Tettamanti et al., 2005; Rüschemeyer et al., 2010).

To develop the claim that comprehension of emotional language involves emotion simulation, Havas et al. (2007) measured the time needed to comprehend sentences describing emotionally laden events when the participant was in a matching or mismatching emotional state. Sentences, while emotional, made little or no reference to emotion states. An example pleasant sentence is, "You and your lover embrace after a long separation." An unpleasant sentence is, "The police car pulls up behind you, siren blaring." They covertly manipulated emotion using a procedure developed by Strack et al. (1988) which involves holding a pen in the mouth to produce either a smile (holding the pen using only the teeth) or a frown or pout (holding the pen using only the lips and not the teeth). This procedure has been shown to reliably influence positive and negative emotional experiences in the absence of conscious mediation (Adelman and Zajonc, 1989). They expected an interaction such that the processing of pleasant sentences would be faster when the pen is held in the teeth (and participants are smiling) than when the pen is held in the lips (so that smiling is prevented), and vice versa for the time to process unpleasant sentences. This is precisely what was found, both when participants were asked to judge the emotionality of the sentences, and when they were asked to simply read the sentences.

Why should being in a particular emotional state facilitate comprehension of the sentence? As suggested above, one possibility is that simulation occurs at the lexical level. Emotion words might activate central emotion systems that are potentiated by a matching emotional state (but not by a mismatching emotional state). This account is consistent with lexical priming theories of emotion-cognition interactions (Bower, 1981, 1991), in which the pen manipulation activates an emotion concept (e.g., "happy"), which then primes words associated with that emotion. Words that occur in pleasant sentences might elicit more positive emotional activation or less negative emotional activation than words that occur in unpleasant sentences.

In a subsequent experiment, Havas et al. (2007) used the pen manipulation in a lexical decision task to test the lexical priming account of their findings. They used words taken from their stimulus sentences that were rated as being "central to the meaning of the sentence," as well as strongly emotional words taken from an emotion-word database. Although lexical decision times for words were speeded when preceded by semantically associated words (a classic priming effect), they were not speeded by the pen manipulation. Thus, a simple mood-priming account based on facial feedback is unlikely to explain the results.

Here, we develop an alternative, supra-lexical account of emotion simulation that focuses on the role of peripheral-central interactions in grounding emotional language. We propose that emotion states of the body are called upon in real-time processing of emotional language, and that feedback from these states helps constrain subsequent simulation of the language content. Although we agree that modality-specific systems are involved in language processing, and that partially overlapping neural systems are involved in both emotional experience and emotional language processing, this account differs from previous accounts in two ways: first, it provides a framework for examining emotion-language interactions above the lexical level; and second, it extends emotional grounding beyond central processing systems to account for influences of the emotional periphery.

Our account begins by integrating evidence for peripheral influences in language and emotion.

#### **EVIDENCE FROM EMBODIED THEORIES OF COGNITION**

How strong is the evidence for a role of the periphery in language comprehension? There is evidence from motor cognition research that peripheral action systems play a part in simulation (e.g., de Lange et al., 2006), but the equivalency of simulation in motor imagery and language processing is unclear (Willems et al., 2009). While embodied theories of language have provided strong evidence for interactions in the central nervous system between linguistic and non-linguistic neural processes, evidence for peripheral influence in language processing is weaker. For example, Zwaan and Taylor (2006) asked participants to turn a dial clockwise or counterclockwise as they read through a text. When the required hand movement conflicted with the action described in the text (e.g., "turn the volume down low"), the phrase took longer to read. The authors explain this finding in terms of ideomotor theories (e.g., Greenwald, 1970) in which the idea of an action (reading the sentence) potentiates its execution. Presumably, peripheral activity interacts with simultaneous central motor planning processes involved in imagining the actions conveyed by the sentence, although explanations based solely on central motor planning processes are also plausible.

A stronger example is based on a study of the impact on perceptual judgments of lifting actions, which are heavily shaped by proprioceptive feedback (Hamilton et al., 2004). Observers lifted a weight while they simultaneously judged a weight being lifted in a video. When the observers' weight was lighter than that in the video, they tended to overestimate the observed weight, and when their weight was heavier, they tended to underestimate the observed weight. This finding is surprising because it runs counter to the intuitive prediction that one's own movements should prime the interpretations of another's actions. Instead, the results demonstrate a repulsion effect: when the neural feedback of an action is dedicated to one task (lifting a weight), it is presumably unavailable for another task (visual judgment of weight), and this biases the perceptual judgment in a direction away from the current action.

A similar repulsion effect in language comprehension was reported by Scorolli et al. (2009). They tested for an impact of sentence processing on lifting actions. A priming-based account would predict that a sentence describing the lifting of a light object (e.g., pillow) would prime underestimates of the weight and result in faster lifting, whereas a sentence describing the lifting of a heavy object (e.g., tool chest) would prime overestimates of the weight, and slower lifting. Instead, after participants heard a sentence describing the lifting of a light object, they tended to lift light boxes more slowly (as if they overestimated the weight) and heavy boxes faster (as if underestimating the weight), and vice versa for sentences describing the lifting of a heavy object. While it's possible that the interactions occur solely in central processing, these findings suggest that simulation in language comprehension is sensitive to concurrent feedback from the body.

More compelling evidence that peripheral feedback plays a functional role in language comprehension comes from two studies using emotional language (Havas et al., 2010). First, electromyographic recording of facial muscle activity (EMG) during language comprehension showed that comprehension of emotional language generates corresponding emotional facial expression. Muscle activity was recorded from the specific facial muscles for producing angry and sad expressions (corrugator supercilii), and happy facial expressions (orbicularis oculi and zygomaticus major) while participants read angry, happy, and sad sentences, and pressed a button when the sentence had been understood. The dependent variable of interest was the activity of the three muscle groups between sentence onset and when participants pressed a button indicating they had read it. Stimulus sentences made little or no reference to emotions or emotion states: an example of a happy sentence is "The water park is refreshing on the hot summer day," a sad example is "You slump in your chair when you realize that all of the schools rejected you," and an angry example is "The pushy telemarketer won't let you return to dinner."

As predicted, facial muscles responded in an emotion-congruent way to the sentences (see **Figure 1**). In the corrugator (frown) muscle, activity was greater for sad and angry sentences than for happy sentences, and vice versa in the orbicularis and zygomaticus (smiling) muscles. Moreover, although the average reading times were several seconds, the muscular differentiation occurred rapidly – within 1000 ms of sentence onset.

A second, critical experiment asked whether peripheral feedback from emotion expression has a functional role in understanding emotional language. That is, does peripheral feedback from the facial expression contribute to language processing? For the study, first-time cosmetic surgery clinic patients about to receive botox injections in the corrugator muscle for treatment of glabellar (frown) lines were recruited. There were two reading sessions, just before botox injection and then 2 weeks after, wherein participants read the angry, sad, and happy sentences used in the above EMG experiment. Botox is a highly potent neurotoxin that causes temporary muscle denervation, and blocks muscle feedback by preventing release of acetylcholine (ACh) from presynaptic vesicles at the neuromuscular junction. Botox has also been shown to affect the intrafusal junction, reducing tonic afferent discharge (Rosales et al., 1996). Muscle relaxant effects result from the decrease in extrafusal muscle fiber activity and muscle strength within 1–3 days of injection, with peak weakening at around day 21 (Pestronk et al., 1976). It was predicted that paralysis of the muscle used in expressing emotions of anger and sadness would selectively affect comprehension of angry and sad, but not happy, sentences. As predicted, paralysis of the corrugator muscle selectively slowed comprehension of angry and sad sentences relative to pre-injection reading times, but happy sentences weren't affected.

**FIGURE 1 | Facial EMG change in microvolts from baseline (1000 ms before sentence onset) for emotional sentences across sentence quarters, and overall (inset; vertical bars represent mean EMG change during sentence presentation, and horizontal bars indicate significant comparisons) from Havas et al., 2010.** Activity in muscles for frowning (corrugator) and smiling (orbicularis and zygomaticus) diverges rapidly after onset of happy, angry, and sad sentences. The fourth sentence quarter corresponds to participants' pressing of a button to indicate they understood the sentence. Sentence presentation durations have been standardized.

This finding provides strong evidence for peripheral emotional feedback in language comprehension, but it is consistent with two accounts of emotion simulation. First, botox could have influenced participants' mood, perhaps by releasing them from anxiety, and this change in mood differentially primed the words found in the emotional sentences. This mood-congruency account is consistent with that of Bower (1981, 1991) and Niedenthal (2007) in that secondary, central changes in mood state drive the observed interaction. However, mood measures taken at each reading session showed no change in negative affect and a decrease in positive affect. Thus, the evidence supports a second account: that emotional feedback constrains simulation of the actions and events described in the language.

#### **EVIDENCE FROM FACIAL FEEDBACK THEORY**

Support for this conclusion comes from facial feedback theories of emotion. Darwin (1872/1998) laid the foundation for research on the role of feedback in emotion, stating "The free expression by outward signs of an emotion intensifies it. On the other hand, the repression, as far as possible, of all outward signs, softens our emotion" (p. 22). William James (1884) directed attention of emotion researchers to the autonomic nervous system (ANS) and viscera as a source of emotions, initiating a vigorous debate about the informational adequacy of the viscera in producing differentiated emotional feelings. However, James (1884) had intended to include motor, as well as visceral, feedback in his theory (p. 192). Allport (1924) carried this idea forward, suggesting that autonomic patterns differentiated only pleasant and unpleasant emotions, but that the somatic system further distinguished emotions within each broad class.

Tomkins (1962) and Gellhorn (1964) were the first to emphasize a crucial role of facial feedback in emotion experience. Tomkins argued that because the nerves of the face are more finely differentiated, they provide more rapid and complex feedback to central brain mechanisms than do the viscera. He also noted that facial expressions precede visceral changes during an emotion episode. Gellhorn (1964) suggested a neurophysiological route via the hypothalamus by which finely tuned facial feedback influenced cortical processing of emotion. Izard (1977) further contextualized the role of facial feedback by describing it as a necessary, but insufficient, component of emotion experience. Still, he agreed that differentiation in consciousness of emotions depends on the rapid and specific sensory feedback from the face.

Paul Ekman (1992) updated James' model of emotion, proposing that emotional situations trigger facial reactions, which then trigger specific patterns of autonomic response, and the combined somatic and autonomic patterns constitute emotional states. A good deal of evidence supports Ekman's view. First, Robert Levenson and colleagues have provided strong evidence that distinct emotional facial expressions produce differential ANS activity (Ekman et al., 1983; Levenson et al., 1990; Levenson, 1992). They used the directed facial action task, in which participants are instructed to pose their face into a prototypical facial expression. As a result, the subjects show emotion-specific ANS patterns, and report experiencing the expressed emotion (Levenson et al., 1990). In addition, similar facial responses are observed across diverse cultures (Ekman and Friesen, 1971; Ekman, 1972), suggesting that facial expressions reflect a universal, functional adaptation.

This function may be inherently social. A proposal from social cognition research suggests that emotional expressions may transmit automatically across individuals through a mechanism of "emotional contagion" (Hatfield et al., 1994). Studies have shown that observing facial expressions automatically activates facial mimicry in the observer's expressions (Dimberg, 1982; Hatfield et al., 1994), even in response to subliminally presented stimuli (Dimberg et al., 2000). Thus, feedback from the mimicry of another's emotion expression may produce a similar emotion state in the observer, allowing for the automatic and implicit convergence of emotions across individuals. Neuroimaging studies show that areas consistently found to be involved in both observation and execution of facial expression include emotional processing regions of the brain, like the amygdala, insula, and cingulate gyrus, as well as motor areas (Molenberghs et al., 2012). Recent efforts to focus on the neural correlates of automatic facial mimicry (as opposed to mere observation) have combined brain imaging with facial EMG. So far, these studies have reliably found automatic facial mimicry to engage the same emotional brain networks, including the amygdalar region, insula, and cingulate cortex (Schilbach et al., 2008; Heller et al., 2011; Likowski et al., 2012). The relevance of these brain areas to the present theory will be discussed in greater detail below.

Despite the strong evidence for a causal role of facial expressions in emotional processing, theorists differ as to whether this relationship is due to facial feedback (Tomkins, 1962; Laird, 1974; Izard, 1991), or facial efference (motor output). For an instance of the latter, Ekman (1992) argues for a central, direct connection between motor cortex and other brain areas involved in coordinating physiological changes. The controversy has persisted mainly because these two possibilities have been very difficult to separate experimentally, although progress may be made through methods that manipulate facial feedback more precisely (i.e., with botox; Havas et al., 2010). For example, a neuroimaging study showed that botox-induced paralysis of the corrugator muscle 2 weeks prior to a facial expression imitation task reduced activation in neural centers involved in emotion processing (namely, the amygdala and orbitofrontal cortex), relative to activation in the same subjects before injection (Hennenlotter et al., 2009). In addition, the authors found that botox treatment reduced the functional connectivity of the amygdala with the dorsolateral pons, a brain stem region implicated in control of autonomic arousal (Critchley et al., 2001). Results of this type provide convincing evidence for a role of facial feedback in modulating central circuitries of emotion.

An important recent finding is that facial feedback effects may be largest during processing of ambiguous emotional stimuli. This idea echoes those of earlier theorists that assign facial feedback to tasks involving more finely differentiated emotions (e.g., Allport, 1924; Izard, 1977). Using a quasi-experimental design, Davis et al. (2010) compared self-reported emotions in subjects who chose facial botox injections to subjects who chose control injections that do not paralyze the facial muscles. Subjects rated their reactions to emotional video clips of varying valence and intensity both before and after injections, but because injections were administered in muscles used in both positive and negative emotions, results were interpreted only in terms of the overall magnitude of emotional experience rather than the valence. Overall, they found that botox injections reduced the magnitude of emotional response to the video clips relative to control injections (of cosmetic filler that doesn't affect muscle activity). However, the reduction occurred only for video clips of mild positive intensity and not for strongly positive and negative clips. The authors suggest that the emotionality of strongly emotional clips is over-determined by responses of other, perhaps visceral, emotion systems.

Another recent study demonstrates that manipulation of facial feedback (both blocking and enhancing) impacts processing of ambiguous emotional stimuli – in this case, pictures of emotional faces. On the basis of findings that facial mimicry enhances perception of emotions in the faces of others (Goldman and Sripada, 2005), Neal and Chartrand (2011) asked participants to decode the expressions in pictures of faces. The facial expression stimuli were ambiguous in this task because they were completely obscured except for the region directly surrounding the eyes. In their first study, they compared patients who had received botox injections in facial muscles with patients who had received a control injection that did not impact facial feedback. Accuracy in emotion perception was lower in patients whose facial feedback was blocked. In a second study, the authors determined that the reverse was also true by amplifying facial feedback using a restricting facial gel that produces muscle resistance, which is known to increase proprioceptive feedback. Performance accuracy in the emotion recognition task increased relative to control participants, but this difference was absent in control tasks that are not supposed to involve facial mimicry.

In sum, the evidence from embodied cognition and from facial feedback theory suggests a functional role for facial feedback in emotion processing tasks, and perhaps particularly in tasks that involve automatic processing of emotionally ambiguous stimuli. Given this evidence, there are at least two ways in which facial feedback might influence a simulation of emotional sentences. First, facial feedback might contribute to a simulation by generating activity in modality-specific (i.e., emotional) systems of the brain. For example, feedback from a frown might potentiate the neural systems involved in sad or angry moods, and would thus enhance the recognition of language describing sadness or anger. However, this account fails to explain the absence of an effect of the pen in processing individual words in the study of Havas et al. (2007). Furthermore, this account fails to explain the absence of mood-congruent changes in the study of Havas et al. (2010). Evidence that botox injections selectively impact mood in non-clinical subjects is scant<sup>1</sup>. One study shows that patients who received botox injections in the frown muscle report normal levels of depression and anxiety, whereas patients receiving other cosmetic surgery treatments score in the borderline morbid range on these measures (Lewis and Bowler, 2009). However, because this study was correlational in design, participant self-selection cannot be ruled out. An alternative account that is consistent with the functional view of emotions outlined here is that emotion feedback allows context-sensitive modulation of perceptions, actions, and the simulation of actions in language. In this account, facial feedback contributes a highly sensitive source of information about the affective potential of the linguistic context that serves to constrain action simulation.

To understand how emotional feedback might constrain the simulation of action, we turn to a language-processing framework that explicitly provides a role for feedback in language: Glenberg and Gallese's (2012) Action-based Language theory (ABL). The ABL theory is based on the internal models framework of motor control, in which bodily feedback contributes to the acquisition and updating of internal representations for motor control. Glenberg and Gallese show how the framework (and peripheral feedback) can be applied to language, and they provide an explicit definition of simulation in language comprehension. After describing this work, we propose a modification of the ABL model for emotional language comprehension. By building on the ABL model, we aim to firmly ground our account in theories of action and to be explicit in our assumptions.

<sup>1</sup>Although evidence for mood changes from conscious, self-generated expressions is stronger (e.g., Duclos and Laird, 2001), these data may not bear on the specific, unconscious mechanisms we are able to isolate with botox. Furthermore, any mood changes due to facial feedback may be secondary to the amygdala-mediated changes that we propose occur in language processing.

#### **THE INTERNAL MODELS FRAMEWORK**

Computational approaches to motor control propose that the brain uses internal models for the control of behavior (Wolpert et al., 2001, 2003). Forward models (or predictors) provide a model of the relation between a motor command and the sensory (vision, proprioception, touch, etc.) consequences of that movement. The function of a predictor is to predict these sensory consequences so that, given a particular motor command, the sensory outcome can be anticipated. A predictor might model, for example, the sensory consequences of lifting a cup to drink (that it will be heavy with water).

On the other hand, inverse models (or controllers) do the inverse – they compute the context-sensitive motor commands necessary to accomplish a particular goal. A controller might model, for instance, the trajectory and velocity of arm movements needed to lift a cup to drink. A biologically plausible account of how controllers are formed is through feedback error learning (FEL; Kawato, 1990, 1999). FEL uses performance error, or the difference between the desired and actual trajectory, for learning how to control movements. Much like the cruise control in a car, a controller monitors sensory feedback and continually adjusts motor output in order to maintain the desired outcome. Through this feedback error computation, the controller learns a functional mapping from motor commands to goal-based actions.
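The cruise-control analogy can be made concrete with a minimal sketch of feedback error learning. It is an illustration under stated assumptions rather than Kawato's formulation: the plant (a unit mass slowed by viscous drag), the single-weight inverse model, the function name `run_feedback_error_learning`, and all parameter values are hypothetical. The one feature taken from the text is that the corrective feedback command doubles as the teaching signal for the controller.

```python
def run_feedback_error_learning(steps=300, dt=0.1, drag=0.5, target_speed=1.0,
                                feedback_gain=2.0, learning_rate=0.5):
    """Toy cruise-control loop trained by feedback error learning.

    All parameter values are illustrative assumptions, not fitted quantities.
    """
    speed = 0.0    # current state of the (hypothetical) plant
    weight = 0.0   # single parameter of the learned inverse model
    feedback_trace = []
    for _ in range(steps):
        error = target_speed - speed             # desired minus actual outcome
        u_feedback = feedback_gain * error       # corrective feedback command
        u_feedforward = weight * target_speed    # inverse model's command
        command = u_feedforward + u_feedback
        # Assumed plant: unit mass slowed by viscous drag.
        speed += dt * (command - drag * speed)
        # FEL rule: the feedback command itself is the teaching signal.
        weight += learning_rate * u_feedback * target_speed
        feedback_trace.append(abs(u_feedback))
    return speed, weight, feedback_trace


speed, weight, feedback_trace = run_feedback_error_learning()
print(f"final speed={speed:.3f}, learned weight={weight:.3f}")
print(f"feedback command: first={feedback_trace[0]:.3f}, last={feedback_trace[-1]:.3g}")
```

In this toy setting the learned weight approaches the drag coefficient, which is the command per unit speed needed to hold the target, and the feedback command shrinks toward zero as the inverse model takes over.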

For control of simple actions like reaching for a cup, multiple predictor-controller pairs, or modules, are used (see **Figure 2**; Wolpert and Kawato, 1998; Wolpert et al., 2003). But even simple actions are ambiguous. For example, lifting a cup when it is full has different dynamics than when the cup is empty, so different modules will be needed for each contingency and several modules may initially become activated<sup>2</sup>. The actual motor command is a weighted function of the outputs from the active controllers, where the weighting of each controller is determined by two factors: the prior probability that each module is actually appropriate for the current context (the object affords action; Gibson, 1979), and the posterior probability, which is determined by prediction error. For example, if the selected module was not correct, then the prediction error will be large and this will decrease the module's responsibility weighting. Thus, bodily feedback provides an ongoing signal for deriving contextually appropriate actions in real-time motor control. Bodily feedback may be particularly important when dealing with novel contexts. Recent evidence suggests that feedback gains are increased during early stages of learning when the appropriate controller is ambiguous (Franklin et al., 2012).

**FIGURE 2 | A simplified internal models framework based on Glenberg and Gallese's (2012) ABL model.** Here, we add a signal for learning to predict the reward of actions. Multiple modules, composed of paired predictors and controllers, anticipate the sensory and affective consequences of actions. Prediction error, derived from the actual sensory and affective consequences, drives learning in the controller and adjusts the responsibility for a particular module. As in Glenberg and Gallese's model, actual motor output is a weighted function of modules, higher-level modules provide hierarchical control of goal-based actions in the form of prior probabilities that influence lower-level module selection, and a gain controller is added for simulation in language comprehension.
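The responsibility weighting described above can be sketched for the full-cup versus empty-cup case. This is a minimal illustration, not the published model: the Gaussian likelihood over prediction error, the noise parameter `sigma`, the function names, and every numerical value (loads and commands in arbitrary units) are assumptions chosen for the example.

```python
import math


def responsibilities(priors, predictions, observed, sigma):
    """Posterior responsibility of each module after sensory feedback.

    Assumes a Gaussian likelihood of the observed sensation around each
    predictor's output; sigma is a hypothetical noise parameter.
    """
    likelihoods = [math.exp(-((observed - p) ** 2) / (2 * sigma ** 2))
                   for p in predictions]
    unnormalised = [prior * lik for prior, lik in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def blended_command(weights, commands):
    """Motor command as a responsibility-weighted sum of controller outputs."""
    return sum(w * c for w, c in zip(weights, commands))


# Two hypothetical modules: "full cup" and "empty cup".
priors = [0.3, 0.7]        # context makes an empty cup seem more likely
predictions = [3.5, 1.0]   # load each predictor expects to feel
commands = [4.0, 1.5]      # lift force each controller would issue
observed_load = 3.4        # actual feedback: the cup is heavy

before = blended_command(priors, commands)
posterior = responsibilities(priors, predictions, observed_load, sigma=0.5)
after = blended_command(posterior, commands)
print(f"command from priors alone: {before:.2f}")
print(f"responsibilities after feedback: {[round(w, 4) for w in posterior]}")
print(f"command after feedback: {after:.2f}")
```

The empty-cup module predicts the load poorly, so its large prediction error drives its responsibility toward zero even though it started with the higher prior, and the blended command shifts toward the full-cup controller.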

For goal-based actions, Wolpert and colleagues (Haruno et al., 2001) have proposed that higher-level modules for goal-based action (say, drinking) learn to coordinate a sequence of lower-level actions, like reaching to grasp, lifting a cup, and taking a drink. The higher-level controller generates prior probabilities that lower-level modules are needed, while the higher-level predictor predicts the posterior probabilities that lower-level modules are accurate. This hierarchical organization reflects the neural architecture of the motor control system where at higher cortical levels the motor system is organized into actions rather than individual movements (e.g., Umilta et al., 2008). This feature allows the motor system to create combinations of elementary units that are contextually appropriate, or that satisfy multiple simultaneous constraints. It has been noted that both language and motor control share this quality (McClelland et al., 1988).

#### **THE ACTION-BASED LANGUAGE THEORY**

In their ABL theory, Glenberg and Gallese (2012) propose that the same solution used in motor control was exploited through evolution by language. They link language and action through the neural overlap between the mirror neuron system for action and Broca's area in the inferior frontal gyrus (IFG) for speech articulation (see also Fadiga et al., 2006). The mirror neuron system encodes motor intentions (either observed or executed), including the motor intentions behind heard or observed speech acts. Because motor actions often co-occur with speech in human development (e.g., a parent might say the word for an action while they demonstrate that action to a child), speech articulation primes motor action, and vice versa, through associative Hebbian learning. For example, the module for articulating a word like "drink" is associated through social development with the module for the motor actions involved in drinking. Likewise, language about a "girl" activates the module used to predict the sensory consequence of moving the eyes to see a girl illustrated in a children's book.

<sup>2</sup>Although modules are unnecessary in dynamical systems approaches to motor control (e.g., Churchland et al., 2012), Wolpert and Kawato (1998) discuss several advantages to using a modular approach. One important advantage for present consideration is that language is modular, or conveyed by discrete units in the form of phonemes or words. Productivity in language is accomplished by combining these discrete units, from different levels, in novel ways (Hockett, 1960), much as Wolpert and Kawato propose for the production of novel movements.

For their model of language comprehension, Glenberg and Gallese add a gain control for the gating of sensory feedback, and for inhibiting motor output in "offline" simulation, imagery, planning, practice, and language (see **Figure 2**). For example, if the gain is set to inhibit motor output, but the predictor is free to make sensory predictions, then the output resembles mental imagery<sup>3</sup>.
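The effect of the gain control can be illustrated with a small sketch. It assumes, purely for illustration, a linear predictor and two scalar gains (one on motor output, one on incoming sensory feedback); the function name `run_module`, its parameters, and the values used are hypothetical and are not part of Glenberg and Gallese's specification.

```python
def run_module(command, motor_gain, feedback_gain,
               actual_sensation=0.0, forward_slope=0.5):
    """One step of a single module under gain control.

    forward_slope is an assumed linear forward model mapping the motor
    command to its predicted sensory consequence.
    """
    predicted_sensation = forward_slope * command   # predictor runs regardless
    overt_output = motor_gain * command             # gated motor output
    # Only gated feedback contributes to the prediction error signal.
    gated_error = feedback_gain * (actual_sensation - predicted_sensation)
    return {
        "overt_output": overt_output,
        "predicted_sensation": predicted_sensation,
        "gated_error": gated_error,
    }


# "Offline" simulation: both gains closed, so nothing moves and no feedback
# enters, yet the sensory prediction is still generated.
offline = run_module(command=4.0, motor_gain=0.0, feedback_gain=0.0)

# Online control: both gains open, so the command is executed and the
# mismatch with actual feedback is available for learning.
online = run_module(command=4.0, motor_gain=1.0, feedback_gain=1.0,
                    actual_sensation=2.5)
print(offline)
print(online)
```

With both gains closed, the module produces the same sensory prediction as in online control but no overt movement, which corresponds to the imagery-like output described in the text.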

As Glenberg and Gallese illustrate, the ABL model gives an account of simulation in comprehending a sentence like, "The girl takes the cup from the boy." First, motor output gain is set low to avoid literally acting out the actions described in the language. Upon hearing words for objects or individuals (e.g., "The girl"), speech action controllers are activated, which in turn activate the associated action controllers for interacting with those objects or individuals. Output from the controller produces a prediction of the sensory consequences of such interaction, akin to mental imagery of the objects or individuals. Upon hearing verbs (e.g., "takes"), speech controllers pass activation to multiple possible action controllers that could fulfill the goal of the action (e.g., reach to grab), and the controller is selected according to the prior probability that it can fulfill the goal. After processing an image of "the cup," selection of the next controller is weighted by prior probabilities for the objects of such actions (e.g., "from the boy" affords receiving a cup, whereas "from the tree" does not). Importantly, the prior probability assigned to each action controller depends in part on the action's affordances (Gibson, 1979) for fulfilling the higher-level goal conveyed by the language (to drink). As Glenberg and Gallese put it, "comprehension is the process of fitting together actions suggested by the linguistic symbols so that those actions accomplish a higher-level goal . . ." (Glenberg and Gallese, 2012, pp. 12–13). If goal-based actions can't readily be integrated ("The girl takes the cup from the tree"), then comprehension is challenged.

#### **INTERNAL MODELS FOR EMOTIONAL LANGUAGE**

Although not addressed by Glenberg and Gallese, we believe the internal models account of comprehension carries an important additional implication regarding cases where comprehension is challenged, or in the language of the framework, where there is a failure to select modules that fulfill the higher-level goal of the language. Such cases should result in performance error and a consequent adjustment in controller output. In online motor control, feedback from such controller output provides contextual information for adjusting the unfolding action. In online language understanding, controller output could serve a similar function for guiding an unfolding simulation. Context should be particularly useful when the actions needed to simulate the meaning of the sentence are ambiguous, or underspecified in the language. Context, which we take here to mean the current state of body-world interactions (or affordances), helps to guide the selection of an appropriate controller. Thus, the model suggests that language will call on the body when comprehension is challenged by underspecified affordances for action-object integration.

This implication suggests a way that emotions interact in language. The following proposal rests on the assumption that emotions accompany a sudden change in wellbeing relative to the current state, and that they automatically lead to actions that can capitalize upon, or mitigate, that change (see also Frijda, 1986, 2007). To illustrate this assumption, imagine encountering a bear while walking in the woods. The experience would automatically engage modality-specific neural systems, including emotion systems that motivate actions. Quickly, both the body and brain would be reconfigured for taking adaptive actions. And because the body has changed, the affordances of the situation have changed: a walking stick in your hand may now be readily perceived as a potential weapon for defense. As this scenario illustrates, the most effective action in an emotional situation is determined by the combination of changes in bodily preparation for action, and the affordances provided in the environment. The neurophysiological bases for such changes are discussed in the following sections.

In understanding a sentence, affordances for effective action must be provided by the language. We propose that language that describes a change in the state of wellbeing that invites but underspecifies effective action will make module-selection difficult, and this will lead to an increase in motor output in the form of facial patterns that reflect an estimate of the affective change described in the language (e.g., improvement or decline is reflected by a smile or a frown, respectively)<sup>4</sup>. For example, a reader can only infer the most effective actions when understanding the meaning of a sentence like "The water park is refreshing on the hot summer day." Effective actions might include wading and splashing in the water – actions that would allow someone to capitalize on the potential for relief from heat, as implied by the sentence. Because understanding the language requires that the reader infer these actions (they are not made explicit in the language), the result will be facial efference in the form of a smile. By extension, language that describes a shift in the state of wellbeing in which effective action is over-determined may not elicit facial efference. The effective actions in the sentence, "You slam on the brake and curse when a driver cuts you off," are already well specified. Although the language is emotional in both cases, we hypothesize that the former sentence should lead to greater facial efference than the latter.

<sup>3</sup> In most cases, the gain control inhibits most movement, but some movement may not be completely inhibited, as seen in gesture that accompanies speech. Gesture research has shown that tasks that involve more strongly simulated actions are more likely to evoke speech-accompanying gestures (Hostetter and Alibali, 2008, 2010), even when communicative demands are held constant across tasks.

<sup>4</sup>Because facial muscles produce tonic afferent discharge, a decrease in facial muscle output would be informative for module-selection as well.

Although our account is speculative, the previous sections have reviewed a wide range of evidence for a key feature of the theory – a role for the emotional periphery in language comprehension. The following sections review a wide range of support for a second key feature of the theory – that emotion constrains language comprehension. To bolster the claim, we first show how emotion constrains action, cognition, and simulation. We then address the neural bases for emotion constraints in language comprehension before we consider additional features of the theory.

## **EMOTION CONSTRAINS LANGUAGE COMPREHENSION**

#### **EMOTION CONSTRAINS ACTION**

Most likely, emotions evolved to prepare organisms for effective actions. When we are angry, our fists clench, our heart rate is increased, and we are prepared for aggressive or defensive actions. When sad, our posture deflates, our heart rate decreases, and we experience loss of energy. In short, our emotions constrain our future possibilities for action.

Early emotion theorists recognized that different emotions correlate with distinct changes in the body. Following James's (1884) infamous emotional feedback theory, in which he equated bodily feedback with the subjective experience of emotion, appraisal theorists (e.g., Arnold, 1960; Frijda, 1986, 2007) proposed implicit cognitive processes that mediate between an emotional stimulus and the bodily response. On the other hand, strong theoretical arguments (Zajonc, 1980; Murphy and Zajonc, 1993) and neuroscientific evidence suggest that emotional situations can organize action systems directly without any intervening cognitive processing.

Working in rats, LeDoux (1996, 2002) identified the amygdala as a critical structure in mediating fear learning. The central nucleus of the amygdala initiates fear responses, including freezing, escape, and autonomic changes, and the basal nucleus projects to motor circuits in the ventral striatum where information about an aversive stimulus contributes to action selection (Alexander and Crutcher, 1990). Because the pathway from thalamus to the amygdala bypasses the cortex and is thus more direct than the cortical route, it provides a neural mechanism by which emotional situations directly influence emotional behaviors, bypassing cognitive processes.

Regardless of whether amygdala activation from emotional stimuli arises in humans via direct or indirect pathways (for debate on this question, see Pessoa and Adolphs, 2010; Cauchoix and Crouzet, 2013), the critical finding for the present purpose is that activity in the amygdala appears to correspond to changes in the current state of wellbeing. In monkeys, the amygdala has been shown to be highly sensitive to the value of a reward relative to the current state of the body (Paton et al., 2006; Belova et al., 2007, 2008). In humans, a similar mechanism has been demonstrated with a procedure called backwards-masking, where an emotionally arousing stimulus is presented very briefly and is then followed by a neutral stimulus that blocks the emotional stimulus from entering consciousness. Such unconsciously presented fearful stimuli have been shown to cause increases in skin conductance and heart rate that reflect autonomic arousal (Esteves et al., 1994). The specific brain changes that occur during unconscious emotion processing have been examined by combining the backwards-masking procedure with fMRI. When participants in an fMRI scanner are presented with pictures of either fearful or happy faces for a subliminal duration, followed by neutral faces, the subliminally perceived emotional faces cause differential activity in the amygdala (Whalen et al., 1998). Fearful masked faces increased amygdala activity, whereas the happy faces decreased amygdala activity. Thus, cross-species evidence indicates that emotional stimuli organize action systems immediately, sometimes unconsciously, to fulfill the goals at hand. Action is central in emotion in part because emotional responses are implemented in the form of action tendencies, or bodily responses that potentiate adaptive actions. That is, emotions constrain bodily actions.

There is evidence that the amygdala also responds to changes in wellbeing that are signaled by symbolic or linguistic stimuli. Phelps et al. (2001) told participants that they might receive an electric shock to the wrist paired with one stimulus (a blue square), but that another stimulus (a yellow square) signaled that no shock would occur. Using fMRI, they found that presentations of the symbol connoting threat preceded activation of the left amygdala, which correlated with the physiological expression of fear learning. They also found a correlation between the expression of fear and activity in the left insula, an area involved in cortically representing the affective state of the body. This suggests that the left amygdala is involved in the expression of fears and associated bodily states that are imagined through the use of symbols. Amygdala activation has consistently been observed in response to the presentation of emotional words (reviewed in Citron, 2012), and during reading of emotionally intense narratives (Wallentin et al., 2011).

Based on this association with emotional language comprehension in humans, we propose that the amygdala encodes changes in wellbeing described in language. For example, amygdala responses to reading about a sudden improvement in outlook ("Incredibly, the numbers drawn all match those on the ticket in your hand") marshal autonomic (perhaps parasympathetic) resources involved in joy, whereas amygdala responses to reading about a sudden decline in wellbeing ("Your grandmother had a stroke and is in critical condition") elicit other, perhaps sympathetic, changes in the autonomic nervous system (ANS). These autonomic modulations serve to constrain the possibilities for action, and thus constrain the possibilities for action simulation.

A defining feature of emotions is that their effects are often systematic, phasically influencing a range of actions in a hierarchical manner (Panksepp, 1998). The ANS regulates cardiovascular, gastrointestinal, electrodermal, respiratory, endocrine, and exocrine organs in support of action responses to challenge and opportunity (Levenson, 1992, 2003). Several theorists have proposed that emotions are organized at higher functional levels, constituting two basic motivational circuits (Lang et al., 1990; Davidson, 1992; Gray, 1994). For example, Lang and Bradley have proposed that emotions are organized around two motivational systems, appetitive and defensive, mediated by distinct systems at cortical and limbic levels (Lang et al., 1990; Bradley et al., 2001). In terms of actions, this division translates roughly into behaviors of approach and withdrawal, respectively, where appetitive activation generally leads to approach behaviors, and defensive activation generally leads to withdrawal behaviors (Davidson, 1992, 1995, 1998).

An important consequence of this hierarchical organization is that emotions constrain actions in a probabilistic, rather than deterministic, manner. Top-down emotional constraints on action will be modified by bottom-up constraints of the environment. Thus, emotion states don't correspond to specific actions, but rather something much like action tendencies, so that the same emotion state may lead to categorically related but unique actions depending on the particular context. For instance, at the highest level of organization, motivational engagement of the defensive system may prompt different emotion states like fear or anger, depending on whether the situation calls for flight or fight (Lang et al., 1990). And at a lower level of emotional action, anger may or may not lead to striking out, depending on whether the confrontation escalates or is averted. Thus, effective actions are jointly influenced by underlying emotion states and the sensorimotor affordances that arise in the situation (Gibson, 1979). In our formulation, these joint functions are served by the global autonomic changes elicited by the face, and the simulation of action as guided by the language. Next, we discuss evidence and theory that emotion is capable of constraining cognition.

#### **EMOTION CONSTRAINS COGNITION**

Several theorists have proposed that emotion systems help guide cognitive processes (Pribram, 1969; Nauta, 1971; Damasio, 1994). Here we only briefly discuss one kind of cognition: decision-making. Damasio and colleagues observed that patients with lesions in the prefrontal cortex (ventromedial prefrontal cortex, VMPFC) were severely impaired in personal and social decision-making, and in particular had difficulty in anticipating future positive and negative consequences of their actions, in spite of otherwise preserved intellectual abilities, including language (Damasio, 1979, 1994). Their decision-making is often slow and error prone, and sometimes random and impulsive. However, immediately available rewards and punishments do influence their behavior. Whereas most people show increased skin conductance (a measure of autonomic arousal) in anticipation of a risky choice, even before they explicitly know the choice is risky, VMPFC patients do not.

To account for the pattern of deficits, Damasio et al. (1991) and Damasio (1994) proposed a somatic marker hypothesis in which the components of a complex experience are recorded in modality-specific neural systems, and these records become associated with the emotional response that occurred during the experience. The VMPFC is responsible for learning the associations between a complex situation (e.g., walking in the woods and encountering a bear) and the accompanying emotion state (e.g., fear), and for reactivating the emotion state when components of the original experience are later encountered (e.g., seeing the walking stick by the door might reactivate feelings of fear). This function is valuable in that it provides an implicit emotional "marker" which signals the value of each decision before action is taken. Emotion reactivation can occur via a "body-loop," whereupon the viscera actually change and the ensuing changes are relayed to somatosensory cortices, including the insula. Or, emotional changes can occur via an "as-if-body-loop" where signals are conveyed directly to the cortex, bypassing the physiological changes. Together, the insula and anterior cingulate gyrus may be important in integrating cortically mediated cognitive functions with somatosensory and autonomic changes (see also Medford and Critchley, 2010).

When do decisions engage the "body-loop" or "as-if-body-loop"? Bechara and Damasio (2005) suggest the "body-loop" becomes increasingly important under circumstances of uncertainty or ambiguity. For example, normal subjects generate few skin conductance responses during tasks that involve decision-making under relative certainty, compared to tasks involving decision-making under ambiguity. It is intriguing to note the parallel with the internal models framework in which peripheral feedback is particularly important during learning of tasks with novel (ambiguous) dynamics.

By providing a representation of "what it feels like" to be in a particular situation, a somatosensory pattern in the insula may be particularly important in constraining a simulation of actions. First, through strong projections to the amygdala, the insula can modulate actions by influencing ANS changes. Second, the emotional somatosensory pattern helps to constrain the process of reasoning over multiple options and future outcomes by marking the sensory components, which describe a related scenario of future outcome, as good or bad. Somatic states influence cognitive processing by acting as a biasing signal, and can be used to rapidly accept or reject certain option-outcome pairs. Without this function, the decision process would depend entirely on logic operations over many option-outcome pairs, which is slower and may fail to account for previous experience – just the pattern of behavior seen in VMPFC damaged patients.

Damasio (1994) proposes that emotional representations for use in social communication have their own distinct structure, the anterior cingulate cortex (ACC), stemming from observations of patients with damage in this area. Whereas damage to the face area of the motor cortex will impact the ability to voluntarily make a smile, it spares the ability to make a genuine, spontaneous smile. Conversely, emotion-related movements originate in the ACC, and patients with damage to this area show abnormal spontaneous facial expressions of emotion, but normal voluntary facial movement.

Damasio's proposed mechanism by which somatic state representations influence cognition is through the activation of neuromodulator nuclei that project to cortical networks. Bechara and Damasio (2005) hypothesize that the biasing action of somatic states on response selection is mediated by release from the major neurotransmitter systems, dopamine (DA), serotonin (5-HT), noradrenalin (NA), and acetylcholine (ACh), whose nuclei are located in the brainstem. Changes in neurotransmitter release induced by somatic state signals modulate the synaptic activities of cortical neurons subserving behavior and cognition, thereby providing a mechanism for somatic states to exert a biasing effect on cognition. In their account, these two neural systems of emotion (neuromodulation and somatic markers) interact to provide predictions about "what it feels like" to engage in particular actions. Ascending neuromodulators facilitate computation of future rewards given the current state of the body, thereby constraining action selection in frontal cortices.

Although the somatic marker hypothesis has provided evidence for a constraining role of emotion in one kind of cognitive task that involves simulation (of future rewards in decision-making), additional evidence comes from tasks that more closely resemble language comprehension.

#### **EMOTION CONSTRAINS SIMULATION**

In the view we are presenting, emotional language calls upon emotion systems of the body that constrain a simulation of actions and events described by the text. Our view differs from other simulation accounts in that emotion simulation occurs even in the absence of explicit affective information like emotional words. That is, we assume readers will use their own emotional knowledge to make inferences based on described actions or events that are not explicitly emotional. Thus, readers bring to bear two sources of information in understanding language: external information provided by actions in the language, and internal information provided by an emotional inference mechanism.

This feature of our theory bears a resemblance to theories from several other areas of research, which we briefly mention here. First, discourse comprehension research shows that readers readily bring their knowledge of emotions to make inferences about story characters' emotions (Gernsbacher and Robertson, 1992; Gernsbacher et al., 1992, 1998; Haenggi et al., 1993). Moreover, readers make such emotional inferences just as readily in the absence of explicit emotional information, simply from descriptions of story characters' actions, as they do when emotional information is present (deVega et al., 1996; Gygax et al., 2007). Thus, our theory is congruent with research from discourse comprehension.

An important claim of our view is that readers' emotions serve to constrain interpretation of the language. This idea can be traced back at least to "reader's response" literary theorists who argued that the reader's personal experiences provide the basis for textual understanding (Iser, 1978). Some empirical support for this notion is provided by theorists of literary appreciation (Miall, 1988, 1995; Miall and Kuiken, 1994) who argued that emotions play a primary role in appreciating literary stories. In one study (Miall, 1988), participants read short stories phrase-by-phrase while reading times were collected. Afterward, participants rated each phrase for its emotional significance ("Is feeling significant to this phrase?"), and correlations between reading times and affective ratings were measured. There was a positive correlation in the early part of the stories where readers are presumably using affect to guide a search after meaning. Correlations became negative later in the story, presumably because affect is now confirming the interpretations set up in the early part. Citing Damasio's patients with VMPFC damage who are unable to select among possible response options, Miall (1995) speculates that in reading literature, this deficit might present as a failure to decide among possible inferences about a sentence in a story. However, because the methods used by literary theorists often focus on post-comprehension processes, they can't speak to how emotional states are generated to begin with. As described above, our view is that facial expressions are generated at points of ambiguity.

Our theory also bears a resemblance to social cognition research into "mentalizing," or the ability to explain and predict behavior of others in terms of one's own mental or emotional states (Frith et al., 1991) and empathy, or the ability to share the feelings of others (Decety and Lamm, 2006). Because the mental states of others are not directly observable, they must be inferred solely on the basis of overt behaviors, or abstract (i.e., verbal) descriptions of those behaviors. Whereas emotional decision-making is associated with the VMPFC, mentalizing from verbal material (i.e., inferring the likely goals, intentions, and desires of people described in stories) reliably engages more dorsal regions of the medial prefrontal cortex (MPFC and the ACC), as reported in a large meta-analysis of neuroimaging studies (Van Overwalle, 2009). Just as somatic state representations in the insular cortex are well suited, both functionally and anatomically, to contribute to decision-making, they may serve to constrain the processes that take place in the MPFC (Augustine, 1996). Functionally, anticipated somatosensory states would provide an experiential basis for predicting the future behavior of others, in much the same way as they help guide one's own subsequent behaviors.

Other research has shown that somatic state representations in the insula might provide a basis for empathy. Neuroimaging studies have shown that the same regions of the insula are active both during experience of aversive events, such as disgust (Wicker et al., 2003) and pain (Singer et al., 2004), and during the observation of those states in others. Overlapping activity in the insula across these divergent modes of experience is thought to indicate a neural mechanism for emotional understanding, and provides initial support for somatic state representations in inferring others' emotions (Wicker et al., 2003).

#### **NEURAL BASES OF EMOTION CONSTRAINTS IN LANGUAGE**

In previous sections, we have mentioned the neural circuits involved in some aspects of our theory. Here, we address two remaining questions. First, how are facial responses elicited by neural processing of sentences? While this question is unexplored in the neuroscientific literature, we propose that facial responses arise in response to sentences that convey a sudden change in wellbeing relative to the current state of the body, and underspecify the appropriate course of action, driving emotional action inferences. Such sentences may produce a state of cognitive conflict about which actions are appropriate for fulfilling the goals in the language. Take the sad sentence (written by an undergraduate research assistant for our EMG and botox studies), "You slump in your chair when you realize all the schools rejected you." For the present purpose, we can consider the higher-level goal of the sentence to be a simulation of the dejection, anguish, and exasperation (and the correlated actions) associated with social and vocational disappointment. Simulating the initial action of the sentence (slumping) will generate a modality-specific prediction of the sensorimotor consequences of the action, including a prediction of withdrawal, or perhaps pain (MacDonald and Leary, 2005), in somatic cortices. But because the reader's actual current somatic state (alertness and engagement as required by the reading task) conflicts with the somatic prediction, a large prediction error will result, forcing a shift in action controllers to simulate the higher-level goal of the sentence. However, effective actions are not specified in the remainder of the sentence, and so the ensuing simulation is faced with a conflict. Here, we propose that a facial expression will be triggered that reflects the direction of the somatic prediction error (a frown). The resulting context-sensitive facial feedback will modulate the emotional state of the body (as described above), and update the somatic state representation for use in simulation<sup>5</sup>.

We consider the cingulate cortex a likely substrate for mediating facial efference because it is strongly associated with task performance under cognitive conflict (Botvinick et al., 2004), is proposed to underlie the integration of cognitive and emotional processes (Bush et al., 2000), and contains direct projections to the facial nucleus (as recently demonstrated in monkeys; Morecraft et al., 2004). Tasks that involve cognitive conflict elicit facial activity (Schacht et al., 2009). And while positively and negatively valenced words elicit subgenual cingulate cortex activity (Maddock et al., 2003), repetition of emotional words produces a clear habituation response (as reported in Maddock et al., 2003), suggesting that novelty of the emotional stimulus might be important. Interestingly, non-verbal emotional stimuli (pictures of facial expressions) do not activate subgenual cingulate cortex (e.g., Maddock, 1999), perhaps because they convey affective meaning directly, whereas emotional words involve a higher degree of semantic inference.

Next, how might somatic state representations constrain action simulation during language comprehension? Given the strong bidirectional connection between the anterior insula and inferior frontal gyrus (IFG), which includes Broca's region in humans (Mesulam and Mufson, 1985; Augustine, 1996), we see at least two possibilities. One is that they provide a modality-specific neural substrate for the representation of emotion states described in language, as predicted by other emotion simulation accounts (Havas et al., 2007; Niedenthal, 2007). If so, then the same region of the insula should be active during both language about emotion and during real emotion. Accordingly, Jabbi et al. (2008) found that a region of anterior insula (extending to inferior frontal operculum) became active when the same participants either felt disgust, saw facial expressions of disgust, or read short passages describing a disgusting situation. The functional overlap supports simulation theories of social cognition in general, although interesting differences between the three conditions were observed in the connectivity findings. Reading passages about disgust uniquely included Broca's area in the left IFG.

A second possibility is that somatic state representations encode autonomic constraints of the body that differentially affect the simulation (and execution) of some actions over others, much as autonomic constraints influence real actions. Thus, somatic state representations would help resolve ambiguity in action simulation. If so, then we would expect that current body states can become rapidly incorporated into online language comprehension processes. Indeed, behavioral evidence has shown that bodily constraints on action are incorporated within early stages of syntactic ambiguity resolution (within 500 ms) during sentence comprehension (Chambers et al., 2004). The insula has a long-standing role in language-related motor control (Dronkers, 1996). A neurodegenerative disease that impacts both the insula and language is progressive non-fluent aphasia (PNFA). Patients with PNFA are selectively impaired in sentence comprehension, but spared in single-word comprehension and in other non-linguistic cognitive abilities (Peelle et al., 2008). Although a role of insular cortex in resolving ambiguity during sentence comprehension has yet to be explored systematically, extant data support such a role.

## **UNIQUE FEATURES OF THE THEORY**

Although our account is speculative, it differs from other accounts of emotion simulation in language and thus makes unique predictions. Foremost, emotion influences language processing above the lexical level. Rather than provide a common neural substrate for emotion and language about emotion (e.g., Niedenthal, 2007), somatic state representations influence a simulation of actions as driven by speech action controllers in Broca's area. This account remains congruent with embodied theories of language comprehension because the outputs of action controllers are predictions in modality-specific regions of the brain (Barsalou et al., 2003; Pulvermuller, 2005), and because emotion states constrain a modality-specific simulation. Although the two accounts of emotion simulation make differing predictions, we don't believe they are mutually exclusive, and are rather likely to operate in tandem during online language comprehension.

The model offers an explanation for a range of empirical observations on the interaction of emotion and language comprehension. For example, in the study of Havas et al. (2007), we found an interaction of emotion and language comprehension: body states of emotion (smiling, and frowning) that are congruent with the emotional meaning implied in the sentence facilitate comprehension, whereas emotion states that are incongruent hinder it. Consider reading one of the Angry sentences from that study, "After the fight with the stubborn bigot, you slam the car door." The negative emotional expression produced by holding the pen in the lips activates associated negative state representations (angry or sad) in somatosensory cortices, biasing the selection of effective actions (e.g., aggressive, or defensive actions). Because the body is prepared to produce the kind of actions that are required for understanding the sentence, a simulation of the second half of the sentence ("you slam the car door") is completed with ease. By contrast, a positive somatosensory representation produced by holding the pen in the teeth would hinder the simulation of such actions.

This account also explains emotional interactions during language comprehension when there is no pen to force a facial expression. Here, simulating the action in the sentence produces somatic prediction error, and generates an emotional response in preparation for subsequent understanding. For example, the initial phrase in the sentence, "You slump in your chair when you realize that all of the schools rejected you" will generate emotional afference compatible with the initial decrease in wellbeing. This is the result we found using EMG (reported in Havas et al., 2010).

Finally, we can explain how blocking facial afference that is congruent with the emotionality of a sentence might hinder comprehension. Despite any facial afference generated in processing the angry and sad sentences, botox prevents negative facial feedback from modulating central emotion circuits that would otherwise constrain the simulation. But because happy expressions are unaffected, they are free to modulate central circuits of emotion, and constrain the simulation of happy sentences.

<sup>5</sup> If the predicted somatic state error is small, then a shift in the action controller may not be necessary. For example, if the reader is already in a somatic state congruent with the language, then comprehension processes are predicted to proceed with facility in the absence of facial afference.

By way of comparison with Glenberg and Gallese's (2012) ABL model, we too assume that the solution used in motor control for deriving emotionally appropriate action was exploited through evolution by language. For language, our theory works much like the ABL model in that modules are grounded in actions and sensory predictions, a gain control mechanism suppresses literal execution, and controller output is tantamount to a simulation. However, there are also several features that are new in our theory. Foremost, selection of modules for running a simulation of language is determined not just by motor prediction error, but also by a somatic error signal. Thus, an extension of the ABL model for emotional language comprehension would add a forward model that learns to predict future somatic states that result from actions. Action controllers for simulation are jointly determined by the operation of both types of predictor that work in a complementary way to determine the relative goodness of particular actions. Where the predictors are uncertain, the reward model can guide behavior, and vice versa. When effective actions are underspecified in the language, emotion simulation will guide the derivation of those actions. This feature may have implications for comprehension of abstract concepts, and may explain why concepts that bear on a person's wellbeing but that don't specify particular action, like "freedom" or "justice," are often emotionally evocative.
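The joint role of the two error signals described above can be illustrated with a small sketch. It is not an implementation of the ABL model or of our proposal; it only assumes, for illustration, that each candidate action module carries a scalar motor prediction and a scalar somatic prediction, and that modules are weighted by a softmax over their combined squared prediction errors. The function name `select_module`, the dictionary keys, and the `somatic_weight` parameter are all hypothetical.

```python
import math

def select_module(modules, observed_motor, observed_somatic, somatic_weight=1.0):
    """Return softmax weights ("responsibilities") over candidate modules.

    Hypothetical simplification: each module is a dict holding scalar
    'motor' and 'somatic' predictions. A smaller combined squared
    prediction error yields a larger weight.
    """
    scores = []
    for m in modules:
        motor_err = (m["motor"] - observed_motor) ** 2
        somatic_err = (m["somatic"] - observed_somatic) ** 2
        scores.append(-(motor_err + somatic_weight * somatic_err))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two modules whose motor predictions are identical, so the language
# underspecifies the action; only the somatic prediction differs.
modules = [
    {"name": "approach", "motor": 0.5, "somatic": 1.0},
    {"name": "withdraw", "motor": 0.5, "somatic": -1.0},
]

# A negative current somatic state favors the withdrawal module.
resp = select_module(modules, observed_motor=0.5, observed_somatic=-0.8)
best = modules[resp.index(max(resp))]["name"]

# With the somatic signal switched off, the modules cannot be told apart.
tie = select_module(modules, observed_motor=0.5, observed_somatic=-0.8,
                    somatic_weight=0.0)
```

In this toy case the motor error alone leaves the two modules tied, and the somatic error breaks the tie, which is the situation the text describes for sentences that underspecify effective actions.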

Another difference between the ABL theory and ours is that lower-level control structures are constrained by higher-level emotion states. That is, global states of emotion (that correspond to action tendencies of approach or withdrawal, for example) will constrain the simulation of actions in a probabilistic fashion. Because emotional facial expressions change action tendencies through modulation of the ANS (Levenson, 1992), they predispose the body for taking certain actions. For example, a positive emotion state will potentiate actions of approach (Davidson, 1992). If language understanding requires a simulation of similar actions, then comprehension will be facilitated. Thus, because smiling will potentiate actions of approach and affiliation, it is likely to facilitate a simulation of the actions in a sentence like, "You lean over your birthday cake and blow out all the candles."

Finally, our account gives emotion a central role in language comprehension, even for simulation of language that is only implicitly emotional. We think this is fitting – language conveys emotional meaning at every level of analysis, from prosody, to morphology, to syntax (Majid, 2012), and a reader's emotions are engaged by language at the earliest stages of processing (Van Berkum et al., 2009).

## **LIMITATIONS AND FUTURE DIRECTIONS**

Our purpose in this article is to provide a theoretical synthesis of research from several domains, with an emphasis on recent and intriguing findings. By necessity, we have overlooked vast areas of work that deserve consideration, and only mentioned some work that deserves deeper consideration. Further refinement of the theory will depend on a more careful accounting of this work. For example, our theory bears a similarity to accounts of facial expression recognition in which facial feedback provides a source of automatic, rapid, and unconscious constraints on processing (e.g., Dimberg et al., 2000). Another important and fast-developing body of research that deserves greater attention surrounds the notion of "simulation." Our focus on the mechanisms of emotion simulation may have overlooked broader developments in this area that are likely to bear on the present theory.

Another limitation concerns our treatment of alternative accounts for facial feedback effects. One important alternative rests on changes in mood, and studies have demonstrated that facial feedback can influence mood and mood processes (e.g., Kleinke et al., 1998; Duclos and Laird, 2001). While we have developed our theory partly in an effort to account for evidence against this hypothesis (see Havas et al., 2010), mood-based explanations will need to be carefully considered in future empirical validation of the theory.

Much of our theory derives from the internal models framework, and its recent projection to language in the ABL model of Glenberg and Gallese (2012). Although in its present form the account is an advance in that it suggests how simulation in Broca's region may be modulated by emotion systems, much work is needed to establish the validity of the ABL theory, and to connect it with emotional language comprehension. Although many details are still to be worked out, we consider this a step toward specifying interactions of emotion and language that have long interested researchers, and whose existing empirical connection is currently only tenuous.

We have claimed that our theory supports simulation-based accounts of language comprehension (Glenberg and Kaschak, 2002; Barsalou et al., 2003; Pulvermuller, 2005) by providing a mechanism by which emotion influences action controllers in LIFG for driving the simulation of modality-specific actions and perceptions (as described by Glenberg and Gallese, 2012). Our account is embodied in that understanding language involves a simulation of meaning in multimodal brain areas that correspond to the referents in the language. Language results from the operation of controllers (which learn to derive actions from sensory goals) and predictors (which learn to predict the sensory consequences of those actions) in LIFG. Thus, understanding the meaning of the word "clap" involves first deriving the speech module (in Broca's area) for uttering the word "clap" from the text, and then generating sensory predictions of the actions (in pre-motor and motor cortex) and the sounds (in auditory cortex) involved in clapping. As generated by facial feedback, emotion states (in the insula) constrain the selection of controllers and predictors to facilitate simulation of the language content. Thus, simulation is grounded in action, perception, and emotion.

Although LIFG is not always implicated in simulation theories of language (but see Pulvermuller, 2005), we believe this region is important for embodied theories for two reasons. First, LIFG is critical in syntax, and any theory that fails to account for this involvement is necessarily incomplete. Second, an important challenge for embodied theories is to explain predication, or conceptual combination into grammatically meaningful statements. We believe that the present theory contributes to the grounding of predication in action and emotion.

For future work, one promising feature of the model is that it suggests a constraint on the creativity of the human conceptual system. Recall that somatic prediction error signals the relative value of taking a particular action in a particular context, and can be used for action selection. Specifically, the signal corresponds to the predicted change in emotional state resulting from the action, as represented in somatic cortices. This signal is likely to be important in guiding the combination of concepts during language comprehension. Glenberg and Robertson (2000) suggested that conceptual combination is constrained by the affordances of the objects described in noun-verb combinations. They presented participants with sentences describing novel situations that ended in one of three ways, and participants judged the sentences as sensible or nonsense. For example, the phrase "Bill needed to paint the top of the barn wall so he stood on his . . ." could be followed either by "ladder," "tractor," or "hammer." They found that sentences ending with objects that afforded accomplishing the goal but that were used in an unusual way (tractor) were judged as sensible just as readily as sentences ending with objects that both afforded the goal and were used in a typical way (ladder). Yet, sentences ending with non-afforded and unusual objects (hammer) were quickly judged as nonsense, despite the fact that the word "hammer" was similar to the word "tractor" in many other ways (both are strongly associated with the context, both are tools, both are common words, etc.). Thus, they argued that conceptual combinations are constrained by whether the affordances of objects in the language can be meshed in service of reaching goals.

A benefit of our approach is that it helps to differentiate conceptual combinations that may equally afford goal-obtainment, but differ in the emotional value with which they do so. For example, standing on a tractor may not be as expedient or safe as standing on a ladder to paint the top of a barn. By contrast, the somatic error signal helps to differentiate these options on the basis of their value for the organism. Actions that afford success more expediently (i.e., they deliver the reward of goal attainment more directly, with greater certainty, or more quickly) will be understood more readily, subject to the current emotional state of the reader. Thus, the present model enriches Glenberg and Robertson's (2000) account without reverting to standard, amodal linguistic criteria commonly used to explain semantic combination effects (word frequency, animacy, typicality, etc.).

## **CONCLUSION**

By selectively blocking muscle feedback, botulinum toxin-a (botox) has allowed researchers a new opportunity to test the role of the body in cognition. Recent experiments with emotional facial feedback have shown that botox modulates emotion experience and its neural centers, and selectively affects emotion-language comprehension, thereby strongly supporting facial feedback theories of emotion and embodied accounts of cognition.

Using a functional account of emotion, we explored implications of this research for a mechanistic understanding of the body's role in language, and proposed a role of bodily feedback in providing context-sensitive constraints on language processing. Paralleling the role of emotions in real-world behavior, our account proposes that (1) facial expressions accompany sudden shifts in wellbeing as described in language; (2) facial expressions modulate emotion action systems during reading; and (3) emotional action states prepare the reader for an effective simulation of the ensuing language content. In language comprehension, modules in Broca's area learn to predict the emotional consequences of simulated actions, and prediction error leads to facial afference. Facial feedback provides context-sensitive modulation of visceral states, and these emotional state changes become represented in somatosensory cortex. In turn, somatic representations constrain simulation of actions and action inferences for deriving the meaning of the language. By selectively blocking emotional feedback, botox systematically affects the simulation value of actions and perceptions described in the language. Our theoretical framework, based on internal models, provides a detailed account that can guide future research on the role of emotional feedback in language processing, and on interactions of language and emotion. It also highlights the bodily periphery as relevant to theories of embodied cognition.

## **REFERENCES**


versus sentence mapping when representing fictional characters' emotional states. *Lang. Cogn. Process.* 7, 353–371.


supervised learning," in *Advanced Neural Computers*, ed. R. Eckmiller (Amsterdam: Elsevier), 365–372.


The relationship between social and physical pain. *Psychol. Bull.* 131, 202–223.


Jenike, M. A. (1998). Masked presentations of emotional facial expressions modulate amygdala activity without explicit knowledge. *J. Neurosci.* 18, 411–418.


Zwaan, R., and Taylor, L. J. (2006). Seeing, acting, understanding: motor resonance in language comprehension. *J. Exp. Psychol. Gen.* 135, 1–11.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2012; accepted: 07 May 2013; published online: 27 May 2013.*

*Citation: Havas DA and Matheson J (2013) The functional role of the periphery in emotional language comprehension. Front. Psychol. 4:294. doi: 10.3389/fpsyg.2013.00294*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Havas and Matheson. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The role of scene type and priming in the processing and selection of a spatial frame of reference

## *Katrin Johannsen\* and Jan P. De Ruiter*

*SFB 673 Alignment in Communication, Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Bielefeld, Germany*

#### *Edited by:*

*Louise Connell, University of Manchester, UK*

#### *Reviewed by:*

*Dermot Lynott, University of Manchester, UK; Berenice Valdés Conroy, Universidad Complutense de Madrid, Spain*

#### *\*Correspondence:*

*Katrin Johannsen, Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Universitätsstraße 25, D-33615 Bielefeld, Germany. e-mail: katrin.johannsen@uni-bielefeld.de*

The selection and processing of a spatial frame of reference (FOR) in interpreting verbal scene descriptions is of great interest to psycholinguistics. In this study, we focus on the choice between the relative and the intrinsic FOR, addressing two questions: (a) does the presence or absence of a background in the scene influence the selection of a FOR, and (b) what is the effect of a previously selected FOR on the subsequent processing of a different FOR. Our results show that if a scene includes a realistic background, this will make the selection of the relative FOR more likely. We attribute this effect to the facilitation of mental simulation, which enhances the relation between the viewer and the objects. With respect to the response accuracy, we found both a higher (with the same FOR) and a lower accuracy (with a different FOR), while for the response latencies, we only found a delay effect with a different FOR.

**Keywords: spatial perception, priming, psycholinguistics, cognitive processes, spatial reference frames, scene perception**

## **INTRODUCTION**

Expressing spatial relations is an important aspect of everyday communication. By using spatial terms, we indicate the location of one object in relation to another, to ourselves, to an interlocutor or to cardinal points. These different ways of expressing a spatial relation depend on the choice of frame of reference (FOR). A FOR can generally be described as a set of axes that defines space (Carlson, 1999). The point of intersection constitutes the origin (Miller and Johnson-Laird, 1976). The relative FOR establishes a ternary relationship which comprises a reference object, a located object, and a viewpoint. Using the intrinsic FOR, however, leads to a viewpoint-independent binary relationship between a reference object and located object (Levinson, 1996, 2003). In the present study, the origin of the relative FOR coincides with the egocentric perspective of the viewer whereas the origin of the intrinsic FOR is object-centered. The absolute FOR depends on environmental features such as cardinal points and will not be considered in the present study.
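The difference between the ternary and the binary relationship can be made concrete with a minimal geometric sketch. It is an illustration only and not part of the experimental materials: it assumes 2D point positions, a half-plane criterion for "in front of," and the reading of the relative FOR in which the located object lies on the viewer's side of the reference object. The function names and the example coordinates are hypothetical.

```python
def _sub(a, b):
    """Vector from b to a."""
    return (a[0] - b[0], a[1] - b[1])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def in_front_intrinsic(located, reference, reference_facing):
    """Binary relation: the located object lies in the half-plane that the
    reference object's own front faces. No viewpoint is involved."""
    return _dot(_sub(located, reference), reference_facing) > 0

def in_front_relative(located, reference, viewer):
    """Ternary relation: the located object lies on the viewer's side of
    the reference object (assumed reading of the relative FOR)."""
    return _dot(_sub(located, reference), _sub(viewer, reference)) > 0

# Hypothetical layout: a chair at the origin whose front points along +x,
# viewed from a position on the negative y axis.
chair = (0.0, 0.0)
chair_facing = (1.0, 0.0)
viewer = (0.0, -5.0)

plant_a = (0.0, -1.0)  # between the viewer and the chair
plant_b = (1.0, 0.0)   # at the chair's own front

# "The plant is in front of the chair" holds for plant_a only under the
# relative FOR, and for plant_b only under the intrinsic FOR.
a_relative = in_front_relative(plant_a, chair, viewer)
a_intrinsic = in_front_intrinsic(plant_a, chair, chair_facing)
b_relative = in_front_relative(plant_b, chair, viewer)
b_intrinsic = in_front_intrinsic(plant_b, chair, chair_facing)
```

The same sentence is thus true of one configuration and false of the other depending on the frame adopted, which is why the choice of FOR has to be resolved before a description can be verified against a scene.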

Crucially, spatial projective terms such as "next to," "in front of" and "behind" ("neben," "vor" and "hinter" in German) are ambiguous if it is unclear which FOR is adopted. Different FORs appear to be used differently in everyday life. There have been many attempts to identify preferences for specific FORs, but these have led to ambiguous results. The relative FOR, being perceptually available and avoiding the extra computational effort needed for mental rotation, has been considered predominant by some authors (Linde and Labov, 1975; Levelt, 1982, 1989) whereas other authors have claimed that the intrinsic FOR predominates (Miller and Johnson-Laird, 1976) or is at least preferred (Carlson-Radvansky and Irwin, 1993; Carlson-Radvansky and Radvansky, 1996; Taylor et al., 1999). How FORs are chosen and maintained has been studied intensively (e.g., Carlson-Radvansky and Irwin, 1994; Carlson, 1999; Watson et al., 2004; Ball et al., 2009). Research has shown that, when choosing a FOR, all FORs are initially active (Carlson-Radvansky and Irwin, 1994) until one is selected. This selection is affected by various situational factors, for instance by functional relations between the objects (Carlson-Radvansky and Radvansky, 1996), motion characteristics (Levelt, 1984), gravity (Friederici and Levelt, 1990), or by alignment to the FOR chosen by the interlocutor in dialogue (Watson et al., 2004). These results indicate that there may not be a uniform default FOR but rather that the FOR selection is affected by situational influences.

A question that has, to our knowledge, not been addressed yet is whether the type of scene used to present the stimuli has a direct effect on the acceptability and processing of different FORs. This question arises from considerations of the disparities between FORs and of the role of embodiment.

A principal difference between the relative and the intrinsic FOR is that only the former requires the viewer as an origin. The relative FOR is indispensable for our navigation in the world as its use involves computation of relevant spatial relations. As the relative FOR originates in the viewer, an embodiment of the viewer may be considered a necessary prerequisite. Using the relative FOR for depictions therefore requires a mental simulation of a positioned viewer in the scene. Mental simulation in the processing of spatial relations irrespective of FOR has been reported elsewhere (e.g., Coventry et al., 2010).

We assume that there is little incentive for such a mental simulation of a viewer in depictions that exclusively involve configurations of objects and do not contain natural elements and that the relative FOR may thus be less preferred than the intrinsic FOR. However, if object configurations are embedded in depictions of natural environments, the use of the relative FOR may become more likely, as it is easier for a viewer to imagine being in a natural environment than in a scene that only contains "floating" objects.

Recent studies varied in their construction of scenes. Studies that have found a preference for the intrinsic compared to the relative FOR vary from using only a depiction of two or more objects without background elements (e.g., Carlson-Radvansky and Radvansky, 1996; Taylor and Rapp, 2004) to line-drawing scenes with rudimentary background elements (Carlson-Radvansky and Irwin, 1993). However, Taylor and Tversky (1996) showed that speakers chose different frames of reference for their descriptions of spatial environments depending on the characteristics of the scene they were shown.

We assume that presenting a realistic scene might result in a processing advantage for the relative FOR, as viewers are more likely to perform a mental simulation and establish a relation between the objects and themselves. Therefore, we hypothesize a higher acceptability of the relative FOR in more naturalistic scenes.

We investigated this hypothesis by presenting identical object configurations in two different scenes, and measuring the acceptability and reaction times (RTs) in a sentence-picture verification task. Sentences in German such as "Die Pflanze ist vor dem Stuhl" ("The plant is in front of the chair") were used to assign a reference frame to the picture. In order to present a realistic scene, we chose a living-room scenario so that, in one version, the depiction showed a room with two embedded objects, whereas in the other version the same two objects were depicted in front of a white background.

In addition to the influence of scene type on FOR selection we were also interested in FOR-related priming effects. Recent studies have shown that the time needed for spatial term assignment in a FOR is prolonged when a different FOR has previously been processed (Carlson-Radvansky and Jiang, 1998; Carlson and van Deman, 2008). This effect<sup>1</sup> has been interpreted as inhibition of the non-selected FOR (Carlson and van Deman, 2008). However, this investigation of priming effects focused on RTs, and did not include an analysis of response accuracy. If RTs were prolonged due to inhibition, we hypothesize that the accuracy ratings could also be affected. Inhibition of the non-selected FOR should lead to more rejections of targets following a prime trial with a different FOR than with a neutral or identical FOR. More rejections are expected because the inhibited FOR may not only be more difficult to process, which is revealed by longer response latencies, but also, in cases of stronger inhibition, be less available.

Even though priming effects in FOR selection have not been reported for RTs, other studies have suggested their possible existence. Watson et al. (2004) reported that interlocutors in dialog tended to use the same FOR as their interlocutor had previously used. This alignment effect was argued to result from priming of FORs. Thus, it is plausible to assume that not only can a different FOR delay responses, but that the same FOR can also speed up responses due to FOR priming, and that these effects might be observable in both RTs and accuracy ratings.

In order to assess these different possible effects of same and different FORs on accuracy and RTs, we included three conditions in the experiment: a match condition in which prime and target used the same FOR, a mismatch condition, which required switching the FOR between prime and target and a control condition, in which the prime trial did not disambiguate between FORs. Thus, the FOR on the target trial could either have been activated by having been used in the preceding trial (match condition) or inhibited by being available but not being selected (mismatch condition). The control condition served as a baseline for comparisons.

Our expectations were that having the same FOR would result in more acceptances of target trials in the match condition than in the control condition, while having different FORs would result in more rejected target trials in the mismatch condition than in the control condition. With regard to RTs, different FORs are predicted to lead to longer response latencies in mismatch target trials than in the control target trials, as described in earlier studies (Carlson-Radvansky and Jiang, 1998; Carlson and van Deman, 2008), while having the same FOR was predicted to yield shorter response latencies for target trials in the match condition compared to the control condition.

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

Fifty students of Bielefeld University (21 men, 29 women) ranging in age from 20 to 62 (*M* = 26.9, *SD* = 8.5) were paid for their participation in the experiment. All participants were native German-speakers and each of them saw only one version (either the one with or the one without background).

## **STIMULI AND DESIGN**

The experiment comprised 400 trials, consisting of 96 prime and 96 target trials as well as 200 distractor trials and 4 prime-target pairs with definite "no" answers according to both FORs. Sentence-picture verification for the prime trials was correct for the neutral configurations and for the relative and intrinsic FOR in 32 cases each. With regard to the target trials, 48 cases were correct for the relative and 48 for the intrinsic FOR. We thus made sure that 50% of the trials had "yes" as the correct response and the other 50% had "no." The distractor trials consisted of 100 "yes" and 100 "no" response trials.

Stimuli consisted of a sentence and a picture presented one after the other. The sentence was presented auditorily (in German) and spatially described the object configuration in the picture. Sentence duration was approximately 2 s and during its presentation, participants saw a white screen. Immediately after the presentation of the stimulus sentence, the picture was shown.

Pictures of object configurations were created using indoor planning software ("Sweet Home 3D") in two versions which differed in background. In one version, following Henderson and Hollingworth's (1999) idea, we used a semantically coherent, human-scaled view of a living-room (**Figure 1**). This version constituted a *true scene* (Henderson and Ferreira, 2004), which will be referred to here as "scene with background." In the other version, the object configurations were shown in front of a white background (**Figure 2**). This version (*ersatz scene*, Henderson and Ferreira, 2004) will be referred to here as "scene without background." Participants saw either the version with or the version without background, so there was an equal number of scenes with and without background. The size of the scene was 33 × 17 cm and the unconstrained viewing distance was approximately 70 cm.

**FIGURE 1 | Scene with background and both FOR.**

<sup>1</sup>This effect has also been called "negative priming" in the literature (Carlson-Radvansky and Jiang, 1998; Carlson and van Deman, 2008).

Three types of pictures were created: experimental, neutral, and distractor. The experimental pictures consisted of two discrete objects in the foreground (reference object and located object). We used three different triaxial reference objects (chair, armchair, sofa), which were rotated on the vertical axis at angles of 0°, 90°, and 270° in order to vary the mapping of the horizontal intrinsic axes to the horizontal relative axes. Reference objects in the prime and target pictures had the same orientation and were always in an upright position. The located objects were biaxial (plant, stool), thus revealing no predefined horizontal orientation, and were placed along the horizontal axes of the reference object (in front of, behind, to the left/right of). For the 0° rotation, the located object was only positioned to the left/right of the reference object in order to dissociate the relative and the intrinsic FOR. In order to keep the number of trials within a reasonable limit, we did not present every object in every possible combination of rotation, located object, and reference frame in all three conditions as this would have led to 180 prime-target pairs. Thus, we reduced the number of target pictures to 16, which were presented with both FORs in all three conditions, resulting in 96 prime-target pairs. The 32 target trials (16 with a relative FOR and 16 with an intrinsic FOR) in each experimental condition consisted of 8 trials with 0° rotation, 12 trials with 90° rotation, and 12 trials with 270° rotation. The position of the located object was controlled for the axis between prime and target trials: the located object was positioned on the same axis in 16 prime and target trials and across axis in the other 16 trials per condition. The reference object and located object were positioned at the same, short distance from each other throughout the picture sequence.
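The trial counts reported in this section can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' materials script; the variable names are illustrative, and it assumes that each of the 4 definite-"no" prime-target pairs contributes two trials to the total of 400.

```python
from itertools import product

TARGET_PICTURES = 16
FORS = ("relative", "intrinsic")
CONDITIONS = ("match", "mismatch", "control")

# One prime-target pair per (picture, FOR, condition) combination.
pairs = list(product(range(TARGET_PICTURES), FORS, CONDITIONS))

# Reported rotation breakdown of the target trials within one condition.
ROTATION_COUNTS = {0: 8, 90: 12, 270: 12}

targets_per_condition = len(pairs) // len(CONDITIONS)

# Whole experiment: primes + targets + 200 distractors + 4 definite-"no" pairs
# (assumption: each of these pairs counts as two trials).
total_trials = len(pairs) + len(pairs) + 200 + 4 * 2

print(len(pairs), targets_per_condition, total_trials)  # 96 32 400
```

The rotation counts sum to the 32 target trials per condition, which matches 16 pictures presented with both FORs.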

In neutral pictures, the located object was placed along the vertical axis of the reference object leading to an alignment of the FORs. This alignment eliminated the need for the viewer to choose between the intrinsic and the relative FOR. For these configurations, eight additional objects were introduced (bench, box, chest, bottle, lamp, notebook, fish tank, carpet). Fifty distractor pictures were created that contained only a single discrete object in the foreground (pieces of furniture, toys, a book, a bottle, etc.).

The presentation of the visual stimuli was preceded by an auditorily presented sentence (in German) describing the object configuration and implicitly assigning the intrinsic or the relative FOR (or both in the control condition). The sentence "The *<located object>* is *<spatial term>* the *<reference object>*" was played over loudspeakers. See **Figure 1** for an example picture of a scene with background with intrinsic and relative FOR. In the distractor trials the sentence "The *<object>* is *<adjective>*" was used, with a color or shape adjective. The picture remained on the screen until a response was given. There was no inter-trial interval.

Using a standard priming paradigm, we constructed three conditions of prime-target pairs: two experimental (match, mismatch) and one control condition. All three conditions had identical target trials in order to directly compare, both within-subject and within-item, the influence of the different prime conditions. Furthermore, each target trial was presented both with a relative and with an intrinsic FOR. In the match condition, prime and target trials used the same FOR (intrinsic-intrinsic, relative-relative). The mismatch condition contained different FORs for prime and target trial (intrinsic-relative, relative-intrinsic) and in the control condition a specific FOR was only used for the target picture (neutral-relative, neutral-intrinsic). We thus obtained a 2 × 3 design consisting of the factors "background" (with, without) and "priming condition" (match, mismatch, control) and accuracy and RTs of target trials as dependent variables. See **Figure 2** for examples of Prime-Target pairs for each condition using a relative FOR in the Target trials.
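The mapping from prime and target FOR to priming condition can be written down explicitly. This is a minimal sketch under the definitions given above; the function name and the string labels are hypothetical and were not part of the experiment software.

```python
EXPERIMENTAL_FORS = ("relative", "intrinsic")


def priming_condition(prime_for: str, target_for: str) -> str:
    """Label a prime-target pair as 'match', 'mismatch', or 'control'.

    prime_for may be 'relative', 'intrinsic', or 'neutral' (both FORs aligned);
    target_for must be 'relative' or 'intrinsic'.
    """
    if target_for not in EXPERIMENTAL_FORS:
        raise ValueError(f"unexpected target FOR: {target_for!r}")
    if prime_for == "neutral":
        # The prime did not disambiguate between FORs: baseline condition.
        return "control"
    if prime_for not in EXPERIMENTAL_FORS:
        raise ValueError(f"unexpected prime FOR: {prime_for!r}")
    return "match" if prime_for == target_for else "mismatch"


print(priming_condition("intrinsic", "relative"))  # mismatch
```

Crossing the three labels with the two background versions gives the 2 × 3 design described above.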

With regard to scene type analysis, accuracy and RTs of prime trials were dependent variables in a 2 × 2 design with the factors "background" (with and without) and FOR (relative and intrinsic). Both FOR had identical prime trials to compare within-subject and within-item the effect of FOR processing.

To avoid effects resulting from simple repetition priming, we used different objects as well as different spatial terms for prime and target sentences. Furthermore, two distractor pictures were presented between successive prime-target pairs in order to avoid interactions between the FORs. In order to minimize other influences on FOR selection, we only used object configurations which did not show a functional relation between located objects and reference object (Carlson-Radvansky and Radvansky, 1996).

The randomization procedure took into account the priming condition, the rotation of the reference object (different rotations between prime-target pairs) and the reference object (changing objects between prime-target pairs) as well as the located object (position).

## **PROCEDURE**

At the beginning of the experiment, the instructions were shown in written form on the monitor, informing the participants that they would hear a sentence after which a picture would be shown. The participants' task was to determine whether the sentence was an adequate description of the picture as quickly and accurately as possible and respond by pressing predefined yes/no keys on a button box. The experiment started with 5 practice trials followed by 400 experimental trials. In each trial, a sentence was presented acoustically (i.e., played on loudspeakers) while the computer monitor showed a white screen. Immediately afterwards, a picture of the aforementioned object configuration was shown. The picture remained on the screen until a response was given. Response times were measured from the onset of the picture display to the key-press response using E-Prime (Psychology Software Tools). The participants were unaware of the objective of the experiment and of the type of trials they were completing. No feedback was given during the experiment. The experiment lasted 30 min including a short break midway through.

## **RESULTS**

Statistical analysis was carried out in "R" software (R Core Development Team, 2011) using the lme4 package (Bates et al., 2011). Linear mixed-effects models were used for the analysis of RTs and mixed-effects logistic regression (generalized linear mixed models, GLMM) for the analysis of accuracy.

RTs below 200 ms and above 4000 ms (1.4% of the data) were considered outliers, and were excluded from the analysis.
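The exclusion rule can be expressed as a simple filter. This is a minimal sketch with hypothetical names; it assumes that RTs of exactly 200 ms or 4000 ms are retained, since the text only excludes values below and above these limits.

```python
RT_MIN_MS = 200   # lower cut-off reported in the text
RT_MAX_MS = 4000  # upper cut-off reported in the text


def exclude_outliers(rts_ms):
    """Keep only RTs within [RT_MIN_MS, RT_MAX_MS] (boundaries assumed inclusive)."""
    return [rt for rt in rts_ms if RT_MIN_MS <= rt <= RT_MAX_MS]


# Invented example values, for illustration only.
print(exclude_outliers([150, 200, 1139, 4000, 4500]))  # [200, 1139, 4000]
```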

## **SCENE TYPE**

Descriptions regarding the neutral prime pictures were accepted in both conditions in 99% of the cases and were excluded from the analysis (33.3% of the data) as they did not require a choice between the intrinsic or the relative FOR. Accuracy and RTs of prime trials are presented in **Table 1**.

In order to analyse effects of scene type and FOR on prime trial accuracy, we implemented a mixed-effects logistic regression analysis. We posited scene type, FOR and their interaction as fixed effects, and used random slopes and intercepts for subjects and items. We found a significant main effect of FOR, revealing a higher acceptability of the intrinsic compared to the relative FOR (β = −2.9, *SE* = 1.11, *Z* = −2.63, *p* < 0.001). Furthermore, there was a significant main effect of scene type (β = −2.14, *SE* = 0.72, *Z* = −2.99, *p* < 0.01) and a significant interaction between the two reflecting the higher accuracy of the intrinsic FOR in the condition without background (β = 2.92, *SE* = 1.48, *Z* = 1.97, *p* < 0.05).

RTs of the correct prime trials using the relative or intrinsic FOR were analysed (39.8% of the prime trials). Fitting a linear mixed-effects model with RT of the prime trial as dependent variable, a random slope and intercept for subjects and a random intercept for items, no significant main effects of background (β = −0.8912, *SE* = 84.3531, *t* = −0.011, *p* > 0.05) or FOR (β = −125.7638, *SE* = 91.61, *t* = 1.373, *p* > 0.05) were found.

## **PRIMING EFFECTS**

Subsets of data were used for the statistical analysis of priming effects, as we wish to consider only those trials in which the potential prime was accepted by the participants. In the analysis of the acceptability of target trials, we considered only trials that followed an accepted prime trial (72.9% of the trials). In the analysis of target RTs, we considered only trials in which both the prime and the target were accepted (45.9%).
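The two subsetting rules differ only in whether the target response is also required to be an acceptance. The following minimal sketch illustrates this; the `PairResult` record and its field names are hypothetical and do not reflect the original data format.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """Responses to one prime-target pair (hypothetical record layout)."""
    prime_accepted: bool
    target_accepted: bool
    target_rt_ms: float


def accuracy_subset(results):
    """Pairs entering the target accuracy analysis: the prime was accepted."""
    return [r for r in results if r.prime_accepted]


def rt_subset(results):
    """Pairs entering the target RT analysis: prime and target were both accepted."""
    return [r for r in results if r.prime_accepted and r.target_accepted]


# Invented example data, for illustration only.
example = [
    PairResult(True, True, 1100.0),
    PairResult(True, False, 1500.0),
    PairResult(False, True, 900.0),
]
print(len(accuracy_subset(example)), len(rt_subset(example)))  # 2 1
```

By construction the RT subset is contained in the accuracy subset, which is consistent with the smaller percentage reported for the RT analysis.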

For the analysis of target trial accuracy with regard to priming effects, we fitted a logistic mixed-effects model with scene type and priming condition as fixed effects, a random slope and intercept for subjects, and a random intercept for items. Model comparison revealed a significant main effect of priming condition (*p* < 0.001) but not of scene type. Accuracy of match and mismatch condition differed significantly from the control condition revealing a higher accuracy in the match condition (β = 1, *SE* = 0.39, *Z* = 2.58, *p* < 0.01) and a lower accuracy in the mismatch condition (β = −0.8, *SE* = 0.37, *Z* = −2.14, *p* < 0.05). **Figure 3** depicts the accuracy of target trials after an accurate prime trial collapsed across background version.
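For readers less familiar with logistic mixed models, the reported statistics are related in a simple way: the *Z* value is the coefficient divided by its standard error, and exponentiating the coefficient gives an odds ratio relative to the control condition. The sketch below illustrates these relations with a normal approximation for the *p* value; it is not the original analysis, the function names are hypothetical, and because the reported coefficients are rounded, recomputed *Z* values differ slightly from the reported ones.

```python
import math


def wald_z(beta: float, se: float) -> float:
    """Wald statistic: coefficient divided by its standard error."""
    return beta / se


def two_sided_p(z: float) -> float:
    """Two-sided p value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


# Match vs. control, using the rounded values reported in the text.
print(round(wald_z(1.0, 0.39), 2))   # close to the reported Z = 2.58
print(two_sided_p(2.58) < 0.01)      # p value implied by the reported Z
print(round(odds_ratio(1.0), 2))     # odds of acceptance relative to control
```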

In order to analyse priming effects with regard to RTs, we fitted a linear mixed-effects model with full random slopes and intercepts for subjects and a random intercept for items. Taking the target RT as the dependent variable and the scene type and priming condition as independent variables, model comparison revealed that only the priming condition yielded a significant main effect (*p* < 0.05). This effect was attributable to the prolongation of RTs in the mismatch condition (β = 226.71, *SE* = 56.49, *t* = 4.01, *p* < 0.01) while the match condition did not differ significantly from the control condition (β = 42.92, *SE* = 53.00, *t* = 0.81, *p* > 0.05). Mean accuracy, RTs, and standard deviations (SD) are shown in **Table 2**. In order to quantify the priming effect, we calculated the differences in RTs by subtracting the mismatch and match from the control condition.
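The difference score described above can be sketched as follows. This is an illustration only: the function name is hypothetical and the RT values are invented, not taken from Table 2. With this sign convention, a negative value indicates slower responses than in the control condition and a positive value indicates facilitation.

```python
from statistics import mean


def priming_effect(control_rts_ms, condition_rts_ms):
    """Mean control RT minus mean RT of the match or mismatch condition (ms)."""
    return mean(control_rts_ms) - mean(condition_rts_ms)


# Invented RTs, for illustration only.
control = [1000, 1200]
mismatch = [1300, 1350]
print(priming_effect(control, mismatch))  # -225, i.e., slower than control
```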

## **DISCUSSION**

## **INFLUENCE OF SCENE TYPE ON FOR PROCESSING**

Our results revealed main effects of FOR and background on accuracy in the prime trials as well as a significant interaction between FOR and background. The interaction suggested that the clear preference for the intrinsic FOR in the condition without background was diminished in the condition with background, resulting in both FORs being accepted almost equally often. This equalization of accuracy resulted from a decrease in accuracy of the intrinsic FOR combined with an increase in accuracy of the relative FOR. The latter reflects our expectation that people are more likely to use the relative FOR, and thus bring in their own perspective, when the scene is more natural than in depictions without background elements. The relative FOR is based on the viewer's direct perception (Miller and Johnson-Laird, 1976), and Levinson (1996) claimed that "relative systems of spatial description build in a viewpoint" (p. 371), which implies that using the relative FOR demands an embodied viewer in order to establish this viewpoint. Requiring an embodied origin, the relative FOR can only be processed in depictions of scenes via a mental simulation of the viewer in the scene. This is in line with Wilson's (2002) idea that off-line cognition is body based and that sensorimotor resources are used to simulate physical aspects of the world. Thus, including natural elements increases the probability of establishing a ternary relationship, resulting in a "boost" in the availability of the relative FOR.

Interpreting the decrease in acceptability of the intrinsic FOR in the scene with background, as indicated by the main effect of background condition, however, is not straightforward. General preferences for the intrinsic FOR have been reported in previous studies (e.g., Carlson-Radvansky and Irwin, 1993; Carlson-Radvansky and Radvansky, 1996; Taylor et al., 1999; Taylor and Rapp, 2004). Given that these studies did not use scenes with background, the results are comparable to our findings for the scene without background. The scene with background may, however, have led to a decrease in this preference by reducing its saliency and increasing the saliency of the relative FOR. Findings that point to the influence of the environment on FOR selection have been described by Taylor and Tversky (1996), who showed that participants used relative, intrinsic, and extrinsic frames of reference differently depending on the environment they were asked to describe. They interpreted their findings as a reflection of how we interact with the environment.

Another line of research that points in this direction comes from studies using neuroimaging technology to investigate brain activation patterns resulting from different visual stimuli. In general, stimuli embedded in a scene and stimuli presented without a background scene induce different brain activation patterns. Using fMRI, researchers identified a brain region, the "parahippocampal place area" (PPA), which responded selectively to scenes but not to single objects or object arrays (e.g., Epstein and Kanwisher, 1998; Henderson et al., 2008). In addition, it has been reported that the visually perceived spatial structure of the environment is processed by the PPA (Epstein et al., 1999) and that the PPA is viewpoint-specific and thus plays a crucial role in establishing the relationship between the viewer and the spatial structures of the environment (Epstein et al., 2003). Thus, these findings may reflect that the relative FOR, for which the establishment of a relationship between the viewer and spatial structures is a prerequisite, is more likely to be used in scenes compared to object arrays.

Interestingly, activation of brain areas in the middle temporal and middle superior temporal areas has recently been reported as resulting from mentally simulated motion in the processing of static pictures (Coventry et al., in press). The differences in localization may be explained by the fact that mental simulation did not require a viewpoint in the scene and the stimuli were pictures without background.

Our results indicate that humans have different preferences for FORs depending on the scene type. Following this idea, we assume a further decrease in preference for the intrinsic FOR in favor of the relative FOR when participants are embodied in the scene. This is a matter for further investigations.

## **PRIMING EFFECTS**

Our experiment was designed to investigate priming effects for RTs and accuracy ratings. The results showed longer RTs and lower accuracy ratings for different FORs, but for same FORs we only found higher accuracy ratings and no RT effect.

Longer RTs for different FORs have been reported previously (Carlson-Radvansky and Jiang, 1998; Carlson and van Deman, 2008) and our results support these earlier findings. The prolongation of RTs in trials that required a switching of FORs has been interpreted as inhibition (Carlson-Radvansky and Jiang, 1998; Carlson and van Deman, 2008). This inhibition increases the cognitive effort needed for the adoption of different FORs in subsequent trials.

However, we did not only find longer RTs but also lower accuracy ratings. We interpret this as resulting from the strength of inhibition. Longer RTs reflect relatively mild inhibition, as the FOR that was inhibited could still be adopted. The fact that a large proportion of trials in the mismatch condition were rejected reflects a more powerful inhibition, one that made the FOR completely unavailable.

With regard to the RTs, we found no processing advantages in the match condition. However, the accuracy of target trials in the match condition was significantly higher than in the control condition. This suggests the presence of a priming effect. It has been claimed that, initially, multiple FORs are active and compete for selection (Carlson-Radvansky and Irwin, 1994; Carlson-Radvansky and Jiang, 1998). Our results reveal that the selection of a specific FOR leads to a persistently higher level of activation in the subsequent trial and thus to a selection advantage. This indicates that FOR selection is not only accompanied by inhibition of the non-selected FOR but also by a higher level of activation of the selected FOR.

The finding that participants showed a corresponding effect for accuracy in both conditions, but for RT there was only a prolongation in the mismatch condition, is difficult to explain. We speculate that the higher processing complexity of switching FOR in the mismatch condition also leads to a higher error rate in the sentence verification, whereas the easier processing in the match condition makes the sentence verification less error prone. This would imply that there is no speed-accuracy trade-off in this task, which is supported by an inspection of the RTs in the trials with erroneous responses: the erroneous responses were *slower* (*M* = 1184, *SD* = 535) than the correct responses (*M* = 1139, *SD* = 577, *t*(2559) = 2, *p* < 0.05).

The preference for using the same FOR has also been shown in a dialogue study, in which speakers tended to use the same FOR as their interlocutor had (Watson et al., 2004). This alignment was attributed to priming effects and has been described for different levels of linguistic representation including abstract concepts such as FORs (see Pickering and Garrod, 2004, for an overview). Priming effects are thus thought to play a central role in communication (Pickering and Garrod, 2004).

In conclusion, our results show that people are more likely to describe a scene from an egocentric point of view when the scene has a realistic background. We explain this phenomenon by assuming that the presence of a background stimulates an embodied mental simulation of a real scene.

More generally, our results show that priming does not only have facilitatory effects on referential communication, but can also slow us down or decrease our communicational efficiency, depending on the sequential context in which utterances occur.

## **ACKNOWLEDGMENTS**

The authors would like to thank Constanze Vorwerg and Gert Rickheit for their contributions to earlier versions of this work. This work was funded by the German Research Foundation (DFG) within the Collaborative Research Centre 673 "Alignment in Communication."

## **REFERENCES**


language, visual attention, and perceptual simulation. *Brain Lang.* 112, 202–213.


Wiley), 251–268. Retrieved from: http://hdl.handle.net/2066/15488


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 October 2012; accepted: 26 March 2013; published online: 10 April 2013.*

*Citation: Johannsen K and De Ruiter JP (2013) The role of scene type and priming in the processing and selection of a spatial frame of reference. Front. Psychol. 4:182. doi: 10.3389/fpsyg.2013.00182*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Johannsen and De Ruiter. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Can speaker gaze modulate syntactic structuring and thematic role assignment during spoken sentence comprehension?

## **Pia Knoeferle\* and Helene Kreysa**

Cognitive Interaction Technology Excellence Center, Bielefeld University, Bielefeld, Germany

#### **Edited by:**

Judith Holler, Max Planck Institute Psycholinguistics, Netherlands

#### **Reviewed by:**

Charles Jr. Clifton, University of Massachusetts Amherst, USA

Andriy Myachykov, Northumbria University, UK

#### **\*Correspondence:**

Pia Knoeferle, Cognitive Interaction Technology Excellence Cluster, Bielefeld University, Morgenbreede 39, Building H1, D-33615 Bielefeld, Germany. e-mail: knoeferl@cit-ec.uni-bielefeld.de

During comprehension, a listener can rapidly follow a frontally seated speaker's gaze to an object before its mention, a behavior which can shorten latencies in speeded sentence verification. However, the robustness of gaze-following, its interaction with core comprehension processes such as syntactic structuring, and the persistence of its effects are unclear. In two "visual-world" eye-tracking experiments participants watched a video of a speaker, seated at an angle, describing transitive (non-depicted) actions between two of three Second Life characters on a computer screen. Sentences were in German and had either subjectNP1-verb-objectNP2 or objectNP1-verb-subjectNP2 structure; the speaker either shifted gaze to the NP2 character or was obscured. Several seconds later, participants verified either the sentence referents or their role relations. When participants had seen the speaker's gaze shift, they anticipated the NP2 character before its mention and earlier than when the speaker was obscured. This effect was more pronounced for SVO than OVS sentences in both tasks. Interactions of speaker gaze and sentence structure were more pervasive in role-relations verification: participants verified the role relations faster for SVO than OVS sentences, and faster when they had seen the speaker shift gaze than when the speaker was obscured. When sentence and template role-relations matched, gaze-following even eliminated the SVO-OVS response-time differences. Thus, gaze-following is robust even when the speaker is seated at an angle to the listener; it varies depending on the syntactic structure and thematic role relations conveyed by a sentence; and its effects can extend to delayed post-sentence comprehension processes. These results suggest that speaker gaze effects contribute pervasively to visual attention and comprehension processes and should thus be accommodated by accounts of situated language comprehension.

**Keywords: visually situated sentence comprehension, speaker gaze, visual context effects, sentence structure, eye tracking**

## **INTRODUCTION**

Past research has provided ample evidence that information in the non-linguistic context can incrementally modulate a listener's visual attention during real-time sentence comprehension. This has been shown for aspects of the visual context such as size contrast between objects (Sedivy et al., 1999), their shape (Dahan and Tanenhaus, 2005), the semantic relationships between objects (Huettig and Altmann, 2004), referential contrast (Tanenhaus et al., 1995), depicted clipart events (Knoeferle et al., 2005), real-world action events (Knoeferle et al., 2011), action affordances (Chambers et al., 2004), the spatial location of objects (Altmann, 2004), gestures (e.g., Campana et al., 2005), and the speaker's locus of gaze (e.g., Hanna and Brennan, 2008).

To accommodate these effects, accounts of language comprehension (e.g., the Coordinated Interplay Account, CIA; Knoeferle and Crocker, 2007) assume that words in the utterance guide (visual) attention to relevant aspects of the visual context or their mental representation; the words are co-indexed with the attended scene information, and the latter can then influence language comprehension and visual attention. However, the existing processing accounts (see also, e.g., Altmann and Kamide, 2007, 2009) do not yet accommodate the behavior of the speaker him/herself, despite the fact that speaker-based information such as iconic gestures (Wu and Coulson, 2005), beat gestures (Holle et al., 2012), and a speaker's gaze can rapidly affect language comprehension.

For instance, in an eye-tracking study on effects of speaker gaze, a speaker and a listener faced each other with two arrays of shapes between them (Hanna and Brennan, 2008). A typical trial consisted of the speaker first inspecting and then mentioning one of two blue circles (target: with five dots; competitor: with a different number of dots). Approximately 1000 ms after the listener had heard *blue* in *blue circle* and before the utterance disambiguated the target by specifying the number of dots (*with five dots*), the listener looked more at the target than at the competitor, suggesting that speaker gaze disambiguated a referentially ambiguous target object (Hanna and Brennan, 2008; Experiment 1). In a related experiment, a robot that faced the participant frontally described size relations between objects in the scene; the description was either true or false, and the robot either looked toward the object it was about to mention, or it looked at an object other than the one it would mention, or it looked straight ahead and thus at none of the objects. Participants were highly likely to follow a robot speaker's gaze shift (Staudte and Crocker, 2011; Experiment 1).

To sum up, speaker gaze has been shown to permit the anticipation of upcoming referents in settings in which the speaker faced the listener fully frontally (Hanna and Brennan, 2008; Staudte and Crocker, 2011). In addition, participants who did (vs. did not) follow the robot's gaze showed larger gaze congruence effects in their sentence verification times (shorter response latencies for congruent vs. incongruent robot gaze, Staudte and Crocker, 2011; see also Richardson and Dale (2005), for reports that coordination of speaker and listener gaze can improve listeners' performance on comprehension questions compared to a randomized baseline).

The present research aims to extend the existing findings in several regards. First, we asked whether gaze effects are robust even when the speaker does not face the listener fully frontally. Although the precision of gaze-direction detection is high when facing another person, it decreases as that person turns sideways (e.g., Gibson and Pick, 1963; Cline, 1967). Thus, speaker gaze might affect a listener's visual attention rapidly only when the listener can see both of the speaker's eyes, and when head movements can be detected easily. Alternatively, this may be possible even when she is positioned at an angle (e.g., at 45–60°, see **Table 1**), a finding that would underscore the robustness of speaker gaze effects.

Second, we asked whether speaker gaze can affect processes such as syntactic structuring and thematic role assignment in addition to referential anticipation. If the speaker's gaze – much like action and object information – interacts with syntactic structure building and thematic role assignment, we should see differential effects on the processing of sentence structures that vary in the canonicality of the grammatical and thematic role relations they convey (see, e.g., Tanenhaus et al., 1995; Chambers et al., 2004; Knoeferle et al., 2005). Alternatively, speaker gaze-based referent anticipation occurs "across the board," in which case its time course should be similar for different sentence structures.

Finally, we know little about the temporal persistence of speaker gaze effects. Staudte and Crocker (2011) reported that gaze-following during comprehension (vs. the failure to follow the speaker's gaze) led to faster response times in a sentence verification task. The temporal persistence of such gaze-following effects, however, is not clear from their results, since participants on average responded immediately at sentence end.

Two "visual-world" eye-tracking experiments examined these three open issues. Participants inspected videos in which the thematic role relations between two out of three virtual characters, displayed on a computer screen, were described either by a visually present speaker or by a disembodied voice (the speaker was grayed out by a superimposed bar). The speaker's eyes were only partially visible, since she was videotaped at a 45–60° angle relative to the camera. Sentences had either subject-verb-object (SVO) or object-verb-subject (OVS) structure, and grammatical function and thematic role relations were unambiguous for all critical trials. The speaker always inspected and mentioned the central character first, and, just after uttering the verb, she shifted her gaze to the post-verbal role filler (one of the outer two characters, the NP2 referent).

**Table 1 | Overview of the experimental conditions (congruence is not depicted).**

The English translation of the SVO sentence is "the waiter congratulates the millionaire outside the shop," while the OVS sentence implies that the waiter is being congratulated by the saxophone player. The video is represented by a snapshot illustrating the gaze condition.

We analyzed three measures: fixations to the NP2 referent that started in the time window after the speaker's gaze shift and before the NP2 referent was mentioned; fixations that fell in that time window; and the onset latencies of the listeners' first fixation to the NP2 referent after the speaker's gaze shift. Together, these three measures provide insight into the time course with which listeners shift their attention toward the NP2 referent. If the speaker's gaze rapidly affects listeners' visual attention and language comprehension even in this non-frontal setting, then we should see faster post-verbal anticipation of the target character (the NP2 referent) when the speaker is visible (vs. when she is grayed out). If gaze-following is not robust in this setting, then perhaps only some listeners will be able to use it, leading to non-reliable effects of speaker gaze on participants' visual anticipation of the post-verbal referent.

To test interactions of speaker gaze with sentence structure, we exploited a structural variation of German: both object- and subject-initial main clauses are grammatical, but the latter are canonical while the former are not. In reading, structurally unambiguous OVS sentences elicit longer reading times than SVO sentences, reflecting processing difficulty (e.g., Hemforth, 1993; Knoeferle and Crocker, 2009). During spoken comprehension, people can begin to anticipate the object referent of SVO sentences while hearing the verb (Knoeferle et al., 2005; Weber et al., 2006). For OVS sentences, by contrast, participants initially incorrectly anticipated an object- rather than a subject-NP2 referent at the verb, and did so for both locally structurally ambiguous (Knoeferle et al., 2005; Weber et al., 2006) and unambiguous (Kamide et al., 2003) sentences. However, participants shifted their visual attention to the correct post-verbal (subject) referent when case marking and world knowledge (Kamide et al., 2003), intonation (Weber et al., 2006), or depicted events (Knoeferle et al., 2005) indicated the OVS structure of the sentence. If these context effects on syntactic structuring extend to speaker gaze, we should see later anticipation of the post-verbal (subject) referent for OVS sentences than of the post-verbal (object) referent for SVO sentences. Alternatively, if speaker gaze does not interact with syntactic structuring, we should see similar anticipation of the target referent for both SVO and OVS sentences after the speaker's gaze shift. Finally, if speaker gaze combined with accusative (object) case marking can alleviate some of the difficulty of processing object-initial structures, we might see a numerically larger effect of speaker gaze on anticipation of the post-verbal referent for object- compared with subject-initial sentences.

To address the persistence of speaker gaze effects, and to ground the interpretation of the eye-movement pattern, we recorded response times and accuracy in a verification task that was substantially delayed after the speaker's gaze shift and after sentence end. Faster RTs in this delayed task when the speaker is present (vs. absent) would corroborate the view that these effects can be long-lasting. In addition, if the systematic relationship between gaze-following and congruence effects (Staudte and Crocker, 2011) extends to our study, then we should see effects of gaze-following on response times also in our experiments, perhaps even in interaction with sentence structure (e.g., either SVO or OVS could benefit more from gaze-following). Alternatively, speaker gaze effects might be short-lived and not affect delayed, post-comprehension verification response times.

In Experiment 1, participants verified whether two depicted characters had (vs. had not) been mentioned in the sentence; this "referential" task served to replicate the results from prior studies (Hanna and Brennan, 2008; Staudte and Crocker, 2011). To ensure that any absence of interactions between speaker gaze and sentence structure in Experiment 1 was not the result of "shallow" processing for the referential task, Experiment 2 required participants to verify thematic role relations (see **Figure 1**).

## **EXPERIMENTS**

## **PARTICIPANTS**

Participants for Experiment 1 (*N* = 32; 24 females; mean age = 22; *SD* = 2.8) and Experiment 2 (*N* = 32; 28 females; mean age = 24.3; *SD* = 4.3) were students at Bielefeld University and received €6 for participating. All had normal or corrected-to-normal vision, were unaware of the purpose of the experiment, and signed an informed consent form.

## **MATERIALS AND DESIGN**

From a pool of 162 avatars in the virtual world "Second Life," we selected 72 male characters that were identified unambiguously by 20 participants in a pilot naming test. These were combined into 24 triplets of characters and photographed in a neutral outdoor setting in Second Life. Twenty-four German subject-verb-object (SVO) and object-verb-subject (OVS) sentence pairs described a transitive action between the central character of a triplet (e.g., the waiter, "NP1 referent") and one of the two outer characters (e.g., the millionaire, "NP2 referent"; see **Table 1**). The action itself was not depicted, so only the sentence identified the roles of agent and patient. From the sentence pairs and images we created 24 items consisting of four videos each: in the first two, the speaker could be seen producing either the SVO or the OVS version of the sentence [**Table 1**(a) and (b)] and looking at the characters in order of their mention (see Section "Sentence Stimuli" in Appendix for the sentences, and http://wwwhomes.uni-bielefeld.de/pknoeferle/Homepage/KnoeferleLab\_Stimuli/MVI\_10b\_Kellner.MOV for an example video). The other two videos played back the identical SVO or OVS sentences, but the speaker was occluded [**Table 1**(c) and (d)]. Since the characters themselves did not move, this led to the impression of a static image with audio (which we will nonetheless refer to as a "video" in the following).

For the video recordings, a Canon PowerShot G10 camera was positioned approximately 1.5 m from the speaker. She was seated to the right of a 20″ Apple iMac 8.1 screen displaying the static scene with the three characters. In the recording, the speaker looked first at the camera and smiled. To give participants an example of what a gaze to each of the characters looked like, she next inspected them in a fixed order without speaking (middle, left, and right character). Before initiating the sentence, she shifted her gaze back to the middle character (the NP1 referent) and kept it there as she uttered the first noun phrase (mean speech onset: 6870 ms after video onset). Shortly after uttering the verb and before the second noun phrase, the speaker's gaze shifted from the NP1 referent to the NP2 referent (shift onset: *M* = 949 ms after verb onset; *M* = 740 ms before NP2 onset). She looked back into the camera at the end of the sentence (total duration of the video: *M* = 13,143 ms).

role relations verification: Does the arrow reflect the thematic role relations of the sentence?

For the post-video response task, we created verification templates with stick people as placeholders for the three avatar characters (see **Figures 1A,B**). In Experiment 1, the task was to verify whether both of the circled characters had been referenced by the sentence or not: for condition (c) in **Table 1**, the correct response to the template in **Figure 1A** would have been "yes," since the positions of the waiter and the millionaire are circled and both were mentioned. In Experiment 2, participants verified whether the arrow on the template correctly (vs. incorrectly) characterized who-does-what-to-whom in the sentence. For **Table 1**(c) followed by **Figure 1B**, the response would be "no," since the arrow points from the right character (the millionaire) to the waiter in the center. The waiter is not the patient of the sentence, but the agent, so a matching arrow would point from him outward, to the sentential patient (the millionaire). For experimental items, the matching arrow was always the reverse of the mismatching arrow (i.e., both matching and mismatching arrows connected the two mentioned characters); in filler trials, 50% of the arrows implicated the unmentioned character.

Overall, there were three within-subject factors: *speaker* ("gaze" vs. "no gaze"), *sentence structure* (SVO vs. OVS), and *congruence* of the sentence and the post-sentence template (congruous vs. incongruous). Only the first two factors were manipulated during the sentence and could therefore potentially affect online eye movements. Prior to the NP2, the case marking on the determiner of the NP1 (*Der* vs. *Den*) indicated constituent order (SVO and OVS, respectively), but not the identity of the NP2 referent. Since the nouns and the verb of all sentences were semantically and thematically unrelated, who-does-what-to-whom was never linguistically disambiguated prior to the second noun. By contrast, the speaker's gaze shift to the NP2 referent could, in principle, prompt the listener to anticipate the NP2 referent. All three manipulated factors could affect post-sentence verification response times.

Counterbalancing ensured that half of the videos showed the NP2 referent on the right side of the screen, the other half on the left (i.e., the speaker shifted her gaze equally often to either side); it also ensured that the mentioned outer character in each video was equally often a thematic agent and patient; and that the "match" response was assigned to the left button on a button box for one half of the participants, and to the right button for the other half. Every participant experienced equal numbers of shifts to either side, as well as equal numbers of match/mismatch responses and of SVO/OVS sentences.

The three within-subject factors resulted in eight lists, which were presented in a different pseudo-randomized order for each participant. Each list contained one version of each of the 24 experimental items, and 48 filler items. These used a variety of sentence structures (e.g., subject-initial, dative-initial, passive, and prepositional constructions), combined with Second Life images or clipart depictions of action events. Half of the filler trials showed the speaker.
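As an illustration of how a 2 × 2 × 2 design yields eight lists, the following minimal sketch rotates the eight condition cells over the 24 experimental items in a Latin-square fashion. The rotation scheme, the condition labels, and the function name `build_lists` are assumptions made for illustration; the text specifies only the number of lists and that each list contained one version of each item. Fillers and pseudo-randomization are omitted.

```python
from itertools import product

# Hypothetical labels; only the 2 x 2 x 2 structure is taken from the text.
SPEAKER = ("gaze", "no gaze")
STRUCTURE = ("SVO", "OVS")
CONGRUENCE = ("congruous", "incongruous")
CONDITIONS = list(product(SPEAKER, STRUCTURE, CONGRUENCE))  # 8 design cells


def build_lists(n_items=24):
    """Rotate the 8 conditions over items so every list holds each item once."""
    n_cond = len(CONDITIONS)
    lists = []
    for list_idx in range(n_cond):
        # Item i appears in condition (i + list_idx) mod 8 on this list.
        lists.append({item: CONDITIONS[(item + list_idx) % n_cond]
                      for item in range(n_items)})
    return lists


lists = build_lists()
```

With 24 items and eight cells, this rotation places three items in each cell per list, and each item appears in every cell exactly once across the eight lists.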

### **GAZE-DETECTION PRETEST**

A gaze-detection pretest with a different group of participants (*N* = 20) examined how rapidly and accurately people could detect the speaker's gaze shift from the NP1 to the NP2 referent. Participants watched the recorded videos and pressed a button as soon as they noticed that the speaker shifted her gaze away from the middle character after sentence start, indicating the direction of the shift. Detection accuracy was high (98%), and participants were fast to respond (*M* = 498 ms, *SD* = 386 ms). Note that the speaker moved both her head and her eyes to look at the relevant character, with the eyes shifting slightly before the onset of the head movement. This saccade was coded as the onset of the gaze shift and will be used in the analyses; however, we cannot exclude that it was possible for participants to make use of the head movement instead of the eye movement. Thus, the term "speaker gaze" does not refer to eye movements only; we use it in a wide sense to refer to the direction of attention by the speaker.

## **APPARATUS AND PROCEDURE**

Participants were seated in front of an Eyelink 1000 desktop head-stabilized eye tracker (SR Research) and the experimenter calibrated their right eye with a 9-point dot pattern. Participants were instructed on-screen that they would watch a series of unrelated videos which they should attend to and try to understand. They were informed that we were interested in the effect of different types of video complexity on memory retention, and were told that the main experiment would be followed by a memory test for the videos they had seen. This cover story was devised to mask the within-participant gaze manipulation and to ensure that participants paid attention to all aspects of the videos. They were further asked to verify as quickly and accurately as possible whether the post-video template matched (vs. did not match) the sentence.

Each participant completed four practice trials with feedback on their accuracy, followed by a second calibration; then the experiment began. Each trial started with a central fixation dot that participants fixated, followed by the video. As soon as this ended, the verification template appeared, and participants used a Cedrus response box to indicate whether the template matched the sentence (no feedback was provided during the experiment). Participants usually took a break half-way through the experiment, followed by recalibration; additional calibration was performed when necessary. The post-experiment memory test consisted of four practice and 24 experimental trials: participants inspected a snapshot from each of the 24 experimental (but not filler) videos in the same or the opposite speaker gaze condition as during the experiment. Their task in the memory test was to verify quickly and accurately whether these snapshots had (vs. had not) been present in the experiment. This resulted in a 2 × 2 design (*speaker*: gaze vs. no gaze; *previous occurrence*: yes vs. no). The experiment concluded with a debrief form and lasted 45–55 min.

## **ANALYSES**

#### **RESPONSE-TIME ANALYSES**

Response times (RTs) were time-locked to the display onset of the verification template. Using linear mixed models with crossed random intercepts and slopes for participants and items, we analyzed log-transformed RTs within 2 *SD* of each participant's mean per congruence condition, including only trials with accurate responses. Details on model selection can be found in the Section "Model Selection Procedure for RTs and Eye-Movement Data"; the final models are listed in the Section "Details for the Linear Mixed Models Analyses" in the Appendix.
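The trimming and transformation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the trial dictionaries and their keys (`participant`, `congruence`, `rt`, `accurate`) are hypothetical, and the sample standard deviation is assumed.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def trim_and_log(trials, n_sd=2.0):
    """Keep accurate trials within n_sd SDs of the participant-by-congruence
    mean and add a log-transformed RT to each kept trial."""
    cells = defaultdict(list)
    for t in trials:
        if t["accurate"]:  # inaccurate responses are dropped first
            cells[(t["participant"], t["congruence"])].append(t)

    kept = []
    for cell_trials in cells.values():
        rts = [t["rt"] for t in cell_trials]
        m = mean(rts)
        sd = stdev(rts) if len(rts) > 1 else 0.0  # sample SD (assumption)
        for t in cell_trials:
            if abs(t["rt"] - m) <= n_sd * sd:
                kept.append({**t, "log_rt": math.log(t["rt"])})
    return kept
```

The log-transformed values would then serve as the dependent variable of the mixed model; fitting that model is outside the scope of this sketch.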

#### **EYE-MOVEMENT ANALYSES**

Trials with recording problems (e.g., miscalibration, external noise, or track loss) and inaccurate responses were excluded from the analyses. Since we were most interested in the allocation of attention following the speaker's gaze shift, we selected two primary time windows for the analyses (onsets and offsets for these were computed on a trial-by-trial basis): a "SHIFT" time window and an "NP2" time window. The SHIFT time window lasted for 800 ms from the onset of the speaker's gaze shift in a particular video. In no gaze trials, the shift onset of the corresponding gaze video was used; this was possible because the underlying video was the same in both conditions and served to make the two maximally comparable. Across trials, the end-point of the SHIFT window corresponded roughly to the mean onset of the NP2 determiner (at *M* = 740 ms from shift onset; *SD* = 178). The NP2 window contained the following 800 ms, up to 1600 ms from shift onset, on a trial-by-trial basis. Roughly, this spanned the first half of the unfolding NP2 (NP2 offset: *M* = 1749 ms from shift onset, *SD* = 244). The two time windows were further split into 100 ms periods for some analyses, thus providing a detailed view of the time course with which speaker gaze affected fixations.
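The trial-by-trial windowing can be made concrete with a short sketch. The function name `classify_fixation` and the treatment of window boundaries as half-open intervals are assumptions; the text specifies only the window lengths (800 ms each) and the 100 ms subdivision.

```python
def classify_fixation(fix_onset_ms, shift_onset_ms, window_ms=800, bin_ms=100):
    """Return (window, bin_index) for a fixation onset, relative to the
    gaze-shift onset of that trial's video.

    window is "SHIFT" for [0, 800) ms, "NP2" for [800, 1600) ms, and None
    outside both; bin_index counts the 100 ms periods within the window (0-7).
    """
    rel = fix_onset_ms - shift_onset_ms
    if rel < 0 or rel >= 2 * window_ms:
        return None, None
    window = "SHIFT" if rel < window_ms else "NP2"
    bin_index = int((rel % window_ms) // bin_ms)
    return window, bin_index
```

For a no gaze trial, `shift_onset_ms` would be taken from the corresponding gaze video, as described above.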

In both experiments and both time windows, we analyzed the mean log-gaze probability ratio with which listeners were likely to be fixating the target character over the competitor, and the target character over the NP1 referent. Additionally, we analyzed the log-transformed latencies of listeners' first fixation to the target character after the speaker's gaze shift, and the number of fixations to the target character in the SHIFT and NP2 time windows.

#### **Log-gaze probability ratios**

Mean log-gaze probability ratios were determined by dividing the probability of fixating the target character (aggregated over 20 ms bins) by the probability of fixating (a) the competitor [ln(*P*(target)/*P*(competitor))] or (b) the NP1 referent [ln(*P*(target)/*P*(NP1 ref))].<sup>1</sup> A score of zero indicates that the two characters were fixated to an equal extent; a positive value implies that the target was fixated more than the competitor or the NP1 referent, and a negative value that it was fixated less. To analyze these probability ratios, we fitted separate linear models over participants and items<sup>2</sup> (see Sections "Model Selection Procedure for RTs and Eye-Movement Data" and "Details for the Linear Mixed Models Analyses" in Appendix for details).
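A minimal sketch of the log-ratio computation, including the replacement of zero counts by 0.1 mentioned in footnote 1, is given below. It assumes that both probabilities are computed over the same number of samples, so that the ratio of probabilities equals the ratio of counts; the function name is hypothetical.

```python
import math


def log_gaze_ratio(target_count, other_count, zero_replacement=0.1):
    """ln(P(target) / P(other)) computed from fixation counts.

    Counts of 0 in the numerator or denominator are replaced by 0.1,
    because the log-ratio is otherwise undefined.
    """
    t = target_count if target_count > 0 else zero_replacement
    o = other_count if other_count > 0 else zero_replacement
    return math.log(t / o)
```

Equal counts give a ratio of zero, more target fixations give a positive value, and fewer give a negative value, matching the interpretation stated above.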

### **Onset latencies of the first target fixation after the speaker's gaze shift**

Fixation onset latencies were based on the first fixation to the target character made after the onset of the speaker's gaze shift plus 100 ms. Such a post-shift fixation to the target character occurred in 99% of all accurate trials in Experiment 1, and 95% in Experiment 2. The onset of the speaker's gaze shift was subtracted from the onset time of this fixation in order to obtain the latency in milliseconds from speaker gaze shift. We removed outliers beyond ±2 *SD* of each participant's mean per gaze condition (Experiment 1: 24/739 trials; Experiment 2: 28/701) and log-transformed the data to reduce positive skew.
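The latency measure can be sketched as follows. The data layout (a list of `(onset_ms, region)` tuples per trial), the region label `"target"`, and the inclusive comparison at the 100 ms boundary are assumptions made for illustration.

```python
def first_target_latency(fixations, shift_onset_ms, min_delay_ms=100):
    """Latency (ms from gaze-shift onset) of the first target fixation that
    starts at least min_delay_ms after the shift; None if there is none."""
    earliest = shift_onset_ms + min_delay_ms
    onsets = [onset for onset, region in fixations
              if region == "target" and onset >= earliest]
    if not onsets:
        return None  # trial has no post-shift target fixation
    return min(onsets) - shift_onset_ms


# Hypothetical trial: the target fixation at 7850 ms starts too early to count.
trial = [(7000, "NP1"), (7850, "target"), (8300, "competitor"), (8650, "target")]
latency = first_target_latency(trial, shift_onset_ms=7800)
```

Outlier removal and log-transformation would follow as separate steps on the resulting latencies.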

#### **Model selection procedure for RTs and eye-movement data**

Model selection followed the same procedure for analyses of response times, log-gaze probability ratios, and first fixation latencies. The initial model included two fixed factors for first fixation latencies (speaker and sentence structure), and three fixed factors each for response-time analyses (speaker, sentence structure, and congruence) and log probability ratios (speaker, sentence structure, and the 100 ms time windows<sup>3</sup>). All fixed factors were centered around a mean of zero to minimize collinearity, resulting in negative contrast coding (≈ −0.5) for the factor levels no gaze, OVS, and incongruous, and positive contrast coding (≈ +0.5) for gaze, SVO, and congruous, respectively. The eight levels of the time factor were also centered and ranged from ≈ −3.5 to ≈ +3.5. In addition, the initial models included all two-way interactions (and for RTs only: the three-way interaction of speaker, sentence structure, and congruence), as well as random intercepts for participants and/or items, and random slopes with all the fixed factors and their interactions. If this model did not converge, we removed interactions in the random parts of the model in rising order of variance explained, until convergence was achieved (note that the initial model always converged for log-gaze probability ratios).

<sup>1</sup> Since log-ratios are undefined for 0, we replaced counts of 0 in numerator or denominator by 0.1 before the division.

<sup>2</sup> Since the log ratio measure relies on aggregation, it is not possible to include crossed random effects of participants and items in the same linear mixed model.

<sup>3</sup> We also looked at models without the time factor: these found comparable results, but with a worse fit.
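The centered contrast coding can be illustrated with a few lines of code. This is a sketch: the 0/1 dummy coding before centering and the helper name `center` are assumptions. It also shows why the codes are only approximately ±0.5 once excluded trials leave the cells slightly unbalanced.

```python
def center(codes):
    """Subtract the mean from a list of numeric codes."""
    m = sum(codes) / len(codes)
    return [c - m for c in codes]


# Balanced two-level factor (0 = no gaze, 1 = gaze): exactly -0.5 and +0.5.
speaker_balanced = center([0, 1] * 12)

# One no gaze trial excluded: the codes shift slightly away from +/-0.5.
speaker_unbalanced = center([0] * 11 + [1] * 12)

# Eight 100 ms time bins coded 0..7: centered values run from -3.5 to +3.5.
time_bins = center(list(range(8)))
```

Centering in this way makes each main-effect coefficient interpretable at the average level of the other factors, which is what reduces collinearity with the interaction terms.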

The first converged model was defined as the "maximal model,"<sup>4</sup> against which subsequent simpler models were compared by log-likelihood ratio tests, following a backward selection procedure. We removed any fixed-effect interactions that did not contribute significantly to the maximal model, as well as their corresponding random slopes. This procedure continued until either the removal of a term led to a significant decrease in model fit (log-likelihood ratio), or until the model contained only main effects. The resulting model was designated our "final" model, for which we report the coefficients, *SE*, and *t*-values for all fixed effects and interactions (if present). Coefficients were considered significant if the absolute value of the *t*-statistic was greater than 2. Details on the final structure of all models can be found in Section "Details for the Linear Mixed Models Analyses" in Appendix.
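The comparison step of such a backward selection can be sketched as a generic likelihood ratio test. This is an illustration rather than the procedure as implemented in the study: the log-likelihood values are assumed to come from already fitted nested models, the chi-square reference distribution with `df_diff` degrees of freedom is the standard asymptotic assumption, and all function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df,
    using the closed-form series for the regularized incomplete gamma."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        terms = sum(z ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-z) * terms
    m = (df - 1) // 2
    tail = sum(z ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff):
    """Return the LR statistic 2 * (ll_full - ll_reduced) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


def is_significant(t_value, threshold=2.0):
    """Criterion from the text: |t| greater than 2."""
    return abs(t_value) > threshold
```

A term would be dropped when removing it does not yield a significant loss of fit, for example when the returned p-value exceeds a chosen alpha level (the alpha level itself is not stated in the text).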

#### **Hierarchical log-linear analyses of fixation counts to the target**

We also produced crosstables of fixation counts to the NP2 referent in the two time windows, for speaker × sentence structure.<sup>5</sup> The analyses were performed using backward elimination (see Field, 2005). For each time window, we performed one analysis with participant as random factor (1–32), and a second analysis with item as random factor (1–24). Reported partial χ² and *p*-values are for the partial associations after inspection of *k*-way significance.

#### **Relating real-time gaze-following to post-sentence gaze effects**

To relate real-time gaze-following to post-sentence verification responses we performed two analyses. First, we analyzed correlations between eye movement and response-time measures: we determined for each fixation to the target character after the speaker's gaze shift whether it occurred in the SHIFT time window, the NP2 time window, or thereafter, and then restructured the data to identify the first fixation in each trial to the target character (six trials without a fixation to the target character were excluded from Experiment 1, ten from Experiment 2). We computed difference scores by subtracting the no gaze count from the gaze count for each participant's total number of trials with target fixations in the SHIFT period. Equivalent difference scores were created for first target fixation latency and RT from template onset. These difference scores were entered into correlation analyses.
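The difference-score and correlation steps can be sketched as follows. The per-participant dictionaries are a hypothetical data layout, and the use of the tie-corrected tau-b variant is an assumption, since the text does not state which variant of Kendall's tau was computed.

```python
import math


def difference_scores(gaze, no_gaze):
    """Per-participant gaze minus no gaze values (dicts keyed by participant)."""
    return {p: gaze[p] - no_gaze[p] for p in gaze if p in no_gaze}


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences (tie-corrected)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: counted in neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

A negative tau between the count and latency difference scores would indicate that participants with more early target fixations in the gaze condition also tended to show shorter first-fixation latencies.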

Second, we entered a variable coding whether participants did or did not fixate the target character in the SHIFT time window into a linear mixed model of response times (log-transformed, outliers beyond 2 *SD* excluded, see the description of response-time analyses). The final model for Experiment 1 contained the centered factors congruence, the new factor *gaze-following*, and their interaction. In Experiment 2, the final model included sentence structure as well as congruence and gaze-following, and their interactions. We also included a random intercept and random slopes for all fixed factors by participants, and a random intercept only for items (the removal of random slopes was necessary to achieve convergence).

#### **RESULTS FOR EXPERIMENT 1**

#### **ACCURACY AND RESPONSE-TIME RESULTS**

In Experiment 1, one (bilingual) participant had to be replaced. Participants made at least 21/24 accurate responses in the verification task (>85%); their mean accuracy was 97%. Accuracy in the verification task was not modulated by the manipulated factors (χ² tests: *p*s > 0.8). Accuracy in the post-experiment memory test was around chance (45%). Response times in the main experiment were significantly affected by congruence only: participants responded faster when the template was congruous (828 ms, *SD* = 309) than incongruous with the sentence (946 ms, *SD* = 30; coefficient = −0.07, *SE* < 0.01, *t* = −8.6; other |*t*|s < 1).

#### **EYE-MOVEMENT RESULTS**

Inspection of eye-movement proportions in **Figure 2A** reveals that when the speaker was visible (gaze), fixations to the target character rose steeply 200 ms before it was mentioned (*M* = 740 ms from shift, *SD* = 178). By contrast, in cases where the speaker was not visible (no gaze), fixations to the target character increased only half-way through the NP2 that referenced it. Correspondingly, as can be seen in **Figures 2B,C**, fixations to the competitor and to the NP1 referent declined more quickly for the gaze than no gaze conditions. Although participants eagerly inspected the speaker before she began to speak, they hardly ever looked at her during the sentence: fixations to the speaker were as rare as to the background (**Figure 2D**).

#### **Log-gaze probability ratio analyses**

**Table 2** presents the mean log-gaze probability ratios for the target vs. competitor and target vs. NP1 referent by condition for the SHIFT time window, and **Table 3** the corresponding inferential analyses. Seeing the speaker inspect the target character increased listeners' tendency to look at the target compared to the competitor; this gaze effect on target inspection increased over time and did not vary with sentence structure. In addition, participants were more likely to inspect the target over the competitor during OVS than SVO sentences. The NP1 referent was fixated more than the target in the no gaze condition, but less so in the gaze condition. It was also fixated more in SVO than in OVS sentences (the latter effect was reliable by participants only).

In the NP2 time window, as participants heard the name of the NP2 referent, they were overall more likely to fixate the target than the competitor (significant intercept in **Table 4**; note that we have abstained from providing an overview of mean log-gaze probability ratios in the interest of readability, since the general pattern of results can be inferred from **Figure 2**). Crucially, participants fixated the target more when the speaker looked at it (*M* = 3.04, *SD* = 2.48), compared to when she was grayed out (*M* = 0.39, *SD* = 2.5), and this tendency continued to increase over the time window. Sentence structure no longer had any direct effect on log-gaze probability ratios, although it interacted with time bin by participants (greater increase in target fixations for SVO than OVS sentences).

<sup>4</sup> We always compared our maximal models to models with the same fixed effects structure but no random slopes (random intercepts only). The pattern of results was the same, but model fit suffered.

<sup>5</sup> Initial tables also contained the other scene regions (speaker, NP1 referent, competitor character, and background), but this approach had to be abandoned due to sparse frequency counts in many cells. Sufficient expected counts are a requirement of hierarchical log-linear analyses.

**the background.** All graphs begin at the onset of the speaker's gaze shift. Mean onsets of the NP2 and the ending phrase are marked with vertical gray bars.

**Table 2 | Experiment 1, SHIFT time window: Mean log-gaze probability ratios by condition for fixations to the target character (a) over the competitor or (b) over the NP1 referent.**

A positive number indicates preferred inspection of the target character; negative numbers preferred inspection of the competitor/NP1 referent. SD in parentheses.

#### **Onset latency of first fixation to the NP2 referent**

Participants began to fixate the target character earlier if they could see the speaker's gaze shift (*M* = 832 ms from shift, *SD* = 562) than when they could not (*M* = 1165 ms, *SD* = 688). This speedup occurred for both sentence structures (mean differences for gaze vs. no gaze: 386 ms (SVO) and 282 ms (OVS); main effect of gaze: *t* = 4.81, coefficient = 0.35, *SE* = 0.07; main effect of structure: *t* < 1.5).

#### **Hierarchical log-linear analyses of fixation counts**

A reliable interaction of speaker and sentence structure in the SHIFT time window meant that while more anticipatory fixations to the target character occurred in the gaze condition than in the no gaze baseline, this increase was larger for SVO than OVS sentences [SVO: gaze 42% of all fixations vs. no gaze 17%; OVS: gaze 45% vs. no gaze 36%, see **Figure 2**; LR χ² (subj) = 9.80, *p* < 0.01; LR χ² (item) = 9.84, *p* < 0.01]. Overall, participants were more likely to anticipate the target when the speaker was visible vs. grayed out [LR χ² (subj/item) = 47.56, *p* < 0.001; variation as a function of participants], and in OVS vs. SVO sentences [LR χ² (subj/item) = 10.20, *p* = 0.001]. In the NP2 time window, the only significant effect in the partial associations was a main effect of speaker, with more looks to the target when the speaker was visible [gaze: 58% vs. no gaze: 41%, LR χ² (subj/item) = 17.97, *p* < 0.001].

**Table 3 | Experiment 1, SHIFT time window: Coefficients, SE and t-values for the final models of log-ratios of target fixations.**

t-values in bold indicate a significant effect.

**Table 4 | Experiment 1, NP2 time window: Coefficients, SE and t-values for the final models of log-ratios of target fixations. The corresponding means and SD are included in the main text where necessary for interpretation.**

t-values in bold indicate a significant effect.

#### **Association between gaze-following and post-sentence effects**

The difference scores for early fixation counts and first fixation latencies were highly correlated (Kendall's *tau* = −0.69, *p* < 0.001), but neither was correlated with the response-time difference scores (both *p*s > 0.8). In the linear mixed model of response times, only congruence affected response times, as described above: participants reacted faster to congruous than incongruous templates (*t* = 7.7, coefficient = 0.16, *SE* = 0.02). Neither gaze-following nor its interaction with congruence significantly predicted response times (*t*s < 1).
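The rank correlation between two sets of difference scores can be illustrated with a minimal sketch of Kendall's tau. The tau-a variant (no correction for ties), the function name, the direction of the difference scores (gaze minus no gaze), and the five example scores are all assumptions for illustration; they are not the data or the exact variant used in the analyses.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical per-participant difference scores (gaze minus no gaze)
fixation_diff = [0.30, 0.22, 0.15, 0.10, 0.05]  # gain in early target fixations
latency_diff = [-400, -350, -180, -200, -50]    # change in first-fixation latency (ms)

print(kendall_tau_a(fixation_diff, latency_diff))  # -0.8
```

A negative value, as in this toy example, means that participants with a larger gain in early target fixations tended to show a larger reduction in first-fixation latency.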

## **SUMMARY AND DISCUSSION**

Experiment 1 confirmed that participants anticipated the NP2 referent in NP1-V-NP2 sentences shortly after the speaker shifted gaze to that referent, and often even before its mention (see the hierarchical log-linear, log-gaze probability, and first fixation onset latency analyses). This seems to have been possible through peripheral vision, since the speaker was rarely fixated (see also Hanna and Brennan, 2008; Staudte and Crocker, 2011). It occurred in a setting in which the speaker was positioned at an angle to, rather than frontally opposite, the listener. Furthermore, the presence of the speaker interacted with sentence structure in affecting anticipatory shifts in attention to the target, but this interaction was found only in (hierarchical log-linear) analyses on fixations that *started* in the SHIFT time window.

In the post-sentence RTs, we found picture-sentence congruence effects (cf. Gough, 1965; Clark and Chase, 1972; Carpenter and Just, 1975; Staudte and Crocker, 2011), but speaker gaze and sentence structure effects were absent, and thus short-lived. Gaze-following during the sentence also did not correlate with post-sentence response times. The absence of speaker gaze effects on the RTs differs at first glance from the results in Staudte and Crocker (2011), who reported shorter response latencies for congruent vs. incongruent robot gaze. Unlike Staudte and Crocker (2011), who contrasted incongruent with congruent gaze, we contrasted congruent gaze with no gaze; this comparison may plausibly elicit less pronounced gaze effects. In addition, participants' responses in their study were speeded and thus may have been more closely tied to incremental gaze effects during comprehension than our responses, which occurred much later: in our materials, the NP2 was followed by an unrelated end phrase (such as "outside the supermarket"), as well as by the verification template, so the average total time between the speaker's gaze shift and the verification response was *M* = 14,068 ms (*SD* = 593 ms) – presumably ample time for any effects of gaze or structure to vanish.

One noteworthy point is that an interaction between speaker gaze and sentence structure in the SHIFT window emerged only in one out of three gaze measures. It is possible that sentence structure affects only some aspects of the eye-movement record. Alternatively, the task (verifying referents) encouraged participants to shy away from "deep" processing of sentence structure. Experiment 2 examined the latter possibility by changing the post-sentence task. Rather than verifying whether two circled characters had (vs. had not) been mentioned in the sentence, Experiment 2 used templates in which an arrow between two mentioned referents indicated who-does-what-to-whom, and was either congruous or incongruous with the thematic role relations of the sentence. Successful performance on this task requires computing the thematic role relations of the sentence and matching them against the depicted characters. To the extent that such a task focuses (visual) attention, an interaction of speaker gaze with sentence structure might be reflected in multiple eye-gaze measures and potentially even in post-sentence RTs. Experiment 2 thus provides a further opportunity to examine how seeing a speaker's gaze shift interacts with syntactic structure building and incremental thematic role assignment.

## **RESULTS FOR EXPERIMENT 2**

#### **ACCURACY AND RESPONSE-TIME RESULTS**

In Experiment 2, five participants had to be replaced (two bilingual; two misunderstood the task; one accuracy rate <75%). The remaining participants made at least 20/24 accurate responses; the mean accuracy of 96% did not vary by condition (*p*s > 0.85). Accuracy in the post-experiment memory task was around chance (52%).

Response times were significantly affected by both congruence and sentence structure: they were shorter when the template matched (vs. mismatched) the sentence (match: *M* = 969 ms, *SD* = 395; mismatch: *M* = 1149 ms, *SD* = 466; *t* = −6.07, coefficient: −0.17, *SE* = 0.03), and, unlike in Experiment 1, also shorter for SVO than OVS sentences (SVO: *M* = 1007 ms, *SD* = 347; OVS: *M* = 1114 ms, *SD* = 517; *t* = −2.39, coefficient: −0.05, *SE* = 0.02). The interaction of congruence and speaker approached significance (*t* = 1.79), with a greater difference in RTs (slower in the no gaze than gaze conditions) for incongruous compared to congruous trials.

#### **EYE-MOVEMENT RESULTS**

**Figure 3** shows a steep increase of fixations to the target character during the SHIFT time window in conditions where the speaker was visible (gaze), just like in Experiment 1. By contrast, in the no gaze condition, fixations to the target character increased only in the second half of the NP2 window. Overall the gaze pattern was similar to that in Experiment 1; for the graphs of fixations to the other scene regions see Section "Detailed Graphs for Experiment 2" in Appendix.

**Table 5 | Experiment 2, SHIFT time window: Mean log-gaze probability ratios by condition for fixations to the target character (a) over the competitor or (b) over the NP1 referent.**

SD in parentheses.

#### **Log-gaze probability ratio analyses**

In the SHIFT window, participants fixated the target substantially more than the competitor or the NP1 referent when the speaker's gaze was visible, and this tendency increased over time (see **Tables 5** and **6**). Unlike in Experiment 1, sentence structure had no consistent effect in either comparison, though in the by participants analysis it interacted with speaker gaze: in the no gaze condition there was a stronger preference for the NP1 referent over the target in SVO compared to OVS sentences, but this difference due to sentence structure was substantially reduced in the gaze condition.
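As a rough illustration of the measure, a log-gaze probability ratio can be sketched as the natural log of the probability of fixating the target divided by the probability of fixating the comparison region (competitor or NP1 referent). The function name, the smoothing constant `eps`, and the example proportions below are assumptions for illustration, not the values or the exact procedure used in these analyses.

```python
import math

def log_gaze_ratio(p_target, p_other, eps=0.01):
    """Log ratio of two fixation probabilities. eps is a hypothetical
    smoothing constant that avoids log(0) when a region is never fixated."""
    return math.log((p_target + eps) / (p_other + eps))

# Hypothetical fixation proportions within one time bin
print(log_gaze_ratio(0.45, 0.15) > 0)  # True: target preferred
print(log_gaze_ratio(0.10, 0.40) < 0)  # True: comparison region preferred
print(log_gaze_ratio(0.30, 0.30))      # 0.0: no preference
```

The sign carries the interpretation: positive values indicate more looks to the target, negative values more looks to the comparison region, and zero no preference.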


#### **Table 6 | Experiment 2, SHIFT time window: Coefficients, SE and t-values for the final models of log-ratios of target fixations.**

t-values in bold indicate a significant effect.



Descriptive statistics are included in the main text.

t-values in bold indicate a significant effect.

The pattern of results in the NP2 time window largely matched the earlier time window: participants fixated the target substantially more than the competitor in the gaze than no gaze conditions (*M* = 3.61, *SD* = 2.23 vs. *M* = 0.40, *SD* = 2.62), and this tendency continued to increase (see **Table 7** for the inferential analyses). As in the SHIFT window, sentence structure interacted with speaker gaze in the by participants analysis (a greater difference between no gaze and gaze for SVO than OVS sentences).

#### **Onset latency of first fixation to the NP2 referent**

Again, participants were faster to fixate the target character when they could (vs. could not) see the speaker's gaze shift. Unlike in Experiment 1, however, sentence structure also interacted with speaker gaze: in SVO sentences, participants fixated the target character 358 ms earlier with gaze than with no gaze. This difference was substantially smaller for OVS sentences (*M* = 155 ms; *t* = 2.73, coefficient: 0.29, *SE* = 0.11).

#### **Hierarchical log-linear analyses of fixation counts**

In the SHIFT time window, hierarchical log-linear analyses confirmed a main effect of speaker [gaze: 39% vs. no gaze: 28% of target fixations, LRχ² (subj/item) = 13.07, *p* < 0.001]. As in the other two analyses, speaker gaze and sentence structure interacted by participants [LRχ² (subj) = 8.04; LRχ² (item) = 7.48, *p*s < 0.01], with a substantially larger effect of speaker for SVO (17%) than for OVS sentences (6%).

#### **Association between gaze-following and post-sentence effects**

The difference scores for early fixation counts and first fixation latencies were highly correlated (Kendall's *tau* = −0.66, *p* < 0.001), but neither correlated with the response-time difference scores (*p*s > 0.5). The final model of response times revealed a reliable effect of both sentence structure (*t* = −2.33, coefficient = −0.05, *SE* = 0.02) and gaze-following (*t* = −2.46, coefficient = −0.05, *SE* = 0.02): Participants were faster to respond for SVO than OVS sentences (*M* = 997 ms vs. *M* = 1047 ms), and faster when they had (vs. had not) followed the speaker's gaze (*M* = 994 ms vs. *M* = 1045 ms, respectively). **Figure 4** clarifies the reliable three-way interaction (*t* = −2.52, coefficient = −0.18, *SE* = 0.07): The facilitatory effect of gaze-following on response times was more pronounced for OVS than SVO sentences, but only when the template matched the sentence.

## **SUMMARY AND DISCUSSION**

The results from Experiment 2 replicated the rapid speaker gaze effects on listeners' sentence comprehension and visual attention to the target. In all three eye-movement analyses, the speaker gaze effect was larger for SVO than OVS sentences in the SHIFT window. This interaction of speaker gaze and sentence structure continued into the NP2 time window, though it was reliable by participants only (log-gaze probability ratios).

Unlike in Experiment 1, responses were faster when participants had vs. had not followed the speaker's gaze, and for SVO than OVS sentences. Gaze-following eliminated the SVO-OVS difference, but only when the template matched the sentence (see **Figure 4**). Thus, following a speaker's gaze during sentence comprehension in preparation for verifying thematic role relations alleviated the difficulty involved in understanding OVS sentences.

Overall, the task focus on thematic role relations verification brought out interactions of speaker gaze and sentence structure in both time windows and in all three measures. This highlights the important influence of the listener's current comprehension goal on online visual attention (see Salverda et al., 2011). Moreover, it provides strong evidence for the processing of speaker gaze in close temporal coordination with incremental syntactic structuring and thematic interpretation.

## **GENERAL DISCUSSION**

The present research examined three issues regarding effects of speaker gaze on a comprehender's visual attention and language comprehension. We asked (a) whether speaker gaze effects extend from a frontal speaker-listener setup to settings where the speaker is positioned at an angle relative to the comprehender (which arguably makes gaze shifts harder to detect); (b) whether speaker gaze merely enables the anticipation of referents, or whether it is also linked to other comprehension processes such as syntactic structuring and incremental thematic role assignment; and (c) whether speaker gaze effects on a listener's visual attention are short-lived or last into substantially delayed verification processes after sentence end. We recorded participants' eye movements as they inspected videos of a speaker who shifted her gaze to the NP2 referent of a subject-verb-object (SVO) or object-verb-subject (OVS) sentence shortly after producing the verb. We also recorded participants' response latencies in a delayed, post-sentence verification task on whether two characters had (vs. had not) been mentioned in the sentence (Experiment 1) or whether depicted thematic role-relations matched (vs. did not match) the sentential thematic role relations (Experiment 2). From this investigation, we gained insight into the extent to which speaker gaze effects play a role during real-time language comprehension and should thus be accommodated by existing accounts of situated language comprehension (e.g., Knoeferle and Crocker, 2006, 2007; Altmann and Kamide, 2009).

In both the reference- and the role-relations verification tasks, listeners' eye gaze rapidly followed the speaker's gaze shift to the NP2 referent before the speaker mentioned it. Thus, a speaker's gaze shift to an upcoming referent can elicit rapid shifts in the visual attention of listeners – not only in a frontal speaker-listener setting (Hanna and Brennan, 2008; Staudte and Crocker, 2011), but also when the speaker is angled by 45–60° relative to the listener.

Speaker gaze moreover rapidly affected core comprehension processes such as syntactic structuring and thematic role assignment, on top of referential anticipation. The syntactic structure of the sentence clearly modulated listeners' visual anticipation of the NP2 referent when the task focused the listener's attention on processes of thematic role interpretation (Experiment 2). In fact, this effect was observed in one of the gaze measures even when the task was referent verification and thus did not require "deep" processing of the syntactic structure and thematic role relations (Experiment 1).

Participants made earlier first fixations to the target and generally fixated it more when they saw the speaker shift her gaze to this character, a gaze benefit that was more pronounced (faster and a greater numerical difference) for SVO than OVS sentences. Moreover, speaker gaze effects continued well into the NP2, and even extended to responses that were made considerably later, at least when participants verified the thematic role relations of the sentence. In fact, gaze and sentence structure interacted in modulating the response times: SVO sentences were verified faster overall, but when participants followed gaze during OVS sentences in the congruous condition, their mean response times were as fast as for SVO sentences. Thus, gaze-following can eliminate the difficulty associated with the processing of OVS sentences.

The cross-situational robustness of speaker gaze effects and their interaction with syntactic structuring and thematic role assignment suggest that existing accounts of visually situated language comprehension should accommodate them. The Coordinated Interplay Account predicts visual context effects closely temporally coordinated with when relevant aspects of visual context are identified by language. In line with this prediction, the present findings contribute the insight that the listener's gaze shift to the target character occurred in close temporal coordination with the speaker's gaze shift (see also, e.g., Hanna and Brennan, 2008; Staudte and Crocker, 2011). Likewise, evidence for interactions of speaker gaze effects with sentence structure appeared shortly after the speaker's gaze shift. Overall, the reported results fit with the prediction that visual context effects will appear closely time-locked to when visual context information is identified as relevant; by contrast, the accounts do not yet accommodate the outcome of verification processes (but see Knoeferle et al., in preparation).

With regard to the mechanism through which speaker gaze informs language comprehension, its effects likely differ compared to depicted referents or actions, which have been the focus of attention in previous studies. For depicted actions, for instance, a referential "match" with the verb can clarify that an action is relevant for comprehension. By contrast, speaker gaze is neither referenced nor associated with lexical entries, so its relevance at a given point in time must be computed differently. This could happen via knowledge of the functional role of the speaker in the communicative process, together with peripherally perceived dynamic motion (e.g., of gaze and head shifts).

Overall, while a direct comparison of action and speaker gaze effects will determine the extent of their similarities, the present findings clarify that speaker gaze effects are robust to variation in speaker-listener position; that they not only enable referential anticipation but also interact with core comprehension processes such as syntactic structuring and thematic role assignment; and that there are situations in which they extend in time and scope beyond the end of the current sentence to influence response times in a delayed verification task.

### **ACKNOWLEDGMENTS**

This research was supported by the Cognitive Interaction Technology Excellence Cluster (funded by the German Research Council) at Bielefeld University, Germany. We thank Linda Krull, Anne Kaestner, Eva Mende, Lydia Diegmann, and Eva Nunnemann for their assistance with preparing the stimuli and collecting data.

### **REFERENCES**


and mental representation. *Cognition* 111, 55–71.


issue edited by A. Myachykov, C. Scheepers, and Y. Shtyrov) 2:376. doi:10.3389/fpsyg.2011.00376


discourse comprehension. *Cogn. Sci.* 29, 1045–1060.


indices of iconic gesture comprehension. *Psychophysiology* 42, 654–667.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 August 2012; accepted: 13 November 2012; published online: 05 December 2012.*

*Citation: Knoeferle P and Kreysa H (2012) Can speaker gaze modulate syntactic structuring and thematic role assignment during spoken sentence comprehension? Front. Psychology 3:538. doi: 10.3389/fpsyg.2012.00538*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Knoeferle and Kreysa. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

## **SENTENCE STIMULI**

The following list contains the sentence stimuli first in subject-verb-object, then in object-verb-subject sentence structure. The same ending phrase was used in both conditions. The videos can be obtained from the authors upon request.


## **DETAILS FOR THE LINEAR MIXED MODELS ANALYSES**

Overview of the final models across experiments and analyses. The factors "match," "gazefollow," "gaze," and "struc" are placeholders for the experimental variables congruency, gaze-following, speaker, and sentence structure, respectively. These predictors were centered around 0, so that the factor levels incongruous, no gaze-following, no gaze, and OVS were negatively coded, while congruous, gaze-following, gaze, and SVO were positive. With regard to log-gaze probability ratios, the same final models were obtained both for the comparisons of target over competitor vs. target over NP1 referent fixations, and for the SHIFT and NP2 time windows. The *R*² and sigma values are approximated in these cases.
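The centering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `centered_codes` is hypothetical, and the resulting magnitudes of ±0.5 hold only for a balanced design such as the four toy trials shown; the actual coding values used in the models are not given here.

```python
def centered_codes(values, positive_level):
    """Code a two-level factor as 1/0, then subtract the mean so that the
    predictor is centered on 0 (positive_level ends up positively coded)."""
    raw = [1.0 if v == positive_level else 0.0 for v in values]
    mean = sum(raw) / len(raw)
    return [r - mean for r in raw]

# Four hypothetical trials, one per cell of the speaker x structure design
struc = ["SVO", "OVS", "SVO", "OVS"]
gaze = ["gaze", "gaze", "no gaze", "no gaze"]

struc_c = centered_codes(struc, "SVO")   # [0.5, -0.5, 0.5, -0.5]
gaze_c = centered_codes(gaze, "gaze")    # [0.5, 0.5, -0.5, -0.5]

# With centered predictors, the interaction term is their product
interaction = [s * g for s, g in zip(struc_c, gaze_c)]
print(interaction)                       # [0.25, -0.25, -0.25, 0.25]
```

Centering of this kind lets each main-effect coefficient be read as an effect averaged over the levels of the other factors rather than at a reference level.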


## **DETAILED GRAPHS FOR EXPERIMENT 2**


Proportion of fixations to (a) the competitor and (b) the NP1 referent. Graphs begin at the onset of the speaker's gaze shift. Mean onsets of the NP2 and the ending phrase are marked with vertical gray bars.

## Specific to whose body? Perspective-taking and the spatial mapping of valence

## **Jonathan F. Kominsky<sup>1</sup> and Daniel Casasanto<sup>2,3,4</sup>\***

<sup>1</sup> Department of Psychology, Yale University, New Haven, CT, USA

<sup>2</sup> Department of Psychology, The New School for Social Research, New York, NY, USA

<sup>3</sup> Neurobiology of Language Department, MPI for Psycholinguistics, Nijmegen, NL, Netherlands

<sup>4</sup> Donders Center for Cognition, Brain, and Behavior, Radboud University, Nijmegen, NL, Netherlands

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Sascha Topolinski, Universität Würzburg, Germany

Dermot Lynott, University of Manchester, UK

#### **\*Correspondence:**

Daniel Casasanto, Department of Psychology, The New School for Social Research, 80 Fifth Avenue, 7th Floor, New York, NY 10011, USA. e-mail: casasanto@alum.mit.edu

People tend to associate the abstract concepts of "good" and "bad" with their fluent and disfluent sides of space, as determined by their natural handedness or by experimental manipulation (Casasanto, 2011). Here we investigated influences of spatial perspective taking on the spatialization of "good" and "bad." In the first experiment, participants indicated where a schematically drawn cartoon character would locate "good" and "bad" stimuli. Right-handers tended to assign "good" to the right and "bad" to the left side of egocentric space when the character shared their spatial perspective, but when the character was rotated 180° this spatial mapping was reversed: good was assigned to the character's right side, not the participant's. The tendency to spatialize valence from the character's perspective was stronger in the second experiment, when participants were shown a full-featured photograph of the character. In a third experiment, most participants not only spatialized "good" and "bad" from the character's perspective, they also based their judgments on a salient attribute of the character's body (an injured hand) rather than their own body. Taking another's spatial perspective encourages people to compute space-valence mappings using an allocentric frame of reference, based on the fluency with which the other person could perform motor actions with their right or left hand. When people reason from their own spatial perspective, their judgments depend, in part, on the specifics of their bodies; when people reason from someone else's perspective, their judgments may depend on the specifics of the other person's body, instead.

**Keywords: body-specificity hypothesis, handedness, perspective taking, space, valence**

## **INTRODUCTION**

Across many cultures, the right side is associated with things that are good and lawful, and the left side with things that are dirty, bad, or prohibited. The association of "good" with "right" and "bad" with "left" is evident in positive and negative idioms like "my right-hand man" and "two left feet," and in the meanings of English words derived from the Latin for "right" (dexter) and "left" (sinister).

Beyond patterns in language, people also implicitly associate positively and negatively valenced ideas with "right" and "left" – but not always in the way that linguistic and cultural conventions suggest. Rather, associations between valence and left-right space depend on the way people use their hands (Casasanto, 2009, 2011). When asked to decide which of two products to buy, which of two job applicants to hire, or which of two alien creatures looks more honest, intelligent, or attractive, right- and left-handers tend to respond differently: right-handers tend to prefer the product, person, or creature presented on their right side, but left-handers tend to prefer the one on their left (Casasanto, 2009). This pattern persists even when people make judgments orally, without using their hands to respond. Children as young as 5 years old already make evaluations according to handedness and spatial location, judging animals shown on their dominant side to be nicer and smarter than animals on their non-dominant side (Casasanto and Henetz, 2012).

The implicit association between valence and left-right space influences people's memory and their motor responses, as well as their judgments. In one experiment, participants were shown the locations of fictitious positive and negative events on a map, and asked to recall the locations later. Memory errors were predicted by the valence of the event and the handedness of the participant: right-handers were biased to locate positive events too far to the right and negative events too far to the left on the map, whereas left-handers showed the opposite biases (Brunyé et al., 2012). In reaction time tasks, right- and left-handers were faster to classify words as positive when responding by pressing a button with their dominant hand, and faster to classify words as negative when responding with their non-dominant hand (de la Vega et al., 2012).

Associations of handedness with valence and space have been observed beyond the laboratory, in the speech and gestures of right- and left-handed US presidential candidates during televised debates (Casasanto and Jasmin, 2010). In right-handers, right-hand gestures were more strongly associated with positive-valence speech than left-hand gestures, and left-hand gestures were more strongly associated with negative-valence speech than right-hand gestures; the opposite associations between hand and valence were found in left-handers, despite the centuries-old tradition of training public speakers to gesture with the right hand for good things and the left hand for bad things (or not to use the left hand at all; Quintilianus, 1920).

Together, these data from studies using questionnaires, reaction time tasks, map tasks, and spontaneous gestures suggest that the associations of positivity and negativity with people's dominant and non-dominant sides of space are habitually activated, with a high degree of automaticity, when people evaluate the positivity of stimuli or recall information with a positive or negative valence. These findings provide one line of support for Casasanto's (2009, 2011) *body-specificity hypothesis*: if the content of the mind depends, in part, on the way we interact with the environment with our bodies, then people with different kinds of bodies should tend to think differently, in predictable ways.

The body-specific association of valence with left-right space is robust, but it is also flexible. Casasanto (2009) proposed that people come to associate "positive" with their dominant side of space because they can usually interact with their physical environment more fluently on this side, using their dominant hand. This proposal follows from the finding that fluent perceptuo-motor interactions with the environment generally lead to more positive feelings, whereas disfluent interactions lead to more negative feelings and evaluations (e.g., Reber et al., 1998; Beilock and Holt, 2007; Oppenheimer, 2008; Ping et al., 2009). To test whether manual motor fluency drives associations between valence and space, Casasanto and Chrysikou (2011) studied how people think about "good" and "bad" after their dominant hand had been impaired, reversing the usual asymmetry in motor fluency between their right and left hands. This reversal of motor fluency resulted in a reversal of behavioral responses: right-handers whose right hand was impaired permanently by a unilateral stroke, or temporarily by wearing a cumbersome glove on the right hand in the laboratory, tended to associate "good" with the *left* side of space, like natural left-handers.

The finding that even a few minutes of experiencing a reversed motor asymmetry can completely reverse people's usual judgments about the spatial mapping of valence has several implications. First, it shows that motor experience is sufficient to cause people to associate "good" with one side of space or the other, at least temporarily. Second, this finding supports a proposal at the heart of body-specificity: context shapes thinking, and the body is an ever-present part of the context in which we use our minds. To the extent that the body provides a stable context, the body-specific representations that people form are likely to appear stable over time; to the extent that body-relevant aspects of the context change, representations they activate may change accordingly (Casasanto, 2011).

To date, the body-specificity hypothesis has been tested with participants in isolating contexts: People's brains and behaviors have been measured while they were interacting primarily with a piece of paper (Casasanto, 2009; Casasanto and Chrysikou, 2011; Casasanto and Henetz, 2012) or a computer screen (Willems et al., 2009, 2010; Brunyé et al., 2012; de la Vega et al., 2012), or while making monologic statements into a television camera (Casasanto and Jasmin, 2010). Perhaps as a consequence, the data suggest that in all of these previous studies people's body-specific neural and mental representations have been computed from an egocentric perspective. That is, at least by default, people tend to imagine actions (Willems et al., 2009) and understand the meanings of action verbs (Willems et al., 2010) based on the way they would perform these actions with their own bodies, and they tend to activate associations between space and valence based on the long- or short-term constraints of their own manual motor fluency, using an egocentric spatial frame of reference (Casasanto, 2011).

Yet, in the richer physical and social world outside of the lab or the television studio, *other people* often feature prominently in the contexts in which we use our minds, and people often adopt other people's mental or spatial perspectives. When communicating spatial information to another person, people frequently describe things from the recipient's spatial perspective rather than their own (Schober, 1993, 1995; Mainwaring et al., 2003). Of particular relevance, people may spontaneously take the spatial perspective of another person depicted in a photograph when reasoning about "right" and "left," especially when action is implied (Tversky and Hard, 2009). In face-to-face interactions, listeners tend to mimic the speaker's bodily movements mirror-wise: if the speaker leans to her right, listeners lean to their *left*, so as to move in the same absolute direction as the speaker (but the opposite direction in body-centered space), suggesting that they spontaneously adopt an allocentric spatial perspective (Bavelas et al., 1988).

The present study investigates the consequences of spatial perspective-taking on the body-specific spatial mapping of "good" and "bad." Although initial tests of the body-specificity hypothesis have focused on the role of one's own body in shaping thoughts, feelings, and judgments, the idea that all thinking occurs from an egocentric perspective is ruled out by the studies reviewed above. Casasanto et al. have suggested that people may sometimes represent other people's actions allocentrically, in terms of the specifics of *their* bodies, which are either observed or assumed (e.g., see Willems et al., 2010, p. 73; Beveridge et al., 2012). There is no doubt that people can change spatial perspectives flexibly. Here we investigated how perspective-taking interacts with the bodily characteristics of the participant and of a depicted "other" to determine judgments about the left-right mapping of "good" and "bad."

Do people only compute space-valence mappings on the basis of their own bodily characteristics, or can they also compute these mappings on the basis of another person's bodily characteristics (observed or assumed), when asked to reason about the other person's choices? To find out, in three experiments, we asked right-handed participants to perform a simple diagram task, adapted from Casasanto (2009). Participants saw a character named Bob in the center of a screen in between two boxes, one on the participant's left and the other on their right. They were asked to indicate which box Bob would put "good" things in, and which box he would put "bad" things in. For half of the participants, Bob was facing the same direction that they were (Shared Perspective condition: Bob's right was the participant's right), and for the other half Bob was facing the opposite direction (Opposite Perspective condition: Bob's right was the participant's left). All participants were instructed to reason about Bob's placement of good and bad things *taking Bob's perspective.*

In previous experiments, Bob was always facing the same direction as the participants. Results showed a strong tendency for left-handers to say that Bob would place good things on their left, and for right-handers to say that he would place good things on their right. This basic result in the "Bob" task has been replicated across seven experiments, conducted on three continents (Casasanto, 2009; Casasanto and Chrysikou, 2011; de la Fuente et al., 2011). We therefore expected that in the Shared Perspective condition, right-handers would tend to assign good things to the box on their right.

For the Opposite Perspective condition, we sought to distinguish two possibilities. First, participants could *still* tend to assign good things to the box on their right side. This would suggest that people's judgments about the spatial mapping of valence are entirely egocentric: regardless of Bob's spatial perspective (and of the explicit instructions to consider it), participants' own motor fluency determines their responses. Alternatively, right-handed participants could tend to assign good things to the box on their left. This would suggest that participants are adopting an allocentric perspective, and reasoning about Bob's preferences on the basis of *his* motor capacities – on the assumption (perhaps implicit) that Bob is a right-hander, which is true of about 90% of the population (Coren, 1992).

## **EXPERIMENT 1: PUTTING BODY-SPECIFIC SPACE-VALENCE MAPPINGS IN PERSPECTIVE**

Experiment 1 provided an initial test of the effect of spatial perspective on the left-right mapping of emotional valence, first using a simple cartoon character as in previous "Bob" experiments (Experiment 1a), and second using a more naturalistic color photograph of "Bob," to facilitate perspective-taking (Experiment 1b).

## **EXPERIMENT 1A**

#### **Methods**

*Participants.* Three hundred adults (over 18 years old by self-report) were recruited anonymously via Amazon Mechanical Turk, and participated online, for payment.

*Materials and procedure.* Materials and procedure were adapted from Casasanto (2009, Experiment 3). After providing informed consent, participants performed a two-question diagram task that has been shown to elicit contrasting space-valence judgments in right- and left-handers (Casasanto, 2009; Casasanto and Chrysikou, 2011; de la Fuente et al., 2011). Participants saw a cartoon character's head in the center of the screen between two empty boxes, one on the participants' right and the other on their left. They were told that the character, named Bob, loves zebras and thinks they are good, but hates pandas and thinks they are bad (or vice versa, with the assignment of valence to the animals counterbalanced across participants). Participants were asked to indicate where Bob would put each of the animals if he were going to put the good animal in one box and the bad animal in the other, by clicking inside one box and then the other. The order in which participants were asked to locate the good and bad animals was counterbalanced, to ensure that any associations between space and valence were not confounded with numerical or temporal order.

Participants were randomly assigned to one of two versions of the experiment. For half of the participants, Bob was facing the same direction that they were (Shared Perspective condition; **Figure 1B**), and for the other half Bob was facing the opposite direction (Opposite Perspective condition; **Figure 1A**). All participants were instructed to take Bob's perspective when reasoning about his placement of the good and bad animals, as in previous written versions of the "Bob" experiment (Casasanto, 2009, Experiments 1–2).

After completing the diagram task, participants answered two filler questions, and then provided a brief rationale for where they thought Bob would place the "good" animal. They then completed the Edinburgh Handedness Inventory (EHI) (Oldfield, 1971), with one added item: "Which hand do you use a computer mouse with?" This question was not used in calculating the EHI score, and was included for exploratory purposes to inform future studies. Finally, to determine whether participants were capable of taking Bob's perspective accurately, we showed them **Figure 1A** and asked them to click on the box on the right (or left, randomly determined), from Bob's point of view. Participants were then given an optional demographic questionnaire.

*Design.* The design of the experiment included three factors of interest [Valence (Good animal, Bad animal), Space (Left box, Right box), and Perspective (Shared perspective, Opposite perspective)], as well as two factors not of interest [Animal assignment (Panda = Good, Zebra = Good) and Question order (Positive animal first, Negative animal first)], resulting in a 2 × 2 × 2 × 2 × 2 design. Ideally, the design would also include a fourth factor of interest: the handedness of the participant, which would add (in the simplest case) another binary factor. However, given the rate of left-handers in the population, about 10%, we estimated that we would need a sample size of at least 1,000 participants in order to have a sufficient number of left-handers randomly assigned to each cell of the design. Fortunately, the design allows the effect of perspective-taking to be evaluated within a single handedness group, so rather than collecting a much larger sample, we decided to exclude data from all non-right-handed participants (EHI < 40).
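The sample-size reasoning above can be illustrated with back-of-the-envelope arithmetic. The sketch below takes the 10% left-hander rate from the text; the choice to count the eight between-participants cells (Perspective × Animal assignment × Question order) is an assumption made for illustration, since the text does not spell out which cells were counted or what number per cell would be sufficient.

```python
# Expected number of left-handers per cell for a given total sample size.
LEFT_HANDER_RATE = 0.10            # approximate population rate cited in the text
BETWEEN_SUBJECT_CELLS = 2 * 2 * 2  # Perspective x Animal assignment x Question order (assumed)


def expected_left_handers_per_cell(n_total: int) -> float:
    """Expected count of left-handers landing in each cell under random assignment."""
    return n_total * LEFT_HANDER_RATE / BETWEEN_SUBJECT_CELLS


for n in (300, 1000):
    print(f"N = {n}: about {expected_left_handers_per_cell(n):.1f} left-handers per cell")
```

Under these assumptions, a sample of 300 would be expected to yield only a handful of left-handers per cell, which is consistent with the decision to analyze right-handers only.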

#### **Results and discussion**

Left-handers (*n* = 23) and ambidextrous participants (*n* = 61) were excluded, leaving only right-handed participants (*n* = 209). Among right-handers, 91% of participants correctly answered the perspective-taking manipulation check. Of these 191 participants, 15 did not click inside of either box on the test items, so their data could not be analyzed. This left 176 participants whose data were analyzed: 85 participants in the Opposite Perspective condition and 91 in the Shared Perspective condition.

Throughout these results, we will refer to the placement of the "good" animal from the *participant's* perspective (i.e., egocentric right and left). In the Shared Perspective condition, the majority (63%) of participants placed the "good" animal on their right (57 right = good vs. 34 left = good, sign test *p* = 0.02), replicating previous findings in right-handers. By contrast, in the Opposite Perspective condition, the pattern was reversed, though the difference was only marginally significant, with the majority (60%) of participants placing the "good" animal on their left (i.e., on Bob's right: 51 left = good vs. 34 right = good, sign test *p* = 0.08). A binary logistic regression confirmed the significant effect of Perspective condition on placement of the "good" animal (Wald χ<sup>2</sup> = 8.86, df = 1, *p* = 0.003, OR = 2.52, 95% CI = 1.37 – 4.61), indicating that the odds of placing the "good" animal on the participant's own left (which was Bob's right in the Opposite Perspective condition) were about 2.5 times higher in the Opposite Perspective condition than in the Shared Perspective condition (**Figure 2**).
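As an illustration of the sign tests reported for Experiment 1a, the following minimal sketch computes a two-sided exact binomial test against a 50/50 null from the reported counts. It assumes an exact two-sided test; the text does not state which variant or software was used, and the function name `sign_test_p` is introduced here only for illustration.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Two-sided exact binomial (sign) test of k out of n against p = 0.5."""
    tail = max(k, n - k)
    # Upper-tail probability of observing `tail` or more under the null.
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided p is twice the tail.
    return min(1.0, 2 * upper)


# Counts reported above, coded from the participant's perspective.
shared_p = sign_test_p(57, 57 + 34)    # Shared: 57 good = right vs. 34 good = left
opposite_p = sign_test_p(51, 51 + 34)  # Opposite: 51 good = left vs. 34 good = right
print(f"Shared: p = {shared_p:.3f}; Opposite: p = {opposite_p:.3f}")
```

With these counts the two p-values should land close to the reported 0.02 and 0.08, respectively.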

Analyses of the debriefing data showed that, of the participants included in the main analysis, 25% justified their assignment of the "good" animal to the right or left box on the basis of either their own handedness or Bob's handedness. This rate was surprisingly high: in previous versions of this task, the percentage of participants who explained their responses in terms of handedness has ranged from 5% (de la Fuente et al., 2011, Experiment 2) to 14% (Casasanto, 2009, Experiment 2). We do not know why the rate of debriefing responses mentioning handedness was higher in this study than in previous studies that used different versions of the same task. One possibility is that our Amazon Mechanical Turk participants, who were completing the study at their leisure, took more time to reflect on possible explanations for their choices than participants in previous studies, who were tested in the laboratory or in face-to-face conversations with the experimenter, and whose most frequent debriefing response in some studies was "I don't know." On this account, the increased mentions of handedness during the debriefing may not indicate that a greater proportion of participants were conscious of making their choices on the basis of handedness *during the task*; rather, these debriefing data could indicate that a greater proportion of participants generated a handedness-related explanation *post hoc*, given sufficient time to reflect on their responses. Another possibility is that some of the participants in the present study were familiar with the idea of handedness-based space-valence associations which, since they were first reported in 2009, have been described several times in high-circulation newspapers and magazines. Whatever the correct explanation may be, we note that similar patterns of responses were found in participants who mentioned handedness during the debriefing as in those who did not. When Debriefing Response (Mentioned handedness, Did not mention handedness) was added to the binary logistic regression model, it did not interact with Perspective to predict the side of participants' "good animal" responses (Wald χ<sup>2</sup> = 2.04, df = 1, *p* = 0.15), and the effect of Perspective was still significant when the interaction of Perspective and Debriefing Response was controlled (Wald χ<sup>2</sup> = 4.51, *p* = 0.03, OR = 2.02, 95% CI = 1.06 – 3.86).

In summary, when right-handed participants shared their visuo-spatial perspective with Bob, they tended to indicate that Bob would place the "good" animal on their (mutual) right side. By contrast, when Bob was rotated 180° such that his perspective was opposite the participants', they tended to indicate that Bob would place the good animal on *his right*, which was their own left side. It appears that the association of "good" with the right side is not restricted to the egocentric right; rather, when asked to consider another person's perspective, right-handers will apply the same "good is right" mapping for someone else's point of view.

Overall, the effect of perspective on participants' judgments was highly significant, but we note that the effect in the condition of greatest interest (Opposite Perspective) was only marginally significant, perhaps because people are not accustomed to computing the spatial perspective of a schematic, disembodied cartoon head, viewed from above. In Experiment 1b, we repeated this experiment using a full-featured photograph of "Bob," viewed from either the back (Shared Perspective) or the front (Opposite Perspective), reasoning that the richer, more naturalistic stimulus could enhance the perspective-taking effect found in Experiment 1a.

## **EXPERIMENT 1B**

#### **Methods**

*Participants.* Three hundred new participants from Amazon Mechanical Turk participated, for payment.

*Design, materials, and procedure.* The design, materials, and procedure were identical to Experiment 1a with the following exception: each participant saw one of the images in **Figure 3**, depicting a full-featured "Bob" rather than an abstract line drawing.

#### **Results and discussion**

After removing left-handed (*n* = 13) and ambidextrous participants (*n* = 59) there were 222 right-handed participants, 214 of whom (96%) passed the perspective-taking manipulation check. Of these participants, 6 failed to click inside the boxes, leaving 208 right-handed participants whose data could be analyzed: 112 in the Shared Perspective condition and 96 in the Opposite Perspective condition.

In the Shared Perspective condition, the majority (78%) of participants indicated that the "good" animal should be placed in the box on their right (87 good = right vs. 25 good = left, sign test *p* = 0.001). In the Opposite Perspective condition, the majority (73%) of participants indicated that the "good" animal should be placed in the box on their left (Bob's right) (70 good = left vs. 26 good = right, sign test *p* = 0.001). A binary logistic regression confirmed the effect of Perspective condition on the placement of the "good" animal (Wald χ<sup>2</sup> = 48.02, df = 1, *p* = 0.001, OR = 9.37, 95% CI = 4.98 – 17.64), indicating that the odds of placing the "good" animal on the participant's own left (which was Bob's right in the Opposite Perspective condition) were almost 10 times higher in the Opposite Perspective condition than in the Shared Perspective condition (**Figure 4**).
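For a logistic regression with a single binary predictor, the odds ratio and its Wald statistics can be recovered directly from the 2 × 2 table of counts. The sketch below applies this to the Experiment 1b counts reported above; it is an illustrative reconstruction, not the analysis script used in the study, and the function name `odds_ratio_wald` is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio (a/b)/(c/d) with its Wald chi-square and 95% CI.

    The standard error is computed on the log-odds scale as
    sqrt(1/a + 1/b + 1/c + 1/d).
    """
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    chi2 = (log_or / se) ** 2
    ci = (exp(log_or - z * se), exp(log_or + z * se))
    return exp(log_or), chi2, ci


# Experiment 1b counts, coded from the participant's perspective:
#   Opposite Perspective: 70 good = left, 26 good = right
#   Shared Perspective:   25 good = left, 87 good = right
odds_ratio, chi2, (ci_low, ci_high) = odds_ratio_wald(70, 26, 25, 87)
print(f"OR = {odds_ratio:.2f}, Wald chi2 = {chi2:.2f}, "
      f"95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

The same function applied to the Experiment 1a counts (51, 34, 34, 57) gives values close to those reported for that experiment.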

Analyses of the debriefing data showed that, of the participants included in the main analysis, 42% justified their assignment of the "good" animal to the right or left box on the basis of either their own handedness or Bob's handedness. In a further analysis, Debriefing Response (Mentioned handedness, Did not mention handedness) was added to the binary logistic regression model. There was a significant interaction between Debriefing Response and Perspective (Wald χ<sup>2</sup> = 14.64, df = 1, *p* = 0.001, OR = 21.82, 95% CI = 4.50 – 105.83), indicating that the effect of Perspective was stronger in participants who explicitly mentioned handedness (Wald χ<sup>2</sup> = 37.14, df = 1, *p* = 0.001, OR = 75.20, 95% CI = 18.74 – 301.74) than in those who did not, but the effect of Perspective remained significant in the majority of participants who did not mention handedness (Wald χ<sup>2</sup> = 10.45, *p* = 0.001, OR = 3.45, 95% CI = 1.63 – 7.30). Pairwise differences between the number of "Good = Left" and "Good = Right" responses were significant in both the Shared and Opposite Perspective conditions, regardless of whether participants mentioned handedness in the debriefing (**Table 1**).

The results of Experiment 1b corroborate those of Experiment 1a: when right-handers share Bob's spatial perspective, they tend to assign the "good" animal to the box on their right and the "bad" animal to the box on their left. By contrast, when asked to decide where a 180°-rotated Bob would place the animals, participants tend to assign the "good" animal to the box on *their left* and the "bad" animal to the box on *their right.*

**Table 1 | Judgments from participants who did and did not mention handedness when justifying their responses in Experiment 1b.**

"Good = Left" and "Good = Right" are coded from the participant's perspective.

In order to compare the strength of the effect of Perspective between Experiments 1a and 1b, we conducted an additional binary logistic regression adding Experiment to the model used in the main analysis. The interaction of Perspective (Shared, Opposite) and Experiment (1a, 1b) was significant (Wald χ<sup>2</sup> = 8.64, df = 1, *p* = 0.003, OR = 3.73, 95% CI = 1.55 – 8.96), indicating that the effect of perspective-taking on the spatialization of valence was stronger in Experiment 1b than in Experiment 1a, presumably because participants were able to compute space-valence relationships more easily or more automatically when shown a more lifelike depiction of Bob.
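The interaction odds ratio in this comparison can be read as a ratio of the two experiments' odds ratios. The following sketch computes that ratio from the counts reported for Experiments 1a and 1b, assuming independent samples so that the variances of the two log odds ratios add. It is a hand calculation offered for illustration, not the regression model fitted in the study.

```python
from math import exp, log, sqrt

# Counts coded from the participant's perspective, in the order:
# (Opposite good=left, Opposite good=right, Shared good=left, Shared good=right)
EXP_1A = (51, 34, 34, 57)
EXP_1B = (70, 26, 25, 87)


def log_odds_ratio(cells):
    """Return the log odds ratio of a 2x2 table and its sampling variance."""
    a, b, c, d = cells
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


log_or_a, var_a = log_odds_ratio(EXP_1A)
log_or_b, var_b = log_odds_ratio(EXP_1B)

# Interaction = difference of log odds ratios (i.e., ratio of odds ratios).
diff = log_or_b - log_or_a
se = sqrt(var_a + var_b)  # independent samples, so variances add
ratio = exp(diff)
chi2 = (diff / se) ** 2
ci = (exp(diff - 1.96 * se), exp(diff + 1.96 * se))
print(f"ratio of ORs = {ratio:.2f}, Wald chi2 = {chi2:.2f}, "
      f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

The resulting ratio is close to the reported interaction OR of 3.73, which is what one would expect if the interaction term captures how much larger the Perspective effect is in Experiment 1b.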

As in previous tests of body-specific space-valence associations, here participants' judgments appear to follow the "dominant side is good" mapping (whether they activate this association consciously or unconsciously). On the simplest interpretation of these data, when Bob shares their point of view, participants compute "left" and "right" from an egocentric spatial perspective, and when Bob has the opposite point of view, participants compute "left" and "right" from an allocentric spatial perspective.

Yet, there is an alternative to this conclusion. The data from the Opposite Perspective condition are consistent with participants computing "left" and "right" allocentrically, from Bob's 180°-rotated viewpoint, *based on Bob's bodily characteristics* – assuming (perhaps implicitly) that Bob is a right-hander, which is true of about 90% of the population (Coren, 1992). But the data are also consistent with the possibility that participants are not really considering Bob's bodily characteristics at all, and are instead adopting what we will call a "rotated egocentric" perspective: maybe participants are projecting their own bodily characteristics onto Bob (perhaps because they cast themselves in the "role" of Bob). In that case, in the Opposite Perspective condition they would assign the "good" animal to the box on their left, not because they assume that Bob is a right-hander (based on the handedness statistics of the population), but rather because they themselves are right-handed, and they compute space-valence associations *based on their own bodily characteristics* even when asked to reason from another person's perspective.

Adopting a "rotated egocentric" perspective would be consistent with other demonstrations of surprising egocentrism in adults, in which experimental participants project their own bodily characteristics onto another person. For example, in one set of experiments, when asked to recall the eye color of well-known celebrities, brown-eyed participants were biased to attribute brown-eyedness to most of the stars tested, but blue-eyed participants were biased to attribute blue-eyedness to the stars, despite the rarity of blue-eyedness in the population (Casasanto and Staum Casasanto, 2011). This effect persisted when analyses were controlled for how well participants knew the celebrities, how well they liked them, and how confident participants were in their judgments: participants still tended to project their own bodily characteristics onto other people. If such egocentric projection of one's own bodily traits onto others accounts for the results of the Opposite Perspective condition here, it would be inappropriate to conclude that switching points of view caused participants to spatialize valence from an allocentric perspective, based on Bob's (assumed) bodily characteristics.

One way to distinguish between the "allocentric" and "rotated egocentric" possibilities would be to repeat Experiment 1 in left-handers. If participants reason about Bob's choices from an allocentric perspective, then right- and left-handers should respond similarly in the Opposite Perspective condition, since both groups should assume that Bob is a right-hander, based on the statistics of the population. Alternatively, if participants reason from a rotated egocentric perspective, then right- and left-handers should show opposite patterns of responses in the Opposite Perspective condition: right-handers should impute right-handedness to Bob and choose the box on their left, but left-handers should impute left-handedness to Bob and choose the box on their right. Yet, there are practical and theoretical limitations to this proposed test. Practically speaking, a very large sample would be needed in order to recruit a sufficient number of left-handers from the general population. Theoretically, these imagined data would still be correlational, and therefore subject to speculations about other unexamined differences between right- and left-handers' judgments.

In order to distinguish between the "allocentric" and "rotated egocentric" possibilities while addressing both of these concerns, for Experiment 2 we conducted a true experimental manipulation in right-handers, randomly assigning them to make judgments about Bob's preference when provided with a highly salient indicator of his manual motor fluency with his right vs. left hand.

## **EXPERIMENT 2: ARE PARTICIPANTS REASONING ON THE BASIS OF BOB'S BODY OR THEIR OWN?**

In order to determine whether participants in Experiment 1 were making judgments based on Bob's bodily characteristics or their own, in Experiment 2 we asked right-handers to judge where Bob would place the good and bad animals while viewing a picture of him that made it easy to tell whether he could act more fluently with his right or left hand. Bob (viewed from either the front or the back) wore a sling on either his right or left arm, indicating that either his left hand was temporarily impaired (making him functionally a right-hander) or his right hand was impaired (making him functionally a left-hander; see Casasanto and Chrysikou, 2011; **Figure 5**).

If participants can reason about Bob's spatialization of "good" and "bad" from a genuinely allocentric perspective, on the basis of Bob's bodily characteristics, then in both the Shared Perspective and the Opposite Perspective conditions participants assigned to see Bob as functionally left-handed (sling on right arm) should respond differently from those assigned to see him as functionally right-handed (sling on left arm), since in all cases the sling makes it apparent which side is Bob's "good" side (i.e., his fluent side). Alternatively, if participants reason about Bob's choices from a rotated egocentric perspective, projecting their own bodily characteristics onto Bob, then the sling should have no effect on participants' judgments. As in Experiment 1, in the Shared Perspective condition right-handed participants should put the good animal on their right, and in the Opposite Perspective condition they should put the good animal on their left (Bob's right), regardless of which arm the sling appeared on.

## **METHODS**

## **Participants**

Six hundred new participants from Amazon Mechanical Turk participated, for payment.

## **Design, materials, and procedure**

The design, materials, and procedure were identical to those in Experiment 1b, with the following exception: participants were randomly assigned to see one of the photographs in **Figure 5**, in which Bob, viewed from either the front or the back, wore a sling on either the right arm or the left.

## **RESULTS AND DISCUSSION**

Of the 469 right-handed participants who produced codable responses, 450 (96%) answered the perspective-taking manipulation check question correctly. According to a binary logistic regression, Perspective (Shared, Opposite) and Sling Arm (Right, Left) interacted to predict participants' placement of the good animal in the box on their right or left (Wald χ<sup>2</sup> = 113.86, df = 1, *p* = 0.0001, OR = 157.57, 95% CI = 62.21 – 399.11). Binary logistic regressions were then conducted for each Perspective condition, as well as sign tests for each condition.

**Figure 6** shows the assignment of the "good" animal in each condition. In the Opposite Perspective condition, when the sling was on Bob's left arm, participants (*n* = 105) put the "good" animal on their left 84% of the time (88 good = left vs. 17 good = right, sign test *p* = 0.001). When the sling was on Bob's right arm, participants (*n* = 107) put the good animal on their left only 22% of the time (24 good = left vs. 83 good = right, sign test *p* = 0.001; Wald χ<sup>2</sup> = 67.17, *p* = 0.001, OR = 17.90, 95% CI = 8.98 – 35.69).

In the Shared Perspective condition, when the sling was on Bob's left arm, participants (*n* = 112) put the "good" animal on their right 80% of the time (90 good = right vs. 22 good = left, sign test *p* = 0.001), whereas when the sling was on Bob's right arm (*n* = 104), participants put the good animal on their right only 32% of the time (33 good = right vs. 71 good = left, sign test *p* = 0.001; Wald χ<sup>2</sup> = 46.86, *p* = 0.001, OR = 8.80, 95% CI = 4.72 – 16.41).

In summary, in both the Shared Perspective and Opposite Perspective conditions, the majority of participants assigned the "good" animal to Bob's fluent side of space, that is, the side ipsilateral to his sling-free arm. Results suggest that participants in the Opposite Perspective condition were adopting a genuine allocentric perspective, not a rotated egocentric perspective, and basing their judgments on Bob's bodily characteristics rather than their own.
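To make the summary concrete, the sketch below recodes the Experiment 2 counts (reported from the participant's perspective) into the proportion of responses that fell on Bob's fluent side in each condition. The dictionary layout and function names are hypothetical and introduced only for illustration; the counts are those given in the text above.

```python
# Experiment 2 counts, coded from the participant's perspective.
# key: (perspective, arm wearing the sling); value: (good = left, good = right)
COUNTS = {
    ("opposite", "left"): (88, 17),
    ("opposite", "right"): (24, 83),
    ("shared", "left"): (22, 90),
    ("shared", "right"): (71, 33),
}


def fluent_side_for_participant(perspective: str, sling_arm: str) -> str:
    """Side of the screen (participant's left/right) matching Bob's sling-free arm."""
    bobs_fluent_side = "right" if sling_arm == "left" else "left"
    if perspective == "shared":
        return bobs_fluent_side
    # Opposite Perspective: Bob's left and right are mirrored for the participant.
    return "left" if bobs_fluent_side == "right" else "right"


def fluent_side_proportion(perspective: str, sling_arm: str) -> float:
    """Proportion of participants who put the good animal on Bob's fluent side."""
    left, right = COUNTS[(perspective, sling_arm)]
    side = fluent_side_for_participant(perspective, sling_arm)
    chosen = left if side == "left" else right
    return chosen / (left + right)


for key in COUNTS:
    print(key, f"{fluent_side_proportion(*key):.0%}")
```

In all four cells the proportion exceeds one half, which is the pattern the allocentric account predicts and the rotated egocentric account does not.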

## **GENERAL DISCUSSION**

In two experiments, we demonstrated that taking another person's perspective can influence judgments about the spatial mapping of emotional valence. In Experiment 1, when participants shared the same spatial point of view as the cartoon character whose preferences they were asked to reason about, the (right-handed) participants tended to compute space-valence relationships egocentrically, showing the "good is right" bias found previously in right-handers (Casasanto, 2009). That is, participants indicated that the "good" side of space was the side on which they could interact with the physical environment more fluently using their dominant hand. When the participants' point of view was rotated 180° from the character's, however, the spatial mapping of valence was reversed: "Good" was assigned most often to the character's right side (i.e., the participant's left). This effect of spatial perspective was strengthened when the cartoon character used in Experiment 1a was replaced with a color photograph of a man (Experiment 1b), presumably because the full-featured photograph enabled participants to compute space-valence relationships from the character's perspective more easily or more automatically.

The results of Experiment 1 were compatible with two possibilities: when participants computed space-valence relationships they could have been adopting an allocentric perspective, basing their judgments on the character's bodily characteristics (assuming, perhaps implicitly, that the character was right-handed, like the 90% majority of people). Alternatively, they could have been adopting a "rotated egocentric" perspective, projecting their own handedness onto the character or putting themselves in his shoes, and basing their judgments on their own bodily characteristics. In Experiment 2, the character's hand dominance could be inferred unambiguously. Results clearly indicated that participants were adopting an allocentric perspective: they spatialized "good" and "bad" on the basis of the character's bodily characteristics, not their own.

Is it possible that the results of these experiments were artifacts of the particular task used, in which participants were explicitly asked to spatialize "good" and "bad," and to use their hands when responding? For example, could clicking the mouse on one's dominant side of the screen have been easier than clicking on the other side, leading to a trivial association between space and valence? It is unlikely that such task characteristics can explain these results, for several reasons. First, there is no reason to believe that using the mouse with one's dominant hand should cause participants to prefer to click in a particular box, or that it was easier to click in one box vs. the other. Furthermore, even if responding with the mouse did bias people toward responding on one side of the screen (e.g., if it were slightly easier to click on one side than the other), the fact that participants' preferred box *reversed* according to whether Bob was facing toward them or away from them definitively rules out any explanation for their responses based on the location of the mouse, or how easy it was to click in the right or left box.

More broadly, across previous studies using the "Bob goes to the zoo" task, results obtained in versions of that task that required manual responses (e.g., Casasanto, 2009, Experiments 1–2) have been statistically indistinguishable from results of "hands-free" versions that only required oral responses (e.g., Casasanto, 2009, Experiment 3; de la Fuente et al., 2011), addressing concerns about the use of the hands during this task. More broadly still, we note that the fluency-based body-specific association of left-right space and valence has been shown in a wide variety of tasks with diverse dependent measures (e.g., diagram tasks, forced-choice questionnaires, reaction time tasks, visual-hemifield tasks, location memory tasks, analyses of spontaneous gestures), and in diverse populations including healthy right- and left-handed adults from the USA, Germany, The Netherlands, Spain, and Morocco, as well as hemiparesis patients, children as young as 5 years old, and US presidential candidates who did not know that they were experimental "participants" (Casasanto, 2009; Casasanto and Jasmin, 2010; Brookshire and Casasanto, 2011; Casasanto and Chrysikou, 2011; de la Fuente et al., 2011; Brunyé et al., 2012; Casasanto and Henetz, 2012; de la Vega et al., 2012).

We acknowledge, however, that this study is only a first test of the effects of spatial perspective on left-right valence associations. It would be useful to corroborate these results with further tests that use more implicit dependent measures, which could rule out other potential task-based explanations. For example, an anonymous reviewer suggested this alternative account of the findings of Experiment 2: rather than reasoning about "good" and "bad" from Bob's perspective, participants could have used a simple matching strategy, matching the good animal with the "good" (i.e., uninjured) side of Bob's body. Although our data cannot rule out this possibility, such an explanation cannot account for the perspective-taking evident in Experiment 1, or the results of the several previous versions of the "Bob goes to the zoo" task reviewed above.

One question left open by this study is: to what extent do people take another's spatial point of view spontaneously, and therefore reason about space and valence from an allocentric perspective, based on characteristics of the other person's body? In these experiments, we explicitly instructed participants to adopt the character's perspective (and made sure they were capable of doing so correctly). Yet, even without explicit instruction, people routinely represent the perspective of people with whom they interact face-to-face, as is evidenced by studies of dialog (e.g., Schober, 1995), gesture (McNeill, 1992), mimicry (Bavelas et al., 1988), and spatial descriptions of pictures (Mainwaring et al., 2003; Tversky and Hard, 2009). It is likely, therefore, that allocentric reasoning about space and valence may occur spontaneously.

Another open question is the extent to which people's reasoning about space and valence is constituted by modality-specific simulations of motor actions, and the fluency with which they could be performed on one side of space or the other. Previous studies show that people imagine actions and understand decontextualized action verbs, in part, via motor simulations constructed from a body-specific, egocentric perspective. That is, when asked to imagine "grasping" or to read the verb "grasp," right- and left-handers preferentially activate motor areas in the hemisphere contralateral to their dominant hand that are used for planning and performing manual actions (Willems et al., 2009, 2010). There is clear evidence that the body-specific spatialization of emotional valence depends on an individual's history of motor actions (Casasanto and Chrysikou, 2011). It is not known, however, whether the mental representations that underlie reasoning about the spatial correlates of valence are constituted, in part, by motor simulations: that is, motor simulations of actions as people would perform them with their own bodies (when they adopt an egocentric perspective), and motor simulations of relatively fluent and disfluent actions as they would be performed by another (when they adopt an allocentric perspective). Determining whether the representations underlying behavioral effects like those we show here include simulations in motor circuits that support acting with the dominant and non-dominant hands will require more direct observations of neural activity (e.g., using fMRI) or direct interventions on neural circuits that compute hand actions (e.g., using rTMS or tDCS).

## **CONCLUSION**

These experiments corroborate previous studies showing that the spatial mapping of "good" and "bad" is body-specific. Furthermore, they show for the first time that the body this spatial mapping is *specific to* is not necessarily one's own. When we reason from our own perspective, our judgments are conditioned by the particulars of our bodies; when we reason from someone else's perspective, our judgments may be conditioned by the particulars of *their* bodies. The body shapes our thoughts, feelings, and judgments because it is an ever-present part of the context in which we use our minds (Casasanto, 2011). Other people are also an important element of the context in which we do our thinking; therefore, thinking is sensitive to the specifics of their bodies, as well as our own.

## **REFERENCES**


### **ACKNOWLEDGMENTS**

The authors thank Matthew Fisher for his assistance in creating the stimuli for Experiments 1b and 2. This research was supported in part by a grant from the Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía and the European Regional Development Fund (P09-SEJ-4772) and by a James S. McDonnell Foundation Scholar Award to Daniel Casasanto.

Descriptions of simple spatial scenes in English and Japanese. *Spat. Cogn. Comput.* 3, 3–42.


representations of action verbs neural evidence from right- and left-handers. *Psychol. Sci.* 21, 67–74.

Willems, R. M., Toni, I., Hagoort, P., and Casasanto, D. (2009). Body-specific motor imagery of hand actions: neural evidence from right- and left-handers. *Front. Hum. Neurosci.* 3:39. doi:10.3389/neuro.09.039.2009

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 August 2012; accepted: 23 April 2013; published online: 13 May 2013.*

*Citation: Kominsky JF and Casasanto D (2013) Specific to whose body? Perspective-taking and the spatial mapping of valence. Front. Psychol. 4:266. doi: 10.3389/fpsyg.2013.00266*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Kominsky and Casasanto. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## A feeling for numbers: shared metric for symbolic and tactile numerosities

## **Florian Krause<sup>1</sup>\*, Harold Bekkering<sup>1</sup> and Oliver Lindemann<sup>2</sup>**

<sup>1</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands

<sup>2</sup> Division of Cognitive Science, University of Potsdam, Potsdam, Germany

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Carmelo M. Vicario, University of Queensland, Australia

Zaira Cattaneo, University of Milano-Bicocca, Italy

#### **\*Correspondence:**

Florian Krause, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, P. O. Box 9104, 6500 HE Nijmegen, Netherlands. e-mail: f.krause@donders.ru.nl

Evidence for an approximate analog system of numbers has been provided by the finding that the comparison of two numerals takes longer and is more error-prone as the semantic distance between the numbers becomes smaller (the so-called numerical distance effect). Recent embodied theories suggest that analog number representations are based on previous sensory experiences and therefore constitute a common magnitude metric shared by multiple domains. Here we demonstrate the existence of a cross-modal semantic distance effect between symbolic and tactile numerosities. Participants received tactile stimulation of different numbers of fingers while reading Arabic digits and indicated verbally whether the number of stimulated fingers differed from the simultaneously presented digit. The larger the semantic distance between the two numerosities, the faster and more accurately participants made their judgments. This cross-modal numerosity distance effect suggests a direct connection between tactile sensations and the concept of numerical magnitude. A second experiment replicated the interaction between symbolic and tactile numerosities and showed that this effect is not modulated by the participants' finger counting habits. Taken together, our data provide novel evidence for a shared metric for symbolic and tactile numerosities as an instance of an embodied representation of numbers.

**Keywords: number cognition, tactile perception, finger counting**

## **INTRODUCTION**

It has been argued that numbers are cognitively represented in an approximate and analog manner (e.g., Dehaene et al., 1993). The main evidence for this notion comes from the so-called *numerical distance effect* (Moyer and Landauer, 1967). When participants are asked to perform a magnitude judgment (i.e., compare two numbers by their semantic size), responses are slower when the semantic distance between the two numbers is small (e.g., 2 vs. 3) than when it is large (e.g., 1 vs. 4; Gallistel and Gelman, 1992). This effect of numerical distance has consistently been explained by a representational overlap of neighboring numbers on a hypothetical analog mental continuum of numerical magnitudes (e.g., Restle, 1970; Dehaene and Changeux, 1993). That is, a particular number activates not only its own representation but also the representations of the numbers next to it. Consequently, the further apart two numbers are, the less they activate each other and the easier it is to discriminate between them. Support for this idea is also provided by studies on human and non-human cortical activations in response to numerosity information that demonstrated the existence of number-sensitive neurons with overlapping tuning curves in macaque (Nieder and Miller, 2003) as well as in the human parietal cortex (Piazza et al., 2004). Although the existence of an analog representation in humans and animals is well established, the origin and nature of this specific semantic representation of magnitude information remain controversial (see e.g., Cohen Kadosh and Walsh, 2009).
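The representational-overlap account can be illustrated with a minimal sketch. It assumes, purely for illustration, that each number activates a Gaussian tuning curve of equal width on a linear magnitude axis; the function name `tuning_overlap` and the width parameter `sigma` are hypothetical and are not taken from the studies cited above.

```python
import math


def tuning_overlap(a, b, sigma=1.0):
    """Relative overlap of two equal-width Gaussian tuning curves.

    The curves are centred on the magnitudes a and b. The value is the
    integral of the product of the two curves, divided by the same
    integral for identical centres, so it equals 1.0 when a == b.
    sigma is an illustrative assumption, not an empirical estimate.
    """
    distance = a - b
    return math.exp(-(distance ** 2) / (4.0 * sigma ** 2))


# Neighbouring numbers overlap more than distant ones.
near = tuning_overlap(2, 3)
far = tuning_overlap(1, 4)
```

Under these assumptions the overlap falls monotonically with distance, so 2 vs. 3 should be harder to discriminate than 1 vs. 4, which is the qualitative pattern the distance effect describes.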

In modern psycholinguistic research, several authors have emphasized the idea of embodied cognition (Wilson, 2002), which basically holds that each semantic representation is grounded in previous sensorimotor experiences and therefore closely linked to low-level perceptual and motor codes (Glenberg and Kaschak, 2002; Barsalou, 2008; Fischer and Zwaan, 2008). Interestingly, the role of embodied representations has also recently been discussed in the context of number processing. For instance, recent research has shown that the perception of abstract numerical stimuli has a direct influence on response selection (Daar and Pratt, 2008) as well as movement generation (Vicario, 2012), demonstrating a close link between numerical concepts and action. It has been speculated that numerical magnitude information becomes meaningful only when it can be mapped to concrete bodily experiences with size and magnitude in everyday life (Andres et al., 2008; Lindemann et al., 2009). A similarly important role of size-related sensorimotor representations for numbers has been suggested by a theory of magnitude representations proposed by Walsh (2003), which assumes the existence of a shared generalized representation of magnitude. That is, numbers are thought to be processed by a single system that simultaneously codes size-related information from other cognitive domains, for instance sensory and motor representations of physical size or temporal duration. Evidence for this notion comes from behavioral studies showing interference between numbers and other types of magnitude information, such as the physical size of number symbols (Tzelgov et al., 1992), perceived time (Oliveri et al., 2008), the perceived size of an object (Badets et al., 2007), and the aperture size during object grasping (Lindemann et al., 2007).

Another observation often interpreted as evidence for an embodied representation of numbers is the strong association between fingers and numbers in most adults. This association probably results from the habit of using fingers while counting (Lindemann et al., 2011; Bender and Beller, 2012; Moeller et al., 2012). For instance, in Italian adults this association has been demonstrated by a facilitation in responding to numbers 1 to 5 with the fingers of the right hand, and to numbers 6 to 10 with the fingers of the left hand (a mapping congruent with the prototypical finger counting strategy of the participants; Di Luca et al., 2006), as well as by a facilitation in judging whether a number is smaller or larger than five when primed with a finger configuration compatible with the individual's counting strategy (Di Luca and Pesenti, 2008). In addition, we know that the preference to start counting with the left or with the right hand varies strongly between individuals and is independent of handedness (Lindemann et al., 2011). Interestingly, manual counting habits, such as individual finger-number associations and starting preferences, have been shown to affect symbolic number processing in adults, even if the use of fingers is not required (Fischer, 2008; Domahs et al., 2010). For instance, Fischer (2008) showed that the association of numbers with a spatial response (SNARC effect; Dehaene et al., 1993) was strongly affected by whether participants started to count on their right or left hand. A SNARC effect could be observed only for participants who started counting on their left hand. In another study, Domahs et al. (2010) investigated finger-based sub-base-five effects in an Arabic number comparison task in three different groups – German deaf signers, German hearing adults, and Chinese hearing adults. Their results revealed that sub-base-five effects were larger in the two German groups, which use a sub-base-five finger counting system, than in the Chinese group, which uses a sub-base-10 finger counting system. Taken together, these studies speak for an important role of finger representations in the processing of symbolic numerical information.

While an increasing number of studies have investigated the cognitive effects of finger-number associations, only a few studies to date have examined tactile or haptic numerosity processing as such. As we know from recent experiments on tactile and haptic perception (Riggs et al., 2006; Plaisier et al., 2009; Plaisier and Smeets, 2011), tactile numerosity perception seems to be based on the same distinct cognitive processes as the enumeration of visual items (Atkinson et al., 1976). For instance, Riggs et al. (2006) stimulated the fingertips of their participants and asked them to name the number of stimulated fingers. The authors found that judgments were based on serial counting processes if more than three fingers were stimulated, since enumeration became more error-prone and slower with increasing set-size. In contrast, for small numerosities (i.e., fewer than four fingers) tactile enumeration was fast, effortless, and highly accurate (Riggs et al., 2006; Plaisier and Smeets, 2011; but see also Gallace et al., 2008) – a phenomenon well known from vision research and called "subitizing" (Kaufman et al., 1949). Recently, support for subitizing has also been demonstrated for active touch and the haptic exploration of the number of objects in the hand (Plaisier et al., 2009; Plaisier and Smeets, 2011). That is, there is increasing evidence that numerosity perception in the tactile and in the visual modality share the same processes. These findings suggest that all sensory numerosity information is represented by the same modality-independent magnitude system.

Taking into account the embodied view on cognition (e.g., Wilson, 2002; Barsalou, 2008) and the idea of a single generalized metric for magnitudes (Walsh, 2003), one might speculate that tactile numerosity processing is based on the very same analog magnitude representation that is activated when reading symbolic numbers or solving arithmetic problems. Surprisingly, however, very little is known about the relationship and the commonalities between tactile and symbolic numbers. We assumed that tactile numerosity judgments are based on the same analog representations as those involved in symbolic number processing, irrespective of differences in format and modality. To examine this hypothesis, we made use of the numerical distance effect (Moyer and Landauer, 1967). We conducted two experiments in which participants received tactile stimulation on the fingers of their left or right hand while reading an Arabic digit. The participants' task was to indicate as fast as possible whether the visually presented number matched the number of stimulated fingers. If both tactile and symbolic numerosities are indeed mapped onto the same analog magnitude metric, we expected to observe a cross-modal numerosity distance effect, reflected by an inverse linear relation between the judgment latencies in the comparison task and the semantic distance between the to-be-compared numerosities. Crucially, we used a same-different task, and not a magnitude comparison task. That is, if, alternatively, symbolic and tactile numerosities activate different analog magnitude representations, or same-different comparisons take place on verbal codes, a modulation of the response latencies as a function of the semantic distance is not expected (cf. Van Opstal and Verguts, 2011; Defever et al., 2012).

Moreover, if the acquired associations between fingers and numbers modulate adults' processing of symbolic numerosity information, one might expect that counting habits also affect the enumeration or perception of numbers in the subitizing range. We therefore additionally aimed to explore the influence of finger counting habits on the tactile perception of numerosities and their comparison with symbolic numbers. To do so, we used an adapted version of the finger counting questionnaire of Lindemann et al. (2011) to classify the starting preference of our participants and tested whether detection times or cross-modal numerosity distance effects are modulated by these habits.

## **EXPERIMENT 1**

The aim of the first experiment was to investigate whether tactile numerosities are mapped to the same analog representation of numerical magnitude as symbolic numerosities, as expected under the notion of a generalized magnitude system (Walsh, 2003). Participants had to indicate verbally whether tactilely presented numerosities were identical to or different from visually presented Arabic digits. We expected to find a cross-modal semantic distance effect in the numerosity judgments, reflected by longer response times when comparing tactile and symbolic numerosities that are close in distance. Furthermore, if finger counting habits affect this analog representation of numerical magnitude, both starting hand preferences and specific finger preferences should modulate a cross-modal numerosity distance effect.

## **METHOD**

#### **Participants**

Twenty-four students (five male, two left-handed) between 17 and 33 years of age (mean = 21.33, SD = 3.61) participated in the study in return for €5 or credit points. All of them reported having normal or corrected-to-normal vision.

#### **Setup**

Participants were seated in front of a table with a computer screen (viewing distance approximately 60 cm) and two custom-made tactile stimulation devices (one for each hand; see also van Ede et al., 2010), each consisting of five piezoelectric Braille cells (Metec AG, Stuttgart, Germany). Each Braille cell had eight pins, arranged in two groups of four, which could be raised and lowered by about 1 mm. The tactile stimulation devices were each placed into a wooden, sound-shielded box on the table in front of the participant, such that he or she could neither see the hands being stimulated nor hear mechanical noise from the stimulation. The orientation of the tactile stimulation devices within the boxes was such that participants could place their hands in a comfortable, horizontally oriented resting position. A dynamic microphone and a custom-made voice-key device were used to record voice onsets. The experiment was controlled using custom-made software. The experimenter was seated out of the participants' view at a second table and used a keyboard to enter which verbal response was given.

#### **Material**

Visual target stimuli comprised the digits "1," "2," "3," and "4" presented in a light gray color on a dark background. Tactile target stimuli consisted of the simultaneous stimulation of one to four fingers of either the left or right hand. To examine the impact of the counting habits, one to four successive fingers were always stimulated, starting with either the thumb or the pinkie. That is, there were in total eight patterns of stimulation for each hand: four medial finger sets starting with the thumb (1 = [Thumb], 2 = [Thumb, Index Finger], 3 = [Thumb, Index Finger, Middle Finger], 4 = [Thumb, Index Finger, Middle Finger, Ring Finger]) and four lateral finger sets starting with the pinkie (1 = [Pinkie], 2 = [Pinkie, Ring Finger], 3 = [Pinkie, Ring Finger, Middle Finger], 4 = [Pinkie, Ring Finger, Middle Finger, Index Finger]). Depending on the reported finger counting preferences, each stimulation pattern could be classified as either finger counting compatible or incompatible.
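The eight stimulation patterns per hand can be enumerated with a short sketch. The names `FINGERS`, `stimulation_pattern`, and `all_patterns` are hypothetical illustrations and do not come from the original experiment software, which was custom-made.

```python
# Finger order from medial (thumb) to lateral (pinkie).
FINGERS = ["thumb", "index", "middle", "ring", "pinkie"]


def stimulation_pattern(numerosity, finger_set):
    """Return the fingers stimulated for a given numerosity (1 to 4).

    finger_set is "medial" (starting with the thumb) or "lateral"
    (starting with the pinkie).
    """
    if numerosity not in (1, 2, 3, 4):
        raise ValueError("numerosity must be between 1 and 4")
    if finger_set == "medial":
        return FINGERS[:numerosity]
    if finger_set == "lateral":
        return FINGERS[::-1][:numerosity]
    raise ValueError("finger_set must be 'medial' or 'lateral'")


def all_patterns():
    """All eight patterns for one hand, keyed by (finger_set, numerosity)."""
    return {
        (finger_set, n): stimulation_pattern(n, finger_set)
        for finger_set in ("medial", "lateral")
        for n in range(1, 5)
    }
```

For a participant who starts counting with the thumb, the medial patterns would be the counting habit compatible ones; for a participant who starts with the pinkie, the assignment would be reversed.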

Individual finger counting habits and starting preference of each participant were determined by a finger counting questionnaire (Lindemann et al., 2011).

#### **Procedure**

Each trial began with the presentation of a fixation cross for 500 ms, followed by the simultaneous onset of the visual and tactile target stimuli. Tactile stimulation consisted of a repeated switching between raised (20 ms) and lowered (30 ms) states of all pins. Participants were instructed to decide whether the number of stimulated fingers was equal to or different from the numerical size of the visually presented digit. Responses were given verbally by uttering "Tee" (when the numerosities were identical) or "Toh" (when the numerosities were different). Since voice onset times served as decision time measures, we decided to use verbal utterances for which the first transient is phonologically the same. The target stimuli (tactile and visual) disappeared as soon as a verbal response was given, and a blank screen was presented for a variable time between 1000 and 1500 ms. No feedback was given for erroneous responses. The next trial started after the experimenter had classified the given response.

#### **Design**

The experiment consisted of four blocks. Each block contained 128 trials (two repetitions of all possible combinations of eight stimulation patterns on two hands and four visually presented digits). All trials were presented in randomized order. The duration of the experiment was approximately 30 min.
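The block structure described above can be sketched as follows. This is an illustrative reconstruction, not the original software: the function name `build_block`, the dictionary keys, and the choice to shuffle within each block are assumptions. Semantic distance is taken to be the absolute difference between the digit and the number of stimulated fingers.

```python
import itertools
import random


def build_block(seed=None):
    """Build one randomized block of 128 trials.

    Crosses 2 finger sets x 4 numerosities (the eight stimulation
    patterns) with 2 hands and 4 digits, repeated twice.
    """
    finger_sets = ("medial", "lateral")
    numerosities = (1, 2, 3, 4)
    hands = ("left", "right")
    digits = (1, 2, 3, 4)
    repetitions = 2

    trials = [
        {
            "finger_set": finger_set,
            "numerosity": numerosity,
            "hand": hand,
            "digit": digit,
            # Semantic distance between tactile and symbolic numerosity.
            "distance": abs(digit - numerosity),
        }
        for finger_set, numerosity, hand, digit in itertools.product(
            finger_sets, numerosities, hands, digits
        )
        for _ in range(repetitions)
    ]
    # Assumption: trial order is shuffled within the block.
    random.Random(seed).shuffle(trials)
    return trials
```

Under this reading of the design, a full crossing yields unequal numbers of trials per distance in each block (32, 48, 32, and 16 trials for distances 0 to 3), so "different" trials outnumber "same" trials three to one.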

## **RESULTS**

#### **Finger counting preferences**

The analysis of the finger counting questionnaire showed that 58.3% of the participants preferred to start counting with their left and 41.7% with their right hand. Twenty-one participants reported a unimanual counting pattern typical of Western subjects, starting with the thumb. One participant reported a counting pattern that could not be classified according to existing categories of starting hand and preferred finger sequence (cf. Lindemann et al., 2011) and therefore had to be excluded from the analysis. The other two participants started counting with the pinkie and counted in successive order to the thumb. The reported finger counting pattern was used to classify the stimulated set of fingers into counting habit compatible and incompatible sets for all participants. That is, for 21 participants the medial fingers (thumb, index finger, middle finger, and ring finger) were classified as counting habit compatible and the lateral fingers (pinkie, ring finger, middle finger, and index finger) as counting habit incompatible, while for two participants the lateral fingers were classified as counting habit compatible and the medial fingers as counting habit incompatible.

#### **Numerosity comparisons**

Responses that deviated more than three SD from the mean response time of each participant (anticipatory responses: 0.04%; slow responses: 1.63%) were excluded from further analysis. Erroneous responses occurred in 7.37% of all remaining trials and were excluded from the response time analysis.
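The exclusion criterion can be sketched as below. The text does not state whether the sample or the population standard deviation was used, nor whether trimming was applied before or after removing errors; the sketch assumes the sample SD computed over all response times of one participant, and the function name `trim_outliers` is hypothetical.

```python
import statistics


def trim_outliers(response_times, criterion=3.0):
    """Split one participant's response times by a mean +/- 3 SD rule.

    Returns (kept, too_fast, too_slow). Uses the sample standard
    deviation, which is an assumption; at least two values are needed.
    """
    mean = statistics.mean(response_times)
    sd = statistics.stdev(response_times)
    lower = mean - criterion * sd
    upper = mean + criterion * sd
    kept = [rt for rt in response_times if lower <= rt <= upper]
    too_fast = [rt for rt in response_times if rt < lower]
    too_slow = [rt for rt in response_times if rt > upper]
    return kept, too_fast, too_slow
```

Because the criterion is computed per participant, the same absolute response time can be an outlier for one participant and not for another.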

Median response times and errors were each entered in a separate repeated measures ANOVA with the within-subject factors Semantic Distance (0, 1, 2, 3), Set of Fingers (counting habit compatible, counting habit incompatible), Hand (left, right), and the between-subject factor Starting Hand (left, right). Reported degrees of freedom for the *F* statistics were Huynh–Feldt corrected, when necessary.

In line with our hypothesis, the reaction time analysis revealed a significant main effect of Semantic Distance, *F*(1.84, 38.73) = 25.03, *p* < 0.001, $\eta_p^2$ = 0.54, showing an interaction of tactile and symbolic numerosities. Responses were faster for a numerical distance between tactile and symbolic numerosity of 3 compared to a distance of 2, *t*(22) = −4.21, *p* < 0.001, as well as for a distance of 2 compared to a distance of 1, *t*(22) = −6.06, *p* < 0.001. There was no significant difference between a distance of 1 and a distance of 0 (same numerosity in both modalities), *t*(22) = 1.52, *p* = 0.14 (see **Figure 1**). There was a significant main effect of Set of Fingers, *F*(1, 21) = 11.55, *p* < 0.01, $\eta_p^2$ = 0.36, reflecting shorter reaction times when stimulating the counting habit compatible fingers (i.e., for most participants starting from thumb to pinkie; 1131 ms), compared to the counting habit incompatible fingers (i.e., for most participants starting from pinkie to thumb; 1185 ms). The main effect of Hand did not reach significance, *F*(1, 21) = 0.41, *p* = 0.53, $\eta_p^2$ = 0.02. Interestingly, the factors Semantic Distance and Set of Fingers interacted significantly, *F*(3, 63) = 8.80, *p* < 0.001, $\eta_p^2$ = 0.30. *Post hoc F*-tests showed a stronger effect of Semantic Distance for the counting habit incompatible finger stimulations, *F*(2.20, 46.09) = 26.25, *p* < 0.001, $\eta_p^2$ = 0.56, than for the compatible stimulations, *F*(2.01, 42.17) = 15.09, *p* < 0.001, $\eta_p^2$ = 0.42. No significant effects were observed for the interactions Semantic Distance × Hand, *F*(2.41, 50.61) = 2.41, *p* = 0.09, $\eta_p^2$ = 0.10, Semantic Distance × Starting Hand, *F*(3, 63) = 0.02, *p* = 1.0, $\eta_p^2$ = 0.001, Set of Fingers × Hand, *F*(1, 21) = 0.04, *p* = 0.84, $\eta_p^2$ = 0.002, Set of Fingers × Starting Hand, *F*(3, 63) = 0.20, *p* = 0.66, $\eta_p^2$ = 0.01, Hand × Starting Hand, *F*(1, 21) = 1.08, *p* = 0.31, $\eta_p^2$ = 0.05, Semantic Distance × Set of Fingers × Hand, *F*(3, 63) = 0.61, *p* = 0.61, $\eta_p^2$ = 0.03, Semantic Distance × Set of Fingers × Starting Hand, *F*(3, 63) = 0.32, *p* = 0.81, $\eta_p^2$ = 0.02, Semantic Distance × Hand × Starting Hand, *F*(3, 63) = 1.90, *p* = 0.14, $\eta_p^2$ = 0.08, Set of Fingers × Hand × Starting Hand, *F*(1, 21) = 0.06, *p* = 0.81, $\eta_p^2$ = 0.003, and Semantic Distance × Set of Fingers × Hand × Starting Hand, *F*(3, 63) = 0.06, *p* = 0.98, $\eta_p^2$ = 0.003.
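For readers who want to relate the reported effect sizes to the *F* statistics, partial eta squared can be recovered from an *F* value and its degrees of freedom through the standard identity (SS_effect / (SS_effect + SS_error) = *F* · df_effect / (*F* · df_effect + df_error)). The helper below is a minimal sketch with a hypothetical name; small deviations from the reported values are expected because the published *F* values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    weighted_f = f_value * df_effect
    return weighted_f / (weighted_f + df_error)


# Main effect of Semantic Distance on response times in Experiment 1.
distance_effect = partial_eta_squared(25.03, 1.84, 38.73)
# Semantic Distance x Set of Fingers interaction in Experiment 1.
interaction_effect = partial_eta_squared(8.80, 3, 63)
```

Rounded to two decimals, these two values agree with the effect sizes of 0.54 and 0.30 reported for the corresponding tests.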

The error analysis also revealed a significant main effect of Semantic Distance, *F*(2, 42) = 21.02, *p* < 0.001, $\eta_p^2$ = 0.50. That is, participants made fewer errors for a numerical distance between tactile and symbolic numerosity of 3 compared to a distance of 1, *t*(22) = −4.58, *p* < 0.001, as well as for a distance of 2 compared to a distance of 1, *t*(22) = −5.10, *p* < 0.001. There was no significant difference between a distance of 1 and a distance of 0, *t*(22) = 0.76, *p* = 0.46. No main effects were observed for the factors Set of Fingers, *F*(1, 21) = 0.02, *p* = 0.89, $\eta_p^2$ = 0.001, and Hand, *F*(1, 21) = 1.86, *p* = 0.19, $\eta_p^2$ = 0.08. There were no significant effects for the interactions Semantic Distance × Set of Fingers, *F*(1.52, 32.14) = 0.39, *p* = 0.63, $\eta_p^2$ = 0.02, Semantic Distance × Hand, *F*(2.27, 47.66) = 1.94, *p* = 0.15, $\eta_p^2$ = 0.09, Semantic Distance × Starting Hand, *F*(3, 63) = 0.44, *p* = 0.73, $\eta_p^2$ = 0.02, Set of Fingers × Hand, *F*(1, 21) = 0.16, *p* = 0.69, $\eta_p^2$ = 0.008, Set of Fingers × Starting Hand, *F*(1, 21) = 0.22, *p* = 0.65, $\eta_p^2$ = 0.01, Hand × Starting Hand, *F*(1, 21) = 0.11, *p* = 0.74, $\eta_p^2$ = 0.005, Semantic Distance × Set of Fingers × Hand, *F*(1.87, 39.32) = 0.28, *p* = 0.75, $\eta_p^2$ = 0.01, Semantic Distance × Set of Fingers × Starting Hand, *F*(3, 63) = 0.42, *p* = 0.74, $\eta_p^2$ = 0.02, Semantic Distance × Hand × Starting Hand, *F*(3, 63) = 1.71, *p* = 0.17, $\eta_p^2$ = 0.08, Set of Fingers × Hand × Starting Hand, *F*(1, 21) = 3.70, *p* = 0.07, $\eta_p^2$ = 0.15, and Semantic Distance × Set of Fingers × Hand × Starting Hand, *F*(3, 63) = 1.82, *p* = 0.15, $\eta_p^2$ = 0.08.

## **DISCUSSION**

As hypothesized, we found a cross-modal numerosity distance effect in the numerosity comparison task when participants were instructed to compare tactilely presented numerosities with symbolically presented numerosities. That is, participants judged the difference between tactile and symbolic numerosities faster and with fewer errors as the semantic distance between the two numerosities increased. This finding suggests that tactile numerosities are mapped to the same analog representation of magnitude as symbolic numerosities.

While starting preferences did not modulate the cross-modal distance effect, it was modulated by the set of fingers stimulated. Interestingly, the effect was stronger for counting habit incompatible finger sets than for counting habit compatible finger sets. This appears counter-intuitive, as one would have expected the exact opposite pattern if finger representations were connected to an analog numerical magnitude representation, that is, a stronger effect for counting habit compatible finger sets. Furthermore, it should be noted that the vast majority of our subjects showed a prototypical Western finger counting habit (Lindemann et al., 2011) and started counting with the medial fingers from thumb to pinkie. Consequently, the dissociation between counting habit compatible and incompatible finger sets coincides in the present study with the dissociation between medial (i.e., starting from thumb) and lateral fingers (i.e., starting from pinkie), which is a problematic confound for the interpretation of our findings. It therefore remains unclear whether the differences between the stimulated finger sets and the modulation of the numerosity distance effect were driven by differences in the finger counting preferences or whether they merely reflected differences in hand physiology between the medial and lateral finger sets and the resulting differences in touch acuity and cortical representation (cf. Elbert et al., 1995). To be more precise, a more developed cortical representation of the medial fingers could account for a faster and more precise detection of a tactile stimulation of these fingers, compared to the lateral fingers with a less developed cortical representation. To specifically investigate the influence of finger counting habits in our setting, independent of such physiological differences, we conducted a second experiment in which the same set of fingers was sequentially stimulated. Importantly, the type of sequence and direction of the tactile stimulations, not the set of fingers, defined the compatibility with finger counting habits.

## **EXPERIMENT 2**

The second experiment tested a potential influence of finger counting habits on the detection and representation of tactile numerosities. Since it cannot be excluded that the effect of the set of fingers in Experiment 1 was driven by physiological differences, Experiment 2 aimed to introduce finger counting compatible and incompatible tactile numerosities while keeping the set of stimulated fingers constant. This was achieved by sequential stimulations in two different directions, either starting from the thumb or starting from the ring finger. If finger counting habits influence the analog representation of numerical magnitude, participants who start counting with the thumb are expected to show a different cross-modal numerosity distance effect if the sequence of stimulation is not compatible with their direction of counting.

## **METHOD**

#### **Participants**

Twenty-eight students (eight male, one left-handed) between 18 and 25 years of age (mean = 20.07, SD = 2.37) participated in the study in return for €5 or credit points. None of them had participated in Experiment 1. All of them reported having normal or corrected-to-normal vision.

#### **Setup and material**

The setup and material were identical to that of Experiment 1. The experiment was controlled using the software *Expyriment* (Krause and Lindemann, 2012). Participants were asked to indicate starting preference and specific finger counting habits (cf. Lindemann et al., 2011).

#### **Procedure and design**

The procedure and design were similar to Experiment 1, with two exceptions. First, tactile stimuli consisted of a stimulation of one to four fingers (1 = [Thumb] to 4 = [Thumb, Index Finger, Middle Finger, Ring Finger]). Crucially, all fingers were sequentially stimulated in two different directions: a forward direction, starting from the thumb, and a backward direction, starting from the last finger and ending with the thumb. Second, the onset of the visual stimulus coincided with the offset of the tactile stimulation. This was done to ensure that response times were not confounded with differences in sequence length (e.g., when seeing the digit 1, a response could already be given after one finger is stimulated, while when seeing a 4, one would need to wait until all four fingers have been stimulated). Tactile stimulation always started with the stimulation of a single finger. After each 100 ms the next finger in the sequence was added to the stimulation. When all fingers had been added, the stimulation continued on all fingers until a total stimulation time of 600 ms was reached.
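The timing of the sequential stimulation can be made concrete with a small sketch. It assumes that "backward" means starting from the last finger of the stimulated set (e.g., the middle finger for a numerosity of 3) and that all fingers stop together at 600 ms; the function name `stimulation_schedule` and its parameters are hypothetical and not part of the Expyriment script used in the study.

```python
# Fingers available for stimulation in Experiment 2, thumb first.
FINGERS = ["thumb", "index", "middle", "ring"]


def stimulation_schedule(numerosity, direction, step_ms=100, total_ms=600):
    """Return (finger, onset_ms, offset_ms) tuples for one tactile stimulus.

    A new finger is added every step_ms; once added, a finger stays
    active until total_ms, when the visual digit would appear.
    """
    if not 1 <= numerosity <= len(FINGERS):
        raise ValueError("numerosity must be between 1 and 4")
    fingers = FINGERS[:numerosity]
    if direction == "backward":
        # Assumption: start from the last finger of the set, end with the thumb.
        fingers = fingers[::-1]
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'backward'")
    return [
        (finger, index * step_ms, total_ms)
        for index, finger in enumerate(fingers)
    ]
```

With these values the last finger of a four-finger sequence starts at 300 ms, so every stimulus, regardless of numerosity, ends at the same moment and the digit onset is equally timed across conditions.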

## **RESULTS**

#### **Finger counting preferences**

The analysis of finger counting habits showed that 57.1% of the participants preferred to start counting with their left and 42.9% with their right hand. Crucially, all participants reported starting to count with the thumb.

#### **Numerosity comparisons**

Erroneous responses (8.46%) as well as responses that deviated more than three SD from individual mean response times (only fast responses: 0.22%) were excluded from the response time analysis of the numerosity comparisons. Since all investigated participants started counting with the thumb, tactile stimulations in forward direction could be considered as finger counting compatible and backward stimulations as finger counting incompatible. Errors and median response times were each entered into a separate repeated measures ANOVA with the within-subject factors Semantic Distance (0, 1, 2, 3), Direction of Stimulation (finger counting compatible, finger counting incompatible), Hand (left, right), and the between-subject factor Starting Hand (left, right). Reported degrees of freedom for the *F* statistics were Huynh–Feldt corrected when necessary.

As in Experiment 1, the response time analysis revealed a significant main effect of Semantic Distance, *F*(2.01, 52.21) = 8.09, *p* < 0.001, $\eta_p^2$ = 0.24, confirming our main hypothesis of an interaction of tactile and symbolic numerosities. That is, responses were faster for a numerical distance between a tactile and symbolic stimulus of 3 compared to a distance of 1, *t*(27) = −4.73, *p* < 0.001, as well as for a distance of 2 compared to a distance of 1, *t*(27) = −4.36, *p* < 0.001. There was no significant difference between a distance of 1 and a distance of 0 (same numerosity in both modalities), *t*(27) = −0.29, *p* = 0.77 (see also **Figure 2**). There was only a trend for an effect of Direction of Stimulation, *F*(1, 26) = 4.10, *p* = 0.053, $\eta_p^2$ = 0.14, with descriptively slightly shorter reaction times for finger counting compatible stimulation sequences (732 ms) compared to incompatible stimulations (739 ms). That is, in contrast to Experiment 1, we did not observe a reliable advantage of finger counting compatible stimulations. No main effect of the factor Hand was observed, *F*(1, 26) = 0.15, *p* = 0.70, $\eta_p^2$ = 0.01. Importantly, there was no interaction between the factors Semantic Distance and Direction of Stimulation, *F*(3, 78) = 0.12, *p* = 0.95, $\eta_p^2$ = 0.01, showing that, unlike in Experiment 1, the numerical distance effect was not modulated by finger counting compatibility. No significant effects were observed for the interactions Semantic Distance × Hand, *F*(3, 78) = 1.23, *p* = 0.31, $\eta_p^2$ = 0.05, Semantic Distance × Starting Hand, *F*(3, 78) = 0.75, *p* = 0.52, $\eta_p^2$ = 0.03, Direction of Stimulation × Hand, *F*(1, 26) = 0.12, *p* = 0.73, $\eta_p^2$ = 0.01, Direction of Stimulation × Starting Hand, *F*(1, 26) = 0.16, *p* = 0.69, $\eta_p^2$ = 0.01, Hand × Starting Hand, *F*(1, 26) = 0.12, *p* = 0.73, $\eta_p^2$ = 0.01, Semantic Distance × Direction of Stimulation × Hand, *F*(3, 78) = 0.83, *p* = 0.48, $\eta_p^2$ = 0.03, Semantic Distance × Direction of Stimulation × Starting Hand, *F*(3, 78) = 1.07, *p* = 0.37, $\eta_p^2$ = 0.04, Semantic Distance × Hand × Starting Hand, *F*(3, 78) = 0.85, *p* = 0.47, $\eta_p^2$ = 0.03, Direction of Stimulation × Hand × Starting Hand, *F*(1, 26) = 3.35, *p* = 0.08, $\eta_p^2$ = 0.11, and Semantic Distance × Direction of Stimulation × Hand × Starting Hand, *F*(3, 78) = 2.30, *p* = 0.08, $\eta_p^2$ = 0.08.

The error analysis revealed a significant main effect of Semantic Distance, *F*(2.28, 59.17) = 30.45, *p* < 0.001, $\eta_p^2$ = 0.54, with fewer errors for a distance of 3 compared to a distance of 1, *t*(27) = −4.07, *p* < 0.001, as well as a distance of 2 compared to a distance of 1, *t*(27) = −4.27, *p* < 0.001. The difference between a distance of 1 and a distance of 0 was significant as well, *t*(27) = −2.93, *p* < 0.01. No significant main effects were observed for the factors Direction of Stimulation, *F*(1, 26) = 1.58, *p* = 0.22, $\eta_p^2$ = 0.06, and Hand, *F*(1, 26) = 0.44, *p* = 0.51, $\eta_p^2$ = 0.02. The 4-way interaction Semantic Distance × Direction of Stimulation × Hand × Starting Hand was significant, *F*(3, 78) = 3.28, *p* < 0.05, $\eta_p^2$ = 0.11. Since our hypotheses are independent from this observed 4-way interaction between all factors, we did not further analyze and interpret this complex effect. There were no significant effects for the interactions Semantic Distance × Direction of Stimulation, *F*(2.27, 59.07) = 1.39, *p* = 0.28, $\eta_p^2$ = 0.05, Semantic Distance × Hand, *F*(2.26, 68.19) = 1.07, *p* = 0.36, $\eta_p^2$ = 0.04, Semantic Distance × Starting Hand, *F*(3, 78) = 0.13, *p* = 0.94, $\eta_p^2$ = 0.01, Direction of Stimulation × Hand, *F*(1, 26) = 1.81, *p* = 0.19, $\eta_p^2$ = 0.07, Direction of Stimulation × Starting Hand, *F*(1, 26) = 0.74, *p* = 0.40, $\eta_p^2$ = 0.03, Hand × Starting Hand, *F*(1, 26) = 0.34, *p* = 0.56, $\eta_p^2$ = 0.01, Semantic Distance × Direction of Stimulation × Hand, *F*(2.43, 63.12) = 2.09, *p* = 0.12, $\eta_p^2$ = 0.07, Semantic Distance × Direction of Stimulation × Starting Hand, *F*(3, 78) = 0.62, *p* = 0.61, $\eta_p^2$ = 0.02, Semantic Distance × Hand × Starting Hand, *F*(3, 78) = 0.07, *p* = 0.98, $\eta_p^2$ = 0.003, and Direction of Stimulation × Hand × Starting Hand, *F*(1, 26) = 0.39, *p* = 0.54, $\eta_p^2$ = 0.02.
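The partial eta squared values reported with each *F* statistic can be recovered from the *F* value and its degrees of freedom, which offers a quick consistency check on the effect sizes above. The sketch below uses the standard identity partial eta squared = (*F* × df_effect) / (*F* × df_effect + df_error); the function name is hypothetical and the code is an illustration, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of Semantic Distance on response times: F(2.01, 52.21) = 8.09
print(round(partial_eta_squared(8.09, 2.01, 52.21), 2))   # 0.24
# Main effect of Semantic Distance on errors: F(2.28, 59.17) = 30.45
print(round(partial_eta_squared(30.45, 2.28, 59.17), 2))  # 0.54
```

Both results match the effect sizes reported for the two main effects of Semantic Distance.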

## **DISCUSSION**

Experiment 2 confirmed the finding of the cross-modal numerosity distance effect from Experiment 1. Again, the effect was present in both response times and error rates.

However, the cross-modal numerosity distance effect was not modulated by any finger counting preferences (Starting Hand or Direction of Stimulation), as would have been expected if counting habits influenced the analog representation of numerical magnitude. We interpret this as evidence that a common metric shared by the representation of tactile and symbolic numerosity information reflects a magnitude representation that is independent of finger representations and analog numerosity representations acquired while learning to count with the fingers.

In contrast to Experiment 1, in which finger counting compatibility led to faster responses, but was confounded with hand physiology, neither the stimulation direction nor the starting preference significantly influenced the perception of the tactile stimulus. While there was a trend for a main effect of Direction of Stimulation, no main effect of Starting Hand could be observed. Thus, while counting habits do not influence a shared magnitude representation, they might have a marginal influence on the perception of a tactile stimulus.

#### **GENERAL DISCUSSION**

The current study demonstrates an interference between fingers and numbers on the level of analog numerical magnitude representations. In two experiments we investigated the relation between tactile and symbolic numerosities, and the influence of finger counting habits thereon. Our data provide the first evidence for the existence of a cross-modal semantic distance effect in participants comparing tactilely presented numerosities with symbolically presented numerosities. More specifically, responses were faster and less error-prone when judging two distant numerosities (e.g., 1 and 4) than when judging two close-by numerosities (e.g., 1 and 2).

Importantly, all numerosities used in the current study were within the range of subitizing and are thus assumed to be perceived directly and accurately without relying on a serial counting process (Riggs et al., 2006; Plaisier and Smeets, 2011). We can therefore assume that our results (at least in Experiment 1, where the stimulation was non-sequential) are not mediated by verbally counting the stimulated fingers. Rather, since the numerical distance effect has been consistently interpreted to reflect a representational overlap between neighboring items on an analog continuum (Moyer and Landauer, 1967; Restle, 1970), our results suggest that tactilely presented numerosities were automatically mapped onto the same analog representation as symbolic numerosities. This interpretation receives further support from the fact that participants made a same-different judgment (and not a magnitude judgment), as it has been shown that the numerical distance effect resulting from a same-different judgment crucially depends on overlapping analog representations (Van Opstal and Verguts, 2011). Thus, the fact that we find a numerical distance effect allows us to exclude the possibility that the comparison of the tactile and symbolic numerosities was done by merely comparing verbal codes (since this would not have led to a distance effect). It is also unlikely that the magnitude representation of both numerosities was activated not directly but through a preceding verbal code, since it has been shown that even preschoolers use surface features of numerical stimuli instead of a magnitude representation to solve a same-different judgment, when available (Defever et al., 2012). This means that if a verbal code preceded a magnitude representation in our setting, judgments could have already been solved on this more direct verbal level, without the need for a more abstract representation of the numerical magnitude. Crucially, again, a same-different judgment on the basis of such verbal codes would not have led to a numerical distance effect. Taken together, the current finding suggests that tactile and symbolic numerosities share a common analog representation of numerical magnitude.
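The representational-overlap account of the distance effect can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each numerosity is represented as a Gaussian of equal width on a linear continuum; the width `sigma` and the function names are hypothetical and are not taken from the study or from the cited models. Under that assumption, the overlapping area of two representations is 2Φ(−d / 2σ), where d is the numerical distance and Φ is the standard normal cumulative distribution function.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def representational_overlap(a, b, sigma=1.0):
    """Overlapping area of two equal-width Gaussians centred on a and b."""
    distance = abs(a - b)
    return 2.0 * normal_cdf(-distance / (2.0 * sigma))

# Overlap between the numerosity 1 and the numerosities 1 to 4 (distances 0 to 3).
for distance in range(4):
    print(distance, round(representational_overlap(1, 1 + distance), 3))
```

In this toy model the overlap shrinks monotonically as the distance grows, so distant numerosities are easier to tell apart than close-by ones, which is the qualitative pattern the distance effect describes.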

The finding of a cross-modal numerosity distance effect is in line with the notion of a generalized magnitude system (Walsh, 2003), which hypothesizes that the brain processes general magnitude information according to a shared metric, independent of the domain this magnitude information comes from. In our study, magnitude information from two different modalities (tactile, visual) and with two different notations (symbolic, non-symbolic) had to be processed and compared. The observation that the judgment latencies and accuracies depended on the cross-modal numerosity distance suggests that both types of numerosity information were mapped onto the same analog magnitude representation, which was then utilized for the actual cognitive comparison. It should furthermore be mentioned that the current study focuses on the processing of small numerosities. It is therefore unclear whether visual and tactile numerosities also share cognitive codes for larger numbers. Taking into account the possibility that common representations are shaped while using the fingers to count, it is an important open question for further research whether these cross-domain associations are also present for numerosities larger than 10.

The conclusion that processing of sensory and symbolic numerosity information leads to an activation of common analog codes supports the idea of embodied numerosities. The embodied cognition view claims that abstract cognitive concepts are "grounded" in sensorimotor experiences (Barsalou, 2008). That is, the content of abstract concepts, like numbers, is assumed to become meaningful by being coupled to bodily representations (Lindemann et al., 2009). Here, the cross-modal semantic distance effect reveals a direct relationship between tactile and abstract numerosities and the presence of a magnitude metric shared by both modalities. Representations of sensory experiences about size and numerosity might in this way provide a grounding for the meaning of symbolic numbers and might therefore play a crucial role in the development of number concepts.

While we cannot entirely exclude that finger counting habits are responsible for the differences in the numerical distance effect between the sets of fingers found in Experiment 1, our data also do not provide any evidence for this. We observed a stronger numerical distance effect for the fingers which are not used to represent the numerosities during counting. However, if finger representations were indeed connected to an analog numerical magnitude representation, one would have expected the opposite, namely, a stronger numerical distance effect for those fingers compatible with this representation. Considering furthermore that no influence of finger counting habits on the numerical judgments could be found when the same set of fingers was stimulated in different sequential orders (Experiment 2), it seems very likely that physiological differences between the medial and lateral sets of fingers were responsible for the observed differences in the judgment latencies of Experiment 1.

In contrast to our study, some previous studies reported an influence of finger counting habits on the processing of symbolic numbers (e.g., Di Luca et al., 2006; Di Luca and Pesenti, 2008). The question arises therefore why finger counting habits did not affect the cross-modal numerosity comparison as investigated in the present paradigm. First, it is important to note that most of the existing literature demonstrated associations between finger patterns and numbers by means of a faster detection or stronger number activations for canonical finger patterns. These effects might be mediated by a perceptual familiarity of canonical finger patterns. While we observed a similar pattern of facilitation in Experiment 2 where stimulation sequences compatible with the participants' finger counting pattern were detected slightly faster and processed more fluently, this effect was, however, not statistically significant. Second, the current study is one of the first to investigate the influence of finger counting habits on an analog representation of numerical magnitude in the subitizing range. Following the literature on subitizing, this should have resulted in a very automatic and fast activation of the number concept (Kaufman et al., 1949). The absence of any influence of finger counting habits under these circumstances suggests that differently preferred patterns of fingers are not differently coupled to an analog representation of numerical magnitude. Typical finger counting patterns might instead constitute an additional independent numerical representation (see also Moeller et al., 2012 for a similar proposal) and represent verbally and perceptually mediated associations between postures and number meaning that are acquired while learning to count.

While the presence of cross-modal numerical distance effects supports the view of an embodied representation of numerical magnitude, we argue that the fact that this phenomenon is independent of acquired finger counting preferences shows that finger counting postures serve the function of motor symbols and probably reflect the individual's cognitive strategy to offload numerical information (Lindemann and Krause, 2012).

Taken together, the current study provides evidence for a shared metric for tactile and symbolic numerosities, as an instance of an embodied representation of numbers. Crucially, the underlying analog representation of numerical magnitude information appeared to be independent from finger representations.

## **REFERENCES**


canonical finger numeral configurations. *Exp. Brain Res.* 185, 27–39.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 September 2012; accepted: 04 January 2013; published online: 25 January 2013.*

*Citation: Krause F, Bekkering H and Lindemann O (2013) A feeling for numbers: shared metric for symbolic and tactile numerosities. Front. Psychology 4:7. doi: 10.3389/fpsyg.2013.00007*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Krause, Bekkering and Lindemann. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Perception of face and body expressions using electromyography, pupillometry and gaze measures

#### **Mariska E. Kret<sup>1</sup>, Jeroen J. Stekelenburg<sup>2</sup>, Karin Roelofs<sup>3</sup> and Beatrice de Gelder<sup>2,4</sup>\***

<sup>1</sup> Psychology Department, University of Amsterdam, Amsterdam, Netherlands

<sup>2</sup> Cognitive and Affective Neurosciences Laboratory, Tilburg University, Tilburg, Netherlands

<sup>3</sup> Radboud University Nijmegen, Behavioural Science Institute and Donders Institute for Brain Cognition and Behaviour, Nijmegen, Netherlands

<sup>4</sup> Psychology and Neuroscience Department, Maastricht University, Maastricht, Netherlands

#### **Edited by:**

Judith Holler, Max Planck Institute Psycholinguistics, Netherlands

#### **Reviewed by:**

Ute Schmid, University of Bamberg, Germany

Ursula Hess, Humboldt-University, Germany

#### **\*Correspondence:**

Beatrice de Gelder, Psychology and Neuroscience Department, Maastricht University, Oxfordlaan 55, 6229 ER Maastricht, Netherlands. e-mail: degelder@unimaas.nl

Traditional emotion theories stress the importance of the face in the expression of emotions, but bodily expressions are becoming increasingly important as well. In these experiments we tested the hypothesis that similar physiological responses can be evoked by observing emotional face and body signals and that the reaction to angry signals is amplified in anxious individuals. We designed three experiments in which participants categorized emotional expressions from isolated facial and bodily expressions and emotionally congruent and incongruent face-body compounds. Participants' fixations were measured and their pupil size recorded with eye-tracking equipment, and their facial reactions were measured with electromyography. The results support our prediction that the recognition of a facial expression is improved in the context of a matching posture and, importantly, vice versa as well. From their facial expressions, it appeared that observers reacted with signs of negative emotionality (increased corrugator activity) to angry and fearful facial expressions and with positive emotionality (increased zygomaticus activity) to happy facial expressions. As predicted, angry and fearful cues from the face or the body attracted more attention than happy cues. We further observed that responses evoked by angry cues were amplified in individuals with high anxiety scores. In sum, we show that people process bodily expressions of emotion in a similar fashion as facial expressions and that the congruency between the emotional signals from the face and body facilitates the recognition of the emotion.

**Keywords: facial expressions, emotional body language, scenes, pupil dilation, fixations, electromyography**

## **INTRODUCTION**

The communication of emotion includes recognizing signals of hostility or joy and reacting to signals of distress. Humans are especially sensitive to facial expressions and gestural signals of others, and use these signs to guide their own behavior. Previous research has largely focused on the perception of facial expressions (Haxby et al., 2000; Adolphs, 2002). But our ability to communicate also relies heavily on decoding messages provided by body postures (de Gelder et al., 2004, 2010; de Gelder, 2006; Kret et al., 2011c). The first goal of the current study is to test to what extent facial expressions are recognized and processed as a function of the accompanying body posture and vice versa. Second, research has shown that highly anxious individuals respond more strongly to facial expressions than those with a low anxiety level (MacLeod and Cohen, 1993; Amin et al., 1998; Miers et al., 2008). Our second goal is to test whether highly anxious people are also hyper-reactive to body postures.

Before we lay out our research questions, we start with an overview of how humans generally recognize and react to emotional expressions and describe similarities and differences between faces and bodies in terms of the mechanisms involved in emotion expression and perception. Finally, we describe individual differences in these mechanisms with a focus on anxiety.

The perception of bodily expressions is a relatively novel topic in affective neuroscience, a field dominated so far by investigations of facial expressions. But faces and bodies are equally salient and familiar in daily life and often convey the same information about identity, emotion, and gender. Moreover, emotions from both sources are usually very well recognized as shown in different validation studies. The recognition rate of angry, fearful, and happy emotions is especially high (for the NimStim facial expression set, these emotions were correctly recognized at 74.3% with nine response alternatives, and the body postures in the Bodily Expressive Action Stimulus Test (BEAST) set were correctly recognized at 92.5% with four response alternatives). Even when these stimuli are presented subliminally, recognition tends to be well above chance (Esteves et al., 1994; Dimberg et al., 2000; Stienen and de Gelder, 2011a,b).

In addition to facial expressions, bodily expressions give us information about the action tendency of the agent. Aggressive body postures can therefore be perceived as a more direct threat of physical harm than facial expressions (de Gelder et al., 2010). When we observe another individual, such as a friend expressing his or her anger toward us, different processes are initiated. First, *attention* is drawn toward the threat, especially toward our friend's face or eyes (Green et al., 2003; Lundqvist and Öhman, 2005; Fox and Damjanovic, 2006) and toward his or her body posture (Bannerman et al., 2009). Next, *we become aroused too*: our heart beat changes, we start sweating, and our pupils dilate (Bradley et al., 2008; Gilzenrat et al., 2010). Moreover, it is possible that the observed emotion will be reflected in our own facial expression (Dimberg, 1982).

The perception of facial expressions and body postures is interactive and context-dependent (faces: Righart and de Gelder, 2006; Kret and de Gelder, 2012a; bodies: Kret and de Gelder, 2010). Meeren et al. (2005) show that observers judging a facial expression are strongly influenced by emotional body language; an amplitude increase of the occipital P1 component 115 ms after stimulus presentation onset points to the existence of a rapid neural mechanism sensitive to the agreement between simultaneously presented facial and bodily emotional expressions. Continuing this line of research, Aviezer et al. (2008) positioned prototypical pictures of disgusted faces on torsos conveying different emotions. Their results showed that combining a facial expression and a torso, but sometimes also showing an object (for example, underwear), induced changes in the recognition of emotional categories from the facial expressions to the extent where the "original" basic expression was lost when positioned on an emotionally incongruent torso.

Research has shown that whereas the immediate expression of emotions by the face and the body is automatic and predominantly regulated by subcortical structures (Aggleton, 2000; Lanteaume et al., 2007), the conscious regulation of emotional expressions (smiling during a job interview or hiding joy in a poker game) is steered by higher order cortical structures such as the orbitofrontal cortex (Damasio, 1994). Some people, such as those with an anxious personality type, become socially inhibited because, in social interactions, they over-activate this network, which takes so much cognitive effort that it has a negative effect on the interaction (Kret et al., 2011a). Anxious individuals have a propensity for over-responding to social or emotional signals and in particular to those that are threatening. This hyper-responsiveness may translate to increased activation in the amygdala (Etkin et al., 2004; Hayes et al., 2012), increased attention toward threat (Bar-Haim et al., 2007) paired with increased pupil dilation (Kimble et al., 2010) and altered facial expressions as measured with electromyography (EMG; Dimberg and Thunberg, 2007). In addition, when confronted with facial expressions, they may attend to the wrong cues (Horley et al., 2003; Bar-Haim et al., 2005; Mogg et al., 2007). Moreover, highly anxious subjects are likely to give negative interpretations of ambiguous social situations in which conflicting information is presented (Huppert et al., 2007). Previous studies have suggested that anxious individuals prefer negative interpretations over other possibilities when facial expressions convey conflicting information (e.g., Richards et al., 2002; Yoon and Zinbarg, 2008). Most studies so far have used facial expressions, but a recent study looked at vocalizations as well (Koizumi et al., 2011). The authors showed that anxious individuals, when recognizing emotions from either the face or the voice in paired combinations, were more likely to interpret others' emotions in a negative manner, putting more weight on the to-be-ignored angry cues. This interpretation bias was found regardless of the cue modality (i.e., face or voice). Interestingly, anxiety did not affect recognition of the face or voice cues when presented in isolation. Therefore, this interpretation bias is due to poor integration of the face with simultaneously presented other cues such as voice cues among anxious individuals. We now would like to test whether anxious individuals also hyper-react to negative emotions expressed by the body and whether they would misinterpret positive emotions when conflicting cues from the face or the body are presented simultaneously.

#### **OBJECTIVES OF THE CURRENT STUDY**

We investigated the recognition of emotions from the face and the body separately, and when combined with a matching or nonmatching whole body. In *Experiment 1*, participants categorized happy, angry, and fearful isolated faces and happy, angry, and fearful isolated bodies. In *Experiment 2*, the same participants were asked to categorize emotions in facial expressions, but the face presented was on top of a body that expressed either the same, or a different emotion. *Experiment 3* used the same stimuli as Experiment 2, but participants were now asked to label the body emotion and ignore the facial expression. The experiments were given in a random order. We tested three main hypotheses:


## **Experiment 1. Categorizing isolated facial and bodily expressions of emotion**

### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Thirty-seven students from Tilburg University (26 females, mean age 22.7, range 19–29 years old; 11 males, mean age 23.8, range 19–32 years old) provided informed consent and took part in the experiment. All participants were included in the analyses, with one exception: due to technical problems, the EMG data of four participants in Experiment 1 and three in Experiment 2 were not recorded, so these participants were left out of the corresponding EMG analyses. Their other data could be analyzed, so they were not excluded from any other analyses. For Experiment 3, data for all participants were properly recorded and included in the analyses. Participants had no neurological or psychiatric history, were right-handed and had normal or corrected-to-normal vision. The study was performed in accordance with the Declaration of Helsinki and approved by the local medical ethical committee.

#### **MATERIALS**

Fearful, happy, and angry facial expressions of six male individuals that were correctly recognized above 80% were selected from the NimStim set (Tottenham et al., 2009). The corresponding bodily expressions were taken from the BEAST stimulus database (de Gelder and Van den Stock, 2011). For the current study, we selected the best models, with recognition scores above 80% correct. We used only male bodies because we previously found that these evoke stronger arousal when anger and fear are expressed (Kret et al., 2011b; Kret and de Gelder, 2012b). Pictures were presented in grayscale, against a gray background. Using Photoshop, the luminance of each stimulus was modified to the average luminance. A final check was made with a light meter on the test computer screen. The size of the stimuli was 354 × 532 pixels (see **Figure 1**).

#### **PROCEDURE**

After attaching the electrodes to the participant's face, the eye-tracking device was positioned on the participant's head. Next, a nine-point calibration was performed and repeated before each block. Stimuli were presented using E-prime software on a PC screen with a resolution of 1024 by 768 and a refresh rate of 100 Hz. Each trial started with a fixation cross, shown for at least 3000 ms until the participant fixated and a manual drift correction was performed by the experiment leader, followed by a picture presented for 4000 ms and a gray screen (3000 ms). The face and body stimuli were randomly presented within two separate blocks containing 36 trials each. To keep participants naive regarding the purpose of the EMG, they were told that the electrodes recorded perspiration. The order of the two blocks and also the order of the experiments were counterbalanced. Two additional passive viewing tasks had been given (results will be published elsewhere). Participants were asked to categorize the emotion being depicted, choosing amongst three response alternatives that were written on the screen (anger, fear, happy) and three corresponding buttons on a button-box. The order of the emotion labels was counterbalanced. Participants were requested to indicate their choice after the stimulus disappeared from the screen.

## **MEASUREMENTS**

#### **Facial EMG**

The parameters for facial EMG acquisition and analysis were selected according to the guidelines by van Boxtel (2010). BioSemi flat-type active electrodes were used and facial EMG was measured bipolarly over the zygomaticus major and the corrugator supercilii on the right side of the face at a sample rate of 1024 Hz. The common mode sense (CMS) active electrode and the driven right leg (DRL) passive electrode were attached to the left cheek and used as reference and ground electrodes, respectively (http://www.biosemi/faq/cmsanddrl.htm). Before attachment, the skin was cleaned with alcohol and the electrodes were filled with electrode paste. Raw data were first filtered offline with a 20–500 Hz band-pass in Brain Vision Analyzer Version 1.05 (Brain Products GmbH), and full-wave rectified. Data were visually inspected for excessive movement during baseline by two independent raters who were blind to the trial conditions. Trials that were deemed problematic were discarded, resulting in the exclusion of 6.07% (SD 7.50) of the trials from subsequent analysis. Due to technical problems, the EMG data of four participants in Experiment 1 and three in Experiment 2 were not recorded. Subsequently, mean rectified EMG was calculated across a 4000-ms post-stimulus epoch and a 1000-ms pre-stimulus baseline period. Mean rectified EMG was expressed as a percentage of the mean pre-stimulus baseline EMG amplitude. Percentage EMG amplitude scores were averaged across valid trials and across emotions.
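The baseline normalization described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the band-pass filtering done in Brain Vision Analyzer is assumed to have been applied already, and the function name, the `stimulus_onset_index` parameter, and the synthetic trial are hypothetical.

```python
from statistics import mean

SAMPLE_RATE_HZ = 1024   # sampling rate reported in the text
BASELINE_MS = 1000      # pre-stimulus baseline period
EPOCH_MS = 4000         # post-stimulus epoch

def emg_percent_of_baseline(filtered_trial, stimulus_onset_index):
    """Mean rectified EMG in the post-stimulus epoch as a percentage of baseline."""
    # Full-wave rectification: take the absolute value of every sample.
    rectified = [abs(sample) for sample in filtered_trial]
    n_base = SAMPLE_RATE_HZ * BASELINE_MS // 1000
    n_epoch = SAMPLE_RATE_HZ * EPOCH_MS // 1000
    baseline = rectified[stimulus_onset_index - n_base:stimulus_onset_index]
    epoch = rectified[stimulus_onset_index:stimulus_onset_index + n_epoch]
    return 100.0 * mean(epoch) / mean(baseline)

# Synthetic trial: amplitude 2 before stimulus onset, amplitude 3 afterwards.
onset = 1024
trial = ([2.0 if i % 2 else -2.0 for i in range(onset)]
         + [3.0 if i % 2 else -3.0 for i in range(4096)])
print(emg_percent_of_baseline(trial, onset))  # 150.0
```

A score of 100 means no change relative to baseline; values above 100 indicate increased muscle activity after stimulus onset.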

The zygomaticus is predominantly involved in expressing happiness. The corrugator muscle can be used to measure the expression of negative emotions including anger and fear (van Boxtel, 2010). In order to differentiate between these two negative emotions, measuring additional face muscles such as the frontalis would be necessary (Ekman and Friesen, 1978). However, this was not possible in the current experiment, due to the head-mounted eye-tracker. Activity of the corrugator in a specific context, such as by presenting clear emotional stimuli, can be interpreted as the expression of the observed emotion (Overbeek et al., 2012).

#### **Eye-tracking**

Eye movements were recorded with a sample rate of 250 Hz using the head-mounted EyeLink Eye-Tracking System (SensoMotoric Instruments GmbH, Germany). A drift correction was performed on every trial to ensure that eye gaze data were adjusted for movement. We used the default EyeLink settings, which defined a blink as a period of saccade detector activity with the pupil data missing for three or more samples in a sequence. A saccade was defined as a period of time where the saccade detector was active for two or more samples in sequence and continued until the start of a period of saccade detector inactivity for 20 ms. The configurable acceleration (8000 degrees/s²) and velocity (30 degrees/s) thresholds were set to detect saccades of at least 0.5° of visual angle. A fixation was defined as any period that was not a blink or saccade. Analyses were performed on the proportion of time spent looking at each interest area within the time spent looking on the screen, with the first 200 ms discarded due to the fixed position of the fixation cross. In accordance with previous literature, blinks were linearly interpolated before subtracting a 500 ms baseline from the average pupil size during the last 2 s of picture presentation. The first 2 s were not included in the analysis to avoid influences of the initial dip in pupil size (Bradley et al., 2008).
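The pupil measure described above (linear interpolation of blinks, then baseline subtraction) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: missing samples are assumed to be marked as `None`, the 500 ms baseline is assumed to be the window immediately preceding picture onset, each trace is assumed to contain at least one valid sample, and all function names are hypothetical.

```python
from statistics import mean

SAMPLE_RATE_HZ = 250  # eye-tracker sampling rate reported in the text

def interpolate_blinks(samples):
    """Linearly interpolate runs of missing pupil samples (marked as None)."""
    filled = list(samples)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i
            while i < len(filled) and filled[i] is None:
                i += 1
            left = filled[start - 1] if start > 0 else None
            right = filled[i] if i < len(filled) else None
            # At the edges of the trace, fall back to the nearest valid sample.
            if left is None:
                left = right
            if right is None:
                right = left
            gap = i - start + 1
            for k in range(start, i):
                filled[k] = left + (right - left) * (k - start + 1) / gap
        else:
            i += 1
    return filled

def baseline_corrected_pupil(baseline, presentation):
    """Mean pupil size over the last 2 s of the picture minus a 500 ms baseline."""
    baseline = interpolate_blinks(baseline)
    presentation = interpolate_blinks(presentation)
    n_base = SAMPLE_RATE_HZ // 2   # 500 ms
    n_last = 2 * SAMPLE_RATE_HZ    # last 2 s of the 4 s presentation
    return mean(presentation[-n_last:]) - mean(baseline[-n_base:])

# Synthetic trial in arbitrary units, with a 20-sample blink in the analysis window.
baseline = [100.0] * 125
presentation = [90.0] * 500 + [110.0] * 500
presentation[600:620] = [None] * 20
print(baseline_corrected_pupil(baseline, presentation))  # 10.0
```

Discarding the first 2 s of the presentation, as the slice `presentation[-n_last:]` does, keeps the initial dip in pupil size out of the average.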

#### **Anxiety measure**

On the day before testing, participants filled out the STAI Trait Measure (Spielberger, 1983). The average score was 49.89 (standard deviation: 1.75, range: 46–54), which is within the normal range. The reason for giving this questionnaire on the day beforehand rather than after the experiment was to avoid possible influences of the task.

#### **DATA ANALYSIS**

Data from the different measurements were analyzed in separate ANOVAs with the factors body part (head and body) and emotion (anger, fear, happiness). Due to technical failure, the EMG data of four participants were not recorded. Significant main effects were followed up by Bonferroni-corrected pairwise comparisons and interactions by two-tailed *t*-tests. In separate multiple linear regression models, we investigated the influence of anxiety, as measured with the STAI.
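As a brief illustration of the follow-up procedure, a Bonferroni correction for the three pairwise emotion comparisons can be sketched as below. The text does not specify how the correction was computed, so this sketch assumes the common form in which each raw *p* value is multiplied by the number of comparisons and capped at 1; the function name and the raw *p* values are hypothetical and are not results from the study.

```python
def bonferroni_correct(p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    n_comparisons = len(p_values)
    return {pair: min(1.0, p * n_comparisons) for pair, p in p_values.items()}

# Hypothetical raw p values for the three pairwise emotion comparisons.
raw_p = {
    ("anger", "fear"): 0.30,
    ("anger", "happiness"): 0.004,
    ("fear", "happiness"): 0.02,
}
for pair, p_adj in bonferroni_correct(raw_p).items():
    print(pair, round(p_adj, 3), "significant" if p_adj < 0.05 else "n.s.")
```

With three comparisons, a raw *p* of 0.02 no longer passes the 0.05 threshold after correction, which shows how the procedure guards against inflated false-positive rates.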

#### **RESULTS**

Participants categorized isolated facial and bodily expressions of anger, happiness and fear while their fixation patterns, pupil dilation, and facial muscle movements were being recorded. The objective of this experiment was to investigate whether isolated emotional expressions from the face and the body are processed similarly.

#### **ACCURACY**

There were main effects of body part and emotion [*F*(1, 36) = 87.00, *p* < 0.001; *F*(2, 72) = 12.64, *p* < 0.001] and an interaction between emotion and body part [*F*(2, 72) = 15.092, *p* < 0.001]. Faces were recognized at ceiling, and better than bodies (face: Mean = 0.985, SE = 0.004, body: Mean = 0.865, SE = 0.013) and as such there was no significant difference between the three *facial* expressions (although happy faces were slightly better recognized than fearful ones, Mean = 0.991, SE = 0.004 versus Mean = 0.973, SE = 0.010), but pairwise comparisons of the body postures showed that angry and fearful bodies were better recognized than happy ones (anger: Mean = 0.944, standard error (SE) = 0.015; happy: Mean = 0.757, SE = 0.037; fear: Mean = 0.896, SE = 0.015; *p*s < 0.01). The multiple linear regression model that included the accuracy rates per condition was significant [*F*(6, 28) = 2.64, *p* < 0.05]. A positive relation was found between the STAI and the recognition of fearful faces (β = 0.382, *t* = 2.217, *p* < 0.05) and a negative relation with the recognition of fearful bodies (β = 0.386, *t* = 2.424, *p* < 0.05) (see **Figure 3**).

#### **GAZE AND FIXATION BEHAVIOR**

There was a main effect of body part *F*(1, 36) = 304.06, *p* < 0.001 and of emotion *F*(2, 72) = 184.81, *p* < 0.001. Participants looked (as a proportion of the whole screen) longer at faces than at bodies (Mean = 0.998, SE = 0.003 versus *M* = 0.553, SE = 0.025, *p*s < 0.001) and at angry and fearful more than at happy expressions (anger: Mean = 0.814, SE = 0.014 and fear: Mean = 0.806, SE = 0.014 versus happy Mean = 0.691, SE = 0.013). There was no difference between anger and fear (*p* = 0.652). However, there was an interaction between body part and emotion *F*(2, 72) = 186.37, *p* < 0.001 that showed that these effects were fully driven by the body. This was confirmed with an ANOVA that included only body postures. Happy postures were less attended to than either angry or fearful postures *F*(2, 72) = 207.26, *p* < 0.001 (Mean = 0.396, SE = 0.025 versus Mean = 0.637, SE = 0.027 and Mean = 0.625, SE = 0.027, *p*s < 0.001). There was no effect of emotion on fixation duration on the whole face (*p* = 0.380). However, we found an effect of emotion on the duration of fixations on the eyes *F*(2, 72) = 64.32, *p* < 0.001. Participants attended longest to fearful eyes (Mean = 0.314, SE = 0.017 versus anger: Mean = 0.144, SE = 0.011 and happy: 0.234, SE = 0.017, *p*s < 0.001) (see **Figure 2**).

#### **ELECTROMYOGRAPHY**

There was an interaction between emotion and body part on the zygomaticus *F*(2, 68) = 6.15, *p* < 0.005. When analyzing the zygomaticus response to the different emotions separately for faces and for bodies, it appeared that this facial muscle only differentially responded to facial expressions *F*(2, 68) = 4.35, *p* < 0.05 and was more active following happy than angry faces (Mean = 115.480, SE = 4.994 versus Mean = 103.830, SE = 3.074, *p* < 0.05) (fear: Mean = 106.835, SE = 3.144). The corrugator showed a main effect of body part *F*(1, 34) = 17.35, *p* < 0.001 and was more responsive to bodies than faces (Mean = 105.033, SE = 0.952 versus Mean = 99.656, SE = 0.762). There was another main effect for emotion *F*(2, 68) = 7.31, *p* < 0.001, showing a greater response following fearful (and to some extent angry) than happy expressions (Mean fear: 103.749, SE = 0.802 and Mean anger: 102.406, SE = 0.583 versus Mean happy: 100.879, SE = 0.749, *p* < 0.005; *p* = 0.081). Anger and fear did not differ (*p* = 0.287). The marginally significant interaction between body part and emotion *F*(2, 68) = 2.62, *p* = 0.080 however, suggests that the main effect of emotion is driven by the facial expression. Analyzing the response to faces only showed again a main effect of emotion *F*(2, 68) = 13.62, *p* < 0.001, with greater responses for angry and fearful versus happy faces (Mean = 100.354, SE = 0.836 and Mean = 101.525, SE = 0.880, Mean = 97.089, SE = 1.010, *p-*values < 0.001). There was no emotion effect for bodies (*p* = 0.472). The multiple linear regression model that included all EMG responses (corrugator and zygomaticus) per condition was highly significant *F*(12, 22) = 5.092, *p* = 0.0005. There was a positive relation between the STAI and zygomaticus response following angry faces (β = 0.399, *t* = 2.738, *p* < 0.05). 
However, this "smile" was also paired with a frown, as there was a marginally significant relation between the STAI and corrugator activity following angry and happy faces (β = 0.262, *t* = 1.514, *p* = 0.1; β = 0.319, *t* = 1.933, *p* = 0.07). There was a positive relationship between the STAI and corrugator activity following angry bodies (β = 0.380, *t* = 2.841, *p* < 0.01). However, there were negative relationships between the STAI and zygomaticus activity following fearful and happy bodies (β = 0.352, *t* = 2.404, *p* < 0.05; β = 0.451, *t* = 2.727, *p* < 0.05).

#### **PUPIL SIZE**

There was a main effect of body part *F*(1, 36) = 18.64, *p* < 0.001. Pairwise comparisons revealed greater pupil dilation following bodies than faces (Mean = 173.320, SE = 16.048 versus Mean = 94.530, SE = 18.380, *p* < 0.001), probably due to the differences in size of the image (see **Figure 1**). In both cases, for faces and for bodies, the magnitude of the response was consistent with expectations (anger and fear > happy) but not significantly. For comparable results see Bradley et al. (2008). The multiple linear regression model that included the pupil sizes per condition was marginally significant *F*(6, 29) = 2.305, *p* = 0.06. There was a positive relation between the STAI and pupil size following angry faces (β = 0.587, *t* = 2.488, *p* < 0.05).

#### **DISCUSSION EXPERIMENT 1**

We used facial EMG, pupillometry, and gaze to measure similarities in the processing of body postures and facial expressions. Angry and fearful body postures and fearful eyes were the most frequent gaze targets. Participants reacted to the sight of the facial expressions with the expected muscular activity but not to body expressions as was previously reported (Magnée et al., 2007; Tamietto et al., 2009). But in line with the study by Magnée et al. (2007), we found that the corrugator responded more to bodies than to faces. One difference between the current and the previous studies is the addition of angry expressions. Adding this third emotion made the task more difficult which may be a reason for the larger differences between individuals in the current study. Moreover, the study by Tamietto et al. (2009) included only two participants with visual cortex blindness. A third difference is that in the current study we used only male actors. These task differences may explain the lack of differentiation of EMG signals between observing different bodily expressions.

With regard to anxiety state, we indeed observed hyperreactivity to emotional cues (MacLeod and Cohen, 1993; Amin et al., 1998; Miers et al., 2008). Anxious individuals showed a greater corrugator response to angry body postures and to angry faces (for similar results, see Dimberg and Thunberg, 2007). But in the latter case, this frown was paired with a smile. The smile could be a sign of submission, a conciliatory smile paired with high arousal, as shown by their greater pupil dilation. A similar finding has been reported previously in subjects with a dismissing-avoidant pattern of attachment (characterized by repressing anxiety-related signals) who showed an increased zygomaticus response ("smiling reaction") to angry faces (Sonnby-Borgström and Jönsson, 2004). In addition, we found that the more anxious subjects were, the better they were at decoding fearful faces, but the more difficulty they had in recognizing this emotion from body cues.

In the next experiments, we combine facial and bodily expressions in a face and a body categorization task. The goal is to test the influence of body expressions on the recognition of and responses to facial expressions and vice versa. In addition, the role of anxiety is investigated.

### **Experiment 2. Categorizing facial expressions of emotion in the context of body expressions**

In this experiment, participants (see Participants, Exp. 1 for details) categorized facial expressions that were presented together with emotionally congruent or incongruent body postures. The purpose of this study is to investigate whether recognition is facilitated with the presence of a congruent body posture and, in addition, whether the body expression influences not only how the face is perceived, but also how it is processed.

*Procedure.* Materials consisted of the same face and body images used in Experiment 1 but here the faces and bodies were combined in emotionally congruent and incongruent pairs (see **Figure 4**). The identity-pairs were kept the same across the three emotions, making nine combinations. The stimuli were divided into two blocks containing 36 random trials each with 18 congruent and 18 incongruent stimuli (72 trials in total). Participants were requested to label the facial expression. Thus, in order to perform well on this task, participants had to look at the face and ignore the bodily expression. On average, they spent 59% of their looking time on the face and 9% on the body. After the experiment, they were asked to describe what they had seen. All participants mentioned having seen emotional expressions. Most of them noticed that in some cases the facial and bodily expressions were incongruent.
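The block structure described above (nine face-body combinations, 18 congruent and 18 incongruent trials per block, two blocks) can be sketched as follows. The text gives only the trial counts, so the balancing used here is an assumption: each of the three congruent combinations is repeated six times and each of the six incongruent combinations three times per block. Actor identity is ignored, and the function name and seed are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

EMOTIONS = ("anger", "fear", "happiness")


def build_block(rng: random.Random, n_congruent: int = 18,
                n_incongruent: int = 18) -> List[Tuple[str, str]]:
    """Return one shuffled block of (face_emotion, body_emotion) trials."""
    pairs = list(itertools.product(EMOTIONS, EMOTIONS))      # 9 combinations
    congruent = [p for p in pairs if p[0] == p[1]]           # 3 combinations
    incongruent = [p for p in pairs if p[0] != p[1]]         # 6 combinations
    # Assumed balancing: equal repetitions of every combination within a type.
    trials = (congruent * (n_congruent // len(congruent))
              + incongruent * (n_incongruent // len(incongruent)))
    rng.shuffle(trials)
    return trials


rng = random.Random(1)  # arbitrary seed, for reproducibility of the sketch
blocks = [build_block(rng) for _ in range(2)]
total_trials = sum(len(block) for block in blocks)
```

Under this scheme each block contains 36 trials and the two blocks together contain the 72 trials reported in the text.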

*Data analysis.* Data from the different measurements were analyzed in separate ANOVAs with three facial expressions × three bodily expressions (anger, fear, happiness). To analyze the eyetracking data, we created two regions of interest (ROIs): the face and the body. Due to a technical failure, the EMG data of three participants were not recorded. Significant main effects were followed up by Bonferroni-corrected pairwise comparisons and interactions with two-tailed *t*-tests.
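The ROI-based looking-time measure can be illustrated with a short sketch that computes, per trial, the proportion of on-screen gaze samples falling in the face and body ROIs. The per-sample label representation (`"face"`, `"body"`, any other string for elsewhere on the screen, `None` for off-screen or missing data) and the function name are assumptions; the 250 Hz rate and the discarded first 200 ms follow the eye-tracking description for Experiment 1, and applying them here is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional


def roi_proportions(labels: List[Optional[str]],
                    sample_rate_hz: int = 250,
                    discard_ms: int = 200) -> Dict[str, float]:
    """Proportion of on-screen samples in each ROI, after dropping the first `discard_ms`."""
    n_discard = discard_ms * sample_rate_hz // 1000
    # Keep only samples where gaze was on the screen (label is not None).
    on_screen = [lab for lab in labels[n_discard:] if lab is not None]
    if not on_screen:
        return {"face": 0.0, "body": 0.0}
    counts = Counter(on_screen)
    return {roi: counts[roi] / len(on_screen) for roi in ("face", "body")}


# Hypothetical trial: 50 discarded samples, then 60 face, 20 body,
# 20 elsewhere on the screen, and 10 off-screen samples.
example = ["face"] * 50 + ["face"] * 60 + ["body"] * 20 + ["screen"] * 20 + [None] * 10
proportions = roi_proportions(example)
```

Normalizing by on-screen time rather than total trial time means that blinks and off-screen glances do not deflate the proportions.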

#### *Results.*

*Accuracy.* There were main effects for facial expression *F*(2, 72) = 17.64, *p* < 0.001 and body expression *F*(2, 72) = 3.37, *p* < 0.05 and an interaction between face and body expression *F*(4, 144) = 9.75, *p* < 0.001. Pairwise comparisons revealed no differences between the body postures. Happy faces were better recognized than angry or fearful faces (happy: Mean = 0.984, SE = 0.007, anger: Mean = 0.968, SE = 0.010, fear: Mean = 0.887, SE = 0.020). In line with previous literature, participants were better at recognizing angry and fearful faces when accompanied by emotionally congruent versus incongruent bodies (angry face congruent versus angry face incongruent: Mean = 0.993, SE = 0.005 versus Mean = 0.956, SE = 0.016; fearful face congruent versus fearful face incongruent: Mean = 0.946, SE = 0.017 versus Mean = 0.857, SE = 0.027), *t*(36) ≥ 2.79, *p* < 0.01 (happy faces were recognized at ceiling; Meeren et al., 2005) (see **Figure 3**). Relations between recognition rates for the different conditions and the STAI score were investigated in a multiple regression model but this model was not significant (*p* > 0.05).

*Gaze and fixation behavior.* There was a main effect of facial expression on fixation duration on the face ROI *F*(2, 72) = 22.21, *p* = 0.001. A face was looked at longest when it expressed anger (anger: Mean = 0.657, SE = 0.034, happy: Mean = 0.552, SE = 0.031, fear: Mean = 0.551, SE = 0.033, *p*-values < 0.001). Body posture did not affect fixation durations on the face ROI (*p* = 0.426) (see **Figures 3** and **4**).

*Electromyography.* The zygomaticus reacted to the expression that was shown by the face, independent of the bodily expression *F*(2, 68) = 4.67, *p* = 0.012. Increased responses of happy versus angry (Mean = 111.948, SE = 2.946 versus Mean = 105.681, SE = 2.099, *p* < 0.01) and fearful faces (Mean = 105.860, SE = 3.358, *p* < 0.05) were observed. The corrugator also responded to the observed face *F*(2, 68) = 5.29, *p* < 0.01 and was more active for fearful versus happy expressions (Mean = 103.118, SE = 1.006 versus Mean = 100.169, SE = 0.739, *p* < 0.05; numerically consistent for angry (Mean = 102.730, SE = 1.161) versus happy faces *p* = 0.120). Relations between EMG responses to the different stimulus conditions and the STAI score were investigated in a multiple regression model but this model was not significant (*p* > 0.05) (see **Figure 3**).

*Pupil size.* Participants' pupils responded to all emotional expressions, as compared to baseline (*p*-values < 0.005) but there was no difference between the emotions (see **Table 1** for all means). Relations between pupil size responses to the different stimulus conditions and the STAI score were investigated via multiple regression. This model was significant (*F*(9, 26) = 2.454, *p* < 0.05). There was a positive relationship between the STAI and pupil size in the condition where fearful faces were paired with happy bodies (β = 0.646, *t* = 2.156, *p* < 0.05) (see **Figure 3**).

#### **Table 1 | Means and standard errors.**

#### **DISCUSSION EXPERIMENT 2**

In this experiment, we investigated how participants perceive and categorize a facial expression presented in the context of a bodily expression. As expected, recognition of facial expressions improved when the body and face showed the same expression. We did not find an overall hyper-responsiveness in highly anxious subjects but we observed a specific increase in arousal (as measured by greater pupil dilation) in the condition where fearful faces were paired with happy bodies. In the next experiment, participants are asked to categorize the body posture and ignore the face.

#### **Experiment 3. Categorizing bodily expressions of emotion in the context of facial expressions**

In this experiment, the exact same stimuli were shown as in the previous experiment, but under different task instructions. Participants (see Participants, Exp. 1 for details) here were asked to attend to and categorize the body posture and ignore the facial expression. On average, they spent 58% of their looking time on the body and 23% on the face.

*Data analysis.* Data from the different measurements were analyzed in separate ANOVAs with three facial expressions × three bodily expressions (anger, fear, happiness). Significant main effects were followed up by Bonferroni-corrected pairwise comparisons and interactions with two-tailed *t*-tests.

#### *Results.*

*Accuracy.* There were two main effects and an interaction [face: *F*(2, 72) = 4.91, *p* < 0.01; body: *F*(2, 72) = 24.15, *p* < 0.001; face × body: *F*(4, 144) = 4.88, *p* < 0.005]. Accuracy was lowest for happy bodies (happy: Mean = 0.749, SE = 0.033 versus anger: Mean = 0.953, SE = 0.018 and fear: Mean = 0.905, SE = 0.014, *p*-values < 0.001), providing most room for an influence of facial expressions. Fear and anger were not significantly different (*p* = 0.104). As expected, happy bodies were recognized better in combination with a happy versus fearful or angry face (happy body congruent versus happy body incongruent: Mean = 0.914, SE = 0.014 versus Mean = 0.901, SE = 0.018) *t*(36) ≥ 2.73, both *p*-values < 0.01. The multiple regression model was not significant *F*(9, 26) = 1.485, *p* = 0.20. We predicted that anxious individuals would make mistakes when categorizing a happy posture in the context of an angry face. Indeed, the recognition rates in this condition were the only significant predictor in this model β = 0.844, *t* = 2.551, *p* = 0.01.

*Gaze and fixation behavior.* There was a main effect for body posture *F*(2, 72) = 124.82, *p* < 0.001. Participants attended longer to the body in the case of a threatening posture (anger: Mean = 0.649, SE = 0.022, fear: Mean = 0.642, SE = 0.020, happy: Mean = 0.463, SE = 0.020, *p*s < 0.001). There was also a main effect for facial expression *F*(2, 72) = 6.41, *p* < 0.005. Participants attended longer to the body when the face expressed happiness versus fear (Mean = 0.603, SE = 0.021 versus Mean = 0.566, SE = 0.021, *p* < 0.01, which was numerically consistent for anger, Mean = 0.584, SE = 0.019). The interaction was not significant *F*(4, 144) = 2.04, *p* = 0.093. Because participants still spent about a quarter of their time observing the face, we were able to analyze the effect of facial and bodily expressions on the looking times within the face ROI. There were main effects for facial expression and bodily expression on fixation durations within the face ROI *F*(2, 72) = 3.69, *p* < 0.05; *F*(2, 72) = 9.00, *p* < 0.001. Participants attended longer to fearful than angry (Mean = 0.243, SE = 0.02 versus Mean = 0.215, SE = 0.018, *p* < 0.05) or happy faces (Mean = 0.223, SE = 0.018, *p* = 0.161, ns). Interestingly, the looking times on the face depended mostly on the bodily expression, being longest when the body posture expressed happiness versus fear (Mean = 0.254, SE = 0.017 versus Mean = 0.207, SE = 0.019, *p* < 0.001) or anger (Mean = 0.219, SE = 0.021, *p* < 0.05).

*Electromyography.* There was a trend toward a main effect for body expression on the zygomaticus *F*(2, 66) = 2.73, *p* = 0.073 but follow-up pairwise comparisons did not yield any significant difference (happy versus angry bodies; Mean = 109.916, SE = 3.596 versus Mean = 102.785, SE = 2.130, *p* = 0.115). The corrugator did not show an effect of facial or bodily expression. The multiple regression model was significant *F*(18, 15) = 3.625, *p* < 0.01. We found a positive relation between the STAI and EMG activity of both the zygomaticus and the corrugator in the condition where angry faces were paired with fearful bodies (β = 0.614, *t* = 3.162, *p* < 0.01; β = 1.287, *t* = 2.488, *p* < 0.05). A positive relation was also found with the zygomaticus in the condition where happy faces were paired with angry bodies (β = 0.656, *t* = 3.152, *p* < 0.01).

*Pupil size.* Pupil dilation showed an increase in activity as compared to baseline *t*(36) ≥ 7.035, all *p*-values < 0.001 but did not respond more to one emotion than the other. The multiple regression model was not significant *p* > 0.05 (see **Table 2** for all means and SEs).


#### **FIGURE 4 | Categorizing facial and bodily expressions of emotion.**

The figure shows one stimulus exemplar per condition with a superimposed fixation map (duration based and averaged per condition). In experiment 2, participants categorized facial expressions, whereas in experiment 3, they categorized bodily expressions. For visualization purposes, these heat maps are presented against a background of one exemplar stimulus from that condition.

#### **DISCUSSION EXPERIMENT 3**

As expected, participants' recognition was best when the body and face showed the same expression. Although in this task participants were asked to focus on the body posture, the face still attracted substantial attention. A possible explanation is that they were uncertain about the body emotion and checked the face in search of clarification. Indeed, the congruency effect on accuracy scores seemed somewhat larger for bodies than for faces. Attention was shifted away from happy cues, whether expressed by the face or the body. In experiment 2, we observed EMG effects for facial expressions. In this experiment, participants focused on the body expressions, which may be an explanation for its lack of effect.

## **GENERAL DISCUSSION**

We report three experiments investigating the recognition of emotional expressions in the face and the body. In experiment 1, faces and bodies were presented in isolation, and in experiments 2 and 3 the faces and bodies were combined in emotionally congruent and incongruent natural-looking compound stimuli. The aim of these studies was to gain insight into how the emotional signals from the face and those from the body posture, independently as well as jointly, trigger physiological responses in the observer. Three hypotheses were tested. First, as predicted, we observed that the recognition of facial and bodily expressions was enhanced when their presentation was paired with an emotionally congruent face or body. Second, in line with our expectations, angry and fearful face and body cues attracted more attention than happy ones, independent of the context (emotionally congruent or incongruent face or body) in which they were presented. Third, as predicted, anxious participants showed enhanced pupil dilation and corrugator response to threatening cues from the face and the body. The combination of multiple measurements provides insight into the underlying processes and shows that individual differences in anxiety, as well as contextual factors, influence our reaction to the emotional expression of another person. We first summarize the results before discussing the broader implications of our research.

Facial expressions were always accurately recognized but, as shown by Experiment 2, the presence of a body posture expressing the same emotion, increased recognition rates. The inverse was also true. In fact, the greatest congruency effect was observed for happy body expressions. Isolated happy body postures were recognized correctly 76% of the time. However, when combined with a happy face, these same bodies were recognized significantly better (81% correct). We observed that participants with high STAI scores more often interpreted happy body postures as threatening when the face showed an angry expression. This is in line with an earlier study which showed that anxious individuals could not ignore angry cues from the voice when interpreting facial expressions (Koizumi et al., 2011). Being anxious thus seems to influence the way social signals are interpreted. This is consistent with the literature on negative interpretation biases related to anxiety, especially in emotionally ambiguous situations (MacLeod and Cohen, 1993; Amin et al., 1998; Miers et al., 2008).

When presented with isolated facial expressions, participants fixated longest on fear expressions and more specifically on fearful eyes (Experiment 1). In Experiment 2, when categorizing facial expressions in the context of body postures, participants attended to the face in particular when it expressed anger. In Experiment 3, when categorizing bodily expressions, they attended less to the face when the body showed a threatening expression and focused more on the body when the face expressed happiness. A general pattern across experiments was that angry and fearful faces and bodies were looked at longer than happy expressions. In other words, attention was preferentially allocated to cues indicating potential threat during social interaction (Green et al., 2003; Schrammel et al., 2009).

Participants' pupils dilated in response to all expressions, independent of the source or the specific emotion, see Bradley et al. (2008) and Schrammel et al. (2009) for similar results. Anxiety scores predicted pupil dilation triggered by viewing angry faces (see also Kimble et al., 2010; Felmingham et al., 2011).

Participants' faces expressed a negative emotion in response to observing angry and fearful faces and expressed a positive emotion in response to happy faces. This was not the case for body postures. Magnée et al. (2007) observed a main effect of emotion (fear > happy) and a main effect of source (body > face) on the corrugator but they did not observe an interaction. Their study did not report to what extent the corrugator differentially responded to the different body expressions and therefore, comparison with the current study is difficult. As in Magnée et al. (2007), we observed a main effect of source (body > face) in Experiment 1. It is not clear what underlies this effect, but it could be that different processes than emotional synchronization are involved, such as emotion regulation or action preparation.

We show that people process bodily expressions of emotion in a similar fashion as facial expressions and that the presence of both adds up to the total percept of the emotion. Observing emotion in others is always arousing, whether the other person expresses a positive or a negative emotion. Pupil dilation seems to reflect a general appraisal of a social counterpart in terms of potential threat or reward from an interaction. The finding that anxious participants smiled and frowned simultaneously in response to an angry face illustrates that EMG activity in an emotional paradigm reflects more than emotional synchronization and that these rapid facial expressions serve as an affiliative signal that has important functions for social interaction (Fischer and Manstead, 2008; Hareli and Hess, 2012). Simultaneous measurement of the frontalis muscle could have given us more insight, especially for better differentiating between emotional expressions. Unfortunately, we were unable to measure this muscle, as it was occluded by the eye-tracker.

We show that, when it comes to fixation patterns, emotional cues, and especially those that are threatening, attracted participants' attention more than incongruence between the two different channels. This finding is in line with previous studies which also found longer looking times at angry expressions compared to threat-irrelevant expressions (De Bonis and Baque, 1978a; de Bonis and Freixa i Baque, 1983; Schrammel et al., 2009). Moreover, visual search studies have found that angry faces are typically detected more quickly and accurately than happy ones (de Bonis and Baque, 1978b; de Bonis et al., 1999).

The role of the amygdala, an often over-active brain area in anxious individuals (Etkin et al., 2004), in modulating this aspect of behavior, is not yet clear. For example, Adolphs et al. (2005) proposed that the fear recognition deficit in a patient with bilateral amygdala damage was caused by her inability to use information from the eye region of faces. Yet the amygdala is a complex structure with a number of nuclei that have different functions and different subcortical and cortical connections. These specific functions may explain the appearance of normal behavior or its disappearance in pathological groups. We recently found that Urbach–Wiethe disease (UWD) participants with specific damage to only the basolateral nucleus of the amygdala performed like healthy controls in recognizing face or body expressions. But when shown the incongruent face-body compounds used in Experiments 2 and 3, their facial expression recognition was significantly impaired for fearful as well as for angry faces (de Gelder et al., 2012). This result shows an intriguing similarity with the pattern we found in a study of violent offenders, a group in which deficits in the amygdala have been reported repeatedly (Anderson and Kiehl, 2012). Like the UWD patients, this group showed hyper-reactivity to the negative body expressions that were not relevant for correct task performance (Kret and de Gelder, under review). Interestingly, in the former experiment there was no difference in gaze behavior between the groups. In the current study, we did not find an overall hyper-responsiveness in highly anxious subjects but we observed a specific increase in arousal in the condition where fearful faces were paired with happy bodies while gaze behavior was unaffected. The present experiments represent an important step toward using combined behavioral and physiological measures in experiments that use more complex stimuli than in the past. Further research is needed to understand how the physiological parameters used here in normal participants may or may not easily map onto the behavioral patterns.

## **REFERENCES**

Orienting to threat: faster localization of fearful facial expressions and body postures revealed by saccadic eye movements. *Proc. Biol. Sci.* 276, 1635–1641.

## **CONCLUSION**

Common sense tends to hold that we read facial expressions like we read words on a page, meaning that we directly and unambiguously access the meaning word by word. However, the happy, angry, and fearful faces we see leave room for interpretation, as is clearly seen in the strong influence of the body expressions on recognition accuracy. In turn, bodily expressions are not free from contextual influences either and are recognized depending on the facial expression with which they are presented. We consistently found that participants focused more of their attention on angry and fearful versus happy cues, and this held for bodies as well as for faces. Moreover, when confronted with fear and anger, participants' corrugator muscle became more active. These effects were most pronounced as a function of increased anxiety.

#### **ACKNOWLEDGMENTS**

We thank A. van Boxtel for his advice regarding the EMG measurements, J. Shen for help with the Eyelink system, Jorik Caljauw from Matlab support for his help with writing Matlab scripts, and S. Bell for editorial assistance. Research was supported by NWO (Nederlandse Organisatie voor Wetenschappelijk Onderzoek; 400.04081) and European Commission (COBOL FP6-NEST-043403 and FP7 TANGO) grants to Beatrice de Gelder, by a Vidi grant (#452-07-008) from NWO to Karin Roelofs and by a grant from the Royal Netherlands Academy of Sciences (Dr. J. L. Dobberke Stichting) to Mariska E. Kret.


stimulus basis for measuring perception of whole body expression of emotions. *Front. Psychol.* 2:181. doi:10.3389/fpsyg.2011.00181


Interpretation biases in social anxiety: response generation, response selection, and self-appraisals. *Behav. Res. Ther.* 45, 1505–1515.


awareness similarly for facial and bodily expressions. *Front. Hum. Neurosci.* 5:132. doi:10.3389/fnhum.2011.00132


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2012; accepted: 11 January 2013; published online: 08 February 2013.*

*Citation: Kret ME, Stekelenburg JJ, Roelofs K and de Gelder B (2013) Perception of face and body expressions using electromyography, pupillometry and gaze measures. Front. Psychology 4:28. doi: 10.3389/fpsyg.2013.00028*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Kret, Stekelenburg, Roelofs and de Gelder. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## The role of the environment in eliciting phantom-like sensations in non-amputees

## **Elizabeth Lewis\*, Donna M. Lloyd and Martin J. Farrell**

School of Psychological Sciences, The University of Manchester, Manchester, UK

#### **Edited by:**

Judith Holler, Max Planck Institute for Psycholinguistics, Netherlands

#### **Reviewed by:**

Roger Newport, University of Nottingham, UK; Takao Fukui, INSERM U1028, France

#### **\*Correspondence:**

Elizabeth Lewis, School of Psychological Sciences, The University of Manchester, Zochonis Building, Brunswick Street, Manchester M13 9PL, UK. e-mail: elizabeth.lewis-2@postgrad.manchester.ac.uk

Following the amputation of a limb, many amputees report that they can still vividly perceive its presence despite conscious knowledge that it is not physically there. However, our ability to probe the mental representation of this experience is limited by the intractable and often distressing pain associated with amputation. Here, we present a method for eliciting phantom-like experiences in non-amputees using a variation of the rubber hand illusion in which a finger has been removed from the rubber hand. An interpretative phenomenological analysis revealed that the structure of this experience shares a wide range of sensory attributes with subjective reports of phantom limb experience. For example, when the space where the ring finger should have been on the rubber hand was stroked, 93% of participants (i.e., 28/30) reported the vivid presence of a finger that they could not see and a total of 57% (16/28) of participants who felt that the finger was present reported one or more additional sensory qualities such as tingling or numbness (25%; 7/28) and alteration in the perceived size of the finger (50%; 14/28). These experiences indicate the adaptability of body experience and share some characteristics of the way that phantom limbs are described. Participants attributed changes to the shape and size of their "missing" finger to the way in which the experimenter mimed stroking in the area occupied by the missing finger. This alteration of body perception is similar to the phenomenon of telescoping experienced by people with phantom limbs and suggests that our sense of embodiment not only depends on internal body representations but on perceptual information coming from peripersonal space.

**Keywords: amputation, embodied experience, interpretative phenomenological analysis, peripersonal space, phantom limb, proprioception, rubber hand illusion**

## **INTRODUCTION**

The sense of one's own body is largely determined by the multisensory integration of visual, tactile, proprioceptive, and kinesthetic (and possibly auditory) information. Multisensory information about the body comes not only from the body itself, but also from the surrounding environment with which the body interacts. Though we are aware of the resulting sense of embodiment, we are not normally aware of the multisensory integration that produces it. This is only revealed in abnormal situations, such as body illusions, of which the rubber hand illusion (RHI; Botvinick and Cohen, 1998) is a striking example, and pathological phenomena, such as the experience of phantom limbs. Given that the RHI and phantom limbs both result from the alteration of the sensory input that creates the sense of embodiment, the goal of the present paper was, firstly, to determine whether an analog of the phantom limb experience could be created in participants without an amputation through a variation of the RHI. Secondly, we wanted to demonstrate the usefulness of first-person methodology in investigating the experience of embodiment. Although embodiment has been the subject of a wealth of studies investigating normal body representation (e.g., Botvinick and Cohen, 1998; Armel and Ramachandran, 2003), only a handful of studies have sought to understand the phenomenological aspects and determinants of subjective experience underlying abnormal body representations, such as phantom limbs. Such investigations are rare because they are difficult to carry out: the phantom limb is often a transitory phenomenon and, even when it is relatively long-lasting, the experience of pain that is associated with it makes it difficult for patients to reflect analytically on their experience of the phantom limb. The ability to create an analog of the phantom limb experience in intact participants could, therefore, prove to be an important tool in investigating the sense of embodiment and how it relies upon our interaction with the environment.

Our sense of embodiment depends on both bottom-up factors, in the form of incoming sensory information, and on top-down factors, such as the body schema. The body schema acts as a constraint on the sensory processes that underpin body representation by forming a tacit expectation of the body's possible movements (Head and Holmes, 1911; Cardinali et al., 2009). Previous studies have shown that when the expectation of the body schema is violated (as in the case of paralyzed limbs) this may, in some cases, lead to discomfort and a sense of "disownership" of the limb (Moseley et al., 2008) so that the person no longer feels that the limb belongs to them. Embodiment, therefore, can be seen as dependent on the functional capabilities of the body: if the body part is no longer in use or useful then it may become "disowned." This holistic sense of self arises through the interaction between the representation of the body modified through multisensory integration and the perception of being in control of the body (i.e., the sense of agency), which forms judgments of self-attribution and limb ownership.

However, this holistic embodiment breaks down when alterations in body representation occur through neural damage or through a perceptual illusion. For example, in the RHI, a viewed prosthetic hand is stroked in precise spatial and temporal synchrony with the stroking of a participant's concealed hand. The majority of people report perceiving the touch from the rubber hand as if it were part of their own body. In other words, a "sense" of embodiment is transferred to an external object and the real hand is disowned. The subjective experience of this feeling of ownership over the rubber hand has generally been measured using self-report questionnaires (e.g., Botvinick and Cohen, 1998; Armel and Ramachandran, 2003; Ehrsson et al., 2004, 2005, 2008; Mussap and Salton, 2006; Schaefer et al., 2006; Durgin et al., 2007; Kitadono and Humphreys, 2007; Lloyd, 2007; Tsakiris et al., 2007; Haans et al., 2008; Longo et al., 2008; Moseley et al., 2008; Capelari et al., 2009; Dummer et al., 2009; Kammers et al., 2009; Schütz-Bosbach et al., 2009; Shimada et al., 2009). Several authors have argued that the feeling of ownership over the rubber hand is induced because vision and touch capture converging, correlated information and this forms a meaningful percept, i.e., the visual perception of a hand being touched co-occurs with the tactile sensation of the hand being touched. This perception becomes dominant, and the conflicting proprioceptive information, which indicates the true position of the participant's hidden hand, is adapted leading to proprioceptive distortion (measured as the distance the intact hand is believed to have moved from its original starting point toward the rubber hand).
Human functional neuroimaging studies have revealed that visual, tactile, and proprioceptive inputs relating to limb position are integrated in the premotor and parietal cortices (Lloyd et al., 2002) and the degree of premotor activation shows a linear relationship to how participants subjectively rate the illusion (Ehrsson et al., 2004, 2005).

The RHI demonstrates that the sense of embodiment is strongly influenced by the sensory information produced through interaction with the environment. In the case of the RHI, it is not only proprioceptive information generated internally by the body, but also visual and tactile information generated through interaction with the experimenter, that results in an altered sense of embodiment. Indeed, as we have seen, the proprioceptive sense is actually distorted so that it fits to a greater extent with the visual and tactile information. The idea that the relationship between the body and the world is integral to the overall sense of embodiment (i.e., we get a sense of our own bodies via their interaction with the world) also helps to make sense of other phenomena. In asomatognosia, for example, in which patients have no sense of ownership over one of their limbs, placing the neglected limb into the attended (i.e., contralateral) body space restores multisensory processing (Moro et al., 2004). In complex regional pain syndrome (CRPS), not only does the affected limb feel cooler, but that whole side of space appears to be physiologically dysregulated: when the affected limb was moved to the opposite side of space it got warmer, and when the "good hand" was moved to the affected side of space it got cooler (Moseley et al., 2012). These clinical findings point to the same conclusion as experimental studies of the RHI: our sense of embodiment is dynamic and dependent on the body's positioning within the space that surrounds it.

Peripersonal space is the region surrounding the body that acts as the interface between the body and the environment for defensive and purposeful action (Cardinali et al., 2009). Neurophysiologists have defined peripersonal space based on the spatial limits of visual receptive fields of individual neurons most often found in the parietal and premotor cortices of non-human primates (e.g., Graziano et al., 1997). For example, in monkey posterior parietal cortex, peripersonal space encoding involves ventral intraparietal area, which contains visuotactile bimodal cells for the face, arm, and hand. These cells use a body-part-centered reference frame to represent visual space around the body, such that the visual receptive field of the bimodal cell is bound to the space surrounding the tactile receptive field of a particular body part. Visual signals from a region of the body can therefore activate a somatotopic map relating to that body part and can remap with changes in posture (Graziano and Gross, 1994). Recent evidence from a functional magnetic resonance imaging (fMRI) study suggests that intraparietal areas similarly encode objects in the peripersonal hand space of humans and may also have a role in visually guided grasping (Makin et al., 2007). Thus, in addition to spatial location influencing our sense of embodiment, the body plays a role in structuring peripersonal space (i.e., it is body-part-centered). There are, in other words, reciprocal relations of influence between the sense of our bodies and our perception of peripersonal space.

One of the ways in which we can get a handle on the sense of embodiment and how it arises is through the investigation of abnormal embodiment. In cases such as these, the normally "invisible" processes that underlie embodiment become more apparent than is usually the case. One of the most striking forms of abnormal embodiment is the phantom limb. Ambroise Paré first reported phantom phenomena in amputee soldiers in the mid-sixteenth century. However, it wasn't until Silas Weir Mitchell published the first detailed study (where the term "phantom" was used) in the nineteenth century that phantom limb phenomena (PLPh) became recognized as real sensory experiences and not psychiatric symptoms (Mitchell, 1871). PLPh have been defined as a "continuous awareness of a (or part of a) non-existing or de-afferented body part with specific form, weight, or range of motion" (Ribbers et al., 1989) and have been reported not only in amputees, but also in paraplegics and people with a congenital absence of limbs (Melzack and Loeser, 1978). PLPh may be felt as pins and needles, itching, tingling, or numbness (Katz, 1992; Montoya et al., 1997) whereas others experience embodiment without sensation (i.e., they know that the phantom is there but have no feeling in it – Hunter et al., 2003; Richardson et al., 2006). Movement and position sense may also remain intact in the phantom limb and can be spontaneous (i.e., spasm) or volitional. For example, some patients report gesturing during conversations and can carry out finger-aided counting (Ramachandran, 1993; Saadah and Melzack, 1994; Ramachandran and Hirstein, 1998; Brugger et al., 2000). Some patients also report the sensation of telescoping, in which the phantom shortens over time until only the digits remain on the end of the stump (Weiss and Fishman, 1963; Jensen et al., 1984).
There may also be super-added sensations of feeling a watch on the arm or clothing against the phantom skin (Wesolowski and Lema, 1993) indicating that memory systems may help to maintain the phantom experience (Katz and Melzack, 1990; Richardson et al., 2006).

Capturing the subjective experience of embodiment, whether that be the experience of phantom limbs or the RHI, tends to rely on self-report questionnaires using Likert scales in which participants rate their level of agreement or disagreement with statements describing the experience (e.g., "the rubber hand feels like my hand"). But these questionnaires, which use a limited range of items, may have obscured the important subcomponents of embodied experience. A more rigorous analysis of first-person accounts can provide a scientific description of the phenomena and serve as the basis for quantification. An excellent example of this is a study of the alien hand experiment (TAHE) by Sørensen (2005), a paradigm first reported by Nielsen (1963). In TAHE, the participant is asked to draw a series of objects, which he/she can only see via a tilted mirror. On some trials, the mirror is tilted in such a way that the participant actually sees the experimenter draw the object instead, giving the subjective experience that the participant is not in control of their body. By using first-person reports of the phenomena as experienced by the participant, categories for quantification were disclosed by the data themselves and it was possible to show that concepts such as agency and body schema do reflect real phenomenological aspects of experience.

A more recent study confirmed the utility of first-person methods for the study of embodiment. First-person accounts of subjective experience during the RHI were analyzed using interpretative phenomenological analysis (IPA; Lewis and Lloyd, 2010). IPA has its roots in symbolic interactionism; as such, it is concerned with how meanings are constructed by individuals both on a social and personal basis (Smith and Osborne, 2007). People may experience the same objective event but the meanings that each person attaches to it may be very different, and these personal meanings and feelings will be reflected in the language used to describe the experiences. In IPA, the researcher attempts to find the themes and categories that emerge naturally from the freely produced discourse of participants rather than imposing a preconceived set of categories on participants' responses. Such an approach is advantageous for investigations of novel phenomena, such as the RHI and PLPh, which may never have been experienced before, and therefore require an open and flexible approach to the language used by participants. In the study by Lewis and Lloyd (2010) IPA revealed four main themes of embodied experience during the RHI: recalibration of the body schema; violation of the body schema; multisensory integration; and illusory experience over time. Furthermore, the report of agency was a significant predictor of the amount (in centimeters) of proprioceptive distortion. This study shows how first-person methodologies can be empirically rigorous and how the introspective interview provides a rich, detailed account of embodied experience.

The aim of the present study was to establish, through the use of first-person phenomenological methods, whether an analog of the subjective experience of a phantom limb could be induced in intact participants by having them take part in a modified version of the RHI. We manipulated the RHI paradigm by removing the ring finger from a right rubber hand. Through this manipulation we aimed to discover whether participants, after being induced to feel ownership over the rubber hand, would feel the presence of the absent finger in a way analogous to the feeling of a phantom limb.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Thirty right-handed participants (13 males, 17 females; aged 19–29 years, mean age of 22.5 years) were recruited from the University of Manchester via opportunity sampling. The study was approved by the School of Psychological Sciences research ethics committee. Participants were screened for tactile and proprioceptive impairments of the right hand. They received five pounds compensation for their time.

### **MATERIALS**

Participants sat at a table across from the experimenter and placed their right hand inside a specially constructed box (40 cm × 27 cm × 7 cm; **Figure 1A**). The ring finger of their right hand was placed on a predetermined spot concealed from view. The side of the box facing the experimenter was open so that the experimenter could stroke the participant's hand. A prosthetic (right) rubber hand was placed palm down on top of the box so that there was a 10 cm horizontal separation between the rubber hand and the participant's real hand. The rubber hand used in the standard RHI condition was intact and the rubber hand used in the missing finger condition was identical except that the ring finger had been cut off prior to the experiment.

The rubber hand was placed in full view of the participant and covered from the wrist by a fake sleeve so that it was a plausible extension of the participant's right arm. A second larger box with a hidden ruler (46 cm × 27 cm × 12 cm; **Figure 1B**) was used to conceal the equipment and take proprioceptive measurements. Both boxes were covered with felt to conceal any distinguishing marks which may have aided localization.

### **PROCEDURE**

**FIGURE 1 | (A)** Participants viewed a rubber hand with a missing finger on top of an open-sided box while their real hand was hidden from view beneath the box. The experimenter stroked both the real and the rubber hand simultaneously through one of the box's open sides; **(B)** A larger box with a ruler was used to measure participants' estimates of the position of their real hand on the basis of proprioception alone. This was done before and after inducement of the RHI and only the experimenter, seated opposite the participant, could see the markings on the ruler.

Participants took part in both the standard RHI and the missing finger versions. They completed the intact version of the illusion first as it was necessary to establish the basic illusion before the missing finger variation was performed. As it was the missing finger version that was the focus of interest, this order of presentation had the additional benefit of getting participants used to talking about the illusion. Participants had a 2–3 min break between the illusions where they were allowed to move freely. In both conditions, participants placed their right hand inside the apparatus while their left hand rested comfortably on the table beside the apparatus. Participants were informed that they would take part in an illusion and that they should describe their experience of this illusion as fully as possible. In addition, they were asked to watch the rubber hand and keep their right hand still. A single initial proprioceptive measure was obtained by placing the larger box over the equipment and sliding a marker across the hidden ruler until the participant indicated that it was over the location of their ring finger. Then the audio recorder was started. In both conditions the fingers of the rubber hand (excluding the ring finger) were stroked simultaneously with the participant's own fingers. If the participants did not provide a description of their experience after 1.5 min they were asked to describe how they felt about the rubber hand while they watched it being stroked. The RHI was considered established when the participant stated that either "the rubber hand felt like their hand," "the touch was felt in the rubber hand," or they felt that they could "move the rubber hand volitionally." Then the experimenter stroked the ring fingers simultaneously. During the missing finger condition, the experimenter mimed the stroking of a finger in the empty space that would have been occupied by the ring finger of the rubber hand while simultaneously stroking the participant's ring finger. Participants were reminded to describe their experience of the illusion.
To encourage this they were prompted with the following questions: how does this feel? How do you feel about your real hand? Is this a comfortable experience? The questions were purposefully vague and their order was not fixed. After 3 min, a second proprioceptive measure was taken in the same manner as the first.

### **ANALYSIS**

Audio recordings were transcribed verbatim and analyzed using Interpretative Phenomenological Analysis (IPA: Smith, 1996; Smith et al., 1999).

Transcripts were analyzed individually using a two-stage process (**Figure 2** illustrates the elements of this analysis). After familiarization with a transcript, the left margin was used to code the themes of the adjoining parts of the transcript. Then, similar or related codes were collapsed into broader themes, which were noted in the margin on the right. Once this was complete for every transcript the data were considered as a whole. Variations in themes across transcripts were used to establish broader themes which could demonstrate the structure of the experience across the entire sample. Specifically, the themes highlighted differences between the transcripts in each condition. Themes are presented with examples of the participants' discourse as evidence as well as the identifying number for that participant in parentheses after each example of discourse. Quantitative measures of proprioceptive distortion were analyzed using analysis of variance (ANOVA) in SPSS v.16.

## **RESULTS**

### **QUANTITATIVE RESULTS**

A 2 × 2 repeated measures ANOVA (before/after illusion × finger-absent RHI/finger-present RHI) was conducted to assess whether proprioceptive judgments of the position of the ring finger of the right hand were shifted away from its objective position before and after each illusion (finger-present vs. finger-absent). There was a significant main effect of the time point of the proprioceptive measure [*F*(1, 29) = 221.34, *p* < 0.001], but there was no main effect of finger presence (*F* < 1) and no interaction between the time point and finger presence [*F*(1, 29) = 1.41, *p* = 0.245]. Participants judged their finger to be significantly further away from its actual location after the illusion (present *M* = 7.03 cm, SD = 2.27 cm vs. absent *M* = 6.80 cm, SD = 3.13 cm) than they did at baseline (present *M* = 1.23 cm, SD = 1.61 cm vs. absent *M* = 1.63 cm, SD = 1.71 cm), but the presence or absence of the finger did not influence the proprioceptive judgments (see **Figure 3**).
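For readers who want to see what such an analysis computes, the following is a minimal sketch and not the SPSS procedure used in the study. It relies on the fact that, in a fully within-subject 2 × 2 design, each effect has one degree of freedom, so each *F*(1, *n* − 1) equals the squared one-sample *t* statistic on a per-participant contrast score. The function names and the four-participant values are hypothetical and purely illustrative; they are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores):
    """Return (F, (df1, df2)) for a one-df within-subject contrast.

    F is the squared one-sample t of the contrast scores against zero.
    Raises ZeroDivisionError if all scores are identical.
    """
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)


def rm_anova_2x2(pre_present, post_present, pre_absent, post_absent):
    """Each argument holds one judgment error (cm) per participant."""
    rows = list(zip(pre_present, post_present, pre_absent, post_absent))
    # Main effect of time point: mean of post scores minus mean of pre scores.
    time = [(b + d) / 2 - (a + c) / 2 for a, b, c, d in rows]
    # Main effect of finger presence: present mean minus absent mean.
    finger = [(a + b) / 2 - (c + d) / 2 for a, b, c, d in rows]
    # Interaction: shift in the present condition minus shift in the absent one.
    interaction = [(b - a) - (d - c) for a, b, c, d in rows]
    return {
        "time": contrast_f(time),
        "finger": contrast_f(finger),
        "time x finger": contrast_f(interaction),
    }


# Hypothetical values for four imaginary participants (NOT study data).
result = rm_anova_2x2(
    pre_present=[1.0, 2.0, 0.5, 1.5],
    post_present=[6.0, 8.0, 7.0, 7.5],
    pre_absent=[1.5, 1.0, 2.0, 2.5],
    post_absent=[7.0, 6.0, 8.0, 6.5],
)
for effect, (f_value, df) in result.items():
    print(f"{effect}: F{df} = {f_value:.2f}")
```

With 30 participants the same computation yields tests on (1, 29) degrees of freedom, which matches the degrees of freedom reported above.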

### **QUALITATIVE RESULTS**

#### **Theme 1: "My invisible finger" – an altered body form changes the effect of multisensory integration on somatic experience**

When the intact fingers of the rubber hand were stroked, the participants' descriptions were the same in both conditions and their comments reflected the experience associated with the standard RHI. They described how the rubber hand felt as though it was their hand or the touch sensations felt located in the rubber hand, whilst their awareness of their own hand had diminished. During the missing finger condition, participants stated that they could still perceive their ring finger even though it was missing on the rubber hand. Many participants believed that this was the aim of the experiment and emphasized that they could be fooled into owning another hand but they could not be fooled into perceiving their finger as absent. So just seeing and feeling ownership over an altered body form – in this case the rubber hand with a missing finger – does not induce changes in the phenomenal experience of body form; the hand is still experienced as intact:

"It feels like this (rubber) hand is my hand again. . .like my hand is up there. But it doesn't feel like I'm missing a finger, it doesn't feel like my finger disappears or anything. It still feels the same as normal." (4)

When the missing finger area was stroked simultaneously with the participants' ring finger, 93% of participants (i.e., 28/30) reported that they could perceive their ring finger extending out from the stump on the rubber hand even though they could see that it was not there. They discounted the visual information indicating that the finger was missing and the correspondence of tactile cues and visual information from the mimed stroking was sufficient to elicit a percept of their ring finger. The finger was predominantly described as "invisible" or by using a metaphor to convey a physical entity which cannot be seen, for example, "my finger is made of glass" or "painted to match the color of the box." Two of the participants described an alternative experience of the illusion: when the area of the missing finger was stroked they instantly reported a holistic shift in awareness to their hidden hand. The following quotes demonstrate these two contrasting ways of describing the illusion:

"Oh my god, I just felt like my hand was invisible! The finger isn't there but I feel like it should be so I feel like it is there, I just can't see it. The rubber hand before felt like it was my hand and this also feels like it is my hand. I just feel like Harry Potter's invisibility cloak has been draped over my ring finger. It is there I just can't see it. I feel like if I moved my finger toward it I would be able to touch it even though it is not there. Even though I can't see it, it doesn't feel like its missing or not there." (2)

"I was getting the feeling that this rubber hand was my real hand but as soon as I can see that I don't have a finger here, I can get the feeling of my (actual) hand, it suddenly changes. . .You get your normal state of mind back when you can see that you don't have a finger there even when you can get the feeling there." (29)

The majority of participants described their RHI experience as being akin to their normal somatic experience and they remarked that they could "actually feel the finger." But 25% of the participants (7/28) also reported a reduction in sensation or an increase in somatic intensity, pins and needles, or numbness when they could perceive an invisible finger:

"It feels a bit tingly but it feels like you are still carrying on (stroking) down the finger. I don't know if, because I'm watching it, I tense up, but it feels more strained or tingly." (4)

"I don't feel the sensations as much as before. I definitely feel a loss of sensation at that point (the stump). My finger feels numb." (9)

#### **Theme 2: The dynamic relationship between the environment and the mental representation of the body is altered in the missing finger illusion**

During the RHI, the participants' sense of embodiment is changed by manipulating the sensory input generated by interaction with the environment. In the missing finger illusion the seen location of mimed finger stroking can determine the perceived "physical" qualities of the missing finger. It was difficult to trace the outline of the finger perfectly and slight, unintentional variations in the experimenter's precise stroking action naturally arose in the course of individual testing sessions. These variations influenced the perceived shape of the invisible finger. When the experimenter's stroking finger was seen to deviate from the expected shape of a finger, the participants reported that their invisible finger had changed shape, for example, by becoming longer or flatter etc. This was spontaneously reported by 50% of the participants (14/28) and the number of participants that reported different types of alteration in finger shape is given in **Figure 4**.
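Because the reported percentages use two different denominators (all 30 participants for the presence of the invisible finger, and the 28 who felt it for the additional qualities), a short arithmetic check may help. The sketch below uses only counts stated in this article, including the 16/28 figure from the abstract; the variable names are our own, and the overlap estimate assumes that the 16 participants include everyone in either subgroup.

```python
def percent(count, total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / total)


N_TESTED = 30        # all participants
N_FELT_FINGER = 28   # reported an invisible finger
N_ANY_EXTRA = 16     # one or more additional sensory qualities (abstract)
N_PARESTHESIA = 7    # tingling, numbness, or altered somatic intensity
N_SHAPE_CHANGE = 14  # altered perceived size or shape of the finger

summary = {
    "felt finger (of all tested)": percent(N_FELT_FINGER, N_TESTED),
    "any extra quality (of felt)": percent(N_ANY_EXTRA, N_FELT_FINGER),
    "paresthesia (of felt)": percent(N_PARESTHESIA, N_FELT_FINGER),
    "shape change (of felt)": percent(N_SHAPE_CHANGE, N_FELT_FINGER),
}

# Inclusion-exclusion lower bound on participants reporting both
# paresthesia and shape change, under the assumption stated above.
min_overlap = max(0, N_PARESTHESIA + N_SHAPE_CHANGE - N_ANY_EXTRA)

print(summary)
print("minimum overlap:", min_overlap)
```

Under that assumption, the subgroup counts imply that some participants reported both a paresthesia and a change in finger shape, since the two subgroups together exceed the 16 who reported any additional quality.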

The following quote is illustrative of some participants' spontaneous reaction to variations in the mimed stroking of the missing finger:

"Arrgh! That totally felt out in space. That was amazing. It's just so compelling. . .but it feels like the invisible finger is kind of less solid than my other fingers, it's kind of a bit squashier. Cos I'm looking very closely, I can see it's very hard for you to trace a finger realistically, but I'm kind of adopting that, so my invisible finger isn't as solid as the other fingers around it." (6)

The changes in form could be quite large, for example, the finger could be perceptually extended by 3 cm to the edge of the experimental equipment. But the experiences reported were not limited to perceived shape differences. For example, some participants attributed the changes to their own active movements even though they were not moving. In addition, changes to the shape of the invisible finger were, in two participants, also associated with changes to perception of the rest of the hand. The seen location of interaction, therefore, sometimes had a holistic effect on the whole hand representation and not just on the invisible finger:

"There I felt like my finger was reaching out quite a long way. I've basically got a big long ghost finger and it goes up to the boundary of wherever you make it go. That felt like my finger was really extended out, like the other fingers are clenched in a bit. I felt like I was pushing that (ring) finger out really far, like that one was extended and the others were kind of flexed." (6)

Perceived changes in the form of the hand were not reported during the standard RHI. When comparing the two conditions, the standard RHI was described as "normal" or "not unusual." There was a range of reactions to the missing finger condition. No participants reported a painful or uncomfortable experience but some did experience it as aversive. The participants were aware that their experiences were illusory but this did not diminish their impact. It was possible to experience an invisible finger but the participants described it as being accompanied by a sense of wrongness or as somehow being invalid. They were motivated to resolve the conflict between the visible absence of the finger and the felt presence of touch through active movement:

"When you stroked up to the stump, it felt about right and then I was almost expecting not to feel anything (past the stump), as ridiculous as that sounds. And when I still felt it, it was like an extreme sense of wrongness to be honest. Almost like I immediately wanted to move and shake my hand. The urge to move is quite marked. I'd say it was frustrating in that my immediate reaction would be to clench and unclench my fist to make sure everything is working." (11)

In fact, two participants not only felt this urge to move their fingers but actually removed their hand from the equipment.

During the standard RHI, participants often report that they feel as though their hand has taken on the rubbery texture of the rubber hand. Cross-modal texture effects were also reported in the missing finger condition; however, these related to textures within the environment as opposed to the surface texture of the body. Two participants reported that they could feel the felt-covered surface of the box on which the rubber hand was resting even though their hand was resting on the smooth surface of the table:

"When you were pressing down, I actually felt (that) it should hurt more than that because you went so low into the table; like my invisible finger was being pressed but then it wasn't. It's very realistic. When you press on the fingernail at the end. . .that very much feels like I'm pressing into the felty surface. So now I can really feel it on the felty stuff." (6)

## **DISCUSSION**

In both versions of the RHI, finger-present and finger-absent, errors in the proprioceptive judgments of finger position were larger (i.e., closer to the position of the rubber hand) after the induction of the illusion than they were prior to the induction of the illusion. There was no significant difference in judgments about finger location between the two forms of the illusion, and so both elicited a similar amount of proprioceptive adaptation. Even when there was no rubber finger to embody in the missing finger condition, the participants felt their real finger to be located in the area where the missing finger would have been. First-person reports confirmed that participants felt their finger protruding from the stump of the rubber hand and that they could feel an "invisible finger" in the location where the missing finger would have been. Although perceived finger location was comparable in each version of the illusion, the subjective experience of finger presence was different in the two cases. In the missing finger condition, participants typically reported somatic experiences such as numbness, which attenuated the strength of the reported tactile sensations. In addition, in the missing finger condition the perceived form of the invisible finger was not fixed and at certain times it was reported to change size and shape. These perceptions were often accompanied by a sense of "wrongness" and a recognition that these sensations should not happen. This was not the case in the standard, finger-present, version of the RHI, in which normal feelings of body awareness are experienced as located in the rubber hand.

In the current study 93% of the participants reported that they felt their finger extending out from the stump of the rubber hand and this is comparable to the incidence of a continued experience of a removed body part in amputees, which is as high as 98% in some samples (Ramachandran and Hirstein, 1998). Within the general phantom limb experience, however, there is a range of more specific phenomena that are only seen among subgroups of amputees. Some, for example, can describe the shape, size, and range of movement of the phantom (Katz, 1992; Richardson et al., 2006), whereas others only have a more vague sense of the phantom's presence. Some, in addition to the sense of the phantom's presence, have other sensations, such as pain or tingling. The prevalence of these more specific types of experience has been difficult to establish in amputees, but similar experiences were seen in some of the missing finger RHI participants. A large subsection of these participants could also describe the perceived shape and form of the finger even though they could not see it and additional sensations such as numbness, tingling, and other paresthesias were reported by around 25% of people. The missing finger illusion elicits a range of sensory phenomena which are associated with PLPh and may, therefore, provide a useful method for investigating the underlying mechanisms of some phantom limb experience.

However, not all aspects of PLPh were replicated by this illusion. Firstly, the variation in position of the phantom and movement of the phantom, both of which occur in some real phantom limb cases, could not be elicited by the illusion due to the static position of the rubber hand. Secondly, noticeably absent from the range of experiences reported by our participants is any feeling of physical pain due to the missing finger illusion. This may be unsurprising given that none of our participants had actually undergone a traumatic amputation of their ring finger. However, it may also suggest that different mechanisms underlie non-painful and painful sensory phenomena. It is now well documented that phantom limb pain after upper limb amputation is associated with cortical re-organization of the primary somatosensory and motor cortices of the brain (Lotze et al., 2001; MacIver et al., 2008). Functional brain imaging studies show that activation of the lip/face area extends beyond its cortical boundaries to incorporate cortex normally devoted to processing information from the hand/arm. Furthermore, the intensity of the pain is positively correlated with the extent of re-organization, which can be reversed using an intervention based on mental imagery where amputees imagine moving the phantom limb (MacIver et al., 2008). At present it is unclear what causes such extensive re-organization to occur, although there are several theories (see Subedi and Grossberg, 2011, for a recent review). One theory suggests that it is the lack of afferent input to primary sensory cortex that results in re-organization as the brain utilizes redundant cortex.
This highly influential theory has provided the rationale for treatments for phantom limb pain, such as the mirror box (Ramachandran and Rogers-Ramachandran, 1996), which aim to restore afferent sensory input and provide motor feedback, although mirror therapy in general has had mixed results in randomized controlled trials (Brodie et al., 2007; for a discussion of this see Moseley et al., 2008).

The phantom pain and sensation may have its onset immediately or years after the amputation. There are reports of two peak periods of onset, the first within a month and the second a year after amputation (Schley et al., 2008). If the pain associated with phantom limbs is due to cortical re-organization, we would not expect to see this in the short time period over which we tested participants. Non-painful phenomena widely reported in the phantom limb literature and elicited in our study likely precede the development of pain, which takes longer to establish. One possibility is that non-painful phenomena are dependent upon altered multisensory correlations following amputation, which occur over shorter time periods. Cortical re-organization may then occur as a response to these altered multisensory inputs as the brain attempts to re-associate or adapt cortical areas to the objective form of the body. This proposed mechanism for the development of phantom limb pain is based on the idea of the "body schema" (Head and Holmes, 1911). The body schema can be thought of as a template of the entire body in the brain. Any change to the body, such as an amputation, results in the perception of a phantom limb. More recently, Melzack (1989) has proposed that the body schema is formed through a "neuromatrix," which is a network of neurons that integrate inputs from the somatosensory, limbic, and visual and thalamocortical regions of the brain and a "neurosignature," which is the patterns of brain activity that are constantly updated based on the conscious awareness of the bodily self (see also Iannetti and Mouraux, 2010). Together, they form output patterns, which can determine pain and meaningful bodily experience, such that deprivation of sensory input from the limbs leads to disruption of the neuromatrix, an abnormal neurosignature, and the development of pain. 
In addition to sensory-motor cortex, the parietal and frontal lobes have also been shown to be involved in both the normal and abnormal multisensory representation of the limb (Lloyd et al., 2002; Ehrsson et al., 2004) and may underlie phantom sensations including pain (McCabe et al., 2005). Future studies using functional brain imaging should help to determine whether cortical re-organization occurs due to an absence of sensory input or due to an altered pattern of multisensory integration and whether this correlates with subjective pain.

It may appear that, in addition to the absence of pain, another difference between the missing finger illusion and phantom limbs is that the illusion is clearly elicited by external stimulation provided by the experimenter's mimed stroking of the space where the missing finger would be, whereas phantom limbs seem to be the result of internal processes and representations rather than external stimulation. We believe, however, that the current findings point to the possibility that phantom limbs are not the result of purely internal processes but that they too, like the missing finger, are influenced by sensory information coming from the external environment. There are many anecdotal reports suggesting that amputees can feel alterations in the phantom experience as a result of interaction with the environment. For example, a phantom limb may recede or telescope to avoid obstacles (Giummarra et al., 2010), and textures can be felt through the phantom (Björkman et al., 2010; Weeks et al., 2010). In addition, treatments which are seen to "stimulate" the phantom have been reported to be successful (Huang et al., 2009). In these instances the experience of the phantom limb is felt to change and the amputee attributes it to some physical object seen near to the body. When a sample of amputees who reported PLPh were surveyed about the perceived causes of their phantom experiences, the weather was selected by 47% and the next most commonly selected cause was stress, which was selected by 8% (Sherman et al., 1984). These examples demonstrate that amputees sometimes attribute changes in their PLPh to external causes. There is currently no explanation of how this could occur. During normal embodiment, visual cues do not have to be seen on the body to elicit a change in somatic experience. Just seeing a threatening object near the body elicits fear and a physiological stress response and activates parietal brain regions encoding peripersonal space (Lloyd et al., 2006).
This ability has been related to the existence of peripersonal space, a region around the body where visual cues are processed as relevant for the body either in terms of reaching, avoidance, or threat detection. Peripersonal space is demonstrated by the existence of cross modal interactions between vision and touch. PLPh may occur due to the same cross modal interactions that support normal embodiment but in a different situation than has ever been experienced before, i.e., when part of the body is missing.

The body changes continuously as we grow up and grow old and the body schema must also change to accommodate these changes. Many experiments demonstrate that changes to the experienced form of the body occur by resolving discrepancies between sensory information. In the extending nose procedure a participant can feel their nose becoming longer when reaching forward and touching another person's nose whilst a third person touches the participant's nose (Ramachandran and Blakeslee, 1998). Vibrating a muscle tendon can also lead to the illusory perception that a limb is becoming longer (Lackner, 1988). In these examples the body percept changes because sensory experiences that reference limb position are incongruent. The same perceptual adaptation most likely underlies the changes in size and shape reported during the missing finger illusion. Participants also report changes in the size of the hand during the standard RHI but the perceptual changes are limited to the visual form of the seen hand, which places constraints on the way that embodiment can be manipulated. For example, during embodiment illusions a participant feels their finger or arm stretch when they see it stretch, but the illusion of ownership is lost if part of the body is completely detached (Newport and Preston, 2010; Preston and Newport, 2012). In the present study, the form of the invisible finger changed very quickly and the same participant could report a range of alterations that were attributed to the seen location of the experimenter's finger. Touch information, from both the tactile and visual senses, shapes the finger percept but when a hand-like object is embodied it is the properties of the object that are adopted. Conversely, in the disappearing hand illusion, when there is no body form to see and visual and tactile inputs are absent, there is an absence of embodiment, such that participants cannot feel or locate that body part (Newport and Gilpin, 2011).
This situation corresponds to a common sense view of the sensory input that one would expect to be available to an amputee but, in a phenomenological sense, the experience of a phantom limb is more similar to experiences in the missing finger illusion. Collectively this suggests that residual sensory information in the context of an unseen body form may contribute to phantom experience.

The invisible finger phenomena demonstrate how awareness of peripersonal space cannot be thought of as something completely separate from the sense of embodiment. Indeed, they demonstrate that areas of objectively empty space near the body can themselves become part of the subjective embodiment experience. The idea of reciprocal lines of influence connecting sense of embodiment and perception of nearby space is consistent with evidence suggesting that the representation of space near the body changes following amputation. When comparing distances in the landmark position judgment task (Makin et al., 2010), amputees use the intact side of their body in near, but not far, space judgments, suggesting that they come to neglect the space near the missing hand. Space representation is dependent upon body understanding so when it changes, either through illusions, such as the RHI, or through physical alterations to body form such as amputation, the way that space around the body is represented also changes.

An illustration of the intimate connection between the sense of embodiment and peripersonal space can be found in the work of Moseley et al. (2008). A consequence of the RHI is that the temperature of the participant's real hand is lowered when ownership is transferred to the rubber hand (Moseley et al., 2008). A fall in limb temperature is also measured in CRPS, where damage to the nerves causes people to experience chronic pain and numbness and tingling sensations in the affected body part. These patients show altered tactile processing such that they prioritize tactile information in the unaffected hand over the affected hand. But this effect is reversed when the hands are crossed over the body and tactile input in the affected hand is now prioritized because it rests in the unaffected side of space (Moseley et al., 2009). This again demonstrates that how we experience our own bodies is not just a matter of what is happening within the body itself, but also is affected by the body's relationship with surrounding objects and spaces.

In this experiment we have demonstrated that an experience similar to that of the phantom limb can be induced in intact participants using a variation of the RHI. We have also shown, through the use of first-person methods, how this experience reproduces various phenomena associated with phantom limbs, although, significantly, not the pain or discomfort that is sometimes associated with phantom limbs. The way in which the experienced invisible finger is felt to grow in length or to otherwise alter its shape as a function of the way in which the experimenter mimicked the stroking of the missing finger indicates how the sense of embodiment is altered by the body's relationship with its surroundings. Given the similarities between this illusion and PLPh, it may be the case that phantom limbs, and the way that they alter their size and shape over time, are not only a function of an internal body representation but are influenced by relationships within peripersonal space. This, of course, is a matter requiring much further research, but is nevertheless suggestive of possible influences on the way that phantom limbs change over time.

The mediation of sense of embodiment by perception of the surrounding environment has wide philosophical implications, as it suggests that the boundary between self and world is not something absolute and clear-cut. This is a view that has long been advocated by thinkers in the phenomenological tradition, such as Merleau-Ponty (1958) who noted that foreign objects frequently become part of the subjectively experienced body: "The blind man's stick has ceased to be an object for him. . . its point has become an area of sensitivity, extending the scope and active radius of touch" (p. 165). The blind person is, nevertheless, still aware of the stick, just not directly as an object, but indirectly via other objects: "In the exploration of things, the length of the stick does not enter expressly as a middle term: the blind man is rather aware of it through the position of objects than the position of objects through it." (pp. 165–166). I am, in other words, "conscious of my body *via* the world" (p. 94). The experiences generated by the RHI in the present study are consistent with Merleau-Ponty's phenomenological analysis of embodiment.

## **CONCLUSION**

In this study we have been able to create an analog of the phantom limb experience in intact participants by using a variation of the RHI in which one of the fingers was missing from the rubber hand. Analysis of first-person reports not only indicated a sense of presence of the missing finger, but the experience among some participants of a number of more specific sensations, such as tingling, associated with phantom limbs. The missing finger version of the RHI may, therefore, provide a means of investigating aspects of embodiment that are difficult to investigate in phantom limb patients themselves. In addition, the way in which the perceived size and shape of the invisible finger altered in the present study indicates that sense of embodiment depends on incoming sensory information from peripersonal space. This is consistent with previous phenomenological work on embodiment and suggests that aspects of the phantom limb experience itself may depend crucially on perception of surrounding space and interactions with the objects in it.

## **REFERENCES**

multisensory brain areas. *J. Neurosci.* 25, 10564–10573.

somatosensory cortex. *Neuroimage* 29, 67–73.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 September 2012; accepted: 18 December 2012; published online: 18 January 2013.*

*Citation: Lewis E, Lloyd DM and Farrell MJ (2013) The role of the environment in eliciting phantom-like sensations in nonamputees. Front. Psychology 3:600. doi: 10.3389/fpsyg.2012.00600*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Lewis, Lloyd and Farrell. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Training of manual actions improves language understanding of semantically related action sentences

#### **Matteo Locatelli<sup>1</sup>, Roberto Gatti<sup>1,2</sup> and Marco Tettamanti<sup>2,3</sup>\***

<sup>1</sup> Laboratory of Movement Analysis, Vita-Salute San Raffaele University, Milano, Italy

<sup>2</sup> Division of Neuroscience, San Raffaele Scientific Institute, Milano, Italy

<sup>3</sup> Department of Nuclear Medicine, San Raffaele Scientific Institute, Milano, Italy

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Anna M. Borghi, University of Bologna, Italy

Lawrence Taylor, Northumbria University, UK

Arthur M. Glenberg, Arizona State University, USA

#### **\*Correspondence:**

Marco Tettamanti, Scientific Institute San Raffaele, Via Olgettina 58, I-20132 Milano, Italy. e-mail: tettamanti.marco@hsr.it

Conceptual knowledge accessed by language may involve the reactivation of the associated primary sensory-motor processes. Whether these embodied representations are indeed constitutive of conceptual knowledge is hotly debated, particularly since direct evidence that sensory-motor expertise can improve conceptual processing is scarce. In this study, we sought this crucial piece of evidence by training naive healthy subjects to perform complex manual actions and by measuring, before and after training, their performance in a semantic language task. Nineteen participants engaged in 3 weeks of motor training. Each participant was trained in three complex manual actions (e.g., origami). Before and after the training period, each subject underwent a series of manual dexterity tests and a semantic language task. The latter consisted of a sentence-picture semantic congruency judgment task, with 6 target congruent sentence-picture pairs (semantically related to the trained manual actions), 6 non-target congruent pairs (semantically unrelated), and 12 filler incongruent pairs. Manual action training induced a significant improvement in all manual dexterity tests, demonstrating the successful acquisition of sensory-motor expertise. In the semantic language task, the reaction times (RTs) to both target and non-target congruent sentence-picture pairs decreased after action training, indicating more efficient conceptual-semantic processing. Notably, the RTs for target pairs decreased more than those for non-target pairs, as indicated by the 2 × 2 interaction. These results were confirmed when controlling for the potential bias of increased frequency of use of target lexical items during manual training. The results of the present study suggest that sensory-motor expertise gained by training of specific manual actions can lead to an improvement of cognitive-linguistic skills related to the specific conceptual-semantic domain associated with the trained actions.

**Keywords: embodied cognition, conceptual-semantics, language understanding, sensory-motor system, action training**

## **INTRODUCTION**

Traditional accounts of word meaning have been dominated by work in historical linguistics, mainly dating from the nineteenth and twentieth centuries, promoting a view of lexical-semantic entries as tokens of knowledge shared among speakers of a given tongue that can be derived etymologically and compositionally, as can be described for instance in dictionaries. While this body of work has enormous implications for our formal education and for our daily life in an increasingly multilingual social environment, the scientific advancement over the last few decades, particularly in the cognitive neurosciences, has emphasized the overwhelming complexity of the brain mechanisms that produce and capture word meaning. Such a major advancement has prompted a need to revise the theoretical accounts of word meanings as relatively crystallized entities in our mind, by taking into account the remarkably plastic and experience-dependent processes at work in our brain. One of the most implication-rich aspects of this shift has been a re-framing in neuroscientific terms of the long-standing dispute among empiricist and rationalist philosophers, beginning from Aristotle as opposed to Plato: in particular, the contemporary neuroscientific dispute has hinged on conflicting views as to whether lexical-semantic information is represented in our brain in ways that are largely independent from the sensory-motor brain systems, being stored in hetero-modal cortices in an amodal format, or whether on the contrary it derives from sensory-motor experience and as such is deeply rooted in neural networks extending into the sensory-motor system (for recent reviews, see Kiefer and Pulvermüller, 2012; Meteyard et al., 2012).

In the latter view, the retrieval and processing of conceptual knowledge expressed by language in the form of words or sentences re-activates the same primary sensory and motor processes that are involved in the sensory-motor experience of the concepts' referents. The role of bodily perception and enactment in cognition, including language processing, has been emphasized by proponents of embodiment brain mechanisms (Pfeifer and Scheier, 2001), such as Embodied and Grounded theories focusing either on simulation processes (Barsalou, 1999, 2008), bodily states (Gallese and Lakoff, 2005), or actions situated in a social and physical environment (Glenberg and Kaschak, 2002; Rizzolatti and Craighero, 2004). Empirical evidence has convincingly demonstrated that word meaning is stored in distributed neural networks connecting conceptual content-specific sensory, motor, and emotion-related brain regions with the amodal Perisylvian cortex, with an essential contribution of the left anterior temporal lobe, acting as either a semantic (Patterson et al., 2007) or a modulatory hub (Kiefer and Pulvermüller, 2012). The reactivation of the different neural nodes constituting these distributed semantic networks appears to vary in a highly flexible manner, depending on the type of concept retrieval that is required by the given task, by the context in which it occurs, and by the focus on specific sensory-motor features (Hoenig et al., 2008; Ghio and Tettamanti, 2010; van Dam et al., 2012a,b).

Since the processing of action-related word meaning and congruent motor actions are thought to be subserved, under the flexible circumstances highlighted above, by partially overlapping neural networks, several experimental studies have compatibly shown that the temporal proximity between language processing and action execution tasks can lead to facilitatory/interference effects. For example, Glenberg and Kaschak (2002) showed that hand movements toward or away from the body were facilitated by sentences describing a congruent action (e.g., "He opened/closed the drawer," respectively), compared to when the hand movement was incongruent with the sentence. Similarly, Zwaan and Taylor (2006) found that sentences describing manual rotation (e.g., "He turned down/up the volume") facilitated the manual rotation of a knob (to the left/right, respectively). These modulatory effects can occur bi-directionally, as indicated by the finding that, in turn, manual rotation of the knob facilitated reading of sentences that implied a congruent rotation (Zwaan and Taylor, 2006). Boulenger et al. (2006) found that the processing of action-related verbs presented before the signal prompting for an upper-limb grasping movement facilitated movement kinematics, an effect that was ascribed to residual activation of motor areas by verb processing which lowered the amount of activation required by the subsequent grasping movement to reach threshold. In turn, when the action-related verbs were presented simultaneously to the start of the grasping movement, an interference on kinematic parameters was observed; this interference effect was ascribed to language and action processing simultaneously competing for the same neural resources (Boulenger et al., 2006; see also Chersi et al., 2010 for a computational model accounting for these results). 
Another critical factor for observing a facilitatory effect of action-related sentences onto a subsequent congruent response movement is that the action required for response (e.g., movement toward or away from the body) must have already been known and planned before the onset of sentence processing, as indicated by a study by Borreggine and Kaschak (2006). If, in turn, the required response action is declared to the subjects after sentence processing, the facilitatory effect disappears. This is most likely due to the temporal unavailability of the motor planning system, which is already engaged in binding other action features (Hommel et al., 2001), which, in the case of action-language compatibility studies, are expressed by action-related sentences (see also Scorolli et al., 2009 for a related finding). Interestingly, this temporal conflict can also arise as an effect of processing two action-related sentences linked by simultaneity, as expressed by the adverb *while*, as opposed to the adverb *after* (de Vega et al., 2004).

Interference between language and action processing can also arise when the two tasks do not overlap in time, provided that the task that comes first induces enduring effects in the shared neural resources. This has been suggested by a study (Glenberg et al., 2008a) reporting a series of behavioral experiments, in which healthy participants performed a repetitive, 20 min long, upper-limb motor task, consisting of moving beans, one at a time, from one container to another, with a movement either toward (one group of participants) or away from (the other group) the body. Immediately after, the participants made semantic sensibility judgments on a set of sentences in which the dimension of interest was between sentences describing object transfer toward or away from the reader. A significant motor task by language task interaction was found, such that participants responded more slowly to sentences with an object transfer direction matching the direction of the upper-limb action previously carried out. The interaction was found both for sentences with a concrete object ("Mark deals you the cards") and for sentences with an abstract object ("Ann delegates the responsibilities to you"). This result was interpreted as evidence for a saturation effect, making the motor system less responsive to processing action-related sentence content immediately after repetitive execution of a congruent action.

The rapidly growing number of studies focusing on embodied language in the recent past has raised a hotly debated controversy in the cognitive neuroscience community as to whether distributed representations in the modality-specific cortices are indeed constitutive of conceptual-semantic language understanding, or just an epiphenomenon such as motor imagery (Mahon and Caramazza, 2008). Even among advocates of embodied language theories, there exist different nuances with respect to the constitutiveness argument, leading to a distinction between weak, moderate, and strong versions of the theory (Kemmerer, 2005; Meteyard et al., 2012). Besides what is regarded as evidence of a correlational nature deriving from functional magnetic resonance imaging (fMRI) experiments (e.g., Hauk et al., 2004; Tettamanti et al., 2005; Aziz-Zadeh et al., 2006; Gonzalez et al., 2006; Moscoso del Prado Martin et al., 2006; Kemmerer et al., 2008; Boulenger et al., 2009; Ghio and Tettamanti, 2010), more conclusive evidence on the necessary role of sensory-motor systems has been sought particularly relying on transcranial magnetic stimulation (TMS) of motor areas (Buccino et al., 2005; Pulvermuller et al., 2005a; Glenberg et al., 2008b; Tremblay et al., 2012) and in patients with lesions in the frontal cortex (Bak et al., 2001; Neininger and Pulvermuller, 2003; Cotelli et al., 2006; Bak and Chandran, 2012). Even with respect to TMS and neuropsychology, however, the available evidence remains controversial. As to the former type of studies, Papeo et al. (2009) showed that, contrary to the view that motor areas are rapidly and automatically activated by action-related language processing (Pulvermuller et al., 2005b), action verb processing induced late (500 ms) but not early (170 or 350 ms) modulatory effects on primary motor area activity.
Furthermore, they showed that these modulatory effects were only found in a semantic decision but not in a syllabic task, thus suggesting a non-automatic, post-conceptual role of motor area activations in action-language processing. As to the latter type of studies, Papeo et al. (2010) showed that, in spite of results at the patients' group level confirming the previously described association between motor action deficits and action-related verb processing difficulties (e.g., Bak et al., 2001), in individual patients these two behavioral measures presented a double dissociation, suggesting that the neural systems for language and actions may be largely independent.

To help resolve the constitutiveness argument, a crucial notion is represented by causal influences. One needs not only to demonstrate that the processing of word meaning involves the activation of sensory-motor brain areas, but furthermore that the degree of such an involvement determines the efficiency of conceptual-semantic language understanding. To be truly convincing, this needs to be demonstrated not only in brain-damaged patients or by local perturbations induced by TMS, but also in an unperturbed healthy brain. A substantial leap forward in this direction has been provided by Beilock et al. (2008) in a combined fMRI and behavioral study, showing that specific sensory-motor expertise can improve the comprehension of related concepts in a semantic language task. In this study, ice-hockey players (possessing both playing and viewing experience) were compared to non-player ice-hockey fans (possessing viewing but not playing experience) and novices (no playing or viewing experience). The authors used reaction times (RTs) as a measure of the speed with which the three participant groups matched both the subject and, implicitly, the action-related verb predicate of a sentence with a picture of an individual performing an action presented immediately after. The sentence-picture pairs could refer to either everyday or ice-hockey actions. Beilock et al. (2008) demonstrated that, whereas the three participant groups did not differ in their performance with everyday actions, they significantly differed with ice-hockey actions, with both ice-hockey players and fans producing faster RTs than novices.
Furthermore, regression analyses relating brain activation for passive everyday- and hockey-related sentence listening with the behavioral RTs data demonstrated that increasing ice-hockey experience (players > fans > novices) was positively correlated with higher activation of the left dorsal premotor cortex, a brain region supporting the selection of well-learned action plans.

A potential drawback of the Beilock et al. (2008) study is that the correlation between sensory-motor expertise and efficiency of conceptual-semantic language understanding was deduced by comparing populations with *de facto* different sport skills and attitudes, so that it is in principle not possible to univocally ascribe more efficient language comprehension to sensory-motor experience, as opposed to other preselected factors. Furthermore, it is in principle equally possible that the semantic understanding advantage of ice-hockey players and fans over novices does not (solely) derive from higher playing and viewing experience, but rather from the more frequent use of ice-hockey-related words in daily life.

In the present study, we aimed to provide direct and clear-cut evidence that sensory-motor training in a homogeneous healthy population can lead to more efficient conceptual-specific semantic processing, as predicted by the constitutiveness argument. To this purpose, over a period of 3 weeks, we trained naive healthy subjects to perform complex manual actions (e.g., origami, prestidigitation, and tying sailor's knots). Before and after training, each participant underwent a series of manual dexterity tests [Minnesota Manual Dexterity Test (MMDT) and *ad hoc* tests for the trained manual actions], and a semantic language task. The latter consisted of an adapted version of the sentence-picture semantic congruency task employed by Beilock et al. (2008) and allowed us to measure the speed of conceptual retrieval for sentence meanings that were either semantically related (target items) or unrelated (non-target items) to the trained manual actions. We thus manipulated the two factors Semantic condition (target, non-target) and Training phase (pre, post) in a 2 × 2 factorial design with repeated measures. Our expectations were that gaining sensory-motor experience through prolonged manual action training would lead to more efficient conceptual-semantic processing of congruent action-related words, resulting in faster post-training RTs specifically for target sentence-picture pairs.
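The logic of the 2 × 2 repeated-measures design can be illustrated with a minimal sketch of the interaction contrast, i.e., the pre-to-post RT decrease for target items minus the decrease for non-target items, computed per participant and then averaged. This is only an illustration of the design described above, not the analysis reported by the authors; the function name, the data layout, and the RT values are hypothetical placeholders invented for the example.

```python
from statistics import mean


def interaction_contrast(rts):
    """Mean per-participant interaction contrast for a 2 x 2 within-subject design.

    `rts` maps (semantic_condition, training_phase) to a list of
    per-participant mean RTs in ms, with participants in the same order
    in every cell. A positive value means RTs decreased more for target
    items than for non-target items.
    """
    # Pre-to-post RT decrease for each participant, separately per condition.
    target_gain = [
        pre - post
        for pre, post in zip(rts[("target", "pre")], rts[("target", "post")])
    ]
    nontarget_gain = [
        pre - post
        for pre, post in zip(rts[("non-target", "pre")], rts[("non-target", "post")])
    ]
    # Difference of differences per participant, then the group mean.
    return mean(t - n for t, n in zip(target_gain, nontarget_gain))


# Invented placeholder RTs (ms) for two hypothetical participants.
# These are NOT data from the study.
toy_rts = {
    ("target", "pre"): [1000, 1100],
    ("target", "post"): [800, 950],
    ("non-target", "pre"): [1000, 1050],
    ("non-target", "post"): [900, 1000],
}

print(interaction_contrast(toy_rts))
```

Under the prediction stated above, this contrast would be positive, since faster post-training RTs are expected specifically for target pairs.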

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twenty volunteer subjects took part in the experiment: 10 subjects were randomly assigned to group A and 10 subjects to group B. The data of one participant of group A were discarded, due to poor performance in the semantic language task (67% of correct responses). The mean age of the remaining 19 participants (12 women) was 21.1 ± 1.5 years. All participants were right-handed, native Italian speakers and were students of the Vita-Salute San Raffaele University with comparable educational level (high school certificate). They had normal or corrected-to-normal visual acuity and had no history of neurological, psychiatric, or orthopedic disorders that could affect training or test performance. In order to ensure optimal motor training, we excluded subjects possessing specific abilities related to the trained manual dexterity tasks (origami folding, tying sailor's knots, prestidigitation/rolling coins, sewing, and keyboard playing).

All volunteer subjects gave written informed consent to participate after receiving an explanation of the procedures, according to the Declaration of Helsinki, while remaining naive as to the purpose of the study. The study was approved by the Ethics Committee of the San Raffaele Hospital, Milan.

#### **MANUAL DEXTERITY TRAINING**

Each participant trained in three manual dexterity motor tasks. Participants of both groups trained to make origami. Participants in group A also trained to tie sailor's knots, and to roll coins across their fingers. Participants in group B also trained to sew, and to play finger tapping sequences according to color-coded scores. Thirty-minute long training sessions were scheduled over a period of 3 weeks, 5 days/week, 10 min/task. The total of 15 training sessions for each participant were ordered so as to increase task difficulty over the 3-week training period. Task instructions were provided in the form of either still or silent motion pictures, carefully avoiding any accompanying verbal descriptions, particularly with respect to target action-related verbs.

Origami folding was performed using standard square origami paper. Increasing difficulty was achieved by training a different origami figure in each session, with figures in successive sessions displaying an increasing number of required folds and steps.

Sailor's knots were tied using two ropes (length: 1 m; diameter: 0.008 m). Increasing difficulty was achieved by training a different knot in each session, with knots in successive sessions displaying an increasing number of required manipulations and loops.

Coins were rolled from the index, to the middle, ring, and finally to the little finger knuckles of the right hand. Increasing difficulty was achieved by reducing the size of the coin every three sessions (2 euro coin, 50 cent euro coin, 1 euro coin, 20 cent euro coin, and 10 cent euro coin).

Sewing was performed with a sewing needle and all-purpose sewing thread. Fabric sheets with printed line drawings were provided. The participants sewed along the line drawings, using a uniform running stitch. Increasing complexity was achieved by training a different line drawing in each session, with drawings in successive sessions displaying an increasing number of elements and segments.

Finger tapping sequences were performed on a sheet of paper with seven printed circles, each circle of a different color. Scores were provided, consisting of sequences of color-number pairs. One pair after the other, the participants tapped the corresponding colored circle with the right hand finger indicated by the associated number (2: index; 3: middle; 4: ring; and 5: little). The finger tapping frequency was set by a metronome. Increasing difficulty was achieved by changing the scores every three sessions, and for each score, by increasing the metronome frequency every session (60, 90, and 120 bpm).
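The progression rules for coin rolling and finger tapping can be made explicit with a short sketch. It is only one reading of the description above, not the authors' procedure: it assumes that sessions are numbered 1 to 15, that the coin and the tapping score both change after every third session, and that the three metronome tempi repeat within each three-session block. The function names are hypothetical.

```python
# Illustrative sketch of the session-by-session difficulty progression.
# Assumptions (not stated explicitly in the text): sessions are numbered 1..15,
# and the three tempi cycle within each block of three sessions.

COINS = ("2 euro", "50 cent", "1 euro", "20 cent", "10 cent")  # order given in the text
TEMPI_BPM = (60, 90, 120)                                       # tempi given in the text
N_SESSIONS = 15


def _check_session(session: int) -> None:
    if not 1 <= session <= N_SESSIONS:
        raise ValueError(f"session must be between 1 and {N_SESSIONS}")


def coin_for_session(session: int) -> str:
    """Coin used in a given session: a smaller coin every three sessions."""
    _check_session(session)
    return COINS[(session - 1) // 3]


def tapping_for_session(session: int) -> tuple[int, int]:
    """Return (score number, metronome tempo in bpm) for a given session."""
    _check_session(session)
    block, position = divmod(session - 1, len(TEMPI_BPM))
    return block + 1, TEMPI_BPM[position]


if __name__ == "__main__":
    for s in range(1, N_SESSIONS + 1):
        score, bpm = tapping_for_session(s)
        print(f"session {s:2d}: coin = {coin_for_session(s)}, score {score} at {bpm} bpm")
```

Under these assumptions, 15 sessions correspond to five coins and five tapping scores, each score practiced at three tempi.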

In order to control for verbal descriptions that the participants may have explicitly associated to the different motor tasks, at the end of the training period we asked participants to write descriptions of the manual dexterity tasks that they performed (see last paragraph of Semantic Language Task Data for information on how the written descriptions were scored and employed for the data analysis).

Before and after the training period, the participants were submitted to, respectively, pre-training and post-training manual dexterity assessments and a semantic language task.

#### **MANUAL DEXTERITY ASSESSMENTS**

The MMDT (Lafayette Instrument Company, Lafayette, IN, USA) was used to assess manual dexterity (Elfant, 1977; Lee and Tsang, 2001). The participants performed three trials of the MMDT turning task (Mandell et al., 1984), using their right hand. The mean of the scores obtained in the last two trials was considered for the analysis.

In addition, in order to specifically evaluate the improvement in performance in the trained manual dexterity motor tasks, we devised a specific metric for each task. For origami, we measured the time employed by each participant to faithfully fold the simplest figure in the training series. For tying sailor's knots, we measured the time employed by each participant to faithfully tie the simplest knot in the training series. For coin rolling, we measured the number of times (cycles) each participant errorlessly rolled the 2 euro coin from the index to the little finger knuckle during a 1-min interval. For sewing, we asked participants to sew along the simplest drawing in the training series, and counted the number of flawless stitches during a 1-min interval (stitches falling outside the drawing lines were considered as mistakes and not counted). Finally, for sequential finger tapping, we asked participants to tap according to the first score in the training series at a frequency of 60 bpm, and counted the number of mistakes (i.e., wrong colors, wrong fingers, and misses). The finger tapping performances were video-taped for subsequent scoring.

Note that in the pre-training assessments, the participants were confronted for the first time with novel tasks, but this is true both for the standardized MMDT and for the *ad hoc* manual dexterity motor tasks. In order to minimize the influence of procedural over manual novelty from the pre-training to the post-training sessions, participants were given sufficient time (5 min) to familiarize themselves with the task instructions. In addition, each task was performed three times: the first trial served for familiarization, whereas only the last two trials were considered for the analysis by taking their mean score.

#### **LINGUISTIC STIMULI**

For each of the two participant groups, we selected six manual action-related Italian transitive verbs describing the corresponding trained manual dexterity actions (**Table 1**). These verbs constituted the target semantic condition, for which we expected a specific facilitation at the conceptual-semantic level induced by manual training. As an experimental control, we also selected for each group six manual action-related transitive verbs, whose meaning was not associated with the trained manual dexterity actions (**Table 1**). These verbs constituted the non-target semantic condition. Note that, since two out of the three manual actions trained by each group differed between groups A and B, most of the verbs in the target semantic condition for group A could be used as verbs in the non-target semantic condition for group B, and vice versa (**Table 1**). Thus, the separation of participants in the two groups A and B served as a partial reciprocal control for the specificity of the manual training effect over conceptual-semantic verb processing. In other words, we expected the same verb to be associated with a conceptual-semantic facilitation in the group where it belonged to the target semantic condition (e.g., group A), and with significantly reduced or no effects in the group where it belonged to the non-target semantic condition (e.g., group B).

The lexical frequency of verbs in the target versus non-target semantic conditions was balanced, using the Italian Corpus of Lexical Frequency (Laudanna et al., 1995), for both group A (*P* = 0.49) and group B verbs (*P* = 0.84). The number of letters (group A: *P* = 0.49; group B: *P* = 0.11) and syllables (group A: *P* = 0.61; group B: *P* = 0.08) was also balanced across the two conditions.

To serve as fillers for the semantic language task, we also selected 12 additional manual action-related transitive verbs, which did not bear any semantic relationship with the manual actions trained by either group A or group B (**Table 1**).

#### **SEMANTIC LANGUAGE TASK**

We paired each verb in **Table 1** with a color photograph of a manual interaction with objects taken from a standardized set of stimuli, which was developed to investigate the retrieval of lexical and conceptual action knowledge (Fiez and Tranel, 1997). For filler verbs, the manual action depicted in the photograph was incongruent with the verb meaning. For verbs in the target and non-target semantic conditions, it was congruent with the verb meaning. Some congruent color photographs had to be taken *ad hoc*, as the corresponding actions were not present in the Fiez and Tranel set; for this, we used visual conventions matching as closely as possible those of the Fiez and Tranel set. Importantly, the manual action depicted in the congruent photographs did not bear direct resemblance with the actions trained by the participants (e.g., the picture for "to fasten," associated with tying knots, represented a right and a left hand fastening shoe ties), and in some cases it was largely unrelated (e.g., the picture for "to manipulate," associated with rolling coins, represented a right and a left hand manipulating modeling clay). This was done in order to eliminate the potential bias in the semantic language task, deriving from visual familiarity with the situation depicted in the photographs, when the latter is similar to the situation experienced during manual dexterity training.

In order to measure the speed of lexical-conceptual retrieval in a semantic language task, we used an adapted version of the task employed by Beilock et al. (2008). All selected verbs were used in the third person singular, present simple tense form to create short declarative sentences of the form "Quella persona disegna" (English: "That person draws"). Participants were presented with one sentence at a time for 1000 ms, followed by a 500-ms interval and by the associated picture for 3000 ms. The required task was to indicate, as quickly as possible, the congruency/incongruency of each sentence-picture pair by hitting the right (congruent) or left (incongruent) arrow keyboard key with the middle and index fingers of the right hand, respectively. The 24 sentence-picture pairs were presented consecutively in one single block. Sentence-picture pairs were separated by a variable interval of either 3500, 4000, or 4500 ms. The pairs were presented in semi-randomized order, with different randomizations for the pre-training and the post-training sessions.
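The trial structure just described can be laid out as a simple timeline. The sketch below is not the presentation script used in the study and does not call any presentation library. It assumes that the picture stays on screen for the full 3000 ms regardless of the response, that the variable interval starts at picture offset, and that the interval is drawn uniformly at random; none of these details is specified in the text. All names are hypothetical.

```python
# Illustrative timeline for one block of the sentence-picture task.
# Durations are taken from the text; everything else is an assumption.
import random
from dataclasses import dataclass

SENTENCE_MS = 1000
GAP_MS = 500
PICTURE_MS = 3000
INTERVAL_CHOICES_MS = (3500, 4000, 4500)


@dataclass
class Trial:
    sentence_onset: int   # ms from block start
    picture_onset: int    # RTs are measured from this point
    picture_offset: int
    interval: int         # variable interval before the next sentence


def build_timeline(n_trials: int = 24, seed: int = 0) -> list[Trial]:
    """Compute onset times for every trial in a single block."""
    rng = random.Random(seed)
    clock = 0
    trials = []
    for _ in range(n_trials):
        picture_onset = clock + SENTENCE_MS + GAP_MS
        picture_offset = picture_onset + PICTURE_MS
        interval = rng.choice(INTERVAL_CHOICES_MS)  # assumed uniform draw
        trials.append(Trial(clock, picture_onset, picture_offset, interval))
        clock = picture_offset + interval
    return trials


def reaction_time(trial: Trial, response_time_ms: int) -> int:
    """RT as the time elapsed between picture onset and the response."""
    return response_time_ms - trial.picture_onset


if __name__ == "__main__":
    first = build_timeline()[0]
    print(first, reaction_time(first, 2152))
```

Under these assumptions each trial lasts between 8000 and 9000 ms, so a block of 24 pairs takes roughly three and a half minutes.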

Stimulus presentation and response collection were controlled by a laptop with a 17-inch monitor, using PsychoPy 1.64 software (Peirce, 2009). We calculated RTs as the time elapsed between the onset of picture presentation and the participant's response. The RTs for filler sentence-picture pairs were not analyzed.

Prior to the experimental sessions, the participants familiarized themselves with the task instructions and performed a short familiarization block with four sentence-picture pairs not included in the experimental set.

#### **STATISTICAL ANALYSIS**

A significance α level of 0.05 was declared for all analyses.

#### **Manual dexterity assessment data**

In order to evaluate the effectiveness of manual training, MMDT scores and the scores relative to the specific metrics for the trained manual dexterity motor tasks were submitted to paired Student's *t*-tests comparing the post-training with pre-training performance.

#### **Semantic language task data**

The collected RTs of all participants were pooled over groups A and B, according to the two semantic conditions (target versus non-target) and the two training phases (pre versus post). We discarded the RTs for incorrect answers, as well as those falling more than 2 SD above or below each subject's mean.
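The exclusion rule can be illustrated with a minimal sketch. It is not the authors' analysis code and it makes choices the text leaves open: the subject's mean and SD are computed over correct responses only, pooled across conditions, using the sample SD. The function name and the example values are hypothetical.

```python
# Illustrative RT cleaning for a single subject.
# Assumptions: mean and SD are computed on correct trials only, pooled over
# conditions, with the sample (n - 1) standard deviation.
from statistics import mean, stdev


def trim_rts(rts, correct, n_sd=2.0):
    """Drop incorrect trials, then drop RTs more than n_sd SDs from the mean."""
    kept = [rt for rt, ok in zip(rts, correct) if ok]
    if len(kept) < 2:
        return kept  # an SD cannot be estimated from fewer than two values
    centre, spread = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - centre) <= n_sd * spread]


if __name__ == "__main__":
    # Hypothetical RTs in ms; the last trial was answered incorrectly.
    rts = [500, 520, 510, 490, 505, 515, 495, 1500, 480]
    correct = [True] * 8 + [False]
    print(trim_rts(rts, correct))
```

In this made-up example the incorrect trial is removed first, and the 1500 ms response is then excluded as an outlier, leaving seven RTs.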

The effects of manual training on target and non-target semantic processing were evaluated by means of paired Student's *t*-tests comparing post-training with pre-training RTs.
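For reference, the paired *t* statistic is the mean of the per-subject differences divided by its standard error. The sketch below uses only the Python standard library and hypothetical values; it returns the statistic and its degrees of freedom, and leaves the *P*-value to a statistics package, since the *t* distribution is not part of the standard library.

```python
# Illustrative paired Student's t statistic (post versus pre) for by-subject means.
from math import sqrt
from statistics import mean, stdev


def paired_t(post, pre):
    """Return (t, df) for paired observations; negative t means post < pre."""
    if len(post) != len(pre) or len(post) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(post, pre)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical by-subject mean RTs in ms, not data from the study.
    pre = [650, 700, 600, 640]
    post = [500, 520, 480, 510]
    print(paired_t(post, pre))
```

With 19 participants the test has 18 degrees of freedom, which matches the *t*(18) values reported in the Results.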

In order to investigate whether manual training induced a specific facilitation in the conceptual-semantic processing of target versus non-target stimuli, we used a repeated measures ANOVA on by-subject aggregated data with a 2 × 2 factorial design, the experimental factors being Semantic condition (target, non-target) and Training phase (pre, post). The assumption of sphericity was checked by means of Mauchly's sphericity test. We calculated main effects and interactions. A *post hoc* one-tailed Student's *t*-test was used to verify that post-training target stimuli were processed faster than post-training non-target stimuli.
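In a fully within-subject 2 × 2 design, the interaction test has a simple equivalent form: it is a one-sample *t*-test on each subject's difference of differences, and the interaction *F*(1, n − 1) equals the square of that *t*. The sketch below illustrates this equivalence with hypothetical data and the standard library only; it is not the software or the code used for the reported analysis.

```python
# Illustrative interaction test for a 2 x 2 repeated measures design.
# The per-subject contrast is (target post - target pre) - (non-target post - non-target pre).
from math import sqrt
from statistics import mean, stdev


def interaction_f(target_pre, target_post, nontarget_pre, nontarget_post):
    """Return (F, (df1, df2)) for the Semantic condition x Training phase interaction."""
    columns = (target_pre, target_post, nontarget_pre, nontarget_post)
    n = len(target_pre)
    if n < 2 or any(len(col) != n for col in columns):
        raise ValueError("need four equally long samples with at least two subjects")
    contrast = [
        (t_post - t_pre) - (n_post - n_pre)
        for t_pre, t_post, n_pre, n_post in zip(*columns)
    ]
    t = mean(contrast) / (stdev(contrast) / sqrt(n))
    return t * t, (1, n - 1)


if __name__ == "__main__":
    # Hypothetical by-subject mean RTs in ms, not data from the study.
    f_value, df = interaction_f(
        target_pre=[650, 700, 600, 640],
        target_post=[500, 520, 480, 510],
        nontarget_pre=[640, 690, 610, 630],
        nontarget_post=[530, 560, 520, 540],
    )
    print(f_value, df)
```

Because each factor has only two levels, every effect in this design has a single numerator degree of freedom, so the sphericity assumption is not expected to be restrictive here.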

A further analysis was performed in order to eliminate the potential bias deriving from the fact that the participants may have explicitly associated verbal descriptions with the trained manual dexterity tasks. We scored the written descriptions provided by the participants at the end of the training period (see Manual Dexterity Training) and eliminated from the analysis all the RTs relative to target sentence-picture pairs containing verbs mentioned by one or more participants. We then repeated the same 2 × 2 factorial analysis as for the complete stimulus set, including the sphericity test. Two participants of group B were left out from this analysis, as they did not provide any valid responses to the reduced stimulus set.
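The filtering step of this control analysis amounts to removing every target verb that appears in at least one participant's written description. A minimal sketch follows, assuming the descriptions have already been scored by hand into one set of verbs per participant; the function name, participant codes, and verb labels are placeholders rather than items from the study.

```python
# Illustrative filter for the control analysis.
# Input: target verbs, and for each participant the set of verbs found in
# their written descriptions (assumed to be scored manually beforehand).


def retained_target_verbs(target_verbs, mentioned_by_participant):
    """Keep only target verbs that no participant used in a description."""
    mentioned = set()
    for verbs in mentioned_by_participant.values():
        mentioned.update(verbs)
    return [verb for verb in target_verbs if verb not in mentioned]


if __name__ == "__main__":
    # Placeholder labels, not the verbs of Table 1.
    targets = ["verb_a", "verb_b", "verb_c"]
    scored = {"p01": {"verb_b"}, "p02": set(), "p03": {"verb_b"}}
    print(retained_target_verbs(targets, scored))
```

A single mention by any participant is enough to exclude a verb for all participants, which is the conservative criterion described above.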

## **RESULTS**

#### **RESULTS OF THE MANUAL DEXTERITY ASSESSMENTS**

We compared the pre-training and post-training scores obtained in the MMDT and in the specific assessments for the trained manual dexterity motor tasks. All tests showed significant training effects in both groups (**Table 2**).

#### **RESULTS OF THE SEMANTIC LANGUAGE TASK**

The participants made on average 88.6% (SD ± 9.3) correct responses in the pre-training session and 92.1% (SD ± 7.1) correct responses in the post-training session, with no significant differences between sessions [*t*(37) = 1.539; *P* = 0.132]. The accuracy did not significantly differ between target and non-target sentence-picture pairs, either pre-training [*t*(18) = 1.302; *P* = 0.209] or post-training [*t*(18) = 1.286; *P* = 0.215]. The lack of significant results for accuracy replicates the observations of Beilock et al. (2008) using an almost identical semantic language task, and can be explained by the overall ease of the task. Accordingly, our prior hypotheses spelled out in the Section "Introduction" did not concern accuracy, but only RTs.

For target sentence-picture pairs, the participants responded on average after 652 ms (SD = 120) pre-training and 495 ms (SD = 76) post-training, with a significant [*t*(18) = −5.845; *P* = 0.000007] RT reduction in the post-training session. A qualitatively similar effect was observed for non-target sentence-picture pairs: the participants responded on average after 633 ms (SD = 124) pre-training and 519 ms (SD = 65) post-training, with a significant [*t*(18) = −4.856; *P* = 0.00006] RT reduction in the post-training session.

As a crucial analysis for our experimental question, we assessed whether manual training induced a specific facilitation in the conceptual-semantic processing of target versus non-target sentence-picture pairs, by using a 2 × 2 repeated measures ANOVA. The main effect of Semantic condition was not significant [*F*(1,18) = 0.091; *P* = 0.767], whereas the main effect of Training phase was significant [*F*(1,18) = 32.683; *P* = 0.00002]. Most importantly, the Semantic condition by Training phase interaction was also significant [*F*(1,18) = 5.953; *P* = 0.025]. Accordingly, the post-training RTs for target sentence-picture pairs were significantly faster than those for non-target sentence-picture pairs [*t*(18) = −2.242; *P* = 0.019] (**Figure 1A**).
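To make the direction and size of the interaction concrete, the reported group means can be combined into a difference of differences. This is plain arithmetic on the mean RTs given above for the complete stimulus set; the ANOVA itself operates on the per-subject values, which are not reproduced here.

$$
\Delta_{\text{target}} = 495 - 652 = -157\ \text{ms}, \qquad
\Delta_{\text{non-target}} = 519 - 633 = -114\ \text{ms}
$$

$$
\Delta_{\text{target}} - \Delta_{\text{non-target}} = -157 - (-114) = -43\ \text{ms}
$$

On average, the post-training speed-up was thus 43 ms larger for target than for non-target pairs.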

The participants may have responded faster to target versus non-target sentence-picture pairs after training simply because they explicitly associated verbal descriptions with the trained manual dexterity tasks. To control for this potential bias, we eliminated the responses to stimulus pairs whose action-related verb had been used by one or more participants in their written descriptions of the trained manual dexterity tasks provided after the post-training session. This left us with the responses for stimuli containing, for group A, the target verbs "to hook," "to fasten," "to thread," and "to handle," and for group B, the target verb "to darn" (compare **Table 1**). The mean RT was 691 ms (SD = 119) for target pre-training and 492 ms (SD = 73) for target post-training, with a significant [*t*(16) = −7.023; *P* = 0.000001] RT reduction in the post-training session. For non-target sentence-picture pairs the mean RT was 653 ms (SD = 113) pre-training and 527 ms (SD = 64) post-training, with a significant [*t*(16) = −5.164; *P* = 0.00005] RT reduction in the post-training session. The 2 × 2 repeated measures ANOVA with this reduced response data set again showed that the main effect of Semantic condition was not significant [*F*(1,16) = 0.028; *P* = 0.870], that the main effect of Training phase was significant [*F*(1,16) = 47.897; *P* = 0.000003], and, most importantly, that the Semantic condition by Training phase interaction was also significant [*F*(1,16) = 9.006; *P* = 0.008]. Accordingly, the post-training RTs for target sentence-picture pairs were significantly faster than those for non-target sentence-picture pairs [*t*(16) = −2.557; *P* = 0.011] (**Figure 1B**).

## **DISCUSSION**

The processing of action-related word meaning is thought to rely on distributed neural networks involving the amodal Perisylvian cortex and extending to the sensory-motor system, in a manner that flexibly depends on the context and task (Ghio and Tettamanti, 2010; van Dam et al., 2012b; Kiefer and Pulvermüller, 2012). The controversy as to whether the involvement of the content-specific sensory-motor cortices is indeed constitutive to conceptual-semantic language understanding or instead just an epiphenomenon (Mahon and Caramazza, 2008; Meteyard et al., 2012) requires, in order to be further addressed, convincing evidence of causality, demonstrating that the degree of the sensory-motor involvement indeed determines the efficiency of conceptual-semantic language understanding, particularly in the intact, healthy brain. In the present study, we have attempted to fulfill these requirements by training naive healthy subjects to perform complex manual actions over a period of 3 weeks, and by measuring the post- versus pre-training effect on a semantic language task distinguishing between sentence meanings that were either semantically related (target items) or unrelated (non-target items) to the trained manual actions. Consistent with our hypothesis and with the constitutiveness argument, we found a significant Semantic condition by Training phase interaction, and showed that the interaction was accounted for by faster post-training RTs specifically for target versus non-target stimuli. This is suggestive of a causal relationship between action-related language processing and sensory-motor brain regions controlling manual actions. Due to the purely behavioral nature of our measurements, we cannot provide here any detailed descriptions of the involved sensory-motor brain regions, but we can speculate based on a previous neuroimaging study (Beilock et al., 2008) that these crucially involve the left dorsal premotor area. More generally, other brain regions of the action representation system distributed in the inferior frontal, parietal, and temporal lobes may also be involved (Grafton and Hamilton, 2007; Ghio and Tettamanti, 2010).

Mean scores and SD for each measure pre- and post-training are indicated, together with the significance P-value of the paired Student's t-tests comparing the post-training with the pre-training performance.

It is important to note that, above and beyond these essential neuroanatomical specifications, the implied causal relationship between action-related language processing and the motor system is not merely of a locationist type, such that shared brain regions become involved, for example, through Hebbian association learning (see, e.g., Fargier et al., 2012), but truly functional. The higher activation and/or neuronal density of relevant components of the motor system that follows from the experience-dependent acquisition of finer action control skills leads to a more efficient semantic comprehension of words or sentences conveying the corresponding concepts, as shown in the present study and in the Beilock et al. (2008) study. This functional as opposed to locationist conceptualization of causality also fits well with the view that the activation of each neural node constituting a distributed semantic network can be modulated in a flexible manner (Kiefer and Pulvermüller, 2012), with a full involvement of the semantic network producing a most vivid conceptual representation.

Although the concept-specific facilitation effect induced by manual training in the present study could be predicted based both on the constitutiveness argument that the degree of involvement of sensory-motor brain areas determines the efficiency of conceptual-semantic language understanding, and on the previous study by Beilock et al. (2008), there are other circumstances, as described in the Section "Introduction," in which the shared exploitation of common neural resources by language and action processing leads to interference effects. In particular, Glenberg et al. (2008a) let participants perform a repetitive manual task and found a concept-specific slowing down of RTs in a semantic language task performed immediately after. There were several methodological differences between the present study and the study by Beilock et al. (2008), on the one side, and the study of Glenberg et al. (2008a), on the other side, that may justify this discrepancy, including the fact that in the latter study the participants performed a highly stereotyped movement in one single session of 20 min, which was most likely not challenging enough to lead to an expansion of their sensory-motor experience and repertoire. This is even more likely the case, since the stereotyped movement (moving beans) consisted of a well-learned motor behavior, as opposed to teaching a new behavior as in the present study. The temporal windows of neural plasticity investigated may also play an important role: the saturation effect (of probable neurophysiological origin) observed in the Glenberg et al. (2008a) study immediately after a brief motor task session may turn into a facilitation effect if the motor task is protracted over multiple sessions and if a sufficient amount of time elapses between the motor and the language tasks in order to permit structural neural plasticity to develop. This latter, long-term scenario may be more closely related to the experimental setting both in the Beilock et al. (2008) study, in which expertise was roughly equated to enduring individual attitudes, and in the present study, in which the participants were trained in complex manual actions over a period of 3 weeks.

The adoption of a long-term sensory-motor training program, protracted over a period of 3 weeks, to investigate plasticity effects on conceptual-specific semantic processing, is a major factor of experimental novelty of the present study, compared to the large number of previous studies on action-language compatibility effects (e.g., Glenberg and Kaschak, 2002; de Vega et al., 2004; Borreggine and Kaschak, 2006; Boulenger et al., 2006; Zwaan and Taylor, 2006; Scorolli et al., 2009). This type of long-term training paradigm may be particularly helpful in the future to further explore, in an experimentally controlled manner, how conceptual-semantic linguistic representations are dynamically tuned by the constantly changing sensory-motor experiences across the individual life-time, similar to an increasingly widespread approach for cognitive studies outside the language domain (see, e.g., Kiefer et al., 2007; Weisberg et al., 2007; Bellebaum et al., in press).

It is also important to note that, although the Semantic condition by Training phase interaction indicates that there was a specific effect of manual training on target conceptual-semantic processing, the paired comparisons contrasting the post-training versus pre-training performance, separately for target and non-target sentence-picture pairs, showed a marked decrease of RTs for both the target and the non-target conditions. The present study does not allow us to distinguish between an interpretation of this general effect as being due either to an unspecific gain of procedural, motor, and executive skills induced by manual dexterity training (such that the participants were simply more responsive and compliant to the task's requests after training), or to a carry-over effect of increased sensory-motor resources also available for the conceptual-semantic processing of non-target action-related verbs. In the former view, the greatest proportion of the variance of the Training phase effect would be explained by non-semantic factors related to manual dexterity, with conceptual-semantic factors inducing only a relatively smaller gain of response efficiency limited to the target condition, as represented by the Semantic condition by Training phase interaction. In the latter view, manual training conferred more efficiency to the semantic processing of both target and non-target action-related verbs, but with a significantly higher effect specifically for target verbs. This would be possible, for instance, if the neural plasticity effects induced by training in the dorsal premotor cortex partially propagated from cell populations specific for the trained manual actions to other surrounding cell populations coding for other (non-target) manual actions. It is of course also possible that both factors contributed to the observed generalized effect of Training phase. Further studies will be required to discriminate between these scenarios.

There are in principle a few alternative explanations to account for the results presented here. The first alternative explanation is that the greater post-training RT reduction observed for target compared to non-target sentence-picture pairs may not be due to the greater sensory-motor experience acquired through training of the related manual actions, but simply to the fact that during training the participants used the target action-related verbs more frequently (e.g., to describe, rehearse, or plan the trained actions), a concern that, as noted in the Section "Introduction," was not accounted for in the Beilock et al. (2008) study. While it is not possible to monitor the ongoing lexicon retrieval of the participants during the training period, we have done our best to control for this potential bias, by asking the participants at the end of the training period to provide written verbal descriptions of the trained motor tasks. We scored these descriptions and eliminated all the responses relative to target sentence-picture pairs containing verbs explicitly mentioned by one or more participants. We then submitted this reduced response data set to the same 2 × 2 factorial analysis used for the complete set. The results of this control analysis were qualitatively identical to those of the complete response data set, and the significance levels of both the Semantic condition by Training phase interaction and the *post hoc* comparison between post-training target and non-target sentence-picture pairs were even increased. We are therefore confident that the acquired sensory-motor manual expertise, rather than simply verbal rehearsal, caused the observed concept-specific improvement in the semantic language task.

The second alternative explanation is that the observed results may again not be due to the greater sensory-motor experience acquired through manual training, but rather to a visual familiarity between the situation depicted in the pictures presented in the semantic language task with sentence-picture pairs and the situation experienced during manual training (e.g., objects manipulated, hand posture, and visual angle). However, as noted in Section "Semantic Language Task," the manual actions depicted in the photographs belonging to the target sentence-picture pairs did not bear direct visual resemblance with the manual actions trained by the participants. For example, the picture for the target verb "to fasten," associated with the trained manual action "tying knots," represented a right and a left hand fastening shoe ties; the picture for the target verb "to manipulate," associated with the trained manual action "rolling coins," represented a right and a left hand manipulating modeling clay. The crucial notion here regards the separability of visual similarity from conceptual-semantic processing in the context of processing pictures in our semantic congruency judgment task. Neurophysiological studies in monkeys and neuroimaging studies in humans have provided abundant evidence that the visual recognition of an observed action involves two highly integrated but distinct neural pathways: one "dorsal" pathway for the analysis of how the action is physically carried out in relation to, for example, the object's location, size, and affordances, the hand's location and haptic configuration, and the required sequence of motor acts; and one "ventral" pathway for analyzing the "abstract" meaning of the observed action (Arbib, 2012). Our effort to minimize the visual resemblance between the depicted and the trained actions was precisely aimed at eliminating as much as possible any effects of visual priming that may result from some overlap of neural coding in the "dorsal" pathway. Considering the example of fastening shoelaces in comparison to the trained action of tying sailor's knots, there were notable differences with respect to location, size, and affordance of laces versus rope objects, different hand configurations, and a different sequence of motor acts. This is in contrast to the shared "abstract" semantic notion of "fastening/tying knots," which may lead to a neural coding overlap in the "ventral" pathway by both the depicted and the trained action. This latter overlap is however not a matter of concern, as it relates precisely to the conceptual-semantic level that we aimed to assess. We are therefore again confident that the effect of visual similarity also did not bias the language understanding improvement effect.

In sum, we conclude that an increase in sensory-motor expertise gained by training of specific manual actions can lead to a more efficient semantic processing of the specific actionrelated conceptual domain associated to the trained actions, with

## **REFERENCES**


a possible, relatively less pronounced, carry-over effect to the entire action-related domain. This modality-specific effect most likely depends on shared neural resources between the sensorymotor system and conceptual-semantic language processing and implies a bidirectional causal link, in which sensory-motor experience can influence word meaning representations and the processing of word meaning can in turn influence sensory-motor representations. This latter aspect may be revealed by future research.

## **AUTHOR CONTRIBUTIONS**

Designed research: Matteo Locatelli, Roberto Gatti, and Marco Tettamanti. Performed research: Matteo Locatelli. Analyzed data: Matteo Locatelli and Marco Tettamanti. Wrote the paper: Matteo Locatelli, Roberto Gatti, and Marco Tettamanti.

## **ACKNOWLEDGMENTS**

We are grateful to Marta Ghio and Matilde Vaghi for their valuable comments on our manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 September 2012; accepted: 21 November 2012; published online: 10 December 2012.*

*Citation: Locatelli M, Gatti R and Tettamanti M (2012) Training of manual actions improves language understanding of semantically related action sentences. Front. Psychology 3:547. doi: 10.3389/fpsyg.2012.00547*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Locatelli, Gatti and Tettamanti. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Neurological evidence linguistic processes precede perceptual simulation in conceptual processing

## **Max Louwerse\* and Sterling Hutchinson**

Department of Psychology, Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Claudio Gentili, University of Pisa, Italy

Christopher Kurby, Grand Valley State University, USA

#### **\*Correspondence:**

Max Louwerse, Department of Psychology, Institute for Intelligent Systems, University of Memphis, 202 Psychology Building, Memphis, TN 38152, USA. e-mail: maxlouwerse@gmail.com

There is increasing evidence from response time experiments that language statistics and perceptual simulations both play a role in conceptual processing. In an EEG experiment we compared neural activity in cortical regions commonly associated with linguistic processing and visual perceptual processing to determine to what extent symbolic and embodied accounts of cognition applied. Participants were asked to determine the semantic relationship of word pairs (e.g., sky – ground) or to determine their iconic relationship (i.e., if the presentation of the pair matched their expected physical relationship). A linguistic bias was found toward the semantic judgment task and a perceptual bias was found toward the iconicity judgment task. More importantly, conceptual processing involved activation in brain regions associated with both linguistic and perceptual processes. When comparing the relative activation of linguistic cortical regions with perceptual cortical regions, the effect sizes for linguistic cortical regions were larger than those for the perceptual cortical regions early in a trial with the reverse being true later in a trial. These results map upon findings from other experimental literature and provide further evidence that processing of concept words relies both on language statistics and on perceptual simulations, whereby linguistic processes precede perceptual simulation processes.

**Keywords: embodied cognition, symbolic cognition, symbol interdependency, perceptual simulation, language processing, EEG**

## **INTRODUCTION**

Conceptual processing elicits perceptual simulations. For instance, when people read the word pair *sky – ground*, one word presented above the other, processing is faster when *sky* appears above *ground* than when the words are presented in the reversed order (Zwaan and Yaxley, 2003; Louwerse, 2008; Louwerse and Jeuniaux, 2010). Embodiment theorists have interpreted this finding as evidence that perceptual and biomechanical processes underlie cognition (Glenberg, 1997; Barsalou, 1999). Indeed, numerous studies show that processing is affected by tasks that invoke the consideration of perceptual features (see Pecher and Zwaan, 2005; De Vega et al., 2008; Semin and Smith, 2008; for overviews). Much of this evidence comes from behavioral response time (RT) experiments, but there is also evidence stemming from neuropsychological studies (Buccino et al., 2005; Kan et al., 2003; Rueschemeyer et al., 2010). This embodied cognition account is oftentimes presented in contrast to a symbolic cognition account that suggests conceptual representations are formed from statistical linguistic frequencies (Landauer and Dumais, 1997). Such a symbolic cognition account that uses the mind-as-a-computer metaphor has occasionally been dismissed by embodiment theorists (Van Dantzig et al., 2008).

Recently, researchers have cautioned against pitting one account against another, demonstrating that symbolic and embodied cognition accounts can be integrated (Barsalou et al., 2008; Louwerse, 2008, 2011; Simmons et al., 2008). For instance, Louwerse (2011) proposed the Symbol Interdependency Hypothesis, arguing that language encodes embodied relations which language users can use as a shortcut during conceptual processing. The relative importance of language statistics and perceptual simulation in conceptual processing depends on several variables, including the type of stimulus presented to a participant, and the cognitive task the participant is asked to perform (Louwerse and Jeuniaux, 2010). Louwerse and Connell (2011) further found that the effects for language statistics on processing times temporally preceded the effects of perceptual simulations on processing times, with fuzzy regularities in linguistic context being used for quick decisions and precise perceptual simulations being used for slower decisions. Importantly, these studies do not deny the importance of perceptual processes. In fact, individual effects for perceptual simulations were also seen early on in a trial. However, when comparing the effect sizes of language statistics and perceptual simulations, Louwerse and Connell (2011) found evidence for early linguistic and late perceptual simulation processes.

The results from these RT studies, however, only indirectly demonstrate that language statistics and perceptual simulation are active during cognition, because the effects are modulated by hand movements and RTs. Although such methods are methodologically valid, we sought to establish whether such conclusions were also supported by neurological evidence.

In the current paper our objective was to determine when conceptual processing uses neurological processes best explained by language statistics relative to neurological processes best explained by perceptual simulations. Given the evidence that both statistical linguistic frequencies and perceptual simulation are involved in conceptual processing (Louwerse, 2008; Simmons et al., 2008; Louwerse and Jeuniaux, 2010), and that the effect for language statistics outperforms the effect for perceptual simulations for fast RTs, with the opposite being true for slower RTs (Louwerse and Connell, 2011), we predicted that cortical regions commonly associated with linguistic processing, when compared with activation in cortical regions commonly associated with perceptual simulation, would be activated relatively early in a RT trial. Conversely, when compared with activation in cortical regions commonly associated with linguistic processing, cortical regions associated with perceptual simulation were predicted to show greater activity relatively later in a RT trial. Further, we predicted activation would be modified by the cognitive task, such that perceptual cortical regions would be more active in a perceptual simulation task, whereas linguistic cortical regions would be more active in a semantic judgment task.

Traditional EEG methodologies are not quite sufficient to answer this research question. For instance, event-related potential (ERP) methods only allow for analyses of time-locked components that activate in response to specific events over numerous trials (Collins et al., 2011; Hald et al., 2011). EEG recordings combined with magnetoencephalography (MEG) recordings can provide high-resolution temporal information and spatial estimates of neural activity, provided that appropriate source reconstruction techniques are used (Hauk et al., 2008). However, this technique establishes whether and when cortical regions are activated, but does not answer the question of what cortical regions are activated in relation to each other. Such a comparative analysis seems to call for a different and novel method.

We utilized source localization techniques in conjunction with statistical analyses to determine when and where relative effects of linguistic and perceptual processes occurred. We did this by investigating which regions of the cortex are responsible for activity throughout the time course of each trial. However, source localization determines only where differences emerge between conditions at specific points in time; our goal was to determine whether relatively stronger early effects of linguistic processes preceded a relatively stronger later simulation process. Consequently, we used established source localization techniques (Pascual-Marqui, 2002) to determine where differences in activation were present during an early versus a late time period. With that information we then ran a mixed effects model on electrode activation throughout the duration of a trial to identify the effect size for activation of linguistic versus perceptual cortical regions over time. This type of analysis is progressive in that it allowed us not only to determine that activation differed between linguistic and perceptual cortical regions but also allowed us to gain insight into the relative effect size of language statistics and perceptual simulation as they contribute to conceptual processing throughout the time course of a trial.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Thirty-three University of Memphis undergraduate students participated for extra credit in a psychology course. All participants had normal or corrected vision and were native English speakers. Fifteen participants were randomly assigned to the semantic judgment condition, and 18 participants were randomly assigned to the iconicity judgment condition.

#### **MATERIALS**

Each condition consisted of 64 iconic/reverse-iconic word pairs extracted from previous research (Louwerse, 2008; Louwerse and Jeuniaux, 2010; see Appendix). Thirty-two pairs with an iconic relationship were presented vertically on the screen in the same order they would appear in the world (i.e., *sky* appears above *ground*). Likewise, 32 pairs with a reverse-iconic relationship appeared in an order opposite of that which would be expected in the world (i.e., *ground* appears above *sky*). The remaining 128 trials contained filler word pairs that had no iconic relationship. Half of the fillers had a high semantic relation (cos = 0.55) and half had a low semantic relation (cos = 0.21), as determined by latent semantic analysis (LSA), a statistical, corpus-based, technique for estimating semantic similarities on a scale of −1 to 1 (Landauer et al., 2007). All items were counterbalanced such that all participants saw all word pairs, but no participant saw the same word pair in both orders (i.e., both the iconic and the reverse-iconic order for the experimental items).
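The cosine values reported for the filler pairs are cosines between word vectors in an LSA space. As a minimal sketch of that computation only, the snippet below uses hypothetical three-dimensional toy vectors; real LSA vectors come from a trained corpus-based space (not shown here) and typically have far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, for illustration only (not taken from an LSA space).
word_a = [1.0, 2.0, 0.0]
word_b = [2.0, 4.0, 0.0]   # points in the same direction as word_a
word_c = [0.0, 0.0, 3.0]   # orthogonal to word_a

same_direction = cosine(word_a, word_b)   # close to 1: maximally similar
orthogonal = cosine(word_a, word_c)       # 0: no similarity
```

On this scale, the high-relation fillers (cos = 0.55) sit well above the low-relation fillers (cos = 0.21).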

#### **EQUIPMENT**

An Emotiv EPOC headset (Emotiv Systems Inc., San Francisco, CA, USA) was used to record electroencephalograph data. EEG data recorded from the Emotiv EPOC headset is comparable to data recorded by traditional EEG devices (Bobrov et al., 2011; Stytsenko et al., 2011). For instance, patterns of brain activity from a study in which participants imagined pictures were comparable between the 16-channel Emotiv EPOC system and the 32-channel ActiCap system (Brain Products, Munich, Germany; Bobrov et al., 2011). The Emotiv EPOC is also able to reliably capture P300 signals (Ramírez-Cortes et al., 2010; Duvinage et al., 2012), even though the accuracy of high-end systems is superior.

The headset was fitted with 14 Au-plated contact-grade hardened BeCu felt-tipped electrodes that were saturated in a saline solution. Although the headset used a dry electrode system, such technology has been shown to be comparable to traditional wet electrode systems (Estepp et al., 2009). The headset used sequential sampling at 2048 Hz and was down-sampled to 128 Hz. The incoming signal was automatically notch filtered at 50 and 60 Hz using a 5th order sinc notch filter. The resolution was 1.95 µV.

#### **PROCEDURE**

In both the semantic judgment and iconicity judgment conditions, word pairs were presented vertically on an 800 × 600 computer screen. In the semantic judgment condition, participants were asked to determine whether a word pair was related in meaning. In the iconicity judgment condition, participants were asked whether a word pair appeared in an iconic relationship (i.e., if a word pair appeared in the same configuration as the pair would occur in the world). Participants responded to stimuli by pressing designated yes or no keys on a number pad. Participants were instructed to move and blink as little as possible. Word pairs were randomly presented for each participant in order to negate any order effects. To ensure participants understood the task, a session of five practice trials preceded the experimental session.

## **RESULTS**

We followed prior research (Louwerse, 2008; Louwerse and Jeuniaux, 2010) in identifying errors and outliers. As in those studies, error rates were expected to be high in both the semantic judgment task and the iconicity task. Although some word pairs share a low semantic relation according to LSA, a higher semantic relation may be warranted for at least one meaning of a word (see Louwerse et al., 2006). For example, according to LSA, *rib* and *spinach* have a low semantic relation (cos = 0.07), but in one meaning of *rib* (that of barbecue) such a low semantic relation is not justified (Louwerse and Jeuniaux, 2010). For the semantic judgment task, error rates were unsurprisingly approximately 25% (*M* = 26.07, *SD* = 7.51). Similarly, for the iconicity judgment condition, error performance can also be explained by the task. *Priest* and *flag* are not assumed to have an iconic relation, even though such a relation could be imagined. Error rates were around 25–30% (*M* = 29, *SD* = 8.53). For both the semantic judgment condition and the iconicity judgment condition, these error rates were comparable with those reported elsewhere (Louwerse and Jeuniaux, 2010). Analyses of the errors revealed no evidence for a speed-accuracy trade-off. In the RT analysis, RTs that fell more than 2.5 *SD* from the mean per condition, per subject, were removed from the analysis, affecting less than 3% of the data in both experiments.
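The outlier criterion can be sketched as follows. This is a minimal illustration, not the authors' script: the trial records, field names (`subject`, `condition`, `rt`), and data are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_rts(trials, cutoff=2.5):
    """Drop trials whose RT lies more than `cutoff` SDs from the mean
    of their own (subject, condition) cell."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["subject"], t["condition"])].append(t["rt"])
    kept = []
    for t in trials:
        rts = cells[(t["subject"], t["condition"])]
        if len(rts) < 2:
            kept.append(t)  # SD is undefined for a single trial; keep it
            continue
        m, sd = mean(rts), stdev(rts)
        if sd == 0 or abs(t["rt"] - m) <= cutoff * sd:
            kept.append(t)
    return kept

# Hypothetical data: ten typical trials plus one very slow response.
trials = [{"subject": 1, "condition": "iconic", "rt": 1500 + 10 * i} for i in range(10)]
trials.append({"subject": 1, "condition": "iconic", "rt": 9000})
kept = trim_rts(trials)  # the 9000 ms trial is removed
```

Computing the mean and SD within each subject-by-condition cell keeps a generally slow participant from having all of their trials flagged.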

A mixed effects regression analysis was conducted on RTs with order (*sky* above *ground* or *ground* above *sky*) as a fixed factor and participants and items as random factors (Richter, 2006; Baayen et al., 2008). *F*-test denominator degrees of freedom for RTs were estimated using the Kenward–Roger's degrees of freedom adjustment to reduce the chances of Type I error (Littell et al., 2002). For the semantic judgment condition, differences were found between the iconic and the reverse-iconic word pairs, *F*(1, 2683.75) = 3.7, *p* = 0.05, with iconic word pairs being responded to faster than reverse-iconic word pairs, *M* = 1592.92, SE = 160.46 versus *M* = 1640.06, SE = 159.8. A similar result was obtained for the iconicity judgment condition, *F*(1, 3332.39) = 13.58, *p* < 0.001, again with iconic word pairs being responded to faster than reverse-iconic word pairs, *M* = 1882.87, SE = 155.43 versus *M* = 1980.80, SE = 154.67. This RT advantage has been reported elsewhere (Zwaan and Yaxley, 2003; Louwerse, 2008; Louwerse and Jeuniaux, 2010). What is not clear from these results is whether this effect can be explained by an embodied cognition account (iconicity through perceptual simulations), by a symbolic cognition account (word-order frequency), or by both. As in Louwerse and Jeuniaux (2010), language statistics and perceptual simulations were operationalized using word-order frequency and iconicity ratings.

### **ORDER FREQUENCY**

Language statistics were operationalized as the log frequency of *a*-*b* (e.g., *sky – ground*) and *b*-*a* (e.g., *ground – sky*) order of word pairs (cf. Louwerse, 2008; Louwerse and Jeuniaux, 2010; Louwerse and Connell, 2011). The order frequency of all 64 word pairs within 3–5 word grams was obtained using the large Web 1T 5-gram corpus (Brants and Franz, 2006).
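One way to picture this measure is sketched below. The n-gram records and counts are hypothetical, the function name is illustrative, and adding 1 before taking the log is an assumption made here so that unseen orders map to 0; the text does not state how zero counts were handled.

```python
import math

def order_log_frequency(ngrams, first, second):
    """Log of the summed counts of 3- to 5-grams in which `first` occurs before `second`."""
    total = 0
    for tokens, count in ngrams:
        if not 3 <= len(tokens) <= 5:
            continue  # only 3- to 5-word windows are considered
        if first in tokens and second in tokens:
            if tokens.index(first) < tokens.index(second):
                total += count
    # Assumption: add 1 so that an order that never occurs yields log(1) = 0.
    return math.log(total + 1)

# Hypothetical n-gram records: (tokens, corpus count).
ngrams = [
    (("the", "sky", "and", "ground"), 120),
    (("sky", "above", "the", "ground"), 80),
    (("ground", "to", "sky"), 15),
    (("sky", "ground"), 500),  # bigram: outside the 3-5 word window, ignored
]
sky_ground = order_log_frequency(ngrams, "sky", "ground")   # log(201)
ground_sky = order_log_frequency(ngrams, "ground", "sky")   # log(16)
```

A higher value for the *a*-*b* order than for the *b*-*a* order indicates that language itself favors the iconic order.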

#### **ICONICITY RATINGS**

Twenty-four participants at the University of Memphis estimated the likelihood that concepts appeared above one another in the real world. Ratings were made for 64 word pairs on a scale of 1–6, with 1 being extremely unlikely and 6 being extremely likely. Each participant saw all word pairs, but whether a participant saw a word pair in an iconic or a reverse iconic order was counterbalanced such that each participant saw iconic and reverse-iconic word pairs, but no participant saw a word pair both in an iconic and a reverse-iconic order. High interrater reliability was found in both groups (Group A: average *r* = 0.76, *p* < 0.001, *n* = 64; Group B: average *r* = 0.74, *p* < 0.001, *n* = 64), with a negative correlation between the two groups (average *r* = −0.72, *p* < 0.001, *n* = 64).

#### **RESPONSE TIMES**

A mixed effects regression was conducted on RTs with order frequencies and iconicity ratings as fixed factors and participants and items as random factors. For the semantic judgment condition, a mixed effects regression showed that statistical linguistic frequencies significantly predicted RTs, *F*(1, 760.86) = 24.95, *p* < 0.001, with higher frequencies yielding faster RTs. Iconicity ratings did not yield a significant relation with RT, *F*(1, 762.09) = 0.46, *p* = 0.5 (see the first two bars in **Figure 1**; **Table 1**).

For the iconicity judgment condition, a mixed effects regression showed statistical linguistic frequencies again significantly predicted RT, *F*(1, 945.78) = 5.03, *p* = 0.03, with higher frequencies yielding faster RTs. Iconicity ratings also yielded a significant relation with RT, *F*(1, 947.65) = 5.61, *p* = 0.02, with higher iconicity ratings yielding lower RTs (see the second two bars in **Figure 1**; **Table 1**).

**Figure 1** shows that statistical linguistic frequencies explained RTs in both the semantic judgment and the iconicity judgment conditions, but the effect was stronger in the semantic judgment than in the iconicity judgment condition. **Figure 1** and **Table 1** also show the opposite results for perceptual simulation in that during the semantic judgment condition, the effect of perceptual simulation on RT was limited (and not significant). However, in the iconicity judgment condition, perceptual simulation was significant. The interaction for linguistic frequencies and condition (semantic versus iconic) was significant, *F*(2, 1005.05) = 15.88, *p* < 0.001, as was the interaction for perceptual simulation and condition, *F*(2, 1634.20) = 2.9, *p* = 0.05. Indeed, the overall interaction between factors (linguistic and perceptual) and condition was significant, *F*(2, 1540.18) = 8.10, *p* < 0.001.

**FIGURE 1 | Strength of the mixed effects regressions on the RTs in absolute t-values for each of the two conditions for linguistic (order frequency) and perceptual (iconicity ratings) factors.** Asterisks mark significant strengths (p < 0.05) of relationship with RTs.

Note. Dependent variable is response time; \*\*p < 0.01, \*p < 0.05.

These findings replicate the RT data in Louwerse and Jeuniaux (2010). That is, order frequency better explained RTs than the iconicity ratings did in the semantic judgment task, but iconicity ratings better explained RTs than the order frequency did in the iconicity judgment task.

#### **EEG ACTIVATION**

As discussed earlier, we utilized previously established EEG source localization techniques in conjunction with statistical analyses to determine when and where relative effects of linguistic and perceptual processes occurred. Continuous neural activity was recorded from 14 international 10–20 sites (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4; Reilly, 2005, p. 139). Scalp recordings were referenced to CMS/DRL (P3/P4) locations. All electrode impedances were kept below 10 kΩ. As the Emotiv EPOC headset is noisier than high-end systems, to minimize oculomotor, motor, and electrogalvanic artifacts, a high-pass hardware filter removed signals below 0.16 Hz and a low-pass filter removed signals above 30 Hz (see Bobrov et al., 2011 and Duvinage et al., 2012 for similar filtering ranges with the Emotiv EPOC headset). The EEG was sampled at 2048 Hz and was down-sampled to 128 Hz. Gross eye blink and movement artifacts over 150 µV were excluded from the analysis. All data were wirelessly collected via a proprietary Bluetooth USB chip operating in the same frequency range as the headset (2.4 GHz). Data were recorded using Emotiv Testbench software (Emotiv Systems, Inc., San Francisco, CA, USA).
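The amplitude criterion for gross artifacts can be sketched as a simple threshold check. The data layout (a list of epochs, each a list of channels holding samples in µV) and the function name are hypothetical, and treating the 150 µV limit as an absolute amplitude rather than a peak-to-peak range is an assumption.

```python
def reject_artifacts(epochs, threshold_uv=150.0):
    """Keep only epochs in which no sample on any channel exceeds
    `threshold_uv` in absolute amplitude (assumed interpretation)."""
    return [
        epoch for epoch in epochs
        if all(abs(sample) <= threshold_uv for channel in epoch for sample in channel)
    ]

# Hypothetical epochs: 2 channels x 3 samples each, values in microvolts.
epochs = [
    [[10.0, -20.0, 35.0], [5.0, 12.0, -8.0]],     # clean
    [[10.0, 180.0, 35.0], [5.0, 12.0, -8.0]],     # blink-like positive deflection
    [[10.0, -20.0, 35.0], [5.0, -151.0, -8.0]],   # large negative deflection
]
clean = reject_artifacts(epochs)  # only the first epoch survives
```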

Data were filtered using EEGLAB (Delorme and Makeig, 2004), an open-source toolbox for MATLAB (Mathworks, Inc., Natick, MA, USA). Independent component analyses were implemented using ADJUST, an algorithm that automatically identifies stereotyped temporal and spatial artifacts (Mognon et al., 2010). Any remaining oculomotor or motor activity was visually identified and removed from the dataset.

On average, subjects took 1809 ms to process and respond to the words presented on the screen. Therefore the sLORETA package (Pascual-Marqui, 2002) was used to localize general activity at an early (97–291 ms) and a late (1551–1744 ms) time interval (as we predicted linguistic processes would precede perceptual simulation) in both conditions. The early time period began shortly after presentation of the stimuli and the late time period began shortly before the subject response. LORETA used the MNI152 template (Fuchs et al., 2002) to compute a non-parametric topographical analysis of variance comparing differences between two maps of averaged cortical activity over each time period (Strik et al., 1998). The topographies significantly differed between conditions at early, *p* < 0.01, and late, *p* < 0.01, intervals, with the maximum source for the early time period being found around the left inferior frontal gyrus (iFG; near electrode sites FC5, F7, and T7) and the maximum source for the late time period being found near the lingual gyrus (near electrode sites O1, O2, P7, and P8). As source localization with EEG poorly maps anatomical correlates to function, these sites are obviously approximations of the relevant underlying cortical regions (Nunez and Srinivasan, 2005). Note we are not attempting to pinpoint exact regions of neural activity at a given time but instead we are simply attempting to compare general estimates of neural activity in early versus late processing (i.e., we would like to determine when processing occurs in more linguistic versus in more perceptual regions over the duration of a trial).

Although neural processes are quite distributed and bilaterally activate multiple cortical regions (Bullmore and Sporns, 2009; Bressler and Menon, 2010), there is considerable agreement that specific regions (such as the left iFG and left superior temporal gyrus, STG) consistently show increased activation during language processing (Cabeza and Nyberg, 2000; Papathanassiou, 2000; Blank et al., 2002; De Carli et al., 2007). The same applies to visual perception and visual imagery processes, which bilaterally activate multiple cortical regions, in particular occipital and parietal lobes (Kosslyn et al., 1993, 1999; Alivisatos and Petrides, 1996). Further, visual imagery of words activates these same regions that process incoming perceptual information (Ganis et al., 2004). Reichle et al. (2000) used fMRI to demonstrate that when told to rely on visual imagery while processing linguistic information, subjects were more likely to show increased activation in parietal lobes. As expected, when asked to rely on verbal strategies, activation in traditional language processing regions dominated. Finally, in an fMRI study, Simmons et al. (2008) found that when asked to generate situations in which a word might occur, subjects showed increased activity in the cuneus, precuneus, posterior cingulate gyrus, retrosplenial cortex, and lateral parietal cortex. However, when asked to participate in a word association task, activation occurred in language processing regions of the brain, specifically the lateral left iFG and the medial inferior frontal. During early conceptual processing (first 7.5 s of a 15 s trial), activation was similar to that of the word association task (i.e., these same language processing areas were active). This is consistent with our output from sLORETA in that during early processing, the maximum source was also the left iFG. Unlike early processing, Simmons et al. (2008) found that late conceptual processing (last 7.5 s of a 15 s trial) resulted in activation of the precuneus, posterior cingulate gyrus, and the right lateral parietal cortex (regions all closest to electrodes P7, P8, O1, and O2), the same regions active during situation generation. Although our sLORETA source localization indicated that the maximum source for our late time period was near the lingual gyrus, this region is also in closest proximity to electrode sites P7, P8, O1, and O2.

**Figure 2** shows the activation for a participant averaged across all trials in 100 ms increments. A relatively localized increase in activation in linguistic processing regions began almost immediately after a stimulus was presented. Around the middle of the trial, the activation dispersed from the linguistic processing regions toward perceptual processing regions. Late in the trial, localized activation was relatively greater in perceptual processing regions. This pattern matches the conclusions drawn by Louwerse and Connell (2011) on the basis of RT data and the results obtained through sLORETA, that linguistic processes precede perceptual processes.

To complement the pattern observed in **Figure 2** in both our RT data and in the sLORETA results, we performed a mixed effects regression on electrode activation. We assigned the linguistic cortical regions, as determined by sLORETA localization, a dummy value of 1, and we assigned the perceptual cortical regions, as determined by sLORETA localization, a dummy value of 2. We used electrode activation as our dependent variable, and participant, item, and receptor as random factors. The reason we used individual receptors as random factors was to rule out strong effects that could be observed for one receptor but not for others within the regions commonly associated with linguistic or perceptual processing. With this analysis, our objective was to determine to what extent linguistic or perceptual cortical regions overall showed increased activation throughout the trial. As in the previous analyses, *F*-test denominator degrees of freedom for the dependent variable were estimated using the Kenward–Roger's degrees of freedom adjustment.

For the semantic judgment condition, a significant difference was observed between linguistic and perceptual cortical regions, *F*(1, 1153108.58) = 46.70, *p* < 0.001. A similar pattern was found for the iconicity judgment condition, *F*(1, 1464148.76) = 24.07, *p* < 0.001. The fact that a difference was observed is perhaps uninteresting; differences between linguistic and perceptual regions are expected. Instead, the direction of the effect is important here. Recall that linguistic regions were dummy coded as 1, and perceptual regions were dummy coded as 2. Positive *t*-values would indicate that perceptual regions dominate, and negative *t*-values would indicate that linguistic regions dominate. Based on the findings in the RT analysis reported above, we predicted that linguistic regions would dominate in both the semantic and iconicity task, and more so in the semantic judgment task than in the iconicity judgment task. This prediction is supported by the results; *t*-values in both the semantic and iconicity tasks were negative, as predicted, with larger absolute *t*-values in the semantic task, *t*(1153109) = −6.83, *p* < 0.001, than in the iconicity task, *t*(1464149) = −4.91, *p* < 0.001, replicating the RT findings.
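The sign logic of this coding can be illustrated with a deliberately simplified sketch. It uses ordinary least squares on hypothetical activation values and ignores the random factors of the mixed effects model reported above; it only shows why a negative coefficient corresponds to greater activation in the regions coded 1.

```python
from statistics import mean

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Region coding as in the text: 1 = linguistic, 2 = perceptual.
region = [1, 1, 1, 2, 2, 2]
# Hypothetical activation values (arbitrary units), higher in linguistic sites.
activation = [5.0, 6.0, 7.0, 3.0, 4.0, 5.0]

slope = ols_slope(region, activation)
# With a two-level predictor coded 1 and 2, the slope equals
# mean(perceptual) - mean(linguistic): here 4.0 - 6.0 = -2.0.
```

A negative slope, and hence a negative *t*-value, therefore signals relatively greater activation in the linguistic regions.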

To determine whether linguistic processes precede perceptual simulation processes, we created 20 time bins for each trial per participant, per condition (cf. Louwerse and Bangerter, 2010). Each time bin was therefore approximately 80 ms for the semantic judgment condition and 95 ms for the iconicity judgment condition. Twenty time bins allowed for the largest number of groups for examining trends of each factor while retaining sufficient data points per participant to test the time course hypotheses. Mixed effects models were again run, now with time bin as an added predictor in the model. The *t*-values of the mixed effects models per time bin are shown in **Figure 3A**, **Tables 2** and **3**. The figure shows that *t*-values in both the semantic judgment and the iconicity judgment experiments are predominantly negative in the first half of the trial (suggesting a bias toward cortical regions associated with linguistic processing), and predominantly positive toward the end of the trial (suggesting a bias toward cortical regions associated with perceptual processing). Note here that these are the relative effect sizes for the two clusters of cortical regions (FC5, F7, and T7) and (O1, O2, P7, and P8), with the effects for individual electrodes filtered out. The findings do not show low activation for the perceptual processing areas early on in the trial (as words must of course be recognized by the visual system during processing); these results merely show that, relative to the brain regions associated with linguistic processing, the effect sizes of perceptual processing regions dominate later in the trial. Also note the relative effect for brain regions associated with perceptual processing very early in the trial (time bins 1–4), perhaps in line with the early activation of perceptual simulations (Hauk et al., 2008; Pulvermüller et al., 2009).
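The binning step can be sketched as follows, assuming each trial is a list of samples for one electrode and that bins are consecutive, near-equal slices of the trial. The function name and data are hypothetical.

```python
def bin_samples(samples, n_bins=20):
    """Split one trial's samples into `n_bins` consecutive, near-equal bins
    and return the mean of each bin."""
    n = len(samples)
    if n < n_bins:
        raise ValueError("need at least one sample per bin")
    means = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = samples[start:end]
        means.append(sum(chunk) / len(chunk))
    return means

# Hypothetical trial of 200 samples; at 128 Hz this lasts about 1.56 s,
# so each of the 20 bins holds 10 samples, or roughly 78 ms.
trial = list(range(200))
means = bin_samples(trial)
```

Because the bins are defined per trial, their duration scales with trial length, which is why the approximate bin width differs between the two conditions.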

To further demonstrate the neurological evidence for relatively earlier linguistic processes and relatively later perceptual simulation, we fitted the *t*-values for the 20 time bins using exponential, power law, growth, and sinusoidal models. The fit of the sinusoidal curve was superior to that of the other models in both conditions. **Figure 3B** presents the fit, the standard errors, and the values for the four parameters. The sinusoidal fit converged in four iterations (iconicity task) and five iterations (semantic task) to a tolerance of 0.00001.
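The functional form of the sinusoidal model is not spelled out in the text. One common four-parameter form, given here purely as an assumption for orientation, is:

$$
t(b) = t_0 + A \sin\!\left(\frac{2\pi\,(b - b_0)}{P}\right)
$$

where $b$ is the time bin (1 to 20), $t_0$ is a vertical offset, $A$ is the amplitude, $P$ is the period in bins, and $b_0$ is a phase shift. Under such a form, a curve that runs below zero early in the trial and above zero late in the trial corresponds to the reported shift from linguistic to perceptual dominance.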

Using the sinusoidal model and the parameters derived from the data, the following figure emerged (**Figure 3B**). For both the semantic judgment and the iconicity judgment conditions, linguistic cortical regions dominated initially, followed later by perceptual cortical regions. As **Figure 3B** clearly shows, activation in linguistic cortical regions dominated in the semantic judgment task, whereas activation in perceptual cortical regions was prominent in the iconicity judgment task. Moreover, linguistic cortical regions showed greater activation relatively early in the trial, whereas perceptual cortical regions showed greater activation relatively late in processing. The results from these analyses are in line with results we obtained through both more commonly used source localization techniques and RT analyses, but they give a more detailed view of relative cortical activation for linguistic and perceptual processes throughout each trial.

#### **Table 2 | Regression coefficients semantic judgment task EEG experiment.**

Note. Dependent variable is EEG activation: negative t-values indicate a bias toward linguistic cortical areas, positive t-values a bias toward perceptual cortical areas; \*\*p < 0.01, \*p < 0.05.

## **DISCUSSION**

The purpose of this experiment was to determine neurologically to what extent both linguistic and embodied explanations can account for conceptual processing. The results of a semantic judgment and an iconicity judgment task demonstrated that both language statistics and perceptual simulation explain conceptual processing. Specifically, statistical linguistic frequencies best explain semantic judgment tasks, whereas iconicity ratings better explain iconicity judgment tasks. Our results also showed that linguistic cortical regions tended to be relatively more active overall during the semantic task, and perceptual cortical regions tended to be relatively more active during the iconicity task. Moreover, on any given trial, neural activation progressed from language processing cortical regions toward perceptual processing cortical regions. These findings support the conclusion that conceptual processing is both linguistic and embodied, in both early and late processing; however, when comparing the relative effect of linguistic processes versus perceptual simulation processes, the former precede the latter (see also Louwerse and Connell, 2011).

#### **Table 3 | Regression coefficients for the iconicity judgment task (EEG experiment).**

Note. Dependent variable is EEG activation: negative *t*-values indicate a bias toward linguistic cortical areas, positive *t*-values a bias toward perceptual cortical areas; \*\*p < 0.01, \*p < 0.05.

Standard EEG methods, such as ERP, are extremely valuable when identifying whether a difference in cortical activation can be obtained for different stimuli. The drawback of these traditional methods is that excessive stimulus repetition is required. Moreover, ERP is useful in identifying whether an anomaly is detected (Van Berkum et al., 1999) or whether a shift in perceptual simulation has taken place (Collins et al., 2011), but does not sufficiently answer the question to what extent different cortical regions are relatively more or less active than others. The technique shown here used source localization techniques to determine where differences in activation were present during early and late processing. We then used that information to compare the relative effect sizes of two clusters of cortical regions over the duration of the trial. This method is novel, yet its findings match those obtained from more traditional methods (Simmons et al., 2008; Louwerse and Jeuniaux, 2010; Louwerse and Connell, 2011). This method obviously does not render fMRI unnecessary for localization. In our analyses we compared the relative dominance of different clusters of cortical regions (filtering out their individual effects). Such a comparative technique does not allow for localization of specific regions of the brain; it only allows for a comparison of (predetermined) regions.

How can the findings reported in this paper be explained in terms of the cognitive mechanisms involved in language processing? We have argued elsewhere that language encodes perceptual relations (Louwerse, 2011). Speakers translate prelinguistic conceptual knowledge into linguistic conceptualizations, so that perceptual relations become encoded in language, with distributional language statistics building up as a function of language use (Louwerse, 2008). Louwerse (2007, 2011) proposed the Symbol Interdependency Hypothesis, which states that comprehension relies on both statistical linguistic processes and perceptual processes. Language users can ground linguistic units in perceptual experiences (embodied cognition), but through language statistics they can bootstrap meaning from linguistic units (symbolic cognition). Iconicity relations between words (Louwerse, 2008), the modality of a word (Louwerse and Connell, 2011), the valence of a word (Hutchinson and Louwerse, 2012), the social relations between individuals (Hutchinson et al., 2012), the relative location of body parts (Tillman et al., 2012), and even the relative geographical location of city words (Louwerse and Benesh, 2012) can be determined using language statistics. The meaning extracted through language statistics is, however, shallow, but provides good-enough representations. For a more precise understanding of a linguistic unit, perceptual simulation is needed (Louwerse and Connell, 2011). Depending on the stimulus (words or pictures; Louwerse and Jeuniaux, 2010), the cognitive task (Louwerse and Jeuniaux, 2010; current study), and the time of processing (Louwerse and Connell, 2011; current study), the relative effect of language statistics or perceptual simulations dominates. The findings reported in this paper support the Symbol Interdependency Hypothesis, with the relative effect of the linguistic system being more dominant in the early part of the trial and the relative effect of the perceptual system dominating later in the trial.

The RT and EEG findings reported here are relevant for a better understanding of the mechanisms involved in conceptual processing. They are also relevant for a philosophy of science. Recently, many studies have demonstrated that cognition is embodied, moving the symbolic and embodiment debate toward embodied cognition. The history of the debate (De Vega et al., 2008) is, however, reminiscent of the parable of the blind men and the elephant. In this tale, a group of blind men each touch a different part of an elephant in order to identify the animal, and when comparing their findings learn that they fundamentally disagree because they fail to see the whole picture. Evidence for embodied cognition is akin to identifying the tusk of the elephant, and evidence for symbolic cognition is similar to identifying its trunk. Dismissing or ignoring either explanation is reminiscent of the last lines of the parable: "For, quarreling, each to his view they cling. Such folk see only one side of a thing" (Udana, 6.4). Cognition is both symbolic and embodied; the important question now is under what conditions symbolic and embodied explanations best explain experimental data. The current study has provided RT and EEG evidence that both linguistic and perceptual simulation processes play a role in conceptual cognition, to different extents, depending on the cognitive task, with linguistic processes preceding perceptual simulation.

## **REFERENCES**

functional systems. *Nat. Rev. Neurosci.* 10, 186–198.


analysis of single-trial EEG dynamics including independent component analysis. *J. Neurosci. Methods* 134, 9–21.


representation of knowledge. *Psychol. Rev.* 104, 211–240.


epicenters with PET. *Neuroimage* 11, 347–357.


in conceptual processing. *J. Physiol. Paris* 102, 106–119.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 May 2012; accepted: 14 September 2012; published online: 16 October 2012.*

*Citation: Louwerse M and Hutchinson S (2012) Neurological evidence linguistic processes precede perceptual simulation in conceptual processing. Front. Psychology 3:385. doi: 10.3389/fpsyg.2012.00385 This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Louwerse and Hutchinson. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## **APPENDIX**

## **EXPERIMENTAL ITEMS USED IN EXPERIMENTS 1 AND 2**

Airplane – runway; Antenna – radio; Antler – deer; Attic – basement; Belt – shoe; Billboard – highway; Boat – lake; Boot – heel; Bouquet – vase; Branch – root; Bridge – river; Car – road; Castle – moat; Ceiling – floor; Cork – bottle; Curtain – stage; Eyes – whiskers; Faucet – drain; Fender – tire; Flame – candle; Flower – stem; Foam – beer; Fountain – pool; Froth – coffee; Glass – coaster; Grill – charcoal; Handle – bucket; Hat – scarf; Head – foot; Headlight – bumper; Hiker – trail; Hood – engine; Icing – donut; Jam – toast; Jockey – horse; Kite – string; Knee – ankle; Lamp – table; Lid – cup; Lighthouse – beach; Mailbox – post; Mane – hoof; Mantle – fireplace; Mast – deck; Monitor – keyboard; Mustache – beard; Nose – mouth; Pan – stove; Pedestrian – sidewalk; Penthouse – lobby; Pitcher – mound; Plant – pot; Roof – porch; Runner – track; Saddle – stirrup; Seat – pedal; Sheet – mattress; Sky – ground; Smoke – chimney; Sprinkler – lawn; Steeple – church; Sweater – pants; Tractor – field; Train – railroad

## Using actions to enhance memory: effects of enactment, gestures, and exercise on human memory

## *Christopher R. Madan<sup>1,2</sup>\* and Anthony Singhal<sup>1,3</sup>*

*<sup>1</sup> Department of Psychology, University of Alberta, Edmonton, AB, Canada*

*<sup>2</sup> Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany*

*<sup>3</sup> Centre for Neuroscience, University of Alberta, Edmonton, AB, Canada*

*\*Correspondence: cmadan@ualberta.ca*

#### *Edited by:*

*Judith Holler, Max Planck Institute for Psycholinguistics, Netherlands*

#### *Reviewed by:*

*Harold Bekkering, University of Nijmegen, Netherlands; Judith Holler, Max Planck Institute for Psycholinguistics, Netherlands*

Few would doubt the benefits of exercise for one's physical well-being. However, the benefits of exercise for one's mental abilities are not nearly as extolled. More directly, the perspective that our bodies have a significant influence on our minds is still relatively new, though reviews by Rosenbaum (2005) and Madan and Singhal (2012a) suggest that this is beginning to change. This idea is also in line with the embodied approach to cognition (e.g., Clark, 1997; Lakoff and Johnson, 1999; Wilson, 2002; Anderson, 2003; Barsalou, 2008; Fischer and Zwaan, 2008). Briefly, embodied cognition suggests that physical properties of the human body, particularly the perceptual and motor systems, play an important role in cognition: the body influences the mind just as the mind influences the body. This approach is further supported by findings that individual body properties such as handedness can influence how individuals understand abstract concepts (Casasanto, 2009, 2011). One particularly interesting facet of the idea that our body can affect cognition is the influence of actions, gestures, and exercise on memory performance: the hypothesis is that our physical movements, and even the amount that we exercise, can affect our ability to remember. In the current paper we provide an overview of the disparate research paradigms that support this hypothesis, and their resulting implications.

## **READING, IMAGINING, SEEING, AND DOING**

While the majority of studies investigating human memory use words or images as stimuli, Cohen (1981) asked participants to perform actions, to observe an experimenter perform the actions, or simply to hear/read the instructions for the actions without the actions being performed. Participants were subsequently tested for their ability to recall the actions (e.g., "break the tooth-pick"). In later literature, these conditions were termed self- or subject-performed tasks (SPT), experimenter-performed tasks (EPT), and verbal tasks (VT), respectively. Cohen found that participants were significantly more likely to remember SPTs or EPTs than VTs. Extending this finding, Denis et al. (1991) observed better memory for SPTs than for actions that were imagined, testing imagined actions through both visual and motor imagery. Beyond the two studies described here, the work of Cohen and Engelkamp marked the beginning of a new field of research: memory of action events (for reviews, see Engelkamp and Zimmer, 1989; Engelkamp and Cohen, 1991; Zimmer and Cohen, 2001). The primary finding of this field was "the enactment effect": enhanced memory for SPTs, lending an early source of support to the embodied cognition perspective.

The leading theories proposed to explain the enactment effect were based on two main ideas: (A) performed actions involve much richer and more elaborative representations than mere verbal phrases, and (B) enacted actions engage the motor system whereas other methods of encoding do not. From perspective A, the finding that enactment enhances memory by serving as an elaborative encoding strategy is in line with Craik and Lockhart's (1972) levels-of-processing framework, in which information that is processed more deeply/elaboratively is remembered better than information that is processed only shallowly/superficially. However, the nuances of how enactment enhances memory [e.g., lack of a primacy effect, some resilience to aging-related attenuations; discussed in Engelkamp and Cohen (1991)] support the possibility that engaging the motor system during encoding is dissimilar to how information is usually encoded through sensory-based modalities (e.g., visual, auditory; see Engelkamp and Zimmer, 1984; Zimmer et al., 2000).

While the enactment effect itself is noteworthy, it leads to a broader question: under what circumstances can actions enhance or impair memory? Here, a partial answer can be found in a relatively unrelated field: research into gestures.

## **GESTURE TO REMEMBER**

Gestures are motor actions that often accompany speech, and are intertwined with spoken content (McNeill, 1992; Krauss, 1998; Kelly et al., 2008). Recent findings suggest that gestures may be produced as a type of simulated action that arises when motor activation due to mental imagery processes exceeds a certain threshold (Hostetter and Alibali, 2008; Kelly et al., 2011), in close support of embodied cognition. Additionally, gesturing has been shown to improve problem-solving abilities: conveying the same information through a second, image-based modality decreases working memory load (Morsella and Krauss, 2004; Beilock and Goldin-Meadow, 2010; Cook et al., 2012). Since motor actions enhance memory (enactment effect), it seems reasonable to expect that gesturing may also affect memory encoding and retrieval. Supporting this notion, a number of studies have found that memory is enhanced in participants who observe a speaker who is also gesturing compared to observing a speaker who is not gesturing, or a gesturer who is not speaking (e.g., Thompson, 1995; Kelly et al., 1999). Thus, learning can be enhanced by gesturing even when one does not gesture oneself but simply observes another person gesturing. Considering that studies of the enactment effect found comparable recall rates for SPTs and EPTs, observing a gesture should be comparable to gesturing oneself.

Taking a more direct approach, Cook et al. (2010) presented participants with a series of short vignettes, which participants were then asked to describe in detail. The vignettes were then classified according to whether or not they elicited gestures during their description. Participants were given surprise free recall tasks after a brief delay and again after a 3-week delay. Recall rates were higher for vignettes whose descriptions were accompanied by gesturing, at both the immediate and the delayed test, offering support for the notion that gestures can enhance learning and memory. In a subsequent experiment, enhanced memory performance was found even when participants were explicitly instructed either to gesture or not to gesture, rather than being allowed to gesture spontaneously. Stevanoni and Salmon (2005) found similar results with children, and recent studies have further investigated the influence of gestures on learning and memory (e.g., Straube et al., 2008; Macedonia et al., 2011; So et al., 2012).

Considering that motor actions can enhance memory for specific information, both through enactment and through gesturing, a further question is whether they can also enhance overall memory ability. In other words, can physical exercise enhance an individual's memory capacity?

## **WORKING OUT YOUR BODY TO EXPAND YOUR MIND**

While the idea that physical exercise could increase memory recall ability is a recent focus of research, the effect was demonstrated several decades ago in older adults (Powell, 1974; Diesfeldt and Diesfeldt-Groenendijk, 1977), and exercise has even been shown to lead to enhanced memory abilities as much as one year later (Perrig-Chiello et al., 1998). More recently, daily physical exercise has been shown to reduce the cognitive decline associated with aging as well as reduce the risk of developing Alzheimer's disease (Buchman et al., 2012).

Apart from research on older adults specifically, there is a considerable body of research on the effects of exercise on cognitive performance. Unfortunately, a comprehensive examination of the literature reveals inconsistent findings, with some studies finding an enhancement of cognitive ability due to exercise while others report impairments. A detailed review by Tomporowski (2003) resolves these inconsistencies by accounting for the nature of the physical activity used: intensive exercise to the point of dehydration leads to impairments in cognitive performance, while less intensive, aerobic exercise leads to enhanced performance, including enhanced memory ability. In addition to behavioral measures of enhanced memory, structural MRI scans of the brain before and after week-to-month-long exercise protocols have also shown increased hippocampal volume due to the exercise intervention (Pereira et al., 2007; Erickson et al., 2011), extending the results of a number of prior findings in rodents (e.g., Uysal et al., 2005; Pereira et al., 2007; Wu et al., 2007; Clark et al., 2011). While it appears clear that exercise has beneficial effects on memory and hippocampal neurogenesis, it should also be noted that the benefits of exercise on cognition are not confined to memory or the hippocampus, but extend to a wider range of cognitive processes, particularly executive function, and to the prefrontal cortex and anterior cingulate cortex (see Hillman et al., 2008, for a review).

Considering the long-term effects of physical exercise on memory ability, as well as the structural changes observed in hippocampal volume, it is worth considering whether the opposite is also true: would obesity be correlated with decreases in memory performance? One established indicator of obesity is the body mass index (BMI). Lending some support to this hypothesis, Trakas et al. (2001) found that obese individuals self-reported being more forgetful. While this is in line with our prediction, this result alone is insufficient to evaluate whether obesity affects memory ability itself, or perhaps just memory confidence or other related processes. However, drawing conclusions from two studies that tested this hypothesis directly with batteries of cognitive tasks (Elias et al., 2003; Gunstad et al., 2006), it appears that the answer is "yes", at least in some cases. In a preliminary analysis, Elias et al. (2003) found no difference in cognitive performance measures between normal-weight and overweight individuals and thus grouped the data for these participants together as "non-obese". When memory performance was compared for non-obese and obese participants, the obese participants performed worse in some memory tasks, but the effect was only observed in males. Gunstad et al. (2006) classified participants as normal, overweight, or obese and found significant impairments in a variety of memory tasks that correlated with BMI (and did not interact with age). Other studies have used longitudinal analyses; however, results are mixed, with some studies finding a relationship (Brubacher et al., 2004) and others finding no correlation (Cournot et al., 2006). Recent research further suggests that hippocampal neurogenesis may also be influenced by diet, insulin levels, and genetic factors (Brubacher et al., 2004; Lindqvist et al., 2006; Nichol et al., 2009; Wallner-Liebmann et al., 2010; Clark et al., 2011; Grillo et al., 2011). While these results are likely not enough to make you think twice about skipping a run to watch TV, they do suggest that our mind and body may be more closely connected than previously thought, and they extend the boundaries commonly applied to embodied cognition.
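For readers unfamiliar with the measure, BMI is body weight in kilograms divided by the square of height in metres. The sketch below computes it and applies the conventional adult cut-offs of 18.5, 25, and 30; these cut-offs and all function names are assumptions made for illustration, and the studies cited above may have used different criteria. The last helper mirrors the pooling of normal-weight and overweight participants into a "non-obese" group.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value using conventional adult cut-offs (assumed here)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


def obesity_group(value):
    """Two-way split that pools every category below the obesity cut-off."""
    return "obese" if bmi_category(value) == "obese" else "non-obese"
```

For example, a person weighing 85 kg at a height of 1.75 m has a BMI of about 27.8, which counts as overweight under these cut-offs but falls into the pooled "non-obese" group.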

## **MOVING FORWARD**

Taken together, these unrelated lines of research all lead to one conclusion: our minds and our bodies are more connected than previously thought, and we should not choose between honing *either* our mind or our body. Related research can support this conclusion even further, where movement-related properties (e.g., "affordances", see Gibson, 1977, 1979) of objects and even words can influence how we process information (e.g., Handy et al., 2003; see Madan and Singhal, 2012a, for a review). In particular, recent findings suggest that the motoric properties of words representing objects, i.e., word manipulability, and how these words are processed can also influence verbal processing (Rueschemeyer et al., 2010; also see Just et al., 2010) as well as enhance memory recall (Madan and Singhal, 2012b).

In addition to supporting embodied cognition, the idea that actions enhance memory is also well in line with the motor chauvinist perspective (Wolpert et al., 2001), where it is hypothesized that the brain and, in turn, cognitive function may have evolved to facilitate an organism's ability to move within its environment (also see Glenberg, 1997; Gallese and Sinigaglia, 2010; Madan and Singhal, 2012a). If this viewpoint is correct, one would predict that movements should enhance memory function. That is, actions that are executed should be remembered better than those that are read about. Ideas that are communicated in parallel with actions (e.g., gestures) should be remembered better than those that are communicated in the absence of movement. And general memory ability should be enhanced by physical exercise. Current evidence suggests that all these predictions are valid.

An important idea that has emerged in cognitive science is that the body influences the mind. The embodied cognition approach suggests that motor output is integral to cognition, and the converging evidence from multiple avenues of research further indicates that the role of our body in memory processes may be much more prevalent than previously believed. The extent of this influence cannot be overstated, and it has implications for all memory research. For instance, the gesture literature suggests that if a participant were to use gestures while engaged in paired-associate learning, the results could be contaminated with variability due to the gesturing itself. Even more broadly, the majority of studies of memory rely on motor actions to provide behavioral measures of cognition, usually in the form of a button/key press. For example, a widely applied paradigm in cognition, the go/nogo task, requires overt motor responses on some trials and overt inhibition of motor processes on others. However, if the interaction between cognition and motor action is not a one-way process (that is, if the action also influences the memory, perhaps amplifying or attenuating the effect size), there is the potential for other inferences to be drawn about the outcome.

## **ACKNOWLEDGMENTS**

This work was partly funded by a Discovery grant and an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada, held by Anthony Singhal and Christopher R. Madan, respectively.

## **REFERENCES**


perspectives from cognitive neuroscience, developmental psychology and education. *Lang. Linguist. Compass.* 2, 569–588.


elderly volunteers. *Age Ageing* 27, 469–475.


(2010). Insulin and hippocampus activation in response to images of high-calorie food in normal weight and obese adolescents. *Obesity* 18, 1552–1557.


*Received: 18 August 2012; accepted: 29 October 2012; published online: 19 November 2012.*

*Citation: Madan CR and Singhal A (2012) Using actions to enhance memory: effects of enactment, gestures, and exercise on human memory. Front. Psychology 3:507. doi: 10.3389/fpsyg.2012.00507*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Madan and Singhal. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Eyes on the mind: investigating the influence of gaze dynamics on the perception of others in real-time social interaction

#### **Ulrich J. Pfeiffer<sup>1,2</sup>\*, Leonhard Schilbach<sup>1,3</sup>, Mathis Jording<sup>1</sup>, Bert Timmermans<sup>1</sup>, Gary Bente<sup>4</sup> and Kai Vogeley<sup>1,2</sup>**

<sup>1</sup> Neuroimaging Group, Department of Psychiatry, University Hospital Cologne, Cologne, Germany

<sup>2</sup> Institute of Neuroscience and Medicine – Cognitive Neurology (INM3), Research Center Juelich, Juelich, Germany

<sup>3</sup> Max-Planck-Institute for Neurological Research, Cologne, Germany

<sup>4</sup> Faculty of Human Sciences, Department of Media and Social Psychology, University of Cologne, Cologne, Germany

#### **Edited by:**

Judith Holler, Max Planck Institute for Psycholinguistics, Netherlands

#### **Reviewed by:**

Daniel Richardson, University College London, UK; Helene Kreysa, Friedrich-Schiller-University of Jena, Germany

#### **\*Correspondence:**

Ulrich J. Pfeiffer, Neuroimaging Group, Department of Psychiatry, University Hospital Cologne, Kerpener Strasse 62, 50937 Cologne, Germany. e-mail: ulrich.pfeiffer@uk-koeln.de

Social gaze provides a window into the interests and intentions of others and allows us to actively point out our own. It enables us to engage in triadic interactions involving human actors and physical objects and to build an indispensable basis for coordinated action and collaborative efforts. The object-related aspect of gaze in combination with the fact that any motor act of looking encompasses both input and output of the minds involved makes this non-verbal cue system particularly interesting for research in embodied social cognition. Social gaze comprises several core components, such as gaze-following or gaze aversion. Gaze-following can result in situations of either "joint attention" or "shared attention." The former describes situations in which the gaze-follower is aware of sharing a joint visual focus with the gazer. The latter refers to a situation in which gazer and gaze-follower focus on the same object and both are aware of their reciprocal awareness of this joint focus. Here, a novel interactive eye-tracking paradigm suited for studying triadic interactions was used to explore two aspects of social gaze. Experiments 1a and 1b assessed how the latency of another person's gaze reactions (i.e., gaze-following or gaze aversion) affected participants' sense of agency, which was measured by their experience of relatedness of these reactions. Results demonstrate that both timing and congruency of a gaze reaction as well as the other's action options influence the sense of agency. Experiment 2 explored differences in gaze dynamics when participants were asked to establish either joint or shared attention. Findings indicate that establishing shared attention takes longer and requires a larger number of gaze shifts as compared to joint attention, which seems to resemble simple visual detection more closely. Taken together, novel insights into the sense of agency and the awareness of others in gaze-based interaction are provided.

**Keywords: gaze-following, joint attention, shared attention, social interaction, agency, mentalizing, eye-tracking**

## **INTRODUCTION**

The visual system is a major source of information about the environment. In face-to-face social encounters it is not only a source of information but also a crucial means of non-verbal communication. Imagine the following everyday situation: you are sitting at the bar of a pub gazing contemplatively at your empty glass. Suddenly the bartender walks by and observes that your eyes are directed at the empty glass. As soon as you direct your gaze at him and back to the glass he will – without words – understand that you need another drink. Such instances of "social gaze" demonstrate how meaning can be conveyed by simple acts of looking. A considerable amount of research has been devoted to the development and function of social gaze (Argyle and Cook, 1976; Mundy and Newell, 2007; Shepherd, 2010). Gaze represents a non-verbal cue system which reflects perception and action simultaneously, or in which, as Gibson and Pick (1963, p. 368) have noted, "any act of looking can be treated as a source of stimulation as well as a type of response." Its salience in social encounters makes gaze a perfect tool to study "online" social interaction, i.e., face-to-face interaction between two persons in real-time (Schilbach et al., 2011).

Mainly due to methodological constraints, the study of online interaction has largely been neglected by researchers in social cognition (Schilbach et al., in press). In recent years, however, there have been exciting advances to create tools for the investigation of non-verbal and especially gaze-based social interaction (Redcay et al., 2010; Wilms et al., 2010; Staudte and Crocker, 2011; Bayliss et al., 2012). For example, Redcay et al. (2010) established a setup in which participants inside an MRI scanner could either interact face-to-face with an experimenter via a live video feed or watch a recording of the experimenter's behavior during previous interactions, thereby enabling the investigation of the processing of dynamic features of social interaction. Staudte and Crocker (2011) designed a series of experiments in which participants interacted with an artificial agent (i.e., a robot) in order to study the dynamic coupling between gaze and language in verbal human-robot interaction. Recently, Wilms et al. (2010) introduced an interactive eye-tracking setup which allows participants to interact with an anthropomorphic virtual character in a gaze-contingent manner. A similar program has been created recently by another group to study face-to-face interaction in social contexts (Grynszpan et al., 2012).

The advent of virtual reality techniques for research in neuroscience and psychology (Tarr and Warren, 2002; Bohil et al., 2011) has raised the general question of why we need these displays to study human cognition. Bohil et al. (2011, p. 752) have noted that "an enduring tension exists between ecological validity and experimental control" in psychological research. They suggest that virtual reality techniques provide a way out of this dilemma because they provide naturalistic, real-world-like displays whilst offering full control over a selected set of experimental variables. Indeed, studies addressing the validity of using virtual characters have demonstrated that the interaction with virtual agents elicits social behaviors which are similar to real interaction (von der Pütten et al., 2010) and that uncontrolled aspects of another person's outer appearance and non-verbal behavior can be filtered out while participants' overall impression of an interaction remains intact (Vogeley and Bente, 2010). In addition, avatar- and video-mediated communication have been shown to create comparable levels of experienced social presence and intimateness (Bente et al., 2008).

Before such paradigms can be used to study gaze in more complex social scenarios, basic parameters of different processes of social gaze need to be identified. Several of these processes have been defined by Emery (2000): *direct (or mutual) gaze* – a situation where two individuals direct their gaze at each other – is described as the most basic process of social gaze. If one individual detects that the other averts its gaze this can serve as a cue for a *gaze-following* reaction to the other's novel focus of visual attention. This results in a situation of *joint attention (JA)*, in which the gaze-follower is aware that he and the gazer have the same focus of attention – for instance, an object in the environment. In other words, in JA another person's gaze is hence used as a cue to this person's visual attention. This has been argued to represent a crucial prerequisite for the gaze-follower to infer the gazer's mental states (e.g., thoughts, intentions, feelings. . .) regarding an object of joint focus (Gopnik et al., 1994), an ability commonly referred to as mentalizing (Frith and Frith, 2006). Notably, JA does not require the gazer to be aware of the gaze-follower's reaction. In contrast, *shared attention (SA)* requires that *both* individuals are aware of focusing on the same object *and* of each other's reciprocal awareness of this joint attentional focus (Emery, 2000). Moreover, SA has been argued (Moll and Tomasello, 2007) to involve the gazer's intention to direct the other's gaze to a certain object in order to achieve a shared goal or share an experience, thereby providing a behaviorally accessible measure of shared intentionality. Notably, different but often overlapping descriptions of JA or SA exist in the literature (e.g., Clark, 1996; Povinelli and Eddy, 1996; Tomasello et al., 2005; Frischen et al., 2007; Mundy and Newell, 2007). 
The study presented in this article is largely guided by the comparably mechanistic account of Emery (2000), which provides a clear conceptual distinction between JA and SA that is suited to provide empirical access to these processes.

Joint and shared attention constitute so-called triadic social interactions. In contrast to dyadic interactions which develop early in infancy and involve processes such as mutual gaze or reciprocal emotional displays (Stern, 1974), triadic interactions are characterized by involving "the referential triangle of child, adult, and some third event or entity to which the participants share attention" (Carpenter et al., 1998, p. 1). The establishment of reference to a certain aspect of the environment in a triadic interaction thus creates a form of perceptual common ground (Clark, 1996). This is a prerequisite for understanding each other's goals and intentions regarding the object of joint focus. So far, however, the temporal and spatial dynamics of gaze in triadic interactions have not been studied systematically using interactive (i.e., gaze-contingent) paradigms (for discussion, see Becchio et al., 2010; Schilbach et al., in press). Although pictures of objects have been used in gaze cueing studies (Bayliss et al., 2006, 2007; van der Weiden et al., 2010), interactive eye-tracking studies so far have been limited to simple geometric shapes as stimuli (Schilbach et al., 2010; Wilms et al., 2010; Pfeiffer et al., 2011).

Using pictures of real-world objects, the current study employs a more ecologically valid interactive eye-tracking setup to address the following questions: (1) *How does the perception of JA depend on the congruency (i.e., gaze-following and gaze aversion) and latency of another person's gaze reactions?* In Experiments 1a and 1b, the congruency of gaze reactions – gaze-following and gaze aversion – as well as the latency with which these reactions follow participants' gaze shifts were manipulated. To this end, participants interacted with a virtual character in brief triadic interactions in which the character would either engage in joint or in non-joint attention (NJA) with different latencies. After each reaction, participants had to indicate how related they experienced this reaction to their own behavior. We argue that this can be taken as a measure of the *degree* to which participants experienced agency, i.e., that the other's reaction is a consequence of their own action. In its prevalent definition, the sense of agency is described as an all-or-none phenomenon relating to the awareness that we are the initiators of our own actions (de Vignemont and Fourneret, 2004; Synofzik et al., 2008). However, the sense of agency also encompasses an awareness of the consequences (e.g., another person's gaze shifts) inextricably linked to our actions (Bandura, 1989; Pacherie, 2012). As put forward by Pacherie (2012), in social interactions agency experience is not only influenced by high-level cognitive factors and sensorimotor cues, but also by perceptual consequences of one's own actions, including the reactions of another person. Specifically, we hypothesize that participants experience gaze-following (which results in JA) as more strongly related to their own gaze behavior as compared to gaze aversion (which results in disparate attention). 
It is also predicted that the latency of gaze reactions modulates this experience: very short latencies, which might create an experience of coincidental looking, as well as very long latencies, which might disrupt the temporal contingency between actions, were expected to decrease participants' sense of agency. (2) *Does gaze behavior differ in situations of JA and SA?* Although the concepts of JA and SA are theoretically distinct, it has never been tested experimentally whether they correspond to differences in the dynamics of gaze behavior. In Experiment 2, participants engaged in a series of triadic interactions in which they were asked to indicate whenever they experienced JA or SA. We hypothesized that SA requires an increased number of gaze shifts and takes longer to establish as compared to JA.

## **MATERIALS AND METHODS**

In this section, three different experiments will be described. These experiments largely rely on the same materials and methods. For the sake of brevity, those materials and methods that are common to all experiments will be indicated before the procedure of each experiment will be described separately.

#### **PARTICIPANTS**

In sum, 95 healthy female and male persons aged 19–42 years (*M* = 25.86, SD = 6.23), with no record of neurologic or psychiatric illnesses volunteered for the study. The numbers for each individual experiment are given in the description of that particular experiment below. All participants were naïve to the scientific purpose of the study and were compensated for their participation (10 Euro/h). Prior to the experiment, participants were asked to sign a written consent form in which they approved that participation is voluntary and that data are used in an anonymized fashion for statistical analysis and scientific publication. The study followed the WMA Declaration of Helsinki (Ethical Principles for Medical Research Involving Human Subjects) and was presented to and approved by the ethics committee of the Medical Faculty of the University Hospital Cologne, Germany.

#### **SETUP AND MATERIALS**

We made use of an interactive eye-tracking program recently developed (Wilms et al., 2010). This method allows participants to interact with an anthropomorphic virtual character by means of their eye-movements. Using a high resolution eye-tracking device (Tobii™ T1750 Eye-Tracker, Tobii Technology AB, Sweden) with a digitization rate of 50 Hz and an accuracy of 0.5°, participants' eye-movements could be detected exactly. Stimuli were presented on the 17″ TFT screen of the eye-tracker with screen resolution set to 1024 by 768 pixels. Both the participant and the confederate were seated at a distance of 80 cm from their respective eye-tracker as depicted in **Figure 1A**. The viewing angle subtended 32° × 24°. A PC with a dual-core processor and a GeForce 2 MX graphics board controlled the eye-tracker as well as stimulus presentation at a frame rate of 100 Hz. Integrated gaze extraction software (Clearview™, Tobii Technology AB, Sweden) made data available for real-time computation of stimulus presentation to the software package Presentation (Presentation™)<sup>1</sup> which was used to control stimulus presentation in a gaze-contingent manner (for details on the algorithm see Wilms et al., 2010). All data were analyzed using PASW Statistics 20 (SPSS Inc., Chicago, IL, USA)<sup>2</sup>.

#### **STIMULI**

One male and one female anthropomorphic virtual character were used in this study (Schilbach et al., 2010; Pfeiffer et al., 2011). Except for their eyes, the facial features of these characters were static in order to prevent the influence of non-verbal information other than gaze. Male participants interacted with the male character (exemplarily depicted in **Figure 1B**) and female participants with the female character, respectively. The potency of virtual characters to elicit social presence and the advantages of their usage in experiments on social cognition have been demonstrated previously (for detailed discussion, see Loomis et al., 1999; Bailenson et al., 2003; Vogeley and Bente, 2010).

The 32 object stimuli used here were taken from a previously published study (Bayliss et al., 2006) and consist of two different categories of everyday-life objects, i.e., typical "kitchen" and "garage" objects (**Figure 1B**). They were standardized with respect to likeability (*M* = 4.75, SD = 0.97 on a nine-level scale) and to participants' ability to assign them to their respective category (accuracy *M* = 95.3%, SD = 2.66). Each of the objects was used in two different colors (blue and red) and was mirrored to create two different orientations (i.e., the handle pointing to the left or the right). They were presented within a gray rectangle with a size of 306 × 108 pixels. All pictures were analyzed with respect to their size and their luminescence to ensure physical consistency. The manipulations of color and orientation yielded a total of 128 different pictures, which allowed for the presentation of two new pictures in each trial. **Figure 1B** depicts an example of a stimulus screen.
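The picture count above follows directly from crossing the three stimulus attributes. The following is a minimal sketch of that arithmetic; the attribute labels (`handle_left`, `handle_right`) and the use of integer object IDs are illustrative assumptions, not the study's actual file naming.

```python
from itertools import product

# Assumed labels for illustration only; the source specifies the counts
# (32 objects, 2 colors, 2 orientations), not these identifiers.
N_OBJECTS = 32
COLORS = ("blue", "red")
ORIENTATIONS = ("handle_left", "handle_right")

# Every combination of object, color, and orientation is one picture.
pictures = list(product(range(N_OBJECTS), COLORS, ORIENTATIONS))

# Number of trials that can each show two previously unseen pictures.
max_trials_with_new_pairs = len(pictures) // 2
```

With 128 pictures, 64 trials can each present two pictures that have not been shown before.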

<sup>1</sup>http://www.neurobs.com <sup>2</sup>www.spss.com

#### **COVER STORY**

Participants were led to believe that they would engage in a gaze-based interaction task with another participant and that the interaction would not be vis-à-vis but via virtual characters serving as avatars of their gaze behavior. More specifically, participants were instructed that their eye-movements would be conferred to a virtual character displayed on the screen of their interaction partner. Likewise, the eye-movements of their interaction partner would be visualized by a virtual character displayed on their screen. In fact, however, the interaction partner was a confederate of the experimenter and the virtual character's eye-movements were always controlled by a computer program to ensure full experimental control. Participants were debriefed about this manipulation after the experiment and belief in the cover story was controlled during a post-experiment interview.

#### **PROCEDURE**

In the beginning of each experiment the participant and the confederate were seated in front of two eye-tracking devices. Female participants interacted with a female confederate, and male participants with a male confederate, respectively. Subsequently, they received written instructions on the computer screen. A room-divider visually separated both persons. After both of them indicated that they had understood the instructions, the participant's eye-tracker was calibrated. To sustain the cover story, the experimenter pretended to be calibrating the eye-tracker of the interaction partner as well. In addition, during the experiment both persons were asked to wear ear protection so that the participant was not distracted from the task and to make verbal communication impossible.

#### **EXPERIMENT 1A**

The first experiment aimed at assessing at which latencies participants experienced gaze reactions – either gaze-following or gaze aversion – of another person as contingent on their own gaze shifts. It consisted of two main conditions: (1) JA trials in which the virtual character followed the participant's gaze and (2) NJA trials in which the virtual character did not follow the participant's gaze but shifted its gaze toward the other object. In both conditions the latency of the virtual character's gaze reactions was varied from 0 to 4000 ms in steps of 400 ms. This yielded eleven sub-conditions which were repeated eight times throughout the experiment, thereby resulting in a total of 176 trials which were presented in a randomized fashion.

Each trial started with an initiation phase in which participants were instructed to fixate the virtual character. Upon fixation two objects appeared to the left and the right of the virtual character. Participants were asked to shift their gaze to one of these objects as quickly as possible and to wait for the reaction of the virtual character. After the character's gaze reaction the scene remained static for another 500 ms before participants had to indicate by button press how strongly related they experienced the gaze reaction of the other to their own gaze shift on a four-item scale (very related – rather related – rather unrelated – very unrelated). Each trial was followed by a short break in which a fixation cross was presented with a latency jittered between 1000 and 2000 ms. The total duration of the experiment was about 25 min.
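The factorial design of Experiment 1a can be sketched as a randomized trial list. This is a hedged illustration rather than the authors' Presentation script: the `Trial` class, the `build_trials` function, and the choice of a uniform integer jitter for the fixation-cross interval are assumptions (the source states only the 1000–2000 ms range).

```python
import random
from dataclasses import dataclass

LATENCIES_MS = list(range(0, 4001, 400))  # 0, 400, ..., 4000 ms: 11 levels
CONDITIONS = ("JA", "NJA")                # gaze-following vs. gaze aversion
REPETITIONS = 8                           # per condition x latency cell


@dataclass(frozen=True)
class Trial:
    condition: str    # "JA" or "NJA"
    latency_ms: int   # delay of the character's gaze reaction
    break_ms: int     # fixation-cross interval after the trial


def build_trials(seed=None):
    """Return all condition x latency x repetition trials in random order."""
    rng = random.Random(seed)
    trials = [
        # Uniform jitter is an assumption; only the range is given in the text.
        Trial(cond, lat, rng.randint(1000, 2000))
        for cond in CONDITIONS
        for lat in LATENCIES_MS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)
    return trials
```

Calling `build_trials(seed=1)` yields 2 × 11 × 8 = 176 trials, matching the total reported above. Dropping `"NJA"` from `CONDITIONS` and setting `REPETITIONS = 16` gives a JA-only design with the same total.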

In this experiment, 30 volunteers participated, out of which 27 (Mean age = 27.63, SD = 6.29, 15 female/12 male) entered the analysis. Two had to be excluded from data analysis because of technical problems and another one due to disbelief in the cover story.

#### **EXPERIMENT 1B**

In order to enhance participants' sensitivity to the timing of *gaze-following*, Experiment 1a was repeated without the NJA condition, that is, the virtual character followed participants' gaze in *all* trials. Participants were told that their putative interaction partner had been instructed to always look at the same object. As each sub-condition (i.e., reaction latencies from 0 to 4000 ms in steps of 400 ms) was repeated 16 instead of eight times, Experiment 1b did not differ structurally from Experiment 1a.

There were 24 participants in this experiment. Only 21 (Mean age = 23.86, SD = 5.74, 14 female/7 male) were included in the analysis as two had to be excluded due to technical problems and one due to disbelief in the cover story.

#### **EXPERIMENT 2**

The aim of this experiment was to assess whether the theoretically proposed processes of JA and SA differ with respect to the interaction dynamics. The experimental design contained a between-subject and a within-subject factor. The within-subject factor was the order of initiation of the interaction sequence (self-initiated vs. other-initiated) and the between-subject factor was task instruction (JA vs. SA). Prior to the experiment, participants were assigned in a randomized but gender-balanced fashion to either a JA or a SA group. In the JA group, participants were instructed to press a response button as soon as *they themselves were aware that both they and their interaction partner directed their attention to the same object*. In the SA condition, participants were asked to press the button as soon as *they were convinced that both of them were aware of each other directing their attention to the same object*. Particular caution was exerted to avoid any explanation that went beyond the descriptions written in italics above and any cues toward the theoretical concepts of JA and SA or related psychological processes.

In both JA and SA groups, the order of initiation of the interaction sequence (i.e., the within-subject factor) was manipulated block-wise. The initiator of a trial is the person who is the first to fixate one of the two objects on the screen. Participants either started with the self-initiated block in the first half of the experiment and then proceeded to the other-initiated block in the second half or vice versa. To avoid sequence effects, participants started with the self- or other-initiated block in an alternating fashion. Each block consisted of 32 trials. In the beginning of each trial two objects were shown for 3000 ms on the left and the right side of the screen so that participants could become acquainted with them and subsequently concentrate on the interaction task. After the acquaintance period the virtual character appeared in the center of the screen. This served as a cue to the initiation of the interaction. Participants were instructed that the establishment of mutual gaze with the virtual character was a prerequisite for the interaction sequence to start. Depending on the experimental block, there were two ways the interaction period could be initiated. (1) In trials of the self-initiated block participants were told to choose one object by fixating it and the virtual character followed their gaze. (2) In contrast, in trials of the other-initiated block the virtual character commenced the interaction by shifting its gaze to one of the objects. Participants were instructed to follow its gaze. As soon as the first gaze fixation on the virtual character (in the self-initiated condition) or on the chosen object (in the other-initiated condition) was detected, the dynamic interaction period started. When the participant looked at the virtual character, it responded by shifting its gaze to the participant to establish eye contact. When the participant looked back at the object, the virtual character followed his or her gaze. 
Gaze reactions of the virtual character followed with a latency that was jittered between 400 and 800 ms (i.e., latencies experienced as "natural" for human gaze reactions according to Experiments 1a and 1b). This interaction continued until participants – depending on the group they had been assigned to – indicated the experience of JA or SA (as described above) by pressing a button and thereby ending the current trial.
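The gaze-contingent rule of the dynamic interaction period can be summarized in a few lines. The sketch below is an illustration under stated assumptions, not the algorithm of Wilms et al. (2010): the function name `character_reaction`, the region labels, and the uniform distribution of the jitter are hypothetical; the source specifies only the two response rules and the 400–800 ms range.

```python
import random


def character_reaction(participant_target, rng=random):
    """Map one participant fixation to the character's gaze reaction.

    participant_target: "character", "left", or "right" (assumed labels for
    the three regions of interest on the screen).
    Returns (character_gaze_target, latency_ms).
    """
    if participant_target == "character":
        # Participant looks at the character: establish eye contact.
        new_gaze = "participant"
    elif participant_target in ("left", "right"):
        # Participant looks at an object: the character follows.
        new_gaze = participant_target
    else:
        raise ValueError(f"unknown fixation target: {participant_target!r}")
    # Jitter range from the text; the uniform distribution is an assumption.
    latency_ms = rng.uniform(400, 800)
    return new_gaze, latency_ms
```

In a running experiment this function would be called once per detected fixation, and the trial would end when the participant presses the response button.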

Overall, 43 participants participated in the study. As three of them were excluded due to technical problems, only 40 of them (Mean age = 24.75, SD = 5.15, 20 female/20 male) were included in the analysis.

## **RESULTS**

#### **EXPERIMENT 1A**

The ratings of relatedness of the avatar's gaze reactions are depicted in **Figure 2A**. A two-way ANOVA for repeated-measures with the factors gaze reaction (joint vs. non-joint) and latency (0–4000 ms in steps of 400 ms) showed a main effect of gaze reaction: as expected, gaze-following reactions resulting in JA were experienced as more related to participants' gaze shifts as compared to gaze aversion resulting in NJA, *F*(1, 26) = 67.09, *p* < 0.001. In addition, there was a main effect of latency on participants' ratings of relatedness, *F*(5.83, 92.54) = 5.38, *p* = 0.001 (Greenhouse–Geisser corrected, ε = 0.36, due to a violation of the assumption of sphericity). For both joint and NJA trials, participants rated immediate reactions with a latency of 0 ms as considerably less related to their own gaze shift than reactions with higher latencies. In addition, ratings of relatedness seemed to decrease linearly for latencies greater than 800 ms (see also the "Combined Analysis of Gaze-Following in Experiments 1a and 1b" below). There was no significant interaction between these two factors, *F*(6.3, 163.76) = 1.26, *p* = 0.28.

#### **EXPERIMENT 1B**

**Figure 2B** shows the ratings of relatedness of the avatar's gaze reaction to participants' own gaze shift as a function of the latency of the reaction. A one-way repeated-measures ANOVA revealed that, similar to the results of Experiment 1a, there was a main effect of latency on participants' rating of relatedness of the other's gaze reaction, *F*(17.07, 54.87) = 26.78, *p* < 0.001 (Greenhouse–Geisser corrected, ε = 0.27). This effect was described by a highly significant linear trend, *F*(1, 20) = 53.14, *p* < 0.001, indicating a continuous decrease of relatedness ratings with increasing latency of gaze reactions.

#### **COMBINED ANALYSIS OF GAZE-FOLLOWING IN EXPERIMENTS 1A AND 1B**

In a separate set of analyses, we focused only on JA and compared the JA trials from Experiment 1a to Experiment 1b. The crucial difference between these two experiments was that in Experiment 1a the putative interaction partner had an additional option to react and could also avert his/her gaze, whereas in Experiment 1b the virtual character would always follow participants' gaze, which participants were informed of during the instruction. In order to assess the influence of a second option to react on the perception of latency of gaze-following, we conducted a two-way repeated-measures ANOVA including only the JA trials from Experiment 1a and all trials from Experiment 1b with experiment as a between-subjects factor. There was a significant interaction between the factors experiment and relatedness rating, *F*(4.27, 196.3) = 11.02, *p* < 0.001 (Greenhouse–Geisser corrected, ε = 0.43). As **Figure 2B** shows, ratings from Experiment 1b (open circles), which consisted only of JA trials, suggest that participants experience gaze-following reactions as most related to their own gaze shift when they follow with a latency of 400 ms (*M* = 3.26, SD = 0.68). In Experiment 1a (filled circles) ratings for gaze reactions with a latency of 400 ms were significantly lower (*M* = 2.86, SD = 0.61), as shown by a *t*-test for independent samples, *t*(46) = −2.16, *p* = 0.038. Here, visual inspection of data suggests that maximum relatedness ratings were not reached before 800 ms. Furthermore, in Experiment 1b there was a continuous linear decrease of relatedness ratings beginning at 400 ms. This was confirmed by a highly significant linear trend, *F*(16.06, 42.67) = 53.14, *p* < 0.001, which is absent in the data of Experiment 1a, *F*(0.47, 17.49) = 0.7, *p* = 0.41. Taken together, these results suggest that when the interaction partner has no other choice but to follow participants' gaze, relatedness ratings peak earlier as compared to a context in which the other can either react by gaze-following or by gaze aversion. In addition, participants are less sensitive to the latency of gaze-following in the context of action alternatives.

[Figure 2 caption, partial: … in JA, only the latency of the gaze reaction is varied. For better comparability, the joint attention data of Experiment 1a (JA in the context of NJA as another option to act) are plotted together with the data from Experiment 1b (JA only).]
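The independent-samples comparison at 400 ms can be approximately re-derived from the reported summary statistics. The sketch below assumes a pooled-variance (Student) *t*-test, which is consistent with the reported 46 degrees of freedom, and takes the group sizes from the participant counts of Experiment 1a (27) and Experiment 1b (21). The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups from means, SDs, and sizes."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Experiment 1a (JA trials, 400 ms) vs. Experiment 1b (400 ms).
t_400, df_400 = pooled_t(2.86, 0.61, 27, 3.26, 0.68, 21)
```

With the rounded means and standard deviations this gives a *t* value of about −2.14 on 46 degrees of freedom, close to the reported −2.16. The small gap is plausibly due to rounding of the published descriptives.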

#### **EXPERIMENT 2**

An independent samples *t*-test indicated that significantly more gaze shifts were required to reach a situation of shared (*M* = 2.55, SD = 1.26) as compared to JA (*M* = 1.23, SD = 0.35). Furthermore, standard deviations indicate that the inter-individual variance was much higher in SA. This between-subject variance is also depicted in the box plot in **Figure 3A**. Importantly, the establishment of mutual gaze was a prerequisite for the initiation of the interaction to ensure that scan paths always began with a fixation of the virtual character. The increased number of gaze shifts also resulted in significantly longer trial durations in shared (*M* = 3886.39 ms, SD = 1838.91 ms) vs. JA (*M* = 2040.11 ms, SD = 974.64 ms), *t*(28.89) = −3.97, *p* < 0.001, *r* = −0.58. Interestingly, in JA participants showed significantly more gaze shifts in self-initiated trials (*M* = 1.41, SD = 0.68) compared to other-initiated trials (*M* = 1.07, SD = 0.10), *t*(19.79) = 2.18, *p* = 0.042, *r* = 0.33, while there was no such effect of initiation in SA, *t*(38) = 0.24, *p* = 0.81 (see **Figure 3B**), indicating that only the gaze dynamics of JA were influenced by the initiation of the interaction.
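The fractional degrees of freedom in the trial-duration comparison suggest a test that does not assume equal variances. As a hedged check, the sketch below applies Welch's *t*-test to the reported means and standard deviations. It assumes 20 participants per group, which is not stated explicitly but would follow from 40 analyzed participants split evenly between the JA and SA groups. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Trial duration in ms: JA group vs. SA group (n = 20 each is an assumption).
t_dur, df_dur = welch_t(2040.11, 974.64, 20, 3886.39, 1838.91, 20)
```

Under these assumptions the result is approximately *t*(28.89) = −3.97, in line with the values reported above.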

## **DISCUSSION**

The present study introduced a novel interactive eye-tracking paradigm suitable to study multiple facets of triadic interactions between two agents and real-world objects in real-time. On a methodological level, this provides an important complement to previous work by our group which has not involved real objects but rather concentrated on the dyadic aspects of gaze-following and JA (Schilbach et al., 2010; Wilms et al., 2010; Pfeiffer et al., 2011). This methodological advancement was used for the empirical investigation of temporal and dynamic aspects of social gaze as a socially salient form of embodied actions with great ecological validity. In Experiments 1a and 1b, participants' sense of agency was measured as a function of both the congruency and latency of another person's gaze reaction. In Experiment 2, differences in gaze dynamics and trial duration resulting in JA and SA were examined. These results provide interesting insights into gaze behavior and the experience of gaze reactions in an ecologically valid but experimentally controllable setting. Conceptual as well as methodological implications are discussed in the following.

#### **EFFECTS OF THE CONGRUENCY OF GAZE REACTIONS**

Experiments 1a and 1b investigated how related participants experienced different latencies of gaze reactions to their own gaze behavior by varying these latencies and the congruency of reactions (i.e., gaze-following vs. gaze aversion) systematically. In the following, we suggest that the experience of relatedness can be taken as a measure of the sense of agency (Pacherie, 2012).

It was first predicted that the congruency of the other's gaze reaction (gaze-following vs. gaze aversion) strongly influences participants' sense of agency, as measured by their experience of relatedness. Indeed, results indicated that gaze-following is experienced as more strongly related to one's own gaze shifts as compared to gaze aversion. It is highly plausible that this relates to a positive valence that has been associated with gaze-following in comparison to gaze aversion. The literature provides indirect evidence for positive and negative evaluations of gaze-following and gaze aversion, respectively. In a recent study aiming at unraveling participants' expectations regarding the behavior of a human interaction partner, we asked participants to interact with a virtual character in a similar interactive eye-tracking setup as in the present study (Pfeiffer et al., 2011). In order to distinguish social from non-social interaction, participants were led to believe that in any given interaction block consisting of a number of gaze trials the virtual character could either be controlled by another person or a computer algorithm. Their task was to decide based on the virtual character's gaze reactions whether they had been interacting with a human or a computer. Unbeknownst to participants, the reactions were always controlled by a computer algorithm to allow full experimental control. Results demonstrated that the proportion of human ratings increased linearly with increasing numbers of gaze-following trials in an interaction block, thereby indicating that in such simple gaze-based interactions, gaze-following and JA are taken as most indicative of true social interaction. This supports the present finding that gaze-following results in an enhanced experience of agency as expressed by higher ratings of self-relatedness.

Another set of studies emphasizes the positive valence of gaze-following in contrast to gaze aversion. A recent study used interactive eye-tracking in an MRI scanner to compare other- and self-initiated situations of JA and NJA and demonstrated a specifically positive valence of self-initiated JA (Schilbach et al., 2010). Results indicated that self-initiated JA correlates with activity in the ventral striatum, a brain region which is a part of the brain's reward system and whose activation has been linked to hedonic experiences (Liu et al., 2007). There is also evidence for negative affective evaluations of gaze aversion. For example, Hietanen et al. (2008) showed in an EEG study that watching pictures of persons averting their gaze leads to avoidance-related neural activity, whereas watching pictures of persons with direct gaze correlated with approach-related signals. Furthermore, persons who avert their gaze are judged as less likeable and attractive as compared to persons exhibiting direct gaze (Mason et al., 2005) and gaze aversion is understood as a non-verbal cue to lying and insincerity (Einav and Hood, 2008; Williams et al., 2009). It is conceivable that the intrinsically rewarding nature of initiating social interaction by leading someone's gaze in combination with the implicitly negative evaluation of averted gaze plays a prominent role in the increased feeling of relatedness for gaze-following as compared to gaze aversion.

#### **THE INFLUENCE OF REACTION LATENCIES AND ACTION POSSIBILITIES ON THE EXPERIENCE OF GAZE REACTIONS**

We hypothesized that, while very short latencies might be perceived as coincidental, reactions with long latencies might be experienced as non-contingent upon one's own behavior. Indeed, the most obvious finding was that in all conditions reactions with a latency of 0 ms were experienced as considerably less related than the subsequent latency levels of 400 and 800 ms. This result is plausibly explained by the fact that a certain minimal delay needs to be present until a reaction can be experienced as causally linked to (or launched by) any given preceding action and not just as mere coincidence (Scholl and Tremoulet, 2000). Literature suggests that the natural latency of normal saccades (i.e., not express saccades) to any form of visual displacement on a screen is between 200 and 250 ms (Saslow, 1967; Yang et al., 2002). Although our results do not precisely show at which latencies a reaction is experienced as merely coincidental, it is conceivable that saccadic latencies are implicitly taken into account in participants' ratings of relatedness and that gaze reactions with latencies below 250 ms are therefore considered unrelated. However, further experiments are needed to investigate in detail how latencies of gaze reactions between 0 and 400 ms are experienced.

Notably, however, the experience of different latencies of a gaze-following reaction appears to depend on the other person's options to act. When the other person can choose to follow or to avert her eyes, there is hardly any effect of latency on the experience of relatedness and even reactions with a substantial delay of 4000 ms are experienced as rather related. In contrast, when the other person always engages in gaze-following, relatedness ratings decrease linearly starting at a latency of 400 ms. Furthermore, reactions with latencies of more than 2000 ms are experienced as unrelated to one's own gaze shifts – they fall below the dashed line symbolizing a neutral rating in **Figure 2B**, and thereby reach the level of unrelatedness that is associated with NJA.

The effect of the other person's options for action is interesting in that it throws new light on the role of perceived causality for one's sense of agency, which traditionally has to do with predicting the sensory consequences (avatar gaze shift) of self-produced actions (own gaze shift). This means that in a joint context, whereas my sensorimotor cues with respect to my own action remain identical to non-joint situations, I perceive the consequences of my actions *in the actions of the other person*. Therefore, the nature of the other person's behavior will have a bearing on my experience of self-agency. In particular, as Pacherie (2012) notes, the strength of the sense of agency is related to how well our predictions regarding another person's reaction to our own actions match with the actual reaction. This is specifically true in small-scale interactions – as in our experiments – in which every aspect of the interactors' behavior is accessible. Rather than investigating sense of agency in an all-or-none fashion, we therefore interpreted participants' ratings of relatedness of the other's gaze reaction as a measure of how strongly they experienced agency in a given gaze trial.

Adopting this view of agency, the results of experiments 1a and 1b could reflect the role of perceived causality for one's sense of agency. Haggard et al. (2002) have suggested that sense of agency depends crucially on the intentionality of the agent and found that it decreases with increasing action-outcome delays, as it does in Experiment 1b, and to a lesser degree in Experiment 1a. Subsequent research has shown that not only intentionality, but also *perceived* causality is crucial for the sense of agency. Buehner and Humphreys (2009) found that, when keeping action-outcome constant, given a strong perceived causal link, intentional binding was preserved at action-outcome delays of up to 4 s, as in Experiment 1a. However, there is a less persistent sense of agency in Experiment 1b although the actual causal link is stronger due to the avatar always following my gaze. This could mean that perceived causality is less important for my sense of agency in an interactive context. More plausibly, it could be that in an interactive context, since I am dealing with another agent, the evaluation of my own actions as causally efficacious is only meaningful *when I know that the other has different options for action*. Put otherwise, if I have to evaluate my own sense of agency, *given* that the effect is observed in the behavior of *another* agent, my judgment could be influenced crucially by the sense of agency I am able to attribute to the other (as suggested in Schilbach et al., in press). Further research is needed to look at the interdependency of one's sense of agency for self and other in interaction, but the data from the first experiment show that there is a difference between how sense of agency is experienced in social as compared to non-social situations.

## **DIFFERENCES IN GAZE DYNAMICS BETWEEN JOINT AND SHARED ATTENTION**

In Experiment 2, the dynamics of gaze behavior in situations of JA and SA were assessed while making use of the temporal parameters uncovered in Experiment 1b. As described in the introduction, the necessary criteria for *joint attention* require only one of the interaction partners to be aware of the joint focus of attention. *Shared attention*, however, requires *both* gazer and gaze-follower to be simultaneously aware of focusing on the same object *and* of each other's awareness of focusing on the same object (Emery, 2000). Results clearly indicate that participants required a significantly higher number of gaze shifts between objects and the virtual character in order to establish SA as compared to JA. As a consequence, trial length was considerably longer. JA required only slightly more than one gaze shift on average and was reached significantly earlier in self- vs. other-initiated trials. This indicates that participants were able to make inferences about the emergence of JA by focusing on the object and seemingly observing their partner's gaze reaction at the same time. Due to the impossibility of fixating two spatially separated objects simultaneously, these data demonstrate that a peripheral and quick recognition of the other's gaze reaction is sufficient for the establishment of JA. In contrast to SA, the establishment of JA happens rapidly and is characterized by considerably less inter-individual variance (see **Figure 3A**). This suggests that JA is characterized by the mere detection of the other's focus of attention, thereby possibly representing a visual detection task rather than a mentalizing task. Unfortunately, it is not possible to compare reaction times directly between the present results and findings on visual detection. Previous studies have not used interactive settings but concentrated on the detection of objects in real-world scenes (Biederman, 1972) or on the detection of gaze direction in static displays (Franck et al., 1998). Using interactive eye-tracking, however, the link between JA and visual detection could now be assessed specifically.

In contrast, such an observation of the other's gaze behavior "out of the corner of the eyes" appears to be insufficient for a reliable identification of a situation of SA. It has previously been argued that SA might be characterized by an increased level of interactivity (Staudte and Crocker, 2011). According to Kaplan and Hafner (2006), true SA requires a monitoring and understanding of the intentions of the other in a coordinated interaction process and is only reached when "both agents are aware of this coordination of 'perspectives' toward the world" (Kaplan and Hafner, 2006, p. 145). The increased number of gaze shifts between the virtual character's face and the object and the correlated increase in trial length are indicative of such a coordinated interaction aimed at an alignment of intentions. Determining whether another person is aware of the object jointly focused upon as well as of "us" being aware of us being aware requires thinking about the other's mental states. This is reflected by the dynamics of gaze behavior, which exceed the simple detection of a gaze shift to a joint focus of attention. In the vast majority of trials in the JA condition there is not a single look back to the virtual character's face, while this is practically always the case in the SA condition (**Figure 3**): participants have to re-establish eye contact at least once before they indicate that they experience SA. It has recently also been shown in an interaction task within a minimalist virtual environment that higher complexity and reciprocity in the dynamics of a *tactile* interaction lead to the experience of interacting with another human agent (Auvray et al., 2009). The experience of non-verbal social interaction therefore more generally seems to hinge upon certain elaborate dynamics between actions and reactions.
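The gaze measures discussed above (number of gaze shifts, and looks back to the face after the object has been fixated) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the region labels `"face"` and `"object"`, and the two example sequences are assumptions introduced purely for illustration, and a trial is assumed to be already reduced to an ordered list of fixated regions.

```python
from typing import List


def count_gaze_shifts(rois: List[str]) -> int:
    """Count transitions between different regions in a fixation sequence.

    Consecutive fixations on the same region do not count as a shift.
    """
    return sum(1 for prev, cur in zip(rois, rois[1:]) if prev != cur)


def count_looks_back(rois: List[str], face: str = "face", obj: str = "object") -> int:
    """Count returns to the face after the object has been fixated at least once."""
    seen_object = False
    looks_back = 0
    prev = None
    for roi in rois:
        if roi == obj:
            seen_object = True
        elif roi == face and seen_object and prev != face:
            looks_back += 1
        prev = roi
    return looks_back


# Hypothetical fixation sequences (made up, not data from the study).
ja_like = ["face", "object", "object"]                  # no look back to the face
sa_like = ["face", "object", "face", "object", "face"]  # repeated re-establishing of eye contact

print(count_gaze_shifts(ja_like), count_looks_back(ja_like))
print(count_gaze_shifts(sa_like), count_looks_back(sa_like))
```

Under these assumptions, a JA-like trial yields a single gaze shift and no look back, whereas an SA-like trial yields several shifts and at least one return to the face, mirroring the qualitative pattern described in the text.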

A final observation refers to the substantial inter-individual variance in the number of gaze shifts participants exhibit before indicating the experience of SA (cf. **Figure 3A**). This suggests that gaze behavior as an embodied correlate of mentalizing is subject to greater inter-individual differences as compared to gaze behavior in a visual detection task. Literature suggests that inter-individual differences in personality traits and behavioral dispositions strongly influence the performance in different types of mentalizing tasks, i.e., tasks that require reasoning about other persons' mental states. For example, self-reported measures of empathy (Baron-Cohen and Wheelwright, 2004) or of the drive to do things systematically (i.e., systemizing, Baron-Cohen et al., 2003) as well as the personality trait of agreeableness (for a detailed discussion, see Nettle and Liddle, 2008) have been shown to affect mentalizing in a variety of tasks. More studies are required in order to determine which personality traits or behavioral dispositions result in the observed variance of gaze patterns in SA.

Taken together, the findings reported in this paper can be taken as a first fine-grained description of the temporal and spatial dynamics of social gaze in triadic interactions and their influence on our sense of agency and awareness of the mental states of others. Further assessment of the underlying mental processes is required to understand how manipulations of these aspects change our experience of a social interaction and our perception of the interaction partner.

#### **OUTLOOK**

Interactive eye-tracking paradigms incorporating virtual characters have proven specifically useful for the study of social interaction face-to-face and in real time (Schilbach et al., in press). One major asset of such studies is that the results can be immediately fed back into novel designs with even greater ecological validity. This can stimulate the development of therapeutic tools to learn or improve non-verbal communication in autism spectrum disorders. These are characterized by impairments of the ability to interact with others, as well as by a specific deficiency in reading information from the eye region and interpreting gaze cues (Senju and Johnson, 2009). For example, autistic persons have problems engaging in JA – this is most apparent for the initiation of JA, although responding to another person's bid for JA can also be problematic (Mundy and Newell, 2007). In a recent report on attempts to teach autistic children to initiate and respond to bids of JA, they were required to engage in triadic interactions with an instructor and different kinds of toys (Taylor and Hoch, 2008). As this setting made eye contact difficult, JA was initiated by the instructor by pointing at an object instead of gazing at it. In the condition in which the children were supposed to initiate JA, they were prompted verbally to do so and explicitly told how to do it. A gaze-contingent display would be advantageous here for several reasons: first of all, the interaction with an avatar would be less distressing for autistic persons than real social interaction. Especially in the beginning of a training program this might be beneficial. Secondly, the training program could be designed in a highly structured manner. Features of the avatar's gaze behavior such as timing, gaze direction, or the length of direct gaze could be varied systematically while other facial features can be kept constant in order to prevent sensory overload.
Thirdly, the simultaneous recording of eye-movements can be used to analyze scan paths in order to detect difficulties or peculiarities in the participant's gaze behavior. Furthermore, using interactive eye-tracking allows changing the avatar's reactions depending on the participant's gaze behavior in real-time. Lastly, a virtual setting provides more options to highlight and manipulate objects, prompt certain actions, or deliver reinforcement for correct behavior.

Very recently, first attempts have been made to design gaze-contingent virtual reality applications (Bellani et al., 2011; Lahiri et al., 2011). Lahiri et al. (2011) designed a virtual reality application for autistic adolescents in which they are required to interact with a realistically designed virtual classmate. Their task was to make this classmate as comfortable as possible by their behavior. They were positively reinforced the more they looked at the eyes of the character or followed their movements to an object on the screen. A gaze-contingent algorithm inspired by the one invented by Wilms et al. (2010) was used to detect fixations within predefined regions of interest (i.e., eyes, face, object) and to determine the kind of reinforcement depending on when and how long these regions were fixated. This provides a very interesting example of an implicit training of non-verbal social skills using a gaze-sensitive virtual environment. Although this approach is promising, therapeutic tools still have difficulties providing the avatars with realistic gaze behavior (Bellani et al., 2011). Although clearly more work is needed, results from the present study could potentially be incorporated into virtual therapeutic tools.

## **CONCLUSION**

A thorough exploration and understanding of the parameters of social gaze is crucial for the investigation and understanding of social interactions in gaze-contingent paradigms (Wilms et al., 2010; Bayliss et al., 2012; Grynszpan et al., 2012) and for the formulation of hypotheses regarding people's gaze behavior in online interaction (Neider et al., 2010; Dale et al., 2011). In addition, recent advances have been made in the development of dual eye-tracking setups which allow for investigating the gaze behavior of two participants interacting and collaborating in a shared virtual environment (Carletta et al., 2010). Although this approach is very promising, the design of tasks allowing for an assessment of interaction dynamics while controlling variables affecting the interaction still remains a challenge. Before true interaction without simulated others can be investigated, the use of interactive eye-tracking paradigms provides an important tool to study social gaze behavior in persons who experience being engaged and being responded to in an interaction.

#### **ACKNOWLEDGMENTS**

This study was partially supported by a grant of the Köln Fortune Program of the Medical Faculty at the University of Cologne to Leonhard Schilbach and by a grant "Other Minds" of the German Ministry of Research and Education to Kai Vogeley. The authors would like to thank Stephanie Alexius and Leonhard Engels for their assistance in data collection.

## **REFERENCES**

a minimalist virtual environment. *New Ideas Psychol.* 27, 32–47.

Bailenson, J. N., Blascovich, J., Beall, A. C., and Loomis, J. M. (2003). Interpersonal distance in immersive virtual environments. *Pers. Soc. Psychol. Bull.* 29, 819–833.

with Asperger syndrome or high-functioning autism, and normal sex differences. *Philos. Trans. R. Soc. Lond. B Biol. Sci.* 358, 361–374.

Baron-Cohen, S., and Wheelwright, S. (2004). The empathy quotient: an investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. *J. Autism Dev. Disord.* 34, 163–175.

*IEEE Trans. Neural Syst. Rehabil. Eng.* 19, 443–452.

sarcastic statements. *Percept. Mot. Skills* 108, 565–572.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 July 2012; accepted: 13 November 2012; published online: 03 December 2012.*

*Citation: Pfeiffer UJ, Schilbach L, Jording M, Timmermans B, Bente G and Vogeley K (2012) Eyes on the mind: investigating the influence of gaze dynamics on the perception of others in real-time social interaction. Front. Psychology 3:537. doi: 10.3389/fpsyg.2012.00537 This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Pfeiffer, Schilbach, Jording, Timmermans, Bente and Vogeley. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## At the mercy of strategies: the role of motor representations in language understanding

## **Barbara Tomasino<sup>1</sup>\* and Raffaella Ida Rumiati<sup>2</sup>**

<sup>1</sup> Istituto di Ricovero e Cura a Carattere Scientifico "Eugenio Medea", San Vito al Tagliamento, Italy

<sup>2</sup> Neuroscience Area, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Claudia Scorolli, University of Bologna, Italy; Michael Kaschak, Florida State University, USA; David Kemmerer, Purdue University, USA

#### **\*Correspondence:**

Barbara Tomasino, Istituto di Ricovero e Cura a Carattere Scientifico "Eugenio Medea", Polo Regionale del Friuli Venezia Giulia, Via della Bontà 7, San Vito al Tagliamento 33078, Italy. e-mail: btomasino@ud.lnf.it

Classical cognitive theories hold that word representations in the brain are abstract and amodal, and are independent of the sensorimotor properties of the objects they refer to. An alternative hypothesis emphasizes the importance of bodily processes in cognition: the representation of a concept appears to be crucially dependent upon perceptual-motor processes that relate to it. Thus, understanding action-related words would rely upon the same motor structures that also support the execution of the same actions. In this context, motor simulation represents a key component. Our approach is to draw parallels between the literature on mental rotation and the literature on action verb/sentence processing. Here we will discuss recent studies on mental imagery, mental rotation, and language that clearly demonstrate how motor simulation is neither automatic nor necessary to language understanding. These studies have shown that motor representations may or may not be activated depending on the type of strategy the participants adopt to perform tasks involving motor phrases. On the one hand, participants may imagine the movement with the body parts used to carry out the actions described by the verbs (i.e., motor strategy); on the other, individuals may solve the task without simulating the corresponding movements (i.e., visual strategy). While it is not surprising that the motor strategy is at work when participants process action-related verbs, it is striking that sensorimotor activation has also been reported for imageable concrete words with no motor content, for "nonwords" with regular phonology, for pseudo-verb stimuli, and also for negations. Based on the extant literature, we will argue that implicit motor imagery is not uniquely used when a body-related stimulus is encountered, and that it is not the type of stimulus that automatically triggers the motor simulation but the type of strategy.
Finally, we will also comment on the view that sensorimotor activations are subject to top-down modulation.

**Keywords: motor simulation, word representations, action understanding, imagery, cognitive strategies**

## **INTRODUCTION**

Functional magnetic resonance imaging (fMRI) studies have identified regions in sensorimotor cortex that are activated preferentially by action-related words but also by words with no motor content. The extent to which these patterns of activation are modulated by bottom-up or top-down mechanisms is currently unknown. Many cognitive processes rely on both "bottom-up" and "top-down" processing. One example is found in the mental rotation domain, in which bottom-up processing is first triggered by the stimulus category, and then continues until sensorimotor or visuospatial operations are engaged. On the other hand, top-down processing refers to the modulatory effect exerted by cognitive strategies, which can be implicitly adopted by participants while solving the task at hand. Accordingly, motor representations may or may not be activated depending on the type of strategy the participants adopt to perform tasks involving body-related and non-body-related stimuli. Whether the same pattern, which speaks for a top-down modulation of sensorimotor activation depending on strategy (or contextual) factors, might apply to the action-related word processing domain is currently under debate. Despite growing research efforts, the actual *cause* of the observed motor system activity during action word processing remains elusive (Kemmerer and Gonzalez-Castillo, 2010). Some authors argue that the action-related aspects of a word's meaning are represented in and around the motor strip and that these regions are automatically and invariably activated when action words are encountered, and should not be modulated by attentional demands, i.e., associationist theory (Pulvermuller et al., 2001, 2005a; Pulvermuller, 2005). On this view, sensorimotor activations observed during language processing reflect how word meaning is stored in the brain.
According to embodied theories of cognition, sensory-motor systems play an important role in the representation of concepts (Lakoff, 1987; Glenberg, 1997; Barsalou, 1999; Lakoff and Johnson, 1999; Glenberg and Kaschak, 2002; Feldman and Narayanan, 2004; Gallese and Lakoff, 2005). A bold version of embodiment theory (Gallese and Lakoff, 2005) does not just assume that our concepts can be represented in sensorimotor systems but rather that they *are* the sensorimotor systems. Secondary embodiment proposals argue that word meaning is linked to sensorimotor experience derived from the motor and perceptual simulation during comprehension; however, these simulation processes are not a reflection of how meaning is represented (Mahon and Caramazza, 2008). Versions of embodied theories (Barsalou et al., 2008; Borghi and Cimatti, 2009; Dove, 2009; Louwerse and Jeuniaux, 2010; Borghi, 2012) based on a mixed view of how concepts are represented propose that both amodal and modal conceptual representations coexist in conceptual processing, i.e., a "representational pluralism"; they also extend the embodied view of cognition to account not only for language grounding but also for the social and normative aspects of cognition (Borghi and Cimatti, 2009; Wilson-Mendenhall et al., 2012). However, it is still not clear how these recent theoretical developments can account for the lack of sensorimotor activation in some of the action-related word processing studies. Although simulation and associative learning theories are difficult to tease apart (e.g., Keysers and Perrett, 2004; Brass and Heyes, 2005), the contribution of the top-down strategic modulation might be a promising approach to investigate the interaction between the language and motor systems. With respect to whether the activation of motor areas is bottom-up, both the embodied cognition theory and associationist theory lead to identical predictions. On both accounts the activation of the sensorimotor areas observed in several fMRI studies investigating action-related word processing is maintained to be stimulus-dependent (Hauk et al., 2004; Pulvermuller, 2005; Ruschemeyer et al., 2007; Kemmerer and Gonzalez-Castillo, 2010). Thus different mental operations may be at work during language processing depending on whether the stimulus type is an action-related word or a non-action word. By contrast, the associationist theory is certainly not in agreement with the top-down hypothesis.
According to the associationist theory (Pulvermuller et al., 2001, 2005a; Pulvermuller, 2005), the activation of the sensorimotor cortex "should not require people to attend to language stimuli, but should instead be *automatic*" (Pulvermuller et al., 2005b). The top-down hypothesis instead holds that motor activation is not automatically triggered by the type of stimulus but by the type of strategy. The embodied cognition hypothesis, as it claims that "understanding" *is* sensory and motor simulation, is likewise not compatible with the view that the type of strategy selected depends on top-down modulation by the context and task demands. Rather, the top-down hypothesis is in line with the disembodied view that the motor system may be activated but not necessarily so (Mahon and Caramazza, 2005, 2008).

Although previous studies also point to an involvement of the premotor cortex (PM) in processing action verbs (e.g., Tettamanti et al., 2005, 2008; Aziz-Zadeh et al., 2006), in the present review article we were primarily interested in the neural response pattern of the (left) M1 cortex, given the susceptibility of this region to top-down modulation (e.g., Kosslyn et al., 2001).

## **A LESSON FROM MENTAL ROTATION**

## **BOTTOM-UP HYPOTHESIS**

Bottom-up and top-down processes have specifically been triggered in studies that aimed at investigating whether the recruitment of motor representations in mental rotation depends on the stimuli, or on the particular mental operation adopted in solving the task, respectively. Although mental rotation (hereafter MR) is generally held to be under conscious control (Cooper and Shepard, 1973), there is also evidence that part of the processing escapes awareness. One way to go about characterizing the MR operations is to evaluate whether they differentially respond to the type of stimulus that is mentally rotated (Kosslyn et al., 1998; Rumiati et al., 2001; Tomasino et al., 2003), the reference frame (Wraga et al., 1999; Zacks et al., 1999, 2002), or to the type of strategy (Kosslyn et al., 2001).

The standard view has long maintained that the mechanisms involved in MR were essentially bottom-up, that is, externally triggered by low-level information derived from the stimuli to be rotated (we will refer to this as the type of stimulus hypothesis). Three types of stimulus are used in MR experiments: 2D alphanumeric characters, 3D abstract pictures such as cubes, and body parts such as hand shapes. All these stimuli can elicit two types of MR mechanisms (Kosslyn et al., 1998): (i) object-based spatial transformations, and (ii) egocentric perspective transformations. The former MR mechanism generates simulated rotations of, for instance, hands, remodeled as reaching movements in which subjects implicitly turn their own hands into correspondence with the pictured hand stimulus (Parsons et al., 1995, 1998; Kosslyn et al., 1998; Parsons and Fox, 1998). By contrast, the latter mechanism corresponds to imagined movements of one's point of view, generally used for mentally rotating external abstract pictures in the visual space, without the need of a motor simulation (Zacks et al., 2002, 2003). Thus, different operations may be recruited in MR depending on whether the stimulus type is a body part or a two- or three-dimensional object (Parsons et al., 1995, 1998; Kosslyn et al., 1998; Parsons and Fox, 1998).

Neuropsychological studies documented a double dissociation between the processes underlying these two types of transformations, each of which can be selectively affected as a result of brain damage. In a group study, patients with right brain damage (RBD) showed impaired MR of external objects (e.g., a puppet and flag shapes), while patients with left brain damage (LBD) showed impaired MR of hands (Tomasino et al., 2003). These results are compatible with single case reports. On the one hand, patient MT, with left hemisphere brain damage, was described as being selectively impaired at left or right hand decisions despite being still able to mentally rotate Shepard and Metzler's stimuli (Rumiati et al., 2001). On the other hand, patient JB, with a bilateral inferotemporal lesion, was also observed as having a deficit in performing MR of Shepard and Metzler's stimuli (Sirigu and Duhamel, 2001). However, the ability to mentally rotate motor images of body parts was not investigated in JB or in other posterior left (Kosslyn et al., 1985; Metha and Newcombe, 1991; Morton and Morris, 1995) or right (Ratcliff, 1979; Farah et al., 1988; Ditunno and Mann, 1990; Bricolo et al., 2000) brain-damaged patients with a MR deficit.

Consistent with the notion that MR operations interact with the type of stimulus, neuroimaging research has provided *in vivo* evidence that different types of stimuli trigger different mental rotation-related clusters (Kosslyn et al., 1998). Using positron emission tomography (PET), these authors monitored the regional cerebral blood flow (rCBF) of healthy subjects during two mental rotation tasks. In the first task, subjects compared and decided whether two angular branching forms (i.e., Shepard–Metzler cubes) had the same (baseline) or different orientations (rotation condition), while in a second task the stimuli were line drawings of hand shapes. Mentally rotating branching forms enhanced bilateral activation in the right parietal lobe and in Brodmann Area (BA) 19, whereas mentally rotating hands enhanced unilateral left activation in the precentral gyrus (M1), most of the parietal lobe, the primary visual cortex, the insula, and frontal BAs 6 (PM) and 9 (superior frontal cortex). Kosslyn et al. (1998) proposed that at least two independent mechanisms are engaged in the mental rotation of hands and objects, one requiring processes that prepare motor movements, and one that does not, and that motor processes are recruited only when participants mentally rotated hands but not when they mentally rotated Shepard and Metzler's stimuli.

Psychophysical evidence has also demonstrated that hands are a special type of stimulus. Response times (RTs) during MR of body parts reflect the degree of awkwardness associated with the orientation of the hand stimulus and the length of the imagined path (Parsons, 1987, 1994; Parsons et al., 1995). This RT pattern provides evidence that subjects imagine a spatial transformation of their own body part from its actual orientation until it matches the stimulus orientation. By contrast, the effect of biological constraints on RTs has never been found during MR of external objects, thus suggesting that MR may recruit different mechanisms depending on the type of stimuli involved in the mental transformation. Accordingly, MR of hands, but not of objects, implicitly triggers sensorimotor imagery rather than visuospatial imagery alone.
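As a rough illustration of how an RT pattern of this kind can be summarized, the following Python sketch fits an ordinary least-squares line of RT against rotation angle. It is not the analysis used in the studies cited above; the function name `fit_line` and all angle and RT values are made up for illustration only. In a hand-rotation experiment one might, for example, compare the slopes obtained for biomechanically easy and awkward orientations.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values: rotation angle in degrees and mean RT in ms.
# These numbers are invented for the example and are not from any study.
angles = [0, 60, 120, 180]
rts = [600, 720, 840, 960]

slope, intercept = fit_line(angles, rts)
print(f"RT increases by {slope:.2f} ms per degree; intercept {intercept:.0f} ms")
```

The slope expresses how much additional time each degree of imagined rotation costs, and the intercept captures processing that does not depend on the rotation itself.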

The view that MR operations are differentially triggered depending on the *type of stimulus* to be rotated, as suggested by the above reviewed studies, was soon modified following a neuroimaging study (Kosslyn et al., 2001) in which it was argued that the left M1 was not recruited for mentally rotating only body parts such as hand shapes, but also non-body-part stimuli such as external abstract objects (Cohen et al., 1996; Tagaris et al., 1996; Richter et al., 1997; Carpenter et al., 1999; Lamm et al., 2001; Vingerhoets et al., 2001), even though subjects were not explicitly instructed to use a particular strategy (Kosslyn et al., 2001). Kosslyn et al. (2001) argued that subjects might have spontaneously adopted a motor strategy, accounting thus for these results. This (Kosslyn et al., 2001) and other studies (Wraga et al., 2003; Tomasino and Rumiati, 2004; Tomasino et al., 2004) that soon followed paved the way to the formulation of the top-down hypothesis, as we will discuss in the following section.

## **TOP-DOWN HYPOTHESIS**

According to the top-down hypothesis, higher-level mechanisms guide individuals to select the most suitable cognitive strategy that allows them to solve MR tasks. Thus the original view that different MR mechanisms are elicited depending on the type of stimulus under rotation has later been replaced by the hypothesis that this selection mechanism rather depends on the frame of reference or the type of strategy used in imagining inanimate objects rotating (Kosslyn et al., 2001; Zacks et al., 2002, 2003). This *top-down hypothesis* holds that there could be at least two strategies involved in MR. One strategy encompasses imagining what one would see if he/she manipulates an object, the other implicates imagining what one would see if someone else, or an external force, manipulates an object (Kosslyn et al., 2001). In that PET study (Kosslyn et al., 2001), subjects mentally rotated Shepard and Metzler stimuli using either an external strategy or an internal strategy. Before performing this MR task, subjects either viewed an electric motor device rotating the 3D cube (external action) or they rotated it manually (internal action). Afterward, subjects performed the MR by imagining grasping the object, and turning it with their own hand, or by mentally viewing the stimulus as if it were being rotated by an electric motor device. The same region that in Kosslyn et al.'s (1998) PET study was activated in association with MR of hands only – the left primary motor cortex – here was enhanced when subjects simulated a manual rotation of the Shepard and Metzler's stimuli.

Neuropsychological evidence further supported the view that what matters in MR is the type of strategy adopted (Tomasino and Rumiati, 2004). Patients with unilateral brain lesions and healthy control subjects were instructed to adopt a motor (egocentric transformation) and, in a different block, a visual strategy (allocentric transformation) when performing MR of hand shapes (Experiment 1) or Shepard and Metzler's stimuli (Experiment 2). Independent of the type of stimulus, LBD patients showed a selective deficit in MR of both hands and 3D cubes when the rotation was imagined as a consequence of their own manual activity, whereas RBD patients performed pathologically on a MR task in which they were required to apply a visual strategy (Tomasino and Rumiati, 2004). This study showed how MR could be achieved by recruiting different strategies, implicitly triggered or prompted at will, and each sustained by a unilateral brain network.

How can we reconcile the neuropsychological findings supporting the view that MR is a lateralized process which depends on the type of stimulus (Tomasino et al., 2003) with those in favor of MR as depending on the strategy adopted (Kosslyn et al., 2001; Tomasino and Rumiati, 2004)? While in Tomasino et al. (2003) LBD patients were impaired at mentally rotating hands but not external objects, and RBD patients showed the opposite pattern, in a subsequent study (Tomasino and Rumiati, 2004), LBD patients, explicitly encouraged to apply either the motor strategy or the visual strategy, failed to rotate both types of stimuli when the operation was solved by means of a motor strategy, but succeeded when the alternative visual strategy was selected. As Kosslyn et al. (1998) argued, in the absence of clear instructions, participants spontaneously adopt one or the other strategy to perform MR. Depending on whether the mental operation intrinsically requires imagining limb movements (somatomotor operation) or the motion of visual objects (visuospatial operation), MR can be solved via a motor or a visual strategy. Thus both bottom-up and top-down strategies are used in MR, and their selection seems to depend on task settings, instructions, and other variables. Participants may voluntarily adopt one or the other strategy if prompted by the experimenter but, in a free choice paradigm, the preferred strategy can also be stimulus-dependent. When subjects are not instructed to adopt a given strategy, the type of stimulus determines which one is going to be selected; moreover, these strategies can be implicitly transferred from one type of MR to another, and lateralization might vary according to the order of block presentation (Wraga et al., 2003). Transcranial magnetic stimulation (TMS) studies have shown that stimulation over the left M1 slowed down MR of hands but not of letters (Tomasino et al., 2005) or feet (Ganis et al., 2000). In Tomasino et al.'s (2005) study, subjects

were free to apply one or the other strategy, with the instructions requiring them to mentally rotate the stimulus on the right, and decide whether it was the same as or a mirror image of the other. Since an interference effect due to stimulation was obtained only during MR of hands, it was held that hands *implicitly* require a mental motor transformation. By contrast, since TMS interferes with MR of hand shapes but not of letters, it has been argued that alphanumeric characters do not implicitly require a mental motor strategy (i.e., viewer-based) but rather a visuospatial strategy (i.e., object-based). Moreover, brain tumor patients with lesions selectively affecting the hand sensorimotor representation failed to mentally rotate hand shapes, but not letters, if they were free to use any cognitive strategy; this deficit, however, extended to abstract objects when the patients imagined moving them with their own hands, while they maintained the ability to visualize them rotating in space (Tomasino et al., 2010a). These neuropsychological findings provide conclusive evidence that discrete brain areas can be selectively recruited according to the strategy that is implicitly adopted while solving a cognitive task.

### **TOP-DOWN MODULATORY EFFECTS IN OTHER COGNITIVE DOMAINS**

That partially discrete brain networks can support different cognitive operations depending on their purpose has been demonstrated in other cognitive domains. For instance, the visual information can be used either for identifying objects (along the "what" stream) or for guiding action (along the "how" stream; Milner and Goodale, 1995). These authors described a patient, DF, with visual form agnosia caused by a bilateral occipital lesion, as being severely impaired at perceptually judging the orientation of a line as well as at showing with her fingers the dimensions of objects that were visually presented; however, she was able to orient her hand in a posting task as well as to execute normal reaching-grasping movements (Goodale et al., 1991; Milner and Goodale, 1995). The opposite pattern was observed in patient RV, with a bilateral occipital lesion, who failed to grasp objects whose visual shape he was almost perfectly able to identify (Goodale et al., 1994).

The existence of different networks specialized in carrying out the same cognitive operation according to its purpose is supported by different sources of evidence. For instance, it has been shown that differential neural mechanisms were enhanced when subjects solved the line bisection task either manually (action) or as perceptual judgments (vision; Weiss et al., 2003). In particular, in the latter condition, a unilateral activation of the right inferior parietal cortex, anterior cingulate, dorsolateral prefrontal cortex, including also the extrastriate and superior temporal cortex bilaterally, was observed. By contrast, the manual bisection task enhanced activation in the extrastriate, superior parietal, and premotor cortices bilaterally.

Finally, it has been shown how hemispheric specialization might depend upon the nature of the task rather than on the nature of the stimulus (Stephan et al., 2003). In their fMRI study, 16 right-handed volunteers performed two different tasks on an identical set of four-letter words, in which three letters were written in black and either the second or third letter in red. In the letter-decision task, the participants were asked to ignore the position of the red letter and indicate whether or not the displayed word contained the target letter "A"; in the visuospatial-decision task, they were required to ignore the language-related properties of the words and to judge whether the red letter was located left or right of the center of the word. Letter decisions compared with visuospatial decisions led to a significantly higher activation in the left inferior frontal gyrus, occipital cortex, ventral PM (PMv), anterior cingulate cortex (ACC), and supplementary motor cortex. In contrast, visuospatial decisions compared with letter decisions significantly increased the activation in the anterior and posterior parts of the right inferior parietal lobule. For the authors, this functional dissociation suggests that cognitive control mechanisms differentially direct attention to specific stimulus features and guide the subsequent information processing. When they analyzed the frontal regions responsible for cognitive control, an increased coupling between left ACC and left inferior frontal gyrus was found for letter decisions, and between the right ACC and right parietal areas for visuospatial decisions (Stephan et al., 2003). To conclude, the plasticity with which the brain adapts to different tasks and contexts, and switches between hemispheres, in the studies reviewed above is comparable with that found in the mental rotation domain (Tomasino and Rumiati, 2004).

## **MENTAL ROTATION AND ACTION-RELATED WORD PROCESSING**

## **BOTTOM-UP HYPOTHESIS**

The recruitment of the sensorimotor areas observed in several fMRI studies investigating action-related word processing has been interpreted as being stimulus-dependent (Hauk et al., 2004; Ruschemeyer et al., 2007; Kemmerer and Gonzalez-Castillo, 2010). For example, lexical decisions about action verbs, i.e., judging whether a verb is a real word or a pseudoword, were found to lead to stronger high-frequency EEG activity at recording sites located closely above primary motor (M1) cortex (Pulvermuller et al., 2001). Interestingly, action words related to different body parts, i.e., face, arm, or leg movements, compared with non-action words, activated the primary motor cortex and the PM in a somatotopic manner (Hauk et al., 2004; Buccino et al., 2005; Aziz-Zadeh et al., 2006). Listening to sentences expressing actions performed with the mouth, the hand, or the foot led to signal increases in different parts of the left PM depending on the effector involved in the action described in the sentence (Tettamanti et al., 2005; Aziz-Zadeh et al., 2006). TMS of the left M1 reveals similar effector-specific M1 modulation during listening to hand and foot action-related sentences (Buccino et al., 2005), and during a lexical decision task (Pulvermuller et al., 2005b). In addition, the activation of the left M1 increased for action words (verbs and nouns) compared with non-action words (Oliveri et al., 2004).

Thus different mental operations may be at work during language processing depending on whether the stimulus type is an action-related word or a non-action related word. The sensorimotor activation during language processing has been interpreted as sensorimotor representations being an integral part of action word representation (Pulvermuller, 2005). According to the proponents of the associative learning approach (Pulvermuller, 2005), the activation of the sensorimotor cortex can play a specific functional role in recognizing action words (p. 578, Pulvermuller, 2005). Specifically, authors suggested that neurons in the fronto-central cortex differentially contribute to the semantic processing of action words, and hence called them semantic neurons, located in the inferior fronto-central cortex for face-related words, and in the superior central cortex for leg-related words (consistent with the known motor somatotopy; Pulvermuller, 2005).

A similar view is the one put forward by the embodied hypothesis of language understanding, according to which conceptual knowledge is grounded in sensory-motor systems (Barsalou, 1999; Feldman and Narayanan, 2004; Gallese and Lakoff, 2005). This idea is consistent with the view that word meaning is processed in dedicated cortical areas (e.g., Martin et al., 1995, 1996), and is in sharp contrast with the conceptual-level representation theory (e.g., Pylyshyn, 1984; Fodor, 2001), which suggests that the meaning of a verbally presented action is accessed through abstract amodal units. The latter view emphasizes the abstract, amodal, and symbolic character of concepts, which are thought to be represented outside the brain's sensory-motor systems (the so-called disembodied cognition hypothesis). According to the disembodied cognition hypothesis, conceptual representations are "symbolic" and "abstract" and, as such, qualitatively distinct from sensory and motor information. An intermediate position is represented by secondary embodiment, according to which amodal conceptual representations are instantiated by retrieving sensory and motor information through an independent, but associated, semantic system (Mahon and Caramazza, 2008). Lastly, recent theories based on multiple types of representation (Barsalou et al., 2008; Borghi and Cimatti, 2009; Dove, 2009; Louwerse and Jeuniaux, 2010; Borghi, 2012) propose the existence of both amodal and modal conceptual representations in conceptual processing, i.e., a "representational pluralism" (Dove, 2009). They also extend the embodied view of cognition to account not only for language grounding but also for the social and normative aspects of cognition (Borghi and Cimatti, 2009; Wilson-Mendenhall et al., 2012).
However, it is still not clear how these recent theoretical developments can account for the lack of sensorimotor activation in some of the action-related word processing. The view that sensorimotor areas are activated depending on the *type of word*, has been challenged by several studies which showed how the recruitment of the sensorimotor areas is not automatic as held before (Pulvermuller et al., 2005b), but rather context-dependent (Tomasino et al., 2007, 2008; Papeo et al., 2009; van Dam et al., 2010b, 2012; Willems et al., 2010).

## **TOP-DOWN HYPOTHESIS**

Similarly to what has been observed in the mental rotation domain, individuals might be using different strategies in trying to understand action-related words or phrases. One of these strategies involves implicit simulation, that is a process that occurs when subjects unconsciously simulate the movement while performing another task, even in the absence of a precise instruction to do so (Jeannerod and Frak, 1999). The tasks which have been found to elicit implicit simulation are: mental rotation of body parts (e.g., Zacks et al., 1999; Kosslyn et al., 2001), handedness recognition of a visually presented hand (e.g., Parsons and Fox, 1998), judgments as to whether an action would be easy, difficult, or impossible (Johnson-Frey et al., 2002), and recognizing and understanding

actions of other individuals (e.g., Jeannerod and Frak, 1999). It has been suggested that implicit simulation activates effector-specific regions in the PM cortex, presumably because it facilitates further action planning whenever subsequent cues call for movements to be explicitly executed or to be imagined (Willems et al., 2010).

Consistent with the top-down hypothesis, when we are trying to understand action-related words we may implicitly imagine the corresponding movement, thus triggering the underlying motor representation. In the mental rotation domain, it has been shown that if participants are not clearly instructed, it is the type of stimulus that determines which strategy will be selected (Wraga et al., 2003). In most of the fMRI experiments evaluating the neural correlates of action-related language processing (e.g., Hauk et al., 2004; Buccino et al., 2005; Tettamanti et al., 2005), subjects were not instructed to explicitly imagine themselves or somebody else performing the movements. This, by itself, does not rule out that they might nevertheless have implicitly performed motor imagery. Thus, in an effort to control for putative motor imagery during word processing, Tomasino et al. (2007) asked participants to perform an imagery task and a letter detection task with action and non-action verbs, and found that the imagery task, compared to the letter detection task, led to enhanced M1 activation for action verbs relative to non-action verbs. In other studies, the effector-specific activation of M1 was observed during semantic judgments on action verbs, relative to task conditions where the access to word meaning was less explicit or only incidental, e.g., letter detection or syllable counting (Papeo et al., 2009), or during imagery, but not during lexical decision of action-related stimuli (Willems et al., 2010). The latter authors did, however, find premotor activation during lexical decisions, consistent with results from a TMS study in which stimulation of hand-related PM modulated the processing of hand-related action verbs during lexical decisions (Willems et al., 2011). Evidence for such strategic effects has recently been found also in other brain networks during reading (Cummine et al., 2012).

According to the idea we are trying to put forward here, different task strategies cause participants to lean on different sensorimotor representations. In a series of studies investigating different aspects of language representations (e.g., morphology, grammar, category specificity, semantics), we checked the type of task used and whether M1 was explicitly reported among the activated areas in the critical comparisons involving action verbs.

On the one hand, we identified a series of studies involving action words or verbs in which no activation of M1 was found. For instance, Perani et al. (1999) used a *lexical decision* task involving concrete and abstract verbs (presented in their infinitive form) and nouns, and failed to find a selective activation of M1 when subjects processed concrete verbs (e.g., to brush, to comb, to write). Interestingly, making "*pleasant/unpleasant*" *decisions* about verbs and nouns, presented either as stem or inflected (e.g., for verbs: sing or sings), did not activate the M1 cortex for verbs relative to nouns (Longe et al., 2007). Neither did a task requiring *generating a verb* for a noun (Petersen et al., 1998). Other authors probed the *comprehension* of motion verbs and found (compared to pseudowords) stronger activity in the left ventral temporal-occipital cortex, bilateral prefrontal cortex, and caudate; however, there was no activation of M1 (Grossman et al., 2002). Furthermore, numerous neuroimaging studies found the middle/superior temporal gyrus to be activated during *action word generation* (Martin et al., 1995, 1996; Fiez et al., 1996; Tranel et al., 2005). Raposo et al. (2009), for instance, showed that *passive listening* to arm- and leg-related verbs, presented in isolation (e.g., *kick*), elicited M1 activation in study 1, whereas literal sentences (as in "*kick the ball*") and idiomatic sentences (as in "*kick the bucket*"), constructed using the same action verbs as in the single word study, elicited M1 activation to a lesser extent in study 2. Unlike passive listening to words presented in isolation, this latter task required participants to *listen to sentences* and to *decide*, on half of them, whether a visual *probe* word, presented on the screen a few seconds after the end of the sentence, *was related* to the meaning of the sentence.
Interestingly, idiomatic sentences activated frontotemporal regions, associated with language processing, but not motor and premotor cortices (Raposo et al., 2009). Passive listening and silent reading do not always elicit M1 activation. *Passive listening* to action-related literal sentences, e.g., "biting the peach," as compared to metaphorical sentences including action words, e.g., "biting off more than you can chew," did not elicit any significant activation of M1 (Aziz-Zadeh et al., 2006). Other authors instructed participants to *silently read* blocks of action words related to specific effectors (e.g., punch, bite, or stomp), and items with various levels of lexical information (non-body part-related meanings, non-words, and visual character strings presented in infinitive form) and, when a fixation cross or hashes were presented, to watch the stimuli without mentally reciting them (Postle et al., 2008). They failed to find a somatotopic organization of action-related language processing.

Others showed that *passive listening* to sentences describing actions performed with the mouth, the hand, or the leg, and to abstract sentences (Tettamanti et al., 2005), activated the PM but not the M1. Other authors used *silent reading* of sentences including manual action verbs plus a specific physical object, presented in past, present, and future forms, as compared to abstract verbs, followed by a *reading comprehension* task involving questions referring to a temporal aspect of the sentence (e.g., "Is the table currently being cleaned?") in half of the cases and to a non-temporal aspect (e.g., "Did the sentence refer to a piece of furniture?") in the remaining items. They found that, irrespective of the tense, action-related sentences did not activate the M1 cortex (Gilead et al., 2013). In another fMRI study, participants *listened to* sentences including a hand/arm action verb (e.g., grab, punch), a verb primarily visual in nature (e.g., read, browse), and abstract verbs (e.g., allow, explain) and judged whether the sentences were sensible, pressing a response button with their left index finger only for sentences *judged to be nonsense* (Desai et al., 2010). M1 cortex was not reported among the activated areas for either the motor vs. visual-related verbs contrast or the motor vs. abstract-related verbs contrast. In addition, the overlap between areas activated in the motor localizer task and those activated in the motor vs. visual-related verbs and motor vs. abstract-related verbs contrasts was found in the inferior postcentral focus (Desai et al., 2010). It has been shown that, while watching short object-related action movies activated the hand sensorimotor area bilaterally, *listening to and producing* short

sentences describing object-related actions and man-made objects did not (Tremblay et al., 2003). Tremblay and Small (2011) found a functional specialization within the PMv for observing actions and for observing objects, and a different organization for processing sentences describing actions and objects. In addition, the *generation* of verbs with strong motor association, in a *minimal phrase context* eliciting active semantic processing, as compared to a rhyming task, did not trigger activations in motor-related areas (Khader et al., 2010). Khader et al. (2010) reported stronger activation for verb generation in the left superior temporal gyrus. Other authors presented verbs denoting actions that one performs mostly with the hands and that involve either a general motor program (e.g., to clean) or a more specific motor program (e.g., to wipe), plus 20 mouth-related words as a control (van Dam et al., 2010a). Participants were instructed to *read* all words and perform *a categorization* task in which a go response should be made only to verbs denoting a mouth action. Van Dam et al. did not report M1 cortex among the activated areas for the action-related vs. abstract verbs contrast, independent of whether the actions involved a general or a more specific motor program. In another fMRI study by the same authors, participants were presented with (1) action words (i.e., words highly associated with a specific action, such as stapler), (2) color words (i.e., words highly associated with a specific color, such as wedding dress), and (3) action-color words (i.e., words highly associated with both an action and a color, such as tennis ball or boxing glove) and were instructed to *listen to* all words carefully and to perform a go/nogo semantic *categorization* task, in which go responses should be made only to words denoting objects that were associated with either a green color or a foot action (van Dam et al., 2012).
These authors found that when participants were instructed to focus on the action performed on a word's referent, as compared to when they were instructed to focus on the object's color, no M1 activation was reported within action areas. In another study, subjects *listened* carefully *to indirect requests* (IRs) for action, which are speech acts in which access to an action concept is required although it is not explicitly encoded in the language (e.g., "It is hot here!" in a room with a window is likely to be interpreted as a request to open the window, while in a desert it will be interpreted as a statement), and were *instructed to decide* whether they thought the person wanted something from them or not (van Ackeren et al., 2012). Van Ackeren et al. found that the comprehension of IR sentences, as compared to sentences devoid of any implicit motor information, activated cortical motor areas such as the left SMA and bilateral IPL, but not the M1 cortex. In another study by Moody and Gennari (2010), participants *read* stimulus sentences describing actions requiring more or less physical effort (e.g., pushing the piano implies more physical effort than pushing the chair) and *occasionally answered comprehension* questions requiring a yes/no answer (e.g., did the man forget the piano?), responding with their left hand. The M1 cortex was not found among the regions activated by the items, while the premotor region was sensitive to the degree of effort implied by the actions.

On the other hand, there are several studies in which the M1 cortex has been reported among the regions activated by stimuli consisting of action words, verbs, or sentences. In one of them, for instance, subjects (i) produced a verb corresponding to the

presented noun (e.g., "drive" for "car"), and (ii) read verbs and nouns (Frings et al., 2006). These authors found that, among other areas, the M1 cortex was significantly activated during the silent *reading* of verbs and nouns. In another study, *lexical decisions* about action verbs, i.e., judging whether a verb is a real word or a pseudoword, led to stronger high-frequency EEG activity at recording sites located closely above primary motor (M1) cortex (Pulvermuller et al., 1999). If the processed action words are related to movements of different body parts, then the strongest in-going EEG current is detected close to the cortical representation of the respective body part (Pulvermuller et al., 1999). Interestingly, such a somatotopic activation of M1 has also been reported when participants *silently read* action words related to face, arm, or leg movements (Hauk et al., 2004) and even when they were presented with action words while they were engaged in a *distractor task* (Pulvermuller et al., 2005b). *Lexical decisions* activated the left sensorimotor area only for simple verbs with motor meanings and not for morphologically complex verbs built on a motor stem (e.g., comprehend, which contains the motor verb stem prehend; Ruschemeyer et al., 2007). Sub-threshold TMS stimulation of the hand area of left M1 leads to a facilitatory effect (i.e., faster response times in a *lexical decision* task) for arm- compared to leg-action-related words, and the opposite effect has been found for leg-action-related words after stimulation of the leg area (Pulvermuller et al., 2005a). The excitability of the left M1 hand area (as determined by suprathreshold stimulation and measured by motor evoked potentials, MEPs) is modulated during a *transformation task* involving action words as compared to non-action words (i.e., producing the singular/plural form of nouns or the third person singular/plural form for verbs; Oliveri et al., 2004).
Similarly, *listening* to hand-action-related sentences decreased the amplitude of MEPs recorded from hand muscles, while listening to sentences related to foot actions modulated the MEPs recorded from foot muscles (Buccino et al., 2005). TMS delivered at the end of the sentence over the leg motor area in the left hemisphere caused larger MEPs recorded from the right gastrocnemius and tibialis anterior muscles during *silent reading* of leg-related verbs included in literal (e.g., the man runs in the beautiful country), metaphorical (e.g., the woman runs with her fantasy often), and fictive motion sentences (e.g., the road runs along the impetuous river) than with idiomatic motion (e.g., between the neighbors runs bad blood) or mental sentences (Cacciari et al., 2011). Furthermore, *silent reading* of nouns referring to tools elicited activations in the hand area, and silent reading of nouns referring to foods elicited activation in regions implicated in mouth and face movements (Carota et al., 2012). Also, *passive silent reading* of hand verbs describing hand actions without tool use, of tool verbs whose semantic radicals indicated hand involvement, and of tool verbs whose semantic radicals indicated the tools or materials showed common activations within the hand-motion effect mask, in bilateral precentral gyrus (BA 4). *Silent reading* of idiomatic vs. literal sentences involving hand- and leg-related action words activated M1 when both idiomatic and literal sentences were being processed (Boulenger et al., 2009). A *go/no-go lexical tone judgment task* on Chinese tool-use action verbs emphasizing the hand involvement or the tool or material involvement, and on verbs that describe hand actions without tool use, in which participants were instructed to press a button when

the visually presented word had Tone 2 (low rising tone), activated the precentral gyrus (BA 4) bilaterally, within the motor localizer mask, for all three verb conditions (Yang and Shu, 2011). *Silent reading* of a series of sentences with a verb depicting either a mental state (e.g., deceive, persuade) or an action (e.g., punch, kick), followed by a *comprehension* question that required focusing on the mental state of a protagonist in half of the cases and on actions involving a protagonist in the other half, activated M1 (Kana et al., 2012). Interestingly, M1 was activated despite the verbs being presented in the third person singular. This stands in contrast with a previous study whose authors wondered whether they had failed to find M1 activation because they had used the third person perspective (Gilead et al., 2013), a concern consistent with a TMS study showing that motor simulation occurs for verbs in the first, but not in the third person perspective (Papeo et al., 2009). A *semantic generation* task, in which participants were instructed to quickly *describe how they would physically interact* with visually presented pictures or words referring to objects that are typically used by the hand or the foot, activated M1 somatotopically (Esopenko et al., 2012).

From the above-mentioned literature it seems that M1 activation is triggered neither by the type of stimulus, since it appears clear that action-related words do not automatically activate the M1 cortex, nor by the type of task, since it has been shown how, for instance, silent reading of or passive listening to action-related items might or might not activate the M1 cortex. This inconsistency of M1 activation may be explained by subjects performing or not performing mental simulation. These findings support our hypothesis that M1 activation depends on whether or not subjects perform motor imagery (explicitly or automatically) to meet the task requirements. If subjects use the strategy of simulating the movement referred to by the (action) verbs, M1 is activated; if, however, they use another strategy when solving the task at hand, M1 cortex is not activated. Consistent with this view, it has been shown that M1 cortex showed effector-specific activation for manual action verbs, as compared to non-manual actions (e.g., to kneel), during an imagery task (in which participants were instructed to read the word, close their eyes, imagine performing the action, and open their eyes to indicate that they had finished motor imagery), but not during lexical decision (Willems et al., 2010). Willems et al. (2010) found that parts of PM distinguished manual from non-manual actions during both lexical decision and imagery, but there was no overlap or correlation between regions activated during the two tasks. Results from another study showed that unless participants are explicitly instructed to perform mental imagery, M1 is not activated during language processing (Tomasino et al., 2007). A top-down modulation of strategies could determine whether participants do or do not perform mental simulation during a language task. The motor imagery based strategy might be at work especially for tasks involving passive listening or passive silent reading and lexical decisions.
According to this idea, in the above mentioned tasks involving action words (Pulvermuller et al., 2001, 2005a,b; Oliveri et al., 2004; Buccino et al., 2005; Tettamanti et al., 2005) subjects were free to use (or to refrain from using) the strategy of simulating the actions. The subjects' free choice in underspecified task settings may explain why M1 is not always activated in the fMRI studies involving action word stimuli. As a consequence, the

above-mentioned results suggest that listening to or silently reading action-related words is not as passive a task as is commonly held. This view is supported by studies showing that the crucial factor determining the activity in motor and premotor regions during action word processing seems to be the context in which the word is presented. In line with this view, it has been suggested that the lack of M1 activation might be due to subjects not explicitly attending to the motor attributes of the words, raising the possibility that motor cortex modulation may occur only when participants directly attend to the actions and their motor properties (Kable et al., 2002, 2005). Cognitive studies suggest that language comprehension may not be based on full word-by-word processing, and that the contextual meaning of the sentence may influence the semantic processing of the upcoming words (Marslen-Wilson and Tyler, 1980; Tyler and Wessels, 1983; Ferreira et al., 2002; Sanford and Sturt, 2002). Instructions too might be responsible for triggering, or failing to trigger, a given processing strategy. It is known that cognitive processing of the same verbal stimuli can be modulated by explicit instructions (Fink et al., 2002). In the visuospatial domain, participants have been found to solve the Landmark test both by explicitly comparing the lengths of the left and right line segments and by computing the center of mass of the display. Solving the same task with the two strategies elicited different neural activations, with the explicit length comparisons (relative to line center judgments) differentially activating the left superior posterior parietal cortex, with a tendency toward activation of the equivalent area on the right, while the reverse comparison revealed differential activation in the lingual gyrus bilaterally and ACC (Fink et al., 2002).

Neuropsychological evidence supports the view of a top-down-dependent involvement of the sensorimotor cortex in linguistic processing. Neurosurgical patients with selective lesions of the precentral and postcentral sulci silently read action-related verbs (face-, hand-, and feet-related verbs plus neutral verbs) and subsequently provided (i) vividness ratings of their motor imagery and (ii) frequency ratings. They showed a task × stimulus interaction: a lesion affecting a part of the cortex that represents a body part also led to slower RTs during the generation of mental images for verbs describing actions involving that same body part. By contrast, no category-related differences were seen in the frequency estimations (Tomasino et al., 2012). Two arguments have been put forward to rule out the possibility that sensorimotor activation during action word processing was due to secondary imagery processes. In an attempt to minimize the influence of imagery, some authors administered the linguistic task first, followed by the action execution or observation tasks (Boulenger et al., 2006, 2009). Others suggested that the early neurophysiological activation spreading to M1 cortex revealed by MEG (Pulvermuller et al., 2005b) strongly speaks against the possibility that a second-step imagery process is required. The motor activation occurs at about 150 ms after presentation of a written word, when lexical and semantic effects normally emerge (Pulvermuller et al., 2001, 2005a,b; Boulenger et al., 2006).

To establish when motor imagery exerts its influence over the sensorimotor activation, TMS has been applied at different points in time (Tomasino et al., 2008). Similarly to what has been found before (Pulvermuller et al., 2001, 2005a,b; Boulenger et al., 2006), a specific modulation of response times was found as early as 150 ms. As a new feature, however, it has been clarified that TMS selectively modulated the response times during the imagery task only, compared with the frequency judgment task and the silent reading task used as control conditions, suggesting that the effect of motor simulation occurs earlier (i.e., at 150 ms) than once thought (Pulvermuller et al., 2001, 2005a,b; Boulenger et al., 2006). This result is consistent with previous studies on motor imagery, showing that the activation of motor-related brain areas associated with motor imagery occurs very fast, within the first hundreds of milliseconds (Wang et al., 2010), and with evidence of sensorimotor activation as early as 270–390 ms after stimulus onset (Kawamichi et al., 1998). Lastly, similar results can be found in memorization of action sentences, with an involvement of M1 detected between 150 and 250 ms after stimulus onset (Masumoto et al., 2006). In conclusion, we argue that an activation of M1 in word processing is comparable to what has been shown in the mental rotation literature, with individuals solving the MR tasks by relying on different strategies. The view that people can use different strategies while processing action-related words hypothesizes that, in some circumstances, people understand action verbs/sentences in part by emphasizing motor representations of what it's like to execute the designated action, in part by emphasizing visual representations of what it's like to see the designated action. This view reinforces the parallel we are drawing between mental rotation and action word processing.
As Taylor and Zwaan (2009) wrote to account for neuropsychological data on action-related word processing: "(. . .) comprehension relies on a multivariegated system for conceptual representation that relies on experiential memory (including motor, sensory, and intuitive experiential traces)." In addition, the top-down effect produced by strategy use is now strengthened by neuroimaging evidence linking the visual-semantic motion features of action verbs/sentences with the left posterolateral temporal cortex (for a review, see Gennari, 2012). In this domain too it is held that modality-specific brain regions processing visual motion, such as the middle temporal area or area V5, are not automatically or habitually engaged in language processing (Gennari, 2012). The lack of V5 activation in tasks in which motion information must be recruited suggests that V5 activation is not integral to motion content processing *per se*, but rather results from top-down influences or selective attention (Gennari, 2012). As is the case for the M1 cortex, the middle temporal area or area V5 is susceptible to top-down control and higher-level perceptual/conceptual influences: implied motion, apparent and illusory motion, "moving" sounds, and imagined motion can all elicit significant levels of activation in this area (Gennari, 2012). Similarly to M1 cortex, V5 responds more strongly when participants attend to motion compared to when they do not, even when the visual stimulation is the same (O'Craven et al., 1997).

Although it has been proposed that conceptual processing transcends the distinction between bottom-up, stimulus-driven, automatic processing, on the one hand, and top-down, strategy-driven, controlled processing, on the other hand (Simmons and Barsalou, 2003; Wilson-Mendenhall et al., 2012), examining the effect of the strategy used during action-related verb processing might still be a promising approach.

## **THE CASE OF NON-ACTION RELATED, NEGATIONS, AND PSEUDO-VERBS WORD PROCESSING**

The series of studies we have reviewed thus far clearly indicates that the activations in the sensorimotor areas, observed while participants are engaged in tasks involving non-action related words, and those observed while participants perform mental rotation of abstract stimuli (Kosslyn et al., 2001; Wraga et al., 2003) have a lot in common. Motor activity has been observed not only during action-related word processing, but also during reading of imageable concrete words with no motor content (D'Esposito et al., 1997; Mellet et al., 1998; Pulvermuller and Hauk, 2006; Postle et al., 2008), "non-words" with regular phonology (Postle et al., 2008), and pseudo-verbs (Shapiro et al., 2005, see p. 1060; Tomasino et al., 2010b). It has been shown that non-motor related words and pseudo-verbs could activate (frontal) cortical areas to a similar extent as action-related verbs (see also Roder et al., 2002). Taken together, these findings, insofar as they show that activation in sensorimotor areas is not selectively triggered by action-related word stimuli only, further weaken the bottom-up hypothesis which, on the contrary, speaks for a stimulus-dependent modulation of sensorimotor activation.

For instance, pseudo-verbs can activate motor areas, as shown in an fMRI study using a lexical decision task on positive and negative imperatives (Tomasino et al., 2010b). Importantly, these motor activations were not modulated by the linguistic context, in contrast to action-related verbs, for which the motor activations were systematically modulated by positive and negative contexts. This result suggests that it is not the activation of the motor areas *per se* that allows distinguishing the effect of action verbs from that of pseudo-verbs, but rather the *systematic modulation* of the motor system activity by the linguistic context, which only occurs for action verbs. Importantly, similar unspecific motor-area responses to "non-words" with regular phonology have also been observed in other studies (Hagoort et al., 1999; Postle et al., 2008).

Negations too have been found to both increase and decrease activity in sensorimotor areas. Sentential negation has been argued to transiently reduce the access to mental representations of the negated information (Tettamanti et al., 2008). Indeed, it has been found that the activation in left fronto-parietal regions and the effective connectivity in concept-specific embodied systems are reduced in the case of action-related negative sentences (Tettamanti et al., 2008). Similarly, activations in the hand region of the primary motor and premotor cortices were found to be reduced for negative hand-action-related imperatives, such as "Don't grasp!" compared to "Grasp!" (Tomasino et al., 2010b). Interestingly, the PM was also found to be activated, rather than reduced, by negations in two other studies involving a sentence-picture verification task (Hasegawa et al., 2002). According to the two-step simulation hypothesis of negation processing (Kaup and Zwaan, 2003; Kaup et al., 2007, 2010), when the comprehender processes negations, she creates a simulation of the negated state of affairs, and a simulation of the actual state of affairs. Negation is implicitly encoded in the deviation between both simulations (Ludtke et al., 2008). Taken together, these results indicate that negations activate the sensorimotor cortex depending on whether or not the strategy of simulating the corresponding content of the sentences has been blocked. In Tomasino et al. (2010b), simulation was blocked by means of an experimental manipulation involving the use of imperatives known, if heard, to restrain participants from performing the corresponding action. In a sentence–picture verification paradigm, participants might be free to apply the two-step simulation strategy, leading to an activation of the sensorimotor areas. Negation processing thus constitutes a further piece of evidence of the top-down modulation of sensorimotor activations.

That motor representations are only engaged under specific conditions and that their effects are context-dependent is also supported by studies in which idiomatic sentences or metaphors are used as stimuli. The activation of sensorimotor areas by metaphorical or idiomatic phrases – which convey abstract concepts embedded in concrete content – would support the theories that abstract concepts are understood through analogies to sensation and action (Lakoff and Johnson, 1980; Gibbs, 2006; Bergen, 2007). While Boulenger et al. (2009) found somatotopic activation for figurative and literal action sentences involving leg and arm verbs, other studies have yielded somewhat inconsistent results. For instance, Aziz-Zadeh et al. (2006) found a somatotopically organized activation in the PM cortex for literal action sentences, but not for idiomatic phrases. Raposo et al. (2009) too found an activation in the premotor/motor regions for isolated action verbs, and to a lesser extent for literal action sentences, but not for figurative sentences using action verbs. These findings lend support to cognitive theories of semantic flexibility, by showing that the nature of the semantic context determines the degree to which alternative senses and particularly relevant features are processed when a word is heard (Raposo et al., 2009).

The non-action related/abstract words are the last class of stimuli we will review here that, included in fMRI studies as a control condition, have been found to activate the sensorimotor areas. Embodied theories vary in the level of embodiment they assign to abstract concepts. The strong version of the embodied hypothesis holds that abstract concepts, just like concrete ones, are grounded in the sensorimotor system (Lakoff and Johnson, 1980; Glenberg et al., 2008). Others have proposed that abstract and action-related word processing reflects a continuum rather than a dichotomy (Scorolli et al., 2011), since in a rating study about concreteness judgments on large sets of words a bimodal distribution (according to features such as tangibility or visibility of the items) was found (Nelson and Schreiber, 1992). Evidence in support of the stronger version of embodiment is shaky. In fact, abstract sentences (e.g., to give some news) may (Glenberg et al., 2008) or may not (Ruschemeyer et al., 2007) activate motor information exactly as concrete ones do (e.g., to give a pizza). By comparing simple action-related verbs [such as "greifen" (to grasp)] and complex abstract verbs [such as "begreifen" (to comprehend)], Ruschemeyer et al. (2007) showed that only the former triggered activity in premotor areas. Similarly, Tettamanti et al. (2005) reported a selective activation of motor areas for concrete sentences containing a manipulable object as opposed to sentences containing abstract objects.

Here we propose that the activation of the sensorimotor areas in association with abstract stimuli is most likely due to the intervention of mental imagery. Implicit motor imagery is not uniquely used when a body-related verb stimulus is encountered, and might be defined as a strategy implicitly triggered in association with generic imageable words, which has proved adequate for eliciting activity in motor areas (Postle et al., 2008). The selected strategy can be implicitly transferred from one stimulus to another. In Wraga et al.'s (2003) study, while one group of participants performed a block of MR of hands followed by a block of MR of 3D cubes, a different group performed two blocks of MR of 3D cubes. They found that the left M1 cortex, the left insula, and the PM area bilaterally were selectively activated in participants who performed the MR of hand shapes before the MR of 3D cubes. By contrast, activity in the right superior parietal lobe and the right occipito-temporal junction was enhanced in participants who performed only the MR of 3D cubes. The authors concluded that the motor strategy can covertly be transferred to the imagined transformations of stimuli other than body parts, such as abstract ones. A recent fMRI study, in which a similar implicit transfer of strategies paradigm was applied to motor and non-motor related verb processing (Papeo et al., 2012), examined whether motor strategies adopted during a motor imagery task create a cognitive context that would be implicitly transferred to a subsequent linguistic task. Participants performed a mental rotation block of either motor or visuospatial strategy, randomly presented before each block of silent reading of verbs describing hand actions or physical/psychological states.
Irrespective of the verb category, reading following a mental rotation block of motor strategy, compared to reading following a mental rotation block of visuospatial strategy, increased activity in left primary motor cortex, bilateral PM and right somatosensory cortex. Thus, the cognitive context induced by the preceding motor strategy-based mental rotation modulated word-related sensorimotor responses. In a recent TMS study of the left M1 cortex (Scorolli et al., 2012), non-idiomatic phrases composed of abstract or concrete verbs combined with abstract or concrete nouns (AA, CA, AC, CC) were used. The authors found an early motor activation with concrete verbs and a delayed one with abstract verbs. This result first confirms the view that abstract words (verbs) also activate the motor system related to manual action. In addition, as to the delayed activation, the authors argue that it is likely that the effort to process abstract words in the premotor cortex or other secondary areas is higher and therefore determines a stronger modulatory influence on M1.

With respect to the possible transfer of strategy account, as in this paradigm the context is induced by both action-related and non-action related verbs, with combinations of abstract verbs plus (abstract or concrete) nouns, the putative effect of transfer would be attenuated. Nevertheless, one cannot exclude that a preceding block in which concrete verbs and concrete nouns were combined (e.g., grasp a pen) might have favored the transfer of a (motor) strategy effect to the subsequent block of concrete verb plus abstract noun, e.g., grasp an idea; or that a preceding block in which abstract verbs and concrete nouns were combined (e.g., suspect a pen) might have prompted a transfer of a (motor) strategy effect, in this case triggered by the noun, to the subsequent block of abstract verb plus abstract noun, e.g., suspect freedom (i.e., non-sensible phrases). The results indeed showed greater MEP amplitudes for non-sensible phrases containing concrete verbs followed by abstract nouns.

Furthermore, as the timing of TMS is known to modulate action word processing (Papeo et al., 2009), one cannot exclude that an interaction between a putative transfer of strategies effect and stimulation time occurred in Scorolli et al.'s study. Showing that words with an abstract content can also enhance sensorimotor activation strongly implies that the type of stimulus does not automatically trigger motor simulation, as the embodied hypothesis would predict.

## **CONCLUSION**

To wrap up, in the case of both mental rotation and action word processing, motor simulation is not automatically triggered by the type of stimulus but by the type of strategy. We then argued that the type of strategy selected depends on top-down modulation such as the context and task demands. We also argued that whether or not the sensorimotor cortex is activated is determined by the type of strategy selected in word processing. Thus, motor simulation is neither automatic nor necessary for language understanding. The top-down hypothesis instead holds that motor activation is not automatically triggered by the type of stimulus but by the type of strategy. Also, the embodied cognition hypothesis, as it claims that "understanding" *is* sensory and motor simulation, is not compatible with the view that the type of strategy selected depends on top-down modulation by the context and task demands. Rather, the top-down hypothesis is in line with the disembodied view that the motor system may be activated but not necessarily so (Mahon and Caramazza, 2005, 2008).

Our view is consistent with the notion of flexibility in language representation whereby the degree to which a modality-specific region contributes to a representation depends on the context (Hoenig et al., 2008; van Dam et al., 2010b, 2012) in which conceptual features are retrieved. Flexibility is characterized by the relative presence or absence of activation in motor and perceptual brain areas. The key idea is that words are associated with more than one experiential feature; accordingly, word processing could be modified by encouraging participants to focus on one property vs. another. We also add that this top-down modulation might exert its influence in selecting the type of strategy adopted while processing language. Our preferred view is that, as happens in the mental rotation domain, neither the type of stimulus nor the type of task seems to automatically trigger M1 activation. Rather, we propose that different strategies will cause participants to lean on different sorts of sensorimotor representations. According to this view, M1 activation depends on whether or not subjects choose motor imagery (explicitly or automatically) as a strategy to solve the task requirements. The subjects' free choice in task settings may explain why M1 is not always activated in the fMRI studies involving action word stimuli. Particularly relevant here is the result that neural activity in M1 cortex areas 4a and 4p seems to be differentially modulated by attention to action (Binkofski et al., 2002). Accordingly, it has been suggested that the lack of M1 activation might be due to subjects not explicitly attending to the motor attributes of the words, thus raising the possibility that motor cortex modulation may occur only when participants directly attend to the actions and their motor properties.
Lastly, this view is in accordance with studies suggesting that a crucial factor for observing activity in motor and premotor regions during action word processing seems to be that the context in which the word is presented supports a motor interpretation and that the word form as a whole conveys a motor meaning (van Dam et al., 2012).

## **REFERENCES**


embodied cognition: semantic generation to action-related stimuli. *Front. Hum. Neurosci.* 6:84. doi:10.3389/fnhum.2012.00084


do things with words": role of motor cortex in semantic representation of action words. *Neuropsychologia* 50, 3403–3409.


*Mind and Its Challenge to Western Thought*. New York: Basic Books.


in spatial processing. *Cortex* 27, 153–167.


demands: a functional magnetic resonance imaging study. *Neuroimage* 15, 1003–1014.


Tettamanti, M., Manenti, R., la Rosa, P. A., Falini, A., Perani, D., Cappa, S. F., et al. (2008). Negation in the brain: modulating action representations. *Neuroimage* 43, 358–367.

Tomasino, B., Borroni, P., Isaja, A., and Rumiati, R. I. (2005). The role of the primary motor cortex in mental rotation: a TMS study. *Cogn. Neuropsychol.* 22, 348–363.

Tomasino, B., Ceschia, M., Fabbro, F., and Skrap, M. (2012). Motor simulation during action word processing in neurosurgical patients. *J. Cogn. Neurosci.* 24, 736–748.


Tomasino, B., Weiss, P. H., and Fink, G. R. (2010b). To move or not to move: imperatives modulate action-related verb processing in the motor system. *Neuroscience* 169, 246–258.


of speech production. *Nature* 423, 866–869.


imagery. *J. Cogn. Neurosci.* 22, 2387–2400.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2012; accepted: 10 January 2013; published online: 04 February 2013.*

*Citation: Tomasino B and Rumiati RI (2013) At the mercy of strategies: the role of motor representations in language understanding. Front. Psychology 4:27. doi: 10.3389/fpsyg.2013.00027*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Tomasino and Rumiati. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Abstract spatial concept priming dynamically influences real-world actions

## **Sarah M. Tower-Richardi<sup>1,2</sup>, Tad T. Brunyé<sup>1,2</sup>\*, Stephanie A. Gagnon<sup>1,2,3</sup>, Caroline R. Mahoney<sup>1,2</sup> and Holly A. Taylor<sup>2</sup>**

<sup>1</sup> Cognitive Science Team, U.S. Army NSRDEC, Natick, MA, USA

<sup>2</sup> Department of Psychology, Tufts University, Medford, MA, USA

<sup>3</sup> Department of Psychology, Stanford University, Stanford, CA, USA

#### **Edited by:**

Louise Connell, University of Manchester, UK

#### **Reviewed by:**

Olivier Collignon, Université de Montréal, Canada

Sarah Anderson, University of Cincinnati, USA

#### **\*Correspondence:**

Tad T. Brunyé, Department of Psychology, Tufts University, 490 Boston Avenue, Medford, MA 02155, USA.

e-mail: tbrunye@alumni.tufts.edu

Experienced regularities in our perceptions and actions play important roles in grounding abstract concepts such as social status, time, and emotion. Might we similarly ground abstract spatial concepts in more experience-based domains? The present experiments explore this possibility by implicitly priming abstract spatial terms (north, south, east, west) and then measuring participants' hand movement trajectories while they respond to a body-referenced spatial target (up, down, left, right) in a verbal (Exp. 1) or spatial (Exp. 2) format. Results from two experiments demonstrate temporally dynamic and prime-biased movement trajectories when the primes are incongruent with the targets (e.g., north – left, west – up). That is, priming abstract coordinate directions influences subsequent actions in response to concrete target directions. These findings provide the first evidence that abstract concepts of world-centered coordinate axes are implicitly understood in the context of concrete body-referenced axes; critically, this abstract-concrete relationship manifests in motor movements, and may have implications for spatial memory organization.

**Keywords: spatial cognition, embodied cognition, masked priming, mouse tracking, abstract concepts**

## **INTRODUCTION**

Spatial thinking is a fundamental part of our daily routines; we rely on spatial memory to navigate around our homes and to work, and we constantly acquire novel spatial experiences as we move through our environment. Traditional theories largely assume that our mental representations of space are tightly bound to specific experiences. For instance, we can mentally represent ground-level views after navigating on foot, and we remember environmental structure after viewing a map (Tolman, 1948; Mou and McNamara, 2002; Jeffery and Burgess, 2006). More recent work, however, has demonstrated that memories of environments are flexible, malleable, and highly vulnerable to factors both internal (e.g., experience level, goals, handedness, spatial skills, preferences, and heuristics) and external (e.g., environment complexity and density) to an individual (e.g., Taylor et al., 1999; Waller, 2000; Hegarty et al., 2006; Brunyé and Taylor, 2009; Gyselinck et al., 2009; Brunyé et al., 2010a, 2012b). In this context, memories for environments experienced from a first-person (i.e., *egocentric*) perspective may also integrate *allocentric* spatial knowledge that is more abstract in nature, relying on fixed, self-removed coordinate terms.

It is unclear how exactly people might represent such abstract world-centered reference systems in spatial memory. A growing body of research suggests that understanding abstract concepts involves making connections to experience-based domains (Boroditsky and Prinz, 2008). For instance, people rely upon the concrete horizontal spatial axis when thinking about time (Boroditsky and Ramscar, 2002), the vertical spatial axis when thinking about social status (Piaget, 1927/1969; Tversky et al., 1991; Casasanto and Lozano, 2006; Gagnon et al., 2011), and both horizontal and vertical spatial axes when thinking about affective valence (Meier and Robinson, 2004; Casasanto and Chrysikou, 2011). In each of these cases, people appear to use concrete perceptual, sensory, and interoceptive experiences with the world to structure abstract thought.

The notion that people integrate bodily experiences into mental representations is foundational to theories of embodied cognition (Lakoff and Johnson, 1999; Barsalou, 2008). Strong behavioral and neural evidence exists in favor of these theories; for instance, reading about an object primes the actions typically performed upon the object (Borghi et al., 2004), and reading action-related words biases subsequent hand movements (Glenberg and Kaschak, 2002; Richardson et al., 2003) and spontaneously activates motorically relevant areas of the pre-motor cortex (Kemmerer et al., 2008). Recent work further suggests that processing abstract language engages perceptual and action-based representations; for instance, abstract indications of movement direction (e.g., *delegating a responsibility*) facilitate subsequent movements of the hand in that direction (e.g., *away from self*; Glenberg et al., 2008b). The present study asks whether people might similarly use these types of experience-based representations to aid in conceptualizing abstract spatial concepts.

In contrast to abstract, intangible notions (e.g., *time, future*, *power*), people can directly perceive and act upon concrete space (e.g., *metric distance, a mountain*). As such, our interactions with the environment seem directly amenable to spatial representations that integrate perception and action. Indeed, some recent work suggests that this is the case (Brunyé et al., 2010b; Wang et al., 2012). However, we also know that space can be thought about by using *abstract* concepts of world-centered reference systems, such as seen with canonical coordinate space (north, south, east, west). The coordinate terms *north, south, east* and *west* are completely abstracted from a body-referenced system and cannot be directly perceived in concrete space. In contrast, people can directly perceive *concrete* body-referenced directions along the *x, y,* and *z* planes; for instance, to the left or right along the mediolateral axis, and up or down relative to the dorsoventral axis. In spite of these differences, both abstract and concrete spatial concepts have been implicated in guiding and constraining spatial thought. Indeed, body-referenced directions are foundational to spatial language and communication, navigation, and the mental representation of large-scale space (Taylor and Tversky, 1996; Mou et al., 2004; Feist and Gentner, 2007). And even though people cannot directly perceive abstract coordinate directions, they can monitor them during navigation (Mark, 1989), and use them to shape mental representations of space (Tversky, 1993; Dabbs et al., 1998; Golledge, 1999).

The use of abstract spatial coordinate terms is critical for the transition from purely egocentric knowledge to the construction of allocentric representations (Hart and Moore, 1973). The ability to structure first-person experience using a fixed (i.e., self-abstracted) reference frame tends to develop between the ages of 5 and 11 (Herman and Siegel, 1978), and it is a difficult task, influenced by both spatial abilities and gender. For instance, those with generally high spatial abilities rely more upon coordinates during navigation; in contrast, those with lower spatial abilities rely more upon local landmarks and often find it difficult to use abstract spatial coordinate concepts when thinking about environments (for a review, see Wolbers and Hegarty, 2010). However, coding abstract coordinate directions in concrete space may provide a framework through which abstract spatial understanding can more easily manifest. In other words, binding self-relevant spatial references (e.g., *to the left*) to inherently self-abstracted coordinate directions (e.g., *west*) might facilitate transitions from egocentric to allocentric representational forms.

Along these lines, recent research has identified several peculiar tendencies when people attend to the north and south directions, which suggest that these coordinate directions are linked with some form of concrete space. For instance, people implicitly associate north, versus south, with higher topography (Brunyé et al., in press) and social status (Gagnon et al., 2011). Further, there is consistent evidence, on both regional and international levels, that route planners tend to avoid routes that go initially northward versus southward, perhaps with the intention of avoiding more difficult locomotion (Brunyé et al., 2010a, 2012a). While the exact source of these effects remains unknown, it appears that people associate abstract concepts of coordinate space with concrete spatial axes. More specifically, people conceptualize north as up and south as down along the concrete vertical dimension (a concept first proposed, but not tested, by Shepard and Hurwitz, 1984). This association likely stems from the conventional orientation of north as up on maps; consistently viewing maps in this way may result in associating north with up, south with down, and east and west with right and left, respectively, relative to the self (see Brown and Levinson, 1993 for alternative hypotheses that might be applied to Tzeltal speakers). Grounding abstract concepts of coordinate space might allow people to transfer knowledge from experiential spatial domains in an effort to understand an otherwise intangible concept.

To test this possible association, we conducted two experiments using a masked priming paradigm designed to implicitly activate semantic concepts without conscious awareness, and measured the effects of prime type (i.e., abstract coordinate terms) on dynamic motor responses to concrete target directions (i.e., concrete spatial terms or arrows). To track motor responses, we recorded mouse movements toward target directions using the freely available Mouse Tracker software (Freeman and Ambady, 2010). Mouse Tracker records real-time x and y mouse coordinates at an approximate 60–75 Hz sampling rate. Tracking mouse trajectories allows for examining the continuous temporal (i.e., when) and spatial (i.e., in which direction) dynamics of the comprehension process as it unfolds (Spivey et al., 2005; Dale et al., 2007). Tracking the kinematics of a response can be used to assess movement dynamics that hold potential for exposing cognitive operations that would otherwise remain hidden from traditional behavioral measures (cf., Abrams and Balota, 1991; Balota and Abrams, 1995; Spivey and Dale, 2004, 2006; Magnuson, 2005). Directly related to the present topic, work using mouse tracking demonstrates its utility in indexing the influence of metaphor processing on motor actions: when participants processed information regarding the past or future, their mouse movements were drawn toward the left or right, respectively (Miles et al., 2010a,b). Likewise, this type of online measure may provide unique insights into the spatial and temporal nature of abstract-concrete spatial concept interactions.
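As an illustration of how trajectories sampled at an irregular 60–75 Hz rate can be made comparable across trials, the sketch below resamples a recorded trajectory onto a fixed number of equally spaced time steps and computes its largest perpendicular deviation from the straight start-to-end path. This is a minimal sketch under stated assumptions, not the analysis reported here: the function names `time_normalize` and `max_deviation`, the choice of 101 time steps, and the use of maximum deviation as a curvature index are all assumptions introduced for illustration.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time, x, y); hypothetical record layout
Point = Tuple[float, float]


def time_normalize(samples: List[Sample], n_steps: int = 101) -> List[Point]:
    """Linearly interpolate (t, x, y) samples onto n_steps equally spaced times.

    The default of 101 steps is an assumption made for this sketch.
    """
    if len(samples) < 2 or n_steps < 2:
        raise ValueError("need at least two samples and two time steps")
    t_start, t_end = samples[0][0], samples[-1][0]
    resampled: List[Point] = []
    j = 0  # index of the segment [samples[j], samples[j + 1]] containing t
    for i in range(n_steps):
        t = t_start + (t_end - t_start) * i / (n_steps - 1)
        while j < len(samples) - 2 and samples[j + 1][0] < t:
            j += 1
        ta, xa, ya = samples[j]
        tb, xb, yb = samples[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        resampled.append((xa + w * (xb - xa), ya + w * (yb - ya)))
    return resampled


def max_deviation(path: List[Point]) -> float:
    """Signed perpendicular distance of the farthest point from the direct path.

    Positive values lie to the left of the start-to-end direction.
    """
    (x0, y0), (x1, y1) = path[0], path[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide")
    deviations = [(dx * (y - y0) - dy * (x - x0)) / length for x, y in path]
    return max(deviations, key=abs)
```

In a design like the one described, an index of this kind would be computed per trial and then compared between congruent and incongruent prime-target pairs; a trajectory pulled toward the primed direction would show a larger deviation on that side.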

The present work extends the current literature by examining three characteristics of the apparent link between abstract coordinate space and concrete vertical space. First, whereas prior research has identified an apparent link between north/south and up/down (Brunyé et al., 2010a, 2012a, in press), we propose that people may similarly represent east and west as to the right and left (respectively) with respect to the egocentric left-right axis. Second, we propose that these associations between abstract and concrete spatial concepts will alter movement trajectories when people are primed with abstract concepts and attempt to make hand movements that are either congruent or incongruent with the primed direction. Finally, we examine the temporal dynamics of any effects of abstract concept activation on motor movements. Our first experiment examined these issues by combining verbal primes (e.g., NORTH) with verbal targets (e.g., UP), and our second experiment tested whether our results would hold with non-verbal targets (i.e., directional arrows) in an effort to rule out the possibility that lexical associations alone between primes and targets were driving our effects. Together, we provide the first demonstration that abstract spatial concept understanding is grounded in the perceptual motor system.

## **EXPERIMENT 1**

#### **PARTICIPANTS AND DESIGN**

One hundred Tufts University undergraduate students participated for monetary compensation. Informed consent was obtained from all participants in accordance with the Tufts University Institutional Review Board. All participants self-reported as right-handed (using the Edinburgh Handedness Inventory; Oldfield, 1971) and as native English speakers. Given earlier work demonstrating that approximately 25–45% of participants report being prime-aware at similar prime durations (i.e., seeing a 43–45 ms prime; Bodner and Masson, 2003; Bodner and Dypvik, 2005), during debriefing we explicitly asked participants whether this was the case; 33 participants reported noticing at least one directional prime. Data from these prime-aware participants (*M*age = 19.2; 13 male, 20 female) were removed, leaving 67 valid data sets for analysis (*M*age = 19.9; 25 male, 42 female). Note that whereas shorter prime durations may reduce the number of prime-aware participants, they may not consistently achieve a semantic level of analysis (e.g., 25–40 ms; Holcomb et al., 2005; Klauer et al., 2007).

We used a masked priming procedure with a 6 (Prime Type: North, South, East, West, Center, Non-word) × 4 (Target Direction: Up, Down, Right, Left) within-participants design. Masked priming involves presenting words for such a brief duration that they activate cognitive processes without conscious awareness (Marcel, 1983; Cheesman and Merikle, 1984). In most cases, masks are used to flank (both prior to and after) the presentation of a prime with nonsense letter strings intended to minimize the chance that participants notice the presence of a prime or discriminate it from the nonsense letters (Dehaene et al., 1998; Lee et al., 1999). In the current study, we chose a masked priming manipulation to reduce the task demands characteristic of studies examining perceptuo-motor traces in memory; the greater the awareness of overlap between cueing and target stimuli, the more difficult it becomes to show strong evidence for spontaneous perceptuo-motor involvement in guiding human behavior (cf., Machery, 2007; Mahon and Caramazza, 2008; van Dantzig et al., 2008; Ditman et al., 2010).

We recorded mouse initiation times, response times, and movement trajectories over time using the freely available *Mouse Tracker* software (Freeman and Ambady, 2010).

## **MATERIALS**

#### **Primes and targets**

We used six prime types corresponding to: the four coordinate prime directions (NORTH, SOUTH, EAST, WEST), one central control (CENTER), and a non-word control. A total of 36 non-word controls were generated using the ARC Non-word Database<sup>1</sup>, with each non-word ranging from 5 to 12 characters. A total of 100 forward and backward masks were generated using a Random Letter Sequence Generator<sup>2</sup>, with each sequence consisting of 12 letters (e.g., RVmoFcZNaDDu). Four target words were used (UP, DOWN, RIGHT, LEFT), referring to each of the four target locations.

#### **Target array configuration**

Using the Mouse Tracker software, we created an array of four black rectangular target boxes arranged along the horizontal and vertical axes on a 22″ LCD monitor running at 1920 × 1200 resolution with a 70 Hz refresh rate (see **Figure 1**). At center, a START button was located in a gray rectangle. Each rectangular target was sized at approximately 15% of the corresponding monitor dimension (i.e., 290 w × 170 h pixels), and centered at a location 475 pixels from the START button. In standardized coordinate space (see *Data Scoring*), targets are centered at positions corresponding to −0.6 and 0.6 along the *x*-axis, and 0.15 and 1.35 along the *y*-axis.

## **PROCEDURE**

Participants were instructed to "move the mouse as quickly as possible to the box that correctly corresponds with the word presented on the screen." In a brief practice session, each participant was exposed to a series of 24 trials consisting of masked non-word primes, with six trials for each of the target rectangle locations (UP, DOWN, RIGHT, LEFT). At the beginning of each trial, the START button appeared at screen center. Upon left-clicking on the button with the mouse, the mouse cursor disappeared and the forward mask was presented for 71 ms, the prime for 43 ms, the backward mask for 71 ms, and then finally the target word (see **Figure 1**). The prime duration was selected based on work by Dehaene et al. (1998), which indicated that a 43 ms masked prime activated a semantic level of analysis as evidenced by both electrical brain activity (via event-related potentials) and hemodynamic response (via functional magnetic resonance imaging); specifically, Dehaene and colleagues found evidence that brain activation in response to masked word priming is not restricted to brain areas involved in sensory processing, but rather extends to a range of brain mechanisms involved in perception, semantic categorization, and motor task preparation. Given these results, we expected that a 43 ms masked prime duration would be sufficient to activate the semantic meaning of our coordinate primes (NORTH, SOUTH, EAST, WEST) without participants' conscious awareness.

Once the target word was presented, the mouse cursor became visible and active and was centered on the START box; the participant then moved the mouse cursor to the target box and clicked the left mouse button. Participants always responded using their dominant (right) hand with a conventional two-button (plus scroll wheel) optical computer mouse positioned flat on the table ahead of and slightly to the right of the computer monitor. Following Freeman and Ambady (2010), participants received on-screen feedback in three cases: if they did not begin moving the mouse cursor within the first second after it appeared ("Please start moving earlier on, even if you are not fully certain of a response yet" was displayed at the end of the trial), if they did not respond within 4 s ("Over time!" was presented in red at the center of the screen and the trial was ended), or if they made an incorrect response (an X appeared in red at the center of the screen).
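The mask and prime durations reported above (71, 43, and 71 ms) fall close to whole multiples of the frame period of the 70 Hz monitor. The following sketch works through that arithmetic. It assumes that each stimulus is displayed for a whole number of refresh frames, which the text does not state; the names `frames_for` and `build_timeline` are hypothetical and are not part of Mouse Tracker.

```python
REFRESH_HZ = 70                    # monitor refresh rate reported in the text
FRAME_MS = 1000 / REFRESH_HZ       # about 14.29 ms per refresh


def frames_for(nominal_ms):
    """Nearest whole number of refresh frames for a nominal duration (assumption)."""
    return round(nominal_ms / FRAME_MS)


# Nominal durations (ms) of the events that precede the target word
EVENTS = [("forward mask", 71), ("prime", 43), ("backward mask", 71)]


def build_timeline(events):
    """Return [(name, n_frames, onset_ms)] and the onset time of the target word."""
    onset_ms = 0.0
    timeline = []
    for name, nominal_ms in events:
        n_frames = frames_for(nominal_ms)
        timeline.append((name, n_frames, onset_ms))
        onset_ms += n_frames * FRAME_MS
    return timeline, onset_ms


timeline, target_onset_ms = build_timeline(EVENTS)
for name, n_frames, onset_ms in timeline:
    print(f"{name:13s} {n_frames} frames, onset {onset_ms:6.1f} ms")
print(f"target word   onset {target_onset_ms:6.1f} ms")
```

Under this assumption the masks last five frames each and the prime three frames, so the target word appears eight frames (about 114 ms) after prime onset.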

Following practice, participants began the main experiment. Participants were presented with 204 trials consisting of 132 directionally primed trials (33 each of NORTH, SOUTH, EAST, WEST), 36 trials using the CENTER prime, and 36 trials using the control non-word prime. Each set of trials was divided amongst the four target directions (UP, DOWN, RIGHT, LEFT), and the 204 trials were presented in random order.

## **RESULTS**

#### **DATA SCORING**

The Mouse Tracker software samples x and y mouse cursor position every 13–16 ms from the point the mouse becomes active (target word presentation) to the response click. Incorrect trials are removed from further analysis, and all data undergo outlier trimming at 2.5 SD. Raw data from correct trials are rescaled to standardized coordinate space (*y*-axis range 0–1.5, *x*-axis range −1 to 1) and normalized over time using a linear interpolation process that results in 101 time steps (for more on this process, see: Spivey et al., 2005; Dale et al., 2007; Freeman et al., 2008, 2010; Freeman and Ambady, 2009, 2010). For each participant, we then averaged the normalized data across the trials comprising each of our conditions.

<sup>1</sup>http://www.maccs.mq.edu.au/~nwdb/nwdb.html

<sup>2</sup>http://www.dave-reed.com/Nifty/randSeq.html
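The time-normalization step can be illustrated with a short sketch: each trial's raw samples are resampled to 101 equally spaced time points by linear interpolation, so that trials of different durations can be averaged step by step. This is a minimal illustration, not Mouse Tracker's own code; the function name `time_normalize` and the example trial are hypothetical, and the coordinates are assumed to be in standardized space already.

```python
def time_normalize(samples, n_steps=101):
    """Resample (t_ms, x, y) samples to n_steps equally spaced time points.

    Uses linear interpolation between the two raw samples that bracket
    each normalized time point.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_start, t_end = samples[0][0], samples[-1][0]
    normalized = []
    j = 0  # index of the raw sample that starts the current segment
    for i in range(n_steps):
        t = t_start + (t_end - t_start) * i / (n_steps - 1)
        # Advance to the segment [samples[j], samples[j + 1]] containing t
        while j < len(samples) - 2 and samples[j + 1][0] < t:
            j += 1
        (ta, xa, ya), (tb, xb, yb) = samples[j], samples[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        w = min(max(w, 0.0), 1.0)  # guard against floating-point overshoot
        normalized.append((xa + w * (xb - xa), ya + w * (yb - ya)))
    return normalized


# Hypothetical trial: 11 samples, 14 ms apart, moving from the START
# position (0, 0.75) toward an upper-right location.
raw = [(14 * k, 0.06 * k, 0.75 + 0.06 * k) for k in range(11)]
steps = time_normalize(raw)
print(len(steps), steps[0], steps[-1])
```

Because every trial ends up with the same number of steps, step *k* of one trial can be averaged with step *k* of any other trial in the same condition.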

Prior studies using mouse tracking have used several measures to quantify the differences in mouse trajectories relative to both optimal (vector-based) trajectories and across experimental conditions. These measures commonly include movement initiation time, movement duration, maximum deviation (MD), and area under the curve (AUC). Movement initiation time is the time (in ms) from the mouse becoming active to the participant beginning to move the mouse. Movement duration is the time (in ms) from the participant beginning mouse movement to clicking in the target region. MD is the peak amplitude of the movement trajectory relative to the optimal trajectory, and AUC is the area between the movement trajectory and the optimal trajectory. In general, MD and AUC are highly correlated and do not lead to different results (Freeman et al., 2008). In the present work, we analyze movement initiation time and movement duration and, to maintain compatibility with the extant literature, we report MD (Freeman and Ambady, 2009; Miles et al., 2010a; Martens et al., 2012).
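The verbal definitions of MD and AUC can be made concrete with a small sketch. It treats the optimal trajectory as the straight line from the first to the last point of a trajectory and signs deviations so that rightward and upward are positive, matching the convention that downward and leftward deviations are negative-going. The function names are hypothetical, the handling of oblique targets is an assumption, and Mouse Tracker's own computation may differ in detail.

```python
import math


def path_components(traj):
    """Split each (x, y) point into distance along and across the ideal line."""
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end points coincide")
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the line
    nx, ny = uy, -ux                                 # unit normal to the line
    if nx < 0 or (nx == 0 and ny < 0):               # rightward/upward is positive
        nx, ny = -nx, -ny
    along = [(x - x0) * ux + (y - y0) * uy for x, y in traj]
    across = [(x - x0) * nx + (y - y0) * ny for x, y in traj]
    return along, across


def max_deviation(traj):
    """Signed deviation with the largest absolute value (MD)."""
    _, across = path_components(traj)
    return max(across, key=abs)


def area_under_curve(traj):
    """Signed area between trajectory and ideal line (AUC), trapezoid rule."""
    along, across = path_components(traj)
    return sum(
        0.5 * (across[i] + across[i + 1]) * (along[i + 1] - along[i])
        for i in range(len(traj) - 1)
    )


# Hypothetical Up-target trajectory that bows to the right
up_traj = [(0.0, 0.75), (0.1, 0.95), (0.2, 1.05), (0.1, 1.15), (0.0, 1.35)]
# Hypothetical Left-target trajectory that dips downward
left_traj = [(0.0, 0.75), (-0.3, 0.65), (-0.6, 0.75)]
print(max_deviation(up_traj), area_under_curve(up_traj), max_deviation(left_traj))
```

In this sketch a rightward bow toward an Up target yields a positive MD, and a downward dip toward a Left target yields a negative MD.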

Thus, for each target direction (up, down, right, left) we plotted averaged and normalized mouse trajectories over time corresponding to each Prime Type.

#### **ANALYSES**

We plotted data and conducted repeated-measures analyses of variance (ANOVAs) separately for each target direction (up, down, left, right). Movement duration data, along with statistical test results, are detailed in **Table 1**. Spatially and temporally normalized data are depicted in **Figures 2** and **3** for each Prime Type. Finally, we provide MD data, along with statistical test results, also in **Table 1**. Note that follow-up analyses showed no main or interactive effects of participant gender (all *p*s > 0.32).
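As a reading aid for the *F* tests in this section, the sketch below shows how the degrees of freedom of a one-way repeated-measures ANOVA follow from the number of participants and prime types, and how an effect size can be recovered from *F*. It assumes that the reported η<sup>2</sup> values are partial eta squared, which the text does not state; the function names are hypothetical.

```python
def rm_anova_df(n_participants, n_levels):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_participants - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# 67 analyzed participants and six prime types give F(5, 330)
df_effect, df_error = rm_anova_df(n_participants=67, n_levels=6)
print(df_effect, df_error)

# Example with an F value of 3.39 on those degrees of freedom
print(round(partial_eta_squared(3.39, df_effect, df_error), 2))
```

With 50 analyzed participants, the same function gives (5, 245).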

### **UP TARGET**

#### **Up target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.75; **Figure 2A**).

An ANOVA on movement duration times showed a marginal effect of Prime Type, *F*(5, 330) = 2.02, *p* = 0.07, η<sup>2</sup> = 0.03, with the shortest movement durations following a North prime and the longest following an East or West prime. See **Table 1** for results from paired tests comparing the North (congruent) prime to each of the five other primes.

#### **Up target movement trajectory data**

We calculated MD, or the peak amplitude of each movement trajectory relative to the optimal trajectory (the vector from start to target); note that downward and leftward deviations are negative-going. These data were entered into an ANOVA, which revealed a main effect of Prime Type, *F*(5, 330) = 2.53, *p* < 0.05, η<sup>2</sup> = 0.04. As depicted in **Figure 2A** and detailed in **Table 1**, there was higher MD in the East prime condition relative to any other condition, and lower MD in the West prime condition relative to any other condition. In other words, the East prime biased Up target movement trajectories to the right, and West primes biased movement to the left.

#### **DOWN TARGET**

#### **Down target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.99; **Figure 2B**).


**Table 1 | Experiment 1 mean and standard error movement duration and maximum deviation (MD) data for each of the six prime types and four target types.**

For each target type, we provide results from paired t-tests comparing the congruent prime (e.g., north prime, up target; presented in **bold** print) versus each of the other five prime types (\*\*p < 0.01, \*p < 0.05, <sup>m</sup>p < 0.10).

An ANOVA on movement duration times showed an effect of Prime Type, *F*(5, 330) = 3.39, *p* < 0.01, η<sup>2</sup> = 0.05, with the shortest movement durations following a South prime and the longest following an East or West prime. See **Table 1** for results from paired tests comparing the South (congruent) prime to each of the five other primes.

#### **Down target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 330) = 3.85, *p* < 0.01, η<sup>2</sup> = 0.05. As depicted in **Figure 2B** and detailed in **Table 1**, there was higher MD in the East prime condition relative to any other condition, and lower MD in the West prime condition relative to any other condition. In other words, as also seen in the Up target condition, the East prime biased movement trajectories to the right, and West biased them to the left.

#### **LEFT TARGET**

#### **Left target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.71; **Figure 3A**).

An ANOVA on movement duration times showed no significant effect of Prime Type, *F*(5, 330) = 1.5, *p* = 0.19, η<sup>2</sup> = 0.02. Numerically, movement durations were shortest following a West prime and longest following a North or South prime. See **Table 1** for results from paired tests comparing the West (congruent) prime to each of the five other primes.

#### **Left target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 330) = 4.19, *p* < 0.01, η<sup>2</sup> = 0.06. As depicted in **Figure 3A** and detailed in **Table 1**, the highest MD occurred in the North prime condition, and the lowest MD in the South prime condition. In other words, the North prime biased movement trajectories upward, and South biased them downward.

#### **RIGHT TARGET**

#### **Right target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.82; **Figure 3B**).

An ANOVA on movement duration times showed an effect of Prime Type, *F*(5, 330) = 2.91, *p* < 0.05, η<sup>2</sup> = 0.04, with the shortest movement durations following an East prime and the longest following a North or South prime. See **Table 1** for results from paired tests comparing the East (congruent) prime to each of the five other primes.

#### **Right target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 330) = 3.66, *p* < 0.01, η<sup>2</sup> = 0.05. As depicted in **Figure 3B** and detailed in **Table 1**, the highest MD occurred in the North prime condition, and the lowest MD in the South prime condition. In other words, the North prime biased movement trajectories upward, and South biased them downward.

#### **EXPERIMENT 1 DISCUSSION**

Results demonstrate dynamic and directionally specific effects of abstract primes on movement trajectories toward concrete target directions. However, it is unclear whether these effects are restricted to conditions of using linguistic primes and targets; in other words, might low-level lexical associations between primes and targets (e.g., NORTH → UP) be responsible for the present effects? We conducted a control experiment to test this possibility.

#### **CONTROL EXPERIMENT**

In this study, we replaced target words (UP, DOWN, LEFT, RIGHT) with arrows that pointed toward one of the four target directions. If our results are due to a metaphorical mapping between primed abstract coordinate directions and concrete target directions, and this effect exists above and beyond any simple lexical associations, then results should indicate similar movement trajectory biases to those found in Experiment 1.

#### **PARTICIPANTS AND DESIGN**

Fifty-nine Tufts University undergraduate students participated for monetary compensation, all right handed and native English speaking. Data from 9 prime-aware participants (*M*age = 20.4; three male, six female) were removed from further analysis, leaving 50 valid data sets for analysis (*M*age = 21.5; 15 male, 35 female).

The design matched that used in Experiment 1: a 6 (Prime Type: North, South, East, West, Center, Non-word) × 4 (Target Direction: Up, Down, Right, Left) within-participants design.

#### **MATERIALS AND PROCEDURE**

All materials and procedures matched those of Experiment 1, with one exception: rather than using target words, we used arrows that pointed in the direction of each of the four target locations (see **Figure 4**). Arrows were consistently sized (112 w × 57 h pixels) and rotated in 90° increments to correspond to each of the four directions (UP, DOWN, LEFT, RIGHT).

#### **RESULTS**

#### **DATA SCORING AND ANALYSES**

All data scoring and analyses matched those used in Experiment 1. As before, follow-up analyses showed no main or interactive effects of participant gender (all *p*s > 0.27).

#### **UP TARGET**

#### **Up target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* = 1.7, *p* = 0.13; **Figure 5A**).

An ANOVA on movement duration times showed a marginal effect of Prime Type, *F*(5, 245) = 1.96, *p* = 0.09, η<sup>2</sup> = 0.02, with the shortest movement durations following a North prime and the longest following an East or West prime. See **Table 2** for results from paired tests comparing the North (congruent) prime to each of the five other primes.

#### **Up target movement trajectory data**

Maximum deviation data were entered into an ANOVA, which revealed a main effect of Prime Type, *F*(5, 245) = 2.34, *p* < 0.05, η<sup>2</sup> = 0.05. As depicted in **Figure 5A** and detailed in **Table 2**, the highest MD occurred in the East prime condition and the lowest MD in the West prime condition. In other words, the East prime biased Up target movement trajectories generally to the right, and West primes biased movement generally to the left.

#### **DOWN TARGET**

#### **Down target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.95; **Figure 5B**).

An ANOVA on movement duration times showed a marginal effect of Prime Type, *F*(5, 245) = 1.84, *p* = 0.10, η<sup>2</sup> = 0.04, with the shortest movement durations following a South, North or Center prime, and the longest following an East or West prime. See **Table 2** for results from paired tests comparing the South (congruent) prime to each of the five other primes.

#### **Down target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 245) = 2.71, *p* < 0.05, η<sup>2</sup> = 0.05. As depicted in **Figure 5B** and detailed in **Table 2**, the highest MD occurred in the East prime condition and the lowest MD in the West prime condition. In other words, as also seen in the Up target condition, the East prime biased movement trajectories generally to the right, and West biased them generally to the left.

#### **LEFT TARGET**

#### **Left target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* < 1, *p* = 0.83; **Figure 6A**).

An ANOVA on movement duration times revealed a marginal effect of Prime Type, *F*(5, 245) = 2.01, *p* = 0.08, η<sup>2</sup> = 0.04, with the longest movement durations following a North or South prime. See **Table 2** for results from paired tests comparing the West (congruent) prime to each of the five other primes.

#### **Table 2 | Experiment 2 mean and standard error movement duration and maximum deviation (MD) data for each of the six prime types and four target types.**


Bold text indicates prime-target congruence. Superscript text indicates statistical significance: <sup>m</sup>p < 0.10, \*p < 0.05, \*\*p < 0.01.

#### **Left target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 245) = 2.75, *p* < 0.05, η<sup>2</sup> = 0.05. As depicted in **Figure 6A** and detailed in **Table 2**, the highest MD occurred in the North prime condition and the lowest MD in the South prime condition. In other words, the North prime biased movement trajectories generally upward, and South biased them generally downward.

#### **RIGHT TARGET**

#### **Right target mouse initiation time and movement duration**

An ANOVA on mouse initiation times showed no effect of Prime Type (*F* = 1.6, *p* = 0.16; **Figure 6B**).

An ANOVA on movement duration times showed an effect of Prime Type, *F*(5, 245) = 2.83, *p* < 0.05, η<sup>2</sup> = 0.05, with the longest movement durations following a North or South prime. See **Table 2** for results from paired tests comparing the East (congruent) prime to each of the five other primes.

#### **Right target movement trajectory data**

An ANOVA on MD data revealed a main effect of Prime Type, *F*(5, 245) = 3.01, *p* = 0.01, η<sup>2</sup> = 0.06. As depicted in **Figure 6B** and detailed in **Table 2**, the highest MD occurred in the North prime condition and the lowest MD occurred in the South prime condition. In other words, the North prime biased movement trajectories generally upward, and South biased them generally downward.

## **DISCUSSION**

When implicitly primed with abstract coordinate directions (north, south, east, west), participants revealed several biases toward the prime-congruent spatial directions when making seemingly simple mouse movements toward target locations. Specifically, mouse trajectories showed consistent and statistically robust movement biases that demonstrate an attraction toward primed abstract directions. When participants were tasked with making movements to the right or left, priming with the abstract spatial terms "north" or "south" biased trajectories upward and downward, respectively. Similarly, when moving up or down, priming with the abstract spatial terms "east" or "west" biased trajectories rightward and leftward, respectively. Experiment 2 demonstrated that these effects exist above and beyond any influence of relatively low-level lexical associations between primed and target words. Together, these results provide the first evidence that people ground abstract spatial concepts in perceptuo-motor systems. This grounding mechanism is evidenced through biased movement trajectories that occur even when abstract concepts are activated outside of participants' awareness.

These results are consistent with research demonstrating that abstract concepts are frequently understood through metaphorical mappings to concrete experienced space (Barsalou, 1999; Boroditsky and Prinz, 2008; Miles et al., 2010a; Gagnon et al., 2011). Presently, it seems that people associate abstract spatial concepts with concrete spatial experience, and these associations are grounded in body-referenced axes. These body-referenced axes extend left-right along the mediolateral axis, and up-down along the vertical axis.

The associations between abstract and concrete space likely arise through simple correlational learning (Hebb, 1949); daily experiences viewing maps, atlases, and even navigation devices consistently associate north with the upward direction (and south with down). These experiences likewise give rise to associations between east and right, and west and left, that seem to exist even outside of participants' awareness. Indeed, free-association norms (Nelson et al., 1998) show very weak associations between abstract and concrete spatial terms, and implicit (but not explicit) associations between north and upward space predict route planning biases toward the south (Brunyé et al., in press). We also note, however, that mouse tracking is limited with regard to determining whether north and south are represented along the forward-backward (anteroposterior) versus up/down body axes; moving the mouse forward/backward results in translational movement on the computer monitor along the up/down axis. Thus, there is some conflict between the motoric and perceptual representations of mouse movement along the *y*-axis. Though we have not tested between these axes, we expect that grounding abstract spatial concepts occurs along both forward/backward and up/down body-referenced axes; indeed some early work suggests that people may associate north with the forward egocentric direction (Shepard and Hurwitz, 1984), and our own work suggests associations between north and vertical perceived space (e.g., spatial topography; Gagnon et al., 2011).

The ability to track the online dynamics of mouse movement trajectories shows promise in revealing otherwise hidden cognitive operations (Spivey and Dale, 2006; Miles et al., 2010a). In the current study, response time patterns did not reveal consistent evidence for priming effects; for example, response times for moving the mouse to the left were not significantly affected by any prime type. Only when examining movement dynamics were we able to identify consistent evidence that abstract spatial concepts may be grounded in concrete representations of body-referenced space. Thus, relying exclusively on more traditional response measures does not always reveal the true dynamics of information processing as it unfolds over time. In this manner, movement trajectories are proving valuable in examining the spatial component of mental activity (Oliveri et al., 2009). The present data suggest a rather specific time course for abstract concept activations influencing movement trajectories.

In addition, mouse initiation times were not affected by spatial primes, suggesting relatively delayed onset of abstract concept priming effects on movement trajectories. It could be the case that motor movement trajectories were altered through a cascading activation mechanism that activates abstract conceptual content which then, in turn, influences the comprehension of subsequent content (Mahon and Caramazza, 2008). This type of effect would be congruent with work demonstrating that the motor system is activated approximately 200 ms following the presentation of a body-relevant action word. In the present design, if primed concepts are only beginning to activate the motor system by the time our target words are presented (71 ms after the prime), behavior becomes biased by the prime only after the onset of target-directed movement.

The present results speak to strong relationships between the ways in which we perceive and interact with our environment on a daily basis and the ways in which we process and represent abstract information. The idea that abstract concepts are bound to experience is foundational to theories of embodied cognition (Lakoff and Johnson, 1999; Barsalou, 2008), which posit that mental representations of both concrete and abstract concepts often reflect specific regularities in the way we perceive and interact with those concepts in the world (cf., Miles et al., 2010a). Through this theorized mechanism, people process abstract, intangible concepts through real-world perceptions and actions. In many cases this grounding mechanism facilitates deeper understanding of the abstract (Barsalou, 1999). In the context of this experiment, coding abstract coordinate directions in concrete space may provide a framework through which abstract spatial understanding can more easily manifest. More specifically, thinking about coordinate directions (e.g., *west*) in terms of self-relevant space (e.g., *to the left*) may facilitate the construction of allocentric mental models from egocentric experience. Interestingly, some languages such as Guugu Yimithirr (northeastern Australia) encode directions using world-centered reference systems; for instance, referring to sides of the body as east and west depending on the facing direction of the individual (e.g., the bug is on your west arm; Brown and Levinson, 1992, 1993; Haviland, 1996, 1998). To our knowledge, no work has considered whether these individuals show any preferred association between these directions and sides of their body, or if they show relatively facilitated generation of allocentric models.

Several theoretical positions have been offered to account for the types of results described presently. Conceptual metaphor theory would suggest that image schemas of directly experienced axes (up, down, left, right) are necessary to structure abstract spatial concepts in order to facilitate understanding (Lakoff and Johnson, 1980; Gibbs, 2003); under this theory, people form image schemas to represent spatial relations, and this metaphorical mapping is reflected in language (e.g., *our relationship is headed south.*). A second position posits that understanding abstract language recruits the motor system (Glenberg et al., 2008a,b); under this theory, both concrete and abstract language that describe or imply (respectively) directional motion are at least partially understood through activation of the motor system. Perhaps the most extreme position posits that perceptuo-motor representations are not only involved but also necessary elements underlying the ability for humans to understand abstract concepts (Barsalou, 1999; Barsalou and Wiemer-Hastings, 2005). Somewhat more evenhanded treatments suggest that both linguistic and perceptuomotor representations influence the comprehension of abstract concepts; for instance, while abstract thought might not necessitate perceptual or motoric simulation, these processes might serve to enrich linguistic representations and facilitate deeper understanding (Mahon and Caramazza, 2008; Dove, 2009; Pecher et al., 2011).

We propose, congruent with some earlier claims regarding metaphor processing (Murphy, 1996; Pecher et al., 2011), that the abstract concept of coordinate direction likely has a learned structure of its own, as do body-referenced concrete directions. In other words, abstract coordinate direction *can* be understood without association to concrete space, though this concrete association is frequently relied upon to structure understanding. This may be particularly the case with individuals who find it difficult to understand abstract spatial concepts, such as those with lower spatial ability; future work might examine how spatial abilities modulate propensities toward mapping abstract spatial concepts to concrete space. Regardless of their precise source, however, even though abstract-concrete associations (i.e., north-up) might aid understanding in many situations, we propose that they might also impair certain types of behavior. For instance, our recent work suggests that implicit associations between abstract coordinates and concrete vertical space lead people to misassociate the northward direction with upward movement and thus greater physical exertion (due to gravity; Brunyé et al., in press); because action planning and perception involve assessments of predicted body states and affordances, the associations revealed in the present work might bias route planning and navigation decisions in unintended manners, potentially leading to suboptimal spatial decisions (Knoblich and Flach, 2001; Witt et al., 2004; Fajen, 2005; Proffitt, 2006).

We introduced the present research as having three main goals and hypotheses. First, we expected that people would associate abstract coordinate terms (i.e., cardinal directions) with egocentric body axes (up/down, left/right); response time data showed some support for this claim, with slower response times when participants were primed with a direction orthogonal to their target movement direction. Second, we expected that movement trajectories would prove valuable in elucidating otherwise hidden cognitive operations; movement trajectory data showed strong support for this claim, with reliable directionally specific movement biases toward orthogonal primes. Finally, we set out to examine the temporal dynamics of abstract concept activation influences on motor movements; results showed strong evidence for a temporally defined process by which abstract concepts alter movement trajectories. Together, we provide the first demonstration that abstract spatial concept understanding is bound to concrete space and can manifest through the perceptuo-motor system.

#### **ACKNOWLEDGMENTS**

The authors thank Jonathan B. Freeman for assistance with data analysis.

## **REFERENCES**



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 June 2012; accepted: 04 September 2012; published online: 27 September 2012.*

*Citation: Tower-Richardi SM, Brunyé TT, Gagnon SA, Mahoney CR and Taylor HA (2012) Abstract spatial concept priming dynamically influences real-world actions. Front. Psychology 3:361. doi: 10.3389/fpsyg.2012.00361*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Tower-Richardi, Brunyé, Gagnon, Mahoney and Taylor. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Embodied cognition: taking the next step

## **Roel M. Willems<sup>1,2</sup>\* and Jolien C. Francken<sup>1</sup>**

<sup>1</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Netherlands

<sup>2</sup> Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands

#### **Edited by:**

Dermot Lynott, University of Manchester, UK

#### **Reviewed by:**

Frank Garcea, University of Rochester, USA

Liuba Papeo, Harvard University, USA

#### **\*Correspondence:**

Roel M. Willems, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, P.O. Box 9101, 6500 HB Nijmegen, Netherlands.

e-mail: r.willems@donders.ru.nl

Recent years have seen a large number of empirical studies related to "embodied cognition." While these studies are interesting and valuable, there is something dissatisfying about the current state of affairs in this research domain. Hypotheses tend to be underspecified, testing in general terms for embodied versus disembodied processing. The lack of specificity of current hypotheses can easily lead to an erosion of the embodiment concept, and result in a situation in which essentially any effect is taken as positive evidence. Such erosion is not helpful to the field and does not do justice to the importance of embodiment. Here we want to take stock, and formulate directions for how it can be studied in a more fruitful fashion. As an example we will describe a few studies that have investigated the role of sensori-motor systems in the coding of meaning ("embodied semantics"). Instead of focusing on the dichotomy between embodied and disembodied theories, we suggest that the field move forward and ask how and when sensori-motor systems and behavior are involved in cognition.

**Keywords: embodied, cognition, semantics, embedded cognition, hypothesis generation**

## **INTRODUCTION: EXCITING EMBODIMENT**

In the last two decades, cognitive science has embraced the thesis of "embodiment." Embodied cognition stresses the intertwined nature of thinking and acting, and as such is an antidote to the traditional divide between cognition on the one hand and perception and action on the other. The excitement about embodiment within cognitive science lies mainly in its promise to destroy the traditional "sandwich" (or "hamburger") model of cognitive processing, with its strict perception-cognition-action scheme (e.g., Hurley, 2001). The sandwich model regards "thinking" as the real stuff (the beef, so to speak), and takes perception and action as separate slave systems, providing input to cognitive processors (perception) and executing their commands (action).

Instead, embodied cognition stresses that perception and action are directly relevant for our thinking, and that it is a mistake to regard them as separate. The thesis comes in various formats, and more in-depth coverage is beyond the scope of this article (e.g., O'Regan, 1992; Van Gelder, 1995; Clark, 1997; Barsalou, 1999; Wilson, 2002; Noe, 2004; Gallagher, 2005; Wheeler, 2005).

In this paper we want to take stock and see what embodiment has done for a particular research domain in cognitive science, namely the study of semantic representations. With respect to semantic representations, embodied cognition is related to the claim of modality-specific versus abstract representations, in which modality-specific views predict sensori-motor cortex to be constitutive of conceptual representations (see Kiefer and Pulvermüller, 2012 for an excellent recent overview).

This being an opinion paper, it is by no means our intention to give an overview of the field. Instead we highlight certain studies, where we could have chosen others. Of particular importance is that we have chosen to ignore the neuropsychological literature regarding semantic representations (see e.g., Gainotti, 2000; Caramazza and Mahon, 2003; Kiefer and Pulvermüller, 2012).

## **THE EROSION OF A CONCEPT: THE CASE OF EMBODIED SEMANTICS REPRESENTATIONS**

Embodied cognition is often defined very broadly. When we look, for example, at experiments investigating "embodied semantics," an important prediction is that understanding sensori-motor concepts leads to activation of sensori-motor cortices. So when people read about hand and foot actions, parts of the motor cortex involved in moving the hands and the feet are activated (e.g., Hauk et al., 2004; Tettamanti et al., 2005). Although interesting from the sandwich model perspective, it is unfortunate that the main hypothesis often does not go beyond predicting "involvement" of sensori-motor cortices (see also Binder and Desai, 2011; Chatterjee, 2010).

An illustration of this lack of specificity is how easily embodied cognition can capture strikingly different findings. For instance, Buccino et al. (2005) used single-pulse TMS to stimulate the hand or foot/leg motor area while participants were listening to sentences expressing foot and hand actions. Reaction times (RTs) and motor evoked potentials (MEPs) were specifically modulated for the effector involved in the described action: a hand-action-related sentence produced decreased MEPs in the hand area and slower RTs when subjects responded with their hand. The authors conclude that the processing of language modulates the activity of the motor system in an effector-specific way. However, in another TMS study with a similar design, Pulvermüller et al. (2005a) report that *faster* RTs are observed to hand/arm words after stimulation of the hand area.

It is striking that although the results are opposite (slower versus faster RTs), both are taken as confirmation of the embodied semantics theory. Instead, the researchers could have elaborated more on the reasons for their divergent findings. For instance, maybe the differences arise because the interference occurs at a decision making level *after* semantic analysis (Mahon and Caramazza, 2008; Chatterjee, 2010). By formulating more specific hypotheses, e.g., here on the direction of the effect and the underlying mechanism, these findings could have been more informative. It strikes us as disappointing not to go beyond the conclusion of *involvement* of cortical motor areas; the pattern of results suggests that something more interesting is going on than motor cortex activation in response to action words. One is left with the question: what result would be taken as evidence against embodied cognition?

Another sign of an underspecified theory is that similar findings can be interpreted as evidence both for and against embodiment. Take the studies of Saygin et al. (2010) and Bedny et al. (2008).

First, Saygin et al. showed activation of perceptual (visual) areas when subjects were reading sentences describing motion. More specifically, they found increased BOLD levels in motion sensitive area MT+ when participants read sentences like "The wild horse crossed the barren field" versus "The black horse stood in the barren field" (Saygin et al., 2010). Second, in the study of Bedny et al. participants judged pairs of words that implied motion (animals, e.g., "the horse," "the dog"), had intermediate implied motion (tools, e.g., "the sword," "the axe"), or had little implied motion (natural kinds, e.g., "the bush," "the pebble"). These authors did not find modulation of MT+ activity for words with different motion ratings. Regions within posterior lateral temporal cortex were more active when comparing verbs and nouns, independent of the amount of motion associations of the words.

A general theory of embodiment would have predicted both studies to find modulation in area MT+ related to the amount of motion expressed in the materials. The fact that one study does observe such modulation and the other does not is an interesting clue to the context-dependence of sensory cortex activations during language comprehension, or as Saygin et al. (2010, p. 2486) put it: "the choice of task and stimuli can influence the power to detect modulations of MT+ by linguistic events." Instead, what happens is that one set of authors interprets their findings as in line with embodied cognition, and the other set of authors interprets their findings as evidence against embodiment, since they show that retrieval of sensory motor features is not obligatory during word comprehension (Bedny et al., 2008). The differences in their findings can probably be attributed to the differences in design. However, both studies generalize their results to the question of whether they support an embodied or disembodied account, and it is in this interpretation stage that opposite conclusions are drawn.

Many experiments are driven by the "embodied versus disembodied" distinction. This is not a fruitful approach, and in the next section we will show that such a broad distinction does not do justice to the experimental findings that are available. To foreshadow our conclusion: Instead of quarreling about embodied versus disembodied, the field should take the next step and ask the question when and how sensori-motor cortices play a role in understanding.

#### **TAKING STOCK: EMBODIED SEMANTICS**

When we take a bird's eye perspective toward experiments studying sensori-motor cortex involvement when participants read or listen to language describing sensori-motor events (action and visual language), a few things stand out:


Of these findings, the latter one deserves more attention than it has gotten so far: Sensori-motor cortex involvement during understanding of action and perceptual language is task- and context-dependent.

For instance, it has been shown that the motor system is differently modulated depending on the experimental task. In a study by Sato et al. (2008) hand-action verbs interfered with button presses when participants performed a semantic task, but this was not the case when they performed a lexical decision task.

Similarly, in an elegant study Papeo et al. (2009) reported modulation of hand MEPs during reading of hand-action verbs when single-pulse TMS was applied, but again only during an explicit semantic categorization task (on action-relatedness) and not during a syllable detection task.

Another example of context-dependence is provided by Raposo et al. (2009) who showed that activation in motor cortex varied depending on the way verbs were presented: when verbs were viewed in isolation ("kick") or in literal sentences ("kick the ball") motor cortex was activated, but when the verbs were presented in idiomatic contexts ("kick the bucket"), no motor or premotor activation was present (see also Aziz-Zadeh et al., 2006; but see Boulenger et al., 2009).

Van Dam et al. (2012) varied the linguistic context in a different way: they instructed participants to focus either on the action or on the color aspect of a word's referent. Activation in action- and motion-related areas was higher in the former than in the latter condition. The authors suggest that the "action" context emphasized action properties of the object and that therefore the corresponding action features were relevant in constituting the concept.

## **CONCLUSION**

So on the one hand, the state of affairs is favorable to embodied semantics: there can be involvement of sensori-motor cortices in understanding action and perceptual language. This is an important insight and definitely constitutes a way forward in our thinking about the neural basis of conceptual knowledge (see Kiefer and Pulvermüller, 2012 for an overview). But the involvement of sensori-motor cortex in conceptual representations is of a more complex nature than a simple binary "yes" or "no." Investigating "an involvement" of sensori-motor cortices in conceptual knowledge was perhaps a good first step, but needs to be followed up by more specific hypotheses. Future research needs to be more specific on when and how sensori-motor cortices are involved in language understanding. One reason for this is that current findings are too easily interpreted as confirming embodied accounts (see also Chatterjee, 2010). A second motivation is the fact that several studies show the context-dependence of sensori-motor involvement in language understanding. Computational models can be important in making the operations that take place in sensori-motor cortices more explicit, and the field should take more advantage of those (e.g., Chersi et al., 2010). Only with such specificity can embodied cognition make progress and will the concept retain its value.

## **ACKNOWLEDGMENTS**

Publication costs for this article were paid by the Max Planck Society.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 August 2012; accepted: 10 December 2012; published online: 28 December 2012.*

*Citation: Willems RM and Francken JC (2012) Embodied cognition: taking the next step. Front. Psychology 3:582. doi: 10.3389/fpsyg.2012.00582*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2012 Willems and Francken. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Embodied cognition is not what you think it is

## **Andrew D. Wilson\* and Sabrina Golonka**

School of Social, Psychological and Communication Sciences, Leeds Metropolitan University, Leeds, UK

## **Edited by:**

Louise Connell, University of Manchester, UK

## **Reviewed by:**

Louise Connell, University of Manchester, UK

Tad Brunye, US Army NSRDEC and Tufts University, USA

Louise Barrett, University of Lethbridge, Canada

#### **\*Correspondence:**

Andrew D. Wilson, School of Social, Psychological and Communication Sciences, Leeds Metropolitan University, Civic Quarter, Calverley Street, Leeds LS1 3HE, UK. e-mail: a.d.wilson@leedsmet.ac.uk

The most exciting hypothesis in cognitive science right now is the theory that cognition is embodied. Like all good ideas in cognitive science, however, embodiment immediately came to mean six different things. The most common definitions involve the straightforward claim that "states of the body modify states of the mind." However, the implications of embodiment are actually much more radical than this. If cognition can span the brain, body, and the environment, then the "states of mind" of disembodied cognitive science won't exist to be modified. Cognition will instead be an extended system assembled from a broad array of resources. Taking embodiment seriously therefore requires both new methods and theory. Here we outline four key steps that research programs should follow in order to fully engage with the implications of embodiment. The first step is to conduct a task analysis, which characterizes from a first person perspective the specific task that a perceiving-acting cognitive agent is faced with. The second step is to identify the task-relevant resources the agent has access to in order to solve the task. These resources can span brain, body, and environment. The third step is to identify how the agent can assemble these resources into a system capable of solving the problem at hand. The last step is to test the agent's performance to confirm that the agent is actually using the solution identified in step 3. We explore these steps in more detail with reference to two useful examples (the outfielder problem and the A-not-B error), and introduce how to apply this analysis to the thorny question of language use. Embodied cognition is more than we think it is, and we have the tools we need to realize its full potential.

**Keywords: embodied cognition, dynamical systems, replacement hypothesis, robotics, outfielder problem, A-not-B error, language**

## **INTRODUCTION**

The most exciting idea in cognitive science right now is the theory that cognition is *embodied*. It is, in fact, one of the things interested lay people know about cognitive science, thanks to many recent high-profile experiments. These experiments claim to show (1) how cognition can be influenced and biased by states of the body (e.g., Eerland et al., 2011) or the environment (Adam and Galinsky, 2012) or (2) that abstract cognitive states are grounded in states of the body and using the former affects the latter (e.g., Lakoff and Johnson, 1980, 1999; Miles et al., 2010).

The problem, however, is that *this is not really what embodied cognition is about*. Embodiment is the surprisingly radical hypothesis that the brain is not the sole cognitive resource we have available to us to solve problems. Our bodies and their perceptually guided motions through the world do much of the work required to achieve our goals, *replacing* the need for complex internal mental representations. This simple fact utterly changes our idea of what "cognition" involves, and thus embodiment is not simply another factor acting on an otherwise disembodied cognitive process.

Many cognitive scientists see this claim as occupying the extreme end of an embodiment continuum, and are happy with the notion that there can be many co-existing notions of embodiment – maybe three (Shapiro, 2011) or even six (Wilson, 2002). Why rule out other research programs that seem to be showing results? Why not have one strand of embodied cognition research that focuses on how cognition can be biased by states of the body, and another strand that focuses on brain-body-environment cognitive systems? The issue is that the former type of research does not follow through on the necessary consequences of allowing cognition to involve more than the brain. These consequences, we will argue, lead *inevitably* to a radical shift in our understanding of what cognitive behavior is made from. This shift will take cognitive science away from tweaking underlying competences and toward understanding how our behavior emerges from the real-time interplay of task-specific resources distributed across the brain, body, and environment, coupled together via our perceptual systems.

This paper will proceed as follows. After laying out the standard cognitive psychological approach to explaining behavior, we'll briefly point to some interesting lines of empirical research from robotics and animal cognition that support the stronger *replacement hypothesis* of embodied cognition (Shapiro, 2011). We'll then lay out a recommended research strategy based on this work. Specifically, we will detail how to use a *task analysis* to identify the cognitive requirements of a task and the resources (in brain, body, and environment) available to fill these requirements. According to this analysis, it is the job of an empirical research program to find out which of the available resources the organism is actually using, and how they have been assembled, coordinated, and controlled into a *smart, task-specific device* for solving the problem at hand (Runeson, 1977; Bingham, 1988). We'll focus on two classic examples in detail: *the outfielder problem* (e.g., McBeath et al., 1995) and the *A-not-B task* (e.g., Thelen et al., 2001). We'll then contrast this task-specific approach with some embodied cognition research in the standard cognitive psychology mold, and see how this latter research fails to successfully motivate any role for the body or environment, let alone the one identified in the research. Finally, we'll conclude with some thoughts on how to begin to apply this approach to one of the harder problems in cognitive science, specifically language use. Language is the traditional bête noire of this more radical flavor of embodiment, and our goal in this final section will be to demonstrate that, with a little work, a truly embodied analysis of language can, in fact, get off the ground.

## **STANDARD COGNITIVE EXPLANATIONS FOR BEHAVIOR**

The insight of early cognitive psychologists was that our behavior appears to be mediated by something internal to the organism. The classic example is Chomsky's (1959) critique of "Verbal Behavior" (Skinner, 1957) in which he argues that language learning and use cannot be explained without invoking mental structures (in this case, innate linguistic capabilities). In general, the theoretical entities cognitive psychologists invoke to do this internal mediation are *mental representations*.

At the time these ideas were taking off, research on perception suggested that our perceptual access to the world wasn't very good (see Marr, 1982; Rock, 1985 for reviews). This creates the following central problem for representations to solve. The brain is locked away inside our heads with only impoverished, probabilistic perceptual access to the world, but it has the responsibility of coordinating rapid, functional, and successful behavior in a dynamic physical and social environment. Because perception is assumed to be flawed, it is not considered a central resource for solving tasks. Because we only have access to the environment via perception, the environment also is not considered a central resource. This places the burden entirely on the brain to act as a storehouse for skills and information that can be rapidly accessed, parameterized, and implemented on the basis of the brain's best guess as to what is required, a guess that is made using some optimized combination of sensory input and internally represented knowledge. This job description makes the content of internal cognitive representations the most important determinant of the structure of our behavior. Cognitive science is, therefore, in the business of *identifying this content* and *how it is accessed and used* (see Dietrich and Markman, 2003 for a discussion of this).

Advances in perception-action research, particularly Gibson's work on direct perception (Gibson, 1966, 1979), change the nature of the problem facing the organism. Perception is not critically flawed. In fact, we have extremely high quality, direct perceptual access to the world. This means that perception (and by extension, the environment) can be a useful resource, rather than a problem to be overcome by cognitive enrichment. Embodied cognition (in any form) is about acknowledging the role perception, action, and the environment can now play.

A radical conclusion emerges from taking all this seriously: if perception-action couplings and resources distributed over brain, body, and environment are substantial participants in cognition, then the need for the specific objects and processes of standard cognitive psychology (concepts, internally represented competence, and knowledge) goes away, to be replaced by very different objects and processes (most commonly perception-action couplings forming non-linear dynamical systems, e.g., van Gelder, 1995). This, in a nutshell, is the version of embodiment that Shapiro (2011) refers to as the *replacement hypothesis* and our argument here is that *this hypothesis is inevitable once you allow the body and environment into the cognitive mix*. If such replacement is viable, then any research that keeps the standard assumptions of cognitive psychology and simply allows a state of the body to tweak cognition misses the point. To earn the name, embodied cognition research must, we argue, look very different from this standard approach.

## **EMBODIED COGNITION: FOUR KEY QUESTIONS**

The core question in psychology is *why does a given behavior have the form that it does?* The standard cognitive psychology explanation for the form of behavior is that it reflects the contents and operation of an internal algorithm (implemented as a mental representation) designed to produce that behavior on demand (e.g., Fodor, 1975, 2008). The work discussed below replaces complex internal control structures with carefully built bodies perceptually coupled to specific environments. (Of course, embodied cognition solutions will also sometimes require internal control structures. Critically, though, these internal control structures are taking part in the activity of distributed perceptually coupled systems from which behavior emerges online, in real time, in a context. Thus, explicit representations of behavior or knowledge have no place in embodied solutions.)

To get a rigorous handle on this claim, we suggest that there are four key questions any embodied cognition research program must address:


of their underlying dynamics (e.g., Bingham, 1995) and thus it is becoming common practice to formalize the task description using the tools of dynamical systems (e.g., Fajen and Warren, 2003; Bingham, 2004a,b; Schöner and Thelen, 2006).


The next sections will review what this new research looks like in practice; we will begin with some simpler cases that tackle and clarify some of our key questions, and end up with two cases of human behavior that demonstrate how to tie these four questions into a coherent research program.

## **EMBODIMENT IN ACTION**

#### **EMBODIMENT IN ACTION I: ROBOTS**

One of the most productive areas to demonstrate the strength of the replacement hypothesis is *robotics*. Robots built on the principles of embodiment are capable of interestingly complex behavior, demonstrating how far you can get without representational enrichment. When you build something yourself from scratch, you know exactly what is (and is not) included in the control systems. This means that your pool of potential explanations for a given behavior is constrained and enumerated, and you can answer questions 2 and 3 in great detail.

## **"Swiss" robots**

An early example of embodied cognition robotics comes from Maris and te Boekhorst (1996), who built small Didabots with infra-red detectors placed around their body and a very simple internal control structure: a single rule, "turn away from a detected obstacle." In this paper, the detector at the front of the robot was deactivated – the robot could no longer "see" anything directly ahead, but it could "see" off to the sides and behind. If it hit an obstacle (a white block) head on, it simply kept moving and pushed the block along until it turned to avoid the next obstacle (either another block or a wall). The first block was then left behind, and the net result (if there was more than one robot at work) was that the randomly scattered blocks were "tidied up" into heaps. This tidying behavior is not specified in the control structure of the robots; it emerges, in real time, from the relationship between the rule, the environment (the size and number of obstacles, the presence or absence of other robots), and the bodies of the robots (the working front sensors have to be far enough apart to allow a block to fit, or else the robot simply successfully avoids the blocks). Importantly, then, the robots are not actually tidying – they are only trying to avoid obstacles, and their errors, in a specific extended, embodied context, lead to a certain stable outcome that looks like tidying (see also Pfeifer and Scheier, 1999; Pfeifer and Bongard, 2007 for extended reviews of this style of robotics). Understanding the resources the robots had available and how they were organized was what enabled the researchers to identify that the robots were not, in fact, trying to tidy anything up.
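The single-rule controller described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of Maris and te Boekhorst (1996): the names `Reading`, `control_step`, and `trace`, the command strings, and the tie-break when both side detectors fire are all hypothetical. The point it shows is that "pushing" never appears as a command; it is simply what driving forward amounts to when a block sits in the blind spot.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """What is physically present around the robot on one time step."""
    left: bool    # obstacle within range of the left-side detector
    front: bool   # obstacle directly ahead (the blind spot when disabled)
    right: bool   # obstacle within range of the right-side detector


def control_step(reading: Reading, front_sensor_enabled: bool = False) -> str:
    """Apply the single rule: turn away from a detected obstacle.

    Command names and the tie-break are illustrative assumptions.
    """
    front_detected = reading.front and front_sensor_enabled
    if reading.left and not reading.right:
        return "turn_right"
    if reading.right and not reading.left:
        return "turn_left"
    if front_detected or (reading.left and reading.right):
        return "turn_right"  # arbitrary tie-break (assumption)
    return "forward"


def trace(readings):
    """Return (command, pushing) pairs for a sequence of readings.

    'pushing' is never commanded; it is derived here only to label what
    happens when the robot drives forward into an undetected block.
    """
    result = []
    for reading in readings:
        command = control_step(reading)
        pushing = command == "forward" and reading.front
        result.append((command, pushing))
    return result
```

With the front detector disabled, a head-on block yields `"forward"` (the block is pushed), and the block is left behind as soon as a side detector triggers a turn. Nothing in the controller refers to blocks, heaps, or tidying.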

## **Locomotion and passive dynamics**

Why does walking have the form that it does? One explanation is that we have internal algorithms which control the timing and magnitude of our strides. Another explanation is that the form of walking depends on how we are built and the relationship between that design and the environments we move through.

Considering the resources available to solve this task highlights the centrality of an organism's design. Humans don't walk like lions because our bodies aren't designed like lions' bodies. The properties of our design are referred to as *passive dynamics* (McGeer, 1990). How are the segments arranged? How are they connected to each other? How springy are the connections? Robotics work on walking shows that you can get very far in explaining why walking has a particular form just by considering the passive dynamics. For example, robots with no motors or onboard control algorithms can reproduce human gait patterns and levels of efficiency simply by being assembled correctly (e.g., Collins et al., 2005)<sup>1</sup>. Work at MIT has added simple control algorithms to this kind of system, which allows the robots to maintain posture and control propulsion more independently. The same algorithm can produce a wide variety of locomotion behaviors, depending on which robotic body it controls (e.g., Raibert, 1986)<sup>2</sup>. None of these systems include a representation of the final form of their locomotion; this form emerges in real time from the interaction of the passive dynamics with the environment during the act of moving. These robots demonstrate how organisms might use distributed task resources to replace complex internal control structures.

## **Robot crickets**

A fascinating example of embodiment in nature has been replicated in the lab in the form of a robot (see Barrett, 2011 for the more detailed analysis of this case that we draw from here). Female crickets need to find male crickets to breed with. Females prefer to breed with males who produce the loudest songs. This means that the task facing female crickets is to find the males who sing the loudest. What resources do they use to solve this task? Female crickets have a pair of eardrums, one on each front leg, which are connected to each other via a tube. Sounds entering from the side activate that side's eardrum directly, and also travel through the tube from the other eardrum. These signals are out of phase if the sound is off to one side, and this increases the amplitude of that side's eardrum's response; this arrangement is therefore directional. This explains how the female can tell what direction a sound is coming from, but it doesn't explain how she uses this information to move toward this sound or how she manages to tune in to crickets of her own species. It so happens that the eardrums connect to a small number of interneurons that control turning; female crickets always turn in the direction specified by the more active interneuron. Within a species of cricket, these interneurons have a typical activation decay rate. This means that their pattern of activation is maximized by sounds with a particular frequency. Male cricket songs are tuned to this frequency, and the net result is that, with no explicit computation or comparison required, the female cricket can orient toward the male of her own species producing the loudest song. The analysis of task resources indicates that the cricket solves the problem by having a particular body (eardrum configuration and interneuron connections) and by living in a particular environment (where male crickets have songs of particular frequencies).

<sup>1</sup> See http://ruina.tam.cornell.edu/research/ for videos and more details of these robots.

<sup>2</sup> See http://www.ai.mit.edu/projects/leglab/ for videos and more details.

Webb (1995, 1996) has built robots that only have these basic capacities, and these robots successfully reproduce the form of the female cricket's exploratory behavior. The robots have no stored information about the male cricket's songs, and simply perceive and act using a particularly arranged body. It is clear that the robot doesn't explicitly implement "choosing the male with the strongest song"; finding him is simply the result of this embodied strategy operating in the context of multiple male crickets singing and is driven (this robotics work predicts) by the onset of chirps within the song. The success of this work results from carefully analyzing the task at hand, identifying available resources, and specifying how these resources are assembled by the agent (questions 1–3 outlined above).
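To make the "turn toward the more active interneuron" rule concrete, the following is a minimal sketch and not Webb's actual controller. It assumes a toy directional gain for each eardrum, a single leaky unit per side, and a fixed turn size and step length; the function names and every numerical value are hypothetical. Frequency tuning and chirp onsets are left out entirely.

```python
import math

def ear_responses(pos, heading, source):
    """Left/right eardrum amplitudes for one source given as (x, y, loudness).

    Assumption: the gain (1 +/- sin(bearing)) is a toy stand-in for the
    phase interaction in the tube that joins the two eardrums.
    """
    sx, sy, loudness = source
    dx, dy = sx - pos[0], sy - pos[1]
    bearing = math.atan2(dy, dx) - heading      # > 0: source lies to the left
    intensity = loudness / (1.0 + math.hypot(dx, dy))
    return (intensity * (1.0 + math.sin(bearing)),
            intensity * (1.0 - math.sin(bearing)))

def phonotaxis(sources, steps=2000, turn=0.3, speed=0.05, decay=0.5, capture=0.3):
    """Walk from the origin using only 'turn toward the more active side'.

    Returns the (x, y) of the source that was reached, or None.
    All parameter values are illustrative assumptions.
    """
    x = y = heading = 0.0
    left_unit = right_unit = 0.0                # leaky interneuron activations
    for _ in range(steps):
        left_in = right_in = 0.0
        for src in sources:
            l, r = ear_responses((x, y), heading, src)
            left_in += l
            right_in += r
        # Each unit keeps a decaying trace of its own ear's input.
        left_unit = decay * left_unit + left_in
        right_unit = decay * right_unit + right_in
        # The only "decision": turn toward the more active unit, then step.
        heading += turn if left_unit > right_unit else -turn
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        for sx, sy, _ in sources:
            if math.hypot(sx - x, sy - y) < capture:
                return (sx, sy)
    return None

# One quiet and one loud source, equally far from the start.
print(phonotaxis([(5.0, 3.0, 2.0), (5.0, -3.0, 0.5)]))
```

Nothing in the sketch stores a song template or compares candidate males; in this particular geometry the walker ends up at the louder source only because that source dominates the summed ear input at every step.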

## **Summary**

This robotics work and more like it (e.g., Brooks, 1999; Pfeifer and Scheier, 1999; Beer, 2003; Pfeifer and Bongard, 2007) reveal that a great deal of complex behavior (from tidying, to locomotion, to mate selection) can emerge from placing the right type of body into a specific environmental context, without any explicit representation of the form of that behavior anywhere in the system. This work is a proof of concept that embodiment and embedding can therefore *replace* internal algorithms and lead to stable, functional behavior.

## **EMBODIMENT IN ACTION II: ANIMALS**

The robot work is fascinating and is one part of a strong argument in favor of the replacement hypothesis. Of course, the next critical step is to establish whether biological organisms actually take advantage of these embodied solutions (question 4) or whether they follow a different, more computational path.

## **Crickets again**

Webb's robot crickets implement a simple embodied perception-action strategy to perform mate selection. A hypothesis that follows from this work is that females use the onset of a male's song to drive exploration, rather than attending to the entire song and "choosing" the best one. Observation of real crickets shows that female crickets do indeed move before they could possibly have processed an entire song, supporting this embodied "chirp onset" hypothesis (Hedwig and Webb, 2005; see also Barrett, 2011 for an overview).

## **Swarming, herding, hunting**

Many animals produce carefully coordinated activities with large numbers of conspecifics. Forming large groups (swarms, or herds, or flocks) is a valuable defense against predators, and maintaining these groups requires ongoing coordination across many individuals. This coordination is not centrally controlled, however, and is not the result of an explicit attempt to maintain a swarm. Instead, the coordination emerges from and is maintained by the operation of straight-forward perception-action coupling rules in a suitable context. Bird flocking is elegantly explained as a coupling between individuals constrained by three principles (Reynolds, 1987): *separation* (avoid crowding neighbors), *alignment* (steer toward average heading of neighbors), and *cohesion* (steer toward average position of neighbors). Interestingly, cohesion exhibits asymmetries that relate to the perceptual capabilities of birds; the average position is a center of mass of only the nearest 5–10 birds, weighted in favor of birds off to the side (reflecting the field of view for bird vision; Ballerini et al., 2007). Sheep herding is similarly straight-forward. Sheep head for the geometric center of the flock when a predator approaches, implementing a "selfish herd" strategy without any individual in the herd being "selfish" *per se* (Hamilton, 1971; King et al., 2012).
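The three flocking principles can be sketched as a single steering function. This is an illustrative reading of the rules, not Reynolds' original implementation: the weights, the separation distance, and the choice of `k=7` nearest neighbours (a value inside the 5–10 range mentioned above) are assumptions, and the sideways weighting of neighbours is omitted.

```python
import math

def boid_steering(i, positions, velocities, k=7, min_dist=1.0,
                  w_sep=1.5, w_ali=1.0, w_coh=1.0):
    """Steering vector for boid i from its k nearest neighbours (2-D tuples)."""
    px, py = positions[i]
    vx, vy = velocities[i]

    def sq_dist(j):
        return (positions[j][0] - px) ** 2 + (positions[j][1] - py) ** 2

    # Topological neighbourhood: the k nearest birds, whatever their distance.
    neighbours = sorted((j for j in range(len(positions)) if j != i),
                        key=sq_dist)[:k]
    if not neighbours:
        return (0.0, 0.0)
    n = len(neighbours)

    # Cohesion: steer toward the average position of the neighbours.
    coh_x = sum(positions[j][0] for j in neighbours) / n - px
    coh_y = sum(positions[j][1] for j in neighbours) / n - py

    # Alignment: steer toward the average heading (velocity) of the neighbours.
    ali_x = sum(velocities[j][0] for j in neighbours) / n - vx
    ali_y = sum(velocities[j][1] for j in neighbours) / n - vy

    # Separation: push away from any neighbour closer than min_dist.
    sep_x = sep_y = 0.0
    for j in neighbours:
        dx, dy = px - positions[j][0], py - positions[j][1]
        d = math.hypot(dx, dy)
        if 0.0 < d < min_dist:
            sep_x += dx / (d * d)
            sep_y += dy / (d * d)

    return (w_sep * sep_x + w_ali * ali_x + w_coh * coh_x,
            w_sep * sep_y + w_ali * ali_y + w_coh * coh_y)

# A distant neighbour attracts (cohesion); a crowding neighbour repels (separation).
print(boid_steering(0, [(0.0, 0.0), (4.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]))
print(boid_steering(0, [(0.0, 0.0), (0.5, 0.0)], [(1.0, 0.0), (1.0, 0.0)]))
```

Each bird consults only what is locally perceivable; no term in the function refers to the flock as a whole.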

A more complex example of coordinated social activity is the pack hunting of wolves. The pattern of their activity, however, is readily explained by two simple rules: (1) move toward the prey until a minimum safe distance is reached, and then (2) move away from any other wolves that are also close to the prey (Muro et al., 2011). No leader is required, no instructions need be given; the form of the group's hunting activity emerges from a simple perception-action coupling strategy implemented by each individual, operating in a specific context.
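The two wolf rules translate almost directly into a per-individual update. The sketch below is a hedged illustration rather than the model of Muro et al. (2011): the safe distance, step size, and the threshold for counting another wolf as "close to the prey" (`near_factor`) are assumed values, and the function name is hypothetical.

```python
import math

def wolf_step(me, prey, others, safe_dist=2.0, speed=0.1, near_factor=1.5):
    """Return the wolf's next (x, y) position under the two rules."""
    dx, dy = prey[0] - me[0], prey[1] - me[1]
    d = math.hypot(dx, dy)

    # Rule 1: move toward the prey until the minimum safe distance is reached.
    if d > safe_dist:
        return (me[0] + speed * dx / d, me[1] + speed * dy / d)

    # Rule 2: move away from other wolves that are also close to the prey.
    push_x = push_y = 0.0
    for other in others:
        if math.hypot(prey[0] - other[0], prey[1] - other[1]) <= near_factor * safe_dist:
            ox, oy = me[0] - other[0], me[1] - other[1]
            od = math.hypot(ox, oy)
            if od > 0.0:
                push_x += ox / (od * od)   # nearer wolves push harder
                push_y += oy / (od * od)
    push = math.hypot(push_x, push_y)
    if push == 0.0:
        return me                          # nobody nearby: hold position
    return (me[0] + speed * push_x / push, me[1] + speed * push_y / push)

# A distant wolf closes in; a wolf at the safe distance sidesteps a packmate.
print(wolf_step((10.0, 0.0), (0.0, 0.0), []))
print(wolf_step((2.0, 0.0), (0.0, 0.0), [(2.0, 0.5)]))
```

Applying this update to every wolf on every time step is all the "coordination" there is in the sketch; no individual is given a role and no message is exchanged.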

Continuing the hunting theme, Barrett (2011) has an extended discussion on what she refers to as "the implausible nature of *Portia*," the jumping spider. *Portia* is capable of some remarkable feats: deceptive mimicry, creating diversions to distract prey, and taking extended detours in order to sneak up on dinner. This last is especially impressive – detours mean *Portia* must operate for extended periods without direct perceptual contact with its prey animal. This would seem to require some form of route planning (Heil, 1936; Barrett, 2011). As Barrett notes, this hypothesis seemed initially plausible because of the way in which *Portia* scans its environment – prior to taking the detour, it will sit and sway from side to side, seemingly evaluating potential routes and making a selection. However, this scanning behavior, coupled with the anatomy of the spider's eyes, is actually an embodied strategy that enables *Portia* to generate successful detours using currently available perceptual information (e.g., Tarsitano and Jackson, 1997; Tarsitano and Andrew, 1999); *Portia* is perceiving, not planning.

#### **Summary**

The advantage of examples from the animal literature is that researchers are less likely to want to attribute performance to complex internal representations (only less likely, of course; the temptation is always there – Kennedy, 1992; Barrett, 2011). However, once we identify that embodied, situated perception-action couplings can produce complex adaptive behavior in other animals, it becomes more difficult to deny the existence of such solutions in our own repertoire unless one wishes to deny the evolutionary continuity between ourselves and the rest of the animal kingdom.

## **EMBODIMENT IN ACTION III: PEOPLE**

We will now review in some detail two excellent examples of successful replacement style embodied cognition in psychology. These examples are the *outfielder problem* and the *A-not-B error* (see Clark, 1999; Smith and Gasser, 2005 for other uses of these examples). They are useful because (a) they address all four key questions of good embodiment research and (b) both examples have standard cognitive psychology explanations that have been successfully replaced after numerous studies implementing the kind of embodied approach we are advocating for here. These sections will begin by describing the standard cognitive psychology explanations for the outfielder problem and the A-not-B error. We will then take a step back and analyze each task from an embodied cognition perspective, asking our four key questions:

1. What is the task to be solved?
2. What are the resources available?
3. How might these resources be assembled to solve the task?
4. Does the organism, in fact, assemble and use these resources?


## **EMBODIMENT IN ACTION III.I: THE OUTFIELDER PROBLEM**

How does a baseball outfielder catch a fly ball? There are many factors that make this task difficult; the fielder is far away from the batter, the ball is optically very small and remains so until it is very close to the fielder, the fielder has to move from their starting location to the location where the ball will land at some point in the future, and they have to arrive at this location in time to intercept the ball.

## **The standard explanation**

The initial hypothesis is that we catch fly balls by predicting their future location based on the physics of the ball's motion. A fly ball is an instance of projectile motion, and the physics of this kind of ballistic flight are relatively straight-forward. For an object of a given size and mass, the primary variables that determine the flight are initial direction, velocity, and angle (plus some local constants such as drag, air density, and gravity). Saxberg (1987a,b) suggested that outfielders perceive these initial parameters and then use them as input to an internal simulation (representation) of projectile motion. This representation allows outfielders to *predict* the future location of the ball (Trajectory Prediction). Once the future location of the ball has been predicted, the fielder can simply run to that location and wait.
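For comparison with the embodied solution that follows, here is a minimal sketch of what a Trajectory Prediction computation would involve in the simplest case. It assumes drag-free flight launched from ground level, which is a simplification of the account described above (drag and air density are ignored), and the function and parameter names are hypothetical.

```python
import math

def predict_landing(speed, launch_angle, direction, g=9.81):
    """Predict where and when a drag-free projectile lands.

    speed in m/s, angles in radians; returns (x, y, flight_time) with the
    launch point at the origin. Assumes level ground and no air resistance.
    """
    flight_time = 2.0 * speed * math.sin(launch_angle) / g
    ground_range = speed * math.cos(launch_angle) * flight_time
    return (ground_range * math.cos(direction),
            ground_range * math.sin(direction),
            flight_time)

x, y, t = predict_landing(30.0, math.pi / 4, 0.0)
print(round(x, 2), round(t, 2))          # range in metres, flight time in seconds

# Range scales with speed squared, so a 5% error in perceived launch
# speed shifts the predicted landing point by about 10%.
x_off, _, _ = predict_landing(31.5, math.pi / 4, 0.0)
print(round(x_off / x, 4))
```

Everything the function needs must be perceived at launch; once the fielder sets off, the prediction offers no way to notice or correct an error in those initial estimates.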

## **The embodied solution**

Saxberg's (1987a,b) solution assumes that the act of catching a fly ball is a lot like solving a physics problem, relying on some limited resources (the ball's initial conditions) and some internal simulation. In contrast, the embodied solution first asks if that's true by asking "What *are* the resources that are available in this task, and how might they help a person trying to catch a ball?"

## **What is the task to be solved?**

A fielder stands in the outfield of a baseball diamond, around 250 ft from home plate. The batter pops a fly ball (projectile motion along a parabolic trajectory) into the air and the fielder must locomote from where they are, to where the ball will be when it hits the ground (hopefully in time to catch it before it hits the ground). So, the fielder's task is to move themselves so that they arrive at the *right place* at the *right time* to intercept a *fly ball*. Sometimes fielders are in a direct line with the flight of the ball, but the general problem to be solved involves the fielder being *off to one side*.

## **What are the resources available?**

The first thing to note is that, at the distances involved, the optical projection of the baseball is tiny. Any attempt to figure out how far away the ball is and where it's going using changes in optical projection size will be riddled with errors (if it's possible at all; Cutting and Vishton, 1995). These errors would propagate through any simulation, which makes solutions based on computing simulations of projectile motion unstable. This means that the simulation solution is not a likely resource (and in fact the evidence suggests it is not an option; Shaffer and McBeath, 2005). What else is available?

To identify the full range of available resources, we need to understand the physical properties of the fly ball event. Events unfold over time, and are distinguished from one another by their underlying *dynamics* (which describe both how the system changes over time and the forces which produced the change; Bingham, 1995). In the present example, the relevant dynamics are that of projectile motion. As a given example of the projectile motion dynamic plays out, it creates *kinematic* information which can be detected and used by an observer. Kinematic descriptions include only how the system changes over time, without reference to the underlying forces. Perceptual systems can only detect kinematic patterns, but observers actually want to know about the underlying dynamic event; this is the perceptual bottleneck (Bingham, 1988). Kinematics can *specify* the underlying dynamics, however (Runeson and Frykholm, 1983) and detecting a specifying kinematic pattern is equivalent to perceiving the underlying dynamic (solving the bottleneck problem and allowing direct perception as suggested by Gibson, 1966, 1979). The information that an outfielder might use to continuously guide their actions to the future position of the ball must therefore be kinematic and specific to this future position.

The batter provides the initial conditions of the ball's trajectory (direction, velocity, and angle) and, after that, the flight unfolds according to the dynamics of projectile motion. This dynamic produces motion along a parabolic trajectory. The form of this motion is that the ball initially rises and decelerates until it reaches a peak height when its vertical velocity reaches zero; it then accelerates as it falls down the other side of the parabola. This motion is the kinematic information that is available to the observer.

The fielder also brings resources with them: these include the ability to detect kinematic information and (most usefully) to locomote over a range of speeds along any trajectory across the field.

#### **How might these resources be assembled to solve the task?**

How can the perceptual information specifying the dynamics of the fly ball be used in conjunction with the fielder's ability to perceive kinematics and locomote? The parabolic flight of the ball creates the possibility of two basic solutions. Each strategy requires the outfielder to move in a particular way so as to offset some aspect of the parabolic flight, either the acceleration or the curve of the path. If the fielder is able to successfully offset either the acceleration or the curve of the path, then they will end up in the right place at the right time to intercept the ball. When reading about these solutions in more detail below, notice that neither one requires the fielder to predict anything about the ball's future location, only to move in a particular way with respect to the ball's current motion; this is *prospective control* (e.g., Montagne et al., 1999).

The first solution is called optical acceleration cancelation (OAC; e.g., Chapman, 1968; Fink et al., 2009) and requires the fielder to align themselves with the path of the ball and run so as to make the ball appear to move with constant velocity. The second strategy is called linear optical trajectory (LOT; e.g., McBeath et al., 1995) and requires the fielder to move laterally so as to make the ball appear to trace a straight line. Which strategy is adopted depends on where the fielder is relative to the ball (OAC works best if the ball is coming straight for you, LOT allows you to intercept a ball that is heading off to one side).
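The information that OAC exploits can be illustrated numerically. The sketch below assumes drag-free flight and a stationary observer standing in the plane of the ball's flight, and it takes the tangent of the ball's elevation angle as the optical variable (a common formalization, assumed here rather than taken from the text above). The function names and launch values are hypothetical.

```python
def tan_elevation(t, fielder_x, vx, vy, g=9.81):
    """Tangent of the ball's elevation angle seen from fielder_x at time t."""
    ball_x = vx * t
    ball_y = vy * t - 0.5 * g * t * t
    return ball_y / (fielder_x - ball_x)

def optical_acceleration(t, fielder_x, vx, vy, g=9.81, h=0.05):
    """Second time derivative of tan(elevation), by central differences."""
    def f(s):
        return tan_elevation(s, fielder_x, vx, vy, g)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

vx, vy, g = 20.0, 20.0, 9.81
flight_time = 2.0 * vy / g
landing_x = vx * flight_time
mid_flight = flight_time / 2.0

for label, fielder_x in [("at landing point", landing_x),
                         ("10 m too deep", landing_x + 10.0),
                         ("10 m too shallow", landing_x - 10.0)]:
    print(label, optical_acceleration(mid_flight, fielder_x, vx, vy))
```

Under these assumptions the optical acceleration is zero (up to rounding) only for an observer at the landing point; it is negative when the ball will drop in front of the observer and positive when it will carry overhead. The sign therefore says which way to run, and no landing point ever has to be computed.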

#### **Does the organism, in fact, assemble and use these resources?**

The computational strategy suggests that the outfielder will run in a straight line to the predicted landing site. This is because the fielder computes the future landing site based on input variables that the fielder detects before setting off. Since the shortest path to a known landing site in open terrain is a straight line, the fielder should run directly to the place where they intercept the ball. Outfielders do not typically run in straight lines, ruling the computational strategy out. LOT and OAC predict either a curving path or one with a velocity profile that offsets the acceleration of the ball. The evidence generally favors LOT (e.g., McBeath et al., 1995) but there is evidence that OAC is a viable and utilized strategy under certain conditions (e.g., Fink et al., 2009).

These solutions have numerous advantages over the computational solution. First, instead of relying on an initial estimate of the ball's motion, which could be in error, they allow the fielder to continuously couple themselves to the ball. This coupling provides fielders with numerous opportunities for error detection and correction. Second, the strategies provide a continuous stream of information about how well the fielder is doing. If the ball still seems to be accelerating, or if its trajectory is still curved, this tells the fielder both that there is an error and what to do to fix the error. If the fielder is running flat out and is still unable to correct the errors, this specifies an uncatchable ball, and the fielder should switch to intercepting the ball on the bounce instead. The affordance property "catchableness" is therefore continuously and directly specified by the visual information, with no internal simulation or prediction required.

## **Summary**

In both LOT and OAC, various task resources (the motion of the ball, the fielder, and the relation between them specified by the kinematics of the ball viewed by the moving observer) have been assembled into a *task-specific device* (Bingham, 1988) to solve the task at hand (intercepting the projectile). This assembly is *smart*, in the sense described by Runeson (1977); it takes advantage of certain local facts of the matter to create a robust but task-specific solution (neither LOT nor OAC is a general solution to the problem of interception, for example). The most important lesson here is that the relation between perceptual information (about the motion of the ball) and an organism (the outfielder) *replaces* the need for internal simulation of the physics of projectile motion.

### **EMBODIMENT IN ACTION III.II: THE A-NOT-B ERROR**

What do children know about objects and their properties, and when do they come to this knowledge? Piaget (1954) investigated this question by asking children of various ages to search for objects that were hidden behind some obstacle in view of the children. Prior to about 7 months, children simply don't go looking for the object, as if it has ceased to exist. From around 12 months, children will happily go and retrieve the hidden object, seemingly now understanding that even though they can't see the toy they want, it's still there to be found. In the transition, however, children make a rather unusual "error" – after successfully reaching several times for a hidden object at a first location A, they will then fail to reach for the object hidden at location B, even though the hiding happened in full view of the child. They will instead reach to A again (hence "A-not-B error").

There are a variety of standard cognitive explanations for this error, but all in essence assume that (a) the child has developed the necessary object concept that includes the knowledge that objects persist even when out of view but (b) there is something about reaching that cannot tap that knowledge reliably. The child's underlying *competence* can be demonstrated using looking behavior as a measure, for example; children look longer at displays showing the error trial, suggesting they know something is not right (e.g., Baillargeon and Graber, 1988). The problem, therefore, is in the reaching *performance*: reaching cannot yet access the knowledge necessary. This performance-competence distinction is a common theme in the cognitive developmental literature. It assumes that the goal of the science is to understand the core competence, and that to do so you must devise clever methods to bypass the potential limitations of performance.

Thelen et al. (2001) challenged every single aspect of this account with their embodied dynamical systems model of the reaching task. This model was the end result of numerous experiments motivated by a rejection of the performance-competence distinction and a renewed focus on the details of the task at hand. As Thelen et al. put it, "The A-not-B error is not about what infants *have* and *don't have* as enduring concepts, traits, or deficits, but what they *are doing* and *have done*" (p. 4). The end result was an account of the A-not-B error that replaces object knowledge and performance deficits with the dynamics of perceiving and acting over time in the context of the reaching task.

## **What is the task to be solved?**

This is actually quite a complicated question. The canonical version of the task requires the infant to watch as an attractive toy is hidden at location A. The child is then allowed to search for and retrieve the object several times, after which the object is hidden at location B in full sight of the baby.

One of the inspirations for pursuing a dynamical system, embodied approach here was that almost every parameter of this task is known to affect infants' performance. These parameters include the distance to the targets, the distinctiveness of the covers, the delay between hiding and search, what the infant is searching for (food or a toy), whether the infant is moved and how much crawling experience they have (see Thelen et al., 2001 for a detailed overview). If the A-not-B error reflects object knowledge, why do these factors matter so much?

To get a handle on this question, the first thing that Thelen et al. (2001) did was to enumerate the details of the canonical task (Section 2.2) so that they had a clear understanding of the available resources that might impact infants' performance. First, the infant gets *continuous visual input* (Section 2.2.1) from two wells in a box placed a certain distance away from the child and apart from one another. The experimenter draws the infant's attention to the object, and then hides the object in well A. This *specific visual input* (Section 2.2.2) indicates which well the reaching target is in. After a short *delay* (Section 2.2.3) during which infants typically look at the cued location, they perform a *visually guided reach* (Section 2.2.4) to retrieve the object. This reach requires them to *remember* (Section 2.2.5) the location of the hidden object for the duration of the delay. This is repeated several times until the switch to the B location, at which point the infants make the error around 70–80% of the time (depending on their *developmental status*; Section 2.2.6).

## **What are the resources available?**

In this version of the task, the resources that might impact performance include the details of the continuous and specific visual input, the length of the delay, and the delay's relationship to the temporal dynamics of the memory of the previous reaches. The infant also brings resources to the task. For instance, their performance depends on their ability to maintain visual attention and the way in which they perform visually guided reaches. Thelen et al. (2001) do not include an object concept as a resource. The purpose of this seeming omission is to see how well they can model the behavior without invoking any core competence separate from observed performance.

#### **How might these resources be assembled to solve the task?**

The reason why this work by Thelen et al. (2001) is such a powerful example of replacement style embodied cognition is that their model is an excellent example of using dynamical systems to explain how perceptual and embodied resources might be assembled to produce an error that, on the face of it, seems to require a representational explanation (in the form of an infant's object concept). The model specifies two locations in a metric field representing the infant's reach space and takes specific perceptual input about where to reach. This input raises activation at the appropriate location in the motor planning field and generates a reach in the right direction once a threshold is crossed. Reach direction planning unfolds continuously over time using population coding (cf. Georgopoulos, 1995). Activation in this field has a temporal dynamic that prevents it from fading immediately; the movement planning field has memory about its recent behavior. Activations at different locations in the field interact, allowing for competition and cooperation between them. The model is initialized and presented with specific input; the behavior of the model emerges as the various competing dynamics (specific input, task input, memory, reach planning, etc.) unfold and change the shape of the field controlling reaching. By the time the specific input is switched to location B, the field has taken on a shape which reflects this competition, and the perceptual input from B is effectively being detected by a very different system than the one which first detected input from location A. Its behavior is correspondingly different; specifically, if the parameters match the canonical version of the task, the model will make the A-not-B error. Note there is no mention of an "object concept" in the model specification. Yet, the model is able to re-create the A-not-B error simply by implementing a reach system with its own dynamical properties.
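The flavor of this account can be conveyed with a drastically reduced toy. The sketch below is not the Thelen et al. (2001) model: it collapses the continuous planning field to two sites, drops the task input and the reach threshold, and uses invented parameter values throughout. It keeps only three ingredients named above: a transient specific input, a memory trace of previous reaches, and competition between locations.

```python
def run_trial(cued, memory, delay_steps, cue_steps=10, cue_strength=2.0,
              tau=5.0, inhibition=0.5):
    """Simulate one hide-delay-reach trial; return the site reached ('A' or 'B')."""
    u = {"A": 0.0, "B": 0.0}                    # planning activation per site
    for step in range(cue_steps + delay_steps):
        cue_on = step < cue_steps               # specific input ends when the delay starts
        nxt = {}
        for site, rival in (("A", "B"), ("B", "A")):
            specific = cue_strength if (cue_on and site == cued) else 0.0
            drive = specific + memory[site] - inhibition * max(u[rival], 0.0)
            nxt[site] = u[site] + (drive - u[site]) / tau   # leaky integration
        u = nxt
    return "A" if u["A"] > u["B"] else "B"

def a_not_b_session(n_a_trials=4, delay_steps=30, memory_rate=0.3):
    """Run several A trials and then one B trial; return the list of reaches."""
    memory = {"A": 0.0, "B": 0.0}               # trace left by previous reaches
    reaches = []
    for cued in ["A"] * n_a_trials + ["B"]:
        reach = run_trial(cued, memory, delay_steps)
        memory[reach] += memory_rate * (1.0 - memory[reach])
        reaches.append(reach)
    return reaches

print(a_not_b_session(delay_steps=0))    # reach immediately after the cue
print(a_not_b_session(delay_steps=30))   # reach after a long delay
```

With these assumed parameters the toy reaches to B when it is allowed to reach immediately and perseverates to A after a long delay, because the transient input at B has decayed while the trace of earlier reaches to A persists. Nothing in the state of the toy stands for an object, hidden or otherwise.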

#### **Does the organism, in fact, assemble and use these resources?**

The model is extremely successful at capturing the key phenomena of the A-not-B task. It also captures how performance is affected by changes to task details (e.g., variation in reach delay, changes in object properties). Object concept based explanations have been proposed for these effects (e.g., see the response by Diamond, 2001, to the target article by Thelen et al., 2001). However, there are other aspects of task performance that object concept explanations struggle to cope with. Most interestingly, the model predicts and then explains the novel experimental finding that the A-not-B error occurs *in the absence of hidden objects* (Smith et al., 1999). If there is no object to remember, then object concept based explanations are at a loss to explain why the error persists; after all, there is no object to conceptualize. In contrast, the embodied model predicts that the "error" comes from the immature dynamics of reaching, and not an incomplete object concept. This then suggests that you should be able to generate the error in older children by increasing the complexity of the reaching requirements. Consistent with this, Smith et al. (1999) and Spencer et al. (2001) generated the error in 2 year olds and similar reach biases have been observed in children up to 11 (Hund and Spencer, 2003) and even adults (Spencer and Hund, 2002). There is no clear reason to expect these biases on the basis of an object concept explanation. The best explanation for this pattern of results is that the observed reaching behavior does indeed emerge from the kind of embodied task dynamic described by the model.

#### **Summary**

The A-not-B task has a long history of explanations based in standard, representational cognitive psychology. These explanations assume that the reach is an error caused by an incomplete object concept, to which the immature motor system has limited access until around the age of 12 months. Thelen et al.'s (2001) embodied approach *replaces* the object concept with the dynamics of reaching to grasp and successfully accounts for the wide variety of context effects, as well as explaining novel versions of the error generated without any hidden objects and in older children.

## **THE CONCEPTUALIZATION HYPOTHESIS FOR EMBODIMENT: CONCEPTS AND GROUNDING**

We have identified embodied cognition as a cluster of research tied together by the same basic research strategy: (1) identify the task at hand, (2) identify the resources available within that task space that might help an organism solve the task, (3) generate hypotheses about how these resources are assembled and coordinated (perhaps formalizing this hypothesis in a model; see Bingham, 2001, 2004a,b for another example, and Golonka and Wilson, 2012 for a detailed analysis of that model), and finally (4) empirically test whether people, indeed, use these resources assembled in this way. This is not, however, the only style of research going under the banner of embodiment, and it's fair to ask on what basis we are ruling this other research out from our classification.

Many examples fall under what Shapiro (2011) calls the *conceptualization hypothesis*. This is the hypothesis that how we conceive of our world is grounded in and constrained by the nature of the perception-action systems that we are (our bodies). For example, Lakoff and Johnson (1980, 1999) describe how common metaphors are typically grounded in the nature of our bodies and experiences in the world (the future is *forward*, power is *up*, relationships are *a journey*). This style of research doesn't seek to replace the concept with a different process. Instead, it looks to find examples where use of the concept can be primed or altered by manipulations of the grounding state of the body.

There are many recent examples of this type of research in the literature; we will briefly focus on two representative studies. The first claims to demonstrate how a state of the body affects our access to a mental representation for magnitude estimation (Eerland et al., 2011) while the second claims to show an effect in the other direction, with a mental state biasing the body state in which that mental state is supposedly grounded (Miles et al., 2010).

#### **Leaning to the left makes the Eiffel Tower seem smaller**

People can generate sensible estimates of the magnitude of things, such as the height of the Eiffel Tower, even when they don't know the exact answer. These magnitudes are hypothesized to be generated by a mental representation of magnitudes organized like a number line, with small numbers at the left end and larger numbers to the right (Restle, 1970). Eerland et al. (2011) had people stand balanced slightly to the left or to the right of center to test the hypothesis that this postural bias would make either the left or right end of the number line more accessible. If it did, then people should be primed to generate lower estimates of magnitude when leaning left and greater ones when leaning right.

The results were mixed. When people leaned left they did, on average, make slightly smaller estimates than when leaning right and the authors concluded that these data support the hypothesis; access to the mental number line, arranged left to right, is, at least, partly grounded in the left to right sway of the body. It should be noted, however, that the effect size was very small, the effect was not observed for all the questions, and there was no effect of leaning to the right.

#### **Thinking about the future makes you sway forward**

The second example of conceptualization style research is Miles et al. (2010), who had people engage in "mental time travel" by thinking about events in either the past or the future. They measured postural sway at the knee, and found that as people thought about the future this sway was biased toward the front (the future is *in front*). When people thought about events in the past, their sway was biased backward (the past is *behind*). Again the effect was small (peaking at a bias of approximately 2 mm in each direction) but the authors concluded that their data demonstrate a connection between the state of the body and the contents of the cognitive representation of time.

## **Where is the embodiment?**

Neither of these studies begins with a task analysis and neither considers what perceptual and embodied resources are available to solve the task. This eliminates the opportunity to discover what substantive role these resources can play in cognition. Instead, the assumption made in both these studies is that the task is solved internally, representationally, by a cognitive process that can tweak or be tweaked by a state of the body. There isn't any compulsory, critical, constitutive role for the body and environment in the proposed mechanism for solving the task at hand, as there is in all the other work reviewed. You cannot catch a fly ball without moving. The fielder's movement *inevitably* creates the information for either LOT or OAC, which can then structure the observed behavior. You cannot do the A-not-B task without reaching. Reaching *inevitably* invokes the dynamics of visually guided reaching, which can then structure the observed behavior. You can, however, lean left and not have it affect your estimates of magnitude, and you can think about the future without leaning forward. Conceptualization style embodiment research does not identify the body as a task-critical resource, nor does it generate any formal account of how the body forms part of a task-specific solution to the task at hand. At best, it demonstrates that sometimes thoughts and actions go together.

## **TAKING THE NEXT STEP – AN EMBODIED ANALYSIS OF LANGUAGE**

This paper has laid out what we propose is a necessary research strategy for a genuine embodied cognitive science. We've looked at a progression of existing research that follows this strategy, beginning with simple robotic systems up through non-human animal behavior, and on to two cases of human behavior – one straight-forward perception-action system (catching a fly ball) and one more traditional cognitive task (the A-not-B task). The point was to show that this approach is productive across a wide variety of tasks and behaviors, and that it demonstrates the kind of continuity evolutionary theory tells us exists across biology.

We would like to round this article out with an initial foray into an embodied analysis of that classic cognitive task, *language*. Our goal here is simply to take what we think is the first step: identifying the nature of a critical resource present in a language event, specifically the form and content of *linguistic information*. This can then guide and constrain the non-representational empirical investigations that we hope will follow.

#### **LANGUAGE: IT'S SPECIAL, BUT IT'S NOT MAGICAL**

Most psychologists assume that *catching* a fly ball and *talking about* catching a fly ball are two different *kinds* of task, in the sense that you can't use the tools appropriate to studying how to catch a fly ball to understand how we communicate through language. Language is a very interesting kind of behavior, and it has some properties that make it very special. But it is not magical; it is a product of evolution, the same as the rest of our behavior, so it makes perfect sense to expect it to be amenable to the analyses that have been so successful in other domains. In other words, our first move is simply to treat perception-action problems and language problems as the same kind of thing.

As we will discuss shortly, there is one important difference to worry about, specifically in how perceptual and linguistic information come to have their meaning. This difference, however, can only be seen by the third person, scientific analysis of the situation. An embodied approach should never forget that it's trying to explain the first person experience of the organism (a point made forcefully by Barrett, 2011) and from this perspective *there is no difference at all* between the two types of information. In its day-to-day life the organism never gets to "peer behind the curtain" – kinematic patterns in energy arrays are all we ever have access to. The job of the learning organism is to detect these patterns, and come to learn what they mean by using that information to do something. If you can use some information to intercept a fly ball, then you have demonstrated that you know that that's what the information means. Similarly, if you can use linguistic information to reply correctly to an interlocutor, you have again demonstrated that you know that that's what the information means. The basic process is the same; learn to detect the relevant structure and learn to use it appropriately.

#### **HOW INFORMATION GETS ITS MEANING**

Events in the world are identified by their underlying dynamics; these dynamics create kinematic patterns in energy arrays and these patterns can serve as *perceptual information* about the dynamics that created them (Bingham, 1995). For perception, structure in an energy array is *about* the dynamic event in the world that created the structure in the moment (for example, the optical information created by the motion of a fly ball is about the motion of the fly ball). This relationship is underwritten by ecological laws (Turvey et al., 1981) and detecting the information allows the organism to perceive the dynamical event.

Every language event (speech, writing, gesture) also creates structure in energy arrays (speech creates acoustic structure; writing and gesture create optical structure). To an organism capable of language use, this structure can serve as *linguistic information*, and because we are treating them as the same kind of thing, we can analyze linguistic information the same way we analyze perceptual information. The only difference between perceptual information and linguistic information is in the relationship between the structure in the energy array and the meaning of the information. For language, the structure in the energy array is not about the dynamics of, say, articulation; it's about whatever the words mean. The structure comes to have this meaning because of the social conventions of the language environment and what we learn is, therefore, a *conventional* meaning of the pattern. This conventional underpinning gives stability to linguistic information, but the difference between a law and a convention is very important. Conventions can change and so can the meaning of words; language is much less stable than perception. This decreased stability is, of course, a fact of language to be explained, so perhaps it is not a disaster for the analogy we are developing here.

#### **DO WE NEED REPRESENTATION?**

This is the point where standard cognitive science usually jumps in and claims that conventional meaning requires representational support. Linguistic information is created by the unfolding of a complex dynamic in the present time, but the meaning of this information is the conventional one that may be about something not present at that time; we can talk about things in their absence in a way that has no analogy in perception. So in what sense can linguistic information have meaning if not in the form of internal models of the people, objects, places, etc., to which the words refer?

This sticking point is, to some extent, a product of the form of the question. To ask what a word means implies something static and internal – words *have* meanings. So, our approach here is to ask the same question in a different way. As we said earlier, if someone is able to respond appropriately to linguistic information, then it is fair to say that this person knows what the information means. Instead of asking how we learn the meaning of words, we can ask, instead, how do we learn to use and respond to linguistic information? Can we respond appropriately to linguistic information without possessing mental representations? As discussed in the previous sections on robotics, quite interesting and complex behavior can emerge without explicit internal models of it. Still, none of these robots used language.

In perception, the argument goes, representations are not necessary because the specification relationship between perceptual information and the world makes perceiving the information identical to perceiving the world (Gibson, 1966, 1979; Turvey et al., 1981). What this means is that organisms can respond appropriately to perceptual information without the need to cognitively enrich the perceptual input. The critical issue for language is whether the conventional relationship between linguistic information and what that information is about is sufficient to support something like direct perception.

Chemero (2009) has an extensive argument about how convention can indeed be sufficient, as part of his suggestion that even perceptual information can be grounded in convention. Specifically, he uses conventions as defined by the situation semantics of Barwise and Perry (1983) and we suggest that this analysis will be the place to begin to address this question in the future. To summarize the key points, Barwise and Perry proposed that information is created for organisms by *situations*; a given situation will be an instance (token) of a type of situation, and situations can be connected by constraints. If two types of situation, S1 and S2, are connected via a constraint, then a token S2 is informative about a token S1 by virtue of that constraint. An organism has access to that information if and only if they have access to one of the tokens and the constraint. This is precisely the case in the example of language. If S1 is "the situation being discussed" and S2 is "the language event of the discussion," these are connected by the constraints of the local language environment. By this account, a token of S2 (e.g., the utterance "the rain in Spain stays mainly in the plain") is informative about a token of S1 (the typical pattern of rainfall in Spain) but only to a skilled user of the English language. If the utterance were instead "La lluvia en España se mantiene principalmente en la llanura," our English language user would not be informed about S1 because they don't have access to the relevant constraints of Spanish. Situation semantics provides a formal language for talking about how linguistic information can be informative about the world despite its basis in convention. There is much work still to do here, but as Chemero (2009) notes, this framework has the benefit of treating specifying and conventional information as the same kind of thing and it therefore seems like a good place to start a non-representational account of language meaning.
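The access condition summarized above can be made concrete with a minimal sketch. It is an illustration only, not Barwise and Perry's formalism: the names `Constraint`, `Token`, `Organism`, and `is_informed_by` are hypothetical, and a constraint is reduced to a named link between two situation types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical link between two situation types (e.g., a language convention)."""
    name: str         # e.g., "English"
    source_type: str  # S2: the type of the informative situation
    target_type: str  # S1: the type of situation it is informative about


@dataclass(frozen=True)
class Token:
    """A concrete instance of a situation type."""
    situation_type: str
    content: str
    constraint_name: str  # the convention under which this token was produced


@dataclass
class Organism:
    known_constraints: frozenset  # names of constraints the organism has access to

    def is_informed_by(self, token: Token, constraint: Constraint) -> bool:
        """True only if the organism has access to both the token and the constraint."""
        return (
            token.situation_type == constraint.source_type
            and token.constraint_name == constraint.name
            and constraint.name in self.known_constraints
        )


english = Constraint("English", "language event", "situation discussed")
spanish = Constraint("Spanish", "language event", "situation discussed")
utterance_en = Token("language event",
                     "the rain in Spain stays mainly in the plain", "English")
utterance_es = Token("language event",
                     "La lluvia en España se mantiene principalmente en la llanura",
                     "Spanish")

listener = Organism(frozenset({"English"}))
print(listener.is_informed_by(utterance_en, english))  # True
print(listener.is_informed_by(utterance_es, spanish))  # False
```

The sketch stores no model of Spain or of rain in the listener. Being informed depends only on having access to the token and to the constraint that links it to the situation being discussed, which is the point of the English and Spanish example.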

It is worth saying outright that arguing against the need for representations to support language is not the same thing as claiming that the brain has no role in language. The brain is clearly involved (as it is involved in perception/action) and an embodied approach to language will need to engage with this fact, so long as hypotheses about what the brain is doing are consistent with the embodied analysis we are applying here. For example, there is a literature on the coupling between articulation and neural dynamics as a mechanism for language comprehension. This work focuses on the production of syllables and models that in terms of oscillator dynamics which can then be coupled to the oscillator dynamics of the cortex (Luo and Poeppel, 2007; Giraud and Poeppel, 2012; Peelle and Davis, 2012). There is some dispute about whether the syllable is the correct phonetic level of analysis (Cummins, 2012), but regardless, the form of this argument matches parts of the analysis we propose here. In particular, this framework suggests a way to link linguistic information to cortical dynamics. Thus, in principle, there is no need to invoke representations to explain how linguistic information can precipitate actions. The non-representational alternative is a non-linear dynamical system where structure in energy arrays (in the form of perceptual and linguistic information) causes changes in cortical dynamics, which are coupled to limbs, mouths, etc., capable of taking action. Taking action (moving, speaking) changes the landscape of perceptual and/or linguistic information, which impacts the cortical dynamics, and so on.
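To illustrate what coupling between oscillators means in the simplest case, the sketch below drives one phase oscillator with a periodic external rhythm. It is a generic textbook model of entrainment and is not the model proposed in any of the papers cited above. The function name `simulate_entrainment`, the frequencies (a 5 Hz "syllable" rhythm and a 4 Hz intrinsic "cortical" rhythm), and the coupling strength are all assumptions chosen for illustration.

```python
import math


def simulate_entrainment(stim_hz=5.0, cortex_hz=4.0, coupling=10.0,
                         dt=0.001, seconds=10.0):
    """Euler-integrate a phase oscillator driven by a periodic stimulus.

    Returns the phase difference (stimulus minus oscillator, in radians)
    at every time step. All parameter values are illustrative assumptions.
    """
    w_s = 2 * math.pi * stim_hz    # angular frequency of the stimulus rhythm
    w_c = 2 * math.pi * cortex_hz  # intrinsic angular frequency of the oscillator
    phi, theta = 0.0, 0.0          # stimulus phase, oscillator phase
    diffs = []
    for _ in range(round(seconds / dt)):
        # The oscillator is pulled toward the stimulus phase by the coupling term.
        theta += dt * (w_c + coupling * math.sin(phi - theta))
        phi += dt * w_s
        diffs.append(phi - theta)
    return diffs


locked = simulate_entrainment(coupling=10.0)
free = simulate_entrainment(coupling=0.0)

# With sufficient coupling the phase difference settles to a constant (phase locking);
# without coupling it grows steadily as the two rhythms drift apart.
print("coupled, change over final second:  ", abs(locked[-1] - locked[-1000]))
print("uncoupled, change over final second:", abs(free[-1] - free[-1000]))
```

In this toy system the oscillator locks to the stimulus whenever the coupling strength exceeds the difference between the two angular frequencies. Nothing in the system stores or describes the stimulus; the stimulus structure simply changes the oscillator's dynamics, which is the form of explanation the paragraph above has in mind.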

#### **LANGUAGE, THOUGH SPECIAL, IS AMENABLE TO AN EMBODIED ANALYSIS**

We create linguistic information (e.g., speech or written text) to achieve goals (e.g., directing and regulating the behavior of ourselves and others). The dynamical system creating linguistic information entails the coupled dynamics of the articulators and the brain, both of which are nested in a socially defined language environment with its own dynamical properties. Language dynamics are therefore complex and defined across multiple coupled dynamical systems, but linguistic information is still being created by a dynamical event the same way perceptual information is; they are not different in kind.

This information is a critical task resource, in exactly the same way as perceptual information is a critical task resource. In fact, we argue that the similarities between the two are strong enough to import the analyses used with perception directly over to an analysis of language. The most important similarity is that *from the first person perspective of a perceiving, acting language user, learning the meaning of linguistic information and learning the meaning of perceptual information are the same process*. The differences in the behavior supported by these two types of information (which are, indeed, important) arise from the differences in the way these two types of information come about and connect to their meaning. But the similarities mean the same basic approach to studying how we use information to perceive meaning can apply to language as much as to perception and action; a step forward in and of itself.

Although language is clearly a tremendous step up in terms of the complexity of the dynamics involved, the essential form of the analysis can remain the same. Linguistic information is a task resource in *exactly* the same way as perceptual information is a task resource, and we should treat it as such when we try to figure out how it fits into the task-specific device an organism is forming to solve a given problem. We suggest that it is vital to exhaust this strategy *first*, before leaping to the conclusion that it simply can't be done without the representations that many other cognitive systems just don't seem to require.

## **OTHER EMBODIED APPROACHES TO LANGUAGE: ANOTHER NOTE ON GROUNDING**

This is not the first attempt to embody language, but the previous efforts are more in line with the conceptualization hypothesis we reviewed above and suffer from the problems we highlighted there (as well as others; see Willems and Francken, 2012). They hypothesize that meaning is grounded in a simulation of previous experiences, a simulation which would include embodied elements of those previous experiences. Tasks measuring comprehension should reflect the presence of this kind of simulation (Barsalou, 1999). Two high profile attempts to measure these embodied simulation effects are the *action-sentence compatibility effect* (e.g., Glenberg and Kaschak, 2002) and the *sentence-picture verification task* (e.g., Stanfield and Zwaan, 2001).

#### **Action-sentence compatibility**

Glenberg and Kaschak (2002) had participants rate whether sentences were sensible. Some of the sentences implied a directional movement (e.g., "close the drawer" implies a movement away from the person). Participants responded by moving to press a button, and the movement was either compatible or not with the implied direction in the sentence. Participants were faster when the response direction and the implied direction were compatible, and slower when they were not. The authors suggest that this demonstrates people are mentally simulating the action in the sentence in order to comprehend the sentence; "language understanding is grounded in bodily action" (Glenberg and Kaschak, 2002, p. 562).

#### **Sentence-picture verification task**

Stanfield and Zwaan (2001) tested the simulation hypothesis by providing people with sentences that implied an orientation for an object, e.g., "the pencil is in the cup" implies a vertical orientation while "the pencil is in the drawer" implies a horizontal orientation. They then showed people a picture of the object in a compatible or incompatible orientation and asked people to verify if the pictured object matched the sentence; participants were faster to respond when the pictured orientation was compatible with the sentence than when it was incompatible.

The major problem with this research is that it again assumes all the hard work is done in the head, with perception and action merely tweaking the result. Before this type of research can tell us anything meaningful about language comprehension, more work must be done to answer some basic questions. There is no account of the resources that exist in the task presented to participants, and this is a critical part of identifying what the task is from the participants' perspective. For example, what is the information content of a picture of an object, what are the dynamics of button pressing behavior (or any response type being used), and what is the relationship between these two things – what happens if you try to control the latter using the former? These are not easy questions; for example, Gibson himself highlighted how difficult it is to establish exactly what the information content of a picture of something actually is (Gibson, 1979). But without this, you cannot begin to explain how hearing different sentences influences a button press response to those pictures. There may indeed be a story there; after all, the results have been demonstrated multiple times. But it is a story remaining to be told, and as in the rest of the work surveyed here, we think that the answer to these questions will likely lead to mental simulations being replaced by the relevant dynamics identified by a task analysis.

## **CONCLUSION**

At the beginning of the twentieth century, a German teacher named Wilhelm von Osten owned a horse called Hans. Hans, he claimed, could count and do simple maths and he demonstrated this ability for several years in free shows. It wasn't until psychologist Oskar Pfungst tested this claim rigorously that the truth was revealed: Hans did not know maths, but he did know to stop tapping his hoof when his owner indicated that he had reached the correct answer (by visibly but subconsciously relaxing; von Osten was not a fraud). Abstract knowledge such as how to add is typically seen as requiring some form of internal representational state, but here, the cognitive explanation (that Hans had the internal ability to count) was *replaced* by a straightforward perceptual coupling to his environment.

The story of Clever Hans has stood as a cautionary tale in psychology ever since; identifying an organism's actual solution to a problem requires the ability to identify *all* the potential solutions to a task followed by careful experimental testing to identify which of all the possible options is actually being used. This remains as true now as it was in 1907 when Pfungst ran his tests.

Standard cognitive science proceeds under two related assumptions that interfere with its ability to identify the actual solutions. These are poverty of stimulus, and the consequent need for internal, representational enrichment of perception. The objects and processes of standard cognitive psychology have a specific job to do that reflects the hypothesized need to enrich perceptual information. But these assumptions mean that cognitive research never even tests the genuinely embodied alternative solutions we now know are viable options.

Replacement style embodied cognition removes these assumptions and instead looks at all the resources in the environment that might support complex behavior and, critically, the information that might serve to tie them together. One of the most important discoveries of the last 40 years has been that there is, in fact, rich and varied information in the environment (Gibson, 1966, 1979)<sup>3</sup> that we are able to use to produce all manner of complex behaviors. The availability of this high quality perceptual information removes the need to invoke any additional cognitive constructs to explain interesting behaviors. Our behavior emerges from a pool of potential task resources that include the body, the environment and, yes, the brain. Careful analysis is required to discover exactly which of these resources and the relations between them form the actual solution used to solve a given task.

It is true that replacement style embodied cognition cannot currently explain everything that we do (Shapiro, 2011). Even some of the most enthusiastic researchers in embodied cognition think that there are "representation hungry" problems, which simply cannot be solved without something like an object or process from standard cognitive psychology (Clark and Toribio, 1994); language is the major case here. We are more optimistic. All that we can really conclude at this time is that replacement style embodied cognition cannot explain these problems *yet*. We believe that there is no principled reason why these behaviors cannot be explained with replacement style embodied solutions, given that human beings are, we think, best described as the kind of perceiving, acting, embodied, non-linear dynamical systems doing the replacing. This optimism reflects the successes we've described here, and especially the fact that when embodied cognition researchers *have* turned their attention to "representation hungry" problems, they have actually had great success. The embodied analysis of the A-not-B error remains the best example of this; it literally replaces "thinking about things in their absence" with embodied action. Another example is the work with *Portia* spiders (see above and Barrett, 2011 for a review). We have suggested a further step forward here, with an initial analysis of language that replaces what words *mean* with what language lets us *do*; of course, it remains to be seen if this is as successful (but see also Port and Leary, 2005; Port, 2007, for more on tackling language).

Replacement style embodied cognition research has produced methods, formal tools (primarily in the form of dynamical systems models) and a great number of empirical successes. The explanations it produces place embodiment at the *center* of the organism's solution to a given task, rather than on the periphery, and this is the research we feel deserves the name embodied cognition.

<sup>3</sup>Three recent reviews of how Gibson's work in visual perception underpins much of the embodied cognition literature include Barrett (2011), Chemero (2009), and Shapiro (2011).

## **REFERENCES**


Chomsky, N. (1959). Verbal behavior. *Language* 35, 26–58.


principles and operations. *Nat. Neurosci.* 15, 511–517.


geometric and experience-dependent spatial categories. *J. Exp. Psychol. Gen.* 131, 16–37.


acting: in reply to Fodor and Pylyshyn (1981). *Cognition* 9, 237–304.


for long-distance throwing: smart mechanism or function learning? *J. Exp. Psychol. Hum. Percept. Perform.* 36, 862–875.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 August 2012; accepted: 26 January 2013; published online: 12 February 2013.*

*Citation: Wilson AD and Golonka S (2013) Embodied cognition is not what you think it is. Front. Psychology 4:58. doi: 10.3389/fpsyg.2013.00058*

*This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.*

*Copyright © 2013 Wilson and Golonka. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*